Improve w3af's score for WAVSEP XSS by at least 20% #37

andresriancho · 2013-01-22T12:40:44Z

User story

As a user I want w3af to find as many XSS vulnerabilities as possible.

Conditions of satisfaction

Unittests for all XSS sections of WAVSEP need to be written
Coverage % needs to be calculated and asserted
Once that's done, improve detection rate by 20%
If we're not at 100%, create a new task (similar to this one) to improve another 10%

Context code rewrite v3.0

While I was re-writing the code as specified in the Context code rewrite v2.0 section one main issue appeared: finding context based on inside_context (see below and code in xss branch) was too error/false-positive prone. With inside_context the context accuracy is rather low 👎

The main reason for using the previous strategy was that we're sending payloads that might break the HTML structure, this classic parsers would be unable to determine the payload's location. A solution for this issue would be to:

Send the payload
Replace the payload with something "clean" such as "123456789abcdef"
Find the context for the clean payload, which doesn't break any HTML structures

Using that strategy we would be able to use almost any HTML parser to determine the context, the problems that I foresee now are:

Parsers might not be able to identify all the contexts we need
Parsers might not make a difference between single/double/backtick delimited tag attributes
lxml had problems with memory leaks - if we choose an HTML parser which gives us everything we need but is not tested, we might end up once again with a problem like that

The same problem with context appears when parsing JavaScript and CSS, so we might need to find a parser for that too.

An option that we might experiment with is to build our own parser-lexer

This method looks promising. If we use this there is a silly trick that might come handy to determine if the payload is in single/double/backtick quotes, simply dump the tag text, get the attribute value (containing the payload) and search for "fooPAYLOADbar", then if that's not in the tag text search for the same but with single quotes and so on. The only detail would be to check for " and ' escapes

Context code rewrite v2.0

The code that processes the HTML and makes it possible to identify the HTML context where the payload landed is complex, hard to debug and extend. So I propose a rewrite with the following objectives:

Completely remove the ByteChunk
Make each context self-contained and testeable
Write contexts in such a way that they can be nested
Write contexts to decode JS

New context classes

class Context(object):
    NAME = 'HTML'

    @staticmethod
    def match(normalized_html):
        return True

    def can_break(payload):
        raise NotImplementedError

    def executable():
        return False

    def inside_context(normalized_html, context_start, context_end):
        """
        :return: True if we perform a reverse find of context_start (ie '<script'), then
                     a reverse find of context_end (ie. '</script>') and the second has a
                     lower index; meaning we're still in the context of '<script>' tag

        :param context_start: Would be '<script' in the example above
        :param context_start: Would be '</script>' in the example above
        """

    def current_context_content(normalized_html, context_start, context_end):
        """
        Extract the current context text, handles the following cases:
            <script type="application/json">foo();</script>
            <script>foo();</script>

        Returning 'foo();' in both.

        :param context_start: Would be '<script' in the example above
        :param context_start: Would be '</script>' in the example above
        """
        pass

This context matches if we're inside a <script> tag:

class ScriptTagContext(Context):
    NAME = 'SCRIPT_TAG'

    @staticmethod
    def match(normalized_html):
        return Context.inside_context(normalized_html, '<script', '</script>')

This context matches when we're inside a script tag and a multi-line comment:

class ScriptTagMultiLineCommentContext(ScriptTagContext):
    NAME = 'SCRIPT_MULTI_COMMENT'

    @staticmethod
    def match(normalized_html):
        if not ScriptTagContext.match(normalized_html):
            return False
        script_code = Context.current_context_content(normalized_html, '<script', '</script>')
        js_context = get_js_context(script_code)
        if js_context is None:
            return False
        return isinstance(js_context, JSMultiLineComment)

    def can_break(payload)
        return '*/' in payload

Since contexts don't hold any data I believe it would be smart to make the match method a @staticmethod and only create an instance when we find a match and need to return it to the xss plugin.

With this new approach the order in which we match the contexts is very important. It MUST start with the more specific contexts and then move down to the generic cases (ScriptTagMultiLineCommentContext goes before ScriptTagContext)

It might be a good idea to write the inside_context(normalized_html, '<script', '</script>') with a lru cache, since we might call it with the same params several times (one for each context+sub-contexts inside it).

Sub-contexts for JavaScript

JavaScript can be found in several places:
* <script>...</script>
* <a href="javascript:...">
* <a onmouseover="...">

When found we need to analyze the JS code to understand if in this context we can run arbitrary code (or not). In order to do this we'll need JavaScript sub-contexts.

What I have in mind is to:

First write a context analyzer for JS which given a piece of JS code and a payload will tell me if I can_break or if it's executable (just like the other contexts)
Then call these context handler each time JS is found in the HTML-level contexts

Sub-contexts for Style Sheets

Same as above but for CSS

The text was updated successfully, but these errors were encountered:

Related to #37

* All the rest is broken

* Still need to hook all this with JavaScript and CSS parsing stuff

* Fixed bug in HTML attribute can_break #37

ghost assigned andresriancho Jan 22, 2013

andresriancho added the MUST-HAVE label Mar 28, 2014

andresriancho changed the title ~~Improve WAVSEP score~~ Improve WAVSEP score for XSS Mar 29, 2014

andresriancho mentioned this issue Mar 29, 2014

Improve WAVSEP score for SQL injection by at least 20% #1799

Open

20 tasks

andresriancho added plugin labels Mar 29, 2014

andresriancho mentioned this issue Mar 29, 2014

Write unittests against all WAVSEP sections #1800

Open

11 tasks

andresriancho changed the title ~~Improve WAVSEP score for XSS~~ Improve w3af's score for WAVSEP XSS Mar 29, 2014

andresriancho modified the milestones: OLD .17, 1.7 - Multiprocessing release, 1.7 - Release Mar 30, 2014

andresriancho removed their assignment Mar 30, 2014

andresriancho changed the title ~~Improve w3af's score for WAVSEP XSS~~ Improve w3af's score for WAVSEP XSS by at least 20% Mar 30, 2014

andresriancho added priority:medium and removed MUST-HAVE labels Jun 15, 2015

andresriancho added a commit that referenced this issue Aug 26, 2015

Refactoring related with improving context detection for #37

42cb816

andresriancho added a commit that referenced this issue Aug 26, 2015

More refactoring before actually working on #37

f3b5419

andresriancho added a commit that referenced this issue Aug 26, 2015

Commit to move to a different task

08a4b8a

Related to #37

andresriancho added a commit that referenced this issue Aug 31, 2015

* HTML contexts are working with the new ideas defined at #37

7959a04

* All the rest is broken

andresriancho added a commit that referenced this issue Aug 31, 2015

Adding tests for javascript code parser #37

7b0b633

andresriancho added a commit that referenced this issue Sep 2, 2015

Learning process for context #37

ef9303c

andresriancho added a commit that referenced this issue Sep 9, 2015

* Context code rewrite v3.0 for HTML #37

525c504

* Still need to hook all this with JavaScript and CSS parsing stuff

andresriancho added a commit that referenced this issue Sep 9, 2015

New contexts and handling of invalid html #37

4e680cb

andresriancho added a commit that referenced this issue Sep 9, 2015

Minimalistic JavaScript parser #37

c509ccb

andresriancho added a commit that referenced this issue Sep 10, 2015

Great progress on JavaScript parsing #37

21f383a

andresriancho added a commit that referenced this issue Sep 10, 2015

* Improved test suite

4deaabf

* Fixed bug in HTML attribute can_break #37

andresriancho added a commit that referenced this issue Sep 10, 2015

HTML delegate to CSS parser for can_break #37

8f32f75

andresriancho added a commit that referenced this issue Sep 10, 2015

WAVSEP XSS coverage 100% #37

e9bd0a6

andresriancho added a commit that referenced this issue Sep 10, 2015

Merge branch 'feature/xss-improve-wavsep' into develop #37

0681d6b

andresriancho closed this as completed Sep 10, 2015

andresriancho mentioned this issue Sep 10, 2015

Performance improvements to HTML context detection engine #12559

Open

2 tasks

avenkat3 mentioned this issue Mar 31, 2018

Unable to launch w3af GUI or console in Linux Mint #16877

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve w3af's score for WAVSEP XSS by at least 20% #37

Improve w3af's score for WAVSEP XSS by at least 20% #37

andresriancho commented Jan 22, 2013

Improve w3af's score for WAVSEP XSS by at least 20% #37

Improve w3af's score for WAVSEP XSS by at least 20% #37

Comments

andresriancho commented Jan 22, 2013

User story

Conditions of satisfaction

Context code rewrite v3.0

Context code rewrite v2.0

New context classes

Sub-contexts for JavaScript

Sub-contexts for Style Sheets