Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve w3af's score for WAVSEP XSS by at least 20% #37

Closed
6 tasks done
andresriancho opened this issue Jan 22, 2013 · 0 comments
Closed
6 tasks done

Improve w3af's score for WAVSEP XSS by at least 20% #37

andresriancho opened this issue Jan 22, 2013 · 0 comments

Comments

@andresriancho
Copy link
Owner

User story

As a user I want w3af to find as many XSS vulnerabilities as possible.

Conditions of satisfaction

  • Unittests for all XSS sections of WAVSEP need to be written
  • Coverage % needs to be calculated and asserted
  • Once that's done, improve detection rate by 20%
  • If we're not at 100%, create a new task (similar to this one) to improve another 10%

Context code rewrite v3.0

While I was re-writing the code as specified in the Context code rewrite v2.0 section one main issue appeared: finding context based on inside_context (see below and code in xss branch) was too error/false-positive prone. With inside_context the context accuracy is rather low 👎

The main reason for using the previous strategy was that we're sending payloads that might break the HTML structure, this classic parsers would be unable to determine the payload's location. A solution for this issue would be to:

  • Send the payload
  • Replace the payload with something "clean" such as "123456789abcdef"
  • Find the context for the clean payload, which doesn't break any HTML structures

Using that strategy we would be able to use almost any HTML parser to determine the context, the problems that I foresee now are:

  • Parsers might not be able to identify all the contexts we need
  • Parsers might not make a difference between single/double/backtick delimited tag attributes
  • lxml had problems with memory leaks - if we choose an HTML parser which gives us everything we need but is not tested, we might end up once again with a problem like that

The same problem with context appears when parsing JavaScript and CSS, so we might need to find a parser for that too.

An option that we might experiment with is to build our own parser-lexer

This method looks promising. If we use this there is a silly trick that might come handy to determine if the payload is in single/double/backtick quotes, simply dump the tag text, get the attribute value (containing the payload) and search for "fooPAYLOADbar", then if that's not in the tag text search for the same but with single quotes and so on. The only detail would be to check for " and ' escapes

Context code rewrite v2.0

The code that processes the HTML and makes it possible to identify the HTML context where the payload landed is complex, hard to debug and extend. So I propose a rewrite with the following objectives:

  • Completely remove the ByteChunk
  • Make each context self-contained and testeable
  • Write contexts in such a way that they can be nested
  • Write contexts to decode JS

New context classes

class Context(object):
    NAME = 'HTML'

    @staticmethod
    def match(normalized_html):
        return True

    def can_break(payload):
        raise NotImplementedError

    def executable():
        return False

    def inside_context(normalized_html, context_start, context_end):
        """
        :return: True if we perform a reverse find of context_start (ie '<script'), then
                     a reverse find of context_end (ie. '</script>') and the second has a
                     lower index; meaning we're still in the context of '<script>' tag

        :param context_start: Would be '<script' in the example above
        :param context_start: Would be '</script>' in the example above
        """

    def current_context_content(normalized_html, context_start, context_end):
        """
        Extract the current context text, handles the following cases:
            <script type="application/json">foo();</script>
            <script>foo();</script>

        Returning 'foo();' in both.

        :param context_start: Would be '<script' in the example above
        :param context_start: Would be '</script>' in the example above
        """
        pass

This context matches if we're inside a <script> tag:

class ScriptTagContext(Context):
    NAME = 'SCRIPT_TAG'

    @staticmethod
    def match(normalized_html):
        return Context.inside_context(normalized_html, '<script', '</script>')

This context matches when we're inside a script tag and a multi-line comment:

class ScriptTagMultiLineCommentContext(ScriptTagContext):
    NAME = 'SCRIPT_MULTI_COMMENT'

    @staticmethod
    def match(normalized_html):
        if not ScriptTagContext.match(normalized_html):
            return False
        script_code = Context.current_context_content(normalized_html, '<script', '</script>')
        js_context = get_js_context(script_code)
        if js_context is None:
            return False
        return isinstance(js_context, JSMultiLineComment)

    def can_break(payload)
        return '*/' in payload

Since contexts don't hold any data I believe it would be smart to make the match method a @staticmethod and only create an instance when we find a match and need to return it to the xss plugin.

With this new approach the order in which we match the contexts is very important. It MUST start with the more specific contexts and then move down to the generic cases (ScriptTagMultiLineCommentContext goes before ScriptTagContext)

It might be a good idea to write the inside_context(normalized_html, '<script', '</script>') with a lru cache, since we might call it with the same params several times (one for each context+sub-contexts inside it).

Sub-contexts for JavaScript

JavaScript can be found in several places:
* <script>...</script>
* <a href="javascript:...">
* <a onmouseover="...">

When found we need to analyze the JS code to understand if in this context we can run arbitrary code (or not). In order to do this we'll need JavaScript sub-contexts.

What I have in mind is to:

  • First write a context analyzer for JS which given a piece of JS code and a payload will tell me if I can_break or if it's executable (just like the other contexts)
  • Then call these context handler each time JS is found in the HTML-level contexts

Sub-contexts for Style Sheets

Same as above but for CSS

@ghost ghost assigned andresriancho Jan 22, 2013
@andresriancho andresriancho changed the title Improve WAVSEP score Improve WAVSEP score for XSS Mar 29, 2014
@andresriancho andresriancho changed the title Improve WAVSEP score for XSS Improve w3af's score for WAVSEP XSS Mar 29, 2014
@andresriancho andresriancho modified the milestones: OLD .17, 1.7 - Multiprocessing release, 1.7 - Release Mar 30, 2014
@andresriancho andresriancho removed their assignment Mar 30, 2014
@andresriancho andresriancho changed the title Improve w3af's score for WAVSEP XSS Improve w3af's score for WAVSEP XSS by at least 20% Mar 30, 2014
andresriancho added a commit that referenced this issue Aug 26, 2015
andresriancho added a commit that referenced this issue Aug 31, 2015
andresriancho added a commit that referenced this issue Sep 2, 2015
andresriancho added a commit that referenced this issue Sep 9, 2015
* Still need to hook all this with JavaScript and CSS parsing stuff
andresriancho added a commit that referenced this issue Sep 9, 2015
andresriancho added a commit that referenced this issue Sep 10, 2015
* Fixed bug in HTML attribute can_break #37
andresriancho added a commit that referenced this issue Sep 10, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant