# Web and GUI Testing

In this chapter, we apply our fuzzing techniques to graphical user interfaces (GUIs), in particular to Web interfaces.

**Prerequisites**

* _Refer to earlier chapters as notebooks here, as here:_ [Earlier Chapter](Fuzzer.ipynb).

## A Web User Interface

### Taking Orders

In [None]:
import fuzzingbook_utils

In [None]:
from IPython.display import HTML, display

In [None]:
fuzzingbook_swag = {
    "tshirt": "One FuzzingBook T-Shirt",
    "drill": "One FuzzingBook Rotary Hammer",
    "lockset": "One FuzzingBook Lock Set"
}

In [None]:
html_order_form = """
<html><body>
<form action="/order" style="border:3px; border-style:solid; border-color:#FF0000; padding: 1em;">
  <!-- We don't use h2, h3, etc. here as it interferes with notebook tocs -->
  <strong style="font-size: x-large">Fuzzingbook Swag Order Form</strong>
  <p>
  Yes! Please send me at your earliest convenience
  <select name="item">
  """

for item in fuzzingbook_swag:
    html_order_form += '<option value="{item}">{name}</option>'.format(item=item, 
        name=fuzzingbook_swag[item])

html_order_form += """
  </select>
  <br>
  <table>
  <tr><td>
  <label for="name">Name: </label><input type="text" name="name">
  </td><td>
  <label for="email">Email: </label><input type="email" name="email"><br>
  </td></tr>
  <tr><td>
  <label for="city">City: </label><input type="text" name="city">
  </td><td>
  <label for="zip">ZIP Code: </label><input type="number" name="zip">
  </td></tr>
  </table>
  <input type="checkbox" name="tandc"><label for="tandc">I have read 
  the <a href="/">terms and conditions</a></label><br>
  <button>Place order</button>
</p>
</form>
</body></html>
"""

In [None]:
HTML(html_order_form)

### Processing Orders

In [None]:
html_order_received = """
<html><body>
<div style="border:3px; border-style:solid; border-color:#FF0000; padding: 1em;">
  <strong style="font-size: x-large">Thank you for your Fuzzingbook Order!</strong>
  <p>
  We will send <strong>{item_name}</strong> to {name} in {city}, {zip}<br>
  A confirmation mail will be sent to {email}.
  </p>
</div>
</body></html>
"""

In [None]:
HTML(html_order_received.format(item_name="One FuzzingBook Rotary Hammer", 
                                name="Jane Doe", 
                                email="doe@example.com",
                                city="Seattle",
                                zip="98104"))

### Handling HTTP Requests

In [None]:
from multiprocessing import Process, Queue

In [None]:
from http.server import HTTPServer, BaseHTTPRequestHandler, HTTPStatus

In [None]:
import urllib.parse

In [None]:
import html

In [None]:
class MyHTTPRequestHandler(BaseHTTPRequestHandler):
    def do_HEAD(self):
        # print("HEAD " + self.path)
        self.send_response(HTTPStatus.OK)
        self.send_header("Content-type", "text/html")
        self.end_headers()
            
    def do_GET(self):
        # print("GET " + self.path)
        if self.path == "/":
            self.send_order_form()
        elif self.path.startswith("/order"):
            self.send_order_received()
        else:
            self.not_found()

In [None]:
class MyHTTPRequestHandler(MyHTTPRequestHandler):
    def send_order_form(self):
        self.send_response(HTTPStatus.OK, "Place your order")
        self.send_header("Content-type", "text/html")
        self.end_headers()
        self.wfile.write(html_order_form.encode("utf8"))

In [None]:
class MyHTTPRequestHandler(MyHTTPRequestHandler):
    def get_field_values(self):
        # self.path is something like "/order?item=foo&name=bar"
        # Note: this fails to decode non-ASCII characters properly
        query_string = urllib.parse.urlparse(self.path).query
        
        # fields is { 'item': ['tshirt'], 'name': ['Jane Doe'], ...}
        fields = urllib.parse.parse_qs(query_string, keep_blank_values=True)

        values = {}
        html_values = {}
        for key in fields:
#             values[key] = urllib.parse.unquote(html.unescape(fields[key][0]))
#             html_values[key] = html.escape(urllib.parse.unquote(values[key]))
            values[key] = fields[key][0]
            html_values[key] = values[key]
            
        return values, html_values
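
To see what `get_field_values()` has to work with, here is a minimal standalone sketch of the two `urllib.parse` calls it relies on. The sample path is made up for illustration. Note that `parse_qs()` already turns `+` into a space and decodes `%xx` escapes.

```python
import urllib.parse

# Hypothetical request path, as the handler would see it in self.path
path = "/order?item=tshirt&name=Jane+Doe&email=doe%40example.com&zip="

query_string = urllib.parse.urlparse(path).query
fields = urllib.parse.parse_qs(query_string, keep_blank_values=True)

# Each key maps to a list, since a field may occur more than once;
# like the handler, we keep only the first value
values = {key: fields[key][0] for key in fields}
print(values)
```

With `keep_blank_values=True`, the empty `zip` field is kept as an empty string rather than being dropped.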

In [None]:
class MyHTTPRequestHandler(MyHTTPRequestHandler):
    def send_order_received(self):
        try:
            values, html_values = self.get_field_values()
            values["item_name"] = fuzzingbook_swag[values["item"]]
            html_values["item_name"] = values["item_name"]  # Should use html.escape()

            confirmation = html_order_received.format(**html_values).encode("utf8")

            self.send_response(HTTPStatus.OK, "Order received")
            self.send_header("Content-type", "text/html")
            self.end_headers()
            self.wfile.write(confirmation)
        except Exception:
            self.internal_server_error()

### Error Handling

In [None]:
class MyHTTPRequestHandler(MyHTTPRequestHandler):
    def not_found(self):
        self.send_response(HTTPStatus.NOT_FOUND, "Not found")
        # Without end_headers(), the buffered response is never sent
        self.end_headers()

In [None]:
html_internal_server_error = """
<html><body>
<div style="border:3px; border-style:solid; border-color:#FF0000; padding: 1em;">
  <strong style="font-size: x-large">Internal Server Error</strong>
  <p>
  The server has encountered an internal error.  Please come back later.
  <pre>{error_message}</pre>
  </p>
</div>
</body></html>
  """

In [None]:
HTML(html_internal_server_error)

In [None]:
import sys
import traceback

In [None]:
class MyHTTPRequestHandler(MyHTTPRequestHandler):
    def internal_server_error(self):
        self.send_response(HTTPStatus.INTERNAL_SERVER_ERROR, "Internal Error")
        self.send_header("Content-type", "text/html")
        self.end_headers()

        exc = traceback.format_exc()
        self.log_message("%s", exc.strip())
        # print(exc, file=sys.stderr, end="")

        message = html_internal_server_error.format(error_message=exc)
        self.wfile.write(message.encode("utf8"))

### Logging

In [None]:
httpd_message_queue = Queue()

In [None]:
class MyHTTPRequestHandler(MyHTTPRequestHandler):
    def log_message(self, format, *args):
        message = ("%s - - [%s] %s\n" %
                         (self.address_string(),
                          self.log_date_time_string(),
                          format % args))
        httpd_message_queue.put(message)

In [None]:
def display_httpd_message(message):
    display(HTML('<pre style="background: NavajoWhite">' + message + "</pre>"))

In [None]:
display_httpd_message("I am a httpd server message")

In [None]:
def print_httpd_messages():
    while not httpd_message_queue.empty():
        message = httpd_message_queue.get()
        display_httpd_message(message)

In [None]:
httpd_message_queue.put("I am another message")

In [None]:
httpd_message_queue.put("I am one more message")

In [None]:
print_httpd_messages()

In [None]:
import Carver

In [None]:
def webbrowser(url):
    try:
        contents = Carver.webbrowser(url)
    finally:
        print_httpd_messages()
    return contents

### Running the Server

In [None]:
def run_httpd():
    host = "127.0.0.1"
    # Try ports 8800..8999 and use the first one that is free
    for port in range(8800, 9000):
        httpd_address = (host, port)

        try:
            httpd = HTTPServer(httpd_address, MyHTTPRequestHandler)
            break
        except OSError:
            continue
    else:
        raise OSError("No free port found in range 8800-8999")

    httpd_url = "http://" + host + ":" + str(port) + "/"
    httpd_message_queue.put(httpd_url)
    httpd.serve_forever()

In [None]:
httpd_process = Process(target=run_httpd)
httpd_process.start()

In [None]:
httpd_url = httpd_message_queue.get()
httpd_url

In [None]:
HTML('<a href="' + httpd_url + '">' + httpd_url + "</a>")

In [None]:
print_httpd_messages()

In [None]:
contents = webbrowser(httpd_url)

In [None]:
HTML(contents)

How can we test this?  By sending one request after another.

In [None]:
HTML(webbrowser(httpd_url + 
                "order?item=tshirt&name=Jane+Doe&email=doe%40example.com&city=Seattle&zip=98104"))

## Fuzzing A Web Form

### Fuzzing with Expected Values

In [None]:
from Grammars import crange, is_valid_grammar

In [None]:
ORDER_GRAMMAR = {
    "<start>": [ "<order>" ],
    "<order>": [ "order?item=<item>&name=<name>&email=<email>&city=<city>&zip=<zip>" ],
    "<item>": [ "tshirt", "drill", "lockset" ],
    "<name>": [ "Jane+Doe", "John+Smith" ],
    "<email>": [ "j.doe%40example.com", "j_smith%40example.com"],
    "<city>": [ "Seattle", "New+York"],
    "<zip>": [ "<digit>" * 5 ],
    "<digit>": crange('0', '9')
}
assert is_valid_grammar(ORDER_GRAMMAR)

In [None]:
from GrammarFuzzer import GrammarFuzzer

In [None]:
order_fuzzer = GrammarFuzzer(ORDER_GRAMMAR)
[order_fuzzer.fuzz() for i in range(5)]

In [None]:
url = httpd_url + order_fuzzer.fuzz()
HTML(webbrowser(url))

### Fuzzing with Unexpected Values

So far, so good.  But what happens when we enter random values?

In [None]:
from MutationFuzzer import MutationFuzzer

In [None]:
seed = order_fuzzer.fuzz()
seed

In [None]:
mutate_order_fuzzer = MutationFuzzer([seed], min_mutations=1, max_mutations=1)
[mutate_order_fuzzer.fuzz() for i in range(5)]

In [None]:
from ExpectError import ExpectError

In [None]:
import traceback

In [None]:
import urllib.request  # makes urllib.request.HTTPError available

In [None]:
while True:
    url = httpd_url + mutate_order_fuzzer.fuzz()
    try:
        answer = webbrowser(url)
    except urllib.request.HTTPError:
        traceback.print_exc()
        break

In [None]:
from Reducer import DeltaDebuggingReducer

In [None]:
from Fuzzer import Runner

In [None]:
class WebRunner(Runner):
    def run(self, inp):
        url = httpd_url + inp
        try:
            answer = webbrowser(url)
            return inp, Runner.PASS
        except urllib.request.HTTPError:
            return inp, Runner.FAIL
        except Exception:
            return inp, Runner.UNRESOLVED

In [None]:
web_runner = WebRunner()

In [None]:
while True:
    failing_input, outcome = web_runner.run(mutate_order_fuzzer.fuzz())
    if outcome == Runner.FAIL:
        break

In [None]:
failing_input

In [None]:
web_reducer = DeltaDebuggingReducer(web_runner)

In [None]:
web_reducer.reduce(failing_input)

In [None]:
with ExpectError():
    webbrowser(httpd_url + "order")

We see that we have a lot to do to make our Web Server more robust against unexpected inputs.
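
One possible first step, sketched here under the assumption that all five form fields are mandatory: check for missing or empty fields before building the confirmation page, so that the server can answer with a client error rather than an internal one. The helper `missing_fields()` and the list `REQUIRED_FIELDS` are hypothetical and not part of the server above.

```python
# Field names taken from the order form; the helper itself is hypothetical
REQUIRED_FIELDS = ["item", "name", "email", "city", "zip"]

def missing_fields(values):
    """Return the required fields that are absent or empty in `values`."""
    return [field for field in REQUIRED_FIELDS if not values.get(field)]

complete_order = {
    "item": "tshirt",
    "name": "Jane Doe",
    "email": "doe@example.com",
    "city": "Seattle",
    "zip": "98104",
}

print(missing_fields({}))              # all fields are reported as missing
print(missing_fields(complete_order))  # nothing is missing
```

A handler could call such a check at the start of `send_order_received()` and send a "Bad Request" response if the returned list is not empty.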

## Crafting Web Attacks

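So far, we have fuzzed with fairly ordinary values.  More interesting, though, are uncommon values that are deliberately crafted to attack the server.  To embed such values in a URL, we first need to CGI-encode them: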

In [None]:
import string

In [None]:
def cgi_encode(s):
    ret = ""
    for c in s:
        if c in string.ascii_letters or c in string.digits:
            ret += c
        elif c == ' ':
            ret += '+'
        else:
            ret += "%%%02x" % ord(c)
    return ret

In [None]:
s = cgi_encode("'DOW50' is down .24%")
s

In [None]:
from Coverage import cgi_decode

In [None]:
cgi_decode(s)
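
The standard library offers a comparable pair of functions. The sketch below uses `urllib.parse.quote_plus()` and `urllib.parse.unquote_plus()` for the same round trip. Unlike `cgi_encode()` above, `quote_plus()` leaves a few punctuation characters such as `.` unescaped and uses uppercase hex digits, so the encoded strings are not identical.

```python
import urllib.parse

s = "'DOW50' is down .24%"

encoded = urllib.parse.quote_plus(s)          # spaces become '+'
decoded = urllib.parse.unquote_plus(encoded)  # and back again

print(encoded)
print(decoded)
```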

### Injecting Code

In [None]:
from Grammars import extend_grammar

In [None]:
ORDER_GRAMMAR_WITH_HTML_INJECTION = extend_grammar(ORDER_GRAMMAR, {
    "<name>": [ cgi_encode('''
    Jane Doe<p>
    <strong><a href="www.lots.of.malware">Click here for cute cat pictures!</a></strong>
    </p>
    ''')],
})

In [None]:
html_injection_fuzzer = GrammarFuzzer(ORDER_GRAMMAR_WITH_HTML_INJECTION)
html_injection_order = html_injection_fuzzer.fuzz()
html_injection_order

In [None]:
HTML(webbrowser(httpd_url + html_injection_order))

Instead of injecting HTML, as in this example, we could also insert JavaScript code that would then automatically be executed on any Web page that shows customer info – in particular on the vendor's site, where it could be set to retrieve credentials or other means to access the entire database.
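
A common defense is to escape every user-supplied value before inserting it into a page. The following sketch uses `html.escape()` from the standard library; the template and the attack string are made up for illustration and are simpler than the order confirmation page above.

```python
import html

# Hypothetical template and attacker-controlled value, for illustration only
template = "We will send your order to {name}."
name = 'Jane Doe<script>alert("gotcha")</script>'

unsafe_page = template.format(name=name)
safe_page = template.format(name=html.escape(name))

print(unsafe_page)  # contains a live <script> element
print(safe_page)    # the markup is shown as text instead
```

In the escaped version, `<` and `>` become `&lt;` and `&gt;`, so a browser displays the injected markup as plain text rather than interpreting it.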

### Injecting SQL Commands

In [None]:
from Grammars import extend_grammar

In [None]:
ORDER_GRAMMAR_WITH_SQL_INJECTION = extend_grammar(ORDER_GRAMMAR, {
    "<name>": [ cgi_encode("Jane'; DROP TABLE orders; --")],  # https://xkcd.com/327/
})

In [None]:
sql_injection_fuzzer = GrammarFuzzer(ORDER_GRAMMAR_WITH_SQL_INJECTION)
sql_injection_order = sql_injection_fuzzer.fuzz()
sql_injection_order

In [None]:
HTML(webbrowser(httpd_url + sql_injection_order))
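
The order server above does not store orders in a database, so this request has no harmful effect here. A backend that pasted the name directly into an SQL statement would be at risk, though. A common defense is to use parameterized queries, sketched below with `sqlite3`; the in-memory database and the `orders` table layout are assumptions made for this example.

```python
import sqlite3

# Hypothetical in-memory database; the server in this chapter has none
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (item TEXT, name TEXT)")

name = "Jane'; DROP TABLE orders; --"

# The ? placeholders pass the values as data, never as SQL text
db.execute("INSERT INTO orders (item, name) VALUES (?, ?)", ("tshirt", name))

rows = db.execute("SELECT item, name FROM orders").fetchall()
print(rows)
```

The attack string ends up stored as an ordinary name, and the `orders` table is still there afterwards.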

## Extracting Grammars for Web GUIs

In [None]:
html_doc = webbrowser(httpd_url)
html_doc

We could define a grammar to parse HTML, but it is much easier to use the existing, dedicated parser:

In [None]:
from html.parser import HTMLParser

In [None]:
class MyHTMLParser(HTMLParser):
    def reset(self):
        super().reset()
        self.fields = {}
        self.action = ""

    def handle_starttag(self, tag, attrs):
        attributes = {attr_name: attr_value for attr_name, attr_value in attrs}
        # print(tag, attributes)

        if tag == "form":
            self.action = attributes.get("action", "")

        if tag == "input":
            if "name" in attributes:
                name = attributes["name"]
                self.fields[name] = attributes.get("type", "text")

In [None]:
class HTMLGrammarMiner(object):
    def __init__(self, html_doc):
        html_parser = MyHTMLParser()
        html_parser.feed(html_doc)
        self.fields = html_parser.fields
        self.action = html_parser.action

In [None]:
html_miner = HTMLGrammarMiner(html_doc)
html_miner.action

In [None]:
html_miner.fields

In [None]:
from Grammars import crange, srange, new_symbol, unreachable_nonterminals, CGI_GRAMMAR

In [None]:
class HTMLGrammarMiner(HTMLGrammarMiner):
    QUERY_GRAMMAR = extend_grammar(CGI_GRAMMAR, {
        "<start>": ["<action>?<query>"],

        "<text>": ["<string>"],

        "<number>": ["<digits>"],
        "<digits>": ["<digit>", "<digits><digit>"],
        "<digit>": crange('0', '9'),
        
        "<checkbox>": ["on", "off"],
        "<email>": ["<string>%40<string>"],
    })

In [None]:
class HTMLGrammarMiner(HTMLGrammarMiner):
    def mine_grammar(self):
        grammar = extend_grammar(self.QUERY_GRAMMAR)
        grammar["<action>"] = [self.action]

        query = ""
        for field in self.fields:
            field_symbol = new_symbol(grammar, "<" + field + ">")
            field_type = self.fields[field]

            if query != "":
                query += "&"
            query += field_symbol
            grammar[field_symbol] = [field + "=<" + field_type + ">"]

        grammar["<query>"] = [query]

        # Remove unused parts
        for nonterminal in unreachable_nonterminals(grammar):
            del grammar[nonterminal]
        assert is_valid_grammar(grammar)
            
        return grammar

In [None]:
html_miner = HTMLGrammarMiner(html_doc)
grammar = html_miner.mine_grammar()
grammar["<start>"]

In [None]:
grammar["<action>"]

In [None]:
grammar["<query>"]

In [None]:
grammar["<zip>"]

In [None]:
grammar["<tandc>"]

In [None]:
order_fuzzer = GrammarFuzzer(grammar)
[order_fuzzer.fuzz() for i in range(5)]

We see (one more time) that we can mine a grammar automatically from given data.

Limitations:

* Limited to one form per page; no escaping, CGI encoding, etc.
* Limited to GET actions (no POST, PUT, etc.).  The `requests` library (http://docs.python-requests.org/en/latest/api/) could help here.
* No JavaScript handling for dynamic Web pages.
* No field-specific values or ranges (e.g., a ZIP code as exactly five digits).
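
To give an idea of what POST support would involve, here is a minimal sketch that builds (but does not send) a POST request with the standard library. The URL and the field values are made up for illustration, and the server above would also need a `do_POST()` method to handle such a request.

```python
import urllib.parse
import urllib.request

# Hypothetical URL and field values, for illustration only
url = "http://127.0.0.1:8800/order"
fields = {"item": "tshirt", "name": "Jane Doe", "zip": "98104"}

# For POST, the encoded fields travel in the request body, not in the URL
body = urllib.parse.urlencode(fields).encode("ascii")
request = urllib.request.Request(url, data=body, method="POST")

print(request.get_method())
print(request.data)
```

The body uses the same encoding as the query strings produced by the mined grammar, so a fuzzer could reuse its output for POST requests by sending the part after the `?` as the request body.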

## Lessons Learned

* _Lesson one_
* _Lesson two_
* _Lesson three_

We're done, so we can clean up:

In [None]:
import time

In [None]:
time.sleep(5)
httpd_process.terminate()

## Next Steps

_Link to subsequent chapters (notebooks) here, as in:_

* [use _mutations_ on existing inputs to get more valid inputs](MutationFuzzer.ipynb)
* [use _grammars_ (i.e., a specification of the input format) to get even more valid inputs](Grammars.ipynb)
* [reduce _failing inputs_ for efficient debugging](Reducer.ipynb)


## Background

_Cite relevant works in the literature and put them into context, as in:_

The idea of ensuring that each expansion in the grammar is used at least once goes back to Burkhardt \cite{Burkhardt1967}, to be later rediscovered by Paul Purdom \cite{Purdom1972}.

## Exercises

_Close the chapter with a few exercises such that people have things to do.  To make the solutions hidden (to be revealed by the user), have them start with_

```markdown
**Solution.**
```

_Your solution can then extend up to the next title (i.e., any markdown cell starting with `#`)._

_Running `make metadata` will automatically add metadata to the cells such that the cells will be hidden by default, and can be uncovered by the user.  The button will be introduced above the solution._

### Exercise 1: _Title_

_Text of the exercise_

In [None]:
# Some code that is part of the exercise
pass

_Some more text for the exercise_

**Solution.** _Some text for the solution_

In [None]:
# Some code for the solution
2 + 2

_Some more text for the solution_

### Exercise 2: _Title_

_Text of the exercise_

**Solution.** _Solution for the exercise_