# Combinatorial Fuzzing

In this chapter, we explore how to systematically cover software configurations, that is, the settings that govern how a program processes its (regular) input data.  By _automatically inferring configuration options_, we can apply these techniques out of the box, without having to write a grammar by hand.

**Prerequisites**

* You should have read the [chapter on grammars](Grammars.ipynb).
* You should have read the [chapter on grammar coverage](GrammarCoverage.ipynb).

## Configuration Options


In [1]:
import fuzzingbook_utils

In [2]:
import argparse

In [3]:
def process_numbers(args=[]):
    parser = argparse.ArgumentParser(description='Process some integers.')
    parser.add_argument('integers', metavar='N', type=int, nargs='+',
                        help='an integer for the accumulator')
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument('--sum', dest='accumulate', action='store_const',
                        const=sum,
                        help='sum the integers')
    group.add_argument('--min', dest='accumulate', action='store_const',
                        const=min,
                        help='compute the minimum')
    group.add_argument('--max', dest='accumulate', action='store_const',
                        const=max,
                        help='compute the maximum')

    args = parser.parse_args(args)
    print(args.accumulate(args.integers))

In [4]:
process_numbers(["--min", "100", "200", "300"])

100


In [5]:
process_numbers(["--sum", '1', '2', '3'])

6


In [6]:
from Grammars import crange, srange, convert_ebnf_grammar, is_valid_grammar, START_SYMBOL

In [7]:
PROCESS_NUMBERS_GRAMMAR_EBNF = {
    "<start>": ["<operator> <integers>"],
    "<operator>": ["--sum", "--min", "--max"],
    "<integers>": ["<integer>", "<integers> <integer>"],
    "<integer>": ["<digit>+"],
    "<digit>": crange('0', '9')
}

assert is_valid_grammar(PROCESS_NUMBERS_GRAMMAR_EBNF)

In [8]:
PROCESS_NUMBERS_GRAMMAR = convert_ebnf_grammar(PROCESS_NUMBERS_GRAMMAR_EBNF)

In [9]:
from GrammarCoverageFuzzer import GrammarCoverageFuzzer

In [10]:
f = GrammarCoverageFuzzer(PROCESS_NUMBERS_GRAMMAR, min_nonterminals=10)
for i in range(3):
    print(f.fuzz())

--max 9 5 8 210 80 9756431
--sum 9 4 99 1245 612370
--min 2 3 0 46 15798 7570926
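
The generated command lines can be fed back into the parser to see which ones it accepts. The following is a minimal sketch under stated assumptions: `build_parser()` and `run_command_line()` are hypothetical helpers (not part of the chapter's code), and the parser is rebuilt here from `process_numbers()` only so that the block is self-contained. The third command line is a deliberately invalid combination of options.

```python
import argparse
import contextlib
import io


def build_parser():
    """Rebuild the parser used in process_numbers() (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Process some integers.")
    parser.add_argument("integers", metavar="N", type=int, nargs="+")
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--sum", dest="accumulate", action="store_const", const=sum)
    group.add_argument("--min", dest="accumulate", action="store_const", const=min)
    group.add_argument("--max", dest="accumulate", action="store_const", const=max)
    return parser


def run_command_line(command_line):
    """Return the computed result, or None if argparse rejects the command line."""
    parser = build_parser()
    try:
        # argparse reports errors on stderr and then calls sys.exit()
        with contextlib.redirect_stderr(io.StringIO()):
            args = parser.parse_args(command_line.split())
    except SystemExit:
        return None
    return args.accumulate(args.integers)


command_lines = [
    "--max 9 5 8 210 80 9756431",
    "--sum 9 4 99 1245 612370",
    "--sum --min --max 3",  # violates the mutually exclusive group
]
results = {line: run_command_line(line) for line in command_lines}
```

Catching `SystemExit` keeps a rejected command line from terminating the fuzzing loop; a real test driver would probably also want to record the error message.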


## Mining Options


In [11]:
import sys

In [12]:
import string

In [13]:
class ParseInterrupt(Exception):
    pass

In [14]:
class OptionGrammarMiner(object):
    def __init__(self, function, log=False):
        self.function = function    # FIXME: Should this be a runner?
        self.log = log

In [15]:
class OptionGrammarMiner(OptionGrammarMiner):
    OPTION_SYMBOL   = "<options>" 
    ARGUMENT_SYMBOL = "<arguments>" 
    def mine_ebnf(self):
        self.grammar = { 
            START_SYMBOL: [self.OPTION_SYMBOL + self.ARGUMENT_SYMBOL],
            self.OPTION_SYMBOL: [""], 
            self.ARGUMENT_SYMBOL: [""]
        }
        assert is_valid_grammar(self.grammar)
        
        old_trace = sys.gettrace()  # sys.settrace() returns None, so save the tracer first
        sys.settrace(self.traceit)
        try:
            self.function()
        except ParseInterrupt:
            pass
        finally:
            sys.settrace(old_trace)
        
        return self.grammar
    
    def mine(self):
        return convert_ebnf_grammar(self.mine_ebnf())

In [16]:
class OptionGrammarMiner(OptionGrammarMiner):
    def traceit(self, frame, event, arg):
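        if event != "call":
            return None

        # Only consider method calls (frames that have a local `self`)
        if "self" not in frame.f_locals:
            return None

        method_name = frame.f_code.co_name
        if method_name == "add_argument":
            # Record the option or argument being declared
            self.process_argument(frame.f_locals)

        if method_name == "parse_args":
            # All options are declared by now; stop before the actual parsing
            raise ParseInterrupt

        return None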
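
The tracing idea can be tried in isolation. The sketch below is an illustration rather than the chapter's implementation: `collect_add_argument_calls()`, `StopParsing`, and `demo()` are hypothetical names, and the code assumes (as the miner above does) that `argparse` declares options through a method named `add_argument` whose locals are `args` and `kwargs`.

```python
import argparse
import sys


class StopParsing(Exception):
    """Raised by the tracer to stop the program before it parses anything."""


def collect_add_argument_calls(function):
    """Run `function` and return the (args, kwargs) of each add_argument() call."""
    calls = []

    def tracer(frame, event, arg):
        if event != "call":
            return None
        name = frame.f_code.co_name
        if name == "add_argument" and "self" in frame.f_locals:
            calls.append((frame.f_locals["args"], dict(frame.f_locals["kwargs"])))
        elif name == "parse_args":
            raise StopParsing
        return None

    old_trace = sys.gettrace()
    sys.settrace(tracer)
    try:
        function()
    except StopParsing:
        pass
    finally:
        sys.settrace(old_trace)
    return calls


def demo():
    parser = argparse.ArgumentParser()
    parser.add_argument("--verbose", action="store_true")
    parser.add_argument("count", type=int)
    parser.parse_args([])  # intercepted by the tracer before it runs


calls = collect_add_argument_calls(demo)
for args, kwargs in calls:
    print(args, kwargs)
```

The first recorded call is the `-h`/`--help` option that `ArgumentParser` adds on its own, which is why it also shows up in the mined grammars.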

In [17]:
class OptionGrammarMiner(OptionGrammarMiner):
    def process_argument(self, locals):
        args = locals["args"]
        kwargs = locals["kwargs"]

        if self.log:
            print(args)
            print(kwargs)
            print()

        for arg in args:
            if arg.startswith('-'):
                target = self.OPTION_SYMBOL
                metavar = None
                arg = " " + arg
            else:
                target = self.ARGUMENT_SYMBOL
                metavar = arg
                arg = ""

            if "nargs" in kwargs:
                nargs = kwargs["nargs"]
            else:
                nargs = 1
            
            if "action" in kwargs:
                # No argument
                param = ""
                nargs = 0
            else:
                if isinstance(kwargs.get("type"), type) and issubclass(kwargs["type"], int):
                    type_ = "int"
                else:
                    type_ = "str"

                if metavar is None and "metavar" in kwargs:
                    metavar = kwargs["metavar"]
                    
                if metavar is not None:
                    self.grammar["<" + metavar + ">"] = ["<" + type_ + ">"]
                else:
                    metavar = type_
                    
                if type_ == "int":
                    self.grammar["<int>"] = ["(-)?<digit>+"]
                    self.grammar["<digit>"] = crange('0', '9')
                    param = " <" + metavar + ">"
                else:
                    self.grammar["<str>"] = ["<char>+"]
                    self.grammar["<char>"] = srange(string.digits + string.ascii_letters + string.punctuation)
                    param = " <" + metavar + ">"

            if isinstance(nargs, int):
                for i in range(nargs):
                    arg += param
            else:
                assert nargs in "?+*"
                arg += '(' + param + ')' + nargs
                    
            if target == self.OPTION_SYMBOL:
                self.grammar[target][0] += '(' + arg + ')?'
            else:
                self.grammar[target][0] += arg
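
The way `nargs` is turned into EBNF can be looked at separately from the tracing machinery. The following sketch restates just that step; `ebnf_fragment()` is a hypothetical helper introduced for illustration, and `param` stands for the already computed parameter text such as `" <int>"`.

```python
def ebnf_fragment(name, param, nargs=1):
    """Return the EBNF text for one option or positional argument.

    `name` is the option string (e.g. "--sum") or the positional name;
    `param` is the parameter text (e.g. " <int>"), empty for flags.
    """
    is_option = name.startswith("-")
    text = " " + name if is_option else ""

    if isinstance(nargs, int):
        # A fixed number of parameters: repeat the parameter text
        text += param * nargs
    else:
        # "?", "+", or "*" map directly onto the EBNF operators
        assert nargs in ("?", "+", "*")
        text += "(" + param + ")" + nargs

    # Options may be omitted; positional arguments may not
    return "(" + text + ")?" if is_option else text


print(ebnf_fragment("--sum", "", 0))
print(ebnf_fragment("--line-range", " <line>", 2))
print(ebnf_fragment("integers", " <integers>", "+"))
```

Because the `argparse` values `?`, `+`, and `*` coincide with the EBNF operators, no translation table is needed for them.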

In [18]:
om = OptionGrammarMiner(process_numbers, log=True)
grammar_ebnf = om.mine_ebnf()
assert is_valid_grammar(grammar_ebnf)
grammar_ebnf

('-h', '--help')
{'action': 'help', 'default': '==SUPPRESS==', 'help': 'show this help message and exit'}

('integers',)
{'metavar': 'N', 'type': <class 'int'>, 'nargs': '+', 'help': 'an integer for the accumulator'}

('--sum',)
{'dest': 'accumulate', 'action': 'store_const', 'const': <built-in function sum>, 'help': 'sum the integers'}

('--min',)
{'dest': 'accumulate', 'action': 'store_const', 'const': <built-in function min>, 'help': 'compute the minimum'}

('--max',)
{'dest': 'accumulate', 'action': 'store_const', 'const': <built-in function max>, 'help': 'compute the maximum'}



{'<start>': ['<options><arguments>'],
 '<options>': ['( -h)?( --help)?( --sum)?( --min)?( --max)?'],
 '<arguments>': ['( <integers>)+'],
 '<integers>': ['<int>'],
 '<int>': ['(-)?<digit>+'],
 '<digit>': ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']}

In [19]:
grammar = convert_ebnf_grammar(grammar_ebnf)
assert is_valid_grammar(grammar)

In [20]:
f = GrammarCoverageFuzzer(grammar)
for i in range(10):
    print(f.fuzz())

 -h --help --sum 5
 --min --max -207 14 6 -8 39
 -h --help --max 9575
 1
 --help --sum --min -2 -9 -89430541142
 -h -61 -7
 --sum --min --max 3
 --max -2 1
 -h --sum --min --max -825 9
 --help --sum 2


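\todo{Handle mutually exclusive groups: the mined grammar treats `--sum`, `--min`, and `--max` as independent optional flags, so the fuzzer produces combinations such as `--sum --min --max 3` that the parser rejects.}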
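
One possible way to represent a required, mutually exclusive group is to give it a nonterminal of its own, with one alternative per option. The sketch below only illustrates that idea: `<accumulate>` is a hypothetical symbol name, the grammar is written by hand rather than mined, and `expand()` is a deliberately simple random expander standing in for `GrammarCoverageFuzzer`.

```python
import random
import re

# Hand-written grammar: exactly one of the three options is chosen
GRAMMAR = {
    "<start>": ["<options><arguments>"],
    "<options>": ["<accumulate>"],
    "<accumulate>": [" --sum", " --min", " --max"],
    "<arguments>": [" <int>", " <int><arguments>"],
    "<int>": ["<digit>", "<digit><int>"],
    "<digit>": list("0123456789"),
}

NONTERMINAL = re.compile(r"<[^<> ]+>")


def expand(grammar, start="<start>", rng=None, max_steps=200):
    """Expand `start` by repeatedly replacing the leftmost nonterminal."""
    rng = rng or random.Random()
    text = start
    steps = 0
    while True:
        match = NONTERMINAL.search(text)
        if match is None:
            return text
        alternatives = grammar[match.group(0)]
        # After max_steps, always take the first (non-recursive) alternative
        # so that the expansion is guaranteed to terminate
        choice = rng.choice(alternatives) if steps < max_steps else alternatives[0]
        text = text[:match.start()] + choice + text[match.end():]
        steps += 1


for seed in range(3):
    print(expand(GRAMMAR, rng=random.Random(seed)))
```

A miner would have to detect calls to `add_mutually_exclusive_group()` to build such a symbol automatically; how best to do that is left open here.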

## Complex Arguments

In [21]:
!autopep8 --help

usage: autopep8 [-h] [--version] [-v] [-d] [-i] [--global-config filename]
                [--ignore-local-config] [-r] [-j n] [-p n] [-a]
                [--experimental] [--exclude globs] [--list-fixes]
                [--ignore errors] [--select errors] [--max-line-length n]
                [--line-range line line] [--hang-closing]
                [files [files ...]]

Automatically formats Python code to conform to the PEP 8 style guide.

positional arguments:
  files                 files to format or '-' for standard in

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  -v, --verbose         print verbose messages; multiple -v result in more
                        verbose messages
  -d, --diff            print the diff for the fixed source
  -i, --in-place        make changes to files in place
  --global-config filename
                        path to a global pep8 confi

In [22]:
import autopep8

In [23]:
om = OptionGrammarMiner(autopep8.main, log=True)
grammar_ebnf = om.mine_ebnf()
grammar_ebnf

('-h', '--help')
{'action': 'help', 'default': '==SUPPRESS==', 'help': 'show this help message and exit'}

('--version',)
{'action': 'version', 'version': '%(prog)s 1.3.4 (pycodestyle: 2.4.0)'}

('-v', '--verbose')
{'action': 'count', 'default': 0, 'help': 'print verbose messages; multiple -v result in more verbose messages'}

('-d', '--diff')
{'action': 'store_true', 'help': 'print the diff for the fixed source'}

('-i', '--in-place')
{'action': 'store_true', 'help': 'make changes to files in place'}

('--global-config',)
{'metavar': 'filename', 'default': '/Users/zeller/.config/pep8', 'help': 'path to a global pep8 config file; if this file does not exist then this is ignored (default: /Users/zeller/.config/pep8)'}

('--ignore-local-config',)
{'action': 'store_true', 'help': "don't look for and apply local config files; if not passed, defaults are updated with any config files in the project's root directory"}

('-r', '--recursive')
{'action': 'store_true', 'help': 'run recursively o

{'<start>': ['<options><arguments>'],
 '<options>': ['( -h)?( --help)?( --version)?( -v)?( --verbose)?( -d)?( --diff)?( -i)?( --in-place)?( --global-config <filename>)?( --ignore-local-config)?( -r)?( --recursive)?( -j <n>)?( --jobs <n>)?( -p <n>)?( --pep8-passes <n>)?( -a)?( --aggressive)?( --experimental)?( --exclude <globs>)?( --list-fixes)?( --ignore <errors>)?( --select <errors>)?( --max-line-length <n>)?( --line-range <line> <line>)?( --range <line> <line>)?( --indent-size <int>)?( --hang-closing)?'],
 '<arguments>': ['( <files>)*'],
 '<filename>': ['<str>'],
 '<str>': ['<char>+'],
 '<char>': ['0',
  '1',
  '2',
  '3',
  '4',
  '5',
  '6',
  '7',
  '8',
  '9',
  'a',
  'b',
  'c',
  'd',
  'e',
  'f',
  'g',
  'h',
  'i',
  'j',
  'k',
  'l',
  'm',
  'n',
  'o',
  'p',
  'q',
  'r',
  's',
  't',
  'u',
  'v',
  'w',
  'x',
  'y',
  'z',
  'A',
  'B',
  'C',
  'D',
  'E',
  'F',
  'G',
  'H',
  'I',
  'J',
  'K',
  'L',
  'M',
  'N',
  'O',
  'P',
  'Q',
  'R',
  'S',
  'T',
  '

In [24]:
grammar = convert_ebnf_grammar(grammar_ebnf)
assert is_valid_grammar(grammar)

In [25]:
f = GrammarCoverageFuzzer(grammar, max_nonterminals=40)
for i in range(10):
    print(f.fuzz())

 --version -v -d --in-place -r --pep8-passes -7 -a --select E --max-line-length 0 --indent-size 38 --hang-closing *r
 -h --help --verbose --diff -i --global-config j --ignore-local-config --recursive -j -293 --jobs 5 -p 1 --aggressive --experimental --exclude O --list-fixes --ignore ~@ --line-range 02 6 --range -735 -4 B_ z 1 x9| b5 {e !#? U <[Koim RhJ 0 P c } 8 Gn&qQ S^ N>Dl:' Y F g A 7HvTM k`+u ,36p IL - \ ys= $ ] Z 4d ").t % C2 VW( X/f w ;R a b3
 -v -d --diff -i --in-place -r -j -0102 -p -230 -a --aggressive --max-line-length -61 --range -4 -577 --indent-size -2583 --hang-closing ( X 6 )
 -h --help --verbose -i --pep8-passes 118 -a --select l --line-range -58 -4 --range -0 0 --hang-closing
 -h -d --diff --in-place --ignore-local-config --pep8-passes -3 -a --aggressive --exclude 7 --max-line-length 7942 --hang-closing
 -h --version --verbose --diff -i --in-place --global-config U --ignore-local-config -j -3 --jobs 70 --pep8-passes 62436 -a --aggressive --exclude g=Dhc --list-fixes --

## _Section 2_

\todo{Add}

## _Section 3_

\todo{Add}

_If you want to introduce code, it is helpful to state the most important functions, as in:_

* `random.randrange(start, end)` - return a random integer from the half-open interval [`start`, `end`), that is, `end` itself is excluded.
* `range(start, end)` - create a sequence of integers from `start` to `end` - 1.  Typically used in iterations.
* `for elem in list: body` - execute `body` in a loop with `elem` taking each value from `list`.
* `for i in range(start, end): body` - execute `body` in a loop with `i` from `start` to `end` - 1.
* `chr(n)` - return the character with Unicode code point `n` (for `n` < 128, this is the ASCII code).

In [27]:
# More code
pass

In [28]:
# Even more code
pass

## _Section 4_

\todo{Add}

## Lessons Learned

* _Lesson one_
* _Lesson two_
* _Lesson three_

## Next Steps

_Link to subsequent chapters (notebooks) here, as in:_

* [use _mutations_ on existing inputs to get more valid inputs](MutationFuzzer.ipynb)
* [use _grammars_ (i.e., a specification of the input format) to get even more valid inputs](Grammars.ipynb)
* [reduce _failing inputs_ for efficient debugging](Reducing.ipynb)


## Background

_Cite relevant works in the literature and put them into context, as in:_

The idea of ensuring that each expansion in the grammar is used at least once goes back to Burkhardt \cite{Burkhardt1967}, to be later rediscovered by Paul Purdom \cite{Purdom1972}.

## Exercises

_Close the chapter with a few exercises such that people have things to do.  To make the solutions hidden (to be revealed by the user), have them start with_

```markdown
**Solution.**
```

_Your solution can then extend up to the next title (i.e., any markdown cell starting with `#`)._

_Running `make metadata` will automatically add metadata to the cells such that the cells will be hidden by default, and can be uncovered by the user.  The button will be introduced above the solution._

### Exercise 1: _Title_

_Text of the exercise_

In [29]:
# Some code that is part of the exercise
pass

_Some more text for the exercise_

**Solution.** _Some text for the solution_

In [30]:
# Some code for the solution
2 + 2

4

_Some more text for the solution_

### Exercise 2: _Title_

_Text of the exercise_

**Solution.** _Solution for the exercise_