# EvoGFuzz: An Evolutionary Approach to Grammar-Based Fuzzing

**EvoGFuzz** stands for *evolutionary grammar-based fuzzing*. This approach leverages evolutionary optimization techniques to systematically explore the space of a program's potential inputs, with a particular emphasis on identifying inputs that could lead to exceptional behavior. With a user-defined objective, EvoGFuzz can adapt and refine the input generation strategy over time, making it a powerful tool for uncovering software defects and vulnerabilities.

Efficient detection of defects and vulnerabilities hinges on the ability to automatically generate program inputs that are both valid and diverse. One common strategy is to use grammars, which provide structured and syntactically correct inputs. This approach leads to the concept of grammar-based fuzzing, where fuzzing strategies are guided by the rules defined within the grammar.

A further enhancement to this concept is probabilistic grammar-based fuzzing, where competing grammar rules are associated with probabilities that guide their application. By carefully assigning and optimizing these probabilities, we gain considerable control over the nature of the generated inputs. This enables us to direct the fuzzing process towards specific areas of interest—for example, those functions that are deemed critical, have a higher propensity for failures, or have undergone recent modifications. 
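To make the idea concrete, here is a minimal sketch of a probabilistic grammar. It assumes a toy grammar with hand-picked probabilities; the names `TOY_GRAMMAR` and `generate` are illustrative and not part of EvoGFuzz. Each nonterminal maps to a list of `(expansion, probability)` pairs, and skewing the probabilities toward `sqrt` steers generation toward that function.

```python
import random
import re

# Hypothetical toy grammar: every alternative carries a probability.
TOY_GRAMMAR = {
    "<start>": [("<function>(<term>)", 1.0)],
    "<function>": [("sqrt", 0.7), ("sin", 0.1), ("cos", 0.1), ("tan", 0.1)],
    "<term>": [("-<digit>", 0.5), ("<digit>", 0.5)],
    "<digit>": [(str(d), 1 / 9) for d in range(1, 10)],
}

NONTERMINAL = re.compile(r"<[^<> ]+>")


def generate(symbol="<start>", rng=random):
    """Expand `symbol`, picking alternatives according to their probabilities."""
    expansions, weights = zip(*TOY_GRAMMAR[symbol])
    chosen = rng.choices(expansions, weights=weights, k=1)[0]
    # Recursively expand every nonterminal left in the chosen alternative.
    return NONTERMINAL.sub(lambda match: generate(match.group(0), rng), chosen)


rng = random.Random(42)
print([generate(rng=rng) for _ in range(5)])
```

Raising the probability of one alternative makes inputs that exercise it more frequent without ever producing a syntactically invalid string.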

In essence, EvoGFuzz represents a potent blend of evolutionary optimization and probabilistic grammar-based fuzzing, poised to reveal hidden defects and vulnerabilities in a targeted and efficient manner.

## Fuzzing a Program

Our program under test is `The Calculator`. It behaves like a typical calculator: it evaluates arithmetic expressions, the trigonometric functions sine, cosine, and tangent, and the square root of a given number.

In [1]:
import math

def calculator(inp: str) -> float:
    """
        A simple calculator function that can evaluate arithmetic expressions 
        and perform basic trigonometric functions and square root calculations.
    """
    return eval(
        str(inp), {"sqrt": math.sqrt, "sin": math.sin, "cos": math.cos, "tan": math.tan}
    )

**Side Note:** In the `calculator`, we use Python's `eval` function, which takes a string and evaluates it as a Python expression. We provide a dictionary as the second argument to `eval` (the global namespace for the evaluation), mapping names to the corresponding mathematical functions. This enables us to use the function names directly within the input string.
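As a small illustration of how that namespace dictionary works, the sketch below exposes only `sqrt`; the variable name `namespace` is chosen here for illustration. A name that is neither in the dictionary nor a Python builtin cannot be resolved.

```python
import math

# Only `sqrt` is made available to the evaluated expression.
namespace = {"sqrt": math.sqrt}

print(eval("sqrt(16)", namespace))  # 4.0

try:
    eval("log(10)", namespace)  # `log` was never put into the namespace
except NameError as exc:
    print("rejected:", exc)
```

Note that `eval` still adds Python's builtins to the namespace, so this mapping controls which math functions are available but is not a security sandbox.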

In [2]:
# Evaluating the cosine of approximately 6π (3.141 approximates π)
print(calculator('cos(6*3.141)'))

0.999993677717667


In [3]:
# Calculating the square root of 36
print(calculator('sqrt(6*6)'))

6.0


Each of these calls to the calculator will evaluate the provided string as a mathematical expression, and print the result.

Now, to find new defects, we need an oracle that tells us whether a triggered error is something we expect or a new, unknown defect. `OracleResult` is an enum with two possible values, `NO_BUG` and `BUG`: `NO_BUG` denotes a passing test case and `BUG` a failing one.

We import the `OracleResult` enumerated type from the `evogfuzz` library. This is used in the oracle function to indicate the outcome of executing the 'calculator' function with a given input.

In [4]:
from evogfuzz.oracle import OracleResult

Next, we define a function called **oracle**, which acts as an intermediary that classifies exceptions raised by the calculator function for a given input.

In [5]:
# Make sure you use the OracleResult from the evogfuzz library
from evogfuzz.oracle import OracleResult

def oracle(inp: str):
    """
    This function serves as an oracle or intermediary that catches and handles exceptions 
    generated by the 'calculator' function. The oracle function is used in the context of fuzz testing.
    It aims to determine whether an input triggers a bug in the 'calculator' function.

    Args:
        inp (str): The input string to be passed to the 'calculator' function.

    Returns:
        OracleResult: An enumerated type 'OracleResult' indicating the outcome of the function execution.
            - OracleResult.NO_BUG: Returned if the calculator function executes without raising a ValueError.
            - OracleResult.BUG: Returned if the calculator function raises a ValueError, indicating a potential bug.

    Any other exception is not caught here and propagates to the caller.
    """
    try:
        calculator(inp)
    except ValueError:
        return OracleResult.BUG
    
    return OracleResult.NO_BUG

This **oracle** function is used in the context of fuzzing to determine the impact of various inputs on the program under test (in our case the _calculator_). When the calculator function behaves as expected (i.e., no exception occurs), the **oracle** function returns `OracleResult.NO_BUG`. However, when the `calculator` function raises a `ValueError` (for example, because `math.sqrt` rejects a negative argument), the **oracle** interprets this as a potential bug in the `calculator` and returns `OracleResult.BUG`.

We can see this in action by testing a few initial inputs:

In [6]:
initial_inputs = ['sqrt(1)', 'cos(912)', 'tan(4)']

for inp in initial_inputs:
    print(inp.ljust(20), oracle(inp))

sqrt(1)              NO_BUG
cos(912)             NO_BUG
tan(4)               NO_BUG
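None of the initial inputs fails. For contrast, the sketch below includes an input that does: `math.sqrt` raises `ValueError` for negative arguments. So that the snippet runs without `evogfuzz` installed, it uses a stand-in enum named `Verdict` (hypothetical, mirroring the two `OracleResult` values) and its own copy of the calculator named `calc`.

```python
import math
from enum import Enum


class Verdict(Enum):
    """Stand-in for evogfuzz's OracleResult (assumption: same two outcomes)."""
    NO_BUG = "NO_BUG"
    BUG = "BUG"


def calc(inp: str) -> float:
    """Copy of the calculator above, repeated to keep the snippet self-contained."""
    return eval(inp, {"sqrt": math.sqrt, "sin": math.sin, "cos": math.cos, "tan": math.tan})


def classify(inp: str) -> Verdict:
    """Same logic as the oracle above: only ValueError counts as a bug."""
    try:
        calc(inp)
    except ValueError:
        return Verdict.BUG
    return Verdict.NO_BUG


for inp in ["sqrt(1)", "sqrt(-1)", "cos(-1)"]:
    print(inp.ljust(20), classify(inp).name)
```

Only `sqrt(-1)` is classified as `BUG`; a negative argument is harmless for the trigonometric functions.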


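The following code defines two simple context-free grammars for our calculator function. `CALCGRAMMAR`, which is the one used in the fuzzing runs below, describes a subset of the valid calculator inputs: a single call to a square root or trigonometric function, applied to a possibly negative integer or decimal number. `grammar_alhazen` is an alternative formulation of the same input shape that additionally allows the digit `0` after the leading digit: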

In [7]:
from fuzzingbook.Grammars import Grammar, is_valid_grammar

CALCGRAMMAR: Grammar = {
    "<start>":
        ["<function>(<term>)"],

    "<function>":
        ["sqrt", "tan", "cos", "sin"],
    
    "<term>": ["-<value>", "<value>"], 
    
    "<value>":
        ["<integer>.<integer>",
         "<integer>"],

    "<integer>":
        ["<digit><integer>", "<digit>"],

    "<digit>":
        ["1", "2", "3", "4", "5", "6", "7", "8", "9"]
}
    
grammar_alhazen: Grammar = {
    "<start>": ["<arith_expr>"],
    "<arith_expr>": ["<function>(<number>)"],
    "<function>": ["sqrt", "sin", "cos", "tan"],
    "<number>": ["<maybe_minus><onenine><maybe_digits><maybe_frac>"],
    "<maybe_minus>": ["", "-"],
    "<onenine>": [str(num) for num in range(1, 10)],
    "<digit>": [str(num) for num in range(0, 10)],
    "<maybe_digits>": ["", "<digits>"],
    "<digits>": ["<digit>", "<digit><digits>"],
    "<maybe_frac>": ["", ".<digits>"],
}
    
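# Check both grammars, since CALCGRAMMAR is the one passed to EvoGFuzz below.
assert is_valid_grammar(CALCGRAMMAR)
assert is_valid_grammar(grammar_alhazen)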

The grammar `CALCGRAMMAR` provides a structured blueprint for creating inputs for our fuzz testing. Each rule describes how a nonterminal can be expanded, and every string derivable from `<start>` is an input that our calculator function can handle. By fuzzing based on this grammar, we can systematically explore this space of valid inputs to the calculator function.
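Since neither grammar nests expressions, the language of each can also be written as a regular expression. The sketch below uses hand-derived patterns (an assumption worth checking against the grammar definitions; `CALC_PATTERN` and `ALHAZEN_PATTERN` are illustrative names) to show which inputs each grammar can produce.

```python
import re

# Hand-derived from CALCGRAMMAR: digits 1-9 only, optional fraction.
CALC_PATTERN = re.compile(r"(sqrt|tan|cos|sin)\(-?[1-9]+(\.[1-9]+)?\)")
# Hand-derived from grammar_alhazen: leading digit 1-9, then digits 0-9.
ALHAZEN_PATTERN = re.compile(r"(sqrt|sin|cos|tan)\(-?[1-9][0-9]*(\.[0-9]+)?\)")

for inp in ["sqrt(1)", "cos(912)", "tan(-4.25)", "sqrt(10)", "sin(0.5)"]:
    in_calc = bool(CALC_PATTERN.fullmatch(inp))
    in_alhazen = bool(ALHAZEN_PATTERN.fullmatch(inp))
    print(inp.ljust(12), in_calc, in_alhazen)
```

The input `sqrt(10)` is derivable only from `grammar_alhazen`, because `CALCGRAMMAR` has no rule producing `0`, and `sin(0.5)` is outside both languages because each grammar requires a leading digit between 1 and 9.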

### Leveraging EvoGFuzz to Unearth New Defects

We apply our `EvoGFuzz` class to carry out fuzz testing using evolutionary grammar-based fuzzing. This is aimed at uncovering potential defects in our 'calculator' function.

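To initialize our EvoGFuzz instance, we pass a grammar (in our case, `CALCGRAMMAR`), an oracle function, an initial set of inputs, and the number of iterations to be performed in the fuzzing process. The cells below do not pass a fitness function explicitly; three of them additionally set `tournament_selection_mode`, and the last one leaves it at its default.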

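Upon creating the `EvoGFuzz` instance, we can execute the fuzzing process. The `fuzz()` method runs the fuzzing iterations, evolving the inputs based on the fitness function, and returns a collection of inputs that lead to exceptions in the 'calculator' function. Because fuzzing is randomized, each cell below repeats the experiment until 30 runs have completed and then reports the mean number of bug-triggering inputs.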
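The loop below is a deliberately simplified illustration of the evolutionary idea, not EvoGFuzz's implementation. It assumes a fixed input shape `function(sign number)`, treats "raises `ValueError`" as the only fitness signal, and re-estimates the choice probabilities from the failing inputs of each generation. All names (`evolve`, `sample`, `relearn`, `fails`) are hypothetical.

```python
import math
import random

FUNCTIONS = ["sqrt", "sin", "cos", "tan"]
SIGNS = ["", "-"]
ENV = {"sqrt": math.sqrt, "sin": math.sin, "cos": math.cos, "tan": math.tan}


def fails(inp):
    """Return True if evaluating the input raises ValueError."""
    try:
        eval(inp, dict(ENV))
    except ValueError:
        return True
    return False


def sample(weights, rng):
    """Draw one candidate as (function, sign, input string)."""
    function = rng.choices(FUNCTIONS, weights=weights["function"])[0]
    sign = rng.choices(SIGNS, weights=weights["sign"])[0]
    return function, sign, f"{function}({sign}{rng.randint(1, 99)})"


def relearn(selected, choices, index):
    """Re-estimate probabilities from selected candidates (add-one smoothing)."""
    counts = [1 + sum(1 for cand in selected if cand[index] == c) for c in choices]
    total = sum(counts)
    return [count / total for count in counts]


def evolve(iterations=10, population=50, seed=0):
    rng = random.Random(seed)
    weights = {"function": [0.25] * 4, "sign": [0.5, 0.5]}
    found = []
    for _ in range(iterations):
        candidates = [sample(weights, rng) for _ in range(population)]
        failing = [cand for cand in candidates if fails(cand[2])]
        found.extend(cand[2] for cand in failing)
        # Selection: prefer failing candidates; fall back to all of them.
        selected = failing or candidates
        weights = {
            "function": relearn(selected, FUNCTIONS, 0),
            "sign": relearn(selected, SIGNS, 1),
        }
    return found, weights


found, weights = evolve()
print(len(found), "failing inputs; P(sqrt) =", round(weights["function"][0], 2))
```

Once a generation contains failing inputs, the probabilities shift toward the choices those inputs share, so later generations tend to produce more of them.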

In [8]:
from evogfuzz.evogfuzz_class import EvoGFuzz
from evogfuzz.helper import Tournament_Selection_Mode
import statistics

levenshtein_list=[]

while len(levenshtein_list)!=30:
    try:
        epp = EvoGFuzz(
            grammar=CALCGRAMMAR,
            oracle=oracle,
            inputs=initial_inputs,
            iterations=10,
            tournament_selection_mode = Tournament_Selection_Mode.HIERARCHICAL_LEVENSHTEIN
        )

        found_exception_inputs = epp.fuzz()
        
        levenshtein_list.append(len(found_exception_inputs))
        print(len(levenshtein_list),f"EvoGFuzz found {len(found_exception_inputs)} bug-triggering inputs!")
    except Exception as eee:
        print(eee)
statistics.mean(levenshtein_list)

EvoGFuzz found 240 bug-triggering inputs!
EvoGFuzz found 240 bug-triggering inputs!
EvoGFuzz found 318 bug-triggering inputs!
EvoGFuzz found 279 bug-triggering inputs!
EvoGFuzz found 240 bug-triggering inputs!
EvoGFuzz found 212 bug-triggering inputs!
EvoGFuzz found 267 bug-triggering inputs!
EvoGFuzz found 211 bug-triggering inputs!
EvoGFuzz found 247 bug-triggering inputs!
EvoGFuzz found 241 bug-triggering inputs!
EvoGFuzz found 180 bug-triggering inputs!
EvoGFuzz found 244 bug-triggering inputs!
EvoGFuzz found 84 bug-triggering inputs!
EvoGFuzz found 217 bug-triggering inputs!
EvoGFuzz found 220 bug-triggering inputs!
EvoGFuzz found 199 bug-triggering inputs!
EvoGFuzz found 297 bug-triggering inputs!
EvoGFuzz found 214 bug-triggering inputs!
EvoGFuzz found 302 bug-triggering inputs!
EvoGFuzz found 111 bug-triggering inputs!
EvoGFuzz found 207 bug-triggering inputs!
EvoGFuzz found 341 bug-triggering inputs!
EvoGFuzz found 272 bug-triggering inputs!
EvoGFuzz found 277 bug-triggering i

238.06666666666666

In [11]:
jaro_list=[]

while len(jaro_list)!=30:
    try:
        epp = EvoGFuzz(
            grammar=CALCGRAMMAR,
            oracle=oracle,
            inputs=initial_inputs,
            iterations=10,
            tournament_selection_mode = Tournament_Selection_Mode.HIERARCHICAL_JARO
        )
        found_exception_inputs = epp.fuzz()
        jaro_list.append(len(found_exception_inputs))
        print(len(jaro_list),f"EvoGFuzz found {len(found_exception_inputs)} bug-triggering inputs!")
    except Exception as eee:
        print(eee)
statistics.mean(jaro_list)

1
EvoGFuzz found 190 bug-triggering inputs!
2
EvoGFuzz found 189 bug-triggering inputs!
3
EvoGFuzz found 271 bug-triggering inputs!
4
EvoGFuzz found 172 bug-triggering inputs!
5
EvoGFuzz found 135 bug-triggering inputs!
6
EvoGFuzz found 215 bug-triggering inputs!
7
EvoGFuzz found 209 bug-triggering inputs!
8
EvoGFuzz found 221 bug-triggering inputs!
9
EvoGFuzz found 294 bug-triggering inputs!
10
EvoGFuzz found 207 bug-triggering inputs!
11
EvoGFuzz found 88 bug-triggering inputs!
12
EvoGFuzz found 257 bug-triggering inputs!
13
EvoGFuzz found 266 bug-triggering inputs!
14
EvoGFuzz found 275 bug-triggering inputs!
15
EvoGFuzz found 190 bug-triggering inputs!
16
EvoGFuzz found 245 bug-triggering inputs!
17
EvoGFuzz found 187 bug-triggering inputs!
18
EvoGFuzz found 224 bug-triggering inputs!
19
EvoGFuzz found 278 bug-triggering inputs!
20
EvoGFuzz found 287 bug-triggering inputs!
21
EvoGFuzz found 274 bug-triggering inputs!
22
EvoGFuzz found 188 bug-triggering inputs!
23
EvoGFuzz found 25

215.93333333333334

In [9]:
cos_list = []
while len(cos_list)!=30:
    try:
        epp = EvoGFuzz(
            grammar=CALCGRAMMAR,
            oracle=oracle,
            inputs=initial_inputs,
            iterations=10,
            tournament_selection_mode = Tournament_Selection_Mode.HIERARCHICAL_FEATURE_COS
        )
        found_exception_inputs = epp.fuzz()
        
        cos_list.append(len(found_exception_inputs))
        print(len(cos_list), f"EvoGFuzz found {len(found_exception_inputs)} bug-triggering inputs!")
    except Exception as eee:
        print(eee)
statistics.mean(cos_list)

1
EvoGFuzz found 350 bug-triggering inputs!
2
EvoGFuzz found 388 bug-triggering inputs!
3
EvoGFuzz found 225 bug-triggering inputs!
4
EvoGFuzz found 289 bug-triggering inputs!
5
EvoGFuzz found 329 bug-triggering inputs!
6
EvoGFuzz found 225 bug-triggering inputs!
7
EvoGFuzz found 244 bug-triggering inputs!
8
EvoGFuzz found 309 bug-triggering inputs!
9
EvoGFuzz found 299 bug-triggering inputs!
10
EvoGFuzz found 258 bug-triggering inputs!
11
EvoGFuzz found 254 bug-triggering inputs!
12
EvoGFuzz found 342 bug-triggering inputs!
13
EvoGFuzz found 329 bug-triggering inputs!
14
EvoGFuzz found 340 bug-triggering inputs!
15
EvoGFuzz found 336 bug-triggering inputs!
16
EvoGFuzz found 262 bug-triggering inputs!
17
EvoGFuzz found 207 bug-triggering inputs!
18
EvoGFuzz found 279 bug-triggering inputs!
19
EvoGFuzz found 253 bug-triggering inputs!
20
EvoGFuzz found 319 bug-triggering inputs!
21
EvoGFuzz found 335 bug-triggering inputs!
22
EvoGFuzz found 146 bug-triggering inputs!
23
EvoGFuzz found 1

282.4

In [10]:
normal_list = []
while len(normal_list)!=30:
    try:
        epp = EvoGFuzz(
            grammar=CALCGRAMMAR,
            oracle=oracle,
            inputs=initial_inputs,
            iterations=10,
        )
        found_exception_inputs = epp.fuzz()
        
        normal_list.append(len(found_exception_inputs))
        print(len(normal_list),f"EvoGFuzz found {len(found_exception_inputs)} bug-triggering inputs!")
    except Exception as eee:
        print(eee)
statistics.mean(normal_list)

1
EvoGFuzz found 148 bug-triggering inputs!
2
EvoGFuzz found 241 bug-triggering inputs!
3
EvoGFuzz found 196 bug-triggering inputs!
4
EvoGFuzz found 0 bug-triggering inputs!
5
EvoGFuzz found 258 bug-triggering inputs!
6
EvoGFuzz found 478 bug-triggering inputs!
7
EvoGFuzz found 209 bug-triggering inputs!
8
EvoGFuzz found 290 bug-triggering inputs!
9
EvoGFuzz found 59 bug-triggering inputs!
10
EvoGFuzz found 183 bug-triggering inputs!
11
EvoGFuzz found 206 bug-triggering inputs!
12
EvoGFuzz found 204 bug-triggering inputs!
13
EvoGFuzz found 168 bug-triggering inputs!
14
EvoGFuzz found 0 bug-triggering inputs!
15
EvoGFuzz found 175 bug-triggering inputs!
16
EvoGFuzz found 0 bug-triggering inputs!
17
EvoGFuzz found 152 bug-triggering inputs!
18
EvoGFuzz found 114 bug-triggering inputs!
19
EvoGFuzz found 167 bug-triggering inputs!
20
EvoGFuzz found 279 bug-triggering inputs!
21
EvoGFuzz found 241 bug-triggering inputs!
22
EvoGFuzz found 32 bug-triggering inputs!
23
EvoGFuzz found 111 bug-t

170.83333333333334

In [12]:
levenshtein_list

[240,
 240,
 318,
 279,
 240,
 212,
 267,
 211,
 247,
 241,
 180,
 244,
 84,
 217,
 220,
 199,
 297,
 214,
 302,
 111,
 207,
 341,
 272,
 277,
 101,
 262,
 247,
 275,
 295,
 302]

In [13]:
jaro_list

[190,
 189,
 271,
 172,
 135,
 215,
 209,
 221,
 294,
 207,
 88,
 257,
 266,
 275,
 190,
 245,
 187,
 224,
 278,
 287,
 274,
 188,
 257,
 198,
 147,
 193,
 135,
 333,
 226,
 127]

In [14]:
cos_list

[350,
 388,
 225,
 289,
 329,
 225,
 244,
 309,
 299,
 258,
 254,
 342,
 329,
 340,
 336,
 262,
 207,
 279,
 253,
 319,
 335,
 146,
 125,
 192,
 301,
 372,
 218,
 306,
 323,
 317]

In [15]:
normal_list

[148,
 241,
 196,
 0,
 258,
 478,
 209,
 290,
 59,
 183,
 206,
 204,
 168,
 0,
 175,
 0,
 152,
 114,
 167,
 279,
 241,
 32,
 111,
 11,
 53,
 185,
 378,
 119,
 327,
 141]

In [17]:
from scipy.stats import mannwhitneyu

sample_bugcount_list = [levenshtein_list, jaro_list, cos_list, normal_list]
names = ["levenshtein_list", "jaro_list", "cos_list", "normal_list"]

# Compare every ordered pair of configurations with a Mann-Whitney U test
# and report only the pairs whose difference is significant (p < 0.05).
for name, elem in zip(names, sample_bugcount_list):
    for name2, elem2 in zip(names, sample_bugcount_list):
        U1, p = mannwhitneyu(elem, elem2, method="exact")
        # U statistic of the second sample (not used below)
        U2 = len(elem) * len(elem2) - U1
        if p < 0.05:
            mean1, mean2 = statistics.mean(elem), statistics.mean(elem2)
            print(name, name2, mean1 > mean2, mean1, mean2, p)

levenshtein_list cos_list False 238.06666666666666 282.4 0.004105053737194172
levenshtein_list normal_list True 238.06666666666666 170.83333333333334 0.001443220539875183
jaro_list cos_list False 215.93333333333334 282.4 7.215017756201447e-05
jaro_list normal_list True 215.93333333333334 170.83333333333334 0.02846603224190847
cos_list levenshtein_list True 282.4 238.06666666666666 0.004105053737194172
cos_list jaro_list True 282.4 215.93333333333334 7.215017756201447e-05
cos_list normal_list True 282.4 170.83333333333334 6.7930920708445525e-06
normal_list levenshtein_list False 170.83333333333334 238.06666666666666 0.001443220539875183
normal_list jaro_list False 170.83333333333334 215.93333333333334 0.02846603224190847
normal_list cos_list False 170.83333333333334 282.4 6.7930920708445525e-06
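As background for reading these numbers: the Mann-Whitney U statistic of a sample counts the pairs in which its value exceeds the other sample's value, with ties counting one half. The two statistics therefore sum to `n1 * n2`, and `U1 / (n1 * n2)` is the Vargha-Delaney A12 effect size. Here is a minimal dependency-free sketch with made-up numbers; the helper `u_statistic` is illustrative and not part of SciPy.

```python
def u_statistic(xs, ys):
    """Count pairs (x, y) with x > y; ties contribute one half."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in xs for y in ys)


a = [3, 4, 5]  # made-up sample
b = [1, 2, 3]  # made-up sample

u1 = u_statistic(a, b)
u2 = u_statistic(b, a)
print(u1, u2, u1 + u2 == len(a) * len(b))  # 8.5 0.5 True
print(round(u1 / (len(a) * len(b)), 3))    # 0.944
```

An A12 value near 0.5 means neither sample tends to be larger, while values near 0 or 1 indicate a strong tendency in one direction.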
