# SWEN90006 Tutorial 8

## Introduction

The aim of this tutorial is for you to familiarise yourself with generation-based and mutation-based fuzzing. In today's session, we will first reflect further on mutation-based fuzzing using last week's example. We will then use generation-based (model-based) black-box fuzzing tools such as Peach fuzzer to generate inputs that trigger the faults in two versions of the `read_and_process` program.

## Mutation-based fuzzing (cont'd)

Please recap the BMP header exercise, review the concept of mutation-based fuzzing, and answer Question 3 and Question 4 from last week again.

### Question 3
Suppose you have a valid 54-byte header and you mutate an arbitrary
(uniformly randomly chosen) byte in the header to a new value (different
from its original value). What is the probability of producing a valid
header?
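
The mutation described in this question can be sketched as follows. This is a minimal illustration, not the fuzzer from the lecture: the all-zero 54-byte file stands in for a real BMP header, and the file names `header.bin` and `mutated.bin` are hypothetical. It picks one byte offset, then replaces that byte with a value guaranteed to differ from the original.

```shell
# Placeholder 54-byte "header" (all zero bytes); substitute a real BMP header.
dd if=/dev/zero of=header.bin bs=54 count=1 2>/dev/null
cp header.bin mutated.bin

# Draw a byte offset in 0..53 and a non-zero delta in 1..255.
set -- $(awk 'BEGIN { srand(); print int(rand() * 54), 1 + int(rand() * 255) }')
offset=$1
delta=$2

# Read the original byte and derive a new value that must differ from it.
orig=$(od -A n -t u1 -j "$offset" -N 1 header.bin | tr -d '[:space:]')
new=$(( (orig + delta) % 256 ))

# Overwrite exactly one byte in place (conv=notrunc keeps the other 53 bytes).
printf '%b' "$(printf '\\0%03o' "$new")" |
  dd of=mutated.bin bs=1 seek="$offset" conv=notrunc 2>/dev/null

echo "offset=$offset orig=$orig new=$new"
```

Running `cmp -l header.bin mutated.bin` afterwards lists the single byte that changed. Whether the mutated header is still valid depends on which field that byte belongs to, which is what the question asks you to reason about.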

### Question 4
Imagine you had to write a fuzzer to fuzz some BMP processing code that
can process BMP files of the format described above. If you had to choose
between generating completely random inputs vs. performing random mutation 
on existing (valid) BMP files, which strategy would you choose?



## Generation-based fuzzing


### Building a Docker Container

We provide a Docker container that has all the tools introduced in the lectures. This container will help you understand fuzzing better, reproduce the lecture demos, and carry out fuzzing experiments in the tutorials over the next few weeks.

Follow the instructions at https://github.com/SWEN90006-2021/security-testing to set up a Docker image and Docker container.


### Week 8 in-class exercise

These instructions are the same as `Week 8 in-class exercise: generation-based fuzzing` on Canvas.

We will look at two exercises in which you apply these fuzzing techniques to discover the faults in two versions of a program named `read_and_process.c` (stored in `read_and_process.zip`). The program mimics some of the functionality of media processing libraries such as LibPNG. It takes a file as input and expects the file to adhere to a specific format. A valid file starts with a 4-byte "signature". After that, the file contains a list of chunks, and each chunk has three parts: i) a chunk type stored in 4 bytes, ii) a 4-byte data length, and iii) the chunk data.

The image below, taken from the lecture slides, illustrates the file format:

![File Format](figures/Input_structure_tut8.png)
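
To get a feel for the layout before writing an input model, a sample file can be assembled by hand. The sketch below writes a signature followed by a single chunk. The values `SIGN`, `TYPE`, and `DATA`, the file name `sample_input.bin`, and the little-endian encoding of the length field are all assumptions made for illustration; check the program source for the signature, chunk types, and byte order it actually expects.

```shell
# Build a tiny sample input: signature + one chunk (type, length, data).
# All concrete values here are placeholders, not taken from the program.
out=sample_input.bin

{
  printf 'SIGN'                # 4-byte signature (placeholder)
  printf 'TYPE'                # 4-byte chunk type (placeholder)
  printf '\004\000\000\000'    # 4-byte data length = 4 (assumed little-endian)
  printf 'DATA'                # chunk data, 4 bytes as declared by the length
} > "$out"

# Inspect the result: total size in bytes, then a hex dump.
wc -c < "$out"
od -A d -t x1 "$out"
```

The hex dump makes the three parts of the chunk easy to spot, and editing the length bytes so that they no longer match the amount of data is a quick way to probe how the program handles inconsistent input.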

### Question 1: Manual analysis

What is the fault in [read_and_process_v1.c](https://github.com/SWEN90006-2021/security-testing/blob/main/read_and_process_v1.c)? What are the conditions to trigger the fault?

### Question 2: Write an input model for the given file format 

Create a new file named `input_model.xml`.

Hint: the input model is also on the lecture slides.

### Question 3: Generation-based fuzzing
Use the input model with the generation-based fuzzer (`generation_fuzzer.sh`) to automatically generate an input that triggers the fault. You will need to update the fuzzer scripts to capture SIGABRT (return code 134) instead of SIGSEGV (return code 139).

<div style="background-color: rgb(50, 50, 50);">

```shell
# First, compile the buggy program read_and_process_v1.c
cd $WORKDIR
gcc -o read_and_process_v1 read_and_process_v1.c

# Next, run the generation-based fuzzer to fuzz the read_and_process_v1 program
generation_fuzzer.sh ./read_and_process_v1 input_model.xml 20 results-no-seeds
```
</div>
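
The two return codes follow from how the shell reports a process killed by a signal: the exit status is 128 plus the signal number, so SIGABRT (signal 6) gives 134 and SIGSEGV (signal 11) gives 139. The sketch below shows one way to tell them apart. `classify_status` is a hypothetical helper, not part of `generation_fuzzer.sh`, and the self-aborting child shell is only a stand-in for a crashing target.

```shell
# Hypothetical helper: map an exit status to a label.
classify_status() {
  case "$1" in
    0)   echo "ok" ;;
    134) echo "SIGABRT" ;;   # 128 + 6
    139) echo "SIGSEGV" ;;   # 128 + 11
    *)   echo "other ($1)" ;;
  esac
}

# Stand-in for a crashing target: a child shell that sends itself SIGABRT.
status=0
sh -c 'kill -s ABRT $$' || status=$?

echo "exit status: $status"
classify_status "$status"
```

In the fuzzer scripts, the same comparison against 134 would replace the existing comparison against 139; where exactly that check lives depends on the script, so locate it before editing.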

### Question 4: Work on another program using all steps from Questions 1 to 3
What is the fault in [read_and_process_v2.c](https://github.com/SWEN90006-2021/security-testing/blob/main/read_and_process_v2.c)?
What are the conditions to trigger that fault?
Is the input model written in Question 2 helpful for discovering that fault? If it is not, discuss ideas for fuzz testing this program.
