# <strong>Buffer Overflow</strong>

A buffer overflow is a vulnerability that occurs in programs that handle memory in an unsafe manner. Buffer overflow attacks are commonly associated with C, C++, and assembly languages, which allow direct memory management and pointer arithmetic. Programming languages that permit manual memory allocation are susceptible to this type of attack. Buffer overflows can also happen unintentionally due to programming errors, without any malicious intent. Before learning about this vulnerability, it is important to review the architecture of a computer to understand how memory is managed and how buffer overflows can affect program execution.

First, examine this diagram, which is called the Process Address Space:

<figure><center><img src="resources/buffer/buffer_overflow_diagram.png" style="width: 20%; height: 10%;"></img></center></figure>

These are the parts of the Process Address Space:
- The <strong>kernel code</strong> handles the communication between the process and the hardware. Like the code block at the bottom of the address space, it does not change while the process is running.
- The <strong>stack</strong> is memory that is <u>automatically</u> allocated while code is executing. The stack holds the temporary variables that a function uses, and that memory is released when the function returns.
  - The stack is the "scratch pad" for the program.
  - The stack and the heap share the same unallocated memory. No amount of free storage is reserved specifically for the stack or for the heap.
- The <strong>heap</strong> is memory that is <u>dynamically</u> allocated while code is executing. This memory is allocated when the program explicitly requests it (for example, with ```malloc```), and the program reaches it through a pointer. When that memory is freed, the heap can shrink.
  - To remember this, think of a heap of dirt: when dirt (memory) is added, the heap grows upward, similar to how heap memory grows in the process address space.
- Your program's <strong>code</strong> lies at the bottom of this space. This part of the space doesn't get modified while the process is running.
  - Constants never change, so they are typically stored alongside the code in a read-only part of the address space.
 
It can be confusing to understand why the stack and the heap are separate in the address space. The stack changes much more quickly, since each function's portion of the stack is cleared when the function completes, whereas the heap must keep data that needs to outlive a function call. If you would like to read more about why these two types of memory are separate in the address space, check out <a href="https://stackoverflow.com/questions/8173353/why-is-memory-split-up-into-stack-and-heap">this Stack Overflow thread</a>.

<u>When data written to a variable on the stack or the heap exceeds the memory allocated for it, the excess overwrites nearby memory, which can include important addresses that the program depends on.</u> This is a buffer overflow. A buffer is the block of memory allocated to a variable, and an overflow occurs when more data is written to the buffer than it was allocated to hold.

<strong>This lab will contain four topics, and you will learn the following:</strong>

1. Introduction/Review of C
2. Breaking Unsafe C Functions
3. Fixing Unsafe C Functions
4. A Large-Scale Buffer Overflow Attack

In [1]:
# Setting up the lab.
import ipywidgets as widgets
from IPython.display import display, HTML, Javascript
from IPython.core.magic import register_line_magic
import os
# For accessing the nodes:
import subprocess
# For the stopexp command:
import re

# When true, it will not auto-save at each step.
runAllSteps = False

###### Used for saving notebooks. ######
import threading
# Threading required in case steps are progressed too quickly.
save_lock = threading.Lock()

# The save function itself.
def save_notebook():
    with save_lock:
        result = subprocess.run('su - umdsectc -c "/home/umdsectc/notebooks/resources/save.py buffer"', shell=True, capture_output=True, text=True)

# Creating a thread to save the notebook.
def trigger_save():
    save_thread = threading.Thread(target=save_notebook)
    save_thread.start()

###### Used for loading notebooks. ######
import queue

load_lock = threading.Lock()
result_queue = queue.Queue()

def load_notebook():
    with load_lock:
        result = subprocess.run('su - umdsectc -c "/home/umdsectc/notebooks/resources/load.py buffer"', shell=True, capture_output=True, text=True)
        result_queue.put(result)  # Put the result in the queue.

# Creating a thread to load the notebook.
def trigger_load():
    load_thread = threading.Thread(target=load_notebook)
    load_thread.start()
    load_thread.join()  # Wait for the thread to complete before adding result to the queue.
    return result_queue.get()  # Get the result from the queue.

### Step 0: Begin the experiment.

Click the button to begin creating the experiment.

In [2]:
# Click the button below to start the experiment.
def startlab(button):
    # Defining the lab name.
    labname = "bufferumd"

    # Writing the information to an empty field below the button.
    with output0:
        output0.clear_output()
        
        # First, checking if the materialization exists. May have been stopped by a previous lab.
        materialPattern = "real." + labname + ".umdsec[a-z]{1,2}"

        # Listing the materializations to find if there's an existing one for this lab.
        checkMaterial = os.popen('su - umdsectc -c "mrg list materializations"').read()
        regex = re.compile(materialPattern)
        # Getting the matches:
        match = regex.search(checkMaterial)

        if match:
            display(HTML("<span style='color: orange;'>A materialization for this lab already exists. </span><span>You might have run another \
            lab without stopping this one. Attaching the existing materialization.</span>"))
            subprocess.run(f'su - umdsectc -c "mrg xdc attach xdc '+ match.group(0) + '"', capture_output=True, text=True, shell=True)
            display(HTML("<newline><span style='color: green;'><strong>Setup complete. You may begin the lab! </strong></span>" \
                         "<span>When you're finished, close your lab at the bottom of the notebook.</span>"))
        
        else:
            display(HTML("<span>No existing materializations are found.</span>"))
        
            # Second, start the lab.
            display(HTML("<span>Starting " + labname + " lab. This will take a few minutes to process. Please wait.</span> \
            <span><img width='12px' height='12px' style='margin-left: 3px;' src='resources/loading.gif'></span>"))
            startexp = subprocess.run('su - umdsectc -c "bash /share/startexp ' + labname + '"', capture_output=True, text=True, shell=True)
            output0.clear_output()
            display(HTML("<span>Done. Result:</span>"))
            print(startexp.stdout)

            # Another lab is already attached to the XDC.
            if (("XDC already attached") in startexp.stdout):
                existingLab = re.search(r"real.(.*).umdsec[a-z]{1,2}", startexp.stdout).group(1)

                # Shouldn't happen.
                if (labname == existingLab):
                    display(HTML("<span style='color: red;'>Your lab was already started. </span><span>Please continue to the next step.</span>"))

                # Detaching the existing lab, then attaching the current one.
                else:
                    display(HTML("<span style='color: orange;'>Warning: You did not stop your previous experiment. </span><span>Please stop your experiments \
                    before starting a new one. Detaching the " + existingLab + " experiment.</span>"))
                    os.popen('su - umdsectc -c "mrg xdc detach xdc.umdsectc"')
                    display(HTML("<span>Attaching the current lab.</span>"))
                    os.popen('su - umdsectc -c "mrg xdc attach xdc ' + materialPattern + '"')
    
            # Third, get the lab materials onto the node.
            display(HTML("<span>Allocating lab resources onto the node. <u>Please wait a little longer...</u></span>"))
            runlab = subprocess.run('su - umdsectc -c "bash /home/runlab ' + labname + '"', capture_output=True, text=True, shell=True)
            display(HTML("<newline><span style='color: green;'><strong>Setup complete. You may begin the lab! </strong></span>" \
                         "<span>When you're finished, close your lab at the bottom of the notebook. Your lab will be active for one week.</span>"))

# Creating the button.
startButton = widgets.Button(description="Start Lab")

# Creating an output area.
output0 = widgets.Output()

# Run the command on click.
startButton.on_click(startlab)

# Display the output.
display(startButton, output0)


If you previously stopped your lab, you may restore your progress below by clicking "Load Lab". <u>You do not have to load your lab if you signed out, closed your notebook, or exited your node(s) or XDC by using ```exit```.</u>

In [3]:
# Click the button below to load your lab.
def loadlab(b):
    with output0_2:
        output0_2.clear_output()
        display(HTML("<span>Searching for an existing lab in your notebook...</span>"))

    if (os.path.exists("/home/umdsectc/notebooks/saves/umdsectc_buffer.tar.gz")):
        with output0_2:
            output0_2.clear_output()
            display(HTML("<span>Loading your lab...</span> \
                <span><img width='12px' height='12px' style='margin-left: 3px;' src='resources/loading.gif'></span>"))
            result = trigger_load()
            if (result.returncode == 0):
                output0_2.clear_output()
                display(HTML("<span style='color: green;'>Your lab has been successfully loaded. Please click on the <img width='20px' height='20px' style='margin-left: 1px;' src='resources/fast_forward.png'> icon at the top of your notebook to reflect your changes.</span>"))
            elif (result.returncode == 2):
                output0_2.clear_output()
                display(HTML("<span style='color: red;'>The buffer node is inaccessible. Please start your lab. If you have already started it, wait a minute and try again.</span>"))
            else:
                output0_2.clear_output()
                display(HTML("<span style='color: red;'>An error occurred while loading your lab.</span>"))

# Creating the button.
loadButton = widgets.Button(description="Load Lab")

# Creating an output area.
output0_2 = widgets.Output()

# Run the command on click.
loadButton.on_click(loadlab)

# Display the output.
display(loadButton, output0_2)


## <strong>Topic 1: Introduction/Review of C</strong>

C is a programming language created over 50 years ago and designed to be fast and efficient. Unfortunately, C has a steep learning curve, and programs can become unsafe when the programmer is unfamiliar with its unsafe functions. Despite its age, C is still useful for modern applications because of the benefits it offers developers. C is sometimes preferred over C++ because it can be quicker, is more foundational, and is more portable; it also has fewer features than C++, most notably around object-oriented design.

C (and C++) is often described as a "middle-level" language rather than a "high-level" language. High-level languages are designed to be more human-readable and easier to learn, and they include many built-in libraries that keep code clean and let you reuse what is already implemented.

A middle-level language sits between a high-level language and a low-level language (assembly/binary). C is human-readable while also giving you more control over your hardware, which is why pointers are used in C and C++. High-level languages handle these tasks automatically and do not let you manage low-level details.

C sacrifices security to provide more control over hardware, and it can run much faster than high-level languages because it skips many runtime checks. C does not detect memory leaks, perform garbage collection, check array indices, or catch uninitialized variables, among other things. This keeps the language as fast as possible.

### Step 1: Create a Basic C Program (Part 1)

C programs can easily be written in a Unix environment, but C is <strong>not</strong> an interpreted language, meaning that it must be converted into machine code before it can be executed. Languages like Python and JavaScript are interpreted, meaning that you do not need to explicitly compile them before running them. C, however, is a compiled language, which means that you need to build your programs before they can be run. If you would like to learn more about the differences between interpreted and compiled languages, you may read <a href="https://www.freecodecamp.org/news/compiled-versus-interpreted-languages/">this article</a>.

The most common compiler that developers use for C is ```gcc```, which stands for GNU Compiler Collection. This compiler is already installed on the ```buffer``` node for you, which you will use for this lab.

When working with input and output of a C program, C uses the ```stdio.h``` library, which stands for "STanDard Input Output" (stdio). A basic C file looks like this:

```
#include <stdio.h>

int main() {
    // Content goes here.

    return 0;
}
```

Returning 0 indicates that a program was successful. When a program is unsuccessful, it returns a nonzero value, commonly 1.

Using the template above, navigate into your home directory on your ```buffer``` node. You may access this node by typing ```ssh buffer```. A directory named ```topic_1/``` has been created for you. Create a file inside of ```topic_1/``` called ```step_1.c```, then write a program that prints your username: ```umdsectc```.

Use the ```printf()``` statement. ```printf``` stands for "print formatted", which you will read more about in Step 4. 

You will be taught how to compile this program in the next step. If you are unsure if your program works, try skipping this step and proceed to Step 2 before running the check for Step 1. Your program will be automatically compiled by the notebook to check your work, but you will learn how to do this step next.

In [7]:
# Click the button below to check your work.
step1Complete = False

# Function to check Step 1.
def step_1():
    # Required to change boolean value.
    global step1Complete

    with output1:
        output1.clear_output()
        display(HTML("<span><img width='14px' height='14px' style='margin-left: 3px;' src='resources/loading.gif'></span>"))
    
    # This subprocess statement is a little different. Need to initiate environment variables at the same time when running the command.
    result = subprocess.run('ssh -i /home/umdsectc/.ssh/merge_key umdsectc@buffer /home/.checker/section_1.py 1', shell=True, capture_output=True, text=True)
    
    if (result.returncode == 1):
        output1.clear_output()
        with output1:
            display(HTML("<span style='color: green;'>Success! You may continue onto the next step.</span>"))
            step1Complete = True

    elif (result.returncode == 0):
        output1.clear_output()
        with output1:
            display(HTML("<span style='color: red;'>Your program did not produce the expected output. Try again.</span>"))
            step1Complete = False
    
    elif (result.returncode == 2):
        output1.clear_output()
        with output1:
            display(HTML("<span style='color: red;'>An error occurred when checking this step. Please contact your professor or TA.</span>"))
            step1Complete = False

def check_step_1(b):
    step_1()

    """
    # Auto-save.
    if (not runAllSteps):
        trigger_save()
    """

# Creating the button.
button = widgets.Button(description="Check File")

# Creating an output area.
output1 = widgets.Output()

# Run the command on click.
button.on_click(check_step_1)

# Display the output.
display(button, output1)


### Step 2: Creating a Basic C Program (Part 2)

With your C program ready to be compiled, you will use ```gcc``` to compile it and view its output.

This is the syntax for compiling a C program by using ```gcc```: ```gcc -o output_name file_name.c```

This is the breakdown:
- ```gcc``` indicates that you're executing a ```gcc``` command.
- ```-o``` stands for "output". It indicates that the next argument is the name of the executable that will be produced.
- ```output_name``` is the name of the executable that your C file will produce.
  - This is also called a binary file. Compiling converts your C program into machine code. Attempting to read this file as text will not work, since it is written for the computer rather than for people.
  - Note that the output name does not have a file extension.
- ```file_name.c``` is the name of the C file that you wish to compile.

Using this command, compile your C program. Name your output file ```step_2```. Recall that your C file is named ```step_1.c```. <u>Keep doing your work inside of ```topic_1/```.</u>

<span style="color: green"><strong><img src="resources/idea.png" style="width: 12px"> Tip:</strong></span> Running ```gcc``` will detect any compiler errors. These are errors that can be detected at compile time, meaning anything that the compiler can tell is wrong before the program runs, such as a missing semicolon, an undefined variable/function name, the wrong number of arguments in a function call, and more.

If any compiler errors are detected, this step will not pass.

### Step 3: Creating a Basic C Program (Part 3)

Lastly, you may now execute your program by running ```./step_2```. When executing a file in the current directory, ```./``` needs to be in front of the file that you wish to execute. If you attempt to run ```step_2``` on its own, the Unix environment assumes that you are running a command, not executing a file. However, if you are executing a file in another directory, you may type the entire pathname, such as ```/home/umdsectc/topic_1/step_2```. In that case, ```./``` is not required in front of your file name.

Try running your program. This step will not pass if it produces an error. If your program prints your username and returns 0, this step will pass.

### Step 4: Variables in C

Variables in C are required to have their types defined. These are some of the basic variable types in C:

- Integers are defined with ```int```. These are whole numbers with no decimal value.
- Floats and doubles are defined with ```float``` and ```double``` respectively in C. Floats are precise to about 7 significant decimal digits, and doubles are precise to about 15 significant decimal digits.
- Characters (or chars) are defined with ```char```.

<strong>For this step</strong>, you are going to print the sum of two numbers. Create a C file inside of ```~/topic_1/``` named ```step_4.c``` and produce an output file named ```step_4``` using ```gcc```. Then, create two variables that hold integer values. You may pick whichever two numbers you'd like, as long as they are whole numbers between 1 and 1,000.

When using ```printf()``` for this, you must use a format specifier, which tells ```printf()``` how to convert a value into text. Here is an example of printing a number in C:

```
printf("My variable's value is: %i", variable_name);

>> My variable's value is: 900
```

A breakdown:
- ```printf()``` indicates that you are going to print a formatted string.
- ```"%i"``` indicates that an integer will be printed at that position, hence the "i".
  - ```"%s"``` prints a string, ```"%c"``` prints a char, ```"%d"``` prints a decimal integer (an int datatype), and ```"%f"``` prints a float/double. You may also choose how many decimal places to print when using a float/double, such as ```"%.4f"```, which prints four decimal places.
- ```variable_name``` is the variable that you wish to print. Make sure that the datatype of your variable matches the specifier within ```printf```. Otherwise, the compiler will typically warn you, and the program may print incorrect output. Use the sub-bullet above for help with printing specific datatypes.

You only have to print the sum of two numbers between 1 and 1,000. You do not need any additional output besides the sum.

### Step 5: Strings in C

Strings in C work differently than other languages. This will be the first datatype that you will use in this lab that can have unsafe behavior.

All strings in C end with a character called the null terminator, which is ```\0```. Additionally, strings in C are arrays of characters whose last element is the null terminator.

This is a valid string in C:

```
char my_string[] = {'H', 'e', 'l', 'l', 'o', '\0'};
printf("%s", my_string);

>> Hello
```

Clearly, this can be very tedious. Additionally, when a string is not closed with a null character, this can occur:

```
char my_string[] = {'H', 'e', 'l', 'l', 'o'};
printf("%c", my_string[10]);

>> �
```

The program will be able to read past the end of the string and print something that is either unreadable or is a random character. When printing strings like this without null terminators, accidental data leaks can occur. Printing an incorrectly terminated string will produce unexpected output, such as this:

```
char good_string[] = {'h', 'e', 'l', 'l', 'o', '\0'};
char bad_string[] = {'w', 'o', 'r', 'l', 'd'};
printf("%s\n", bad_string);

>> worldhello
```

In this example, ```good_string``` and ```bad_string``` happen to be stored next to each other on the stack. Printing ```bad_string``` prints all data up to the first null terminator that ```printf``` finds. Since ```bad_string``` wasn't terminated, ```printf``` keeps reading from the stack until it encounters a null character. The exact output depends on how the compiler lays out the variables, so it may differ on other systems.

Fortunately, C has a shorthand (and safe) way to store strings as variables. Initializing a ```char``` array with a string literal automatically adds a null terminator. Here's an example:

```
char string_var[] = "Hello, world!";
printf("%s", string_var);
```

<strong>For this task</strong>, create a C file named ```step_5.c``` inside of ```~/topic_1/```, with an output file named ```step_5```. Then, store your username as a string into a variable named ```username```. 

Print: ```Hello, umdsecXX!``` as your final output.

<span style="color: green"><strong><img src="resources/idea.png" style="width: 12px"> Tip:</strong></span> Use the example from the previous step to combine an output with a variable.

### Step 6: Pointers in C (Part 1)

One of the features of C as a middle-level language is the use of pointers. Pointers let you work directly with memory addresses, including memory that you allocate on the heap of the Process Address Space. Most high-level languages do not expose pointers, since their runtime handles heap memory automatically.

Whenever a variable is passed to a function's parameter in C, it's only passed a <strong>copy</strong> of the variable. This means that the value of the variable is passed to the function, but the address of the variable <strong>does not</strong> get passed.

Observe this function carefully in C:

```
#include <stdio.h>

void sum(int a, int b, int c) {
    c = a + b;
}

int main() {
    int a, b, c;
    a = 1;
    b = 2;
    c = 0;

    sum(a, b, c);

    printf("%d", c);
    return 0;
}
```

What is the output of this program? Enter your answer in the field below.

### Step 7: Pointers in C (Part 2)

For this function to work properly, pointers must be introduced. A pointer is a variable that holds the memory address of another variable. Here is an example of a pointer.

```
int* ptr;
int a = 10;
ptr = &a;
printf("Pointer: %p\n", &ptr);
printf("Value: %d", *ptr);

>> Pointer: 0x7ffdb02fe240
>> Value: 10
```

Here's a breakdown of how this works.
- Create an integer pointer named ```ptr```. This will hold the address of a variable.
- Create an integer named ```a```, which has the value 10.
- The pointer holds the <strong>address</strong> of the value. Therefore, retrieve the address of ```a``` by calling ```&a```. The ```&``` is the <u>address operator</u> in C.
- Print the <strong>address</strong> of the pointer itself by printing ```&ptr```.
  - To get the address of the variable that it points to, you would print ```(void*)ptr``` instead. This is the address where ```a``` is stored.
- Print the <strong>value</strong> that ```ptr``` holds by printing ```*ptr```.

Your task is to recreate the function in Step 6, but use pointers so that ```a```, ```b```, and ```c``` are properly updated in ```main()```. Originally, copies of the variables were passed to ```sum()```. Now, in order for the variable ```c``` to update, you will need to use a pointer to change its value in memory. 

A template of this function is provided to you below.

```
#include <stdio.h>

void sum(int* a, int* b, int* c) {
    // Your answer here.
}

int main() {
    int a, b, c;

    a = 1;
    b = 2;
    c = 0;

    // Your answer here.

    printf("%d", c);
    return 0;
}
```

Upon completing the previous step, a file named ```~/topic_1/step_7.c``` was created. Click the button below to check your work.

<span style="color: green"><strong><img src="resources/idea.png" style="width: 12px"> Tips:</strong></span> 
- It's important to think about how functions and pointers work in C. ```sum()``` is a function that accepts pointers for parameters. This means that when you refer to ```a```, ```b```, and ```c``` inside of ```sum()```, they are pointers. To access the value that a pointer points to, you must place an asterisk in front of the variable name.
- Since ```sum()``` accepts pointers, you must pass the addresses of the variables to the parameters.

<span style="color: orange"><strong><img src="resources/alert.png" style="width: 12px"> Notice:</strong></span> Starting with this step, the notebook will automatically generate the ```step_X.c``` files for you for the rest of the lab. You MUST complete the previous step before the file is generated for you. However, you are still required to compile the C file with ```gcc``` at each step. This is required anytime you change a file that's written in C.

### Step 8: Pointers in C (Part 3)

Arrays in C must be declared with a fixed length, unlike other languages which allow you to dynamically add elements to arrays. An array of char is typically used to represent a string in C, and arrays of other types (like int, float, etc.) follow similar syntax. Here's an array of integers in C:

```
int array_of_ints[] = {1, 2, 3, 4, 5};
```

If you don't want to create an array with initial values, you must declare the array with a fixed size:

```
int array_of_ints[5];
```

When arrays in C are passed to a function, they are automatically passed by reference. This means that the function receives a pointer to the first element of the array, not a copy of the entire array. Therefore, an array does not need to be passed as a pointer explicitly to a function if you need to change any array elements within that function.

```
#include <stdio.h>

void modify_array(int arr[], int size) {
    for (int i = 0; i < size; ++i) {
        arr[i] += 1;
    }
}

int main() {
    int array_of_ints[5] = {1, 2, 3, 4, 5};
    modify_array(array_of_ints, 5);
    for (int i = 0; i < 5; ++i) {
        printf("%d ", array_of_ints[i]); // Prints 2 3 4 5 6
    }
    return 0;
}
```

Arrays are mentioned in this topic about pointers because the size of an array may not be known until the program is running. This brings up ```malloc```, which is short for "memory allocation" and is used to reserve a specific amount of storage in the heap.

This is the signature for ```malloc```: ```void *malloc(size_t size);```
- ```void *``` means that ```malloc``` returns a generic pointer, which you can assign or cast to whatever pointer type you need. If the allocation is unsuccessful, it returns a null pointer (```NULL```).
- ```size_t``` is an unsigned integer type. "Unsigned" means non-negative.
- ```size``` is the amount of memory that you wish to allocate in bytes in the heap.

<u>You must ```free``` the memory once you have finished using it.</u> This applies only to memory obtained with ```malloc```. The pointer example in the previous step did not require ```free``` because it was not dynamically allocated.

Here is an example of using ```malloc``` to create an array of integers.

```
#include <stdio.h>
#include <stdlib.h> // Required for using malloc().

int main() {
    int length = 10;
    int* array_of_ints = (int*)malloc(length * sizeof(int));

    // Do stuff with array_of_ints.

    free(array_of_ints);
    return 0;
}
```

For this task, you are going to be given this template:

```
#include <stdio.h>
#include <stdlib.h>

int main() {
    // Create the variables.
    int num_elements;
    const float PI = 3.14159;
    
    // Ask for user input.
    printf("Enter the number of elements: ");
    scanf("%d", &num_elements);

    // TASK 1: Use malloc to create array_of_floats.
    // It should be of type "float"!

    // Populating the array.
    for (int i = 0; i < num_elements; ++i) {
        array_of_floats[i] = i * PI;
    }

    // Printing the elements.
    printf("Array elements: ");
    for (int i = 0; i < num_elements; ++i) {
        printf("%.5f ", array_of_floats[i]);
    }
    printf("\n");

    // TASK 2: Free the array.

    return 0;
}
```

Here's a breakdown of the code:
- Create an integer variable that holds the number of elements in the array, as well as a constant named ```PI```.
- The program will ask for a number from you. The ```scanf()``` function takes input from the user, then stores it at the address of ```num_elements```.
- <strong>Task 1:</strong> Create an array named ```array_of_floats``` by using malloc. Use the number of elements as the length of the array that you're allocating. As the name implies, it should be a float datatype.
- Set each element of the array to its index multiplied by pi.
- Print the array in a "for" loop, with five decimal places.
- <strong>Task 2:</strong> Free ```array_of_floats``` from memory.

```step_8.c``` was automatically created for you upon completing Step 7. You are still required to use ```gcc``` in order to compile ```step_8.c```.

## <strong>Topic 2: Breaking Unsafe C Functions</strong>

Many functions in C are "unbounded", meaning that they do not check bounds when they are executed. This is what leads to buffer overflows. Many of these unbounded functions are located in the ```string.h``` header file, which you may read more about <a href="https://www.geeksforgeeks.org/c-library-string-h/">here</a>.

This is a list of unsafe C functions that could lead to a buffer overflow: ```strcpy```, ```strcmp```, ```strcat```, ```strchr```, ```strspn```, ```strcspn```, ```strpbrk```, ```strrchr```, ```strstr```, ```strtok```, and ```strlen```.

In this topic, you will be breaking some of these string functions, then learn how to fix them. <u>The functions that you will be breaking are purposefully written inside of a user-defined function.</u> User-defined functions create a new "stack frame" within the Process Address Space. All variables and return addresses for this function are located in the stack.

A <strong>stack canary</strong> is a value that gets placed in the stack, which lies after the variables, but before the return address of the function. If a buffer overflow overwrites the stack canary, the program will detect this alteration and typically terminate the process with an error, such as ```stack smashing detected```. Stack canaries are not located within ```main()```, which makes it more difficult to detect a buffer overflow, unless a segmentation fault occurs or a variable becomes overwritten.

### Step 9: Breaking ```strcpy```

The ```strcpy``` function stands for "string copy". This is the signature of the ```strcpy``` function:

```char* strcpy(char* destination, const char* source);```

Parameters:
- ```destination``` is a pointer to the destination array where the content is to be copied.
- ```source``` is the C string to be copied.

<a href="https://cplusplus.com/reference/cstring/strcpy/">Link to official documentation.</a>

Upon completing Step 8, a file named ```step_9.c``` was automatically created and compiled for you. You can find this file inside of ```~/topic_2```. This is the source code:

```
#include <stdio.h>
#include <string.h>

void copy_string() {
    char *str1 = "Hello!";
    char str2[10];

    strcpy(str2, str1);
    
    printf("%s", str2);
}

int main() {
    copy_string();
    return 0;
}
```

Run this file and observe its functionality. Your task is to produce a buffer overflow with this function. Make changes to this function, then click "Check Work" to see if your program crashes. If your program crashes, then you will pass this step.

### Step 10: Breaking ```strcmp```

The ```strcmp``` function stands for "string compare". This is the signature of the ```strcmp``` function:

```int strcmp (const char* str1, const char* str2);```

Parameters:
- ```str1``` and ```str2``` are strings to be compared with each other.

<a href="https://cplusplus.com/reference/cstring/strcmp/">Link to official documentation.</a>

Upon completing Step 9, a file named ```step_10.c``` was automatically created and compiled for you. You can find this file inside of ```~/topic_2```. This is the source code:

```
#include <stdio.h>
#include <string.h>

void compare_string() {
    char str1[] = "Hello";
    char str2[] = "Hello";

    int result = strcmp(str1, str2);

    if (result == 0) {
        printf("The strings are the same!\n");
    } 
    
    else {
        printf("The strings are NOT the same!\n");
    }
}

int main() {
    compare_string();
    return 0;
}
```

Run this file and observe its functionality. Your task is to make ```str1``` and ```str2``` be the same string, but print ```The strings are NOT the same!```. Your two variables are required to be named ```str1``` and ```str2```. Do not rename these, or your step will not pass.

<span style="color: green"><strong><img src="resources/idea.png" style="width: 12px"> Tip:</strong></span> Review Step 5 if you are stuck.

### Step 11: Breaking ```strcat```

The ```strcat``` function stands for "string concatenation". This is the signature of the ```strcat``` function:

```char* strcat(char* destination, const char* source);```

Parameters:
- ```destination``` is a pointer to the destination array, which should contain a C string, and be large enough to contain the concatenated resulting string.
- ```source``` is the C string to be appended. This should not overlap destination.

<a href="https://cplusplus.com/reference/cstring/strcat/">Link to official documentation.</a>

Upon completing Step 10, a file named ```step_11.c``` was automatically created and compiled for you. You can find this file inside of ```~/topic_2```. This is the source code:

```
#include <stdio.h>
#include <string.h>

void concat_string() {
    char str1[20] = "Hello,";
    char *str2 = " World!";

    strcat(str1, str2);

    printf("%s", str1);
}

int main() {
    concat_string();
    return 0;
}
```

Run this file and observe its functionality. Your task is to produce a buffer overflow with this function. Make the changes to the function, then click "Check Work" to see if your program crashes. If your program crashes, then you will pass this step.

### Step 12: Breaking ```sprintf```

The ```sprintf``` function wasn't mentioned at the beginning of Topic 2, but it can still be easily broken. You have already seen the ```printf``` function many times throughout this lab. The name ```sprintf``` means "<u>s</u>tring <u>print</u> <u>f</u>ormatted". <u>This function is not in ```string.h```, unlike the previous three functions.</u>

<strong>Put simply</strong>, this function writes a formatted string into a character buffer that you have already declared.

This is the signature:

```int sprintf (char* str, const char* format, ...);```

Parameters: 
- ```str``` is a pointer to a buffer where the resulting C-string is stored. The buffer should be large enough to contain the resulting string.
- ```format``` is the C string that contains a format string that follows the same specifications as format in ```printf```.
- ```...``` are additional parameters that can be used, but would be unnecessary for this step.

<a href="https://cplusplus.com/reference/cstdio/sprintf/?kw=sprintf">Link to official documentation.</a>

Upon completing Step 11, a file named ```step_12.c``` was automatically created and compiled for you. You can find this file inside of ```~/topic_2```. This is the source code:

```
#include <stdio.h>

void sprintf_example() {
    char buffer[15];
    sprintf(buffer, "Hello, world!");
    printf("%s\n", buffer);
}

int main() {
    sprintf_example();
    return 0;
}

```

Run this file and observe its functionality. Your task is to continue calling ```sprintf```, but create a buffer overflow.

## <strong>Topic 3: Fixing Unsafe C Functions</strong>

In the previous topic, you experimented with four of the many unsafe C functions. You may have noticed a common pattern: each function broke either because a null terminator was missing, or because the bounds of a buffer (your string) were not properly constrained, which created a "stack buffer overflow".

Now, you are going to take the four functions that you broke, and you will be applying fixes to them.

### Step 13: Fixing ```strcpy```

Your file called ```step_9.c``` had its ```strcpy``` function broken. To keep the functionality of this file, but to make it safer, you will use ```strncpy```. Here is the signature for the function:

```char* strncpy (char* destination, const char* source, size_t num);```

Parameters:
- ```destination``` is a pointer to the destination array where the content is to be copied.
- ```source``` is the C string to be copied.
- ```num``` is the maximum number of characters to be copied from source. Recall that ```size_t``` is an unsigned integer, meaning that it can't be negative. 

<a href="https://cplusplus.com/reference/cstring/strncpy/">Link to official documentation.</a>

When you completed Step 12, a copy of ```step_9.c``` was created for you. Apply the changes to ```step_9_fix.c```. Click "Check Work" to test your fix.

### Step 14: Fixing ```strcmp```

Your file called ```step_10.c``` had its ```strcmp``` function broken. To keep the functionality of this file, but to make it safer, you will use ```strncmp```. Here is the signature for the function:

```int strncmp (const char* str1, const char* str2, size_t num);```

Parameters:
- ```str1``` and ```str2``` are strings to be compared with each other.
- ```num``` is the maximum number of characters to be compared. Recall that ```size_t``` is an unsigned integer, meaning that it can't be negative. 

<a href="https://cplusplus.com/reference/cstring/strncmp/">Link to official documentation.</a>

When you completed Step 13, a copy of ```step_10.c``` was created for you. Apply the changes to ```step_10_fix.c```. Click "Check Work" to test your fix.

### Step 15: Fixing ```strcat```

Your file called ```step_11.c``` had its ```strcat``` function broken. To keep the functionality of this file, but to make it safer, you will use ```strncat```. Here is the signature for the function:

```char* strncat (char* destination, const char* source, size_t num);```

Parameters:
- ```destination``` is a pointer to the destination array, which should contain a C string, and be large enough to contain the concatenated resulting string.
- ```source``` is the C string to be appended. This should not overlap destination.
- ```num``` is the maximum number of characters to be appended from source. Recall that ```size_t``` is an unsigned integer, meaning that it can't be negative. 

<a href="https://cplusplus.com/reference/cstring/strncat/">Link to official documentation.</a>

When you completed Step 14, a copy of ```step_11.c``` was created for you. Apply the changes to ```step_11_fix.c```. Click "Check Work" to test your fix.

### Step 16: Fixing ```sprintf```

Your file called ```step_12.c``` had its ```sprintf``` function broken. To keep the functionality of this file, but to make it safer, you will use ```snprintf```. Here is the signature for the function:

```int snprintf (char* s, size_t n, const char* format, ...);```

Parameters: 
- ```s``` is a pointer to a buffer where the resulting C-string is stored. The buffer should be at least ```n``` bytes in size.
- ```n``` is the maximum number of bytes to be written to the buffer, including the null terminator. Recall that ```size_t``` is an unsigned integer, meaning that it can't be negative. 
- ```format``` is the C string that contains a format string that follows the same specifications as format in ```printf```.
- ```...``` are additional parameters that can be used, but would be unnecessary for this step.

<a href="https://cplusplus.com/reference/cstdio/snprintf/?kw=snprintf">Link to official documentation.</a>

When you completed Step 15, a copy of ```step_12.c``` was created for you. Apply the changes to ```step_12_fix.c```. Click "Check Work" to test your fix.

## <strong>Topic 4: A Large-Scale Buffer Overflow Attack</strong>

In this final topic, you will be accessing a nuclear reactor simulator with four different vulnerabilities which can commonly appear in programs written in C:

1. <strong>Buffer Overflow:</strong> When more bytes are put into a buffer than the buffer has space for.
2. <strong>Off-By-One Errors:</strong> When a counting or inequality issue results in an arithmetic mistake of 1.
3. <strong>Integer Overflow/Underflow and Sign Errors:</strong> When the "wrapping" of an integer type, or an incorrect sign causes something unexpected to happen.
4. <strong>String Format Vulnerabilities:</strong> When a user is allowed to provide a format string to a ```printf``` statement.

You have already explored plenty of buffer overflow examples, as well as string format vulnerabilities. Off-By-One errors and integer errors are common mistakes that can occur in programming. However, as said before, C does not perform preliminary checks. This means when a small arithmetic error occurs, it may not be detected, and will cause logic errors.

<u>The scenario</u>: Wormwood is a nuclear reactor simulator. Written by an inexperienced C developer, plenty of mistakes are littered throughout the simulator. Many potential issues are present, and all it takes is for someone (either experienced or inexperienced) to use the wrong input, and catastrophe could occur.

You have already had a lot of practice with buffer overflows. These are the other three vulnerabilities that are present in the simulator:

<strong>Off-By-One errors</strong> occur when the arithmetic of a "for" loop is mishandled. Here's a common off-by-one error:

```
char list[] = {1, 2, 3, 4};
for (int i = 0; i <= 4; ++i) {
    printf("%d ", list[i]);
}
```

Which may print: ```1 2 3 4 0```

This does NOT cause a compiler or runtime error, despite accessing one more element than the list contains. The fifth value is whatever happens to sit in memory right after the array, so it will not always be 0. Off-by-one errors don't occur only in "for" loops. They can occur any time that you're trying to access a specific element from an array or a char from a string. The error is most common when indices are not double-checked, since novice programmers tend to forget that indices start at 0, not 1.

<strong>Integer overflow/underflow and sign errors</strong> occur when a number "rolls over" the maximum number of the integer's type. Take a look at this code:

```
#include <limits.h>
#include <stdio.h>

int main() {
    printf("%d", INT_MAX);
    return 0;
}

>> 2147483647
```

The ```INT_MAX``` constant is the largest value that an ```int``` can hold. Attempting to add one to ```INT_MAX``` will result in this:

```
#include <limits.h>
#include <stdio.h>

int main() {
    printf("%d", INT_MAX + 1);
    return 0;
}

>> main.c: In function ‘main’:
main.c:5:26: warning: integer overflow in expression of type ‘int’ results in ‘-2147483648’ [-Woverflow]
    5 |     printf("%d", INT_MAX + 1);
      |                          ^

>> -2147483648
```

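The integer has rolled over, and is now ```INT_MIN```. This is similar to rolling over 999,999 miles on an odometer, where the odometer will read 000,000 as the new mileage because it cannot output 1,000,000. Strictly speaking, the C standard leaves signed integer overflow undefined, so this wraparound is what typical compilers produce rather than a guarantee.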

A similar real-world example of an integer overflow occurred in 2020. An Android user took a photo of St. Mary Lake in Glacier National Park, but accidentally captured a color that could not be represented in the standard RGB (sRGB) color space. The RGB value of a specific pixel was 257, which exceeded the maximum of 255. Since Android phones could not process this "color", they would crash when the image was loaded onto the device. <a href="https://nokiamob.net/2020/06/09/explained-why-this-photo-crashes-some-android-phones/">You may read more about this here.</a>

A <strong>string format vulnerability</strong> occurs when user input is passed to ```printf``` as the format string itself, instead of as an argument to a fixed format (such as ```%s```, ```%d```, ```%f```, etc). When a user provides an input that contains a format specifier, information from the stack may be shown. Suppose a hacker sends ```%x %x %p %p %s %s``` through user input. This will fool ```printf``` into printing two hex values, two pointers, and two strings read from whatever addresses happen to be on the stack.

If you would like to read more about this type of vulnerability, you may visit <a href="https://www.geeksforgeeks.org/format-string-vulnerability-and-prevention-with-example/">this article</a>.