# Software security
The requirements for a program are:
* has to be correct
* has to be efficient
* has to be secure

When a program is able to perform actions outside of its intended behaviour, it can lead to insecurity.

## Security issues
### Improper implementation
When a program is not implemented properly, it can allow attackers to deviate from the programmer's intent.

### Unanticipated input
When the attacker is able to supply unanticipated input, it can cause the process to:
* access sensitive information
* deviate from the intended execution path
* execute injected code

This is a form of privilege escalation.

Programming languages offer a huge wealth of functionality.
Not knowing their nuances can lead to functionality being introduced in subtle, unintended ways.
This results in the program doing more than the developer expected.

## Computer architecture
### Code vs Data
Modern computers use the **Von Neumann computer architecture**.
This means that code and data are stored together in memory, thus there is no clear distinction between code and data.
This is unlike the **Harvard architecture** which has separate hardware for storing code and data.

This allows programs to be tricked into treating data as code, which is the basis for **code-injection attacks**.

### Attacks on software
#### Integer overflow

Integer arithmetic in most programming languages is actually modular arithmetic.
Because integers are (usually) stored as a fixed number of bits, there is a maximum value that an integer can hold.
If the datatype is made to hold a value larger than its maximum size, it will overflow.
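In some languages, this throws an explicit error; however, in most languages the value implicitly wraps around.
(In C, signed integer overflow is technically undefined behaviour, but on common platforms it typically wraps around in practice.)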

(Note that because Python does not have fixed-size integer primitives, our discussion does not apply to Python.)

For example, if the datatype has size of 8 bits (1 byte), then the range of values it can take as an unsigned integer would be 0-255.
Now suppose that `x=255`. If we increment `x` by 5, then we get `x=4`, because it wrapped around back to 0 when we increment it by 1, and we further increment it 4 times.
This makes the condition `x < x + 1` false at this boundary.
If the programmer made this assumption when developing the program, this can lead to a vulnerability.
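Since Python integers never overflow, the 8-bit wraparound above can be sketched by reducing modulo $2^8$ after each operation. The helper name `add_u8` is hypothetical and only models an unsigned 8-bit add; it is not how C is implemented.

```python
def add_u8(x: int, y: int) -> int:
    """Add two values as 8-bit unsigned integers, wrapping modulo 256."""
    return (x + y) % 256  # keep only the low 8 bits

x = 255
print(add_u8(x, 5))       # 4: 255 + 5 = 260, and 260 mod 256 = 4
print(x < add_u8(x, 1))   # False: x + 1 wraps around to 0
```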

Consider the following banking system.
The system allows the user to withdraw funds up to a certain limit, after which further withdrawals are denied.

In [2]:
%cd software-security-example

/home/own3d/wellspring/cyber-security/software-security-example


In [10]:
!cat withdraw.c

#include <stdio.h>

int process_withdraw(int previous_withdraw_amt, int requested_amt, int withdraw_limit) {
    if (previous_withdraw_amt + requested_amt < withdraw_limit)
        return requested_amt;
    else
        return 0;
}

int main(int argc, char *argv[])  {
   if(argc != 2) {
   	printf("Wrong number of arguments.\n");
	return 1;
   }
   
   int requested_amt;
   sscanf(argv[1], "%d", &requested_amt);

   int previous_withdraw_amt = 80;
   int WITHDRAW_LIMIT = 100;

   printf("You have previously withdrawn $%d\n", previous_withdraw_amt);
   printf("You have requested to withdraw $%d\n", requested_amt);
   
   int payout = process_withdraw(previous_withdraw_amt, requested_amt, WITHDRAW_LIMIT);
   
   if (payout > 0)
      printf("Here is your $%d. Have a nice day!\n", payout);
   else
      printf("Sorry, the requested amount is beyond the limit\n");
}


In [12]:
!./withdraw 10

You have previously withdrawn $80
You have requested to withdraw $10
Here is your $10. Have a nice day!


Since we have withdrawn \$80, the check accepts any request below \$20, so the above transaction goes through.

In [13]:
!./withdraw 30

You have previously withdrawn $80
You have requested to withdraw $30
Sorry, the requested amount is beyond the limit


As we can see, we cannot withdraw beyond our limit, or so it seems.

In [14]:
!./withdraw 2147483647

You have previously withdrawn $80
You have requested to withdraw $2147483647
Here is your $2147483647. Have a nice day!


As we can see, by requesting a large value, the sum `previous_withdraw_amt + requested_amt = 80 + 2147483647` overflows and wraps around to `-2147483569`, which is less than the withdraw limit of 100.
Keen readers would recognize `2147483647` as the maximum value a 32-bit signed integer can hold (`INT_MAX`), so adding any positive amount to it overflows.

Thus, we can bypass the check and withdraw more than the allowed amount.
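The check can be modelled in Python by wrapping the sum to the 32-bit two's-complement range, which sketches what the C program does on a typical platform (assuming 32-bit `int` that wraps on overflow). The wrapped sum is `INT_MIN + 79 = -2147483569`. `wrap_int32` and `process_withdraw_safe` are hypothetical helpers; the safer check is one possible fix under the assumption that `0 <= previous_withdraw_amt <= withdraw_limit`, so the subtraction itself cannot overflow.

```python
INT32_MIN = -2**31

def wrap_int32(v: int) -> int:
    """Reduce v into the 32-bit two's-complement range, as a wrapping int would."""
    return (v - INT32_MIN) % 2**32 + INT32_MIN

def process_withdraw(previous_withdraw_amt, requested_amt, withdraw_limit):
    # Mirrors the C check: the sum is computed in wrapping 32-bit arithmetic
    if wrap_int32(previous_withdraw_amt + requested_amt) < withdraw_limit:
        return requested_amt
    return 0

def process_withdraw_safe(previous_withdraw_amt, requested_amt, withdraw_limit):
    # Compare against the remaining allowance instead of computing a sum
    # that can overflow, and reject non-positive requests outright.
    remaining = withdraw_limit - previous_withdraw_amt
    if 0 < requested_amt < remaining:
        return requested_amt
    return 0

print(wrap_int32(80 + 2147483647))                   # -2147483569
print(process_withdraw(80, 2147483647, 100))         # 2147483647 (bypassed)
print(process_withdraw_safe(80, 2147483647, 100))    # 0 (denied)
```

Rearranging the comparison so that no intermediate value can exceed the type's range is a common way to avoid this class of bug in C as well.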

#### Inconsistent data string representation
When different parts of the program adopt different data representation, it could lead to a vulnerability.

We have seen one such vulnerability in [null-byte injection of domain](./public_key_infrastructure.ipynb#null-byte-injection), which occurred because certificate verification treated strings as not null-terminated, while the address check treated them as null-terminated.

We can also consider the following system.
The system stores each user's documents in their own home directory.
It exposes a public interface where users are allowed to query for files using their file names.
The user's home directory will be searched for the desired file.

In [10]:
from urllib.parse import unquote

BASE_URL = '/home'
USER = 'alice'

def _get_file(base_url, user, file_name):
    file_name = unquote(file_name)
    target_file = f'{base_url}/{user}/{file_name}'
    print(f'Giving the user the file: {target_file}')

alice_get_file = lambda f: _get_file(BASE_URL, USER, f)

Note the use of `unquote`.
Because the file name may be sent as a query parameter in the URL, and it may contain characters that are not allowed in a URL, the file name may be "percent-encoded" to represent these characters.

For example, if the user requests the file `e=mc2.txt`, the encoded representation sent to the system will be `e%3Dmc2.txt`, because `=` is a reserved character in URLs.
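This encoding can be reproduced with `quote` and `unquote` from Python's `urllib.parse`, the same module the example imports `unquote` from. The last line shows that decoding applies to any percent-encoded character, including ones that are not reserved.

```python
from urllib.parse import quote, unquote

encoded = quote('e=mc2.txt')   # '=' is reserved, so it becomes %3D
print(encoded)                 # e%3Dmc2.txt
print(unquote(encoded))        # e=mc2.txt
print(unquote('%2e%2e%2f'))    # ../
```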

In [22]:
alice_get_file('note.txt')

Giving the user the file: /home/alice/note.txt


Typical usage involves the user requesting a file by name, and the system returning that file from within their home directory.

In [12]:
alice_get_file('../bob/note.txt')

Giving the user the file: /home/alice/../bob/note.txt


However, notice that the user can inject `../` to change the directory that the system is returning.
In UNIX systems, `../` refers to the parent directory, so the file path resolves to `/home/bob/note.txt`.
This lets Alice illegally obtain Bob's documents.
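The resolution of `../` can be checked with `posixpath.normpath`, which collapses `.` and `..` components purely lexically (it does not consult the file system or resolve symbolic links).

```python
import posixpath

requested = '/home/alice/../bob/note.txt'
print(posixpath.normpath(requested))   # /home/bob/note.txt
```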

Thus, a programmer may implement a sanitization function which ensures that `../` is not part of the file name requested.

In [21]:
from urllib.parse import unquote

BASE_URL = '/home'
USER = 'alice'

def _sanitized_get_file(base_url, user, file_name):
    if '../' in file_name:
        raise Exception('"../" detected in file name')
    file_name = unquote(file_name)
    target_file = f'{base_url}/{user}/{file_name}'
    print(f'Giving the user the file: {target_file}')

alice_get_file = lambda f: _sanitized_get_file(BASE_URL, USER, f)

In [18]:
alice_get_file('../bob/note.txt')

Exception: "../" detected in file name

As we can see, the system (seemingly) works.
However, consider the following:

In [20]:
alice_get_file('%2e./bob/note.txt')
alice_get_file('..%2fbob/note.txt')

Giving the user the file: /home/alice/../bob/note.txt
Giving the user the file: /home/alice/../bob/note.txt


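The sanitization check inspects the percent-encoded input, while the path is built from the decoded input, so the two operate on different representations of the same data.
The attacker can therefore bypass the check by percent-encoding part of `../` (for example `%2e` for `.` or `%2f` for `/`), which the check does not recognize but `unquote` later turns back into `../`.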
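One way to remove the inconsistency is to decode first, build the final path, and then validate the *result* rather than the raw input. The function `resolve_user_file` below is a hypothetical, safer variant of `_sanitized_get_file`, sketched under the assumption of POSIX-style paths.

```python
import posixpath
from urllib.parse import unquote

def resolve_user_file(base_url: str, user: str, file_name: str) -> str:
    """Hypothetical safer variant of _sanitized_get_file (illustrative only)."""
    home = posixpath.join(base_url, user)
    # 1. Decode first, so the check sees the same string that is used.
    decoded = unquote(file_name)
    # 2. Build and normalize the final path ("a/../b" becomes "b").
    target = posixpath.normpath(posixpath.join(home, decoded))
    # 3. Check the result: it must stay inside the user's home directory.
    #    commonpath (rather than startswith) also rejects e.g. /home/alice2.
    if posixpath.commonpath([home, target]) != home:
        raise ValueError(f'{file_name!r} escapes {home}')
    return target

print(resolve_user_file('/home', 'alice', 'note.txt'))   # /home/alice/note.txt
for attempt in ['../bob/note.txt', '%2e./bob/note.txt', '..%2fbob/note.txt']:
    try:
        resolve_user_file('/home', 'alice', attempt)
    except ValueError as e:
        print('rejected:', e)
```

Because `normpath` is lexical, a real file server would also need to consider symbolic links inside the home directory (for example by checking `os.path.realpath`).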

#### Buffer overflow

##### Background
Refer to [computer organization](../computer-organization/stack.ipynb).

In C/C++, memory is managed by the programmer and array accesses are not bounds-checked, so illegal memory accesses are not prevented.
Local variables are stored next to each other on the stack, so indexing past the end of one variable can reach an adjacent one.
This allows variables to be illegally read or written.

Consider the following code:

In [28]:
!cat buffer_overflow.c

#include <stdio.h>

void change_value(int index, int value) {
	char a[10];
	char b = 100;

	printf("The value of b is %d\n", b);
	printf("Changing index %d of a to %d\n", index, value);

	a[index] = value;	/* no bounds check: any index is accepted */
	printf("The value of b is %d\n", b);	
}

int main(int argc, char *argv[])  {
	if(argc != 3) {
   		printf("Wrong number of arguments.\n");
		return 1;
   	}

	int index, value;
	sscanf(argv[1], "%d", &index);
	sscanf(argv[2], "%d", &value);

	change_value(index, value);	
}


Note that we use `char` for `a` and `b` so that the variables are packed next to each other on the stack.
Since `b` sits directly after the end of `a` in memory (the exact layout is compiler-dependent), writing to `a[10]`, one past the last valid index, modifies `b` even though we only ever write to `a`, as shown below.

In [51]:
%%bash
./buffer_overflow 10 42

The value of b is 100
Changing index 10 of a to 42
The value of b is 42


From the background on how variables are stored, readers may have noticed that the return address is also stored on the stack.
Thus, attackers can overwrite the return address, causing the function to return to a location of their choosing (for example, a different function) rather than to its original caller.
This attack is called **stack smashing**.

If we called `./buffer_overflow 11 9`, it would lead to a segmentation fault, because the write overwrites the return address, causing the program to return to an unintended address.
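Because C performs no bounds check, `a[index]` simply means "the byte `index` positions after the start of `a`". The toy model below sketches this with a Python `bytearray` standing in for a stack frame. The layout (10 bytes of `a`, then `b`, then an 8-byte return address), the helper names and the address value are all made up for illustration; real layouts depend on the compiler, its flags (such as stack protectors) and the architecture.

```python
# Hypothetical, simplified frame layout:
#   offsets 0..9   -> a[10]
#   offset  10     -> b
#   offsets 11..18 -> saved return address (8 bytes, little-endian)
A_OFFSET, A_LEN = 0, 10
B_OFFSET = 10
RET_OFFSET, RET_LEN = 11, 8

def new_frame() -> bytearray:
    frame = bytearray(RET_OFFSET + RET_LEN)
    frame[B_OFFSET] = 100
    frame[RET_OFFSET:] = (0x401136).to_bytes(RET_LEN, 'little')  # made-up address
    return frame

def unchecked_write(frame: bytearray, index: int, value: int) -> None:
    # Like C's a[index] = value: the index becomes an offset with no bounds check
    frame[A_OFFSET + index] = value & 0xFF

def checked_write(frame: bytearray, index: int, value: int) -> None:
    # The fix: refuse any index outside a[0..9]
    if not 0 <= index < A_LEN:
        raise IndexError(f'index {index} outside a[{A_LEN}]')
    frame[A_OFFSET + index] = value & 0xFF

frame = new_frame()
unchecked_write(frame, 10, 42)                                # like ./buffer_overflow 10 42
print(frame[B_OFFSET])                                        # 42: b was overwritten
unchecked_write(frame, 11, 9)                                 # like ./buffer_overflow 11 9
print(hex(int.from_bytes(frame[RET_OFFSET:], 'little')))     # 0x401109: "return address" corrupted
```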

#### SQL injection

Suppose we have the following system, where the program only allows access to users who can provide a valid user name and secret name pair.
The users are stored in a SQL database and are retrieved via SQL queries.

In [8]:
import sqlite3
from sqlite3 import Error

class GateKeeper:
    def __init__(self):
        def create_connection(path):
            connection = sqlite3.connect(path)
        
            return connection
    
        self.connection = create_connection('./users.sqlite')

    def authenticate(self, name, secret_name):
        select_users = f"SELECT * FROM users WHERE name='{name}' AND secret_name='{secret_name}'"
        users = self._execute_read_query(select_users)
        
        if len(users) == 0:
            print("Name and secret name does not match. Villains are not allowed!")
        else:
            user = users[0]
            _, name, secret_name = user
            print(f"Name and secret name matches. Welcome to the club, {name} (alias {secret_name}).")

    def dump_data(self):
        users = self._execute_read_query("SELECT * FROM USERS")
        
        for user in users:
            print(user)
            
    def _execute_read_query(self, query):
        print(f"Executing query: {query}")
        cursor = self.connection.cursor()
        result = None
        try:
            cursor.execute(query)
            result = cursor.fetchall()
            return result
        except Error as e:
            print(f"The error '{e}' occurred")

gate_keeper = GateKeeper()

In [9]:
gate_keeper.dump_data()

Executing query: SELECT * FROM USERS
(1, 'Alice', 'Diana')
(2, 'Bob', 'Clark')
(3, '0WN3D', '0WN463')


The above is the list of users currently in the database.

In [6]:
input_name, input_secret_name = "0WN3D", "0WN463"
gate_keeper.authenticate(input_name, input_secret_name)

Executing query: SELECT * FROM users WHERE name='0WN3D' AND secret_name='0WN463'
Name and secret name matches. Welcome to the club, 0WN3D (alias 0WN463).


In [11]:
input_name, input_secret_name = "Hacker", "pwner_1337"
gate_keeper.authenticate(input_name, input_secret_name)

Executing query: SELECT * FROM users WHERE name='Hacker' AND secret_name='pwner_1337'
Name and secret name does not match. Villains are not allowed!


As we can see, valid users are allowed while invalid users are denied, or so it seems.

Notice that to build the query, the user's input is substituted directly into the command.
Consider what happens if the user's input contains a `'`, for instance `'something something` for the name field.
The resulting SQL command would be `SELECT * FROM users WHERE name=''something something' AND secret_name='SOME_SECRET'`.

In [12]:
input_name, input_secret_name = "'something something", "SOME_SECRET"
gate_keeper.authenticate(input_name, input_secret_name)

Executing query: SELECT * FROM users WHERE name=''something something' AND secret_name='SOME_SECRET'
The error 'near "something": syntax error' occurred


TypeError: object of type 'NoneType' has no len()

Notice that the resultant SQL command is invalid, thus causing an error.
This is an indication that we are able to modify the underlying SQL command.
Now, suppose that the attacker sets the name field to be `' OR 1=1 --`.


The resultant SQL command will be: 
```
SELECT * FROM users WHERE name='' OR 1=1 --' AND secret_name='SOME_SECRET'
```

In SQL, `--` starts a comment that runs to the end of the line, so the command that is actually executed is:
```
SELECT * FROM users WHERE name='' OR 1=1
```

The `--` discards the rest of the original SQL template, which would otherwise trigger a syntax error (here, because of the leftover `'`).

Now, notice that in the SQL statement, we are checking if the name is blank, which is false for all users in the system.
However, we perform an `OR` operation against `1=1`, which is always true.
Thus, the `WHERE` condition is true for every row, and all users are returned.
Hence, we can trick the system into thinking we provided credentials that matched one of the users.

In [18]:
input_name, input_secret_name = "' OR 1=1 --", "SOME_SECRET"
gate_keeper.authenticate(input_name, input_secret_name)

Executing query: SELECT * FROM users WHERE name='' OR 1=1 --' AND secret_name='SOME_SECRET'
Name and secret name matches. Welcome to the club, Alice (alias Diana).


Thus, we have authenticated at an endpoint without knowing the credentials.
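The standard defence is a parameterized query, where the SQL text contains `?` placeholders and the user's input is passed separately, so the database always treats it as data. The sketch below rebuilds the example table in an in-memory SQLite database (the schema `id, name, secret_name` is assumed from the dump above) and contrasts string interpolation with placeholders; the function names are hypothetical.

```python
import sqlite3

def make_demo_db() -> sqlite3.Connection:
    # In-memory copy of the example table (assumed schema)
    conn = sqlite3.connect(':memory:')
    conn.execute('CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, secret_name TEXT)')
    conn.executemany('INSERT INTO users (name, secret_name) VALUES (?, ?)',
                     [('Alice', 'Diana'), ('Bob', 'Clark'), ('0WN3D', '0WN463')])
    return conn

def authenticate_unsafe(conn, name, secret_name):
    # String interpolation: the input becomes part of the SQL text
    query = f"SELECT * FROM users WHERE name='{name}' AND secret_name='{secret_name}'"
    return conn.execute(query).fetchall()

def authenticate_safe(conn, name, secret_name):
    # Placeholders: the input is bound as a value and never parsed as SQL
    query = 'SELECT * FROM users WHERE name = ? AND secret_name = ?'
    return conn.execute(query, (name, secret_name)).fetchall()

conn = make_demo_db()
payload = "' OR 1=1 --"
print(len(authenticate_unsafe(conn, payload, 'SOME_SECRET')))  # 3: every row matches
print(len(authenticate_safe(conn, payload, 'SOME_SECRET')))    # 0
print(authenticate_safe(conn, 'Alice', 'Diana'))               # [(1, 'Alice', 'Diana')]
```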


An example of a less clearly defined boundary between layers in a computer system is the one between the OS and the kernel.
Some literature considers them part of the same layer.

**Process integrity** is the assurance that the process will not deviate from its intended execution path.

The layers are arranged from the least privileged (application) to the most privileged (hardware).
Thus, a secure system would be such that if one of the layers is compromised by an attacker, they are not able to manipulate the objects in the inner layers.
(Note that this is rather difficult to achieve, due to numerous issues such as implementation errors, user error *etc*)

One can imagine the chaos if an attacker who is able to perform SQL injection on the DBMS (service layer) is somehow able to obtain the password file (OS layer) of the system through it.
Or if an attacker who employed cross-site scripting to compromise the browser (application layer) is able to burn out the CPU (hardware layer).

## Access control model
Access control is required in a computer system to restrict the **operations** that can be performed by some **entity** on some **objects**.

Operations can be categorized as:
* Observe (*eg* reading a file)
* Alter (*eg* writing to a file, replacing a file, deleting a file, changing ownership)
* Action (*eg* executing a file)

Suppose that a **subject/principal** wants to perform some **operation** on some **object**.
An example would be the user with name `John` wishing to `read` the file `johns_grades.txt`.
The entity known as the **reference monitor** is responsible for deciding whether to allow or deny the access.
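A reference monitor can be sketched as a function that is consulted before every operation and denies anything the policy does not explicitly grant. The `POLICY` dictionary and the `reference_monitor` function are hypothetical; in a real system this check is enforced by privileged code such as the OS kernel.

```python
# Hypothetical policy: (subject, object) -> set of permitted operations
POLICY = {
    ('John', 'johns_grades.txt'): {'read'},
}

def reference_monitor(subject: str, operation: str, obj: str) -> bool:
    """Allow the request only if the policy explicitly grants it (default deny)."""
    return operation in POLICY.get((subject, obj), set())

print(reference_monitor('John', 'read', 'johns_grades.txt'))    # True
print(reference_monitor('John', 'write', 'johns_grades.txt'))   # False
```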

There are two main types of access control:

### Discretionary access control model
The owner of the object decides the rights.

### Mandatory access control model
A system-wide policy that decides the rights.

## Access control representation
An **access control matrix** shows the relationship between principals and objects.

|  | sudo | passwd | common.txt |
| --- | ---| --- | --- |
|root | {run} | {read, write} | {read, write} |
| Alice | {run} | {read} | {read, write} |
| Bob | {} |  {read} | {read, write} |

This matrix can be represented in two ways.

### Access control list
Stores the rights to a particular object as a list.

```
sudo: [(root, {run}), (Alice, {run})]
passwd: [(root, {read, write}), (Alice, {read}), (Bob, {read})]
common.txt: [(root, {read, write}), (Alice, {read, write}), (Bob, {read, write})]
```

### Capabilities
Stores the capabilities of each subject as a list.

```
root: [(sudo, {run}), (passwd, {read, write}), (common.txt, {read, write})]
Alice: [(sudo, {run}), (passwd, {read}), (common.txt, {read, write})]
Bob: [(passwd, {read}), (common.txt, {read, write})]
```

Note that with ACLs it is difficult to find which objects a particular subject has access to, but this is easy with capabilities.
The reverse holds when finding which subjects have access to a certain object.
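Both representations are just the rows or the columns of the same matrix. The sketch below encodes the table above as nested dictionaries and derives each view; `MATRIX`, `to_acls` and `to_capabilities` are illustrative names, not a real library.

```python
# The access control matrix from the table above: subject -> object -> rights
MATRIX = {
    'root':  {'sudo': {'run'}, 'passwd': {'read', 'write'}, 'common.txt': {'read', 'write'}},
    'Alice': {'sudo': {'run'}, 'passwd': {'read'}, 'common.txt': {'read', 'write'}},
    'Bob':   {'sudo': set(), 'passwd': {'read'}, 'common.txt': {'read', 'write'}},
}

def to_capabilities(matrix):
    # One list per subject (a row of the matrix); empty rights are dropped
    return {subj: [(obj, rights) for obj, rights in row.items() if rights]
            for subj, row in matrix.items()}

def to_acls(matrix):
    # One list per object (a column of the matrix)
    acls = {}
    for subj, row in matrix.items():
        for obj, rights in row.items():
            if rights:
                acls.setdefault(obj, []).append((subj, rights))
    return acls

acls, caps = to_acls(MATRIX), to_capabilities(MATRIX)
# "Who can access passwd?" is a single lookup in the ACLs...
print([subj for subj, _ in acls['passwd']])
# ...but requires scanning every subject's capability list.
print([subj for subj, cl in caps.items() if any(obj == 'passwd' for obj, _ in cl)])
```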

However, notice that the matrix can get really large when there are many users and numerous files.

Thus, to help simplify the matrix representation, we can group the users and define access rights to these groups instead.

## UNIX access control
File permissions consist of rights to the following user classes:
* owner
* owner's group(s)
* others

In UNIX, groups can only be created by `root`.

Because all resources are treated as files in UNIX, we can define access controls on these resources (hardware, I/O) the same way we do for files.

Each user:
* has a unique username
* has a unique user identifier (**UID**)
    * stored in `/etc/passwd`
* can belong to one or more groups
    * first group is stored in `/etc/passwd`
    * other groups are stored in `/etc/group`
    
Each group:
* has a unique group name
* has a unique group identifier (**GID**)

Purpose of UID/GID:
* determine ownership of system resources
* determine credential of running processes
* control permissions granted to processes

There is a special user called the **superuser**, with UID 0, typically named `root`.
All security checks are disabled for the superuser.

### passwd file
The `passwd` file is made readable by everyone because some processes require information in it.

In older versions of UNIX, the password hash was stored in this file.
This allowed attackers to perform offline password guessing to crack the password.

In newer versions, the hash is stored elsewhere, typically in `/etc/shadow`.
This file is not readable by everyone.

### shadow file

Within the shadow file, each entry is formatted as follows (fields separated by `:`):
* login name
* hashed password
* date of last password change
* minimum password age
* maximum password age
* password warning period
* password inactivity period
* account expiration date
* reserved field 

The second field (hashed password) has the following format:

`$id$salt$hash`

where `id` corresponds to the hashing method used (5 = SHA-256, 6 = SHA-512, *etc*).
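The entry layout can be illustrated by splitting a line on `:` and the password field on `$`. The sample line and hash below are made up, and the sketch assumes the simple `$id$salt$hash` form; real entries may carry extra parameters (such as `rounds=`) or a locked marker like `!` or `*`, which this parser would reject. `parse_shadow_line` is a hypothetical helper.

```python
SHADOW_FIELDS = ['login', 'password', 'last_change', 'min_age', 'max_age',
                 'warn_period', 'inactivity', 'expiration', 'reserved']

HASH_METHODS = {'1': 'MD5', '5': 'SHA-256', '6': 'SHA-512'}

def parse_shadow_line(line: str) -> dict:
    entry = dict(zip(SHADOW_FIELDS, line.rstrip('\n').split(':')))
    # Assumes exactly $id$salt$hash; anything else raises ValueError
    _, method_id, salt, digest = entry['password'].split('$')
    entry['method'] = HASH_METHODS.get(method_id, f'unknown ({method_id})')
    entry['salt'], entry['hash'] = salt, digest
    return entry

# Made-up entry (the hash is truncated and not a real SHA-512 digest)
entry = parse_shadow_line('alice:$6$abcdefgh$XYZ:19000:0:99999:7:::')
print(entry['login'], entry['method'], entry['salt'])   # alice SHA-512 abcdefgh
```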
    

### Processes
A new process is spawned when running an executable file, or a child is forked from a parent process.

Each process has its own **process ID (PID)**.
Use `ps aux` to see all processes and their PID.

### File permissions
File permissions are represented by 9 characters, with each triplet corresponding to a certain user class.
The first triplet corresponds to the owner; the next to users in the group and the last to everyone else.

Each triplet corresponds to the three actions, `r` read, `w` write, `x` execute.
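Each of the 9 characters is one bit, so a permission string maps directly onto the familiar octal mode (`rwxr-xr-x` is `755`). The helper `perm_string_to_octal` is a hypothetical sketch that ignores the special bits; the reverse direction is provided by the standard library's `stat.filemode`.

```python
import stat

def perm_string_to_octal(perms: str) -> int:
    """Convert a 9-character string such as 'rwxr-xr-x' into its octal mode."""
    if len(perms) != 9:
        raise ValueError('expected exactly 9 permission characters')
    value = 0
    # Walk the owner, group and others triplets in order, one bit per character
    for ch, expected in zip(perms, 'rwxrwxrwx'):
        value = (value << 1) | (ch == expected)
    return value

print(oct(perm_string_to_octal('rwxr-xr-x')))   # 0o755
print(oct(perm_string_to_octal('rw-r--r--')))   # 0o644
print(stat.filemode(stat.S_IFREG | 0o644))      # -rw-r--r--
```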

In [47]:
!exa -l

[1;34md[33mr[31mw[32mx[0m[33mr[38;5;244m-[32mx[33mr[38;5;244m-[32mx[0m    [38;5;244m-[0m [1;33mown3d[0m [34m10 Jul 11:19[0m [1;34maccess-control-example[0m
.[1;33mr[31mw[0m[38;5;244m-[33mr[38;5;244m--[33mr[38;5;244m--[0m  [1;32m25[0m[32mk[0m [1;33mown3d[0m [34m10 Jul 11:45[0m access_control.ipynb
.[1;33mr[31mw[0m[38;5;244m-[33mr[38;5;244m--[33mr[38;5;244m--[0m  [1;32m17[0m[32mk[0m [1;33mown3d[0m [34m 6 Jul 08:28[0m authentication.ipynb
.[1;33mr[31mw[0m[38;5;244m-[33mr[38;5;244m--[33mr[38;5;244m--[0m  [1;32m75[0m[32mk[0m [1;33mown3d[0m [34m26 Jun 11:12[0m classical_ciphers.ipynb
.[1;33mr[31mw[0m[38;5;244m-[33mr[38;5;244m--[33mr[38;5;244m--[0m  [1;32m25[0m[32mk[0m [1;33mown3d[0m [34m 6 Jul 23:49[0m data_origin_authentication.ipynb
.[1;33mr[31mw[0m[38;5;244m-[33mr[38;5;244m--[33mr[38;5;244m--[0m [1;32m4.2[0m[32mk[0m [1;33mown3d[0m [34m 7 Jul 23:16[0m introduction.ipynb
.[1;33mr

Note that I use `exa` instead of `ls` because it is more convenient for everyday use.

#### Special permissions
##### Set-UID
Represented by an `s` replacing the owner's `x` bit, this causes the process's **effective UID** to be that of the file's owner, rather than that of the user running it.

##### Set-GID
Represented by an `s` replacing the group's `x` bit, this causes the process's **effective GID** to be that of the file's group.

##### Sticky bit
Represented by a `t` replacing the others' `x` bit. When set on a directory, only a file's owner, the directory's owner or root can delete or rename files in that directory.

Note that only the owner or root can change the permission of a file.
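The standard library's `stat.filemode` renders a mode value the way `ls -l` does, which makes the special bits easy to inspect. It also shows the convention that a lowercase `s`/`t` means the underlying `x` bit is set, while an uppercase `S`/`T` means it is not.

```python
import stat

print(stat.filemode(stat.S_IFREG | stat.S_ISUID | 0o755))   # -rwsr-xr-x (set-UID)
print(stat.filemode(stat.S_IFREG | stat.S_ISGID | 0o755))   # -rwxr-sr-x (set-GID)
print(stat.filemode(stat.S_IFDIR | stat.S_ISVTX | 0o777))   # drwxrwxrwt (sticky)
print(stat.filemode(stat.S_IFREG | stat.S_ISUID | 0o644))   # -rwSr--r-- (set-UID, no x)
```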

A process has its **process credentials**, determined by its **real UID** and **effective UID**.

Real UID is inherited from the user who ran the process, which identifies the real owner of the process.

When set-UID bit is not set, the effective UID of the process is the real UID.

When set-UID bit is set, the effective UID of the process is the file owner's UID.

#### Purpose of set-UID
Suppose we have a password file, which contains every user's password.
Thus, it makes sense to not make it readable/writable to everyone.

However, suppose that we also are required to allow users to change their own passwords.
Because the file contains everyone else's passwords, we cannot allow it to be writable to the user, thus we cannot satisfy the above requirement.

Hence, `root` can create a special program `change_password` which interacts with the password file.
This file is made executable by all the users.
To allow the password file to be modified when the user runs the program, we set the set-UID bit of the program.

With the set-UID bit, the process's effective UID becomes that of `root` (the program's owner), allowing access to the password file in a controlled manner through this program.
This allows **temporary privilege escalation** of the user when they run the program.

Since the privilege of the user is temporarily escalated, it is important to ensure that there is no vulnerabilities in the program.
If the attacker finds a vulnerability, they can use it to perform malicious actions that they otherwise were not able to perform due to insufficient privileges.
These are known as **privilege escalation attacks**.

##### Example

In [36]:
%cd access-control-example
from getpass import getpass

/home/own3d/wellspring/cyber-security/access-control-example


This is our file structure.

In [37]:
!exa -l

.[1;33mr[31mw[0m[35ms[33mr[38;5;244m-[32mx[33mr[38;5;244m-[32mx[0m [1;32m53[0m[32mk[0m root  [34m10 Jul 11:14[0m [1;32mchange_secret_name[0m
.[1;33mr[31mw[0m[38;5;244m-[33mr[38;5;244m--[33mr[38;5;244m--[0m [1;32m359[0m [1;33mown3d[0m [34m10 Jul 11:14[0m change_secret_name.py
.[1;33mr[31mw[0m[38;5;244m-------[0m  [1;32m52[0m root  [34m10 Jul 11:19[0m secret_names.txt


Within `secret_names.txt`, there is a list of users and their corresponding secret names.
We require that no one is able to peek at the other's secret name, thus the file is not world-readable and not world-writable.

(Note that the commands run with `sudo` are not available to a regular user; they are used here only to display the contents of the file for demonstration.)

In [38]:
!echo {getpass()} | sudo -S cat secret_names.txt | sed '1 i\ '

········
[sudo] password for own3d:  
name	secret name
own3d	0wn463
alice	diana
bob	clark


In [39]:
!cat secret_names.txt

cat: secret_names.txt: Permission denied


Thus, we employ a binary with the set-UID bit set (note the `s` in its file permissions) to allow controlled access to the `secret_names.txt` file.

`change_secret_name` is the compiled binary built from the source code in `change_secret_name.py`.
We need to compile the Python script into a native binary because Linux ignores the set-UID bit on interpreted scripts, such as Bash and Python scripts.

In [40]:
!cat change_secret_name.py

import os
import sys
import re

SECRET_FILE = 'secret_names.txt'

with open(SECRET_FILE, 'r') as f:
    data = f.read()

user = os.getenv('USER')
new_secret = sys.argv[1]

print(f'{user} is changing their secret name to {new_secret}')

new_data = re.sub(rf'{user}\t.*', rf'{user}\t{new_secret}', data)

with open(SECRET_FILE, 'w') as f:
    f.write(new_data)


###### Changing secret name


In [41]:
!./change_secret_name new_name

own3d is changing their secret name to new_name


In [42]:
!echo {getpass()} | sudo -S cat secret_names.txt | sed '1 i\ '

········
[sudo] password for own3d:  
name	secret name
own3d	new_name
alice	diana
bob	clark


As we can see, we can use the executable to change the name without having direct root access.

###### Vulnerability

However, keen readers may have already spotted the vulnerability in the program.
It assumes that the user running the executable is the one named by the `USER` environment variable.
However, environment variables are controlled by the caller, so `USER` can be set to any value, like so.

In [43]:
!USER=alice ./change_secret_name enemies_stand

alice is changing their secret name to enemies_stand


In [44]:
!echo {getpass()} | sudo -S cat secret_names.txt | sed '1 i\ '

········
[sudo] password for own3d:  
name	secret name
own3d	new_name
alice	enemies_stand
bob	clark


Thus, I am able to modify `alice`'s secret name even though I am not `alice`.
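One possible fix, sketched below, is to identify the caller by the process's **real UID** instead of `USER`: in a set-UID program the real UID still belongs to the invoking user, and the caller cannot change it through the environment. The sketch also validates the new secret name (so it cannot inject tabs or newlines into the file) and matches the user name exactly, whereas the original regex would also match a user name that appears as a suffix of another (e.g. `d` inside `own3d`). `invoking_user` and `update_secret` are hypothetical names, and the compiled binary would need to carry the same logic.

```python
import os
import pwd
import re

def invoking_user() -> str:
    # The real UID identifies who ran the program; unlike $USER,
    # it is not taken from the caller-controlled environment.
    return pwd.getpwuid(os.getuid()).pw_name

def update_secret(data: str, user: str, new_secret: str) -> str:
    # Only allow characters that cannot break the tab/newline-separated format
    if not re.fullmatch(r'[A-Za-z0-9_]+', new_secret):
        raise ValueError('secret name may only contain letters, digits and _')
    lines = data.splitlines(keepends=True)
    for i, line in enumerate(lines):
        name, sep, _ = line.partition('\t')
        if sep and name == user:          # exact match on the name column
            lines[i] = f'{user}\t{new_secret}\n'
    return ''.join(lines)
```

In the original script, the `user = os.getenv('USER')` line would become `user = invoking_user()`, and the `re.sub` call would become `update_secret(data, user, new_secret)`.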