# CS 458

# Lecture 1: May 2, 2018

## What is security

In the context of computers, security usually means:

1. **C**onfidentiality
    * Access to systems or data is limited to authorized parties
2. **I**ntegrity
    * Receive the "right" data
3. **A**vailability
    * System or data is there when needed

A computing system is _usually_ said to be secure if it has all three properties.

### Security and reliability

Security goes hand in hand with "reliability."  
A secure system can be relied on to:

1. Keep personal data confidential
2. Allow only authorized access or modifications to resources
3. Ensure that any produced results are correct
4. Give correct and meaningful results _on demand_

## What is privacy

A useful definition of privacy: "informational self-determination"

* The _user_ gets to _control_ information _about the user_
* "Control" means:
    * Who gets to see it
    * Who gets to use it
    * What they can use it for
    * Who they can give it to
    * etc

### PIPEDA

Personal Information Protection and Electronic Documents Act.  
Canada's private-sector privacy legislation.

Ten Fair Information Principles companies need to abide by:

1. Identify the purpose of data collection
2. Obtain consent
3. Limit collection
4. Limit use, disclosure and retention
5. Use appropriate safeguards
6. Give individuals access
7. Be accurate
8. Be open
9. Be accountable
10. Provide recourse

## Security vs. privacy

Sometimes people present security and privacy as if they're opposing forces.  
Are they really? Do we have to give up one to get the other?

## Who are the adversaries

Who's trying to mess with us?  
Various groups:

* Murphy
* Amateurs
* "Script kiddies"
* Crackers
* Organised crime
* Government "cyberwarriors"
* Terrorists

## Terminologies

### Assets

Things we might want to protect, such as:

* Hardware
* Software
* Data

### Vulnerabilities

Weaknesses in a system that could be **exploited** to cause loss or harm

### Threats

A loss or harm that might befall a system

Four major categories of threats:

1. Interception
2. Interruption
3. Modification
4. Fabrication

When designing a system, we need to state the **threat model**:

* Set of threats we are undertaking to defend against
* **Whom** do we want to prevent from doing **what**

### Attack

An action which **exploits** a **vulnerability** to **execute** a **threat**

### Control/Defence

Removing or reducing a vulnerability.  
You **control** a **vulnerability** to prevent an **attack** and defend against a **threat**.

## Methods of defence

* **Prevent it**: prevent the attack
* **Deter it**: make the attack harder or more expensive
* **Deflect it**: make yourself less attractive to attacker
* **Detect it**: notice that attack is occurring (or has occurred)
* **Recover from it**: mitigate the effects of the attack

Often, we'll want to do many things to defend against the same threat: "**Defense in depth**"

## How secure should we make it?

Principle of Easiest Penetration

* "A system is only as strong as its weakest link"
* The attacker will go after whatever part of the system is easiest for the attacker, not most convenient for the defender
* In order to build secure systems, we need to **learn how to think like an attacker**

Principle of Adequate Protection

* "Security is economics"
* Don't spend a lot of money to protect a system that can only cause a little damage

## Defence of computer systems

We may want to protect any of our **assets**: hardware, software, data.

Many ways to achieve this (not exhaustive).

### Cryptography

Protecting data by making it unreadable to an attacker.  
Authenticating users with digital signatures.  
Authenticating transactions with cryptographic protocols.  
Ensuring the integrity of stored data.  
Aiding customers' privacy by having their personal information automatically become unreadable after a certain length of time.

### Software controls

Passwords and other forms of access control.  
Operating systems separate users' actions from each other.  
Virus scanners watch for some kinds of malware.  
Development controls enforce quality measures on the original source code.  
Personal firewalls that run on your desktop.

### Hardware controls

Not usually protection of the hardware itself, but rather using separate hardware to protect the system as a whole.  
Fingerprint readers.  
Smart tokens.  
Firewalls.  
Intrusion detection systems.

### Physical controls

Protection of the hardware itself, as well as physical access to the console, storage media, etc.  
Locks.  
Guards.  
Off-site backups.  
Don't put your data centre on a fault line in California.  
Don't put your nuclear power plant in a tsunami zone.

### Policies and procedures

Non-technical means can be used to protect against some classes of attack.

If an employee connects his own Wi-Fi access point to the internal company network, that can accidentally open the network to outside attack, so don't allow the employee to do that!

Rules about choosing passwords.  
Training in best security practices.

# Lecture 2: May 7, 2018

## Secure programs

Why is it so hard to write secure programs?  
A simple answer:

* Axiom (Murphy): programs have bugs
* Corollary: security-relevant programs have security bugs

## Flaws, faults, and failures

A **flaw** is a problem in a program.  
A **security flaw** is a problem that affects security in some way: confidentiality, integrity, or availability.

Flaws come in two types: **faults** and **failures**.

A **fault** is a mistake, "behind the scenes"

* An error in the code, data, specification, process, etc.
* A fault is a **potential problem**

A **failure** is when something _actually_ goes wrong

* You log in to the library’s web site, and it shows you someone else’s account
* "Goes wrong" means a deviation from the desired behaviour, not necessarily from the specified behaviour!
    * The specification itself may be wrong

A fault is the programmer/specifier/inside view.  
A failure is the user/outside view.

### Finding and fixing faults

How do you find a fault?

* If a user experiences a failure, you can try to work backwards to uncover the underlying fault
* What about faults that haven't (yet) led to failures?
* Intentionally try to _cause_ failures, then proceed as above
    * Think like an attacker!

Fixing faults:

* Usually by making small edits (**patches**) to the program; this process is "penetrate and patch"
* ex. Microsoft's "Patch Tuesday"

### Problems with patching

Sometimes patching makes things _worse_!

* Pressure to patch a fault is often high, causing a narrow focus on the observed failure, instead of a broad look at what may be a more serious underlying problem
* The fault may have caused other, unnoticed failures, and a partial fix may cause inconsistencies or other problems
* The patch for this fault may be introducing new faults

Alternatives to patching?

* Very difficult... How can programmers best inform users so that they avoid exposing the fault?

### Unexpected behaviour

When a behaviour is specified, the spec usually lists the things the program must do; e.g.

* `ls` must list the names of the files in the directory whose name is given, if the user has permissions to read that directory

Most implementors wouldn't care if it did additional things as well

* Sorting the list alphabetically before outputting them is fine

But from a security/privacy point of view, extra behaviour could be bad

* After displaying the filenames, post the list to a public web site
* After displaying the filenames, delete the files

When implementing a security or privacy relevant program, you should consider "and nothing else" to be implicitly added to the spec

* "should do" vs. "shouldn't do"
* Testing for "shouldn't do"

### Types of security flaws

A way to divide up security flaws is by genesis (where they came from).

Some flaws are **intentional**/**inherent**

* **Malicious** flaws are intentionally inserted to attack systems, either in general, or certain systems in particular
    * If it's meant to attack some particular system, we call it a targeted malicious flaw
    * Otherwise, it's a general flaw
* **Nonmalicious** (but intentional or inherent) flaws are often features that are meant to be in the system, and are correctly implemented, but nonetheless can cause a failure when used by an attacker

Most security flaws are caused by **unintentional** program errors.

## Unintentional security flaw

### The Heartbleed bug in OpenSSL (April 2014)

The **TLS Heartbeat mechanism** is designed to keep SSL/TLS connections alive even when no data is being transmitted.  
Heartbeat messages sent by one peer contain random data and a payload length.  
The other peer is supposed to respond with a mirror of exactly the same data.

User $\rightarrow$ server:  
`Type | Length | Payload`  
e.g. `HB_RQST | 64KB | H`

Relevant [xkcd](http://imgs.xkcd.com/comics/heartbleed_explanation.png).

There was a **missing bounds check**!  
An attacker can request that a TLS server hand over a relatively large slice (up to 64KB) of its private memory space.  
This is the _same_ memory space where OpenSSL also stores the server's private key material as well as TLS session keys.
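
The missing check can be sketched as follows. This is a minimal illustration under assumptions, not the OpenSSL code: the record layout (one type byte, a two-byte big-endian length, then the payload) is simplified, and the name `heartbeat_reply` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified heartbeat record: 1 type byte, 2 length bytes
 * (big-endian), then the payload. This is NOT the OpenSSL code. */
#define HB_HEADER_LEN 3

/* Copies the echoed payload into `out`. `rec` holds the `rec_len` bytes that
 * were actually received. Returns the payload length, or -1 if rejected. */
long heartbeat_reply(const uint8_t *rec, size_t rec_len,
                     uint8_t *out, size_t out_cap)
{
    if (rec_len < HB_HEADER_LEN)
        return -1;

    /* Length claimed by the sender; fully attacker-controlled. */
    size_t claimed = ((size_t)rec[1] << 8) | rec[2];

    /* The bounds check: the claimed length must fit inside the bytes that
     * really arrived. Without it, memcpy would read adjacent memory. */
    if (claimed > rec_len - HB_HEADER_LEN)
        return -1;
    if (claimed > out_cap)
        return -1;

    memcpy(out, rec + HB_HEADER_LEN, claimed);
    return (long)claimed;
}
```

A request that carries one byte of payload but claims a much larger length is rejected instead of being answered with whatever happens to sit next to it in memory.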

### Apple's SSL/TLS bug (February 2014)

The bug occurs in the code used to check the validity of the server's signature on a key used in an SSL/TLS connection.  
Bug existed in certain versions of OSX 10.9 and iOS 6.1 and 7.0.  
An attacker ("man in the middle") could potentially exploit the flaw to get a user to accept a counterfeit key chosen by the attacker.

#### Buggy Code

```c
static OSStatus
SSLVerifySignedServerKeyExchange(SSLContext *ctx, bool isRsa, SSLBuffer signedParams,
                                 uint8_t *signature, UInt16 signatureLen)
{
    OSStatus        err;
    ...

    if ((err = SSLHashSHA1.update(&hashCtx, &serverRandom)) != 0)
        goto fail;
    if ((err = SSLHashSHA1.update(&hashCtx, &signedParams)) != 0)
        goto fail;
        goto fail;
    if ((err = SSLHashSHA1.final(&hashCtx, &hashOut)) != 0)
        goto fail;
    ...

fail:
    SSLFreeBuffer(&signedHashes);
    SSLFreeBuffer(&hashCtx);
    return err;
}
```

#### Problem

There are two consecutive `goto fail` statements!  
The second `goto fail` statement is always executed _if_ the first two checks succeeded.  
In this case, the third check is bypassed and $0$ is returned as the value of `err`.
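
The control flow can be reproduced in a few lines. This is a sketch with hypothetical stand-in checks (`step_one`, `step_two`, `step_three`), not Apple's code; the third check is hard-wired to fail so the difference is visible. The duplicated `goto` is written at its true nesting level to show that no `if` guards it.

```c
/* Hypothetical stand-ins for the three checks; 0 means success. */
static int step_one(void)   { return 0; }
static int step_two(void)   { return 0; }
static int step_three(void) { return -1; }  /* the check that should fail */

/* Same shape as the buggy function: the extra goto is unconditional. */
int verify_buggy(void)
{
    int err;
    if ((err = step_one()) != 0)
        goto fail;
    if ((err = step_two()) != 0)
        goto fail;
    goto fail;                  /* duplicated line: err is still 0 here */
    if ((err = step_three()) != 0)  /* never reached */
        goto fail;
fail:
    return err;
}

/* Braces around every branch leave no room for a stray statement. */
int verify_fixed(void)
{
    int err;
    if ((err = step_one()) != 0) {
        goto fail;
    }
    if ((err = step_two()) != 0) {
        goto fail;
    }
    if ((err = step_three()) != 0) {
        goto fail;
    }
fail:
    return err;
}
```

`verify_buggy()` reports success even though the third check would have failed, while `verify_fixed()` returns the error.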

### Types of unintentional flaws

#### Buffer overflows

Most commonly exploited type of security flaw.

Upshot: if the attacker can write data past the end of an array on the stack, attacker can _usually_ overwrite things like the saved return address.  
When the function returns, it will jump to any address of the attacker's choosing.

Targets: programs on a local machine that run with setuid (superuser) privileges, or network daemons on a remote machine.

##### Kinds of buffer overflows

In addition to the classic attack which overflows a buffer on the stack to jump to shellcode, there are many variants:

* Attacks which work when a **single byte** can be written past the end of the buffer (often caused by a common off-by-one error)
* Overflows of buffers on the heap instead of the stack
* Jump to other parts of the program, or parts of standard libraries, instead of shellcode

##### Defences against buffer overflows

Programmer: use a language with bounds checking

* Also catch those exceptions

Compiler: place padding between data and return address ("canaries")

* Detect if the stack has been overwritten before the return from each function

Memory: non-executable stack

* "W$\oplus$X" (memory page is either writable or executable, but never both)

OS: stack (and sometimes code, heap, libraries) at random virtual addresses for each process

* All mainstream OSes do this
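
In C, which does no bounds checking of its own, the programmer has to supply the check by hand. The following is a minimal sketch of that idea; the helper name `copy_bounded` is hypothetical, and the unsafe pattern appears only in a comment.

```c
#include <stddef.h>
#include <string.h>

/* Unsafe pattern (comment only, never executed):
 *     char name[16];
 *     strcpy(name, input);   // writes past name[] if input has 16+ chars
 */

/* Copies `src` into `dst` only if it fits, including the terminating NUL.
 * Returns 0 on success, or -1 if `src` is too long (nothing is written). */
int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    size_t len = strlen(src);
    if (dst_size == 0 || len >= dst_size)
        return -1;
    memcpy(dst, src, len + 1);  /* len + 1 also copies the NUL terminator */
    return 0;
}
```

Rejecting over-long input outright, rather than silently truncating it, is a design choice made for this sketch; either way the write never leaves the buffer.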

#### Integer overflows

Machine integers represent a finite set of numbers.  
This may not correspond to a programmer's mental model.

Suppose a program assumes that an integer is always positive. An overflow can make a (signed) integer wrap and become negative, which violates that assumption! This can happen when:

* The program casts a large unsigned integer to a signed integer
* The result of a mathematical operation causes an overflow

An attacker can pass values to the program that will trigger the overflow.
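
Both cases can be guarded by testing the operands before the operation happens. A minimal sketch, with hypothetical helper names `add_checked` and `to_int_checked`:

```c
#include <limits.h>
#include <stdbool.h>

/* Adds two ints. Returns false (and leaves *sum alone) if the true result
 * would not fit in an int; the test runs before the addition. */
bool add_checked(int a, int b, int *sum)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;
    *sum = a + b;
    return true;
}

/* Converts unsigned to signed. Returns false if the value is too large to
 * be represented, instead of letting it turn negative. */
bool to_int_checked(unsigned int u, int *out)
{
    if (u > (unsigned int)INT_MAX)
        return false;
    *out = (int)u;
    return true;
}
```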

#### Format string vulnerabilities

Class of vulnerabilities discovered only in 2000.

Unfiltered user input is used as format string in `printf()`, `fprintf()`, `sprintf()`, ...  
e.g. `printf(buffer)` instead of `printf("%s", buffer)`

* `printf(buffer)` will parse buffer for %'s and use whatever is currently on the stack to process found format parameters

`printf("%s%s%s%s")` likely crashes your program.  
`printf("%x%x%x%x")` dumps parts of the stack.  
`%n` will **write** to an address found on the stack.

##### Example expansion code

```c
char output [44];
char buffer [44];

snprintf(buffer, sizeof(buffer), "Input %s", input);
sprintf(output, buffer);
```

What happens if input=%48d+(address of a libc routine)?
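
For contrast, here is a sketch of the safe pattern, where user input is only ever passed as an argument and never as the format string. The wrapper name `format_safely` is hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Writes untrusted text into `out` as data. The format string is a constant,
 * so any '%' in `input` is copied literally and never parsed. Returns what
 * snprintf returns: the length the full output would have had. */
int format_safely(char *out, size_t out_size, const char *input)
{
    return snprintf(out, out_size, "Input %s", input);
}
```

Because `snprintf` is also given the buffer size, an oversized input is truncated rather than written past the end of `out`.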

#### Incomplete mediation

Inputs to programs are often specified by untrusted users,

* web-based applications are a common example

Users sometimes mistype data in web forms, e.g.:

* phone number: 51998884567
* email: sross#uwaterloo.ca

The web application needs to ensure that what the user has entered constitutes a **meaningful** request.  
This is called **mediation**.

Incomplete mediation occurs when the application accepts incorrect data from the user.  
Sometimes this is hard to avoid, e.g. `519-886-4567` is a well-formed, reasonable phone number that may nonetheless be the wrong one.

Focus on catching entries that are clearly wrong, e.g.:

* not well formed; DOB: 1980-04-31
* unreasonable values; DOB: 1876-10-12
* inconsistent with other entries
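
The first two checks can be sketched for a date of birth as follows. The function names and the 120-year plausibility window are assumptions chosen for illustration.

```c
#include <stdbool.h>

static bool is_leap_year(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

/* Well-formedness: does this date exist in the Gregorian calendar? */
bool date_exists(int year, int month, int day)
{
    static const int days_in_month[] =
        {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};

    if (month < 1 || month > 12 || day < 1)
        return false;
    int max_day = days_in_month[month - 1];
    if (month == 2 && is_leap_year(year))
        max_day = 29;
    return day <= max_day;
}

/* Reasonableness: an assumed policy that a date of birth lies at most
 * 120 years before `current_year` and not after it. */
bool dob_is_plausible(int year, int month, int day, int current_year)
{
    if (!date_exists(year, month, day))
        return false;
    return year <= current_year && year >= current_year - 120;
}
```

With these checks, 1980-04-31 is rejected as not well formed and 1876-10-12 is rejected as unreasonable when the current year is 2018.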

##### Why do we care?

Security concerns:

* Buffer overflow
* SQL injection; [relevant xkcd](https://xkcd.com/327/)

We need to make sure that any user-supplied input falls within well-specified values, known to be safe.

##### Client-side mediation

There are some web sites with forms that do **client-side** mediation (via JavaScript).  
If invalid data is entered, a popup will prevent the user from submitting it.

Related issues: client-side state

* Many web sites rely on the client to keep state for them
* They will put hidden fields in the form which are passed back to the server when the user submits the form

Problem: what if the user

* turns off Javascript
* edits the form before submitting it
* writes a script that interacts with the web server instead of using a web browser at all
* connects to the server "manually"

Note that the user can send arbitrary (unmediated) values to the server this way.  
The user can also modify any client-side state.

##### Defences against incomplete mediation

Client-side mediation is an okay method to use in order to have a friendlier user interface, but is useless for security purposes.

**Server-side mediation** is required regardless of whether client-side mediation is used.

For values entered by the user:

* Always check carefully on the values of all fields
* These values can potentially contain completely arbitrary 8-bit data (including accented chars, control chars, etc.) and be of any length

For state stored by the client:

* Ensure client has not modified the data in any way
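
As a sketch of server-side checking for a single field, the function below parses a form value that is supposed to be a quantity. The name `parse_quantity` and the idea of a caller-supplied maximum are assumptions; the point is that the whole string is validated and anything unexpected is rejected.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Parses a form field that should hold a decimal quantity in 1..max_qty.
 * Returns false for empty input, signs, whitespace, trailing junk,
 * out-of-range values, and numbers too large for a long. */
bool parse_quantity(const char *field, long max_qty, long *out)
{
    if (field == NULL || field[0] < '0' || field[0] > '9')
        return false;           /* must start with a digit */

    char *end;
    errno = 0;
    long value = strtol(field, &end, 10);
    if (errno != 0 || *end != '\0')
        return false;           /* overflowed, or junk after the digits */
    if (value < 1 || value > max_qty)
        return false;

    *out = value;
    return true;
}
```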

#### TOCTTOU errors

TOCTTOU ("TOCK-too") errors

* Time-Of-Check To Time-Of-Use
* Also known as "race condition" errors

These errors may occur when the following happens:

* User requests the system to perform an action
* The system verifies the user is allowed to perform the action
* The system performs the action

What happens if the state of the system changes between steps 2 and 3?

##### Example problem

A particular Unix terminal program is `setuid` (runs with superuser privileges) so that it can allocate terminals to users (a privileged operation).

It supports a command to write the contents of the terminal to a log file.  
It first checks if the user has permissions to write to the requested file; if so, it opens the file for writing.

The attacker makes a symbolic link:  
`logfile -> file_she_owns`

Between the "check" and the "open", the attacker changes it:  
`logfile -> /etc/passwd`

The state of the system _changed_ between the check for permission and the execution of the operation.  
The file whose permissions were checked for writeability by the user (`file_she_owns`) wasn't the file that was ultimately opened for writing (`/etc/passwd`).

Can the attacker really "win this race"? **Yes**.

##### Defences against TOCTTOU errors

When performing a privileged action on behalf of another party, make sure all information relevant to the access control decisions is **constant** between the time of the check and the time of the action ("the race")

* Keep a private copy of the request itself so that the request can't be altered during the race
* Where possible, act on the object itself, and not on some level of indirection
    * e.g. make access control decisions based on filehandles, not filenames
* If that's not possible, use locks to ensure the object is not changed during the race
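
On a POSIX system, "act on the object itself" can be sketched as opening the file first and then checking the open descriptor. The function name `open_log_for_user` and the ownership policy (the file must be a regular file owned by the requesting user) are assumptions for illustration. Note that `O_NOFOLLOW` only refuses a symbolic link in the final path component.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Opens `path` for appending on behalf of user `uid`.
 * Returns an open file descriptor, or -1 if the request is refused. */
int open_log_for_user(const char *path, uid_t uid)
{
    /* O_NOFOLLOW makes open() fail if the last path component is a symlink. */
    int fd = open(path, O_WRONLY | O_APPEND | O_NOFOLLOW);
    if (fd < 0)
        return -1;

    /* fstat() inspects the object that was actually opened, so renaming or
     * relinking the path afterwards cannot change the answer. */
    struct stat st;
    if (fstat(fd, &st) != 0 || !S_ISREG(st.st_mode) || st.st_uid != uid) {
        close(fd);
        return -1;
    }
    return fd;  /* all later writes go through this descriptor */
}
```

The check and the use now refer to the same open file, so there is no window in which the name can be pointed somewhere else.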

## Malicious code: Malware

Various forms of software written with malicious intent.  
A common characteristic is that it needs to be executed in order to cause harm.

Ways malware can get executed:

* User action
    * Downloading and running malicious software
    * Viewing a web page containing malicious code
    * Opening an executable email attachment
    * Inserting a CD/DVD or USB flash drive
* Exploiting an existing flaw in a system
    * Buffer overflows in network daemons
    * Buffer overflows in email clients or web browsers

### Types of malware

* Virus
    * Malicious code that adds itself to benign programs/files
    * Code for spreading + code for actual attack
    * _Usually_ activated by users
* Worms
    * Malicious code spreading with no or little user involvement
* Trojans
    * Malicious code hidden in seemingly innocent program that you downloaded
* Logic Bombs
    * Malicious code hidden in programs already on your machine

### Virus

A **virus** is a particular kind of malware that infects other files

* Traditionally, a virus could infect only executable programs
* Nowadays, many data document formats can contain executable code (such as macros)
    * Many different types of files can be infected with viruses now

Typically, when the file is executed (or sometimes just opened), the virus activates, and tries to infect other files with copies of itself.

In this way, the virus can spread between files, or between computers.

#### Infection

The virus wants to modify an existing (non-malicious) program or document (the **host**) in such a way that executing or opening it will transfer control to the virus

* The virus can do its "dirty work" and then transfer control back to the host

For executable programs:  
Typically, the virus will modify other programs and copy itself to the beginning of the targets' program code.

For documents with macros:  
The virus will edit other documents to add itself as a macro which starts automatically when the file is opened.

In addition to infecting other files, a virus will often try to infect the computer itself

* This way, every time the computer is booted, the virus is automatically activated

It might put itself in the boot sector of the hard disk.

It might add itself to the list of programs the OS runs at boot time.

It might infect one or more of the programs the OS runs at boot time.

It might try many of these strategies, but it is still trying to _evade detection_.

#### Spreading

How does a virus spread between computers?

* Usually, when the user sends infected files (hopefully not knowing they're infected!) to others
    * Or puts them on a p2p network
* A virus usually requires some kind of user action in order to spread to another machine
    * If it can spread on its own (e.g. via email), it's more likely to be a worm than a virus

#### Payload

In addition to trying to spread, what else might a virus try to do?

Some viruses try to evade detection by disabling any active virus scanning software.

Most viruses have some sort of **payload**.  
At some point, the payload of an infected machine will activate, and something (usually bad) will happen

* Erase hard drive
* Subtly corrupt some of your spreadsheets
* Install a keystroke logger to capture online banking password
* Start attacking a particular target website

#### Spotting viruses

When to look for viruses:

* As files are added to the computer
    * via portable media
    * via a network
* From time to time, scan the entire state of the computer
    * to catch anything missed when on its way in
    * however, any damage the virus has done may not be reversible

How to look for viruses:

* Signature-based protection
* Behaviour-based protection

#### Signature-based protection

Keep a list of all known viruses.  
Each virus has some characteristic feature (the **signature**):

* Most signature-based systems use features of the virus code itself
    * the infection code
    * the payload code
* Can also try to identify other patterns characteristic of a particular virus
    * where on the system it tries to hide itself
    * how it propagates from one place to another

##### Polymorphism

To try to evade signature-based virus scanners, some viruses are **polymorphic**:

* The virus makes a _modified_ copy instead of a _perfect_ copy every time it infects a new file
* Often done by having most of the virus code encrypted
    * The virus starts with a decryption routine which decrypts the rest of the virus, which is then executed
    * When the virus spreads, it encrypts the new copy with a newly chosen random key

#### Behaviour-based protection

Signature-based protection systems have a major limitation

* Can only scan for viruses that are in the list
* Several new viruses identified _every day_
    * one anti-virus recognizes over _36 million_ signatures

Behaviour-based systems look for suspicious patterns of behaviour, rather than for specific code fragments; some systems run suspicious code in a sandbox first.

##### False negatives and positives

Any kind of test or scanner can have two types of errors:

1. False negatives: fail to identify a threat that is present
2. False positives: claim a threat is present when it is not

Which is worse? How do you think signature-based and behaviour-based systems compare?

##### Base rate fallacy

Suppose a breathalyzer reports false drunkenness in 5% of cases, but never fails to detect true drunkenness.  
Suppose that 1 in every 1000 drivers is drunk (the **base rate**).  
If a breathalyzer test of a random driver indicates that he or she is drunk, what is the probability that he or she really is drunk?
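
The answer follows from Bayes' rule. Writing $D$ for "the driver is drunk" and $+$ for "the test is positive" (notation introduced here), the numbers above give $P(D) = 0.001$, $P(+ \mid D) = 1$ and $P(+ \mid \neg D) = 0.05$, so:

$$
P(D \mid +) = \frac{P(+ \mid D)\,P(D)}{P(+ \mid D)\,P(D) + P(+ \mid \neg D)\,P(\neg D)} = \frac{1 \times 0.001}{1 \times 0.001 + 0.05 \times 0.999} = \frac{0.001}{0.05095} \approx 0.0196
$$

Only about 2% of drivers who test positive are actually drunk, even though the test looks 95% accurate.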

Applied to a virus scanner, these numbers imply that there will be many more false positives than true positives, potentially causing the true positives to be overlooked or the scanner disabled.

### Worms

A **worm** is a self-contained piece of code that can replicate with little or no user involvement.  
Worms often exploit security flaws in widely deployed software as a path to infection.

Typically:

* A worm exploits a security flaw in some software on the computer, infecting it
* The worm immediately starts searching for other computers to infect
* There may or may not be a payload that activates at a certain time, or by another trigger

#### The Morris worm

First Internet worm, launched by a graduate student at Cornell in 1988.  
Once infected, a machine would try to infect other machines in three ways:

1. Exploit a buffer overflow in the "finger" daemon
2. Use a back door left in the "sendmail" mail daemon
3. Try a "dictionary attack" against local users' passwords.  
    If successful, log in as them, and spread to other machines they can access without requiring a password

All three methods were well known!  
First example of buffer overflow exploit in the wild.  
Thousands of systems were offline for several days.

#### The Code Red worm

Launched in 2001, exploited a buffer overflow in Microsoft's IIS web server (for which a patch had been available for a month).  
An infected machine would

* Deface its home page
* Launch attacks on other web servers (IIS or not)
* Launch a DoS attack on a handful of websites, including www.whitehouse.gov
* Install a back door to deter disinfection

Infected 250k systems in 9 hours.

#### The Slammer worm

Launched in 2003, performed DoS attacks.  
First example of a "Warhol worm"; a worm which can infect nearly all vulnerable machines in just 15 minutes.  
Exploited a buffer overflow in Microsoft's SQL Server (also having a patch available).  
A vulnerable machine could be infected with a single UDP packet!

* This enabled the worm to spread extremely quickly
* Exponential growth, doubling every _8.5 seconds_
* 90% of vulnerable hosts infected in 10 minutes

Dropped due to midterm conflicts. :(