# Web Security
## Background
### Overview of HTTP
This is a quick rundown of what happens when a website is accessed:
1. User goes to a URL
2. HTTP request is sent to the server
3. Server responds with an HTML file
4. User's browser renders the HTML file as a website



Note that the HTML may contain sub-resources (**e.g.** images, videos, icons) which are hosted on external websites.
To obtain these resources, the browser will send further HTTP requests to those endpoints.

HTTP requests and responses contain headers, which give the receiver hints about what to do with the data.
For example, headers can state the content length (`Content-Length`) or the data type (`Content-Type`).
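
The request/response exchange and its headers can be observed with a small local experiment. This is a minimal sketch using only Python's standard library; the handler name `PageHandler`, the helper `fetch_page`, and the page content are made up for illustration, and the server runs on the local machine only.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical page served by the toy server.
PAGE = b"<html><body><h1>Hello</h1></body></html>"


class PageHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # The status line and headers tell the browser how to treat the body.
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()
        self.wfile.write(PAGE)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_page():
    """Start a local server, send one GET request, return (status, headers, body)."""
    server = HTTPServer(("127.0.0.1", 0), PageHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        connection = http.client.HTTPConnection(
            "127.0.0.1", server.server_port, timeout=5
        )
        connection.request("GET", "/")
        response = connection.getresponse()
        body = response.read()
        headers = dict(response.getheaders())
        connection.close()
        return response.status, headers, body
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    status, headers, body = fetch_page()
    print(status, headers["Content-Type"], headers["Content-Length"])
```

A real browser performs the same exchange, then repeats it for every sub-resource referenced by the HTML.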

One crucial element of web security is the **cookie**.

### Cookies
A cookie is a small piece of text sent by the server in the `Set-Cookie` response header.
Each cookie holds a key-value pair, plus optional attributes.
Cookies are stored on the user's browser while they are browsing.
When the user revisits the site, the browser will send all **in-scope** cookies to the server via the `Cookie` header.

There are several types of cookies:
* Session cookies
    * deleted at the end of the browsing session
* Persistent cookies
    * expire at a specific time (`Expires`), or after a specified duration (`Max-Age`)
* Secure cookies
    * can only be transmitted via HTTPS
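
The three cookie types differ only in the attributes attached to the `Set-Cookie` header. The following is a minimal sketch using Python's standard `http.cookies` module; the cookie names and values (`theme`, `remember_me`, `session_id`) are invented for illustration.

```python
from http.cookies import SimpleCookie


def build_set_cookie_headers():
    """Return the values of three Set-Cookie headers, one per cookie type."""
    cookies = SimpleCookie()

    # Session cookie: no Expires or Max-Age, so it lasts for the browsing session.
    cookies["theme"] = "dark"

    # Persistent cookie: expires after a specified duration (here, one hour).
    cookies["remember_me"] = "1"
    cookies["remember_me"]["max-age"] = 3600

    # Secure cookie: only transmitted over HTTPS.
    cookies["session_id"] = "abc123"
    cookies["session_id"]["secure"] = True

    return [morsel.OutputString() for morsel in cookies.values()]


def parse_cookie_header(header_value):
    """Parse the Cookie header a browser sends back into a plain dict."""
    parsed = SimpleCookie()
    parsed.load(header_value)
    return {name: morsel.value for name, morsel in parsed.items()}


if __name__ == "__main__":
    for header in build_set_cookie_headers():
        print("Set-Cookie:", header)
    print(parse_cookie_header("theme=dark; session_id=abc123"))
```

Note that the browser only sends back the name and value of each cookie; attributes such as `Max-Age` and `Secure` are instructions to the browser and are not echoed to the server.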

#### Purpose
Because HTTP is stateless, there needs to be another mechanism for tracking a session.
Cookies are a common method of setting and tracking a session ID between server and user.

With **token-based authentication**, the user does not need to repeat their login each time they visit the site.

1. User logs in with their credentials
2. Server verifies the credentials, and sends a token to the user
3. User receives the token and the browser saves the token as a cookie
4. Subsequent requests send the token to the server via the cookie, and the server verifies it

The server needs to store a mapping between users and their currently active tokens, so that it can verify the tokens it receives from users.
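
As a rough illustration of this server-side table, the sketch below keeps the mapping in an in-memory dictionary. The class name `SessionStore` and its methods are hypothetical, and a real server would also verify credentials and expire old tokens.

```python
import secrets


class SessionStore:
    """Toy server-side table mapping active tokens to users."""

    def __init__(self):
        self._user_by_token = {}

    def log_in(self, username):
        # Assumes the credentials were already verified elsewhere.
        token = secrets.token_urlsafe(32)  # unpredictable random token
        self._user_by_token[token] = username
        return token  # sent to the browser, which stores it as a cookie

    def user_for(self, token):
        # Returns None for unknown (forged or logged-out) tokens.
        return self._user_by_token.get(token)

    def log_out(self, token):
        self._user_by_token.pop(token, None)


if __name__ == "__main__":
    store = SessionStore()
    token = store.log_in("alice")
    print(store.user_for(token))
    print(store.user_for("forged-token"))
```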

To avoid having to store the table, the server can perform the following:
1. Decide on what information should be on the token
    * for example, user name and expiry date
2. Compute the MAC of the above information, using a secret key
3. Concatenate the MAC with the above information and send it as a token
4. When the server receives a token, it recomputes the MAC of the information given, and checks that it matches the MAC included in the token

This mechanism relies on the fact that only the server (which knows the secret key) can produce a valid MAC for the given information.
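
The four steps above can be sketched with HMAC-SHA256 from Python's standard library. The token layout (base64 payload, a dot, base64 MAC), the JSON field names, and `SECRET_KEY` are assumptions made for this example, not a standard format.

```python
import base64
import hashlib
import hmac
import json
import time

# Assumption: a demo key. A real server would load a long random key from
# secure configuration and never hard-code it.
SECRET_KEY = b"demo-only-secret-key"


def _mac(payload):
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()


def issue_token(username, lifetime_seconds=3600, now=None):
    """Steps 1-3: choose the information, MAC it, concatenate both."""
    now = time.time() if now is None else now
    payload = json.dumps(
        {"user": username, "exp": int(now) + lifetime_seconds}
    ).encode()
    return (
        base64.urlsafe_b64encode(payload).decode()
        + "."
        + base64.urlsafe_b64encode(_mac(payload)).decode()
    )


def verify_token(token, now=None):
    """Step 4: recompute the MAC and compare. Returns the user name or None."""
    now = time.time() if now is None else now
    try:
        payload_b64, mac_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        received_mac = base64.urlsafe_b64decode(mac_b64)
    except ValueError:
        return None  # malformed token
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(received_mac, _mac(payload)):
        return None  # information was altered or MAC was forged
    claims = json.loads(payload)
    if claims["exp"] < now:
        return None  # token has expired
    return claims["user"]


if __name__ == "__main__":
    token = issue_token("alice")
    print(verify_token(token))
```

Because the expiry date is covered by the MAC, a user cannot extend their own session by editing the token.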

### JavaScript
JavaScript is a programming language that can be embedded into HTML pages (using `<script>` tags).
It allows web pages to be interactive, with the full capabilities of a programming language.


## Threats
### Attackers as another end system
We can model the attacker as another end system.
Hence, they fall into 2 categories:

* Forum poster
    * Weakest type of attacker
    * Akin to a malicious user of the system
* <span id="web-attacker"/> Web attacker 
    * Has access to their own domain and web server (with a valid SSL certificate)
    * Akin to a malicious server
    * **Cannot** modify or view traffic to other sites

### Attackers as network attackers
Another category of attacker is the network attacker.
They have access to the network between the user and the server.
They fall into the following categories:

* Passive network attacker
    * Can view, but cannot modify, traffic between users and server
    * Can additionally act as a [web attacker](#web-attacker)
* Active network attacker
    * Can view and modify network traffic
    * Can additionally act as a [web attacker](#web-attacker)
    * The most powerful threat model

## Attacks
### Misleading the user
#### Using similar URL
A URL has a **host name** component and other components.
For example, the host name of `https://en.wikipedia.org/wiki/URL` is `en.wikipedia.org`, while `/wiki/URL` (the path) is one of the other components.

The host name dictates which server the browser will connect to.
Suppose that an attacker sends the URL `https://en.wikipedia.org|wiki/URL` instead.
The host that the user visits would then be `en.wikipedia.org|wiki`, which the attacker can control.
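
To see how the host name is separated from the rest of a URL, the sketch below uses Python's `urllib.parse` as a stand-in for a browser's URL parser. This is only an approximation: real browsers follow their own parsing rules and may reject or handle unusual characters differently. The helper name `host_name` is made up for this example.

```python
from urllib.parse import urlsplit


def host_name(url):
    """Return the host name component of a URL, as seen by urllib.parse."""
    return urlsplit(url).hostname


if __name__ == "__main__":
    # The first "/" after the scheme ends the host name, so a look-alike
    # character placed before it becomes part of the host.
    print(host_name("https://en.wikipedia.org/wiki/URL"))
    print(host_name("https://en.wikipedia.org|wiki/URL"))
```

The two URLs look almost identical to a human reader, yet the parser reports different host names for them.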

##### Prevention
In modern browsers, the host name portion of the URL is displayed with a different font intensity from the other components.
This helps users spot bogus websites that are trying to impersonate legitimate sites.

#### Address bar spoofing
The address bar plays an important role in protecting users by showing which site they are visiting.
However, the address bar only indicates which website is being visited.
It does not actively protect the user from phishing sites.
A poorly designed browser may even allow attackers to falsify what it shows.

In early browser designs, popups were allowed to be rendered at any location on the screen.
This allowed attackers to overlay a spoofed address bar over the real address bar using a popup.

Even recently, [address bar spoofing vulnerabilities have been found on modern devices](https://cve.circl.lu/cve/CVE-2019-12278).

### Cross site scripting (XSS)
Many websites allow users to submit content to the server, which then responds with HTML that includes that content.

#### Reflected XSS
In this case, the user is allowed to submit content as part of the URL, often via **query parameters**.
For example, the URL `https://www.google.de/search?q=query_string` will render a Google search page with `query_string` in its search parameter.

Notice that, if the server inserts the query into the page without escaping it, JavaScript can be injected into the rendered HTML page via the query parameter, like so: `https://www.google.de/search?q=<script>do_the_hacks()</script>`

Thus, the following is possible:
1. Attacker crafts a URL with the script as part of the query parameters
2. User clicks on malicious link
3. Server receives request and sends the HTML with the injected script
4. User's browser receives the HTML and renders it, executing the malicious code

Note that because the script runs on the site that the user is currently visiting, it has access to that site's in-scope cookies.
Thus, the attacker can have the script send the cookies to them, thereby stealing them.
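
The root cause on the server side can be shown with a deliberately vulnerable sketch. The function `render_search_page_unsafe` and the host `search.example` are hypothetical; the point is only that the query parameter is copied into the HTML unchanged.

```python
from urllib.parse import parse_qs, urlsplit


def render_search_page_unsafe(url):
    """Deliberately vulnerable: echoes the q parameter into HTML unescaped."""
    query = parse_qs(urlsplit(url).query).get("q", [""])[0]
    # The user-controlled string is pasted straight into the markup, so any
    # <script> tag inside it becomes part of the page.
    return f"<html><body><p>Results for: {query}</p></body></html>"


if __name__ == "__main__":
    malicious_url = "https://search.example/search?q=<script>do_the_hacks()</script>"
    print(render_search_page_unsafe(malicious_url))
```

A browser receiving this response has no way to tell that the `<script>` element came from the attacker's link rather than from the site itself.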

#### Persistent XSS
Consider the case where a user's content is stored in the server's database, and this content is rendered when someone requests it.
For example, users can make a post on Facebook, and it will appear for all other users whenever they visit the site.

This is more dangerous than reflected XSS because the attack can happen on the site's legitimate URL rather than needing a malicious URL.
Also, the attack has a larger impact because it can affect many users at once.

--- 

Both attacks exploit **the client's trust in the server**: the client believes that all content in the HTML comes from the server, and is thus safe.


#### Defense
Most defenses rely on mechanisms on the server side.
A simple method is to sanitize user input, escaping any potentially malicious scripts.
Another method is to declare parts of the webpage where JavaScript is forbidden from running.
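
As one possible form of sanitization, the sketch below escapes HTML special characters with Python's standard `html.escape` before inserting user content into a page. The function name `render_comment` is made up, and real applications typically rely on a templating engine that escapes by default.

```python
import html


def render_comment(comment):
    """Insert user-supplied text into HTML with special characters escaped."""
    # "<" becomes "&lt;", ">" becomes "&gt;", and so on, so the browser
    # displays the text instead of interpreting it as markup.
    return f"<p>{html.escape(comment)}</p>"


if __name__ == "__main__":
    print(render_comment("<script>do_the_hacks()</script>"))
    # prints: <p>&lt;script&gt;do_the_hacks()&lt;/script&gt;</p>
```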

### Cross site request forgery (CSRF)
Suppose that actions on a website can be encoded as part of the URL.
For example, `www.bank.com/transfer?to=Bob&amount=100` will transfer \\$100 to Bob from the current user's account.

Now consider the following:
1. User is already logged in at `www.bank.com`.
2. Attacker sends a malicious link to the user, which is `www.bank.com/transfer?to=Attacker&amount=100`
3. User clicks the link and sends the request to the server
4. Because the user is already authenticated, the authentication cookie will be sent to the server as part of the request and the transaction will go through

Note that the attacker can perform the attack without the user explicitly requesting the webpage.
For example, they can lure the victim into visiting their malicious website, which contains an image whose source is advertised to be `www.bank.com/transfer?to=Attacker&amount=100`.
This causes the browser to make the request to obtain the picture, thus triggering the transaction.

Thus, this attack is the flip side of the previous exploit: it targets the **server's trust in the client**, where the server believes that every request is genuinely from the client.

#### Defense
To protect against this, the server can issue a dynamic value that must accompany every request to its transaction pages.

For example, the HTML form at `www.bank.com/transfer` will have a hidden field `CSRFToken` which contains a random value generated by the server.
When the request is sent to the server, this value is included as part of the form for the server to verify.
Because the attacker cannot know this random value without access to the user's session, they cannot forge a request that carries the correct value, so the transaction will not succeed.
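
The token check can be sketched as follows. The class `CsrfProtectedBank`, its methods, and the in-memory dictionary are all hypothetical stand-ins for a real web framework; only the hidden field name `CSRFToken` comes from the example above.

```python
import hmac
import secrets


class CsrfProtectedBank:
    """Toy server that requires a per-session CSRF token on transfers."""

    def __init__(self):
        self._csrf_token_by_session = {}

    def issue_csrf_token(self, session_id):
        # Random value generated by the server for this user's session.
        token = secrets.token_urlsafe(32)
        self._csrf_token_by_session[session_id] = token
        return token

    def render_transfer_form(self, session_id):
        token = self.issue_csrf_token(session_id)
        return (
            '<form method="POST" action="/transfer">'
            f'<input type="hidden" name="CSRFToken" value="{token}">'
            '<input name="to"><input name="amount">'
            "</form>"
        )

    def handle_transfer(self, session_id, form):
        expected = self._csrf_token_by_session.get(session_id)
        submitted = form.get("CSRFToken", "")
        # A forged request carries the cookie (session_id) but not the token.
        if expected is None or not hmac.compare_digest(
            expected.encode(), submitted.encode()
        ):
            return "rejected"
        return f"transferred {form['amount']} to {form['to']}"


if __name__ == "__main__":
    bank = CsrfProtectedBank()
    token = bank.issue_csrf_token("session-1")
    print(bank.handle_transfer("session-1", {"to": "Bob", "amount": "100", "CSRFToken": token}))
    print(bank.handle_transfer("session-1", {"to": "Attacker", "amount": "100"}))
```

The forged request in the last line is rejected even though it arrives with a valid session, because the attacker's page has no way to read the token embedded in the legitimate form.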