# SQL Injection

In this notebook, we show a simple SQL injection example in which we inject and execute arbitrary SQL code in the database. Afterwards, we discuss the state-of-the-art solution to prevent SQL injections.

Copyright Joris Nix & Jens Dittrich, [Big Data Analytics Group](https://bigdata.uni-saarland.de/), [CC-BY-SA](https://creativecommons.org/licenses/by-sa/4.0/legalcode)

## Setup 

In [1]:
import psycopg

dsn = 'dbname=postgres user=postgres host=/var/run/postgresql/'  # host may be /tmp/ on other systems

uname_pw = [('immanuel','ThisIsImmi'),
            ('joris', 'bestpwintheworld.'),
            ('kai', 'secretstr1ng'),
            ('felix', 'gueswaht?'),
            ('lukas', 'youll_never_know'),
            ('marcel', 's3cby0psc'),
            ("' OR '1'='1", 'nopassword')]    

def init_db(dsn):
    with psycopg.connect(dsn) as conn:
        
        # Open a cursor to perform database operations
        with conn.cursor() as cur:

            # Drop table if existing
            cur.execute(query="DROP TABLE IF EXISTS users;")

            # Create users table
            cur.execute(query="""CREATE TABLE users
                                 (id SERIAL PRIMARY KEY,
                                 username varchar(42) UNIQUE,
                                 pwd char(128));""")

            # Insert sample data into users table
            cur.executemany("""INSERT INTO users(username, pwd) VALUES (%s, %s);""",
                            uname_pw)

conn = psycopg.connect(dsn)
conn.autocommit = True

init_db(dsn)

## SQL Injection Example

First, we define a function that queries the database for information about a user with the name `username`.

In [2]:
def execute_query(query):
    cur = conn.cursor() # open cursor to perform db operation
    cur.execute(query) # execute query
    res = cur.fetchall() # fetch results
    print(res)
    
def get_user_info(username):
    statement = "SELECT * FROM users WHERE username = '" + username + "'"
    execute_query(statement)

We expect the user to enter something like this.

In [3]:
username = "marcel" 
    
get_user_info(username)

[(6, 'marcel', 's3cby0psc                                                                                                                       ')]


However, the user can also enter the following string, which happens to be a valid username in our system.

In [4]:
username = "' OR '1'='1"

This constructs the following query,
```
SELECT * FROM users WHERE username = '' OR '1'='1'
```
which results in a WHERE clause that is always true and therefore returns the complete `users` table.
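
The string concatenation in `get_user_info()` can be inspected without touching the database. A minimal sketch, assuming a hypothetical helper `build_statement` (not part of the notebook) that only builds the statement text and does not execute it:

```python
def build_statement(username):
    # Same concatenation as in get_user_info(), but nothing is sent to the database
    return "SELECT * FROM users WHERE username = '" + username + "'"

benign = build_statement("marcel")
tautology = build_statement("' OR '1'='1")

print(benign)     # SELECT * FROM users WHERE username = 'marcel'
print(tautology)  # SELECT * FROM users WHERE username = '' OR '1'='1'
```

The quote at the start of the input closes the string literal early, so the rest of the input is read as SQL code rather than as data.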

In [5]:
get_user_info(username) # prints the whole table content

[(1, 'immanuel', 'ThisIsImmi                                                                                                                      '), (2, 'joris', 'bestpwintheworld.                                                                                                               '), (3, 'kai', 'secretstr1ng                                                                                                                    '), (4, 'felix', 'gueswaht?                                                                                                                       '), (5, 'lukas', 'youll_never_know                                                                                                                '), (6, 'marcel', 's3cby0psc                                                                                                                       '), (7, "' OR '1'='1", 'nopassword                                                                                           

In [6]:
# or even this
username = "'; DROP TABLE users; SELECT 42 WHERE '42'='42"

This constructs the following query.
```
SELECT * FROM users WHERE username = ''; DROP TABLE users;
SELECT 42 WHERE '42'='42'
```
After executing this query, the table `users` and all its content are deleted from the database.
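
To see why the payload has this effect, the concatenated string can be split into the individual statements the server receives. A minimal sketch, assuming a hypothetical helper `build_statement` that mirrors the concatenation in `get_user_info()`; the split on `;` is deliberately naive and only suitable for this payload, not a general SQL parser:

```python
def build_statement(username):
    # Same concatenation as in get_user_info(), without executing anything
    return "SELECT * FROM users WHERE username = '" + username + "'"

payload = "'; DROP TABLE users; SELECT 42 WHERE '42'='42"
statement = build_statement(payload)

# Naive split: works here because the payload contains no ';' inside a string literal
parts = [part.strip() for part in statement.split(";")]
for part in parts:
    print(part)
# SELECT * FROM users WHERE username = ''
# DROP TABLE users
# SELECT 42 WHERE '42'='42'
```

The trailing `SELECT 42 WHERE '42'='42` in the payload only serves to absorb the closing quote that the function appends, so that the whole string remains syntactically valid.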

In [7]:
get_user_info(username)

[]


We can check that the `users` table was indeed deleted by trying to retrieve the tuples from the table afterwards, which results in an `UndefinedTable` exception.

In [8]:
statement = "SELECT * FROM users;"
try:
    execute_query(statement)
except Exception as e:
    print(e)

relation "users" does not exist
LINE 1: SELECT * FROM users;
                      ^


## Prevent Attack using Prepared Statements

The underlying problem that makes SQL injections possible is the mixing of code and data. SQL injections can therefore be prevented by sending the data (user input) separately from the SQL code to the database server.

In PostgreSQL, this can be achieved by using so-called [prepared statements](https://www.postgresql.org/docs/13/sql-prepare.html). Prepared statements allow us to define parameters and their data types before executing a parameterized query. For the example from above, the prepared statement would look as follows:
```SQL
PREPARE get_user_info(text) AS
    SELECT *
    FROM users
    WHERE username=$1;
EXECUTE get_user_info('marcel');
```
This ensures that, when executing `get_user_info()`, the entire user-provided username is interpreted as a string and that code and data are sent separately to the database server.
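
The effect of separating code and data can also be tried without a PostgreSQL server. A minimal sketch, assuming Python's built-in `sqlite3` module as a stand-in database (it uses `?` as its placeholder instead of `$1`); the names `demo`, `lookup_concatenated` and `lookup_bound` are made up for this illustration:

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the PostgreSQL server
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT UNIQUE, pwd TEXT)")
demo.executemany(
    "INSERT INTO users(username, pwd) VALUES (?, ?)",
    [("lukas", "youll_never_know"), ("marcel", "s3cby0psc"), ("' OR '1'='1", "nopassword")],
)

def lookup_concatenated(username):
    # Vulnerable: user input becomes part of the SQL text
    return demo.execute(
        "SELECT * FROM users WHERE username = '" + username + "'"
    ).fetchall()

def lookup_bound(username):
    # Safe: user input is passed as a bound parameter, never parsed as SQL
    return demo.execute(
        "SELECT * FROM users WHERE username = ?", (username,)
    ).fetchall()

attack = "' OR '1'='1"
print(len(lookup_concatenated(attack)))  # 3, every row matches
print(lookup_bound(attack))              # [(3, "' OR '1'='1", 'nopassword')]
```

With the bound parameter, the input is compared against the `username` column as a plain value, so only the row whose username literally equals the input is returned.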


Below, we define a secure function to query the database for user information using a prepared statement.

In [9]:
def get_user_info_prepared(username):
    # open cursor to perform db operation
    cur = conn.cursor()
    # we provide the user input as a parameter and set the `prepare` flag to True
    cur.execute(query="SELECT * FROM users WHERE username=%s;", params=(username,), prepare=True)
    res = cur.fetchall()
    print(res)

# initialize database again (`users` table previously deleted)
init_db(dsn)

When executing the secure function using a valid username, we receive the expected output.

In [10]:
get_user_info_prepared("lukas")

[(5, 'lukas', 'youll_never_know                                                                                                                ')]


However, if we now try to get information for the user `' OR '1'='1` (or inject malicious code into the query), the username is interpreted as a string, and therefore only the matching user's information is returned (previously, the whole table was returned).

In [11]:
get_user_info_prepared("' OR '1'='1")

[(7, "' OR '1'='1", 'nopassword                                                                                                                      ')]


It is also no longer possible to delete the `users` table. The malicious code `DROP TABLE users;` is **not** executed. The user `'; DROP TABLE users; SELECT 42 WHERE 42='42` does not exist in the table `users`, and therefore an empty result is returned.

In [12]:
get_user_info_prepared("'; DROP TABLE users; SELECT 42 WHERE 42='42")

[]


We can verify this by querying the `users` table afterwards.

In [13]:
execute_query("SELECT * FROM users;")

[(1, 'immanuel', 'ThisIsImmi                                                                                                                      '), (2, 'joris', 'bestpwintheworld.                                                                                                               '), (3, 'kai', 'secretstr1ng                                                                                                                    '), (4, 'felix', 'gueswaht?                                                                                                                       '), (5, 'lukas', 'youll_never_know                                                                                                                '), (6, 'marcel', 's3cby0psc                                                                                                                       '), (7, "' OR '1'='1", 'nopassword                                                                                           

**Note:** The `psycopg` PostgreSQL database adapter does not use the SQL statements `PREPARE` and `EXECUTE` internally. Instead it uses ["protocol level commands such as the ones exposed by `PQsendPrepare`, `PQsendQueryPrepared`"](https://www.psycopg.org/psycopg3/docs/advanced/prepare.html).

## Excursion: psycopg Placeholder vs Prepared Statements

The definition of our `get_user_info_prepared()` function does not only set the `prepare` flag to `True` but also uses the `%s` placeholder in the query string to mark the position of the parameter, whose value is passed separately via `params`.
This alone already separates the user input from the SQL code and prevents the SQL injection.

In [14]:
def get_user_info_placeholder(username):
    # open cursor to perform db operation
    cur = conn.cursor()
    # we provide the user input as a parameter but set the `prepare` flag to False
    cur.execute(query="SELECT * FROM users WHERE username=%s;", params=(username,), prepare=False)
    res = cur.fetchall()
    print(res)

Therefore, the following query correctly returns the information for user `' OR '1'='1`. Note that we set the `prepare` flag to `False` in the above definition of the function `get_user_info_placeholder`.

In [15]:
get_user_info_placeholder("' OR '1'='1")

[(7, "' OR '1'='1", 'nopassword                                                                                                                      ')]


Prepared statements are not only used to prevent SQL injections but also to optimize query execution: the query is cached, which avoids re-compiling it when the same query (even with different parameters) is sent to the database server multiple times.
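
The caching idea can be illustrated with a toy model. This is only a sketch under strong simplifications: the class `ToyServer` and all of its methods are hypothetical and do not reflect how PostgreSQL or `psycopg` are implemented. It merely counts how often a query text has to be parsed.

```python
class ToyServer:
    """Toy model of a server that counts how often query text is parsed."""

    def __init__(self):
        self.prepared = {}    # statement name -> parsed query (here: just the text)
        self.parse_count = 0  # number of times a query text was parsed

    def _parse(self, query):
        self.parse_count += 1
        return query

    def run_unprepared(self, query, params):
        # The query text is parsed again on every call
        return (self._parse(query), params)

    def prepare(self, name, query):
        # The query text is parsed once and stored under a name
        self.prepared[name] = self._parse(query)

    def run_prepared(self, name, params):
        # The stored result is reused; only the parameters change
        return (self.prepared[name], params)


query = "SELECT * FROM users WHERE username = $1"
names = ("lukas", "marcel", "kai")

server = ToyServer()
for name in names:
    server.run_unprepared(query, (name,))
unprepared_parses = server.parse_count

server = ToyServer()
server.prepare("get_user_info", query)
for name in names:
    server.run_prepared("get_user_info", (name,))
prepared_parses = server.parse_count

print(unprepared_parses, prepared_parses)  # 3 1
```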

If we use the `psycopg` database adapter, it is sufficient to separate the input data from the SQL code by using placeholders. By setting the `prepare` flag to `True`, we can also increase efficiency by caching the plan.

**Note:** When using PostgreSQL natively, we have to use prepared statements (SQL keywords `PREPARE` and `EXECUTE` as shown before) to separate the input data from the SQL code.