# SQL Injection

In this notebook, we show a simple SQL injection example in which we inject and execute arbitrary SQL code in a DuckDB database. Afterwards, we discuss the state-of-the-art solution to prevent SQL injections.

The notebook is based on our analogous notebook [SQL Injection](https://github.com/BigDataAnalyticsGroup/bigdataengineering/blob/master/SQL%20Injection.ipynb), which shows SQL injection, and in particular prepared statements, in the context of a [PostgreSQL](https://www.postgresql.org/) database using the [`psycopg`](https://www.psycopg.org/psycopg3/docs/index.html) package for Python.

Copyright Joris Nix & Jens Dittrich, [Big Data Analytics Group](https://bigdata.uni-saarland.de/), [CC-BY-SA](https://creativecommons.org/licenses/by-sa/4.0/legalcode)

## Setup 

In [1]:
# dataset
!cat data/sql-injection/users.csv

id,username,pwd
0,immanuel,ThisIsImmi
1,joris,'bestpwintheworld.
2,kai,'secretstr1ng
3,felix,'gueswaht?
4,lukas,'youll_never_know
5,marcel,'s3cby0psc
6,' OR '1'='1,nopassword


In [2]:
import duckdb

conn = duckdb.connect(database=':memory:')

def init_db(conn):
    
    # Drop table if it exists
    conn.execute("DROP TABLE IF EXISTS users;")

    # Create users table
    conn.execute("""CREATE TABLE users (
                        id INTEGER PRIMARY KEY,
                        username VARCHAR(42) UNIQUE,
                        pwd char(128));""")
    
    conn.execute("COPY users FROM './data/sql-injection/users.csv' (FORMAT CSV, HEADER, DELIMITER ',')")
    
init_db(conn)

## SQL Injection Example

First, we define a function that queries the database for information about a user with the name `username`.

In [3]:
def execute_query(query):
    conn.execute(query) # execute query
    res = conn.fetchall() # fetch results
    display(res)
    
def get_user_info(username):
    # vulnerable: the user input is concatenated directly into the SQL text
    statement = "SELECT * FROM users WHERE username = '" + username + "'"
    execute_query(statement)

We expect the user to enter something like this.

In [4]:
username = "marcel" 
    
get_user_info(username)

[(5, 'marcel', "'s3cby0psc")]

However, the user can also enter something like this, which happens to be a valid username in our system.

In [5]:
username = "' OR '1'='1"

This constructs the following query,
```
SELECT * FROM users WHERE username = '' OR '1'='1'
```
which results in a `WHERE` clause that is always true and therefore returns the complete `users` table.
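
To make the constructed query text visible without touching the database, the concatenation can be reproduced in isolation. This is a minimal sketch; `build_statement` is a hypothetical helper that mirrors the string building in `get_user_info` and is not part of the notebook's own code.

```python
def build_statement(username):
    # Same string concatenation as in get_user_info, but the text is
    # returned instead of being sent to the database.
    return "SELECT * FROM users WHERE username = '" + username + "'"

benign = build_statement("marcel")
injected = build_statement("' OR '1'='1")

print(benign)    # the input stays inside the quotes
print(injected)  # the input closes the quote and adds its own condition
```

In the second case the quote characters supplied by the user become part of the SQL syntax, which is exactly the mixing of code and data that the attack relies on.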

In [6]:
get_user_info(username) # prints the whole table content

[(0, 'immanuel', 'ThisIsImmi'),
 (1, 'joris', "'bestpwintheworld."),
 (2, 'kai', "'secretstr1ng"),
 (3, 'felix', "'gueswaht?"),
 (4, 'lukas', "'youll_never_know"),
 (5, 'marcel', "'s3cby0psc"),
 (6, "' OR '1'='1", 'nopassword')]

In [7]:
# or even this
username = "'; DROP TABLE users; SELECT 42 WHERE '42'='42"

This constructs the following query.
```
SELECT * FROM users WHERE username = ''; DROP TABLE users;
SELECT 42 WHERE '42'='42'
```
After executing this query, the table `users` and all its contents are deleted from the database.

In [8]:
get_user_info(username)

[(42,)]

We can check that the `users` table was indeed deleted by trying to retrieve its tuples afterwards, which raises a catalog error because the table no longer exists.

In [9]:
statement = "SELECT * FROM users;"
try:
    execute_query(statement)
except Exception as e:
    print(e)

Catalog Error: Table with name users does not exist!
Did you mean "temp.information_schema.tables"?
LINE 1: SELECT * FROM users;
                      ^


## Prevent Attack using Prepared Statements

The underlying problem that makes SQL injections possible is the mixing of code and data. SQL injections can therefore be prevented by sending the data (user input) to the database server separately from the SQL code.

In DuckDB, this can be achieved by using so-called [prepared statements](https://duckdb.org/docs/api/python/dbapi#querying). Prepared statements allow us to define a parameterized query once and supply the parameter values when executing it. For the example above, the prepared statement would look as follows:
```SQL
PREPARE get_user_info AS
    SELECT *
    FROM users
    WHERE username=$1;
```
To run the prepared statement, we can use the `EXECUTE` statement.
```SQL
EXECUTE get_user_info('marcel');
```
This ensures that, when executing `get_user_info`, the entire user-provided username is interpreted as a string, and that code and data are sent separately to the database server.


Below, we define a secure function to query the database for user information using a prepared statement.

**Note:** Prepared statements are not only used to prevent SQL injections but also to optimize query execution: the query is cached, which avoids re-compiling it when the same query (even with different parameters) is sent to the database server multiple times. For more information, refer to [this part](https://duckdb.org/docs/api/c/prepared) of the DuckDB documentation.

In [10]:
def get_user_info_prepared(username):
    # we provide the user input as a parameter
    conn.execute("SELECT * FROM users WHERE username=?;", [username])
    res = conn.fetchall()
    display(res)

# initialize database again (`users` table previously deleted)
init_db(conn)

When executing the secure function using a valid username, we receive the expected output.

In [11]:
get_user_info_prepared("lukas")

[(4, 'lukas', "'youll_never_know")]

However, if we now try to get information for the user `' OR '1'='1` (or inject malicious code into the query), the username is interpreted as a string, and therefore only the matching user's information is returned (previously, the whole table was returned).

In [12]:
get_user_info_prepared("' OR '1'='1")

[(6, "' OR '1'='1", 'nopassword')]

It is also no longer possible to delete the `users` table. The malicious code `DROP TABLE users;` is **not** executed. No user named `'; DROP TABLE users; SELECT 42 WHERE 42='42` exists in the table `users`, and therefore an empty result is returned.

In [13]:
get_user_info_prepared("'; DROP TABLE users; SELECT 42 WHERE 42='42")

[]

We can verify this by querying the `users` table afterwards.

In [14]:
execute_query("SELECT * FROM users;")

[(0, 'immanuel', 'ThisIsImmi'),
 (1, 'joris', "'bestpwintheworld."),
 (2, 'kai', "'secretstr1ng"),
 (3, 'felix', "'gueswaht?"),
 (4, 'lukas', "'youll_never_know"),
 (5, 'marcel', "'s3cby0psc"),
 (6, "' OR '1'='1", 'nopassword')]