# Salt

When data tends to be similar across users (for example, commonly used passwords) and the database holding all the hashes is compromised, an attacker can check whether any of the hashes matches an entry in a pre-computed table. Examples of this are [Rainbow Attacks](https://en.wikipedia.org/wiki/Rainbow_table) and [Dictionary Attacks](https://en.wikipedia.org/wiki/Dictionary_attack).

If a random array of bytes is added to the data before hashing, the results will not match any pre-computed table an attacker might have. This also means the random bytes must be stored along with the hash so that the hash can be validated later. These bytes are called a **salt**, and they are not secret (i.e. they can be stored in plain text). Even though one might think that salts should be kept secret, there are other vulnerabilities associated with *secret salts*, such as [Length Extension Attacks](https://www.wikiwand.com/en/Length_extension_attack).

The hashlib module provides the function [scrypt](https://docs.python.org/3/library/hashlib.html#hashlib.scrypt), which offers a convenient interface for adding salts to hashes. This function implements [RFC 7914](https://datatracker.ietf.org/doc/html/rfc7914.html); it was designed to be memory-intensive in order to hinder GPU, ASIC and FPGA attacks.

**Note:** A salt can be of any length; however, a minimum of 128 bits (16 bytes) is recommended. Several examples below use a 10-byte salt, which is shorter than this recommendation.

In [128]:
import hashlib
import os

random_bytes = os.urandom(10)
data = b"Hello World!"
data_hashed = hashlib.scrypt(data, salt=random_bytes, n=64, r=8, p=1).hex()
salt_string = random_bytes.hex()

print(f"   Original: {data}")
print(f"     Hashed: {data_hashed}")
print(f"Salt+Hashed: {salt_string}:{data_hashed}")

   Original: b'Hello World!'
     Hashed: 74c846a532a6fabae3f3164072c3e1e2a0e6a0c98401377cdc9841f19d4e4722d5cff8a4fe2e5665f56e15ce3974a3006d6438fe060c21efcc41ef4a55b1c334
Salt+Hashed: a6dcbf98989d490ae241:74c846a532a6fabae3f3164072c3e1e2a0e6a0c98401377cdc9841f19d4e4722d5cff8a4fe2e5665f56e15ce3974a3006d6438fe060c21efcc41ef4a55b1c334
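
To make the effect of the salt concrete, the following minimal sketch hashes the same input with two different random salts and then recomputes one of the hashes from its stored salt. The variable names (`salt_a`, `hash_a`, and so on) are illustrative only, and the cost parameters are the same small values used above.

```python
import hashlib
import os

password = b"password123"

# Two users pick the same password but receive different random salts.
salt_a = os.urandom(16)
salt_b = os.urandom(16)

hash_a = hashlib.scrypt(password, salt=salt_a, n=64, r=8, p=1)
hash_b = hashlib.scrypt(password, salt=salt_b, n=64, r=8, p=1)

# Identical input, different salts: the stored hashes do not match each other.
hashes_differ = hash_a != hash_b

# Hashing again with the stored salt reproduces the stored hash,
# which is why the salt has to be kept next to it.
recomputed = hashlib.scrypt(password, salt=salt_a, n=64, r=8, p=1)
reproducible = recomputed == hash_a

print(f"Hashes differ: {hashes_differ}")
print(f"Reproducible with stored salt: {reproducible}")
```

Because each salt is random, a table pre-computed for one salt is of no use against a hash stored with another.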


## Scrypt Disadvantages

The hashed string depends on several parameters: **n**, **r**, **p** and the **salt**. Unless they are hardcoded in the source code, it is good practice to store them along with the hash.

The delimiter character is usually either "**$**" or "**:**". Another option is to store each value in its own database column.

In [150]:
import os

random_bytes = os.urandom(10)
data = b"Hello World!"

n = 2 ** 6
r = 8
p = 1
data_hashed = hashlib.scrypt(data, salt=random_bytes, n=n, r=r, p=p).hex()
salt_string = random_bytes.hex()

print(f"n+r+p+Salt+Hashed: {n}${r}${p}${salt_string}${data_hashed}")

n+r+p+Salt+Hashed: 64$8$1$e05d9ea8cf8acf9243d0$22c4c4e3a1cfa785989d29fea1a5d01dc860172f09cf04c57b8096246a72076e593691b83bb9b25a3005077988bfcc67ce1616ea22751ba5275ab67d2a1e6ca6
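
Storing the parameters only helps if they can be read back at verification time. The sketch below shows one possible round trip, assuming the `$`-delimited layout used above. The helper names `encode_record` and `verify_record` are hypothetical and not part of hashlib.

```python
import hashlib
import hmac
import os


def encode_record(data: bytes, n: int = 64, r: int = 8, p: int = 1) -> str:
    """Hash data with a fresh salt and pack n, r, p, salt and hash into one string."""
    salt = os.urandom(16)
    digest = hashlib.scrypt(data, salt=salt, n=n, r=r, p=p)
    return f"{n}${r}${p}${salt.hex()}${digest.hex()}"


def verify_record(data: bytes, record: str) -> bool:
    """Re-hash data with the parameters stored in record and compare the results."""
    n, r, p, salt_hex, digest_hex = record.split("$")
    expected = bytes.fromhex(digest_hex)
    candidate = hashlib.scrypt(
        data,
        salt=bytes.fromhex(salt_hex),
        n=int(n),
        r=int(r),
        p=int(p),
        dklen=len(expected),  # derive the same number of bytes that was stored
    )
    # Constant-time comparison, to avoid leaking how many bytes matched.
    return hmac.compare_digest(candidate, expected)


record = encode_record(b"Hello World!")
print(verify_record(b"Hello World!", record))
print(verify_record(b"hello world!", record))
```

Since the parameters travel with each hash, they can be raised later for new records without invalidating the old ones.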


## Scrypt Bonuses

Using the scrypt function also provides some additional advantages.

### Customizing Length

There is an additional parameter, **dklen**, which sets the length in bytes of the derived key, so outputs of arbitrary length can be generated. The default value is 64.

**Important Note:** Using short lengths will increase collision risks.

In [129]:
import os

data = b"Hello World!"

In [136]:
length = 2**2
random_bytes = os.urandom(10)

data_hashed = hashlib.scrypt(data, salt=random_bytes, n=64, r=8, p=1, dklen=length).hex()
salt_string = random_bytes.hex()

print(f"Salt+Hashed: {salt_string}:{data_hashed}")

Salt+Hashed: 6fab1a0f0dbf318b8734:d76b29b0


In [137]:
length = 2**4
random_bytes = os.urandom(10)

data_hashed = hashlib.scrypt(data, salt=random_bytes, n=64, r=8, p=1, dklen=length).hex()
salt_string = random_bytes.hex()

print(f"Salt+Hashed: {salt_string}:{data_hashed}")

Salt+Hashed: 4468a52d7b848e964a8d:1087eb055739f2aa51a5a6429c3fbd13


In [138]:
length = 2**6
random_bytes = os.urandom(10)

data_hashed = hashlib.scrypt(data, salt=random_bytes, n=64, r=8, p=1, dklen=length).hex()
salt_string = random_bytes.hex()

print(f"Salt+Hashed: {salt_string}:{data_hashed}")

Salt+Hashed: 0572bf6d3bde90ae8d2e:f4ed107ddb7343a36b9a686554631339b10e6a10d7f90c491dc4193261002ca0292e8834ab904dcea9fa87e9221f03215505484f562606787a06b7e3153a69a6
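
Note that **dklen** counts bytes, while the strings printed above are hexadecimal, so they contain twice as many characters. The short sketch below checks this relationship for the three lengths used in the cells above. The `lengths` dictionary is only there to collect the results.

```python
import hashlib
import os

salt = os.urandom(16)
lengths = {}

for dklen in (2**2, 2**4, 2**6):
    key = hashlib.scrypt(b"Hello World!", salt=salt, n=64, r=8, p=1, dklen=dklen)
    # key is a bytes object with dklen bytes; its hex form has 2 * dklen characters.
    lengths[dklen] = (len(key), len(key.hex()))
    print(f"dklen={dklen}: {len(key)} bytes, {len(key.hex())} hex characters")
```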


### Customizing Execution Time

In some applications, making the hash slower can increase security, because every guess an attacker makes becomes more expensive. The scrypt algorithm can be tuned to use different amounts of memory and processing time.

The memory used is approximately: 128 * n * r * p bytes

Examples:

- Low Memory Footprint: 128 * 64 * 8 * 1 = 64 KB
- Large Memory Footprint: 128 * 2^17 * 8 * 1 = 128 MB

In [110]:
n = 2 ** 20
r = 8
p = 1
memory_bytes = 128 * n * r * p
memory_kilo_bytes = memory_bytes / 1024
memory_mega_bytes = memory_kilo_bytes / 1024
print(f"Memory Consumed: {memory_bytes} bytes = {memory_kilo_bytes:.2f} KB = {memory_mega_bytes:.2f} MB")

Memory Consumed: 1073741824 bytes = 1048576.00 KB = 1024.00 MB
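
The memory estimate matters in practice because `hashlib.scrypt` also accepts a `maxmem` argument. Its default of 0 leaves the limit to OpenSSL, which is commonly 32 MB, so larger settings may be rejected unless `maxmem` is raised. The following is a minimal sketch under that assumption. The helper `estimated_memory` is hypothetical, and doubling the estimate is an arbitrary safety margin, not a recommendation.

```python
import hashlib
import os


def estimated_memory(n: int, r: int, p: int) -> int:
    """Approximate scrypt memory use in bytes, using the formula above."""
    return 128 * n * r * p


n, r, p = 2**15, 8, 1
needed = estimated_memory(n, r, p)
print(f"Estimated memory: {needed / 1024 / 1024:.2f} MB")

# Pass an explicit limit with some headroom above the estimate;
# the factor of 2 is an assumption made for this sketch.
key = hashlib.scrypt(
    b"Hello World!",
    salt=os.urandom(16),
    n=n,
    r=r,
    p=p,
    maxmem=2 * needed,
)
print(f"Derived key length: {len(key)} bytes")
```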


##### Code Examples

In [99]:
import os

data = b"Hello World!"

In [105]:
%%timeit
n = 2**6  # 64
random_bytes = os.urandom(16)
data_hashed = hashlib.scrypt(data, salt=random_bytes, n=n, r=8, p=1).hex()

419 µs ± 20.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [101]:
%%timeit
n = 2**12
random_bytes = os.urandom(16)
data_hashed = hashlib.scrypt(data, salt=random_bytes, n=n, r=8, p=1).hex()

6.18 ms ± 166 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [120]:
%%timeit
n = 2**14
random_bytes = os.urandom(16)
data_hashed = hashlib.scrypt(data, salt=random_bytes, n=n, r=8, p=1).hex()

104 ms ± 2.67 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
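
The `%%timeit` magic is only available in IPython and Jupyter. Outside a notebook, a rough comparison can be sketched with `time.perf_counter`, as below. Each setting is measured once, so the numbers are noisier than the `%%timeit` results above and will vary between machines.

```python
import hashlib
import os
import time

data = b"Hello World!"
timings = {}

for exponent in (6, 10, 12, 14):
    n = 2**exponent
    salt = os.urandom(16)
    start = time.perf_counter()
    hashlib.scrypt(data, salt=salt, n=n, r=8, p=1)
    # Elapsed wall-clock time in seconds for one hash at this cost setting.
    timings[n] = time.perf_counter() - start

for n, seconds in timings.items():
    print(f"n = {n:>5}: {seconds * 1000:.2f} ms")
```

A measurement like this can help pick the largest **n** that still fits the latency an application can tolerate.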


## Example: User Authentication

### Auxiliary Functions

In [182]:
import hashlib
import os
import secrets

def generate_hash(data: str, salt: bytes) -> str:
    data_bytes = data.encode("utf-8")
    # Use the salt passed in, not a global variable, so each user gets their own salt.
    data_hashed = hashlib.scrypt(data_bytes, salt=salt, n=64, r=8, p=1)
    return f"{salt.hex()}:{data_hashed.hex()}"


def sign_up(email, password, database_):
    database = database_.copy()
    random_bytes = os.urandom(16)  # 128-bit salt
    database[email] = generate_hash(password, random_bytes)
    print("Successfully Signed Up")
    return database


def login(email, password, database):
    if email not in database:
        print(f"ERROR: User {email} not in Database")
        return

    expected_password = database[email]
    salt, hashed = expected_password.split(":")
    salt_bytes = bytes.fromhex(salt)
    calculated_hash = generate_hash(password, salt_bytes)
    passwords_matched = secrets.compare_digest(expected_password, calculated_hash) 
    if passwords_matched:
        print(f"Successfully Signed in: {email}")
        return
    
    print(f"ERROR: Incorrect Password for: {email}")

### Sign Up

In [183]:
email = "johndoe@example.com"
password = "password123"
user_database = {}

user_database = sign_up(email, password, user_database)

Successfully Signed Up


### Wrong Email

In [184]:
email = "janedoe@example.com"
password = "password123"

login(email, password, user_database)

ERROR: User janedoe@example.com not in Database


### Wrong Password

In [185]:
email = "johndoe@example.com"
password = "password"

login(email, password, user_database)

ERROR: Incorrect Password for: johndoe@example.com


### Successful Login

In [186]:
email = "johndoe@example.com"
password = "password123"

login(email, password, user_database)

Successfully Signed in: johndoe@example.com
