# PBKDF2

Password Based Key Derivation Function 2.

PBKDF2 is part of PKCS #5 v2.0, which is also published as [RFC-2898](https://tools.ietf.org/rfc/rfc2898.txt). [NIST 800-132](https://csrc.nist.gov/publications/detail/sp/800-132/final) gives recommendations for its use in password-based key derivation.

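Passwords are used to encrypt and decrypt data. However, passwords chosen by humans are not suitable for direct use as encryption keys: they are typically short, of variable length and low in entropy. A password-based key derivation function derives an encryption key of the required length from a user password. Conceptually, the PBKDF is `key = repeat(10000, hash(password+salt))`: a salted hash of the password, iterated many times to make guessing expensive.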
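
As a rough illustration of that one-line summary, the sketch below repeatedly hashes a salted password. The name `naive_stretch` and the choice of SHA-256 are assumptions made for this example only. It is not PBKDF2 itself, which uses a keyed pseudorandom function and XORs the intermediate results, as the rest of this notebook shows.

```python
import hashlib

def naive_stretch(password: bytes, salt: bytes, rounds: int = 10000) -> bytes:
    """Illustration only: hash password+salt, then re-hash the digest (rounds - 1) more times."""
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    digest = hashlib.sha256(password + salt).digest()
    for _ in range(rounds - 1):
        digest = hashlib.sha256(digest).digest()
    return digest

key = naive_stretch(b"password", b"salt")
print(len(key))  # 32 octets, the SHA-256 digest size
```

Raising `rounds` makes every password guess proportionally more expensive for an attacker, which is the point of the iteration count.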

In [1]:
import math, hmac

In [2]:
# Function to XOR byte strings directly, instead of converting to integers and back.
def bytexor(b1, b2):
    # pre-pad the byte strings with \x00 to get both strings of equal length.
    l1 = len(b1)
    l2 = len(b2)
    if (l1 > l2):
        d = l1 - l2
        zp = b'\x00' * d
        b2 = zp + b2
    else:
        d = l2 - l1
        zp = b'\x00' * d
        b1 = zp + b1
    l = len(b1)

    # Byte by byte XOR
    xor_result = b''
    for i in range(0, l):
        x = b1[i] ^ b2[i]
        xor_result = xor_result + x.to_bytes(1, byteorder='big')
    return xor_result

In [3]:
# Test for bytexor

# A = b'\x55\xAA'
# B = b'\xAA\x55'
# print(bytexor(A, B))
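
A more compact way to write the same left-padded XOR is sketched below. The name `xor_bytes` is hypothetical and not part of the notebook; it relies on `bytes.rjust` for the zero padding and on `zip` for the pairwise XOR, and it is exercised with the test values from the commented-out cell above.

```python
def xor_bytes(b1: bytes, b2: bytes) -> bytes:
    """XOR two byte strings, left-padding the shorter one with zero bytes."""
    n = max(len(b1), len(b2))
    b1 = b1.rjust(n, b"\x00")
    b2 = b2.rjust(n, b"\x00")
    return bytes(x ^ y for x, y in zip(b1, b2))

print(xor_bytes(b"\x55\xAA", b"\xAA\x55"))  # b'\xff\xff'
```

Building the result with a single `bytes(...)` call avoids the repeated concatenation in the loop, which matters little for 20-octet digests but keeps the function short.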

# KDF

The master key is built by concatenating several blocks. Each block ($T_1, T_2, T_3 ... T_{len}$) that contributes to the master key ($MK$) is the XOR of the outputs of $C$ chained iterations of the pseudorandom function, keyed with the password. The salt for the first iteration is the user-specified salt concatenated with the block number $i$, encoded as a 4-octet big-endian integer. For each subsequent iteration, the output of the previous iteration is used as the salt.
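
In the notation of RFC 2898, the computation of one block can be written as follows. The symbols $U_j$, $P$, $S$ and $INT(i)$ are borrowed from the RFC and are not used elsewhere in this notebook: $P$ is the password, $S$ the salt and $INT(i)$ the 4-octet big-endian encoding of the block number.

$$
\begin{aligned}
U_1 &= PRF(P,\ S \mid INT(i)) \\
U_j &= PRF(P,\ U_{j-1}), \quad j = 2, \dots, C \\
T_i &= U_1 \oplus U_2 \oplus \dots \oplus U_C
\end{aligned}
$$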

![Flow chart of PBKDF2](https://upload.wikimedia.org/wikipedia/commons/7/70/Pbkdf2_nist.png)

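The number of blocks is the desired key length divided by the output length of the pseudorandom function, rounded up. For example, with a desired key length of 25 octets (`dkLen = 25`) and HMAC-SHA1 as the pseudorandom function, whose output is 20 octets (`hLen = 20`), this gives `len = 2`: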

$$
len = \big\lceil{\frac{dkLen}{hLen}}\big\rceil
$$

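The master key is formed by concatenating the blocks, keeping only the first $r$ octets of the last one, where $r = dkLen - (len - 1) \cdot hLen$:

$$
MK = T_1 | T_2 | T_3 ... T_{len-1} | T_{len}\langle 0 ... r-1 \rangle
$$

In the example above, $r = 25 - 20 = 5$, so we take the first block $T_1$ as is and append the first 5 (most significant) octets of $T_2$ to form the 25-octet master key.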
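
The block arithmetic can be checked with a few lines of Python. This is a small sketch; the helper name `block_layout` is an assumption made for illustration and is not used by the notebook's `KDF` function.

```python
import math

def block_layout(dkLen: int, hLen: int) -> tuple:
    """Return (number of blocks, octets taken from the last block)."""
    n_blocks = math.ceil(dkLen / hLen)
    r = dkLen - (n_blocks - 1) * hLen
    return n_blocks, r

print(block_layout(25, 20))  # (2, 5)
print(block_layout(45, 20))  # (3, 5)
```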

In [4]:
def KDF(password, salt, iterC, dkLen):
    # Digest length of the PRF output in octets - 20 for HMAC-SHA1
    hLen = 20

    # Raise an error if the requested key length is too long (more than 2^32 - 1 blocks).
    if dkLen > 4294967295 * hLen:
        raise ValueError('derived key too long')

    # Number of blocks (hashes of hLen octets each) in the derived key.
    l = math.ceil(dkLen / hLen)
    print("Number of blocks for key length " + str(dkLen) + " = " + str(l))
    # Remainder - octets that the last processed block contributes to the derived key
    r = dkLen - (l - 1) * hLen
    print("Octets included from the last block : " + str(r) + " bytes")

    # Derived key blocks list - T.
    T = [b''] * l
    # For each block,
    for i in range(0, l):
        print("i = " + str(i))
        T[i] = F(password, salt, iterC, i + 1)
    masterKey = ""
    if (l >= 2):
        masterKey = "".join([kp.hex() for kp in T[0:l - 1]])
    masterKey = masterKey + T[l - 1][0:r].hex()
    return masterKey

In [5]:
def F(password, salt, iterC, i):
    # Start with a blank key.
    K = b''
    # Salt for the first iteration is user_specified_salt | block#
    salt_in = salt + (i).to_bytes(length=4, byteorder='big')
    for c in range(0, iterC):
        PRF_out = PRF(password, salt_in)
        # Salt for next iteration is output of the pseudorandom function from the last round.
        salt_in = PRF_out
        #print("POUT:"+str(Pout))
        K = bytexor(K, PRF_out)
        print("   K : " + K.hex() + "\n")
    return K

In [6]:
# Pseudorandom Function - using SHA1-HMAC
def PRF(password, salt):
    #print("SALT: "+str(salt))
    H = hmac.new(password, salt, 'sha1')
    #print("LD:"+str(len(H.digest())))
    print("   PRF(" + str(password) + ", " + str(salt) + ") \n      = " + str(H.digest()))
    return H.digest()

# PBKDF 1

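The differences between PBKDF1 and PBKDF2 are:

**Hash function instead of pseudorandom function**
PBKDF1 applies a hash function such as `MD2`, `MD5` or `SHA1` directly instead of a keyed `PRF`. As defined in RFC 2898, the password and salt are hashed once and the result is then re-hashed for each remaining iteration:

$T_1 = hash(password | salt)$

$T_2 = hash(T_1)$

$T_3 = hash(T_2)$

$...$

$T_{C} = hash(T_{C-1})$

The derived key is the first $dkLen$ octets of $T_C$. PBKDF2 instead computes intermediate values $U_n = PRF(password, U_{n-1})$, keying the PRF with the password in every iteration, and XORs them together. Although the two constructions look functionally similar, the keyed PRF is intended to be more resistant to cryptanalysis.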



**Derived Key length limited by hash size**
A PBKDF1 derived key can be no longer than the output of the hash function used: 20 octets for SHA1, 16 octets for MD2 and MD5.
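
For comparison with the PBKDF2 code above, here is a minimal sketch of PBKDF1 as it is defined in RFC 2898: only the first round hashes the password and salt, and each later round re-hashes the previous digest. The function name `pbkdf1_sketch` and its `hash_name` parameter are assumptions made for this example. No reference output is shown because the notebook has no PBKDF1 test vectors.

```python
import hashlib

def pbkdf1_sketch(password: bytes, salt: bytes, iterC: int, dkLen: int,
                  hash_name: str = "sha1") -> bytes:
    """Sketch of PBKDF1: T_1 = Hash(P | S), T_j = Hash(T_{j-1}), DK = first dkLen octets of T_C."""
    hLen = hashlib.new(hash_name).digest_size
    # Unlike PBKDF2, the derived key cannot exceed one hash output.
    if dkLen > hLen:
        raise ValueError("derived key too long")
    if iterC < 1:
        raise ValueError("iteration count must be at least 1")
    T = hashlib.new(hash_name, password + salt).digest()
    for _ in range(iterC - 1):
        T = hashlib.new(hash_name, T).digest()
    return T[:dkLen]

dk = pbkdf1_sketch(b"password", b"salt", 1000, 16)
print(len(dk))  # 16
```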

## Test

In [7]:
# Self-made test vector to demonstrate the steps of PBKDF2.
MK_test=KDF(b'password',b'salt',4,45)

Number of blocks for key length 45 = 3
Octets included from the last block : 5 bytes
i = 0
   PRF(b'password', b'salt\x00\x00\x00\x01') 
      = b'\x0c`\xc8\x0f\x96\x1f\x0eq\xf3\xa9\xb5$\xaf`\x12\x06/\xe07\xa6'
   K : 0c60c80f961f0e71f3a9b524af6012062fe037a6

   PRF(b'password', b'\x0c`\xc8\x0f\x96\x1f\x0eq\xf3\xa9\xb5$\xaf`\x12\x06/\xe07\xa6') 
      = b'\xe6\x0c\xc9BQ2a\xfd>\xb7l\x0ea}S\xf6\xf7>\xbe\xf1'
   K : ea6c014dc72d6f8ccd1ed92ace1d41f0d8de8957

   PRF(b'password', b'\xe6\x0c\xc9BQ2a\xfd>\xb7l\x0ea}S\xf6\xf7>\xbe\xf1') 
      = b'\x81"\'_\x9b\x08\xa0\xadc+3\xf3\x9b\xe98\x1a\xf6\xaf\x7f\xa8'
   K : 6b4e26125c25cf21ae35ead955f479ea2e71f6ff

   PRF(b'password', b'\x81"\'_\x9b\x08\xa0\xadc+3\xf3\x9b\xe98\x1a\xf6\xaf\x7f\xa8') 
      = b'\xaf\x8c=\xe0\xe7\xd3\xda`\xee\xbb(}\xc9}\xc0,\xb1\x05\xc6\x99'
   K : c4c21bf2bbf61541408ec2a49c89b9c69f743066

i = 1
   PRF(b'password', b'salt\x00\x00\x00\x02') 
      = b'\xe0\xf0\xeb\x94\xfe\x8f\xc4k\xdccqd\xac.z\x8e?\x9d.\x83'
   K : e0f0eb94

In [8]:
print(MK_test)
print("Len = "+str(int(len(MK_test)/2)))

c4c21bf2bbf61541408ec2a49c89b9c69f743066f3c034d7a789a6922cdd362069e9a8c2b9171164f55ed77999
Len = 45


[RFC-6070](https://tools.ietf.org/rfc/rfc6070.txt) - Test vectors for PBKDF2.
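
As an independent cross-check, the same vectors can be computed with `hashlib.pbkdf2_hmac` from the Python standard library. This sketch is an addition to the notebook's own `KDF` function and does not replace it. The two expected values are the 1-iteration and 2-iteration results for password `password` and salt `salt` with a 20-octet key.

```python
import hashlib

# (password, salt, iterations, dkLen, expected hex digest)
vectors = [
    (b"password", b"salt", 1, 20, "0c60c80f961f0e71f3a9b524af6012062fe037a6"),
    (b"password", b"salt", 2, 20, "ea6c014dc72d6f8ccd1ed92ace1d41f0d8de8957"),
]

results = []
for password, salt, iterC, dkLen, expected in vectors:
    dk = hashlib.pbkdf2_hmac("sha1", password, salt, iterC, dkLen)
    results.append(dk.hex() == expected)
    print(iterC, dk.hex(), dk.hex() == expected)
```

The library call returns raw bytes, whereas `KDF` in this notebook returns a hex string, hence the `.hex()` conversion before comparing.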

In [9]:
MK1 = KDF(b'password', b'salt', 1, 20)

Number of blocks for key length 20 = 1
Octets included from the last block : 20 bytes
i = 0
   PRF(b'password', b'salt\x00\x00\x00\x01') 
      = b'\x0c`\xc8\x0f\x96\x1f\x0eq\xf3\xa9\xb5$\xaf`\x12\x06/\xe07\xa6'
   K : 0c60c80f961f0e71f3a9b524af6012062fe037a6



In [10]:
print(MK1)
print("Len = "+str(int(len(MK1)/2)))

0c60c80f961f0e71f3a9b524af6012062fe037a6
Len = 20


In [11]:
MK2 = KDF(b'password', b'salt', 2, 20)

Number of blocks for key length 20 = 1
Octets included from the last block : 20 bytes
i = 0
   PRF(b'password', b'salt\x00\x00\x00\x01') 
      = b'\x0c`\xc8\x0f\x96\x1f\x0eq\xf3\xa9\xb5$\xaf`\x12\x06/\xe07\xa6'
   K : 0c60c80f961f0e71f3a9b524af6012062fe037a6

   PRF(b'password', b'\x0c`\xc8\x0f\x96\x1f\x0eq\xf3\xa9\xb5$\xaf`\x12\x06/\xe07\xa6') 
      = b'\xe6\x0c\xc9BQ2a\xfd>\xb7l\x0ea}S\xf6\xf7>\xbe\xf1'
   K : ea6c014dc72d6f8ccd1ed92ace1d41f0d8de8957



In [12]:
print(MK2)
print("Len = "+str(int(len(MK2)/2)))

ea6c014dc72d6f8ccd1ed92ace1d41f0d8de8957
Len = 20


In [13]:
# MK3=KDF(b'password',b'salt',4096,20)

In [14]:
# print(MK3)
# print("Len = "+str(int(len(MK3)/2)))

In [15]:
# MK4=KDF(b'passwordPASSWORDpassword', b'saltSALTsaltSALTsaltSALTsaltSALTsalt', 4096, 25)

In [16]:
# print(MK4)
# print("Len = "+str(int(len(MK4)/2)))

In [17]:
# MK5=KDF(b'pass\x00word', b'sa\x00lt', 4096, 16)

In [18]:
# print(MK5)
# print("Len = "+str(int(len(MK5)/2)))

In [19]:
# MK6 = KDF(b'password', b'ATHENA.MIT.EDUraeburn', 1, 16)

In [20]:
# print(MK6)
# print("Len = "+str(int(len(MK6)/2)))

### Reference
1. [Wikipedia](https://en.wikipedia.org/wiki/PBKDF2)
2. [RFC-2898](https://tools.ietf.org/rfc/rfc2898.txt)
3. [NIST 800-132](https://csrc.nist.gov/publications/detail/sp/800-132/final)
4. [RFC-6070](https://tools.ietf.org/rfc/rfc6070.txt) - Test vectors for PBKDF2.