### **CSC 369 Project: Distributed Password Cracking**

Goal: Use Ray to build a more efficient system for cracking passwords from a data breach, given a list of common passwords.

Run this block to install the required libraries.

In [2]:
!pip install ray
!pip install bcrypt

Collecting ray
  Downloading ray-1.9.0-cp37-cp37m-manylinux2014_x86_64.whl (57.6 MB)
Collecting redis>=3.5.0
  Downloading redis-4.0.2-py3-none-any.whl (119 kB)
Collecting deprecated
  Downloading Deprecated-1.2.13-py2.py3-none-any.whl (9.6 kB)
Installing collected packages: deprecated, redis, ray
Successfully installed deprecated-1.2.13 ray-1.9.0 redis-4.0.2
Collecting bcrypt
  Downloading bcrypt-3.2.0-cp36-abi3-manylinux2010_x86_64.whl (63 kB)
Installing collected packages: bcrypt
Successfully installed bcrypt-3.2.0


Mount Google Drive (where the data file is stored).

In [3]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


Import libraries.  

In [4]:
import time
import bcrypt
import ray

Define two functions with identical logic: one decorated with `ray.remote` and one plain.

In [5]:
@ray.remote
def hash(password):
    # generate a salt with a cost factor (log rounds) of 8
    salt = bcrypt.gensalt(8)
    # hash the password with the generated salt
    hashed = bcrypt.hashpw(password.encode(), salt)
    # return a tuple with the hash and the password
    return (hashed, password)

# same logic as above, without the Ray decorator
def test_hash(password):
    salt = bcrypt.gensalt(8)
    hashed = bcrypt.hashpw(password.encode(), salt)
    return (hashed, password)
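
When the decorated function is called as `hash.remote(...)`, one Ray task is submitted per password. One possible refinement, which the original experiment does not use, is to group passwords into batches so that each task hashes several of them and the per-task scheduling cost is shared. The sketch below shows only the batching logic in plain Python; `chunked`, `hash_batch`, and the SHA-256 stand-in `fake_hash` are hypothetical names introduced for illustration.

```python
import hashlib
from typing import Callable, Iterator, List, Sequence, Tuple


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items`, each holding at most `size` entries."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def hash_batch(batch: Sequence[str],
               hash_fn: Callable[[str], Tuple[bytes, str]]) -> List[Tuple[bytes, str]]:
    """Hash every password in one batch.

    Assumption: in a Ray setup this would be the body of a single remote task.
    """
    return [hash_fn(password) for password in batch]


def fake_hash(password: str) -> Tuple[bytes, str]:
    """Stand-in for the bcrypt-based function so the sketch needs no extra packages."""
    return (hashlib.sha256(password.encode()).hexdigest().encode(), password)


passwords = ["123456", "password", "12345678", "qwerty", "123456789"]
batches = list(chunked(passwords, 2))
results = [pair for batch in batches for pair in hash_batch(batch, fake_hash)]
print(len(batches), "batches,", len(results), "hashes")
```

With Ray, each batch would be passed to one remote call instead of one call per password. A suitable batch size depends on the cost of each hash and the number of workers, so it would need to be measured.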

Open the data file and hash every password twice: first in parallel with Ray, then sequentially.

In [6]:
# call shutdown first in case Ray is already running
ray.shutdown()

# start up Ray
ray.init()

# open the data file on Google Drive and read one password per line (100000 passwords)
with open('/content/drive/My Drive/passwords.txt', 'r') as password_file:
    lines = [line.strip() for line in password_file]

# start of experiment with Ray
# get start time
start = time.time()
# call the remote function for each line in the dataset
futures = [hash.remote(i) for i in lines]
# get a list of all of the hashes and their corresponding password
hashes = ray.get(futures)
# get end time
end = time.time()
# print out the first 10 results to verify that it worked
print(hashes[:10])
# print the total time ray took to run
print('RAY TIME: ', end-start, ' seconds.')

# start of the experiment without Ray
# get start time
start = time.time()
# calculate all of the hashes sequentially
sequential_hashes = [test_hash(i) for i in lines]
# get end time
end = time.time()
# print out the first 10 results to verify that it worked
print(sequential_hashes[:10])
# print the total time it took to run without Ray
print('NORMAL TIME: ', end-start, ' seconds.')

[(b'$2b$08$vZXU2ynDsLKfCR4U4QQNY.Shru8h/3/0ZWv8n0AZm.urRoSVQj45y', '123456'), (b'$2b$08$veCPNpugv3iVKzDtMmrc6OYPVwyr5Nx5MFEtEOWTu2t/1Q.HESt2y', 'password'), (b'$2b$08$qqV1Oz5GKzc/MpooGn4ZwuVS8XbFed03QsQKjMAHni10.mP7Ga7Ci', '12345678'), (b'$2b$08$u9mMcR88j.CYhSJQhE9FeO5j17mtWWcUTXTC.FdzOrOZBydrKFkti', 'qwerty'), (b'$2b$08$/MvV4QsW8Nw0pFDvrg7SYO3XKt.JnnvDoViz7D5DB25MUAAepNjB.', '123456789'), (b'$2b$08$9YvbCeFpxdXMMf.LH1mcBenJ3oGoVq2SGOeV0H3ipZ1cQCx4keQR2', '12345'), (b'$2b$08$c42LXhvtQU4Np9hkr3jRE.9q/L1Dy4uT6FTs16O1HCQJ0kYjTfde6', '1234'), (b'$2b$08$y0JPvYTW4EjlWQriWH8aL..Epqa2LKyDpl/39x3a.32JoY0yc0SZC', '111111'), (b'$2b$08$lCpz.NtBwc2O72WBf4MKUe8XKpv9FMILEbvhMhIpg3ygKb3BApLLu', '1234567'), (b'$2b$08$whucSrC7fst1EAN84nNSfOq2xmHXmf7jL2PtQAHPmRt7GQeqx0GmK', 'dragon')]
RAY TIME:  1182.7197754383087  seconds.
[(b'$2b$08$ww5KFHHThrAIaKZs1n4tOOEYjojc
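
Both runs above are timed the same way: record a start time, do the work, and subtract. A small helper can keep the two measurements comparable. This is a sketch rather than the notebook's code: `timed` and `sequential_lengths` are hypothetical names, and the helper uses `time.perf_counter`, a clock intended for measuring intervals, in place of `time.time`.

```python
import time
from typing import Any, Callable, List, Tuple


def timed(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Run `fn` and return its result together with the elapsed wall-clock seconds."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


def sequential_lengths(words: List[str]) -> List[int]:
    """Toy workload used only to exercise the helper."""
    return [len(word) for word in words]


result, seconds = timed(sequential_lengths, ["123456", "password"])
print(result, f"{seconds:.6f} seconds")
```

In the notebook, the Ray run could be wrapped as `timed(lambda: ray.get([hash.remote(p) for p in lines]))` and the sequential run as `timed(lambda: [test_hash(p) for p in lines])`, so that both include exactly the same steps.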

Create a dictionary that maps each hash produced by the Ray run to the password that generated it.

In [8]:
# lookup table (called a rainbow table in this project): maps each hash to the password that produced it
hash_dict = {}

# map each hash to its corresponding password
# (the loop variable is named `hashed` so it does not overwrite the remote function `hash`)
for hashed, password in hashes:
  hash_dict[hashed] = password

# verify that a known hash returns its password
print(hash_dict[b'$2b$08$vZXU2ynDsLKfCR4U4QQNY.Shru8h/3/0ZWv8n0AZm.urRoSVQj45y'])

123456
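
The lookup above succeeds because the key is the exact hash stored in `hashes`. Since `bcrypt.gensalt` produces a new random salt on each call, hashing the same password again yields a different hash, which would not be found in the table. The sketch below illustrates that effect with the standard library only: `salted_hash` is a hypothetical helper, and PBKDF2 from `hashlib` stands in for bcrypt so the example runs without extra packages.

```python
import hashlib
import os


def salted_hash(password: str, salt: bytes) -> bytes:
    """Derive a hash from a password and a salt (PBKDF2 as a stand-in for bcrypt)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 1000)


# two independent random salts, as two separate gensalt() calls would give
salt_a = os.urandom(16)
salt_b = os.urandom(16)

# a one-entry lookup table built with the first salt
table = {salted_hash("123456", salt_a): "123456"}

same_salt = salted_hash("123456", salt_a)
other_salt = salted_hash("123456", salt_b)

print(table.get(same_salt))   # hit: same password and same salt
print(table.get(other_salt))  # miss: same password, different salt
```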


**How do websites store passwords?**  
Normally, a website stores only a username and the hash of the user's password; it should never store the plaintext password. Hash functions are designed to be one-way, so after a data breach it is computationally infeasible to compute the original password directly from its hash. The main strategy for cracking passwords from a breach is therefore to use a 'rainbow table': a precomputed mapping from hashes to the inputs that produced them, built from the most common passwords, which lets an attacker look up the plaintext behind a matching hash.
  
Below is a simple example that shows how easy it is to recover passwords when a website stores them in plaintext.

In [23]:
import random
import string

# simple data breach example
simple_data_breach = []
# arbitrary username length
length = 4

# create the 'data breach': 1,000,000 (random username, random plaintext password) pairs
for _ in range(1000000):
  username = ''.join(random.choice(string.ascii_lowercase) for _ in range(length))
  simple_data_breach.append((username, random.choice(lines)))

# the breach already contains plaintext passwords, so nothing needs cracking (no need for Ray)
print(simple_data_breach[:10])

[('yxvo', 'doggy2'), ('difd', 'rattlesn'), ('pyqq', '1232123'), ('dagk', '17031972'), ('hqhj', 'seadoo96'), ('eges', '03031978'), ('dwaa', '198585'), ('xgim', 'westbrom'), ('brhs', 'rezeda'), ('rnwe', '260855')]


This is a more realistic example of what an attacker would obtain from a breach of a website with decent security: usernames paired with password hashes.

In [27]:
# more realistic data breach example
data_breach = []
# arbitrary username length
length = 4

# create the 'data breach': 1,000,000 (random username, random stored hash) pairs
for _ in range(1000000):
  username = ''.join(random.choice(string.ascii_lowercase) for _ in range(length))
  data_breach.append((username, random.choice(hashes)[0]))

# each record is a tuple with the username and the hashed password
print(data_breach[:10])

[('yrem', b'$2b$08$gaJzFoUeDQ5QSR6KG/0/p.xPKtNZadWPleZZb/qVKldrNa7bG0hUe'), ('qije', b'$2b$08$Wvxu.BaQ95OLSYIo8B8GQe4WwwpGEyPiEI1SJ1ydtDcAvkRIQRTj6'), ('jqlp', b'$2b$08$YXBvmf9uCE82aek2ubw54.Z/HM3cSOj0R4uqBVa2hEmZyhGMM2wDi'), ('lxww', b'$2b$08$ht3mnRBuPYSGfqYADqw3sOxbR693Xs23qz6fowhvpTTRtYekecQC6'), ('paic', b'$2b$08$WO7Rqmo73rocBoV/x3S2ouAhnttysZNolbJNUEhj7U/ndValOXul.'), ('ykxs', b'$2b$08$XAc6PyWm9.DbZqvQrse9EeDeSPeGTSmDwm8zs.q0z23wUoOuKDokG'), ('xxto', b'$2b$08$fIaaEn0znljLWMkBlGu5m.SJq/yL7oEJ6po.tBBIlAwcYAl/i73XC'), ('nsua', b'$2b$08$dlSrEReuPtGE9mWvTr/VR.5ADb4cW7Azxp/ZhybxaKc4UrKYF7sFK'), ('vfpe', b'$2b$08$QV1Bnk8iuNLb03e2rGprXeJKP5JS1iCzHjfh71Tpb3xC/kccv/.Nm'), ('slhw', b'$2b$08$5B5Pvl34yK3xQtEKejAPpeJ9NUqLAPdFEOV45bIqLKp.rCFl.F.Ya')]
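
The generated usernames are four lowercase letters, so only 26^4 = 456,976 distinct usernames exist, fewer than the 1,000,000 records created above. Usernames therefore repeat, and any dictionary keyed by username keeps only the last record for each one. The sketch below shows the same effect on a smaller, seeded sample; `make_breach`, the seed, the placeholder hashes, and the sample sizes are assumptions chosen for illustration.

```python
import random
import string
from typing import Dict, List, Sequence, Tuple


def make_breach(stored_hashes: Sequence[bytes], n: int, rng: random.Random,
                length: int = 4) -> List[Tuple[str, bytes]]:
    """Build `n` (username, hash) records with random lowercase usernames."""
    records = []
    for _ in range(n):
        username = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
        records.append((username, rng.choice(stored_hashes)))
    return records


print(26 ** 4)  # number of distinct four-letter usernames

# scaled-down version: 26**2 = 676 possible usernames but 1000 records,
# so repeated usernames are guaranteed
rng = random.Random(369)
sample_hashes = [b"hash-a", b"hash-b", b"hash-c"]
records = make_breach(sample_hashes, 1000, rng, length=2)

# a dict keyed by username silently keeps only the last record per username
by_username: Dict[str, bytes] = dict(records)
print(len(records), "records,", len(by_username), "distinct usernames")
```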


Using the rainbow table and the data breach above, I can recover plaintext passwords from their hashes.

In [32]:
# used for mapping usernames to cracked passwords
cracked_passwords = {}

# look up each breached hash in the table; a repeated username keeps its last match
for username, hashed_password in data_breach:
  cracked_passwords[username] = hash_dict[hashed_password]

# see plaintext passwords for some sample users
print(cracked_passwords['yrem'])
print(cracked_passwords['qije'])
print(cracked_passwords['jqlp'])
print(cracked_passwords['lxww'])

swinger
24121968
030902
qwerty777
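
The direct lookup above works because the breach was sampled from the same `hashes` list that built the table. When each stored hash carries its own salt, as independently produced bcrypt hashes do, an attacker would instead re-hash every candidate password with the salt from the record and compare the results. With the `bcrypt` package that comparison is `bcrypt.checkpw(candidate, stored_hash)`; the sketch below shows the same idea using only the standard library. `derive`, `make_record`, `crack_record`, and the sample data are hypothetical, and PBKDF2 stands in for bcrypt.

```python
import hashlib
import hmac
import os
from typing import Iterable, Optional, Tuple


def derive(password: str, salt: bytes) -> bytes:
    """Derive a hash from a password and a salt (PBKDF2 as a stand-in for bcrypt)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 1000)


def make_record(username: str, password: str) -> Tuple[str, bytes, bytes]:
    """Create a breach record (username, salt, hash) with a fresh random salt."""
    salt = os.urandom(16)
    return (username, salt, derive(password, salt))


def crack_record(salt: bytes, stored: bytes, candidates: Iterable[str]) -> Optional[str]:
    """Try each candidate with the record's salt; return the match, or None."""
    for candidate in candidates:
        if hmac.compare_digest(derive(candidate, salt), stored):
            return candidate
    return None


common_passwords = ["123456", "password", "qwerty", "dragon"]
breach = [make_record("alice", "qwerty"), make_record("bob", "not-in-the-list")]

cracked = {user: crack_record(salt, stored, common_passwords)
           for user, salt, stored in breach}
print(cracked)
```

Each record now costs up to one hash per candidate, so the total work grows with the number of records times the number of candidates. The records are independent of one another, which is the kind of workload that could be spread across Ray tasks.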
