# Password cracking exercise

**READ FIRST!**

Before you run any cells, switch to GPU runtime.

Runtime -> Change Runtime type -> GPU -> Save

## Set up

In [None]:
cd /content/

In [None]:
!git clone https://gitlab.com/CBDS/estp_2021/password_cracking.git

In [None]:
cd password_cracking/

In [None]:
!./setup.sh

In [None]:
!$(pwd)/john-1.9.0-jumbo-1/run/john --list=opencl-devices

Make sure the output of the above cell contains `Device type: GPU`. Otherwise the following calculations will take _much_ longer. If it doesn't run on GPU, please re-check that you have selected the GPU runtime.

## Crack passwords

Run the following cell for **5 mins**, then cancel it by clicking the square button on its left. This uses a generic, standard approach to find passwords:

1. Try a list of the most-used passwords
2. Try random characters

Note that variations such as appending numbers or replacing characters (changing O to 0, 1 to !, S to $) can be generated automatically. The output lists each cracked password with its username, like so:
```
password    (username)
```
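
To make the idea of automatic variations concrete, here is a minimal Python sketch. It is not how the cracking tool implements its rules; the substitution table only mirrors the examples in the text, and the `mangle` function and its suffix list are hypothetical names chosen for illustration.

```python
from itertools import product

# Hypothetical substitution table, based on the examples in the text.
SUBSTITUTIONS = {"o": "0", "O": "0", "1": "!", "s": "$", "S": "$"}


def mangle(word, suffixes=("", "1", "123")):
    """Yield variants of `word` with optional substitutions and appended suffixes."""
    # For each character, offer the original and (if one exists) its replacement.
    options = [
        (ch, SUBSTITUTIONS[ch]) if ch in SUBSTITUTIONS else (ch,)
        for ch in word
    ]
    seen = set()
    for chars in product(*options):
        base = "".join(chars)
        for suffix in suffixes:
            candidate = base + suffix
            if candidate not in seen:  # skip duplicates, keep first occurrence
                seen.add(candidate)
                yield candidate


variants = list(mangle("Boss"))
print(len(variants))
print(variants[:6])
```

A single dictionary word already expands into a couple of dozen candidates here, which is why a "clever" substitution adds little protection to a common password.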




In [None]:
!$(pwd)/john-1.9.0-jumbo-1/run/john --format=opencl passwd.txt

Next we try the **rockyou.txt** wordlist. The company RockYou stored user passwords unencrypted, got hacked, and had those passwords leaked; see https://en.wikipedia.org/wiki/RockYou#Data_breach. While bad for the users, this incident gave researchers valuable insight into real-world password usage.
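
Conceptually, a wordlist attack hashes every candidate from the list and checks whether the result matches a stored hash. The sketch below illustrates that loop under loud assumptions: it uses unsalted SHA-256 as a stand-in (the hash format of `passwd.txt` is not stated here), and the users, passwords, and function names are made up for illustration.

```python
import hashlib


def sha256_hex(password):
    """Hash a password with unsalted SHA-256 (a stand-in for the real hash format)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def wordlist_attack(target_hashes, candidates):
    """Return {username: password} for every target whose hash matches a candidate."""
    # Index users by hash so each candidate needs only one dictionary lookup.
    users_by_hash = {}
    for user, digest in target_hashes.items():
        users_by_hash.setdefault(digest, []).append(user)

    cracked = {}
    for word in candidates:
        for user in users_by_hash.get(sha256_hex(word), []):
            cracked[user] = word
    return cracked


# Made-up users and passwords, for illustration only.
targets = {
    "alice": sha256_hex("iloveyou"),
    "bob": sha256_hex("x9#Lq2!vTz"),
}
leaked = ["123456", "password", "iloveyou", "princess"]
cracked = wordlist_attack(targets, leaked)
print(cracked)
```

Only the user whose password appears in the list is recovered; the random-looking password survives this kind of attack.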

In [None]:
!$(pwd)/john-1.9.0-jumbo-1/run/john --wordlist=word_files/rockyou.txt --format=opencl passwd.txt

Let's try some more known passwords. The synthetic dataset **email_phished.txt** is a made-up list of passwords that could have been phished via email. See: https://en.wikipedia.org/wiki/Phishing

In [None]:
!$(pwd)/john-1.9.0-jumbo-1/run/john --wordlist=word_files/email_phished.txt --format=opencl passwd.txt

Re-using passwords is very, *very* bad practice. Are any of our users re-using passwords that were leaked from this (fictitious) webshop?

In [None]:
!$(pwd)/john-1.9.0-jumbo-1/run/john --wordlist=word_files/webshop_leak.txt --format=opencl passwd.txt

Some people recommend using a random combination of words as passwords. Is this good advice?

During the setup at the beginning of the notebook, we downloaded a dictionary of the 10,000 most-used words in the English language - **google-10000-english.txt**. For demonstration purposes, we shortened that list to 1500 words and 100 words, respectively. The **combine.py** script concatenates N different words from a given word file. We pipe these into the password cracking program.
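
The following sketch shows one way such a combination step could work, and how many candidates it produces. It is not the actual `combine.py`: it assumes that "N different words" means ordered selections without repetition and no separator between words, and the helper names are hypothetical. `math.perm` requires Python 3.8 or later.

```python
from itertools import permutations
from math import perm


def combine(words, n):
    """Yield every ordered concatenation of n distinct words from `words`."""
    for combo in permutations(words, n):
        yield "".join(combo)


def count_candidates(vocabulary_size, n):
    """Number of ordered selections of n distinct words from the vocabulary."""
    return perm(vocabulary_size, n)


sample = ["correct", "horse", "battery"]
pairs = list(combine(sample, 2))
print(pairs)

two_of_1500 = count_candidates(1500, 2)   # two words from the 1500-word list
four_of_100 = count_candidates(100, 4)    # four words from the 100-word list
print(f"{two_of_1500:,} vs {four_of_100:,} candidates")
```

Under these assumptions, four words from a tiny 100-word list already yield about 40 times more candidates than two words from the 1500-word list, so the number of words matters more than the size of the vocabulary.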

Let's start with two words from the top 1500 list.

In [None]:
!./combine.py 2 word_files/google-1500-english.txt | $(pwd)/john-1.9.0-jumbo-1/run/john --format=opencl -stdin passwd.txt

Next, we combine four words from the top 100 list.

In [None]:
!./combine.py 4 word_files/google-100-english.txt | $(pwd)/john-1.9.0-jumbo-1/run/john --format=opencl -stdin passwd.txt

We can also check for specific patterns, like passwords made of digits only. This drastically shrinks the search space we have to cover. Cancel this cell after 5 mins or one match.

In [None]:
!$(pwd)/john-1.9.0-jumbo-1/run/john --incremental:digits --format=opencl passwd.txt

Once we run out of ideas, we can brute-force passwords by trying a certain number of completely random characters. We start with three. Write down the runtime of this cell and compare it to the following ones. How much longer does it take once we add one more character?

You will see that completely random characters are significantly harder to crack than the passwords we cracked above.
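
As a rough back-of-the-envelope check, the sketch below counts the candidates for each length. It rests on two assumptions that are not taken from the tool's configuration: candidates are drawn from the 95 printable ASCII characters, and the hash rate is a hypothetical 300 million per second. Measured runtimes will differ, since they include start-up overhead and the search usually stops before the whole space is exhausted.

```python
import string

# Assumption: 95 printable ASCII characters (letters, digits, punctuation, space).
FULL_CHARSET = len(string.ascii_letters + string.digits + string.punctuation + " ")
DIGITS = len(string.digits)
# Assumption: hypothetical hash rate, used only to turn counts into times.
HASHES_PER_SECOND = 300_000_000


def keyspace(charset_size, length):
    """Number of passwords of exactly `length` characters."""
    return charset_size ** length


def worst_case_seconds(charset_size, length, rate=HASHES_PER_SECOND):
    """Time to try every password of the given length at the given rate."""
    return keyspace(charset_size, length) / rate


for length in range(3, 8):
    full = keyspace(FULL_CHARSET, length)
    digits_only = keyspace(DIGITS, length)
    seconds = worst_case_seconds(FULL_CHARSET, length)
    print(
        f"{length} chars: {full:>18,} candidates, "
        f"{seconds:>12,.1f} s worst case (digits only: {digits_only:,})"
    )
```

Each additional character multiplies the search space by the size of the character set, so the effort grows exponentially with length, while a digits-only password of the same length stays tiny by comparison.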

In [None]:
!$(pwd)/john-1.9.0-jumbo-1/run/john --incremental:Short3 --format=opencl passwd.txt

Next up is four characters...

In [None]:
!$(pwd)/john-1.9.0-jumbo-1/run/john --incremental:Short4 --format=opencl passwd.txt

... and five characters. After you find one password with this method, cancel the run by clicking the square button on the left of that cell. Warning: this might take more than an hour. Grab a coffee or work on something else in the meantime, and check back here every 15 mins.

In [None]:
!$(pwd)/john-1.9.0-jumbo-1/run/john --incremental:Short5 --format=opencl passwd.txt

Six characters is next. The runtime might terminate before you crack a password. Free online GPU time has its limits. Feel free to skip this cell or try to crack it locally on your own hardware.

In [None]:
!$(pwd)/john-1.9.0-jumbo-1/run/john --incremental:Short6 --format=opencl passwd.txt

Cracking seven random characters will probably no longer work in this environment. Give it a shot if you feel adventurous and have a lot of time on your hands, or just take my word that it would take a long time in this online notebook. On more powerful hardware, however, this will be a matter of seconds.

In [None]:
!$(pwd)/john-1.9.0-jumbo-1/run/john --incremental:Short7 --format=opencl passwd.txt

Let's review which users' passwords we could crack easily on (practically) free hardware. Check above to see which approach cracked each of them. What mistakes were made? What would you advise changing?

In [None]:
!$(pwd)/john-1.9.0-jumbo-1/run/john --show passwd.txt

Check out this [password cracking chart](https://i.imgur.com/e3mGIFY.png) made by Reddit user *u/HelmedHorror*. It gives a good impression of password length vs. effort to crack. This Google notebook can manage a few hundred million password hashes per second using the GPU runtime. Serious attackers have hardware many orders of magnitude faster at their disposal. And as time goes by, this hardware will get more affordable and more powerful.