_Example project tasks taken from:_ https://learn.datacamp.com/projects/analyzing_password_strength
# Bad passwords and the NIST guidelines

In [18]:
# To be able to run tests locally in the notebook, install the following:

# pip install nose
# pip install git+https://github.com/datacamp/ipython_nose

# You might need to change pip into pip3 depending on how you installed Python.

# Then load in the ipython_nose extension like this:
%load_ext ipython_nose

The ipython_nose extension is already loaded. To reload it, use:
  %reload_ext ipython_nose


## 1. The NIST Special Publication 800-63B

### Context 
If you needed to come up with a secret password 50 years ago, you were probably part of a secret espionage organization or (more likely) you were pretending to be a spy when playing as a kid. Today, many of us are forced to come up with new passwords all the time when signing into sites and apps. As a password inventor, it is your responsibility to come up with good, hard-to-crack passwords. But it is also in the interest of sites and apps to make sure that you use good passwords. The problem is that it's really hard to define what makes a good password. However, the National Institute of Standards and Technology (NIST) knows what the second best thing is: to make sure you're at least not using a bad password.

In this notebook, we will go through the rules in NIST Special Publication 800-63B, which details what checks a verifier (what NIST calls a second party responsible for storing and verifying passwords) should perform to make sure users don't pick bad passwords. We will go through the passwords of users from a fictional company and use Python to flag the users with bad passwords. But the very fact that we can do this means the fictional company is breaking one of the rules of 800-63B:

> Verifiers SHALL store memorized secrets in a form that is resistant to offline attacks. Memorized secrets SHALL be salted and hashed using a suitable one-way key derivation function.

That is, never save users' passwords in plaintext; always store them salted and hashed! Keeping this in mind for the next time we're building a password management system, let's load in the data.
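
To make the "salted and hashed" rule concrete, here is a minimal sketch using only the Python standard library. The helper names `hash_password` and `verify_password`, the choice of PBKDF2-HMAC-SHA256 and the iteration count are all assumptions made for illustration; the guideline quoted above only asks for a suitable one-way key derivation function.

```python
import hashlib
import hmac
import os

# Illustrative assumption: tune the iteration count for your own hardware.
ITERATIONS = 200_000


def hash_password(password):
    """Return (salt, derived_key) for a new password (hypothetical helper)."""
    salt = os.urandom(16)  # fresh random salt for every user
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key


def verify_password(password, salt, expected_key):
    """Recompute the key with the stored salt and compare in constant time."""
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(key, expected_key)


# Only the salt and the derived key are stored, never the password itself.
salt, key = hash_password("joobheco")
print(verify_password("joobheco", salt, key))
print(verify_password("master", salt, key))
```

With a scheme like this, the verifier can still check a login attempt, but it can no longer read the passwords back out the way we are about to do with `users.csv`.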

Warning: The list of passwords and the fictional user database both contain real passwords leaked from real websites. These passwords have not been filtered in any way and include words that are explicit, derogatory and offensive.

Load in and inspect the usernames and passwords of the fictional users.

- Load the pandas module.
- Load the user data from the file contained in the path datasets/users.csv and store it as a DataFrame called users.
- Print the number of rows (i.e. users).
- Show the first 12 rows in users.

### Solution Code

In [31]:
# Importing the pandas module
import pandas as pd

# Loading in datasets/users.csv 
users = pd.read_csv("datasets/users.csv")

# Printing out how many users we've got
print(len(users))

# Taking a look at the 12 first users
users.head(12)

982


Unnamed: 0,id,user_name,password
0,1,vance.jennings,joobheco
1,2,consuelo.eaton,0869347314
2,3,mitchel.perkins,fabypotter
3,4,odessa.vaughan,aharney88
4,5,araceli.wilder,acecdn3000
5,6,shawn.harrington,5278049
6,7,evelyn.gay,master
7,8,noreen.hale,murphy
8,9,gladys.ward,lwsves2
9,10,brant.zimmerman,1190KAREN5572497


### Tests

In [25]:
%%nose # needed at the start of every tests cell

import pandas as pd

def test_users_read_in_correctly():
    correct_users = pd.read_csv("datasets/users.csv")
    assert correct_users.equals(users), \
    '`users` should contain the data in "datasets/users.csv".'

1/1 tests passed


## 2. Passwords should not be too short

### Context

If we take a look at the first 12 users above we already see some bad passwords. But let's not get ahead of ourselves and start flagging passwords manually. What is the first thing we should check according to the NIST Special Publication 800-63B?

> Verifiers SHALL require subscriber-chosen memorized secrets to be at least 8 characters in length.

Ok, so the passwords of our users shouldn't be too short. Let's start by checking that!

Flag the passwords that are too short.

- Add the column length to users which should list the number of characters in each password.
- Flag the users with too short passwords by adding the column users['too_short'] which should be True when users['length'] is less than 8.
- Print the count of the number of users with passwords that are too short.
- Show the first 12 rows in users.

### Solution Code

In [26]:
# Calculating the lengths of users' passwords
users['length'] = users['password'].str.len() 

# Flagging the users with too short passwords
users['too_short'] = users['length'] < 8

# Counting and printing the number of users with too short passwords
print(users['too_short'].sum())

# Taking a look at the 12 first rows
users.head(12)

376


Unnamed: 0,id,user_name,password,length,too_short
0,1,vance.jennings,joobheco,8,False
1,2,consuelo.eaton,0869347314,10,False
2,3,mitchel.perkins,fabypotter,10,False
3,4,odessa.vaughan,aharney88,9,False
4,5,araceli.wilder,acecdn3000,10,False
5,6,shawn.harrington,5278049,7,True
6,7,evelyn.gay,master,6,True
7,8,noreen.hale,murphy,6,True
8,9,gladys.ward,lwsves2,7,True
9,10,brant.zimmerman,1190KAREN5572497,16,False


### Tests

In [27]:
%%nose # needed at the start of every tests cell

def test_length_sum_correct():
    assert (users['password'].str.len() < 8).sum() == users['too_short'].sum(), \
    "users['too_short'] should be a True/False column where all rows with passwords < 8 are True."

1/1 tests passed
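
One edge case is worth keeping in mind with this approach: `.str.len()` returns `NaN` for a missing password, and `NaN < 8` evaluates to `False`, so such a row would not be flagged. Whether `datasets/users.csv` contains missing passwords is not shown here, so the sketch below uses a small made-up DataFrame (`toy`) and a hypothetical `too_short_strict` column to illustrate one way of handling it.

```python
import pandas as pd

# Toy data (hypothetical, not from datasets/users.csv); one password is missing.
toy = pd.DataFrame({"password": ["joobheco", "master", None]})

toy["length"] = toy["password"].str.len()  # the missing password becomes NaN
toy["too_short"] = toy["length"] < 8       # NaN < 8 evaluates to False

# Treat a missing password as length 0 so that it is flagged as well.
toy["too_short_strict"] = toy["password"].str.len().fillna(0) < 8

print(toy)
```

Treating a missing password as length 0 is a design choice; depending on the system, it may make more sense to report such rows separately.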


## 3. Passwords should not be your name

### Context
It turns out many of our passwords were common English words too! Next up on the NIST list:

> Verifiers SHALL compare the prospective secrets against a list that contains [...] context-specific words, such as the name of the service, the username, and derivatives thereof.

Ok, so there are many things we could check here. One thing to notice is that our users' usernames consist of their first names and last names separated by a dot. For now, let's just flag passwords that are the same as either a user's first or last name.

Flag passwords that are the same as the user's first or last name.

- Extract users' first names from users['user_name'] into the new column users['first_name'].
- Similarly, extract last names into the new column users['last_name'].
- Add the column users['uses_name'] which should be True when a password is the same as that user's first or last name.
- Count and print the number of users using names as passwords.
- Show the first 12 rows in users.

### Solution Code

In [28]:
# Extracting first and last names into their own columns
users['first_name'] = users['user_name'].str.extract(r'(^\w+)', expand = False)
users['last_name'] = users['user_name'].str.extract(r'(\w+$)', expand = False)

# Flagging the users with passwords that match their names
users['uses_name'] = (
    (users['password'].str.lower() == users['first_name']) |
    (users['password'].str.lower() == users['last_name']))

# Counting and printing the number of users using names as passwords
print(users['uses_name'].sum())

# Taking a look at the first 12 rows
users.head(12)

Unnamed: 0,id,user_name,password,length,too_short,first_name,last_name,uses_name
0,1,vance.jennings,joobheco,8,False,vance,jennings,False
1,2,consuelo.eaton,0869347314,10,False,consuelo,eaton,False
2,3,mitchel.perkins,fabypotter,10,False,mitchel,perkins,False
3,4,odessa.vaughan,aharney88,9,False,odessa,vaughan,False
4,5,araceli.wilder,acecdn3000,10,False,araceli,wilder,False
5,6,shawn.harrington,5278049,7,True,shawn,harrington,False
6,7,evelyn.gay,master,6,True,evelyn,gay,False
7,8,noreen.hale,murphy,6,True,noreen,hale,False
8,9,gladys.ward,lwsves2,7,True,gladys,ward,False
9,10,brant.zimmerman,1190KAREN5572497,16,False,brant,zimmerman,False


### Tests

In [29]:
%%nose

def test_not_same_as_name():
    correct_first_name = users['user_name'].str.extract(r'(^\w+)', expand = False)
    correct_last_name = users['user_name'].str.extract(r'(\w+$)', expand = False)

    # Flagging the users with passwords that match their names
    correct_uses_name = (
        (users['password'].str.lower() == correct_first_name) |
        (users['password'].str.lower() == correct_last_name))
    
    assert correct_uses_name.sum() == users['uses_name'].sum(), \
    "users['uses_name'] should be True for each row with a password which is also the first or last name."

1/1 tests passed
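
The NIST quote above also mentions "derivatives thereof", which the exact-match check does not cover. As a rough sketch of one possible extension, the code below flags passwords that merely contain the first or last name. It runs on made-up rows rather than `datasets/users.csv`, and the `toy` DataFrame, the `contains_name` helper and the column of the same name are hypothetical.

```python
import pandas as pd

# Toy data (hypothetical rows, not taken from datasets/users.csv).
toy = pd.DataFrame({
    "user_name": ["ann.lee", "bob.stone", "cara.diaz"],
    "password": ["xk4#pq9z", "Stone2019", "cara"],
})

# Split "first.last" on the dot instead of using two regular expressions.
names = toy["user_name"].str.split(".", n=1, expand=True)
toy["first_name"] = names[0]
toy["last_name"] = names[1]


def contains_name(row) -> bool:
    """True when the lowercased password contains the first or last name."""
    password = row["password"].lower()
    return row["first_name"] in password or row["last_name"] in password


toy["contains_name"] = toy.apply(contains_name, axis=1)
print(toy[["user_name", "password", "contains_name"]])
```

A substring check like this is stricter than the exact match used in the solution, but short names can cause false positives (for example, "lee" also appears in "sleepy"), so it is a starting point rather than a finished rule.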
