# Rules for choosing a good password
In this notebook, we will go through the rules in NIST Special Publication 800-63B, which details the checks a verifier (NIST's term for the party responsible for storing and verifying passwords) should perform to make sure users don't pick bad passwords. We will then go through the passwords of users from a fictional company and use R to flag the users with bad passwords.

The NIST guidelines are:

1. Verifiers SHALL require subscriber-chosen memorized secrets to be at least 8 characters in length.
2. Verifiers SHALL compare the prospective secrets against a list that contains values known to be commonly-used, expected, or compromised.
   a. Passwords obtained from previous breach corpuses.
   b. Dictionary words.
   c. Repetitive or sequential characters (e.g. ‘aaaaaa’, ‘1234abcd’).
   d. Context-specific words, such as the name of the service, the username, and derivatives thereof.

# Importing the tidyverse library


In [74]:
library(tidyverse)

# Loading the data and examining it


In [60]:
# Loading in datasets/users.csv
users <- read_csv("datasets/users.csv")

# Counting how many users we've got
count(users)

# Taking a look at the first 12 users
head(users, 12)

Parsed with column specification:
cols(
  id = col_double(),
  user_name = col_character(),
  password = col_character()
)


n
<int>
982


id,user_name,password
<dbl>,<chr>,<chr>
1,vance.jennings,joobheco
2,consuelo.eaton,0869347314
3,mitchel.perkins,fabypotter
4,odessa.vaughan,aharney88
5,araceli.wilder,acecdn3000
6,shawn.harrington,5278049
7,evelyn.gay,master
8,noreen.hale,murphy
9,gladys.ward,lwsves2
10,brant.zimmerman,1190KAREN5572497


# Checking the length of passwords

According to NIST guidelines, passwords should not be shorter than 8 characters in length. We will now flag users with short passwords.


In [62]:
# Calculating the lengths of users' passwords
users <- users %>%
         mutate(length = nchar(password))

# Flagging the users with too short passwords
users <- users %>%
         mutate(too_short = length < 8)

# Counting the number of users with too short passwords
users %>% count(too_short == TRUE)

# Taking a look at the first 12 rows
head(users, 12)

too_short == TRUE,n
<lgl>,<int>
False,606
True,376


id,user_name,password,length,too_short
<dbl>,<chr>,<chr>,<int>,<lgl>
1,vance.jennings,joobheco,8,False
2,consuelo.eaton,0869347314,10,False
3,mitchel.perkins,fabypotter,10,False
4,odessa.vaughan,aharney88,9,False
5,araceli.wilder,acecdn3000,10,False
6,shawn.harrington,5278049,7,True
7,evelyn.gay,master,6,True
8,noreen.hale,murphy,6,True
9,gladys.ward,lwsves2,7,True
10,brant.zimmerman,1190KAREN5572497,16,False
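
SP 800-63B counts each Unicode code point as one character for the length requirement. `nchar()` defaults to `type = "chars"`, so the check above counts characters rather than bytes. The sketch below illustrates the difference on a made-up password that contains non-ASCII letters; the variable names are illustrative only.

```r
# "pässwörd", written with Unicode escapes so this snippet stays plain ASCII
pw <- "p\u00e4ssw\u00f6rd"

n_chars <- nchar(pw, type = "chars")  # number of characters
n_bytes <- nchar(pw, type = "bytes")  # number of bytes in the UTF-8 encoding

n_chars
n_bytes

# The length rule from the notebook, applied to the character count
too_short <- n_chars < 8
too_short
```

Counting bytes instead would overstate the length of passwords with accented or non-Latin characters, so a too-short password could slip through.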


# Commonly used passwords
This simple rule already flagged a couple of offenders among the first 12 users. Next, we will check passwords against those obtained from previous breach corpuses, that is, collections of passwords that hackers have leaked from compromised websites. Because many websites neither follow the NIST guidelines nor encrypt stored passwords, large lists of the most popular passwords now exist. Let's start by loading in the 10,000 most common passwords, which were taken from here.

In [75]:
# Reading in the top 10000 passwords
common_passwords <- read_tsv("datasets/10_million_password_list_top_10000.txt", col_names = FALSE)

# Taking a look at the top 100
head(common_passwords, 100)


Parsed with column specification:
cols(
  X1 = col_character()
)


X1
<chr>
123456
password
12345678
qwerty
123456789
12345
1234
111111
1234567
dragon


# Passwords should not be common passwords

Let's flag all the passwords in our user database that are among the top 10,000 used passwords.

In [68]:
# Flagging the users with passwords that are common passwords
users <- users %>%
         mutate(common_password = password %in% common_passwords$X1)

# Counting the number of users using common passwords
users %>% count(common_password)

# Taking a look at the first 12 rows
head(users, 12)

common_password,n
<lgl>,<int>
False,853
True,129


id,user_name,password,length,too_short,common_password
<dbl>,<chr>,<chr>,<int>,<lgl>,<lgl>
1,vance.jennings,joobheco,8,False,False
2,consuelo.eaton,0869347314,10,False,False
3,mitchel.perkins,fabypotter,10,False,False
4,odessa.vaughan,aharney88,9,False,False
5,araceli.wilder,acecdn3000,10,False,False
6,shawn.harrington,5278049,7,True,False
7,evelyn.gay,master,6,True,True
8,noreen.hale,murphy,6,True,True
9,gladys.ward,lwsves2,7,True,False
10,brant.zimmerman,1190KAREN5572497,16,False,False
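
The `%in%` test above is an exact, case-sensitive comparison, so a password such as `Password` would not match the list entry `password`. As a hedged sketch of one possible tightening (the sample vectors below are invented and do not come from the notebook's files), both sides can be lowercased before comparing:

```r
# Hypothetical sample data, not taken from the notebook's files
common    <- c("123456", "password", "qwerty", "dragon")
passwords <- c("Password", "DRAGON", "joobheco", "qwerty")

# Exact matching, as in the notebook
exact <- passwords %in% common

# Case-insensitive matching: lowercase both sides before comparing
ignore_case <- tolower(passwords) %in% tolower(common)

data.frame(passwords, exact, ignore_case)
```

Whether case should be ignored is a policy choice; it flags more passwords than the check used in this notebook.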


# Passwords should not be common words
It turns out many of our users use common passwords, and of the first 12 users there are already two. However, as most common passwords also tend to be short, they were already flagged as being too short. 
Let's check our users' passwords against the top 10,000 English words from Google's Trillion Word Corpus.

In [69]:
# Reading in a list of the 10000 most common words
words <- read_tsv("datasets/google-10000-english.txt", col_names = FALSE)

# Flagging the users with passwords that are common words
users <- users %>%
         mutate(common_word = password %in% words$X1)

# Counting the number of users using common words as passwords
users %>% count(common_word)

# Taking a look at the first 12 rows
head(users, 12)

Parsed with column specification:
cols(
  X1 = col_character()
)


common_word,n
<lgl>,<int>
False,846
True,136


id,user_name,password,length,too_short,common_password,common_word
<dbl>,<chr>,<chr>,<int>,<lgl>,<lgl>,<lgl>
1,vance.jennings,joobheco,8,False,False,False
2,consuelo.eaton,0869347314,10,False,False,False
3,mitchel.perkins,fabypotter,10,False,False,False
4,odessa.vaughan,aharney88,9,False,False,False
5,araceli.wilder,acecdn3000,10,False,False,False
6,shawn.harrington,5278049,7,True,False,False
7,evelyn.gay,master,6,True,True,True
8,noreen.hale,murphy,6,True,True,True
9,gladys.ward,lwsves2,7,True,False,False
10,brant.zimmerman,1190KAREN5572497,16,False,False,False
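
Because the word check is an exact match, a dictionary word with a few digits attached is not flagged. The following sketch shows one way such simple derivatives could be caught. It goes beyond what the notebook does, and the sample vectors and the digit-stripping rule are assumptions made for illustration.

```r
# Hypothetical sample data
words     <- c("master", "murphy", "dragon")
passwords <- c("master", "murphy88", "2dragon", "lwsves2")

# Remove leading and trailing digits, then compare in lowercase
stripped <- gsub("^[0-9]+|[0-9]+$", "", tolower(passwords))

exact_word       <- passwords %in% words
word_plus_digits <- stripped %in% words

data.frame(passwords, stripped, exact_word, word_plus_digits)
```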


# Passwords should not be the user's name
Our users' usernames consist of their first and last names separated by a dot. Let's flag passwords that contain either the user's first or last name.

In [70]:
# Extracting first and last names into their own columns
users <- users %>%
         mutate(user_name2split = user_name) %>%
         separate(user_name2split, c("first_name", "second_name"))

# Flagging the users whose passwords contain their first or last name
users <- users %>%
         mutate(uses_name = str_detect(password, first_name) | str_detect(password, second_name))

# Counting the number of users using names as passwords
users %>% count(uses_name)

# Taking a look at the first 12 rows
head(users, 12)

uses_name,n
<lgl>,<int>
False,932
True,50


id,user_name,password,length,too_short,common_password,common_word,first_name,second_name,uses_name
<dbl>,<chr>,<chr>,<int>,<lgl>,<lgl>,<lgl>,<chr>,<chr>,<lgl>
1,vance.jennings,joobheco,8,False,False,False,vance,jennings,False
2,consuelo.eaton,0869347314,10,False,False,False,consuelo,eaton,False
3,mitchel.perkins,fabypotter,10,False,False,False,mitchel,perkins,False
4,odessa.vaughan,aharney88,9,False,False,False,odessa,vaughan,False
5,araceli.wilder,acecdn3000,10,False,False,False,araceli,wilder,False
6,shawn.harrington,5278049,7,True,False,False,shawn,harrington,False
7,evelyn.gay,master,6,True,True,True,evelyn,gay,False
8,noreen.hale,murphy,6,True,True,True,noreen,hale,False
9,gladys.ward,lwsves2,7,True,False,False,gladys,ward,False
10,brant.zimmerman,1190KAREN5572497,16,False,False,False,brant,zimmerman,False
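
`str_detect()` is case-sensitive and interprets its pattern as a regular expression, so a password such as `JANE1985` would not match the lowercase name `jane`. Below is a minimal base R sketch of a case-insensitive, literal-substring variant. The sample users and the helper `contains_name()` are hypothetical and not part of the notebook.

```r
# Hypothetical sample data in the notebook's "first.last" user_name format
users <- data.frame(
  user_name = c("jane.doe", "john.smith", "mary.jones"),
  password  = c("1190JANE5572497", "murphy", "jones.mary1"),
  stringsAsFactors = FALSE
)

# Split "first.last" on the literal dot
parts <- strsplit(users$user_name, ".", fixed = TRUE)
users$first_name  <- vapply(parts, `[`, character(1), 1)
users$second_name <- vapply(parts, `[`, character(1), 2)

# TRUE when `name` occurs in `password` as a literal substring, ignoring case
contains_name <- function(password, name) {
  mapply(function(p, n) grepl(n, p, fixed = TRUE),
         tolower(password), tolower(name), USE.NAMES = FALSE)
}

users$uses_name <- contains_name(users$password, users$first_name) |
  contains_name(users$password, users$second_name)

users
```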


# Passwords should not be repetitive
Checking for repetitiveness can be arbitrarily complex, but here we're only going to do something simple: we will flag every password in which a single character occurs 4 or more times, whether or not the occurrences are adjacent.

In [72]:
# Splitting the passwords into vectors of single characters
split_passwords <- strsplit(as.character(users$password), "")

# Picking out the max number of times a single character occurs in each password
users$max_repeats <- sapply(split_passwords, function(split_password) {
    x <- split_password
    # An integer vector, so that max() and >= compare numbers rather than strings
    repeats <- integer(length(x))
    for (i in seq_along(x)) {
        repeats[i] <- sum(str_count(x, fixed(x[i])))
    }
    max(repeats)
})

# Flagging the passwords with >= 4 repeats
users <- users %>%
         mutate(too_many_repeats = max_repeats >= 4)

# Taking a look at the users with too many repeats
users %>% filter(too_many_repeats)

id,user_name,password,length,too_short,common_password,common_word,first_name,second_name,uses_name,max_repeats,too_many_repeats
<dbl>,<chr>,<chr>,<int>,<lgl>,<lgl>,<lgl>,<chr>,<chr>,<lgl>,<int>,<lgl>
39,gus.padilla,wwewwf1,7,True,False,False,gus,padilla,False,4,True
89,ira.cortez,mmommy22,8,False,False,False,ira,cortez,False,4,True
147,patti.dixon,555555,6,True,True,False,patti,dixon,False,6,True
181,pierre.hooper,10101971,8,False,False,False,pierre,hooper,False,4,True
193,trinidad.austin,012227257,9,False,False,False,trinidad,austin,False,4,True
200,tara.mckenzie,eddiestoocool,13,False,False,False,tara,mckenzie,False,4,True
286,lee.roman,alamaage678,11,False,False,False,lee,roman,False,4,True
329,elma.campos,ddpdtd3nii,10,False,False,False,elma,campos,False,4,True
365,emery.ruiz,888278,6,True,False,False,emery,ruiz,False,4,True
397,suzanne.nash,naivanayluismioyubona,21,False,False,False,suzanne,nash,False,4,True
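
The check above counts every occurrence of a character, adjacent or not, which is why `10101971` is flagged. The NIST examples (‘aaaaaa’, ‘1234abcd’) are about back-to-back repeats and ascending sequences, so a stricter reading could be sketched as follows. The helper names `max_consecutive()` and `max_sequential()` are hypothetical, and this is one possible interpretation rather than the notebook's method.

```r
# Longest run of one character repeated back to back, e.g. 6 for "555555"
max_consecutive <- function(password) {
  chars <- strsplit(password, "")[[1]]
  if (length(chars) == 0) return(0L)
  max(rle(chars)$lengths)
}

# Longest run of ascending neighbours, e.g. 4 for "1234abcd"
max_sequential <- function(password) {
  codes <- utf8ToInt(password)
  # TRUE wherever a character's code point is exactly one above its predecessor
  steps <- rle(diff(codes) == 1)
  if (any(steps$values)) {
    max(steps$lengths[steps$values]) + 1L
  } else {
    min(length(codes), 1L)
  }
}

passwords <- c("555555", "10101971", "wwewwf1", "1234abcd")
data.frame(
  password        = passwords,
  max_consecutive = vapply(passwords, max_consecutive, integer(1), USE.NAMES = FALSE),
  max_sequential  = vapply(passwords, max_sequential, integer(1), USE.NAMES = FALSE)
)
```

Under this reading, `10101971` and `wwewwf1` would no longer be flagged as repetitive, while `1234abcd` would be caught by the sequence check.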


# Flagging all bad passwords
Now we have implemented all the basic tests for bad passwords suggested by NIST Special Publication 800-63B! What's left is just to flag all bad passwords and maybe to send these users an e-mail that strongly suggests they change their password.

In [73]:
# Flagging all passwords that are bad
users <- users %>%
         mutate(bad_password = if_else(
           too_short | common_password | common_word | uses_name | too_many_repeats,
           "yes", "no"
         ))

# Counting the number of bad passwords
users %>% count(bad_password)

# Looking at the first 100 bad passwords
bad_pwd_users <- users %>% filter(bad_password == "yes")
head(bad_pwd_users$password, 100)

bad_password,n
<chr>,<int>
no,538
yes,444
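
Before e-mailing anyone, it can help to see which rule flagged each user and how many users each rule catches. The sketch below does this in base R on a small invented data frame; the flag values and the names `flag_cols` and `n_flags` are assumptions for illustration, and the column names mirror the ones built in this notebook.

```r
# Hypothetical flags for four users
users <- data.frame(
  too_short        = c(FALSE, TRUE,  FALSE, FALSE),
  common_password  = c(FALSE, TRUE,  FALSE, FALSE),
  common_word      = c(FALSE, FALSE, FALSE, FALSE),
  uses_name        = c(FALSE, FALSE, TRUE,  FALSE),
  too_many_repeats = c(FALSE, FALSE, FALSE, FALSE)
)

flag_cols <- c("too_short", "common_password", "common_word",
               "uses_name", "too_many_repeats")

# Number of rules each password breaks, and whether it breaks any of them
users$n_flags <- rowSums(users[flag_cols])
users$bad_password <- users$n_flags > 0

# How many users each rule catches, and the share of users with a bad password
rule_counts <- colSums(users[flag_cols])
bad_share <- mean(users$bad_password)

rule_counts
bad_share
```

Storing `bad_password` as a logical rather than "yes"/"no" lets `sum()` and `mean()` work on it directly.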


# Summary
In this notebook, we've implemented the password checks recommended by NIST Special Publication 800-63B. These checks could certainly be implemented more thoroughly, for example, by using a longer list of common passwords. Also note that the NIST checks in no way guarantee that a chosen password is good, just that it's not obviously bad.