## # Introduction
<p><img src="https://i.imgur.com/kjWF1So.jpg" alt="Different characters on a computer screen"></p>
<p>According to a 2019 <a href="https://storage.googleapis.com/gweb-uniblog-publish-prod/documents/PasswordCheckup-HarrisPoll-InfographicFINAL.pdf">Google / Harris Poll</a>, 24% of Americans have used common passwords, like <code>abc123</code>, <code>Password</code>, and <code>Admin</code>. Even more concerning, 59% of Americans have incorporated personal information, such as their name or birthday, into their password. This makes it unsurprising that 4 in 10 Americans have had their personal information compromised online. Passwords with commonly used phrases and personal information makes cracking a password drastically easier.</p>
<p>You may have noticed over the years that password requirements have increased in complexity, including recommendations to change your passwords every couple of months. Compiled from industry recommendations, below is a list of passwords requirements you will be asked to test: </p>
<p><strong>Password Requirments:</strong></p>
<ol>
<li>Must be at least 10 characters in length</li>
<li>Must contain at least:<ul>
<li>one lower case letter </li>
<li>one upper case letter </li>
<li>one numeric character </li>
<li>one non-alphanumeric character</li></ul></li>
<li>Must not contain the phrase <code>password</code> (case insensitive)</li>
<li>Must not contain the user's first or last name, e.g., if the user's name is <code>John Smith</code>, then <code>SmItH876!</code> is not a valid password.</li>
</ol>
<p>Here is the dataset that you will investigate this project:</p>
<div style="background-color: #ebf4f7; color: #595959; text-align:left; vertical-align: middle; padding: 15px 25px 15px 25px; line-height: 1.6;">
    <div style="font-size:20px"><b>datasets/logins.csv</b></div>
Each row represents a login credential. There are no missing values and you can consider the dataset "clean".
<ul>
    <li><b>id:</b> the user's unique ID.</li>
    <li><b>username:</b> the username with the format {firstname}.{lastname}.</li>
    <li><b>password:</b> the password that may or may not meet the requirements. <i>Note, passwords should never be saved in plaintext, always encrypt them when working with real live passwords!</i></li>
</ul>
</div>
<p>Warning: This dataset contains some <strong>real</strong> passwords leaked from <strong>real</strong> websites. These passwords have been filtered, but may still include words that are explicit and offensive.</p>
<p>From here on out, it will be your task to explore and manipulate the existing data until you can answer the two questions described in the instructions panel. Feel free to import as many packages as you need to complete your task, and add cells as necessary. Finally, remember that you are only tested on your answer, not on the methods you use to arrive at the answer!</p>
<p><strong>Note:</strong> To complete this project, you need to know how to manipulate strings in pandas DataFrames and be familiar with regular expressions. Before starting this project we recommend that you have completed the following courses: <a href="https://learn.datacamp.com/courses/data-cleaning-in-python">Data Cleaning in Python</a> and <a href="https://learn.datacamp.com/courses/regular-expressions-in-python">Regular Expressions in Python</a>.</p>

In [1]:
import pandas as pd
df = pd.read_csv('datasets/logins.csv')
df.head(10)

Unnamed: 0,id,username,password
0,1,vance.jennings,vanceRules888!
1,2,consuelo.eaton,Mail_Pen%Scarlets.414
2,3,mitchel.perkins,Z00+1960
3,4,odessa.vaughan,D-rockyou
4,5,araceli.wilder,Araceli}r3
5,6,shawn.harrington,126_239_123
6,7,evelyn.gay,`4:&iAt$'o~(
7,8,noreen.hale,25941829163
8,9,gladys.ward,=Wj1`i)xYYZ
9,10,brant.zimmerman,L?4)OSB$r


In [2]:
df['first_name'] = df['username'].apply(lambda x: str(x).split('.')[0])
df['last_name'] = df['username'].apply(lambda x: str(x).split('.')[1])
df['pass_length'] = df['password'].apply(lambda x: len(str(x)))
df.head()

Unnamed: 0,id,username,password,first_name,last_name,pass_length
0,1,vance.jennings,vanceRules888!,vance,jennings,14
1,2,consuelo.eaton,Mail_Pen%Scarlets.414,consuelo,eaton,21
2,3,mitchel.perkins,Z00+1960,mitchel,perkins,8
3,4,odessa.vaughan,D-rockyou,odessa,vaughan,9
4,5,araceli.wilder,Araceli}r3,araceli,wilder,10


In [3]:
# check whether password contains "password" case insensitive
def extract_string_password(words):
    strings = []
    others = []
    for word in words:
        if word.islower() == True or word.isupper() == True:
            strings.append(word)
        else:
            others.append(word)
    result = ''.join(strings)
    if 'password' in result.lower():
        return True
    else:
        return False

In [4]:
# check whether password have atleast one lowercase, uppercase, numeric, alphanumeric or case insensitive "password"
def check_password(password):
    lower_case = []
    upper_case = []
    numeric = []
    alphanumeric = []
    
    if extract_string_password(password) == True:
        return True
    else:
        for word in password:
            if word.islower() == True:
                lower_case.append(word)
            elif word.isupper() == True:
                upper_case.append(word)
            elif word.isnumeric() == True:
                numeric.append(word)
            else:
                alphanumeric.append(word)
        if len(lower_case) >= 1 and len(upper_case) >= 1 and len(numeric) >= 1 and len(alphanumeric) >= 1:
            return True
        else:
            return False

In [5]:
df['pass_format'] = df['password'].apply(lambda x: check_password(str(x)))
df = df.drop('id', axis=1)
df

Unnamed: 0,username,password,first_name,last_name,pass_length,pass_format
0,vance.jennings,vanceRules888!,vance,jennings,14,True
1,consuelo.eaton,Mail_Pen%Scarlets.414,consuelo,eaton,21,True
2,mitchel.perkins,Z00+1960,mitchel,perkins,8,False
3,odessa.vaughan,D-rockyou,odessa,vaughan,9,False
4,araceli.wilder,Araceli}r3,araceli,wilder,10,True
...,...,...,...,...,...,...
977,autumn.alford,pink3602,autumn,alford,8,False
978,miriam.haynes,Gizzard.Muse+Patters_857,miriam,haynes,24,True
979,genaro.russo,Rm3OwUfobjYxq,genaro,russo,13,False
980,lora.quinn,bn#_k:},lora,quinn,7,False


In [6]:
bad_pws = pd.DataFrame(columns=['username','password','first_name','last_name','pass_length','pass_format'])
bad_pws

Unnamed: 0,username,password,first_name,last_name,pass_length,pass_format


In [7]:
def check_for_first_last_name(df, bad_pws):
    for i, rows in df.iterrows():
        if rows.first_name in rows.password.lower() or rows.last_name in rows.password.lower():
            df = df.drop(index=i)
            bad_pws = bad_pws.append(rows)
    return df, bad_pws

In [8]:
df, bad_pws = check_for_first_last_name(df, bad_pws)
bad_pws

Unnamed: 0,username,password,first_name,last_name,pass_length,pass_format
0,vance.jennings,vanceRules888!,vance,jennings,14,True
4,araceli.wilder,Araceli}r3,araceli,wilder,10,True
11,milford.hubbard,Milford<3Tom,milford,hubbard,12,True
25,dianna.munoz,munoZ_001,dianna,munoz,9,True
27,loretta.bass,%%%bass,loretta,bass,7,False
86,saundra.king,drekerKing,saundra,king,10,False
98,kimberly.lawson,robertloveskimberly<3,kimberly,lawson,21,False
140,ronald.brooks,P1G_bT”_zBrooks,ronald,brooks,15,True
149,raymundo.haley,HaleyComet333$,raymundo,haley,14,True
179,estelle.sexton,Tremarr&Estelle,estelle,sexton,15,False


In [9]:
correct_passwords = df[(df['pass_length'] >= 10) & (df['pass_format'] == True)]

In [10]:
bad_pass = round((len(df) - len(correct_passwords)) / len(df), 2)

In [11]:
bad_pwd_df = df[((df['pass_length'] >= 10) & (df['pass_format'] == False)) | ((df['pass_length'] < 10) & (df['pass_format'] == True)) | ((df['pass_length'] < 10) & (df['pass_format'] == False))]

In [12]:
email_list = pd.concat([bad_pwd_df, bad_pws], ignore_index=True)

In [13]:
email_list = email_list.sort_values(by="username").reset_index()
email_list = email_list.drop("index", axis=1)
email_list

Unnamed: 0,username,password,first_name,last_name,pass_length,pass_format
0,abdul.rowland,0Vg3f0'L,abdul,rowland,8,True
1,addie.cherry,ancan,addie,cherry,5,False
2,adele.moreno,76;%}C^3J,adele,moreno,9,False
3,adeline.bush,"ejcKv=""ITF",adeline,bush,10,False
4,adolfo.kane,6agy:loY!,adolfo,kane,9,True
...,...,...,...,...,...,...
730,yvette.whitfield,sentry31,yvette,whitfield,8,False
731,yvonne.munoz,+t$p[zZu,yvonne,munoz,8,False
732,zachary.huff,78momma,zachary,huff,7,False
733,zelma.abbott,N0~7}o2,zelma,abbott,7,True


In [14]:
email_list = pd.Series(email_list['username'])

In [15]:
# RATIO OF BAD PASSWORDS
bad_pass

0.74

In [16]:
# RATIO OF EMAIL LIST OF PEOPLE HAVING BAD PASSWORDS
email_list

0         abdul.rowland
1          addie.cherry
2          adele.moreno
3          adeline.bush
4           adolfo.kane
             ...       
730    yvette.whitfield
731        yvonne.munoz
732        zachary.huff
733        zelma.abbott
734       zelma.rosario
Name: username, Length: 735, dtype: object