## PASSWORD STRENGTH CHECKER

To build an **application that checks the strength of passwords**, we need a labelled dataset of passwords made up of different combinations of letters, digits, and symbols. I found such a dataset on Kaggle and use it here to train a machine learning model that predicts the strength of a password.

In [1]:
import warnings
warnings.filterwarnings('ignore')
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

data = pd.read_csv("C:/Users/asus/OneDrive/Desktop/ML_Datasets/project/Ml_models/data.csv")
data.head()

Unnamed: 0,password,strength
0,kzde5577,1
1,kino3434,1
2,visi7k1yr,1
3,megzy123,1
4,lamborghin1,1


#### Data Cleaning

In [2]:
data.strength.unique()

array(['1', '2', '0', 'anakonda_252@hotmail.com', 'destek@migmedya.com',
       'elitebank44@gmail.com', 'memleketim.info@gmail.com',
       'canersastim@gmail.com', 'arifselim.ask@gmail.com',
       'octoberwind@mynet.com', 'sado_370@hotmail.com',
       'djexploit@gmail.com', 'bursabursa2@hotmail.com',
       'info@kayimoglu.com', 'drselimcelik@gmail.com',
       'bykudelfa@hotmail.com', 'kralkotsk@mynet.com',
       'mnyk12@hotmail.com', 'elitebilgisayar26@gmail.com',
       'elumre@gmail.com', 'selim.ozmen07@hotmail.com',
       'selimkaratas@windowslive.com', 'crazy_boeing@yahoo.com',
       'imkbticaretmeslek@gmail.com', 'mrcds.grmnt@yahoo.com',
       'mert_kaya09@hotmail.com', 'beratmisimi1@gmail.com',
       'cousto@live.com', 'boystreet04@gmail.com',
       'hackerbbqueen15@hotmail.com', 'brkyc3@hotmail.com',
       'bursa322@hotmail.com', 'selimcebecioglu@gmail.com',
       'ooyrwdiovouxa@hotmail.com', 'antit0xin@hotmail.com',
       'kaderimse.net@hotmail.com', 'cnraktas@ho

In [3]:
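# Some rows hold an email address instead of a label in the strength column.
# Relabel those rows by email domain: gmail -> '0', hotmail -> '1', mynet -> '2'.
# regex=False matches the domain literally (otherwise '.' would match any character).
data.loc[data['strength'].str.contains('@gmail.com', regex=False), 'strength'] = '0'
data.loc[data['strength'].str.contains('@hotmail.com', regex=False), 'strength'] = '1'
data.loc[data['strength'].str.contains('@mynet.com', regex=False), 'strength'] = '2'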
data.strength.unique()

array(['1', '2', '0', 'destek@migmedya.com', 'info@kayimoglu.com',
       'selimkaratas@windowslive.com', 'crazy_boeing@yahoo.com',
       'mrcds.grmnt@yahoo.com', 'cousto@live.com', 'selim2@live.nl',
       'selimest1@yahoo.fr', 'sd66@live.com', 'bykudelfa@windowslive.com',
       'selimiii91@googlemail.com', 'asassa.asdsadsads@mail.ru',
       'byscg@live.com', 'fatal09@live.de', 'manisagfb@windowslive.com',
       'aselimulkatan@windowslive.com', 'auetetee@narod.ru',
       'metaren@yandex.com', 'selimjel@icloud.com',
       'iletisim@selimaltin.com', 'selim1@live.be',
       'dyinglast@outlook.com', 'mehmeteminyakit@tnctr.com',
       'sakaryam5454@yandex.com.tr', 'selimmsn@msn.com',
       'selim199@abv.bg', 'groteselim@msn.com', 'odmosegtnss@ovi.com',
       'alpha_omeg@rocketmail.com', 'iletisim@oderece.net',
       'kikushi@live.com', 'limmy007@live.com', 'hackershift@live.com',
       'selimgabsi@yahoo.fr', 'ftp42@gmx.net', 'basogluselim@yahoo.com',
       'redcap12@yahoo.com'

- Keep only the rows whose strength value is 0, 1, or 2; rows that still hold an email address are dropped.

In [4]:
valid_values = ['0', '1', '2']
df = data[data['strength'].isin(valid_values)]
df = df.reset_index(drop=True)
df

Unnamed: 0,password,strength
0,kzde5577,1
1,kino3434,1
2,visi7k1yr,1
3,megzy123,1
4,lamborghin1,1
...,...,...
669823,10redtux10,1
669824,infrared1,1
669825,184520socram,1
669826,marken22a,1


In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 669828 entries, 0 to 669827
Data columns (total 2 columns):
 #   Column    Non-Null Count   Dtype 
---  ------    --------------   ----- 
 0   password  669827 non-null  object
 1   strength  669828 non-null  object
dtypes: object(2)
memory usage: 10.2+ MB


In [6]:
x = df.isnull().any(axis =1)
df[x]

Unnamed: 0,password,strength
367686,,0


In [7]:
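# Fill the single missing password with a placeholder string.
# Assign the result back instead of using inplace=True on a column selection.
df['password'] = df['password'].fillna('ktyilop12@34#')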
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 669828 entries, 0 to 669827
Data columns (total 2 columns):
 #   Column    Non-Null Count   Dtype 
---  ------    --------------   ----- 
 0   password  669828 non-null  object
 1   strength  669828 non-null  object
dtypes: object(2)
memory usage: 10.2+ MB


In [8]:
df['strength'].unique()

array(['1', '2', '0'], dtype=object)

The dataset has two columns: password and strength. In the strength column:

- 0 means the password’s strength is weak,
- 1 means the password’s strength is medium,
- 2 means the password’s strength is strong.

Before moving forward, I will convert 0, 1, and 2 values in the strength column to weak, medium, and strong.

#### Mapping and Preprocessing

In [9]:
# Convert the string labels '0', '1', '2' to integers so they can be mapped by number
df["strength"] = df["strength"].astype(int)

In [10]:
df["strength"] = df["strength"].map({0: "Weak", 1: "Medium",2: "Strong"})
df.head()

Unnamed: 0,password,strength
0,kzde5577,Medium
1,kino3434,Medium
2,visi7k1yr,Medium
3,megzy123,Medium
4,lamborghin1,Medium


Let’s now train a machine learning model to predict the strength of a password. Before preparing the model, we need to tokenize each password into its individual characters, because the model should learn from the combinations of digits, letters, and symbols. Here is how to tokenize the passwords, turn them into TF-IDF features, and split the data into training and test sets.

In [11]:
def word(password):
    # Tokenizer: split a password into its individual characters
    return list(password)

x1 = np.array(df["password"])
y = np.array(df["strength"])

# Character-level TF-IDF features, using word() as the tokenizer
tdif = TfidfVectorizer(tokenizer=word)
x = tdif.fit_transform(x1)
# Hold out 5% of the rows as the test set
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.05, random_state=42)
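
To make the character-level features concrete, here is a minimal pure-Python sketch of what the vectorizer computes. It assumes scikit-learn's documented defaults for `TfidfVectorizer` (`lowercase=True`, `smooth_idf=True`, `norm='l2'`); the helper names `char_tokens` and `tfidf_rows` are illustrative and not part of the notebook. Note that with the default `lowercase=True`, upper- and lower-case letters are folded into the same feature before tokenization.

```python
import math
from collections import Counter

def char_tokens(password, lowercase=True):
    """Split a password into single-character tokens."""
    text = password.lower() if lowercase else password
    return list(text)

def tfidf_rows(passwords):
    """Return one {character: weight} dict per password (L2-normalised TF-IDF)."""
    counts = [Counter(char_tokens(p)) for p in passwords]
    n_docs = len(counts)
    # Document frequency: in how many passwords each character appears
    doc_freq = Counter(ch for c in counts for ch in c)
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1
    idf = {ch: math.log((1 + n_docs) / (1 + df)) + 1 for ch, df in doc_freq.items()}
    rows = []
    for c in counts:
        weights = {ch: tf * idf[ch] for ch, tf in c.items()}
        # Guard against an empty password, which has no weights to normalise
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        rows.append({ch: w / norm for ch, w in weights.items()})
    return rows

# Tiny illustrative corpus (not taken from the dataset)
rows = tfidf_rows(["aab", "abc"])
print(char_tokens("Ab1!"))
print(rows[0])
```

Characters that appear in few passwords receive a higher IDF weight, and each row is scaled to unit length, so the features describe the character mix of a password rather than its raw length.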

In [12]:
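# Random forest with 10 trees, using entropy as the split criterion
model = RandomForestClassifier(n_estimators=10, criterion='entropy', random_state=0)
model.fit(X_train, y_train)
# Mean accuracy on the held-out test set
model.score(X_test, y_test)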

0.9400155260957841

In [13]:
y_pred = model.predict(X_test)

from sklearn.metrics import classification_report
print(classification_report(y_test,y_pred))

              precision    recall  f1-score   support

      Medium       0.94      0.99      0.96     24776
      Strong       0.97      0.87      0.91      4087
        Weak       0.95      0.75      0.84      4629

    accuracy                           0.94     33492
   macro avg       0.95      0.87      0.90     33492
weighted avg       0.94      0.94      0.94     33492
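
As a rough cross-check on the report above, the averaged recall rows can be reproduced from the per-class recall and support values. This is a sketch that uses the rounded figures printed in the report, so it agrees only to two decimals; the variable names are illustrative. The support column also gives the accuracy of a baseline that always predicts the largest class (Medium), which is a useful reference point for the 0.94 score.

```python
# Per-class (recall, support) copied from the classification report above
report = {
    "Medium": (0.99, 24776),
    "Strong": (0.87, 4087),
    "Weak": (0.75, 4629),
}

total = sum(support for _, support in report.values())
# Macro average: every class counts equally
macro_recall = sum(recall for recall, _ in report.values()) / len(report)
# Weighted average: every class counts in proportion to its support
weighted_recall = sum(recall * support for recall, support in report.values()) / total
# Accuracy of always predicting the most frequent class
majority_baseline = max(support for _, support in report.values()) / total

print(f"total support:     {total}")
print(f"macro recall:      {macro_recall:.2f}")
print(f"weighted recall:   {weighted_recall:.2f}")
print(f"majority baseline: {majority_baseline:.2f}")
```

With these inputs the sketch gives a macro recall of about 0.87 and a weighted recall of about 0.94, matching the report, while always predicting Medium would already reach about 0.74 accuracy. The lower recall for the Weak and Strong classes is therefore the more informative part of the report.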



In [14]:
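import getpass
# Read the password without echoing it to the screen
user = getpass.getpass("Enter Password: ")
# Vectorize with the already-fitted TF-IDF vectorizer; the sparse matrix
# can be passed to the model directly, and the original `data` frame is not overwritten
features = tdif.transform([user])
output = model.predict(features)
print(output)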

Enter Password: ········
['Medium']
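
The prediction step can also be wrapped in a small helper so that it is reusable outside the notebook cell. This is only a sketch: `predict_strength` is a hypothetical name, and it assumes nothing more than a fitted vectorizer exposing `transform` and a trained model exposing `predict`, as the `tdif` and `model` objects above do.

```python
def predict_strength(password, vectorizer, model):
    """Return the predicted strength label for a single password.

    `vectorizer` must already be fitted and provide transform();
    `model` must already be trained and provide predict().
    """
    if not password:
        raise ValueError("password must be a non-empty string")
    # Wrap the password in a list: transform() expects an iterable of documents
    features = vectorizer.transform([password])
    # predict() returns one label per input row; take the only one
    return model.predict(features)[0]
```

In this notebook it would be called as `predict_strength(user, tdif, model)`.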


#### Conclusion

This is how you can use machine learning to build a password strength checker in Python. The checker works by learning from the combination of digits, letters, and special symbols in each password, represented here as character-level TF-IDF features fed to a random forest classifier.