# **Password Strength Checker with Machine Learning**


**A password strength checker is an application that estimates how strong a password is. Some popular password strength meters use machine learning algorithms to predict the strength of a password.**

# **Importing the necessary Python libraries and the dataset**

In [1]:
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

In [2]:
data = pd.read_csv("/content/data.csv", on_bad_lines='skip')
print(data.head())

      password  strength
0     kzde5577       1.0
1     kino3434       1.0
2    visi7k1yr       1.0
3     megzy123       1.0
4  lamborghin1       1.0


**The dataset has two columns: password and strength. In the strength column:**

0 means the password is weak;

1 means the password is medium;

2 means the password is strong.

**Before moving forward, I will drop rows with missing values and convert the 0, 1, and 2 values in the strength column to Weak, Medium, and Strong:**

In [3]:
# Drop incomplete rows; .copy() gives an independent DataFrame, so the
# assignment below does not trigger pandas' SettingWithCopyWarning.
data = data.dropna().copy()
data["strength"] = data["strength"].map({0: "Weak",
                                         1: "Medium",
                                         2: "Strong"})
print(data.sample(5))

                password strength
265202  74zHmSTgxNwSPTXQ   Strong
145445        sirisak789   Medium
57042          zxecgqkm6   Medium
170200        progin1458   Medium
245739         t14949986   Medium


# **Password Strength Prediction Model**


Now let’s train a machine learning model to predict the strength of a password. Before fitting the model, we need to tokenize each password into its individual characters, because the model has to learn from the combinations of digits, letters, and symbols. The character tokens are turned into TF-IDF features, and the data is then split into training and test sets:

In [4]:
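def word(password):
    # Split a password into a list of its individual characters.
    return list(password)

x = np.array(data["password"])
y = np.array(data["strength"])

# Character-level TF-IDF features. Note that TfidfVectorizer lowercases
# its input by default (lowercase=True), so "A" and "a" share one feature.
tdif = TfidfVectorizer(tokenizer=word)
x = tdif.fit_transform(x)
# Hold out 5% of the rows for testing; random_state makes the split repeatable.
xtrain, xtest, ytrain, ytest = train_test_split(x, y,
                                                test_size=0.05,
                                                random_state=42)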
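
To make the feature step more concrete, here is a minimal standard-library sketch of what character-level TF-IDF computes for a few of the sample passwords. It assumes the smoothed IDF formula `ln((1 + n) / (1 + df)) + 1` followed by L2 normalisation, which, as far as I know, corresponds to scikit-learn's default settings. `char_tfidf` is a hypothetical helper and not part of the notebook, and unlike the vectorizer above it does not lowercase its input.

```python
import math
from collections import Counter

def char_tfidf(passwords):
    """Character-level TF-IDF with smoothed IDF and L2 normalisation (sketch)."""
    docs = [Counter(p) for p in passwords]        # character counts per password
    n_docs = len(docs)
    vocab = sorted(set().union(*docs))            # every character seen at least once
    # Document frequency: number of passwords that contain each character.
    df = {ch: sum(1 for d in docs if ch in d) for ch in vocab}
    # Smoothed inverse document frequency (assumed formula, see the text above).
    idf = {ch: math.log((1 + n_docs) / (1 + df[ch])) + 1 for ch in vocab}

    rows = []
    for d in docs:
        weights = [d[ch] * idf[ch] for ch in vocab]
        norm = math.sqrt(sum(w * w for w in weights)) or 1.0  # avoid dividing by zero
        rows.append([w / norm for w in weights])
    return vocab, rows

vocab, rows = char_tfidf(["kzde5577", "kino3434", "megzy123"])
print(vocab)
print([round(w, 3) for w in rows[0]])
```

Each row is one password and each column one character, so repeated characters (the two 5s and two 7s in `kzde5577`) get larger weights than characters that appear once.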



# **Train a classification model to predict the strength of the password**

In [5]:
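# No random_state is set here, so the score can vary slightly between runs.
model = RandomForestClassifier()
model.fit(xtrain, ytrain)
# score() reports mean accuracy on the held-out test set.
print(model.score(xtest, ytest))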

0.9533363670210565
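
A single accuracy figure can hide weak performance on a less common class, so it may be worth looking at each class separately. The sketch below computes per-class recall with the standard library only. `per_class_recall` is a hypothetical helper, and the labels are made up for illustration; they are not results from this notebook.

```python
from collections import Counter

def per_class_recall(y_true, y_pred):
    """Fraction of each true class that was predicted correctly."""
    totals = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / totals[label] for label in totals}

# Hypothetical labels, for illustration only (not output of the model above).
y_true = ["Weak", "Medium", "Medium", "Medium", "Strong", "Medium"]
y_pred = ["Medium", "Medium", "Medium", "Medium", "Strong", "Medium"]
print(per_class_recall(y_true, y_pred))
```

In this toy example five of the six predictions are correct, yet the only Weak password is missed. With the notebook's objects, the call would presumably be `per_class_recall(ytest, model.predict(xtest))`.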


# **Check the strength of a password using the trained model**

In [9]:
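import getpass

# Read the password without echoing it to the screen.
user = getpass.getpass("Enter Password: ")
# Reuse the fitted vectorizer; a new name avoids overwriting the `data` DataFrame.
features = tdif.transform([user]).toarray()
output = model.predict(features)
print(output)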


Enter Password: ··········
['Strong']
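
If the check needs to run more than once, the transform and predict steps can be wrapped in a small function. This is only a sketch: `check_strength` is a hypothetical name, and it assumes the vectorizer and classifier passed in have already been fitted, as in the cells above.

```python
def check_strength(password, vectorizer, classifier):
    """Return the predicted strength label for a single password."""
    # transform() expects an iterable of documents, so wrap the password in a list.
    features = vectorizer.transform([password])
    # predict() returns one label per input row; take the only one.
    return classifier.predict(features)[0]
```

With the objects from this notebook, it would be called as `check_strength(user, tdif, model)` and return a plain label such as `Strong` instead of a one-element array.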
