<a href="https://colab.research.google.com/github/Tahira2910/ETG_Python_Projects/blob/main/Password_Strength_Checker.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
import pandas as pd
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

In [None]:
data = pd.read_csv("data2.csv")
print(data.head())

      password  strength
0     kzde5577         1
1     kino3434         1
2    visi7k1yr         1
3     megzy123         1
4  lamborghin1         1


The dataset has two columns: password and strength. In the strength column:

0 means the password is weak;
1 means the password is medium;
2 means the password is strong.
Before moving forward, I will convert the 0, 1, and 2 values in the strength column to Weak, Medium, and Strong:

In [None]:
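# Drop rows with a missing password or strength value.
data = data.dropna()
# Replace the numeric strength codes with readable labels.
data["strength"] = data["strength"].map({0: "Weak",
                                         1: "Medium",
                                         2: "Strong"})
# Show five random rows to confirm the mapping.
print(data.sample(5))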
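
One thing worth checking after the mapping step: `Series.map` with a dictionary returns `NaN` for any value that is not a key in the dictionary. The sketch below uses a small hypothetical DataFrame (not the real `data2.csv`) with one unexpected code, 3, to show how such rows can be spotted.

```python
import pandas as pd

# Hypothetical toy rows; the value 3 stands in for an unexpected label.
toy = pd.DataFrame({
    "password": ["abc", "kino3434", "zR8$kLm2@pW7vN", "oops"],
    "strength": [0, 1, 2, 3],
})
labels = {0: "Weak", 1: "Medium", 2: "Strong"}
toy["strength"] = toy["strength"].map(labels)

# dropna=False keeps the unmapped row visible in the counts.
print(toy["strength"].value_counts(dropna=False))

unmapped = int(toy["strength"].isna().sum())
print("unmapped rows:", unmapped)
```

If the unmapped count is not zero on the real data, those rows would need to be dropped or relabelled before training.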

**Password Strength Prediction Model**

Now let’s train a machine learning model to predict the strength of a password. Before training, we need to tokenize the passwords into individual characters, because the model has to learn from the combinations of digits, letters, and symbols in each password. Here’s how we can tokenize the passwords and split the data into training and test sets:

In [None]:
def word(password):
    """Split a password into a list of its individual characters."""
    return list(password)

# Features are the raw password strings; labels are the strength classes.
x = np.array(data["password"])
y = np.array(data["strength"])

# Build character-level TF-IDF features. The word function is applied to each
# password to convert it into a list of characters.
# Note: TfidfVectorizer lowercases its input by default (lowercase=True), so
# uppercase and lowercase letters share the same feature here.
tdif = TfidfVectorizer(tokenizer=word)
# fit_transform() learns the character vocabulary from the passwords and
# converts them into a sparse TF-IDF matrix in one step.
x = tdif.fit_transform(x)
xtrain, xtest, ytrain, ytest = train_test_split(x, y, 
                                                test_size=0.05,  # test_size = 5%
                                                random_state=42)
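
To make the tokenization step more concrete, here is a minimal sketch on three made-up passwords. It compares the character vocabulary learned with the default settings against one learned with `lowercase=False`. The helper name `char_tokens` and the sample strings are assumptions for illustration; `token_pattern=None` is passed only because the default pattern is unused when a custom tokenizer is supplied.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

def char_tokens(password):
    # Same idea as the word() function: one token per character.
    return list(password)

# Hypothetical sample passwords, for illustration only.
samples = ["abc123", "ABC123", "p@ssW0rd!"]

# Default settings: input is lowercased before tokenizing.
default_vec = TfidfVectorizer(tokenizer=char_tokens, token_pattern=None)
default_vec.fit(samples)

# Case-preserving variant: uppercase letters get their own features.
cased_vec = TfidfVectorizer(tokenizer=char_tokens, token_pattern=None,
                            lowercase=False)
cased_vec.fit(samples)

default_vocab = sorted(default_vec.vocabulary_)
cased_vocab = sorted(cased_vec.vocabulary_)
print("default:", default_vocab)
print("cased:  ", cased_vocab)
```

With the defaults, "abc123" and "ABC123" map to the same features. Whether case should be preserved is a design choice; for password strength it may be worth trying `lowercase=False` and comparing the scores.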

Now here’s how to train a random forest classification model to predict the strength of the password:

In [None]:
model = RandomForestClassifier()
# Train the random forest classifier on the training data.
model.fit(xtrain, ytrain)
# score() returns the mean accuracy on the held-out test data.
print(model.score(xtest, ytest))
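
A single accuracy number can hide weak performance on one class, especially if the classes are not equally common. The sketch below shows one possible way to get per-class precision and recall, using `classification_report` and a stratified split. It runs on synthetic passwords generated by a hypothetical `make_password` helper, not on the real dataset, so the numbers it prints say nothing about the model above.

```python
import random
import string

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

rng = random.Random(0)

def make_password(length, alphabet):
    # Hypothetical generator for synthetic stand-in data.
    return "".join(rng.choice(alphabet) for _ in range(length))

def char_tokens(password):
    return list(password)

# Synthetic data: the label is fixed by how each password is generated.
toy = (
    [(make_password(5, string.ascii_lowercase), "Weak") for _ in range(60)]
    + [(make_password(9, string.ascii_lowercase + string.digits), "Medium")
       for _ in range(60)]
    + [(make_password(14, string.ascii_letters + string.digits + "!@#$%"),
        "Strong") for _ in range(60)]
)
passwords, labels = zip(*toy)

vectorizer = TfidfVectorizer(tokenizer=char_tokens, token_pattern=None,
                             lowercase=False)
features = vectorizer.fit_transform(passwords)

# stratify keeps the class proportions the same in both splits.
xtr, xte, ytr, yte = train_test_split(features, labels, test_size=0.2,
                                      stratify=labels, random_state=42)

clf = RandomForestClassifier(n_estimators=50, random_state=42)
clf.fit(xtr, ytr)

report = classification_report(yte, clf.predict(xte), zero_division=0)
print(report)
```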

Now here’s how we can check the strength of a password using the trained model:

In [None]:
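import getpass
# getpass hides the password while it is being typed.
user = getpass.getpass("Enter Password: ")
# Vectorize the input with the already-fitted vectorizer. A separate variable
# name is used so the original data DataFrame is not overwritten.
features = tdif.transform([user]).toarray()
output = model.predict(features)
print(output)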
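
As an optional variation on the prediction step, the vectorizer and the classifier can be bundled into a single scikit-learn `Pipeline`, so that raw password strings go straight into `predict`. This is a sketch under assumptions: the nine training passwords are hand-labelled toy examples, and `check_strength` is a hypothetical helper name, not part of the notebook above.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

def char_tokens(password):
    return list(password)

# Hypothetical toy data, labelled by hand for illustration only.
passwords = [
    "12345", "abcde", "qwerty",
    "kino3434", "megzy123", "visi7k1yr",
    "T!g3r#Blue92xQ", "zR8$kLm2@pW7vN", "Yq4&hD9!sXc3%Lb",
]
labels = ["Weak"] * 3 + ["Medium"] * 3 + ["Strong"] * 3

# The pipeline applies the vectorizer first, then the classifier.
pipeline = make_pipeline(
    TfidfVectorizer(tokenizer=char_tokens, token_pattern=None,
                    lowercase=False),
    RandomForestClassifier(n_estimators=50, random_state=42),
)
pipeline.fit(passwords, labels)

def check_strength(password):
    # Returns one of the label strings for a single raw password.
    return pipeline.predict([password])[0]

print(check_strength("hello"))
```

The benefit of this layout is that the same fitted object handles both transformation and prediction, which removes the risk of calling the model on features built by a different vectorizer.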