# Password Strength Checker

### Password strength checkers are crucial tools for safeguarding online security. They enable users to create and maintain strong passwords that reduce the risk of security breaches and unauthorized access. 

### These applications evaluate the strength of a password based on various factors such as length, complexity, and uniqueness. 

### The feedback provided by password strength checkers enables users to make informed decisions on how to improve their passwords and enhance the security of their online accounts.

## Password Strength Checker using Python 

### A password strength checker is a tool that evaluates the strength of a password based on its combination of digits, letters, and special symbols. 
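
### As a rough illustration of what "combination of digits, letters, and special symbols" means, the sketch below counts how many characters of a password fall into each class. The helper name `character_classes` is hypothetical and is not part of the notebook's model; it only makes the idea concrete.

```python
def character_classes(password: str) -> dict:
    """Count how many characters fall into each broad character class."""
    counts = {"lower": 0, "upper": 0, "digit": 0, "symbol": 0}
    for ch in password:
        if ch.islower():
            counts["lower"] += 1
        elif ch.isupper():
            counts["upper"] += 1
        elif ch.isdigit():
            counts["digit"] += 1
        else:
            # Anything that is not a letter or digit is treated as a symbol
            counts["symbol"] += 1
    return counts


print(character_classes("kzde5577"))
# {'lower': 4, 'upper': 0, 'digit': 4, 'symbol': 0}
```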

### One way to build a password strength checker is to train a machine learning model on a labeled dataset of passwords. The model learns which combinations of letters, digits, and symbols tend to appear in weak, medium, and strong passwords.

### Once trained, the model can predict the strength of a new password, and that prediction can be used as feedback to help users choose stronger passwords and reduce the risk of security breaches.

### I’ll start by importing the necessary Python libraries and the dataset:

In [1]:
import pandas as pd
import numpy as np

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

import getpass
import warnings
warnings.filterwarnings("ignore")

In [12]:
# Load and print the data, skipping malformed rows
# (some lines in the file have 5 fields instead of the expected 2)
# Note: on_bad_lines="skip" requires pandas >= 1.3; it replaces the removed error_bad_lines=False

data = pd.read_csv("data.csv", on_bad_lines="skip")
print(data.head())

      password  strength
0     kzde5577         1
1     kino3434         1
2    visi7k1yr         1
3     megzy123         1
4  lamborghin1         1

In [3]:
# Print the number of null values in each column

data.isnull().sum()

password    1
strength    0
dtype: int64

In [4]:
# Drop any rows with missing data

data = data.dropna()

### The dataset used for this project contains two columns: password and strength. The strength column values indicate the strength of the password as follows: 0 for weak, 1 for medium, and 2 for strong. 

### To make the data easier to read, the values in the strength column are mapped to their category names: Weak, Medium, and Strong. Readable labels make it easier to inspect the class distribution in the dataset and to interpret the model's predictions later.
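
### The class distribution mentioned above can be inspected with `value_counts`. The sketch below uses a small hypothetical DataFrame (`toy`) with made-up labels rather than the real dataset, so the printed shares say nothing about the actual data.

```python
import pandas as pd

# Hypothetical toy frame standing in for the real dataset; labels are illustrative
toy = pd.DataFrame({
    "password": ["abc", "kzde5577", "kino3434", "Xy9#Lq2$Tn8&vB"],
    "strength": [0, 1, 1, 2],
})

# Same mapping as in the notebook: numeric codes to readable categories
toy["strength"] = toy["strength"].map({0: "Weak", 1: "Medium", 2: "Strong"})

# Share of each class in the (toy) dataset
distribution = toy["strength"].value_counts(normalize=True)
print(distribution)
```

### On the real data, the same call would be `data["strength"].value_counts(normalize=True)`.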

In [5]:
# Map the strength values to more readable and understandable categories

data["strength"] = data["strength"].map({0: "Weak", 
                                         1: "Medium",
                                         2: "Strong"})

# Print a random sample of five rows from the preprocessed dataset

print(data.sample(5))

           password strength
357234   zaraza2008   Medium
626082   c9p5au8naa   Medium
81554   temp3413888   Medium
323318    43661269Z   Medium
76579      omer1453   Medium



## Password Strength Checker with ML in Python: Tokenization

### Before training the model, we must tokenize the passwords and split the data into training and test sets. 

### Tokenizing a password means breaking it into smaller units that the model can learn from. Here each password is split into its individual characters, since passwords rarely consist of natural-language words.
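
### To make character-level tokenization concrete, the sketch below fits a `TfidfVectorizer` on two toy passwords and prints the resulting vocabulary. The helper name `char_tokens` and the toy passwords are assumptions for illustration. Unlike the notebook, this sketch passes `lowercase=False` so that upper- and lowercase letters stay distinct, and `token_pattern=None` to avoid the warning about an unused token pattern.

```python
from sklearn.feature_extraction.text import TfidfVectorizer


def char_tokens(password):
    # Each character of the password becomes one token
    return list(password)


# lowercase=False keeps "A" and "a" as separate tokens
vectorizer = TfidfVectorizer(tokenizer=char_tokens, lowercase=False, token_pattern=None)
matrix = vectorizer.fit_transform(["abc1", "Abc!"])

print(char_tokens("abc1"))             # ['a', 'b', 'c', '1']
print(sorted(vectorizer.vocabulary_))  # ['!', '1', 'A', 'a', 'b', 'c']
print(matrix.shape)                    # (2, 6): two passwords, six distinct characters
```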

### Splitting the data into training and test sets is crucial to evaluate the model's accuracy and generalization performance. In the following section, we will demonstrate how to tokenize and split the data for password strength prediction.
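
### A minimal sketch of the split on hypothetical toy data is shown here. It adds `stratify`, which the notebook does not use, so that each class keeps roughly the same share in the training and test sets; whether that matters for the real dataset depends on how imbalanced its labels are.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 20 placeholder passwords with an imbalanced label mix
passwords = np.array([f"pw{i}" for i in range(20)])
labels = np.array(["Medium"] * 12 + ["Weak"] * 4 + ["Strong"] * 4)

# stratify=labels preserves the class proportions in both splits
train_x, test_x, train_y, test_y = train_test_split(
    passwords, labels, test_size=0.25, random_state=42, stratify=labels
)

print(len(train_x), len(test_x))  # 15 5
print(sorted(test_y.tolist()))    # ['Medium', 'Medium', 'Medium', 'Strong', 'Weak']
```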

In [6]:
# Define a function to split a password into its characters

def word(password):
    # Each character of the password becomes one token
    return list(password)

# Extract the password and strength columns from the dataset as numpy arrays

x = np.array(data["password"])
y = np.array(data["strength"])

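# Initialize the TfidfVectorizer with the character tokenizer and fit it to the passwords
# Note: TfidfVectorizer lowercases its input by default, so "A" and "a" share a token here;
# pass lowercase=False to keep them distinct

tdif = TfidfVectorizer(tokenizer=word)
x = tdif.fit_transform(x)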

# Split the data into training and testing sets using train_test_split function

xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.05, random_state=42)

## Password Strength Checker with ML in Python: Training of the Prediction Model

### With the features prepared, the next step is to train a classifier on the training set. A random forest is used here.

### Its accuracy on the held-out test set gives an estimate of how well the predictions generalize to passwords the model has not seen.

In [7]:
# Train the random forest classifier

clf = RandomForestClassifier()
clf.fit(xtrain, ytrain)

# Evaluate the model's accuracy on the test set

accuracy = clf.score(xtest, ytest)
print(f'Test set accuracy: {accuracy:.4f}')

Test set accuracy: 0.9562
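
### Accuracy alone can hide weak performance on rarer classes. The sketch below uses hypothetical true and predicted labels (not results from this notebook) to show how a confusion matrix exposes a model that predicts only the majority class while still scoring high accuracy.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

labels = ["Weak", "Medium", "Strong"]

# Hypothetical labels: the "model" predicts Medium for every password
y_true = ["Medium"] * 8 + ["Weak"] + ["Strong"]
y_pred = ["Medium"] * 10

acc = accuracy_score(y_true, y_pred)
# Rows are true classes, columns are predicted classes, both in `labels` order
cm = confusion_matrix(y_true, y_pred, labels=labels)

print(f"Accuracy: {acc:.2f}")  # Accuracy: 0.80
print(cm)
```

### In the notebook, the equivalent call would be `confusion_matrix(ytest, clf.predict(xtest))`.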


### After training the model, we can use it to check the strength of a password.

In [14]:
# Prompt user to enter password without echoing input to console

user = getpass.getpass("Enter Password: ")

# Transform password using the fitted TfidfVectorizer
# (stored in a new variable so the `data` DataFrame is not overwritten)

features = tdif.transform([user]).toarray()

# Use trained model to predict password strength

output = clf.predict(features)

# Print the predicted strength of the password

print(output)

Enter Password: ········
['Strong']
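
### The prediction steps can also be wrapped in a reusable function. The sketch below is self-contained and trains on a handful of hypothetical passwords with made-up labels, so its predictions are not meaningful; the names `char_tokens` and `check_strength` are assumptions, not part of the notebook. It also returns the per-class probabilities from `predict_proba`, which may be more informative than the label alone.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer


def char_tokens(password):
    # Each character of the password becomes one token
    return list(password)


# Hypothetical toy training data; the labels are illustrative only
train_passwords = ["abc", "12345", "kzde5577", "kino3434",
                   "Xy9#Lq2$Tn8&vB", "qW!7zR@4mK#1pL"]
train_labels = ["Weak", "Weak", "Medium", "Medium", "Strong", "Strong"]

vectorizer = TfidfVectorizer(tokenizer=char_tokens, lowercase=False, token_pattern=None)
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(vectorizer.fit_transform(train_passwords), train_labels)


def check_strength(password):
    """Return the predicted label and per-class probabilities for one password."""
    features = vectorizer.transform([password])
    label = model.predict(features)[0]
    probabilities = dict(zip(model.classes_, model.predict_proba(features)[0]))
    return label, probabilities


label, probabilities = check_strength("megzy123")
print(label, probabilities)
```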


## Summary

### This notebook built a password strength checker by splitting passwords into characters, converting them to TF-IDF features, and training a random forest classifier to label each password as Weak, Medium, or Strong. By learning from the combination of digits, letters, and special symbols in each password, the model reached a test set accuracy of 0.9562 on this dataset.