# 🔐 Password Strength Analyzer with Adversarial Training

## Educational Tutorial and Implementation Guide

This tutorial builds a password strength analyzer with machine learning and tests how well it holds up against adversarial inputs. The notebook walks through dataset generation, feature engineering, model training, adversarial testing, and a reusable analysis function.

### 🎯 Learning Objectives

By the end of this tutorial, you will understand:

1. **Feature Engineering for Password Analysis**: How to extract meaningful features from password strings
2. **Machine Learning for Security**: Applying ML techniques to cybersecurity problems
3. **Adversarial Training**: Making models robust against deceptive inputs
4. **Real-world Application**: Building production-ready security tools

### 🧠 Problem Statement

Traditional password strength checkers rely on simple rules (length, character types) that can be easily fooled. For example:
- `P@ssw0rd123!` looks strong but is based on a common weak password
- `qwerty2024!` contains predictable patterns despite having symbols and numbers
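
To make the problem concrete, here is a minimal sketch of the kind of rule-based checker described above. The function name `rule_based_strength` and its thresholds are illustrative assumptions, not part of this project's code.

```python
import string

def rule_based_strength(password: str) -> str:
    """Score a password using only length and character-class rules (illustrative)."""
    checks = [
        len(password) >= 8,
        any(c.islower() for c in password),
        any(c.isupper() for c in password),
        any(c.isdigit() for c in password),
        any(c in string.punctuation for c in password),
    ]
    score = sum(checks)
    if score == 5:
        return "Strong"
    if score >= 3:
        return "Medium"
    return "Weak"

for pwd in ["P@ssw0rd123!", "qwerty2024!", "123456"]:
    print(f"{pwd:<15} -> {rule_based_strength(pwd)}")
```

A checker like this rates `P@ssw0rd123!` as Strong because it only counts character classes and never asks whether the password derives from a common base word.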

Our goal is to build an ML model that:
- Learns from real password breach data
- Recognizes subtle weakness patterns
- Resists adversarial "strengthening" tricks
- Provides actionable security feedback

### 📊 Dataset Structure

We'll work with password datasets containing:
```
Password       | Crack_Time_Sec | Strength_Label
123456         | 0.001          | Weak
qW3@Zx9!       | 1223           | Strong
password1      | 0.02           | Weak
G0ldfishKing!  | 4000           | Strong
```

Let's begin!

## 1. Import Required Libraries

Let's start by importing all the necessary libraries for our password strength analyzer.

In [None]:
# Core data science libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Machine learning libraries
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
from sklearn.metrics import precision_recall_fscore_support, roc_auc_score, roc_curve
from sklearn.preprocessing import LabelEncoder
import xgboost as xgb

# Text analysis libraries
import textstat
import nltk
from collections import Counter
import re
import string
import math

# Utility libraries
import warnings
import os
import sys
from getpass import getpass
import time
from faker import Faker

# Add src directory to path for our custom modules
sys.path.append('../src')

# Our custom modules
from feature_extraction import PasswordFeatureExtractor, extract_features_batch
from data_generator import PasswordDataGenerator, create_sample_dataset
from adversarial_training import AdversarialPasswordGenerator
from model_training import PasswordStrengthModel

# Configure display options
warnings.filterwarnings('ignore')
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")

print("✅ All libraries imported successfully!")

## 2. Load and Explore Password Dataset

In this section, we'll generate a synthetic password dataset that mimics real-world password patterns. In a production environment, you would instead train on real breach corpora such as the RockYou list or the Pwned Passwords data from Have I Been Pwned.

In [None]:
# Generate a comprehensive password dataset
print("🔄 Generating password dataset...")
dataset = create_sample_dataset()

print(f"✅ Generated {len(dataset)} password samples")
print(f"\nDataset shape: {dataset.shape}")
print(f"Columns: {list(dataset.columns)}")

# Display basic information about the dataset
print("\n📊 Dataset Overview:")
print(dataset.head(10))

print("\n📈 Strength Label Distribution:")
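# Order the counts Weak -> Medium -> Strong so the red/orange/green bar colors below match the labels
strength_counts = dataset['Strength_Label'].value_counts().reindex(['Weak', 'Medium', 'Strong'])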
print(strength_counts)

# Visualize the distribution
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 5))

# Strength distribution
strength_counts.plot(kind='bar', ax=ax1, color=['red', 'orange', 'green'])
ax1.set_title('Password Strength Distribution')
ax1.set_xlabel('Strength Level')
ax1.set_ylabel('Count')
ax1.tick_params(axis='x', rotation=0)

# Crack time distribution (log scale)
dataset['log_crack_time'] = np.log10(dataset['Crack_Time_Sec'] + 1)
ax2.hist(dataset['log_crack_time'], bins=50, alpha=0.7, color='skyblue')
ax2.set_title('Distribution of Crack Times (Log Scale)')
ax2.set_xlabel('Log10(Crack Time Seconds)')
ax2.set_ylabel('Frequency')

plt.tight_layout()
plt.show()

# Show some example passwords by strength
print("\n🔍 Example Passwords by Strength:")
for strength in ['Weak', 'Medium', 'Strong']:
    examples = dataset[dataset['Strength_Label'] == strength]['Password'].head(5).tolist()
    print(f"\n{strength} passwords:")
    for i, pwd in enumerate(examples, 1):
        print(f"  {i}. {pwd}")

# Basic statistics
print(f"\n📊 Dataset Statistics:")
print(f"Average password length: {dataset['Password'].str.len().mean():.2f}")
print(f"Min crack time: {dataset['Crack_Time_Sec'].min():.3f} seconds")
print(f"Max crack time: {dataset['Crack_Time_Sec'].max():.0f} seconds")
print(f"Median crack time: {dataset['Crack_Time_Sec'].median():.2f} seconds")

## 3. Feature Engineering: Password Analysis

Feature engineering is crucial for password strength analysis. We'll extract various features that capture different aspects of password security, from basic composition to complex patterns.
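
The `PasswordFeatureExtractor` used in this notebook lives in `../src` and is not shown here. As a rough idea of what the basic composition features could look like, here is a standard-library sketch. The function `basic_composition_features` is hypothetical, and the real extractor may compute these values differently.

```python
import string

def basic_composition_features(password: str) -> dict:
    """Compute length, per-class counts, per-class ratios, and unique-character ratio."""
    length = len(password)
    counts = {
        "uppercase_count": sum(c.isupper() for c in password),
        "lowercase_count": sum(c.islower() for c in password),
        "digit_count": sum(c.isdigit() for c in password),
        "symbol_count": sum(c in string.punctuation for c in password),
    }
    features = {"length": length, **counts}
    # Ratios are the counts normalized by length (0.0 for an empty password)
    for name, count in counts.items():
        features[name.replace("_count", "_ratio")] = count / length if length else 0.0
    features["unique_char_ratio"] = len(set(password)) / length if length else 0.0
    return features

features = basic_composition_features("MyP@ssw0rd123!")
for name, value in features.items():
    print(f"{name}: {value}")
```

The feature names mirror the 'Basic Composition' and 'Character Ratios' groups printed by the cell that follows, so the two outputs can be compared side by side.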

In [None]:
# Initialize the feature extractor
feature_extractor = PasswordFeatureExtractor()

# Extract features for a sample password to understand the feature set
sample_password = "MyP@ssw0rd123!"
sample_features = feature_extractor.extract_features(sample_password)

print(f"🔍 Feature Analysis for password: '{sample_password}'")
print(f"Total features extracted: {len(sample_features)}")
print("\n📊 Feature breakdown:")

# Group features by category for better understanding
feature_categories = {
    'Basic Composition': ['length', 'uppercase_count', 'lowercase_count', 'digit_count', 'symbol_count'],
    'Character Ratios': ['uppercase_ratio', 'lowercase_ratio', 'digit_ratio', 'symbol_ratio'],
    'Diversity Metrics': ['char_set_size', 'unique_char_ratio', 'entropy'],
    'Pattern Detection': ['has_keyboard_pattern', 'keyboard_pattern_length', 'has_sequential_chars', 
                         'sequential_char_count', 'has_repeated_chars', 'repeated_char_count'],
    'Security Analysis': ['substitution_count', 'contains_dictionary_word', 'contains_date', 
                         'contains_year', 'complexity_score']
}

for category, features in feature_categories.items():
    print(f"\n{category}:")
    for feature in features:
        if feature in sample_features:
            value = sample_features[feature]
            print(f"  {feature}: {value}")

# Extract features for the entire dataset
print(f"\n🔄 Extracting features for {len(dataset)} passwords...")
start_time = time.time()

# Extract features for all passwords
feature_list = extract_features_batch(dataset['Password'].tolist())
features_df = pd.DataFrame(feature_list)

end_time = time.time()
print(f"✅ Feature extraction completed in {end_time - start_time:.2f} seconds")
print(f"Feature matrix shape: {features_df.shape}")

# Display feature correlation heatmap
plt.figure(figsize=(15, 12))
correlation_matrix = features_df.corr()

# Select most important features for visualization
important_features = [
    'length', 'entropy', 'complexity_score', 'unique_char_ratio',
    'uppercase_ratio', 'digit_ratio', 'symbol_ratio',
    'has_keyboard_pattern', 'has_sequential_chars', 'has_repeated_chars',
    'contains_dictionary_word', 'substitution_count'
]

# Filter correlation matrix to important features
filtered_corr = correlation_matrix.loc[important_features, important_features]

sns.heatmap(filtered_corr, annot=True, cmap='coolwarm', center=0, 
            square=True, fmt='.2f', cbar_kws={"shrink": .8})
plt.title('Feature Correlation Matrix (Key Features)')
plt.tight_layout()
plt.show()

# Feature statistics by strength level
print("\n📈 Feature Statistics by Strength Level:")
features_with_labels = features_df.copy()
features_with_labels['Strength_Label'] = dataset['Strength_Label']

feature_stats = features_with_labels.groupby('Strength_Label')[important_features].mean()
print(feature_stats.round(3))

## 4. Password Entropy and Pattern Detection

Entropy is a crucial measure of password unpredictability. Let's dive deeper into entropy calculation and pattern detection techniques.
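
Before running the extractor, it helps to see one common way these quantities can be computed. The sketch below assumes the entropy feature is Shannon entropy over the password's own character frequencies and that sequential patterns are runs of ascending code points. Both helper functions are hypothetical stand-ins, not the project's implementation.

```python
import math
from collections import Counter

def shannon_entropy(password: str) -> float:
    """Shannon entropy, in bits per character, of the password's character distribution."""
    if not password:
        return 0.0
    n = len(password)
    return -sum((c / n) * math.log2(c / n) for c in Counter(password).values())

def longest_sequential_run(password: str) -> int:
    """Length of the longest run of consecutive ascending characters, e.g. 'abc' or '123'."""
    if not password:
        return 0
    best = run = 1
    for prev, cur in zip(password, password[1:]):
        run = run + 1 if ord(cur) - ord(prev) == 1 else 1
        best = max(best, run)
    return best

for pwd in ["123456", "password", "qwerty123", "7$kL9#mN2@pQ5"]:
    print(f"{pwd:<15} entropy={shannon_entropy(pwd):.3f}  run={longest_sequential_run(pwd)}")
```

Note that this per-character entropy is bounded by `log2(len(password))`, so `123456` scores reasonably well on it even though it is trivially guessable. That is why entropy is combined with pattern detection rather than used alone.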

In [None]:
# Let's analyze entropy and patterns for different types of passwords
test_passwords = [
    "123456",           # Very weak - sequential numbers
    "password",         # Weak - dictionary word
    "qwerty123",        # Weak - keyboard pattern + numbers
    "P@ssw0rd123!",     # Deceptively strong - common substitutions
    "MySecure2024!",    # Medium - predictable year
    "7$kL9#mN2@pQ5",    # Strong - truly random
    "correcthorsebatterystaple",  # Strong - passphrase
]

print("🔍 Detailed Analysis of Different Password Types:\n")

for password in test_passwords:
    features = feature_extractor.extract_features(password)
    
    print(f"Password: '{password}'")
    print(f"  Length: {features['length']}")
    print(f"  Entropy: {features['entropy']:.3f} bits")
    print(f"  Character set size: {features['char_set_size']}")
    print(f"  Unique char ratio: {features['unique_char_ratio']:.3f}")
    print(f"  Complexity score: {features['complexity_score']}")
    
    # Pattern detection
    patterns = []
    if features['has_keyboard_pattern']:
        patterns.append("Keyboard pattern")
    if features['has_sequential_chars']:
        patterns.append("Sequential characters")
    if features['has_repeated_chars']:
        patterns.append("Repeated characters")
    if features['contains_dictionary_word']:
        patterns.append("Dictionary word")
    if features['has_common_substitutions']:
        patterns.append("Common substitutions")
    if features['contains_year']:
        patterns.append("Contains year")
    
    if patterns:
        print(f"  Detected patterns: {', '.join(patterns)}")
    else:
        print(f"  Detected patterns: None")
    
    print()

# Visualize feature distributions by strength.
# Series.hist(by=...) draws its own grid of subplots rather than plotting into the
# single Axes passed via ax, so each strength level is overlaid on the target Axes instead.
fig, axes = plt.subplots(2, 2, figsize=(15, 10))

plot_specs = [
    ('entropy', 'Entropy Distribution by Strength', 'Entropy (bits)', 30),
    ('length', 'Length Distribution by Strength', 'Password Length', 20),
    ('complexity_score', 'Complexity Score Distribution by Strength', 'Complexity Score', 30),
    ('unique_char_ratio', 'Character Diversity by Strength', 'Unique Character Ratio', 20),
]

for ax, (feature, title, xlabel, bins) in zip(axes.flat, plot_specs):
    for strength in ['Weak', 'Medium', 'Strong']:
        mask = features_with_labels['Strength_Label'] == strength
        ax.hist(features_with_labels.loc[mask, feature], bins=bins, alpha=0.6, label=strength)
    ax.set_title(title)
    ax.set_xlabel(xlabel)
    ax.set_ylabel('Frequency')
    ax.legend()

plt.tight_layout()
plt.show()

# Pattern prevalence analysis
pattern_features = ['has_keyboard_pattern', 'has_sequential_chars', 'has_repeated_chars', 
                   'contains_dictionary_word', 'has_common_substitutions', 'contains_year']

pattern_stats = []
for strength in ['Weak', 'Medium', 'Strong']:
    strength_data = features_with_labels[features_with_labels['Strength_Label'] == strength]
    row = {'Strength': strength}
    for pattern in pattern_features:
        prevalence = strength_data[pattern].mean() * 100
        row[pattern.replace('has_', '').replace('contains_', '').replace('_', ' ').title()] = f"{prevalence:.1f}%"
    pattern_stats.append(row)

pattern_df = pd.DataFrame(pattern_stats)
print("📊 Pattern Prevalence by Strength Level:")
print(pattern_df.to_string(index=False))

## 5. Encode Target Variables

We'll prepare our target variables for both classification (strength labels) and regression (crack time) approaches.
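
As a small worked example of both encodings, the sketch below uses the four sample rows from the Dataset Structure table. The explicit `STRENGTH_ORDER` mapping is an assumption introduced for illustration. scikit-learn's `LabelEncoder` orders classes alphabetically instead, which the last line makes visible.

```python
import math

# Hypothetical explicit ordering, so that a larger code means a stronger password
STRENGTH_ORDER = ["Weak", "Medium", "Strong"]
LABEL_TO_ID = {label: i for i, label in enumerate(STRENGTH_ORDER)}

rows = [
    ("123456", 0.001, "Weak"),
    ("qW3@Zx9!", 1223, "Strong"),
    ("password1", 0.02, "Weak"),
    ("G0ldfishKing!", 4000, "Strong"),
]

# Classification target: integer code per strength label
y_classification = [LABEL_TO_ID[label] for _, _, label in rows]

# Regression target: log10(seconds + 1) compresses the wide range of crack times
y_regression = [math.log10(seconds + 1) for _, seconds, _ in rows]

print("Ordinal codes:     ", y_classification)
print("Log crack times:   ", [round(v, 3) for v in y_regression])
print("Alphabetical order:", sorted(STRENGTH_ORDER))
```

Whichever encoding is used, any code that indexes predictions or probabilities by position needs to follow the same class order.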

In [None]:
# Prepare target variables
label_encoder = LabelEncoder()

# Classification target: integer-encoded strength labels.
# LabelEncoder assigns codes in alphabetical order: Medium=0, Strong=1, Weak=2.
y_classification = label_encoder.fit_transform(dataset['Strength_Label'])

# Regression target: Log-transformed crack time (for better distribution)
y_regression = np.log10(dataset['Crack_Time_Sec'] + 1)

print("🎯 Target Variable Encoding:")
print(f"Classification labels: {label_encoder.classes_}")
print(f"Label encoding: {dict(zip(label_encoder.classes_, range(len(label_encoder.classes_))))}")

print(f"\nClassification target distribution:")
unique, counts = np.unique(y_classification, return_counts=True)
for label, count in zip(label_encoder.classes_, counts):
    print(f"  {label}: {count} ({count/len(y_classification)*100:.1f}%)")

print(f"\nRegression target (log crack time) statistics:")
print(f"  Mean: {y_regression.mean():.3f}")
print(f"  Std: {y_regression.std():.3f}")
print(f"  Min: {y_regression.min():.3f}")
print(f"  Max: {y_regression.max():.3f}")

# Visualize target distributions
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(14, 5))

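# Classification target (codes follow label_encoder.classes_, i.e. alphabetical order)
strength_palette = {'Weak': 'red', 'Medium': 'orange', 'Strong': 'green'}
class_counts = pd.Series(y_classification).value_counts().sort_index()
class_counts.plot(kind='bar', ax=ax1,
                  color=[strength_palette[label] for label in label_encoder.classes_])
ax1.set_title('Classification Target Distribution')
ax1.set_xlabel('Strength Level')
ax1.set_xticklabels(label_encoder.classes_)
ax1.set_ylabel('Count')
ax1.tick_params(axis='x', rotation=0)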

# Regression target
ax2.hist(y_regression, bins=30, alpha=0.7, color='skyblue', edgecolor='black')
ax2.set_title('Regression Target Distribution (Log Crack Time)')
ax2.set_xlabel('Log10(Crack Time + 1)')
ax2.set_ylabel('Frequency')

plt.tight_layout()
plt.show()

# Prepare feature matrix X
X = features_df.copy()

print(f"\n📊 Final dataset shapes:")
print(f"  Features (X): {X.shape}")
print(f"  Classification target (y_classification): {y_classification.shape}")
print(f"  Regression target (y_regression): {y_regression.shape}")

# Check for any missing values
missing_features = X.isnull().sum().sum()
print(f"\n🔍 Data quality check:")
print(f"  Missing feature values: {missing_features}")
print(f"  Feature data types: {X.dtypes.value_counts().to_dict()}")

if missing_features > 0:
    print("  Filling missing values with 0...")
    X = X.fillna(0)

## 6. Split Data into Train and Test Sets

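We'll hold out 20% of the data for testing. The classification split is stratified so that each strength level keeps the same share in the training and test sets. The regression split is a plain random split, since its target is continuous.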
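
To show what stratification does, here is a standard-library sketch that splits indices class by class. It illustrates the idea only and is not how `train_test_split` is implemented. The function name `stratified_split` and the toy label counts are assumptions.

```python
import random
from collections import Counter, defaultdict

def stratified_split(labels, test_size=0.2, seed=42):
    """Return (train_idx, test_idx) so each class keeps roughly the same share in both sets."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Toy label distribution (made up for the example)
labels = ["Weak"] * 50 + ["Medium"] * 30 + ["Strong"] * 20
train_idx, test_idx = stratified_split(labels)

print("Train:", Counter(labels[i] for i in train_idx))
print("Test: ", Counter(labels[i] for i in test_idx))
```

Each class contributes 20% of its own samples to the test set, so the class proportions in the test set match those of the full dataset.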

In [None]:
# Split the data for classification and regression tasks
X_train_cls, X_test_cls, y_train_cls, y_test_cls = train_test_split(
    X, y_classification, test_size=0.2, random_state=42, stratify=y_classification
)

X_train_reg, X_test_reg, y_train_reg, y_test_reg = train_test_split(
    X, y_regression, test_size=0.2, random_state=42
)

print("📊 Dataset Split Summary:")
print(f"Classification task:")
print(f"  Training set: {X_train_cls.shape[0]} samples")
print(f"  Test set: {X_test_cls.shape[0]} samples")
print(f"  Features: {X_train_cls.shape[1]}")

print(f"\nRegression task:")
print(f"  Training set: {X_train_reg.shape[0]} samples")
print(f"  Test set: {X_test_reg.shape[0]} samples")

# Check stratification worked correctly
print(f"\n🎯 Class distribution in training set:")
train_dist = pd.Series(y_train_cls).value_counts().sort_index()
for i, (label, count) in enumerate(zip(label_encoder.classes_, train_dist)):
    percentage = count / len(y_train_cls) * 100
    print(f"  {label}: {count} ({percentage:.1f}%)")

print(f"\n🎯 Class distribution in test set:")
test_dist = pd.Series(y_test_cls).value_counts().sort_index()
for i, (label, count) in enumerate(zip(label_encoder.classes_, test_dist)):
    percentage = count / len(y_test_cls) * 100
    print(f"  {label}: {count} ({percentage:.1f}%)")

# Visualize the split
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Bars follow label_encoder.classes_ (alphabetical), so map colors and tick labels by name
strength_palette = {'Weak': 'red', 'Medium': 'orange', 'Strong': 'green'}
bar_colors = [strength_palette[label] for label in label_encoder.classes_]

for ax, dist, title in [(axes[0], train_dist, 'Training Set'), (axes[1], test_dist, 'Test Set')]:
    dist.plot(kind='bar', ax=ax, color=bar_colors, alpha=0.7)
    ax.set_title(f'{title} - Strength Distribution')
    ax.set_xlabel('Strength Level')
    ax.set_xticklabels(label_encoder.classes_)
    ax.set_ylabel('Count')
    ax.tick_params(axis='x', rotation=0)

plt.tight_layout()
plt.show()

print("✅ Data split completed successfully with proper stratification!")

## 7. Train Random Forest and XGBoost Models

Let's train both Random Forest and XGBoost models to compare their performance on password strength prediction.
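
The training cell reports a 5-fold cross-validation score as mean ± 2 standard deviations. The sketch below shows how k-fold partitioning and that summary work, using only the standard library. It is the plain, unstratified variant (for classifiers with an integer `cv`, scikit-learn uses stratified folds), and the accuracy values are made-up placeholders.

```python
import statistics

def k_fold_indices(n_samples: int, k: int = 5):
    """Yield (train_idx, val_idx) pairs for k contiguous, near-equal folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size

for fold, (train_idx, val_idx) in enumerate(k_fold_indices(11, k=5), 1):
    print(f"Fold {fold}: train={len(train_idx)} samples, validate on {val_idx}")

# Hypothetical per-fold accuracies, summarized the same way as in the notebook output
scores = [0.90, 0.92, 0.88, 0.91, 0.89]
mean_score = statistics.mean(scores)
spread = 2 * statistics.pstdev(scores)  # population std, matching NumPy's default
print(f"CV Accuracy: {mean_score:.4f} (±{spread:.4f})")
```

Every sample is used for validation exactly once, which is why the cross-validated mean is usually a steadier estimate than a single train/test split.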

In [None]:
# Initialize models
models = {
    'Random Forest': RandomForestClassifier(
        n_estimators=100,
        max_depth=10,
        min_samples_split=5,
        min_samples_leaf=2,
        random_state=42,
        n_jobs=-1
    ),
    'XGBoost': xgb.XGBClassifier(
        n_estimators=100,
        max_depth=6,
        learning_rate=0.1,
        random_state=42,
        n_jobs=-1,
        eval_metric='mlogloss'
    )
}

# Store results
model_results = {}

print("🚀 Training Models...")
print("=" * 50)

for name, model in models.items():
    print(f"\n🔄 Training {name}...")
    start_time = time.time()
    
    # Train the model
    model.fit(X_train_cls, y_train_cls)
    training_time = time.time() - start_time
    
    # Make predictions
    y_pred_train = model.predict(X_train_cls)
    y_pred_test = model.predict(X_test_cls)
    
    # Calculate accuracies
    train_accuracy = accuracy_score(y_train_cls, y_pred_train)
    test_accuracy = accuracy_score(y_test_cls, y_pred_test)
    
    # Cross-validation
    cv_scores = cross_val_score(model, X_train_cls, y_train_cls, cv=5, scoring='accuracy')
    
    # Store results
    model_results[name] = {
        'model': model,
        'train_accuracy': train_accuracy,
        'test_accuracy': test_accuracy,
        'cv_mean': cv_scores.mean(),
        'cv_std': cv_scores.std(),
        'training_time': training_time,
        'y_pred_test': y_pred_test
    }
    
    print(f"✅ {name} Training Complete!")
    print(f"   Training Accuracy: {train_accuracy:.4f}")
    print(f"   Test Accuracy: {test_accuracy:.4f}")
    print(f"   CV Accuracy: {cv_scores.mean():.4f} (±{cv_scores.std()*2:.4f})")
    print(f"   Training Time: {training_time:.2f} seconds")

# Compare model performance
print(f"\n📊 Model Comparison Summary:")
print("=" * 50)

comparison_data = []
for name, results in model_results.items():
    comparison_data.append({
        'Model': name,
        'Test Accuracy': f"{results['test_accuracy']:.4f}",
        'CV Accuracy': f"{results['cv_mean']:.4f} ± {results['cv_std']*2:.4f}",
        'Training Time': f"{results['training_time']:.2f}s"
    })

comparison_df = pd.DataFrame(comparison_data)
print(comparison_df.to_string(index=False))

# Visualize model comparison
fig, axes = plt.subplots(1, 2, figsize=(15, 6))

# Accuracy comparison
models_list = list(model_results.keys())
test_accuracies = [model_results[model]['test_accuracy'] for model in models_list]
cv_accuracies = [model_results[model]['cv_mean'] for model in models_list]

x = np.arange(len(models_list))
width = 0.35

axes[0].bar(x - width/2, test_accuracies, width, label='Test Accuracy', alpha=0.8)
axes[0].bar(x + width/2, cv_accuracies, width, label='CV Accuracy', alpha=0.8)
axes[0].set_xlabel('Models')
axes[0].set_ylabel('Accuracy')
axes[0].set_title('Model Performance Comparison')
axes[0].set_xticks(x)
axes[0].set_xticklabels(models_list, rotation=45)
axes[0].legend()
axes[0].set_ylim(0.8, 1.0)

# Training time comparison
training_times = [model_results[model]['training_time'] for model in models_list]
axes[1].bar(models_list, training_times, alpha=0.8, color='orange')
axes[1].set_xlabel('Models')
axes[1].set_ylabel('Training Time (seconds)')
axes[1].set_title('Training Time Comparison')
axes[1].tick_params(axis='x', rotation=45)

plt.tight_layout()
plt.show()

# Select best model based on test accuracy
best_model_name = max(model_results.keys(), key=lambda k: model_results[k]['test_accuracy'])
best_model = model_results[best_model_name]['model']

print(f"\n🏆 Best performing model: {best_model_name}")
print(f"   Test Accuracy: {model_results[best_model_name]['test_accuracy']:.4f}")

# Detailed classification report for best model
print(f"\n📋 Detailed Classification Report for {best_model_name}:")
print(classification_report(y_test_cls, model_results[best_model_name]['y_pred_test'], 
                          target_names=label_encoder.classes_))

## 9. Generate Adversarial Password Examples

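Adversarial examples are passwords that look strong but are actually weak, because they apply predictable transformations to a common base password. In this section we generate such variants and measure how often the current model still labels them Weak, which tells us whether adversarial training is needed.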
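
The `AdversarialPasswordGenerator` class is defined in `../src` and not shown in this notebook. To illustrate the kind of transformation involved, here is a hypothetical sketch that capitalizes, applies leetspeak substitutions, and appends common suffixes. The substitution table, the suffix list, and both function names are assumptions.

```python
import itertools

LEET_MAP = {"a": "@", "e": "3", "i": "1", "o": "0", "s": "$"}  # assumed substitutions
SUFFIXES = ["123", "2024", "!"]                                # assumed suffixes
REVERSE_LEET = {v: k for k, v in LEET_MAP.items()}

def leet(word: str) -> str:
    """Replace letters with look-alike digits and symbols."""
    return "".join(LEET_MAP.get(c, c) for c in word)

def adversarial_variants(base: str) -> list:
    """Return cosmetically 'strengthened' variants of a weak base password."""
    stems = [base.capitalize(), leet(base), leet(base).capitalize()]
    variants = [stem + suffix for stem, suffix in itertools.product(stems, SUFFIXES)]
    return list(dict.fromkeys(variants))  # drop duplicates, keep order

def normalize(password: str) -> str:
    """Heuristically undo the transformations above to expose the base word."""
    for suffix in SUFFIXES:
        if password.endswith(suffix):
            password = password[: -len(suffix)]
            break
    return "".join(REVERSE_LEET.get(c, c) for c in password).lower()

variants = adversarial_variants("password")
for v in variants:
    print(f"{v:<15} -> base: {normalize(v)}")
```

All of these variants collapse back to the same base word, which is why a robust model should still rate them as weak despite the added symbols and digits.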

In [None]:
# Initialize adversarial generator
adversarial_gen = AdversarialPasswordGenerator()

# Generate adversarial examples from weak passwords
weak_passwords = ["password", "admin", "123456", "qwerty", "welcome"]

print("🎭 Generating Adversarial Examples:")
print("=" * 50)

adversarial_examples = []
for weak_pwd in weak_passwords:
    variants = adversarial_gen.generate_multiple_adversarials(weak_pwd, n_variants=3)
    
    print(f"\nBase password: '{weak_pwd}'")
    print("Adversarial variants:")
    
    for i, variant in enumerate(variants, 1):
        print(f"  {i}. {variant}")
        adversarial_examples.append(variant)

# Test how our current model handles these adversarial examples
print(f"\n🧪 Testing Model Robustness Against Adversarial Examples:")
print("=" * 60)

adversarial_results = []
for password in adversarial_examples[:10]:  # Test first 10
    # Extract features
    features = feature_extractor.extract_features(password)
    features_df_single = pd.DataFrame([features])
    
    # Ensure same feature order as training
    features_df_single = features_df_single.reindex(columns=X.columns, fill_value=0)
    
    # Predict with best model
    prediction = best_model.predict(features_df_single)[0]
    probabilities = best_model.predict_proba(features_df_single)[0]
    predicted_strength = label_encoder.inverse_transform([prediction])[0]
    
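    # predict_proba columns follow best_model.classes_ (the encoded labels), so map each
    # probability back to its label name instead of assuming Weak/Medium/Strong order
    proba_by_label = dict(zip(label_encoder.inverse_transform(best_model.classes_), probabilities))

    adversarial_results.append({
        'Password': password,
        'Predicted_Strength': predicted_strength,
        'Confidence': max(probabilities),
        'Weak_Prob': proba_by_label['Weak'],
        'Medium_Prob': proba_by_label['Medium'],
        'Strong_Prob': proba_by_label['Strong']
    })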
    
    print(f"{password:<20} -> {predicted_strength:<8} (confidence: {max(probabilities):.3f})")

# Analyze results
adversarial_df = pd.DataFrame(adversarial_results)
print(f"\n📊 Adversarial Testing Results:")
print(f"Passwords predicted as Weak: {(adversarial_df['Predicted_Strength'] == 'Weak').sum()}")
print(f"Passwords predicted as Medium: {(adversarial_df['Predicted_Strength'] == 'Medium').sum()}")
print(f"Passwords predicted as Strong: {(adversarial_df['Predicted_Strength'] == 'Strong').sum()}")

# The model should ideally classify most of these as Weak since they're based on weak passwords
robustness_score = (adversarial_df['Predicted_Strength'] == 'Weak').mean()
print(f"Model Robustness Score: {robustness_score:.3f} (higher is better)")

if robustness_score < 0.7:
    print("⚠️  Model may be vulnerable to adversarial attacks. Consider adversarial training.")
else:
    print("✅ Model shows good robustness against adversarial examples.")

## 13. Password Strength Prediction Function

Let's create a comprehensive function that analyzes any password and provides detailed feedback.
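
The function below relies on `PasswordDataGenerator.calculate_crack_time`, which is defined in `../src` and not shown here. One common way to approximate a crack time is a brute-force upper bound: the size of the search space divided by an assumed guessing rate. The sketch below uses a made-up rate of 10 billion guesses per second and hypothetical helper names. The project's own estimate may work differently.

```python
import string

GUESSES_PER_SECOND = 1e10  # assumed attacker speed, not taken from this notebook

def charset_size(password: str) -> int:
    """Size of the smallest standard character pool that covers the password."""
    size = 0
    if any(c.islower() for c in password):
        size += 26
    if any(c.isupper() for c in password):
        size += 26
    if any(c.isdigit() for c in password):
        size += 10
    if any(c in string.punctuation for c in password):
        size += len(string.punctuation)
    return size

def brute_force_seconds(password: str) -> float:
    """Worst-case seconds to exhaust every string of this length over the pool."""
    return charset_size(password) ** len(password) / GUESSES_PER_SECOND

for pwd in ["123456", "password", "P@ssw0rd123!"]:
    print(f"{pwd:<15} pool={charset_size(pwd):<3} seconds={brute_force_seconds(pwd):.3g}")
```

This bound ignores dictionary and rule-based attacks, so it rates `P@ssw0rd123!` far higher than it deserves. That gap is what the learned model and the weakness checks in the function below are meant to close.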

In [None]:
def analyze_password_comprehensive(password, model=best_model, 
                                  feature_extractor=feature_extractor,
                                  label_encoder=label_encoder):
    """
    Comprehensive password analysis function
    """
    # Extract features
    features = feature_extractor.extract_features(password)
    features_df = pd.DataFrame([features])
    features_df = features_df.reindex(columns=X.columns, fill_value=0)
    
    # Predict strength
    prediction = model.predict(features_df)[0]
    probabilities = model.predict_proba(features_df)[0]
    predicted_strength = label_encoder.inverse_transform([prediction])[0]
    
    # Estimate crack time
    data_gen = PasswordDataGenerator()
    crack_time_seconds = data_gen.calculate_crack_time(password)
    
    # Convert crack time to human readable
    if crack_time_seconds < 60:
        crack_time_str = f"{crack_time_seconds:.3f} seconds"
    elif crack_time_seconds < 3600:
        crack_time_str = f"{crack_time_seconds/60:.1f} minutes"
    elif crack_time_seconds < 86400:
        crack_time_str = f"{crack_time_seconds/3600:.1f} hours"
    elif crack_time_seconds < 31536000:
        crack_time_str = f"{crack_time_seconds/86400:.1f} days"
    else:
        crack_time_str = f"{crack_time_seconds/31536000:.1f} years"
    
    # Analyze weaknesses
    weaknesses = []
    suggestions = []
    
    if features['length'] < 8:
        weaknesses.append("Password is too short")
        suggestions.append("Use at least 8 characters, ideally 12 or more")
    
    if features['entropy'] < 3.0:
        weaknesses.append("Low entropy (predictable patterns)")
        suggestions.append("Use more diverse and random characters")
    
    if features['has_keyboard_pattern']:
        weaknesses.append("Contains keyboard patterns (qwerty, 123456)")
        suggestions.append("Avoid sequential keyboard keys")
    
    if features['contains_dictionary_word']:
        weaknesses.append("Contains common dictionary words")
        suggestions.append("Avoid common words like 'password', 'admin'")
    
    if features['has_common_substitutions']:
        weaknesses.append("Uses predictable character substitutions")
        suggestions.append("Avoid simple substitutions like @ for a")
    
    if features['contains_year']:
        weaknesses.append("Contains years or dates")
        suggestions.append("Avoid birth years or current dates")
    
    if features['uppercase_count'] == 0:
        suggestions.append("Add uppercase letters")
    
    if features['digit_count'] == 0:
        suggestions.append("Add numbers")
    
    if features['symbol_count'] == 0:
        suggestions.append("Add special characters (!@#$%)")
    
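    # predict_proba columns follow model.classes_ (alphabetical label codes), so map by name
    proba_by_label = dict(zip(label_encoder.inverse_transform(model.classes_), probabilities))

    return {
        'password': password,
        'predicted_strength': predicted_strength,
        'confidence': max(probabilities),
        'probabilities': {
            'Weak': proba_by_label['Weak'],
            'Medium': proba_by_label['Medium'],
            'Strong': proba_by_label['Strong']
        },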
        'estimated_crack_time': crack_time_str,
        'crack_time_seconds': crack_time_seconds,
        'features': features,
        'weaknesses': weaknesses,
        'suggestions': suggestions
    }

# Test the function with various passwords
test_passwords = [
    "123456",                    # Very weak
    "password",                  # Weak dictionary word
    "P@ssw0rd123!",             # Looks strong but isn't
    "MySecurePassword2024!",     # Medium with predictable year
    "Tr0ub4dor&3",              # Classic XKCD reference
    "correct-horse-battery-staple",  # Strong passphrase
    "9Km#vN$2pL@8xR"            # Strong random
]

print("🔍 Comprehensive Password Analysis:")
print("=" * 70)

for password in test_passwords:
    result = analyze_password_comprehensive(password)
    
    # Color coding for display
    strength_colors = {
        'Weak': '🔴',
        'Medium': '🟠', 
        'Strong': '🟢'
    }
    
    print(f"\n{strength_colors[result['predicted_strength']]} Password: {password}")
    print(f"   Strength: {result['predicted_strength']} (confidence: {result['confidence']:.3f})")
    print(f"   Estimated crack time: {result['estimated_crack_time']}")
    
    if result['weaknesses']:
        print(f"   ⚠️  Weaknesses: {', '.join(result['weaknesses'][:2])}")
    
    if result['suggestions']:
        print(f"   💡 Suggestions: {', '.join(result['suggestions'][:2])}")

print(f"\n✅ Password analysis function created successfully!")

## 15. Interactive Password Testing

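Now you can test your own passwords. The cell below defines `interactive_password_test()`, which reads a password with hidden input, and `test_example_passwords()`, which runs a fixed set of examples. Only the examples run by default; call `interactive_password_test()` yourself to analyze your own passwords.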

In [None]:
# Interactive password testing
def interactive_password_test():
    """
    Interactive function to test user passwords
    """
    print("🔐 Interactive Password Strength Analyzer")
    print("=" * 50)
    print("Enter a password to analyze (input will be hidden)")
    print("Type 'quit' to exit")
    
    while True:
        try:
            # Get password securely (hidden input)
            user_password = getpass("Enter password: ")
            
            if user_password.lower() == 'quit':
                print("👋 Thanks for using the password analyzer!")
                break
            
            if not user_password:
                print("❌ Please enter a password")
                continue
            
            # Analyze the password
            result = analyze_password_comprehensive(user_password)
            
            # Display results with formatting
            strength_emojis = {
                'Weak': '🔴',
                'Medium': '🟠',
                'Strong': '🟢'
            }
            
            print(f"\n{strength_emojis[result['predicted_strength']]} Analysis Results:")
            print(f"   Strength: {result['predicted_strength']}")
            print(f"   Confidence: {result['confidence']:.1%}")
            print(f"   Estimated crack time: {result['estimated_crack_time']}")
            
            # Show probability breakdown
            print(f"\n📊 Probability Breakdown:")
            for strength, prob in result['probabilities'].items():
                print(f"   {strength}: {prob:.1%}")
            
            # Show weaknesses if any
            if result['weaknesses']:
                print(f"\n⚠️  Identified Weaknesses:")
                for weakness in result['weaknesses']:
                    print(f"   • {weakness}")
            
            # Show suggestions if any
            if result['suggestions']:
                print(f"\n💡 Improvement Suggestions:")
                for suggestion in result['suggestions']:
                    print(f"   • {suggestion}")
            
            print("\n" + "-" * 50)
            
        except KeyboardInterrupt:
            print("\n👋 Thanks for using the password analyzer!")
            break
        except Exception as e:
            print(f"❌ Error analyzing password: {e}")

# Alternative: Test with predefined examples if getpass doesn't work in notebook
def test_example_passwords():
    """
    Alternative testing function with predefined examples
    """
    examples = [
        ("123456", "Very common weak password"),
        ("P@ssw0rd123!", "Looks strong but based on 'Password'"),
        ("MyDog2024!", "Personal info + year"),
        ("7$kL9#mN2@pQ5", "Truly random strong password"),
        ("correct-horse-battery-staple", "Strong passphrase")
    ]
    
    print("🔐 Testing Example Passwords:")
    print("=" * 50)
    
    for password, description in examples:
        print(f"\n📝 {description}")
        result = analyze_password_comprehensive(password)
        
        strength_emojis = {
            'Weak': '🔴',
            'Medium': '🟠',
            'Strong': '🟢'
        }
        
        print(f"{strength_emojis[result['predicted_strength']]} Password: {password}")
        print(f"   Strength: {result['predicted_strength']} ({result['confidence']:.1%} confidence)")
        print(f"   Crack time: {result['estimated_crack_time']}")
        
        if result['weaknesses']:
            print(f"   Issues: {', '.join(result['weaknesses'][:2])}")

# Run the example testing
test_example_passwords()

print(f"\n🎉 Congratulations! You've built a complete password strength analyzer!")
print(f"Key achievements:")
print(f"✅ Feature engineering for password analysis")
print(f"✅ Machine learning model training and evaluation") 
print(f"✅ Adversarial robustness testing")
print(f"✅ Comprehensive password analysis system")
print(f"\nNext steps:")
print(f"• Deploy as a web application using Streamlit")
print(f"• Integrate into existing applications via API")
print(f"• Collect real password data for improved training")
print(f"• Implement additional adversarial training techniques")