# 🛡️ Phishing URL Detection using Machine Learning

This notebook trains a simple **machine learning model** that detects **phishing URLs** from precomputed lexical features.

Model used:
- Decision Tree Classifier

Steps:
1. Load dataset
2. Split into train and test
3. Train model
4. Evaluate performance
5. Visualize results (Confusion Matrix & ROC Curve)

Dataset source: `dataset.csv`
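
The notebook does not show how the lexical features in `dataset.csv` were produced, so the sketch below is only an illustration of what "lexical features" can mean. The function `lexical_features` and the five features it computes are hypothetical and are not claimed to match the dataset's columns.

```python
import re
from urllib.parse import urlparse


def lexical_features(url):
    """Return a few integer features computed from the URL string alone.

    Hypothetical feature set for illustration; not the columns of dataset.csv.
    """
    host = urlparse(url).hostname or ""
    return [
        len(url),                         # total URL length
        url.count("."),                   # number of dots
        int("@" in url),                  # 1 if an '@' symbol is present
        int(url.startswith("https://")),  # 1 if the scheme is HTTPS
        int(bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host))),  # 1 if host looks like an IPv4 address
    ]


print(lexical_features("http://192.168.0.1/login@secure"))
```

Each URL becomes one row of integers, which is the shape of input the classifier in this notebook expects.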


In [None]:
import numpy as np
import pandas as pd
from sklearn import tree
from sklearn.model_selection import train_test_split
from sklearn.metrics import (
    accuracy_score,
    precision_score,
    recall_score,
    f1_score,
    ConfusionMatrixDisplay,
    RocCurveDisplay
)
import matplotlib.pyplot as plt


In [None]:
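# Load the CSV as 32-bit integers (assumes numeric columns and no header row)
data = np.genfromtxt("dataset.csv", delimiter=",", dtype=np.int32)

print("Dataset shape:", data.shape)

# Separate features (all columns except the last) from labels (last column)
X = data[:, :-1]
y = data[:, -1]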
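
Before splitting, it can help to check how many samples each class has. The sketch below uses a small made-up label array, `y_demo`, with a hypothetical `-1`/`1` encoding. The actual label values in `dataset.csv` are not stated in this notebook.

```python
import numpy as np

# Made-up labels for illustration only (not taken from dataset.csv)
y_demo = np.array([1, -1, 1, 1, -1, 1])

# Count how many samples fall into each class
labels, counts = np.unique(y_demo, return_counts=True)

for label, count in zip(labels, counts):
    print(f"label {label}: {count} samples")
```

Running the same two lines on `y` would show whether the dataset is balanced, which affects how accuracy should be read.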


In [None]:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print("Train set:", X_train.shape)
print("Test set:", X_test.shape)
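
The `stratify=y` argument keeps the class proportions roughly the same in the train and test sets. A minimal sketch on synthetic, imbalanced labels (`X_demo` and `y_demo` are made up for illustration) shows the effect:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data for illustration only: 30 positives, 70 negatives
X_demo = np.arange(200).reshape(100, 2)
y_demo = np.array([1] * 30 + [0] * 70)

_, _, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# With stratification, both shares should match the overall 30% positive rate
print("Positive share (train):", y_tr.mean())
print("Positive share (test): ", y_te.mean())
```

Without `stratify`, a small test set could end up with noticeably more or fewer phishing samples than the training set.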


In [None]:
model = tree.DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

print("Model training complete.")
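
A `DecisionTreeClassifier` with default settings keeps splitting until its leaves are pure, so it can be useful to look at how large the fitted tree is. The sketch below does this on synthetic data from `make_classification`; `demo_tree` is a stand-in for `model`, and the numbers it prints say nothing about `dataset.csv`.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic data for illustration only
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=42)

demo_tree = DecisionTreeClassifier(random_state=42).fit(X_demo, y_demo)

# Size of the fitted tree: a very deep tree may be memorizing the training data
depth = demo_tree.get_depth()
n_leaves = demo_tree.get_n_leaves()

print("Tree depth      :", depth)
print("Number of leaves:", n_leaves)
```

The same two calls, `model.get_depth()` and `model.get_n_leaves()`, work on the tree trained above.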


In [None]:
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
# precision, recall and F1 treat label 1 as the positive class by default (pos_label=1)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

print(f"Accuracy  : {accuracy:.4f}")
print(f"Precision : {precision:.4f}")
print(f"Recall    : {recall:.4f}")
print(f"F1 Score  : {f1:.4f}")
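
The four metrics above all derive from the confusion-matrix counts. The worked example below uses hypothetical counts chosen only to keep the arithmetic easy; they are not results from this notebook.

```python
# Hypothetical confusion-matrix counts (not results from dataset.csv)
tp, fp, fn, tn = 90, 10, 5, 95

demo_accuracy = (tp + tn) / (tp + fp + fn + tn)  # share of all predictions that are correct
demo_precision = tp / (tp + fp)                  # of URLs flagged positive, how many really are
demo_recall = tp / (tp + fn)                     # of truly positive URLs, how many were flagged
demo_f1 = 2 * demo_precision * demo_recall / (demo_precision + demo_recall)  # harmonic mean

print(f"Accuracy  : {demo_accuracy:.4f}")
print(f"Precision : {demo_precision:.4f}")
print(f"Recall    : {demo_recall:.4f}")
print(f"F1 Score  : {demo_f1:.4f}")
```

If phishing is the positive class, recall is the share of phishing URLs the model catches, while precision is the share of its phishing alerts that are correct.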


In [None]:
ConfusionMatrixDisplay.from_estimator(model, X_test, y_test)
plt.title("Confusion Matrix - Phishing Detection")
plt.show()


In [None]:
RocCurveDisplay.from_estimator(model, X_test, y_test)
plt.title("ROC Curve - Phishing Detection")
plt.show()
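
The ROC curve can be summarized as a single number, the area under the curve (AUC). The sketch below computes it with `roc_auc_score` on synthetic data; `demo_model` is a stand-in for `model`, and the value printed is not a result for `dataset.csv`.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data for illustration only
X_demo, y_demo = make_classification(n_samples=500, n_features=10, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

demo_model = DecisionTreeClassifier(random_state=42).fit(X_tr, y_tr)

# Probability of the positive class (column 1) is the score used for the ROC curve
scores = demo_model.predict_proba(X_te)[:, 1]
auc = roc_auc_score(y_te, scores)

print(f"ROC AUC: {auc:.4f}")
```

An unpruned decision tree usually outputs probabilities of exactly 0 or 1, so its ROC curve tends to have a single corner point rather than a smooth shape.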


## ✅ Conclusion

The Decision Tree model was trained to classify **phishing vs legitimate URLs** and evaluated on a held-out 20% test split.

Metrics evaluated:
- Accuracy
- Precision
- Recall
- F1 Score

Visualizations:
- Confusion Matrix
- ROC Curve

This notebook demonstrates:
- a basic machine learning workflow (load, split, train, evaluate, visualize)
- the application of data science to a cybersecurity problem
