## Internship Task/Assignment 7

## Objective
The primary objectives of this task are to:
•	Make the machine learning model interpretable and understandable, enabling stakeholders to trust and validate its predictions.
•	Deploy the trained model as a service, so that predictions can be accessed through an API, either locally or on a cloud platform (e.g., AWS, GCP).


## Model Explainability Report

```python
# Import necessary libraries
import numpy as np  # used by np.argsort in the SHAP cell below
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
import shap
import matplotlib.pyplot as plt

# Load dataset
df = pd.read_csv('dataset_phishing.csv')
```



```python
# Features and target
X = df.drop(['url', 'status'], axis=1)  # Drop url (not useful as feature) and target from X
y = df['status']

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model
model = RandomForestClassifier(random_state=42, n_estimators=100)
model.fit(X_train, y_train)
```





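```python
# SHAP explainability
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)

# Select the SHAP values for the second class in model.classes_
# (with labels 'legitimate'/'phishing', sorted alphabetically, this is 'phishing').
# Older SHAP releases return a list with one array per class; newer releases
# return a single array of shape (n_samples, n_features, n_classes).
class_idx = 1
sv = shap_values[class_idx] if isinstance(shap_values, list) else shap_values[:, :, class_idx]

# SHAP summary plot (bar): global importance as mean |SHAP value| per feature
shap.summary_plot(sv, X_test, plot_type="bar")

# SHAP summary plot (detailed): per-sample impact and direction
shap.summary_plot(sv, X_test)

# SHAP dependence plot for the top-ranked feature
top_feature = X_test.columns[np.argsort(np.abs(sv).mean(0))[-1]]
shap.dependence_plot(top_feature, sv, X_test)
```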
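To make the plotted numbers concrete, the sketch below computes exact SHAP values by hand for a linear model. This is a deliberately simpler stand-in for the random forest (for trees, `TreeExplainer` does the work). Assuming independent features, a linear model f(x) = b + Σ w_i x_i has SHAP values φ_i = w_i (x_i − E[x_i]). These values add up to f(x) − E[f(x)], and averaging |φ_i| over samples gives the same kind of global ranking the bar plot shows. The data, weights, and helper names are made up for illustration.

```python
import numpy as np

def linear_shap(weights, X_background, X_explain):
    """Exact SHAP values for f(x) = b + w.x, assuming independent features."""
    baseline = X_background.mean(axis=0)      # E[x_i] over the background data
    return (X_explain - baseline) * weights   # phi[n, i] = w_i * (x_i - E[x_i])

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))    # hypothetical data with 3 features
w = np.array([2.0, -0.5, 0.0])   # hypothetical model weights
b = 1.0

def predict(data):
    """Hypothetical linear model f(x) = b + w.x."""
    return b + data @ w

phi = linear_shap(w, X, X)

# Local accuracy: baseline prediction + sum of SHAP values = model output
expected_value = predict(X).mean()
print(np.allclose(expected_value + phi.sum(axis=1), predict(X)))  # True

# Global importance, as in the bar plot: mean |SHAP value| per feature
importance = np.abs(phi).mean(axis=0)
print(np.argsort(importance)[::-1])  # [0 1 2]
```

The feature with zero weight gets zero importance, and the ranking follows |w_i| scaled by each feature's spread. The random forest's SHAP values obey the same additivity, but their values come from the tree structure rather than from a closed-form formula.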

## Model Deployment Package

## Local Deployment (Flask API Example)
First, train the model and save it to `phishing_model.pkl` so that the Flask app (`app.py`) can load it.


```python
# Example: Training a simple model (RandomForestClassifier)
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
import pickle

# Load your dataset
df = pd.read_csv("dataset_phishing.csv")

# Separate features and target
X = df.drop(columns=["status", "url"], errors='ignore')  # remove 'status' and 'url' columns
y = df["status"]

# Train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train model
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Save model
with open("phishing_model.pkl", "wb") as f:
    pickle.dump(model, f)

print("Model saved as phishing_model.pkl")
```


Model saved as phishing_model.pkl
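`phishing_model.pkl` stores only the estimator, but an API client needs the training column order, and a pickle is only guaranteed to load with the scikit-learn version that wrote it. One option is to pickle a small bundle that holds the model, its column names, and the scikit-learn version, then reload it and check that predictions match. The sketch below does this on synthetic data. The bundle layout and the `phishing_bundle.pkl` file name are assumptions, not part of the original workflow.

```python
import pickle

import numpy as np
import pandas as pd
import sklearn
from sklearn.ensemble import RandomForestClassifier

# Hypothetical stand-in for the phishing features (two numeric columns)
rng = np.random.default_rng(42)
X = pd.DataFrame({"length_url": rng.integers(10, 200, 100),
                  "nb_dots": rng.integers(0, 8, 100)})
y = np.where(X["length_url"] > 100, "phishing", "legitimate")

model = RandomForestClassifier(n_estimators=20, random_state=42).fit(X, y)

# Bundle the model with its column order and the scikit-learn version used to train it
bundle = {
    "model": model,
    "feature_names": list(X.columns),
    "sklearn_version": sklearn.__version__,
}
with open("phishing_bundle.pkl", "wb") as f:  # hypothetical file name
    pickle.dump(bundle, f)

# Reload and sanity-check the round trip
with open("phishing_bundle.pkl", "rb") as f:
    loaded = pickle.load(f)

if loaded["sklearn_version"] != sklearn.__version__:
    print("Warning: model was trained with a different scikit-learn version")

same = (loaded["model"].predict(X[loaded["feature_names"]]) == model.predict(X)).all()
print(same)  # True
```

A model fitted on a DataFrame also exposes the column order as `model.feature_names_in_`. Storing the names explicitly keeps that information available to code that never imports the notebook.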


## Create an app.py file with this code

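```python
from flask import Flask, request, jsonify
import pickle
import numpy as np

# Load the trained model
with open("phishing_model.pkl", "rb") as f:
    model = pickle.load(f)

# Initialize Flask app
app = Flask(__name__)

@app.route('/')
def home():
    return "Phishing URL Detection API is running!"

@app.route('/predict', methods=['POST'])
def predict():
    # Get JSON data from request
    data = request.get_json(force=True)
    if not isinstance(data, dict):
        return jsonify({"error": "Expected a JSON object"}), 400

    # Expect {"features": [value1, value2, ..., valueN]} in the training column order
    features = data.get('features')

    if features is None:
        return jsonify({"error": "No features provided"}), 400
    if len(features) != model.n_features_in_:
        return jsonify({"error": f"Expected {model.n_features_in_} features, got {len(features)}"}), 400

    # Convert to a numeric array with a single row
    features_array = np.array(features, dtype=float).reshape(1, -1)

    # Make prediction; labels are strings such as "phishing" or "legitimate",
    # so return them as text instead of casting to int
    prediction = model.predict(features_array)[0]

    return jsonify({"prediction": str(prediction)})

if __name__ == '__main__':
    # debug=True is intended for local development only
    app.run(debug=True)
```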



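Running this cell inside Jupyter starts the server and then stops with `SystemExit: 1`:

```
 * Serving Flask app '__main__'
 * Debug mode: on
 * Running on http://127.0.0.1:5000
Press CTRL+C to quit
 * Restarting with watchdog (windowsapi)
SystemExit: 1
```

This happens because `debug=True` enables Flask's auto-reloader, which tries to re-launch the current process and cannot do so from a notebook kernel. Save the code as `app.py` and run `python app.py` from a terminal instead. If the app must run inside a notebook, use `app.run(debug=True, use_reloader=False)`.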
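A client must send the feature values in exactly the order the model was trained on. The sketch below assumes the client already has the list of training column names (for example, `list(model.feature_names_in_)` saved from the notebook). It builds the `{"features": [...]}` body that `/predict` expects and posts it with the standard-library `urllib`. The helper names `build_payload` and `send_prediction_request`, the sample feature names, and the default URL are illustrative.

```python
import json
import urllib.request

def build_payload(feature_values: dict, feature_names: list) -> bytes:
    """Order feature values by training column order and encode the /predict body."""
    missing = [name for name in feature_names if name not in feature_values]
    if missing:
        raise ValueError(f"missing features: {missing}")
    ordered = [feature_values[name] for name in feature_names]
    return json.dumps({"features": ordered}).encode("utf-8")

def send_prediction_request(payload: bytes, url: str = "http://127.0.0.1:5000/predict") -> dict:
    """POST the payload to the running Flask API and return the decoded JSON response."""
    req = urllib.request.Request(url, data=payload,
                                 headers={"Content-Type": "application/json"},
                                 method="POST")
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Illustrative names; the real model uses every column of X except 'url' and 'status'
names = ["length_url", "nb_dots", "domain_age"]
payload = build_payload({"nb_dots": 3, "domain_age": 120, "length_url": 54}, names)
print(payload)  # b'{"features": [54, 3, 120]}'
# With the API running, send_prediction_request(payload) returns its JSON response.
```

Keying the input by feature name means a client cannot silently swap two columns. The helper fails early when a feature is missing instead of sending a short vector.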


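```python
from flask import Flask, request
import pickle
import pandas as pd

# Load model
with open("phishing_model.pkl", "rb") as f:
    model = pickle.load(f)

# Initialize Flask
app = Flask(__name__)

# Home page: a minimal HTML form for uploading a CSV of feature rows
@app.route("/")
def home():
    return '''
    <h1>Phishing Detection App</h1>
    <form action="/predict" method="post" enctype="multipart/form-data">
        Upload CSV file: <input type="file" name="file"><br><br>
        <input type="submit" value="Predict">
    </form>
    '''

# Predict route
@app.route("/predict", methods=["POST"])
def predict():
    file = request.files.get('file')
    if file is None or file.filename == '':
        return "No file uploaded", 400

    df = pd.read_csv(file)
    # Drop non-feature columns if present and order columns as in training
    df = df.drop(columns=["url", "status"], errors="ignore")
    missing = set(model.feature_names_in_) - set(df.columns)
    if missing:
        return f"Missing feature columns: {sorted(missing)}", 400
    df = df[model.feature_names_in_]

    preds = model.predict(df)
    return f"Predictions: {preds.tolist()}"

if __name__ == "__main__":
    app.run(debug=True)
```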
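The upload form only works if the CSV carries the same feature columns used in training. Below is a minimal, framework-free sketch of that check. `check_upload_columns` is a hypothetical helper and the column names are illustrative. It ignores the non-feature `url` and `status` columns and reports anything missing or unexpected, which the app could show to the user instead of failing.

```python
def check_upload_columns(uploaded, expected, ignore=("url", "status")):
    """Compare uploaded CSV header columns with the model's training columns.

    Returns (missing, unexpected) as sorted lists; both empty means the file is usable.
    """
    uploaded_set = set(uploaded) - set(ignore)
    expected_set = set(expected)
    missing = sorted(expected_set - uploaded_set)
    unexpected = sorted(uploaded_set - expected_set)
    return missing, unexpected

expected = ["length_url", "nb_dots", "domain_age"]  # illustrative training columns
print(check_upload_columns(["url", "nb_dots", "length_url", "domain_age"], expected))
# ([], [])
print(check_upload_columns(["length_url", "extra_col"], expected))
# (['domain_age', 'nb_dots'], ['extra_col'])
```

To produce a test file from the notebook, `X_test.head(5).to_csv("sample_input.csv", index=False)` writes rows that already have the training columns. The file name here is only an example.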



## Deployment Documentation for Phishing Detection Model

### Architecture
User → Flask API → Model (Random Forest Classifier) → Response → User

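### Setup Instructions
#### Environment Setup
1. Install Python 3.9+ (matching the `python:3.9-slim` Docker image used below).
2. Set up a virtual environment (recommended):
```bash
python -m venv venv
source venv/bin/activate  # Linux/Mac
venv\Scripts\activate     # Windows
```
3. Install dependencies (matplotlib is needed for the SHAP plots):
```bash
pip install flask pandas scikit-learn shap lime matplotlib
```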
#### Directory Structure
```
project_root/
├── app.py                   # Flask application
├── phishing_model.pkl       # Trained model file
├── templates/
│   └── index.html           # (Optional) UI template
├── static/                  # (Optional) CSS/JS
└── requirements.txt         # Dependency list
```
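Pinning exact versions in `requirements.txt` helps reproducibility. It matters most for the pickled model, because scikit-learn does not guarantee that a pickle loads under a different version. The sketch below uses the standard-library `importlib.metadata` to write pinned `name==version` lines for the packages this project uses, and skips any that are not installed. The helper name `pinned_requirements` is hypothetical. `pip freeze > requirements.txt` is the usual alternative, but it pins every package in the environment.

```python
from importlib.metadata import PackageNotFoundError, version

def pinned_requirements(packages):
    """Return 'name==version' lines for each installed package, skipping missing ones."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            print(f"skipping {name}: not installed")
    return lines

# Packages used by the notebook and the Flask app
packages = ["flask", "pandas", "scikit-learn", "shap", "lime", "matplotlib"]
lines = pinned_requirements(packages)

# Write the pinned list next to app.py
with open("requirements.txt", "w") as f:
    f.write("\n".join(lines) + "\n")
print("\n".join(lines))
```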

#### Run the Application Locally
```bash
python app.py
```

Open http://127.0.0.1:5000 in a browser or API client.
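### Cloud Deployment (optional)
#### AWS EC2 / GCP Compute Engine / Azure VM
1. Provision a server.
2. SSH into the server and clone your project.
3. Set up the environment and run the Flask app.
4. For production, serve the app with gunicorn behind nginx rather than Flask's built-in development server.

#### Platform-as-a-Service (e.g., Heroku / App Engine)
Add the platform's configuration files (a `Procfile` and `runtime.txt` for Heroku, an `app.yaml` for App Engine) and deploy via the platform's CLI.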
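For the gunicorn + nginx setup, gunicorn reads its settings from a Python file. By default it looks for `gunicorn.conf.py` in the working directory. Below is a minimal sketch. It assumes the Flask object is named `app` inside `app.py` and that nginx proxies to port 5000 on the same host. The worker count follows the (2 x CPU cores) + 1 rule of thumb from the gunicorn documentation and should be tuned for the actual server.

```python
# gunicorn.conf.py: start the server with `gunicorn app:app`
import multiprocessing

# Listen on localhost only; nginx is assumed to proxy public traffic to this port
bind = "127.0.0.1:5000"

# Rule-of-thumb worker count from the gunicorn docs: (2 x CPU cores) + 1
workers = multiprocessing.cpu_count() * 2 + 1

# Each worker process loads its own copy of the pickled model;
# allow slow requests (e.g. large CSV uploads) up to 60 seconds
timeout = 60

# Send access logs to stdout and error logs to stderr for the process manager to collect
accesslog = "-"
errorlog = "-"
```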





## Reproducibility and Portability
### Packaged Deployment Script
A deployment script (`deploy.sh`) automates the following steps:
* Sets up a Python virtual environment.
* Installs required dependencies (`requirements.txt`).
* Launches the Flask application (`app.py`).

#### Example `deploy.sh`
```bash
#!/bin/bash
set -e  # stop at the first failing command
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
python app.py
```

### Docker Containerization
A `Dockerfile` containerizes the application so it can be replicated consistently across different environments.

#### Example `Dockerfile`

```Dockerfile
FROM python:3.9-slim

WORKDIR /app

COPY requirements.txt requirements.txt
RUN pip install --no-cache-dir -r requirements.txt

COPY . .
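EXPOSE 5000
# app.run() listens on 127.0.0.1 by default, which Docker's port mapping cannot reach,
# so start the app on 0.0.0.0 via the Flask CLI (the --app option requires Flask 2.2+)
CMD ["flask", "--app", "app", "run", "--host", "0.0.0.0", "--port", "5000"]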
```

### Usage Instructions

```bash
docker build -t phishing-detector .
docker run -p 5000:5000 phishing-detector
```
* Ensure Docker is installed and running.
* Adjust port mapping as needed.
* For cloud deployment, use container orchestration services (e.g., AWS ECS, GCP Cloud Run).
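Containers and orchestration services usually need a health check. The home route (`/`) returns a fixed message, so it can serve as one. The stdlib-only sketch below, saved as a hypothetical `healthcheck.py`, reports healthy only when the app answers with HTTP 200. `main()` returns 0 for healthy and 1 for unhealthy, which is the exit-code convention Docker's `HEALTHCHECK` instruction expects. The URL and timeout are assumptions.

```python
import urllib.request

def is_healthy(url: str = "http://127.0.0.1:5000/", timeout: float = 3.0) -> bool:
    """Return True if a GET request to url answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError, refused connections and timeouts are all OSError subclasses
        return False

def main() -> int:
    """Exit code for Docker HEALTHCHECK: 0 = healthy, 1 = unhealthy."""
    return 0 if is_healthy() else 1
```

Because `COPY . .` places the file in `/app`, a Dockerfile line such as `HEALTHCHECK CMD python -c "import sys, healthcheck; sys.exit(healthcheck.main())"` could use it. The slim Python base image does not include `curl`, so a Python check avoids installing extra tools.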


## Overall Conclusion
The project successfully achieved its objective of making the phishing detection model interpretable, deployable, and reproducible. Below is a summary of the key outcomes from all five tasks:

1) Model Explainability Report
Applied SHAP to visualize and interpret feature influences on model predictions.
Identified important features like length_url, nb_dots, and domain_age that significantly impact phishing detection.
Confirmed the model’s decisions are logical and align with domain knowledge.

2) Model Deployment Package
Developed a Flask API with a JSON prediction endpoint, plus a simple web app for batch predictions from an uploaded CSV.
Provided setup documentation for local deployment.
Explained deployment options for AWS/GCP.

3) User Interface / Experience
Built a basic web interface for uploading a CSV of URL features for prediction.
Documented app usage with sample inputs and expected outputs.

4) Deployment Documentation
Described architecture: user → API → model → response.
Included detailed setup instructions and environment configuration.
Proposed monitoring and maintenance strategies.

5) Reproducibility and Portability
Provided a deployment script (deploy.sh) for quick setup.
Designed a Dockerfile for containerized deployment to ensure portability.
Shared clear build/run instructions for Docker.

Final Remark:
The phishing detection system is production-ready, combining interpretability, ease of deployment, user-friendliness, and portability. The solution supports scaling and future improvements.