<a href="https://colab.research.google.com/github/Shadabur-Rahaman/30-days-ml-projects/blob/main/Day_30_End2End-ML-Pipeline/notebook/30_end_to_end_ml_pipeline.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# End-to-End ML Pipeline: Iris Classification
## Day 30 of 30: Machine Learning Projects

**Project Goal**: Implement a complete ML pipeline from data collection to deployment

**Pipeline Stages**:
1. Data Collection & Preprocessing
2. Model Training & Evaluation
3. Model Packaging
4. API Development
5. Containerization
6. Deployment
7. Monitoring

**Technologies Used**:
- Scikit-learn for ML
- FastAPI for REST API
- Docker for containerization
- MLflow for experiment tracking
- Prometheus and Grafana for monitoring

## 1. Data Collection & Preprocessing
Collect and prepare the Iris dataset for modeling

In [5]:
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load dataset
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['target'] = iris.target
df['species'] = df['target'].apply(lambda x: iris.target_names[x])

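# Save raw data (create the output directory first so to_csv does not fail)
import os
os.makedirs('data', exist_ok=True)
df.to_csv('data/iris_raw.csv', index=False)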

# Split data
X = df[iris.feature_names]
y = df['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Save processed data
X_train.to_csv('data/X_train.csv', index=False)
X_test.to_csv('data/X_test.csv', index=False)
y_train.to_csv('data/y_train.csv', index=False)
y_test.to_csv('data/y_test.csv', index=False)

print(f"Dataset size: {len(df)} samples")
print(f"Training samples: {len(X_train)}")
print(f"Testing samples: {len(X_test)}")
df.head()

Dataset size: 150 samples
Training samples: 120
Testing samples: 30


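| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | target | species |
|---|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0 | setosa |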
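
The split above samples the test set at random, so the three species need not be equally represented in it. A minimal sketch of a stratified variant follows, assuming the same 80/20 ratio and seed; the variable names (`X_tr`, `test_counts`, and so on) are illustrative and not part of the notebook.

```python
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()

# Same 80/20 split as above, but stratified on the label so that each
# species contributes the same share to the training and test sets.
X_tr, X_te, y_tr, y_te = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42, stratify=iris.target
)

# Count how many test samples each class received
test_counts = Counter(y_te.tolist())
print(test_counts)
```

With 50 flowers per species and a 30-sample test set, stratification should yield 10 test samples per species.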


## 2. Model Training & Evaluation
Train multiple models and track experiments with MLflow

In [8]:
!pip install mlflow scikit-learn -q

import mlflow
import mlflow.sklearn
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt

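# Set up MLflow
mlflow.set_tracking_uri('file:./mlruns')
mlflow.set_experiment('Iris-Classification')

# Make sure the output directories used below exist
import os
os.makedirs('monitoring', exist_ok=True)
os.makedirs('models', exist_ok=True)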

# Load data
X_train = pd.read_csv('data/X_train.csv')
X_test = pd.read_csv('data/X_test.csv')
y_train = pd.read_csv('data/y_train.csv').squeeze()
y_test = pd.read_csv('data/y_test.csv').squeeze()

# Define models to evaluate
models = {
    'Logistic Regression': LogisticRegression(max_iter=200),
    'SVM': SVC(probability=True),
    'Random Forest': RandomForestClassifier()
}

best_model = None
best_accuracy = 0

for model_name, model in models.items():
    with mlflow.start_run(run_name=model_name):
        # Train model
        model.fit(X_train, y_train)

        # Evaluate
        y_pred = model.predict(X_test)
        accuracy = accuracy_score(y_test, y_pred)
        f1 = f1_score(y_test, y_pred, average='weighted')

        # Log parameters and metrics
        mlflow.log_param('model', model_name)
        mlflow.log_metric('accuracy', accuracy)
        mlflow.log_metric('f1_score', f1)

        # Log model
        mlflow.sklearn.log_model(model, 'model')

        # Save confusion matrix
        cm = confusion_matrix(y_test, y_pred)
        plt.figure(figsize=(8, 6))
        sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
                   xticklabels=iris.target_names,
                   yticklabels=iris.target_names)
        plt.title(f'Confusion Matrix - {model_name}')
        plt.xlabel('Predicted')
        plt.ylabel('Actual')
        plt.savefig(f'monitoring/cm_{model_name.lower().replace(" ", "_")}.png')
        mlflow.log_artifact(f'monitoring/cm_{model_name.lower().replace(" ", "_")}.png')
        plt.close()

        # Track best model
        if accuracy > best_accuracy:
            best_accuracy = accuracy
            best_model = model

        print(f"{model_name} - Accuracy: {accuracy:.4f}, F1: {f1:.4f}")

# Save best model
import joblib
joblib.dump(best_model, 'models/iris_classifier.joblib')
print(f"\nBest model saved: {type(best_model).__name__} with accuracy {best_accuracy:.4f}")



Logistic Regression - Accuracy: 1.0000, F1: 1.0000
SVM - Accuracy: 1.0000, F1: 1.0000
Random Forest - Accuracy: 1.0000, F1: 1.0000

Best model saved: LogisticRegression with accuracy 1.0000
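
All three models score 1.0000 on the 30-sample test set, so the comparison is a tie and the first model in the dictionary is kept. A rough sketch of a more discriminating comparison uses k-fold cross-validation. The choice of 5 folds and the use of Logistic Regression alone are assumptions for illustration, not part of the original experiment.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# 5-fold cross-validation: every sample is used for testing exactly once,
# so the estimate rests on all 150 flowers rather than one 30-sample split.
scores = cross_val_score(LogisticRegression(max_iter=200), X, y, cv=5)

print(f"Fold accuracies: {scores.round(3)}")
print(f"Mean accuracy: {scores.mean():.3f} (+/- {scores.std():.3f})")
```

The same call could be repeated for each entry in `models`, logging the mean score to MLflow instead of the single-split accuracy.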


## 3. Model Packaging
Package the model with preprocessing steps

In [10]:
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Create preprocessing pipeline
preprocessor = StandardScaler()

# Create full pipeline
final_model = Pipeline([
    ('scaler', preprocessor),
    ('classifier', best_model)
])

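# Retrain on the full dataset (train + test). Fitting the pipeline refits
# best_model in place, this time on standardized features.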
X_full = pd.concat([X_train, X_test])
y_full = pd.concat([y_train, y_test])
final_model.fit(X_full, y_full)

# Save final model
joblib.dump(final_model, 'models/iris_pipeline.joblib')

# Test prediction (keep the sample as a one-row DataFrame so feature names match)
sample = X_test.iloc[[0]]
pred = final_model.predict(sample)
prob = final_model.predict_proba(sample)
print(f"Sample prediction: {iris.target_names[pred[0]]}")
print(f"Probabilities: {dict(zip(iris.target_names, prob[0]))}")

Sample prediction: versicolor
Probabilities: {np.str_('setosa'): np.float64(0.009627502570247233), np.str_('versicolor'): np.float64(0.8995825425389817), np.str_('virginica'): np.float64(0.0907899548907711)}
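
Before shipping the saved file, it can help to confirm that the pipeline survives a save/load round trip. The sketch below is self-contained and uses a stand-in pipeline (scaler plus Logistic Regression fitted on the full dataset) and a temporary directory. Both are assumptions for illustration rather than the notebook's `final_model` and `models/` path.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Stand-in for final_model: scaler + classifier fitted on the full dataset
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("classifier", LogisticRegression(max_iter=200)),
]).fit(X, y)

# Save to a temporary file, load it back, and compare predictions
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "iris_pipeline.joblib")
    joblib.dump(pipeline, path)
    restored = joblib.load(path)

round_trip_ok = np.array_equal(pipeline.predict(X), restored.predict(X))
print(f"Round trip preserves predictions: {round_trip_ok}")
```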




## 4. API Development with FastAPI
Create a REST API for model inference

In [12]:
%%writefile api/main.py
from fastapi import FastAPI
from pydantic import BaseModel
import joblib
import pandas as pd
import numpy as np
import os

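# Load model. The notebook saves it to <project root>/models/, one level above
# api/. The Docker build context is api/, so for the container the file has to
# be copied to api/models/ first; both locations are checked here.
base_dir = os.path.dirname(os.path.abspath(__file__))
candidate_paths = [
    os.path.join(base_dir, 'models', 'iris_pipeline.joblib'),
    os.path.join(base_dir, '..', 'models', 'iris_pipeline.joblib'),
]
model_path = next((p for p in candidate_paths if os.path.exists(p)), candidate_paths[0])
model = joblib.load(model_path)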

# Define class names
class_names = ['setosa', 'versicolor', 'virginica']

# Create FastAPI app
app = FastAPI(title="Iris Classification API")

# Define request model
class IrisFeatures(BaseModel):
    sepal_length: float
    sepal_width: float
    petal_length: float
    petal_width: float

# Define response model
class PredictionResult(BaseModel):
    species: str
    confidence: float
    probabilities: dict

@app.get('/')
def health_check():
    return {"status": "healthy"}

@app.post('/predict', response_model=PredictionResult)
def predict(features: IrisFeatures):
    """Make prediction for Iris flower"""
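    # Build a one-row DataFrame whose columns match the names the pipeline was
    # trained on (the Iris dataset's own feature names), in the same order.
    feature_names = [
        'sepal length (cm)', 'sepal width (cm)',
        'petal length (cm)', 'petal width (cm)'
    ]
    input_data = pd.DataFrame(
        [[features.sepal_length, features.sepal_width,
          features.petal_length, features.petal_width]],
        columns=feature_names
    )

    # Make prediction
    prediction = model.predict(input_data)
    probabilities = model.predict_proba(input_data)[0]

    # Get confidence (cast to a plain float for JSON serialization)
    confidence = float(np.max(probabilities))

    # Format probabilities
    prob_dict = {class_names[i]: float(prob) for i, prob in enumerate(probabilities)}

    return {
        "species": class_names[int(prediction[0])],
        "confidence": confidence,
        "probabilities": prob_dict
    }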

@app.get('/model_info')
def model_info():
    """Get model information"""
    return {
        "model_type": type(model.named_steps['classifier']).__name__,
        "features": model.named_steps['classifier'].feature_importances_.tolist() if hasattr(model.named_steps['classifier'], 'feature_importances_') else "Not available"
    }

Writing api/main.py
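
The request model above only checks that each field is a float, so negative or non-finite measurements would still reach the classifier. Below is a minimal sketch of an extra sanity check in plain Python. The helper `validate_measurements` and its rule (positive and finite) are hypothetical and not part of the API.

```python
import math
from typing import List

# Field names of the IrisFeatures request model, in model input order
FIELDS = ("sepal_length", "sepal_width", "petal_length", "petal_width")


def validate_measurements(payload: dict) -> List[float]:
    """Return the four measurements in model order, or raise ValueError."""
    values = []
    for name in FIELDS:
        if name not in payload:
            raise ValueError(f"missing field: {name}")
        value = float(payload[name])
        # Physical lengths must be positive and finite (no NaN or inf)
        if not math.isfinite(value) or value <= 0:
            raise ValueError(f"{name} must be a positive, finite number")
        values.append(value)
    return values


print(validate_measurements(
    {"sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4, "petal_width": 0.2}
))
```

Inside a FastAPI handler, a `ValueError` like this would typically be translated into an HTTP 4xx response rather than left to surface as a server error.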


## 5. Containerization with Docker
Package the API in a Docker container

In [13]:
%%writefile api/Dockerfile
FROM python:3.9-slim

# Set working directory
WORKDIR /app

# Copy requirements
COPY requirements.txt .

# Install dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy application files
COPY . .

# Expose port
EXPOSE 8000

# Command to run the application
CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]

Writing api/Dockerfile


In [14]:
%%writefile api/requirements.txt
fastapi
uvicorn
scikit-learn
pandas
numpy
pydantic
prometheus-client  # used by the /metrics endpoint added in section 7

Writing api/requirements.txt


## 6. Deployment
Run the API locally with uvicorn and smoke-test its endpoints (the Docker image itself is built in the CI workflow in section 8)

In [17]:
# Install required packages
!pip install fastapi uvicorn python-multipart -q

import subprocess
import time
import requests
import os

# Start API in background
api_process = subprocess.Popen(
    ["uvicorn", "api.main:app", "--host", "0.0.0.0", "--port", "8000"],
    cwd=os.getcwd()
)

# Wait for API to start
time.sleep(3)

# Test API
def test_api():
    try:
        # Health check
        health_response = requests.get("http://localhost:8000/")
        print(f"Health Check Status: {health_response.status_code}")
        print(f"Response: {health_response.json()}")

        # Test prediction
        sample_data = {
            "sepal_length": 5.1,
            "sepal_width": 3.5,
            "petal_length": 1.4,
            "petal_width": 0.2
        }
        pred_response = requests.post("http://localhost:8000/predict", json=sample_data)
        print(f"\nPrediction Status: {pred_response.status_code}")
        print("Prediction Response:")
        print(pred_response.json())

        # Model info
        model_info = requests.get("http://localhost:8000/model_info")
        print(f"\nModel Info Status: {model_info.status_code}")
        print("Model Info:")
        print(model_info.json())

        return True
    except requests.exceptions.ConnectionError:
        print("API not running. Starting API...")
        return False

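# Test API - retry once if the first attempt could not connect
if not test_api():
    # Stop the first process (it may have crashed or still be starting) before relaunching
    api_process.terminate()
    api_process.wait()
    api_process = subprocess.Popen(
        ["uvicorn", "api.main:app", "--host", "0.0.0.0", "--port", "8000"],
        cwd=os.getcwd()
    )
    time.sleep(3)
    test_api()

# Stop API after testing
api_process.terminate()
api_process.wait()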

API not running. Starting API...
API not running. Starting API...
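
Both attempts above failed to connect. A fixed `time.sleep(3)` cannot distinguish a server that is still starting from one that crashed on import, so polling the health endpoint until it answers is usually more robust. The sketch below uses only the standard library. The helper names `http_probe` and `wait_until_ready` and the default timeouts are assumptions for illustration.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str, timeout: float = 1.0) -> bool:
    """Return True if a GET to `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_ready(probe: Callable[[], bool], timeout: float = 15.0,
                     interval: float = 0.5) -> bool:
    """Call `probe` repeatedly until it returns True or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

After launching uvicorn, `wait_until_ready(lambda: http_probe("http://localhost:8000/"))` could replace the fixed sleep. If it returns `False`, checking `api_process.poll()` shows whether the server process has already exited.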


## 7. Monitoring & Logging
Set up basic monitoring with Prometheus and Grafana

In [18]:
%%writefile api/main.py
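# ... previous API code ...
# NOTE: this cell overwrites api/main.py, so the placeholder above must be replaced
# with the full API code from section 4 (imports, model loading, `app`, and endpoints).

# Add logging
import logging
import time

from fastapi import Request  # needed for the middleware type annotations below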

logging.basicConfig(filename='api_logs.log', level=logging.INFO,
                    format='%(asctime)s - %(levelname)s - %(message)s')

@app.middleware("http")
async def log_requests(request: Request, call_next):
    start_time = time.time()
    response = await call_next(request)
    process_time = (time.time() - start_time) * 1000
    formatted_time = f"{process_time:.2f}ms"

    logging.info(
        f"{request.method} {request.url} - "
        f"Status: {response.status_code} - "
        f"Time: {formatted_time}"
    )
    return response

# Add metrics endpoint
from prometheus_client import make_asgi_app, Counter, Histogram

# Create metrics
REQUEST_COUNT = Counter(
    'api_request_count',
    'API Request Count',
    ['method', 'endpoint', 'status']
)

REQUEST_LATENCY = Histogram(
    'api_request_latency_seconds',
    'API Request Latency',
    ['method', 'endpoint']
)

# Add Prometheus metrics route
metrics_app = make_asgi_app()
app.mount("/metrics", metrics_app)

@app.middleware("http")
async def monitor_requests(request: Request, call_next):
    start_time = time.time()
    response = await call_next(request)
    process_time = time.time() - start_time

    REQUEST_COUNT.labels(
        method=request.method,
        endpoint=request.url.path,
        status=response.status_code
    ).inc()

    REQUEST_LATENCY.labels(
        method=request.method,
        endpoint=request.url.path
    ).observe(process_time)

    return response

Overwriting api/main.py
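
The logging middleware above writes one line per request to `api_logs.log` in the form `<timestamp> - <level> - <method> <url> - Status: <code> - Time: <n>ms`. The sketch below summarizes such lines with the standard library. The helper `summarize_log` is hypothetical and the sample lines are made up to match that format.

```python
import re
from statistics import mean

# Matches the message part produced by the log_requests middleware
LOG_PATTERN = re.compile(
    r"- (?P<method>[A-Z]+) (?P<url>\S+) - Status: (?P<status>\d{3}) "
    r"- Time: (?P<ms>\d+(?:\.\d+)?)ms$"
)


def summarize_log(lines):
    """Return request count, server-error count (status >= 500), and mean latency in ms."""
    latencies, errors = [], 0
    for line in lines:
        match = LOG_PATTERN.search(line.strip())
        if match is None:
            continue  # not a request line
        latencies.append(float(match.group("ms")))
        if int(match.group("status")) >= 500:
            errors += 1
    return {
        "requests": len(latencies),
        "errors": errors,
        "mean_ms": mean(latencies) if latencies else 0.0,
    }


# Made-up sample lines in the format configured above
sample = [
    "2025-01-01 12:00:00,001 - INFO - GET http://localhost:8000/ - Status: 200 - Time: 1.50ms",
    "2025-01-01 12:00:01,002 - INFO - POST http://localhost:8000/predict - Status: 200 - Time: 4.50ms",
    "2025-01-01 12:00:02,003 - INFO - POST http://localhost:8000/predict - Status: 500 - Time: 3.00ms",
]
print(summarize_log(sample))
```

In practice the function could be fed `open('api_logs.log')` directly, since it only iterates over lines.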


In [19]:
%%writefile monitoring/prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'iris-api'
    static_configs:
      - targets: ['host.docker.internal:8000']  # For Mac/Windows
        # Use 'docker.for.mac.localhost' for older Docker Mac versions
        # Use 'docker.for.win.localhost' for Windows

Writing monitoring/prometheus.yml


In [20]:
# Start monitoring stack
!docker-compose -f monitoring/docker-compose.yml up -d

print("Monitoring services started:")
print("- Prometheus: http://localhost:9090")
print("- Grafana: http://localhost:3000 (admin/admin)")
print("\nConfigure Grafana dashboard with Prometheus as data source")

/bin/bash: line 1: docker-compose: command not found
Monitoring services started:
- Prometheus: http://localhost:9090
- Grafana: http://localhost:3000 (admin/admin)

Configure Grafana dashboard with Prometheus as data source


## 8. CI/CD Pipeline
Example GitHub Actions workflow for automated testing and deployment

In [24]:
%%writefile github/workflows/ml-pipeline.yml
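# NOTE: GitHub Actions only runs workflows stored under .github/workflows/
# (with a leading dot), so move this file there before pushing.
name: ML Pipeline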

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v3

    - name: Set up Python
      uses: actions/setup-python@v4
      with:
        python-version: '3.9'

    - name: Install dependencies
      run: |
        python -m pip install --upgrade pip
        pip install -r api/requirements.txt
        pip install pytest httpx  # httpx is required by fastapi.testclient

    - name: Run tests
      run: |
        pytest tests/

  build-and-deploy:
    needs: test
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    steps:
    - uses: actions/checkout@v3

    - name: Build Docker image
      run: docker build -t iris-api api/

    - name: Deploy to AWS ECS
      uses: aws-actions/amazon-ecs-deploy-task-definition@v1
      with:
        task-definition: task-definition.json
        service: iris-service
        cluster: iris-cluster
        wait-for-service-stability: true

Writing github/workflows/ml-pipeline.yml


## 9. Testing
Create unit tests for the API

In [25]:
%%writefile tests/test_api.py
import pytest
from fastapi.testclient import TestClient
from api.main import app

client = TestClient(app)

def test_health_check():
    response = client.get("/")
    assert response.status_code == 200
    assert response.json() == {"status": "healthy"}

def test_predict():
    sample = {
        "sepal_length": 5.1,
        "sepal_width": 3.5,
        "petal_length": 1.4,
        "petal_width": 0.2
    }
    response = client.post("/predict", json=sample)
    assert response.status_code == 200
    data = response.json()
    assert "species" in data
    assert "confidence" in data
    assert "probabilities" in data
    assert data["species"] in ["setosa", "versicolor", "virginica"]

def test_model_info():
    response = client.get("/model_info")
    assert response.status_code == 200
    data = response.json()
    assert "model_type" in data
    assert "features" in data

Writing tests/test_api.py
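
`test_predict` above only checks that the expected keys are present. A stricter check could verify that the payload is internally consistent. The helper below is a hypothetical sketch that assumes the classifier's `predict` agrees with the arg-max of `predict_proba`. That holds for Logistic Regression and Random Forest but is not guaranteed for `SVC(probability=True)`.

```python
import math

CLASS_NAMES = ("setosa", "versicolor", "virginica")


def check_prediction_payload(data: dict) -> None:
    """Raise AssertionError if a /predict response is internally inconsistent."""
    probabilities = data["probabilities"]
    assert set(probabilities) == set(CLASS_NAMES), "unexpected class names"
    assert all(0.0 <= p <= 1.0 for p in probabilities.values()), "probability out of range"
    assert math.isclose(sum(probabilities.values()), 1.0, abs_tol=1e-6), \
        "probabilities must sum to 1"
    # The reported species should be the most probable class, and the
    # confidence should be that class's probability.
    top_class = max(probabilities, key=probabilities.get)
    assert data["species"] == top_class, "species is not the most probable class"
    assert math.isclose(data["confidence"], probabilities[top_class], abs_tol=1e-6), \
        "confidence does not match the top probability"
```

Inside `test_predict`, calling `check_prediction_payload(response.json())` after the status-code assertion would apply these checks to the real response.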


In [26]:
from fastapi import FastAPI, Request, HTTPException
from pydantic import BaseModel
import joblib
import pandas as pd
import numpy as np
import logging
import time
import matplotlib.pyplot as plt
from io import BytesIO
import base64
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from prometheus_client import make_asgi_app, Counter, Histogram
import seaborn as sns

# Load data and train model
iris = load_iris()
model = Pipeline([
    ('scaler', StandardScaler()),
    ('classifier', RandomForestClassifier(n_estimators=100))
])
model.fit(iris.data, iris.target)

# Save and load model
joblib.dump(model, 'iris_model.joblib')
model = joblib.load('iris_model.joblib')

# Create FastAPI app
app = FastAPI(title="Iris Classification API")

# Data for visualizations
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['species'] = iris.target_names[iris.target]

# Visualization Functions
def create_feature_plot():
    """Create feature distribution plot"""
    plt.figure(figsize=(10, 6))
    sns.boxplot(data=df.melt(id_vars='species'), x='variable', y='value', hue='species')
    plt.title('Feature Distributions by Species')
    plt.ylabel('Measurement (cm)')
    plt.xlabel('Feature')
    plt.legend(title='Species')
    plt.tight_layout()

    # Save to buffer
    buf = BytesIO()
    plt.savefig(buf, format='png')
    plt.close()
    return base64.b64encode(buf.getvalue()).decode('utf-8')

def create_decision_boundary_plot():
    """Create PCA decision boundary plot"""
    from sklearn.decomposition import PCA
    from mlxtend.plotting import plot_decision_regions

    # Reduce to 2D with PCA
    pca = PCA(n_components=2)
    X_pca = pca.fit_transform(iris.data)

    # Train a simple classifier for visualization
    from sklearn.linear_model import LogisticRegression
    clf = LogisticRegression()
    clf.fit(X_pca, iris.target)

    # Create plot
    plt.figure(figsize=(10, 8))
    plot_decision_regions(X_pca, iris.target, clf=clf, legend=2)
    plt.xlabel('Principal Component 1')
    plt.ylabel('Principal Component 2')
    plt.title('Decision Boundaries (PCA Reduced)')

    # Save to buffer
    buf = BytesIO()
    plt.savefig(buf, format='png')
    plt.close()
    return base64.b64encode(buf.getvalue()).decode('utf-8')

# API Endpoints with Visualizations
@app.get('/feature_plot')
def get_feature_plot():
    """Endpoint to get feature distribution plot"""
    try:
        plot_data = create_feature_plot()
        return {"plot": plot_data}
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

@app.get('/decision_plot')
def get_decision_plot():
    """Endpoint to get decision boundary plot"""
    try:
        plot_data = create_decision_boundary_plot()
        return {"plot": plot_data}
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

# ... (rest of the API code from previous implementation) ...
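
The two plot endpoints return the PNG as a base64 string under the `"plot"` key, so a client has to decode it before displaying or saving it. A small client-side sketch follows. The helper `decode_plot` is hypothetical, and the fake response stands in for a real call to `/feature_plot`.

```python
import base64

# Every PNG file starts with this 8-byte signature
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_plot(payload: dict) -> bytes:
    """Decode the base64 string under "plot" and check that it is a PNG."""
    image_bytes = base64.b64decode(payload["plot"])
    if not image_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("decoded data is not a PNG image")
    return image_bytes


# Stand-in for a server response: the PNG signature plus a few filler bytes
fake_response = {"plot": base64.b64encode(PNG_SIGNATURE + b"demo").decode("utf-8")}
png = decode_plot(fake_response)
print(f"Decoded {len(png)} bytes")
```

The returned bytes can be written to a `.png` file in binary mode or wrapped in `io.BytesIO` for an image library.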

In [39]:
# frontend.py (updated)
import gradio as gr
import requests
import matplotlib.pyplot as plt
from io import BytesIO
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris

# Configuration
API_URL = "http://127.0.0.1:8000/predict"  # Changed from localhost
TIMEOUT = 5  # seconds
iris = load_iris()

def safe_api_call(url, json_data=None):
    """Generic API call with error handling"""
    try:
        if json_data:
            response = requests.post(url, json=json_data, timeout=TIMEOUT)
        else:
            response = requests.get(url, timeout=TIMEOUT)

        if response.status_code == 200:
            return response.json()
        return {"error": f"API returned {response.status_code}"}

    except requests.exceptions.ConnectionError:
        return {"error": f"Could not connect to API at {url}. Is the server running?"}
    except requests.exceptions.Timeout:
        return {"error": "API request timed out"}
    except Exception as e:
        return {"error": f"Unexpected error: {str(e)}"}

def classify_flower(sepal_length, sepal_width, petal_length, petal_width):
    """Enhanced classification function with visualization"""
    data = {
        "sepal_length": float(sepal_length),
        "sepal_width": float(sepal_width),
        "petal_length": float(petal_length),
        "petal_width": float(petal_width)
    }

    # API call
    result = safe_api_call(API_URL, data)

    if "error" in result:
        return result["error"], "", "", None, None

    # Visualization functions
    def create_prob_plot(probs):
        plt.figure(figsize=(8, 4))
        plt.bar(probs.keys(), probs.values(), color=['#ff9999','#66b3ff','#99ff99'])
        plt.title("Class Probabilities")
        plt.ylim(0, 1)
        buf = BytesIO()
        plt.savefig(buf, format='png', bbox_inches='tight')
        plt.close()
        return buf

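    def create_feature_plot(input_features, species_avg):
        features = list(input_features.keys())
        values = list(input_features.values())
        # species_avg uses the dataset's column names ('sepal length (cm)', ...),
        # so translate the API field names ('sepal_length', ...) before indexing.
        columns = [f"{name.replace('_', ' ')} (cm)" for name in features]
        avg_values = species_avg.loc[species_avg['species'] == result['species'], columns].values[0]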

        plt.figure(figsize=(10, 5))
        x = range(len(features))
        plt.bar(x, values, width=0.4, label='Your Input')
        plt.bar([i + 0.4 for i in x], avg_values, width=0.4, label='Species Average')
        plt.xticks([i + 0.2 for i in x], features)
        plt.legend()
        plt.title("Feature Comparison")
        buf = BytesIO()
        plt.savefig(buf, format='png', bbox_inches='tight')
        plt.close()
        return buf

    # Prepare species averages
    df = pd.DataFrame(iris.data, columns=iris.feature_names)
    df['species'] = iris.target_names[iris.target]
    species_avg = df.groupby('species').mean().reset_index()

    return (
        f"Predicted: {result['species'].upper()}",
        f"Confidence: {result['confidence']:.1%}",
        "\n".join([f"{k}: {v:.1%}" for k, v in result['probabilities'].items()]),
        create_prob_plot(result['probabilities']),
        create_feature_plot(data, species_avg)
    )


In [46]:
import gradio as gr
import requests
import os

# Detect environment
IN_COLAB = 'COLAB_GPU' in os.environ

# Configure API URL
if IN_COLAB:
    from google.colab import output
    API_PORT = 8000
    output.serve_kernel_port_as_iframe(API_PORT)
    API_URL = f"http://localhost:{API_PORT}/predict"  # uvicorn serves plain HTTP here
else:
    API_URL = "http://127.0.0.1:8000/predict"

def classify_flower(sepal_length, sepal_width, petal_length, petal_width):
    try:
        response = requests.post(API_URL, json={
            "sepal_length": sepal_length,
            "sepal_width": sepal_width,
            "petal_length": petal_length,
            "petal_width": petal_width
        }, timeout=5)

        if response.status_code == 200:
            result = response.json()
            return f"Predicted: {result['species']}", f"Confidence: {result['confidence']:.0%}"
        return "API Error", f"Status code: {response.status_code}"

    except requests.exceptions.ConnectionError:
        if IN_COLAB:
            return "Connection Error", "Ensure you ran the API cell first"
        return "Connection Error", "Start API with: uvicorn api.main:app --port 8000"
    except Exception as e:
        return "Error", str(e)

with gr.Blocks() as demo:
    gr.Markdown("# Iris Classifier")
    with gr.Row():
        with gr.Column():
            sl = gr.Slider(4, 8, value=5.1, label="Sepal Length")
            sw = gr.Slider(2, 5, value=3.5, label="Sepal Width")
            pl = gr.Slider(1, 7, value=1.4, label="Petal Length")
            pw = gr.Slider(0.1, 3, value=0.2, label="Petal Width")
            btn = gr.Button("Classify")
        with gr.Column():
            species = gr.Textbox(label="Prediction")
            confidence = gr.Textbox(label="Confidence")

    btn.click(classify_flower, [sl, sw, pl, pw], [species, confidence])

if __name__ == "__main__":
    if IN_COLAB:
        demo.launch(share=True, server_port=7860)
    else:
        demo.launch(server_port=7860)

OSError: Cannot find empty port in range: 7860-7860. You can specify a different port by setting the GRADIO_SERVER_PORT environment variable or passing the `server_port` parameter to `launch()`.
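
The error above occurs because port 7860 is still held by an earlier `launch()` in the same session while the cell pins `server_port=7860`. One workaround is to ask the operating system for an unused port and pass that instead. The helper `find_free_port` below is an illustrative sketch, not part of Gradio.

```python
import socket


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for a currently unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]


port = find_free_port()
print(f"Free port: {port}")
# demo.launch(server_port=port)  # `demo` is the gr.Blocks object defined above
```

There is a small race window between releasing the port and Gradio binding it. Omitting `server_port` so that Gradio chooses a port, or calling `demo.close()` on the previous interface first, are likely simpler alternatives.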

In [59]:
# Install only Gradio - no need for FastAPI or ngrok
!pip install gradio scikit-learn > /dev/null

import gradio as gr
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Load iris dataset and train a model
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42)
model = RandomForestClassifier(n_estimators=100)
model.fit(X_train, y_train)

# Create classification function
def classify_flower(sepal_length, sepal_width, petal_length, petal_width):
    # Create input array
    input_data = np.array([[sepal_length, sepal_width, petal_length, petal_width]])

    # Make prediction
    prediction = model.predict(input_data)[0]
    probabilities = model.predict_proba(input_data)[0]

    # Get species name
    species = iris.target_names[prediction]

    # Format results
    confidence = probabilities[prediction]
    prob_text = "\n".join([f"{iris.target_names[i]}: {prob:.1%}" for i, prob in enumerate(probabilities)])

    # Create visualization (named `fig` to avoid shadowing the usual `plt` alias)
    fig = create_visualization(probabilities, species, [sepal_length, sepal_width, petal_length, petal_width])

    return species, f"{confidence:.1%}", prob_text, fig

# Create visualization
def create_visualization(probabilities, species, features):
    import matplotlib.pyplot as plt

    # Create figure: a regular axes for the bar chart and a polar axes for the radar chart
    fig = plt.figure(figsize=(12, 5))
    ax1 = fig.add_subplot(121)
    ax2 = fig.add_subplot(122, polar=True)

    # Probability plot
    colors = ['#FF9999', '#66B3FF', '#99FF99']
    bars = ax1.bar(iris.target_names, probabilities, color=colors)
    ax1.set_title('Class Probabilities', fontweight='bold')
    ax1.set_ylabel('Probability')
    ax1.set_ylim(0, 1)
    ax1.grid(axis='y', alpha=0.3)

    # Add value labels
    for bar in bars:
        height = bar.get_height()
        ax1.text(bar.get_x() + bar.get_width()/2., height,
                f'{height:.1%}', ha='center', va='bottom')

    # Feature comparison radar plot
    categories = ['Sepal Length', 'Sepal Width', 'Petal Length', 'Petal Width']
    N = len(categories)

    # Compute angles
    angles = [n / float(N) * 2 * np.pi for n in range(N)]
    angles += angles[:1]

    # Species averages
    species_avg = []
    for i in range(len(iris.target_names)):
        species_data = iris.data[iris.target == i]
        species_avg.append(np.mean(species_data, axis=0))

    # Plot setup
    ax2 = plt.subplot(122, polar=True)
    ax2.set_theta_offset(np.pi/2)
    ax2.set_theta_direction(-1)

    # Plot species averages
    for i, species_name in enumerate(iris.target_names):
        values = species_avg[i].tolist()
        values += values[:1]
        ax2.plot(angles, values, linewidth=1, linestyle='solid', label=species_name)
        ax2.fill(angles, values, alpha=0.1)

    # Plot input features
    input_values = features + [features[0]]
    ax2.plot(angles, input_values, color='red', linewidth=2, linestyle='solid', label='Your Input')
    ax2.scatter(angles, input_values, color='red', s=50)

    # Add labels
    plt.xticks(angles[:-1], categories)
    plt.title('Feature Comparison', fontweight='bold')
    plt.legend(loc='upper right', bbox_to_anchor=(1.3, 1.1))

    plt.tight_layout()
    return fig

# Create Gradio interface
with gr.Blocks(title="Iris Flower Classifier", theme=gr.themes.Soft()) as demo:
    gr.Markdown("# 🌸 Iris Flower Classifier")
    gr.Markdown("Enter measurements to classify iris flowers")

    with gr.Row():
        with gr.Column():
            sl = gr.Slider(4.0, 8.0, value=5.1, step=0.1, label="Sepal Length (cm)")
            sw = gr.Slider(2.0, 4.5, value=3.5, step=0.1, label="Sepal Width (cm)")
            pl = gr.Slider(1.0, 7.0, value=1.4, step=0.1, label="Petal Length (cm)")
            pw = gr.Slider(0.1, 2.5, value=0.2, step=0.1, label="Petal Width (cm)")
            btn = gr.Button("Classify", variant="primary")

            gr.Examples(
                examples=[
                    [5.1, 3.5, 1.4, 0.2],  # Setosa
                    [6.0, 3.0, 4.0, 1.2],  # Versicolor
                    [7.0, 3.2, 6.0, 2.0]   # Virginica
                ],
                inputs=[sl, sw, pl, pw],
                label="Example Measurements"
            )

        with gr.Column():
            species = gr.Textbox(label="Predicted Species")
            confidence = gr.Textbox(label="Confidence")
            probabilities = gr.Textbox(label="Class Probabilities", lines=4)
            plot = gr.Plot(label="Visualization")

    btn.click(
        classify_flower,
        inputs=[sl, sw, pl, pw],
        outputs=[species, confidence, probabilities, plot]
    )

# Launch the interface
demo.launch(share=True)

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://567efc98b8eb496883.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)
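
The radar chart in the cell above relies on one detail that is easy to miss: both the angle list and each value list repeat their first element at the end, so the plotted outline closes. A standalone sketch of that step follows, with hypothetical helper names.

```python
import math


def close_polygon(values):
    """Repeat the first value at the end so the plotted outline closes."""
    values = list(values)
    return values + values[:1]


def radar_angles(n_axes: int):
    """Evenly spaced angles (radians) for n_axes spokes, closed back to the start."""
    angles = [i / n_axes * 2 * math.pi for i in range(n_axes)]
    return close_polygon(angles)


angles = radar_angles(4)                       # four features, five points
values = close_polygon([5.1, 3.5, 1.4, 0.2])   # one flower's measurements
print(len(angles), len(values))
```

Both lists must have the same length, otherwise `ax.plot(angles, values)` raises a shape-mismatch error.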




In [42]:
import gradio as gr
import requests
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from io import BytesIO
import seaborn as sns
from sklearn.datasets import load_iris

# Configuration
API_URL = "http://127.0.0.1:8000/predict"  # Using 127.0.0.1 instead of localhost for better reliability
TIMEOUT = 5  # seconds for API requests
iris = load_iris()

def safe_api_call(url, json_data=None):
    """Enhanced API call with comprehensive error handling"""
    try:
        if json_data:
            response = requests.post(url, json=json_data, timeout=TIMEOUT)
        else:
            response = requests.get(url, timeout=TIMEOUT)

        response.raise_for_status()  # Raises HTTPError for bad responses
        return response.json()

    except requests.exceptions.RequestException as e:
        error_msg = f"API Error: {str(e)}"
        if isinstance(e, requests.exceptions.ConnectionError):
            error_msg = f"Could not connect to API at {url}. Please ensure the server is running."
        elif isinstance(e, requests.exceptions.Timeout):
            error_msg = "API request timed out. The server may be overloaded."
        return {"error": error_msg}

def create_probability_plot(probabilities):
    """Create styled probability bar chart"""
    plt.figure(figsize=(8, 5))
    colors = ['#ff9999', '#66b3ff', '#99ff99']
    bars = plt.bar(probabilities.keys(), probabilities.values(), color=colors)

    # Add value labels
    for bar in bars:
        height = bar.get_height()
        plt.text(bar.get_x() + bar.get_width()/2., height,
                f'{height:.1%}',
                ha='center', va='bottom')

    plt.title('Class Probabilities', fontweight='bold')
    plt.ylabel('Probability')
    plt.ylim(0, 1.1)
    plt.grid(axis='y', alpha=0.3)
    plt.tight_layout()

    buf = BytesIO()
    plt.savefig(buf, format='png', dpi=100, bbox_inches='tight')
    plt.close()
    return buf

def create_feature_comparison(input_features, predicted_species):
    """Create radar chart comparing input to species averages"""
    # Prepare data
    df = pd.DataFrame(iris.data, columns=iris.feature_names)
    df['species'] = iris.target_names[iris.target]
    species_avg = df.groupby('species').mean().reset_index()

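    categories = list(input_features.keys())
    # The dataset columns are named like 'sepal length (cm)', while the API fields
    # use 'sepal_length', so translate the names before indexing species_avg.
    columns = [f"{name.replace('_', ' ')} (cm)" for name in categories]
    N = len(categories)
    angles = [n / float(N) * 2 * np.pi for n in range(N)]
    angles += angles[:1]

    # Create plot
    plt.figure(figsize=(8, 8))
    ax = plt.subplot(111, polar=True)

    # Plot species averages
    for species in species_avg['species']:
        values = species_avg[species_avg['species'] == species][columns].values[0].tolist()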
        values += values[:1]
        ax.plot(angles, values, linewidth=1, linestyle='solid', label=species)
        ax.fill(angles, values, alpha=0.1)

    # Plot input features
    input_values = [input_features[k] for k in categories] + [input_features[categories[0]]]
    ax.plot(angles, input_values, color='red', linewidth=3, linestyle='solid', label='Your Input')
    ax.scatter(angles, input_values, color='red', s=100, zorder=10)

    # Customize plot
    plt.xticks(angles[:-1], categories)
    plt.title(f'Feature Comparison (Predicted: {predicted_species})', fontweight='bold')
    plt.legend(loc='upper right', bbox_to_anchor=(1.3, 1.1))
    plt.tight_layout()

    buf = BytesIO()
    plt.savefig(buf, format='png', dpi=100, bbox_inches='tight')
    plt.close()
    return buf

def classify_flower(sepal_length, sepal_width, petal_length, petal_width):
    """Main classification function with visualization"""
    data = {
        "sepal_length": float(sepal_length),
        "sepal_width": float(sepal_width),
        "petal_length": float(petal_length),
        "petal_width": float(petal_width)
    }

    # API call with error handling
    result = safe_api_call(API_URL, data)

    if "error" in result:
        return result["error"], "", "", None, None

    # Format results
    probs = "\n".join([f"{k}: {v:.1%}" for k, v in result['probabilities'].items()])

    return (
        f"Predicted: {result['species'].upper()}",
        f"Confidence: {result['confidence']:.1%}",
        probs,
        create_probability_plot(result['probabilities']),
        create_feature_comparison(data, result['species'])
    )

# Main Gradio Interface
with gr.Blocks(title="Iris Classifier", theme=gr.themes.Soft()) as app:
    gr.Markdown("# 🌸 Iris Flower Classification Dashboard")
    gr.Markdown("Analyze iris measurements and classify species with visual insights")

    with gr.Tab("Classifier"):
        with gr.Row():
            with gr.Column():
                gr.Markdown("### Input Features")
                sl = gr.Slider(4.0, 8.0, value=5.1, label="Sepal Length (cm)", step=0.1)
                sw = gr.Slider(2.0, 4.5, value=3.5, label="Sepal Width (cm)", step=0.1)
                pl = gr.Slider(1.0, 7.0, value=1.4, label="Petal Length (cm)", step=0.1)
                pw = gr.Slider(0.1, 2.5, value=0.2, label="Petal Width (cm)", step=0.1)
                btn = gr.Button("Classify", variant="primary")

                gr.Markdown("### Example Measurements")
                gr.Examples(
                    examples=[
                        [5.1, 3.5, 1.4, 0.2],  # Setosa
                        [6.0, 3.0, 4.0, 1.2],  # Versicolor
                        [7.0, 3.2, 6.0, 2.0]   # Virginica
                    ],
                    inputs=[sl, sw, pl, pw],
                    label="Try these examples"
                )

            with gr.Column():
                gr.Markdown("### Prediction Results")
                species = gr.Textbox(label="Species Prediction", interactive=False)
                confidence = gr.Textbox(label="Confidence Score", interactive=False)
                probabilities = gr.Textbox(label="Class Probabilities", lines=4, interactive=False)

                gr.Markdown("### Probability Visualization")
                prob_plot = gr.Plot(label="Class Probabilities")

                gr.Markdown("### Feature Comparison")
                feature_plot = gr.Plot(label="Feature Radar Chart")

        btn.click(
            fn=classify_flower,
            inputs=[sl, sw, pl, pw],
            outputs=[species, confidence, probabilities, prob_plot, feature_plot]
        )

    with gr.Tab("Data Insights"):
        gr.Markdown("## Iris Dataset Visualizations")

        with gr.Row():
            with gr.Column():
                gr.Markdown("### Feature Distributions")
                feature_dist_plot = gr.Plot()

                gr.Markdown("### Decision Boundaries")
                decision_plot = gr.Plot()

        # Load visualizations when the app page loads
        def load_insights():
            # Create feature distribution plot (long format: one row per measurement)
            df = pd.DataFrame(iris.data, columns=iris.feature_names)
            df['species'] = iris.target_names[iris.target]
            dist_fig = plt.figure(figsize=(10, 6))
            sns.boxplot(data=df.melt(id_vars='species'),
                        x='variable', y='value', hue='species')
            plt.title("Feature Distributions by Species")
            plt.ylabel("Measurement (cm)")
            plt.xlabel("Feature")
            plt.legend(title="Species")
            plt.tight_layout()
            plt.close(dist_fig)

            # Create PCA projection plot (a simplified stand-in for decision boundaries)
            from sklearn.decomposition import PCA
            decision_fig = plt.figure(figsize=(10, 8))
            pca = PCA(n_components=2)
            X_pca = pca.fit_transform(iris.data)
            plt.scatter(X_pca[:, 0], X_pca[:, 1], c=iris.target, cmap='viridis')
            plt.title("PCA Projection of Iris Data")
            plt.xlabel("Principal Component 1")
            plt.ylabel("Principal Component 2")
            plt.colorbar(label="Species")
            plt.close(decision_fig)

            # gr.Plot expects Matplotlib Figure objects rather than PNG buffers
            return dist_fig, decision_fig

        app.load(load_insights, inputs=None, outputs=[feature_dist_plot, decision_plot])

    with gr.Tab("Model Info"):
        gr.Markdown("## Model Information")

        with gr.Row():
            with gr.Column():
                gr.Markdown("### Feature Importances")

                # Create feature importance plot
                def create_feature_importance():
                    features = iris.feature_names
                    # Placeholder values for illustration only; swap in the trained
                    # model's own importance scores to show real numbers
                    importances = [0.1, 0.3, 0.4, 0.2]

                    fig = plt.figure(figsize=(10, 5))
                    plt.barh(features, importances, color='skyblue')
                    plt.title("Feature Importances")
                    plt.xlabel("Importance Score")
                    plt.tight_layout()
                    plt.close(fig)
                    return fig  # gr.Plot expects a Figure, not a PNG buffer

                # Pass the figure at construction time instead of assigning .value later
                feature_importance = gr.Plot(value=create_feature_importance())

            with gr.Column():
                gr.Markdown("### Model Metadata")
                model_type = gr.Textbox(label="Model Type", value="Random Forest", interactive=False)
                num_classes = gr.Textbox(label="Number of Classes", value=str(len(iris.target_names)), interactive=False)
                num_features = gr.Textbox(label="Number of Features", value=str(len(iris.feature_names)), interactive=False)
                # 80% of the dataset was used for training (test_size=0.2 in Section 1)
                training_size = gr.Textbox(label="Training Samples", value=str(int(len(iris.data) * 0.8)), interactive=False)

if __name__ == "__main__":
    # Verify API connection before launching
    test_response = safe_api_call(API_URL.replace('/predict', ''))
    if "error" in test_response:
        print(f"⚠️ {test_response['error']}")
        print("\nPlease start the FastAPI server first with:")
        print("uvicorn api.main:app --reload --port 8000")
        print("\nThen run this Gradio interface in a separate terminal.")
    else:
        print("API connection successful! Launching Gradio interface...")
        app.launch(
            server_port=7860,
            server_name="0.0.0.0",
            show_error=True,
            share=False  # Set to True if you want a public link (for Colab)
        )
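# Colab only: expose the kernel ports in the browser and launch with a public link
from google.colab import output
output.serve_kernel_port_as_window(8000)  # For API
output.serve_kernel_port_as_window(7860)  # For Gradio
app.launch(share=True)  # share=True creates the temporary public gradio.live URL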

⚠️ Could not connect to API at http://127.0.0.1:8000. Please ensure the server is running.

Please start the FastAPI server first with:
uvicorn api.main:app --reload --port 8000

Then run this Gradio interface in a separate terminal.
Try `serve_kernel_port_as_iframe` instead.

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://a235f901fa69d9fd39.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)




In [32]:
import os

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from mlxtend.plotting import plot_decision_regions
from sklearn.linear_model import LogisticRegression

# Load iris dataset
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['species'] = iris.target_names[iris.target]

# Make sure the output directory exists before any figure is saved
os.makedirs('outputs', exist_ok=True)

# 1. Feature Distribution Plot
plt.figure(figsize=(12, 8))
sns.boxplot(data=df.melt(id_vars='species'), x='variable', y='value', hue='species')
plt.title('Feature Distributions by Species')
plt.ylabel('Measurement (cm)')
plt.xlabel('Feature')
plt.legend(title='Species')
plt.tight_layout()
plt.savefig('outputs/feature_distribution.png')
plt.close()

# 2. Pair Plot
sns.pairplot(df, hue='species', diag_kind='hist', markers=['o', 's', 'D'])
plt.suptitle('Feature Relationships', y=1.02)
plt.savefig('outputs/pair_plot.png')
plt.close()

# 3. Correlation Heatmap
plt.figure(figsize=(10, 8))
corr = df.corr(numeric_only=True)
sns.heatmap(corr, annot=True, cmap='coolwarm', fmt='.2f')
plt.title('Feature Correlation Matrix')
plt.tight_layout()
plt.savefig('outputs/correlation_heatmap.png')
plt.close()

# 4. Decision Boundaries (PCA Reduced)
pca = PCA(n_components=2)
X_pca = pca.fit_transform(iris.data)
clf = LogisticRegression()
clf.fit(X_pca, iris.target)

plt.figure(figsize=(10, 8))
plot_decision_regions(X_pca, iris.target, clf=clf, legend=2)
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.title('Decision Boundaries (PCA Reduced)')
plt.savefig('outputs/decision_boundaries.png')
plt.close()

print("Visualizations saved to outputs/")

Visualizations saved to outputs/


In [22]:
import os

# Create the '.github' folder in the current directory (no error if it already exists)
os.makedirs('.github', exist_ok=True)

In [35]:
!zip -r colab_files_backup.zip /content/



## 10. Future Enhancements

1. **Data Versioning**: Use DVC for data version control
2. **Feature Store**: Implement a feature store for reusable features
3. **Model Registry**: Use MLflow Model Registry for versioning
4. **Drift Detection**: Monitor data and concept drift
5. **A/B Testing**: Compare model versions on live traffic, for example with canary deployments
6. **AutoML**: Integrate automated model selection
7. **Scalability**: Add Kubernetes for orchestration
8. **Security**: Implement authentication and rate limiting
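
As an illustration of the drift detection item, here is a minimal sketch that compares a feature's training distribution with recent production values using the Population Stability Index (PSI). The function name `population_stability_index`, the bin count, and the sample data are assumptions made for this example and are not part of the project; a commonly quoted rule of thumb treats PSI above roughly 0.2 as a notable shift, but the right threshold depends on the application.

```python
import math


def population_stability_index(reference, current, n_bins=10, eps=1e-6):
    """PSI between two samples of one numeric feature; larger means more shift."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if the reference is constant

    def bin_fractions(values):
        counts = [0] * n_bins
        for v in values:
            # Values outside the reference range are clamped into the edge bins.
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), n_bins - 1)] += 1
        # eps keeps empty bins from producing log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    ref_frac = bin_fractions(reference)
    cur_frac = bin_fractions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_frac, cur_frac))


# Hypothetical data: petal lengths seen in training vs. a shifted production batch
train_values = [i / 10 for i in range(100)]
shifted_values = [v + 5 for v in train_values]

psi_same = population_stability_index(train_values, train_values)
psi_shifted = population_stability_index(train_values, shifted_values)
print(f"PSI (same data): {psi_same:.3f}")
print(f"PSI (shifted data): {psi_shifted:.3f}")
```

In a deployed pipeline the score would be computed per feature on a schedule and exported to the monitoring stack, so that an alert can fire when it crosses the chosen threshold.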

```mermaid
graph LR
    A[Data Collection] --> B[Preprocessing]
    B --> C[Training]
    C --> D[Validation]
    D --> E[Deployment]
    E --> F[Monitoring]
    F --> G[Retraining]
    G --> C
```
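
The step from Monitoring to Retraining in the loop above needs some rule for deciding when to retrain. The following is a minimal sketch of such a rule; `MonitoringSnapshot`, `should_retrain`, and both thresholds are hypothetical names and values chosen for illustration, not part of this project.

```python
from dataclasses import dataclass


@dataclass
class MonitoringSnapshot:
    """Hypothetical summary of one monitoring window."""
    accuracy: float     # accuracy on recently labeled production samples
    drift_score: float  # any drift metric where larger means more shift


def should_retrain(snapshot, min_accuracy=0.90, max_drift=0.2):
    """Return True when the window breaches either (illustrative) threshold."""
    return snapshot.accuracy < min_accuracy or snapshot.drift_score > max_drift


healthy = MonitoringSnapshot(accuracy=0.96, drift_score=0.05)
degraded = MonitoringSnapshot(accuracy=0.82, drift_score=0.05)
print(should_retrain(healthy), should_retrain(degraded))  # False True
```

Keeping the decision in one small function makes the thresholds easy to tune and to test independently of the training code.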