# Deploying Regression Models with Gradio & FastAPI

**DOST-ITDI AI Training Workshop**  
**Module 6: Model Deployment**

---

## Learning Objectives
1. Load trained regression models
2. Create interactive web interfaces with Gradio
3. Build REST APIs with FastAPI
4. Deploy models for real-world use
5. Handle SMILES input and molecular descriptors

## What is Gradio?

**Gradio** is a Python library that makes it easy to create web interfaces for machine learning models.

**Features:**
- Create UI in 3-4 lines of code
- No web development knowledge needed
- Shareable links
- Works with any Python function

**Installation:**
```bash
pip install gradio fastapi uvicorn scikit-learn pandas numpy joblib rdkit pillow requests
```

## Part 1: Train and Save a Regression Model

In [None]:
# Import libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import StandardScaler
import joblib
import warnings
warnings.filterwarnings('ignore')

print("Libraries imported successfully!")

In [None]:
# Load ESOL dataset (solubility prediction)
url = "https://raw.githubusercontent.com/deepchem/deepchem/master/datasets/delaney-processed.csv"
df = pd.read_csv(url)

print(f"Dataset shape: {df.shape}")
print(f"\nFirst few rows:")
df.head()

In [None]:
# Calculate molecular descriptors using RDKit
from rdkit import Chem
from rdkit.Chem import Descriptors

def calculate_descriptors(smiles):
    """Calculate molecular descriptors from SMILES"""
    try:
        mol = Chem.MolFromSmiles(smiles)
        if mol is None:
            return None
        
        return {
            'MolWeight': Descriptors.MolWt(mol),
            'LogP': Descriptors.MolLogP(mol),
            'NumHDonors': Descriptors.NumHDonors(mol),
            'NumHAcceptors': Descriptors.NumHAcceptors(mol),
            'TPSA': Descriptors.TPSA(mol),
            'NumRotatableBonds': Descriptors.NumRotatableBonds(mol),
            'NumAromaticRings': Descriptors.NumAromaticRings(mol)
        }
    except Exception:
        return None

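# Calculate descriptors for all molecules.
# An invalid SMILES yields an empty dict, which becomes an all-NaN row
# that dropna() removes below.
descriptors_list = []
for smiles in df['smiles']:
    desc = calculate_descriptors(smiles)
    descriptors_list.append(desc if desc is not None else {})

# Create descriptor DataFrame, reusing df's index so rows stay aligned
descriptors_df = pd.DataFrame(descriptors_list, index=df.index)
descriptors_df = descriptors_df.dropna()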

# Merge with target variable
df_clean = df.loc[descriptors_df.index].copy()
X = descriptors_df.values
y = df_clean['measured log solubility in mols per litre'].values

print(f"Features shape: {X.shape}")
print(f"Target shape: {y.shape}")
print(f"\nFeature names: {list(descriptors_df.columns)}")

In [None]:
# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale features (tree-based models such as Random Forest do not strictly need this)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train Random Forest model
model = RandomForestRegressor(n_estimators=100, random_state=42, n_jobs=-1)
model.fit(X_train_scaled, y_train)

# Evaluate (model.score already returns R², so only MAE needs an import)
from sklearn.metrics import mean_absolute_error

train_score = model.score(X_train_scaled, y_train)
test_score = model.score(X_test_scaled, y_test)
y_pred = model.predict(X_test_scaled)
mae = mean_absolute_error(y_test, y_pred)

print(f"Training R² Score: {train_score:.3f}")
print(f"Test R² Score: {test_score:.3f}")
print(f"Test MAE: {mae:.3f}")
print("\nModel trained successfully!")

In [None]:
# Save model and scaler
joblib.dump(model, 'solubility_model.pkl')
joblib.dump(scaler, 'scaler.pkl')

# Save feature names
feature_names = list(descriptors_df.columns)
joblib.dump(feature_names, 'feature_names.pkl')

print("Model, scaler, and feature names saved!")
print(f"  - solubility_model.pkl")
print(f"  - scaler.pkl")
print(f"  - feature_names.pkl")
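
The feature names are saved with the model because the scaler and the random forest only see a bare array and rely on column order. The sketch below illustrates that idea in plain Python with made-up descriptor values. The helper name `to_feature_row` is hypothetical and not part of the workshop code; the hard-coded list stands in for what `feature_names.pkl` would contain.

```python
# Illustrative only: order a descriptor dict by the saved feature names.
FEATURE_NAMES = [
    "MolWeight", "LogP", "NumHDonors", "NumHAcceptors",
    "TPSA", "NumRotatableBonds", "NumAromaticRings",
]


def to_feature_row(descriptors, feature_names=FEATURE_NAMES):
    """Return descriptor values in training column order.

    Raises KeyError if a required descriptor is missing, which is safer
    than silently predicting on a misaligned row.
    """
    missing = [name for name in feature_names if name not in descriptors]
    if missing:
        raise KeyError(f"Missing descriptors: {missing}")
    return [float(descriptors[name]) for name in feature_names]


# Made-up values, deliberately given in a different key order.
example = {
    "LogP": -0.1, "MolWeight": 46.0, "TPSA": 20.0,
    "NumHDonors": 1, "NumHAcceptors": 1,
    "NumRotatableBonds": 0, "NumAromaticRings": 0,
}
row = to_feature_row(example)
print(row)
```

Whatever order the dictionary keys arrive in, the row comes out in the order the model was trained on.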

## Part 2: Create Gradio Interface

In [None]:
import gradio as gr
from rdkit import Chem
from rdkit.Chem import Draw
import io
from PIL import Image

# Load trained model and scaler
model = joblib.load('solubility_model.pkl')
scaler = joblib.load('scaler.pkl')
feature_names = joblib.load('feature_names.pkl')

def predict_solubility(smiles):
    """
    Predict solubility from SMILES notation
    
    Args:
        smiles (str): SMILES notation of molecule
    
    Returns:
        tuple: (prediction, molecule_image, descriptors_text)
    """
    try:
        # Calculate descriptors
        descriptors = calculate_descriptors(smiles)
        
        if descriptors is None:
            return "Invalid SMILES notation", None, "Error: Could not parse SMILES"
        
        # Prepare features
        features = np.array([[descriptors[feat] for feat in feature_names]])
        features_scaled = scaler.transform(features)
        
        # Predict
        prediction = model.predict(features_scaled)[0]
        
        # Generate molecule image
        mol = Chem.MolFromSmiles(smiles)
        img = Draw.MolToImage(mol, size=(300, 300))
        
        # Format descriptors
        desc_text = "**Molecular Descriptors:**\n\n"
        for name, value in descriptors.items():
            desc_text += f"- {name}: {value:.2f}\n"
        
        # Format prediction
        pred_text = f"**Predicted Log Solubility:** {prediction:.3f} log10(mol/L)\n\n"
        pred_text += f"**Solubility (g/L):** {10**prediction * descriptors['MolWeight']:.4f} g/L\n\n"
        
        if prediction < -4:
            pred_text += "**Classification:** Poorly soluble"
        elif prediction < -2:
            pred_text += "**Classification:** Moderately soluble"
        else:
            pred_text += "**Classification:** Highly soluble"
        
        return pred_text, img, desc_text
        
    except Exception as e:
        return f"Error: {str(e)}", None, "Could not calculate descriptors"

# Example SMILES
examples = [
    ["CC(=O)Oc1ccccc1C(=O)O"],  # Aspirin
    ["CCO"],  # Ethanol
    ["c1ccccc1"],  # Benzene
    ["CC(=O)Nc1ccc(O)cc1"],  # Paracetamol
    ["CN1C=NC2=C1C(=O)N(C(=O)N2C)C"],  # Caffeine
]

# Create Gradio interface
iface = gr.Interface(
    fn=predict_solubility,
    inputs=gr.Textbox(
        label="Enter SMILES Notation",
        placeholder="e.g., CC(=O)Oc1ccccc1C(=O)O (Aspirin)",
        lines=2
    ),
    outputs=[
        gr.Markdown(label="Prediction"),
        gr.Image(label="Molecule Structure", type="pil"),
        gr.Markdown(label="Molecular Descriptors")
    ],
    title="Solubility Predictor",
    description="""Predict aqueous solubility of chemical compounds from SMILES notation.  
    This model uses Random Forest regression trained on the ESOL dataset.""",
    examples=examples,
    theme="soft",
    allow_flagging="never"  # Gradio 5 renames this parameter to flagging_mode
)

# Launch interface
print("Launching Gradio interface...")
iface.launch(share=False, inbrowser=True)
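
The g/L figure shown in the interface comes from converting the predicted log10 solubility back to mol/L and multiplying by molecular weight. The sketch below isolates that arithmetic and the classification cut-offs so they can be checked by hand. The function names `log_s_to_g_per_l` and `classify_solubility` are hypothetical, and the inputs are illustrative values rather than model output.

```python
def log_s_to_g_per_l(log_s, mol_weight):
    """Convert log10 solubility (mol/L) to g/L using molecular weight (g/mol)."""
    return (10 ** log_s) * mol_weight


def classify_solubility(log_s):
    """Apply the same -4 and -2 cut-offs used in predict_solubility."""
    if log_s < -4:
        return "Poorly soluble"
    if log_s < -2:
        return "Moderately soluble"
    return "Highly soluble"


# Illustrative inputs only, not model output.
log_s = -2.0
mol_weight = 180.16  # g/mol, approximately aspirin
grams_per_litre = log_s_to_g_per_l(log_s, mol_weight)
print(f"{grams_per_litre:.4f} g/L, {classify_solubility(log_s)}")
```

Here 10^-2 mol/L times 180.16 g/mol gives about 1.80 g/L. Because the comparisons are strict, a value of exactly -2.0 lands in "Highly soluble" and exactly -4.0 in "Moderately soluble".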

## Part 3: Create FastAPI REST API

In [None]:
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
import uvicorn
from typing import Dict

# Create FastAPI app
app = FastAPI(
    title="Solubility Prediction API",
    description="REST API for predicting aqueous solubility of chemical compounds",
    version="1.0.0"
)

# Request model
class PredictionRequest(BaseModel):
    smiles: str
    
    # Pydantic v1 style; in Pydantic v2 this key is named json_schema_extra
    class Config:
        schema_extra = {
            "example": {
                "smiles": "CC(=O)Oc1ccccc1C(=O)O"
            }
        }

# Response model
class PredictionResponse(BaseModel):
    log_solubility: float
    solubility_g_per_L: float
    classification: str
    descriptors: Dict[str, float]

@app.get("/")
def read_root():
    return {
        "message": "Solubility Prediction API",
        "endpoints": {
            "/predict": "POST - Predict solubility from SMILES",
            "/health": "GET - Check API health"
        }
    }

@app.get("/health")
def health_check():
    return {"status": "healthy"}

@app.post("/predict", response_model=PredictionResponse)
def predict(request: PredictionRequest):
    """
    Predict solubility from SMILES notation
    """
    try:
        # Calculate descriptors
        descriptors = calculate_descriptors(request.smiles)
        
        if descriptors is None:
            raise HTTPException(status_code=400, detail="Invalid SMILES notation")
        
        # Prepare features
        features = np.array([[descriptors[feat] for feat in feature_names]])
        features_scaled = scaler.transform(features)
        
        # Predict
        log_solubility = float(model.predict(features_scaled)[0])
        solubility_g_per_L = float(10**log_solubility * descriptors['MolWeight'])
        
        # Classify
        if log_solubility < -4:
            classification = "Poorly soluble"
        elif log_solubility < -2:
            classification = "Moderately soluble"
        else:
            classification = "Highly soluble"
        
        return PredictionResponse(
            log_solubility=log_solubility,
            solubility_g_per_L=solubility_g_per_L,
            classification=classification,
            descriptors=descriptors
        )
        
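    except HTTPException:
        # Re-raise as-is so the 400 for invalid SMILES is not turned into a 500
        raise
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))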

print("FastAPI app created!")
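print("\nTo run the API, save this code (together with the model loading and")
print("calculate_descriptors from the earlier cells) as app.py, then run:")
print("  uvicorn app:app --reload")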
print("\nAPI will be available at: http://localhost:8000")
print("Interactive docs: http://localhost:8000/docs")

## Part 4: Test the API

In [None]:
import requests
import json

# Note: Run this after starting the FastAPI server
# (Run: uvicorn app:app --reload in terminal)

# Test data
test_smiles = [
    ("Aspirin", "CC(=O)Oc1ccccc1C(=O)O"),
    ("Ethanol", "CCO"),
    ("Benzene", "c1ccccc1"),
    ("Caffeine", "CN1C=NC2=C1C(=O)N(C(=O)N2C)C")
]

print("Testing API with example compounds:\n")

for name, smiles in test_smiles:
    try:
        # Make prediction request
        response = requests.post(
            "http://localhost:8000/predict",
            json={"smiles": smiles},
            timeout=10  # avoid hanging forever if the server stalls
        )
        
        if response.status_code == 200:
            result = response.json()
            print(f"✓ {name}:")
            print(f"  SMILES: {smiles}")
            print(f"  Log Solubility: {result['log_solubility']:.3f} log10(mol/L)")
            print(f"  Solubility: {result['solubility_g_per_L']:.4f} g/L")
            print(f"  Classification: {result['classification']}")
            print()
        else:
            print(f"✗ {name}: Error {response.status_code}")
            
    except requests.exceptions.ConnectionError:
        print("Error: Could not connect to API. Make sure the server is running!")
        print("Run: uvicorn app:app --reload")
        break
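
Since the test loop only works once the server is up, it can help to poll the `/health` endpoint first. The sketch below does this with the standard library only. The function name `wait_for_api` and its `attempts`, `delay`, and `opener` parameters are assumptions made for illustration; the URL and the `{"status": "healthy"}` payload follow the API defined above.

```python
import json
import time
import urllib.request


def wait_for_api(health_url="http://localhost:8000/health",
                 attempts=5, delay=1.0, opener=urllib.request.urlopen):
    """Poll the health endpoint until it reports healthy or attempts run out.

    `opener` is injectable so the function can be exercised without a server.
    """
    for attempt in range(attempts):
        try:
            with opener(health_url, timeout=2) as resp:
                payload = json.loads(resp.read().decode("utf-8"))
            if isinstance(payload, dict) and payload.get("status") == "healthy":
                return True
        except (OSError, ValueError):
            # OSError covers connection failures (URLError is a subclass);
            # ValueError covers a response body that is not valid JSON.
            pass
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling `wait_for_api()` before the loop and skipping the requests when it returns `False` gives a clearer failure than a connection error on the first compound.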

## Summary

### What We Learned:

1. **Model Deployment**
   - Saved trained models using joblib
   - Created prediction functions
   - Handled SMILES input

2. **Gradio Interface**
   - Built interactive web UI in ~20 lines
   - Displayed molecule structures
   - Showed predictions and descriptors

3. **FastAPI REST API**
   - Created RESTful endpoints
   - Automatic documentation
   - Error handling
   - JSON responses

### Deployment Options:

1. **Gradio Share**
   ```python
   iface.launch(share=True)  # Creates temporary public link
   ```

2. **Hugging Face Spaces**
   - Free hosting for Gradio apps
   - https://huggingface.co/spaces

3. **Docker + Cloud**
   - Containerize with Docker
   - Deploy to AWS, Google Cloud, Azure

### Next Steps:

1. Try deploying your own models
2. Add more features (batch prediction, visualization)
3. Create user authentication
4. Monitor API usage and performance
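
As one possible starting point for batch prediction, the sketch below applies a single-molecule predictor to a list of SMILES strings and records failures per item instead of aborting the whole batch. `predict_batch` and its `predict_one` parameter are hypothetical names. In practice `predict_one` would wrap the descriptor calculation, scaling, and `model.predict` steps; the stand-in `fake_predict` exists only so the example runs without RDKit or a trained model.

```python
def predict_batch(smiles_list, predict_one):
    """Run predict_one on each SMILES and collect per-item results."""
    results = []
    for raw in smiles_list:
        smiles = raw.strip()
        if not smiles:
            results.append({"smiles": raw, "ok": False, "error": "Empty SMILES"})
            continue
        try:
            value = predict_one(smiles)
        except ValueError as exc:
            results.append({"smiles": smiles, "ok": False, "error": str(exc)})
        else:
            results.append(
                {"smiles": smiles, "ok": True, "log_solubility": float(value)}
            )
    return results


def fake_predict(smiles):
    """Stand-in predictor: NOT a real model, returns a dummy number."""
    if "?" in smiles:
        raise ValueError("Invalid SMILES notation")
    return -0.5 * len(smiles)


batch = predict_batch(["CCO", "   ", "C?C"], fake_predict)
for item in batch:
    print(item)
```

Returning one record per input keeps the output aligned with the request, so a caller can tell which molecules failed and why.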

---

**Congratulations!** You've learned how to deploy ML models for chemistry research!

---

**Resources:**
- [Gradio Documentation](https://gradio.app/docs)
- [FastAPI Documentation](https://fastapi.tiangolo.com/)
- [Hugging Face Spaces](https://huggingface.co/spaces)