# Cytochrome P450 Complex Analysis - Interactive Jupyter Notebook

This notebook provides an interactive environment for analyzing CYP450 enzyme complexes and drug interactions. It complements the Streamlit web application with step-by-step analysis, interactive widgets, and machine learning experiments.

## Why Jupyter Notebooks Make This More Efficient

### 1. **Interactive Development & Prototyping**
- Test individual functions and algorithms step-by-step
- Iterate on data analysis without rerunning entire applications
- Visualize intermediate results and debug complex calculations

### 2. **Advanced Computational Analysis**
- Perform complex mathematical operations with immediate feedback
- Run computationally intensive analyses (PCA, clustering, statistical modeling)
- Access powerful scientific computing libraries (SciPy, scikit-learn, etc.)

### 3. **Research Documentation & Reproducibility**
- Combine code, visualizations, and explanatory text in one document
- Create reproducible research workflows with clear methodology
- Share analysis results with colleagues in an executable format

### 4. **Enhanced Data Exploration**
- Interactive widgets for parameter exploration
- Real-time plot updates as you change parameters
- Flexible data manipulation and filtering

### 5. **Machine Learning Integration**
- Train predictive models for drug-drug interactions
- Develop classification algorithms for drug metabolism pathways
- Validate models with cross-validation and statistical testing

In [None]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots
import seaborn as sns
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
import ipywidgets as widgets
from IPython.display import display, HTML
import warnings
warnings.filterwarnings('ignore')

# Try to import molecular visualization
try:
    import py3Dmol
    PY3DMOL_AVAILABLE = True
    print("✓ py3Dmol available for 3D molecular visualization")
except ImportError:
    PY3DMOL_AVAILABLE = False
    print("⚠ py3Dmol not available. 3D visualization will be limited.")

print("Jupyter environment initialized for CYP450 analysis")

## CYP450 Enzyme Data Setup

Define the major CYP450 enzymes and their characteristics:

In [None]:
# CYP450 Enzyme Database
cyp450_enzymes = {
    'CYP1A2': {
        'uniprot_id': 'P05177',
        'substrates': ['Caffeine', 'Theophylline', 'Clozapine', 'Fluvoxamine'],
        'inhibitors': ['Fluvoxamine', 'Ciprofloxacin', 'Enoxacin'],
        'inducers': ['Smoking', 'Omeprazole', 'Lansoprazole'],
        'drug_percentage': 5,
        'genetic_variants': 'Low',
        'clinical_significance': 'Caffeine metabolism, drug-smoking interactions'
    },
    'CYP2C9': {
        'uniprot_id': 'P11712',
        'substrates': ['Warfarin', 'Phenytoin', 'Tolbutamide', 'Ibuprofen'],
        'inhibitors': ['Fluconazole', 'Amiodarone', 'Sulfinpyrazone'],
        'inducers': ['Rifampin', 'Carbamazepine', 'Phenobarbital'],
        'drug_percentage': 15,
        'genetic_variants': 'Moderate',
        'clinical_significance': 'Warfarin dosing, bleeding risk'
    },
    'CYP2C19': {
        'uniprot_id': 'P33261',
        'substrates': ['Omeprazole', 'Clopidogrel', 'Diazepam', 'Propranolol'],
        'inhibitors': ['Omeprazole', 'Fluoxetine', 'Fluvoxamine'],
        'inducers': ['Rifampin', 'Carbamazepine', "St. John's Wort"],
        'drug_percentage': 10,
        'genetic_variants': 'High',
        'clinical_significance': 'PPI efficacy, antiplatelet therapy'
    },
    'CYP2D6': {
        'uniprot_id': 'P10635',
        'substrates': ['Codeine', 'Tramadol', 'Metoprolol', 'Paroxetine'],
        'inhibitors': ['Paroxetine', 'Fluoxetine', 'Quinidine'],
        'inducers': [],  # CYP2D6 is not significantly induced
        'drug_percentage': 25,
        'genetic_variants': 'Very High',
        'clinical_significance': 'Antidepressants, opioid activation'
    },
    'CYP2E1': {
        'uniprot_id': 'P05181',
        'substrates': ['Acetaminophen', 'Ethanol', 'Halothane'],
        'inhibitors': ['Disulfiram', '4-Methylpyrazole'],
        'inducers': ['Ethanol', 'Isoniazid'],
        'drug_percentage': 2,
        'genetic_variants': 'Low',
        'clinical_significance': 'Alcohol-drug interactions, hepatotoxicity'
    },
    'CYP3A4': {
        'uniprot_id': 'P08684',
        'substrates': ['Midazolam', 'Atorvastatin', 'Cyclosporine', 'Tacrolimus'],
        'inhibitors': ['Ketoconazole', 'Ritonavir', 'Clarithromycin', 'Grapefruit'],
        'inducers': ['Rifampin', 'Phenytoin', 'Carbamazepine', "St. John's Wort"],
        'drug_percentage': 50,
        'genetic_variants': 'Moderate',
        'clinical_significance': 'Major drug interaction site, most drugs'
    }
}

# Create DataFrame for analysis
cyp_df = pd.DataFrame.from_dict(cyp450_enzymes, orient='index')
cyp_df.index.name = 'Enzyme'
cyp_df = cyp_df.reset_index()

print("CYP450 enzyme database created with", len(cyp_df), "major enzymes")
display(cyp_df[['Enzyme', 'drug_percentage', 'genetic_variants', 'clinical_significance']])
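
One way to query this structure is to look for enzymes where one drug is a listed substrate and another is a listed inhibitor, which flags a candidate interaction. The sketch below is a minimal illustration: `find_inhibition_pairs` is a hypothetical helper, and `enzymes_excerpt` is a two-enzyme excerpt of the database above so that the block runs on its own.

```python
# Excerpt of the database above (two enzymes, substrates and inhibitors only)
# so that this example is self-contained.
enzymes_excerpt = {
    'CYP2C9': {
        'substrates': ['Warfarin', 'Phenytoin', 'Tolbutamide', 'Ibuprofen'],
        'inhibitors': ['Fluconazole', 'Amiodarone', 'Sulfinpyrazone'],
    },
    'CYP3A4': {
        'substrates': ['Midazolam', 'Atorvastatin', 'Cyclosporine', 'Tacrolimus'],
        'inhibitors': ['Ketoconazole', 'Ritonavir', 'Clarithromycin', 'Grapefruit'],
    },
}


def find_inhibition_pairs(substrate_drug, inhibitor_drug, enzymes):
    """Return enzymes that list substrate_drug as a substrate and inhibitor_drug as an inhibitor."""
    return [
        name
        for name, info in enzymes.items()
        if substrate_drug in info['substrates'] and inhibitor_drug in info['inhibitors']
    ]


print(find_inhibition_pairs('Warfarin', 'Fluconazole', enzymes_excerpt))
print(find_inhibition_pairs('Warfarin', 'Ketoconazole', enzymes_excerpt))
```

Passing the full `cyp450_enzymes` dictionary instead of the excerpt should work the same way, since each entry has the same `substrates` and `inhibitors` keys.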

## Interactive Parameter Exploration

Use interactive widgets to explore CYP450 enzyme characteristics:

In [None]:
# Create interactive widgets for enzyme exploration
@widgets.interact
def explore_cyp450_enzyme(enzyme=list(cyp450_enzymes.keys())):
    """
    Interactive widget to explore CYP450 enzyme characteristics
    """
    data = cyp450_enzymes[enzyme]
    
    print(f"\n🧬 {enzyme} Analysis")
    print(f"{'='*50}")
    print(f"UniProt ID: {data['uniprot_id']}")
    print(f"Drug metabolism percentage: {data['drug_percentage']}%")
    print(f"Genetic variant frequency: {data['genetic_variants']}")
    print(f"Clinical significance: {data['clinical_significance']}")
    
    print(f"\n📊 Substrates ({len(data['substrates'])}):")  
    for substrate in data['substrates'][:5]:  # Show first 5
        print(f"  • {substrate}")
    
    print(f"\n🚫 Inhibitors ({len(data['inhibitors'])}):")  
    for inhibitor in data['inhibitors'][:5]:  # Show first 5
        print(f"  • {inhibitor}")
    
    print(f"\n⬆️  Inducers ({len(data['inducers'])}):")  
    for inducer in data['inducers'][:5]:  # Show first 5
        print(f"  • {inducer}")
    
    # No return value: interact would display a returned dict below the printed summary

## Advanced Data Visualization

Create interactive plots for CYP450 analysis:

In [None]:
# Create comprehensive visualization dashboard
def create_cyp450_dashboard():
    """
    Create an interactive dashboard for CYP450 analysis
    """
    fig = make_subplots(
        rows=2, cols=2,
        subplot_titles=(
            'Drug Metabolism Distribution', 'Genetic Variant Frequency',
            'Substrate Count by Enzyme', 'Clinical Risk Assessment'
        ),
        specs=[[{"type": "bar"}, {"type": "pie"}],
               [{"type": "scatter"}, {"type": "bar"}]]
    )
    
    # Plot 1: Drug metabolism percentage
    fig.add_trace(
        go.Bar(
            x=cyp_df['Enzyme'], 
            y=cyp_df['drug_percentage'],
            name='Drug %',
            marker_color='lightblue'
        ),
        row=1, col=1
    )
    
    # Plot 2: Genetic variants (pie chart)
    variant_counts = cyp_df['genetic_variants'].value_counts()
    fig.add_trace(
        go.Pie(
            labels=variant_counts.index,
            values=variant_counts.values,
            name="Genetic Variants"
        ),
        row=1, col=2
    )
    
    # Plot 3: Substrate count vs drug percentage
    substrate_counts = [len(cyp450_enzymes[enzyme]['substrates']) for enzyme in cyp_df['Enzyme']]
    fig.add_trace(
        go.Scatter(
            x=substrate_counts,
            y=cyp_df['drug_percentage'],
            mode='markers+text',
            text=cyp_df['Enzyme'],
            textposition="top center",
            marker=dict(size=10, color='red'),
            name='Enzyme Profile'
        ),
        row=2, col=1
    )
    
    # Plot 4: Clinical risk (based on drug percentage and genetic variants)
    risk_scores = []
    for _, row in cyp_df.iterrows():
        base_score = row['drug_percentage']
        if row['genetic_variants'] == 'Very High':
            risk_score = base_score * 1.5
        elif row['genetic_variants'] == 'High':
            risk_score = base_score * 1.3
        elif row['genetic_variants'] == 'Moderate':
            risk_score = base_score * 1.1
        else:
            risk_score = base_score
        risk_scores.append(risk_score)
    
    fig.add_trace(
        go.Bar(
            x=cyp_df['Enzyme'],
            y=risk_scores,
            name='Clinical Risk',
            marker_color='orange'
        ),
        row=2, col=2
    )
    
    fig.update_layout(
        height=800,
        title_text="CYP450 Comprehensive Analysis Dashboard",
        showlegend=False
    )
    
    # Update axis labels
    fig.update_xaxes(title_text="Enzyme", row=1, col=1)
    fig.update_yaxes(title_text="Drug Metabolism %", row=1, col=1)
    fig.update_xaxes(title_text="Substrate Count", row=2, col=1)
    fig.update_yaxes(title_text="Drug %", row=2, col=1)
    fig.update_xaxes(title_text="Enzyme", row=2, col=2)
    fig.update_yaxes(title_text="Clinical Risk Score", row=2, col=2)
    
    return fig

# Display the dashboard
dashboard = create_cyp450_dashboard()
dashboard.show()
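
The `if`/`elif` chain that scales the risk score in the fourth panel can also be written as a lookup table. The sketch below reuses the multipliers from the dashboard code; `VARIANT_MULTIPLIER` and `clinical_risk_score` are hypothetical names introduced here, and the score remains the illustrative heuristic used in this notebook.

```python
# Multipliers copied from the dashboard code; any category not listed
# (such as 'Low') leaves the base score unchanged.
VARIANT_MULTIPLIER = {'Very High': 1.5, 'High': 1.3, 'Moderate': 1.1}


def clinical_risk_score(drug_percentage, genetic_variants):
    """Scale the drug-metabolism percentage by the variant-frequency multiplier."""
    return drug_percentage * VARIANT_MULTIPLIER.get(genetic_variants, 1.0)


# Two rows taken from the enzyme database above
for enzyme, pct, variants in [('CYP2D6', 25, 'Very High'), ('CYP2E1', 2, 'Low')]:
    print(enzyme, clinical_risk_score(pct, variants))
```

Keeping the multipliers in one dictionary makes it easier to adjust them without touching the plotting code.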

## Machine Learning for Drug Interaction Prediction

Train and evaluate a classifier on synthetic data to demonstrate an ML workflow inside the notebook:

In [None]:
# Generate synthetic drug interaction data for demonstration
def generate_drug_interaction_data(n_samples=1000):
    """
    Generate synthetic drug interaction data for the ML demonstration.
    The risk labels are a deterministic function of the features; in practice,
    the data would come from clinical databases.
    """
    np.random.seed(42)
    
    # Features: molecular properties and enzyme affinities
    molecular_weight = np.random.normal(350, 100, n_samples)
    logp = np.random.normal(2.5, 1.5, n_samples)
    cyp3a4_affinity = np.random.uniform(0, 1, n_samples)
    cyp2d6_affinity = np.random.uniform(0, 1, n_samples)
    cyp2c9_affinity = np.random.uniform(0, 1, n_samples)
    
    # Target: interaction risk (0=low, 1=medium, 2=high)
    interaction_risk = []
    for i in range(n_samples):
        risk_score = (
            cyp3a4_affinity[i] * 0.5 +  # CYP3A4 is major interaction site
            cyp2d6_affinity[i] * 0.3 +   # CYP2D6 genetic variability
            cyp2c9_affinity[i] * 0.2 +   # CYP2C9 warfarin interactions
            (logp[i] > 3) * 0.2 +         # Lipophilic drugs
            (molecular_weight[i] > 500) * 0.1  # Large molecules
        )
        
        if risk_score < 0.4:
            interaction_risk.append(0)  # Low risk
        elif risk_score < 0.7:
            interaction_risk.append(1)  # Medium risk  
        else:
            interaction_risk.append(2)  # High risk
    
    data = pd.DataFrame({
        'molecular_weight': molecular_weight,
        'logp': logp,
        'cyp3a4_affinity': cyp3a4_affinity,
        'cyp2d6_affinity': cyp2d6_affinity,
        'cyp2c9_affinity': cyp2c9_affinity,
        'interaction_risk': interaction_risk
    })
    
    return data

# Generate and explore the dataset
ml_data = generate_drug_interaction_data()
print("Generated ML dataset with", len(ml_data), "samples")
print("\nTarget distribution:")
print(ml_data['interaction_risk'].value_counts().sort_index())
print("\n0 = Low risk, 1 = Medium risk, 2 = High risk")

display(ml_data.head())
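
The per-sample loop that assigns risk labels can also be computed on whole arrays. The following is a minimal sketch, assuming the same weights and thresholds as the generator above; it uses `np.random.default_rng`, so the random draws differ from those produced with `np.random.seed(42)`, and the short variable names are chosen here for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)  # separate generator; draws differ from np.random.seed(42)
n_samples = 1000

molecular_weight = rng.normal(350, 100, n_samples)
logp = rng.normal(2.5, 1.5, n_samples)
cyp3a4 = rng.uniform(0, 1, n_samples)
cyp2d6 = rng.uniform(0, 1, n_samples)
cyp2c9 = rng.uniform(0, 1, n_samples)

# Same weighted sum as the loop, evaluated for all samples at once.
risk_score = (
    cyp3a4 * 0.5
    + cyp2d6 * 0.3
    + cyp2c9 * 0.2
    + (logp > 3) * 0.2
    + (molecular_weight > 500) * 0.1
)

# np.digitize returns 0 for score < 0.4, 1 for 0.4 <= score < 0.7, and 2 otherwise.
interaction_risk = np.digitize(risk_score, bins=[0.4, 0.7])

# Number of samples in each class (low, medium, high)
print(np.bincount(interaction_risk, minlength=3))
```

For 1000 samples the difference in speed is negligible; the vectorized form mainly pays off when `n_samples` grows large.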

In [None]:
# Train machine learning model for drug interaction prediction
def train_interaction_model(data):
    """
    Train Random Forest model to predict drug interaction risk
    """
    # Prepare features and target
    feature_cols = ['molecular_weight', 'logp', 'cyp3a4_affinity', 'cyp2d6_affinity', 'cyp2c9_affinity']
    X = data[feature_cols]
    y = data['interaction_risk']
    
    # Split data
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)
    
    # Scale features (tree-based models such as Random Forest do not require this, but it is harmless)
    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    X_test_scaled = scaler.transform(X_test)
    
    # Train model
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    model.fit(X_train_scaled, y_train)
    
    # Evaluate
    y_pred = model.predict(X_test_scaled)
    accuracy = model.score(X_test_scaled, y_test)
    
    print(f"Model Accuracy: {accuracy:.3f}")
    print("\nClassification Report:")
    print(classification_report(y_test, y_pred, target_names=['Low', 'Medium', 'High']))
    
    # Feature importance
    importance_df = pd.DataFrame({
        'feature': feature_cols,
        'importance': model.feature_importances_
    }).sort_values('importance', ascending=False)
    
    print("\nFeature Importance:")
    display(importance_df)
    
    return model, scaler, importance_df

# Train the model
model, scaler, feature_importance = train_interaction_model(ml_data)
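
The single train/test split above yields one accuracy estimate. A k-fold cross-validation gives a spread of estimates instead. The sketch below is one possible setup using scikit-learn's `cross_val_score`; the small synthetic arrays `X` and `y` are hypothetical stand-ins for `ml_data`, built with the same weighted-sum idea as the generator so that the block runs on its own.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Small synthetic stand-in for ml_data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(300, 3))       # three affinity-like features
score = X @ np.array([0.5, 0.3, 0.2])      # weighted sum, as in the generator
y = np.digitize(score, bins=[0.4, 0.7])    # 0 = low, 1 = medium, 2 = high

# With the scaler inside the pipeline, it is refit on each training fold,
# so no information from the held-out fold leaks into the scaling step.
pipeline = make_pipeline(
    StandardScaler(),
    RandomForestClassifier(n_estimators=50, random_state=42),
)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(pipeline, X, y, cv=cv, scoring='accuracy')

print(f"Fold accuracies: {np.round(scores, 3)}")
print(f"Mean accuracy: {scores.mean():.3f} (std {scores.std():.3f})")
```

To apply this to the notebook's data, `X` and `y` would be replaced by the feature columns and the `interaction_risk` column of `ml_data`.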

In [None]:
# Create interactive prediction interface
@widgets.interact
def predict_drug_interaction(
    molecular_weight=widgets.FloatSlider(value=350, min=100, max=800, step=10, description='Mol Weight:'),
    logp=widgets.FloatSlider(value=2.5, min=-2, max=8, step=0.1, description='LogP:'),
    cyp3a4_affinity=widgets.FloatSlider(value=0.5, min=0, max=1, step=0.1, description='CYP3A4:'),
    cyp2d6_affinity=widgets.FloatSlider(value=0.5, min=0, max=1, step=0.1, description='CYP2D6:'),
    cyp2c9_affinity=widgets.FloatSlider(value=0.5, min=0, max=1, step=0.1, description='CYP2C9:')
):
    """
    Interactive widget for drug interaction prediction
    """
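    # Prepare input as a DataFrame so the column names match those the scaler was fitted on
    input_data = pd.DataFrame([{
        'molecular_weight': molecular_weight,
        'logp': logp,
        'cyp3a4_affinity': cyp3a4_affinity,
        'cyp2d6_affinity': cyp2d6_affinity,
        'cyp2c9_affinity': cyp2c9_affinity
    }])
    input_scaled = scaler.transform(input_data)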
    
    # Make prediction
    prediction = model.predict(input_scaled)[0]
    probabilities = model.predict_proba(input_scaled)[0]
    
    # Display results
    risk_levels = ['Low Risk', 'Medium Risk', 'High Risk']
    
    print(f"\n🎯 Predicted Interaction Risk: {risk_levels[prediction]}")
    print("\n📊 Prediction Probabilities:")
    for risk, prob in zip(risk_levels, probabilities):
        bar = '█' * int(prob * 20)  # Text bar, 20 characters at probability 1.0
        print(f"  {risk}: {prob:.3f} {bar}")
    
    if prediction == 2:
        print("\n⚠️  HIGH RISK: Consider alternative drugs or dose adjustment")
    elif prediction == 1:
        print("\n⚡ MEDIUM RISK: Monitor closely for interactions")
    else:
        print("\n✅ LOW RISK: Minimal interaction concerns")
    
    # No return value: interact would display a returned tuple below the printed results

## How This Makes Research More Efficient

### 1. **Rapid Prototyping**
- Test new algorithms and visualizations immediately
- Iterate on data analysis without rebuilding entire applications
- Quick hypothesis testing and validation

### 2. **Enhanced Collaboration**  
- Share executable research with colleagues
- Combine code, results, and documentation in one file
- Version control integration for research reproducibility

### 3. **Advanced Analytics**
- Access to complete scientific Python ecosystem
- Interactive parameter exploration with immediate feedback
- Machine learning model development and validation

### 4. **Research Documentation**
- Self-documenting analysis with markdown explanations
- Reproducible workflows that others can run
- Clear methodology for publication and peer review

### 5. **Integration with Web Applications**
- Develop algorithms in Jupyter, deploy in Streamlit
- Test complex computations before web integration
- Maintain separate environments for research vs. production

This notebook complements the Streamlit application by providing a more flexible environment for research and development, while the web app provides user-friendly access for clinical applications.