---------------------------

# **Project Title:**

# **MEDSYNTH: PREDICTIVE ANALYTICS FOR SYNTHETIC PATIENT DATA**

**PART-3 (DEPLOYMENT)**



-------------------------------------------------------------

**Project Type:**
EDA + Hypothesis Testing + Supervised & Unsupervised Models + Synthetic Data Generation (CTGAN) + Neural Network + Deployment (Streamlit).

**Contribution:** Team

**Team Members:**


1.   Fida Taneem
2.   Isha Shrivastava
3.   Manthan M Y
4.   Nischitha K N
5.   Padakandla Venkata Naga Sai Hasini

---------------------------------------------------------

## **PART-2 SUMMARY**

- Built and evaluated classification models (Random Forest, XGBoost, Neural Network) to predict multimorbidity, addressing feature selection, class imbalance, and robust metrics (accuracy, ROC-AUC, recall).

- Initial tree-based models achieved >90% accuracy but <25% recall for multimorbid patients, indicating poor rare-class detection.

- Employed CTGAN to generate 200 realistic synthetic multimorbid samples, preserving original data distributions and outperforming SMOTE.

- The augmented dataset enabled the neural network to attain 98.5% accuracy, 96% recall and 94% precision for the multimorbid class, and a ROC-AUC of 0.998.

The final workflow validates the statistical associations and delivers a clinically actionable multimorbidity risk prediction system that corrects the poor rare-class detection of the initial models.

---------------------


# **Project Summary (PART-3)**

## **Model Deployment**


1. Exported the best-trained model along with preprocessing artifacts for real-world use.

2. Wrapped the model in a user-friendly web interface (a Streamlit app) so that clinicians or analysts can easily make predictions on new, unseen data.

3. Purpose: move from research to practical utility, making the predictive tool usable and actionable in real healthcare workflows.

--------------------------------------------

In [1]:
#!pip install streamlit pyngrok joblib tensorflow scikit-learn plotly seaborn


## **1. Upload Saved Files (Best Model and Scaler)**

In [12]:
from google.colab import files

# Upload saved files
uploaded = files.upload()
# Select best_nn_model.h5 and scaler.pkl, the two files app.py loads


Saving best_nn_model.h5 to best_nn_model.h5
Saving best_nn_model_joblib.pkl to best_nn_model_joblib (1).pkl
Saving scaler.pkl to scaler (1).pkl
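The output above shows that Colab saved two uploads as `best_nn_model_joblib (1).pkl` and `scaler (1).pkl` because files with those names already existed, so `app.py`, which loads `scaler.pkl`, may pick up an older copy. A minimal sketch that rewrites each upload under its original name, assuming (as in current Colab versions) that `files.upload()` returns a dict mapping original file names to their bytes; `save_uploads` and `REQUIRED_ARTIFACTS` are hypothetical helpers, not part of the original notebook:

```python
from pathlib import Path

# Hypothetical list: the artifacts app.py actually loads
REQUIRED_ARTIFACTS = ("best_nn_model.h5", "scaler.pkl")


def save_uploads(uploaded: dict, target_dir: str = ".") -> list:
    """Write each uploaded file under its original name, overwriting any older copy.

    `uploaded` is assumed to map original file names to raw bytes,
    e.g. the dict returned by google.colab.files.upload().
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for name, content in uploaded.items():
        path = target / Path(name).name  # keep only the base name
        path.write_bytes(content)        # overwrite instead of creating "name (1).ext"
        written.append(path.name)

    # Fail early if app.py would not find one of its artifacts
    missing = [f for f in REQUIRED_ARTIFACTS if not (target / f).exists()]
    if missing:
        raise FileNotFoundError(f"Missing artifacts for app.py: {missing}")
    return sorted(written)
```

In the notebook this would be called as `save_uploads(files.upload())`, so the names on disk always match the names hardcoded in `app.py`.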


---------------------------------

# **2. Streamlit App (app.py)**

Install Streamlit and pyngrok to serve the model as a web app and to create a public tunnel to the local Streamlit server.

Authenticate ngrok with your authtoken to enable secure tunnels.

In [13]:
!pip install streamlit pyngrok

from pyngrok import ngrok


# Replace YOUR_NGROK_AUTHTOKEN_HERE with your actual token; do not hardcode a real token in a shared notebook
!ngrok authtoken YOUR_NGROK_AUTHTOKEN_HERE


Authtoken saved to configuration file: /root/.config/ngrok/ngrok.yml
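Hardcoding the authtoken in a notebook exposes it to anyone the notebook is shared with. A minimal sketch that reads it instead from an environment variable or, inside Colab, from the Secrets panel via `google.colab.userdata.get`; the variable name `NGROK_AUTHTOKEN` and the helper `get_ngrok_token` are assumptions, and the token would then be registered with pyngrok's `ngrok.set_auth_token`:

```python
import os


def get_ngrok_token(name: str = "NGROK_AUTHTOKEN") -> str:
    """Return the ngrok authtoken from an environment variable or a Colab secret."""
    token = os.environ.get(name)
    if not token:
        try:
            from google.colab import userdata  # only importable inside Colab
            token = userdata.get(name)
        except Exception:  # not running in Colab, or the secret is missing / not shared
            token = None
    if not token:
        raise RuntimeError(f"Set {name} as an environment variable or Colab secret")
    return token


# Usage in the notebook (hypothetical secret name):
# from pyngrok import ngrok
# ngrok.set_auth_token(get_ngrok_token())
```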


In [14]:
# Install packages
!pip install streamlit pyngrok --quiet

## **UI Web App Design**

In [15]:
%%writefile app.py
import streamlit as st
import pandas as pd
import numpy as np
from tensorflow.keras.models import load_model
import joblib
import plotly.graph_objects as go

# --- Page config ---
st.set_page_config(
    page_title="Medical Risk Predictor",
    page_icon="❤️",
    layout="wide"
)

# --- Custom Banner ---
st.markdown("""
    <style>
    .banner {
        background: linear-gradient(to right, #4B6EFF, #61FFDA);
        color: white;
        text-align: center;
        padding: 15px;
        font-size: 22px;
        font-weight: bold;
        border-radius: 8px;
        margin-bottom: 20px;
        position: sticky;
        top: 0;
        z-index: 100;
    }
    </style>
    <div class="banner">🩺 Medical Risk Prediction Dashboard</div>
""", unsafe_allow_html=True)

# --- Intro Text ---
st.markdown("""
This tool predicts whether a patient is **multi-morbid** (has >2 medical conditions)
based on key **health** and **demographic** features.

You can:
- 🔹 Enter details manually for a **single patient**
- 🔹 Upload a **CSV file** for **batch predictions**

""")

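# --- Load model and scaler once; st.cache_resource reuses them across Streamlit reruns ---
@st.cache_resource
def load_artifacts():
    return load_model('best_nn_model.h5'), joblib.load('scaler.pkl')

model, scaler = load_artifacts()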

# --- Features ---
numeric_features = ['age','total_claims','num_conditions','num_meds','num_encounters']
categorical_features = ['gender','race']

# --- Sidebar Settings ---
st.sidebar.title("⚙️ Settings")
smoothing_factor = st.sidebar.slider("Smoothing Factor", 0.5, 1.0, 0.9)

# --- Probability smoothing: linear shrinkage toward 0.5 (not a sigmoid) ---
def smooth_prob(p, factor=0.95):
    return factor * p + (1 - factor) * 0.5

# --- Single Patient Input ---
st.subheader("📍 Single Patient Prediction")
with st.form("patient_form"):
    age = st.number_input("Age", min_value=0, max_value=120, value=45)
    total_claims = st.number_input("Total Claims", min_value=0, value=5)
    num_conditions = st.number_input("Number of Conditions", min_value=0, value=1)
    num_meds = st.number_input("Number of Medications", min_value=0, value=1)
    num_encounters = st.number_input("Number of Encounters", min_value=0, value=5)

    gender = st.selectbox("Gender", ["Male", "Female"])
    race = st.selectbox("Race", ["White", "Black", "Asian", "Other"])

    submitted = st.form_submit_button("🔍 Predict Risk")

if submitted:
    # Prepare dataframe
    data = pd.DataFrame([[age, total_claims, num_conditions, num_meds, num_encounters, gender, race]],
                        columns=numeric_features + categorical_features)

    # Encode categorical features (mapping must match the one used at training time)
    data['gender'] = data['gender'].map({"Male": 0, "Female": 1})
    data['race'] = data['race'].map({"White": 0, "Black": 1, "Asian": 2, "Other": 3})

    # Scale numeric features with the saved scaler
    data[numeric_features] = scaler.transform(data[numeric_features])

    # Predict (verbose=0 suppresses Keras' progress bar in the server log)
    raw_prob = float(model.predict(data, verbose=0)[0][0])
    pred_prob = smooth_prob(raw_prob, factor=smoothing_factor)
    pred_class = int(pred_prob > 0.5)

    # --- Styled Result ---
    if pred_class == 1:
        st.markdown(f"""
        <div style="background-color:#ffe6e6; padding:20px; border-radius:10px; border:1px solid red;">
            <h3 style="color:#cc0000;">⚠️ High Risk of Multi-Morbidity</h3>
            <p>Probability: <b>{pred_prob*100:.2f}%</b></p>
        </div>
        """, unsafe_allow_html=True)
    else:
        st.markdown(f"""
        <div style="background-color:#e6ffe6; padding:20px; border-radius:10px; border:1px solid green;">
            <h3 style="color:#008000;">✅ Low Risk of Multi-Morbidity</h3>
            <p>Probability: <b>{pred_prob*100:.2f}%</b></p>
        </div>
        """, unsafe_allow_html=True)

    # --- Probability Gauge ---
    fig = go.Figure(go.Indicator(
        mode="gauge+number",
        value=pred_prob*100,
        title={'text': "Risk Probability"},
        gauge={'axis': {'range': [0,100]},
               'bar': {'color': "red" if pred_class==1 else "green"}}
    ))
    st.plotly_chart(fig, use_container_width=True)

# --- Batch Prediction ---
st.subheader("📂 Batch Prediction (Upload CSV)")
uploaded_file = st.file_uploader("Upload CSV", type=['csv'])

if uploaded_file is not None:
    df = pd.read_csv(uploaded_file)

    # Features exactly as trained (numeric_features / categorical_features are defined above)
    required_cols = numeric_features + categorical_features

    # Check if all required columns exist
    if all(col in df.columns for col in required_cols):
        # --- Work on a copy of the model columns; df keeps original values for display/download ---
        X = df[required_cols].copy()

        # --- Encode categorical features (unknown values become NaN) ---
        X['gender'] = X['gender'].map({"Male": 0, "Female": 1})
        X['race'] = X['race'].map({"White": 0, "Black": 1, "Asian": 2, "Other": 3})

        unknown_rows = X[categorical_features].isna().any(axis=1)
        if unknown_rows.any():
            st.error(f"❌ Unrecognized gender/race values in rows: {X.index[unknown_rows.to_numpy()].tolist()}")
            st.stop()

        with st.spinner("Running predictions..."):
            # --- Scale numeric features ---
            X[numeric_features] = scaler.transform(X[numeric_features])

            # --- Generate probabilities ---
            if hasattr(model, "predict_proba"):  # scikit-learn / XGBoost
                raw_probs = model.predict_proba(X)[:, 1]  # probability of 'High'
            else:  # Keras / TensorFlow
                raw_probs = model.predict(X, verbose=0).flatten()
                # Apply sigmoid if NN output is not in [0, 1]
                if raw_probs.max() > 1 or raw_probs.min() < 0:
                    raw_probs = 1 / (1 + np.exp(-raw_probs))

            # --- Apply smoothing (vectorized) and label risk ---
            df['MultiMorbid_Probability'] = smooth_prob(raw_probs, factor=smoothing_factor)
            df['Risk'] = np.where(df['MultiMorbid_Probability'] > 0.5, 'High', 'Low')

        st.success("✅ Predictions Completed")
        st.dataframe(df)

        # --- Download option ---
        st.download_button(
            "📥 Download Predictions as CSV",
            df.to_csv(index=False),
            "predictions_corrected.csv",
            "text/csv"
        )

    else:
        st.error(f"❌ Uploaded CSV must contain these columns: {required_cols}")



Overwriting app.py
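Because Streamlit and Keras make `app.py` hard to test in isolation, one option is to pull the batch preprocessing into a plain function. A sketch, assuming the same feature names and category encodings as `app.py` (which must match the encodings used in training); `prepare_batch` and `ENCODINGS` are hypothetical names, and `scaler` can be any fitted object with a `transform` method, such as the loaded `scaler.pkl`:

```python
import numpy as np
import pandas as pd

# Feature lists and encodings copied from app.py (assumed to match training)
NUMERIC = ["age", "total_claims", "num_conditions", "num_meds", "num_encounters"]
CATEGORICAL = ["gender", "race"]
ENCODINGS = {
    "gender": {"Male": 0, "Female": 1},
    "race": {"White": 0, "Black": 1, "Asian": 2, "Other": 3},
}


def prepare_batch(df: pd.DataFrame, scaler) -> np.ndarray:
    """Turn a raw uploaded frame into the float32 matrix the model expects.

    The input frame is not modified, so its original values can be shown
    and downloaded next to the predictions.
    """
    required = NUMERIC + CATEGORICAL
    missing = [c for c in required if c not in df.columns]
    if missing:
        raise ValueError(f"Missing columns: {missing}")

    X = df[required].copy()  # extra columns such as patient IDs are dropped
    for col, mapping in ENCODINGS.items():
        X[col] = X[col].map(mapping)  # unknown categories become NaN

    unknown = X[CATEGORICAL].isna().any(axis=1)
    if unknown.any():
        bad_rows = X.index[unknown.to_numpy()].tolist()
        raise ValueError(f"Unrecognized gender/race values in rows: {bad_rows}")

    X = X.astype("float64")
    X[NUMERIC] = scaler.transform(X[NUMERIC])
    return X.to_numpy(dtype="float32")
```

Raising on unknown categories, rather than letting `NaN` reach the model, keeps a single malformed row from silently producing `NaN` probabilities for it.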


- Initializes a Streamlit dashboard titled Medical Risk Prediction Dashboard with a blue–teal gradient banner.

- Loads the trained neural network (best_nn_model.h5) and scaler.pkl for consistent preprocessing.

- Provides a single-patient form to input demographics and health metrics; encodes, scales, then predicts multimorbidity risk.

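- Applies a smoothing factor (set in the sidebar) that shrinks each probability toward 0.5; because this shrinkage never moves a probability across the 0.5 threshold, it softens the displayed score without changing the predicted class.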

- Displays results in styled alert boxes (red for high risk, green for low risk) plus a Plotly gauge chart.

- Supports batch predictions by CSV upload: validates required columns, encodes/scales data, computes smoothed probabilities, labels risk, and offers a CSV download of predictions.

- Uses st.download_button to export batch results and sets layout="wide" so the dashboard spans the full browser width.

Ready for deployment via streamlit run app.py; public access can be enabled using Pyngrok.
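The smoothing step is a linear shrinkage toward 0.5: p' = f·p + (1 − f)·0.5, which rearranges to p' − 0.5 = f·(p − 0.5). With 0 < f ≤ 1 every output therefore lies in [(1 − f)/2, (1 + f)/2], and p' > 0.5 exactly when p > 0.5. A small check of that property, reusing the `smooth_prob` formula from `app.py` on illustrative (not real) probabilities:

```python
import numpy as np


def smooth_prob(p, factor=0.95):
    # Same formula as app.py: shrink p linearly toward 0.5
    return factor * p + (1 - factor) * 0.5


raw = np.array([0.0, 0.02, 0.40, 0.55, 0.99, 1.0])  # illustrative probabilities
factor = 0.9
smoothed = smooth_prob(raw, factor=factor)

# smoothed - 0.5 == factor * (raw - 0.5), so the side of the 0.5 threshold never changes
assert np.allclose(smoothed - 0.5, factor * (raw - 0.5))
assert np.array_equal(raw > 0.5, smoothed > 0.5)

# Outputs are confined to [(1 - f) / 2, (1 + f) / 2], i.e. about [0.05, 0.95] for f = 0.9
print(smoothed.min(), smoothed.max())
```

This is why moving the sidebar slider changes the displayed probability and the gauge, but not the High/Low label.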


---------------

In [16]:
#Confirming app.py is saved

!ls -l app.py


-rw-r--r-- 1 root root 6223 Oct  8 13:05 app.py
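`ls` confirms that app.py exists but not that it is valid Python. A quick syntax check before launching, using the standard-library `py_compile` module (the helper name `check_app` is hypothetical); it does not import Streamlit or TensorFlow, so it catches only syntax errors, not missing artifacts or runtime failures:

```python
import py_compile


def check_app(path: str = "app.py") -> bool:
    """Return True if the file compiles; print the syntax error otherwise."""
    try:
        py_compile.compile(path, doraise=True)
        return True
    except py_compile.PyCompileError as err:
        print(err.msg)
        return False


# In the notebook: check_app("app.py")
```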


In [17]:
# Run Streamlit in the foreground (blocks the cell until interrupted)

!streamlit run app.py



Collecting usage statistics. To deactivate, set browser.gatherUsageStats to false.

  You can now view your Streamlit app in your browser.

  Local URL: http://localhost:8502
  Network URL: http://172.28.0.12:8502
  External URL: http://34.90.211.180:8502

  Stopping...
^C


------

### **Run Streamlit app in background**

In [18]:

get_ipython().system_raw('streamlit run app.py &')


### **Download the app.py to PC**

In [19]:
from google.colab import files

files.download('app.py')



### **Kill ngrok Sessions**

In [20]:
!pkill ngrok


### **Run Streamlit in background and expose via ngrok**

In [21]:
import os
from pyngrok import ngrok
import time

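# Pin the port so it matches the ngrok tunnel below; send server output to streamlit.log
os.system("nohup streamlit run app.py --server.port 8501 > streamlit.log 2>&1 &")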
time.sleep(5)
public_url = ngrok.connect(8501)
print("Open the UI Web App here:", public_url)


Open the UI Web App here: NgrokTunnel: "https://unregistrable-eleanora-hedgiest.ngrok-free.dev" -> "http://localhost:8501"


- Runs the Streamlit app in the background using nohup so it continues after the cell finishes.

- Pauses 5 seconds to allow the server to start.

- Opens an ngrok tunnel on port 8501, exposing the local Streamlit app to the internet.

- Prints the public ngrok URL for accessing the dashboard from any browser.
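The fixed `time.sleep(5)` may be too short on a slow runtime (ngrok would then tunnel to a port where nothing is listening yet) or longer than needed. A sketch of a readiness check that polls the port with the standard-library `socket` module before opening the tunnel; `wait_for_port` is a hypothetical helper, and 8501 is the port used in the cell above:

```python
import socket
import time


def wait_for_port(port: int, host: str = "127.0.0.1", timeout: float = 30.0) -> bool:
    """Poll until a TCP connection to host:port succeeds, or give up after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1):
                return True  # something is listening on the port
        except OSError:
            time.sleep(0.5)  # server not up yet; retry
    return False


# In the notebook, replacing time.sleep(5):
# if wait_for_port(8501):
#     public_url = ngrok.connect(8501)
```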

-------------------------------

### **Deployment Conclusion**


1. The trained neural network and scaler were saved as best_nn_model.h5, best_nn_model_joblib.pkl, and scaler.pkl; app.py loads best_nn_model.h5 and scaler.pkl and serves predictions through a Streamlit dashboard exposed via an ngrok tunnel.

2. This setup provides clinicians with an interactive web UI to input single-patient data or upload CSV batches, receiving real-time multimorbidity risk scores with styled alerts and gauge charts.

3. The dashboard applies the same encoding and saved scaler to every input, keeping preprocessing consistent with training, and exports batch results as CSV for use in existing healthcare workflows.

-----------------------


### **Future Enhancements**

- Integrate real-world EHR connectivity for automatic data ingestion and continuous model retraining.


- Deploy the app on a scalable cloud platform (AWS/GCP/Azure) with CI/CD pipelines and secure authentication.

- Extend the CTGAN framework to synthesize longitudinal patient trajectories, enabling time-series risk forecasting.

- Incorporate additional data modalities (lab results, imaging features, genomics) to enrich model inputs and improve robustness.

- Implement automated monitoring of model drift and performance, with alerts for retraining triggers.

------------------------

# **Overall Project Conclusion**


From initial data collection and hypothesis testing through model development and CTGAN-driven augmentation, this project established statistically validated associations between patient characteristics and multimorbidity, then built and compared multiple classifiers. Traditional models exposed the severity of the class imbalance, while the CTGAN-augmented neural network achieved near-perfect discrimination (ROC-AUC 0.998) and high recall (96%) for rare, high-risk cases. The end-to-end pipeline (EDA, hypothesis validation, synthetic data generation, model training, evaluation, and deployment) demonstrates a robust framework for developing clinically actionable predictive systems in healthcare.

# **END OF THE PIPELINE!**