## Install Necessary Libraries
This cell installs the required Python libraries for data processing, machine learning, API creation, and tunneling to make the API publicly accessible.

In [1]:
!pip install pandas scikit-learn fastapi pydantic joblib uvicorn nest-asyncio pyngrok



## Import Libraries
This cell imports all necessary libraries for data manipulation, model training, evaluation, and API deployment.

In [2]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split, GridSearchCV, cross_val_score
from sklearn.preprocessing import OneHotEncoder
from sklearn.impute import SimpleImputer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score
import joblib
from google.colab import files
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from pyngrok import conf, ngrok
import nest_asyncio
import uvicorn
from threading import Thread

## Upload and Load Dataset
This cell prompts the user to upload a dataset (e.g., `student_data.csv`) and loads it into a pandas DataFrame for further processing.

In [3]:
print("Please upload your dataset (e.g., student_data.csv):")
uploaded = files.upload()
data = pd.read_csv(list(uploaded.keys())[0])

Please upload your dataset (e.g., student_data.csv):


Saving data.csv to data (1).csv
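
Later cells read several columns by name, so a quick check right after loading can surface a mismatched file early. The following is a minimal sketch, assuming the column names used in the next cell; `REQUIRED_COLUMNS`, `missing_columns`, and the two-row sample CSV are hypothetical stand-ins for the uploaded data.

```python
import io
import pandas as pd

# Columns the later cells read directly; extend this list for your own dataset.
REQUIRED_COLUMNS = ["Exam_Score", "Attendance", "Extracurricular_Activities", "Motivation_Level"]

def missing_columns(df: pd.DataFrame, required=REQUIRED_COLUMNS) -> list:
    """Return the required column names that are absent from df."""
    return [col for col in required if col not in df.columns]

# Hypothetical two-row CSV standing in for the uploaded file.
sample_csv = io.StringIO(
    "Exam_Score,Attendance,Motivation_Level\n"
    "72,90,High\n"
    "55,60,Low\n"
)
sample = pd.read_csv(sample_csv)
absent = missing_columns(sample)
print("Missing columns:", absent)
```

In the notebook, the same helper could be called as `missing_columns(data)` before the target variables are defined.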


## Define Target Variables
This cell defines two target variables:
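- `Needs_Support`: 1 if `Exam_Score` is below 60, else 0. `Exam_Score` is then dropped so that it cannot leak into the features.
- `Engagement_Level`: a three-level categorical target (`Low`, `Medium`, `High`) derived from `Attendance`, `Extracurricular_Activities`, and `Motivation_Level`.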

In [4]:
threshold = 60  # Adjustable threshold for support
data['Needs_Support'] = (data['Exam_Score'] < threshold).astype(int)
data = data.drop('Exam_Score', axis=1)

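# Define Engagement_Level based on a sum of mapped values.
# NOTE: the map only recognizes 'Low'/'Medium'/'High'. Any other value (for example a
# numeric Attendance or a 'Yes'/'No' answer) becomes NaN and is skipped by sum().
# If only Motivation_Level uses these labels, Engagement_Level is effectively a copy of it.
data['Engagement_Level'] = pd.cut(
    data[['Attendance', 'Extracurricular_Activities', 'Motivation_Level']].apply(
        lambda x: x.map({'Low': 1, 'Medium': 2, 'High': 3}).sum(), axis=1),
    bins=3, labels=['Low', 'Medium', 'High']
)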
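
If the three source columns use different encodings, each one needs its own mapping before the values are summed. The sketch below is one possible way to do that, not the notebook's method: it assumes `Attendance` is a percentage, `Extracurricular_Activities` is `Yes`/`No`, and `Motivation_Level` is `Low`/`Medium`/`High`. The toy rows and all bin edges are illustrative assumptions.

```python
import pandas as pd

# Toy rows; the column types are an assumption about the dataset.
toy = pd.DataFrame({
    "Attendance": [95, 80, 60, 88],
    "Extracurricular_Activities": ["Yes", "No", "No", "Yes"],
    "Motivation_Level": ["High", "Medium", "Low", "Low"],
})

# Put each column on the same 1-3 scale before summing.
attendance_score = pd.cut(
    toy["Attendance"], bins=[0, 70, 85, 100], labels=[1, 2, 3]
).astype(int)
activity_score = toy["Extracurricular_Activities"].map({"No": 1, "Yes": 3})
motivation_score = toy["Motivation_Level"].map({"Low": 1, "Medium": 2, "High": 3})

engagement_score = attendance_score + activity_score + motivation_score

# Fixed bin edges keep the labels stable across datasets (scores range from 3 to 9).
toy["Engagement_Level"] = pd.cut(
    engagement_score, bins=[2, 4, 6, 9], labels=["Low", "Medium", "High"]
)
print(toy)
```

Explicit bin edges are used here instead of `bins=3` so that the meaning of each label does not shift with the range of scores present in a particular file.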

## Preprocess Data
This cell fills missing values in categorical columns with the most frequent value and one-hot encodes those columns. Numerical columns are left unchanged.

In [5]:
categorical_cols = data.select_dtypes(include=['object']).columns
numerical_cols = data.select_dtypes(exclude=['object']).columns

# Impute missing values in categorical columns
imputer = SimpleImputer(strategy='most_frequent')
data[categorical_cols] = imputer.fit_transform(data[categorical_cols])

# One-hot encode categorical variables
encoder = OneHotEncoder(sparse_output=False, handle_unknown='ignore')
encoded_cols = pd.DataFrame(
    encoder.fit_transform(data[categorical_cols]),
    columns=encoder.get_feature_names_out(categorical_cols),
    index=data.index,  # keep rows aligned with the original frame for the concat below
)
data = pd.concat([data, encoded_cols], axis=1).drop(categorical_cols, axis=1)

## Split Data for Models
This cell splits the data into training and testing sets for the `Needs_Support` and `Engagement_Level` models.

In [6]:
# Split for Needs_Support model
X_support = data.drop(['Needs_Support', 'Engagement_Level'], axis=1)
y_support = data['Needs_Support']
X_support_train, X_support_test, y_support_train, y_support_test = train_test_split(X_support, y_support, test_size=0.2, random_state=42)

# Split for Engagement_Level model
X_engage = data.drop(['Needs_Support', 'Engagement_Level'], axis=1)
y_engage = data['Engagement_Level']
X_engage_train, X_engage_test, y_engage_train, y_engage_test = train_test_split(X_engage, y_engage, test_size=0.2, random_state=42)
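
When one class is rare, a plain random split can leave the test set with very few positive examples. The sketch below shows the `stratify` argument of `train_test_split` on hypothetical labels with 10% positives; the arrays are synthetic and only illustrate the effect.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 20 positives out of 200 rows.
y = np.array([1] * 20 + [0] * 180)
X = np.arange(len(y)).reshape(-1, 1)

# stratify=y keeps the class proportions the same in both partitions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print("Positive rate (train):", y_train.mean())
print("Positive rate (test):", y_test.mean())
```

In the notebook, the equivalent change would be passing `stratify=y_support` (or `stratify=y_engage`) to the existing calls.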

## Train and Evaluate Needs_Support Model
This cell trains a RandomForestClassifier for `Needs_Support`, tunes hyperparameters with GridSearchCV, performs cross-validation, and evaluates performance.

In [7]:
clf_support = RandomForestClassifier(random_state=42, class_weight='balanced')
param_grid = {
    'n_estimators': [100, 200],
    'max_depth': [10, 20, None],
    'min_samples_split': [2, 5]
}
grid_search = GridSearchCV(clf_support, param_grid, cv=5, scoring='f1')
grid_search.fit(X_support_train, y_support_train)
clf_support = grid_search.best_estimator_

# Cross-validation
cv_scores = cross_val_score(clf_support, X_support_train, y_support_train, cv=5, scoring='accuracy')
print(f"Cross-Validation Accuracy (Support Model): {cv_scores.mean():.2f} (±{cv_scores.std():.2f})")

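# Evaluate on the held-out test set.
# GridSearchCV already refit best_estimator_ on the full training set (refit=True by default),
# so this fit is redundant but harmless.
clf_support.fit(X_support_train, y_support_train)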
y_support_pred = clf_support.predict(X_support_test)
print(f"Support Model - Accuracy: {accuracy_score(y_support_test, y_support_pred):.2f}")
print(f"Support Model - Precision: {precision_score(y_support_test, y_support_pred):.2f}")
print(f"Support Model - Recall: {recall_score(y_support_test, y_support_pred):.2f}")

Cross-Validation Accuracy (Support Model): 0.99 (±0.00)
Support Model - Accuracy: 0.99
Support Model - Precision: 1.00
Support Model - Recall: 0.09
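
A recall of 0.09 means the model flags only about one in eleven of the students who actually need support. Combined with a precision of 1.00, the 0.99 accuracy implies that the positive class is rare. One common way to trade precision for recall is to lower the decision threshold applied to `predict_proba`. The sketch below uses synthetic data from `make_classification` as a stand-in for the student dataset; the thresholds are arbitrary examples.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data standing in for the student dataset (about 5% positives).
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.95, 0.05], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

clf = RandomForestClassifier(n_estimators=100, random_state=42, class_weight="balanced")
clf.fit(X_train, y_train)

# Probability of the positive class (column 1, because clf.classes_ is [0, 1]).
proba = clf.predict_proba(X_test)[:, 1]

recalls = {}
for threshold in (0.5, 0.3, 0.1):
    pred = (proba >= threshold).astype(int)
    recalls[threshold] = recall_score(y_test, pred)
    precision = precision_score(y_test, pred, zero_division=0)
    print(f"threshold={threshold:.1f}  precision={precision:.2f}  recall={recalls[threshold]:.2f}")
```

Lowering the threshold can only keep or increase recall, usually at the cost of precision, so the right value depends on how costly a missed student is compared with a false alarm.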


## Train and Evaluate Engagement_Level Model
This cell trains a RandomForestClassifier for `Engagement_Level` and evaluates its accuracy on the test set.

In [8]:
clf_engage = RandomForestClassifier(random_state=42)
clf_engage.fit(X_engage_train, y_engage_train)
y_engage_pred = clf_engage.predict(X_engage_test)
print(f"Engagement Model - Accuracy: {accuracy_score(y_engage_test, y_engage_pred):.2f}")

Engagement Model - Accuracy: 1.00
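
A perfect test score often indicates that the target can be reconstructed directly from one of the input columns. The helper below is a hypothetical check, meant to be run on the raw data before one-hot encoding; the toy frame is illustrative only.

```python
import pandas as pd

def is_determined_by(df: pd.DataFrame, feature: str, target: str) -> bool:
    """True if every value of `feature` maps to exactly one value of `target`."""
    return bool((df.groupby(feature)[target].nunique() <= 1).all())

# Toy frame in which the target simply copies one input column (hypothetical data).
toy = pd.DataFrame({
    "Motivation_Level": ["Low", "High", "Medium", "Low", "High"],
    "Sleep_Hours": [7, 7, 7, 8, 8],
    "Engagement_Level": ["Low", "High", "Medium", "Low", "High"],
})

for col in ["Motivation_Level", "Sleep_Hours"]:
    print(col, "->", is_determined_by(toy, col, "Engagement_Level"))
```

If a feature fully determines the target, dropping the columns used to build the target from `X_engage` gives a more honest estimate of how well engagement can be predicted.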


## Save Models and Encoder
This cell saves the trained models and the one-hot encoder to disk for later use in the API.

In [9]:
joblib.dump(clf_support, 'support_model.joblib')
joblib.dump(clf_engage, 'engage_model.joblib')
joblib.dump(encoder, 'encoder.joblib')

['encoder.joblib']
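
Files written by `joblib.dump` are pickles, so they are generally expected to be loaded with the same library versions that produced them. One option, sketched here with a `DummyClassifier` standing in for the trained models, is to store the artifacts together with version metadata in a single bundle. The dictionary keys are hypothetical.

```python
import os
import tempfile

import joblib
import sklearn
from sklearn.dummy import DummyClassifier

# Stand-in model; in the notebook this would be clf_support, clf_engage and encoder.
model = DummyClassifier(strategy="most_frequent").fit([[0], [1], [2]], [0, 0, 1])

bundle = {
    "support_model": model,
    "sklearn_version": sklearn.__version__,  # recorded so a mismatch can be detected at load time
    "threshold": 60,
}

# Write to a temporary directory so the example leaves no files behind in the workspace.
path = os.path.join(tempfile.mkdtemp(), "bundle.joblib")
joblib.dump(bundle, path)

restored = joblib.load(path)
print(restored["sklearn_version"], restored["support_model"].predict([[5]]))
```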

## Define FastAPI App
This cell sets up a FastAPI application, loads the saved models and encoder, and defines a `/predict` endpoint to make predictions.

In [10]:
from fastapi import FastAPI, HTTPException
from contextlib import asynccontextmanager
from pydantic import BaseModel
import pandas as pd
import joblib
# Define the lifespan handler
@asynccontextmanager
async def lifespan(app: FastAPI):
    # Startup code: runs when the app starts
    print("Starting up the FastAPI application...")
    print("Support model classes:", model_support.classes_)  # Verify class order
    yield  # This is where the app runs
    # Shutdown code (optional): runs when the app stops
    print("Shutting down the FastAPI application...")

# Initialize the app with the lifespan handler
app = FastAPI(lifespan=lifespan)

# Load models and encoder (assuming these are already saved)
model_support = joblib.load('support_model.joblib')
model_engage = joblib.load('engage_model.joblib')
encoder = joblib.load('encoder.joblib')

# Request schema: one field per raw input column
class StudentData(BaseModel):
    Hours_Studied: int
    Attendance: int
    Parental_Involvement: str
    Access_to_Resources: str
    Extracurricular_Activities: str
    Sleep_Hours: int
    Previous_Scores: int
    Motivation_Level: str
    Internet_Access: str
    Tutoring_Sessions: int
    Family_Income: str
    Teacher_Quality: str
    School_Type: str
    Peer_Influence: str
    Physical_Activity: int
    Learning_Disabilities: str
    Parental_Education_Level: str
    Distance_from_Home: str
    Gender: str

# Define the /predict endpoint
@app.post("/predict")
async def predict(student: StudentData):
    try:
        # Convert input to DataFrame
        input_data = pd.DataFrame([student.model_dump()])

        # Preprocess categorical columns with the encoder, using exactly the
        # columns (and column order) the encoder was fitted on
        categorical_cols = list(encoder.feature_names_in_)
        encoded_input = pd.DataFrame(
            encoder.transform(input_data[categorical_cols]),
            columns=encoder.get_feature_names_out(categorical_cols),
        )
        input_data = pd.concat([input_data, encoded_input], axis=1).drop(categorical_cols, axis=1)

        # Prediction for Needs_Support
        input_support = input_data.reindex(columns=model_support.feature_names_in_, fill_value=0)
        pred_support = model_support.predict(input_support)[0]
        prob_support = model_support.predict_proba(input_support)[0]

        # Prediction for Engagement_Level
        input_engage = input_data.reindex(columns=model_engage.feature_names_in_, fill_value=0)
        pred_engage = model_engage.predict(input_engage)[0]

        # Convert Needs_Support to descriptive string
        support_status = "Needs support" if pred_support == 1 else "Does not need support"

        # Convert probabilities to a labeled dictionary
        support_prob = {
            "Probability of not needing support": float(prob_support[0]),  # Convert to float for JSON serialization
            "Probability of needing support": float(prob_support[1])       # Convert to float for JSON serialization
        }

        return {
            "Needs_Support": support_status,
            "Support_Probability": support_prob,
            "Engagement_Level": pred_engage
        }
    except Exception as e:
        raise HTTPException(status_code=400, detail=str(e))
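
The endpoint relies on `DataFrame.reindex` to line up each request with the columns the models were trained on. The sketch below isolates that step; the column names are hypothetical examples of what one-hot encoding might produce.

```python
import pandas as pd

# Column order a model was trained on (hypothetical names).
training_columns = ["Hours_Studied", "Gender_Female", "Gender_Male", "School_Type_Private"]

# One request after encoding: a column is missing and the order differs.
request_row = pd.DataFrame([{"Gender_Male": 1.0, "Hours_Studied": 5, "Gender_Female": 0.0}])

# reindex reorders the columns and fills any column absent from the request with 0.
aligned = request_row.reindex(columns=training_columns, fill_value=0)
print(aligned)
```

Columns that are present in the request but not in `training_columns` would be dropped by the same call, which is why the models never see unexpected inputs.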

In [11]:
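# NOTE: redundant with the lifespan handler defined above. `on_event` is deprecated, and
# FastAPI does not call startup/shutdown event handlers once a `lifespan` is passed to the app.
@app.on_event("startup")
async def startup_event():
    print("Support model classes:", model_support.classes_)

on_event is deprecated, use lifespan event handlers instead.

Read more about it in the
[FastAPI docs for Lifespan Events](https://fastapi.tiangolo.com/advanced/events/).

  @app.on_event("startup")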


## Set Up Ngrok and Run Server
This cell configures ngrok with your authentication token, starts the FastAPI server in a background thread, and provides a public URL for access.

In [12]:
import socket
from threading import Thread
import uvicorn
from pyngrok import ngrok

# Function to find an available port
def get_free_port():
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(('', 0))  # Bind to port 0 to let the OS assign a free port
        return s.getsockname()[1]  # Return the assigned port

# Get an available port
port = get_free_port()
print(f"Selected port {port} for the server.")

# Define the server function
def run_server():
    print(f"Starting the server on port {port}...")
    uvicorn.run(app, host="0.0.0.0", port=port)  # 'app' is your FastAPI app

# Start the server in a thread
thread = Thread(target=run_server)
thread.start()

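# Set your ngrok auth token. Prompt for it at runtime instead of hard-coding it:
# a token saved in a notebook is exposed to anyone who can read the file.
from getpass import getpass
ngrok.set_auth_token(getpass("ngrok auth token: "))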

# Connect ngrok to the selected port
print(f"Connecting ngrok to port {port}...")
ngrok_tunnel = ngrok.connect(port)
print(f"Server is now accessible at {ngrok_tunnel.public_url}")

Selected port 46207 for the server.
Starting the server on port 46207...


INFO:     Started server process [2354]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:46207 (Press CTRL+C to quit)


Connecting ngrok to port 46207...
Starting up the FastAPI application...
Support model classes: [0 1]
Server is now accessible at https://2573-35-197-33-30.ngrok-free.app
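
Because the server starts in a background thread, nothing guarantees that it is already listening when the tunnel is opened. A minimal sketch of a readiness check follows, using only the standard library; `wait_for_port` is a hypothetical helper, and a plain listening socket stands in for the uvicorn server.

```python
import socket
import time

def wait_for_port(host: str, port: int, timeout: float = 10.0) -> bool:
    """Poll until a TCP connection to (host, port) succeeds or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=0.5):
                return True
        except OSError:
            time.sleep(0.1)
    return False

# Demonstration with a plain listening socket standing in for the uvicorn server.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
demo_port = listener.getsockname()[1]

ready = wait_for_port("127.0.0.1", demo_port, timeout=2.0)
print("Server ready:", ready)
listener.close()
```

In the notebook, a call such as `wait_for_port("127.0.0.1", port)` could be placed between `thread.start()` and `ngrok.connect(port)`.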


## Test the API
This cell demonstrates how to test the `/predict` endpoint using the `requests` library with sample student data.

In [13]:
import requests

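# Example input data. Adjust the category strings to match your dataset: because the encoder
# uses handle_unknown='ignore', labels it never saw in training are silently encoded as all zeros.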
student_input = {
    "Hours_Studied": 5,
    "Attendance": 90,
    "Parental_Involvement": "High",
    "Access_to_Resources": "Yes",
    "Extracurricular_Activities": "Yes",
    "Sleep_Hours": 7,
    "Previous_Scores": 75,
    "Motivation_Level": "Medium",
    "Internet_Access": "Yes",
    "Tutoring_Sessions": 2,
    "Family_Income": "Middle",
    "Teacher_Quality": "Good",
    "School_Type": "Public",
    "Peer_Influence": "Positive",
    "Physical_Activity": 3,
    "Learning_Disabilities": "No",
    "Parental_Education_Level": "College",
    "Distance_from_Home": "Near",
    "Gender": "Male"
}

# Send POST request to the ngrok public URL
public_url = ngrok_tunnel.public_url  # Use the URL from the previous cell
response = requests.post(f"{public_url}/predict", json=student_input)
print("Prediction:", response.json())

INFO:     35.197.33.30:0 - "POST /predict HTTP/1.1" 200 OK
Prediction: {'Needs_Support': 'Does not need support', 'Support_Probability': {'Probability of not needing support': 0.9823287658646125, 'Probability of needing support': 0.017671234135387407}, 'Engagement_Level': 'Medium'}
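
A client that consumes this endpoint may want to confirm the response shape before using it. The sketch below checks the three keys shown in the output above and that the two probabilities sum to one; `check_prediction` is a hypothetical helper, and the sample payload is copied from that output.

```python
def check_prediction(payload: dict) -> None:
    """Raise ValueError if the payload does not look like a /predict response."""
    expected_keys = {"Needs_Support", "Support_Probability", "Engagement_Level"}
    missing = expected_keys - payload.keys()
    if missing:
        raise ValueError(f"Missing keys: {sorted(missing)}")
    total = sum(payload["Support_Probability"].values())
    if abs(total - 1.0) > 1e-6:
        raise ValueError(f"Probabilities sum to {total}, expected 1.0")

# The response printed above, copied verbatim.
sample_response = {
    "Needs_Support": "Does not need support",
    "Support_Probability": {
        "Probability of not needing support": 0.9823287658646125,
        "Probability of needing support": 0.017671234135387407,
    },
    "Engagement_Level": "Medium",
}
check_prediction(sample_response)
print("Response looks well-formed")
```

With the live API, the same helper could be applied to `response.json()` after confirming that `response.status_code` is 200.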
