# 🌟 Student Focus & Burnout Risk Predictor (ML Project)

Welcome to this beginner-friendly Machine Learning project! 

**Goal:** We will build a model that predicts whether a student is at **Low**, **Medium**, or **High** risk of burnout based on their daily habits.

### We will look at:
1. **Study Hours**: How many hours they study per day.
2. **Sleep Hours**: How many hours they sleep per night.
3. **Screen Time**: Hours per day spent on phones/laptops.
4. **Exercise Minutes**: Daily physical activity, in minutes.
5. **Stress Level**: Self-reported stress (1-10).
6. **Attendance %**: Percentage of classes attended.

Let's get started! 🚀

## 1. Import Libraries 📚

```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
import joblib

# Set a nice style for our plots
sns.set_theme(style="whitegrid")
```

## 2. Simulate the Data 🎲

Since we don't have a real dataset, we will simulate one with NumPy: 500 fictitious students whose habits are drawn at random and whose stress level and burnout risk follow simple, hand-written rules.
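The simulation cell uses two hand-written rules. Written out as formulas, the logic is easier to follow. Here $\mathbb{1}[\cdot]$ is 1 when the condition holds and 0 otherwise, and $\varepsilon$ is integer noise drawn uniformly from -2 to 2:

$$
\text{stress} = \min\Big(10,\ \max\big(1,\ 5 + 2\,\mathbb{1}[\text{sleep} < 6] + \mathbb{1}[\text{study} > 7] + \mathbb{1}[\text{screen} > 8] - 2\,\mathbb{1}[\text{exercise} > 60] + \varepsilon\big)\Big)
$$

$$
\text{score} = 1.5\cdot\text{stress} + 0.5\cdot\text{screen} + 1.0\cdot(10 - \text{sleep}) - 0.5\cdot\frac{\text{exercise}}{60}
$$

$$
\text{risk} = \begin{cases} \text{High} & \text{score} > 15 \\ \text{Medium} & 10 < \text{score} \le 15 \\ \text{Low} & \text{score} \le 10 \end{cases}
$$

As a worked example, take the student used in the prediction section (stress 8, screen time 9 h, sleep 5 h, exercise 10 min). The labeling rule would mark this student High:

$$
1.5 \cdot 8 + 0.5 \cdot 9 + (10 - 5) - 0.5 \cdot \tfrac{10}{60} = 12 + 4.5 + 5 - 0.083 \approx 21.42 > 15 \;\Rightarrow\; \text{High}
$$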

```python
# Set random seed for reproducibility
np.random.seed(42)

n_samples = 500

# Generate random features (randint's upper bound is exclusive)
study_hours = np.random.randint(1, 10, n_samples)        # 1-9 hours
sleep_hours = np.random.randint(4, 10, n_samples)        # 4-9 hours
screen_time = np.random.randint(1, 12, n_samples)        # 1-11 hours
exercise_minutes = np.random.randint(0, 120, n_samples)  # 0-119 minutes
attendance = np.random.randint(50, 100, n_samples)       # 50-99 %

# Generate Stress Level from the other habits plus some random noise:
# less sleep and more study/screen time -> higher stress; regular exercise -> lower stress
stress_level = []
for i in range(n_samples):
    base_stress = 5
    if sleep_hours[i] < 6: base_stress += 2
    if study_hours[i] > 7: base_stress += 1
    if screen_time[i] > 8: base_stress += 1
    if exercise_minutes[i] > 60: base_stress -= 2

    # Add random noise (-2 to +2) and clip to the 1-10 range
    final_stress = base_stress + np.random.randint(-2, 3)
    stress_level.append(max(1, min(10, final_stress)))

stress_level = np.array(stress_level)

# Create DataFrame
data = pd.DataFrame({
    'Study Hours': study_hours,
    'Sleep Hours': sleep_hours,
    'Screen Time': screen_time,
    'Exercise Min': exercise_minutes,
    'Stress (1-10)': stress_level,
    'Attendance %': attendance
})

# Define the target: Burnout Risk
# Each student gets a weighted "risk score" that rises with stress, screen time,
# and lack of sleep, and falls slightly with exercise. Thresholds turn it into a label.
# Study Hours and Attendance are not in the score (study hours only act through stress).
def calculate_burnout(row):
    score = 0
    score += row['Stress (1-10)'] * 1.5
    score += row['Screen Time'] * 0.5
    score += (10 - row['Sleep Hours']) * 1.0
    score -= (row['Exercise Min'] / 60) * 0.5

    if score > 15:
        return 'High'
    elif score > 10:
        return 'Medium'
    else:
        return 'Low'

data['Burnout Risk'] = data.apply(calculate_burnout, axis=1)

# Show first few rows
data.head()
```
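`data.apply(calculate_burnout, axis=1)` calls a Python function once per row. That is fine for 500 students but slow for large tables. A vectorized alternative computes the score for every row at once with pandas arithmetic and picks the label with `np.select`. Below is a minimal sketch using the same weights and thresholds. `burnout_labels_vectorized` is our own hypothetical helper name, and it assumes the column names used in the DataFrame above.

```python
import numpy as np
import pandas as pd


def burnout_labels_vectorized(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: same weighted score and thresholds as calculate_burnout, for all rows at once."""
    score = (
        df['Stress (1-10)'] * 1.5
        + df['Screen Time'] * 0.5
        + (10 - df['Sleep Hours']) * 1.0
        - (df['Exercise Min'] / 60) * 0.5
    ).to_numpy()
    # np.select takes, for each row, the choice of the first condition that is True
    labels = np.select([score > 15, score > 10], ['High', 'Medium'], default='Low')
    return pd.Series(labels, index=df.index, name='Burnout Risk')


# Tiny hand-made example (values chosen for illustration only)
example = pd.DataFrame({
    'Stress (1-10)': [8, 5, 2],
    'Screen Time': [9, 4, 2],
    'Sleep Hours': [5, 7, 9],
    'Exercise Min': [10, 30, 90],
})
print(burnout_labels_vectorized(example).tolist())  # ['High', 'Medium', 'Low']
```

In the notebook, `burnout_labels_vectorized(data)` should give the same labels as the `apply` version. The additions happen in the same order, so the scores match exactly.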

## 3. Exploratory Data Analysis (EDA) 📊
Let's visualize the relationships using Matplotlib!

```python
# Map each risk level to a color, reused by both plots
colors = {'Low': 'green', 'Medium': 'orange', 'High': 'red'}

# Count students per risk level, in a fixed Low -> Medium -> High order
risk_counts = data['Burnout Risk'].value_counts().reindex(['Low', 'Medium', 'High'], fill_value=0)
print(risk_counts)

# 1. Bar Chart: How many students are in each risk category?
plt.figure(figsize=(8, 5))
plt.bar(risk_counts.index, risk_counts.values, color=[colors[risk] for risk in risk_counts.index])
plt.title('Distribution of Student Burnout Risk')
plt.xlabel('Risk Level')
plt.ylabel('Count of Students')
plt.show()

# 2. Scatter Plot: Study Hours vs Stress, color-coded by risk level
plt.figure(figsize=(8, 5))

for risk, color in colors.items():
    subset = data[data['Burnout Risk'] == risk]
    plt.scatter(subset['Study Hours'], subset['Stress (1-10)'], color=color, label=risk, alpha=0.6, edgecolors='w', s=100)

plt.title('Study Hours vs. Stress Level')
plt.xlabel('Hours Studied')
plt.ylabel('Stress Level (1-10)')
plt.legend(title='Burnout Risk')
plt.grid(True, linestyle='--', alpha=0.5)
plt.show()
```
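Plots are easy to eyeball. A numeric companion is the average of each habit within each risk group. Here is a minimal sketch with pandas `groupby`, shown on a small hand-made DataFrame (made-up values, same column names as above) so it runs on its own. In the notebook you would call it on `data` directly.

```python
import pandas as pd

# Small stand-in for `data` (made-up values for illustration only)
sample = pd.DataFrame({
    'Sleep Hours': [5, 4, 7, 6, 9, 8],
    'Stress (1-10)': [9, 8, 6, 5, 2, 3],
    'Burnout Risk': ['High', 'High', 'Medium', 'Medium', 'Low', 'Low'],
})

# Average of each numeric column per risk group, in Low -> Medium -> High order
group_means = (
    sample.groupby('Burnout Risk')[['Sleep Hours', 'Stress (1-10)']]
    .mean()
    .reindex(['Low', 'Medium', 'High'])
)
print(group_means)
```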

## 4. Preprocessing ⚙️
Models work with numbers, not words. All six features are already numeric, so the only text column is the target, 'Burnout Risk' (Low/Medium/High). We could convert it to numbers, but scikit-learn classifiers accept string class labels directly and encode them internally, so we pass the labels as they are. What remains is to separate the features (`X`) from the target (`y`) and hold out 20% of the students for testing.
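If you do want numeric labels (for example, for a tool that expects them), one common option is an explicit ordered mapping. The sketch below contrasts it with scikit-learn's `LabelEncoder`, which assigns codes in alphabetical order of the class names and therefore loses the Low < Medium < High order. The mapping dict `risk_order` and the toy labels are our own choices, not part of the original notebook.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

y_demo = pd.Series(['Low', 'High', 'Medium', 'Low'])

# Option 1: explicit mapping that keeps the natural order Low < Medium < High
risk_order = {'Low': 0, 'Medium': 1, 'High': 2}
y_ordered = y_demo.map(risk_order)
print(y_ordered.tolist())  # [0, 2, 1, 0]

# Option 2: LabelEncoder assigns codes in alphabetical order of the class names
encoder = LabelEncoder()
y_encoded = encoder.fit_transform(y_demo)
print(encoder.classes_.tolist())  # ['High', 'Low', 'Medium']
print(y_encoded.tolist())         # [1, 0, 2, 1]

# LabelEncoder codes can be turned back into names
print(encoder.inverse_transform([0, 1, 2]).tolist())  # ['High', 'Low', 'Medium']
```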

```python
# Inputs (Features) and Output (Target)
X = data.drop('Burnout Risk', axis=1)
y = data['Burnout Risk']

# Split into Training and Testing sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print(f"Training Shape: {X_train.shape}")
print(f"Testing Shape: {X_test.shape}")
```
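The three risk classes are not equally common, so a purely random 80/20 split can leave the test set with a somewhat different class mix than the training set. Passing `stratify=y` to `train_test_split` keeps the class proportions the same in both parts. Here is a small self-contained sketch with made-up, imbalanced labels (80 Low, 15 Medium, 5 High). The `_demo` names are used so the sketch does not overwrite the notebook's `X` and `y`.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up, imbalanced labels: 80 Low, 15 Medium, 5 High
y_demo = pd.Series(['Low'] * 80 + ['Medium'] * 15 + ['High'] * 5)
X_demo = pd.DataFrame({'feature': range(len(y_demo))})

X_train_demo, X_test_demo, y_train_demo, y_test_demo = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Each class keeps its share (80% / 15% / 5%) in both splits
print(y_train_demo.value_counts().to_dict())
print(y_test_demo.value_counts().to_dict())
```

In the notebook, the equivalent change would be adding `stratify=y` to the existing `train_test_split` call.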

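## 5. Model Training 🤖
We will use a **Random Forest Classifier**. A random forest trains many decision trees, each on a random bootstrap sample of the students and considering a random subset of the features at each split, and then combines their predictions. It works well out of the box on small tabular datasets like this one and needs no feature scaling, so hours, minutes, and percentages can be used as they are.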

```python
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

print("Model Trained Successfully! ✅")
```
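A fitted `RandomForestClassifier` exposes `feature_importances_`, which measures how much each feature reduced impurity across all trees, normalized to sum to 1. The self-contained sketch below uses made-up data where only one column (`signal`) drives the label. In the notebook, `pd.Series(model.feature_importances_, index=X.columns)` would rank the six habits the same way. Impurity-based importances can be biased toward features with many distinct values, so treat them as a rough guide.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 300

# Made-up data: the label depends only on 'signal'; 'noise' is pure randomness
X_demo = pd.DataFrame({
    'signal': rng.integers(0, 10, n),
    'noise': rng.integers(0, 10, n),
})
y_demo = np.where(X_demo['signal'] > 5, 'High', 'Low')

forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X_demo, y_demo)

# Importance per feature, largest first
importances = pd.Series(forest.feature_importances_, index=X_demo.columns)
print(importances.sort_values(ascending=False))
```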

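## 6. Evaluation 📉
How well does the model perform on the 20% of students it never saw during training? Keep in mind that the labels were produced by a fixed formula from these same features, so a high score here is likely and says little about how the model would do on real students.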

```python
# Predict the risk level for every student in the held-out test set
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")

print("\nClassification Report:\n")
print(classification_report(y_test, y_pred))
```
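`confusion_matrix` is imported in the first cell but never used. It complements accuracy and the classification report by showing which classes get mistaken for which. Here is a minimal sketch with made-up labels and predictions. Passing `labels=` fixes the row and column order to Low, Medium, High; otherwise scikit-learn sorts the labels alphabetically. In the notebook you would pass `y_test` and `y_pred` instead.

```python
import pandas as pd
from sklearn.metrics import confusion_matrix

# Made-up true labels and predictions for illustration only
y_true_demo = ['Low', 'Low', 'Medium', 'High', 'High']
y_pred_demo = ['Low', 'Medium', 'Medium', 'High', 'Low']

order = ['Low', 'Medium', 'High']
cm = confusion_matrix(y_true_demo, y_pred_demo, labels=order)

# Rows = actual class, columns = predicted class
cm_table = pd.DataFrame(cm, index=[f'actual {c}' for c in order],
                        columns=[f'predicted {c}' for c in order])
print(cm_table)
```

The diagonal counts correct predictions. Here one High student was predicted Low, which is the kind of mistake a single accuracy number hides.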

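## 7. Build the Prediction System 🔮
Now, let's make it usable! The function below takes your own habits, in the same order as the training columns (study hours, sleep hours, screen time, exercise minutes, stress level, attendance %), and returns the predicted risk level.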

```python
def predict_my_burnout(study, sleep, screen, exercise, stress, attendance):
    # Build a one-row DataFrame with the same column names (and order) as the
    # training features, so scikit-learn doesn't warn about missing feature names
    features_df = pd.DataFrame(
        [[study, sleep, screen, exercise, stress, attendance]],
        columns=X.columns,
    )

    prediction = model.predict(features_df)
    return prediction[0]

# --- TEST IT OUT HERE ---
# Example: 8 h study, 5 h sleep, 9 h screen time, 10 min exercise, stress 8/10, 70% attendance
result = predict_my_burnout(8, 5, 9, 10, 8, 70)

print(f"Predicted Burnout Risk: {result}")
```

## 8. Save the Model 💾
We will save our trained model so we can use it in our Streamlit web app!

```python
import joblib  # already imported above; repeated so this cell can run on its own

# Save the model to a file
joblib.dump(model, 'student_burnout_model.pkl')

print("Model saved successfully to 'student_burnout_model.pkl' 🚀")
```
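On the app side, the saved file can be loaded back with `joblib.load` and used like the original model, as long as the input DataFrame has the same column names. Below is a self-contained round-trip sketch using a tiny made-up model and a temporary directory. The file name mirrors the one above, and the data are placeholders. Pickled scikit-learn models should be loaded with the same scikit-learn version that saved them, and only from files you trust.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Tiny made-up model standing in for the trained burnout model
X_demo = pd.DataFrame({'Stress (1-10)': [2, 3, 8, 9], 'Sleep Hours': [8, 9, 5, 4]})
y_demo = ['Low', 'Low', 'High', 'High']
original = RandomForestClassifier(n_estimators=10, random_state=0).fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'student_burnout_model.pkl')
    joblib.dump(original, path)   # save
    loaded = joblib.load(path)    # load, e.g. at the top of a Streamlit app script

# The loaded model should give the same predictions as the original
new_student = pd.DataFrame({'Stress (1-10)': [9], 'Sleep Hours': [4]})
print(original.predict(new_student), loaded.predict(new_student))
```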