# Student Score Prediction Analysis

This notebook provides an interactive analysis of student performance prediction based on study habits.

## Project Overview
- **Objective**: Predict student final exam scores using study hours and attendance data
- **Method**: Linear Regression
- **Features**: Hours_Studied, Attendance
- **Target**: Final_Score

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error
import sys
import os

# Add src directory to path
sys.path.append('../src')

from data_processor import DataProcessor
from visualizer import DataVisualizer
from model_trainer import ModelTrainer
from predictor import ScorePredictor

# Set style for better plots
plt.style.use('default')
sns.set_palette("husl")

print("Libraries imported successfully!")

## 1. Data Loading and Exploration

In [None]:
# Initialize components
data_processor = DataProcessor()
visualizer = DataVisualizer()

# Load data
data_path = '../data/student_data.csv'
df = data_processor.load_data(data_path)

print("Dataset shape:", df.shape)
print("\nFirst 5 records:")
df.head()

In [None]:
# Display summary statistics
visualizer.display_summary_stats(df)

## 2. Data Visualization

In [None]:
# Create scatter plots showing relationships
visualizer.plot_scatter_relationships(df)

In [None]:
# Create correlation heatmap
visualizer.plot_correlation_heatmap(df)
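The heatmap is a picture of the pairwise Pearson correlation matrix. Below is a minimal sketch that computes those numbers directly with pandas. It uses a small synthetic DataFrame with this notebook's column names. The generated values and the coefficients used to build `Final_Score` are invented for illustration and are not drawn from `student_data.csv`:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for student_data.csv (illustrative values only)
rng = np.random.default_rng(0)
n = 200
hours = rng.uniform(0, 8, n)
attendance = rng.uniform(50, 100, n)
score = 10 + 3.0 * hours + 0.5 * attendance + rng.normal(0, 5, n)
df_demo = pd.DataFrame({
    "Hours_Studied": hours,
    "Attendance": attendance,
    "Final_Score": score,
})

# Pearson correlation matrix: the same numbers a correlation heatmap displays
corr = df_demo.corr(method="pearson")
print(corr.round(2))

# Correlation of each feature with the target, strongest first
target_corr = corr["Final_Score"].drop("Final_Score").sort_values(ascending=False)
print(target_corr)
```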

In [None]:
# Show data distributions
visualizer.plot_distribution(df)

## 3. Data Preprocessing

In [None]:
# Clean and preprocess data
df_clean = data_processor.clean_data(df)

print(f"Original dataset: {len(df)} records")
print(f"Cleaned dataset: {len(df_clean)} records")
print(f"Records removed: {len(df) - len(df_clean)}")
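The cleaning rules live in `DataProcessor.clean_data` (in `../src`) and are not shown here. The sketch below is a hypothetical stand-in. It drops rows with missing values, drops exact duplicates, and keeps only rows inside assumed valid ranges (0-24 hours, 0-100 for attendance and score). The function name `clean_student_data` and the ranges are assumptions, not the project's actual rules:

```python
import numpy as np
import pandas as pd

REQUIRED = ["Hours_Studied", "Attendance", "Final_Score"]

def clean_student_data(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning step; the valid ranges below are assumptions."""
    # Drop rows missing any required value, then exact duplicate rows
    out = df.dropna(subset=REQUIRED).drop_duplicates()
    # Keep only rows whose values fall inside the assumed valid ranges (inclusive)
    in_range = (
        out["Hours_Studied"].between(0, 24)
        & out["Attendance"].between(0, 100)
        & out["Final_Score"].between(0, 100)
    )
    return out[in_range].reset_index(drop=True)

raw = pd.DataFrame({
    "Hours_Studied": [4, 4, np.nan, 30, 2],
    "Attendance": [80, 80, 90, 85, 110],
    "Final_Score": [72, 72, 65, 90, 50],
})
cleaned = clean_student_data(raw)
print(f"Original: {len(raw)}, cleaned: {len(cleaned)}, removed: {len(raw) - len(cleaned)}")
# Original: 5, cleaned: 1, removed: 4
```

The toy input triggers each rule. It removes one duplicate, one row with a missing value, and two out-of-range rows.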

In [None]:
# Split data into training and testing sets
X_train, X_test, y_train, y_test = data_processor.split_data(df_clean)

print("Data split completed:")
print(f"Training set: {len(X_train)} samples")
print(f"Testing set: {len(X_test)} samples")
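`DataProcessor.split_data` most likely wraps something like scikit-learn's `train_test_split`. Below is a sketch under that assumption. It uses an 80/20 split and a fixed `random_state` so the split is reproducible. Both values are illustrative choices, not taken from the project code, and the data is synthetic:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

FEATURES = ["Hours_Studied", "Attendance"]
TARGET = "Final_Score"

# Synthetic data with the notebook's column names (illustrative only)
rng = np.random.default_rng(1)
df_demo = pd.DataFrame({
    "Hours_Studied": rng.uniform(0, 8, 100),
    "Attendance": rng.uniform(50, 100, 100),
})
df_demo[TARGET] = 10 + 3 * df_demo["Hours_Studied"] + 0.5 * df_demo["Attendance"]

# 80/20 split; test_size and random_state are assumptions, not DataProcessor's settings
demo_X_train, demo_X_test, demo_y_train, demo_y_test = train_test_split(
    df_demo[FEATURES], df_demo[TARGET], test_size=0.2, random_state=42
)
print(len(demo_X_train), len(demo_X_test))  # 80 20
```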

## 4. Model Training

In [None]:
# Train the linear regression model
trainer = ModelTrainer()
model = trainer.train_model(X_train, y_train)
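Fitting a linear regression learns one intercept and one coefficient per feature. The sketch below fits scikit-learn's `LinearRegression` directly on synthetic, noise-free data generated from known coefficients. Those coefficients (intercept 10, 3 points per hour, 0.5 points per attendance percentage point) are invented for illustration, and because the data has no noise, the fitted values can be checked against the truth. How `ModelTrainer.train_model` configures its model is not shown in this notebook:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
X_demo = pd.DataFrame({
    "Hours_Studied": rng.uniform(0, 8, 150),
    "Attendance": rng.uniform(50, 100, 150),
})
# Noise-free synthetic target built from known (made-up) coefficients
y_demo = 10 + 3 * X_demo["Hours_Studied"] + 0.5 * X_demo["Attendance"]

demo_model = LinearRegression()
demo_model.fit(X_demo, y_demo)

# The fitted intercept and coefficients should recover 10, 3 and 0.5
print(f"Intercept: {demo_model.intercept_:.2f}")
for name, coef in zip(X_demo.columns, demo_model.coef_):
    print(f"{name}: {coef:.2f}")
```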

## 5. Model Evaluation

In [None]:
# Evaluate model performance
metrics = trainer.evaluate_model(X_test, y_test)
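Later cells read `metrics['actual']`, `metrics['predictions']`, `metrics['r2_score']` and `metrics['mean_absolute_error']`, which suggests `evaluate_model` returns a dictionary along these lines. Below is a hypothetical helper that produces the same keys with scikit-learn's metric functions. The helper name `evaluate`, the extra `rmse` key, and the synthetic data are additions for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def evaluate(model, X_test, y_test):
    """Hypothetical evaluation helper returning the keys this notebook reads."""
    predictions = model.predict(X_test)
    actual = np.asarray(y_test)
    return {
        "actual": actual,
        "predictions": predictions,
        "r2_score": r2_score(actual, predictions),
        "mean_absolute_error": mean_absolute_error(actual, predictions),
        # RMSE computed as the square root of the mean squared error
        "rmse": float(np.sqrt(mean_squared_error(actual, predictions))),
    }

# Synthetic data: column 0 = hours studied, column 1 = attendance (illustrative only)
rng = np.random.default_rng(3)
X = rng.uniform([0, 50], [8, 100], size=(120, 2))
y = 10 + 3 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 4, 120)

model = LinearRegression().fit(X[:100], y[:100])
metrics_demo = evaluate(model, X[100:], y[100:])
print({k: round(v, 3) for k, v in metrics_demo.items() if k not in ("actual", "predictions")})
```

RMSE is always at least as large as MAE and penalizes large misses more heavily, so reporting both gives a sense of how uneven the errors are.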

In [None]:
# Visualize predictions vs actual values
plt.figure(figsize=(10, 6))

# Scatter plot of predictions vs actual
plt.subplot(1, 2, 1)
plt.scatter(metrics['actual'], metrics['predictions'], alpha=0.6)
plt.plot([0, 100], [0, 100], 'r--', alpha=0.8)
plt.xlabel('Actual Scores')
plt.ylabel('Predicted Scores')
plt.title('Predictions vs Actual Scores')
plt.grid(True, alpha=0.3)

# Residuals plot
plt.subplot(1, 2, 2)
residuals = metrics['actual'] - metrics['predictions']
plt.scatter(metrics['predictions'], residuals, alpha=0.6)
plt.axhline(y=0, color='r', linestyle='--', alpha=0.8)
plt.xlabel('Predicted Scores')
plt.ylabel('Residuals')
plt.title('Residuals Plot')
plt.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

## 6. Making Predictions

In [None]:
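# Save the trained model (creating the models directory if needed), then load it for prediction
model_path = '../models/student_score_model.pkl'
os.makedirs(os.path.dirname(model_path), exist_ok=True)
trainer.save_model(model_path)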

# Initialize predictor
predictor = ScorePredictor()
predictor.load_model(model_path)
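`ModelTrainer.save_model` and `ScorePredictor.load_model` are not shown, so their storage format is unknown. One common approach for scikit-learn models is the standard-library `pickle` module. The hypothetical `save_model`/`load_model` functions below take that approach and also create the parent directory. To avoid touching `../models`, the demo writes to a temporary directory:

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.linear_model import LinearRegression

def save_model(model, path):
    """Hypothetical saver: pickle the model, creating parent folders if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(model, f)

def load_model(path):
    """Hypothetical loader: unpickle a previously saved model."""
    with Path(path).open("rb") as f:
        return pickle.load(f)

# Tiny illustrative model: columns are hours studied and attendance
model = LinearRegression().fit([[1, 60], [2, 70], [3, 90]], [50, 60, 75])

with tempfile.TemporaryDirectory() as tmp:
    model_file = Path(tmp) / "models" / "student_score_model.pkl"
    save_model(model, model_file)
    restored = load_model(model_file)
    print(restored.predict([[4, 80]]))
```

Only unpickle files from trusted sources, because loading a pickle can execute arbitrary code.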

In [None]:
# Example prediction: 4 hours studied, 80% attendance
hours = 4
attendance = 80

prediction = predictor.predict_score(hours, attendance)
print(f"Prediction for {hours} hours studied and {attendance}% attendance:")
print(f"Expected Final Score: {prediction:.1f}/100")

# Get detailed explanation
explanation = predictor.get_prediction_explanation(hours, attendance)
print("\n" + explanation)
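A single prediction amounts to building a one-row feature table and calling `predict`. The sketch below is a hypothetical version of `predict_score`. Its input ranges mirror the prompts in the interactive tool, while clipping the output to 0-100 is an assumption about how the project handles out-of-range predictions. Passing a DataFrame with the training column names avoids scikit-learn's feature-name warning when the model was fitted on a DataFrame:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

FEATURES = ["Hours_Studied", "Attendance"]

def predict_score(model, hours, attendance):
    """Hypothetical single-student prediction; ranges and clipping are assumptions."""
    if not 0 <= hours <= 24:
        raise ValueError("hours must be between 0 and 24")
    if not 0 <= attendance <= 100:
        raise ValueError("attendance must be between 0 and 100")
    row = pd.DataFrame([[hours, attendance]], columns=FEATURES)
    raw = float(model.predict(row)[0])
    return float(np.clip(raw, 0, 100))  # keep the output on the 0-100 score scale

# Noise-free synthetic training data built from made-up coefficients
rng = np.random.default_rng(4)
X = pd.DataFrame(rng.uniform([0, 50], [8, 100], size=(100, 2)), columns=FEATURES)
y = 10 + 3 * X["Hours_Studied"] + 0.5 * X["Attendance"]
model = LinearRegression().fit(X, y)

print(f"{predict_score(model, 4, 80):.1f}")
```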

In [None]:
# Compare alternative scenarios against the baseline (hours, attendance) set in the previous cell
scenarios = [
    (6, 80),    # More study hours
    (4, 95),    # Better attendance
    (6, 95),    # Both improved
    (2, 60),    # Both reduced
]

comparison_df = predictor.compare_scenarios(hours, attendance, scenarios)
print("Scenario Comparison:")
print(comparison_df)
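A comparison table like this can be built by predicting each scenario and subtracting the baseline prediction. Below is a hypothetical sketch of such a `compare_scenarios` helper. It takes a prediction function rather than a `ScorePredictor` and is driven by a toy linear predictor whose coefficients are made up for illustration:

```python
import pandas as pd

def compare_scenarios(predict, base_hours, base_attendance, scenarios):
    """Hypothetical helper: predict each scenario and report the change vs. the baseline."""
    baseline = predict(base_hours, base_attendance)
    rows = []
    for hours, attendance in scenarios:
        score = predict(hours, attendance)
        rows.append({
            "Hours_Studied": hours,
            "Attendance": attendance,
            "Predicted_Score": round(score, 1),
            "Change_vs_Baseline": round(score - baseline, 1),
        })
    return pd.DataFrame(rows)

# Toy linear predictor with invented coefficients: intercept 10, 3 per hour, 0.5 per attendance point
def toy_predict(hours, attendance):
    return 10 + 3 * hours + 0.5 * attendance

comparison_demo = compare_scenarios(toy_predict, 4, 80, [(6, 80), (4, 95), (6, 95), (2, 60)])
print(comparison_demo)
```

A linear model with no interaction term is additive, so the gain from improving both inputs equals the sum of the individual gains. In the toy model, the (6, 95) row shows 6.0 + 7.5 = 13.5.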

## 7. Interactive Prediction Tool

In [None]:
# Interactive prediction function
def interactive_prediction():
    """Interactive tool for making predictions."""
    print("Student Score Prediction Tool")
    print("=" * 30)
    
    try:
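        hours = float(input("Enter hours studied per day (0-24): "))
        attendance = float(input("Enter attendance percentage (0-100): "))
        
        # Enforce the ranges stated in the prompts
        if not 0 <= hours <= 24:
            raise ValueError("hours studied must be between 0 and 24")
        if not 0 <= attendance <= 100:
            raise ValueError("attendance must be between 0 and 100")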
        
        prediction = predictor.predict_score(hours, attendance)
        explanation = predictor.get_prediction_explanation(hours, attendance)
        
        print(f"\nPredicted Score: {prediction:.1f}/100")
        print("\n" + explanation)
        
    except ValueError as e:
        print(f"Error: {e}")
    except Exception as e:
        print(f"Unexpected error: {e}")

# Uncomment the line below to run the interactive tool
# interactive_prediction()

## 8. Model Insights and Conclusions

### Key Findings:
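1. **Strong Correlation**: Attendance shows a stronger correlation with final scores than study hours do (see the correlation heatmap in Section 2).
2. **Model Performance**: The linear regression model gives reasonable predictions. The R² score and mean absolute error are reported in Section 5 and in the final summary below.
3. **Feature Importance**: Both study hours and attendance contribute meaningfully to final scores. This notebook runs no formal significance test. The raw coefficients are also in different units (hours vs. percentage points), so their magnitudes cannot be compared directly.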
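Comparing the raw coefficients of `Hours_Studied` and `Attendance` can mislead, because the features are on different scales. One common remedy is to standardize the features before fitting. After standardization, each coefficient is the change in predicted score per one standard deviation of its feature. Below is a sketch using scikit-learn's `StandardScaler` in a pipeline. The data is synthetic and does not reproduce this project's results:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data with invented coefficients (illustrative only)
rng = np.random.default_rng(5)
X = pd.DataFrame({
    "Hours_Studied": rng.uniform(0, 8, 200),
    "Attendance": rng.uniform(50, 100, 200),
})
y = 10 + 3 * X["Hours_Studied"] + 0.5 * X["Attendance"] + rng.normal(0, 3, 200)

# Scale each feature to zero mean and unit variance, then fit the regression
pipe = make_pipeline(StandardScaler(), LinearRegression()).fit(X, y)

# make_pipeline names each step after its lowercased class name
std_coefs = pd.Series(pipe.named_steps["linearregression"].coef_, index=X.columns)
print(std_coefs.round(2))
```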

### Model Equation:
The trained model follows the equation:
```
Final_Score = Intercept + (Hours_Coefficient × Hours_Studied) + (Attendance_Coefficient × Attendance)
```
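The equation can be evaluated by hand for a single student. The coefficients below are hypothetical and chosen only to make the arithmetic easy to follow. The real values come from `trainer.get_model_coefficients()` in the final cell:

```python
def final_score(hours_studied, attendance, intercept, hours_coef, attendance_coef):
    """Evaluate the linear model equation for one student."""
    return intercept + hours_coef * hours_studied + attendance_coef * attendance

# Hypothetical coefficients for illustration only
score = final_score(4, 80, intercept=10.0, hours_coef=3.0, attendance_coef=0.5)
print(score)  # 10 + 3*4 + 0.5*80 = 62.0
```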

### Recommendations:
- Students should focus on both consistent attendance and adequate study time.
- In this dataset, attendance appears to be slightly more influential than study hours.
- The model could support early intervention by flagging students with low predicted scores.

In [None]:
# Final model summary
coeffs = trainer.get_model_coefficients()
print("Final Model Summary:")
print("=" * 25)
print(f"R² Score: {metrics['r2_score']:.4f}")
print(f"Mean Absolute Error: {metrics['mean_absolute_error']:.2f} points")
print(f"Model Intercept: {coeffs['intercept']:.2f}")
print(f"Hours Studied Coefficient: {coeffs['coefficients']['Hours_Studied']:.2f}")
print(f"Attendance Coefficient: {coeffs['coefficients']['Attendance']:.2f}")

print("\nModel saved to:", model_path)
print("\nProject completed successfully!")