# Phase 1: Linear Regression Model Development and Deployment

## Introduction
In this phase, we develop a linear regression model that predicts a student's Performance Index from the factors listed below. We use the Student Performance dataset from Kaggle, train the model with scikit-learn, and deploy it through a simple HTML/CSS/JS interface using ML.js.

## Dataset
The dataset used for this project is the [Student Performance dataset](https://www.kaggle.com/datasets/nikhil7280/student-performance-multiple-linear-regression?resource=download). It contains the following columns:

- **Hours Studied**: Number of hours the student studied (numerical).
- **Previous Scores**: Previous test scores of the student (numerical).
- **Extracurricular Activities**: Whether the student participates in extracurricular activities (categorical, Yes/No; encoded as 1/0 during preprocessing).
- **Sleep Hours**: Number of hours the student slept (numerical).
- **Sample Question Papers Practiced**: Number of practice papers attempted by the student (numerical).
- **Performance Index**: The target variable we want to predict (numerical).
  
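Before preprocessing, it can help to confirm that a loaded frame actually carries the columns listed above. The following is a minimal sketch, not part of the original notebook: the helper `missing_columns` and the two sample rows are hypothetical and only illustrate the check.

```python
import pandas as pd

# Column names as listed in the Dataset section.
EXPECTED_COLUMNS = [
    "Hours Studied",
    "Previous Scores",
    "Extracurricular Activities",
    "Sleep Hours",
    "Sample Question Papers Practiced",
    "Performance Index",
]


def missing_columns(df: pd.DataFrame) -> list:
    """Return the expected columns that are absent from df (hypothetical helper)."""
    return [col for col in EXPECTED_COLUMNS if col not in df.columns]


# Made-up rows for illustration only; these are not values from the dataset.
sample = pd.DataFrame(
    {
        "Hours Studied": [5, 3],
        "Previous Scores": [80, 65],
        "Extracurricular Activities": ["Yes", "No"],
        "Sleep Hours": [7, 6],
        "Sample Question Papers Practiced": [2, 4],
        "Performance Index": [75.0, 50.0],
    }
)

missing = missing_columns(sample)
missing_after_drop = missing_columns(sample.drop(columns=["Sleep Hours"]))
print(missing)
print(missing_after_drop)
```

An empty list means every expected column is present; any name in the list points to a renamed or missing column in the CSV.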


## Step 1: Data Preprocessing
Before training the model, we need to preprocess the data. This includes loading the dataset, converting categorical variables, and splitting the data into training and testing sets.

In [25]:
import pandas as pd
from sklearn.model_selection import train_test_split

In [29]:
# Load the dataset
data = pd.read_csv('Student_Performance.csv')

In [31]:
# Convert 'Extracurricular Activities' to numerical values
data['Extracurricular Activities'] = data['Extracurricular Activities'].map({'Yes': 1, 'No': 0})

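One caveat with `Series.map` and a dictionary: any value that is not a key in the dictionary becomes `NaN`, which would later cause `LinearRegression.fit` to fail. The sketch below uses a made-up series (the lowercase `"yes"` is a deliberately planted inconsistency, not something observed in the dataset) to show how unmapped values can be detected after encoding.

```python
import pandas as pd

# Made-up values for illustration; "yes" is intentionally inconsistent.
raw = pd.Series(["Yes", "No", "yes", "No"], name="Extracurricular Activities")

# Same mapping as in the cell above.
encoded = raw.map({"Yes": 1, "No": 0})

# Values that the mapping did not cover show up as NaN in the encoded series.
unmapped = raw[encoded.isna()].unique().tolist()
n_missing = int(encoded.isna().sum())

print(unmapped)
print(n_missing)
```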
In [33]:
# Split the data into features and target variable
X = data[['Hours Studied', 'Previous Scores', 'Extracurricular Activities', 'Sleep Hours', 'Sample Question Papers Practiced']]
y = data['Performance Index']

In [35]:
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

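To make the effect of `test_size=0.2` and `random_state=42` concrete, here is a small sketch on synthetic arrays (the `X_demo` and `y_demo` names and values are made up). With 10 samples, 20% goes to the test set, and repeating the call with the same `random_state` reproduces the same split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data for illustration: 10 samples with 2 features each.
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Repeating the split with the same seed yields identical partitions.
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)
same_split = np.array_equal(X_te, X_te2) and np.array_equal(y_te, y_te2)

print(len(X_tr), len(X_te))
print(same_split)
```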
## Step 2: Train the Linear Regression Model
Next, we fit a scikit-learn `LinearRegression` model on the training set and evaluate it on the held-out test set.

In [42]:
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

In [44]:
# Create a linear regression model
model = LinearRegression()

In [46]:
# Train the model
model.fit(X_train, y_train)

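After fitting, a `LinearRegression` model is fully described by its `coef_` array (one weight per feature) and its `intercept_`. The sketch below is an illustration on synthetic data, not the student dataset: the target is constructed as `3 + 2*x1 - 1*x2`, so the fitted parameters should recover those numbers, and a prediction is simply the intercept plus the weighted sum of the features.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic, noise-free data generated from a known linear rule.
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 10, size=(50, 2))
y_demo = 3.0 + 2.0 * X_demo[:, 0] - 1.0 * X_demo[:, 1]

demo_model = LinearRegression().fit(X_demo, y_demo)

# Reproduce predict() by hand: intercept + X @ coefficients.
manual = demo_model.intercept_ + X_demo @ demo_model.coef_

print(demo_model.coef_)
print(demo_model.intercept_)
```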
In [48]:
# Make predictions on the test set
y_pred = model.predict(X_test)

In [50]:
# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

In [52]:
print(f'Mean Squared Error: {mse}')
print(f'R^2 Score: {r2}')

Mean Squared Error: 4.082628398521855
R^2 Score: 0.9889832909573145

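The mean squared error is expressed in squared units of the Performance Index, which makes it hard to read directly. Taking the square root gives the root mean squared error (RMSE), which is in the same units as the target. A short sketch using the MSE value printed above:

```python
import math

# MSE value copied from the evaluation output above.
mse = 4.082628398521855

# RMSE is the square root of the MSE and shares the target's units.
rmse = math.sqrt(mse)

print(f"RMSE: {rmse:.2f}")
```

An RMSE of roughly 2.02 means that test-set predictions deviate from the true Performance Index by about two points in the root-mean-square sense.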

## Step 3: Save the Model
We serialize the trained model with joblib so it can be reloaded later without retraining.

In [56]:
import joblib

# Save the model to a file
joblib.dump(model, 'student_performance_model.pkl')

['student_performance_model.pkl']
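
The saved file can be read back with `joblib.load`. The sketch below is a self-contained round trip under stated assumptions: it fits a stand-in model on made-up data and writes to a temporary directory with a hypothetical file name, rather than touching `student_performance_model.pkl`. Because joblib files are pickle-based, they should only be loaded from trusted sources.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression

# Stand-in model fitted on made-up data (not the student dataset).
X_demo = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
y_demo = np.array([5.0, 4.0, 11.0, 10.0])
demo_model = LinearRegression().fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name, used only for this illustration.
    path = os.path.join(tmp_dir, "demo_model.pkl")
    joblib.dump(demo_model, path)
    restored = joblib.load(path)

# The restored model should give the same predictions as the original.
same_predictions = np.allclose(
    restored.predict(X_demo), demo_model.predict(X_demo)
)
print(same_predictions)
```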