# Student Grade Predictor using Linear Regression

This project predicts student performance with Linear Regression. The program forecasts students' final grades from their earlier grades and other academic, demographic, and social factors.

## Overview

In this project, we use the "student-mat.csv" file from the Student Performance dataset in the [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/Student+Performance). It contains data on students' performance in mathematics. The features include the first-period grade (G1), second-period grade (G2), weekly study time, school attended, family size, parents' occupations, and more. The target is the final grade (G3).
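The file is semicolon-delimited and mixes numeric and text columns, which is why the script passes `sep=';'` to `pd.read_csv` and one-hot encodes the text columns. The sketch below parses a tiny hypothetical excerpt (made-up rows using only a few of the real column names), separates numeric from text columns with `select_dtypes`, and encodes the text columns the same way the script does. On the real file, the same check should find the categorical columns that the script lists by hand.

```python
import io

import pandas as pd

# Hypothetical excerpt in the semicolon-separated layout of student-mat.csv
# (made-up rows, only a few of the real columns)
sample = io.StringIO(
    "school;sex;age;studytime;G1;G2;G3\n"
    "GP;F;18;2;5;6;6\n"
    "MS;M;17;3;12;13;13\n"
    "GP;M;16;1;14;14;15\n"
)
df = pd.read_csv(sample, sep=";")

# Numeric columns can go to the model as-is; the rest hold text labels
numeric_cols = list(df.select_dtypes(include="number").columns)
categorical_cols = [c for c in df.columns if c not in numeric_cols]
print(categorical_cols)  # ['school', 'sex']

# One-hot encode the text columns. drop_first=True keeps k - 1 indicators for a
# k-level variable, so the dropped level (GP, F) becomes the baseline
encoded = pd.get_dummies(df, columns=categorical_cols, drop_first=True)
print(list(encoded.columns))
# ['age', 'studytime', 'G1', 'G2', 'G3', 'school_MS', 'sex_M']
```

Dropping one level per variable avoids an indicator column that is perfectly predictable from the others together with the model's intercept.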

## Task List

Before running the code, make sure to complete the following tasks:

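- Install Python and the required libraries: pandas, NumPy, and scikit-learn (imported as `sklearn`).
- Download `student-mat.csv` from the UCI repository and place it in a `data/` folder, since the code reads `data/student-mat.csv`.
- Set up your GitHub account to participate in the project or the review assignment.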

## Steps Performed by the Code

1. **Data Loading:** The code reads the student performance data from `data/student-mat.csv` with pandas (`sep=';'`, because the file is semicolon-separated) and loads it into a DataFrame for further processing.

2. **Data Preprocessing:** The dataset contains categorical (text) variables such as `school`, `sex`, and `Mjob`. The code converts them into numeric indicator columns with one-hot encoding (`pd.get_dummies` with `drop_first=True`), because Linear Regression requires numerical inputs. The code does not impute missing values.

3. **Data Splitting:** The data is split into training (80%) and testing (20%) sets with sklearn's `train_test_split()` function, using `random_state=42` so the split is reproducible. The model is trained on one subset and evaluated on the unseen rows to estimate how well it generalizes.

4. **Model Training:** A `LinearRegression` model from sklearn is created and trained on the training data with its `fit()` method. Fitting estimates, by ordinary least squares, one coefficient per feature plus an intercept, describing how the features relate to the target variable (the final grade, G3).

5. **Model Evaluation:** After training, the model is evaluated on the test data with two metrics: Mean Squared Error (MSE) and the coefficient of determination (R²). MSE is the average squared difference between the predicted and actual grades (lower is better), while R² is the fraction of the variance in the target variable that the model explains (1 means a perfect fit).

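6. **Example Prediction:** The code demonstrates how to make a prediction for a new student with the trained model. The example supplies the student's first-period grade (G1), second-period grade (G2), and weekly study time (`studytime`, coded 1 to 4), and the model predicts their final grade (G3). Every other feature is set to 0 when the new row is aligned with the training columns, so the prediction is illustrative rather than realistic.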
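Steps 3 to 5 can be sketched end to end on synthetic data where the true relationship is known, which makes it easy to see what `fit()` learns and how the two metrics are computed. The data below is hypothetical (two random features with `y = 2*x1 + 3*x2 + 1` plus noise), not the student dataset; the split settings mirror the script's.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical data: 100 rows, 2 features, y = 2*x1 + 3*x2 + 1 plus Gaussian noise
X = rng.normal(size=(100, 2))
y = 2 * X[:, 0] + 3 * X[:, 1] + 1 + rng.normal(scale=0.5, size=100)

# Step 3: hold out 20% of the rows with a fixed seed, as the script does
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 4: ordinary least squares learns one weight per feature plus an intercept
model = LinearRegression().fit(X_train, y_train)
print(model.coef_, model.intercept_)  # close to [2, 3] and 1

# Step 5: the two metrics written out by hand
y_pred = model.predict(X_test)
mse = np.mean((y_test - y_pred) ** 2)                 # mean squared residual
ss_res = np.sum((y_test - y_pred) ** 2)               # residual sum of squares
ss_tot = np.sum((y_test - y_test.mean()) ** 2)        # total sum of squares
r2 = 1 - ss_res / ss_tot

print(np.isclose(mse, mean_squared_error(y_test, y_pred)))  # True
print(np.isclose(r2, r2_score(y_test, y_pred)))             # True
```

The same formulas apply to the script's `y_test` and `y_pred`. Taking the square root of the reported MSE (about 5.66) gives an RMSE of roughly 2.4 grade points on G3's 0 to 20 scale.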

```python
import pandas as pd
from sklearn import linear_model
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# Load the dataset (expected at data/student-mat.csv, relative to the working directory).
# The file is semicolon-separated.
file_path = "data/student-mat.csv"
data = pd.read_csv(file_path, sep=';')

# Data preprocessing: one-hot encode the categorical (text) columns.
# drop_first=True keeps k - 1 indicator columns for a variable with k levels.
# No missing-value imputation is done here.
data = pd.get_dummies(data, columns=['school', 'sex', 'address', 'famsize', 'Pstatus',
                                     'Mjob', 'Fjob', 'reason', 'guardian', 'schoolsup',
                                     'famsup', 'paid', 'activities', 'nursery', 'higher',
                                     'internet', 'romantic'], drop_first=True)

# Select features and target variable
features = data.drop(columns=['G3'])  # Features: all columns except 'G3' (final grade)
target = data['G3']  # Target variable: 'G3' (final grade)

# Split the data: 80% for training, 20% for testing (fixed seed for reproducibility)
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.2, random_state=42)

# Create and train the Linear Regression model
model = linear_model.LinearRegression()
model.fit(X_train, y_train)

# Make predictions on the test set
y_pred = model.predict(X_test)

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Mean Squared Error:", mse)
print("R-squared:", r2)

# The trained model can now predict grades for new data with model.predict().
# The input must have the same columns as `features` (all encoded columns except 'G3').

# Example prediction for a new student
new_student_features = pd.DataFrame({
    'G1': [12],        # First-period grade (0 to 20)
    'G2': [14],        # Second-period grade (0 to 20)
    'studytime': [3],  # Weekly study time, coded 1 to 4 (3 = 5 to 10 hours)
    # Include other relevant features here...
})

# One-hot encode the new student's data and align its columns with the training features.
# Any column not supplied above (e.g. age, absences, the encoded categories) is filled with 0.
new_student_features_encoded = pd.get_dummies(new_student_features, drop_first=True)
new_student_features_encoded = new_student_features_encoded.align(features, join='right', axis=1, fill_value=0)[0]

predicted_grade = model.predict(new_student_features_encoded)
print("Predicted Final Grade for the New Student:", predicted_grade[0])
```

Output:

```
Mean Squared Error: 5.656642833231225
R-squared: 0.7241341236974019
Predicted Final Grade for the New Student: 13.92894133847506
```
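In the example prediction, every feature other than G1, G2, and `studytime` is set to 0, so the new student is treated as age 0 and as belonging to the baseline level of every encoded category; the printed grade is influenced by those placeholder zeros. The sketch below contrasts that zero fill with one common alternative, filling unsupplied features with training-set means. The mean fill is an assumption of this sketch, not the script's method, and the small `features` frame is a hypothetical stand-in for the script's encoded training data.

```python
import pandas as pd

# Hypothetical stand-in for the script's encoded training features
features = pd.DataFrame({
    "age": [15, 16, 17, 18],
    "studytime": [1, 2, 3, 2],
    "G1": [10, 12, 14, 8],
    "G2": [11, 13, 15, 9],
    "school_MS": [0, 1, 0, 1],
})

new_student = pd.DataFrame({"G1": [12], "G2": [14], "studytime": [3]})

# Same effect as the script's align(..., fill_value=0): unsupplied columns become 0
zero_filled = new_student.reindex(columns=features.columns, fill_value=0)
print(zero_filled.loc[0, "age"])  # 0

# Alternative (not the script's method): start from the training-set means and
# overwrite only the values that were actually supplied
mean_filled = features.mean().to_frame().T
for col in new_student.columns:
    mean_filled[col] = new_student[col].to_numpy()
print(mean_filled.loc[0, "age"])  # 16.5
```

Because a linear model's prediction is a weighted sum, plugging in the mean of an unsupplied feature gives the same result as averaging the predictions over that feature's training values while the supplied features stay fixed. Both frames keep the training column order, which scikit-learn checks when a model was fitted on a DataFrame.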
