Regression example to predict students' first-year college GPA from their SAT scores and high school GPA

__Prodigy University__ is seeking to enhance its enrollment process. It plans to do so by implementing a predictive analytics model that identifies prospective students with a high potential for academic success.

__The goal is to develop a predictive model that can accurately forecast the first-year college GPA of applicants based on their SAT scores and high school GPA. This model is intended to serve as a strategic tool for the admissions office, enabling it to efficiently shortlist candidates who not only meet the academic standards of the university but are also likely to thrive in their chosen fields of study.__ By doing so, the university aims to optimize its student selection process, improve academic outcomes, and foster an environment of excellence and high achievement.

# Load the data

In [1]:
import pandas as pd

In [2]:
# Load the dataset
data = pd.read_csv('Prodigy University Dataset.csv')
# Preview the first five rows
data.head()

Unnamed: 0,sat_sum,hs_gpa,fy_gpa
0,508,3.4,3.18
1,488,4.0,3.33
2,464,3.75,3.25
3,380,3.75,2.42
4,428,4.0,2.63
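The `Unnamed: 0` column is what pandas produces when the first field of the CSV header is blank, which typically happens when a DataFrame index was saved along with the data (the file's origin isn't shown, so this is an assumption). Because the preprocessing step selects columns by name, the extra column is harmless here. A minimal sketch, using only the five rows printed above in an in-memory string rather than the real file, of how `index_col=0` would make that column the index instead of a data column:

```python
import io

import pandas as pd

# The five rows printed above, inlined so the sketch runs without the real CSV
csv_text = """,sat_sum,hs_gpa,fy_gpa
0,508,3.4,3.18
1,488,4.0,3.33
2,464,3.75,3.25
3,380,3.75,2.42
4,428,4.0,2.63
"""

# Default behaviour: the blank-header first column becomes a data column named "Unnamed: 0"
with_extra_col = pd.read_csv(io.StringIO(csv_text))
print(list(with_extra_col.columns))

# index_col=0: the first column is used as the DataFrame index instead
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(indexed.columns))
```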


# Data Preprocessing

In [4]:
# Convert the two feature columns to a NumPy array of shape (n_samples, 2)
X = data[['sat_sum', 'hs_gpa']].values
# Reshape fy_gpa into a 2-D column vector: one row per student, one column
y = data['fy_gpa'].values.reshape(-1, 1)
print(X.shape)
print(y.shape)

(1000, 2)
(1000, 1)
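The reshape matters because the network produces one prediction per row, i.e. an output of shape `(n, 1)`, so the target should have the same shape. A small sketch with a placeholder array (not the real `fy_gpa` values) showing what `reshape(-1, 1)` does:

```python
import numpy as np

# Hypothetical stand-in for data['fy_gpa'].values: a 1-D array of 1000 values
fy_gpa = np.linspace(0.0, 4.0, 1000)
print(fy_gpa.shape)   # 1-D array

# reshape(-1, 1): -1 tells NumPy to infer the number of rows from the array size
y = fy_gpa.reshape(-1, 1)
print(y.shape)        # 2-D column vector with one column
```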


In [5]:
from sklearn.model_selection import train_test_split
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
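With `test_size=0.2`, 20% of the 1000 rows are held out for testing, and `random_state=42` makes the shuffle reproducible. A sketch on synthetic arrays with the same shapes as `X` and `y` (placeholder values, not the dataset) confirming the sizes and that feature rows stay paired with their targets:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic arrays shaped like X (1000, 2) and y (1000, 1); row i of X is [2i, 2i + 1], y[i] is i
X = np.arange(2000, dtype=float).reshape(1000, 2)
y = np.arange(1000, dtype=float).reshape(-1, 1)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(X_train.shape, X_test.shape)

# Rows stay aligned after shuffling: each X row still pairs with its original y value
aligned = np.all(X_train[:, 0] / 2 == y_train[:, 0])

# The same random_state reproduces the same split
X_train_again = train_test_split(X, y, test_size=0.2, random_state=42)[0]
print(aligned, np.array_equal(X_train, X_train_again))
```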

In [6]:
from sklearn.preprocessing import StandardScaler

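# Standardize each feature to zero mean and unit variance so that sat_sum (values in the
# hundreds) and hs_gpa (values around 3 to 4) are on comparable scales for gradient-based training.
# Fit the scaler on the training set only, then apply the same statistics to the test set.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)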
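Why the test set should go through `transform` rather than `fit_transform`: refitting on the test set standardizes it with its own mean and standard deviation, so it no longer lives on the same scale the model was trained on, and information about the test data leaks into preprocessing. A sketch on synthetic data (placeholder distributions, deliberately shifted between train and test; not the real dataset):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic SAT-like and GPA-like columns (placeholder values)
X_train = np.column_stack([rng.normal(450, 50, 800), rng.normal(3.3, 0.4, 800)])
# Test set drawn from shifted distributions so its statistics differ from training
X_test = np.column_stack([rng.normal(500, 50, 200), rng.normal(3.6, 0.4, 200)])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learns mean_ and scale_ from training data
X_test_scaled = scaler.transform(X_test)        # reuses the training statistics

# Training columns are centered at ~0; the shifted test set keeps a visible positive offset
print(X_train_scaled.mean(axis=0).round(6))
print(X_test_scaled.mean(axis=0).round(2))

# Refitting on the test set re-centers it at ~0, hiding the shift
refit = StandardScaler().fit_transform(X_test)
print(refit.mean(axis=0).round(6))
```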

In [7]:
X_train.shape

(800, 2)

In [8]:
import torch
# Convert numpy to PyTorch tensors
X_train_tensor = torch.tensor(X_train, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train, dtype=torch.float32)
X_test_tensor = torch.tensor(X_test, dtype=torch.float32)
y_test_tensor = torch.tensor(y_test, dtype=torch.float32)
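The explicit `dtype=torch.float32` matters: arrays taken from the DataFrame and from `StandardScaler` are `float64`, `torch.tensor` keeps that dtype unless told otherwise, and `nn.Linear` parameters are `float32` by default. A small sketch with random placeholder features showing the mismatch and the conversion:

```python
import numpy as np
import torch
import torch.nn as nn

features = np.random.default_rng(0).normal(size=(4, 2))  # NumPy defaults to float64
layer = nn.Linear(2, 1)                                    # parameters default to float32

as_is = torch.tensor(features)                             # keeps float64
converted = torch.tensor(features, dtype=torch.float32)    # matches the layer's dtype

print(as_is.dtype, converted.dtype, layer.weight.dtype)

raised = False
try:
    layer(as_is)  # float64 input against float32 weights
except RuntimeError as err:
    raised = True
    print("float64 input rejected:", err)

print(layer(converted).shape)
```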

# Building the model 

In [9]:
import torch.nn as nn

In [10]:
# Two-layer network: 2 input features -> hidden layer of 2 sigmoid neurons -> 1 output (predicted fy_gpa)
model = nn.Sequential(
    nn.Linear(2, 2),
    nn.Sigmoid(),
    nn.Linear(2, 1)
)
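This network computes $\hat{y} = W_2\,\sigma(W_1 x + b_1) + b_2$, where $\sigma$ is the sigmoid; the hidden sigmoid layer makes it a small nonlinear model rather than a plain linear regression. A sketch that rebuilds the same architecture and lists its learnable tensors:

```python
import torch.nn as nn

# Same architecture as the cell above: 2 inputs -> 2 sigmoid hidden units -> 1 output
model = nn.Sequential(
    nn.Linear(2, 2),
    nn.Sigmoid(),
    nn.Linear(2, 1),
)

# List every learnable tensor with its shape (the Sigmoid layer has none)
for name, param in model.named_parameters():
    print(name, tuple(param.shape))

total = sum(p.numel() for p in model.parameters())
print("trainable parameters:", total)  # 2*2 + 2 + 2*1 + 1
```

For comparison, an ordinary linear regression on the same two features would be a single `nn.Linear(2, 1)` with 3 parameters.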

In [11]:
# Forward pass: predictions from the untrained (randomly initialized) model
preds = model(X_train_tensor)

In [12]:
preds[:5]

tensor([[-0.6428],
        [-0.6315],
        [-0.5385],
        [-0.6656],
        [-0.5112]], grad_fn=<SliceBackward0>)
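These first predictions come from randomly initialized weights, so they carry no information about GPA yet. To make the forward pass concrete, a sketch that recomputes it by hand from the layer weights and checks it against the model's output (random placeholder inputs stand in for `X_train_tensor`):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(2, 2), nn.Sigmoid(), nn.Linear(2, 1))
x = torch.randn(5, 2)  # placeholder batch standing in for X_train_tensor[:5]

# nn.Linear computes x @ W.T + b
W1, b1 = model[0].weight, model[0].bias
W2, b2 = model[2].weight, model[2].bias

hidden = torch.sigmoid(x @ W1.T + b1)  # hidden activations, shape (5, 2)
manual = hidden @ W2.T + b2            # predictions, shape (5, 1)

print(torch.allclose(manual, model(x)))
```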

In [13]:
from torch.nn import MSELoss

In [14]:
# Loss: mean squared error between predictions and true first-year GPAs
criterion = MSELoss()
loss = criterion(preds, y_train_tensor)
print(loss)

tensor(10.2468, grad_fn=<MseLossBackward0>)
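`MSELoss` with its default `reduction="mean"` is the average of the squared differences between predictions and targets. It is also why `y` was reshaped to `(n, 1)`: a 1-D target would broadcast against the `(n, 1)` predictions into an `(n, n)` grid. A sketch with random placeholder tensors:

```python
import warnings

import torch
from torch.nn import MSELoss

torch.manual_seed(0)
preds = torch.randn(4, 1)   # placeholder predictions, shape (n, 1)
target = torch.randn(4, 1)  # placeholder targets with the same shape

criterion = MSELoss()       # default reduction="mean"
manual = ((preds - target) ** 2).mean()
print(torch.allclose(criterion(preds, target), manual))

# Pitfall: a 1-D target of shape (n,) broadcasts against (n, 1) into an (n, n) grid
flat_target = target.flatten()
print((preds - flat_target).shape)

# MSELoss still returns a value, but PyTorch emits a UserWarning about the size mismatch
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    criterion(preds, flat_target)
print([str(w.message) for w in caught])
```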


# Comparing predictions with targets

In [15]:
preds[:5]

tensor([[-0.6428],
        [-0.6315],
        [-0.5385],
        [-0.6656],
        [-0.5112]], grad_fn=<SliceBackward0>)

In [16]:
y_train_tensor[:5]

tensor([[2.0000],
        [3.1100],
        [1.6300],
        [3.0200],
        [1.5500]])
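Every untrained prediction is negative while every target GPA is positive, so the model currently undershoots by two to four grade points; these gaps are what training has to shrink. A sketch that computes the residuals and the squared error for just the five rows printed above (values as rounded in the output):

```python
import torch

# The first five predictions and targets printed above (untrained model, rounded values)
preds = torch.tensor([[-0.6428], [-0.6315], [-0.5385], [-0.6656], [-0.5112]])
targets = torch.tensor([[2.00], [3.11], [1.63], [3.02], [1.55]])

residuals = targets - preds  # how far each prediction falls short of the true GPA
print(residuals.flatten())

# Mean squared error over these five rows only (the 10.2468 above averages all 800 training rows)
mse = ((preds - targets) ** 2).mean()
print(mse)
```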

In [17]:
model[0].weight

Parameter containing:
tensor([[-0.3846,  0.2908],
        [ 0.1662,  0.6213]], requires_grad=True)

In [18]:
model[2].weight

Parameter containing:
tensor([[ 0.1072, -0.6789]], requires_grad=True)
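The `requires_grad=True` flag marks these weights as trainable: PyTorch records how the loss depends on them, so `loss.backward()` can compute a gradient for each one. The notebook stops before training; the following is a minimal sketch of a single gradient-descent step, assuming plain `torch.optim.SGD` with an arbitrarily chosen learning rate of 0.1 and random placeholder data instead of the Prodigy University tensors:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Placeholder data with the notebook's feature/target shapes (not the real dataset)
X = torch.randn(16, 2)
y = torch.randn(16, 1)

model = nn.Sequential(nn.Linear(2, 2), nn.Sigmoid(), nn.Linear(2, 1))
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # learning rate is an arbitrary choice

before = model[0].weight.detach().clone()

loss = criterion(model(X), y)
optimizer.zero_grad()
loss.backward()  # fills .grad for every parameter with requires_grad=True

# Each gradient has the same shape as its weight
print(model[0].weight.grad.shape, model[2].weight.grad.shape)

optimizer.step()  # plain SGD update: weight <- weight - lr * grad
print(torch.allclose(model[0].weight, before - 0.1 * model[0].weight.grad))
```

Repeating this step over many iterations (epochs) is what would move the predictions from the negative values above toward the actual GPA range.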