# Simple Linear Regression example: predicting students' first-year college GPA from their SAT scores and high school GPA

__Prodigy University__ is seeking to enhance its enrollment process. They plan to do so by implementing a predictive analytics model aimed at identifying prospective students who demonstrate a high potential for academic success. 

__The goal is to develop a predictive model that can accurately forecast the first-year college GPA of applicants based on their SAT scores and high school GPA. This model is intended to serve as a strategic tool for the admissions office, enabling them to efficiently shortlist candidates who not only meet the academic standards of the university but are also likely to thrive in their chosen fields of study.__ By doing so, the university aspires to optimize its student selection process, improve academic outcomes, and foster an environment of excellence and high achievement.

# Load the data

In [52]:
import pandas as pd

In [53]:
# Load the dataset
data = pd.read_csv('Prodigy University Dataset.csv')
# Preview the first five rows
data.head()

Unnamed: 0,sat_sum,hs_gpa,fy_gpa
0,508,3.4,3.18
1,488,4.0,3.33
2,464,3.75,3.25
3,380,3.75,2.42
4,428,4.0,2.63


# Data Preprocessing

Converting the data to NumPy arrays is a necessary intermediate step here: a pandas DataFrame cannot be passed directly to torch.tensor, but a NumPy array can.
It is also easier to change dimensions for matrix multiplication in NumPy than in pandas.

In [54]:
# Converting data to numpy
X = data[['sat_sum', 'hs_gpa']].values
# reshape the fy_gpa into a 2D array with [data_size] rows and 1 column
y = data['fy_gpa'].values.reshape(-1, 1)
print(X.shape)
print(y.shape)

(1000, 2)
(1000, 1)
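
To make the conversion path concrete, here is a minimal sketch on a three-row frame. The frame `toy` and every variable name in it are hypothetical; the values are simply the first three rows shown in the preview above.

```python
import pandas as pd
import torch

# Hypothetical three-row frame mimicking the columns of the real dataset
toy = pd.DataFrame({
    "sat_sum": [508, 488, 464],
    "hs_gpa": [3.4, 4.0, 3.75],
    "fy_gpa": [3.18, 3.33, 3.25],
})

# .values returns a NumPy array; selecting two columns gives shape (3, 2)
X_toy = toy[["sat_sum", "hs_gpa"]].values
# A single column is 1-D with shape (3,); reshape(-1, 1) turns it into (3, 1)
y_flat = toy["fy_gpa"].values
y_toy = y_flat.reshape(-1, 1)

# torch.tensor accepts NumPy arrays directly
X_toy_tensor = torch.tensor(X_toy, dtype=torch.float32)
y_toy_tensor = torch.tensor(y_toy, dtype=torch.float32)

print(X_toy.shape, y_flat.shape, y_toy.shape)
print(X_toy_tensor.shape, y_toy_tensor.shape)
```

Keeping the target as a column matters later: the model outputs one column per sample, so a target of shape (N, 1) lines up with it element by element, while a 1-D target would be broadcast against the predictions instead.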


Next, we split the data into training and test sets in an 80%/20% proportion.
Then we normalize the features using StandardScaler.
The last step is converting the data to tensors.

In [55]:
from sklearn.model_selection import train_test_split
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [56]:
from sklearn.preprocessing import StandardScaler

# Standardize the features so the model is easier to train
scaler = StandardScaler()
# Fit the scaler on the training set only, then reuse its statistics for the test set
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
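
A common convention is to learn the scaling statistics from the training split only and reuse them for the test split, so that no information from the test set leaks into preprocessing. The sketch below illustrates that pattern on synthetic data; the array `X_demo`, its value ranges, and all variable names are assumptions made for illustration, not part of the real dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the two features (values are made up)
rng = np.random.default_rng(0)
X_demo = np.column_stack([
    rng.normal(450, 50, size=100),   # SAT-like scale
    rng.normal(3.5, 0.4, size=100),  # GPA-like scale
])

# 80% / 20% split, as in the notebook
X_tr, X_te = train_test_split(X_demo, test_size=0.2, random_state=42)

demo_scaler = StandardScaler()
X_tr_scaled = demo_scaler.fit_transform(X_tr)  # learns mean/std from training rows
X_te_scaled = demo_scaler.transform(X_te)      # reuses the training mean/std

print(X_tr_scaled.mean(axis=0))  # approximately 0 for both columns
print(X_te_scaled.mean(axis=0))  # generally near 0, but not exactly 0
```

The test-set means are not exactly zero because the test rows are scaled with the training statistics, which is the intended behaviour.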

In [57]:
X_train.shape

(800, 2)

In [58]:
import torch
# Convert numpy to PyTorch tensors
X_train_tensor = torch.tensor(X_train, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train, dtype=torch.float32)
X_test_tensor = torch.tensor(X_test, dtype=torch.float32)
y_test_tensor = torch.tensor(y_test, dtype=torch.float32)

# Building the model 

In [59]:
import torch.nn as nn

We use a sigmoid activation in the hidden layer and a linear output layer (no activation), since this is a regression problem.

In [60]:
# Build a model with one hidden layer of 2 neurons
model = nn.Sequential(
    nn.Linear(2, 2),
    nn.Sigmoid(),
    nn.Linear(2, 1)
)
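
To see what this architecture computes, the forward pass can be written out by hand as sigmoid(x W1ᵀ + b1) W2ᵀ + b2 and compared with the output of the network. This is a sketch under stated assumptions: `net` is a fresh copy of the same architecture (so its random weights differ from the ones in this notebook), and the input `x` is made up.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only to make the sketch repeatable

# Fresh copy of the same architecture (weights differ from the notebook's model)
net = nn.Sequential(
    nn.Linear(2, 2),
    nn.Sigmoid(),
    nn.Linear(2, 1),
)

x = torch.randn(4, 2)  # four made-up samples with two features each

# The same computation written out layer by layer
W1, b1 = net[0].weight, net[0].bias  # shapes (2, 2) and (2,)
W2, b2 = net[2].weight, net[2].bias  # shapes (1, 2) and (1,)
hidden = torch.sigmoid(x @ W1.T + b1)  # hidden activations, each in (0, 1)
manual = hidden @ W2.T + b2            # linear output, unbounded

print(torch.allclose(manual, net(x), atol=1e-6))
```

Because the output layer has no activation, the predictions are not restricted to any range, which is what a regression target such as GPA needs.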

In [61]:
# Forward Propagation
preds = model(X_train_tensor)

In [62]:
preds[:5]

tensor([[-0.6853],
        [-0.6904],
        [-0.7363],
        [-0.6714],
        [-0.7470]], grad_fn=<SliceBackward0>)

In [63]:
from torch.nn import MSELoss

In [64]:
# Calculating Loss
criterion = MSELoss()
loss = criterion(preds, y_train_tensor)
print(loss)

tensor(10.3439, grad_fn=<MseLossBackward0>)
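
MSELoss with its default settings is the mean of the squared differences between predictions and targets. As a rough check, the sketch below computes it by hand for the first five predictions and targets printed in this notebook. It covers only those five samples, so the result is not expected to equal the loss over the full training set; the names `p`, `t`, `by_hand` and `built_in` are illustrative.

```python
import torch
from torch.nn import MSELoss

# First five predictions and targets, copied from the outputs in this notebook
p = torch.tensor([[-0.6853], [-0.6904], [-0.7363], [-0.6714], [-0.7470]])
t = torch.tensor([[2.00], [3.11], [1.63], [3.02], [1.55]])

by_hand = ((p - t) ** 2).mean()  # mean of the squared differences
built_in = MSELoss()(p, t)       # default reduction is "mean"

print(by_hand.item(), built_in.item())
```

The loss is large at this point because the model is untrained: its predictions are negative while the GPA targets are positive.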


# Comparing predictions with Target

In [65]:
preds[:5]

tensor([[-0.6853],
        [-0.6904],
        [-0.7363],
        [-0.6714],
        [-0.7470]], grad_fn=<SliceBackward0>)

In [66]:
y_train_tensor[:5]

tensor([[2.0000],
        [3.1100],
        [1.6300],
        [3.0200],
        [1.5500]])

In [67]:
model[0].weight

Parameter containing:
tensor([[ 0.5312,  0.0606],
        [-0.2949, -0.3346]], requires_grad=True)

In [68]:
model[2].weight

Parameter containing:
tensor([[-0.2169, -0.6769]], requires_grad=True)
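
The two cells above show only the weight matrices; each linear layer also has a bias vector. A small sketch for listing every trainable parameter and counting them, assuming a fresh copy of the same architecture (named `net` here for illustration):

```python
import torch.nn as nn

# Fresh copy of the architecture used in this notebook
net = nn.Sequential(nn.Linear(2, 2), nn.Sigmoid(), nn.Linear(2, 1))

# Names are prefixed with the layer's position inside the Sequential container
names = []
for name, param in net.named_parameters():
    names.append(name)
    print(name, tuple(param.shape))

# Hidden layer: 2*2 weights + 2 biases; output layer: 2*1 weights + 1 bias
total = sum(p.numel() for p in net.parameters())
print(total)
```

The sigmoid layer contributes no parameters, so only indices 0 and 2 appear in the listing.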