# PyTorch Loss Functions
* Notebook by Adam Lang
* Date: 9/30/2024

# Loss Functions
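* A loss function measures the error in a network's predictions.
* Loss function --> Cost function (the two terms are often used interchangeably; strictly, the loss is per example and the cost is the average over the dataset)
      * compares network preds vs. actual target values
      * Loss functions are a "feedback mechanism": the error signal guides how the weights are updated during training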
* Loss Functions categorized:
1. **Classification**
  * Binary Cross-Entropy or Log Loss
    * measures how far the predicted probabilities are from the true binary outcome
    * Goal: minimize loss func during training
    * Rules of thumb:
       * Low log loss --> better model
       * High log loss --> poor model
  * Multi-class classification problems
    * The multi-class generalization of BCE is **Categorical Cross-Entropy Loss**
    * Example: predicting one of 5 classes --> use categorical cross-entropy as the loss function
2. **Regression**
  * Mean Squared Error (MSE) - average squared difference between preds vs. actual values
     * MSE is sensitive to outliers: squaring magnifies large errors, so a few extreme points can dominate the loss.
  * Mean Absolute Error (MAE) - average absolute difference between preds vs. actual values
     * MAE is often preferred when outliers should not dominate the loss (e.g. predicting delivery times, where a few unusually late deliveries should not skew the model).
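
The loss functions listed above can be compared on a few toy numbers. This is a minimal sketch, assuming made-up probabilities, logits, and targets (none of these values come from the notebook's dataset). It uses `nn.BCELoss`, `nn.CrossEntropyLoss`, `nn.MSELoss`, and `nn.L1Loss` (PyTorch's name for MAE).

```python
import math
import torch
import torch.nn as nn

# --- Classification: binary cross-entropy (log loss) ---
# Hypothetical predicted probabilities and true 0/1 labels.
probs = torch.tensor([0.9, 0.2, 0.8])
labels = torch.tensor([1.0, 0.0, 1.0])
bce = nn.BCELoss()(probs, labels).item()
# Same quantity by hand: -mean(y*log(p) + (1-y)*log(1-p))
bce_by_hand = -(math.log(0.9) + math.log(1 - 0.2) + math.log(0.8)) / 3

# --- Classification: cross-entropy over 5 classes ---
# nn.CrossEntropyLoss expects raw scores (logits) and integer class indices.
logits = torch.tensor([[2.0, 0.5, 0.1, 0.1, 0.1],
                       [0.1, 0.2, 3.0, 0.1, 0.1]])
classes = torch.tensor([0, 2])
ce = nn.CrossEntropyLoss()(logits, classes).item()

# --- Regression: MSE vs. MAE, with and without one outlier ---
actual = torch.tensor([2.0, 3.0, 4.0, 5.0])
clean_preds = torch.tensor([2.5, 3.5, 4.5, 5.5])     # every pred off by 0.5
outlier_preds = torch.tensor([2.5, 3.5, 4.5, 15.0])  # last pred off by 10

mse_fn, mae_fn = nn.MSELoss(), nn.L1Loss()
mse_clean = mse_fn(clean_preds, actual).item()
mae_clean = mae_fn(clean_preds, actual).item()
mse_outlier = mse_fn(outlier_preds, actual).item()
mae_outlier = mae_fn(outlier_preds, actual).item()

print(f"BCE: {bce:.4f} (by hand: {bce_by_hand:.4f})")
print(f"5-class cross-entropy: {ce:.4f}")
print(f"MSE clean / outlier: {mse_clean:.4f} / {mse_outlier:.4f}")
print(f"MAE clean / outlier: {mae_clean:.4f} / {mae_outlier:.4f}")
```

In this toy case a single bad prediction multiplies the MSE by roughly 100 but the MAE by less than 6, which is the outlier sensitivity described above.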

## Simple Linear Regression example to predict first year college grades of students from high school SAT and GPA scores
* A university is seeking to enhance its enrollment process.
* They would like to do this by implementing predictive analytics modeling aimed at identifying prospective students who demonstrate high potential for academic success.
* The goal here is to develop a predictive model that can accurately forecast the first-year college GPA of applicants based on SAT scores and high school grades.
* Ultimately this can be considered an optimization problem as the university aims to optimize its selection process, improve academic outcomes and foster an environment of high academic achievement.

## TL;DR
* This is a predictive analytics problem.
* We are trying to predict a target, which is first-year college GPA (`fy_gpa`).
* This could also be considered an optimization problem --> what GPA is considered most ideal to optimize the student selection process?

In [1]:
## import libraries
import pandas as pd
import numpy as np

In [2]:
## data path
data_path = '/content/drive/MyDrive/Colab Notebooks/Deep Learning Notebooks/Prodigy University Dataset.csv'
## load data
data = pd.read_csv(data_path)
data.head()

Unnamed: 0,sat_sum,hs_gpa,fy_gpa
0,508,3.4,3.18
1,488,4.0,3.33
2,464,3.75,3.25
3,380,3.75,2.42
4,428,4.0,2.63


## Data Preprocessing
* Data dictionary:
  * `sat_sum` = SAT score
  * `hs_gpa` = high school gpa
  * `fy_gpa` = gpa in first year of college

We convert the data to NumPy arrays for 2 reasons:
1. Efficient matrix operations
2. Easy conversion to PyTorch tensors

In [3]:
## convert the feature columns to a 2D NumPy array
X = data[['sat_sum', 'hs_gpa']].values

## reshape fy_gpa into a 2D array with [data_size] rows and 1 col
## so the target shape (n, 1) matches the model's (n, 1) output for the 2 independent vars
y = data['fy_gpa'].values.reshape(-1, 1)


print(f"Shape of X: {X.shape}")
print(f"Shape of y: {y.shape}")

Shape of X: (1000, 2)
Shape of y: (1000, 1)
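
Why the `reshape(-1, 1)` matters can be sketched with a few toy values (hypothetical numbers, not the notebook's data). If the target stays a flat array of shape `(n,)`, subtracting it from predictions of shape `(n, 1)` broadcasts to an `(n, n)` matrix, and the averaged squared error comes out wrong without raising an error. PyTorch tensors follow the same broadcasting rules as NumPy.

```python
import numpy as np

# Toy predictions shaped like a model output: (n, 1)
toy_preds = np.array([[1.0], [2.0], [3.0]])
# Toy targets left as a flat array: (n,)
toy_y_flat = np.array([1.0, 2.0, 3.0])

# Mismatched shapes broadcast to (3, 3): every pred is compared with every target.
diff_wrong = toy_preds - toy_y_flat
# Matching shapes give the intended element-wise difference: (3, 1).
diff_right = toy_preds - toy_y_flat.reshape(-1, 1)

mse_wrong = (diff_wrong ** 2).mean()
mse_right = (diff_right ** 2).mean()

print(diff_wrong.shape, mse_wrong)  # non-zero even though preds equal targets
print(diff_right.shape, mse_right)
```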


In [4]:
## create train_test_split
from sklearn.model_selection import train_test_split

## split data into train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In [5]:
## standard scaler
from sklearn.preprocessing import StandardScaler

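## standardize features so the model is easier to train
## setup scaler
scaler = StandardScaler()

## fit the scaler on the training features only, then reuse it on the test features
## (fitting again on the test set would leak test statistics into preprocessing)
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)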
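
The usual scikit-learn convention is to learn the scaling statistics from the training split and reuse them on the test split. The following is a minimal sketch of that convention on randomly generated stand-in data; the `toy_` names, the seed, and the distribution parameters are illustrative assumptions.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Random stand-in features (an assumption, not the university dataset).
rng = np.random.default_rng(0)
toy_train = rng.normal(loc=400.0, scale=50.0, size=(80, 2))
toy_test = rng.normal(loc=400.0, scale=50.0, size=(20, 2))

toy_scaler = StandardScaler()
# fit_transform: learn mean/std from the training split and scale it.
toy_train_scaled = toy_scaler.fit_transform(toy_train)
# transform: scale the test split with the *training* mean/std.
toy_test_scaled = toy_scaler.transform(toy_test)

# Training features end up with mean ~0 and std ~1 per column;
# test features are close to that but not exact, which is expected.
print(toy_train_scaled.mean(axis=0), toy_train_scaled.std(axis=0))
print(toy_test_scaled.mean(axis=0), toy_test_scaled.std(axis=0))
```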

In [6]:
## let's get the shape of X_train
print(f"Shape of X_train: {X_train.shape}")

Shape of X_train: (800, 2)


Final step in preprocessing - convert to tensors

In [7]:
import torch
## convert numpy to PyTorch tensors
X_train_tensor = torch.tensor(X_train, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train, dtype=torch.float32)

X_test_tensor = torch.tensor(X_test, dtype=torch.float32)
y_test_tensor = torch.tensor(y_test, dtype=torch.float32)

In [8]:
## shape of X_train_tensor
print(f"X_train_tensor shape: {X_train_tensor.shape}")

X_train_tensor shape: torch.Size([800, 2])


# Build Linear Regression Model in PyTorch
* A good review of activation functions for neural nets: https://machinelearningmastery.com/choose-an-activation-function-for-deep-learning/
* We will use a `Sigmoid()` activation in the hidden layer. Sigmoid is also called the logistic function, the same function used in logistic regression.

In [9]:
## torch.nn for neural network building
import torch.nn as nn

Neural Network:
  * Linear input layer - 2 input features (independent variables) mapped to 2 hidden units, so its weight matrix is 2x2
  * Sigmoid hidden layer
  * Linear output layer --> 1 output variable, the target we are predicting.

In [10]:
## build a model with one hidden layer of 2 neurons
## Sequential --> layers are applied in order during forward propagation
model = nn.Sequential(
    nn.Linear(2, 2), ##2 inputs, 2 outputs
    nn.Sigmoid(), ## non-linear logistic hidden layer
    nn.Linear(2, 1) ##2 inputs, 1 output --> target pred
)

Summary:
* Note: we have NOT used an output activation function here.
* Thus the output is linear (identity), which suits regression because the predicted GPA is not squashed into a fixed range.
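
To make the architecture concrete, the forward pass of a network with this layout can be reproduced by hand. This is a minimal sketch on a freshly initialized `toy_model` with random inputs (both are illustrative assumptions, so the numbers will not match the notebook's outputs).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Same layout as the notebook's model: Linear(2, 2) -> Sigmoid -> Linear(2, 1)
toy_model = nn.Sequential(nn.Linear(2, 2), nn.Sigmoid(), nn.Linear(2, 1))
toy_x = torch.randn(4, 2)  # 4 hypothetical rows, 2 features each

with torch.no_grad():
    # Hidden layer: x @ W1^T + b1, then sigmoid
    hidden = torch.sigmoid(toy_x @ toy_model[0].weight.T + toy_model[0].bias)
    # Output layer: h @ W2^T + b2, with no activation (linear output)
    manual_out = hidden @ toy_model[2].weight.T + toy_model[2].bias
    model_out = toy_model(toy_x)

# Parameter count: (2*2 weights + 2 biases) + (2*1 weights + 1 bias) = 9
n_params = sum(p.numel() for p in toy_model.parameters())

print(torch.allclose(manual_out, model_out, atol=1e-6), n_params)
```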

In [11]:
## Forward propagation nn
preds = model(X_train_tensor)

In [12]:
## let's see the first 5 preds
preds[:5]

tensor([[0.5266],
        [0.5279],
        [0.5338],
        [0.5209],
        [0.5403]], grad_fn=<SliceBackward0>)

In [13]:
## Compute loss - MSE or mean squared error (common for regression)
from torch.nn import MSELoss

In [14]:
## calculate loss
criterion = MSELoss()
loss = criterion(preds, y_train_tensor)
print(loss)

tensor(4.3082, grad_fn=<MseLossBackward0>)


## Compare predictions on X_train with Target

In [15]:
preds[:5]

tensor([[0.5266],
        [0.5279],
        [0.5338],
        [0.5209],
        [0.5403]], grad_fn=<SliceBackward0>)

In [16]:
## y_train
y_train_tensor[:5]

tensor([[2.0000],
        [3.1100],
        [1.6300],
        [3.0200],
        [1.5500]])
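
As a sanity check on what `MSELoss` computes, the five predictions and five targets printed above can be plugged in by hand. This is a small sketch on those five rows only, so the result will differ from the loss over all 800 training rows.

```python
import torch
from torch.nn import MSELoss

# The five predictions and targets printed in the cells above.
sample_preds = torch.tensor([[0.5266], [0.5279], [0.5338], [0.5209], [0.5403]])
sample_targets = torch.tensor([[2.00], [3.11], [1.63], [3.02], [1.55]])

# MSE by hand: square each difference, then average.
squared_errors = (sample_preds - sample_targets) ** 2
manual_mse = squared_errors.mean()

# The same value from PyTorch's built-in criterion.
library_mse = MSELoss()(sample_preds, sample_targets)

print(manual_mse.item(), library_mse.item())
```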

Summary:
* The predictions (about 0.5) are far from the targets (roughly 1.5 to 3.1) because the weights are still at their initial values. The model has not been trained yet, so training is needed to reduce the loss.
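
One common way to carry out that training is a gradient descent loop. The sketch below is not taken from this notebook: the synthetic data, the SGD optimizer, the learning rate of 0.1, and the 200 epochs are all illustrative assumptions, and the `toy_` names are hypothetical.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic stand-in data (an assumption, not the university dataset):
# two standardized features and a GPA-like target with a little noise.
toy_X = torch.randn(200, 2)
toy_y = 2.5 + 0.4 * toy_X[:, :1] + 0.3 * toy_X[:, 1:] + 0.1 * torch.randn(200, 1)

toy_model = nn.Sequential(nn.Linear(2, 2), nn.Sigmoid(), nn.Linear(2, 1))
toy_criterion = nn.MSELoss()
# Plain SGD with a guessed learning rate; other optimizers would also work.
toy_optimizer = torch.optim.SGD(toy_model.parameters(), lr=0.1)

initial_loss = toy_criterion(toy_model(toy_X), toy_y).item()

for epoch in range(200):
    toy_optimizer.zero_grad()                           # clear old gradients
    step_loss = toy_criterion(toy_model(toy_X), toy_y)  # forward pass + loss
    step_loss.backward()                                # backpropagate
    toy_optimizer.step()                                # update the weights

final_loss = toy_criterion(toy_model(toy_X), toy_y).item()
print(f"loss before training: {initial_loss:.4f}, after: {final_loss:.4f}")
```

Each iteration repeats the forward pass and loss computation shown earlier, then uses the gradients of the loss to nudge the weights, which is the "feedback mechanism" role of the loss function.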

In [17]:
## let's see the (randomly initialized) weights of the first linear layer
model[0].weight

Parameter containing:
tensor([[ 0.0667, -0.2492],
        [ 0.3345, -0.4954]], requires_grad=True)

In [18]:
## weights of the output layer
model[2].weight

Parameter containing:
tensor([[ 0.6110, -0.2186]], requires_grad=True)