# HW04: ML and DL

Remember that these homework work as a completion grade. **You can skip one section without losing credit.**

## Load and Pre-process Text
We do sentiment analysis on the [Movie Review Data](https://www.cs.cornell.edu/people/pabo/movie-review-data/). If you would like to know more about the data, have a look at [the paper](https://www.cs.cornell.edu/home/llee/papers/pang-lee-stars.pdf) (but no need to do so).

In [2]:
# In this tutorial, we do sentiment analysis
# download the data
#!wget https://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz
#!tar xf aclImdb_v1.tar.gz

!wget https://www.cs.cornell.edu/people/pabo/movie-review-data/scale_data.tar.gz
!wget https://www.cs.cornell.edu/people/pabo/movie-review-data/scale_whole_review.tar.gz

!tar xf scale_data.tar.gz
!tar xf scale_whole_review.tar.gz

--2024-03-20 10:54:28--  https://www.cs.cornell.edu/people/pabo/movie-review-data/scale_data.tar.gz
Resolving www.cs.cornell.edu (www.cs.cornell.edu)... 132.236.207.36
Connecting to www.cs.cornell.edu (www.cs.cornell.edu)|132.236.207.36|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4029756 (3.8M) [application/x-gzip]
Saving to: ‘scale_data.tar.gz’


2024-03-20 10:54:29 (12.6 MB/s) - ‘scale_data.tar.gz’ saved [4029756/4029756]

--2024-03-20 10:54:29--  https://www.cs.cornell.edu/people/pabo/movie-review-data/scale_whole_review.tar.gz
Resolving www.cs.cornell.edu (www.cs.cornell.edu)... 132.236.207.36
Connecting to www.cs.cornell.edu (www.cs.cornell.edu)|132.236.207.36|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 8853204 (8.4M) [application/x-gzip]
Saving to: ‘scale_whole_review.tar.gz’


2024-03-20 10:54:30 (22.1 MB/s) - ‘scale_whole_review.tar.gz’ saved [8853204/8853204]



First, we have to load the data for which we provide the function below. Note how we also preprocess the text using gensim's simple_preprocess() function and how we already split the data into a train and test split.

In [None]:
import os
from gensim.utils import simple_preprocess
def load_data():
    examples, labels = [], []
    authors = os.listdir("scale_whole_review")
    for author in authors:
        path = os.listdir(os.path.join("scale_whole_review", author, "txt.parag"))
        fn_ids = os.path.join("scaledata", author, "id." + author)
        fn_ratings = os.path.join("scaledata", author, "rating." + author)
        with open(fn_ids) as ids, open(fn_ratings) as ratings:
            for idx, rating in zip(ids, ratings):
                labels.append(float(rating.strip()))
                filename_text = os.path.join("scale_whole_review", author, "txt.parag", idx.strip() + ".txt")
                with open(filename_text, encoding='latin-1') as f:
                    examples.append(" ".join(simple_preprocess(f.read())))
    return examples, labels

X,y  = load_data()
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
print ("text:", X_train[0], "\nlabel:", y_train[0])

text: for what it worth correctly guessed the identity of the killer in scream well sort of suppose should feel satisfied at my own cleverness since dimension and the makers of scream have put so much effort into keeping that piece of information secret even more so than in the original scream writer kevin williamson goes to ridiculous extremes to keep the audience guessing whodunnit so ridiculous that the film becomes too focused on the one thing which should have been least important as horror film it solid piece of work as satire it frequently hilarious as mystery it tries way way too hard scream takes place two years after the events of the original just in time for hollywood to cash in on the woodsboro high murders the non fiction book by reporter gale weathers courteney cox has become popular horror film called stab which in turn appears to have generated copycat killer when two college students turn up dead at the film premiere sidney prescott neve campbell once again begins to 

## Vectorize the data

In [None]:
# train a TF_IDF Vectorizer on X_train and vectorize X_train and X_test
from sklearn.feature_extraction.text import TfidfVectorizer

vec = TfidfVectorizer(min_df=0.01, # at min 1% of docs
                        max_df=.5,
                        stop_words='english',
                        ngram_range=(1,2))

##TODO train vectorizer

vec.fit(X_train)

##TODO transform X_train to TF-IDF values
X_train_tfidf = vec.transform(X_train)
##TODO transform X_test to TF-IDF values
X_test_tfidf = vec.transform(X_test)

In [None]:
##TODO scale both training and test data with the standard scaler
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler(with_mean=False)
scaler.fit(X_train_tfidf)
X_train_scaled = scaler.transform(X_train_tfidf)
X_test_scaled = scaler.transform(X_test_tfidf)

## ElasticNet

In [None]:
##TODO train an elastic net on the transformed output of the scaler
from sklearn.linear_model import ElasticNet

en = ElasticNet(alpha=0.01)

##TODO train the ElasticNet
en.fit(X_train_scaled, y_train)
##TODO predict the testset
y_pred = en.predict(X_test_scaled)
##TODO print mean squared error and r2 score on the test set
from sklearn.metrics import r2_score, accuracy_score, mean_squared_error, balanced_accuracy_score
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print("Mean Squared Error:", mse)
print("R-squared:", r2)

Mean Squared Error: 0.01701246255023818
R-squared: 0.49683493599243544


## Logistic Regression

Next, we train an OLS model doing binary prediction on these movie reviews. Two get two bins, we transform the continuous ratings into two classes, where one class contains all the negative ratings (value < 0.5), the other class all the positive ratings (value > 0.5)

In [None]:
y_train = [1 if i >= 0.5 else 0 for i in y_train]
y_test = [1 if i >= 0.5 else 0 for i in y_test]


In [None]:
##TODO train logistic regression on X_train
from sklearn.linear_model import LogisticRegression
logistic_regression = LogisticRegression()

##TODO train a logistic regression
logistic_regression.fit(X_train_tfidf, y_train)

##TODO predict the testset
y_pred = logistic_regression.predict(X_test_tfidf)

##since we have continuous output, we need to post-process our labels into two classes. We choose a threshold of 0.5
def map_predictions(predicted):
    predicted = [1 if i > 0.5 else 0 for i in predicted]
    return predicted

y_pred_binary = map_predictions(y_pred)
y_test_binary = map_predictions(y_test)

##TODO print the accuracy of our classifier on the testset
accuracy = accuracy_score(y_test_binary, y_pred_binary)
print("Accuracy:", accuracy)

Accuracy: 0.7487893462469734


In [None]:
## TODO print the 10 most informative words of the regression (the 10 words having the highest coefficients)
def get_top_features_with_coefs(model, vectorizer, n=10):
    """Gets the top 'n' features along with their coefficients"""
    feature_names = vectorizer.get_feature_names_out()
    coefs = model.coef_.ravel()
    top_features = sorted(zip(coefs, feature_names), key=lambda x: x[0], reverse=True)[:n]
    return [x[1] for x in top_features]

top_10_features = get_top_features_with_coefs(logistic_regression, vec, 10)
print("Top 10 Informative Words (Features):", top_10_features)

Top 10 Informative Words (Features): ['great', 'fine', 'best', 'quite', 'effective', 'play', 'true', 'easy', 'fascinating', 'performance']


Interesting that it finds the word quite to be an informative word, when realistically it is neutral

# Deep Learning

## MLP

In [3]:
!wget https://raw.githubusercontent.com/mhjabreel/CharCnn_Keras/master/data/ag_news_csv/train.csv

--2024-03-20 10:55:45--  https://raw.githubusercontent.com/mhjabreel/CharCnn_Keras/master/data/ag_news_csv/train.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.108.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 29470338 (28M) [text/plain]
Saving to: ‘train.csv’


2024-03-20 10:55:45 (163 MB/s) - ‘train.csv’ saved [29470338/29470338]



In [67]:
#Import the AG news dataset (same as hw01)
#Download them from here
import pandas as pd
import nltk
df = pd.read_csv('train.csv')

df.columns = ["label", "title", "lead"]
label_map = {1:"world", 2:"sport", 3:"business", 4:"sci/tech"}
def replace_label(x):
	return label_map[x]
df["label"] = df["label"].apply(replace_label)
df["text"] = df["title"] + " " + df["lead"]
df = df.sample(n=500)
df.head()

Unnamed: 0,label,title,lead,text
44943,sci/tech,Key Part of Patriot Act Ruled Unconstitutional,A federal judge in New York ruled that a key c...,Key Part of Patriot Act Ruled Unconstitutional...
95818,sci/tech,Oracle moves to quarterly patch release schedule,Oracle today announced that it is moving to a ...,Oracle moves to quarterly patch release schedu...
107149,sport,Meyer can leave Utah without buyout,Notre Dame officials will meet with Utah coach...,Meyer can leave Utah without buyout Notre Dame...
92956,sport,Auburn's Tailback Duo Share Spotlight (AP),AP - Auburn tailbacks Carnell Williams and Ron...,Auburn's Tailback Duo Share Spotlight (AP) AP ...
16392,sci/tech,Microsoft to offer Windows XP security pack at...,TOKYO - Microsoft Corp said Wednesday it will ...,Microsoft to offer Windows XP security pack at...


In [68]:
# create a new variable "business" that takes value 1 if the label is business and 0 otherwise
df['business'] = df['label'].apply(lambda x: int(x=='business'))
y = df['business'].values
df['business'].head()

44943     0
95818     0
107149    0
92956     0
16392     0
Name: business, dtype: int64

In [69]:
import spacy
nlp = spacy.load('en_core_web_sm')
from sklearn.feature_extraction.text import CountVectorizer

# pre-process text as you did in HW02
def tokenize(x):
    return [w.lemma_.lower() for w in nlp(x) if not w.is_stop and not w.is_punct and not w.is_digit]
df["tokens"] = df["text"].apply(lambda x: tokenize(x))
df["preprocessed"] = df['tokens'].apply(lambda x: ' '.join(x))
df["preprocessed_text"] = df["preprocessed"].apply(lambda x: " ".join(x))

In [60]:
df

Unnamed: 0,label,title,lead,text,business,tokens,preprocessed,preprocessed_text
19081,world,"Frances Strikes, Knocks Out Power to 4M","STUART, Fla. - Hurricane Frances weakened but ...","Frances Strikes, Knocks Out Power to 4M STUART...",0,"[frances, strikes, knocks, power, m, stuart, f...",frances strikes knocks power m stuart fla. hur...,f r a n c e s s t r i k e s k n o c k s ...
100330,business,Jobless Claims Fall to Three-Month Low (AP),AP - America's factories saw orders for big-ti...,Jobless Claims Fall to Three-Month Low (AP) AP...,1,"[jobless, claims, fall, month, low, ap, ap, am...",jobless claims fall month low ap ap america fa...,j o b l e s s c l a i m s f a l l m o n ...
46871,world,House Panel Moves to Limit Floor Access (AP),AP - A Cabinet member's role in pressing lawma...,House Panel Moves to Limit Floor Access (AP) A...,0,"[house, panel, move, limit, floor, access, ap,...",house panel move limit floor access ap ap cabi...,h o u s e p a n e l m o v e l i m i t ...
15771,sci/tech,Miniscule atomic clock demonstrated in US,"Washington, Sept. 1. (UNI): Scientists at the ...",Miniscule atomic clock demonstrated in US Wash...,0,"[miniscule, atomic, clock, demonstrate, washin...",miniscule atomic clock demonstrate washington ...,m i n i s c u l e a t o m i c c l o c k ...
58654,sci/tech,Microsoft Releases New #39;Critical #39; Patches,Microsoft Corp. today released an unprecedente...,Microsoft Releases New #39;Critical #39; Patc...,0,"[microsoft, releases, new, , 39;critical, pat...",microsoft releases new 39;critical patches m...,m i c r o s o f t r e l e a s e s n e w ...
...,...,...,...,...,...,...,...,...
77846,sci/tech,Hacking pleads not guilty in killing of wife i...,SALT LAKE CITY - Mark Hacking pleaded not guil...,Hacking pleads not guilty in killing of wife i...,0,"[hacking, plead, guilty, killing, wife, july, ...",hacking plead guilty killing wife july salt la...,h a c k i n g p l e a d g u i l t y k i ...
102687,sport,England players hit out,ENGLAND #39;S players hit out at cricket #39;s...,England players hit out ENGLAND #39;S players ...,0,"[england, player, hit, england, 39;s, player, ...",england player hit england 39;s player hit cri...,e n g l a n d p l a y e r h i t e n g l ...
5604,business,Away on Business: Making a Difference,NEW YORK (Reuters) - Working in poverty-stric...,Away on Business: Making a Difference NEW YOR...,1,"[away, business, make, difference, , new, yor...",away business make difference new york reute...,a w a y b u s i n e s s m a k e d i f f ...
107784,sci/tech,This lens sheds a tear,"According to an article on The Register, camer...",This lens sheds a tear According to an article...,0,"[lens, shed, tear, accord, article, register, ...",lens shed tear accord article register camera ...,l e n s s h e d t e a r a c c o r d a ...


In [70]:
##TODO vectorize the pre-processed text using CountVectorizer
vectorizer = CountVectorizer()
vectorizer = CountVectorizer(min_df=0.01, # at min 1.5% of docs
                        max_df=.9,
                        max_features=100,
                        stop_words='english',
                        ngram_range=(1,3))
X = vectorizer.fit_transform(df['preprocessed'])
words = vectorizer.get_feature_names_out()
X = X.todense()
X = X / X.sum(axis=1) # counts to frequencies
for i, word in enumerate(words):
    column = X[:,i]
    df['x_'+word] = column
features = ['x_'+x for x in words]
features_and_output = ['business'] + features
df2 = df[features_and_output]
df2.dropna(how='any', inplace=True)
df2

  X = X / X.sum(axis=1) # counts to frequencies
  df['x_'+word] = column
  df['x_'+word] = column
  df['x_'+word] = column
  df['x_'+word] = column
  df['x_'+word] = column
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df2.dropna(how='any', inplace=True)


Unnamed: 0,business,x_000,x_39,x_announce,x_ap,x_ap ap,x_attack,x_baghdad,x_base,x_big,...,x_tuesday,x_united,x_victory,x_wednesday,x_week,x_win,x_world,x_year,x_yesterday,x_york
44943,0,0.000000,0.0,0.000000,0.0,0.00,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.000000,0.0,0.200000
95818,0,0.000000,0.0,0.142857,0.0,0.00,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.000000,0.0,0.000000
107149,0,0.000000,0.0,0.000000,0.0,0.00,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.000000,0.0,0.142857
92956,0,0.000000,0.0,0.000000,0.5,0.25,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.000000,0.0,0.000000
16392,0,0.090909,0.0,0.000000,0.0,0.00,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.090909,0.000000,0.0,0.0,0.000000,0.0,0.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
38245,1,0.000000,0.0,0.000000,0.0,0.00,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.000000,0.000000,0.0,0.0,0.000000,0.0,0.000000
10743,1,0.000000,0.0,0.000000,0.0,0.00,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.000000,0.200000,0.0,0.0,0.000000,0.0,0.000000
54264,0,0.000000,0.0,0.000000,0.0,0.00,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.000000,0.000000,0.2,0.2,0.000000,0.0,0.000000
97239,0,0.000000,0.0,0.000000,0.0,0.00,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.000000,0.000000,0.0,0.2,0.000000,0.0,0.000000


In [62]:
df2.corr()

Unnamed: 0,business,x_39,x_afp,x_announce,x_ap,x_ap ap,x_attack,x_bank,x_big,x_billion,...,x_tuesday,x_united,x_victory,x_wednesday,x_week,x_win,x_world,x_year,x_yesterday,x_york
business,1.000000,-0.048020,-0.043754,-0.007913,-0.140323,-0.135318,-0.048294,0.132310,0.009634,0.130698,...,0.023543,-0.030645,-0.070490,0.011525,-0.020557,-0.089499,-0.038681,-0.007387,-0.006804,0.102812
x_39,-0.048020,1.000000,-0.047963,-0.027012,-0.114367,-0.114935,-0.031442,-0.022151,-0.011128,-0.025862,...,-0.031047,-0.023700,-0.025754,-0.040774,-0.016015,-0.010104,-0.007272,-0.040483,-0.002667,-0.061379
x_afp,-0.043754,-0.047963,1.000000,-0.005233,-0.033214,-0.032200,0.008988,-0.008507,-0.011110,-0.012328,...,-0.022034,0.022506,-0.006808,-0.020580,-0.002709,-0.014433,-0.010533,-0.019139,-0.025120,-0.023834
x_announce,-0.007913,-0.027012,-0.005233,1.000000,-0.029301,-0.033783,-0.015346,-0.014698,-0.004517,-0.001304,...,0.017255,-0.012839,-0.022158,-0.018608,0.001635,-0.018060,-0.014938,-0.011203,-0.002756,-0.025667
x_ap,-0.140323,-0.114367,-0.033214,-0.029301,1.000000,0.956828,-0.020492,-0.022388,-0.021536,-0.016596,...,-0.002803,-0.027033,0.009973,-0.016429,-0.030248,-0.010455,-0.029514,-0.025723,-0.050611,-0.023300
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
x_win,-0.089499,-0.010104,-0.014433,-0.018060,-0.010455,-0.008118,-0.025594,-0.017175,-0.003754,-0.017120,...,-0.022933,-0.008647,0.065024,-0.018756,-0.022321,1.000000,-0.001223,0.000269,0.008999,-0.015873
x_world,-0.038681,-0.007272,-0.010533,-0.014938,-0.029514,-0.028456,-0.023119,-0.000784,0.013017,-0.009823,...,-0.017881,-0.011877,-0.003138,-0.025258,-0.013350,-0.001223,1.000000,-0.013515,-0.016976,-0.024215
x_year,-0.007387,-0.040483,-0.019139,-0.011203,-0.025723,-0.022864,-0.009517,-0.009884,-0.007097,-0.010707,...,-0.019480,-0.008122,-0.020102,-0.028344,-0.016914,0.000269,-0.013515,1.000000,-0.015395,-0.038533
x_yesterday,-0.006804,-0.002667,-0.025120,-0.002756,-0.050611,-0.049712,0.005816,0.023765,-0.000661,-0.009810,...,-0.039287,0.000589,0.019321,-0.037388,-0.008375,0.008999,-0.016976,-0.015395,1.000000,-0.016502


Your goal here is to use features from the Vectorized text to predict whether the snippet is from a business article.

In [106]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import TensorDataset, DataLoader
from sklearn.model_selection import train_test_split
from torchsummary import summary

import math
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

## TODO build a MLP model with at least 2 hidden layers with ReLU activation, followed by dropout and an output layer with sigmoid activation

## Setting up the NN:

class MLP(nn.Module):
    def __init__(self, input_dim, hidden_dim1, hidden_dim2, output_dim):
        super().__init__()
        self.linear1 = nn.Linear(input_dim, hidden_dim1)
        self.linear2 = nn.Linear(hidden_dim1, hidden_dim2)
        self.linear3 = nn.Linear(hidden_dim2, output_dim)
        self.dropout = nn.Dropout(p=0.25)  # not sure what dropout rate was required but this seems reasonable

    def forward(self, x):
        x = F.relu(self.linear1(x))
        x = F.relu(self.linear2(x))
        x = self.dropout(x)
        x = F.sigmoid(self.linear3(x))
        return x

X = df2[features]
y = df2['business']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train,
                                                  test_size=0.2, random_state=42)

X_train_tensor = torch.tensor(X_train.values, dtype=torch.float32)
X_test_tensor = torch.tensor(X_test.values, dtype=torch.float32)
X_val_tensor = torch.tensor(X_val.values, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train.values, dtype=torch.float32)
y_test_tensor = torch.tensor(y_test.values, dtype=torch.float32)
y_val_tensor = torch.tensor(y_val.values, dtype=torch.float32)


train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
test_dataset = TensorDataset(X_test_tensor, y_test_tensor)
val_dataset = TensorDataset(X_val_tensor, y_val_tensor)


train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=32)
val_loader = DataLoader(val_dataset, batch_size=32)


input_dim = X.shape[1]
hidden_dim1 = 64
hidden_dim2 = 64
output_dim = 1

model = MLP(input_dim, hidden_dim1, hidden_dim2, output_dim)

## TODO summarize the model using torchsummary

In [80]:
print(X.shape[1])

100


In [91]:
## Running the training

import torch.optim as optim

num_epochs = 15

optimizer = optim.Adam(model.parameters(), lr=0.001)

criterion = nn.BCELoss()

for epoch in range(num_epochs):
    model.train()
    for batch_idx, (data, labels) in enumerate(train_loader):
        optimizer.zero_grad()
        outputs = model(data).squeeze()
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        if batch_idx % 10 == 0:  # Print status updates
            print(f'Epoch: {epoch+1}/{num_epochs}  Batch: {batch_idx}  Loss: {loss.item():.4f}')

Epoch: 1/15  Batch: 0  Loss: 0.7000
Epoch: 2/15  Batch: 0  Loss: 0.6778
Epoch: 3/15  Batch: 0  Loss: 0.6718
Epoch: 4/15  Batch: 0  Loss: 0.6025
Epoch: 5/15  Batch: 0  Loss: 0.5510
Epoch: 6/15  Batch: 0  Loss: 0.5043
Epoch: 7/15  Batch: 0  Loss: 0.5289
Epoch: 8/15  Batch: 0  Loss: 0.5775
Epoch: 9/15  Batch: 0  Loss: 0.4482
Epoch: 10/15  Batch: 0  Loss: 0.5396
Epoch: 11/15  Batch: 0  Loss: 0.6021
Epoch: 12/15  Batch: 0  Loss: 0.4111
Epoch: 13/15  Batch: 0  Loss: 0.5565
Epoch: 14/15  Batch: 0  Loss: 0.4244
Epoch: 15/15  Batch: 0  Loss: 0.3531


In [78]:
!pip install torchsummary



In [92]:
from torchsummary import summary
summary(model, input_size= (100,))

----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
            Linear-1                   [-1, 64]           6,464
            Linear-2                   [-1, 64]           4,160
           Dropout-3                   [-1, 64]               0
            Linear-4                    [-1, 1]              65
Total params: 10,689
Trainable params: 10,689
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.04
Estimated Total Size (MB): 0.04
----------------------------------------------------------------


In [107]:
## TODO fit the model using early stopping to predict the business label
# (hint: early stopping means if the validation score does not increase for more than "patience" times, training should stop and load the best model so far)

num_epochs = 15
patience = 3  # Number of epochs to wait before stopping if no improvement

optimizer = optim.Adam(model.parameters(), lr=0.001)
criterion = nn.BCELoss()

best_loss = float('inf')  # Track the best validation loss

def evaluate_model(model, data_loader, criterion):
  model.eval()
  total_loss = 0.0
  with torch.no_grad():
      for data, labels in data_loader:
          outputs = model(data).squeeze()
          loss = criterion(outputs, labels)
          total_loss += loss.item()
  average_loss = total_loss / len(data_loader)
  return average_loss

while True:
    model.train()
    for batch_idx, (data, labels) in enumerate(train_loader):
        optimizer.zero_grad()
        outputs = model(data).squeeze()
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        if batch_idx % 10 == 0:
            print(f'Epoch: {epoch+1}/{num_epochs}  Batch: {batch_idx}  Loss: {loss.item():.4f}')

    val_loss = evaluate_model(model, val_loader, criterion)
    if val_loss < best_loss:
        best_loss = val_loss
        patience = 3
        print("resetting lives remaining")
    else:
        patience -= 1
        print("decrementing lives remaining")
        if patience == 0:
            print('Early stopping triggered.')
            break

Epoch: 11/15  Batch: 0  Loss: 0.7509
resetting lives remaining
Epoch: 11/15  Batch: 0  Loss: 0.7023
resetting lives remaining
Epoch: 11/15  Batch: 0  Loss: 0.6890
resetting lives remaining
Epoch: 11/15  Batch: 0  Loss: 0.6508
resetting lives remaining
Epoch: 11/15  Batch: 0  Loss: 0.5839
resetting lives remaining
Epoch: 11/15  Batch: 0  Loss: 0.5690
resetting lives remaining
Epoch: 11/15  Batch: 0  Loss: 0.5589
resetting lives remaining
Epoch: 11/15  Batch: 0  Loss: 0.5043
resetting lives remaining
Epoch: 11/15  Batch: 0  Loss: 0.5041
resetting lives remaining
Epoch: 11/15  Batch: 0  Loss: 0.4695
resetting lives remaining
Epoch: 11/15  Batch: 0  Loss: 0.3837
resetting lives remaining
Epoch: 11/15  Batch: 0  Loss: 0.3665
resetting lives remaining
Epoch: 11/15  Batch: 0  Loss: 0.4782
resetting lives remaining
Epoch: 11/15  Batch: 0  Loss: 0.5668
resetting lives remaining
Epoch: 11/15  Batch: 0  Loss: 0.4706
resetting lives remaining
Epoch: 11/15  Batch: 0  Loss: 0.3938
resetting lives re