## **IS695 - Deep Learning Project**
### **H&M Dataset**<br>
We are helping H&M by building a deep learning recommender that suggests clothing and accessories to customers based on their purchase history.<br><br>
**Business value and importance of the problem:** A recommender system can help customers choose products that suit them. This also has positive implications for sustainability: fewer returns mean fewer emissions from transportation.<br><br>
**Dataset Description:**<br>
1. **articles.csv:** Product metadata, including:<br>article_id<br>product_code<br>prod_name<br>product_type_no<br>product_type_name
2. **customers.csv:** Customer information, including:<br>customer_id<br>club_member_status<br>age<br>postal_code
3. **transactions_train.csv:** Purchases made by customers, including:<br>t_dat<br>article_id<br>customer_id<br>price<br><br>-- We use data for only 2,000 unique customers for the prediction.


In [None]:
# Mounting Google Drive
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:
# Import libraries
import pandas as pd
import numpy as np
import seaborn as sns
from sklearn.model_selection import train_test_split
from matplotlib import pyplot as plt
from sklearn import tree
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import ConfusionMatrixDisplay  # replaces plot_confusion_matrix, which was removed in scikit-learn 1.2
from sklearn.metrics import classification_report

from sklearn import preprocessing
from sklearn.metrics import confusion_matrix
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import LabelEncoder
from itertools import chain
import torch
import torch.nn as nn
import torch.optim as optim

## Uploading and cleaning data

In [None]:
# Read product data
products_df = pd.read_csv('/content/drive/MyDrive/DL_data/articles.csv')
products_df.head()

Unnamed: 0,article_id,product_code,prod_name,product_type_no,product_type_name,product_group_name,graphical_appearance_no,graphical_appearance_name,colour_group_code,colour_group_name,...,department_name,index_code,index_name,index_group_no,index_group_name,section_no,section_name,garment_group_no,garment_group_name,detail_desc
0,108775015,108775,Strap top,253,Vest top,Garment Upper body,1010016,Solid,9,Black,...,Jersey Basic,A,Ladieswear,1,Ladieswear,16,Womens Everyday Basics,1002,Jersey Basic,Jersey top with narrow shoulder straps.
1,108775044,108775,Strap top,253,Vest top,Garment Upper body,1010016,Solid,10,White,...,Jersey Basic,A,Ladieswear,1,Ladieswear,16,Womens Everyday Basics,1002,Jersey Basic,Jersey top with narrow shoulder straps.
2,108775051,108775,Strap top (1),253,Vest top,Garment Upper body,1010017,Stripe,11,Off White,...,Jersey Basic,A,Ladieswear,1,Ladieswear,16,Womens Everyday Basics,1002,Jersey Basic,Jersey top with narrow shoulder straps.
3,110065001,110065,OP T-shirt (Idro),306,Bra,Underwear,1010016,Solid,9,Black,...,Clean Lingerie,B,Lingeries/Tights,1,Ladieswear,61,Womens Lingerie,1017,"Under-, Nightwear","Microfibre T-shirt bra with underwired, moulde..."
4,110065002,110065,OP T-shirt (Idro),306,Bra,Underwear,1010016,Solid,10,White,...,Clean Lingerie,B,Lingeries/Tights,1,Ladieswear,61,Womens Lingerie,1017,"Under-, Nightwear","Microfibre T-shirt bra with underwired, moulde..."


In [None]:
# Check the basic description of the data like column names and their datatypes.
products_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 105542 entries, 0 to 105541
Data columns (total 25 columns):
 #   Column                        Non-Null Count   Dtype 
---  ------                        --------------   ----- 
 0   article_id                    105542 non-null  int64 
 1   product_code                  105542 non-null  int64 
 2   prod_name                     105542 non-null  object
 3   product_type_no               105542 non-null  int64 
 4   product_type_name             105542 non-null  object
 5   product_group_name            105542 non-null  object
 6   graphical_appearance_no       105542 non-null  int64 
 7   graphical_appearance_name     105542 non-null  object
 8   colour_group_code             105542 non-null  int64 
 9   colour_group_name             105542 non-null  object
 10  perceived_colour_value_id     105542 non-null  int64 
 11  perceived_colour_value_name   105542 non-null  object
 12  perceived_colour_master_id    105542 non-null  int64 
 13 

In [None]:
# Read customer data
customers_df = pd.read_csv('/content/drive/MyDrive/DL_data/customers.csv')
customers_df = customers_df.head(2000)
customers_df

Unnamed: 0,customer_id,FN,Active,club_member_status,fashion_news_frequency,age,postal_code
0,00000dbacae5abe5e23885899a1fa44253a17956c6d1c3...,,,ACTIVE,NONE,49.0,52043ee2162cf5aa7ee79974281641c6f11a68d276429a...
1,0000423b00ade91418cceaf3b26c6af3dd342b51fd051e...,,,ACTIVE,NONE,25.0,2973abc54daa8a5f8ccfe9362140c63247c5eee03f1d93...
2,000058a12d5b43e67d225668fa1f8d618c13dc232df0ca...,,,ACTIVE,NONE,24.0,64f17e6a330a85798e4998f62d0930d14db8db1c054af6...
3,00005ca1c9ed5f5146b52ac8639a40ca9d57aeff4d1bd2...,,,ACTIVE,NONE,54.0,5d36574f52495e81f019b680c843c443bd343d5ca5b1c2...
4,00006413d8573cd20ed7128e53b7b13819fe5cfc2d801f...,1.0,1.0,ACTIVE,Regularly,52.0,25fa5ddee9aac01b35208d01736e57942317d756b32ddd...
...,...,...,...,...,...,...,...
1995,005f20207323b38134759777d19b5093e45808a42e68da...,1.0,1.0,ACTIVE,Regularly,41.0,2c29ae653a9282cce4151bd87643c907644e09541abc28...
1996,005f216c145b713ccf769e91135a32ab8900be51275c23...,,,ACTIVE,NONE,71.0,7be66b2f9d1f0b966d8ac2b713d6dbcf127c684f4ac285...
1997,005f236361e73bb5cf20e7dec6c6602d9ecfd044fd884c...,,,ACTIVE,NONE,37.0,7a6f439a298f90c504656a2e7cea2422173c34576fb6c1...
1998,005f28c032dc019b7d35e3feb6b39be48b7f73d6f8d96b...,,,ACTIVE,NONE,57.0,51b2b04ade4aba4a76fa31c19880aa674f3cfccfb242c4...


In [None]:
# Check the basic description of the data like column names and their datatypes.
customers_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2000 entries, 0 to 1999
Data columns (total 7 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   customer_id             2000 non-null   object 
 1   FN                      668 non-null    float64
 2   Active                  652 non-null    float64
 3   club_member_status      1991 non-null   object 
 4   fashion_news_frequency  1976 non-null   object 
 5   age                     1969 non-null   float64
 6   postal_code             2000 non-null   object 
dtypes: float64(3), object(4)
memory usage: 109.5+ KB


In [None]:
len(customers_df.customer_id.unique())

2000

In [None]:
# Read transaction data
transactions_df = pd.read_csv('/content/drive/MyDrive/DL_data/transactions_train.csv')
transactions_df

Unnamed: 0,t_dat,customer_id,article_id,price,sales_channel_id
0,2018-09-20,000058a12d5b43e67d225668fa1f8d618c13dc232df0ca...,663713001,0.050831,2
1,2018-09-20,000058a12d5b43e67d225668fa1f8d618c13dc232df0ca...,541518023,0.030492,2
2,2018-09-20,00007d2de826758b65a93dd24ce629ed66842531df6699...,505221004,0.015237,2
3,2018-09-20,00007d2de826758b65a93dd24ce629ed66842531df6699...,685687003,0.016932,2
4,2018-09-20,00007d2de826758b65a93dd24ce629ed66842531df6699...,685687004,0.016932,2
...,...,...,...,...,...
31788319,2020-09-22,fff2282977442e327b45d8c89afde25617d00124d0f999...,929511001,0.059305,2
31788320,2020-09-22,fff2282977442e327b45d8c89afde25617d00124d0f999...,891322004,0.042356,2
31788321,2020-09-22,fff380805474b287b05cb2a7507b9a013482f7dd0bce0e...,918325001,0.043203,1
31788322,2020-09-22,fff4d3a8b1f3b60af93e78c30a7cb4cf75edaf2590d3e5...,833459002,0.006763,1


In [None]:
# Check the basic description of the data like column names and their datatypes.
transactions_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 31788324 entries, 0 to 31788323
Data columns (total 5 columns):
 #   Column            Dtype  
---  ------            -----  
 0   t_dat             object 
 1   customer_id       object 
 2   article_id        int64  
 3   price             float64
 4   sales_channel_id  int64  
dtypes: float64(1), int64(2), object(2)
memory usage: 1.2+ GB
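
The full transactions file holds about 31.8 million rows and needs more than 1.2 GB of memory, while only the rows for the 2,000 sampled customers are used later. The following is a minimal sketch, not the notebook's method, of filtering while reading in chunks. The helper name `read_transactions_for` and the toy CSV are hypothetical; only the column names come from the output above.

```python
import io
import pandas as pd

def read_transactions_for(csv_source, customer_ids, chunksize=100_000):
    """Read a transactions CSV in chunks, keeping only rows for the given customers."""
    wanted = set(customer_ids)
    kept = []
    for chunk in pd.read_csv(csv_source, chunksize=chunksize):
        # Keep only the rows whose customer_id is in the sample
        kept.append(chunk[chunk["customer_id"].isin(wanted)])
    return pd.concat(kept, ignore_index=True)

# Toy stand-in for transactions_train.csv (hypothetical values)
toy_csv = io.StringIO(
    "t_dat,customer_id,article_id,price,sales_channel_id\n"
    "2018-09-20,a,663713001,0.05,2\n"
    "2018-09-20,b,541518023,0.03,2\n"
    "2018-09-21,a,505221004,0.01,1\n"
    "2018-09-22,c,685687003,0.02,2\n"
)
toy_transactions = read_transactions_for(toy_csv, ["a", "c"], chunksize=2)
print(toy_transactions)
```

With the real file, `csv_source` would be the path used in the cell above and `customer_ids` would be `customers_df['customer_id']`.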


## Data Preprocessing 

In [None]:
# Generating Transactions for 2000 unique customers using inner join
transactions_df_2000 = pd.merge(customers_df, transactions_df, how = 'inner',on='customer_id')
transactions_df_2000

Unnamed: 0,customer_id,FN,Active,club_member_status,fashion_news_frequency,age,postal_code,t_dat,article_id,price,sales_channel_id
0,00000dbacae5abe5e23885899a1fa44253a17956c6d1c3...,,,ACTIVE,NONE,49.0,52043ee2162cf5aa7ee79974281641c6f11a68d276429a...,2018-12-27,625548001,0.044051,1
1,00000dbacae5abe5e23885899a1fa44253a17956c6d1c3...,,,ACTIVE,NONE,49.0,52043ee2162cf5aa7ee79974281641c6f11a68d276429a...,2018-12-27,176209023,0.035576,1
2,00000dbacae5abe5e23885899a1fa44253a17956c6d1c3...,,,ACTIVE,NONE,49.0,52043ee2162cf5aa7ee79974281641c6f11a68d276429a...,2018-12-27,627759010,0.030492,1
3,00000dbacae5abe5e23885899a1fa44253a17956c6d1c3...,,,ACTIVE,NONE,49.0,52043ee2162cf5aa7ee79974281641c6f11a68d276429a...,2019-05-02,697138006,0.010153,2
4,00000dbacae5abe5e23885899a1fa44253a17956c6d1c3...,,,ACTIVE,NONE,49.0,52043ee2162cf5aa7ee79974281641c6f11a68d276429a...,2019-05-25,568601006,0.050831,2
...,...,...,...,...,...,...,...,...,...,...,...
43807,005f28c032dc019b7d35e3feb6b39be48b7f73d6f8d96b...,,,ACTIVE,NONE,57.0,51b2b04ade4aba4a76fa31c19880aa674f3cfccfb242c4...,2020-02-05,842607004,0.017797,1
43808,005f28c032dc019b7d35e3feb6b39be48b7f73d6f8d96b...,,,ACTIVE,NONE,57.0,51b2b04ade4aba4a76fa31c19880aa674f3cfccfb242c4...,2020-08-15,898596007,0.014390,1
43809,005f28c032dc019b7d35e3feb6b39be48b7f73d6f8d96b...,,,ACTIVE,NONE,57.0,51b2b04ade4aba4a76fa31c19880aa674f3cfccfb242c4...,2020-08-17,856840001,0.014390,1
43810,005f28c032dc019b7d35e3feb6b39be48b7f73d6f8d96b...,,,ACTIVE,NONE,57.0,51b2b04ade4aba4a76fa31c19880aa674f3cfccfb242c4...,2020-08-17,842112005,0.030237,1


In [None]:
len(transactions_df_2000.customer_id.unique())

1982
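
The inner join keeps 1,982 of the 2,000 sampled customers, so 18 customers have no rows in the transactions file. As a sketch on toy frames (all values here are hypothetical), a left join with `indicator=True` is one way to list such customers:

```python
import pandas as pd

toy_customers = pd.DataFrame({"customer_id": ["a", "b", "c"], "age": [49, 25, 24]})
toy_transactions = pd.DataFrame({"customer_id": ["a", "a", "c"], "article_id": [1, 2, 3]})

# Inner join: customers without transactions disappear
inner = pd.merge(toy_customers, toy_transactions, how="inner", on="customer_id")
print(inner["customer_id"].nunique())

# Left join with an indicator column: 'left_only' marks customers with no purchases
left = pd.merge(toy_customers, toy_transactions, how="left", on="customer_id", indicator=True)
no_purchases = left.loc[left["_merge"] == "left_only", "customer_id"].tolist()
print(no_purchases)
```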

In [None]:
# Sort by 'customer_id' and purchase date ('t_dat'), both descending, so each customer's most recent purchase comes first.
# Note: the sampling step below iterates over a set of (customer, article) pairs, so this ordering does not carry over to df_purchase.
transactions_df_2000 = transactions_df_2000.sort_values(by=['customer_id', 't_dat'], ascending=False)
transactions_df_2000

Unnamed: 0,customer_id,FN,Active,club_member_status,fashion_news_frequency,age,postal_code,t_dat,article_id,price,sales_channel_id
43811,005f2a3ed4761f7e048ae64d012e5b82dbc3abfd9b625b...,,,ACTIVE,NONE,40.0,2c29ae653a9282cce4151bd87643c907644e09541abc28...,2019-03-04,767782001,0.061000,1
43809,005f28c032dc019b7d35e3feb6b39be48b7f73d6f8d96b...,,,ACTIVE,NONE,57.0,51b2b04ade4aba4a76fa31c19880aa674f3cfccfb242c4...,2020-08-17,856840001,0.014390,1
43810,005f28c032dc019b7d35e3feb6b39be48b7f73d6f8d96b...,,,ACTIVE,NONE,57.0,51b2b04ade4aba4a76fa31c19880aa674f3cfccfb242c4...,2020-08-17,842112005,0.030237,1
43808,005f28c032dc019b7d35e3feb6b39be48b7f73d6f8d96b...,,,ACTIVE,NONE,57.0,51b2b04ade4aba4a76fa31c19880aa674f3cfccfb242c4...,2020-08-15,898596007,0.014390,1
43806,005f28c032dc019b7d35e3feb6b39be48b7f73d6f8d96b...,,,ACTIVE,NONE,57.0,51b2b04ade4aba4a76fa31c19880aa674f3cfccfb242c4...,2020-02-05,842607004,0.017220,1
...,...,...,...,...,...,...,...,...,...,...,...
5,00000dbacae5abe5e23885899a1fa44253a17956c6d1c3...,,,ACTIVE,NONE,49.0,52043ee2162cf5aa7ee79974281641c6f11a68d276429a...,2019-05-25,568601006,0.050831,2
3,00000dbacae5abe5e23885899a1fa44253a17956c6d1c3...,,,ACTIVE,NONE,49.0,52043ee2162cf5aa7ee79974281641c6f11a68d276429a...,2019-05-02,697138006,0.010153,2
0,00000dbacae5abe5e23885899a1fa44253a17956c6d1c3...,,,ACTIVE,NONE,49.0,52043ee2162cf5aa7ee79974281641c6f11a68d276429a...,2018-12-27,625548001,0.044051,1
1,00000dbacae5abe5e23885899a1fa44253a17956c6d1c3...,,,ACTIVE,NONE,49.0,52043ee2162cf5aa7ee79974281641c6f11a68d276429a...,2018-12-27,176209023,0.035576,1


In [None]:
# Negative sampling: the transactions contain only positive samples (purchases).
# We build lists of customer_id, article_id, and purchase (whether the customer bought the article).
# For each purchase, we randomly draw articles and keep those the customer has not purchased.
# Purchased pairs get the label 1; sampled non-purchased pairs get the label 0.
# We use a 5:1 ratio: each positive sample is paired with 5 negative samples.
from tqdm.notebook import tqdm

In [None]:
product_list = transactions_df_2000['article_id'].unique()

# Placeholders that will hold the training data
customers_id, article_id, purchase = [], [], []

# This is the set of items that each user has interaction with
user_item_set = set(zip(transactions_df_2000['customer_id'], transactions_df_2000['article_id']))

# 5:1 ratio of negative to positive samples
num_negatives = 5

for (u, i) in tqdm(user_item_set):
    customers_id.append(u)
    article_id.append(i)
    purchase.append(1) # items that the user has interacted with are positive
    for _ in range(num_negatives):
        # randomly select an item
        negative_item = np.random.choice(product_list) 
        # check that the user has not interacted with this item
        while (u, negative_item) in user_item_set:
            negative_item = np.random.choice(product_list)
        customers_id.append(u)
        article_id.append(negative_item)
        purchase.append(0) # items not interacted with are negative


In [None]:
# Creating a dataframe from lists(customer_id, article_id, purchase)
df_purchase = pd.DataFrame()
df_purchase['customer_id'] = customers_id
df_purchase['article_id'] = article_id
df_purchase['purchase'] = purchase

In [None]:
len(df_purchase.article_id.unique())

20774

In [None]:
# Number of items that were purchased by customer and those that were not.
df_purchase.purchase.value_counts()

0    188180
1     37636
Name: purchase, dtype: int64
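
The counts above reflect the 5:1 ratio: 188,180 = 5 × 37,636. The sampling loop can be sketched as a small function on toy data. This is an illustration under assumptions: the name `sample_negatives`, the seeded generator, and the toy pairs are hypothetical and not part of the notebook.

```python
import numpy as np

def sample_negatives(pairs, items, num_negatives=5, seed=0):
    """Return (user, item, label) rows: each positive pair followed by num_negatives unseen items."""
    rng = np.random.default_rng(seed)  # seeded so the toy run is repeatable
    seen = set(pairs)
    rows = []
    for user, item in pairs:
        rows.append((user, item, 1))  # observed purchase: positive
        for _ in range(num_negatives):
            candidate = int(rng.choice(items))
            # Redraw until the user has not purchased the candidate
            while (user, candidate) in seen:
                candidate = int(rng.choice(items))
            rows.append((user, candidate, 0))  # sampled non-purchase: negative
    return rows

toy_pairs = [("a", 1), ("a", 2), ("b", 3)]
toy_items = list(range(1, 21))
rows = sample_negatives(toy_pairs, toy_items, num_negatives=5, seed=0)
labels = [label for _, _, label in rows]
print(len(rows), sum(labels))
```

As in the notebook, items are drawn with replacement, so the same negative can appear more than once for a customer.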

In [None]:
df_purchase

Unnamed: 0,customer_id,article_id,purchase
0,00415737ad0daa6c1e16bbbca3baa69dfeb257e8ceef5f...,534181005,1
1,00415737ad0daa6c1e16bbbca3baa69dfeb257e8ceef5f...,708311003,0
2,00415737ad0daa6c1e16bbbca3baa69dfeb257e8ceef5f...,748566002,0
3,00415737ad0daa6c1e16bbbca3baa69dfeb257e8ceef5f...,859808002,0
4,00415737ad0daa6c1e16bbbca3baa69dfeb257e8ceef5f...,766247001,0
...,...,...,...
225811,000ee56f745271e72ae8b5680a416a4fbf8acf6a690ab2...,702005001,0
225812,000ee56f745271e72ae8b5680a416a4fbf8acf6a690ab2...,825045001,0
225813,000ee56f745271e72ae8b5680a416a4fbf8acf6a690ab2...,634160006,0
225814,000ee56f745271e72ae8b5680a416a4fbf8acf6a690ab2...,557247010,0


In [None]:
# Encode customer_id and 'article_id' for the df_purchase dataframe 
label_encoder = preprocessing.LabelEncoder()
df_purchase['customer_id'] = label_encoder.fit_transform(df_purchase['customer_id'])
df_purchase['article_id'] = label_encoder.fit_transform(df_purchase['article_id'])
df_purchase

Unnamed: 0,customer_id,article_id,purchase
0,1397,1521,1
1,1397,9462,0
2,1397,12425,0
3,1397,18959,0
4,1397,13808,0
...,...,...,...
225811,311,9029,0
225812,311,17287,0
225813,311,4712,0
225814,311,2023,0
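
The encoding above turns both ID columns into contiguous integers starting at 0, which is the form an embedding layer expects. Because the notebook refits a single encoder, it only remembers the article mapping afterwards. The sketch below, on hypothetical toy data, keeps one encoder per column so encoded values can be mapped back with `inverse_transform`; the names `customer_encoder` and `article_encoder` are illustrative.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

toy = pd.DataFrame({
    "customer_id": ["c9", "c1", "c9", "c5"],
    "article_id": [708311003, 534181005, 534181005, 859808002],
})

# One encoder per column, so each mapping stays available
customer_encoder = LabelEncoder()
article_encoder = LabelEncoder()
toy["customer_idx"] = customer_encoder.fit_transform(toy["customer_id"])
toy["article_idx"] = article_encoder.fit_transform(toy["article_id"])
print(toy)

# Recover the original article IDs from the encoded indices
original_articles = article_encoder.inverse_transform(toy["article_idx"])
print(list(original_articles))
```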


In [None]:
# Total number of unique customers
customer_num = len(df_purchase['customer_id'].unique())
customer_num

1982

In [None]:
# Total number of unique articles
article_num = len(df_purchase['article_id'].unique())
article_num

20774

In [None]:
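# Partition the data: the last row per customer goes to the test set, the rest to training.
# Note: each positive row is followed by its negatives in df_purchase, so the held-out row is always a negative sample.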
test_data = df_purchase.drop_duplicates(subset=["customer_id"], keep='last')
index_df = df_purchase.index.isin(test_data.index)
train_data = df_purchase.iloc[~index_df]
print(len(train_data), len(test_data))

223834 1982
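
Since the held-out rows carry only label 0, the test set cannot measure how well purchases are recovered. One alternative, which is not what the notebook does, is to hold out each customer's most recent purchase before any negative sampling. The sketch below shows that idea on hypothetical toy data.

```python
import pandas as pd

toy_transactions = pd.DataFrame({
    "customer_id": ["a", "a", "a", "b", "b"],
    "t_dat": ["2019-01-05", "2020-03-01", "2019-07-12", "2018-12-27", "2019-05-02"],
    "article_id": [11, 12, 13, 21, 22],
})
toy_transactions["t_dat"] = pd.to_datetime(toy_transactions["t_dat"])

# Oldest first within each customer, so the last row per group is the most recent purchase
ordered = toy_transactions.sort_values(["customer_id", "t_dat"])
test_positives = ordered.groupby("customer_id").tail(1)
train_positives = ordered.drop(test_positives.index)

print(test_positives)
print(train_positives)
```

Negative samples would then be drawn separately for the training and test positives, so that both parts contain both labels.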


## Building the Neural Network

In [None]:
# Build a neural network on training data
class neural_network(nn.Module):
    def __init__(self,  emb_size, hidden_size1, hidden_size2, hidden_size3, hidden_size4, out_size):
        super().__init__()

        self.user_emb = nn.Embedding(customer_num, emb_size)
        self.item_emb = nn.Embedding(article_num, emb_size)
        
        self.network = nn.Sequential(
          nn.Linear(emb_size*2, hidden_size1),
          nn.ReLU(),
          nn.Linear(hidden_size1, hidden_size2),
          nn.ReLU(),
          nn.Linear(hidden_size2, hidden_size3),
          nn.ReLU(),
          nn.Linear(hidden_size3, hidden_size4),
          nn.ReLU(),
          nn.Linear(hidden_size4, out_size))

    def forward(self, u_id, v_id):
        u = self.user_emb(u_id)
        v = self.item_emb(v_id)
        c = torch.cat([u,v], dim = 1)
        out = self.network(c)
        return out
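
To make the tensor shapes in `forward` concrete, here is a small sketch with hypothetical sizes (5 customers, 7 articles, batch of 3). It mirrors the embedding lookup and concatenation above, with a single linear layer standing in for the full `nn.Sequential` stack.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
emb_size = 8
user_emb = nn.Embedding(5, emb_size)  # 5 toy customers
item_emb = nn.Embedding(7, emb_size)  # 7 toy articles

# Encoded IDs must be smaller than the number of embeddings
u_id = torch.tensor([0, 3, 4])
v_id = torch.tensor([6, 1, 2])

u = user_emb(u_id)            # shape (3, 8)
v = item_emb(v_id)            # shape (3, 8)
c = torch.cat([u, v], dim=1)  # shape (3, 16): customer and article vectors side by side

head = nn.Linear(emb_size * 2, 2)  # stand-in for the hidden layers and output layer
logits = head(c)                   # shape (3, 2): one score per class
print(u.shape, c.shape, logits.shape)
```

This is why the first linear layer in the class takes `emb_size*2` inputs, and why `customer_num` and `article_num` must cover every encoded ID.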

In [None]:
# Create tensor from pandas dataframe
train_customer_tensor = torch.tensor(train_data['customer_id'].values)
train_article_tensor = torch.tensor(train_data['article_id'].values)
train_purchase_tensor = torch.tensor(train_data['purchase'].values)
test_customer_tensor = torch.tensor(test_data['customer_id'].values)
test_article_tensor = torch.tensor(test_data['article_id'].values)
test_purchase_tensor = torch.tensor(test_data['purchase'].values)

# Create tensor dataset
train_dataset = torch.utils.data.TensorDataset(train_customer_tensor.long(),train_article_tensor.long(),train_purchase_tensor.long())
test_dataset = torch.utils.data.TensorDataset(test_customer_tensor.long(),test_article_tensor.long(),test_purchase_tensor.long())

# Define training and testing data loader, and set batch size to 512
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size= 512, shuffle = True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size= 512, shuffle = False)

In [None]:
# Define training loop function
def training_loop(n_epochs, optimizer, model, loss_fn, train_loader):
    for epoch in range(0, n_epochs):
        # Training Phase 
        model.train()
        loss_train = 0.0
        # Each batch yields (customer_input, article_input, labels), drawn from
        # (train_customer_tensor, train_article_tensor, train_purchase_tensor) in train_dataset
        for customer_input, article_input, labels in train_loader:
            # customer_input and article_input map to u_id and v_id in forward(self, u_id, v_id)
            outputs = model(customer_input, article_input)
            loss = loss_fn(outputs, labels)
            
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            loss_train += loss.item()

        # Report the average training loss per batch after every epoch
        print('Epoch {}, Training loss {}'.format(epoch, loss_train / len(train_loader)))

In [None]:
# Model training
torch.manual_seed(0)
NCF = neural_network(8,128,64,64,64,2)
Adam_optimizer = optim.Adam(NCF.parameters(), lr = 0.01)
loss_fn = nn.CrossEntropyLoss()

training_loop(n_epochs = 20, optimizer = Adam_optimizer, model = NCF, loss_fn = loss_fn, train_loader = train_loader)

Epoch 0, Training loss 0.4553452059966788
Epoch 1, Training loss 0.45346984736723445
Epoch 2, Training loss 0.4506992829716913
Epoch 3, Training loss 0.44059952385893697
Epoch 4, Training loss 0.4296227182563581
Epoch 5, Training loss 0.4157294983461023
Epoch 6, Training loss 0.39874678605223357
Epoch 7, Training loss 0.3832378509246051
Epoch 8, Training loss 0.3702903785253769
Epoch 9, Training loss 0.3600208072765777
Epoch 10, Training loss 0.3525528392699211
Epoch 11, Training loss 0.3482294845390538
Epoch 12, Training loss 0.3444381497085911
Epoch 13, Training loss 0.3418316527112434
Epoch 14, Training loss 0.3394204361798012
Epoch 15, Training loss 0.3381923147260326
Epoch 16, Training loss 0.33625419186130506
Epoch 17, Training loss 0.33480746935219524
Epoch 18, Training loss 0.3339226318684887
Epoch 19, Training loss 0.33251287233611765
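
A useful reference point for these losses is the cross-entropy of a model that ignores its inputs and always predicts the class proportions. The sketch below computes that baseline from the training label counts reported in the classification report (37,636 positives and 186,198 negatives). It is a back-of-the-envelope check, not part of the notebook.

```python
import math

positives = 37636
negatives = 186198  # training rows with label 0, after the test split

p_pos = positives / (positives + negatives)

# Cross-entropy (natural log) of always predicting the class prior
baseline_loss = -(p_pos * math.log(p_pos) + (1 - p_pos) * math.log(1 - p_pos))
print(round(baseline_loss, 4))
```

The result is roughly 0.45, close to the loss in the first few epochs. This suggests the model starts out near the prior-only level and only later learns from the customer and article embeddings.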


In [None]:
# Defining testing function
def test(model, train_loader, test_loader):
 
  # testing phase
  model.eval()
  predict_train = []
  predict_test = []
  labels_train = []
  labels_test = []

  with torch.no_grad():
      for customer_input, article_input, labels in train_loader:
          outputs = model(customer_input, article_input)
          _, predicted = torch.max(outputs, dim=1)  # torch.max returns (values, indices); the indices are the predicted classes
          predict_train.append(predicted.tolist())
          labels_train.append(labels.tolist())

      for customer_input, article_input, labels in test_loader:
          outputs = model(customer_input, article_input)
          _, predicted = torch.max(outputs, dim=1)  # indices of the larger logit are the predicted classes
          predict_test.append(predicted.tolist())
          labels_test.append(labels.tolist())

  print("Confusion matrix on train:\n",  confusion_matrix(list(chain(*labels_train)), list(chain(*predict_train)), labels=[0, 1]))
  print()
  print("Classification report on train:\n",  classification_report(list(chain(*labels_train)), list(chain(*predict_train)), labels=[0, 1]))
  print()
  print("Confusion matrix on test:\n",  confusion_matrix(list(chain(*labels_test)), list(chain(*predict_test)), labels=[0, 1]))
  print()
  print("Classification report on test:\n",  classification_report(list(chain(*labels_test)), list(chain(*predict_test)), labels=[0, 1]))

In [None]:
# Examine evaluation results
test(model = NCF, train_loader = train_loader, test_loader = test_loader)

Confusion matrix on train:
 [[185215    983]
 [ 22421  15215]]

Classification report on train:
               precision    recall  f1-score   support

           0       0.89      0.99      0.94    186198
           1       0.94      0.40      0.57     37636

    accuracy                           0.90    223834
   macro avg       0.92      0.70      0.75    223834
weighted avg       0.90      0.90      0.88    223834


Confusion matrix on test:
 [[1858  124]
 [   0    0]]

Classification report on test:
               precision    recall  f1-score   support

           0       1.00      0.94      0.97      1982
           1       0.00      0.00      0.00         0

    accuracy                           0.94      1982
   macro avg       0.50      0.47      0.48      1982
weighted avg       1.00      0.94      0.97      1982
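
The class-1 figures in the train report follow directly from the train confusion matrix, where rows are true labels and columns are predictions. The short calculation below reproduces them as a sanity check; the variable names are illustrative.

```python
# Train confusion matrix from the output above
tn, fp = 185215, 983    # true label 0: predicted 0, predicted 1
fn, tp = 22421, 15215   # true label 1: predicted 0, predicted 1

precision_1 = tp / (tp + fp)   # share of predicted purchases that were real purchases
recall_1 = tp / (tp + fn)      # share of real purchases that were predicted
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(round(precision_1, 2), round(recall_1, 2), round(f1_1, 2), round(accuracy, 2))
```

The model finds only about 40% of the actual purchases in the training data. The test accuracy of 0.94 is 1858/1982 and covers negative samples only, because the test set contains no rows with label 1.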





In [None]:
!jupyter nbconvert --to html "/content/drive/MyDrive/Colab Notebooks/IS695_Final_Project_Model_Group_6.ipynb"

[NbConvertApp] Converting notebook /content/drive/MyDrive/Colab Notebooks/IS695_Final_Project_Model_Group_6.ipynb to html
[NbConvertApp] Writing 385377 bytes to /content/drive/MyDrive/Colab Notebooks/IS695_Final_Project_Model_Group_6.html
