**Author:** Oguz Alp Eren

**Course:** Projects in Advanced Machine Learning

Columbia University, 2023

#Assignment 3: Text Classification Using the Stanford SST Sentiment Dataset

##Loading the Data and Preprocessing

In [None]:
#install aimodelshare library
! pip install aimodelshare==0.0.189

In [None]:
#Set credentials using modelshare.org username/password

import aimodelshare as ai
from aimodelshare.aws import set_credentials
    
apiurl="https://rlxjxnoql9.execute-api.us-east-1.amazonaws.com/prod/m" #This is the unique rest api that powers this specific Playground

set_credentials(apiurl=apiurl)

AI Modelshare Username:··········
AI Modelshare Password:··········
AI Model Share login credentials set successfully.


In [None]:
#Instantiate Competition

mycompetition= ai.Competition(apiurl)

In [None]:
# Get competition data
from aimodelshare import download_data
download_data('public.ecr.aws/y2e2a1d6/sst2_competition_data-repository:latest') 


Data downloaded successfully.


In [None]:
# Set up X_train, X_test, and y_train_labels objects
import pandas as pd
import warnings
warnings.simplefilter(action='ignore', category=Warning)

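# .squeeze("columns") turns each single-column DataFrame into a Series
# (the squeeze=True argument of read_csv was removed in pandas 2.0)
X_train=pd.read_csv("sst2_competition_data/X_train.csv").squeeze("columns")
X_test=pd.read_csv("sst2_competition_data/X_test.csv").squeeze("columns")

y_train_labels=pd.read_csv("sst2_competition_data/y_train_labels.csv").squeeze("columns")

# One-hot encode the labels (one column per class)
y_train = pd.get_dummies(y_train_labels)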

X_train.head()

0    The Rock is destined to be the 21st Century 's...
1    The gorgeously elaborate continuation of `` Th...
2    Singer/composer Bryan Adams contributes a slew...
3                 Yet the act is still charming here .
4    Whether or not you 're enlightened by any of D...
Name: text, dtype: object

In [None]:
# This preprocessor function makes use of the tf.keras tokenizer

from tensorflow import keras
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.utils import pad_sequences
import numpy as np

# Build vocabulary from training text data
tokenizer = Tokenizer(num_words=10000)
tokenizer.fit_on_texts(X_train)

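# preprocessor tokenizes words and pads/truncates every document to the same length
def preprocessor(data, maxlen=40, max_words=10000):
    # max_words is kept in the signature but unused here: the vocabulary cap
    # is set by Tokenizer(num_words=...) when the tokenizer is fitted above

    # Map each document to a list of integer word indices
    sequences = tokenizer.texts_to_sequences(data)

    # Left-pad (or left-truncate) each sequence to exactly maxlen tokens
    X = pad_sequences(sequences, maxlen=maxlen)

    return X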

print(preprocessor(X_train).shape)
print(preprocessor(X_test).shape)

(6920, 40)
(1821, 40)
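
The `(n, 40)` shapes above come from padding. As a rough illustration of what `pad_sequences` does with its default settings (zeros added on the left, overly long sequences cut from the front), here is a minimal NumPy sketch. The helper name `pad_pre` and the toy token ids are made up for this example and are not part of the notebook or of Keras.

```python
import numpy as np

def pad_pre(sequences, maxlen, value=0):
    """Left-pad with `value` and keep only the last `maxlen` tokens of each sequence."""
    out = np.full((len(sequences), maxlen), value, dtype="int32")
    for row, seq in enumerate(sequences):
        kept = list(seq)[-maxlen:]          # drop tokens from the front if too long
        if kept:
            out[row, -len(kept):] = kept    # right-align so the padding sits on the left
    return out

# Hypothetical token ids for one short and one long document
toy_sequences = [[5, 8, 2], [7, 1, 4, 9, 3, 6]]
padded = pad_pre(toy_sequences, maxlen=4)
print(padded)
```

The short document gains one leading zero and the long one loses its first two tokens, so every row ends up with the same length.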


In [None]:
ai.export_preprocessor(preprocessor,"") 

Your preprocessor is now saved to 'preprocessor.zip'


In [None]:
!wget http://nlp.stanford.edu/data/glove.6B.zip
!unzip glove.6B.zip

import os
import numpy as np

glove_dir = './'

embeddings_index = {}
with open(os.path.join(glove_dir, 'glove.6B.100d.txt')) as f:
    for line in f:
        values = line.split()
        word = values[0]
        coefs = np.asarray(values[1:], dtype='float32')
        embeddings_index[word] = coefs

embedding_dim = 100
word_index = tokenizer.word_index
max_words = 10000

embedding_matrix = np.zeros((max_words, embedding_dim))
for word, i in word_index.items():
    if i < max_words:
        embedding_vector = embeddings_index.get(word)
        if embedding_vector is not None:
            embedding_matrix[i] = embedding_vector

--2023-04-14 21:22:11--  http://nlp.stanford.edu/data/glove.6B.zip
Resolving nlp.stanford.edu (nlp.stanford.edu)... 171.64.67.140
Connecting to nlp.stanford.edu (nlp.stanford.edu)|171.64.67.140|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://nlp.stanford.edu/data/glove.6B.zip [following]
--2023-04-14 21:22:11--  https://nlp.stanford.edu/data/glove.6B.zip
Connecting to nlp.stanford.edu (nlp.stanford.edu)|171.64.67.140|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://downloads.cs.stanford.edu/nlp/data/glove.6B.zip [following]
--2023-04-14 21:22:11--  https://downloads.cs.stanford.edu/nlp/data/glove.6B.zip
Resolving downloads.cs.stanford.edu (downloads.cs.stanford.edu)... 171.64.64.22
Connecting to downloads.cs.stanford.edu (downloads.cs.stanford.edu)|171.64.64.22|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 862182613 (822M) [application/zip]
Saving to: ‘glove.6B.zip.2’



In [None]:
# Load the larger 300-dimensional GloVe vectors (glove.6B.zip also contains glove.6B.300d.txt)
!wget http://nlp.stanford.edu/data/glove.6B.zip
!unzip glove.6B.zip

import os
import numpy as np

glove_dir = './'

embeddings_index = {}
with open(os.path.join(glove_dir, 'glove.6B.300d.txt')) as f:
    for line in f:
        values = line.split()
        word = values[0]
        coefs = np.asarray(values[1:], dtype='float32')
        embeddings_index[word] = coefs

embedding_dim = 300

embedding_matrix = np.zeros((max_words, embedding_dim))
for word, i in word_index.items():
    if i < max_words:
        embedding_vector = embeddings_index.get(word)
        if embedding_vector is not None:
            embedding_matrix[i] = embedding_vector

--2023-04-14 21:27:34--  http://nlp.stanford.edu/data/glove.6B.zip
Resolving nlp.stanford.edu (nlp.stanford.edu)... 171.64.67.140
Connecting to nlp.stanford.edu (nlp.stanford.edu)|171.64.67.140|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://nlp.stanford.edu/data/glove.6B.zip [following]
--2023-04-14 21:27:35--  https://nlp.stanford.edu/data/glove.6B.zip
Connecting to nlp.stanford.edu (nlp.stanford.edu)|171.64.67.140|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: https://downloads.cs.stanford.edu/nlp/data/glove.6B.zip [following]
--2023-04-14 21:27:35--  https://downloads.cs.stanford.edu/nlp/data/glove.6B.zip
Resolving downloads.cs.stanford.edu (downloads.cs.stanford.edu)... 171.64.64.22
Connecting to downloads.cs.stanford.edu (downloads.cs.stanford.edu)|171.64.64.22|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 862182613 (822M) [application/zip]
Saving to: ‘glove.6B.zip.3’
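
Words that are missing from GloVe keep an all-zero row in `embedding_matrix`, so it can be useful to check how much of the vocabulary is actually covered. The sketch below repeats the matrix-building loop on toy data and also reports coverage. The function name `build_embedding_matrix`, the toy vocabulary, and the 4-dimensional vectors are assumptions made for illustration only.

```python
import numpy as np

def build_embedding_matrix(word_index, vectors, max_words, dim):
    """Return (matrix, coverage); row i holds the vector of the word with index i."""
    matrix = np.zeros((max_words, dim), dtype="float32")
    eligible = found = 0
    for word, i in word_index.items():
        if i >= max_words:
            continue                      # outside the vocabulary cap
        eligible += 1
        vec = vectors.get(word)
        if vec is not None:
            matrix[i] = vec
            found += 1
    coverage = found / eligible if eligible else 0.0
    return matrix, coverage

# Toy stand-ins for tokenizer.word_index and embeddings_index
toy_word_index = {"the": 1, "movie": 2, "zzyzx": 3, "rare": 9}
toy_vectors = {"the": np.ones(4), "movie": np.full(4, 0.5)}

matrix, coverage = build_embedding_matrix(toy_word_index, toy_vectors, max_words=5, dim=4)
print(matrix.shape, round(coverage, 3))
```

In the notebook, the same function could be called with `tokenizer.word_index`, `embeddings_index`, `max_words`, and `embedding_dim`.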



##1.) Discuss the dataset in general terms and describe why building a predictive model using this data might be practically useful.  Who could benefit from a model like this? Explain.

The Stanford Sentiment Treebank (SST) consists of movie reviews, with each review broken down into phrases and sentences. Sentiment labels are assigned to these components, so sentiment can be analyzed at multiple levels of granularity within the text. The version used in this assignment is the binary, sentence-level task: each sentence belongs to one of two sentiment classes, and the competition data contains 6,920 training sentences and 1,821 test sentences. The dataset is a standard benchmark in NLP for building and testing sentiment analysis models, and its detailed labels help models learn to interpret nuanced opinions in text.

A predictive model trained on the SST dataset can be of practical use in a variety of contexts. Businesses can use sentiment analysis to monitor customer feedback and make informed decisions about product development and marketing strategies. Investors can gauge public sentiment about companies to inform their investment decisions. Researchers can use the dataset to explore NLP techniques and improve on existing sentiment analysis models. Content creators and media professionals can use audience sentiment to tailor their output and engage their viewers or readers.

##2.) Run at least three prediction models to try to predict the SST sentiment dataset well. Submit your best three models to the leader board for the SST Model Share competition

###2a.) Model with an Embedding layer and LSTM layers

This model is a Keras sequential neural network that consists of an Embedding layer that converts integer-encoded tokens into 16-dimensional dense vectors, two LSTM layers with 32 units each that learn dependencies across the sequence, and a Dense layer with 2 units and a softmax activation for two-class classification. The model is compiled with the RMSprop optimizer, categorical cross-entropy loss, and accuracy as the evaluation metric. It is trained on the preprocessed input data and one-hot encoded labels for 10 epochs with a batch size of 32, with 20% of the training data held out for validation.

In [None]:
from tensorflow.keras.layers import Dense, Embedding, LSTM
from tensorflow.keras.models import Sequential

# Build the model
model1 = Sequential()
model1.add(Embedding(10000, 16, input_length=40))
model1.add(LSTM(32, return_sequences=True))
model1.add(LSTM(32))
model1.add(Dense(2, activation='softmax'))
model1.summary()

# Compile the model
model1.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['acc'])

# Train the model
history = model1.fit(preprocessor(X_train), y_train,
                    epochs=10,
                    batch_size=32,
                    validation_split=0.2)

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding (Embedding)       (None, 40, 16)            160000    
                                                                 
 lstm (LSTM)                 (None, 40, 32)            6272      
                                                                 
 lstm_1 (LSTM)               (None, 32)                8320      
                                                                 
 dense (Dense)               (None, 2)                 66        
                                                                 
Total params: 174,658
Trainable params: 174,658
Non-trainable params: 0
_________________________________________________________________
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
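
The parameter counts in the summary above can be checked by hand. An LSTM layer has four gates, each with an input kernel, a recurrent kernel, and a bias, which gives `4 * (units * (input_dim + units) + units)` weights. The helper names below are introduced only for this check.

```python
def lstm_params(input_dim, units):
    """Four gates, each with an input kernel, a recurrent kernel, and a bias."""
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    """Weight matrix plus one bias per output unit."""
    return input_dim * units + units

embedding = 10000 * 16          # vocabulary size x embedding dimension
lstm_1 = lstm_params(16, 32)    # first LSTM reads the 16-d embeddings
lstm_2 = lstm_params(32, 32)    # second LSTM reads the 32-d LSTM outputs
dense = dense_params(32, 2)
total = embedding + lstm_1 + lstm_2 + dense

print(embedding, lstm_1, lstm_2, dense, total)
```

The values agree with the 160000, 6272, 8320, 66, and 174,658 reported by `model1.summary()`.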


In [None]:
# Save keras model to local ONNX file
from aimodelshare.aimsonnx import model_to_onnx

onnx_model1 = model_to_onnx(model1, framework='keras',
                          transfer_learning=False,
                          deep_learning=True)

with open("model1.onnx", "wb") as f:
    f.write(onnx_model1.SerializeToString())

###2b.) Model with an Embedding layer and Conv1d layers

This model is a Keras sequential neural network that starts with an Embedding layer that converts integer-encoded tokens into 16-dimensional dense vectors. A Conv1D layer with 32 filters, a kernel size of 3, and a ReLU activation then slides over the sequence to detect local three-token patterns. A GlobalMaxPooling1D layer collapses the time dimension by keeping each filter's maximum activation. The model concludes with a Dense layer with 2 output units and a softmax activation for two-class classification. The model is compiled with the RMSprop optimizer, categorical cross-entropy loss, and accuracy as the evaluation metric. It is trained on the preprocessed input data and one-hot encoded labels for 10 epochs with a batch size of 32, using 20% of the data for validation.

In [None]:
from tensorflow.keras.layers import Dense, Embedding, Conv1D, GlobalMaxPooling1D
from tensorflow.keras.models import Sequential

# Build the model
model2 = Sequential()
model2.add(Embedding(10000, 16, input_length=40))
model2.add(Conv1D(32, 3, activation='relu'))
model2.add(GlobalMaxPooling1D())
model2.add(Dense(2, activation='softmax'))
model2.summary()

# Compile the model
model2.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['acc'])

# Train the model
history = model2.fit(preprocessor(X_train), y_train,
                     epochs=10,
                     batch_size=32,
                     validation_split=0.2)

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_1 (Embedding)     (None, 40, 16)            160000    
                                                                 
 conv1d (Conv1D)             (None, 38, 32)            1568      
                                                                 
 global_max_pooling1d (Globa  (None, 32)               0         
 lMaxPooling1D)                                                  
                                                                 
 dense_1 (Dense)             (None, 2)                 66        
                                                                 
Total params: 161,634
Trainable params: 161,634
Non-trainable params: 0
_________________________________________________________________
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10
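
The `(None, 38, 32)` output shape and the 1568 parameters of the Conv1D layer follow from simple arithmetic, assuming the Keras defaults of `'valid'` padding and a stride of 1. The helper names in this sketch are illustrative only.

```python
def conv1d_output_length(input_length, kernel_size):
    """Output length with 'valid' padding and stride 1."""
    return input_length - kernel_size + 1

def conv1d_params(in_channels, filters, kernel_size):
    """One kernel of shape (kernel_size, in_channels) plus a bias per filter."""
    return kernel_size * in_channels * filters + filters

out_len = conv1d_output_length(40, 3)
conv = conv1d_params(16, 32, 3)
total = 10000 * 16 + conv + (32 * 2 + 2)   # embedding + conv + dense

print(out_len, conv, total)
```

This reproduces the sequence length of 38, the 1568 convolution weights, and the 161,634 total parameters from the summary.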


In [None]:
# Save keras model to local ONNX file
from aimodelshare.aimsonnx import model_to_onnx

onnx_model2 = model_to_onnx(model2, framework='keras',
                          transfer_learning=False,
                          deep_learning=True)

with open("model2.onnx", "wb") as f:
    f.write(onnx_model2.SerializeToString())

###2c.) Model with transfer learning with glove embeddings

This model is a Keras sequential neural network that uses transfer learning with pre-trained GloVe embeddings. The model starts with an Embedding layer that maps integer-encoded tokens to 300-dimensional vectors taken from the GloVe embedding matrix, with the layer's weights frozen (`trainable=False`). Next, an LSTM layer with 32 units returns its output at every time step, and a GlobalMaxPooling1D layer collapses the time dimension by keeping the maximum of each unit over the sequence. Finally, a Dense layer with 2 output units and a softmax activation performs two-class classification. The model is compiled with the RMSprop optimizer, categorical cross-entropy loss, and accuracy as the evaluation metric. It is trained on the preprocessed input data and one-hot encoded labels for 10 epochs with a batch size of 32, using 20% of the data for validation.

In [None]:
from tensorflow.keras.layers import Dense, Embedding, LSTM, GlobalMaxPooling1D
from tensorflow.keras.models import Sequential

model3 = Sequential()
model3.add(Embedding(max_words, embedding_dim, input_length=40, weights=[embedding_matrix], trainable=False))
model3.add(LSTM(32, return_sequences=True))
model3.add(GlobalMaxPooling1D())
model3.add(Dense(2, activation='softmax'))
model3.summary()

model3.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['acc'])

history = model3.fit(preprocessor(X_train), y_train,
                     epochs=10,
                     batch_size=32,
                     validation_split=0.2)

Model: "sequential_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_2 (Embedding)     (None, 40, 300)           3000000   
                                                                 
 lstm_2 (LSTM)               (None, 40, 32)            42624     
                                                                 
 global_max_pooling1d_1 (Glo  (None, 32)               0         
 balMaxPooling1D)                                                
                                                                 
 dense_2 (Dense)             (None, 2)                 66        
                                                                 
Total params: 3,042,690
Trainable params: 42,690
Non-trainable params: 3,000,000
_________________________________________________________________
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 
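
Freezing the embedding layer is what produces the split between trainable and non-trainable parameters in the summary above. The short check below recomputes that split. The `layers` list is just a hand-written description of Model 3, not something read from Keras.

```python
# (name, parameter count, trainable?) for each layer of Model 3
layers = [
    ("embedding", 10000 * 300, False),              # frozen GloVe vectors
    ("lstm", 4 * (32 * (300 + 32) + 32), True),     # 4 gates on 300-d inputs
    ("dense", 32 * 2 + 2, True),
]

trainable = sum(count for _, count, is_trainable in layers if is_trainable)
frozen = sum(count for _, count, is_trainable in layers if not is_trainable)

print(trainable, frozen, trainable + frozen)
```

Only 42,690 of the 3,042,690 parameters are updated during training, which is why this model has far fewer weights to fit than its total size suggests.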

In [None]:
# Save keras model to local ONNX file
from aimodelshare.aimsonnx import model_to_onnx

onnx_model3 = model_to_onnx(model3, framework='keras',
                          transfer_learning=True,  # model3 reuses pre-trained GloVe embeddings
                          deep_learning=True)

with open("model3.onnx", "wb") as f:
    f.write(onnx_model3.SerializeToString())

###2d.) Discuss which models performed better and point out relevant hyper-parameter values for successful models.


Of the first three models, Model 3 achieved the highest accuracy (0.8057), followed by Model 1 (0.7991) and Model 2 (0.7749). One of the main reasons behind Model 3's success is the use of pre-trained GloVe word embeddings. These embeddings are trained on a large corpus of text and provide a more expressive representation of words. By using these pre-trained embeddings and setting the "trainable" parameter to "False," the model takes advantage of the word relationships and semantic information captured in the embeddings without having to learn them from the comparatively small training set.

Model 1, which performed slightly worse than Model 3, employed an LSTM architecture without pre-trained embeddings. It used an Embedding layer with a dimension of only 16, compared to the 300-dimensional GloVe embeddings used in Model 3. This difference in embedding size could affect the model's ability to capture the semantic relationships between words. Despite this, the LSTM layers in Model 1 still helped it achieve a relatively high accuracy score, as LSTM layers are known for their ability to capture long-range dependencies in sequential data.

Model 2 had the lowest accuracy score, likely because it used a 1D convolutional layer (Conv1D) instead of LSTMs. While Conv1D layers can capture local patterns in the input data, they might not be as effective as LSTMs at capturing long-range dependencies in text. Model 2 also used an Embedding layer with a dimension of 16, which could limit its ability to represent word relationships effectively.

##3.) After you submit your first three models, describe your best model with your team via your team slack channel. Fit and submit up to three more models after learning from your team.

###3a.) Updated model with an Embedding Layer and LSTM layers

This model is a Keras sequential neural network that starts with an Embedding layer that converts integer-encoded tokens into dense vectors of size 128. A Bidirectional LSTM layer with 128 units then reads the sequence in both the forward and backward directions, with dropout and recurrent dropout of 0.2 each. A second Bidirectional LSTM layer with 64 units follows, also with dropout and recurrent dropout set to 0.2. A Dropout layer with a rate of 0.5 is then used to reduce overfitting. The model concludes with a Dense layer with 2 output units and a softmax activation for two-class classification. The model is compiled with the Adam optimizer at a learning rate of 0.001, categorical cross-entropy loss, and accuracy as the evaluation metric. It is trained on the preprocessed input data and one-hot encoded labels for up to 10 epochs with a batch size of 32, using 20% of the data for validation. An early stopping callback monitors the validation loss with a patience of 3 epochs and restores the best weights.

In [None]:
from tensorflow.keras.layers import Dense, Embedding, LSTM, Bidirectional, Dropout
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping

# Build the model
model4 = Sequential()
model4.add(Embedding(10000, 128, input_length=40))
model4.add(Bidirectional(LSTM(128, return_sequences=True, dropout=0.2, recurrent_dropout=0.2)))
model4.add(Bidirectional(LSTM(64, dropout=0.2, recurrent_dropout=0.2)))
model4.add(Dropout(0.5))
model4.add(Dense(2, activation='softmax'))
model4.summary()

# Compile the model
optimizer = Adam(learning_rate=0.001)  # `lr` is a deprecated alias of `learning_rate`
model4.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['acc'])

# Set early stopping
early_stopping = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)

# Train the model
history = model4.fit(preprocessor(X_train), y_train,
                     epochs=10,
                     batch_size=32,
                     validation_split=0.2,
                     callbacks=[early_stopping])


Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_3 (Embedding)     (None, 40, 128)           1280000   
                                                                 
 bidirectional (Bidirectiona  (None, 40, 256)          263168    
 l)                                                              
                                                                 
 bidirectional_1 (Bidirectio  (None, 128)              164352    
 nal)                                                            
                                                                 
 dropout (Dropout)           (None, 128)               0         
                                                                 
 dense_3 (Dense)             (None, 2)                 258       
                                                                 
Total params: 1,707,778
Trainable params: 1,707,778
No
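
To make the effect of `patience=3` with `restore_best_weights=True` concrete, the following is a simplified simulation of the stopping rule on a made-up list of validation losses. It is a sketch of the idea rather than the Keras implementation, and the function name `simulate_early_stopping` and the loss values are hypothetical.

```python
def simulate_early_stopping(val_losses, patience):
    """Return (best_epoch, last_epoch_run), both 0-based."""
    best_loss = float("inf")
    best_epoch = -1
    wait = 0
    last_epoch = -1
    for epoch, loss in enumerate(val_losses):
        last_epoch = epoch
        if loss < best_loss:
            # Improvement: remember this epoch and reset the counter
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                break               # no improvement for `patience` epochs
    return best_epoch, last_epoch

# Hypothetical validation losses for six epochs
losses = [0.60, 0.50, 0.52, 0.55, 0.51, 0.49]
best_epoch, last_epoch = simulate_early_stopping(losses, patience=3)
print(best_epoch, last_epoch)
```

With these numbers, training stops after the fifth epoch and the weights from the second epoch would be restored. The sixth epoch, which would have been better still, is never run, which is the trade-off of a small patience value.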

In [None]:
# Save keras model to local ONNX file
from aimodelshare.aimsonnx import model_to_onnx

onnx_model4 = model_to_onnx(model4, framework='keras',
                          transfer_learning=False,
                          deep_learning=True)

with open("model4.onnx", "wb") as f:
    f.write(onnx_model4.SerializeToString())

###3b.) Updated model with an Embedding layer and Conv1d layers

This model uses the Keras functional API to create a neural network with an Input layer that takes sequences of length 40. An Embedding layer follows, transforming integer-encoded tokens into dense vectors of size 128. Next, three parallel Conv1D layers with 128 filters each and kernel sizes of 3, 4, and 5 are applied to the embedding output to capture local patterns of different lengths. The outputs of these convolutional layers are concatenated along the time axis, resulting in a single feature map with 128 channels. A GlobalMaxPooling1D layer then keeps the maximum activation of each channel across the concatenated map, producing 128 features. A Dropout layer with a rate of 0.5 helps reduce overfitting, and a Dense output layer with 2 units and a softmax activation is used for two-class classification. The model is compiled with the Adam optimizer at a learning rate of 0.001, categorical cross-entropy loss, and accuracy as the evaluation metric. It is trained on the preprocessed input data and one-hot encoded labels for up to 10 epochs with a batch size of 32, using 20% of the data for validation, with early stopping on the validation loss and a patience of 3 epochs.

In [None]:
from tensorflow.keras.layers import Dense, Embedding, Conv1D, GlobalMaxPooling1D, Dropout, Input, concatenate
from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping

# Build the model
input_layer = Input(shape=(40,))
embedding_layer = Embedding(10000, 128, input_length=40)(input_layer)

conv1 = Conv1D(128, 3, activation='relu')(embedding_layer)
conv2 = Conv1D(128, 4, activation='relu')(embedding_layer)
conv3 = Conv1D(128, 5, activation='relu')(embedding_layer)

merged = concatenate([conv1, conv2, conv3], axis=1)
pooling = GlobalMaxPooling1D()(merged)
dropout = Dropout(0.5)(pooling)
output_layer = Dense(2, activation='softmax')(dropout)

model5 = Model(inputs=input_layer, outputs=output_layer)
model5.summary()

# Compile the model
optimizer = Adam(learning_rate=0.001)  # `lr` is a deprecated alias of `learning_rate`
model5.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['acc'])

# Set early stopping
early_stopping = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)

# Train the model
history = model5.fit(preprocessor(X_train), y_train,
                     epochs=10,
                     batch_size=32,
                     validation_split=0.2,
                     callbacks=[early_stopping])

Model: "model"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 input_1 (InputLayer)           [(None, 40)]         0           []                               
                                                                                                  
 embedding_4 (Embedding)        (None, 40, 128)      1280000     ['input_1[0][0]']                
                                                                                                  
 conv1d_1 (Conv1D)              (None, 38, 128)      49280       ['embedding_4[0][0]']            
                                                                                                  
 conv1d_2 (Conv1D)              (None, 37, 128)      65664       ['embedding_4[0][0]']            
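
One consequence of concatenating the three convolution outputs along the time axis before pooling is worth spelling out. The NumPy sketch below uses random arrays in place of the real feature maps for a single example (so the time axis is axis 0 here rather than axis 1), and compares this choice with the common alternative of pooling each branch first and concatenating the pooled features.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-ins for the three Conv1D outputs (kernel sizes 3, 4, 5 on 40 tokens)
branches = [rng.normal(size=(length, 128)) for length in (38, 37, 36)]

# Choice used in Model 5: concatenate along time, then global max pool
merged = np.concatenate(branches, axis=0)
pooled_after_concat = merged.max(axis=0)

# Alternative: pool each branch separately, then concatenate the features
per_branch_max = np.stack([b.max(axis=0) for b in branches])
pooled_then_concat = per_branch_max.reshape(-1)

print(merged.shape, pooled_after_concat.shape, pooled_then_concat.shape)
```

Pooling after a time-axis concatenation yields 128 features, each the maximum over all three kernel sizes, whereas pooling each branch first would keep 384 features and preserve which kernel size produced each activation.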
                                                                                              

In [None]:
# Save keras model to local ONNX file
from aimodelshare.aimsonnx import model_to_onnx

onnx_model5 = model_to_onnx(model5, framework='keras',
                          transfer_learning=False,
                          deep_learning=True)

with open("model5.onnx", "wb") as f:
    f.write(onnx_model5.SerializeToString())

###3c.) Updated model with transfer learning with glove embeddings

This model is a sequential neural network that begins with an Embedding layer initialized with the pre-trained 300-dimensional GloVe embeddings. Unlike Model 3, the embedding weights are left trainable (`trainable=True`), so they are fine-tuned during training. The layer has a vocabulary size of max_words and an input length of 40. It is followed by two Bidirectional LSTM layers with 128 and 64 units, respectively. Both LSTM layers use dropout and recurrent dropout rates of 0.2 to reduce overfitting, and both return full sequences. A GlobalMaxPooling1D layer then keeps the maximum of each feature across the sequence. A Dropout layer with a rate of 0.5 is added to further reduce overfitting, and a Dense output layer with 2 units and a softmax activation performs two-class classification. The model is compiled with the Adam optimizer at a learning rate of 0.001, categorical cross-entropy loss, and accuracy as the evaluation metric. It is trained for up to 10 epochs with a batch size of 32 and a validation split of 0.2, with early stopping on the validation loss and a patience of 3 epochs.

In [None]:
from tensorflow.keras.layers import Dense, Embedding, LSTM, GlobalMaxPooling1D, Dropout, Bidirectional
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping

# Build the model
model6 = Sequential()
model6.add(Embedding(max_words, embedding_dim, input_length=40, weights=[embedding_matrix], trainable=True))
model6.add(Bidirectional(LSTM(128, return_sequences=True, dropout=0.2, recurrent_dropout=0.2)))
model6.add(Bidirectional(LSTM(64, return_sequences=True, dropout=0.2, recurrent_dropout=0.2)))
model6.add(GlobalMaxPooling1D())
model6.add(Dropout(0.5))
model6.add(Dense(2, activation='softmax'))
model6.summary()

# Compile the model
optimizer = Adam(learning_rate=0.001)  # `lr` is a deprecated alias of `learning_rate`
model6.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['acc'])

# Set early stopping
early_stopping = EarlyStopping(monitor='val_loss', patience=3, restore_best_weights=True)

# Train the model
history = model6.fit(preprocessor(X_train), y_train,
                     epochs=10,
                     batch_size=32,
                     validation_split=0.2,
                     callbacks=[early_stopping])

Model: "sequential_4"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding_5 (Embedding)     (None, 40, 300)           3000000   
                                                                 
 bidirectional_2 (Bidirectio  (None, 40, 256)          439296    
 nal)                                                            
                                                                 
 bidirectional_3 (Bidirectio  (None, 40, 128)          164352    
 nal)                                                            
                                                                 
 global_max_pooling1d_3 (Glo  (None, 128)              0         
 balMaxPooling1D)                                                
                                                                 
 dropout_2 (Dropout)         (None, 128)               0         
                                                      

In [None]:
# Save keras model to local ONNX file
from aimodelshare.aimsonnx import model_to_onnx

onnx_model6 = model_to_onnx(model6, framework='keras',
                          transfer_learning=True,  # model6 starts from pre-trained GloVe embeddings
                          deep_learning=True)

with open("model6.onnx", "wb") as f:
    f.write(onnx_model6.SerializeToString())

###3d.) Discuss results

Among Models 4-6, Model 6 achieved the highest accuracy (0.8299), followed by Model 5 (0.7925) and Model 4 (0.7804). One key factor behind Model 6's success is the use of pre-trained GloVe word embeddings, as in Model 3. These pre-trained embeddings provide a richer representation of words by capturing semantic relationships from a large corpus of text. Another important aspect of Model 6 is its use of bidirectional LSTMs with dropout and recurrent_dropout for regularization. Bidirectional LSTMs process the input sequence in both directions, which can help capture more context and improve the model's ability to understand the underlying patterns in the data.

Model 5, which had a lower accuracy than Model 6, combined three 1D convolutional layers (Conv1D) with different kernel sizes and a GlobalMaxPooling1D layer to extract features from the input data. The pooled output of the concatenated Conv1D layers is then passed through a Dropout layer for regularization. Although this architecture can capture local patterns in the data, it may not be as effective as bidirectional LSTMs at capturing long-range dependencies in text.

Model 4 had the lowest accuracy among the three models. Like Model 6, it used bidirectional LSTMs and dropout for regularization. The key difference is the absence of pre-trained GloVe embeddings in Model 4: it used a 128-dimensional Embedding layer learned from scratch. This may have limited the model's ability to capture semantic relationships effectively, leading to lower performance than Model 6.

In [None]:
#Submit Model 1: 

#-- Generate predicted y values (Model 1)
#Note: model.predict returns class probabilities; argmax(axis=1) gives the column index of the most likely class
prediction_column_index=model1.predict(preprocessor(X_test)).argmax(axis=1)

# extract correct prediction labels 
prediction_labels = [y_train.columns[i] for i in prediction_column_index]

# Submit Model 1 to Competition Leaderboard
mycompetition.submit_model(model_filepath = "model1.onnx",
                                 preprocessor_filepath="preprocessor.zip",
                                 prediction_submission=prediction_labels)

Insert search tags to help users find your model (optional): emin
Provide any useful notes about your model (optional): emine

Your model has been submitted as model version 138

To submit code used to create this model or to view current leaderboard navigate to Model Playground: 

 https://www.modelshare.org/detail/model:2763


In [None]:
#Submit Model 2: 

#-- Generate predicted y values (Model 2)
#Note: Keras predict returns class probabilities; argmax(axis=1) converts them to the predicted column index
prediction_column_index=model2.predict(preprocessor(X_test)).argmax(axis=1)

# extract correct prediction labels 
prediction_labels = [y_train.columns[i] for i in prediction_column_index]

# Submit Model 2 to Competition Leaderboard
mycompetition.submit_model(model_filepath = "model2.onnx",
                                 preprocessor_filepath="preprocessor.zip",
                                 prediction_submission=prediction_labels)

Insert search tags to help users find your model (optional): emin
Provide any useful notes about your model (optional): emine

Your model has been submitted as model version 139

To submit code used to create this model or to view current leaderboard navigate to Model Playground: 

 https://www.modelshare.org/detail/model:2763


In [None]:
#Submit Model 3: 

#-- Generate predicted y values (Model 3)
#Note: Keras predict returns class probabilities; argmax(axis=1) converts them to the predicted column index
prediction_column_index=model3.predict(preprocessor(X_test)).argmax(axis=1)

# extract correct prediction labels 
prediction_labels = [y_train.columns[i] for i in prediction_column_index]

# Submit Model 3 to Competition Leaderboard
mycompetition.submit_model(model_filepath = "model3.onnx",
                                 preprocessor_filepath="preprocessor.zip",
                                 prediction_submission=prediction_labels)

Insert search tags to help users find your model (optional): emin
Provide any useful notes about your model (optional): emine

Your model has been submitted as model version 140

To submit code used to create this model or to view current leaderboard navigate to Model Playground: 

 https://www.modelshare.org/detail/model:2763


In [None]:
#Submit Model 4: 

#-- Generate predicted y values (Model 4)
#Note: Keras predict returns class probabilities; argmax(axis=1) converts them to the predicted column index
prediction_column_index=model4.predict(preprocessor(X_test)).argmax(axis=1)

# extract correct prediction labels 
prediction_labels = [y_train.columns[i] for i in prediction_column_index]

# Submit Model 4 to Competition Leaderboard
mycompetition.submit_model(model_filepath = "model4.onnx",
                                 preprocessor_filepath="preprocessor.zip",
                                 prediction_submission=prediction_labels)

Insert search tags to help users find your model (optional): emin
Provide any useful notes about your model (optional): emine

Your model has been submitted as model version 141

To submit code used to create this model or to view current leaderboard navigate to Model Playground: 

 https://www.modelshare.org/detail/model:2763


In [None]:
#Submit Model 5: 

#-- Generate predicted y values (Model 5)
#Note: Keras predict returns class probabilities; argmax(axis=1) converts them to the predicted column index
prediction_column_index=model5.predict(preprocessor(X_test)).argmax(axis=1)

# extract correct prediction labels 
prediction_labels = [y_train.columns[i] for i in prediction_column_index]

# Submit Model 5 to Competition Leaderboard
mycompetition.submit_model(model_filepath = "model5.onnx",
                                 preprocessor_filepath="preprocessor.zip",
                                 prediction_submission=prediction_labels)

Insert search tags to help users find your model (optional): emin
Provide any useful notes about your model (optional): emine

Your model has been submitted as model version 142

To submit code used to create this model or to view current leaderboard navigate to Model Playground: 

 https://www.modelshare.org/detail/model:2763


In [None]:
#Submit Model 6: 

#-- Generate predicted y values (Model 6)
#Note: Keras predict returns class probabilities; argmax(axis=1) converts them to the predicted column index
prediction_column_index=model6.predict(preprocessor(X_test)).argmax(axis=1)

# extract correct prediction labels 
prediction_labels = [y_train.columns[i] for i in prediction_column_index]

# Submit Model 6 to Competition Leaderboard
mycompetition.submit_model(model_filepath = "model6.onnx",
                                 preprocessor_filepath="preprocessor.zip",
                                 prediction_submission=prediction_labels)

Insert search tags to help users find your model (optional): emin
Provide any useful notes about your model (optional): emine

Your model has been submitted as model version 143

To submit code used to create this model or to view current leaderboard navigate to Model Playground: 

 https://www.modelshare.org/detail/model:2763


In [None]:
# Get leaderboard

data = mycompetition.get_leaderboard()
mycompetition.stylize_leaderboard(data)

##4.) Discuss which models you tried and which models performed better and point out relevant hyper-parameter values for successful models.

- Model 1: A simple LSTM model with an Embedding layer and two LSTM layers.
- Model 2: A 1D Convolutional Neural Network with an Embedding layer, a Conv1D layer, and a GlobalMaxPooling1D layer.
- Model 3: An LSTM model with an Embedding layer using pre-trained GloVe word embeddings, an LSTM layer, and a GlobalMaxPooling1D layer.
- Model 4: A Bidirectional LSTM model with an Embedding layer, two Bidirectional LSTM layers, and a Dropout layer.
- Model 5: A 1D Convolutional Neural Network with multiple filter sizes, an Embedding layer, multiple Conv1D layers, a GlobalMaxPooling1D layer, and a Dropout layer.
- Model 6: A Bidirectional LSTM model with an Embedding layer using pre-trained GloVe word embeddings, two Bidirectional LSTM layers with dropout and recurrent_dropout, and a GlobalMaxPooling1D layer.

Model 3 and Model 6 performed the best among all models with accuracy scores of 0.8057 and 0.8299, respectively. The key factors contributing to their success include:

- Pre-trained GloVe word embeddings: Both models utilized pre-trained GloVe embeddings, which capture semantic relationships from a large corpus of text, enriching the word representations and improving the model's performance.

- LSTM and Bidirectional LSTM layers: Model 3 employed a single LSTM layer, while Model 6 used two Bidirectional LSTM layers. These layers help capture long-range dependencies in text data, with bidirectional layers in Model 6 being particularly effective by processing input sequences from both directions.

- Dropout and recurrent_dropout: Model 6 applied dropout and recurrent_dropout in its Bidirectional LSTM layers as regularization, which helps limit overfitting and improve generalization.

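- GlobalMaxPooling1D layer: Both models used a GlobalMaxPooling1D layer, which reduces the sequence of per-timestep outputs to a fixed-length vector by keeping each feature's maximum activation across the sequence, so the strongest signal for each feature reaches the classifier regardless of where it occurs in the sentence.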
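
The bidirectional processing and global max pooling described in the list above can be sketched in a few lines of NumPy. To keep the example short, it swaps in a plain tanh recurrent cell instead of an LSTM, so it illustrates the data flow only and is not a reimplementation of Model 6. All sizes and the random weights are illustrative assumptions.

```python
import numpy as np

def simple_rnn(x, w_in, w_rec):
    """Run a tanh recurrent cell over x (timesteps, features).
    Returns every hidden state, shape (timesteps, units)."""
    h = np.zeros(w_rec.shape[0])
    states = []
    for x_t in x:
        h = np.tanh(x_t @ w_in + h @ w_rec)
        states.append(h)
    return np.stack(states)

def bidirectional(x, fwd_weights, bwd_weights):
    """Concatenate a left-to-right pass with a right-to-left pass."""
    forward = simple_rnn(x, *fwd_weights)
    # Reverse the input, run the cell, then flip back to the original time order.
    backward = simple_rnn(x[::-1], *bwd_weights)[::-1]
    return np.concatenate([forward, backward], axis=1)

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 4))  # 6 timesteps, 4 features per timestep
fwd = (rng.normal(size=(4, 3)), rng.normal(size=(3, 3)))
bwd = (rng.normal(size=(4, 3)), rng.normal(size=(3, 3)))

states = bidirectional(x, fwd, bwd)  # one 6-dim vector per timestep
pooled = states.max(axis=0)          # global max pooling over the time axis
print(states.shape, pooled.shape)    # (6, 6) (6,)
```

At every timestep the first three values summarize the words seen so far and the last three summarize the words still to come, which is the extra context a bidirectional layer provides. The pooling step then collapses the sequence to a single vector whose size does not depend on sentence length.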