Coding Assignment 2: Hackathon - Binary Sentiment Analysis

Introduction

The second coding assignment asks you to implement a simple natural language processing model for sentiment analysis on the Amazon Review Dataset; see the Kaggle page of this coding assignment for the data.

You can use deep learning libraries (e.g., PyTorch, TensorFlow) to accelerate your code with a CUDA back-end.

Note: we will use Python 3.x for the project.

Submission checklist


What to submit

Push to your GitHub Classroom repository:

  • All of the Python files listed above (under "Files you'll edit").
    • Caution: DO NOT UPLOAD THE DATASET

Construct the dataset (10%)

Construct the training set T for the Amazon Review Dataset as instructed and report the following statistics.

REPORT1: Please fill in the table below in the report

| Statistics | Value |
| --- | --- |
| the total number of unique words in T | Plz, fill this |
| the total number of training examples in T | Plz, fill this |
| the ratio of positive examples to negative examples in T | Plz, fill this |
| the average length of document in T | Plz, fill this |
| the max length of document in T | Plz, fill this |
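
A minimal sketch for computing these statistics, assuming the training set T has already been loaded as a list of (tokens, label) pairs with label 1 for positive and 0 for negative; the function and variable names are illustrative, not part of any starter code:

```python
from collections import Counter

def dataset_statistics(T):
    """T: list of (tokens, label) pairs, tokens a list of str, label in {0, 1}."""
    vocab = Counter()
    lengths = []
    n_pos = n_neg = 0
    for tokens, label in T:
        vocab.update(tokens)
        lengths.append(len(tokens))
        if label == 1:
            n_pos += 1
        else:
            n_neg += 1
    return {
        "unique words": len(vocab),
        "training examples": len(T),
        "positive/negative ratio": n_pos / n_neg,
        "average document length": sum(lengths) / len(lengths),
        "max document length": max(lengths),
    }
```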

Performance of deep neural networks for classification (40%)

Suggested hyperparameters (a minimal PyTorch sketch of both networks is given after this list):

  1. Data processing

    1. Word embedding dimension: 100
    2. Word Index: keep the most frequent 10k words
  2. CNN

    1. Network: Word embedding lookup layer -> 1D CNN layer -> fully connected layer -> output prediction
    2. Number of filters: 100
    3. Filter length: 3
    4. CNN activation: ReLU
    5. Fully connected layer dimension: 100; activation: none (i.e., this layer is linear)
  3. RNN

    1. Network: Word embedding lookup layer -> LSTM layer -> fully connected layer (on the hidden state of the last LSTM cell) -> output prediction
    2. Hidden dimension for LSTM cell: 100
    3. Activation for LSTM cell: tanh
    4. Fully connected layer dimension: 100; activation: none (i.e., this layer is linear)
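
The sketches below show one way to realize the two suggested networks in PyTorch. The class names, the max-over-time pooling in the CNN (used here to reduce the variable-length convolution output to a fixed-size vector), and the single-logit output for binary classification are assumptions, not requirements of the assignment:

```python
import torch
import torch.nn as nn

VOCAB_SIZE = 10_000   # most frequent 10k words
EMBED_DIM = 100       # word embedding dimension

class CNNClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
        self.conv = nn.Conv1d(EMBED_DIM, 100, kernel_size=3)  # 100 filters, length 3
        self.fc = nn.Linear(100, 100)      # linear fully connected layer (no activation)
        self.out = nn.Linear(100, 1)       # output prediction (logit)

    def forward(self, x):                  # x: (batch, seq_len) word indices
        e = self.embed(x).transpose(1, 2)  # (batch, embed_dim, seq_len)
        h = torch.relu(self.conv(e))       # ReLU after the 1D CNN layer
        h = h.max(dim=2).values            # max-over-time pooling (assumption)
        return self.out(self.fc(h)).squeeze(1)

class RNNClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
        self.lstm = nn.LSTM(EMBED_DIM, 100, batch_first=True)  # hidden dim 100, tanh cell
        self.fc = nn.Linear(100, 100)      # linear fully connected layer (no activation)
        self.out = nn.Linear(100, 1)       # output prediction (logit)

    def forward(self, x):
        e = self.embed(x)
        _, (h_n, _) = self.lstm(e)         # hidden state of the last LSTM cell
        return self.out(self.fc(h_n[-1])).squeeze(1)
```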

REPORT2: Please fill in the table below in the report

| Model | Accuracy | Training time (in seconds) |
| --- | --- | --- |
| RNN w/o pretrained embedding | Plz, fill this | Plz, fill this |
| RNN w/ pretrained embedding | Plz, fill this | Plz, fill this |
| CNN w/o pretrained embedding | Plz, fill this | Plz, fill this |
| CNN w/ pretrained embedding | Plz, fill this | Plz, fill this |
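
For the "w/ pretrained embedding" rows, one common approach is to initialize the embedding layer from pretrained word vectors such as GloVe. The sketch below assumes a plain-text vector file and a word_to_index mapping built from the 10k-word index; the file name is a placeholder, not a file provided by the assignment:

```python
import numpy as np
import torch

def load_pretrained(path, word_to_index, dim=100):
    # Words missing from the pretrained file keep a small random initialization.
    weights = np.random.normal(scale=0.1, size=(len(word_to_index), dim)).astype("float32")
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            word, vec = parts[0], parts[1:]
            if word in word_to_index and len(vec) == dim:
                weights[word_to_index[word]] = np.asarray(vec, dtype="float32")
    return torch.from_numpy(weights)

# Hypothetical usage with the sketched models above:
# model.embed.weight.data.copy_(load_pretrained("glove.6B.100d.txt", word_to_index))
```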

Training behavior (20%)

Plot the training/testing objective and the training/testing accuracy over time for the 4 model combinations (corresponding to the 4 rows of the table above). In other words, there should be 2*4 = 8 graphs in total, each of which contains two curves (training and testing).
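
A minimal matplotlib sketch for producing one model's two plots, assuming per-epoch lists of training/testing loss and accuracy were recorded during training; the names are illustrative:

```python
import matplotlib.pyplot as plt

def plot_curves(train_vals, test_vals, ylabel, title):
    epochs = range(1, len(train_vals) + 1)
    plt.figure()
    plt.plot(epochs, train_vals, label="training")
    plt.plot(epochs, test_vals, label="testing")
    plt.xlabel("epoch")
    plt.ylabel(ylabel)
    plt.title(title)
    plt.legend()
    plt.savefig(f"{title}_{ylabel}.png".replace(" ", "_"))

# Example (hypothetical recorded lists), repeated for each of the 4 models:
# plot_curves(train_loss, test_loss, "objective", "RNN without pretrained embedding")
# plot_curves(train_acc, test_acc, "accuracy", "RNN without pretrained embedding")
```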

REPORT3: RNN w/o pretrained embedding

  • training/testing objective over time
  • training/testing accuracy over time

REPORT4: RNN w/ pretrained embedding

  • training/testing objective over time
  • training/testing accuracy over time

REPORT5: CNN w/o pretrained embedding

  • training/testing objective over time
  • training/testing accuracy over time

REPORT6: CNN w/ pretrained embedding

  • training/testing objective over time
  • training/testing accuracy over time

Analysis of results (20%)

REPORT7: Discuss the complete set of experimental results, comparing the algorithms to each other.

REPORT8: Discuss your observations about the various algorithms, i.e., differences in how they performed, different parameters, what worked well and didn't, patterns/trends you observed across the set of experiments, etc.

REPORT9: Try to explain why certain algorithms or approaches behaved the way they did.


The software implementation (10%)

Add detailed descriptions of your software implementation and data preprocessing, including:

REPORT10: A description of what you did to preprocess the dataset to make your implementations easier or more efficient.

REPORT11: A description of major data structures (if any); any programming tools or libraries that you used;

REPORT12: Strengths and weaknesses of your design, and any problems that your system encountered;

