Twitter Datasets

Download the tweet datasets from here:

The dataset should have the following files:

  • sample_submission.csv
  • train_neg.txt: a subset of the negative training samples
  • train_pos.txt: a subset of the positive training samples
  • test_data.txt:
  • train_neg_full.txt: the full negative training samples
  • train_pos_full.txt: the full positive training samples

Build the Co-occurrence Matrix

To build a co-occurrence matrix, run the following commands. Remember to put the data files in the correct locations first.

Note that the script takes a few minutes to run, and displays the number of tweets processed.

  • python3
  • python3
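
For orientation, the counting step can be sketched as below. This is a minimal illustration, not the provided script: the function name build_cooc, the toy vocabulary, and the choice to count every ordered pair of vocabulary words within a tweet (including a word with itself) are all assumptions made for the example.

```python
from collections import Counter


def build_cooc(tweets, vocab):
    """Count how often two vocabulary words appear in the same tweet.

    tweets: iterable of whitespace-tokenized strings (hypothetical input)
    vocab:  dict mapping word -> integer id (hypothetical input)
    Returns a Counter keyed by (row_id, col_id).
    """
    counts = Counter()
    for n, tweet in enumerate(tweets, start=1):
        # Keep only the tokens that are in the vocabulary.
        ids = [vocab[t] for t in tweet.split() if t in vocab]
        # Assumption: every ordered pair within the tweet counts once.
        for i in ids:
            for j in ids:
                counts[(i, j)] += 1
        if n % 10000 == 0:
            print(n)  # progress: number of tweets processed so far
    return counts


vocab = {"good": 0, "day": 1, "bad": 2}
tweets = ["good day", "bad day today"]
cooc = build_cooc(tweets, vocab)
```

For a full dataset, the counts would normally be stored in a sparse matrix rather than a dictionary, since most word pairs never co-occur.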

Template for GloVe Question

Your task is to fill in the SGD updates in the template.
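
As a rough guide to what such an update can look like, here is a sketch of one SGD step for a simplified GloVe objective without bias terms, f(n) * (log n - x·y)^2, where n is a co-occurrence count and x, y are the two embedding vectors. The function name sgd_step, the variable names, and the values of eta, nmax, and alpha are assumptions for illustration and need not match the template.

```python
import math


def sgd_step(x, y, n, eta=0.05, nmax=100, alpha=0.75):
    """One SGD update for a word pair with co-occurrence count n > 0.

    Simplified GloVe loss (no bias terms): f(n) * (log n - x.y)^2
    """
    fn = min(1.0, (n / nmax) ** alpha)       # GloVe weighting function f(n)
    dot = sum(a * b for a, b in zip(x, y))   # current prediction x.y
    scale = 2 * eta * fn * (math.log(n) - dot)
    # Both updates use the old values of x and y.
    new_x = [a + scale * b for a, b in zip(x, y)]
    new_y = [b + scale * a for a, b in zip(x, y)]
    return new_x, new_y


def pair_loss(x, y, n, nmax=100, alpha=0.75):
    """Weighted squared error for a single pair."""
    fn = min(1.0, (n / nmax) ** alpha)
    dot = sum(a * b for a, b in zip(x, y))
    return fn * (math.log(n) - dot) ** 2


x = [0.1, -0.2, 0.3]
y = [0.2, 0.1, -0.1]
n = 20
before = pair_loss(x, y, n)
for _ in range(200):
    x, y = sgd_step(x, y, n)
after = pair_loss(x, y, n)
print(before, after)
```

In a full training loop, the same step would be applied to each nonzero entry of the co-occurrence matrix, over several epochs.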

Once you have tested your system on the small set of 10% of all tweets, we suggest you run it on the full datasets, train_pos_full.txt and train_neg_full.txt.