
Predict if a tweet message used to contain a positive :) or negative :( smiley, by considering only the remaining text.


baraschi/Twitter_Sentiment_Analysis


Twitter Sentiment Analysis

In this report, we present a study of sentiment analysis on Twitter data, where the task is to predict whether the smiley originally contained in a tweet was happy :) or sad :(. We experimented with today's most common solutions, such as text preprocessing and supervised classification techniques. We mixed and matched these techniques to evaluate how each combination influenced the accuracy of our predictions. Our predictor currently obtains an accuracy of 0.85.

Description

See Project Description

Dependencies

In order to run the project you will need the following dependencies installed:

Libraries

  • Anaconda3 - Download and install Anaconda with python3

  • Scikit-Learn - Download scikit-learn library with conda

    $ conda install scikit-learn
  • Pandas

    $ conda install pandas
  • NLTK - Download all packages of NLTK

    $ python
    >>> import nltk
    >>> nltk.download()

    Download all packages from the GUI.

  • Matplotlib - Optional - Needed to see the beautiful plots on our notebook!
    $ pip install matplotlib

Files

  • Train and Test Data

    Download all files here in order to train and test the models, and move them into the data/twitter-datasets/ directory.

  • constants.py - all constants used, such as file names and label values.

  • data_cleaning.py - methods used for data cleaning.

  • data_exploration.py - methods used to explore the data, like extracting and counting hashtags.

  • data_loading.py - methods used for data loading and DataFrame creation.

  • prediction.py - methods to classify (BoW, TF-IDF), to cross-validate and to create the submission csv.

  • run.py - main script, uses the above functions to generate the best available submission.

  • utils.py - log utility

  • Run_All_Combinations.ipynb - notebook we used to find best parameter combinations and to generate plots.

  • Data_Exploration.ipynb - notebook we used to explore the data to find out what cleaning methods we needed to apply.
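
As an illustration of the kind of data exploration mentioned above, here is a minimal sketch of extracting and counting hashtags using only the standard library. The function name `count_hashtags`, the regular expression, and the sample tweets are assumptions made for this example; they are not taken from data_exploration.py.

```python
import re
from collections import Counter

# Assumed pattern: a '#' followed by one or more word characters.
HASHTAG_RE = re.compile(r"#\w+")


def count_hashtags(tweets):
    """Return a Counter mapping each lowercased hashtag to its frequency."""
    counts = Counter()
    for tweet in tweets:
        # Lowercase so that '#Happy' and '#happy' are counted together.
        counts.update(tag.lower() for tag in HASHTAG_RE.findall(tweet))
    return counts


if __name__ == "__main__":
    # Made-up tweets, for illustration only.
    sample = ["#Happy day #happy", "no tags here", "so #sad"]
    for tag, n in count_hashtags(sample).most_common():
        print(tag, n)
```

Counting hashtags this way can help decide whether they should be kept, split into words, or removed during cleaning.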

Reproduce Our Submission

In order to produce the same submission corresponding to our crowdAI ranking, just run the following command:

$ python3 run.py

The submission can be found in the file preds/submission_clean_tweet.csv

The leaderboard can be found on crowdAI.

Our submission - username baraschi / submission id 24870 - can be found here.
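
For readers who want a feel for the approach before running run.py, the following is a minimal sketch of a TF-IDF classifier evaluated with cross-validation in scikit-learn. It is not the code from prediction.py: the choice of LogisticRegression, the n-gram range, the label values 1 and -1, and the toy tweets are all assumptions made for this example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Toy data, for illustration only. Labels are assumed: 1 = :) and -1 = :(
TEXTS = [
    "i love this", "so happy today", "great day", "feeling good",
    "what a nice surprise", "best day ever",
    "i hate this", "so sad today", "terrible day", "feeling awful",
    "what a bad surprise", "worst day ever",
]
LABELS = [1] * 6 + [-1] * 6


def build_model():
    """TF-IDF features over unigrams and bigrams, fed to a linear classifier."""
    return make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2)),
        LogisticRegression(max_iter=1000),
    )


def cross_validate(texts, labels, folds=5):
    """Return the mean accuracy over `folds` cross-validation splits."""
    scores = cross_val_score(
        build_model(), texts, labels, cv=folds, scoring="accuracy"
    )
    return scores.mean()


if __name__ == "__main__":
    print("mean CV accuracy:", cross_validate(TEXTS, LABELS, folds=3))
    model = build_model().fit(TEXTS, LABELS)
    print("prediction:", model.predict(["such a happy day"]))
```

Wrapping the vectorizer and the classifier in one pipeline means the TF-IDF vocabulary is refit on each training fold, so the held-out fold does not leak into the features.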

Contributors

License: MIT
