Twitter Sentiment Analysis

In this report, we present a study of sentiment analysis on Twitter data, where the task is to predict whether the smiley contained in a tweet is happy :) or sad :(. We experimented with today's most common solutions, such as text preprocessing and supervised classification techniques. We combined these preprocessing steps and classifiers in different ways to evaluate how each combination influenced the accuracy of our predictions. Our predictor currently obtains an accuracy of 0.85.
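
As a rough illustration of comparing combinations by accuracy, the sketch below cross-validates two text representations (bag-of-words and TF-IDF) with scikit-learn. It is not the project's code: the toy tweets, the label values (1 for happy, -1 for sad), the `evaluate` helper and the choice of logistic regression as classifier are all assumptions made for this example.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Toy tweets invented for illustration; 1 = happy, -1 = sad (assumed label values).
tweets = [
    "what a great day with friends",
    "love this song so much",
    "best weekend ever thank you",
    "so happy to see you again",
    "great news love it",
    "thank you for the lovely gift",
    "i miss you so much",
    "worst day ever feeling sick",
    "so sad the show got cancelled",
    "my phone broke again hate this",
    "feeling sick and tired today",
    "hate monday mornings so sad",
]
labels = [1, 1, 1, 1, 1, 1, -1, -1, -1, -1, -1, -1]


def evaluate(vectorizer, folds=3):
    """Return the mean cross-validated accuracy of vectorizer + classifier."""
    # Logistic regression is an assumed classifier, chosen only for this sketch.
    model = make_pipeline(vectorizer, LogisticRegression(max_iter=1000))
    scores = cross_val_score(model, tweets, labels, cv=folds, scoring="accuracy")
    return scores.mean()


# Compare the two representations under the same classifier and folds.
results = {
    "bow": evaluate(CountVectorizer()),
    "tfidf": evaluate(TfidfVectorizer()),
}
for name, accuracy in results.items():
    print(f"{name}: {accuracy:.2f}")
```

On a dataset this small the scores are not meaningful; the point is only the shape of the comparison loop.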

Description

See Project Description

Dependencies

In order to run the project you will need the following dependencies installed:

Libraries

Anaconda3 - Download and install Anaconda with Python 3
Scikit-Learn - Install the scikit-learn library with conda
```
$ conda install scikit-learn
```
Pandas - Install pandas with conda
```
$ conda install pandas
```

NLTK - Download all NLTK packages from a Python session:

$ python
>>> import nltk
>>> nltk.download()

Then download all packages from the GUI that opens.

Matplotlib - Optional - Needed to see the beautiful plots in our notebooks!
```
$ pip install matplotlib
```
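
To confirm the libraries listed above are visible to your Python interpreter, a small check like the following can help. This is a minimal sketch, not part of the repository; the `missing_modules` helper is a hypothetical name, and `sklearn` is the import name of scikit-learn.

```python
import importlib.util

# Import names of the libraries listed above; matplotlib is optional.
REQUIRED = ["sklearn", "pandas", "nltk"]
OPTIONAL = ["matplotlib"]


def missing_modules(names):
    """Return the names that the import system cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Missing required modules:", missing_modules(REQUIRED))
print("Missing optional modules:", missing_modules(OPTIONAL))
```

An empty list for the required modules means the environment has everything needed to import them; it does not check that the NLTK data packages were downloaded.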

Files

Train and Test Data

Download all files here and move them into the data/twitter-datasets/ directory in order to train and test the models.
constants.py - all constants used, such as file names and label values.
data_cleaning.py - methods used for data cleaning.
data_exploration.py - methods used to explore the data, like extracting and counting hashtags.
data_loading.py - methods used for data loading and DataFrame creation.
prediction.py - methods to classify (BoW, TF-IDF), to cross-validate and to create the submission csv.
run.py - main script, uses the functions above to generate the best available submission.
utils.py - logging utility.
Run_All_Combinations.ipynb - notebook we used to find best parameter combinations and to generate plots.
Data_Exploration.ipynb - notebook we used to explore the data to find out what cleaning methods we needed to apply.
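
For a sense of the kind of exploration mentioned for data_exploration.py, extracting and counting hashtags can be sketched as below. This is an illustrative example, not the repository's implementation: the `count_hashtags` function, the regular expression and the sample tweets are assumptions.

```python
import re
from collections import Counter

# Assumed definition of a hashtag: '#' followed by letters, digits or underscores.
HASHTAG_PATTERN = re.compile(r"#\w+")


def count_hashtags(tweets):
    """Count hashtag occurrences across tweets, ignoring case."""
    counts = Counter()
    for tweet in tweets:
        counts.update(tag.lower() for tag in HASHTAG_PATTERN.findall(tweet))
    return counts


# Sample tweets invented for illustration.
sample = [
    "so happy today #sunny #weekend",
    "monday again #tired",
    "beach time #Sunny",
]
print(count_hashtags(sample).most_common(3))
```

Counts like these can show which hashtags are frequent enough to matter when deciding on cleaning steps.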

Reproduce Our Submission

To reproduce the submission corresponding to our crowdAI ranking, run the following command:

$ python3 run.py

The submission can be found in the file preds/submission_clean_tweet.csv
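
After the run finishes, a quick look at the generated file can confirm it was written. The sketch below assumes only that the file is a CSV with a header row; the `summarize_submission` helper is hypothetical, and no particular column names are assumed.

```python
from pathlib import Path

import pandas as pd


def summarize_submission(path):
    """Return the row count and column names of a submission CSV."""
    frame = pd.read_csv(path)
    return {"rows": len(frame), "columns": list(frame.columns)}


path = Path("preds/submission_clean_tweet.csv")
if path.exists():
    print(summarize_submission(path))
else:
    print(f"{path} not found; run `python3 run.py` first")
```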

The leaderboard can be found on crowdAI.

Our submission - username baraschi / submission id 24870 - can be found here.

License: MIT

