sentiment-analysis-big-data

This project uses PySpark to perform sentiment analysis on tweets. Three machine learning algorithms (logistic regression, naïve Bayes, and decision tree) are trained and compared to find the one that classifies the tweets best.

Logistic regression emerged as the best classifier, with the highest 10-fold cross-validation accuracy of the three: 99.26%.

The dataset is the Tweets.csv file in the dataset folder. It contains tweets posted by customers of US airlines: 14,640 tweets in total, with 15 features per tweet. The data was obtained from https://www.kaggle.com/datasets/crowdflower/twitter-airline-sentiment
