codetech3/sentiment-analysis-big-data

sentiment-analysis-big-data

This project uses PySpark to perform sentiment analysis on tweets. Three machine learning algorithms (logistic regression, naive Bayes, and decision tree) are trained and compared to find the one that classifies the tweets best.

Logistic regression emerged as the best classifier, with the highest 10-fold cross-validation accuracy: 99.26%.
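A comparison of this kind can be sketched with Spark ML's `CrossValidator`. This is a minimal illustration, not the project's actual code. The column names `text` and `airline_sentiment`, the tokenize-then-TF-IDF feature pipeline, the default hyperparameters, and the function names are all assumptions, and the input DataFrame is assumed to have no missing values in the text column.

```python
def compare_classifiers(df, text_col="text", label_col="airline_sentiment",
                        folds=10, seed=42):
    """Return {classifier name: mean k-fold cross-validation accuracy}.

    Column names and the TF-IDF features are assumptions for illustration.
    """
    # PySpark is imported here so the pure-Python helper below stays usable
    # without a Spark installation.
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import (
        DecisionTreeClassifier, LogisticRegression, NaiveBayes)
    from pyspark.ml.evaluation import MulticlassClassificationEvaluator
    from pyspark.ml.feature import HashingTF, IDF, StringIndexer, Tokenizer
    from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

    candidates = {
        "logistic_regression": LogisticRegression(),
        "naive_bayes": NaiveBayes(),
        "decision_tree": DecisionTreeClassifier(),
    }
    evaluator = MulticlassClassificationEvaluator(
        labelCol="label", predictionCol="prediction", metricName="accuracy")

    scores = {}
    for name, classifier in candidates.items():
        # Index string labels, tokenize, weight terms by TF-IDF, then classify.
        pipeline = Pipeline(stages=[
            StringIndexer(inputCol=label_col, outputCol="label"),
            Tokenizer(inputCol=text_col, outputCol="words"),
            HashingTF(inputCol="words", outputCol="tf"),
            IDF(inputCol="tf", outputCol="features"),
            classifier,
        ])
        # An empty parameter grid yields one configuration, so avgMetrics[0]
        # is the mean accuracy across the folds.
        cv = CrossValidator(
            estimator=pipeline,
            estimatorParamMaps=ParamGridBuilder().build(),
            evaluator=evaluator,
            numFolds=folds,
            seed=seed,
        )
        scores[name] = cv.fit(df).avgMetrics[0]
    return scores


def best_classifier(scores):
    """Return the name with the highest cross-validation accuracy."""
    return max(scores, key=scores.get)
```

Calling `best_classifier(compare_classifiers(df))` on a DataFrame of tweets returns the name of the top-scoring model. Fitting every stage inside the cross-validated pipeline keeps the IDF weights from being computed on the held-out fold.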

The dataset is the Tweets.csv file in the dataset folder. It contains tweets written by customers of US airlines. There are 14,640 tweets in total, with 15 features per tweet. The data was obtained from https://www.kaggle.com/datasets/crowdflower/twitter-airline-sentiment
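Loading the file into a Spark DataFrame might look like the sketch below. The function names and the CSV options are assumptions: `multiLine` and a double-quote `escape` are set on the guess that some tweets contain embedded newlines or quotes, which would otherwise split one tweet across several rows.

```python
EXPECTED_ROWS = 14640     # tweet count stated above
EXPECTED_COLUMNS = 15     # feature count stated above


def load_tweets(spark, path="dataset/Tweets.csv"):
    """Read the tweets CSV with an existing SparkSession (hypothetical helper)."""
    return spark.read.csv(
        path,
        header=True,       # first row holds the column names
        inferSchema=True,  # let Spark guess numeric vs. string columns
        multiLine=True,    # assumption: quoted fields may span lines
        escape='"',        # assumption: quotes inside fields are doubled
    )


def shape_matches(n_rows, n_cols):
    """Check a loaded shape against the documented dataset size."""
    return (n_rows, n_cols) == (EXPECTED_ROWS, EXPECTED_COLUMNS)


def main():
    # Requires a local PySpark installation; imported here so the helpers
    # above can be used without one.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("sentiment-analysis").getOrCreate()
    df = load_tweets(spark)
    print("shape matches README:", shape_matches(df.count(), len(df.columns)))
    spark.stop()
```

Running `main()` from the repository root is a quick sanity check: if the row count comes out higher than expected, the CSV options are probably splitting multi-line tweets.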
