Topic modelling and sentiment analysis on Twitter data with Spark

Authors

👩‍💻 Janice Tan Yi Xuan
👩‍💻 Li Yishan
👩‍💻 Ma Xiaojing
👨‍💻 Ranjit Khare
👨‍💻 Wang Xiaoyuan

Architecture

Tweepy: Connects to Twitter and retrieves historical and streaming data. The data is then saved to HDFS or sent to a TCP port.
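To make the "sent to a TCP port" step concrete, here is a minimal sketch using only the standard library. It is not the notebook's code: the function names `serve_tweets` and `read_all_lines` are hypothetical, and the tweets are passed in as a plain iterable of strings where the project would receive them from Tweepy. Each tweet is written as one newline-terminated line, which is the framing a line-based socket reader expects.

```python
import socket
import threading


def serve_tweets(tweets, host="127.0.0.1", port=0):
    """Open a TCP server and send each tweet as one line to the first client.

    Returns the bound port and the background thread doing the sending.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))  # port=0 lets the OS pick a free port
    server.listen(1)
    bound_port = server.getsockname()[1]

    def _send():
        conn, _ = server.accept()  # blocks until a consumer connects
        with conn:
            for text in tweets:
                # One tweet per line: collapse embedded whitespace/newlines first.
                line = " ".join(text.split()) + "\n"
                conn.sendall(line.encode("utf-8"))
        server.close()

    thread = threading.Thread(target=_send, daemon=True)
    thread.start()
    return bound_port, thread


def read_all_lines(host, port):
    """Stand-in consumer: connect, read until the server closes, split into lines."""
    chunks = []
    with socket.create_connection((host, port)) as client:
        while True:
            data = client.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8").splitlines()
```

In the real pipeline the consumer would be the Spark Streaming job rather than `read_all_lines`, and the server would stay open for as long as the stream runs.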

Spark SQL/MLlib: Loads the messages from HDFS, processes them, and saves the results for visualisation in matplotlib or Tableau.
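The batch step can be sketched roughly as follows. This is an illustration under assumptions, not the contents of the project's notebook: the cleaning rules, the function names `clean_tweet` and `fit_lda`, and the defaults for `k` and `max_iter` are all hypothetical. The sketch uses `CountVectorizer` and `LDA` from `pyspark.ml`, imported lazily so the cleaning helper can be tried without Spark installed.

```python
import re

URL_OR_MENTION = re.compile(r"https?://\S+|@\w+")
NON_LETTER = re.compile(r"[^a-z\s]")


def clean_tweet(text, stopwords=frozenset()):
    """Lowercase, drop URLs, mentions and non-letters, then remove stopwords
    and tokens shorter than three characters."""
    text = URL_OR_MENTION.sub(" ", text.lower())
    text = NON_LETTER.sub(" ", text)
    return [tok for tok in text.split() if tok not in stopwords and len(tok) > 2]


def fit_lda(spark, tweets, stopwords=frozenset(), k=5, max_iter=10):
    """Fit LDA on raw tweet strings and return the top terms of each topic.

    `spark` is an existing SparkSession. Requires pyspark.
    """
    from pyspark.ml.clustering import LDA
    from pyspark.ml.feature import CountVectorizer

    # Skip tweets that are empty after cleaning.
    token_lists = (clean_tweet(t, stopwords) for t in tweets)
    rows = [(toks,) for toks in token_lists if toks]
    df = spark.createDataFrame(rows, "tokens array<string>")

    # Bag-of-words term counts, the input representation LDA expects.
    cv_model = CountVectorizer(inputCol="tokens", outputCol="features").fit(df)
    lda_model = LDA(k=k, maxIter=max_iter, featuresCol="features").fit(
        cv_model.transform(df)
    )

    # Map term indices back to words for the five heaviest terms per topic.
    vocab = cv_model.vocabulary
    topics = lda_model.describeTopics(5).collect()
    return [[vocab[i] for i in row["termIndices"]] for row in topics]
```

For example, `clean_tweet("Check https://t.co/abc @user #Spark is GREAT!!", {"check"})` yields `["spark", "great"]`.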

Spark Streaming: Consumes the messages from the TCP port, processes them, and sends the results to the Flask server.
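A rough sketch of this consume, process, forward loop is shown below. Several things here are assumptions rather than the project's code: hashtag counting stands in for the actual processing, the port `9009`, the `/update` URL, and all function names are hypothetical, and the job uses the DStream API (`socketTextStream`, `foreachRDD`), which recent Spark releases mark as deprecated.

```python
import json
import re
import urllib.request
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")


def top_hashtags(lines, n=10):
    """Count hashtags (case-insensitive) across tweet lines; return the n most common."""
    counts = Counter(tag.lower() for line in lines for tag in HASHTAG.findall(line))
    return counts.most_common(n)


def build_payload(pairs):
    """Turn [(tag, count), ...] into the JSON body sent to the dashboard."""
    return json.dumps(
        {"labels": [t for t, _ in pairs], "counts": [c for _, c in pairs]}
    )


def post_to_dashboard(payload, url="http://localhost:5000/update"):
    """POST the JSON payload. The /update route is a hypothetical endpoint."""
    req = urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req, timeout=5).close()


def run_stream(host="localhost", port=9009, batch_seconds=10):
    """Wire the helpers into a DStream job. Requires pyspark."""
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(appName="TweetStreamSketch")
    ssc = StreamingContext(sc, batch_seconds)
    lines = ssc.socketTextStream(host, port)

    def handle(rdd):
        if not rdd.isEmpty():
            # Acceptable for a sketch: pull the micro-batch to the driver.
            post_to_dashboard(build_payload(top_hashtags(rdd.collect())))

    lines.foreachRDD(handle)
    ssc.start()
    ssc.awaitTermination()
```

Keeping the counting and payload logic in plain functions means they can be checked without a running Spark cluster; only `run_stream` needs pyspark.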

Flask: Python web framework that receives the data from Spark and displays the dashboards.
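One way the receiving side could look is sketched below. The route names `/update` and `/data`, the payload keys `labels` and `counts`, and the `DashboardState` class are hypothetical and chosen for illustration; the project's own Flask application may be organised differently. Flask is imported inside `create_app` so the state holder can be exercised on its own.

```python
import threading


class DashboardState:
    """Thread-safe holder for the most recent payload pushed by Spark."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {"labels": [], "counts": []}

    def update(self, payload):
        # Keep only the keys the chart needs; ignore anything else.
        with self._lock:
            self._data = {
                "labels": list(payload.get("labels", [])),
                "counts": list(payload.get("counts", [])),
            }

    def snapshot(self):
        with self._lock:
            return dict(self._data)


def create_app(state):
    """Build the Flask app around a DashboardState. Requires flask."""
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/update", methods=["POST"])
    def update():
        # Spark pushes the latest aggregated results here.
        state.update(request.get_json(force=True))
        return "ok", 200

    @app.route("/data")
    def data():
        # The dashboard page polls this route to refresh its charts.
        return jsonify(state.snapshot())

    return app
```

With this layout, `create_app(DashboardState()).run(port=5000)` would start a development server on localhost:5000.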

Dashboard

Instructions

Tweepy:

Run "Full archive tweet tweepy-v4.ipynb" to extract historical data.
Run "Stream tweet tweepy.ipynb" to extract real-time data and send it to a TCP port.

Spark SQL/MLlib:

Install pyspark.
Run "Tweet LDA.ipynb" for data processing and modelling.

Visualisation:

Install bar_chart_race.
Run "AprTweetsBarChart.ipynb" to create the bar chart race video.

Spark Streaming:

Run "Tweet stream process.ipynb" to process real-time data and send it to the Flask server.

Flask:

Run the flask.rc file in this project (source flask.rc).
Run "flask run"; by default this starts the application at localhost:5000.
