Developed a project that crawls data from Twitter and then analyzes the collected data.
-Crawled Twitter data using the Streaming and REST APIs via the tweepy library.
-Also fetched geo-tagged data and performed keyword-based crawling.
-Calculated and visualized the amount of geo-tagged data from Glasgow, the number of redundant tweets, and the number of retweets and quotes.
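
The statistics listed above can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the function name `summarize_tweets` and the sample data are hypothetical, a "redundant" tweet is assumed to mean one whose `id_str` was already collected, and tweets are assumed to be plain dicts in the Twitter v1.1 JSON format (`coordinates`, `place`, `retweeted_status`, `is_quote_status`).

```python
from collections import Counter


def summarize_tweets(tweets):
    """Count geo-tagged, redundant, retweeted, and quoted tweets.

    Assumption: each tweet is a dict in Twitter v1.1 JSON format, and a
    tweet is "redundant" when its id_str appears more than once.
    """
    id_counts = Counter(t["id_str"] for t in tweets)
    return {
        "total": len(tweets),
        # A tweet is treated as geo-tagged if it has coordinates or a place.
        "geo_tagged": sum(1 for t in tweets if t.get("coordinates") or t.get("place")),
        # Every occurrence of an id beyond the first counts as redundant.
        "redundant": sum(n - 1 for n in id_counts.values()),
        # Retweets embed the original tweet under "retweeted_status".
        "retweets": sum(1 for t in tweets if "retweeted_status" in t),
        "quotes": sum(1 for t in tweets if t.get("is_quote_status")),
    }


# Hypothetical sample data for illustration only.
sample = [
    {"id_str": "1", "coordinates": {"type": "Point", "coordinates": [-4.25, 55.86]}},
    {"id_str": "2", "retweeted_status": {"id_str": "9"}},
    {"id_str": "3", "is_quote_status": True},
    {"id_str": "1", "coordinates": {"type": "Point", "coordinates": [-4.25, 55.86]}},
]
stats = summarize_tweets(sample)
print(stats)
```

With documents stored in MongoDB, the same counts could instead be computed with queries on the collection; the in-memory version is used here only to keep the sketch self-contained.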
Software Specification:
The project was developed with the following:
1. Programming Language: Python 3.6.5
2. Database: MongoDB 4.0
3. IDE: JetBrains PyCharm Community Edition 2017.2.3 x64
File Name | File Description |
StreamingAPI.py | Program for crawling Twitter data using the Streaming API (based on certain users and keywords) |
StreamingAPI_sample_method.py | Program for crawling Twitter data using the Streaming API for the 1% sample of tweets (via the sample method) |
REST_API_Twitter.py | Program for crawling Twitter data via the REST API using tweepy |
Geo_Tagged_Data.py | Program for collecting geo-tagged data for Glasgow via the Streaming API and REST API |
DataAnalytics_Twitter.py | Program to gather the data statistics from MongoDB |
Twitter_DataAnalytics.txt | Text file containing the data statistics from MongoDB |
Histogram.py | Program to visualize figures such as total tweet count, total redundant data collected, and total retweets |
DataAnalytics_GUI.py | A GUI application built on tkinter to generate the analysis and visualizations |
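
As an illustration of the geo-tagged collection step, the check for whether a tweet falls inside Glasgow might look like the sketch below. It does not reproduce `Geo_Tagged_Data.py`: the names `GLASGOW_BBOX`, `in_bbox`, and `is_in_glasgow` are hypothetical, and the bounding box values are rough, assumed coordinates chosen for the example. Tweets are again assumed to be dicts in Twitter v1.1 JSON format, where `coordinates` is a GeoJSON point in `[longitude, latitude]` order.

```python
# Assumption: approximate bounding box around Glasgow, given as
# (west, south, east, north) in degrees. These values are illustrative only.
GLASGOW_BBOX = (-4.40, 55.78, -4.07, 55.93)


def in_bbox(lon, lat, bbox=GLASGOW_BBOX):
    """Return True if the point (lon, lat) lies inside the bounding box."""
    west, south, east, north = bbox
    return west <= lon <= east and south <= lat <= north


def is_in_glasgow(tweet):
    """Return True if the tweet carries exact coordinates inside the box."""
    point = tweet.get("coordinates")
    if not point:
        return False  # no exact location attached to this tweet
    lon, lat = point["coordinates"]  # GeoJSON order: [longitude, latitude]
    return in_bbox(lon, lat)


# Hypothetical tweets for illustration only.
glasgow_tweet = {"coordinates": {"type": "Point", "coordinates": [-4.25, 55.86]}}
london_tweet = {"coordinates": {"type": "Point", "coordinates": [-0.12, 51.50]}}
print(is_in_glasgow(glasgow_tweet), is_in_glasgow(london_tweet))
```

The Streaming API's `locations` filter also takes a bounding box as longitude/latitude pairs with the south-west corner first, so the same four numbers could be reused there; a local check like this one is still useful for tweets fetched through the REST API.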