Project Proposal
Vidyullatha Lakshmi Kaza - 8
Aparna Manda - 11
Lohitha Yenugu - 19
The initial phase of the project focuses on collecting Twitter data, and more than 100,000 tweets have been collected so far. The hashtags in these tweets have been extracted and filtered with custom code. A word count was then run to determine how many times each hashtag is used across the tweets. This filtered data forms the foundation for the analysis in later phases.
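The extraction and counting step can be illustrated with a small, single-machine sketch. It is not the team's actual code: it assumes a hashtag is `#` followed by word characters, and it lower-cases tags so that variants such as `#COVID19` and `#covid19` are counted together, which is a choice the proposal does not state. The function names `extract_hashtags` and `count_hashtags` are hypothetical.

```python
import re
from collections import Counter

# Assumption: a hashtag is '#' followed by letters, digits or underscores.
HASHTAG_RE = re.compile(r"#(\w+)")


def extract_hashtags(text):
    """Return the hashtags in one tweet, lower-cased so case variants merge."""
    return [tag.lower() for tag in HASHTAG_RE.findall(text)]


def count_hashtags(tweets):
    """Count how many times each hashtag appears across all tweets."""
    counts = Counter()
    for text in tweets:
        counts.update(extract_hashtags(text))
    return counts


if __name__ == "__main__":
    sample = [
        "Stay home #COVID19 #StayHome",
        "New cases reported today #covid19",
        "No hashtags here",
    ]
    # Prints each hashtag with its count, most frequent first.
    for tag, n in count_hashtags(sample).most_common():
        print(tag, n)
```

On 100,000+ tweets this still fits comfortably in memory. A distributed job becomes useful mainly when the input outgrows one machine.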
Team members worked on Windows-based machines. Apache Hadoop was used primarily to extract and filter the hashtags, and MapReduce was used to perform the word count on the gathered data.
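The MapReduce word count follows a fixed pattern: the map phase emits a `(key, 1)` pair per hashtag, Hadoop's shuffle sorts and groups the pairs by key, and the reduce phase sums each group. The sketch below simulates those three phases in one Python process. It is an illustration under assumptions, not the team's job. Under Hadoop Streaming the mapper and reducer would instead be separate scripts that read stdin and write tab-separated key/value lines. The names `mapper`, `reducer`, and `run_job` are hypothetical.

```python
import re
from itertools import groupby

# Assumption: same hashtag definition as '#' followed by word characters.
HASHTAG_RE = re.compile(r"#(\w+)")


def mapper(lines):
    """Map phase: emit (hashtag, 1) for every hashtag in every tweet line."""
    for line in lines:
        for tag in HASHTAG_RE.findall(line):
            yield tag.lower(), 1


def reducer(sorted_pairs):
    """Reduce phase: input must be sorted by key, as Hadoop's shuffle guarantees."""
    for tag, group in groupby(sorted_pairs, key=lambda pair: pair[0]):
        yield tag, sum(count for _, count in group)


def run_job(lines):
    """Simulate map -> shuffle/sort -> reduce locally."""
    shuffled = sorted(mapper(lines))
    return dict(reducer(shuffled))


if __name__ == "__main__":
    tweets = ["#Corona cases rise #COVID19", "#covid19 vaccine news", "#corona"]
    # Output mirrors Hadoop's tab-separated key/value format.
    for tag, total in run_job(tweets).items():
        print(f"{tag}\t{total}")
```

The reducer depends on its input being sorted. `groupby` only merges adjacent equal keys, which is exactly the contract Hadoop's shuffle provides between the two phases.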
Languages used: Python, Java
• Number of tweets used (extracted) - 100,000
• Keywords used - COVID-19, COVID19, covid-19, covid19, corona, CORONA
• Columns extracted - date, user, is_retweet, is_quoted, text, quoted_text
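To show how the keyword list and column list above fit together, here is a minimal sketch. It assumes the raw tweets were saved to a CSV file whose header uses those column names. That file layout is hypothetical, and the proposal may have applied the keywords as search terms at collection time instead. The keywords are matched literally and case-sensitively against the `text` and `quoted_text` columns, so a variant not in the list (for example "Corona") would not match. `filter_tweets` is a hypothetical helper.

```python
import csv
import io

# Keywords and columns as listed in the proposal.
KEYWORDS = ["COVID-19", "COVID19", "covid-19", "covid19", "corona", "CORONA"]
COLUMNS = ["date", "user", "is_retweet", "is_quoted", "text", "quoted_text"]


def matches_keywords(row):
    """True if any keyword appears in the tweet text or its quoted text."""
    text = (row.get("text") or "") + " " + (row.get("quoted_text") or "")
    return any(keyword in text for keyword in KEYWORDS)


def filter_tweets(csv_file):
    """Yield matching rows, keeping only the six proposal columns."""
    for row in csv.DictReader(csv_file):
        if matches_keywords(row):
            yield {column: row.get(column) or "" for column in COLUMNS}


if __name__ == "__main__":
    # Hypothetical two-row sample standing in for the real tweet file.
    sample = io.StringIO(
        "date,user,is_retweet,is_quoted,text,quoted_text\n"
        "2020-04-01,alice,False,False,Stay safe #COVID19,\n"
        "2020-04-01,bob,False,False,Lunch time,\n"
    )
    for tweet in filter_tweets(sample):
        print(tweet["user"], tweet["text"])
```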