Project Proposal

A system to store, analyze, and visualize Twitter’s tweets on COVID-19

TEAM-4

Vidyullatha Lakshmi Kaza – 8

Aparna Manda – 11

Lohitha Yenugu – 19

Overview:

The initial phase of the project focuses on collecting Twitter data. More than 100,000 tweets have been collected. The hashtags used in these tweets were extracted programmatically, and a word count was run to record how many times each hashtag is used. This provides the foundation for analyzing the collected and filtered data.
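The extraction and counting step can be illustrated with a minimal Python sketch. It is not the project's actual code: the regular expression, the function names `extract_hashtags` and `count_hashtags`, and the sample tweets are assumptions made for illustration. Note that under this assumed pattern a hyphen ends a hashtag, so `#COVID-19` would be counted as `covid`.

```python
import re
from collections import Counter

# Assumed pattern: '#' followed by letters, digits, or underscores.
HASHTAG_RE = re.compile(r"#(\w+)")


def extract_hashtags(text):
    """Return the hashtags in one tweet, lowercased, without the leading '#'."""
    return [tag.lower() for tag in HASHTAG_RE.findall(text)]


def count_hashtags(tweets):
    """Count how often each hashtag appears across an iterable of tweet texts."""
    counts = Counter()
    for text in tweets:
        counts.update(extract_hashtags(text))
    return counts


# Made-up sample tweets, for illustration only.
sample = [
    "Stay home, stay safe #COVID19 #StayHome",
    "New guidance released today #covid19",
    "No hashtags in this one",
]
print(count_hashtags(sample).most_common())
```

Lowercasing before counting merges variants such as `#COVID19` and `#covid19`; whether the project merges them this way is not stated in the proposal.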

Tools Used:

Team members worked on Windows-based machines. Apache Hadoop was the primary tool for extracting and filtering hashtags, and MapReduce was used to run the word count over the collected data.
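The MapReduce word count pattern can be sketched locally in Python, without a Hadoop cluster. This is an illustration of the map, shuffle, and reduce phases only, not the team's Hadoop job; the function names `mapper`, `reducer`, and `word_count` are hypothetical, and tokens are assumed to be separated by whitespace.

```python
from itertools import groupby
from operator import itemgetter


def mapper(lines):
    """Map phase: emit a (word, 1) pair for every whitespace-separated token."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def reducer(sorted_pairs):
    """Reduce phase: sum the counts of each key; input must be sorted by key."""
    for word, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def word_count(lines):
    """Run map, shuffle (sort by key), and reduce in a single local process."""
    shuffled = sorted(mapper(lines), key=itemgetter(0))
    return dict(reducer(shuffled))


# Made-up input lines, for illustration only.
print(word_count(["#covid19 stay home", "stay safe #covid19"]))
```

On a real cluster the framework performs the sort between the two phases; here `sorted` stands in for that shuffle so the reducer sees all pairs for a key together.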

Languages Used:

Python, Java

Key Components:

• Number of tweets used (extracted) - 100,000

• Keywords used - COVID-19, COVID19, covid-19, covid19, corona, CORONA

• Columns extracted - date, user, is_retweet, is_quoted, text, quoted_text
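The keyword filter and column selection listed above could look roughly like the following Python sketch. The CSV layout, the extra `id` column, the sample rows, and the function names `matches_keyword` and `select_tweets` are assumptions for illustration; the proposal does not describe how the raw tweets are stored. Matching is done case-insensitively here, which covers both the upper- and lowercase keyword variants.

```python
import csv
import io

# Lowercase forms of the keywords listed in the proposal.
KEYWORDS = ("covid-19", "covid19", "corona")
# Columns named in the proposal.
COLUMNS = ["date", "user", "is_retweet", "is_quoted", "text", "quoted_text"]


def matches_keyword(text):
    """Return True if the text contains any keyword, ignoring case."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in KEYWORDS)


def select_tweets(rows):
    """Keep rows whose text mentions a keyword, projected to COLUMNS."""
    for row in rows:
        if matches_keyword(row.get("text") or ""):
            yield {col: row.get(col) or "" for col in COLUMNS}


# Hypothetical raw export with one extra column ('id') that gets dropped.
raw = (
    "id,date,user,is_retweet,is_quoted,text,quoted_text\n"
    "1,2020-04-01,alice,False,False,Wash your hands #COVID19,\n"
    "2,2020-04-01,bob,False,False,Nice weather today,\n"
)
for tweet in select_tweets(csv.DictReader(io.StringIO(raw))):
    print(tweet)
```

Only the first sample row is kept, because the second one mentions none of the keywords.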
