This repo is a fork of https://github.com/mikeizbicki/twitter_coronavirus/tree/51e77e53a10aa2933e29378883051464357095f9 and was an assignment for a Big Data class at Claremont McKenna, completed on Mar 6, 2022.
I completed this assignment by modifying and running ./src/map.py on a large data set of tweets from the COVID pandemic period, separated by day.
./src/map.py is a Python file that collects the tweets in the given data set that used hashtags relating to the coronavirus and separates them out by language and country.
The loop_map.sh script was used to call the Python file on each separate day over the given time period.
Next, I reduced the language and country files for each hashtag down to their own respective files using the src/reduce.py file.
Lastly, I ran the loop_visualize.sh script, which used the src/visualize.py file to store the outputs in the viz directory by COVID-related hashtag, each containing the total mentions by language and country.
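
As a rough illustration of the map and reduce steps described above, here is a minimal sketch. It is not the code in src/map.py or src/reduce.py: the hashtag list, the tweet fields text and lang, and the function names map_tweets and reduce_counts are all assumptions made for this example, and it only groups by language (grouping by country would work the same way).

```python
from collections import Counter, defaultdict

# Hypothetical hashtag list, chosen only for this example.
HASHTAGS = ["#coronavirus", "#covid19"]


def map_tweets(tweets, hashtags=HASHTAGS):
    """Map step: for one day of tweets, count mentions per hashtag and language."""
    counts = defaultdict(Counter)
    for tweet in tweets:
        text = tweet["text"].lower()  # assumed field name
        for hashtag in hashtags:
            if hashtag in text:
                counts[hashtag][tweet["lang"]] += 1  # assumed field name
    return counts


def reduce_counts(per_day_counts):
    """Reduce step: merge the per-day counts into one total per hashtag."""
    total = defaultdict(Counter)
    for day in per_day_counts:
        for hashtag, by_lang in day.items():
            total[hashtag].update(by_lang)  # Counter.update adds the counts
    return total


# Two made-up "days" of tweets standing in for the real daily files.
day1 = map_tweets([
    {"text": "Stay home #coronavirus", "lang": "en"},
    {"text": "Quédate en casa #Coronavirus #covid19", "lang": "es"},
])
day2 = map_tweets([{"text": "#covid19 update", "lang": "en"}])

totals = reduce_counts([day1, day2])
for hashtag in HASHTAGS:
    print(hashtag, dict(totals[hashtag]))
```

In this sketch each day is mapped independently, which is what makes it possible to run the map step once per day from a shell loop and combine the results afterwards.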