# COVID-19 Twitter Final Project Dashboard
> This interactive dashboard is one of the three major output components of our MACS 30122 final project. It builds on some of the analysis from the other two components and adds extra visualization and storytelling elements.

>![Coronavirus banner](https://www.dfweyes.com//files/2020/04/coronavirus.jpg)

-------------------------------------------------------------------

In [44]:
#### Notice ####
# 1. Please run 2_COVID_Tweets_word_frequency_analysis.ipynb before this notebook.
#    The preprocessed tweets are stored there and loaded here as stored values to save running time.
# 2. Please install voila with `pip install voila`.
#    If your Jupyter Notebook is up to date, a voila button will appear in the toolbar.
#    Click that button to open the interactive dashboard.


In [24]:
import pandas as pd
import covid_data_analysis as cov
import numpy as np

# plotting packages
import seaborn as sns
import wordcloud
import matplotlib.pyplot as plt
import plotly.graph_objects as go
from ipywidgets import interact
from IPython.display import display, HTML

%matplotlib inline

In [39]:
merged_df = pd.read_csv("data/merged_df_final.csv")
merged_df.fillna("NA", inplace = True)
covid_state_df = pd.read_csv("data/covid-data-by-state.csv")
covid_state_df = cov.clean_data(covid_state_df)

In [3]:
def get_df_within_period(df, begin, end):
    '''
    Keep the rows of df whose date lies between begin and end (inclusive).
    '''
    df = df[df.date >= begin]
    df = df[df.date <= end]
    return df

### State-level COVID numbers and Tweets:


Users can specify a `US state` and a `date` to see the state-level COVID numbers, along with the CDC and governor tweets from that date.

Inputs:
- quoted full name of a US state, with the first letter capitalized, e.g. `"Alaska"`
- date of standard date format: `year/month/day`, ranging from 2020/03/10 to 2021/02/21
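
The search shows each distinct tweet once and skips rows where the tweet was filled with the `"NA"` placeholder. A minimal sketch of that de-duplication step in plain Python is given here; the helper name `unique_tweets` and the sample strings are made up for illustration and are not part of the notebook.

```python
def unique_tweets(tweets, placeholder="NA"):
    """Return distinct tweets in first-seen order, skipping the placeholder."""
    seen = set()
    result = []
    for tweet in tweets:
        if tweet != placeholder and tweet not in seen:
            seen.add(tweet)
            result.append(tweet)
    return result


# Made-up sample values; the real tweets come from merged_df.
sample = ["Wear a mask.", "NA", "Wear a mask.", "Wash your hands."]
print(unique_tweets(sample))
```

An empty result corresponds to the "no tweets found on this day" message in the dashboard.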

In [43]:
def plot_cdc_tweets(merged_df, date):
    tweet = merged_df[merged_df.date == date].CDC_tweet
    num_t = 0
    t_dedup = set()
    for t in tweet:
        if t != "NA" and t not in t_dedup:
            num_t +=1
            t_dedup.add(t)
            display(HTML("<div style = 'background-color: #504e4e; padding: 30px '>" +
                         "<span style='color: red; font-size:20px;'> CDC Tweet " + str(num_t) + ":" +"</span>" +
                         "<span style='color: #fff; font-size:15px;'>" + str(t) +"</span>" +
                         "</div>"))
    if num_t == 0:
        display(HTML("<div style = 'background-color: #504e4e; padding: 30px '>" +
                     "<span style='color: #fff; font-size:15px;'> There are no tweets from CDC found on this day" + "</span>" +
                     "</div>"))

def plot_gov_tweets(merged_df, date, state):
    df_state = merged_df[merged_df.province_state == state]
    tweet = df_state[df_state.date == date].governor_tweet
    num_t = 0
    t_dedup = set()
    for t in tweet:
        if t != "NA" and t not in t_dedup:
            num_t +=1
            t_dedup.add(t)
            display(HTML("<div style = 'background-color: #504e4e; padding: 30px '>" +
                         "<span style='color: red; font-size:20px;'> Governor Tweet " + str(num_t) + ":" +"</span>" +
                         "<span style='color: #fff; font-size:15px;'>" + str(t) +"</span>" +
                         "</div>"))
    if num_t == 0:
        display(HTML("<div style = 'background-color: #504e4e; padding: 30px '>" +
                     "<span style='color: #fff; font-size:15px;'> There are no tweets from the state governor found on this day" + "</span>" +
                     "</div>"))

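def plot_cov_num_tweet(merged_df, state, date):
    df_dt = merged_df[merged_df.date == date]
    df = df_dt[df_dt.province_state == state]
    # Guard against a state/date pair with no rows, which would make iloc[0] fail
    if df.empty:
        display(HTML("<div style = 'background-color: #fff; padding: 20px '>" +
                     "<span style='color: black; font-size:22px;'> No COVID numbers found for this state and date" + "</span>" +
                     "</div>"))
        return
    confirm = df.confirmed_state.iloc[0]
    deaths = df.deaths_state.iloc[0]
    display(HTML("<div style = 'background-color: #fff; padding: 20px '>" +
                     "<span style='color: black; font-size:22px;'> Confirmed number: " + str(confirm) + "</span>" +
                     "<span style='color: red; font-size:22px;'> Deaths number: " + str(deaths) + "</span>" +
                     "</div>"))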

@interact(state = "Alaska", date = "2021/02/19")
def cov_tweet_search(state, date):
    plot_cov_num_tweet(merged_df, state, date)
    plot_cdc_tweets(merged_df, date)
    plot_gov_tweets(merged_df, date, state)


-------------

### COVID state map within customized period

Users can enter `begin` and `end` dates to choose a period of interest and view state-level heatmaps of confirmed cases and deaths for that period. The plots may take a while to update.

Inputs:
- begin and end date of standard date format: `year-month-day`, within the range from 2020-03-10 to 2021-02-21
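
The date format matters here. If the `date` column holds plain strings (as it would straight from `pd.read_csv`), a direct comparison only matches calendar order when every value is a zero-padded `year-month-day` string. The sketch below is one way to make the filter stricter by parsing the dates first; `filter_period` is a hypothetical helper and the rows are made up, so treat it as an illustration rather than the notebook's method.

```python
import pandas as pd


def filter_period(df, begin, end):
    """Keep rows whose date falls in [begin, end], both ends inclusive."""
    # Parsing with an explicit format raises an error on malformed input.
    dates = pd.to_datetime(df["date"], format="%Y-%m-%d")
    begin_ts = pd.to_datetime(begin, format="%Y-%m-%d")
    end_ts = pd.to_datetime(end, format="%Y-%m-%d")
    return df[dates.between(begin_ts, end_ts)]


# Made-up rows for illustration only.
toy = pd.DataFrame({
    "date": ["2020-03-10", "2020-07-01", "2021-02-21"],
    "confirmed_state": [1, 50, 900],
})
subset = filter_period(toy, "2020-03-10", "2020-12-31")
print(subset)
```

With an explicit format, a mistyped date fails loudly instead of silently producing an empty plot.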

In [37]:
@interact(begin = "2020-03-10", end = "2021-02-21")
def plot_state_map(begin, end):
    df = get_df_within_period(covid_state_df, begin, end)
    cov.draw_state_heatmap(df, "confirmed_state")
    cov.draw_state_heatmap(df, "deaths_state")


-------------

### Top hit states within customized period

Users can enter `begin` and `end` dates to choose a period of interest and explore the COVID trends of the top `n` states by confirmed cases and deaths. The plots may take a while to update.

Inputs:
- `begin` and `end` dates in `year-month-day` format, ranging from 2020-03-10 to 2021-02-21
- integer `n` between 1 and 10
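
The ranking is taken on the last date of the chosen period: the `n` states with the largest value on that date are selected, and their full time trend is then plotted. A minimal sketch of the selection step follows. It uses `nlargest` in place of the notebook's sort-and-slice, and both `top_n_states` and the counts are made up for illustration.

```python
import pandas as pd


def top_n_states(df, var="confirmed_state", n=5):
    """Return the n states with the largest `var` on the latest date in df."""
    latest = df["date"].max()
    latest_rows = df[df["date"] == latest]
    return list(latest_rows.nlargest(n, var)["province_state"])


# Made-up counts for three states over two days.
toy = pd.DataFrame({
    "date": ["2020-03-10"] * 3 + ["2020-03-11"] * 3,
    "province_state": ["Alaska", "Texas", "Ohio"] * 2,
    "confirmed_state": [1, 20, 5, 3, 40, 60],
})
print(top_n_states(toy, n=2))
```

Because only the final date decides the ranking, a state that led early in the period but was overtaken later will not appear in the plot.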

In [19]:
def draw_top_n_hardest_hit_state(df, var='confirmed_state', n=5):
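    '''
    Draw the time trend of the top n hardest-hit states by deaths/confirmed.
    df (DataFrame): state-level COVID data
    var (str): the variable to be drawn ('deaths_state' or 'confirmed_state')
    n (int): number of states to draw
    '''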
    latest_date = sorted(df.date)[-1]
    top_n_state = list(df[df['date'] == latest_date]
         .sort_values(by=var,ascending=False)
         .province_state[:n])

    # create a data frame containing the time trend of the top n states
    top_state_time_trend_df = df[df.province_state.isin(top_n_state)]

    # clean the name
    if var == 'confirmed_state':
        col_name = 'Confirmed'
    else:
        col_name = 'Deaths'
    
    sns.set(rc={'figure.figsize':(20,15)})
    
    # draw the relplot
    sns.relplot(x='date', y=var, kind='line',
                data=top_state_time_trend_df,
                hue='province_state', palette="rocket")

    plt.xticks(rotation=90)

    if len(top_state_time_trend_df) > 10:
        plt.xticks(ticks=np.arange(0, len(top_state_time_trend_df)/n,
                step=len(top_state_time_trend_df)/n//10),
                 rotation=90)

    plt.xlabel('Date')
    plt.ylabel('Number')  
    plt.title(label=f"Top {n} States by {col_name} Number") 

    plt.show()

In [36]:
@interact(begin = "2020-03-10", end = "2021-02-21", n = (1, 10))
def plot_top_hitted_state(begin, end, n):
    df = get_df_within_period(covid_state_df, begin, end)
    draw_top_n_hardest_hit_state(df, "confirmed_state", n)
    draw_top_n_hardest_hit_state(df, "deaths_state", n)



-------------

### Word clouds based on CDC tweets of chosen season

Users can enter the `season` they are interested in and get the word cloud we made from the CDC tweets of that season. Word clouds aid visualization of text data by effectively displaying `frequently mentioned words`. Users may be able to identify some interesting patterns from our work. Generating a word cloud may take 1-2 seconds.

Inputs:
- season format: `year season`. You can choose from five seasons: "21 spring", "20 spring", "20 summer", "20 fall", "20 winter".
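
A word cloud draws more frequent words larger, so the underlying quantity is a simple token count. The sketch below shows that count with `collections.Counter`. It assumes each entry of the `normalized_tokens` column is a list of tokens for one tweet, and the token lists here are made up.

```python
from collections import Counter
from itertools import chain

# Made-up token lists, one per tweet; assumed shape of `normalized_tokens`.
token_lists = [
    ["vaccine", "safe", "effective"],
    ["wear", "mask", "vaccine"],
    ["vaccine", "mask"],
]

# Flatten the per-tweet lists and count each token.
counts = Counter(chain.from_iterable(token_lists))
print(counts.most_common(2))
```

Inspecting the top counts this way is a quick check on what the largest words in a cloud should be.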


In [21]:
#### Please make sure to run load_cdc_covid_tweets.ipynb before the following ####

%store -r tweets01
%store -r tweets02
%store -r tweets03
%store -r tweets04
%store -r tweets0121

season_dct = {"20 spring":tweets01, "20 summer": tweets02,
              "20 fall":  tweets03, "20 winter":tweets04,
              "21 spring": tweets0121}

In [34]:
def grey_color_func(word, font_size, position, orientation, random_state=None, **kwargs):
    # Hue 360 at full saturation gives shades of red, despite the function name
    return "hsl(360,100%%, %d%%)" % np.random.randint(40, 100)

@interact(season = "21 spring")
def plot_word_cloud(season):
    tweets = season_dct[season]
    text = ' '.join(tweets['normalized_tokens'].sum())
    wc = wordcloud.WordCloud(width=1600, height=800,).generate(text)

    wc.recolor(color_func = grey_color_func)
    plt.figure(figsize=(18,12), facecolor='k')
    plt.imshow(wc)
    plt.axis("off")
    plt.tight_layout(pad=0)
    plt.show()


--------------------

   `Thanks for watching!`
    
 We are:
- Jinfei ZHU: Data Collection and COVID Data Analysis
- Xi CHENG: CDC Tweets Word and Phrase Frequency Analysis
- Boya FU: Record Linkage
- Yile CHEN: Data Visualization and Interactive Dashboard