This program draws a "heat map" (strictly, a choropleth) of COVID-19 cases from the Covid-19 dataset, which records
case counts over time.
It relies on a pre-made country map JSON to build the Folium overlay, and on a "better_names" CSV which I wrote so that the
code wouldn't get too cluttered.

To Do:
Fix South Korea
Look into logarithmic scaling for Folium
Look into making the map interactive

In [None]:
# Import libraries
import pandas as pd
import pycountry as pc
import folium as fol
from urllib.request import urlopen
from json import load


# Links to Data Files on GitHub
COVID_DATA_URL = "https://raw.githubusercontent.com/WBArno/PDA_Project/master/Dat/covid_19_data.csv"
BETTER_NAMES_URL = "https://raw.githubusercontent.com/WBArno/PDA_Project/master/Dat/better_names.csv"
COUNTRIES_URL = "https://raw.githubusercontent.com/WBArno/PDA_Project/master/Dat/countries.json"


# Loads Data Files (LOCAL)
df = pd.read_csv("../Dat/covid_19_data.csv")
bn = pd.read_csv("../Dat/better_names.csv")
with open("../Dat/countries.json") as f:  # Closes the file once it has been read
    ct = load(f)

# Loads CSV files (CoLab)
# df = pd.read_csv(COVID_DATA_URL)
# bn = pd.read_csv(BETTER_NAMES_URL)
# ct = load(urlopen(COUNTRIES_URL))

In [None]:
# Changes the poorly named countries into ones that PyCountry can recognize; uses better_names.csv
# "original" is treated as a regex pattern; a "new" value of "nil" means the match is simply removed.
def sanitize_csv(original, new):
    replacement = "" if new == "nil" else new
    df["Country"] = df["Country"].str.replace(original, replacement, regex=True)

I've put all of the requested project tasks in this cell, as I couldn't fit some of them into the main program.

In [None]:
# Project Tasks
pt = df.copy() # Creating a new dataframe for this section (a real copy, so df itself isn't modified)

pt["Location"] = pt["Province/State"] + ", " + pt["Country/Region"]
pt = pt.groupby(["SNo", "Location", "Confirmed", "Recovered", "Deaths"], as_index=False).agg({"Recovered":"sum"})
pt.set_index(["SNo"], inplace=True) # Sets the index to SNo, because why not
pt = pt[pt.Recovered > pt.Recovered.mean()]
print(pt.head())

pt.to_csv("test_output.csv")

In [None]:
# Run
# Prepares the table for use by dropping unneeded columns and renaming an annoying one.
df.drop(["SNo", "ObservationDate", "Recovered", "Last Update", "Deaths"], axis=1, inplace=True)
df.rename(columns = {"Country/Region": "Country"}, inplace=True)

In [None]:
# "Sanitizes" the country names so that PyCountry will recognize them, then collapses them all together.
for row in bn.itertuples():
    sanitize_csv(row[1], row[2])  # row[0] is the index; row[1] and row[2] are the original and new names

# Groups by state and takes the last entry for each one. The entries are cumulative, so (assuming the rows are in date order) the last entry is the latest total.
df = df.groupby(["Country", "Province/State"], as_index=False, dropna=False).aggregate({"Confirmed":"last"})
# Groups the table again by country, finding the sum of all of the states.
df = df.groupby(["Country"], as_index=False, dropna=False).aggregate({"Confirmed":"sum"})

# Uses PyCountry to get the three-letter ISO code (alpha_3) for each country so that Folium can match it to the map.
# search_fuzzy raises a LookupError if a name still isn't recognized after sanitizing.
df["Country"] = df["Country"].map(lambda name: pc.countries.search_fuzzy(name)[0].alpha_3)
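
The two grouping steps above are easier to follow on a tiny table. This is only an illustration: the toy data below is made up, and it assumes the rows are already in date order, as the real file appears to be.

```python
import pandas as pd

# Hypothetical toy data: cumulative counts, already sorted by date.
toy = pd.DataFrame({
    "Country": ["A", "A", "A", "A", "B", "B"],
    "Province/State": ["X", "X", "Y", "Y", None, None],
    "Confirmed": [1, 5, 2, 3, 10, 20],
})

# Step 1: keep the most recent cumulative count for each (country, state) pair.
# dropna=False keeps the countries that have no state listed.
latest = toy.groupby(["Country", "Province/State"], as_index=False, dropna=False).agg({"Confirmed": "last"})

# Step 2: add the states together to get one total per country.
totals = latest.groupby("Country", as_index=False).agg({"Confirmed": "sum"})
print(totals)
```

Summing the raw rows directly would count the same cases many times over, because every row already includes the earlier ones.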


I'd like to tone down the outliers or increase the variance in the colors so that the majority of the map isn't one color.
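
One possible way to tone down the outliers, as a rough sketch rather than a settled approach: color the map by log10(1 + count) instead of the raw count. The toy table and the `Confirmed_log` column name are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical per-country totals standing in for the real table.
toy = pd.DataFrame({
    "Country": ["AAA", "BBB", "CCC"],
    "Confirmed": [0, 999, 999_999],
})

# log10(1 + x) compresses the large values while keeping a zero count at zero.
toy["Confirmed_log"] = np.log10(toy["Confirmed"] + 1)
print(toy)
```

With the real table, the new column could be handed to the choropleth through `columns=["Country", "Confirmed_log"]`. The legend would then show log values rather than case counts, so `legend_name` should say so.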

In addition, I want to see if I can get count labels for each country.
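
A possible starting point for the count labels, sketched under assumptions: copy each country's count into the matching GeoJSON feature so that a tooltip can read it later. The `attach_counts` helper, the toy GeoJSON, and the `Confirmed` property name are all hypothetical, and the features are assumed to be keyed by a three-letter `id`, as `key_on="feature.id"` suggests.

```python
def attach_counts(geojson, counts, field="Confirmed"):
    """Copy each count into the properties of the feature with the same id."""
    for feature in geojson["features"]:
        props = feature.setdefault("properties", {})
        # Features with no matching count get None, so the field always exists.
        props[field] = counts.get(feature.get("id"))
    return geojson


# Hypothetical two-country GeoJSON and counts.
toy_geo = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "id": "AAA", "properties": {"name": "Aland"}, "geometry": None},
        {"type": "Feature", "id": "BBB", "properties": {"name": "Bland"}, "geometry": None},
    ],
}
toy_counts = {"AAA": 42}

attach_counts(toy_geo, toy_counts)
print(toy_geo["features"][0]["properties"])
```

With the real data, the counts could be built with `dict(zip(df["Country"], df["Confirmed"].tolist()))` and applied to `ct` before the map is drawn. A `fol.GeoJson` layer built from `ct` could then carry a `fol.GeoJsonTooltip(fields=["Confirmed"])` to show the number on hover. This is untested against countries.json.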

In [None]:
# Creates the Folium map
outbreak_map = fol.Map(location=[0, 0], zoom_start=0)
fol.Choropleth(
    name="COVID Cases",
    geo_data=ct, # Polygonal data to draw the country map.
    data=df, # COVID case data
    columns=["Country", "Confirmed"], # Column to match with the key, count-based column.
    key_on="feature.id", # Establishes the key of the country JSON.
    fill_color="YlOrRd", # Color scheme
    fill_opacity=0.75,
    line_opacity=0.25,
    nan_fill_opacity=0,
    legend_name="Confirmed Cases",
).add_to(outbreak_map)
fol.LayerControl().add_to(outbreak_map)


# Displays the map
outbreak_map