### 💉 COVID-19 Cases 🦠 Visualization using Bar Chart Race

### Final result of this notebook:
A video that displays the 10 countries with the most confirmed COVID-19 cases in the world over the period January 2020 to May 2021.

![](https://github.com/MhmdSyd/Bar_Chart_Race_Gif/blob/main/COVID_Full.gif?raw=true)

### This notebook is divided into 2 main parts:

> EDA

> Visualization

In [1]:
```python
# This Python 3 environment comes with many helpful analytics libraries installed.
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here are several helpful packages to load:

import numpy as np  # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)

import matplotlib.pyplot as plt

import seaborn as sns

from datetime import datetime

# Video displays the rendered bar chart race MP4 inside the notebook.
from IPython.display import Video

# Input data files are available in the read-only "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory.

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All".
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session.
```

```text
/kaggle/input/novel-corona-virus-2019-dataset/time_series_covid_19_deaths_US.csv
/kaggle/input/novel-corona-virus-2019-dataset/time_series_covid_19_recovered.csv
/kaggle/input/novel-corona-virus-2019-dataset/time_series_covid_19_confirmed_US.csv
/kaggle/input/novel-corona-virus-2019-dataset/covid_19_data.csv
/kaggle/input/novel-corona-virus-2019-dataset/time_series_covid_19_confirmed.csv
/kaggle/input/novel-corona-virus-2019-dataset/time_series_covid_19_deaths.csv
```


In [2]:
```python
import warnings
warnings.filterwarnings("ignore")
```

### Install bar_chart_race and ffmpeg, which are needed for the visualization

In [3]:
```python
! pip install bar_chart_race
```

```text
Collecting bar_chart_race
  Downloading bar_chart_race-0.1.0-py3-none-any.whl (156 kB)
Installing collected packages: bar-chart-race
Successfully installed bar-chart-race-0.1.0
```


In [4]:
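```python
# Note: the PyPI "ffmpeg" package does not provide the ffmpeg executable that is needed to write MP4 files.
# ! pip install ffmpeg
```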

In [5]:
```python
# ! conda install -c conda-forge ffmpeg
```

In [6]:
```python
# Import the bar_chart_race package used for the visualization.
import bar_chart_race as bcr
```

## EDA

In [7]:
```python
# Read the dataset with pandas and display the first 5 rows.
covid_df = pd.read_csv("../input/novel-corona-virus-2019-dataset/covid_19_data.csv", index_col="SNo")
covid_df.head()
```

| SNo | ObservationDate | Province/State | Country/Region | Last Update | Confirmed | Deaths | Recovered |
|---|---|---|---|---|---|---|---|
| 1 | 01/22/2020 | Anhui | Mainland China | 1/22/2020 17:00 | 1.0 | 0.0 | 0.0 |
| 2 | 01/22/2020 | Beijing | Mainland China | 1/22/2020 17:00 | 14.0 | 0.0 | 0.0 |
| 3 | 01/22/2020 | Chongqing | Mainland China | 1/22/2020 17:00 | 6.0 | 0.0 | 0.0 |
| 4 | 01/22/2020 | Fujian | Mainland China | 1/22/2020 17:00 | 1.0 | 0.0 | 0.0 |
| 5 | 01/22/2020 | Gansu | Mainland China | 1/22/2020 17:00 | 0.0 | 0.0 | 0.0 |
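The rows above show `ObservationDate` in `MM/DD/YYYY` text form, which is why the notebook converts it with `pd.to_datetime` later. A minimal sketch of parsing it at load time instead with `parse_dates`; `sample_csv` is a hypothetical stand-in holding only the first three rows shown above, so the example runs without the Kaggle input.

```python
import io

import pandas as pd

# Hypothetical stand-in for covid_19_data.csv: the first three rows shown above.
sample_csv = """SNo,ObservationDate,Province/State,Country/Region,Last Update,Confirmed,Deaths,Recovered
1,01/22/2020,Anhui,Mainland China,1/22/2020 17:00,1.0,0.0,0.0
2,01/22/2020,Beijing,Mainland China,1/22/2020 17:00,14.0,0.0,0.0
3,01/22/2020,Chongqing,Mainland China,1/22/2020 17:00,6.0,0.0,0.0
"""

# parse_dates converts the listed column to datetime64 while the file is read.
sample_df = pd.read_csv(
    io.StringIO(sample_csv),
    index_col="SNo",
    parse_dates=["ObservationDate"],
)
print(sample_df.dtypes)
```

With the real file, the same `parse_dates` argument would be passed to the `pd.read_csv` call above.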


In [8]:
```python
# The number of rows per date varies, which shows that not every country/province has data for every date.
covid_df.ObservationDate.value_counts()
```

```text
02/07/2021    765
03/23/2021    765
02/09/2021    765
03/17/2021    765
05/15/2021    765
             ... 
01/26/2020     49
01/23/2020     48
01/25/2020     46
01/24/2020     43
01/22/2020     40
Name: ObservationDate, Length: 494, dtype: int64
```
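`value_counts` counts rows, and one country can contribute several rows per date (one per province/state, as with Mainland China in the first rows). A sketch that counts distinct countries per date as well; `toy` is a hypothetical miniature of `covid_df` with the same column names and illustrative values.

```python
import pandas as pd

# Hypothetical miniature of covid_df: same column names, illustrative values.
toy = pd.DataFrame({
    "ObservationDate": ["01/22/2020", "01/22/2020", "01/23/2020", "01/23/2020", "01/23/2020"],
    "Province/State": ["Anhui", "Beijing", "Anhui", "Beijing", None],
    "Country/Region": ["Mainland China", "Mainland China", "Mainland China", "Mainland China", "Japan"],
    "Confirmed": [1.0, 14.0, 9.0, 22.0, 1.0],
})

# Rows per date (what value_counts reports) vs. distinct countries per date.
rows_per_date = toy["ObservationDate"].value_counts().sort_index()
countries_per_date = toy.groupby("ObservationDate")["Country/Region"].nunique()

print(rows_per_date)
print(countries_per_date)
```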

In [9]:
```python
# Group by country and date and sum the province-level counts, so each
# (country, date) pair has a single row; store the result in a new DataFrame.
data = covid_df.groupby(by=["Country/Region", "ObservationDate"]).agg({'Confirmed': ['sum'],
                                                                       'Deaths': ['sum'],
                                                                       'Recovered': ['sum']})
# Move the group keys from the index back into regular columns.
data = data.reset_index()
# Convert ObservationDate to datetime.
data.ObservationDate = pd.to_datetime(data.ObservationDate)
# Sort rows by ObservationDate.
data = data.sort_values("ObservationDate")
# agg() with lists creates two-level column labels; replace them with flat names.
data.columns = ["Country/Region", "ObservationDate", "Confirmed", "Deaths", "Recovered"]
```
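Passing `agg` a dict of lists produces two-level column labels, which is why the cell above renames the columns afterwards. A sketch of an equivalent, flatter formulation that selects the three value columns and calls `.sum()` with `as_index=False`; `toy` is a hypothetical stand-in with illustrative values.

```python
import pandas as pd

# Hypothetical stand-in for covid_df with illustrative values.
toy = pd.DataFrame({
    "ObservationDate": ["01/22/2020", "01/22/2020", "01/23/2020"],
    "Country/Region": ["Mainland China", "Mainland China", "Mainland China"],
    "Confirmed": [1.0, 14.0, 31.0],
    "Deaths": [0.0, 0.0, 1.0],
    "Recovered": [0.0, 0.0, 0.0],
})

# One row per (country, date); the value columns keep their own flat names.
data = (
    toy.groupby(["Country/Region", "ObservationDate"], as_index=False)[["Confirmed", "Deaths", "Recovered"]]
    .sum()
)
data["ObservationDate"] = pd.to_datetime(data["ObservationDate"], format="%m/%d/%Y")
data = data.sort_values("ObservationDate")
print(data)
```

No column renaming step is needed because `as_index=False` keeps the group keys as ordinary columns.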

In [10]:
```python
# Select the date and confirmed-case columns for Egypt on all days.
data.loc[data["Country/Region"] == "Egypt", ["ObservationDate", "Confirmed"]]
```

| | ObservationDate | Confirmed |
|---|---|---|
| 23003 | 2020-02-14 | 1.0 |
| 23005 | 2020-02-15 | 1.0 |
| 23007 | 2020-02-16 | 1.0 |
| 23009 | 2020-02-17 | 1.0 |
| 23011 | 2020-02-18 | 1.0 |
| ... | ... | ... |
| 23205 | 2021-05-25 | 256124.0 |
| 23207 | 2021-05-26 | 257275.0 |
| 23209 | 2021-05-27 | 258407.0 |
| 23211 | 2021-05-28 | 259540.0 |
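The Egypt rows show that `Confirmed` is a cumulative count (it rises from 1.0 to 259540.0). A sketch of a sanity check, assuming that property should hold for every country, which lists countries whose series ever decreases; this matters later because `fillna(0)` would also turn a gap in the middle of a series into a drop to zero. `toy_data` is hypothetical: the Egypt values come from the rows above, "Country X" is made up.

```python
import pandas as pd

# Hypothetical long-format data shaped like `data`.
toy_data = pd.DataFrame({
    "Country/Region": ["Egypt", "Egypt", "Egypt", "Country X", "Country X", "Country X"],
    "ObservationDate": pd.to_datetime(["2020-02-14", "2020-02-15", "2020-02-16"] * 2),
    "Confirmed": [1.0, 1.0, 1.0, 5.0, 3.0, 8.0],
})

# True for countries whose cumulative count never goes down (equal steps are allowed).
is_non_decreasing = (
    toy_data.sort_values("ObservationDate")
    .groupby("Country/Region")["Confirmed"]
    .apply(lambda s: s.is_monotonic_increasing)
    .astype(bool)
)
decreasing = is_non_decreasing[~is_non_decreasing].index.tolist()
print(decreasing)
```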


In [11]:
```python
# Build a wide DataFrame: one row per ObservationDate, one column per country.
df_covid = pd.DataFrame(data.ObservationDate.unique())

# Name the single date column.
df_covid.columns = ["ObservationDate"]

# For each country, take its (date, confirmed) pairs and merge them into df_covid as a new column.
for country in data["Country/Region"].unique():
    # Keep only the date and confirmed-case columns for this country.
    test_data = data.loc[data["Country/Region"] == country, ["ObservationDate", "Confirmed"]]
    # Rename the case column to the country name.
    test_data.columns = ["ObservationDate", country]
    # Left merge on ObservationDate so that every date in df_covid is kept.
    df_covid = df_covid.merge(test_data, how="left", on="ObservationDate")

# Dates with no data for a country become NaN after the merge; replace them with 0.
df_covid = df_covid.fillna(0)

# Use the date column as the index.
df_covid.set_index("ObservationDate", inplace=True)

# Make sure the index has a datetime dtype.
df_covid.index = pd.to_datetime(df_covid.index)

# The data contains an "Others" column that is not a country, so drop it.
df_covid.drop("Others", axis=1, inplace=True)

# Display the last 5 rows.
df_covid.tail()
```

| ObservationDate | Thailand | China | Kiribati | Taiwan | US | South Korea | Japan | Mainland China | Macau | Hong Kong | ... | South Sudan | Sao Tome and Principe | Yemen | Tajikistan | Comoros | Lesotho | Solomon Islands | Marshall Islands | Vanuatu | Micronesia |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2021-05-25 | 135439.0 | 0.0 | 2.0 | 5456.0 | 33166418.0 | 137682.0 | 726586.0 | 91019.0 | 51.0 | 11835.0 | ... | 10677.0 | 2338.0 | 6670.0 | 13308.0 | 3875.0 | 10822.0 | 20.0 | 4.0 | 4.0 | 1.0 |
| 2021-05-26 | 137894.0 | 0.0 | 2.0 | 6091.0 | 33190470.0 | 138311.0 | 731071.0 | 91038.0 | 51.0 | 11836.0 | ... | 10677.0 | 2338.0 | 6688.0 | 13308.0 | 3879.0 | 10822.0 | 20.0 | 4.0 | 4.0 | 1.0 |
| 2021-05-27 | 143280.0 | 0.0 | 2.0 | 6761.0 | 33217995.0 | 138898.0 | 735234.0 | 91045.0 | 51.0 | 11836.0 | ... | 10688.0 | 2338.0 | 6696.0 | 13308.0 | 3879.0 | 10824.0 | 20.0 | 4.0 | 4.0 | 1.0 |
| 2021-05-28 | 147039.0 | 0.0 | 2.0 | 7315.0 | 33239963.0 | 139431.0 | 738935.0 | 91061.0 | 51.0 | 11836.0 | ... | 10688.0 | 2344.0 | 6723.0 | 13308.0 | 3879.0 | 10825.0 | 20.0 | 4.0 | 4.0 | 1.0 |
| 2021-05-29 | 151842.0 | 0.0 | 2.0 | 7806.0 | 33251939.0 | 139910.0 | 742539.0 | 91072.0 | 51.0 | 11837.0 | ... | 10688.0 | 2345.0 | 6731.0 | 13308.0 | 3881.0 | 10825.0 | 20.0 | 4.0 | 4.0 | 1.0 |
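The merge loop above builds the wide table one country at a time. A sketch of the same reshaping with a single `DataFrame.pivot` call, assuming each (country, date) pair appears only once in `data`, which the earlier `groupby` ensures. `pivot` sorts the country columns alphabetically, so `reindex(columns=...)` restores the first-appearance order that the loop produces. `toy_data` is hypothetical, with a subset of the columns of `data`.

```python
import pandas as pd

# Hypothetical long-format data, already sorted by date like `data`.
toy_data = pd.DataFrame({
    "Country/Region": ["Thailand", "Japan", "Thailand", "Others", "Japan"],
    "ObservationDate": pd.to_datetime(
        ["2020-01-22", "2020-01-23", "2020-01-23", "2020-01-23", "2020-01-24"]
    ),
    "Confirmed": [2.0, 1.0, 3.0, 10.0, 2.0],
})

wide = (
    toy_data.pivot(index="ObservationDate", columns="Country/Region", values="Confirmed")
    # Keep the first-appearance column order used by the merge loop.
    .reindex(columns=toy_data["Country/Region"].unique())
    # Missing (date, country) cells become 0, as with fillna(0) above.
    .fillna(0)
    # "Others" is not a country.
    .drop(columns="Others")
)
print(wide)
```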


## Visualization

In [12]:
```python
# Summary text for each period: the world total of confirmed cases on that date
# (the sum over all countries), rounded to the nearest hundred.
def summary(values, ranks):
    total_cases = int(round(values.sum(), -2))
    s = f'Total Cases - {total_cases:,.0f}'
    return {'x': .99, 'y': .05, 's': s, 'ha': 'right',
            'size': 10, 'color': '#733f6e'}
```
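According to the bar_chart_race documentation, `period_summary_func` receives the current period's values and ranks and must return a dict with at least `x`, `y`, and `s`, which is passed on to Matplotlib's text call. Because `df_covid` holds cumulative counts, the sum is the world total of confirmed cases to date, not new cases on that day. A quick check of the function above on hypothetical values for one period:

```python
import pandas as pd

def summary(values, ranks):
    # Sum the current period's values across countries, rounded to the nearest hundred.
    total_cases = int(round(values.sum(), -2))
    s = f"Total Cases - {total_cases:,.0f}"
    return {"x": .99, "y": .05, "s": s, "ha": "right", "size": 10, "color": "#733f6e"}

# Hypothetical values for one period (one cumulative count per country).
values = pd.Series({"US": 1234567.0, "Egypt": 89012.0})
ranks = values.rank(ascending=False)
label = summary(values, ranks)
print(label["s"])  # Total Cases - 1,323,600
```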

In [21]:
```python
# Render a bar chart race of `data` (wide format: dates as index, countries as columns)
# and save it to `file_name` (GIF or MP4, chosen by the extension).
def create_sub_bar_chart_race(data, file_name):
    # Note: Matplotlib 3.6 deprecated the "seaborn" style name in favor of "seaborn-v0_8".
    plt.style.use("seaborn")
    fig, ax = plt.subplots(figsize=(10, 7), dpi=120)
    ax.set_facecolor("#f2f0f0")
    ax.set_title('COVID-19 Cases Race by Country',
                 fontdict={'family': 'Helvetica', 'size': 20, 'color': '#148585'})

    _ = bcr.bar_chart_race(df=data,
            filename=file_name,
            n_bars=10, fig=fig,
            orientation='h',
            fixed_order=False,
            bar_size=.85,
            shared_fontdict={'family': 'Helvetica', 'weight': 'normal', 'color': '#213030'},
            period_label={'x': .97, 'y': .15, 'ha': 'right', 'va': 'center',
                          'color': "#b01296", "size": 14, "weight": "semibold"},
            period_fmt='%b %d, %Y',
            figsize=(10, 7),
            dpi=120,
            period_summary_func=summary,
            # cmap='Paired',
            bar_label_size=8,
            tick_label_size=5,
            steps_per_period=20,
            period_length=400,
            interpolate_period=True,
            filter_column_colors=True,
            bar_kwargs={'alpha': .8, "lw": 0})
    plt.close()
```
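With `steps_per_period=20`, the bar_chart_race documentation says bars grow linearly from one period to the next over 20 steps, and `interpolate_period=True` also interpolates the date label between periods. The sketch below approximates that idea with pandas linear interpolation plus a per-frame ranking; it is an illustration under those assumptions, not the library's internal code, and `expand_periods` is a hypothetical helper.

```python
import pandas as pd

def expand_periods(df, steps_per_period):
    """Hypothetical helper: split each period (row) into `steps_per_period` frames.

    Values grow linearly between consecutive rows. Date labels are spread evenly
    between the first and last date, which assumes equally spaced rows (daily here).
    """
    n = len(df)
    out = df.reset_index(drop=True)
    # Place the original rows at positions 0, steps, 2 * steps, ...
    out.index = out.index * steps_per_period
    # Insert empty frames in between and fill them by linear interpolation.
    out = out.reindex(range((n - 1) * steps_per_period + 1)).interpolate(method="linear")
    out.index = pd.date_range(df.index[0], df.index[-1], periods=len(out))
    return out

toy = pd.DataFrame(
    {"US": [0.0, 100.0], "Egypt": [50.0, 50.0]},
    index=pd.to_datetime(["2020-03-29", "2020-03-30"]),
)
frames = expand_periods(toy, steps_per_period=4)
# Rank 1 = longest bar in each frame; ties are broken by column order (method="first").
ranks = frames.rank(axis=1, ascending=False, method="first")
print(frames)
print(ranks)
```

Ranking every interpolated frame is what lets bars swap places smoothly; with `n_bars=10`, only the ten columns with rank 10 or better would be drawn in each frame.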

In [22]:
```python
# Render a GIF of the bar chart race for a subset of the data (rows 50 to 79, i.e. 30 dates).
create_sub_bar_chart_race(df_covid.iloc[50:80], "/kaggle/working/COVID_Sub.gif")
```
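`iloc[50:80]` selects 30 rows by position. Because `df_covid` has a DatetimeIndex, the subset can also be chosen by date labels, which makes the covered period explicit. A sketch on a hypothetical daily frame that, like the real data, starts on 2020-01-22 with one row per day; under that assumption rows 50 to 79 are 2020-03-12 through 2020-04-10. Label slices include both endpoints.

```python
import pandas as pd

# Hypothetical daily frame with the same start date as df_covid.
dates = pd.date_range("2020-01-22", periods=100, freq="D")
frame = pd.DataFrame({"US": range(100)}, index=dates, dtype=float)

by_position = frame.iloc[50:80]
# Partial-string slicing on a DatetimeIndex; both endpoints are included.
by_label = frame.loc["2020-03-12":"2020-04-10"]

print(by_position.index[0].date(), by_position.index[-1].date())
```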

### Bar chart race GIF for the data subset:
![](./COVID_Sub.gif "COVID.gif")

In [15]:
```python
# Print the start time of rendering the full MP4 video (written with ffmpeg).
current_time = datetime.now().strftime("%H:%M:%S")
print("Start Time of Processing =", current_time)
```

```text
Start Time of Processing = 00:28:52
```


In [16]:
```python
# Render an MP4 video of the bar chart race for the full data.
create_sub_bar_chart_race(df_covid, "/kaggle/working/COVID_Full.mp4")
```
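A back-of-the-envelope estimate of the video length, assuming each row of `df_covid` is animated for `period_length` milliseconds (how the bar_chart_race documentation describes that parameter) and that `df_covid` has 494 rows, the number of distinct observation dates reported by `value_counts` earlier. The per-frame interval of `period_length / steps_per_period` is also an assumption about how the library spaces frames; exact frame counts are not claimed here.

```python
def animation_duration_s(n_periods, period_length_ms):
    # Approximation: each period (row) is shown for period_length milliseconds.
    return n_periods * period_length_ms / 1000

def frame_interval_ms(period_length_ms, steps_per_period):
    # Assumption: each period is split into steps_per_period frames of equal length.
    return period_length_ms / steps_per_period

full = animation_duration_s(494, 400)   # all 494 observation dates
subset = animation_duration_s(30, 400)  # df_covid.iloc[50:80]
interval = frame_interval_ms(400, 20)
print(f"full: about {full:.1f} s, subset: about {subset:.1f} s, {interval:.0f} ms per frame")
```

Under these assumptions the full video runs a little over three minutes, while the 30-row GIF lasts about 12 seconds.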

In [17]:
```python
# Print the end time of processing.
current_time = datetime.now().strftime("%H:%M:%S")
print("End Time of Processing =", current_time)
```

```text
End Time of Processing = 00:29:22
```
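The start and end cells print wall-clock times and leave the subtraction to the reader (here 00:28:52 to 00:29:22, i.e. 30 seconds). A sketch of measuring the elapsed time directly with `time.perf_counter`; `timed` is a hypothetical helper, and in the notebook the body of the `with` block would be the `create_sub_bar_chart_race(...)` call.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label):
    # Hypothetical helper: report the elapsed time of the enclosed block.
    start = time.perf_counter()
    result = {}
    try:
        yield result
    finally:
        result["seconds"] = time.perf_counter() - start
        print(f"{label}: {result['seconds']:.1f} s")

with timed("render") as t:
    # Stand-in for create_sub_bar_chart_race(df_covid, "/kaggle/working/COVID_Full.mp4").
    sum(range(100_000))
```

`perf_counter` is a monotonic clock, so the result is not affected by system clock adjustments during a long render.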


In [18]:
```python
Video("./COVID_Full.mp4", width=600)
```