# Analysis of the COVID-19 Pandemic

The growing popularity of data analysis has been reinforced by the very unfortunate situation surrounding COVID-19, which has sparked even more interest in the field. Over the past two months, governments and individuals around the world have been collecting data on COVID-19 and building models that can help predict the effect of the virus on our lives and our economy, and help us understand how to save lives and fight the crisis.

In this article, we want to give the general public the means to examine how the situation is evolving through interactive visualizations. The data source we use can be freely downloaded from [Kaggle](kaggle.com). It covers:

- Infections
- Deaths
- Recoveries

The analysis tools used here are centered on the Python language.
Our data and visualizations will be published on [dstack](dstack.ai), which is why we import its Python module along with the others.

In [1]:
import pandas as pd
import numpy as np
import plotly.express as px
from dstack import create_frame

## Let's import data from [GitHub](http://github.com)

In [2]:
import pandas as pd
import plotly.express as px
from dstack import create_frame

confirmed_cases = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv')
deaths = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv')
recoveries = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv')

confirmed_cases

Unnamed: 0,Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,...,4/24/20,4/25/20,4/26/20,4/27/20,4/28/20,4/29/20,4/30/20,5/1/20,5/2/20,5/3/20
0,,Afghanistan,33.000000,65.000000,0,0,0,0,0,0,...,1351,1463,1531,1703,1828,1939,2171,2335,2469,2704
1,,Albania,41.153300,20.168300,0,0,0,0,0,0,...,678,712,726,736,750,766,773,782,789,795
2,,Algeria,28.033900,1.659600,0,0,0,0,0,0,...,3127,3256,3382,3517,3649,3848,4006,4154,4295,4474
3,,Andorra,42.506300,1.521800,0,0,0,0,0,0,...,731,738,738,743,743,743,745,745,747,748
4,,Angola,-11.202700,17.873900,0,0,0,0,0,0,...,25,25,26,27,27,27,27,30,35,35
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
261,,Western Sahara,24.215500,-12.885800,0,0,0,0,0,0,...,6,6,6,6,6,6,6,6,6,6
262,,Sao Tome and Principe,0.186360,6.613081,0,0,0,0,0,0,...,4,4,4,4,8,8,14,16,16,16
263,,Yemen,15.552727,48.516388,0,0,0,0,0,0,...,1,1,1,1,1,6,6,7,10,10
264,,Comoros,-11.645500,43.333300,0,0,0,0,0,0,...,0,0,0,0,0,0,1,1,3,3


## Let's look at how the situation evolved over the last three days
One reason the data for the last three days is interesting is that it makes it easy to calculate the proportion of new cases. The code below selects the data for the last three days and adds two new columns: the absolute increase in confirmed cases between the first and the last of those days, and that increase as a percentage of the count on the first day.

In [3]:
# Keep the country column plus the three most recent date columns
# (a deliberately simple selection that ignores the day of the week, etc.)
cols = [confirmed_cases.columns[1]] + list(confirmed_cases.columns[-3:])
# Keep only the rows that have no Province/State value, i.e. country-level rows
last_three_days = confirmed_cases[confirmed_cases["Province/State"].isnull()][cols].copy()
last_three_days

Unnamed: 0,Country/Region,5/1/20,5/2/20,5/3/20
0,Afghanistan,2335,2469,2704
1,Albania,782,789,795
2,Algeria,4154,4295,4474
3,Andorra,745,747,748
4,Angola,30,35,35
...,...,...,...,...
261,Western Sahara,6,6,6
262,Sao Tome and Principe,16,16,16
263,Yemen,7,10,10
264,Comoros,1,3,3
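
One consequence of the `Province/State` filter above is that any country reported only as a set of province rows is dropped from `last_three_days`. A minimal sketch of an alternative, summing province rows per country, is shown below. The toy frame, its country names, and its counts are made up for illustration and are not taken from the CSSE data.

```python
import pandas as pd

# Toy stand-in for the confirmed-cases table (all values are illustrative):
# "Country A" is reported as two province rows, "Country B" as one country-level row.
toy = pd.DataFrame({
    "Province/State": ["Province 1", "Province 2", None],
    "Country/Region": ["Country A", "Country A", "Country B"],
    "5/1/20": [100, 10, 50],
    "5/2/20": [110, 12, 55],
    "5/3/20": [120, 10, 60],
})

date_cols = list(toy.columns[-3:])

# The filter used in the notebook keeps only rows with no province value
country_level_only = toy[toy["Province/State"].isnull()]

# Alternative: sum every row belonging to the same country
by_country = toy.groupby("Country/Region", as_index=False)[date_cols].sum()

print(country_level_only["Country/Region"].tolist())
print(by_country)
```

Whether summing is appropriate depends on how the source splits its rows, so treat this as an option to check against the data rather than a drop-in replacement.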


In [4]:
import numpy as np
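d1 = last_three_days.columns[-1]  # most recent day
d3 = last_three_days.columns[-3]  # earliest of the three days

# absolute increase between the earliest and the most recent day
last_three_days["New Cases"] = last_three_days[d1] - last_three_days[d3]
# the same increase as a percentage of the count on the earliest day
last_three_days["New Cases (%)"] = (last_three_days["New Cases"] / last_three_days[d3])*100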
last_three_days

Unnamed: 0,Country/Region,5/1/20,5/2/20,5/3/20,New Cases,New Cases (%)
0,Afghanistan,2335,2469,2704,369,15.802998
1,Albania,782,789,795,13,1.662404
2,Algeria,4154,4295,4474,320,7.703418
3,Andorra,745,747,748,3,0.402685
4,Angola,30,35,35,5,16.666667
...,...,...,...,...,...,...
261,Western Sahara,6,6,6,0,0.000000
262,Sao Tome and Principe,16,16,16,0,0.000000
263,Yemen,7,10,10,3,42.857143
264,Comoros,1,3,3,2,200.000000
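
The percentage column divides by the count on the earliest day. In pandas, a zero baseline does not raise an error: it yields `inf` (or `NaN` for 0/0), which can distort later sorting. The sketch below shows one way to make such rows explicit by turning a zero baseline into `NaN` first. The column names `day_1` and `day_3` and all values are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy data (illustrative values): "Country A" had no cases on the first day
toy = pd.DataFrame({
    "Country/Region": ["Country A", "Country B", "Country C"],
    "day_1": [0, 10, 200],
    "day_3": [4, 15, 200],
})

toy["New Cases"] = toy["day_3"] - toy["day_1"]

# Replace a zero baseline with NaN so the ratio is NaN rather than inf
baseline = toy["day_1"].replace(0, np.nan)
toy["New Cases (%)"] = toy["New Cases"] / baseline * 100

print(toy)
```

Rows with `NaN` can then be dropped or reported separately, depending on what the chart is meant to show.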


## Speed of COVID-19 spread

We will compute the speed at which the virus is spreading in each country and publish it on [dstack](dstack.ai).

In [5]:
min_cases = 50
# create frame and set stack name
top_speed_frame = create_frame("covid_19/infection_speed")
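# top countries, ranked once per sort column
sort_by_cols = ["New Cases", "New Cases (%)"]
for col in sort_by_cols:
    # keep countries with more than min_cases on the earliest day, then take the top 50
    top = last_three_days[last_three_days[last_three_days.columns[1]]>min_cases].sort_values(by=[col], ascending=False).head(50)
    # commit the table with a bilingual (English/French) description; the sort column is the frame parameter
    description = ("Countries with the fastest growing number of confirmed Covid-19 cases<br>\n"
                   "Les pays ayant un fort taux d'accroissement du COVID-19")
    top_speed_frame.commit(top, description, {"Sort by": col})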

top_speed_frame.push()

'https://dstack.ai/deebodiong/covid_19/infection_speed'
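
To check the ranking logic locally without publishing anything, the filter-and-sort step can be reproduced on a small frame. The sketch below uses made-up countries and counts and a hypothetical `day_1` column for the baseline. It also illustrates why the `min_cases` threshold matters: without it, a country going from 1 to 3 cases would top the percentage ranking.

```python
import pandas as pd

# Toy data (illustrative values only)
toy = pd.DataFrame({
    "Country/Region": ["Country A", "Country B", "Country C", "Country D"],
    "day_1": [1, 100, 1000, 60],
    "New Cases": [2, 50, 100, 45],
})
toy["New Cases (%)"] = toy["New Cases"] / toy["day_1"] * 100

min_cases = 50

# Keep only countries whose baseline exceeds the threshold
eligible = toy[toy["day_1"] > min_cases]

# One ranking per sort column, as in the publishing loop
top_abs = eligible.sort_values(by="New Cases", ascending=False)
top_pct = eligible.sort_values(by="New Cases (%)", ascending=False)

print(top_abs["Country/Region"].tolist())
print(top_pct["Country/Region"].tolist())
```

The two rankings differ: the absolute ranking favors countries with large case counts, while the percentage ranking favors fast relative growth from a smaller base.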