# Analyze Corona: create a Jupyter notebook with various statistics and visualisations 


###### Use dataset / code from: https://documenter.getpostman.com/view/10808728/SzS8rjbc?version=latest

#### Explore the APIs in Postman
#### Copy + change the code in Jupyter notebook
#### Create 1 cell/section for each of the questions

### Questions:
1. How many cases have there been in total?
2. What is the trend of number of cases, per country?
3. What is the % of people infected, per country? (Hint: also use the World Bank API to get country information.)
4. Try to come up with another 5 questions that you think are relevant. Explain why they are relevant. Answer the questions with the data.

### What is a Coronavirus?

According to the WHO website, coronaviruses (CoV) are a large family of viruses that cause illness ranging from the common cold to more severe diseases such as Middle East Respiratory Syndrome (MERS-CoV) and Severe Acute Respiratory Syndrome (SARS-CoV). A novel coronavirus (nCoV) is a new strain that has not been previously identified in humans.

Common signs of infection include respiratory symptoms, fever, cough, shortness of breath and breathing difficulties. In more severe cases, infection can cause pneumonia, severe acute respiratory syndrome, kidney failure and even death.

Objective:
Since the coronavirus outbreak is growing day by day, we explore trends in the given data and try to predict how the outbreak may develop.

In [1]:
#Zen of Python 
import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!


In [2]:
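#Import libraries
#requests is imported under its own name; aliasing it as "re" would shadow the name of the standard-library regex module
import requests
import pandas as pd
from datetime import datetime, timedelta

In [3]:
#Use requests to GET all data from the API and decode the JSON response
data = requests.get("https://api.covid19api.com/all").json()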


In [4]:
df = pd.DataFrame(data)
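
The download cell above assumes the request succeeds and that the payload is a list of row dictionaries. A minimal sketch of a more defensive version follows. The helper names `fetch_records` and `records_to_frame` and the 60-second timeout are illustrative choices, not part of the original notebook. The final lines run offline on two hand-written rows shaped like the API output.

```python
import pandas as pd
import requests

# URL taken from the cell above; the timeout value is an arbitrary choice.
ALL_URL = "https://api.covid19api.com/all"


def fetch_records(url=ALL_URL, timeout=60):
    """Download the JSON payload, raising instead of returning an error body."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # turn HTTP 4xx/5xx responses into an exception
    return response.json()


def records_to_frame(records):
    """Build a DataFrame, assuming the payload is a list of row dictionaries."""
    if not isinstance(records, list):
        raise ValueError(f"expected a list of records, got {type(records).__name__}")
    return pd.DataFrame(records)


# Offline check with two hand-written rows (no network call is made here).
sample = [
    {"Country": "Afghanistan", "Confirmed": 0, "Date": "2020-01-22T00:00:00Z"},
    {"Country": "Afghanistan", "Confirmed": 0, "Date": "2020-01-23T00:00:00Z"},
]
sample_df = records_to_frame(sample)
print(sample_df.shape)
```

In the notebook this would be used as `df = records_to_frame(fetch_records())`.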

In [5]:
#View the first 5 rows of the data
df.head()

Unnamed: 0,Country,CountryCode,Province,City,CityCode,Lat,Lon,Confirmed,Deaths,Recovered,Active,Date
0,Afghanistan,AF,,,,33.94,67.71,0,0,0,0,2020-01-22T00:00:00Z
1,Afghanistan,AF,,,,33.94,67.71,0,0,0,0,2020-01-23T00:00:00Z
2,Afghanistan,AF,,,,33.94,67.71,0,0,0,0,2020-01-24T00:00:00Z
3,Afghanistan,AF,,,,33.94,67.71,0,0,0,0,2020-01-25T00:00:00Z
4,Afghanistan,AF,,,,33.94,67.71,0,0,0,0,2020-01-26T00:00:00Z


In [6]:
#check info on the dataframe
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 439500 entries, 0 to 439499
Data columns (total 12 columns):
Country        439500 non-null object
CountryCode    439500 non-null object
Province       439500 non-null object
City           439500 non-null object
CityCode       439500 non-null object
Lat            439500 non-null object
Lon            439500 non-null object
Confirmed      439500 non-null int64
Deaths         439500 non-null int64
Recovered      439500 non-null int64
Active         439500 non-null int64
Date           439500 non-null object
dtypes: int64(4), object(8)
memory usage: 40.2+ MB


In [7]:
#View the columns and shape (number of rows and columns) of the dataframe
df.columns, df.shape

(Index(['Country', 'CountryCode', 'Province', 'City', 'CityCode', 'Lat', 'Lon',
        'Confirmed', 'Deaths', 'Recovered', 'Active', 'Date'],
       dtype='object'), (439500, 12))

In [8]:
#Check the number of unique values in the Country column
df.Country.nunique()

186

In [9]:
#create a copy of the dataframe
df1 = df.copy() 

In [10]:
#Strip the time component so the Date column has a uniform YYYY-MM-DD format
df1.Date = df1.Date.str.replace("T00:00:00Z", "")
df1.head()

Unnamed: 0,Country,CountryCode,Province,City,CityCode,Lat,Lon,Confirmed,Deaths,Recovered,Active,Date
0,Afghanistan,AF,,,,33.94,67.71,0,0,0,0,2020-01-22
1,Afghanistan,AF,,,,33.94,67.71,0,0,0,0,2020-01-23
2,Afghanistan,AF,,,,33.94,67.71,0,0,0,0,2020-01-24
3,Afghanistan,AF,,,,33.94,67.71,0,0,0,0,2020-01-25
4,Afghanistan,AF,,,,33.94,67.71,0,0,0,0,2020-01-26


In [11]:
#convert Date column from string to Datetime type
df1.Date = pd.to_datetime(df1.Date, format='%Y-%m-%d')
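
The two cells above first strip the `T00:00:00Z` suffix and then parse the remaining string. As a possible shortcut, `pd.to_datetime` can parse the ISO 8601 timestamps directly. The sketch below uses a small hand-written Series rather than the real data and assumes every timestamp ends in `Z` (UTC), as in the rows shown earlier.

```python
import pandas as pd

raw = pd.Series(["2020-01-22T00:00:00Z", "2020-01-23T00:00:00Z"])

# utc=True parses the trailing "Z"; tz_localize(None) then drops the timezone
# so the result is a plain (timezone-naive) datetime column.
parsed = pd.to_datetime(raw, utc=True).dt.tz_localize(None)

print(parsed.dt.strftime("%Y-%m-%d").tolist())
# ['2020-01-22', '2020-01-23']
```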


In [12]:
#Check the datatype of the Date Column
df1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 439500 entries, 0 to 439499
Data columns (total 12 columns):
Country        439500 non-null object
CountryCode    439500 non-null object
Province       439500 non-null object
City           439500 non-null object
CityCode       439500 non-null object
Lat            439500 non-null object
Lon            439500 non-null object
Confirmed      439500 non-null int64
Deaths         439500 non-null int64
Recovered      439500 non-null int64
Active         439500 non-null int64
Date           439500 non-null datetime64[ns]
dtypes: datetime64[ns](1), int64(4), object(7)
memory usage: 40.2+ MB


In [13]:
#Convert the Lat and Lon type from string to float 
df1.Lat = df1.Lat.astype(float)
df1.Lon = df1.Lon.astype(float)
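
`astype(float)` raises an error as soon as one value cannot be parsed. If the feed ever contains blank or missing coordinates (an assumption, since the rows shown above are all populated), `pd.to_numeric` with `errors="coerce"` is a more forgiving alternative. The sketch below uses made-up coordinates.

```python
import pandas as pd

# Hand-written sample: one blank Lat and one missing Lon.
coords = pd.DataFrame({
    "Lat": ["33.94", "", "41.15"],
    "Lon": ["67.71", "20.17", None],
})

for column in ["Lat", "Lon"]:
    # Values that cannot be parsed become NaN instead of raising an error.
    coords[column] = pd.to_numeric(coords[column], errors="coerce")

print(coords.isna().sum().to_dict())
```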

In [14]:
#Find yesterday's date and assign it to a variable
#(pd.datetime is deprecated and removed in recent pandas versions, so use datetime directly)
yesterday = datetime.now() - timedelta(days=1)
yesterday = yesterday.strftime('%Y-%m-%d')
yesterday

'2020-05-25'

### Questions

### 1. How many cases have there been in total?


In [15]:
#Create a dataframe holding the maximum Confirmed value for each country

Total_df = df1.groupby(['Country', 'CountryCode'], as_index=False)[['Confirmed']].max()
Total_df

Unnamed: 0,Country,CountryCode,Confirmed
0,Afghanistan,AF,11173
1,Albania,AL,1004
2,Algeria,DZ,8503
3,Andorra,AD,763
4,Angola,AO,70
...,...,...,...
181,Viet Nam,VN,326
182,Western Sahara,EH,9
183,Yemen,YE,233
184,Zambia,ZM,920


In [16]:
#Result

print("The Total number of cases as at", yesterday, "is:", Total_df.Confirmed.sum())

The Total number of cases as at 2020-05-25 is: 5470296
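
One caveat about the total above: taking the maximum `Confirmed` row per country only matches the country total if each country appears as a single series. If some countries are reported only as separate Province or City rows (an assumption about the feed, not verified here), the maximum picks the largest region and undercounts. The sketch below uses made-up numbers to contrast the two approaches. Conversely, summing would double count if the feed also included country-level aggregate rows next to the regional ones.

```python
import pandas as pd

# Made-up cumulative counts: country A has two provinces, country B has none.
rows = [
    ("A", "P1", "2020-05-24", 10), ("A", "P1", "2020-05-25", 20),
    ("A", "P2", "2020-05-24", 5),  ("A", "P2", "2020-05-25", 8),
    ("B", "",   "2020-05-24", 3),  ("B", "",   "2020-05-25", 7),
]
sample = pd.DataFrame(rows, columns=["Country", "Province", "Date", "Confirmed"])
sample["Date"] = pd.to_datetime(sample["Date"])

# Approach used in the notebook: largest single row per country.
max_total = sample.groupby("Country")["Confirmed"].max().sum()

# Alternative: keep each country's most recent date, then add up its regions.
latest_date = sample.groupby("Country")["Date"].transform("max")
latest = sample[sample["Date"] == latest_date]
sum_total = latest.groupby("Country")["Confirmed"].sum().sum()

print(max_total, sum_total)
# 27 35
```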


### 2. What is the trend of number of cases, per country?

In [None]:
#Graphical visualizations of trends in all countries

In [None]:
#Import all libraries for visualization 
import plotly
import plotly.express as px
import plotly.graph_objects as go
import plotly.offline as pyo
from plotly.offline import init_notebook_mode, plot, iplot


In [None]:
#Stop here for now


In [None]:
#Visualise the trend of cases on the various countries
#Still working on this

fig1 = px.scatter_mapbox(df1, lat="Lat", lon="Lon", hover_name="Country", hover_data=["Country", "Confirmed"],
                        color_discrete_sequence=["fuchsia"], zoom=3, height=300)
fig1.update_layout(mapbox_style="open-street-map")
fig1.update_layout(margin={"r":0,"t":0,"l":0,"b":0})

fig1.show() 
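
The map above shows where cases were reported, not how they changed over time. A possible way to get at the trend is to build one time series per country and derive the daily new cases from the cumulative counts. This is a minimal sketch on made-up numbers. The `NewCases` column name is an illustrative choice, and `Confirmed` is assumed to be cumulative.

```python
import pandas as pd

# Made-up cumulative counts for two countries over three days.
sample = pd.DataFrame({
    "Country": ["A", "A", "A", "B", "B", "B"],
    "Date": pd.to_datetime(["2020-05-23", "2020-05-24", "2020-05-25"] * 2),
    "Confirmed": [10, 15, 25, 3, 3, 7],
})

# One cumulative value per country and day (this also sums any regional rows).
trend = (
    sample.groupby(["Country", "Date"], as_index=False)["Confirmed"].sum()
    .sort_values(["Country", "Date"])
)

# Day-over-day change within each country; the first day has no previous value.
trend["NewCases"] = (
    trend.groupby("Country")["Confirmed"].diff().fillna(0).astype(int)
)

print(trend["NewCases"].tolist())
# [0, 5, 10, 0, 0, 4]
```

With `plotly.express` imported as `px`, `px.line(trend, x="Date", y="Confirmed", color="Country")` would then draw one line per country.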



### 3. What is the % of people infected, per country? (Hint: also use the World Bank API to get country information.)

In [None]:
#Use World Bank data for population and other country information
#Still working on this
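
A possible approach for this question is to join the per-country case counts with population figures on the two-letter country code. The sketch below rests on assumptions about the World Bank API: that a request such as `https://api.worldbank.org/v2/country/all/indicator/SP.POP.TOTL?format=json&date=2019&per_page=400` returns a two-element list `[metadata, records]`, and that each record carries `country.id` (ISO 2-letter code) and `value` (population). The helper name `population_frame` is hypothetical, and the payload below is hand-written with made-up numbers.

```python
import pandas as pd


def population_frame(payload):
    """Turn a World Bank indicator response into CountryCode/Population columns.

    Assumes the response is [metadata, records] and that each record carries
    country.id (ISO 2-letter code) and value (population, possibly None).
    """
    records = payload[1] or []
    rows = [
        {"CountryCode": r["country"]["id"], "Population": r["value"]}
        for r in records
        if r.get("value") is not None  # skip entries without a population figure
    ]
    return pd.DataFrame(rows, columns=["CountryCode", "Population"])


# Hand-written payload with made-up numbers, shaped like the assumed response.
payload = [
    {"page": 1, "pages": 1},
    [
        {"country": {"id": "AA", "value": "Country A"}, "value": 1000},
        {"country": {"id": "BB", "value": "Country B"}, "value": 200},
        {"country": {"id": "CC", "value": "Country C"}, "value": None},
    ],
]
cases = pd.DataFrame({"CountryCode": ["AA", "BB"], "Confirmed": [50, 1]})

# A left join keeps every country from the case data, even without a match.
merged = cases.merge(population_frame(payload), on="CountryCode", how="left")
merged["PercentInfected"] = 100 * merged["Confirmed"] / merged["Population"]

print(merged[["CountryCode", "PercentInfected"]])
```

In the notebook, `cases` would be `Total_df` and `payload` would be the decoded JSON of the World Bank request.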

### 4. Try to come up with another 5 questions that you think are relevant. Explain why they are relevant. Answer the questions with the data.

In [None]:
#Percentage of deaths and recoveries
#Still working on this
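
One way to approach the death and recovery percentages is to divide each country's latest `Deaths` and `Recovered` counts by its `Confirmed` count. This is a minimal sketch on made-up numbers. It assumes one row per country holding the latest cumulative values, and the `DeathPct` and `RecoveredPct` column names are illustrative.

```python
import pandas as pd

# Made-up latest cumulative values, one row per country.
latest = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "Confirmed": [200, 50, 0],
    "Deaths": [10, 1, 0],
    "Recovered": [100, 25, 0],
})

# Treat zero confirmed cases as missing so the ratio becomes NaN
# instead of a division by zero.
confirmed = latest["Confirmed"].where(latest["Confirmed"] > 0)

latest["DeathPct"] = 100 * latest["Deaths"] / confirmed
latest["RecoveredPct"] = 100 * latest["Recovered"] / confirmed

print(latest[["Country", "DeathPct", "RecoveredPct"]])
```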