# **A2-insights Lab** *presents*

Welcome to this informative COVID-19 analysis, built with Python, pandas and other modern technologies.


---



**API starter:**
>Use this notebook to connect to the COVID-19 database and access our data sources.

>Use [these APIs](https://ijvcpr9af1.execute-api.eu-west-1.amazonaws.com/api/) to access the data.

---
## **Disclaimer:**
<br>
<br>

# **Package Installation**
NB: This notebook was created using **Google Colab**


# **Imports**

In [0]:
# Data manipulation 
import pandas as pd 
import requests

#  Data Visualization
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

# Extras
import warnings

# Disabling library Warnings
warnings.filterwarnings('ignore')

# **Loading Data** 

## Build a dataframe using the following code:

```url = "INSERT API URL"``` <br>
```headers = {'x-api-key': "INSERT API KEY HERE"}```<br>
```response = requests.request("GET", url, headers=headers)```<br>
```x = response.json()```<br>
```df = pd.DataFrame(x)```

## **Call the following function to read the data into a DataFrame**


```
load_covid19(url, API_KEY)
```


<br>
<br>


In [0]:
# CONSTANTS
API_KEY = 'WVllUkRA01awNNgKxGg607vl5qIvuOAN3pW9HXmD'

# API URLs
url_CasesGlobalView = 'https://9gnht4xyvf.execute-api.eu-west-1.amazonaws.com/api/get_table/CasesGlobalView'
url_CasesLocalView  = 'https://9gnht4xyvf.execute-api.eu-west-1.amazonaws.com/api/get_table/CasesLocalView'
ulr_CounterMeasureView = 'https://9gnht4xyvf.execute-api.eu-west-1.amazonaws.com/api/get_table/CounterMeasureView'

In [0]:
def load_covid19(url, API_KEY = 'WVllUkRA01awNNgKxGg607vl5qIvuOAN3pW9HXmD' ):
  """load_covid19(url, API_KEY)

  DESCRIPTION:
        Requests COVID-19 data from an API URL and returns it as a pandas DataFrame.

  PARAMETERS:
        url (string): API URL.
        API_KEY (string): Key sent in the 'x-api-key' request header.

  RETURNS:
      (DataFrame): Data in the form of a pandas DataFrame.
  
  """

  headers = {
    'x-api-key': API_KEY
    }
  response = requests.request("GET", url, headers=headers)
  x = response.json()

  return pd.DataFrame(x)

In [0]:
# Reading Data to DataFrames
global_view = load_covid19(url_CasesGlobalView)
local_view = load_covid19(url_CasesLocalView)
counterMeasure_view = load_covid19(ulr_CounterMeasureView)
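
The loading step above assumes that every request succeeds. As a hedged sketch only, and not the notebook's own loader, the variant below adds a timeout and an explicit HTTP status check before parsing the response. The name `fetch_table`, the optional `session` argument and the 30-second default are assumptions made for illustration.

```python
import pandas as pd
import requests


def fetch_table(url, api_key, session=None, timeout=30):
    """Hypothetical helper: GET a JSON table and return it as a DataFrame.

    `session` may be a requests.Session (or any object with a compatible
    `get` method); when omitted, the module-level requests.get is used.
    """
    http = session if session is not None else requests
    response = http.get(url, headers={"x-api-key": api_key}, timeout=timeout)
    # Raise an exception on 4xx/5xx responses instead of parsing an error body.
    response.raise_for_status()
    return pd.DataFrame(response.json())
```

Passing a `requests.Session` reuses one connection across the three table requests, and accepting the session as an argument also makes the helper easy to exercise with a stub in place of the live API.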


# **Quick Descriptive Analysis**





## **CasesGlobalView**

In [0]:
# view 5 samples of the data
global_view.sample(5)

Unnamed: 0,date,country,lat,long,confirmed,deaths,recovered,active,confirmed_daily,deaths_daily,recovered_daily,daily_change_in_active_cases,active_dailiy_growth_rate,active_rolling_3_day_growth_rate
5024,2020-03-25,Norway,60.472,8.4689,3084,14,6,3064,221.0,2.0,0.0,219.0,0.076977,0.088311
3911,2020-03-05,Lebanon,33.8547,35.8623,16,0,1,15,3.0,0.0,0.0,3.0,0.25,0.048856
3788,2020-03-22,Kuwait,29.5,47.75,188,0,27,161,12.0,0.0,0.0,12.0,0.080537,0.073893
2733,2020-04-05,Greece,39.0742,21.8243,1735,73,78,1584,62.0,5.0,0.0,57.0,0.037328,0.034681
2955,2020-03-29,Honduras,15.2,-86.2419,110,3,3,104,15.0,2.0,0.0,13.0,0.142857,0.268102


In [0]:
# understanding global view columns
global_cols = global_view.columns.to_list()
global_cols

['date',
 'country',
 'lat',
 'long',
 'confirmed',
 'deaths',
 'recovered',
 'active',
 'confirmed_daily',
 'deaths_daily',
 'recovered_daily',
 'daily_change_in_active_cases',
 'active_dailiy_growth_rate',
 'active_rolling_3_day_growth_rate']

In [0]:
# data types
global_view.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7423 entries, 0 to 7422
Data columns (total 14 columns):
 #   Column                            Non-Null Count  Dtype  
---  ------                            --------------  -----  
 0   date                              7423 non-null   object 
 1   country                           7423 non-null   object 
 2   lat                               7423 non-null   float64
 3   long                              7423 non-null   float64
 4   confirmed                         7423 non-null   int64  
 5   deaths                            7423 non-null   int64  
 6   recovered                         7423 non-null   int64  
 7   active                            7423 non-null   int64  
 8   confirmed_daily                   7238 non-null   float64
 9   deaths_daily                      7238 non-null   float64
 10  recovered_daily                   7238 non-null   float64
 11  daily_change_in_active_cases      7238 non-null   float64
 12  active

In [0]:
# get count for null values
global_view.isnull().sum()

date                                  0
country                               0
lat                                   0
long                                  0
confirmed                             0
deaths                                0
recovered                             0
active                                0
confirmed_daily                     185
deaths_daily                        185
recovered_daily                     185
daily_change_in_active_cases        185
active_dailiy_growth_rate           385
active_rolling_3_day_growth_rate    755
dtype: int64

**Notes 2** Missing Data:
> We need to understand the nature of the missing values before we drop or impute them.

> NB: Imputing or dropping records with missing values may yield misleading results and may affect the interpretation of the pandemic outbreak.

The following columns have between **185** and **755** missing values out of **7423** observations:
* confirmed_daily                    
* deaths_daily                   
* recovered_daily                 
* daily_change_in_active_cases     
* active_dailiy_growth_rate       
* active_rolling_3_day_growth_rate
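
One way to probe the nature of these gaps is to test a concrete hypothesis. The sketch below checks whether missing daily values fall on each country's first recorded day, where a day-over-day difference is undefined. This hypothesis is an assumption, not something the data source confirms, and the toy frame with made-up values stands in for `global_view`.

```python
import pandas as pd

# Toy stand-in for global_view: two countries, three days each (made-up values).
toy = pd.DataFrame({
    "date": ["2020-03-01", "2020-03-02", "2020-03-03"] * 2,
    "country": ["A"] * 3 + ["B"] * 3,
    "confirmed": [1, 3, 6, 2, 2, 5],
})
toy = toy.sort_values(["country", "date"]).reset_index(drop=True)

# A day-over-day difference has no value on each country's first day.
toy["confirmed_daily"] = toy.groupby("country")["confirmed"].diff()

# Flag the first record of every country.
is_first_day = toy.groupby("country").cumcount() == 0

# Cross-tabulate "value is missing" against "row is a first day".
missing = toy["confirmed_daily"].isnull()
summary = pd.crosstab(is_first_day, missing,
                      rownames=["first_day"], colnames=["missing"])
print(summary)

# Count the missing values that the first-day hypothesis accounts for.
explained = int((missing & is_first_day).sum())
total_missing = int(missing.sum())
print(f"{explained} of {total_missing} missing values fall on a first day")
```

If the same check on the real data explains all of the gaps in a column, they are structural rather than random, and imputing them would invent observations that never existed.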

In [0]:
# Drop the longitude and latitude, then view the summary statistics
pd.set_option('display.precision', 3)
global_view.drop(['lat','long'], axis=1).describe()

Unnamed: 0,confirmed,deaths,recovered,active,confirmed_daily,deaths_daily,recovered_daily,daily_change_in_active_cases,active_dailiy_growth_rate,active_rolling_3_day_growth_rate
count,7423.0,7423.0,7423.0,7423.0,7238.0,7238.0,7238.0,7238.0,7038.0,6668.0
mean,3678.275,187.512,900.21,2590.554,255.004,15.76,58.261,180.983,0.208,0.159
std,21323.292,1260.387,6221.328,17258.912,1613.128,103.044,361.341,1391.274,0.761,0.249
min,1.0,0.0,0.0,0.0,-15.0,-31.0,-268.0,-4919.0,-1.0,-1.0
25%,7.0,0.0,0.0,6.0,0.0,0.0,0.0,0.0,0.0,0.019
50%,48.0,0.0,2.0,41.0,5.0,0.0,0.0,3.0,0.06,0.101
75%,482.0,7.0,27.0,411.5,46.0,1.0,2.0,32.0,0.196,0.229
max,555313.0,22020.0,77956.0,500305.0,35098.0,2108.0,10219.0,29957.0,37.0,2.733


**Notes 3**
> Understanding the **mean**, **std** and **max** for the following features/variables:
- **confirmed**
    - mean= 3678.275 **and** std= 21323.292 **and** max= 555313.000
- **deaths** 
    - mean= 187.512 **and** std= 1260.387 **and** max= 22020.000
- **recovered**
    - mean= 900.210 **and** std= 6221.328 **and** max= 77956.000

> **Ideal Scenario**
* The **std** and **mean** for **confirmed** and **deaths** are expected to show very high variation; this would mean that the virus **spreading rate** and **death rate** are dropping (flattening the curve).
* The **std** and **mean** for **recovered** are expected to show little to no variation; this would mean that more people are recovering in a short period of time.
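
To compare spread across columns that sit on very different scales, one option is the coefficient of variation (std divided by mean). This is a technique swapped in here for illustration, not one the notes above use. The sketch copies the mean and std values from the summary statistics table; the column name `cv` is an assumption.

```python
import pandas as pd

# Mean and std copied from the describe() output above.
summary = pd.DataFrame(
    {
        "mean": [3678.275, 187.512, 900.210],
        "std": [21323.292, 1260.387, 6221.328],
    },
    index=["confirmed", "deaths", "recovered"],
)

# Coefficient of variation: spread expressed as a multiple of the mean.
summary["cv"] = summary["std"] / summary["mean"]
print(summary.round(3))
```

Note that these statistics pool every country and every date of a cumulative series, so a large ratio mostly reflects differences between countries and stages of the outbreak rather than day-to-day change within one country.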

[health statistics]('')

## **CasesLocalView**

In [0]:
# view 5 samples of the data
local_view.sample(5)

Unnamed: 0,id,country_id,location,location_level,date,confirmed
338,10,153,UNKNOWN,Provincial,2020-04-01,90
105,3,153,GP,Provincial,2020-03-30,618
241,7,153,NC,Provincial,2020-03-26,2
67,2,153,FS,Provincial,2020-03-22,9
323,9,153,WC,Provincial,2020-04-09,515


In [0]:
# data types
local_view.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 370 entries, 0 to 369
Data columns (total 6 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   id              370 non-null    int64 
 1   country_id      370 non-null    int64 
 2   location        370 non-null    object
 3   location_level  370 non-null    object
 4   date            370 non-null    object
 5   confirmed       370 non-null    int64 
dtypes: int64(3), object(3)
memory usage: 17.5+ KB


In [0]:
# get count for Missing Data
local_view.isnull().sum()

id                0
country_id        0
location          0
location_level    0
date              0
confirmed         0
dtype: int64

In [0]:
local_view.location.value_counts()

MP         37
WC         37
FS         37
UNKNOWN    37
NW         37
KZN        37
EC         37
NC         37
GP         37
LP         37
Name: location, dtype: int64

## **CounterMeasureView**

# **Data Cleaning**


*   Set the data types to reduce memory usage
*   Create a pipeline to clean the data
*   Pass all the data streams through the pipeline
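
The steps above can be sketched with `DataFrame.pipe`. This is a minimal illustration under stated assumptions, not the notebook's final pipeline: the function names `parse_dates`, `categorize`, `downcast_integers` and `clean` are hypothetical, and the toy frame reuses two rows from the `global_view` sample. Whether the conversions actually save memory depends on the size and content of the data.

```python
import pandas as pd


def parse_dates(df):
    """Convert the 'date' column from text to a datetime dtype."""
    return df.assign(date=pd.to_datetime(df["date"]))


def categorize(df, columns):
    """Store repeated text values as pandas categoricals."""
    return df.assign(**{col: df[col].astype("category") for col in columns})


def downcast_integers(df):
    """Shrink integer columns to the smallest integer dtype that fits."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_integer_dtype(out[col]):
            out[col] = pd.to_numeric(out[col], downcast="integer")
    return out


def clean(df, category_columns):
    """Run every cleaning step in order and return a new DataFrame."""
    return (
        df.pipe(parse_dates)
          .pipe(categorize, columns=category_columns)
          .pipe(downcast_integers)
    )


# Toy input with two rows taken from the global_view sample shown earlier.
raw = pd.DataFrame({
    "date": ["2020-03-25", "2020-03-05"],
    "country": ["Norway", "Lebanon"],
    "confirmed": [3084, 16],
})
tidy = clean(raw, category_columns=["country"])
print(tidy.dtypes)
```

Because each step takes a DataFrame and returns a new one, the same `clean` call could be applied to `global_view`, `local_view` and `counterMeasure_view` with a different `category_columns` list for each.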



# **Visualizations**

# **Credits & Sources**

The 
---