# Geospatial Analysis in Python - Covid-19 Case Study

Personal Data Analytics Project <br>
Author: [Diardano Raihan](https://www.linkedin.com/in/diardanoraihan)
<hr>

## Table of Contents
* [Introduction: Business Problem](#Introduction)
* [Data](#Data)
* [Methodology: Analytic Approach](#methodology)
* [Methodology: Exploratory Data Analysis](#analysis)
* [Methodology: Cluster the Neighborhoods](#cluster)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

# Data
<hr>
    
## Data Requirement and Collection

Data source: https://raw.githubusercontent.com/datasets/covid-19/master/data/countries-aggregated.csv. Read the file directly from this URL rather than saving a local copy, because the dataset is updated regularly.

## Import the Libraries

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

import plotly
import plotly.express as px
import plotly.graph_objs as go
from plotly import tools
from plotly.offline import init_notebook_mode, plot, iplot

%config IPCompleter.greedy=True
%config IPCompleter.use_jedi=False

## Read the Data

In [2]:
df = pd.read_csv("https://raw.githubusercontent.com/datasets/covid-19/master/data/countries-aggregated.csv")
df.head()

|   | Date | Country | Confirmed | Recovered | Deaths |
|---|---|---|---|---|---|
| 0 | 2020-01-22 | Afghanistan | 0 | 0 | 0 |
| 1 | 2020-01-23 | Afghanistan | 0 | 0 | 0 |
| 2 | 2020-01-24 | Afghanistan | 0 | 0 | 0 |
| 3 | 2020-01-25 | Afghanistan | 0 | 0 | 0 |
| 4 | 2020-01-26 | Afghanistan | 0 | 0 | 0 |


In [3]:
df[df.Country == 'Indonesia'].tail()

|   | Date | Country | Confirmed | Recovered | Deaths |
|---|---|---|---|---|---|
| 66907 | 2022-04-12 | Indonesia | 6035358 | 0 | 155717 |
| 66908 | 2022-04-13 | Indonesia | 6036909 | 0 | 155746 |
| 66909 | 2022-04-14 | Indonesia | 6037742 | 0 | 155794 |
| 66910 | 2022-04-15 | Indonesia | 6038664 | 0 | 155820 |
| 66911 | 2022-04-16 | Indonesia | 6039266 | 0 | 155844 |


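The dataset is still being updated: at the time of this run, the latest recorded date is 16 April 2022. Note that the `Recovered` column reads 0 in these rows, so recent recovery figures should be treated with caution.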
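The latest date can also be checked programmatically instead of reading it off `tail()`. The following is a minimal sketch, assuming the column names shown above. The small `sample` frame is built from rows printed earlier in this notebook and stands in for the full `df`; `latest_per_country` is a hypothetical helper name, not part of pandas.

```python
import pandas as pd

# Stand-in for the full dataset: rows copied from the outputs above.
sample = pd.DataFrame({
    "Date": ["2020-01-22", "2020-01-23", "2022-04-15", "2022-04-16"],
    "Country": ["Afghanistan", "Afghanistan", "Indonesia", "Indonesia"],
    "Confirmed": [0, 0, 6038664, 6039266],
    "Recovered": [0, 0, 0, 0],
    "Deaths": [0, 0, 155820, 155844],
})


def latest_per_country(frame):
    """Return the most recent date recorded for each country."""
    dates = pd.to_datetime(frame["Date"])
    return dates.groupby(frame["Country"]).max()


latest = latest_per_country(sample)
print(latest)
```

Calling `latest_per_country(df)` on the full dataset would show whether every country has been updated to the same date.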

- Check the number of entries and the data type of each column

In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 161568 entries, 0 to 161567
Data columns (total 5 columns):
 #   Column     Non-Null Count   Dtype 
---  ------     --------------   ----- 
 0   Date       161568 non-null  object
 1   Country    161568 non-null  object
 2   Confirmed  161568 non-null  int64 
 3   Recovered  161568 non-null  int64 
 4   Deaths     161568 non-null  int64 
dtypes: int64(3), object(2)
memory usage: 6.2+ MB


The `Date` column is stored as `object` (plain strings), which is not the right type for date operations. We convert it to `datetime`.

In [5]:
df['Date'] = pd.to_datetime(df['Date'])
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 161568 entries, 0 to 161567
Data columns (total 5 columns):
 #   Column     Non-Null Count   Dtype         
---  ------     --------------   -----         
 0   Date       161568 non-null  datetime64[ns]
 1   Country    161568 non-null  object        
 2   Confirmed  161568 non-null  int64         
 3   Recovered  161568 non-null  int64         
 4   Deaths     161568 non-null  int64         
dtypes: datetime64[ns](1), int64(3), object(1)
memory usage: 6.2+ MB


- Check the shape of the data

In [6]:
print(f'The data has {df.shape[0]} rows and {df.shape[1]} columns')

The data has 161568 rows and 5 columns


## Missing Value Observations

In [7]:
df.isnull().sum()

Date         0
Country      0
Confirmed    0
Recovered    0
Deaths       0
dtype: int64

Fortunately, there are no missing values in the data, so we can continue to the next step.

# Spatial Analysis on Covid-19

## Choropleth Maps

### Definition

Choropleth maps show interval data as colors. Regions are typically shaded in a single color, where __darker shades__ represent high numbers and __lighter shades__ represent low numbers. A choropleth map needs a key (legend) that explains what the different shades mean.

__How do you read a Choropleth Map?__
1. Read the instructions and color legend/key to understand what the shading means. 
2. Look for the regions with the darkest shades, which mark the highest values.
3. Look for the lighter colors to see the low values. 
4. Look out for any significant regional patterns.

### Confirmed Covid-19 Cases

In [8]:
import plotly.express as px

In [None]:
px.s

### Particular Continent

## Geographical Scatter Plot

### Definition

### Confirmed Covid-19 Cases

## Plotting of Recovery

## Plotting of Deaths