<a href="https://colab.research.google.com/github/SriSatyaLokesh/COVID19-DataAnalysis/blob/master/COVID19_Worldwide_Analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## COVID19 Exploratory Data Analysis
### Worldwide

#### The following cells set up the data analysis in Google Colab

In [4]:
# Upload your Kaggle API token (kaggle.json, available from your Kaggle account)
from google.colab import files
uploaded = files.upload()

Saving kaggle.json to kaggle.json


In [0]:
# Run this to create a kaggle environment
!pip install -q kaggle
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
!chmod 600 ~/.kaggle/kaggle.json

import numpy as np
import pandas as pd


**Let's perform exploratory data analysis on COVID-19 data**
- I'm using data from Kaggle and GitHub
- Global COVID-19 data: https://www.kaggle.com/imdevskp/corona-virus-report/
- India COVID-19 data: https://www.kaggle.com/sudalairajkumar/novel-corona-virus-2019-dataset/
- Time series COVID-19 data: https://github.com/CSSEGISandData/COVID-19.git

In [6]:
# Get data from kaggle 
import zipfile
# Download data
!kaggle datasets download -d imdevskp/corona-virus-report/
!kaggle datasets download -d sudalairajkumar/novel-corona-virus-2019-dataset/

# Unzip both archives; the context manager closes each file after extraction
for archive in ("corona-virus-report.zip", "novel-corona-virus-2019-dataset.zip"):
    with zipfile.ZipFile(archive, "r") as zip_ref:
        zip_ref.extractall()

Downloading corona-virus-report.zip to /content
  0% 0.00/6.90M [00:00<?, ?B/s] 72% 5.00M/6.90M [00:00<00:00, 16.9MB/s]
100% 6.90M/6.90M [00:00<00:00, 19.8MB/s]
Downloading novel-corona-virus-2019-dataset.zip to /content
  0% 0.00/706k [00:00<?, ?B/s]
100% 706k/706k [00:00<00:00, 45.7MB/s]


In [7]:
# Get data from github 

# Download data
!git clone https://github.com/CSSEGISandData/COVID-19.git

Cloning into 'COVID-19'...
remote: Enumerating objects: 7, done.
remote: Counting objects: 100% (7/7), done.
remote: Compressing objects: 100% (7/7), done.
remote: Total 18731 (delta 0), reused 2 (delta 0), pack-reused 18724
Receiving objects: 100% (18731/18731), 75.03 MiB | 11.46 MiB/s, done.
Resolving deltas: 100% (9607/9607), done.


In [8]:
!ls

corona-virus-report.zip		     sample_data
COVID-19			     time_series_covid_19_confirmed.csv
covid_19_clean_complete.csv	     time_series_covid_19_confirmed_US.csv
covid_19_data.csv		     time_series_covid_19_deaths.csv
COVID19_line_list_data.csv	     time_series_covid_19_deaths_US.csv
COVID19_open_line_list.csv	     time_series_covid_19_recovered.csv
kaggle.json			     usa_county_wise.csv
novel-corona-virus-2019-dataset.zip


In [0]:
# load the data
data = pd.read_csv("covid_19_clean_complete.csv")

In [10]:
print(data.columns)
data.tail(5)

Index(['Province/State', 'Country/Region', 'Lat', 'Long', 'Date', 'Confirmed',
       'Deaths', 'Recovered'],
      dtype='object')


Unnamed: 0,Province/State,Country/Region,Lat,Long,Date,Confirmed,Deaths,Recovered
20092,Falkland Islands (Malvinas),United Kingdom,-51.7963,-59.5236,4/7/20,2,0,0
20093,Saint Pierre and Miquelon,France,46.8852,-56.3159,4/7/20,1,0,0
20094,,South Sudan,6.877,31.307,4/7/20,2,0,0
20095,,Western Sahara,24.2155,-12.8858,4/7/20,4,0,0
20096,,Sao Tome and Principe,0.18636,6.613081,4/7/20,4,0,0


In [11]:
data[data["Country/Region"]=="India"]["Province/State"]

131      NaN
392      NaN
653      NaN
914      NaN
1175     NaN
        ... 
18923    NaN
19184    NaN
19445    NaN
19706    NaN
19967    NaN
Name: Province/State, Length: 77, dtype: object

#### India has no state-level breakdown in this dataset (Province/State is NaN), so we should fill those values
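Before filling, it can help to see how widespread the gap is rather than checking one country at a time. The sketch below is a minimal illustration on a hypothetical toy frame (`toy` and its rows are made up, only the two column names come from the dataset): it counts missing values per column and lists the countries that lack a province.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; only the column names mirror the real dataset.
toy = pd.DataFrame({
    "Province/State": ["Hubei", np.nan, np.nan],
    "Country/Region": ["China", "India", "South Sudan"],
})

# Number of NaN entries in each column.
missing_per_column = toy.isna().sum()

# Countries whose rows have no Province/State value.
countries_without_province = (
    toy.loc[toy["Province/State"].isna(), "Country/Region"].unique()
)

print(missing_per_column)
print(countries_without_province)
```

On the real data, the same two expressions applied to `data` would show whether any column other than `Province/State` needs attention.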

In [0]:
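# Replace NaN in Province/State with the Country/Region of the same row.
# Assigning the result back avoids the chained in-place fill, which newer pandas versions warn about.
data["Province/State"] = data["Province/State"].fillna(data["Country/Region"])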

In [14]:
data.tail(5)

Unnamed: 0,Province/State,Country/Region,Lat,Long,Date,Confirmed,Deaths,Recovered
20092,Falkland Islands (Malvinas),United Kingdom,-51.7963,-59.5236,4/7/20,2,0,0
20093,Saint Pierre and Miquelon,France,46.8852,-56.3159,4/7/20,1,0,0
20094,South Sudan,South Sudan,6.877,31.307,4/7/20,2,0,0
20095,Western Sahara,Western Sahara,24.2155,-12.8858,4/7/20,4,0,0
20096,Sao Tome and Principe,Sao Tome and Principe,0.18636,6.613081,4/7/20,4,0,0


In [17]:
data[data["Country/Region"]=="India"]["Province/State"]

131      India
392      India
653      India
914      India
1175     India
         ...  
18923    India
19184    India
19445    India
19706    India
19967    India
Name: Province/State, Length: 77, dtype: object

###### All NaN values in Province/State are now filled, so we are ready to perform the analysis
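The fill works because `fillna` with a Series aligns on the row index, so each missing province takes the country from its own row. A minimal sketch on a hypothetical toy frame (the rows are invented for illustration) that performs the fill and then checks that nothing is left missing:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; only the column names mirror the real dataset.
toy = pd.DataFrame({
    "Province/State": ["Hubei", np.nan, np.nan],
    "Country/Region": ["China", "India", "South Sudan"],
})

# Each NaN is replaced by the Country/Region value at the same index.
toy["Province/State"] = toy["Province/State"].fillna(toy["Country/Region"])

filled = toy["Province/State"].tolist()
remaining_nan = int(toy["Province/State"].isna().sum())
print(filled, remaining_nan)
```

Running `data["Province/State"].isna().sum()` on the real frame is the equivalent check and should report zero if every row has a country.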

In [18]:
data["Date"].tail(5)

20092    4/7/20
20093    4/7/20
20094    4/7/20
20095    4/7/20
20096    4/7/20
Name: Date, dtype: object

In [19]:
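# Build the latest date in the dataset's M/D/YY format (no zero padding).
# Assumes the data ends two days before the current date; "%-m" and "%-d" are platform-specific (they work on Linux/Colab).
from datetime import datetime as dt,date,timedelta
today = dt.now() - timedelta(days=2)
today = today.strftime("%-m/%-d/%y")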
print(today)

4/7/20
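The two-day offset only matches the data on the day the notebook was run. As an alternative sketch (not what the notebook does), the latest date can be read from the `Date` column itself. The `dates` Series below is a hypothetical sample in the same M/D/YY format, and the string is rebuilt without `%-m`, so it does not depend on the platform's `strftime`.

```python
import pandas as pd

# Hypothetical sample of the "Date" column, in the dataset's M/D/YY format.
dates = pd.Series(["4/5/20", "4/6/20", "4/7/20", "3/31/20"])

# Parse to timestamps so that max() compares chronologically, not as text.
parsed = pd.to_datetime(dates, format="%m/%d/%y")
latest = parsed.max()

# Rebuild the unpadded M/D/YY string from the timestamp's fields.
latest_str = f"{latest.month}/{latest.day}/{latest:%y}"
print(latest_str)
```

With the real frame, replacing `dates` with `data["Date"]` would give a value usable wherever `today` is used for filtering.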
