In [1]:
from cvid_19_tlkt import *

## Clone the 2019 Novel Coronavirus COVID-19 (2019-nCoV) Data Repository by Johns Hopkins CSSE from GitHub

If you don't currently have access to the repo, follow the steps below.

## 1 
Make sure your working directory is 'covid-19_tlkt'.

In [11]:
! pwd

/Volumes/S190813/Coding/covid_19/covid-19_tlkt


## 2 
This is the currently active repo, which is updated daily around midnight. You only need to clone it once.
Once you have completed steps 1 and 2, run only steps 3 and 4 to update the data.

In [13]:
! git clone https://github.com/CSSEGISandData/COVID-19.git

Cloning into 'COVID-19'...
remote: Enumerating objects: 15275, done.
remote: Total 15275 (delta 0), reused 0 (delta 0), pack-reused 15275
Receiving objects: 100% (15275/15275), 48.93 MiB | 2.18 MiB/s, done.
Resolving deltas: 100% (7398/7398), done.
Checking out files: 100% (180/180), done.


## 3
After you have cloned the repo, the toolkit directory should look like this:

In [17]:
ls

COVID-19/          __pycache__/       cvid_19_tlkt.py
README.md          covid-19_nb.ipynb  workbench/


## 4
#### UPDATE REPO

### 4.1
navigate to the repo

In [18]:
cd COVID-19/

/Volumes/S190813/Coding/covid_19/covid-19_tlkt/COVID-19


### 4.2
pull down any updates

In [19]:
! git pull

Already up to date.


### 4.3
navigate back to toolkit directory

In [20]:
cd ..

/Volumes/S190813/Coding/covid_19/covid-19_tlkt
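Steps 2 and 4 can also be combined into a single clone-or-pull helper. The following is a minimal sketch, not part of the toolkit: `sync_command` and `sync_repo` are hypothetical names, and the sketch assumes `git` is on your PATH and that the repo lives in a `COVID-19` folder inside the toolkit directory.

```python
import subprocess
from pathlib import Path
from typing import List

REPO_URL = "https://github.com/CSSEGISandData/COVID-19.git"


def sync_command(repo_dir: Path) -> List[str]:
    """Return the git command that brings repo_dir up to date (hypothetical helper)."""
    if (repo_dir / ".git").is_dir():
        # Already cloned: equivalent to step 4 (pull inside the existing checkout).
        return ["git", "-C", str(repo_dir), "pull"]
    # Not cloned yet: equivalent to step 2 (clone into repo_dir).
    return ["git", "clone", REPO_URL, str(repo_dir)]


def sync_repo(repo_dir: Path = Path("COVID-19")) -> None:
    """Run the clone-or-pull command; raises CalledProcessError if git fails."""
    subprocess.run(sync_command(repo_dir), check=True)
```

Calling `sync_repo()` from the toolkit directory would clone on the first run and pull on later runs, so there is no need to `cd` in and out of the repo.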


## Analyzing the Data

In [2]:
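import glob
# grab the most current csv
# glob returns files in arbitrary order, so sort before taking the last one
# (MM-DD-YYYY names sort chronologically only within a single year)
all_CSV = sorted(glob.glob('../COVID-19/csse_covid_19_data/csse_covid_19_daily_reports/*.csv'))
all_CSV[-1]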

'../COVID-19/csse_covid_19_data/csse_covid_19_daily_reports/03-22-2020.csv'
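Because the daily report file names follow the MM-DD-YYYY pattern, alphabetical order matches date order only within a single year. One way to pick the newest report regardless of year is to parse the date out of each file name. This is a sketch under that naming assumption; `latest_report` is a hypothetical helper, not a toolkit function.

```python
from datetime import datetime
from pathlib import Path


def latest_report(paths):
    """Return the path whose MM-DD-YYYY.csv file name holds the latest date."""
    def report_date(path):
        # Path(...).stem drops the directory and the .csv suffix.
        return datetime.strptime(Path(path).stem, "%m-%d-%Y")
    return max(paths, key=report_date)
```

With the glob result from the cell above, `current_csv = latest_report(all_CSV)` could stand in for `all_CSV[-1]`.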

In [3]:
current_csv = all_CSV[-1]
most_current = Covid19_data(current_csv)

In [4]:
# plotly plot
most_current.analyze_covid_global()

In [15]:
# plotly plot
most_current.analyze_covid_distribution('Deaths')

In [6]:
# plotly plot
most_current.analyze_covid_treemap('Deaths')

In [7]:
most_current.analyze_country('US')

Unnamed: 0,Province/State,Country/Region,Last Update,Confirmed,Deaths,Recovered,Latitude,Longitude,world
6,New York,US,2020-03-22T22:13:32,15793,117,0,42.1657,-74.9481,world
14,Washington,US,2020-03-22T23:13:22,1996,95,0,47.4009,-121.4905,world
16,New Jersey,US,2020-03-22T19:43:03,1914,20,0,40.2989,-74.521,world
17,California,US,2020-03-22T22:13:28,1642,30,0,36.1162,-119.6816,world
29,Illinois,US,2020-03-22T22:13:32,1049,9,0,40.3495,-88.9861,world
30,Michigan,US,2020-03-22T21:13:20,1037,9,0,43.3266,-84.5361,world
35,Louisiana,US,2020-03-22T17:43:06,837,20,0,31.1695,-91.8678,world
36,Florida,US,2020-03-22T17:43:06,830,13,0,27.7663,-81.6868,world
42,Massachusetts,US,2020-03-22T22:43:03,646,5,0,42.2302,-71.5301,world
46,Texas,US,2020-03-22T22:43:03,627,6,0,31.0545,-97.5635,world
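The output above suggests that `analyze_country` filters rows by `Country/Region` and lists them by confirmed cases in descending order. A rough standard-library sketch of that idea is shown here; `top_regions` is a hypothetical name, the toolkit's actual implementation is not shown in this notebook, and the sample rows are copied from the table above with the columns trimmed.

```python
import csv
import io

# Three rows copied from the output above, deliberately out of order.
SAMPLE = """Province/State,Country/Region,Confirmed,Deaths
Washington,US,1996,95
New York,US,15793,117
New Jersey,US,1914,20
"""


def top_regions(rows, country, n=10):
    """Return up to n rows for country, sorted by confirmed cases, descending."""
    matches = [r for r in rows if r["Country/Region"] == country]
    return sorted(matches, key=lambda r: int(r["Confirmed"]), reverse=True)[:n]


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for row in top_regions(rows, "US"):
    print(row["Province/State"], row["Confirmed"])
```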


In [8]:
most_current.analyze_state('New York')

Unnamed: 0,Province/State,Country/Region,Last Update,Confirmed,Deaths,Recovered,Latitude,Longitude,world
6,New York,US,2020-03-22T22:13:32,15793,117,0,42.1657,-74.9481,world


## Analyzing TimeSeries Data



In [9]:
# These links will not change: new dates are appended to the datasets, and previous time series sets are not archived.
# You will be working with the most up-to-date time series data as of your last pull from the repo.
confirmed = '../COVID-19/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv'
death = '../COVID-19/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Deaths.csv'
recovered = '../COVID-19/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Recovered.csv'

- To plot the dataset's information accurately, make sure the `timeline` variable ends on the correct date.

- The correct final date in the timeline is one day before the date of the dataset being used.

- For example: if you are using an updated time series dataset from '3/21/20', the list must end on '3/20/20'.
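If the timeline has fallen more than one day behind, the missing dates can be generated rather than typed by hand. The sketch below is an illustration only: `extend_timeline`, `parse_day`, and `format_day` are hypothetical helpers, and this is not the toolkit's `update_timeline`, whose implementation is not shown here. It assumes the 'M/D/YY' labels used in the time series files.

```python
from datetime import datetime, timedelta


def parse_day(label):
    """Parse an 'M/D/YY' label such as '3/20/20' into a date."""
    return datetime.strptime(label, "%m/%d/%y").date()


def format_day(day):
    """Format a date as 'M/D/YY' without zero padding on month or day."""
    return f"{day.month}/{day.day}/{day.year % 100:02d}"


def extend_timeline(timeline, dataset_date):
    """Return a copy of timeline with every missing day up to dataset_date."""
    extended = list(timeline)
    day = parse_day(extended[-1])
    last = parse_day(dataset_date)
    while day < last:
        day += timedelta(days=1)
        extended.append(format_day(day))
    return extended
```

For example, `extend_timeline(['3/19/20', '3/20/20'], '3/21/20')` appends a single entry, while a timeline that is several days behind gets every missing day filled in.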

In [10]:
# show the timeline variable (goes up to March 20th, 2020)
timeline

['1/22/20',
 '1/23/20',
 '1/24/20',
 '1/25/20',
 '1/26/20',
 '1/27/20',
 '1/28/20',
 '1/29/20',
 '1/30/20',
 '1/31/20',
 '2/1/20',
 '2/2/20',
 '2/3/20',
 '2/4/20',
 '2/5/20',
 '2/6/20',
 '2/7/20',
 '2/8/20',
 '2/9/20',
 '2/10/20',
 '2/11/20',
 '2/12/20',
 '2/13/20',
 '2/14/20',
 '2/15/20',
 '2/16/20',
 '2/17/20',
 '2/18/20',
 '2/19/20',
 '2/20/20',
 '2/21/20',
 '2/22/20',
 '2/23/20',
 '2/24/20',
 '2/25/20',
 '2/26/20',
 '2/27/20',
 '2/28/20',
 '2/29/20',
 '3/1/20',
 '3/2/20',
 '3/3/20',
 '3/4/20',
 '3/5/20',
 '3/6/20',
 '3/7/20',
 '3/8/20',
 '3/9/20',
 '3/10/20',
 '3/11/20',
 '3/12/20',
 '3/13/20',
 '3/14/20',
 '3/15/20',
 '3/16/20',
 '3/17/20',
 '3/18/20',
 '3/19/20',
 '3/20/20']

In [11]:
# update timeline with current dates

# make dataframe
dataset = pd.read_csv(confirmed)
# set the most updated date in the set to a variable
dataset_date = dataset.columns[-1]
# pass timeline and date variable through update function
timeline = update_timeline(timeline, dataset_date)

In [16]:
timeline

['1/22/20',
 '1/23/20',
 '1/24/20',
 '1/25/20',
 '1/26/20',
 '1/27/20',
 '1/28/20',
 '1/29/20',
 '1/30/20',
 '1/31/20',
 '2/1/20',
 '2/2/20',
 '2/3/20',
 '2/4/20',
 '2/5/20',
 '2/6/20',
 '2/7/20',
 '2/8/20',
 '2/9/20',
 '2/10/20',
 '2/11/20',
 '2/12/20',
 '2/13/20',
 '2/14/20',
 '2/15/20',
 '2/16/20',
 '2/17/20',
 '2/18/20',
 '2/19/20',
 '2/20/20',
 '2/21/20',
 '2/22/20',
 '2/23/20',
 '2/24/20',
 '2/25/20',
 '2/26/20',
 '2/27/20',
 '2/28/20',
 '2/29/20',
 '3/1/20',
 '3/2/20',
 '3/3/20',
 '3/4/20',
 '3/5/20',
 '3/6/20',
 '3/7/20',
 '3/8/20',
 '3/9/20',
 '3/10/20',
 '3/11/20',
 '3/12/20',
 '3/13/20',
 '3/14/20',
 '3/15/20',
 '3/16/20',
 '3/17/20',
 '3/18/20',
 '3/19/20',
 '3/20/20',
 '3/21/20']
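An alternative to maintaining the list by hand is to read the date labels straight from the CSV header. The sketch below assumes the time series files start with a few metadata columns (such as `Province/State`, `Country/Region`, `Lat`, `Long`) followed by one 'M/D/YY' column per day; `date_columns` is a hypothetical helper and the sample header is shortened for illustration.

```python
import csv
import io
from datetime import datetime


def date_columns(header):
    """Return the header entries that parse as 'M/D/YY' dates, in file order."""
    dates = []
    for name in header:
        try:
            datetime.strptime(name, "%m/%d/%y")
        except ValueError:
            continue  # metadata column such as 'Lat' or 'Long'
        dates.append(name)
    return dates


# Header shaped like the time series files (assumed layout, shortened).
sample = "Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20\n"
header = next(csv.reader(io.StringIO(sample)))
print(date_columns(header))
```

In the notebook, something like `date_columns(list(dataset.columns))` would give a timeline that always matches the file that was loaded.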

## Run Analysis

In [12]:
most_current_timeline = Covid_Timeline_data(confirmed,death,recovered,timeline)

In [13]:
# plotly plot
most_current_timeline.analyze_covid_spread()

In [14]:
# plotly plot
most_current_timeline.analyze_covid_timelines()


Tota confirmed: 304524
Total Deaths: 12973
Total Recovered: 91499
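As a quick sanity check on the totals printed above, the remaining unresolved cases and the simple shares of deaths and recoveries can be worked out directly. This is plain arithmetic on the three printed numbers, not a toolkit feature, and the ratios are naive shares of confirmed cases rather than epidemiological rates.

```python
# Totals copied from the output above.
confirmed_total = 304524
deaths_total = 12973
recovered_total = 91499

# Cases that are neither recovered nor fatal yet.
active_total = confirmed_total - deaths_total - recovered_total

# Simple shares of all confirmed cases.
death_share = deaths_total / confirmed_total
recovered_share = recovered_total / confirmed_total

print(f"Active (unresolved) cases: {active_total}")
print(f"Deaths / confirmed: {death_share:.2%}")
print(f"Recovered / confirmed: {recovered_share:.2%}")
```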

