# Extracting data from a CSV file (local and online)

The data comes from the [Johns Hopkins Whiting School of Engineering COVID-19 GitHub repo](https://github.com/CSSEGISandData/COVID-19).

Daily Reports are pushed to the repo with the most up-to-date numbers for:
- Confirmed Cases of COVID-19
- COVID-19 Deaths
- Number of Recovered Persons
- etc.

We will use the most recent COVID-19 Daily Reports CSV file available at the time of writing, `05-28-2020.csv`.

We will import the data with the pandas [read_csv](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html) function, which loads it into a pandas DataFrame.
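As a minimal sketch of what `read_csv` does, the snippet below parses a few lines in the same layout as the daily report, taken from the Alabama and Alaska rows of the output further down. The lines come from an in-memory string rather than a file. `io.StringIO` stands in for the file here, and `parse_dates` is an optional argument that converts `Last_Update` into datetime values. Neither is used elsewhere in this walkthrough.

```python
import io

import pandas as pd

# A tiny excerpt in the daily-report layout (first four columns only),
# held in memory so the example runs without the real file.
sample_csv = """Province_State,Country_Region,Last_Update,Confirmed
Alabama,US,2020-05-29 02:32:58,16530
Alaska,US,2020-05-29 02:32:58,424
"""

# read_csv accepts a file path, a URL, or any file-like object such as StringIO.
# parse_dates converts the Last_Update strings into datetime64 values.
df = pd.read_csv(io.StringIO(sample_csv), parse_dates=["Last_Update"])

print(df.dtypes)
```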

## Example #1: CSV file saved in your project folder

```python
# Import the dependencies for this project
import pandas as pd
import datetime
```

```python
# Read the CSV file from our data folder into a DataFrame
covid_daily_us_local = pd.read_csv('data/05-28-2020.csv')
```

```python
# Return the first five rows of data
covid_daily_us_local.head()
```

| | Province_State | Country_Region | Last_Update | Lat | Long_ | Confirmed | Deaths | Recovered | Active | FIPS | Incident_Rate | People_Tested | People_Hospitalized | Mortality_Rate | UID | ISO3 | Testing_Rate | Hospitalization_Rate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Alabama | US | 2020-05-29 02:32:58 | 32.3182 | -86.9023 | 16530 | 591 | 9355.0 | 6584.0 | 1 | 337.127806 | 200481.0 | 1765.0 | 3.575318 | 84000001 | USA | 4088.791265 | 10.677556 |
| 1 | Alaska | US | 2020-05-29 02:32:58 | 61.3707 | -152.4044 | 424 | 10 | 366.0 | 48.0 | 2 | 57.959524 | 47970.0 | NaN | 2.358491 | 84000002 | USA | 6557.35464 | NaN |
| 2 | American Samoa | US | 2020-05-29 02:32:58 | -14.271 | -170.132 | 0 | 0 | NaN | 0.0 | 60 | 0.0 | 174.0 | NaN | NaN | 16 | ASM | 312.719038 | NaN |
| 3 | Arizona | US | 2020-05-29 02:32:58 | 33.7298 | -111.4312 | 17877 | 860 | 4452.0 | 12565.0 | 4 | 245.606472 | 202914.0 | 2848.0 | 4.810651 | 84000004 | USA | 2787.771526 | 15.931085 |
| 4 | Arkansas | US | 2020-05-29 02:32:58 | 34.9697 | -92.3731 | 6538 | 125 | 4583.0 | 1830.0 | 5 | 216.647602 | 118902.0 | 640.0 | 1.9119 | 84000005 | USA | 3940.017311 | 9.788926 |
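In the output above, `Recovered` shows values such as 9355.0 even though counts are whole numbers, and American Samoa has no `Recovered` value. The sketch below uses a two-row excerpt of that output to show the mechanism. `read_csv` reads an empty cell as `NaN`, and because `NaN` is a floating-point value, any numeric column that contains one is stored as `float64`.

```python
import io

import pandas as pd

# Two rows from the output above (subset of columns); American Samoa has
# an empty Recovered cell.
sample_csv = """Province_State,Confirmed,Recovered
Alabama,16530,9355
American Samoa,0,
"""

df = pd.read_csv(io.StringIO(sample_csv))

# The blank cell becomes NaN, which forces the whole column to float64.
print(df["Recovered"].dtype)         # float64
print(df["Recovered"].isna().sum())  # 1
# Confirmed has no gaps, so it stays an integer column.
print(df["Confirmed"].dtype)         # int64
```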


## Example #2: CSV file from the web
The same file we read into pandas from our local machine can also be read directly from the web by passing its URL to `read_csv`. For a file hosted on GitHub, we need the raw URL. The regular file URL opens an HTML page, while the raw URL serves the CSV contents themselves.

### Getting the raw link for CSV data
1. Go to the [csse_covid_19_daily_reports_us folder](https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_daily_reports_us)
2. Click on the CSV file you'd like to get data from (we will use `05-28-2020.csv`).
3. Once the page loads, you will see a Raw button. Click it to open a page containing the raw CSV data.
4. Copy the URL to use in the `read_csv` function.
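The steps above can also be done in code. On GitHub, a file page has the form `https://github.com/<owner>/<repo>/blob/<branch>/<path>`, and its raw version has the form `https://raw.githubusercontent.com/<owner>/<repo>/<branch>/<path>`. The sketch below rewrites the first form into the second. `to_raw_url` is a hypothetical helper, not part of any library. It assumes the branch name contains no `/`.

```python
from urllib.parse import urlparse


def to_raw_url(blob_url: str) -> str:
    """Hypothetical helper: turn a github.com file (blob) URL into its raw URL.

    Assumes the layout https://github.com/<owner>/<repo>/blob/<branch>/<path>
    and a branch name without slashes.
    """
    parsed = urlparse(blob_url)
    if parsed.netloc != "github.com":
        raise ValueError(f"not a github.com URL: {blob_url}")
    parts = parsed.path.strip("/").split("/")
    # Expected: [owner, repo, "blob", branch, *path_segments]
    if len(parts) < 5 or parts[2] != "blob":
        raise ValueError(f"not a GitHub file (blob) URL: {blob_url}")
    owner, repo, _, branch, *path = parts
    return "https://raw.githubusercontent.com/" + "/".join([owner, repo, branch, *path])


page = ("https://github.com/CSSEGISandData/COVID-19/blob/master/"
        "csse_covid_19_data/csse_covid_19_daily_reports_us/05-28-2020.csv")
print(to_raw_url(page))
```

This prints the same raw URL that is saved in `url` in the next cell.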

```python
# Let's save the URL in a variable called url and pass it as our parameter
url = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports_us/05-28-2020.csv'

# Read the CSV from the URL into a DataFrame
covid_daily_us_url = pd.read_csv(url)
```

```python
# Return the first five rows of data
covid_daily_us_url.head()
```

| | Province_State | Country_Region | Last_Update | Lat | Long_ | Confirmed | Deaths | Recovered | Active | FIPS | Incident_Rate | People_Tested | People_Hospitalized | Mortality_Rate | UID | ISO3 | Testing_Rate | Hospitalization_Rate |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Alabama | US | 2020-05-29 02:32:58 | 32.3182 | -86.9023 | 16530 | 591 | 9355.0 | 6584.0 | 1 | 337.127806 | 200481.0 | 1765.0 | 3.575318 | 84000001 | USA | 4088.791265 | 10.677556 |
| 1 | Alaska | US | 2020-05-29 02:32:58 | 61.3707 | -152.4044 | 424 | 10 | 366.0 | 48.0 | 2 | 57.959524 | 47970.0 | NaN | 2.358491 | 84000002 | USA | 6557.35464 | NaN |
| 2 | American Samoa | US | 2020-05-29 02:32:58 | -14.271 | -170.132 | 0 | 0 | NaN | 0.0 | 60 | 0.0 | 174.0 | NaN | NaN | 16 | ASM | 312.719038 | NaN |
| 3 | Arizona | US | 2020-05-29 02:32:58 | 33.7298 | -111.4312 | 17877 | 860 | 4452.0 | 12565.0 | 4 | 245.606472 | 202914.0 | 2848.0 | 4.810651 | 84000004 | USA | 2787.771526 | 15.931085 |
| 4 | Arkansas | US | 2020-05-29 02:32:58 | 34.9697 | -92.3731 | 6538 | 125 | 4583.0 | 1830.0 | 5 | 216.647602 | 118902.0 | 640.0 | 1.9119 | 84000005 | USA | 3940.017311 | 9.788926 |
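Example #1 and Example #2 load the same file, so one way to combine them is to read the local copy when it exists and download it otherwise. The sketch below takes that approach. `load_daily_report` and `BASE_URL` are hypothetical names, and the default `data` folder matches the path used in Example #1. It writes the cache with `index=False`. Otherwise the row index would be saved as an extra column and come back as `Unnamed: 0` when the file is read again.

```python
from pathlib import Path

import pandas as pd

# Folder on GitHub holding the raw daily report CSVs (same as the URL above)
BASE_URL = ("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/"
            "csse_covid_19_data/csse_covid_19_daily_reports_us/")


def load_daily_report(date_str: str, data_dir: str = "data") -> pd.DataFrame:
    """Hypothetical helper: return the report for date_str (MM-DD-YYYY).

    Reads <data_dir>/<date_str>.csv if it exists; otherwise downloads the
    file from GitHub and saves a copy there for next time.
    """
    local_path = Path(data_dir) / f"{date_str}.csv"
    if local_path.exists():
        # Example #1: the file is already in the project folder
        return pd.read_csv(local_path)
    # Example #2: read the raw CSV straight from GitHub
    df = pd.read_csv(BASE_URL + f"{date_str}.csv")
    # Cache locally without the row index so re-reads have the same columns
    local_path.parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(local_path, index=False)
    return df


# Usage: covid = load_daily_report("05-28-2020")
```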


One benefit of using data that is accessible online *and* named by date is that you can request the most up-to-date version by building the file name from the current date. Python's built-in [`datetime` module](https://docs.python.org/3/library/datetime.html) gives us today's date, `strftime` formats it to match the MM-DD-YYYY file names, and the [string `format` method](https://matthew-brett.github.io/teaching/string_formatting.html) inserts it into the URL (in this case, the date on which you are pulling the data).

```python
# Store today's date in a variable, formatted to match the CSV file names (MM-DD-YYYY)
todays_date = datetime.date.today().strftime('%m-%d-%Y')
print(todays_date)
```

```
05-29-2020
```


```python
# Use the same URL body, but replace the MM-DD-YYYY part with today's date
todays_url = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports_us/{}.csv'.format(todays_date)
```

```python
print(todays_url)
```

```
https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports_us/05-29-2020.csv
```
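Using today's date has a catch. A daily report appears only after the day it covers: the 05-28-2020 file above has a `Last_Update` of 2020-05-29 02:32:58. So the URL for today's date may not exist yet, and `read_csv` will fail.

The sketch below shows one way to handle this. It tries the report for a start date and then steps back one day at a time until a file loads. It assumes that a missing file surfaces as `urllib.error.HTTPError` with status 404, since pandas fetches http(s) URLs with Python's `urllib`. `read_latest_report` and its `reader` parameter are hypothetical names. `reader` only exists so the function can be tried without network access.

```python
import datetime
import urllib.error

import pandas as pd

URL_TEMPLATE = ("https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/"
                "csse_covid_19_data/csse_covid_19_daily_reports_us/{}.csv")


def read_latest_report(start=None, max_days_back=7, reader=pd.read_csv):
    """Hypothetical helper: try the report for `start` (default: today), then
    each earlier day, and return (date_string, DataFrame) for the first file
    that loads. Gives up after `max_days_back` earlier days."""
    day = datetime.date.today() if start is None else start
    for _ in range(max_days_back + 1):
        date_str = day.strftime("%m-%d-%Y")
        try:
            return date_str, reader(URL_TEMPLATE.format(date_str))
        except urllib.error.HTTPError as err:
            # Assumption: 404 means "not published yet"; re-raise anything else
            if err.code != 404:
                raise
        day -= datetime.timedelta(days=1)
    raise FileNotFoundError(f"no daily report found in the last {max_days_back} days")


# Usage: report_date, covid_latest = read_latest_report()
```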
