# Basic Usage Demo
This notebook goes over how to load data using the `covid19pandas` package. We will cover the many loading options available.

First, import the package.

In [1]:
import covid19pandas as cod

## Data source options
Our package provides access to data from both [Johns Hopkins University](https://github.com/CSSEGISandData/COVID-19) and [The New York Times](https://github.com/nytimes/covid-19-data). Here are some details about both datasets:
#### Johns Hopkins
- Function for access: `get_data_jhu`
- Provides both global and US data
- US data is broken down into states and counties
- For global data, provides confirmed cases, deaths, and recovered counts
- For US data, provides only confirmed cases and deaths counts

#### The New York Times
- Function for access: `get_data_nyt`
- Provides only US data
- Has both state and county-level data
- Provides confirmed cases and deaths counts

We provide both data sources for comparison, as they aggregate data from different sources.

Note: Johns Hopkins [has stated](https://github.com/CSSEGISandData/COVID-19/issues/1250#issuecomment-606354840) that the reason they do not provide recovered counts for their US-specific data is that they cannot find a reliable source for recovered counts at the county level.

In [2]:
cod.get_data_jhu()

These data were obtained from Johns Hopkins University (https://github.com/CSSEGISandData/COVID-19).


Unnamed: 0,date,Province/State,Country/Region,Lat,Long,cases,deaths,recovered
0,2020-01-22,Anhui,China,31.825700,117.226400,1.0,0.0,0.0
1,2020-01-22,Beijing,China,40.182400,116.414200,14.0,0.0,0.0
2,2020-01-22,Chongqing,China,30.057200,107.874000,6.0,0.0,0.0
3,2020-01-22,Fujian,China,26.078900,117.987400,1.0,0.0,0.0
4,2020-01-22,Guangdong,China,23.341700,113.424400,26.0,0.0,0.0
5,2020-01-22,Guangxi,China,23.829800,108.788100,2.0,0.0,0.0
6,2020-01-22,Guizhou,China,26.815400,106.874800,1.0,0.0,0.0
7,2020-01-22,Hainan,China,19.195900,109.745300,4.0,0.0,0.0
8,2020-01-22,Hebei,China,39.549000,116.130600,1.0,0.0,0.0
9,2020-01-22,Henan,China,33.882000,113.614000,5.0,0.0,0.0


In [3]:
cod.get_data_nyt()

These data were obtained from The New York Times (https://github.com/nytimes/covid-19-data).


Unnamed: 0,date,state,fips,cases,deaths
0,2020-01-21,Washington,53,1,0
1,2020-01-22,Washington,53,1,0
2,2020-01-23,Washington,53,1,0
3,2020-01-24,Illinois,17,1,0
4,2020-01-24,Washington,53,1,0
5,2020-01-25,California,6,1,0
6,2020-01-25,Illinois,17,1,0
7,2020-01-25,Washington,53,1,0
8,2020-01-26,Arizona,4,1,0
9,2020-01-26,California,6,2,0


## Data type options
By default, our package returns counts of cases, deaths, and recovered (the last only if it's available). We call these "data types". If you want a table with just one data type, pass that data type to the `data_type` parameter.

For example, to get just the counts of confirmed cases:

In [4]:
cod.get_data_jhu(data_type="cases")

These data were obtained from Johns Hopkins University (https://github.com/CSSEGISandData/COVID-19).


Unnamed: 0,date,Province/State,Country/Region,Lat,Long,cases
0,2020-01-22,Anhui,China,31.825700,117.226400,1
1,2020-01-22,Beijing,China,40.182400,116.414200,14
2,2020-01-22,Chongqing,China,30.057200,107.874000,6
3,2020-01-22,Fujian,China,26.078900,117.987400,1
4,2020-01-22,Guangdong,China,23.341700,113.424400,26
5,2020-01-22,Guangxi,China,23.829800,108.788100,2
6,2020-01-22,Guizhou,China,26.815400,106.874800,1
7,2020-01-22,Hainan,China,19.195900,109.745300,4
8,2020-01-22,Hebei,China,39.549000,116.130600,1
9,2020-01-22,Henan,China,33.882000,113.614000,5


Pass "deaths" to get just counts of deaths, and "recovered" to get just counts of recovered cases. If you ask for an unavailable data type (e.g. passing "recovered" to the `get_data_nyt` function), the package will notify you and raise an exception.
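If you would rather not have an unavailable data type stop your notebook, you can wrap the call. The sketch below is only an illustration: `fetch_or_none` is a hypothetical helper (not part of `covid19pandas`), and because the exact exception class is not stated here, it catches `Exception` broadly.

```python
def fetch_or_none(getter, data_type):
    """Call a getter such as cod.get_data_nyt; return None if it raises.

    `fetch_or_none` is a hypothetical helper, not part of covid19pandas.
    """
    try:
        return getter(data_type=data_type)
    except Exception as err:
        # Assumption: the exception class is not documented in this notebook,
        # so we catch broadly and report the message instead of re-raising.
        print(f"Could not load {data_type!r}: {err}")
        return None


# With the package imported as `cod`, usage might look like:
#     recovered = fetch_or_none(cod.get_data_nyt, "recovered")
#     if recovered is None:
#         ...  # fall back to another source or skip the analysis
```

Catching broadly is convenient for exploration, but it can also hide unrelated errors such as network failures, so narrow the `except` clause once you know which exception the package raises.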

## Dataframe format options
Our package provides the option of returning data tables in either the "wide" or "long" format. A "wide" format table has a separate column for every variable, whereas a "long" format table stacks all variables into a single column of values, with another column identifying which variable each row holds. (See [this explanation](https://en.wikipedia.org/wiki/Wide_and_narrow_data) from Wikipedia for a more in-depth discussion of the differences.)

Many plotting tools prefer data in the "long" format. However, the "wide" format can also be useful, and sometimes makes data easier to think about. So, we provide both options. The actual data is the same either way; we just convert between the different formats for you.
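To make the difference concrete, here is a small plain-Python sketch that pivots a few long-format rows into one common wide layout, with one row per state and one column per date. The rows are copied from the New York Times output shown earlier. The layout and the `long_to_wide` helper are assumptions for illustration; the tables the package returns may be arranged differently.

```python
# Long format: one row per (date, state) observation.
# Values are taken from the NYT output shown earlier in this notebook.
long_rows = [
    ("2020-01-24", "Illinois", 1),
    ("2020-01-24", "Washington", 1),
    ("2020-01-25", "California", 1),
    ("2020-01-25", "Illinois", 1),
    ("2020-01-25", "Washington", 1),
    ("2020-01-26", "Arizona", 1),
    ("2020-01-26", "California", 2),
]


def long_to_wide(rows):
    """Pivot (date, state, value) rows into {state: {date: value}}.

    Hypothetical helper for illustration; missing combinations stay None.
    """
    dates = sorted({date for date, _, _ in rows})
    wide = {}
    for date, state, value in rows:
        # Give each new state one slot per date, initially empty.
        wide.setdefault(state, dict.fromkeys(dates))
        wide[state][date] = value
    return dates, wide


dates, wide = long_to_wide(long_rows)

# Wide format: one row per state, one column per date.
print("state", *dates)
for state in sorted(wide):
    print(state, *(wide[state][d] for d in dates))
```

Both structures hold the same counts; only the arrangement changes. A state with no row for a given date in this excerpt simply has an empty cell in the wide table.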

Our package defaults to returning tables in the "long" format. To get a table in the "wide" format instead, pass `"wide"` to the `format` parameter of the getter function.

Note: With a "wide" format table, you can only have one data type (cases, deaths, or recovered) in the table. So, you must pass an argument other than the default `"all"` to the `data_type` parameter. If you want multiple data types in the "wide" format, you need a separate table for each data type.
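One way to collect a separate "wide" table for each data type is to loop over the data types and store the results in a dictionary. This is a minimal sketch: `get_wide_tables` is a hypothetical helper, not part of the package, and it relies only on the `format` and `data_type` parameters described above.

```python
def get_wide_tables(getter, data_types):
    """Return {data_type: table}, calling the getter once per data type.

    Hypothetical helper; `getter` is a function such as cod.get_data_jhu.
    """
    return {dt: getter(format="wide", data_type=dt) for dt in data_types}


# With the package imported as `cod`, usage might look like:
#     tables = get_wide_tables(cod.get_data_jhu, ["cases", "deaths", "recovered"])
#     tables["cases"]   # wide table of confirmed cases
#     tables["deaths"]  # wide table of deaths
```

Keying the tables by data type keeps them easy to look up later. Only request data types that the chosen source actually provides.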