# Quick-Start

This is a quick guide to installing and getting started with Oscovida. For more
in-depth information, see the full guide.

## Installation

Oscovida releases are available on PyPI, so it can be installed with pip as usual:

```sh
python3 -m pip install oscovida
```

## Selecting a Region

The main way to access data in Oscovida is the `Region` object. Internally,
Oscovida uses [COVID-19 Data Hub](https://covid19datahub.io/index.html) as its
data source, so we follow its hierarchical approach to regions:

| Column                        | Description                                                           |
|-------------------------------|-----------------------------------------------------------------------|
| `administrative_area_level_1` | Top-level administrative area, usually countries.                     |
| `administrative_area_level_2` | Lower-level administrative area, usually states, regions or cantons.  |
| `administrative_area_level_3` | Lowest-level administrative area, usually cities or municipalities.   |

`Region` has the following parameters:

```
Parameters:
    country: str
        Country name string (e.g. 'United States') or alpha_3 string
        (e.g. 'USA'); the top-level administrative area
    admin_2: Optional[str] = None
        Second-level administrative area, usually states, regions or cantons
        (e.g. 'California')
    admin_3: Optional[str] = None
        Third-level administrative area, usually cities or municipalities
        (e.g. 'San Francisco')
    level: Optional[int] = None
        Level is automatically detected from the number of administrative
        levels passed. Optionally you can specify the level to return a
        dataframe containing the information for all administrative regions
        up to and including the specified level
        (e.g. `Region('USA', level=2)` returns all USA states)
```
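
The automatic level detection described above can be pictured with the minimal sketch below. `detect_level` is a hypothetical helper written for illustration only, not part of Oscovida's API. It assumes the level is the number of administrative arguments passed (a `'*'` wildcard counts as passed), that an explicit `level` may ask for more detail but not less, and that `admin_3` requires `admin_2`.

```python
from typing import Optional


def detect_level(
    country: str,
    admin_2: Optional[str] = None,
    admin_3: Optional[str] = None,
    level: Optional[int] = None,
) -> int:
    """Hypothetical helper: infer the administrative level of a query.

    Assumption: the level is the number of administrative arguments passed
    (a '*' wildcard counts), unless an explicit ``level`` asks for more detail.
    """
    if admin_3 is not None and admin_2 is None:
        raise ValueError("admin_3 requires admin_2")
    # Booleans add as 0/1, so this counts the arguments that were given
    passed = 1 + (admin_2 is not None) + (admin_3 is not None)
    if level is None:
        return passed
    if level < passed:
        # Assumption: asking for less detail than you specified is rejected
        raise ValueError(f"level={level} is coarser than the {passed} levels passed")
    return level


print(detect_level("USA"))                         # 1: the whole country
print(detect_level("USA", "California"))           # 2: a single state
print(detect_level("USA", "*"))                    # 2: every state
print(detect_level("USA", "California", level=3))  # 3: all regions in California
```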

You can give `Region` as much or as little region detail as you like. Broadly
speaking, there are two ways to call it. The first is with **fully specified
levels**:

```python
from oscovida.regions import Region

america = Region('USA', end='2020-10-01', vintage=True)  # Entire country; vintage data ending on 1st October for docs consistency
california = Region('USA', 'California', end='2020-10-01', vintage=True)  # Whole state (sum)
sf = Region('USA', 'California', 'San Francisco')  # Single city
```

Each of these returns a `Region` object holding data for the most specific level
given, so every row belongs to that one region.

The other way to call `Region` is with a **wildcard**, or by requesting a
**detail level greater than the one you have specified**:

```python
us_states = Region('USA', '*', end='2020-10-01', vintage=True)  # Every state in the USA
us_states = Region('USA', level=2, end='2020-10-01', vintage=True)  # Identical to the above
california_regions = Region('USA', 'California', '*')  # All regions in California
california_regions = Region('USA', 'California', level=3)  # Identical to the above
```

In this case, the rows of the `Region` object's data belong to several different
regions.

!!! summary
    Calling `Region` with **specific arguments** returns data for the region you
    specified; calling it with a **wildcard**, or with a `level` deeper than the
    arguments you passed, returns data for **multiple regions**.
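
The difference between a specific argument and a wildcard can be sketched as a row filter on a long-format table. This is a minimal sketch under stated assumptions: `select_regions` is a hypothetical helper, not how Oscovida implements `Region`, and the DataFrame holds made-up toy numbers using the COVID-19 Data Hub column names from the table above.

```python
import pandas as pd

# Toy long-format table with COVID-19 Data Hub column names (made-up numbers)
df = pd.DataFrame({
    "administrative_area_level_1": ["United States"] * 4,
    "administrative_area_level_2": ["California", "California", "Texas", "Texas"],
    "date": pd.to_datetime(["2020-09-29", "2020-09-30"] * 2),
    "confirmed": [10, 12, 20, 25],
})


def select_regions(df: pd.DataFrame, **filters: str) -> pd.DataFrame:
    """Hypothetical helper: keep rows matching every filter; '*' matches anything."""
    mask = pd.Series(True, index=df.index)
    for column, value in filters.items():
        if value != "*":
            mask &= df[column] == value
    return df[mask]


one_state = select_regions(df, administrative_area_level_2="California")
all_states = select_regions(df, administrative_area_level_2="*")

print(one_state["administrative_area_level_2"].unique().tolist())  # ['California']
print(all_states["administrative_area_level_2"].nunique())          # 2
```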

## Accessing the Data

These region objects contain the following attributes:

```
Attributes:
    data: DataFrame
        Pandas dataframe containing the data for the specified region
    cite: list[str]
        Returns a list of sources for the data
    country: str
        Country name string (e.g. 'United States')
    admin_1: str
        Country alpha_3 string, administrative area of top level (e.g. 'USA')
    admin_2: Optional[str]
        Second-level administrative area, usually states, regions or cantons
    admin_3: Optional[str]
        Third-level administrative area, usually cities or municipalities
    level: int
        Level of administrative areas specified
```

Most of these are metadata; the key attribute is `data`, which contains a
**cumulative daily pandas DataFrame**:

```python
import pandas as pd
pd.options.display.max_columns = 5  # Don't display too many columns for the docs
```

```python
america.data
```

| date       | id  | tests     | ... | key_apple_mobility | key_google_mobility |
|------------|-----|-----------|-----|--------------------|---------------------|
| 2020-01-01 | USA | 0         | ... | United States      | US                  |
| 2020-01-02 | USA | 0         | ... | United States      | US                  |
| 2020-01-03 | USA | 0         | ... | United States      | US                  |
| 2020-01-04 | USA | 0         | ... | United States      | US                  |
| 2020-01-05 | USA | 0         | ... | United States      | US                  |
| ...        | ... | ...       | ... | ...                | ...                 |
| 2020-09-27 | USA | 113629560 | ... | United States      | US                  |
| 2020-09-28 | USA | 114169222 | ... | United States      | US                  |
| 2020-09-29 | USA | 114474881 | ... | United States      | US                  |
| 2020-09-30 | USA | 114474881 | ... | United States      | US                  |
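
Because the counts are cumulative, each row holds the running total up to that date. The minimal sketch below shows how daily new counts relate to the cumulative columns using plain pandas `diff`; the numbers are made up, and this is not necessarily how Oscovida's `statistics.daily` is implemented.

```python
import pandas as pd

# Made-up cumulative counts indexed by date, shaped like `Region.data`
cumulative = pd.DataFrame(
    {"confirmed": [100, 130, 180, 180], "deaths": [1, 2, 4, 5]},
    index=pd.date_range("2020-09-27", periods=4, name="date"),
)

# New counts per day: today's running total minus yesterday's.
# The first row has no previous day, so it becomes NaN and is dropped.
new_per_day = cumulative.diff().dropna()

# Summing the daily values back up (plus the first total) recovers the totals
rebuilt = new_per_day.cumsum() + cumulative.iloc[0]
print(new_per_day)
```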


If we look at one of the 'broader' regions defined above:

```python
us_states.data
```

| date       | id       | tests  | ... | key_alpha_2 | key_numeric |
|------------|----------|--------|-----|-------------|-------------|
| 2020-01-12 | 121cd66e | 0      | ... | MN          |             |
| 2020-01-13 | 121cd66e | 0      | ... | MN          |             |
| 2020-01-14 | 121cd66e | 0      | ... | MN          |             |
| 2020-01-15 | 121cd66e | 0      | ... | MN          |             |
| 2020-01-16 | 121cd66e | 0      | ... | MN          |             |
| ...        | ...      | ...    | ... | ...         | ...         |
| 2020-09-27 | fb98cb76 | 882842 | ... | CO          |             |
| 2020-09-28 | fb98cb76 | 893381 | ... | CO          |             |
| 2020-09-29 | fb98cb76 | 899874 | ... | CO          |             |
| 2020-09-30 | fb98cb76 | 907152 | ... | CO          |             |


```python
us_states.data.administrative_area_level_2.unique()
```

```
array(['Minnesota', 'California', 'Florida', 'Wyoming', 'Virgin Islands',
       'South Dakota', 'Kansas', 'Nevada', 'Virginia', 'Washington',
       'Oregon', 'Wisconsin', 'New Jersey', 'Rhode Island', 'Vermont',
       'North Carolina', 'Oklahoma', 'Alabama', 'Delaware', 'Guam',
       'Missouri', 'Utah', 'Mississippi', 'Connecticut', 'Indiana',
       'Georgia', 'Texas', 'Pennsylvania', 'Massachusetts', 'Maine',
       'Tennessee', 'Michigan', 'Idaho', 'Illinois', 'Louisiana',
       'New Mexico', 'Arizona', 'Arkansas', 'Nebraska', 'West Virginia',
       'South Carolina', 'New York', 'District of Columbia', 'Kentucky',
       'Ohio', 'Alaska', 'New Hampshire', 'North Dakota',
       'American Samoa', 'Iowa', 'Northern Mariana Islands', 'Montana',
       'Hawaii', 'Maryland', 'Puerto Rico', 'Colorado'], dtype=object)
```

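This `Region`, created with `admin_2='*'` (or equivalently `level=2`), contains
every state, along with the District of Columbia and US territories such as Guam
and Puerto Rico.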

!!! summary
    The `data` attribute of a `Region` contains a **cumulative daily pandas DataFrame**
    with data for either a single area or, if you passed a wildcard to the
    `Region` object, multiple areas.
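
With multi-region data, each state contributes its own run of dates, so most analysis starts by grouping on the region column. The minimal sketch below uses a toy stand-in for `us_states.data` (made-up numbers, column names as in the output above) and assumes rows are sorted by date within each state; it uses only standard pandas calls, not Oscovida functions.

```python
import pandas as pd

# Toy stand-in for `us_states.data`: one row per (state, date), made-up numbers
data = pd.DataFrame(
    {
        "administrative_area_level_2": ["Minnesota"] * 3 + ["Colorado"] * 3,
        "confirmed": [5, 9, 14, 2, 4, 7],
    },
    index=pd.DatetimeIndex(["2020-09-28", "2020-09-29", "2020-09-30"] * 2, name="date"),
)

# Latest cumulative total per state (assumes rows are sorted by date)
latest = data.groupby("administrative_area_level_2")["confirmed"].last()

# Wide layout: one row per date, one column per state
wide = data.reset_index().pivot(
    index="date", columns="administrative_area_level_2", values="confirmed"
)
print(latest.sort_values(ascending=False))
```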

## Doing Some Stats

Now that we have a pandas DataFrame containing the data we want, we can start
analysing it. Oscovida has a number of built-in statistical functions to make
getting started easier and to show how these functions should be applied.

The built-in functions currently are:

```python
import oscovida.statistics as statistics
statistics?
```

```
Module containing some generic statistics functions for use with Oscovida, we
currently provide the following functions:

- `daily`: Computes the daily change for the series
- `smooth`: Smooths the pandas series with a rolling average and mean
- `doubling_time`: Compute the doubling time for a given series by shifting the
   rows by one
- `r_number`: Calculate the R-number using a method similar to RKI
- `growth_factor`: Computes the growth factor for a series
- `min_max`: Given a time series, find the min and max values in the time series
   within the last n days
```
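
To give a feel for what some of these functions compute, here is a minimal plain-pandas sketch of three of them. These are hypothetical re-implementations, not Oscovida's code: the formulas, the 4-day generation time used for the RKI-style R number, and which kind of input (daily or cumulative) each expects are all assumptions, so check each function's docstring for the real behaviour.

```python
import numpy as np
import pandas as pd


def growth_factor(daily: pd.Series) -> pd.Series:
    """Assumed definition: today's new cases divided by yesterday's (daily data)."""
    return daily / daily.shift(1)


def doubling_time(cumulative: pd.Series) -> pd.Series:
    """Days for the total to double at today's growth rate (cumulative data).

    Shifts the rows by one to get the day-on-day ratio q, then solves q**t == 2.
    """
    ratio = cumulative / cumulative.shift(1)
    return np.log(2) / np.log(ratio)


def r_number(daily: pd.Series, generation_time: int = 4) -> pd.Series:
    """Assumed RKI-style estimate: sum of the last 4 days over the 4 days before."""
    window = daily.rolling(generation_time).sum()
    return window / window.shift(generation_time)


# Made-up series in which new cases double every day
new_cases = pd.Series([1.0, 2, 4, 8, 16, 32, 64, 128])
totals = new_cases.cumsum()

print(growth_factor(new_cases).iloc[-1])  # 2.0
print(r_number(new_cases).iloc[-1])       # 16.0
print(doubling_time(totals).iloc[-1])     # a little under one day
```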


These functions are meant to be used with the pandas `.pipe` method. For
example, to find the smoothed daily numbers for the whole of the USA, we could
do something like:

```python
daily = (america.data[['confirmed', 'deaths']]
    .pipe(statistics.daily)
    .pipe(statistics.smooth)
)

daily
```

| date       | confirmed    | deaths     |
|------------|--------------|------------|
| 2020-01-02 | 0.000000     | 0.000000   |
| 2020-01-03 | 0.000000     | 0.000000   |
| 2020-01-04 | 0.000000     | 0.000000   |
| 2020-01-05 | 0.000000     | 0.000000   |
| 2020-01-06 | 0.000000     | 0.000000   |
| ...        | ...          | ...        |
| 2020-09-27 | 41457.393540 | 718.519058 |
| 2020-09-28 | 41294.623870 | 695.578125 |
| 2020-09-29 | 40965.549187 | 698.097428 |
| 2020-09-30 | 40494.882848 | 709.060568 |


Using the `.pipe` method makes it easy to chain multiple statistical functions
together.
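
`DataFrame.pipe(func, *args, **kwargs)` simply calls `func(df, *args, **kwargs)`, so any function that takes and returns a DataFrame can join the chain, and extra arguments are forwarded to it. A minimal sketch with two hypothetical helpers: `daily_change` and `rolling_mean` are illustrative stand-ins, not Oscovida's `daily` and `smooth`.

```python
import pandas as pd


def daily_change(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for a `daily`-like step: day-on-day difference."""
    return df.diff().fillna(0)


def rolling_mean(df: pd.DataFrame, window: int = 7) -> pd.DataFrame:
    """Hypothetical stand-in for a `smooth`-like step: trailing rolling mean."""
    return df.rolling(window, min_periods=1).mean()


cumulative = pd.DataFrame({"confirmed": [0, 10, 30, 60, 100]})

# Keyword arguments after the function are forwarded by .pipe
chained = cumulative.pipe(daily_change).pipe(rolling_mean, window=2)

# ...which is exactly the same as nesting the calls by hand
nested = rolling_mean(daily_change(cumulative), window=2)
print(chained.equals(nested))  # True
```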

!!! warning
    Depending on the operation, some statistics functions expect daily data and
    others expect cumulative data. Check the docstring (e.g.
    `oscovida.statistics.r_number?`) to see which kind of data a function
    expects; otherwise you may get incorrect results.
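
One cheap safeguard is to look at the shape of a series before applying a function: cumulative counts should never decrease, whereas daily counts usually go up and down. The sketch below is a hypothetical heuristic, not part of Oscovida. It can be fooled in both directions (a source may revise totals downward, and daily counts during steady growth can also be non-decreasing), so treat it as a sanity check rather than a guarantee.

```python
import pandas as pd


def looks_cumulative(series: pd.Series) -> bool:
    """Hypothetical heuristic: a cumulative series never decreases (NaNs ignored)."""
    return bool(series.dropna().is_monotonic_increasing)


totals = pd.Series([100, 130, 180, 180, 220])  # made-up running totals
new_cases = totals.diff().dropna()             # 30, 50, 0, 40

print(looks_cumulative(totals))     # True
print(looks_cumulative(new_cases))  # False
```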

## Making Some Plots

Along with the statistics functions, we also provide plotting functions that
work directly on a `Region` object, for example:

```python
import oscovida.plots as plots
plots.set_backend('plotly')  # For nice interactive plots
```

```python
plots.plot_daily(america)
```

Most statistics functions have an equivalent plotting function, so the following
plots can be created:

```python
plots?
```

```
Module containing some generic plotting functions for use with Oscovida, the
plots largely parallel the statistics functions, the following plots can be
created:

- `plot_totals`: Plots the total numbers for an oscovida `Region`, by default
  plots only the `confirmed` and `deaths` columns.
- `plot_daily`: Plots the daily numbers for an oscovida `Region`, by default
  plots only the `confirmed` and `deaths` columns.
- `plot_r_number`: Plots the daily r number for an oscovida `Region`, by default
  plots only the `confirmed` and `deaths` columns.
- `plot_growth_factor`: Plots the daily growth factor for an oscovida
  `Region`, by default plots only the `confirmed` and `deaths` columns.
- `plot_doubling_time
```

## Next Steps?

This quick-start guide skips over some of the specifics but covers all the major
points. Every function is documented, so check the docstrings if you need more
information, and feel free to contact us if you have any questions.
