vignettes/python.Rmd

---
title: "Python"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Python}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

[![](https://img.shields.io/pypi/v/covid19dh.svg?color=brightgreen)](https://pypi.org/pypi/covid19dh/) [![](https://img.shields.io/pypi/dm/covid19dh.svg?color=blue)](https://pypi.org/pypi/covid19dh/) [![](https://img.shields.io/github/stars/covid19datahub/python?style=social)](https://github.com/covid19datahub/python)

## Setup and usage

Install from [pip](https://pypi.org/project/covid19dh/) with

```py
pip install covid19dh
```

Importing the main function `covid19()`   

```py
from covid19dh import covid19
x, src = covid19() 
```

Package is regularly updated. Update with

```bash
pip install --upgrade covid19dh
```

## Return values

The function `covid19()` returns 2 pandas dataframes:
* the data and
* references to the data sources.

## Parametrization

### Country

List of country names (case-insensitive) or ISO codes (alpha-2, alpha-3 or numeric). The list of ISO codes can be found [here](https://github.com/covid19datahub/COVID19/blob/master/inst/extdata/src.csv).

Fetching data from a particular country:

```py
x, src = covid19("USA") # Unites States
```

Specify multiple countries at the same time:

```py
x, src = covid19(["ESP","PT","andorra",250])
```

If `country` is omitted, the whole dataset is returned:

```py
x, src = covid19()
```

### Raw data

Logical. Skip data cleaning? Default `True`. If `raw=False`, the raw data are cleaned by filling missing dates with `NaN` values. This ensures that all locations share the same grid of dates and no single day is skipped. Then, `NaN` values are replaced with the previous non-`NaN` value or `0`.  

```py
x, src = covid19(raw = False)
```

### Date filter

Date can be specified with `datetime.datetime`, `datetime.date` or as a `str` in format `YYYY-mm-dd`.

```py
from datetime import datetime
x, src = covid19("SWE", start = datetime(2020,4,1), end = "2020-05-01")
```

### Level

Integer. Granularity level of the data:

1. Country level
2. State, region or canton level
3. City or municipality level

```py
from datetime import date
x, src = covid19("USA", level = 2, start = date(2020,5,1))
```

### Cache

Logical. Memory caching? Significantly improves performance on successive calls. By default, using the cached data is enabled.

Caching can be disabled (e.g. for long running programs) by:

```py
x, src = covid19("FRA", cache = False)
```

### Vintage

Logical. Retrieve the snapshot of the dataset that was generated at the `end` date instead of using the latest version. Default `False`.

To fetch e.g. US data that were accessible on *22th April 2020* type

```py
x, src = covid19("US", end = "2020-04-22", vintage = True)
```

The vintage data are collected at the end of the day, but published with approximately 48 hour delay, once the day is completed in all the timezones.

Hence if `vintage = True`, but `end` is not set, warning is raised and `None` is returned.

```py
x, src = covid19("USA", vintage = True) # too early to get today's vintage
```

```
UserWarning: vintage data not available yet
```

### Citations

The data sources are returned as second value.

```py
from covid19dh import covid19
x, src = covid19("USA")
print(src)
```

## Star the repo

<a class="github-button" href="https://github.com/covid19datahub/python" data-icon="octicon-star" data-size="large" data-show-count="true" aria-label="Star covid19datahub/python on GitHub">Star</a>
<script async defer src="https://buttons.github.io/buttons.js"></script>

`r gsub("^# ", "## ", readr::read_file('../LICENSE.md'))`