In [1]:
%run "../../../common/0_notebooks_base_setup.py"

! pip install uk-covid19



<img src='../../../common/logo_DH.png' align='left' width=35%/>

# Checkpoint APIs

---

In this exercise we will use an API that publishes COVID-19 data for the United Kingdom.

The API documentation is available at https://coronavirus.data.gov.uk/details/developers-guide

All requests to the API are made over HTTPS.

A client library for accessing the data is also provided: https://github.com/publichealthengland/coronavirus-dashboard-api-python-sdk

In the first part of this exercise we will query some information with a plain web request (<a href="https://coronavirus.data.gov.uk/details/developers-guide#sdks">documentation</a>), and in the second part we will query the same information using the library provided by <a href="https://github.com/publichealthengland/coronavirus-dashboard-api-python-sdk">Public Health England</a>.


## Imports


In [2]:
import pandas as pd
import numpy as np
from requests import get
from json import dumps
from datetime import date, timedelta

from uk_covid19 import Cov19API

## Exercise 1

Using a web request, build a DataFrame with information on new cases and deaths in England ("england") for yesterday.

The metricName values we want in the response are:

* date,
* areaName,
* areaCode,
* newCasesByPublishDate,
* cumCasesByPublishDate,
* newDeaths28DaysByPublishDate,
* cumDeaths28DaysByPublishDate

To see which filters can be applied to the query:

https://coronavirus.data.gov.uk/details/developers-guide#params-filters

According to the documentation, the structure of the response is defined by

`structure={[responseName]:[metricName], [responseName]:[metricName]}`

To see the values available for metricName:

https://coronavirus.data.gov.uk/details/developers-guide `See a list of valid metrics for structure`

**The documentation states that the areaType filter is required for all queries:**

`The areaType metric is mandatory and must be defined in all queries.`
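
The two rules above (a `structure` mapping from response names to metric names, and a mandatory `areaType` filter) can be sketched without touching the network. The helper name `build_api_params` is hypothetical and is not part of the API or its SDK; it only assembles the query-string values, assuming that filters are joined with `;` as in the request built in the next cells.

```python
from json import dumps


def build_api_params(filters, structure):
    """Assemble query parameters for the /v1/data endpoint (hypothetical helper)."""
    # The documentation states that areaType must be present in every query.
    if "areaType" not in filters:
        raise ValueError("the areaType filter is mandatory")
    return {
        # Filters travel as "name=value" pairs separated by ";".
        "filters": ";".join(f"{name}={value}" for name, value in filters.items()),
        # structure is a compact JSON object: {responseName: metricName, ...}.
        "structure": dumps(structure, separators=(",", ":")),
        "format": "json",
    }


params = build_api_params(
    {"areaType": "nation", "areaName": "england"},
    {"date": "date", "dailyCases": "newCasesByPublishDate"},
)
print(params["filters"])    # areaType=nation;areaName=england
print(params["structure"])  # {"date":"date","dailyCases":"newCasesByPublishDate"}
```

Failing early on a missing `areaType` keeps the error local instead of relying on whatever the server answers to an incomplete query.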





In [10]:
hoy = date.today()
ayer = hoy + timedelta(days=-1)
print(ayer)
print(str(ayer))

2022-02-01
2022-02-01


In [11]:
ENDPOINT = "https://api.coronavirus.data.gov.uk/v1/data"

# Filter values:

AREA_TYPE = "nation"
AREA_NAME = "england"
DATE = str(ayer)

filters = [
    f"areaType={ AREA_TYPE }",
    f"areaName={ AREA_NAME }",
    f"date={ DATE }"
]

# Response structure

structure = {
    "date": "date",
    "name": "areaName",
    "code": "areaCode",
    "dailyCases": "newCasesByPublishDate",
    "cumulativeCases": "cumCasesByPublishDate",
    "dailyDeaths": "newDeaths28DaysByPublishDate",
    "cumulativeDeaths": "cumDeaths28DaysByPublishDate"
}

api_params = {
    "filters": str.join(";", filters),
    "structure": dumps(structure, separators=(",", ":")),
    "format":"json"
}


response = get(ENDPOINT, params = api_params, timeout=10)

if response.status_code >= 400:
    raise RuntimeError(f'Request failed: { response.text }')

print(response.url)
print("---")
print(response.json())

https://api.coronavirus.data.gov.uk/v1/data?filters=areaType%3Dnation%3BareaName%3Dengland%3Bdate%3D2022-02-01&structure=%7B%22date%22%3A%22date%22%2C%22name%22%3A%22areaName%22%2C%22code%22%3A%22areaCode%22%2C%22dailyCases%22%3A%22newCasesByPublishDate%22%2C%22cumulativeCases%22%3A%22cumCasesByPublishDate%22%2C%22dailyDeaths%22%3A%22newDeaths28DaysByPublishDate%22%2C%22cumulativeDeaths%22%3A%22cumDeaths28DaysByPublishDate%22%7D&format=json
---
{'length': 1, 'maxPageLimit': 2500, 'totalRecords': 4, 'data': [{'date': '2022-02-01', 'name': 'England', 'code': 'E92000001', 'dailyCases': 103353, 'cumulativeCases': 14948735, 'dailyDeaths': 185, 'cumulativeDeaths': 136596}], 'requestPayload': {'structure': {'date': 'date', 'name': 'areaName', 'code': 'areaCode', 'dailyCases': 'newCasesByPublishDate', 'cumulativeCases': 'cumCasesByPublishDate', 'dailyDeaths': 'newDeaths28DaysByPublishDate', 'cumulativeDeaths': 'cumDeaths28DaysByPublishDate'}, 'filters': [{'identifier': 'areaType', 'operator': 

In [22]:
response_dict = response.json()
response_dict

{'length': 1,
 'maxPageLimit': 2500,
 'totalRecords': 4,
 'data': [{'date': '2022-02-01',
   'name': 'England',
   'code': 'E92000001',
   'dailyCases': 103353,
   'cumulativeCases': 14948735,
   'dailyDeaths': 185,
   'cumulativeDeaths': 136596}],
 'requestPayload': {'structure': {'date': 'date',
   'name': 'areaName',
   'code': 'areaCode',
   'dailyCases': 'newCasesByPublishDate',
   'cumulativeCases': 'cumCasesByPublishDate',
   'dailyDeaths': 'newDeaths28DaysByPublishDate',
   'cumulativeDeaths': 'cumDeaths28DaysByPublishDate'},
  'filters': [{'identifier': 'areaType', 'operator': '=', 'value': 'nation'},
   {'identifier': 'areaName', 'operator': '=', 'value': 'england'},
   {'identifier': 'date', 'operator': '=', 'value': '2022-02-01'}],
  'page': 1},
 'pagination': {'current': '/v1/data?filters=areaType=nation;areaName=england;date=2022-02-01&structure={"date":"date","name":"areaName","code":"areaCode","dailyCases":"newCasesByPublishDate","cumulativeCases":"cumCasesByPublishDate

In [21]:
response_dict.keys()

dict_keys(['length', 'maxPageLimit', 'totalRecords', 'data', 'requestPayload', 'pagination'])

In [23]:
response_dict['data']

[{'date': '2022-02-01',
  'name': 'England',
  'code': 'E92000001',
  'dailyCases': 103353,
  'cumulativeCases': 14948735,
  'dailyDeaths': 185,
  'cumulativeDeaths': 136596}]

In [28]:
# The request sent to the API, as echoed back in the response

request = response_dict["requestPayload"]
request

{'structure': {'date': 'date',
  'name': 'areaName',
  'code': 'areaCode',
  'dailyCases': 'newCasesByPublishDate',
  'cumulativeCases': 'cumCasesByPublishDate',
  'dailyDeaths': 'newDeaths28DaysByPublishDate',
  'cumulativeDeaths': 'cumDeaths28DaysByPublishDate'},
 'filters': [{'identifier': 'areaType', 'operator': '=', 'value': 'nation'},
  {'identifier': 'areaName', 'operator': '=', 'value': 'england'},
  {'identifier': 'date', 'operator': '=', 'value': '2022-02-01'}],
 'page': 1}

In [24]:
# The information returned by the API

response_df = pd.DataFrame(response.json()["data"])
response_df

| | date | name | code | dailyCases | cumulativeCases | dailyDeaths | cumulativeDeaths |
|---|---|---|---|---|---|---|---|
| 0 | 2022-02-01 | England | E92000001 | 103353 | 14948735 | 185 | 136596 |


## Exercise 2

Retrieve the same data as in the previous exercise, this time for the last 30 days.


Hint: if we remove `date` from the filters, the first page of results comes back in descending date order and covers more than thirty days.

In [49]:
ENDPOINT = "https://api.coronavirus.data.gov.uk/v1/data"

# Filter values (no date filter this time):

AREA_TYPE = "nation"
AREA_NAME = "england"

filters = [
    f"areaType={ AREA_TYPE }",
    f"areaName={ AREA_NAME }"
]

# Response structure

structure = {
    "date": "date",
    "name": "areaName",
    "code": "areaCode",
    "dailyCases": "newCasesByPublishDate",
    "cumulativeCases": "cumCasesByPublishDate",
    "dailyDeaths": "newDeaths28DaysByPublishDate",
    "cumulativeDeaths": "cumDeaths28DaysByPublishDate"
}

api_params = {
    "filters": str.join(";", filters),
    "structure": dumps(structure, separators=(",", ":")),
    "format":"json",
    "page":"1"
}


response = get(ENDPOINT, params = api_params, timeout=10)

if response.status_code >= 400:
    raise RuntimeError(f'Request failed: { response.text }')

print(response.url)
print("---")
# print(response.json())

https://api.coronavirus.data.gov.uk/v1/data?filters=areaType%3Dnation%3BareaName%3Dengland&structure=%7B%22date%22%3A%22date%22%2C%22name%22%3A%22areaName%22%2C%22code%22%3A%22areaCode%22%2C%22dailyCases%22%3A%22newCasesByPublishDate%22%2C%22cumulativeCases%22%3A%22cumCasesByPublishDate%22%2C%22dailyDeaths%22%3A%22newDeaths28DaysByPublishDate%22%2C%22cumulativeDeaths%22%3A%22cumDeaths28DaysByPublishDate%22%7D&format=json&page=1
---


Alternative:

Loop over each date of interest, calling `sleep` between successive requests: https://www.programiz.com/python-programming/time/sleep
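
A minimal sketch of that loop, under stated assumptions: `last_n_dates` and `fetch_each_day` are hypothetical helper names, and `fetch_day` is an injected callable standing in for the per-date request of Exercise 1 (it should return the list found under `"data"` in the response). The one-second pause is an arbitrary choice, not a limit taken from the API documentation.

```python
from datetime import date, timedelta
from time import sleep


def last_n_dates(n, end):
    """Return the n dates ending at `end` (inclusive), most recent first."""
    return [end - timedelta(days=offset) for offset in range(n)]


def fetch_each_day(fetch_day, days, pause_seconds=1.0):
    """Call fetch_day(iso_date) once per date, pausing between requests.

    fetch_day is a hypothetical callable that returns the list of records
    for one date, e.g. response.json()["data"].
    """
    records = []
    for position, day in enumerate(days):
        if position > 0:
            sleep(pause_seconds)  # wait before every request except the first
        records.extend(fetch_day(day.isoformat()))
    return records


# The 30 dates ending yesterday, as ISO strings suitable for the date filter.
days = last_n_dates(30, date.today() - timedelta(days=1))
print(len(days), days[0], days[-1])
```

Passing the request function in as an argument keeps the loop independent of the network, so it can be tried with a stub before sending thirty real requests.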
    

In [50]:
response_df = pd.DataFrame(response.json()["data"])
response_df

| | date | name | code | dailyCases | cumulativeCases | dailyDeaths | cumulativeDeaths |
|---|---|---|---|---|---|---|---|
| 0 | 2022-02-02 | England | E92000001 | 81446 | 15028951 | 519.0 | 137115.0 |
| 1 | 2022-02-01 | England | E92000001 | 103353 | 14948735 | 185.0 | 136596.0 |
| 2 | 2022-01-31 | England | E92000001 | 81720 | 14845382 | 37.0 | 135509.0 |
| 3 | 2022-01-30 | England | E92000001 | 59559 | 14023177 | 75.0 | 135472.0 |
| 4 | 2022-01-29 | England | E92000001 | 69137 | 13963618 | 275.0 | 135397.0 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 729 | 2020-02-04 | England | E92000001 | 0 | 2 | NaN | NaN |
| 730 | 2020-02-03 | England | E92000001 | 0 | 2 | NaN | NaN |
| 731 | 2020-02-02 | England | E92000001 | 0 | 2 | NaN | NaN |
| 732 | 2020-02-01 | England | E92000001 | 0 | 2 | NaN | NaN |


In [51]:
response_df.date = pd.to_datetime(response_df.date)

In [64]:
response_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 734 entries, 0 to 733
Data columns (total 7 columns):
 #   Column            Non-Null Count  Dtype         
---  ------            --------------  -----         
 0   date              734 non-null    datetime64[ns]
 1   name              734 non-null    object        
 2   code              734 non-null    object        
 3   dailyCases        734 non-null    int64         
 4   cumulativeCases   734 non-null    int64         
 5   dailyDeaths       699 non-null    float64       
 6   cumulativeDeaths  699 non-null    float64       
dtypes: datetime64[ns](1), float64(2), int64(2), object(2)
memory usage: 40.3+ KB
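
The 734 rows reported above are below the `maxPageLimit` of 2500 seen in the response, so one page was enough here. For a query that spans several pages, a loop over the `page` parameter could look like the sketch below. It rests on assumptions that the truncated outputs in this notebook do not show: that the `pagination` object carries a `next` entry that is `None` on the last page, and that a page past the end is answered with HTTP status 204. `collect_all_pages` and `make_requests_fetcher` are hypothetical helper names.

```python
def collect_all_pages(fetch_page, max_pages=100):
    """Concatenate the "data" lists of consecutive pages.

    fetch_page is a hypothetical callable: given a page number it returns
    the decoded JSON body of that page, or None when there is no content.
    """
    records = []
    for page_number in range(1, max_pages + 1):
        body = fetch_page(page_number)
        if body is None:
            break
        records.extend(body["data"])
        # Assumption: "pagination" has a "next" entry that is None
        # (or absent) on the last page.
        if (body.get("pagination") or {}).get("next") is None:
            break
    return records


def make_requests_fetcher(endpoint, api_params):
    """Build a fetch_page callable backed by requests.get."""
    from requests import get  # imported lazily; only needed for real calls

    def fetch_page(page_number):
        response = get(endpoint, params={**api_params, "page": page_number}, timeout=10)
        if response.status_code == 204:  # assumed "no content" answer
            return None
        response.raise_for_status()  # raise on 4xx/5xx answers
        return response.json()

    return fetch_page
```

With the variables of the cell above, `pd.DataFrame(collect_all_pages(make_requests_fetcher(ENDPOINT, api_params)))` would take the place of the single-page call. The `max_pages` cap is a safety net against an endless loop if the stopping assumptions turn out to be wrong.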


In [72]:
# Keep the most recent 30 days: the half-open window (today - 30 days, today]
max_date = pd.to_datetime(date.today())
min_date = max_date - pd.Timedelta(days=30)

mask = (response_df.date <= max_date) & (response_df.date > min_date)

response_df[mask]

| | date | name | code | dailyCases | cumulativeCases | dailyDeaths | cumulativeDeaths |
|---|---|---|---|---|---|---|---|
| 0 | 2022-02-02 | England | E92000001 | 81446 | 15028951 | 519.0 | 137115.0 |
| 1 | 2022-02-01 | England | E92000001 | 103353 | 14948735 | 185.0 | 136596.0 |
| 2 | 2022-01-31 | England | E92000001 | 81720 | 14845382 | 37.0 | 135509.0 |
| 3 | 2022-01-30 | England | E92000001 | 59559 | 14023177 | 75.0 | 135472.0 |
| 4 | 2022-01-29 | England | E92000001 | 69137 | 13963618 | 275.0 | 135397.0 |
| 5 | 2022-01-28 | England | E92000001 | 78711 | 13895065 | 247.0 | 135122.0 |
| 6 | 2022-01-27 | England | E92000001 | 85288 | 13817017 | 302.0 | 134875.0 |
| 7 | 2022-01-26 | England | E92000001 | 90587 | 13732435 | 300.0 | 134573.0 |
| 8 | 2022-01-25 | England | E92000001 | 84302 | 13642522 | 409.0 | 134273.0 |
| 9 | 2022-01-24 | England | E92000001 | 77232 | 13558510 | 44.0 | 133864.0 |


In [73]:
import requests
from requests.exceptions import HTTPError

url = 'https://httpbin.org/'

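try:
    response = requests.get(url, timeout=10)
    # Raise HTTPError if the server answered with a 4xx or 5xx status
    response.raise_for_status()
except HTTPError as http_err:
    print(f'HTTP error occurred: {http_err}')
except Exception as err:
    print(f'other error occurred: {err}')
else:
    print('success')


success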


## Exercise 3

Repeat Exercise 1 using the library provided by Public Health England.

Documentation: https://publichealthengland.github.io/coronavirus-dashboard-api-python-sdk/pages/examples/general_use.html#
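
One possible starting point, offered as a sketch rather than the reference solution: the SDK's `Cov19API` class takes the filters as a list of `name=value` strings and the structure as a dictionary, and its `get_dataframe()` method returns a pandas DataFrame. The helper names `build_sdk_query` and `fetch_dataframe` are hypothetical; the structure is the one used in Exercise 1.

```python
from datetime import date, timedelta


def build_sdk_query(area_type, area_name, day=None):
    """Return (filters, structure) in the shape the SDK expects (hypothetical helper)."""
    filters = [f"areaType={area_type}", f"areaName={area_name}"]
    if day is not None:
        filters.append(f"date={day}")
    structure = {
        "date": "date",
        "name": "areaName",
        "code": "areaCode",
        "dailyCases": "newCasesByPublishDate",
        "cumulativeCases": "cumCasesByPublishDate",
        "dailyDeaths": "newDeaths28DaysByPublishDate",
        "cumulativeDeaths": "cumDeaths28DaysByPublishDate",
    }
    return filters, structure


def fetch_dataframe(filters, structure):
    """Query the API through the SDK and return a DataFrame (needs network access)."""
    # Imported lazily so the query can be built even where the SDK is absent.
    from uk_covid19 import Cov19API

    api = Cov19API(filters=filters, structure=structure)
    return api.get_dataframe()


yesterday = date.today() - timedelta(days=1)
filters, structure = build_sdk_query("nation", "england", yesterday.isoformat())
print(filters)
# df = fetch_dataframe(filters, structure)  # uncomment to send the request
```

Compared with Exercise 1, the SDK takes care of encoding the parameters and of reading the response, so only the filters and the structure have to be supplied.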

## Exercise 4

Repeat Exercise 2 using the library provided by Public Health England.
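
A sketch of one way to approach it, assuming the client is built without a `date` filter and that the structure contains a `"date"` entry, as in Exercise 1. The function name `recent_rows_from_sdk` is hypothetical; the only SDK call it relies on is `get_dataframe()`.

```python
from datetime import date

import pandas as pd


def recent_rows_from_sdk(api, n_days=30, today=None):
    """Fetch every record through the SDK client and keep the last n_days.

    `api` is expected to be a uk_covid19.Cov19API instance built WITHOUT a
    date filter; only its get_dataframe() method is used here.
    """
    end = pd.Timestamp(today if today is not None else date.today())
    start = end - pd.Timedelta(days=n_days)
    df = api.get_dataframe()
    df["date"] = pd.to_datetime(df["date"])
    # Half-open window (today - n_days, today], the same mask as in Exercise 2.
    recent = df[(df["date"] > start) & (df["date"] <= end)]
    return recent.reset_index(drop=True)
```

With the SDK installed, `api` would be created as `Cov19API(filters=["areaType=nation", "areaName=england"], structure=structure)`, reusing the `structure` dictionary from Exercise 1, and the result obtained with `recent_rows_from_sdk(api)`.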


## References and Additional Material
---

https://coronavirus.data.gov.uk/details/developers-guide

https://github.com/publichealthengland/coronavirus-dashboard-api-python-sdk

https://apidocs.data.world/toolkit/api/clients

https://apidocs.data.world/toolkit/rest-api

https://github.com/datadotworld/data.world-py

https://datosgobar.github.io/series-tiempo-ar-api/

https://datosgobar.github.io/series-tiempo-ar-api/python-usage/