This notebook explains how to obtain data from the Mosqlimate Data platform. All the information and functions used here are based on the project's [documentation](https://api.mosqlimate.org/docs/). Since we aim to train a simple univariate model, we request only the Infodengue data. Note, however, that the platform also provides climate data for all Brazilian cities.

Required packages:

In [13]:

import time
import aiohttp
import asyncio
import requests
import pandas as pd 

The API returns its results in pages, so retrieving a long time series requires requesting multiple pages. To get all the data available for Rio de Janeiro, we will use the functions provided in the [async requests tutorial](https://api.mosqlimate.org/docs/utils/AsyncRequests/).

In [14]:
def compose_url(base_url: str, parameters: dict, page: int = 1) -> str:
    """Helper method to compose the API url with parameters"""
    url = base_url + "?" if not base_url.endswith("?") else base_url
    params = "&".join([f"{p}={v}" for p,v in parameters.items()]) + f"&page={page}"
    return url + params
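
`compose_url` joins the parameters as plain text, which is enough for the values used in this notebook. If a value ever contained spaces or other reserved characters, it would need percent-encoding. The following is a minimal sketch of one alternative using the standard library's `urlencode`; the name `compose_url_encoded` is hypothetical and not part of the Mosqlimate tutorial.

```python
from urllib.parse import urlencode


def compose_url_encoded(base_url: str, parameters: dict, page: int = 1) -> str:
    """Hypothetical variant of compose_url that percent-encodes the values."""
    # Merge the page number into the query and let urlencode do the escaping
    query = urlencode({**parameters, "page": page})
    separator = "" if base_url.endswith("?") else "?"
    return f"{base_url}{separator}{query}"


example = compose_url_encoded(
    "https://api.mosqlimate.org/api/datastore/infodengue/?",
    {"disease": "dengue", "start": "2010-01-01", "geocode": 3304557},
    page=2,
)
print(example)
```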

In [15]:
async def fetch_data(session: aiohttp.ClientSession, url: str):
    """Uses ClientSession to create the async call to the API"""
    async with session.get(url) as response:
        return await response.json()

async def attempt_delay(session: aiohttp.ClientSession, url: str):
    """The request may fail. This method adds a delay to the failing requests"""
    try:
        return await fetch_data(session, url)
    except Exception as e:
        await asyncio.sleep(0.2)
        return await attempt_delay(session, url)
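
Note that `attempt_delay` retries forever: if the API stays unreachable, the recursion never ends. The sketch below shows one possible bounded alternative, with a maximum number of attempts and a delay that doubles after each failure. The helper `retry_async`, its parameters, and the simulated `flaky_call` are assumptions made for illustration and are not part of the Mosqlimate tutorial. The demonstration runs offline, without any network call.

```python
import asyncio


async def retry_async(make_call, max_attempts: int = 5, base_delay: float = 0.2):
    """Await make_call() up to max_attempts times, doubling the delay after each failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await make_call()
        except Exception:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))


# Offline demonstration: a fake call that fails twice before succeeding
calls = {"count": 0}


async def flaky_call():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated failure")
    return {"items": [], "pagination": {"total_pages": 1}}


# In a script, asyncio.run starts the event loop; inside a notebook cell,
# `await retry_async(...)` would be used instead.
outcome = asyncio.run(retry_async(flaky_call, max_attempts=5, base_delay=0.01))
```

With the functions defined in this notebook, the call could look like `await retry_async(lambda: fetch_data(session, url))`.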

In [17]:
async def get(base_url: str, parameters: dict) -> list:
    """Fetches every page of results for the given parameters"""
    st = time.time()
    result = []
    tasks = []
    async with aiohttp.ClientSession() as session:
        # Request the first page to learn how many pages exist
        url = compose_url(base_url, parameters)
        data = await attempt_delay(session, url)
        total_pages = data["pagination"]["total_pages"]
        result.extend(data["items"])

        # Page 1 is already in `result`, so start from page 2
        for page in range(2, total_pages + 1):
            url = compose_url(base_url, parameters, page)
            tasks.append(attempt_delay(session, url))

        # Run the remaining requests concurrently
        responses = await asyncio.gather(*tasks)
        for resp in responses:
            result.extend(resp["items"])

    et = time.time()
    print(f"Took {et-st:.6f} seconds")
    return result
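
The pagination pattern can be tried offline. The sketch below is a minimal illustration that replaces the real API with a fake one: `FAKE_PAGES`, `fake_fetch`, and `get_all_pages` are hypothetical names, and only the response shape (`items` and `pagination.total_pages`) is taken from the function above. It fetches page 1 to learn the page count, then gathers pages 2 onward concurrently so that no page is requested twice.

```python
import asyncio

# Fake paginated API (hypothetical data): 3 pages with 2 items each
FAKE_PAGES = {1: ["a", "b"], 2: ["c", "d"], 3: ["e", "f"]}


async def fake_fetch(page: int) -> dict:
    await asyncio.sleep(0)  # yield control, as a real request would
    return {"items": FAKE_PAGES[page], "pagination": {"total_pages": len(FAKE_PAGES)}}


async def get_all_pages() -> list:
    first = await fake_fetch(1)
    result = list(first["items"])
    total_pages = first["pagination"]["total_pages"]
    # Page 1 is already in hand, so only pages 2..total_pages are requested
    remaining = await asyncio.gather(
        *(fake_fetch(page) for page in range(2, total_pages + 1))
    )
    for resp in remaining:
        result.extend(resp["items"])
    return result


# In a notebook cell, `await get_all_pages()` would be used instead
items = asyncio.run(get_all_pages())
print(items)  # ['a', 'b', 'c', 'd', 'e', 'f']
```

`asyncio.gather` returns results in the order the tasks were passed, so the items keep their page order even though the requests run concurrently.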

Now, let's define the parameters of the request. Here, we set the city (by its `geocode`, here 3304557 for Rio de Janeiro), the time interval (using `start` and `end`), and the `disease`.

In [18]:
url = "https://api.mosqlimate.org/api/datastore/infodengue/?"
parameters = {
    "per_page": 100,
    "disease": "dengue",
    "start": "2010-01-01",
    "end": "2023-06-25",
    # Optional parameters are included here
    "geocode": 3304557
}

Let's get the data: 

In [19]:
data = await get(url, parameters)

data[0]

Took 3.825173 seconds


{'data_iniSE': '2023-06-25',
 'SE': 202326,
 'casos_est': 996.0,
 'casos_est_min': 996,
 'casos_est_max': 996,
 'casos': 996,
 'municipio_geocodigo': 3304557,
 'p_rt1': 3.749455e-09,
 'p_inc100k': 14.760333,
 'Localidade_id': 0,
 'nivel': 4,
 'id': 330455720232619684,
 'versao_modelo': '2023-11-23',
 'Rt': 0.7837772,
 'municipio_nome': 'Rio de Janeiro',
 'pop': 6747815.0,
 'tempmin': 19.7142857142857,
 'umidmax': 84.4387274285714,
 'receptivo': 0,
 'transmissao': 0,
 'nivel_inc': 2,
 'umidmed': 81.390786,
 'umidmin': 78.5984042857143,
 'tempmed': 20.2142857142857,
 'tempmax': 20.7142857142857,
 'casprov': None,
 'casprov_est': None,
 'casprov_est_min': None,
 'casprov_est_max': None,
 'casconf': None}

Let's convert it into a pandas DataFrame:

In [20]:
df = pd.DataFrame(data) 

df.head()

,data_iniSE,SE,casos_est,casos_est_min,casos_est_max,casos,municipio_geocodigo,p_rt1,p_inc100k,Localidade_id,...,nivel_inc,umidmed,umidmin,tempmed,tempmax,casprov,casprov_est,casprov_est_min,casprov_est_max,casconf
0,2023-06-25,202326,996.0,996,996.0,996,3304557,3.749455e-09,14.760333,0,...,2,81.390786,78.598404,20.214286,20.714286,,,,,
1,2023-06-18,202325,1234.0,1234,1234.0,1234,3304557,0.09390902,18.287401,0,...,2,77.214252,76.076638,19.025,19.35,,,,,
2,2023-06-11,202324,1351.0,1351,1351.0,1351,3304557,0.3714996,20.021296,0,...,2,84.672437,83.266114,22.066667,22.25,,,,,
3,2023-06-04,202323,1242.0,1242,1242.0,1242,3304557,1.519026e-07,18.405958,0,...,2,84.986669,83.148884,19.078947,19.263158,,,,,
4,2023-05-28,202322,1335.0,1335,1335.0,1335,3304557,3.511171e-10,19.784182,0,...,2,88.264806,87.689309,20.283333,20.5,,,,,


To train our model, we only need the date and cases columns. To use the data with the NeuralProphet API, the date column must be renamed to `ds` and the `casos` column to `y`. Let's make these changes and save the result to a CSV file that will be used in the next step to train our model.

In [21]:
df = df[['data_iniSE', 'casos']].rename(columns = {'data_iniSE':'ds', 'casos':'y'} ) 

df.ds = pd.to_datetime(df.ds)

df = df.drop_duplicates()

df = df.sort_values(by = 'ds')

df.reset_index(drop=True, inplace=True)

df.to_csv('./data_3304557.csv', index = False)
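
Before training, it can be worth checking that the cleaned series really is weekly, with no missing or repeated weeks. The sketch below applies the same cleaning steps to a small toy table and then checks that consecutive dates are exactly seven days apart. The records are made-up stand-ins that reuse a few values from the output above plus one deliberate duplicate; the names `toy`, `steps`, and `is_weekly` are hypothetical.

```python
import pandas as pd

# Toy stand-in for the API records, unordered and with one duplicated week
records = [
    {"data_iniSE": "2023-06-18", "casos": 1234},
    {"data_iniSE": "2023-06-25", "casos": 996},
    {"data_iniSE": "2023-06-11", "casos": 1351},
    {"data_iniSE": "2023-06-25", "casos": 996},
]

# Same steps as in the cell above: select, rename, parse dates, deduplicate, sort
toy = (
    pd.DataFrame(records)[["data_iniSE", "casos"]]
    .rename(columns={"data_iniSE": "ds", "casos": "y"})
    .assign(ds=lambda d: pd.to_datetime(d["ds"]))
    .drop_duplicates()
    .sort_values("ds")
    .reset_index(drop=True)
)

# For a complete weekly series, every step between consecutive rows is 7 days
steps = toy["ds"].diff().dropna()
is_weekly = bool((steps == pd.Timedelta(days=7)).all())
print(is_weekly)
```

The same two final lines can be run on `df` itself; a `False` result would point to a gap or a leftover duplicate date in the downloaded data.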