## Historical Max Temperature from Tomorrow.io API for Chennai, India

We retrieve the daily maximum temperature (i.e., `temperatureMax`) for Chennai, India, on an [H3 resolution 7](https://h3geo.org) grid from the [Historical](https://docs.tomorrow.io/reference/historical) data layer of the [Tomorrow.io API](https://docs.tomorrow.io/reference/api-introduction).

In [1]:
import asyncio
import json
import os
from typing import Optional

import geopandas
import h3
import pandas as pd
import shapely
from aiohttp import ClientSession
from shapely.geometry import Polygon

The following auxiliary function builds the H3 tessellation. Ideally, it will be moved into a package at a later stage.

In [2]:
def get_h3_tessellation(
    gdf: geopandas.GeoDataFrame, name="shapeName", resolution=10
) -> geopandas.GeoDataFrame:
    mapper = dict()
    tiles = set()

    # TODO: vectorize, if possible
    for idx, row in gdf.iterrows():
        geometry = row["geometry"] 
        match geometry.geom_type:
            case "Polygon":
                hex_ids = h3.polyfill(
                    shapely.geometry.mapping(geometry),
                    resolution,
                    geo_json_conformant=True,
                )

                tiles = tiles.union(set(hex_ids))
                mapper.update([(hex_id, row[name]) for hex_id in hex_ids])

            case "MultiPolygon":
                for x in geometry.geoms:
                    hex_ids = h3.polyfill(
                        shapely.geometry.mapping(x),
                        resolution,
                        geo_json_conformant=True,
                    )

                    tiles = tiles.union(set(hex_ids))
                    mapper.update([(hex_id, row[name]) for hex_id in hex_ids])
            case _:
                raise TypeError(f"Unsupported geometry type: {geometry.geom_type}")

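    # sort the ids so that the row order (and any index-based batching later on)
    # is reproducible across sessions; set iteration order is not
    tile_ids = sorted(tiles)
    tessellation = geopandas.GeoDataFrame(
        data=tile_ids,
        geometry=[Polygon(h3.h3_to_geo_boundary(idx, True)) for idx in tile_ids],
        columns=["hex_id"],
        crs="EPSG:4326",
    )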

    return tessellation

In [4]:
# create the Tomorrow.io API token at https://app.tomorrow.io/development/keys
# and export it as the TOMORROW_TOKEN environment variable; never hard-code it in the notebook
TOMORROW_API_KEY = os.getenv("TOMORROW_TOKEN")

## Area of Interest: Chennai

In [4]:
import os
os.getcwd()

'c:\\Users\\sahit\\OneDrive\\Documents\\01 World Bank\\India\\india-vulnerability-to-heatwaves\\notebooks'

In [5]:
INDIA = geopandas.read_file("../../data/shapefiles/india_district/sh819zz8121.shp").to_crs("EPSG:4326")
# .copy() so that the geometry column can be modified without a SettingWithCopyWarning
CHENNAI = INDIA[INDIA["laa"] == "CHENNAI"].copy()

In [6]:
# drop the Z coordinate by round-tripping each geometry through 2D WKB
CHENNAI['geometry'] = CHENNAI['geometry'].apply(lambda x: shapely.wkb.loads(
        shapely.wkb.dumps(x, output_dimension=2)))


## Tessellation in H3

In this step, we generate a tessellation layer for the **area of interest** using [H3 resolution 7](https://h3geo.org).

In [7]:
TESSELLATION = get_h3_tessellation(CHENNAI, name="laa", resolution=7)

In [8]:
TESSELLATION.explore()

In [11]:
TESSELLATION['centre'] = TESSELLATION['geometry'].apply(lambda x: x.centroid)

In [12]:
TESSELLATION["geojson"] = TESSELLATION["geometry"].apply(
    lambda x: json.dumps(shapely.geometry.mapping(x))
)

TESSELLATION["centre_geojson"] = TESSELLATION["centre"].apply(
    lambda x: json.dumps(shapely.geometry.mapping(x))
)

## Retrieve `Historical` from Tomorrow.io API

> Tomorrow.io's Historical Weather API allows you to query weather conditions (limited to historical data layers) by specifying the location (GeoJSON of a Point, LineString or Polygon), fields ("temperature", "windSpeed", ...), timesteps ("1h", "1d") and the startTime and endTime, such that the response includes a historical timeline.

    Polygon and Polyline limits:

    Polygon - 10,000 square km and no more than 70km per segment.
    Polyline - 2,000 km long.
    Max number of vertices - 550.
    Timerange is limited to up to 30 days per API call.

    If the location is a Polygon or a Polyline, you can specify whether you want the min/max/avg values throughout that coverage area by adding them as a suffix to any of the available fields (temperatureMax, temperatureMaxTime) and if not specified the response will default to Max.

    The historical archive is based on a reanalysis model that blends past short-range weather forecasts with observations through advanced data assimilation techniques. The historical archive data deviates from the recent historical data -7 days since it is based on a different observation data assimilation system that incorporates a larger set of final observational records.
    Please note that our reanalysis model takes between 7 to 90 days to calculate the data fields. Please see the historical data field Availability.

    See also: https://docs.tomorrow.io/reference/historical-overview
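
Because of the 30-day limit per call quoted above, a longer period has to be split into several requests. The following is a minimal sketch of one way to do that with the standard library only; the helper name `split_date_range` and its `max_days` parameter are hypothetical and not part of the Tomorrow.io API or of this notebook.

```python
from datetime import date, timedelta


def split_date_range(start: date, end: date, max_days: int = 30):
    """Split [start, end] into consecutive windows of at most `max_days` days.

    Hypothetical helper: each window is a (window_start, window_end) tuple and
    the end of one window is the start of the next.
    """
    if end <= start:
        raise ValueError("end must be after start")

    windows = []
    cursor = start
    while cursor < end:
        # never run past the requested end date
        window_end = min(cursor + timedelta(days=max_days), end)
        windows.append((cursor, window_end))
        cursor = window_end
    return windows


# Example: a full calendar year needs 13 windows (12 of 30 days, 1 shorter)
windows = split_date_range(date(2021, 1, 1), date(2021, 12, 31))
```

Each tuple can then be used as the `startTime`/`endTime` pair of one request.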

In [13]:
class TomorrowAPIClient:
    """An asynchronous API client for the Tomorrow.io API.

    Parameters
    ----------
    token : str
        Tomorrow.io API token

    Notes
    -----
    For more information, please see https://docs.tomorrow.io
    """

    BASE_URL = "https://api.tomorrow.io/v4"

    def __init__(
        self, session: Optional[ClientSession] = None, token: Optional[str] = None
    ):
        self.session = session or ClientSession()
        self.semaphore = asyncio.BoundedSemaphore(4)
        self.token = token or os.getenv("TOMORROW_TOKEN")

    async def __aenter__(self):
        return self

    async def __aexit__(self, *args):
        await self.close()

    async def close(self):
        await self.session.close()

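    async def post(self, url, json, params=None, headers=None):
        # build a fresh dict so that neither the caller's dict nor a shared
        # mutable default is modified between calls
        params = {**(params or {}), "apikey": self.token}
        async with self.semaphore, self.session.post(
            url, json=json, params=params, headers=headers
        ) as response:
            return await response.json()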
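
The client relies on `asyncio.BoundedSemaphore(4)` to keep at most four requests in flight at once. The sketch below illustrates that mechanism in isolation, with `asyncio.sleep` standing in for the network call; `run_limited` and `fake_request` are hypothetical names used only for this illustration.

```python
import asyncio


async def run_limited(n_tasks: int, limit: int = 4) -> int:
    """Run `n_tasks` dummy jobs and return the peak number running at once."""
    semaphore = asyncio.BoundedSemaphore(limit)
    running = 0
    peak = 0

    async def fake_request(i: int) -> int:
        nonlocal running, peak
        async with semaphore:  # blocks while `limit` jobs are already running
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(0.01)  # stands in for the HTTP round trip
            running -= 1
        return i

    # gather preserves the order of the submitted coroutines
    results = await asyncio.gather(*(fake_request(i) for i in range(n_tasks)))
    assert results == list(range(n_tasks))
    return peak


peak = asyncio.run(run_limited(20))
```

Inside a notebook, where an event loop is already running, `await run_limited(20)` would be used instead of `asyncio.run`.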

### Creating `intervals`

We query a short window, from 10 to 20 May 2023. The Tomorrow.io API limits the date range to 30 days per call, so longer periods would have to be split into several requests. Here, `pd.date_range` with 11 evenly spaced timestamps yields 10 consecutive one-day intervals.

In [27]:
date_range = pd.date_range("2023-05-10", "2023-05-20", periods=11)
intervals = list(zip(date_range, date_range[1:]))

The last interval already ends on 2023-05-20, so it needs no adjustment.

In [28]:
intervals

[(Timestamp('2023-05-10 00:00:00'), Timestamp('2023-05-11 00:00:00')),
 (Timestamp('2023-05-11 00:00:00'), Timestamp('2023-05-12 00:00:00')),
 (Timestamp('2023-05-12 00:00:00'), Timestamp('2023-05-13 00:00:00')),
 (Timestamp('2023-05-13 00:00:00'), Timestamp('2023-05-14 00:00:00')),
 (Timestamp('2023-05-14 00:00:00'), Timestamp('2023-05-15 00:00:00')),
 (Timestamp('2023-05-15 00:00:00'), Timestamp('2023-05-16 00:00:00')),
 (Timestamp('2023-05-16 00:00:00'), Timestamp('2023-05-17 00:00:00')),
 (Timestamp('2023-05-17 00:00:00'), Timestamp('2023-05-18 00:00:00')),
 (Timestamp('2023-05-18 00:00:00'), Timestamp('2023-05-19 00:00:00')),
 (Timestamp('2023-05-19 00:00:00'), Timestamp('2023-05-20 00:00:00'))]

### Create `payloads`

In [33]:
payloads = [
    {
        "location": location,
        "fields": ["temperatureMax"],
        "timesteps": ["1d"],
        "startTime": startTime.isoformat(),
        "endTime": endTime.isoformat(),
        "units": "metric",
    }
    for location in TESSELLATION["centre_geojson"]
    for (startTime, endTime) in intervals
]

As a sanity check, the number of payloads should equal the number of hexagons times the number of intervals:

In [30]:
len(payloads)

300

In [21]:
len(payloads) == len(TESSELLATION) * len(intervals)

True
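
Since the payloads are built with the locations in the outer loop and the intervals in the inner loop, the position of a payload in the list determines which hexagon and which interval it belongs to. A small sketch of that bookkeeping follows; the hexagon ids and the `locate` helper are hypothetical placeholders.

```python
from itertools import product

# Hypothetical stand-ins for TESSELLATION["hex_id"] and `intervals`
hex_ids = ["hex_a", "hex_b", "hex_c"]
intervals = [("2023-05-10", "2023-05-11"), ("2023-05-11", "2023-05-12")]

# Same ordering as the nested comprehension: locations outer, intervals inner
keys = list(product(hex_ids, intervals))


def locate(k: int, n_intervals: int) -> tuple:
    """Return (hexagon index, interval index) of the k-th payload."""
    return divmod(k, n_intervals)


hex_index, interval_index = locate(3, len(intervals))
```

With this mapping, the k-th response can be traced back to its hexagon even though the response itself does not carry the `hex_id`.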

Now, let's call the Tomorrow.io API!

In [22]:
# confirm that the token is set without printing the secret itself
assert TOMORROW_API_KEY, "The Tomorrow.io API token is not set"


In [70]:
async with TomorrowAPIClient(token=TOMORROW_API_KEY) as client:

    url = f"{client.BASE_URL}/historical"
    headers = {"Accept": "application/json", "Content-Type": "application/json"}

    # this cell fetches one batch of payloads; the cells below use `chennai_tmax`,
    # presumably the result of an earlier batch requested the same way
    futures = [client.post(url, json=payload, headers=headers) for payload in payloads[93:250]]
    chennai_tmax2 = await asyncio.gather(*futures)

In [107]:
payloads[0]

{'location': '{"type": "Point", "coordinates": [80.2702631402776, 12.99373019931917]}',
 'fields': ['temperatureMax'],
 'timesteps': ['1d'],
 'startTime': '2023-05-10T00:00:00',
 'endTime': '2023-05-11T00:00:00',
 'units': 'metric'}

In [111]:
chennai_tmax[3]

{'data': {'timelines': [{'timestep': '1d',
    'endTime': '2023-05-14T00:00:00Z',
    'startTime': '2023-05-13T00:00:00Z',
    'intervals': [{'startTime': '2023-05-13T00:00:00Z',
      'values': {'temperatureMax': 34.91}},
     {'startTime': '2023-05-14T00:00:00Z',
      'values': {'temperatureMax': 36.52}}]}]}}

In [114]:
# persist the raw responses so that the API does not have to be queried again
with open('../../data/weather/chennai_tmax_0_93.json', "w") as outfile:
    json.dump(chennai_tmax, outfile)

In [102]:
df = pd.DataFrame(chennai_tmax)
# keep only successful responses; failed requests carry an error `code`
df = df[df['code'].isna()]
df

| | data | code | type | message |
|---|---|---|---|---|
| 0 | {'timelines': [{'timestep': '1d', 'endTime': '... | | | |
| 1 | {'timelines': [{'timestep': '1d', 'endTime': '... | | | |
| 2 | {'timelines': [{'timestep': '1d', 'endTime': '... | | | |
| 3 | {'timelines': [{'timestep': '1d', 'endTime': '... | | | |
| 4 | {'timelines': [{'timestep': '1d', 'endTime': '... | | | |
| ... | ... | ... | ... | ... |
| 90 | {'timelines': [{'timestep': '1d', 'endTime': '... | | | |
| 91 | {'timelines': [{'timestep': '1d', 'endTime': '... | | | |
| 92 | {'timelines': [{'timestep': '1d', 'endTime': '... | | | |
| 93 | {'timelines': [{'timestep': '1d', 'endTime': '... | | | |
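
The table above suggests that a successful response carries a `data` key, while a failed one carries `code`, `type` and `message` instead. Under that assumption, the raw list of responses can also be split before building a DataFrame, which keeps the original positions for a later retry. This is only a sketch; `split_responses` is a hypothetical helper and the error values in the sample are made up for illustration.

```python
def split_responses(responses):
    """Separate successful responses from failed ones, keeping their positions."""
    succeeded, failed = [], []
    for index, response in enumerate(responses):
        if "data" in response and "code" not in response:
            succeeded.append((index, response))
        else:
            failed.append((index, response))
    return succeeded, failed


# Illustrative sample; the error entry is invented, not a real API reply
sample = [
    {"data": {"timelines": []}},
    {"code": 429, "type": "Example Error", "message": "illustrative only"},
    {"data": {"timelines": []}},
]

succeeded, failed = split_responses(sample)
retry_indices = [index for index, _ in failed]
```

The indices in `retry_indices` point back into `payloads`, so only the failed requests need to be sent again.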


In [106]:
df.iloc[0]['data']

{'timelines': [{'timestep': '1d',
   'endTime': '2023-05-11T00:00:00Z',
   'startTime': '2023-05-10T00:00:00Z',
   'intervals': [{'startTime': '2023-05-10T00:00:00Z',
     'values': {'temperatureMax': 33.22}},
    {'startTime': '2023-05-11T00:00:00Z',
     'values': {'temperatureMax': 33.68}}]}]}

## Data Manipulation

In [80]:
# flatten the intervals of a single response to check the structure
dataframes = pd.json_normalize(chennai_tmax[0]["data"]["timelines"][0]["intervals"])

In [82]:
dataframes = [pd.json_normalize(item["data"]["timelines"][0]["intervals"]) for item in chennai_tmax[0:93]]

Concatenating,

In [58]:
df = pd.concat(dataframes)
df

| | startTime | values.temperatureMax |
|---|---|---|
| 0 | 2023-05-10T00:00:00Z | 33.22 |
| 1 | 2023-05-11T00:00:00Z | 33.68 |
| 0 | 2023-05-11T00:00:00Z | 33.68 |
| 1 | 2023-05-12T00:00:00Z | 34.42 |
| 0 | 2023-05-12T00:00:00Z | 34.42 |
| ... | ... | ... |
| 1 | 2023-05-11T00:00:00Z | 33.68 |
| 0 | 2023-05-11T00:00:00Z | 33.68 |
| 1 | 2023-05-12T00:00:00Z | 34.42 |
| 0 | 2023-05-12T00:00:00Z | 34.42 |
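
The concatenated table repeats rows because each one-day request returns both its start and its end day, so consecutive requests overlap. It also no longer records which hexagon a value belongs to. A possible way to address both points is sketched below in plain Python; `flatten` is a hypothetical helper, `hex_a` is a placeholder id, and the temperatures are the ones shown in the responses above.

```python
def flatten(responses, response_hex_ids):
    """Turn raw responses into unique (hex_id, startTime, temperatureMax) rows.

    `response_hex_ids[i]` is assumed to be the hexagon of `responses[i]`.
    """
    unique = {}
    for hex_id, response in zip(response_hex_ids, responses):
        for interval in response["data"]["timelines"][0]["intervals"]:
            # the dict key removes rows duplicated by overlapping requests
            key = (hex_id, interval["startTime"])
            unique[key] = interval["values"]["temperatureMax"]
    return [
        {"hex_id": hex_id, "startTime": start, "temperatureMax": value}
        for (hex_id, start), value in sorted(unique.items())
    ]


def make_response(*pairs):
    """Build a response with the same shape as the API output shown above."""
    intervals = [
        {"startTime": start, "values": {"temperatureMax": value}}
        for start, value in pairs
    ]
    return {"data": {"timelines": [{"timestep": "1d", "intervals": intervals}]}}


responses = [
    make_response(("2023-05-10T00:00:00Z", 33.22), ("2023-05-11T00:00:00Z", 33.68)),
    make_response(("2023-05-11T00:00:00Z", 33.68), ("2023-05-12T00:00:00Z", 34.42)),
]
rows = flatten(responses, ["hex_a", "hex_a"])
```

The resulting list of dictionaries can be passed directly to `pd.DataFrame(rows)`.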
