# Tutorial 2: Query PAIRS for climate data

In this tutorial you will:  
1. Perform a point query in IBM PAIRS to retrieve climate data for three locations and two data layers (ERA5 rainfall, layer ID 49459, and ERA5 temperature, layer ID 49423)
2. Plot the results as time series with Plotly

Full documentation and further IBM PAIRS examples are available here: https://pairs.res.ibm.com/tutorial/tutorials/api/index.html.

Please note you will need an IBM ID and a PAIRS account for this tutorial. If you do not have these, please consult the workshop setup steps [here](https://github.com/C2MA-workshop/c2ma-docs).

Many more data layers are available in PAIRS; you can browse them in the Data Explorer of the PAIRS GUI [here](https://ibmpairs.mybluemix.net/data-explorer).

# Preparatory steps

### Choose whether to run on Watson Studio or locally

```python
# Set to True when running on Watson Studio, or False to run locally
running_watson_studio = True
```

### Set up the Watson Studio project token

Replace the project ID and access token (`'XXXX'`) below with those of your own Watson Studio project, as described in the workshop setup instructions [here](https://github.com/C2MA-workshop/c2ma-docs).

```python
# @hidden_cell
# The project token is an authorization token used to access project resources,
# such as data sources and connections, and by platform APIs.
if running_watson_studio:
    from project_lib import Project
    project = Project(project_id='XXXX', project_access_token='XXXX')
    pc = project.project_context
```

### Install the PAIRS API library

```python
!pip install ibmpairs
```

### Load other required libraries

```python
import numpy as np
import pandas as pd
import math
import plotly.graph_objects as go
from plotly.subplots import make_subplots
```

# PAIRS point query

### Connect to PAIRS - Watson Studio version
You should already have copied your API key to your Watson Studio project by following the setup instructions [here](https://github.com/C2MA-workshop/c2ma-docs).

If not, please do so now, then return to this tutorial.

### PAIRS authentication in Watson Studio

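```python
if running_watson_studio:
    from ibmpairs import paw, authentication
    # Read the API key from the pairspass.txt file stored in the project
    my_file = project.get_file("pairspass.txt")
    # The file contents are bytes: decode them and strip the trailing newline
    PAIRS_API_KEY = my_file.readline().decode('utf-8').strip()
    PAIRS_SERVER = "https://pairs.res.ibm.com"
    OAUTH = authentication.OAuth2(api_key=PAIRS_API_KEY)
```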

### PAIRS authentication - local version

```python
# Local version
if not running_watson_studio:
    from ibmpairs import paw, authentication
    # Change this to the location of your own pairspass.txt file
    key_path = "/Users/annejones/pairspass.txt"
    with open(key_path) as my_file:
        # strip() removes the trailing newline from the key
        PAIRS_API_KEY = my_file.readline().strip()
    PAIRS_SERVER = "https://pairs.res.ibm.com"
    OAUTH = authentication.OAuth2(api_key=PAIRS_API_KEY)
```
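Reading the key by hand is easy to get slightly wrong: `readline()` keeps the trailing newline, and an empty file silently yields an empty key. A minimal sketch of a more defensive reader follows. `read_api_key` is a hypothetical helper, not part of the `ibmpairs` library, and it assumes the key sits on the first line of a plain-text file such as `pairspass.txt`.

```python
from pathlib import Path


def read_api_key(path):
    """Return the first line of `path` with surrounding whitespace removed.

    Hypothetical helper (not part of ibmpairs). Assumes the API key is
    stored on the first line of a plain-text file such as pairspass.txt.
    """
    with open(Path(path).expanduser(), encoding="utf-8") as f:
        key = f.readline().strip()  # drop the trailing newline and stray spaces
    if not key:
        raise ValueError(f"No API key found in {path}")
    return key
```

In the local cell this would replace the `open(...)` block, e.g. `PAIRS_API_KEY = read_api_key(key_path)`.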

### PAIRS point query
The PAIRS query is specified as a Python dictionary giving the layer IDs, the required spatial domain and the time interval. It is passed to PAIRS as a JSON string.

In this example we perform a simple point query of ERA5 temperature and rainfall data from October 2014 to March 2015 for three locations in Limpopo province, South Africa.
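To see what such a JSON string looks like, a query dictionary can be serialized with Python's standard `json` module. This is a minimal sketch: the layer IDs, one coordinate pair and the dates are copied from this tutorial, while `example_query`, `parse_utc` and the interval check are illustrative additions, not something the PAIRS library is known to require.

```python
import json
from datetime import datetime

# A query dictionary with the same structure as the one used in this tutorial
example_query = {
    "layers": [
        {"type": "raster", "id": 49423},  # ERA5 temperature
        {"type": "raster", "id": 49459},  # ERA5 rainfall
    ],
    "spatial": {
        "type": "point",
        "coordinates": ["-24.99", "31.59"],  # a single latitude, longitude pair
    },
    "temporal": {
        "intervals": [
            {"start": "2014-10-01T00:00:00Z", "end": "2015-03-31T00:00:00Z"}
        ]
    },
}


def parse_utc(ts):
    # datetime.fromisoformat() only accepts a trailing "Z" from Python 3.11,
    # so replace it with an explicit UTC offset first
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


# Illustrative sanity check: every interval must start before it ends
for interval in example_query["temporal"]["intervals"]:
    assert parse_utc(interval["start"]) < parse_utc(interval["end"])

# The JSON string form of the query
payload = json.dumps(example_query)
```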

**Station locations in Limpopo:**

```python
lat1 = -24.99
lon1 = 31.59

lat2 = -22.97
lon2 = 30.50

lat3 = -22.27
lon3 = 29.90
```

**PAIRS layers:**

```python
pairs_layer_era_t = 49423  # ERA5 temperature
pairs_layer_era_r = 49459  # ERA5 rainfall
```

```python
query_json = {
    "layers": [
        {"type": "raster", "id": pairs_layer_era_t},
        {"type": "raster", "id": pairs_layer_era_r},
    ],
    "spatial": {
        "type": "point",
        # Coordinates are a flat list alternating latitude and longitude,
        # [lat1, lon1, lat2, lon2, ...], given as strings
        "coordinates": [str(lat1), str(lon1), str(lat2), str(lon2), str(lat3), str(lon3)],
    },
    "temporal": {
        "intervals": [
            {"start": "2014-10-01T00:00:00Z", "end": "2015-03-31T00:00:00Z"}
        ]
    },
}
```
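The `coordinates` entry is a flat list that alternates latitude and longitude, rather than a list of pairs. A small helper can build it from a list of `(lat, lon)` tuples and reverse the operation. `flatten_points` and `unflatten_points` are hypothetical names introduced for this sketch; the string conversion mirrors the `str()` calls in the query above.

```python
def flatten_points(points):
    """[(lat, lon), ...] -> ["lat1", "lon1", "lat2", "lon2", ...]"""
    return [str(value) for lat, lon in points for value in (lat, lon)]


def unflatten_points(coords):
    """Inverse of flatten_points: pair up consecutive entries as floats."""
    if len(coords) % 2:
        raise ValueError("coordinates must contain an even number of values")
    values = [float(c) for c in coords]
    return list(zip(values[0::2], values[1::2]))


# The three Limpopo station locations used in this tutorial
stations = [(-24.99, 31.59), (-22.97, 30.50), (-22.27, 29.90)]
coords = flatten_points(stations)
```

With such a helper, the `coordinates` value could be written as `flatten_points([(lat1, lon1), (lat2, lon2), (lat3, lon3)])`, which scales more easily to many stations.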

### Use the PAIRS library to create a query object

```python
query = paw.PAIRSQuery(query_json, PAIRS_SERVER, auth=OAUTH, authType='api-key')
```

### Submit the query

```python
query.submit()
```

### Retrieve the data, which is returned as a pandas DataFrame (`query.vdf`)

```python
query.vdf.head()
```

| | layerId | timestamp | longitude | latitude | value | region | property | geometry |
|---|---|---|---|---|---|---|---|---|
| 0 | 49423 | 2014-10-01 01:00:00+00:00 | 31.59 | -24.99 | 293.361633 | | | POINT (31.59000 -24.99000) |
| 1 | 49423 | 2014-10-01 01:00:00+00:00 | 30.5 | -22.97 | 292.30426 | | | POINT (30.50000 -22.97000) |
| 2 | 49423 | 2014-10-01 01:00:00+00:00 | 29.9 | -22.27 | 295.686646 | | | POINT (29.90000 -22.27000) |
| 3 | 49423 | 2014-10-01 02:00:00+00:00 | 31.59 | -24.99 | 291.767334 | | | POINT (31.59000 -24.99000) |
| 4 | 49423 | 2014-10-01 02:00:00+00:00 | 30.5 | -22.97 | 291.204529 | | | POINT (30.50000 -22.97000) |
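`query.vdf` is in long format: one row per layer, timestamp and location. For comparing stations it can help to pivot it to one column per location. The sketch below builds a small frame from the five rows shown above (it does not call PAIRS) and uses pandas' `pivot_table`; the names `vdf` and `wide` are local to the example.

```python
import pandas as pd

# The rows from the query.vdf output above (temperature in kelvin)
vdf = pd.DataFrame({
    "layerId": [49423] * 5,
    "timestamp": pd.to_datetime([
        "2014-10-01 01:00:00+00:00", "2014-10-01 01:00:00+00:00",
        "2014-10-01 01:00:00+00:00", "2014-10-01 02:00:00+00:00",
        "2014-10-01 02:00:00+00:00",
    ]),
    "longitude": [31.59, 30.5, 29.9, 31.59, 30.5],
    "latitude": [-24.99, -22.97, -22.27, -24.99, -22.97],
    "value": [293.361633, 292.30426, 295.686646, 291.767334, 291.204529],
})

# One row per timestamp, one column per (latitude, longitude) location.
# Combinations missing from the input (here the third station at 02:00) become NaN.
wide = vdf[vdf["layerId"] == 49423].pivot_table(
    index="timestamp", columns=["latitude", "longitude"], values="value"
)
```

In the notebook, the same `pivot_table` call could be applied to `query.vdf` after filtering on `layerId`.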


### Add labels to the DataFrame to make the layers easier to extract

```python
# Add a human-readable label for each layer
query.vdf['variable'] = None
query.vdf.loc[query.vdf['layerId'] == pairs_layer_era_t, 'variable'] = "ERA T"
query.vdf.loc[query.vdf['layerId'] == pairs_layer_era_r, 'variable'] = "ERA R"
```
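The three assignments above can also be expressed as a single lookup with `Series.map`, which leaves any layer not in the dictionary as `NaN`. A minimal sketch on a toy frame; `LAYER_LABELS` is a name introduced here for illustration.

```python
import pandas as pd

# Layer IDs used in this tutorial, mapped to the labels used for plotting
LAYER_LABELS = {49423: "ERA T", 49459: "ERA R"}

# Toy stand-in for query.vdf with only the columns the lookup needs
vdf = pd.DataFrame({"layerId": [49423, 49459, 49423], "value": [293.4, 0.0002, 291.8]})
vdf["variable"] = vdf["layerId"].map(LAYER_LABELS)
```

Applied to the notebook, this would read `query.vdf['variable'] = query.vdf['layerId'].map({pairs_layer_era_t: 'ERA T', pairs_layer_era_r: 'ERA R'})`.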

### Convert temperature from kelvin to °C and rainfall from metres to mm

```python
# Boolean masks selecting the temperature and rainfall rows
is_temp = query.vdf['layerId'] == pairs_layer_era_t
is_rain = query.vdf['layerId'] == pairs_layer_era_r

# Kelvin -> degrees Celsius; metres -> millimetres.
# This modifies query.vdf in place, so run the cell only once.
query.vdf.loc[is_temp, 'value'] -= 273.15
query.vdf.loc[is_rain, 'value'] *= 1000.0
```
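The conversions are T[°C] = T[K] − 273.15 and P[mm] = 1000 × P[m], assuming, as the cell above does, that the ERA5 layers deliver temperature in kelvin and rainfall in metres. Because the cell above modifies `query.vdf` in place, running it twice converts the values twice. As a sketch, the same logic can be wrapped in a function that works on a copy; `to_display_units` is a hypothetical name.

```python
import pandas as pd

KELVIN_OFFSET = 273.15
MM_PER_M = 1000.0


def to_display_units(vdf, t_layer, r_layer):
    """Return a copy of vdf with temperature in degC and rainfall in mm.

    Assumes temperature arrives in kelvin and rainfall in metres, as in the
    tutorial. Working on a copy means re-running it cannot convert twice.
    """
    out = vdf.copy()
    out.loc[out["layerId"] == t_layer, "value"] -= KELVIN_OFFSET
    out.loc[out["layerId"] == r_layer, "value"] *= MM_PER_M
    return out


# Toy input: one temperature row (K) and one rainfall row (m)
raw = pd.DataFrame({"layerId": [49423, 49459], "value": [293.361633, 0.0005]})
converted = to_display_units(raw, t_layer=49423, r_layer=49459)
```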

### Plot a single variable at a single location

```python
units_dict = {'ERA R': 'mm per hour', 'ERA T': 'degC'}

lat = lat1
lon = lon1
variable = 'ERA T'
units = units_dict[variable]

# Select the rows for one variable at one location
df = query.vdf[(query.vdf['latitude'] == lat)
               & (query.vdf['longitude'] == lon)
               & (query.vdf['variable'] == variable)]
infostr = 'location: ' + str(lat) + ' N, ' + str(lon) + ' E'

fig = make_subplots(rows=1, cols=1, shared_xaxes=True,
                    subplot_titles=[variable + " for " + infostr],
                    vertical_spacing=0.05)
fig.add_trace(
    go.Scatter(x=df['timestamp'], y=df['value'], showlegend=False),
    row=1, col=1)

fig.update_layout(autosize=False, width=1000, height=600)
fig.update_yaxes(title_text=variable + ' [' + units + ']', row=1, col=1)
fig.for_each_yaxis(lambda axis: axis.title.update(font=dict(size=12)))
fig.show()
```
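The cell above selects rows with exact float equality on `latitude` and `longitude`. That works here because, judging by the `head()` output, PAIRS returns the requested coordinates unchanged, but a tolerance-based match is more robust if coordinates are ever rounded. A minimal sketch using `numpy.isclose`; `select_series` and the default tolerance of `1e-6` degrees are assumptions introduced here.

```python
import numpy as np
import pandas as pd


def select_series(vdf, lat, lon, variable, tol=1e-6):
    """Rows of vdf for one variable within `tol` degrees of (lat, lon), sorted by time."""
    # rtol=0 so the tolerance is an absolute number of degrees
    mask = (
        np.isclose(vdf["latitude"], lat, rtol=0, atol=tol)
        & np.isclose(vdf["longitude"], lon, rtol=0, atol=tol)
        & (vdf["variable"] == variable).to_numpy()
    )
    return vdf[mask].sort_values("timestamp")
```

In the plotting cell, the filter line would become `df = select_series(query.vdf, lat, lon, variable)`.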

### Plot all locations for a single variable

```python
variable = 'ERA T'
units = units_dict[variable]
fig = make_subplots(rows=1, cols=1, shared_xaxes=True,
                    subplot_titles=[variable],
                    vertical_spacing=0.05)

# Add one trace per station location
for lat, lon in [(lat1, lon1), (lat2, lon2), (lat3, lon3)]:
    df = query.vdf[(query.vdf['latitude'] == lat)
                   & (query.vdf['longitude'] == lon)
                   & (query.vdf['variable'] == variable)]
    infostr = 'location: ' + str(lat) + ' N, ' + str(lon) + ' E'
    fig.add_trace(
        go.Scatter(x=df['timestamp'], y=df['value'], showlegend=True, name=infostr),
        row=1, col=1)

fig.update_layout(autosize=False, width=1000, height=600)
fig.update_yaxes(title_text=variable + ' [' + units + ']', row=1, col=1)
fig.for_each_yaxis(lambda axis: axis.title.update(font=dict(size=12)))
fig.show()
```
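Instead of hard-coding the three coordinate pairs, the loop can iterate over whichever locations are present in the results using `DataFrame.groupby`. This sketch stays in pandas and returns one time-indexed series per location, which could then be passed to `go.Scatter` as in the cell above; `series_by_location` is a hypothetical helper.

```python
import pandas as pd


def series_by_location(vdf, variable):
    """Map each (latitude, longitude) in vdf to a time-indexed Series of values."""
    subset = vdf[vdf["variable"] == variable]
    return {
        (lat, lon): group.set_index("timestamp")["value"].sort_index()
        for (lat, lon), group in subset.groupby(["latitude", "longitude"])
    }
```

Each item could then feed one trace, e.g. `go.Scatter(x=s.index, y=s.values, name=f'location: {lat} N, {lon} E')` for every `(lat, lon), s` in `series_by_location(query.vdf, 'ERA T').items()`.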

### Author and license

Anne Jones is a Research Staff Member at IBM Research, specialising in AI for Climate Risk and Impacts. 

Copyright © 2021 IBM. This notebook and its source code are released under the terms of the MIT License.