<a href="https://colab.research.google.com/github/Vizzuality/copernicus-climate-data/blob/master/upload_and_define_datasets.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Prepare data for the copernicus-climate project

https://github.com/Vizzuality/copernicus-climate-data

`Edward P. Morris (vizzuality.)`

## Description
This notebook exports tables of time-series per location, and defines datasets and layers using the API.

```
MIT License

Copyright (c) 2020 Vizzuality

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
```

# Setup

Instructions for setting up the computing environment.

In [0]:
%%bash
# Remove sample_data
rm -r sample_data

## Linux dependencies

Instructions for adding linux (including node, ect.) system packages. 

In [2]:
# Packages for projections and geospatial processing
!apt install -q -y libspatialindex-dev libproj-dev proj-data proj-bin libgeos-dev

Reading package lists...
Building dependency tree...
Reading state information...
proj-data is already the newest version (4.9.3-2).
proj-data set to manually installed.
The following additional packages will be installed:
  libspatialindex-c4v5 libspatialindex4v5
Suggested packages:
  libgdal-doc
The following NEW packages will be installed:
  libgeos-dev libproj-dev libspatialindex-c4v5 libspatialindex-dev
  libspatialindex4v5 proj-bin
0 upgraded, 6 newly installed, 0 to remove and 31 not upgraded.
Need to get 860 kB of archives.
After this operation, 5,014 kB of additional disk space will be used.
Get:1 http://archive.ubuntu.com/ubuntu bionic/universe amd64 libgeos-dev amd64 3.6.2-1build2 [73.1 kB]
Get:2 http://archive.ubuntu.com/ubuntu bionic/universe amd64 libspatialindex4v5 amd64 1.8.5-5 [219 kB]
Get:3 http://archive.ubuntu.com/ubuntu bionic/universe amd64 libspatialindex-c4v5 amd64 1.8.5-5 [51.7 kB]
Get:4 http://archive.ubuntu.com/ubuntu bionic/universe amd64 libproj-dev amd64 4

## Python packages

In [0]:
# connect to Google cloud storage
!pip install -q gcsfs

In [4]:
# xarray, Zarr and geometry tools
!pip install -q cftime netcdf4 nc-time-axis zarr xarray bottleneck rtree geopandas shapely --upgrade

[K     |████████████████████████████████| 327kB 4.8MB/s 
[K     |████████████████████████████████| 4.1MB 10.9MB/s 
[K     |████████████████████████████████| 3.3MB 32.0MB/s 
[K     |████████████████████████████████| 71kB 9.1MB/s 
[K     |████████████████████████████████| 931kB 47.1MB/s 
[K     |████████████████████████████████| 3.8MB 47.0MB/s 
[K     |████████████████████████████████| 14.7MB 259kB/s 
[K     |████████████████████████████████| 10.9MB 41.6MB/s 
[?25h  Building wheel for zarr (setup.py) ... [?25l[?25hdone
  Building wheel for rtree (setup.py) ... [?25l[?25hdone
  Building wheel for asciitree (setup.py) ... [?25l[?25hdone
  Building wheel for numcodecs (setup.py) ... [?25l[?25hdone


In [5]:
!pip uninstall -y earthengine-api
!pip install 'earthengine-api==0.1.215'

Uninstalling earthengine-api-0.1.221:
  Successfully uninstalled earthengine-api-0.1.221
Collecting earthengine-api==0.1.215
[?25l  Downloading https://files.pythonhosted.org/packages/0f/4f/3856b3bbb0307170d39a1f62d7d162514194645d1275aa8c7d20b1cf7592/earthengine-api-0.1.215.tar.gz (152kB)
[K     |████████████████████████████████| 153kB 5.2MB/s 
Building wheels for collected packages: earthengine-api
  Building wheel for earthengine-api (setup.py) ... [?25l[?25hdone
  Created wheel for earthengine-api: filename=earthengine_api-0.1.215-cp36-none-any.whl size=180532 sha256=91608101a674296729725d524f6bb7456c1a9b958d548a856b8959e27a0d26b8
  Stored in directory: /root/.cache/pip/wheels/29/70/d4/b56c350f26eda59ca55196d67578f5fd3e7af47cc0502173d6
Successfully built earthengine-api
Installing collected packages: earthengine-api
Successfully installed earthengine-api-0.1.215


In [0]:
# Need to restart kernal
#import importlib
#importlib.reload(earthengine-api)

In [2]:
!pip install -q Skydipper carto 

[K     |████████████████████████████████| 51kB 3.4MB/s 
[K     |████████████████████████████████| 655kB 14.1MB/s 
[?25h  Building wheel for carto (setup.py) ... [?25l[?25hdone
  Building wheel for pypng (setup.py) ... [?25l[?25hdone
  Building wheel for pyrestcli (setup.py) ... [?25l[?25hdone


In [0]:
import Skydipper

In [4]:
# Show python package versions
!pip list

Package                  Version        
------------------------ ---------------
absl-py                  0.9.0          
alabaster                0.7.12         
albumentations           0.1.12         
altair                   4.1.0          
asciitree                0.3.3          
asgiref                  3.2.7          
astor                    0.8.1          
astropy                  4.0.1.post1    
astunparse               1.6.3          
atari-py                 0.2.6          
atomicwrites             1.4.0          
attrs                    19.3.0         
audioread                2.1.8          
autograd                 1.3            
Babel                    2.8.0          
backcall                 0.1.0          
beautifulsoup4           4.6.3          
bleach                   3.1.5          
blis                     0.4.1          
bokeh                    1.4.0          
boto                     2.49.0         
boto3                    1.13.13        
botocore        

## Authorisation

Setting up connections and authorisation to cloud services.

In [0]:
import os
import json
import shutil

env_fn = 'env-variables.json'

# Get json file defining env variable key-value pairs
shutil.copyfile(f"/content/drive/My Drive/{env_fn}", f"/root/.{env_fn}")
with open(f"/root/.{env_fn}") as f:
   for k,v in json.load(f).items():
      os.environ[k] = v

### Google Cloud

This can be done in the URL or via adding service account credentials.

If you do not share the notebook, you can mount your Drive and and transfer credentials to disk. Note if the notebook is shared you always need to authenticate via URL.  

In [0]:
# Set Google Cloud information
gc_project = "skydipper-196010"
gc_creds = "skydipper-196010-f842645fd0f3.json"
gc_user = "edward-morris@skydipper-196010.iam.gserviceaccount.com"
gcs_prefix = "gs://copernicus-climate"
gcs_http_url = "https://storage.googleapis.com/copernicus-climate"

In [0]:
# For auth WITHOUT service account
# https://cloud.google.com/resource-manager/docs/creating-managing-projects
#from google.colab import auth
#auth.authenticate_user()
#!gcloud config set project {project_id}

In [0]:
# If the notebook is shared
#from google.colab import drive
#drive.mount('/content/drive')

In [0]:
# If Drive is mounted, copy GC credentials to home (place in your GDrive, and connect Drive)
!cp "/content/drive/My Drive/{gc_creds}" "/root/.{gc_creds}"

In [10]:
# Auth WITH service account
!gcloud auth activate-service-account {gc_user} --key-file=/root/.{gc_creds} --project={gc_project}

Activated service account credentials for: [edward-morris@skydipper-196010.iam.gserviceaccount.com]


In [11]:
# Test GC auth
!gsutil ls {gcs_prefix}

gs://copernicus-climate/heatwaves_historical_Basque.zip
gs://copernicus-climate/heatwaves_longterm_Basque.zip
gs://copernicus-climate/spain.zarr.zip
gs://copernicus-climate/coldsnaps/
gs://copernicus-climate/data_for_PET/
gs://copernicus-climate/dataset/
gs://copernicus-climate/european-nuts-lau-geometries.zarr/
gs://copernicus-climate/heatwaves/
gs://copernicus-climate/pet/
gs://copernicus-climate/spain-zonal-stats.zarr/
gs://copernicus-climate/spain.zarr/
gs://copernicus-climate/tasmax/
gs://copernicus-climate/tasmin/
gs://copernicus-climate/to_delete/
gs://copernicus-climate/zonal_stats/


### Skydipper API

You need to register with the API at https://api.skydipper.com/auth , we then login with our email and password to get an authorisation token. Be aware users need specific authorisation scopes linked to projects.

In [0]:
# Set API information (note credentials should be defined in ENV)
sky_api_app = "copernicusClimate"
sky_creds = "skydipper-creds.txt"

In [0]:
# Set up first time
# Get auth token from API
#import requests
#import os
#payload = {
#    "email":os.environ['SKY_API_EMAIL'],
#    "password":os.environ['SKY_API_PWD']
#}
#url = 'https://api.skydipper.com/auth/login'
#headers = {'Content-Type': 'application/json'}
#r = requests.post(url, json=payload, headers=headers)
#r.json()
#token= r.json().get('data').get('token')
#headers = {'Authorization': f"Bearer {token}"}

In [0]:
# Copy previously generated creds
!mkdir /root/.Skydipper
!cp "/content/drive/My Drive/{sky_creds}" /root/.Skydipper/creds
with open("/root/.Skydipper/creds") as f:
   sky_api_token = f.read()

In [15]:
# Check it works
import Skydipper as sky

sky.Dataset('3a46bbff-73bc-4abc-bad6-11be6e99e2cb')

### Carto

In [0]:
# Set API information (note credentials should be defined in ENV)
carto_user = "skydipper"
carto_base_url = f"http://35.233.41.65/user/{carto_user}"  

In [17]:
from carto.auth import APIKeyAuthClient
import os

auth_client = APIKeyAuthClient(api_key=os.environ['CARTO_API_KEY'], base_url=carto_base_url)

  authentication!!!")


# Utils

Generic helper functions used in the subsequent processing. For easy navigation each function seperated into a section with the function name.

## copy_gcs

In [0]:
import os
import subprocess

def copy_gcs(source_list, dest_list, opts=""):
  """
  Use gsutil to copy each corresponding item in source_list
  to dest_list.

  Example:
  copy_gcs(["gs://my-bucket/data-file.csv"], ["."])

  """
  for s, d  in zip(source_list, dest_list):
    cmd = f"gsutil -m cp -r {opts} {s} {d}"
    print(f"Processing: {cmd}")
    r = subprocess.call(cmd, shell=True)
    if r == 0:
        print("Task created")
    else:
        print("Task failed")
  print("Finished copy")

## get_cached_remote_zarr

In [0]:
import gcsfs
import zarr
import xarray as xr



def get_cached_remote_zarr(
    group,
    root,
    project_id = gc_project,
    token=f"/root/.{gc_creds}",
    force_consolidate=False):
  
  # Connect to GS
  gc = gcsfs.GCSFileSystem(project=project_id, token=token)
  store = gc.get_mapper(root, check=False, create=True)
  if force_consolidate:
    # consolidate metadata at root
    zarr.consolidate_metadata(store)
  # Check zarr is consolidated
  consolidated = gc.exists(f'{root}/.zmetadata')
  # Cache the zarr store
  #store = zarr.ZipStore(store, mode='r')
  cache = zarr.LRUStoreCache(store, max_size=4737418240)
  # Return cached zarr group
  return xr.open_zarr(cache, group=group, consolidated=consolidated)

## set_acl_to_public

In [0]:
import subprocess

# Set to asset permissions to public for https read
def set_acl_to_public(gs_path):
  """ 
  Set all Google Storage assets to puplic read access.

  Requires GS authentication

  Parameters
  ----------
  gs_path str
    The google storage path, note the "-r" option is used, setting the acl of all assets below this path
  """
  cmd = f"gsutil -m acl -r ch -u AllUsers:R {gs_path}"
  print(cmd)
  r = subprocess.call(cmd, shell=True)
  if r is 0:
    print("Set acl(s) sucsessful")
  else:
    print("Set acl(s) failed")  

#set_acl_to_public("gs://skydipper-water-quality/cloud-masks")

## to_geopandas

In [0]:
import geopandas as gpd
import shapely

def to_geopandas(ds, rounding_precision=False):
  df = ds.reset_coords().to_dataframe().dropna().reset_index()
  # Return as geopandas object, converting geometry to shapley objects
  geoms = [shapely.wkb.loads(g, hex=True) for g in df.geometry.values]
  # Adjust precision
  if rounding_precision:
    geoms = [shapely.wkt.loads(shapely.wkt.dumps(g, rounding_precision=rounding_precision)) for g in geoms]
  return gpd.GeoDataFrame(df, geometry = geoms)

## create_admin_dict

In [0]:
# Create gid lookup tables
import geopandas as gpd
import rtree

def create_admin_dict(gdfs, debug=False):
  """ Generates dictionary of admin to lower admin gid codes.

  Input should be a list of geopandas dfs, level 0 to 4."""
  
  # Buffer geometry
  gdfbs =[gdfs[i][['gid', 'geometry']] for i in range(0, len(gdfs) -1)]
  for gdfb in gdfbs:
    gdfb.loc[:,'geometry'] = gdfb.buffer(0.1).values 

  # create dict of conversions
  return {
    "admin0to1": gpd.sjoin(gdfbs[0], gdfs[1][['gid', 'geometry']], op='contains').drop('geometry', axis=1).assign(admin_level = 1),
    "admin0to2": gpd.sjoin(gdfbs[0], gdfs[2][['gid', 'geometry']], op='contains').drop('geometry', axis=1).assign(admin_level = 2),
    "admin0to3": gpd.sjoin(gdfbs[0], gdfs[3][['gid', 'geometry']], op='contains').drop('geometry', axis=1).assign(admin_level = 3),
    "admin0to4": gpd.sjoin(gdfbs[0], gdfs[4][['gid', 'geometry']], op='contains').drop('geometry', axis=1).assign(admin_level = 4),
    "admin1to2": gpd.sjoin(gdfbs[1], gdfs[2][['gid', 'geometry']], op='contains').drop('geometry', axis=1).assign(admin_level = 2),
    "admin1to3": gpd.sjoin(gdfbs[1], gdfs[3][['gid', 'geometry']], op='contains').drop('geometry', axis=1).assign(admin_level = 3),
    "admin1to4": gpd.sjoin(gdfbs[1], gdfs[4][['gid', 'geometry']], op='contains').drop('geometry', axis=1).assign(admin_level = 4),
    "admin2to3": gpd.sjoin(gdfbs[2], gdfs[3][['gid', 'geometry']], op='contains').drop('geometry', axis=1).assign(admin_level = 3),
    "admin2to4": gpd.sjoin(gdfbs[2], gdfs[4][['gid', 'geometry']], op='contains').drop('geometry', axis=1).assign(admin_level = 4),
    "admin3to4": gpd.sjoin(gdfbs[3], gdfs[4][['gid', 'geometry']], op='contains').drop('geometry', axis=1).assign(admin_level = 4),
  }

# Processing

Data processing organised into sections.

## Geometries

### Write admin lookup to CSV and geometries to GeoJSON 

In [0]:
%%time
# Write CSV to GCS
import gcsfs
import pandas as pd
from dask.diagnostics import ProgressBar, Profiler, ResourceProfiler, CacheProfiler, visualize
import json

# Geometries
gda = get_cached_remote_zarr(group = 'nuts-2016-lau-2018', root = "copernicus-climate/european-nuts-lau-geometries.zarr")
print(gda)

p = "zonal_stats"
fs = gcsfs.GCSFileSystem(project=gc_project, token=f"/root/.{gc_creds}")
with Profiler() as prof, ResourceProfiler(dt=1) as rprof, CacheProfiler() as cprof:
  with ProgressBar():
    with fs.open(f"{gcs_prefix}/{p}/admin_lookup_esp_nuts_lau_levels_0to4.csv", 'w') as f:
      print("\nwriting Admin. lookup\n")
      # Create admin lookup dictionary and GeoJSON files
      levels = [0,1,2,3,4]
      gdfs = [to_geopandas(gda.where((gda.admin_level==l)&(gda.iso3=='ESP'), drop=True), rounding_precision=6) for l in levels]
      admin_dict = create_admin_dict([to_geopandas(gda.where((gda.admin_level==l)&(gda.iso3=='ESP'), drop=True), rounding_precision=6) for l in levels])
      pd.concat(admin_dict.values())[['gid_left','gid_right']].to_csv(f, index=False)
      
    print("\nwriting GeoJSON\n")
    # Write to GeoPackage
    # FIXME: Geopandas does not play well with stream as path!
    pd.concat(gdfs).to_file("geometries_esp_nuts_lau_levels_0to4.geojson", driver="GeoJSON")
    copy_gcs(["geometries_esp_nuts_lau_levels_0to4.geojson"], [f"{gcs_prefix}/{p}/geometries_esp_nuts_lau_levels_0to4.geojson"])

<xarray.Dataset>
Dimensions:      (gid: 104568)
Coordinates:
    admin_level  (gid) int64 dask.array<chunksize=(104568,), meta=np.ndarray>
    geoname      (gid) object dask.array<chunksize=(26142,), meta=np.ndarray>
  * gid          (gid) object 'AL' 'CZ' 'DE' ... 'UK_W06000023' 'UK_W06000024'
    iso3         (gid) object dask.array<chunksize=(26142,), meta=np.ndarray>
Data variables:
    geometry     (gid) object dask.array<chunksize=(26142,), meta=np.ndarray>
Attributes:
    crs:                 EPSG:4326
    geospatial_lat_max:  75.814181
    geospatial_lat_min:  26.018616
    geospatial_lon_max:  69.103165
    geospatial_lon_min:  61.78629
    history:             Created by combining `ref-nuts-2016-01m` and `LAU-20...
    keywords:            Statistical units, NUTS, LAU
    summary:             This dataset represents the regions for levels 1, 2 ...
    title:               European Union Nomenclature of Territorial Units for...

writing Admin. lookup

[########################

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self.obj[item] = s



writing GeoPackage

Processing: gsutil -m cp -r  geometries_esp_nuts_lau_levels_0to4.geojson gs://copernicus-climate/zonal_stats/geometries_esp_nuts_lau_levels_0to4.geojson
Task created
Finished copy
CPU times: user 25.4 s, sys: 753 ms, total: 26.2 s
Wall time: 34.2 s


### Set ACLs to public

In [0]:
# Set ACLs to public
set_acl_to_public(f"{gcs_prefix}/{p}/")

gsutil -m acl -r ch -u AllUsers:R gs://copernicus-climate/zonal_stats/
Set acl(s) sucsessful


### Upload to Carto

In [0]:
# Upload to Carto
# FIXME how to automatically make public??
import requests
import os

# Set some parameters
p = 'zonal_stats'
tis = ['admin_lookup', 'geometries']
ends = ['csv', 'geojson']
upload_tasks = list()

for ti, e in zip(tis, ends):
  payload = {
    "api_key":os.environ['CARTO_API_KEY'],
    "url":f"{gcs_http_url}/{p}/{ti}_esp_nuts_lau_levels_0to4.{e}",
    "privacy":"public",
    "interval":86400*7
    }
  url = f"{carto_base_url}/api/v1/synchronizations"
  headers = {'Content-Type': 'application/json'}
  r = requests.post(url=url, json=payload, headers=headers)
  upload_tasks.append(r.json())

for task in upload_tasks:
  print(task)

{'data_import': {'endpoint': '/api/v1/imports', 'item_queue_id': '164f595f-b7cd-44c3-bb70-edb9b977ef91'}, 'id': '94d4734c-9eb0-11ea-b692-16056af2ae5d', 'name': None, 'interval': 604800, 'url': 'https://storage.googleapis.com/copernicus-climate/zonal_stats/admin_lookup_esp_nuts_lau_levels_0to4.csv', 'state': 'queued', 'user_id': 'c7980b72-f84a-4229-a346-ecc742f86552', 'created_at': '2020-05-25T17:53:03.662+00:00', 'updated_at': '2020-05-25T17:53:03.700+00:00', 'run_at': '2020-06-01T17:53:03.660+00:00', 'ran_at': '2020-05-25T17:53:03.660+00:00', 'modified_at': None, 'etag': None, 'checksum': '', 'log_id': None, 'error_code': None, 'error_message': None, 'retried_times': 0, 'service_name': None, 'service_item_id': None, 'type_guessing': True, 'quoted_fields_guessing': True, 'content_guessing': False, 'visualization_id': None, 'from_external_source': False}
{'data_import': {'endpoint': '/api/v1/imports', 'item_queue_id': 'fac89a72-39fb-4ba7-bc97-14a73a302d8c'}, 'id': '9505bd44-9eb0-11ea-b6

### Create Sky API datasets

In [0]:
import Skydipper as sky

In [0]:
# Remember Carto changes all '-' to '_' !
tis = ['admin_lookup', 'geometries']
datasets = list()
for ti in tis:
  atts = { 
    'name': f"{ti}_esp_nuts_lau_levels_0to4",
    'application': ['copernicusClimate'],
    'connectorType': 'rest',
    'provider': 'cartodb',
    'connectorUrl': f"http://35.233.41.65/user/skydipper/dataset/{ti}_esp_nuts_lau_levels_0to4",
    'tableName': f"{ti}_esp_nuts_lau_levels_0to4",
    'env': 'production'
    }
  #print(atts)
  ds = sky.Dataset(attributes=atts)
  datasets.append(ds)
  print(ds)

Dataset ce02bbcb-4dce-439f-83bd-68a99a9a3f2b admin_lookup_esp_nuts_lau_levels_0to4
Dataset 4069835a-7641-4969-abdc-712e27f9118d geometries_esp_nuts_lau_levels_0to4


In [0]:
datasets[0]

In [0]:
datasets[1]

## Create datasets for monthly climatic variables per location

### Write CSV tables to storage

In [0]:
%%time
# Write CSV to GCS
import gcsfs
import pandas as pd
from dask.diagnostics import ProgressBar, Profiler, ResourceProfiler, CacheProfiler, visualize

# Set some parameters
p = 'zonal_stats'
tis = ['historical', 'future-seasonal', 'future-longterm']

fs = gcsfs.GCSFileSystem(project=gc_project, token=f"/root/.{gc_creds}")
with Profiler() as prof, ResourceProfiler(dt=1) as rprof, CacheProfiler() as cprof:
  with ProgressBar():
    for ti in tis:
      print(f"writing {ti}")
      with fs.open(f"{gcs_prefix}/{p}/{ti}_monthly_zs_nuts-level-234.csv", 'w') as f:
        xr.merge([get_cached_remote_zarr(f"{ti}-monthly-zs-nuts-level-{l}", 'copernicus-climate/spain-zonal-stats.zarr') for l in [2,3,4]])\
        .to_dataframe().reset_index(drop=False).to_csv(f, index=False)

writing historical
[########################################] | 100% Completed |  0.5s
[########################################] | 100% Completed |  0.5s
[########################################] | 100% Completed | 17.3s
[########################################] | 100% Completed |  0.4s
[########################################] | 100% Completed |  0.5s
[########################################] | 100% Completed | 17.6s
[########################################] | 100% Completed |  0.5s
[########################################] | 100% Completed |  0.5s
[########################################] | 100% Completed | 16.9s
[########################################] | 100% Completed |  0.4s
[########################################] | 100% Completed |  0.5s
[########################################] | 100% Completed | 17.5s
[########################################] | 100% Completed |  0.4s
[########################################] | 100% Completed |  0.5s
[############################

### Set ACLs to public

In [0]:
# Set ACLs to public
set_acl_to_public(f"{gcs_prefix}/{p}/")

gsutil -m acl -r ch -u AllUsers:R gs://copernicus-climate/zonal_stats/
Set acl(s) sucsessful


### Upload to Carto

In [0]:
# Upload to Carto
# FIXME how to automatically make public??
import requests
import os

# Set some parameters
p = 'zonal_stats'
tis = ['historical', 'future-seasonal', 'future-longterm']

upload_tasks = list()
for ti in tis:
  #print(f"{gcs_http_url}/{p}/{ti}_monthly_zs_nuts-level-234.csv")
  payload = {
    "api_key":os.environ['CARTO_API_KEY'],
    "url":f"{gcs_http_url}/{p}/{ti}_monthly_zs_nuts-level-234.csv",
    "privacy":"public",
    "interval":86400*7
    }
  url = f"{carto_base_url}/api/v1/synchronizations"
  headers = {'Content-Type': 'application/json'}
  r = requests.post(url=url, json=payload, headers=headers)
  upload_tasks.append(r.json())
  for task in upload_tasks:
    print(task)

{'data_import': {'endpoint': '/api/v1/imports', 'item_queue_id': 'fa4a0954-5500-49b8-b730-01dde82acfbe'}, 'id': 'cf6485b2-9eb3-11ea-b692-16056af2ae5d', 'name': None, 'interval': 604800, 'url': 'https://storage.googleapis.com/copernicus-climate/zonal_stats/historical_monthly_zs_nuts-level-234.csv', 'state': 'queued', 'user_id': 'c7980b72-f84a-4229-a346-ecc742f86552', 'created_at': '2020-05-25T18:16:10.404+00:00', 'updated_at': '2020-05-25T18:16:10.454+00:00', 'run_at': '2020-06-01T18:16:10.402+00:00', 'ran_at': '2020-05-25T18:16:10.402+00:00', 'modified_at': None, 'etag': None, 'checksum': '', 'log_id': None, 'error_code': None, 'error_message': None, 'retried_times': 0, 'service_name': None, 'service_item_id': None, 'type_guessing': True, 'quoted_fields_guessing': True, 'content_guessing': False, 'visualization_id': None, 'from_external_source': False}
{'data_import': {'endpoint': '/api/v1/imports', 'item_queue_id': 'fa4a0954-5500-49b8-b730-01dde82acfbe'}, 'id': 'cf6485b2-9eb3-11ea-b69

### Create Sky API datasets

In [0]:
import Skydipper as sky

In [0]:
# Remember Carto changes all '-' to '_' !
tis = ['historical', 'future_seasonal', 'future_longterm']
datasets = list()
for ti in tis:
  #print(f"{ti}_monthly_zs_nuts-level-234")

  atts = { 
    'name': f"{ti}_monthly_zs_nuts-level-234",
    'application': ['copernicusClimate'],
    'connectorType': 'rest',
    'provider': 'cartodb',
    'connectorUrl': f"http://35.233.41.65/user/skydipper/dataset/{ti}_monthly_zs_nuts_level_234",
    'tableName': f"{ti}_monthly_zs_nuts_level_234",
    'env': 'production'
    }
  #print(atts)
  ds = sky.Dataset(attributes=atts)
  datasets.append(ds)
  print(ds)

Dataset 6aa6e725-4725-4e5a-8d46-4196db9f8634 historical_monthly_zs_nuts-level-234
Dataset f9fc8dc6-128a-48ee-90b3-8d79a718f2f3 future_seasonal_monthly_zs_nuts-level-234
Dataset 47586c47-5c58-4f88-8fe9-25f683180dd5 future_longterm_monthly_zs_nuts-level-234


In [0]:
datasets[0]

In [0]:
datasets[1]

In [0]:
datasets[2]

In [0]:
# Access Carto queries response

def get_timeseries(theme, time_interval, gid = "ES11", start_date = "1980-01-01", end_date = "2100-01-01"):
  
  

  # Define SQL conditions
  # experiment is only future_longterm
  se = ""
  we = ""
  dvs = ""

  # choose dataset
  # for future use mean {data_var}_mean 
  # and standard deviation {data_var}_std
  if time_interval == "future_longterm":
    se = "experiment, "
    we = "AND experiment = 'rcp85'"
    dataset_id = 'bef42c82-2714-4ba0-8694-75e49916013a'
    table_name = 'future_longterm_monthly_zs_nuts_level_234'
    if theme == 'heatwaves':
      data_vars = ["tasmax", "heatwave_alarms", "heatwave_alerts", "heatwave_warnings"] 
    if theme == 'coldsnaps':
      data_vars = ["tasmin", "coldsnap_alarms", "coldsnap_alerts", "coldsnap_warnings"]
    dvs = [f"{data_var}_mean, {data_var}_std " for data_var in data_vars]
  
  if time_interval == "future_seasonal":
    dataset_id = 'e1cc3f3e-133a-4a14-b2c2-f3192ee213c3'
    table_name = "future_seasonal_monthly_zs_nuts_level_234"
    if theme == 'heatwaves':
      data_vars = ["tasmax", "heatwave_alarms", "heatwave_alerts", "heatwave_warnings"] 
    if theme == 'coldsnaps':
      data_vars = ["tasmin", "coldsnap_alarms", "coldsnap_alerts", "coldsnap_warnings"]
    dvs = [f"{data_var}_mean, {data_var}_std " for data_var in data_vars]
    
  if time_interval == "historical":
    dataset_id = '3a46bbff-73bc-4abc-bad6-11be6e99e2cb'
    table_name = 'historical_monthly_zs_nuts_level_234'
    if theme == 'heatwaves':
      data_vars = ["tasmax", "heatwave_alarms", "heatwave_alerts", "heatwave_warnings", "heatstress_extreme", "heatstress_strong", "heatstress_moderate"] 
    if theme == 'coldsnaps':
      data_vars = ["tasmin", "coldsnap_alarms", "coldsnap_alerts", "coldsnap_warnings", "coldstress_extreme", "coldstress_strong", "coldstress_moderate"]
    dvs = [f"{data_var}_mean " for data_var in data_vars]
    

  # Convert variables to string
  dvstring = ", ".join(dvs)
  #print(dvstring)

  sql = \
  f"SELECT gid, {se}time, {dvstring}"\
  f"FROM {table_name} "\
  f"WHERE gid = '{gid}' AND time between '{start_date}' AND '{end_date}' {we} "\
  "ORDER BY time"

  #print(sql)

  url = f"http://api.skydipper.com/v1/query/{dataset_id}/"
  params = {"sql": sql}
  headers = {'Authorization': f"Bearer {sky_api_token}"}
  r = requests.post(url=url, params=params, headers=headers)
  return r

In [0]:
import urllib
themes = ['heatwaves', 'coldsnaps']
time_intervals = ['historical', 'future_seasonal', 'future_longterm']
headers = {'Authorization': f"Bearer {sky_api_token}"}

print("\nHeader:\n")
print(headers)
print("\nAPI queries:\n")
for theme in themes:  
  print(f"\n{theme}:\n")
  for time_interval in time_intervals:
    r = get_timeseries(theme, time_interval)
    print(f"\n{time_interval}:\n")
    print(urllib.parse.unquote_plus(r.url))



Header:

{'Authorization': 'Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpZCI6IjVlMzA0OGIzYTY4NWYzMDAxMDhkZjYyNCIsInJvbGUiOiJBRE1JTiIsInByb3ZpZGVyIjoibG9jYWwiLCJlbWFpbCI6ImVkd2FyZC5tb3JyaXNAdml6enVhbGl0eS5jb20iLCJleHRyYVVzZXJEYXRhIjp7ImFwcHMiOlsic2t5ZGlwcGVyIiwibWFuZ3JvdmVBdGxhcyIsInNvaWxzUmV2ZWFsZWQiLCJjb3Blcm5pY3VzQ2xpbWF0ZSJdfSwiY3JlYXRlZEF0IjoxNTkwNDA0NzU1MTc3LCJpYXQiOjE1OTA0MDQ3NTV9.wRRJQCFtvCZzMTtucly2pmCL5WhsFBgBFDUo2CmJSaY'}

API queries:


heatwaves:


historical:


future_seasonal:


future_longterm:


coldsnaps:


historical:


future_seasonal:


future_longterm:



## Create datasets for total climatic variables per location

### Write CSV tables to storage

In [0]:
%%time
# Write CSV to GCS
import gcsfs
import pandas as pd
from dask.diagnostics import ProgressBar, Profiler, ResourceProfiler, CacheProfiler, visualize

# Set some parameters
p = 'zonal_stats'
tis = ['historical', 'future-seasonal', 'future-longterm']

fs = gcsfs.GCSFileSystem(project=gc_project, token=f"/root/.{gc_creds}")
with Profiler() as prof, ResourceProfiler(dt=1) as rprof, CacheProfiler() as cprof:
  with ProgressBar():
    for ti in tis:
      print(f"writing {ti}")
      with fs.open(f"{gcs_prefix}/{p}/{ti}_total_zs_nuts-level-234.csv", 'w') as f:
        get_cached_remote_zarr(f"{ti}-total-zs-nuts-level-234", 'copernicus-climate/spain-zonal-stats.zarr')\
        .to_dataframe().reset_index(drop=False).to_csv(f, index=False)

writing historical
[########################################] | 100% Completed | 15.4s
[########################################] | 100% Completed |  0.1s
[########################################] | 100% Completed |  0.1s
[########################################] | 100% Completed |  0.1s
[########################################] | 100% Completed |  0.1s
[########################################] | 100% Completed |  1.5s
[########################################] | 100% Completed |  0.5s
[########################################] | 100% Completed |  0.4s
[########################################] | 100% Completed | 15.3s
[########################################] | 100% Completed | 15.0s
[########################################] | 100% Completed | 15.0s
[########################################] | 100% Completed | 15.0s
[########################################] | 100% Completed | 15.2s
[########################################] | 100% Completed | 15.0s
[############################

### Set ACLs to public

In [0]:
# Set ACLs to public
set_acl_to_public(f"{gcs_prefix}/{p}/")

gsutil -m acl -r ch -u AllUsers:R gs://copernicus-climate/zonal_stats/
Set acl(s) sucsessful


### Upload to Carto

In [0]:
# Upload to Carto
# FIXME how to automatically make public??
import requests
import os

# Set some parameters
p = 'zonal_stats'
tis = ['historical', 'future-seasonal', 'future-longterm']

upload_tasks = list()
for ti in tis:
  #print(f"{gcs_http_url}/{p}/{ti}_monthly_zs_nuts-level-234.csv")
  payload = {
    "api_key":os.environ['CARTO_API_KEY'],
    "url":f"{gcs_http_url}/{p}/{ti}_total_zs_nuts-level-234.csv",
    "privacy":"public",
    "interval":86400*7
    }
  url = f"{carto_base_url}/api/v1/synchronizations"
  headers = {'Content-Type': 'application/json'}
  r = requests.post(url=url, json=payload, headers=headers)
  upload_tasks.append(r.json())

# print overview
for task in upload_tasks:
  print(task)

{'data_import': {'endpoint': '/api/v1/imports', 'item_queue_id': 'b4a02a29-faba-4a65-a5c0-b9b07f4c7d83'}, 'id': '7b61cb06-9ed0-11ea-b692-16056af2ae5d', 'name': None, 'interval': 604800, 'url': 'https://storage.googleapis.com/copernicus-climate/zonal_stats/historical_total_zs_nuts-level-234.csv', 'state': 'queued', 'user_id': 'c7980b72-f84a-4229-a346-ecc742f86552', 'created_at': '2020-05-25T21:41:24.863+00:00', 'updated_at': '2020-05-25T21:41:24.893+00:00', 'run_at': '2020-06-01T21:41:24.861+00:00', 'ran_at': '2020-05-25T21:41:24.861+00:00', 'modified_at': None, 'etag': None, 'checksum': '', 'log_id': None, 'error_code': None, 'error_message': None, 'retried_times': 0, 'service_name': None, 'service_item_id': None, 'type_guessing': True, 'quoted_fields_guessing': True, 'content_guessing': False, 'visualization_id': None, 'from_external_source': False}
{'data_import': {'endpoint': '/api/v1/imports', 'item_queue_id': 'd5fb8d13-e098-49b1-821c-85d707d28309'}, 'id': '7b91b50a-9ed0-11ea-b692-

### Create Sky API datasets

In [0]:
import Skydipper as sky

In [0]:
# Remember Carto changes all '-' to '_' !
tis = ['historical', 'future_seasonal', 'future_longterm']
datasets = list()
for ti in tis:
  atts = { 
    'name': f"{ti}_total_zs_nuts-level-234",
    'application': ['copernicusClimate'],
    'connectorType': 'rest',
    'provider': 'cartodb',
    'connectorUrl': f"http://35.233.41.65/user/skydipper/dataset/{ti}_total_zs_nuts_level_234",
    'tableName': f"{ti}_total_zs_nuts_level_234",
    'env': 'production'
    }
  #print(atts)
  ds = sky.Dataset(attributes=atts)
  datasets.append(ds)
  print(ds)

Dataset 5d0bc927-6780-4f64-ba3d-dc241be6c26d historical_total_zs_nuts-level-234
Dataset 075eb3e5-77bb-4fd7-a6b9-3108ae6ba166 future_seasonal_total_zs_nuts-level-234
Dataset 817e02ec-802c-4594-a755-8dca6891175a future_longterm_total_zs_nuts-level-234


In [0]:
datasets[0]

In [0]:
datasets[1]

In [0]:
datasets[2]

In [0]:
# Access Carto queries response

def get_map(theme, time_interval, gid = "ES11"):
  
  # Get list of subadmins
  


  # Define SQL conditions
  # experiment is only future_longterm
  se = ""
  we = ""
  dvs = ""

  # choose dataset
  # for future use mean {data_var}_mean 
  # and standard deviation {data_var}_std
  if time_interval == "future_longterm":
    se = "experiment, "
    we = "AND experiment = 'rcp85'"
    dataset_id = 'bef42c82-2714-4ba0-8694-75e49916013a'
    table_name = 'future_longterm_monthly_zs_nuts_level_234'
    if theme == 'heatwaves':
      data_vars = ["tasmax", "heatwave_alarms", "heatwave_alerts", "heatwave_warnings"] 
    if theme == 'coldsnaps':
      data_vars = ["tasmin", "coldsnap_alarms", "coldsnap_alerts", "coldsnap_warnings"]
    dvs = [f"{data_var}_mean, {data_var}_std " for data_var in data_vars]
  
  if time_interval == "future_seasonal":
    dataset_id = 'e1cc3f3e-133a-4a14-b2c2-f3192ee213c3'
    table_name = "future_seasonal_monthly_zs_nuts_level_234"
    if theme == 'heatwaves':
      data_vars = ["tasmax", "heatwave_alarms", "heatwave_alerts", "heatwave_warnings"] 
    if theme == 'coldsnaps':
      data_vars = ["tasmin", "coldsnap_alarms", "coldsnap_alerts", "coldsnap_warnings"]
    dvs = [f"{data_var}_mean, {data_var}_std " for data_var in data_vars]
    
  if time_interval == "historical":
    dataset_id = '3a46bbff-73bc-4abc-bad6-11be6e99e2cb'
    table_name = 'historical_monthly_zs_nuts_level_234'
    if theme == 'heatwaves':
      data_vars = ["tasmax", "heatwave_alarms", "heatwave_alerts", "heatwave_warnings", "heatstress_extreme", "heatstress_strong", "heatstress_moderate"] 
    if theme == 'coldsnaps':
      data_vars = ["tasmin", "coldsnap_alarms", "coldsnap_alerts", "coldsnap_warnings", "coldstress_extreme", "coldstress_strong", "coldstress_moderate"]
    dvs = [f"{data_var}_mean " for data_var in data_vars]
    

  # Convert variables to string
  dvstring = ", ".join(dvs)
  #print(dvstring)

  sql = \
  f"SELECT gid, {se}time, {dvstring}"\
  f"FROM {table_name} "\
  f"WHERE gid = '{gid}' AND time between '{start_date}' AND '{end_date}' {we} "\
  "ORDER BY time"

  #print(sql)

  url = f"http://api.skydipper.com/v1/query/{dataset_id}/"
  params = {"sql": sql}
  headers = {'Authorization': f"Bearer {sky_api_token}"}
  r = requests.post(url=url, params=params, headers=headers)
  return r

In [0]:
import urllib
themes = ['heatwaves', 'coldsnaps']
time_intervals = ['historical', 'future_seasonal', 'future_longterm']
headers = {'Authorization': f"Bearer {sky_api_token}"}

print("\nHeader:\n")
print(headers)
print("\nAPI queries:\n")
for theme in themes:  
  print(f"\n{theme}:\n")
  for time_interval in time_intervals:
    r = get_timeseries(theme, time_interval)
    print(f"\n{time_interval}:\n")
    print(urllib.parse.unquote_plus(r.url))



Header:

{'Authorization': 'Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpZCI6IjVlMzA0OGIzYTY4NWYzMDAxMDhkZjYyNCIsInJvbGUiOiJBRE1JTiIsInByb3ZpZGVyIjoibG9jYWwiLCJlbWFpbCI6ImVkd2FyZC5tb3JyaXNAdml6enVhbGl0eS5jb20iLCJleHRyYVVzZXJEYXRhIjp7ImFwcHMiOlsic2t5ZGlwcGVyIiwibWFuZ3JvdmVBdGxhcyIsInNvaWxzUmV2ZWFsZWQiLCJjb3Blcm5pY3VzQ2xpbWF0ZSJdfSwiY3JlYXRlZEF0IjoxNTkwNDA0NzU1MTc3LCJpYXQiOjE1OTA0MDQ3NTV9.wRRJQCFtvCZzMTtucly2pmCL5WhsFBgBFDUo2CmJSaY'}

API queries:


heatwaves:


historical:


future_seasonal:


future_longterm:


coldsnaps:


historical:


future_seasonal:


future_longterm:



# LayerManager

In [0]:
datasets[1].attributes

{'application': ['copernicusClimate'],
 'attributesPath': None,
 'blockchain': {},
 'clonedHost': {},
 'connectorType': 'rest',
 'connectorUrl': 'http://35.233.41.65/user/skydipper/dataset/geometries_esp_nuts_lau_levels_0to4',
 'createdAt': '2020-05-25T17:05:43.277Z',
 'dataLastUpdated': None,
 'dataPath': None,
 'env': 'production',
 'errorMessage': 'Error - Error obtaining fields',
 'geoInfo': False,
 'isPrivate': False,
 'layer': [],
 'layerRelevantProps': [],
 'legend': {'binary': [],
  'boolean': [],
  'byte': [],
  'country': [],
  'date': [],
  'double': [],
  'float': [],
  'half_float': [],
  'integer': [],
  'keyword': [],
  'nested': [],
  'region': [],
  'scaled_float': [],
  'short': [],
  'text': []},
 'mainDateField': None,
 'metadata': [],
 'name': 'geometries_monthly_zs_nuts-level-234',
 'overwrite': False,
 'protected': False,
 'provider': 'cartodb',
 'published': True,
 'slug': 'geometries_monthly_zs_nuts-level-234',
 'sources': [],
 'status': 'failed',
 'subtitle': 

In [0]:



d = {
    "id": "protected-areas",
    "name": "Protected areas",
    "config": {
      "type": "vector",
      "source": {
        "type": "vector",
        "promoteId": "cartodb_id",
        "provider": {
          "type": "carto",
          "account": "skydipper",
          "layers": [
            {
              "options": {
                "cartocss": "#wdpa_protected_areas {  polygon-opacity: 1.0; polygon-fill: #704489 }",
                "cartocss_version": "2.3.0",
                "sql": "SELECT * FROM wdpa_protected_areas"
              },
              "type": "mapnik"
            }
          ]
        }
      },
      "render": {
        "layers": [
          {
            "type": "fill",
            "source-layer": "layer0",
            "featureState": {},
            "paint": {
              "fill-color": [
                "case",
                [
                  "boolean",
                  [
                    "feature-state",
                    "hover"
                  ],
                  false
                ],
                "#000",
                "#5ca2d1"
              ],
              "fill-color-transition": {
                "duration": 300,
                "delay": 0
              },
              "fill-opacity": 1
            }
          },
          {
            "type": "line",
            "source-layer": "layer0",
            "paint": {
              "line-color": "#000000",
              "line-opacity": 0.1
            }
          }
        ]
      }
    },
    "paramsConfig": [],
    "legendConfig": {
      "type": "basic",
      "items": [
        {
          "name": "Protected areas",
          "color": "#5ca2d1"
        }
      ]
    },
    "interactionConfig": {
      "enabled": true,
      "type": "hover"
    }
  }

In [0]:
import json

class SetEncoder(json.JSONEncoder):
  def default(self, obj):
    if isinstance(obj, set):
      return list(obj)
    return json.JSONEncoder.default(self, obj)

# vector carto layer for bioclimatic variables data
def gen_layer_manager(iso, scenario, biovar, year, buckets, rampColors, opacity = 1):
  d: dict = {
    "id": "layer-bioclimatic",
    "name": "Bioclimatic layer",
    "type": "vector",
    "active": "true",

    "sqlParams": {
      "where": {
        "iso3": iso
      },
      "where2": {
        biovar,
        scenario
      }
    },

    "source": {
      "type": "vector",
      "provider": {
        "type": "carto",
        "account": "simbiotica",
        "layers": [
          {
            "options": {
              "sql": "WITH a AS (SELECT cartodb_id, the_geom_webmercator, uuid, iso3 FROM all_geometry {{where}}) SELECT a.the_geom_webmercator, a.cartodb_id, b.uuid, b.timeinterval as year, b.biovar, b.scenario, b.wieghtedmean FROM ${iso.toLowerCase()}_zonal_bv_uuid as b INNER JOIN a ON b.uuid = a.uuid {{where2}}"
            },
            "type": "cartodb"
          }
        ]
      }
    },

    "render": {
      "maxzoom": 3,
      "minzoom": 2,
      "layers": [
        {
          "filter": ['==', 'year', year],
          "paint": {
            'fill-color': [
              'interpolate',
              ['linear'],
              ['get', 'wieghtedmean'],
              buckets[0],
              rampColors[0],
              buckets[1],
              rampColors[1],
              buckets[2],
              rampColors[2]
            ],
            "fill-opacity": opacity
          },
          "source-layer": "layer0",
          "type": "fill"
        }
      ]
    }
  }
  return d

l = gen_layer_manager(iso='SWE', scenario='rcp85', biovar='biovar1', year=2030, buckets=[0,10,100], rampColors=['blue', 'green', 'red'])
print(json.dumps(l, cls=SetEncoder))

{"id": "layer-bioclimatic", "name": "Bioclimatic layer", "type": "vector", "active": "true", "sqlParams": {"where": {"iso3": "SWE"}, "where2": ["biovar1", "rcp85"]}, "source": {"type": "vector", "provider": {"type": "carto", "account": "simbiotica", "layers": [{"options": {"sql": "WITH a AS (SELECT cartodb_id, the_geom_webmercator, uuid, iso3 FROM all_geometry {{where}}) SELECT a.the_geom_webmercator, a.cartodb_id, b.uuid, b.timeinterval as year, b.biovar, b.scenario, b.wieghtedmean FROM ${iso.toLowerCase()}_zonal_bv_uuid as b INNER JOIN a ON b.uuid = a.uuid {{where2}}"}, "type": "cartodb"}]}}, "render": {"maxzoom": 3, "minzoom": 2, "layers": [{"filter": ["==", "year", 2030], "paint": {"fill-color": ["interpolate", ["linear"], ["get", "wieghtedmean"], 0, "blue", 10, "green", 100, "red"], "fill-opacity": 1}, "source-layer": "layer0", "type": "fill"}]}}


In [0]:
# Display the associated webpage in a new window
import IPython
url = 'https://vizzuality.github.io/layer-manager/'
iframe = '<iframe src=' + url + ' width=100% height=800px></iframe>'
IPython.display.HTML(iframe)