# Geocube Data Access Tutorial

-------

**Short description**

This notebook introduces you to the Geocube Python Client. You will learn how to list available layers, construct a query over an AOI and a time interval, and retrieve cubes of data.

-------

**Requirements**

-------

- Python 3.7
- The Geocube Python Client library : https://github.com/airbusgeo/geocube-client-python.git
- The URL of a Geocube Server and its Client ApiKey (for the purpose of this notebook, the GEOCUBE_SERVER and GEOCUBE_CLIENTAPIKEY environment variables)

- To have done the **Geocube Data Indexation Tutorial** or to have access to a Geocube with data.
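
The two environment variables listed above can be read defensively before the client is created. The following is a minimal sketch, not part of the Geocube client: `connection_settings` is a hypothetical helper, the `127.0.0.1:8080` default is only the local example used later in this notebook, and the rule "TLS is disabled only for a local server" is an assumption.

```python
import os


def connection_settings(env=os.environ):
    """Return (server, api_key, secure), the values passed to Client(server, secure, api_key).

    Hypothetical helper: the defaults below are assumptions for local use.
    """
    server = env.get("GEOCUBE_SERVER", "127.0.0.1:8080")  # assumed local default
    api_key = env.get("GEOCUBE_CLIENTAPIKEY", "")         # usually empty for local use
    # Assumption: only a local server is reached without TLS.
    host = server.split(":")[0]
    secure = host not in ("127.0.0.1", "localhost")
    return server, api_key, secure


server, api_key, secure = connection_settings()
print(server, secure)
```

The returned values can then be passed to `Client` in the order used in section 1.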

-------

**Installation**

-------

Install Python client:

```shell
pip install --user git+https://github.com/airbusgeo/geocube-client-python.git
```

Run docker (example):
```shell
docker run --rm --network=host -v $(pwd)/inputs:$(pwd)/inputs geocube -dbConnection=postgresql://user:password@localhost:5432/geocube -local
```


## 1 - Connect to the Geocube


In [None]:
import math
import os
from datetime import datetime
import numpy as np
from matplotlib import pyplot as plt
%matplotlib inline

In [None]:
from geocube import Client, utils, entities

# Define the connection to the server
secure = False  # False for a local server, True to use TLS
geocube_client_server  = os.environ['GEOCUBE_SERVER']        # e.g. 127.0.0.1:8080 for local use
geocube_client_api_key = os.environ['GEOCUBE_CLIENTAPIKEY']  # Usually empty for local use

# Connect to the server
client = Client(geocube_client_server, secure, geocube_client_api_key)

## 2 - Get data in a nutshell
To extract data from the Geocube, you need a **record** and an **instance** of a variable.
If you don't already know these concepts, they will soon be defined in detail, but in short, a record defines the data-take and the variable defines the kind of data.

The data is retrieved as 2D arrays, that are defined by a **rectangle extent in a given CRS** and a **resolution**.
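
To make the link between origin, resolution and shape concrete, here is a small sketch that computes the rectangle covered by a tile. It assumes a north-up grid whose origin is the upper-left corner and whose pixel size is the same in x and y, and it assumes `shape` is given as (width, height); `tile_bounds` is a hypothetical helper, not a function of the Geocube client.

```python
def tile_bounds(origin_x, origin_y, resolution, shape):
    """Return (min_x, min_y, max_x, max_y) in CRS units for a north-up tile.

    Assumptions: (origin_x, origin_y) is the upper-left corner and
    shape is (width, height) in pixels.
    """
    width, height = shape
    max_x = origin_x + width * resolution
    min_y = origin_y - height * resolution  # y decreases when going down the image
    return (origin_x, min_y, max_x, origin_y)


# Same origin, resolution and shape as the example cell of this section
print(tile_bounds(532478, 6184957, 20, (128, 128)))
# (532478, 6182397, 535038, 6184957)
```

With a 20 m resolution and 128 x 128 pixels, the tile covers 2560 m in each direction.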

<img src="GetImage.png" width=400>

In [None]:
print("To get data: define the extent, the records and the variable")
cubeparams = entities.CubeParams.from_tile(
    tile=entities.Tile(crs="epsg:32632", transform=entities.geo_transform(532478, 6184957, 20), shape=(128,128)),
    instance=client.variable("NDVI").instance("master"),
    records=client.list_records(name="S2A_MSIL1C_20190224T103019_N0207_R108_T32UNG_20190224T141253")
)

print("And get the cube !")
images, _ = client.get_cube(cubeparams)

plt.rcParams['figure.figsize'] = [10, 8]
plt.imshow(images[0][..., 0])

## 2 - Records
A record defines a data-take by its geometry, its sensing time and user-defined tags that describe the context in more detail.
A record is usually linked to an image of a satellite. For example, the image taken by S2A over the 31TDL tile on the 1st of April 2018 is described by the record:
- **S2A_MSIL1C_20180401T105031_N0206_R051_T31TDL_20180401T144530**
    * **AOI** : _31TDL tile (POLYGON ((2.6859855651855 45.680294036865, 2.811126708984324 45.680294036865, 2.811126708984324 45.59909820556617, 2.6859855651855 45.59909820556617, 2.6859855651855 45.680294036865)))_
    * **DateTime** : _2018-04-01 10:50:31_
    
But a record can describe any product like a mosaic over a country, or a decade mosaic :
- **Mosaic of France January 2020**
    * **AOI** : _France_
    * **DateTime** : _2020-01-31 00:00:00_
 
<img src="RecordsSeveralLayers.png" width=400>
    

### List records
`list_records()` is used to search for records :
- By `name` (supports `*` and `?` to match any sequence of characters or any single character, and the `(?i)` suffix for case-insensitive matching)
- By `tags`, as a dictionary (for example `tags={"satellite":"SENTINEL%"}`). If the value is empty, `list_records` returns all the records that have the given tag (supports `*`, `?` and `(?i)`).
- By date, using `from_time` and `to_time`.
- By `aoi`, the records whose geometry intersects.
- By `instances_id` (future update). See below

All parameters are optional. `list_records` returns the records that match all parameters.
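
The filtering rules above can be illustrated locally. This is only a sketch of the semantics: the real matching is done by the Geocube server, `fnmatchcase` from the standard library is used as a stand-in for the `*` and `?` wildcards, records are represented by plain dictionaries, and the time bounds are assumed to be inclusive. `text_matches` and `record_matches` are hypothetical helpers.

```python
from datetime import datetime
from fnmatch import fnmatchcase


def text_matches(pattern, value):
    """Wildcard match; a trailing "(?i)" makes it case-insensitive."""
    if pattern.endswith("(?i)"):
        return fnmatchcase(value.lower(), pattern[:-4].lower())
    return fnmatchcase(value, pattern)


def record_matches(record, name=None, tags=None, from_time=None, to_time=None):
    """Return True only if the record satisfies every provided criterion."""
    if name is not None and not text_matches(name, record["name"]):
        return False
    for key, pattern in (tags or {}).items():
        if key not in record["tags"]:
            return False
        # An empty value only requires the tag to exist
        if pattern and not text_matches(pattern, record["tags"][key]):
            return False
    if from_time is not None and record["time"] < from_time:
        return False
    if to_time is not None and record["time"] > to_time:
        return False
    return True


record = {
    "name": "S2A_MSIL1C_20190224T103019_N0207_R108_T32UNG_20190224T141253",
    "tags": {"constellation": "SENTINEL2"},
    "time": datetime(2019, 2, 24, 10, 30, 19),
}
print(record_matches(record, name="S2A*T32UNG*", tags={"constellation": ""}))  # True
print(record_matches(record, name="s2a*"))                                     # False
print(record_matches(record, name="s2a*(?i)", to_time=datetime(2019, 1, 1)))   # False
```

The last call fails even though the name matches, because every criterion must hold at the same time.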

In [None]:
# Sentinel2 on Denmark from January to August 2019
aoi       = utils.read_aoi('inputs/Denmark.json')
from_time = datetime(2019, 1, 1)
to_time   = datetime(2019, 8, 31)

records = client.list_records(tags={"constellation":"SENTINEL2"}, aoi=aoi, from_time=from_time, to_time=to_time, with_aoi=True)

print('---------------------------------')
print('{} records found'.format(len(records)))
print('---------------------------------')

In [None]:
# Graphical visualization of records
import geopandas as gpd
from matplotlib import pyplot as plt
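# Note: gpd.datasets was deprecated in geopandas 0.14 and removed in 1.0.
# With a recent geopandas, download the Natural Earth data and pass its path to gpd.read_file instead.
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))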
base = world.plot(color='lightgrey', edgecolor='white')

aoi_gpd = gpd.GeoDataFrame({'id': ['1'], 'geometry': gpd.GeoSeries(aoi, crs='epsg:4326')})
aoi_gpd.plot(ax=base, color='None', edgecolor='black')

for r in records[:100]:
    r.geodataframe().plot(ax=base, alpha=0.1)
 

margin = 1
plt.xlim([aoi.bounds[0]-margin, aoi.bounds[2]+margin])
plt.ylim([aoi.bounds[1]-margin, aoi.bounds[3]+margin])


### Tags of records
The tags are user-defined and depend on the project (currently, no standard is implemented).

In [None]:
print(records[0].tags)

## 3 - Variables
A variable describes the kind of data stored in a product, for example _a spectral band, the NDVI, an RGB image, the backscatter, the coherence_...

This entity has what is needed to **describe**, **process** and **visualize** the product.

In particular, the variable has a `dformat` (for _data format_ ):
- dformat.dtype   : _data type_
- dformat.min     : theoretical minimum value
- dformat.max     : theoretical maximum value
- dformat.no_data : the NoData value

In the Geocube database, the internal data format of an indexed image may differ from the data format of the variable (for example, in order to optimize storage costs). When the data is retrieved, the Geocube maps the internal format to the data format of the variable. This mapping may produce values below the minimum or above the maximum; in that case, the values are not clipped.
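
The sketch below illustrates how such a mapping can leave the theoretical range. It assumes a purely linear mapping between the internal range and the variable range, which may differ from what the Geocube actually does; `to_variable_format` and the 0..250 internal range are hypothetical.

```python
def to_variable_format(value, internal_min, internal_max, var_min, var_max):
    """Linearly map a stored value to the variable's data format, without clipping."""
    scale = (var_max - var_min) / (internal_max - internal_min)
    return var_min + (value - internal_min) * scale


# Hypothetical storage: NDVI in [-1, 1] stored as integers in [0, 250]
for stored in (0, 125, 250, 255):
    print(stored, round(to_variable_format(stored, 0, 250, -1.0, 1.0), 3))
```

The stored value 255 lies outside the internal range, so it is mapped slightly above `dformat.max` and is returned as is.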


In [None]:
# Get a variable from its name
ndvi = client.variable(name="NDVI")
print(ndvi)

fieldimage = client.variable(name="RGB")
print(fieldimage)

In [None]:
# List all variables
client.list_variables()

### Instance of a variable
An instance is a variant of the variable, produced with specific processing parameters.

For example, the biophysical parameters can be processed specifically for a crop type (rapeseed, wheat, corn...) or with a generic profile (anycrop). The SAR products can be processed with different processing graphs or software (GammaSW, SNAP...), but they all belong to the same variable.

The processing parameters are provided in the `metadata` field of the instance.

In [None]:
print(client.variable(name="RGB").instance('master'))

## 4 - Dataset

As we saw in the introduction of this notebook, a dataset is defined by a **record** and an **instance** of a variable. The data is retrieved as 2D arrays that are defined by a **rectangle extent in a given CRS** and a **resolution**.
If necessary, the dataset is reprojected or scaled on the fly.

<img src="GetImage.png" width=400>

All this information is stored in a `CubeParams`. A simple call to `get_cube` with a `CubeParams` retrieves the datasets.

To easily declare a `CubeParams`, some helpers are available:
- `CubeParams.from_records`: the basic
- `CubeParams.from_tags`: a shortcut to provide `tags` and `date` interval instead of `records`.
- `CubeParams.from_tile`: see `tile_aoi` below

### Get a dataset

In [None]:
# Get a record
record = client.list_records(name="S2B_MSIL1C_20190118T104359_N0207_R008_T32UNG_20190118T123528")

# Get the variable RGB:master
rgb = client.variable(name="RGB").instance("master")

# Define the cubeParam
cube_params = entities.CubeParams.from_records(
    crs           = "epsg:32632",
    transform     = entities.geo_transform(563087,6195234, 200),
    shape         = (128, 128),
    instance      = rgb,
    records       = record)

# Get the dataset
images, _ = client.get_cube(cube_params, compression=9)

# Display
plt.imshow(images[0])

### Get a cube of data

A cube of data is defined by a **collection of records**, an **instance** and an **extent** in a **crs**.

So, to get a cube of data, we just need to provide more records !

The Geocube will optimize data access to serve the time series as efficiently as possible.
- It is faster to **request a time series** than to request each image one by one.
- It is better to **request a large area** (>256x256) than a lot of small areas (like a field).

`get_cube()` usually returns fewer results than records because:
- The requested area is smaller
- Images with only no-data are removed
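
The second reason can be pictured with a small local filter. This is only an illustration of the idea, not the server's implementation: images are plain nested lists, `drop_empty_images` is a hypothetical helper, and the no-data value 0 is an arbitrary choice.

```python
def drop_empty_images(images, records, no_data):
    """Keep only the (image, record) pairs where at least one pixel is valid."""
    kept = [
        (image, record)
        for image, record in zip(images, records)
        if any(pixel != no_data for row in image for pixel in row)
    ]
    return [image for image, _ in kept], [record for _, record in kept]


images = [
    [[0, 0], [0, 0]],   # only no-data: removed
    [[0, 7], [0, 0]],   # one valid pixel: kept
]
records = ["record_a", "record_b"]

kept_images, kept_records = drop_empty_images(images, records, no_data=0)
print(len(kept_images), kept_records)
# 1 ['record_b']
```

This is why `get_cube()` returns the records alongside the images: the two lists stay aligned after filtering.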

In [None]:
# Get records
records = client.list_records(name="S2*")

# Get the variable RGB:master
rgb = client.variable(name="RGB").instance("master")

# Define the cubeParam
cube_params = entities.CubeParams.from_records(
    crs           = "epsg:32632",
    transform     = entities.geo_transform(563087,6195234, 200),
    shape         = (128, 128),
    instance      = rgb,
    records       = records)

# Get the dataset
images_cube, records_cube = client.get_cube(cube_params, compression=9)

In [None]:
nbimages=len(images_cube)

plt.rcParams['figure.figsize'] = [20, 16]
for f in range(0, nbimages):
    plt.subplot(math.ceil(nbimages/4), 4, f+1)
    if images_cube[f].shape[2] == 1:
        plt.imshow(np.squeeze(images_cube[f], axis=2), 'gray', vmin=0, vmax=1)
    else:
        plt.imshow(images_cube[f])

### Get a cube of data grouped by records
The Geocube offers the possibility to group records. The datasets of a group of records are merged pixel by pixel using the most recent one.

NB : To be used carefully : the edges are not blended and it may result in visible seams.
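
The merge rule can be sketched on small nested lists. The sketch assumes that no-data pixels of a recent image do not overwrite valid pixels of an older one, which is an interpretation of "using the most recent one"; `merge_most_recent` is a hypothetical helper and the no-data value 0 is arbitrary.

```python
def merge_most_recent(images, no_data):
    """Merge same-shaped images, ordered from oldest to most recent.

    Each output pixel takes the value of the most recent image
    in which that pixel is not no-data (assumed behaviour).
    """
    merged = [row[:] for row in images[0]]  # copy of the oldest image
    for image in images[1:]:
        for r, row in enumerate(image):
            for c, pixel in enumerate(row):
                if pixel != no_data:
                    merged[r][c] = pixel
    return merged


older = [[1, 1], [1, 1]]
newer = [[0, 2], [2, 0]]  # 0 is no-data
print(merge_most_recent([older, newer], no_data=0))
# [[1, 2], [2, 1]]
```

The hard switch between values from `older` and `newer` is what produces the visible seams mentioned above.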

In [None]:

# Get records 
recordsS2B = client.list_records(
    tags          = {"satellite": "SENTINEL2B"},
    from_time     = datetime(2019, 1, 4),
    to_time       = datetime(2019, 1, 6)
)

# Define an extent and a resolution
cube_params = entities.CubeParams.from_records(
    crs           = "epsg:32632",
    transform     = entities.geo_transform(540694, 6303946, 200),
    shape         = (128, 128),
    instance      = client.variable("NDVI").instance("master"),
    records       = recordsS2B,
)    

# Let's get the cube
print('---------------------------------')
print('Get cube without grouped records')
print('---------------------------------')
not_grouped, ng_records = client.get_cube(cube_params, compression=9)

plt.subplot(2, 2, 1)
plt.imshow(np.squeeze(not_grouped[0], axis=2), 'gray', vmin=0, vmax=1)
plt.subplot(2, 2, 2)
plt.imshow(np.squeeze(not_grouped[1], axis=2), 'gray', vmin=0, vmax=1)
                       
# Now, let's group records by date
cube_params.records = entities.Record.group_by(recordsS2B, lambda r: r.datetime.date())


print('---------------------------------')
print('Get cube with grouped records')
print('---------------------------------')
grouped, _ = client.get_cube(cube_params, compression=9)

plt.subplot(2, 2, 3)
plt.imshow(np.squeeze(grouped[0], axis=2), 'gray', vmin=0, vmax=1)

print("Finished !")

## 5 - Get data covering a large AOI with tiling

If the AOI is too large to be retrieved in a single request, it must be tiled.

### Tile an AOI
To avoid defining the extents by hand, the `tile_aoi` function transforms an AOI (in EPSG:4326) into a set of rectangle extents.
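
The principle of tiling can be sketched on a simple bounding box. This is not how `tile_aoi` is implemented: the sketch skips the reprojection from EPSG:4326, covers the whole bounding box instead of keeping only the tiles that intersect the AOI, and assumes `shape` is (width, height). `tile_origins` and the bounds used below are hypothetical.

```python
import math


def tile_origins(bounds, resolution, shape):
    """Return the upper-left corners of the tiles covering a bounding box.

    bounds is (min_x, min_y, max_x, max_y) in the target CRS;
    shape is (width, height) in pixels (assumed order).
    """
    min_x, min_y, max_x, max_y = bounds
    tile_width = shape[0] * resolution
    tile_height = shape[1] * resolution
    nb_x = math.ceil((max_x - min_x) / tile_width)
    nb_y = math.ceil((max_y - min_y) / tile_height)
    return [
        (min_x + i * tile_width, max_y - j * tile_height)
        for j in range(nb_y)
        for i in range(nb_x)
    ]


# Hypothetical 50 km x 30 km box, tiled with 1024 x 1024 pixels at 20 m
origins = tile_origins((500000, 6100000, 550000, 6130000), resolution=20, shape=(1024, 1024))
print(len(origins), origins[0], origins[-1])
# 6 (500000, 6130000) (540960, 6109520)
```

Each origin, combined with the resolution and the shape, defines one rectangle extent that can be requested separately.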


In [None]:
tiles = client.tile_aoi(aoi, crs="epsg:32632", resolution=20, shape=(1024, 1024))

print(f"AOI covered by {len(tiles)} tiles")

# Graphical visualization of tiles
base = entities.Tile.plot(tiles)

import geopandas as gpd
aoi_gpd = gpd.GeoDataFrame({'id': ['1'], 'geometry': gpd.GeoSeries(aoi, crs='epsg:4326')})
aoi_gpd.plot(ax=base, color='None', edgecolor='black')

### Get a cube from tile
Given a tile, the function `CubeParams.from_tile` makes it easy to get a cube.

In [None]:
cube_params = entities.CubeParams.from_tile(
    tile          = tiles[79],
    instance      = client.variable("RGB").instance("master"),
    tags          = {"source":"tutorial"},
    from_time     = datetime(2019, 1, 1),
    to_time       = datetime(2019, 5, 1),
)

_ = client.get_cube(cube_params, compression=9)

print("Finished !")

## 6 - Some useful functions

### Get a tile, a dataset or a cube from record aoi
The function `Tile.from_record` creates a tile covering the aoi of the record.
Then, a dataset can be easily downloaded using `CubeParams.from_tile`.

Make sure that the AOI of the record is loaded, using `client.load_aoi`.

In [None]:
# Get a random record
record = client.list_records("*T32UNG*", tags={"source":"tutorial"}, limit=1, with_aoi=True)[0]

# Get the tile from the record
tile = entities.Tile.from_record(record=record, crs="epsg:32632", resolution=120)
print(tile)

# Define an extent and a resolution
cube_params = entities.CubeParams.from_tile(tile,
                                            instance=client.variable("RGB").instance("master"),
                                            records=[record])

# Get the dataset
images, _ = client.get_cube(cube_params, compression=9)

# Display
plt.imshow(images[0])

### Export to tiff

In [None]:
# Create the output folder if it does not exist yet
os.makedirs('outputs', exist_ok=True)

transform = entities.geo_transform(563087,6195234, 200)
crs = "epsg:32632"

for image, record_group in zip(images_cube, records_cube):
    # Name each file after the date of the first record of the group
    filename = os.path.join(os.getcwd(), 'outputs', '{}.tif'.format(record_group[0].datetime.strftime("%Y-%m-%d")))
    utils.image_to_geotiff(image, transform, crs, None, filename)

!ls './outputs/'

### Bonus : Create a time series animation

In [None]:
from geocube.utils import timeserie_to_animation

# Convert each image of the cube to 8-bit unsigned integers
imagesu1 = [image.astype(np.uint8) for image in images_cube]


timeserie_to_animation(imagesu1, "./outputs/animation.gif", duration=0.5)

from IPython.display import Image, display
with open(os.getcwd() + '/outputs/animation.gif','rb') as f:
    display(Image(data=f.read(), format='png', width=512, height=512))


## 7 - Conclusion
In this notebook, you have learnt to list records, load variables and retrieve cubes of data over a large AOI.

With the Geocube Client, you can also create AOIs, records and variables, and edit or delete them. You can index new datasets and optimize the data. This will be the topic of another notebook.