## Analysis Ready Sentinel-1 Backscatter Imagery with Intake

In [1]:
import datetime

import intake
import pandas as pd
import xarray as xr
from distributed import Client

In [2]:
client = Client(processes=True, n_workers=4, threads_per_worker=1)
client

Client
  Scheduler: tcp://127.0.0.1:53780
  Dashboard: http://127.0.0.1:8787/status
Cluster
  Workers: 4
  Cores: 4
  Memory: 16.00 GiB


## Driver registry

In [3]:
for key, value in intake.registry.items():
    print(f'short name={key}    --->     implementation={value}')

short name=yaml_file_cat    --->     implementation=<class 'intake.catalog.local.YAMLFileCatalog'>
short name=yaml_files_cat    --->     implementation=<class 'intake.catalog.local.YAMLFilesCatalog'>
short name=alias    --->     implementation=<class 'intake.source.derived.AliasSource'>
short name=catalog    --->     implementation=<class 'intake.catalog.base.Catalog'>
short name=csv    --->     implementation=<class 'intake.source.csv.CSVSource'>
short name=intake_remote    --->     implementation=<class 'intake.catalog.remote.RemoteCatalog'>
short name=ndzarr    --->     implementation=<class 'intake.source.zarr.ZarrArraySource'>
short name=numpy    --->     implementation=<class 'intake.source.npy.NPySource'>
short name=textfiles    --->     implementation=<class 'intake.source.textfiles.TextFilesSource'>
short name=zarr_cat    --->     implementation=<class 'intake.catalog.zarr.ZarrGroupCatalog'>
short name=netcdf    --->     implementation=<class 'intake_xarray.netcdf.NetCDFSource'>
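
The registry printed above behaves like a dictionary that maps short driver names to the classes implementing them. The pattern can be sketched in plain Python as follows. This is an illustration only: `DRIVER_REGISTRY`, `register`, `CSVLikeSource`, and `open_source` are hypothetical names and are not part of Intake's API.

```python
# Hypothetical stand-in for a driver registry: short name -> class.
DRIVER_REGISTRY = {}


def register(name):
    """Class decorator that records a driver class under a short name."""
    def decorator(cls):
        DRIVER_REGISTRY[name] = cls
        return cls
    return decorator


@register("csv_like")
class CSVLikeSource:
    """Toy source that only remembers the path it was given."""

    def __init__(self, urlpath):
        self.urlpath = urlpath


def open_source(driver, **kwargs):
    """Look up the driver class by its short name and instantiate it."""
    try:
        cls = DRIVER_REGISTRY[driver]
    except KeyError:
        raise ValueError(f"unknown driver: {driver!r}") from None
    return cls(**kwargs)


source = open_source("csv_like", urlpath="data.csv")
print(type(source).__name__, source.urlpath)
```

Keeping the lookup in one dictionary is what lets a catalog file refer to a driver by a short string instead of an import path.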

## Open a catalog

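- Use `intake.open_catalog` to load our YAML catalog; for a single YAML file it selects the `yaml_file_cat` driver (`YAMLFileCatalog`), as the output below shows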

In [4]:
cat = intake.open_catalog("../catalogs/sentinel-1-aws-catalog-cache.yaml")
cat

sentinel-1-aws-catalog-cache:
  args:
    path: ../catalogs/sentinel-1-aws-catalog-cache.yaml
  description: ''
  driver: intake.catalog.local.YAMLFileCatalog
  metadata: {}


In [5]:
list(cat)

['sentinel_1_aws']

If the name of the data source is a valid Python identifier, we can use dot notation to access the source:

In [6]:
cat.sentinel_1_aws()

sentinel_1_aws:
  args:
    chunks:
      y: 2745
    storage_options:
      anon: true
    urlpath: s3://sentinel-s1-rtc-indigo/tiles/RTC/1/IW/14/T/PN/2020/S1A_20200801_14TPN_ASC/Gamma0_VV.tif
  description: 'Analysis Ready Sentinel-1 Backscatter Imagery. Documentation -->

    https://sentinel-s1-rtc-indigo-docs.s3-us-west-2.amazonaws.com/data_format.html#data-structure

    '
  driver: intake_xarray.raster.RasterIOSource
  metadata:
    cache:
    - argkey: urlpath
      type: file
    catalog_dir: /Users/abanihi/devel/andersy005/intake-tutorial/notebooks/../catalogs/


We can use dictionary syntax, too. This also works for data sources whose names aren't valid Python identifiers (for example, `sentinel-1-aws` or `sentinel 1 aws`):

In [7]:
cat['sentinel_1_aws']

sentinel_1_aws:
  args:
    chunks:
      y: 2745
    storage_options:
      anon: true
    urlpath: s3://sentinel-s1-rtc-indigo/tiles/RTC/1/IW/14/T/PN/2020/S1A_20200801_14TPN_ASC/Gamma0_VV.tif
  description: 'Analysis Ready Sentinel-1 Backscatter Imagery. Documentation -->

    https://sentinel-s1-rtc-indigo-docs.s3-us-west-2.amazonaws.com/data_format.html#data-structure

    '
  driver: intake_xarray.raster.RasterIOSource
  metadata:
    cache:
    - argkey: urlpath
      type: file
    catalog_dir: /Users/abanihi/devel/andersy005/intake-tutorial/notebooks/../catalogs/
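
Whether a source name can be used with dot notation can be checked with the standard library alone. The sketch below uses `str.isidentifier()` and `keyword.iskeyword()`; the helper name `supports_dot_access` is made up for this example. The keyword check is included because a reserved word such as `class` passes `isidentifier()` but still cannot appear after a dot.

```python
import keyword


def supports_dot_access(name):
    """True if `name` can be written as `cat.<name>` in Python source."""
    return name.isidentifier() and not keyword.iskeyword(name)


for name in ["sentinel_1_aws", "sentinel-1-aws", "sentinel 1 aws", "class"]:
    syntax = "cat.<name> or cat['<name>']" if supports_dot_access(name) else "cat['<name>'] only"
    print(f"{name!r}: {syntax}")
```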


## Retrieve catalog entries of interest

In [8]:
cat.sentinel_1_aws(day=22, year=2021, month=4)

sentinel_1_aws:
  args:
    chunks:
      y: 2745
    storage_options:
      anon: true
    urlpath: s3://sentinel-s1-rtc-indigo/tiles/RTC/1/IW/14/T/PN/2021/S1A_20210422_14TPN_ASC/Gamma0_VV.tif
  description: 'Analysis Ready Sentinel-1 Backscatter Imagery. Documentation -->

    https://sentinel-s1-rtc-indigo-docs.s3-us-west-2.amazonaws.com/data_format.html#data-structure

    '
  driver: intake_xarray.raster.RasterIOSource
  metadata:
    cache:
    - argkey: urlpath
      type: file
    catalog_dir: /Users/abanihi/devel/andersy005/intake-tutorial/notebooks/../catalogs/
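
Comparing the two outputs above, the `day`, `month`, and `year` parameters end up in the `urlpath`. The catalog file itself is not shown here, so the following is only a sketch of that substitution using a Python format string; the real catalog may use a different templating syntax. `URL_TEMPLATE` and `build_urlpath` are hypothetical names, and the platform (`S1A`), tile (`14TPN`), and orbit direction (`ASC`) are hard-coded as they appear in the outputs.

```python
import datetime

# Assumed template, reverse-engineered from the urlpath values printed above.
URL_TEMPLATE = (
    "s3://sentinel-s1-rtc-indigo/tiles/RTC/1/IW/14/T/PN/"
    "{date:%Y}/S1A_{date:%Y%m%d}_14TPN_ASC/Gamma0_VV.tif"
)


def build_urlpath(year, month, day):
    """Fill the template from the three date parameters."""
    date = datetime.date(year, month, day)  # raises ValueError for impossible dates
    return URL_TEMPLATE.format(date=date)


print(build_urlpath(2021, 4, 22))
```

Going through `datetime.date` also zero-pads the month and day, which is why `month=4` becomes `04` in the path.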


In [9]:
# Invalid parameters
cat.sentinel_1_aws(day=1, year=2010, month=6)

ValueError: year=2010 is less than 2016

In [None]:
cat.sentinel_1_aws(orbit_direction='test')
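
The `ValueError` above shows that parameter values are checked before a source is built. A rough sketch of such checks in plain Python follows. The function names are hypothetical, the lower bound of 2016 is taken from the error message, and the set of allowed orbit directions is an assumption (only `ASC` appears in the outputs here).

```python
def check_minimum(name, value, minimum):
    """Reject values below `minimum`, using the message format seen above."""
    if value < minimum:
        raise ValueError(f"{name}={value} is less than {minimum}")
    return value


def check_allowed(name, value, allowed):
    """Reject values outside a fixed set of choices."""
    if value not in allowed:
        raise ValueError(f"{name}={value!r} is not one of {sorted(allowed)}")
    return value


try:
    check_minimum("year", 2010, 2016)
except ValueError as err:
    print(err)

try:
    # Hypothetical choices; the catalog's actual allowed values are not shown.
    check_allowed("orbit_direction", "test", {"ASC", "DSC"})
except ValueError as err:
    print(err)
```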

## Load data into an appropriate data container

- Use `.to_dask()` to lazily load catalog entries into a data container (NumPy array, pandas DataFrame, xarray objects)
    - This is the appropriate method for large, remote datasets
- Use `.read()` to eagerly load data into memory
- The data container is defined by the driver


In [10]:
%%time

ds = cat.sentinel_1_aws(day=22, year=2021, month=4).to_dask()
ds

CPU times: user 158 ms, sys: 59.6 ms, total: 218 ms
Wall time: 312 ms


| | Array | Chunk |
|---|---|---|
| Bytes | 114.98 MiB | 57.49 MiB |
| Shape | (1, 1, 5490, 5490) | (1, 1, 2745, 5490) |
| Count | 5 Tasks | 2 Chunks |
| Type | float32 | numpy.ndarray |
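
The sizes in the array summary above follow directly from the shapes and the 4-byte `float32` item size. A quick check using only the standard library (the helper `nbytes` is a name invented for this sketch):

```python
import math

ITEMSIZE = 4   # bytes per float32 value
MIB = 2**20    # bytes per mebibyte


def nbytes(shape, itemsize=ITEMSIZE):
    """Uncompressed in-memory size of an array with the given shape."""
    return math.prod(shape) * itemsize


array_shape = (1, 1, 5490, 5490)
chunk_shape = (1, 1, 2745, 5490)   # from `chunks: {y: 2745}` in the catalog entry

array_mib = nbytes(array_shape) / MIB
chunk_mib = nbytes(chunk_shape) / MIB
# Number of chunks = product over dimensions of ceil(size / chunk size).
n_chunks = math.prod(math.ceil(a / c) for a, c in zip(array_shape, chunk_shape))

print(f"array: {array_mib:.2f} MiB, chunk: {chunk_mib:.2f} MiB, chunks: {n_chunks}")
# array: 114.98 MiB, chunk: 57.49 MiB, chunks: 2
```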


In [17]:
date_range = pd.date_range(start='2021-01-01', end='2021-04-01')


# Function for cleaning the data: rename band -> time and create datetime object
def preprocess(ds):
    ds["band"] = [datetime.datetime.fromisoformat(ds.attrs["DATE"])]
    ds = ds.rename({'band': 'time'})
    return ds


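def retrieve_dataset(value):
    # Best-effort load: any failure for a given date (presumably a day with
    # no acquisition) returns None so that it can be filtered out below.
    try:
        ds = cat.sentinel_1_aws(year=value.year, month=value.month, day=value.day).to_dask()
        return preprocess(ds)
    except Exception:
        return None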


datasets = client.map(retrieve_dataset, date_range)
datasets = client.gather(datasets)
datasets = [dataset for dataset in datasets if dataset is not None]
ds = xr.concat(datasets, dim='time', compat='override', coords='minimal').squeeze()
ds

| | Array | Chunk |
|---|---|---|
| Bytes | 1.46 GiB | 57.49 MiB |
| Shape | (13, 5490, 5490) | (1, 2745, 5490) |
| Count | 117 Tasks | 26 Chunks |
| Type | float32 | numpy.ndarray |
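
The cell above follows a map, gather, filter pattern: try every date in the range in parallel, then keep only the dates that returned data. The sketch below reproduces that pattern with the standard library, swapping in `concurrent.futures.ThreadPoolExecutor` for the Dask `Client`. The `fetch` function and the `AVAILABLE` dates are placeholders invented for the example and do not reflect the real acquisition schedule.

```python
import datetime
from concurrent.futures import ThreadPoolExecutor

# Hypothetical acquisition dates; the real ones are determined by the S3 bucket.
AVAILABLE = {datetime.date(2021, 1, 4), datetime.date(2021, 1, 16)}


def fetch(day):
    """Return a record for `day`, or None if no scene exists for it."""
    try:
        if day not in AVAILABLE:
            raise FileNotFoundError(day.isoformat())
        return {"time": day}
    except FileNotFoundError:
        return None


# Every day from 2021-01-01 through 2021-04-01 inclusive (91 days).
start = datetime.date(2021, 1, 1)
days = [start + datetime.timedelta(days=i) for i in range(91)]

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fetch, days))   # results keep the order of `days`

found = [r for r in results if r is not None]
print(f"{len(found)} of {len(days)} dates returned data")
```

This sketch catches only `FileNotFoundError`, so unrelated failures such as typos or network misconfiguration still surface instead of being silently dropped.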


## Visualize data 

Let's use hvplot to interactively visualize the constructed dataset. Because we're plotting full-resolution arrays, it's important to set the `rasterize=True` keyword argument, which uses the Datashader library to pre-render images before sending them to the browser.

This is powerful because the resolution updates as you zoom in, and you can scrub through the time steps with an interactive slider widget.

In [18]:
import hvplot.xarray  # noqa: F401 (imported to register the .hvplot accessor on xarray objects)

width = 800
height = 400
widget_type = 'scrubber'
widget_location = 'bottom'


ds.hvplot.image(
    rasterize=True,
    aspect='equal',
    x="x",
    y="y",
    cmap='gray',
    clim=(0, 0.4),
    width=width,
    height=height,
    widget_type=widget_type,
    widget_location=widget_location,
)