# Global Precipitation Measurement <img align="right" src="../../resources/easi_logo.jpg">

#### Index
- [Overview](#Overview)
- [Setup (dask, imports, query)](#Setup)
- [Product definition (measurements, flags)](#Product-definition)
- [Quality layer (mask)](#Quality-layer)
- [Masking and scaling](#Masking-and-Scaling)
- [Visualisation](#Visualisation)
- [Appendix](#Appendix)

## Overview

The Global Precipitation Measurement (GPM) mission is an international network of satellites that provide next-generation global observations of rain and snow (https://gpm.nasa.gov/missions/GPM).

The Integrated Multi-satellitE Retrievals for GPM (IMERG) is the unified U.S. algorithm that provides the multi-satellite precipitation product for the U.S. GPM team. The precipitation estimates from the various precipitation-relevant satellite passive microwave (PMW) sensors comprising the GPM constellation are computed using the 2017 version of the Goddard Profiling Algorithm (GPROF2017), then gridded, intercalibrated to the GPM Combined Ku Radar-Radiometer Algorithm (CORRA) product, and merged into half-hourly 0.1°x0.1° (roughly 10x10 km) fields (https://disc.gsfc.nasa.gov/datasets/GPM_3IMERGM_06/summary).
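
As a rough check of the "roughly 10x10 km" figure, the sketch below computes the global grid shape and the approximate size of one cell on a 0.1° grid. It assumes a spherical Earth with a mean radius of 6371 km, which is an approximation introduced here, and `cell_size_km` is an illustrative name rather than part of any library.

```python
import math

EARTH_RADIUS_KM = 6371.0  # Assumed mean Earth radius (spherical approximation)
RESOLUTION_DEG = 0.1      # IMERG grid spacing in degrees


def cell_size_km(latitude_deg, resolution_deg=RESOLUTION_DEG):
    """Approximate (width, height) in km of one grid cell centred at a latitude."""
    km_per_degree = 2 * math.pi * EARTH_RADIUS_KM / 360
    height = resolution_deg * km_per_degree
    # East-west extent shrinks with the cosine of the latitude
    width = height * math.cos(math.radians(latitude_deg))
    return width, height


# Number of cells in a global grid
n_longitude = round(360 / RESOLUTION_DEG)
n_latitude = round(180 / RESOLUTION_DEG)
print(n_longitude, n_latitude)

for lat in (0, 30, 60):
    width, height = cell_size_km(lat)
    print(f'latitude {lat:>2}: {width:.1f} km x {height:.1f} km')
```

Under these assumptions a cell is about 11 km on a side at the equator and narrows in the east-west direction towards the poles.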



#### Data source and documentation

| Product name | Long name | Notes |
|--|--|--|
| gpm_3imerg_month | GPM IMERG Final Precipitation L3 1 month 0.1 degree x 0.1 degree V06 (GPM_3IMERGM) | "Final" is available ~3.5 months after the observation month |

- [Algorithm Theoretical Basis Document (ATBD), Version 06, 13 March 2019](https://docserver.gesdisc.eosdis.nasa.gov/public/project/GPM/IMERG_ATBD_V06.pdf)
- [Integrated Multi-satellitE Retrievals for GPM (IMERG) Technical Documentation, 6 October 2020](https://docserver.gesdisc.eosdis.nasa.gov/public/project/GPM/IMERG_doc.06.pdf)
- [IMERG V06 Quality Index, 15 March 2019](https://docserver.gesdisc.eosdis.nasa.gov/public/project/GPM/IMERGV06_QI.pdf)
- [README Document for the GPM Data, 03/12/2017](https://gpm1.gesdisc.eosdis.nasa.gov/data/GPM_L3/doc/README.GPM.pdf)
- [V06 IMERG Release Notes, 6 October 2020](https://docserver.gesdisc.eosdis.nasa.gov/public/project/GPM/IMERG_V06_release_notes.pdf)

#### EASI pipeline

| Task | Summary |
|------|---------|
| Source | https://gpm1.gesdisc.eosdis.nasa.gov/data/GPM_L3/GPM_3IMERGM.06 |
| Download | Manual |
| Preprocess | Extract HDF5 variables |
| Format | COG |
| Prepare | EO3 |
| TODO | |

| Measurement name | Description |
|--|--|
| precipitation | Merged satellite-gauge precipitation estimate (recommended for general use) |
| random_error | Random error for merged satellite-gauge precipitation. Computed as in Huffman (1997) |
| gauge_relative_weighting | Weighting of gauge precipitation relative to the multi-satellite precipitation |
| probability_liquid_precipitation | Accumulation-weighted probability of liquid precipitation phase. Essentially the fraction of the monthly accumulation that fell as liquid |
| precipitation_quality_index | Quality Index for precipitation field. Equivalent number of gauges per 2.5° box<br>The approximate number of gauges required to produce the estimated random error, given the estimated precipitation<br>0-2 = Equivalent to the gauge coverage in regions such as central Africa, where the lack of data in a gauge-only analysis is a critical problem<br>2-4 = Mid-range has enough gauge data to ensure reasonable bias adjustment, but still requires interpolation to fill in gaps several grid boxes wide between stations more or less routinely<br>4+ = Developed areas with good-to-excellent gauge networks |
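
The three quality index ranges in the table can be turned into integer classes, for example to count pixels per class. This is a minimal sketch using `numpy.digitize` on made-up values. The table does not say which class a value of exactly 2 or 4 belongs to, so assigning them to the higher class is an assumption, and the labels and function name are illustrative.

```python
import numpy as np

# Class boundaries from the table above. Treating a value of exactly 2 or 4 as
# belonging to the higher class is an assumption made for this sketch.
QI_BINS = [2, 4]
QI_LABELS = ['sparse gauges (0-2)', 'mid-range (2-4)', 'good gauge network (4+)']


def classify_quality_index(qi):
    """Return an integer class (0, 1 or 2) for each quality index value."""
    return np.digitize(qi, QI_BINS)


sample = np.array([0.0, 1.5, 2.0, 3.9, 4.0, 12.0])   # Made-up values
classes = classify_quality_index(sample)
for value, cls in zip(sample, classes):
    print(f'{value:5.1f} -> {QI_LABELS[cls]}')
```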

## Setup

#### Dask

In [None]:
from dask.distributed import Client

client = Client("tcp://10.0.79.184:37537")
client

#### Imports

In [None]:
# Data tools
import numpy as np
import xarray as xr
import pandas as pd
from datetime import datetime

# Datacube
import datacube
from datacube.utils import masking  # https://github.com/opendatacube/datacube-core/blob/develop/datacube/utils/masking.py
from odc.algo import enum_to_bool   # https://github.com/opendatacube/odc-tools/blob/develop/libs/algo/odc/algo/_masking.py
from odc.algo import xr_reproject   # https://github.com/opendatacube/odc-tools/blob/develop/libs/algo/odc/algo/_warp.py
from datacube.utils.geometry import GeoBox, box  # https://github.com/opendatacube/datacube-core/blob/develop/datacube/utils/geometry/_base.py
from datacube.utils.rio import configure_s3_access

# Holoviews, Datashader and Bokeh
import hvplot.pandas
import hvplot.xarray
import holoviews as hv
import panel as pn
import colorcet as cc
import cartopy.crs as ccrs
from datashader import reductions
from holoviews import opts
# import geoviews as gv
# from holoviews.operation.datashader import rasterize
hv.extension('bokeh', logo=False)

# Python
import sys, os, re

# Optional EASI tools
sys.path.append(os.path.expanduser('~/hub-notebooks/scripts'))
import notebook_utils

#### ODC database

In [None]:
# For development products:
#  - This is a development ODC database while we test and demo this product.
# CONF = """
# [datacube]
# db_hostname: v2-db-easihub-csiro-eks.cluster-ro-cvaedcg0qvwd.ap-southeast-2.rds.amazonaws.com
# db_database: user_dev_odc
# db_username: user
# db_password: secretpassword
# """
# from datacube.config import read_config, LocalConfig
# dc = datacube.Datacube(config=LocalConfig(read_config(CONF)), env='datacube')

# dc = datacube.Datacube()

#### Example query

Change any of the parameters in the query object below to adjust the location, time, projection, or spatial resolution of the returned datasets.

Use the Explorer interface to check the temporal and spatial coverage for each product:
- https://explorer.csiro.easi-eo.solutions  + /product (when available)

In [None]:
# Query extent (global), time range and product name
min_longitude, max_longitude = (-180, 180)
min_latitude, max_latitude = (-90, 90)
min_date = '2000-06-01'
max_date = '2021-12-31'
product = 'gpm_3imerg_month'

native_crs = 'epsg:4326'

query = {
    'product': product,                     # Product name
    'x': (min_longitude, max_longitude),    # "x" axis bounds
    'y': (min_latitude, max_latitude),      # "y" axis bounds
    'time': (min_date, max_date),           # Any parsable date strings
    'output_crs': native_crs,               # EPSG code
    'resolution': (-0.1, 0.1),              # Target resolution
    'group_by': 'solar_day',                # Scene ordering
    'dask_chunks': {'latitude': 2048, 'longitude': 2048},  # Dask chunks
}
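
To illustrate adjusting the query, the sketch below derives a regional, single-year query from a global one without modifying the original dictionary. `regional_query` is a hypothetical helper (not part of datacube), and the bounding box and dates are arbitrary example values.

```python
def regional_query(base_query, longitude, latitude, time):
    """Return a copy of base_query restricted to a bounding box and time range."""
    updated = dict(base_query)       # Shallow copy so the original is unchanged
    updated.update({'x': longitude, 'y': latitude, 'time': time})
    return updated


# Stand-in for the global query defined in this notebook
base_query = {
    'product': 'gpm_3imerg_month',
    'x': (-180, 180),
    'y': (-90, 90),
    'time': ('2000-06-01', '2021-12-31'),
    'output_crs': 'epsg:4326',
    'resolution': (-0.1, 0.1),
}

# Arbitrary example: a box around Australia for one calendar year
australia_2020 = regional_query(
    base_query,
    longitude=(110, 155),
    latitude=(-45, -10),
    time=('2020-01-01', '2020-12-31'),
)
print(australia_2020)
```

The resulting dictionary could be passed to `dc.load(**australia_2020)` in place of the global query.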

In [None]:
# Load data
dc = datacube.Datacube()
data = dc.load(**query)

notebook_utils.heading(notebook_utils.xarray_object_size(data))
display(data)

# Calculate valid (not nodata) masks for each layer
valid_mask = masking.valid_data_mask(data)
notebook_utils.heading('Valid data masks for each variable')
display(valid_mask)
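
Conceptually, a valid-data mask flags every pixel that differs from the layer's `nodata` value. The sketch below mimics that idea with plain NumPy on made-up values. It is not the datacube implementation, `valid_mask_sketch` is a hypothetical name, and the nodata value is chosen for illustration rather than read from the product.

```python
import numpy as np


def valid_mask_sketch(values, nodata):
    """True where a pixel holds data, False where it equals the nodata value."""
    values = np.asarray(values)
    if nodata is None:
        return np.ones(values.shape, dtype=bool)   # No nodata defined: all valid
    if np.isnan(nodata):
        return ~np.isnan(values)                   # NaN never compares equal to itself
    return values != nodata


NODATA = -9999.9   # Illustrative nodata value, not read from the product
precip = np.array([[0.12, NODATA], [0.0, 0.35]])   # Made-up values
mask = valid_mask_sketch(precip, NODATA)
print(mask)
```

Note that a precipitation value of zero is valid data; only pixels equal to the nodata value are masked.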

## Product definition

Display the measurement definitions for the selected product.

Use `list_measurements` to show the details for a product, and `masking.describe_variable_flags` to show the flag definitions.

In [None]:
# Measurement definitions for the selected product
measurement_info = dc.list_measurements().loc[query['product']]
notebook_utils.heading(f'Measurement table for product: {query["product"]}')
notebook_utils.display_table(measurement_info)

# Separate lists of measurement names and flag names
measurement_names = measurement_info[ pd.isnull(measurement_info.flags_definition)].index
flag_names        = measurement_info[pd.notnull(measurement_info.flags_definition)].index

notebook_utils.heading('Selected Measurement and Flag names')
notebook_utils.display_table(pd.DataFrame({
    'group': ['Measurement names', 'Flag names'],
    'names': [', '.join(measurement_names), ', '.join(flag_names)]
}))

# Flag definitions
for flag in flag_names:
    notebook_utils.heading(f'Flag definition table for flag name: {flag}')
    notebook_utils.display_table(masking.describe_variable_flags(data[flag]))

## Quality layer

In [None]:
# Make flags image
flag_name = 'precipitation_quality_index'
flag_data = data[[flag_name]].where(valid_mask[flag_name]).persist()   # Dataset
display(flag_data)

In [None]:
# These options manipulate the color map and colorbar to show the categories for this product
options = {
    'title': f'Flag data for: {query["product"]} ({flag_name})',
    'cmap': cc.rainbow,
    'colorbar': True,
    'width': 800,
    'height': 450,
    'aspect': 'equal',
    'tools': ['hover'],
}

# Set the Dataset CRS
plot_crs = native_crs
if plot_crs == 'epsg:4326':
    plot_crs = ccrs.PlateCarree()


# Native data and coastline overlay:
# - Comment out `crs`, `projection` and `coastline` to plot in native_crs coordinates
# TODO: Update the axis labels to 'longitude', 'latitude' if `coastline` is used

quality_plot = flag_data.hvplot.image(
    x = 'longitude', y = 'latitude',         # Dataset x,y dimension names
    rasterize = True,                        # Use Datashader
    aggregator = reductions.mode(),          # Datashader selects mode value, requires 'hv.Image'
    precompute = True,                       # Datashader precomputes what it can
    crs = plot_crs,                          # Dataset CRS
    projection = ccrs.PlateCarree(),         # Output Projection (ccrs.PlateCarree() when coastline=True)
    coastline = '10m',                       # Coastline = '10m'/'50m'/'110m'
).options(opts.Image(**options)).hist()

# display(quality_plot)
# Optional: Change the default time slider to a dropdown list, https://stackoverflow.com/a/54912917
fig = pn.panel(quality_plot, widgets={'time': pn.widgets.Select})  # widget_location='top_left'
display(fig)

In [None]:
# Create mask layer

# precipitation_quality_index
# 0-2 = Equivalent to the gauge coverage in regions such as central Africa, where the lack of data in a gauge-only analysis is a critical problem
# 2-4 = Mid-range has enough gauge data to ensure reasonable bias adjustment, but still requires interpolation to fill in gaps several grid boxes wide between stations more or less routinely
# 4+ = Developed areas with good-to-excellent gauge networks

# Keep pixels with a quality index above 2 (mid-range or better)
flag_name = 'precipitation_quality_index'
good_pixel_mask = data[flag_name] > 2

## Masking and Scaling

In [None]:
# Define scaling

# No scaling
# scale = 1.
# offset = 0.
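
No scale or offset is applied to this product. If a monthly accumulation is wanted instead of a rate, one possible conversion is sketched below. It assumes the precipitation layer is a monthly mean rate in mm/hr, which should be confirmed against the product metadata before use; the function names are illustrative.

```python
import calendar


def hours_in_month(year, month):
    """Number of hours in a calendar month."""
    _, n_days = calendar.monthrange(year, month)
    return n_days * 24


def monthly_total_mm(rate_mm_per_hr, year, month):
    """Convert a mean rate (assumed to be in mm/hr) to an accumulated total in mm."""
    return rate_mm_per_hr * hours_in_month(year, month)


print(hours_in_month(2020, 2))          # Leap-year February
print(monthly_total_mm(0.1, 2020, 1))   # Made-up rate for January 2020
```

Because the conversion is a plain multiplication, the same function also works element-wise on a NumPy or xarray array for a single month.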

In [None]:
# Select a layer and apply masking and scaling, then persist in dask

layer_name = 'precipitation'

# Apply valid mask and good pixel mask
layer = data[[layer_name]].where(valid_mask[layer_name] & good_pixel_mask)
layer = layer.persist()
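
Calling `.where(mask)` keeps values where the mask is True and replaces the rest with NaN. The NumPy sketch below shows the same logic on a handful of made-up values; the nodata value is an assumption for illustration and is not read from the product.

```python
import numpy as np

NODATA = -9999.9   # Illustrative nodata value (assumption)

precip = np.array([0.20, NODATA, 0.05, 0.40])    # Made-up precipitation values
quality = np.array([5.0, 3.0, 1.0, NODATA])      # Made-up quality index values

valid = precip != NODATA          # Pixel holds precipitation data
good = quality > 2                # Quality index above the chosen threshold

# Keep values where both masks are True; everything else becomes NaN
masked = np.where(valid & good, precip, np.nan)
print(masked)
```

Only the first pixel survives: the second has no precipitation data, the third has a low quality index, and the fourth has no quality index value.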

## Visualisation

In [None]:
# Generate a plot

options = {
    'title': f'{query["product"]}: {layer_name}',
    'width': 800,
    'height': 450,
    'aspect': 'equal',
    'cmap': cc.rainbow,
    'clim': (0, 2),                          # Limit the color range depending on the layer_name
    'colorbar': True,
    'tools': ['hover'],
}

# Set the Dataset CRS
plot_crs = native_crs
if plot_crs == 'epsg:4326':
    plot_crs = ccrs.PlateCarree()


# Native data and coastline overlay:
# - Comment out `crs`, `projection` and `coastline` to plot in native_crs coordinates
# TODO: Update the axis labels to 'longitude', 'latitude' if `coastline` is used

layer_plot = layer.hvplot.image(
    x = 'longitude', y = 'latitude',         # Dataset x,y dimension names
    rasterize = True,                        # Use Datashader
    aggregator = reductions.mean(),          # Datashader selects mean value
    precompute = True,                       # Datashader precomputes what it can
    crs = plot_crs,                          # Dataset CRS
    projection = ccrs.PlateCarree(),         # Output projection (use ccrs.PlateCarree() when coastline=True)
    coastline='10m',                         # Coastline = '10m'/'50m'/'110m'
).options(opts.Image(**options)).hist(bin_range = options['clim'])

# display(layer_plot)
# Optional: Change the default time slider to a dropdown list, https://stackoverflow.com/a/54912917
fig = pn.panel(layer_plot, widgets={'time': pn.widgets.Select})  # widget_location='top_left'
display(fig)

## Appendix