`Xarray-DataAccessor` + `FCPGtools` Demo Notebook
==================================================

**Author:** [Xavier R Nogueira](https://github.com/xaviernogueira)

**Notebook Steps:**
1. Use `xarray_data_accessor.DataAccessorFactory` to search available datasets/variables.
2. Identify [HydroBASINS](https://www.hydrosheds.org/products/hydrobasins) watershed boundaries for a point in South America using an [ArcGIS REST API endpoint](https://imapinvasives.natureserve.org/arcgis/rest/services/hydrobasins/MapServer/2).
3. Read in hourly ERA5 precipitation data from [Planet OS's AWS S3 bucket](https://github.com/planet-os/notebooks/blob/master/aws/era5-pds.md) for our South American basin of choice.
4. Load in a USGS Flow Direction Raster (FDR) for our watershed that is stored locally in `/example_data`.
5. Use `fcpgtools` to resample our precipitation data to match the resolution of the FDR.
6. Use `fcpgtools` to calculate precipitation accumulation for all hourly time steps.

Along the way, we will create interactive plots/maps using [`hvplot`](https://hvplot.holoviz.org/) and [`geoviews`](https://geoviews.org/).

**Set Up:**
1. Clone `Xarray-DataAccessor` locally.
2. Create the conda environment using `environment.yml`.
3. Add `xarray_data_accessor` (the package) to your conda environment using the `conda develop %PATH%/Xarray-DataAccessor/src` command.
4. Try to `pip install fcpgtools`. If it works, you can skip steps 5 and 6.
5. If it fails (the install can be finicky), clone the [`FCPGtools`](https://github.com/usgs/water-fcpg-tools) repo and add the contents of `/src/` to your conda environment using the `conda develop %PATH%/FCPGtools/src` command.
6. Install `pysheds` into your environment using pip or conda.
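
Before running the cells below, a quick check like the following shows which of the notebook's dependencies are still missing, and therefore whether steps 5 and 6 are needed. This is a small hypothetical helper written for this walkthrough (not part of either package), and the package list is simply taken from the imports used in this notebook.

```python
import importlib.util

# Top-level packages imported by this notebook; 'pysheds' is the engine
# passed to fcpgtools.accumulate_parameter later on.
REQUIRED = ['xarray_data_accessor', 'fcpgtools', 'pysheds', 'hvplot', 'geopandas', 'shapely']


def missing_packages(names):
    """Return the subset of `names` that cannot be found on the import path."""
    # find_spec only locates a top-level module; it does not import it
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print(f'Not importable yet: {missing} (see Set Up steps 3-6)')
else:
    print('All notebook dependencies are importable.')
```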


In [None]:
# import dependencies
import requests
import geopandas
import shapely
import xarray as xr
import numpy as np
import hvplot.xarray
import hvplot.pandas
import cartopy.crs as ccrs
from pathlib import Path
import gc

# import our library
import xarray_data_accessor

In [None]:
# if `pip install fcpgtools` failed, follow Set Up steps 5-6: clone the repo, add its /src with `conda develop`, and install pysheds
import fcpgtools

# 🗃️ Explore data availability 🗃️

## First, we see which datasets can be accessed by the different 'data accessors'

In [None]:
# let's start by seeing which DataAccessor objects are currently available
xarray_data_accessor.DataAccessorFactory.data_accessor_names()

In [None]:
# next, let's see which datasets each one can access
xarray_data_accessor.DataAccessorFactory.supported_datasets()

## Next we see which ERA5 hourly variables can be fetched with the `AWSDataAccessor`

In [None]:
xarray_data_accessor.DataAccessorFactory.supported_variables(
    data_accessor_name='AWSDataAccessor',
    dataset_name='reanalysis-era5-single-levels',
)

# 🌄 Define a watershed AOI 🌄

Here we read in the HydroBASINS watershed polygon that contains a point in South America (roughly comparable in scale to a US HUC8 watershed; HUC codes themselves are a US-only system), via an [ArcGIS REST endpoint](https://imapinvasives.natureserve.org/arcgis/rest/services/hydrobasins/MapServer/2).

In [None]:
# select a lat/long in South America
lat = 8.54166666666685
lon = -77.31666666666663

# record the sign of each coordinate so it can be written explicitly into the query URL
signs = {
    'lat': '+',
    'lon': '+',
}
if lat < 0:
    signs['lat'] = '-'
if lon < 0:
    signs['lon'] = '-'

In [None]:
prefix = 'https://imapinvasives.natureserve.org/arcgis/rest/services/hydrobasins/MapServer/2/query?where=&text=&objectIds=&time=&'
point_url = f'geometry={signs["lon"]}{abs(lon)}%2C{signs["lat"]}{abs(lat)}&geometryType=esriGeometryPoint'
suffix = '&inSR=4326&spatialRel=esriSpatialRelIntersects&relationParam=&outFields=&returnGeometry=true&returnTrueCurves=false&maxAllowableOffset=&geometryPrecision=&outSR=4326&having=&returnIdsOnly=false&returnCountOnly=false&orderByFields=&groupByFieldsForStatistics=&outStatistics=&returnZ=false&returnM=false&gdbVersion=&historicMoment=&returnDistinctValues=false&resultOffset=&resultRecordCount=&queryByDistance=&returnExtentOnly=false&datumTransformation=&parameterValues=&rangeValues=&quantizationParameters=&f=geojson'
basin_url = prefix + point_url + suffix
print(f'Raw GeoJSON here: {basin_url}')
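
The same query can also be assembled with `urllib.parse.urlencode`, which escapes the comma and keeps the minus sign, so no manual sign handling is needed. This is a sketch, not the author's method: it passes only the main parameters of the ArcGIS REST `query` operation, on the assumption that the empty or `false` parameters in the hand-built URL are server defaults (worth confirming by comparing the two GeoJSON responses).

```python
from urllib.parse import urlencode

# our point in EPSG:4326 (same values as the cell that defines lat/lon)
lat = 8.54166666666685
lon = -77.31666666666663

BASE = 'https://imapinvasives.natureserve.org/arcgis/rest/services/hydrobasins/MapServer/2/query'

# main query parameters; omitted ones are assumed to fall back to server defaults
params = {
    'geometry': f'{lon},{lat}',
    'geometryType': 'esriGeometryPoint',
    'inSR': 4326,
    'spatialRel': 'esriSpatialRelIntersects',
    'returnGeometry': 'true',
    'outSR': 4326,
    'f': 'geojson',
}
basin_url_encoded = f'{BASE}?{urlencode(params)}'
print(basin_url_encoded)
```

The result can be passed to `geopandas.read_file(basin_url_encoded)` exactly like `basin_url`.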

In [None]:
# get the basin(s) and our point as GeoDataFrames (both in EPSG:4326)
our_basins = geopandas.read_file(basin_url)
point_gdf = geopandas.GeoDataFrame(
    geometry=[shapely.Point(lon, lat)],
    crs='EPSG:4326',
)

In [None]:
# plot our basin(s) and point
our_basins.hvplot(
    crs=our_basins.crs.to_wkt(),
    tiles='StamenTerrainRetina',
    width=500,
    height=500,
    fill_color=None,
    line_width=4,
    line_color='blue',
) * point_gdf.hvplot(
    color='red',
    crs=our_basins.crs.to_wkt(),
)

# 🛰️ Read-in ERA5 precipitation data from the Planet OS AWS cloud bucket 🛰️

In [None]:
%%time
xarray_data = xarray_data_accessor.get_xarray_dataset(
    data_accessor_name='AWSDataAccessor',
    dataset_name='reanalysis-era5-single-levels',
    variables=[
        'precipitation_amount_1hour_Accumulation',
    ],
    start_time='2022-09-28',
    end_time='2022-10-05',
    shapefile=our_basins,
)

In [None]:
xarray_data
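
ERA5 reports precipitation as a depth of water in metres, so each hourly value here should be metres per hour (confirm against the variable's `units` attribute in the dataset repr above). A minimal numpy sketch for turning hourly values into daily totals in millimetres follows; `hourly_to_daily_mm` is a hypothetical helper, and it assumes time is the first axis and starts at midnight.

```python
import numpy as np


def hourly_to_daily_mm(hourly_m, hours_per_day=24):
    """Sum hourly precipitation (metres) into daily totals (millimetres).

    Assumes time is axis 0 and starts at 00:00; a trailing partial day is dropped.
    """
    hourly_m = np.asarray(hourly_m, dtype=float)
    n_days = hourly_m.shape[0] // hours_per_day
    trimmed = hourly_m[: n_days * hours_per_day]
    # reshape (time, ...) -> (day, hour, ...) and sum over the hour axis
    daily_m = trimmed.reshape((n_days, hours_per_day) + hourly_m.shape[1:]).sum(axis=1)
    return daily_m * 1000.0  # metres -> millimetres


# two synthetic days of 0.5 mm/hour at a single cell
print(hourly_to_daily_mm(np.full(48, 0.0005)))
```

With xarray, `xarray_data['precipitation_amount_1hour_Accumulation'].resample(time='1D').sum() * 1000` performs the same aggregation while keeping the coordinates.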

In [None]:
# plot the hourly precipitation over time at grid cell (row 1, col 1)
xarray_data['precipitation_amount_1hour_Accumulation'][:, 1, 1].hvplot(
    x='time',
)
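
The cell above indexes a fixed grid cell (`[:, 1, 1]`), which is not necessarily the cell containing our point. A small sketch of picking the nearest cell by coordinate instead; `nearest_index` and the toy axes are hypothetical, with ERA5's 0.25° spacing used only to keep the example realistic.

```python
import numpy as np


def nearest_index(coords, value):
    """Index of the coordinate closest to `value` (ascending or descending axis)."""
    coords = np.asarray(coords, dtype=float)
    return int(np.abs(coords - value).argmin())


# hypothetical 0.25-degree axes around our point (latitude descending, as in ERA5)
lats = np.arange(9.0, 8.0, -0.25)       # 9.0, 8.75, 8.5, 8.25
lons = np.arange(-77.75, -76.75, 0.25)  # -77.75, -77.5, -77.25, -77.0
iy = nearest_index(lats, 8.54166666666685)
ix = nearest_index(lons, -77.31666666666663)
print(iy, ix)
```

In xarray, `DataArray.sel(..., method='nearest')` does this lookup directly, e.g. `.sel(latitude=lat, longitude=lon, method='nearest')`, assuming the coordinates are named `latitude` and `longitude` (check the dataset repr).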

# 🧰 Prep `FCPGtools` Inputs 🧰

## Read in our Flow Direction Raster (FDR)
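
An FDR stores, for every cell, a code naming the single neighbour that cell drains to. Assuming the common ESRI D8 encoding (powers of two from 1 = east, clockwise to 128 = northeast; whether these USGS rasters use it is an assumption worth checking in their metadata), following the codes from any cell traces its downstream flow path. `trace_flow_path` is a hypothetical helper for illustration, not an `fcpgtools` function.

```python
# ESRI-style D8 codes -> (row offset, col offset); rows increase southward
D8_OFFSETS = {
    1: (0, 1),     # east
    2: (1, 1),     # southeast
    4: (1, 0),     # south
    8: (1, -1),    # southwest
    16: (0, -1),   # west
    32: (-1, -1),  # northwest
    64: (-1, 0),   # north
    128: (-1, 1),  # northeast
}


def trace_flow_path(fdr, start, max_steps=10_000):
    """Follow D8 codes from `start` (row, col) until flow leaves the grid or
    hits a cell without a valid code (e.g. no data or a sink)."""
    n_rows, n_cols = len(fdr), len(fdr[0])
    path = [start]
    row, col = start
    for _ in range(max_steps):  # guard against loops in a malformed FDR
        code = fdr[row][col]
        if code not in D8_OFFSETS:
            break
        d_row, d_col = D8_OFFSETS[code]
        row, col = row + d_row, col + d_col
        if not (0 <= row < n_rows and 0 <= col < n_cols):
            break
        path.append((row, col))
    return path


# tiny hypothetical 3x3 FDR: everything drains to the bottom-right cell (255 = no data)
toy_fdr = [
    [1, 2, 4],
    [1, 2, 4],
    [1, 1, 255],
]
print(trace_flow_path(toy_fdr, (0, 0)))
```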

In [None]:
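# find the local FDR .tifs whose extent contains our point
# (this assumes the FDRs use lat/lon coordinates, like our point)
matching_tifs = []
for file in (Path.cwd() / 'example_data').glob('*.tif'):
    fdr = fcpgtools.load_raster(file)
    in_y = fdr.y.min().item() <= lat <= fdr.y.max().item()
    in_x = fdr.x.min().item() <= lon <= fdr.x.max().item()
    if in_y and in_x:
        matching_tifs.append(file)
    del fdr  # release each raster before loading the next
gc.collect()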

In [None]:
matching_tifs

In [None]:
fdr = fcpgtools.load_raster(
    matching_tifs[0],
)
fdr

## Clip our FDR to our basin AOI
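
The clip-then-mask sequence in the next cell first trims the FDR to the basin's bounding box and then blanks out cells outside the basin polygon. On bare arrays, the two ideas look roughly like this numpy sketch. Both helpers are hypothetical; the polygon mask is taken as a precomputed boolean array (whereas `fcpgtools.spatial_mask` works from the shapefile), a cell is kept when its centre falls inside the bounds (a simplification), and 255 is used as no data to match the plotting cell.

```python
import numpy as np


def clip_to_bounds(data, xs, ys, bounds):
    """Keep rows/cols whose cell-centre coordinates fall inside
    bounds = (minx, miny, maxx, maxy); works for ascending or descending ys."""
    minx, miny, maxx, maxy = bounds
    xs, ys = np.asarray(xs), np.asarray(ys)
    keep_x = (xs >= minx) & (xs <= maxx)
    keep_y = (ys >= miny) & (ys <= maxy)
    return data[np.ix_(keep_y, keep_x)], xs[keep_x], ys[keep_y]


def apply_mask(data, inside, nodata=255):
    """Set cells outside the AOI (inside == False) to the no-data value."""
    return np.where(inside, data, nodata)


# hypothetical 4x4 grid with y descending (north-up), as in most rasters
data = np.arange(16).reshape(4, 4)
xs = np.array([0, 1, 2, 3])
ys = np.array([3, 2, 1, 0])
clipped, cx, cy = clip_to_bounds(data, xs, ys, (1, 1, 2, 2))
masked = apply_mask(clipped, np.array([[True, False], [True, True]]))
print(masked)
```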

In [None]:
# clip to the basin's bounding box
fdr = fcpgtools.clip(
    fdr,
    match_shapefile=our_basins,
)
# mask to only include values in our shapefile
fdr = fcpgtools.spatial_mask(
    fdr,
    mask_shp=our_basins,
)

In [None]:
fdr

In [None]:
# hide 255 (treated here as the FDR no-data value) before plotting
fdr.where(fdr != 255, np.nan).hvplot(
    crs=fdr.rio.crs.to_wkt(),
    tiles='StamenTerrainRetina',
    width=500,
    height=500,
    clim=(0, 250),
    cmap='Category10',
) * our_basins.hvplot(
    crs=our_basins.crs.to_wkt(),
    fill_color=None,
    line_width=4,
    line_color='black',
) * point_gdf.hvplot(
    color='red',
    crs=our_basins.crs.to_wkt(),
)

## Align our ERA5 data with the prepped FDR
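
`fcpgtools.align_raster` with `resample_method='bilinear'` resamples the coarse ERA5 grid onto the FDR's much finer grid. The core of bilinear resampling between two regular grids that already share a CRS can be sketched in numpy as below; this is a simplified illustration (no reprojection, no no-data handling) and not `fcpgtools`' implementation. Note that bilinear interpolation is not mass-conserving, so basin-total precipitation can shift slightly after resampling.

```python
import numpy as np


def bilinear_resample(src, src_x, src_y, dst_x, dst_y):
    """Bilinearly interpolate `src` (shape (len(src_y), len(src_x))) onto the
    grid dst_y x dst_x. Source coordinates must be ascending (flip a descending
    latitude axis first); destination points outside the extent are clamped."""
    src = np.asarray(src, dtype=float)
    # fractional index of each destination coordinate along each source axis
    fx = np.interp(dst_x, src_x, np.arange(len(src_x)))
    fy = np.interp(dst_y, src_y, np.arange(len(src_y)))
    x0 = np.clip(np.floor(fx).astype(int), 0, len(src_x) - 2)
    y0 = np.clip(np.floor(fy).astype(int), 0, len(src_y) - 2)
    wx = fx - x0  # weight of the next column
    wy = fy - y0  # weight of the next row
    # blend along x within the two bracketing rows, then blend along y
    row_a = src[np.ix_(y0, x0)] * (1 - wx) + src[np.ix_(y0, x0 + 1)] * wx
    row_b = src[np.ix_(y0 + 1, x0)] * (1 - wx) + src[np.ix_(y0 + 1, x0 + 1)] * wx
    return row_a * (1 - wy)[:, None] + row_b * wy[:, None]


# toy example: a 2x2 "coarse" grid resampled to 3x3
coarse = np.array([[0.0, 1.0], [2.0, 3.0]])
fine = bilinear_resample(coarse, [0, 1], [0, 1], np.linspace(0, 1, 3), np.linspace(0, 1, 3))
print(fine)
```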

In [None]:
aligned_era5 = fcpgtools.align_raster(
    xarray_data['precipitation_amount_1hour_Accumulation'],
    fdr,
    resample_method='bilinear',
)

In [None]:
aligned_era5

In [None]:
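# plot the aligned hourly precipitation (use the scrubber to step through time)
aligned_era5.hvplot.image(
    crs=aligned_era5.rio.crs.to_wkt(),
    # offset the lower limit slightly: a log color scale cannot start at 0
    clim=(
        aligned_era5.min().item() + 0.00001,
        aligned_era5.max().item()
    ),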
    cmap='PuBu',
    cnorm='log',
    width=600,
    height=500,
    widget_type='scrubber',
    widget_location='bottom',
    tiles='StamenTerrainRetina',
)

# 🌧️ Calculate flow accumulation over time 🌧️
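
Conceptually, a parameter-weighted flow accumulation replaces each cell's value with the sum of its own value and the values of every cell upstream of it, computed independently for each time step. A minimal pure-numpy sketch, assuming ESRI D8 codes and processing cells in topological order (Kahn's algorithm), is shown here; it illustrates the idea only and is not how `fcpgtools` or `pysheds` implement it.

```python
import numpy as np

# ESRI-style D8 codes -> (row offset, col offset); rows increase southward
D8_OFFSETS = {1: (0, 1), 2: (1, 1), 4: (1, 0), 8: (1, -1),
              16: (0, -1), 32: (-1, -1), 64: (-1, 0), 128: (-1, 1)}


def weighted_accumulation(fdr, weights):
    """Sum `weights` over each cell's upstream area, including the cell itself."""
    fdr = np.asarray(fdr)
    n_rows, n_cols = fdr.shape
    acc = np.asarray(weights, dtype=float).copy()
    downstream = {}
    n_upstream = np.zeros(fdr.shape, dtype=int)
    for r in range(n_rows):
        for c in range(n_cols):
            offset = D8_OFFSETS.get(int(fdr[r, c]))
            if offset is None:
                continue  # no data / sink: nothing flows out
            rr, cc = r + offset[0], c + offset[1]
            if 0 <= rr < n_rows and 0 <= cc < n_cols:
                downstream[(r, c)] = (rr, cc)
                n_upstream[rr, cc] += 1
    # start from headwater cells, which have no upstream contributors
    ready = [(r, c) for r in range(n_rows) for c in range(n_cols) if n_upstream[r, c] == 0]
    while ready:
        cell = ready.pop()
        nxt = downstream.get(cell)
        if nxt is None:
            continue
        acc[nxt] += acc[cell]  # pass the finished upstream total downstream
        n_upstream[nxt] -= 1
        if n_upstream[nxt] == 0:
            ready.append(nxt)
    return acc


# tiny hypothetical FDR where every cell drains to the bottom-right (255 = no data)
toy_fdr = np.array([[1, 2, 4], [1, 2, 4], [1, 1, 255]])
print(weighted_accumulation(toy_fdr, np.ones((3, 3))))
```

Accumulating every hourly step would mean calling this once per time slice, e.g. `np.stack([weighted_accumulation(fdr, w) for w in hourly_grids])`. Pure-Python loops like these are far too slow for a real FDR, which is why the notebook hands the work to a dedicated engine (`engine='pysheds'`).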

In [None]:
%%time
gc.collect()
flow_accum = fcpgtools.accumulate_parameter(
    fdr,
    aligned_era5,
    engine='pysheds',
)

In [None]:
# sanity check: plot accumulated precipitation over time at cell (row 100, col 100)
flow_accum[:, 100, 100].hvplot(
    x='time',
)

In [None]:
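# plot the accumulated precipitation (use the scrubber to step through time)
flow_accum.hvplot.image(
    crs=flow_accum.rio.crs.to_wkt(),
    # offset the lower limit slightly: a log color scale cannot start at 0
    clim=(
        flow_accum.min().item() + 0.00001,
        flow_accum.max().item()
    ),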
    cmap='PuBu',
    cnorm='log',
    width=600,
    height=500,
    widget_type='scrubber',
    widget_location='bottom',
    tiles='StamenTerrainRetina',
)