# Individual glacier analysis 1

This notebook walks through the steps to read in and organize velocity data in raster format using xarray and rioxarray tools.

First, let's import the Python libraries that were listed on the [Software](software.ipynb) page:

In [None]:
import geopandas as gpd
import os
import numpy as np
import xarray as xr
import rioxarray as rxr
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
from shapely.geometry import Polygon
from shapely.geometry import Point
import cartopy.crs as ccrs
from cartopy.mpl.gridliner import LONGITUDE_FORMATTER, LATITUDE_FORMATTER
import cartopy
import cartopy.feature as cfeature
import json
import urllib.request
from skimage.morphology import skeletonize


## Accessing ITS_LIVE data stored in s3 buckets

In [None]:

with urllib.request.urlopen('https://its-live-data.s3.amazonaws.com/datacubes/catalog_v02.json') as url_catalog:
    itslive_catalog = json.loads(url_catalog.read().decode())
itslive_catalog.keys()

Take a look at a single catalog entry:

In [None]:
itslive_catalog['features'][0]

Use the function below to find the url that corresponds to the zarr datacube for a specific point:

In [None]:
def find_granule_by_point(input_dict, input_point): #[lon,lat]
    '''Takes an input dictionary (a geojson catalog) and a point to represent the AOI.
    Returns a list of the urls of the zarr datacubes whose footprint covers the AOI'''

    target_granule_urls = []
    #build a single-point GeoDataFrame in lon/lat (epsg:4326) for the AOI
    point_geom = Point(input_point[0], input_point[1])
    point_gdf = gpd.GeoDataFrame(crs='epsg:4326', geometry = [point_geom])
    for granule in range(len(input_dict['features'])):

        #footprint of this datacube: the outer ring of the geojson polygon
        bbox_ls = input_dict['features'][granule]['geometry']['coordinates'][0]
        bbox_geom = Polygon(bbox_ls)
        bbox_gdf = gpd.GeoDataFrame(index=[0], crs='epsg:4326', geometry = [bbox_geom])

        #keep the datacube url if its footprint contains the point
        if bbox_gdf.contains(point_gdf).all():
            target_granule_urls.append(input_dict['features'][granule]['properties']['zarr_url'])
    return target_granule_urls
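
To see what the function is checking, here is a minimal sketch of the same containment test on a toy catalog, using shapely geometries directly rather than GeoDataFrames. The two features, their coordinates, the `example.com` urls and the helper name `urls_covering_point` are made up for illustration; only the key layout (`features`, `geometry`, `coordinates`, `properties`, `zarr_url`) follows the function above.

```python
from shapely.geometry import Point, Polygon

# toy stand-in for the catalog: two made-up footprints (lon/lat) and made-up urls
toy_catalog = {
    'features': [
        {'geometry': {'coordinates': [[[80, 25], [85, 25], [85, 30], [80, 30], [80, 25]]]},
         'properties': {'zarr_url': 'http://example.com/cube_a.zarr'}},
        {'geometry': {'coordinates': [[[85, 25], [90, 25], [90, 30], [85, 30], [85, 25]]]},
         'properties': {'zarr_url': 'http://example.com/cube_b.zarr'}},
    ]
}

def urls_covering_point(catalog, lon_lat):
    '''Hypothetical helper: return the zarr urls whose footprint contains the point.'''
    point = Point(lon_lat[0], lon_lat[1])
    urls = []
    for feature in catalog['features']:
        # the first ring of the geojson polygon is its outer boundary
        footprint = Polygon(feature['geometry']['coordinates'][0])
        if footprint.contains(point):
            urls.append(feature['properties']['zarr_url'])
    return urls

toy_urls = urls_covering_point(toy_catalog, [84.56, 28.54])
print(toy_urls)
```

Only the first toy footprint spans longitude 84.56, so a single url comes back. A point outside every footprint gives an empty list, which is worth checking for before indexing the result with `[0]`.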

This function will read in an xarray dataset from the url of a zarr datacube when we're ready:

In [None]:
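def read_in_s3(http_url):
    '''Takes the http url of a zarr datacube, converts it to the equivalent s3 url
    and lazily opens it as an xarray dataset (anonymous access, dask chunks)'''
    #swap the url scheme for s3:// and drop the aws host suffix, leaving s3://<bucket>/<path>
    s3_url = http_url.replace('https://','s3://').replace('http://','s3://')
    s3_url = s3_url.replace('.s3.amazonaws.com','')

    datacube = xr.open_dataset(s3_url, engine = 'zarr',
                                storage_options={'anon':True},
                                chunks = 'auto')

    return datacube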
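
The url rewriting that `read_in_s3()` relies on can be checked on its own, without opening any data. The sketch below is a standalone, hypothetical helper (`http_to_s3` is not part of the notebook) and the object path `datacubes/example.zarr` is made up; it also handles `https` urls, which is an addition for illustration.

```python
def http_to_s3(http_url):
    '''Hypothetical helper: rewrite an http(s) bucket url into s3://<bucket>/<path>.'''
    # swap the scheme for s3:// (covering both http and https)
    s3_url = http_url.replace('https://', 's3://').replace('http://', 's3://')
    # drop the AWS host suffix so only the bucket name remains
    return s3_url.replace('.s3.amazonaws.com', '')

# made-up object path inside the its-live-data bucket
example = http_to_s3('http://its-live-data.s3.amazonaws.com/datacubes/example.zarr')
print(example)
```

The result has the form `s3://its-live-data/...`, which is the kind of string passed to `xr.open_dataset()` together with `storage_options={'anon':True}`.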

In [None]:
def get_bbox_single(input_xr):
    
    '''Takes an input xr object (from an itslive data cube) and returns its footprint
    as a GeoDataFrame holding a single polygon in the CRS of the cube.
    currently only working for granules in crs epsg 32645'''

    xmin = input_xr.coords['x'].data.min()
    xmax = input_xr.coords['x'].data.max()

    ymin = input_xr.coords['y'].data.min()
    ymax = input_xr.coords['y'].data.max()

    pts_ls = [(xmin, ymin), (xmax, ymin),(xmax, ymax), (xmin, ymax), (xmin, ymin)]

    #read the epsg code from the 'mapping' variable of the cube
    crs = f"epsg:{input_xr.mapping.spatial_epsg}"

    polygon_geom = Polygon(pts_ls)
    polygon = gpd.GeoDataFrame(index=[0], crs=crs, geometry=[polygon_geom])

    return polygon

In [None]:
url = find_granule_by_point(itslive_catalog, [84.56, 28.54])
url

In [None]:
dc = read_in_s3(url[0])
dc

Check CRS of xr object: 

In [None]:
dc.mapping

## Read in vector data (Randolph Glacier Inventory)
- the RGI shapefile used here was downloaded to a local directory from the [NSIDC](https://nsidc.org/data/nsidc-0770)

In [None]:
se_asia = gpd.read_file('/Users/emarshall/Desktop/siparcs/data/nsidc0770_15.rgi60.SouthAsiaEast/15_rgi60_SouthAsiaEast.shp')
se_asia.head(3)

In [None]:
#project rgi data to match itslive
se_asia_prj = se_asia.to_crs('EPSG:32645') #we know the epsg from looking at the 'spatial epsg' attr of the mapping var of the dc object
se_asia_prj.head(3)

## Crop RGI to ITS_LIVE extent
- use `get_bbox_single()` (defined above) to get the footprint of the ITS_LIVE datacube as a vector, then clip the RGI outlines to it

In [None]:
#first, get vector bbox of itslive

bbox_dc = get_bbox_single(dc)
bbox_dc['geometry']

#subset rgi to bounds 
se_asia_subset = gpd.clip(se_asia_prj, bbox_dc)
se_asia_subset
se_asia_subset.explore()

In [None]:
#find largest glacier (just to start)
se_asia_subset.sort_values('Area', ascending=False)

In [None]:
largest_glacier_vec = se_asia_subset.loc[se_asia_subset['Area'] == 42.398]
largest_glacier_vec
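
Matching on an exact float value such as `42.398` is fragile if the values are ever rounded differently. A hedged alternative is to select the row with the maximum `Area` by its index label. The sketch below uses a plain pandas DataFrame with made-up IDs and areas; the same calls should apply to a GeoDataFrame such as `se_asia_subset`, since GeoDataFrame subclasses DataFrame.

```python
import pandas as pd

# made-up stand-in for the RGI attribute table
toy = pd.DataFrame({
    'RGIId': ['toy-001', 'toy-002', 'toy-003'],
    'Area': [3.2, 42.398, 17.05],
})

# idxmax() returns the index label of the largest Area;
# wrapping it in a list keeps the result as a one-row DataFrame
largest = toy.loc[[toy['Area'].idxmax()]]
print(largest)
```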

### Clip ITS_LIVE dataset to individual glacier extent

First, we need to use `rio.write_crs()` to assign a CRS to the itslive object. If we don't do that first, the `rio.clip()` command will produce an error.

*Note*: it looks like you can only run `write_crs()` once, because it switches `mapping` from being a `data_var` to a `coord`, so if you run it again it will produce a key error looking for a variable that doesn't exist.

In [None]:
dc = dc.rio.write_crs(f"epsg:{dc.mapping.attrs['spatial_epsg']}", inplace=True)

In [None]:
%%time

largest_glacier_raster = dc.rio.clip(largest_glacier_vec.geometry, largest_glacier_vec.crs)

In [None]:
largest_glacier_raster

In [None]:
largest_glacier_raster.isel(mid_date=0).v.plot()

In [None]:
largest_glacier_raster.v_error.isel(mid_date=0).plot()

In [None]:
largest_glacier_raster.v

### Seasonal mean velocities with groupby

In [None]:
#first define the function we'll apply to each group
def middate_mean(a):
    return a.mean(dim='mid_date')


In [None]:
seasons_gb = largest_glacier_raster.groupby(largest_glacier_raster.mid_date.dt.season).map(middate_mean)
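#copy the dataset-level attrs onto the grouped result (mean() drops attrs by default)
#note: this does not restore the attrs of individual variables such as v
seasons_gb.attrs = largest_glacier_raster.attrs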
seasons_gb

In [None]:
fg = seasons_gb.v.plot(
    col='season',
    vmax = 150
)
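
As an alternative to copying attributes after the fact, xarray reductions accept `keep_attrs=True`. The sketch below runs the same seasonal grouping on a small synthetic dataset; the variable name `v` and dimension `mid_date` follow the cube above, while the grid, dates, values and attributes are made up.

```python
import numpy as np
import pandas as pd
import xarray as xr

# synthetic stand-in for the clipped cube: 24 monthly steps on a 2 x 3 grid
mid_date = pd.date_range('2018-01-01', periods=24, freq='MS')
rng = np.random.default_rng(0)
toy = xr.Dataset(
    {'v': (('mid_date', 'y', 'x'), rng.uniform(0, 150, size=(24, 2, 3)))},
    coords={'mid_date': mid_date, 'y': [120.0, 0.0], 'x': [0.0, 120.0, 240.0]},
    attrs={'title': 'toy velocity cube'},
)
toy['v'].attrs['units'] = 'm/y'

# group by season and average over mid_date, asking xarray to keep attrs
toy_seasons = toy.groupby(toy.mid_date.dt.season).mean(dim='mid_date', keep_attrs=True)
print(toy_seasons)
```

Calling the built-in `mean()` on the groupby object replaces the `map(middate_mean)` step; whether that is preferable for the full cube is a judgment call.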

## Glacier centerline extraction

- uses scikit-image `skeletonize()`
- this is probably not the ideal way to extract a centerline; it is included for now as a rough example

Let's choose a different glacier to focus on this time:

In [None]:
glacier_04118 = se_asia_subset.loc[se_asia_subset['RGIId'] == 'RGI60-15.04118']
glacier_04118.explore()

We will go through the same steps of clipping the full ITS_LIVE dataset to the extent of the glacier.

In [None]:
raster_04118 = dc.rio.clip(glacier_04118.geometry, glacier_04118.crs)
raster_04118

Use the skimage function `skeletonize()` to extract a centerline (this is a rough approximation for the purposes of this example).

In [None]:
raster_04118
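
The skeletonization step itself can be sketched on a synthetic mask. Below, a boolean array with an elongated rectangle of `True` pixels stands in for the glacier footprint; the array size and shape are made up, and this is only an illustration of what `skeletonize()` does to a binary image, not a validated centerline method.

```python
import numpy as np
from skimage.morphology import skeletonize

# synthetic stand-in for a glacier mask: True inside an elongated rectangle
mask = np.zeros((11, 21), dtype=bool)
mask[3:8, 2:19] = True

# skeletonize() thins the True region down to a one-pixel-wide line
skeleton = skeletonize(mask).astype(bool)

# compare the number of True pixels before and after thinning
print(mask.sum(), skeleton.sum())
```

On the real data, a comparable mask could be built from the non-null pixels of one velocity layer, for example `raster_04118.v.isel(mid_date=0).notnull().values`. That choice of layer is an assumption, and a single time step may contain gaps that break up the skeleton.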