# Individual glacier analysis 1

This notebook walks through the steps to read in and organize velocity data in raster format using xarray and rioxarray.

First, let's import the Python libraries that were listed on the [Software](software.ipynb) page:

In [1]:
import geopandas as gpd
import os
import numpy as np
import xarray as xr
import rioxarray as rxr
import matplotlib.pyplot as plt
from geocube.api.core import make_geocube

### ALT cell: (for cloud data access; this is the preferable option but is not currently working)
- Try to open the ITS_LIVE xarray object from an S3 link. Explain how we got the link (API that returns URLs (?))
- ITS_LIVE data cube catalog file: https://its-live-data.s3.amazonaws.com/datacubes/catalog_v02.json
- Probably need to add zarr to the env file, or just rewrite the env file
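
One possible workaround while the `s3://` route is not working is to address the same object over HTTPS. The sketch below only rewrites the URL string. The helper name `s3_to_https` is hypothetical, and it assumes the bucket is publicly readable through the virtual-hosted-style endpoint that the catalog link above already uses. Whether `xr.open_dataset(..., engine='zarr')` then succeeds with the resulting URL has not been verified here.

```python
def s3_to_https(s3_url):
    """Rewrite 's3://bucket/key' as 'https://bucket.s3.amazonaws.com/key'.

    Hypothetical helper: assumes a public bucket served through the
    virtual-hosted-style endpoint.
    """
    prefix = "s3://"
    if not s3_url.startswith(prefix):
        raise ValueError(f"not an s3 URL: {s3_url}")
    # split the remainder into the bucket name and the object key
    bucket, _, key = s3_url[len(prefix):].partition("/")
    return f"https://{bucket}.s3.amazonaws.com/{key}"


catalog_s3 = "s3://its-live-data/datacubes/catalog_v02.json"
catalog_https = s3_to_https(catalog_s3)
print(catalog_https)
```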

#### Two ways to get ITS_LIVE URLs via the API:

Through the interactive API documentation page:

https://nsidc.org/apps/itslive-search/docs#/default/urls_velocities_urls__get

and through the API from Python, as in the cell below:

In [None]:
#getting itslive urls from api in python: 

import requests

base_api = 'https://nsidc.org/apps/itslive-search/velocities/urls'

params = {
    'bbox':'84.899, 27.98, 85.95, 28.75',
    'start': '2013-01-01',
    'end': '2021-12-31',
    'percent_valid_pixels':20,
    'min_interval': 7,
    'max_interval':100
}

velocity_pairs = requests.get(base_api, params=params)

In [None]:
type(velocity_pairs)
#velocity_pairs
velocity_pairs_ls = velocity_pairs.json()
velocity_pairs_ls[0]['url']
#ds1 = xr.open_dataset(velocity_pairs_ls[0]['url'])
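
The parsed response is indexed above as a list of dictionaries that each carry a `'url'` key. A minimal sketch of collecting every URL from such a list follows. The helper name `collect_urls` and the sample records are made up for illustration and do not come from the API.

```python
def collect_urls(records):
    """Return the 'url' value of every record that has one."""
    return [rec["url"] for rec in records if "url" in rec]


# hypothetical records shaped like the parsed JSON response
sample_records = [
    {"url": "https://example.com/pair_a.nc"},
    {"url": "https://example.com/pair_b.nc"},
    {"note": "record without a url"},
]

urls = collect_urls(sample_records)
print(len(urls), "urls found")
```

With the real response, this would be called as `collect_urls(velocity_pairs.json())`.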

In [None]:
#image_pair1 = xr.open_dataset(velocity_pairs_ls[0]['url'],
#                            engine = 'h5netcdf')

In [None]:
#zarr_32645_x350000_y3050000 = 's3://its-live-data/datacubes/v02/N20E080/ITS_LIVE_vel_EPSG32645_G0120_X350000_Y3050000.zarr'

#ds_32645_x350000_y3050000 = xr.open_dataset(zarr_32645_x350000_y3050000, engine = 'zarr',
#                            storage_options= {'anon':True})

In [None]:
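zarr_32645_x350000_y4650000 = "s3://its-live-data/datacubes/v02/N40E080/ITS_LIVE_vel_EPSG32645_G0120_X350000_Y4650000.zarr"

ds_32645_x350000_y4650000 = xr.open_dataset(zarr_32645_x350000_y4650000, engine = 'zarr',
                            storage_options={'anon':True})

# Assumption: the cells below refer to this dataset as `ds_catalog`, which is not defined anywhere else
ds_catalog = ds_32645_x350000_y4650000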

In [None]:
ds_catalog.isel(mid_date=1).v.plot()

In [None]:
ds_catalog.coords['x'].data.min()

In [None]:
ds_catalog

In [None]:
import geopandas as gpd
from shapely.geometry import Polygon

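def get_bbox(input_xr):
    """Plot and return the bounding box of an ITS_LIVE dataset as a GeoDataFrame."""

    xmin = input_xr.coords['x'].data.min()
    xmax = input_xr.coords['x'].data.max()

    ymin = input_xr.coords['y'].data.min()
    ymax = input_xr.coords['y'].data.max()

    # corner points, closing the ring back at the first corner
    pts_ls = [(xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax), (xmin, ymin)]

    # the {'init': 'epsg:...'} dict form is deprecated; pass an authority string instead
    crs = f"EPSG:{input_xr.mapping.spatial_epsg}"
    polygon_geom = Polygon(pts_ls)
    polygon = gpd.GeoDataFrame(index=[0], crs=crs, geometry=[polygon_geom])

    polygon.plot(facecolor='none', edgecolor='red')
    return polygon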


    

In [None]:
get_bbox(ds_catalog)

In [None]:
zarr_url = 's3://its-live-data/datacubes/v02/N20E080/ITS_LIVE_vel_EPSG32645_G0120_X250000_Y3250000.zarr'


itslive_zarr = xr.open_dataset(zarr_url,
                              engine = 'zarr',  
                              storage_options = {'anon':True}
                              )

## Workflow with local data (S3 access is not working yet)

In [None]:
gen_path = '/Users/emarshall/Desktop/siparcs/xr_book1/'

## ITS_LIVE raster data

This section contains a workflow for reading in and organizing ITS_LIVE glacier velocity data accessed in NetCDF format from the NSIDC DAAC. Previously, we had to build the velocity magnitude variable from the velocity component variables (individual GeoTIFF files). The NetCDF file already contains a variable for the magnitude of velocity, along with many other variables representing land cover types, error estimates and metadata.
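
For reference, the magnitude of velocity is the Euclidean norm of the two velocity components. The sketch below uses small made-up NumPy arrays; the names `vx` and `vy` are assumptions chosen for illustration, not variables read from the file.

```python
import numpy as np

# hypothetical x and y velocity components on a 2x2 grid
vx = np.array([[3.0, 0.0], [-6.0, np.nan]])
vy = np.array([[4.0, 2.0], [8.0, 1.0]])

# magnitude = sqrt(vx**2 + vy**2); a NaN in a component stays NaN when the other is finite
v = np.hypot(vx, vy)
print(v)
```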

In [None]:
%%time

# gen_path[:-9] strips the trailing 'xr_book1/' so the path points at the sibling data directory
itslive = rxr.open_rasterio(gen_path[:-9] + 'data/HMA_G0120_0000.nc',
                            chunks = 'auto').squeeze()

In [None]:
itslive

What is the CRS of this object?

One way to check is with the `rio.crs` accessor:

In [None]:
itslive.rio.crs

The NetCDF object is in a different CRS than the GeoTIFF object. Because **Asia North Lambert Conformal Conic** covers a larger spatial extent than a single UTM zone (the projection of the GeoTIFF object), we will use the Lambert Conformal Conic projection.
*add link to good explainer page?*

In [None]:
itslive.dims

In [None]:
itslive.coords

## Vector data 

In [None]:
#read in vector data 
se_asia = gpd.read_file(gen_path[:-9] + 'data/nsidc0770_15.rgi60.SouthAsiaEast/15_rgi60_SouthAsiaEast.shp')

How many glaciers are in this dataframe?

In [None]:
len(se_asia['RGIId'])

What coordinate reference system is this dataframe in? 

In [None]:
se_asia.crs

The vector dataset is in WGS 84, meaning that its coordinates are in degrees of latitude and longitude rather than meters north and east. We will project this dataset to match the projection of the NetCDF dataset.

## Handling projections

Let's project this dataframe to match the CRS of the `itslive` dataset.

In [None]:
# specifying the EPSG code did not work here, so the full PROJ string is used instead
se_asia_prj = se_asia.to_crs('+proj=lcc +lat_1=15 +lat_2=65 +lat_0=30 +lon_0=95 +x_0=0 +y_0=0 +ellps=WGS84 +datum=WGS84 +units=m +no_defs')
se_asia_prj.head(3)

## Let's start this analysis on a single glacier

We'll demonstrate analysis on a single glacier before scaling up to multiple glaciers. To start with, let's select the largest glacier in the dataframe.

In [None]:
se_asia_prj['Area'].idxmax()

In [None]:
se_asia_prj.iloc[11908]
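
Note that `idxmax()` returns an index *label*, not a position. The two coincide only when the dataframe has a default integer index, so looking the row up with `.loc` is the more robust choice. A small sketch with a made-up table (the IDs, areas and index values are hypothetical):

```python
import pandas as pd

# hypothetical glacier table with a non-default index
glaciers = pd.DataFrame(
    {
        "RGIId": ["RGI60-15.00001", "RGI60-15.00002", "RGI60-15.00003"],
        "Area": [1.2, 48.0, 7.5],
    },
    index=[10, 20, 30],
)

# idxmax gives the label of the row with the largest area
largest_label = glaciers["Area"].idxmax()

# .loc looks rows up by label, so it pairs naturally with idxmax
largest = glaciers.loc[largest_label]
print(largest["RGIId"])
```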

### So, our sample glacier is:

In [None]:
sample_glacier_prj = se_asia_prj.loc[se_asia_prj['RGIId'] == 'RGI60-15.11909']
sample_glacier_prj

#### Clip raster data to vector (sample glacier)

We'll follow [this example](https://corteva.github.io/rioxarray/stable/examples/clip_geom.html); check it out for more information.

In [None]:
%%time

glacier_raster = itslive.rio.clip(sample_glacier_prj.geometry, sample_glacier_prj.crs)

In [None]:
glacier_raster

In [None]:
# keep values only where the ice mask equals 1; np.NaN was removed in NumPy 2.0, so use np.nan
glacier_clipped = xr.where(glacier_raster.ice == 1., glacier_raster, np.nan)

In [None]:
glacier_clipped.v.plot()

In [None]:
glacier_clipped

In [None]:

#glacier_raster.ice.plot()

In [None]:
#fig, ax = plt.subplots()

#sample_glacier.plot(ax=ax, facecolor='white', edgecolor='red')
#glacier_raster.v.plot(ax=ax, cmap=plt.cm.cividis)

In [None]:
#glacier_raster.v.data.min()

### Handling missing data / selecting data (xr.where)
The above plot isn't very informative: the non-glaciated terrain surrounding the glacier is assigned negative values that skew the color scale. Assigning missing or non-target data points a distinctive numeric value can be useful in some cases, but for our purposes we don't want them showing up in our plots.
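
A minimal sketch of this kind of masking on a small made-up array, assuming the fill value is -32767 (the value that appears in the commented-out cell further down):

```python
import numpy as np
import xarray as xr

NODATA = -32767.0  # assumed fill value for non-target pixels

# hypothetical 2x2 velocity field with two fill-value pixels
v = xr.DataArray(
    np.array([[NODATA, 12.5], [30.0, NODATA]]),
    dims=("y", "x"),
    name="v",
)

# keep values where the condition holds, replace everything else with NaN
v_masked = v.where(v != NODATA)
print(v_masked.values)
```

Once the fill values are NaN, reductions and plots skip them, so the color scale reflects only the valid velocities.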

In [None]:
#glacier_raster.ice.data.shape

#glacier_raster.v.data[51]

**fix this part**

In [None]:
#anywhere glacier_raster.ice == 0, we want to turn to nan (I think?)
#glacier_raster_x = xr.where(glacier_raster.v != -32767., glacier_raster, np.nan)

In [None]:
#glacier_raster_x.v.plot()