# Data integration with ICESat-2 - Part I

```{admonition} Learning Objectives
**Goals**
- Identify and locate non-ICESat-2 data sets
- Acquire data from the cloud or via download
- Open data in Pandas and Xarray and learn the basic functions of DataFrames
```
☝️ This formatting is a Jupyter Book [admonition](https://jupyterbook.org/content/content-blocks.html#notes-warnings-and-other-admonitions), which uses a custom version of Markdown called {term}`MyST`.

## Computing environment

We'll be using the following open source Python libraries in this notebook:

In [1]:
%matplotlib widget
import satsearch
print(satsearch.__version__)
# print(satsearch.config.API_URL)
from satsearch import Search
import geopandas as gpd
import ast
import pandas as pd
import geoviews as gv
import hvplot.pandas
from ipywidgets import interact
from IPython.display import display, Image
import intake # if you've installed intake-STAC, it will automatically import alongside intake
import xarray as xr
import matplotlib.pyplot as plt
import boto3
import rasterio as rio
from rasterio.session import AWSSession
from rasterio.plot import show
from dask.utils import SerializableLock
import os
import hvplot.xarray
import numpy as np
from pyproj import Proj, transform

# Suppress library deprecation warnings
import warnings
warnings.filterwarnings('ignore')


In [None]:
# Sets up credentials for acquiring images through dask/xarray
os.environ["AWS_REQUEST_PAYER"] = "requester"

## Data

We will examine raster images from the [MODIS instrument](https://modis.gsfc.nasa.gov/data/). "MODIS" stands for "MODerate resolution Imaging Spectroradiometer". "Moderate resolution" refers to the fact that MODIS data have a pixel spacing of 250 meters at best, but single images cover hundreds of kilometers, enabling [daily views of the entire globe](https://worldview.earthdata.nasa.gov/):


:::{figure-md} modis
<img src="https://eoimages.gsfc.nasa.gov/images/imagerecords/0/687/world_2000_110_rgb143.jpg" alt="modis day 1" width="800px">

Mosaic MODIS image from the first complete day of coverage, April 19, 2000
:::

☝️ This formatting is a Jupyter Book [markdown figure](https://jupyterbook.org/content/figures.html#markdown-figures)

Note that there are two identical MODIS instruments on separate satellites:

| Satellite | Launch date |
| - | -|
| Terra | December 18, 1999 | 
| Aqua | May 4, 2002 | 
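
As a small warm-up for the DataFrame work in this notebook, the table above can be held in a pandas DataFrame. This is only a minimal sketch: the column names and the derived `days_after_terra` column are illustrative choices and are not part of any MODIS product.

```python
import pandas as pd

# Launch dates copied from the table above
launches = pd.DataFrame(
    {
        "satellite": ["Terra", "Aqua"],
        "launch_date": pd.to_datetime(["1999-12-18", "2002-05-04"]),
    }
).set_index("satellite")

# Illustrative derived column: days elapsed since the Terra launch
launches["days_after_terra"] = (
    launches["launch_date"] - launches.loc["Terra", "launch_date"]
).dt.days

print(launches)
```

Indexing by satellite name keeps lookups such as `launches.loc["Aqua"]` readable.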

## Find desired Landsat or Sentinel-2 images in the cloud

In [None]:
# Search for Landsat images based on a bounding box, date and other metadata if desired
# Save to geojson file
# NOTE this STAC API endpoint does not currently search the entire catalog

satellite = 'Landsat8'

bbox = (-105.30, -74.970, -105.08, -74.920) #(west, south, east, north) 

# timeRange = '2013-06-01/2014-06-30'
# timeRange = '2014-06-01/2015-06-30'
# timeRange = '2015-06-01/2016-06-30'
# timeRange = '2016-06-01/2017-06-30'
# timeRange = '2017-06-01/2018-06-30'
# timeRange = '2018-06-01/2019-06-30'
# timeRange = '2019-06-01/2020-06-30'
# timeRange = '2020-06-01/2021-06-30'
timeRange = '2022-01-20/2022-01-30'

if satellite=='Landsat8':
    url = 'https://ibhoyw8md9.execute-api.us-west-2.amazonaws.com/prod'
    collection = 'usgs-landsat/collection02/'
    band = 'blue'
    colnm = ['landsat:wrs_path','landsat:wrs_row']
    end = '_T2' # end of file name if don't want all
    qa_band = 'qa_pixel'
elif satellite=='Sentinel2':
    url = 'https://earth-search.aws.element84.com/v0' # maybe also https://services.sentinel-hub.com/api/v1/catalog/
    collection = 'sentinel-s2-l2a-cogs'
    band = 'B02'
    colnm = ['sentinel:latitude_band','sentinel:grid_square']
    end = 'L2A'
    qa_band = 'SCL'
elif satellite=='Sentinel1': ### Doesn't work
    url = 'http://eocloud.sentinel-hub.com/search'
    collection = 'sentinel-s1-rtc-indigo'
    
results = Search.search(url=url,
                        collection=collection,
                        datetime=timeRange,
                        bbox=bbox,    
                        # properties=properties,
                        sort=['<datetime'])

print('%s items' % results.found())
items = results.items()
items.save(f'/home/jovyan/shared-readwrite/geostacks/polynyas/{satellite}.geojson')
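
The search above depends on two conventions that are easy to get wrong: the bounding box is ordered (west, south, east, north), and the time range is a single `start/end` string. The sketch below shows one way to check both before sending a query. The helper names `validate_bbox` and `make_time_range` are hypothetical and are not part of satsearch.

```python
from datetime import date


def validate_bbox(bbox):
    """Check a (west, south, east, north) bounding box in decimal degrees."""
    west, south, east, north = bbox
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError("longitudes must be within [-180, 180]")
    if not (-90 <= south <= 90 and -90 <= north <= 90):
        raise ValueError("latitudes must be within [-90, 90]")
    if south >= north:
        raise ValueError("south must be less than north")
    if west >= east:
        raise ValueError("west must be less than east (no antimeridian crossing)")
    return bbox


def make_time_range(start, end):
    """Format two dates as the 'start/end' string used for the datetime query."""
    if start > end:
        raise ValueError("start must not be after end")
    return f"{start.isoformat()}/{end.isoformat()}"


bbox = validate_bbox((-105.30, -74.970, -105.08, -74.920))
timeRange = make_time_range(date(2022, 1, 20), date(2022, 1, 30))
print(bbox, timeRange)
```

Swapping the south and north values, a common mistake at southern latitudes where both are negative, raises a `ValueError` here instead of silently returning no items.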

In [None]:
# Load the geojson file
gf = gpd.read_file(f'/home/jovyan/shared-readwrite/geostacks/polynyas/{satellite}.geojson')
gf.head()

In [None]:
gf.columns

In [None]:
# Plot search AOI and frames on a map using Holoviz Libraries (more on these later)
cols = gf.loc[:, ['id', colnm[0], colnm[1], 'geometry']] # select columns with a list of labels
alpha = 1/results.found()**0.5 # transparency scales w number of images

footprints = cols.hvplot(geo=True, line_color='k', hover_cols=[colnm[0],colnm[1]], alpha=alpha, title=satellite)
# footprints = cols.hvplot(geo=True, line_color='k', hover_cols=['landsat:wrs_path','landsat:wrs_row'], alpha=0.1, title='Landsat 8 T1')
tiles = gv.tile_sources.CartoEco.options(width=700, height=500) 
labels = gv.tile_sources.StamenLabels.options(level='annotation')
tiles * footprints * labels
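
The footprint transparency above scales as one over the square root of the number of items, so overlapping frames stay legible as the search grows. A minimal sketch of that rule follows, with a guard for an empty search result. The function name `footprint_alpha` and the `max_alpha` parameter are hypothetical.

```python
def footprint_alpha(n_items, max_alpha=1.0):
    """Return a fill opacity that scales as 1/sqrt(n_items).

    Falls back to max_alpha when the search returned nothing, which avoids
    the ZeroDivisionError that 1/n**0.5 would raise for n = 0.
    """
    if n_items <= 0:
        return max_alpha
    return min(max_alpha, 1 / n_items ** 0.5)


for n in (0, 1, 4, 100):
    print(n, footprint_alpha(n))
```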

In [None]:
# Need to set up proper credentials for acquiring data through rasterio
aws_session = AWSSession(boto3.Session(), requester_pays=True)

## Intake all scenes and add cloud mask - intake-STAC

In [None]:
catalog = intake.open_stac_item_collection(items)
# list(catalog)

In [None]:
# # Retrieve scene using rio
# item_num = 0
# if satellite=='Landsat8':
#     item_url = items[item_num].asset(band)['alternate']['s3']['href']
# elif satellite=='Sentinel2':
#     item_url = items[item_num].asset('red')['href']

# # Read and plot with grid coordinates 
# with rio.Env(aws_session):
#     with rio.open(item_url) as src:
#         fig, ax = plt.subplots(figsize=(9,8))
#         show(src,1)
#         profile = src.profile
#         arr = src.read(1)

# # Plot by index
# fig, ax = plt.subplots(figsize=(6,5))
# plt.imshow(arr)

In [None]:
# Load dask, suppress warnings, each worker needs to run as requester pays
from dask.distributed import Client
import logging
client = Client(processes=True, n_workers=4, 
                threads_per_worker=1,
                silence_logs=logging.ERROR)
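# Set (not compare) the variable on every worker; a lambda cannot contain an assignment, so use update()
client.run(lambda: os.environ.update({"AWS_REQUEST_PAYER": "requester"}))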
client

In [None]:
sceneid = list(catalog)
sceneid
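
The import cell that follows keeps only scenes whose ID ends in a given suffix and builds a matching time axis. The sketch below illustrates that filtering step with plain Python. The scene IDs and timestamps are made up for illustration, and `suffix` stands in for the `end` variable defined earlier.

```python
# Hypothetical scene IDs and acquisition times (made up for illustration)
scene_times = {
    "SCENE_A_T2": "2022-01-21T15:00:00Z",
    "SCENE_B_T1": "2022-01-23T15:06:00Z",
    "SCENE_C_T2": "2022-01-28T15:00:00Z",
}

suffix = "_T2"  # plays the role of the `end` variable defined earlier

# Keep only IDs with the wanted suffix, preserving the original order
selected = [sid for sid in scene_times if sid.endswith(suffix)]

# Look up times by ID so the time axis stays aligned with the selected scenes
times = [scene_times[sid] for sid in selected]

print(selected)
print(times)
```

Looking up each time by scene ID, rather than relying on two lists happening to share an order, keeps the time coordinate consistent with the scenes that are actually concatenated.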

In [None]:
# Import to xarray with cloud mask  ***This is slow
# nans are in locations where concat of multiple scenes has expanded the grid
# Would like to use SR for all bands except TIR, but they are not available for many scenes so would
# need a separate model for scenes with SR and those without.
scenes = []

# Create time variable for time dim
time_var = xr.Variable('time',gf.loc[gf.id.isin([item for item in sceneid if item.endswith('_T2')])]['datetime'])
# time_var = xr.Variable('time', time_index_from_filenames(s3_links))

# hls_ts_da = xr.concat([rioxarray.open_rasterio(f, chunks=chunks).squeeze('band', drop=True) for f in s3_links], dim=time)
# hls_ts_da

# hls_ts_da_data = hls_ts_da.load()
    
    
for item in sceneid:
    if item.endswith('_T2'):
        item = catalog[item]
        print (item.name)
        
        bands = []
        band_names = []
        
        # Get band names
        for k in item.keys():
            M = getattr(item, k).metadata
            if 'eo:bands' in M:
                resol = M['eo:bands'][0]['gsd']
#                 print(k, resol)
                if resol >= 30: # thermal bands are up sampled from 100 to 30
                    band_names.append(k)
                    
        # Add qa band
        band_names.append('qa_pixel')
        
        # Construct xarray for scene
        for band_name in band_names:
            band = item[band_name](chunks=dict(band=1, x=2048, y=2048),urlpath=item[band_name].metadata['alternate']['s3']['href']).to_dask() # Specify chunk size, landsat is prob in 512 chunks so used multiple
            band['band'] = [band_name]
#           band = band.expand_dims(time=[pd.to_datetime(item.metadata['datetime'])]) # doesn't give right dates
            bands.append(band)
        scene = xr.concat(bands, dim='band')
        
        # Create and add cloud mask (the mask flags everything except no-data, clear-sky ocean and clear-sky ice)
        # QA values: 1 is no data, 21952 is ocean, 30048 is ice (add more values to these lists if needed)
        # Access mask via 'ts_scenes.isel(time=0).mask'
        ocean = [21952]
        ice = [30048]
        qa = scene.sel(band='qa_pixel')
        cond = np.logical_or((qa.isin(ocean))|(qa.isin(ice)), qa==1)
#         cond = np.logical_or((qa==21952)|(qa==30048), qa==1)
        qa_c = qa.where(~cond, np.nan) # keep only pixels that are not no-data, ocean or ice
        mask_c = qa_c.where(qa_c.isnull(),2) # cloud = 2
        
        # For optional ice mask
        qa_io = qa.where(qa.isin(ice), np.nan)
        mask_io = qa_io.where(~(qa_io.isin(ice)),1) # ice = 1
        mask_c = mask_io.where(mask_io==1,mask_c)

        scene.coords['mask'] = (('y', 'x'), mask_c.data)
        scenes.append(scene)

# Concatenate scenes with time variable ***This is the slowest
ts_scenes = xr.concat(scenes, dim=time_var)

# Get epsg for Landsat images
# epsg = item.metadata['proj:epsg']
# epsg = ts_scenes.crs
pix = ts_scenes.transform[0]

ts_scenes
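
The integer QA values used for the mask above (1, 21952 and 30048) are bit-packed flags. The sketch below decodes the eight single-bit flags in bits 0 to 7, assuming the Landsat 8-9 Collection 2 `QA_PIXEL` bit layout. The function `decode_qa_pixel` is a hypothetical helper, and the two-bit confidence fields in the higher bits are ignored.

```python
# Single-bit flags in bits 0-7 of the Landsat Collection 2 QA_PIXEL band
# (assumed layout; the confidence fields in bits 8-15 are not decoded here)
QA_FLAGS = {
    "fill": 0,
    "dilated_cloud": 1,
    "cirrus": 2,
    "cloud": 3,
    "cloud_shadow": 4,
    "snow": 5,
    "clear": 6,
    "water": 7,
}


def decode_qa_pixel(value):
    """Return a dict mapping each flag name to True/False for one QA value."""
    return {name: bool((value >> bit) & 1) for name, bit in QA_FLAGS.items()}


for value in (1, 21952, 30048):
    set_flags = [name for name, is_set in decode_qa_pixel(value).items() if is_set]
    print(value, set_flags)
```

Decoding the values this way shows why they were chosen: 1 is fill, 21952 is clear water, and 30048 is clear snow or ice.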

In [None]:
sbands = ['blue', 'nir08', 'swir16','cirrus', 'lwir11']
sub_box = 1

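# NOTE: `thw` is not defined anywhere in this notebook. As an assumption, it is
# taken to be the time series built above (or a spatial subset of it).
thw = ts_scenes

# for t in thw.time.values:
t = thw.time.values[-6]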
# Masks added to data
image = thw.sel(time=t,band=sbands).where(thw.sel(time=t).mask != 2)# * buffer_mask

if sub_box==1:
    # And further subset to polynya location if desired 
    pulx = -1590700
    puly = -428200
    plrx = -1584000
    plry = -433600
    image = image.sel(y=slice(plry,puly),x=slice(pulx,plrx))

# For use at the end for coordinates
pol_y = image.y
pol_x = image.x

image = np.array(image.where(image.notnull(),0))
image = np.moveaxis(image, 0, -1)

n_band = image.shape[2]
print (t)
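
The last steps of the cell above replace missing values with 0 and reorder the array from (band, y, x) to (y, x, band), the layout that most image-processing and plotting tools expect. A minimal sketch on a synthetic array, using only NumPy:

```python
import numpy as np

# Synthetic stack with shape (band, y, x): 5 bands, 3 rows, 4 columns
stack = np.arange(5 * 3 * 4, dtype=float).reshape(5, 3, 4)
stack[0, 0, 0] = np.nan  # pretend one pixel is masked

# Replace missing values with 0, as the notebook does with where(notnull(), 0)
filled = np.where(np.isnan(stack), 0, stack)

# Move the band axis to the end: (band, y, x) -> (y, x, band)
image = np.moveaxis(filled, 0, -1)

print(image.shape)   # (3, 4, 5)
print(image[0, 0])   # the 5 band values for the top-left pixel
```

After the move, `image[row, col]` returns the full spectrum of one pixel, and `image.shape[2]` gives the number of bands.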

In [None]:
fig, (ax,ax1) = plt.subplots(ncols=2,figsize=(6,4))
ax.imshow(image[:,:,0])
ax.title.set_text('Subset window')
(thw.sel(time=t,band='blue')).plot(ax=ax1)

### NASA GIBS basemap

NASA's [Global Imagery Browse Services (GIBS)](https://earthdata.nasa.gov/eosdis/science-system-description/eosdis-components/gibs) is a great Web Map Tile Service (WMTS) to visualize NASA data as pre-rendered tiled raster images. The NASA [Worldview](https://worldview.earthdata.nasa.gov) web application is a way to explore all GIBS datasets. We can also use ipyleaflet to explore GIBS datasets, like MODIS truecolor images, within a Jupyter Notebook. Use the slider in the image below to reveal the image from 2019-04-25:

In [None]:
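from ipyleaflet import Map, Rectangle, SplitMapControl, TileLayer, basemaps, basemap_to_tiles

# Assumption: `bbox_ctr` and `rectangle` are not defined earlier in this notebook,
# so both are derived here from the search bounding box (west, south, east, north)
west, south, east, north = bbox
bbox_ctr = ((south + north) / 2, (west + east) / 2)  # (lat, lon)
rectangle = Rectangle(bounds=((south, west), (north, east)))

m = Map(center=bbox_ctr, zoom=6)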

right_layer = basemap_to_tiles(basemaps.NASAGIBS.ModisTerraTrueColorCR, "2019-04-25")
left_layer = TileLayer()
control = SplitMapControl(left_layer=left_layer, right_layer=right_layer)
m.add_control(control)

m.add_layer(rectangle)

m
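
Tile services such as the one described above serve pre-rendered images addressed by a zoom level, a column and a row. The sketch below converts a longitude and latitude to tile indices using the standard Web Mercator ("slippy map") tiling scheme. Whether a given GIBS layer uses this exact scheme depends on its projection, so treat that as an assumption. `lonlat_to_tile` is a hypothetical helper.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Convert lon/lat in degrees to (x, y) tile indices at a zoom level."""
    n = 2 ** zoom  # number of tiles along each axis
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that lon = 180 or extreme latitudes stay inside the grid
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


print(lonlat_to_tile(0.0, 0.0, 1))
```

Each additional zoom level doubles the number of tiles along both axes, which is why only the handful of tiles covering the current view needs to be fetched.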

### Exercise 1

Re-create the map above using different tile layers for both the right and left columns. Label a single point of interest with a marker, such as the city of Grand Junction, Colorado.

In [None]:
# Add your solution for exercise 1 here!

## Summary

🎉 Congratulations! You've completed this tutorial and have seen how a notebook can be formatted and how to create interactive map visualizations with ipyleaflet.
 

```{note}
You may have noticed that Jupyter Book adds some extra formatting features that do not necessarily render as you might expect when *executing* a notebook in Jupyter Lab. This "admonition" note is one such example.
```

:::{warning}
Jupyter Book is very particular about [Markdown header ordering](https://jupyterbook.org/structure/sections-headers.html?highlight=headers#how-headers-and-sections-map-onto-to-book-structure), which it uses to automatically create the table of contents on the website. In this tutorial we are careful to use a single main header (#) and sequential subheaders (##, ###, etc.).
:::

## References

To further explore the topics of this tutorial see the following detailed documentation:

* [Jupyter Book rendering of .ipynb notebooks](https://jupyterbook.org/file-types/notebooks.html)
* [Jupyter Book guide on writing narrative content](https://jupyterbook.org/content/index.html)
* [ipyleaflet documentation](https://ipyleaflet.readthedocs.io)