# Data integration with ICESat-2 - Part I

```{admonition} Learning Objectives
**Goals**
- Identify and locate non-ICESat-2 data sets
- Acquire data from the cloud or via download
- Open data in Pandas and Xarray and learn the basics of working with DataFrames
```
☝️ This formatting is a Jupyter Book [admonition](https://jupyterbook.org/content/content-blocks.html#notes-warnings-and-other-admonitions), which uses a custom flavor of Markdown called {term}`MyST`.

## Needed libraries

- satsearch
- boto3
- matplotlib widget extension

## Computing environment

We'll be using the following open-source Python libraries in this notebook:

In [4]:
import ipyleaflet
from ipyleaflet import Map, Rectangle, basemaps, basemap_to_tiles, TileLayer, SplitMapControl, Polygon

import ipywidgets
import datetime
import re

In [7]:
# %matplotlib widget
from satsearch import Search  # used below for the STAC API search
import geopandas as gpd
import ast
import pandas as pd
import geoviews as gv
import hvplot.pandas
from ipywidgets import interact
from IPython.display import display, Image
import intake # if you've installed intake-STAC, it will automatically import alongside intake
import xarray as xr
import matplotlib.pyplot as plt
import boto3  # used below to create the AWS session for rasterio
import rasterio as rio
from rasterio.session import AWSSession
from rasterio.plot import show
from dask.utils import SerializableLock
import os
import hvplot.xarray
import numpy as np
from pyproj import Proj, transform

# Suppress library deprecation warnings
import warnings
warnings.filterwarnings('ignore')

## Identify and acquire the ICESat-2 product(s) of interest

* What is the application of this product?
* What region and resolution is needed?

In [16]:
# Show an image of the area of interest (the data visualization tutorial covers this in more depth).
# We can use either a pre-selected Landsat image or the MODIS ipyleaflet visualization here.
center = [69.2, -50]
zoom = 8
m = Map(basemap=basemap_to_tiles(basemaps.NASAGIBS.ModisAquaTrueColorCR, '2020-07-18'),center=center,zoom=zoom)
m


### Download ICESat-2 ATL03 data from desired region
__Zach__


## Identify other products of interest, and their space/time boundaries

* What data are needed to supplement and enhance ICESat-2 observations?
    * Satellite or in situ data sets?
    * This tutorial uses Landsat (hosted on AWS) and ATM (not hosted on AWS)

* Where are the other data sets stored?
    * Cloud data sets on AWS: https://registry.opendata.aws/
    * Earthdata: https://search.earthdata.nasa.gov/search/

Data set links:

* ATM: https://search.earthdata.nasa.gov/search/granules?p=C1000000062-NSIDC_ECS&pg[0][v]=f&pg[0][gsk]=-start_date&sb[0]=-50.56835%2C68.83371%2C-49.00139%2C69.56591&fi=ATM&as[instrument][0]=ATM&tl=1646767454.667!3!!&m=67.08956509568404!-48.11572265625!6!1!0!0%2C2
* Sentinel-2: https://registry.opendata.aws/sentinel-2/
* Landsat: https://registry.opendata.aws/usgs-landsat/

## Acquire non-cloud data and open: ATM data access
__Zach__

### Opening and manipulating data in Pandas
__Tasha or Zach__

## Download images from the cloud:
__Tasha__

An Amazon Web Services (AWS) account is required to directly access Landsat data in
the cloud, which are housed in the `usgs-landsat` S3 requester-pays bucket. Users can
also find all objects within the Landsat record by using either the SpatioTemporal Asset
Catalog (STAC) Browser (s3) or the Satellite (SAT) Application Programming Interface (API) (https).

Scene location - https://landsat-pds.s3.amazonaws.com/c1/L8/202/025/LC08_L1TP_202025_20190420_20190507_01_T1/ \
Scene band url for download - https://landsat-pds.s3.amazonaws.com/c1/L8/202/025/LC08_L1TP_202025_20190420_20190507_01_T1/LC08_L1TP_202025_20190420_20190507_01_T1_B4.TIF
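
The scene location and band URL above follow a regular pattern that can be derived from the Landsat product identifier. The snippet below is a minimal sketch that assumes the layout shown in those two URLs; the helper name `landsat_pds_band_url` is hypothetical and not part of any library.

```python
def landsat_pds_band_url(product_id: str, band: str) -> str:
    """Build the HTTPS URL of one band, mirroring the example URLs above."""
    # Product IDs look like LC08_L1TP_202025_20190420_20190507_01_T1:
    # sensor/satellite, processing level, WRS path + row, acquisition date,
    # processing date, collection number, collection tier.
    parts = product_id.split("_")
    if len(parts) != 7:
        raise ValueError(f"unexpected product ID: {product_id}")
    path_row = parts[2]
    path, row = path_row[:3], path_row[3:]
    # Base prefix copied from the example URLs in the text (assumed layout)
    base = "https://landsat-pds.s3.amazonaws.com/c1/L8"
    return f"{base}/{path}/{row}/{product_id}/{product_id}_{band}.TIF"


url = landsat_pds_band_url("LC08_L1TP_202025_20190420_20190507_01_T1", "B4")
print(url)
```

Changing the `band` argument (for example to `"B2"`) gives the URL for another band of the same scene under the same assumed layout.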

### Using data in the cloud from S3 buckets
Simple Storage Service (S3) bucket...

Landsat S3 User Manual: https://d9-wret.s3.us-west-2.amazonaws.com/assets/palladium/production/s3fs-public/atoms/files/LSDS-2032-Landsat-Commercial-Cloud-Direct-Access-Users-Guide-v2.pdf.pdf

Scene band url for download - s3://usgs-landsat/collection02/level-2/standard/oli-tirs/2020/202/025/LC08_L1TP_202025_20190420_20190507_01_T1/LC08_L1TP_202025_20190420_20190507_01_T1_B4.TIF

## Find and open Landsat images in the cloud
__Tasha__

To open only a single Landsat band without going into much detail about Cloud Optimized GeoTIFFs (COGs), see: http://www.acgeospatial.co.uk/cog-part-2-python/

SpatioTemporal Asset Catalog (STAC) specification: https://github.com/radiantearth/stac-spec

In [None]:
# Sets up credentials for acquiring images through dask/xarray
os.environ["AWS_REQUEST_PAYER"] = "requester"

# Need to set up proper credentials for acquiring data through rasterio
aws_session = AWSSession(boto3.Session(), requester_pays=True)

In [None]:
# Search STAC API for Landsat images based on a bounding box, date and other metadata if desired

satellite = 'Landsat8'

bbox = (-105.30, -74.970, -105.08, -74.920) #(west, south, east, north) 

timeRange = '2022-01-20/2022-01-30'

band = 'blue'
colnm = ['landsat:wrs_path','landsat:wrs_row']
end = '_T2' # file-name suffix used to filter scenes if you don't want all of them
qa_band = 'qa_pixel'
    
results = Search.search(url='https://ibhoyw8md9.execute-api.us-west-2.amazonaws.com/prod',
                        collection='usgs-landsat/collection02/',
                        datetime=timeRange,
                        bbox=bbox,    
                        # properties=properties,
                        sort=['<datetime'])

print('%s items' % results.found())
items = results.items()

# Save search to geojson file
gjson_outfile = f'/home/jovyan/website2022/book/tutorials/DataIntegration/{satellite}.geojson'
items.save(gjson_outfile)
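
The search above sends a bounding box and a date range to the STAC API. As a rough sketch of what those parameters contain, the snippet below validates the bounding box, parses the date interval and assembles a request body using the core STAC item-search field names (`bbox`, `datetime`, `collections`, `limit`). The helper name `build_search_body` and the collection ID are hypothetical, the date-only interval simply mirrors the one used above, and nothing is sent over the network.

```python
import json
from datetime import date


def build_search_body(bbox, time_range, collections, limit=100):
    """Assemble a STAC item-search request body (nothing is sent)."""
    west, south, east, north = bbox
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError("longitudes must be within [-180, 180]")
    if not (-90 <= south <= north <= 90):
        raise ValueError("latitudes must satisfy -90 <= south <= north <= 90")

    # A closed interval is written as 'start/end'
    start, end = (date.fromisoformat(d) for d in time_range.split("/"))
    if start > end:
        raise ValueError("start date is after end date")

    return {
        "bbox": [west, south, east, north],
        "datetime": time_range,
        "collections": list(collections),
        "limit": limit,
    }


body = build_search_body(
    bbox=(-105.30, -74.970, -105.08, -74.920),  # (west, south, east, north)
    time_range="2022-01-20/2022-01-30",
    collections=["example-collection"],  # placeholder, not a real collection ID
)
print(json.dumps(body, indent=2))
```

Checking the inputs before searching makes a swapped latitude/longitude or a reversed date range fail early instead of silently returning zero items.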

Load the GeoJSON file into GeoPandas and inspect it.

In [None]:
# Load the geojson file
gf = gpd.read_file(gjson_outfile)
gf.head()

In [None]:
gf.columns

In [None]:
# Plot search AOI and frames on a map using Holoviz Libraries (more on these later)
cols = gf.loc[:,('id',colnm[0],colnm[1],'geometry')]
alpha = 1/results.found()**0.5 # transparency scales w number of images

footprints = cols.hvplot(geo=True, line_color='k', hover_cols=[colnm[0],colnm[1]], alpha=alpha, title=satellite)
# footprints = cols.hvplot(geo=True, line_color='k', hover_cols=['landsat:wrs_path','landsat:wrs_row'], alpha=0.1, title='Landsat 8 T1')
tiles = gv.tile_sources.CartoEco.options(width=700, height=500) 
labels = gv.tile_sources.StamenLabels.options(level='annotation')
tiles * footprints * labels

### Intake all scenes - intake-STAC

In [None]:
catalog = intake.open_stac_item_collection(items)
list(catalog)

In [None]:
# Extra code for examining items
sceneid = 'LC08_L1GT_004113_20140326_20200911_02_T2'  # example of a scene ID
sceneid = list(catalog)[0]  # overrides the example above with the first scene in the catalog
item = catalog[sceneid]
catalog[sceneid]['blue'].metadata
items[1].asset('blue')['eo:bands']
item

In [None]:
# This is the url needed to grab data and it needs to be plugged in manually below
items[1].asset('blue')['alternate']['s3']['href'] # can use item asset name or title
item.blue.metadata['alternate']['s3']['href'] # must use item asset name 
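
As the two lookups above show, an asset carries a default `href` and, for Landsat, an alternate S3 location under `alternate`, `s3`, `href`. Below is a small sketch of a lookup that prefers the S3 location and falls back to the default `href`. The asset dictionaries and the helper name `asset_href` are made up for illustration and only mimic the nesting used above.

```python
def asset_href(asset: dict, prefer: str = "s3") -> str:
    """Return the preferred alternate href of an asset, else its default href."""
    alternate = asset.get("alternate", {})
    if prefer in alternate and "href" in alternate[prefer]:
        return alternate[prefer]["href"]
    return asset["href"]


# Hypothetical assets that only mimic the structure used in this notebook
landsat_like = {
    "href": "https://example.com/scene_B2.TIF",
    "alternate": {"s3": {"href": "s3://example-bucket/scene_B2.TIF"}},
}
sentinel_like = {"href": "https://example.com/scene_B04.tif"}

print(asset_href(landsat_like))   # alternate S3 location is present
print(asset_href(sentinel_like))  # falls back to the default href
```

A fallback like this avoids branching on the satellite name when some collections provide an alternate S3 location and others do not.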

In [None]:
# Retrieve scene using rio
item_num = 0
if satellite=='Landsat8':
    item_url = items[item_num].asset(band)['alternate']['s3']['href']
elif satellite=='Sentinel2':
    item_url = items[item_num].asset('red')['href']

# Read and plot with grid coordinates 
with rio.Env(aws_session):
    with rio.open(item_url) as src:
        fig, ax = plt.subplots(figsize=(9,8))
        show((src, 1), ax=ax)  # plot band 1 on the axes created above
        profile = src.profile
        arr = src.read(1)

# Plot by index
fig, ax = plt.subplots(figsize=(6,5))
plt.imshow(arr)

In [None]:
# Load dask, suppress warnings, each worker needs to run as requester pays
from dask.distributed import Client
import logging
client = Client(processes=True, n_workers=4, 
                threads_per_worker=1,
                silence_logs=logging.ERROR)
client.run(lambda: os.environ.update({"AWS_REQUEST_PAYER": "requester"}))  # set (rather than compare) the variable on every worker
client

In [None]:
sceneid = list(catalog)
sceneid

In [None]:
# Import to xarray with cloud mask 
# nans are in locations where concat of multiple scenes has expanded the grid

scenes = []

# Create time variable for time dim
time_var = xr.Variable('time',gf.loc[gf.id.isin([item for item in sceneid if item.endswith('_T2')])]['datetime'])
    
for item in sceneid:
    if item.endswith('_T2'):
        item = catalog[item]
        print (item.name)
        
        bands = []
        band_names = []
        
        # Get band names
        for k in item.keys():
            M = getattr(item, k).metadata
            if 'eo:bands' in M:
                resol = M['eo:bands'][0]['gsd']
#                 print(k, resol)
                if resol >= 30: # thermal bands are up sampled from 100 to 30
                    band_names.append(k)
                    
        # Add qa band
        band_names.append('qa_pixel')
        
        # Construct xarray for scene
        for band_name in band_names:
            # Specify chunk size; Landsat is probably stored in 512-pixel chunks, so use a multiple
            band = item[band_name](
                chunks=dict(band=1, x=2048, y=2048),
                urlpath=item[band_name].metadata['alternate']['s3']['href'],
            ).to_dask()
            band['band'] = [band_name]
#           band = band.expand_dims(time=[pd.to_datetime(item.metadata['datetime'])]) # doesn't give right dates
            bands.append(band)
        scene = xr.concat(bands, dim='band')
        scenes.append(scene)

# Concatenate scenes along the time variable (this is the slowest step)
ts_scenes = xr.concat(scenes, dim=time_var)

# Get epsg for Landsat images
# epsg = item.metadata['proj:epsg']
# epsg = ts_scenes.crs
pix = ts_scenes.transform[0]

ts_scenes
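
The band-selection step in the loop above keeps every asset whose `eo:bands` metadata reports a ground sample distance (`gsd`) of at least 30 m, then appends the QA band. The sketch below applies the same logic to plain dictionaries. The metadata values are made up and only illustrate the structure, and `select_bands` is a hypothetical helper.

```python
def select_bands(assets: dict, min_gsd: float = 30, qa_band: str = "qa_pixel") -> list:
    """Keep assets with eo:bands metadata and gsd >= min_gsd, then add the QA band."""
    names = []
    for name, metadata in assets.items():
        eo_bands = metadata.get("eo:bands")
        if eo_bands and eo_bands[0].get("gsd", 0) >= min_gsd:
            names.append(name)
    # The QA band has no eo:bands entry, so it is appended explicitly
    names.append(qa_band)
    return names


# Hypothetical metadata; real STAC items carry many more fields
assets = {
    "thumbnail": {},
    "blue": {"eo:bands": [{"name": "B2", "gsd": 30}]},
    "pan": {"eo:bands": [{"name": "B8", "gsd": 15}]},
    "lwir11": {"eo:bands": [{"name": "B10", "gsd": 100}]},
    "qa_pixel": {},
}
selected = select_bands(assets)
print(selected)
```

Filtering on `gsd` keeps all stacked bands on a common grid, which is why the 15 m band in this made-up example is excluded.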

In [None]:
sbands = ['blue', 'nir08', 'swir16','cirrus', 'lwir11']
sub_box = 1

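# NOTE: `thw` is not defined in this notebook. It is assumed to be an xarray object
# with `time` and `band` dimensions and a `mask` variable or coordinate.
# for t in thw.time.values:
t = thw.time.values[-6]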
# Masks added to data
image = thw.sel(time=t,band=sbands).where(thw.sel(time=t).mask != 2)# * buffer_mask

if sub_box==1:
    # And further subset to polynya location if desired 
    pulx = -1590700
    puly = -428200
    plrx = -1584000
    plry = -433600
    image = image.sel(y=slice(plry,puly),x=slice(pulx,plrx))

# For use at the end for coordinates
pol_y = image.y
pol_x = image.x

image = np.array(image.where(image.notnull(),0))
image = np.moveaxis(image, 0, -1)

n_band = image.shape[2]
print (t)
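
The subset above selects pixels by projected coordinates. For a north-up grid, roughly the same window can be expressed as row and column indices computed from the grid origin and the pixel size (the value stored in `pix` earlier). The worked sketch below assumes a hypothetical upper-left origin and 30 m pixels. Only the four corner coordinates come from the cell above, and `window_from_bounds` is not a library function.

```python
import math


def window_from_bounds(ulx, uly, lrx, lry, origin_x, origin_y, pixel_size):
    """Return (row_start, row_stop, col_start, col_stop) for a north-up grid."""
    col_start = math.floor((ulx - origin_x) / pixel_size)
    col_stop = math.ceil((lrx - origin_x) / pixel_size)
    # y decreases as the row index increases in a north-up raster
    row_start = math.floor((origin_y - uly) / pixel_size)
    row_stop = math.ceil((origin_y - lry) / pixel_size)
    return row_start, row_stop, col_start, col_stop


window = window_from_bounds(
    ulx=-1590700, uly=-428200, lrx=-1584000, lry=-433600,  # corners from the cell above
    origin_x=-1600000, origin_y=-420000,  # hypothetical grid origin (upper-left corner)
    pixel_size=30,  # assumed pixel size in metres
)
n_rows = window[1] - window[0]
n_cols = window[3] - window[2]
print(window, n_rows, n_cols)
```

Rounding the start down and the stop up guarantees that the window fully covers the requested bounds, at the cost of up to one extra pixel on each side.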

In [None]:
fig, (ax,ax1) = plt.subplots(ncols=2,figsize=(6,4))
ax.imshow(image[:,:,0])
ax.title.set_text('Subset window')
(thw.sel(time=t,band='blue')).plot(ax=ax1)

### NASA GIBS basemap

NASA's [Global Imagery Browse Services (GIBS)](https://earthdata.nasa.gov/eosdis/science-system-description/eosdis-components/gibs) is a great Web Map Tile Service (WMTS) to visualize NASA data as pre-rendered tiled raster images. The NASA [Worldview](https://worldview.earthdata.nasa.gov) web application is a way to explore all GIBS datasets. We can also use ipyleaflet to explore GIBS datasets, like MODIS true-color images, within a Jupyter Notebook. Use the slider in the image below to reveal the image from 2019-04-25.
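
To make the idea of pre-rendered tiles concrete, the sketch below uses the standard XYZ tiling formula to compute which Web Mercator tile contains the map center used earlier in this notebook (`[69.2, -50]`, zoom 8), then fills in a tile URL. The URL template is an assumption based on the GIBS REST pattern that ipyleaflet's `NASAGIBS.ModisAquaTrueColorCR` basemap appears to use, so check it against the GIBS documentation before relying on it.

```python
import math


def latlon_to_tile(lat, lon, zoom):
    """Convert latitude/longitude in degrees to XYZ (Web Mercator) tile indices."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y


# Assumed URL template; verify against the GIBS documentation
GIBS_TEMPLATE = (
    "https://gibs.earthdata.nasa.gov/wmts/epsg3857/best/"
    "MODIS_Aqua_CorrectedReflectance_TrueColor/default/{date}/"
    "GoogleMapsCompatible_Level9/{z}/{y}/{x}.jpg"
)

zoom = 8
x, y = latlon_to_tile(69.2, -50, zoom)
tile_url = GIBS_TEMPLATE.format(date="2020-07-18", z=zoom, y=y, x=x)
print(x, y)
print(tile_url)
```

A map widget does this calculation for every visible tile as you pan and zoom. This is why a tile service can serve large imagery collections without the client downloading whole scenes.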

### Exercise 1

Re-create the map above using different tile layers for both the right and left columns. Label a single point of interest with a marker, such as the city of Grand Junction, Colorado.

In [None]:
# Add your solution for exercise 1 here!

## Summary

🎉 Congratulations! You've completed this tutorial and have seen how a notebook can be formatted and how to create interactive map visualizations with ipyleaflet.
 

```{note}
You may have noticed Jupyter Book adds some extra formatting features that do not necessarily render as you might expect when *executing* a notebook in Jupyter Lab. This "admonition" note is one such example.
```

:::{warning}
Jupyter Book is very particular about [Markdown header ordering](https://jupyterbook.org/structure/sections-headers.html?highlight=headers#how-headers-and-sections-map-onto-to-book-structure) when it automatically creates the table of contents on the website. In this tutorial we are careful to use a single main header (#) and sequential subheaders (##, ###, etc.).
:::

## References

To further explore the topics of this tutorial see the following detailed documentation:

* [Jupyter Book rendering of .ipynb notebooks](https://jupyterbook.org/file-types/notebooks.html)
* [Jupyter Book guide on writing narrative content](https://jupyterbook.org/content/index.html)
* [ipyleaflet documentation](https://ipyleaflet.readthedocs.io)