# Working with EMIT L2A Reflectance

## Summary    

This notebook explains how to access Earth Surface Mineral Dust Source Investigation (EMIT) data programmatically using the [earthaccess Python library](https://github.com/nsidc/earthaccess). `earthaccess` is a useful Python library that reduces finding and downloading or streaming data over HTTPS or S3 to only a few lines of code. `earthaccess` searches NASA's Common Metadata Repository (CMR), a metadata system that catalogs Earth science data and associated metadata records, and can then be used to download granules or generate lists of granule URLs from the search results.

## Requirements  

- A NASA [Earthdata Login](https://urs.earthdata.nasa.gov/) account is required   

## Learning Objectives  

- How to find EMIT data using `earthaccess`
- How to work with EMIT reflectance data
- How to mask and quality filter EMIT reflectance data

## Exercise 

Import the Python libraries we need. 

In [None]:
# Import Packages
import warnings
# Some cells may generate warnings that we can ignore. Comment out the line below to see them.
warnings.filterwarnings('ignore')

import os
import math
import earthaccess
import numpy as np
from osgeo import gdal
import rasterio as rio
import rioxarray as rxr
import hvplot.xarray
import hvplot.pandas
import holoviews as hv
import geopandas as gp

import sys
sys.path.append('../../tools/emit/python/modules/')
from emit_tools import emit_xarray, ortho_xr

Note that we are importing a local module for handling EMIT data.

### Authenticate  

Earthdata Login credentials (i.e., username and password) are required to access NASA Earthdata data assets. We will use the `earthaccess` package to authenticate using our Earthdata Login credentials. 

In [None]:
auth = earthaccess.login(persist=True)
auth.refresh_tokens()

### Query for EMIT Data  

In this exercise, we want to find the EMIT L2A reflectance granules/scenes that intersect our region of interest (ROI) within a specified date range. We can read in a geojson file containing our ROI and pass the bounding box of the feature to the `earthaccess` `search_data()` function to identify the granules/scenes we are interested in.

In [None]:
geojson = gp.read_file('../../data/co_agriculture.geojson')
geojson.geometry

In [None]:
bbox = tuple(list(geojson.total_bounds))
bbox
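
As a rough illustration of what `total_bounds` returns, the sketch below computes the same four numbers by hand for a made-up polygon. The vertex coordinates are hypothetical and are not those of the actual ROI. The ordering (min x, min y, max x, max y) follows the GeoPandas convention, which lines up with the lower-left and upper-right longitude/latitude order that `bounding_box` expects.

```python
# Hypothetical polygon vertices as (longitude, latitude) pairs
vertices = [(-105.2, 40.1), (-104.9, 40.1), (-104.9, 40.4), (-105.2, 40.4)]

lons = [lon for lon, _ in vertices]
lats = [lat for _, lat in vertices]

# Same ordering as GeoDataFrame.total_bounds: (minx, miny, maxx, maxy)
toy_bbox = (min(lons), min(lats), max(lons), max(lats))
print(toy_bbox)
```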

Pass the **bbox** python variable to the `bounding_box` argument and enter a start and end date, as a python tuple, to the `temporal` argument.

In [None]:
results = earthaccess.search_data(
    short_name='EMITL2ARFL',
    bounding_box=bbox,
    temporal=('2023-06-01','2023-09-30'),
    count=100
)

Use the `data_links()` convenience method to extract the data links for all of the granules. In this case there are multiple files associated with a single granule.

In [None]:
emit_results_urls = [granule.data_links() for granule in results]
emit_results_urls[:2]

Combine our list of lists into a single list.

In [None]:
url_list = [url for urls in emit_results_urls for url in urls]
url_list[:10]

### Working with EMIT Data

EMIT collections and their associated granules are archived and distributed from NASA's Earthdata Cloud. Because of this, data assets/files can be accessed in a variety of ways. Data distributed from Earthdata Cloud can be:  
- *Downloaded* - This option has been available since the NASA DAACs were established. Users can use the data link(s) to download the data files to their local working environment. This can be done whether the user is working from a non-cloud or cloud environment.
- *Streamed* - Streaming is on-the-fly random reading of remote files, i.e. files not saved locally. The data accessed, however, must fit in the workspace's memory. This can be done whether the user is working from a non-cloud or cloud environment.
- *Accessed in place (i.e., direct S3 access)* - This is only available for working environments deployed in AWS us-west-2.

#### Download Data  

The `download()` function from `earthaccess` can be used to efficiently download the data links from `earthaccess` search results. A list of URLs can also be passed to the function. The convenient part of using the `download()` function is that authentication is taken care of on behalf of the user.

In [None]:
outloc = '../../data/'

Download `earthaccess` search results

In [None]:
earthaccess.download(results, local_path=outloc)

Download from URL list

In [None]:
#earthaccess.download(url_list, local_path=outloc)

#### Streaming Data  

Data in NASA Earthdata Cloud can be read into the workspace by streaming the data, that is, no download is required. Here we will assign a single URL for EMIT *reflectance* and for the *mask* layer from our **url_list** to read in and explore.

In [None]:
emit_rfl = url_list[0]
emit_rfl

In [None]:
emit_qa = url_list[2]
emit_qa
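
Picking files by list position works only while the order of links stays the same. As a hedged alternative, the sketch below selects URLs by a token in the file name. The URLs are made-up placeholders that mimic the `EMIT_L2A_<PRODUCT>_...` naming seen in the data links, and `pick_by_product` is a hypothetical helper, not part of `earthaccess`.

```python
# Hypothetical URLs that mimic the EMIT file naming pattern
toy_urls = [
    "https://example.com/EMIT_L2A_RFL_001_20230627T000000_0000000_001.nc",
    "https://example.com/EMIT_L2A_RFLUNCERT_001_20230627T000000_0000000_001.nc",
    "https://example.com/EMIT_L2A_MASK_001_20230627T000000_0000000_001.nc",
]

def pick_by_product(urls, product):
    """Return URLs whose file name contains the product token, e.g. '_RFL_'."""
    token = f"_{product}_"
    return [u for u in urls if token in u.rsplit("/", 1)[-1]]

rfl_urls = pick_by_product(toy_urls, "RFL")    # reflectance only, not RFLUNCERT
mask_urls = pick_by_product(toy_urls, "MASK")  # mask layer
print(rfl_urls[0])
print(mask_urls[0])
```

Including the trailing underscore in the token keeps `RFL` from also matching the `RFLUNCERT` file.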

We need to pass our Earthdata Login credentials to stream data from NASA's Earthdata Cloud. We will use `earthaccess`' `get_fsspec_https_session()` function to pass this information and allow us to access these data.

In [None]:
# Get HTTPS Session using Earthdata Login Info
fs = earthaccess.get_fsspec_https_session()

# Use the session (i.e., fs) to connect to the file
emit_fp = fs.open(emit_rfl)
emit_qa_fp = fs.open(emit_qa)

We now have an authenticated connection to the data links. We can now start exploring these data.

#### Opening and Exploring EMIT Reflectance Data

EMIT L2A Reflectance Data are distributed in a non-orthocorrected spatially raw NetCDF4 (.nc) format consisting of the data and its associated metadata. Inside the L2A Reflectance `.nc` file there are 3 groups. Groups can be thought of as containers to organize the data. 

1. The root group that can be considered the main dataset contains the reflectance data described by the downtrack, crosstrack, and bands dimensions.  
2. The `sensor_band_parameters`  group containing the wavelength center and the full-width half maximum (FWHM) of each band.  
3. The `location` group contains latitude and longitude values at the center of each pixel described by the crosstrack and downtrack dimensions, as well as a geometry lookup table (GLT) described by the ortho_x and ortho_y dimensions. The GLT is an orthorectified image (EPSG:4326) consisting of 2 layers containing downtrack and crosstrack indices. These index positions allow us to quickly project the raw data onto this geographic grid.
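
To make the role of the GLT more concrete, here is a minimal NumPy sketch of projecting raw data onto an orthorectified grid with index lookups. It is an illustration only, not the `emit_tools` implementation. The toy arrays are made up, and the convention used here (1-based indices with 0 marking "no data") is an assumption that should be checked against the product documentation.

```python
import numpy as np

# Toy raw scene: 2 downtrack rows x 3 crosstrack columns x 2 bands
raw = np.arange(12, dtype=float).reshape(2, 3, 2)

# Toy GLT layers on a 3 x 3 ortho grid (assumed 1-based, 0 = no data)
glt_y = np.array([[0, 1, 1], [1, 2, 2], [2, 2, 0]])  # downtrack index
glt_x = np.array([[0, 1, 2], [2, 2, 3], [1, 3, 0]])  # crosstrack index

# Ortho cells that point at a real raw pixel
valid = (glt_x > 0) & (glt_y > 0)

# Start with an all-NaN ortho cube, then copy spectra into the valid cells
ortho = np.full(glt_x.shape + (raw.shape[-1],), np.nan)
ortho[valid] = raw[glt_y[valid] - 1, glt_x[valid] - 1]

print(ortho[:, :, 0])
```

Each ortho cell simply copies the spectrum of the raw pixel it points to, which is why the projection is fast.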

To work with the EMIT data, we will use the `emit_tools` module. There are other ways to work with the data, and a more thorough explanation of `emit_tools` is available in the [EMIT-Data-Resources Repository](https://github.com/nasa/EMIT-Data-Resources).

Open the example EMIT scene using the `emit_xarray` function. In this step we will use the `ortho=False` argument (the default) to read in the data in its source non-orthocorrected form.

In [None]:
# Load the data to speed up future cells
emit_ds = emit_xarray(emit_fp, ortho=False).load()
emit_ds

Since the **wavelengths** coordinate variable is indexed, we can use `sel()` functions to filter for specific wavelength values from our EMIT datacube.

In [None]:
emit_ds['reflectance'].sel(wavelengths=380, method='nearest').plot()
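
Conceptually, a nearest-neighbour selection picks the band whose center wavelength has the smallest absolute difference from the requested value. The sketch below shows that idea with plain NumPy. The band centers are hypothetical values, not the actual EMIT wavelengths.

```python
import numpy as np

# Hypothetical band centers in nm
toy_wavelengths = np.array([381.0, 388.4, 395.8, 403.3])
target = 390

# Index of the band center closest to the requested wavelength
nearest_idx = int(np.abs(toy_wavelengths - target).argmin())
nearest_wl = float(toy_wavelengths[nearest_idx])
print(nearest_idx, nearest_wl)
```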

Note the orientation of the plotted image. Remember this is not orthocorrected and thus is not north up. You may notice that some EMIT radiance and reflectance scenes have rows of missing data. This is due to EMIT's on-board cloud filtering. Additional filtering can be applied using the **mask** layer (example later in this exercise).

In [None]:
# %%time
# # This cell isn't needed - Just another capability to subset the streamed data before orthorectifying
# # This is more efficient in terms of memory and slightly faster, but doesn't look as nice for the interactive explore cell
# # Load polygon
# shape = gp.read_file("../data/dangermond_boundary.geojson")
# # Subset and load
# emit_ds = spatial_subset(emit_xarray(emit_fp), shape).load()

We will now create an orthocorrected image of our data using the `ortho_xr()` function from the `emit_tools` module.

In [None]:
emit_ds = ortho_xr(emit_ds)
emit_ds

In [None]:
emit_ds['reflectance'].sel(wavelengths=380, method='nearest').plot()

We now have an orthorectified image that is north up!

Using the `good_wavelengths` flag from the `sensor_band_parameters` group, we can mask out bands where water absorption features were assigned a value of -0.01 reflectance. Typically data around 1320-1440 nm and 1770-1970 nm is noisy due to the moisture present in the atmosphere; therefore, these spectral regions offer little information about targets and can be excluded from calculations. 

In [None]:
emit_ds['reflectance'].data[:,:,emit_ds['good_wavelengths'].data==0] = np.nan
emit_ds['reflectance'].data[emit_ds['reflectance'].data == -9999] = np.nan
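
The two assignments above rely on NumPy boolean indexing. As a small sketch on made-up data, the example below applies the same pattern to a toy 2 x 2 pixel cube with 4 bands. The flag values and reflectances are hypothetical.

```python
import numpy as np

# Toy cube: 2 x 2 pixels, 4 bands; -9999 stands in for the fill value
cube = np.array([[[0.1, -0.01, 0.3, 0.4], [0.2, -0.01, 0.1, 0.5]],
                 [[-9999.0, -9999.0, -9999.0, -9999.0], [0.3, -0.01, 0.2, 0.6]]])
good = np.array([1, 0, 1, 1])  # hypothetical per-band flag, 0 = band to drop

cube[:, :, good == 0] = np.nan   # blank the flagged band for every pixel
cube[cube == -9999] = np.nan     # blank fill values wherever they remain

print(np.isnan(cube).sum())
```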

##### Plot a Spectrum  

We will now plot the spectrum of the individual pixel closest to a specified latitude and longitude using the `sel` function from `xarray`.

In [None]:
scene_center = emit_ds.latitude.values[int(len(emit_ds.latitude)/2)],emit_ds.longitude.values[int(len(emit_ds.longitude)/2)]
scene_center

In [None]:
#point = emit_ds.sel(latitude=scene_center[0],longitude=scene_center[1], method='nearest')
point = emit_ds.sel(latitude=40.2, longitude=-105.6, method='nearest')

point.hvplot.line(y='reflectance', 
                  x='wavelengths', 
                  color='black').opts(title=f'Latitude = {point.latitude.values.round(3)}, Longitude = {point.longitude.values.round(3)}')

We can also plot individual bands spatially by selecting a wavelength, then plotting. Select the band with a wavelength of 850 nm and plot it using ESRI imagery as a basemap to get a better understanding of where the scene was acquired.

In [None]:
emit_layer = emit_ds.sel(wavelengths=850,method='nearest')

emit_layer.hvplot.image(cmap='viridis',
                        geo=True, 
                        tiles='ESRI', 
                        crs='EPSG:4326', 
                        frame_width=720,
                        frame_height=405, 
                        alpha=0.7, 
                        fontscale=2).opts(title=f"{emit_layer.wavelengths:.3f} {emit_layer.wavelengths.units}", xlabel='Longitude',ylabel='Latitude')

##### Applying Quality Masks to EMIT Data

The EMIT L2A Mask file contains some bands that are direct masks (Cloud, Dilated, Cirrus, Water, Spacecraft), and some (AOD550 and H2O (g cm-2)) that contain information calculated during the L2A reflectance retrieval. These may be used as additional screening, depending on the application.

> Note: It is more memory efficient to apply the mask before orthorectifying, so during the automation section we will do that.

In [None]:
emit_mask = emit_xarray(emit_qa_fp, ortho=True)
emit_mask

List the quality flags contained in the `mask_bands` dimension.

In [None]:
emit_mask.mask_bands.data.tolist()

We will use the `Dilated Cloud Flag`. Select that band with the `sel` function as we did for wavelengths before.

In [None]:
emit_cloud_mask = emit_mask.sel(mask_bands='Dilated Cloud Flag')

Now we can visualize our cloud mask. You may have noticed before that we added a lot of parameters to our plotting function. If we want to consistently apply the same formatting for multiple plots, we can add those arguments to a dictionary that we can unpack into `hvplot` functions using `**`.

Create two dictionaries with plotting options.

In [None]:
size_opts = dict(frame_height=405, frame_width=720, fontscale=2)
map_opts = dict(geo=True, crs='EPSG:4326', alpha=0.7, xlabel='Longitude',ylabel='Latitude')
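
For readers unfamiliar with `**` unpacking, the sketch below shows the mechanism on its own. `describe_plot` is a hypothetical stand-in for a plotting call and only echoes its arguments. The dictionaries are toy versions of the ones defined above.

```python
def describe_plot(cmap, frame_height, frame_width, alpha=1.0):
    """Hypothetical stand-in for a plotting call; just echoes its arguments."""
    return f"{cmap} {frame_width}x{frame_height} alpha={alpha}"

toy_size = dict(frame_height=405, frame_width=720)
toy_style = dict(alpha=0.7)

# Equivalent to describe_plot('viridis', frame_height=405, frame_width=720, alpha=0.7)
summary = describe_plot('viridis', **toy_size, **toy_style)
print(summary)
```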

In [None]:
emit_cloud_mask.hvplot.image(cmap='viridis', tiles='ESRI', **size_opts, **map_opts)

Values of 1 in the mask indicate areas to omit. Apply the mask to our EMIT data by setting reflectance to `np.nan` wherever `mask.data == 1`.

In [None]:
emit_ds.reflectance.data[emit_cloud_mask.mask.data == 1] = np.nan
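
It may not be obvious that a 2D mask can index a 3D cube. With NumPy, a boolean array matching the first two dimensions selects whole spectra, so every band of a flagged pixel is set at once. The sketch below uses made-up arrays and assumes the spatial dimensions come first, as in the orthorectified dataset above.

```python
import numpy as np

cube = np.ones((2, 3, 4))             # toy (latitude, longitude, wavelengths) cube
flag = np.array([[0, 1, 0],
                 [0, 0, 1]])          # toy 2D mask; 1 marks pixels to omit

cube[flag == 1] = np.nan              # 2D boolean index blanks whole spectra

# Count pixels whose entire spectrum is now NaN
n_masked = int(np.isnan(cube).all(axis=-1).sum())
print(n_masked)
```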

We can confirm our masking worked with a spatial plot.

In [None]:
emit_layer_filtered_plot = emit_ds.sel(wavelengths=850, method='nearest').hvplot.image(cmap='viridis',tiles='ESRI',**size_opts, **map_opts)
emit_layer_filtered_plot

#### Create Interactive Spectral Plots

Combining the Spatial and Spectral information into a single visualization can be a powerful tool for exploring and inspecting imaging spectroscopy data. Using the streams module from Holoviews we can link a spatial map to a plot of spectra.

We could plot a single-band image as we have previously, but using a multiband image, like an RGB, may help us infer what targets we're examining. Build an RGB image following the steps below.

Select bands to represent red (650 nm), green (560 nm), and blue (470 nm) by finding the nearest to a wavelength chosen to represent that color.


In [None]:
emit_rgb = emit_ds.sel(wavelengths=[650, 560, 470], method='nearest')

We may need to adjust the brightness of the selected wavelengths to make a prettier map. **This will not affect the data, just the visuals.** To do this we will use the function below. We can change the `bright` argument to increase or decrease the brightness of the scene as a whole. A value of 0.2 usually works pretty well.

In [None]:
def gamma_adjust(rgb_ds, bright=0.2, white_background=False):
    array = rgb_ds.reflectance.data
    gamma = math.log(bright)/math.log(np.nanmean(array)) # Create exponent for gamma scaling - can be adjusted by changing 0.2 
    scaled = np.power(np.nan_to_num(array,nan=1),np.nan_to_num(gamma,nan=1)).clip(0,1) # Apply scaling and clip to 0-1 range
    if white_background == True:
        scaled = np.nan_to_num(scaled, nan = 1) # Assign NA's to 1 so they appear white in plots
    rgb_ds.reflectance.data = scaled
    return rgb_ds
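
The exponent in `gamma_adjust` is chosen so that a pixel at the scene mean is mapped to the `bright` value, since `mean ** (log(bright) / log(mean)) == bright`. The short worked example below checks that arithmetic. The mean reflectance of 0.05 is a hypothetical value picked for illustration.

```python
import math

bright = 0.2        # target brightness, the default in gamma_adjust
mean_refl = 0.05    # hypothetical scene mean reflectance

# Same exponent formula as in gamma_adjust
gamma = math.log(bright) / math.log(mean_refl)

# A pixel at the scene mean is mapped onto the target brightness
mapped_mean = mean_refl ** gamma
# Darker pixels are lifted too, but stay below the mapped mean
mapped_dark = 0.01 ** gamma

print(round(gamma, 4), round(mapped_mean, 4), round(mapped_dark, 4))
```

For a dark scene (mean below `bright`) the exponent is less than 1, which brightens the image overall.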

In [None]:
emit_rgb = gamma_adjust(emit_rgb,white_background=True)

Now that we have an RGB dataset, we can use that to create a spatial plot, and data selected by clicking on that 'map' can be inputs for a function to return values from the full dataset at that latitude and longitude location using the cell below. To visualize the spectral and spatial data side-by-side, we use the Point Draw tool from the holoviews library.

Define a limit to the quantity of points and spectra we will plot, a list of colors to cycle through, and an initial point. Then use the input from the Tap function to provide clicked x and y positions on the map and use these to retrieve spectra from the dataset at those coordinates.

Click in the RGB image to add spectra to the plot. You can also click and hold the mouse button, then drag previously placed points. To remove a point, click and hold the mouse button down, then press the backspace key.

In [None]:
# Interactive Points Plotting
# Modified from https://github.com/auspatious/hyperspectral-notebooks/blob/main/03_EMIT_Interactive_Points.ipynb
POINT_LIMIT = 10
color_cycle = hv.Cycle('Category20')

# Create RGB Map
map = emit_rgb.hvplot.rgb(fontscale=1.5, xlabel='Longitude',ylabel='Latitude',frame_width=480, frame_height=480)

# Set up a holoviews points array to enable plotting of the clicked points
xmid = emit_ds.longitude.values[int(len(emit_ds.longitude) / 2)]
ymid = emit_ds.latitude.values[int(len(emit_ds.latitude) / 2)]

first_point = ([xmid], [ymid], [0])
points = hv.Points(first_point, vdims='id')
points_stream = hv.streams.PointDraw(
    data=points.columns(),
    source=points,
    drag=True,
    num_objects=POINT_LIMIT,
    styles={'fill_color': color_cycle.values[1:POINT_LIMIT+1], 'line_color': 'gray'}
)

posxy = hv.streams.PointerXY(source=map, x=xmid, y=ymid)
clickxy = hv.streams.Tap(source=map, x=xmid, y=ymid)

# Function to build spectral plot of clicked location to show on hover stream plot
def click_spectra(data):
    coordinates = []
    if data is None or not any(len(d) for d in data.values()):
        # Fall back to the initial point when nothing has been drawn yet
        coordinates.append((first_point[0][0], first_point[1][0]))
    else:
        coordinates = [c for c in zip(data['x'], data['y'])]
    
    plots = []
    for i, coords in enumerate(coordinates):
        x, y = coords
        data = emit_ds.sel(longitude=x, latitude=y, method="nearest")
        plots.append(
            data.hvplot.line(
                y="reflectance",
                x="wavelengths",
                color=color_cycle,
                label=f"{i}"
            )
        )
        points_stream.data["id"][i] = i
    return hv.Overlay(plots)

def hover_spectra(x,y):
    return emit_ds.sel(longitude=x,latitude=y,method='nearest').hvplot.line(y='reflectance',x='wavelengths',
                                                                            color='black', frame_width=400)
# Define the Dynamic Maps
click_dmap = hv.DynamicMap(click_spectra, streams=[points_stream])
hover_dmap = hv.DynamicMap(hover_spectra, streams=[posxy])
# Plot the Map and Dynamic Map side by side
hv.Layout(hover_dmap*click_dmap + map * points).cols(2).opts(
    hv.opts.Points(active_tools=['point_draw'], size=10, tools=['hover'], color='white', line_color='gray'),
    hv.opts.Overlay(show_legend=False, show_title=False, fontscale=1.5, frame_height=480)
)

We can take these selected points and the corresponding reflectance spectra and save them as a `.csv` for later use.

Select 10 points by adding to the figure above. We will save these and use them to calculate Equivalent Water Thickness or Canopy Water Content in the next notebook.

Build a dictionary of the selected points and spectra, then export the spectra to a .csv file.

In [None]:
data = points_stream.data
wavelengths = emit_ds.wavelengths.values

rows = [["id", "x", "y"] + [str(i) for i in wavelengths]]
 
for p in zip(data['x'], data['y'], data['id']):
    x, y, i = p
    spectra = emit_ds.sel(longitude=x, latitude=y, method="nearest").reflectance.values
    row = [i, x, y] + list(spectra)
    rows.append(row)

We've preselected 10 points, but feel free to uncomment the cell below to use your own. This will overwrite the file containing the preselected points.

In [None]:
# import csv
# with open('../data/emit_click_data.csv', 'w', newline='') as f:
#     writer = csv.writer(f)
#     writer.writerows(rows)
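
As a minimal sketch of the export step, the example below writes a header and one spectrum with Python's `csv` module. It writes to an in-memory buffer rather than a file so nothing on disk is overwritten. The wavelengths, coordinates, and reflectance values are made up.

```python
import csv
import io

# Toy header and one toy spectrum, shaped like the rows built above
toy_wavelengths = [381.0, 388.4]
toy_rows = [["id", "x", "y"] + [str(w) for w in toy_wavelengths]]
toy_rows.append([0, -105.6, 40.2, 0.031, 0.035])

# Write to an in-memory text buffer instead of a file on disk
buffer = io.StringIO()
csv.writer(buffer).writerows(toy_rows)
csv_text = buffer.getvalue()
print(csv_text)
```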

#### Cropping EMIT data to a Region of Interest

To crop our dataset to our ROI we first need to open a vector file of the region. Open the included `geojson` file for our ROI and plot it onto our EMIT 850 nm reflectance spatial plot. To ensure the plotting of the shape and EMIT scene works, be sure to specify the CRS (this is done for the image in the `map_opts` dictionary).

In [None]:
shape = gp.read_file('../../data/co_agriculture.geojson')
shape

In [None]:
emit_ds.sel(wavelengths=850, method='nearest').hvplot.image(cmap='viridis',**size_opts,**map_opts,tiles='ESRI')*shape.hvplot(color='#d95f02',alpha=0.5, crs='EPSG:4326')

Now use the `clip` function from `rioxarray` (available through the `.rio` accessor) to crop the data to our ROI using our shape's `geometry` and `crs`. The `all_touched=True` argument will ensure all pixels touched by our polygon will be included.

In [None]:
emit_cropped = emit_ds.rio.clip(shape.geometry.values,shape.crs, all_touched=True)

Plot the cropped data.

In [None]:
emit_cropped.sel(wavelengths=850,method='nearest').hvplot.image(cmap='viridis', tiles='ESRI', **size_opts, **map_opts)

#### Write an output

Lastly for our EMIT dataset, we can write a smaller output that we can use in later notebooks, to calculate Canopy water content or other applications. We use the `granule_id` from the dataset to keep a similar naming convention.

In [None]:
# Write Clipped Output
emit_cropped.to_netcdf(f'../data/{emit_cropped.granule_id}_dangermond.nc')