## Explore and analyze EMIT surface reflectance

In this notebook, we continue our EMIT agriculture tutorial using the EMIT data we previously downloaded and prepared, together with the additional cropland statistics and geospatial layers you copied into your working directory in the previous steps. Specifically, we will show you how to crop EMIT data to the spatial extent of the cropland GIS data, create new maps of spectral vegetation indices from EMIT images, identify specific crop types in EMIT data using the GIS layers, extract EMIT data for specific crop types, and compare and contrast spectral differences among the cropping systems.

We will use the image granules that you orthorectified in the previous step, **"1_Orthorectify_images.ipynb"**.

Goals for this notebook:
- Open and plot EMIT orthorectified data
- Create Spectral Vegetation Indices (SVI)
- Examine the spectral profiles of major crop types
- Examine the differences in SVIs between crop types and irrigation status

### Step 1. Setup notebook

Import packages. We have a long list here because we are bringing together a lot of different parts in this one notebook!

In [None]:
import os, sys, fnmatch
import warnings
from osgeo import gdal
import numpy as np
import math
import rasterio as rio
import xarray as xr
import holoviews as hv
import hvplot.xarray
import pandas as pd
import hvplot.pandas
import geopandas as gpd
import rioxarray as rxr
from matplotlib import pyplot as plt
import seaborn as sns
import folium
from branca.element import Figure
from shapely.geometry import Point
from shapely.geometry import mapping

# Get custom functions
sys.path.append(os.path.join(os.path.expanduser("~"),"HYR-SENSE","tools","functions"))
from spectral_index import *

# Setup paths for notebook
datadir = os.path.join(os.path.expanduser('~'),'HYR-SENSE/data/Agriculture/')
figdir = os.path.join(os.path.expanduser('~'),'HYR-SENSE/data/Agriculture/output')
source_file_path = os.path.join(os.path.expanduser("~"),"HYR-SENSE","data","Agriculture","emit")

# Save figures?
savefigs = True # True/False

# Define some projections to use in the notebook
albers = 'EPSG:5070' # Albers Equal Area CONUS
llwgs84 = 'EPSG:4326' # Geographic - Latitude/Longitude WGS84

# This will ignore some warnings caused by holoviews
warnings.simplefilter('ignore')

In [None]:
# Prep output of saving figures
if savefigs:
    if not os.path.exists(figdir):
        os.makedirs(figdir)
        
print(" ")
print("****** Figure Output Folder ******")
print(figdir)
print(" ")
print(" ")

### Step 2. Find all downloaded and orthorectified agricultural images

In [None]:
# Define workflow which selects the appropriate image data folder
print("*** EMIT data folder: " + source_file_path)
print("")
print("*** GIS data folder: " + datadir)
print("")

In [None]:
### List all of the available data located in the EMIT data folder
granules = fnmatch.filter(os.listdir(source_file_path), '*ortho.nc')
print("*** EMIT Data Granules ***")
granules

### Step 3. Select and load a previously orthorectified EMIT image

First, let's remind ourselves where we currently have EMIT data ready to be analyzed. We can do this by re-plotting the EMIT bounding boxes that we saved in the data search and download step.

For this notebook, we will use just one of the granules which is centered over Yuma, CO.

In [None]:
### Load the EMIT bounding boxes and plot on the map
gdf_all = gpd.read_file(os.path.join(datadir,'emit_granule_footprints.gpkg'))
gdf_all = gdf_all.to_crs(albers)

# Filter to the Yuma, CO granule
gdf_yuma = gdf_all[gdf_all['meta.native-id'] == 'EMIT_L2A_RFL_001_20230729T205630_2321014_019']
gdf_yuma = gdf_yuma.to_crs(albers)

In [None]:
print("*** EMIT granule bounding boxes ***")
gdf_all

Let's review where we have already downloaded and prepared EMIT datasets

In [None]:
# Create the interactive map using folium
fig = Figure(width="750px", height="375px")
map1 = folium.Map(tiles='https://mt1.google.com/vt/lyrs=y&x={x}&y={y}&z={z}', attr='Google')
fig.add_child(map1)

gdf_all.explore(
    "meta.native-id",
    categorical=True,
    tooltip=[
        "meta.native-id",
        "start_datetime",
    ],
    popup=True,
    style_kwds=dict(fillOpacity=0.1, width=2),
    name="EMIT",
    m=map1,
    legend=False
)

map1.fit_bounds(map1.get_bounds(), padding=(30, 30))
display(fig)

For now, let's select a single scene ID to explore the data more closely. We will choose the scene that covers Yuma, Colorado, shown in blue below.

In [None]:
### Show just the Yuma CO scene
gdf_yuma.explore(
    "meta.native-id",
    categorical=True,
    tooltip=[
        "meta.native-id",
        "start_datetime",
    ],
    popup=True,
    style_kwds=dict(fillOpacity=0.91, width=5),
    name="EMIT",
    m=map1,
    legend=False
)
map1.fit_bounds(map1.get_bounds(), padding=(30, 30))
display(fig)

In [None]:
# Pick an image granule to explore - we will start with EMIT_L2A_RFL_001_20230729T205630_2321014_019_ortho.nc
# Load the selected image to memory
img_file = 'EMIT_L2A_RFL_001_20230729T205630_2321014_019_ortho.nc'
img_file_dat = os.path.join(source_file_path,img_file)
ds_geo = xr.open_dataset(img_file_dat)
ds_geo

### Step 4. Quickly display the selected orthorectified image

Here we will view the selected orthorectified image that contains Yuma, Colorado, marked below with the yellow dot.

In [None]:
# Selecting the 850 nm wavelength for display
refl850 = ds_geo.sel(wavelengths=850, method='nearest')
yuma_df = [[40.1222,-102.7252]]
yuma_df = pd.DataFrame(yuma_df, columns=['Latitude', 'Longitude'])

# Plot the EMIT band
img_plot = ds_geo.sel(wavelengths=850, method='nearest').hvplot.image(cmap='Viridis', geo=True, tiles='ESRI', alpha=0.8, frame_height=600).opts(
    title=f"Reflectance at {refl850.wavelengths.values:.3f} {refl850.wavelengths.units} (Orthorectified)")
pt_plot = yuma_df.hvplot.points("Longitude", "Latitude", geo=True, color="yellow", alpha=0.9, s=250, global_extent=False)
plots = img_plot * pt_plot
plots

We can see that the orthorectification step placed the data on a geographic grid that matches pretty well with the ESRI tiles. Now that we have a better idea of what the target area looks like, we can also plot the spectra using the georeferenced data.

We can also display three bands side-by-side, covering the visible through the near-infrared, to look at the differences in surface reflectance at three different wavelengths (EMIT band centers).

In [None]:
# Select the bands nearest 550 nm and 650 nm; refl850 was defined in the previous step
refl550 = ds_geo.sel(wavelengths=550, method='nearest')
refl650 = ds_geo.sel(wavelengths=650, method='nearest')
# create the side by side plot
plots = refl550.hvplot.image(cmap='viridis', aspect = 'equal', frame_width=300).opts(title="Band: 550") + \
refl650.hvplot.image(cmap='viridis', aspect = 'equal', frame_width=300).opts(title="Band: 650") + \
refl850.hvplot.image(cmap='viridis', aspect = 'equal', frame_width=300).opts(title="Band: 850")
plots

In [None]:
# Clear up some memory before we continue
del refl550, refl650, refl850

### Step 5. Plot example spectra

Now let's plot some example spectra found in the image. Before we do this, we should filter out the water absorption bands like we did earlier. We do this by setting the reflectance to NaN along the third (wavelength) dimension of the array wherever good_wavelengths is 0.
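The masking idea can be tried out on a tiny synthetic cube first. This is only a minimal sketch: the array `cube` and the flag vector `good_bands` are made-up stand-ins for the EMIT reflectance array and its `good_wavelengths` flags, and the values are arbitrary.

```python
import numpy as np

# Hypothetical 2 x 2 pixel cube with 5 bands, ordered (rows, cols, bands)
cube = np.full((2, 2, 5), 0.3)

# Hypothetical band flags: 1 = usable band, 0 = water absorption band
good_bands = np.array([1, 1, 0, 0, 1])

# Set every flagged band to NaN for all pixels at once
cube[:, :, good_bands == 0] = np.nan

# Count the bands that remain usable for the first pixel
n_valid = int(np.sum(~np.isnan(cube[0, 0, :])))
print(n_valid)
```

Because the boolean index is applied to the last axis only, every pixel loses the same bands, which is what keeps the plotted spectra comparable.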

In [None]:
ds_geo['reflectance'].data[:,:,ds_geo['good_wavelengths'].data==0] = np.nan
print(f"Data type: {type(ds_geo)}; Shape: {ds_geo.rio.shape}")

Now let's select some random points from within the scene covering different crop types and crop stages from center-pivot irrigation (A), rainfed (B), and bare (C)

In [None]:
point1 = ds_geo.sel(longitude=-102.694,latitude=40.347,method='nearest')
point2 = ds_geo.sel(longitude=-102.957,latitude=40.160,method='nearest')
point3 = ds_geo.sel(longitude=-102.516,latitude=40.428,method='nearest')
spectra_plots = point1.hvplot.line(y='reflectance',x='wavelengths', color='black', frame_height=400, frame_width=440).opts(
    title = f'A) Latitude = {point1.latitude.values.round(3)}, Longitude = {point1.longitude.values.round(3)}') + \
point2.hvplot.line(y='reflectance',x='wavelengths', color='black', frame_height=400, frame_width=440).opts(
    title = f'B) Latitude = {point2.latitude.values.round(3)}, Longitude = {point2.longitude.values.round(3)}') + \
point3.hvplot.line(y='reflectance',x='wavelengths', color='black', frame_height=400, frame_width=440).opts(
    title = f'C) Latitude = {point3.latitude.values.round(3)}, Longitude = {point3.longitude.values.round(3)}') 
spectra_plots

In [None]:
# Clear up the memory before we continue
del point1, point2, point3

Based on what you see above, what can you take away from these different spectral plots about the status, vigor, and cover of different crops from these three different agricultural fields?

**A)** ![Group2_proximal_ASD_reflectance_12.png](../../images/Group2_proximal_ASD_reflectance_12.png) **B)** ![Group1_proximal_ASD_reflectance_11.png](../../images/Group1_proximal_ASD_reflectance_11.png)

We can also quickly compare the "proximal" visible through shortwave infrared (VSWIR) spectra you collected on Monday with the ASD to the example spectra from EMIT. Proximal spectra refers to spectra collected close to, but not in contact with, the surface/materials of interest, such as the soil or plant canopy. You can see that despite the EMIT spectra coming from agricultural fields, and the proximal data collected at the CU Boulder "Confluence", clear similarities can be seen in the shapes of the image spectra shown in the EMIT panels B and C. This is because the data from the CU Boulder Confluence represented mixed grasses with different fractions of dry and live grasses as well as exposed soil and the EMIT spectra panels B and C are representative of ag fields with lower vegetation cover (B) and fields with mostly exposed soil and dry plant materials (EMIT panel C).

### Step 6. Spectral vegetation indices

Before we proceed with our analysis of the spectral differences and characteristics of different crop types in the Yuma, CO region let's first discuss what we can learn about plant health and status using high spectral resolution data like EMIT. We will review what information is contained in the spectral domain as well as how we can use spectral vegetation indices to tease out subtle differences in plant pigments, water content and physiology using just the raw spectral data provided by EMIT.

For more examples of how to calculate spectral vegetation indices with EMIT, you can review the "How To" notebook **"Calculate_spectral_vegetation_indices.ipynb"** located in the HYR-SENSE repo [here](https://github.com/CU-ESIIL/HYR-SENSE/blob/main/notebooks/how_to/Calculate_spectral_vegetation_indices.ipynb).  In this notebook we will proceed with our analysis using the functions provided in the functions file [spectral_index.py](https://github.com/CU-ESIIL/HYR-SENSE/blob/main/tools/functions/spectral_index.py) and use these SVIs later to compare and contrast the water, pigment, and physiological differences of our select crop types.

First, let's take a moment to consider again what remote sensing imagery like that provided by EMIT can tell us about the underlying vegetation. The internal structure and biochemistry of leaves (A) within a canopy control the optical signatures observed by remote sensing instrumentation (B). The amount of incident radiation that is reflected by, transmitted through, or absorbed by leaves within a canopy is regulated by these structural and biochemical properties. For example, leaf properties such as a thick cuticle layer, high wax content, and/or a large amount of leaf hairs can significantly influence the amount of first-surface reflectance (that is, light reflected directly off the outer leaf layer without interacting with the leaf interior), causing less solar radiation to penetrate into the leaf. A thicker mesophyll layer, together with related properties such as thicker leaves, can cause a higher degree of internal leaf scattering, less transmittance through the leaf, and higher absorption in some wavelengths. Importantly, the diffuse reflectance out of the leaf is modified by internal leaf properties and contains information useful for mapping functional traits (B). High spectral resolution measurements of leaves and plant canopies enable the indirect, non-contact measurement of key structural and chemical absorption features that are associated with the physiological and biochemical properties of plants (B).

**A)** ![leaf_anatomy.jpg](../../images/leaf_anatomy_figure.jpg) **B)** ![spectral_signatures.jpg](../../images/spectral_signatures.jpg)

We can make use of all of the information contained within the emergent spectral signatures provided by vegetation. We can do this by directly utilizing the spectral profiles of a leaf or an EMIT pixel, or we can instead target specific wavebands provided by data like EMIT to calculate a spectral vegetation index (SVI). SVIs range widely in their wavelengths/bands, structure, and applications. This article provides some background information ([https://www.nature.com/articles/s41597-023-02096-0](https://www.nature.com/articles/s41597-023-02096-0)), and you can find a comprehensive list of SVIs [here](https://www.indexdatabase.de/).


For more information, you can review these select articles and resources that discuss how leaf and canopy structure, leaf chemical properties, and stress can alter the spectral signatures we see with remote sensing data like those provided by EMIT and how SVIs provide us a way to easily probe the properties of vegetation remotely.

[Sources of variability in canopy reflectance and the convergent properties of plants](https://doi.org/10.1111/j.1469-8137.2010.03536.x)

[Retrieval of foliar information about plant pigment systems from high resolution spectroscopy](https://doi.org/10.1016/j.rse.2008.10.019)

[Scaling Functional Traits from Leaves to Canopies](https://link.springer.com/chapter/10.1007/978-3-030-33157-3_3)

#### Calculate NDVI

We can start by calculating a very simple, yet powerful, and widely-used SVI, the normalized difference vegetation index (NDVI). NDVI has been used for over 40 years to study changes on the Earth's surface, specifically related to vegetation, stress, and agriculture. For more information on NDVI, you can explore this article from NASA: [https://earthobservatory.nasa.gov/features/MeasuringVegetation/measuring_vegetation_1.php](https://earthobservatory.nasa.gov/features/MeasuringVegetation/measuring_vegetation_1.php)

To calculate NDVI, we need to select which bands we want to include in the calculation. In general, NDVI is defined using a red and a near-infrared band, so let's use a band centered at 650 nm and another at 850 nm, which sit squarely within the red and NIR wavelength ranges shown in the example spectral graph above.

The basic structure of the NDVI is: NDVI = (NIR−Red)/(NIR + Red)
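To make the formula concrete, here is a small worked example on two made-up pixels. It is a sketch only: `normalized_difference` is a hypothetical helper written for this illustration, not the `normalized_diff` function from spectral_index.py, and the reflectance values are invented.

```python
import numpy as np

def normalized_difference(nir, red):
    """Return (nir - red) / (nir + red) elementwise, NaN where the sum is 0."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    total = nir + red
    out = np.full(total.shape, np.nan)
    # Only divide where the denominator is non-zero
    np.divide(nir - red, total, out=out, where=total != 0)
    return out

# Hypothetical reflectance for a dense canopy pixel and a sparse/soil pixel
red = np.array([0.05, 0.20])
nir = np.array([0.45, 0.25])

ndvi_example = normalized_difference(nir, red)
print(ndvi_example)
```

The first pixel works out to (0.45 - 0.05) / (0.45 + 0.05) = 0.8, while the second gives 0.05 / 0.45, roughly 0.11, which shows how strongly the index separates green canopy from sparse cover.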

In [None]:
### Let's calculate the NDVI using the provided normalized_diff function
ndvi = normalized_diff(input_xarray = ds_geo, band1=650, band2=850, index_name='ndvi', proj=llwgs84)
ndvi.hvplot.image(cmap='viridis', geo=True, tiles='ESRI', aspect = 'equal', frame_width=720, clim=(0,1)).opts(title="NDVI Image")

Above is the NDVI map generated from the Yuma, CO scene. What new information or details do you notice in this NDVI map?

In [None]:
### Now let's calculate 2 more SVIs and show the three SVIs side by side for comparison

# We can calculate the Normalized Difference Water Index that is designed to capture subtle variations in canopy water content
ndwi = normalized_diff(input_xarray = ds_geo, band1=2200, band2=864, index_name='ndwi', proj=llwgs84)

# We can also calculate the Red Edge NDVI, which is highly sensitive to subtle variations in plant photosynthetic pigment content
reNDVI = normalized_diff(input_xarray = ds_geo, band1=705, band2=750, index_name='reNDVI', proj=llwgs84)

# Show the three plots side by side
ndvi.hvplot.image(cmap='viridis', geo=True, aspect = 'equal', frame_width=300, clim=(0,1)).opts(title="NDVI Image") + \
ndwi.hvplot.image(cmap='viridis', geo=True, aspect = 'equal', frame_width=300, clim=(0,1)).opts(title="NDWI Image") + \
reNDVI.hvplot.image(cmap='viridis', geo=True, aspect = 'equal', frame_width=300, clim=(0,1)).opts(title="reNDVI Image")

Based on what you see above, what are the similarities and differences across the three normalized-difference SVIs: NDVI, NDWI, and the reNDVI?  Are there particular areas that show strong similarities? Are there observable differences?

We will further explore the similarities and differences in crop condition and physiology using raw EMIT spectra and SVIs below.

### Step 7. Spectra of major crop types

Now that we have loaded and prepared our EMIT surface reflectance data, we can start to examine the spectral differences among crop types in the granule footprint. To do this we will use the **National Agricultural Statistics Service (NASS) Cropland Data Layer (CDL)** which provides estimated crop types with a 30-meter spatial resolution based on agricultural surveys and remote sensing data from the Landsat mission. We have pre-loaded a GeoTIFF of the NASS CDL for the South Platte and Republican river basins (recall in the first notebook when we plotted that data).

Using this data, we can identify the most common crop types within the area of our EMIT granule. We can also generate random samples of different crop types to compare the spectral differences between them. 

You can read more about the NASS CDL: https://www.nass.usda.gov/Research_and_Science/Cropland/SARS1a.php

#### Step 7a. Load, crop, and mask the NASS CDL GeoTIFF

The first step is to align our CDL raster with the EMIT granule prior to sampling the pixel values of different crop types. In the code below, we extract one band from the EMIT spectra to use as a reference array when matching the CDL data.
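Selecting one band "nearest" to a target wavelength amounts to finding the band center with the smallest absolute difference. A minimal sketch of that idea follows; the band centers listed are hypothetical values chosen for illustration, not the real EMIT band list.

```python
import numpy as np

# Hypothetical band centers in nm (illustrative only)
wavelengths_nm = np.array([545.2, 552.7, 845.9, 853.4, 860.8])
target_nm = 850

# Index of the band center closest to the target wavelength
band_idx = int(np.argmin(np.abs(wavelengths_nm - target_nm)))
nearest_nm = float(wavelengths_nm[band_idx])
print(band_idx, nearest_nm)
```

This is why the plot titles elsewhere in the notebook report the actual band center rather than the requested wavelength: the two rarely coincide exactly.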

In [None]:
# Select a wavelength from the EMIT data to work with
# Define the desired wavelength (e.g., 550 or 850 nm)
wavelength = 850
# Select the data at the specific wavelength (closest to desired_wavelength)
emit_band = ds_geo['reflectance'].sel(wavelengths=wavelength, method='nearest')

# Extract the extent of the EMIT granule
emit_extent = emit_band.rio.bounds()
print(f"EMIT Granule Extent: {emit_extent}")

In [None]:
### Load the Cropland Data Layer (CDL) for our region
## Crop to the extent of our EMIT granule
## Mask no-data values to match the EMIT granule

# Load the CDL raster
cdl_path = os.path.join(datadir,'CDL_2023_CO_SouthPlatte_Republican.tif')
cdl = rxr.open_rasterio(cdl_path, masked=True, cache=False).squeeze().astype(rio.uint8)
cdl = cdl.rio.reproject("EPSG:4326")

# Clip the CDL raster to the EMIT dataset's extent
cdl_clip = cdl.rio.clip_box(*emit_extent)

# Print the unique pixel values
print(np.unique(cdl_clip.values))

# Clear up memory
del cdl



We successfully cropped the CDL to the extent of our EMIT band. However, we also want to mask the CDL so that it retains only the pixels that overlap non-null pixels in the EMIT band. To do this, we can create a null data mask from the EMIT band and apply it to the CDL.
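The masking logic is easiest to see on two small arrays that already share a grid. The sketch below assumes exactly that (the real data first need to be matched to the same grid), and both arrays are invented for the example.

```python
import numpy as np

# Hypothetical EMIT band: NaN marks pixels outside the valid swath
emit = np.array([[0.31, np.nan],
                 [0.28, 0.40]])

# Hypothetical crop codes on the same 2 x 2 grid
crop_codes = np.array([[1.0, 24.0],
                       [29.0, 1.0]])

# Keep a crop code only where the EMIT pixel is valid; write NaN elsewhere
crop_masked = np.where(np.isnan(emit), np.nan, crop_codes)
print(crop_masked)
```

Note that the masked array has to be floating point, since NaN cannot be stored in an integer array.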

In [None]:
# Create a mask of the NaN values from the selected wavelength band
nan_mask = emit_band.isnull().squeeze()
nan_mask = nan_mask.rio.write_crs("EPSG:4326") # Set the CRS

# Remove additional coordinates that are not relevant
nan_mask = nan_mask.drop_vars(['wavelengths', 'fwhm', 'good_wavelengths', 'elev'], errors='ignore')
# Convert the boolean mask to a numerical mask (e.g., 0 for False, 1 for True)
nan_mask_int = nan_mask.astype(np.uint8)

# Reproject match to the CDL data array
nan_mask_repr = nan_mask_int.rio.reproject_match(cdl_clip)
print(nan_mask_repr)

# Clear up memory
del nan_mask, nan_mask_int

# Now mask the CDL to retain only data within the same bounds as the non-null EMIT data
cdl_masked = cdl_clip.where(nan_mask_repr == 0)
cdl_masked = cdl_masked.rio.write_crs("EPSG:4326") # Set the CRS

# Clear up memory
del cdl_clip


Now we should have CDL data that matches the extent and valid data for our EMIT granule. To check on what we just did, we can plot side-by-side maps of each step.

In [None]:
emit_band.hvplot.image(
    geo=True, cmap='viridis', tiles='ESRI', 
    aspect='equal', frame_width=300).opts(title=f'Wavelength: {wavelength}') + \
cdl_masked.hvplot.image(
    geo=True, cmap='tab20', tiles='ESRI', 
    aspect='equal', frame_width=300).opts(title="CDL Crop Types")

#### Step 7b. Create sample points for the three major crop types

Now that we have our CDL data prepared, we can identify the major crop types within the EMIT granule. We'll identify the three most common crops and then take a random sample of lat/long coordinates which represent pixel centers.
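Ranking classes by pixel count can be sketched with `np.unique` on a handful of made-up codes. The values below are hypothetical; 0 and 255 stand in for background/NoData codes that should be ignored.

```python
import numpy as np

# Hypothetical flattened crop-code pixels, including background codes 0 and 255
codes = np.array([1, 24, 24, 0, 29, 24, 1, 255, 24, 1])

values, counts = np.unique(codes, return_counts=True)

# Drop the background codes before ranking
keep = (values != 0) & (values != 255)
values, counts = values[keep], counts[keep]

# Sort from most to least frequent
order = np.argsort(counts)[::-1]
top_codes = values[order].tolist()
top_counts = counts[order].tolist()
print(top_codes, top_counts)
```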

In [None]:
# Get the most common pixel values by counting the frequency
unique, counts = np.unique(cdl_masked.data, return_counts=True)
# Handle NoData value and background codes (0,255)
valid = ~np.isnan(unique) & (unique != 0) & (unique != 255)
unique = unique[valid]
counts = counts[valid]

# Create a dictionary of frequencies from the valid codes only
freq = dict(zip(unique, counts))
# Print the top 10, separating the codes and the counts
top10_crops = sorted(freq, key=freq.get, reverse=True)[:10]
top10_counts = [freq[key] for key in top10_crops]
print("Top 10 most common crop types (by code)")
print(top10_crops)

# Convert the counts to a DataFrame
counts_df = pd.DataFrame({'Codes': top10_crops, 'Count': top10_counts})

print("Counts DataFrame:")
counts_df

Above you can see the counts by CDL crop class for the 10 most common classes sorted by total counts (excluding null values). However, these codes themselves are not very useful to us. Thankfully, the CDL data also has a lookup table to link these codes to their associated class name. We can load this lookup table from the data store and attach the names to our samples.
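A left join keeps every row of the counts table, in its original order, and attaches a name wherever the code appears in the lookup table. The sketch below uses small made-up tables; the column names `Codes` and `Class_Names` are assumed to match the lookup file used in this notebook, and the counts are invented.

```python
import pandas as pd

# Hypothetical counts table, already sorted by frequency
counts_example = pd.DataFrame({'Codes': [24, 1, 29], 'Count': [4, 3, 1]})

# Hypothetical three-row excerpt of a code-to-name lookup table
lookup_example = pd.DataFrame({
    'Codes': [1, 24, 29],
    'Class_Names': ['Corn', 'Winter Wheat', 'Millet'],
})

# how='left' preserves the row order of counts_example
named = counts_example.merge(lookup_example, on='Codes', how='left')
print(named)
```

If a code were missing from the lookup table, its `Class_Names` entry would come back as NaN rather than dropping the row, which makes gaps in the lookup easy to spot.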

In [None]:
# Load the crop type lookup table
lookup = pd.read_csv(os.path.join(datadir,'CDL_codes.csv'))

# Join the counts DataFrame with the lookup table
result_df = counts_df.merge(lookup, on='Codes', how='left')

print("Top ten most common crop types and their pixel values:")
print(result_df)

Now let's select three crop classes to analyze below. We will use the three most common crop types (excluding natural vegetation classes): **Winter Wheat**, **Corn**, and **Millet**. In the code below, we generate random samples from these three major crop types using the CDL raster data.
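The sampling step combines two ideas: drawing random pixels of one class without replacement, and converting their row/column indices to pixel-center coordinates. Here is a minimal sketch on an invented grid, assuming a simple north-up raster described by its upper-left corner and a square pixel size (all values hypothetical).

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed so the sample is repeatable

# Hypothetical 4 x 5 grid of crop codes
grid = np.array([[1, 1, 24, 24, 29],
                 [1, 24, 24, 29, 29],
                 [1, 1, 24, 24, 29],
                 [0, 1, 1, 24, 29]])

# Hypothetical north-up grid: upper-left corner and pixel size in degrees
x_min, y_max, pixel = -103.0, 40.5, 0.001

# Row/column indices of every pixel belonging to one class
rows, cols = np.where(grid == 24)

# Draw 3 of those pixels without replacement
pick = rng.choice(rows.size, size=3, replace=False)

# Pixel centers sit half a pixel in from the upper-left corner of each cell
lon = x_min + (cols[pick] + 0.5) * pixel
lat = y_max - (rows[pick] + 0.5) * pixel
print(np.column_stack([lat, lon]))
```

Sampling without replacement fails if a class has fewer pixels than the requested sample size, so it is worth checking the class counts before choosing that number.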

In [None]:
# Select the CDL codes for the three desired crop types
major_crops = [24,1,29]
num_samples = 70  # the number of samples per crop type, probably don't want to exceed 100

# Ensure CDL raster is in geographic coordinates
cdl_masked = cdl_masked.rio.reproject("EPSG:4326")
transform = cdl_masked.rio.transform()

# Generate random points within each of the three classes
points = []
for code in major_crops:
    # Create a mask for the current crop type
    mask = cdl_masked == code
    
    # Get the coordinates of the pixels matching the current crop type
    coords = np.column_stack(np.where(mask.values))
    # Take a random sample of num_samples pixels
    coords = coords[np.random.choice(coords.shape[0], num_samples, replace=False)]
    
    # Convert pixel coordinates to geographic coordinates
    lon, lat = rio.transform.xy(transform, coords[:, 0], coords[:, 1], offset='center')
    
    # Append the results to the list
    for x, y in zip(lon, lat):
        points.append({
            'Latitude': y,
            'Longitude': x,
            'Codes': code
        })

# Create a DataFrame from the results
samples_df = pd.DataFrame(points)
samples_df = pd.merge(samples_df,lookup,on="Codes")

# Create a unique ID column which will be useful as we work through the analysis
samples_df['id'] = samples_df.index.astype(str) + "_" + samples_df['Codes'].astype(str) # add a unique ID column

# Clear up memory
del mask, coords, lon, lat

samples_df.head()

#### Step 7c. Plot average spectra for the three major crop types

Now that we have a random sample from the major crop types in our granule, we can plot the spectra at these points to examine any differences between the crop types.

First, let's plot the sample points on top of the EMIT band we selected to make sure they line up as expected.

In [None]:
# Plot the EMIT data (just select one band by its wavelength)
emit_plot = emit_band.hvplot.image(
    cmap='greys',
    frame_height=500,
    frame_width=500,
    geo=True,
    crs='EPSG:4326'
).opts(title="Major Crop Type Samples")

# Plot the sample points
points_plot = samples_df.hvplot.points(
    x='Longitude',
    y='Latitude',
    by='Class_Names',
    color=hv.Cycle('Dark2'),
    geo=True,
    crs='EPSG:4326',
    size=150
)

# Combine the plots
combined_plot = emit_plot * points_plot
combined_plot

Looking good! We now have a random sample of pixels for the three different crop types. Using these samples, we can now extract the EMIT spectra and plot the results. Note that we are going to extract data from "ds_geo" which contains the entire EMIT spectra.

In [None]:
# Setup the samples array to extract EMIT spectra
samples = samples_df.set_index(['id'])
xp = samples.to_xarray()
xp

In [None]:
# Extract EMIT reflectance spectra at our sample points
extracted = ds_geo.sel(latitude=xp.Latitude,longitude=xp.Longitude, method='nearest').to_dataframe()
extracted = extracted.reset_index()
extracted = pd.merge(extracted,samples_df,on="id")
extracted

Okay now we have a sample from the EMIT data. Let's calculate the mean reflectance by class and plot.
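Averaging by class and wavelength is a grouped mean over a long-format table. A minimal sketch with made-up reflectance values, using the same column names as the extracted table in this notebook:

```python
import pandas as pd

# Hypothetical long-format spectra: two samples per class at two wavelengths
spectra = pd.DataFrame({
    'Class_Names': ['Corn', 'Corn', 'Corn', 'Corn',
                    'Millet', 'Millet', 'Millet', 'Millet'],
    'wavelengths': [650, 650, 850, 850, 650, 650, 850, 850],
    'reflectance': [0.04, 0.06, 0.50, 0.54, 0.10, 0.14, 0.30, 0.34],
})

# One mean value per (class, wavelength) pair
mean_by_class = (
    spectra.groupby(['Class_Names', 'wavelengths'])['reflectance']
    .mean()
    .reset_index()
)
print(mean_by_class)
```

By default the grouped mean skips NaN values, so bands that were masked in some samples are averaged over the remaining samples, and bands masked in every sample stay NaN.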

In [None]:
mean_reflectance = extracted.groupby(['Class_Names', 'wavelengths'])['reflectance'].mean().reset_index()
mean_refl_plot = mean_reflectance.hvplot(x='wavelengths',y='reflectance', by=['Class_Names'], color=hv.Cycle('Dark2'), 
                                         frame_height=500, frame_width=700).opts(
    title='Spectral Profile for Major Crop Types', xlabel='Wavelengths (nm)',ylabel='Reflectance (0-1)')
mean_refl_plot

What do you notice about the similarities and differences in the EMIT spectral profiles between the three different crop types? What might be creating these differences in mean spectra? Is there a seasonal component to the differences? Considering the spectral plot at the beginning that shows where in the spectra we can extract important physiological and biochemical information, what do these spectra tell us about these different crops? What else would we need to know in order to tease out important differences?

In [None]:
# Save the graphic to your local scratch folder for later review
if savefigs:
    plt.figure(figsize=(10, 6))
    mean_reflectance2 = mean_reflectance.set_index('wavelengths', inplace=False)
    mean_reflectance2.groupby('Class_Names')['reflectance'].plot(kind='line', 
                                                                 linewidth=3, 
                                                                 figsize=(11, 8), 
                                                                 fontsize=14, 
                                                                 legend=True, 
                                                                 xlabel="Wavelength (nm)", 
                                                                 ylabel="Reflectance (0-1)")
    plt.savefig(os.path.join(figdir,'emit_mean_spectra_by_croptype.pdf'))
    plt.savefig(os.path.join(figdir,'emit_mean_spectra_by_croptype.png'))

### Step 8. Exploring Irrigated vs. Non-irrigated Fields

Up to this point, we have successfully used the CDL data layer to create a sample of points for each of the three most common crop types and plotted their EMIT spectral profiles. The next step is to examine the differences between irrigated and non-irrigated fields using one of the SVIs we created above: the Normalized Difference Water Index (NDWI). 

We will use estimated irrigated lands from the LANID dataset which provides irrigated / non-irrigated land at 30-meter resolution (REF). To compare irrigated and non-irrigated fields, we will first create a data stack of the EMIT-based NDWI and the LANID irrigated lands (cropped to our EMIT data) and then extract values at our sample points for the three major crop types.

- LANID dataset: https://zenodo.org/records/5548555
- Research article describing the product: https://essd.copernicus.org/articles/13/5689/2021/

#### 8a. Prepare the LANID 30-meter gridded product

As we did with the CDL, we need to crop and mask the LANID data to the extent of the EMIT granule.

In [None]:
# Load the LANID data
lanid_path = os.path.join(datadir,'LANID_Irrigation_CO_SouthPlatte_Republican.tif')
lanid = rxr.open_rasterio(lanid_path, masked=True).squeeze() # open the GeoTIFF
lanid = lanid.rio.reproject("EPSG:4326") # geographic coordinates to match the EMIT data

# Clip the LANID raster to the EMIT dataset's extent
lanid_clipped = lanid.rio.clip_box(*emit_extent)

# Now mask the LANID to retain only data within the same bounds as the non-null EMIT data
# We did this for the CDL as well
lanid_masked = lanid_clipped.where(nan_mask_repr == 0)
lanid_masked = lanid_masked.rio.write_crs("EPSG:4326") # Set the CRS
print(lanid_masked)

del lanid_clipped, lanid

In [None]:
# plot the irrigated lands mask with our sample points
lanid_plot = lanid_masked.hvplot.image(
    geo=True, tiles='ESRI', 
    aspect='equal', frame_width=500).opts(title="Irrigated Lands")

# Plot the sample points
points_plot = samples_df.hvplot.points(
    x='Longitude',
    y='Latitude',
    by='Class_Names',
    color=hv.Cycle('Dark2'),
    geo=True,
    crs='EPSG:4326',
    size=100
)

# Combine the plots
combined_plot = lanid_plot * points_plot
combined_plot

#### 8b. Prepare the NDWI array

Now we need to make sure the NDWI array is ready for analysis by assigning geographic coordinates and matching the format of the irrigated land GIS layer. We do this so that we can stack the NDWI and irrigated lands together, extract data from the NDWI image by crop type, and record whether each field is irrigated in the output dataframe. This then allows us to compare the NDWI values for each crop type, binned into irrigated and non-irrigated classes.

In [None]:
# Prep the NDWI array - match format of lanid_masked so we can layer stack
ndwi_da = ndwi['ndwi'].squeeze()
ndwi_da_repr = ndwi_da.rio.reproject("EPSG:4326")

#### 8c. Create a data stack

Now we can stack our NDWI and LANID data arrays together. This enables us to perform sampling and other analysis on each of the different data arrays. We could also extract other SVIs to include in the stack if we wanted to.

In [None]:
# Ensure the LANID array matches the NDWI
lanid_masked_repr = lanid_masked.rio.reproject_match(ndwi_da_repr)

# Create a data stack
ds_stack = xr.Dataset({
    'ndwi': ndwi_da_repr,
    'irrigation': lanid_masked_repr
})
print(ds_stack)

#### 8d. Extract the pixel values to our sample points

As we did with the EMIT spectra, we will now extract the pixel values at our sample points for NDWI and LANID.
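Nearest-neighbor extraction at a point means finding the closest grid row and column independently and reading the value there. A minimal sketch on an invented grid (the coordinates and values are hypothetical, and `sample_nearest` is a helper written only for this illustration):

```python
import numpy as np

# Hypothetical pixel-center coordinates and a 3 x 4 array of values
y = np.array([40.3, 40.2, 40.1])               # latitude, north to south
x = np.array([-102.9, -102.8, -102.7, -102.6])  # longitude, west to east
values = np.arange(12, dtype=float).reshape(3, 4)

def sample_nearest(lat, lon):
    """Return the grid value whose pixel center is closest to (lat, lon)."""
    row = int(np.argmin(np.abs(y - lat)))
    col = int(np.argmin(np.abs(x - lon)))
    return values[row, col]

sampled = sample_nearest(40.19, -102.72)
print(sampled)
```

A nearest-neighbor lookup always returns a value, even for a point far outside the grid, so it is worth confirming that the sample points fall within the raster extent.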

In [None]:
# Extract data values at the sample points
extracted = ds_stack.sel(
    y=xp['Latitude'], x=xp['Longitude'], method='nearest'
).to_dataframe().reset_index()

# Merge the results
extracted = extracted[['id','ndwi','irrigation']]
extracted = pd.merge(extracted, samples_df, on="id")
extracted.head()

#### 8e. Examine the differences between crop types and irrigation status

Now we have our data frame with pixel values for our three crop types. In the code chunk below, we create a boxplot showing the differences in NDWI between irrigation status and amongst the different crop types.
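Before plotting, it can help to see the labeling rule and a per-group summary on a few made-up numbers. The sketch below assumes a value of 1 means irrigated, as in the labeling rule used in this notebook; the NDWI and irrigation values are invented.

```python
import numpy as np

# Hypothetical NDWI samples and irrigation flags (1 = irrigated)
ndwi_vals = np.array([0.42, 0.38, 0.12, 0.18, 0.40])
irrigation = np.array([1, 1, 0, np.nan, 1])

# Anything that is not exactly 1 is labeled Non-Irrigated, including NaN
labels = np.where(irrigation == 1, 'Irrigated', 'Non-Irrigated')

median_irrigated = float(np.median(ndwi_vals[labels == 'Irrigated']))
median_other = float(np.median(ndwi_vals[labels == 'Non-Irrigated']))
print(median_irrigated, median_other)
```

One consequence of this rule is that pixels with a missing irrigation value are grouped with the non-irrigated fields, which is worth keeping in mind when reading the boxplot.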

In [None]:
# Convert irrigation values to categorical
extracted['Irrigation'] = extracted['irrigation'].apply(lambda x: 'Irrigated' if x == 1 else 'Non-Irrigated')

# Create the box plot
plt.figure(figsize=(8, 6))
sns.boxplot(data=extracted, x='Class_Names', y='ndwi', hue='Irrigation',
           palette={'Irrigated': '#ADD8E6', 'Non-Irrigated': '#D2B48C'})
plt.title('NDWI for Irrigated vs Non-Irrigated Fields by Crop Type')
plt.xlabel('Crop Type')
plt.ylabel('NDWI')
plt.legend(title='Irrigation Status')

# save figure if requested
if savefigs:
    plt.savefig(os.path.join(figdir,'emit_ndwi_by_croptype.pdf'))
    plt.savefig(os.path.join(figdir,'emit_ndwi_by_croptype.png'))
# show the figure to the screen
plt.show()

What patterns do you notice in NDWI between irrigated and non-irrigated fields, and between crop types? Given the date of the EMIT image, what might that tell us about the differences within and across crop types? How might you extend this analysis using other SVIs or additional image data?
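One way to put numbers on what the box plot shows is to summarize NDWI by crop type and irrigation status. The sketch below uses a small made-up table with the same column names as `extracted` (`Class_Names`, `Irrigation`, `ndwi`). The crop names and values are invented for illustration and are not results from the EMIT scene.

```python
import pandas as pd

# Made-up values with the same column names as the 'extracted' dataframe
demo = pd.DataFrame({
    "Class_Names": ["Corn"] * 4 + ["Soybeans"] * 4,
    "Irrigation": ["Irrigated", "Irrigated", "Non-Irrigated", "Non-Irrigated"] * 2,
    "ndwi": [0.30, 0.34, 0.10, 0.14, 0.20, 0.24, 0.05, 0.07],
})

# Sample count and median NDWI for each crop type / irrigation class
summary = (
    demo.groupby(["Class_Names", "Irrigation"])["ndwi"]
    .agg(["count", "median"])
    .reset_index()
)

# Median NDWI difference (irrigated minus non-irrigated) for each crop type
medians = summary.pivot(index="Class_Names", columns="Irrigation", values="median")
medians["difference"] = medians["Irrigated"] - medians["Non-Irrigated"]
print(medians)
```

Reporting the count alongside the median is useful because some crop and irrigation combinations may have very few sample points.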

#### 8f. Compare multiple SVIs by crop type and irrigated vs. non-irrigated fields

In [None]:
# Prep the SVIs
ndvi_da = ndvi['ndvi'].squeeze()
ndvi_da_repr = ndvi_da.rio.reproject("EPSG:4326")

reNDVI_da = reNDVI['reNDVI'].squeeze()
reNDVI_da_repr = reNDVI_da.rio.reproject("EPSG:4326")

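# Store the result under a new name so the imported pri() function is not overwritten
# (reusing the name 'pri' would make this cell fail if it were run a second time)
pri_ds = pri(input_xarray = ds_geo, band1=531, band2=570, scaled=True, index_name='pri', proj=llwgs84)
pri_da = pri_ds['pri'].squeeze()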
pri_da_repr = pri_da.rio.reproject("EPSG:4326")

In [None]:
# Create a data stack of all SVIs and the irrigation land mask
ds_stack = xr.Dataset({
    'ndvi': ndvi_da_repr,
    'ndwi': ndwi_da_repr,
    'reNDVI': reNDVI_da_repr,
    'pri': pri_da_repr,
    'irrigation': lanid_masked_repr
})

# Extract data values at the sample points
extracted = ds_stack.sel(
    y=xp['Latitude'], x=xp['Longitude'], method='nearest'
).to_dataframe().reset_index()

# Merge the results
extracted = extracted[['id','ndvi','ndwi','reNDVI','pri','irrigation']]
extracted = pd.merge(extracted, samples_df, on="id")
extracted.head()

In [None]:
# Convert irrigation values to categorical
extracted['Irrigation'] = extracted['irrigation'].apply(lambda x: 'Irrigated' if x == 1 else 'Non-Irrigated')

# Create the box plot - in this example we create a multi-panel plot to enable comparisons across SVIs
fig,a=plt.subplots(nrows=2, ncols=2, clear=True, figsize=(15,12))

# panel 1
ax=a[0,0]
sns.boxplot(ax=ax, data=extracted, x='Class_Names', y='ndvi', hue='Irrigation',
           palette={'Irrigated': '#ADD8E6', 'Non-Irrigated': '#D2B48C'})
ax.set_title('NDVI for Irrigated vs Non-Irrigated Fields by Crop Type')
ax.set_xlabel('Crop Type')
ax.set_ylabel('NDVI')
ax.legend(title='Irrigation Status')

# panel 2
ax=a[0,1]
sns.boxplot(ax=ax, data=extracted, x='Class_Names', y='ndwi', hue='Irrigation',
           palette={'Irrigated': '#ADD8E6', 'Non-Irrigated': '#D2B48C'})
ax.set_title('NDWI for Irrigated vs Non-Irrigated Fields by Crop Type')
ax.set_xlabel('Crop Type')
ax.set_ylabel('NDWI')
ax.legend(title='Irrigation Status')

# panel 3
ax=a[1,0]
sns.boxplot(ax=ax,data=extracted, x='Class_Names', y='reNDVI', hue='Irrigation',
           palette={'Irrigated': '#ADD8E6', 'Non-Irrigated': '#D2B48C'})
ax.set_title('reNDVI for Irrigated vs Non-Irrigated Fields by Crop Type')
ax.set_xlabel('Crop Type')
ax.set_ylabel('reNDVI')
ax.legend(title='Irrigation Status')

# panel 4
ax=a[1,1]
sns.boxplot(ax=ax,data=extracted, x='Class_Names', y='pri', hue='Irrigation',
           palette={'Irrigated': '#ADD8E6', 'Non-Irrigated': '#D2B48C'})
ax.set_title('PRI for Irrigated vs Non-Irrigated Fields by Crop Type')
ax.set_xlabel('Crop Type')
ax.set_ylabel('PRI')
ax.legend(title='Irrigation Status')

# save figure if requested
if savefigs:
    plt.savefig(os.path.join(figdir,'emit_svi_by_croptype.pdf'))
    plt.savefig(os.path.join(figdir,'emit_svi_by_croptype.png'))
# show the figure to the screen
plt.show()
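When comparing several SVIs at once, it can help to reshape the table from one column per index to a long format with one row per sample and index. The sketch below shows this with `DataFrame.melt` on a small made-up table that reuses the column names from `extracted`; the values are invented for illustration.

```python
import pandas as pd

# Made-up values with the same column names as the 'extracted' dataframe
demo = pd.DataFrame({
    "id": [0, 1, 2, 3],
    "Class_Names": ["Corn", "Corn", "Corn", "Corn"],
    "Irrigation": ["Irrigated", "Irrigated", "Non-Irrigated", "Non-Irrigated"],
    "ndvi": [0.80, 0.84, 0.60, 0.64],
    "ndwi": [0.30, 0.34, 0.10, 0.14],
})

# Wide to long: one row per (sample point, SVI)
long_df = demo.melt(
    id_vars=["id", "Class_Names", "Irrigation"],
    value_vars=["ndvi", "ndwi"],
    var_name="svi",
    value_name="value",
)

# Median of every SVI by crop type, with irrigation classes side by side
svi_medians = (
    long_df.groupby(["svi", "Class_Names", "Irrigation"])["value"]
    .median()
    .unstack("Irrigation")
)
print(svi_medians)
```

A long table like this can also be passed to `sns.catplot(data=long_df, x='Class_Names', y='value', hue='Irrigation', col='svi', kind='box', sharey=False)` to draw one panel per index without repeating the plotting code.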

#### CHALLENGE: Continue the analysis on your own using these suggestions or your own ideas to expand on this tutorial

Below we provide a few additional ideas for activities that you can carry out using the notebooks we have provided. You can use these notebooks as a starting point to create new custom notebooks that enable expanded analyses of EMIT data using specific pieces of what's provided here. Beyond this list, what other ideas do you have that could build on this tutorial and provide additional applications of EMIT to address agriculture and food security issues and monitoring? What additional datasets or information would you like to include (e.g. GIS layers, rasters, agricultural statistics) to enable new comparisons and analyses (e.g. crop health and productivity/yield by crop type and irrigated vs. non-irrigated fields)?

**Short list:**

- Expand the analysis above to include more crop types by adding additional crops in the subset selection in step 7b.
- Expand the analysis above to include additional SVIs and other crop metrics beyond those demonstrated here. Where could you find information on additional crop or plant SVIs and metrics?
- Expand the analysis above to include all of the downloaded and prepared EMIT images and compare crop types across multiple EMIT scenes.
- Expand the analysis above to include an EMIT time-series to study the changes in spectra for specific crop types at different times of the year.
- Add additional data visualization and plotting options to synthesize the information provided by EMIT and the associated GIS data.
- Incorporate ECOSTRESS data to evaluate optical and thermal differences across crop types and irrigated vs. non-irrigated lands.
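
As a possible starting point for the additional-SVI idea, many indices share the normalized-difference form (A - B) / (A + B), computed from the two bands closest to a pair of target wavelengths. The sketch below implements that form with plain NumPy on a tiny synthetic cube. The function name `normalized_difference`, the `(band, y, x)` array layout, and the example wavelengths are assumptions for illustration; this is not the implementation used by the `spectral_index` functions in this tutorial.

```python
import numpy as np


def normalized_difference(cube, wavelengths, wl_a, wl_b):
    """Compute (A - B) / (A + B) using the bands nearest to wl_a and wl_b (hypothetical helper).

    cube is assumed to have shape (band, y, x); wavelengths has one entry per band, in nm.
    """
    wavelengths = np.asarray(wavelengths, dtype=float)
    band_a = cube[np.argmin(np.abs(wavelengths - wl_a))].astype(float)
    band_b = cube[np.argmin(np.abs(wavelengths - wl_b))].astype(float)
    num = band_a - band_b
    den = band_a + band_b
    # Leave NaN wherever the denominator is zero instead of dividing by zero
    return np.divide(num, den, out=np.full_like(num, np.nan), where=den != 0)


# Synthetic 3-band, 2 x 2 pixel cube with constant reflectance in each band
demo_wavelengths = np.array([650.0, 860.0, 1240.0])
demo_cube = np.stack([
    np.full((2, 2), 0.1),
    np.full((2, 2), 0.5),
    np.full((2, 2), 0.3),
])

demo_index = normalized_difference(demo_cube, demo_wavelengths, 860, 1240)
print(demo_index)
```

Because the bands are chosen by nearest wavelength, the same function can be reused for other band pairs by changing `wl_a` and `wl_b`.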


This is just a short list of ideas for expanding this notebook. If you have other ideas, or would like to explore any of these further, please talk to the facilitators, who can help you get started. Happy data analyses!