# SHIFT V2 Reflectance Data Guide

This guide explains how to use the SHIFT V2 Reflectance data. The data is stored in an S3 bucket that is mounted in your SMCE computing environment, at the following location: `s3/dh-shift-curated/aviris/v2`
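
To see which acquisition dates are available, you can list the mounted directory. The sketch below assumes an `L2a/<YYYYMMDD>/` layout under the location above, which is inferred from the flightline paths shown later in this guide. The helper name `list_flight_dates` is hypothetical and is not part of `shift_v2_utils`.

```python
from pathlib import Path


def list_flight_dates(root):
    """Return the sorted YYYYMMDD directory names found under <root>/L2a.

    Assumes the layout <root>/L2a/<YYYYMMDD>/..., which is inferred from
    the flightline paths in this guide and may not hold everywhere.
    """
    l2a = Path(root) / "L2a"
    if not l2a.is_dir():
        # Bucket not mounted, or the layout differs from the assumption.
        return []
    return sorted(
        p.name
        for p in l2a.iterdir()
        if p.is_dir() and len(p.name) == 8 and p.name.isdigit()
    )
```

Calling `list_flight_dates('s3/dh-shift-curated/aviris/v2')` from the directory where the bucket is mounted would return the date folders, or an empty list if the mount is not visible from there.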

The SHIFT V2 Reflectance data has transitioned from ENVI files to NetCDF format. 
I have written a few functions to facilitate interaction with this new format. Below are some examples of how to work with the data.

In [3]:
import sys
sys.path.append('s3/dh-shift-curated/aviris/utils')
from shift_v2_utils import find_overlap_shift_v2, open_shift_v2_data

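Use `find_overlap_shift_v2` to find the flightlines that overlap a set of geometries, such as a shapefile, loaded as a GeoPandas GeoDataFrame. The following cell builds a small example GeoDataFrame with two polygons.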

In [4]:
import geopandas as gpd
from shapely.geometry import Polygon

polygon_1 = Polygon([
    (-119.8853015 , 34.42277795),
    (-119.86975941, 34.42312643),
    (-119.86921817, 34.4066284 ),
    (-119.88476322, 34.40623869),
    (-119.8853015 , 34.42277795)
])

polygon_2 = Polygon([
    (-119.8753015 , 34.41777795),
    (-119.85975941, 34.41812643),
    (-119.85921817, 34.4016284 ),
    (-119.87476322, 34.40123869),
    (-119.8753015 , 34.41777795)
])

# Create a GeoDataFrame
gdf = gpd.GeoDataFrame({
    'id': [1, 2],
    'name': ['Polygon 1', 'Polygon 2'],
    'geometry': [polygon_1, polygon_2]
}, crs="EPSG:4326")


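`find_overlap_shift_v2` takes a GeoDataFrame as its argument. The GeoDataFrame must have its CRS set to EPSG:4326; if yours uses a different CRS, reproject it first with `gdf.to_crs("EPSG:4326")`. You can optionally filter the overlapping flightlines by date.
The `dates` argument can be provided as a single string or as a list of strings, in the `YYYYMMDD` form used in the examples below.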
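
Because a date can be passed either as one string or as a list, it can help to validate the values before calling the overlap function. The following is a minimal sketch of that idea. The `YYYYMMDD` format is assumed from the example values such as `'20220316'`, and `normalize_dates` is a hypothetical helper, not something `shift_v2_utils` provides.

```python
from datetime import datetime


def normalize_dates(dates):
    """Return a list of validated YYYYMMDD strings from a string or a list."""
    if dates is None:
        return None  # no filter requested
    if isinstance(dates, str):
        dates = [dates]
    checked = []
    for d in dates:
        # strptime alone tolerates unpadded fields, so check the shape first.
        if len(d) != 8 or not d.isdigit():
            raise ValueError(f"expected YYYYMMDD, got {d!r}")
        datetime.strptime(d, "%Y%m%d")  # raises ValueError for impossible dates
        checked.append(d)
    return checked
```

The result could then be passed on as `find_overlap_shift_v2(gdf, dates=normalize_dates(my_dates))`, so that a typo in a date raises an error up front.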

In [5]:
# all dates
find_overlap_shift_v2(gdf)

# single date filter
find_overlap_shift_v2(gdf, dates='20220316')

# multiple dates filter
flightlines = find_overlap_shift_v2(gdf, dates=['20220316', '20220914'])
flightlines

| | index | flightline | date | path | geometry | index_right | id | name |
|---|---|---|---|---|---|---|---|---|
| 0 | 306 | ang20220316t184402_001 | 20220316 | s3/dh-shift-curated/aviris/v2/L2a/20220316/ang... | POLYGON ((-119.77003 34.38405, -119.76854 34.4... | 1 | 2 | Polygon 2 |
| 1 | 307 | ang20220316t184402_002 | 20220316 | s3/dh-shift-curated/aviris/v2/L2a/20220316/ang... | POLYGON ((-119.85591 34.39932, -119.85424 34.4... | 1 | 2 | Polygon 2 |
| 2 | 1418 | ang20220914t170915_000 | 20220914 | s3/dh-shift-curated/aviris/v2/L2a/20220914/ang... | POLYGON ((-119.80109 34.37997, -119.79822 34.4... | 1 | 2 | Polygon 2 |
| 3 | 1426 | ang20220914t171806_003 | 20220914 | s3/dh-shift-curated/aviris/v2/L2a/20220914/ang... | POLYGON ((-119.80085 34.31877, -119.79571 34.4... | 1 | 2 | Polygon 2 |
| 4 | 1428 | ang20220914t172925_001 | 20220914 | s3/dh-shift-curated/aviris/v2/L2a/20220914/ang... | POLYGON ((-119.77325 34.38317, -119.77163 34.4... | 1 | 2 | Polygon 2 |
| 5 | 1429 | ang20220914t172925_002 | 20220914 | s3/dh-shift-curated/aviris/v2/L2a/20220914/ang... | POLYGON ((-119.86728 34.39939, -119.86555 34.4... | 1 | 2 | Polygon 2 |
| 6 | 307 | ang20220316t184402_002 | 20220316 | s3/dh-shift-curated/aviris/v2/L2a/20220316/ang... | POLYGON ((-119.85591 34.39932, -119.85424 34.4... | 0 | 1 | Polygon 1 |
| 7 | 1426 | ang20220914t171806_003 | 20220914 | s3/dh-shift-curated/aviris/v2/L2a/20220914/ang... | POLYGON ((-119.80085 34.31877, -119.79571 34.4... | 0 | 1 | Polygon 1 |
| 8 | 1428 | ang20220914t172925_001 | 20220914 | s3/dh-shift-curated/aviris/v2/L2a/20220914/ang... | POLYGON ((-119.77325 34.38317, -119.77163 34.4... | 0 | 1 | Polygon 1 |
| 9 | 1429 | ang20220914t172925_002 | 20220914 | s3/dh-shift-curated/aviris/v2/L2a/20220914/ang... | POLYGON ((-119.86728 34.39939, -119.86555 34.4... | 0 | 1 | Polygon 1 |
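
The output has one row per flightline and polygon pair, so a flightline that covers both polygons appears twice. The sketch below uses plain pandas on the `flightline` and `name` values copied from the output above to show one way of collapsing the result. The same calls should work on the returned GeoDataFrame, assuming it keeps these column names.

```python
import pandas as pd

# Flightline/polygon pairs copied from the overlap output above.
pairs = pd.DataFrame({
    "flightline": [
        "ang20220316t184402_001", "ang20220316t184402_002",
        "ang20220914t170915_000", "ang20220914t171806_003",
        "ang20220914t172925_001", "ang20220914t172925_002",
        "ang20220316t184402_002", "ang20220914t171806_003",
        "ang20220914t172925_001", "ang20220914t172925_002",
    ],
    "name": ["Polygon 2"] * 6 + ["Polygon 1"] * 4,
})

# Keep one row per flightline, e.g. to avoid opening the same file twice.
unique_lines = pairs.drop_duplicates(subset="flightline")

# List the polygons that each flightline overlaps.
polygons_per_line = pairs.groupby("flightline")["name"].agg(list)
```

Here `unique_lines` holds six flightlines, and `polygons_per_line` shows which of them cover both polygons.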


Each row of the overlap output includes a `path` that you can pass to `open_shift_v2_data` to open that flightline. By default, the function returns only the reflectance group. To load additional groups, specify them by name with the `groups` argument: ['reflectance', 'water_vapor', 'aerosol_optical_thickness'].

The `mask_and_scale` argument controls how Xarray handles "NoData" values. It defaults to False, which leaves NoData pixels at their coded value of -9999. If you set `mask_and_scale` to True, Xarray replaces NoData values with `np.nan` instead.
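
The difference matters as soon as you compute statistics. The following NumPy sketch illustrates the effect of masking on a handful of made-up reflectance values. It is only an illustration of the idea, not the code that Xarray or `open_shift_v2_data` runs internally.

```python
import numpy as np

NODATA = -9999.0  # coded NoData value described above

# Illustrative values only; the second pixel is NoData.
raw = np.array([0.12, NODATA, 0.30, 0.18], dtype=np.float32)

# Unmasked: the NoData code is averaged in as if it were a measurement.
naive_mean = raw.mean()

# Masked: the code becomes NaN, which NaN-aware reductions skip.
masked = np.where(raw == NODATA, np.nan, raw)
masked_mean = np.nanmean(masked)
```

With the coded value left in place, `naive_mean` is a large negative number, while `masked_mean` reflects only the three valid pixels. If you keep `mask_and_scale=False`, remember to exclude -9999 yourself before aggregating.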

In [6]:
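# Path of the flightline in row 5 of the overlap output
path = flightlines.iloc[5].path
# Default call: reflectance group only
ds = open_shift_v2_data(path)
# Explicit call: three groups, NoData left at its coded value of -9999
ds = open_shift_v2_data(path, groups=['reflectance', 'water_vapor', 'aerosol_optical_thickness'], mask_and_scale=False)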
ds

| Bytes (array = chunk) | Shape (array = chunk) | Dask graph | Data type |
|---|---|---|---|
| 1.66 kiB | (425,) | 1 chunks in 2 graph layers | float32 numpy.ndarray |
| 3.54 GiB | (425, 991, 2259) | 1 chunks in 2 graph layers | float32 numpy.ndarray |
| 8.54 MiB | (991, 2259) | 1 chunks in 2 graph layers | float32 numpy.ndarray |
| 8.54 MiB | (991, 2259) | 1 chunks in 2 graph layers | float32 numpy.ndarray |


The data is lazily loaded using Dask, meaning it is not fully loaded into memory until necessary.
You can perform operations, such as clipping the data, using tools like rioxarray without actually loading the data into memory.

In [7]:
clipped = ds.rio.clip(gdf.to_crs(ds.rio.crs).geometry)
clipped

| Bytes (array = chunk) | Shape (array = chunk) | Dask graph | Data type |
|---|---|---|---|
| 1.66 kiB | (425,) | 1 chunks in 2 graph layers | float32 numpy.ndarray |
| 254.50 MiB | (425, 470, 334) | 1 chunks in 8 graph layers | float32 numpy.ndarray |
| 613.20 kiB | (470, 334) | 1 chunks in 8 graph layers | float32 numpy.ndarray |
| 613.20 kiB | (470, 334) | 1 chunks in 8 graph layers | float32 numpy.ndarray |
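
The sizes reported for the full and clipped arrays follow directly from their shapes and the 4-byte float32 data type. A small sketch like the one below can help you check whether an array will fit in memory before you load it. `array_nbytes` is a hypothetical helper written for this guide.

```python
from math import prod


def array_nbytes(shape, itemsize=4):
    """Bytes needed for a dense array; itemsize=4 corresponds to float32."""
    return prod(shape) * itemsize


full_bytes = array_nbytes((425, 991, 2259))    # full 3D array
clipped_bytes = array_nbytes((425, 470, 334))  # same array after clipping

print(f"full:    {full_bytes / 2**30:.2f} GiB")     # full:    3.54 GiB
print(f"clipped: {clipped_bytes / 2**20:.2f} MiB")  # clipped: 254.50 MiB
```

Clipping to the two example polygons reduces the 3D array from about 3.54 GiB to about 254.50 MiB, which is why it pays to clip before loading.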


Once you have clipped the data, you can load it into memory with `.compute()`. The first load might take some time; after the data has been accessed once, subsequent accesses should be much quicker.

In [8]:
%%time
clipped.reflectance.compute()

CPU times: user 14.6 s, sys: 1.72 s, total: 16.3 s
Wall time: 16.4 s
