# SHIFT Data Quickstart Guide

This is a quickstart guide for working with SHIFT data on the SMCE. The guide covers how to:

- Read in data with the SHIFT Python Utilities Library
- Orthorectify data
- Clip data with a shapefile
- Write data to disk

In [1]:
import sys
sys.path.append('/efs/SHIFT-Python-Utilities/')
from shift_python_utilities.intake_shift import shift_catalog
import rioxarray as rxr
import rasterio as rio
import geopandas as gpd
from shapely.geometry import Polygon

# Initialize an instance of the catalog
cat = shift_catalog()

## Working with SHIFT Gridded Data

Read a shapefile using the GeoPandas library.

In [2]:
geodf = gpd.read_file("/efs/edlang1/SHIFT-Python-Utilities/shift_python_utilities/tests/test_data/quick_start_shp/quick_start_shp.shp")
geodf

| | FID | geometry |
|---|---|---|
| 0 | 0 | POLYGON ((-120.49103 34.49217, -120.48940 34.4... |
| 1 | 1 | POLYGON ((-120.48611 34.49070, -120.48595 34.4... |
| 2 | 2 | POLYGON ((-120.48925 34.48861, -120.48823 34.4... |


Read in the gridded data using the SHIFT Python Utilities library and assign the appropriate CRS.

In [3]:
ds = cat.aviris_v1_gridded.read_chunked()

# assign the crs from the metadata to the xarray dataset
ds.rio.write_crs(rio.CRS.from_wkt(",".join(ds.attrs['coordinate system string'])), inplace=True)
ds

| | Array | Chunk |
|---|---|---|
| Bytes | 3.32 TiB | 22.27 MiB |
| Shape | (13, 12023, 425, 13739) | (1, 1, 425, 13739) |
| Count | 156300 Tasks | 156299 Chunks |
| Type | float32 | numpy.ndarray |
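
The sizes in the output above follow directly from the array shape and the 4-byte float32 element size. The sketch below reproduces that arithmetic; the helper name `array_nbytes` is made up for illustration and is not part of the SHIFT utilities.

```python
from math import prod


def array_nbytes(shape, itemsize=4):
    """Bytes needed for a dense array; itemsize=4 corresponds to float32."""
    return prod(shape) * itemsize


full_shape = (13, 12023, 425, 13739)  # shape reported for the full array
chunk_shape = (1, 1, 425, 13739)      # shape reported for a single chunk

total_tib = array_nbytes(full_shape) / 2**40
chunk_mib = array_nbytes(chunk_shape) / 2**20
# Number of chunks = product of (full size // chunk size) along each dimension
n_chunks = prod(f // c for f, c in zip(full_shape, chunk_shape))

print(f"total: {total_tib:.2f} TiB, chunk: {chunk_mib:.2f} MiB, chunks: {n_chunks}")
```

At roughly 3.3 TiB the full array cannot be loaded into memory at once, which is why it is read lazily in chunks and subset before any values are computed.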


Clip the data using the Geopandas dataframe. Make sure the dataframe and the gridded data have the same CRS.

In [4]:
clipped = ds.rio.clip(geodf.to_crs(ds.rio.crs).geometry.values, all_touched=True)
clipped

| | Array | Chunk |
|---|---|---|
| Bytes | 200.35 MiB | 200.35 MiB |
| Shape | (13, 98, 425, 97) | (13, 98, 425, 97) |
| Count | 337921 Tasks | 1 Chunks |
| Type | float32 | numpy.ndarray |


Write the result as a GeoTIFF

To make the data compatible with rioxarray's to_raster function, you must:

- Reduce the dimensionality so that the data being written is 2D or 3D. Here the time dimension is removed by writing one file per date.
- Select the data variable you would like to write (reflectance).
- Transpose the data to the dimension order rioxarray requires (band, y_dim, x_dim).

In [6]:
# Only 2D and 3D data can be written so here we select time to reduce the dimensionality
clipped.sel(time='2022-02-24').reflectance.transpose('wavelength', 'y', 'x').rio.to_raster('outpath_2022_02_24.tif', driver="GTIFF")
clipped.sel(time='2022-05-29').reflectance.transpose('wavelength', 'y', 'x').rio.to_raster('outpath_2022_05_29.tif', driver="GTIFF")
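
The two calls above can be generalized to a loop over every date in the dataset. The sketch below uses a small synthetic dataset (`clipped_demo`, a made-up stand-in for `clipped`) and assumes the dimension order is (time, y, wavelength, x). The actual write is left as a comment because it needs rioxarray and a CRS on the data.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for `clipped`; the dimension order is an assumption.
times = np.array(["2022-02-24", "2022-05-29"], dtype="datetime64[ns]")
clipped_demo = xr.Dataset(
    {
        "reflectance": (
            ("time", "y", "wavelength", "x"),
            np.zeros((2, 4, 3, 5), dtype="float32"),
        )
    },
    coords={"time": times},
)

written = []
for t in clipped_demo.time.values:
    date = np.datetime_as_string(t, unit="D")  # e.g. '2022-02-24'
    # Selecting one time step leaves 3D data; reorder it to (band, y, x).
    layer = clipped_demo.reflectance.sel(time=t).transpose("wavelength", "y", "x")
    out_path = f"outpath_{date.replace('-', '_')}.tif"
    # With rioxarray imported and a CRS set, the write itself would be:
    # layer.rio.to_raster(out_path, driver="GTIFF")
    written.append((out_path, layer.dims))

print(written)
```

Deriving the file name from the time coordinate keeps one file per date without hard-coding each date string.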

## Working with the Raw SHIFT Data

Create a GeoPandas GeoDataFrame from coordinates, or read a shapefile using the GeoPandas library. Verify that your geometry uses the appropriate CRS.

In [21]:
shp = Polygon([
    (-119.8853015 , 34.42277795),
    (-119.86975941, 34.42312643),
    (-119.86921817, 34.4066284 ),
    (-119.88476322, 34.40623869),
    (-119.8853015 , 34.42277795)]
)
geodf = gpd.GeoDataFrame(geometry=[shp], crs=4326)
geodf = geodf.to_crs(geodf.estimate_utm_crs())
geodf

| | geometry |
|---|---|
| 0 | POLYGON ((234835.191 3812810.517, 236265.070 3... |
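
As a sanity check on the reprojection, the UTM zone for a longitude/latitude pair can be worked out by hand with the standard 6-degree zone formula. This is only an illustrative sketch: `utm_epsg` is a made-up helper, it ignores the special zones around Norway and Svalbard, and `estimate_utm_crs` itself relies on pyproj rather than on this formula.

```python
def utm_epsg(lon, lat):
    """EPSG code of the WGS 84 / UTM zone containing (lon, lat) in degrees.

    Uses the standard 6-degree zone formula and ignores the special
    zones around Norway and Svalbard.
    """
    zone = min(int((lon + 180) // 6) + 1, 60)
    # 326xx codes are northern-hemisphere zones, 327xx southern.
    return (32600 if lat >= 0 else 32700) + zone


# First vertex of the polygon defined above
print(utm_epsg(-119.8853015, 34.42277795))  # 32611, i.e. UTM zone 11N
```

The polygon falls in zone 11N, which is consistent with the easting of about 235 km in the output above: the area lies west of that zone's central meridian at 117° W.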


Using the SHIFT Python Utilities library, you can pass your shapefile data along with the date and time of the flight and retrieve the data for your area of interest.

In [22]:
ds = cat.L2a(date=20220224, time=200332, ortho=True, subset=geodf).read_chunked()
ds

Write the data as a GeoTIFF

In [27]:
ds.reflectance.transpose('wavelength', 'lat', 'lon').rio.to_raster(raster_path="outpath.tif", driver="GTIFF")