# BioSCape Data Access

The BioSCape data is stored in an S3 bucket associated with the SMCE environment. You can access this data in several ways:


## 1. Intake Catalog 

**For reflectance data only**

The simplest method of access is through the BioSCape intake catalog.

Make sure the `intake`, `intake-xarray`, `s3fs`, `zarr`, `rioxarray`, and `jinja2` packages are installed in your conda environment.
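You can confirm that these packages are importable before running the cells below with a small check based on `importlib.util.find_spec`. This is a sketch: some import names differ from the names you install (for example, `intake-xarray` is imported as `intake_xarray`), and the module list below is an assumed mapping you may need to adjust.

```python
import importlib.util

# Import names for the packages listed above (assumed mapping; intake-xarray imports as intake_xarray)
REQUIRED_MODULES = ["intake", "intake_xarray", "s3fs", "zarr", "rioxarray", "jinja2"]


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
print("Missing packages:", ", ".join(missing) if missing else "none")
```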

```python
import warnings

import intake
import rioxarray  # registers the .rio accessor used below

warnings.filterwarnings('ignore')

# Load the catalog
catalog = intake.open_catalog('s3://bioscape-data/bioscape_avng.yaml')
# Access a specific reflectance file
data = catalog.ang20231022t092801.ang20231022t092801_001
# Open it lazily as a Dask-backed xarray Dataset
data = data.read_chunked()

# Write the CRS using rioxarray
data.rio.write_crs(data.spatial_ref.attrs['crs_wkt'], inplace=True)
data
```

| | Array | Chunk |
|---|---|---|
| Bytes | 741.02 MiB | 97.66 MiB |
| Shape | (607, 425, 753) | (80, 425, 753) |
| Dask graph | 8 chunks in 2 graph layers | 8 chunks in 2 graph layers |
| Data type | float32 numpy.ndarray | float32 numpy.ndarray |

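Because `read_chunked()` opens the data lazily, it helps to know how much memory `.compute()` will need before you call it. The sizes in the summary follow from shape × bytes per element (4 bytes for `float32`); this stdlib-only sketch reproduces them, and the `size_mib` helper is hypothetical.

```python
import math

# Bytes per element for a few common dtypes
DTYPE_SIZES = {"float32": 4, "float64": 8, "int16": 2, "uint8": 1}


def size_mib(shape, dtype="float32"):
    """Size in MiB (2**20 bytes) of an array with the given shape and dtype."""
    return math.prod(shape) * DTYPE_SIZES[dtype] / 2**20


full = size_mib((607, 425, 753))  # whole array
chunk = size_mib((80, 425, 753))  # one Dask chunk
n_chunks = math.ceil(607 / 80)    # chunks along the first axis

print(f"array: {full:.2f} MiB, chunk: {chunk:.2f} MiB, chunks: {n_chunks}")
# prints: array: 741.02 MiB, chunk: 97.66 MiB, chunks: 8
```

If the full-array figure is close to the memory available on your machine, select a subset of the Dataset before calling `.compute()`.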


```python
%%time
# Read the data into memory
data.reflectance.compute()
```

```
CPU times: user 2 s, sys: 1.5 s, total: 3.5 s
Wall time: 6.67 s
```


## 2. Cropping API Access

**For reflectance data only**

An API is available that allows you to:

1. **Submit a GeoJSON**: This request returns the flightlines that overlap it.
2. **Retrieve cropped data**: This request returns the cropped data in NetCDF format. Provide a flightline, a subsection number, a GeoJSON, and an output file name.
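If you do not already have a GeoJSON file, you can write a minimal one from a bounding box with the standard library. This is a sketch: the coordinates below are hypothetical placeholders, and it assumes the API accepts a `FeatureCollection` containing a single polygon. GeoJSON coordinates are (longitude, latitude) in WGS 84.

```python
import json


def bbox_to_geojson(min_lon, min_lat, max_lon, max_lat):
    """Build a GeoJSON FeatureCollection holding one rectangular polygon.

    The ring is closed and counterclockwise, as RFC 7946 recommends for exterior rings.
    """
    ring = [
        [min_lon, min_lat],
        [max_lon, min_lat],
        [max_lon, max_lat],
        [min_lon, max_lat],
        [min_lon, min_lat],  # repeat the first point to close the ring
    ]
    return {
        "type": "FeatureCollection",
        "features": [
            {
                "type": "Feature",
                "properties": {},
                "geometry": {"type": "Polygon", "coordinates": [ring]},
            }
        ],
    }


# Hypothetical bounding box; replace with your own area of interest
aoi = bbox_to_geojson(18.40, -34.10, 18.50, -34.00)
with open("aoi.geojson", "w") as f:
    json.dump(aoi, f)
```

You could then set `GEOJSON_FILE = "aoi.geojson"` in the variables cell.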

The API can be used to download cropped data to any machine, but it requires a BioSCape SMCE username and password.

```python
import requests

# BioSCape cropping API endpoints
URLTOKEN = "https://crop.bioscape.io/token/"
URLCROP = "https://crop.bioscape.io/crop/"
URLOVERLAP = "https://crop.bioscape.io/overlap/"
```

1. Set your variables

```python
GEOJSON_FILE = "path_to_your_geojson"
OUTPUT_FILE = "output_file_name.nc"  # This must be a NetCDF (.nc) file name
SMCE_USERNAME = 'your_smce_username'
SMCE_PASSWORD = 'your_smce_password'
```
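Typing your SMCE password directly into a notebook makes it easy to leak when the notebook is shared. One alternative, sketched below, reads the credentials from environment variables and falls back to an interactive prompt via `getpass`. The variable names `SMCE_USERNAME` and `SMCE_PASSWORD` are a hypothetical convention, not something the API requires.

```python
import getpass
import os


def get_smce_credentials(user_var="SMCE_USERNAME", pass_var="SMCE_PASSWORD"):
    """Return (username, password) from environment variables, prompting for any that are unset.

    The default environment variable names are hypothetical; use whatever you prefer.
    """
    username = os.environ.get(user_var) or input("SMCE username: ")
    password = os.environ.get(pass_var) or getpass.getpass("SMCE password: ")
    return username, password
```

In the notebook you would then write `SMCE_USERNAME, SMCE_PASSWORD = get_smce_credentials()` instead of hard-coding the values.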

2. Get an Access Token

```python
# Exchange your SMCE username and password for a bearer token
response = requests.post(
    URLTOKEN,
    data={"username": SMCE_USERNAME, "password": SMCE_PASSWORD},
    headers={"Content-Type": "application/x-www-form-urlencoded"}
)
response.raise_for_status()  # Raise an error for bad HTTP status codes
access_token = response.json().get('access_token')
response
```

```
<Response [200]>
```

3. Find out which files overlap with your GeoJSON

```python
with open(GEOJSON_FILE, 'rb') as f:
    response = requests.post(
        URLOVERLAP,
        headers={"Authorization": f"Bearer {access_token}"},
        files={"geojson": f}
    )
response.raise_for_status()  # Raise an error for bad HTTP status codes

# Parse the JSON response body and extract the list of overlapping files
response_body = response.json()
files = response_body.get('files', [])
files
```

```
['ang20231022t092801_000',
 'ang20231022t094938_035',
 'ang20231022t094938_036',
 'ang20231029t120919_045',
 'ang20231029t123011_001',
 'ang20231029t123011_002']
```

4. Select which file you want to crop

```python
# Chosen from the overlap results above: 'ang20231022t094938_035'
flightline = 'ang20231022t094938'
subsection = 35
```
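Each entry returned by the overlap request appears to combine a flightline and a subsection number as `<flightline>_<subsection>` (for example, `ang20231022t094938_035`). Assuming that naming holds, you can derive both values instead of typing them by hand. This is a sketch; `split_granule` and `group_by_flightline` are hypothetical helpers.

```python
def split_granule(name):
    """Split an overlap result like 'ang20231022t094938_035' into (flightline, subsection)."""
    flightline, _, sub = name.rpartition("_")
    return flightline, int(sub)


def group_by_flightline(names):
    """Map each flightline to the sorted subsection numbers that overlap the GeoJSON."""
    groups = {}
    for name in names:
        fl, sub = split_granule(name)
        groups.setdefault(fl, []).append(sub)
    return {fl: sorted(subs) for fl, subs in groups.items()}


# A few entries from the overlap results above
overlaps = [
    "ang20231022t092801_000",
    "ang20231022t094938_035",
    "ang20231022t094938_036",
]
fl, sub = split_granule(overlaps[1])
print(fl, sub)  # ang20231022t094938 35
print(group_by_flightline(overlaps))
```

Converting the subsection to `int` means `str(subsection)` in the crop request sends `"35"`, matching the value used in this notebook.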

5. Request the cropped data

```python
with open(GEOJSON_FILE, 'rb') as f:
    response = requests.post(
        URLCROP,
        headers={"Authorization": f"Bearer {access_token}", "Accept": "application/x-netcdf"},
        # (None, value) sends a plain form field with no filename
        files={
            "geojson": f,
            "flightline": (None, flightline),
            "subsection": (None, str(subsection)),
            "outpath": (None, OUTPUT_FILE)
        }
    )
response.raise_for_status()  # Raise an error for bad HTTP status codes
response
```

```
<Response [200]>
```

6. Access the data with Xarray or write the output

```python
import io

import xarray as xr

# Make sure you have the appropriate packages installed to use xarray's netCDF backend;
# this import only confirms that h5netcdf is available
import h5netcdf

# Read the NetCDF data into an xarray Dataset
dataset = xr.open_dataset(io.BytesIO(response.content))
dataset
```
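Opening in-memory bytes with `xr.open_dataset` depends on a backend that understands the file's format: classic NetCDF-3 files can be read by the `scipy` engine, while NetCDF-4 files are HDF5 containers and need `h5netcdf`. If opening the response fails, checking the file signature can tell you which backend you need. A minimal sketch that checks only the signature at the very start of the bytes; the helper and mapping names are hypothetical.

```python
# File signatures (magic bytes) at the start of NetCDF files
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"  # NetCDF-4 files are HDF5 containers
CLASSIC_SIGNATURES = {
    b"CDF\x01": "netcdf3-classic",
    b"CDF\x02": "netcdf3-64bit-offset",
    b"CDF\x05": "netcdf3-cdf5",
}

# Suggested xarray engine per flavor when reading from io.BytesIO
ENGINES = {
    "netcdf4/hdf5": "h5netcdf",
    "netcdf3-classic": "scipy",
    "netcdf3-64bit-offset": "scipy",
}


def netcdf_flavor(content):
    """Return a label for the NetCDF flavor of `content` (bytes), or None if unrecognized."""
    if content.startswith(HDF5_SIGNATURE):
        return "netcdf4/hdf5"
    return CLASSIC_SIGNATURES.get(content[:4])
```

With this, `engine = ENGINES.get(netcdf_flavor(response.content))` followed by `xr.open_dataset(io.BytesIO(response.content), engine=engine)` picks a backend explicitly, and passing `engine=None` for unrecognized content falls back to xarray's own detection. A `None` flavor can also indicate that the server returned something other than NetCDF, such as an error page.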

```python
# Write the response content to the output file
with open(OUTPUT_FILE, 'wb') as f:
    f.write(response.content)
```

## 3. S3FS and Rioxarray

For smaller files, you can use s3fs and rioxarray to access the data directly. For larger files, reading the data from S3 this way can take too long.

```python
import rioxarray as rxr
import os
import s3fs
```

```python
# anon=False: authenticate with the AWS credentials available in your environment
s3 = s3fs.S3FileSystem(anon=False)
files = s3.ls('bioscape-data/')
files
```

```
['bioscape-data/AVNG',
 'bioscape-data/LVIS',
 'bioscape-data/PRISM',
 'bioscape-data/bioscape_avng.yaml',
 'bioscape-data/old']
```

```python
sub_files = s3.ls('bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_000')
sub_files
```

```
['bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_000/ang20231022t092801_000_L1B_ORT_main_46dd9a4a_LOC',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_000/ang20231022t092801_000_L1B_ORT_main_46dd9a4a_LOC.hdr',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_000/ang20231022t092801_000_L1B_ORT_main_46dd9a4a_LOC_ORT',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_000/ang20231022t092801_000_L1B_ORT_main_46dd9a4a_LOC_ORT.hdr',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_000/ang20231022t092801_000_L1B_ORT_main_46dd9a4a_OBS',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_000/ang20231022t092801_000_L1B_ORT_main_46dd9a4a_OBS.hdr',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_000/ang20231022t092801_000_L1B_ORT_main_46dd9a4a_OBS_ORT',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_000/ang20231022t092801_000_L1B_ORT_main_46dd9a4a_OBS_ORT.hdr',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t0928
```
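Selecting a file by list position, as `sub_files[4]` does, depends on the order of the listing. Judging from the names above, each product appears to be stored as a binary file plus a `.hdr` header, distinguished by a suffix such as `_OBS` or `_LOC_ORT`. Assuming that pattern holds, this sketch selects a file by suffix instead; `find_product` is a hypothetical helper.

```python
def find_product(keys, suffix):
    """Return the single S3 key whose name ends with `suffix` (excluding its .hdr header).

    Raises ValueError if no key, or more than one key, matches.
    """
    matches = [k for k in keys if k.endswith(suffix)]
    if len(matches) != 1:
        raise ValueError(f"expected one key ending in {suffix!r}, found {len(matches)}")
    return matches[0]


# Sample keys taken from the listing above
prefix = "bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_000/"
listing = [
    prefix + "ang20231022t092801_000_L1B_ORT_main_46dd9a4a_OBS",
    prefix + "ang20231022t092801_000_L1B_ORT_main_46dd9a4a_OBS.hdr",
    prefix + "ang20231022t092801_000_L1B_ORT_main_46dd9a4a_OBS_ORT",
    prefix + "ang20231022t092801_000_L1B_ORT_main_46dd9a4a_OBS_ORT.hdr",
]
print(find_product(listing, "_OBS"))
```

In the notebook, `find_product(sub_files, "_OBS")` would return the same key as `sub_files[4]` for the listing shown, without relying on its position.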

```python
%%time
print(f"File Name: {sub_files[4]}")
# s3.ls returns keys without the scheme, so prefix "s3://" for rioxarray
rxr.open_rasterio(f"s3://{sub_files[4]}").compute()
```

```
File Name: bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_000/ang20231022t092801_000_L1B_ORT_main_46dd9a4a_OBS
CPU times: user 136 ms, sys: 74.2 ms, total: 210 ms
Wall time: 2.81 s
```


## 4. Direct Download

Data is available for direct download from the [JPL endpoint](https://popo.jpl.nasa.gov/avng/).
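If the endpoint serves a standard HTML directory index (an assumption; check the page in your browser first), you can list the files it links to using only the standard library. This sketch separates parsing from downloading so it can be tried on a sample page; `LinkCollector` and `list_links` are hypothetical helpers.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect href targets from <a> tags in an HTML page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def list_links(html, base_url):
    """Return absolute URLs for relative links on an index page.

    Skips query-string links (column sorting in typical index pages) and
    absolute or parent-directory paths.
    """
    parser = LinkCollector()
    parser.feed(html)
    urls = []
    for href in parser.hrefs:
        if href.startswith(("?", "/", "..")):
            continue
        urls.append(urljoin(base_url, href))
    return urls
```

To use it against the real endpoint, you could fetch the page with `urllib.request.urlopen("https://popo.jpl.nasa.gov/avng/").read().decode()` and pass the result to `list_links` along with the same base URL.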

Additionally, Kit Lewers has developed a script for easy data extraction from the JPL BioSCape endpoint. Her code is available in the [BioSCrapes repository](https://github.com/kllewers/BioSCrapes).