# BioSCape Data Access

BioSCape data are stored in an S3 bucket associated with the BioSCape SMCE. You can access the data in several ways:


## 1. Intake Catalog 

**For reflectance data only**

The simplest method of access is through the BioSCape intake catalog.

Make sure `intake`, `intake-xarray`, `s3fs`, `zarr`, `rioxarray`, and `jinja2` are installed in your conda environment.
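Before opening the catalog, you can confirm that these packages are importable. This is a minimal sketch that uses only the standard library. Note that `intake-xarray` is imported as `intake_xarray`. The helper name `missing_packages` is illustrative only.

```python
import importlib.util

# Import names for the packages listed above (intake-xarray imports as intake_xarray).
REQUIRED = ["intake", "intake_xarray", "s3fs", "zarr", "rioxarray", "jinja2"]


def missing_packages(names=REQUIRED):
    """Return the subset of `names` that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```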

```python
import intake
import rioxarray  # registers the .rio accessor used below
import warnings

warnings.filterwarnings('ignore')

# Load the catalog
catalog = intake.open_catalog('s3://bioscape-data/bioscape_avng.yaml')
# Select a specific reflectance file and open it lazily (dask-backed)
data = catalog.ang20231022t092801.ang20231022t092801_001
data = data.read_chunked()

# Write the CRS using rioxarray
data.rio.write_crs(data.spatial_ref.attrs['crs_wkt'], inplace=True)
data
```

| | Array | Chunk |
|---|---|---|
| Bytes | 741.02 MiB | 97.66 MiB |
| Shape | (607, 425, 753) | (80, 425, 753) |
| Dask graph | 8 chunks in 2 graph layers | |
| Data type | float32 numpy.ndarray | |
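The sizes in this summary follow from the shape and data type. A float32 value is 4 bytes, and the first dimension is split into blocks of 80, so the last chunk holds only 47 rows. The arithmetic, written as a small check (the helper names are illustrative):

```python
import math

BYTES_PER_FLOAT32 = 4
MIB = 1024 ** 2


def array_mib(shape, itemsize=BYTES_PER_FLOAT32):
    """Size of an array with the given shape, in MiB."""
    return math.prod(shape) * itemsize / MIB


def n_chunks(length, chunk):
    """Number of chunks needed to cover `length` elements in blocks of `chunk`."""
    return math.ceil(length / chunk)


shape = (607, 425, 753)
chunk_shape = (80, 425, 753)

print(f"array: {array_mib(shape):.2f} MiB")             # array: 741.02 MiB
print(f"chunk: {array_mib(chunk_shape):.2f} MiB")       # chunk: 97.66 MiB
print(f"chunks: {n_chunks(shape[0], chunk_shape[0])}")  # chunks: 8
```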


```python
%%time
# Read the reflectance data into memory
data.reflectance.compute()
```

CPU times: user 2 s, sys: 1.5 s, total: 3.5 s
Wall time: 6.67 s
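Computing the full variable reads all eight chunks. The array is chunked only along its first dimension, in blocks of 80. If you select a range along that dimension before calling `.compute()`, dask should only need the chunks that the range overlaps. The helper below is a standard-library sketch of that bookkeeping. It assumes the regular chunking shown above and is not part of any BioSCape tool.

```python
def chunks_touched(start, stop, chunk=80, length=607):
    """Indices of the regular chunks overlapped by the half-open range [start, stop)."""
    if not 0 <= start < stop <= length:
        raise ValueError("need 0 <= start < stop <= length")
    first = start // chunk
    last = (stop - 1) // chunk
    return list(range(first, last + 1))


# Full extent: every chunk is read.
print(chunks_touched(0, 607))    # [0, 1, 2, 3, 4, 5, 6, 7]
# A 40-element slice inside one block reads a single chunk.
print(chunks_touched(100, 140))  # [1]
# A slice that crosses a block boundary reads two.
print(chunks_touched(150, 170))  # [1, 2]
```

In the notebook, such a selection could be written as `data.reflectance.isel({data.reflectance.dims[0]: slice(100, 140)}).compute()`, which avoids assuming the dimension's name.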


## 2. BioSCape Cropping Web Application

**For reflectance data only**

Users can perform the following actions with **BioSCape or EMIT data**:

- **Submit a GeoJSON**: returns the flightlines that overlap it.
- **Retrieve Data Cropped**: returns the data cropped with a GeoJSON, in NetCDF format. Provide a flightline, a subsection number, a GeoJSON, and an output file name.

Check it out at [crop.bioscape.io](https://crop.bioscape.io).

**Note**: A BioSCape SMCE username and password are required.

This application is in beta. The user interface is currently basic and will be improved. Please report any issues via GitHub or Slack.

For more detailed information, visit the [User Guide](/pages/cropping_app).
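Both the web application and the Python library below take a GeoJSON that describes the area of interest. If you do not already have one, you can write a file with the standard library. This sketch builds a single rectangular polygon. The coordinates are placeholder values and the file name `aoi.geojson` is hypothetical. GeoJSON uses (longitude, latitude) order and requires each polygon ring to be closed. This ring is counterclockwise, the winding that RFC 7946 specifies for exterior rings.

```python
import json


def bbox_feature_collection(min_lon, min_lat, max_lon, max_lat):
    """Return a GeoJSON FeatureCollection with one rectangular Polygon.

    Positions are (longitude, latitude); the first position is repeated at the
    end so the ring is closed.
    """
    ring = [
        [min_lon, min_lat],
        [max_lon, min_lat],
        [max_lon, max_lat],
        [min_lon, max_lat],
        [min_lon, min_lat],
    ]
    return {
        "type": "FeatureCollection",
        "features": [
            {
                "type": "Feature",
                "properties": {},
                "geometry": {"type": "Polygon", "coordinates": [ring]},
            }
        ],
    }


# Placeholder area of interest (hypothetical values); replace with your own.
aoi = bbox_feature_collection(18.74, -32.99, 18.78, -32.95)
with open("aoi.geojson", "w") as f:
    json.dump(aoi, f)
```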

## 3. BioSCape Tools Python Library

**For reflectance data only**

The BioSCape Tools library allows users to perform the following actions with **BioSCape or EMIT data**:

- **Submit a GeoJSON**: returns the flightlines that overlap it.
- **Retrieve Data Cropped**: returns the data cropped with a GeoJSON, in NetCDF format. Provide a flightline, a subsection number, a GeoJSON, and an output file name.

The BioSCape Tools library can be used outside the SMCE, but a BioSCape SMCE username and password are still required.

### Installation

The library can be installed via pip:

```bash
pip install bioscape_tools
```
The library can also be installed via the Conda Store:

1. Select and edit your desired environment.

2. Choose YAML editing mode.

3. Add the following lines to the `dependencies` list:

   ```yaml
   - pip
   - pip:
       - bioscape-tools
   ```

4. Build the environment. **Note:** `bioscape-tools` will not appear in the Conda Store UI, but it will still be installed.

Please report any bugs via GitHub issues or via Slack.

```python
from bioscape_tools import Bioscape, Emit

OUTPATH = 'test.nc'
GEOJSON_FILE = "path_to_your_geojson"  # replace with the path to your GeoJSON file
```

Use your BioSCape SMCE username and password to get credentials.

```python
b = Bioscape(persist=True)
```

Find overlapping data.

```python
flightlines = b.get_overlap(GEOJSON_FILE)
flightlines
```

| | geometry | flightline | subsection |
|---|---|---|---|
| 0 | POLYGON ((18.75585 -32.97929, 18.75674 -32.944... | ang20231022t092801 | 0 |
| 1 | POLYGON ((18.78096 -33.00205, 18.78218 -32.953... | ang20231022t094938 | 35 |
| 2 | POLYGON ((18.77505 -32.96264, 18.77627 -32.913... | ang20231022t094938 | 36 |
| 3 | POLYGON ((18.71476 -32.98757, 18.71623 -32.930... | ang20231029t120919 | 45 |
| 4 | POLYGON ((18.73772 -32.9587, 18.73861 -32.9237... | ang20231029t123011 | 1 |
| 5 | POLYGON ((18.74498 -32.98879, 18.74588 -32.953... | ang20231029t123011 | 2 |


Crop and retrieve the data.

```python
bioscape_data = b.crop_flightline(flightline="ang20231022t092801", subsection=0, geojson=GEOJSON_FILE, output_path=None, mask_and_scale=True)
bioscape_data
```
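This page does not explain `mask_and_scale=True`. Assuming it behaves like the xarray decoding option of the same name, it replaces fill values with NaN and applies any `scale_factor` and `add_offset` attributes stored with a variable. The per-value arithmetic is sketched below. The function `decode_packed` and the attribute values are hypothetical and are used only for illustration.

```python
import math


def decode_packed(values, scale_factor=1.0, add_offset=0.0, fill_value=None):
    """CF-style decoding: fill values become NaN, others become v * scale_factor + add_offset."""
    decoded = []
    for v in values:
        if fill_value is not None and v == fill_value:
            decoded.append(math.nan)
        else:
            decoded.append(v * scale_factor + add_offset)
    return decoded


# Hypothetical packed values and attributes, for illustration only.
print(decode_packed([0, 2500, -9999], scale_factor=0.0001, fill_value=-9999))
```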

Optionally, you can download the data by providing an output path.

```python
b.crop_flightline(flightline="ang20231022t092801", subsection=0, geojson=GEOJSON_FILE, output_path=OUTPATH, mask_and_scale=True)
```
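`get_overlap` can return several subsections across different flightlines. The loop below sketches how you might crop and save each of them in turn. It relies only on the `crop_flightline` keyword arguments used on this page. It assumes the returned table can be indexed by its `flightline` and `subsection` columns. The `crop_all` helper and its output naming (`<flightline>_<subsection>.nc`, zero-padded like the S3 folder names) are hypothetical choices.

```python
import os


def crop_all(client, flightlines, geojson, out_dir="."):
    """Crop every (flightline, subsection) pair and save each result to NetCDF.

    `client` is expected to expose `crop_flightline` with the keyword arguments
    shown on this page; `flightlines` must support column access by name.
    Returns the list of output paths that were requested.
    """
    paths = []
    for flightline, subsection in zip(flightlines["flightline"], flightlines["subsection"]):
        # Hypothetical naming scheme: subsection zero-padded to three digits.
        path = os.path.join(out_dir, f"{flightline}_{int(subsection):03d}.nc")
        client.crop_flightline(
            flightline=flightline,
            subsection=int(subsection),
            geojson=geojson,
            output_path=path,
            mask_and_scale=True,
        )
        paths.append(path)
    return paths
```

In the notebook this would be called as `crop_all(b, flightlines, GEOJSON_FILE)`.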

The same operations can be performed on EMIT data.

```python
e = Emit(persist=True)
emit_data = e.get_overlap(GEOJSON_FILE, temporal_range=("2024-01-01", "2024-10-01"), cloud_cover=(0, 10))
emit_data[0]
```
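`temporal_range` and `cloud_cover` are passed as plain tuples. Based on the example, the first appears to be a pair of ISO dates and the second a minimum/maximum cloud-cover percentage, though that reading is an assumption. A small hypothetical helper can check such values before a request is sent:

```python
from datetime import date


def check_emit_query(temporal_range, cloud_cover):
    """Validate a (start, end) ISO date pair and a (min, max) cloud-cover percentage pair.

    Returns the dates parsed as datetime.date objects; raises ValueError otherwise.
    """
    start, end = (date.fromisoformat(d) for d in temporal_range)
    if start > end:
        raise ValueError(f"start {start} is after end {end}")
    low, high = cloud_cover
    if not 0 <= low <= high <= 100:
        raise ValueError("cloud_cover must satisfy 0 <= min <= max <= 100")
    return start, end


print(check_emit_query(("2024-01-01", "2024-10-01"), (0, 10)))
```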

```python
e.crop_scene(geojson=GEOJSON_FILE, granule_ur=emit_data[0].granule_ur, output_path=None, mask_and_scale=True)
```

To download the cropped scene, provide an output path.

```python
e.crop_scene(geojson=GEOJSON_FILE, granule_ur=emit_data[0].granule_ur, output_path=OUTPATH, mask_and_scale=True)
```

## 4. S3FS and Rioxarray

For smaller files, you can use s3fs and rioxarray to read the data directly. For larger files, reading directly from S3 this way can take too long.

```python
import rioxarray as rxr
import os
import s3fs
```

```python
# anon=False: authenticate with the AWS credentials available in your environment
s3 = s3fs.S3FileSystem(anon=False)
files = s3.ls('bioscape-data/')
files
```

['bioscape-data/AVNG',
 'bioscape-data/LVIS',
 'bioscape-data/PRISM',
 'bioscape-data/bioscape_avng.yaml',
 'bioscape-data/old']

```python
sub_files = s3.ls('bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_000')
sub_files
```

['bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_000/ang20231022t092801_000_L1B_ORT_main_46dd9a4a_LOC',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_000/ang20231022t092801_000_L1B_ORT_main_46dd9a4a_LOC.hdr',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_000/ang20231022t092801_000_L1B_ORT_main_46dd9a4a_LOC_ORT',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_000/ang20231022t092801_000_L1B_ORT_main_46dd9a4a_LOC_ORT.hdr',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_000/ang20231022t092801_000_L1B_ORT_main_46dd9a4a_OBS',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_000/ang20231022t092801_000_L1B_ORT_main_46dd9a4a_OBS.hdr',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_000/ang20231022t092801_000_L1B_ORT_main_46dd9a4a_OBS_ORT',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_000/ang20231022t092801_000_L1B_ORT_main_46dd9a4a_OBS_ORT.hdr',
 'bioscape-data/AVNG/ang20231022t092801/ang20231022t0928
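The next cell picks a file by position (`sub_files[4]`), which depends on the order in which `s3.ls` returns entries. Selecting by product suffix is less fragile. This sketch assumes the naming pattern visible in the listing, where each data file (for example `..._OBS`) sits next to a header with the same name plus `.hdr`. Both helper functions are hypothetical.

```python
def find_product(paths, suffix):
    """Return the single data file whose name ends with `suffix` (.hdr files excluded)."""
    matches = [p for p in paths if p.endswith(suffix) and not p.endswith(".hdr")]
    if len(matches) != 1:
        raise ValueError(f"expected one file ending in {suffix!r}, found {len(matches)}")
    return matches[0]


def header_for(path, paths):
    """Return the matching .hdr path if it is present in the listing, else None."""
    hdr = path + ".hdr"
    return hdr if hdr in paths else None
```

With the listing above, `find_product(sub_files, "_OBS")` selects the same file as `sub_files[4]`, and it can then be opened with `rxr.open_rasterio(f"s3://{...}")`.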

```python
%%time
print(f"File Name: {sub_files[4]}")
rxr.open_rasterio(os.path.join('s3://', sub_files[4])).compute()
```

File Name: bioscape-data/AVNG/ang20231022t092801/ang20231022t092801_000/ang20231022t092801_000_L1B_ORT_main_46dd9a4a_OBS
CPU times: user 136 ms, sys: 74.2 ms, total: 210 ms
Wall time: 2.81 s


## 5. Direct Download

Data is available for direct download from the [JPL endpoint](https://popo.jpl.nasa.gov/avng/).

Additionally, Kit Lewers has developed a script for easy data extraction from the JPL BioSCape endpoint. Her code is available in the [BioSCrapes repository](https://github.com/kllewers/BioSCrapes).
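If the JPL endpoint serves a standard HTML directory index, you can collect the file links with the standard library before downloading. That is an assumption, so check the page in a browser first. The parser below only extracts `href` values from anchor tags and skips sort links that begin with `?`. It is a hypothetical helper and does not reproduce the BioSCrapes script.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect absolute URLs from <a href="..."> tags in an HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Skip empty hrefs, column-sort links ("?C=N;O=D") and in-page anchors.
            if name == "href" and value and not value.startswith(("?", "#")):
                self.links.append(urljoin(self.base_url, value))


def list_links(html, base_url):
    """Parse `html` and return the absolute link URLs it contains."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links


def fetch_links(base_url="https://popo.jpl.nasa.gov/avng/"):
    """Download the index page at `base_url` and return its links (requires network access)."""
    with urlopen(base_url) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    return list_links(html, base_url)
```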