# Direct DAAC S3 Bucket Access (BETA)
Authors: Alex Mandel (Development Seed), Brian Freitag (NASA MSFC), Jamison French (Development Seed)

Date: September 27, 2023

Description: In this tutorial, we demonstrate how to assume the MAAP data reader role to access specific DAAC buckets.

***This tutorial demonstrates an experimental feature that allows access to DAAC data without using Earthdata Login***.

This method currently works for a select number of DAACs and their Earthdata Cloud datasets stored in AWS S3:
- [NSIDC](https://search.earthdata.nasa.gov/search?ff=Available%20in%20Earthdata%20Cloud&fdc=National%2BSnow%2Band%2BIce%2BData%2BCenter%2BDistributed%2BActive%2BArchive%2BCenter%2B%2528NSIDC%2BDAAC%2529)
- [ORNL](https://search.earthdata.nasa.gov/search?ff=Available%20in%20Earthdata%20Cloud&fdc=Oak%2BRidge%2BNational%2BLaboratory%2BDistributed%2BActive%2BArchive%2BCenter%2B%2528ORNL%2BDAAC%2529)
- [GES DISC](https://search.earthdata.nasa.gov/search?ff=Available%20in%20Earthdata%20Cloud&fdc=Goddard%2BEarth%2BSciences%2BData%2Band%2BInformation%2BServices%2BCenter%2B%2528GES%2BDISC%2529)
- [LPDAAC](https://search.earthdata.nasa.gov/search?ff=Available%20in%20Earthdata%20Cloud&fdc=Land%2BProcess%2BDistributed%2BActive%2BArchive%2BCenter%2B%2528LPDAAC%2529)
- [PODAAC](https://search.earthdata.nasa.gov/search?ff=Available%20in%20Earthdata%20Cloud&fdc=Physical%2BOceanography%2BDistributed%2BActive%2BArchive%2BCenter%2B%2528PO.DAAC%2529)

## Run This Notebook
To access and run this tutorial within MAAP's Algorithm Development Environment (ADE), please refer to the ["Getting started with the MAAP"](https://docs.maap-project.org/en/latest/getting_started/getting_started.html) section of our documentation.

Disclaimer: this tutorial **must** be run within MAAP's ADE to assume the necessary permissions. This tutorial was tested using the **vanilla** workspace image. If you encounter issues with the installs, ensure you have the latest version of pip installed.

## Additional Resources
- [Searching Granules in CMR](../search/granules.ipynb)
- [Searching Collections in CMR](../search/collections.ipynb)
- [Package: fsspec s3fs](https://s3fs.readthedocs.io/en/latest/)

## Importing Packages
If the packages below are not already installed, run the following cell.

In [None]:
%pip install h5netcdf fsspec s3fs xarray rioxarray --quiet

In [1]:
import boto3
import fsspec
import xarray
import rioxarray
import matplotlib.pyplot as plt
import rasterio
from rasterio.session import AWSSession

## Access The Data
We'll create a couple of helper functions to set up the assumed role session and view the data.

In [2]:
def assume_role_credentials(ssm_parameter_name):
    # Create a session using your current credentials
    session = boto3.Session()

    # Retrieve the SSM parameter
    ssm = session.client('ssm', region_name="us-west-2")
    parameter = ssm.get_parameter(
        Name=ssm_parameter_name, 
        WithDecryption=True
    )
    parameter_value = parameter['Parameter']['Value']

    # Assume the DAAC access role
    sts = session.client('sts')
    assumed_role_object = sts.assume_role(
        RoleArn=parameter_value,
        RoleSessionName='TutorialSession'
    )

    # From the response that contains the assumed role, get the temporary 
    # credentials that can be used to make subsequent API calls
    credentials = assumed_role_object['Credentials']

    return credentials

# We can pass assumed role credentials into fsspec
def fsspec_access(credentials):
    return fsspec.filesystem(
        "s3",
        key=credentials['AccessKeyId'],
        secret=credentials['SecretAccessKey'],
        token=credentials['SessionToken']
    )

# We can also pass assumed role credentials into rasterio AWSSession
def rasterio_access(credentials):
    aws_session = AWSSession(
        aws_access_key_id=credentials['AccessKeyId'],
        aws_secret_access_key=credentials['SecretAccessKey'],
        aws_session_token=credentials['SessionToken'] 
    )
    
    return rasterio.Env(aws_session)
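
The same temporary credentials can, in principle, also be handed to a plain `boto3` S3 client. The sketch below is illustrative and not part of the original tutorial: `split_s3_uri`, `boto3_client_kwargs`, and `head_object` are hypothetical helper names, and it assumes the assumed role is allowed to make a `HeadObject` request on the target bucket.

```python
def split_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key). Hypothetical helper."""
    prefix = "s3://"
    if not uri.startswith(prefix):
        raise ValueError(f"Not an S3 URI: {uri!r}")
    bucket, _, key = uri[len(prefix):].partition("/")
    if not bucket:
        raise ValueError(f"Missing bucket name in: {uri!r}")
    return bucket, key


def boto3_client_kwargs(credentials):
    """Map the STS 'Credentials' dictionary onto boto3 client keyword arguments."""
    return {
        "aws_access_key_id": credentials["AccessKeyId"],
        "aws_secret_access_key": credentials["SecretAccessKey"],
        "aws_session_token": credentials["SessionToken"],
    }


def head_object(credentials, uri):
    """Fetch object metadata (e.g. size) without downloading the object."""
    import boto3  # imported here so the pure helpers above work without boto3

    bucket, key = split_s3_uri(uri)
    s3 = boto3.client("s3", **boto3_client_kwargs(credentials))
    return s3.head_object(Bucket=bucket, Key=key)
```

Calling `head_object(credentials, some_s3_uri)["ContentLength"]` would then report the object size in bytes, which can be a useful check before opening a large file.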

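Initialize the assumed role sessions. Both sessions authenticate with temporary credentials obtained from the same `/iam/maap-data-reader` SSM parameter.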

In [3]:
# Assume the role once and reuse the temporary credentials for both sessions
credentials = assume_role_credentials("/iam/maap-data-reader")
s3_fsspec = fsspec_access(credentials)
s3_rasterio = rasterio_access(credentials)
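
The credentials returned by `sts.assume_role` are temporary: the `Credentials` dictionary includes an `Expiration` timestamp, and the session lasts one hour by default unless `DurationSeconds` is passed. As a minimal sketch (the helper names `seconds_until_expiry` and `needs_refresh` and the five-minute margin are assumptions, not part of the tutorial), a long-running notebook could check how much time is left before re-running the cell above:

```python
from datetime import datetime, timezone


def seconds_until_expiry(credentials, now=None):
    """Return the number of seconds before the temporary credentials expire."""
    now = now or datetime.now(timezone.utc)
    # boto3 returns 'Expiration' as a timezone-aware datetime
    return (credentials["Expiration"] - now).total_seconds()


def needs_refresh(credentials, margin_seconds=300, now=None):
    """True when fewer than `margin_seconds` remain (hypothetical safety margin)."""
    return seconds_until_expiry(credentials, now) < margin_seconds
```

If `needs_refresh` returns `True` for a credentials dictionary, calling `assume_role_credentials` again and rebuilding the `fsspec` and `rasterio` sessions should restore access.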

### NSIDC DAAC Access
We can use `xarray` to open a specific group within the HDF5 file, such as the land segments of the "gt1l" ground track.

In [4]:
nsidc_object = "s3://nsidc-cumulus-prod-protected/ATLAS/ATL08/006/2023/06/21/ATL08_20230621235543_00272011_006_01.h5"
with s3_fsspec.open(nsidc_object) as f:
    ds = xarray.open_dataset(f, group='gt1l/land_segments', engine="h5netcdf", phony_dims='sort')
ds
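
ATL08 granules organize their data by ATLAS ground track, of which there are six (`gt1l`, `gt1r`, `gt2l`, `gt2r`, `gt3l`, `gt3r`). The sketch below shows one way the single-group example could be extended to every track. The function names are hypothetical, the `fs` argument is assumed to be an `fsspec` filesystem such as `s3_fsspec`, and the code assumes every track is present in the granule; if one is missing, the open call will raise an error.

```python
def atl08_land_segment_groups():
    """Group paths for the land segments of the six ATLAS ground tracks."""
    return [
        f"gt{pair}{side}/land_segments"
        for pair in (1, 2, 3)
        for side in ("l", "r")
    ]


def open_all_tracks(fs, path):
    """Open the land_segments group of each ground track into a dictionary."""
    import xarray  # imported here so the helper above works without xarray

    datasets = {}
    for group in atl08_land_segment_groups():
        with fs.open(path) as f:
            ds = xarray.open_dataset(
                f, group=group, engine="h5netcdf", phony_dims="sort"
            )
            # Read into memory so the data remain usable after the file closes
            datasets[group] = ds.load()
    return datasets
```

Loading six groups over S3 may take some time, so it can be worth starting with a single track and expanding only when needed.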

### ORNL DAAC Access
We can also use `rioxarray` to inspect our TIF objects.

In [5]:
ornl_object = "s3://ornl-cumulus-prod-protected/gedi/GEDI_L4B_Gridded_Biomass_V2_1/data/GEDI04_B_MW019MW223_02_002_02_R01000M_SE.tif"

with s3_fsspec.open(ornl_object) as obj:
    data_array = rioxarray.open_rasterio(obj)
data_array

### PO DAAC Access

In [7]:
po_object = "s3://podaac-ops-cumulus-protected/OPERA_L3_DSWX-HLS_PROVISIONAL_V1/OPERA_L3_DSWx-HLS_T55GEM_20230813T235239Z_20230815T154108Z_S2B_30_v1.0_B01_WTR.tif"
with s3_fsspec.open(po_object) as obj:
    data_array = rioxarray.open_rasterio(obj)
data_array

### GES DISC Access

In [8]:
ges_disc_object = "s3://gesdisc-cumulus-prod-protected/Landslide/Global_Landslide_Nowcast.1.1/2020/Global_Landslide_Nowcast_v1.1_20201231.tif"

with s3_fsspec.open(ges_disc_object) as obj:
    data_array = rioxarray.open_rasterio(obj)
data_array

### LP DAAC Access
We can also use `rasterio` to directly inspect our TIF objects.

In [9]:
lp_object = "s3://lp-prod-protected/HLSL30.020/HLS.L30.T56JMN.2023225T234225.v2.0/HLS.L30.T56JMN.2023225T234225.v2.0.B11.tif"

with s3_rasterio:
    with rasterio.open(lp_object) as src:
        print(f'Width: {src.width}')
        print(f'Height: {src.height}')
        print(f'Bounds: {src.bounds}')
        print(f'CRS: {src.crs}')
        print(f'Count: {src.count}')
        print(f'Data type: {src.dtypes}')


Width: 3660
Height: 3660
Bounds: BoundingBox(left=399960.0, bottom=-3309780.0, right=509760.0, top=-3199980.0)
CRS: EPSG:32656
Count: 1
Data type: ('int16',)
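
As a quick sanity check on the output above, the pixel size can be derived from the bounds and the raster dimensions. The sketch below repeats the printed values by hand; the `BoundingBox` namedtuple is only a stand-in for the object `rasterio` returns, so the block runs without opening the file.

```python
from collections import namedtuple

# Stand-in for rasterio's BoundingBox, filled with the values printed above
BoundingBox = namedtuple("BoundingBox", ["left", "bottom", "right", "top"])
bounds = BoundingBox(left=399960.0, bottom=-3309780.0, right=509760.0, top=-3199980.0)
width, height = 3660, 3660

# Pixel size = extent in CRS units divided by the number of pixels
x_res = (bounds.right - bounds.left) / width
y_res = (bounds.top - bounds.bottom) / height
print(x_res, y_res)  # 30.0 30.0
```

Because the CRS is a UTM projection with units in meters, this corresponds to 30 m pixels. With an open dataset, `src.res` reports the same values directly.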
