# SlideRule Output to S3

```{admonition} Learning Objectives
- basics of Parquet and GeoParquet formats
- how to output SlideRule results as Parquet files on S3
- how to work with outputs on S3
```

In [None]:
from sliderule import sliderule, icesat2
import geopandas as gpd
import s3fs
import os
import boto3

```{tip}
Parquet is a cloud-optimized format for tabular data. Unlike CSV files, which are stored as plain text and written row-wise, Parquet is a columnar binary format that is well suited to hosting on S3 for data analysis.
```
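As a rough illustration of the row-wise versus columnar distinction, the sketch below stores the same three records both ways using plain Python containers. It is a toy model only: the values are made up, and it does not reproduce how Parquet actually encodes or compresses data on disk.

```python
# Toy illustration only: made-up values, not Parquet's real on-disk encoding.
rows = [
    {"rgt": 737, "cycle": 1, "h_mean": 3010.2},
    {"rgt": 737, "cycle": 2, "h_mean": 3011.0},
    {"rgt": 1156, "cycle": 1, "h_mean": 2987.5},
]

# Row-wise layout (CSV-like): each record keeps all of its fields together,
# so extracting one field still means visiting every record.
h_from_rows = [row["h_mean"] for row in rows]

# Columnar layout (Parquet-like): each field is stored as its own array,
# so a reader can fetch only the columns it needs.
columns = {name: [row[name] for row in rows] for name in rows[0]}
h_from_columns = columns["h_mean"]

mean_h = sum(h_from_columns) / len(h_from_columns)
print(f"mean h_mean: {mean_h:.2f}")
```

On object storage such as S3, the columnar layout matters because a reader can request just the byte ranges of the columns it wants instead of downloading the whole file.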

The SlideRule documentation has an extensive description of [Parquet](https://slideruleearth.io/web/rtd/user_guide/GeoParquet.html) and a [tutorial](https://slideruleearth.io/web/rtd/tutorials/user/geoparquet_output.html) with code examples.

Here we show a basic example of writing SlideRule results to S3. Because this example was put together for ICESat-2 Hackweek 2023, we use the CryoCloud JupyterHub, which has a preconfigured S3 bucket.

## Set Area of Interest

We will use a GeoJSON file from the SlideRule GitHub repository that covers Grand Mesa, Colorado.

In [None]:
gfa = gpd.read_file('https://raw.githubusercontent.com/ICESat2-SlideRule/sliderule-python/main/data/grandmesa.geojson')

In [None]:
folium_map = gfa.explore(tiles="Stamen Terrain", 
                         style_kwds=dict(fill=False, color='magenta'),
                        )
folium_map

## Configure SlideRule

In [None]:
# Connect to server
icesat2.init("slideruleearth.io")

In [None]:
# Sliderule Processing Parameters
parms = {
    "poly": sliderule.toregion(gfa)["poly"],
    "srt": icesat2.SRT_LAND,
    "cnf": icesat2.CNF_SURFACE_HIGH,
    "len": 40.0,
    "res": 20.0,
    "maxi": 6
}

### Get Temporary AWS Credentials (JupyterHub)

```{warning}
This only works on the CryoCloud JupyterHub.
```

In [None]:
# Get Temporary AWS Credentials on CryoCloud JupyterHub
client = boto3.client('sts')
# https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sts/client/assume_role_with_web_identity.html

with open(os.environ['AWS_WEB_IDENTITY_TOKEN_FILE']) as f:
    TOKEN = f.read()

response = client.assume_role_with_web_identity(
    RoleArn=os.environ['AWS_ROLE_ARN'],
    RoleSessionName=os.environ['JUPYTERHUB_CLIENT_ID'],
    WebIdentityToken=TOKEN,
    DurationSeconds=3600
)

ACCESS_KEY_ID = response['Credentials']['AccessKeyId']
SECRET_ACCESS_KEY_ID = response['Credentials']['SecretAccessKey']
SESSION_TOKEN = response['Credentials']['SessionToken']
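
The temporary credentials last only as long as `DurationSeconds` (one hour here), so it can help to check how much time is left before starting a long request. The sketch below is one possible way to do that; the helper names `sliderule_credentials` and `seconds_until_expiry` are hypothetical, and the fake response uses placeholder values shaped like the `Credentials` block that STS returns (including its `Expiration` timestamp).

```python
from datetime import datetime, timedelta, timezone


def sliderule_credentials(sts_response):
    """Map the STS 'Credentials' block onto lowercase AWS-style key names."""
    creds = sts_response["Credentials"]
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }


def seconds_until_expiry(sts_response, now=None):
    """Return the seconds remaining before the temporary credentials expire."""
    now = now or datetime.now(timezone.utc)
    return (sts_response["Credentials"]["Expiration"] - now).total_seconds()


# Fake response with placeholder values (assumption: same shape as the real one).
issued_at = datetime(2023, 8, 1, 12, 0, tzinfo=timezone.utc)
fake_response = {
    "Credentials": {
        "AccessKeyId": "EXAMPLE_KEY_ID",
        "SecretAccessKey": "EXAMPLE_SECRET",
        "SessionToken": "EXAMPLE_TOKEN",
        "Expiration": issued_at + timedelta(seconds=3600),
    }
}

credentials = sliderule_credentials(fake_response)
remaining = seconds_until_expiry(fake_response, now=issued_at)
print(sorted(credentials), remaining)
```

With the real `response` from the cell above, calling `seconds_until_expiry(response)` without `now` compares against the current UTC time.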

### Configure Parquet and S3 Output

In [None]:
S3_OUTPUT = 's3://nasa-cryo-scratch/sliderule-example/grandmesa.parquet'

parms["output"] = {
    "path": S3_OUTPUT, 
    "format": "parquet", 
    "open_on_complete": False,
    "region": "us-west-2",
    "credentials": {
         "aws_access_key_id": ACCESS_KEY_ID,
         "aws_secret_access_key": SECRET_ACCESS_KEY_ID,
         "aws_session_token": SESSION_TOKEN
     }
}
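
Because a malformed output path would only surface after the request has been submitted, a quick local check of the S3 URI can save time. The following is a minimal sketch using only the standard library; `split_s3_uri` is a hypothetical helper, not part of the SlideRule client.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key); raise ValueError if malformed."""
    parsed = urlparse(uri)
    key = parsed.path.lstrip("/")
    if parsed.scheme != "s3" or not parsed.netloc or not key:
        raise ValueError(f"Expected s3://<bucket>/<key>, got {uri!r}")
    return parsed.netloc, key


bucket, key = split_s3_uri("s3://nasa-cryo-scratch/sliderule-example/grandmesa.parquet")
print(bucket, key)
```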

### Run SlideRule processing

In [None]:
%%time

output_path = icesat2.atl06p(parms, version='006')
output_path

## Read output from S3


In [None]:
gf = gpd.read_parquet(output_path)

In [None]:
print("Start:", gf.index.min().strftime('%Y-%m-%d'))
print("End:", gf.index.max().strftime('%Y-%m-%d'))
print("Reference Ground Tracks: {}".format(gf["rgt"].unique()))
print("Cycles: {}".format(gf["cycle"].unique()))
print("Elevation Measurements: {} ".format(gf.shape[0]))
gf.head(2)

More than 100,000 points is a lot to visualize! Let's randomly sample 1000 of them and plot them on our map.

In [None]:
# Need to turn timestamps into strings first
points = gf.sample(1000).reset_index()
points['time'] = points.time.dt.strftime('%Y-%m-%d')
points.explore(column='h_mean', m=folium_map)
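
The random sample above changes on every run, and asking for more rows than the table holds raises an error. One way to make the selection repeatable and safe for small results is to pick row positions with a seeded generator, as in the sketch below. The helper `sample_positions` is hypothetical, and the row count of 150,000 is only an illustrative stand-in for `len(gf)`.

```python
import random


def sample_positions(n_rows, n_samples=1000, seed=0):
    """Pick up to n_samples distinct row positions, reproducibly for a given seed."""
    k = min(n_samples, n_rows)  # never ask for more rows than exist
    return sorted(random.Random(seed).sample(range(n_rows), k))


# Illustrative row count; in the notebook this would be len(gf).
positions = sample_positions(n_rows=150_000, n_samples=1000, seed=42)
print(len(positions))
```

In the notebook, `gf.iloc[positions]` would then select those rows. pandas' own `DataFrame.sample` also accepts a `random_state` argument for the same purpose.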

## Summary

We processed all ATL03 v006 data covering Grand Mesa, Colorado, spanning 2018-10-16 to 2023-03-07, into ATL06-SR elevations. We wrote the results in GeoParquet format to an AWS S3 bucket and quickly visualized a sample of them.