# Cloud-Optimized ICESat-2

## Cloud-Optimized HDF5

Recall from [03-cloud-optimized-data-access.ipynb](./03-cloud-optimized-data-access.ipynb) that any HDF5 file can be made cloud-optimized by restructuring it so that all of its metadata is stored in one place and its chunks are neither too big nor too small. However, as users of the data rather than its archivers, we don't control how the files are generated and distributed. If we are going to restructure the data ourselves anyway, we might as well convert it to something even better: a **"cloud-native"** format.
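
To make "neither too big nor too small" concrete, the sketch below computes the uncompressed size of one chunk from its shape and dtype and compares it against a target range. The 100 KiB and 16 MiB bounds are assumptions chosen for illustration, not an official recommendation, and compressed chunks on disk will be smaller than these figures. With h5py, a chunked dataset exposes its chunk shape and dtype as `dset.chunks` and `dset.dtype`, so they could be passed to the hypothetical `classify_chunk` helper.

```python
import math

import numpy as np

# Hypothetical target range for one uncompressed chunk; tune for your workload.
MIN_CHUNK_BYTES = 100 * 1024         # below this: too many small range requests
MAX_CHUNK_BYTES = 16 * 1024 * 1024   # above this: each read fetches too much


def chunk_nbytes(chunk_shape, dtype):
    """Uncompressed size in bytes of one chunk with the given shape and dtype."""
    return math.prod(chunk_shape) * np.dtype(dtype).itemsize


def classify_chunk(chunk_shape, dtype,
                   min_bytes=MIN_CHUNK_BYTES, max_bytes=MAX_CHUNK_BYTES):
    """Label a chunk shape as 'too small', 'too big', or 'ok'."""
    nbytes = chunk_nbytes(chunk_shape, dtype)
    if nbytes < min_bytes:
        return "too small"
    if nbytes > max_bytes:
        return "too big"
    return "ok"


print(classify_chunk((10_000,), "float32"))      # too small (40,000 bytes)
print(classify_chunk((1_000_000,), "float32"))   # ok (4,000,000 bytes)
print(classify_chunk((10_000_000,), "float64"))  # too big (80,000,000 bytes)
```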

:::{important} Cloud-Native Formats
Cloud-native formats are designed from the start for use in a cloud environment. This usually means that metadata and indexes are stored separately from the data itself, so that a dataset spread across many files can be accessed as one logical dataset. In other words, it is fast to open a large dataset and read just the parts of it that you need.
:::
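
Separating metadata from data means a reader can decide which pieces to fetch by consulting a small index, without opening the data files themselves. Parquet applies a version of this idea through per-row-group min/max statistics stored in each file's footer. The sketch below is a simplified, hypothetical illustration of that kind of pruning: the `FileStats` index, the file names, and the latitude ranges are all invented.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    """Hypothetical per-file metadata, kept separately from the data."""
    path: str
    min_lat: float
    max_lat: float


def files_to_read(index, lat_min, lat_max):
    """Return only the files whose [min_lat, max_lat] range overlaps the query.

    Only the small index is consulted; no data file is opened to decide.
    """
    return [
        f.path for f in index
        if f.max_lat >= lat_min and f.min_lat <= lat_max
    ]


# Hypothetical index over three data files
index = [
    FileStats("part-0.parquet", -90.0, -30.0),
    FileStats("part-1.parquet", -30.0, 30.0),
    FileStats("part-2.parquet", 30.0, 90.0),
]

print(files_to_read(index, 40.0, 60.0))  # ['part-2.parquet']
```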

:::{warning}
Generating cloud-native formats is non-trivial.
:::

## GeoParquet

To demonstrate one such cloud-native format, GeoParquet, we have generated a GeoParquet store for the ATL08 dataset (see `atl08_parquet_files/` for the code used to generate it) and will visualize it with [`lonboard`](https://developmentseed.org/lonboard/latest/), a high-performance library for visualizing geospatial vector data.

:::{seealso} Resources on GeoParquet
* https://guide.cloudnativegeo.org/geoparquet/
* https://geoparquet.org/
:::
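
In GeoParquet, geometries are stored in an ordinary Parquet binary column as Well-Known Binary (WKB), and a JSON document under the `"geo"` key of the file's key/value metadata records which column holds geometry and how it is encoded. The stdlib-only sketch below hand-encodes a 2-D point as little-endian WKB and builds a minimal `"geo"` metadata dict following the GeoParquet 1.0 layout. The helper names and coordinates are hypothetical; in practice a library such as shapely handles WKB, and the demo below uses shapely to decode the `geometry` column.

```python
import json
import struct

# WKB for a 2-D point: 1-byte byte-order flag (1 = little-endian),
# uint32 geometry type (1 = Point), then two float64 coordinates (x, y).
POINT_WKB = struct.Struct("<BIdd")


def point_to_wkb(lon, lat):
    """Encode a lon/lat point as little-endian WKB bytes."""
    return POINT_WKB.pack(1, 1, lon, lat)


def wkb_to_point(buf):
    """Decode little-endian 2-D Point WKB back to (lon, lat)."""
    byte_order, geom_type, x, y = POINT_WKB.unpack(buf)
    if byte_order != 1 or geom_type != 1:
        raise ValueError("only little-endian 2-D Point WKB is handled here")
    return x, y


# Minimal file-level metadata in the GeoParquet 1.0 layout; real files
# may add optional fields such as "crs" and "bbox".
geo_metadata = {
    "version": "1.0.0",
    "primary_column": "geometry",
    "columns": {
        "geometry": {"encoding": "WKB", "geometry_types": ["Point"]},
    },
}
geo_json = json.dumps(geo_metadata)  # the string stored under the "geo" key

wkb_bytes = point_to_wkb(-105.27, 40.01)
print(len(wkb_bytes))           # 21
print(wkb_to_point(wkb_bytes))  # (-105.27, 40.01)
```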

## Demo

```python
import geopandas as gpd
import pyarrow.parquet as pq
import shapely
from pyarrow import fs

# Anonymous (unsigned) access to the public bucket in us-west-2
s3 = fs.S3FileSystem(region="us-west-2", anonymous=True)

# Hive-partitioned store (year=.../month=...); the filters prune partitions
# so that only November 2021 is read
dataset = pq.ParquetDataset(
    "eodc-public/atl08_parquet/",
    filesystem=s3,
    partitioning="hive",
    filters=[("year", "==", 2021), ("month", "==", 11)],
)
table = dataset.read(columns=["h_canopy", "geometry"])
df = table.to_pandas()

# Geometries are stored as WKB bytes; decode them in one vectorized call (shapely >= 2.0)
df["geometry"] = shapely.from_wkb(df["geometry"])

# Assumed CRS: ICESat-2 coordinates are longitude/latitude
gdf = gpd.GeoDataFrame(df, geometry="geometry", crs="EPSG:4326")
gdf
```
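
With hive partitioning, partition values live in directory names such as `year=2021/month=11/`, and a list of `(column, op, value)` tuples passed as `filters` is combined with AND. For partition columns, that test can be evaluated from the path alone, so files that don't match are never opened. The sketch below imitates this in plain Python; it illustrates the idea rather than reproducing pyarrow's implementation, and the file listing is hypothetical.

```python
import operator
from pathlib import PurePosixPath

# Comparison operators handled by this sketch (a subset of what pyarrow accepts)
OPS = {
    "==": operator.eq, "=": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge,
}


def hive_partition(path):
    """Parse 'key=value' directory names into a dict, e.g. {'year': 2021}."""
    parts = {}
    for segment in PurePosixPath(path).parent.parts:
        if "=" in segment:
            key, value = segment.split("=", 1)
            parts[key] = int(value) if value.isdigit() else value
    return parts


def matches(partition, filters):
    """True if every (column, op, value) tuple holds, i.e. an AND of conditions."""
    return all(OPS[op](partition[col], value) for col, op, value in filters)


# Hypothetical file listing for a year/month hive-partitioned store
paths = [
    "atl08_parquet/year=2021/month=10/part-0.parquet",
    "atl08_parquet/year=2021/month=11/part-0.parquet",
    "atl08_parquet/year=2022/month=11/part-0.parquet",
]
filters = [("year", ">=", 2021), ("year", "<=", 2021),
           ("month", ">=", 11), ("month", "<=", 11)]

selected = [p for p in paths if matches(hive_partition(p), filters)]
print(selected)  # ['atl08_parquet/year=2021/month=11/part-0.parquet']
```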

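```python
%%time
from lonboard import Map, ScatterplotLayer
from lonboard.colormap import apply_continuous_cmap
from palettable.colorbrewer.diverging import BrBG_10

# Build a point layer from the GeoDataFrame created in the previous cell
layer = ScatterplotLayer.from_geopandas(gdf)

# apply_continuous_cmap expects values in [0, 1], so min-max scale h_canopy first
h_canopy = gdf["h_canopy"].to_numpy()
normalized = (h_canopy - h_canopy.min()) / (h_canopy.max() - h_canopy.min())
layer.get_fill_color = apply_continuous_cmap(normalized, BrBG_10, alpha=0.7)

m = Map(layer)
m
```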
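
`apply_continuous_cmap` expects values between 0 and 1, so canopy heights need rescaling before they are colored. A plain min-max rescale can be dominated by a single extreme value; a more robust option, sketched below, is to drop missing values and clip at percentiles before scaling. It assumes that ATL08 fill values (around 3.4e38, the float32 maximum) could still be present in the store; the 1e30 threshold, the 2nd/98th percentile defaults, and both helper names are illustrative choices.

```python
import numpy as np

# Assumption: canopy heights may still carry a large float fill value
# (~3.4e38); anything at or above this threshold is treated as missing.
FILL_THRESHOLD = 1e30


def valid_mask(values):
    """Boolean mask of finite, non-fill values."""
    values = np.asarray(values, dtype="float64")
    return np.isfinite(values) & (values < FILL_THRESHOLD)


def normalize_for_cmap(values, low_pct=2.0, high_pct=98.0):
    """Rescale values to [0, 1], clipping outliers at the given percentiles."""
    values = np.asarray(values, dtype="float64")
    lo, hi = np.percentile(values, [low_pct, high_pct])
    if hi == lo:
        return np.zeros_like(values)  # constant input: avoid dividing by zero
    return (np.clip(values, lo, hi) - lo) / (hi - lo)


heights = np.array([0.0, 5.0, 10.0, 3.4028235e38, np.nan])
mask = valid_mask(heights)
print(mask.tolist())                                       # [True, True, True, False, False]
print(normalize_for_cmap(heights[mask], 0, 100).tolist())  # [0.0, 0.5, 1.0]
```

To use it in the map cell, one would filter first with `gdf = gdf[valid_mask(gdf["h_canopy"])]`, build the layer from the filtered frame, and pass `normalize_for_cmap(gdf["h_canopy"])` to `apply_continuous_cmap`.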