# Cloud-Optimized Geospatial Formats Overview

Google Slides version of this content: [Cloud-Optimized Geospatial Formats](https://docs.google.com/presentation/d/1F89kcrtX9LNQPTOuwyL5FRex_8--Vlg-DA8GJNzWqGk/edit?usp=sharing).

This notebook was generated using [RISE](https://rise.readthedocs.io/en/stable/). You can view the live slideshow version at TODO. To view the slides from a local copy of this repository, either run a Jupyter notebook server and click the histogram icon to start the RISE slideshow, or follow the instructions on the [PDF Export page](https://rise.readthedocs.io/en/stable/exportpdf.html) of the RISE docs.

# What makes cloud-optimized challenging?
* There is no one-size-fits-all approach.
* Earth observation data may be processed into raster, vector and point cloud data types and stored in a long list of data formats and structures.
* Optimization depends on the user.
* Users must learn new tools, and which data is accessed, and how, may differ from one user to the next.
* ... hopefully only a few new methods and concepts are necessary.

![points-lines-polygons](./images/2019-points-lines-polygons.png)

image credit: https://ui.josiahparry.com/spatial-analysis.html#types-of-spatial-data

# What makes cloud-optimized challenging?

> There is no one-size-fits-all packaging for data, as the optimal packaging is highly use-case dependent.


[Task 51 - Cloud-Optimized Format Study](https://ntrs.nasa.gov/citations/20200001178)
Authors: Chris Durbin, Patrick Quinn, Dana Shum

File formats are read-oriented to support:

* Partial reads
* Parallel reads
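
The two access patterns above can be sketched without any network access. This is a minimal illustration, not a real storage client: `BLOB` is an in-memory stand-in for a remote object, and `read_range`, `read_parallel`, and `CHUNK_SIZE` are hypothetical names chosen for this example.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for a remote object: 1 MiB of deterministic bytes held in memory.
BLOB = bytes(range(256)) * 4096
CHUNK_SIZE = 256 * 1024  # hypothetical chunk size, chosen for illustration


def read_range(start: int, length: int) -> bytes:
    """Partial read: return only `length` bytes beginning at `start`."""
    return BLOB[start:start + length]


def read_parallel(chunk_size: int = CHUNK_SIZE, workers: int = 4) -> bytes:
    """Parallel read: fetch every chunk independently, then reassemble in order."""
    offsets = range(0, len(BLOB), chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so the parts join back correctly.
        parts = pool.map(lambda off: read_range(off, chunk_size), offsets)
        return b"".join(parts)


subset = read_range(1024, 16)  # a small slice, without touching the rest
whole = read_parallel()        # the full object, fetched as independent chunks
```

Because each chunk is addressed only by its offset and length, the same chunks could in principle be fetched by separate workers or machines.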

# What does cloud-optimized mean?

* File metadata in one read
* Accessing data over the internet, as with data in cloud storage, has high latency compared with local storage, so it is preferable to fetch more data in fewer reads.
* An easy win is metadata that can be fetched in one read; that metadata then tells a reader where to find the data in a cloud-native dataset.
* A cloud-native dataset is one with small addressable chunks via files, internal tiles, or both.
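
To make "metadata in one read" concrete, here is a toy sketch. The file layout (a 4-byte length, a JSON index, then the chunks) is invented for this example and is not the layout of any real format; `CountingStore`, `build_file`, `read_chunk`, and `header_guess` are hypothetical names. The sketch also assumes the whole index fits inside the first `header_guess` bytes.

```python
import json
import struct


def build_file(chunks: list[bytes]) -> bytes:
    """Pack chunks behind a header: 4-byte index length, JSON index, then data."""
    index = json.dumps({"lengths": [len(c) for c in chunks]}).encode()
    return struct.pack("<I", len(index)) + index + b"".join(chunks)


class CountingStore:
    """Stand-in for object storage that counts how many reads were issued."""

    def __init__(self, data: bytes):
        self.data = data
        self.reads = 0

    def read(self, start: int, length: int) -> bytes:
        self.reads += 1
        return self.data[start:start + length]


def read_chunk(store: CountingStore, n: int, header_guess: int = 1024) -> bytes:
    """Fetch chunk `n` with one metadata read plus one data read."""
    head = store.read(0, header_guess)  # read 1: all metadata at once
    (index_len,) = struct.unpack("<I", head[:4])
    lengths = json.loads(head[4:4 + index_len])["lengths"]
    offset = 4 + index_len + sum(lengths[:n])
    return store.read(offset, lengths[n])  # read 2: only the chunk we need


store = CountingStore(build_file([b"a" * 100, b"b" * 200, b"c" * 300]))
chunk = read_chunk(store, 1)
```

Here the second chunk is retrieved in two reads regardless of how many chunks the file holds, which is the property that matters when every read pays a network round trip.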

# What does “cloud-optimized” mean?

<div style="width: 65%; float: left;">
    <ul>
        <li>Accessible over HTTP using range requests</li>
        <li>Range requests make the format compatible with object storage (a file storage alternative to local disk), so many compute instances can read it over HTTP.</li>
        <li>Supports lazy access and intelligent subsetting.</li>
        <li>Integrates with high-level analysis libraries and distributed frameworks.</li>
    </ul>
</div>
<img alt="higher level libraries" src="./images/higher-level-libraries.png" width="35%"/>
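
As a rough illustration of the first point, an HTTP range request is an ordinary GET with a `Range` header. The sketch below only builds the request and does not send it; the URL is a placeholder and `range_request` is a hypothetical helper, not part of any library.

```python
from urllib.request import Request


def range_request(url: str, start: int, length: int) -> Request:
    """Build (but do not send) a GET request for `length` bytes from `start`."""
    end = start + length - 1  # HTTP byte ranges are inclusive on both ends
    return Request(url, headers={"Range": f"bytes={start}-{end}"})


# Placeholder URL for illustration only; it does not point at a real dataset.
req = range_request("https://example.com/data/scene.tif", start=0, length=16384)
print(req.get_header("Range"))  # prints "bytes=0-16383"
```

A server that supports range requests answers with status `206 Partial Content` and only the requested bytes, which is what lets a client read part of a large file in object storage.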

# Formats by Data Type

| Format  | Data Type  | Standard Status  |
|:--------|:-----------|:-----------------|
| Cloud-Optimized GeoTIFF (COG)                                 | Raster                   | OGC standard for comment               |
| Zarr, Kerchunk                                                | Multi-dimensional raster | ESDIS and OGC standards in development |
| Cloud-Optimized Point Cloud (COPC), Entwine Point Tiles (EPT) | Point Clouds*            | no known ESDIS or OGC standard         |
| FlatGeobuf, GeoParquet                                        | Vector                   | no known ESDIS, draft OGC standard     |

# Formats by Adoption

| Format  | Adoption | Standard Status   |
|:--------|:---------|:------------------|
| Cloud-Optimized GeoTIFF (COG)                                 | Widely adopted                                            | OGC standard for comment               |
| Zarr, Kerchunk                                                | (Less) widely adopted, especially in specific communities | ESDIS and OGC standards in development |
| Entwine Point Tiles (EPT), Cloud-Optimized Point Cloud (COPC) | Less common (PDAL Supported)                              | no known ESDIS or OGC standard         |
| GeoParquet, FlatGeobuf                                        | Less common (OGR Supported)                               | no known ESDIS, draft OGC standard     |