# Introduction

A *very brief* introduction to the Planetary Computer.

We host lots of geospatial data. Anyone can use it (ideally from Azure!).

<img src="data-catalog.png" width="50%"/>

We provide APIs for searching and working with that data.

In [None]:
from IPython.display import IFrame

IFrame("https://planetarycomputer-staging.microsoft.com/api/stac/v1/docs", width=800, height=400)

Azure has many ways of doing compute. We host a Dask-enabled JupyterHub. Right now we're on another JupyterHub deployment.

That's the Planetary Computer: Data + APIs + Compute, all on Azure.

## Cloud-native

1. We host public datasets.
2. *Everyone* has *direct* access to *all* of the data.
3. Compute is next to the data.
4. Cloud-native file formats
5. Scale

### Compute → Data

Putting the compute next to the data can be crucial for performance. Let's consider the simple task of reading the metadata from a COG file with `gdalinfo`.

Running this command from my laptop in Des Moines, IA, we spend a *lot* of time waiting:

```console
$ time gdalinfo /vsicurl/https://naipeuwest.blob.core.windows.net/naip/v002/ia/2019/ia_60cm_2019/42091/m_4209150_sw_15_060_20190828.tif > /dev/null
real    0m7.158s
user    0m0.195s
sys     0m0.032s
```

Running that from this Jupyter kernel, which is in the same Azure data center as the dataset, things look different.

In [None]:
!time gdalinfo /vsicurl/https://naipeuwest.blob.core.windows.net/naip/v002/ia/2019/ia_60cm_2019/42091/m_4209150_sw_15_060_20190828.tif > /dev/null

So about 7s -> 0.2s!
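One way to see why latency dominates: `/vsicurl/` reads remote files with HTTP range requests, and reading a COG's metadata takes only a few small ones. Wall-clock time is therefore mostly network round trips, not bandwidth or CPU (note the 0.195s of `user` time in the 7s laptop run). The sketch below times a single range request using only the standard library. The helper names and the 16 KiB read size are assumptions for illustration, not a description of what GDAL does internally.

```python
import time
import urllib.request

COG_URL = (
    "https://naipeuwest.blob.core.windows.net/naip/v002/ia/2019/ia_60cm_2019/"
    "42091/m_4209150_sw_15_060_20190828.tif"
)


def build_range_request(url, start=0, length=16384):
    """Build a GET request for `length` bytes starting at byte `start`."""
    end = start + length - 1  # HTTP byte ranges are inclusive on both ends
    return urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})


def time_range_read(url, start=0, length=16384):
    """Return (seconds elapsed, bytes received) for one range request."""
    request = build_range_request(url, start, length)
    t0 = time.perf_counter()
    with urllib.request.urlopen(request) as response:
        n_bytes = len(response.read())
    return time.perf_counter() - t0, n_bytes
```

Assuming the blob is still publicly readable, calling `time_range_read(COG_URL)` from a laptop and then from the Hub should show the same kind of gap as the `gdalinfo` timings.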

## STAC

Having access to the data is great, but it's not enough. How do you find all the Sentinel-2 images over Wyoming for July 2021? Consider what we'd do if we just had files in blob storage:

In [None]:
import adlfs
import planetary_computer

token = planetary_computer.sas.get_token("sentinel2l2a01", "sentinel2-l2").token

fs = adlfs.AzureBlobFileSystem("sentinel2l2a01", credential=token)
fs.ls("sentinel2-l2/01/C/DH/2021/")  # ...?

Some of those kinda look like dates. I don't know what the "C" and "DH" mean.

But STAC makes this kind of spatio-temporal filtering straightforward.

In [None]:
import pystac_client

catalog = pystac_client.Client.open("https://planetarycomputer.microsoft.com/api/stac/v1")

In [None]:
wyoming_bbox = [-111.0717, 41.0296, -103.9965, 45.02695]
search = catalog.search(collections=["sentinel-2-l2a"], bbox=wyoming_bbox, datetime="2021-07-01/2021-07-31")
%time items = search.get_all_items()

In [None]:
len(items)
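Under the hood, a search like this is an HTTP request to the STAC API's `/search` endpoint. As a rough sketch, the JSON body of that request looks something like the following. The helper name `build_search_body` and the `limit` value are assumptions for illustration; `pystac_client` builds and paginates the real request for you.

```python
import json


def build_search_body(collections, bbox, start, end, limit=100):
    """Build a JSON-serializable body for a STAC API `POST /search` request."""
    return {
        "collections": list(collections),
        "bbox": list(bbox),  # [west, south, east, north] in WGS84 degrees
        "datetime": f"{start}/{end}",  # closed interval of RFC 3339 timestamps
        "limit": limit,  # page size; further pages come from "next" links
    }


body = build_search_body(
    collections=["sentinel-2-l2a"],
    bbox=[-111.0717, 41.0296, -103.9965, 45.02695],
    start="2021-07-01T00:00:00Z",
    end="2021-07-31T23:59:59Z",
)
print(json.dumps(body, indent=2))
```

The point is that the "where" and "when" are plain fields in the request, so the same few lines work for any collection the API serves.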

Now run the same search for `landsat-8-c2-l2`.

In [None]:
# we'll improve this later
wyoming_bbox = [-111.0717, 41.0296, -103.9965, 45.02695]
search = catalog.search(collections=["landsat-8-c2-l2"], bbox=wyoming_bbox, datetime="2021-07-01/2021-07-31")
%time landsat_items = search.get_all_items()

In [None]:
import geopandas

df = geopandas.GeoDataFrame.from_features(items.to_dict()).set_crs(4326)

df[["geometry", "s2:mgrs_tile", "datetime"]].explore(column="s2:mgrs_tile", style_kwds={"fillOpacity": 0.1})

## Data APIs

In [None]:
item = items[0]
item.assets

In [None]:
import ipyleaflet
import requests
import shapely.geometry  # import the submodule explicitly; plot() uses shapely.geometry.shape

In [None]:
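def plot(item, map_kwargs=None):
    # Avoid a mutable default argument; treat None as "no extra options".
    map_kwargs = map_kwargs or {}
    # The "tilejson" asset points at a TileJSON document; the unpacking assumes one tile URL.
    (tiles_url,) = requests.get(item.assets["tilejson"].href).json()["tiles"]
    # A point's bounds are (x, y, x, y); [1::-1] yields (lat, lon), the order ipyleaflet expects.
    center = shapely.geometry.shape(item.geometry).centroid.bounds[1::-1]

    m = ipyleaflet.Map(center=center, controls=[ipyleaflet.FullScreenControl()], **map_kwargs)
    m.add_layer(ipyleaflet.TileLayer(url=tiles_url))
    m.scroll_wheel_zoom = True
    return m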

In [None]:
plot(items[1])
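The `tiles` entry in a TileJSON document is a URL template with `{z}`, `{x}`, and `{y}` placeholders, which the map widget fills in for every tile in view. A minimal sketch of that substitution, using the standard Web Mercator tile formula, is below. The template URL and the helper names are hypothetical; the real template comes from the item's `tilejson` asset.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) indices of the Web Mercator tile containing a point."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y


def fill_template(template, zoom, x, y):
    """Substitute tile coordinates into a {z}/{x}/{y} URL template."""
    return (
        template.replace("{z}", str(zoom))
        .replace("{x}", str(x))
        .replace("{y}", str(y))
    )


# Hypothetical template, for illustration only.
template = "https://example.com/tiles/{z}/{x}/{y}.png"
x, y = lonlat_to_tile(-107.5, 43.0, zoom=9)  # a point in central Wyoming
print(fill_template(template, 9, x, y))
```

Each tile request is small and independent, which is what lets the map stay responsive while panning and zooming.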

In [None]:
wyoming_bbox = [-111.0717, 41.0296, -103.9965, 45.02695]
search = catalog.search(
    collections=["sentinel-2-l2a"], bbox=wyoming_bbox, datetime="2021-07-01/2021-07-31",
    query={"eo:cloud_cover": {"lt": 10}}
)
%time items = search.get_all_items()
len(items)
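The `query` argument asks the API to filter on item properties before anything is returned. For comparison, a client-side version of the same cut could look like the sketch below. The helper name and the sample features are made up for illustration.

```python
def filter_by_cloud_cover(features, max_cloud_cover):
    """Keep GeoJSON-style features whose `eo:cloud_cover` is below a threshold."""
    return [
        feature
        for feature in features
        # Features missing the property are dropped rather than guessed at.
        if feature["properties"].get("eo:cloud_cover", float("inf")) < max_cloud_cover
    ]


# Made-up features, standing in for `items.to_dict()["features"]`.
features = [
    {"id": "a", "properties": {"eo:cloud_cover": 3.2}},
    {"id": "b", "properties": {"eo:cloud_cover": 47.0}},
    {"id": "c", "properties": {}},
]
print([f["id"] for f in filter_by_cloud_cover(features, 10)])
```

This gives the same kind of result, but only after downloading every item for the month, which is why pushing the filter to the server is usually the better choice.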

In [None]:
plot(items[1], map_kwargs=dict(zoom=9))