# Data Access on the Planetary Computer

All the data on the Planetary Computer is stored in Azure Blob Storage. You *could* use APIs like `azure.storage.blob` to list the blobs in each container and find the ones you want, but I wouldn't recommend it. Instead, we start with the Planetary Computer's STAC API.

- Quickstart (Python): https://planetarycomputer.microsoft.com/docs/quickstarts/reading-stac/
- Quickstart (R): https://planetarycomputer.microsoft.com/docs/quickstarts/reading-stac-r/
- Reference: https://planetarycomputer.microsoft.com/docs/reference/stac/

## Item search with STAC

The Planetary Computer uses STAC, the SpatioTemporal Asset Catalog, to catalog all of the data available to you.



In [None]:
import pystac_client

catalog = pystac_client.Client.open("https://planetarycomputer.microsoft.com/api/stac/v1")
catalog

We'll be working with items roughly matching [this search](https://planetarycomputer.microsoft.com/explore?c=123.9678%2C-16.9770&z=8.71&d=sentinel-2-l2a&m=Most+recent+%28low+cloud%29&r=Natural+color) from the Explorer.

In [None]:
%%time
aoi = {
  "type": "Polygon",
  "coordinates": [
    [
      [122.97285864599655, -17.716650135965253],
      [124.96269843337086, -17.716650135965253],
      [124.96269843337086, -16.234522088978864],
      [122.97285864599655, -16.234522088978864],
      [122.97285864599655, -17.716650135965253]
    ]
  ]
}

items = catalog.search(
    collections=["sentinel-2-l2a"],
    datetime="2022-01-01/2022-01-24",
    intersects=aoi,
).get_all_items()
len(items)
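
The `aoi` polygon above is an axis-aligned rectangle, so the same area could also be described by a bounding box and passed to `catalog.search` through its `bbox` parameter instead of `intersects`. Below is a minimal sketch of deriving that box with only the standard library; the helper name `polygon_bbox` is hypothetical and not part of any library.

```python
def polygon_bbox(polygon):
    """Return [min_lon, min_lat, max_lon, max_lat] for a GeoJSON Polygon dict."""
    # The first ring of a GeoJSON Polygon is its exterior boundary.
    exterior = polygon["coordinates"][0]
    lons = [point[0] for point in exterior]
    lats = [point[1] for point in exterior]
    return [min(lons), min(lats), max(lons), max(lats)]


# Same polygon as the search above, repeated so this cell is self-contained.
aoi = {
    "type": "Polygon",
    "coordinates": [
        [
            [122.97285864599655, -17.716650135965253],
            [124.96269843337086, -17.716650135965253],
            [124.96269843337086, -16.234522088978864],
            [122.97285864599655, -16.234522088978864],
            [122.97285864599655, -17.716650135965253],
        ]
    ],
}

bbox = polygon_bbox(aoi)
print(bbox)
```

For a non-rectangular polygon the bounding box covers more area than the polygon itself, so a `bbox` search may return extra items.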

STAC items are GeoJSON Features, so the metadata can itself be treated as data. For example, we can load the items into geopandas to plot the footprints.

In [None]:
import pandas as pd
import geopandas

df = (
    geopandas.GeoDataFrame.from_features(
        items.to_dict()["features"]
    ).set_crs(4326)
)
df.head()

In [None]:
m = df[["geometry", "datetime", "s2:mgrs_tile"]].explore(column="s2:mgrs_tile", style_kwds=dict(fillOpacity=0.1))
m

In [None]:
df.assign(datetime=pd.to_datetime(df.datetime)).set_index("datetime")["eo:cloud_cover"].plot(style="k.");
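
Because the item metadata is plain GeoJSON, it can also be filtered without any geospatial library. The sketch below picks the least cloudy item for each tile, assuming every feature carries the `s2:mgrs_tile` and `eo:cloud_cover` properties used above. The helper name `least_cloudy_per_tile` is hypothetical, and the sample features are made-up illustrative values, not real search results.

```python
def least_cloudy_per_tile(features):
    """Map each tile id to the feature with the lowest eo:cloud_cover."""
    best = {}
    for feature in features:
        props = feature["properties"]
        tile = props["s2:mgrs_tile"]
        current = best.get(tile)
        if current is None or props["eo:cloud_cover"] < current["properties"]["eo:cloud_cover"]:
            best[tile] = feature
    return best


# Made-up stand-ins for the dicts in items.to_dict()["features"].
sample_features = [
    {"id": "item-a", "properties": {"s2:mgrs_tile": "TILE1", "eo:cloud_cover": 42.0}},
    {"id": "item-b", "properties": {"s2:mgrs_tile": "TILE1", "eo:cloud_cover": 3.5}},
    {"id": "item-c", "properties": {"s2:mgrs_tile": "TILE2", "eo:cloud_cover": 17.0}},
]

best = least_cloudy_per_tile(sample_features)
for tile, feature in best.items():
    print(tile, feature["id"], feature["properties"]["eo:cloud_cover"])
```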

Thus far, we've just worked with the metadata from the STAC API. To actually load the *data* from Azure Blob Storage, we'll first sign the assets.

In [None]:
import planetary_computer

signed_items = planetary_computer.sign(items)
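
`planetary_computer.sign` returns copies whose hrefs carry a short-lived Azure SAS token in the query string. Here is a small sketch for checking when a signed href expires, assuming the token includes the standard SAS `se` (signed expiry) parameter formatted as `YYYY-MM-DDTHH:MM:SSZ`. The helper name `sas_expiry` and the example URLs are hypothetical placeholders, not real assets or tokens.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlparse


def sas_expiry(href):
    """Return the SAS expiry of a signed URL as a UTC datetime, or None if unsigned."""
    query = parse_qs(urlparse(href).query)
    if "se" not in query:
        return None
    # Assumption: the expiry is written as an ISO 8601 UTC timestamp with seconds.
    expiry = datetime.strptime(query["se"][0], "%Y-%m-%dT%H:%M:%SZ")
    return expiry.replace(tzinfo=timezone.utc)


# Placeholder URLs for illustration only.
unsigned = "https://example.blob.core.windows.net/container/B04.tif"
signed = unsigned + "?se=2022-01-25T00%3A00%3A00Z&sp=rl&sig=PLACEHOLDER"

print(sas_expiry(unsigned))
print(sas_expiry(signed))
```

If a long-running job starts failing with authorization errors, an expired token is one likely cause; signing the items again yields fresh hrefs.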

In [None]:
import rasterio.plot
import matplotlib.pyplot as plt

In [None]:
fig, ax = plt.subplots(figsize=(10, 10))

ds = rasterio.open(signed_items[11].assets["preview"].href)
rasterio.plot.show(ds, ax=ax)
ax.set_axis_off()

If you're working with a single asset, then you can use `rioxarray` to load in the geospatially referenced data.

In [None]:
%%time
import rioxarray

ds = rioxarray.open_rasterio(signed_items[0].assets["B04"].href)
ds

If you're working with multiple assets or items, then you can use `stackstac` to build a data cube.

In [None]:
import stackstac

ds = stackstac.stack(signed_items, assets=["B03", "B04", "B05"])
ds

## Accessing tabular data

For tabular data, we'll again use STAC.

In [None]:
census = catalog.get_collection("us-census")
# Index the collection's items by id so they can be looked up by name below.
items = {
    item.id: item for item in census.get_all_items()
}
items

In [None]:
import pandas as pd
import dask.dataframe as dd
import dask_geopandas

In [None]:
states = items["2020-cb_2020_us_state_500k"]
asset = planetary_computer.sign(states.assets["data"])
states_df = (
    geopandas.read_parquet(asset.href, storage_options=asset.extra_fields["table:storage_options"])
    .assign(geometry=lambda df: df.simplify(tolerance=0.01))
)
states_df.explore()

For large tables, you can use `dask.dataframe` or `dask-geopandas`.

In [None]:
geo_asset = planetary_computer.sign(items["2020-census-blocks-geo"]).assets["data"]

geo = dask_geopandas.read_parquet(
    geo_asset.href,
    storage_options=geo_asset.extra_fields["table:storage_options"],
)
geo

In [None]:
%time len(geo)

In [None]:
pop_asset = planetary_computer.sign(items["2020-census-blocks-population"]).assets["data"]

# The population table has no geometry column, so read it with plain dask.dataframe.
pop = dd.read_parquet(
    pop_asset.href,
    storage_options=pop_asset.extra_fields["table:storage_options"],
)
pop

In [None]:
df = dd.merge(geo, pop)
df

## Earth systems science data

Earth systems science data is typically stored as Zarr or NetCDF and loaded into xarray.

In [None]:
import pystac

collection = catalog.get_collection("terraclimate")
collection

In [None]:
asset = planetary_computer.sign(collection.assets["zarr-abfs"])
asset

In [None]:
import fsspec
import xarray as xr

store = fsspec.get_mapper(asset.href, **asset.extra_fields["xarray:storage_options"])
ds = xr.open_zarr(store, consolidated=True)
ds

In [None]:
import cartopy.crs as ccrs
import matplotlib.pyplot as plt


average_max_temp = ds.isel(time=-1)["tmax"].coarsen(lat=8, lon=8).mean().load()

fig, ax = plt.subplots(figsize=(20, 10), subplot_kw=dict(projection=ccrs.Robinson()))

average_max_temp.plot(ax=ax, transform=ccrs.PlateCarree())
ax.coastlines();
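
The `coarsen(lat=8, lon=8).mean()` call above lowers the resolution by averaging non-overlapping 8×8 blocks of grid cells before plotting. The pure-Python sketch below shows the same idea on a toy 4×4 grid with 2×2 blocks. The helper name `coarsen_mean` is hypothetical, the grid values are made up, and unlike xarray the sketch does not handle missing values.

```python
def coarsen_mean(grid, factor):
    """Average non-overlapping factor x factor blocks of a 2D list of numbers."""
    n_rows, n_cols = len(grid), len(grid[0])
    if n_rows % factor or n_cols % factor:
        raise ValueError("grid dimensions must be exact multiples of factor")
    coarse = []
    for r in range(0, n_rows, factor):
        row = []
        for c in range(0, n_cols, factor):
            # Collect every cell in the block whose top-left corner is (r, c).
            block = [
                grid[i][j]
                for i in range(r, r + factor)
                for j in range(c, c + factor)
            ]
            row.append(sum(block) / len(block))
        coarse.append(row)
    return coarse


# Made-up values standing in for one time slice of a gridded variable.
grid = [
    [1.0, 3.0, 5.0, 7.0],
    [1.0, 3.0, 5.0, 7.0],
    [2.0, 2.0, 4.0, 4.0],
    [2.0, 2.0, 4.0, 4.0],
]

coarse = coarsen_mean(grid, 2)
print(coarse)
```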