# Data Access on the Planetary Computer

In this notebook, we'll take a whirlwind tour of accessing geospatial data in many flavors.

In [None]:
import pystac_client
import planetary_computer
import stackstac
import numpy as np
import geopandas
import warnings
import fsspec
import xarray as xr
import cartopy.crs as ccrs
import matplotlib.pyplot as plt
import urllib.request


warnings.filterwarnings("ignore", message="pandas.Float64Index")

catalog = pystac_client.Client.open("https://planetarycomputer.microsoft.com/api/stac/v1")

In [None]:
from distributed import Client

client = Client()
client

## Raster data

Raster data is typically stored as Cloud Optimized GeoTIFFs (COGs). Some examples include:

* Satellite imagery / aerial photography
    - [Landsat C2-L2](https://planetarycomputer.microsoft.com/dataset/landsat-8-c2-l2)
    - [Sentinel 2 L2A](https://planetarycomputer.microsoft.com/dataset/sentinel-2-l2a)
    - [NAIP](https://planetarycomputer.microsoft.com/dataset/naip)
* Land use / land cover
    - [Esri / IO 10-Meter Land Cover](https://planetarycomputer.microsoft.com/dataset/io-lulc-9-class)
    - [Land Cover of Canada](https://planetarycomputer.microsoft.com/dataset/nrcan-landcover)
* Elevation
    - [COP DEM](https://planetarycomputer.microsoft.com/dataset/cop-dem-glo-30)
    - [NASADEM](https://planetarycomputer.microsoft.com/dataset/nasadem)
* "Derived variables"
    - [Chloris Biomass](https://planetarycomputer.microsoft.com/dataset/chloris-biomass)
    - [HGB](https://planetarycomputer.microsoft.com/dataset/hgb)
    - [HREA](https://planetarycomputer.microsoft.com/dataset/hrea)



In [None]:
search = catalog.search(
    bbox=[-122.28, 47.55, -121.96, 47.75],
    datetime="2020-01-01/2020-12-31",
    collections=["sentinel-2-l2a"],
    query={"eo:cloud_cover": {"lt": 25}},
)

items = search.get_all_items()
print(len(items))
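
The `bbox` passed to the search follows the STAC API ordering of `[west, south, east, north]` in longitude/latitude degrees. Below is a minimal sketch of a sanity check for that ordering. `validate_bbox` is a hypothetical helper written for illustration, not part of `pystac_client`.

```python
def validate_bbox(bbox):
    """Check that bbox is [west, south, east, north] in degrees (hypothetical helper)."""
    if len(bbox) != 4:
        raise ValueError("expected four values: west, south, east, north")
    west, south, east, north = bbox
    # west may exceed east when a box crosses the antimeridian, so only ranges are checked.
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError("longitudes must be within [-180, 180]")
    if not (-90 <= south <= north <= 90):
        raise ValueError("latitudes must satisfy -90 <= south <= north <= 90")
    return {"west": west, "south": south, "east": east, "north": north}


# The Seattle-area box used in the search above.
seattle = validate_bbox([-122.28, 47.55, -121.96, 47.75])
width_deg = seattle["east"] - seattle["west"]
height_deg = seattle["north"] - seattle["south"]
```

Swapping the latitude and longitude pairs is a common mistake; a check like this fails fast instead of returning an empty search.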

In [None]:
signed_items = planetary_computer.sign(items)

In [None]:
data = (
    stackstac.stack(
        signed_items,
        assets=["B04", "B08"],  # red, nir
        resolution=100,
    )
    .where(lambda x: x > 0, other=np.nan)  # sentinel-2 uses 0 as nodata
)
data

In [None]:
red = data.sel(band="B04")
nir = data.sel(band="B08")

ndvi = (nir - red) / (nir + red)  # NDVI = (NIR - red) / (NIR + red)
x = ndvi.isel(time=0).persist()
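
As a quick sanity check on the index: NDVI is conventionally defined as (NIR - red) / (NIR + red), so it falls in [-1, 1], with dense vegetation (high NIR, low red reflectance) toward the positive end. Below is a minimal sketch on single pixel values. The reflectances are made up for illustration, and `ndvi_value` is a hypothetical helper.

```python
import math


def ndvi_value(red, nir):
    """NDVI for one pixel; NaN when either band is nodata or the sum is zero."""
    if math.isnan(red) or math.isnan(nir) or (red + nir) == 0:
        return math.nan
    return (nir - red) / (nir + red)


# Illustrative reflectance values (not taken from a real scene).
vegetation = ndvi_value(red=0.05, nir=0.45)  # strongly positive
bare_soil = ndvi_value(red=0.25, nir=0.30)   # near zero
water = ndvi_value(red=0.06, nir=0.02)       # negative
nodata = ndvi_value(red=math.nan, nir=0.30)  # stays NaN, like the masked pixels above
```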

In [None]:
m = stackstac.show(x, range=(-0.9, 0.9))
m.scroll_wheel_zoom = True
m

Derived products like the Chloris Biomass dataset also fall in this bucket.

## Earth systems data

* Climate model output
* Reanalysis

Typically stored as Zarr or NetCDF.

In [None]:
terraclimate = catalog.get_collection("terraclimate")
asset = terraclimate.assets["zarr-https"]


store = fsspec.get_mapper(asset.href)
ds = xr.open_zarr(store, **asset.extra_fields["xarray:open_kwargs"])
ds

In [None]:
# Most recent monthly maximum temperature, averaged over 8x8 blocks of grid cells
average_max_temp = ds.isel(time=-1)["tmax"].coarsen(lat=8, lon=8).mean().load()

fig, ax = plt.subplots(figsize=(20, 10), subplot_kw=dict(projection=ccrs.Robinson()))

average_max_temp.plot(ax=ax, transform=ccrs.PlateCarree())
ax.coastlines();
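
The `coarsen(lat=8, lon=8).mean()` step replaces each non-overlapping 8x8 block of grid cells with its mean, which shrinks the array before it is loaded and plotted. Below is a minimal pure-Python sketch of the same idea on a tiny grid. `block_mean` is a hypothetical helper, and it skips the NaN handling that xarray provides.

```python
def block_mean(grid, factor):
    """Average non-overlapping factor x factor blocks of a 2-D list of floats."""
    rows, cols = len(grid), len(grid[0])
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be multiples of factor")
    out = []
    for r in range(0, rows, factor):
        out_row = []
        for c in range(0, cols, factor):
            block = [
                grid[i][j]
                for i in range(r, r + factor)
                for j in range(c, c + factor)
            ]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


# A 4x4 grid coarsened by 2 becomes a 2x2 grid of block means.
grid = [
    [1.0, 3.0, 5.0, 7.0],
    [1.0, 3.0, 5.0, 7.0],
    [2.0, 2.0, 8.0, 8.0],
    [2.0, 2.0, 8.0, 8.0],
]
coarse = block_mean(grid, 2)
```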


## Operational forecast data

* Weather forecast

Typically stored as Zarr or GRIB2.

In [None]:
staging_catalog = pystac_client.Client.open(
    "https://planetarycomputer-staging.microsoft.com/api/stac/v1"
)
search = staging_catalog.search(
    collections=["ecmwf-forecast"],
    query={
        "ecmwf:stream": {"eq": "wave"},
        "ecmwf:type": {"eq": "fc"},
        "ecmwf:step": {"eq": "0h"},
    },
)
items = search.get_all_items()
item = items[0]
item

In [None]:
url = item.assets["data"].href
filename, _ = urllib.request.urlretrieve(url)

ds = xr.open_dataset(filename, engine="cfgrib")
ds

In [None]:
projection = ccrs.Robinson()
fig, ax = plt.subplots(figsize=(16, 9), subplot_kw=dict(projection=projection))

ds.swh.plot(ax=ax, transform=ccrs.PlateCarree());

In [None]:
import seaborn as sns

grid = sns.jointplot(
    x=ds.mwp.data.ravel(), y=ds.swh.data.ravel(), alpha=0.25, marker=".", height=12
)
grid.ax_joint.set(xlabel="Mean wave period", ylabel="Significant wave height");

## Tabular data

Typically stored as GeoParquet.

In [None]:
search = catalog.search(collections=["us-census"])
items = planetary_computer.sign(search.get_all_items())
items = {x.id: x for x in items}
item = items["2020-cb_2020_us_cd116_500k"]
item

In [None]:
asset = item.assets["data"]
df = geopandas.read_parquet(asset.href, storage_options=asset.extra_fields["table:storage_options"])
df

In [None]:
maryland = df[df.STATEFP == "24"].astype({"GEOID": "category"})
maryland.explore(column="GEOID")

In [None]:
import dask_geopandas
asset = items["2020-census-blocks-geo"].assets["data"]

geo = dask_geopandas.read_parquet(
    asset.href,
    storage_options=asset.extra_fields["table:storage_options"],
)
geo

In [None]:
import dask.dataframe
asset = items["2020-census-blocks-population"].assets["data"]

pop = dask.dataframe.read_parquet(
    asset.href,
    storage_options=asset.extra_fields["table:storage_options"],
)
pop

In [None]:
df = geo.join(pop)
df

In [None]:
start = [x for x in geo.divisions if x.startswith("44")][0]
stop = "4499"

ri = geo.loc[start:stop].compute()
ri.head()
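
The `.loc[start:stop]` slice above relies on the index being sorted with known partition boundaries (`divisions`), so only partitions that can contain matching labels need to be read. Block GEOIDs begin with the two-digit state FIPS code, and `44` is Rhode Island. Below is a minimal sketch of that partition lookup using `bisect`. The division values are hypothetical, and `partitions_for_range` is an illustrative helper, not a Dask API.

```python
from bisect import bisect_right


def partitions_for_range(divisions, start, stop):
    """Indices of partitions that may hold index labels in [start, stop]."""
    if start > divisions[-1] or stop < divisions[0]:
        return []
    # Partition i covers [divisions[i], divisions[i + 1]); the last one
    # also includes its upper bound.
    last_partition = len(divisions) - 2
    first = min(max(bisect_right(divisions, start) - 1, 0), last_partition)
    last = min(bisect_right(divisions, stop) - 1, last_partition)
    return list(range(first, last + 1))


# Hypothetical sorted GEOID boundaries for a five-partition dataset.
example_divisions = [
    "010010201001000",
    "360010001001000",
    "440010301001000",
    "440070101011000",
    "450010101001000",
    "560459513003999",
]
ri_start = [d for d in example_divisions if d.startswith("44")][0]
ri_partitions = partitions_for_range(example_divisions, ri_start, "4499")
```

Because string comparison is lexicographic, `"4499"` sorts after every GEOID that starts with `44` and before any that starts with `45`, which is why it works as an upper bound.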

## Point-cloud data

Typically stored as COPC (Cloud Optimized Point Cloud).