![Landsat8](./images/nasa_landsat8.jpg "Landsat8")

# Data Ingestion - Intake

---

## Overview

In the previous notebook, you learned how to load data efficiently from the Microsoft Planetary Computer platform. If that approach meets your needs, feel free to skip ahead to a workflow example. This notebook demonstrates an alternative, more general approach to data access built around [Intake](https://intake.readthedocs.io).

<div class="admonition alert alert-info">
    <p class="admonition-title" style="font-weight:bold">Info</p>
        A great way to contribute to this cookbook is to create a notebook that focuses on data access from a specific provider.
</div>

## Prerequisites

| Concepts | Importance | Notes |
| --- | --- | --- |
| [Intro to Landsat](./0.0_Intro_Landsat.ipynb) | Necessary | Background |
| [Data Ingestion - Planetary Computer](1.0_Data_Ingestion-Planetary_Computer.ipynb) | Helpful | |
| [Pandas Cookbook](https://foundations.projectpythia.org/core/pandas.html) | Helpful |  |
| [xarray Cookbook](https://foundations.projectpythia.org/core/xarray.html) | Necessary |  |
| [Intake Quickstart](https://intake.readthedocs.io/en/latest/index.html) | Helpful |  |
| [Intake Cookbook](https://projectpythia.org/intake-cookbook/README.html) | Necessary |  |

- **Time to learn**: 20 minutes

---

## Imports

In [None]:
import json

import xarray as xr
import intake
import panel as pn
import hvplot.xarray
import planetary_computer
import rioxarray as rxr


import warnings
warnings.simplefilter('ignore', FutureWarning) # Ignore warning about the format of epsg codes

To get started, we pass a STAC URL (or any other data source URL) to Intake and ask it to recommend suitable data types.

In [None]:
url = "https://planetarycomputer.microsoft.com/api/stac/v1"
data_types = intake.readers.datatypes.recommend(url)
print(data_types)

We will use STACJSON to read the URL.

In [None]:
data_type = intake.readers.datatypes.STACJSON(url)
data_type

Similarly, we can check out the possible readers to use with the STACJSON datatype.

In [None]:
readers = data_type.possible_readers
print(readers)

The StacCatalogReader is probably the most suitable for our use case. We can use it to read the STAC catalog and explore the available contents.

In [None]:
reader = intake.catalogs.StacCatalogReader(
    data_type, signer=planetary_computer.sign_inplace
)
reader

We can read the catalog and see what's available:

In [None]:
stac_cat = reader.read()
metadata = {}
for data_description in stac_cat.data.values():
    data = data_description.kwargs["data"]
    metadata[data["id"]] = data["description"]
list(metadata.keys())
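
The catalog lists many collections, so it can help to filter the IDs by keyword. A minimal sketch, assuming metadata is a plain dict mapping collection ID to description as built above. The small dict here is a stand-in with placeholder descriptions so the snippet runs on its own, and the helper name filter_collections is hypothetical:

```python
def filter_collections(metadata, keyword):
    """Return the sorted collection IDs whose ID contains `keyword` (case-insensitive)."""
    keyword = keyword.lower()
    return sorted(cid for cid in metadata if keyword in cid.lower())


# Stand-in for the `metadata` dict built above; descriptions are placeholders.
sample_metadata = {
    "sentinel-2-l2a": "placeholder description",
    "landsat-c2-l2": "placeholder description",
    "landsat-c2-l1": "placeholder description",
}

landsat_ids = filter_collections(sample_metadata, "landsat")
print(landsat_ids)
```

In the notebook itself, the same call would be filter_collections(metadata, "landsat").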

We can print the descriptions of the collections we are interested in.

In [None]:
print("1:", metadata["landsat-c2-l1"])
print("2:", metadata["landsat-c2-l2"])

Specifically, we want landsat-c2-l2.

In [None]:
landsat_reader = stac_cat["landsat-c2-l2"]
landsat_reader.read()

We can get a preview of the dataset by looking at the thumbnail.

In [None]:
landsat_thumbnail = landsat_reader["thumbnail"].read()
landsat_thumbnail


If this looks like the data we want, we can move on to retrieving the items in the collection.


In [None]:
landsat_items = landsat_reader["geoparquet-items"]
landsat_ddf = landsat_items.to_reader("dask").read()
landsat_ddf.head()

Let's select a single item to work with.

In [None]:
selected_item = landsat_ddf.tail(1).iloc[0]

In [None]:
for link in selected_item["links"].tolist():
    if link["rel"] == "self":
        href = link["href"]
        break

href
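
The loop above leaves href undefined if no link has rel equal to "self". A small sketch of a more defensive lookup, assuming each link is a dict with "rel" and "href" keys as in the cell above. The helper find_link and the example.com URLs are hypothetical:

```python
def find_link(links, rel):
    """Return the href of the first link whose "rel" matches, or None if absent."""
    return next((link["href"] for link in links if link.get("rel") == rel), None)


# Illustrative links in the same shape as a STAC item's "links" entries.
sample_links = [
    {"rel": "collection", "href": "https://example.com/collections/demo"},
    {"rel": "self", "href": "https://example.com/collections/demo/items/item-1"},
]

self_href = find_link(sample_links, "self")
missing_href = find_link(sample_links, "missing")
print(self_href)
print(missing_href)
```

Returning None instead of raising lets the caller decide how to handle an item without the expected link.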

Since the link points to a JSON document, we repeat a process similar to the one above.

In [None]:
stac_json = intake.readers.datatypes.STACJSON(href)

In [None]:
stac_json.possible_readers

In [None]:
reader = intake.readers.DaskJSON(stac_json)

In [None]:
row = reader.read().compute().iloc[0]

Then, we can load the actual assets.

In [None]:
assets = json.loads(row["assets"].replace("'", '"'))

In [None]:
list(assets)
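
Replacing single quotes with double quotes works only while no value contains an apostrophe or a Python literal such as None. A hedged alternative, assuming the column holds the string form of a Python dict (as the quote replacement above suggests), is ast.literal_eval from the standard library. The sample string and URL below are made up for illustration:

```python
import ast

# Made-up example of a dict stored as its Python string representation.
raw = "{'red': {'href': 'https://example.com/B4.TIF', 'title': \"Sensor's red band\", 'roles': None}}"

# literal_eval parses Python literals only; it does not execute arbitrary code.
parsed = ast.literal_eval(raw)
print(list(parsed))
print(parsed["red"]["href"])
```

If the column turns out to hold valid JSON instead, json.loads on the unmodified string is the simpler choice.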

Now we build a dataset from the assets. We pass each asset URL through sign_inplace, which returns the URL with the access token that Planetary Computer requires for reading the data.

In [None]:
da_list = []
for band in ["red", "green", "blue", "nir08"]:
    url = planetary_computer.sign_inplace(assets[band]["href"])
    da_list.append(rxr.open_rasterio(url).squeeze(drop=True).rename(band))
ds = xr.merge(da_list).load()
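
Conceptually, signing appends a time-limited access token to the asset URL as a query string. The sketch below only illustrates that idea with the standard library. It is not how planetary_computer is implemented, and the helper name append_token, the URL, and the token value are all placeholders:

```python
from urllib.parse import urlsplit, urlunsplit


def append_token(url, token):
    """Return `url` with `token` added to its query string (hypothetical helper)."""
    parts = urlsplit(url)
    query = f"{parts.query}&{token}" if parts.query else token
    return urlunsplit(parts._replace(query=query))


# Placeholder URL and token, for illustration only.
unsigned = "https://example.com/container/scene_B4.TIF"
signed = append_token(unsigned, "sig=PLACEHOLDER")
print(signed)
```

Because the token expires, a signed URL that is stored and reused later may stop working and need to be signed again.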

Now, we can plot the true color imagery with the extracted bands.

In [None]:
ds[["red", "green", "blue"]].to_array().plot.imshow(robust=True, figsize=(10, 10))

Or calculate and display the NDVI.

In [None]:
red = ds["red"].astype("float")
nir = ds["nir08"].astype("float")
ndvi = (nir - red) / (nir + red)
ndvi.plot.imshow(cmap="viridis", figsize=(10, 10))
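
The NDVI above is computed from the raw integer values in the files. USGS documents a scale factor of 0.0000275 and an offset of -0.2 for converting Landsat Collection 2 Level-2 surface reflectance values. Because of the offset, NDVI computed on scaled reflectance differs from NDVI computed on raw values. A per-pixel sketch in plain Python, with those constants treated as assumptions to verify against the item's metadata and with made-up pixel values:

```python
SCALE = 0.0000275  # assumed Collection 2 Level-2 reflectance scale factor
OFFSET = -0.2      # assumed Collection 2 Level-2 reflectance offset


def to_reflectance(dn):
    """Convert a raw digital number to surface reflectance."""
    return dn * SCALE + OFFSET


def ndvi_value(nir, red):
    """Normalized difference of two values; NaN when the denominator is zero."""
    total = nir + red
    return (nir - red) / total if total != 0 else float("nan")


# Made-up raw pixel values for illustration.
red_dn, nir_dn = 10000, 20000

ndvi_raw = ndvi_value(nir_dn, red_dn)
ndvi_scaled = ndvi_value(to_reflectance(nir_dn), to_reflectance(red_dn))
print(round(ndvi_raw, 3), round(ndvi_scaled, 3))
```

The same arithmetic applies elementwise to the red and nir arrays used above, so the conversion can be applied to each band before computing the ratio.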

To create an interactive version of the plot, we can use hvPlot instead. With rasterize=True, the image is re-rendered at the appropriate resolution as you zoom in. This only happens when the notebook is running live, not in the static documentation.

In [None]:
ndvi.hvplot.image(rasterize=True, cmap="BrBG", symmetric=True)