![Landsat8](./images/nasa_landsat8.jpg "Landsat8")

# Data Ingestion - Intake

---

## Overview

In the previous notebook, you learned how to efficiently load data from the Microsoft Planetary Computer platform. If that approach works for you, feel free to skip ahead to a workflow example. In this notebook, we demonstrate an alternative, more general approach to data access, centered around [Intake](https://intake.readthedocs.io).

<div class="admonition alert alert-info">
    <p class="admonition-title" style="font-weight:bold">Info</p>
        A great way to contribute to this cookbook is to create a notebook that focuses on data access from a specific provider.
</div>

## Prerequisites

| Concepts | Importance | Notes |
| --- | --- | --- |
| [Intro to Landsat](./0.0_Intro_Landsat.ipynb) | Necessary | Background |
| [Data Ingestion - Planetary Computer](1.0_Data_Ingestion-Planetary_Computer.ipynb) | Helpful | |
| [Pandas Cookbook](https://foundations.projectpythia.org/core/pandas.html) | Helpful |  |
| [xarray Cookbook](https://foundations.projectpythia.org/core/xarray.html) | Necessary |  |
| [Intake Quickstart](https://intake.readthedocs.io/en/latest/index.html) | Helpful |  |
| [Intake Cookbook](https://projectpythia.org/intake-cookbook/README.html) | Necessary | |

- **Time to learn**: 20 minutes

---

## Imports

In [None]:
import intake
import hvplot.xarray
import planetary_computer

import warnings
warnings.simplefilter('ignore', FutureWarning) # Ignore warning about the format of epsg codes

To get started, we provide a STAC URL (or any other data source URL) to Intake and ask it to recommend suitable data types.

In [None]:
url = "https://planetarycomputer.microsoft.com/api/stac/v1"
data_types = intake.readers.datatypes.recommend(url)
print(data_types)

We will use STACJSON to read the URL.

In [None]:
data_type = intake.datatypes.STACJSON(url)
data_type

Similarly, we can check out the possible readers to use with the STACJSON datatype.

In [None]:
readers = data_type.possible_readers
print(readers)

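The `StacCatalogReader` is probably the most suitable for our use case. We can use it to read the STAC catalog and explore the available contents. We also pass `planetary_computer.sign_inplace` as the `signer`, since Planetary Computer assets need signed URLs before they can be downloaded.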

In [None]:
reader = intake.catalogs.StacCatalogReader(
    data_type, signer=planetary_computer.sign_inplace
)
reader

We can read the catalog and see what's available:

In [None]:
stac_cat = reader.read()

In [None]:
metadata = {}
for data_description in stac_cat.data.values():
    data = data_description.kwargs["data"]
    metadata[data["id"]] = data["description"]
list(metadata.keys())
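
The full list of collection IDs can be long. As a minimal sketch, a mapping shaped like `metadata` can be filtered by substring to narrow it down. The `find_collections` helper and the `sample_metadata` dictionary are hypothetical stand-ins so the snippet runs without a network connection; the descriptions are placeholders, not the real catalog text.

```python
def find_collections(metadata, keyword):
    """Return the sorted collection IDs whose ID contains `keyword` (case-insensitive)."""
    keyword = keyword.lower()
    return sorted(cid for cid in metadata if keyword in cid.lower())


# Hypothetical stand-in for the `metadata` dict built above
sample_metadata = {
    "landsat-c2-l2": "placeholder description",
    "sentinel-2-l2a": "placeholder description",
    "landsat-c2-l1": "placeholder description",
}

landsat_ids = find_collections(sample_metadata, "landsat")
print(landsat_ids)
```

With the real catalog, the same call would be `find_collections(metadata, "landsat")`.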

We can print the description of the desired IDs.

In [None]:
print("1:", metadata["landsat-c2-l1"])
print("2:", metadata["landsat-c2-l2"])

Specifically, we want the `landsat-c2-l2` collection.

In [None]:
landsat_reader = stac_cat["landsat-c2-l2"]

We can see the metadata below.

In [None]:
landsat_reader.read().metadata

We can get a preview of the dataset by looking at the thumbnail.

In [None]:
# data as array
landsat_reader["thumbnail"].read()

In [None]:
# render with panel
landsat_reader["thumbnail"].to_reader("panel")

If the preview looks like the data we want, we can move on to the items in the collection.

In [None]:
landsat_items = landsat_reader["geoparquet-items"]
landsat_items

In [None]:
# Note `output_instance`: .tail() turns the dask dataframe into a pandas one, but
# GeoDataFrameToSTACCatalog only works with geopandas, so we declare the output type explicitly
cat = landsat_items.tail(output_instance="geopandas:GeoDataFrame").GeoDataFrameToSTACCatalog.read()

In [None]:
# this is an "item collection"; each item is a set of assets (many levels here)
cat

We can now repeat the process above: pick an entry from the catalog and read it.

In [None]:
item_key = list(cat.entries.keys())[0]
subcat = cat[item_key].read()
subcat

In [None]:
# single image in one band
subcat.red.read()

In [None]:
# Unfortunately, the signer is not carried over from the catalog reader, so we pass it again
catbands = cat[item_key].to_reader(
    reader="StackBands",
    bands=["red", "green", "blue"],
    signer=planetary_computer.sign_inplace,
)

Then, we can load the actual assets.

In [None]:
# multiband image. Unfortunately, the value of the "band" variable in each input is 1, not the real
# value; they could be relabelled here
data = catbands.read(dim="band")
data
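
As the comment above notes, every stacked band carries the label 1. A minimal sketch of relabelling them with xarray's `assign_coords` follows. It assumes the result is an `xarray.DataArray` with a `band` dimension whose order matches the `bands` list passed to the reader; the `stacked` array is a hypothetical stand-in for `data`, so the snippet runs offline.

```python
import numpy as np
import xarray as xr

band_names = ["red", "green", "blue"]

# Hypothetical stand-in for `data`: three stacked bands that all carry the label 1
stacked = xr.DataArray(
    np.zeros((3, 4, 5)),
    dims=("band", "y", "x"),
    coords={"band": [1, 1, 1]},
)

# Replace the uninformative labels with the band names, in stacking order
relabelled = stacked.assign_coords(band=band_names)
print(relabelled.band.values)
```

Once relabelled, a single band can be selected by name, for example `relabelled.sel(band="red")`.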

Now we can plot a true-color image from the extracted red, green, and blue bands.

In [None]:
data.plot.imshow(robust=True, figsize=(10, 10))
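
The `robust=True` option tells xarray to derive the color limits from the 2nd and 98th percentiles rather than the extreme values, so a few very bright pixels do not wash out the image. The following is a rough NumPy sketch of that idea, not xarray's actual implementation; `percentile_stretch` and the synthetic `image` are hypothetical and used only for illustration.

```python
import numpy as np


def percentile_stretch(values, lower=2.0, upper=98.0):
    """Rescale values to [0, 1] using percentile limits, clipping the tails."""
    vmin, vmax = np.nanpercentile(values, [lower, upper])
    return np.clip((values - vmin) / (vmax - vmin), 0.0, 1.0)


# Synthetic reflectance-like values with one extreme outlier
rng = np.random.default_rng(0)
image = rng.uniform(0.0, 0.3, size=(3, 8, 8))
image[0, 0, 0] = 50.0

stretched = percentile_stretch(image)
print(stretched.min(), stretched.max())
```

Without the percentile limits, the single outlier would set the top of the color range and compress every other pixel toward black.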