![Landsat8](./images/nasa_landsat8.jpg "Landsat8")

# Data Ingestion - Intake

---

## Overview

In the last notebook, you learned how to efficiently load data from the Microsoft Planetary Computer platform. If that approach works for you, please proceed to a workflow example. In this notebook we will demonstrate common alternative approaches and techniques for general data access, centered around [Intake](https://intake.readthedocs.io).

<div class="admonition alert alert-info">
    <p class="admonition-title" style="font-weight:bold">Info</p>
        A great way to contribute to this cookbook is to create a notebook that focuses on data access from a specific provider.
</div>

## Prerequisites

| Concepts | Importance | Notes |
| --- | --- | --- |
| [Intro to Landsat](./0.0_Intro_Landsat.ipynb) | Necessary | Background |
| [Data Ingestion - Planetary Computer](1.0_Data_Ingestion-Planetary_Computer.ipynb) | Helpful | |
| [Pandas Cookbook](https://foundations.projectpythia.org/core/pandas.html) | Helpful |  |
| [xarray Cookbook](https://foundations.projectpythia.org/core/xarray.html) | Necessary |  |
| [Intake Quickstart](https://intake.readthedocs.io/en/latest/index.html) | Helpful |  |
| [Intake Cookbook](https://projectpythia.org/intake-cookbook/README.html) | Necessary |  |

- **Time to learn**: 20 minutes

---

## Imports

In [None]:
import json

import xarray as xr
import intake
import panel as pn
import hvplot.xarray
import planetary_computer
import rioxarray as rxr


import warnings
warnings.simplefilter('ignore', FutureWarning) # Ignore warning about the format of epsg codes

To get started, we provide a STAC URL (or any other data source URL) to Intake and ask it to recommend suitable datatypes.

In [41]:
url = "https://planetarycomputer.microsoft.com/api/stac/v1"
data_types = intake.readers.datatypes.recommend(url)
print(data_types)

[<class 'intake.readers.datatypes.STACJSON'>, <class 'intake.readers.datatypes.Handle'>, <class 'intake.readers.datatypes.JSONFile'>, <class 'intake.readers.datatypes.TiledService'>, <class 'intake.readers.datatypes.CatalogAPI'>]


We will use STACJSON to read the URL.

In [42]:
data_type = intake.datatypes.STACJSON(url)
data_type

STACJSON, {'url': 'https://planetarycomputer.microsoft.com/api/stac/v1', 'storage_options': None, 'metadata': {}}

Similarly, we can check out the possible readers to use with the STACJSON datatype.

In [43]:
readers = data_type.possible_readers
print(readers)

{'importable': [<class 'intake.readers.catalogs.StacCatalogReader'>, <class 'intake.readers.catalogs.StackBands'>, <class 'intake.readers.catalogs.StacSearch'>, <class 'intake.readers.readers.DaskJSON'>, <class 'intake.readers.readers.FileByteReader'>, <class 'intake.readers.readers.FileExistsReader'>], 'not_importable': [<class 'intake.readers.readers.RayJSON'>, <class 'intake.readers.readers.DuckJSON'>, <class 'intake.readers.readers.PolarsJSON'>, <class 'intake.readers.readers.AwkwardJSON'>, <class 'intake.readers.readers.RayBinary'>]}


The StacCatalogReader is probably the most suitable for our use case. We can use it to read the STAC catalog and explore the available contents.

In [46]:
reader = intake.catalogs.StacCatalogReader(
    data_type, signer=planetary_computer.sign_inplace
)
reader

StacCatalogReader reader producing intake.readers.entry:Catalog

We can read the catalog and see what's available:

In [48]:
stac_cat = reader.read()
metadata = {}
for data_description in stac_cat.data.values():
    data = data_description.kwargs["data"]
    metadata[data["id"]] = data["description"]
list(metadata.keys())

['daymet-annual-pr',
 'daymet-daily-hi',
 '3dep-seamless',
 '3dep-lidar-dsm',
 'fia',
 'sentinel-1-rtc',
 'gridmet',
 'daymet-annual-na',
 'daymet-monthly-na',
 'daymet-annual-hi',
 'daymet-monthly-hi',
 'daymet-monthly-pr',
 'gnatsgo-tables',
 'hgb',
 'cop-dem-glo-30',
 'cop-dem-glo-90',
 'goes-cmi',
 'terraclimate',
 'nasa-nex-gddp-cmip6',
 'gpm-imerg-hhr',
 'gnatsgo-rasters',
 '3dep-lidar-hag',
 'io-lulc-annual-v02',
 '3dep-lidar-intensity',
 '3dep-lidar-pointsourceid',
 'mtbs',
 'noaa-c-cap',
 '3dep-lidar-copc',
 'modis-64A1-061',
 'alos-fnf-mosaic',
 '3dep-lidar-returns',
 'mobi',
 'landsat-c2-l2',
 'era5-pds',
 'chloris-biomass',
 'kaza-hydroforecast',
 'planet-nicfi-analytic',
 'modis-17A2H-061',
 'modis-11A2-061',
 'daymet-daily-pr',
 '3dep-lidar-dtm-native',
 '3dep-lidar-classification',
 '3dep-lidar-dtm',
 'gap',
 'modis-17A2HGF-061',
 'planet-nicfi-visual',
 'gbif',
 'modis-17A3HGF-061',
 'modis-09A1-061',
 'alos-dem',
 'alos-palsar-mosaic',
 'deltares-water-availability',
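
With this many collections, it can help to filter the IDs by keyword rather than scanning the list by eye. The following is a minimal sketch, not part of the Intake API: `find_collections` is a hypothetical helper, and `example_metadata` is a small stand-in for the `metadata` dict built above, with placeholder descriptions.

```python
# Hypothetical stand-in for the `metadata` dict built above
# (collection ID -> description); the descriptions are placeholders.
example_metadata = {
    "landsat-c2-l1": "placeholder description A",
    "landsat-c2-l2": "placeholder description B",
    "sentinel-1-rtc": "placeholder description C",
    "cop-dem-glo-30": "placeholder description D",
}


def find_collections(metadata, keyword):
    """Return the sorted collection IDs whose ID or description mentions `keyword`."""
    keyword = keyword.lower()
    return sorted(
        collection_id
        for collection_id, description in metadata.items()
        if keyword in collection_id.lower() or keyword in (description or "").lower()
    )


landsat_ids = find_collections(example_metadata, "landsat")
print(landsat_ids)
```

In the notebook itself, the same helper could be called as `find_collections(metadata, "landsat")`.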
 

We can print the descriptions of the collections we are interested in.

In [49]:
print("1:", metadata["landsat-c2-l1"])
print("2:", metadata["landsat-c2-l2"])

1: Landsat Collection 2 Level-1 data, consisting of quantized and calibrated scaled Digital Numbers (DN) representing the multispectral image data. These [Level-1](https://www.usgs.gov/landsat-missions/landsat-collection-2-level-1-data) data can be [rescaled](https://www.usgs.gov/landsat-missions/using-usgs-landsat-level-1-data-product) to top of atmosphere (TOA) reflectance and/or radiance. Thermal band data can be rescaled to TOA brightness temperature.

This dataset represents the global archive of Level-1 data from [Landsat Collection 2](https://www.usgs.gov/core-science-systems/nli/landsat/landsat-collection-2) acquired by the [Multispectral Scanner System](https://landsat.gsfc.nasa.gov/multispectral-scanner-system/) onboard Landsat 1 through Landsat 5 from July 7, 1972 to January 7, 2013. Images are stored in [cloud-optimized GeoTIFF](https://www.cogeo.org/) format.

2: Landsat Collection 2 Level-2 [Science Products](https://www.usgs.gov/landsat-missions/landsat-collection-2-leve

Specifically, we want landsat-c2-l2.

In [None]:
landsat_reader = stac_cat["landsat-c2-l2"]
landsat_reader.read()

We can get a preview of the dataset by looking at the thumbnail.

In [None]:
landsat_thumbnail = landsat_reader["thumbnail"].read()
pn.pane.Image(landsat_thumbnail)


If this is the dataset we want, we can move on to retrieve the items in the collection.


In [None]:
landsat_items = landsat_reader["geoparquet-items"]
landsat_ddf = landsat_items.kwargs["data"].to_reader("dask").read()
landsat_ddf.head()

Let's select a single item to work with.

In [51]:
selected_item = landsat_ddf.tail(1).iloc[0]

Each item carries a list of links. We extract the one whose `rel` is `self`, which points to the item's own STAC JSON.

In [None]:
for link in selected_item["links"]:
    if link["rel"] == "self":
        href = link["href"]
        break
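
Note that the loop above leaves `href` undefined if no `self` link is present. As a hedged alternative, the lookup can be written so that a missing link is detected explicitly. This is a sketch only: `find_link` is a hypothetical helper, and `example_links` is a stand-in for `selected_item["links"]` with placeholder URLs.

```python
# Hypothetical stand-in for selected_item["links"]; the URLs are placeholders.
example_links = [
    {"rel": "collection", "href": "https://example.com/collections/demo"},
    {"rel": "self", "href": "https://example.com/collections/demo/items/item-1"},
    {"rel": "root", "href": "https://example.com/"},
]


def find_link(links, rel):
    """Return the href of the first link with the given relation, or None if absent."""
    return next((link["href"] for link in links if link["rel"] == rel), None)


self_href = find_link(example_links, "self")
if self_href is None:
    # Fail early with a clear message instead of a later NameError.
    raise ValueError("item has no 'self' link")
print(self_href)
```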

We now repeat the process described above for the item URL: wrap it in a STACJSON datatype, check the possible readers, and read it.

In [None]:
stac_json = intake.readers.datatypes.STACJSON(href)

In [None]:
stac_json.possible_readers

In [None]:
reader = intake.readers.DaskJSON(stac_json)

In [None]:
row = reader.read().compute().iloc[0]

Then, we can load the actual assets. The `assets` field is stored as a string, so we parse it into a dictionary first.

In [None]:
assets = json.loads(row["assets"].replace("'", '"'))

In [None]:
list(assets)
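
The quote replacement used above to parse `row["assets"]` works as long as no value contains an apostrophe and the string holds no Python-only literals such as `None`. A more defensive variant, sketched here under the assumption that the field is either valid JSON or the `repr` of a Python dict, falls back to `ast.literal_eval`. The `parse_assets` helper and the `assets_repr` sample string are hypothetical and are not real catalog output.

```python
import ast
import json

# Hypothetical stand-in for row["assets"]: a Python-repr string with an
# apostrophe and a None value, both of which break the quote-swapping approach.
assets_repr = (
    "{'red': {'href': 'https://example.com/red.tif', "
    "'title': \"Sensor's red band\", 'roles': None}}"
)


def parse_assets(text):
    """Parse a stringified dict: try strict JSON first, then Python literal syntax."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # literal_eval only evaluates literals, so it does not execute arbitrary code.
        return ast.literal_eval(text)


parsed_assets = parse_assets(assets_repr)
print(parsed_assets["red"]["href"])
```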

Now we build a dataset from the assets. Each asset URL is passed through `sign_inplace`, which returns a signed URL carrying the access token needed to read the data.

In [None]:
da_list = []
for band in ["red", "green", "blue", "nir08"]:
    url = planetary_computer.sign_inplace(assets[band]["href"])
    da_list.append(rxr.open_rasterio(url).squeeze(drop=True).rename(band))
ds = xr.merge(da_list).load()

Now, we can plot the true color imagery with the extracted bands.

In [None]:
ds[["red", "green", "blue"]].to_array().plot.imshow(robust=True, figsize=(10, 10))

Or calculate and display the NDVI.

In [None]:
red = ds["red"].astype("float")
nir = ds["nir08"].astype("float")
ndvi = (nir - red) / (nir + red)
ndvi.plot.imshow(cmap="viridis", figsize=(10, 10))
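
To make the formula concrete, here is a small worked example for single pixels, NDVI = (NIR - Red) / (NIR + Red). It is an illustrative sketch only: `ndvi_pixel` is a hypothetical helper, and the reflectance values are made up rather than taken from the scene above. It also shows one way to handle a zero denominator.

```python
import math


def ndvi_pixel(nir, red):
    """Return (nir - red) / (nir + red), or NaN when the denominator is zero."""
    denominator = nir + red
    if denominator == 0:
        return math.nan
    return (nir - red) / denominator


# Illustrative reflectance values, not taken from the scene above.
vegetation = ndvi_pixel(nir=0.5, red=0.1)  # strong NIR, weak red: 0.4 / 0.6
bare = ndvi_pixel(nir=0.3, red=0.3)        # equal reflectance in both bands
empty = ndvi_pixel(nir=0.0, red=0.0)       # e.g. a pixel with no data
print(round(vegetation, 3), bare, empty)
```

Values near 1 indicate dense vegetation, while values near 0 indicate little difference between the two bands. For whole arrays, a comparable guard could be expressed with xarray's `DataArray.where`, for example by masking pixels where `nir + red` is zero.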

To create an interactive version of the plot, we can use hvPlot instead. With `rasterize=True`, the image is re-rendered at a matching resolution as you zoom in. This requires a running kernel, so it does not work in the static rendering of this notebook.

In [None]:
ndvi.hvplot.image(rasterize=True, cmap="Greens")