![Landsat8](./images/nasa_landsat8.jpg "Landsat8")

# Data Ingestion - Intake

---

## Overview

In the last notebook, you learned how to efficiently load data from the Microsoft Planetary Computer platform. If that approach works for you, please proceed to a workflow example. However, many other established platforms also provide Landsat data (e.g. [NASA EarthData](https://search.earthdata.nasa.gov/search), [USGS EarthExplorer](https://earthexplorer.usgs.gov/), [Google](https://developers.google.com/earth-engine/datasets/catalog/landsat), and [AWS](https://registry.opendata.aws/usgs-landsat/)), and they may not preprocess the data in the same way or offer a similar API. For this reason, this notebook demonstrates common alternative approaches and techniques for general data access, centered around [Intake](https://intake.readthedocs.io).

First, to demonstrate simple cases, we will begin by reading local files with [Pandas](https://pandas.pydata.org/docs/) and [xarray](https://docs.xarray.dev/en/stable/). Then we will demonstrate the combined use of the [Intake](https://intake.readthedocs.io) and [Dask](https://docs.dask.org/en/stable/) libraries to efficiently fetch data from a remote server.

<div class="admonition alert alert-info">
    <p class="admonition-title" style="font-weight:bold">Info</p>
        A great way to contribute to this cookbook is to create a notebook that focuses on data access from a specific provider.
</div>

## Prerequisites

| Concepts | Importance | Notes |
| --- | --- | --- |
| [Intro to Landsat](./0.0_Intro_Landsat.ipynb) | Necessary | Background |
| [Data Ingestion - Planetary Computer](1.0_Data_Ingestion-Planetary_Computer.ipynb) | Helpful | |
| [Pandas Cookbook](https://foundations.projectpythia.org/core/pandas.html) | Helpful |  |
| [xarray Cookbook](https://foundations.projectpythia.org/core/xarray.html) | Necessary |  |
| [Intake Quickstart](https://intake.readthedocs.io/en/latest/index.html) | Helpful |  |
| [Intake Cookbook](https://projectpythia.org/intake-cookbook/README.html) | Necessary |  |

- **Time to learn**: 20 minutes

---

## Imports

In [1]:
import pandas as pd
import xarray as xr
import intake
import panel as pn
import hvplot.xarray
import planetary_computer

import json  # Needed later to parse the item's "assets" field
import warnings

warnings.simplefilter('ignore', FutureWarning)  # Ignore warning about the format of epsg codes

In [2]:
url = "https://planetarycomputer.microsoft.com/api/stac/v1"
data_types = intake.readers.datatypes.recommend(url)
print(data_types)

[<class 'intake.readers.datatypes.STACJSON'>, <class 'intake.readers.datatypes.Handle'>, <class 'intake.readers.datatypes.JSONFile'>, <class 'intake.readers.datatypes.CatalogAPI'>, <class 'intake.readers.datatypes.TiledService'>]


In [3]:
data_type = data_types[0](url)
data_type

STACJSON, {'url': 'https://planetarycomputer.microsoft.com/api/stac/v1', 'storage_options': None, 'metadata': {}}

In [4]:
readers = data_type.possible_readers
print(readers)

{'importable': [<class 'intake.readers.catalogs.StacCatalogReader'>, <class 'intake.readers.catalogs.StackBands'>, <class 'intake.readers.catalogs.StacSearch'>, <class 'intake.readers.readers.DaskJSON'>, <class 'intake.readers.readers.FileByteReader'>, <class 'intake.readers.readers.FileExistsReader'>], 'not_importable': [<class 'intake.readers.readers.AwkwardJSON'>, <class 'intake.readers.readers.RayJSON'>, <class 'intake.readers.readers.DuckJSON'>, <class 'intake.readers.readers.PolarsJSON'>, <class 'intake.readers.readers.RayBinary'>]}


In [5]:
reader = intake.readers.catalogs.StacCatalogReader(
    data_type, signer=planetary_computer.sign_inplace
)
reader

StacCatalogReader reader producing intake.readers.entry:Catalog

In [6]:
stac_cat = reader.read()

In [9]:
metadata = {}
for data_description in stac_cat.data.values():
    data = data_description.kwargs["data"]
    metadata[data["id"]] = data["description"]
list(metadata.keys())

['daymet-annual-pr',
 'daymet-daily-hi',
 '3dep-seamless',
 '3dep-lidar-dsm',
 'fia',
 'sentinel-1-rtc',
 'gridmet',
 'daymet-annual-na',
 'daymet-monthly-na',
 'daymet-annual-hi',
 'daymet-monthly-hi',
 'daymet-monthly-pr',
 'gnatsgo-tables',
 'hgb',
 'cop-dem-glo-30',
 'cop-dem-glo-90',
 'goes-cmi',
 'terraclimate',
 'nasa-nex-gddp-cmip6',
 'gpm-imerg-hhr',
 'gnatsgo-rasters',
 '3dep-lidar-hag',
 'io-lulc-annual-v02',
 '3dep-lidar-intensity',
 '3dep-lidar-pointsourceid',
 'mtbs',
 'noaa-c-cap',
 '3dep-lidar-copc',
 'modis-64A1-061',
 'alos-fnf-mosaic',
 '3dep-lidar-returns',
 'mobi',
 'landsat-c2-l2',
 'era5-pds',
 'chloris-biomass',
 'kaza-hydroforecast',
 'planet-nicfi-analytic',
 'modis-17A2H-061',
 'modis-11A2-061',
 'daymet-daily-pr',
 '3dep-lidar-dtm-native',
 '3dep-lidar-classification',
 '3dep-lidar-dtm',
 'gap',
 'modis-17A2HGF-061',
 'planet-nicfi-visual',
 'gbif',
 'modis-17A3HGF-061',
 'modis-09A1-061',
 'alos-dem',
 'alos-palsar-mosaic',
 'deltares-water-availability',
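
With this many collections, it can help to narrow the list to the ones of interest. A minimal sketch, assuming the collection IDs are available as plain strings; the short `collection_ids` list below is a hand-copied sample for illustration, not the full catalog:

```python
# Hand-copied sample of collection IDs (illustrative, not the full catalog).
collection_ids = [
    "daymet-annual-pr",
    "landsat-c2-l2",
    "modis-09A1-061",
    "landsat-c2-l1",
    "sentinel-1-rtc",
]

# Keep only the IDs that start with "landsat", sorted for stable output.
landsat_ids = sorted(cid for cid in collection_ids if cid.startswith("landsat"))
print(landsat_ids)
```

In the notebook, the same filter could be applied to `metadata.keys()`.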
 

In [10]:
print("1:", metadata["landsat-c2-l1"])
print("2:", metadata["landsat-c2-l2"])

1: Landsat Collection 2 Level-1 data, consisting of quantized and calibrated scaled Digital Numbers (DN) representing the multispectral image data. These [Level-1](https://www.usgs.gov/landsat-missions/landsat-collection-2-level-1-data) data can be [rescaled](https://www.usgs.gov/landsat-missions/using-usgs-landsat-level-1-data-product) to top of atmosphere (TOA) reflectance and/or radiance. Thermal band data can be rescaled to TOA brightness temperature.

This dataset represents the global archive of Level-1 data from [Landsat Collection 2](https://www.usgs.gov/core-science-systems/nli/landsat/landsat-collection-2) acquired by the [Multispectral Scanner System](https://landsat.gsfc.nasa.gov/multispectral-scanner-system/) onboard Landsat 1 through Landsat 5 from July 7, 1972 to January 7, 2013. Images are stored in [cloud-optimized GeoTIFF](https://www.cogeo.org/) format.

2: Landsat Collection 2 Level-2 [Science Products](https://www.usgs.gov/landsat-missions/landsat-collection-2-leve

In [11]:
landsat_reader = stac_cat["landsat-c2-l2"]

In [32]:
landsat_reader.read()

Catalog
 named datasets: ['geoparquet-items', 'thumbnail']

In [23]:
landsat_thumbnail = landsat_reader["thumbnail"].read()
pn.pane.Image(landsat_thumbnail)



In [26]:
landsat_items = landsat_reader["geoparquet-items"]
landsat_ddf = landsat_items.kwargs["data"].to_reader("dask").read()

In [35]:
landsat_ddf.head()

Unnamed: 0,type,stac_version,stac_extensions,id,geometry,bbox,links,assets,collection,gsd,...,landsat:wrs_row,landsat:scene_id,landsat:wrs_path,landsat:wrs_type,view:sun_azimuth,landsat:correction,view:sun_elevation,landsat:cloud_cover_land,landsat:collection_number,landsat:collection_category
0,Feature,1.0.0,[https://stac-extensions.github.io/raster/v1.0...,LT04_L2SP_005021_19820824_02_T2,b'\x01\x03\x00\x00\x00\x01\x00\x00\x00\x05\x00...,"[-56.48370366, 54.85543528, -52.52843421, 56.9...",[{'href': 'https://planetarycomputer.microsoft...,{'ang': {'description': 'Collection 2 Level-1 ...,landsat-c2-l2,30,...,21,LT40050211982236PAC00,5,2,148.383289,L2SP,41.575283,0.0,2,T2
1,Feature,1.0.0,[https://stac-extensions.github.io/raster/v1.0...,LT04_L2SP_005022_19820824_02_T2,b'\x01\x03\x00\x00\x00\x01\x00\x00\x00\x05\x00...,"[-57.14583596, 53.46625525, -53.34619289, 55.4...",[{'href': 'https://planetarycomputer.microsoft...,{'ang': {'description': 'Collection 2 Level-1 ...,landsat-c2-l2,30,...,22,LT40050221982236PAC00,5,2,146.913681,L2SP,42.56378,53.0,2,T2
2,Feature,1.0.0,[https://stac-extensions.github.io/raster/v1.0...,LT04_L2SP_005023_19820824_02_T1,b'\x01\x03\x00\x00\x00\x01\x00\x00\x00\x05\x00...,"[-57.81007877, 52.05865522, -54.13956097, 54.0...",[{'href': 'https://planetarycomputer.microsoft...,{'ang': {'description': 'Collection 2 Level-1 ...,landsat-c2-l2,30,...,23,LT40050231982236PAC00,5,2,145.418622,L2SP,43.54541,24.0,2,T1
3,Feature,1.0.0,[https://stac-extensions.github.io/raster/v1.0...,LT04_L2SP_005024_19820824_02_T1,b'\x01\x03\x00\x00\x00\x01\x00\x00\x00\x05\x00...,"[-58.41818223, 50.67222521, -54.85626831, 52.6...",[{'href': 'https://planetarycomputer.microsoft...,{'ang': {'description': 'Collection 2 Level-1 ...,landsat-c2-l2,30,...,24,LT40050241982236PAC00,5,2,143.93897,L2SP,44.491386,57.0,2,T1
4,Feature,1.0.0,[https://stac-extensions.github.io/raster/v1.0...,LT04_L2SP_005025_19820824_02_T1,b'\x01\x03\x00\x00\x00\x01\x00\x00\x00\x05\x00...,"[-59.00151608, 49.2550752, -55.54549517, 51.24...",[{'href': 'https://planetarycomputer.microsoft...,{'ang': {'description': 'Collection 2 Level-1 ...,landsat-c2-l2,30,...,25,LT40050251982236PAC00,5,2,142.421677,L2SP,45.427839,54.0,2,T1


In [36]:
selected_item = landsat_ddf.head(1).iloc[0]
bands_of_interest = ["red", "green", "blue"]
bbox = selected_item["bbox"]

In [38]:
selected_item

type                                                                     Feature
stac_version                                                               1.0.0
stac_extensions                [https://stac-extensions.github.io/raster/v1.0...
id                                               LT04_L2SP_005021_19820824_02_T2
geometry                       b'\x01\x03\x00\x00\x00\x01\x00\x00\x00\x05\x00...
bbox                           [-56.48370366, 54.85543528, -52.52843421, 56.9...
links                          [{'href': 'https://planetarycomputer.microsoft...
assets                         {'ang': {'description': 'Collection 2 Level-1 ...
collection                                                         landsat-c2-l2
gsd                                                                           30
created                                              2022-05-06T17:20:50.200161Z
sci:doi                                                         10.5066/P9IAXOVV
datetime                    

In [50]:
for link in selected_item["links"]:
    if link["rel"] == "self":
        href = link["href"]
        break
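
The loop above leaves `href` undefined if no link has `rel == "self"`. A minimal sketch of a more defensive lookup, assuming `links` is a list of dicts with `rel` and `href` keys as in the STAC item above; the helper name `find_link` and the sample links are hypothetical:

```python
def find_link(links, rel):
    """Return the href of the first link whose 'rel' matches, or None."""
    return next((link["href"] for link in links if link.get("rel") == rel), None)


# Hypothetical sample links, shaped like a STAC item's "links" list.
sample_links = [
    {"rel": "collection", "href": "https://example.com/collections/demo"},
    {"rel": "self", "href": "https://example.com/collections/demo/items/item-1"},
]

href = find_link(sample_links, "self")          # Found: returns the href string
missing = find_link(sample_links, "thumbnail")  # Not found: returns None
print(href, missing)
```

Returning `None` lets the caller decide how to handle a missing link instead of failing later with a `NameError`.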

In [51]:
intake.readers.datatypes.recommend(href)

[intake.readers.datatypes.STACJSON,
 intake.readers.datatypes.Handle,
 intake.readers.datatypes.JSONFile,
 intake.readers.datatypes.CatalogAPI,
 intake.readers.datatypes.TiledService]

In [65]:
stac_json = intake.readers.datatypes.STACJSON(href)

In [73]:
stac_json.possible_readers

{'importable': [intake.readers.catalogs.StacCatalogReader,
  intake.readers.catalogs.StackBands,
  intake.readers.catalogs.StacSearch,
  intake.readers.readers.DaskJSON,
  intake.readers.readers.FileByteReader,
  intake.readers.readers.FileExistsReader],
 'not_importable': [intake.readers.readers.AwkwardJSON,
  intake.readers.readers.RayJSON,
  intake.readers.readers.DuckJSON,
  intake.readers.readers.PolarsJSON,
  intake.readers.readers.RayBinary]}

In [87]:
reader = intake.readers.DaskJSON(stac_json)

In [96]:
row = reader.read().compute().iloc[0]

In [119]:
assets = json.loads(row["assets"].replace("'", '"'))
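
Swapping quote characters before calling `json.loads` works only while no value contains an apostrophe or a Python-only literal such as `None` or `True`. A sketch of a more tolerant alternative using the standard library's `ast.literal_eval`, assuming the `assets` field holds the `repr` of a Python dict; the sample string below is made up for illustration:

```python
import ast

# Made-up sample: a dict repr containing an apostrophe and a None value,
# both of which break the quote-replacement approach.
raw = "{'qa': {'title': \"Sensor's QA band\", 'nodata': None, 'roles': ['data']}}"

assets = ast.literal_eval(raw)  # Evaluates Python literals only, never arbitrary code.
print(assets["qa"]["title"])
```

`ast.literal_eval` accepts only literal structures (strings, numbers, lists, dicts, and so on), so it is a reasonable fit for parsing a stringified dict.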

In [140]:
assets

{'qa': {'href': 'https://landsateuwest.blob.core.windows.net/landsat-c2/level-2/standard/tm/1982/005/021/LT04_L2SP_005021_19820824_20200918_02_T2/LT04_L2SP_005021_19820824_20200918_02_T2_ST_QA.TIF',
  'type': 'image/tiff; application=geotiff; profile=cloud-optimized',
  'roles': ['data'],
  'title': 'Surface Temperature Quality Assessment Band',
  'description': 'Collection 2 Level-2 Quality Assessment Band (ST_QA) Surface Temperature Product',
  'raster:bands': [{'unit': 'kelvin',
    'scale': 0.01,
    'nodata': -9999,
    'data_type': 'int16',
    'spatial_resolution': 30}]},
 'ang': {'href': 'https://landsateuwest.blob.core.windows.net/landsat-c2/level-2/standard/tm/1982/005/021/LT04_L2SP_005021_19820824_20200918_02_T2/LT04_L2SP_005021_19820824_20200918_02_T2_ANG.txt',
  'type': 'text/plain',
  'roles': ['metadata'],
  'title': 'Angle Coefficients File',
  'description': 'Collection 2 Level-1 Angle Coefficients File'},
 'red': {'href': 'https://landsateuwest.blob.core.windows.net/l

In [125]:
pn.pane.PNG(assets["rendered_preview"]["href"])


In [130]:
tif = intake.readers.datatypes.TIFF('https://landsateuwest.blob.core.windows.net/landsat-c2/level-2/standard/tm/1982/005/021/LT04_L2SP_005021_19820824_20200918_02_T2/LT04_L2SP_005021_19820824_20200918_02_T2_ST_QA.TIF')

In [139]:
ds = intake.readers.RasterIOXarrayReader(tif).read(signer=planetary_computer.sign)

FileNotFoundError: https://landsateuwest.blob.core.windows.net/landsat-c2/level-2/standard/tm/1982/005/021/LT04_L2SP_005021_19820824_20200918_02_T2/LT04_L2SP_005021_19820824_20200918_02_T2_ST_QA.TIF
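
The `FileNotFoundError` is consistent with the request reaching Azure Blob Storage without an access token, since many Planetary Computer assets are readable only through signed URLs. Signing amounts to appending a short-lived token to the asset URL's query string. The sketch below illustrates that idea with the standard library only; `append_token`, the URL, and the token value are hypothetical placeholders, and this is not how `planetary_computer` is implemented:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def append_token(url, token):
    """Return url with the token's query parameters appended (hypothetical helper)."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query) + parse_qsl(token)
    return urlunsplit(parts._replace(query=urlencode(query)))


# Placeholder URL and token, for illustration only.
unsigned = "https://example.blob.core.windows.net/container/scene_ST_QA.TIF"
fake_token = "sv=2021-01-01&sig=PLACEHOLDER"

signed = append_token(unsigned, fake_token)
print(signed)
```

If a missing token is indeed the cause, one option to try is signing the asset `href` with `planetary_computer.sign` before constructing the `TIFF` data type, so that the reader receives an already-signed URL.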