#### Part 1: Understand what data is available and how STAC organizes it.

1. Connect to the Planetary Computer STAC catalog using `pystac-client`.

2. Find the Landsat Collection 2 Level-2 dataset. Per [Project Pythia](https://projectpythia.org/landsat-ml-cookbook/notebooks/data-ingestion-geospatial/):
> We'll use the `landsat-c2-l2` dataset, which stands for Collection 2 Level-2. It contains data from several landsat missions and has better data quality than Level 1 (`landsat-c2-l1`). Microsoft Planetary Computer has descriptions of [Level 1](https://planetarycomputer.microsoft.com/dataset/landsat-c2-l1) and [Level 2](https://planetarycomputer.microsoft.com/dataset/landsat-c2-l2), but a direct and succinct comparison can be found in [this community post](https://gis.stackexchange.com/questions/439767/landsat-collections), and the information can be verified with [USGS](https://www.usgs.gov/landsat-missions/landsat-collection-2).

3. Explore the metadata by simply examining the collection object. What's the description? STAC version? Check out the links, the ID, the license, keywords, etc. List the assets available in this collection. What bands are available? What other assets (e.g., QA bands, thumbnails) are included?




In [1]:
from pystac_client import Client

catalog = Client.open("https://planetarycomputer.microsoft.com/api/stac/v1")

In [14]:
collection = catalog.get_collection('landsat-c2-l2')
collection.title

'Landsat Collection 2 Level-2'

In [18]:
collection
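
The questions in step 3 can also be answered programmatically rather than by reading the rendered object. The sketch below assumes the collection has first been turned into a plain dictionary (for example with `collection.to_dict()`) and that asset definitions live under an `item_assets` key, as the STAC item-assets extension stores them. `summarize_collection` is a hypothetical helper, not part of `pystac`, and the example dictionary is a small hand-written stand-in, not the real catalog response.

```python
def summarize_collection(coll: dict) -> dict:
    """Pull the headline metadata out of a STAC Collection dictionary."""
    # Assumption: asset definitions are published via the item-assets extension.
    item_assets = coll.get("item_assets", {})
    return {
        "id": coll.get("id"),
        "title": coll.get("title"),
        "stac_version": coll.get("stac_version"),
        "license": coll.get("license"),
        "keywords": coll.get("keywords", []),
        # Map each asset key (band, QA layer, thumbnail, ...) to its title.
        "assets": {key: meta.get("title", "") for key, meta in item_assets.items()},
    }


# Hand-written stand-in for collection.to_dict(); values are illustrative only.
example = {
    "id": "landsat-c2-l2",
    "title": "Landsat Collection 2 Level-2",
    "stac_version": "1.0.0",
    "license": "proprietary",
    "keywords": ["Landsat", "USGS"],
    "item_assets": {
        "red": {"title": "Red Band"},
        "qa_pixel": {"title": "Pixel Quality Assessment Band"},
    },
}

summary = summarize_collection(example)
for key, title in summary["assets"].items():
    print(f"{key:10s} {title}")
```

With the real collection, `summarize_collection(collection.to_dict())` should list every band and auxiliary asset in one place, which makes it easier to tell spectral bands apart from QA layers and thumbnails.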

#### Part 2: Query for imagery over your study area and verify the results make sense.

4. Define a bounding box for Singapore. Hint: we deliberately chose Singapore because it's small enough to fit within a single Landsat tile; this simplifies loading.

5. Set up a Dask cluster for parallel processing. Use `odc.stac.configure_rio(cloud_defaults=True, client=client)` to configure `rasterio` for efficient cloud COG access. This step is critical: without it, loading data from the cloud will be extremely slow.

6. Search for Landsat scenes within your bounding box for 2025. Filter for scenes with less than 50% cloud cover. (Singapore is cloudy; you may need to adjust this threshold to get enough scenes.)

7. Preview your results. How many scenes did you get? Look at some thumbnails to verify you're getting what you expect.

8. Visualize the STAC item footprints on a map. Do the tiles cover your entire bounding box? Are there gaps?

9. Create a time series plot showing cloud cover percentage per scene across the year.
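
Steps 8 and 9 both come down to reading two fields from each item: its `bbox` and its `properties`. The sketch below works on plain item dictionaries (for example `[item.to_dict() for item in items]`). The helper names and the two sample items are hypothetical. The coverage test compares bounding boxes only, so treat it as a first check: a tilted scene footprint can be smaller than its bounding box.

```python
from datetime import datetime


def bbox_covers(footprint, study_area):
    """True if the footprint bbox fully contains the study-area bbox.

    Both arguments are [min_lon, min_lat, max_lon, max_lat].
    """
    f_w, f_s, f_e, f_n = footprint
    s_w, s_s, s_e, s_n = study_area
    return f_w <= s_w and f_s <= s_s and f_e >= s_e and f_n >= s_n


def cloud_cover_series(items):
    """Return (acquisition time, cloud cover %) pairs sorted by time."""
    series = []
    for item in items:
        props = item["properties"]
        # STAC datetimes are RFC 3339 strings; rewrite a trailing "Z"
        # so that datetime.fromisoformat accepts it on older Pythons.
        when = datetime.fromisoformat(props["datetime"].replace("Z", "+00:00"))
        series.append((when, props["eo:cloud_cover"]))
    return sorted(series, key=lambda pair: pair[0])


# Singapore bounding box used in this exercise.
study_area = [103.585144, 1.124902, 104.101501, 1.483032]

# Two made-up items, for illustration only.
sample_items = [
    {"bbox": [102.9, 0.4, 105.0, 2.5],
     "properties": {"datetime": "2025-03-08T03:20:00Z", "eo:cloud_cover": 41.2}},
    {"bbox": [102.9, 0.4, 105.0, 2.5],
     "properties": {"datetime": "2025-01-03T03:20:00Z", "eo:cloud_cover": 12.5}},
]

all_cover = all(bbox_covers(it["bbox"], study_area) for it in sample_items)
series = cloud_cover_series(sample_items)
print("every footprint bbox covers the study area:", all_cover)
for when, cover in series:
    print(when.date(), cover)
```

The sorted pairs can be passed straight to a plotting call, with dates on the x-axis and cloud cover on the y-axis.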

In [28]:

bbox = [103.585144, 1.124902, 104.101501, 1.483032]  # [min_lon, min_lat, max_lon, max_lat]
datetime = "2025-01-01/2025-12-31"  # start/end interval covering all of 2025
cloudy_less_than = 50  # percent

search = catalog.search(
    collections=[collection],
    bbox=bbox,
    datetime=datetime,
    query={"eo:cloud_cover": {"lt": cloudy_less_than}}
)
search

items = search.item_collection()
print(f"Returned {len(items)} Items:")
[[i, item.id] for i, item in enumerate(items)]

Returned 23 Items:


[[0, 'LC08_L2SP_125059_20251205_02_T1'],
 [1, 'LC09_L2SP_125059_20251026_02_T1'],
 [2, 'LC08_L2SP_125059_20251018_02_T1'],
 [3, 'LC08_L2SP_125059_20251002_02_T1'],
 [4, 'LC09_L2SP_125059_20250924_02_T1'],
 [5, 'LC08_L2SP_125059_20250916_02_T2'],
 [6, 'LC08_L2SP_125059_20250831_02_T1'],
 [7, 'LC08_L2SP_125059_20250815_02_T1'],
 [8, 'LC09_L2SP_125059_20250807_02_T1'],
 [9, 'LC08_L2SP_125059_20250730_02_T1'],
 [10, 'LC09_L2SP_125059_20250706_02_T1'],
 [11, 'LC09_L2SP_125059_20250604_02_T1'],
 [12, 'LC08_L2SP_125059_20250527_02_T1'],
 [13, 'LC08_L2SP_125059_20250511_02_T1'],
 [14, 'LC09_L2SP_125059_20250503_02_T1'],
 [15, 'LC08_L2SP_125059_20250425_02_T1'],
 [16, 'LC08_L2SP_125059_20250409_02_T1'],
 [17, 'LC09_L2SP_125059_20250401_02_T1'],
 [18, 'LC08_L2SP_125059_20250308_02_T1'],
 [19, 'LC08_L2SP_125059_20250220_02_T1'],
 [20, 'LC09_L2SP_125059_20250212_02_T1'],
 [21, 'LC09_L2SP_125059_20250127_02_T2'],
 [22, 'LC08_L2SP_125059_20250103_02_T1']]
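
One way to verify that the results make sense is to decode the item IDs themselves. The sketch below assumes every ID follows the six-field pattern visible in the output above: sensor and satellite, processing level, WRS-2 path and row, acquisition date, collection number, and tier. `SceneId` and `parse_scene_id` are hypothetical names introduced for illustration.

```python
from datetime import date, datetime
from typing import NamedTuple


class SceneId(NamedTuple):
    sensor: str      # "C" means the combined OLI/TIRS instruments
    satellite: int   # 8 or 9 in the results above
    level: str       # processing level, e.g. "L2SP"
    wrs_path: int
    wrs_row: int
    acquired: date
    collection: str  # collection number, e.g. "02"
    tier: str        # "T1" or "T2"


def parse_scene_id(item_id: str) -> SceneId:
    """Split an ID such as 'LC08_L2SP_125059_20251205_02_T1' into its fields."""
    platform, level, path_row, acquired, collection, tier = item_id.split("_")
    return SceneId(
        sensor=platform[1],
        satellite=int(platform[2:]),
        level=level,
        wrs_path=int(path_row[:3]),
        wrs_row=int(path_row[3:]),
        acquired=datetime.strptime(acquired, "%Y%m%d").date(),
        collection=collection,
        tier=tier,
    )


# Three IDs copied from the search output above.
ids = [
    "LC08_L2SP_125059_20251205_02_T1",
    "LC08_L2SP_125059_20250916_02_T2",
    "LC09_L2SP_125059_20250127_02_T2",
]
scenes = [parse_scene_id(i) for i in ids]
path_rows = {(s.wrs_path, s.wrs_row) for s in scenes}
non_tier1 = [i for i, s in zip(ids, scenes) if s.tier != "T1"]
print(path_rows)
print(non_tier1)
```

All 23 returned IDs share path/row 125/059, which is consistent with the hint that Singapore fits within a single Landsat tile. Two of them (items 5 and 21) are Tier 2 scenes, which may be worth inspecting before including them in an analysis.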

In [None]:
from odc.stac import configure_rio, stac_load
# Alias the Dask client so it does not shadow pystac_client.Client imported earlier.
from dask.distributed import Client as DaskClient, LocalCluster
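
A minimal sketch of the cluster setup described in step 5 follows. The worker and thread counts are arbitrary assumptions to adjust for your machine, and `start_dask_for_cloud_reads` is a hypothetical wrapper. The `configure_rio` call is the one given in the exercise text.

```python
def start_dask_for_cloud_reads(n_workers=2, threads_per_worker=2):
    """Start a local Dask cluster and configure rasterio for cloud reads."""
    # Imported inside the function so that defining the helper does not
    # itself require dask or odc-stac to be installed.
    from dask.distributed import Client as DaskClient, LocalCluster
    from odc.stac import configure_rio

    # Worker and thread counts are assumptions, not recommendations.
    cluster = LocalCluster(n_workers=n_workers, threads_per_worker=threads_per_worker)
    client = DaskClient(cluster)

    # Apply the cloud-friendly rasterio/GDAL defaults, passing the client
    # so the settings also reach the Dask workers.
    configure_rio(cloud_defaults=True, client=client)
    return client
```

Calling `client = start_dask_for_cloud_reads()` once, before any `stac_load` call, should be enough. The returned client also exposes a `dashboard_link` attribute for watching task progress.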