## Reading Data from the STAC API

The Planetary Computer catalogs the datasets we host using the [STAC](http://stacspec.org/) (SpatioTemporal Asset Catalog) specification. We provide a [STAC API](https://github.com/radiantearth/stac-api-spec) endpoint for searching our datasets by space, time, and more. This quickstart will show you how to search for data using our STAC API and open-source Python libraries. To use our STAC API from R, see [Reading data from the STAC API with R](https://planetarycomputer.microsoft.com/docs/quickstarts/reading-stac-r/).

To get started you'll need the [pystac-client](https://github.com/stac-utils/pystac-client) library installed. You can install it via pip:

```
> python -m pip install pystac-client
> python -m pip install planetary_computer
> python -m pip install rioxarray
```

To access the data, we'll create a `pystac_client.Client`. We'll explain the `modifier` part later on, but it's what lets us download the data assets from Azure Blob Storage.

In [None]:
! python -m pip install pystac-client
! python -m pip install planetary_computer
! python -m pip install rioxarray

In [None]:
import pystac_client
import planetary_computer

catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1",
    modifier=planetary_computer.sign_inplace,
)

### Searching

We can use the STAC API to search for assets meeting some criteria. This might include the date and time the asset covers, its spatial extent, or any other property captured in the STAC item's metadata.

In this example we'll search for imagery from [Landsat Collection 2 Level-2](https://planetarycomputer.microsoft.com/dataset/landsat-c2-l2) over the area around Muret (31) in June 2024.

The bounding box used below covers Ox (31).

In [None]:
time_range = "2024-06-01/2024-06-30"
# Bounding box (min lon, min lat, max lon, max lat) around Ox (31)
bbox = [1.283083, 43.439552, 1.302631, 43.451268]

search = catalog.search(collections=["landsat-c2-l2"], bbox=bbox, datetime=time_range)
# item_collection() replaces the deprecated get_all_items()
items = search.item_collection()
len(items)
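
The `datetime` value above is an interval written as `start/end`. As a minimal sketch, the helper below builds such a string for any calendar month using only the standard library. The name `month_range` is ours for illustration and is not part of pystac-client.

```python
import calendar


def month_range(year: int, month: int) -> str:
    """Return a 'YYYY-MM-DD/YYYY-MM-DD' interval covering one calendar month."""
    # monthrange returns (weekday of the first day, number of days in the month)
    _, last_day = calendar.monthrange(year, month)
    return f"{year:04d}-{month:02d}-01/{year:04d}-{month:02d}-{last_day:02d}"


print(month_range(2024, 6))  # 2024-06-01/2024-06-30
```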

In that example our spatial query used a bounding box, passed as `bbox`. Alternatively, you can pass a GeoJSON object as `intersects`.

```python
area_of_interest = {
    "type": "Polygon",
    "coordinates": [
        [
            [-122.2751, 47.5469],
            [-121.9613, 47.5469],
            [-121.9613, 47.9613],
            [-122.2751, 47.9613],
            [-122.2751, 47.5469],
        ]
    ],
}

time_range = "2020-12-01/2020-12-31"

search = catalog.search(
    collections=["landsat-c2-l2"], intersects=area_of_interest, datetime=time_range
)
```
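
When only a bounding box is at hand, an equivalent GeoJSON polygon can be built from it. This is a minimal sketch: `bbox_to_polygon` is a hypothetical helper, not a pystac-client function, and it assumes the box does not cross the antimeridian.

```python
def bbox_to_polygon(bbox):
    """Convert [min_lon, min_lat, max_lon, max_lat] to a GeoJSON Polygon dict."""
    min_lon, min_lat, max_lon, max_lat = bbox
    ring = [
        [min_lon, min_lat],
        [max_lon, min_lat],
        [max_lon, max_lat],
        [min_lon, max_lat],
        [min_lon, min_lat],  # a GeoJSON ring must end where it starts
    ]
    return {"type": "Polygon", "coordinates": [ring]}


polygon = bbox_to_polygon([1.283083, 43.439552, 1.302631, 43.451268])
print(polygon["coordinates"][0][0] == polygon["coordinates"][0][-1])  # True
```

The resulting dictionary has the same shape as `area_of_interest` and could be passed as `intersects`.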

`items` is a [`pystac.ItemCollection`](https://pystac.readthedocs.io/en/stable/api/item_collection.html#pystac-item-collection). Its length tells us how many items matched our search criteria.

In [None]:
len(items)

Each [`pystac.Item`](https://pystac.readthedocs.io/en/stable/api/pystac.html#pystac.Item) in this `ItemCollection` includes all the metadata for that scene. [STAC Items](https://github.com/radiantearth/stac-spec/blob/master/item-spec/item-spec.md) are GeoJSON features, and so can be loaded by libraries like [geopandas](http://geopandas.readthedocs.io/).

In [None]:
import geopandas

df = geopandas.GeoDataFrame.from_features(items.to_dict(), crs="epsg:4326")
df

Some collections implement the `eo` extension, which we can use to sort the items by cloudiness. We'll grab an item with low cloudiness:

In [None]:
selected_item = min(items, key=lambda item: item.properties["eo:cloud_cover"])
print(selected_item)
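
Not every item is guaranteed to carry `eo:cloud_cover`, in which case the `min` call above raises a `KeyError`. The sketch below shows one defensive variant on plain dictionaries standing in for item properties. The sample values are made up for illustration, and `least_cloudy` is a hypothetical helper.

```python
# Made-up stand-ins for `item.properties`; real items come from the search.
sample_properties = [
    {"id": "scene-a", "eo:cloud_cover": 42.5},
    {"id": "scene-b"},  # no cloud cover reported
    {"id": "scene-c", "eo:cloud_cover": 3.1},
]


def least_cloudy(properties_list):
    """Return the entry with the lowest cloud cover, ignoring entries without one."""
    with_cover = [p for p in properties_list if "eo:cloud_cover" in p]
    if not with_cover:
        return None
    return min(with_cover, key=lambda p: p["eo:cloud_cover"])


print(least_cloudy(sample_properties)["id"])  # scene-c
```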

Each STAC item has one or more [Assets](https://github.com/radiantearth/stac-spec/blob/master/item-spec/item-spec.md#asset-object), which include links to the actual files.

In [None]:
import rich.table

table = rich.table.Table("Asset Key", "Description")
for asset_key, asset in selected_item.assets.items():
    table.add_row(asset_key, asset.title)

table

Here, we'll inspect the `rendered_preview` asset.

In [None]:
selected_item.assets["rendered_preview"].to_dict()

In [None]:
from IPython.display import Image

Image(url=selected_item.assets["rendered_preview"].href, width=500)

That `rendered_preview` asset is generated dynamically from the raw data using the Planetary Computer's [data API](http://planetarycomputer.microsoft.com/api/data/v1/). We can access the raw data, stored as Cloud Optimized GeoTIFFs in Azure Blob Storage, using one of the other assets.

The actual data assets are in *private* [Azure Blob Storage containers](https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction#containers). If you forget to pass `modifier=planetary_computer.sign_inplace` or to sign the item manually, you'll get a 404 when trying to access the asset.

That's why we included the `modifier=planetary_computer.sign_inplace` when we created the `pystac_client.Client` earlier. With that, the results returned by pystac-client are automatically signed, so that a token granting access to the file is included in the URL.

I want to save this image to S3.  
**Exporting data to MinIO**  
 * See Onyxia / My Account / Storage connection (Mon Compte / Connexion au stockage)

In [None]:
# the environment variables are set in utils.py
import utils

In [None]:
# s3fs configuration
import os

import s3fs

fs = s3fs.S3FileSystem(
    client_kwargs={"endpoint_url": "https://" + "minio.lab.sspcloud.fr"},
    key=os.environ["AWS_ACCESS_KEY_ID"],
    secret=os.environ["AWS_SECRET_ACCESS_KEY"],
    token=os.environ["AWS_SESSION_TOKEN"],
)

In [None]:
# Approach suggested by ChatGPT
import requests
from io import BytesIO

# AWS configuration (optional if your credentials are already configured) => see Onyxia / My Files
BUCKET_NAME = "bballet/test_sat"
S3_FILENAME = "image_20250409.jpg"  # name under which the image will be stored

# URL of the image
image_url = selected_item.assets["rendered_preview"].href

# Download the image from the URL
response = requests.get(image_url)
if response.status_code == 200:
    image_data = BytesIO(response.content)  # wrap the bytes in an in-memory buffer

    # S3FS was already initialised above
    # fs = s3fs.S3FileSystem()

    # Save the image to S3
    with fs.open(f"{BUCKET_NAME}/{S3_FILENAME}", "wb") as f:
        f.write(image_data.getbuffer())

    print(f"Image uploaded successfully to S3: s3://{BUCKET_NAME}/{S3_FILENAME}")
else:
    print("Failed to download the image")


In [None]:
selected_item.assets["blue"].href[:250]

Everything after the `?` in that URL is a [SAS token](https://docs.microsoft.com/en-us/azure/storage/common/storage-sas-overview) that grants access to the data. See https://planetarycomputer.microsoft.com/docs/concepts/sas/ for more on using tokens to access data.
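
To see what a signed URL carries, its query string can be split with the standard library. This is a sketch on a made-up URL (the account, path, and signature are placeholders). In Azure SAS tokens the `se` parameter holds the expiry time and `sp` the permissions. The timestamp format used for parsing is an assumption about how the token is written.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlsplit

# Placeholder URL for illustration only; a real one comes from a signed asset href.
example_href = (
    "https://example.blob.core.windows.net/container/scene/blue.tif"
    "?st=2024-06-01T00%3A00%3A00Z&se=2024-06-02T00%3A00%3A00Z&sp=rl&sig=PLACEHOLDER"
)


def sas_expiry(href: str) -> datetime:
    """Return the expiry time encoded in the `se` parameter of a signed URL."""
    params = parse_qs(urlsplit(href).query)
    expiry_text = params["se"][0]  # e.g. 2024-06-02T00:00:00Z
    # Assumed format: UTC timestamp with a trailing 'Z'
    return datetime.strptime(expiry_text, "%Y-%m-%dT%H:%M:%SZ").replace(
        tzinfo=timezone.utc
    )


print(sas_expiry(example_href).isoformat())  # 2024-06-02T00:00:00+00:00
```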

In [None]:
import requests

requests.head(selected_item.assets["blue"].href).status_code

The `200` status code indicates that we were able to successfully access the data using the "signed" URL with the SAS token included.

We can load up that single COG using libraries like [rioxarray](https://corteva.github.io/rioxarray/html/rioxarray.html) or [rasterio](https://rasterio.readthedocs.io/en/latest/).

In [None]:
# import xarray as xr
import rioxarray

ds = rioxarray.open_rasterio(
    selected_item.assets["blue"].href, overview_level=4
).squeeze()
img = ds.plot(cmap="Blues", add_colorbar=False)
img.axes.set_axis_off();

If you wish to work with multiple STAC items as a datacube, you can use libraries like [stackstac](https://stackstac.readthedocs.io/) or [odc-stac](https://odc-stac.readthedocs.io/en/latest/index.html).

In [None]:
! python -m pip install stackstac

In [None]:
# global configuration cell
execution_autorisee = False

In [None]:
# ChatGPT 1. Check the CRS of each asset before stacking
if not execution_autorisee:
    raise RuntimeError("⛔ This cell is disabled. Set 'execution_autorisee' to True to run it.")

for i, item in enumerate(items):
    print(f"Item {i}: {item.id}")
    for asset_key, asset in item.assets.items():
        print(f"  - Asset: {asset_key}, CRS: {asset.extra_fields.get('proj:epsg', 'Not defined')}")

print("Cell enabled!")

In [None]:
# C_GPT 2. Add a default CRS if missing
# Caution: EPSG:4326 is only a fallback and may not match the asset's real projection
for item in items:
    for asset in item.assets.values():
        if "proj:epsg" not in asset.extra_fields:
            asset.extra_fields["proj:epsg"] = 4326  # assign EPSG:4326 if missing
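
Before stacking, it can help to know whether the scenes actually share one projection rather than overwriting missing values. The sketch below counts EPSG codes over plain dictionaries standing in for item properties. It assumes the code is stored under `proj:epsg`, the sample values are invented, and `epsg_counts` is a hypothetical helper.

```python
from collections import Counter

# Invented stand-ins for `item.properties`
sample_properties = [
    {"id": "scene-a", "proj:epsg": 32631},
    {"id": "scene-b", "proj:epsg": 32631},
    {"id": "scene-c", "proj:epsg": 32630},
    {"id": "scene-d"},  # projection not reported
]


def epsg_counts(properties_list):
    """Count how many entries use each EPSG code (None when it is missing)."""
    return Counter(p.get("proj:epsg") for p in properties_list)


counts = epsg_counts(sample_properties)
print(counts.most_common(1)[0])  # (32631, 2)
```

If more than one code shows up, passing an explicit `epsg` argument to `stackstac.stack` is one option to consider.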


In [None]:
import stackstac

ds = stackstac.stack(items)
ds

### Searching on additional properties

Previously, we searched for items by space and time. Because the Planetary Computer's STAC API supports the [query](https://github.com/radiantearth/stac-api-spec/blob/master/fragments/query/README.md) parameter, you can search on additional properties on the STAC item.

For example, collections like `sentinel-2-l2a` and `landsat-c2-l2` both implement the [`eo` STAC extension](https://github.com/stac-extensions/eo) and include an `eo:cloud_cover` property. Use `query={"eo:cloud_cover": {"lt": 20}}` to return only items that are less than 20% cloudy.

In [None]:
time_range = "2024-06-01/2024-06-30"
# Bounding box (min lon, min lat, max lon, max lat) around Ox (31)
bbox = [1.283083, 43.439552, 1.302631, 43.451268]

search = catalog.search(
    collections=["sentinel-2-l2a"],
    bbox=bbox,
    datetime=time_range,
    query={"eo:cloud_cover": {"lt": 20}},
)
items = search.item_collection()

In [None]:
selected_item = min(items, key=lambda item: item.properties["eo:cloud_cover"])
print(selected_item)

In [None]:
print(selected_item.assets["visual"].href)

In [None]:
import requests
from IPython.display import Image, display

url = selected_item.assets["visual"].href
response = requests.get(url)

if response.status_code == 200:
    # Note: the "visual" asset is a GeoTIFF, so the .jpg name is misleading
    with open("temp_image.jpg", "wb") as f:
        f.write(response.content)
    display(Image("temp_image.jpg", width=500))
else:
    print("Image not accessible:", response.status_code)

# Save the image to S3
if response.status_code == 200:
    with fs.open(f"{BUCKET_NAME}/{S3_FILENAME}", "wb") as f:
        # Write the bytes just downloaded, not the earlier `image_data` buffer
        f.write(response.content)

    print(f"Image uploaded successfully to S3: s3://{BUCKET_NAME}/{S3_FILENAME}")
else:
    print("Failed to download the image")


Confirmed problem:
`IPython.display.Image(url=...)` cannot directly display an image in .tif format, especially a GeoTIFF rich in spatial metadata.

In [None]:
import requests
import rasterio
import matplotlib.pyplot as plt
from rasterio.plot import show

# Download the GeoTIFF
url = selected_item.assets["visual"].href
response = requests.get(url)
response.raise_for_status()

with open("image.tif", "wb") as f:
    f.write(response.content)

# Read with rasterio; keep the transform while the dataset is still open
with rasterio.open("image.tif") as src:
    img = src.read([1, 2, 3])  # the 3 RGB bands (order B04, B03, B02 in Sentinel-2)
    transform = src.transform

# Display with matplotlib, drawing on an explicit axis so the title applies
fig, ax = plt.subplots(figsize=(10, 10))
show(img, transform=transform, ax=ax)
ax.set_axis_off()
ax.set_title("Sentinel-2 RGB image")
plt.show()


Another common use of the `query` parameter is to filter a collection down to items of a specific type. For example, the [GOES-CMI](https://planetarycomputer.microsoft.com/dataset/goes-cmi) collection includes images taken while the satellite is in various modes, which produce images of either the full disk of the Earth, the continental United States, or a mesoscale region. You can use `goes:image-type` to filter down to just the ones you want.

In [None]:
search = catalog.search(
    collections=["goes-cmi"],
    bbox=[-67.2729, 25.6000, -61.7999, 27.5423],
    datetime=["2018-09-11T13:00:00Z", "2018-09-11T15:40:00Z"],
    query={"goes:image-type": {"eq": "MESOSCALE"}},
)
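
The `query` filters are evaluated on the server, but their meaning is easy to mirror locally. The following sketch applies a query-style dictionary to plain property dictionaries. It covers only the `eq`, `lt`, `lte`, `gt`, and `gte` operators, the property values are invented, and `matches` is a hypothetical helper rather than part of any STAC library.

```python
import operator

# Subset of comparison operators, keyed by their names in a query dictionary
OPERATORS = {
    "eq": operator.eq,
    "lt": operator.lt,
    "lte": operator.le,
    "gt": operator.gt,
    "gte": operator.ge,
}


def matches(properties, query):
    """Return True if `properties` satisfies every condition in `query`."""
    for name, conditions in query.items():
        if name not in properties:
            return False
        for op_name, expected in conditions.items():
            if not OPERATORS[op_name](properties[name], expected):
                return False
    return True


# Invented property values for illustration
scenes = [
    {"eo:cloud_cover": 12.0, "goes:image-type": "MESOSCALE"},
    {"eo:cloud_cover": 55.0, "goes:image-type": "FULL DISK"},
]
print([matches(s, {"eo:cloud_cover": {"lt": 20}}) for s in scenes])  # [True, False]
```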

### Analyzing STAC Metadata

STAC items are proper GeoJSON Features, and so can be treated as a kind of data on their own.

In [None]:
! python -m pip install contextily

In [None]:
import contextily
import geopandas

search = catalog.search(
    collections=["sentinel-2-l2a"],
    bbox=[-124.2751, 45.5469, -110.9613, 47.7458],
    datetime="2020-12-26/2020-12-31",
)
items = search.item_collection()

df = geopandas.GeoDataFrame.from_features(items.to_dict(), crs="epsg:4326")

ax = df[["geometry", "datetime", "s2:mgrs_tile", "eo:cloud_cover"]].plot(
    facecolor="none", figsize=(12, 6)
)
contextily.add_basemap(
    ax, crs=df.crs.to_string(), source=contextily.providers.Esri.NatGeoWorldMap
);

In [None]:
df[["geometry", "datetime", "s2:mgrs_tile", "eo:cloud_cover"]].head()

In [None]:
df.sort_values("eo:cloud_cover")[["datetime", "s2:mgrs_tile", "eo:cloud_cover"]]

Or we can plot cloudiness of a region over time.

In [None]:
import pandas as pd

search = catalog.search(
    collections=["sentinel-2-l2a"],
    bbox=[-124.2751, 45.5469, -123.9613, 45.7458],
    datetime="2020-01-01/2020-12-31",
)
items = search.item_collection()
df = geopandas.GeoDataFrame.from_features(items.to_dict())
df["datetime"] = pd.to_datetime(df["datetime"])

ts = df.set_index("datetime").sort_index()["eo:cloud_cover"].rolling(7).mean()
ts.plot(title="eo:cloud-cover (7-scene rolling average)");
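
The `rolling(7).mean()` call averages each scene with the six before it and leaves the first six positions empty. A pure-Python sketch of the same idea, shown with a window of 3 on invented values (`rolling_mean` is a hypothetical helper):

```python
def rolling_mean(values, window):
    """Trailing moving average; positions without a full window yield None."""
    result = []
    for i in range(len(values)):
        if i + 1 < window:
            result.append(None)  # pandas would show NaN here
        else:
            chunk = values[i + 1 - window : i + 1]
            result.append(sum(chunk) / window)
    return result


cloud_cover = [10.0, 20.0, 30.0, 40.0]  # invented values
print(rolling_mean(cloud_cover, 3))  # [None, None, 20.0, 30.0]
```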

### Working with STAC Catalogs and Collections

Our `catalog` is a [STAC Catalog](https://github.com/radiantearth/stac-spec/blob/master/catalog-spec/catalog-spec.md) that we can crawl or search. The Catalog contains [STAC Collections](https://github.com/radiantearth/stac-spec/blob/master/collection-spec/collection-spec.md) for each dataset we have indexed (which is not yet the entirety of data hosted by the Planetary Computer).

Collections have information about the [STAC Items](https://github.com/radiantearth/stac-spec/blob/master/item-spec/item-spec.md) they contain. For instance, here we look at the [Bands](https://github.com/stac-extensions/eo#band-object) available for [Landsat 8 Collection 2 Level 2](https://planetarycomputer.microsoft.com/dataset/landsat-c2-l2) data:

In [None]:
import pandas as pd

landsat = catalog.get_collection("landsat-c2-l2")

pd.DataFrame(landsat.summaries.get_list("eo:bands"))

We can see what [Assets](https://github.com/radiantearth/stac-spec/blob/master/item-spec/item-spec.md#asset-object) are available on our item with:

In [None]:
pd.DataFrame.from_dict(landsat.extra_fields["item_assets"], orient="index")[
    ["title", "description", "gsd"]
]

Some collections, like [Daymet](https://planetarycomputer.microsoft.com/dataset/daymet-daily-na), include collection-level assets. You can use the `.assets` property to access those assets.

In [None]:
collection = catalog.get_collection("daymet-daily-na")
print(collection)

Just like assets on items, these assets include links to data in Azure Blob Storage.

In [None]:
asset = collection.assets["zarr-abfs"]
print(asset)

In [None]:
import xarray as xr

ds = xr.open_zarr(
    asset.href,
    **asset.extra_fields["xarray:open_kwargs"],
    storage_options=asset.extra_fields["xarray:storage_options"],
)
ds

### Manually signing assets

Earlier on, when we created our `pystac_client.Client`, we specified `modifier=planetary_computer.sign_inplace`. That `modifier` will automatically "sign" the STAC metadata, so that the assets can be accessed.

Alternatively, you can manually sign the items.

In [None]:
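import pystac
import requests

# `selected_item` must be a Landsat item here: the Sentinel-2 item selected
# in the previous section has no "blue" asset.
item = pystac.read_file(selected_item.get_self_href())
signed_item = planetary_computer.sign(item)  # these assets can be accessed
requests.head(signed_item.assets["blue"].href).status_code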

Internally, the `planetary_computer.sign` function makes a request to the Planetary Computer's [SAS API](http://planetarycomputer.microsoft.com/api/sas/v1/docs) to get a signed HREF for each asset. You could do that yourself.

In [None]:
collection = item.get_collection()
storage_account = collection.extra_fields["msft:storage_account"]
container = collection.extra_fields["msft:container"]

response = requests.get(
    f"https://planetarycomputer.microsoft.com/api/sas/v1/token/{collection.id}"
)

signed_url = item.assets["blue"].href + "?" + response.json()["token"]

requests.head(signed_url).status_code
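
Concatenating with `?` works when the href has no query string of its own. A slightly more careful sketch, using a hypothetical helper `append_token` and placeholder values, picks the separator based on the URL:

```python
from urllib.parse import urlsplit


def append_token(href: str, token: str) -> str:
    """Append a SAS token to an href, using '&' if a query string already exists."""
    separator = "&" if urlsplit(href).query else "?"
    return f"{href}{separator}{token}"


# Placeholder values; a real token comes from the SAS API response
href = "https://example.blob.core.windows.net/container/scene/blue.tif"
print(append_token(href, "se=2024-06-02&sig=PLACEHOLDER"))
```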

See https://planetarycomputer.microsoft.com/docs/concepts/sas/ for more on how to manually sign assets.