## TEMPO acquisition: New York City region, November 7–10, 2024

This notebook focuses **only on data acquisition** for TEMPO (Tropospheric Emissions: Monitoring of Pollution).

### What is TEMPO?

TEMPO is a **geostationary** air-quality mission that measures atmospheric trace gases over North America during daylight hours at high temporal frequency. TEMPO products include nitrogen dioxide (NO₂) and other constituents, provided as Level-2 (swath) and Level-3 (gridded) data products.

TEMPO data are especially useful for:

- Studying **hourly daytime variability** in air pollution,
- Adding a **satellite remote-sensing layer** to complement ground sensors,
- Supporting event-window analysis across a metro region.

As with any remote-sensing product, TEMPO data require careful QA/QC and interpretation. Those steps are handled later in the lesson’s `m201-air-quality-measures-integrated` notebook.

### What this notebook does

- Authenticates to NASA Earthdata using `earthaccess`,
- Searches for TEMPO granules that intersect a New York City bounding box for the event window (Nov 7–10, 2024),
- Downloads the matching files to `data/raw/tempo/`,
- Performs a minimal “sanity check” by opening one file and listing variables (no plotting or analysis).


## Earthdata access

TEMPO data are accessed via NASA Earthdata.

- If you have not used Earthdata before, create a free Earthdata Login account.
- This notebook uses the `earthaccess` library, which will prompt you to authenticate.
- Download volume depends on the number of granules returned for the event window.


In [1]:
# Install required packages (run once per environment)
# Note: earthaccess handles authentication/search/download for NASA Earthdata.
!pip -q install earthaccess xarray rioxarray netcdf4 h5netcdf pandas


You should consider upgrading via the 'C:\Program Files\Python310\python.exe -m pip install --upgrade pip' command.


In [2]:
# Imports
from pathlib import Path
import pandas as pd
import xarray as xr

import earthaccess

# Paths (match the pattern used in the PurpleAir/QuantAQ acquisition notebooks)
BASE_DIR = Path(".")
DATA_DIR = BASE_DIR / "data"
RAW_DIR = DATA_DIR / "raw" / "tempo"
RAW_DIR.mkdir(parents=True, exist_ok=True)

# Event window (UTC dates; Earthdata temporal filters are date strings)
EVENT_START = "2024-11-07"
EVENT_END   = "2024-11-10"

# NYC bounding box (min_lon, min_lat, max_lon, max_lat)
# (This is used as a search filter; it does NOT guarantee the files are spatially subset.)
BBOX_NYC = (-74.2591, 40.4774, -73.7004, 40.9176)

print("RAW_DIR:", RAW_DIR.resolve())
print("Event window:", EVENT_START, "to", EVENT_END)
print("NYC bbox:", BBOX_NYC)


RAW_DIR: C:\git\TOPSTSCHOOL-air-quality\data\raw\tempo
Event window: 2024-11-07 to 2024-11-10
NYC bbox: (-74.2591, 40.4774, -73.7004, 40.9176)


In [14]:
# Authenticate (you may be prompted for Earthdata Login)
Auth = earthaccess.login()

# Search granules
# Common TEMPO collection names include TEMPO_NO2_L3 (gridded) and TEMPO_NO2_L2 (swath),
# with versions such as V03.
from datetime import datetime, timedelta, timezone

SHORT_NAME = "TEMPO_NO2_L3"
VERSION = "V03"

# Pick ONE illustrative snapshot within the event window.
# (Choose a daytime UTC that maps to late morning / afternoon in NYC.)
TARGET_UTC = datetime(2024, 11, 9, 20, 0, 0, tzinfo=timezone.utc)  # ~6am NYC (EST)

# Search ±30 minutes around the target
t0 = (TARGET_UTC - timedelta(minutes=30)).strftime("%Y-%m-%dT%H:%M:%SZ")
t1 = (TARGET_UTC + timedelta(minutes=30)).strftime("%Y-%m-%dT%H:%M:%SZ")

print("Searching TEMPO window:", t0, "to", t1)

results = earthaccess.search_data(
    short_name=SHORT_NAME,
    version=VERSION,
    temporal=(t0, t1),
    bounding_box=BBOX_NYC,
    cloud_hosted=True,
    count=200,
)

print("Granules found:", results)
if len(results) == 0:
    raise RuntimeError("No TEMPO granules found for the chosen snapshot window. Try a different TARGET_UTC.")

# Download ONLY the first match (or choose by title/date if you prefer)
one = [results[0]]

paths = earthaccess.download(one, RAW_DIR)
print("Downloaded:", paths[0])


Searching TEMPO window: 2024-11-09T19:30:00Z to 2024-11-09T20:30:00Z
Granules found: [Collection: {'ShortName': 'TEMPO_NO2_L3', 'Version': 'V03'}
Spatial coverage: {'HorizontalSpatialDomain': {'Geometry': {'GPolygons': [{'Boundary': {'Points': [{'Latitude': 57.66, 'Longitude': -111.48}, {'Latitude': 54.36, 'Longitude': -109.54}, {'Latitude': 49.28, 'Longitude': -107.26}, {'Latitude': 44.6, 'Longitude': -105.68}, {'Latitude': 39.52, 'Longitude': -104.36}, {'Latitude': 34.16, 'Longitude': -103.3}, {'Latitude': 28.68, 'Longitude': -102.48}, {'Latitude': 23.08, 'Longitude': -101.86}, {'Latitude': 17.24, 'Longitude': -101.4}, {'Latitude': 17.18, 'Longitude': -86.16}, {'Latitude': 17.28, 'Longitude': -74.36}, {'Latitude': 17.44, 'Longitude': -65.24001}, {'Latitude': 21.28, 'Longitude': -64.4}, {'Latitude': 25.64, 'Longitude': -63.2}, {'Latitude': 29.6, 'Longitude': -61.84}, {'Latitude': 33.18, 'Longitude': -60.34}, {'Latitude': 36.44, 'Longitude': -58.7}, {'Latitude': 39.48, 'Longitude': -56

QUEUEING TASKS | :   0%|          | 0/1 [00:00<?, ?it/s]

PROCESSING TASKS | :   0%|          | 0/1 [00:00<?, ?it/s]

COLLECTING RESULTS | :   0%|          | 0/1 [00:00<?, ?it/s]

Downloaded: data\raw\tempo\TEMPO_NO2_L3_V03_20241109T192221Z_S010.nc


In [8]:
# Quality Assurance: open one file and list variables

from pathlib import Path
import xarray as xr

if len(paths) == 0:
    raise RuntimeError("No TEMPO files were downloaded. Check authentication, short_name/version, and search filters.")

sample_path = Path(paths[0])
print("Opening sample file:", sample_path)

def try_open_dataset(p):
    """Try multiple engines; return (ds, engine_used) or (None, None)."""
    for engine in [None, "netcdf4", "h5netcdf"]:
        try:
            if engine is None:
                ds = xr.open_dataset(p)
                return ds, "default"
            else:
                ds = xr.open_dataset(p, engine=engine)
                return ds, engine
        except Exception as e:
            print(f"Failed open_dataset with engine={engine}: {type(e).__name__}: {e}")
    return None, None

ds, engine_used = try_open_dataset(sample_path)

if ds is not None:
    try:
        print("\nOpened with engine:", engine_used)
        print("\nDataset summary:")
        print(ds)

        print("\nDims:")
        print(dict(ds.dims))

        print("\nCoords:")
        print(list(ds.coords))

        print("\nData variables (name -> dims, dtype):")
        for v in ds.data_vars:
            da = ds[v]
            print(f" - {v}: dims={da.dims}, dtype={da.dtype}")

    finally:
        ds.close()

else:
    # Fallback: if the file is organized into groups, DataTree can expose them.
    # (Only available in newer xarray; if missing, this block will tell you.)
    try:
        from xarray import open_datatree
        dt = open_datatree(sample_path, engine="h5netcdf")
        print("\nOpened as a DataTree (file contains groups).")
        print("\nGroups:")
        for k in dt.groups:
            print(" -", k)
        # Show variables in root group, if any
        root_ds = dt.ds
        print("\nRoot dataset variables:")
        print(list(root_ds.data_vars))
    except Exception as e:
        raise RuntimeError(
            "Could not open the downloaded file with xarray as a Dataset or DataTree.\n"
            "Try installing/upgrading: netcdf4, h5netcdf, h5py, xarray.\n"
            f"Last error: {type(e).__name__}: {e}"
        )


Opening sample file: data\raw\tempo\TEMPO_NO2_L3_V03_20241107T122155Z_S002.nc

Opened with engine: default

Dataset summary:
<xarray.Dataset> Size: 91MB
Dimensions:    (longitude: 7750, latitude: 2950, time: 1)
Coordinates:
  * longitude  (longitude) float32 31kB -168.0 -168.0 -167.9 ... -13.03 -13.01
  * latitude   (latitude) float32 12kB 14.01 14.03 14.05 ... 72.95 72.97 72.99
  * time       (time) datetime64[ns] 8B 2024-11-07T12:22:13.024619520
Data variables:
    weight     (latitude, longitude) float32 91MB ...
Attributes: (12/40)
    history:                          2024-11-07T18:30:53Z: L2_regrid -v /tem...
    scan_num:                         2
    time_coverage_start:              2024-11-07T12:21:55Z
    time_coverage_end:                2024-11-07T13:01:41Z
    time_coverage_start_since_epoch:  1415017333.0246196
    time_coverage_end_since_epoch:    1415019719.6191332
    ...                               ...
    title:                            TEMPO Level 3 nitrogen di

  print(dict(ds.dims))


In [9]:
import h5py

p = str(sample_path)
print("Inspecting HDF5 groups/variables:", p)

with h5py.File(p, "r") as f:
    def walk(name, obj):
        if isinstance(obj, h5py.Dataset):
            print(f"DATASET: {name}  shape={obj.shape} dtype={obj.dtype}")
        elif isinstance(obj, h5py.Group):
            # print group names at a shallow level
            pass

    f.visititems(walk)


Inspecting HDF5 groups/variables: data\raw\tempo\TEMPO_NO2_L3_V03_20241107T122155Z_S002.nc
DATASET: geolocation/relative_azimuth_angle  shape=(1, 2950, 7750) dtype=float32
DATASET: geolocation/solar_zenith_angle  shape=(1, 2950, 7750) dtype=float32
DATASET: geolocation/viewing_zenith_angle  shape=(1, 2950, 7750) dtype=float32
DATASET: latitude  shape=(2950,) dtype=float32
DATASET: longitude  shape=(7750,) dtype=float32
DATASET: product/main_data_quality_flag  shape=(1, 2950, 7750) dtype=int16
DATASET: product/vertical_column_stratosphere  shape=(1, 2950, 7750) dtype=float64
DATASET: product/vertical_column_troposphere  shape=(1, 2950, 7750) dtype=float64
DATASET: product/vertical_column_troposphere_uncertainty  shape=(1, 2950, 7750) dtype=float64
DATASET: qa_statistics/max_vertical_column_stratosphere_sample  shape=(1, 2950, 7750) dtype=float64
DATASET: qa_statistics/max_vertical_column_total_sample  shape=(1, 2950, 7750) dtype=float64
DATASET: qa_statistics/max_vertical_column_troposp