This notebook wraps a remote Zarr store in an in-memory cache store, so repeated reads of the same data are served from memory instead of being fetched again from S3.

1. Load dependencies

In [34]:
import zarr
from zarr.storage import MemoryStore, FsspecStore
import numpy as np
import pandas as pd
from zarr.experimental.cache_store import CacheStore

2. Define remote and cached stores

In [35]:
# Read-only, anonymous access to the public NASA POWER bucket
source_store = FsspecStore.from_url(
    's3://nasa-power/merra2/spatial/power_merra2_daily_spatial_utc.zarr',
    read_only=True,
    storage_options={'anon': True}
)
# In-memory store that holds cached copies of data fetched from S3
cache_store = MemoryStore()
cached_store = CacheStore(
    store=source_store,
    cache_store=cache_store,
    max_size=256 * 1024 * 1024  # 256 MiB cache budget
)
group = zarr.open_group(store=cached_store, mode='r')
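
`CacheStore` lives in `zarr.experimental`, and this notebook uses it as a black box. As a rough mental model only, here is a minimal sketch of a read-through cache with a byte budget and least-recently-used (LRU) eviction. `ByteBoundedLRUCache` and its `fetch` callable are hypothetical names, and treating LRU as the policy `CacheStore` uses is an assumption; this is not zarr's implementation.

```python
from collections import OrderedDict
from typing import Callable


class ByteBoundedLRUCache:
    """Hypothetical illustration, not zarr's CacheStore: a read-through cache capped in bytes."""

    def __init__(self, fetch: Callable[[str], bytes], max_size: int) -> None:
        self._fetch = fetch        # called on a miss (stands in for a remote GET)
        self._max_size = max_size  # upper bound on the total cached bytes
        self._entries: "OrderedDict[str, bytes]" = OrderedDict()
        self._current_size = 0
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> bytes:
        if key in self._entries:
            # Hit: mark the entry as most recently used and serve it from memory.
            self._entries.move_to_end(key)
            self.hits += 1
            return self._entries[key]
        # Miss: go to the underlying source.
        self.misses += 1
        value = self._fetch(key)
        if len(value) <= self._max_size:
            self._entries[key] = value
            self._current_size += len(value)
            # Evict least recently used entries until the budget is respected.
            while self._current_size > self._max_size:
                _, evicted = self._entries.popitem(last=False)
                self._current_size -= len(evicted)
        return value
```

For example, with 100-byte values and `max_size=250`, at most two entries fit: reading keys `a, b, a, c, b` gives one hit (the second `a`) and four misses, because loading `c` evicts `b`, and reloading `b` then evicts `a`.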

3. Examine the group

In [36]:
# List the arrays in the group with their long names
print("Arrays in the group:")
for name, array in group.arrays():
    print(f"- {name}: {array.metadata.attributes['long_name']}")

# Get detailed info about a specific array
temperature_array = group['T2M']
print(f"T2M definition: {temperature_array.metadata.attributes['definition']}")

Arrays in the group:
- CDD0: Cooling Degree Days Above 0 C
- CDD10: Cooling Degree Days Above 10 C
- CDD18_3: Cooling Degree Days Above 18.3 C
- DISPH: Zero Plane Displacement Height
- EVLAND: Evaporation Land
- EVPTRNS: Evapotranspiration Energy Flux
- FROST_DAYS: Frost Days
- FRSEAICE: Ice Covered Fraction
- FRSNO: Land Snowcover Fraction
- GWETPROF: Profile Soil Moisture
- GWETROOT: Root Zone Soil Wetness
- GWETTOP: Surface Soil Wetness
- HDD0: Heating Degree Days Below 0 C
- HDD10: Heating Degree Days Below 10 C
- HDD18_3: Heating Degree Days Below 18.3 C
- lat: Latitude
- lon: Longitude
- PBLTOP: Planetary Boundary Layer Top Pressure
- PRECSNO: Snow Precipitation
- PRECSNOLAND: Snow Precipitation Land
- PRECTOTCORR: Precipitation Corrected
- PS: Surface Pressure
- QV10M: Specific Humidity at 10 Meters
- QV2M: Specific Humidity at 2 Meters
- RH2M: Relative Humidity at 2 Meters
- RHOA: Surface Air Density
- SLP: Sea Level Pressure
- SNODP: Snow Depth
- T10M: Temperature at 10 Meters
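
The cache only helps when the same chunks are read more than once. One way to check that it is working is to time an identical read several times: the first call pays the S3 round trip, and later calls would be expected to run faster if the chunks fit in the 256 MiB budget. `time_repeated` is a hypothetical helper, not part of zarr.

```python
import time
from typing import Any, Callable, List


def time_repeated(read: Callable[[], Any], repeats: int = 3) -> List[float]:
    """Run the same read several times; return wall-clock seconds for each call."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        read()  # result is discarded; only the elapsed time matters here
        durations.append(time.perf_counter() - start)
    return durations
```

In this notebook it could be called as `time_repeated(lambda: group['T2M'][0, :10, :10])`, assuming `T2M` is laid out as (time, lat, lon) as step 5 assumes; the selection is arbitrary, and actual timings depend on the network and the chunk layout.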

4. Read locations from file

In [37]:
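import csv

# Read the location list; the with-block closes the file once it has been read
with open('assets/locations.csv', newline='') as f:
    locations = list(csv.DictReader(f))
print(f"Loaded {len(locations)} locations.")
for loc in locations[:3]:
    print(loc)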

Loaded 19 locations.
{'lat': '16.51', 'lon': '96.11'}
{'lat': '13.7572', 'lon': '100.4849'}
{'lat': '32.243254', 'lon': '-110.946221'}
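
`csv.DictReader` yields every field as a string (note the quoted values above), which is why step 6 calls `float()` on each coordinate. A minimal sketch, assuming the file has a `lat,lon` header as the printed rows suggest, that converts once up front and rejects out-of-range coordinates; `parse_locations` is a hypothetical helper, not part of the notebook.

```python
import csv
import io
from typing import List, Tuple


def parse_locations(text: str) -> List[Tuple[float, float]]:
    """Parse lat/lon CSV text into float pairs, rejecting out-of-range values."""
    points = []
    # The header is line 1, so data rows start at line 2 (no multi-line fields assumed).
    for line_no, row in enumerate(csv.DictReader(io.StringIO(text)), start=2):
        lat, lon = float(row["lat"]), float(row["lon"])
        if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
            raise ValueError(f"line {line_no}: ({lat}, {lon}) is out of range")
        points.append((lat, lon))
    return points
```

It could be fed the file contents with `f.read()` inside the same `with open(...)` block used above.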


5. Create a function to extract property values for a particular location and date range

In [39]:
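def extract_property_values(array_name, lat, lon, start_date, end_date):
    """Return daily values of `array_name` at the grid cell nearest (lat, lon)."""
    array = group[array_name]
    # Coordinate arrays; repeat calls can be served from the cache
    latitudes = group['lat'][:]
    longitudes = group['lon'][:]
    # pandas counts unit='D' values from 1970-01-01 by default
    times = pd.to_datetime(group['time'][:], unit='D')

    # Nearest grid cell to the requested point
    lat_idx = (np.abs(latitudes - lat)).argmin()
    lon_idx = (np.abs(longitudes - lon)).argmin()

    # Time steps inside the inclusive [start_date, end_date] window
    time_mask = (times >= pd.to_datetime(start_date)) & (times <= pd.to_datetime(end_date))
    time_indices = np.where(time_mask)[0]
    selected_times = times[time_mask]

    # Read only the selected time steps at one grid cell (assumes dims are time, lat, lon)
    values = array[time_indices, lat_idx, lon_idx]

    units = array.attrs.get('units', 'unknown')

    return pd.DataFrame({
        'date': selected_times,
        array_name: values,
        'units': units
    })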
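
Two details in `extract_property_values` may be worth tightening. First, `argmin` over `np.abs(longitudes - lon)` treats longitude as non-periodic, so a point near 180° (one location below has lon 179.06) could snap to the far edge of the grid if the grid stops short of 180; whether that matters depends on the grid's actual extent, which is not shown here. Second, assuming `time` is sorted, the mask selects a contiguous run that can be expressed as a slice, so the read becomes a plain basic selection instead of integer-array indexing. A sketch with hypothetical helpers `nearest_index` and `mask_to_slice`; the 0.625° grid in the test is illustrative only.

```python
from typing import Optional

import numpy as np


def nearest_index(grid: np.ndarray, value: float, period: Optional[float] = None) -> int:
    """Index of the grid point closest to value; pass period=360.0 for longitude."""
    diff = grid - value
    if period is not None:
        # Wrap differences into [-period/2, period/2) so 179.9 and -180.0 count as neighbours.
        diff = (diff + period / 2) % period - period / 2
    return int(np.abs(diff).argmin())


def mask_to_slice(mask: np.ndarray) -> slice:
    """Convert a contiguous boolean mask into an equivalent slice."""
    idx = np.flatnonzero(mask)
    if idx.size == 0:
        return slice(0, 0)
    if idx[-1] - idx[0] + 1 != idx.size:
        raise ValueError("mask is not contiguous")
    return slice(int(idx[0]), int(idx[-1]) + 1)
```

Inside the function, the read could then become `array[mask_to_slice(time_mask), lat_idx, lon_idx]`, with `lon_idx = nearest_index(longitudes, lon, period=360.0)`.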

6. Extract average temperatures for each location for one year

In [40]:
for location in locations:
    lat = float(location['lat'])
    lon = float(location['lon'])
    df = extract_property_values('T2M', lat, lon, '2009-01-01', '2009-12-31')
    avg_temp = df['T2M'].mean()
    print(f"Average temperature in 2009 for (lat: {lat}, lon: {lon}): {avg_temp:.2f} {df['units'].iloc[0]}")

Average temperature in 2009 for (lat: 16.51, lon: 96.11): 27.85 C
Average temperature in 2009 for (lat: 13.7572, lon: 100.4849): 28.39 C
Average temperature in 2009 for (lat: 32.243254, lon: -110.946221): 20.90 C
Average temperature in 2009 for (lat: 30.3, lon: 120.2): 16.95 C
Average temperature in 2009 for (lat: -34.86111866, lon: 179.0576651): 17.49 C
Average temperature in 2009 for (lat: 12.9165, lon: 79.1325): 26.92 C
Average temperature in 2009 for (lat: 48.21, lon: 16.37): 10.18 C
Average temperature in 2009 for (lat: -23.55, lon: -46.633): 19.73 C
Average temperature in 2009 for (lat: 41.657, lon: -91.549): 10.30 C
Average temperature in 2009 for (lat: 22.38, lon: 114.2): 23.76 C
Average temperature in 2009 for (lat: 10.76993, lon: -73.00479): 25.44 C
Average temperature in 2009 for (lat: 35.84, lon: 50.9391): 6.73 C
Average temperature in 2009 for (lat: 11.325, lon: 142.189): 28.30 C
Average temperature in 2009 for (lat: 55.329053, lon: 8.74336): 9.88 C
Average temperature in 