The [Zarr project](https://zarr.dev) produces specifications for storing large multi-dimensional arrays. Its chunked layout makes it well suited to cloud object storage, because a reader can fetch only the chunks it needs. [Zarr-Python](https://zarr.readthedocs.io/en/stable/) facilitates working with Zarr arrays from a variety of sources (in-memory, local, cloud). This notebook provides a simple example of using Zarr-Python to access NASA POWER data stored on AWS. We use Zarr-Python's experimental `CacheStore` to improve performance (by roughly 10x in this case).

1. Load dependencies

In [None]:
import zarr
from zarr.storage import MemoryStore, FsspecStore
import numpy as np
import pandas as pd
from zarr.experimental.cache_store import CacheStore

2. Define a remote store and an in-memory store for caching

In [None]:
source_store = FsspecStore.from_url(
    's3://nasa-power/merra2/spatial/power_merra2_daily_spatial_utc.zarr',
    read_only=True,
    storage_options={'anon': True},
)
cache_store = MemoryStore()
cached_store = CacheStore(
    store=source_store,
    cache_store=cache_store,
    max_size=256 * 1024 * 1024  # 256 MB cache
)
group = zarr.open_group(store=cached_store, mode='r')

Zarr uses the [`fsspec` package](https://filesystem-spec.readthedocs.io/en/latest/index.html) to interact with a variety of filesystems, including an AWS S3 bucket in this example.

We create an `FsspecStore` pointing to the [NASA POWER S3 bucket on AWS](https://power.larc.nasa.gov/docs/services/aws/), specifying read-only, anonymous access because no credentials are required for this dataset.

Next, we create an empty in-memory store, then create a `CacheStore` that uses it to cache data from the NASA POWER S3 store as the data is accessed. This is useful when the same array chunks are read repeatedly, as in this example.

Finally, we open the group of arrays in the NASA POWER collection via our `CacheStore` instance. For details on working with arrays, groups, and stores see the [Zarr User Guide](https://zarr.readthedocs.io/en/stable/user-guide/).
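
The caching idea described above can be sketched in plain Python. This is a minimal illustration of a read-through cache, not how `CacheStore` is implemented: the `CountingSource` and `ReadThroughCache` classes and the chunk key are hypothetical, and the sketch omits the size limit that `max_size` provides.

```python
class CountingSource:
    """Stand-in for a slow remote store; counts how often it is read."""

    def __init__(self, data):
        self._data = dict(data)
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self._data[key]


class ReadThroughCache:
    """Hypothetical read-through cache: serve from memory, else fetch and keep."""

    def __init__(self, source):
        self._source = source
        self._cache = {}

    def get(self, key):
        if key not in self._cache:
            # Cache miss: fetch from the source once and remember the result
            self._cache[key] = self._source.get(key)
        return self._cache[key]


# Illustrative chunk key and payload, not taken from the real dataset
source = CountingSource({"T2M/c/0/0/0": b"chunk-bytes"})
cached = ReadThroughCache(source)

first = cached.get("T2M/c/0/0/0")   # miss: fetched from the source
second = cached.get("T2M/c/0/0/0")  # hit: served from memory
print(f"Reads from source: {source.reads}")
```

The second request never reaches the source, which is the effect we rely on when the same chunk is needed for several locations.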


3. Examine the group

In [None]:
# List arrays in the group
print("Arrays in the group:")
for name, array in group.arrays():
    print(f"- {name}: {array.metadata.attributes['long_name']}")

# Get detailed info about a specific array
temperature_array = group['T2M']
print(f"T2M definition: {temperature_array.metadata.attributes['definition']}")

4. Read locations from file

In [None]:
import csv

# Use a context manager so the file handle is closed after reading
with open('assets/locations.csv', newline='') as f:
    locations = list(csv.DictReader(f))
print(f"Loaded {len(locations)} locations.")
for loc in locations[:3]:
    print(loc)
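
The contents of `assets/locations.csv` are not shown in this notebook. As a rough guide, the sketch below parses a hypothetical file with the same reader. Only the `lat` and `lon` columns are implied by the later steps; the `name` column and all values are made up for illustration.

```python
import csv
import io

# Hypothetical file contents; only 'lat' and 'lon' are required by later steps
sample_csv = """name,lat,lon
Site A,40.7,-74.0
Site B,-33.9,151.2
"""

sample_locations = list(csv.DictReader(io.StringIO(sample_csv)))

# DictReader yields strings, so coordinates must be converted before use
coords = [(float(loc["lat"]), float(loc["lon"])) for loc in sample_locations]
print(coords)
```

Because `csv.DictReader` returns every field as a string, the coordinates need an explicit `float()` conversion before they can be compared with the grid.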

5. Create a function to extract property values for a particular location and date range

In [None]:
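def extract_property_values(array_name, lat, lon, start_date, end_date):
    """Return a DataFrame of values for the grid cell nearest (lat, lon)."""
    array = group[array_name]
    # Load the coordinate arrays into memory
    latitudes = group['lat'][:]
    longitudes = group['lon'][:]
    # unit='D' interprets the stored values as days since 1970-01-01
    times = pd.to_datetime(group['time'][:], unit='D')

    # Find the grid cell nearest to the requested coordinates
    lat_idx = (np.abs(latitudes - lat)).argmin()
    lon_idx = (np.abs(longitudes - lon)).argmin()

    # Select the time steps within the inclusive date range
    time_mask = (times >= pd.to_datetime(start_date)) & (times <= pd.to_datetime(end_date))
    time_indices = np.where(time_mask)[0]
    selected_times = times[time_mask]

    # Index in (time, lat, lon) order
    values = array[time_indices, lat_idx, lon_idx]

    units = array.attrs.get('units', 'unknown')

    return pd.DataFrame({
        'date': selected_times,
        array_name: values,
        'units': units
    })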

Zarr array data is accessed with NumPy-like syntax. Because this function reads a single grid cell per call, and re-reads the `lat`, `lon`, and `time` coordinate arrays each time, calling it for several locations is likely to read the same array chunks repeatedly. With the cache store set up in step 2, a chunk that is still in the cache does not need to be transferred from the remote store again.
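
To see why nearby locations tend to land in the same chunk, here is a minimal sketch of the nearest-grid-point lookup on a synthetic latitude axis. The 0.5-degree spacing, the chunk length of 25 rows, and the site coordinates are assumptions for illustration; the real values would come from `group['lat']` and the array's `chunks` attribute.

```python
import numpy as np

# Hypothetical 0.5-degree latitude axis and chunk length (assumed values)
latitudes = np.arange(-90.0, 90.5, 0.5)
lat_chunk_len = 25  # assumed number of latitude rows per chunk


def nearest_index(grid, value):
    """Return the index of the grid point closest to value."""
    return int(np.abs(grid - value).argmin())


# Two nearby sites and one distant site (illustrative coordinates)
for lat in (40.7, 41.9, -33.9):
    idx = nearest_index(latitudes, lat)
    chunk = idx // lat_chunk_len  # which chunk along the latitude axis
    print(f"lat={lat}: index={idx}, chunk={chunk}")
```

Under these assumptions the two nearby sites map to the same chunk along the latitude axis, so a cached copy of that chunk can serve both lookups.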

6. Extract average temperatures for each location for one year

In [None]:
for location in locations:
    lat = float(location['lat'])
    lon = float(location['lon'])
    df = extract_property_values('T2M', lat, lon, '2009-01-01', '2009-12-31')
    avg_temp = df['T2M'].mean()
    print(f"Average temperature in 2009 for (lat: {lat}, lon: {lon}): {avg_temp:.2f} {df['units'].iloc[0]}")
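
One way to check the effect of caching is to time the same work twice. The helper below is a hypothetical sketch, not part of Zarr-Python, and `workload` is a stand-in for a function that wraps the loop above.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def workload():
    # Stand-in for the per-location extraction loop
    return sum(range(100_000))


result, elapsed = timed(workload)
print(f"Elapsed: {elapsed:.4f} s")
```

Running the extraction loop through `timed` twice in the same session should show the second pass finishing faster, because its chunks are already in the cache. The actual speed-up depends on network conditions and on how much of the data fits within the cache size limit.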