# Enhanced GeoZarr for EOPF: Full Multiscale Zarr V3 Store with RGB Visualization

This enhanced notebook demonstrates how to transform EOPF (Earth Observation Processing Framework) Zarr stores into complete GeoZarr V3 compliant datasets with:

- **Full EOPF structure preservation**: All resolution groups and variables
- **Complete multiscale support**: COG-style overviews for all bands
- **Enhanced visualization**: RGB composite plots using overview levels
- **Modular code organization**: Most functionality moved to helper module

Following COG (Cloud Optimized GeoTIFF) conventions, the overviews keep the native projection and halve the width and height at each level (/2 downsampling). This addresses the gaps in the EOPF Zarr format that stand in the way of full GeoZarr compliance.
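To make the /2 logic concrete, here is a minimal sketch of how the overview pyramid dimensions could be derived. The function name `compute_overview_levels`, the floor rounding, and the stopping rule (stop before either side drops below `min_dimension`) are assumptions for illustration; the helper module may round or stop differently. The 10980-pixel width is the size of a Sentinel-2 tile at 10 m.

```python
def compute_overview_levels(width, height, min_dimension=256):
    """Return a list of dicts describing a /2 overview pyramid.

    Hypothetical helper: level 0 is the native resolution, and each further
    level halves both sides (floor division). Halving stops before either
    side would fall below min_dimension (an assumed stopping rule).
    """
    if width < 1 or height < 1:
        raise ValueError("width and height must be positive")

    levels = []
    level, w, h = 0, width, height
    while True:
        levels.append(
            {"level": level, "scale_factor": 2 ** level, "width": w, "height": h}
        )
        next_w, next_h = w // 2, h // 2
        if min(next_w, next_h) < min_dimension:
            break
        level, w, h = level + 1, next_w, next_h
    return levels


# A Sentinel-2 10 m band is 10980 x 10980 pixels.
for lvl in compute_overview_levels(10980, 10980, min_dimension=256):
    print(f"Level {lvl['level']} (1:{lvl['scale_factor']}): {lvl['width']}x{lvl['height']}")
```

Under these assumptions a 10 m band yields levels 0 through 5, which matches the range of overview levels plotted later in this notebook.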

## Setup and Data Loading

In [None]:
import os
os.environ["ZARR_V3_EXPERIMENTAL_API"] = "1"

import json
import cf_xarray  # noqa
import dask.array as da
import matplotlib.pyplot as plt
import morecantile
import numpy as np
import panel
import rasterio
import numcodecs
import rioxarray  # noqa
import xarray as xr
import zarr
import dask
from rio_tiler.io.xarray import XarrayReader

# Import our enhanced COG-style multiscale utilities
from geozarr_examples.cog_multiscales import (
    setup_eopf_metadata,
    create_full_eopf_zarr_store,
    plot_rgb_overview,
    get_sentinel2_rgb_bands,
    verify_overview_coordinates,
    plot_overview_levels
)

In [None]:
# Set up paths and parameters
fp_base = "S2B_MSIL1C_20250113T103309_N0511_R108_T32TLQ_20250113T122458"
input_url = f"https://objectstore.eodc.eu:2222/e05ab01a9d56408d82ac32d69a5aae2a:sample-data/tutorial_data/cpm_v253/{fp_base}.zarr"
v3_output = f"../output/v3/{fp_base}_full_multiscales.zarr"

# Processing parameters
spatial_chunk = 4096
min_dimension = 256
tileWidth = 256

In [None]:
from xarray.namedarray.parallelcompat import list_chunkmanagers
list_chunkmanagers()

In [None]:
from dask.distributed import Client
client = Client()  # set up local cluster on your laptop
client

In [None]:
# Load the EOPF DataTree
dt = xr.open_datatree(input_url, engine="zarr", chunks={})
print("EOPF DataTree structure:")
print(dt)

## Explore the Original EOPF Structure

In [None]:
# Examine the reflectance measurements structure
reflectance_ds = dt["measurements/reflectance"]
print("Reflectance groups:")
for group in reflectance_ds.groups:
    if reflectance_ds[group].data_vars:
        print(f"  {group}: {list(reflectance_ds[group].data_vars)}")
        # Show dimensions for first variable
        first_var = list(reflectance_ds[group].data_vars)[0]
        dims = reflectance_ds[group][first_var].dims
        shape = reflectance_ds[group][first_var].shape
        print(f"    Dimensions: {dims} = {shape}")

In [None]:
# Quick visualization of original data
dt["measurements/reflectance/r60m"]["b01"].plot(figsize=(10, 6))
plt.title("Original EOPF Data - Band B01 at 60m resolution")
plt.show()

## Create Full EOPF Zarr Store with Multiscales

This section uses our enhanced helper functions to:
1. Process all resolution groups (r10m, r20m, r60m)
2. Set up proper CF metadata and CRS information
3. Create COG-style overview levels for all bands
4. Maintain the original EOPF structure while adding GeoZarr compliance

In [None]:
# Create the full EOPF Zarr store with multiscales
print("Creating full EOPF Zarr store with multiscales...")
print("This will process all resolution groups and create overview levels for all bands.")
print("This may take several minutes depending on data size and number of bands.")
print("\n⚠️  Note: If you encounter timeout errors, try setting load_data=False for lazy loading.\n")

# Option 1: Load data into memory (faster but may timeout on large datasets)
try:
    result = create_full_eopf_zarr_store(
        dt=dt,
        output_path=v3_output,
        spatial_chunk=spatial_chunk,
        min_dimension=min_dimension,
        tileWidth=tileWidth,
        load_data=True,  # Load data into memory (faster, but may time out on large datasets)
        max_retries=3    # Retry failed operations
    )
    print("\n✅ Full EOPF Zarr store created successfully!")
    
except Exception as e:
    print(f"\n❌ Failed with data loading: {e}")
    print("\n🔄 Trying with lazy loading (slower but more reliable)...")
    
    # Option 2: Use lazy loading (slower but more reliable for large datasets)
    result = create_full_eopf_zarr_store(
        dt=dt,
        output_path=v3_output,
        spatial_chunk=spatial_chunk,
        min_dimension=min_dimension,
        tileWidth=tileWidth,
        load_data=False,  # Use lazy loading
        max_retries=5     # More retries for network operations
    )
    print("\n✅ Full EOPF Zarr store created successfully with lazy loading!")

print(f"Output location: {v3_output}")
print(f"Processed groups: {list(result['processed_groups'].keys())}")
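The `max_retries` argument suggests that failed operations are retried. How the helper implements this is not shown here, so the following is only a generic sketch of a retry loop with exponential backoff. The name `with_retries` and the `base_delay` parameter are hypothetical and not part of `geozarr_examples`.

```python
import time


def with_retries(operation, max_retries=3, base_delay=1.0):
    """Call operation() and retry on failure, up to max_retries attempts.

    Hypothetical helper: the wait doubles after each failed attempt
    (base_delay, 2 * base_delay, ...). The last error is re-raised
    once all attempts are used up.
    """
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")

    for attempt in range(1, max_retries + 1):
        try:
            return operation()
        except Exception as exc:
            if attempt == max_retries:
                raise
            delay = base_delay * 2 ** (attempt - 1)
            print(f"Attempt {attempt} failed ({exc}); retrying in {delay:.1f}s")
            time.sleep(delay)


# Usage with a simulated flaky network read.
state = {"calls": 0}

def flaky_read():
    state["calls"] += 1
    if state["calls"] < 3:
        raise TimeoutError("simulated timeout")
    return "chunk written"

print(with_retries(flaky_read, max_retries=3, base_delay=0.0))
```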

## Consolidate Metadata

In [None]:
# Consolidate metadata at the root of the Zarr store
zarr.consolidate_metadata(v3_output)
print("Metadata consolidated successfully!")

## Inspect the Enhanced Zarr V3 Store

In [None]:
# Inspect the structure of the created Zarr store
root = zarr.open_group(v3_output, mode="r")
print("Enhanced Zarr V3 store structure:")
print(root.tree())

## Verify Coordinates and CRS in Overview Levels

Let's verify that our overview levels maintain proper coordinates and CRS information.
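Before running the check, it may help to see what "proper coordinates" means for a /2 overview. The sketch below assumes pixel-centre coordinates and block-aligned downsampling, so each overview pixel sits at the mean position of the native pixels it covers. The function `overview_coords` and the sample coordinate values are illustrative only, not taken from the helper module or from the actual scene.

```python
def overview_coords(first_center, resolution, size, scale_factor):
    """Pixel-centre coordinates along one axis for a given overview level.

    Hypothetical helper. `first_center` is the centre of the first native
    pixel, `resolution` the native pixel size (negative for a north-up y
    axis), and `size` the native pixel count along the axis.
    """
    n = size // scale_factor
    step = resolution * scale_factor
    # Centre of the first block = mean of the native centres it covers.
    start = first_center + resolution * (scale_factor - 1) / 2
    return [start + i * step for i in range(n)]


# Illustrative values: 10 m pixels, left edge at x = 300000.
native_x = overview_coords(300005.0, 10.0, 10980, 1)
level1_x = overview_coords(300005.0, 10.0, 10980, 2)

# The outer edges (centre -/+ half a pixel) are the same at both levels.
print(native_x[0] - 5.0, native_x[-1] + 5.0)
print(level1_x[0] - 10.0, level1_x[-1] + 10.0)
```

When the native size is divisible by the scale factor, the extent is preserved exactly; otherwise the trailing partial block is dropped under these assumptions and the overview extent shrinks slightly.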

In [None]:
# Verify coordinates for one of the resolution groups
# Let's check the r10m group which should have the highest resolution
if '/measurements/reflectance/r10m' in result['processed_groups']:
    group_name = '/measurements/reflectance/r10m'
    # Get the first band with overviews
    band_name = list(result['overview_levels'][group_name].keys())[0]
    overview_levels = result['overview_levels'][group_name][band_name]['levels']
    overview_path = result['overview_levels'][group_name][band_name]['path']
    
    # Get the native CRS from the processed group
    native_crs = result['processed_groups'][group_name].rio.crs
    
    print(f"Verifying coordinates for {group_name}/{band_name}:")
    verify_overview_coordinates(
        v3_output=overview_path,
        overview_levels=overview_levels,
        native_crs=native_crs,
        max_levels=3
    )
else:
    print("r10m group not found, checking available groups:")
    print(list(result['processed_groups'].keys()))

## Enhanced RGB Visualization with Overview Levels

Now let's create RGB composite visualizations using different overview levels. This demonstrates the power of having multiscale data for efficient visualization at different zoom levels.
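A viewer rarely needs more pixels than the screen can show, so a common strategy is to read the coarsest overview that still covers the target display size. The sketch below illustrates that selection. The function `pick_overview_level` is hypothetical, and the level dictionaries only mimic the `level`, `scale_factor`, `width`, and `height` keys used elsewhere in this notebook.

```python
def pick_overview_level(levels, target_width):
    """Return the coarsest level whose width is still >= target_width.

    Hypothetical helper: falls back to the first entry (native resolution)
    when even that is narrower than the requested width.
    """
    candidates = [lvl for lvl in levels if lvl["width"] >= target_width]
    if not candidates:
        return levels[0]
    return min(candidates, key=lambda lvl: lvl["width"])


# Illustrative pyramid for a 10980-pixel-wide band with /2 downsampling.
levels = [
    {"level": i, "scale_factor": 2 ** i, "width": 10980 // 2 ** i, "height": 10980 // 2 ** i}
    for i in range(6)
]

for target in (300, 1400, 20000):
    chosen = pick_overview_level(levels, target)
    print(f"target {target}px -> level {chosen['level']} ({chosen['width']}px wide)")
```

The returned `level` value could then be passed as `overview_level` to `plot_rgb_overview`, so a thumbnail-sized plot reads a small array instead of the full-resolution band.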

In [None]:
# Find the best resolution group for RGB visualization
# Prefer r10m if available, otherwise use r20m or r60m
rgb_group = None
for preferred_group in ['/measurements/reflectance/r10m', '/measurements/reflectance/r20m', '/measurements/reflectance/r60m']:
    if preferred_group in result['processed_groups']:
        # Check if this group has the RGB bands we need
        available_bands = list(result['processed_groups'][preferred_group].data_vars)
        red_band, green_band, blue_band = get_sentinel2_rgb_bands(preferred_group)
        
        if all(band in available_bands for band in [red_band, green_band, blue_band]):
            rgb_group = preferred_group
            break

if rgb_group:
    print(f"Using {rgb_group} for RGB visualization")
    red_band, green_band, blue_band = get_sentinel2_rgb_bands(rgb_group)
    print(f"RGB bands: R={red_band}, G={green_band}, B={blue_band}")
else:
    print("No suitable group found for RGB visualization")
    print("Available groups and their bands:")
    for group_name, ds in result['processed_groups'].items():
        print(f"  {group_name}: {list(ds.data_vars)}")

In [None]:
# Create RGB visualization at native resolution (overview level 0)
if rgb_group:
    print("Creating RGB composite at native resolution...")
    fig_native = plot_rgb_overview(
        zarr_store_path=v3_output,
        group_name=rgb_group,
        red_band=red_band,
        green_band=green_band,
        blue_band=blue_band,
        overview_level=0,  # Native resolution
        figsize=(14, 10)
    )

In [None]:
# Create RGB visualization at overview level 1 (1:2 scale)
if rgb_group:
    print("Creating RGB composite at overview level 1 (1:2 scale)...")
    fig_overview1 = plot_rgb_overview(
        zarr_store_path=v3_output,
        group_name=rgb_group,
        red_band=red_band,
        green_band=green_band,
        blue_band=blue_band,
        overview_level=1,  # 1:2 scale
        figsize=(14, 10)
    )

In [None]:
# Create RGB visualization at overview level 2 (1:4 scale)
if rgb_group:
    print("Creating RGB composite at overview level 2 (1:4 scale)...")
    fig_overview2 = plot_rgb_overview(
        zarr_store_path=v3_output,
        group_name=rgb_group,
        red_band=red_band,
        green_band=green_band,
        blue_band=blue_band,
        overview_level=2,  # 1:4 scale
        figsize=(14, 10)
    )

In [None]:
# Create RGB visualization at overview level 3 (1:8 scale)
if rgb_group:
    print("Creating RGB composite at overview level 3 (1:8 scale)...")
    fig_overview3 = plot_rgb_overview(
        zarr_store_path=v3_output,
        group_name=rgb_group,
        red_band=red_band,
        green_band=green_band,
        blue_band=blue_band,
        overview_level=3,  # 1:8 scale
        figsize=(14, 10)
    )

In [None]:
# Create RGB visualization at overview level 4 (1:16 scale)
if rgb_group:
    print("Creating RGB composite at overview level 4 (1:16 scale)...")
    fig_overview4 = plot_rgb_overview(
        zarr_store_path=v3_output,
        group_name=rgb_group,
        red_band=red_band,
        green_band=green_band,
        blue_band=blue_band,
        overview_level=4,  # 1:16 scale
        figsize=(14, 10)
    )

In [None]:
# Create RGB visualization at overview level 5 (1:32 scale)
if rgb_group:
    print("Creating RGB composite at overview level 5 (1:32 scale)...")
    fig_overview5 = plot_rgb_overview(
        zarr_store_path=v3_output,
        group_name=rgb_group,
        red_band=red_band,
        green_band=green_band,
        blue_band=blue_band,
        overview_level=5,  # 1:32 scale
        figsize=(14, 10)
    )

## Compare Overview Levels for Individual Bands

Let's also visualize how individual bands look at different overview levels to demonstrate the quality of our COG-style downsampling.

In [None]:
# Plot overview levels for one of the bands
if rgb_group and red_band in result['overview_levels'][rgb_group]:
    overview_levels = result['overview_levels'][rgb_group][red_band]['levels']
    overview_path = result['overview_levels'][rgb_group][red_band]['path']
    
    print(f"Plotting overview levels for {rgb_group}/{red_band}:")
    fig_levels = plot_overview_levels(
        v3_output=overview_path,
        overview_levels=overview_levels,
        var=red_band,
        max_plots=3
    )

## Summary and Data Access Examples

Let's demonstrate how to access the created multiscale data programmatically.

In [None]:
# Show how to access different resolution groups and their overview levels
print("=== Data Access Examples ===")
print(f"\nZarr store location: {v3_output}")
print(f"Available resolution groups: {list(result['processed_groups'].keys())}")

for group_name in result['processed_groups'].keys():
    print(f"\n--- {group_name} ---")
    
    # Show native resolution access
    native_path = f"{v3_output}/{group_name.lstrip('/')}"  # group names start with '/', so avoid a double slash
    print(f"Native resolution: {native_path}")
    
    # Show available bands
    bands = list(result['processed_groups'][group_name].data_vars)
    print(f"Available bands: {bands}")
    
    # Show overview levels for first band
    if bands and group_name in result['overview_levels']:
        first_band = bands[0]
        if first_band in result['overview_levels'][group_name]:
            overview_info = result['overview_levels'][group_name][first_band]
            overview_path = overview_info['path']
            levels = overview_info['levels']
            
            print(f"Overview levels for {first_band}:")
            for level in levels[:3]:  # Show first 3 levels
                scale = level['scale_factor']
                dims = f"{level['width']}x{level['height']}"
                level_path = f"{overview_path}/{level['level']}"
                print(f"  Level {level['level']} (1:{scale}): {dims} -> {level_path}")

In [None]:
# Example: Load and examine a specific overview level
if rgb_group and red_band in result['overview_levels'][rgb_group]:
    overview_path = result['overview_levels'][rgb_group][red_band]['path']
    
    # Load overview level 1
    overview_ds = xr.open_zarr(overview_path, group="1", zarr_format=3)
    
    print(f"\n=== Example: Overview Level 1 for {rgb_group}/{red_band} ===")
    print(f"Dimensions: {dict(overview_ds.sizes)}")
    print(f"Coordinates: {list(overview_ds.coords)}")
    print(f"Data variables: {list(overview_ds.data_vars)}")
    print(f"CRS: {overview_ds.rio.crs}")
    
    # Show coordinate ranges
    if 'x' in overview_ds.coords and 'y' in overview_ds.coords:
        x_range = (overview_ds.x.min().values, overview_ds.x.max().values)
        y_range = (overview_ds.y.min().values, overview_ds.y.max().values)
        print(f"X range: {x_range[0]:.2f} to {x_range[1]:.2f}")
        print(f"Y range: {y_range[0]:.2f} to {y_range[1]:.2f}")

## Summary

This enhanced notebook demonstrates a complete transformation of EOPF Zarr data into a fully GeoZarr-compliant multiscale dataset:

### ✅ **Complete EOPF Structure Preservation**:
- All resolution groups (r10m, r20m, r60m) processed
- All spectral bands included with proper metadata
- Original EOPF hierarchy maintained while adding GeoZarr compliance

### ✅ **Enhanced Multiscale Support**:
- COG-style overview levels for every band in every resolution group
- Efficient /2 downsampling with proper coordinate arrays
- Native CRS preservation across all overview levels

### ✅ **Advanced Visualization Capabilities**:
- RGB composite generation using overview levels
- Automatic band selection for Sentinel-2 data
- Contrast stretching and proper coordinate display
- Multi-scale visualization for efficient data exploration

### ✅ **Modular Code Organization**:
- Core functionality moved to `geozarr_examples.cog_multiscales` module
- Reusable functions for EOPF metadata setup
- Flexible RGB plotting with overview level selection
- Clean separation of concerns for maintainability

### ✅ **Full GeoZarr Compliance**:
- CF-compliant metadata and standard names
- Proper CRS information in multiple formats
- Grid mapping attributes linking data to spatial reference
- Multiscale metadata for TMS compatibility
- Zarr V3 format with efficient compression

### 🔍 **Key Enhancements Over Original Notebook**:

1. **Comprehensive Processing**: Instead of just one band, all bands in all resolution groups are processed
2. **RGB Visualization**: Added RGB composite plotting with overview level selection
3. **Code Modularity**: Moved complex logic to helper functions for reusability
4. **Better Organization**: Clear separation between data processing and visualization
5. **Enhanced Documentation**: More detailed explanations and usage examples

The resulting Zarr store is a complete, cloud-optimized, multiscale geospatial dataset that maintains the rich structure of EOPF while providing the performance benefits of GeoZarr and the visualization capabilities demonstrated through RGB composites at multiple scales.