# 🌊 **Clipping North Atlantic Ocean by Latitude Bands**

This notebook prepares the **North Atlantic Ocean** geometry for **zonal analysis** by clipping it into latitude-based regions. We use global latitude bands to divide the ocean into horizontal slices (e.g., 30°N–40°N), then compute the daily mean chlorophyll-a concentration within each slice.

The workflow includes:
- Loading the **GOaS ocean boundaries** and **latitude band** datasets.
- Clipping the **North Atlantic** region by each latitude band.
- Extracting **daily chlorophyll values** per band from a NetCDF raster.
- Exporting a time series of chlorophyll means and generating plots.

## 📑 Table of Contents
- [🧰 1. Import Required Libraries](#1-import-required-libraries)  
- [🌊 2. Load Ocean Vector (GOaS)](#2-load-ocean-vector-goas)  
- [🗺️ 3. Load Latitude Bands](#3-load-latitude-bands)  
- [✂️ 4. Clip Ocean by Latitude](#4-clip-ocean-by-latitude)  
- [📈 5. Extract Chlorophyll Data](#5-extract-chlorophyll-data)  
- [📊 6. Visualize Time Series](#6-visualize-time-series)


## 🧰 **1. Import Required Libraries**

- **Geospatial libraries**  
  For working with map data like polygons and raster files:  
  - `geopandas` loads and manipulates vector data (e.g., shapefiles, GeoJSON).  
  - `shapely` helps define and clean up geometric shapes.  
  - `rasterio` and `rioxarray` let us read and clip NetCDF raster files (like chlorophyll data).  

- **Data handling and computation**  
  For loading, cleaning, and processing data efficiently:  
  - `pandas` handles tabular data.  
  - `numpy`, `xarray`, and `dask` allow large, multi-dimensional data processing in chunks.  

- **Plotting libraries**  
  To visualize chlorophyll concentrations and trends:  
  - `matplotlib.pyplot` draws plots and maps.  
  - `matplotlib.dates` formats date-based x-axes.  
  - `get_cmap` provides access to color palettes (colormaps).  

- **Utilities**  
  For smooth workflows and progress tracking:  
  - `tqdm` shows progress bars in loops.  
  - `pathlib` handles file paths across systems.  
  - `re` helps clean up and format text (like band labels).
  - `loguru` displays clean and colorful logging messages (e.g., 🎉 success, 🚩 error).

In [None]:
# --- Geospatial libraries ---
import geopandas as gpd  # Geographic vector data (GeoDataFrames)
from shapely.geometry import Polygon  # Creating polygon geometries
import rasterio  # Raster file I/O and metadata
import rasterio.mask  # Masking raster data with geometries
import rioxarray as rxr  # Geospatial extension for xarray (CRS-aware raster I/O)

# --- Data handling and computation ---
import pandas as pd  # Tabular data manipulation (DataFrames)
import numpy as np  # Numerical operations
import xarray as xr  # Multi-dimensional labeled arrays (NetCDF, climate, satellite)
import dask.array as da  # Parallel computation on large arrays
import dask  # Dask task scheduling and delayed execution
from dask.diagnostics import ProgressBar  # Progress bar for Dask computations

# --- Plotting libraries ---
import matplotlib.pyplot as plt  # Core plotting API
from matplotlib.dates import MonthLocator, DateFormatter  # Date-based x-axis formatting
from matplotlib.pyplot import get_cmap  # Colormap lookup (matplotlib.cm.get_cmap was removed in Matplotlib 3.9)

# --- Utilities ---
from tqdm import tqdm  # Progress bars for loops
from pathlib import Path  # Object-oriented filesystem paths
import re  # Regular expressions for string parsing and pattern matching
from loguru import logger  # Simple, colorful logging
import sys  # For configuring logging output

# Set up the logger to show clean messages (colored text, no background)
from helpers import configure_logger
configure_logger()

# Check that everything was imported successfully
logger.success("🎉 Libraries successfully imported.")

## 🌊 **2. Load Ocean Vector (GOaS)**

Before proceeding, please download the **Global Oceans and Seas (GOaS)** dataset from the following link:  
   [Download GOaS_v1_20211214_gpkg.zip](https://www.marineregions.org/download_file.php?name=GOaS_v1_20211214_gpkg.zip)

   Once downloaded, **extract** the ZIP file and point to the `.gpkg` file in the cell below.
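
As a minimal sketch of the extraction step, the archive can be unpacked with the standard library and the `.gpkg` located by pattern instead of a hard-coded path. The helper name `extract_gpkg` is hypothetical (it is not part of this notebook's `helpers` module), and nothing is assumed about the folder layout inside the real archive.

```python
import zipfile
from pathlib import Path


def extract_gpkg(zip_path, target_dir):
    """Unpack `zip_path` into `target_dir` and return the first .gpkg found.

    Hypothetical helper for illustration only.
    """
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)

    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)

    # Search recursively, since the archive may contain a top-level folder
    matches = sorted(target_dir.rglob("*.gpkg"))
    if not matches:
        raise FileNotFoundError(f"No .gpkg file found in {zip_path}")
    return matches[0]
```

The returned path could then be passed to `gpd.read_file(...)` in the cell below.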

In [None]:
try:
    # Replace with the path to your downloaded GOaS dataset
    goas_vector = gpd.read_file("path/to/your/downloaded/GOaS_v1_20211214_gpkg/goas_v01.gpkg")
    logger.success("🎉 GOaS dataset successfully loaded into GeoDataFrame.")
except Exception as e:
    logger.error(f"🚩 Failed to load GOaS dataset: {e}")
    raise  # Later cells need `goas_vector`, so stop here instead of continuing

### 2.1: Filter Ocean Vector
In this step, we apply a **filter** to the `goas_vector` **GeoDataFrame** to select features that match the **"North Atlantic Ocean"** in the `"name"` column.

- **Setting the Filter**: We define the `filter_name` variable as `"North Atlantic Ocean"`, which will be used to filter the `goas_vector` dataset.
  
- **Filtering**: We use `str.contains()` with the filter string to **select** only those rows where the `"name"` column contains the filter value, ensuring case-insensitivity and excluding `NaN` values.

This ensures that only relevant features (those matching "North Atlantic Ocean") are retained for further processing.
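
To illustrate how `case=False` and `na=False` behave, here is a small sketch on a toy table. The rows are made up for the example and do not reflect the actual contents of the GOaS attribute table.

```python
import pandas as pd

# Toy stand-in for the GOaS attribute table (illustrative values only)
names = pd.DataFrame(
    {"name": ["North Atlantic Ocean", "South Atlantic Ocean", None, "NORTH ATLANTIC OCEAN"]}
)

filter_name = "North Atlantic Ocean"

# case=False ignores capitalisation; na=False treats missing names as non-matches
mask = names["name"].str.contains(filter_name, case=False, na=False)
matched = names[mask]

print(matched["name"].tolist())
```

Note that `str.contains()` interprets the pattern as a regular expression by default; passing `regex=False` forces a literal substring match, which can be safer if a name contains characters such as parentheses.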


In [None]:
try:
    # Set the filter for "North Atlantic Ocean"
    filter_name = "North Atlantic Ocean"

    logger.info(f"Filtering GOaS vector for 'name' matching '{filter_name}'")
    goas_vector = goas_vector[goas_vector["name"].str.contains(filter_name, case=False, na=False)]

    if goas_vector.empty:
        logger.warning(f"⚠️ No features found matching 'name' with '{filter_name}'. Exiting.")
        raise ValueError(f"No features found for {filter_name}.")
    else:
        logger.success(f"🎉 Successfully filtered GOaS vector for 'name' matching '{filter_name}'.")

except Exception as filter_error:
    logger.error(f"🚩 Filtering error: {filter_error}")
    raise filter_error

## 🗺️ **3. Load Latitude Bands**

Next, we load the **latitude bands** that we created in "[01_create_latitude_blocks](01_create_latitude_blocks.ipynb)" to clip the ocean vector. These latitude bands will help us analyze one section of the ocean at a time, as different latitudes experience phytoplankton blooms at different times.

In [None]:
try:
    # Replace with the path to the latitude bands GeoJSON file
    clip_vector = gpd.read_file(Path.cwd()/"data/latitude_bands_global.geojson")
    
    logger.success("🎉 Latitude bands successfully loaded into GeoDataFrame.")
except Exception as e:
    logger.error(f"🚩 Failed to load latitude bands: {e}")
    raise  # Later cells need `clip_vector`, so stop here instead of continuing

## ✂️ **4. Clip Ocean by Latitude**

This step clips the **GOaS vector** by each latitude band. For each latitude band:

- We clean the **band label** so it is safe to use in filenames: degree signs are removed, spaces become underscores, and slashes become hyphens.
- We use **GeoPandas’ `clip` function** to clip the ocean data based on the latitude band’s geometry.
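
The two steps above can be sketched on toy geometries. The rectangle standing in for the ocean, the band extents, and the label format are all assumptions made for this example; the real labels come from the `Latitude_Range` column of the bands file.

```python
import re

import geopandas as gpd
from shapely.geometry import box

# Toy "ocean" polygon and two latitude bands (coordinates and labels are made up)
ocean = gpd.GeoDataFrame({"name": ["Toy Ocean"]}, geometry=[box(-80, 0, 0, 70)], crs="EPSG:4326")
bands = gpd.GeoDataFrame(
    {"Latitude_Range": ["30° to 40°", "80° to 90°"]},
    geometry=[box(-180, 30, 180, 40), box(-180, 80, 180, 90)],
    crs="EPSG:4326",
)

clipped = {}
for _, band in bands.iterrows():
    # Same cleaning as the notebook: drop degree signs, spaces -> "_", slashes -> "-"
    label = re.sub(r"[°]", "", band["Latitude_Range"]).replace(" ", "_").replace("/", "-")
    piece = gpd.clip(ocean, band.geometry)
    if not piece.empty:  # Bands that do not intersect the ocean are skipped
        clipped[label] = piece

print({label: tuple(piece.total_bounds) for label, piece in clipped.items()})
```

Only the first band overlaps the toy ocean, so only one entry is kept; the second band yields an empty result and is dropped, mirroring the warning branch in the cell below.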

In [None]:
try:
    all_clipped_geometries = []  # List to store clipped geometries
    band_labels = []  # List to store corresponding band labels
    
    # Iterate through each latitude band and clip
    for _, row in clip_vector.iterrows():
        clip_geom = row.geometry
        band_label = row["Latitude_Range"]

        # Clean the band label: Replace spaces with "_", slashes with "-", and remove degree signs
        band_label = re.sub(r"[°]", "", band_label)  # Remove degree signs
        band_label = band_label.replace(" ", "_").replace("/", "-")

        logger.info(f"Clipping for latitude range {band_label}...")

        # Use GeoPandas `clip` function to clip the GOaS vector by the latitude band geometry
        clipped_gdf = gpd.clip(goas_vector, clip_geom)

        # Add to the list if not empty
        if not clipped_gdf.empty:
            all_clipped_geometries.append(clipped_gdf)  # Collect clipped data
            band_labels.extend([band_label] * len(clipped_gdf))  # One label per clipped feature, so lengths match after concat
            logger.success(f"🎉 Clipped data for {band_label} added.")
        else:
            logger.warning(f"⚠️ No features found for latitude range {band_label}. This could be due to {filter_name} not occurring in this latitude range. Skipping.")

    # Combine all clipped geometries into a single GeoDataFrame
    if all_clipped_geometries:
        clipped_gdf = gpd.GeoDataFrame(pd.concat(all_clipped_geometries, ignore_index=True))
        clipped_gdf['Latitude_Range'] = band_labels  # Add latitude band labels to the combined GeoDataFrame

        # Log a success message with some stats
        logger.success(
            f"🎉 Successfully clipped {len(all_clipped_geometries)} latitude bands "
            f"with a total of {len(clipped_gdf)} features."
        )
    else:
        logger.warning("⚠️ No features were clipped for any latitude band.")
except Exception as e:
    logger.error(f"🚩 Error during the clipping process: {e}")
    raise  # Later cells need the combined `clipped_gdf`

## 📈 **5. Extract Chlorophyll Data**
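
The extraction in this section averages all grid cells inside a band with equal weight. On a regular latitude/longitude grid, cell area shrinks toward the poles, so a cos(latitude)-weighted mean is a common alternative. The sketch below is not what the notebook does; it only shows, on made-up values, how such a weighting could be expressed with xarray's `weighted()`.

```python
import numpy as np
import xarray as xr

# Toy 2x2 field on a lat/lon grid (values are made up, not real chlorophyll data)
field = xr.DataArray(
    [[1.0, 1.0], [3.0, 3.0]],
    dims=("latitude", "longitude"),
    coords={"latitude": [0.0, 60.0], "longitude": [0.0, 1.0]},
)

# Cell area on a regular lat/lon grid scales with cos(latitude)
weights = np.cos(np.deg2rad(field["latitude"]))

plain_mean = float(field.mean())
weighted_mean = float(field.weighted(weights).mean(("latitude", "longitude")))

print(plain_mean, weighted_mean)
```

Within a single 10° band the difference is usually modest, but it grows at high latitudes, where the cells near the poleward edge are noticeably smaller than those near the equatorward edge.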

In [None]:
# File paths
netcdf_file = "replace_with_path/cmems_mod_glo_bgc-pft_anfc_0.25deg_P1D-m_chl_90.00W-30.00E_80.00S-90.00N_0.49m_2024-01-01-2025-01-01.nc"


In [None]:
# Open the NetCDF file with Dask for chunked processing
ds = xr.open_dataset(netcdf_file, chunks={'time': 1, 'latitude': 100, 'longitude': 100})  # Keys must match the file's dimension names; adjust chunk sizes to your data

In [None]:
ds

In [None]:
chlorophyll_var = "chl"  # Adjust if variable name is different

# Extract time steps
time_steps = ds.time.values

In [None]:
# Prepare to store results
results = []


def process_time_step(time_step, geometry, crs):
    """Return the mean surface chlorophyll inside `geometry` for one time step."""
    # Select time and first depth level
    chl_slice = ds[chlorophyll_var].sel(time=time_step).isel(depth=0)

    # Set spatial reference
    chl_slice = chl_slice.rio.set_spatial_dims(x_dim="longitude", y_dim="latitude")
    chl_slice = chl_slice.rio.write_crs("EPSG:4326")

    # Clip to the band geometry and average over the remaining cells
    masked_raster = chl_slice.rio.clip([geometry], crs, drop=True, all_touched=True)
    return float(masked_raster.mean().compute())


try:
    # Loop through each latitude band geometry
    for _, row in clipped_gdf.iterrows():
        band_geometry = row.geometry
        band_label = row["Latitude_Range"]

        logger.info(f"Processing latitude range {band_label}...")

        # One delayed task per time step for this band
        delayed_tasks = [
            dask.delayed(process_time_step)(time_step, band_geometry, clipped_gdf.crs)
            for time_step in time_steps
        ]

        # Compute all tasks for this latitude band with a single progress bar
        with ProgressBar():
            masked = dask.compute(*delayed_tasks)

        # Store results for the current latitude band
        results.append({"Latitude_Range": band_label, "chlorophyll_mean": list(masked)})

    # Log success after all latitude bands have been processed
    logger.success(f"🎉 Successfully processed all {len(clipped_gdf)} latitude bands!")

except Exception as e:
    logger.error(f"🚩 Error during processing: {e}")
    raise  # Avoid exporting partial results in the next cell
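
The cell above builds one `dask.delayed` task per time step and evaluates them together with `dask.compute`. A minimal sketch of that pattern, with a trivial stand-in function (`square` is hypothetical and only replaces the per-time-step work for illustration):

```python
import dask


def square(value):
    # Stand-in for an expensive per-time-step computation
    return value * value


# Building the tasks is cheap: nothing runs yet
tasks = [dask.delayed(square)(value) for value in [1, 2, 3]]

# dask.compute evaluates all tasks together and returns a tuple of results
results_demo = dask.compute(*tasks)

print(results_demo)
```

Because the results come back as a tuple in task order, they line up with `time_steps` when converted to a list.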


In [None]:
# Convert results to a DataFrame of latitude bands with their daily mean chlorophyll values
chlorophyll_data = pd.DataFrame({
    row["Latitude_Range"]: row["chlorophyll_mean"]
    for row in results
})

# Set index to dates (formatted as DD.MM.YY)
chlorophyll_data.index = pd.to_datetime(time_steps).strftime("%d.%m.%y")

# Reset index to turn dates into a column
chlorophyll_data.reset_index(inplace=True)
chlorophyll_data.rename(columns={"index": "date"}, inplace=True)

# Save to CSV
output_file = Path.cwd() / "data/chlorophyll_mean_timeseries_by_lat_band.csv"
chlorophyll_data.to_csv(output_file, index=False)

logger.success(f"🎉 CSV saved in desired format: {output_file}")

## 📊 **6. Visualize Time Series**
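
The CSV written in section 5 stores dates as `DD.MM.YY` strings. A date-based x-axis (`MonthLocator`, `DateFormatter`) expects datetime values, so such strings need to be parsed before plotting. A minimal sketch with made-up sample values:

```python
import pandas as pd

# Dates as they appear in the exported CSV (DD.MM.YY strings); sample values are made up
date_strings = pd.Series(["01.03.24", "30.06.24"])

# An explicit format avoids day/month ambiguity when parsing
dates = pd.to_datetime(date_strings, format="%d.%m.%y")

print(dates.dt.month.tolist())
```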

In [None]:
# Create a colormap that can generate N colors
num_lines = len(chlorophyll_data.columns) - 1  # exclude date
cmap = get_cmap("Set2", num_lines)  # or "tab10", "Dark2", etc.

# --- Plot setup ---
fig, ax = plt.subplots(figsize=(14, 7))

# Add light grey background for spring bloom
ax.axvspan(pd.Timestamp("2024-03-01"), pd.Timestamp("2024-06-30"), color="lightgrey", alpha=0.3)

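# Parse the DD.MM.YY strings back into datetimes so the date locator and formatter work
plot_dates = pd.to_datetime(chlorophyll_data["date"], format="%d.%m.%y")

# Plot each latitude band
for i, column in enumerate(chlorophyll_data.columns[1:]):
    ax.plot(
        plot_dates,
        chlorophyll_data[column],
        label=column.replace("_", " ") + "°N",
        linewidth=2,
        color=cmap(i)
    )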

# Title & subtitle
plt.suptitle("2024 Chlorophyll-a concentration in the North Atlantic", fontsize=18, fontweight="bold", x=0.01, ha='left')
plt.title("Average daily chlorophyll-a concentration by 10° latitude block", fontsize=12, loc='left')

# X-axis formatting
ax.xaxis.set_major_locator(MonthLocator())
ax.xaxis.set_major_formatter(DateFormatter('%b'))

# Clean up
ax.set_ylim(0, 1.5)
ax.legend(title="", loc="upper left", fontsize=10)
ax.tick_params(axis='both', labelsize=10)
ax.grid(False)
for spine in ["top", "right"]:
    ax.spines[spine].set_visible(False)

plt.tight_layout()
plt.show()