# Whale Migration & Sea Ice Analysis

This notebook analyzes the relationship between whale migration patterns and sea ice conditions in the Arctic, specifically highlighting anomalous years where extreme sea ice conditions correlate with deviations in historical migration patterns.

## Overview
The analysis combines:
- **Whale telemetry data**: Bowhead whale locations from West Greenland
- **Sea ice data**: OSISAF sea ice concentration data for the Arctic
- **Temporal focus**: Summer months (July-September) across multiple years

## Prerequisites
Before running this notebook, ensure the required OSISAF datasets exist by running:
```bash
uv run zebra datasets create --config-name whale_viz
```


## Data 
This notebook will load two different datasets:
- Whale telemetry data
- OSISAF sea ice concentration data

To retrieve the OSISAF data we rely on the ice-station-zebra pipeline, which accesses the CDS API (see the README for more details).

The telemetry data is available in csv format and can be downloaded here:
[Chambault, Philippine; Kovacs, Kit M; Lydersen, Christian et al. (2024). Future seasonal changes in habitat for Arctic whales during predicted ocean warming [Dataset]. Dryad. https://doi.org/10.5061/dryad.tqjq2bw2c](https://doi.org/10.5061/dryad.tqjq2bw2c)

## 1. Setup and Imports

Import the libraries needed for data processing, visualization, and analysis. This includes scientific computing libraries (numpy, pandas), visualization tools (matplotlib, seaborn, cartopy), and hydra for configuration. The ice-station-zebra framework components are imported later, in the cell where they are used.


In [None]:
from pathlib import Path

# Third-party libraries
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import hydra
import matplotlib.animation as animation
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Jupyter/interactive
from IPython.display import HTML


## 2. Configuration

There are three accompanying config files in ice_station_zebra/config:
1) whale_viz.yaml - specifies OSISAF dataset config files & links to other two relevant config files
2) whale_telemetry - specifies parameters for filtering whale data
3) visualization - specifies parameters for visualizing data
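
For orientation, the keys this notebook reads from the composed config can be sketched as a plain Python dictionary. The key names mirror the attribute accesses used in the cells below; every value (the CSV path, the separator, the column mapping, the `side` label) is a placeholder assumption and not the content of the actual YAML files.

```python
# Hypothetical stand-in for the composed hydra config. Key names follow the
# attribute accesses in this notebook; all values are illustrative assumptions.
example_config = {
    "base_path": ".",  # assumed project root
    "whale_telemetry": {
        "input": {"csv": {"path": "data/whales.csv", "separator": ","}},
        "columns": {"ID": "id", "Lon": "lon", "Lat": "lat"},  # raw -> notebook names (assumed)
        "filters": {"species": "Bw", "side": "West"},
        "date_filter": {
            "start_month": 7,
            "end_month": 9,
            "target_years": [2003, 2010, 2018],
        },
    },
    "visualization": {},  # plotting parameters, left empty in this sketch
}


def get_nested(config: dict, dotted_key: str):
    """Look up a value with a dotted key, e.g. 'whale_telemetry.filters.species'."""
    value = config
    for part in dotted_key.split("."):
        value = value[part]
    return value


print(get_nested(example_config, "whale_telemetry.date_filter.target_years"))
```

With hydra, the same values would be reached by attribute access, for example `config.whale_telemetry.filters.species`.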


In [None]:
with hydra.initialize(config_path="ice_station_zebra/config", version_base=None):
    config = hydra.compose(config_name="whale_viz")


## 4. Whale Data Inspect & Filter

Ensure you have manually downloaded the whale telemetry CSV dataset before running the next cell.


In [None]:
# Get whale data config
whale_config = config.whale_telemetry
    
# Load the CSV file
csv_path = Path(whale_config.input.csv.path)

# Load CSV with specified separator
separator = whale_config.input.csv.separator
whale_data = pd.read_csv(csv_path, sep=separator)

# Apply column mapping from config
column_mapping = whale_config.columns
whale_data = whale_data.rename(columns=column_mapping)

# Convert dateTime to datetime
whale_data['dateTime'] = pd.to_datetime(whale_data['dateTime'], format='%d/%m/%Y %H:%M')

# Extract month and year from dateTime
whale_data['month'] = whale_data['dateTime'].dt.month
whale_data['year'] = whale_data['dateTime'].dt.year

# Basic inspection of whale_data
print("Columns:", list(whale_data.columns))
print("Shape:", whale_data.shape)
print("\nRows per species:")
print(whale_data['species'].value_counts())

We can see that we have the most data for the Narwhal species.

Narwhals can survive in mean SIC conditions up to 98% (but rely on cracks and leads to breathe).

Bowheads avoid SIC above 65%, prefer medium and first-year ice, and prefer small floe sizes to avoid entrapment in winter, but seek SIC above 65% during summer to avoid predators; they can break through ice to breathe.

Because at the current resolution of IceNet forecasts we are not able to predict the location of cracks and leads, we will focus on Bowhead whales, where we are more likely to identify a SIC threshold to predict whale location.
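
As a rough illustration of what such a threshold could look like, the sketch below labels SIC values (as fractions in [0, 1]) against a 65% cut-off. The function name, the default threshold, and the sample values are assumptions for illustration; the notebook itself does not fit or validate a threshold.

```python
import numpy as np

BOWHEAD_SIC_THRESHOLD = 0.65  # assumed cut-off, taken from the 65% figure above


def below_sic_threshold(sic, threshold: float = BOWHEAD_SIC_THRESHOLD) -> np.ndarray:
    """Return a boolean mask that is True where SIC is at or below the threshold.

    NaN cells (e.g. land or missing data) compare as False.
    """
    sic = np.asarray(sic, dtype=float)
    return sic <= threshold


sample_sic = np.array([0.0, 0.30, 0.65, 0.80, np.nan])  # made-up values
print(below_sic_threshold(sample_sic))
```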

In [None]:
bowhead_data = whale_data[whale_data['species'] == 'Bw']

In [None]:
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 8))
    
# Year distribution
year_counts = bowhead_data.groupby('year').size()
ax1.bar(year_counts.index, year_counts.values, color='skyblue', alpha=0.7, edgecolor='navy')
ax1.set_title('Bowhead - Data Points per Year')
ax1.set_xlabel('Year')
ax1.set_ylabel('Number of Data Points')
ax1.grid(True, alpha=0.3)

# Month distribution
month_counts = bowhead_data.groupby('month').size()
month_names = [pd.Timestamp(2020, month, 1).strftime('%B') for month in month_counts.index]
ax2.bar(range(len(month_counts)), month_counts.values, color='lightcoral', alpha=0.7, edgecolor='darkred')
ax2.set_title('Bowhead - Data Points per Month')
ax2.set_xlabel('Month')
ax2.set_ylabel('Number of Data Points')
ax2.set_xticks(range(len(month_counts)))
ax2.set_xticklabels(month_names, rotation=45)
ax2.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

We can see that we do not have data points for the main migration period, which is April to June. However, we might still be able to see some movement during the late summer months, such as August. We also note that we have a decent amount of data for Bowhead whales in 2010.

**2010**: Two adult male bowhead whales, one from West Greenland (BBDS stock) and one from Alaska (BCB stock), entered the Northwest Passage from opposite directions. They spent over two weeks in **Viscount Melville Sound** in September, coming within 130 km of each other, before returning to their normal seasonal ranges. This overlap occurred because the Northwest Passage was largely free of sea ice by 10 August 2010.

Furthermore, we select 2003 and 2018 as alternative years with a decent amount of data points and with more and less ice coverage, respectively (relative to 2010, based on the reported September minimum ice extent: https://climate.copernicus.eu/sea-ice-cover-september-2024).

We will also filter for the West side of Greenland, which is where the main migration movement is expected to happen.

In [None]:
# Load and print the config
with hydra.initialize(config_path="ice_station_zebra/config", version_base=None):
    config = hydra.compose(config_name="whale_telemetry")
    
print("Whale telemetry config:")
print(f"Species filter: {config.whale_telemetry.filters.species}")
print(f"Side filter: {config.whale_telemetry.filters.side}")
print(f"Start month: {config.whale_telemetry.date_filter.start_month}")
print(f"End month: {config.whale_telemetry.date_filter.end_month}")
print(f"Target years: {config.whale_telemetry.date_filter.target_years}")

# Use config to filter the data
filtered_data = whale_data[
    (whale_data['species'] == config.whale_telemetry.filters.species) &
    (whale_data['side'] == config.whale_telemetry.filters.side) &
    (whale_data['month'] >= config.whale_telemetry.date_filter.start_month) &
    (whale_data['month'] <= config.whale_telemetry.date_filter.end_month) &
    (whale_data['year'].isin(config.whale_telemetry.date_filter.target_years))
]

print(f"\nFiltered records: {len(filtered_data):,}")

## 7. Whale Data Visualization

Let's visualize one day of the whale dataset to get a better idea of what the data looks like.

In [None]:
def create_separate_data_plots(whale_data_filtered: pd.DataFrame, config) -> None:
    """
    Create whale data visualization for the first day present in the supplied data.
    
    Args:
        whale_data_filtered: Filtered whale telemetry data (e.g. a single year)
        config: Configuration object
    """

    # Get visualization parameters from config
    viz_config = config.visualization
    
    # Get the first day's data from the supplied DataFrame (not a global variable)
    first_date = whale_data_filtered['dateTime'].min().date()
    whale_data_date = whale_data_filtered[whale_data_filtered['dateTime'].dt.date == first_date]
    
    # Create whale locations plot
    fig, ax = plt.subplots(1, 1, figsize=(12, 10), 
                          subplot_kw={'projection': ccrs.LambertAzimuthalEqualArea(
                              central_longitude=0, central_latitude=90)})
    
    # Set Arctic extent (West Greenland / Canadian Arctic)
    plot_extent = [-110, -55, 60, 85]
    ax.set_extent(plot_extent, crs=ccrs.PlateCarree())
    
    # Add coastlines and land features
    ax.add_feature(cfeature.COASTLINE, linewidth=0.5, color='black')
    ax.add_feature(cfeature.LAND, facecolor='lightgray', alpha=0.7)
    
    # Plot whale locations if data exists
    if not whale_data_date.empty:
        # Plot whale locations with different colors per whale ID
        unique_whales = whale_data_date['id'].unique()
        colors = plt.cm.tab10(np.linspace(0, 1, len(unique_whales)))
        
        for i, whale_id in enumerate(unique_whales):
            whale_points = whale_data_date[whale_data_date['id'] == whale_id]
            ax.scatter(whale_points['lon'], whale_points['lat'], 
                       c=[colors[i]], label=f'Whale {whale_id}', 
                       s=50, alpha=0.7, edgecolors='black', linewidth=0.5,
                       transform=ccrs.PlateCarree())
        
        # Add legend
        ax.legend(bbox_to_anchor=(1.05, 1), loc='upper left', fontsize=10,
                  title='Whale IDs', title_fontsize=11, framealpha=0.9)
        
        # Add data summary info
        ax.text(0.02, 0.98, f'Total Whales: {len(unique_whales)}\nTotal Points: {len(whale_data_date)}', 
                transform=ax.transAxes, fontsize=9, style='italic',
                bbox=dict(boxstyle='round,pad=0.3', facecolor='white', alpha=0.8),
                verticalalignment='top')
    else:
        ax.text(0.5, 0.5, 'No whale data for this year', 
                ha='center', va='center', transform=ax.transAxes, fontsize=14)
    
    # Set title
    ax.set_title(f'Whale Telemetry Locations - {first_date}', 
                 fontsize=16, fontweight='bold', pad=20)
    
    # Add informative subtitle
    ax.text(0.5, 0.95, 'Bowhead Whale (Bw) | West Greenland Side', 
            transform=ax.transAxes, ha='center', va='top', 
            fontsize=10, style='italic', bbox=dict(boxstyle='round,pad=0.3', 
            facecolor='white', alpha=0.8))
    
    # Add gridlines
    ax.gridlines(draw_labels=True, dms=True, x_inline=False, y_inline=False,
                linewidth=0.7, alpha=0.8, color='gray')
    
    # Add coordinate system info
    ax.text(0.02, 0.02, 'Projection: Lambert Azimuthal Equal Area\nCRS: WGS84 (EPSG:4326)', 
            transform=ax.transAxes, fontsize=8, style='italic',
            bbox=dict(boxstyle='round,pad=0.2', facecolor='white', alpha=0.7))
    
    plt.show()

with hydra.initialize(config_path="ice_station_zebra/config", version_base=None):
    config = hydra.compose(config_name="whale_viz")

# Filter whale data for year
whale_data_year = filtered_data[filtered_data['year'] == 2018]
    
create_separate_data_plots(whale_data_filtered=whale_data_year, config=config)

We can see that on this day (the first day with data in the selected year) we have data for 5 unique whales, and for some of these whales we have several data points indicating their travel trajectory.

## 8. Sea Ice Data

We will now add sea ice to the mix! Execute the zebra datasets commands to create the required OSISAF sea ice datasets for the target years. This ensures all necessary data is available before proceeding with the analysis.

Run the `zebra datasets create` command:
`!uv run zebra datasets create --config-name whale_viz`

Let's load the prepared Zarr sea ice datasets using the lightning ZebraDataset format. This loads OSISAF sea ice concentration data for each target year and applies sanity checks to verify data integrity.


In [None]:
# Load sea ice datasets using ZebraDataset
from ice_station_zebra.data_loaders.zebra_dataset import ZebraDataset

with hydra.initialize(config_path="ice_station_zebra/config", version_base=None):
    config = hydra.compose(config_name="whale_viz")

    
# Get the base path from config
base_path = Path(config.base_path)
    
# Dictionary to store datasets for each year
sea_ice_datasets = {}
    
# Get target years from whale telemetry config
target_years = config.whale_telemetry.date_filter.target_years
    
for year in target_years:
    dataset_name = f"osisaf-sic-north-{year}-08-24h-v1"
    dataset_path = base_path / "data" / "anemoi" / f"{dataset_name}.zarr"
                
    try:
        # Create ZebraDataset using the lightning format
        # This will load the data using anemoi.datasets.data.open_dataset internally
        dataset = ZebraDataset(
            name=dataset_name,
            input_files=[dataset_path]
        )

        # Store in dictionary
        sea_ice_datasets[str(year)] = dataset

    except Exception as e:
        print(f"Failed to load sea ice dataset for {year}: {e}")

dataset
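
The sanity checks mentioned above are not shown in this cell. A minimal sketch of what they might look like for a single daily field follows. It assumes each item is array-like with shape (1, 432, 432) and concentrations expressed as fractions in [0, 1], which is also what the plotting cell below assumes. The helper name `check_sic_field` is hypothetical and not part of ice-station-zebra.

```python
import numpy as np


def check_sic_field(field, expected_shape=(1, 432, 432)) -> None:
    """Raise ValueError if a daily SIC field has an unexpected shape or value range."""
    arr = np.asarray(field, dtype=float)
    if arr.shape != expected_shape:
        raise ValueError(f"unexpected shape {arr.shape}, expected {expected_shape}")
    finite = arr[np.isfinite(arr)]  # ignore NaN cells such as masked land
    if finite.size == 0:
        raise ValueError("field contains no finite values")
    if finite.min() < 0.0 or finite.max() > 1.0:
        raise ValueError(f"SIC outside [0, 1]: min={finite.min()}, max={finite.max()}")


# Synthetic example field (random values, not real OSISAF data)
rng = np.random.default_rng(0)
fake_field = rng.uniform(0.0, 1.0, size=(1, 432, 432))
check_sic_field(fake_field)
print("sanity check passed")
```

Under these assumptions, the same helper could be applied to the first item of each loaded dataset, for example `check_sic_field(dataset[0])`.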


In [None]:
# Get the first dataset (any year)
year, dataset = next(iter(sea_ice_datasets.items()))

# Get first day's data
first_day_data = dataset[0]  # Shape: (C, H, W)
first_day_data = first_day_data.squeeze(0)  # Remove channel dimension -> (H, W)

# Data's CRS (from the manual): LAEA, pole-centered, WGS84, lon0=0
src_crs = ccrs.LambertAzimuthalEqualArea(
    central_longitude=0, central_latitude=90,
    globe=ccrs.Globe(ellipse='WGS84')
)

# Create a single map axes in the data's native projection
# (previously a second axes was created on top of an unused NorthPolarStereo axes)
fig, ax = plt.subplots(figsize=(10, 8), subplot_kw={'projection': src_crs})

nrows, ncols = 432, 432
px = 25_000.0  # meters per pixel

half_w = (ncols * px) / 2.0
half_h = (nrows * px) / 2.0
extent = [-half_w, half_w, -half_h, half_h]   # [xmin, xmax, ymin, ymax] in meters
print(extent)

# Plot sea ice data using imshow
im1 = ax.imshow(
    first_day_data,
    vmin=0, vmax=1,
    transform=src_crs,         
    extent=extent,             # Use the EASE2 grid extent
    origin='upper',
    cmap='Blues_r'            # Blue to white colormap (white = high concentration)
)

# Add map features
ax.add_feature(cfeature.COASTLINE, linewidth=0.5, color='black')
ax.add_feature(cfeature.LAND, facecolor='lightgray', alpha=0.7,zorder=0)

plt.show()
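
The extent arithmetic above also gives a direct mapping from projected coordinates (meters in the LAEA grid) to array indices. The sketch below assumes the same 432 x 432 grid with 25 km cells centered on the pole and `origin='upper'` (row 0 at the top). The function name is hypothetical, and converting longitude/latitude to projected meters is left to a projection library.

```python
import math

NROWS, NCOLS = 432, 432
PX = 25_000.0  # meters per pixel, as in the plotting cell


def xy_to_rowcol(x: float, y: float, nrows: int = NROWS, ncols: int = NCOLS, px: float = PX):
    """Map projected (x, y) in meters to (row, col) on a pole-centered grid.

    Row 0 is the top row (largest y), matching origin='upper'.
    Returns None when the point lies outside the grid.
    """
    xmin = -(ncols * px) / 2.0
    ymax = (nrows * px) / 2.0
    col = math.floor((x - xmin) / px)
    row = math.floor((ymax - y) / px)
    if 0 <= row < nrows and 0 <= col < ncols:
        return row, col
    return None


print(xy_to_rowcol(0.0, 0.0))                    # the pole
print(xy_to_rowcol(-5_400_000.0, 5_400_000.0))   # top-left corner of the grid
```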

## Overlay whale telemetry data on top of SIC - generate animation



In [None]:
def create_animation(sea_ice_dataset: 'ZebraDataset', whale_data_year: pd.DataFrame, year: str) -> 'matplotlib.animation.FuncAnimation':
    """
    Create animation for a single year's dataset.
    Each day is shown for 1 second (hardcoded).
    
    Args:
        sea_ice_dataset: Single year's sea ice dataset
        whale_data_year: Filtered whale telemetry data for this year
        year: Year as string for title
        
    Returns:
        Single animation object
    """
    # Get the total number of timesteps available
    total_timesteps = len(sea_ice_dataset)
    if total_timesteps == 0:
        return None
        
    # Create the figure and axis for animation
    fig, ax = plt.subplots(1, 1, figsize=(12, 10), 
                          subplot_kw={'projection': ccrs.LambertAzimuthalEqualArea(
                              central_latitude=90, central_longitude=0)})
    
    # Set the same extent as other plots
    plot_extent = [-110, -55, 60, 85]
    ax.set_extent(plot_extent, crs=ccrs.PlateCarree())
    
    # Add static features (coastlines, land)
    ax.add_feature(cfeature.COASTLINE, linewidth=0.5, color='black', zorder=15)
    ax.add_feature(cfeature.LAND, facecolor='lightgray', zorder=15)
    
    # Add gridlines
    ax.gridlines(draw_labels=True, dms=True, x_inline=False, y_inline=False,
                linewidth=0.5, alpha=0.7, color='gray')
    
    # Add overall title
    ax.set_title(f'Sea Ice & Whale Animation - {year}', fontsize=16, fontweight='bold', pad=20)
    
    # Add informative subtitle
    ax.text(0.5, 0.95, 'OSISAF Sea Ice + Bowhead Whale Telemetry', 
           transform=ax.transAxes, ha='center', va='top', 
           fontsize=10, style='italic', bbox=dict(boxstyle='round,pad=0.3', 
           facecolor='white', alpha=0.8), zorder=20)
    
    # Initialize empty plot objects that will be updated
    sea_ice_plot = None
    whale_scatter = None
    date_text = None
    
    def get_daily_whale_positions(df: pd.DataFrame, target_date: pd.Timestamp, days_back: int = 0) -> pd.DataFrame:
        """Get whale positions for a specific date, with option to go back in time"""
        if days_back > 0:
            target_date = target_date - pd.Timedelta(days=days_back)
        
        # Filter for the target date
        day_data = df[df['dateTime'].dt.date == target_date.date()]
        
        if day_data.empty:
            return pd.DataFrame()
        
        # Get the last datapoint for each whale on this day
        last_positions = day_data.groupby('id').tail(1)
        return last_positions
    
    def animate(frame_idx):
        """Animation function that updates the plot for each frame"""
        nonlocal sea_ice_plot, whale_scatter, date_text
        
        # Get the sea ice data for this timestep
        sea_ice_data = sea_ice_dataset[frame_idx].squeeze(0)  # Remove channel dimension
        
        # Get the date for this timestep
        frame_date = sea_ice_dataset.start_date + pd.Timedelta(days=frame_idx)
        frame_date_str = str(frame_date)[:10]
        
        # Get the date for this frame
        frame_date_pd = pd.Timestamp(frame_date)
        
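        # Clear previous frame elements and reset the references, so an artist
        # that is not redrawn in this frame is never removed twice
        if sea_ice_plot is not None:
            sea_ice_plot.remove()
            sea_ice_plot = None
        if whale_scatter is not None:
            whale_scatter.remove()
            whale_scatter = None
        if date_text is not None:
            date_text.remove()
            date_text = None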
        
        # Set up EASE2 grid extent for sea ice data
        nrows, ncols = 432, 432
        px = 25_000.0  # meters per pixel
        half_w = (ncols * px) / 2.0
        half_h = (nrows * px) / 2.0
        extent = [-half_w, half_w, -half_h, half_h]
        
        # Plot sea ice data
        sea_ice_plot = ax.imshow(
            sea_ice_data,
            vmin=0, vmax=1,
            transform=ccrs.LambertAzimuthalEqualArea(central_latitude=90, central_longitude=0),         
            extent=extent,
            origin='upper',
            cmap='Blues_r',
            zorder=1
        )
        
        # Plot whale trail effect (current day + previous days with decreasing size/transparency)
        trail_days = 5  # Number of previous days to show in trail
        
        all_whale_lons = []
        all_whale_lats = []
        all_whale_colors = []
        all_whale_sizes = []
        all_whale_alphas = []
        
        # Get unique whale IDs from current frame data
        current_whale_data = get_daily_whale_positions(whale_data_year, frame_date_pd)
        if not current_whale_data.empty:
            unique_whales = current_whale_data['id'].unique()
            colors = plt.cm.tab10(np.linspace(0, 1, len(unique_whales)))
            
            # Plot trail for each whale (previous days first, then current day)
            for i, whale_id in enumerate(unique_whales):
                whale_color = colors[i]
                
                # Plot previous days (trail effect)
                for days_back in range(trail_days, 0, -1):
                    trail_data = get_daily_whale_positions(whale_data_year, frame_date_pd, days_back)
                    if not trail_data.empty:
                        whale_trail = trail_data[trail_data['id'] == whale_id]
                        if not whale_trail.empty:
                            # Decreasing size and transparency for trail effect
                            trail_size = 60 - (days_back * 8)  # days_back 1..5 -> 52, 44, 36, 28, 20
                            trail_alpha = 0.9 - (days_back * 0.15)  # days_back 1..5 -> 0.75, 0.6, 0.45, 0.3, 0.15
                            
                            all_whale_lons.append(whale_trail['lon'].iloc[0])
                            all_whale_lats.append(whale_trail['lat'].iloc[0])
                            all_whale_colors.append(whale_color)
                            all_whale_sizes.append(trail_size)
                            all_whale_alphas.append(trail_alpha)
                
                # Plot current day position (largest and most opaque)
                current_whale = current_whale_data[current_whale_data['id'] == whale_id]
                if not current_whale.empty:
                    all_whale_lons.append(current_whale['lon'].iloc[0])
                    all_whale_lats.append(current_whale['lat'].iloc[0])
                    all_whale_colors.append(whale_color)
                    all_whale_sizes.append(60)  # Largest size for current day
                    all_whale_alphas.append(0.9)  # Most opaque for current day
            
            # Plot all whale positions with varying sizes and transparency
            if all_whale_lons:
                whale_scatter = ax.scatter(all_whale_lons, all_whale_lats, 
                                         c=all_whale_colors, s=all_whale_sizes, 
                                         alpha=all_whale_alphas, edgecolors='black', 
                                         linewidth=0.8, transform=ccrs.PlateCarree(), zorder=20)
        
        # Add date stamp
        date_text = ax.text(0.5, 0.85, frame_date_str, 
                           transform=ax.transAxes, ha='center', va='top', 
                           fontsize=14, fontweight='bold',
                           bbox=dict(boxstyle='round,pad=0.3', facecolor='white', alpha=0.9), zorder=20)
        
        return sea_ice_plot, whale_scatter, date_text
    
    # Create the animation (1 second per day)
    anim = animation.FuncAnimation(
        fig, animate, frames=total_timesteps, 
        interval=1000,  # 1 second per frame
        blit=False,  # Don't use blitting for complex plots
        repeat=True
    )
    
    plt.close(fig)
    return anim

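sel_year = '2010'
# Pass only the selected year's telemetry, as the function's docstring expects
whale_data_sel = filtered_data[filtered_data['year'] == int(sel_year)]
anim = create_animation(sea_ice_datasets[sel_year], whale_data_sel, sel_year)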

HTML(anim.to_jshtml())
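
The trail styling inside `animate` follows a simple linear schedule: each day further back shrinks the marker by 8 points and lowers the opacity by 0.15. A standalone sketch of that schedule, using a hypothetical helper name, makes the resulting values explicit.

```python
def trail_style(days_back: int, base_size: float = 60.0, base_alpha: float = 0.9):
    """Return (marker size, alpha) for a position `days_back` days before the frame date."""
    size = base_size - days_back * 8
    # Rounded to two decimals only to keep the printed values tidy
    alpha = round(base_alpha - days_back * 0.15, 2)
    return size, alpha


for days_back in range(0, 6):
    print(days_back, trail_style(days_back))
```

With `trail_days = 5`, the oldest trail point is therefore drawn at size 20 with alpha 0.15, and the current-day position at size 60 with alpha 0.9.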
