<img src="../Images/DSC_Logo.png" style="width: 400px;">

# Data Visualization of PANGAEA Datasets

In this notebook, we will use the preprocessed datasets to:
- Map recorded *Orcinus orca* sightings in the Arctic.
- Map cruise tracks in the Arctic.
- Follow whale sightings along a cruise route from Bremerhaven to Cape Town.

The focus is on showcasing both static and dynamic plotting options for data visualization, visual data analysis and communication.

>Note: Here we keep the visualization code rather lightweight and do not apply all geospatial data plotting best practices (e.g., scale bars, custom legends, detailed layout). For publication-quality figures, additional goal-specific styling and layout adjustments would be needed. Virtually anything is possible in terms of map design and plot customization using Python.

# 1. Preparation
## 1.1 Import Libraries

In Python, several libraries are available for visualizing (geospatial) data; in this notebook, we use `Matplotlib` and `Cartopy`. In addition, `Plotly` enables the creation of interactive plots.

In [None]:
!pip install plotly
!pip install cartopy

In [None]:
import pandas as pd
import numpy as np
import os

import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation  
from matplotlib import cm

import cartopy.crs as ccrs
import cartopy.feature as cfeature

import plotly.express as px
import plotly.io as pio
import plotly.graph_objects as go

## 1.2 Directory to Save Visualizations

In [None]:
os.makedirs("../Plots", exist_ok=True)

# 2. Load Datasets

In [None]:
dataset_directory = "../Data/PANGAEA_orca_data/Orca_preprocessed.txt"
Orca = pd.read_csv(dataset_directory, sep="\t", encoding="utf-8")

dataset_directory = "../Data/PANGAEA_mastertrack_data/Mastertracks_preprocessed.txt"
Mastertracks = pd.read_csv(dataset_directory, sep="\t", encoding="utf-8")

dataset_directory = "../Data/868991_dataset_preprocessed.txt"
Whales_868991 = pd.read_csv(dataset_directory, sep="\t", encoding="utf-8")

dataset_directory = "../Data/868991_dataset_mastertrack.txt"
Mastertrack_868991 = pd.read_csv(dataset_directory, sep="\t", encoding="utf-8")

Preview each dataset: 

In [None]:
Orca.head()

In [None]:
Mastertracks.head()

In [None]:
Whales_868991.head()

In [None]:
Mastertrack_868991.head()

The downloaded "868991" dataset has its own geospatial extent that we can quickly check using the minimum and maximum latitude (north–south range) and longitude (east–west range):

In [None]:
print(Whales_868991["Latitude"].min())
print(Whales_868991["Latitude"].max())

In [None]:
print(Whales_868991["Longitude"].min())
print(Whales_868991["Longitude"].max())
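
The same minimum and maximum values can be collected into a single list in the order that `ax.set_extent()` expects (`[lon_min, lon_max, lat_min, lat_max]`). The following is a minimal sketch on made-up coordinates; the helper name `padded_extent` and the 5° padding are illustrative assumptions, not part of the original workflow.

```python
import pandas as pd

def padded_extent(df, lon_col="Longitude", lat_col="Latitude", pad=5.0):
    """Hypothetical helper: bounding box with padding, clipped to valid lon/lat ranges."""
    lon_min = max(df[lon_col].min() - pad, -180.0)
    lon_max = min(df[lon_col].max() + pad, 180.0)
    lat_min = max(df[lat_col].min() - pad, -90.0)
    lat_max = min(df[lat_col].max() + pad, 90.0)
    return [float(lon_min), float(lon_max), float(lat_min), float(lat_max)]

# Synthetic stand-in for Whales_868991 (made-up coordinates, not real data)
demo = pd.DataFrame({
    "Longitude": [8.5, -20.0, 18.4],
    "Latitude": [53.5, 0.0, -33.9],
})

extent = padded_extent(demo)
print(extent)
```

With the real data, the result of `padded_extent(Whales_868991)` could be passed to `ax.set_extent(extent, crs=ccrs.PlateCarree())`.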

# 3. Map of Recorded Orcinus Orca Sightings

## 3.1 Static Map

In [None]:
# Map setup
fig, ax = plt.subplots(figsize=(8, 8), subplot_kw={'projection': ccrs.NorthPolarStereo()})
ax.set_extent([-40, 40, 60, 90], crs=ccrs.PlateCarree())  # Arctic region (extent given in lon/lat)

# Add ocean and land background
ax.set_facecolor('#EAF6FF')                                           # ocean background
ax.add_feature(cfeature.LAND, facecolor='lightgray')                  # land background
ax.add_feature(cfeature.LAKES, facecolor='#EAF6FF', edgecolor='none') # lakes same as ocean background

# Plot all sightings as blue dots
ax.scatter(                  # or use ax.plot() and define marker style
    Orca['LONGITUDE'],
    Orca['LATITUDE'],
    s=20, color='blue', transform=ccrs.PlateCarree()
)

The following cell extends the simple map to:
- Show graticule lines
- Color points by year and scale them by the number of individuals
- Add a legend

In [None]:
# Map setup
fig, ax = plt.subplots(figsize=(8, 8), subplot_kw={'projection': ccrs.NorthPolarStereo()})
ax.set_extent([-40, 40, 60, 90], crs=ccrs.PlateCarree())  # Arctic region

# Add ocean and land background
ax.set_facecolor('#EAF6FF')
ax.add_feature(cfeature.LAND, facecolor='lightgray')
ax.add_feature(cfeature.LAKES, facecolor='#EAF6FF', edgecolor='none')

# Add graticule lines
ax.gridlines(draw_labels=False, linestyle='--', color='gray', alpha=0.5)

# Plot sightings: color = Year, size = Individuals
years = np.sort(Orca['Year'].dropna().unique())
cmap = plt.get_cmap('viridis', len(years))
size_scale = 10
for i, y in enumerate(years):
    d = Orca[Orca['Year'] == y]
    ax.scatter(
        d['LONGITUDE'], d['LATITUDE'],
        s=d['Individuals [#]'] * size_scale,
        color=cmap(i), edgecolor='k', linewidth=0.3, alpha=0.9,
        transform=ccrs.PlateCarree(), label=f'Year {y}'
    )

# Build two legends manually (one for year colors, one for marker sizes)
    # Year (colors)
year_handles = [
    plt.Line2D([], [], marker='o', linestyle='',
               markerfacecolor=cmap(i), markeredgecolor='k',
               markersize=8, label=str(y))
    for i, y in enumerate(years)
]
leg1 = ax.legend(handles=year_handles, title="Year",
                 loc='upper left', bbox_to_anchor=(1.02, 1.0),
                 frameon=False, fontsize=8)

    # Individuals (sizes)
size_handles = [
    plt.scatter([], [], s=s*size_scale, facecolors='none',
                edgecolors='k', linewidths=0.8, label=f'{s} ind.')
    for s in [1, 5, 10]
]
leg2 = ax.legend(handles=size_handles, title="Individuals",
                 loc='upper left', bbox_to_anchor=(1.02, 0.60),
                 frameon=False, fontsize=8)

ax.add_artist(leg1)  # re-add the first legend; the second ax.legend() call replaced it

# Save
plt.savefig("../Plots/orca_map.png", dpi=300, bbox_inches="tight")
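
A note on marker sizes: the `s` argument of `ax.scatter()` is a marker area in points squared, whereas `markersize` on a `Line2D` handle is a length in points. The sketch below shows how the two relate for the `size_scale` used above; the function names `scatter_area` and `line2d_markersize` are hypothetical and only serve as illustration.

```python
import numpy as np

size_scale = 10  # same scaling factor as in the map cell

def scatter_area(counts, scale=size_scale):
    """Marker area (points^2) as passed to ax.scatter(s=...)."""
    return np.asarray(counts, dtype=float) * scale

def line2d_markersize(counts, scale=size_scale):
    """Approximate Line2D markersize (points) that matches the scatter area."""
    return np.sqrt(scatter_area(counts, scale))

counts = [1, 5, 10]
areas = scatter_area(counts)
diameters = line2d_markersize(counts)
print(areas)
print(diameters)
```

Because area grows with the square of the diameter, scaling `s` linearly with the number of individuals keeps the visual impression roughly proportional to the count.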

## 3.2 Interactive Map

In interactive maps, users can zoom, pan, and hover over points to explore the data dynamically. To go from the static Matplotlib map to the interactive Plotly version, we switch from `ax.scatter()` to `px.scatter_geo()`.

In [None]:
# Show figures inline in notebooks
pio.renderers.default = "notebook"

# Keep colors in sorted order for assignment to years and the categorical legend
years = np.sort(Orca['Year'].astype(int).unique())
year_labels = [str(int(y)) for y in years]
Orca['Year'] = pd.Categorical(Orca['Year'].astype(int).astype(str),
                            categories=year_labels, ordered=True)

# Interactive scatter on a polar-style map
fig = px.scatter_geo(
    Orca,
    lat='LATITUDE', lon='LONGITUDE',
    color='Year',                          # color by (ordered) year
    size='Individuals [#]',                # bubble size by counts
    size_max=22,                           # similar visual scale as before
    projection='stereographic',            # polar view
    color_discrete_sequence=px.colors.sequential.Viridis,
    labels={'Individuals [#]': 'Individuals'},
    category_orders={'Year': year_labels}  # enforce legend order
)

# Map appearance (ocean/land colors + polar focus + simple graticule)
fig.update_geos(
    projection_rotation=dict(lat=90, lon=0),    # look at North Pole
    lataxis=dict(range=[60, 90], showgrid=True, gridcolor='gray'),
    lonaxis=dict(range=[-40, 40], showgrid=True, gridcolor='gray'),
    showland=True,  landcolor='lightgray',
    showocean=True, oceancolor='#EAF6FF',
    showlakes=True, lakecolor='#EAF6FF',
    coastlinecolor="rgba(0,0,0,0)",             # hide coastline
    bgcolor='white'
)

# Single combined legend area: keep year colors + add simple size "legend"
for s in [1, 5, 10]:
    fig.add_trace(go.Scattergeo(
        lon=[None], lat=[None], mode='markers',
        marker=dict(size=s, sizemode='area', color='white',
                    line=dict(color='black', width=1)),
        showlegend=True, name=f'{s} ind.',
        legendgroup='size', legendgrouptitle_text='Individuals' if s == 1 else None
    ))

fig.update_layout(
    legend_title_text='Year',                   # legend shows year colors + size group
    margin=dict(l=10, r=10, t=30, b=10)
)

fig.show()

# 4. Map Master Cruise Tracks
## 4.1 Static Map

The following code block creates a plot of the individual tracks shown with distinct colors.

In [None]:
# Map setup
fig, ax = plt.subplots(figsize=(8, 8), subplot_kw={'projection': ccrs.NorthPolarStereo()})
ax.set_extent([-40, 40, 40, 90], crs=ccrs.PlateCarree())

# Background
ax.set_facecolor('#EAF6FF')
ax.add_feature(cfeature.LAND, facecolor='lightgray')
ax.add_feature(cfeature.LAKES, facecolor='#EAF6FF', edgecolor='none')

# Graticules (lat/lon grid)
ax.gridlines(draw_labels=False, linestyle='--', color='gray', alpha=0.5)

# Plot tracks: one color per Event (track ID)
track_groups = Mastertracks.groupby('Event', sort=False)
cmap = plt.get_cmap('tab10', len(track_groups))

for i, (tid, d) in enumerate(track_groups):
    ax.plot(
        d['Longitude'], d['Latitude'],
        color=cmap(i), linewidth=1.5,
        transform=ccrs.PlateCarree(),
        label=str(tid)
    )

# Legend
leg = ax.legend(title="Event", loc='upper left',
                bbox_to_anchor=(1.02, 1.0), frameon=False, fontsize=8)

# Save
plt.savefig("../Plots/mastertracks_map.png", dpi=300, bbox_inches="tight")

## 4.2 Animation

Here we visualize the temporal component of a single Arctic expedition by animating the ship's track over time. To turn the static plot into an animation, replace the normal `ax.plot` call with an empty line and then update its data step by step using `FuncAnimation`:

In [None]:
# Map setup
fig, ax = plt.subplots(figsize=(8, 8), subplot_kw={'projection': ccrs.NorthPolarStereo()})
ax.set_extent([-40, 40, 40, 90], crs=ccrs.PlateCarree())

# Background
ax.set_facecolor('#EAF6FF')
ax.add_feature(cfeature.LAND, facecolor='lightgray')
ax.add_feature(cfeature.LAKES, facecolor='#EAF6FF', edgecolor='none')

# Graticules (lat/lon grid)
ax.gridlines(draw_labels=False, linestyle='--', color='gray', alpha=0.5)

# Choose which track to animate
track_id = "PS85-track"         # change to any value from Mastertracks['Event']
d = (Mastertracks[Mastertracks['Event'] == track_id]
     .sort_values('Date/Time')  # ensure temporal order
     .reset_index(drop=True))

# Empty line to be animated
line, = ax.plot([], [], color='darkred', linewidth=1.8,
                transform=ccrs.PlateCarree(), label=track_id)

# Animation: reveal the line over time
def animate(i):
    line.set_data(d['Longitude'][:i], d['Latitude'][:i])
    return line,

anim = FuncAnimation(fig, animate, frames=len(d), interval=100, blit=True)

# Save
anim.save("../Plots/mastertracks_animation.gif", writer='pillow')
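
With `frames=len(d)`, every row of the track becomes one frame, which can make saving slow for long tracks. One possible way to limit the frame count is to pass an iterable of evenly spaced row indices as `frames`. This is a minimal sketch; the helper `frame_indices` and the cap of 200 frames are assumptions for illustration.

```python
import numpy as np

def frame_indices(n_points, max_frames=200):
    """Hypothetical helper: at most max_frames evenly spaced row indices from 0 to n_points - 1."""
    if n_points <= max_frames:
        return np.arange(n_points)
    idx = np.linspace(0, n_points - 1, max_frames).round().astype(int)
    return np.unique(idx)  # drop duplicates that rounding may create

frames = frame_indices(10, max_frames=4)
print(frames)
```

`FuncAnimation` accepts such an iterable directly, e.g. `FuncAnimation(fig, animate, frames=frame_indices(len(d)), interval=100, blit=True)`; each value is then passed to `animate` as `i`.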

# 5. Exercise: Follow Whale Sightings Along a Cruise Track
Plot a static map that overlays whale sightings from dataset 868991 with the corresponding master cruise track. This helps to visualize where whales were observed relative to the route of the ship. Further analyses could help spot potential hotspots, or links to environmental conditions along the transect.

1. Preprocess datetime: Resample "Mastertrack_868991" to daily (see Sect. 3.3 in notebook 2 - data preprocessing).

2. We checked the geospatial extent of the "868991" dataset in Sect. 2. It is not located in the Arctic region but extends from Cape Town to Bremerhaven (compare notebook 1 - download pangaea data). A polar projection like `ccrs.NorthPolarStereo()` is therefore not appropriate for these data. Instead, we can use a rectangular lat/lon grid like `ccrs.PlateCarree()`. Adapt the **map setup** from Sect. 3.1 by replacing the polar projection with `ccrs.PlateCarree()` and choosing an appropriate `ax.set_extent(...)` so that the map covers the region between South Africa and Northern Europe. Add ocean and land background as well as graticule lines to the map as in the previous examples.

3. Extend your map by plotting both the whale sightings and the mastertrack in the same figure. Use `ax.scatter()` to show the whale observations as blue dots scaled by the number of whales, and `ax.plot()` to draw the route. You should add both plotting commands below your existing map setup.
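
As a hint for step 1, daily resampling with pandas could look like the sketch below. It runs on a synthetic hourly track with made-up positions and averages the coordinates per day; the approach in notebook 2 may differ, so treat the aggregation choice (`mean`) as an assumption.

```python
import pandas as pd

# Synthetic hourly track (made-up positions), standing in for Mastertrack_868991
track = pd.DataFrame({
    "Date/Time": pd.date_range("2020-01-01", periods=72, freq=pd.Timedelta(hours=1)),
    "Latitude": [50.0 - 0.1 * k for k in range(72)],
    "Longitude": [5.0 - 0.05 * k for k in range(72)],
})

# Daily mean position: index by time, resample to calendar days, drop empty days
daily = (
    track.set_index("Date/Time")[["Latitude", "Longitude"]]
    .resample("D")
    .mean()
    .dropna()
    .reset_index()
)
print(daily)
```

Averaging longitudes is only safe when a track does not cross the antimeridian (±180°), which is not expected for a route between Cape Town and Bremerhaven.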

# 6. Multiple Artists in Animation

We can transform the static map into an animation that reveals the mastertrack over time and shows whale sightings that have occurred up to the current time step. We animate by time order and plot whale sightings as whale icons.

Custom icons are uncommon in scientific visualizations, but we use one here to demonstrate how flexibly Python lets you build exactly the visualization you have in mind. To use a custom icon instead of scatter dots or other built-in Matplotlib markers, we load the icon PNG once, place the image at each sighting (via `OffsetImage` + `AnnotationBbox`), and redraw the icons while the track line updates.

Setup specific to the custom icon:

In [None]:
from matplotlib.offsetbox import OffsetImage, AnnotationBbox
import matplotlib.image as mpimg

# list to track icon artists
icon_artists = []         

# Load custom whale icon
whale_img = mpimg.imread("../Images/Whale_Icon.png")

Relative to the previous animation example in Sect. 4.2, comments in the code below are limited to what varies:
- The two frame-updating artists (track line and whale icons) instead of only one (line) artist.
- The custom whale icon.

In [None]:
fig, ax = plt.subplots(figsize=(8, 8), subplot_kw={'projection': ccrs.PlateCarree()})
ax.set_extent([-80, 30, -60, 70], crs=ccrs.PlateCarree())

ax.set_facecolor('#EAF6FF')
ax.add_feature(cfeature.LAND, facecolor='lightgray')
ax.add_feature(cfeature.LAKES, facecolor='#EAF6FF', edgecolor='none')

ax.gridlines(draw_labels=False, linestyle='--', color='gray', alpha=0.5)

size_scale = 10

# Need to compute icon zooms manually, since AnnotationBbox doesn’t have a s= parameter:
whale_sizes = Whales_868991['Whales'] * size_scale 

line, = ax.plot([], [], color='darkred', linewidth=1.8,
                transform=ccrs.PlateCarree(), label="Route")

# Make both datetime columns true datetimes
Mastertrack_868991['Date/Time'] = pd.to_datetime(Mastertrack_868991['Date/Time'])
Whales_868991['Date/Time'] = pd.to_datetime(Whales_868991['Date/Time'])

# Drive animation by time (aligned for both datasets)
times = Mastertrack_868991['Date/Time'].to_numpy()

def animate(i):
    
    # To know the current time:
    t = times[i]

    # Update the ship/track line so it "grows" over time: we take all points up to index i and show them on the line
    line.set_data(
        Mastertrack_868991['Longitude'].iloc[:i+1],
        Mastertrack_868991['Latitude'].iloc[:i+1]
    )

    # Each frame we remove the icons from the previous step, so we don’t stack them on top of each other
    for a in icon_artists:
        a.remove()
    icon_artists.clear()

    # Pick whale sightings that have happened so far (time <= t)
    seen = Whales_868991[Whales_868991['Date/Time'] <= t]

    # Work out icon sizes (zoom level) from the "whale_sizes" values
    sizes_now = whale_sizes.reindex(seen.index).astype(float).to_numpy()
    zooms = 0.02 + 0.002 * np.sqrt(sizes_now)

    # Draw each whale as an image at its (lon, lat) position
    for (lon, lat), z in zip(seen[['Longitude','Latitude']].to_numpy(), zooms):
        ab = AnnotationBbox(
            OffsetImage(whale_img, zoom=z),       # the picture + zoom
            (lon, lat),                           # where to place it
            xycoords=ccrs.PlateCarree()._as_mpl_transform(ax),  # use lon/lat coords
            frameon=False                         # no box around the picture
        )
        ax.add_artist(ab)        # actually add the icon to the map
        icon_artists.append(ab)  # remember it so we can remove it next time

    ax.set_title(f"868991 — {pd.to_datetime(t).strftime('%Y-%m-%d %H:%M')}")

    # Return BOTH artists (that changed)
    return [line, *icon_artists]

# Build animation: one frame per track time step, because animate() indexes `times`
# (more frames than len(times) would raise an IndexError); sightings appear as their timestamps are reached
frames = len(times)
anim = FuncAnimation(fig, animate, frames=frames, interval=100, blit=False)

anim.save("../Plots/868991_animation.gif", writer='pillow')
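
The time filter inside `animate` (`Whales_868991['Date/Time'] <= t`) can be checked without rendering anything. The sketch below uses made-up timestamps and `np.searchsorted` to count how many sightings would be visible at each track time step; the variable names are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Made-up track and sighting timestamps (not real data)
track_times = pd.to_datetime(
    ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04"]
).to_numpy()
sighting_times = pd.to_datetime(
    ["2020-01-01 12:00", "2020-01-03 00:00", "2020-01-03 06:00"]
).to_numpy()

# side="right" counts all sightings with time <= t, matching the filter in animate()
sighting_sorted = np.sort(sighting_times)
seen_counts = np.searchsorted(sighting_sorted, track_times, side="right")
print(seen_counts)
```

If the last count is smaller than the number of sightings, some sightings lie after the final track timestamp and would never appear in the animation.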