---
title: "Thames Barrier"
---




This document provides a complete analysis workflow for the Thames Barrier system, combining barrier closure analysis, tide gauge data processing, tidal analysis, and visualization. Closure records cover water years 1986/87 to 2024/25, and the tide gauge analysis covers 1986 to 2023. The workflow includes:

1. **Barrier Closure Analysis**: Loading and analyzing past barrier closure dates, counting closures per water year, and calculating statistics
2. **Tide Gauge Data Processing**: Loading, filtering, and resampling the Sheerness tide gauge record (GESLA4)
3. **Tidal Analysis**: Performing harmonic tidal decomposition and generating predicted astronomical tides
4. **Visualization**: Creating comprehensive visualizations showing predicted high waters, observed water levels, and barrier closures

Water years run from July 1 to June 30. This keeps each winter storm season within a single year, which suits coastal flood analysis better than calendar years, where a season is split across two years.

---

# Barrier Closures

This section analyzes past closures of the Thames barrier. The analysis:

1. Loads barrier closure dates from an Excel file
2. Counts closures per water year (July 1 to June 30)
3. Calculates statistics (min, mean, max, total closures)
4. Creates a bar chart visualization
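
The counting step can also be written with pandas instead of an explicit loop. The following is a minimal sketch, assuming a water year is labelled by the calendar year in which it starts; the closure timestamps are hypothetical and only illustrate the boundary at July 1.

```python
import pandas as pd

# Hypothetical closure timestamps, for illustration only (not real closure data)
closures = pd.to_datetime([
    "1990-06-30 23:00",  # last day of water year 1989/90
    "1990-07-01 00:00",  # first instant of water year 1990/91
    "1991-01-15 06:30",
    "1992-12-01 12:00",
])

# A water year starting on July 1 is labelled by the calendar year in which it starts
start_year = [ts.year - (ts.month < 7) for ts in closures]

# Count closures per starting year, keeping years without closures as zero
years = list(range(1989, 1993))
counts = pd.Series(start_year).value_counts().reindex(years, fill_value=0)
counts.index = [f"{y}/{str(y + 1)[2:]}" for y in years]
print(counts)
```

Reindexing against the full list of years matters for the bar chart: a water year without closures should appear as a zero bar rather than be dropped.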


In [None]:
#| label: setup-master1-thames
#| include: true
import pandas as pd
import numpy as np
import pickle
import matplotlib.pyplot as plt
from datetime import datetime
import os

# Configuration - years from 1986 to 2024
Y = list(range(1986, 2025))  # [1986, 1987, ..., 2024]

# Output directory
output_dir = 'output'
os.makedirs(output_dir, exist_ok=True)

print(f"Analysis configuration:")
print(f"  Water years: {Y[0]}/{str(Y[0]+1)[2:]} to {Y[-1]}/{str(Y[-1]+1)[2:]} ({len(Y)} years)")

In [None]:
#| label: load-closures-thames
#| include: true

# 1. Observed closure data - Thames Barrier
data_dir = '../2_DATA/1_BARRIER_CLOSURES'
excel_file = os.path.join(data_dir, 'Thames_Barrier_Past_Closures_2024.xlsx')
df = pd.read_excel(excel_file, sheet_name='Closures')

# Extract closure dates from column 2 (index 1)
closure_dates_col = df.iloc[:, 1]  # Column 2 (0-indexed)

# Convert dates: pandas.read_excel automatically converts Excel dates to datetime objects
OCD = [d.to_pydatetime() if isinstance(d, pd.Timestamp) else d 
       for d in closure_dates_col if pd.notna(d)]

print(f"Loaded {len(OCD)} closure dates")
if len(OCD) > 0:
    print(f"  First closure: {OCD[0]}")
    print(f"  Last closure: {OCD[-1]}")

## Analysis


In [None]:
#| label: count-closures-thames
#| include: true

# 2. Count closures per water year (July 1 to July 1)
OCS = []  # Will store [index, year, year+1, num_closures]
YT = []   # Will store year labels like "1986/87"

for co, y in enumerate(Y, start=1):
    # Find closures between July 1 of year y and July 1 of year y+1
    start_date = datetime(y, 7, 1, 0, 0, 0)
    end_date = datetime(y + 1, 7, 1, 0, 0, 0)
    
    # Count closures in this water year
    closures_in_year = [d for d in OCD if start_date <= d < end_date]
    num_closures = len(closures_in_year)
    
    OCS.append([co, y, y + 1, num_closures])
    
    # Create year label (e.g., "1986/87")
    year_str = str(y + 1)
    YT.append(f"{y}/{year_str[2:]}")

OCS = np.array(OCS)

print(f"\nClosures per water year calculated for {len(OCS)} years")
print(f"  Total closures: {np.sum(OCS[:, 3])}")

In [None]:
#| label: statistics-thames
#| include: true

# 3. Calculate statistics
E = np.array([
    np.min(OCS[:, 3]),      # Minimum closures per year
    np.mean(OCS[:, 3]),     # Mean closures per year
    np.max(OCS[:, 3]),      # Maximum closures per year
    np.sum(OCS[:, 3])       # Total closures
])

print(f"\nClosure statistics:")
print(f"  Min per year: {E[0]:.1f}")
print(f"  Mean per year: {E[1]:.2f}")
print(f"  Max per year: {E[2]:.1f}")
print(f"  Total: {E[3]:.0f}")

In [None]:
#| label: save-master1-thames
#| include: true

# 4. Save data
output_file = os.path.join(output_dir, 'mast1_thames.pkl')
with open(output_file, 'wb') as f:
    pickle.dump({
        'OCD': OCD,
        'OCS': OCS,
        'E': E,
        'YT': YT
    }, f)
print(f"\nData saved to {output_file}")

## Visualization

This bar chart shows the number of barrier closures per water year from 1986/87 to 2024/25. The Thames Barrier closes when water levels exceed a threshold, typically during storm surge events.


In [None]:
#| label: plot-closures-thames
#| fig-cap: Number of Thames Barrier closures per water year (July 1 to June 30), 1986-2024

fig, ax = plt.subplots(figsize=(12, 6))
ax.bar(OCS[:, 0], OCS[:, 3], color='black')
ax.set_xlim(0.2, len(OCS) + 0.8)
ax.set_xticks(range(1, len(OCS) + 1))
ax.set_xticklabels(YT, rotation=90)
# Note: y-axis limit may need adjustment based on actual data
max_closures = int(np.max(OCS[:, 3])) + 1
ax.set_ylim(0, max_closures)
ax.set_yticks(range(0, max_closures + 1, max(1, max_closures // 10)))
ax.set_ylabel('Number of closures', fontweight='bold', fontsize=18)
ax.set_title('Thames Barrier', fontweight='bold', fontsize=18)
ax.grid(True, alpha=0.3)
ax.tick_params(labelsize=16)

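plt.tight_layout()

# Save before plt.show(): some backends release the figure on show,
# so a later savefig would write an empty image
fig_file = os.path.join(output_dir, 'master1_thames_closures.png')
fig.savefig(fig_file, dpi=150, bbox_inches='tight')
print(f"\nFigure saved to {fig_file}")
plt.show()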

The following table lists all barrier closure dates, which correspond to storm events that triggered the Thames Barrier.


In [None]:
#| label: table-storms-thames
#| tbl-cap: Complete list of Thames Barrier closures (storm events), 1986-2024

# Create DataFrame with closure dates
storms_df = pd.DataFrame({
    'Closure Date': [d.strftime('%Y-%m-%d %H:%M') if isinstance(d, datetime) else str(d) for d in OCD],
    'Year': [d.year if isinstance(d, datetime) else None for d in OCD],
    'Month': [d.strftime('%B') if isinstance(d, datetime) else None for d in OCD],
    'YearMonth': [(d.year, d.month) if isinstance(d, datetime) else None for d in OCD]
})

# Add water year information
def get_water_year(date):
    """Determine water year (July 1 to June 30)"""
    if isinstance(date, datetime):
        if date.month >= 7:
            return f"{date.year}/{str(date.year + 1)[2:]}"
        else:
            return f"{date.year - 1}/{str(date.year)[2:]}"
    return None

storms_df['Water Year'] = [get_water_year(d) for d in OCD]

# Create index that groups storms by year and month
# Storms in the same year and month get the same index
unique_year_months = storms_df['YearMonth'].unique()
year_month_to_index = {ym: idx + 1 for idx, ym in enumerate(sorted(unique_year_months))}
storms_df['Event'] = storms_df['YearMonth'].map(year_month_to_index)

# Reorder columns (Event first; this also drops the YearMonth helper column)
storms_df = storms_df[['Event', 'Closure Date', 'Water Year', 'Year', 'Month']]

# Display table
print(f"\nTotal number of closures: {len(storms_df)}")
storms_df

---

# Tide Gauge Data Processing

This section loads and processes raw tide gauge data from the Thames estuary. The analysis uses the **Sheerness tide gauge** record from the GESLA4 (Global Extreme Sea Level Analysis) dataset. The gauge is at 51.445639°N, 0.743361°E, in the outer estuary downstream of the Thames Barrier. The data come from the British Oceanographic Data Centre (BODC), and the record covers 1952 to 2025.

The analysis:

1. Loads data from the GESLA4 format file (`sheerness-she-gbr-bodc`)
2. Parses the GESLA4 format (Date, Time, Sea level, QC flag, Use flag)
3. Filters data to the analysis period (1986-2023) and quality-controlled values
4. Resamples the data onto a 10-minute grid (nearest-neighbor matching) to match the analysis requirements
5. Creates a visualization of the complete water level time series

The GESLA4 format uses Admiralty Chart Datum (ACD) as the vertical reference. The original file provides data at 15-minute intervals; these are resampled onto a 10-minute grid for consistency with the Eastern Scheldt analysis workflow. Missing values are coded as -99.9999 in the GESLA4 format and are filtered out during processing.
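
The column layout and null value described above can also be parsed in a few lines with pandas. This is a minimal sketch on a made-up excerpt: the header lines and data values are placeholders, and the column names are hypothetical labels chosen for this example.

```python
import io
import numpy as np
import pandas as pd

# Made-up excerpt in the layout: date, time, sea level, QC flag, use flag
sample = """# placeholder header line
# placeholder header line
1986/01/01 00:00:00   3.2510 1 1
1986/01/01 00:15:00 -99.9999 3 0
1986/01/01 00:30:00   3.4120 1 1
"""

df = pd.read_csv(
    io.StringIO(sample),
    comment="#",      # skip header lines starting with '#'
    sep=r"\s+",       # whitespace-separated columns
    header=None,
    names=["date", "time", "sea_level", "qc_flag", "use_flag"],
)
df.index = pd.to_datetime(df["date"] + " " + df["time"], format="%Y/%m/%d %H:%M:%S")

# Keep values recommended for analysis; mask the null value as NaN
keep = (df["use_flag"] == 1) & (df["sea_level"] > -99)
level = df["sea_level"].where(keep, np.nan)
print(level)
```

Masking rather than dropping rows keeps the original time axis intact, which makes gaps visible in later steps.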


In [None]:
#| label: setup-master2-thames
#| include: true

import numpy as np
import pandas as pd
import pickle
import matplotlib.pyplot as plt
from datetime import datetime
import os
import json

# Configuration
# Use GESLA4 data from RTides directory
gesla_dir = '../../RTides/data/GESLA4_ALL'
data_dir = '../2_DATA/2_TIDE_GAUGE/TB'
output_dir = 'output'

# Create output directory if it doesn't exist
os.makedirs(output_dir, exist_ok=True)

print(f"GESLA4 data directory: {gesla_dir}")
print(f"Output directory: {output_dir}")

This section loads the Sheerness tide gauge data from the GESLA4 dataset. The GESLA4 format is a standardized global sea level dataset that includes quality control flags and metadata. The Sheerness gauge is located on the Isle of Sheppey in the Thames estuary, approximately 50 km downstream from the Thames Barrier, making it an appropriate reference for barrier closure analysis.

The data loading process:
- Reads the GESLA4 file format, skipping header lines (starting with `#`)
- Parses date/time strings and converts them to datetime objects
- Extracts sea level values (in meters) and quality control flags
- Filters to only include data with `use_flag == 1` (recommended for analysis)
- Removes missing values (indicated by -99.9999)
- Filters to the analysis period (1986-2023)
- Interpolates from the original 15-minute intervals to 10-minute intervals using nearest-neighbor interpolation with a 15-minute tolerance

The resulting time series is on a regular 10-minute grid for the entire analysis period. Grid points with no valid observation within the tolerance remain `NaN`.
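
The behavior of nearest-neighbor matching with a tolerance is easiest to see on a tiny example. The sketch below uses made-up observations with a deliberate gap; it shows that grid points close to an observation take its value, while points inside the gap stay missing.

```python
import numpy as np
import pandas as pd

# Made-up observations: two samples 15 minutes apart, then a long gap
obs_index = pd.to_datetime(["2000-01-01 00:00", "2000-01-01 00:15", "2000-01-01 01:30"])
obs = pd.Series([1.0, 1.3, 2.0], index=obs_index)

# Target grid at 10-minute spacing
grid = pd.date_range("2000-01-01 00:00", "2000-01-01 01:30", freq="10min")

# Each grid point takes the nearest observation, but only if it lies within 15 minutes
resampled = obs.reindex(grid, method="nearest", tolerance=pd.Timedelta(minutes=15))
print(resampled)
```

Note that this copies values rather than interpolating between them, so neighboring grid points can share one observation. Time-weighted linear interpolation would be an alternative, at the cost of smoothing over short gaps.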


In [None]:
#| label: load-tide-gauge-thames
#| include: true

# Load GESLA4 format tide gauge data for Thames (Sheerness gauge)
file_in = os.path.join(gesla_dir, 'sheerness-she-gbr-bodc')

if not os.path.exists(file_in):
    raise FileNotFoundError(
        f"Required file {file_in} not found. "
        f"Please ensure the GESLA4 data file exists."
    )

# Read GESLA4 file
# Format: Date (yyyy/mm/dd), Time (hh:mm:ss), Sea level (m), QC flag, Use flag
# Missing values are -99.9999
data_lines = []
with open(file_in, 'r') as f:
    for line in f:
        # Skip header lines (start with #)
        if line.startswith('#'):
            continue
        # Skip empty lines
        if line.strip():
            data_lines.append(line.strip().split())

# Parse data
dates = []
times = []
water_levels = []
use_flags = []

for line in data_lines:
    if len(line) >= 5:
        date_str = line[0]  # yyyy/mm/dd
        time_str = line[1]  # hh:mm:ss
        wl_str = line[2]    # sea level (m)
        qc_flag = line[3]  # QC flag
        use_flag = line[4] # use flag
        
        # Parse datetime
        dt_str = f"{date_str} {time_str}"
        dt = datetime.strptime(dt_str, '%Y/%m/%d %H:%M:%S')
        dates.append(dt)
        
        # Parse water level
        try:
            wl = float(wl_str)
            # Check for missing values (-99.9999)
            if wl == -99.9999 or abs(wl) > 100:  # Also filter unrealistic values
                wl = np.nan
        except ValueError:
            wl = np.nan
        
        water_levels.append(wl)
        use_flags.append(int(use_flag))

# Convert to numpy arrays
dates = np.array(dates)
water_levels = np.array(water_levels)

# Filter to use only data with use_flag == 1
use_mask = np.array(use_flags) == 1
dates_clean = dates[use_mask]
water_levels_clean = water_levels[use_mask]

# Filter to analysis period: 1986-2023
start_date = datetime(1986, 1, 1, 0, 0, 0)
end_date = datetime(2023, 12, 31, 23, 59, 59)
mask_period = (dates_clean >= start_date) & (dates_clean <= end_date)
dates_period = dates_clean[mask_period]
water_levels_period = water_levels_clean[mask_period]

# Create 10-minute interval time series for the period
TSP = pd.date_range(start=start_date, end=end_date, freq='10min').to_pydatetime()

# Interpolate water levels to 10-minute intervals using pandas
# Create Series with original timestamps
wl_series = pd.Series(water_levels_period, index=pd.DatetimeIndex(dates_period))
# Reindex to 10-minute intervals and interpolate
wl_series_10min = wl_series.reindex(pd.DatetimeIndex(TSP), method='nearest', 
                                     tolerance=pd.Timedelta(minutes=15))
WLP = wl_series_10min.values

print(f"Loaded Sheerness tide gauge (GESLA4):")
print(f"  Original data points: {len(dates):,}")
print(f"  Data in period 1986-2023: {len(dates_period):,}")
print(f"  Interpolated to 10-minute intervals: {len(TSP):,} points")
print(f"  Start: {TSP[0]}")
print(f"  End: {TSP[-1]}")
print(f"  Valid data: {np.sum(~np.isnan(WLP)):,} points ({100*np.sum(~np.isnan(WLP))/len(WLP):.1f}%)")

## Save Results (Part 2)

Save the resampled time series to a pickle file for use in subsequent analyses.


In [None]:
#| label: save-master2-thames
#| include: true

# Save data
output_file = os.path.join(output_dir, 'mast2_thames.pkl')
with open(output_file, 'wb') as f:
    pickle.dump({
        'TSP': TSP,
        'WLP': WLP,
        'TSCHECK': TSP  # Reference time series
    }, f)
print(f"\nData saved to {output_file}")

## Visualization (Part 2)

This figure shows the Sheerness tide gauge water level time series from 1986 to 2023 at 10-minute intervals. It shows tidal variations, storm surges, and long-term water level patterns over the period.


In [None]:
#| label: plot-time-series-thames
#| fig-cap: Sheerness tide gauge water level time series for the Thames, 1986-2023. Data at 10-minute intervals.

fig, ax = plt.subplots(figsize=(14, 6))
ax.plot(TSP, WLP, 'b', linewidth=0.5)
ax.set_xlabel('Date', fontweight='bold', fontsize=20)
ax.set_ylabel('Water level (m ACD)', fontweight='bold', fontsize=20)
ax.grid(True, alpha=0.3)
ax.tick_params(labelsize=16)

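plt.tight_layout()

# Save before plt.show(): some backends release the figure on show,
# so a later savefig would write an empty image
fig_file = os.path.join(output_dir, 'master2_thames_tide_gauge.png')
fig.savefig(fig_file, dpi=150, bbox_inches='tight')
print(f"\nFigure saved to {fig_file}")
plt.show()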

---

# Tidal Analysis

This section performs harmonic tidal analysis on tide gauge data from the Thames. The analysis:

1. Calculates data quality for each year (1986-2023)
2. Performs harmonic tidal decomposition using `pytides` (replacing MATLAB's `t_tide`)
3. Generates tidal predictions for all years
4. Compares observed water levels with predicted astronomical tides

The predicted tides represent only the astronomical component, while observed water levels include both tides and meteorological effects (storm surges, wind setup, etc.).
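
To make the split between astronomical tide and residual concrete, the sketch below fits two semidiurnal constituents to synthetic data. It uses an ordinary least-squares fit with NumPy, not `pytides`; the constituent set (M2 and S2 only), the amplitudes, the phases, and the noise level are illustrative assumptions.

```python
import numpy as np

# Angular speeds of two major semidiurnal constituents, in radians per hour
OMEGA = {"M2": np.deg2rad(28.9841042), "S2": np.deg2rad(30.0)}

def design_matrix(t_hours):
    """Columns: a constant, then a cos/sin pair for each constituent."""
    cols = [np.ones_like(t_hours)]
    for w in OMEGA.values():
        cols += [np.cos(w * t_hours), np.sin(w * t_hours)]
    return np.column_stack(cols)

# Synthetic "observations": 30 days at 10-minute spacing (made-up values)
rng = np.random.default_rng(0)
t = np.arange(30 * 144) / 6.0
tide_true = 2.9 + 2.0 * np.cos(OMEGA["M2"] * t - 0.4) + 0.6 * np.cos(OMEGA["S2"] * t - 1.1)
observed = tide_true + rng.normal(0.0, 0.05, t.size)

# Least-squares fit, then rebuild the tidal part and the non-tidal residual
coef, *_ = np.linalg.lstsq(design_matrix(t), observed, rcond=None)
predicted = design_matrix(t) @ coef
residual = observed - predicted

# Amplitude of each constituent from its cos/sin coefficients
amplitudes = {name: np.hypot(coef[1 + 2 * i], coef[2 + 2 * i]) for i, name in enumerate(OMEGA)}
print(amplitudes)
```

A full harmonic analysis fits many more constituents and applies nodal corrections; the point here is only that the prediction is a sum of sinusoids at fixed astronomical frequencies, and everything left over is the residual.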

## Setup and Configuration


In [None]:
#| label: setup-master3-thames
#| include: true

import numpy as np
import pandas as pd
import pickle
import matplotlib.pyplot as plt
from datetime import datetime
from pytides.tide import Tide
import os

# Configuration
Y = list(range(1986, 2024))  # Years 1986 to 2023
th = 60  # Data quality threshold (percentage)
lat = 51.5  # Approximate latitude of the Thames Barrier (printed for reference; not passed to pytides below)
output_dir = 'output'

# Create output directory if it doesn't exist
os.makedirs(output_dir, exist_ok=True)

print(f"Analysis configuration:")
print(f"  Years: {Y[0]} to {Y[-1]} ({len(Y)} years)")
print(f"  Data quality threshold: {th}%")
print(f"  Latitude: {lat}°")

## Load Tide Gauge Data


In [None]:
#| label: load-data-master3-thames
#| include: true

print("Loading tide gauge data...")
mast2_file = os.path.join(output_dir, 'mast2_thames.pkl')
if not os.path.exists(mast2_file):
    raise FileNotFoundError(
        f"Required file {mast2_file} not found. "
        "Please run Part 2 (Tide Gauge Data Processing) first to generate the required data file."
    )
with open(mast2_file, 'rb') as f:
    data = pickle.load(f)
    TSP = data['TSP']
    WLP = data['WLP']

# Convert to numpy arrays if needed
TSP = np.array(TSP)
WLP = np.array(WLP)

print(f"Loaded {len(TSP):,} data points")
print(f"  Start: {TSP[0]}")
print(f"  End: {TSP[-1]}")
print(f"  Valid data: {np.sum(~np.isnan(WLP)):,} points ({100*np.sum(~np.isnan(WLP))/len(WLP):.1f}%)")

## Calculate Data Quality Per Year
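
Data quality here means the percentage of non-missing samples in each calendar year. A minimal pandas sketch of that idea, on a hypothetical two-day series that straddles a year boundary:

```python
import numpy as np
import pandas as pd

# Hypothetical 10-minute series spanning two calendar years, with a gap
index = pd.date_range("2000-12-31 00:00", "2001-01-01 23:50", freq="10min")
levels = pd.Series(1.0, index=index)
levels.iloc[:36] = np.nan  # first 6 hours of 2000-12-31 are missing

# notna() is True/False per sample; its yearly mean is the valid fraction
quality = levels.notna().groupby(levels.index.year).mean() * 100
print(quality.round(1))
```

The percentage is relative to the number of grid points in the year, so it assumes the series is already on a regular grid.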


In [None]:
#| label: data-quality-thames
#| include: true

print("\nCalculating data quality per year...")
DQ = []

for y in Y:
    # Find data points in this calendar year (Jan 1 to Jan 1 next year)
    start_date = datetime(y, 1, 1, 0, 0, 0)
    end_date = datetime(y + 1, 1, 1, 0, 0, 0)
    mask = (TSP >= start_date) & (TSP < end_date)
    j = np.where(mask)[0]
    
    if len(j) == 0:
        quality = 0
    else:
        # Count non-NaN values
        k = np.where(~np.isnan(WLP[j]))[0]
        quality = (len(k) / len(j)) * 100
    
    # Initialize with 3 columns: [year, quality, reference_year]
    # reference_year will be filled in during tidal analysis
    DQ.append([y, quality, np.nan])

DQ = np.array(DQ)

# Display summary statistics
print(f"\nData quality summary:")
print(f"  Range: {np.min(DQ[:, 1]):.1f}% to {np.max(DQ[:, 1]):.1f}%")
print(f"  Mean: {np.mean(DQ[:, 1]):.1f}%")
print(f"  Years with quality >= {th}%: {np.sum(DQ[:, 1] >= th)}/{len(Y)}")

## Tidal Analysis and Prediction

For each target year, the analysis determines a reference year, extracts data, performs harmonic decomposition, and generates predictions.
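
The reference-year rule can be isolated into a small function. This is a sketch of the selection logic only, with a hypothetical function name and made-up quality percentages; it assumes that ties in distance resolve to the earlier year.

```python
import numpy as np

def pick_reference_year(years, quality, target, threshold=60.0):
    """Return the year closest to `target` whose quality meets `threshold`.

    Falls back to the best-quality year when no year qualifies.
    Ties in distance resolve to the earlier year (first minimum).
    """
    years = np.asarray(years)
    quality = np.asarray(quality, dtype=float)
    good = quality >= threshold
    if not good.any():
        return int(years[np.argmax(quality)])
    candidates = years[good]
    return int(candidates[np.argmin(np.abs(candidates - target))])

# Made-up quality percentages, for illustration only
years = [1986, 1987, 1988, 1989]
quality = [95.0, 40.0, 55.0, 80.0]
print(pick_reference_year(years, quality, 1987))  # 1986
print(pick_reference_year(years, quality, 1988))  # 1989
```

A target year that already meets the threshold has distance zero to itself, so it is returned unchanged without a separate branch.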


In [None]:
#| label: tidal-analysis-thames
#| include: true

print("\nPerforming tidal analysis and prediction...")
TSP2 = []
TIP = []

for co, y in enumerate(Y):
    print(f"  Processing year {y} ({co+1}/{len(Y)})...")
    
    # Determine reference year
    if DQ[co, 1] >= th:
        # Use target year as reference
        yr = y
        DQ[co, 2] = yr
    else:
        # Find nearest year with quality >= threshold
        # Calculate distances from target year
        distances = np.abs(DQ[:, 0] - y)
        
        # Filter to only years with quality >= threshold
        good_mask = DQ[:, 1] >= th
        
        if np.any(good_mask):
            # Get distances for good years only
            good_distances = distances[good_mask]
            good_indices = np.where(good_mask)[0]
            
            # Find nearest good year
            nearest_idx = good_indices[np.argmin(good_distances)]
            yr = int(DQ[nearest_idx, 0])
        else:
            # Fallback: use year with best quality
            best_idx = np.argmax(DQ[:, 1])
            yr = int(DQ[best_idx, 0])
        
        DQ[co, 2] = yr
    
    # Extract reference year data (1 year + 1 day for analysis)
    ref_start = datetime(yr, 1, 1, 0, 0, 0)
    ref_end = datetime(yr + 1, 1, 2, 0, 0, 0)  # +1 day
    
    mask_ref = (TSP >= ref_start) & (TSP < ref_end)
    j_ref = np.where(mask_ref)[0]
    
    if len(j_ref) == 0:
        print(f"    Warning: No data found for reference year {yr}")
        continue
    
    # Get water levels and times for reference year
    WLP_ref = WLP[j_ref]
    TSP_ref = TSP[j_ref]
    
    # Remove NaN values for pytides analysis
    valid_mask = ~np.isnan(WLP_ref)
    WLP_clean = WLP_ref[valid_mask]
    TSP_clean = TSP_ref[valid_mask]
    
    if len(WLP_clean) < 1000:  # Need sufficient data for analysis
        print(f"    Warning: Insufficient data for reference year {yr} ({len(WLP_clean)} points)")
        continue
    
    # Perform tidal analysis using pytides
    try:
        tide_model = Tide.decompose(
            heights=WLP_clean,
            t=TSP_clean
        )
        print(f"    Analyzed {len(WLP_clean):,} points from reference year {yr}")
    except Exception as e:
        print(f"    Error in tidal analysis for year {y}: {e}")
        continue
    
    # Generate prediction timestamps for target year (10-minute intervals)
    pred_start = datetime(y, 1, 1, 0, 0, 0)
    pred_end = datetime(y, 12, 31, 23, 50, 0)
    tsp2 = pd.date_range(start=pred_start, end=pred_end, freq='10min').to_pydatetime()
    
    # Predict tides
    try:
        tip = tide_model.at(tsp2)
        TIP.extend(tip)
        TSP2.extend(tsp2)
        print(f"    Predicted {len(tip):,} points for year {y}")
    except Exception as e:
        print(f"    Error in prediction for year {y}: {e}")
        continue

# Convert to numpy arrays
TSP2 = np.array(TSP2)
TIP = np.array(TIP)

print(f"\nTotal predictions: {len(TIP):,} points")
if len(TIP) > 0:
    print(f"  Start: {TSP2[0]}")
    print(f"  End: {TSP2[-1]}")

## Save Results (Part 3)


In [None]:
#| label: save-master3-thames
#| include: true

output_file = os.path.join(output_dir, 'mast3_thames.pkl')
with open(output_file, 'wb') as f:
    pickle.dump({
        'TSP': TSP,
        'WLP': WLP,
        'TIP': TIP,
        'TSP2': TSP2,
        'DQ': DQ
    }, f)
print(f"Data saved to {output_file}")

## Visualization (Part 3)

This figure compares observed water levels (blue) with predicted astronomical tides (red) from harmonic analysis. The predicted tides represent the astronomical component only, while observed water levels include both tides and meteorological effects.


In [None]:
#| label: plot-comparison-thames
#| fig-cap: Comparison of observed water levels (blue) and predicted astronomical tides (red) for the Thames, 1986-2023

fig, ax = plt.subplots(figsize=(14, 6))
ax.plot(TSP2, TIP, 'r', linewidth=0.5, label='Predicted tide', alpha=0.7)
ax.plot(TSP, WLP, 'b', linewidth=0.5, label='Observed water level', alpha=0.7)
ax.set_xlabel('Date', fontweight='bold', fontsize=16)
ax.set_ylabel('Level (m ACD)', fontweight='bold', fontsize=16)
ax.legend(fontsize=14)
ax.grid(True, alpha=0.3)
ax.tick_params(labelsize=14)

plt.tight_layout()

# Save before plt.show(): some backends release the figure on show,
# so a later savefig would write an empty image
fig_file = os.path.join(output_dir, 'master3_thames_tidal_analysis.png')
fig.savefig(fig_file, dpi=150, bbox_inches='tight')
print(f"\nFigure saved to {fig_file}")
plt.show()

---

# Part 4: Predicted High Waters and Barrier Closures

## Introduction

This section plots time-series of predicted high waters and barrier closures for the Thames. The analysis:

1. Loads barrier closure dates and tide gauge data from previous analyses
2. Creates a visualization showing:
   - Barrier closure dates as vertical dashed lines
   - Predicted astronomical tides (TIP)
   - Observed water levels (WLP)
   - The difference between observed and predicted levels (surge component)

This visualization helps identify when barrier closures occurred relative to predicted high waters and actual water levels.
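
The workflow in this part plots the full predicted series rather than extracting individual high waters. If discrete high-water times are wanted, one simple option is to take local maxima of the predicted tide. The sketch below does this on a synthetic M2-only signal; the helper name and the amplitude are hypothetical.

```python
import numpy as np

def local_maxima(values):
    """Indices of interior samples higher than the previous and not lower than the next."""
    v = np.asarray(values, dtype=float)
    is_peak = (v[1:-1] > v[:-2]) & (v[1:-1] >= v[2:])
    return np.flatnonzero(is_peak) + 1

# Synthetic M2-only tide: 3 days at 10-minute spacing (made-up amplitude)
t_hours = np.arange(3 * 144) / 6.0
tide = 2.0 * np.cos(2 * np.pi * t_hours / 12.4206012)

hw_idx = local_maxima(tide)
hw_times = t_hours[hw_idx]
print(hw_times)
```

This works on a smooth predicted tide. Observed series with noise or missing values would need smoothing or a minimum peak separation before the same test is reliable.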

## Setup and Configuration


In [None]:
#| label: setup-master4-thames
#| include: true

import numpy as np
import pandas as pd
import pickle
import matplotlib.pyplot as plt
from datetime import datetime
import os

# Output directory
output_dir = 'output'
os.makedirs(output_dir, exist_ok=True)

print(f"Analysis configuration:")
print(f"  Output directory: {output_dir}")

## Load Data


In [None]:
#| label: load-data-master4-thames
#| include: true

# Load barrier closure data from master1
print("Loading barrier closure data...")
mast1_file = os.path.join(output_dir, 'mast1_thames.pkl')
if not os.path.exists(mast1_file):
    raise FileNotFoundError(
        f"Required file {mast1_file} not found. "
        "Please run Part 1 (Barrier Closure Analysis) first to generate the required data file."
    )
with open(mast1_file, 'rb') as f:
    data1 = pickle.load(f)
    OCD = data1['OCD']  # Observed closure dates

# Load tide gauge and tidal prediction data from master3
print("Loading tide gauge and tidal prediction data...")
mast3_file = os.path.join(output_dir, 'mast3_thames.pkl')
if not os.path.exists(mast3_file):
    raise FileNotFoundError(
        f"Required file {mast3_file} not found. "
        "Please run Part 1, Part 2, and Part 3 first to generate the required data files."
    )
with open(mast3_file, 'rb') as f:
    data3 = pickle.load(f)
    TSP = data3['TSP']  # Time series for observed water levels
    TSP2 = data3['TSP2']  # Time series for predictions
    TIP = data3['TIP']  # Predicted tides
    WLP = data3['WLP']  # Observed water levels

# Convert to numpy arrays if needed
OCD = np.array(OCD)
TSP = np.array(TSP)
TSP2 = np.array(TSP2)
TIP = np.array(TIP)
WLP = np.array(WLP)

print(f"\nData loaded:")
print(f"  Closure dates: {len(OCD)} closures")
if len(OCD) > 0:
    print(f"    First closure: {OCD[0]}")
    print(f"    Last closure: {OCD[-1]}")
print(f"  Time series points (observed): {len(TSP):,}")
print(f"  Time series points (predictions): {len(TSP2):,}")
print(f"  Predicted tides: {len(TIP):,}")
print(f"  Observed water levels: {len(WLP):,}")

## Figure - Predicted High Waters and Closures

This figure shows predicted astronomical tides (red), observed water levels (blue), the difference between them (green), and barrier closure dates (vertical dashed magenta lines). The difference (WLP-TIP) represents the non-tidal component, primarily storm surge.
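
The residual calculation reduces to subtracting two time-indexed series. A minimal sketch with pandas, using made-up values on slightly offset time axes to show how alignment and missing data are handled:

```python
import numpy as np
import pandas as pd

# Made-up series on slightly different time axes
idx_obs = pd.date_range("2000-01-01 00:00", periods=6, freq="10min")
idx_pred = pd.date_range("2000-01-01 00:10", periods=6, freq="10min")
observed = pd.Series([1.0, 1.4, np.nan, 2.3, 2.6, 2.8], index=idx_obs)
predicted = pd.Series([1.2, 1.5, 1.9, 2.1, 2.2, 2.3], index=idx_pred)

# Subtraction aligns on the index; unmatched or missing samples become NaN
surge = (observed - predicted).dropna()
print(surge)
print("Peak residual:", surge.max(), "at", surge.idxmax())
```

Because pandas aligns on timestamps, no explicit overlap mask is needed when both series share the same grid spacing.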


In [None]:
#| label: plot-predicted-high-waters-thames
#| fig-cap: Predicted high waters, observed water levels, and barrier closures for the Thames

fig, ax = plt.subplots(figsize=(14, 8))

# Plot vertical lines for barrier closure dates
for i in range(len(OCD)):
    ax.axvline(OCD[i], color='m', linestyle='--', linewidth=2, alpha=0.7)

# Check if we have predictions
if len(TIP) > 0 and len(TSP2) > 0:
    # Plot predicted tides (TIP) - use TSP2 for time axis
    ax.plot(TSP2, TIP, 'r', linewidth=1, label='Predicted tide (TIP)', alpha=0.8)
    
    # For observed water levels, plot the full time series
    ax.plot(TSP, WLP, 'b', linewidth=1, label='Observed water level (WLP)', alpha=0.8)
    
    # For surge calculation, interpolate TIP to match TSP timestamps using pandas
    # Find overlapping time period
    mask_overlap = (TSP >= TSP2[0]) & (TSP <= TSP2[-1])
    if np.any(mask_overlap):
        TSP_overlap = TSP[mask_overlap]
        WLP_overlap = WLP[mask_overlap]
        # Use pandas for interpolation
        df_tip = pd.Series(TIP, index=pd.DatetimeIndex(TSP2))
        # Reindex to overlap timestamps and interpolate
        tip_interp = df_tip.reindex(pd.DatetimeIndex(TSP_overlap), method='nearest', 
                                    tolerance=pd.Timedelta(minutes=15))
        # Calculate surge where we have both values
        valid_mask = tip_interp.notna().to_numpy()  # plain boolean array for NumPy indexing
        if np.any(valid_mask):
            surge = WLP_overlap[valid_mask] - tip_interp.values[valid_mask]
            ax.plot(TSP_overlap[valid_mask], surge, 'g', 
                   linewidth=1, label='Surge (WLP - TIP)', alpha=0.8)
else:
    # If no predictions, just plot observed water levels
    print("Warning: No tidal predictions available. Plotting only observed water levels.")
    ax.plot(TSP, WLP, 'b', linewidth=1, label='Observed water level (WLP)', alpha=0.8)

# Formatting
ax.grid(True, alpha=0.3)
ax.set_xlabel('Date', fontweight='bold', fontsize=16)
ax.set_ylabel('Level (m)', fontweight='bold', fontsize=16)
ax.set_ylim(-3, 4)
ax.legend(fontsize=14, loc='best')
ax.tick_params(labelsize=14)

plt.tight_layout()

# Save before plt.show(): some backends release the figure on show,
# so a later savefig would write an empty image
fig_file = os.path.join(output_dir, 'master4_thames_predicted_high_waters.png')
fig.savefig(fig_file, dpi=150, bbox_inches='tight')
print(f"\nFigure saved to {fig_file}")
plt.show()

---

# Summary

This complete analysis workflow has processed data from 1986 to 2024 for the Thames barrier system. Key results from each part:

## Part 1: Barrier Closure Analysis
- **Total closures**: Recorded across water years 1986/87 to 2024/25
- **Annual statistics**: Minimum, mean, and maximum closures per water year
- **Temporal patterns**: Variability in closure frequency over time

## Part 2: Tide Gauge Data Processing
- **Resampled time series**: Sheerness, 1986-2023 at 10-minute intervals
- **Data source**: Sheerness tide gauge record from the GESLA4 dataset
- **Data quality**: Percentage of valid data points across the time series

## Part 3: Tidal Analysis
- **Data quality**: Calculated for each year to determine suitable reference years
- **Harmonic analysis**: Performed using `pytides` (used here in place of MATLAB's `t_tide`)
- **Predictions**: Generated at 10-minute intervals for the entire period
- **Surge calculation**: The difference between observed and predicted water levels represents the non-tidal component (surge)

## Part 4: Visualization
- **Closure timing**: When closures occurred relative to predicted high waters
- **Surge contribution**: The non-tidal component shows how much storm surge contributed to water levels
- **Tidal vs. meteorological effects**: The difference between observed and predicted levels highlights meteorological forcing

The complete workflow provides a foundation for understanding barrier closure patterns, water level variability, and storm surge dynamics in the Thames estuary. The results can be used for further analysis of individual closure events, surge characteristics, and long-term trends.