# Reduce Pollution

This notebook outlines the data sources and processing workflow behind the figures on the [Reduce Pollution](https://oceancentral.org/track/reduce-pollution) page of the Ocean Central website.

# Plastic Pollution

## Figure 1

<p align="center">
  <img src="Figs/reduce_pollution_plastic_1.png" style="width:50%;">
</p>

Data were obtained from OECD (2022) – processed by Our World in Data and can be found [here](https://ourworldindata.org/grapher/plastic-waste-accumulated-in-oceans).

## Figure 2

<p align="center">
  <img src="Figs/reduce_pollution_plastic_2.png" style="width:50%;">
</p>

Data were obtained from Meijer et al. (2021) – processed by Our World in Data and can be found [here](https://ourworldindata.org/plastic-pollution#explore-data-on-plastic-pollution).

## Figure 3

<p align="center">
  <img src="Figs/reduce_pollution_plastic_3.png" style="width:50%;">
</p>

Data were obtained from OECD (2022) – processed by Our World in Data and can be found [here](https://ourworldindata.org/grapher/plastic-production-by-sector).

## Figures 4 and 10

<p align="center">
  <img src="Figs/reduce_pollution_plastic_4.png" style="width:50%;">
</p>

<p align="center">
  <img src="Figs/reduce_pollution_plastic_10.png" style="width:50%;">
</p>

Data were obtained from the United Nations Comtrade Database (2023, 2025) – with minor processing by Our World in Data and can be found [here](https://ourworldindata.org/plastic-pollution#explore-data-on-plastic-pollution).

## Figures 5 and 11

<p align="center">
  <img src="Figs/reduce_pollution_plastic_5.png" style="width:50%;">
</p>

<p align="center">
  <img src="Figs/reduce_pollution_plastic_11.png" style="width:50%;">
</p>

Data were obtained from the United Nations Comtrade Database (2023, 2025) – with minor processing by Our World in Data and can be found [here](https://ourworldindata.org/plastic-pollution#explore-data-on-plastic-pollution).

## Figure 6

<p align="center">
  <img src="Figs/reduce_pollution_plastic_6.png" style="width:50%;">
</p>

Data were obtained from OECD (2022) – processed by Our World in Data and can be found [here](https://ourworldindata.org/plastic-waste-trade).

## Figure 7

<p align="center">
  <img src="Figs/reduce_pollution_plastic_7.png" style="width:50%;">
</p>

Data were obtained from OECD (2022) – processed by Our World in Data and can be found [here](https://ourworldindata.org/how-much-plastic-waste-ends-up-in-the-ocean).

## Figure 8

<p align="center">
  <img src="Figs/reduce_pollution_plastic_8.png" style="width:50%;">
</p>

Data were obtained from Meijer et al. (2021), "More than 1000 rivers account for 80% of global riverine plastic emissions into the ocean", *Science Advances* – processed by Our World in Data and can be found [here](https://ourworldindata.org/grapher/plastics-top-rivers?overlay=sources).

## Figure 9

<p align="center">
  <img src="Figs/reduce_pollution_plastic_9.png" style="width:50%;">
</p>

Data on plastics in the Great Pacific Garbage Patch were obtained from Lebreton et al. (2022) – processed by Our World in Data and can be found [here](https://ourworldindata.org/plastic-great-pacific-garbage).

## Figure 12

<p align="center">
  <img src="Figs/reduce_pollution_plastic_12.png" style="width:50%;">
</p>

Data were obtained from [Lebreton, Laurent; Reisser, Julia (2018). Supplementary data for 'River plastic emissions to the world's oceans'. figshare. Dataset. https://doi.org/10.6084/m9.figshare.4725541.v6](https://figshare.com/articles/dataset/River_plastic_emissions_to_the_world_s_oceans/4725541).

In [None]:
#!/usr/bin/env python3
"""
Subset The Ocean Cleanup / Figshare 'River plastic emissions to the world's oceans'
dataset to the top 1000 river emitters and save as GeoJSON.

Assumes the shapefile components
  PlasticRiverInputs.shp, .shx, .dbf, .prj, .qpj
are in ../Data/PlasticRiverInputs/ (see INPUT_SHP).
"""

from pathlib import Path
import geopandas as gpd

# ---- CONFIG ----
INPUT_SHP = Path("../Data/PlasticRiverInputs/PlasticRiverInputs.shp")
OUTPUT_GEOJSON = Path("Fig_12_plastic.geojson")
EMISSION_COL = "i_mid"  # mid estimate of plastic input

def main():
    # 1. Load shapefile
    if not INPUT_SHP.exists():
        raise FileNotFoundError(f"Shapefile not found: {INPUT_SHP}")

    gdf = gpd.read_file(INPUT_SHP)
    print(f"Loaded {len(gdf)} river points")
    print("Columns:", list(gdf.columns))

    # 2. Ensure emission column exists and is numeric
    if EMISSION_COL not in gdf.columns:
        raise ValueError(
            f"Emission column '{EMISSION_COL}' not found. "
            f"Available columns: {list(gdf.columns)}"
        )

    gdf[EMISSION_COL] = gdf[EMISSION_COL].astype(float)

    # 3. Drop rows with missing emission values (just in case)
    gdf = gdf[gdf[EMISSION_COL].notna()]

    # 4. Sort by emission and take top 1000
    gdf_sorted = gdf.sort_values(EMISSION_COL, ascending=False)
    top_n = min(1000, len(gdf_sorted))
    top1000 = gdf_sorted.head(top_n).reset_index(drop=True)

    print(f"Selected top {top_n} rivers by '{EMISSION_COL}'")

    # 5. Make sure the output is in WGS84 (EPSG:4326); the source file is expected to be EPSG:4326 already
    if top1000.crs is None:
        # If the CRS is missing, assume WGS84 and set it explicitly
        top1000 = top1000.set_crs(epsg=4326)
    else:
        top1000 = top1000.to_crs(epsg=4326)

    # 6. Save to GeoJSON
    top1000.to_file(OUTPUT_GEOJSON, driver="GeoJSON")
    print(f"Saved GeoJSON to: {OUTPUT_GEOJSON.resolve()}")

if __name__ == "__main__":
    main()
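Meijer et al. (2021) frame their result as the share of total riverine emissions carried by the top rivers. A quick way to check what share of the total modelled input the selected rivers carry is sketched below with pandas. The helper name `top_n_share` is hypothetical, and the toy values stand in for the `i_mid` column; they are not real data.

```python
import pandas as pd

def top_n_share(emissions: pd.Series, n: int = 1000) -> float:
    """Fraction of the total emission carried by the n largest values (NaNs ignored)."""
    values = pd.to_numeric(emissions, errors="coerce").dropna()
    if values.empty or values.sum() == 0:
        raise ValueError("No emission values to summarise.")
    return float(values.nlargest(n).sum() / values.sum())

# Toy example: 5 hypothetical rivers, keep the top 2
toy = pd.Series([50.0, 30.0, 10.0, 7.0, 3.0], name="i_mid")
print(top_n_share(toy, n=2))  # 0.8
```

Applied to the loaded shapefile, the call would be `top_n_share(gdf["i_mid"], n=1000)`.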


# Nutrient Pollution

## Figure 1

In [None]:
import xarray as xr
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Load dataset
ds = xr.open_dataset("../Data/cmems_mod_glo_bgc-nut_anfc_0.25deg_P1M-m_1759960580853.nc") # Data from https://data.marine.copernicus.eu/product/GLOBAL_ANALYSISFORECAST_BGC_001_028/services

def area_weighted_mean(var):
    """Global cos(latitude)-weighted mean of a variable for each time step, averaged over depth if present."""
    weights = np.cos(np.deg2rad(var["latitude"]))
    # .weighted() divides by the weights of non-NaN (ocean) cells only, so land cells do not bias the mean low
    mean = var.weighted(weights).mean(dim=["latitude", "longitude"])
    if "depth" in mean.dims:
        mean = mean.mean(dim="depth")
    return mean

# Compute weighted means
po4_weighted = area_weighted_mean(ds["po4"])
no3_weighted = area_weighted_mean(ds["no3"])

# Drop the first time step
po4_weighted = po4_weighted.isel(time=slice(1, None))
no3_weighted = no3_weighted.isel(time=slice(1, None))

# Combine into a DataFrame
df = pd.DataFrame({
    "time": po4_weighted["time"].values,
    "po4": po4_weighted.values,
    "no3": no3_weighted.values
})

# Save to CSV
df.to_csv("Fig_1_nutrients.csv", index=False)

# Plot
plt.figure(figsize=(10, 6))
plt.plot(df["time"], df["po4"], label="PO4")
plt.plot(df["time"], df["no3"], label="NO3")
plt.title("Global Area-Weighted PO4 and NO3 Concentrations Over Time")
plt.ylabel(f"Concentration ({ds['po4'].attrs.get('units', 'units not given')})")
plt.xlabel("Time")
plt.legend()
plt.grid()
plt.show()
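Land cells are NaN in the Copernicus fields, so a cos(latitude)-weighted mean should divide only by the weights of the ocean cells. Otherwise land pulls the global mean toward zero. The NumPy sketch below shows the idea on a made-up 3 x 2 grid; the function name `masked_cos_lat_mean` is hypothetical. xarray's `DataArray.weighted(...).mean(...)` applies the same NaN-aware normalisation.

```python
import numpy as np

def masked_cos_lat_mean(field: np.ndarray, lat_deg: np.ndarray) -> float:
    """cos(latitude)-weighted mean of a (lat, lon) field, ignoring NaN (land) cells."""
    w = np.cos(np.deg2rad(lat_deg))[:, None] * np.ones_like(field)
    valid = np.isfinite(field)
    # Only valid cells contribute to both the weighted sum and the sum of weights
    return float(np.sum(np.where(valid, field * w, 0.0)) / np.sum(np.where(valid, w, 0.0)))

lat = np.array([-60.0, 0.0, 60.0])
field = np.array([[1.0, np.nan],
                  [2.0, 2.0],
                  [np.nan, 4.0]])
print(round(masked_cos_lat_mean(field, lat), 4))  # 2.1667
```

Dividing by the weights of every cell instead (including the two NaN cells) would give 1.625 here, which illustrates the low bias.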


## Figure 2

<p align="center">
  <img src="Figs/reduce_pollution_nutrients_2.png" style="width:50%;">
</p>

In [None]:
import json
import numpy as np
import xarray as xr
import rioxarray  # pip install rioxarray rasterio

# ---------------------------
# INPUT: open your dataset
# ---------------------------
ds = xr.open_dataset("../Data/cmems_mod_glo_bgc-nut_anfc_0.25deg_P1M-m_1759960580853.nc") # Data from https://data.marine.copernicus.eu/product/GLOBAL_ANALYSISFORECAST_BGC_001_028/services

# ---------------------------
# CONFIG
# ---------------------------
LOW_Q, HIGH_Q = 0.02, 0.98   # robust percentiles for 8-bit scaling
OUT_PO4 = "../Data/Fig_2_nutrients_phosphorus.tif"
OUT_NO3 = "../Data/Fig_2_nutrients_nitrogen.tif"
SIDE_CAR_JSON = "../Data/nutrient_scaling_metadata.json"

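# ---------------------------
# SELECT VARIABLES & HANDLE DEPTH
# Keep only the first depth level so each output raster is 2-D
# (assumes the surface layer is the one to map)
# ---------------------------
po4_spatial = ds["po4"]
no3_spatial = ds["no3"]
if "depth" in po4_spatial.dims:
    po4_spatial = po4_spatial.isel(depth=0)
if "depth" in no3_spatial.dims:
    no3_spatial = no3_spatial.isel(depth=0)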

# ---------------------------
# TIME MEAN
# ---------------------------
po4_mean = po4_spatial.mean(dim="time", skipna=True)
no3_mean = no3_spatial.mean(dim="time", skipna=True)

# ---------------------------
# ORIENTATION + GEOSPATIAL SETUP
# Ensure latitude is descending (north -> south)
# ---------------------------
def ensure_lat_desc(da):
    if "latitude" not in da.dims or "longitude" not in da.dims:
        raise ValueError("Expected dims 'latitude' and 'longitude' in the DataArray.")
    lats = da["latitude"].values
    if np.all(np.diff(lats) > 0):
        da = da.sortby("latitude", ascending=False)
    return da

po4_mean = ensure_lat_desc(po4_mean)
no3_mean = ensure_lat_desc(no3_mean)

# rioxarray: attach spatial dims and CRS (WGS84 / EPSG:4326)
def prepare(da):
    # NOTE: rioxarray expects x_dim / y_dim, not x / y
    da = da.rio.set_spatial_dims(x_dim="longitude", y_dim="latitude", inplace=False)
    da = da.rio.write_crs(4326, inplace=False)
    return da

po4_mean = prepare(po4_mean)
no3_mean = prepare(no3_mean)

# ---------------------------
# 8-BIT NORMALIZATION
# Map robust range (LOW_Q–HIGH_Q) -> 0..255, reserve 0 as nodata
# ---------------------------
def normalize_to_8bit(da, low_q=0.02, high_q=0.98):
    # Compute percentiles ignoring NaNs
    vmin = float(da.quantile(low_q).values)
    vmax = float(da.quantile(high_q).values)
    if not np.isfinite(vmin) or not np.isfinite(vmax) or vmax <= vmin:
        finite_vals = da.values[np.isfinite(da.values)]
        if finite_vals.size == 0:
            raise ValueError("All values are NaN; cannot normalize.")
        vmin, vmax = float(np.nanmin(finite_vals)), float(np.nanmax(finite_vals))
        if vmax <= vmin:
            vmax = vmin + 1.0

    clipped = da.clip(min=vmin, max=vmax)
    scaled = (clipped - vmin) / (vmax - vmin) * 255.0

    # Keep NaNs for now; we'll fill to nodata after handling 0s
    # Pixels that legitimately map to 0 (i.e., exactly vmin) will be nudged to 1
    scaled_uint8 = scaled.round().astype("float32")  # temporary float for masking
    finite_mask = np.isfinite(scaled_uint8)

    # Nudge valid zeros to 1 so 0 can be nodata
    zeros_mask = (scaled_uint8 == 0) & finite_mask
    scaled_uint8 = scaled_uint8.where(~zeros_mask, 1)

    # Now fill NaNs to 0 and cast to uint8
    scaled_uint8 = scaled_uint8.fillna(0).astype("uint8")

    return scaled_uint8, vmin, vmax

po4_8bit, po4_vmin, po4_vmax = normalize_to_8bit(po4_mean, LOW_Q, HIGH_Q)
no3_8bit, no3_vmin, no3_vmax = normalize_to_8bit(no3_mean, LOW_Q, HIGH_Q)

# Re-attach CRS/dims after astype
po4_8bit = prepare(po4_8bit)
no3_8bit = prepare(no3_8bit)

# Write explicit nodata = 0 (Mapbox tilers treat 0 as transparent/background)
po4_8bit = po4_8bit.rio.write_nodata(0, inplace=False)
no3_8bit = no3_8bit.rio.write_nodata(0, inplace=False)

# ---------------------------
# WRITE GEOTIFFS (Mapbox-friendly)
# ---------------------------
po4_8bit.rio.to_raster(
    OUT_PO4,
    compress="DEFLATE",
    tiled=True,
    BIGTIFF="IF_SAFER",
)

no3_8bit.rio.to_raster(
    OUT_NO3,
    compress="DEFLATE",
    tiled=True,
    BIGTIFF="IF_SAFER",
)

print(f"Saved:\n  {OUT_PO4}\n  {OUT_NO3}")

# ---------------------------
# SAVE SCALING METADATA (for reverse mapping)
# ---------------------------
meta = {
    "scaling": {
        "po4": {
            "file": OUT_PO4,
            "quantiles": [LOW_Q, HIGH_Q],
            "vmin": po4_vmin,
            "vmax": po4_vmax,
            "nodata_uint8": 0,
            "crs": "EPSG:4326",
            "note": "Pixel 0 is nodata; valid 1-255 map linearly from vmin..vmax (with valid zeros nudged to 1)."
        },
        "no3": {
            "file": OUT_NO3,
            "quantiles": [LOW_Q, HIGH_Q],
            "vmin": no3_vmin,
            "vmax": no3_vmax,
            "nodata_uint8": 0,
            "crs": "EPSG:4326",
            "note": "Pixel 0 is nodata; valid 1-255 map linearly from vmin..vmax (with valid zeros nudged to 1)."
        }
    }
}
with open(SIDE_CAR_JSON, "w") as f:
    json.dump(meta, f, indent=2)

print(f"Saved scaling metadata: {SIDE_CAR_JSON}")
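The sidecar JSON stores `vmin` and `vmax` so that the 8-bit pixels can be mapped back to approximate concentrations. Here is a minimal decoding sketch, assuming the linear mapping written above: pixel 0 is nodata, and values at or below `vmin` were nudged to 1, so pixel 1 decodes to slightly above `vmin`. The helper name `decode_uint8` and the example bounds are hypothetical.

```python
import numpy as np

def decode_uint8(pixels, vmin: float, vmax: float) -> np.ndarray:
    """Invert p = round((v - vmin) / (vmax - vmin) * 255); pixel 0 (nodata) becomes NaN."""
    p = np.asarray(pixels, dtype="float64")
    values = vmin + p / 255.0 * (vmax - vmin)
    return np.where(p == 0, np.nan, values)

# Hypothetical scaling parameters, standing in for values read from the sidecar JSON
decoded = decode_uint8([0, 1, 128, 255], vmin=0.1, vmax=2.6)
print(decoded)
```

With the metadata file above, `vmin` and `vmax` would come from `meta["scaling"]["po4"]` (or `"no3"`). Values outside the 2nd to 98th percentile range were clipped, so decoded extremes are bounds rather than exact values.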


## Figure 3

<p align="center">
  <img src="Figs/reduce_pollution_nutrients_3.png" style="width:50%;">
</p>

Data were retrieved from Table S17 of [Poore and Nemecek (2018)](https://www.science.org/doi/10.1126/science.aaq0216).

## Figures 4 and 5

<p align="center">
  <img src="Figs/reduce_pollution_nutrients_4.png" style="width:50%;">
</p>

<p align="center">
  <img src="Figs/reduce_pollution_nutrients_5.png" style="width:50%;">
</p>

This code identifies ocean dead zones from modeled dissolved $O_2$ concentrations in the Global Ocean Biogeochemistry Analysis and Forecast product from [Copernicus](https://doi.org/10.48670/moi-00015). A grid cell is classified as hypoxic if its dissolved $O_2$ concentration falls below 2 mg/L at any depth between the sea surface and 200 m, at any time step in the record. Dead-zone area is then aggregated by ocean and sea basin (IHO World Seas polygons) and overlaid with the biodiversity priority areas of Zhao et al. (2020).
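The 2 mg/L threshold corresponds to 62.5 mmol m⁻³. One mmol of O₂ weighs 32 mg and one cubic metre holds 1000 L, so mg/L = 0.032 × mmol m⁻³, which is the factor used in the code. The NumPy sketch below reproduces the "any depth in the upper 200 m, any time step" rule on a toy array with dimensions (time, depth, lat, lon). The array values and the helper name `hypoxia_mask` are illustrative assumptions.

```python
import numpy as np

O2_MG_PER_L_PER_MMOL_M3 = 32.0 / 1000.0  # 32 mg per mmol O2, divided by 1000 L per m^3
HYPOXIA_MG_PER_L = 2.0

def hypoxia_mask(o2_mmol_m3: np.ndarray, depth_m: np.ndarray, max_depth_m: float = 200.0) -> np.ndarray:
    """(lat, lon) int mask: 1 where O2 < 2 mg/L at any time and any depth <= max_depth_m.

    o2_mmol_m3 has dims (time, depth, lat, lon); NaN (land) compares as False.
    """
    shallow = depth_m <= max_depth_m
    o2_mg_l = o2_mmol_m3[:, shallow] * O2_MG_PER_L_PER_MMOL_M3
    return (o2_mg_l < HYPOXIA_MG_PER_L).any(axis=(0, 1)).astype(int)

# Toy field: 2 time steps, 3 depths (10, 100, 500 m), 1 x 2 grid cells
depth = np.array([10.0, 100.0, 500.0])
o2 = np.full((2, 3, 1, 2), 200.0)
o2[1, 1, 0, 0] = 50.0   # 1.6 mg/L at 100 m in the second time step, so hypoxic
o2[0, 2, 0, 1] = 10.0   # low O2 only at 500 m, ignored because it is below 200 m
print(hypoxia_mask(o2, depth))  # [[1 0]]
```

Taking `.any()` over depth and time is equivalent to the `.max(dim=['depth', 'time'])` on the boolean mask used in the xarray code.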

In [None]:
import xarray as xr
import geopandas as gpd
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy import stats
import cartopy.crs as ccrs
from shapely.geometry import box
import rioxarray
import re

from rasterio.features import geometry_mask
from scipy.stats import linregress
from tqdm import tqdm

# Open the biodiversity priority areas based on Zhao et al. 2020 (https://www.sciencedirect.com/science/article/abs/pii/S0006320719312182?via%3Dihub)
masked_data = rioxarray.open_rasterio('../Data/masked_top_30_percent_over_water.tif')

# Set the CRS for masked_data if it's not already set (rioxarray keeps the CRS on the spatial_ref coordinate, not in attrs)
if masked_data.rio.crs is None:
    masked_data = masked_data.rio.write_crs('EPSG:4326')

# Load the IHO World Seas shapefile (one polygon per ocean or sea)
seas_shapefile_path = '../Data/World_Seas_IHO_v3/World_Seas_IHO_v3.shp'
SEAS_DF = gpd.read_file(seas_shapefile_path)

def area_dead_zone(o2_df, SEAS_DF=SEAS_DF):
    area_deadzone = []

    # Set CRS and rename dimensions and coordinates
    o2_df = o2_df.rio.write_crs("epsg:4326")
    o2_df = o2_df.rename({'latitude': 'y', 'longitude': 'x'})

    # Interpolate biodiversity priority areas to the same resolution as the climate data
    masked_data_interp = masked_data.interp(
        x=o2_df['x'],
        y=o2_df['y'],
        method='nearest'
    )

    # Calculate the area for each grid cell (assumes lat/lon grid)
    lat = o2_df['y'].values
    lon = o2_df['x'].values
    
    # Approximate each cell's area as R^2 * sin(dlat) * dlon * cos(lat), a small-angle
    # approximation of the spherical cell area (all angles in radians)
    lat_rad = np.deg2rad(lat)
    lon_rad = np.deg2rad(lon)

    # Earth radius in kilometers
    R = 6371
    dlat = np.gradient(lat_rad)
    dlon = np.gradient(lon_rad)

    # abs() keeps areas positive if latitude or longitude is stored in descending order
    cell_areas = np.abs(R**2 * np.outer(np.sin(dlat), dlon) * np.cos(lat_rad[:, None]))
    
    # Use tqdm to track progress through SEAS_DF.iterrows()
    for i, row in tqdm(SEAS_DF.iterrows(), total=len(SEAS_DF), desc="Processing Sea Areas"):
        try:
            region_name = row['NAME']
            area = row['area']
            geom = row['geometry']
    
            # Clip the dead-zone mask to the sea geometry
            masked_df = o2_df.rio.clip([geom], drop=True)

            # Wrap cell_areas in a DataArray on the same grid as the mask
            cell_areas_clipped = xr.DataArray(
                cell_areas,
                dims=['y', 'x'],
                coords={'y': o2_df['y'], 'x': o2_df['x']}
            )

            # Give the cell-area grid the same CRS as the mask (EPSG:4326)
            cell_areas_clipped = cell_areas_clipped.rio.write_crs('EPSG:4326')

            # Clip cell_areas to the same geometry
            cell_areas_clipped = cell_areas_clipped.rio.clip([geom], drop=True)
        
            # Compute the total area that is impacted by dead zones
            deadzone_area = (masked_df * cell_areas_clipped).sum(dim=('y', 'x')).compute()  # Compute to convert from Dask array
    
            # Calculate the area for biodiversity based on the mask
            area_biodiversity = ((masked_df * cell_areas_clipped) * masked_data_interp).sum(dim=['x', 'y']).compute()

            total_area = cell_areas_clipped.sum(dim=('y', 'x')).compute()  # Ensure computation
    
            # Extract values after computing
            deadzone_area_value = deadzone_area.item() if deadzone_area.size == 1 else deadzone_area.values[0]
            total_area_value = total_area.item() if total_area.size == 1 else total_area.values[0]
            area_biodiversity = area_biodiversity.item() if area_biodiversity.size == 1 else area_biodiversity.values[0]
    
            # Store the result
            area_deadzone.append({
                'Region_Name': region_name,
                'geometry': geom,
                'Deadzone_Area': area*deadzone_area_value/total_area.item(),
                'Deadzone_Area_Percent': 100*(deadzone_area_value/total_area.item()),
                'Sea_Area': area,
                'Biodiversity_Area': area*area_biodiversity/total_area.item(),
                'Biodiversity_Area_Percent': 100*(area_biodiversity/total_area.item()),
            })
        except Exception as e:
            print(f"Error processing {region_name}: {e}")

    # Convert the results to a GeoDataFrame for easy viewing
    area_deadzone_gdf = gpd.GeoDataFrame(area_deadzone, crs=SEAS_DF.crs)
    return area_deadzone_gdf
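The cell areas in `area_dead_zone` use a small-angle approximation. On a regular latitude/longitude grid, the exact area of a cell between latitudes φ₁ and φ₂ is R² Δλ |sin φ₂ − sin φ₁|. The sketch below computes that exact form and can serve as a sanity check for the approximation. It assumes a regular grid whose cell edges lie halfway between the centres; the helper names `edges_from_centres` and `exact_cell_areas_km2` are hypothetical. Summed over a complete global grid, the cells should add up to the area of the sphere, 4πR².

```python
import numpy as np

R_EARTH_KM = 6371.0  # same mean Earth radius as the code above

def edges_from_centres(centres: np.ndarray) -> np.ndarray:
    """Cell edges halfway between regularly spaced centres (ascending or descending)."""
    step = np.diff(centres).mean()
    return np.concatenate(([centres[0] - step / 2], centres + step / 2))

def exact_cell_areas_km2(lat_deg: np.ndarray, lon_deg: np.ndarray) -> np.ndarray:
    """(n_lat, n_lon) spherical cell areas in km^2: R^2 * dlon * |sin(lat2) - sin(lat1)|."""
    lat_edges = np.deg2rad(np.clip(edges_from_centres(lat_deg), -90.0, 90.0))
    lon_edges = np.deg2rad(edges_from_centres(lon_deg))
    band = np.abs(np.diff(np.sin(lat_edges)))  # one value per latitude row
    dlon = np.abs(np.diff(lon_edges))          # one value per longitude column
    return R_EARTH_KM**2 * np.outer(band, dlon)

# Sanity check on a global 1-degree grid: the cells should tile the whole sphere
lat = np.arange(-89.5, 90.0, 1.0)
lon = np.arange(-179.5, 180.0, 1.0)
areas = exact_cell_areas_km2(lat, lon)
print(areas.shape, areas.sum() / (4 * np.pi * R_EARTH_KM**2))
```

For a 1 degree cell at the equator, the exact and small-angle values differ by roughly 1 part in 10⁵. This suggests the approximation is adequate at these resolutions, particularly since the code only uses ratios of summed areas.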

In [None]:
# Load the dataset
o2_df = xr.open_mfdataset("../Data/dead_zone/*")

# Conversion factor from mmol O2/m³ to mg/L
o2_conversion_factor = 32 / 1000

# Convert oxygen concentration from mmol O2/m³ to mg/L
o2_mg_per_l = o2_df['o2'] * o2_conversion_factor

# Define a depth range for dead zone analysis (e.g., upper 200 meters)
shallow_depth_mask = o2_df['depth'] <= 200

# Apply the mask to limit analysis to shallow depths
o2_mg_per_l_shallow = o2_mg_per_l.where(shallow_depth_mask, drop=True)

# Create a mask where oxygen concentration is less than 2 mg/L at any shallow depth or time
low_oxygen_mask_shallow = (o2_mg_per_l_shallow < 2).max(dim=['depth', 'time']).astype(int)

area_df = area_dead_zone(low_oxygen_mask_shallow)

# Save the GeoDataFrame to a GeoJSON file
area_df.to_file("../Data/Fig_4_nutrients.geojson", driver="GeoJSON")

In [None]:
area_df

# Light Pollution

The raw light pollution data were downloaded from [here](https://doi.pangaea.de/10.1594/PANGAEA.969081) as 11 regional NetCDF files for each month of 2019 (132 files in total). The regional files for each month were then merged into a single global file with `xarray`.
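The per-month merge can be sketched with `xr.combine_by_coords`, which stitches datasets together according to their coordinate values. This is a minimal sketch, assuming each regional file is a non-overlapping tile on a shared lat/lon grid with the same variables. The toy tiles and the helper names `merge_month` and `toy_tile` are hypothetical; this is not the code used to build the `global_month_*.nc` files.

```python
import numpy as np
import xarray as xr

def merge_month(tiles):
    """Combine the regional tiles for one month into one Dataset, ordered by coordinates."""
    return xr.combine_by_coords(tiles, combine_attrs="drop_conflicts")

def toy_tile(lon_start):
    """Hypothetical 2 x 2 tile starting at lon_start, with a z_thresh variable of zeros."""
    lon = np.arange(lon_start, lon_start + 2) + 0.5
    lat = np.array([0.5, 1.5])
    data = np.zeros((lat.size, lon.size))
    return xr.Dataset({"z_thresh": (("lat", "lon"), data)}, coords={"lat": lat, "lon": lon})

# Tiles can be passed in any order; combine_by_coords sorts them along lon
merged = merge_month([toy_tile(2), toy_tile(0)])
print(dict(merged.sizes))
```

For files on disk, `xr.open_mfdataset(paths, combine="by_coords")` applies the same logic, and the result can be saved with `Dataset.to_netcdf`.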

## Figures 1 and 3

<p align="center">
  <img src="Figs/reduce_pollution_light_1.png" style="width:50%;">
</p>

In [None]:
import xarray as xr

# Open all monthly ALAN light pollution files for 2019 as one xarray dataset
combined = xr.open_mfdataset("../Data/ALAN/global_month_*.nc")

# Flag a 'High' threat level where the critical depth (z_thresh) is at least 10
combined['risk_level'] = combined['z_thresh'] >= 10

# Drop the 'z_thresh' variable from the dataset
combined = combined.drop_vars('z_thresh')
combined

In [None]:
import geopandas as gpd

# Open EEZ shapefiles
eez = gpd.read_file('../Data/World_EEZ_v12_20231025/eez_v12.shp')
eez

In [None]:
import numpy as np
import xarray as xr
import geopandas as gpd
import rioxarray

# Open the biodiversity priority areas based on Zhao et al. 2020
masked_data = rioxarray.open_rasterio('../Data/masked_top_30_percent_over_water.tif')

# Set CRS if missing (rioxarray stores it on the spatial_ref coordinate, not in attrs)
if masked_data.rio.crs is None:
    masked_data = masked_data.rio.write_crs('EPSG:4326')

# Rename dims once so rioxarray recognises the spatial axes, then attach the CRS
if 'x' not in combined.dims or 'y' not in combined.dims:
    combined = combined.rename({'lon': 'x', 'lat': 'y'})
combined = combined.rio.write_crs('EPSG:4326')

# 1.0 where a cell is high risk in any month of 2019; float so clipped-out cells can become NaN
risk_any = combined['risk_level'].max(dim='time').astype('float32').rio.write_crs('EPSG:4326')

# Interpolate the biodiversity map (first band) to the ALAN grid once, outside the loop
masked_data_interp = masked_data.isel(band=0).interp(
    x=combined['x'], y=combined['y'], method='nearest'
)

# Approximate cell areas (km^2) for the lat/lon grid: R^2 * sin(dlat) * dlon * cos(lat)
R = 6371  # km
lat_rad = np.deg2rad(combined['y'].values)
lon_rad = np.deg2rad(combined['x'].values)
cell_areas = xr.DataArray(
    np.abs(R**2 * np.outer(np.sin(np.gradient(lat_rad)), np.gradient(lon_rad)) * np.cos(lat_rad[:, None])),
    dims=['y', 'x'],
    coords={'y': combined['y'], 'x': combined['x']},
).rio.write_crs('EPSG:4326')

area_light_data = []

for i, row in eez.iterrows():
    country_name = row['TERRITORY1']
    try:
        geom = row['geometry']
        ISO_TER1 = row['ISO_TER1']

        # Clip risk flags and cell areas to the EEZ polygon; cells outside the polygon become NaN
        risk_eez = risk_any.rio.clip([geom], drop=True)
        areas_eez = cell_areas.rio.clip([geom], drop=True)

        # Grid-based areas (NaNs skipped); the biodiversity map aligns on the shared x/y coordinates
        area_eez = float(areas_eez.sum())
        area_light = float((risk_eez * areas_eez).sum())
        area_biodiversity = float((risk_eez * areas_eez * masked_data_interp).sum())

        # Scale grid fractions by the official EEZ area
        light_area_km2 = row['AREA_KM2'] * area_light / area_eez
        biodiversity_area_km2 = row['AREA_KM2'] * area_biodiversity / area_eez

        area_light_data.append({
            'Country': country_name,
            'ISO_TER1': ISO_TER1,
            'geometry': geom,
            'Light_Area': light_area_km2,
            'Light_Area_Percent': 100 * area_light / area_eez,
            'EEZ_Area': row['AREA_KM2'],
            'Biodiversity_Area': biodiversity_area_km2,
            'Biodiversity_Area_Percent': 100 * area_biodiversity / area_eez
        })

        print(i, country_name,
              'Light_Area:', light_area_km2,
              'EEZ_Area:', row['AREA_KM2'],
              'Biodiversity_Area:', biodiversity_area_km2)

    except Exception as e:
        print(f"Error processing {country_name}: {e}")

# ---- Convert to GeoDataFrame ----
area_light_gdf = gpd.GeoDataFrame(area_light_data, crs=eez.crs)

# Ensure valid shapes (optional but recommended)
area_light_gdf['geometry'] = area_light_gdf['geometry'].buffer(0)

# ---- SAVE AS GEOJSON ----
output_path = "Figure_1_light.geojson"
area_light_gdf.to_file(output_path, driver="GeoJSON")

print(f"Saved GeoJSON to {output_path}")


## Figure 2

<p align="center">
  <img src="Figs/reduce_pollution_light_2.png" style="width:50%;">
</p>

This figure plots the raw data from [Smyth et al.](https://doi.pangaea.de/10.1594/PANGAEA.969081).

## Figure 4

sea turtle

## Figure 5

coral reefs

# Noise Pollution

## Figure 1

<p align="center">
  <img src="Figs/reduce_pollution_noise_1.png" style="width:50%;">
</p>

The data are derived from Figure 2 of [Duarte et al. (2021)](https://epic.awi.de/id/eprint/53691/1/DuarteEtAl_2021full.pdf).

## Figure 2

<p align="center">
  <img src="Figs/reduce_pollution_noise_2.png" style="width:50%;">
</p>

The data are derived from UN Trade and Development, [UNCTADstat](https://unctadstat.unctad.org/datacentre/dataviewer/US.MerchantFleet).

## Figure 3

<p align="center">
  <img src="Figs/reduce_pollution_noise_3.png" style="width:50%;">
</p>

The data are derived from the 125 Hz band available from [Jalkanen et al.](https://zenodo.org/records/6513401).

In [None]:
import xarray as xr
import numpy as np
import rioxarray  # make sure this is installed: pip install rioxarray

# --- Load data ---
ds = xr.open_dataset("../Data/2020_Ship_Noise_Energy.nc", decode_times=False)

# If there's a time dimension, pick one slice (adjust as needed)
da = ds['NOISE_ENE_2ndBand']
if 'time' in da.dims:
    da = da.isel(time=0)  # or pick appropriate index

# --- Ensure CRS is set for GeoTIFF (change if your CRS is different) ---
if da.rio.crs is None:
    da = da.rio.write_crs("EPSG:4326", inplace=False)

# --- Log-transform (log10) ---
# Mask non-positive values before taking the log
da_pos = da.where(da > 0)
log_da = np.log10(da_pos)

# Use the 2nd and 98th percentiles as the stretch range to limit the influence of extreme outliers
vmin = float(log_da.quantile(0.02))
vmax = float(log_da.quantile(0.98))

# --- Rescale to 1..255, keeping 0 for nodata ---
scaled = ((log_da - vmin) / (vmax - vmin)).clip(0, 1) * 255
scaled = scaled.round().clip(min=1)  # nudge valid zeros to 1 so that 0 only means nodata

# Fill nodata (non-positive or NaN in the original) with 0 BEFORE casting; casting NaN to uint8 is undefined
scaled_uint8 = scaled.fillna(0).astype("uint8")
scaled_uint8 = scaled_uint8.rio.write_nodata(0)

scaled_uint8.name = "NOISE_ENE_2ndBand_logscaled"

# --- Save as GeoTIFF ---
out_path = "Figure_3_noise.tif"
scaled_uint8.rio.to_raster(out_path, dtype="uint8")

print(f"Saved: {out_path}")
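The GeoTIFF stores log10 values rescaled between the 2nd and 98th percentiles. Approximate noise energy can therefore be recovered only if `vmin` and `vmax` are kept alongside the raster. The cell above only prints the output path, so they would need to be saved separately, for example in a sidecar JSON like the one used for the nutrient maps. The sketch below assumes those two log10 bounds are available and that pixel 0 is nodata. The helper name `decode_log_uint8` and the example bounds are hypothetical.

```python
import numpy as np

def decode_log_uint8(pixels, vmin_log10: float, vmax_log10: float) -> np.ndarray:
    """Invert p = 255 * (log10(v) - vmin) / (vmax - vmin); pixel 0 is treated as nodata (NaN)."""
    p = np.asarray(pixels, dtype="float64")
    log_v = vmin_log10 + p / 255.0 * (vmax_log10 - vmin_log10)
    return np.where(p == 0, np.nan, 10.0 ** log_v)

# Hypothetical percentile bounds in log10 units
print(decode_log_uint8([0, 255], vmin_log10=-3.0, vmax_log10=2.0))
```

Because values beyond the percentile bounds were clipped, pixels at the ends of the range decode to bounds rather than to the true energy.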


## Figures 4 and 7

<p align="center">
  <img src="Figs/reduce_pollution_noise_4.png" style="width:50%;">
</p>

The data are derived from [Global Wind Power Tracker, Global Energy Monitor, February 2025 release](https://globalenergymonitor.org/projects/global-wind-power-tracker/download-data/).

In [None]:
#!/usr/bin/env python3
import pandas as pd
import geopandas as gpd
from pathlib import Path

# --- paths ---
xlsx_path = Path("../Data/Global-Wind-Power-Tracker-February-2025.xlsx")
out_csv = Path("Figure_4_noise.csv")
out_geojson = Path("Figure_7_noise.geojson")

# --- load data ---
df = pd.read_excel(xlsx_path, sheet_name="Data")

# ------------------------
# SUBSET TO OFFSHORE WIND
# ------------------------
df["Installation Type"] = df["Installation Type"].astype(str)
offshore = df[df["Installation Type"].str.lower().str.contains("offshore")].copy()

offshore["Capacity (MW)"] = pd.to_numeric(offshore["Capacity (MW)"], errors="coerce")
offshore["Start year"] = pd.to_numeric(offshore["Start year"], errors="coerce")

offshore = offshore.dropna(subset=["Start year", "Capacity (MW)"])
offshore["Start year"] = offshore["Start year"].astype(int)

# ------------------------
# ANNUAL CAPACITY ADDITIONS
# ------------------------
annual = (
    offshore.groupby("Start year", as_index=False)["Capacity (MW)"]
    .sum()
    .rename(columns={"Start year": "Year", "Capacity (MW)": "Annual_capacity_MW"})
    .sort_values("Year")
)

# ------------------------
# FILL MISSING YEARS WITH ZERO
# ------------------------
min_year = annual["Year"].min()
max_year = 2025  # last year of the time series

all_years = pd.DataFrame({"Year": range(min_year, max_year + 1)})

# Merge with full year index
annual_full = (
    all_years.merge(annual, on="Year", how="left")
              .fillna({"Annual_capacity_MW": 0})
)

# ------------------------
# CUMULATIVE CAPACITY
# ------------------------
annual_full["Cumulative_capacity_MW"] = annual_full["Annual_capacity_MW"].cumsum()

# ------------------------
# SAVE ANNUAL TIMESERIES
# ------------------------
annual_full[["Year", "Annual_capacity_MW", "Cumulative_capacity_MW"]].to_csv(
    out_csv, index=False
)
print(f"Saved annual offshore timeseries to {out_csv.resolve()}")

# ------------------------
# SAVE GEOJSON OF OFFSHORE WIND PROJECTS
# ------------------------
offshore = offshore.dropna(subset=["Latitude", "Longitude"])

gdf_offshore = gpd.GeoDataFrame(
    offshore,
    geometry=gpd.points_from_xy(offshore["Longitude"], offshore["Latitude"]),
    crs="EPSG:4326"
)

gdf_offshore = gdf_offshore[["Project Name", "Capacity (MW)", "Latitude",
                             "Longitude", "Start year", "geometry"]]

gdf_offshore.to_file(out_geojson, driver="GeoJSON")
print(f"Saved offshore wind projects GeoJSON to {out_geojson.resolve()}")


## Figures 5 and 8

<p align="center">
  <img src="Figs/reduce_pollution_noise_5.png" style="width:50%;">
</p>

The data are derived from [Global Oil and Gas Extraction Tracker, Global Energy Monitor, February 2025 release](https://globalenergymonitor.org/projects/global-oil-gas-extraction-tracker/download-data/).

In [None]:
#!/usr/bin/env python3
import pandas as pd
import geopandas as gpd
from pathlib import Path

# --- paths ---
xlsx_path = Path("../Data/Global-Oil-and-Gas-Extraction-Tracker-Feb-2025.xlsx")
out_csv = Path("Figure_5_noise.csv")
out_geojson = Path("Figure_8_noise.geojson")

# --- load data ---
df = pd.read_excel(xlsx_path, sheet_name="Main data")

# ------------------------
# USE ALL EXTRACTION SITES
# ------------------------
# We'll build a time series based on Production start year (no filtering)
df_ts = df.copy()
df_ts["Production start year"] = pd.to_numeric(
    df_ts["Production start year"], errors="coerce"
)

# Drop rows without a valid production start year
df_ts = df_ts.dropna(subset=["Production start year"])
df_ts["Production start year"] = df_ts["Production start year"].astype(int)

# ------------------------
# ANNUAL COUNTS
# ------------------------
annual = (
    df_ts.groupby("Production start year", as_index=False)
    .size()
    .rename(columns={"Production start year": "Year", "size": "Annual_count"})
    .sort_values("Year")
)

# ------------------------
# FILL MISSING YEARS WITH ZERO
# ------------------------
min_year = annual["Year"].min()
max_year = 2025  # upper bound

all_years = pd.DataFrame({"Year": range(min_year, max_year + 1)})

annual_full = (
    all_years.merge(annual, on="Year", how="left")
             .fillna({"Annual_count": 0})
)

# ------------------------
# CUMULATIVE COUNT
# ------------------------
annual_full["Cumulative_count"] = annual_full["Annual_count"].cumsum()

# Cast to int for cleaner CSV
annual_full["Annual_count"] = annual_full["Annual_count"].astype(int)
annual_full["Cumulative_count"] = annual_full["Cumulative_count"].astype(int)

# ------------------------
# SAVE ANNUAL TIMESERIES
# ------------------------
annual_full[["Year", "Annual_count", "Cumulative_count"]].to_csv(
    out_csv, index=False
)
print(f"Saved annual extraction site timeseries to {out_csv.resolve()}")

# ------------------------
# SAVE GEOJSON OF ALL EXTRACTION SITES
# ------------------------
df_geo = df.dropna(subset=["Latitude", "Longitude"]).copy()

gdf_sites = gpd.GeoDataFrame(
    df_geo,
    geometry=gpd.points_from_xy(df_geo["Longitude"], df_geo["Latitude"]),
    crs="EPSG:4326"
)

# Keep a compact set of useful attributes (adjust as needed)
gdf_sites = gdf_sites[
    ["Unit Name", "Fuel type", "Latitude",
     "Longitude", "Production start year", "geometry"]
]

gdf_sites.to_file(out_geojson, driver="GeoJSON")
print(f"Saved extraction sites GeoJSON to {out_geojson.resolve()}")


## Figure 6

<p align="center">
  <img src="Figs/reduce_pollution_noise_6.png" style="width:50%;">
</p>

Data were retrieved from the [International Seabed Authority Dashboard](https://isa.org.jm/deepdata-database/deepdata-dashboard/).

# Oil Pollution

## Figure 1

<p align="center">
  <img src="Figs/reduce_pollution_oil_1.png" style="width:50%;">
</p>

The data were retrieved from [ITOPF (2025) – with minor processing by Our World in Data](https://ourworldindata.org/grapher/quantity-oil-spills).

## Figure 2

<p align="center">
  <img src="Figs/reduce_pollution_oil_2.png" style="width:50%;">
</p>

The data were retrieved from [ITOPF](https://www.itopf.org/knowledge-resources/data-statistics/oil-tanker-spill-statistics-2024/).

## Figure 3

<p align="center">
  <img src="Figs/reduce_pollution_oil_3.png" style="width:50%;">
</p>

Data on the number of spills were retrieved from [ITOPF (2025) – with minor processing by Our World in Data](https://ourworldindata.org/grapher/number-oil-spills). The crude oil and tanker data are available from [UNCTAD](https://digitallibrary.in.one.un.org/TempPdfFiles/8392_1.pdf).

## Figure 4

<p align="center">
  <img src="Figs/reduce_pollution_oil_4.png" style="width:50%;">
</p>

The data were retrieved from [ITOPF](https://www.itopf.org/knowledge-resources/data-statistics/oil-tanker-spill-statistics-2024/).

## Figure 5

<p align="center">
  <img src="Figs/reduce_pollution_oil_5.png" style="width:50%;">
</p>

The data were retrieved from Table 3.1 of National Academies of Sciences, Engineering, and Medicine. Oil in the Sea IV: Inputs, Fates, and Effects. Washington, DC: The National Academies Press, 2022.

## Figures 6 and 7

<p align="center">
  <img src="Figs/reduce_pollution_oil_6.png" style="width:50%;">
</p>

<p align="center">
  <img src="Figs/reduce_pollution_oil_7.png" style="width:50%;">
</p>

Global oil slick polygons for 2024 were retrieved from the SkyTruth Cerulean API.

In [None]:
#!/usr/bin/env python3
import argparse, json, time, datetime as dt
from urllib.parse import urlencode
import requests

API_ROOT = "https://api.cerulean.skytruth.org"

def get_json(url, timeout=60):
    for i in range(6):
        r = requests.get(url, timeout=timeout, headers={"Accept":"application/geo+json"})
        if r.status_code == 200:
            return r.json()
        if r.status_code in (429,500,502,503,504):
            time.sleep(1.5**i)
            continue
        r.raise_for_status()
    r.raise_for_status()

def iter_months(year):
    for m in range(1,13):
        start = dt.datetime(year,m,1)
        end = dt.datetime(year+1,1,1) if m==12 else dt.datetime(year,m+1,1)
        yield start.isoformat()+"Z", end.isoformat()+"Z"

def page_items(collection_id, start_iso, end_iso, bbox=None, limit=1000):
    base = f"{API_ROOT}/collections/{collection_id}/items"
    params = {
        "datetime": f"{start_iso}/{end_iso}",
        "limit": str(limit),
        "f": "geojson",
    }
    if bbox:
        params["bbox"] = ",".join(map(str, bbox))
    url = base + "?" + urlencode(params)
    while url:
        data = get_json(url)
        for ft in data.get("features", []):
            yield ft
        next_url = None
        for ln in data.get("links", []):
            if ln.get("rel") == "next":
                next_url = ln.get("href")
                break
        url = next_url

def main():
    ap = argparse.ArgumentParser(description="Download Cerulean slick polygons for 2024 (global).")
    ap.add_argument("--collection", default="public.slick_plus", help="Try public.slick_plus (open). public.slick is restricted.")
    ap.add_argument("--year", type=int, default=2024)
    ap.add_argument("--bbox", type=float, nargs=4, default=[-180,-90,180,90])
    ap.add_argument("--limit", type=int, default=1000)
    ap.add_argument("--out", default="cerulean_slicks_2024.geojson")
    args, _ = ap.parse_known_args()  # ignore extra arguments (e.g. Jupyter's -f kernel file) when run in a notebook

    out = {"type":"FeatureCollection","features":[]}
    total = 0
    for start_iso, end_iso in iter_months(args.year):
        print(f"Fetching {start_iso} → {end_iso} …")
        for ft in page_items(args.collection, start_iso, end_iso, args.bbox, args.limit):
            out["features"].append(ft)
            total += 1
        time.sleep(0.25)
    print(f"Writing {total} features to {args.out}")
    with open(args.out, "w", encoding="utf-8") as f:
        json.dump(out, f)

    # Optional: GeoParquet
    try:
        import geopandas as gpd
        gdf = gpd.GeoDataFrame.from_features(out, crs="EPSG:4326")
        gdf.to_parquet(args.out.replace(".geojson",".parquet"), index=False)
        print("Also wrote", args.out.replace(".geojson",".parquet"))
    except Exception as e:
        print(f"(Parquet export skipped: {e})")

if __name__ == "__main__":
    main()
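`page_items` keeps requesting pages and follows the `rel="next"` link in each response until none is returned. This matches the paging convention of OGC API Features endpoints of the form `/collections/{id}/items`. The offline sketch below isolates that loop using a stand-in `fetch` callable, which is a hypothetical replacement for `get_json`, so the logic can be checked without network access. The page contents are invented for the example.

```python
from typing import Callable, Dict, Iterator, Optional

def iter_features(first_url: str, fetch: Callable[[str], Dict]) -> Iterator[Dict]:
    """Yield features from every page, following rel="next" links until none remain."""
    url: Optional[str] = first_url
    seen = set()
    while url and url not in seen:  # guard against a server that keeps returning the same link
        seen.add(url)
        page = fetch(url)
        yield from page.get("features", [])
        url = next((ln.get("href") for ln in page.get("links", []) if ln.get("rel") == "next"), None)

# Stand-in for get_json: two fake pages keyed by URL
fake_pages = {
    "page1": {"features": [{"id": 1}, {"id": 2}], "links": [{"rel": "next", "href": "page2"}]},
    "page2": {"features": [{"id": 3}], "links": [{"rel": "self", "href": "page2"}]},
}
ids = [ft["id"] for ft in iter_features("page1", fake_pages.__getitem__)]
print(ids)  # [1, 2, 3]
```

The `seen` set is a design choice that goes beyond the original loop. It stops the download if a misbehaving server returns a `next` link that points back to a page already fetched.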
