# Datasets

## Overview

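`ROMS-Tools` relies on the datasets listed below. Most of them must be downloaded by the user; the exceptions (ERA5, which can be streamed, and Dai & Trenberth, which is downloaded automatically) are marked in the Notes column.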

| **Dataset** | **Supported Versions** | **Notes** | **Required Fields** | **Field Description** | **Available at** | **Required For** |
|-------------|------------------------|----------|--------------------|---------------------|-----------------|-----------------|
| **SRTM15** | V2.6 | | `lat` | Latitude (degrees north) | [UCSD SRTM15+ Product](https://topex.ucsd.edu/WWW_html/srtm15_plus.html) | Grid (Topography) |
| | | | `lon` | Longitude (degrees east) | | |
| | | | `z` | Topography | | |
| **TPXO** | TPXO9v5a (1/6°) | | `lat_z` | Latitude of z nodes | [OSU TPXO Tide Models](https://www.tpxo.net/global) | Tidal Forcing |
| | TPXO10v2 (1/6°) | | `lon_z` | Longitude of z nodes | | |
| | TPXO10v2a (1/6°)| | `lat_u` | Latitude of u nodes | | |
| | | | `lon_u` | Longitude of u nodes | | |
| | | | `lat_v` | Latitude of v nodes | | |
| | | | `lon_v` | Longitude of v nodes | | |
| | | | `mz` | Water/land mask for z nodes | | |
| | | | `mu` | Water/land mask for u nodes | | |
| | | | `mv` | Water/land mask for v nodes | | |
| | | | `hRe` | Tidal elevation, real part (m) | | |
| | | | `hIm` | Tidal elevation, imaginary part (m) | | |
| | | | `URe` | Tidal transport WE, real part (m²/s) | | |
| | | | `UIm` | Tidal transport WE, imaginary part (m²/s) | | |
| | | | `VRe` | Tidal transport SN, real part (m²/s) | | |
| | | | `VIm` | Tidal transport SN, imaginary part (m²/s) | | |
| **GLORYS** |  | **Refer to notes below for download instructions** | `time` | Time | [Mercator Ocean](https://data.marine.copernicus.eu/product/GLOBAL_MULTIYEAR_PHY_001_030/description) | Initial Conditions, Boundary Forcing |
| | | | `latitude` | Latitude (degrees north) | | |
| | | | `longitude` | Longitude (degrees east) | | |
| | | | `depth` | Depth (m) | | |
| | | | `zos` | Sea surface height (m) | | |
| | | | `thetao` | Temperature (°C) | | |
| | | | `so` | Salinity (psu) | | |
| | | | `uo` | Eastward velocity (m/s) | | |
| | | | `vo` | Northward velocity (m/s) | | |
| **ERA5** |  | ROMS-Tools can stream ERA5 data directly, so **downloading ERA5 data is optional** | `time` | Time | [Climate Data Store](https://cds.climate.copernicus.eu/datasets/reanalysis-era5-single-levels) | Surface Forcing |
| | | | `latitude` | Latitude (degrees north) | | |
| | | | `longitude` | Longitude (degrees east) | | |
| | | | `u10` | 10m U wind component (m/s) | | |
| | | | `v10` | 10m V wind component (m/s) | | |
| | | | `ssr` | Surface net short-wave (W/m²) | | |
| | | | `strd` | Surface long-wave downwards (W/m²) | | |
| | | | `t2m` | 2m temperature (K) | | |
| | | | `d2m` | 2m dewpoint temperature (K) | | |
| | | | `tp` | Total precipitation (m) | | |
| | | | `sst` | Sea surface temperature (K) — used for land masking | | |
| **Dai & Trenberth** | 2019 | Coastal station volumes (monthly); **automatically downloaded by ROMS-Tools** | `station` | Station index | [NCAR RDA](https://rda.ucar.edu/datasets/d551000/dataaccess/#) | River Forcing |
|  |  |  | `time` | Time | | |
|  |  |  | `lat_mou` | River mouth latitude | | |
|  |  |  | `lon_mou` | River mouth longitude | | |
|  |  |  | `FLOW` | Monthly mean volume at station | | |
|  |  |  | `ratio_m2s` | Ratio of volume between river mouth and station | | |
|  |  |  | `riv_name` | River name | | |
|  |  |  | `vol_stn` (optional) | Annual volume at station | | |
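
Before passing a downloaded file to ROMS-Tools, it can help to confirm that it contains the required fields. The sketch below is only an illustration: `REQUIRED_FIELDS` and `missing_fields` are hypothetical names that are not part of ROMS-Tools, and the field lists are copied from three rows of the table above.

```python
# Required variable names, copied from the table above (three datasets shown).
REQUIRED_FIELDS = {
    "SRTM15": {"lat", "lon", "z"},
    "GLORYS": {"time", "latitude", "longitude", "depth",
               "zos", "thetao", "so", "uo", "vo"},
    "ERA5": {"time", "latitude", "longitude", "u10", "v10",
             "ssr", "strd", "t2m", "d2m", "tp", "sst"},
}


def missing_fields(dataset_name, available_names):
    """Return the required fields absent from `available_names`, sorted."""
    return sorted(REQUIRED_FIELDS[dataset_name] - set(available_names))


# Example: a GLORYS file that was downloaded without sea surface height.
print(missing_fields("GLORYS", ["time", "latitude", "longitude", "depth",
                                "thetao", "so", "uo", "vo"]))
```

If the file is opened with `xarray`, the available names could be taken from `list(ds.variables)`.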


## Downloading GLORYS data

You can download GLORYS data from the [Copernicus Marine Data Store](https://data.marine.copernicus.eu/products).  
To access the data, [register for a Copernicus Marine Service account](https://help.marine.copernicus.eu/en/articles/4220332-how-to-sign-up-for-copernicus-marine-service) to obtain a username and password.

Once registered, install the `copernicusmarine` package to download the datasets:

```bash
pip install copernicusmarine
```

In [1]:
import copernicusmarine



When you first log in with `copernicusmarine`, your credentials are saved in a `.copernicusmarine-credentials` file. This one-time setup gives you seamless access to all Copernicus Marine services without re-entering credentials.

```python
copernicusmarine.login(username="YOUR_USERNAME", password="YOUR_PASSWORD")
```
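
To keep the password out of a shared notebook, one option is to read the credentials from environment variables and pass them to `login`. This is a minimal sketch: `read_credentials` is a hypothetical helper, and the variable names `COPERNICUSMARINE_SERVICE_USERNAME` and `COPERNICUSMARINE_SERVICE_PASSWORD` are assumed here, so check them against the `copernicusmarine` documentation for your version.

```python
import os


def read_credentials(env=os.environ):
    """Return login keyword arguments taken from environment variables."""
    try:
        return {
            # Variable names are an assumption; adjust them if needed.
            "username": env["COPERNICUSMARINE_SERVICE_USERNAME"],
            "password": env["COPERNICUSMARINE_SERVICE_PASSWORD"],
        }
    except KeyError as err:
        raise RuntimeError(f"Environment variable {err.args[0]} is not set") from err


# Usage (uncomment once the variables are exported in your shell):
# copernicusmarine.login(**read_credentials())
```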

### Downloading global data
This example demonstrates how to download the global GLORYS dataset for a specified time range, defined by `start_time` and `end_time`. In this case, we select January 2012.

In [2]:
from datetime import datetime

In [3]:
start_time = datetime(2012, 1, 1)
end_time = datetime(2012, 2, 1)

In [4]:
%%time

copernicusmarine.subset(
    dataset_id="cmems_mod_glo_phy_my_0.083deg_P1D-m",
    variables=["thetao", "so", "uo", "vo", "zos"],
    minimum_longitude=None,  # global data
    maximum_longitude=None,  # global data
    minimum_latitude=None,  # global data
    maximum_latitude=None,  # global data
    start_datetime=start_time,
    end_datetime=end_time,
    coordinates_selection_method="outside",
    output_filename="global_GLORYS_Jan2012.nc",
    output_directory="copernicus-data",
)

INFO - 2025-08-29T21:33:43Z - Selected dataset version: "202311"
INFO - 2025-08-29T21:33:43Z - Selected dataset part: "default"
INFO - 2025-08-29T21:33:45Z - Starting download. Please wait...



INFO - 2025-09-02T18:51:23Z - Successfully downloaded to copernicus-data/global_GLORYS_Jan2012_(2).nc


CPU times: user 17min 42s, sys: 16min 48s, total: 34min 30s
Wall time: 3d 21h 17min 42s


ResponseSubset(file_path=PosixPath('copernicus-data/global_GLORYS_Jan2012_(2).nc'), output_directory=PosixPath('copernicus-data'), filename='global_GLORYS_Jan2012_(2).nc', file_size=138819.17330534352, data_transfer_size=630106.1276335877, variables=['thetao', 'so', 'uo', 'vo', 'zos'], coordinates_extent=[GeographicalExtent(minimum=-180.0, maximum=179.9166717529297, unit='degrees_east', coordinate_id='longitude'), GeographicalExtent(minimum=-80.0, maximum=90.0, unit='degrees_north', coordinate_id='latitude'), TimeExtent(minimum='2012-01-01T00:00:00+00:00', maximum='2012-02-02T00:00:00+00:00', unit='iso8601', coordinate_id='time'), GeographicalExtent(minimum=0.49402499198913574, maximum=5727.9169921875, unit='m', coordinate_id='depth')], status='000', message='The request was successful.', file_status='DOWNLOADED')
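
For periods longer than one month, one possible approach is to split the time range into monthly pieces and request each piece separately, so that an interrupted transfer only affects a single file. The helper `month_ranges` below is a hypothetical illustration and not part of `copernicusmarine` or ROMS-Tools.

```python
from datetime import datetime


def month_ranges(start, end):
    """Yield (chunk_start, chunk_end) pairs covering [start, end) month by month."""
    current = start
    while current < end:
        # First day of the month following `current`.
        year = current.year + (1 if current.month == 12 else 0)
        month = current.month % 12 + 1
        next_month = datetime(year, month, 1)
        yield current, min(next_month, end)
        current = next_month


for chunk_start, chunk_end in month_ranges(datetime(2012, 1, 1), datetime(2012, 4, 1)):
    # Each pair could be passed as start_datetime/end_datetime to
    # copernicusmarine.subset, with a distinct output_filename per chunk.
    print(f"GLORYS_{chunk_start:%Y%m}.nc", chunk_start.date(), chunk_end.date())
```

Consecutive pieces share their boundary date, so neighbouring files may contain an overlapping time step.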

### Downloading a spatial subset

If you do not want to download the entire *global* dataset, which can be very time-consuming, you can instead download a **spatial subset** of GLORYS data for a specific domain. This requires specifying `minimum_longitude`, `maximum_longitude`, `minimum_latitude`, and `maximum_latitude`.

Because ROMS grids (at least those created by ROMS-Tools) are **not regular lat-lon grids**, determining these bounds is not straightforward. Additionally, ROMS-Tools requires a **safety margin** around the domain to perform [lateral fill](https://roms-tools.readthedocs.io/en/latest/methods.html#multigrid-method-for-filling-land-values) and regridding, which helps prevent boundary artifacts. ROMS-Tools provides the function `get_glorys_bounds`, which computes appropriate bounds for a given grid.

In [5]:
from roms_tools import Grid

In [6]:
grid = Grid(
    nx=100,  # number of grid points in x-direction
    ny=80,  # number of grid points in y-direction
    size_x=2000,  # domain size in x-direction (in km)
    size_y=1600,  # domain size in y-direction (in km)
    center_lon=-89,  # longitude of the center of the domain
    center_lat=24,  # latitude of the center of the domain
    rot=0,  # rotation of the grid (in degrees)
    N=20,  # number of vertical layers
)

In [7]:
from roms_tools import get_glorys_bounds

In [8]:
bounds = get_glorys_bounds(grid.ds)

In [9]:
bounds

{'minimum_latitude': 14.833333015441895,
 'maximum_latitude': 32.91666793823242,
 'minimum_longitude': -101.16666412353516,
 'maximum_longitude': -76.83333587646484}
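
To illustrate the idea of a bounding box with a safety margin, the following sketch pads the extent of a set of points and snaps the result outward to a regular grid. It is not the implementation used by `get_glorys_bounds`: the function name `padded_bounds`, the 1° margin, and the 1/12° snapping are assumptions for illustration only, and the sketch ignores domains that cross the date line or approach the poles.

```python
import math


def padded_bounds(lats, lons, margin_deg=1.0, steps_per_degree=12):
    """Bounding box of the points, padded and snapped outward to a regular grid."""

    def snap_down(value):
        return math.floor(value * steps_per_degree) / steps_per_degree

    def snap_up(value):
        return math.ceil(value * steps_per_degree) / steps_per_degree

    return {
        "minimum_latitude": snap_down(min(lats) - margin_deg),
        "maximum_latitude": snap_up(max(lats) + margin_deg),
        "minimum_longitude": snap_down(min(lons) - margin_deg),
        "maximum_longitude": snap_up(max(lons) + margin_deg),
    }


# Corner points of a made-up domain (not the grid defined above).
print(padded_bounds(lats=[10.03, 12.51], lons=[-50.2, -48.77]))
```

The returned dictionary uses the same keys as `bounds`, so it could be unpacked into `copernicusmarine.subset` in the same way.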

In [10]:
%%time

copernicusmarine.subset(
    dataset_id="cmems_mod_glo_phy_my_0.083deg_P1D-m",
    variables=["thetao", "so", "uo", "vo", "zos"],
    **bounds,  # regional data
    start_datetime=start_time,
    end_datetime=end_time,
    coordinates_selection_method="outside",
    output_filename="regional_GLORYS_Jan2012.nc",
    output_directory="copernicus-data",
)

INFO - 2025-09-02T19:01:05Z - Selected dataset version: "202311"
INFO - 2025-09-02T19:01:05Z - Selected dataset part: "default"
INFO - 2025-09-02T19:01:08Z - Starting download. Please wait...



INFO - 2025-09-02T19:31:42Z - Successfully downloaded to copernicus-data/regional_GLORYS_Jan2012_(1).nc


CPU times: user 1min 9s, sys: 30 s, total: 1min 39s
Wall time: 30min 42s


ResponseSubset(file_path=PosixPath('copernicus-data/regional_GLORYS_Jan2012_(1).nc'), output_directory=PosixPath('copernicus-data'), filename='regional_GLORYS_Jan2012_(1).nc', file_size=1005.6628091603054, data_transfer_size=52508.84396946565, variables=['thetao', 'so', 'uo', 'vo', 'zos'], coordinates_extent=[GeographicalExtent(minimum=-101.16666412353516, maximum=-76.83333587646484, unit='degrees_east', coordinate_id='longitude'), GeographicalExtent(minimum=14.833333015441895, maximum=32.91666793823242, unit='degrees_north', coordinate_id='latitude'), TimeExtent(minimum='2012-01-01T00:00:00+00:00', maximum='2012-02-02T00:00:00+00:00', unit='iso8601', coordinate_id='time'), GeographicalExtent(minimum=0.49402499198913574, maximum=5727.9169921875, unit='m', coordinate_id='depth')], status='000', message='The request was successful.', file_status='DOWNLOADED')

## Downloading the Unified BGC Dataset

This section demonstrates how to download a **unified biogeochemical (BGC) climatology**, which integrates multiple observational and model-based sources:

- **Nutrients (NO₃⁻, PO₄³⁻, SiO₄⁴⁻)** and **dissolved oxygen** from the 2018 **World Ocean Atlas**
- **Dissolved iron (Fe)** and **nitrous oxide (N₂O)** from **in-situ measurements**
- **Dissolved inorganic carbon (DIC)** and **total alkalinity (ALK)** from the **GLODAPv2** global product
- **Other nutrients** (ammonium NH₄⁺, nitrite NO₂⁻, organic nitrogen) and **dissolved organic matter (DOM)** from **CESM model simulations**

The dataset is hosted on **Google Drive** and can be downloaded using the following procedure.

In [13]:
import gdown
import os

In [14]:
url = "https://drive.google.com/uc?id=1wUNwVeJsd6yM7o-5kCx-vM3wGwlnGSiq"

In [15]:
output_dir = "BGC-data"

In [16]:
os.makedirs(output_dir, exist_ok=True)

In [17]:
gdown.download(url, os.path.join(output_dir, "BGCdataset.nc"), quiet=False)

Downloading...
From (original): https://drive.google.com/uc?id=1wUNwVeJsd6yM7o-5kCx-vM3wGwlnGSiq
From (redirected): https://drive.google.com/uc?id=1wUNwVeJsd6yM7o-5kCx-vM3wGwlnGSiq&confirm=t&uuid=ae520df6-6c60-4c35-aa91-91767ec77231
To: /Users/noraloose/roms-tools/docs/BGC-data/BGCdataset.nc
100%|██████████| 21.4G/21.4G [06:13<00:00, 57.3MB/s]


'BGC-data/BGCdataset.nc'
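
Because the file is about 21.4 GB, it may be worth skipping the download when the file is already present. The sketch below shows one way to do this; `download_if_missing` is a hypothetical helper and not part of `gdown` or ROMS-Tools. It only checks that the path exists and does not verify that the file is complete.

```python
import os


def download_if_missing(path, download):
    """Call `download(path)` only when `path` does not exist yet."""
    if os.path.exists(path):
        return False  # nothing to do, the file is already there
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    download(path)
    return True


# Usage with the variables defined above (left commented out because it
# would start a transfer of roughly 21 GB):
# download_if_missing(
#     os.path.join(output_dir, "BGCdataset.nc"),
#     lambda path: gdown.download(url, path, quiet=False),
# )
```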

### Handling File Download Limits

<div class="alert alert-info">
Note
    
If you encounter a `FileURLRetrievalError`, it usually means the file has been accessed or downloaded too many times recently. This often happens with large files or files shared by many users.  

**Workaround:** Download the file manually using the following link: [Download unified BGC dataset](https://drive.google.com/uc?id=1wUNwVeJsd6yM7o-5kCx-vM3wGwlnGSiq)

After downloading, place the file in the appropriate directory for your workflow.

</div>

