# ONC Hydrophone Data Download Cookbook

This notebook shows how to pull plotRes-style spectrograms from Ocean Networks Canada (ONC) using the helper utilities in this repo. It focuses on our parallel submit+poll downloader (which fires multiple ONC jobs, staggers the submissions, and polls for completion before downloading) and walks through the common ways we sample data:

* fixed-length windows (sequential span)
* sampling between two dates
* explicit timestamp lists

Along the way we highlight which spectrograms ONC keeps in the archive versus those generated on demand, and how to choose the available plot styles/resolutions.

> How it works: the helper submits multiple ONC data-product requests, staggers them a few seconds apart so the backend has breathing room, and then polls `/dataProductDelivery/status` for each run until ONC says the files are ready. Once ready, it downloads the MAT (and optionally FLAC) files. This parallel submit+poll loop typically delivers a 5–6× wall-clock speedup over the strictly sequential waitComplete workflow.



## Prerequisites

* A valid ONC API token stored in your `.env` file as `ONC_TOKEN`.
* The repo's Python environment (`pip install -r requirements.txt` or use `mamba/env.yml`).
* This notebook assumes the repository root is on `sys.path`; the cells below will append it automatically.



In [None]:
import os
import sys
import json
import time
from pathlib import Path
from datetime import datetime, timedelta, timezone

# ensure repo root in path
REPO_ROOT = Path(".." ).resolve()
ENV_ROOT = REPO_ROOT.parent
for path in {str(REPO_ROOT), str(ENV_ROOT)}:
    if path not in sys.path:
        sys.path.append(path)

from src.onc.common import load_config, print_status
from src.data.hydrophone_downloader import HydrophoneDownloader




In [2]:
ONC_TOKEN, DATA_DIR = load_config()
dl = HydrophoneDownloader(ONC_TOKEN, DATA_DIR)
print(f"Using data dir: {DATA_DIR}")


Using data dir: ./data


In [None]:
from src.utils.download_helpers import (
    DEFAULT_PARALLEL_CONFIG,
    build_hsd_filters,
    build_sampling_windows,
    run_parallel_for_device,
)



In [None]:
RUN_DOWNLOADS = True  # flip to True to execute ONC downloads
STAGGER_SECONDS = 3.0
MAX_WAIT_MINUTES = 45
POLL_INTERVAL_SECONDS = 30
MAX_DOWNLOAD_WORKERS = 4
MAX_ATTEMPTS = 6

PARALLEL_CONFIG = {**DEFAULT_PARALLEL_CONFIG}
PARALLEL_CONFIG.update({
    'stagger_seconds': STAGGER_SECONDS,
    'max_wait_minutes': MAX_WAIT_MINUTES,
    'poll_interval_seconds': POLL_INTERVAL_SECONDS,
    'max_download_workers': MAX_DOWNLOAD_WORKERS,
    'max_attempts': MAX_ATTEMPTS,
})



## Understanding ONC Hydrophone Spectrogram Products

ONC exposes its data products through the `dataProductDelivery` API. For hydrophones, the most common spectrogram products include:

| Data Product Code | Description | Notes |
| --- | --- | --- |
| `HSD` | High Sampled Data (plotRes spectrograms) | 5-minute windows at 1 Hz time resolution and configurable frequency resolution (e.g., 1024-point FFT). These are often generated on demand unless they reside in the archive. |
| `HAF` | Hydrophone Audio Files | Underlies the FLAC fetch; useful if you need raw audio rather than spectrograms. |
| `HDO` | Hydrophone Downsampled Output | Lower-resolution products suitable for continuous monitoring dashboards. |

Some windows may already exist in the ONC archive (fast downloads). Others require ONC to compute the spectrogram on the fly (longer wait, occasional `500`/`waiting on file system` messages). When you see `"data product running"` plus `"search complete, waiting on the file system to synchronize"`, it means ONC is finishing the write-out; polling until `status=complete` (or retrying after transient 500s) is expected.



### Choosing plotRes parameters

When you submit an HSD request you can control:

* `dpo_hydrophoneDataDiversionMode`: typically `OD` (Ocean Data) for standard processing.
* `dpo_spectralDataDownsample`: `1` (pre-generated one-minute averages), `2` (spectrogram resolution, produces `*_plotRes.mat`), `0` (full resolution, produces `*_fullRes.mat`). Non-default options are generated on demand and can take ~25 s per 5-minute slice, so keep requests small. [Spectral Data Downsampling docs](https://wiki.oceannetworks.ca/spaces/DP/pages/114032814/Spectral+Data+Downsampling)
* `dpo_spectrogramWindowLengthSec` and `dpo_spectrogramOverlap`. The helper in `HydrophoneDownloader` uses the default 1-second hop with 0.5 Hz bins; see ONC's documentation under `Spectrogram Plot Options` for other presets.

We'll expose a small helper below to tweak these when needed.



## Scenario 1: Full-Day Window (Single Request)

If you just want the last 24 hours with maximum resolution, request 288 five-minute spectrograms in one go. This produces a single run per device.



In [None]:
device = 'ICLISTENHF6324'
start_span = datetime(2024, 4, 1, 0, 0, tzinfo=timezone.utc)
end_span = start_span + timedelta(hours=6)
# 72 five-minute spectrograms cover roughly six hours
spectros_per_request_span = 72
request_windows_span = {device: [(start_span, end_span)]}

if RUN_DOWNLOADS:
    info = run_parallel_for_device(
        dl,
        device,
        request_windows_span,
        spectros_per_request_span,
        tag='six_hour_window',
        parallel_config=PARALLEL_CONFIG,
    )
    print(json.dumps(info, indent=2))
else:
    print("Set RUN_DOWNLOADS=True to execute this download")



## Scenario 2: Sampling Across Two Dates

Specify a sampling window and a target number of spectrograms. The helper distributes evenly between the start and end timestamps and builds multiple 5-minute requests.



In [None]:
device = 'ICLISTENHF6324'
sampling_start = datetime(2024, 4, 1, 0, 0, tzinfo=timezone.utc)
sampling_end = datetime(2024, 4, 3, 0, 0, tzinfo=timezone.utc)
total_spectros = 160
per_request = 12
sampling_windows = build_sampling_windows(device, sampling_start, sampling_end, total_spectros, per_request)
actual_requests = len(sampling_windows.get(device, []))

if RUN_DOWNLOADS:
    info = run_parallel_for_device(
        dl,
        device,
        sampling_windows,
        per_request,
        tag='sampled_window',
        parallel_config=PARALLEL_CONFIG,
    )
    print(json.dumps(info, indent=2))
else:
    print(f"Requests per device: {actual_requests}, range=({sampling_start}, {sampling_end})")



Request Id: 28885801
Estimated File Size: 48 MB
Estimated Processing Time: 35 s
To cancel the running data product, run 'onc.cancelDataProduct(28885801)'
Request Id: 28885802
Estimated File Size: 48 MB
Estimated Processing Time: 35 s
To cancel the running data product, run 'onc.cancelDataProduct(28885802)'
Request Id: 28885803
Estimated File Size: 48 MB
Estimated Processing Time: 35 s
To cancel the running data product, run 'onc.cancelDataProduct(28885803)'
Request Id: 28885804
Estimated File Size: 48 MB
Estimated Processing Time: 35 s
To cancel the running data product, run 'onc.cancelDataProduct(28885804)'
Request Id: 28885805
Estimated File Size: 48 MB
Estimated Processing Time: 35 s
To cancel the running data product, run 'onc.cancelDataProduct(28885805)'
Request Id: 28885806
Estimated File Size: 48 MB
Estimated Processing Time: 35 s
To cancel the running data product, run 'onc.cancelDataProduct(28885806)'
Request Id: 28885807
Estimated File Size: 48 MB
Estimated Processing Time: 3




Downloading data product files with runId 56040030...

   Running... processing hydrophone spectral data, 75.0% complete for deployment 1 of 1. (0.0% was pre-generated.)




Downloading data product files with runId 56040031...

   Running... processing hydrophone spectral data, 8.3% complete for deployment 1 of 1. (0.0% was pre-generated.).
   Running... processing hydrophone spectral data, 91.7% complete for deployment 1 of 1. (0.0% was pre-generated.)




Downloading data product files with runId 56040032...

   Running... processing hydrophone spectral data, 16.7% complete for deployment 1 of 1. (0.0% was pre-generated.).




Downloading data product files with runId 56040033...

   Running... processing hydrophone spectral data, 8.3% complete for deployment 1 of 1. (0.0% was pre-generated.)
   Running... processing hydrophone spectral data, 83.3% complete for deployment 1 of 1. (0.0% was pre-generated.).
Downloading data product files with runId 56040034...

Downloading data product files with runId 56040035...

   Running... processing hydrophone spectral data, 83.3% complete for deployment 1 of 1. (0.0% was pre-generated.)
   Running... processing hydrophone spectral data, 8.3% complete for deployment 1 of 1. (0.0% was pre-generated.)
Downloading data product files with runId 56040036...

   Running... processing hydrophone spectral data, 16.7% complete for deployment 1 of 1. (0.0% was pre-generated.)...
   Running... processing hydrophone spectral data, 75.0% complete for deployment 1 of 1. (0.0% was pre-generated.)..
   Running... processing hydrophone spectral data, 91.7% complete for deployment 1 of




Downloading data product files with runId 56040038...

   Running... processing hydrophone spectral data, 16.7% complete for deployment 1 of 1. (0.0% was pre-generated.).
   Running... processing hydrophone spectral data, 91.7% complete for deployment 1 of 1. (0.0% was pre-generated.).




Downloading data product files with runId 56040039...

   Running... processing hydrophone spectral data, 18.2% complete for deployment 1 of 1. (0.0% was pre-generated.).{
  "device": "ICLISTENHF6324",
  "runs_total": 14,
  "runs_downloaded": 14,
  "runs_errors": 0,
  "processed_mat": 166,
  "input_path": "./data/ICLISTENHF6324/sampled_window_2024-04-01_to_2024-04-03/mat/",
  "flac_path": "./data/ICLISTENHF6324/sampled_window_2024-04-01_to_2024-04-03/flac/",
  "wall_seconds": 102.64560627937317
}


## Scenario 3: Explicit Timestamp List

When you already know the exact start times, build the `request_windows` list yourself. Below we request three specific bursts.



In [None]:
explicit_windows = {
    device: [
        (datetime(2024, 4, 1, 4, 25, tzinfo=timezone.utc), datetime(2024, 4, 1, 4, 30, tzinfo=timezone.utc)),
        (datetime(2024, 4, 1, 14, 0, tzinfo=timezone.utc), datetime(2024, 4, 1, 14, 5, tzinfo=timezone.utc)),
        (datetime(2024, 4, 2, 3, 15, tzinfo=timezone.utc), datetime(2024, 4, 2, 3, 20, tzinfo=timezone.utc)),
    ]
}

if RUN_DOWNLOADS:
    info = run_parallel_for_device(
        dl,
        device,
        explicit_windows,
        12,
        tag='explicit_times',
        parallel_config=PARALLEL_CONFIG,
    )
    print(json.dumps(info, indent=2))
else:
    print(f"Prepared {len(explicit_windows[device])} explicit windows")



Request Id: 28885824
Estimated File Size: 48 MB
Estimated Processing Time: 35 s
To cancel the running data product, run 'onc.cancelDataProduct(28885824)'
Request Id: 28885825
Estimated File Size: 48 MB
Estimated Processing Time: 35 s
To cancel the running data product, run 'onc.cancelDataProduct(28885825)'
Request Id: 28885826
Estimated File Size: 48 MB
Estimated Processing Time: 35 s
To cancel the running data product, run 'onc.cancelDataProduct(28885826)'

Downloading data product files with runId 56040043...

Downloading data product files with runId 56040044...

Downloading data product files with runId 56040045...

   Running... working on time segment 1 of 1, for device deployment 1 of 1.
   Running... working on time segment 1 of 1, for device deployment 1 of 1.
   Running... working on time segment 1 of 1, for device deployment 1 of 1.....................
Downloading data product files with runId 56040045...
{
  "device": "ICLISTENHF6324",
  "runs_total": 3,
  "runs_downloaded"

## Scenario 4: JSON Timestamp Requests & Clips

When you have ad-hoc sound annotations in JSON, feed them straight into `HydrophoneDownloader.download_requests_from_json(...)`. The loader accepts either the legacy `{device: [[Y,M,D,H,M,S], ...]}` map or a richer schema with `defaults` and `requests`. Each request can include just a single `timestamp`, or both `start` and `end` timestamps to cover a full event. Set `clip` to `true` to save cropped NPZ/FLAC snippets; leave it `false` to keep only the full five-minute spectrogram windows. The downloader automatically spans adjacent files whenever the requested sound runs past a 5-minute boundary.

**Key JSON fields**
- `timestamp`: single instant that lands inside the target spectrogram.
- `start` / `end`: optional window bounds; supply one or both to describe the full duration.
- `pad_before_seconds` / `pad_after_seconds`: extra context appended to the clip.
- `clip`: `true` crops spectrogram/audio into `clips/` folders; `false` keeps full windows.
- `download_audio`, `spectrogram_format`, `output_tag`, `deviceCode`: per-request overrides for audio, format, or destination.

To run this outside the notebook: `python scripts/download_hydrophone_data.py --mode specific --config my_requests.json --clip-outputs --request-audio-clips --request-spectrogram-format mat`. The CLI will honor the same `defaults` block while still auto-extending windows that cross 5-minute file boundaries.



In [None]:
json_request_payload = {
    "defaults": {
        "deviceCode": device,
        "output_tag": "json_timestamp_demo",
        "pad_seconds": 0,
        "download_audio": True,
        "clip": True,
    },
    "requests": [
        {
            "timestamp": "2024-04-01T04:25:00Z",
            "label": "single ping"
        },
        {
            "start": "2024-04-01T14:00:00Z",
            "end": "2024-04-01T14:03:30Z",
            "pad_before_seconds": 15,
            "pad_after_seconds": 20,
            "label": "ship ramp",
            "download_audio": True
        },
        {
            "start": "2024-04-02T03:15:00Z",
            "end": "2024-04-02T03:19:59Z",
            "clip": False,
            "label": "keep full window"
        }
    ]
}

json_requests_path = Path(DATA_DIR) / "demo_timestamp_requests.json"
json_requests_path.write_text(json.dumps(json_request_payload, indent=2))
print(f"Wrote JSON requests to {json_requests_path}")



In [None]:
if RUN_DOWNLOADS:
    summaries = dl.download_requests_from_json(
        str(json_requests_path),
        default_pad_seconds=15,
        default_tag="json_timestamp_demo",
        clip_outputs=True,
        spectrogram_format='mat',
        download_audio=True,
    )
    print(json.dumps(summaries, indent=2))
else:
    print(json.dumps(json_request_payload, indent=2))



## Scenario 5: Custom Plot Resolution / FFT Settings

If you need to change downsample ratio, FFT window, or overlap, adjust the filters before calling ONC. The snippet below requests 5-minute windows at 0.5x downsample and a 2-second window length.



In [None]:
custom_start = datetime(2024, 4, 1, 6, 0, tzinfo=timezone.utc)
custom_end = custom_start + timedelta(minutes=5)
custom_filters = build_hsd_filters(
    device,
    custom_start,
    custom_end,
    downsample=4,
    window_sec=2.0,
    overlap=0.75,
)

if RUN_DOWNLOADS:
    req = dl.onc.requestDataProduct(custom_filters)
    run_data = dl.onc.runDataProduct(req['dpRequestId'], waitComplete=True)
    rec = {
        'deviceCode': device,
        'dpRequestId': req['dpRequestId'],
        'runIds': run_data.get('runIds'),
        'start': custom_filters['dateFrom'],
        'end': custom_filters['dateTo'],
        'outPath': dl.input_path,
        'status': 'submitted',
        'createdAt': dl._format_iso_utc(datetime.now(timezone.utc)),
        'attempts': 0,
    }
    status, updated = dl.try_download_run(rec, allow_rerun=False, download_flac=False)
    print(status, updated.get('input_path'))
else:
    print(json.dumps(custom_filters, indent=2))



## Downloading FLAC Audio

Once spectrograms are downloaded you can optionally pull the raw audio (`extension='flac'`) for the same windows. The helper exposes `download_flac_files`, so after `try_download_run` completes you can call it with the same start/end timestamps.



In [None]:
flac_start = datetime(2024, 4, 1, 12, 0, tzinfo=timezone.utc)
flac_end = flac_start + timedelta(minutes=5)

if RUN_DOWNLOADS:
    try:
        dl.download_flac_files(device, flac_start.strftime('%Y-%m-%dT%H:%M:%S.%f')[:-3] + 'Z',
                               flac_end.strftime('%Y-%m-%dT%H:%M:%S.%f')[:-3] + 'Z')
        print("FLAC files downloaded to", dl.flac_path)
    except Exception as exc:
        print("FLAC download failed:", exc)
else:
    print("Flip RUN_DOWNLOADS=True to fetch FLAC audio for", flac_start)



In [None]:
# build_hsd_filters lives in src.utils.download_helpers
