## Table of Contents

- [1. Introduction & Setup](#1-introduction--setup)
  - [1.1 Hydrophone Deployments & Inventory](#11-hydrophone-deployments--inventory)
    - [1.1a Hydrophone Inventory (Current + History)](#11a-hydrophone-inventory-current--history)
- [2. Download Workflows (Spectrograms / Audio / Both)](#2-download-workflows-spectrograms--audio--both)
  - [2.1 Basic Spectrogram Download (2 Spectrograms / 10 Minutes)](#21-basic-spectrogram-download-2-spectrograms--10-minutes)
  - [2.2 Range Downloads (Between Two Dates)](#22-range-downloads-between-two-dates)
  - [2.3 Sampling Mode (Uniform Samples Across Range)](#23-sampling-mode-uniform-samples-across-range)
  - [2.4 Event-Based Downloads (Simple)](#24-event-based-downloads-simple)
  - [2.5 Centered Audio Clip (Custom Duration)](#25-centered-audio-clip-custom-duration)
- [3. Custom Spectrogram Generation (Local)](#3-custom-spectrogram-generation-local)
  - [3.1 SpectrogramGenerator Basics](#31-spectrogramgenerator-basics)
  - [3.2 Custom Parameters](#32-custom-parameters)
  - [3.3 Batch Processing Audio Directory](#33-batch-processing-audio-directory)
- [4. Advanced Event-Based Workflows (JSON / CSV / Python Lists)](#4-advanced-event-based-workflows-json--csv--python-lists)
  - [4.1 Direct Timestamps (Python Lists / Datetime Objects)](#41-direct-timestamps-python-lists--datetime-objects)
  - [4.2 Request Files (JSON + CSV)](#42-request-files-json--csv)
  - [4.3 Supported Date/Time Formats](#43-supported-datetime-formats)
- [5. Advanced Workflows](#5-advanced-workflows)
  - [5.1 Request-Driven Custom Spectrograms (JSON/CSV)](#51-request-driven-custom-spectrograms-jsoncsv)
  - [5.2 Batch Pipeline: Download Audio → Local Spectrograms](#52-batch-pipeline-download-audio--local-spectrograms)
  - [5.3 Multi-Device Downloads](#53-multi-device-downloads)
- [6. Output Folder Structure](#6-output-folder-structure)
- [7. Troubleshooting & Tips](#7-troubleshooting--tips)



# 1. Introduction & Setup
This section covers setup, prerequisites, and the main data products you'll use throughout the notebook.

## What This Notebook Covers

This notebook demonstrates how to:
- Download **ONC-generated spectrograms** (MAT/PNG files)
- Download **raw audio files** (FLAC/WAV)
- Create **custom spectrograms** from audio with your own parameters
- Handle various **input formats** (JSON, CSV, Python lists)
- Work with **specific timestamps** or **date ranges**

## ONC Data Products Overview

| Product Code | Description | Use Case |
| --- | --- | --- |
| `HSD` | Hydrophone Spectrogram Data | Spectrogram plots + spectral MAT; 1-min MAT pre-generated, higher-res on request |
| `HAF` | Hydrophone Audio Files | Raw audio (FLAC/WAV) |

## Prerequisites

- `.env` with `ONC_TOKEN=...` in the repo root (optional: `DATA_DIR=/path/to/data`; default is `data/`)
- Package installed: `pip install onc-hydrophone-data`
- All timestamps are converted to UTC for requests; provide tz-aware datetimes or a `timezone` field in JSON/CSV.
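
As a quick illustration of the UTC note above, here is a minimal standard-library sketch of what "tz-aware" means in practice. The fixed UTC-07:00 offset is chosen only for the example; it is not something the package requires.

```python
from datetime import datetime, timedelta, timezone

# A tz-aware local time (fixed UTC-07:00 offset, chosen only for illustration).
local_tz = timezone(timedelta(hours=-7))
local_time = datetime(2024, 4, 1, 5, 30, tzinfo=local_tz)

# astimezone() converts the same instant to UTC.
utc_time = local_time.astimezone(timezone.utc)
print(utc_time.isoformat())  # 2024-04-01T12:30:00+00:00

# A naive datetime carries no offset, so attach one explicitly before converting.
naive = datetime(2024, 4, 1, 5, 30)
aware = naive.replace(tzinfo=local_tz)
print(aware.astimezone(timezone.utc) == utc_time)  # True
```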


In [None]:
# Standard imports
import os
import sys
import json
import numpy as np
from pathlib import Path
from datetime import datetime, timedelta, timezone

# Ensure repo is in path
REPO_ROOT = Path("..").resolve()
if str(REPO_ROOT) not in sys.path:
    sys.path.append(str(REPO_ROOT))

# Core imports
from onc_hydrophone_data.onc.common import load_config, print_status
from onc_hydrophone_data.data.hydrophone_downloader import HydrophoneDownloader

from onc_hydrophone_data.utils.plotting import (
    find_first_file,
    plot_first_spectrogram,
    plot_first_audio,
    plot_onc_mat_spectrogram,
    plot_audio_waveform,
    plot_clip_pair,
    plot_spectrogram_clip,
    plot_request_results,
)


In [None]:
# Load configuration
ONC_TOKEN, DATA_DIR = load_config()
dl = HydrophoneDownloader(ONC_TOKEN, DATA_DIR)
print(f"✅ Data directory: {DATA_DIR}")

---
## 1.1 Hydrophone Deployments & Inventory
Use deployment dates to pick time ranges that actually contain data before making requests.

In this section we:
- Pull a full hydrophone inventory (current + history)
- Select devices and set an example date for the rest of the notebook

### 1.1a Hydrophone Inventory (Current + History)
Collect all hydrophones, their current deployments, and a history view with location metadata.

---


**Hydrophone Inventory**
Pulls deployment metadata for all hydrophones and builds two views: current deployments and full history.


In [None]:
from onc_hydrophone_data.data.deployment_checker import HydrophoneDeploymentChecker

checker = HydrophoneDeploymentChecker(ONC_TOKEN)
inventory = checker.collect_hydrophone_inventory()


**Table 1: Current Deployments (Active Devices)**
One row per active device with `device_id`, location metadata, depth/coords, and mapping labels.


In [None]:
_ = checker.show_hydrophone_inventory_table(inventory, view='current')


**Table 2: Deployment History (All Deployments)**
One row per deployment (includes `device_id`). Increase `max_rows` or set it to `None` to show everything.


In [None]:
_ = checker.show_hydrophone_inventory_table(inventory, view='history', max_rows=20)


**Select target devices**
Choose device codes or device IDs after reviewing the inventory tables above.


In [None]:
# Global settings
# Default device for examples (update to your target device code)
DEVICE = 'ICLISTENHF6324'
# Optional second device for multi-device request examples
# (set to another device code you have access to)
DEVICE_2 = 'ICLISTENHF1332'

**Table 3: Deployments for Selected Devices**
Shows full deployment history for the devices you selected (code or ID).


In [None]:
_ = checker.show_device_deployments(device_codes=[DEVICE, DEVICE_2], inventory=inventory)
# Or filter by numeric device IDs if you have them:
# _ = checker.show_device_deployments(device_ids=[12345, 67890], inventory=inventory)


In [None]:
# Example date used throughout the notebook
# Choose a time within the deployment ranges shown above
EXAMPLE_DATE = datetime(2024, 4, 1, 12, 0, tzinfo=timezone.utc)

---
# 2. Download Workflows (Spectrograms / Audio / Both)
Choose the simplest pattern that matches your goal. Each helper builds the 5-minute request windows and handles batching/parallelism for you.

- Spectrogram downloads pull ONC HSD files (MAT/PNG).
- To also download matching audio, add `download_audio=True` to any spectrogram call.
- Audio-only workflows use the `download_audio_*` helpers and fetch FLAC/WAV files.

ONC provides HSD spectrograms in 5-minute windows. For MAT data, you can choose the spectral resolution with `dpo_spectralDataDownsample` in `HSD_OPTIONS`:
- `1`: one-minute averaged MAT (pre-generated, fast)
- `2`: spectrogram resolution MAT (on demand, file name includes `_plotRes`)
- `0`: full resolution MAT (on demand, file name includes `_fullRes`)

The ONC API also exposes data product options you can pass via `data_product_options={...}`:

| Option | DPO key | Values | Notes |
| --- | --- | --- | --- |
| Spectral downsample | `dpo_spectralDataDownsample` | `1`, `2`, `0` | 1=pre-generated; 2/0 are on-demand MAT |
| Diversion mode | `dpo_hydrophoneDataDiversionMode` | `OD`, `LPF`, `HPF`, `All` | Filter by diversion/filters |
| Acquisition mode | `dpo_hydrophoneAcquisitionMode` | `LF`, `HF`, `All` | Duty-cycle sample rate mode |
| Spectrogram source | `dpo_spectrogramSource` | `MIX`, `WAV`, `FFT` | PNG/PDF plots only |
| Concatenation | `dpo_spectrogramConcatenation` | `None`, `Adjacent`, `Daily`, `Weekly`, `Concatenate` | Default: None; non-default downsample disables concat |
| Colour palette | `dpo_spectrogramColourPalette` | `0`-`5` | PNG/PDF plots only |
| Upper colour limit | `dpo_upperColourLimit` | `-1000` or `0`-`140` | PNG/PDF plots only |
| Lower colour limit | `dpo_lowerColourLimit` | `-1000` or `-160`-`140` | PNG/PDF plots only |
| Upper frequency (preset) | `dpo_spectrogramFrequencyUpperLimit` | `-1`, `1000`, `10000` | PNG/PDF plots only |
| Upper frequency (explicit) | `dpo_spectrogramUpperFrequencyLimit` | `100`-`500000` | PNG/PDF plots only |
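
The allowed values in the table can be checked locally before a request is sent. The sketch below is only an illustration: `ALLOWED_VALUES` and `find_invalid_options` are hypothetical names, not part of `onc_hydrophone_data`, and only a subset of the options is covered.

```python
# Allowed values copied from the options table (subset, for illustration).
ALLOWED_VALUES = {
    "dpo_spectralDataDownsample": {0, 1, 2},
    "dpo_hydrophoneDataDiversionMode": {"OD", "LPF", "HPF", "All"},
    "dpo_hydrophoneAcquisitionMode": {"LF", "HF", "All"},
    "dpo_spectrogramSource": {"MIX", "WAV", "FFT"},
}


def find_invalid_options(options):
    """Return (key, value) pairs whose value is not in the allowed set."""
    return [
        (key, value)
        for key, value in options.items()
        if key in ALLOWED_VALUES and value not in ALLOWED_VALUES[key]
    ]


options = {"dpo_spectralDataDownsample": 3, "dpo_hydrophoneAcquisitionMode": "HF"}
print(find_invalid_options(options))  # [('dpo_spectralDataDownsample', 3)]
```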

FFT window/overlap are fixed on the ONC side; for custom FFT settings, use the custom spectrogram generation section later in the notebook.


In [None]:
# Optional ONC data product options (override defaults).
# Leave empty to keep defaults; uncomment to customize.
HSD_OPTIONS = {
    # "dpo_spectralDataDownsample": 1,  # 1=min avg, 2=plotRes, 0=fullRes
    # "dpo_hydrophoneDataDiversionMode": "OD",  # OD, LPF, HPF, All
    # "dpo_hydrophoneAcquisitionMode": "All",  # LF, HF, All
    # "dpo_spectrogramSource": "MIX",  # PNG/PDF only: MIX, WAV, FFT
    # "dpo_spectrogramConcatenation": "None",  # MAT/PNG/PDF (default)
    # "dpo_spectrogramColourPalette": 0,  # PNG/PDF only: 0-5
    # "dpo_upperColourLimit": -1000,  # PNG/PDF only: -1000 or 0-140
    # "dpo_lowerColourLimit": -1000,  # PNG/PDF only: -1000 or -160-140
    # "dpo_spectrogramFrequencyUpperLimit": -1,  # PNG/PDF only: -1, 1000, 10000
    # "dpo_spectrogramUpperFrequencyLimit": 10000,  # PNG/PDF only: 100-500000
}


## 2.1 Basic Spectrogram Download (2 Spectrograms / 10 Minutes)
Download a short window to validate your setup and directory paths.


In [None]:
# Download 10 minutes of spectrograms (2 x 5-min windows)
start = EXAMPLE_DATE
end = start + timedelta(minutes=10)
spectrograms_per_batch = 2

info = dl.download_spectrograms_for_range(
    DEVICE,
    start,
    end,
    spectrograms_per_batch,
    tag='basic_download',
    # download_audio=True,  # also download matching audio
    # data_product_options=HSD_OPTIONS,
)
print(json.dumps(info, indent=2))

plot_first_spectrogram(dl, title="Basic download spectrogram")


## 2.2 Range Downloads (Between Two Dates)
Download every 5-minute file between two dates. The helper builds the request windows and batches them for you.
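
To make the windowing and batching concrete, here is a minimal sketch of how a range could be split into 5-minute windows and grouped into requests. `build_windows` and `batch` are hypothetical helpers written for this illustration; the downloader's internal logic may differ, and the sketch assumes the start time already sits on a 5-minute boundary.

```python
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(minutes=5)


def build_windows(start, end, window=WINDOW):
    """Return consecutive (start, end) windows covering [start, end)."""
    windows = []
    cursor = start
    while cursor < end:
        # The last window is truncated if the range is not a multiple of 5 minutes.
        windows.append((cursor, min(cursor + window, end)))
        cursor += window
    return windows


def batch(items, size):
    """Split a list into chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


range_start = datetime(2024, 4, 1, 0, 0, tzinfo=timezone.utc)
range_end = range_start + timedelta(minutes=30)
windows = build_windows(range_start, range_end)
batches = batch(windows, 3)
print(f"{len(windows)} windows in {len(batches)} requests")  # 6 windows in 2 requests
```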

### Spectrograms (optional audio)
Set `spectrograms_per_batch` to control request size.


In [None]:
# Download ALL spectrograms between two dates, batched by spectrograms_per_batch
range_start = datetime(2024, 4, 1, 0, 0, tzinfo=timezone.utc)
range_end = range_start + timedelta(minutes=30)  # keep short for tutorial
spectrograms_per_batch = 3  # number of 5-min spectrograms per request

print(f"Date range: {range_start} to {range_end}")

result = dl.download_spectrograms_for_range(
    DEVICE,
    range_start,
    range_end,
    spectrograms_per_batch,
    # download_audio=True,
    # data_product_options=HSD_OPTIONS,
)
print(json.dumps(result, indent=2))

plot_first_spectrogram(dl, title="Date range spectrogram")


### Audio only
Download all 5-minute audio files that overlap the range (FLAC, with WAV fallback).

This is the audio-only equivalent of the spectrogram range download above.


In [None]:
# Download audio for a time range
audio_start = datetime(2024, 4, 1, 12, 0, tzinfo=timezone.utc)
audio_end = audio_start + timedelta(minutes=10)  # 2 files

print(f"Audio range: {audio_start} to {audio_end}")

dl.download_audio_for_range(
    DEVICE,
    audio_start,
    audio_end,
)
print(f"Audio saved to: {dl.audio_path}")
plot_first_audio(dl, max_seconds=10.0)


## 2.3 Sampling Mode (Uniform Samples Across Range)
Sampling selects evenly spaced 5-minute windows across the full date range. This gives a fast, representative overview without downloading everything.

You control:
- start/end date
- total samples (number of 5-minute windows)
- per-request batch size (how many windows per request)
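
One simple way to pick evenly spaced windows is to divide the range into equal segments and take the first 5 minutes of each. The sketch below shows that idea only; `sample_window_starts` is a hypothetical helper, and the downloader's actual selection strategy may differ.

```python
from datetime import datetime, timedelta, timezone


def sample_window_starts(start, end, total_samples):
    """Return the start of each of `total_samples` equal segments in [start, end)."""
    if total_samples < 1:
        raise ValueError("total_samples must be at least 1")
    step = (end - start) / total_samples
    return [start + i * step for i in range(total_samples)]


sampling_start = datetime(2024, 4, 1, 0, 0, tzinfo=timezone.utc)
sampling_end = datetime(2024, 4, 1, 2, 0, tzinfo=timezone.utc)
starts = sample_window_starts(sampling_start, sampling_end, total_samples=4)
print([s.strftime("%H:%M") for s in starts])  # ['00:00', '00:30', '01:00', '01:30']
```

With this approach the starts land on the 5-minute grid only when the segment length is a multiple of 5 minutes; otherwise each start would still need to be mapped to its containing window.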

### Spectrograms (optional audio)


In [None]:
# Sample N spectrograms evenly across a date range
sampling_start = datetime(2024, 4, 1, 0, 0, tzinfo=timezone.utc)
sampling_end = datetime(2024, 4, 1, 2, 0, tzinfo=timezone.utc)  # 2 hours
total_samples = 4
spectrograms_per_request = 2

print(f"Sampling {total_samples} spectrograms from {sampling_start} to {sampling_end}")

info = dl.download_sampled_spectrograms(
    DEVICE,
    sampling_start,
    sampling_end,
    total_samples,
    spectrograms_per_request,
    # download_audio=True,
    # data_product_options=HSD_OPTIONS,
)
print(json.dumps(info, indent=2))

plot_first_spectrogram(dl, title="Sampled spectrogram")


### Audio only
Sample evenly spaced 5-minute audio files across the same range.


In [None]:
# Sample N audio files evenly across the same date range
# Reuse sampling_start/sampling_end/total_samples from above
audio_files_per_request = 2

audio_info = dl.download_sampled_audio(
    DEVICE,
    sampling_start,
    sampling_end,
    total_samples,
    audio_files_per_request,
)

# json.dumps can't handle datetime objects directly, so serialize them first.
serializable_info = {
    **audio_info,
    "start_dt": audio_info["start_dt"].isoformat(),
    "end_dt": audio_info["end_dt"].isoformat(),
    "request_windows": [
        (start.isoformat(), end.isoformat())
        for start, end in audio_info["request_windows"]
    ],
}
print(json.dumps(serializable_info, indent=2))

plot_first_audio(dl, max_seconds=10.0)


## 2.4 Event-Based Downloads (Simple)
Provide event timestamps and let the helper map each one to its containing 5-minute window. This keeps the call minimal.

If you need padding, clipping, JSON/CSV files, or per-event overrides, jump to Section 4 (Advanced Event-Based Workflows).
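
The timestamp-to-window mapping can be sketched with the standard library as below. This assumes windows are aligned to clock multiples of 5 minutes in UTC and that inputs are tz-aware; `containing_window` is a hypothetical helper, not the package's own function.

```python
from datetime import datetime, timedelta, timezone

WINDOW_SECONDS = 300


def containing_window(ts):
    """Return the (start, end) of the 5-minute grid window that contains `ts`."""
    ts_utc = ts.astimezone(timezone.utc)
    seconds_into_hour = ts_utc.minute * 60 + ts_utc.second
    # Distance from the previous 5-minute boundary, including sub-second part.
    offset = timedelta(
        seconds=seconds_into_hour % WINDOW_SECONDS,
        microseconds=ts_utc.microsecond,
    )
    start = ts_utc - offset
    return start, start + timedelta(seconds=WINDOW_SECONDS)


event = datetime(2024, 4, 1, 4, 25, 30, tzinfo=timezone.utc)
start, end = containing_window(event)
print(start.isoformat(), end.isoformat())
# 2024-04-01T04:25:00+00:00 2024-04-01T04:30:00+00:00
```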

### Spectrograms (optional audio)


In [None]:
# Download spectrograms for event timestamps (mapped to 5-minute windows)
event_times = [
    datetime(2024, 4, 1, 4, 25, 30, tzinfo=timezone.utc),
    datetime(2024, 4, 1, 14, 10, 15, tzinfo=timezone.utc),
    datetime(2024, 4, 2, 3, 45, 0, tzinfo=timezone.utc),
]

spectrograms_per_request = 1

info = dl.download_spectrograms_for_events(
    DEVICE,
    event_times,
    spectrograms_per_request,
    tag='event_times',
    # download_audio=True,
    # data_product_options=HSD_OPTIONS,
)
print(json.dumps(info, indent=2))

plot_first_spectrogram(dl, title="Event-based spectrogram")


### Audio only
Download the 5-minute audio files that contain each event timestamp.


In [None]:
# Download audio for the same event timestamps
# Reuse event_times from the spectrogram example above.

dl.download_audio_for_events(DEVICE, event_times)
plot_first_audio(dl, max_seconds=10.0)


## 2.5 Centered Audio Clip (Custom Duration)
Compute which 5-minute files are needed for a shorter clip, then download them in one call.

Use `describe_audio_window` to see which files are needed, and `download_audio_for_center_time` to fetch them.
This example intentionally crosses a 5-minute boundary, so two adjacent audio files are required before clipping to the exact duration.
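
To see why two files are needed, the sketch below computes the clip interval and the 5-minute files that overlap it. It is an illustration only: `files_for_clip` is a hypothetical helper, and it assumes files are aligned to the 5-minute UTC grid.

```python
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(minutes=5)


def floor_to_window(ts):
    """Floor a tz-aware timestamp to the 5-minute UTC grid."""
    ts = ts.astimezone(timezone.utc)
    return ts.replace(minute=ts.minute - ts.minute % 5, second=0, microsecond=0)


def files_for_clip(center, duration_seconds):
    """Return the clip bounds and the start times of the files that overlap them."""
    half = timedelta(seconds=duration_seconds / 2)
    clip_start, clip_end = center - half, center + half
    starts = []
    cursor = floor_to_window(clip_start)
    while cursor < clip_end:
        starts.append(cursor)
        cursor += WINDOW
    return clip_start, clip_end, starts


center = datetime(2024, 4, 1, 12, 34, 50, tzinfo=timezone.utc)
clip_start, clip_end, file_starts = files_for_clip(center, 30)
print(clip_start.time(), clip_end.time())          # 12:34:35 12:35:05
print([s.strftime("%H:%M") for s in file_starts])  # ['12:30', '12:35']
```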


In [None]:
# Download 30 seconds of audio centered on a timestamp near a 5-minute boundary
center_time = datetime(2024, 4, 1, 12, 34, 50, tzinfo=timezone.utc)
duration_seconds = 30

window = dl.describe_audio_window(center_time, duration_seconds)
print(window)  # shows which 5-minute files cover the requested clip

dl.download_audio_for_center_time(DEVICE, center_time, duration_seconds)
print("Files downloaded - use audio utils to stitch and clip")

plot_first_audio(dl, max_seconds=10.0)


---
# 3. Custom Spectrogram Generation (Local)
Generate spectrograms locally when you need full control over parameters and FFT settings.


## 3.1 SpectrogramGenerator Basics
Minimal setup for producing custom spectrograms from audio files.

Defaults work well for most cases; the example below narrows the frequency range for readability. The next section lists every tunable parameter.


In [None]:
from onc_hydrophone_data.audio import SpectrogramGenerator

# Create a generator; freq_lims is narrowed for readability
generator = SpectrogramGenerator(
    win_dur=1.0,           # 1 second FFT window
    overlap=0.5,           # 50% overlap
    window_type='hann',    # Window function
    freq_lims=(10, 1000),  # 10 Hz to 1 kHz
)

print("SpectrogramGenerator created with:")
print(f"  Window: {generator.win_dur}s")
print(f"  Overlap: {generator.overlap}")
print(f"  Window type: {generator.window_type}")
print(f"  Freq range: {generator.freq_lims}")


## 3.2 Custom Parameters
Every SpectrogramGenerator argument is optional; below are the tunable knobs and what they do.
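
For intuition about `win_dur` and `overlap`, the sketch below applies the standard STFT relationships between window duration, overlap, and resolution. The 48 kHz sample rate is an assumed value for illustration, and `stft_sizes` is a hypothetical helper; how `SpectrogramGenerator` derives its FFT sizes internally is not shown here.

```python
def stft_sizes(sample_rate, win_dur, overlap):
    """Translate window duration and fractional overlap into STFT sizes."""
    win_length = int(round(win_dur * sample_rate))            # samples per window
    hop_length = max(1, int(round(win_length * (1 - overlap))))  # samples between frames
    return {
        "win_length": win_length,
        "hop_length": hop_length,
        "freq_resolution_hz": sample_rate / win_length,  # spacing between FFT bins
        "time_step_s": hop_length / sample_rate,         # spacing between columns
    }


# Assumed 48 kHz sample rate (illustrative only).
print(stft_sizes(48000, win_dur=1.0, overlap=0.5))
# {'win_length': 48000, 'hop_length': 24000, 'freq_resolution_hz': 1.0, 'time_step_s': 0.5}
print(stft_sizes(48000, win_dur=0.1, overlap=0.9))
```

A longer window gives finer frequency resolution but coarser time resolution, which is the trade-off behind the `high_res` and `low_freq` presets.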


In [None]:
# Full customization example (all arguments are optional)
custom = SpectrogramGenerator(
    win_dur=0.5,
    overlap=0.75,
    window_type=('kaiser', 14.0),
    nfft=None,
    win_length=None,
    hop_length=None,
    freq_lims=(10, 24000),
    colormap='magma',
    clim=(-80, 0),
    log_freq=True,
    max_duration=120.0,
    clip_start=5.0,
    clip_end=65.0,
    backend='auto',
    scaling='density',
    quiet=False,
    use_logging=True,
)

# Presets for common use cases
high_res = SpectrogramGenerator(
    win_dur=0.1,           # 100ms window (higher time resolution)
    overlap=0.9,           # 90% overlap
    window_type='hann',
    freq_lims=(1, 24000), # Full frequency range
    clim=(-80, 0)      # dB scale limits
)

low_freq = SpectrogramGenerator(
    win_dur=2.0,           # 2s window (better freq resolution)
    overlap=0.5,
    window_type=('kaiser', 8.0),
    freq_lims=(10, 200),  # Focus on low frequencies
)

print("Custom config window type:", custom.window_type)
print("High-res config:", high_res.win_dur, high_res.freq_lims)
print("Low-freq config:", low_freq.win_dur, low_freq.freq_lims)


Example: exact-duration trimming with extra context for the STFT.

By default, `clip_pad_seconds` uses `auto` (half the window length) to reduce edge artifacts. You can override it with an explicit value if needed.
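
The arithmetic behind the `auto` padding can be sketched as follows. `padded_clip_bounds` is a hypothetical helper, and clamping the start at zero is an assumption about how the file edge is handled.

```python
def padded_clip_bounds(clip_start, clip_end, win_dur, clip_pad_seconds="auto"):
    """Return the (start, end) seconds to load so the STFT has context at both edges."""
    # 'auto' means half the window length, as described above.
    pad = win_dur / 2 if clip_pad_seconds == "auto" else float(clip_pad_seconds)
    load_start = max(0.0, clip_start - pad)  # assumed: never read before the file start
    load_end = clip_end + pad
    return load_start, load_end


print(padded_clip_bounds(10.0, 40.0, win_dur=0.5))                       # (9.75, 40.25)
print(padded_clip_bounds(10.0, 40.0, win_dur=0.5, clip_pad_seconds=2))   # (8.0, 42.0)
```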


In [None]:
# Exact-duration trimming with STFT context
from IPython.display import display

target_seconds = 30.0
clip_start = 10.0
clip_end = clip_start + target_seconds
audio_path = find_first_file(dl.audio_path, ["*.flac", "*.wav"])

if audio_path:
    local_gen = SpectrogramGenerator(
        win_dur=0.5,
        overlap=0.5,
        window_type='hann',
        freq_lims=(10, 1000),
        clip_start=clip_start,
        clip_end=clip_end,
        quiet=True,
    )
    # clip_pad_seconds defaults to 'auto' (half-window); set it if you want more context.
    audio_data, sr, clip_meta = local_gen.load_audio(audio_path)
    freqs, times, _, db = local_gen.compute_spectrogram(
        audio_data,
        sr,
        clip_meta=clip_meta,
    )
    fig = local_gen.plot_spectrogram(
        freqs,
        times,
        db,
        title=f"Local spectrogram {target_seconds:.1f}s (trimmed after STFT)",
    )
    display(fig)
else:
    print("No audio file found; run a download cell first.")


## 3.3 Batch Processing Audio Directory
Process an entire directory of audio files in one call.


In [None]:
# Process all audio files in a directory
# Uses the most recent audio download path (run an audio download above).
audio_dir = Path(dl.audio_path)
output_dir = audio_dir.parent / "custom_spectrograms"

if audio_dir.exists():
    results = generator.process_directory(
        input_dir=audio_dir,
        save_dir=output_dir,
        save_plot=True,   # Save PNG plots
        save_mat=True,    # Save MAT files
    )
    print(f"Processed {len(results)} files")
else:
    print(f"No audio files at {audio_dir}")
    print("Run audio downloads first, then process them here (or set audio_dir manually)")


---
# 4. Advanced Event-Based Workflows (JSON / CSV / Python Lists)
Use this section when you need padding, clipping, or per-event overrides for downloads around specific events or annotations. The simple event-based download in Section 2.4 only maps timestamps to 5-minute windows.


## 4.1 Direct Timestamps (Python Lists / Datetime Objects)
ONC spectrograms are stored in 5-minute blocks, so each event time must be mapped to its surrounding 5-minute window (e.g., 04:27 -> 04:25-04:30) before it is requested. The example below passes explicit start/end windows directly, so you control exactly what gets downloaded.


In [None]:
# Explicit 5-minute windows (e.g., derived from event timestamps)
explicit_windows = {
    DEVICE: [
        (datetime(2024, 4, 1, 4, 25, tzinfo=timezone.utc), 
         datetime(2024, 4, 1, 4, 30, tzinfo=timezone.utc)),
        (datetime(2024, 4, 1, 14, 0, tzinfo=timezone.utc), 
         datetime(2024, 4, 1, 14, 5, tzinfo=timezone.utc)),
        (datetime(2024, 4, 2, 3, 15, tzinfo=timezone.utc), 
         datetime(2024, 4, 2, 3, 20, tzinfo=timezone.utc)),
    ]
}

print(f"Downloading {len(explicit_windows[DEVICE])} specific time windows")

info = dl.download_spectrogram_windows(
    DEVICE,
    explicit_windows,
    spectrograms_per_request=1,
    tag='explicit_times',
)
print(json.dumps(info, indent=2))

plot_first_spectrogram(dl, title="Explicit window spectrogram")


Example formats (datetime objects and legacy tuples):
Use these directly in Python, or convert them into the request schema below.


In [None]:
# Python datetime objects
timestamps_datetime = [
    datetime(2024, 4, 1, 12, 30, 0, tzinfo=timezone.utc),
    datetime(2024, 4, 1, 14, 45, 30, tzinfo=timezone.utc),
    datetime(2024, 4, 2, 8, 15, 0, tzinfo=timezone.utc),
]

# Legacy list/tuple format: [year, month, day, hour, minute, second]
timestamps_tuple = [
    [2024, 4, 1, 12, 30, 0],
    [2024, 4, 1, 14, 45, 30],
]

print("Datetime list:", timestamps_datetime[:2])
print("Tuple list:", timestamps_tuple)

## 4.2 Request Files (JSON + CSV)
Use request files when you have many events or want per-request overrides in one batch.
Examples below run the request file as-is; pass keyword args to override JSON defaults if needed.


**Request schema (JSON/CSV)**

JSON uses a `{defaults, requests}` payload. CSV is a flat table with the same fields (one row per request). Required per request: `deviceCode` and either `timestamp` *or* a `start`/`end` window.
You can set `timezone` in `defaults` or per-request; it applies to naive timestamps.
You can mix multiple devices in one file by setting `deviceCode` per request (JSON) or per row (CSV).

JSON format:
```
{
  "defaults": { ... },
  "requests": [ { ... }, { ... } ]
}
```
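
A minimal sketch of how such a payload could be resolved, assuming per-request values take precedence over `defaults` and applying the required-field rule stated above. `resolve_requests` is a hypothetical helper, not the downloader's own parser.

```python
payload = {
    "defaults": {"deviceCode": "ICLISTENHF6324", "pad_seconds": 30},
    "requests": [
        {"timestamp": "2024-04-01T12:30:00Z"},
        {"start": "2024-04-01T14:00:00Z", "end": "2024-04-01T14:05:00Z", "pad_seconds": 0},
    ],
}


def resolve_requests(payload):
    """Merge defaults into each request and check the required fields."""
    defaults = payload.get("defaults", {})
    resolved = []
    for index, request in enumerate(payload.get("requests", [])):
        merged = {**defaults, **request}  # per-request keys win over defaults
        if "deviceCode" not in merged:
            raise ValueError(f"request {index}: missing deviceCode")
        if "timestamp" not in merged and "start" not in merged:
            raise ValueError(f"request {index}: needs timestamp or start")
        resolved.append(merged)
    return resolved


for request in resolve_requests(payload):
    print(request)
```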

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| `deviceCode` | string | yes | Hydrophone device code (e.g., `ICLISTENHF6324`) |
| `timestamp` | string | if no `start`/`end` | ISO 8601 timestamp (UTC or offset, e.g., `2024-04-01T12:30:00Z`) |
| `timezone` | string | no | Timezone for naive timestamps (e.g., `America/Vancouver`, `UTC`, `-07:00`) |
| `start` | string | if no `timestamp` | ISO 8601 timestamp (UTC or offset) |
| `end` | string | no | ISO 8601 timestamp (UTC or offset) |
| `duration_seconds` | number | no | Used when `start` is set but `end` is omitted |
| `pad_seconds` | number | no | Symmetric padding around `timestamp` or `start`/`end` |
| `pad_before_seconds` | number | no | Override padding before |
| `pad_after_seconds` | number | no | Override padding after |
| `download_audio` | bool | no | Download audio files (default: false) |
| `download_spectrogram` | bool | no | Download ONC spectrograms (default: true) |
| `spectrogram_format` | string | no | `mat` or `png` |
| `clip` | bool | no | Clip outputs to the padded window |
| `audio_extension` | string | no | `flac` or `wav` |
| `output_tag` | string | no | Output folder tag |
| `output_name` | string | no | Override clip basename |
| `label` / `description` | string | no | Metadata label |
| `data_product_options` | object | no | ONC `dpo_*` options (same as `HSD_OPTIONS`) |

Tip: In CSV, use a `deviceCode` column to match the JSON field name. For `data_product_options`, store a JSON string per row and parse it in Python.
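
The tip above can be sketched with the standard library alone. This only illustrates reading rows and parsing the embedded JSON string; it is not how `download_requests_from_csv` is implemented.

```python
import csv
import io
import json

# Inside a quoted CSV field, double quotes are escaped by doubling them.
csv_text = '''deviceCode,timestamp,label,data_product_options
ICLISTENHF6324,2024-04-01T12:30:00Z,whale call,"{""dpo_spectralDataDownsample"": 2}"
ICLISTENHF6324,2024-04-02T08:15:00Z,unknown,
'''

rows = []
for row in csv.DictReader(io.StringIO(csv_text)):
    raw = (row.get("data_product_options") or "").strip()
    # An empty cell means "no overrides" for that row.
    row["data_product_options"] = json.loads(raw) if raw else {}
    rows.append(row)

for row in rows:
    print(row["label"], row["data_product_options"])
```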

**Padding + clipping behavior**
Each request expands to the 5-minute coverage windows needed for downloads. If padding crosses a window boundary, the downloader fetches adjacent files.

- Audio clips are trimmed to the exact padded interval.
- ONC spectrogram clips are trimmed to the nearest time-bin boundaries on the fixed 5-minute grid, so the spectrogram duration can differ slightly from the audio (up to ~one bin).

The bin width is `300s / num_time_bins` for each 5-minute file and is stored as `seconds_per_column` in the clip metadata.
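
The effect of snapping to time bins can be worked through numerically. The sketch below assumes a hypothetical file with 128 time bins and rounds each clip edge to the nearest column boundary; `snap_clip_to_bins` is an illustrative helper, and the real bin count depends on the downloaded file.

```python
import math

FILE_SECONDS = 300.0  # each ONC file covers a 5-minute window


def snap_clip_to_bins(offset_start, offset_end, num_time_bins):
    """Snap clip offsets (seconds from file start) to the nearest column boundaries."""
    seconds_per_column = FILE_SECONDS / num_time_bins
    first_col = max(0, math.floor(offset_start / seconds_per_column + 0.5))
    last_col = min(num_time_bins, math.floor(offset_end / seconds_per_column + 0.5))
    return {
        "seconds_per_column": seconds_per_column,
        "first_col": first_col,
        "last_col": last_col,
        "clip_seconds": (last_col - first_col) * seconds_per_column,
    }


# A 30 s audio interval from 115 s to 145 s into the file, with 128 bins (assumed).
snapped = snap_clip_to_bins(115.0, 145.0, num_time_bins=128)
print(snapped)
```

Here the bin width is 2.34375 s, so the spectrogram clip spans columns 49 to 62 and lasts 30.46875 s, slightly longer than the 30 s audio clip.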


### 4.2a JSON Example + Execution
Write a request file, then execute it to download audio and/or ONC spectrograms.


In [None]:
# New JSON format with defaults
json_new = {
    "defaults": {
        "deviceCode": "ICLISTENHF6324",
        "pad_seconds": 30,
        "data_product_options": HSD_OPTIONS
    },
    "requests": [
        {"timestamp": "2024-04-01T12:30:00Z"},
        {"start": "2024-04-01T14:00:00Z", "end": "2024-04-01T14:05:00Z"}
    ]
}

# Legacy JSON format
json_legacy = {
    "ICLISTENHF6324": [
        [2024, 4, 1, 12, 30, 0],
        [2024, 4, 1, 14, 45, 30]
    ]
}

print("New format:", json.dumps(json_new, indent=2)[:200] + "...")

In [None]:
# Request file with shared defaults and per-request overrides
# Example includes multiple devices (set DEVICE_2 to a different device to test)
json_requests = {
  "defaults": {
    "pad_seconds": 15,
    "download_audio": True,
    "clip": True,
    "data_product_options": HSD_OPTIONS
  },
  "requests": [
    {
      "deviceCode": DEVICE,
      "timestamp": "2024-04-01T12:34:50Z",
      "label": "whale call 1"
    },
    {
      "deviceCode": DEVICE_2,
      "start": "2024-04-01T12:30:00Z",
      "end": "2024-04-01T12:33:30Z",
      "pad_before_seconds": 10,
      "pad_after_seconds": 20,
      "label": "ship noise event"
    }
  ]
}
json_path = Path(DATA_DIR) / "example_requests.json"
json_path.write_text(json.dumps(json_requests, indent=2))
print(f"Saved requests to: {json_path}")
print(json.dumps(json_requests, indent=2))


In [None]:
json_path = Path(DATA_DIR) / "example_requests.json"
# Execute JSON requests (uses settings from the JSON file)
results = dl.download_requests_from_json(
    str(json_path),
)
print(json.dumps(results, indent=2))

plot_request_results(results, downloader=dl)


### 4.2b CSV Example + Execution
CSV can be executed directly with `download_requests_from_csv` (no pandas required).


In [None]:
# Example CSV content (data_product_options is JSON per row; use double quotes)
# Example includes multiple devices (set DEVICE_2 to a different device to test)
csv_content = f"""deviceCode,timestamp,label,data_product_options
{DEVICE},2024-04-01T12:30:00Z,whale call,"{{""dpo_spectralDataDownsample"": 2}}"
{DEVICE_2},2024-04-15T14:45:30Z,ship noise,"{{""dpo_spectralDataDownsample"": 1}}"
{DEVICE},2024-04-02T08:15:00Z,unknown,""
"""

csv_path = Path(DATA_DIR) / "example_requests.csv"
csv_path.write_text(csv_content)
print(f"Saved CSV to: {csv_path}")

# Execute CSV requests (uses settings from the CSV file)
results = dl.download_requests_from_csv(
    str(csv_path),
)
print(json.dumps(results, indent=2))



## 4.3 Supported Date/Time Formats
These formats are parsed consistently by the downloader utilities and are accepted anywhere a timestamp is expected.
All inputs are converted to UTC. If you use naive datetimes or strings, set a `timezone` in request files or pass a tz-aware datetime.
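
For readers who want to see what such normalization involves, here is a standalone sketch using only the standard library. `to_utc` is a hypothetical function and is not the downloader's parser; the fixed-offset `default_tz` stands in for a named timezone to keep the example dependency-free.

```python
from datetime import datetime, timedelta, timezone


def to_utc(value, default_tz=timezone.utc):
    """Normalize a datetime, ISO 8601 string, or [Y, M, D, h, m, s] sequence to UTC."""
    if isinstance(value, datetime):
        parsed = value
    elif isinstance(value, (list, tuple)):
        parsed = datetime(*value)
    elif isinstance(value, str):
        text = value.strip()
        if text.endswith("Z"):
            # Older Python versions do not accept a trailing 'Z' in fromisoformat().
            text = text[:-1] + "+00:00"
        parsed = datetime.fromisoformat(text)
    else:
        raise TypeError(f"unsupported timestamp type: {type(value).__name__}")
    if parsed.tzinfo is None:
        # Naive values are interpreted in default_tz.
        parsed = parsed.replace(tzinfo=default_tz)
    return parsed.astimezone(timezone.utc)


print(to_utc("2024-04-01T12:30:00Z"))
print(to_utc("2024-04-01T12:30:00-07:00"))
print(to_utc([2024, 4, 1, 12, 30, 0]))
print(to_utc("2024-04-01 12:30:00", default_tz=timezone(timedelta(hours=-7))))
```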


In [None]:
# All supported input formats
from onc_hydrophone_data.data.hydrophone_downloader import HydrophoneDownloader

formats = [
    datetime(2024, 4, 1, 12, 30, tzinfo=timezone.utc),  # datetime object
    "2024-04-01T12:30:00Z",                             # ISO 8601
    "2024-04-01T12:30:00.000Z",                         # ISO with ms
    "2024-04-01T12:30:00-07:00",                    # ISO with offset
    [2024, 4, 1, 12, 30, 0],                             # list
    (2024, 4, 1, 12, 30, 0),                             # tuple
]

print("All these formats are parsed correctly:")
for f in formats:
    parsed = HydrophoneDownloader._parse_timestamp_value(f)
    print(f"  {type(f).__name__:10} → {parsed}")

print("Timezone override for naive strings:")
local_parsed = HydrophoneDownloader._parse_timestamp_value(
    "2024-04-01 12:30:00",
    timezone_str="America/Vancouver",
)
print(f"  America/Vancouver → {local_parsed}")


# 5. Advanced Workflows
End-to-end examples that combine download, processing, and multi-device usage.


## 5.1 Request-Driven Custom Spectrograms (JSON/CSV)
Request files can drive custom spectrograms by downloading audio clips first and then running `SpectrogramGenerator` locally.
This helper automatically grabs extra audio context (`clip_pad_seconds`, default `auto`) so the STFT has padding and edge artifacts are reduced.
Use `generator_defaults` (applies to all requests) and optional per-request `generator_options` for settings like `freq_lims`.
When `freq_lims` are provided, the saved outputs are cropped to that range (set `"crop_freq_lims": false` to keep full-band saves).
Control outputs with `save_png`, `save_mat`, and `save_npy` (PNG defaults to off; MAT defaults to on).
Saved MAT/NPY files include metadata describing the generator settings, FFT params, and clip context.
The saved audio clip matches the requested window length; extra context is used internally for spectrogram generation.
Use `save_context_audio=True` if you want to keep the longer context clip alongside the final trimmed clip.


In [None]:
custom_json_path = Path(DATA_DIR) / "custom_spectrogram_requests.json"

custom_requests = {
    "defaults": {
        "deviceCode": DEVICE,
        "pad_seconds": 15,
        "label": "custom_clip",
    },
    "generator_defaults": {
        "win_dur": 0.5,
        "overlap": 0.5,
        "window_type": "hann",
        "quiet": True,
    },
    "requests": [
        {
            "timestamp": "2024-04-01T12:30:00Z",
            "label": "early_april",
            "generator_options": {"freq_lims": [10, 500]},
        },
        {
            "timestamp": "2024-04-15T12:30:00Z",
            "label": "mid_april",
            "generator_options": {"freq_lims": [10, 2000]},
        },
        {
            "timestamp": "2024-04-30T12:30:00Z",
            "label": "late_april",
            "generator_options": {"freq_lims": [10, 10000]},
        },
    ],
}

custom_json_path.write_text(json.dumps(custom_requests, indent=2))

custom_results = dl.create_custom_spectrograms_from_json(
    str(custom_json_path),
    save_png=False,
    save_mat=True,
    save_npy=False,
)
print(json.dumps(custom_results, indent=2))

for result in custom_results:
    custom_spec = result.get("custom_spectrogram") or {}
    mat_file = custom_spec.get("mat_file")
    if not mat_file:
        continue
    timestamp = result.get("timestamp", "")
    plot_onc_mat_spectrogram(
        mat_file,
        title=f"Custom spectrogram ({timestamp})",
        freq_lims=custom_spec.get("freq_lims"),
        log_freq=custom_spec.get("log_freq", True),
    )


In [None]:
# Inspect the first saved spectrogram file
import scipy.io

first_mat = None
first_npy = None
for result in custom_results:
    custom_spec = result.get("custom_spectrogram") or {}
    first_mat = first_mat or custom_spec.get("mat_file")
    first_npy = first_npy or custom_spec.get("npy_file")
    if first_mat or first_npy:
        break

if first_mat:
    mat = scipy.io.loadmat(first_mat)
    keys = sorted(k for k in mat.keys() if not k.startswith("__"))
    print(f"MAT keys: {keys}")
    for key in keys:
        value = mat[key]
        if hasattr(value, "shape"):
            print(f"  {key}: shape={value.shape}, dtype={getattr(value, 'dtype', None)}")
        else:
            print(f"  {key}: type={type(value).__name__}")
    meta_json = mat.get("metadata_json")
    if meta_json is not None:
        try:
            meta_text = meta_json.item()
        except Exception:
            meta_text = str(meta_json)
        try:
            meta = json.loads(meta_text)
            print(f"metadata_json keys: {sorted(meta.keys())}")
        except json.JSONDecodeError:
            print("metadata_json: <unreadable>")
elif first_npy:
    data = np.load(first_npy, allow_pickle=True).item()
    print(f"NPY keys: {sorted(data.keys())}")
    metadata = data.get("metadata")
    if isinstance(metadata, dict):
        print(f"metadata keys: {sorted(metadata.keys())}")
else:
    print("No saved spectrogram files found to inspect.")


## 5.2 Batch Pipeline: Download Audio → Local Spectrograms
Download audio, then generate custom spectrograms for analysis.


In [None]:
# Complete workflow: Download audio + generate custom spectrograms
pipeline_start = datetime(2024, 4, 1, 12, 0, tzinfo=timezone.utc)
pipeline_end = pipeline_start + timedelta(minutes=10)

# Step 1: Download audio
dl.download_audio_for_range(DEVICE, pipeline_start, pipeline_end, tag="pipeline_demo")
print(f"1. Audio downloaded to: {dl.audio_path}")

plot_first_audio(dl, max_seconds=10.0)

# Step 2: Generate custom spectrograms
custom_out = Path(dl.audio_path).parent / "custom_spectrograms"
results = generator.process_directory(
    input_dir=dl.audio_path,
    save_dir=custom_out,
)
print(f"2. Custom spectrograms saved to: {custom_out}")


## 5.3 Multi-Device Downloads
Repeat the same download pattern across multiple hydrophones.


In [None]:
# Download from multiple hydrophones
# Keep the range short for a quick multi-device check.
devices = ['ICLISTENHF6324', 'ICLISTENHF6020', 'ICLISTENHF6019']
multi_start = datetime(2024, 4, 1, 12, 0, tzinfo=timezone.utc)
multi_end = multi_start + timedelta(minutes=15)

for device in devices:
    windows = {device: [(multi_start, multi_end)]}
    print(f"Device {device}: {multi_start} to {multi_end}")

    info = dl.download_spectrogram_windows(
        device,
        windows,
        spectrograms_per_request=3,
        tag='multi_device',
    )
    print(f"  Downloaded: {info.get('runs_downloaded', 0)} runs")

plot_first_spectrogram(dl, title=f"Multi-device spectrogram ({devices[-1]})")


# 6. Output Folder Structure
Overview of where downloads and generated files are stored.


In [None]:
# View current paths for a known tag/date range
example_tag = 'basic_download'
example_start = EXAMPLE_DATE
example_end = example_start + timedelta(minutes=10)

dl.setup_directories('mat', DEVICE, example_tag, example_start, example_end)
print("Output paths:")
print(f"  Spectrograms: {dl.spectrogram_path}")
print(f"  Audio:        {dl.audio_path}")


# 7. Troubleshooting & Tips
Common issues and ways to speed up or stabilize downloads.

## Common Issues

| Issue | Solution |
| --- | --- |
| "Missing ONC_TOKEN" | Add `ONC_TOKEN=...` to `.env` in the repo root |
| "Device not deployed" | Run Section 1.1 (HydrophoneDeploymentChecker) and choose dates within deployment |
| "Waiting on file system" | Normal - ONC is generating data; wait and retry |
| Timeout errors | Reduce request size or increase wait time (`max_wait_minutes`) |
| Rate limiting | Reduce request size or add delays between runs |
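
The "add delays between runs" advice can be sketched as a generic retry wrapper with exponential backoff. `retry_with_backoff` is a hypothetical helper and not part of the package; in real use, catch the specific exception your download call raises rather than a bare `Exception`.

```python
import time


def retry_with_backoff(func, max_attempts=4, base_delay=2.0, sleep=time.sleep):
    """Call func(); on failure wait base_delay, then 2x, 4x, ... before retrying."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception as exc:  # narrow this to the relevant error type in practice
            if attempt == max_attempts:
                raise
            delay = base_delay * 2 ** (attempt - 1)
            print(f"Attempt {attempt} failed ({exc}); retrying in {delay:.0f}s")
            sleep(delay)


# Demo with a stand-in function that fails twice, then succeeds.
calls = {"count": 0}


def flaky_download():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("rate limited")
    return "ok"


delays = []  # record the delays instead of actually sleeping
result = retry_with_backoff(flaky_download, base_delay=1.0, sleep=delays.append)
print(result, delays)  # ok [1.0, 2.0]
```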

## Performance Tips

1. **Batch requests**: Group spectrograms into 6-12 per request
2. **Avoid full resolution**: `dpo_spectralDataDownsample=0` (full-resolution MAT) is generated on demand and is much slower than the pre-generated `1`
3. **Check archive first**: Archived data downloads instantly
4. **Prefer shorter ranges**: Split large ranges into chunks, which are easier to handle
