# 1. Introduction & Setup

## What This Notebook Covers

This notebook demonstrates how to:
- Download **ONC-generated spectrograms** (MAT/PNG files)
- Download **raw audio files** (FLAC format)
- Create **custom spectrograms** from audio with your own parameters
- Handle various **input formats** (JSON, CSV, Python lists)
- Work with **specific timestamps** or **date ranges**

## ONC Data Products Overview

| Product Code | Description | Use Case |
| --- | --- | --- |
| `HSD` | Hydrophone Spectrogram Data | Pre-computed spectrograms |
| `HAF` | Hydrophone Audio Files | Raw FLAC audio |

## Prerequisites

- ONC API token in `.env` file as `ONC_TOKEN`
- Package installed: `pip install onc-hydrophone-data`
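
For reference, a minimal sketch of reading the token from a `.env` file with the standard library only. `read_env_value` is a hypothetical helper, not part of `onc-hydrophone-data`; the notebook itself relies on the package's `load_config`, and the simple `KEY=VALUE` file layout is an assumption.

```python
from pathlib import Path
from typing import Optional


def read_env_value(env_path: Path, key: str) -> Optional[str]:
    """Return the value for `key` from a KEY=VALUE style .env file, or None."""
    if not env_path.exists():
        return None
    for raw_line in env_path.read_text().splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key:
            # Drop optional surrounding quotes around the value
            return value.strip().strip("'\"")
    return None


# Hypothetical usage: look for ONC_TOKEN in a .env file in the current directory
token = read_env_value(Path(".env"), "ONC_TOKEN")
print("Token found" if token else "No ONC_TOKEN in .env")
```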

In [1]:
# Standard imports
import os
import sys
import json
from pathlib import Path
from datetime import datetime, timedelta, timezone

# Ensure repo is in path
REPO_ROOT = Path("..").resolve()
if str(REPO_ROOT) not in sys.path:
    sys.path.append(str(REPO_ROOT))

# Core imports
from onc_hydrophone_data.onc.common import load_config, print_status
from onc_hydrophone_data.data.hydrophone_downloader import HydrophoneDownloader
from onc_hydrophone_data.utils.download_helpers import (
    build_hsd_filters,
    build_sampling_windows,
    run_parallel_for_device,
    DEFAULT_PARALLEL_CONFIG,
)

In [2]:
# Load configuration
ONC_TOKEN, DATA_DIR = load_config()
dl = HydrophoneDownloader(ONC_TOKEN, DATA_DIR)
print(f"✅ Data directory: {DATA_DIR}")

✅ Data directory: /home/sbialek/ONC/onc-hydrophone-data/data


In [3]:
# Global settings - set to True to execute actual downloads
RUN_DOWNLOADS = True  # ⚠️ TESTING MODE - downloads enabled

# Default device for examples
DEVICE = 'ICLISTENHF6324'

# Parallel download configuration
PARALLEL_CONFIG = {
    **DEFAULT_PARALLEL_CONFIG,
    'stagger_seconds': 3.0,
    'max_wait_minutes': 45,
    'poll_interval_seconds': 30,
    'max_download_workers': 4,
    'max_attempts': 6,
}

---
# 1.5 Check Hydrophone Deployment Dates

Before downloading data, let's verify the deployment dates for our target hydrophone.
This ensures we request data from valid time periods when the device was active.

In [4]:
# Check deployment dates for our target hydrophone
from onc_hydrophone_data.data.deployment_checker import HydrophoneDeploymentChecker

checker = HydrophoneDeploymentChecker(ONC_TOKEN)
date_ranges = checker.get_deployment_date_ranges([DEVICE])

print(f"📡 Deployment dates for {DEVICE}:")
for device, ranges in date_ranges.items():
    for start, end in ranges:
        end_str = end.strftime('%Y-%m-%d') if end else 'ongoing'
        print(f"   {start.strftime('%Y-%m-%d')} to {end_str}")

# Pick a valid date within deployment range for our examples
# We'll use April 1, 2024 which is within the deployment period
EXAMPLE_DATE = datetime(2024, 4, 1, 12, 0, tzinfo=timezone.utc)
print(f"\n✅ Using example date: {EXAMPLE_DATE}")

Fetching deployments: 95/95
📡 Deployment dates for ICLISTENHF6324:
   2023-09-08 to ongoing

✅ Using example date: 2024-04-01 12:00:00+00:00
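
The check that a timestamp lies inside a deployment can be sketched as below. This assumes the ranges are `(start, end)` pairs with `end=None` for an ongoing deployment, as the loop above suggests; `is_deployed` is a hypothetical helper, and treating the dates as timezone-aware UTC values is also an assumption.

```python
from datetime import datetime, timezone
from typing import List, Optional, Tuple

DateRange = Tuple[datetime, Optional[datetime]]


def is_deployed(ts: datetime, ranges: List[DateRange]) -> bool:
    """True if `ts` falls inside any (start, end) range; end=None means ongoing."""
    for start, end in ranges:
        if ts >= start and (end is None or ts <= end):
            return True
    return False


# Range copied from the output above (2023-09-08 to ongoing)
ranges = [(datetime(2023, 9, 8, tzinfo=timezone.utc), None)]
print(is_deployed(datetime(2024, 4, 1, 12, 0, tzinfo=timezone.utc), ranges))  # True
print(is_deployed(datetime(2023, 1, 1, tzinfo=timezone.utc), ranges))         # False
```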


---
# 2. Downloading ONC Spectrograms

ONC provides pre-computed spectrograms in MAT format. Each file covers a 5-minute window.

## 2.1 Basic Download (2 Spectrograms / 10 Minutes)

In [5]:
# Download 10 minutes of spectrograms (2 x 5-minute windows)
start = datetime(2024, 4, 1, 12, 0, tzinfo=timezone.utc)
end = start + timedelta(minutes=10)  # Reduced for testing

# Build request windows
windows = {DEVICE: [(start, end)]}
spectros_per_request = 2  # 2 x 5min = 10 min (testing)

if RUN_DOWNLOADS:
    info = run_parallel_for_device(
        dl, DEVICE, windows, spectros_per_request,
        tag='basic_download',
        parallel_config=PARALLEL_CONFIG,
    )
    print(json.dumps(info, indent=2))
else:
    print(f"Would download {spectros_per_request} spectrograms from {start} to {end}")

Submitting 1 requests for ICLISTENHF6324...
Submitting request 1/1...
Request Id: 30968497
Estimated File Size: 48 MB
Estimated Processing Time: 35 s
To cancel the running data product, run 'onc.cancelDataProduct(30968497)'

Downloading data product files with runId 58513086...

   Running.
   Running... working on time segment 1 of 1, for device deployment 1 of 1........
Downloading data product files with runId 58513086...

   Search complete, waiting on the file system to synchronize (ICLISTENHF6324_20240401T120000.000Z_20240401T120500.000Z-OD-spect_plotRes.mat)...
{
  "device": "ICLISTENHF6324",
  "runs_total": 1,
  "runs_downloaded": 1,
  "runs_errors": 0,
  "spectrogram_files": 2,
  "input_path": "/home/sbialek/ONC/onc-hydrophone-data/data/ICLISTENHF6324/basic_download_2024-04-01_to_2024-04-01/onc_spectrograms/",
  "flac_path": "/home/sbialek/ONC/onc-hydrophone-data/data/ICLISTENHF6324/basic_download_2024-04-01_to_2024-04-01/audio/",
  "wall_seconds": 62.461962938308716
}


## 2.2 Download All Data Between Two Dates

In [6]:
# Download ALL spectrograms between two dates (continuous)
range_start = datetime(2024, 4, 1, 0, 0, tzinfo=timezone.utc)
range_end = datetime(2024, 4, 1, 0, 15, tzinfo=timezone.utc)  # 15 min TESTING

# Calculate total 5-minute windows
total_minutes = int((range_end - range_start).total_seconds() / 60)
total_spectrograms = total_minutes // 5

print(f"Date range: {range_start} to {range_end}")
print(f"Total 5-min spectrograms: {total_spectrograms}")

# SKIPPED in test mode - uncomment to run
# if RUN_DOWNLOADS:
#     dl.setup_directories('mat', DEVICE, 'date_range', range_start, range_end)
#     # Use run_parallel_windows for efficiency
#     windows_list = dl._build_request_windows(range_start, range_end)
#     result = dl.run_parallel_windows(
#         DEVICE, windows_list,
#         spectrograms_per_request=3,
#         download_flac=False,
#     )
#     print(json.dumps(result, indent=2))
print("[SKIPPED in test mode]")

Date range: 2024-04-01 00:00:00+00:00 to 2024-04-01 00:15:00+00:00
Total 5-min spectrograms: 3
[SKIPPED in test mode]
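
The window count above can also be expanded into the individual 5-minute windows. A minimal sketch, assuming windows start at `range_start` and only complete windows are wanted; `five_minute_windows` is a hypothetical helper and is not the library's `_build_request_windows`.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterator, Tuple


def five_minute_windows(start: datetime, end: datetime) -> Iterator[Tuple[datetime, datetime]]:
    """Yield consecutive (window_start, window_end) pairs that fit fully inside [start, end]."""
    step = timedelta(minutes=5)
    cursor = start
    while cursor + step <= end:
        yield cursor, cursor + step
        cursor += step


range_start = datetime(2024, 4, 1, 0, 0, tzinfo=timezone.utc)
range_end = datetime(2024, 4, 1, 0, 15, tzinfo=timezone.utc)
for w_start, w_end in five_minute_windows(range_start, range_end):
    print(w_start.strftime('%H:%M'), "to", w_end.strftime('%H:%M'))
# 00:00 to 00:05
# 00:05 to 00:10
# 00:10 to 00:15
```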


## 2.3 Sampling Mode (4 Sampled Spectrograms)
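
Sampling mode picks a fixed number of 5-minute windows spread across a longer range. One way to space them is sketched here, assuming the first window starts at the range start, the last one ends at the range end, and starts are floored to 5-minute boundaries. `evenly_spaced_starts` is a hypothetical helper and may not match what `build_sampling_windows` does internally.

```python
from datetime import datetime, timedelta, timezone
from typing import List


def evenly_spaced_starts(start: datetime, end: datetime, n_samples: int) -> List[datetime]:
    """Return n_samples window start times spread across [start, end - 5 min]."""
    window = timedelta(minutes=5)
    last_start = end - window
    if n_samples < 1 or last_start < start:
        raise ValueError("need at least one sample and a range of 5 minutes or more")
    if n_samples == 1:
        return [start]
    step = (last_start - start) / (n_samples - 1)
    starts = []
    for i in range(n_samples):
        ts = start + i * step
        # Floor to the 5-minute boundary, as done elsewhere in this notebook
        starts.append(ts.replace(minute=(ts.minute // 5) * 5, second=0, microsecond=0))
    return starts


demo_start = datetime(2024, 4, 1, 0, 0, tzinfo=timezone.utc)
demo_end = datetime(2024, 4, 1, 2, 0, tzinfo=timezone.utc)
print([ts.strftime('%H:%M') for ts in evenly_spaced_starts(demo_start, demo_end, 4)])
# ['00:00', '00:35', '01:15', '01:55']
```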

In [7]:
# Get N spectrograms spread evenly across a date range
sampling_start = datetime(2024, 4, 1, 0, 0, tzinfo=timezone.utc)
sampling_end = datetime(2024, 4, 1, 2, 0, tzinfo=timezone.utc)  # 2 hours TESTING
total_samples = 4  # TESTING - reduced from 100
per_request = 2  # 2 per request (TESTING)

# Build sampling windows
sampling_windows = build_sampling_windows(
    DEVICE, sampling_start, sampling_end, total_samples, per_request
)

print(f"Sampling {total_samples} spectrograms from {sampling_start} to {sampling_end}")
print(f"Requires {len(sampling_windows.get(DEVICE, []))} API requests")

if RUN_DOWNLOADS:
    info = run_parallel_for_device(
        dl, DEVICE, sampling_windows, per_request,
        tag='sampled',
        parallel_config=PARALLEL_CONFIG,
    )
    print(json.dumps(info, indent=2))

Sampling 4 spectrograms from 2024-04-01 00:00:00+00:00 to 2024-04-01 02:00:00+00:00
Requires 2 API requests
Submitting 2 requests for ICLISTENHF6324...
Submitting request 1/2...
Request Id: 30968502
Estimated File Size: 48 MB
Estimated Processing Time: 35 s
To cancel the running data product, run 'onc.cancelDataProduct(30968502)'
Submitting request 2/2...
Request Id: 30968504
Estimated File Size: 48 MB
Estimated Processing Time: 35 s
To cancel the running data product, run 'onc.cancelDataProduct(30968504)'

Downloading data product files with runId 58513092...

Downloading data product files with runId 58513090...

   Running... working on time segment 1 of 1, for device deployment 1 of 1.
   Running..
   Running... working on time segment 1 of 1, for device deployment 1 of 1..............
Downloading data product files with runId 58513092...
{
  "device": "ICLISTENHF6324",
  "runs_total": 2,
  "runs_downloaded": 2,
  "runs_errors": 0,
  "spectrogram_files": 2,
  "input_path": "/home/s

## 2.4 Custom ONC Spectrogram Parameters

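`build_hsd_filters` accepts the resolution and FFT settings listed below. In the run recorded for this example, the API reported `dpo_spectrogramWindowLengthSec` and `dpo_spectrogramOverlap` as not valid for this search and ignored them, so check the request output before relying on `window_sec` or `overlap`.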

| Parameter | Options | Description |
| --- | --- | --- |
| `downsample` | 0, 1, 2 | 0=fullRes, 1=1min avg, 2=plotRes |
| `window_sec` | float | FFT window in seconds |
| `overlap` | 0.0-1.0 | Window overlap fraction |
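
To see what `window_sec` and `overlap` imply, the standard short-time Fourier transform relationships can be worked through directly: the time step between columns is `window_sec * (1 - overlap)` and the spacing between frequency bins is `1 / window_sec`. This is a sketch of that arithmetic only; `spectrogram_resolution` is a hypothetical helper, and how ONC applies these options server-side is not confirmed by this notebook.

```python
def spectrogram_resolution(window_sec: float, overlap: float) -> dict:
    """Time step and frequency-bin spacing implied by an FFT window and overlap fraction."""
    if window_sec <= 0:
        raise ValueError("window_sec must be positive")
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    return {
        "time_step_sec": window_sec * (1.0 - overlap),
        "freq_resolution_hz": 1.0 / window_sec,
    }


# Values used in the custom request below: 2-second window, 75% overlap
print(spectrogram_resolution(2.0, 0.75))
# {'time_step_sec': 0.5, 'freq_resolution_hz': 0.5}
```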

In [8]:
# Custom spectrogram parameters
custom_start = datetime(2024, 4, 1, 6, 0, tzinfo=timezone.utc)
custom_end = custom_start + timedelta(minutes=5)

custom_filters = build_hsd_filters(
    DEVICE,
    custom_start,
    custom_end,
    downsample=0,       # 0 = full resolution
    window_sec=2.0,     # 2-second FFT window
    overlap=0.75,       # 75% overlap
)

print("Custom HSD filters:")
print(json.dumps(custom_filters, indent=2))

if RUN_DOWNLOADS:
    req = dl.onc.requestDataProduct(custom_filters)
    run_data = dl.onc.runDataProduct(req['dpRequestId'], waitComplete=True)
    print(f"Downloaded with runId: {run_data.get('runIds')}")

Custom HSD filters:
{
  "dataProductCode": "HSD",
  "dpo_hydrophoneDataDiversionMode": "OD",
  "dpo_spectralDataDownsample": 0,
  "extension": "mat",
  "deviceCode": "ICLISTENHF6324",
  "dateFrom": "2024-04-01T06:00:00.000Z",
  "dateTo": "2024-04-01T06:05:00.000Z",
  "dpo_spectrogramWindowLengthSec": 2.0,
  "dpo_spectrogramOverlap": 0.75
}


{'dataProductCode': 'HSD',
 'dateFrom': '2024-04-01T06:00:00.000Z',
 'dateTo': '2024-04-01T06:05:00.000Z',
 'deviceCode': 'ICLISTENHF6324',
 'dpo_hydrophoneDataDiversionMode': 'OD',
 'dpo_spectralDataDownsample': 0,
 'dpo_spectrogramOverlap': 0.75,
 'dpo_spectrogramWindowLengthSec': 2.0,
 'extension': 'mat'},
* dpo_spectrogramWindowLengthSec is not a valid data product option for this search and will be ignored
* dpo_spectrogramOverlap is not a valid data product option for this search and will be ignored



Request Id: 30968513
Estimated File Size: 55 MB
Estimated Processing Time: 25 s
To cancel the running data product, run 'onc.cancelDataProduct(30968513)'

   queued
   data product running...................
   1 files generated for this data product
   complete
Downloaded with runId: [58513101]


---
# 3. Downloading Audio Files

Raw audio is available in FLAC format at full sample rate.

## 3.1 Download Audio for a Time Range

In [9]:
# Download FLAC audio for a time range
audio_start = datetime(2024, 4, 1, 12, 0, tzinfo=timezone.utc)
audio_end = audio_start + timedelta(minutes=5)  # 1 file TESTING

# Format as ISO strings
start_str = audio_start.strftime('%Y-%m-%dT%H:%M:%S.000Z')
end_str = audio_end.strftime('%Y-%m-%dT%H:%M:%S.000Z')

print(f"Downloading audio from {start_str} to {end_str}")

if RUN_DOWNLOADS:
    dl.setup_directories('mat', DEVICE, 'audio_range', audio_start, audio_end)
    dl.download_flac_files(DEVICE, start_str, end_str)
    print(f"Audio saved to: {dl.audio_path}")

Downloading audio from 2024-04-01T12:00:00.000Z to 2024-04-01T12:05:00.000Z


{'dateFrom': '2024-04-01T12:00:00.000Z',
 'dateTo': '2024-04-01T12:05:00.000Z',
 'deviceCode': 'ICLISTENHF6324'},
* deviceCode: ICLISTENHF6324 has data restricted by Ocean Networks Canada Society (data product code AF, data product name Annotation File, extension an). If you would like to request access to this data, please complete and submit the Restricted Data Request form at https://docs.google.com/forms/d/e/1FAIpQLSdDyBaEgOuQw_-pVIwgQO23z7INMtM3oomlcAMeM3bBUTPCMQ/viewform. An ONC representative will pass on the detailed request to the owner of the data, and will be in contact with you regarding your request. Please allow for time for communication.



Audio saved to: /home/sbialek/ONC/onc-hydrophone-data/data/ICLISTENHF6324/audio_range_2024-04-01_to_2024-04-01/audio/


## 3.2 Download Audio for Specific Timestamps

In [10]:
# Download audio around specific event timestamps
event_times = [
    datetime(2024, 4, 1, 4, 25, 30, tzinfo=timezone.utc),
    datetime(2024, 4, 1, 14, 10, 15, tzinfo=timezone.utc),
    datetime(2024, 4, 2, 3, 45, 0, tzinfo=timezone.utc),
]

# For each timestamp, download the 5-minute window containing it
for ts in event_times:
    # Floor to 5-minute boundary
    floored = ts.replace(minute=(ts.minute // 5) * 5, second=0, microsecond=0)
    end_ts = floored + timedelta(minutes=5)
    
    start_str = floored.strftime('%Y-%m-%dT%H:%M:%S.000Z')
    end_str = end_ts.strftime('%Y-%m-%dT%H:%M:%S.000Z')
    
    print(f"Event {ts} → Window: {start_str} to {end_str}")
    
    if RUN_DOWNLOADS:
        dl.download_flac_files(DEVICE, start_str, end_str)

Event 2024-04-01 04:25:30+00:00 → Window: 2024-04-01T04:25:00.000Z to 2024-04-01T04:30:00.000Z


{'dateFrom': '2024-04-01T04:25:00.000Z',
 'dateTo': '2024-04-01T04:30:00.000Z',
 'deviceCode': 'ICLISTENHF6324'},
* deviceCode: ICLISTENHF6324 has data restricted by Ocean Networks Canada Society (data product code AF, data product name Annotation File, extension an). If you would like to request access to this data, please complete and submit the Restricted Data Request form at https://docs.google.com/forms/d/e/1FAIpQLSdDyBaEgOuQw_-pVIwgQO23z7INMtM3oomlcAMeM3bBUTPCMQ/viewform. An ONC representative will pass on the detailed request to the owner of the data, and will be in contact with you regarding your request. Please allow for time for communication.



Event 2024-04-01 14:10:15+00:00 → Window: 2024-04-01T14:10:00.000Z to 2024-04-01T14:15:00.000Z


{'dateFrom': '2024-04-01T14:10:00.000Z',
 'dateTo': '2024-04-01T14:15:00.000Z',
 'deviceCode': 'ICLISTENHF6324'},
* deviceCode: ICLISTENHF6324 has data restricted by Ocean Networks Canada Society (data product code AF, data product name Annotation File, extension an). If you would like to request access to this data, please complete and submit the Restricted Data Request form at https://docs.google.com/forms/d/e/1FAIpQLSdDyBaEgOuQw_-pVIwgQO23z7INMtM3oomlcAMeM3bBUTPCMQ/viewform. An ONC representative will pass on the detailed request to the owner of the data, and will be in contact with you regarding your request. Please allow for time for communication.



Event 2024-04-02 03:45:00+00:00 → Window: 2024-04-02T03:45:00.000Z to 2024-04-02T03:50:00.000Z


{'dateFrom': '2024-04-02T03:45:00.000Z',
 'dateTo': '2024-04-02T03:50:00.000Z',
 'deviceCode': 'ICLISTENHF6324'},
* deviceCode: ICLISTENHF6324 has data restricted by Ocean Networks Canada Society (data product code AF, data product name Annotation File, extension an). If you would like to request access to this data, please complete and submit the Restricted Data Request form at https://docs.google.com/forms/d/e/1FAIpQLSdDyBaEgOuQw_-pVIwgQO23z7INMtM3oomlcAMeM3bBUTPCMQ/viewform. An ONC representative will pass on the detailed request to the owner of the data, and will be in contact with you regarding your request. Please allow for time for communication.



## 3.3 Audio with Custom Duration Windows

When you need a specific duration centered on a timestamp, possibly spanning multiple 5-minute files.

In [11]:
# Download 30 seconds of audio centered on a timestamp
center_time = datetime(2024, 4, 1, 12, 32, 45, tzinfo=timezone.utc)
duration_seconds = 30
half_duration = duration_seconds / 2

clip_start = center_time - timedelta(seconds=half_duration)
clip_end = center_time + timedelta(seconds=half_duration)

# Calculate which 5-minute files we need
file_start = clip_start.replace(minute=(clip_start.minute // 5) * 5, second=0, microsecond=0)
file_end = clip_end.replace(minute=(clip_end.minute // 5) * 5, second=0, microsecond=0) + timedelta(minutes=5)

print(f"Center: {center_time}")
print(f"Clip range: {clip_start} to {clip_end}")
print(f"Files needed: {file_start} to {file_end}")

if RUN_DOWNLOADS:
    start_str = file_start.strftime('%Y-%m-%dT%H:%M:%S.000Z')
    end_str = file_end.strftime('%Y-%m-%dT%H:%M:%S.000Z')
    dl.download_flac_files(DEVICE, start_str, end_str)
    print("Files downloaded - use audio utils to stitch and clip")

Center: 2024-04-01 12:32:45+00:00
Clip range: 2024-04-01 12:32:30+00:00 to 2024-04-01 12:33:00+00:00
Files needed: 2024-04-01 12:30:00+00:00 to 2024-04-01 12:35:00+00:00


{'dateFrom': '2024-04-01T12:30:00.000Z',
 'dateTo': '2024-04-01T12:35:00.000Z',
 'deviceCode': 'ICLISTENHF6324'},
* deviceCode: ICLISTENHF6324 has data restricted by Ocean Networks Canada Society (data product code AF, data product name Annotation File, extension an). If you would like to request access to this data, please complete and submit the Restricted Data Request form at https://docs.google.com/forms/d/e/1FAIpQLSdDyBaEgOuQw_-pVIwgQO23z7INMtM3oomlcAMeM3bBUTPCMQ/viewform. An ONC representative will pass on the detailed request to the owner of the data, and will be in contact with you regarding your request. Please allow for time for communication.



Files downloaded - use audio utils to stitch and clip
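
Once the covering files are downloaded, clipping comes down to converting the clip times into sample offsets. A minimal sketch of that arithmetic, assuming the audio begins exactly at `file_start` and using a placeholder sample rate of 64000 Hz (hypothetical; read the real rate from the FLAC header). `clip_sample_range` is not a function from this package.

```python
from datetime import datetime, timezone
from typing import Tuple


def clip_sample_range(
    file_start: datetime, clip_start: datetime, clip_end: datetime, sample_rate_hz: int
) -> Tuple[int, int]:
    """Return (first_sample, end_sample_exclusive) of the clip within audio starting at file_start."""
    if clip_start < file_start or clip_end <= clip_start:
        raise ValueError("clip must start at or after file_start and have positive length")
    first = round((clip_start - file_start).total_seconds() * sample_rate_hz)
    end = round((clip_end - file_start).total_seconds() * sample_rate_hz)
    return first, end


file_start = datetime(2024, 4, 1, 12, 30, 0, tzinfo=timezone.utc)
clip_start = datetime(2024, 4, 1, 12, 32, 30, tzinfo=timezone.utc)
clip_end = datetime(2024, 4, 1, 12, 33, 0, tzinfo=timezone.utc)
print(clip_sample_range(file_start, clip_start, clip_end, sample_rate_hz=64000))
# (9600000, 11520000)
```

The difference between the two offsets is 1,920,000 samples, which is the requested 30 seconds at the assumed rate.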


---
# 4. Downloading Both Spectrograms & Audio

## 4.1 Combined Download (2 Spectrograms + Audio)

In [12]:
# Download both spectrograms AND audio in one call
combined_start = datetime(2024, 4, 1, 10, 0, tzinfo=timezone.utc)
combined_end = combined_start + timedelta(minutes=10)  # 2 spectrograms

windows = {DEVICE: [(combined_start, combined_end)]}

if RUN_DOWNLOADS:
    # Use download_flac=True for combined download
    info = run_parallel_for_device(
        dl, DEVICE, windows, 2,  # 2 spectrograms for 10 min
        tag='combined',
        parallel_config=PARALLEL_CONFIG,
        download_flac=True,  # ← This enables audio download
    )
    print(f"Spectrograms: {dl.spectrogram_path}")
    print(f"Audio: {dl.audio_path}")
else:
    print("Set RUN_DOWNLOADS=True to download spectrograms + audio together")

Submitting 1 requests for ICLISTENHF6324...
Submitting request 1/1...
Request Id: 30968519
Estimated File Size: 48 MB
Estimated Processing Time: 35 s
To cancel the running data product, run 'onc.cancelDataProduct(30968519)'

Downloading data product files with runId 58513107...

   Running.
   Running... working on time segment 1 of 1, for device deployment 1 of 1........
Downloading data product files with runId 58513107...


{'dateFrom': '2024-04-01T10:00:00.000Z',
 'dateTo': '2024-04-01T10:10:00.000Z',
 'deviceCode': 'ICLISTENHF6324'},
* deviceCode: ICLISTENHF6324 has data restricted by Ocean Networks Canada Society (data product code AF, data product name Annotation File, extension an). If you would like to request access to this data, please complete and submit the Restricted Data Request form at https://docs.google.com/forms/d/e/1FAIpQLSdDyBaEgOuQw_-pVIwgQO23z7INMtM3oomlcAMeM3bBUTPCMQ/viewform. An ONC representative will pass on the detailed request to the owner of the data, and will be in contact with you regarding your request. Please allow for time for communication.



Spectrograms: /home/sbialek/ONC/onc-hydrophone-data/data/ICLISTENHF6324/combined_2024-04-01_to_2024-04-01/onc_spectrograms/
Audio: /home/sbialek/ONC/onc-hydrophone-data/data/ICLISTENHF6324/combined_2024-04-01_to_2024-04-01/audio/


---
# 5. Timestamp-Based Downloads

For downloading data around specific events/annotations.

## 5.1 Download at Specific Timestamps

In [13]:
# Explicit timestamps
explicit_windows = {
    DEVICE: [
        (datetime(2024, 4, 1, 4, 25, tzinfo=timezone.utc), 
         datetime(2024, 4, 1, 4, 30, tzinfo=timezone.utc)),
        (datetime(2024, 4, 1, 14, 0, tzinfo=timezone.utc), 
         datetime(2024, 4, 1, 14, 5, tzinfo=timezone.utc)),
        (datetime(2024, 4, 2, 3, 15, tzinfo=timezone.utc), 
         datetime(2024, 4, 2, 3, 20, tzinfo=timezone.utc)),
    ]
}

print(f"Downloading {len(explicit_windows[DEVICE])} specific time windows")

if RUN_DOWNLOADS:
    info = run_parallel_for_device(
        dl, DEVICE, explicit_windows, 1,
        tag='explicit_times',
        parallel_config=PARALLEL_CONFIG,
    )
    print(json.dumps(info, indent=2))

Downloading 3 specific time windows
Submitting 3 requests for ICLISTENHF6324...
Submitting request 1/3...
Request Id: 30968523
Estimated File Size: 48 MB
Estimated Processing Time: 35 s
To cancel the running data product, run 'onc.cancelDataProduct(30968523)'
Request Id: 30968524
Estimated File Size: 48 MB
Estimated Processing Time: 35 s
To cancel the running data product, run 'onc.cancelDataProduct(30968524)'
Submitting request 3/3...
Request Id: 30968525
Estimated File Size: 48 MB
Estimated Processing Time: 35 s
To cancel the running data product, run 'onc.cancelDataProduct(30968525)'

Downloading data product files with runId 58513110...

Downloading data product files with runId 58513112...

Downloading data product files with runId 58513111...

   Running
   Running... working on time segment 1 of 1, for device deployment 1 of 1.
   Running... working on time segment 1 of 1, for device deployment 1 of 1..
   Running... working on time segment 1 of 1, for device deployment 1 of 1..

## 5.2 Using JSON Request Format with Padding

In [14]:
json_requests = {
  "defaults": {
    "deviceCode": DEVICE,
    "pad_seconds": 15,
    "download_audio": True,
    "clip": True
  },
  "requests": [
    {
      "timestamp": "2024-04-01T12:25:30Z",
      "label": "whale call 1"
    },
    {
      "start": "2024-04-01T12:30:00Z",
      "end": "2024-04-01T12:33:30Z",
      "pad_before_seconds": 10,
      "pad_after_seconds": 20,
      "label": "ship noise event"
    }
  ]
}
json_path = Path(DATA_DIR) / "example_requests.json"
json_path.write_text(json.dumps(json_requests, indent=2))
print(f"Saved requests to: {json_path}")
print(json.dumps(json_requests, indent=2))

Saved requests to: /home/sbialek/ONC/onc-hydrophone-data/data/example_requests.json
{
  "defaults": {
    "deviceCode": "ICLISTENHF6324",
    "pad_seconds": 15,
    "download_audio": true,
    "clip": true
  },
  "requests": [
    {
      "timestamp": "2024-04-01T12:25:30Z",
      "label": "whale call 1"
    },
    {
      "start": "2024-04-01T12:30:00Z",
      "end": "2024-04-01T12:33:30Z",
      "pad_before_seconds": 10,
      "pad_after_seconds": 20,
      "label": "ship noise event"
    }
  ]
}
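
How the padding fields might translate into actual time windows is sketched below. The semantics are an assumption: `pad_seconds` is applied symmetrically, and `pad_before_seconds` / `pad_after_seconds` override it per side. `resolve_window` and `parse_utc` are hypothetical helpers; the behaviour of `download_requests_from_json` may differ.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple


def parse_utc(value: str) -> datetime:
    """Parse a 'YYYY-MM-DDTHH:MM:SSZ' string into a timezone-aware UTC datetime."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def resolve_window(request: dict, defaults: dict) -> Tuple[datetime, datetime]:
    """Return the padded (start, end) for a single request entry (assumed semantics)."""
    pad = request.get("pad_seconds", defaults.get("pad_seconds", 0))
    before = timedelta(seconds=request.get("pad_before_seconds", pad))
    after = timedelta(seconds=request.get("pad_after_seconds", pad))
    if "timestamp" in request:
        start = end = parse_utc(request["timestamp"])
    else:
        start, end = parse_utc(request["start"]), parse_utc(request["end"])
    return start - before, end + after


defaults = {"pad_seconds": 15}
requests_demo = [
    {"timestamp": "2024-04-01T12:25:30Z", "label": "whale call 1"},
    {"start": "2024-04-01T12:30:00Z", "end": "2024-04-01T12:33:30Z",
     "pad_before_seconds": 10, "pad_after_seconds": 20, "label": "ship noise event"},
]
for req in requests_demo:
    start, end = resolve_window(req, defaults)
    print(req["label"], start.strftime('%H:%M:%S'), "to", end.strftime('%H:%M:%S'))
# whale call 1 12:25:15 to 12:25:45
# ship noise event 12:29:50 to 12:33:50
```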


In [15]:
json_path = Path(DATA_DIR) / "example_requests.json"
# Execute JSON requests
if RUN_DOWNLOADS:
    results = dl.download_requests_from_json(
        str(json_path),
        default_pad_seconds=15,
        clip_outputs=True,
        spectrogram_format='mat',
        download_audio=True,
    )
    print(json.dumps(results, indent=2))
else:
    print("Set RUN_DOWNLOADS=True to execute JSON requests")

📅 Downloading data for Monday, 2024-04-01 at 12:25:00 (requesting 1 spectrograms)


HTTPError: 
Status 400 - Bad Request: https://data.oceannetworks.ca/api/dataProductDelivery/request?dataProductCode=HSD&deviceCode=ICLISTENHF6324&dateFrom=2024-04-01T12%3A25%3A00.000Z&dateTo=2024-04-01T12%3A25%3A00.000Z&extension=mat&dpo_hydrophoneDataDiversionMode=OD&dpo_spectralDataDownsample=2&token=<REDACTED>
API Error 33: ICLISTENHF6324 not deployed in the provided date range (parameter: deviceCode, dateFrom, dateTo)
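
The request URL in the error above carries identical `dateFrom` and `dateTo` values (12:25:00), so the requested window has zero length. Whether that alone triggers API Error 33 is not confirmed by this output. A guard like the following sketch would at least rule it out before a request is submitted; `ensure_min_window` is a hypothetical helper, and the 5-minute minimum is an assumption based on the spectrogram file length.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple


def ensure_min_window(
    date_from: datetime, date_to: datetime, min_length: timedelta = timedelta(minutes=5)
) -> Tuple[datetime, datetime]:
    """Return (date_from, date_to), extending date_to if the window is shorter than min_length."""
    if date_to - date_from < min_length:
        date_to = date_from + min_length
    return date_from, date_to


same = datetime(2024, 4, 1, 12, 25, tzinfo=timezone.utc)
fixed_from, fixed_to = ensure_min_window(same, same)
print(fixed_from.strftime('%H:%M'), "to", fixed_to.strftime('%H:%M'))  # 12:25 to 12:30
```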

---
# 6. Input Formats for Timestamp Lists

## 6.1 Python Lists / Datetime Objects

In [None]:
# Python datetime objects
timestamps_datetime = [
    datetime(2024, 4, 1, 12, 30, 0, tzinfo=timezone.utc),
    datetime(2024, 4, 1, 14, 45, 30, tzinfo=timezone.utc),
    datetime(2024, 4, 2, 8, 15, 0, tzinfo=timezone.utc),
]

# Legacy component format: [year, month, day, hour, minute, second]
timestamps_tuple = [
    [2024, 4, 1, 12, 30, 0],
    [2024, 4, 1, 14, 45, 30],
]

print("Datetime list:", timestamps_datetime[:2])
print("Tuple list:", timestamps_tuple)

## 6.2 JSON Input Format

In [None]:
# New JSON format with defaults
json_new = {
    "defaults": {
        "deviceCode": "ICLISTENHF6324",
        "pad_seconds": 30
    },
    "requests": [
        {"timestamp": "2024-04-01T12:30:00Z"},
        {"start": "2024-04-01T14:00:00Z", "end": "2024-04-01T14:05:00Z"}
    ]
}

# Legacy JSON format
json_legacy = {
    "ICLISTENHF6324": [
        [2024, 4, 1, 12, 30, 0],
        [2024, 4, 1, 14, 45, 30]
    ]
}

print("New format:", json.dumps(json_new, indent=2)[:200] + "...")

## 6.3 CSV Input Format

In [None]:
import pandas as pd
from io import StringIO

# Example CSV content
csv_content = """device,timestamp,label
ICLISTENHF6324,2024-04-01T12:30:00Z,whale call
ICLISTENHF6324,2024-04-01T14:45:30Z,ship noise
ICLISTENHF6324,2024-04-02T08:15:00Z,unknown
"""

# Parse CSV
df = pd.read_csv(StringIO(csv_content))
df['timestamp'] = pd.to_datetime(df['timestamp'])
print(df)

# Convert to request format
requests = [
    {
        "deviceCode": row['device'],
        "timestamp": row['timestamp'].strftime('%Y-%m-%dT%H:%M:%SZ'),
        "label": row['label']
    }
    for _, row in df.iterrows()
]
print("\nConverted to requests:")
print(json.dumps(requests, indent=2))

## 6.4 Supported Date/Time Formats

In [None]:
# All supported input formats
from onc_hydrophone_data.data.hydrophone_downloader import HydrophoneDownloader

formats = [
    datetime(2024, 4, 1, 12, 30, tzinfo=timezone.utc),  # datetime object
    "2024-04-01T12:30:00Z",                             # ISO 8601
    "2024-04-01T12:30:00.000Z",                         # ISO with ms
    [2024, 4, 1, 12, 30, 0],                             # list
    (2024, 4, 1, 12, 30, 0),                             # tuple
]

print("All these formats are parsed correctly:")
for f in formats:
    parsed = HydrophoneDownloader._parse_timestamp_value(f)
    print(f"  {type(f).__name__:10} → {parsed}")
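
For readers without the package at hand, a standalone sketch of normalizing the same input shapes to UTC datetimes. `to_utc_datetime` is a hypothetical helper, not the library's `_parse_timestamp_value`, and treating naive values as UTC is an assumption.

```python
from datetime import datetime, timezone
from typing import Sequence, Union

TimestampLike = Union[datetime, str, Sequence[int]]


def to_utc_datetime(value: TimestampLike) -> datetime:
    """Normalize a datetime, ISO 8601 string, or [Y, M, D, h, m, s] sequence to aware UTC."""
    if isinstance(value, datetime):
        parsed = value
    elif isinstance(value, str):
        # Older Python versions reject a trailing 'Z' in fromisoformat, so swap it for an offset
        parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    elif isinstance(value, (list, tuple)):
        parsed = datetime(*value)
    else:
        raise TypeError(f"Unsupported timestamp type: {type(value).__name__}")
    # Naive values are assumed to be UTC; aware values are converted to UTC
    if parsed.tzinfo is None:
        return parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


for example in ["2024-04-01T12:30:00Z", "2024-04-01T12:30:00.000Z", [2024, 4, 1, 12, 30, 0]]:
    print(to_utc_datetime(example))
# 2024-04-01 12:30:00+00:00  (printed three times)
```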

---
# 7. Custom Spectrogram Generation from Audio

Generate your own spectrograms with custom parameters.

In [None]:
from onc_hydrophone_data.audio import SpectrogramGenerator

## 7.1 SpectrogramGenerator Basics

In [None]:
# Create generator with default settings
generator = SpectrogramGenerator(
    win_dur=1.0,          # 1 second FFT window
    overlap=0.5,          # 50% overlap
    freq_lims=(10, 1000) # 10 Hz to 1 kHz
)

print("SpectrogramGenerator created with:")
print(f"  Window: {generator.win_dur}s")
print(f"  Overlap: {generator.overlap}")
print(f"  Freq range: {generator.freq_lims}")

## 7.2 Custom Parameters

In [None]:
# High-resolution spectrogram for detailed analysis
high_res = SpectrogramGenerator(
    win_dur=0.1,           # 100ms window (higher time resolution)
    overlap=0.9,           # 90% overlap
    freq_lims=(1, 24000), # Full frequency range
    clim=(-80, 0)      # dB scale limits
)

# Low-frequency analysis (whale calls)
low_freq = SpectrogramGenerator(
    win_dur=2.0,           # 2s window (better freq resolution)
    overlap=0.5,
    freq_lims=(10, 200),  # Focus on low frequencies
)

print("High-res config:", high_res.win_dur, high_res.freq_lims)
print("Low-freq config:", low_freq.win_dur, low_freq.freq_lims)
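
To compare what the two configurations produce, the expected spectrogram size for one 5-minute file can be estimated from the parameters alone. This sketch assumes no edge padding, frequency bins at multiples of `1 / win_dur`, and a sample rate high enough that the upper limit is below Nyquist; `spectrogram_shape` is a hypothetical helper and the real `SpectrogramGenerator` output may differ. Fractions are used to avoid floating-point rounding in the frame count.

```python
from fractions import Fraction
from math import ceil, floor
from typing import Tuple


def spectrogram_shape(
    win_dur: float, overlap: float, freq_lims: Tuple[int, int], duration_sec: int = 300
) -> Tuple[int, int]:
    """Return (time_frames, frequency_bins) for one file under the stated assumptions."""
    win = Fraction(str(win_dur))
    hop = win * (1 - Fraction(str(overlap)))
    n_frames = floor((Fraction(duration_sec) - win) / hop) + 1
    low_hz, high_hz = freq_lims
    # Bin k sits at k / win_dur Hz; count the bins that fall inside [low_hz, high_hz]
    n_bins = floor(high_hz * win) - ceil(low_hz * win) + 1
    return n_frames, n_bins


print(spectrogram_shape(0.1, 0.9, (1, 24000)))  # (29991, 2400)
print(spectrogram_shape(2.0, 0.5, (10, 200)))   # (299, 381)
```

The short window yields roughly 100 times more time frames but 10 Hz bin spacing, while the 2-second window gives 0.5 Hz spacing at the cost of time resolution.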

## 7.3 Batch Processing Audio Directory

In [None]:
# Process all audio files in a directory
audio_dir = Path(DATA_DIR) / DEVICE / "example" / "audio"
output_dir = Path(DATA_DIR) / DEVICE / "example" / "custom_spectrograms"

if audio_dir.exists():
    results = generator.process_directory(
        input_dir=audio_dir,
        save_dir=output_dir,
        save_plot=True,   # Save PNG plots
        save_mat=True,    # Save MAT files
    )
    print(f"Processed {len(results)} files")
else:
    print(f"No audio files at {audio_dir}")
    print("Run audio downloads first, then process them here")

---
# 8. Advanced Workflows

## 8.1 Full Pipeline: Download → Generate Custom Spectrograms

In [None]:
# Complete workflow: Download audio + generate custom spectrograms
pipeline_start = datetime(2024, 4, 1, 12, 0, tzinfo=timezone.utc)
pipeline_end = pipeline_start + timedelta(minutes=10)

if RUN_DOWNLOADS:
    # Step 1: Download audio
    dl.setup_directories('mat', DEVICE, 'pipeline_demo', pipeline_start, pipeline_end)
    start_str = pipeline_start.strftime('%Y-%m-%dT%H:%M:%S.000Z')
    end_str = pipeline_end.strftime('%Y-%m-%dT%H:%M:%S.000Z')
    dl.download_flac_files(DEVICE, start_str, end_str)
    print(f"1. Audio downloaded to: {dl.audio_path}")
    
    # Step 2: Generate custom spectrograms
    custom_out = Path(dl.audio_path).parent / "custom_spectrograms"
    results = generator.process_directory(
        input_dir=dl.audio_path,
        save_dir=custom_out,
    )
    print(f"2. Custom spectrograms saved to: {custom_out}")
else:
    print("Pipeline workflow:")
    print("  1. Download audio for time range")
    print("  2. Generate custom spectrograms with SpectrogramGenerator")
    print("  3. Apply analysis/ML model")

## 8.2 Multi-Device Downloads

In [None]:
# Download from multiple hydrophones
devices = ['ICLISTENHF6324', 'ICLISTENHF6020', 'ICLISTENHF6019']
multi_start = datetime(2024, 4, 1, 12, 0, tzinfo=timezone.utc)
multi_end = multi_start + timedelta(hours=1)

for device in devices:
    windows = {device: [(multi_start, multi_end)]}
    print(f"Device {device}: {multi_start} to {multi_end}")
    
    if RUN_DOWNLOADS:
        info = run_parallel_for_device(
            dl, device, windows, 12,
            tag='multi_device',
            parallel_config=PARALLEL_CONFIG,
        )
        print(f"  Downloaded: {info.get('runs_downloaded', 0)} runs")

---
# 9. Output Folder Structure

Downloads are organized per device and per download run:

```
data/
└── DEVICE_CODE/
    └── method_date_range/
        ├── onc_spectrograms/     # ONC-downloaded MAT/PNG
        │   ├── *.mat
        │   └── anomaly_report.txt
        ├── audio/                # Downloaded FLAC files
        │   └── *.flac
        └── custom_spectrograms/  # Your generated spectrograms
            ├── mat/
            └── png/
```
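
The layout can be reproduced with `pathlib` when locating files from other scripts. This is a sketch: `run_directories` is a hypothetical helper, and the `method_start_to_end` naming is inferred from the paths printed in the outputs earlier in this notebook rather than taken from `setup_directories` itself.

```python
from datetime import datetime
from pathlib import Path
from typing import Dict


def run_directories(
    data_dir: Path, device: str, method: str, start: datetime, end: datetime
) -> Dict[str, Path]:
    """Build the per-run output folders following the observed naming pattern."""
    run_name = f"{method}_{start:%Y-%m-%d}_to_{end:%Y-%m-%d}"
    base = Path(data_dir) / device / run_name
    return {
        "onc_spectrograms": base / "onc_spectrograms",
        "audio": base / "audio",
        "custom_spectrograms": base / "custom_spectrograms",
    }


paths = run_directories(
    Path("data"), "ICLISTENHF6324", "basic_download",
    datetime(2024, 4, 1), datetime(2024, 4, 1),
)
print(paths["audio"].as_posix())
# data/ICLISTENHF6324/basic_download_2024-04-01_to_2024-04-01/audio
```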

In [None]:
# View current paths
dl.setup_directories('mat', DEVICE, 'example', datetime(2024, 4, 1))
print("📁 Output paths:")
print(f"  Spectrograms: {dl.spectrogram_path}")
print(f"  Audio:        {dl.audio_path}")

---
# 10. Troubleshooting & Tips

## Common Issues

| Issue | Solution |
| --- | --- |
| "Device not deployed" | Check deployment dates with `HydrophoneDeploymentChecker` (see section 1.5) |
| "Waiting on file system" | Normal - ONC is generating data, be patient |
| Timeout errors | Increase `max_wait_minutes` or reduce request size |
| Rate limiting | Increase `stagger_seconds` between requests |

## Performance Tips

1. **Use parallel downloads**: `run_parallel_for_device` is 5-6x faster than submitting requests one at a time
2. **Batch requests**: Group spectrograms into 6-12 per request
3. **Avoid full resolution**: `downsample=0` is much slower
4. **Check archive first**: Archived data downloads instantly
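
The effect of batch size on the number of API requests is simple arithmetic, sketched here for planning purposes. `requests_needed` is a hypothetical helper; it assumes one spectrogram per 5 minutes and ignores any limits the ONC service itself may impose.

```python
from math import ceil


def requests_needed(total_minutes: int, per_request: int, window_minutes: int = 5) -> int:
    """Number of requests needed to cover a span when each request holds per_request spectrograms."""
    if per_request < 1:
        raise ValueError("per_request must be at least 1")
    total_spectrograms = total_minutes // window_minutes
    return ceil(total_spectrograms / per_request)


# One full day is 288 five-minute spectrograms
print(requests_needed(24 * 60, per_request=12))  # 24
print(requests_needed(24 * 60, per_request=6))   # 48
```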

In [None]:
# Check device deployments
from onc_hydrophone_data.data.deployment_checker import HydrophoneDeploymentChecker

checker = HydrophoneDeploymentChecker(ONC_TOKEN)
print(f"Checking deployments for {DEVICE}...")

# This will show when the hydrophone was active (same call as in section 1.5)
# date_ranges = checker.get_deployment_date_ranges([DEVICE])

---

## 📚 Additional Resources

- [ONC API Documentation](https://wiki.oceannetworks.ca/display/O2A/Oceans+2.0+API+Home)
- [Repository README](../README.md)
- [Command-line script](../scripts/download_hydrophone_data.py)