# Audio Bounding Box Annotation Demo

This notebook demonstrates how to:
1. Load an audio file using OpenSoundscape
2. Create a spectrogram and extract time/frequency axes
3. Save the spectrogram as a PNG image
4. Use jupyter-bbox-widget to draw bounding boxes
5. Convert annotations to a DataFrame with audio coordinates

In [1]:
# Import required libraries
import sys
sys.path.append(".")

from audio_bbox_annotator import AudioBBoxAnnotator
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from IPython.display import display, HTML



## Step 1: Define Parameters

Set up the audio file path, time range, and annotation classes.

In [2]:
# Audio file parameters
audio_path = "/Users/lviotti/Library/CloudStorage/Dropbox/Work/Kitzes/datasets/kipu2025/2MM01186/Data/2MM01186_20250126_080003.wav"
st_time = 315
end_time = 320
bandpass = [1, 10000]

# Annotation classes
classes = ["a", "b", "c"]

print(f"Audio file: {audio_path}")
print(f"Time range: {st_time}s - {end_time}s")
print(f"Duration: {end_time - st_time}s")
print(f"Bandpass: {bandpass[0]} - {bandpass[1]} Hz")
print(f"Classes: {classes}")

Audio file: /Users/lviotti/Library/CloudStorage/Dropbox/Work/Kitzes/datasets/kipu2025/2MM01186/Data/2MM01186_20250126_080003.wav
Time range: 315s - 320s
Duration: 5s
Bandpass: 1 - 10000 Hz
Classes: ['a', 'b', 'c']
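It can help to catch inconsistent settings before the pipeline loads anything. The sketch below uses a hypothetical helper, `validate_clip_params`, which is not part of `AudioBBoxAnnotator`. It checks that the time range is positive and that the bandpass edges are ordered. When the sample rate is known, it also checks that the upper edge lies below the Nyquist frequency. The Step 2 log reports 24000 Hz for this file, so the 10000 Hz upper edge sits below the 12000 Hz Nyquist limit.

```python
def validate_clip_params(st_time, end_time, bandpass, sample_rate=None):
    """Hypothetical helper: raise ValueError if clip or bandpass settings conflict.

    sample_rate is optional because it is only known after the audio is loaded.
    Returns the clip duration in seconds.
    """
    if st_time < 0:
        raise ValueError(f"st_time must be >= 0, got {st_time}")
    if end_time <= st_time:
        raise ValueError(f"end_time ({end_time}) must exceed st_time ({st_time})")
    low_f, high_f = bandpass
    if not 0 < low_f < high_f:
        raise ValueError(f"bandpass must satisfy 0 < low < high, got {bandpass}")
    if sample_rate is not None and high_f >= sample_rate / 2:
        # A digital filter edge must lie below the Nyquist frequency.
        raise ValueError(
            f"bandpass upper edge {high_f} Hz must be below Nyquist ({sample_rate / 2} Hz)"
        )
    return end_time - st_time

duration = validate_clip_params(315, 320, [1, 10000], sample_rate=24000)
print(duration)  # 5
```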


## Step 2: Create and Run the Annotation Pipeline

This will:
- Load the audio file
- Create a spectrogram
- Save it as a PNG image
- Create the bounding box widget

In [3]:
# Create the annotator
annotator = AudioBBoxAnnotator(
    audio_path=audio_path,
    st_time=st_time,
    end_time=end_time,
    bandpass=bandpass,
    classes=classes
)

# Run the full pipeline
bbox_widget = annotator.run_full_pipeline(
    output_path="temp_spectrogram.png",  # Save in current directory
    dpi=100,
    figsize=(12, 8)
)

# Display the widget
display(bbox_widget)

=== Starting Audio BBox Annotation Pipeline ===
Loading audio from /Users/lviotti/Library/CloudStorage/Dropbox/Work/Kitzes/datasets/kipu2025/2MM01186/Data/2MM01186_20250126_080003.wav
Time range: 315s - 320s
Duration: 5s
Bandpass: 1 - 10000 Hz
Audio loaded successfully. Sample rate: 24000 Hz
Creating spectrogram...
Spectrogram created with shape: (257, 467)
Time axis: 0.01s - 4.98s
Frequency axis: 0.0 - 12000.0 Hz
Saving spectrogram to: temp_spectrogram.png
Image saved with dimensions: 1127 x 790 pixels
Loading image from: temp_spectrogram.png
Image loaded with dimensions: 1127 x 790 pixels
Creating bounding box widget...
Available classes: ['a', 'b', 'c']
BBox widget created successfully!
Instructions:
1. Click and drag to draw bounding boxes
2. Select a class from the dropdown for each box
3. Use the widget controls to manage annotations
=== Pipeline Complete ===
Use get_annotations_dataframe() to get the final results
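The logged shape and axis ranges follow from how the STFT frames the clip. The sketch below assumes a 512-sample window with a 256-sample hop (50% overlap), keeps only frames where a full window fits, and timestamps each frame at its window center. These settings were not read from OpenSoundscape's configuration in this notebook. They are assumptions that happen to reproduce the logged numbers. A 5 s clip at 24000 Hz has 120000 samples. That gives 512 / 2 + 1 = 257 frequency bins and floor((120000 - 512) / 256) + 1 = 467 frames. The frequency axis runs up to the 12000 Hz Nyquist limit even though the bandpass stops at 10000 Hz, because the filter attenuates the audio but does not crop the spectrogram.

```python
import numpy as np

def stft_axes(sample_rate, duration_s, window_samples=512, hop_samples=256):
    """Return (times, freqs) for an unpadded framed STFT.

    Assumed conventions: only full windows are kept, each frame is stamped at
    its window center, and frequency bins run from 0 Hz up to Nyquist.
    """
    n_samples = int(round(sample_rate * duration_s))
    n_frames = (n_samples - window_samples) // hop_samples + 1
    starts = np.arange(n_frames) * hop_samples
    times = (starts + window_samples / 2) / sample_rate
    freqs = np.arange(window_samples // 2 + 1) * sample_rate / window_samples
    return times, freqs

times, freqs = stft_axes(24000, 5)
print((len(freqs), len(times)))                 # (257, 467)
print(f"{times[0]:.2f}s - {times[-1]:.2f}s")    # 0.01s - 4.98s
print(f"{freqs[0]:.1f} - {freqs[-1]:.1f} Hz")   # 0.0 - 12000.0 Hz
```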





## Step 3: Instructions for Annotation

Now you can annotate the spectrogram:

1. **Draw bounding boxes**: Click and drag on the image to create bounding boxes
2. **Select classes**: Use the dropdown menu to assign a class ("a", "b", or "c") to each box
3. **Manage annotations**: Use the widget controls to edit or delete boxes
4. **Get results**: Run the cell below to extract the annotations as a DataFrame

## Step 4: Extract Annotations

After you have drawn your bounding boxes, run this cell to get the results as a DataFrame with audio coordinates.
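This notebook does not show how `get_annotations_dataframe()` maps pixels to audio coordinates. A linear mapping like the hypothetical `pixel_box_to_audio` below is one way to do it. It assumes that time increases left to right and frequency increases bottom to top, while image rows grow downward. The saved PNG is 1127 x 790 pixels rather than the 1200 x 800 that `figsize=(12, 8)` at `dpi=100` would give. That suggests the figure was cropped on save and may include axes or margins. In that case, `plot_area` should give the pixel extent of the spectrogram itself. `time_range` and `freq_range` should match the extent the image was drawn with, which is not necessarily the first and last axis values.

```python
def pixel_box_to_audio(x, y, width, height, image_size, time_range, freq_range,
                       plot_area=None):
    """Hypothetical helper: map a pixel box (top-left origin) to time/frequency.

    image_size: (image_width, image_height) in pixels.
    time_range: (t_min, t_max) in seconds spanned by the plotted spectrogram.
    freq_range: (f_min, f_max) in Hz spanned by the plotted spectrogram.
    plot_area: optional (left, top, width, height) of the spectrogram inside
        the image; defaults to the whole image.
    """
    if plot_area is None:
        plot_area = (0, 0, image_size[0], image_size[1])
    left, top, area_w, area_h = plot_area
    t_min, t_max = time_range
    f_min, f_max = freq_range

    def to_time(px):
        return t_min + (px - left) / area_w * (t_max - t_min)

    def to_freq(py):
        # Rows grow downward, frequency grows upward, so the y axis is flipped.
        return f_max - (py - top) / area_h * (f_max - f_min)

    return {
        "st_time": to_time(x),
        "end_time": to_time(x + width),
        "min_freq": to_freq(y + height),  # bottom edge of the box
        "max_freq": to_freq(y),           # top edge of the box
    }

box = pixel_box_to_audio(x=250, y=100, width=500, height=200,
                         image_size=(1000, 800), time_range=(0.0, 5.0),
                         freq_range=(0.0, 12000.0))
print(box)
# {'st_time': 1.25, 'end_time': 3.75, 'min_freq': 7500.0, 'max_freq': 10500.0}
```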

In [None]:
# Get annotations as DataFrame
annotations_df = annotator.get_annotations_dataframe()

print("\n=== Annotations DataFrame ===")
display(annotations_df)

if not annotations_df.empty:
    print("\n=== Summary ===")
    print(f"Total annotations: {len(annotations_df)}")
    print("\nClass distribution:")
    print(annotations_df["class"].value_counts())
    print("\n=== Time ranges ===")
    for i, row in annotations_df.iterrows():
        print(f"Box {i+1}: {row['st_time']:.2f}s - {row['end_time']:.2f}s "
              f"({row['end_time'] - row['st_time']:.2f}s duration)")
    print("\n=== Frequency ranges ===")
    for i, row in annotations_df.iterrows():
        print(f"Box {i+1}: {row['min_freq']:.1f} - {row['max_freq']:.1f} Hz "
              f"({row['max_freq'] - row['min_freq']:.1f} Hz bandwidth)")

AttributeError: 'BBoxWidget' object has no attribute 'get_annotations'
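The recorded error suggests that `get_annotations_dataframe()` calls a `get_annotations()` method that `BBoxWidget` does not provide. jupyter-bbox-widget exposes the drawn boxes through its `bboxes` trait instead, which is a list of dicts with `x`, `y`, `width`, `height`, and `label` keys in pixel units. A minimal sketch of reading them directly follows. The helper name `read_widget_boxes` and the filtering rules are assumptions. Unlabeled and zero-area boxes are skipped as a precaution.

```python
def read_widget_boxes(bboxes, drop_unlabeled=True):
    """Hypothetical helper: turn BBoxWidget-style dicts into pixel-corner records."""
    records = []
    for box in bboxes:
        label = box.get("label")
        if drop_unlabeled and not label:
            continue  # box without an assigned class
        if box["width"] <= 0 or box["height"] <= 0:
            continue  # zero-area box, e.g. from a stray click
        records.append({
            "class": label,
            "x0": box["x"], "y0": box["y"],
            "x1": box["x"] + box["width"], "y1": box["y"] + box["height"],
        })
    return records

# In the notebook this would be: raw = bbox_widget.bboxes
# The boxes below are illustrative, not real annotations.
raw = [
    {"x": 120, "y": 40, "width": 80, "height": 60, "label": "a"},
    {"x": 300, "y": 10, "width": 0, "height": 25, "label": "b"},
]
print(read_widget_boxes(raw))
# [{'class': 'a', 'x0': 120, 'y0': 40, 'x1': 200, 'y1': 100}]
```

The records are still in pixels. Converting them to seconds and hertz needs the image-to-axis mapping described in Step 4, after which `pd.DataFrame(records)` gives a table.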

## Step 5: Save Results (Optional)

Save the annotations to a CSV file for later use.

In [None]:
# Save annotations to CSV
if not annotations_df.empty:
    output_file = "audio_annotations.csv"
    annotations_df.to_csv(output_file, index=False)
    print(f"Annotations saved to: {output_file}")
else:
    print("No annotations to save.")
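A CSV containing only clip-relative times loses track of which file and offset it came from. The Step 2 log's time axis runs from 0.01 s to 4.98 s, which suggests the DataFrame times are relative to `st_time`. If so, storing the source path and absolute times makes the file usable on its own. The sketch below uses the standard `csv` module. `write_annotations_csv` and the `audio_file`, `abs_st_time`, and `abs_end_time` column names are hypothetical choices.

```python
import csv
import os
import tempfile

def write_annotations_csv(records, path, audio_path, clip_start=0.0):
    """Hypothetical helper: write box records with source file and absolute times.

    records: dicts with 'class', 'st_time', 'end_time', 'min_freq', 'max_freq',
    where times are assumed to be relative to clip_start (seconds).
    """
    fieldnames = ["audio_file", "class", "abs_st_time", "abs_end_time",
                  "min_freq", "max_freq"]
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for r in records:
            writer.writerow({
                "audio_file": audio_path,
                "class": r["class"],
                "abs_st_time": clip_start + r["st_time"],
                "abs_end_time": clip_start + r["end_time"],
                "min_freq": r["min_freq"],
                "max_freq": r["max_freq"],
            })

# Illustrative record; in the notebook: annotations_df.to_dict("records")
rows = [{"class": "a", "st_time": 1.25, "end_time": 3.75,
         "min_freq": 7500.0, "max_freq": 10500.0}]
out = os.path.join(tempfile.mkdtemp(), "audio_annotations.csv")
write_annotations_csv(rows, out, audio_path="clip.wav", clip_start=315)
with open(out, newline="") as f:
    print(next(csv.DictReader(f)))
# {'audio_file': 'clip.wav', 'class': 'a', 'abs_st_time': '316.25', 'abs_end_time': '318.75', 'min_freq': '7500.0', 'max_freq': '10500.0'}
```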

## Step 6: Verification

Check the spectrogram properties that the coordinate conversion depends on: the time and frequency axis ranges, the image dimensions, and the spectrogram shape.

In [None]:
# Display spectrogram properties
print("=== Spectrogram Properties ===")
print(f"Time axis range: {annotator.time_axis[0]:.2f}s - {annotator.time_axis[-1]:.2f}s")
print(f"Frequency axis range: {annotator.freq_axis[0]:.1f} - {annotator.freq_axis[-1]:.1f} Hz")
print(f"Image dimensions: {annotator.image_width} x {annotator.image_height} pixels")
print(f"Spectrogram shape: {annotator.spec_data.shape}")

# Show a sample of the time and frequency axes
print("\n=== Sample Time Axis (first 5 values) ===")
print(annotator.time_axis[:5])

print("\n=== Sample Frequency Axis (first 5 values) ===")
print(annotator.freq_axis[:5])
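Printing the ranges shows the inputs to the conversion. It does not confirm that a linear pixel-to-axis mapping is valid. A stricter check is sketched below with a hypothetical `axis_step` helper. It asserts that an axis is strictly increasing and evenly spaced, then returns its step. The stand-in axes assume a 512-sample window and 256-sample hop at 24000 Hz, which is consistent with the logged shape (257, 467). In the notebook you would pass `annotator.time_axis` and `annotator.freq_axis` instead.

```python
import numpy as np

def axis_step(axis, rtol=1e-6):
    """Return the spacing of a 1-D axis; raise ValueError unless it is strictly
    increasing and evenly spaced (the premise of a linear pixel mapping)."""
    axis = np.asarray(axis, dtype=float)
    steps = np.diff(axis)
    if np.any(steps <= 0):
        raise ValueError("axis is not strictly increasing")
    if not np.allclose(steps, steps[0], rtol=rtol, atol=0.0):
        raise ValueError("axis is not evenly spaced")
    return float(steps[0])

# Stand-in axes built from assumed STFT settings.
sr, win, hop = 24000, 512, 256
time_axis = (np.arange(467) * hop + win / 2) / sr
freq_axis = np.arange(win // 2 + 1) * sr / win

dt = axis_step(time_axis)
df = axis_step(freq_axis)
print(f"{dt * 1000:.2f} ms per frame, {df:.3f} Hz per bin")
# 10.67 ms per frame, 46.875 Hz per bin
```

Dividing the plotted time span by the plot width in pixels gives the seconds covered by one pixel. Box edges cannot be located more precisely than the coarser of that pixel size and `dt`. The same applies to frequency with `df`.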

## Summary

This notebook successfully:

1. ✅ **Loaded audio file** using OpenSoundscape with bandpass filtering
2. ✅ **Created spectrogram** and extracted time/frequency axes
3. ✅ **Saved spectrogram** as PNG with known dimensions
4. ✅ **Loaded the image** for annotation
5. ✅ **Created bounding box widget** with predefined classes
6. ✅ **Generated DataFrame** with the columns used in Step 4: `class`, `st_time`, `end_time`, `min_freq`, `max_freq`

The coordinate conversion maps each box's pixel coordinates in the image back to the original audio scales: time in seconds and frequency in Hz.