# Aural Alert: 3D Audio Emergency Signal System

#### Project Overview
This project develops a spatial audio system for emergency alerts that uses 3D audio techniques to make alerts more informative and easier to locate.

#### Toolkit
1. Dataset: UrbanSound8K
- Contains 8,732 labeled urban sound excerpts (up to 4 s each) in 10 classes
- Focus on class 8 (siren) for emergency alerts

2. Audio Processing Pipeline
- Input: Raw siren audio files
- Processing: 
    - Sample-rate normalization to 48 kHz
    - Spatial positioning using HRTFs (head-related transfer functions)
    - Urgency-level modulation
    - Room acoustics simulation

3. Emergency Signal Features
- Urgency Levels: Scale from 0-5
- Spatial Information
    - Distance (proximity in meters)
    - Direction (azimuth/elevation in degrees)

4. Audio Spatialization
- HRTF Implementation: For accurate 3D positioning
- VBAP (Vector Base Amplitude Panning): variable spread from $15^{\circ}$ to $45^{\circ}$
- Room Acoustics:
    - Office spaces (short reverb)
    - Hallways (long reverb)
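The office/hallway distinction above comes down to reverberation time (RT60). The pipeline plans to use a REAKTOR BRIR generator; as a rough stand-in for experimentation, the NumPy sketch below builds a mono room impulse response from Gaussian noise under an exponential envelope that falls 60 dB at RT60, then convolves it with the dry siren. The RT60 values (0.4 s office, 1.5 s hallway), the `wet` mix ratio, and all function names are illustrative assumptions, not measured rooms or part of the project's `utils` module.

```python
import numpy as np

# Illustrative RT60 values in seconds (assumptions, not measured rooms).
ROOM_RT60 = {"office": 0.4, "hallway": 1.5}


def synthetic_rir(rt60: float, sr: int = 48_000, seed: int = 0) -> np.ndarray:
    """Exponentially decaying Gaussian noise as a crude mono room impulse response."""
    n = int(rt60 * sr)
    t = np.arange(n) / sr
    # Amplitude envelope that falls by 60 dB (a factor of 1000) at t = rt60.
    envelope = 10.0 ** (-3.0 * t / rt60)
    rng = np.random.default_rng(seed)
    rir = rng.standard_normal(n) * envelope
    return rir / np.max(np.abs(rir))


def fft_convolve(x: np.ndarray, h: np.ndarray) -> np.ndarray:
    """Linear convolution via zero-padded real FFTs (much faster than np.convolve for long RIRs)."""
    n = len(x) + len(h) - 1
    nfft = 1 << (n - 1).bit_length()
    y = np.fft.irfft(np.fft.rfft(x, nfft) * np.fft.rfft(h, nfft), nfft)
    return y[:n]


def apply_room(dry: np.ndarray, room_type: str, sr: int = 48_000, wet: float = 0.3) -> np.ndarray:
    """Mix the dry signal with a reverberant copy; the reverb tail is truncated to len(dry)."""
    wet_signal = fft_convolve(dry, synthetic_rir(ROOM_RT60[room_type], sr))[: len(dry)]
    # Scale the reverberant copy to the dry signal's peak before mixing.
    peak = np.max(np.abs(dry))
    wet_signal *= peak / max(np.max(np.abs(wet_signal)), 1e-12)
    return (1.0 - wet) * dry + wet * wet_signal
```

For example, `apply_room(siren, "hallway")` returns a signal of the same length with a longer decay than `apply_room(siren, "office")`. A measured or generated BRIR would also carry the binaural cues that this mono noise model lacks.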


```python
# modules
import os
import soundata
import numpy as np
import pandas as pd
import librosa
from scipy import signal
import matplotlib.pyplot as plt
from IPython.display import Audio

# pipeline (project-local helper module)
from utils import SpatialProcessor
```

#### Using the UrbanSound8K Dataset and soundata for Data Validation
This project uses the siren class of the UrbanSound8K label taxonomy:
- `classID`: a numeric identifier of the sound class:
    - 0 = air_conditioner
    - 1 = car_horn
    - 2 = children_playing
    - 3 = dog_bark
    - 4 = drilling
    - 5 = engine_idling
    - 6 = gun_shot
    - 7 = jackhammer
    - 8 = siren
    - 9 = street_music

```python
# list the datasets available in soundata
for name in soundata.list_datasets():
    print(name)
```

```
dcase23_task2
dcase23_task4b
dcase23_task6a
dcase23_task6b
dcase_bioacoustic
dcase_birdVox20k
eigenscape
eigenscape_raw
esc50
freefield1010
fsd50k
fsdnoisy18k
marco
singapura
starss2022
tau2019sse
tau2019uas
tau2020sse_nigens
tau2020uas_mobile
tau2021sse_nigens
tau2022uas_mobile
tut2017se
urbansed
urbansound8k
warblrb10k
```


```python
dataset = soundata.initialize('urbansound8k', data_home='/Users/calodii/Desktop/stuff/home/threed-audio/final/aural-alert')
# download and validation only need to be run once
# dataset.download()
# dataset.validate()

# load the metadata table
metadata_path = os.path.join(dataset.data_home, 'metadata', 'UrbanSound8K.csv')
metadata = pd.read_csv(metadata_path)
```

# Dataset Development
#### Spatialization Pipeline
- Extract the siren-class clips from the UrbanSound8K dataset and normalize their sample rates to 48 kHz
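UrbanSound8K clips keep the sample rate of their original Freesound uploads, so rates vary from file to file. A minimal sketch of the normalization step, assuming SciPy's polyphase resampler is acceptable for the pipeline; `normalize_sample_rate` is a hypothetical helper, not a function of the project's `SpatialProcessor`. (`librosa.load(path, sr=48000)` can also resample at load time.)

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly

TARGET_SR = 48_000  # project-wide sample rate


def normalize_sample_rate(audio: np.ndarray, sr: int, target_sr: int = TARGET_SR) -> np.ndarray:
    """Resample a mono signal from sr to target_sr with polyphase filtering."""
    if sr == target_sr:
        return audio.astype(np.float32, copy=False)
    g = gcd(sr, target_sr)
    up, down = target_sr // g, sr // g  # 44100 -> 48000 gives up=160, down=147
    return resample_poly(audio, up, down).astype(np.float32)


# Example: a 4 s clip at 44.1 kHz, the rate reported for the first siren clip in this notebook
sr_in = 44_100
t = np.arange(4 * sr_in) / sr_in
clip = 0.5 * np.sin(2 * np.pi * 960 * t)
resampled = normalize_sample_rate(clip, sr_in)
print(len(resampled))  # 192000 samples = 4 s at 48 kHz
```

Reducing the ratio by the greatest common divisor keeps the polyphase filter small (160/147 instead of 48000/44100).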

#### Metadata Embedding
- Use metadata fields for:
    - Urgency Levels (0-5 scale)
    - Proximity (meters/dBSPL reference)
    - Spatial position (azimuth/elevation in degrees)
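The metadata fields above can be kept together in one record per clip. The sketch below is a hypothetical layout (field names and the 100 dB SPL at 1 m reference level are assumptions, not values from the dataset); it validates ranges, wraps azimuth, and converts proximity to a level with the free-field inverse-distance law, $L(d) = L_{\text{ref}} + 20\log_{10}(d_{\text{ref}}/d)$, i.e. about 6 dB less per doubling of distance.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class EmergencySignalMeta:
    """Hypothetical per-clip metadata record; field names are illustrative."""
    urgency: int                      # 0 (lowest) .. 5 (highest)
    distance_m: float                 # proximity in meters
    azimuth_deg: float                # horizontal direction in degrees
    elevation_deg: float = 0.0        # vertical direction in degrees
    ref_level_db_spl: float = 100.0   # assumed level at ref_distance_m
    ref_distance_m: float = 1.0

    def __post_init__(self):
        if not 0 <= self.urgency <= 5:
            raise ValueError(f"urgency must be in 0-5, got {self.urgency}")
        if self.distance_m <= 0:
            raise ValueError("distance_m must be positive")
        if not -90.0 <= self.elevation_deg <= 90.0:
            raise ValueError("elevation_deg must be in [-90, 90]")
        # Wrap azimuth into [0, 360) so that -90 and 270 describe the same direction.
        object.__setattr__(self, "azimuth_deg", self.azimuth_deg % 360.0)

    @property
    def urgency_norm(self) -> float:
        """Urgency rescaled to 0.0-1.0 for parameters that interpolate."""
        return self.urgency / 5

    @property
    def distance_gain(self) -> float:
        """Linear amplitude gain from the free-field inverse-distance (1/r) law."""
        return self.ref_distance_m / self.distance_m

    @property
    def level_db_spl(self) -> float:
        """Level at the listener; drops about 6 dB per doubling of distance."""
        return self.ref_level_db_spl + 20 * math.log10(self.distance_gain)


meta = EmergencySignalMeta(urgency=4, distance_m=10.0, azimuth_deg=-90.0)
print(meta.azimuth_deg, meta.urgency_norm, round(meta.level_db_spl, 1))  # 270.0 0.8 80.0
```

The inverse-distance law ignores air absorption and room reflections, so it is only a first approximation of proximity cues.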

#### Spatial Capture Simulation
- Process mono files through Ambisonic encoders (Pro Tools)
- Apply synthesized room acoustics using REAKTOR BRIR Generator for:
    - Hallways (long RT60)
    - Office Rooms (short RT60)
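For a single static source, a first-order Ambisonic encoder multiplies the mono signal by four direction-dependent gains. The sketch below assumes the AmbiX convention (ACN channel order W, Y, Z, X with SN3D normalization, azimuth counterclockwise from the front); it illustrates what such an encoder computes and is not a description of the Pro Tools plug-in's internals.

```python
import numpy as np


def encode_foa_ambix(mono: np.ndarray, azimuth_deg: float, elevation_deg: float = 0.0) -> np.ndarray:
    """Encode a mono signal as first-order Ambisonics in AmbiX (ACN order, SN3D).

    Azimuth is counterclockwise from straight ahead (90 deg = left); elevation is up.
    Returns an array of shape (4, n) with channels W, Y, Z, X.
    """
    az, el = np.radians(azimuth_deg), np.radians(elevation_deg)
    gains = np.array([
        1.0,                      # W (ACN 0): omnidirectional
        np.sin(az) * np.cos(el),  # Y (ACN 1): left-right
        np.sin(el),               # Z (ACN 2): up-down
        np.cos(az) * np.cos(el),  # X (ACN 3): front-back
    ])
    return gains[:, None] * mono[None, :]


# Example: a siren placed hard left, on the horizontal plane
bformat = encode_foa_ambix(np.ones(4), azimuth_deg=90.0)
```

Room acoustics can then be added per channel, or a binaural decoder can render the B-format signal for headphones.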

# 3D Audio Implementation
#### HRTF Spatialization
- Near-field HRTF compensation for sources closer than 1 m
- Dynamic ITD/ILD adjustment based on urgency (e.g., +15% ILD for critical signals)
- Map urgency to spectral brightness
- Directional encoding: $0^{\circ}$=fire, $120^{\circ}$=gas leak, $240^{\circ}$=evacuation route
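The interaural cues in the list above can be prototyped without an HRTF database. The sketch below swaps in a lightweight approximation, not HRTF filtering: the Woodworth spherical-head formula $\text{ITD} = \frac{a}{c}(\theta + \sin\theta)$ for the time difference, plus a crude broadband ILD that is boosted by 15% for critical signals. Head radius (8.75 cm), the 10 dB maximum ILD, the clockwise azimuth convention (90° = right), and the event keys are assumptions for illustration.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s
HEAD_RADIUS = 0.0875    # m, assumed average head radius
MAX_ILD_DB = 10.0       # assumed broadband ILD at 90 deg; real ILDs vary strongly with frequency

# Directional encoding from the list above (degrees, clockwise from front in this sketch).
EVENT_AZIMUTH = {"fire": 0.0, "gas_leak": 120.0, "evacuation_route": 240.0}


def woodworth_itd(azimuth_deg: float) -> float:
    """Woodworth spherical-head ITD in seconds; positive means the source is on the right."""
    theta = np.radians(azimuth_deg)
    # Fold rear directions onto the front so the formula's |theta| <= 90 deg domain holds.
    lateral = np.arcsin(np.sin(theta))
    return HEAD_RADIUS / SPEED_OF_SOUND * (lateral + np.sin(lateral))


def binaural_pan(mono: np.ndarray, azimuth_deg: float, sr: int = 48_000, critical: bool = False) -> np.ndarray:
    """Return a (2, n) left/right signal with an integer-sample ITD and a broadband ILD."""
    itd = woodworth_itd(azimuth_deg)
    ild_db = MAX_ILD_DB * np.sin(np.radians(azimuth_deg))
    if critical:
        ild_db *= 1.15  # +15% ILD for critical signals
    delay = int(round(abs(itd) * sr))
    near = mono
    far = np.concatenate([np.zeros(delay), mono])[: len(mono)]  # far ear hears it later
    # Split the ILD symmetrically: near ear up by half, far ear down by half.
    g_near = 10 ** (abs(ild_db) / 40)
    g_far = 10 ** (-abs(ild_db) / 40)
    if itd >= 0:  # source on the right: the right ear is the near ear
        left, right = g_far * far, g_near * near
    else:
        left, right = g_near * near, g_far * far
    return np.stack([left, right])
```

Usage: `binaural_pan(siren, EVENT_AZIMUTH["gas_leak"], critical=True)`. Because ITD and ILD are symmetric front to back, this model cannot separate 60° from 120°; resolving that front/back ambiguity is exactly what the spectral cues of a measured HRTF add.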

# VBAP Implementation
- Use Faust's VBAP library to widen the spread from $15^{\circ}$ (normal) to $45^{\circ}$ (urgent).
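The math behind the spread setting can be checked in NumPy before moving to Faust. The sketch below computes 2-D VBAP gains for a loudspeaker pair (solving $\mathbf{p}^{T} = \mathbf{g}^{T}\mathbf{L}$ for the gains and normalizing to constant power), then realizes spread with MDAP (Multiple-Direction Amplitude Panning): several virtual sources are panned across the spread arc and their gains summed. The hexagonal speaker ring, the linear urgency-to-spread mapping, and five virtual sources are assumptions; this is not Faust's implementation.

```python
import numpy as np

SPEAKERS_DEG = [0, 60, 120, 180, 240, 300]  # hypothetical hexagonal ring


def unit(angle_deg: float) -> np.ndarray:
    r = np.radians(angle_deg)
    return np.array([np.cos(r), np.sin(r)])


def vbap_pair_gains(source_deg, spk1_deg, spk2_deg):
    """2-D VBAP gains for one loudspeaker pair; None if the source lies outside the pair."""
    L = np.vstack([unit(spk1_deg), unit(spk2_deg)])  # rows are loudspeaker unit vectors
    g = unit(source_deg) @ np.linalg.inv(L)          # solve p^T = g^T L
    if np.any(g < -1e-9):
        return None
    return g / np.linalg.norm(g)  # constant-power normalization


def vbap_ring(source_deg, speakers=SPEAKERS_DEG) -> np.ndarray:
    """Gains for every speaker on the ring; only the active adjacent pair is non-zero."""
    gains = np.zeros(len(speakers))
    for i in range(len(speakers)):
        j = (i + 1) % len(speakers)
        g = vbap_pair_gains(source_deg, speakers[i], speakers[j])
        if g is not None:
            gains[[i, j]] = g
            return gains
    raise ValueError("source direction not covered by the speaker ring")


def urgency_to_spread(urgency_norm: float) -> float:
    """Linear map: 0.0 -> 15 deg (normal), 1.0 -> 45 deg (urgent)."""
    u = float(np.clip(urgency_norm, 0.0, 1.0))
    return 15.0 + 30.0 * u


def spread_gains(source_deg, spread_deg, n_virtual=5, speakers=SPEAKERS_DEG) -> np.ndarray:
    """MDAP-style spread: pan virtual sources across the arc, sum, renormalize power."""
    offsets = np.linspace(-spread_deg / 2, spread_deg / 2, n_virtual)
    total = sum(vbap_ring(source_deg + o, speakers) for o in offsets)
    return total / np.linalg.norm(total)
```

For example, `spread_gains(0.0, urgency_to_spread(1.0))` leaks more energy into the 60° and 300° speakers than `spread_gains(0.0, urgency_to_spread(0.0))`, so an urgent alert sounds wider while keeping the same total power.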

# Without Spatial Processing

```python
# initialize the dataset and the spatial processor
dataset = soundata.initialize('urbansound8k', data_home='/Users/calodii/Desktop/stuff/home/threed-audio/final/aural-alert')
processor = SpatialProcessor(sample_rate=48000, dataset=dataset)

# get the first siren clip (classID 8)
metadata = pd.read_csv(os.path.join(dataset.data_home, 'metadata', 'UrbanSound8K.csv'))
siren_files = metadata[metadata['classID'] == 8]
first_siren = siren_files.iloc[0]
track_id = first_siren['slice_file_name'].split('.')[0]

# load the audio
audio, sr = processor.load_audio(track_id)

# display clip info
print(f"File name: {first_siren['slice_file_name']}")
print(f"Class: {first_siren['class']}")
print(f"Sample rate: {sr} Hz")
print(f"Duration: {len(audio)/sr:.2f} seconds")

# play the audio
Audio(audio, rate=sr)
```

```
File name: 102853-8-0-0.wav
Class: siren
Sample rate: 44100 Hz
Duration: 4.00 seconds
```


# With Spatial Processing

```python
# initialize the dataset and the spatial processor
dataset = soundata.initialize('urbansound8k', data_home='/Users/calodii/Desktop/stuff/home/threed-audio/final/aural-alert')
processor = SpatialProcessor(sample_rate=48000, dataset=dataset)

# get the siren files from the metadata (note the capital K in the file name)
metadata_path = os.path.join(dataset.data_home, 'metadata', 'UrbanSound8K.csv')
metadata = pd.read_csv(metadata_path)
siren_files = metadata[metadata['classID'] == 8]

# process the first siren as an example
first_siren = siren_files.iloc[0]
track_id = first_siren['slice_file_name'].split('.')[0]

result = processor.process_emergency_signal(
    track_id,
    urgency=0.8,
    azimuth=0,
    room_type='office'
)

# play the result
Audio(result.T, rate=processor.sample_rate)
```

```
FileNotFoundError: [Errno 2] No such file or directory: 'hrtf_database.npy'
```