# SereneSense Quickstart Demo

Welcome to SereneSense! This notebook demonstrates the basic workflow for detecting military vehicle sounds using a pre-trained model.

**Duration**: ~5 minutes
**Topics**: Model loading, inference, visualization

## Installation Check

First, verify that the required packages are installed. Any that are missing are installed with pip:

In [None]:
import subprocess
import sys

# Verify packages (matplotlib is included because the import cell below uses it)
packages = ['torch', 'torchaudio', 'librosa', 'soundfile', 'matplotlib', 'plotly', 'numpy']
missing = []

for package in packages:
    try:
        __import__(package)
        print(f'✓ {package} installed')
    except ImportError:
        missing.append(package)
        print(f'✗ {package} NOT installed')

if missing:
    print(f'\nInstalling missing packages: {missing}')
    subprocess.check_call([sys.executable, '-m', 'pip', 'install'] + missing)
    print('Installation complete!')
else:
    print('\n✓ All packages installed!')
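
The check above imports each package, which can be slow for large libraries such as `torch`. As a lighter alternative, the sketch below only asks Python whether each package can be located, without importing it. The helper name `find_missing` is hypothetical and not part of SereneSense.

```python
import importlib.util


def find_missing(packages):
    """Return the names in `packages` that Python cannot locate for import."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in packages if importlib.util.find_spec(name) is None]


# Same package list as the notebook, plus matplotlib (used by the import cell).
required = ['torch', 'torchaudio', 'librosa', 'soundfile', 'matplotlib', 'plotly', 'numpy']
missing = find_missing(required)
print('Missing packages:', missing if missing else 'none')
```

This only reports what is missing. Installing the result is left to the `pip` call shown in the cell above.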

## Import Libraries

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import plotly.graph_objects as go
import librosa
import soundfile as sf
import torch
from pathlib import Path
import warnings
warnings.filterwarnings('ignore')

print('✓ All imports successful')

## Load Pre-trained Model

Load the AudioMAE model for military vehicle sound detection:

In [None]:
from src.core.models.audioMAE.model import AudioMAE, AudioMAEConfig
from src.core.core.model_manager import ModelManager

# Initialize model
model_manager = ModelManager(model_type='audioMAE')
print(f'✓ Model loaded: {model_manager.model_type}')
print(f'  Device: {model_manager.device}')
print(f'  Model size: {sum(p.numel() for p in model_manager.model.parameters()) / 1e6:.1f}M parameters')

## Create Sample Audio

Since we don't have a real audio file, let's create a synthetic audio sample:

In [None]:
# Create synthetic audio for demonstration
sr = 16000  # Sample rate
duration = 5  # 5 seconds
t = np.linspace(0, duration, int(sr * duration), endpoint=False)  # spacing of exactly 1/sr

# Create a synthetic signal from several tones (a rough stand-in for a vehicle sound)
frequencies = [100, 250, 500, 1000]  # Tone frequencies in Hz
amplitudes = [0.3, 0.2, 0.15, 0.1]
audio = np.zeros_like(t)

for freq, amp in zip(frequencies, amplitudes):
    audio += amp * np.sin(2 * np.pi * freq * t)

# Add some noise
audio += 0.05 * np.random.randn(len(audio))

# Normalize
audio = audio / np.max(np.abs(audio)) * 0.95

print(f'✓ Created synthetic audio')
print(f'  Sample rate: {sr} Hz')
print(f'  Duration: {len(audio) / sr:.2f} seconds')
print(f'  Audio shape: {audio.shape}')
print(f'  Audio range: [{audio.min():.3f}, {audio.max():.3f}]')
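
To listen to the synthetic signal, it helps to save it as a WAV file. The sketch below is one way to do that with only the Python standard library. The helper `write_wav_16bit` and the file name `synthetic_demo.wav` are hypothetical, and the demo generates its own short tone so the block runs on its own. In the notebook you could pass `audio` and `sr` instead.

```python
import array
import math
import os
import sys
import tempfile
import wave


def write_wav_16bit(path, samples, sample_rate):
    """Write mono float samples in [-1, 1] to a 16-bit PCM WAV file.

    Returns the number of frames written.
    """
    # Clip each sample to [-1, 1], then scale to the signed 16-bit range.
    pcm = array.array(
        'h', (int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples)
    )
    if sys.byteorder == 'big':
        pcm.byteswap()  # WAV data is little-endian
    with wave.open(path, 'wb') as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm.tobytes())
    return len(pcm)


# Demo: a 0.1 s, 100 Hz tone at 16 kHz, written to the system temp directory.
sample_rate = 16000
tone = [0.5 * math.sin(2 * math.pi * 100 * n / sample_rate) for n in range(1600)]
demo_path = os.path.join(tempfile.gettempdir(), 'synthetic_demo.wav')
frames_written = write_wav_16bit(demo_path, tone, sample_rate)
print(f'Wrote {frames_written} frames to {demo_path}')
```

The function accepts any sequence of floats, so a one-dimensional NumPy array works as well as a list.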

## Visualize Audio

Let's visualize the waveform and spectrogram:

In [None]:
# Create figure with subplots
fig, axes = plt.subplots(2, 1, figsize=(14, 8))

# Waveform
axes[0].plot(t, audio, linewidth=0.5)
axes[0].set_xlabel('Time (s)')
axes[0].set_ylabel('Amplitude')
axes[0].set_title('Waveform')
axes[0].grid(True, alpha=0.3)

# Spectrogram (Mel scale)
mel_spec = librosa.feature.melspectrogram(
    y=audio, sr=sr, n_mels=64, fmax=8000
)
mel_spec_db = librosa.power_to_db(mel_spec, ref=np.max)

# Use librosa's display helper so the y-axis ticks follow the mel scale.
# imshow with a linear 0-8000 Hz extent would mislabel the mel bands.
import librosa.display

im = librosa.display.specshow(
    mel_spec_db, sr=sr, x_axis='time', y_axis='mel', fmax=8000,
    cmap='viridis', ax=axes[1]
)
axes[1].set_xlabel('Time (s)')
axes[1].set_ylabel('Frequency (Hz)')
axes[1].set_title('Mel-Scale Spectrogram')
plt.colorbar(im, ax=axes[1], label='Power (dB)')

plt.tight_layout()
plt.show()

print('✓ Visualizations created')

## Run Inference

Process the audio through the model:

In [None]:
# Prepare audio for model
# Model expects input shape: (batch_size, num_samples)
audio_tensor = torch.from_numpy(audio).float().unsqueeze(0)  # float32, add batch dimension

# Run inference
model_manager.model.eval()  # disable dropout and other training-only behavior
with torch.no_grad():
    # Always move the input to the model's device, not only when CUDA is available
    audio_tensor = audio_tensor.to(model_manager.device)

    # Get model predictions
    outputs = model_manager.model(audio_tensor)

print('✓ Inference complete')
print(f'  Output shape: {outputs.shape}')
print(f'  Output type: {type(outputs)}')

## Display Results

Show the detection results:

In [None]:
# Define class labels (from MAD dataset)
class_labels = {
    0: 'Helicopter',
    1: 'Fighter Aircraft',
    2: 'Military Vehicle',
    3: 'Truck',
    4: 'Footsteps',
    5: 'Speech',
    6: 'Background'
}

# Get predictions
with torch.no_grad():
    logits = outputs
    probabilities = torch.softmax(logits, dim=1)
    top_prob, top_class = torch.max(probabilities, dim=1)

predicted_class = top_class.item()
confidence = top_prob.item()
predicted_label = class_labels[predicted_class]

print(f'🎯 Detection Results:')
print(f'   Predicted Class: {predicted_label} (ID: {predicted_class})')
print(f'   Confidence: {confidence * 100:.2f}%')
print()
print(f'📊 All Class Probabilities:')
for i in range(len(class_labels)):
    prob = probabilities[0, i].item()
    print(f'   {class_labels[i]:20s}: {prob * 100:6.2f}%')
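
The cell above turns raw model outputs (logits) into probabilities with softmax and then picks the largest one. The same steps can be sketched in plain Python to show what happens numerically. The logit values below are made up for illustration and are not model output, and the helpers `softmax` and `top_k` are hypothetical names.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probs, labels, k=3):
    """Return the k (label, probability) pairs with the highest probability."""
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


labels = ['Helicopter', 'Fighter Aircraft', 'Military Vehicle', 'Truck',
          'Footsteps', 'Speech', 'Background']
example_logits = [2.0, 0.5, 1.0, -1.0, 0.0, -0.5, 0.2]  # illustrative values only

probs = softmax(example_logits)
for label, p in top_k(probs, labels):
    print(f'{label:20s}: {p * 100:6.2f}%')
```

Reporting the top few classes instead of only the best one makes it easier to spot cases where the model is split between two similar classes.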

## Visualize Predictions

Create an interactive visualization of the results:

In [None]:
# Extract probabilities
probs_np = probabilities[0].cpu().numpy()
labels = [class_labels[i] for i in range(len(class_labels))]

# Create interactive bar chart
fig = go.Figure(data=[
    go.Bar(
        x=labels,
        y=probs_np * 100,
        marker=dict(
            color=probs_np * 100,  # percent, to match the colorbar title
            colorscale='Viridis',
            showscale=True,
            colorbar=dict(title='Probability (%)')
        ),
        text=[f'{p*100:.1f}%' for p in probs_np],
        textposition='auto',
    )
])

fig.update_layout(
    title='SereneSense: Military Vehicle Sound Detection',
    xaxis_title='Sound Class',
    yaxis_title='Probability (%)',
    height=500,
    hovermode='x unified',
    template='plotly_white'
)

fig.show()

## Next Steps

Congratulations! You've successfully run inference with SereneSense. Here are some things you can try next:

1. **Load Real Audio**: Replace the synthetic audio with your own WAV/MP3 files
2. **Batch Processing**: Process multiple audio files at once
3. **Real-time Detection**: See `05_deployment_walkthrough.ipynb` for real-time streaming
4. **Model Training**: See `03_model_training.ipynb` to train your own model
5. **Edge Optimization**: See `04_edge_optimization.ipynb` to optimize for edge devices
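
For real recordings and batch processing, a common first step is to cut a long recording into fixed-length windows and run each one through the model. The sketch below only computes the window boundaries. The helper `window_bounds` is hypothetical, and the 5-second window simply mirrors the synthetic example above; whether the model requires that length is an assumption.

```python
def window_bounds(num_samples, sample_rate, window_s=5.0, hop_s=2.5):
    """Return (start, end) sample indices of fixed-length, overlapping windows.

    Windows that would run past the end of the recording are dropped.
    """
    window = int(round(window_s * sample_rate))
    hop = int(round(hop_s * sample_rate))
    if window <= 0 or hop <= 0:
        raise ValueError('window_s and hop_s must be positive')

    bounds = []
    start = 0
    while start + window <= num_samples:
        bounds.append((start, start + window))
        start += hop
    return bounds


# Demo: a 12 s recording at 16 kHz, 5 s windows with 50% overlap.
bounds = window_bounds(12 * 16000, 16000)
for start, end in bounds:
    print(f'samples {start}..{end} ({start / 16000:.1f}s to {end / 16000:.1f}s)')
```

Each `(start, end)` pair can be used to slice a loaded waveform, for example `audio[start:end]`, before stacking the slices into a batch.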

For more details, see the [Documentation](../docs/) and the [GitHub Repository](https://github.com/serenesense/serenesense).