Conservation research platform that restores corrupted elephant rumble recordings using an unsupervised, non-generative three-branch ML ensemble, classifies calls against published bioacoustic literature, separates simultaneous speakers, and reports behavioral responses to mechanical noise.
Built for the ElephantVoices track at HackSMU VII.
Given a dataset of elephant recordings contaminated with airplane, vehicle, generator, or mixed noise, EchoSave:
- Detects the rumble fundamental (10–22 Hz).
- Builds an energy-gated harmonic protection mask — only keeping harmonics where call-window energy exceeds local inter-harmonic noise by ≥ 6 dB.
- Runs three independent denoising branches in parallel:
  - Classical: minimum-statistics noise tracking, over-subtraction, and Wiener gain fused in gain-space.
  - HPSS: harmonic-percussive source separation with asymmetric margins (1.0, 5.0) that strongly penalize transient/broadband noise.
  - NMF: unsupervised non-negative matrix factorization with a post-hoc harmonic-band component classifier. Learns a per-file signal and noise subspace with zero training data.
- Picks the branch with the largest measured SNR-in-band improvement, or softmax-fuses them when all three are within 2 dB.
- Separates simultaneous callers per annotated call window.
- Scores each call against the Poole/ElephantVoices ethogram.
- Reports behavioral signals in softened, biologist-defensible language.
- Lets the researcher ask Gemini scientific questions with the full analysis context injected server-side.
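The energy gate in the harmonic protection mask can be illustrated with a minimal sketch. It assumes a one-dimensional power spectrum averaged over the call window, compares mean power near each harmonic against mean power in the gap to its neighbours, and keeps the harmonic only when the excess reaches 6 dB. The function name, the ±2 Hz band half-width, and the use of per-bin means are illustrative assumptions, not the code in `core.py`.

```python
import math

def energy_gated_harmonics(power, freqs_hz, f0_hz, n_harmonics=4,
                           half_width_hz=2.0, gate_db=6.0):
    """Return harmonic numbers whose band power beats the local gap noise by gate_db."""
    eps = 1e-12  # avoids log(0) on silent bins
    kept = []
    for k in range(1, n_harmonics + 1):
        centre = k * f0_hz
        # Bins within half_width_hz of the k-th harmonic.
        band = [p for p, f in zip(power, freqs_hz)
                if abs(f - centre) <= half_width_hz]
        # Bins between this harmonic band and the midpoint to its neighbours.
        gap = [p for p, f in zip(power, freqs_hz)
               if half_width_hz < abs(f - centre) <= f0_hz / 2]
        if not band or not gap:
            continue  # harmonic falls outside the analysed range
        band_mean = sum(band) / len(band)
        gap_mean = sum(gap) / len(gap)
        excess_db = 10 * math.log10((band_mean + eps) / (gap_mean + eps))
        if excess_db >= gate_db:
            kept.append(k)
    return kept

# Toy spectrum (hypothetical data): 1 Hz bins, flat noise floor, strong
# harmonics at 15 and 30 Hz, and a weak one at 45 Hz that is gated out.
freqs = [float(f) for f in range(0, 101)]
power = [1.0] * len(freqs)
power[15], power[30], power[45] = 100.0, 100.0, 2.0
kept = energy_gated_harmonics(power, freqs, f0_hz=15.0)
print(kept)  # [1, 2]
```

Gating on a local ratio rather than an absolute level means a harmonic buried in noise is left unprotected instead of being preserved along with the noise around it.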
Every branch is non-generative by construction: it only masks, subtracts, or re-weights existing STFT bins. No model ever synthesizes new spectrum. That property is asserted by the acceptance tests.
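One way such a property could be checked is sketched below. It assumes every branch reduces to a per-bin gain in [0, 1] applied to the input STFT magnitudes, so no output bin can exceed its input bin. The helper names are hypothetical and the actual acceptance tests may assert this differently.

```python
def apply_gain(magnitudes, gains):
    """Re-weight each STFT bin magnitude by a gain clipped to [0, 1]."""
    return [m * min(1.0, max(0.0, g)) for m, g in zip(magnitudes, gains)]

def is_non_generative(input_mag, output_mag, tol=1e-9):
    """True if no output bin carries more magnitude than the matching input bin."""
    return all(o <= i + tol for i, o in zip(input_mag, output_mag))

# Flattened toy magnitudes (hypothetical values).
raw = [0.0, 3.0, 0.5, 2.0]
restored = apply_gain(raw, [0.9, 1.0, 0.1, 1.7])  # the 1.7 gain is clipped to 1.0
assert is_non_generative(raw, restored)
```

A bin that is zero in the input stays zero in the output, which is the practical meaning of "no synthesized spectrum" under this assumption.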
On 1989-06_airplane_01.wav:
| Branch | Improvement |
|---|---|
| classical | +1.55 dB |
| HPSS | +1.10 dB |
| NMF (winner) | +15.82 dB |
Stability across mask widths (±2 / ±4 / ±6 Hz): +8.97 / +15.82 / +21.20 dB — grows monotonically with bandwidth, not a knife-edge.
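The winner for this file follows from the selection rule described earlier. The sketch below assumes "within 2 dB" means the spread between the best and worst branch is at most 2 dB, and that fusion weights are a plain softmax over the dB improvements. Both readings, and the function name, are assumptions rather than the behavior of `restoration_engine.py`.

```python
import math

def select_or_fuse(improvement_db, tie_window_db=2.0):
    """Return (label, weights): the winning branch, or softmax weights on a near-tie."""
    values = list(improvement_db.values())
    if max(values) - min(values) <= tie_window_db:
        # Near-tie: softmax over the dB improvements (shifted for numerical stability).
        peak = max(values)
        exps = {name: math.exp(v - peak) for name, v in improvement_db.items()}
        total = sum(exps.values())
        return "fused", {name: e / total for name, e in exps.items()}
    winner = max(improvement_db, key=improvement_db.get)
    return winner, {name: float(name == winner) for name in improvement_db}

# Measured improvements for 1989-06_airplane_01.wav, from the table above.
label, weights = select_or_fuse({"classical": 1.55, "HPSS": 1.10, "NMF": 15.82})
print(label)  # NMF
```

With a 14.72 dB spread between branches, this file is far outside the tie window, so no fusion takes place.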
| Category | n | mean Δ dB | n_pass ≥ 5 dB | range |
|---|---|---|---|---|
| airplane | 16 | +9.40 | 13/16 | +2.6 … +22.1 |
| car | 14 | +9.95 | 10/14 | +2.2 … +26.4 |
| generator | 5 | +5.89 | 3/5 | +3.7 … +8.9 |
| mixed | 3 | +16.40 | 3/3 | +7.3 … +34.1 |
| total | 38 | +9.70 | 29 / 38 (76%) | — |
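The total row can be cross-checked from the per-category rows. The short sketch below recomputes the file count, pass count, and file-weighted mean from the rounded table values; the weighted mean comes out at about +9.69 dB, which agrees with the reported +9.70 dB to within rounding of the category means.

```python
# (n files, mean improvement in dB, files passing >= 5 dB), copied from the table.
rows = {
    "airplane": (16, 9.40, 13),
    "car": (14, 9.95, 10),
    "generator": (5, 5.89, 3),
    "mixed": (3, 16.40, 3),
}
total_n = sum(n for n, _, _ in rows.values())
total_pass = sum(p for _, _, p in rows.values())
# Weight each category mean by its file count.
weighted_mean = sum(n * mean for n, mean, _ in rows.values()) / total_n
pass_rate = total_pass / total_n
print(total_n, total_pass, round(weighted_mean, 2), round(100 * pass_rate))
# 38 29 9.69 76
```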
The generator and mixed categories were initially the worst. The fix was
lowering the mains-notch quality factor from Q=30 to Q=10, which widens
the rejection band to ~5 Hz, enough to catch real-world mains hum (which
drifts with engine speed and load). Airplane and car files don't invoke
the notch path, so they were unaffected.
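The link between Q and rejection bandwidth comes from the standard definition Q = f0 / bandwidth for a second-order notch. The sketch below assumes a 50 Hz notch centre, which is not stated here but is consistent with the ~5 Hz figure at Q=10.

```python
def notch_bandwidth_hz(centre_hz, q):
    """-3 dB bandwidth of a second-order notch, from the definition Q = f0 / bandwidth."""
    return centre_hz / q

MAINS_HZ = 50.0  # assumption: the notch centre frequency is not given in this README
narrow = notch_bandwidth_hz(MAINS_HZ, 30)
wide = notch_bandwidth_hz(MAINS_HZ, 10)
print(f"Q=30: {narrow:.2f} Hz, Q=10: {wide:.2f} Hz")
# Q=30: 1.67 Hz, Q=10: 5.00 Hz
```

Under that assumption, Q=30 tolerates only about ±0.8 Hz of drift around the centre, while Q=10 tolerates about ±2.5 Hz.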
Honest framing: NMF wins on every file or fuses with the others on ties. Classical and HPSS are independent witnesses — when they agree the signal is real, that's evidence NMF didn't hallucinate. The generator category is the hardest because broadband + mains-tonal noise overlaps the rumble band.
# Backend
cd backend
pip install -r requirements.txt
# ensure backend/.env has GEMINI_API_KEY and ECHOSAVE_DATASET
python precompute_demo.py # build frontend hero asset
python -m uvicorn main:app --port 8000
# Frontend (separate shell)
cd frontend
npm install
npm run dev # http://127.0.0.1:5173

| Key | Action |
|---|---|
| R | Run restoration on the selected file |
| J | Next recording |
| K | Previous recording |
| A | Swap audible source (Before / After) |
| Space | Play / pause both waveforms |
| Esc | Close measurement-protocol modal |

cd backend
python -m unittest tests.test_dsp -v

All five tests must pass before the frontend is considered runnable:

- `test_hpf_20hz_preserved`: 20 Hz sine survives the pipeline within 1 dB
- `test_notch_150hz_drops`: 150 Hz drops ≥ 10 dB while 20 Hz survives
- `test_nmf_converges`: Branch C produces finite W/H with ≥ 1 signal component
- `test_hpss_nonzero`: Branch B gives nonzero harmonic AND percussive energy
- `test_real_file`: full pipeline on DEMO_FILE gets ≥ 5 dB improvement

echosave/
├── backend/
│   ├── config.py  # SR_WORK, NPERSEG, NOVERLAP, NMF settings
│   ├── core.py  # STFT, F0, energy-gated mask, SNR
│   ├── branches/
│   │   ├── classical.py  # Branch A
│   │   ├── hpss.py  # Branch B
│   │   └── nmf.py  # Branch C
│   ├── restoration_engine.py  # ensemble orchestration + stability
│   ├── caller_separator.py  # per-call peak counting + bin assignment
│   ├── call_classifier.py  # Poole ethogram scoring
│   ├── behavioral_analyzer.py  # pre/post calling-rate statistics
│   ├── dataset.py  # spreadsheet loader
│   ├── gemini_proxy.py  # server-side Gemini streaming
│   ├── main.py  # FastAPI routes + SSE
│   ├── precompute_demo.py  # build-time hero asset
│   ├── validation/run_on_one_file.py  # human-inspection harness
│   └── tests/test_dsp.py
└── frontend/
    └── src/
        ├── components/
        │   ├── HeroScreen.jsx
        │   ├── StudioScreen.jsx
        │   ├── ArchiveScreen.jsx
        │   ├── SpectrogramCanvas.jsx
        │   ├── PipelineAnimation.jsx
        │   ├── EnsembleComparison.jsx
        │   ├── CallCard.jsx
        │   ├── DualPlayback.jsx
        │   ├── GeminiSidebar.jsx
        │   ├── MeasurementProtocolModal.jsx
        │   └── BehavioralPanel.jsx
        └── lib/{api,sse,motion}.js

- All numbers in the UI are measured. No fabrication. When the classifier confidence is below 45, the label reads "Unclassified — insufficient acoustic match."
- Behavioral insights use the phrasing "consistent with documented X", never "we discovered the elephant was stressed."
- The NMF component classifier is a post-hoc rule based on protected-band energy fraction, not a trained supervised model.
- Gemini answers are labeled "AI-assisted interpretation" and are never ground truth.
- The SNR-in-band mask is derived from RAW audio exactly once and reused unchanged for every branch's measurement — the metric is provably non-circular.
- Stability at three mask widths is reported alongside the headline number so a judge can confirm the result isn't knife-edge.
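The fixed-mask measurement can be sketched as follows. This assumes SNR-in-band is the ratio of power inside the mask to power outside it, in dB, and that improvement is the difference between restored and raw audio under the same mask. The exact definition in `core.py` may differ, and the function names are hypothetical.

```python
import math

def snr_in_band_db(power, mask):
    """Ratio of in-mask power to out-of-mask power, in dB."""
    inside = sum(p for p, m in zip(power, mask) if m)
    outside = sum(p for p, m in zip(power, mask) if not m)
    return 10 * math.log10(inside / outside)

def improvement_db(raw_power, restored_power, mask):
    """SNR gain of a branch, measured with one mask derived from the raw audio."""
    return snr_in_band_db(restored_power, mask) - snr_in_band_db(raw_power, mask)

# Toy per-bin powers (hypothetical values); the mask is built once from the raw signal.
mask = [False, True, False, False]
raw_power = [1.0, 10.0, 1.0, 1.0]
restored_power = [0.1, 10.0, 0.1, 0.1]  # out-of-band noise cut by a factor of 10
gain = improvement_db(raw_power, restored_power, mask)
print(round(gain, 2))  # 10.0
```

Because the mask never sees any branch's output, a branch cannot raise its own score by shifting where the mask looks.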