
Jonathan0607/Rumblr


Rumblr

Conservation research platform that restores corrupted elephant rumble recordings using an unsupervised, non-generative three-branch ML ensemble, classifies calls against published bioacoustic literature, separates simultaneous speakers, and reports behavioral responses to mechanical noise.

Built for the ElephantVoices track at HackSMU VII.

What it does

Given a dataset of elephant recordings contaminated with airplanes, vehicles, generators, or mixed noise, EchoSave:

  1. Detects the rumble fundamental (10–22 Hz).
  2. Builds an energy-gated harmonic protection mask, keeping only harmonics whose call-window energy exceeds the local inter-harmonic noise by ≥ 6 dB.
  3. Runs three independent denoising branches in parallel:
    • Classical — minimum-statistics noise tracking, over-subtraction, and Wiener gain fused in gain-space.
    • HPSS — harmonic-percussive source separation with asymmetric margins (1.0, 5.0) that strongly penalize transient/broadband noise.
    • NMF — unsupervised non-negative matrix factorization with a post-hoc harmonic-band component classifier. Learns a per-file signal and noise subspace with zero training data.
  4. Picks the branch with the largest measured SNR-in-band improvement, or softmax-fuses them when all three are within 2 dB.
  5. Separates simultaneous callers per annotated call window.
  6. Scores each call against the Poole/ElephantVoices ethogram.
  7. Reports behavioral signals in softened, biologist-defensible language.
  8. Lets the researcher ask Gemini scientific questions with the full analysis context injected server-side.
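Step 2 can be sketched as follows. This is a minimal illustration, not the code in `core.py`. It assumes a power spectrogram whose bins are labelled in Hz, compares the mean power per bin inside each harmonic band against two small inter-harmonic probe bands at (k ± 0.5)·F0, and returns a frequency-only mask. The function name `energy_gated_mask`, the ±4 Hz half-width default, and the 0.25·F0 probe width are hypothetical choices made for the example.

```python
import numpy as np


def energy_gated_mask(power, freqs, f0, call_frames, half_width_hz=4.0, gate_db=6.0):
    """Hypothetical sketch of an energy-gated harmonic protection mask.

    power:       (n_freq, n_frames) STFT power of the RAW recording
    freqs:       (n_freq,) bin centre frequencies in Hz
    f0:          detected rumble fundamental in Hz
    call_frames: slice selecting the frames of the annotated call window
    Returns a boolean (n_freq,) mask protecting only harmonics whose call-window
    power beats the local inter-harmonic noise by at least gate_db.
    """
    call_power = power[:, call_frames].mean(axis=1)  # mean power per bin during the call
    noise_half_width = 0.25 * f0                     # assumed width of each noise probe band
    mask = np.zeros(freqs.shape, dtype=bool)
    k = 1
    while k * f0 <= freqs[-1]:
        fk = k * f0
        band = np.abs(freqs - fk) <= half_width_hz
        # Local noise: bins around the midpoints between this harmonic and its neighbours.
        noise = (np.abs(freqs - (fk - 0.5 * f0)) <= noise_half_width) | (
            np.abs(freqs - (fk + 0.5 * f0)) <= noise_half_width
        )
        noise &= ~band  # never count harmonic bins as noise
        if band.any() and noise.any():
            ratio = call_power[band].mean() / (call_power[noise].mean() + 1e-12)
            if 10.0 * np.log10(ratio + 1e-12) >= gate_db:
                mask |= band  # harmonic clearly present: protect it
        k += 1
    return mask
```

Because the gate is evaluated per harmonic, a harmonic buried in noise is left unprotected rather than shielded from subtraction.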

Every branch is non-generative by construction: it only masks, subtracts, or re-weights existing STFT bins. No model ever synthesizes new spectrum. That property is asserted by the acceptance tests.
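One way to turn that property into an executable check is sketched below. It assumes each branch's output STFT equals the input STFT multiplied bin-by-bin by a real gain in [0, 1], which holds for masking, floored subtraction, and Wiener-style gains. The helper name `is_non_generative` is hypothetical and is not taken from `tests/test_dsp.py`.

```python
import numpy as np


def is_non_generative(stft_in, stft_out, tol=1e-9):
    """Hypothetical check: every output bin is its input bin scaled by a real gain in [0, 1].

    Fails if any bin gains magnitude (new spectrum), if energy appears in a bin that
    was empty, or if a bin's phase is rotated.
    """
    mag_in = np.abs(stft_in)
    mag_out = np.abs(stft_out)
    if np.any(mag_out > mag_in + tol):
        return False  # a bin got louder than the raw recording
    gain = np.divide(mag_out, mag_in, out=np.zeros_like(mag_in), where=mag_in > tol)
    # With a real, non-negative gain the output must equal gain * input (same phase).
    return bool(np.allclose(stft_out, gain * stft_in, atol=1e-6))
```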

Results

Demo file

On 1989-06_airplane_01.wav:

| Branch | Improvement |
|---|---|
| classical | +1.55 dB |
| HPSS | +1.10 dB |
| NMF (winner) | +15.82 dB |

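Stability across mask widths (±2 / ±4 / ±6 Hz): +8.97 / +15.82 / +21.20 dB. The headline +15.82 dB is the ±4 Hz value; the improvement grows monotonically with mask width instead of hinging on a single setting, so it is not a knife-edge result.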
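A stability check of this kind can be sketched as follows, under stated assumptions: SNR-in-band is taken here as in-mask power over out-of-mask power inside an analysis band, the harmonic mask is built once from the RAW recording's F0, and that same mask scores both the raw and the restored spectra. This README does not spell out the exact definition used in `core.py`; the function names and the 5–100 Hz analysis band are hypothetical.

```python
import numpy as np


def harmonic_mask(freqs, f0, half_width_hz, f_max):
    """Bins within +/- half_width_hz of any harmonic k*f0 <= f_max (built once from RAW audio)."""
    harmonics = np.arange(1, int(f_max // f0) + 1) * f0
    dist = np.abs(freqs[:, None] - harmonics[None, :]).min(axis=1)
    return (dist <= half_width_hz) & (freqs <= f_max)


def snr_in_band_db(power, mask, band):
    """In-mask power over out-of-mask power, restricted to the analysis band."""
    return 10.0 * np.log10(power[mask & band].sum() / power[~mask & band].sum())


def stability(raw_power, restored_power, freqs, f0, widths=(2.0, 4.0, 6.0), f_max=100.0):
    """Improvement (restored minus raw SNR-in-band) at several mask half-widths."""
    band = (freqs >= 5.0) & (freqs <= f_max)
    out = {}
    for w in widths:
        mask = harmonic_mask(freqs, f0, w, f_max)  # depends only on the RAW f0, never on a branch
        out[w] = snr_in_band_db(restored_power, mask, band) - snr_in_band_db(raw_power, mask, band)
    return out
```

In a toy case where a "restoration" keeps the ±4 Hz mask and attenuates everything else by a factor of 10, the ±4 Hz improvement is exactly 10 dB by construction; such toy values say nothing about the real recordings.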

Full-dataset audit (38 annotated recordings)

| Category | n | Mean Δ (dB) | n_pass (≥ 5 dB) | Range (dB) |
|---|---|---|---|---|
| airplane | 16 | +9.40 | 13/16 | +2.6 … +22.1 |
| car | 14 | +9.95 | 10/14 | +2.2 … +26.4 |
| generator | 5 | +5.89 | 3/5 | +3.7 … +8.9 |
| mixed | 3 | +16.40 | 3/3 | +7.3 … +34.1 |
| total | 38 | +9.70 | 29/38 (76%) | |

The generator and mixed categories were initially the worst. They were fixed by widening the mains notch, lowering its quality factor from Q=30 to Q=10: the resulting ~5 Hz rejection band is wide enough to catch real-world mains hum, which drifts with engine speed and load. Airplane and car files don't invoke the notch path, so they were unaffected.
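The Q-to-bandwidth relationship behind this fix can be checked with SciPy's `scipy.signal.iirnotch`, whose `Q` is defined as centre frequency divided by the −3 dB bandwidth. The sketch assumes 50 Hz mains and a 1 kHz sample rate chosen only for illustration (the real working rate is `SR_WORK` in `config.py`), and it does not claim to be how the backend builds its notch.

```python
import numpy as np
from scipy.signal import freqz, iirnotch


def notch_bandwidth_hz(f0, q, fs, n_points=2**16):
    """Measure the -3 dB rejection width (Hz) of a second-order IIR notch at f0."""
    b, a = iirnotch(f0, q, fs=fs)
    freqs, response = freqz(b, a, worN=n_points, fs=fs)
    rejected = freqs[np.abs(response) ** 2 < 0.5]  # bins attenuated by more than 3 dB
    return rejected.max() - rejected.min()


for q in (30.0, 10.0):
    print(f"Q={q:>4}: {notch_bandwidth_hz(50.0, q, fs=1000.0):.2f} Hz")
```

Since the width scales as f0/Q, a Q=10 notch centred on 150 Hz (the frequency exercised by `test_notch_150hz_drops`) would be about 15 Hz wide.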

Honest framing: NMF wins on every file or fuses with the others on ties. Classical and HPSS are independent witnesses: when they agree that the signal is real, that is evidence NMF did not hallucinate it. The generator category is the hardest because its broadband and tonal mains noise overlaps the rumble band.
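The selection rule from step 4 of "What it does" can be sketched as below. Every detail here is an assumption: each branch is taken to expose a gain array in [0, 1], "within 2 dB" is read as "best minus worst improvement is at most 2 dB", and the softmax uses a hypothetical 1 dB temperature. Because the fused gain is a convex combination of gains in [0, 1], fusion in this form stays non-generative.

```python
import numpy as np


def select_or_fuse(gains, improvements_db, tie_db=2.0, temperature_db=1.0):
    """Hypothetical sketch of branch selection vs. softmax fusion in gain space.

    gains:           dict of branch name -> STFT gain array with values in [0, 1]
    improvements_db: dict of branch name -> measured SNR-in-band improvement (dB)
    Returns (label, gain) where label is the winning branch name or "fused".
    """
    names = list(gains)
    imp = np.array([improvements_db[n] for n in names], dtype=float)
    if imp.max() - imp.min() > tie_db:
        best = names[int(np.argmax(imp))]
        return best, gains[best]  # clear winner: use its gain unchanged
    # Near-tie: softmax weights (shifted by the max for numerical stability).
    weights = np.exp((imp - imp.max()) / temperature_db)
    weights /= weights.sum()
    fused = sum(w * gains[n] for w, n in zip(weights, names))
    return "fused", fused
```

On the demo file the spread (+1.10 to +15.82 dB) is far above 2 dB, so this rule would simply pick NMF.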

Running it

```shell
# Backend
cd backend
pip install -r requirements.txt
# ensure backend/.env has GEMINI_API_KEY and ECHOSAVE_DATASET
python precompute_demo.py        # build frontend hero asset
python -m uvicorn main:app --port 8000

# Frontend (separate shell)
cd frontend
npm install
npm run dev                       # http://127.0.0.1:5173
```

Keyboard shortcuts (Studio)

| Key | Action |
|---|---|
| R | Run restoration on the selected file |
| J | Next recording |
| K | Previous recording |
| A | Swap audible source (Before / After) |
| Space | Play / pause both waveforms |
| Esc | Close measurement-protocol modal |

Acceptance tests

```shell
cd backend
python -m unittest tests.test_dsp -v
```

All five tests must pass before the frontend is considered runnable:

  • test_hpf_20hz_preserved — 20 Hz sine survives the pipeline within 1 dB
  • test_notch_150hz_drops — 150 Hz drops ≥ 10 dB while 20 Hz survives
  • test_nmf_converges — Branch C produces finite W/H with ≥1 signal component
  • test_hpss_nonzero — Branch B gives nonzero harmonic AND percussive energy
  • test_real_file — full pipeline on DEMO_FILE gets ≥ 5 dB improvement

Project layout

```
echosave/
├── backend/
│   ├── config.py                    # SR_WORK, NPERSEG, NOVERLAP, NMF settings
│   ├── core.py                      # STFT, F0, energy-gated mask, SNR
│   ├── branches/
│   │   ├── classical.py             # Branch A
│   │   ├── hpss.py                  # Branch B
│   │   └── nmf.py                   # Branch C
│   ├── restoration_engine.py        # ensemble orchestration + stability
│   ├── caller_separator.py          # per-call peak counting + bin assignment
│   ├── call_classifier.py           # Poole ethogram scoring
│   ├── behavioral_analyzer.py       # pre/post calling-rate statistics
│   ├── dataset.py                   # spreadsheet loader
│   ├── gemini_proxy.py              # server-side Gemini streaming
│   ├── main.py                      # FastAPI routes + SSE
│   ├── precompute_demo.py           # build-time hero asset
│   ├── validation/run_on_one_file.py  # human-inspection harness
│   └── tests/test_dsp.py
└── frontend/
    └── src/
        ├── components/
        │   ├── HeroScreen.jsx
        │   ├── StudioScreen.jsx
        │   ├── ArchiveScreen.jsx
        │   ├── SpectrogramCanvas.jsx
        │   ├── PipelineAnimation.jsx
        │   ├── EnsembleComparison.jsx
        │   ├── CallCard.jsx
        │   ├── DualPlayback.jsx
        │   ├── GeminiSidebar.jsx
        │   ├── MeasurementProtocolModal.jsx
        │   └── BehavioralPanel.jsx
        └── lib/{api,sse,motion}.js
```

Scientific honesty

  • All numbers in the UI are measured; nothing is fabricated. When the classifier confidence is below 45, the label reads "Unclassified — insufficient acoustic match."
  • Behavioral insights use the phrasing "consistent with documented X", never "we discovered the elephant was stressed."
  • The NMF component classifier is a post-hoc rule based on protected-band energy fraction, not a trained supervised model.
  • Gemini answers are labeled "AI-assisted interpretation" and are never ground truth.
  • The SNR-in-band mask is derived from RAW audio exactly once and reused unchanged for every branch's measurement, so no branch's output can influence the mask it is scored against; the metric is non-circular by construction.
  • Stability at three mask widths is reported alongside the headline number so a judge can confirm the result isn't knife-edge.
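The post-hoc NMF component rule described above can be sketched in NumPy. This sketch substitutes plain Lee–Seung multiplicative updates for the Euclidean objective and a 0.5 protected-band energy threshold; both are assumptions, as are the function names, so treat it as an illustration of the idea rather than the contents of `branches/nmf.py`.

```python
import numpy as np


def nmf(V, n_components, n_iter=300, seed=0, eps=1e-10):
    """Lee-Seung multiplicative updates minimising ||V - W @ H||_F^2 with W, H >= 0."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, n_components)) + eps
    H = rng.random((n_components, n_frames)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def classify_components(W, protected, threshold=0.5):
    """Post-hoc rule: a component is 'signal' if >= threshold of its spectral energy
    falls in the protected harmonic bins. No training data is involved."""
    energy = W ** 2
    frac = energy[protected].sum(axis=0) / (energy.sum(axis=0) + 1e-12)
    return frac >= threshold, frac


def nmf_signal_gain(W, H, is_signal, eps=1e-10):
    """Soft mask in [0, 1]: signal-component reconstruction over the full reconstruction."""
    full = W @ H + eps
    signal = W[:, is_signal] @ H[is_signal, :]
    return signal / full
```

Applying the returned gain to the raw STFT only re-weights existing bins, which keeps this branch consistent with the non-generative requirement.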
