Identification and Localization of Siren Signals with Beamforming and Non-negative Matrix Factorization
Team members
- Hankyu Jang (hankjang@iu.edu)
- Leonard Yulianus (lyulianu@iu.edu)
- Sunwoo Kim (kimsunw@iu.edu)
Part 1: Ambulance Identification
Datasets
Preprocessing
Convert the audio signal to a 16 kHz sampling rate and a single audio channel:
sox --norm <input> -b 16 <output> rate 16000 channels 1 dither -s
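To preprocess a whole directory, the `sox` invocation above can be generated per file. A minimal sketch, assuming `sox` is on the PATH; the `datasets/raw` and `datasets/preprocessed` paths are hypothetical placeholders:

```python
from pathlib import Path

def build_sox_cmd(src: Path, dst: Path) -> list:
    """Build the sox command used above: normalize, 16-bit samples,
    resample to 16 kHz, downmix to mono, shaped dither."""
    return ["sox", "--norm", str(src), "-b", "16", str(dst),
            "rate", "16000", "channels", "1", "dither", "-s"]

# Print the command for every .wav under a (hypothetical) raw/ folder;
# pipe the output to sh, or run each command via subprocess.run().
for src in sorted(Path("datasets/raw").glob("*.wav")):
    dst = Path("datasets/preprocessed") / src.name
    print(" ".join(build_sox_cmd(src, dst)))
```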
Experiment
Prerequisite
Since the code is written in Python 3, if you run it on the campus servers, load the Python 3 module first:
module load python/3.6.0
Dimensionality Reduction Model
First, we need to train a dimensionality reduction model (NMF). Run the following command to train it:
python train-dimred.py <model output> <audio training input>
For example, to build the model from our ambulance training signals:
python train-dimred.py ambulance.dimred datasets/train/ambulance/*.wav
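Conceptually, `train-dimred.py` fits a non-negative basis to magnitude-spectrogram frames, so each frame can later be represented by a small vector of activations. A minimal sketch using scikit-learn; the random non-negative matrix stands in for real spectrogram frames, and the rank 16 is an assumed value, not necessarily what the script uses:

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
# Stand-in for stacked magnitude-spectrogram frames:
# (n_frames, n_freq_bins), non-negative as NMF requires.
V = np.abs(rng.standard_normal((200, 257)))

# Learn a low-rank non-negative basis; each frame is then described by
# 16 activation coefficients instead of 257 frequency bins.
nmf = NMF(n_components=16, init="nndsvda", max_iter=300, random_state=0)
W = nmf.fit_transform(V)   # activations, shape (200, 16)
H = nmf.components_        # basis spectra, shape (16, 257)
print(W.shape, H.shape)
```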
Classifier (SVM)
We train the classifier on the lower-dimensional data produced by the dimensionality reduction model. Run the following command:
python train-svm.py <classifier model output> <audio training input> --dimred <dimensionality reduction model>
python train-svm.py svm.model datasets/train/**/*.wav --dimred ambulance.dimred
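The classification step itself reduces to fitting an SVM on the NMF activations. A sketch of the idea with scikit-learn; the synthetic 16-dimensional features and binary labels (1 = ambulance, 0 = other) are stand-ins for what `train-svm.py` derives from the .wav files, and the RBF kernel is an assumption:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
# Stand-in for NMF-reduced frames with binary labels.
X = rng.standard_normal((300, 16))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# Fit an RBF-kernel SVM on the low-dimensional features.
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X, y)
print("Training accuracy:", clf.score(X, y))
```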
Testing
Now we can use the classifier model to test our test set:
python test.py svm.model datasets/test/**/*.wav
Result
$ python train-dimred.py ambulance.dimred datasets/train/ambulance/*.wav
$ python train-svm.py svm.model datasets/train/{ambulance,others}/*.wav --dimred ambulance.dimred
Training accuracy: 0.9746621621621622
$ python test.py svm.model datasets/test/{ambulance,others}/*.wav
Testing accuracy: 0.96
Result (on mixed ambulance data)
$ python train-dimred.py ambulance.dimred datasets/train/ambulance/*.wav
$ python train-svm.py svm.model datasets/train/{ambulance,others}/*.wav --dimred ambulance.dimred
Training accuracy: 0.9746621621621622
$ python test.py svm.model datasets/test/mixed_ambulance/**/*.wav
Testing accuracy: 0.89
Visualizing audio signals
We can run the detection on longer audio signals and visualize the per-second detections by running:
python viz-audio.py <classifier model> <audio signal(s)>
For example (display the graph on screen):
python viz-audio.py svm.model datasets/{raw/ambulance1.wav,raw/traffic-10.wav,raw_ambulance_mixed/mixed_ambulance10.wav}
For example (save the graph to a file, useful when running on a server):
python viz-audio.py svm.model datasets/{raw/ambulance1.wav,raw/traffic-10.wav,raw_ambulance_mixed/mixed_ambulance10.wav} --save
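The per-second detection behind `viz-audio.py` amounts to splitting the signal into one-second windows and classifying each window. A minimal sketch with the plotting omitted; `per_second_predictions` and the energy-threshold "classifier" are hypothetical stand-ins for the real SVM:

```python
import numpy as np

def per_second_predictions(signal, sr, predict):
    """Split a 1-D signal into non-overlapping one-second windows and
    run `predict` (any callable returning 0/1) on each window."""
    n = len(signal) // sr
    return [predict(signal[i * sr:(i + 1) * sr]) for i in range(n)]

# Toy example: 2 s of silence then 3 s of "signal", with a
# hypothetical energy-threshold classifier in place of the SVM.
sr = 16000
sig = np.concatenate([np.zeros(sr * 2), np.ones(sr * 3)])
labels = per_second_predictions(sig, sr, lambda w: int(w.mean() > 0.5))
print(labels)  # one label per second
```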
Some plot results
Part 2: Localization
Here, we find the location of the ambulance using recordings captured by three microphones.
Mathematical derivation of the model
The following diagram shows the overall structure of the problem we are solving (figure omitted):
- (1): microphone 1
- (2): microphone 2
- (3): microphone 3
- (A): ambulance
- d1: distance from (A) to (1)
- d2: distance from (A) to (2)
- d3: distance from (A) to (3)
- d12 = d1 - d2
- d23 = d2 - d3
- d31 = d3 - d1
Three imaginary lines are drawn, each perpendicularly bisecting one side of the triangle. If the ambulance lies on the line where d12 = 0, it is equidistant from (1) and (2); if d12 > 0, the ambulance is on the right-hand side of that line. We used this observation to calculate the three distances d1, d2, and d3.
Using the three distances from each microphone to the source, I calculated the distance from the circumcenter to the source. Call this distance d0 (the length of OA in the diagram).
After getting d0, I calculated the angle from the circumcenter to the source, taking line 12 as the baseline. Call this angle theta0. To calculate theta0, I used two auxiliary triangles to obtain two intermediate angles, and finally got theta0 by subtracting those two angles from 180 degrees.
Implementation of the model
input:
- d12: distance difference between signals 1 and 2
- d31: distance difference between signals 3 and 1
- R12: distance between microphone 1 and microphone 2
- R23: distance between microphone 2 and microphone 3
- R31: distance between microphone 3 and microphone 1
output:
- d0: distance from the circumcenter of the three microphones to the source
- theta0: angle from the circumcenter of the three microphones to the source
$ python -i localization.py -d12 5.0 -d31 4.0 -R12 10.0 -R23 9.7 -R31 10.2
d0: 9.039, theta0: 64.145 degrees
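The geometry above can also be checked numerically. A minimal sketch that recovers d0 and theta0 from the two distance differences using scipy's `least_squares`, rather than the closed form in `localization.py`; the equilateral 0.8 m array and the source position are made-up values:

```python
import numpy as np
from scipy.optimize import least_squares

# Microphones at the vertices of an equilateral triangle, side 0.8 m.
R = 0.8
m = np.array([[0.0, 0.0], [R, 0.0], [R / 2, R * np.sqrt(3) / 2]])
center = m.mean(axis=0)          # circumcenter == centroid here

src = np.array([3.0, 4.0])       # hypothetical true source position
d = np.linalg.norm(m - src, axis=1)
d12, d31 = d[0] - d[1], d[2] - d[0]   # the two measured differences

def residual(p):
    """How far a candidate position p is from matching both differences."""
    r = np.linalg.norm(m - p, axis=1)
    return [r[0] - r[1] - d12, r[2] - r[0] - d31]

# Two equations, two unknowns: solve for the source position.
est = least_squares(residual, x0=center + [1.0, 1.0]).x
d0 = np.linalg.norm(est - center)    # distance circumcenter -> source
theta0 = np.degrees(np.arctan2(est[1] - center[1], est[0] - center[0]))
print(f"d0: {d0:.3f} m, theta0: {theta0:.3f} degrees")
```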
Experiment
Shift the wav
python shift_wav.py -i 440Hz.wav -o 440Hz_88shift.wav -n 88
python shift_wav.py -i 440Hz.wav -o 440Hz_89shift.wav -n 89
The bash script shift_wav.sh creates shifted waves with shifts from 1 to 89 samples.
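What `shift_wav.py` does can be sketched as a simple sample delay; this is an illustration with a hypothetical `shift_samples` helper, not the script's actual code:

```python
import numpy as np

def shift_samples(x: np.ndarray, n: int) -> np.ndarray:
    """Delay a 1-D signal by n samples: pad the front with zeros and
    keep the original length, simulating a later time of arrival."""
    if n == 0:
        return x.copy()
    return np.concatenate([np.zeros(n, dtype=x.dtype), x[:-n]])

# One second of a 440 Hz tone at 44.1 kHz, delayed by 89 samples.
sr = 44100
x = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
y = shift_samples(x, 89)
print(len(y), float(y[:89].max()))
```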
Equilateral triangle with edge length 0.8 m.
If the ambulance approaches straight from behind the car, the maximum distance difference is 0.8 * sqrt(3) / 2 = 0.69282 m. A shift of 89 samples corresponds to 0.6922 m (89 * speed of sound / sampling rate), so 89 samples is the maximum shift.
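The arithmetic above can be checked directly, assuming 343 m/s for the speed of sound and a 44.1 kHz sampling rate (both assumed values):

```python
import math

edge = 0.8                           # side of the microphone triangle (m)
max_diff = edge * math.sqrt(3) / 2   # height of the triangle: max difference
print(round(max_diff, 5))            # 0.69282

c, sr = 343.0, 44100                 # speed of sound (m/s), sampling rate (Hz)
print(round(89 * c / sr, 4))         # 0.6922: path length of an 89-sample shift
```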
Ambulance approaching the car from behind
python localization.py -i1 beep/440Hz.wav -i2 beep/440Hz.wav -i3 beep/440Hz_89shift.wav -R12 0.8 -R23 0.8 -R31 0.8
python localization.py -i1 beep/440Hz.wav -i2 beep/440Hz.wav -i3 beep/440Hz_88shift.wav -R12 0.8 -R23 0.8 -R31 0.8
python localization.py -i1 beep/440Hz.wav -i2 beep/440Hz.wav -i3 beep/440Hz_87shift.wav -R12 0.8 -R23 0.8 -R31 0.8
python localization.py -i1 beep/440Hz.wav -i2 beep/440Hz.wav -i3 beep/440Hz_86shift.wav -R12 0.8 -R23 0.8 -R31 0.8