This is a Python implementation of the project described in the paper "Automatic detection of multi-speaker fragments with high time resolution". The aim of the project is to detect fragments of audio files in which more than one speaker is talking.
First step: compute a spectrogram from the input audio --> specs/SPEC.jpeg
python2 SpecCreator.py [--audio-path=AUDIO_PATH] [--dir=DIRECTORY_WITH_AUDIOS]
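As a rough illustration of what the spectrogram step does (a sketch only; the window and FFT parameters below are assumptions, not the ones used by SpecCreator.py), using scipy:

```python
import numpy as np
from scipy import signal

# One second of a synthetic 440 Hz tone at 16 kHz, standing in for
# audio loaded from AUDIO_PATH.
sr = 16000
t = np.arange(sr) / float(sr)
audio = np.sin(2 * np.pi * 440.0 * t)

# Short-time Fourier transform; nperseg/noverlap here are illustrative
# assumptions, not the project's actual parameters.
freqs, times, spec = signal.spectrogram(audio, fs=sr, nperseg=512, noverlap=256)

# Log-scale the magnitudes, as is typical before feeding a CNN.
log_spec = np.log(spec + 1e-10)
print(log_spec.shape)  # (frequency bins, time frames)
```

SpecCreator.py additionally saves the result as a JPEG image under specs/.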
Second step: process the spectrogram with a CNN --> results/RESULTS.json
python2 VoiceCounter.py [--spec-path=SPECTROGRAM_PATH] [--dir=DIRECTORY_WITH_SPECTROGRAMS]
The output is a JSON file containing, for each frame, the probability that more than one speaker is talking.
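A minimal sketch of consuming such output (the key name "frames" is a hypothetical placeholder, since the schema of results/RESULTS.json is not documented here):

```python
import json

# Hypothetical results in the format described above: a per-frame
# probability of overlapping speech. The key name is an assumption.
results_text = '{"frames": [0.1, 0.2, 0.85, 0.9, 0.3]}'
results = json.loads(results_text)

# Flag frames whose multi-speaker probability exceeds a chosen threshold.
threshold = 0.5
flagged = [i for i, p in enumerate(results["frames"]) if p > threshold]
print(flagged)  # indices of frames likely containing overlapping speech
```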
For full information about the parameters, see
python2 SpecCreator.py --help
and python2 VoiceCounter.py --help
Requirements:
- Linux
- python2
- numpy, scipy, scikit-image, matplotlib, mxnet, tqdm
The output JSON file may need different post-processing depending on your aims. The code to reproduce the results from the main paper (section 2.3) will be provided later (write to the authors if needed).
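One common way to turn per-frame probabilities into fragments, shown here as a generic sketch and not necessarily the procedure from section 2.3 of the paper, is to threshold the probabilities and merge consecutive flagged frames:

```python
def frames_to_fragments(probs, threshold=0.5):
    """Merge consecutive above-threshold frames into (start, end) index
    pairs, end exclusive. A generic post-processing sketch, not the
    paper's method."""
    fragments = []
    start = None
    for i, p in enumerate(probs):
        if p > threshold and start is None:
            start = i  # a multi-speaker run begins
        elif p <= threshold and start is not None:
            fragments.append((start, i))  # the run ends
            start = None
    if start is not None:
        fragments.append((start, len(probs)))  # run extends to the end
    return fragments

print(frames_to_fragments([0.1, 0.8, 0.9, 0.2, 0.7]))  # [(1, 3), (4, 5)]
```

Frame indices can then be converted to timestamps using the frame hop of the spectrogram.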
When using this code, please cite the paper.
Authors: Belyaev Andrey, Kazimirova Evdokia
Support: Neurodatalab LLC, USA
Contact: e.kazimirova@neurodatalab.com