GitHub - IS2AI/TatarSCR: An Open-Source Speech Commands Dataset for the Tatar Language

An Open-Source Tatar Speech Commands Dataset

Paper

An Open-Source Tatar Speech Commands Dataset for IoT and Robotics Applications

Dataset

The dataset covers 35 commands used in robotics, IoT, and smart systems. In total, the dataset contains 3,547 one-second utterances from 153 people. The dataset can be downloaded from Google Drive.

Data Augmentation

To preprocess and augment the dataset, use the data_preprocessing_augmentation.ipynb notebook. Before running it, download the ESC-50 dataset (ESC-50: Dataset for Environmental Sound Classification).
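A common way to use an environmental sound collection such as ESC-50 for augmentation is to mix a noise clip into each utterance at a chosen signal-to-noise ratio. The sketch below illustrates that general idea only; it is not taken from the notebook, and the function names, the 16 kHz sample rate, and the synthetic signals are all assumptions made for illustration.

```python
import math
import random


def rms(samples):
    """Root-mean-square amplitude of a sequence of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech`, scaled so the mix has the requested SNR in dB."""
    if len(noise) < len(speech):
        # Tile short noise clips so they cover the whole utterance.
        repeats = math.ceil(len(speech) / len(noise))
        noise = list(noise) * repeats
    noise = noise[: len(speech)]

    speech_rms, noise_rms = rms(speech), rms(noise)
    if speech_rms == 0.0 or noise_rms == 0.0:
        return list(speech)  # nothing to scale against

    # SNR_dB = 20 * log10(speech_rms / scaled_noise_rms)
    target_noise_rms = speech_rms / (10 ** (snr_db / 20))
    gain = target_noise_rms / noise_rms
    return [s + gain * n for s, n in zip(speech, noise)]


# Synthetic stand-ins for a one-second command and a shorter noise clip.
sample_rate = 16000  # assumed; check the actual dataset files
speech = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(sample_rate)]
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(sample_rate // 2)]
noisy = mix_at_snr(speech, noise, snr_db=10)
```

Lower `snr_db` values give noisier training examples. In practice the samples would be read from the dataset and ESC-50 WAV files instead of being generated.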

Model Training, Testing, and Optimization

In this project, we used the Keyword-MLP model. We sincerely thank the authors for open-sourcing their code.

git clone https://github.com/IS2AI/TatarSCR.git
cd TatarSCR
cd Keyword-MLP
pip3 install -r requirements.txt

Training and Testing

python3 train.py --conf configs/kwmlp_tscd.yaml

The model is evaluated on the test set automatically at the end of training.

Convert to ONNX

python3 convert_to_onnx.py --conf configs/kwmlp_tscd.yaml \
                           --ckpt checkpoints/best.pth \
                           --out checkpoints/best.onnx

Inference

PyTorch

inference.py: for short clips of about 1 s, like the audio files in the Speech Commands dataset.
window_inference.py: for longer audio clips that may contain multiple keywords. It runs inference over the audio with a sliding window.

# Use --device cuda if you have a GPU.
# Much larger batch sizes (e.g. 128, 256, 512) should also work if needed.
python3 inference.py --conf configs/kwmlp_tscd.yaml \
                    --ckpt checkpoints/best.pth \
                    --inp <path to audio.wav / path to audio folder> \
                    --out <output directory> \
                    --lmap label_map.json \
                    --device cpu \
                    --batch_size 8
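The `--lmap label_map.json` argument suggests that predicted class indices are translated into command names. A minimal sketch of that decoding step follows, assuming the file maps stringified class indices to labels. That layout, the placeholder labels, and the helper names are assumptions for illustration and may differ from what the repository actually uses.

```python
import json
import math


def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits, label_map):
    """Return the most likely label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    # JSON object keys are always strings, hence str(best).
    return label_map[str(best)], probs[best]


# Hypothetical label map with placeholder command names.
label_map = json.loads('{"0": "command_a", "1": "command_b", "2": "command_c"}')
label, confidence = decode([0.2, 3.1, -1.0], label_map)
```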

# Use --device cuda if you have a GPU.
python3 window_inference.py --conf configs/kwmlp_tscd.yaml \
                    --ckpt checkpoints/best.pth \
                    --inp <path to audio.wav / path to audio folder> \
                    --out <output directory> \
                    --lmap label_map.json \
                    --device cpu \
                    --wlen 1 \
                    --stride 0.5 \
                    --thresh 0.85 \
                    --mode multi
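To make the roles of `--wlen`, `--stride`, and `--thresh` more concrete, the sketch below shows one way a sliding-window detector can be organized: cut the clip into overlapping windows, score each window, and keep the predictions whose top score reaches the threshold. It is not the code of window_inference.py. It assumes the window length and stride are given in seconds and the sample rate is 16 kHz, uses hypothetical helper names and made-up scores, and does not model `--mode`.

```python
def window_starts(n_samples, sample_rate, wlen=1.0, stride=0.5):
    """Start indices (in samples) of every full window in a clip."""
    win = int(wlen * sample_rate)
    hop = int(stride * sample_rate)
    if n_samples < win:
        return []
    return list(range(0, n_samples - win + 1, hop))


def detect(window_scores, thresh=0.85):
    """Keep (window_index, label, score) where the top score reaches thresh."""
    hits = []
    for i, scores in enumerate(window_scores):
        label = max(scores, key=scores.get)
        if scores[label] >= thresh:
            hits.append((i, label, scores[label]))
    return hits


sample_rate = 16000  # assumed
starts = window_starts(n_samples=3 * sample_rate, sample_rate=sample_rate)

# Hypothetical per-window class probabilities standing in for model output.
fake_scores = [
    {"command_a": 0.95, "command_b": 0.05},
    {"command_a": 0.60, "command_b": 0.40},
    {"command_a": 0.10, "command_b": 0.90},
    {"command_a": 0.30, "command_b": 0.70},
    {"command_a": 0.88, "command_b": 0.12},
]
hits = detect(fake_scores, thresh=0.85)
start_times = [starts[i] / sample_rate for i, _, _ in hits]
```

With a 1 s window and a 0.5 s stride, consecutive windows overlap by half, so a keyword that straddles one window boundary still falls fully inside a neighboring window.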

ONNX

python3 onnx_inference.py --onnx_model checkpoints/best.onnx \
                          --conf configs/kwmlp_tscd.yaml \
                          --lmap label_map.json \
                          --inp <path to audio.wav / path to audio folder>

If you use our dataset or model, please cite our work

Askat Kuzdeuov, Rinat Gilmullin, Bulat Khakimov, and Huseyin Atakan Varol. An Open-Source Tatar Speech Commands Dataset for IoT and Robotics Applications. TechRxiv. October 18, 2024. DOI: 10.36227/techrxiv.172926779.98914732/v1.
