
PHO-LID: A Unified Model Incorporating Acoustic-Phonetic and Phonotactic Information for Language Identification

Several people have successfully reproduced the results (and some achieved even better ones). No specific hyper-parameters or random seed are required; just keep the value of nega_frame low (see config.json).

DDP is not yet supported for training; an update is planned. Single-GPU training works well.
An example run: python train_PHOLID.py --json /home/tony/PHOLID_config_file.json
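
The config schema is defined by config.json in this repo, so the sketch below is only illustrative: apart from nega_frame, every key name is a hypothetical placeholder.

import json

# Hedged sketch of writing a minimal PHOLID config. Only "nega_frame" is
# documented above; the other keys are hypothetical placeholders -- consult
# the repo's config.json for the real schema.
config = {
    "train_data": "/path/to/train_list.txt",  # hypothetical key: training data list (format below)
    "nega_frame": 2,                          # keep this value low, as noted above
}
with open("PHOLID_config_file.json", "w") as f:
    json.dump(config, f, indent=4)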

The training data txt file should look like:

data0_path.npy(space)language_index0(space)T'_0
data1_path.npy language_index1 T'_1
data2_path.npy language_index2 T'_2
data3_path.npy language_index3 T'_3

where T'_0 is the sequence length of the data after reshaping from T x D to T' x 20 x D; 20 denotes the number of frames in each embedding. A sketch of producing one list entry follows. You may also prepare your own script and use only this model config.
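
The hedged sketch below assumes features of shape T x D stored as .npy and drops leftover frames (T mod 20); the file name train_list.txt is illustrative, and whether the reshape to T' x 20 x D happens offline or inside the data loader is left to the repo's own scripts.

import numpy as np

# Compute T' for one utterance and append its line to the data list.
# Leftover frames (T mod 20) are assumed to be dropped.
feat = np.load("data0_path.npy")   # shape: (T, D)
T, D = feat.shape
T_prime = T // 20                  # 20 frames per embedding
with open("train_list.txt", "a") as f:
    f.write(f"data0_path.npy 0 {T_prime}\n")  # path, language_index, T'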

If you find this paper useful, please cite:

@inproceedings{liu22e_interspeech,
author={Hexin Liu and Leibny Paola {Garcia Perera} and Andy Khong and Suzy Styles and Sanjeev Khudanpur},
title={{PHO-LID: A Unified Model Incorporating Acoustic-Phonetic and Phonotactic Information for Language Identification}},
year=2022,
booktitle={Proc. Interspeech 2022},
pages={2233--2237},
doi={10.21437/Interspeech.2022-354}
}
