Assuming the preprocessed manifest files for SLUE-VoxPopuli are in manifest/slue-voxpopuli, the following command fine-tunes a wav2vec 2.0 base model using one GPU:
bash baselines/ner/e2e_scripts/ft-w2v2-base.sh manifest/slue-voxpopuli/e2e_ner save/e2e_ner/w2v2-base
To fine-tune the wav2vec 2.0 large ll60k model using 8 GPUs, please run:
bash baselines/ner/e2e_scripts/ft-w2v2-large.sh manifest/slue-voxpopuli/e2e_ner save/e2e_ner/w2v2-large
To decode with a language model (LM), first build the 4-gram LM using this command:
bash scripts/build_vp_ner_lm.sh $kenlm_build_bin
where $kenlm_build_bin is the path of your KenLM build folder (e.g., /home/user/kenlm/build/bin).
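Before building the LM, it can help to confirm that the KenLM folder really contains the binaries. The sketch below assumes the build script relies on KenLM's `lmplz` and `build_binary` tools; that is an assumption, so check scripts/build_vp_ner_lm.sh for what it actually calls. `check_kenlm_bin` is a hypothetical helper and not part of the toolkit.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the toolkit): verify that a KenLM build
# folder contains the executables we assume the LM build script needs.
check_kenlm_bin() {
    local bin_dir="$1"
    local tool
    # Assumption: lmplz (estimates the n-gram LM) and build_binary
    # (converts an ARPA file to binary format) are the required tools.
    for tool in lmplz build_binary; do
        if [ ! -x "${bin_dir}/${tool}" ]; then
            echo "missing executable: ${bin_dir}/${tool}" >&2
            return 1
        fi
    done
    return 0
}

# Usage (uncomment and point at your own build folder):
# kenlm_build_bin=/home/user/kenlm/build/bin
# check_kenlm_bin "$kenlm_build_bin" && bash scripts/build_vp_ner_lm.sh "$kenlm_build_bin"
```

If the check fails, the missing path is printed to stderr and the build script is never started.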
To evaluate the fine-tuned wav2vec 2.0 E2E NER model on the dev set, run one of the following commands:
- Decoded without language model
bash baselines/ner/e2e_scripts/eval-ner.sh w2v2-base dev combined nolm
- Decoded with language model
bash baselines/ner/e2e_scripts/eval-ner.sh w2v2-base dev combined vp_ner/4
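To compare both decoding settings in one go, the two commands above can be wrapped in a small loop. This is only a sketch: `eval_e2e_ner` and the `DRY_RUN` variable are hypothetical names introduced here, and by default the function only prints the commands rather than running them.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of the toolkit): evaluate one model with
# and without the language model.
# DRY_RUN=1 (the default here) only prints the commands; DRY_RUN=0 runs them.
eval_e2e_ner() {
    local model="$1" split="$2" label="$3" lm
    for lm in nolm vp_ner/4; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "bash baselines/ner/e2e_scripts/eval-ner.sh ${model} ${split} ${label} ${lm}"
        else
            bash baselines/ner/e2e_scripts/eval-ner.sh "$model" "$split" "$label" "$lm"
        fi
    done
}

# Prints the two evaluation commands for the base model on the dev set.
eval_e2e_ner w2v2-base dev combined
```

Setting `DRY_RUN=0` before the call runs the evaluations from the repository root, and the LM-decoded run still requires the 4-gram LM to have been built first.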
This command trains the deberta-base model on ground-truth text transcripts with raw labels:
bash baselines/ner/nlp_scripts/ft-deberta.sh deberta-base raw
The above command can also be used to train the deberta-large model, and it also accepts the combined tag set as an argument.
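The model and tag set combinations described above can be enumerated with a short loop. The sketch below only prints the commands; `print_deberta_ft_cmds` is a hypothetical helper, and it assumes the script takes the model name first and the tag set second, as in the deberta-base example.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the toolkit): print one fine-tuning
# command per model size and tag set, without running anything.
print_deberta_ft_cmds() {
    local model tagset
    for model in deberta-base deberta-large; do
        for tagset in raw combined; do
            echo "bash baselines/ner/nlp_scripts/ft-deberta.sh ${model} ${tagset}"
        done
    done
}

print_deberta_ft_cmds
# To launch the four runs one after another from the repository root:
# print_deberta_ft_cmds | bash
```

Printing first makes it easy to review or trim the list before committing GPU time to all four runs.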
The following command evaluates the trained deberta-base model on the dev set with combined labels:
bash baselines/ner/nlp_scripts/eval-deberta.sh deberta-base dev raw combined
The ASR module is trained using the scripts mentioned here. The text NER module is trained using the scripts mentioned here.
The following command evaluates the pipeline model that uses w2v2-base as the ASR backbone and deberta-base as the text NER backbone, with the former decoded using the T3 language model as mentioned here:
bash baselines/ner/pipeline_scripts/eval.sh w2v2-base deberta-base dev combined t3/3