Unable to generate predictions for wav2vec model fine-tuned with custom data #11432

nayak24 · 2022-01-11T23:36:57Z

Hi, I'm trying to fine-tune the baseline wav2vec model with my own audio training/test data using Lightning Flash, closely following the tutorial in this doc:
https://lightning-flash.readthedocs.io/en/latest/reference/speech_recognition.html

However, when I generate a prediction for an audio file, I get an empty string as the output:

94.4 M Trainable params
0 Non-trainable params
94.4 M Total params
377.585 Total estimated model params size (MB)
Epoch 0: 100%|█████████████████████████████████████████████████████████| 88/88 [02:43<00:00, 1.86s/it, loss=633, v_num=57, train_loss_step=750.0]
Predicting: 88it [00:00, ?it/s]
[['']]
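
For readers wondering how a prediction of `[['']]` can arise at all: the sketch below shows greedy CTC decoding on a toy vocabulary, under the assumption that the backbone is a CTC model whose blank symbol has id 0. The function name `ctc_greedy_decode` and the toy vocabulary are hypothetical and are not part of Flash or Hugging Face. If every frame is classified as blank, the decoded transcript is an empty string.

```python
def ctc_greedy_decode(frame_ids, id_to_char, blank_id=0):
    """Collapse repeated ids, then drop blanks (greedy CTC decoding)."""
    chars = []
    previous = None
    for idx in frame_ids:
        # Emit a character only when the id changes and is not the blank.
        if idx != previous and idx != blank_id:
            chars.append(id_to_char[idx])
        previous = idx
    return "".join(chars)


# Toy vocabulary (assumption for illustration only).
TOY_VOCAB = {0: "<blank>", 1: "H", 2: "I"}

# Per-frame argmax ids for a healthy model versus a collapsed one.
healthy = ctc_greedy_decode([0, 1, 1, 0, 2, 2, 0], TOY_VOCAB)
collapsed = ctc_greedy_decode([0, 0, 0, 0, 0, 0, 0], TOY_VOCAB)
```

An all-blank output like `collapsed` is consistent with the empty prediction shown above, although the thread itself does not establish why the fine-tuned model would emit only blanks.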

I'm not sure what the issue is, as I've only replaced the Timit dataset with my own input data for fine-tuning, and the rest of the script follows exactly from the doc above. All of the input data are wav files with the following format:

format | 1 (uncompressed PCM)
number of channels | 1 (mono)
sampleRate | 16000
byteRate | 32000
blockAlign | 2
bitsPerSample (bit depth) | 16
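
The header values listed above can be checked programmatically across a whole dataset. The following is a minimal sketch using only Python's standard `wave` module; the helper names `describe_wav` and `matches_expected` are hypothetical, and the expected values (mono, 16000 Hz, 16-bit) are simply the ones quoted in this post.

```python
import wave


def describe_wav(path):
    """Read the header of an uncompressed PCM WAV file and return its fields."""
    with wave.open(path, "rb") as wav:
        sample_rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": sample_rate,
            "bits_per_sample": wav.getsampwidth() * 8,
            "duration_s": wav.getnframes() / sample_rate,
        }


def matches_expected(info, channels=1, sample_rate=16000, bits_per_sample=16):
    """Return True if the header matches the format described in the post."""
    return (
        info["channels"] == channels
        and info["sample_rate"] == sample_rate
        and info["bits_per_sample"] == bits_per_sample
    )
```

For example, `matches_expected(describe_wav("FLT034/FLT034-14.wav"))` would flag a file whose header differs from the rest; the duration field can also help spot empty or truncated clips.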

I'm new to PyTorch Lightning and training with wav2vec as a whole, so I'm guessing that I'm missing something obvious. Any help would be greatly appreciated!

Here is the full script I'm running:

import torch
import flash
from flash.audio import SpeechRecognition, SpeechRecognitionData
from flash.core.data.utils import download_data

#download_data("https://pl-flash-data.s3.amazonaws.com/timit_data.zip", "./data")

datamodule = SpeechRecognitionData.from_csv(
    input_fields="file",
    target_fields="text",
    #train_file="data/timit/train.json",
    #test_file="data/timit/test.json",
    train_file="FLT034/FLT034-TRAIN.csv",
    test_file="FLT034/FLT034-TEST.csv",
    batch_size=4,
)

# Any wav2vec model on Hugging Face can be used as the backbone for fine-tuning
model = SpeechRecognition(backbone="facebook/wav2vec2-base-960h")

# Create the trainer and fine-tune the model
trainer = flash.Trainer(max_epochs=1)
trainer.finetune(model, datamodule=datamodule, strategy="no_freeze")

# predict on audio files
#datamodule = SpeechRecognitionData.from_files(predict_files=["data/timit/example.wav"], batch_size=4)
datamodule = SpeechRecognitionData.from_files(predict_files=["FLT034/FLT034-14.wav"], batch_size=4)
predictions = trainer.predict(model, datamodule=datamodule)
print(predictions)

# Save checkpoint
trainer.save_checkpoint("FL034_trained_model.pt")

And here is a sample of the train.csv file with the annotations:

file,text
"./FLT034-12.wav","Weather at one seven five eight zulu."
"./FLT034-13.wav","Wind one niner zero at eight."
"./FLT034-14.wav","Visibility eight ceiling eight hundred overcast."
"./FLT034-15.wav","Temperature one five"
"./FLT034-16.wav","Dewpoint one four"
"./FLT034-17.wav","Altimeter three zero"
"./FLT034-18.wav","Get both sides on a mic"
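
One thing that may be worth ruling out with annotations like these is a mismatch between the transcript characters and the backbone's vocabulary. As far as can be told from its published tokenizer files, `facebook/wav2vec2-base-960h` uses a character vocabulary of uppercase letters, the apostrophe, and a word delimiter, so mixed-case text with full stops may not map cleanly onto it. Whether Flash normalizes targets itself is not confirmed in this thread, so the sketch below is only an assumption-laden preprocessing step; `normalize_transcript` and `normalize_rows` are hypothetical helpers.

```python
import csv
import io
import re


def normalize_transcript(text):
    """Uppercase the text and keep only A-Z, apostrophes, and single spaces."""
    text = text.upper()
    # Replace anything outside the assumed vocabulary with a space.
    text = re.sub(r"[^A-Z' ]+", " ", text)
    # Collapse runs of spaces and trim the ends.
    return re.sub(r" +", " ", text).strip()


def normalize_rows(csv_text):
    """Parse CSV text with 'file' and 'text' columns and normalize each transcript."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {"file": row["file"], "text": normalize_transcript(row["text"])}
        for row in reader
    ]


SAMPLE = '''file,text
"./FLT034-12.wav","Weather at one seven five eight zulu."
"./FLT034-13.wav","Wind one niner zero at eight."
'''

rows = normalize_rows(SAMPLE)
```

The normalized rows could then be written back out with `csv.DictWriter` and used as the training file in place of the original annotations.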

tchaton · 2022-01-12T07:43:40Z

Maintainer

I would advise you to check out Lightning Flash. That is what we are building there: the ability to make predictions on raw data.

2 replies

nayak24 Jan 12, 2022
Author

Thanks for the reply! I'm not entirely sure what you mean, as I am using the Lightning Flash speech recognition script for the training. Is there a new class/method I should be creating? Sorry, I'm not very familiar with Lightning Flash.

nayak24 Jan 12, 2022
Author

Sorry, I just realized I posted this in the wrong repo!
