I would advise you to check out Lightning Flash. This is what we are building there: the ability to make predictions on raw data.
Hi, I'm trying to fine-tune the baseline wav2vec model with my own audio training/test data using Lightning Flash, closely following the tutorial in this doc:
https://lightning-flash.readthedocs.io/en/latest/reference/speech_recognition.html
However, when I generate the prediction for an audio file, I get a null output:
I'm not sure what the issue is, as I've only replaced the TIMIT dataset with my own input data for fine-tuning, and the rest of the script follows the doc above exactly. All of the input data are WAV files with the following format:
I'm new to PyTorch Lightning and training with wav2vec as a whole, so I'm guessing that I'm missing something obvious. Any help would be greatly appreciated!
Here is the full script I'm running:
And here is a sample of the train.csv file with the annotations:
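When only the dataset has changed, one way to narrow down a null prediction is to sanity-check the inputs before fine-tuning. The sketch below is not part of Lightning Flash. It uses only the Python standard library to read each WAV header and each annotation row. The function names and the `file` / `text` column names are hypothetical, so pass whatever your train.csv actually uses. The 16 kHz mono expectation is an assumption based on wav2vec 2.0 checkpoints having been pretrained on 16 kHz audio. Flash may resample on its own, so treat a mismatch as a hint rather than a diagnosis.

```python
import csv
import wave
from pathlib import Path

# Assumption: wav2vec 2.0 checkpoints were pretrained on 16 kHz audio.
EXPECTED_RATE = 16_000


def describe_wav(path):
    """Return basic header info for a PCM WAV file."""
    with wave.open(str(path), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "rate": rate,
            "channels": wav.getnchannels(),
            "sample_width": wav.getsampwidth(),
            "seconds": frames / rate if rate else 0.0,
        }


def check_annotations(csv_path, audio_col="file", text_col="text"):
    """Collect human-readable problems found in an annotation CSV.

    The column names are hypothetical defaults, not taken from the post.
    Relative audio paths are resolved against the CSV's own directory.
    """
    csv_path = Path(csv_path)
    problems = []
    with csv_path.open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        # Row 1 is the header, so data rows start at 2.
        for row_number, row in enumerate(reader, start=2):
            audio = csv_path.parent / (row.get(audio_col) or "")
            text = (row.get(text_col) or "").strip()
            if not text:
                problems.append(f"row {row_number}: empty transcript")
            if not audio.is_file():
                problems.append(f"row {row_number}: missing audio file {audio}")
                continue
            try:
                info = describe_wav(audio)
            except (wave.Error, EOFError) as exc:
                # Raised for non-PCM or truncated files.
                problems.append(f"row {row_number}: unreadable WAV ({exc})")
                continue
            if info["rate"] != EXPECTED_RATE:
                problems.append(f"row {row_number}: sample rate {info['rate']} Hz")
            if info["channels"] != 1:
                problems.append(f"row {row_number}: {info['channels']} channels")
            if info["seconds"] == 0:
                problems.append(f"row {row_number}: no audio frames")
    return problems
```

Calling `check_annotations("train.csv")` returns a list of strings. An empty list means every row has a transcript and points at a readable 16 kHz mono file, in which case the cause is more likely in the training setup than in the data.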