OZEN is a small tool that helps you convert audio files to the LJ format.
Given a folder of files or a single audio file, it extracts the speech, transcribes it using Whisper, and saves the result in the LJ format (wavs in a wavs folder, plus train and valid txt files).
Accept the license terms on https://huggingface.co/pyannote/segmentation
Install Anaconda, or set up your own environment and install the requirements
git clone https://github.com/devilismyfriend/ozen-toolkit
Run Set Up Ozen.bat
Drag a folder or a file onto Drag_Here.bat to process it.
The first time, you'll be prompted to provide a Hugging Face token. Once you do, a config file will be created where you can specify the models to use, the desired training/validation split, and more.
Alternatively, you can run ozen.py from the command line.
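
To give a rough idea of what the output described above looks like, here is a minimal Python sketch that writes train and valid txt files next to a wavs folder. It is not taken from OZEN's code: the function name write_lj_split, the valid_ratio parameter, the train.txt and valid.txt file names, and the "wavs/name.wav|transcript" line layout are all assumptions made for illustration, so check the files OZEN actually produces before relying on it.

```python
import random
from pathlib import Path


def write_lj_split(entries, out_dir, valid_ratio=0.1, seed=0):
    """Write train.txt and valid.txt in an LJ-style layout (hypothetical helper).

    entries: list of (wav_filename, transcript) pairs.
    The 'wavs/name.wav|transcript' line layout is an assumption,
    not necessarily what OZEN writes.
    """
    out_dir = Path(out_dir)
    # The audio clips themselves would live in this folder.
    (out_dir / "wavs").mkdir(parents=True, exist_ok=True)

    # Shuffle a copy so the caller's list is left untouched.
    shuffled = list(entries)
    random.Random(seed).shuffle(shuffled)

    # Hold out at least one entry for validation when there is more than one.
    n_valid = int(len(shuffled) * valid_ratio)
    if len(shuffled) > 1:
        n_valid = max(1, n_valid)
    valid, train = shuffled[:n_valid], shuffled[n_valid:]

    for name, subset in (("train.txt", train), ("valid.txt", valid)):
        lines = [f"wavs/{wav}|{text}" for wav, text in subset]
        body = "\n".join(lines) + ("\n" if lines else "")
        (out_dir / name).write_text(body, encoding="utf-8")
    return train, valid
```

For example, calling write_lj_split([("0001.wav", "Hello there."), ("0002.wav", "Second clip.")], "output") would create output/wavs, output/train.txt and output/valid.txt, with one clip listed in each txt file.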