## Converting MP3 to WAV

To use Automatic Speech Recognition (ASR) models such as Whisper and FineTune, the audio files must first be converted from MP3 to WAV. This conversion is necessary because of the restrictions on audio file formats supported on Kaggle, and it is used to streamline the following notebook:


In [1]:
import os
from tqdm.notebook import tqdm
import zipfile
import shutil
from pydub import AudioSegment
from joblib import Parallel, delayed

In [2]:
!mkdir -p /tmp/CV15_ASR_dataset

In [3]:
!ls /tmp

CV15_ASR_dataset    npm-10103-119ee48d	npm-10255-9c25895f
clean-layer.sh	    npm-10114-3357dbb4	npm-10298-bcff1aeb
conda		    npm-10157-7ab28c3e	npm-10341-d9b08525
core-js-banners     npm-10168-cda3f34b	v8-compile-cache-0
hsperfdata_root     npm-10179-10669343	yarn--1587582922338-0.21814450721061052
kaggle.log	    npm-10190-c9f9676b	yarn--1587582922338-0.509163553407652
npm-10049-ecd2e497  npm-10233-2e7f8d94	yarn--1587582923660-0.03690515600338906
npm-10092-825f72aa  npm-10244-9748b634	yarn--1587582934443-0.8113961991379441


In [4]:
data = '''{
  "title": "ASR_CV15_Hindi_wav_16000",
  "id": "SakshiRathi77/ASR_CV15_Hindi_wav_16000",
  "licenses": [
    {
      "name": "CC0-1.0"
    }
  ]
}
'''
# Write the metadata file; the context manager closes it automatically
with open("/tmp/CV15_ASR_dataset/dataset-metadata.json", "w") as text_file:
    text_file.write(data)
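
The same metadata file can also be built from a Python dictionary with the standard `json` module, which avoids hand-editing a string literal. This is only a minimal sketch: the helper name `write_dataset_metadata` is hypothetical and is not part of the notebook or of the Kaggle tooling; it simply mirrors the structure of the JSON string above.

```python
import json
import os


def write_dataset_metadata(folder, title, dataset_id, license_name="CC0-1.0"):
    """Write dataset-metadata.json into `folder` and return its path.

    Hypothetical helper: it only reproduces the fields used in the cell above.
    """
    metadata = {
        "title": title,
        "id": dataset_id,
        "licenses": [{"name": license_name}],
    }
    os.makedirs(folder, exist_ok=True)
    out_path = os.path.join(folder, "dataset-metadata.json")
    with open(out_path, "w") as f:
        json.dump(metadata, f, indent=2)
    return out_path
```

For this notebook the call would be `write_dataset_metadata("/tmp/CV15_ASR_dataset", "ASR_CV15_Hindi_wav_16000", "SakshiRathi77/ASR_CV15_Hindi_wav_16000")`.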

In [5]:
# Providing root path and output path
ROOT_PATH = "/kaggle/input/cv15-hindi/hi/hi/clips"
OUTPUT_DIR = "/tmp/CV15_ASR_dataset/audio_wav_16000"

In [6]:
# Create the output directory; exist_ok avoids an error when the cell is re-run
os.makedirs(OUTPUT_DIR, exist_ok=True)

## Converting and Downsampling
The `save_fn` function converts a single MP3 audio file to WAV format, sets the frame rate to 16000 Hz, and saves the converted WAV file to the specified output directory. If a file cannot be converted, its path is printed and processing continues.

In [7]:
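def save_fn(filename):
    """Convert one MP3 clip to a 16 kHz WAV file in OUTPUT_DIR."""
    path = os.path.join(ROOT_PATH, filename)
    os.makedirs(OUTPUT_DIR, exist_ok=True)

    if os.path.exists(path):
        try:
            sound = AudioSegment.from_mp3(path)
            sound = sound.set_frame_rate(16000)  # resample to 16 kHz
            stem = os.path.splitext(filename)[0]  # drop the extension safely
            sound.export(os.path.join(OUTPUT_DIR, f"{stem}.wav"), format="wav")
        except Exception as err:
            # Report the failing file and keep processing the rest
            print(f"{path}: {err}")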
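
To confirm that a converted file really has the 16000 Hz frame rate, its header can be read with the standard `wave` module. The sketch below assumes the exported file is an uncompressed PCM WAV; the helper name `wav_info` is hypothetical.

```python
import wave


def wav_info(path):
    """Return (channels, sample_rate_hz, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        sample_rate = wav.getframerate()
        duration = wav.getnframes() / float(sample_rate)
    return channels, sample_rate, duration
```

After the conversion has run, calling `wav_info` on any file in `OUTPUT_DIR` should report 16000 as the second value.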

In [8]:
# List every clip in the input directory defined above
audio_files = os.listdir(ROOT_PATH)

In [None]:
%%capture
Parallel(n_jobs=8, backend="multiprocessing")(
    delayed(save_fn)(filename) for filename in tqdm(audio_files)
)
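
Because `%%capture` hides the cell output, the paths printed for failed conversions are easy to miss. One way to check the result afterwards is to compare the input and output file names by stem. This is a sketch under that assumption; `find_missing_conversions` is a hypothetical helper, not part of the notebook.

```python
import os


def find_missing_conversions(mp3_names, wav_names):
    """Return the MP3 filenames that have no WAV file with the same stem."""
    converted = {os.path.splitext(name)[0] for name in wav_names}
    return sorted(
        name for name in mp3_names
        if os.path.splitext(name)[0] not in converted
    )
```

For example, `find_missing_conversions(audio_files, os.listdir(OUTPUT_DIR))` lists the clips that still need to be converted or inspected.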

In [None]:
%%capture
!zip -r "./audio_wav_16000.zip" "/tmp/CV15_ASR_dataset/audio_wav_16000/"
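
The archive can also be created from Python with `shutil.make_archive`, as an alternative to the shell command. A minimal sketch, with the hypothetical helper name `zip_directory`; note that it stores paths relative to the source directory, whereas the `zip -r` call above records the full `/tmp/...` path inside the archive.

```python
import shutil


def zip_directory(src_dir, archive_base):
    """Zip the contents of `src_dir` into `<archive_base>.zip` and return its path.

    `archive_base` should point outside `src_dir` so the archive does not
    end up including itself.
    """
    return shutil.make_archive(archive_base, "zip", root_dir=src_dir)
```

For this notebook the call would be `zip_directory("/tmp/CV15_ASR_dataset/audio_wav_16000", "./audio_wav_16000")`.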