# Automatic Speech Recognition Project - Part 2
In this project, I used a Dholuo dataset from [Mozilla Common Voice](https://commonvoice.mozilla.org/en/datasets), which provides audio recordings in `.mp3` format and the corresponding transcriptions in `.tsv` files. My Automatic Speech Recognition (ASR) system is built around the Whisper model, and I feed it audio as `.wav` files. Strictly speaking, Whisper consumes 16 kHz audio samples rather than a particular container format, but standardizing on `.wav` keeps the clips in an uncompressed format that is straightforward to load in later steps. The first task was therefore preprocessing: converting the `.mp3` clips to `.wav` and storing the results for subsequent model training and evaluation.
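
Because the `.tsv` transcripts refer to each clip by its `.mp3` file name, those names have to be mapped to the converted `.wav` files at some point. The following is a minimal sketch of one way to do that. It assumes the usual Common Voice column names `path` and `sentence`; the helper name `load_transcripts` and the example file name `validated.tsv` are illustrative and not taken from this project.

```python
import csv
import os


def load_transcripts(tsv_path, wav_dir):
    """Return (wav_path, sentence) pairs from a Common Voice style TSV.

    Assumes the TSV has a 'path' column (clip file name) and a
    'sentence' column (transcription).
    """
    pairs = []
    with open(tsv_path, newline="", encoding="utf-8") as f:
        # QUOTE_NONE keeps quotation marks inside sentences untouched
        reader = csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            # Swap the original extension for .wav
            stem = os.path.splitext(row["path"])[0]
            pairs.append((os.path.join(wav_dir, stem + ".wav"), row["sentence"]))
    return pairs
```

A call such as `load_transcripts("validated.tsv", "clips_wav")` would then yield pairs that can be checked against the files actually present in the output directory.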

## Step 1: Install Dependencies

In [None]:
!apt-get install ffmpeg
!pip install pydub


Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
ffmpeg is already the newest version (7:4.4.2-0ubuntu0.22.04.1).
0 upgraded, 0 newly installed, 0 to remove and 49 not upgraded.
Collecting pydub
  Downloading pydub-0.25.1-py2.py3-none-any.whl.metadata (1.4 kB)
Downloading pydub-0.25.1-py2.py3-none-any.whl (32 kB)
Installing collected packages: pydub
Successfully installed pydub-0.25.1
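
pydub hands the decoding of compressed formats such as MP3 to an external ffmpeg (or libav) binary, so it can be worth confirming that the binary is visible before starting a long batch job. A small sketch using only the standard library; the helper name `ffmpeg_available` is my own and not part of pydub.

```python
import shutil


def ffmpeg_available(binary="ffmpeg"):
    """Return True if the named executable can be found on PATH."""
    return shutil.which(binary) is not None


if not ffmpeg_available():
    print("ffmpeg not found on PATH; MP3 decoding with pydub will likely fail")
```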


## Step 2: Convert .mp3 files to .wav

In [None]:
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
import os
from pydub import AudioSegment

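def convert_to_wav(input_dir, output_dir):
    """Convert every supported audio file in input_dir to WAV in output_dir."""
    # Create the output directory if it does not already exist
    os.makedirs(output_dir, exist_ok=True)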

    for file_name in os.listdir(input_dir):
        input_path = os.path.join(input_dir, file_name)

        # Skip non-audio files
        if not file_name.lower().endswith(('.mp3', '.aac', '.flac', '.ogg', '.m4a')):
            print(f"Skipping non-audio file: {file_name}")
            continue

        output_file_name = os.path.splitext(file_name)[0] + ".wav"
        output_path = os.path.join(output_dir, output_file_name)

        try:
            # Load the audio file
            audio = AudioSegment.from_file(input_path)

            # Export as WAV
            audio.export(output_path, format="wav")
            print(f"Converted {file_name} to {output_file_name}")
        except Exception as e:
            print(f"Error converting {file_name}: {e}")

# Paths to your input and output directories
input_directory = "/content/drive/My Drive/clips"
output_directory = "/content/drive/My Drive/clips_wav"

convert_to_wav(input_directory, output_directory)


Streaming output truncated to the last 5000 lines.
Converted common_voice_luo_40517175.mp3 to common_voice_luo_40517175.wav
Converted common_voice_luo_40502962.mp3 to common_voice_luo_40502962.wav
Converted common_voice_luo_40519270.mp3 to common_voice_luo_40519270.wav
Converted common_voice_luo_40502851.mp3 to common_voice_luo_40502851.wav
Converted common_voice_luo_40518929.mp3 to common_voice_luo_40518929.wav
Converted common_voice_luo_40507717.mp3 to common_voice_luo_40507717.wav
Converted common_voice_luo_40519302.mp3 to common_voice_luo_40519302.wav
Converted common_voice_luo_40503053.mp3 to common_voice_luo_40503053.wav
Converted common_voice_luo_40515750.mp3 to common_voice_luo_40515750.wav
Converted common_voice_luo_40503153.mp3 to common_voice_luo_40503153.wav
Converted common_voice_luo_40505076.mp3 to common_voice_luo_40505076.wav
Converted common_voice_luo_40503149.mp3 to common_voice_luo_40503149.wav
Converted common_voice_luo_40519348.mp3 to common_voice_luo