# Learn OpenAI Whisper - Chapter 8
## Notebook 2: Process audio files to a LJ format with Whisper and OZEN

This notebook complements the book [Learn OpenAI Whisper](https://a.co/d/1p5k4Tg).

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1wnomL0dxmU9CgPKIgazR8AocolEYjAe5)

This notebook is based on the [OZEN Toolkit](https://github.com/devilismyfriend/ozen-toolkit) project. Given a folder of files or a single audio file, it extracts the speech, transcribes it using Whisper, and saves the result in the LJ format: segmented audio files in WAV format in a `wavs` folder, with the transcriptions in the `train.txt` and `valid.txt` files.

**NOTE**: The notebook stores the files using the following layout:

`dataset/`
* `valid.txt`
* `train.txt`
* `wavs/`

The `wavs/` directory must contain `.wav` files.

Each line in `train.txt` and `valid.txt` pairs a WAV path with its transcription, separated by `|`. For example:

* `wavs/A.wav|Write the transcribed audio here.`
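
The line format above can be checked with a few lines of Python. The following is a minimal sketch, not part of the OZEN Toolkit: the helper names `parse_metadata_line` and `check_dataset` are hypothetical, and it assumes each line holds exactly one `path|transcription` pair encoded as UTF-8.

```python
from pathlib import Path


def parse_metadata_line(line: str) -> tuple[str, str]:
    """Split one 'wavs/name.wav|transcription' line into (path, text)."""
    # Split on the first '|' only, in case the transcription contains one.
    wav_path, sep, text = line.rstrip("\n").partition("|")
    if not sep or not wav_path or not text.strip():
        raise ValueError(f"Malformed metadata line: {line!r}")
    return wav_path, text.strip()


def check_dataset(dataset_dir: str, metadata_name: str = "train.txt") -> list[str]:
    """Return the WAV paths listed in the metadata file that are missing on disk."""
    root = Path(dataset_dir)
    missing = []
    with open(root / metadata_name, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # ignore blank lines
            wav_path, _ = parse_metadata_line(line)
            if not (root / wav_path).is_file():
                missing.append(wav_path)
    return missing


print(parse_metadata_line("wavs/A.wav|Write the transcribed audio here."))
```

Calling `check_dataset("dataset")` once the dataset exists would return an empty list when every listed WAV file is present.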


In [None]:
!git clone https://github.com/devilismyfriend/ozen-toolkit

In [None]:
!pip install transformers
!pip install huggingface_hub
!pip install pydub
!pip install yt-dlp
!pip install pyannote.audio

# RESTART the runtime before continuing

In [None]:
!pip install colorama
!pip install termcolor
!pip install pyfiglet
%cd ozen-toolkit

In [None]:
# Download sample file
!wget -nv https://github.com/PacktPublishing/Learn-OpenAI-Whisper/raw/main/Chapter01/Learn_OAI_Whisper_Sample_Audio01.mp3

In [None]:
# CUSTOM_VOICE_NAME = "custom"

import os
from google.colab import files

custom_voice_folder = "./"

os.makedirs(custom_voice_folder, exist_ok=True)  # Create the directory if it doesn't exist

for filename, file_data in files.upload().items():
    with open(os.path.join(custom_voice_folder, filename), 'wb') as f:
        f.write(file_data)

In [None]:
import configparser

# Create a new ConfigParser object
config = configparser.ConfigParser()

# Add the 'DEFAULT' section and set the options
config['DEFAULT'] = {
    'hf_token': '<Your HF API key>',
    'whisper_model': 'openai/whisper-medium',
    'device': 'cuda',
    'diaization_model': 'pyannote/speaker-diarization',
    'segmentation_model': 'pyannote/segmentation',
    'valid_ratio': '0.2',
    'seg_onset': '0.7',
    'seg_offset': '0.55',
    'seg_min_duration': '2.0',
    'seg_min_duration_off': '0.0'
}

# Write the configuration to a file
with open('config.ini', 'w') as configfile:
    config.write(configfile)

# Print the contents of the file
with open('config.ini', 'r') as configfile:
    print(configfile.read())
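
`configparser` stores every value as a string, which is why the numeric options above are quoted. The sketch below shows one way to read them back as numbers and what a `valid_ratio` of 0.2 means for a simple ratio split. The `split_counts` helper is hypothetical and only illustrates the arithmetic; the toolkit's own splitting logic may differ.

```python
import configparser

# A trimmed copy of the configuration written above, parsed from a string
# so that this sketch does not depend on config.ini being present.
config = configparser.ConfigParser()
config.read_string(
    """
[DEFAULT]
valid_ratio = 0.2
seg_min_duration = 2.0
"""
)

defaults = config["DEFAULT"]

# Raw access returns strings; the typed getters convert them.
valid_ratio = defaults.getfloat("valid_ratio")
seg_min_duration = defaults.getfloat("seg_min_duration")


def split_counts(n_segments: int, ratio: float) -> tuple[int, int]:
    """Return (n_train, n_valid) for a plain ratio split (illustrative only)."""
    n_valid = round(n_segments * ratio)
    return n_segments - n_valid, n_valid


print(type(defaults["valid_ratio"]).__name__, valid_ratio, seg_min_duration)
print(split_counts(50, valid_ratio))
```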

In [None]:
!python ozen.py Learn_OAI_Whisper_Sample_Audio01.mp3

2024-03-16 16:53:42.646606: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-03-16 16:53:42.646655: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-03-16 16:53:42.648084: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
  ______    ________   _______ .__   __. 
 /  __  \  |       /  |   ____||  \ |  | 
|  |  |  | `---/  /   |  |__   |   \|  | 
|  |  |  |    /  /    |   __|  |  . `  | 
|  `--'  |   /  /----.|  |____ |  |\   | 
 \______/   /________||_______||__| \__| 

Converting to WAV...
Loading Segment 

#### Mount Google Drive (to store the generated dataset for later use)

In [None]:
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


In [None]:
%cp -r /content/ozen-toolkit/output/ /content/gdrive/MyDrive/
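
After the copy, it can be useful to confirm that the dataset arrived intact. The sketch below is one way to do that under stated assumptions: the helper names `count_lines` and `summarize_datasets` are hypothetical, and because the exact folder layout inside `output/` is not shown in this notebook, the code searches recursively for any folder that contains a `train.txt`.

```python
from pathlib import Path


def count_lines(path: Path) -> int:
    """Count the non-blank lines in a text file."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())


def summarize_datasets(root: str) -> dict[str, dict[str, int]]:
    """Map each folder under `root` that holds a train.txt to its file counts."""
    summary = {}
    for train_file in Path(root).rglob("train.txt"):
        folder = train_file.parent
        valid_file = folder / "valid.txt"
        wavs_dir = folder / "wavs"
        summary[str(folder)] = {
            "train_lines": count_lines(train_file),
            "valid_lines": count_lines(valid_file) if valid_file.is_file() else 0,
            "wav_files": len(list(wavs_dir.glob("*.wav"))) if wavs_dir.is_dir() else 0,
        }
    return summary


# Assumed destination of the copy command above; skipped when Drive is not mounted.
drive_output = "/content/gdrive/MyDrive/output"
if Path(drive_output).is_dir():
    for folder, counts in summarize_datasets(drive_output).items():
        print(folder, counts)
```

If every segment was transcribed, `train_lines + valid_lines` would be expected to match `wav_files` for each folder, although that depends on how the toolkit handles segments it discards.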