<a href="https://colab.research.google.com/github/Nukaraju2003/WhisperGoogleDrive/blob/main/WhisperGoogleDrive.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

If you're looking at this on GitHub and new to Python Notebooks or Colab, click the Google Colab badge above 👆

#OpenAI Whisper and Google Drive integration for transcribing audio files

📺 Getting started video: https://youtu.be/yVLhG4-7Sj4

###This notebook will transcribe all the audio files in a Google Drive folder

*Note: This requires giving the application permission to connect to your drive. Only you will have access to the contents of your drive, but please read the warnings carefully.*

This notebook application:
1. Connects to your Google Drive when you give it permission.
2. Creates a WhisperAudio folder and two subfolders (ProcessedAudio and TextFiles).
3. When you run the application, it searches for all the audio files (.mp3 and .m4a) in your WhisperAudio folder, transcribes them, moves each audio file to /ProcessedAudio, and saves each transcript to /TextFiles.
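
As a rough illustration of step 3, the sketch below shows where each file would end up. The helper name `plan_outputs` is hypothetical and not part of this notebook; the folder names come from the list above, and the Drive path assumes the default Colab mount point.

```python
from pathlib import Path

# Assumed default Colab mount point plus the folder name from the list above.
AUDIO_ROOT = Path("/content/drive/MyDrive/WhisperAudio")

def plan_outputs(audio_path, root=AUDIO_ROOT):
    """Return (transcript_path, processed_path) for one audio file.

    Hypothetical helper for illustration only: it computes paths and
    does not touch the file system.
    """
    audio_path = Path(audio_path)
    transcript = root / "TextFiles" / (audio_path.stem + ".txt")
    processed = root / "ProcessedAudio" / audio_path.name
    return transcript, processed

transcript, processed = plan_outputs(AUDIO_ROOT / "interview.mp3")
print(transcript)
print(processed)
```

Here `interview.mp3` is a made-up file name; any .mp3 or .m4a file in WhisperAudio would follow the same pattern.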

###**For faster performance set your runtime to "GPU"**
*Click on "Runtime" in the menu and click "Change runtime type". Select "GPU".*
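
If you want to confirm that the GPU runtime is active, a quick check like the one below may help. This is a minimal sketch: the `pick_device` helper is hypothetical, and it assumes PyTorch is available (it is installed as a Whisper dependency in step 1).

```python
def pick_device():
    """Return "cuda" if PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch  # installed alongside Whisper in step 1
    except ImportError:
        return "cpu"  # PyTorch not installed yet, so no GPU check is possible
    return "cuda" if torch.cuda.is_available() else "cpu"

device = pick_device()
print(f"Whisper will run on: {device}")
```

Whisper already picks the GPU by default when one is available, so this check is only for your own reassurance. The result could also be passed explicitly, for example `whisper.load_model("small.en", device=device)`.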


**Note: If you add new files after running this application, you'll need to remount the drive (the `drive.mount` call in step 2) to make them searchable.**

##1. Install the required code libraries

In [1]:
!pip install git+https://github.com/openai/whisper.git 
!sudo apt update && sudo apt install -y ffmpeg
!pip install librosa

import whisper
import time
import librosa
import soundfile as sf
import re
import os

# model = whisper.load_model("tiny.en")
# model = whisper.load_model("base.en")   
model = whisper.load_model("small.en") # load the small model
# model = whisper.load_model("medium.en")
# model = whisper.load_model("large")

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting git+https://github.com/openai/whisper.git
  Cloning https://github.com/openai/whisper.git to /tmp/pip-req-build-pcz24g05
  Running command git clone --filter=blob:none --quiet https://github.com/openai/whisper.git /tmp/pip-req-build-pcz24g05
  Resolved https://github.com/openai/whisper.git to commit 6dea21fd7f7253bfe450f1e2512a0fe47ee2d258
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting ffmpeg-python==0.2.0
  Downloading ffmpeg_python-0.2.0-py3-none-any.whl (25 kB)
Collecting triton==2.0.0
  Downloading triton-2.0.0-1-cp39-cp39-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (63.3 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 63.3/63.3 MB 13.1 MB/s eta 0:00:00
Collecting tiktoken==0.3.1
  Downloading tik

100%|███████████████████████████████████████| 461M/461M [00:17<00:00, 26.9MiB/s]


##2. Allow access to your Google Drive and add new folders

In [2]:
# Connect Google Drive 
from google.colab import drive
drive.mount("/content/drive", force_remount=True) # This will prompt for authorization.

# This will create the WhisperAudio folders if they don't exist.
folders =  ["WhisperAudio/", "WhisperAudio/ProcessedAudio/", "WhisperAudio/TextFiles/"]
for folder in folders:
  path = "/content/drive/MyDrive/" + folder
  os.makedirs(path, exist_ok=True) # Create the folder only if it does not exist yet

Mounted at /content/drive
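
If you'd like to try the folder setup without touching your real Drive, the sketch below runs the same idea against a temporary directory. The function name `ensure_whisper_folders` is hypothetical; the folder names are the ones used in this notebook.

```python
import os
import tempfile

def ensure_whisper_folders(drive_root):
    """Create WhisperAudio and its two subfolders under drive_root."""
    created = []
    for sub in ("", "ProcessedAudio", "TextFiles"):
        path = os.path.join(drive_root, "WhisperAudio", sub)
        os.makedirs(path, exist_ok=True)  # no error if the folder already exists
        created.append(path)
    return created

# Demonstration against a throwaway directory instead of a real Drive.
with tempfile.TemporaryDirectory() as tmp:
    paths = ensure_whisper_folders(tmp)
    ensure_whisper_folders(tmp)  # safe to call a second time
    print([os.path.isdir(p) for p in paths])  # [True, True, True]
```

In Colab, the equivalent call would presumably be `ensure_whisper_folders("/content/drive/MyDrive")` after the drive has been mounted.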


##3. Upload any audio files you want transcribed in the "WhisperAudio" folder in your Google Drive.
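
To preview which uploads would be picked up, something like the sketch below can list the matching files. It is an illustration, not the notebook's own code: `find_audio_files` is a hypothetical helper, and unlike the notebook it also matches upper-case extensions such as `.MP3`.

```python
import os
import tempfile

AUDIO_EXTENSIONS = (".mp3", ".m4a")  # the two formats this notebook looks for

def find_audio_files(folder):
    """Return sorted audio file names directly inside folder (not subfolders)."""
    names = []
    for name in os.listdir(folder):
        full = os.path.join(folder, name)
        # lower() is an addition here so that "MEMO.M4A" is matched as well
        if os.path.isfile(full) and name.lower().endswith(AUDIO_EXTENSIONS):
            names.append(name)
    return sorted(names)

# Demonstration with empty placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("talk.mp3", "MEMO.M4A", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    print(find_audio_files(tmp))  # ['MEMO.M4A', 'talk.mp3']
```

In Colab you would call it on `/content/drive/MyDrive/WhisperAudio/` after mounting the drive.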

##4. Let the application search for new files and transcribe the audio files and save them to your Google Drive

In [3]:
# Load all the audio file paths in a Google Drive folder
from google.colab import drive
drive.mount("/content/drive", force_remount=True) # This will prompt for authorization.

# Assuming the audio files are in a folder called "WhisperAudio" in the root of the drive
audio_folder = "/content/drive/MyDrive/WhisperAudio/"

# Get a list of all the file paths and names in the folder
import os
audio_files = []
audio_names = []
for file in os.listdir(audio_folder):
  if file.endswith(".m4a") or file.endswith(".mp3"):
    audio_files.append(audio_folder + file)
    audio_names.append(file)

for f in audio_files:    
  print(f)

if len(audio_files) == 0:
  print("You have no files.")

# Loop through the audio files, split each audio file based on pauses in speech then transcribe them with Whisper.
for i, file in enumerate(audio_files): # For each audio file
  print(f"Processing {audio_names[i]}...")
  # Load the audio file and convert it to 16 kHz mono
  audio, sr = librosa.load(file, sr=16000, mono=True)
  # Find the non-silent stretches of the audio. With top_db=30, anything more than 30 dB below the peak counts as silence.
  pauses = librosa.effects.split(audio, top_db=30, frame_length=2048, hop_length=128)
  # Transcribe each segment and concatenate the results
  transcription = ""
  for start, end in pauses: # For each segment
    segment = audio[start:end]
    # Save the segment as a temporary wav file
    temp_file = "temp.wav"
    sf.write(temp_file, segment, sr, subtype='PCM_16')
    if os.path.getsize(temp_file) > 10000: # Skip very short segments (under roughly 0.3 seconds)
      # Transcribe the segment with Whisper
      result = model.transcribe(temp_file)
      text = result["text"]
      # Report progress (word count before this segment is added), then append the text
      print(len(transcription.split(" ")), "words processed")
      transcription += text.strip() + " "
    # Delete the temporary file, whether or not it was transcribed
    os.remove(temp_file)
  # Print the transcription
  print(f"Transcription of {audio_names[i]}:\n")
  print(transcription)
  print("\n")
 
  # Convert runs of whitespace into paragraph breaks and save the transcription as a .txt file in WhisperAudio/TextFiles.
  transcription = re.sub(r"\s\s+", "\n\n", transcription) # Replace multiple spaces with newlines
  text_file = os.path.join(audio_folder, "TextFiles", os.path.splitext(audio_names[i])[0] + ".txt") # Create the text file name
  with open(text_file, "w") as f: # Write the transcription to the text file
    f.write(transcription)
  print(f"Saved transcription as {text_file}")

# Move the audio files to "/content/drive/MyDrive/WhisperAudio/ProcessedAudio"
import shutil
processed_folder = "/content/drive/MyDrive/WhisperAudio/ProcessedAudio/"
if not os.path.exists(processed_folder): # Create the folder if it does not exist
  os.mkdir(processed_folder)
for file in audio_files: # Move each audio file to the processed folder
  shutil.move(file, processed_folder + os.path.basename(file))
  print(f"Moved {file} to {processed_folder}")

Mounted at /content/drive
/content/drive/MyDrive/WhisperAudio/saved.mp3
Processing saved.mp3...
1 words processed
Transcription of saved.mp3:

Hi, my name is Julie Saha. I was born and raised in MPJ I am an outbound customer care and semi-technical processes. In my leisure time, I like to spend most of my time reading books, listening to music, traveling and of course spending time with my kids. Talking about my strength, I would say I am self-motivated. I do not need someone else to push me to achieve my goals and dreams. And about weakness, I would say I look for perfection. I completely understand it is humanly not always possible to be perfect. Hence, I am working on my weakness to make sure that I figure out a way to get things done in a better way even if it is not perfect. And yes, so that's mostly about me. Thank you so much for giving me this opportunity to talk about myself and introduce myself to you all. Hope to hear from you soon. Thank you. You have a good day. Bye-bye. 
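
`librosa.effects.split` returns the non-silent intervals of the audio, but to my knowledge it has no parameter for a minimum pause length. If you want short pauses (say, under 0.5 seconds) to stay inside one segment, one option is to merge neighbouring intervals afterwards. The sketch below is a possible approach under that assumption; `merge_short_pauses` is a hypothetical helper, not part of librosa or this notebook.

```python
def merge_short_pauses(intervals, sr=16000, min_pause=0.5):
    """Merge (start, end) sample intervals separated by less than min_pause seconds."""
    max_gap = int(min_pause * sr)  # pause threshold expressed in samples
    merged = []
    for start, end in intervals:
        start, end = int(start), int(end)
        if merged and start - merged[-1][1] < max_gap:
            merged[-1][1] = end  # gap too short: extend the previous segment
        else:
            merged.append([start, end])
    return [tuple(seg) for seg in merged]

# Three voiced intervals at 16 kHz: the first gap is 0.1 s, the second is 1.0 s.
example = [(0, 16000), (17600, 32000), (48000, 64000)]
print(merge_short_pauses(example))  # [(0, 32000), (48000, 64000)]
```

In step 4, this could be applied to the array returned by `librosa.effects.split` before the inner loop, for example `for start, end in merge_short_pauses(pauses, sr):`. Longer segments also give Whisper more context per call, which may improve the transcript.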
