| Run this code via free cloud platforms: | [![Binder](https://mybinder.org/badge.svg)](https://mybinder.org/v2/gh/Hrabalikova/toolbox/master?filepath=Transcipt_Audio.ipynb) | [![Kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://kaggle.com/kernels/welcome?src=https://github.com/Hrabalikova/toolbox/blob/master/Transcipt_Audio.ipynb) | [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Hrabalikova/toolbox/blob/master/Transcipt_Audio.ipynb) |
|---|---|---|---|

# Install and import necessary packages and libraries

Install the required packages for speech recognition, audio processing, and video editing:

- `SpeechRecognition`: Used for transcribing speech from audio.
- `pydub`: Provides tools for audio manipulation and format conversion.
- `ffmpeg`: A command-line tool for audio and video processing. We use it to convert the `.mp3` audio into `.wav` chunks.
- `moviepy`: A library for video editing and manipulation. We use it when the recording is an `.mp4` file.


In [None]:
# for Colab

#!pip install SpeechRecognition pydub
#!apt-get install ffmpeg -y
#!pip install moviepy

Import libraries

In [1]:
from moviepy.editor import VideoFileClip
import speech_recognition as sr
from pydub import AudioSegment

import os
import time
#import random

# 1. Define where your files are located

* Mount Google Drive
* Locate the path to the folder with your files
* Convert `.mp4` to `.mp3` if needed; otherwise just set the path to your `.mp3` file

If you run this notebook on Colab, mount Google Drive to access the files stored there. Skip this cell if you are not working on Colab.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


Locate the path of your audio file. You can navigate in Google Drive (using drive/My Drive/...) to set the file path accordingly.
The easiest approach is to right-click the folder where the recording is saved, copy its path, and paste it here.

In [3]:
# Define the path to your folder in Google Drive
folder_path = "/content/drive/MyDrive"  # Replace with your actual folder path

Before moving on to Section 3:
* If your recording is an `.mp4` video, it needs to be converted into `.mp3`. Go to step a) and run all the cells under it.
* If your file is already an `.mp3`, go to step b) and run the cell.
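
If you are unsure which step applies, the file extension decides it. The following is a minimal sketch, assuming a hypothetical helper `needs_conversion` that is not part of the original notebook:

```python
import os

def needs_conversion(file_path):
    """Return True for .mp4 (go to step a), False for .mp3 (go to step b)."""
    extension = os.path.splitext(file_path)[1].lower()
    if extension == ".mp4":
        return True
    if extension == ".mp3":
        return False
    raise ValueError(f"Unsupported file type: {extension or 'no extension'}")

print(needs_conversion("test.mp4"))      # video: convert first (step a)
print(needs_conversion("Voice_v2.mp3"))  # audio: use directly (step b)
```

Any other extension raises a `ValueError`, which is a design choice of this sketch rather than a limitation of the libraries used here.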

## a) Convert recording from `.mp4` to `.mp3`


If you have a screen capture or any other video recording, you have two options:
1. Use any free web converter, for example [cloudconvert](https://cloudconvert.com/mp4-to-mp3). The quality is quite good and it is free, or
2. Run the two cells below, but keep in mind that you will not get the best quality, because no noise reduction or similar processing is applied.

In [None]:
# First let's set up the path for the files, here you just replace the name of input and output files
video_file_path = os.path.join(folder_path, "test.mp4")
audio_file_path = os.path.join(folder_path, "output.mp3")


**Understanding MP3 Encoding**

MP3 is a lossy compression format, meaning some audio data is discarded during the encoding process to reduce file size. The degree of compression and the resulting quality loss are influenced by various parameters, primarily the **bitrate**.

**Bitrate:** Measured in kilobits per second (kbps), the bitrate determines the amount of data used to represent the audio per unit of time. Higher bitrates generally result in better audio quality but larger file sizes.

**Common Bitrates:**

- 128 kbps: Standard quality, suitable for casual listening.
- 192 kbps: Good quality, offering a noticeable improvement over 128 kbps.
- 320 kbps: Near-CD quality, often considered the highest practical bitrate for MP3.

**Therefore, we specify in `.write_audiofile()`:** <br>
* `codec='libmp3lame'`: This specifies the MP3 encoder to use. 'libmp3lame' is a widely used, high-quality encoder.
* `bitrate='192k'`: This sets the target bitrate to 192 kbps, a good balance between quality and file size. Use `'320k'` instead if you want near-CD quality.
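
To get a feeling for the trade-off between bitrate and file size, the size of a constant-bitrate MP3 can be approximated as bitrate × duration / 8. The sketch below assumes a hypothetical helper `estimate_mp3_size_mb` and ignores metadata overhead, so treat the numbers as rough estimates:

```python
def estimate_mp3_size_mb(duration_seconds, bitrate_kbps):
    """Approximate size of a constant-bitrate MP3 in megabytes (1 MB = 1,000,000 bytes)."""
    total_bits = bitrate_kbps * 1000 * duration_seconds
    return total_bits / 8 / 1_000_000

# Compare the common bitrates for a one-hour recording
for bitrate in (128, 192, 320):
    size = estimate_mp3_size_mb(duration_seconds=3600, bitrate_kbps=bitrate)
    print(f"{bitrate} kbps, 1 hour: about {size:.1f} MB")
```

For a one-hour recording this gives roughly 57.6 MB at 128 kbps, 86.4 MB at 192 kbps and 144.0 MB at 320 kbps.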

In [None]:
# Convert .mp4 video to .mp3
video = VideoFileClip(video_file_path)
audio = video.audio
audio.write_audiofile(audio_file_path, codec='libmp3lame', bitrate='192k')  #modify bitrate according to required quality

MoviePy - Writing audio in /content/drive/MyDrive/output.mp3


                                                                   

MoviePy - Done.




## b) Define the path to the `.mp3` file

Use this step if your file is already an `.mp3` and does not need to be converted from video. Simply define the path here, following the same principle as above ⬆️

In [4]:
# Set the path to the audio file in Google Drive
audio_file_path = os.path.join(folder_path, "Voice_v2.mp3")

The setup is now complete. From here on, just let the code run.

# 2. Define functions for timestamps and chunk transcription

In [5]:
# Function to convert milliseconds to HH:MM:SS format
def format_timestamp(ms):
    seconds = ms // 1000
    minutes = seconds // 60
    hours = minutes // 60
    return f"{hours:02}:{minutes % 60:02}:{seconds % 60:02}"
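
A few sample values show what the timestamps look like. The function is repeated here only so that the example runs on its own; the inputs are whole milliseconds, as in the rest of the notebook:

```python
def format_timestamp(ms):
    # Same logic as the cell above: integer division from ms to h/m/s
    seconds = ms // 1000
    minutes = seconds // 60
    hours = minutes // 60
    return f"{hours:02}:{minutes % 60:02}:{seconds % 60:02}"

print(format_timestamp(0))          # 00:00:00
print(format_timestamp(60_000))     # 00:01:00
print(format_timestamp(3_725_000))  # 01:02:05
```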

The function `transcribe_chunk` transcribes an audio chunk using Google Speech Recognition. The language is currently set to Icelandic (`'is-IS'`). To transcribe a different language, change the `language` parameter in the `r.recognize_google()` call to the desired language code:

Czech: 'cs-CZ'<br>
German: 'de-DE' <br>
Icelandic: 'is-IS' <br>
English:<br>
 * US English: 'en-US'
 * UK English: 'en-GB'
 * Australian English: 'en-AU'
 * Canadian English: 'en-CA'<br>
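
If you switch languages often, a small lookup table saves you from remembering the codes. This is only a sketch: `LANGUAGE_CODES` and `language_code` are hypothetical names, not part of the notebook, and the codes are the ones listed above.

```python
# Hypothetical lookup table built from the codes listed above
LANGUAGE_CODES = {
    "czech": "cs-CZ",
    "german": "de-DE",
    "icelandic": "is-IS",
    "us english": "en-US",
    "uk english": "en-GB",
    "australian english": "en-AU",
    "canadian english": "en-CA",
}

def language_code(name, default="is-IS"):
    """Return the code for a language name, falling back to the default."""
    return LANGUAGE_CODES.get(name.strip().lower(), default)

print(language_code("Czech"))
print(language_code("UK English"))
```

The returned string can then be passed as the `language` argument of `r.recognize_google()`.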



In [6]:
def transcribe_chunk(audio_chunk_path, max_retries=10):
    """Transcribe an audio chunk, retrying up to max_retries times on request errors."""
    r = sr.Recognizer()
    for i in range(max_retries):
        try:
            with sr.AudioFile(audio_chunk_path) as source:
                audio = r.record(source)
            text = r.recognize_google(audio, language='is-IS')  # here you define the language of the recording
            return text  # Successful transcription
        except sr.RequestError as e:
            print(f"Could not request results (attempt {i + 1}): {e}")
            time.sleep(10)  # Wait 10 s before retrying after a connection error
        except sr.UnknownValueError:
            print("Could not understand audio in this chunk.")
            return "[Unintelligible]"  # Placeholder for unintelligible sections
    return "[Error]"  # If all retries fail
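
The function above waits a fixed 10 seconds between attempts. An alternative, which the notebook does not use, is exponential backoff, where the wait doubles after each failure. The sketch below is generic: `retry_with_backoff` is a hypothetical helper, and `ConnectionError` stands in for `sr.RequestError` so that the example runs without any network access.

```python
import time

def retry_with_backoff(action, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call action() until it succeeds, doubling the wait after each failure.

    Returns the result of action(), or None if every attempt raises ConnectionError.
    """
    for attempt in range(max_retries):
        try:
            return action()
        except ConnectionError as error:
            delay = base_delay * 2 ** attempt
            print(f"Attempt {attempt + 1} failed ({error}); waiting {delay:.0f} s")
            sleep(delay)
    return None

# Demo with a stand-in for the API call that fails twice, then succeeds
calls = {"count": 0}

def flaky_request():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary network problem")
    return "ok"

waits = []
result = retry_with_backoff(flaky_request, sleep=waits.append)  # record waits instead of sleeping
print(result, waits)
```

Passing `sleep=waits.append` records the planned delays instead of actually pausing, which keeps the demo instant; in real use you would leave the default `time.sleep`.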

# 3. Transcribe the audio

Let's start the fun part. This step might take some time depending on the length of the audio. It could take anywhere from a few minutes to 30 minutes or longer for very long audio files.

In [7]:
# Load the audio file and split it into X-minute chunks
audio = AudioSegment.from_mp3(audio_file_path)
# Chunk length in milliseconds. 1 minute is used here; it can go up to 5, or down to 30 s by writing 0.5 instead of 1.
# Chunks longer than 10 minutes will definitely overwhelm the API.
chunk_length_ms = int(1 * 60 * 1000)  # int() keeps slicing and timestamps correct even for values like 0.5
chunks = [audio[i:i + chunk_length_ms] for i in range(0, len(audio), int(chunk_length_ms))]
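
The slicing above can be checked on plain numbers before running it on real audio. The sketch below assumes a hypothetical helper `chunk_bounds` and a made-up recording length of 2.5 minutes; it only computes the start and end of each chunk in milliseconds:

```python
def chunk_bounds(total_ms, chunk_ms):
    """Return (start, end) pairs in milliseconds covering 0..total_ms."""
    return [(start, min(start + chunk_ms, total_ms))
            for start in range(0, total_ms, chunk_ms)]

bounds = chunk_bounds(total_ms=150_000, chunk_ms=60_000)
print(bounds)  # [(0, 60000), (60000, 120000), (120000, 150000)]
```

The last chunk is simply shorter than the others, which matches how the list comprehension above treats the end of the recording.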

In [8]:
# Transcribe each chunk and store results with timestamps
transcriptions = []
for i, chunk in enumerate(chunks):
    chunk_path = f"chunk_{i}_{int(time.time())}.wav"  # Unique name for each chunk file
    chunk.export(chunk_path, format="wav")

    # Calculate the start time of the chunk and format it
    chunk_start_time = i * chunk_length_ms
    timestamp = format_timestamp(chunk_start_time)

    # Transcribe the chunk and handle errors
    transcription = transcribe_chunk(audio_chunk_path=chunk_path)

    # Append the timestamp to the transcription
    transcriptions.append(f"[{timestamp}] {transcription}")

    # Clean up the chunk file and add delay between requests
    os.remove(chunk_path)
    time.sleep(10)  # Pause to avoid overwhelming the speech recognition API

# Combine and output the full transcription
full_transcription = "\n".join(transcriptions)
print(full_transcription)  # Optionally save to a txt file in the next cell

Could not understand audio in this chunk.
Could not understand audio in this chunk.
Could not understand audio in this chunk.
Could not understand audio in this chunk.
Could not understand audio in this chunk.
Could not understand audio in this chunk.
Could not understand audio in this chunk.
Could not understand audio in this chunk.
Could not understand audio in this chunk.
Could not understand audio in this chunk.
Could not understand audio in this chunk.
[00:00:00] betri en lítur út núna aðeins frá yfirsýn yfir þar sannarlega skoða tíma svo ókei hérna færð þegar að gera hérna
[00:01:00] fyrsti einhverja ákveðnu ertu að reyna að opna fá þetta stærra svo að hafa lesa bara þannig já upp hvenær hvaða verkefni hefur grunnan og ég er ennþá að reyna yfirsýn yfir ofsa mikið og og við erum nú nokkur með hátíð á suðurnesja svona akkeri fyrir að við ætlum að fara hlaupastöðum einasta svona sjáum soldið efna þú veist líka mikilvægast og
[00:02:00] hvað þarf að gerast á hvaða tímapunkti og mig l

In [9]:
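# Save to a file
transcription_file_path = os.path.join(folder_path, "transcription_firstStaffMetting.txt")  # give a name to the txt file
# UTF-8 keeps non-ASCII characters (e.g. Icelandic letters) intact on every platform
with open(transcription_file_path, 'w', encoding='utf-8') as f:
    f.write(full_transcription)
print(f"Transcription saved to {transcription_file_path}")

Transcription saved to /content/drive/MyDrive/transcription_firstStaffMetting.txt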
