# 📎 Documentation

* `input_format`: The source of the audio/video file to be transcribed
  * `youtube`: A YouTube video
    * The transcribed file(s) are saved to this Colab, and will be deleted when the Colab runtime is disconnected.
  * `gdrive`: A file in your Google Drive account
    * If you select this option, you will need to allow this notebook to connect to your Google Drive account.
    * The transcribed file(s) are saved to the same folder as the original file.
  * `local`: A local file that you have uploaded to this Colab
    * If you select this option, you will need to first upload the file to the Files tab (see Step 1 [here](https://wandb.ai/wandb_fc/gentle-intros/reports/How-to-transcribe-your-audio-to-text-for-free-with-SRTs-VTTs---VmlldzozMzc1MzU3)).
    * The transcribed file(s) are saved to this Colab, and will be deleted when the Colab runtime is disconnected.
* `file`: The URL of the YouTube video or the path of the audio file to be transcribed.
  * Example: `file = "https://www.youtube.com/watch?v=AUDIO"` (transcribing a YouTube video)
  * Example: `file = "/content/drive/My Drive/AUDIO.mp3"` (transcribing a Google Drive file)
  * Example: `file = "/content/AUDIO.mp3"` (transcribing a local file)
* `plain`: Whether to save the transcription as a text file or not.
* `srt`: Whether to save the transcription as an SRT file or not.
* `vtt`: Whether to save the transcription as a VTT file or not.
* `tsv`: Whether to save the transcription as a TSV (tab-separated values) file or not.
* `download`: Whether to download the transcribed file(s) or not.
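As a rough illustration of how the output flags behave, each enabled flag produces one file that keeps the source file's name and folder but swaps in the format's extension. The helper below, `expected_outputs`, is hypothetical (it is not part of this notebook) and only predicts file names; it is a sketch for sanity-checking your settings, assuming the naming scheme described above.

```python
from pathlib import Path

# Extension produced by each output flag (hypothetical mapping for illustration)
FLAG_EXTENSIONS = {"plain": ".txt", "srt": ".srt", "vtt": ".vtt", "tsv": ".tsv"}

def expected_outputs(file, plain=True, srt=False, vtt=False, tsv=False):
    """Return the output paths (as POSIX strings) expected next to `file`."""
    source = Path(file)
    flags = {"plain": plain, "srt": srt, "vtt": vtt, "tsv": tsv}
    return [
        source.with_suffix(FLAG_EXTENSIONS[name]).as_posix()
        for name, enabled in flags.items()
        if enabled
    ]

# Example: the default form values (plain text only) for a local WAV file
print(expected_outputs("/content/1001_DFA_ANG_XX.wav"))
# ['/content/1001_DFA_ANG_XX.txt']
```

For a YouTube input, `file` would be the path of the downloaded `.mp4` audio rather than the URL.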


In [43]:
# @title 🌴 Change the values in this section

# @markdown Select the source of the audio/video file to be transcribed
input_format = "local" #@param ["youtube", "gdrive", "local"]

# @markdown Enter the URL of the YouTube video or the path of the audio file to be transcribed
file = "/content/1001_DFA_ANG_XX.wav" #@param {type:"string"}

#@markdown Click here if you'd like to save the transcription as a text file
plain = True #@param {type:"boolean"}

# @markdown Click here if you'd like to save the transcription as an SRT file
srt = False #@param {type:"boolean"}

#@markdown Click here if you'd like to save the transcription as a VTT file
vtt = False #@param {type:"boolean"}

#@markdown Click here if you'd like to save the transcription as a TSV file
tsv = False #@param {type:"boolean"}

#@markdown Click here if you'd like to download the transcribed file(s) locally
download = True #@param {type:"boolean"}

# 🛠 Set Up

The blocks below install all of the necessary Python libraries (including Whisper), configure Whisper, and contain code for various helper functions.



## 🤝 Dependencies

In [44]:
# Dependencies

!pip install -q pytube
!pip install -q git+https://github.com/openai/whisper.git

import os, re
import torch
from pathlib import Path
from pytube import YouTube

import whisper
from whisper.utils import get_writer

  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done


## 👋 Whisper configuration

This Colab uses `medium.en`, [the medium-sized, English-only](https://github.com/openai/whisper#available-models-and-languages) Whisper model.


In [45]:
# Use CUDA, if available
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

# Load the desired model
model = whisper.load_model("medium.en").to(DEVICE)

## 💪 YouTube helper functions

Code for helper functions when running Whisper on a YouTube video.

In [46]:
def to_snake_case(name):
    return name.lower().replace(" ", "_").replace(":", "_").replace("__", "_")

def download_youtube_audio(url, file_name = None, out_dir = "."):
    "Download the highest-bitrate audio-only MP4 stream of a YouTube video"
    yt = YouTube(url)
    if file_name is None:
        file_name = Path(out_dir, to_snake_case(yt.title)).with_suffix(".mp4")
    # Keep only audio-only MP4 streams, sorted by average bitrate, best first
    streams = (yt.streams
                 .filter(only_audio = True, file_extension = "mp4")
                 .order_by("abr")
                 .desc())
    return streams.first().download(filename = str(file_name))
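`to_snake_case` only rewrites spaces and colons, so other characters common in YouTube titles (question marks, parentheses, slashes) end up in the file name, and a `/` would even be treated as a directory separator by `Path`. A stricter variant is sketched below as a hypothetical `to_safe_snake_case` using the `re` module the notebook already imports. The trade-off: it keeps only ASCII letters and digits, so accented or non-Latin characters are dropped, and a title written entirely in another script would collapse to an empty string.

```python
import re

def to_safe_snake_case(name):
    """Hypothetical stricter variant of to_snake_case: lowercase the name,
    replace every run of non-alphanumeric ASCII characters with a single
    underscore, and trim underscores from both ends."""
    return re.sub(r"[^0-9a-z]+", "_", name.lower()).strip("_")

print(to_safe_snake_case("Whisper: Robust Speech Recognition (Demo)"))
# whisper_robust_speech_recognition_demo
```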

# ✍ Transcribing with Whisper

Ultimately, calling Whisper is as easy as one line!
* `result = model.transcribe(file)`

The majority of this new `transcribe_file` function is actually just for exporting the results of the transcription as a text, SRT, VTT, or TSV file.

In [47]:
def transcribe_file(model, file, plain, srt, vtt, tsv, download):
    """
    Runs Whisper on an audio file

    Parameters
    ----------
    model: Whisper
        The Whisper model instance.

    file: str
        The file path of the file to be transcribed.

    plain: bool
        Whether to save the transcription as a text file or not.

    srt: bool
        Whether to save the transcription as an SRT file or not.

    vtt: bool
        Whether to save the transcription as a VTT file or not.

    tsv: bool
        Whether to save the transcription as a TSV file or not.

    download: bool
        Whether to download the transcribed file(s) or not.

    Returns
    -------
    A dictionary containing the resulting text ("text"), segment-level details ("segments"), and
    the spoken language ("language"), which is always "en" here because `language = "en"` is passed to `model.transcribe`.
    """
    file_path = Path(file)
    print(f"Transcribing file: {file_path}\n")

    output_directory = file_path.parent

    # Run Whisper
    result = model.transcribe(file, verbose = False, language = "en")

    if plain:
        txt_path = file_path.with_suffix(".txt")
        print(f"\nCreating text file")

        with open(txt_path, "w", encoding="utf-8") as txt:
            txt.write(result["text"])
    if srt:
        print(f"\nCreating SRT file")
        srt_writer = get_writer("srt", output_directory)
        srt_writer(result, str(file_path.stem))

    if vtt:
        print(f"\nCreating VTT file")
        vtt_writer = get_writer("vtt", output_directory)
        vtt_writer(result, str(file_path.stem))

    if tsv:
        print(f"\nCreating TSV file")

        tsv_writer = get_writer("tsv", output_directory)
        tsv_writer(result, str(file_path.stem))

    if download:
        from google.colab import files

        # Outputs are written next to the source file: /content for local and
        # YouTube inputs, but the original Drive folder for gdrive inputs
        stem = file_path.stem

        for suffix in [".txt", ".srt", ".vtt", ".tsv"]:
            output_file = output_directory / f"{stem}{suffix}"
            if output_file.exists():
                print(f"Downloading {output_file}")
                files.download(str(output_file))

    return result
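The writers returned by `get_writer` take care of formatting. To show roughly what ends up in an SRT file, here is a hand-rolled sketch (not Whisper's own writer) that turns the `segments` list of a `model.transcribe` result into numbered SRT cues. It assumes each segment dict carries `start` and `end` times in seconds plus its `text`; the `mock_result` data is made up for illustration.

```python
def format_srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def segments_to_srt(segments):
    """Build SRT text from Whisper-style segments: a cue number, a time range,
    the text, and a blank line between cues."""
    cues = []
    for index, segment in enumerate(segments, start=1):
        start = format_srt_timestamp(segment["start"])
        end = format_srt_timestamp(segment["end"])
        cues.append(f"{index}\n{start} --> {end}\n{segment['text'].strip()}\n")
    return "\n".join(cues)

# Made-up result shaped like the dictionary returned by model.transcribe
mock_result = {"segments": [
    {"start": 0.0, "end": 2.5, "text": " Hello there."},
    {"start": 2.5, "end": 61.04, "text": " This is the second segment."},
]}
print(segments_to_srt(mock_result["segments"]))
```

A VTT file is structurally similar (a `WEBVTT` header, and `.` instead of `,` before the milliseconds), which is why the notebook can offer both formats from the same `result`.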

# 💬 Whisper it!

This block actually calls `transcribe_file` 😉


In [48]:
if input_format == "youtube":
    # Download the audio stream of the YouTube video
    print(f"Downloading audio stream: {file}")
    audio = download_youtube_audio(file)

    # Run Whisper on the audio stream
    result = transcribe_file(model, audio, plain, srt, vtt, tsv, download)
elif input_format == "gdrive":
    # Authorize a connection between Google Drive and Google Colab
    from google.colab import drive
    drive.mount('/content/drive')

    # Run Whisper on the specified file
    result = transcribe_file(model, file, plain, srt, vtt, tsv, download)
elif input_format == "local":
    # Run Whisper on the specified file
    result = transcribe_file(model, file, plain, srt, vtt, tsv, download)
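The `if`/`elif` chain above silently does nothing when `input_format` holds an unexpected value, and a wrong path only surfaces as an error from deep inside Whisper. A pre-flight check could be run first; `check_inputs` below is a hypothetical sketch, not part of the original notebook. It assumes Drive files live under `/content/drive/` (the mount point used above) and that local files have already been uploaded to the Files tab.

```python
from pathlib import Path
from urllib.parse import urlparse

VALID_FORMATS = ("youtube", "gdrive", "local")

def check_inputs(input_format, file):
    """Hypothetical sanity check for the form values; raises on obvious mistakes."""
    if input_format not in VALID_FORMATS:
        raise ValueError(f"input_format must be one of {VALID_FORMATS}, got {input_format!r}")
    if input_format == "youtube":
        host = urlparse(file).netloc.lower()
        if host not in ("youtube.com", "youtu.be") and not host.endswith(".youtube.com"):
            raise ValueError(f"Expected a YouTube URL, got {file!r}")
    elif input_format == "gdrive":
        # Existence can only be checked after drive.mount(...) has run
        if not file.startswith("/content/drive/"):
            raise ValueError("Google Drive paths should start with /content/drive/")
    elif not Path(file).is_file():
        # "local": the file must already be uploaded to the Colab's Files tab
        raise FileNotFoundError(f"No uploaded file at {file!r}")
    return True

print(check_inputs("youtube", "https://www.youtube.com/watch?v=AUDIO"))  # True
```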

Transcribing file: /content/1001_DFA_ANG_XX.wav



100%|██████████| 227/227 [00:00<00:00, 390.95frames/s]


Creating text file
Downloading /content/1001_DFA_ANG_XX.txt

In [None]:
input_format = "local"
file = r"/content/1001_DFA_ANG_XX.wav"
plain = True

In [None]:
# """
# To write this piece of code I took inspiration/code from a lot of places.
# It was late night, so I'm not sure how much I created or just copied o.O
# Here are some of the possible references:
# https://blog.addpipe.com/recording-audio-in-the-browser-using-pure-html5-and-minimal-javascript/
# https://stackoverflow.com/a/18650249
# https://hacks.mozilla.org/2014/06/easy-audio-capture-with-the-mediarecorder-api/
# https://air.ghost.io/recording-to-an-audio-file-using-html5-and-js/
# https://stackoverflow.com/a/49019356
# """
# from IPython.display import HTML, Audio
# from google.colab.output import eval_js
# from base64 import b64decode
# import numpy as np
# from scipy.io.wavfile import read as wav_read
# import io
# import ffmpeg

# AUDIO_HTML = """
# <script>
# var my_div = document.createElement("DIV");
# var my_p = document.createElement("P");
# var my_btn = document.createElement("BUTTON");
# var t = document.createTextNode("Press to start recording");
# my_btn.appendChild(t);
# //my_p.appendChild(my_btn);
# my_div.appendChild(my_btn);
# document.body.appendChild(my_div);
# var base64data = 0;
# var reader;
# var recorder, gumStream;
# var recordButton = my_btn;
# var handleSuccess = function(stream) {
#   gumStream = stream;
#   var options = {
#     //bitsPerSecond: 8000, //chrome seems to ignore, always 48k
#     mimeType : 'audio/webm;codecs=opus'
#     //mimeType : 'audio/webm;codecs=pcm'
#   };
#   //recorder = new MediaRecorder(stream, options);
#   recorder = new MediaRecorder(stream);
#   recorder.ondataavailable = function(e) {
#     var url = URL.createObjectURL(e.data);
#     var preview = document.createElement('audio');
#     preview.controls = true;
#     preview.src = url;
#     document.body.appendChild(preview);
#     reader = new FileReader();
#     reader.readAsDataURL(e.data);
#     reader.onloadend = function() {
#       base64data = reader.result;
#       //console.log("Inside FileReader:" + base64data);
#     }
#   };
#   recorder.start();
#   };
# recordButton.innerText = "Recording... press to stop";
# navigator.mediaDevices.getUserMedia({audio: true}).then(handleSuccess);
# function toggleRecording() {
#   if (recorder && recorder.state == "recording") {
#       recorder.stop();
#       gumStream.getAudioTracks()[0].stop();
#       recordButton.innerText = "Saving the recording... pls wait!"
#   }
# }
# // https://stackoverflow.com/a/951057
# function sleep(ms) {
#   return new Promise(resolve => setTimeout(resolve, ms));
# }
# var data = new Promise(resolve=>{
# //recordButton.addEventListener("click", toggleRecording);
# recordButton.onclick = ()=>{
# toggleRecording()
# sleep(2000).then(() => {
#   // wait 2000ms for the data to be available...
#   // ideally this should use something like await...
#   //console.log("Inside data:" + base64data)
#   resolve(base64data.toString())
# });
# }
# });

# </script>
# """

# def get_audio():
#   display(HTML(AUDIO_HTML))
#   data = eval_js("data")
#   binary = b64decode(data.split(',')[1])

#   process = (ffmpeg
#     .input('pipe:0')
#     .output('pipe:1', format='wav')
#     .run_async(pipe_stdin=True, pipe_stdout=True, pipe_stderr=True, quiet=True, overwrite_output=True)
#   )
#   output, err = process.communicate(input=binary)

#   riff_chunk_size = len(output) - 8
#   # Break up the chunk size into four bytes, held in b.
#   q = riff_chunk_size
#   b = []
#   for i in range(4):
#       q, r = divmod(q, 256)
#       b.append(r)

#   # Replace bytes 4:8 in proc.stdout with the actual size of the RIFF chunk.
#   riff = output[:4] + bytes(b) + output[8:]

#   sr, audio = wav_read(io.BytesIO(riff))

#   return audio, sr
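In `get_audio`, the `divmod` loop splits `riff_chunk_size` into four bytes, least significant first, and splices them into bytes 4:8 of the WAV data, where the RIFF header stores the file size minus 8. Presumably this patch is needed because ffmpeg, writing WAV to a pipe, cannot seek back to fill in that field. The standard library's `struct.pack("<I", ...)` (little-endian unsigned 32-bit) expresses the same encoding in one call; the sketch below uses hypothetical helper names and a toy buffer to show the two agree.

```python
import struct

def riff_size_bytes_loop(size):
    """The byte-splitting loop from get_audio: least significant byte first."""
    b = []
    q = size
    for _ in range(4):
        q, r = divmod(q, 256)
        b.append(r)
    return bytes(b)

def patch_riff_size(wav_bytes):
    """Overwrite bytes 4:8 of a RIFF/WAV buffer with the true chunk size
    (total length minus 8) as a little-endian unsigned 32-bit integer."""
    riff_chunk_size = len(wav_bytes) - 8
    return wav_bytes[:4] + struct.pack("<I", riff_chunk_size) + wav_bytes[8:]

# Toy 20-byte buffer with a placeholder size field (not a playable WAV file)
fake = b"RIFF" + b"\xff\xff\xff\xff" + b"WAVE" + bytes(8)
patched = patch_riff_size(fake)
print(patched[4:8])  # b'\x0c\x00\x00\x00' (20 - 8 = 12)
```

One design difference: `struct.pack` raises `struct.error` if the size does not fit in 32 bits, whereas the loop silently drops the higher bytes.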

In [None]:
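# from scipy.io.wavfile import write as wav_write
# audio, sample_rate = get_audio()
# # Save the recorded audio as a WAV file (writing the raw sample array with
# # f.write would produce a file without a WAV header)
# wav_write("recorded_audio.wav", sample_rate, audio)

# # You can also play the recorded audio
# Audio(data=audio, rate=sample_rate)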

In [49]:
!pip install transformers

Collecting transformers
  Downloading transformers-4.34.1-py3-none-any.whl (7.7 MB)
Collecting huggingface-hub<1.0,>=0.16.4 (from transformers)
  Downloading huggingface_hub-0.18.0-py3-none-any.whl (301 kB)
Collecting tokenizers<0.15,>=0.14 (from transformers)
  Downloading tokenizers-0.14.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (3.8 MB)
Collecting safetensors>=0.3.1 (from transformers)
  Downloading safetensors-0.4.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
Co

In [56]:
from transformers import pipeline

# Set up the inference pipeline using a model from the 🤗 Hub
sentiment_analysis = pipeline(model="finiteautomata/bertweet-base-sentiment-analysis")

# `text` must already hold the transcript: run the cell that reads the .txt
# file (or the one that sets a sample string) before this one
# Split the text into sentences using periods as delimiters, skipping empty
# pieces such as the one left after a trailing period
sentences = [sentence for sentence in text.split(".") if sentence.strip()]

# Initialize lists to store sentiment and confidence for each sentence
sentence_sentiments = []
sentence_confidences = []

# Predict the sentiment for each sentence
for sentence in sentences:
    # The pipeline returns a list holding one {"label": ..., "score": ...} dict;
    # naming it `prediction` avoids overwriting the Whisper `result` from earlier
    prediction = sentiment_analysis(sentence)[0]

    # Append the sentiment label (e.g. POS or NEG) and its confidence to the lists
    sentence_sentiments.append(prediction["label"])
    sentence_confidences.append(prediction["score"])

# Print the sentiment and confidence for each sentence
for i, sentence in enumerate(sentences):
    print(f"Sentence {i + 1}:")
    print(f"Text: {sentence}")
    print(f"Sentiment: {sentence_sentiments[i]}")
    print(f"Confidence: {sentence_confidences[i]}")
    print("\n")

emoji is not installed, thus not converting emoticons or emojis into text. Install emoji: pip3 install emoji==0.6.0


Sentence 1:
Text: I dont want to go the station but i am forced to do
Sentiment: NEG
Confidence: 0.7604385018348694


Sentence 2:
Text:  I loved icecream near the station
Sentiment: POS
Confidence: 0.9919430017471313


In [50]:
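# Read the transcript written by transcribe_file (naming the handle `f`
# avoids overwriting the `file` value set in the form above)
with open("/content/1001_DFA_ANG_XX.txt", "r", encoding="utf-8") as f:
    text = f.read()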

In [52]:
text = "I dont want to go the station but i am forced to do. I loved icecream near the station"
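Splitting on `"."` alone leaves an empty final piece when the text ends with a period, ignores `!` and `?`, and would also cut decimals like `3.14`. A slightly more careful splitter is sketched below; `split_sentences` is a hypothetical helper that splits only after `.`, `!` or `?` followed by whitespace. It still mis-splits abbreviations such as "Dr. Smith", so treat it as a heuristic rather than a real sentence tokenizer.

```python
import re

def split_sentences(text):
    """Split after '.', '!' or '?' when followed by whitespace; drop empty pieces.
    Punctuation stays attached to its sentence."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]

sample = "I dont want to go the station but i am forced to do. I loved icecream near the station"
print(split_sentences(sample))
# ['I dont want to go the station but i am forced to do.', 'I loved icecream near the station']
```

The resulting list could stand in for `text.split(".")` in the sentiment cell. The 🤗 `pipeline` also accepts a list of strings and returns one prediction dict per input, so the per-sentence loop could become a single call.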