This notebook shows how to install and use Whisper, the speech recognition tool developed by OpenAI. For this notebook I use Python 3.9.9 on Windows.

## OpenAI - Whisper
As defined by the [official documentation of Whisper](https://github.com/openai/whisper): "Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multitasking model that can perform multilingual speech recognition, speech translation, and language identification."

### 1.1 Required Libraries before Whisper Installation
You will need two complementary libraries: [ffmpeg](https://ffmpeg.org/download.html) and [PyTorch](https://pytorch.org/).<br>
To install PyTorch, check the official webpage to identify the install command that best suits your system. In my case I'm using:<br>
`pip install torch torchvision torchaudio` <br>

To install ffmpeg, I downloaded it manually, since I had some issues installing it with the GitHub cloning method. Here's a [webpage](https://phoenixnap.com/kb/ffmpeg-windows) that explains the process step by step.<br>
*Please be sure to add ffmpeg to the environment variables in Windows.
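
A quick way to confirm that ffmpeg is reachable from Python is to look it up on the PATH. This is a minimal sketch that uses only the standard library; the helper name `ffmpeg_on_path` is hypothetical and not part of Whisper.

```python
import shutil


def ffmpeg_on_path(executable="ffmpeg"):
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(executable)


location = ffmpeg_on_path()
if location is None:
    print("ffmpeg was not found; check the PATH environment variable.")
else:
    print(f"ffmpeg found at: {location}")
```

If the message says ffmpeg was not found right after editing the environment variables, restarting the terminal or the notebook kernel is usually needed so the new PATH is picked up.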

The official OpenAI documentation also recommends installing [Hugging Face Transformers](https://huggingface.co/docs/transformers/installation).

### 1.2 Whisper Installation
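Install the latest release of Whisper with:<br>
`pip install -U openai-whisper`<br>
Alternatively, this command installs the latest commit from the GitHub repository:<br>
`pip install git+https://github.com/openai/whisper.git` <br>
If the installation fails with an error about a missing `setuptools_rust` module, also run:<br>
`pip install setuptools-rust` <br>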
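
To check that the packages ended up in the active environment, the installed versions can be queried from Python. This is a small sketch built on the standard `importlib.metadata` module; `installed_version` is a hypothetical helper name.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version of a pip distribution, or None if missing."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Distribution names as used in the pip commands above
for name in ("openai-whisper", "torch"):
    version = installed_version(name)
    print(f"{name}: {version if version else 'not installed'}")
```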

### 1.3 Whisper model

Here's the information on the model parameters that you'll need in order to choose a model: <br>
<img src="Model.PNG" alt="Modelparameters" width="500" height="150">
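
As a rough illustration of how that choice can be made in code, the sketch below picks the largest model that fits in a given amount of GPU memory. The memory figures are the approximate values listed in the Whisper README (an assumption here, since the table above is an image and the README may change), and `pick_model` is a hypothetical helper, not part of Whisper.

```python
# Approximate VRAM needed per model, in GB, as listed in the Whisper README
# (assumed values; check the README for the current table).
REQUIRED_VRAM_GB = {
    "tiny": 1,
    "base": 1,
    "small": 2,
    "medium": 5,
    "large": 10,
}


def pick_model(available_gb):
    """Return the largest model whose approximate VRAM need fits, or None."""
    best = None
    # The dictionary is ordered from the smallest to the largest model
    for name, required in REQUIRED_VRAM_GB.items():
        if required <= available_gb:
            best = name
    return best


print(pick_model(6))
```

The returned name can be passed straight to `whisper.load_model`. Larger models are generally more accurate but slower, so on a CPU a smaller model may be the practical choice even when memory allows more.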

In [12]:
import whisper
model = whisper.load_model("medium")
result = model.transcribe("prueba2.mp3", task="transcribe", language="es")
print(result["text"])



 She lost almost six decades when the woman's cosy was open and shameless. That's why, along with her acting title, people always used to add the term sexual symbol. One of the films that made her famous was 1 Million Years Before Christ, in 1966. Her daughters were few, and as she herself recalled, her work was limited to running in a dress that barely covered her body and the middle of dinosaurs. That image would define her for decades. In 1966, she also starred in another adventure film, Fantastic Journey, which would later become a popular television series. Her beauty was undeniable, but she also wanted to be valued for her artistic talent.
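
Besides `"text"`, the dictionary returned by `model.transcribe` also holds a `"segments"` list whose entries carry `"start"`, `"end"` and `"text"` keys. The sketch below prints such segments with timestamps. The sample data is made up for illustration and the helper names are hypothetical.

```python
def format_timestamp(seconds):
    """Format a number of seconds as HH:MM:SS.mmm."""
    milliseconds = round(seconds * 1000)
    hours, milliseconds = divmod(milliseconds, 3_600_000)
    minutes, milliseconds = divmod(milliseconds, 60_000)
    secs, milliseconds = divmod(milliseconds, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{milliseconds:03d}"


def format_segments(segments):
    """Return one line per segment in the form: [start --> end] text."""
    lines = []
    for segment in segments:
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        lines.append(f"[{start} --> {end}] {segment['text'].strip()}")
    return "\n".join(lines)


# Made-up segments with the same keys Whisper uses; in this notebook you
# would pass result["segments"] instead.
sample_segments = [
    {"start": 0.0, "end": 3.5, "text": " First sentence."},
    {"start": 3.5, "end": 65.25, "text": " Second sentence."},
]
print(format_segments(sample_segments))
```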


Next, we will run the same process through Whisper's lower-level API.

In [16]:
import whisper

model = whisper.load_model("medium")

# load audio and pad/trim it to fit 30 seconds
audio = whisper.load_audio("prueba2.mp3")
audio = whisper.pad_or_trim(audio)

# make log-Mel spectrogram and move to the same device as the model
mel = whisper.log_mel_spectrogram(audio).to(model.device)

# detect the spoken language
_, probs = model.detect_language(mel)
print(f"Detected language: {max(probs, key=probs.get)}", probs)

# decode the audio; FP16 is not supported on my CPU, so fp16 is set to False
options = whisper.DecodingOptions(fp16=False, language="en")
result = whisper.decode(model, mel, options)

# print the recognized text
print(result.text)

Detected language: es {'en': 0.06431033462285995, 'zh': 0.0006753414054401219, 'de': 0.0013966881670057774, 'es': 0.9017177224159241, 'ru': 0.0007992238970473409, 'ko': 0.0027548845391720533, 'fr': 0.0007725052419118583, 'ja': 0.0011740651680156589, 'pt': 0.011564787477254868, 'tr': 0.00025775248650461435, 'pl': 0.0005445817369036376, 'ca': 6.362856220221147e-05, 'nl': 0.0004048972041346133, 'ar': 0.0004656464443542063, 'sv': 8.011633326532319e-05, 'it': 0.003932875115424395, 'id': 0.00032589369220659137, 'hi': 5.311307904776186e-05, 'fi': 0.00017199188005179167, 'vi': 4.194464418105781e-05, 'he': 2.66999049927108e-05, 'uk': 3.3463707950431854e-05, 'el': 0.00035039594513364136, 'ms': 0.00011051067849621177, 'cs': 1.6082787624327466e-05, 'ro': 9.427577606402338e-05, 'da': 6.9617663029930554e-06, 'hu': 3.656509215943515e-05, 'ta': 5.239813617663458e-06, 'no': 6.100636710471008e-06, 'th': 6.347486487356946e-05, 'ur': 4.1984905692515895e-05, 'hr': 3.4916938602691516e-06, 'bg': 2.3265417894
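
The full probability dictionary is long, so it can be easier to read only the most likely languages. Here is a minimal sketch, assuming `probs` is a plain dictionary mapping language codes to probabilities as in the output above; `top_languages` is a hypothetical helper.

```python
def top_languages(probs, k=3):
    """Return the k most likely (language, probability) pairs, highest first."""
    return sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]


# A few values copied (and rounded) from the output above
sample_probs = {"en": 0.0643, "es": 0.9017, "pt": 0.0116, "it": 0.0039, "ko": 0.0028}
for language, probability in top_languages(sample_probs):
    print(f"{language}: {probability:.2%}")
```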

There's also a pretty useful library if you want to transcribe audio from YouTube: pytube. You can install it with: <br>
`pip install pytube`<br>
Let's see how it can be done:

In [20]:
import pytube
test = pytube.YouTube('https://www.youtube.com/watch?v=7MI3buZedOw')
audio = test.streams.get_audio_only()
audio.download(filename='testyoutube.mp4')
result = model.transcribe('testyoutube.mp4')
with open("transcriptionyotube.txt", "w", encoding="utf-8") as txt:
    txt.write(result["text"])



 I Ain't that cool They'll be hanging me quick when I'm back from the dead Get the rope, get the rope, get the rope, get the rope I'm a punk rock hit, I came from hell with the curse She tried to play it away, so I fucked her in church Don't you know, don't you know, don't you know Yeah, don't you know They say, you're not safe here if I stay With a knife that's sharp, no way No, I need the broken hearted Oh no, look at the mess we started Oh no, I need the broken hearts this way What you say, you wanna play Yeah, don't be fooled, I'm only letting you down They pursued me on foot, but I hid in a crowd Like a ghost, like a ghost, like a ghost, like a ghost The seats of my car filled with cigarette burns I gotta burn my eye, a little blood on my shirt Let's hit the road, hit the road, hit the road, hit the road I know, there's a special place in hell That my friends and I know well There's a perfect place to go When it's time to lose control No, I need the broken hearted Oh no, look at t
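
The transcription comes back as one long line, which is awkward to read in a text file. One option is to wrap it before saving. This sketch uses the standard `textwrap` module, and `wrap_transcript` is a hypothetical helper name.

```python
import textwrap


def wrap_transcript(text, width=80):
    """Remove surrounding whitespace and wrap the text to `width` columns."""
    return textwrap.fill(text.strip(), width=width)


# Short excerpt from the output above, including its leading space
sample_text = " Like a ghost, like a ghost, like a ghost, like a ghost"
print(wrap_transcript(sample_text, width=30))
```

In the cell above, the same helper could be applied to `result["text"]` just before calling `txt.write`.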

In [130]:
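import subprocess


def convert_web_mp4_subprocess(input_file, output_file):
    """Convert a video file (for example .webm) to another format with ffmpeg."""
    # Adapted from the YouTube video: https://www.youtube.com/watch?v=W7SIRsIAYao
    # Pass the arguments as a list so that paths containing spaces work
    command = ["ffmpeg", "-i", input_file, output_file]
    try:
        # check=True raises CalledProcessError when ffmpeg exits with an error
        subprocess.run(command, check=True)
    except (subprocess.CalledProcessError, FileNotFoundError) as error:
        print(f"ffmpeg conversion failed: {error}")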


In [131]:
convert_web_mp4_subprocess(r"C:\Users\ccuartas\Downloads\Notebooks\OpenAI_Whisper\output.webm",
                           r"C:\Users\ccuartas\Downloads\Notebooks\OpenAI_Whisper\test_output.mp4")


In [5]:
import os
import subprocess


def convert_video_to_audio_ffmpeg(video_file, output_ext="mp3"):
    """Convert a video file to audio by calling the `ffmpeg` command
    through the subprocess module."""
    # Adapted from: https://www.thepythoncode.com/article/extract-audio-from-video-in-python
    filename, ext = os.path.splitext(video_file)
    # "-y" overwrites the output file if it already exists; ffmpeg's output is silenced
    subprocess.call(["ffmpeg", "-y", "-i", video_file, f"{filename}.{output_ext}"],
                    stdout=subprocess.DEVNULL,
                    stderr=subprocess.STDOUT)
    


In [7]:
convert_video_to_audio_ffmpeg("testmp4.mp4")
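
Since Whisper resamples audio to 16 kHz mono when it loads a file, the extraction step can ask ffmpeg for that format directly. The sketch below only builds the command and does not run it. It relies on the ffmpeg options `-vn`, `-ac` and `-ar`, and `build_audio_extract_command` is a hypothetical helper name.

```python
from pathlib import Path


def build_audio_extract_command(video_file, output_ext="wav", sample_rate=16000):
    """Build (but do not run) an ffmpeg command that extracts mono audio."""
    output_file = str(Path(video_file).with_suffix(f".{output_ext}"))
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it exists
        "-i", video_file,         # input video
        "-vn",                    # drop the video stream
        "-ac", "1",               # one audio channel (mono)
        "-ar", str(sample_rate),  # audio sample rate in Hz
        output_file,
    ]


command = build_audio_extract_command("testmp4.mp4")
print(" ".join(command))
```

The resulting list can be handed to `subprocess.run(command, check=True)` to perform the conversion.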