# OpenAI's Whisper
Whisper is a general-purpose speech recognition model.
It can transcribe multilingual video and audio, and also translate the speech into English.

Read [the documentation](https://github.com/openai/whisper).

### Step 1 - install Whisper and required libraries
Whisper is a Python library that can be installed with `pip`. It requires [ffmpeg](https://ffmpeg.org/), a tool for manipulating media files, which is used here to extract the audio from videos before it is passed to Whisper. ffmpeg is installed at the operating-system level: the `apt` command below works on Debian/Ubuntu-based Linux systems, which includes many hosted notebook environments. On macOS you would typically use a package manager such as Homebrew instead.

In [None]:
!pip install openai-whisper

In [None]:
!apt install ffmpeg

In [None]:
# call the help method to validate installation and see the available commands and options
!whisper --help
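
If you prefer to check the installation from Python rather than by reading the help output, the following is a minimal sketch. It only uses the standard library's `shutil.which`; the helper name `find_tools` is made up for this example.

```python
import shutil

def find_tools(names):
    """Map each tool name to its full path on PATH, or None if it is missing."""
    return {name: shutil.which(name) for name in names}

# ffmpeg and whisper are the two executables this notebook relies on.
for tool, path in find_tools(["ffmpeg", "whisper"]).items():
    print(f"{tool}: {path or 'NOT FOUND'}")
```

If either tool is reported as not found, re-run the install cells above before continuing.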

### Step 2 - make your video/audio available to the notebook
As an example, we're downloading a copy of a ~3 min video that is available on YouTube at https://www.youtube.com/watch?v=c2ncxbXpHjs .

In [None]:
!wget https://bellingcat-embeds.ams3.cdn.digitaloceanspaces.com/open-source-research-notebooks/hannah-arendt.webm

### Step 3 - transcribe the video
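The video is an interview in German. Whisper's default `--task` is `transcribe`, so calling it without specifying a task produces a transcript in the original language, here German. Whisper detects the spoken language automatically unless you set it with `--language`.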

We need to specify which model Whisper should use; please see the [available models](https://github.com/openai/whisper#available-models-and-languages). They differ in size and in the resources they need. At the time of writing they are `tiny`, `base`, `small`, `medium`, and `large`. `tiny` is the smallest: it requires about 1 GB of memory and is roughly 32 times faster than `large`, which requires about 10 GB. The results of `tiny` may not be good enough for your use case, though. You can find the sweet spot for your type of audio by testing the smaller models first and moving up only if the quality is insufficient. For this video, the `small` model works well enough. `large` may simply be too large for some free notebook environments.
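
As a rough illustration of that trade-off, the sketch below picks the largest model that fits a given memory budget. The figures are the approximate requirements listed in the Whisper README at the time of writing and should be treated as assumptions that may change; `MODEL_MEMORY_GB` and `pick_model` are names invented for this example and are not part of Whisper.

```python
# Approximate memory needs in GB (assumed from the Whisper README at the time of writing).
# Treat these as rough guides, not guarantees.
MODEL_MEMORY_GB = [
    ("tiny", 1),
    ("base", 1),
    ("small", 2),
    ("medium", 5),
    ("large", 10),
]

def pick_model(budget_gb):
    """Return the largest model whose approximate requirement fits the budget, or None."""
    fitting = [name for name, need in MODEL_MEMORY_GB if need <= budget_gb]
    return fitting[-1] if fitting else None

print(pick_model(4))
```

Fitting in memory says nothing about transcription quality, so it is still worth comparing the output of two neighbouring sizes on a short clip.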

The following command transcribes the video with `--model small`.

The first time you call Whisper with a new model, the model is downloaded to your environment, so the first run with each model always takes longer.

How long the transcription or translation takes depends on the model, on your system resources, and on the length of the audio.

In [None]:
!whisper hannah-arendt.webm --model small
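
The same transcription can also be run from Python, which is convenient if you want to work with the timestamped segments directly. This is a minimal sketch, assuming `openai-whisper` and ffmpeg are installed as in Step 1. `whisper.load_model` and `model.transcribe` are part of the library; `format_timestamp`, `format_segments` and `transcribe_file` are helper names made up for this example.

```python
def format_timestamp(seconds):
    """Format a number of seconds as HH:MM:SS (fractions are dropped)."""
    total = int(seconds)
    hours, rest = divmod(total, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"

def format_segments(segments):
    """Turn Whisper-style segment dicts into '[start -> end] text' lines."""
    return [
        f"[{format_timestamp(s['start'])} -> {format_timestamp(s['end'])}] {s['text'].strip()}"
        for s in segments
    ]

def transcribe_file(path, model_name="small", task="transcribe"):
    """Run Whisper from Python and return its result dict ('text', 'segments', 'language')."""
    import whisper  # imported here so the helpers above work without the package
    model = whisper.load_model(model_name)
    return model.transcribe(path, task=task)

# Usage (uncomment in a notebook where openai-whisper and ffmpeg are installed):
# result = transcribe_file("hannah-arendt.webm")
# print(result["language"])
# print("\n".join(format_segments(result["segments"])))
```

The usage lines are commented out because they download the model and can take a while to run.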

### Step 4 - translate the video

If the original audio is not in English and you want it translated into English, you can set `--task` to `translate`.

If you don't instruct the tool to write a specific format, it creates one result file for each available format, which includes `.txt`, `.srt` (subtitles), `.json` and a few others. You can ask for a single one, such as plain text, with `--output_format txt` (see the help and documentation for more info). Output files are named after the input file, so this run overwrites the `hannah-arendt.txt` written in Step 3; use `--output_dir` if you want to keep both.

In [None]:
!whisper hannah-arendt.webm --model small --task translate --output_format txt
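
If you have several files to process, it can help to build the command in Python instead of typing it for each file. The sketch below only assembles and prints the command; the function name `build_whisper_command` is hypothetical, while the flags are the ones used in this notebook plus `--language`.

```python
import shlex

def build_whisper_command(path, model="small", task="transcribe",
                          output_format=None, language=None):
    """Build the argument list for the whisper CLI without running it."""
    cmd = ["whisper", path, "--model", model, "--task", task]
    if output_format:
        cmd += ["--output_format", output_format]
    if language:
        cmd += ["--language", language]
    return cmd

files = ["hannah-arendt.webm"]  # add more file names here
for f in files:
    cmd = build_whisper_command(f, task="translate", output_format="txt")
    print(shlex.join(cmd))
    # To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list to `subprocess.run` avoids quoting problems with file names that contain spaces.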

You can download the resulting files and use/analyse them.

Let's see the contents of `hannah-arendt.txt` with `cat`:

In [None]:
!cat hannah-arendt.txt
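
For analysis that needs timestamps, the `.srt` output is often more useful than the plain text. Below is a small sketch of a parser for the standard SRT layout (index line, `start --> end` line, caption text, blank line). `parse_srt` is a name invented for this example, and the sample captions are made up rather than taken from the video.

```python
import re

# Matches an SRT timing line such as "00:00:00,000 --> 00:00:04,000".
TIME_LINE = re.compile(r"(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})")

def parse_srt(text):
    """Parse SRT text into a list of (start, end, caption) tuples."""
    entries = []
    # Subtitle blocks are separated by blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            match = TIME_LINE.search(line)
            if match:
                caption = " ".join(l.strip() for l in lines[i + 1:])
                entries.append((match.group(1), match.group(2), caption))
                break
    return entries

# Made-up sample in SRT layout, used only to demonstrate the parser.
sample = """1
00:00:00,000 --> 00:00:04,000
First caption

2
00:00:04,000 --> 00:00:07,500
Second caption
"""
for start, end, caption in parse_srt(sample):
    print(start, end, caption)
```

To use it on a real result, read a file such as `hannah-arendt.srt` (assuming a run that wrote all formats, as in Step 3) with `open(...).read()` and pass the text to `parse_srt`.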