# Whisper Package Installation

## Installation and force-update to the latest version via pip

In [2]:
# Installing all whisperAI requirements in colab env

!pip install git+https://github.com/openai/whisper.git
!pip install setuptools-rust
!pip install --upgrade --no-deps --force-reinstall git+https://github.com/openai/whisper.git


Collecting git+https://github.com/openai/whisper.git
  Cloning https://github.com/openai/whisper.git to /tmp/pip-req-build-qx9226nx
  Running command git clone --filter=blob:none --quiet https://github.com/openai/whisper.git /tmp/pip-req-build-qx9226nx
  Resolved https://github.com/openai/whisper.git to commit 517a43ecd132a2089d85f4ebc044728a71d49f6e
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting tiktoken (from openai-whisper==20240930)
  Downloading tiktoken-0.9.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.7 kB)
Collecting nvidia-cuda-nvrtc-cu12==12.4.127 (from torch->openai-whisper==20240930)
  Downloading nvidia_cuda_nvrtc_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl.metadata (1.5 kB)
Collecting nvidia-cuda-runtime-cu12==12.4.127 (from torch->openai-whisper==20240930)
  Downloading nvidia_cuda_runtime_cu12-12.4.127-py3-

# Running Whisper

- Run the following cells in order and supply the requested inputs

## Running via code



### Importing Whisper

In [6]:
import whisper

### Model Selection

- Enter the number (1-6) of the model you want to use, counting the rows of the following table from top to bottom

|  Size  | Parameters | English-only model | Multilingual model | Required VRAM | Relative speed |
|:------:|:----------:|:------------------:|:------------------:|:-------------:|:--------------:|
|  tiny  |    39 M    |     `tiny.en`      |       `tiny`       |     ~1 GB     |      ~10x      |
|  base  |    74 M    |     `base.en`      |       `base`       |     ~1 GB     |      ~7x       |
| small  |   244 M    |     `small.en`     |      `small`       |     ~2 GB     |      ~4x       |
| medium |   769 M    |    `medium.en`     |      `medium`      |     ~5 GB     |      ~2x       |
| large  |   1550 M   |        N/A         |      `large`       |    ~10 GB     |       1x       |
| turbo  |   809 M    |        N/A         |      `turbo`       |     ~6 GB     |      ~8x       |


In [None]:
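models_dict = {1: 'tiny', 2: 'base', 3: 'small', 4: 'medium', 5: 'large', 6: 'turbo'}

selected_model = input('Enter your intended model number based on the table above (1-6): ').strip()

# Accept only a number that maps to a model in the table
if selected_model.isdecimal() and int(selected_model) in models_dict:
    selected_model = models_dict[int(selected_model)]
else:
    print('Invalid input, defaulting to tiny')
    selected_model = 'tiny'

# Load the chosen model; the transcription cell further down calls model.transcribe
model = whisper.load_model(selected_model)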
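The selection logic in the cell above can also be factored into a small helper, which makes it testable without calling `input`. The following is only a sketch and not part of the original notebook: the function name `resolve_model_name` and its `english_only` and `default` parameters are hypothetical, and the `.en` names are taken from the English-only column of the table.

```python
# Model names taken from the table above; the helper itself is a hypothetical sketch.
MODELS = {1: "tiny", 2: "base", 3: "small", 4: "medium", 5: "large", 6: "turbo"}
# Sizes that the table lists with an English-only (.en) variant
ENGLISH_ONLY_SIZES = {"tiny", "base", "small", "medium"}


def resolve_model_name(choice: str, english_only: bool = False, default: str = "tiny") -> str:
    """Map a menu choice such as "3" to a Whisper model name."""
    choice = choice.strip()
    if not (choice.isdecimal() and int(choice) in MODELS):
        return default  # anything outside 1-6 falls back to the default
    name = MODELS[int(choice)]
    if english_only and name in ENGLISH_ONLY_SIZES:
        name += ".en"  # large and turbo have no English-only variant
    return name


print(resolve_model_name("3"))                     # small
print(resolve_model_name("4", english_only=True))  # medium.en
print(resolve_model_name("9"))                     # tiny
```

The returned string can be passed to `whisper.load_model`.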

### Video/Audio File Passing

In [None]:
media_file_path = input('Your media file rel/abs path: ')

### Getting the result

- Running this cell prints the raw text of the generated transcript. Transcript files in the other formats are written to the current directory when you run Whisper from the command line (see the next section)

In [None]:
result = model.transcribe(media_file_path)
print(result["text"])
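Besides `result["text"]`, the dictionary returned by `model.transcribe` carries a `segments` list whose entries include `start` and `end` times in seconds and the segment `text`. As a minimal sketch of turning those segments into subtitle text, the helpers below build an SRT-style string. The names `format_timestamp` and `segments_to_srt` are hypothetical, and the sample data is a stand-in shaped like `result["segments"]`, not real model output.

```python
def format_timestamp(seconds: float) -> str:
    """Render seconds as an SRT timestamp, HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments) -> str:
    """Build SRT subtitle text from a list of segment dicts."""
    blocks = []
    for index, segment in enumerate(segments, start=1):
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{segment['text'].strip()}\n")
    return "\n".join(blocks)


# Stand-in data shaped like result["segments"]; not real model output
sample_segments = [
    {"start": 0.0, "end": 2.5, "text": " Hello there."},
    {"start": 2.5, "end": 61.04, "text": " This is a test."},
]
print(segments_to_srt(sample_segments))
```

In the notebook, `segments_to_srt(result["segments"])` would apply the same formatting to an actual transcription.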

## Running via command-line

- Run the command that matches your use case, in the formats shown below, adding the relevant flags

The following command will transcribe speech in audio files, using the `turbo` model:

    whisper audio.flac audio.mp3 audio.wav --model turbo

```Bash
whisper [<file-1 path> <file-2 path> <file-3 path>] --model <model-name based on given table>
```

The default setting (which selects the `turbo` model) works well for transcribing **English**. To transcribe an audio file containing non-English speech, you can specify the language using the `--language` option:

    whisper japanese.wav --language Japanese

```Bash
whisper <file path> --model <model-name based on given table> --language <language name from the chart below>
```

Adding `--task translate` will translate the speech into English:

    whisper japanese.wav --language Japanese --task translate

Run the following to view all available options:

    whisper --help

See [tokenizer.py](https://github.com/openai/whisper/blob/main/whisper/tokenizer.py) for the list of all available languages.
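When the command line is driven from a script rather than typed by hand, the same invocations can be assembled as an argument list. The sketch below is illustrative only: `build_whisper_command` is a hypothetical helper, and it uses just the `--model`, `--language`, and `--task` flags shown above.

```python
import shlex


def build_whisper_command(files, model="turbo", language=None, task=None):
    """Assemble the argument list for the whisper CLI (hypothetical helper)."""
    command = ["whisper", *files, "--model", model]
    if language is not None:
        command += ["--language", language]
    if task is not None:
        command += ["--task", task]
    return command


command = build_whisper_command(["japanese.wav"], language="Japanese", task="translate")
print(shlex.join(command))
# To execute it: subprocess.run(command, check=True)
```

Passing the list to `subprocess.run` avoids shell quoting problems with file names that contain spaces.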

### Languages

Whisper's performance varies widely depending on the language. The figure below shows a performance breakdown of the `large-v3` and `large-v2` models by language, using WERs (word error rates) or CERs (character error rates, shown in *italic*) evaluated on the Common Voice 15 and Fleurs datasets. Additional WER/CER metrics for the other models and datasets can be found in Appendices D.1, D.2, and D.4 of [the paper](https://arxiv.org/abs/2212.04356), and the BLEU (Bilingual Evaluation Understudy) scores for translation are in Appendix D.3.

![WER breakdown by language](https://github.com/openai/whisper/assets/266841/f4619d66-1058-4005-8f67-a9d811b77c62)
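For readers unfamiliar with the metrics, WER is the word-level edit distance between a reference transcript and the model output, divided by the number of reference words; CER is the same ratio computed over characters. The sketch below shows that calculation under simplifying assumptions: the function names are hypothetical, the reference is assumed to be non-empty, and no text normalization is applied, so its numbers are not comparable with the figure.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two sequences."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_item in enumerate(reference, start=1):
        current = [i]
        for j, hyp_item in enumerate(hypothesis, start=1):
            cost = 0 if ref_item == hyp_item else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edits divided by reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: character-level edits divided by reference length."""
    return edit_distance(reference, hypothesis) / len(reference)


# One dropped word out of six reference words
print(wer("the cat sat on the mat", "the cat sat on mat"))
print(cer("kitten", "sitting"))  # 0.5
```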

In [None]:
!whisper english-sample.wav --language en