# Transcription Notebook

This notebook transcribes audio files with Whisper. The transcription can be exported as a plain-text (.txt) file or
 as a semicolon-separated .csv file.


Libraries that need to be installed:
- [Whisper](https://github.com/openai/whisper)

Library dependencies:
- [ffmpeg](https://ffmpeg.org/)

Detailed step-by-step instructions for installing the required libraries can be found in the
 [README.md](https://github.com/LeonMunz/OAI_Whisper#readme).
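
Before running the cells below, it can help to confirm that the `ffmpeg` command-line tool is reachable from the notebook's environment. The following is a minimal sketch using only the standard library; the helper name `find_missing_tools` is an assumption made for illustration and is not part of Whisper or of this repository.

```python
import shutil


def find_missing_tools(tools=("ffmpeg",)):
    """Return the names of command-line tools that are not found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = find_missing_tools()
if missing:
    print("Missing command-line tools:", ", ".join(missing))
else:
    print("All required command-line tools were found.")
```

If `ffmpeg` is reported as missing, the installation steps in the README are the place to look.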

In [42]:
import os
import whisper
import pandas as pd

#### 1️⃣ Path details
In this section, the path and the file name can be specified. The default path is "audio/...".

In [43]:
# Filename
file_name = 'test_audio.wav'

# Path to audio file
audio = os.path.join('audio', file_name)
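
To see which files are available before setting `file_name`, the audio folder can be listed first. This is a minimal sketch: the helper name `list_audio_files` and the list of extensions are assumptions chosen for illustration, not a statement about which formats are supported.

```python
import os

# Assumed set of extensions for this sketch; adjust as needed.
AUDIO_EXTENSIONS = ('.wav', '.mp3', '.m4a', '.flac')


def list_audio_files(folder):
    """Return the sorted names of audio files in `folder` (empty list if it does not exist)."""
    if not os.path.isdir(folder):
        return []
    return sorted(
        name for name in os.listdir(folder)
        if name.lower().endswith(AUDIO_EXTENSIONS)
    )


print(list_audio_files('audio'))
```

An empty list means that either the folder is missing or it contains no file with one of the listed extensions.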

#### 2️⃣ Select model:
The Whisper model to be used can be selected here. Results may differ between models.
A simple overview of the results of the different models can be found in the
[README.md](https://github.com/LeonMunz/OAI_Whisper#readme).


There are five model sizes, four with English-only versions, offering speed and accuracy tradeoffs. Below are the names of the available models and their approximate memory requirements and relative speed.


|  Size  | Parameters | English-only model | Multilingual model | Required VRAM | Relative speed |
|:------:|:----------:|:------------------:|:------------------:|:-------------:|:--------------:|
|  tiny  |    39 M    |     `tiny.en`      |       `tiny`       |     ~1 GB     |      ~32x      |
|  base  |    74 M    |     `base.en`      |       `base`       |     ~1 GB     |      ~16x      |
| small  |   244 M    |     `small.en`     |      `small`       |     ~2 GB     |      ~6x       |
| medium |   769 M    |    `medium.en`     |      `medium`      |     ~5 GB     |      ~2x       |
| large  |   1550 M   |        N/A         |      `large`       |    ~10 GB     |       1x       |

In [44]:
# Choose model size
model_size = 'medium'

# Load model
model = whisper.load_model(model_size)
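
The naming scheme in the table above (a `.en` suffix for English-only models, none for `large`) can be captured in a small helper. This is only a sketch; the function `model_name` is hypothetical and not part of Whisper. Its return value is meant to be passed to `whisper.load_model` as in the cell above.

```python
# Sizes taken from the table above.
MODEL_SIZES = ('tiny', 'base', 'small', 'medium', 'large')


def model_name(size, english_only=False):
    """Build a model name such as 'medium' or 'medium.en' from the table above."""
    if size not in MODEL_SIZES:
        raise ValueError('Unknown model size: "{}"'.format(size))
    if english_only:
        if size == 'large':
            # According to the table, there is no English-only large model.
            raise ValueError('There is no English-only version of the large model.')
        return size + '.en'
    return size


print(model_name('medium'))
print(model_name('small', english_only=True))
```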

#### 3️⃣ Read audio file:

If this warning appears during execution, it can be ignored.

```
UserWarning: FP16 is not supported on CPU; using FP32 instead
  warnings.warn("FP16 is not supported on CPU; using FP32 instead")
```
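
If the warning is distracting, it can also be hidden with Python's standard `warnings` module. The sketch below filters only messages that start with the text shown above; the helper name `silence_fp16_warning` is an assumption made for illustration.

```python
import warnings


def silence_fp16_warning():
    """Ignore only the FP16-on-CPU UserWarning; all other warnings stay visible."""
    warnings.filterwarnings(
        'ignore',
        message='FP16 is not supported on CPU',  # matched against the start of the message
        category=UserWarning,
    )


silence_fp16_warning()
```

The filter has to be installed before the transcription cell runs. Passing `fp16=False` to `model.transcribe` is, to my knowledge, another way to avoid the warning on CPU, but that is an assumption worth checking against the Whisper version in use.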

In [45]:
# Read the entire file and transcribe the audio
res = model.transcribe(audio)



#### 4️⃣ Generate Output:
First, a DataFrame is created so that the transcription can be checked within the notebook. Each row gives the start and end
time of an extracted segment in seconds, together with the extracted text.

Second, the output format can be selected: either a .txt file or a semicolon-separated .csv file.

In [46]:
# Creating a DataFrame from transcription data
df = pd.DataFrame(res['segments'])
df = df.drop(['id', 'seek', 'tokens', 'temperature', 'avg_logprob', 'compression_ratio', 'no_speech_prob'], axis=1)
pd.set_option('max_colwidth', None)
df

Unnamed: 0,start,end,text
0,0.0,5.0,Beim Nachrichtendienst Twitter dachte sich wohl so mancher bei dem Pieps wohl.
1,5.0,8.0,"Seitdem Milliardär Elon Musk den Laden übernommen hat,"
2,8.0,11.0,und gleich einmal Massenentlassungen vornahm.
3,11.0,13.0,"Der Letzte macht das Licht aus,"
4,13.0,17.0,mochte einem auch angesichts dieser Radikal-Cure einfallen.
5,17.0,21.0,"Bei Musk eine konstante Strategie zu erkennen, ist schwer."
6,21.0,24.0,"Klar ist wohl nur, er will den Dienst kommerzieller machen."
7,24.0,27.0,Vergaulte aber durch seine Aktionen schon Werbekunden.
8,27.0,30.0,Und er will weniger inhaltlich eingreifen.
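
For longer recordings, start and end times in plain seconds become hard to read. The sketch below converts them to `HH:MM:SS`; the helper `format_timestamp` is hypothetical, and the two sample segments simply reuse rows from the output above in the `start`/`end`/`text` layout of `res['segments']`.

```python
def format_timestamp(seconds):
    """Format a number of seconds as HH:MM:SS (rounded to whole seconds)."""
    total = int(round(seconds))
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return '{:02d}:{:02d}:{:02d}'.format(hours, minutes, secs)


# Two rows copied from the DataFrame output above.
segments = [
    {'start': 8.0, 'end': 11.0, 'text': 'und gleich einmal Massenentlassungen vornahm.'},
    {'start': 11.0, 'end': 13.0, 'text': 'Der Letzte macht das Licht aus,'},
]

for seg in segments:
    print('[{} - {}] {}'.format(
        format_timestamp(seg['start']), format_timestamp(seg['end']), seg['text']))
```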


In [52]:
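def output(filename, path, out_type):
    # `path` is the target path including the file name, but without an extension.
    # The transcription is read from the global DataFrame `df` created above.
    if out_type == 'csv':
        # Semicolon-separated file; `path` already ends with the file name
        df.to_csv(path + '.csv', sep=';')
    elif out_type == 'txt':
        # Overwrite rather than append, so re-running the cell does not duplicate the text
        with open(path + '.txt', 'w', encoding='utf-8') as f:
            df_as_string = df.to_string(header=False, index=False)
            f.write(df_as_string)
    else:
        raise ValueError('Unsupported output type: "{}"'.format(out_type))
    print('Output file "{}" was created in the format "{}".'.format(filename, out_type))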

# Output filename
out_file_name = 'test_output'
# Output filepath
output_path = os.path.join('text_output', out_file_name)
# Output type (csv, txt)
output_type = 'txt'

output(out_file_name, output_path, output_type)


Output file "test_output" was created in the format "txt".
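
The cell above writes a semicolon-separated file. If a tab-separated (.tsv) file is preferred, one option is the standard-library `csv` module. This is a sketch under stated assumptions: the helper `write_segments_tsv` is hypothetical, it expects segment dictionaries with `start`, `end` and `text` keys as in `res['segments']`, and the sample rows are copied from the DataFrame output above.

```python
import csv
import io


def write_segments_tsv(segments, handle):
    """Write start, end and text of each segment as tab-separated rows to an open text file."""
    writer = csv.writer(handle, delimiter='\t')
    writer.writerow(['start', 'end', 'text'])
    for seg in segments:
        writer.writerow([seg['start'], seg['end'], seg['text'].strip()])


# Demonstration with an in-memory buffer instead of a file on disk.
sample_segments = [
    {'start': 24.0, 'end': 27.0, 'text': 'Vergaulte aber durch seine Aktionen schon Werbekunden.'},
    {'start': 27.0, 'end': 30.0, 'text': 'Und er will weniger inhaltlich eingreifen.'},
]
buffer = io.StringIO()
write_segments_tsv(sample_segments, buffer)
print(buffer.getvalue())
```

To write to disk instead, open the target with `open(path, 'w', newline='', encoding='utf-8')` and pass the file object as `handle`.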
