# Machine Transcription and Translation

AdvancedCI.

For technical assistance, contact beining@chineseaci.com.

## Step 1: Install

Execute all steps.

In [None]:
#@title Step 1.1: GPU Model
!nvidia-smi

Sat Oct  8 04:14:14 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   46C    P0    26W /  70W |   5926MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               

You need a GPU with at least 11 GiB of VRAM. If no GPU is attached, enable one under "Runtime > Change runtime type".
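If you want to check the attached GPU's VRAM from Python rather than by reading the table, you can use nvidia-smi's CSV query mode. The sketch below assumes `nvidia-smi` is on the PATH, which it is on Colab GPU runtimes since the cell above runs it. It uses the 11 GiB threshold stated above. `total_vram_gib` and `has_enough_vram` are hypothetical helper names.

```python
import subprocess

def total_vram_gib(csv_output: str) -> float:
    """Parse `nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits`
    output (one MiB value per GPU, one per line) and return GPU 0's memory in GiB."""
    first_line = csv_output.strip().splitlines()[0]
    return int(first_line) / 1024

def has_enough_vram(min_gib: float = 11.0) -> bool:
    # Ask nvidia-smi for total memory only, as a bare number in MiB
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return total_vram_gib(out) >= min_gib
```

On the T4 shown above (15109 MiB total), `total_vram_gib` works out to about 14.8 GiB, which passes the 11 GiB check.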

In [None]:
#@title Step 1.2: Install packages (takes ~2 mins)
!pip install srt requests tqdm git+https://github.com/openai/whisper.git --quiet

## Step 2: Setup Model

In [None]:
#@title Step 2.1: Import model
# setup model
import whisper

In [None]:
#@title Step 2.2: Select and load model
model_name = 'medium.en' #@param ["tiny", "small", "medium", "large", "tiny.en", "small.en", "medium.en"]

# Load whichever model was selected in the dropdown above
model = whisper.load_model(model_name)

100%|█████████████████████████████████████| 1.42G/1.42G [00:55<00:00, 27.5MiB/s]



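`medium.en` strikes a good balance between accuracy and speed for English audio. The `.en` models are English-only, so choose a model without the `.en` suffix for any other language. Colab should have enough VRAM for any model on *any* GPU it provides, including `large`.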

If you change the model mid-session (only do so if you know what you are doing), disconnect and reconnect the runtime before loading the new model. This avoids running out of VRAM (OOM). Reconnecting discards all progress and all uploaded or generated files.

Select the desired model and run the cell above. Downloading the `medium` model takes ~3 mins, so move on to Step 3 while you wait.
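Whisper's `.en` checkpoints are trained for English only. If the audio is in another language, a guard like the one below can switch to the multilingual checkpoint of the same size. `pick_model` is a hypothetical helper and is not part of the `whisper` package. It assumes the lowercase language names offered in Step 3.2.

```python
def pick_model(model_name: str, language: str) -> str:
    """Drop the `.en` suffix when the audio is not English.

    `.en` checkpoints are English-only; the multilingual checkpoint of the
    same size (e.g. `medium` for `medium.en`) handles every other language.
    """
    if language != "english" and model_name.endswith(".en"):
        return model_name[: -len(".en")]
    return model_name
```

Once `audio_file_language` is set in Step 3.2, `whisper.load_model(pick_model(model_name, audio_file_language))` would load a checkpoint that matches the audio.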

## Step 3: Prepare audio for transcription

### Step 3.1: Convert video to audio

While the model downloads, convert the original video to MP3 with `FFmpeg` by running:

`ffmpeg -i Air_Crash_Investigation_S22E071.mp4 -vn -c:a libmp3lame aci.mp3`

or use a GUI tool such as:

- `Maruko Toolbox` (Windows only): https://maruko.appinn.me/
- `Handbrake` (all platforms): https://handbrake.fr/downloads.php

_ACICFG has a sponsorship relationship with `Maruko Toolbox`._

#### Alternatively (highly discouraged)

FFmpeg is preinstalled on Colab. If you upload the video instead, convert it on the Colab instance by running the following in a code cell. The leading `!` runs it as a shell command:
`!ffmpeg -i aci.mp4 -vn -c:a libmp3lame aci.mp3`
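If you would rather drive this conversion from Python than from a shell command, a minimal `subprocess` sketch follows. It uses exactly the FFmpeg flags above. `build_ffmpeg_cmd` and `extract_audio` are hypothetical helper names.

```python
import subprocess
from typing import List

def build_ffmpeg_cmd(src: str, dst: str) -> List[str]:
    # -vn drops the video stream; -c:a libmp3lame encodes the audio track as MP3
    return ["ffmpeg", "-i", src, "-vn", "-c:a", "libmp3lame", dst]

def extract_audio(src: str, dst: str) -> None:
    # check=True raises CalledProcessError if ffmpeg exits with an error
    subprocess.run(build_ffmpeg_cmd(src, dst), check=True)

# On Colab, after uploading aci.mp4:
# extract_audio("aci.mp4", "aci.mp3")
```

Passing the arguments as a list, rather than as one shell string, avoids quoting problems with file names.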


### Step 3.2: Upload audio file to Colab

Rename the file to something simple, with no spaces or special characters.

Click the "Files" icon on the left, then click the "Upload to session storage" button to upload the audio file.

**NOTE: All uploaded and generated files exist only for this session and are deleted when you disconnect from the instance. They cannot be recovered!**

Enter the exact name of the audio file you uploaded (it should end with `.mp3`) in the field below, and select the main language of the audio.

In [None]:
audio_file_name = 'bb.mp3' #@param {type:"string"}
audio_file_language = 'english' #@param ['english', 'chinese', 'german', 'spanish', 'russian', 'korean', 'french', 'japanese', 'portuguese', 'turkish', 'polish', 'catalan', 'dutch', 'arabic', 'swedish', 'italian', 'indonesian', 'hindi', 'finnish', 'vietnamese', 'hebrew', 'ukrainian', 'greek', 'malay', 'czech', 'romanian', 'danish', 'hungarian', 'tamil', 'norwegian', 'thai', 'urdu', 'croatian', 'bulgarian', 'lithuanian', 'latin', 'maori', 'malayalam', 'welsh', 'slovak', 'telugu', 'persian', 'latvian', 'bengali', 'serbian', 'azerbaijani', 'slovenian', 'kannada', 'estonian', 'macedonian', 'breton', 'basque', 'icelandic', 'armenian', 'nepali', 'mongolian', 'bosnian', 'kazakh', 'albanian', 'swahili', 'galician', 'marathi', 'punjabi', 'sinhala', 'khmer', 'shona', 'yoruba', 'somali', 'afrikaans', 'occitan', 'georgian', 'belarusian', 'tajik', 'sindhi', 'gujarati', 'amharic', 'yiddish', 'lao', 'uzbek', 'faroese', 'haitian creole', 'pashto', 'turkmen', 'nynorsk', 'maltese', 'sanskrit', 'luxembourgish', 'myanmar', 'tibetan', 'tagalog', 'malagasy', 'assamese', 'tatar', 'hawaiian', 'lingala', 'hausa', 'bashkir', 'javanese', 'sundanese']
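A typo in the file name only surfaces later as an error inside `model.transcribe`. A quick pre-check against the naming rules above can catch it earlier. The sketch below is one way to do that. `check_audio_file` is a hypothetical helper, and the allowed-character set (letters, digits, `_`, `.`, `-`) is an assumption about what counts as "no special characters".

```python
import os
import re

def check_audio_file(name: str) -> list:
    """Return a list of problems with the uploaded file name (empty if none)."""
    problems = []
    # Assumed safe set: ASCII letters, digits, underscore, dot and hyphen
    if not re.fullmatch(r"[A-Za-z0-9_.-]+", name):
        problems.append("name contains spaces or special characters")
    if not name.lower().endswith(".mp3"):
        problems.append("name does not end with .mp3")
    # Uploads land in the notebook's working directory
    if not os.path.isfile(name):
        problems.append("file not found in the current directory")
    return problems
```

For example, `check_audio_file(audio_file_name)` returning an empty list means the name looks usable.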


## Step 4: Transcribe!

Run the cell below.

With `medium.en` on a T4, expect roughly 5× real-time speed, so a 45-minute episode takes about 8 to 9 minutes. Larger models take longer.

Wait until the process finishes.
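The speed figure is a multiple of real time, so the expected wall-clock time is the audio length divided by that factor. The sketch below shows the arithmetic. `estimate_minutes` is a hypothetical helper. The default of 5.0 is the T4/`medium.en` figure quoted above, not a measured constant for other GPUs or models.

```python
def estimate_minutes(audio_minutes: float, speedup: float = 5.0) -> float:
    """Wall-clock transcription time given a real-time speed-up factor."""
    return audio_minutes / speedup

print(estimate_minutes(45))  # 45-minute episode at ~5x real time
```

At exactly 5×, a 45-minute episode works out to 9 minutes.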


In [None]:
#@title Transcribe the audio file
# With medium.en on a T4 this runs at roughly 5x real time (45 min of audio takes ~9 min)
result = model.transcribe(audio_file_name, language=audio_file_language, verbose=True)

[00:00.000 --> 00:24.400]  What the hell is this?
[00:24.400 --> 00:25.400]  Tribal police?
[00:25.400 --> 00:31.400]  Paige! Paige! Paige!
[00:31.400 --> 00:49.400]  No!
[00:49.400 --> 01:01.400]  Police! Drop your weapons!
[01:01.400 --> 01:04.400]  Jack! Don't do it!
[01:04.400 --> 01:08.400]  Jack!
[01:08.400 --> 01:26.400]  Drop your weapons!
[01:38.400 --> 02:01.400]  Jack!
[02:01.400 --> 02:11.400]  No!
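`model.transcribe` returns a dict, and the later steps consume its `segments` list. Each segment carries at least `id`, `start` and `end` (in seconds) and `text`; these are the keys the code below relies on. The sketch below rebuilds log lines like the ones above from such a list. It uses a hand-written sample instead of a real `result`. `format_ts` is a hypothetical helper that imitates the log's timestamp layout rather than calling Whisper's own formatter.

```python
def format_ts(seconds: float) -> str:
    """Format seconds as [HH:]MM:SS.mmm, omitting the hour field when it is zero."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    hour_part = f"{hours:02d}:" if hours else ""
    return f"{hour_part}{minutes:02d}:{secs:02d}.{ms:03d}"

# Hand-written stand-in for result['segments'] (real segments carry more keys)
segments = [
    {"id": 0, "start": 0.0, "end": 24.4, "text": " What the hell is this?"},
    {"id": 1, "start": 24.4, "end": 25.4, "text": " Tribal police?"},
]

for seg in segments:
    # Segment text starts with a space, hence the double space after "]"
    print(f"[{format_ts(seg['start'])} --> {format_ts(seg['end'])}] {seg['text']}")
```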


## Step 5: Collect results

### Step 5.1: Convert transcription to SRT

Run the 3 cells below to preview the result.

In [None]:
#@title Import packages
import srt
from datetime import timedelta

In [None]:
#@title Create SRT with transcription
result_srt_list = []
for i in result['segments']:
    result_srt_list.append(srt.Subtitle(index=i['id'], start=timedelta(seconds=i['start']), end=timedelta(seconds=i['end']), content=i['text'].strip()))

composed_transcription = srt.compose(result_srt_list)
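For reference, an SRT file is a sequence of blocks. Each block holds a 1-based counter, a `start --> end` line with `HH:MM:SS,mmm` timestamps (note the comma before the milliseconds), the text, and a blank line. Whisper's segment ids start at 0, but `srt.compose` renumbers subtitles from 1 by default. Below is a stdlib-only sketch of the same formatting for segments shaped like `result['segments']`. `srt_timestamp` and `compose_srt` are hypothetical names; the notebook itself relies on the `srt` package.

```python
from datetime import timedelta

def srt_timestamp(td: timedelta) -> str:
    """Render a timedelta as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = td // timedelta(milliseconds=1)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{ms:03d}"

def compose_srt(segments) -> str:
    blocks = []
    # enumerate from 1: SRT counters are 1-based, Whisper ids are 0-based
    for index, seg in enumerate(segments, start=1):
        start = srt_timestamp(timedelta(seconds=seg["start"]))
        end = srt_timestamp(timedelta(seconds=seg["end"]))
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n\n")
    return "".join(blocks)

print(compose_srt([{"id": 0, "start": 0.0, "end": 24.4, "text": " What the hell is this?"}]))
```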

In [None]:
#@title Optional: Preview the transcription SRT
print(composed_transcription)

### Step 5.2: Generate and download transcribed srt

Enter the desired name for the transcribed SRT file below, then run the 2 cells below.

In [None]:
#@title Name of the transcribed SRT to generate (should end with `.srt`)

transcribed_srt_name = 'transcribed.srt' #@param {type:"string"}


In [None]:
#@title Write the SRT
with open(transcribed_srt_name, 'w', encoding='utf-8') as f:
    f.write(composed_transcription)

You should now see an SRT file with the chosen name in the file browser on the left; right-click it and download it.

## Step 6: Translate

### Step 6.1: Execute translation

We use DeepL's undocumented API for translation, through the endpoint at `https://deepl.cnbeining.com/translate`.

Execute the 4 cells below.

In [None]:
#@title 6.1.1 Import packages
import requests
from tqdm.notebook import tqdm
from tqdm.contrib.concurrent import process_map  # or thread_map

In [None]:
#@title 6.1.2 Setup Variables: Thread Number, Source Language, Target Language

result_list_translated = []
result_list_assembled = []
s = requests.Session()

thread_num = 8 #@param [1, 2, 4, 6, 8, 12, 16]
source_lang = "auto" #@param ["auto", "BG", "CS", "DA", "DE", "EL", "EN", "EN-GB", "EN-US", "ES", "ET", "FI", "FR", "HU", "ID", "IT", "JA", "LT", "LV", "NL", "PL", "PT", "PT-BR", "PT-PT", "RO", "RU", "SK", "SL", "SV", "TR", "UK", "ZH"]
target_lang = "ZH" #@param ["BG", "CS", "DA", "DE", "EL", "EN", "EN-GB", "EN-US", "ES", "ET", "FI", "FR", "HU", "ID", "IT", "JA", "LT", "LV", "NL", "PL", "PT", "PT-BR", "PT-PT", "RO", "RU", "SK", "SL", "SV", "TR", "UK", "ZH"]
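The DeepL codes above and the Whisper language names in Step 3.2 describe the same audio. One option is therefore to derive `source_lang` from `audio_file_language` instead of leaving it on `auto`. The sketch below shows this with a deliberately partial, hypothetical mapping table built from names and codes that appear in the two parameter lists. Any language missing from the table falls back to `auto`.

```python
# Partial, hypothetical mapping from Whisper language names to DeepL source codes
WHISPER_TO_DEEPL = {
    "english": "EN",
    "chinese": "ZH",
    "german": "DE",
    "spanish": "ES",
    "french": "FR",
    "japanese": "JA",
    "russian": "RU",
    "italian": "IT",
    "dutch": "NL",
    "polish": "PL",
    "portuguese": "PT",
}

def deepl_source_lang(whisper_language: str) -> str:
    """Return the DeepL source code for a Whisper language name, or 'auto'."""
    return WHISPER_TO_DEEPL.get(whisper_language.lower(), "auto")
```

For example, `source_lang = deepl_source_lang(audio_file_language)` would replace the dropdown value.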

In [None]:
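#@title 6.1.3 Setup DeepL

def translate_via_deepl(content):
    # Returns [translation, translation + newline + original]; on failure both entries are the original line
    try:
        resp = s.post('https://deepl.cnbeining.com/translate',
                      json={"text": content, "source_lang": source_lang, "target_lang": target_lang},
                      timeout=30).json()
        # Treat a non-200 status code in the JSON body as a failure
        if resp.get('code', 200) != 200:
            raise RuntimeError('Error calling API: ' + str(resp.get('msg')))
        translated = resp['data']
    except Exception as e:
        # Keep the original text so every segment still gets a subtitle
        print(content)
        print(e)
        return [content, content]

    return [translated, f"{translated}\n{content}"]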
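The endpoint is undocumented, so occasional failures are likely. The function above returns the untranslated line whenever a call fails. If you would rather retry before giving up, a generic sketch follows. `with_retries` is a hypothetical helper. The attempt count and back-off delay are arbitrary choices, not limits published by DeepL or the endpoint.

```python
import time

def with_retries(fn, attempts: int = 3, delay: float = 1.0):
    """Wrap fn so each call is tried up to `attempts` times, sleeping
    `delay` seconds (doubled after every failure) between tries."""
    def wrapper(*args, **kwargs):
        wait = delay
        for attempt in range(1, attempts + 1):
            try:
                return fn(*args, **kwargs)
            except Exception:
                if attempt == attempts:
                    raise  # out of attempts: let the caller handle it
                time.sleep(wait)
                wait *= 2
    return wrapper
```

Wrap the raw request (the code inside the `try`), not `translate_via_deepl` itself, because that function already catches every exception. Build the wrapped call inside `translate_via_deepl` too. A closure like `wrapper` cannot be pickled, so passing it straight to `process_map` would fail.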

In [None]:
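#@title 6.1.4 Call DeepL API for translation: ~2 lines/sec
# Store translations separately: `result` must keep the Whisper segments used in the next cell
translation_results = process_map(translate_via_deepl, [line['text'].strip() for line in result['segments']], max_workers=thread_num)

  0%|          | 0/10 [00:00<?, ?it/s]

#### Step 6.1.5: Assemble results

In [None]:
#@title Create versions of SRT

# Split each [translation, assembled] pair into the two lists
for translated, assembled in translation_results:
    result_list_translated.append(translated)
    result_list_assembled.append(assembled)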

result_srt_list_translated = []

for i, v in enumerate(result['segments']):
    result_srt_list_translated.append(srt.Subtitle(index=v['id'], start=timedelta(seconds=v['start']), end=timedelta(seconds=v['end']), content=result_list_translated[i]))

result_srt_list_assembled = []

for i, v in enumerate(result['segments']):
    result_srt_list_assembled.append(srt.Subtitle(index=v['id'], start=timedelta(seconds=v['start']), end=timedelta(seconds=v['end']), content=result_list_assembled[i]))

composed_transcription_translated = srt.compose(result_srt_list_translated)
composed_transcription_assembled = srt.compose(result_srt_list_assembled)
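The assembly above pairs the i-th translation with the i-th segment. That pairing is only correct because tqdm's `process_map`, which builds on `concurrent.futures`' `Executor.map`, returns results in input order rather than completion order. The self-contained check below demonstrates that property with a thread pool and randomised delays. `fake_translate` is a stand-in for the real API call.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor

def fake_translate(text: str) -> str:
    # Random latency makes calls finish out of order, like real HTTP requests
    time.sleep(random.uniform(0, 0.05))
    return f"<translated>{text}"

lines = [f"line {i}" for i in range(20)]
with ThreadPoolExecutor(max_workers=8) as ex:
    results = list(ex.map(fake_translate, lines))

# Results come back in submission order, so index i still matches segment i
assert results == [f"<translated>{t}" for t in lines]
print(results[:2])
```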

In [None]:
#@title Optional: Remove special characters according to ACICFG's standard
composed_transcription_translated = composed_transcription_translated.replace("。", "").replace("，", " ")
composed_transcription_assembled = composed_transcription_assembled.replace("。", "").replace("，", " ")
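You can also express the two chained `replace` calls as a single translation table with `str.maketrans`. This is convenient if the house style later gains more rules. Below is a sketch of an equivalent helper. `ACICFG_PUNCT` and `clean_punctuation` are hypothetical names, and the table holds only the two rules from the cell above.

```python
# Same rules as the cell above: drop full-width periods, turn full-width commas into spaces
ACICFG_PUNCT = str.maketrans({"。": "", "，": " "})

def clean_punctuation(text: str) -> str:
    return text.translate(ACICFG_PUNCT)

print(clean_punctuation("这是什么，部落警察。"))
```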

In [None]:
#@title Optional: Preview the assembled results
print(composed_transcription_assembled)

### Step 6.2: Collect translated results

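Use `is_generate_assembled_srt` to choose the output. When checked, it generates an assembled SRT (translation followed by the original transcription). When unchecked, it generates a translated SRT (translation only).

You can also change the output filename below.

Run the 2 cells below, then download the generated SRT file from the file browser on the left.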

In [None]:
#@title Generation Settings

translated_result_filename = 'translated.srt' #@param {type:"string"}
is_generate_assembled_srt = True #@param {type:"boolean"}



In [None]:
#@title Generate SRT
with open(translated_result_filename, 'w', encoding='utf-8') as f:
    if is_generate_assembled_srt:
        f.write(composed_transcription_assembled)
    else:
        f.write(composed_transcription_translated)

## Recycle

A recycle bin for old code snippets. Ordinary users do not need any of them.

In [None]:
#@title Unused: Single-threaded version

with tqdm(total=len(result['segments'])) as pbar:
    for line in result['segments']:
        content = line['text'].strip()
        try:
            resp = s.post('https://deepl.cnbeining.com/translate',
                          json={"text": content, "source_lang": source_lang, "target_lang": target_lang},
                          timeout=30).json()
            # Treat a non-200 status code in the JSON body as a failure
            if resp.get('code', 200) != 200:
                raise RuntimeError('Error calling API: ' + str(resp.get('msg')))
            translated = resp['data']
        except Exception as e:
            # Fall back to the untranslated line so the lists stay aligned with the segments
            print(line)
            print(e)
            result_list_translated.append(content)
            result_list_assembled.append(content)
        else:
            # Append exactly one entry per segment to each list
            result_list_translated.append(translated)
            result_list_assembled.append(f"{translated}\n{content}")
        # Advance the progress bar whether or not the call succeeded
        pbar.update(1)
