# N46Whisper

N46Whisper is a Google Colab notebook that streamlines Japanese subtitle generation from video, producing a subtitle file ready for subsequent translation and timing. It is intended to improve the productivity of subbers for Nogizaka46 (and other Sakamichi groups).

The notebook is based on [Whisper](https://github.com/openai/whisper), a general-purpose speech recognition model.

The output file is in .ass format with the built-in style of the selected sub group, so it can be imported directly into [Aegisub](https://github.com/Aegisub/Aegisub) for subsequent editing.
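
To make the output format concrete, the sketch below shows roughly how one subtitle cue maps onto an ASS `Dialogue` line that Aegisub can read. It is only an illustration and not the notebook's `srt2ass` implementation: the function names `ass_timestamp` and `to_dialogue` and the `Default` style name are assumptions made for this example, and a complete .ass file also needs `[Script Info]` and `[V4+ Styles]` sections.

```python
def ass_timestamp(seconds: float) -> str:
    """Format seconds as an ASS timestamp (H:MM:SS.cc, centisecond precision)."""
    centis = round(seconds * 100)
    hours, rem = divmod(centis, 360_000)
    minutes, rem = divmod(rem, 6_000)
    secs, centis = divmod(rem, 100)
    return f"{hours}:{minutes:02d}:{secs:02d}.{centis:02d}"


def to_dialogue(start: float, end: float, text: str, style: str = "Default") -> str:
    """Build one ASS Dialogue event; the style name is a placeholder."""
    # Field order: Layer, Start, End, Style, Name, MarginL, MarginR, MarginV, Effect, Text
    # ASS uses \N for a hard line break inside a cue.
    text = text.strip().replace("\n", r"\N")
    return f"Dialogue: 0,{ass_timestamp(start)},{ass_timestamp(end)},{style},,0,0,0,,{text}"


print(to_dialogue(1.0, 3.5, "こんにちは"))
```

SRT timestamps use milliseconds (`00:00:01,000`), while ASS timestamps use centiseconds, so a conversion of this kind rounds each time to the nearest hundredth of a second.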

Contact:

In [8]:
#@markdown **挂载你的谷歌网盘/Mount Google Drive** 

from google.colab import drive
drive.mount('/drive')
print('Google Drive mounted')
print('挂载完毕')

Drive already mounted at /drive; to attempt to forcibly remount, call drive.mount("/drive", force_remount=True).
Google Drive mounted
挂载完毕


In [None]:
#@markdown **配置Whisper/Setup Whisper**

! pip install git+https://github.com/openai/whisper.git
print('Whisper installed')
print('配置完毕')

In [13]:
#@markdown **上传文件/Upload files**

from google.colab import files

uploaded = files.upload()
file_name = list(uploaded.keys())[0]



In [10]:
# @markdown **参数设置/Required settings:**


# @markdown <font size="2">select uploaded file type and input file name (e.g. audio.mp3).
# @markdown <br/>选择文件类型(视频/音频）并输入名称。</font>

# encoding:utf-8
file_type = "video"  # @param ["audio","video"]

# @markdown <font size="2">Model size affects processing time and transcription quality.
# @markdown <br/>模型大小将影响转录时间和质量</font>
model_size = "medium"  # @param ["base", "medium", "large"]
language = "japanese"  # @param {type:"string"}


# @markdown **高级设置/Advanced settings:**

# @markdown <font size="2">Adjusting these parameters may improve transcription quality and avoid some issues, but don't change anything here unless you know what you are doing.
# @markdown <br/>调节以下参数可能会提高转录质量并避免一些问题，但是不懂请不要调</font>

compression_ratio_threshold = 2.4 # @param {type:"number"}
no_speech_threshold = 0.6 # @param {type:"number"}
logprob_threshold = -1.0 # @param {type:"number"}
condition_on_previous_text = "True" # @param ["True", "False"]
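
The advanced settings above are plain notebook variables, and `condition_on_previous_text` is the string `"True"` or `"False"` rather than a boolean because the form field is a dropdown. One possible way to hand them to Whisper is to collect them into keyword arguments for `model.transcribe`, which accepts parameters with these names. The helper name `build_transcribe_options` and the `audio_path` variable in the usage comment are hypothetical and are not part of the notebook or of Whisper; treat this as a sketch.

```python
def build_transcribe_options(
    language: str,
    compression_ratio_threshold: float = 2.4,
    no_speech_threshold: float = 0.6,
    logprob_threshold: float = -1.0,
    condition_on_previous_text: str = "True",
) -> dict:
    """Collect the form values into keyword arguments for model.transcribe."""
    return {
        "language": language,
        "compression_ratio_threshold": float(compression_ratio_threshold),
        "no_speech_threshold": float(no_speech_threshold),
        "logprob_threshold": float(logprob_threshold),
        # The dropdown yields the strings "True"/"False"; convert to a real bool.
        "condition_on_previous_text": str(condition_on_previous_text) == "True",
    }


options = build_transcribe_options("japanese", condition_on_previous_text="False")
print(options)
# Assumed usage once a model is loaded (audio_path is a placeholder):
# result = model.transcribe(audio_path, **options)
```

Comparing against the literal `"True"` avoids the common pitfall that `bool("False")` evaluates to `True` in Python.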

In [29]:
#@markdown **运行Whisper/Run Whisper**

import os
import ffmpeg
import subprocess
import torch
import whisper
import time
import pandas as pd
from pathlib import Path
import sys
sys.path.append('/drive/content')

assert file_name != ""
assert language != ""


if not os.path.exists(file_name):
    raise ValueError(f"No {file_name} found in current path.")

# Derive the base name and output directory from the uploaded file
file_basename = Path(file_name).stem
output_dir = Path(file_name).parent.resolve()
print(file_basename)
print(output_dir)

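if file_type == "video":
    print('提取音频中 Extracting audio from video file...')
    # Pass arguments as a list so file names with spaces or shell characters work
    subprocess.run(['ffmpeg', '-i', file_name, '-f', 'mp3', '-ab', '192000', '-vn', f'{file_basename}.mp3'], check=True)
    print('提取完毕 Done.')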

torch.cuda.empty_cache()
print('加载模型 Loading model...')
model = whisper.load_model(model_size)

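# Transcribe
tic = time.time()
print('识别中 Converting...')
# Video uploads use the extracted MP3; audio uploads are transcribed directly
audio_path = f'{file_basename}.mp3' if file_type == "video" else file_name
result = model.transcribe(
    audio=audio_path, language=language,
    compression_ratio_threshold=compression_ratio_threshold,
    no_speech_threshold=no_speech_threshold, logprob_threshold=logprob_threshold,
    condition_on_previous_text=(condition_on_previous_text == "True"),  # form value is a string
)
toc = time.time()
print('识别完毕 Done')
print(f'Time consumption {toc-tic}s')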

# Write SRT file
# Note: write_srt ships with early Whisper releases; newer releases provide whisper.utils.get_writer instead
from whisper.utils import write_srt
srt_path = Path(output_dir) / (file_basename + ".srt")
with open(srt_path, "w", encoding="utf-8") as srt:
    write_srt(result["segments"], file=srt)
# Convert SRT to ASS
from srt2ass import srt2ass
assSub = srt2ass(file_basename + ".srt")
print('ASS subtitle saved as: ' + assSub)
# Remove the intermediate SRT file
os.remove(srt_path)
torch.cuda.empty_cache()


test
/content
提取音频中 Extracting audio from video file...
提取完毕 Done.
加载模型 Loading model...


100%|█████████████████████████████████████| 1.42G/1.42G [00:25<00:00, 60.5MiB/s]


识别中 Converting...
Time consumpution 150.13225388526917s


In [None]:
# Re-run the SRT-to-ASS conversion on its own (requires the .srt file to still exist)
from srt2ass import srt2ass
assSub = srt2ass(file_basename + ".srt")
print('ASS subtitle saved as: ' + assSub)
os.remove(file_basename + ".srt")
torch.cuda.empty_cache()

ASS subtitle saved as: 221024a.ass
