# N46Whisper

N46Whisper is a Google Colab notebook application for streamlined video subtitle file generation. The original purpose of the project was to improve the productivity of Nogizaka46 (and other Sakamichi group) subbers, but it can also be used to create subtitles in general. The goal is not to produce perfect subtitle files but to provide a simple, automated platform that significantly reduces the time and labour sub-groups or individual subbers spend producing finished subtitles. However, despite their impressive performance, the Whisper model, AI translation and the application itself all have limitations.


N46Whisper 是基于 Google Colab 的应用。开发初衷旨在提高乃木坂46（以及坂道系）字幕组日语视频的制作效率,但亦适于所有外语视频的字幕制作。本应用的目标并非生产完美的字幕文件， 而旨在于搭建并提供一个简单且自动化的使用平台以节省生产成品字幕的时间和精力。Whisper模型有其本身的应用场景限制，AI 翻译的质量亦还不能尽如人意。

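<font size='4'>**对于中文用户，推荐在使用前阅读[常见问题说明](https://github.com/Ayanaminn/N46Whisper/blob/main/FAQ.md)。如果你觉得本应用对你有所帮助，欢迎帮助扩散给更多的人。<br/>Chinese-speaking users are advised to read the [FAQ](https://github.com/Ayanaminn/N46Whisper/blob/main/FAQ.md) before use. If you find this application helpful, please share it with others.**</font>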


<font size='4'>**联系作者/Contact me：[E-mail](mailto:admin@ikedateresa.cc)**</font>


## 更新/What's Latest：
历史更新日志/Changelog

<font size='3'>**本项目将不再进行维护和更新，感谢大家的帮助与支持。/This project is no longer maintained or updated. Thank you all for your help and support.**</font>
</br></br>

2024.4.17:
* 添加使用Google Gemini API翻译的选项。/Add an option to translate with the Google Gemini API.

2024.1.31:
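* 鉴于集成的参数选项（还会）越来越多有使流程变得繁琐的趋势，这有违开发初衷。因此测试分离了一个[轻量版](https://colab.research.google.com/github/Ayanaminn/N46Whisper/blob/dev/N46WhisperLite.ipynb)，只保留最少的必要操作。/As the growing number of integrated parameter options tends to make the workflow cumbersome, which goes against the original intent, a [Lite version](https://colab.research.google.com/github/Ayanaminn/N46Whisper/blob/dev/N46WhisperLite.ipynb) that keeps only the minimum necessary steps has been split off for testing.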

2023.12.4:
* 支持基于faster-whisper的WhisperV3模型/Support faster-whisper based WhisperV3 model

2023.11.7:
* 现在可以加载最新的WhisperV3模型/Enable users to load the latest Whisper V3 model.
* 允许用户自行设置beam size/Allow users to customize the beam size parameter.

2023.4.30:
* 优化提示词/Refine the translation prompt.
* 允许用户使用个人提示词并调节Temperature参数/Allow users to customize the prompt and temperature for translation.
* 显示翻译任务消费统计/Display the tokens used and the total cost of the translation task.

2023.4.15:
* 使用faster-whisper模型重新部署以提高效率，节省资源。Reimplement Whisper based on faster-whisper to improve efficiency and save resources.
* 提供faster-whisper集成的vad filter选项以提高转录精度。Enable the VAD filter integrated in faster-whisper to improve transcription accuracy.
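
The 2023 entries above rely on faster-whisper's `WhisperModel.transcribe`, which exposes `beam_size` and `vad_filter` as keyword arguments. The following is a minimal sketch, not the notebook's own transcription cell: `transcribe_to_srt`, `segments_to_srt` and `format_timestamp` are hypothetical helpers, and the transcription part assumes a faster-whisper release that can load `large-v3` plus a CUDA GPU. Only the SRT formatting runs without those dependencies.

```python
from collections import namedtuple


def format_timestamp(seconds):
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Render objects with .start, .end and .text attributes as SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start, end = format_timestamp(seg.start), format_timestamp(seg.end)
        blocks.append(f"{index}\n{start} --> {end}\n{seg.text.strip()}\n")
    # a blank line separates consecutive cues
    return "\n".join(blocks)


def transcribe_to_srt(audio_path, language="ja", beam_size=5, vad_filter=True):
    """Hypothetical helper: transcribe audio with faster-whisper and return SRT text."""
    # imported lazily so the formatting helpers work without faster-whisper installed
    from faster_whisper import WhisperModel

    model = WhisperModel("large-v3", device="cuda", compute_type="float16")
    # transcribe() returns a lazy generator of segments plus transcription info
    segments, _info = model.transcribe(
        audio_path, language=language, beam_size=beam_size, vad_filter=vad_filter
    )
    return segments_to_srt(segments)


# Offline usage with stand-in segments shaped like faster-whisper's output
Seg = namedtuple("Seg", "start end text")
print(segments_to_srt([Seg(0.0, 2.5, " こんにちは"), Seg(3723.5, 3725.0, "Hello ")]))
```

Because `transcribe` yields segments lazily, the actual decoding happens while `segments_to_srt` iterates over them.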

**<font size='5'>以下选择文件方式按需执行其中一种即可，不需要全部运行<br/>Run only one of the following file-selection cells as needed; there is no need to run both.</font>**

In [None]:
#@title **从谷歌网盘选择文件/Select File From Google Drive**

# @markdown <font size="2">Navigate to the file you want to transcribe, left-click to highlight it, then click the 'Select' button to confirm.
# @markdown <br/>从网盘目录中选择要转换的文件（视频/音频），单击选中文件，点击'Select'按钮以确认。</font><br/>
# @markdown <br/><font size="2">If you are using a local file, skip this cell and move to the next one.
# @markdown <br/>若希望从本地上传文件，则跳过此步执行下一单元格。</font><br/>
# @markdown <br/><font size="2">If you upload the file to Google Drive after running this cell, run this cell again to refresh the file list.
# @markdown <br/>若到这一步才上传文件到谷歌盘，则重复执行本单元格以刷新文件列表。</font>
! pip install geemap
from google.colab import drive
from google.colab import files
import os
import logging
from IPython.display import clear_output, display
import geemap

clear_output()
drive.mount('/drive')

print('Google Drive is mounted, please select a file')
print('谷歌云盘挂载完毕，请选择要转换的文件')

from ipytree import Tree, Node
import ipywidgets as widgets
from ipywidgets import interactive
# import os
from google.colab import output
output.enable_custom_widget_manager()
use_drive = True
# paths of the Drive files chosen with the 'Select' button
drive_dir = []

def file_tree():
    # create widgets as a simple file browser
    full_widget = widgets.HBox()
    left_widget = widgets.VBox()
    right_widget = widgets.VBox()

    path_widget = widgets.Text()
    path_widget.layout.min_width = '300px'
    select_widget = widgets.Button(
      description='Select', button_style='primary', tooltip='Select current media file.'
      )
    drive_url = widgets.Output()

    right_widget.children = [select_widget]
    full_widget.children = [left_widget]

    tree_widget = widgets.Output()
    tree_widget.layout.max_width = '300px'
    tree_widget.layout.overflow = 'auto'

    left_widget.children = [path_widget,tree_widget]

    # init file tree
    my_tree = Tree(multiple_selection=False)
    my_tree_dict = {}
    media_names = []

    def select_file(b):
        # record the highlighted file; selecting other files afterwards adds them too
        if path_widget.value not in drive_dir:
            drive_dir.append(path_widget.value)
        print('File selected, please continue to select more or execute the next cell')
        print('已选择文件，可以继续选择或执行下个单元格')

    select_widget.on_click(select_file)

    def handle_file_click(event):
        if event['new']:
            cur_node = event['owner']
            for key in my_tree_dict.keys():
                if (cur_node is my_tree_dict[key]) and (os.path.isfile(key)):
                    try:
                        with open(key) as f:
                            path_widget.value = key
                            path_widget.disabled = False
                            select_widget.disabled = False
                            full_widget.children = [left_widget, right_widget]
                    except Exception as e:
                        path_widget.value = key
                        path_widget.disabled = True
                        select_widget.disabled = True

                        return

    def handle_folder_click(event):
        if event['new']:
            full_widget.children = [left_widget]

    # redirect cwd to default drive root path and add nodes
    my_dir = '/drive/MyDrive'
    my_root_name = my_dir.split('/')[-1]
    my_root_node = Node(my_root_name)
    my_tree_dict[my_dir] = my_root_node
    my_tree.add_node(my_root_node)
    my_root_node.observe(handle_folder_click, 'selected')

    for root, d_names, f_names in os.walk(my_dir):
        # prune hidden folders in place so that os.walk does not descend into them
        # (removing items from a list while iterating over it skips entries)
        d_names[:] = sorted(d for d in d_names if not d.startswith('.'))
        # only add media files; match full extensions so that e.g. 'charts' does not match 'ts'
        media_names = sorted(
            f_name for f_name in f_names
            if f_name.lower().endswith(('.mp3', '.m4a', '.flac', '.aac', '.wav', '.mp4', '.mkv', '.ts', '.flv'))
        )

        if root not in my_tree_dict:
            # create a node for this folder and attach it to its parent folder's node
            name = root.split('/')[-1]  # folder name
            dir_name = os.path.dirname(root)  # parent path of folder
            parent_node = my_tree_dict[dir_name]
            node = Node(name)
            my_tree_dict[root] = node
            parent_node.add_node(node)
            node.observe(handle_folder_click, 'selected')

        if media_names:
            # collapse the folder and attach each media file as a leaf node
            parent_node = my_tree_dict[root]
            parent_node.opened = False
            for f_name in media_names:
                node = Node(f_name)
                node.icon = 'file'
                full_path = os.path.join(root, f_name)
                my_tree_dict[full_path] = node
                parent_node.add_node(node)
                node.observe(handle_file_click, 'selected')

    with tree_widget:
        tree_widget.clear_output()
        display(my_tree)

    return full_widget


tree= file_tree()
tree
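
The Drive browser above walks `/drive/MyDrive` with `os.walk`, is meant to skip hidden folders, and keeps only media files. Stripped of the widgets, that traversal can be sketched as a plain function. `find_media_files`, `MEDIA_EXTS` and `demo_find_media` are illustrative names, not part of the notebook, and the demo builds a throwaway directory instead of using Google Drive.

```python
import os
import tempfile
from pathlib import Path

# Extensions taken from the notebook's filter, written with a leading dot
MEDIA_EXTS = ('.mp3', '.m4a', '.flac', '.aac', '.wav', '.mp4', '.mkv', '.ts', '.flv')


def find_media_files(root_dir):
    """Return sorted paths of media files under root_dir, skipping hidden folders."""
    found = []
    for root, d_names, f_names in os.walk(root_dir):
        # slice assignment mutates the list os.walk uses to pick the next folders
        d_names[:] = [d for d in d_names if not d.startswith('.')]
        for f_name in f_names:
            if f_name.lower().endswith(MEDIA_EXTS):
                found.append(os.path.join(root, f_name))
    return sorted(found)


def demo_find_media():
    """Build a small temporary tree and list the media files found in it."""
    with tempfile.TemporaryDirectory() as tmp:
        for rel in ('a.mp4', 'charts', 'notes.txt', '.hidden/b.mp3', 'sub/c.TS'):
            path = Path(tmp, rel)
            path.parent.mkdir(parents=True, exist_ok=True)
            path.touch()
        return [os.path.relpath(p, tmp) for p in find_media_files(tmp)]


print(demo_find_media())  # on Linux/Colab: ['a.mp4', 'sub/c.TS']
```

Pruning through `d_names[:]` only works with the default top-down walk; with `topdown=False` the subfolders have already been visited by the time the list is seen.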


In [None]:
#@title **从本地上传文件（可多选）/Upload Local File (Can select multiple)**
# @markdown <font size="2">If you have already selected a file from Google Drive, skip this cell and move to the next one.
# @markdown <br/>若已选择谷歌盘中的文件，则跳过此步执行下一单元格。</font>

from google.colab import files
use_drive = False
uploaded = files.upload()
file_names = list(uploaded.keys())  # keep every uploaded file, not only the first
print('File(s) uploaded, please execute the next cell')
print('已上传文件，可以执行下个单元格')

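**<font size='5'>以下顺次点击下方每个单元格左侧的“运行”图标，不可跳过步骤<br/>Run the cells below in order by clicking the "Run" icon to the left of each cell; do not skip any step.</font>**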
**</br>【重要】:** 务必在"修改"->"笔记本设置"->"硬件加速器"中选择GPU！否则处理速度会非常慢。
**</br>【IMPORTANT】:** Make sure you select GPU as the hardware accelerator under "Edit" -> "Notebook settings", otherwise processing will be very slow.

---

## 【实验功能】文本转语音 / [Experimental] Text-to-Speech (TTS)

**<font size="2">此功能将文本内容转换为语音音频文件，支持中文和日语。</font>**

**<font size="2">This feature converts text content to speech audio files, supporting Chinese and Japanese.</font>**

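**<font size="2">使用 Coqui TTS 的 Tacotron2-DDC 模型进行语音合成。</font>**

**<font size="2">Speech is synthesized with Coqui TTS Tacotron2-DDC models.</font>**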

In [None]:
#@title **TTS 参数设置 / TTS Settings**

# @markdown **选择语音语言 / Select Voice Language**
tts_language = "Chinese"  # @param ["Chinese", "Japanese"]

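# @markdown **选择语音类型 / Select Voice Type**
# @markdown <font size="2">注：运行单元格目前未使用此选项 / Note: this option is not currently used by the Run TTS cell.</font>
tts_voice_type = "Female"  # @param ["Female", "Male"]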

# @markdown **输出音频格式 / Output Audio Format**
tts_output_format = "wav"  # @param ["wav", "mp3"]

# @markdown **语速调节 / Speech Rate (0.5-2.0)**
tts_speed = 1.0  # @param {type:"slider", min:0.5, max:2.0, step:0.1}

# @markdown **文本输入来源 / Text Input Source**
# @markdown <font size="2">upload_new: 上传新文件 / use_subtitle: 使用上一步生成的字幕</font>
tts_text_source = "upload_new"  # @param ["upload_new", "use_subtitle"]

print(f"TTS 配置完成 / TTS Settings configured:")
print(f"  语言/Language: {tts_language}")
print(f"  语音类型/Voice: {tts_voice_type}")
print(f"  输出格式/Format: {tts_output_format}")
print(f"  语速/Speed: {tts_speed}")

In [None]:
#@title **安装 TTS 依赖 / Install TTS Dependencies**

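# @markdown <font size="2">安装 Coqui TTS 及音频处理所需的依赖库</font>
# @markdown <br/><font size="2">Install Coqui TTS and the audio-processing dependencies</font>

# the "ja" extra installs the MeCab/cutlet packages used by the Japanese model's text frontend
!pip install -q "TTS[ja]"
!pip install -q pydub
!apt-get install -q -y ffmpeg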

from IPython.display import clear_output
clear_output()

print("TTS 依赖安装完成 / TTS dependencies installed successfully!")
print("请继续执行下一个单元格 / Please continue to the next cell")

In [None]:
#@title **运行 TTS / Run TTS**

# @markdown <font size="2">执行文本转语音转换</font>
# @markdown <br/><font size="2">Execute text-to-speech conversion</font>

import os
import re
import torch
from pathlib import Path
from tqdm import tqdm
from google.colab import files
from IPython.display import clear_output, Audio, display

# 加载文本文件 / Load the input text file
if tts_text_source == 'upload_new':
    print("请上传文本文件 (.txt, .srt, .ass) / Please upload text file (.txt, .srt, .ass)")
    uploaded = files.upload()
    tts_input_file = list(uploaded.keys())[0]
    tts_basename = Path(tts_input_file).stem
elif tts_text_source == 'use_subtitle':
    # file_basenames is expected to be defined by the earlier Whisper cells of this notebook
    try:
        tts_basename = file_basenames[0]
    except (NameError, IndexError):
        raise RuntimeError("未找到字幕文件，请先运行 Whisper 或选择 upload_new / No subtitle file found, please run Whisper first or select upload_new")
    tts_input_file = tts_basename + '.srt'
    if not os.path.exists(tts_input_file):
        raise FileNotFoundError(f"未找到字幕文件 / Subtitle file not found: {tts_input_file}")
    print(f"使用字幕文件 / Using subtitle file: {tts_input_file}")

clear_output()
print(f"已加载文件 / File loaded: {tts_input_file}")

# 读取并预处理文本
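def read_text_file(filepath):
    """读取文本文件并提取纯文本内容 / Read a text file and extract its plain-text content."""
    # utf-8-sig also strips a leading BOM, which would otherwise stick to the first SRT index
    with open(filepath, 'r', encoding='utf-8-sig') as f:
        content = f.read()

    ext = Path(filepath).suffix.lower()

    if ext == '.srt':
        # 移除 SRT 时间戳和序号 / Drop SRT cue numbers and timing lines
        # (dialogue lines consisting only of digits are dropped as well)
        text_lines = []
        for line in content.split('\n'):
            line = line.strip()
            if not line or line.isdigit() or '-->' in line:
                continue
            text_lines.append(line)
        return ' '.join(text_lines)

    elif ext == '.ass':
        # 提取 ASS 对话文本 / Extract the Text field of each Dialogue event
        text_lines = []
        for line in content.split('\n'):
            if line.startswith('Dialogue:'):
                # Text is the 10th field and may itself contain commas, so split at most 9 times
                parts = line.split(',', 9)
                if len(parts) >= 10:
                    # 移除 ASS 样式标签 / Remove override tags such as {\b1} and line-break codes
                    text = re.sub(r'\{[^}]*\}', '', parts[9])
                    text = text.replace('\\N', ' ').replace('\\n', ' ').replace('\\h', ' ')
                    text_lines.append(text.strip())
        return ' '.join(text_lines)

    else:
        # 普通文本文件 / Plain text file
        return content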

text_content = read_text_file(tts_input_file)
print(f"文本长度 / Text length: {len(text_content)} 字符/characters")

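# 文本分段处理 / Split the text into chunks short enough for the TTS model
def split_text(text, max_length=200):
    """将长文本分割成较短的段落 / Split long text into segments of at most max_length characters."""
    # re.split with a capturing group keeps the delimiters: [s0, p0, s1, p1, ..., sN]
    parts = re.split(r'([。！？.!?])', text)
    # re-attach each punctuation mark to its sentence, and keep the trailing
    # text sN that follows the last punctuation mark
    sentences = [parts[i] + parts[i + 1] for i in range(0, len(parts) - 1, 2)]
    sentences.append(parts[-1])

    segments = []
    current_segment = ""
    for sentence in sentences:
        if not sentence.strip():
            continue
        if len(current_segment) + len(sentence) <= max_length:
            current_segment += sentence
            continue
        if current_segment:
            segments.append(current_segment)
        # 如果没有标点分割，按字符数分割 / Hard-split a sentence longer than max_length
        while len(sentence) > max_length:
            segments.append(sentence[:max_length])
            sentence = sentence[max_length:]
        current_segment = sentence

    if current_segment:
        segments.append(current_segment)

    return segments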

text_segments = split_text(text_content)
print(f"分段数量 / Number of segments: {len(text_segments)}")

# 加载 TTS 模型 / Load the Coqui TTS model
print("加载 TTS 模型 / Loading TTS model...")
from TTS.api import TTS

# 根据语言选择模型 / Pick a pretrained model for the selected language
if tts_language == "Chinese":
    tts_model_name = "tts_models/zh-CN/baker/tacotron2-DDC-GST"
elif tts_language == "Japanese":
    tts_model_name = "tts_models/ja/kokoro/tacotron2-DDC"
else:
    # fallback; not reachable from the form options above
    tts_model_name = "tts_models/en/ljspeech/tacotron2-DDC"

# 检查 GPU / Use the GPU if one is available
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"使用设备 / Using device: {device}")

tts = TTS(model_name=tts_model_name, progress_bar=True).to(device)

clear_output()
print(f"TTS 模型加载完成 / TTS model loaded: {tts_model_name}")

# 生成音频 / Synthesize each text segment to its own WAV file
print("生成音频中 / Generating audio...")
audio_segments = []
output_dir = "tts_output"
os.makedirs(output_dir, exist_ok=True)

for i, segment in enumerate(tqdm(text_segments, desc="TTS Progress")):
    if not segment.strip():
        continue

    segment_file = f"{output_dir}/segment_{i:04d}.wav"
    try:
        # the speech rate is applied at export time: Tacotron2 models have no duration
        # control, so tts_to_file's `speed` argument does not affect them (assumption;
        # check against your installed TTS version)
        tts.tts_to_file(text=segment, file_path=segment_file)
        audio_segments.append(segment_file)
    except Exception as e:
        print(f"段落 {i} 生成失败 / Segment {i} failed: {e}")
        continue

print(f"生成了 {len(audio_segments)} 个音频段落 / Generated {len(audio_segments)} audio segments")

# 合并音频 / Concatenate the segment files in order
from pydub import AudioSegment

print("合并音频文件 / Merging audio files...")
combined = AudioSegment.empty()

for seg_file in audio_segments:
    combined += AudioSegment.from_wav(seg_file)

# 导出最终音频 / Export, applying the speech rate with ffmpeg's atempo filter (0.5-2.0)
output_filename = f"{tts_basename}_tts.{tts_output_format}"
speed_params = None if tts_speed == 1.0 else ["-filter:a", f"atempo={tts_speed}"]
if tts_output_format == "mp3":
    combined.export(output_filename, format="mp3", bitrate="192k", parameters=speed_params)
else:
    combined.export(output_filename, format="wav", parameters=speed_params)

print(f"音频生成完成 / Audio generation complete: {output_filename}")

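# 显示音频预览 / Show an inline audio player
print("音频预览 / Audio preview:")
display(Audio(output_filename))

# 清理临时文件 / Delete the per-segment temporary WAV files
import shutil
shutil.rmtree(output_dir, ignore_errors=True)

# 触发下载 / Trigger a browser download of the result
files.download(output_filename)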

print("TTS 转换完成！/ TTS conversion complete!")
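
The SRT branch of `read_text_file` filters line by line, so a dialogue line made up only of digits (for example `46`) is discarded together with the cue numbers. A block-based parse avoids this by reading each cue as an index line, a timing line, then text. The sketch below is a hypothetical alternative (`srt_cue_texts` and `TIMING` are not part of the notebook) and assumes well-formed SRT with blank lines between cues.

```python
import re

# Timing line of an SRT cue, e.g. "00:00:01,000 --> 00:00:02,000"
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}")


def srt_cue_texts(content):
    """Return the dialogue text of each SRT cue, keeping numeric-only dialogue."""
    texts = []
    # cues are separated by blank lines; tolerate a BOM and \r\n line endings
    for block in re.split(r"\r?\n\s*\r?\n", content.lstrip("\ufeff").strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if len(lines) > 1 and lines[0].isdigit() and TIMING.match(lines[1]):
            body = lines[2:]  # index line + timing line, then text
        elif lines and TIMING.match(lines[0]):
            body = lines[1:]  # cue without an index line
        else:
            continue  # not a cue
        if body:
            texts.append(" ".join(body))
    return texts


sample = "1\n00:00:01,000 --> 00:00:02,000\n乃木坂\n\n2\n00:00:02,500 --> 00:00:03,000\n46\n"
print(srt_cue_texts(sample))  # ['乃木坂', '46']
```

Joining the returned list with spaces gives the same kind of string `read_text_file` passes on to `split_text`.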