# YouTube Speaker Diarization (Kaggle, faster-whisper edition)

This version uses **faster-whisper + pyannote** with sentence-level timestamps, and is tuned primarily for stability on Kaggle.


## 1) One-click setup
After this cell finishes, you must **Restart Session**.


In [1]:
import os, sys
from pathlib import Path

REPO_URL = 'https://github.com/Hana19951208/youtube-speaker-diarization.git'
REPO_DIR = '/kaggle/working/youtube-speaker-diarization'
CACHE_DIR = Path('/kaggle/working/cache')
WHEELHOUSE = CACHE_DIR / 'wheelhouse'

!apt-get update -y
!apt-get install -y ffmpeg

%cd /kaggle/working
if os.path.exists(REPO_DIR):
    !rm -rf {REPO_DIR}
!git clone {REPO_URL}
%cd {REPO_DIR}

CACHE_DIR.mkdir(parents=True, exist_ok=True)
WHEELHOUSE.mkdir(parents=True, exist_ok=True)
os.environ['PIP_CACHE_DIR'] = str(CACHE_DIR / 'pip')
os.environ['HF_HOME'] = str(CACHE_DIR / 'huggingface')
os.environ['HF_HUB_CACHE'] = str(CACHE_DIR / 'huggingface' / 'hub')
os.environ['TORCH_HOME'] = str(CACHE_DIR / 'torch')
os.environ['XDG_CACHE_HOME'] = str(CACHE_DIR / 'xdg')

stamp = CACHE_DIR / 'deps_installed_v2_faster_whisper.flag'
if stamp.exists():
    print('✅ Dependency cache flag found; skipping installation.')
    print('If you changed requirements.txt, delete:', stamp)
else:
    print('⏬ First-time install: downloading wheels into the cache, then installing offline...')
    !pip download -q -r requirements.txt -d {WHEELHOUSE}
    !pip download -q torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 -d {WHEELHOUSE}

    # Remove the key packages that tend to conflict
    !pip uninstall -y whisperx faster-whisper pyannote.audio transformers accelerate numpy pandas torch torchvision torchaudio -q

    # Install offline from the cached wheelhouse
    !pip install -q --no-index --find-links {WHEELHOUSE} torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1
    !pip install -q --no-index --find-links {WHEELHOUSE} -r requirements.txt

    stamp.write_text('ok')
    print('✅ Dependencies installed and cached.')

print('✅ Step 1 done. If you have already restarted the session, you can start from Step 2.')


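Step 1 uses a fixed flag name (`deps_installed_v2_faster_whisper.flag`) and asks you to delete it by hand whenever `requirements.txt` changes. A minimal sketch of an alternative, assuming you are willing to edit that cell: derive the flag name from a hash of `requirements.txt` plus the torch pins, so any edit yields a new flag and the install branch runs again. `deps_stamp_path` is a hypothetical helper, not part of the repository.

```python
import hashlib
from pathlib import Path


def deps_stamp_path(cache_dir: Path, requirements: Path, extra_pins: str = '') -> Path:
    """Return a flag-file path keyed on the contents of requirements.txt.

    Hypothetical helper: any change to the requirements file (or to the
    extra pins string) produces a different file name, so a stale flag is
    never matched and dependencies are reinstalled.
    """
    digest = hashlib.sha256()
    digest.update(requirements.read_bytes())
    digest.update(extra_pins.encode('utf-8'))
    return cache_dir / f'deps_installed_{digest.hexdigest()[:12]}.flag'


# Usage inside Step 1, replacing the fixed stamp name:
# stamp = deps_stamp_path(CACHE_DIR, Path('requirements.txt'),
#                         'torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1')
```

Old flag files are simply left behind; they are a few bytes each and harmless.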

## 1.1) Post-restart health check


In [2]:
import torch, transformers, accelerate, numpy, pandas
from faster_whisper import WhisperModel
print('torch:', torch.__version__)
print('transformers:', transformers.__version__)
print('accelerate:', accelerate.__version__)
print('numpy:', numpy.__version__)
print('pandas:', pandas.__version__)
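# Loading the small 'tiny' model (downloaded on first use) exercises faster-whisper and CTranslate2 end to end
_ = WhisperModel('tiny', device='cpu', compute_type='int8')
print('✅ faster-whisper import ok')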


torch: 2.5.1+cu124
transformers: 4.46.3
accelerate: 0.34.2
numpy: 2.0.2
pandas: 2.2.2


✅ faster-whisper import ok
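The health check prints versions but does not compare them against anything. A minimal sketch that flags drift using only the standard library's `importlib.metadata`. The `EXPECTED` values are copied from the output of this run and are an assumption about a known-good combination, not pins stated by the repository; `version_mismatches` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumption: these are the versions observed in the health-check output above.
EXPECTED = {
    'torch': '2.5.1',
    'transformers': '4.46.3',
    'accelerate': '0.34.2',
    'numpy': '2.0.2',
    'pandas': '2.2.2',
}


def version_mismatches(expected):
    """Return {package: (expected, installed_or_None)} for every pin that does not match."""
    bad = {}
    for pkg, want in expected.items():
        try:
            have = version(pkg)
        except PackageNotFoundError:
            have = None
        # Local build tags such as '+cu124' are ignored when comparing.
        if have is None or have.split('+')[0] != want:
            bad[pkg] = (want, have)
    return bad


mismatches = version_mismatches(EXPECTED)
print('✅ versions match' if not mismatches else f'⚠️ version mismatches: {mismatches}')
```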


## 2) Set HF_TOKEN
You can paste the token directly or read it from Kaggle Secrets.


In [8]:
import os

HF_TOKEN = ''  # Paste the token here, or leave empty to read it from Kaggle Secrets
if not HF_TOKEN:
    try:
        from kaggle_secrets import UserSecretsClient
        HF_TOKEN = UserSecretsClient().get_secret('HF_TOKEN') or ''
    except Exception as e:
        print('⚠️ Could not read HF_TOKEN from Kaggle Secrets:', e)
os.environ['HF_TOKEN'] = HF_TOKEN
print('HF_TOKEN set:', bool(HF_TOKEN))


HF_TOKEN set: True
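The cell above checks a pasted value first and then Kaggle Secrets. A sketch of the same lookup as a testable function, with one added fallback that is an assumption on top of the original: an `HF_TOKEN` environment variable (the variable name `huggingface_hub` also reads). `resolve_hf_token` is hypothetical; the secret store is passed in as any zero-argument callable so the function does not depend on Kaggle.

```python
import os


def resolve_hf_token(explicit='', env=None, secret_getter=None):
    """Pick a Hugging Face token: explicit value, then environment, then a secret store.

    secret_getter is any zero-argument callable, e.g. a lambda wrapping
    Kaggle's UserSecretsClient().get_secret('HF_TOKEN'); errors from it are ignored.
    """
    if explicit:
        return explicit
    env = os.environ if env is None else env
    if env.get('HF_TOKEN'):
        return env['HF_TOKEN']
    if secret_getter is not None:
        try:
            return secret_getter() or ''
        except Exception:
            return ''
    return ''


# Usage on Kaggle:
# from kaggle_secrets import UserSecretsClient
# HF_TOKEN = resolve_hf_token(HF_TOKEN, secret_getter=lambda: UserSecretsClient().get_secret('HF_TOKEN'))
```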


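## 3) Upload reference audio
Set `ref_audio_path` to the reference recording of the target speaker; this run uses `biao.mp3` from the cloned repository.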


In [11]:
import os

ref_audio_path = '/kaggle/working/youtube-speaker-diarization/biao.mp3'
assert os.path.exists(ref_audio_path), f'Reference audio not found: {ref_audio_path}'
print('ref_audio_path =', ref_audio_path)


ref_audio_path = /kaggle/working/youtube-speaker-diarization/biao.mp3
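Whisper models operate on 16 kHz mono audio, and ffmpeg is already installed in Step 1. Whether the pipeline resamples the reference clip on its own is not shown here, so treat this as an optional, hypothetical pre-step: `normalize_reference` converts the clip to a 16 kHz mono 16-bit WAV saved next to the original.

```python
import shutil
import subprocess
from pathlib import Path


def to_16k_mono_cmd(src, dst):
    """Build an ffmpeg command that converts src to 16 kHz mono 16-bit PCM WAV."""
    return ['ffmpeg', '-y', '-i', str(src),
            '-ac', '1',           # downmix to mono
            '-ar', '16000',       # resample to 16 kHz
            '-c:a', 'pcm_s16le',  # 16-bit little-endian PCM
            str(dst)]


def normalize_reference(src):
    """Write <stem>_16k.wav next to src and return its path."""
    src = Path(src)
    if not src.exists():
        raise FileNotFoundError(src)
    if shutil.which('ffmpeg') is None:
        raise RuntimeError('ffmpeg not found on PATH')
    dst = src.with_name(src.stem + '_16k.wav')
    subprocess.run(to_16k_mono_cmd(src, dst), check=True, capture_output=True)
    return dst


# Usage: ref_audio_path = str(normalize_reference(ref_audio_path))
```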


## 4) Configure parameters


In [6]:
CONFIG = {
    'youtube_url': 'https://www.youtube.com/watch?v=Zs8jUFaqtCI',
    'playlist_mode': 'single',  # single | all
    'language': 'zh',
    'max_speakers': 3,
    'whisper_model': 'large-v3',
    'do_separation': False,
    'do_vad': False,
    'do_enhance': False,
    'similarity_threshold': 0.25,
    'output_dir': './output',
}
CONFIG


{'youtube_url': 'https://www.youtube.com/watch?v=Zs8jUFaqtCI',
 'playlist_mode': 'single',
 'language': 'zh',
 'max_speakers': 3,
 'whisper_model': 'large-v3',
 'do_separation': False,
 'do_vad': False,
 'do_enhance': False,
 'similarity_threshold': 0.25,
 'output_dir': './output'}
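A typo in `CONFIG` only surfaces deep inside Step 5. A minimal sketch of a fail-fast check; `validate_config` is a hypothetical helper. The notebook does not document what `similarity_threshold` is compared against, so the range check assumes a cosine similarity, which lies in [-1, 1].

```python
from urllib.parse import urlparse


def validate_config(cfg):
    """Return a list of problems found in a CONFIG dict (an empty list means OK)."""
    problems = []
    if cfg.get('playlist_mode', 'single') not in ('single', 'all'):
        problems.append("playlist_mode must be 'single' or 'all'")
    speakers = cfg.get('max_speakers')
    if not isinstance(speakers, int) or isinstance(speakers, bool) or speakers < 1:
        problems.append('max_speakers must be a positive integer')
    thr = cfg.get('similarity_threshold')
    # Assumption: the threshold is compared against a cosine similarity in [-1, 1].
    if not isinstance(thr, (int, float)) or isinstance(thr, bool) or not -1.0 <= thr <= 1.0:
        problems.append('similarity_threshold must be a number in [-1, 1]')
    for key in ('do_separation', 'do_vad', 'do_enhance'):
        if not isinstance(cfg.get(key, False), bool):
            problems.append(f'{key} must be True or False')
    host = urlparse(str(cfg.get('youtube_url', ''))).netloc.lower()
    if not (host in ('youtube.com', 'youtu.be') or host.endswith('.youtube.com')):
        problems.append('youtube_url does not look like a YouTube URL')
    return problems


# Usage: assert not validate_config(CONFIG), validate_config(CONFIG)
```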

## 4.5) Sync the latest repository code before Step 5


In [None]:
%cd /kaggle/working/youtube-speaker-diarization
!git fetch origin
!git pull --rebase origin master || git pull origin master
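# After pulling, install any missing dependencies (new code may need packages that are not installed yet)
!pip install -q -r requirements.txt
print('✅ Synced to the latest master and installed missing dependencies')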
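The sync cell pulls `master` but does not report which commit ended up checked out. A small sketch, assuming `git` is on PATH: `current_commit` (a hypothetical helper) returns the short HEAD hash, so the results of Step 5 can be tied to a specific code version.

```python
import subprocess


def current_commit(repo_dir):
    """Return the short HEAD hash of repo_dir, or None if git is missing or it is not a repo."""
    try:
        out = subprocess.run(
            ['git', '-C', str(repo_dir), 'rev-parse', '--short', 'HEAD'],
            check=True, capture_output=True, text=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return out.stdout.strip() or None


print('HEAD =', current_commit('/kaggle/working/youtube-speaker-diarization'))
```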


## 5) Run the pipeline


In [12]:
from pipeline import YouTubeSpeakerPipeline

pipeline = YouTubeSpeakerPipeline(
    hf_token=HF_TOKEN,
    output_dir=CONFIG['output_dir'],
    whisper_model=CONFIG['whisper_model'],
    max_speakers=CONFIG['max_speakers'],
    do_separation=CONFIG['do_separation'],
    do_vad=CONFIG['do_vad'],
    do_enhance=CONFIG['do_enhance'],
    similarity_threshold=CONFIG['similarity_threshold'],
    playlist_mode=CONFIG.get('playlist_mode', 'single'),
)

results = pipeline.process(
    youtube_url=CONFIG['youtube_url'],
    ref_audio_path=ref_audio_path,
    language=CONFIG['language'],
)
print('✅ Done')


INFO:pipeline:Pipeline initialized
INFO:audio_utils:FFmpeg found
INFO:pipeline:Step 1: Downloading YouTube audio


[youtube] Extracting URL: https://www.youtube.com/watch?v=Zs8jUFaqtCI
[youtube] Zs8jUFaqtCI: Downloading webpage




[youtube] Zs8jUFaqtCI: Downloading android vr player API JSON
[info] Zs8jUFaqtCI: Downloading 1 format(s): 251
[download] Destination: ./output/《马大帅》EP 01 ｜ 《漫长的季节》爆笑喜剧版 （赵本山，范伟）_Zs8jUFaqtCI.webm
[download] 100% of   26.64MiB in 00:00:02 at 13.14MiB/s    
[ExtractAudio] Destination: ./output/《马大帅》EP 01 ｜ 《漫长的季节》爆笑喜剧版 （赵本山，范伟）_Zs8jUFaqtCI.wav
Deleting original file ./output/《马大帅》EP 01 ｜ 《漫长的季节》爆笑喜剧版 （赵本山，范伟）_Zs8jUFaqtCI.webm (pass -k to keep)


ERROR:audio_utils:Failed to download YouTube audio: yt-dlp finished but no WAV file was found in output_dir
ERROR:pipeline:Failed to download YouTube audio: yt-dlp finished but no WAV file was found in output_dir


RuntimeError: yt-dlp finished but no WAV file was found in output_dir
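In the log above, yt-dlp's `ExtractAudio` step names a `.wav` destination in `./output` and then deletes the original `.webm`, which suggests the conversion succeeded and the WAV is on disk; the failure then lies in how `audio_utils` locates it. That lookup code is not shown here, so the exact cause is unknown. Plausible culprits are a title-based lookup tripped up by the title's full-width characters and spaces, or `./output` being resolved against a different working directory. A hypothetical diagnostic that finds WAV files by video id instead of by title:

```python
from pathlib import Path
from urllib.parse import parse_qs, urlparse


def youtube_video_id(url):
    """Extract the video id from a watch?v=... or youtu.be/... URL (None if absent)."""
    parsed = urlparse(url)
    if parsed.netloc.lower() == 'youtu.be':
        return parsed.path.lstrip('/') or None
    return parse_qs(parsed.query).get('v', [None])[0]


def find_wav_for_video(output_dir, url):
    """List WAV files in output_dir whose name contains the video id.

    The glob pattern is the fixed '*.wav', so characters from the video
    title never end up inside the pattern itself.
    """
    vid = youtube_video_id(url)
    wavs = sorted(Path(output_dir).glob('*.wav'))
    return [p for p in wavs if vid and vid in p.name]


# Usage: print(find_wav_for_video(CONFIG['output_dir'], CONFIG['youtube_url']))
```

If this finds the file, the download itself worked and the fix belongs in the lookup, for example keying it on the video id.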

## 6) Inspect the outputs


In [None]:
import glob, os
files = glob.glob(os.path.join(CONFIG['output_dir'], '*.srt')) + glob.glob(os.path.join(CONFIG['output_dir'], '*.json'))
for f in files:
    print('-', f)
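Step 6 only lists output file names. A minimal sketch for peeking at an `.srt` result: `parse_srt` and `preview_srt` are hypothetical helpers that read standard SRT cues (index line, `HH:MM:SS,mmm --> HH:MM:SS,mmm`, then text) and make no assumption about how the pipeline writes speaker labels into the cue text.

```python
import re
from pathlib import Path

TIME = r'(\d{2}):(\d{2}):(\d{2}),(\d{3})'
CUE_TIME = re.compile(TIME + r'\s*-->\s*' + TIME)


def _seconds(h, m, s, ms):
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(text):
    """Parse SRT text into (start_s, end_s, text) tuples."""
    cues = []
    for block in re.split(r'\n\s*\n', text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            m = CUE_TIME.search(line)
            if m:
                g = m.groups()
                body = '\n'.join(lines[i + 1:]).strip()
                cues.append((_seconds(*g[:4]), _seconds(*g[4:]), body))
                break
    return cues


def preview_srt(path, n=5):
    """Print the first n cues of an SRT file."""
    text = Path(path).read_text(encoding='utf-8-sig')  # tolerate a UTF-8 BOM
    for start, end, body in parse_srt(text)[:n]:
        print(f'[{start:8.2f} -> {end:8.2f}] {body}')


# Usage:
# for f in files:
#     if f.endswith('.srt'):
#         preview_srt(f)
```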
