Minor improvement suggestions #21

Open
LyWangPX opened this issue Mar 20, 2023 · 3 comments

Comments

@LyWangPX

LyWangPX commented Mar 20, 2023

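I have worked on AI subtitling before. I can't share the code, but a few directions are worth borrowing:

  • Use whisperX instead of plain whisper: take whisper's raw subtitle output as the reference text, then align it against the audio to get accurate timestamps.
  • Split the video file more intelligently: detect only the parts that contain speech, and feed those to whisperX with different settings.
  • Combine the results from different whisper and wav2vec2 models into a single export.

With this, more than 80% of the final timestamps are accurate enough to need no manual correction.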
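
The comment does not say how the results from different models are combined, so the following is only a minimal sketch of one possible approach: take the per-line median of the start and end times across runs. The function name `combine_timings` and the input layout (one list of `(start, end)` tuples per model, all covering the same subtitle lines in the same order) are assumptions made for illustration, not part of whisper or whisperX.

```python
from statistics import median


def combine_timings(runs):
    """Merge per-line (start, end) timings from several models via the median.

    `runs` is a list of runs (hypothetical layout). Each run is a list of
    (start, end) tuples in seconds, one tuple per subtitle line, and every
    run must cover the same lines in the same order.
    """
    if not runs:
        return []
    n_lines = len(runs[0])
    if any(len(run) != n_lines for run in runs):
        raise ValueError("all runs must contain the same number of subtitle lines")

    combined = []
    for timings in zip(*runs):  # timings of one subtitle line across all runs
        start = median(t[0] for t in timings)
        end = median(t[1] for t in timings)
        # Guard against an end time that falls before the start time.
        combined.append((start, max(start, end)))
    return combined


runs = [
    [(0.0, 1.0), (2.0, 3.0)],  # model A
    [(0.2, 1.2), (2.1, 3.5)],  # model B
    [(0.1, 5.0), (2.2, 3.1)],  # model C, with one outlier end time
]
print(combine_timings(runs))  # [(0.1, 1.2), (2.1, 3.1)]
```

The median is used here because a single model producing a wildly wrong boundary (the 5.0 above) does not drag the combined value with it, which a plain average would.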

@Ayanaminn
Owner

Thanks, I'll look into it when I have time.

@echoIIImk2

echoIIImk2 commented Mar 27, 2023

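You could also take a look at whisper webui:
https://gitlab.com/aadnk/whisper-webui
It calls silero-vad to split the audio into chunks first and then feeds those chunks to whisper, which almost completely fixes the bug where a sentence gets repeated over and over for no apparent reason. This is especially useful for less widely spoken languages; the example discussed in openai/whisper#397 is also Japanese.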
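
As a rough illustration of the chunking step, here is a minimal sketch that groups speech spans (as a VAD such as silero-vad might report them, converted to seconds) into chunks to transcribe one at a time. It is not whisper webui's actual implementation: the function name `group_speech_segments` and the `max_gap` and `max_chunk` defaults are assumptions chosen for the example.

```python
def group_speech_segments(segments, max_gap=0.5, max_chunk=30.0):
    """Group (start, end) speech spans in seconds into transcription chunks.

    Neighbouring spans are merged when the silence between them is at most
    `max_gap` seconds and the merged chunk stays within `max_chunk` seconds.
    Both thresholds are illustrative defaults, not values from any library.
    """
    chunks = []
    for start, end in sorted(segments):
        if chunks:
            prev_start, prev_end = chunks[-1]
            short_gap = start - prev_end <= max_gap
            fits = end - prev_start <= max_chunk
            if short_gap and fits:
                # Extend the current chunk instead of starting a new one.
                chunks[-1] = (prev_start, max(prev_end, end))
                continue
        chunks.append((start, end))
    return chunks


speech = [(0.0, 1.0), (1.2, 2.0), (5.0, 6.0)]
print(group_speech_segments(speech))  # [(0.0, 2.0), (5.0, 6.0)]
```

Each resulting chunk would then be cut from the audio and transcribed on its own, with the chunk's start time added back to the timestamps the model returns. Long silences never reach the model this way, which is where the repeated-sentence output tends to appear.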

@Nekofoxmiu

Using whisperX in Colab seems to inevitably require restarting the runtime:

! pip install torch==2.0.0+cu118 torchvision==0.15.1+cu118 torchaudio==2.0.1 torchtext==0.15.1 --index-url https://download.pytorch.org/whl/cu118

! pip install git+https://github.com/m-bain/whisperx.git

This appears to be because the setup reinstalls PyTorch.
