<a href="https://colab.research.google.com/github/Crossme0809/whisper-tutorials/blob/main/PyDub_Spilit_Audio.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Transcribing Audio Locally with OpenAI's Open-Source Whisper Model**

In [None]:
%pip install pydub

Collecting pydub
  Downloading pydub-0.25.1-py2.py3-none-any.whl (32 kB)
Installing collected packages: pydub
Successfully installed pydub-0.25.1


* First, install openai-whisper and its dependencies.

In [None]:
%pip install openai-whisper
%pip install setuptools-rust

Collecting openai-whisper
  Downloading openai-whisper-20230314.tar.gz (792 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting tiktoken==0.3.1 (from openai-whisper)
  Downloading tiktoken-0.3.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.7 MB)
Collecting ffmpeg-python==0.2.0 (from openai-whisper)
  Downloading ffmpeg_python-0.2.0-py3-none-any.whl (25 kB)
Building wheels for collected packages: openai-whisper
  Building wheel for openai-whisper (pyproject.toml) ...

* If you also need translation, install the `openai` package.

In [None]:
!pip install openai --upgrade -q

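# **Two Ways to Split Large Audio Files**
* **Split the audio into chunks at silent passages, then transcribe each chunk with the local Whisper model**
* **Split the audio into chunks at fixed time intervals, then transcribe each chunk with the local Whisper model**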
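The fixed-interval strategy comes down to arithmetic on millisecond offsets, because pydub slices an `AudioSegment` by milliseconds. The sketch below reproduces only that arithmetic on plain integers, so it runs without any audio file. The helper name `fixed_interval_bounds` is made up for this illustration and is not part of pydub or Whisper.

```python
def fixed_interval_bounds(total_ms, minutes):
    """Return (start_ms, end_ms) pairs that cover [0, total_ms) in fixed steps."""
    chunk_length_ms = int(1000 * 60 * minutes)
    if chunk_length_ms <= 0:
        raise ValueError("minutes must give a positive chunk length")
    return [
        (start, min(start + chunk_length_ms, total_ms))
        for start in range(0, total_ms, chunk_length_ms)
    ]


# A hypothetical 70-second recording cut into 30-second pieces.
print(fixed_interval_bounds(total_ms=70_000, minutes=0.5))
# [(0, 30000), (30000, 60000), (60000, 70000)]
```

The last pair is shorter than the others whenever the recording length is not a multiple of the interval. Fixed boundaries can also land in the middle of a word, which is the main reason to consider the silence-based strategy.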


In [None]:
import os
import whisper
from pydub import AudioSegment
from pydub.silence import split_on_silence

model = whisper.load_model("large")

# Transcribe an audio file with the local Whisper model
def transcribe_audio_whisper(path):
    result = model.transcribe(path)
    text = result['text']
    return text

# Split the audio into chunks at silent passages and transcribe each chunk with Whisper
def get_large_audio_transcription_on_silence_whisper(path):
    sound = AudioSegment.from_file(path)
    chunks = split_on_silence(sound, min_silence_len=500, silence_thresh=sound.dBFS-14, keep_silence=500)

    folder_name = "audio-chunks"
    if not os.path.isdir(folder_name):
        os.mkdir(folder_name)

    whole_text = ""
    for i, audio_chunk in enumerate(chunks, start=1):
        chunk_filename = os.path.join(folder_name, f"chunk{i}.mp3")
        audio_chunk.export(chunk_filename, format="mp3")

        try:
            text = transcribe_audio_whisper(chunk_filename)
        except Exception as e:
            print("Error:", str(e))
        else:
            text = f"{text.capitalize()}. "
            print(chunk_filename, ":", text)
            whole_text += text

    return whole_text

# Split the audio into chunks at fixed time intervals and transcribe each chunk with Whisper
def get_large_audio_transcription_fixed_interval_whisper(path, minutes=5):
    sound = AudioSegment.from_file(path)
    chunk_length_ms = int(1000 * 60 * minutes)
    chunks = [sound[i:i + chunk_length_ms] for i in range(0, len(sound), chunk_length_ms)]

    folder_name = "audio-fixed-chunks"
    if not os.path.isdir(folder_name):
        os.mkdir(folder_name)

    whole_text = ""
    for i, audio_chunk in enumerate(chunks, start=1):
        chunk_filename = os.path.join(folder_name, f"chunk{i}.mp3")
        audio_chunk.export(chunk_filename, format="mp3")

        try:
            text = transcribe_audio_whisper(chunk_filename)
        except Exception as e:
            print("Error:", str(e))
        else:
            text = f"{text.capitalize()}. "
            print(chunk_filename, ":", text)
            whole_text += text

    return whole_text

path="generative_ai_topics_clip.mp3"
print("\nFull text:", get_large_audio_transcription_on_silence_whisper(path))
print("="*50)
print("\nFull text:", get_large_audio_transcription_fixed_interval_whisper(path, minutes=1/6))

100%|█████████████████████████████████████| 2.87G/2.87G [01:01<00:00, 50.2MiB/s]


audio-chunks/chunk1.mp3 : 欢迎来到 onboard真实的一线经验走新的投资思考我是 monica我是高宁我们一起聊聊软件如何改变世界. 
audio-chunks/chunk2.mp3 : 大家好,欢迎来到 onboard,我是 monica. 
audio-chunks/chunk3.mp3 : 自从openai发布的chatgbt掀起了席卷世界的ai热潮. 
audio-chunks/chunk4.mp3 : 不到三个月就积累了超过一亿的越货用户超过1300万的日货用户. 
audio-chunks/chunk5.mp3 : 真的是展現了ai讓人驚嘆的能力也讓很多人直呼這就是下一個互聯網的未來. 
audio-chunks/chunk6.mp3 : 有不少观众都说,希望我们再做一期ai的讨论。. 
audio-chunks/chunk7.mp3 : 于是这次硬核讨论就来了. 
audio-chunks/chunk8.mp3 : 这次我们请来了google brain的研究员雪芝. 
audio-chunks/chunk9.mp3 : 他是google大语言模型pumpathway language model的作者之一. 
audio-chunks/chunk10.mp3 : 要知道这个模型的参数量是gpt-3的三倍还多. 
audio-chunks/chunk11.mp3 : 另外還有兩位ai產品大牛. 
audio-chunks/chunk12.mp3 : 一位来自著名的stable diffusion背后的商业公司surbrity ai. 
audio-chunks/chunk13.mp3 : 另一位来自某硅谷合计大厂. 
audio-chunks/chunk14.mp3 : 也曾在吴恩达教授的landing ai中担任产品负责人。. 
audio-chunks/chunk15.mp3 : 此外,莫妮凱還邀請到一位一直關注ai的投資人朋友bill當作我的特邀共同主持嘉賓。. 
audio-chunks/chunk16.mp3 : 我们主要讨论几个话题. 
audio-chunks/chunk17.mp3 : 一方面从研究的视角. 
audio-chunks/chunk18.mp3 : 最前沿的研究者在關注什麼?. 
audio-c
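The silence-based split above passes `silence_thresh=sound.dBFS-14`, meaning anything more than 14 dB quieter than the clip's average level counts as silence. The sketch below works through that arithmetic with made-up sample values, assuming pydub's usual definition of dBFS as 20·log10(RMS / full scale). The helper names `dbfs` and `db_to_amplitude_ratio` are hypothetical and do not come from pydub.

```python
import math


def dbfs(rms, max_amplitude):
    """Level in decibels relative to full scale (0 dBFS is the loudest possible)."""
    return 20 * math.log10(rms / max_amplitude)


def db_to_amplitude_ratio(db):
    """Convert a decibel difference into a linear amplitude ratio."""
    return 10 ** (db / 20)


# Hypothetical clip: 16-bit audio (full scale 32768) with an RMS of 3276.8.
average_level = dbfs(3276.8, 32768)    # about -20 dBFS
silence_thresh = average_level - 14    # what `sound.dBFS - 14` would compute here

print(round(average_level, 2), round(silence_thresh, 2))
print(round(db_to_amplitude_ratio(-14), 3))
```

A 14 dB drop corresponds to roughly one fifth of the average amplitude. Because the threshold is relative, a quiet recording and a loud one are treated alike. The other two arguments, `min_silence_len=500` and `keep_silence=500`, are in milliseconds.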

In [None]:

# Extract the audio track from an MP4 file
video_path = "01-beginner_python_developer.mp4"
audio_path = "01-beginner_python_developer.mp3"

video = AudioSegment.from_file(video_path, format="mp4")
audio = video.set_channels(1)  # convert to mono
audio.export(audio_path, format="mp3")


<_io.BufferedRandom name='01-beginner_python_developer.mp3'>
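Reading an MP4 and exporting an MP3 with pydub depends on an external converter (ffmpeg, or avconv from libav) being installed, and Whisper also calls ffmpeg to load audio. A small pre-flight check like the one below can save a confusing error later. The function name `find_audio_backend` is hypothetical; it relies only on the standard library.

```python
import shutil


def find_audio_backend(candidates=("ffmpeg", "avconv")):
    """Return the path of the first converter found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


backend = find_audio_backend()
if backend is None:
    print("No ffmpeg/avconv found on PATH; MP4/MP3 decoding will likely fail.")
else:
    print("Using converter:", backend)
```

Colab images normally ship with ffmpeg, so the check matters mostly when the notebook is run on a local machine.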

In [None]:
import os
os.environ['OPENAI_API_KEY'] = "YOUR_OPENAI_API_KEY"  # placeholder: never commit a real key to a notebook
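A key typed into a cell is saved with the notebook and is easy to publish by accident. One alternative, sketched here, is to read the key from the environment and optionally prompt for it without echoing. The helper `load_openai_key` and its parameters are assumptions made for this illustration, not part of the `openai` package.

```python
import os
from getpass import getpass


def load_openai_key(env=os.environ, prompt_if_missing=False):
    """Fetch the key from the environment, optionally prompting when it is absent."""
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key and prompt_if_missing:
        # getpass hides the input, so the key never appears in the cell output.
        key = getpass("OpenAI API key: ").strip()
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return key


# Typical use in a notebook (left commented out so this cell has no side effects):
# openai.api_key = load_openai_key(prompt_if_missing=True)
```

If a real key has ever been committed to a shared notebook, revoke it and create a new one.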

In [None]:
import openai
import whisper
from pydub import AudioSegment
from pydub.silence import split_on_silence

model = whisper.load_model("large")
openai.api_key = os.getenv("OPENAI_API_KEY")

# Transcribe an audio file with the local Whisper model
def transcribe_audio_whisper(path):
    result = model.transcribe(path)
    text = result['text']
    return text

# Split the audio into chunks at silent passages and transcribe each chunk with Whisper
def get_large_audio_transcription_on_silence_whisper(path, export_chunk_len):
    sound = AudioSegment.from_file(path)
    chunks = split_on_silence(sound, min_silence_len=500, silence_thresh=sound.dBFS-14, keep_silence=500)

    folder_name = "audio-chunks"
    if not os.path.isdir(folder_name):
        os.mkdir(folder_name)

    # Regroup the chunks so that each exported piece is at least export_chunk_len ms long
    # (the final piece may still be shorter).
    output_chunks = [chunks[0]]
    for chunk in chunks[1:]:
        if len(output_chunks[-1]) < export_chunk_len:
            output_chunks[-1] += chunk
        else:
            # The last output chunk has reached the target length, so start a new one
            output_chunks.append(chunk)

    whole_text = ""
    for i, audio_chunk in enumerate(output_chunks, start=1):
        chunk_filename = os.path.join(folder_name, f"chunk{i}.mp3")
        audio_chunk.export(chunk_filename, format="mp3")

        try:
            text = transcribe_audio_whisper(chunk_filename)
        except Exception as e:
            print("Error:", str(e))
        else:
            text = f"{text.capitalize()}. "
            print(chunk_filename, ":", text)
            whole_text += text

    return whole_text

path="01-beginner_python_developer.mp3"
export_chunk_len = 90 * 1000

audio_text = get_large_audio_transcription_on_silence_whisper(path, export_chunk_len)
print("\nAudio Full text:", audio_text)

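# translate_text_to_chinese is defined in the translation cell further down; it must exist before this line runs.
chinese_audio_translation = translate_text_to_chinese(audio_text)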
print("\nAudio Translate text:", chinese_audio_translation)

100%|█████████████████████████████████████| 2.87G/2.87G [00:59<00:00, 52.2MiB/s]


audio-chunks/chunk1.mp3 :  hello, and welcome to the world's best python bootcamp. my name is angela, i'm a senior developer and the lead instructor at the appbury, london's highest rated programming bootcamp. to date, i've taught over half a million students in person and online, and i'm so excited to be your instructor on this course. as a student on this course, you're going to get access to over 56 hours of hd video content which contains step-by-step tutorials, interactive coding exercises, quizzes, and more. the course is structured around the 100 days of code challenge, so you can look forward to 100 days of lovingly crafted content that is going to cover every aspect of python programming, from web development to data science. it's the only course you need to become a professional python developer. every day on the course, you're going to use what you've learnt to build a new project. you'll build a bot that texts you in the morning if it will rain that day, so you never forget
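The regrouping loop in the cell above is easier to follow on plain numbers. The sketch below applies the same rule to a list of chunk durations in milliseconds instead of `AudioSegment` objects. The function name `regroup` and the sample durations are invented for this illustration.

```python
def regroup(durations_ms, min_len_ms):
    """Merge consecutive chunk durations until each group reaches min_len_ms."""
    if not durations_ms:
        return []
    groups = [durations_ms[0]]
    for duration in durations_ms[1:]:
        if groups[-1] < min_len_ms:
            # The current group is still too short, so keep extending it.
            groups[-1] += duration
        else:
            # The current group is long enough, so start a new one.
            groups.append(duration)
    return groups


print(regroup([40_000, 30_000, 35_000, 20_000, 10_000], 90_000))
# [105000, 30000]
```

Two details show up in the result: a group can overshoot the target because chunks are never cut, and the final group can fall short because nothing is left to merge into it. Unlike the notebook cell, this version also handles an empty list, which is what `split_on_silence` returns when no chunk is found.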

In [None]:
audio_text = (
    "hello, and welcome to the world's best python bootcamp. my name is angela, i'm a senior developer and the lead instructor at the appbury, london's highest rated programming bootcamp. to date, i've taught over half a million students in person and online, and i'm so excited to be your instructor on this course. as a student on this course, you're going to get access to over 56 hours of hd video content which contains step-by-step tutorials, interactive coding exercises, quizzes, and more. the course is structured around the 100 days of code challenge, so you can look forward to 100 days of lovingly crafted content that is going to cover every aspect of python programming, from web development to data science. it's the only course you need to become a professional python developer. every day on the course, you're going to use what you've learnt to build a new project. you'll build a bot that texts you in the morning if it will rain that day, so you never forget your umbrella again. you'll build classic arcade games like snake and pong to impress your friends by challenging them to a game that you built. you'll learn to make sense of complex data and create beautiful visualizations to impress your colleagues at work. you'll create a program that automatically sends happy birthday emails to your friends and family..  never forget mom's birthday again. you'll work on projects that clone real-world startups. cheap flight club? check. build your own blog? check. twitter bot? check. and there are so many more projects waiting to be discovered by you. 100 projects in total. so if you're somebody who wants to get a job as a python developer, then this is perfect for building up your portfolio to show off at your next interview. now this course assumes absolutely no prior programming experience. "
    "so if you're somebody who's never coded before, i'll be with you every step of the way as i take you from programming fundamentals through to more intermediate and advanced programming concepts. you're going to learn python from scratch. now if you're an advanced developer on the other hand, then take a look at the curriculum and start at the level that suits you best. from beginner to professional, every level is covered in the course. got school? working a full-time job? have to look after kids? i know you're busy. i've timed each day of the course to take less than two hours to complete so you can fit the course around your life. this course has exactly the same curriculum as our in-person programming bootcamp. so instead of spending thousands of dollars and taking time off work, you'll get access to exactly the same curriculum with years of design and testing behind it to ensure that you don't just know what to do, but know how to use it..  but also why you're doing it. now don't just take my word for it. check out what my past students had to say about my courses. so what are you still waiting for? find out why over half a million students have rated my course five stars and see what you can do by mastering python.. "
)
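The transcript above is entirely lowercase and contains doubled periods such as `family..`. Both effects trace back to the post-processing line `f"{text.capitalize()}. "`: `str.capitalize()` lowercases everything after the first character, and the period is appended even when the chunk already ends with one. A gentler clean-up might look like the sketch below; `tidy_chunk_text` is a hypothetical helper and not something the notebook defines.

```python
def tidy_chunk_text(text):
    """Trim, uppercase only the first character, and end with a single period."""
    text = text.strip()
    if not text:
        return ""
    text = text[0].upper() + text[1:]
    if text[-1] not in ".!?。！？":
        text += "."
    return text + " "


# Whisper output often begins with a space, as in this made-up example.
raw = " Hello, my name is Angela. I teach Python."
print(repr(raw.capitalize()))       # ' hello, my name is angela. i teach python.'
print(repr(tidy_chunk_text(raw)))   # 'Hello, my name is Angela. I teach Python. '
```

Because the raw text starts with a space, `capitalize()` has no letter to uppercase in the first position and simply lowercases the rest, which matches the all-lowercase transcript.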

In [None]:
import openai
openai.api_key = os.getenv("OPENAI_API_KEY")

In [None]:
from concurrent.futures import ThreadPoolExecutor

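# The long text to translate
text = audio_text

# Split the text into smaller segments
segments = []
segment_size = 800  # maximum segment length in characters (the slice below counts characters, not tokens)

for i in range(0, len(text), segment_size):
    segment = text[i:i+segment_size]
    segments.append(segment)

# Use a thread pool to send requests concurrently
executor = ThreadPoolExecutor(max_workers=5)  # adjust the number of concurrent requests as needed

# Translation of the previous segment
previous_translation = ""
def translate_text_to_chinese(text):
    # Declared global so that each call can update it.
    # Caveat: with max_workers > 1 the calls overlap, so this value is not guaranteed to be
    # the translation of the segment immediately before; use max_workers=1 for strict ordering.
    global previous_translation

    # Pass the previous translation along as context.
    # The prompt reads: "Using the context below, translate the following English text into Chinese".
    # openai.Completion is the legacy interface of openai-python releases before 1.0.
    translation = openai.Completion.create(
        engine="text-davinci-003",
        prompt=f"根据下面的上下文，将以下英文文本翻译成中文: Context: '{previous_translation}' '{text}'",
        max_tokens=1000,
    )

    translated_text = translation.choices[0].text.strip()
    previous_translation = translated_text

    return translated_text

# Submit the segments to the pool (executor.map returns results in segment order)
translated_segments = list(executor.map(translate_text_to_chinese, segments))

# Join the translated segments
translated_text = ' '.join(translated_segments)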

print(translated_text)

i'm going to be online to help you with any questions you may have.

欢迎来到世界上最棒的Python训练营。我叫安吉拉，是Appbury（伦敦最受欢迎的编程训练营）的高级开发人员和首席讲师。迄今为止，我已经在现场和网上向超过50万名学生授课，我很高兴成为你们的讲师。 作为这门课程的学生，你将获得超过56小时的HD视频内容，其中包含逐步教程、交互式编码练习、测验等。该课程围绕“100天编码挑战”进行组织，因此你可以期待100天精心打造的内容，涵盖了从Web开发到数据科学的Python编程的各个方面。 这是你成为专业Python开发人员所需的唯一课程。 每天我都会在线来回答你可能有的任何问题。 在本课程上，你要利用所学的知识构建一个新项目，例如你将会构建一个机器人，它会在早晨给你发短信提醒你今天会不会下雨，这样你就再也不会忘记带伞了。你还可以构建经典的街机游戏，比如贪吃蛇和乒乓，挑战朋友来玩你构建的游戏秀出自己的实力。你会学习如何去处理复杂的数据，并且创建出美轮美奂的可视化图表去打动职场上的同事。你将会创建一个自动发送给朋友和家人“生日快乐”邮件的程序，你也永远不用再担心忘记妈妈的生日了。你还将会开发像真实世界上的创业公司那样的项目，比如廉价飞行俱乐部？可以；搭建你自己的博客？可以；推特机器人？可以；还有许多等待被你去发现的项目, 一共100个。如果你想要体验构建未知的东西的乐趣，快来加入我们吧！ so you can easily fit it around your other commitments.

如果你是想要找一份python开发者的工作的人，那么这个课程就是为了帮助你构建你的作品集来给下次面试展示。这个课程完全不需要你有编程的经验，所以如果你从来没有编程过，我会陪伴你，从编程基础一直带你进阶到中级和高级编程概念。你会从头学习python。如果你是一个高级开发者，那么请查看课程大纲，然后根据自己的水平开始学习。从初学者到专业人士，这个课程涵盖了所有水平的知识。在读书？全职工作？要照顾孩子？我知道你很忙。我已经安排每一天的课程都不要超过两小时，所以你可以很容易地融入到其他的承诺中。 为了完成，你可以根据你的生活安排课程。这门课程与我们的面对面编程bootcamp有着完全相同的课程。因此，你
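The translation cell cuts the text every 800 characters, which can split a sentence in half and may contribute to the stray English fragments in the output above. One alternative is to pack whole sentences into each segment, as in the sketch below. The function `split_into_segments` is hypothetical, and the punctuation-based sentence rule is a simplifying assumption that suits English transcripts like this one.

```python
import re


def split_into_segments(text, max_chars=800):
    """Greedily pack whole sentences into segments of at most max_chars characters."""
    # Split on whitespace that follows a sentence-ending mark.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    segments, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            segments.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        segments.append(current)
    return segments


sample = "One two. Three four! Five six? Seven."
print(split_into_segments(sample, max_chars=20))
# ['One two. Three four!', 'Five six? Seven.']
```

A single sentence longer than `max_chars` is kept whole as its own oversize segment instead of being cut. The limit is still measured in characters; a token-based limit would need a tokenizer such as `tiktoken`.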