<a href="https://colab.research.google.com/github/ComputerWizard2/whisper_base/blob/main/whisper_base.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## 环境配置

In [1]:
pip install kagglehub==0.3.13 datasets==3.2.0 librosa==0.10.2.post1 peft==0.14.0 torchaudio==2.6.0 ffmpeg-python==0.2.0 transformers==4.51.3

Collecting datasets==3.2.0
  Downloading datasets-3.2.0-py3-none-any.whl.metadata (20 kB)
Collecting librosa==0.10.2.post1
  Downloading librosa-0.10.2.post1-py3-none-any.whl.metadata (8.6 kB)
Collecting peft==0.14.0
  Downloading peft-0.14.0-py3-none-any.whl.metadata (13 kB)
Collecting torchaudio==2.6.0
  Downloading torchaudio-2.6.0-cp312-cp312-manylinux1_x86_64.whl.metadata (6.6 kB)
Collecting ffmpeg-python==0.2.0
  Downloading ffmpeg_python-0.2.0-py3-none-any.whl.metadata (1.7 kB)
Collecting transformers==4.51.3
  Downloading transformers-4.51.3-py3-none-any.whl.metadata (38 kB)
Collecting fsspec<=2024.9.0,>=2023.1.0 (from fsspec[http]<=2024.9.0,>=2023.1.0->datasets==3.2.0)
  Downloading fsspec-2024.9.0-py3-none-any.whl.metadata (11 kB)
Collecting torch>=1.13.0 (from peft==0.14.0)
  Downloading torch-2.6.0-cp312-cp312-manylinux1_x86_64.whl.metadata (28 kB)
Collecting tokenizers<0.22,>=0.21 (from transformers==4.51.3)
  Downloading tokenizers-0.21.4-cp39-abi3-manylinux_2_17_x86_64.m

## 下载模型

In [2]:
import kagglehub
path = kagglehub.dataset_download("tenffe/common-voice-zh-cn")
print("Path to dataset files:", path)

Downloading from https://www.kaggle.com/api/v1/datasets/download/tenffe/common-voice-zh-cn?dataset_version_number=1...


100%|██████████| 1.84G/1.84G [00:29<00:00, 67.2MB/s]

Extracting files...





Path to dataset files: /root/.cache/kagglehub/datasets/tenffe/common-voice-zh-cn/versions/1


## 下载测试数据

In [3]:
import kagglehub
path = kagglehub.dataset_download("shawnchile/voice-female-yuri")
print("Path to dataset files:", path)

Downloading from https://www.kaggle.com/api/v1/datasets/download/shawnchile/voice-female-yuri?dataset_version_number=1...


100%|██████████| 21.3M/21.3M [00:00<00:00, 60.6MB/s]

Extracting files...





Path to dataset files: /root/.cache/kagglehub/datasets/shawnchile/voice-female-yuri/versions/1


## 下载预训练模型


```
cd pretrained_model
wget https://dtse-mirrors.obs.cn-north-4.myhuaweicloud.com/case/0002/whisper_base.tar.gz
tar -zxvf whisper_base.tar.gz


```



 ## 处理数据

In [1]:
import os
from datasets import load_from_disk, Audio
from transformers import WhisperFeatureExtractor, WhisperTokenizer, WhisperProcessor
# 配置参数
model_name_or_path = "/content/pretrained_model/whisper_base"
language = "chinese"
language_abbr = "zh"
task = "transcribe"
dataset_name = "/content/data/1/common_voice_zh_CN"
prepared_dataset_path = "/content/pretrained_model/prepared_common_voice"
# 加载数据集
common_voice = load_from_disk(dataset_name)
# 选择前1000个样本
common_voice['train'] = common_voice['train'].select(range(100))
if 'test' in common_voice:
    common_voice['test'] = common_voice['test'].select(range(100))
if 'validation' in common_voice:
    common_voice['validation'] = common_voice['validation'].select(range(100))
# 移除不需要的列
common_voice = common_voice.remove_columns(
    ["accent", "age", "client_id", "down_votes", "gender", "locale", "path", "segment", "up_votes"]
)
# 将采样率改为16kHz
common_voice = common_voice.cast_column("audio", Audio(sampling_rate=16000))
# 预处理音频数据和对应的文本
def prepare_dataset(batch):
    audio = batch["audio"]
    batch["input_features"] = feature_extractor(audio["array"], sampling_rate=audio["sampling_rate"]).input_features[0]
    batch["labels"] = tokenizer(batch["sentence"]).input_ids
    return batch
# 加载特征提取器和分词器
feature_extractor = WhisperFeatureExtractor.from_pretrained(model_name_or_path)
tokenizer = WhisperTokenizer.from_pretrained(model_name_or_path, language=language, task=task)
processor = WhisperProcessor.from_pretrained(model_name_or_path, language=language, task=task)
# 预处理数据集
common_voice = common_voice.map(prepare_dataset, remove_columns=common_voice.column_names["train"], num_proc=1)
# 保存预处理后的数据集
common_voice.save_to_disk(prepared_dataset_path)

Map:   0%|          | 0/100 [00:00<?, ? examples/s]

Map:   0%|          | 0/100 [00:00<?, ? examples/s]

Saving the dataset (0/1 shards):   0%|          | 0/100 [00:00<?, ? examples/s]

Saving the dataset (0/1 shards):   0%|          | 0/100 [00:00<?, ? examples/s]

### 更新torch包的版本

In [5]:
!pip uninstall -y torchvision torch torchaudio
!pip install kagglehub==0.3.13 datasets==3.2.0 librosa==0.10.2.post1 peft==0.14.0 torch==2.6.0 torchaudio==2.6.0 torchvision==0.21.0 ffmpeg-python==0.2.0 transformers==4.51.3

Found existing installation: torchvision 0.24.0+cu126
Uninstalling torchvision-0.24.0+cu126:
  Successfully uninstalled torchvision-0.24.0+cu126
Found existing installation: torch 2.6.0
Uninstalling torch-2.6.0:
  Successfully uninstalled torch-2.6.0
Found existing installation: torchaudio 2.6.0
Uninstalling torchaudio-2.6.0:
  Successfully uninstalled torchaudio-2.6.0
Collecting torch==2.6.0
  Using cached torch-2.6.0-cp312-cp312-manylinux1_x86_64.whl.metadata (28 kB)
Collecting torchaudio==2.6.0
  Using cached torchaudio-2.6.0-cp312-cp312-manylinux1_x86_64.whl.metadata (6.6 kB)
Collecting torchvision==0.21.0
  Downloading torchvision-0.21.0-cp312-cp312-manylinux1_x86_64.whl.metadata (6.1 kB)
Using cached torch-2.6.0-cp312-cp312-manylinux1_x86_64.whl (766.6 MB)
Using cached torchaudio-2.6.0-cp312-cp312-manylinux1_x86_64.whl (3.4 MB)
Downloading torchvision-0.21.0-cp312-cp312-manylinux1_x86_64.whl (7.2 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m7.2/7.2 MB[0m [3

## 设置LoRA参数

In [2]:
from transformers import WhisperForConditionalGeneration, WhisperProcessor, WhisperTokenizer, AutoConfig
from peft import LoraConfig, get_peft_model
import os
import torch
# 禁用 safetensors
os.environ["TRANSFORMERS_SAFE_TENSORS"] = "false"
model_name_or_path = "/content/pretrained_model/whisper_base" # 加载模型配置
config = AutoConfig.from_pretrained(model_name_or_path)
# 手动加载模型文件
state_dict = torch.load(f"{model_name_or_path}/pytorch_model.bin", weights_only=True)
# 初始化模型
model = WhisperForConditionalGeneration(config)
# 尝试加载 state_dict，允许忽略缺失的键
model.load_state_dict(state_dict, strict=False)
# 打印模型信息
print(model)
# 定义 LoRA 配置
config = LoraConfig(r=32, lora_alpha=64, target_modules=["q_proj", "v_proj"], lora_dropout=0.05)
# 获取 LoRA 模型
model = get_peft_model(model, config)
# 打印可训练参数
model.print_trainable_parameters()

WhisperForConditionalGeneration(
  (model): WhisperModel(
    (encoder): WhisperEncoder(
      (conv1): Conv1d(80, 512, kernel_size=(3,), stride=(1,), padding=(1,))
      (conv2): Conv1d(512, 512, kernel_size=(3,), stride=(2,), padding=(1,))
      (embed_positions): Embedding(1500, 512)
      (layers): ModuleList(
        (0-5): 6 x WhisperEncoderLayer(
          (self_attn): WhisperSdpaAttention(
            (k_proj): Linear(in_features=512, out_features=512, bias=False)
            (v_proj): Linear(in_features=512, out_features=512, bias=True)
            (q_proj): Linear(in_features=512, out_features=512, bias=True)
            (out_proj): Linear(in_features=512, out_features=512, bias=True)
          )
          (self_attn_layer_norm): LayerNorm((512,), eps=1e-05, elementwise_affine=True)
          (activation_fn): GELUActivation()
          (fc1): Linear(in_features=512, out_features=2048, bias=True)
          (fc2): Linear(in_features=2048, out_features=512, bias=True)
          

# 基于 LoRA 微调 Whisper 模型实现中文语音识别

## 项目简介

本项目旨在利用 LoRA (Low-Rank Adaptation) 技术对 OpenAI 的 Whisper `base` 模型进行微调，以实现高效的中文语音识别 (ASR)。通过在 Common Voice 中文数据集上进行训练，并使用 LoRA 显著减少了可训练参数数量，从而加速了训练过程并降低了计算资源消耗。最终模型将用于对测试音频文件进行转录。

## 环境配置

项目依赖以下库：

-   `kagglehub==0.3.13`
-   `datasets==3.2.0`
-   `librosa==0.10.2.post1`
-   `peft==0.14.0`
-   `torch==2.6.0`
-   `torchaudio==2.6.0`
-   `torchvision==0.21.0`
-   `ffmpeg-python==0.2.0`
-   `transformers==4.51.3`

可以使用以下命令进行安装：

```bash
!pip install kagglehub==0.3.13 datasets==3.2.0 librosa==0.10.2.post1 peft==0.14.0 torch==2.6.0 torchaudio==2.6.0 torchvision==0.21.0 ffmpeg-python==0.2.0 transformers==4.51.3
```

**注意**: 如果遇到 `torch_npu` 相关错误，请确保使用 `torch` 和 `torchvision` 的兼容版本，并优先使用 CUDA (GPU) 或 CPU 设备。

## 数据下载

### 训练数据

本项目使用 Kaggle 上的 [Common Voice 中文数据集](https://www.kaggle.com/datasets/tenffe/common-voice-zh-cn) 进行模型微调。

```python
import kagglehub
path = kagglehub.dataset_download("tenffe/common-voice-zh-cn")
print("Path to dataset files:", path)
```

### 测试数据

使用 Kaggle 上的 [voice-female-yuri 数据集](https://www.kaggle.com/datasets/shawnchile/voice-female-yuri) 进行推理测试。

```python
import kagglehub
path = kagglehub.dataset_download("shawnchile/voice-female-yuri")
print("Path to dataset files:", path)
```

## 预训练模型下载

下载 Whisper `base` 模型的预训练权重。

```bash
!mkdir -p pretrained_model
!cd pretrained_model && wget https://dtse-mirrors.obs.cn-north-4.myhuaweicloud.com/case/0002/whisper_base.tar.gz
!cd pretrained_model && tar -zxvf whisper_base.tar.gz
```

## 数据预处理

加载 Common Voice 数据集，并进行音频采样率调整、特征提取和文本分词。

```python
import os
from datasets import load_from_disk, Audio
from transformers import WhisperFeatureExtractor, WhisperTokenizer, WhisperProcessor

# 配置参数
model_name_or_path = "/content/pretrained_model/whisper_base"
language = "chinese"
task = "transcribe"
dataset_name = "/content/data/1/common_voice_zh_CN"
prepared_dataset_path = "/content/pretrained_model/prepared_common_voice"

# ... (省略具体实现代码，详见 notebook)
```

## 设置 LoRA 参数

配置 LoRA 适配器，将其应用于 Whisper 模型，以实现参数高效微调。

```python
from transformers import WhisperForConditionalGeneration, AutoConfig
from peft import LoraConfig, get_peft_model
import os
import torch

# 禁用 safetensors
os.environ["TRANSFORMERS_SAFE_TENSORS"] = "false"
model_name_or_path = "/content/pretrained_model/whisper_base"

# ... (省略具体实现代码，详见 notebook)

# 定义 LoRA 配置
config = LoraConfig(r=32, lora_alpha=64, target_modules=["q_proj", "v_proj"], lora_dropout=0.05)
model = get_peft_model(model, config)
print(model.print_trainable_parameters())
```

## 模型训练

使用 `Seq2SeqTrainer` 在预处理后的数据集上训练带有 LoRA 适配器的 Whisper 模型。

```python
from transformers import Seq2SeqTrainingArguments, Seq2SeqTrainer
from peft import LoraConfig, get_peft_model
import torch

# ... (省略 DataCollatorSpeechSeq2SeqWithPadding 定义及其他参数配置，详见 notebook)

# 设置 Trainer 参数 (已禁用 wandb 报告)
training_args = Seq2SeqTrainingArguments(
    # ...
    report_to=["none"], # Disable wandb reporting
)

# 创建 Trainer 并开始训练
# ... (省略具体实现代码，详见 notebook)

# 保存 LoRA 模型
model.save_pretrained("/content/pretrained_model/checkpoint_1000_samples")
```

## 模型推理与语音识别

加载微调后的 LoRA 模型，并使用 `AutomaticSpeechRecognitionPipeline` 对测试音频文件进行语音转录。

```python
from transformers import WhisperForConditionalGeneration, WhisperTokenizer, WhisperProcessor
from peft import PeftModel, PeftConfig
from transformers import AutomaticSpeechRecognitionPipeline
import torch
import os
import json

# 配置参数
device = "cuda" if torch.cuda.is_available() else "cpu"
peft_model_id = "/content/pretrained_model/checkpoint_1000_samples"
audio_directory = "/content/data/wavs/1/wavs" # 测试音频路径
output_directory = "/content/output" # 输出结果路径

# ... (省略具体实现代码，详见 notebook)

# 将结果保存到 JSON 文件中
output_file = os.path.join(output_directory, "transcription_results.json")
with open(output_file, "w", encoding="utf-8") as f:
    json.dump(results, f, ensure_ascii=False, indent=4)
print(f"Results saved to {output_file}")
```

## 结果

转录结果将保存到 `/content/output/transcription_results.json` 文件中。


## 训练模型

In [12]:
from transformers import Seq2SeqTrainingArguments, Seq2SeqTrainer
from dataclasses import dataclass
from typing import Any, Dict, List, Union
from transformers import WhisperFeatureExtractor, WhisperTokenizer, WhisperProcessor
from datasets import load_from_disk  # 导入 load_from_disk 函数
import torch
from peft import LoraConfig, get_peft_model, PeftModel
@dataclass
class DataCollatorSpeechSeq2SeqWithPadding:
    def __init__(self, processor: Any):
        self.processor = processor
    def __call__(self, features: List[Dict[str, Union[List[int], torch.Tensor]]]) -> Dict[str, torch.Tensor]:
        input_features = [{"input_features": feature["input_features"]} for feature in features]
        batch = self.processor.feature_extractor.pad(input_features, return_tensors="pt")
        label_features = [{"input_ids": feature["labels"]} for feature in features]
        labels_batch = self.processor.tokenizer.pad(label_features, return_tensors="pt")
        labels = labels_batch["input_ids"].masked_fill(labels_batch.attention_mask.ne(1), -100)
        if (labels[:, 0] == self.processor.tokenizer.bos_token_id).all().cpu().item():
            labels = labels[:, 1:]
        batch["labels"] = labels
        return batch
# 配置参数
model_name_or_path = "/content/pretrained_model/whisper_base"
language = "chinese"
task = "transcribe"
prepared_dataset_path = "/content/pretrained_model/prepared_common_voice"   # 预处理后的数据集路径
# 设置设备为NPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")
# 加载特征提取器和分词器
feature_extractor = WhisperFeatureExtractor.from_pretrained(model_name_or_path)
tokenizer = WhisperTokenizer.from_pretrained(model_name_or_path, language=language, task=task)
processor = WhisperProcessor.from_pretrained(model_name_or_path, language=language, task=task)
# 加载预处理后的数据集
common_voice = load_from_disk(prepared_dataset_path)
# 定义 LoRA 配置
lora_config = LoraConfig(
    r=8,  # LoRA attention dimension
    lora_alpha=32,  # Alpha scaling
    target_modules=["q_proj", "v_proj"],  # Target modules to apply LoRA
    lora_dropout=0.1,  # Dropout probability for LoRA layers
)
# 加载模型
from transformers import WhisperForConditionalGeneration
model = WhisperForConditionalGeneration.from_pretrained(model_name_or_path)
# 将模型移动到NPU
model = model.to(device)
# 将 LoRA 配置应用到模型
model = get_peft_model(model, lora_config)
# 设置 Trainer 参数
training_args = Seq2SeqTrainingArguments(
    output_dir="your-name/int8-whisper-base-v2-asr",
    per_device_train_batch_size=16,
    gradient_accumulation_steps=1,
    learning_rate=2e-3,
    warmup_steps=50,
    num_train_epochs=3,
    eval_strategy="epoch",
    fp16=True,
    per_device_eval_batch_size=32,
    generation_max_length=128,
    logging_steps=25,
    remove_unused_columns=False,
    label_names=["labels"],
    no_cuda=False,  # 允许使用 NPU/GPU
    use_cpu=False,
    report_to=["none"], # Disable wandb reporting
)
# 创建数据收集器
data_collator = DataCollatorSpeechSeq2SeqWithPadding(processor=processor)
# 创建 Trainer
trainer = Seq2SeqTrainer(
    args=training_args,
    model=model,
    train_dataset=common_voice["train"].select(range(100)),  # 仅使用前100个样本进行训练
    eval_dataset=common_voice["test"].select(range(100)),  # 仅使用前100个样本进行验证
    data_collator=data_collator,
    tokenizer=processor.feature_extractor,
)
# 开始训练
model.config.use_cache = False
trainer.train()
# 保存 LoRA 模型
model.save_pretrained("/content/pretrained_model/checkpoint_1000_samples")

Using device: cuda


  trainer = Seq2SeqTrainer(


Epoch,Training Loss,Validation Loss
1,No log,1.967222
2,No log,1.419844
3,No log,1.263087


## 推理模型实现语音识别

In [13]:
from transformers import WhisperForConditionalGeneration, WhisperTokenizer, WhisperProcessor
from peft import PeftModel, PeftConfig
from transformers import AutomaticSpeechRecognitionPipeline
import torch
import os
import json
# 配置参数
device = "cuda" if torch.cuda.is_available() else "cpu"
peft_model_id = "/content/pretrained_model/checkpoint_1000_samples"
language = "chinese"
task = "transcribe"
output_directory = "/content/output"  # 输出文件夹路径
# 确保输出目录存在
os.makedirs(output_directory, exist_ok=True)
# 加载 LoRA 模型配置
peft_config = PeftConfig.from_pretrained(peft_model_id)
# 加载基础模型
model = WhisperForConditionalGeneration.from_pretrained(
    peft_config.base_model_name_or_path
)
# 基于基础模型和 LoRA 模型路径构建 LoRA 模型
model = PeftModel.from_pretrained(model, peft_model_id)
# 将 PeftModel 转换为普通的 WhisperForConditionalGeneration 模型
model = model.merge_and_unload()
# 加载分词器（处理文本）和处理器（封装特征抽取器和分词器）、以及特征抽取器（处理音频）
tokenizer = WhisperTokenizer.from_pretrained(peft_config.base_model_name_or_path, language=language, task=task)
processor = WhisperProcessor.from_pretrained(peft_config.base_model_name_or_path, language=language, task=task)
feature_extractor = processor.feature_extractor
forced_decoder_ids = processor.get_decoder_prompt_ids(language=language, task=task)
# 构建一个 pipeline 进行测试
pipeline = AutomaticSpeechRecognitionPipeline(model=model, tokenizer=tokenizer, feature_extractor=feature_extractor, device=device)
# 测试音频文件路径
audio_directory = "/content/data/wavs/1/wavs"
# 获取目录下的所有 .wav 文件
audio_files = [os.path.join(audio_directory, f) for f in os.listdir(audio_directory) if f.endswith('.wav')]
# 只处理前100个文件
audio_files = audio_files[:100]
# 存储结果的列表
results = []
# 遍历所有音频文件并进行语音识别
for audio_file in audio_files:
    with torch.no_grad():
        text = pipeline(audio_file, generate_kwargs={"forced_decoder_ids": forced_decoder_ids}, max_new_tokens=255)["text"]
        results.append({"file": audio_file, "text": text})
        print(f"Transcribed text for {audio_file}: {text}")
# 将结果保存到 JSON 文件中
output_file = os.path.join(output_directory, "transcription_results.json")
with open(output_file, "w", encoding="utf-8") as f:
    json.dump(results, f, ensure_ascii=False, indent=4)
print(f"Results saved to {output_file}")

Device set to use cuda


Transcribed text for /content/data/wavs/1/wavs/0013095.wav: 完 算完好了吗




Transcribed text for /content/data/wavs/1/wavs/0013090.wav: 一号为死亥逝严逝这个那就这么一块儿有在定了




Transcribed text for /content/data/wavs/1/wavs/0013040.wav: 因此,尽管小王子对他的爱满事善疑,他还是很快走怀疑其他来。他把那些无足轻重的话看得很重,这样他很不开心。
Transcribed text for /content/data/wavs/1/wavs/0013115.wav: 那不重要。




Transcribed text for /content/data/wavs/1/wavs/0013006.wav: 单了晚上只会尽尽地表写。




Transcribed text for /content/data/wavs/1/wavs/0013065.wav: 哎好了好了赶紧切先去吸吸手以前还没有过去能要注意防护了




Transcribed text for /content/data/wavs/1/wavs/0013129.wav: 不要再这样摩擦下去了你已经下定决心离开了现在就走吧




Transcribed text for /content/data/wavs/1/wavs/0013063.wav: 嗯...反尼简单下横是跑三圈




Transcribed text for /content/data/wavs/1/wavs/0013032.wav: 请原谅我,我一点也不怕老虎,他继续说,但我害怕风。我想你是不会给我弄来一扇平风吧。


You seem to be using the pipelines sequentially on GPU. In order to maximize efficiency please use a dataset


Transcribed text for /content/data/wavs/1/wavs/0013100.wav: 等它最后一次给它的花儿浇水,准备把它放到它的玻璃皱瑕食。




Transcribed text for /content/data/wavs/1/wavs/0013043.wav: 老胡抓紫那件事让我非常地不安,可本该让我的内心充满同情和联名才是。




Transcribed text for /content/data/wavs/1/wavs/0013008.wav: 小王子分上自己的观察。
Transcribed text for /content/data/wavs/1/wavs/0013102.wav: 再见吧他对他的话说




Transcribed text for /content/data/wavs/1/wavs/0013082.wav: 再做一个我最爱吃的皮蛋豆腐




Transcribed text for /content/data/wavs/1/wavs/0013077.wav: 那那怎么办呀那那要不你多喝点牛奶
Transcribed text for /content/data/wavs/1/wavs/0013104.wav: 再见他又说了一次




Transcribed text for /content/data/wavs/1/wavs/0013105.wav: 花可送了一首




Transcribed text for /content/data/wavs/1/wavs/0013000.wav: 不久我对这花安有了进步了了解。




Transcribed text for /content/data/wavs/1/wavs/0013099.wav: 需要超水的食材也都搞定了辛苦了辛苦了其实还就是开火师客




Transcribed text for /content/data/wavs/1/wavs/0013035.wav: 可是说到这儿他弹出了,他是由以立种子变来的,他本来真不可能知道什么别的世界。




Transcribed text for /content/data/wavs/1/wavs/0013047.wav: 我认为他是借助一群千席的野生养类套利的。




Transcribed text for /content/data/wavs/1/wavs/0013055.wav: 可对他来说这最后的早上所有熟悉的工作似乎都变得珍贵起来。




Transcribed text for /content/data/wavs/1/wavs/0013127.wav: 他很天真地展示了他的四根刺。




Transcribed text for /content/data/wavs/1/wavs/0013083.wav: 汤的话就上个海淀汤好了。
Transcribed text for /content/data/wavs/1/wavs/0013058.wav: 啊怎么样怎么样好喝吗




Transcribed text for /content/data/wavs/1/wavs/0013078.wav: 可恶身后者的毛泽东人们和神主肉片




Transcribed text for /content/data/wavs/1/wavs/0013110.wav: 他很惊讶,他没有自卑躺。




Transcribed text for /content/data/wavs/1/wavs/0013061.wav: 嗯?是你来的太早了不是我来好不好不好哦




Transcribed text for /content/data/wavs/1/wavs/0013122.wav: 可是那些动物。




Transcribed text for /content/data/wavs/1/wavs/0013079.wav: 啊不不不不不用勉强没用勉强吃了这种事情啊不能吃我就换点别的做嘛




Transcribed text for /content/data/wavs/1/wavs/0013072.wav: 我也没有打算做后期的呀中国人就要吃中国菜




Transcribed text for /content/data/wavs/1/wavs/0013004.wav: 所以不会给任何人带来麻烦。




Transcribed text for /content/data/wavs/1/wavs/0013005.wav: 早上他们出现在花葱中。




Transcribed text for /content/data/wavs/1/wavs/0013034.wav: 晚上我想让你把我放在玻璃照下面你这地方真是冷我来了哪个地方




Transcribed text for /content/data/wavs/1/wavs/0013111.wav: 他站在那里手里举身那个玻璃照完全不至所错。




Transcribed text for /content/data/wavs/1/wavs/0013069.wav: 拉开就能看到了那种麻烦医乐




Transcribed text for /content/data/wavs/1/wavs/0013094.wav: 看着手法,你沾动专业。
Transcribed text for /content/data/wavs/1/wavs/0013003.wav: 而且一点儿也不会站这么地方




Transcribed text for /content/data/wavs/1/wavs/0013118.wav: 不要管这个玻璃罩子了我不再需要它了




Transcribed text for /content/data/wavs/1/wavs/0013117.wav: 你像我一样傻是这快乐此来吧




Transcribed text for /content/data/wavs/1/wavs/0013113.wav: 我当然爱你华尔对她说当




Transcribed text for /content/data/wavs/1/wavs/0013130.wav: 她是不想让小王子看到她在哭她是这样一朵骄傲的话




Transcribed text for /content/data/wavs/1/wavs/0013017.wav: 他不愿像因素化一样满脸皱着的来到这个世界。




Transcribed text for /content/data/wavs/1/wavs/0013085.wav: 啊都可以啊哎不用客气啊真是不要想敲了我的碰震能力呀




Transcribed text for /content/data/wavs/1/wavs/0013031.wav: 我又不是藏,花儿柔身回大灯。




Transcribed text for /content/data/wavs/1/wavs/0013051.wav: 但是正如他所说没有人会知道,所以他也打扫了那座死祸山。




Transcribed text for /content/data/wavs/1/wavs/0013038.wav: 我刚想去找可您要跟我说




Transcribed text for /content/data/wavs/1/wavs/0013026.wav: 小王子感到特别的修饭,自己找了一个噴糊装了些淡水来。




Transcribed text for /content/data/wavs/1/wavs/0013046.wav: 我本该猜说那可怜的小把戏背后隐藏的全部情感。花儿是那样的标准不一样,可我当时太年轻,还不知道怎么最爱它。




Transcribed text for /content/data/wavs/1/wavs/0013018.wav: 只有让她的美光彩照人她才愿意来到这个世界上




Transcribed text for /content/data/wavs/1/wavs/0013039.wav: 于是他又急出了急升克松童养是为了小王子自责。




Transcribed text for /content/data/wavs/1/wavs/0013101.wav: 他发现自己几乎要流泪了。




Transcribed text for /content/data/wavs/1/wavs/0013109.wav: 我请求你的原谅,你一定要幸福。




Transcribed text for /content/data/wavs/1/wavs/0013044.wav: 他继续向我图路他的新生。试试试我从前不知道怎么去看的事物。我该从行动而非言语来判断是非。




Transcribed text for /content/data/wavs/1/wavs/0013074.wav: 什么都是下半的印判哦




Transcribed text for /content/data/wavs/1/wavs/0013033.wav: 害怕风,对一猪植物来说那确实很不幸。小王子说的,其实又自言自语的。这朵花还真是很复杂。




Transcribed text for /content/data/wavs/1/wavs/0013007.wav: 可是有一天不知道是从哪里圈来的种子长出了一种新的花




Transcribed text for /content/data/wavs/1/wavs/0013009.wav: 发现这小嫟嫟苗与新穷上其他的信价都不一样。




Transcribed text for /content/data/wavs/1/wavs/0013025.wav: 我想该吃早饭了骗客他又说到你能否好心考虑一下我的需求




Transcribed text for /content/data/wavs/1/wavs/0013052.wav: 如果打扫的好火山就会慢慢的稳定的人烧就不会喷出来。




Transcribed text for /content/data/wavs/1/wavs/0013125.wav: 如果没有蝴蝶没有茫茫愁还会有谁来看我呢。




Transcribed text for /content/data/wavs/1/wavs/0013126.wav: 你会在很远很远的地方,至于那些大动物,我一点也不怕他们,我也有我自己的抓紫。




Transcribed text for /content/data/wavs/1/wavs/0013054.wav: 小王子还很优域地把最后几颗红面貓树想送描拔了。他想他再也不会回去了。




Transcribed text for /content/data/wavs/1/wavs/0013060.wav: 不过现在还不能吃饭,饭还没有做好呢。
Transcribed text for /content/data/wavs/1/wavs/0013119.wav: 可是风




Transcribed text for /content/data/wavs/1/wavs/0013045.wav: 他向我释放他的香气展示他的眉眼,我绝不该从他身边逃你。




Transcribed text for /content/data/wavs/1/wavs/0013114.wav: 都是我不好,你一直都被蒙在古里。




Transcribed text for /content/data/wavs/1/wavs/0013128.wav: 然后又接着说。




Transcribed text for /content/data/wavs/1/wavs/0013092.wav: 锦鸟肉丝的话刚刚准备做虽主肉片的那个里集肉可以拿来做肉丝啊




Transcribed text for /content/data/wavs/1/wavs/0013123.wav: 哦如果我想要及时互叠的话我就必须得人受两三条猫猫虫爬在我的身上。




Transcribed text for /content/data/wavs/1/wavs/0013050.wav: 早上用他们来热早餐很方便他还有一座死火山




Transcribed text for /content/data/wavs/1/wavs/0013010.wav: 你瞧,它可能是后面包庶的一个新品种。




Transcribed text for /content/data/wavs/1/wavs/0013022.wav: 可小王子安奈不住她的爱姆之情说到哦,你多美呀。




Transcribed text for /content/data/wavs/1/wavs/0013064.wav: 看完笑的都外面这么冷都不要命了




Transcribed text for /content/data/wavs/1/wavs/0013021.wav: 经过这三金新准备,他打是哈欠手。啊,我还没有完全信来。我肯求你原谅我,我的话吧还是乱糟糕的。




Transcribed text for /content/data/wavs/1/wavs/0013012.wav: 最大的花雷第一次出现的时候小王子便来到他的面前。




Transcribed text for /content/data/wavs/1/wavs/0013120.wav: 我的感冒没那么严重不像以前晚上的两空器对我有好处。




Transcribed text for /content/data/wavs/1/wavs/0013042.wav: 他只要看一看他们,唯一文他们的香气就可以了。我那朵花的香气一满整个行丑,可我却不知如何欣赏他全部的美。




Transcribed text for /content/data/wavs/1/wavs/0013048.wav: 在他起成的那天早上他把星球的事物安排的仅仅有条




Transcribed text for /content/data/wavs/1/wavs/0013084.wav: 应该上多了吗?你还什么别的想吃的吗?




Transcribed text for /content/data/wavs/1/wavs/0013049.wav: 他精心打扫了他的火火山他运有两重火火山




Transcribed text for /content/data/wavs/1/wavs/0013019.wav: 哦,是啊,他是那么恶恶奴多姿的生灵,他神秘的装扳日资一日永不停泉。




Transcribed text for /content/data/wavs/1/wavs/0013097.wav: 等下做皮丹弄肤队时候就可以用了。




Transcribed text for /content/data/wavs/1/wavs/0013027.wav: 就这样,它照料只是断划。也就这样,断划很快开始了以碳的续用心来折磨小王子。
Transcribed text for /content/data/wavs/1/wavs/0013108.wav: 只好他对他说。




Transcribed text for /content/data/wavs/1/wavs/0013028.wav: 但如果只像大白这还真的有些棍手。
Transcribed text for /content/data/wavs/1/wavs/0013121.wav: 我是一猪花马




Transcribed text for /content/data/wavs/1/wavs/0013023.wav: 怎么会不买呢?花儿天天的回答道我是和太阳同时出生的哦




Transcribed text for /content/data/wavs/1/wavs/0013016.wav: 他离半一半的调整自己的花板。




Transcribed text for /content/data/wavs/1/wavs/0013001.wav: 在小王子的星球上花一只以来都是非常简单的。




Transcribed text for /content/data/wavs/1/wavs/0013107.wav: 一直以来我都太傻了。
Transcribed text for /content/data/wavs/1/wavs/0013002.wav: 他们只有一层花板




Transcribed text for /content/data/wavs/1/wavs/0013124.wav: 胡迪好像很漂亮
Transcribed text for /content/data/wavs/1/wavs/0013088.wav: 有没有什么其他一层吃的呢?




Transcribed text for /content/data/wavs/1/wavs/0013091.wav: 首先做个金酵肉丝




Transcribed text for /content/data/wavs/1/wavs/0013013.wav: 塔利克感到某种不可思议的情义景象一定会从中出现。




Transcribed text for /content/data/wavs/1/wavs/0013067.wav: 蒜种放在你站的这个位置的幽厚房,对对对对对对对,就这个棍子。




Transcribed text for /content/data/wavs/1/wavs/0013106.wav: 可这并不是因为他赶猫了。




Transcribed text for /content/data/wavs/1/wavs/0013086.wav: 我可是从长单树上的时候就开始做菜了




Transcribed text for /content/data/wavs/1/wavs/0013073.wav: 所以今天的视频是猫血猫和水主用片
Results saved to /content/output/transcription_results.json
