<a href="https://colab.research.google.com/github/X-T-E-R/AudioLabeling/blob/main/colab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Audio Labeling Colab Version

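### Usage Notes
This project quickly turns audio files or folders into a dataset of reasonable quality. What you are looking at is the notebook version, intended for Colab, so everything is kept as simple as possible.

### Recommended Usage
We recommend using this tool during the data preprocessing stage of a project to ensure the quality and consistency of the input data.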

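### Suggested Workflow (Trimmed-Down Version)

1. Audio preprocessing
   1. Vocal separation, denoising, dereverberation, etc. (not implemented yet)
   2. Generate SRT files
2. Slice and generate the list file
   1. Slice the audio according to the SRT file and generate the list file
3. Audio postprocessing
   1. Loudness normalization
   2. (Optional) Classify by emotion or by speaker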
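
To make step 2 more concrete, here is a minimal sketch of how SRT timing lines can be turned into slice boundaries in seconds. It is not the project's own parser (the notebook relies on `parse_srt_with_lib` for that); `parse_srt` and `to_seconds` are hypothetical helper names used only for illustration.

```python
import re

# Matches an SRT timing line such as "00:00:01,500 --> 00:00:04,250".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def to_seconds(h, m, s, ms):
    """Convert the four captured timestamp fields to seconds."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(text):
    """Return a list of (start_seconds, end_seconds, text) tuples."""
    cues = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                g = match.groups()
                start, end = to_seconds(*g[:4]), to_seconds(*g[4:])
                # Everything after the timing line is the subtitle text.
                cues.append((start, end, " ".join(lines[i + 1:]).strip()))
                break
    return cues


sample = """1
00:00:01,500 --> 00:00:04,250
Hello there.

2
00:00:05,000 --> 00:00:07,000
Second line.
"""
cues = parse_srt(sample)
for start, end, text in cues:
    print(f"{start:.2f}s to {end:.2f}s: {text}")
```

Each tuple gives the start and end of one candidate slice; the real pipeline additionally merges and filters subtitles before cutting the audio.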

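### What You Need to Provide

There are several ways to get started, but vocal separation is not implemented yet.

So you can either:

1. Provide only one or a few longer audio files with the vocals already separated
2. Provide the audio together with SRT files prepared in advance

Either option works; the project has built-in SRT recognition based on FunASR.

If Colab GPU resources are unavailable, we recommend extracting the SRT with Jianying (剪映, also free) first, which is much faster.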
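
A quick way to decide between the two routes is to check whether the runtime actually has a GPU. The sketch below is an assumption-laden illustration, not part of the project: it treats a successful `nvidia-smi` call as evidence of an NVIDIA GPU, and `gpu_available` is a hypothetical helper name.

```python
import shutil
import subprocess


def gpu_available():
    """Return True when nvidia-smi is on PATH and exits successfully."""
    if shutil.which("nvidia-smi") is None:
        return False
    try:
        result = subprocess.run(["nvidia-smi"], capture_output=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


has_gpu = gpu_available()
if has_gpu:
    print("GPU detected: the built-in SRT generation should run at a reasonable speed")
else:
    print("No GPU detected: consider preparing the SRT files in advance")
```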

### Project and Environment Setup (Env)

In [None]:
# @title Clone or update the repository
%cd /content/
!git clone https://github.com/X-T-E-R/AudioLabeling.git 

# Make sure to pull the latest changes from the repository
%cd /content/AudioLabeling
!git stash
!git pull https://github.com/X-T-E-R/AudioLabeling.git 


In [None]:
# @title Install dependencies
%pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
%pip install -r requirements.txt

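### Data Preparation

#### Notes
Data preparation requires one or more long vocal recordings, preferably placed in the same folder.

You can extract them from videos, download them from websites, or record them yourself, but please mind copyright.

(Special notice: this tool only processes audio mechanically and makes no guarantee regarding the copyright of any data. If you do not agree with this, please stop using it immediately.)

Normally no manual processing would be needed, but because the project's vocal separation and audio denoising are not implemented yet, please separate the vocals manually before uploading.

You can use an online service such as https://mvsep.com/zh (free) for the separation.

#### How to Provide Data
You can either:

1. Run the code cell below to create the folder, then upload WAV or ZIP files directly through the web interface
2. Specify a download link (for example from Hugging Face) to download a ZIP file

Afterwards you can optionally extract the ZIP file.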
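
As a rough illustration of option 2, the target folder name can be derived from the file name in the download link. The helper `folder_name_from_url` is a hypothetical name, not part of the project; the sketch assumes the link ends in a file name such as `test_audio.zip`, possibly followed by a query string.

```python
import os
import urllib.parse


def folder_name_from_url(url):
    """Return the file name in the URL path without its extension."""
    # Parse the URL first so a query string like "?download=true" is ignored.
    path = urllib.parse.urlparse(url).path
    file_name = urllib.parse.unquote(os.path.basename(path))
    return os.path.splitext(file_name)[0]


link = (
    "https://huggingface.co/datasets/XTer123/AudioLabelingTestDataset"
    "/resolve/main/test_audio.zip?download=true"
)
name = folder_name_from_url(link)
print(name)
print(os.path.join("Input/audios", name))
```

Percent-encoded characters in the link (for example non-ASCII folder names) are decoded by `unquote`, so the folder on disk gets a readable name.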

In [None]:
# @title (Optional) Set path or manually upload
%cd /content/AudioLabeling
import os

folder_name = 'test_audio' # @param {type: "string"}
source_path = os.path.join("Input/audios", folder_name)
os.makedirs(source_path, exist_ok=True)

# Then manually upload the audio files to e.g. "Input/audios/test_audio/"
# Or use the next code block to automatically download a zip file

In [None]:
#@title (Optional) Download audio zip file
%cd /content/AudioLabeling

import os
import urllib.parse

import requests

# Copy the link from the download button on the Hugging Face page
hf_link = 'https://huggingface.co/datasets/XTer123/AudioLabelingTestDataset/resolve/main/test_audio.zip?download=true' #@param {type: "string"}

# Derive the folder name from the URL path (the query string is ignored), or set it manually
url_path = urllib.parse.urlparse(hf_link).path
folder_name = urllib.parse.unquote(os.path.basename(url_path)).rsplit('.', 1)[0]

print(f'Downloading {folder_name}...')

source_path = os.path.join('Input/audios', folder_name)
os.makedirs(source_path, exist_ok=True)

zip_file_path = os.path.join(source_path, 'file.zip')

# Stream the download to disk and fail loudly on HTTP errors
with requests.get(hf_link, stream=True, timeout=60) as response:
    response.raise_for_status()
    with open(zip_file_path, 'wb') as file:
        for chunk in response.iter_content(chunk_size=1 << 20):
            file.write(chunk)

print(f'{folder_name} downloaded successfully!')



In [None]:
# @title (Optional) Extract zip file
%cd /content/AudioLabeling
import os
import zipfile

# source_path and folder_name come from one of the previous two cells
if 'source_path' not in globals() or 'folder_name' not in globals():
    raise NameError('source_path / folder_name are not defined. Run the "Set path" or "Download" cell first.')

# If the zip was uploaded manually, enter its path in the form field
if 'zip_file_path' not in globals():
    zip_file_path = "" #@param {type: "string"}
print(f'Extracting {zip_file_path}...')

def decode_name(file_info):
    # Entries flagged as UTF-8 (bit 0x800) are already decoded correctly
    if file_info.flag_bits & 0x800:
        return file_info.filename
    # Otherwise zipfile assumed cp437; zips created on Chinese Windows usually use GBK
    try:
        return file_info.filename.encode('cp437').decode('gbk')
    except (UnicodeEncodeError, UnicodeDecodeError):
        return file_info.filename

with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
    for file_info in zip_ref.infolist():
        decoded_name = decode_name(file_info)
        new_path = os.path.join(source_path, decoded_name)
        # Directory entries end with '/' or '\'
        if decoded_name.endswith(('/', '\\')):
            os.makedirs(new_path, exist_ok=True)
            continue
        # Some archives omit directory entries, so create parent folders on demand
        os.makedirs(os.path.dirname(new_path), exist_ok=True)
        with open(new_path, 'wb') as file:
            file.write(zip_ref.read(file_info))
print(f'{folder_name} extracted successfully!')

### Preprocessing

In [None]:
# @title Separate Vocals

# This part is not ready yet, please use UVR 5 or MVSEP-MDX23 to separate manually

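### Generate Subtitles and Slice Audio (SRT and Slice)

#### Notes

You can skip the SRT generation step entirely, for example by generating the SRT beforehand with Jianying (剪映) or Whisper.

Then upload the SRT file under the same name as the audio file.

Without a GPU, generation is quite slow, so we recommend one of the following: run the code when GPU resources are available (the quota appears to refresh daily), generate the SRT beforehand with Jianying, or run the code locally.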
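
The "same name" rule can be checked before running the slicing cell. The following is a minimal sketch, assuming audio and SRT files sit in the same folder and share the base name; `pair_audio_with_srt` and the `AUDIO_EXTS` set are hypothetical and not taken from the project.

```python
from pathlib import PurePosixPath

AUDIO_EXTS = {".wav", ".mp3", ".flac", ".m4a"}  # assumed set of audio extensions


def pair_audio_with_srt(paths):
    """Pair each audio path with the SRT file that has the same folder and stem."""
    srt_by_key = {}
    audio_paths = []
    for raw in paths:
        p = PurePosixPath(raw)
        suffix = p.suffix.lower()
        if suffix == ".srt":
            srt_by_key[(p.parent, p.stem)] = raw
        elif suffix in AUDIO_EXTS:
            audio_paths.append(raw)

    pairs, missing = [], []
    for raw in audio_paths:
        p = PurePosixPath(raw)
        srt = srt_by_key.get((p.parent, p.stem))
        if srt is None:
            missing.append(raw)  # no subtitle file with the same base name
        else:
            pairs.append((raw, srt))
    return pairs, missing


pairs, missing = pair_audio_with_srt(
    ["test_audio/ep1.wav", "test_audio/ep1.srt", "test_audio/ep10.wav"]
)
print("paired:", pairs)
print("audio without SRT:", missing)
```

Matching on the exact base name avoids pairing `ep10.wav` with `ep1.srt`, which a plain substring check would allow.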

In [None]:
# @title Generate SRT subtitles
# Skip this step if you already have SRT subtitle files
%cd /content/AudioLabeling
import os

try:
    print(f"source_path: {source_path}")
except NameError:
    source_path = "Input/audios/test/" # @param {type:"string"}

# source_path may also point at a single audio file; only create it when it is a folder
if not os.path.isfile(source_path):
    os.makedirs(source_path, exist_ok=True)

from tools.my_utils import scan_audios_walk

audio_list = []
if os.path.isdir(source_path):
    audio_list = scan_audios_walk(source_path)
    audio_list = [os.path.join(source_path, audio) for audio in audio_list]
else:
    audio_list.append(source_path)

from src.srt_generator.audio2srt import Audio2Srt

models_path = 'models/iic' # Model path; if left empty or the path does not exist, modelscope downloads the model automatically @param {type:"string"}


with Audio2Srt(models_path=models_path) as a2s:
    for audio_path in audio_list:
        srt_path = audio_path.rsplit('.', 1)[0] + '.srt'
        srt_content = a2s.generate_srt(audio_path)
        try:
            # Write as UTF-8 explicitly so Chinese text survives regardless of the system locale
            with open(srt_path, 'w', encoding='utf-8') as f:
                f.write(srt_content)
            print(f"Generated subtitle file: {srt_path}")
        except OSError as e:
            print(f"Failed to write subtitle file: {srt_path} ({e})")

In [None]:
# @title Split Audio
%cd /content/AudioLabeling
import os
import shutil


try:
    print(f"folder_name: {folder_name}")
except NameError:
    folder_name = "test" # @param {type:"string"}
source_path = os.path.join("Input/audios", folder_name) # @param {type:"string"}
os.makedirs(source_path, exist_ok=True)

folder_name = os.path.basename(source_path)
output_path = f"Output/sliced_audio/{folder_name}/" # @param {type:"string"}

# Clear old results first, then recreate the empty output folder
print(f"Remove old files in {output_path}")
shutil.rmtree(output_path, ignore_errors=True)
os.makedirs(output_path, exist_ok=True)

from tools.my_utils import scan_audios_walk, scan_ext_walk

# scan srt files and audio files
print(f"scan srt files and audio files in {source_path}")
items = []

if os.path.isdir(source_path):
    srt_list = scan_ext_walk(source_path, '.srt')
    audio_list = scan_audios_walk(source_path)
else:   
    audio_list = [source_path]
    source_path = os.path.dirname(source_path)
    srt_list = scan_ext_walk(source_path, '.srt')

print(f"audio_list: {audio_list}")
print(f"srt_list: {srt_list}")
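for audio_file in audio_list:
    audio_file_name = os.path.basename(audio_file).rsplit('.', 1)[0]
    for srt_file in srt_list:
        # Match on the exact base name so that "ep1" does not pair with "ep10.srt"
        if os.path.basename(srt_file).rsplit('.', 1)[0] == audio_file_name:
            items.append((audio_file, srt_file))
            print(f"Found matching audio file: {audio_file} and subtitle file: {srt_file}")
            break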

from src.srt_slicer.srt_utils import merge_subtitles_with_lib, slice_audio_with_lib, parse_srt_with_lib, generate_srt_with_lib, filter_subtitles

# SRT merge settings
merge_zero_interval = True # @param {type:"boolean"} 
short_interval = 0.05 # @param {type:"number"}
max_interval = 0.8 # @param {type:"number"}
max_text_length = 100 # @param {type:"number"}
add_period = True # @param {type:"boolean"}

min_text_len = 5 # @param {type:"number"}
language = 'ZH' # @param {type:"string"}

merge_folder = True # @param {type:"boolean"}

min_audio_duration = 2 # @param {type:"number"}
max_audio_duration = 300 # @param {type:"number"}

save_paths = []
for index, item in enumerate(items):
    audio_file, srt_file = item
    print(f"Slicing audio file: {audio_file} using subtitle file: {srt_file}")
    save_path = os.path.join(output_path, f"{index}_{os.path.basename(srt_file).rsplit('.', 1)[0]}")
    save_paths.append(save_path)
    audio_file_full_path = os.path.join(source_path, audio_file)
    srt_file_full_path = os.path.join(source_path, srt_file)
    
    try:
        # utf-8-sig reads plain UTF-8 as well as files that start with a BOM
        with open(srt_file_full_path, 'r', encoding='utf-8-sig') as f:
            srt_content = f.read()
        subtitles = parse_srt_with_lib(srt_content)
        merged_subtitles = merge_subtitles_with_lib(subtitles, short_interval, max_interval, max_text_length, add_period, merge_zero_interval)
        merged_subtitles = filter_subtitles(merged_subtitles, min_text_len)
        print(generate_srt_with_lib(merged_subtitles))
    except Exception:
        print(f"Failed to read or merge subtitle file: {srt_file_full_path}")
        raise
    print("Subtitles merged, start slicing audio")
    
    try:
        slice_audio_with_lib(audio_file_full_path, save_folder=save_path, format="wav", subtitles=merged_subtitles, language=language, min_audio_duration=min_audio_duration, max_audio_duration=max_audio_duration)
    except Exception:
        print(f"Failed to slice audio file: {audio_file_full_path}")
        raise
    


if merge_folder and len(save_paths) > 1:
    print("Merging folders")
    from src.list_merger.list_utils import merge_list_folders

    first_folder = save_paths[0]
    first_list_file = os.path.join(first_folder, 'datamapping.list')
    for i in range(1, len(save_paths)):
        second_folder = save_paths[i]
        second_list_file = os.path.join(second_folder, 'datamapping.list')
        merge_list_folders(first_list_file, second_list_file, None, first_folder, second_folder)
        
    print("Folders merged, start cleaning up")
    output_path = output_path[:-1] if output_path.endswith('/') else output_path
    tmp_path = output_path + "_tmp"
    shutil.rmtree(tmp_path, ignore_errors=True)
    shutil.move(first_folder, tmp_path)
    shutil.rmtree(output_path, ignore_errors=True)
    shutil.move(tmp_path, output_path)

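### Postprocessing

#### Notes

We strongly recommend normalizing the loudness while you are at it.

The recommended value for target_loudness is -23.0; the -16.0 used here is the Genshin Impact standard.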
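
To give a feel for what a target value means, the sketch below computes the gain needed to move a clip from a measured loudness to the target. It assumes both values are expressed in dB-style loudness units (such as LUFS) and that a single constant gain is applied; the project's `normalize_loudness` may work differently, and `gain_for_target` is a hypothetical helper.

```python
def gain_for_target(measured_loudness, target_loudness):
    """Return (gain in dB, linear amplitude factor) needed to reach the target."""
    gain_db = target_loudness - measured_loudness
    # A change of 20 dB corresponds to a factor of 10 in amplitude.
    factor = 10 ** (gain_db / 20)
    return gain_db, factor


measured = -20.0  # example measurement, not real data
for target in (-23.0, -16.0):
    gain_db, factor = gain_for_target(measured, target)
    print(f"target {target}: gain {gain_db:+.1f} dB, amplitude x{factor:.3f}")
```

A negative gain attenuates the clip and a positive gain amplifies it, so a louder target such as -16.0 is more likely to push peaks toward clipping than -23.0.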

In [None]:
# @title Loudness Normalization
%cd /content/AudioLabeling
%pip install tqdm
import os

from tqdm import tqdm

try:
    print(f"source_path: {output_path}")
    source_path = output_path
except NameError:
    source_path = "Output/sliced_audio/test/"
    
target_loudness = -16.0 # Target loudness @param {type:"number"}
from src.audio_normalizer.my_utils import normalize_loudness
from tools.my_utils import scan_audios_walk

audio_list = scan_audios_walk(source_path)
for audio_file in tqdm(audio_list):
    audio_file_full_path = os.path.join(source_path, audio_file)
    try:
        # Normalize in place by writing back to the same path
        normalize_loudness(audio_file_full_path, target_loudness=target_loudness, target_path=audio_file_full_path)
    except Exception as e:
        print(f"Failed to normalize audio file: {audio_file_full_path} ({e})")
        

In [None]:
# @title (Optional) Chinese Emotion Classification in Audio
# Warning: the current version only supports Chinese audio and does not update the audio file names inside the list file, so use with caution
# Warning: based on emotion2vec; results may be inaccurate (and will be inaccurate for singing material)
%cd /content/AudioLabeling
import os

from src.emotion_recognition.audio2emotion import Audio2Emotion
from tools.my_utils import scan_audios_walk

try:
    print(f"models_path: {models_path}")
except NameError:
    models_path = ""
    
try:
    print(f"source_path: {output_path}")
    source_path = output_path
except NameError:
    source_path = "Output/sliced_audio/test/"

audio_list = scan_audios_walk(source_path)
with Audio2Emotion(models_path=models_path) as a2e:
    for audio_file in audio_list:
        audio_file_full_path = os.path.join(source_path, audio_file)
        emotion = a2e.get_emotion(audio_file_full_path)
        emotion = emotion.split('/')[0]
        # Prefix the file name with the emotion label: "<emotion>#<name><ext>"
        filename, ext = os.path.splitext(os.path.basename(audio_file))
        new_file_full_path = os.path.join(os.path.dirname(audio_file_full_path), f"{emotion}#{filename}{ext}")
        os.rename(audio_file_full_path, new_file_full_path)

### Pack and Download

In [None]:
# @title Pack into a zip file
%cd /content/AudioLabeling
import os
import time
import zipfile  # standard library, no pip install needed

try:
    print(f"source_path: {output_path}")
    source_path = output_path
except NameError:
    source_path = "Output/sliced_audio/test/"

if source_path.endswith('/'):
    source_path = source_path[:-1]

file_name = os.path.basename(source_path) + f"_{time.strftime('%Y%m%d%H%M%S')}"
dir_path = os.path.dirname(source_path)
zip_file_path = os.path.join(dir_path, f"{file_name}.zip")

with zipfile.ZipFile(zip_file_path, 'w') as zipf:
    for root, _, files in os.walk(source_path):
        for file in files:
            file_path = os.path.join(root, file)
            # Store paths relative to source_path inside the archive
            zipf.write(file_path, os.path.relpath(file_path, source_path))
print(f"Packing finished: {zip_file_path}")