Adding WavLM implementation #3242

jiamingkong · 2023-05-15T03:41:16Z

PR types

New features

PR changes

Models

Describe

This PR implements the WavLM model from https://arxiv.org/abs/2110.13900 for speech recognition. On the LibriSpeech clean set, the model fine-tuned from wavlm-base-plus achieves a WER of 6.0%.

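Reproduction results:

On the LibriSpeech clean 100-hour set, without a language model (WER):

| Model | Paper | Paddle reproduction | Torch reproduction |
| --- | --- | --- | --- |
| wavlm-base | 5.7% | 5.8% | 6.8% |
| wavlm-base-plus | 4.7% | 5.6% | - |

The Paddle reproduction of wavlm-base comes close to the paper's result and outperforms the torch implementation.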
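The figures in the table are word error rates. As a reminder of what is measured, here is a minimal, dependency-free sketch of per-utterance WER as word-level Levenshtein distance divided by the number of reference words; it is an illustration, not the recipe's scoring script. Corpus-level WER, as usually reported, sums the edits and the reference words over all utterances instead of averaging per-utterance rates.

```python
def word_error_rate(reference, hypothesis):
    """Per-utterance WER: word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the first i-1 reference words
    # and the first j hypothesis words (row i-1 of the DP table).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            cur[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                cur[j - 1] + 1,      # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = cur
    return prev[-1] / len(ref)
```

For example, `word_error_rate("the cat sat on the mat", "the cat sit on mat")` counts one substitution and one deletion over six reference words, giving 2/6.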

Torch reproduction: https://huggingface.co/patrickvonplaten/wavlm-libri-clean-100h-base-plus

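Alignment results

Trained with the hyperparameters of the torch reproduction above, the Paddle version reaches a WER of about 6.9%, consistent with torch.
To reach the paper's results, we made two changes to the model in the Paddle reproduction:
- Architecture: the WavLM encoder output is followed by a three-layer MLP + BatchNorm + activation function (consistent with the wav2vec2ASR implementation in PaddleSpeech).
- Training: the WavLM weights get their own optimizer with warmup and a smaller learning rate, while the three-layer MLP uses an optimizer with the normal learning rate.

With these two changes, the trained model's WER comes close to the value reported in the paper.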
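The training change can be pictured as two parameter groups with separate learning rates. This is a minimal sketch under stated assumptions: the description does not give the exact schedule, so a linear warmup followed by a constant rate is assumed for the encoder, and the `wavlm.` name prefix used to pick out encoder parameters is hypothetical. In the actual recipe each group would be handed to its own Paddle optimizer.

```python
def split_param_groups(param_names, encoder_prefix="wavlm."):
    """Partition parameter names into (encoder, head) groups by a name prefix.

    The default prefix is a hypothetical example, not the recipe's real naming.
    """
    encoder = [n for n in param_names if n.startswith(encoder_prefix)]
    head = [n for n in param_names if not n.startswith(encoder_prefix)]
    return encoder, head


def group_learning_rates(step, encoder_peak_lr, head_lr, warmup_steps):
    """Return (encoder_lr, head_lr) for a given optimizer step.

    Assumed schedule: the encoder LR ramps linearly from 0 to encoder_peak_lr
    over warmup_steps and then stays constant; the head LR never changes.
    """
    if warmup_steps > 0:
        encoder_lr = encoder_peak_lr * min(1.0, step / warmup_steps)
    else:
        encoder_lr = encoder_peak_lr
    return encoder_lr, head_lr
```

Keeping the encoder rate small and warmed up protects the pretrained WavLM weights early in fine-tuning, while the freshly initialized MLP head can learn at its normal rate from the first step.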

paddle-bot · 2023-05-15T03:41:20Z

Thanks for your contribution!

examples/librispeech/asr5/README.md

paddlespeech/s2t/models/wavlm/processing/signal_processing.py

paddlespeech/s2t/models/wavlm/processing/speech_augmentation.py

zxcd · 2023-05-16T03:25:25Z

conv_layers.py is an empty file; please delete it.

zxcd · 2023-05-16T03:25:48Z

examples/librispeech/asr5/README.md

+Stage 0 also downloads the pre-trained [hubert](https://paddlespeech.bj.bcebos.com/hubert/hubert-large-lv60.pdparams) model.
+```bash
+mkdir -p exp/hubert
+wget -P exp/hubert https://paddlespeech.bj.bcebos.com/hubert/hubert-large-lv60.pdparams


Please change the model link (it still points to the hubert weights).

examples/librispeech/asr5/RESULTS.md

zh794390558 · 2023-05-16T03:39:22Z

examples/librispeech/asr5/avg_model.py

@@ -0,0 +1,18 @@
+#!/usr/bin/env python3


This file can be deleted; the path should be configured in path.sh instead.

zh794390558 · 2023-05-16T03:39:48Z

examples/librispeech/asr5/compute_wer.py

@@ -0,0 +1,558 @@
+# Copyright 2021 Mobvoi Inc. All Rights Reserved.


Please delete this and add a path.sh file instead.

zh794390558 · 2023-05-16T06:13:15Z

examples/librispeech/asr5/format_rsl.py

@@ -0,0 +1,143 @@
+# Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.


examples/librispeech/asr5/test.profile

zh794390558 · 2023-05-16T06:15:02Z

paddlespeech/s2t/exps/wavlm/bin/test.py

+# from paddlespeech.utils.argparse import print_arguments
+import distutils.util
+
+def add_arguments(argname, type, default, help, argparser, **kwargs):


This can reuse the existing function.
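The diff already carries a commented-out `from paddlespeech.utils.argparse import print_arguments`, which likely points at the shared helper the reviewer means. For readers new to this pattern, here is a self-contained sketch of the two helpers using the signatures from the diff; `str2bool` is a hypothetical stand-in for `distutils.util.strtobool`, since `distutils` was removed from the standard library in Python 3.12.

```python
import argparse


def str2bool(value):
    """Parse 'true'/'false'-style strings (hypothetical strtobool replacement)."""
    lowered = value.strip().lower()
    if lowered in ("y", "yes", "t", "true", "on", "1"):
        return True
    if lowered in ("n", "no", "f", "false", "off", "0"):
        return False
    raise argparse.ArgumentTypeError(f"invalid boolean value: {value!r}")


def add_arguments(argname, type, default, help, argparser, **kwargs):
    """Register --argname on argparser; bool options are parsed from strings."""
    # argparse's type=bool would treat any non-empty string (even "false") as True.
    type = str2bool if type is bool else type
    argparser.add_argument(
        "--" + argname,
        default=default,
        type=type,
        help=help + " Default: %(default)s.",
        **kwargs)


def print_arguments(args, info=None):
    """Print every parsed argument, sorted by name."""
    header = info or "Configuration Arguments"
    print(f"----------- {header} -----------")
    for name, value in sorted(vars(args).items()):
        print(f"{name}: {value}")
    print("------------------------------------------------")
```

Usage: `add_arguments("use_gpu", bool, True, "Whether to use GPU.", parser)` followed by `parser.parse_args(["--use_gpu", "false"])` yields `use_gpu=False`.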

zh794390558 · 2023-05-16T06:15:09Z

paddlespeech/s2t/exps/wavlm/bin/test.py

+        help=help + ' Default: %(default)s.',
+        **kwargs)
+
+def print_arguments(args, info=None):


zh794390558 · 2023-05-16T06:15:33Z

paddlespeech/s2t/exps/wavlm/bin/train.py

+
+import distutils.util
+
+def add_arguments(argname, type, default, help, argparser, **kwargs):


zh794390558 · 2023-05-16T06:15:38Z

paddlespeech/s2t/exps/wavlm/bin/train.py

+        help=help + ' Default: %(default)s.',
+        **kwargs)
+
+def print_arguments(args, info=None):


zh794390558 · 2023-05-16T06:16:19Z

paddlespeech/s2t/models/wavlm/modules/activations.py

+}
+
+
+def get_activation(activation_string):


Could this reuse the one in the modules directory?
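The closing `}` in the diff suggests a name-to-function dictionary plus a lookup helper. Here is a self-contained sketch of that registry pattern on plain Python floats (the real module operates on Paddle tensors); the `ACTIVATIONS` name and its entries are hypothetical. Keeping one such registry in the shared modules directory avoids two copies drifting apart.

```python
import math


def _gelu(x):
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def _sigmoid(x):
    # Numerically stable logistic function (avoids overflow in exp for large |x|).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


# Hypothetical registry mapping config strings to scalar activation functions.
ACTIVATIONS = {
    "relu": lambda x: max(0.0, x),
    "gelu": _gelu,
    "tanh": math.tanh,
    "sigmoid": _sigmoid,
}


def get_activation(activation_string):
    """Look up an activation by (case-insensitive) name; unknown names raise."""
    key = activation_string.lower()
    if key not in ACTIVATIONS:
        raise KeyError(
            f"function {activation_string} not found in ACTIVATIONS mapping "
            f"{sorted(ACTIVATIONS)}")
    return ACTIVATIONS[key]
```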

zh794390558 · 2023-05-16T06:16:42Z

paddlespeech/s2t/models/wavlm/modules/conv_layers.py

jiamingkong · 2023-05-21T09:54:26Z

Got it. I'll make all the changes above.

Here are the weights in the meantime:

Link: https://pan.baidu.com/s/1Yjv1rITAWeYv-MjRJD-PPg?pwd=wavl
Extraction code: wavl

zh794390558 · 2023-05-25T06:39:47Z

examples/librispeech/asr5/format_rsl.py can be deleted; it already exists under the utils directory.

jiamingkong · 2023-05-25T06:59:52Z


paddlespeech/s2t/models/wavlm/modules/functional.py

zh794390558 · 2023-05-30T02:31:09Z

paddlespeech/s2t/models/wavlm/modules/functional.py

+    return out
+
+
+def addr(input, vec1, vec2, beta=1, alpha=1, out=None):


What does this compute? Please add a docstring.

Done. This is a helper that computes alpha * (vec1 * vec2.T) + beta * input; it is used in the QK computation of attention.
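Here `vec1 * vec2.T` is the outer product of two 1-D vectors, so the result is out[i][j] = beta * input[i][j] + alpha * vec1[i] * vec2[j]. This matches PyTorch's `torch.addr(input, vec1, vec2, *, beta=1, alpha=1, out=None)`, which the helper's name and signature appear to mirror. A dependency-free sketch on nested lists, with the `out` argument omitted:

```python
def addr(input, vec1, vec2, beta=1, alpha=1):
    """Return beta * input + alpha * outer(vec1, vec2) for nested-list matrices.

    input must have shape (len(vec1), len(vec2)).
    """
    if len(input) != len(vec1) or any(len(row) != len(vec2) for row in input):
        raise ValueError("input must have shape (len(vec1), len(vec2))")
    return [
        [beta * input[i][j] + alpha * vec1[i] * vec2[j] for j in range(len(vec2))]
        for i in range(len(vec1))
    ]
```

With a zero `input` and the defaults, `addr([[0, 0], [0, 0]], [1, 2], [3, 4])` is just the outer product `[[3, 4], [6, 8]]`.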

paddlespeech/s2t/models/wavlm/modules/functional.py

paddlespeech/s2t/models/wavlm/modules/modules.py

zh794390558 · 2023-05-30T02:38:19Z

paddlespeech/s2t/models/wavlm/modules/modules.py

+        # normal_(module.v_proj.weight.data)
+
+
+def quant_noise(module, p, block_size):


Is this used?

Yes. This function is needed to pretrain WavLM from scratch.
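Assuming the helper ports the Quant-Noise technique from fairseq, it wraps a layer so that, during training, contiguous blocks of its weight are randomly dropped and the surviving weights are rescaled. Here is a dependency-free sketch of that core step on a plain weight matrix (rows are output features, blocks run along the input features); the function name is hypothetical, and the real helper operates on a Paddle layer and applies the noise only in training mode.

```python
import random


def apply_block_quant_noise(weight, p, block_size, rng=None):
    """Block-wise weight dropout in the style of Quant-Noise (hypothetical helper).

    weight: list of rows (out_features x in_features). Each row is split into
    contiguous blocks of block_size input columns; each block is zeroed with
    probability p, and surviving weights are scaled by 1 / (1 - p) so every
    weight keeps its expected value.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if rng is None:
        rng = random.Random()
    scale = 1.0 / (1.0 - p)
    noisy = []
    for row in weight:
        if len(row) % block_size != 0:
            raise ValueError("in_features must be a multiple of block_size")
        new_row = []
        for start in range(0, len(row), block_size):
            block = row[start:start + block_size]
            if rng.random() < p:
                new_row.extend([0.0] * block_size)  # drop the whole block
            else:
                new_row.extend(w * scale for w in block)  # keep and rescale
        noisy.append(new_row)
    return noisy
```

Dropping whole blocks rather than single weights is what distinguishes this from ordinary weight dropout: it simulates the error of quantizing a block of weights together, which is why it matters when pretraining with compression in mind.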

paddlespeech/s2t/models/wavlm/wavlm_paddle.py

paddlespeech/s2t/models/wavlm/modules/modules.py

zh794390558

LGTM

zxcd · 2023-05-31T03:59:04Z

examples/librispeech/asr5/conf/preprocessor_config.json

@@ -0,0 +1,9 @@
+{
+  "do_normalize": true,
+  "feature_extractor_type": "Wav2Vec2FeatureExtractor",


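Is this file actually used?

I believe so. The feature extractor reads the audio and subtracts the mean, producing a zero-mean tensor of shape [time * 16000, 1] (time in seconds at a 16 kHz sample rate).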
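With `"do_normalize": true`, Hugging Face's `Wav2Vec2FeatureExtractor` normalizes each utterance to zero mean and unit variance, dividing by `sqrt(var + 1e-7)`, so mean subtraction is the first half of that step. Here is a dependency-free sketch of the per-utterance normalization; whether the Paddle pipeline also applies the variance scaling is an assumption to check against the code, hence the `unit_variance` flag. Wrapping each value in its own list gives the `[time * 16000, 1]` layout described above.

```python
import math


def normalize_waveform(samples, unit_variance=True, eps=1e-7):
    """Zero-mean (and optionally unit-variance) normalization of one utterance."""
    if not samples:
        raise ValueError("empty waveform")
    mean = sum(samples) / len(samples)
    centered = [s - mean for s in samples]
    if not unit_variance:
        return centered
    # Population variance; eps guards against division by zero on silent input.
    var = sum(c * c for c in centered) / len(centered)
    std = math.sqrt(var + eps)
    return [c / std for c in centered]
```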

zxcd · 2023-05-31T06:04:04Z

paddlespeech/s2t/exps/wavlm/bin/test_wav.py

+logger = Log(__name__).getlog()
+
+
+class Wav2vec2Infer():


zxcd · 2023-05-31T06:11:52Z

examples/librispeech/asr5/run.sh

+. ./path.sh || exit 1;
+. ./cmd.sh || exit 1;
+
+gpus=1,2,3


Ideally the GPU indices should start from 0 (e.g. gpus=0,1,2).


zh794390558

LGTM

zh794390558

LGTM

zxcd

LGTM

zh794390558

LGTM


Adding WavLM implementation

3b6651b

paddle-bot bot added contributor status: proposed labels May 15, 2023

mergify bot added S2T asr/st Example README labels May 15, 2023

zxcd reviewed May 16, 2023

examples/librispeech/asr5/README.md (outdated, resolved)

zxcd reviewed May 16, 2023

paddlespeech/s2t/models/wavlm/processing/signal_processing.py (outdated, resolved)

zxcd reviewed May 16, 2023

paddlespeech/s2t/models/wavlm/processing/speech_augmentation.py (outdated, resolved)

zxcd reviewed May 16, 2023

examples/librispeech/asr5/RESULTS.md (outdated, resolved)

zh794390558 reviewed May 16, 2023

Code clean up according to comments in PaddlePaddle#3242

60bd7f2

jiamingkong mentioned this pull request May 22, 2023

[PaddlePaddle Hackathon, 4th Edition] Task Overview PaddlePaddle/Paddle#51281

Closed

jiamingkong added 3 commits May 23, 2023 01:48

Changed the path for the uploaded weight

9ee1205

Adapted wavlmASR model to pretrained weights and CLI

232dcf8

Changed the MD5 of the pretrained tar file due to bug fixes

2ea0075

Deleted examples/librispeech/asr5/format_rsl.py

927c60a

zh794390558 reviewed May 30, 2023

jiamingkong and others added 2 commits May 30, 2023 11:23

Merge branch 'PaddlePaddle:develop' into develop

3ef28de

Code clean up for CIs

0e2068e

zh794390558 reviewed May 30, 2023

paddlespeech/s2t/models/wavlm/modules/modules.py (outdated, resolved)

Fixed the transpose usages ignored before

ba874db

zh794390558 previously approved these changes May 31, 2023

zxcd reviewed May 31, 2023

Final cleaning; Modified SSL/infer.py and README for wavlm inclusion in model options

8432e86

jiamingkong dismissed zh794390558’s stale review via 8432e86 May 31, 2023 07:07

mergify bot added CLI Demo labels May 31, 2023

zh794390558 approved these changes May 31, 2023

zh794390558 previously approved these changes May 31, 2023

updating readme and readme_cn

f8b7d76

jiamingkong dismissed zh794390558’s stale review via f8b7d76 May 31, 2023 12:31

zxcd approved these changes May 31, 2023

zh794390558 approved these changes May 31, 2023

zh794390558 merged commit 2214c0d into PaddlePaddle:develop Jun 1, 2023
1 check passed

luotao1 pushed a commit to luotao1/PaddleSpeech that referenced this pull request Jun 11, 2024

Code clean up according to comments in PaddlePaddle#3242

7a8528f
