Adding WavLM implementation #3242
Conversation
Thanks for your contribution!
paddlespeech/s2t/models/wavlm/processing/speech_augmentation.py
Outdated
conv_layers.py is an empty file, please delete it.
examples/librispeech/asr5/README.md
Outdated
Stage 0 also downloads the pre-trained [hubert](https://paddlespeech.bj.bcebos.com/hubert/hubert-large-lv60.pdparams) model.
```bash
mkdir -p exp/hubert
wget -P exp/hubert https://paddlespeech.bj.bcebos.com/hubert/hubert-large-lv60.pdparams
```
Please update the model link.
@@ -0,0 +1,18 @@
#!/usr/bin/env python3
This file can be deleted; the paths should be configured in path.sh instead.
@@ -0,0 +1,558 @@
# Copyright 2021 Mobvoi Inc. All Rights Reserved.
Delete this and add a path.sh file instead.
@@ -0,0 +1,143 @@
# Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
Same as above.
# from paddlespeech.utils.argparse import print_arguments
import distutils.util


def add_arguments(argname, type, default, help, argparser, **kwargs):
You can reuse the existing function.
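For context, the duplicated helper being flagged looks roughly like the sketch below (names mirror the quoted diff; the reviewer's point is that paddlespeech already ships an equivalent under paddlespeech.utils that should be imported instead of copied):

```python
import argparse
import distutils.util


def add_arguments(argname, type, default, help, argparser, **kwargs):
    """Register one CLI argument; bools are parsed via strtobool so
    '--flag true/false' works on the command line."""
    type = distutils.util.strtobool if type == bool else type
    argparser.add_argument(
        "--" + argname,
        default=default,
        type=type,
        help=help + ' Default: %(default)s.',
        **kwargs)


parser = argparse.ArgumentParser()
add_arguments("ngpu", int, 1, "Number of GPUs.", parser)
add_arguments("use_gpu", bool, True, "Run on GPU.", parser)
args = parser.parse_args(["--ngpu", "2", "--use_gpu", "false"])
print(args.ngpu, args.use_gpu)  # 2 0
```

Note that strtobool returns 0/1 rather than a Python bool, which is why the printed value is 0.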
        help=help + ' Default: %(default)s.',
        **kwargs)


def print_arguments(args, info=None):
Same as above.
import distutils.util


def add_arguments(argname, type, default, help, argparser, **kwargs):
Same as above.
        help=help + ' Default: %(default)s.',
        **kwargs)


def print_arguments(args, info=None):
Same as above.
}


def get_activation(activation_string):
Can the version in the modules directory be reused?
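For readers following along: a get_activation helper typically just maps a name to an activation callable, which is why it is a good candidate for reuse from the shared modules directory. A minimal NumPy sketch of the idea (names assumed, not the PR's actual code, which would return paddle.nn layers):

```python
import numpy as np

# Hypothetical string-to-activation registry for illustration only.
ACTIVATIONS = {
    "relu": lambda x: np.maximum(x, 0.0),
    "tanh": np.tanh,
    "gelu": lambda x: 0.5 * x * (1.0 + np.tanh(
        np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3))),
}


def get_activation(activation_string):
    """Look up an activation function by name, raising on unknown names."""
    try:
        return ACTIVATIONS[activation_string]
    except KeyError:
        raise KeyError(f"unknown activation: {activation_string!r}")


relu = get_activation("relu")
print(relu(np.array([-1.0, 2.0])))  # [0. 2.]
```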
Empty file.
Got it, I will make all the changes above. In the meantime, here are the weights: https://pan.baidu.com/s/1Yjv1rITAWeYv-MjRJD-PPg?pwd=wavl
examples/librispeech/asr5/format_rsl.py can be deleted; an equivalent already exists in the utils directory.
Reproduction results: on the LibriSpeech clean 100-hour set, without a language model:
The Paddle reproduction of the wavlm-base model comes close to the paper's results and outperforms the torch implementation. torch reproduction: https://huggingface.co/patrickvonplaten/wavlm-libri-clean-100h-base-plus
Alignment results:
With the two optimizations above, a model with accuracy close to that described in the paper can be trained.
    return out


def addr(input, vec1, vec2, beta=1, alpha=1, out=None):
What does this compute? Please add a docstring.
Done. This is a helper that computes alpha * (vec1 * vec2.T) + beta * input, used for the QK computation in attention.
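To make the reply concrete, here is a NumPy sketch of what such an `addr` helper computes; the PR itself operates on paddle tensors, and this mirrors the semantics of `torch.addr`:

```python
import numpy as np


def addr(input, vec1, vec2, beta=1, alpha=1, out=None):
    """Return beta * input + alpha * outer(vec1, vec2).

    With beta=0 this reduces to the rank-1 outer product used when
    building attention-style score matrices from a query vector and
    a key vector.
    """
    result = beta * np.asarray(input) + alpha * np.outer(vec1, vec2)
    if out is not None:
        np.copyto(out, result)
        return out
    return result


base = np.ones((2, 2))
q = np.array([1.0, 2.0])
k = np.array([3.0, 4.0])
scores = addr(base, q, k)  # [[4. 5.], [7. 9.]]
```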
# normal_(module.v_proj.weight.data)


def quant_noise(module, p, block_size):
Is this actually used anywhere?
Yes, this function is needed if you want to pretrain WavLM from scratch.
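For readers unfamiliar with it: quant_noise implements Quant-Noise, which randomly zeroes contiguous blocks of weights during training so the network becomes robust to later quantization. A hedged NumPy sketch of just the block-masking idea (the fairseq-style function in the diff instead wraps a module with a forward pre-hook; names below are illustrative):

```python
import numpy as np


def quant_noise_mask(weight, p, block_size, rng=None):
    """Zero out random blocks of `block_size` consecutive input weights
    with probability p, rescaling survivors by 1/(1-p), dropout-style."""
    rng = rng if rng is not None else np.random.default_rng(0)
    out_features, in_features = weight.shape
    assert in_features % block_size == 0, "in_features must divide into blocks"
    drop = rng.random((out_features, in_features // block_size)) < p
    drop = np.repeat(drop, block_size, axis=1)  # expand to a per-weight mask
    return np.where(drop, 0.0, weight / (1.0 - p))


w = np.ones((4, 8))
noised = quant_noise_mask(w, p=0.5, block_size=4)
# Each row now consists of blocks of 4 that are either all 0.0 or all 2.0.
```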
LGTM
@@ -0,0 +1,9 @@
{
    "do_normalize": true,
    "feature_extractor_type": "Wav2Vec2FeatureExtractor",
Is this file actually used?
It should be: the feature extractor reads the audio and subtracts its mean, yielding a zero-mean tensor of shape [time * 16000, 1].
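To illustrate the reply: the do_normalize step amounts to zero-mean (and, for Wav2Vec2-style extractors, typically also unit-variance) normalization of the raw waveform. A NumPy sketch, with the shape convention from the comment above assumed rather than verified:

```python
import numpy as np


def normalize_waveform(wave, eps=1e-7):
    """Zero-mean, unit-variance normalization of a 1-D waveform, as done
    by Wav2Vec2-style feature extractors when do_normalize is true."""
    wave = np.asarray(wave, dtype=np.float64)
    return (wave - wave.mean()) / np.sqrt(wave.var() + eps)


# e.g. 2 seconds of 16 kHz audio -> 32000 samples with a DC offset
wave = np.random.default_rng(0).normal(loc=0.3, scale=1.0, size=32000)
normed = normalize_waveform(wave)
print(abs(normed.mean()) < 1e-9)  # True: the offset has been removed
```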
logger = Log(__name__).getlog()


class Wav2vec2Infer():
wav2vec2?
Fixed.
examples/librispeech/asr5/run.sh
Outdated
. ./path.sh || exit 1;
. ./cmd.sh || exit 1;

gpus=1,2,3
It would be better to start from GPU 0.
Fixed.
LGTM
LGTM
LGTM
LGTM
PR types
New features
PR changes
Models
Describe
This PR implements the WavLM model as in https://arxiv.org/abs/2110.13900 for speech recognition. On Librispeech clean set, the model finetuned from wavlm-base-plus has a WER of 6.0%.
Reproduction results:
On the LibriSpeech clean 100-hour set, without a language model:
The Paddle reproduction of the wavlm-base model comes close to the paper's results and outperforms the torch implementation.
torch reproduction: https://huggingface.co/patrickvonplaten/wavlm-libri-clean-100h-base-plus
Alignment results:
With the two optimizations above, a model with accuracy close to that described in the paper can be trained.