
Batch norm with use_global_stats=True at inference time raises a CUDNN_STATUS_NOT_SUPPORTED error #929

Closed
lcy-seso opened this issue Dec 16, 2016 · 5 comments

@lcy-seso
Contributor

lcy-seso commented Dec 16, 2016

For a text classification task I want to use a CNN together with batch norm. The relevant configuration snippet is:

Layer(name=name + 'context1',
      type="mixed",
      bias=False,
      inputs=ContextProjection(input_name,
                               context_start=0,
                               context_length=window_size,
                               trainable_padding=False))

Layer(name=name + 'conv0',
      type="mixed",
      size=size,
      active_type="linear",
      bias=Bias(initial_std=1e-1,
                initial_mean=0,
                is_static=static,
                learning_rate=lr),
      inputs=[FullMatrixProjection(name + "context1",
                                   initial_std=2e-2,
                                   is_static=static,
                                   learning_rate=lr)])

Layer(name=name + 'batch_norm0',
      type='batch_norm',
      active_type="relu",
      use_global_stats=True,
      bias=Bias(initial_mean=0.1,
                initial_std=0,
                is_static=static,
                learning_rate=lr),
      inputs=Input(name + 'conv0',
                   initial_mean=1.0,
                   initial_std=0.0,
                   is_static=static,
                   learning_rate=lr,
                   image=Image(channels=size, img_size=1)))

Training on GPU works fine. At test time I want to keep use_global_stats=True so that the running averages of the mean and std saved during training are used, but I get the following error:

[screenshot: CUDNN_STATUS_NOT_SUPPORTED error log]

@qingqing01
Contributor

@lcy-seso I copied your environment and tested with the latest develop branch and cuDNN v5.1. I could not reproduce the problem; testing runs fine on my side.

@qingqing01
Contributor

qingqing01 commented Dec 19, 2016

After some debugging, I believe this is a bug in the cuDNN API cudnnBatchNormalizationForwardInference: the call fails whenever shape[0] of the 4-D input tensor is larger than 1024. That dimension corresponds to the batch size.

In sequence models it corresponds to the total number of words in a mini-batch, which easily exceeds 1024, so the error is easy to trigger. We will fix this as soon as possible.

Also, the error code CUDNN_STATUS_NOT_SUPPORTED is not listed in the documentation for this API.
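Until a fix lands, one possible workaround follows from the fact that with use_global_stats=True the normalization is just a fixed per-feature affine transform, so it can be applied to the batch in chunks of at most 1024 rows without changing the result. Below is a minimal NumPy sketch of that idea, not PaddlePaddle API; the helper name, the (N, C) layout, the epsilon and the max_rows value are illustrative assumptions.

import numpy as np

def batch_norm_inference(x, mean, var, gamma, beta, eps=1e-5, max_rows=1024):
    # Inference-mode batch norm with fixed (global) statistics.
    # x: (N, C) array of activations; mean/var/gamma/beta: (C,) arrays holding
    # the saved running statistics and the learned scale/shift.
    # Processing at most `max_rows` rows at a time stays under the reported
    # cuDNN limit of shape[0] <= 1024.
    scale = gamma / np.sqrt(var + eps)   # per-feature scale
    shift = beta - mean * scale          # per-feature shift
    out = np.empty_like(x)
    for start in range(0, x.shape[0], max_rows):
        out[start:start + max_rows] = x[start:start + max_rows] * scale + shift
    return out

Since each row is normalized independently with the same saved statistics, splitting along the batch dimension gives exactly the same output as a single call.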

@lcy-seso
Contributor Author

Thanks @qingqing01!

@lcy-seso
Contributor Author

We hit this problem again when training the DS2 model.

@kuke
Contributor

kuke commented Aug 2, 2017

This problem can be fixed by upgrading to cuDNN >= 6.0.
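For anyone unsure which cuDNN their process actually loads, one way to check is to call cudnnGetVersion() through ctypes. This is only a sketch: the soname "libcudnn.so" and the MAJOR*1000 + MINOR*100 + PATCH version encoding are assumptions that hold for cuDNN 5/6-era builds and may need adjusting for your install.

import ctypes

# Assumption: adjust the soname/path to your install, e.g. "libcudnn.so.6".
libcudnn = ctypes.CDLL("libcudnn.so")
libcudnn.cudnnGetVersion.restype = ctypes.c_size_t

version = libcudnn.cudnnGetVersion()   # e.g. 6021 for cuDNN 6.0.21
major, rest = divmod(version, 1000)
minor, patch = divmod(rest, 100)
print("cuDNN %d.%d.%d" % (major, minor, patch))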

@pkuyym pkuyym reopened this Aug 7, 2017
wangxicoding pushed a commit to wangxicoding/Paddle that referenced this issue Dec 9, 2021