run vgg_16 error #45

Closed · fbiswt opened this issue Sep 7, 2016 · 13 comments
fbiswt commented Sep 7, 2016

I got an error when I ran "train.sh" in demo/image_classification:
I0907 14:35:07.593504 49407 Util.cpp:144] commandline: /home/hadoop/paddle/paddle/build/bin/../opt/paddle/bin/paddle_trainer --config=vgg_16_cifar.py --dot_period=10 --log_period=100 --test_all_data_in_one_period=1 --use_gpu=1 --trainer_count=1 --num_passes=200 --save_dir=./cifar_vgg_model
I0907 14:35:08.002375 49407 Util.cpp:113] Calling runInitFunctions
I0907 14:35:08.002609 49407 Util.cpp:126] Call runInitFunctions done.
[INFO 2016-09-07 14:35:08,043 layers.py:1438] channels=3 size=3072
[INFO 2016-09-07 14:35:08,043 layers.py:1438] output size for conv_0 is 32
[INFO 2016-09-07 14:35:08,044 layers.py:1438] channels=64 size=65536
[INFO 2016-09-07 14:35:08,045 layers.py:1438] output size for conv_1 is 32
[INFO 2016-09-07 14:35:08,046 layers.py:1499] output size for pool_0 is 16_16
[INFO 2016-09-07 14:35:08,046 layers.py:1438] channels=64 size=16384
[INFO 2016-09-07 14:35:08,046 layers.py:1438] output size for conv_2 is 16
[INFO 2016-09-07 14:35:08,047 layers.py:1438] channels=128 size=32768
[INFO 2016-09-07 14:35:08,048 layers.py:1438] output size for conv_3 is 16
[INFO 2016-09-07 14:35:08,049 layers.py:1499] output size for pool_1 is 8_8
[INFO 2016-09-07 14:35:08,049 layers.py:1438] channels=128 size=8192
[INFO 2016-09-07 14:35:08,049 layers.py:1438] output size for conv_4 is 8
[INFO 2016-09-07 14:35:08,051 layers.py:1438] channels=256 size=16384
[INFO 2016-09-07 14:35:08,051 layers.py:1438] output size for conv_5 is 8
[INFO 2016-09-07 14:35:08,052 layers.py:1438] channels=256 size=16384
[INFO 2016-09-07 14:35:08,052 layers.py:1438] output size for conv_6 is 8
[INFO 2016-09-07 14:35:08,053 layers.py:1499] output size for pool_2 is 4_4
[INFO 2016-09-07 14:35:08,054 layers.py:1438] channels=256 size=4096
[INFO 2016-09-07 14:35:08,054 layers.py:1438] output size for conv_7 is 4
[INFO 2016-09-07 14:35:08,055 layers.py:1438] channels=512 size=8192
[INFO 2016-09-07 14:35:08,055 layers.py:1438] output size for conv_8 is 4
[INFO 2016-09-07 14:35:08,056 layers.py:1438] channels=512 size=8192
[INFO 2016-09-07 14:35:08,056 layers.py:1438] output size for conv_9 is 4
[INFO 2016-09-07 14:35:08,058 layers.py:1499] output size for pool_3 is 2_2
[INFO 2016-09-07 14:35:08,058 layers.py:1499] output size for pool_4 is 1_1
[INFO 2016-09-07 14:35:08,060 networks.py:1122] The input order is [image, label]
[INFO 2016-09-07 14:35:08,060 networks.py:1129] The output order is [cost_0]
I0907 14:35:08.067443 49407 Trainer.cpp:169] trainer mode: Normal
I0907 14:35:08.075389 49407 PyDataProvider2.cpp:219] loading dataprovider image_provider::processData
[INFO 2016-09-07 14:35:08,109 image_provider.py:52] Image size: 32
[INFO 2016-09-07 14:35:08,109 image_provider.py:53] Meta path: data/cifar-out/batches/batches.meta
[INFO 2016-09-07 14:35:08,109 image_provider.py:58] DataProvider Initialization finished
I0907 14:35:08.109668 49407 PyDataProvider2.cpp:219] loading dataprovider image_provider::processData
[INFO 2016-09-07 14:35:08,109 image_provider.py:52] Image size: 32
[INFO 2016-09-07 14:35:08,109 image_provider.py:53] Meta path: data/cifar-out/batches/batches.meta
[INFO 2016-09-07 14:35:08,109 image_provider.py:58] DataProvider Initialization finished
I0907 14:35:08.109978 49407 GradientMachine.cpp:134] Initing parameters..
I0907 14:35:08.554066 49407 GradientMachine.cpp:141] Init parameters done.
Current Layer forward/backward stack is
LayerName: batch_norm_10
LayerName: fc_layer_0
LayerName: dropout_0
LayerName: pool_4
LayerName: pool_3
LayerName: batch_norm_9
LayerName: conv_9
LayerName: batch_norm_8
LayerName: conv_8
LayerName: batch_norm_7
LayerName: conv_7
LayerName: pool_2
LayerName: batch_norm_6
LayerName: conv_6
LayerName: batch_norm_5
LayerName: conv_5
LayerName: batch_norm_4
LayerName: conv_4
LayerName: pool_1
LayerName: batch_norm_3
LayerName: conv_3
LayerName: batch_norm_2
LayerName: conv_2
LayerName: pool_0
LayerName: batch_norm_1
LayerName: conv_1
LayerName: batch_norm_0
LayerName: conv_0
LayerName: image
*** Aborted at 1473230129 (unix time) try "date -d @1473230129" if you are using GNU date ***
PC: @ 0x7fb72227a855 (unknown)
*** SIGSEGV (@0x130701aa00) received by PID 49407 (TID 0x7fb7386fe800) from PID 117549568; stack trace: ***
@ 0x7fb737fdf100 (unknown)
@ 0x7fb72227a855 (unknown)
@ 0x8b3350 hl_batch_norm_backward()
@ 0x5d4684 paddle::CudnnBatchNormLayer::backward()
@ 0x620bae paddle::NeuralNetwork::backward()
@ 0x69c95d paddle::TrainerInternal::forwardBackwardBatch()
@ 0x69cf14 paddle::TrainerInternal::trainOneBatch()
@ 0x698350 paddle::Trainer::trainOnePass()
@ 0x69ba47 paddle::Trainer::train()
@ 0x53aea3 main
@ 0x7fb73587bb15 __libc_start_main
@ 0x545b15 (unknown)
/home/hadoop/paddle/paddle/build/bin/paddle: line 46: 49407 Segmentation fault (core dumped) ${DEBUGGER} $MYDIR/../opt/paddle/bin/paddle_trainer ${@:2}
No data to plot. Exiting!

Does anyone know why?

qingqing01 self-assigned this Sep 7, 2016

qingqing01 (Contributor) commented:
@fbiswt Please tell me the versions of your CUDA and cuDNN. Thanks.

fbiswt (Author) commented Sep 7, 2016

@qingqing01 CUDA 7.5, cuDNN v4. Many thanks!

fbiswt (Author) commented Sep 7, 2016

@qingqing01 Should I change the version of cuDNN to v5?

qingqing01 (Contributor) commented:
@fbiswt It works fine when I test with CUDA 7.5 and either cuDNN v4 or cuDNN v5. But my GPU is a Tesla K40; what is your GPU type?

fbiswt (Author) commented Sep 7, 2016

@qingqing01 My GPU is also a K40, but I installed Paddle on CentOS. Could that be the problem?

fbiswt (Author) commented Sep 7, 2016

@qingqing01 When I built Paddle, I hit this problem:

CMake Error at cmake/cudnn.cmake:33 (get_filename_component):
get_filename_component unknown component DIRECTORY
Call Stack (most recent call first):
CMakeLists.txt:51 (include)

so I deleted the line "get_filename_component(CUDNN_LIB_PATH ${CUDNN_LIBRARY} DIRECTORY)" in cudnn.cmake.

fbiswt (Author) commented Sep 7, 2016

@qingqing01 I upgraded CMake to cmake3, which solved the "get_filename_component unknown component DIRECTORY" error, then reinstalled Paddle, but I still get the same segfault.

fbiswt (Author) commented Sep 7, 2016

Can anyone answer this question?

qingqing01 (Contributor) commented Sep 7, 2016

@fbiswt I'm sorry that I can't give you a root-cause fix for this problem right now. In fact, there are two implementations of BatchNorm, so you can try the other one. A simple way: set batch_norm_type="batch_norm" in batch_norm_layer in python/paddle/trainer_config_helpers/layers.py and reinstall Paddle.
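For reference, a minimal sketch of a trainer config that selects the non-cuDNN implementation explicitly. The batch_norm_type value comes from this thread; the surrounding data/conv layers are just illustrative scaffolding, and since the demo builds its network through a helper that does not expose this argument, changing the default in layers.py as suggested above is the practical route:

```python
# Hypothetical config fragment, not the demo's actual vgg_16_cifar.py.
from paddle.trainer_config_helpers import *

img = data_layer(name="image", size=3 * 32 * 32)  # CIFAR-10: 3x32x32
conv = img_conv_layer(input=img, filter_size=3, num_channels=3,
                      num_filters=64, act=LinearActivation())
# With batch_norm_type left unset, the GPU build resolves to the
# cuDNN-backed layer, which is the code path that segfaults here;
# "batch_norm" forces the alternative implementation.
bn = batch_norm_layer(input=conv, act=ReluActivation(),
                      batch_norm_type="batch_norm")
```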

fbiswt (Author) commented Sep 7, 2016

@qingqing01 Thank you so much for helping me solve this problem! Paddle now runs without error!

aaronzs commented Sep 12, 2016

@qingqing01 Thank you!
I can confirm that the same problem happens on Fedora with CUDA 7.5, cuDNN 4, and a Tesla K80 GPU, and that batch_norm_type="batch_norm" works.

hedaoyuan (Contributor) commented:
@aaronzs & @fbiswt
We checked the code: the bug is that Paddle was built against cuDNN v4 RC, not the final cuDNN v4. The root cause is that the cudnnBatchNormalizationBackward API differs between these two versions.
We will fix this bug soon. Until then, using cuDNN 5.0 is the better option.
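If you are unsure which cuDNN build you have, the version macros in cudnn.h can be inspected directly. A minimal sketch, assuming the header sits at the usual CUDA include path (adjust CUDNN_HEADER for your install; very old headers may only define a bare CUDNN_VERSION):

```python
# Print the cuDNN version recorded in the installed header.
import re

CUDNN_HEADER = "/usr/local/cuda/include/cudnn.h"  # assumed location

text = open(CUDNN_HEADER).read()
parts = dict(re.findall(r"#define\s+CUDNN_(MAJOR|MINOR|PATCHLEVEL)\s+(\d+)", text))
if parts:
    print("cuDNN {MAJOR}.{MINOR}.{PATCHLEVEL}".format(**parts))
else:
    # Fall back to the single combined version define used by older headers.
    m = re.search(r"#define\s+CUDNN_VERSION\s+(\d+)", text)
    print("CUDNN_VERSION =", m.group(1) if m else "not found")
```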

hedaoyuan added the Bug label Sep 13, 2016
qingqing01 (Contributor) commented:
We have fixed this problem; the fix is in #71.
