LSTM running error ! #41

lqniunjunlper · 2016-09-05T12:56:37Z

HI, all
In quick start demo, I have tested LR, WE+LR, WE+CNN successfully.
But when i run WE+LSTM, the following error occurred:

I0905 12:50:46.881464 24807 Util.cpp:144] commandline: /data11/dis_ml/deeplearning/paddle/bin/../opt/paddle/bin/paddle_trainer --config=trainer_config.lstm.py --save_dir=./output_lstm --trainer_count=16 --log_period=1000 --num_passes=15 --use_gpu=false --show_parameter_stats_period=2000 --test_all_data_in_one_period=1
I0905 12:50:46.881670 24807 Util.cpp:113] Calling runInitFunctions
I0905 12:50:46.881896 24807 Util.cpp:126] Call runInitFunctions done.
[INFO 2016-09-05 12:50:47,468 networks.py:1122] The input order is [word, label]
[INFO 2016-09-05 12:50:47,468 networks.py:1125] The output order is [cost_0]
I0905 12:50:47.492952 24807 Trainer.cpp:169] trainer mode: Normal
I0905 12:50:47.660745 24807 PyDataProvider2.cpp:219] loading dataprovider dataprovider_emb::process
I0905 12:50:47.684976 24807 PyDataProvider2.cpp:219] loading dataprovider dataprovider_emb::process
I0905 12:50:47.685173 24807 GradientMachine.cpp:134] Initing parameters..
I0905 12:50:47.901549 24807 GradientMachine.cpp:141] Init parameters done.
I0905 12:50:48.229571 24813 ThreadLocal.cpp:39] thread use undeterministic rand seed:24814
I0905 12:50:48.229737 24821 ThreadLocal.cpp:39] thread use undeterministic rand seed:24822
I0905 12:50:48.230121 24818 ThreadLocal.cpp:39] thread use undeterministic rand seed:24819
I0905 12:50:48.230481 24814 ThreadLocal.cpp:39] thread use undeterministic rand seed:24815
I0905 12:50:48.230881 24810 ThreadLocal.cpp:39] thread use undeterministic rand seed:24811
I0905 12:50:48.232058 24820 ThreadLocal.cpp:39] thread use undeterministic rand seed:24821
Current Layer forward/backward stack is
LayerName: lstmemory_0
LayerName: fc_layer_0
LayerName: embedding_0
LayerName: word
*** Aborted at 1473079848 (unix time) try "date -d @1473079848" if you are using GNU date ***
I0905 12:50:48.248039 24822 ThreadLocal.cpp:39] thread use undeterministic rand seed:24823
Current Layer forward/backward stack is
PC: @ 0x8024f0 (unknown)
I0905 12:50:48.253355 24811 ThreadLocal.cpp:39] thread use undeterministic rand seed:24812
I0905 12:50:48.254111 24812 ThreadLocal.cpp:39] thread use undeterministic rand seed:24813
I0905 12:50:48.256650 24816 ThreadLocal.cpp:39] thread use undeterministic rand seed:24817
I0905 12:50:48.259268 24823 ThreadLocal.cpp:39] thread use undeterministic rand seed:24824
I0905 12:50:48.260787 24819 ThreadLocal.cpp:39] thread use undeterministic rand seed:24820
I0905 12:50:48.263543 24815 ThreadLocal.cpp:39] thread use undeterministic rand seed:24816
I0905 12:50:48.264271 24808 ThreadLocal.cpp:39] thread use undeterministic rand seed:24809
I0905 12:50:48.265414 24817 ThreadLocal.cpp:39] thread use undeterministic rand seed:24818
I0905 12:50:48.271780 24809 ThreadLocal.cpp:39] thread use undeterministic rand seed:24810

lqniunjunlper · 2016-09-07T06:34:48Z

Before building paddle, "version GLIBC_2.14 no found" occured, so i update glibc from 2.12 to 2.14. Is this OK?

reyoung · 2016-09-07T08:30:54Z

It's very strange that PaddlePaddle didn't print call stack. If you are convenient, can you rebuild PaddlePaddle with flag '-DCMAKE_BUILD_TYPE=Debug', and rerun this training? Or can you give us the core dump files?

And you can refer this link http://stackoverflow.com/questions/17965/how-to-generate-a-core-dump-in-linux-when-a-process-gets-a-segmentation-fault

lqniunjunlper · 2016-09-07T09:28:47Z

@reyoung
I0907 17:26:32.151026 1053 Util.cpp:144] commandline: /data11/dis_ml/deeplearning/paddle/bin/../opt/paddle/bin/paddle_trainer --config=trainer_config.lstm.py --save_dir=./output_lstm --trainer_count=4 --log_period=1000 --num_passes=15 --use_gpu=false --show_parameter_stats_period=2000 --test_all_data_in_one_period=1
I0907 17:26:32.151208 1053 Util.cpp:113] Calling runInitFunctions
I0907 17:26:32.151401 1053 Util.cpp:126] Call runInitFunctions done.
[INFO 2016-09-07 17:26:32,723 networks.py:1122] The input order is [word, label]
[INFO 2016-09-07 17:26:32,723 networks.py:1125] The output order is [cost_0]
I0907 17:26:32.740944 1053 Trainer.cpp:169] trainer mode: Normal
I0907 17:26:32.826501 1053 PyDataProvider2.cpp:219] loading dataprovider dataprovider_emb::process
I0907 17:26:32.856484 1053 PyDataProvider2.cpp:219] loading dataprovider dataprovider_emb::process
I0907 17:26:32.856694 1053 GradientMachine.cpp:134] Initing parameters..
I0907 17:26:33.070418 1053 GradientMachine.cpp:141] Init parameters done.
I0907 17:26:33.346114 1062 ThreadLocal.cpp:39] thread use undeterministic rand seed:1063
I0907 17:26:33.367995 1065 ThreadLocal.cpp:39] thread use undeterministic rand seed:1066
I0907 17:26:33.373780 1064 ThreadLocal.cpp:39] thread use undeterministic rand seed:1065
Current Layer forward/backward stack is
LayerName: lstmemory_0
LayerName: fc_layer_0
LayerName: embedding_0
LayerName: word
*** Aborted at 1473240393 (unix time) try "date -d @1473240393" if you are using GNU date ***
Current Layer forward/backward stack is
PC: @ 0x8024f0 (unknown)
Current Layer forward/backward stack is
*** SIGILL (@0x8024f0) received by PID 1053 (TID 0x7f50fe12e700) from PID 8398064; stack trace: ***
Current Layer forward/backward stack is
@ 0x7f510f76c710 (unknown)
Current Layer forward/backward stack is
@ 0x8024f0 (unknown)
Current Layer forward/backward stack is
@ 0x587470 paddle::LstmCompute::forwardOneSequence<>()
Current Layer forward/backward stack is
@ 0x5879fa paddle::LstmCompute::forwardBatch<>()
Current Layer forward/backward stack is
@ 0x581d4c paddle::LstmLayer::forwardBatch()
Current Layer forward/backward stack is
@ 0x58538a paddle::LstmLayer::forward()
Current Layer forward/backward stack is
@ 0x616d74 paddle::NeuralNetwork::forward()
Current Layer forward/backward stack is
@ 0x6211c6 paddle::TrainerThread::forward()
Current Layer forward/backward stack is
@ 0x623374 paddle::TrainerThread::computeThread()
Current Layer forward/backward stack is
@ 0x7f510e8743d2 execute_native_thread_routine
Current Layer forward/backward stack is
@ 0x7f510f7649d1 start_thread
Current Layer forward/backward stack is
@ 0x7f510e0598fd clone
/data11/dis_ml/deeplearning/paddle/bin/paddle: line 46: 1053 Illegal instruction ${DEBUGGER} $MYDIR/../opt/paddle/bin/paddle_trainer ${@:2}

reyoung · 2016-09-07T09:57:46Z

@NIULQfromNJU Hello, it seems that PaddlePaddle use some CPU instructions that your CPU not support (AVX). Please try to rebuild your PaddlePaddle, disable the AVX support using
-DWITH_AVX=OFF, and rebuild it. That will solve your problem.

There is a TODO in CMake file to automatically select AVX flag depends on machine CPU, but it is still not developed.

Please set -DCMAKE_BUILD_TYPE=Debug -DWITH_AVX=OFF to rebuild PaddlePaddle, make sure there is no error. Then you can set -DCMAKE_BUILD_TYPE=RelWithDebInfo -DWITH_AVX=OFF, and install it to train your model.

lqniunjunlper · 2016-09-07T12:40:03Z

hi @reyoung , i rebuild the paddle with -DWITH_AVX=OFF, and then i run the quick start demo. But I have the same problem as before: LR, WE+LR, WE+CNN run successfully while WE+LSTM aborted. So strange! Is there any other instruction that is not supported by CPU in LSTM example?
The following is the error print:
I0907 20:30:21.711181 10069 Util.cpp:144] commandline: /data11/paddle/pd/bin/../opt/paddle/bin/paddle_trainer --config=trainer_config.lstm.py --save_dir=./output --trainer_count=4 --log_period=20 --num_passes=15 --use_gpu=false --show_parameter_stats_period=100 --test_all_data_in_one_period=1
I0907 20:30:21.711364 10069 Util.cpp:113] Calling runInitFunctions
I0907 20:30:21.711556 10069 Util.cpp:126] Call runInitFunctions done.
[INFO 2016-09-07 20:30:22,156 networks.py:1122] The input order is [word, label]
[INFO 2016-09-07 20:30:22,157 networks.py:1129] The output order is [cost_0]
I0907 20:30:22.174654 10069 Trainer.cpp:169] trainer mode: Normal
I0907 20:30:22.262153 10069 PyDataProvider2.cpp:219] loading dataprovider dataprovider_emb::process
I0907 20:30:22.288261 10069 PyDataProvider2.cpp:219] loading dataprovider dataprovider_emb::process
I0907 20:30:22.288434 10069 GradientMachine.cpp:134] Initing parameters..
I0907 20:30:22.491011 10069 GradientMachine.cpp:141] Init parameters done.
I0907 20:30:22.681430 10100 ThreadLocal.cpp:39] thread use undeterministic rand seed:10101
I0907 20:30:22.683939 10101 ThreadLocal.cpp:39] thread use undeterministic rand seed:10102
I0907 20:30:22.699645 10098 ThreadLocal.cpp:39] thread use undeterministic rand seed:10099
I0907 20:30:22.701810 10099 ThreadLocal.cpp:39] thread use undeterministic rand seed:10100
Current Layer forward/backward stack is
LayerName: lstmemory_0
LayerName: fc_layer_0
LayerName: embedding_0
LayerName: word
*** Aborted at 1473251422 (unix time) try "date -d @1473251422" if you are using GNU date ***
Current Layer forward/backward stack is
PC: @ 0x8024f0 (unknown)
Current Layer forward/backward stack is
*** SIGILL (@0x8024f0) received by PID 10069 (TID 0x7f92afa00700) from PID 8398064; stack trace: ***
Current Layer forward/backward stack is
@ 0x7f92c202d710 (unknown)
Current Layer forward/backward stack is
@ 0x8024f0 (unknown)
Current Layer forward/backward stack is
@ 0x587470 paddle::LstmCompute::forwardOneSequence<>()
Current Layer forward/backward stack is
@ 0x5879fa paddle::LstmCompute::forwardBatch<>()
Current Layer forward/backward stack is
@ 0x581d4c paddle::LstmLayer::forwardBatch()
Current Layer forward/backward stack is
@ 0x58538a paddle::LstmLayer::forward()
Current Layer forward/backward stack is
@ 0x616d74 paddle::NeuralNetwork::forward()
Current Layer forward/backward stack is
@ 0x6211c6 paddle::TrainerThread::forward()
Current Layer forward/backward stack is
@ 0x623374 paddle::TrainerThread::computeThread()
Current Layer forward/backward stack is
@ 0x7f92c11353d2 execute_native_thread_routine
Current Layer forward/backward stack is
@ 0x7f92c20259d1 start_thread
Current Layer forward/backward stack is
@ 0x7f92c091a8fd clone
/data11/paddle/pd/bin/paddle: line 46: 10069 Illegal instruction ${DEBUGGER} $MYDIR/../opt/paddle/bin/paddle_trainer ${@:2}

reyoung · 2016-09-07T14:10:20Z

OK, we locate the problem here. It seems that the lstm layer is use some AVX instructions. We will fix it in few days.

lqniunjunlper · 2016-09-07T17:09:24Z

@reyoung great！

reyoung · 2016-09-08T03:36:54Z

@NIULQfromNJU Please give us your cpu info. just cat /proc/cpuinfo

lqniunjunlper · 2016-09-08T05:05:00Z

@reyoung

processor : 15
vendor_id : GenuineIntel
cpu family : 6
model : 44
model name : Intel(R) Xeon(R) CPU E5620 @ 2.40GHz
stepping : 2
cpu MHz : 2401.000
cache size : 12288 KB
physical id : 0
siblings : 8
core id : 10
cpu cores : 4
apicid : 21
initial apicid : 21
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm dca sse4_1 sse4_2 popcnt aes lahf_lm ida arat tpr_shadow vnmi flexpriority ept vpid
bogomips : 4800.24
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management:

reyoung · 2016-09-09T05:33:35Z

@NIULQfromNJU the code, that will fix this error, is under review. #51

reyoung · 2016-09-09T06:45:46Z

@NIULQfromNJU The fix code is merge into master branch. Please checkout and lstm should be ok now.

lqniunjunlper · 2016-09-09T06:55:46Z

@reyoung Well done! Now updated paddle can run lstm successfully! thx~

reyoung · 2016-09-09T07:30:33Z

@NIULQfromNJU You're welcome.

If there is anything I can help, don't hesitate to ask.

Thank you for your attention.

add lite_engine_test and lite_engine_op_test

release some image classification models

…ble (PaddlePaddle#41) 1. always convert numpy array to paddle.Tensor to avoid comparing numpy dtype with paddle dtype. 2. promote floating point tensor to complex tensor ior fft_c2c and fft_c2r; 3. fix unittest to catch UnImplementedError and RuntimeError; 4. fix compile error by avoid using thrust when cuda is not available. 5. fix sample code, use paddle.fft instead of paddle.tensor.fft

* 1. add interface for fft; 2. add data type predicate; 3. fix paddle.roll. * add fft c2c cufft kernel * implement argument checking & op calling parts for fft_c2c and fftn_c2c * add operator and opmaker definitions * only register float and double for cpu. * add common code for implementing FFT, add pocketfft as a dependency * add fft c2c cufft kernel function * fix bugs in python interface * add support for c2r, r2c operators, op makers, kernels and kernel functors. * test and fix bugs * 1. fft_c2c function: add support for onesided=False; 2. add complex<float>, complex<double> support for concat and flip. * 1. fft: fix python api bugs; 2. shape_op: add support for complex data types. * fft c2c cufft kernel done with complie and link * fix shape_op, add mkl placeholder * remove mkl * complete fft c2c in gpu * 1. implement mkl-based fft, FFTC2CFunctor and common function exec_fft; 2. change the design, add input and output typename as template parameter for all FFTFunctors, update pocketfft-based implementation. * complete fft c2c on gpu in ND * complete fft c2c on gpu in ND * complete fft c2c backward in ND * fix MKL-based implementation * Add frame op and CPU/GPU kernels. * Add frame op forward unittest. * Add frame op forward unittest. * Remove axis parameter in FrameFunctor. * Add frame op grad CPU/GPU kernels and unittest. * Add frame op grad CPU/GPU kernels and unittest. * Update doc string. * Update after review and remove librosa requirement in unittest. * Update grad kernel. * add fft_c2r op * Remove data allocation in TransCompute function. * add fft r2c onesided with cpu(pocketfft/mkl) and gpu * last fft c2r functor * fix C2R and R2C for cufft, becase the direction is not an option in these cases. * add fft r2c onesided with cpu(pocketfft/mkl) and gpu * fix bugs in python APIs * fix fft_c2r grad kernal * fix bugs in python APIs * add cuda fft c2r grad kernal functor * clean code * fix fft_c2r python API * fill fft r2c result with conjugate symmetry (#19) fill fft r2c result with conjugate symmetry * add placeholder for unittests (#24) * simple parameterize test function by auto generate test case from parm list (#25) * miscellaneous fixes for python APIs (#26) * add placeholder for unittests * resize fft inputs before computation is n or s is provided. * add complex kernels for pad and pad_grad * simplify argument checking. * add type promotion * add int to float or complex promotion * fix output data type for static mode * fix fft's input dtype dispatch, import fft to paddle * fix typos in axes checking (#27) * fix typos in axes checking * fix argument checking (#28) * fix argument checking * Add C2R Python layer normal and abnormal use cases (#29) * documents and single case * test c2r case * New C2R Python layer normal and exception use cases * complete rfft,rfft2,rfftn,ihfft,ihfft2,ihfftn unittest and doc string (#30) * Documentation of the common interfaces of c2r and c2c (#31) * Documentation of the common interfaces of c2r and c2c * clean c++ code (#32) * clean code * Add numpy-based implementation of spectral ops (#33) * add numpy reference implementation of spectral ops * Add fft_c2r numpy based implementation for unittest. (#34) * add fft_c2r numpy implementation * Add deframe op and stft/istft api. (#23) * Add frame api * Add deframe op and kernels. * Add stft and istft apis. * Add deframe api. Update stft and istft apis. * Fix bug in frame_from_librosa function when input dims >= 3 * Rename deframe to overlap_add. * Update istft. * Update after code review. * Add overlap_add op and stft/istft api unittest (#35) * Add overlap_add op unittest. * Register complex kernels of squeeze/unsquuze op. * Add stft/istft api unittest. * Add unittest for fft helper functions (#36) * add unittests for fft helper functions. add complex kernel for roll op. * complete static graph unittest for all public api (#37) * Unittest of op with FFT C2C, C2R and r2c added (#38) * documents and single case * test c2r case * New C2R Python layer normal and exception use cases * Documentation of the common interfaces of c2r and c2c * Unittest of op with FFT C2C, C2R and r2c added Co-authored-by: lijiaqi <lijiaqi0612@163.com> * add fft related options to CMakeLists.txt * fix typos and clean code (#39) * fix invisible character in mkl branch and fix error in error message * clean code: remove docstring from unittest for signal.py. * always convert numpy array to paddle.Tensor to avoid comparing numpy dtype with paddle dtype. (#40) * always convert numpy array to paddle.Tensor to avoid comparing numpy dtype with paddle dtype. * fix CI Errors: numpy dtype comparison, thrust when cuda is not available (#41) 1. always convert numpy array to paddle.Tensor to avoid comparing numpy dtype with paddle dtype. 2. promote floating point tensor to complex tensor ior fft_c2c and fft_c2r; 3. fix unittest to catch UnImplementedError and RuntimeError; 4. fix compile error by avoid using thrust when cuda is not available. 5. fix sample code, use paddle.fft instead of paddle.tensor.fft * remove inclusion of thrust, add __all__ list for fft (#42) * Add api doc and update unittest. (#43) * Add doc strings. * Update overlap_add op unittest * fix MKL-based FFT implementation (#44) * fix MKL-based FFT implementation, MKL CDFT's FORWARD DOMAIN is always REAL for R2C and C2R * remove code for debug (#45) * use dynload for cufft (#46) * use std::ptrdiff_t as datatype of stride (instead of int64_t) to avoid argument mismatch on some platforms. * add complex support for fill_zeros_like * use dynload for cufft * Update doc and unittest. (#47) * Add doc of frame op and overlap_add op. * Update unittest. * use dynload for cufft (#48) 1. use dynload for cufft 2. fix unittest; 3. temporarily disable Rocm. * fix conflicts and merge upstream (#49) fix conflicts and merge upstream * fix compile error: only link dyload_cuda when cuda is available (#50) * fix compile error: only link dyload_cuda when cuda is available * fix dynload for cufft on windows (#51) 1. fix dynload for cufft on windows; 2. fix unittests. * add NOMINMAX to compile on windows (#52) add NOMINMAX to compile on windows * explicitly specify capture mode for lambdas (#55) explicitly specify capture mode for lambdas * fix fft sample (#53) * fix fft sample * update scipy and numpy version for unittests of fft (#56) update scipy and numpy version for unittests of fft * Add static graph unittests of frame and overlap_add api. (#57) * Remove cache of cuFFT & Disable ONEMKL (#59) 1. replace numpy.fft with scipy.fft as numpy<1.20 not support ortho norm 2. remove cache of cufft plans; 3. enhance error checking. 4. default WITH_ONEMKL to OFF Co-authored-by: jeff41404 <jeff41404@gmail.com> Co-authored-by: root <root@bjyz-sys-gpu-kongming9.bjyz.baidu.com> Co-authored-by: KP <109694228@qq.com> Co-authored-by: lijiaqi <lijiaqi0612@163.com> Co-authored-by: Xiaoxu Chen <chenxx_id@163.com> Co-authored-by: lijiaqi0612 <33169170+lijiaqi0612@users.noreply.github.com>

* 1. add interface for fft; 2. add data type predicate; 3. fix paddle.roll. * add fft c2c cufft kernel * implement argument checking & op calling parts for fft_c2c and fftn_c2c * add operator and opmaker definitions * only register float and double for cpu. * add common code for implementing FFT, add pocketfft as a dependency * add fft c2c cufft kernel function * fix bugs in python interface * add support for c2r, r2c operators, op makers, kernels and kernel functors. * test and fix bugs * 1. fft_c2c function: add support for onesided=False; 2. add complex<float>, complex<double> support for concat and flip. * 1. fft: fix python api bugs; 2. shape_op: add support for complex data types. * fft c2c cufft kernel done with complie and link * fix shape_op, add mkl placeholder * remove mkl * complete fft c2c in gpu * 1. implement mkl-based fft, FFTC2CFunctor and common function exec_fft; 2. change the design, add input and output typename as template parameter for all FFTFunctors, update pocketfft-based implementation. * complete fft c2c on gpu in ND * complete fft c2c on gpu in ND * complete fft c2c backward in ND * fix MKL-based implementation * Add frame op and CPU/GPU kernels. * Add frame op forward unittest. * Add frame op forward unittest. * Remove axis parameter in FrameFunctor. * Add frame op grad CPU/GPU kernels and unittest. * Add frame op grad CPU/GPU kernels and unittest. * Update doc string. * Update after review and remove librosa requirement in unittest. * Update grad kernel. * add fft_c2r op * Remove data allocation in TransCompute function. * add fft r2c onesided with cpu(pocketfft/mkl) and gpu * last fft c2r functor * fix C2R and R2C for cufft, becase the direction is not an option in these cases. * add fft r2c onesided with cpu(pocketfft/mkl) and gpu * fix bugs in python APIs * fix fft_c2r grad kernal * fix bugs in python APIs * add cuda fft c2r grad kernal functor * clean code * fix fft_c2r python API * fill fft r2c result with conjugate symmetry (#19) fill fft r2c result with conjugate symmetry * add placeholder for unittests (#24) * simple parameterize test function by auto generate test case from parm list (#25) * miscellaneous fixes for python APIs (#26) * add placeholder for unittests * resize fft inputs before computation is n or s is provided. * add complex kernels for pad and pad_grad * simplify argument checking. * add type promotion * add int to float or complex promotion * fix output data type for static mode * fix fft's input dtype dispatch, import fft to paddle * fix typos in axes checking (#27) * fix typos in axes checking * fix argument checking (#28) * fix argument checking * Add C2R Python layer normal and abnormal use cases (#29) * documents and single case * test c2r case * New C2R Python layer normal and exception use cases * complete rfft,rfft2,rfftn,ihfft,ihfft2,ihfftn unittest and doc string (PaddlePaddle#30) * Documentation of the common interfaces of c2r and c2c (PaddlePaddle#31) * Documentation of the common interfaces of c2r and c2c * clean c++ code (PaddlePaddle#32) * clean code * Add numpy-based implementation of spectral ops (PaddlePaddle#33) * add numpy reference implementation of spectral ops * Add fft_c2r numpy based implementation for unittest. (PaddlePaddle#34) * add fft_c2r numpy implementation * Add deframe op and stft/istft api. (#23) * Add frame api * Add deframe op and kernels. * Add stft and istft apis. * Add deframe api. Update stft and istft apis. * Fix bug in frame_from_librosa function when input dims >= 3 * Rename deframe to overlap_add. * Update istft. * Update after code review. * Add overlap_add op and stft/istft api unittest (PaddlePaddle#35) * Add overlap_add op unittest. * Register complex kernels of squeeze/unsquuze op. * Add stft/istft api unittest. * Add unittest for fft helper functions (PaddlePaddle#36) * add unittests for fft helper functions. add complex kernel for roll op. * complete static graph unittest for all public api (PaddlePaddle#37) * Unittest of op with FFT C2C, C2R and r2c added (PaddlePaddle#38) * documents and single case * test c2r case * New C2R Python layer normal and exception use cases * Documentation of the common interfaces of c2r and c2c * Unittest of op with FFT C2C, C2R and r2c added Co-authored-by: lijiaqi <lijiaqi0612@163.com> * add fft related options to CMakeLists.txt * fix typos and clean code (PaddlePaddle#39) * fix invisible character in mkl branch and fix error in error message * clean code: remove docstring from unittest for signal.py. * always convert numpy array to paddle.Tensor to avoid comparing numpy dtype with paddle dtype. (PaddlePaddle#40) * always convert numpy array to paddle.Tensor to avoid comparing numpy dtype with paddle dtype. * fix CI Errors: numpy dtype comparison, thrust when cuda is not available (PaddlePaddle#41) 1. always convert numpy array to paddle.Tensor to avoid comparing numpy dtype with paddle dtype. 2. promote floating point tensor to complex tensor ior fft_c2c and fft_c2r; 3. fix unittest to catch UnImplementedError and RuntimeError; 4. fix compile error by avoid using thrust when cuda is not available. 5. fix sample code, use paddle.fft instead of paddle.tensor.fft * remove inclusion of thrust, add __all__ list for fft (PaddlePaddle#42) * Add api doc and update unittest. (PaddlePaddle#43) * Add doc strings. * Update overlap_add op unittest * fix MKL-based FFT implementation (PaddlePaddle#44) * fix MKL-based FFT implementation, MKL CDFT's FORWARD DOMAIN is always REAL for R2C and C2R * remove code for debug (PaddlePaddle#45) * use dynload for cufft (PaddlePaddle#46) * use std::ptrdiff_t as datatype of stride (instead of int64_t) to avoid argument mismatch on some platforms. * add complex support for fill_zeros_like * use dynload for cufft * Update doc and unittest. (PaddlePaddle#47) * Add doc of frame op and overlap_add op. * Update unittest. * use dynload for cufft (PaddlePaddle#48) 1. use dynload for cufft 2. fix unittest; 3. temporarily disable Rocm. * fix conflicts and merge upstream (PaddlePaddle#49) fix conflicts and merge upstream * fix compile error: only link dyload_cuda when cuda is available (PaddlePaddle#50) * fix compile error: only link dyload_cuda when cuda is available * fix dynload for cufft on windows (PaddlePaddle#51) 1. fix dynload for cufft on windows; 2. fix unittests. * add NOMINMAX to compile on windows (PaddlePaddle#52) add NOMINMAX to compile on windows * explicitly specify capture mode for lambdas (PaddlePaddle#55) explicitly specify capture mode for lambdas * fix fft sample (PaddlePaddle#53) * fix fft sample * update scipy and numpy version for unittests of fft (PaddlePaddle#56) update scipy and numpy version for unittests of fft * Add static graph unittests of frame and overlap_add api. (PaddlePaddle#57) * Remove cache of cuFFT & Disable ONEMKL (PaddlePaddle#59) 1. replace numpy.fft with scipy.fft as numpy<1.20 not support ortho norm 2. remove cache of cufft plans; 3. enhance error checking. 4. default WITH_ONEMKL to OFF Co-authored-by: jeff41404 <jeff41404@gmail.com> Co-authored-by: root <root@bjyz-sys-gpu-kongming9.bjyz.baidu.com> Co-authored-by: KP <109694228@qq.com> Co-authored-by: lijiaqi <lijiaqi0612@163.com> Co-authored-by: Xiaoxu Chen <chenxx_id@163.com> Co-authored-by: lijiaqi0612 <33169170+lijiaqi0612@users.noreply.github.com>

* add unit test for conv inference * add conv training unittest * add SGD and unittest, imporve Adam

…ddlePaddle#41) * add select_device and init_from_ckpt arg * fix distill lstm readme bug * update paddnlp install version in readme

add gpu memory status info

Refine code format

add xpu assign_value

Fix bug in enforce.h for MUSA

avoid repeated exp() calculation

implement FuseFilteredStmtPatterns

qingqing01 assigned reyoung Sep 7, 2016

reyoung closed this as completed Sep 9, 2016

jiweibo pushed a commit to jiweibo/Paddle that referenced this issue Oct 24, 2019

Merge pull request PaddlePaddle#41 from jiweibo/dev/lite_engine

6150ae0

add lite_engine_test and lite_engine_op_test

qingqing01 pushed a commit to qingqing01/Paddle that referenced this issue Apr 30, 2020

Merge pull request PaddlePaddle#41 from LielinJiang/fix-bugs-from-qa

8a312a9

release some image classification models

DemoMoon mentioned this issue Mar 24, 2021

oneDNN 如何能提升DeepSpeech的语音处理性能 #31838

Closed

gglin001 pushed a commit to graphcore/Paddle-fork that referenced this issue Dec 8, 2021

Add SGD and unittest, Improve Adam (PaddlePaddle#41)

4d1b514

* add unit test for conv inference * add conv training unittest * add SGD and unittest, imporve Adam

zmxdream pushed a commit to zmxdream/Paddle that referenced this issue Jul 18, 2022

add runtime inferShape in sequence_pool op (PaddlePaddle#41)

cdf647c

zmxdream pushed a commit to zmxdream/Paddle that referenced this issue Sep 5, 2022

add runtime inferShape in sequence_pool op (PaddlePaddle#41)

5845fd0

niuliling123 pushed a commit to niuliling123/Paddle that referenced this issue Sep 19, 2022

add first order demo (PaddlePaddle#41)

2b570d0

jack603047588 referenced this issue in jack603047588/Paddle Nov 9, 2022

Merge pull request #41 from qingshui/paddlebox

7aee852

add gpu memory status info

marsbzp mentioned this issue Jan 11, 2023

多线程调用C++推理库进行RNN算子崩溃问题！！！！ #49737

Open

qizhaoaoe pushed a commit to qizhaoaoe/Paddle that referenced this issue Mar 3, 2023

align to mxnet (PaddlePaddle#41)

a66a488

zyfncg pushed a commit to zyfncg/Paddle that referenced this issue Oct 17, 2023

Merge pull request PaddlePaddle#41 from zyfncg/drr_pass

c80faa8

Refine code format

lizexu123 pushed a commit to lizexu123/Paddle that referenced this issue Feb 23, 2024

code block use fixed width 590px (PaddlePaddle#41)

bc6188c

zmxdream added a commit to zmxdream/Paddle that referenced this issue Feb 27, 2024

Merge pull request PaddlePaddle#41 from zmxdream/add_assign_value

345905a

add xpu assign_value

hanhaowen-mt pushed a commit to hanhaowen-mt/Paddle that referenced this issue Feb 29, 2024

Merge pull request PaddlePaddle#41 from mthreads/fix_bug

48cc622

Fix bug in enforce.h for MUSA

NKNaN pushed a commit to NKNaN/Paddle that referenced this issue Mar 3, 2024

Merge pull request PaddlePaddle#41 from lifeiteng/master

19547b5

avoid repeated exp() calculation

feifei-111 pushed a commit to feifei-111/Paddle that referenced this issue Mar 9, 2024

Merge pull request PaddlePaddle#41 from tc20042008/xk-cinn-trivalop-fuse

badeae6

implement FuseFilteredStmtPatterns

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

LSTM running error ! #41

LSTM running error ! #41

lqniunjunlper commented Sep 5, 2016

lqniunjunlper commented Sep 7, 2016

reyoung commented Sep 7, 2016

lqniunjunlper commented Sep 7, 2016

reyoung commented Sep 7, 2016

lqniunjunlper commented Sep 7, 2016

reyoung commented Sep 7, 2016

lqniunjunlper commented Sep 7, 2016

reyoung commented Sep 8, 2016

lqniunjunlper commented Sep 8, 2016

reyoung commented Sep 9, 2016 •

edited

reyoung commented Sep 9, 2016

lqniunjunlper commented Sep 9, 2016

reyoung commented Sep 9, 2016

LSTM running error ! #41

LSTM running error ! #41

Comments

lqniunjunlper commented Sep 5, 2016

lqniunjunlper commented Sep 7, 2016

reyoung commented Sep 7, 2016

lqniunjunlper commented Sep 7, 2016

reyoung commented Sep 7, 2016

lqniunjunlper commented Sep 7, 2016

reyoung commented Sep 7, 2016

lqniunjunlper commented Sep 7, 2016

reyoung commented Sep 8, 2016

lqniunjunlper commented Sep 8, 2016

reyoung commented Sep 9, 2016 • edited

reyoung commented Sep 9, 2016

lqniunjunlper commented Sep 9, 2016

reyoung commented Sep 9, 2016

reyoung commented Sep 9, 2016 •

edited