
LstmLayer unit test in test_LayerGrad failed #534

Closed
Haichao-Zhang opened this issue Nov 18, 2016 · 1 comment
@Haichao-Zhang
Contributor

Platform: TitanX (Pascal) + CUDA8.0 + CUDNN 5
Error:
- Single precision: testBatchState failed. Value of: Argument::sumCosts(args); Actual: 1.49012e-07; Expected: 0
- Double precision: CUDA error: too many resources requested for launch
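Aside (illustrative, not Paddle code): the residue 1.49012e-07 is only a few ULPs of 1.0 in single precision (the float32 machine epsilon is 2^-23 ≈ 1.19e-7), which is why a cost difference that is mathematically zero can still fail an exact comparison against 0. A minimal Python sketch of the effect, simulating float32 rounding with the `struct` module:

```python
import struct

def f32(x):
    """Round a Python float to its nearest float32 value."""
    return struct.unpack('f', struct.pack('f', x))[0]

# Sum 0.1 ten times in simulated single precision: the result is
# mathematically 1.0, but the float32 accumulator lands one ULP high.
acc = 0.0
for _ in range(10):
    acc = f32(acc + f32(0.1))

residual = acc - 1.0
print(acc)       # 1.0000001192092896
print(residual)  # ~1.19e-07, the same order as the 1.49012e-07 in the log
```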

-------------complete error message for Single Precision -----------------
18: Test command:
/paddle/build/paddle/gserver/tests/test_LayerGrad
18: Test timeout computed to be: 9.99988e+06
18: I1118 14:55:57.564926 14214 Util.cpp:151] commandline: /paddle/build/paddle/gserver/tests/test_LayerGrad
18: I1118 14:55:58.722831 14214 Util.cpp:126] Calling runInitFunctions
18: I1118 14:55:58.722985 14214 Util.cpp:139] Call runInitFunctions done.
18: [==========] Running 1 test from 1 test case.
18: [----------] Global test environment set-up.
18: [----------] 1 test from Layer
18: [ RUN ] Layer.LstmLayer
18: I1118 14:55:58.723070 14214 LayerGradUtil.cpp:562] layer_type=lstmemory useGpu=0
18: I1118 14:55:58.723346 14214 LayerGradUtil.cpp:596] cost 143.025
18: I1118 14:55:58.723707 14214 LayerGradUtil.cpp:39] lstmemory para_0 step=1.82986e-06 cost1=143.026 cost2=143.024 true_delta=0.00143433 analytic_delta=0.00143025 diff=0.00284948
18: I1118 14:55:58.723858 14214 LayerGradUtil.cpp:39] lstmemory bias step=1e-06 cost1=143.026 cost2=143.024 true_delta=0.00253296 analytic_delta=0.00254425 diff=-0.00443644
18: I1118 14:55:58.724032 14214 LayerGradUtil.cpp:39] lstmemory layer_0 step=8.36381e-05 cost1=143.026 cost2=143.024 true_delta=0.00143433 analytic_delta=0.00143025 diff=0.00284948
18: I1118 14:55:58.742120 14214 LayerGradUtil.cpp:562] layer_type=lstmemory useGpu=0
18: I1118 14:55:58.742269 14214 LayerGradUtil.cpp:596] cost 142.693
18: I1118 14:55:58.742625 14214 LayerGradUtil.cpp:39] lstmemory layer_0 step=3.5546e-05 cost1=142.694 cost2=142.692 true_delta=0.00141907 analytic_delta=0.00142693 diff=-0.00550957
18: I1118 14:55:58.760489 14214 LayerGradUtil.cpp:562] layer_type=lstmemory useGpu=0
18: I1118 14:55:58.760635 14214 LayerGradUtil.cpp:596] cost 163.724
18: I1118 14:55:58.760982 14214 LayerGradUtil.cpp:39] lstmemory para_0 step=1e-06 cost1=163.725 cost2=163.723 true_delta=0.00201416 analytic_delta=0.00201247 diff=0.00083878
18: I1118 14:55:58.761132 14214 LayerGradUtil.cpp:39] lstmemory bias step=1e-06 cost1=163.728 cost2=163.72 true_delta=0.00782776 analytic_delta=0.00782817 diff=-5.2704e-05
18: I1118 14:55:58.761309 14214 LayerGradUtil.cpp:39] lstmemory layer_0 step=3.72065e-05 cost1=163.725 cost2=163.723 true_delta=0.00164795 analytic_delta=0.00163724 diff=0.00654048
18: I1118 14:55:58.761329 14214 LayerGradUtil.cpp:562] layer_type=lstmemory useGpu=0
18: I1118 14:55:58.761485 14214 LayerGradUtil.cpp:596] cost 111.569
18: I1118 14:55:58.761876 14214 LayerGradUtil.cpp:39] lstmemory layer_0 step=6.79645e-06 cost1=111.57 cost2=111.568 true_delta=0.00112152 analytic_delta=0.00111569 diff=0.00522585
18: I1118 14:55:58.761914 14214 LayerGradUtil.cpp:562] layer_type=lstmemory useGpu=1
18: I1118 14:55:58.766914 14214 LayerGradUtil.cpp:596] cost 165.057
18: I1118 14:55:58.768296 14214 LayerGradUtil.cpp:39] lstmemory para_0 step=1e-06 cost1=165.058 cost2=165.055 true_delta=0.00247192 analytic_delta=0.00246707 diff=0.00196716
18: I1118 14:55:58.768846 14214 LayerGradUtil.cpp:39] lstmemory bias step=1e-06 cost1=165.061 cost2=165.052 true_delta=0.00917053 analytic_delta=0.00917125 diff=-7.77858e-05
18: I1118 14:55:58.769419 14214 LayerGradUtil.cpp:39] lstmemory layer_0 step=3.47424e-05 cost1=165.058 cost2=165.056 true_delta=0.00164795 analytic_delta=0.00165057 diff=-0.00158588
18: I1118 14:55:59.262369 14214 LayerGradUtil.cpp:562] layer_type=lstmemory useGpu=1
18: I1118 14:55:59.262892 14214 LayerGradUtil.cpp:596] cost 175.734
18: I1118 14:55:59.264713 14214 LayerGradUtil.cpp:39] lstmemory layer_0 step=7.29514e-05 cost1=175.735 cost2=175.733 true_delta=0.00175476 analytic_delta=0.00175734 diff=-0.00146832
18: I1118 14:55:59.752734 14214 LayerGradUtil.cpp:562] layer_type=lstmemory useGpu=1
18: I1118 14:55:59.753172 14214 LayerGradUtil.cpp:596] cost 120.313
18: I1118 14:55:59.754591 14214 LayerGradUtil.cpp:39] lstmemory para_0 step=2.79576e-06 cost1=120.314 cost2=120.312 true_delta=0.00120544 analytic_delta=0.00120313 diff=0.00192272
18: I1118 14:55:59.755139 14214 LayerGradUtil.cpp:39] lstmemory bias step=1e-06 cost1=120.314 cost2=120.312 true_delta=0.00147247 analytic_delta=0.00147154 diff=0.000633126
18: I1118 14:55:59.755717 14214 LayerGradUtil.cpp:39] lstmemory layer_0 step=9.24628e-05 cost1=120.314 cost2=120.312 true_delta=0.00119781 analytic_delta=0.00120313 diff=-0.00441846
18: I1118 14:55:59.755755 14214 LayerGradUtil.cpp:562] layer_type=lstmemory useGpu=1
18: I1118 14:55:59.756124 14214 LayerGradUtil.cpp:596] cost 135.085
18: I1118 14:55:59.757428 14214 LayerGradUtil.cpp:39] lstmemory layer_0 step=3.4552e-05 cost1=135.086 cost2=135.084 true_delta=0.00135803 analytic_delta=0.00135085 diff=0.00531571
18: I1118 14:55:59.757470 14214 LayerGradUtil.cpp:562] layer_type=lstmemory useGpu=1
18: I1118 14:55:59.757930 14214 LayerGradUtil.cpp:596] cost 13.3
18: I1118 14:55:59.759021 14214 LayerGradUtil.cpp:39] lstmemory para_0 step=4.47501e-06 cost1=13.3001 cost2=13.3 true_delta=0.000132561 analytic_delta=0.000133 diff=-0.00330459
18: I1118 14:55:59.759539 14214 LayerGradUtil.cpp:39] lstmemory bias step=1e-06 cost1=13.3001 cost2=13.2999 true_delta=0.000185013 analytic_delta=0.000185087 diff=-0.00040168
18: I1118 14:55:59.760105 14214 LayerGradUtil.cpp:39] lstmemory layer_0 step=2.3728e-05 cost1=13.3001 cost2=13.3 true_delta=0.000132561 analytic_delta=0.000133 diff=-0.0033047
18: /paddle/paddle/gserver/tests/LayerGradUtil.cpp:236: Failure
18: Value of: Argument::sumCosts(args)
18: Actual: -2.98023e-08
18: Expected: 0
18: testBatchState failed
18: I1118 14:55:59.760807 14214 LayerGradUtil.cpp:562] layer_type=lstmemory useGpu=1
18: I1118 14:55:59.761165 14214 LayerGradUtil.cpp:596] cost 19.1121
18: I1118 14:55:59.762101 14214 LayerGradUtil.cpp:39] lstmemory layer_0 step=8.1007e-05 cost1=19.1122 cost2=19.112 true_delta=0.000192642 analytic_delta=0.000191121 diff=0.00795843
18: /paddle/paddle/gserver/tests/LayerGradUtil.cpp:236: Failure
18: Value of: Argument::sumCosts(args)
18: Actual: 1.49012e-07
18: Expected: 0
18: testBatchState failed
18: [ FAILED ] Layer.LstmLayer (1039 ms)
18: [----------] 1 test from Layer (1039 ms total)
18:
18: [----------] Global test environment tear-down
18: [==========] 1 test from 1 test case ran. (1039 ms total)
18: [ PASSED ] 0 tests.
18: [ FAILED ] 1 test, listed below:
18: [ FAILED ] Layer.LstmLayer
18:
18: 1 FAILED TEST
1/1 Test #18: test_LayerGrad ...................***Failed 2.69 sec

0% tests passed, 1 tests failed out of 1

Total Test time (real) = 2.70 sec

The following tests FAILED:
18 - test_LayerGrad (Failed)
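Aside: each LayerGradUtil.cpp:39 line above pairs a numerically measured change in the cost (true_delta = cost1 - cost2, from perturbing one parameter or input by step) against the change predicted from backpropagated gradients (analytic_delta), with diff as their relative disagreement. This is a standard finite-difference gradient check; a minimal generic sketch with a toy cost function (names and formulas here are illustrative, not Paddle's LayerGradUtil implementation):

```python
# Toy finite-difference gradient check (illustrative; the cost function
# and variable names are made up, not Paddle's code).

def cost(w):
    return w ** 3                  # toy scalar cost

def grad(w):
    return 3 * w ** 2              # its analytic (backprop) gradient

w, step = 1.5, 1e-6
cost1 = cost(w + step)             # cost with parameter perturbed up
cost2 = cost(w - step)             # cost with parameter perturbed down
true_delta = cost1 - cost2         # numerically measured change
analytic_delta = 2 * step * grad(w)  # change predicted by the gradient
diff = (true_delta - analytic_delta) / analytic_delta
print(diff)                        # tiny when the gradient is correct
```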

-------------complete error message for Double Precision -------------

Constructing a list of tests
Done constructing a list of tests
Checking test dependency graph...
Checking test dependency graph end
test 18
Start 18: test_LayerGrad

18: Test command: /paddle/build/paddle/gserver/tests/test_LayerGrad
18: Test timeout computed to be: 9.99988e+06
18: I1118 14:40:51.643263 6255 Util.cpp:151] commandline: /paddle/build/paddle/gserver/tests/test_LayerGrad
18: I1118 14:40:53.166174 6255 Util.cpp:126] Calling runInitFunctions
18: I1118 14:40:53.166334 6255 Util.cpp:139] Call runInitFunctions done.
18: [==========] Running 1 test from 1 test case.
18: [----------] Global test environment set-up.
18: [----------] 1 test from Layer
18: [ RUN ] Layer.LstmLayer
18: I1118 14:40:53.166437 6255 LayerGradUtil.cpp:562] layer_type=lstmemory useGpu=0
18: I1118 14:40:53.166791 6255 LayerGradUtil.cpp:596] cost 143.025
18: I1118 14:40:53.167201 6255 LayerGradUtil.cpp:39] lstmemory para_0 step=1.82986e-06 cost1=143.026 cost2=143.024 true_delta=0.00143025 analytic_delta=0.00143025 diff=-9.18081e-11
18: I1118 14:40:53.167361 6255 LayerGradUtil.cpp:39] lstmemory bias step=1e-06 cost1=143.026 cost2=143.024 true_delta=0.00254425 analytic_delta=0.00254425 diff=-5.88511e-11
18: I1118 14:40:53.167551 6255 LayerGradUtil.cpp:39] lstmemory layer_0 step=8.36381e-05 cost1=143.026 cost2=143.024 true_delta=0.00143025 analytic_delta=0.00143025 diff=-7.19365e-11
18: I1118 14:40:53.188534 6255 LayerGradUtil.cpp:562] layer_type=lstmemory useGpu=0
18: I1118 14:40:53.188704 6255 LayerGradUtil.cpp:596] cost 142.693
18: I1118 14:40:53.189092 6255 LayerGradUtil.cpp:39] lstmemory layer_0 step=3.5546e-05 cost1=142.694 cost2=142.692 true_delta=0.00142693 analytic_delta=0.00142693 diff=-6.20659e-10
18: I1118 14:40:53.210155 6255 LayerGradUtil.cpp:562] layer_type=lstmemory useGpu=0
18: I1118 14:40:53.210330 6255 LayerGradUtil.cpp:596] cost 163.724
18: I1118 14:40:53.210714 6255 LayerGradUtil.cpp:39] lstmemory para_0 step=1e-06 cost1=163.725 cost2=163.723 true_delta=0.00201247 analytic_delta=0.00201247 diff=1.06478e-10
18: I1118 14:40:53.210867 6255 LayerGradUtil.cpp:39] lstmemory bias step=1e-06 cost1=163.728 cost2=163.72 true_delta=0.00782817 analytic_delta=0.00782817 diff=-2.30781e-10
18: I1118 14:40:53.211063 6255 LayerGradUtil.cpp:39] lstmemory layer_0 step=3.72065e-05 cost1=163.725 cost2=163.723 true_delta=0.00163724 analytic_delta=0.00163724 diff=6.51079e-11
18: I1118 14:40:53.211086 6255 LayerGradUtil.cpp:562] layer_type=lstmemory useGpu=0
18: I1118 14:40:53.211341 6255 LayerGradUtil.cpp:596] cost 111.569
18: I1118 14:40:53.211750 6255 LayerGradUtil.cpp:39] lstmemory layer_0 step=6.79646e-06 cost1=111.57 cost2=111.568 true_delta=0.00111568 analytic_delta=0.00111569 diff=-6.52535e-06
18: I1118 14:40:53.211772 6255 LayerGradUtil.cpp:562] layer_type=lstmemory useGpu=1
18: I1118 14:40:53.217054 6255 LayerGradUtil.cpp:596] cost 164.934
18: F1118 14:40:53.217434 6255 hl_cuda_sequence.cu:447] Check failed: cudaSuccess == err (0 vs. 7) [hl_sequence2batch_add failed] CUDA error: too many resources requested for launch
18: *** Check failure stack trace: ***
18: @ 0x7f54aef38daa (unknown)
18: @ 0x7f54aef38ce4 (unknown)
18: @ 0x7f54aef386e6 (unknown)
18: @ 0x7f54aef3b687 (unknown)
18: @ 0x8a312f hl_sequence2batch_add()
18: @ 0x5e5a3b paddle::SequenceToBatch::sequence2BatchAdd()
18: @ 0x595e59 paddle::LstmLayer::backwardBatch()
18: @ 0x5964d7 paddle::LstmLayer::backward()
18: @ 0x556171 paddle::testLayerGradKernel()
18: @ 0x5570eb paddle::testLayerGrad()
18: @ 0x54e1fd Layer_LstmLayer_Test::TestBody()
18: @ 0x8d8b96 testing::internal::HandleSehExceptionsInMethodIfSupported<>()
18: @ 0x8d3d4c testing::internal::HandleExceptionsInMethodIfSupported<>()
18: @ 0x8c1039 testing::Test::Run()
18: @ 0x8c17b4 testing::TestInfo::Run()
18: @ 0x8c1d74 testing::TestCase::Run()
18: @ 0x8c69ce testing::internal::UnitTestImpl::RunAllTests()
18: @ 0x8da0d0 testing::internal::HandleSehExceptionsInMethodIfSupported<>()
18: @ 0x8d4c64 testing::internal::HandleExceptionsInMethodIfSupported<>()
18: @ 0x8c57cb testing::UnitTest::Run()
18: @ 0x543ab4 main
18: @ 0x7f54ae364f45 (unknown)
18: @ 0x54dcf8 (unknown)
18: @ (nil) (unknown)
1/1 Test #18: test_LayerGrad ...................***Exception: Other 2.10 sec

0% tests passed, 1 tests failed out of 1

Total Test time (real) = 2.11 sec

The following tests FAILED:
18 - test_LayerGrad (OTHER_FAULT)

hedaoyuan assigned emailweixu and hedaoyuan and unassigned emailweixu Nov 19, 2016
@hedaoyuan
Contributor

We currently do not have a TitanX (Pascal) card to reproduce this issue.
In the cmake/flags.cmake file, the default setting is -gencode arch=compute_60,code=sm_60; you could modify it to -gencode arch=compute_61,code=sm_61 and then try again.
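A sketch of that change (the surrounding line in cmake/flags.cmake is paraphrased and may differ by Paddle version; compute capability 6.1 corresponds to Pascal-generation cards such as the TitanX Pascal, while sm_60 targets GP100):

```diff
 # cmake/flags.cmake (illustrative context, not an exact excerpt)
-list(APPEND CUDA_NVCC_FLAGS -gencode arch=compute_60,code=sm_60)
+list(APPEND CUDA_NVCC_FLAGS -gencode arch=compute_61,code=sm_61)
```

After editing, re-run cmake and rebuild test_LayerGrad so the kernels are compiled for the matching architecture; resource limits such as registers per block can differ between architectures, which is one way a launch can fail with "too many resources requested for launch" on hardware the flags were not tuned for.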

gglin001 added a commit to graphcore/Paddle-fork that referenced this issue Mar 28, 2022
* restore code

* rm ipu_strategy.check()
AnnaTrainingG pushed a commit to AnnaTrainingG/Paddle that referenced this issue Sep 19, 2022