sgd raises an exception Xbyak::Error, offset is too big #22757

Closed
zhangting2020 opened this issue Feb 25, 2020 · 5 comments · Fixed by #23120

@zhangting2020

  • Compile Paddle from Source Code
cmake .. -DPY_VERSION=2.7 -DWITH_GPU=ON -DWITH_TESTING=OFF -DCMAKE_BUILD_TYPE=Release
  • Running the code below raises an exception:
import paddle.fluid as fluid
import paddle.fluid.compiler as compiler


data = fluid.layers.fill_constant(shape=[1], value=128, dtype='int64')
label = fluid.layers.fill_constant(shape=[1, 150], value=0.5, dtype='float32')
emb = fluid.embedding(input=data, size=(10000000, 150), dtype='float32')
out = fluid.layers.l2_normalize(x=emb, axis=-1)

cost = fluid.layers.square_error_cost(input=out, label=label)
avg_cost = fluid.layers.mean(cost)
sgd_optimizer = fluid.optimizer.SGD(learning_rate=0.001)
sgd_optimizer.minimize(avg_cost)

place = fluid.CPUPlace()
exe = fluid.Executor(place)
exe.run(fluid.default_startup_program())
compiled_prog = compiler.CompiledProgram(fluid.default_main_program())
result = exe.run(compiled_prog, fetch_list=[avg_cost])
W0225 10:00:02.147568 11763 operator.cc:187] sgd raises an exception Xbyak::Error, offset is too big
F0225 10:00:02.147701 11763 exception_holder.h:37] std::exception caught, offset is too big
*** Check failure stack trace: ***
    @     0x7facd362aaed  google::LogMessage::Fail()
    @     0x7facd362ce38  google::LogMessage::SendToLog()
    @     0x7facd362a5fb  google::LogMessage::Flush()
    @     0x7facd362dd0e  google::LogMessageFatal::~LogMessageFatal()
    @     0x7facd5a79e9b  paddle::framework::details::ExceptionHolder::Catch()
    @     0x7facd5b5409b  paddle::framework::details::FastThreadedSSAGraphExecutor::RunOpSync()
    @     0x7facd5b4f232  paddle::framework::details::FastThreadedSSAGraphExecutor::RunOp()
    @     0x7facd5b5121d  _ZNSt17_Function_handlerIFSt10unique_ptrINSt13__future_base12_Result_baseENS2_8_DeleterEEvENS1_12_Task_setterIS0_INS1_7_ResultIvEES3_ESt12_Bind_simpleIFSt17reference_wrapperISt5_BindIFZN6paddle9framework7details28FastThreadedSSAGraphExecutor10RunOpAsyncEPSt13unordered_mapIPNSF_12OpHandleBaseESt6atomicIiESt4hashISJ_ESt8equal_toISJ_ESaISt4pairIKSJ_SL_EEESJ_RKSt10shared_ptrINSE_13BlockingQueueImEEEEUlvE_vEEEvEEvEEE9_M_invokeERKSt9_Any_data
    @     0x7facd3450cee  std::__future_base::_State_baseV2::_M_do_set()
    @     0x7fad28792a99  __pthread_once_slow
    @     0x7facd5b4da4d  _ZNSt17_Function_handlerIFvvEZN10ThreadPool7enqueueIZN6paddle9framework7details28FastThreadedSSAGraphExecutor10RunOpAsyncEPSt13unordered_mapIPNS5_12OpHandleBaseESt6atomicIiESt4hashIS9_ESt8equal_toIS9_ESaISt4pairIKS9_SB_EEES9_RKSt10shared_ptrINS4_13BlockingQueueImEEEEUlvE_JEEESt6futureINSt9result_ofIFT_DpT0_EE4typeEEOSV_DpOSW_EUlvE_E9_M_invokeERKSt9_Any_data
    @     0x7facd34531e4  _ZNSt6thread5_ImplISt12_Bind_simpleIFZN10ThreadPoolC4EmEUlvE_vEEE6_M_runEv
    @     0x7face6a33c80  (unknown)
    @     0x7fad2878b6ba  start_thread
    @     0x7fad284c141d  clone
    @              (nil)  (unknown)
Aborted (core dumped)
  • If I compile Paddle with the command below, the code can run successfully
cmake .. -DPY_VERSION=2.7 -DWITH_GPU=ON -DWITH_TESTING=OFF -DCMAKE_BUILD_TYPE=Release -DWITH_XBYAK=OFF
@grygielski

grygielski commented Feb 26, 2020

I've reproduced this problem on my machine. Does an embedding of this size (emb = fluid.embedding(input=data, size=(10000000, 150), dtype='float32')) exist in real use cases? I ask because if you lower the size to, for example, 1000000, everything works fine. This implementation uses a lot of memory, and XBYAK places some limits on memory addresses.

@zhangting2020

This size may exist in real use cases. Some users want to run the embedding layer on the CPU when it is too large, because running it on the GPU raises an OOM error. Is there a way to work around this XBYAK limitation?

@grygielski

As far as I have analyzed it, this problem occurs inside the AVX instructions when the data offset gets so big that its address exceeds a 32-bit value. It looks like size=(10000000, 150) is too big for this implementation of the algorithm, and there is no way around it (it's over 16 GB of RAM). However, I'm not an expert in assembly language, and maybe some changes to the instructions would be viable to improve memory capabilities.
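
For a rough sense of scale, here is a back-of-the-envelope check of the two embedding sizes mentioned in this thread. It is only a sketch under assumptions: it counts the float32 weight table alone (no gradients or other buffers), and it assumes the generated code needs offsets on the order of the table's byte size that must fit in a signed 32-bit value. The helper names `table_bytes` and `fits_in_int32` are made up for illustration and are not part of Paddle or Xbyak.

```python
# Hypothetical helpers for illustration only; not Paddle or Xbyak APIs.
INT32_MAX = 2**31 - 1   # largest value a signed 32-bit offset can hold
FLOAT32_BYTES = 4       # size of one float32 element


def table_bytes(rows, cols, itemsize=FLOAT32_BYTES):
    """Byte size of a dense rows x cols table of fixed-size elements."""
    return rows * cols * itemsize


def fits_in_int32(nbytes):
    """True if a byte offset this large still fits in a signed 32-bit value."""
    return nbytes <= INT32_MAX


# The two embedding sizes discussed above, both with 150 columns.
for rows in (1000000, 10000000):
    nbytes = table_bytes(rows, 150)
    print(rows, nbytes, fits_in_int32(nbytes))
```

Under these assumptions the 1000000-row table (600000000 bytes) stays below the 32-bit limit, while the 10000000-row table (6000000000 bytes) does not, which matches the sizes that work and fail in this thread.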

@grygielski

@zhangting2020 please take a look at #23120 and test it

@zhangting2020

@grygielski I have tested it and the issue is fixed. Thank you.
