Running the Paddle 0.10.0 v1 API on a cloud GPU machine #2946

Closed
WoNiuHu opened this issue Jul 18, 2017 · 12 comments

@WoNiuHu

WoNiuHu commented Jul 18, 2017

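Hi, as the title says: I am trying to run the v1 API on a cloud machine, but the machine does not support it and fails with: paddle command not found.
I then asked in the Paddle chat group and was advised to download nvidia-docker and the paddle:gpu-release-v0.9.0 image. With that image the CPU version runs, but running on the GPU fails with the following error: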


I0718 02:20:33.320343 36 Util.cpp:155] commandline: /usr/local/bin/../opt/paddle/bin/paddle_trainer --config=trainer_config.py --save_dir=./model_output --job=train --use_gpu=true --trainer_count=1 --num_passes=2 --log_period=10 --dot_period=20 --show_parameter_stats_period=100 --test_all_data_in_one_period=1

I0718 02:22:57.979029 36 Util.cpp:130] Calling runInitFunctions

I0718 02:22:57.980270 36 Util.cpp:143] Call runInitFunctions done.

[INFO 2017-07-18 02:22:59,474 networks.py:1466] The input order is [word, label]
[INFO 2017-07-18 02:22:59,474 networks.py:1472] The output order is [cost_0]
I0718 02:22:59.617970 36 Trainer.cpp:170] trainer mode: Normal

F0718 02:22:59.624567 36 hl_gpu_matrix_kernel.cuh:181] Check failed: cudaSuccess == err (0 vs. 8) [hl_gpu_apply_unary_op failed] CUDA error: invalid device function
*** Check failure stack trace: ***

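Colleagues from IDL then suggested running in a Paddle 0.10.0 environment, since it supports the v1 API. Do v1 scripts have to be executed through the paddle binary? How should Paddle be installed so that the v1 API runs on a GPU machine?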

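@WoNiuHu WoNiuHu changed the title "Running the v1 API on a cloud GPU machine" to "Running the 0.10.0 v1 API on a cloud GPU machine" Jul 18, 2017
@WoNiuHu WoNiuHu changed the title "Running the 0.10.0 v1 API on a cloud GPU machine" to "Running the Paddle 0.10.0 v1 API on a cloud GPU machine" Jul 18, 2017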
@helinwang
Contributor

helinwang commented Jul 18, 2017

@WoNiuHu Hello, and sorry: Paddle 0.10.0 supports the V2 API and is not backward compatible with the v1 API.
Here is what I tried, which confirms this:

v1_api_demo git:(2885) docker run -it -v $PWD:/paddle -e "WITH_GPU=OFF" -e "WITH_AVX=OFF" paddlepaddle/paddle:0.10.0rc2 bash

root@411cb1beef42:/paddle/mnist# ./train.sh 
I0718 21:08:32.183030   298 Util.cpp:160] commandline: /usr/bin/../opt/paddle/bin/paddle_trainer --config=vgg_16_mnist.py --dot_period=10 --log_period=100 --test_all_data_in_one_period=1 --use_gpu=0 --trainer_count=1 --num_passes=100 --save_dir=./mnist_vgg_model 
F0718 21:08:32.298312   298 PythonUtil.cpp:186] Check failed: (module) != nullptr Current PYTHONPATH: ['/usr/opt/paddle/bin', '/paddle/mnist', '/usr/lib/python27.zip', '/usr/lib/python2.7', '/usr/lib/python2.7/plat-linux2', '/usr/lib/python2.7/lib-tk', '/usr/lib/python2.7/lib-old', '/usr/lib/python2.7/lib-dynload', '/usr/lib/python2.7/dist-packages']
Python Error: <type 'exceptions.ImportError'> : No module named paddle.trainer.config_parser
Python Callstack: 
Import paddle.trainer.config_parserError
*** Check failure stack trace: ***
    @           0x8a1d3c  google::LogMessage::Fail()
    @           0x8a1c83  google::LogMessage::SendToLog()
    @           0x8a15f8  google::LogMessage::Flush()
    @           0x8a47ed  google::LogMessageFatal::~LogMessageFatal()
    @           0x7ff47b  paddle::py::import()
    @           0x7ff4ee  paddle::callPythonFuncRetPyObj()
    @           0x7ff8bc  paddle::callPythonFunc()
    @           0x729553  paddle::TrainerConfigHelper::TrainerConfigHelper()
    @           0x729b94  paddle::TrainerConfigHelper::createFromFlags()
    @           0x594932  main
    @     0x7f36f1818b45  __libc_start_main
    @           0x5a2149  (unknown)
    @              (nil)  (unknown)
/usr/bin/paddle: line 109:   298 Aborted                 ${DEBUGGER} $MYDIR/../opt/paddle/bin/paddle_trainer ${@:2}

I will help you create a v0.9.0 CUDA 8 Docker image. It will take some time; I am not very familiar with this area and may need to ask other developers.

@helinwang
Contributor

helinwang commented Jul 18, 2017

0.10.0 supports the V2 API and is not backward compatible with the v1 API. I am closing this issue for now; let's discuss your problem in #2931.

@typhoonzero
Contributor

@helinwang This is a bug in the paddlepaddle/paddle:0.10.0rc2 Docker image, and it has been fixed in rc3. Alternatively, use paddlepaddle/paddle:0.10.0 directly, which is the release version.

@helinwang helinwang reopened this Jul 19, 2017
@helinwang
Contributor

Understood. I tested it, and paddlepaddle/paddle:0.10.0 does indeed support the V1 API.

@helinwang
Contributor

helinwang commented Jul 19, 2017

@WoNiuHu In my test, the paddle command is found:

➜  v1_api_demo git:(2885) ✗ docker run -it paddlepaddle/paddle:0.10.0 bash        
root@482cf1f3cb15:/# paddle
usage: paddle [--help] [<args>]
These are common paddle commands used in various situations:
    train             Start a paddle_trainer
    merge_model       Start a paddle_merge_model
    pserver           Start a paddle_pserver_main
    version           Print paddle version
    dump_config       Dump the trainer config as proto string
    make_diagram      Make Diagram using Graphviz

'paddle train --help' 'paddle merge_model --help', 'paddle pserver --help', list more detailed usage of each command

@WoNiuHu
Author

WoNiuHu commented Jul 20, 2017

@helinwang Does the paddlepaddle/paddle:0.10.0 image support GPU?

@WoNiuHu
Author

WoNiuHu commented Jul 20, 2017

@typhoonzero Hi, could you provide a GPU image that can run the v1 API?

@Yancey1989
Contributor

@WoNiuHu You can use paddlepaddle/paddle:0.10.0-gpu (see https://hub.docker.com/r/paddlepaddle/paddle/tags/ for all PaddlePaddle images).
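
For readers following along, here is a minimal sketch of how the suggested GPU image might be started. It is not taken from the thread and makes several assumptions: the `nvidia-docker` wrapper mentioned in the original question is installed on the host, the current directory contains the v1 demo's `train.sh` and trainer config, and `train.sh` passes `--use_gpu=true` itself. The function names `gpu_run_cmd` and `gpu_run` are hypothetical helpers.

```shell
# Image tag suggested in the reply above.
IMAGE="paddlepaddle/paddle:0.10.0-gpu"

# Print the container command for a given host directory (dry run only).
# Assumption: the host directory is mounted at /paddle, as in the earlier
# CPU test, and train.sh sits at the top of that directory.
gpu_run_cmd() {
  echo nvidia-docker run --rm -it -v "$1:/paddle" -w /paddle "$IMAGE" ./train.sh
}

# Run the same command for real (needs a GPU host with nvidia-docker).
gpu_run() {
  nvidia-docker run --rm -it -v "$1:/paddle" -w /paddle "$IMAGE" ./train.sh
}

# Inspect the command first; call gpu_run "$PWD" once it looks right.
gpu_run_cmd "$PWD"
```

Printing the command before running it keeps the sketch harmless on machines without a GPU; adjust the mount path and script name to match the actual demo layout.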

@typhoonzero
Contributor

The reply above is correct.

@WoNiuHu
Author

WoNiuHu commented Jul 20, 2017

@Yancey1989 Are you sure this image can run the v1 API on GPU? The 0.9.0 image I downloaded earlier works on CPU but fails on GPU.

@Yancey1989
Contributor

The cause of the 0.9.0 GPU error has already been explained in #2931 (comment): the CUDA version in 0.9.0-gpu may be too old. Please try the 0.10.0-gpu image, which is built with CUDA 8.
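
One way an old CUDA build can lead to "invalid device function" is that the binary contains no kernels compiled for the GPU's architecture. The sketch below illustrates that check as a small lookup. The thread states neither the GPU model nor the CUDA version of the 0.9.0 image, so the example call at the end (a Pascal card with a CUDA 7 build) is purely hypothetical, and `min_cuda_for_cc` and `toolkit_ok` are made-up helper names. The table lists only a few compute capabilities and is not exhaustive.

```shell
# First CUDA toolkit major version that can target a given GPU compute
# capability. Deliberately incomplete: unlisted values return "unknown".
min_cuda_for_cc() {
  case "$1" in
    6.0|6.1) echo 8 ;;    # Pascal support arrived with CUDA 8
    7.0)     echo 9 ;;    # Volta support arrived with CUDA 9
    7.5)     echo 10 ;;   # Turing support arrived with CUDA 10
    *)       echo unknown ;;
  esac
}

# toolkit_ok <compute capability> <CUDA major version>
# Returns 0 if the toolkit is new enough, 1 if it is too old,
# 2 if the compute capability is not in the table.
toolkit_ok() {
  need="$(min_cuda_for_cc "$1")"
  if [ "$need" = "unknown" ]; then return 2; fi
  [ "$2" -ge "$need" ]
}

# Hypothetical case: compute capability 6.1 with a build against CUDA 7.
if toolkit_ok 6.1 7; then
  echo "toolkit can target this GPU"
else
  echo "toolkit too old for this GPU"
fi
```

Under these assumptions the check reports the toolkit as too old, which would be consistent with the advice to switch to the image built with CUDA 8.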

@lcy-seso
Contributor

I am closing this issue due to inactivity. Please feel free to reopen it if more information is available.

heavengate pushed a commit to heavengate/Paddle that referenced this issue Aug 16, 2021