Add cpp trainer lib and demo #10681

Merged 33 commits on May 22, 2018

Member

@jacquesqiao jacquesqiao commented May 16, 2018

task list: #10574

@jacquesqiao jacquesqiao changed the title Add cpp trainer lib and demo [wip]Add cpp trainer lib and demo May 16, 2018
@jacquesqiao jacquesqiao changed the title [wip]Add cpp trainer lib and demo Add cpp trainer lib and demo May 16, 2018
@jacquesqiao jacquesqiao requested a review from luotao1 May 17, 2018 05:33
@@ -0,0 +1,64 @@
cmake_minimum_required(VERSION 3.0)
Contributor

For a demo, would it be better to put this under fluid/demo, or perhaps under doc/fluid? That way the official website could showcase the demo later.

Member Author

Done. I moved it to fluid/train/demo. Since it relates to C++ training, I put it under the train directory for now.


set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11")

set(PADDLE_LIB "${PROJECT_SOURCE_DIR}/lib")
Contributor

What is PROJECT_SOURCE_DIR here? Once inference_dist is installed there is only one directory, so PADDLE_LIB alone is enough.

Member Author

done

set(PADDLE_LIB "${PROJECT_SOURCE_DIR}/lib")
set(MATH_TYPE $ENV{LIB_TYPE} CACHE STRING "Choose the Math library type: openblas mkl")

option(WITH_MKLDNN "Compile PaddlePaddle with MKLDNN" OFF)
Contributor

Line 10 could be removed for now. MKLDNN does not perform well at the moment; how about adding it later?

Member Author

I kept it; it is off by default.

add_executable(demo_trainer demo_trainer.cc)

if(MATH_TYPE STREQUAL "mklml")
include_directories("${PADDLE_LIB}/third_party/install/mklml/include")
Contributor

Is line 33 necessary?

Member Author

It was needed when I tested it myself.

### step 1. build paddle lib

```
# option MATH_TYPE=mklml
Contributor

@luotao1 luotao1 May 17, 2018

Line 5 should also list openblas.
Lines 5-6 are not needed to build the paddle lib; they are only needed in step 4, so they should be moved there.

Member Author

done

### step 2. copy lib to this dir

```
cp -r /paddle/src/dir/paddle/fluid/train/lib .
Contributor

Step 2 can be removed. Once the library is installed there is no need to copy it; just add the library path in cmake.

Member Author

done


This will generate two files:
- startup_program: used to init all parameters
- main_program: main logic of the network
Contributor

  • Step 3 only runs after the paddlepaddle whl package has been installed. This should be stated here as well.
  • Does this step print anything when it finishes? If so, include it so users know the step succeeded.

mkdir build
cd build
cmake ..
make
Contributor

The example for this step shows no sample cmake options. See https://github.com/luotao1/fluid_inference_example#inference-example-project for reference.

Member Author

updated

make
cp ../startup_program .
cp ../main_program .
./demo_trainer
Contributor

What output does this produce when it finishes? Please include the result so users know the example ran successfully.

Member Author

done
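
The run step above copies `startup_program` and `main_program` next to the binary, which suggests the trainer reads both serialized files from the working directory. As a rough illustration of that read step, here is a minimal standard-library sketch. The name `ReadBinaryFile` echoes the helper raised later in this review, but the signature and body below are assumptions for illustration, not the PR's actual code.

```cpp
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>

// Hypothetical helper (assumed signature): reads a whole file into a
// std::string as raw bytes, so embedded '\0' bytes are preserved.
std::string ReadBinaryFile(const std::string& path) {
  // Binary mode avoids any newline translation of the serialized bytes.
  std::ifstream in(path, std::ios::in | std::ios::binary);
  if (!in) {
    throw std::runtime_error("cannot open file: " + path);
  }
  // Copy every byte from the stream buffer into the string.
  return std::string(std::istreambuf_iterator<char>(in),
                     std::istreambuf_iterator<char>());
}
```

A caller would presumably pass the resulting string to whatever deserializes the program description; that part depends on the Paddle API and is left out of this sketch.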

### step 4. build demo_trainer and run it.

```
mkdir build
Contributor

This should state that mkdir build is run in the current directory; otherwise users will not know where to create it.
It should also say that the whole demo directory can be placed anywhere.

Member Author

done

}

} // namespace train
} // namespace paddle
Contributor

Since ReadBinaryFile and the load function are in the paddle::train namespace, should they live in the paddle code rather than in the demo code?

Member Author

I plan to clean this up in one pass later.

@@ -0,0 +1,66 @@

Contributor

This should note that it uses the CPU static-library build. The GPU and shared-library builds differ slightly but are similar.

Member Author

Can we clean this up together later? For now the business team only needs the CPU part, so we can let them start using it.

-DWITH_MKL=OFF \
-DWITH_MKLDNN=OFF
make -j8
make -j8 inference_lib_dist
Contributor

@Superjomn If training uses it too, is inference_lib_dist still a good name? Should it be renamed to fluid_lib_dist?

Member Author

I plan to clean this up in one pass later; this demo can be merged as soon as possible.

```
step: 0 loss: 1069.02
step: 1 loss: 1069.02
step: 2 loss: 1069.02
Contributor

WARNING: Logging before InitGoogleLogging() is written to STDERR
W0521 20:55:43.756713 24692 init.cc:84] 'CUDA' is not supported, Please re-compile with WITH_GPU option
W0521 20:55:43.756901 24692 init.cc:100] 'CUDA' is not supported, Please re-compile with WITH_GPU option
step: 0 loss: 58.8651
step: 1 loss: 58.8651
step: 2 loss: 58.8651

This is what I get. Is the loss supposed to be the same at every step?

Member Author

Yes, that is because no optimize op is added. A parameter controls this: https://github.com/PaddlePaddle/Paddle/pull/10681/files#diff-7e8d0736b2aff0b2bc699d05f454e0a3R19
This is a requirement from the business side.

Member Author

With it added, training converges normally.
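
To make the exchange above concrete, here is a toy C++ sketch of why the printed loss never changes when the program has no optimize op: the forward pass runs on every step, but nothing ever updates the parameters. It assumes the same input is fed on each step. The linear model, the data, the learning rate, and the names `Loss` and `Train` are all made up for illustration and have nothing to do with the demo's real network.

```cpp
#include <cstddef>
#include <vector>

// Toy model y = w * x with mean squared error; stands in for the real network.
double Loss(double w, const std::vector<double>& xs,
            const std::vector<double>& ys) {
  double sum = 0.0;
  for (std::size_t i = 0; i < xs.size(); ++i) {
    const double diff = w * xs[i] - ys[i];
    sum += diff * diff;
  }
  return sum / static_cast<double>(xs.size());
}

// Runs `steps` iterations on fixed data and returns the loss seen at each step.
// apply_update == false mimics a program without optimize ops: the loss is
// computed, but the parameter w is never changed.
std::vector<double> Train(bool apply_update, int steps) {
  const std::vector<double> xs = {1.0, 2.0, 3.0};
  const std::vector<double> ys = {2.0, 4.0, 6.0};
  const double learning_rate = 0.05;  // assumed value, small enough to converge
  double w = 0.0;
  std::vector<double> losses;
  for (int step = 0; step < steps; ++step) {
    losses.push_back(Loss(w, xs, ys));
    if (apply_update) {
      // Gradient of the mean squared error with respect to w.
      double grad = 0.0;
      for (std::size_t i = 0; i < xs.size(); ++i) {
        grad += 2.0 * (w * xs[i] - ys[i]) * xs[i];
      }
      grad /= static_cast<double>(xs.size());
      w -= learning_rate * grad;  // plain SGD step
    }
  }
  return losses;
}
```

`Train(false, 3)` yields three identical losses, matching the flat output reported above, while `Train(true, 3)` yields a decreasing sequence, which corresponds to the author's note that training converges once the optimize op is included.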


auto loss_var = scope.Var(loss_name);

for (int i = 0; i < 100; ++i) {
Contributor

Set the loop limit to 10 here; 10 iterations are enough.

Contributor

@luotao1 luotao1 left a comment

LGTM. The demo can be improved further in follow-up work.

@jacquesqiao jacquesqiao merged commit be26b71 into PaddlePaddle:develop May 22, 2018