
Update my fork #2

Merged
merged 123 commits into from
Sep 17, 2020
Commits (123)
54d8424
[2.0API] fix weight_norm support negative dim and unittest in convert…
ceci3 Sep 8, 2020
ed2f57c
Restore file changes caused by pre-commit (#27105)
Avin0323 Sep 8, 2020
58f3ef9
fix typo for interp_v2,test=develop (#26843)
tink2123 Sep 8, 2020
d6ee086
fix unsqueeze in dygraph (#27107)
zhiqiu Sep 8, 2020
5fb8c92
fix multihead matmul shared params (#27121)
cryoco Sep 8, 2020
1b102dd
optimize the error message for unpooling.cc
kinghuin Sep 8, 2020
dc00bd6
delate some wrong message test=develop, test=document_fix (#26595)
yinhaofeng Sep 8, 2020
5dec254
fix weight (#26986)
seiriosPlus Sep 8, 2020
cbcd5e4
Fix problem that target name already exists when there isn't model da…
Avin0323 Sep 8, 2020
8df5b4d
Add correlation api to contrib (#27015)
LielinJiang Sep 8, 2020
4c70e31
add save load to jit.all (#27131)
chenwhql Sep 8, 2020
f2d68d3
【paddle.fleet】parameter_server_optimizer support auto_strategy (#26838)
123malin Sep 8, 2020
d471016
add timeout unittests retry (#27152)
XieYunshen Sep 8, 2020
753a074
Temporarily turn off WITH_INFERENCE_API_TEST (#27170)
zhwesky2010 Sep 8, 2020
944f8ae
Upgrade coverage tool to python3
chalsliu Sep 8, 2020
0dab0fc
add back triu in fluid (#27135)
yaoxuefeng6 Sep 8, 2020
a28ae86
Enhance ops to support LoD as input for dygraph detection models. (#2…
jerrywgz Sep 8, 2020
eb01976
[2.0 API]Add checker in grid_sample_grad op (#27126)
wanghaoshuang Sep 8, 2020
c7b9d97
fix avg_pool3d count_include_pad as True,test=develop (#27155)
LDOUBLEV Sep 8, 2020
13804ed
Error msg/polish tensor error msg (#26976)
hbwx24 Sep 8, 2020
4d7d661
Fix kl and summary bug (#27132)
LielinJiang Sep 8, 2020
4558d39
fix Norm op error (#26771)
yongqiangma Sep 8, 2020
eb27663
resolve the issue of curl having same exit code with paddle build fai…
XieYunshen Sep 8, 2020
ed29269
optimize the error message for math dir
kinghuin Sep 9, 2020
252aeb1
[Dy2stat]Add naming rule if not specific InputSpec.name (#26997)
Aurelius84 Sep 9, 2020
3497fbe
Use paddle.disable_static() to replace with dygraph.guard(). (#27139)
liym27 Sep 9, 2020
ca6100d
disable ut, fix it @malin (#27200)
seiriosPlus Sep 9, 2020
c60352b
update requirements (#27172)
iducn Sep 9, 2020
43b0445
Add double grad in reduce sum (#27115)
qjing666 Sep 9, 2020
f7d08b7
【paddle.fleet】refine launch and distributed repr string for print (#2…
guru4elephant Sep 9, 2020
c71d79b
[cuda11 support] change the CMakeLists to support the cuda11 (#27124)
wangchaochaohu Sep 9, 2020
3b8f520
add dgc cuda11 support for Paddle (#27204)
wangchaochaohu Sep 9, 2020
a1b640b
Fix test_origin_info to be compatible with PY3.8, because ast module …
liym27 Sep 9, 2020
5d039f4
modified the implement of Lars optimizer (#26733)
JZ-LIANG Sep 9, 2020
edd962b
Add 2.0 inference api doc. (#27125)
jiweibo Sep 9, 2020
7c8e980
fix enforce shell dir, test=document_fix (#27215)
chenwhql Sep 9, 2020
50e60e8
update error info for selected_rows_functor
Steffy-zxf Sep 10, 2020
cc3306f
restruct logsumexp to speed up compiling (#27191)
zhupengyang Sep 10, 2020
40dd563
Decrease test_parallel_executor_crf CI time, test=develop (#27212)
zhhsplendid Sep 10, 2020
d3874ab
Move unittest test_optimizer_in_control_flow from CI multi_cards. (#2…
liym27 Sep 10, 2020
fde5cfe
fix the CudaPinMemory bug for the equal op (#27176)
wawltor Sep 10, 2020
60c3ef3
【paddle.fleet】parameter_server_optimizer support auto_strategy (#27181)
123malin Sep 10, 2020
5bd84b2
revert divide (#27202)
ForFishes Sep 10, 2020
07d089f
Check benchmark issues in CI
chalsliu Sep 10, 2020
5406b01
Refine jit.save implement to adapt InputSpec using cases (#26959)
chenwhql Sep 10, 2020
e005861
[oneDNN]Introducing oneDNN 1.6 (#27137)
jczaja Sep 10, 2020
78446ec
[UT] fix run type of ut test cases of test_train_recognize_digits and…
qili93 Sep 10, 2020
7c7fbd3
fix error msg of fused_embedding_fc_lstm_op, test=develop (#27231)
qili93 Sep 10, 2020
58a88ba
add double grad for expand (#27183)
Sep 10, 2020
c5f957a
add double grad for tile op and expand_v2 op (#27114)
Sep 10, 2020
f6be598
Reduce the parallel compile count (#27187)
zhwesky2010 Sep 10, 2020
ece74c4
Update the _get_fake_quant_type definition in imperative QAT. (#27222)
wzzju Sep 10, 2020
b671538
* Reduce the training iterations in test_fetch_unmerged and test_fuse…
wzzju Sep 10, 2020
2e59769
add empty op (c++, python, unit test) (#26659)
windstamp Sep 11, 2020
1b84c0b
Lite subgraph refine predictor (#27167)
jiweibo Sep 11, 2020
5e0dde0
[Dy2stat] support usage: to_static(model) (#27040)
Aurelius84 Sep 11, 2020
ac8afe1
use structured name in loaded dict (#27242)
chenwhql Sep 11, 2020
3e20ddf
[Dy2Stat - Error Handling] Fix bug and optimize dy2stat error. (#27225)
liym27 Sep 11, 2020
f1ab288
enhance inference error info. (#27251)
jiweibo Sep 11, 2020
33ff833
fix loaded no params layer run error (#27241)
chenwhql Sep 11, 2020
20a8482
fix unused var with zero gradient bug in fluid.gradient (#27246)
Aurelius84 Sep 11, 2020
f402d8d
fix bug when axis is a tensor with more than 1 element (#27263)
zhiqiu Sep 11, 2020
19228bd
Temporally disable zero_copy (#27248)
zhiqiu Sep 11, 2020
7745ad5
Add details to the summary for show more error informations (#27165)
Avin0323 Sep 12, 2020
5c4eed6
Fix GRU mkldnn kernel fail on look_table_v2 (#27198)
lidanqing-intel Sep 12, 2020
5c1bafb
use eval to improve performance, test=develop (#25459)
zhangting2020 Sep 13, 2020
9437ce3
Error description optimize for math dir
joey12300 Sep 14, 2020
255e0cf
error messages of inference/capi, test=develop (#27258)
Shixiaowei02 Sep 14, 2020
cc3f4b8
Add int8 GRU kernel (#27220)
grygielski Sep 14, 2020
2b6a579
remove auto mode from localsgd optimizer (#27237)
ForFishes Sep 14, 2020
d708b21
Update amp_check_finite_and_scale_op and add an updating_loss_scaling…
wzzju Sep 14, 2020
aae41c6
refine error message related to paddle-TRT (#27256)
cryoco Sep 14, 2020
8d53172
move DataLoader._worker_loop to top level (#27247)
chenwhql Sep 14, 2020
79149c8
polish framework error message part 8 (#27269)
chenwhql Sep 14, 2020
9166307
add check for sparse parameters with weight_decay (#27141)
MRXLT Sep 14, 2020
d4f03df
fix for tuple,test=develop (#27190)
tink2123 Sep 14, 2020
bbad341
Enhance the error messages for files in operators/math
ZHUI Sep 14, 2020
a685435
fix conv depthwise bug (#27278)
LielinJiang Sep 14, 2020
bc3e9ba
check the validation of parameters for expand and tile apis (#26816)
Sep 14, 2020
bf461fa
Improving error report message for sequence_expand op (#27245)
Sep 14, 2020
ac9afa0
paddle.nn.functional.logsigmoid -> log_sigmoid (#27277)
zhupengyang Sep 14, 2020
6947a58
disable three unittests,test=document_fix (#27299)
XieYunshen Sep 14, 2020
1483ea2
Add bfloat16 passes (#26999)
wozna Sep 14, 2020
3ae3b86
fix trt_dynamic_shape_ernie_deserialize_test (#27290)
cryoco Sep 15, 2020
2d8281d
Remove the cache in post_traning_quantization, test=develop (#26450)
juncaipeng Sep 15, 2020
ee1ed42
change sequence length attribute to input (#27193)
GaoWei8 Sep 15, 2020
e6e2e53
Optimize error report (#27254)
shangzhizhou Sep 15, 2020
dafb0e3
Polish framework error message part 6 (#27257)
chenwhql Sep 15, 2020
bd41c31
Add *.bat file for building compile environment on windows, test=deve…
Avin0323 Sep 15, 2020
bd77a42
error messages of inference/tests, test=develop (#27259)
Shixiaowei02 Sep 15, 2020
f827665
[Pass Compatible] Bind python compatible. (#27262)
jiweibo Sep 15, 2020
cb34cf1
Set timeout value on windows and mac (#27197)
chalsliu Sep 15, 2020
47fdc60
Optimize slice trt plugin (#26970)
shangzhizhou Sep 15, 2020
c8e54c5
Disable unit-test test_fleet_rolemaker_new
chalsliu Sep 15, 2020
9dedafa
fix strategy, test=develop (#27323)
mapingshuo Sep 15, 2020
696a39e
use clcache in windows (#27279)
wanghuancoder Sep 16, 2020
4c8ea49
use shared dev_ctx (#27313)
zhiqiu Sep 16, 2020
c89f269
Fix bug of handling blank characters in operators.cmake (#27310)
zhiqiu Sep 16, 2020
dae6255
Enhance infer error info message (#26731)
jiweibo Sep 16, 2020
389a9a7
fix ports conflict when use paddlecloud to launch analogue multi-node…
danleifeng Sep 16, 2020
d003573
add the error message check for the some operator
wawltor Sep 16, 2020
4e8582f
update the error message check for the some ops
wawltor Sep 16, 2020
ef6dd6b
fix the test_fleet_lars_meta_optimizer ut. (#27291)
wzzju Sep 16, 2020
8fe1c2d
move three ut to execute only at night (#27314)
XieYunshen Sep 16, 2020
18fc927
add regularizer api (#27292)
littletomatodonkey Sep 16, 2020
950301b
Add input_spec & output_spec for TranslatedLayer (#27284)
chenwhql Sep 16, 2020
4f9d652
Polish framework error message part 7 (#27266)
chenwhql Sep 16, 2020
c296618
fix error message in broadcast/allreduce/gather (#27302)
ForFishes Sep 16, 2020
c67c391
refine fleet dataset class api (#27133)
yaoxuefeng6 Sep 16, 2020
c23f09f
Support load state_dict from save_params/persistables (#27298)
chenwhql Sep 16, 2020
4582f69
- Fix to concat oneDNN overwritting data (#27273)
jczaja Sep 16, 2020
f992f8d
fix judge cache file of inference api more accurate (#27175)
zhwesky2010 Sep 16, 2020
3409153
Fix bug in continuous apply, test=develop (#27337)
Sep 16, 2020
e25bcc9
add setup (#27346)
lelelelelez Sep 16, 2020
6e29c2d
Error description optimize for the math dir
joey12300 Sep 16, 2020
54b81fa
add adaptivelsgd in meta_optimizer (#27289)
ForFishes Sep 16, 2020
11bcf0e
Cleanup redundant code files (#27319)
gongweibao Sep 16, 2020
189e10f
Remove unnecessary requirements (#27341)
zhiqiu Sep 16, 2020
ebc6d54
fix cache file judge (#27369)
zhwesky2010 Sep 17, 2020
bf8e030
modify test_imperative_using_non_zero_gpu from use two gpus to one gp…
wanghuancoder Sep 17, 2020
746a8de
fix comment of adaptive lsgd (#27362)
ForFishes Sep 17, 2020
8d05c00
fix paddle.fleet en-doc for apis in dynamic mode (#27354)
danleifeng Sep 17, 2020
d4b4357
[Dy2stat] Change the Global Switch Name of ProgramTranslator for API …
zhhsplendid Sep 17, 2020
2 changes: 1 addition & 1 deletion cmake/cuda.cmake
@@ -16,7 +16,7 @@ else()
set(paddle_known_gpu_archs8 "30 35 50 52 60 61")
set(paddle_known_gpu_archs9 "30 35 50 52 60 61 70")
set(paddle_known_gpu_archs10 "30 35 50 52 60 61 70 75")
-set(paddle_known_gpu_archs11 "35 50 52 60 61 70 75 80")
+set(paddle_known_gpu_archs11 "52 60 61 70 75 80")
endif()

######################################################################################
4 changes: 2 additions & 2 deletions cmake/external/dgc.cmake
@@ -19,7 +19,7 @@ SET(DGC_SOURCES_DIR "${THIRD_PARTY_PATH}/dgc/src/extern_dgc")
SET(DGC_INSTALL_DIR "${THIRD_PARTY_PATH}/install/dgc")
SET(DGC_INCLUDE_DIR "${DGC_INSTALL_DIR}/include" CACHE PATH "dgc include directory." FORCE)
SET(DGC_LIBRARIES "${DGC_INSTALL_DIR}/lib/libdgc.a" CACHE FILEPATH "dgc library." FORCE)
-SET(DGC_URL "http://fleet.bj.bcebos.com/collective_ef2216a.tgz")
+SET(DGC_URL "https://fleet.bj.bcebos.com/dgc/collective_f66ef73.tgz")
INCLUDE_DIRECTORIES(${DGC_INCLUDE_DIR})

cache_third_party(extern_dgc
@@ -30,7 +30,7 @@ ExternalProject_Add(
extern_dgc
${EXTERNAL_PROJECT_LOG_ARGS}
"${DGC_DOWNLOAD_CMD}"
-URL_MD5 "2f67549fd5f1262383d83289abc4f88f"
+URL_MD5 "94e6fa1bc97169d0e1aad44570fe3251"
PREFIX "${DGC_PREFIX_DIR}"
SOURCE_DIR "${DGC_SOURCES_DIR}"
CONFIGURE_COMMAND ""
2 changes: 1 addition & 1 deletion cmake/external/lite.cmake
@@ -34,7 +34,7 @@ if (NOT LITE_SOURCE_DIR OR NOT LITE_BINARY_DIR)
set(LITE_INSTALL_DIR ${THIRD_PARTY_PATH}/install/lite)

if(NOT LITE_GIT_TAG)
-set(LITE_GIT_TAG dfdfa6440c83bf0b415f9f5a9ff84842ce0bb0fa)
+set(LITE_GIT_TAG 6d2b2a4028a58715b01887b04eb9bff8432eb184)
endif()

if(NOT CUDA_ARCH_NAME)
4 changes: 2 additions & 2 deletions cmake/external/mkldnn.cmake
@@ -19,8 +19,8 @@ SET(MKLDNN_PREFIX_DIR ${THIRD_PARTY_PATH}/mkldnn)
SET(MKLDNN_SOURCE_DIR ${THIRD_PARTY_PATH}/mkldnn/src/extern_mkldnn)
SET(MKLDNN_INSTALL_DIR ${THIRD_PARTY_PATH}/install/mkldnn)
SET(MKLDNN_INC_DIR "${MKLDNN_INSTALL_DIR}/include" CACHE PATH "mkldnn include directory." FORCE)
-SET(MKLDNN_REPOSITORY https://github.com/intel/mkl-dnn.git)
-SET(MKLDNN_TAG 1ea812f4f5aa1bd989372a23ab50d0f0f81ee677)
+SET(MKLDNN_REPOSITORY https://github.com/oneapi-src/oneDNN.git)
+SET(MKLDNN_TAG 64a48f9565aa72f6359917b3406328075a409939)

# Introduce variables:
# * CMAKE_INSTALL_LIBDIR
2 changes: 1 addition & 1 deletion cmake/external/warpctc.cmake
@@ -18,7 +18,7 @@ SET(WARPCTC_PREFIX_DIR ${THIRD_PARTY_PATH}/warpctc)
SET(WARPCTC_SOURCE_DIR ${THIRD_PARTY_PATH}/warpctc/src/extern_warpctc)
SET(WARPCTC_INSTALL_DIR ${THIRD_PARTY_PATH}/install/warpctc)
set(WARPCTC_REPOSITORY https://github.com/baidu-research/warp-ctc.git)
-set(WARPCTC_TAG bc29dcfff07ced1c7a19a4ecee48e5ad583cef8e)
+set(WARPCTC_TAG fc7f226b93758216a03b1be9d24593a12819b984)

SET(WARPCTC_INCLUDE_DIR "${WARPCTC_INSTALL_DIR}/include"
CACHE PATH "Warp-ctc Directory" FORCE)
10 changes: 9 additions & 1 deletion cmake/flags.cmake
@@ -28,7 +28,15 @@ function(CheckCompilerCXX11Flag)
endfunction()

CheckCompilerCXX11Flag()
-set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11")
+if (WITH_GPU)
+  if (${CMAKE_CUDA_COMPILER_VERSION} GREATER_EQUAL 11.0)
+    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++14")
+  else()
+    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11")
+  endif()
+else()
+  set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11")
+endif()
# safe_set_flag
#
# Set a compile flag only if compiler is support
4 changes: 2 additions & 2 deletions cmake/generic.cmake
@@ -386,7 +386,7 @@ function(cc_test_run TARGET_NAME)
set_property(TEST ${TARGET_NAME} PROPERTY ENVIRONMENT FLAGS_cudnn_deterministic=true)
# No unit test should exceed 2 minutes.
if (APPLE OR WIN32)
-set_tests_properties(${TARGET_NAME} PROPERTIES TIMEOUT 600)
+set_tests_properties(${TARGET_NAME} PROPERTIES TIMEOUT 150)
else()
set_tests_properties(${TARGET_NAME} PROPERTIES TIMEOUT 120)
endif()
@@ -748,7 +748,7 @@ function(py_test TARGET_NAME)
endif()

if (APPLE OR WIN32)
-set_tests_properties(${TARGET_NAME} PROPERTIES TIMEOUT 600)
+set_tests_properties(${TARGET_NAME} PROPERTIES TIMEOUT 150)
else()
# No unit test should exceed 2 minutes in Linux.
set_tests_properties(${TARGET_NAME} PROPERTIES TIMEOUT 120)
7 changes: 6 additions & 1 deletion cmake/operators.cmake
@@ -138,12 +138,17 @@ function(op_library TARGET)
# And for detail pybind information, please see generated paddle/pybind/pybind.h.
file(READ ${TARGET}.cc TARGET_CONTENT)
string(REGEX MATCH "REGISTER_OPERATOR\\(.*REGISTER_OPERATOR\\(" multi_register "${TARGET_CONTENT}")
-string(REGEX MATCH "REGISTER_OPERATOR\\([a-z0-9_]*," one_register "${multi_register}")
+# [ \t\r\n]* is used for blank characters
+string(REGEX MATCH "REGISTER_OPERATOR\\([ \t\r\n]*[a-z0-9_]*," one_register "${multi_register}")

if (one_register STREQUAL "")
string(REPLACE "_op" "" TARGET "${TARGET}")
else ()
string(REPLACE "REGISTER_OPERATOR(" "" TARGET "${one_register}")
string(REPLACE "," "" TARGET "${TARGET}")
+# [ \t\r\n]+ is used for blank characters.
+# Here we use '+' instead of '*' since it is a REPLACE operation.
+string(REGEX REPLACE "[ \t\r\n]+" "" TARGET "${TARGET}")
endif()

# pybind USE_NO_KERNEL_OP
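The operators.cmake change adds `[ \t\r\n]*` so that blank characters (including a newline) between `REGISTER_OPERATOR(` and the operator name no longer defeat the match. The same matching logic can be sketched in C++ with `std::regex`; the helper name `ExtractOpName` is illustrative, not part of the codebase:

```cpp
#include <cassert>
#include <regex>
#include <string>

// Illustrative helper: extract the operator name from a
// REGISTER_OPERATOR(...) snippet, tolerating blank characters (spaces,
// tabs, newlines) after the parenthesis, mirroring the "[ \t\r\n]*"
// added to operators.cmake.
std::string ExtractOpName(const std::string& src) {
  static const std::regex re(R"(REGISTER_OPERATOR\([ \t\r\n]*([a-z0-9_]*),)");
  std::smatch m;
  if (std::regex_search(src, m, re)) {
    return m[1].str();
  }
  return "";
}
```

Before the fix, a registration formatted across lines (e.g. a newline after the opening parenthesis) would fail to match, so the target name kept its `_op` suffix.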
7 changes: 4 additions & 3 deletions cmake/third_party.cmake
@@ -243,9 +243,10 @@ IF(WITH_TESTING OR (WITH_DISTRIBUTE AND NOT WITH_GRPC))
ENDIF()

if(WITH_GPU)
-include(external/cub) # download cub
-list(APPEND third_party_deps extern_cub)

+if (${CMAKE_CUDA_COMPILER_VERSION} LESS 11.0)
+  include(external/cub) # download cub
+  list(APPEND third_party_deps extern_cub)
+endif()
set(CUDAERROR_URL "http://paddlepaddledeps.bj.bcebos.com/cudaErrorMessage.tar.gz" CACHE STRING "" FORCE)
file_download_and_uncompress(${CUDAERROR_URL} "cudaerror") # download file cudaErrorMessage
endif(WITH_GPU)
3 changes: 2 additions & 1 deletion paddle/fluid/framework/c/c_api.cc
@@ -49,7 +49,8 @@ std::vector<std::string> PD_GetGradOpDescStrs(
for (size_t i = 0; i < op_num; ++i) {
PADDLE_ENFORCE_EQ(
grad_op_descs[i]->Proto()->SerializePartialToString(&ret[i]), true,
-"Cannot serialize message.");
+paddle::platform::errors::Unavailable(
+    "Cannot serialize operator desc message."));
}
}
return ret;
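The c_api.cc hunk replaces a bare message string with a structured error (`errors::Unavailable`). A rough, self-contained sketch of that enforce-equal shape; `EnforceEq` is a hypothetical stand-in, not Paddle's actual `PADDLE_ENFORCE_EQ` macro:

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical miniature of an enforce-equal check: on mismatch it raises
// an error that carries both an error category and a human-readable
// message, which is the shape the diff moves toward with
// errors::Unavailable(...).
template <typename T>
void EnforceEq(const T& actual, const T& expected, const std::string& category,
               const std::string& msg) {
  if (!(actual == expected)) {
    std::ostringstream os;
    os << category << ": " << msg;
    throw std::runtime_error(os.str());
  }
}
```

Attaching a category lets callers distinguish, say, an unavailable resource from an invalid argument without parsing the message text.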
14 changes: 13 additions & 1 deletion paddle/fluid/framework/distributed_strategy.proto
100755 → 100644
@@ -36,7 +36,15 @@ message AMPConfig {
repeated string custom_black_varnames = 9;
}

-message LocalSGDConfig { optional int32 k_steps = 1 [ default = 4 ]; }
+message LocalSGDConfig {
+  optional int32 k_steps = 1 [ default = 1 ];
+  optional int32 begin_step = 2 [ default = 1 ];
+}
+
+message AdaptiveLocalSGDConfig {
+  optional int32 init_k_steps = 1 [ default = 1 ];
+  optional int32 begin_step = 2 [ default = 1 ];
+}

message GradientMergeConfig {
optional int32 k_steps = 1 [ default = 1 ];
@@ -52,6 +60,8 @@ message DGCConfig {
message LarsConfig {
optional float lars_coeff = 1 [ default = 0.001 ];
optional float lars_weight_decay = 2 [ default = 0.0005 ];
+optional float epsilon = 3 [ default = 0.0 ];
+repeated string exclude_from_weight_decay = 4;
}

message LambConfig {
@@ -116,6 +126,7 @@ message DistributedStrategy {
optional bool cudnn_exhaustive_search = 21 [ default = true ];
optional int32 conv_workspace_size_limit = 22 [ default = 4000 ];
optional bool cudnn_batchnorm_spatial_persistent = 23 [ default = true ];
+optional bool adaptive_localsgd = 24 [ default = false ];

optional RecomputeConfig recompute_configs = 101;
optional AMPConfig amp_configs = 102;
@@ -126,6 +137,7 @@ message DistributedStrategy {
optional AsyncConfig a_sync_configs = 107;
optional LarsConfig lars_configs = 108;
optional LambConfig lamb_configs = 109;
+optional AdaptiveLocalSGDConfig adaptive_localsgd_configs = 110;
optional BuildStrategy build_strategy = 201;
optional ExecutionStrategy execution_strategy = 202;
}
11 changes: 6 additions & 5 deletions paddle/fluid/framework/fleet/nccl_wrapper.cc
@@ -25,7 +25,7 @@ bool NCCLWrapper::is_initialized_ = false;

void NCCLWrapper::InitNCCL() {
#if defined(PADDLE_WITH_NCCL)
-PADDLE_ENFORCE(platform::dynload::ncclCommInitRank(
+PADDLE_ENFORCE_CUDA_SUCCESS(platform::dynload::ncclCommInitRank(
&(nccl_info_.comm_), nccl_info_.global_ranks_, nccl_info_.nccl_id_,
nccl_info_.my_global_rank_));
#endif
@@ -41,7 +41,8 @@ void NCCLWrapper::SetNCCLId(const NCCLInfo& nccl_info) {

NCCLInfo NCCLWrapper::GetNCCLId() {
#if defined(PADDLE_WITH_NCCL)
-PADDLE_ENFORCE(platform::dynload::ncclGetUniqueId(&(nccl_info_.nccl_id_)));
+PADDLE_ENFORCE_CUDA_SUCCESS(
+    platform::dynload::ncclGetUniqueId(&(nccl_info_.nccl_id_)));
#endif
return nccl_info_;
}
@@ -52,8 +53,8 @@ void NCCLWrapper::SetRankInfo(const int local_rank, const int global_rank,
nccl_info_.local_rank_ = local_rank;
nccl_info_.my_global_rank_ = global_rank;
nccl_info_.global_ranks_ = ranks;
-PADDLE_ENFORCE(cudaSetDevice(local_rank));
-PADDLE_ENFORCE(cudaStreamCreate(&(nccl_info_.stream_)));
+PADDLE_ENFORCE_CUDA_SUCCESS(cudaSetDevice(local_rank));
+PADDLE_ENFORCE_CUDA_SUCCESS(cudaStreamCreate(&(nccl_info_.stream_)));
#endif
return;
}
@@ -65,7 +66,7 @@ void NCCLWrapper::SyncVar(const int root_rank, const Scope& scope,
auto var = scope.FindVar(name);
LoDTensor* tensor = var->GetMutable<LoDTensor>();
int32_t total_size = tensor->numel();
-PADDLE_ENFORCE(platform::dynload::ncclBcast(
+PADDLE_ENFORCE_CUDA_SUCCESS(platform::dynload::ncclBcast(
reinterpret_cast<void*>(tensor->data<float>()), total_size, ncclFloat,
root_rank, nccl_info_.comm_, nccl_info_.stream_));
cudaStreamSynchronize(nccl_info_.stream_);
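The nccl_wrapper.cc hunk swaps `PADDLE_ENFORCE` for `PADDLE_ENFORCE_CUDA_SUCCESS` so the numeric status code of each CUDA/NCCL call is checked and reported. A minimal illustration of that pattern, using a made-up `Status` enum in place of `cudaError_t`/`ncclResult_t` (this is a sketch of the idea, not the real macro):

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Made-up status type standing in for cudaError_t / ncclResult_t.
enum class Status { kSuccess = 0, kInvalidDevice = 101 };

// Check the return code of a runtime call and surface the failing call plus
// its numeric code, instead of only testing the result for truthiness.
inline void EnforceSuccess(Status s, const std::string& call) {
  if (s != Status::kSuccess) {
    throw std::runtime_error(call + " failed with error code " +
                             std::to_string(static_cast<int>(s)));
  }
}
```

Carrying the raw code in the error makes driver-level failures (wrong device id, exhausted streams) diagnosable from logs alone.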
4 changes: 4 additions & 0 deletions paddle/fluid/framework/ir/CMakeLists.txt
@@ -102,6 +102,8 @@ if(WITH_MKLDNN)
pass_library(conv_concat_relu_mkldnn_fuse_pass inference DIR mkldnn)
pass_library(conv_elementwise_add_mkldnn_fuse_pass inference DIR mkldnn)
pass_library(scale_matmul_fuse_pass inference DIR mkldnn)
+pass_library(cpu_bfloat16_placement_pass inference DIR mkldnn)
+pass_library(cpu_bfloat16_pass inference DIR mkldnn)
pass_library(fc_mkldnn_pass inference DIR mkldnn)
pass_library(cpu_quantize_placement_pass base DIR mkldnn)
pass_library(cpu_quantize_pass inference DIR mkldnn)
@@ -162,4 +164,6 @@ endif()
cc_test(test_cpu_quantize_squash_pass SRCS mkldnn/cpu_quantize_squash_pass_tester.cc DEPS cpu_quantize_squash_pass naive_executor)
cc_test(test_reshape_transpose_matmul_mkldnn_fuse_pass SRCS mkldnn/reshape_transpose_matmul_mkldnn_fuse_pass_tester.cc DEPS reshape_transpose_matmul_mkldnn_fuse_pass)
cc_test(test_matmul_transpose_reshape_fuse_pass SRCS mkldnn/matmul_transpose_reshape_fuse_pass_tester.cc DEPS matmul_transpose_reshape_fuse_pass)
+cc_test(test_cpu_bfloat16_placement_pass SRCS mkldnn/cpu_bfloat16_placement_pass_tester.cc DEPS cpu_bfloat16_placement_pass)
+cc_test(test_cpu_bfloat16_pass SRCS mkldnn/cpu_bfloat16_pass_tester.cc DEPS cpu_bfloat16_pass)
endif ()
76 changes: 76 additions & 0 deletions paddle/fluid/framework/ir/graph_pattern_detector.cc
@@ -1892,6 +1892,82 @@ PDNode *patterns::QuantizePlacement::operator()(
return op;
}

PDNode *patterns::Bfloat16Placement::operator()(
const std::unordered_set<std::string> &bfloat16_enabled_op_types) {
std::unordered_set<std::string> supported_op_types =
std::unordered_set<std::string>();
if (!bfloat16_enabled_op_types.empty()) {
supported_op_types = bfloat16_enabled_op_types;
}
auto *op = pattern->NewNode(op_repr())->assert_is_ops(supported_op_types);
return op;
}

PDNode *patterns::OrphanedBfloat16::operator()() {
auto *prev_op = pattern->NewNode(prev_op_repr())->assert_is_op();
prev_op->assert_more([&](Node *node) {
return node->Op()->GetAttrIfExists<std::string>("mkldnn_data_type") ==
"float32";
});
auto *prev_out = pattern->NewNode(prev_out_repr())->AsOutput();

auto *op = pattern->NewNode(op_repr())->assert_is_op();
op->assert_more([&](Node *node) {
return node->Op()->GetAttrIfExists<std::string>("mkldnn_data_type") ==
"bfloat16";
});
auto *op_out = pattern->NewNode(op_out_repr())->AsOutput();

auto *next_op = pattern->NewNode(next_op_repr())->assert_is_op();
next_op->assert_more([&](Node *node) {
return node->Op()->GetAttrIfExists<std::string>("mkldnn_data_type") ==
"float32";
});

prev_op->LinksTo({prev_out});
op->LinksFrom({prev_out}).LinksTo({op_out});
next_op->LinksFrom({op_out});
return next_op;
}

PDNode *patterns::LastBfloat16Ops::operator()() {
auto *op = pattern->NewNode(op_repr())->assert_is_op();
op->assert_more([&](Node *node) {
return node->Op()->GetAttrIfExists<std::string>("mkldnn_data_type") ==
"bfloat16";
});
auto *op_out = pattern->NewNode(op_out_repr())->AsOutput();

auto *next_op = pattern->NewNode(next_op_repr())->assert_is_op();
next_op->assert_more([&](Node *node) {
return node->Op()->GetAttrIfExists<std::string>("mkldnn_data_type") !=
"bfloat16";
});

op->LinksTo({op_out});
next_op->LinksFrom({op_out});
return next_op;
}

PDNode *patterns::FirstBfloat16Ops::operator()() {
auto *prev_op = pattern->NewNode(prev_op_repr())->assert_is_op();
prev_op->assert_more([&](Node *node) {
return node->Op()->GetAttrIfExists<std::string>("mkldnn_data_type") !=
"bfloat16";
});
auto *op_in = pattern->NewNode(op_in_repr())->AsOutput();

auto *op = pattern->NewNode(op_repr())->assert_is_op();
op->assert_more([&](Node *node) {
return node->Op()->GetAttrIfExists<std::string>("mkldnn_data_type") ==
"bfloat16";
});

prev_op->LinksTo({op_in});
op->LinksFrom({op_in});
return op;
}

PDNode *patterns::MKLDNNInPlace::operator()() {
const std::unordered_set<std::string> &supported_op_types = {
"abs",
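The `OrphanedBfloat16` pattern above matches a `bfloat16` op whose predecessor and successor both run in `float32`, i.e. an isolated conversion that is not worth keeping. Its intent can be sketched on a toy sequential graph; the `Op` struct and `FindOrphanedBf16` helper are illustrative only, while the real pass works on `ir::Graph` with the pattern machinery shown above:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy sequential graph: each op carries the equivalent of the
// mkldnn_data_type attribute checked by the pattern's assert_more hooks.
struct Op {
  std::string name;
  std::string dtype;
};

// An op is "orphaned" when it runs in bfloat16 but both its predecessor and
// successor run in float32, the situation OrphanedBfloat16 matches.
std::vector<std::string> FindOrphanedBf16(const std::vector<Op>& chain) {
  std::vector<std::string> orphans;
  for (size_t i = 1; i + 1 < chain.size(); ++i) {
    if (chain[i].dtype == "bfloat16" && chain[i - 1].dtype == "float32" &&
        chain[i + 1].dtype == "float32") {
      orphans.push_back(chain[i].name);
    }
  }
  return orphans;
}
```

Reverting such ops to float32 avoids paying two data-layout conversions for a single bfloat16 kernel; the companion `FirstBfloat16Ops`/`LastBfloat16Ops` patterns find the boundaries of genuine bfloat16 regions instead.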
41 changes: 41 additions & 0 deletions paddle/fluid/framework/ir/graph_pattern_detector.h
@@ -1129,6 +1129,47 @@ struct QuantizePlacement : public PatternBase {
PATTERN_DECL_NODE(op);
};

struct Bfloat16Placement : public PatternBase {
Bfloat16Placement(PDPattern* pattern, const std::string& name_scope)
: PatternBase(pattern, name_scope, "bfloat16_placement") {}
PDNode* operator()(
const std::unordered_set<std::string>& bfloat16_enabled_op_types);

PATTERN_DECL_NODE(op);
};

struct OrphanedBfloat16 : public PatternBase {
OrphanedBfloat16(PDPattern* pattern, const std::string& name_scope)
: PatternBase(pattern, name_scope, "orphaned_bfloat16") {}
PDNode* operator()();

PATTERN_DECL_NODE(prev_op);
PATTERN_DECL_NODE(prev_out);
PATTERN_DECL_NODE(op);
PATTERN_DECL_NODE(op_out);
PATTERN_DECL_NODE(next_op);
};

struct LastBfloat16Ops : public PatternBase {
LastBfloat16Ops(PDPattern* pattern, const std::string& name_scope)
: PatternBase(pattern, name_scope, "last_bfloat16_ops") {}
PDNode* operator()();

PATTERN_DECL_NODE(op);
PATTERN_DECL_NODE(op_out);
PATTERN_DECL_NODE(next_op);
};

struct FirstBfloat16Ops : public PatternBase {
FirstBfloat16Ops(PDPattern* pattern, const std::string& name_scope)
: PatternBase(pattern, name_scope, "first_bfloat16_ops") {}
PDNode* operator()();

PATTERN_DECL_NODE(prev_op);
PATTERN_DECL_NODE(op_in);
PATTERN_DECL_NODE(op);
};

// Pattern used for enforcing inplace computation for in-place computation
// supporting DNNL ops. softmax, batch_norm and layer_norm
struct MKLDNNInPlace : public PatternBase {