Update USERNAME/paddle #5

Merged
merged 137 commits into from
Oct 20, 2020
4237fef
Add shellcheck tools and modify copyright hook (#27722)
gongweibao Oct 13, 2020
1d95a0f
fix error message for nce_op (#27863)
Oct 13, 2020
9215ad9
Update code examples for api2.0
Steffy-zxf Oct 13, 2020
e122e16
fix english doc, unittest, and remove useless alias of 2.0 lr_schedul…
zhwesky2010 Oct 13, 2020
92b3a71
Update api 2.0 for some ops
Steffy-zxf Oct 13, 2020
94faa11
Polish save load en doc details (#27845)
chenwhql Oct 13, 2020
e6a4d17
modify dtype in doublegrad matmul ut (#27868)
wangxinxin08 Oct 13, 2020
6d63cd2
add gather_op xpu, test=kunlun (#27822)
ForFishes Oct 13, 2020
9637d96
update index sample (#27839)
MrChengmo Oct 13, 2020
92708a2
modify test_load_op save path from /tmp to ./ (#27872)
wanghuancoder Oct 13, 2020
345574a
Demo CMakeLists add openmp flag. (#27848)
jiweibo Oct 13, 2020
79b5db1
bug: fix mul unitest bug (#27852)
QingshuChen Oct 13, 2020
8c25dfa
op error info (#27856)
Thunderbrook Oct 13, 2020
274071a
Perfect build compile environment script on windows, test=document_fi…
Avin0323 Oct 13, 2020
8d2cb14
support gradient merge with recompute, test=develop (#27834)
mapingshuo Oct 13, 2020
04be37c
add xpu slice op (#27349)
Thunderbrook Oct 13, 2020
9145580
Refine ProgramTranslator API English Doc for 2.0rc (#27849)
zhhsplendid Oct 13, 2020
1b12177
fix deform_conv2d doc, test=document_fix (#27873)
baiyfbupt Oct 13, 2020
7409263
Add local_response_norm in nn.functional and nn.layer (#27725)
huangjun12 Oct 13, 2020
6da7a74
add conv for xpu, test=kunlun (#27809)
tink2123 Oct 13, 2020
ddcd1b5
Add bfloat16 resnet50 test (#27755)
wozna Oct 13, 2020
c90d355
Add batch_norm and layer_norm XPU kernels (#27818)
hong19860320 Oct 13, 2020
049696b
Refine the format of printing tensor (#27673)
zhiqiu Oct 13, 2020
1b48f2f
Fix en doc for rnn.py. test=document_fix (#27835)
smallv0221 Oct 13, 2020
8028321
add kunlun-approval (#27902)
luotao1 Oct 13, 2020
1607e87
add xpu sgd & momentum (#27728)
MrChengmo Oct 13, 2020
70c8c31
support mean,softmax_with_cross_entropy on Baidu Kunlun (#27792)
yghstill Oct 13, 2020
50619cd
use floyd algorithm to find meta optimizer max path, test=develop (#2…
wangxicoding Oct 13, 2020
ae01801
Add dropout and log_loss for kunlun (#27790)
tink2123 Oct 13, 2020
9b3b3b7
Refine ParallelExecutor English Doc for 2.0RC (#27862)
zhhsplendid Oct 13, 2020
426de25
Refine Executor API English Doc for 2.0rc (#27857)
zhhsplendid Oct 13, 2020
3f2a6ab
fix error msg (#27887)
hutuxian Oct 13, 2020
62556d5
Add api of KaimingUniform & KaimingNormal in paddle.nn.initializer (#…
Ray2020BD Oct 14, 2020
6898746
disable ut (#27913)
XieYunshen Oct 14, 2020
ed31dac
remove scale loss and coll grads, test=document_fix (#27874)
chenwhql Oct 14, 2020
41aad9b
revert 4 files, from clear include by iwyu, test=develop (#27895)
wanghuancoder Oct 14, 2020
b301adc
Update all the examples which use paddle.static.nn.fc. (#27904)
Xreki Oct 14, 2020
0567781
Enable GLOO when compiled with WITH_DISTRIBUTE=ON (#27840)
Oct 14, 2020
d5cc144
tune backward filter algorithm for float16 (#27529)
zhangting2020 Oct 14, 2020
9209f34
Made a change in python API to support the GPU version of unique OP w…
AshburnLee Oct 14, 2020
6150cc8
fix Error of gpu version paddle when CUDA device is not set properly …
pangyoki Oct 14, 2020
3eb106d
Lookup table v2 xpu (#27888)
yinhaofeng Oct 14, 2020
7a58431
fix norm api doc, test=develop (#27652)
frankwhzhang Oct 14, 2020
5a83496
Multi task (#26002)
frankwhzhang Oct 14, 2020
2712d07
support kunlun matmul_v2 (#27910)
QingshuChen Oct 14, 2020
b19b01a
add dygraph code for unstack (#27881)
MRXLT Oct 14, 2020
a4f8507
【paddle.fleet】bug fix for parameter_recv (#27838)
123malin Oct 14, 2020
8e70b18
add paddle.nn.initializer API, including: Normal, TruncatedNormal, Un…
windstamp Oct 14, 2020
4ba977c
Polish some error message in opeators (#27876)
chenwhql Oct 14, 2020
95aa534
update the code for the topk message optimize
wawltor Oct 14, 2020
cf70d5b
fix paddle error informations (#27889)
seiriosPlus Oct 14, 2020
328cb28
【paddle.fleet】fix sparse load (#27680)
MrChengmo Oct 14, 2020
b3cfb02
Update example codes of ConstantInitializer & MSRAInitializer (#27915)
Ray2020BD Oct 14, 2020
772a01d
Execlude some large file temparily of shellcheck. (#27905)
gongweibao Oct 14, 2020
72efd83
[API 2.0: doc] fix doc of linspace cast assign addcmul (#27897)
vslyu Oct 14, 2020
a820871
Change PR-CI-Kunlun Test Number (#27923)
tianshuo78520a Oct 14, 2020
c5fcc96
xpu support for fill_constant Op (#27675)
wangchaochaohu Oct 14, 2020
8b30704
[API2.0]remove image_resize series api in Paddle2.0 (#27886)
shippingwang Oct 14, 2020
ae6ad23
Refine tensor to_string (#27925)
zhiqiu Oct 14, 2020
c791df0
Add elementwise XPU OP kernel for KUNLUN core, including (but still c…
joey12300 Oct 14, 2020
ff8922f
fix the dataset remove bug for the dataset
wawltor Oct 14, 2020
766b351
Remove paddle.metric.auc in 2.0 API (#27851)
qingqing01 Oct 14, 2020
82eb486
fix test_group_norm, test=develop (#27929)
frankwhzhang Oct 14, 2020
b0edda4
kunlun add op (#27890)
LDOUBLEV Oct 14, 2020
263a9e9
Fix adam (#27778)
MRXLT Oct 14, 2020
3ee6ad6
solve bug in pull_dense_worker (#27918)
Thunderbrook Oct 14, 2020
3573413
del the DEFINE_ALIAS of sigmoid_cross_entropy_with_logits (#27883)
chajchaj Oct 14, 2020
9a2a4b5
Support setting xpu place in dygraph mode (#27909)
zhiqiu Oct 14, 2020
6e5034e
fix test directory migration (#27885)
pangyoki Oct 14, 2020
d05058d
Remove and reorganize the alias of APIs (#27717)
MingMingShangTian Oct 14, 2020
fea09fe
disable ut quickly (#27793)
XieYunshen Oct 14, 2020
ed6ee53
Update requirements approval (#27947)
tianshuo78520a Oct 14, 2020
0140d74
Fix load pretrained model in hapi (#27893)
LielinJiang Oct 14, 2020
6bbb6e7
Implement the function of OutScaleForTraining/OutScaleForInference in…
gfwm2013 Oct 14, 2020
338c765
update syncbn docs, test=document_fix (#27948)
ceci3 Oct 14, 2020
d998ed0
bugfix for docs of paddle.nn.Assign API (#27927)
windstamp Oct 15, 2020
7750844
fix norm code format error (#27955)
tianshuo78520a Oct 15, 2020
d17681d
fix DataLoader single process mode exit SIGABRT error. (#27850)
heavengate Oct 15, 2020
b808979
step lr_scheduler on epoch end in hapi/model.py fit (#27730)
heavengate Oct 15, 2020
947b752
Reimplement paddle.utils.install_check. (#27771)
Xreki Oct 15, 2020
2e84518
support channel last in BatchNorm*d
Oct 15, 2020
bf412f4
add tensor clone (#27953)
zhwesky2010 Oct 15, 2020
4a4f773
Add reduce sum and reduce mean xpu op (#27939)
qjing666 Oct 15, 2020
8d7908f
【paddle.fleet】raise error when using multi-cards in fleet non_distrib…
danleifeng Oct 15, 2020
5ccaaab
reshape support bool, test=develop (#27944)
mapingshuo Oct 15, 2020
840c521
Fix problem with flags fp32 and int8 (#27954)
wozna Oct 15, 2020
f58434e
fix slice doc (#27941)
Thunderbrook Oct 15, 2020
2ac6c6c
fix bug of tensor copy of CUDAPinnedPlace (#27966)
zhwesky2010 Oct 15, 2020
aa3b4ed
【paddle.fleet】geo send sparse optimize (#27719)
123malin Oct 15, 2020
202bfab
Feature/large scale kv save base/delta (#27470)
seiriosPlus Oct 15, 2020
832458d
update code examples for paddle.sums
Steffy-zxf Oct 15, 2020
0133581
Clean text.py and decode.py for API 2.0 (#26853)
guoshengCS Oct 16, 2020
4aacacb
change paddle.fluid.layers.fill_constant to paddle.full in sample cod…
MingMingShangTian Oct 16, 2020
5f2d111
Adjust ENFORCE CI rules adapt centos (#27951)
chenwhql Oct 16, 2020
5bcb4c7
Change reduce mean (#27997)
MingMingShangTian Oct 16, 2020
afce32f
build gloo from source code instead of using the pre-compiled library…
Oct 16, 2020
64c2634
fix kunlun kernel of reshape op (#27988)
mapingshuo Oct 16, 2020
7cb4a8b
[oneDNN] Conv dilation support (#27914)
lidanqing-intel Oct 16, 2020
d330cf6
Fix xpu enforce (#27978)
joey12300 Oct 16, 2020
57a9c27
change paddle.fluid.data to paddle.static.data in sample code (#27992)
MingMingShangTian Oct 16, 2020
ff0ebef
put gloo initialization log to file (#27969)
Oct 16, 2020
f94d053
error message optimization in mean_xpu,softmax_with_cross_entropy_op_…
yghstill Oct 16, 2020
05fd49e
change paddle.fluid.layers.reduce_sum to paddle.sum in sample codes (…
MingMingShangTian Oct 16, 2020
ffcc117
[Dy2Stat] Fix Error when generating train_program in eval mode (#27975)
Aurelius84 Oct 16, 2020
78b1026
fix random failure (#27996)
zhiqiu Oct 16, 2020
fa9d3fa
Incorporate cudnn_lstm into LSTM api (#27217)
guoshengCS Oct 16, 2020
fb641c9
【paddle.fleet】fleet add _get_applied_meta_list and _get_applied_graph…
wangxicoding Oct 16, 2020
ff02173
add a comment, test=document_fix (#28008)
Oct 16, 2020
bf5325f
disable test_lstm,test=document_fix (#28030)
XieYunshen Oct 16, 2020
3718b2e
Fix test_lstm unittest failed and Add more unittest (#28029)
Aurelius84 Oct 17, 2020
2ed84a6
Add API for pad op. (#27943)
littletomatodonkey Oct 17, 2020
3e95686
add cast/concat/assign xpu op (#27911)
vslyu Oct 18, 2020
abf4d52
Polish kunlun error (#27974)
tink2123 Oct 19, 2020
74ce039
Add uniform_random XPU kernel (#27846)
pangyoki Oct 19, 2020
5b8e500
Add gaussian_random XPU kernels (#27853)
pangyoki Oct 19, 2020
4c5b779
Add truncated_gaussian_random XPU kernel (#27861)
pangyoki Oct 19, 2020
c1eed1f
error message opt for XPU, test=kunlun (#27972)
LDOUBLEV Oct 19, 2020
b6eff44
update yolo_box support h != w. test=develop (#27327)
heavengate Oct 19, 2020
975bd88
Fix error message of multinomial op (#27946)
pangyoki Oct 19, 2020
d466893
Allclose op (#27891)
huangxu96 Oct 19, 2020
e21b13f
[API 2.0: doc] transfer from paddle.fluid.layers.assign() into creati…
vslyu Oct 19, 2020
a0b2f93
reduce trt warning message (#28011)
cryoco Oct 19, 2020
086b92d
fix optimizer init (#27995)
zhwesky2010 Oct 19, 2020
5bb348a
add doc for ReduceOp (#28051)
Oct 19, 2020
55098b9
fleet support paddle.optimzier (#28026)
MRXLT Oct 19, 2020
c8d32c8
Fix diag OP bug on Windows Python3.8
LutaoChu Oct 19, 2020
5f04875
Fix xpu error message (#28061)
MrChengmo Oct 19, 2020
a5f65d5
hapi/model step learning rate on batch end. (#27991)
heavengate Oct 19, 2020
a5c95cd
Add xpu transpose2 op.test=kunlun (#28086)
TeslaZhao Oct 19, 2020
6f0c3d1
xpu adam op (#28031)
yinhaofeng Oct 19, 2020
2cb1ecb
lookup_table_v2_op_xpu report errors;test=kunlun (#28064)
yinhaofeng Oct 19, 2020
463c72c
refine gpu kernel config for Paddle (#28085)
wangchaochaohu Oct 20, 2020
a21b571
Add AVX512 instruction check for C-API (#28087)
wozna Oct 20, 2020
e3d02c9
rm max_input in conv2d for kunlun, test=kunlun (#28062)
tink2123 Oct 20, 2020
0b733e4
Add zhhsplendid into PE Approver (#27919)
zhhsplendid Oct 20, 2020
8327acc
Fix dataloader when stack input data with different type (#27950)
LielinJiang Oct 20, 2020
651dab4
Catch exception in download (#28090)
phlrain Oct 20, 2020
10 changes: 9 additions & 1 deletion .pre-commit-config.yaml
@@ -48,5 +48,13 @@ repos:
     name: copyright_checker
     entry: python ./tools/codestyle/copyright.hook
     language: system
-    files: \.(c|cc|cxx|cpp|cu|h|hpp|hxx|proto|py)$
+    files: \.(c|cc|cxx|cpp|cu|h|hpp|hxx|proto|py|sh)$
     exclude: (?!.*third_party)^.*$ | (?!.*book)^.*$
+- repo: local
+  hooks:
+  - id: shellcheck
+    name: shellcheck
+    entry: shellcheck
+    language: system
+    files: .sh$
+    exclude: (paddle_build.sh|fast_install.sh|check_file_diff_approvals.sh)
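The new `shellcheck` hook selects files with `.sh$`, where the dot is unescaped, while the copyright hook escapes it (`\.(...|sh)$`). The following is a minimal sketch of the difference, assuming the filter is applied as a regular-expression search over the file path. pre-commit documents `files` as a Python regular expression; `std::regex` only stands in for it here, and `MatchesFilter` is a hypothetical helper that belongs to neither pre-commit nor Paddle.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when `pattern` matches anywhere in `path`.
// This models "search" semantics, which is an assumption about how the
// hook's `files` filter is evaluated.
bool MatchesFilter(const std::string& pattern, const std::string& path) {
  return std::regex_search(path, std::regex(pattern));
}
```

With this helper, `MatchesFilter(".sh$", "tools/refresh")` is true because the unescaped `.` matches the `e`, whereas `MatchesFilter("\\.sh$", "tools/refresh")` is false. Both patterns accept `tools/build.sh`.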
6 changes: 6 additions & 0 deletions CMakeLists.txt
@@ -240,6 +240,12 @@ if(WITH_AMD_GPU)
include(hip)
endif(WITH_AMD_GPU)

if(WITH_DISTRIBUTE)
if(LINUX)
set(WITH_GLOO ON CACHE STRING "Enable GLOO when compiling WITH_DISTRIBUTE=ON." FORCE)
endif()
endif()

if(WITH_ARM)
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -fPIC")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fPIC")
74 changes: 30 additions & 44 deletions cmake/external/gloo.cmake
@@ -14,55 +14,41 @@

INCLUDE(ExternalProject)

execute_process(COMMAND bash -c "gcc -dumpversion" OUTPUT_VARIABLE GCC_VERSION)

SET(GLOO_PROJECT "extern_gloo")
IF((NOT DEFINED GLOO_VER) OR (NOT DEFINED GLOO_URL))
MESSAGE(STATUS "use pre defined download url")
SET(GLOO_VER "master" CACHE STRING "" FORCE)
SET(GLOO_NAME "gloo" CACHE STRING "" FORCE)

if(${GCC_VERSION} VERSION_EQUAL "8.2.0")
SET(GLOO_URL "https://fleet.bj.bcebos.com/gloo/gloo.tar.gz.gcc8" CACHE STRING "" FORCE)
else()
SET(GLOO_URL "https://fleet.bj.bcebos.com/gloo/gloo.tar.gz.gcc482" CACHE STRING "" FORCE)
endif()
ENDIF()

MESSAGE(STATUS "GLOO_NAME: ${GLOO_NAME}, GLOO_URL: ${GLOO_URL}")
SET(GLOO_SOURCE_DIR "${THIRD_PARTY_PATH}/gloo")
SET(GLOO_DOWNLOAD_DIR "${GLOO_SOURCE_DIR}/src/${GLOO_PROJECT}")
SET(GLOO_DST_DIR "gloo")
SET(GLOO_INSTALL_ROOT "${THIRD_PARTY_PATH}/install")
SET(GLOO_INSTALL_DIR ${GLOO_INSTALL_ROOT}/${GLOO_DST_DIR})
SET(GLOO_ROOT ${GLOO_INSTALL_DIR})
SET(GLOO_INC_DIR ${GLOO_ROOT}/include)
SET(GLOO_LIB_DIR ${GLOO_ROOT}/lib)
SET(GLOO_LIB ${GLOO_LIB_DIR}/libgloo.a)
#SET(GLOO_IOMP_LIB ${GLOO_LIB_DIR}/libiomp5.so) #todo what is this
SET(CMAKE_INSTALL_RPATH "${CMAKE_INSTALL_RPATH}" "${GLOO_ROOT}/lib")

INCLUDE_DIRECTORIES(${GLOO_INC_DIR})

FILE(WRITE ${GLOO_DOWNLOAD_DIR}/CMakeLists.txt
"PROJECT(GLOO)\n"
"cmake_minimum_required(VERSION 3.0)\n"
"install(DIRECTORY ${GLOO_NAME}/include ${GLOO_NAME}/lib \n"
" DESTINATION ${GLOO_DST_DIR})\n")
SET(GLOO_PREFIX_DIR ${THIRD_PARTY_PATH}/gloo)
SET(GLOO_SOURCE_DIR ${THIRD_PARTY_PATH}/gloo/src/extern_gloo/gloo)
SET(GLOO_INSTALL_DIR ${THIRD_PARTY_PATH}/install/gloo)
SET(GLOO_INCLUDE_DIR "${GLOO_INSTALL_DIR}/include" CACHE PATH "gloo include directory." FORCE)
SET(GLOO_LIBRARY_DIR "${GLOO_INSTALL_DIR}/lib" CACHE PATH "gloo library directory." FORCE)
# As we add extra features for gloo, we use the non-official repo
SET(GLOO_REPOSITORY https://github.com/sandyhouse/gloo.git)
SET(GLOO_TAG v0.0.2)
SET(GLOO_LIBRARIES "${GLOO_INSTALL_DIR}/lib/libgloo.a" CACHE FILEPATH "gloo library." FORCE)

INCLUDE_DIRECTORIES(${GLOO_INCLUDE_DIR})

cache_third_party(extern_gloo
REPOSITORY ${GLOO_REPOSITORY}
TAG ${GLOO_TAG}
DIR GLOO_SOURCE_DIR)

ExternalProject_Add(
${GLOO_PROJECT}
extern_gloo
${EXTERNAL_PROJECT_LOG_ARGS}
PREFIX ${GLOO_SOURCE_DIR}
DOWNLOAD_DIR ${GLOO_DOWNLOAD_DIR}
DOWNLOAD_COMMAND wget --no-check-certificate ${GLOO_URL} -c -q -O ${GLOO_NAME}.tar.gz
&& tar zxvf ${GLOO_NAME}.tar.gz
DOWNLOAD_NO_PROGRESS 1
${SHALLOW_CLONE}
"${GLOO_DOWNLOAD_CMD}"
PREFIX "${GLOO_PREFIX_DIR}"
SOURCE_DIR "${GLOO_SOURCE_DIR}"
UPDATE_COMMAND ""
CMAKE_ARGS -DCMAKE_INSTALL_PREFIX=${GLOO_INSTALL_ROOT}
CMAKE_CACHE_ARGS -DCMAKE_INSTALL_PREFIX:PATH=${GLOO_INSTALL_ROOT}
CONFIGURE_COMMAND ""
BUILD_COMMAND mkdir -p ${GLOO_SOURCE_DIR}/build
&& cd ${GLOO_SOURCE_DIR}/build && cmake .. && make
&& mkdir -p ${GLOO_LIBRARY_DIR} ${GLOO_INCLUDE_DIR}/gloo
INSTALL_COMMAND ${CMAKE_COMMAND} -E copy ${GLOO_SOURCE_DIR}/build/gloo/libgloo.a ${GLOO_LIBRARY_DIR}
COMMAND ${CMAKE_COMMAND} -E copy_directory "${GLOO_SOURCE_DIR}/gloo/" "${GLOO_INCLUDE_DIR}/gloo"
)

ADD_LIBRARY(gloo SHARED IMPORTED GLOBAL)
SET_PROPERTY(TARGET gloo PROPERTY IMPORTED_LOCATION ${GLOO_LIB})

ADD_LIBRARY(gloo STATIC IMPORTED GLOBAL)
SET_PROPERTY(TARGET gloo PROPERTY IMPORTED_LOCATION ${GLOO_LIBRARIES})
ADD_DEPENDENCIES(gloo ${GLOO_PROJECT})
3 changes: 2 additions & 1 deletion paddle/fluid/framework/CMakeLists.txt
@@ -247,7 +247,8 @@ cc_library(parallel_executor SRCS parallel_executor.cc DEPS
         graph build_strategy collective_helper
         fast_threaded_ssa_graph_executor variable_helper)
 
-cc_test(dist_multi_trainer_test SRCS dist_multi_trainer_test.cc DEPS executor)
+cc_test(dist_multi_trainer_test SRCS dist_multi_trainer_test.cc DEPS
+        conditional_block_op executor)
 cc_library(prune SRCS prune.cc DEPS framework_proto boost)
 cc_test(prune_test SRCS prune_test.cc DEPS op_info prune recurrent_op device_context)
 cc_test(var_type_inference_test SRCS var_type_inference_test.cc DEPS op_registry
5 changes: 5 additions & 0 deletions paddle/fluid/framework/device_worker.h
@@ -19,6 +19,7 @@ limitations under the License. */
#include <map>
#include <memory>
#include <mutex> // NOLINT
#include <set>
#include <string>
#include <thread> // NOLINT
#include <unordered_map> // NOLINT
@@ -313,6 +314,10 @@ class DownpourWorker : public HogwildWorker {
std::map<uint64_t, std::vector<std::string>> dense_value_names_;
std::map<uint64_t, uint64_t> table_dependency_;
std::vector<std::pair<uint64_t, uint64_t>> copy_dense_tables_;
// multitask
std::map<int32_t, uint64_t> cond2table_map_;
std::set<uint64_t> condvalue_set_;
bool flag_partial_push_;

private:
// std::vector<std::string> dump_param_;
51 changes: 44 additions & 7 deletions paddle/fluid/framework/downpour_worker.cc
@@ -12,6 +12,8 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */

#include <cstdlib>
#include <ctime>
#include "paddle/fluid/framework/device_worker.h"
#include "paddle/fluid/platform/cpu_helper.h"

@@ -65,6 +67,13 @@ void DownpourWorker::Initialize(const TrainerDesc& desc) {
}
}

flag_partial_push_ = false;
for (auto& m : param_.program_config(0).partial_pushdense_condtable_map()) {
cond2table_map_[m.key()] = m.value();
condvalue_set_.insert(m.value());
flag_partial_push_ = true;
}

skip_ops_.resize(param_.skip_ops_size());
for (int i = 0; i < param_.skip_ops_size(); ++i) {
skip_ops_[i] = param_.skip_ops(i);
@@ -876,14 +885,42 @@ void DownpourWorker::TrainFiles() {
 #endif
 
     if (need_to_push_dense_) {
-      for (int i = 0; i < param_.program_config(0).push_dense_table_id_size();
-           ++i) {
-        uint64_t tid = static_cast<uint64_t>(
-            param_.program_config(0).push_dense_table_id(i));
-        fleet_ptr_->PushDenseVarsAsync(
-            *thread_scope_, tid, dense_grad_names_[tid], &push_sparse_status_,
-            scale_datanorm_, cur_batch);
+      if (flag_partial_push_) {
+        Variable* var = (*thread_scope_).FindVar("cond_tag");
+        LoDTensor* tensor = var->GetMutable<LoDTensor>();
+        // check type in python code
+        int64_t* cond_value_batch = tensor->data<int64_t>();
+
+        for (int i = 0; i < param_.program_config(0).push_dense_table_id_size();
+             ++i) {
+          uint64_t tid = static_cast<uint64_t>(
+              param_.program_config(0).push_dense_table_id(i));
+          if (condvalue_set_.find(tid) != condvalue_set_.end()) {
+            // common dense table must push dense
+            if (cond2table_map_[cond_value_batch[0]] != tid) {
+              // can't push dense
+              continue;
+            }
+          }
+
+          VLOG(3) << "push multitask dense gradient " << tid;
+          fleet_ptr_->PushDenseVarsAsync(
+              *thread_scope_, tid, dense_grad_names_[tid], &push_sparse_status_,
+              scale_datanorm_, cur_batch);
+        }
+
+      } else {
+        for (int i = 0; i < param_.program_config(0).push_dense_table_id_size();
+             ++i) {
+          uint64_t tid = static_cast<uint64_t>(
+              param_.program_config(0).push_dense_table_id(i));
+
+          fleet_ptr_->PushDenseVarsAsync(
+              *thread_scope_, tid, dense_grad_names_[tid], &push_sparse_status_,
+              scale_datanorm_, cur_batch);
+        }
       }
+
       VLOG(3) << "push dense gradient done.";
 
       // the following code should be more precise and clean
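The partial-push branch in `DownpourWorker::TrainFiles()` comes down to one decision per dense table. The following is a minimal sketch of that decision, assuming that tables absent from `condvalue_set_` are shared by all tasks and that only the first element of the `cond_tag` tensor is consulted, as in the hunk. `ShouldPushDense` and its parameter names are hypothetical and are not part of Paddle.

```cpp
#include <cstdint>
#include <map>
#include <set>

// Hypothetical helper: decides whether dense table `tid` is pushed for a
// batch whose condition tag is `cond_value`.
bool ShouldPushDense(uint64_t tid, int64_t cond_value,
                     const std::map<int32_t, uint64_t>& cond2table,
                     const std::set<uint64_t>& cond_tables) {
  // A table that no condition maps to is treated as shared: always push.
  if (cond_tables.count(tid) == 0) return true;
  // A task-specific table is pushed only if this batch's condition selects it.
  auto it = cond2table.find(static_cast<int32_t>(cond_value));
  return it != cond2table.end() && it->second == tid;
}
```

One deliberate difference from the diff: the original indexes `cond2table_map_` with `operator[]`, which inserts a zero-valued entry for an unseen condition value, while the sketch uses `find` and therefore skips every task-specific table when the condition is unknown.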
6 changes: 6 additions & 0 deletions paddle/fluid/framework/fleet/fleet_wrapper.cc
@@ -29,6 +29,12 @@ limitations under the License. */
#include "paddle/fluid/framework/fleet/fleet_wrapper.h"
#include <algorithm>
#include <utility>
#include "paddle/fluid/framework/channel.h"
#include "paddle/fluid/framework/data_feed.h"
#include "paddle/fluid/framework/io/fs.h"
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/framework/scope.h"
#include "paddle/fluid/platform/timer.h"

namespace paddle {
namespace framework {
9 changes: 9 additions & 0 deletions paddle/fluid/framework/fleet/heter_wrapper.cc
@@ -27,6 +27,15 @@ See the License for the specific language governing permissions and
limitations under the License. */

#include "paddle/fluid/framework/fleet/heter_wrapper.h"
#include <algorithm>
#include <utility>
#include "paddle/fluid/framework/channel.h"
#include "paddle/fluid/framework/data_feed.h"
#include "paddle/fluid/framework/device_worker.h"
#include "paddle/fluid/framework/io/fs.h"
#include "paddle/fluid/framework/op_registry.h"
#include "paddle/fluid/framework/scope.h"
#include "paddle/fluid/platform/timer.h"
#ifdef PADDLE_WITH_PSLIB

namespace paddle {
7 changes: 7 additions & 0 deletions paddle/fluid/framework/hetercpu_worker.cc
@@ -12,6 +12,13 @@ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */

#include "paddle/fluid/framework/device_worker.h"
#include "paddle/fluid/framework/device_worker_factory.h"
#include "paddle/fluid/framework/fleet/fleet_wrapper.h"
#include "paddle/fluid/framework/fleet/heter_wrapper.h"
#include "paddle/fluid/platform/cpu_helper.h"
#include "paddle/fluid/string/string_helper.h"

#ifdef PADDLE_WITH_PSLIB

#if defined _WIN32 || defined __APPLE__
3 changes: 3 additions & 0 deletions paddle/fluid/framework/hogwild_worker.cc
@@ -15,6 +15,7 @@ limitations under the License. */
#include "paddle/fluid/framework/data_type.h"
#include "paddle/fluid/framework/device_worker.h"
#include "paddle/fluid/framework/device_worker_factory.h"
#include "paddle/fluid/operators/controlflow/conditional_block_op_helper.h"
#include "paddle/fluid/operators/distributed/distributed.h"
#include "paddle/fluid/platform/cpu_helper.h"
#include "paddle/fluid/platform/lodtensor_printer.h"
@@ -47,6 +48,8 @@ void HogwildWorker::CreateThreadOperators(const ProgramDesc &program) {
ops_.push_back(local_op_ptr);
continue;
}
operators::PrepareSafeEagerDeletionOnConditionalOpAndConditionalGradOp(
program, 0, ops_);
}

void HogwildWorker::CreateThreadScope(const ProgramDesc &program) {
13 changes: 0 additions & 13 deletions paddle/fluid/framework/ir/mkldnn/conv_bias_mkldnn_fuse_pass.cc
@@ -84,19 +84,6 @@ void ConvBiasFusePass::ApplyImpl(ir::Graph* graph) const {
       VLOG(3) << "do not perform " + type() + "+bias fuse";
       return;
     }
-    if (conv->Op()->HasAttr("dilations")) {
-      auto dilations =
-          BOOST_GET_CONST(std::vector<int>, conv->Op()->GetAttr("dilations"));
-      for (const auto& d : dilations) {
-        if (d != 1) {
-          LOG(WARNING)
-              << "dilation conv not supported in MKLDNN, fuse not apply "
-              << "and set conv attribute use_mkldnn = false";
-          conv->Op()->SetAttr("use_mkldnn", false);
-          return;
-        }
-      }
-    }
 
     auto* eltwise_bias_tensor =
         scope->FindVar(eltwise_bias->Name())->GetMutable<LoDTensor>();
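The guard deleted from `ConvBiasFusePass::ApplyImpl` disabled MKLDNN for a convolution, and skipped the fuse, as soon as any entry of its `dilations` attribute differed from 1. The following sketch covers just that predicate; `HasNonUnitDilation` is a hypothetical name, not a Paddle function.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: true if any dilation factor differs from 1, which is
// the condition under which the removed guard bailed out of the fuse pass.
bool HasNonUnitDilation(const std::vector<int>& dilations) {
  return std::any_of(dilations.begin(), dilations.end(),
                     [](int d) { return d != 1; });
}
```

For example, `HasNonUnitDilation({1, 1})` is false and `HasNonUnitDilation({1, 2})` is true. With the guard removed, the pass no longer distinguishes the two cases.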
2 changes: 2 additions & 0 deletions paddle/fluid/framework/ir/shuffle_channel_detect_pass.cc
@@ -12,6 +12,8 @@
// See the License for the specific language governing permissions and
// limitations under the License.

#include <string>

#include "paddle/fluid/framework/ir/shuffle_channel_detect_pass.h"
#include "paddle/fluid/framework/op_version_registry.h"

7 changes: 2 additions & 5 deletions paddle/fluid/framework/pull_dense_worker.cc
@@ -14,14 +14,11 @@ limitations under the License. */
 #include <time.h>
 
 #include "paddle/fluid/framework/device_worker.h"
+#include "paddle/fluid/framework/fleet/fleet_wrapper.h"
 
 namespace paddle {
 namespace framework {
 
-class LoDTensor;
-class Scope;
-class Variable;
-
 std::shared_ptr<PullDenseWorker> PullDenseWorker::s_instance_ = NULL;
 std::mutex PullDenseWorker::mutex_for_version_;
 std::map<uint64_t, uint64_t> PullDenseWorker::last_versions_;
@@ -70,7 +67,7 @@ void PullDenseWorker::Initialize(const TrainerDesc& param) {
 }
 
 void PullDenseWorker::CreatePinVar() {
-#if (defined PADDLE_WITH_CUDA) || (defined PADDLE_WITH_PSLIB)
+#if (defined PADDLE_WITH_CUDA) || (defined PADDLE_WITH_XPU)
   // for (auto& v : dense_value_names_) {
   //   for (auto& name : v.second) {
   for (int i = 0; i < dwp_param_.program_config(0).pull_dense_table_id_size();
21 changes: 21 additions & 0 deletions paddle/fluid/framework/tensor_util.cc
@@ -13,11 +13,14 @@ See the License for the specific language governing permissions and
limitations under the License. */

#include "paddle/fluid/framework/tensor_util.h"

#include <algorithm>
#include <limits>
#include <memory>
#include <string>
#include <utility>
#include <vector>

#include "paddle/fluid/framework/data_type.h"
#include "paddle/fluid/platform/profiler.h"

@@ -81,6 +84,12 @@ void TensorCopy(const Tensor& src, const platform::Place& dst_place,
}
#endif
#ifdef PADDLE_WITH_CUDA
else if (platform::is_cuda_pinned_place(src_place) && // NOLINT
platform::is_cuda_pinned_place(dst_place)) {
memory::Copy(BOOST_GET_CONST(platform::CUDAPinnedPlace, dst_place), dst_ptr,
BOOST_GET_CONST(platform::CUDAPinnedPlace, src_place), src_ptr,
size);
}
else if (platform::is_cuda_pinned_place(src_place) && // NOLINT
platform::is_cpu_place(dst_place)) {
memory::Copy(BOOST_GET_CONST(platform::CPUPlace, dst_place), dst_ptr,
@@ -282,6 +291,12 @@ void TensorCopySync(const Tensor& src, const platform::Place& dst_place,
}
#endif
#ifdef PADDLE_WITH_CUDA
else if (platform::is_cuda_pinned_place(src_place) && // NOLINT
platform::is_cuda_pinned_place(dst_place)) {
memory::Copy(BOOST_GET_CONST(platform::CUDAPinnedPlace, dst_place), dst_ptr,
BOOST_GET_CONST(platform::CUDAPinnedPlace, src_place), src_ptr,
size);
}
else if (platform::is_cuda_pinned_place(src_place) && // NOLINT
platform::is_cpu_place(dst_place)) {
memory::Copy(BOOST_GET_CONST(platform::CPUPlace, dst_place), dst_ptr,
@@ -943,6 +958,12 @@ void TensorFromDLPack(const ::DLTensor& dl_tensor, framework::Tensor* dst) {
#endif
}

template <typename T>
std::string format_tensor(const framework::Tensor& tensor) {
// TODO(zhiqiu): use the print option to format tensor.
return "NOT IMPLEMENTED";
}

template <typename T>
std::ostream& print_tensor(std::ostream& os, const framework::Tensor& tensor) {
auto inspect = tensor.data<T>();