Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decorator 4 disable recursive boxing call #5796

Merged
merged 22 commits into from
Aug 13, 2021

Conversation

lixinqi
Copy link
Contributor

@lixinqi lixinqi commented Aug 8, 2021

本pr修复一个关键BUG: 递归调用eager consistent op interpreter时不再设置consistent id,因为此时不是logical op。
本pr加入更强的检查,递归调用eager consistent op interpreter时,禁止boxing行为。

#include "oneflow/core/rpc/include/global_process_ctx.h"

namespace oneflow {

/*static*/ Maybe<Symbol<RankGroup>> RankGroup::New(Symbol<ParallelDesc> parallel_desc) {
return DECORATE(&RankGroup::RawNew, ThreadLocal)(parallel_desc);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

从 RawNew(Symbol parallel_desc) 的实现来看它并不需要缓存?直接调 New(const std::set<int64_t>& ranks) 就可以,然后 New(const std::set<int64_t>& ranks) 缓存了 RawNew(const std::set<int64_t>& ranks) 的结果

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

构建和查询std::set<int64_t>成本也不低。Symbol的成本只有int64_t

sbp = (flow.sbp.broadcast,)
y = x.to_consistent(placement=placement, sbp=sbp)
y.check_meta_consistency()
print(y.shape, y.placement, y.sbp)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个 print 如果是为了检查正确性,需要改成 test_case.assertTrue(...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

嗯。这应该是忘删了。

@@ -74,7 +72,6 @@ template<Maybe<void> (*SendOrRecv)(const TransportToken&, int64_t, void*, std::s
std::function<void()>*)>
Maybe<void> AccessToAllOtherRanks(Symbol<RankGroup> rank_group, const TransportToken& token,
AsyncTransportCtx* ctx) {
CHECK_OR_RETURN(rank_group->ContainingCurrentRank());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里的检查为什么移动到了外层函数里呢,放在这里会更合理吧

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里发生过BUG。某些情况下rank0 会给 rank{1, 2, 3}发数据,显然rank{1, 2, 3}不会包含rank0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

也许我应该改一下函数名称

++*recursive_depth;
RetT&& ret = func(arg0, arg1, outputs, args...);
--*recursive_depth;
if (*recursive_depth == 0) { JUST(InitConsistentId(outputs)); }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

建议:这个地方可以考虑用 raii?这样可以直接 return func(...),并且之后如果 40 行的区域里出现会提前返回的情况(比如 RetT&& ret = func(arg0, JUST(arg1), outputs, args...);)也能保证这里一定执行

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

可以。

struct CheckConsistentTensorMeta<RetT, const std::shared_ptr<one::Tensor>&, Args...> {
static_assert(is_maybe<RetT>::value, "returned value type must be Maybe<T>.");
template<RetT (*func)(const std::shared_ptr<one::Tensor>&, Args...)>
static RetT Call(const std::shared_ptr<one::Tensor>& tensor, Args... args) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个函数可以直接返回 CheckConsistentTensorMeta<RetT, const one::Tensor&, Args...>::Call(*tensor, args...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

可以。

@oneflow-ci-bot oneflow-ci-bot self-requested a review August 13, 2021 03:52
@chengtbf chengtbf removed the request for review from oneflow-ci-bot August 13, 2021 04:43
@chengtbf
Copy link
Contributor

mark。暂时移除 automerge,两个小时以后加回来。

@github-actions
Copy link
Contributor

Speed stats:
GPU Name: GeForce GTX 1080 

PyTorch resnet50 time: 140.5ms (= 7026.4ms / 50, input_shape=[16, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 128.3ms (= 6414.0ms / 50, input_shape=[16, 3, 224, 224], backward is enabled)
Relative speed: 1.10 (= 140.5ms / 128.3ms)

PyTorch resnet50 time: 82.4ms (= 4120.3ms / 50, input_shape=[8, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 74.3ms (= 3717.0ms / 50, input_shape=[8, 3, 224, 224], backward is enabled)
Relative speed: 1.11 (= 82.4ms / 74.3ms)

PyTorch resnet50 time: 58.3ms (= 2914.2ms / 50, input_shape=[4, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 49.3ms (= 2465.7ms / 50, input_shape=[4, 3, 224, 224], backward is enabled)
Relative speed: 1.18 (= 58.3ms / 49.3ms)

PyTorch resnet50 time: 48.9ms (= 2447.3ms / 50, input_shape=[2, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 44.6ms (= 2229.1ms / 50, input_shape=[2, 3, 224, 224], backward is enabled)
Relative speed: 1.10 (= 48.9ms / 44.6ms)

PyTorch resnet50 time: 45.0ms (= 2251.4ms / 50, input_shape=[1, 3, 224, 224], backward is enabled)
OneFlow resnet50 time: 39.1ms (= 1952.9ms / 50, input_shape=[1, 3, 224, 224], backward is enabled)
Relative speed: 1.15 (= 45.0ms / 39.1ms)

@oneflow-ci-bot oneflow-ci-bot merged commit c071635 into master Aug 13, 2021
@oneflow-ci-bot oneflow-ci-bot deleted the decorator_4_disable_recursive_boxing_call branch August 13, 2021 07:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants