
Auto Parallel #8891

Merged: 61 commits, Sep 27, 2022
85ac376
add auto_parallel code
wyg1997 Mar 18, 2022
40e791f
Feat ap remove hierarchy cast (#7919)
wyg1997 Mar 30, 2022
1006247
Fix add conv grad cost (#7972)
wyg1997 Apr 8, 2022
b49b953
Auto parallel/fast collector (#7958)
Yipeng1994 Apr 27, 2022
d92cd6b
AutoParallel mainstem algorithm add mutable_op_ctrl_edge (#8033)
wyg1997 May 5, 2022
50f478c
fix(AutoParallel): fix pooling computation cost function bug (#8147)
wyg1997 May 6, 2022
c919dce
[WIP] Fix auto parallel dump uniform sbp bug (#8330)
wyg1997 Jun 2, 2022
f3fc750
update auto_parallel config (#8356)
wyg1997 Jun 2, 2022
5d76b85
Refactor dump nd sbp for auto parallel (#8353)
wyg1997 Jun 9, 2022
98353b7
rename Global to Singleton
wyg1997 Jul 4, 2022
f26b68e
Refactor SbpEdge (#8684)
wyg1997 Jul 20, 2022
b5cc87b
Refactor auto parallel sbp node (#8712)
Yipeng1994 Jul 21, 2022
4e8aebb
Refactor auto parallel sbp graph (#8722)
Yipeng1994 Jul 22, 2022
56d70f8
Refactor auto parallel rest (#8731)
Yipeng1994 Jul 25, 2022
a183c7c
fix merge conflict
wyg1997 Jul 28, 2022
f4093ff
Remove template for sbp signature (#8787)
Yipeng1994 Aug 1, 2022
7587e8c
Refactor auto parallel class object stuff (#8835)
Yipeng1994 Aug 4, 2022
accb933
Fix auto parallel copy cost infer2 (#8788)
Yipeng1994 Aug 5, 2022
5ddc991
Refactor prune identity as much as possible (#8849)
Yipeng1994 Aug 5, 2022
a2db39d
Fix auto parallel low throughput (#8876)
Yipeng1994 Aug 8, 2022
6642e74
Refactor auto parallel final check (#8887)
Yipeng1994 Aug 9, 2022
7a47afb
Merge branch 'master' into feat-auto_parallel
wyg1997 Aug 9, 2022
96237b3
Merge branch 'master' into feat-auto_parallel
Yipeng1994 Aug 10, 2022
afd8a96
Docs auto parallel doc (#8896)
wyg1997 Aug 10, 2022
a3e0886
Merge remote-tracking branch 'origin/master' into feat-auto_parallel
wyg1997 Aug 12, 2022
9d1105f
Merge branch 'master' into feat-auto_parallel
Yipeng1994 Aug 12, 2022
d0a834e
Merge remote-tracking branch 'origin/master' into feat-auto_parallel
wyg1997 Aug 15, 2022
e0d1770
Test alexnet for auto_parallel (#8917)
wyg1997 Aug 16, 2022
bf0da26
Fix get sbp bug (#8939)
Yipeng1994 Aug 18, 2022
929d42d
Merge branch 'master' into feat-auto_parallel
Yipeng1994 Aug 22, 2022
010de9c
Resolve confits while merging master
Yipeng1994 Aug 22, 2022
41a2835
Recompute cost with time shape (#9009)
Yipeng1994 Aug 29, 2022
4d91a0b
Address comments
Yipeng1994 Aug 29, 2022
f77441b
Merge branch 'master' into feat-auto_parallel
Yipeng1994 Aug 29, 2022
a6ba01b
fix merge conflict
wyg1997 Aug 29, 2022
480afbb
Address comments
Yipeng1994 Sep 6, 2022
512e17e
Disabled ZeRO when enabled AutoParallel (#9087)
wyg1997 Sep 14, 2022
f1d22ba
Update oneflow/core/job_rewriter/optimizer_placement_optimization_pas…
wyg1997 Sep 15, 2022
2c5b3f8
Address comments
Yipeng1994 Sep 19, 2022
99efb17
Address comment.
Yipeng1994 Sep 20, 2022
c5872f3
Merge branch 'master' into feat-auto_parallel
Yipeng1994 Sep 21, 2022
22f557f
Update oneflow/core/job_rewriter/auto_parallel.cpp
Yipeng1994 Sep 21, 2022
76dac2b
New interface for pr#9018
Yipeng1994 Sep 21, 2022
2102158
Static analysis
Yipeng1994 Sep 21, 2022
af49e8d
Merge branch 'master' into feat-auto_parallel
mergify[bot] Sep 21, 2022
3942334
Merge branch 'master' into feat-auto_parallel
mergify[bot] Sep 21, 2022
7fc2f99
Fix ones like sbp bug and fix test import error in CI (#9123)
wyg1997 Sep 21, 2022
20d7199
Merge branch 'master' into feat-auto_parallel
wyg1997 Sep 21, 2022
145049e
auto format by CI
oneflow-ci-bot Sep 21, 2022
e332dfd
test(AutoParallel): skip acc check
wyg1997 Sep 21, 2022
82c910e
Merge branch 'master' into feat-auto_parallel
wyg1997 Sep 22, 2022
0f0e25b
Address comments
Yipeng1994 Sep 26, 2022
194c79f
rename source op set nd_sbp function and add check
wyg1997 Sep 27, 2022
c6e3f91
fix typo
wyg1997 Sep 27, 2022
6052f44
Feat full auto parallel (#9140)
Yipeng1994 Sep 27, 2022
16d39c2
add debugg log for non-deleted cast ops
wyg1997 Sep 27, 2022
9eff2ca
Merge branch 'feat-auto_parallel' of github.com:oneflow-inc/oneflow i…
wyg1997 Sep 27, 2022
083a623
update prune parallel cast op log
wyg1997 Sep 27, 2022
9144a5b
rename auto_parallel_prune_parallel_cast_ops to enable_auto_parallel_…
wyg1997 Sep 27, 2022
234e988
Merge branch 'master' into feat-auto_parallel
wyg1997 Sep 27, 2022
5e2014c
Merge branch 'master' into feat-auto_parallel
mergify[bot] Sep 27, 2022
70 changes: 70 additions & 0 deletions docs/source/auto_parallel.rst
@@ -0,0 +1,70 @@
Auto Parallelism
====================================================

As the scale of deep-learning models grows larger and larger, distributed training,
or parallelism, is needed. Data parallelism and model parallelism have been designed
to speed up training and to alleviate memory pressure.

In OneFlow, the SBP signature enables users to configure a parallelism policy easily.
However, users still need to specify the SBP property for each operator, or at least most of them.
Users might spend days digging into the details of parallelism, only to get low
throughput because of a slight mistake in the configuration of the SBP signature.

.. note::

    Auto parallelism only works in :doc:`graph` mode.


Our strength
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To get rid of all those configurations of SBP signatures, we developed auto parallelism.
Configurations of placement are still necessary, however, as auto placement is not
supported yet. If you are reading this paragraph before rushing into any SBP stuff, then
congratulations: you do not need to learn SBP. You can start writing your code just as you
would in CPU mode. Auto parallelism will generate a fast strategy customized for your
specific model, the size of its parameters, and the number of available GPUs.


How to use auto parallelism?
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You only need to enable the corresponding configuration setting in your model
of :doc:`graph` .

Example::

    import oneflow as flow

    class SubclassGraph(flow.nn.Graph):
        def __init__(self):
            super().__init__()  # MUST be called
            # auto parallelism configuration
            self.config.enable_auto_parallel(True)
            # other configurations about auto parallelism
            # ......

        def build(self):
            pass

.. warning::

    If you enable auto parallelism, OneFlow will take care of the SBP configurations
    of operators except for explicit ``to_global`` calls.


Configuration API for auto parallelism
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. currentmodule:: oneflow.nn.graph.graph_config.GraphConfig

.. autosummary::
    :toctree: generated
    :nosignatures:

    enable_auto_parallel
    enable_auto_parallel_prune_parallel_cast_ops
    set_auto_parallel_computation_cost_ratio
    set_auto_parallel_wait_time
    enable_auto_parallel_mainstem_algo
    enable_auto_parallel_sbp_collector
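As a sketch only, the snippet below shows where the knobs listed above would be set on a ``nn.Graph`` subclass. The class name ``TunedGraph`` and every argument value are placeholders for illustration, not recommended defaults.

```python
import oneflow as flow

class TunedGraph(flow.nn.Graph):
    def __init__(self):
        super().__init__()
        # Turn auto parallelism on; the remaining calls use placeholder values.
        self.config.enable_auto_parallel(True)
        self.config.enable_auto_parallel_mainstem_algo(True)
        self.config.enable_auto_parallel_sbp_collector(True)
        self.config.set_auto_parallel_computation_cost_ratio(0.05)
        self.config.set_auto_parallel_wait_time(1.65e4)

    def build(self):
        pass
```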

1 change: 1 addition & 0 deletions docs/source/index.rst
@@ -30,6 +30,7 @@ OneFlow upholds the core concept and architecture of static compilation and stre
nn.init
optim
graph
auto_parallel
image
utils.data
one_embedding
33 changes: 33 additions & 0 deletions oneflow/core/auto_parallel/algorithm_util.cpp
@@ -0,0 +1,33 @@
/*
Copyright 2020 The OneFlow Authors. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

#include "oneflow/core/auto_parallel/algorithm_util.h"

namespace oneflow {
namespace auto_parallel {

// Inverse function of order.
// The reason why we need the inverse_order, a.k.a. id2order, instead of id2value is to
// eliminate equality. For example, suppose v[0] < v[1] = v[2] < v[3]. We cannot tell whether
// v[1] comes before or after v[2] with comp(v[1], v[2]). But if we transfer it to the order
// order[0] < order[1] < order[2] < order[3], we know the strict order.
void InverseOrder(const std::vector<int32_t>& order, std::vector<int32_t>& inverse_order) {
  inverse_order.resize(order.size());
  for (int32_t i = 0; i < order.size(); i++) { inverse_order[order[i]] = i; }
}

} // namespace auto_parallel
} // namespace oneflow
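A minimal Python sketch (not the OneFlow code itself) of the `DecideOrder`/`InverseOrder` idea: sort indices instead of values, then invert the permutation so every element gets a strict, distinct rank even where values tie.

```python
def decide_order(v):
    """Return order such that v[order[i]] <= v[order[j]] for all i < j."""
    return sorted(range(len(v)), key=lambda i: v[i])

def inverse_order(order):
    """inverse_order[k] is the rank of element k in the sorted order."""
    inv = [0] * len(order)
    for rank, idx in enumerate(order):
        inv[idx] = rank
    return inv

v = [3, 7, 7, 9]            # v[1] == v[2]: comp() cannot separate them
order = decide_order(v)     # indices sorted by value
rank = inverse_order(order) # strict ranks; the tie is broken deterministically
```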
82 changes: 82 additions & 0 deletions oneflow/core/auto_parallel/algorithm_util.h
@@ -0,0 +1,82 @@
/*
Copyright 2020 The OneFlow Authors. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
#ifndef ONEFLOW_CORE_AUTO_PARALLEL_ALGORITHM_UTIL_H_
#define ONEFLOW_CORE_AUTO_PARALLEL_ALGORITHM_UTIL_H_

#include <vector>
#include <cstdlib>
#include <algorithm>
#include <unordered_map>

namespace oneflow {
namespace auto_parallel {

// This function removes the i-th element from a vector in constant time.
// The vector should not care about ordering.
// Be careful with this function: when removing multiple elements, make sure
// that you traverse the vector from back to front.
template<class T>
void RemoveFrom(std::vector<T>& v, int32_t i) {
  v[i] = v.back();
  v.pop_back();
}

template<class T>
void CheckAndRemoveFrom(std::vector<T>& v, T& t) {
  for (int32_t i = v.size() - 1; i >= 0; i--) {
    if (v[i] == t) {
      RemoveFrom<T>(v, i);
      break;
    }
  }
}

// Inverse function, which transfers a vector to an unordered_map.
template<class T>
void InverseFunction(const std::vector<T>& v, std::unordered_map<T, int32_t>& inverse_map) {
  inverse_map.clear();
  for (int32_t i = 0; i < v.size(); i++) { inverse_map[v[i]] = i; }
}

// When you want to sort something but cannot move any elements, use order.
// DecideOrder determines the order of sorting for a list v, such that
//   v[order[i]] < v[order[j]] for all i < j.
// With a user-defined comparison, we have
//   comp(v[order[i]], v[order[j]]) == true for all i < j.
template<class T, class Compare>
void DecideOrder(const T& v, std::vector<int32_t>& order, const Compare& comp) {
  // Initialize order with the identity permutation
  order.resize(v.size());
  for (int32_t i = 0; i < v.size(); i++) { order[i] = i; }
  // Sort the indices instead of the values
  std::sort(order.begin(), order.end(), [&](int32_t i, int32_t j) { return comp(v[i], v[j]); });
}

// Inverse function of order.
// The reason why we need the inverse_order, a.k.a. id2order, instead of id2value is to
// eliminate equality. For example, suppose v[0] < v[1] = v[2] < v[3]. We cannot tell whether
// v[1] comes before or after v[2] with comp(v[1], v[2]). But if we transfer it to the order
// order[0] < order[1] < order[2] < order[3], we know the strict order.
void InverseOrder(const std::vector<int32_t>& order, std::vector<int32_t>& inverse_order);

} // namespace auto_parallel

static const double float_deviation_minus = 0.9999999;
static const double float_deviation_plus = 1.0000001;

} // namespace oneflow

#endif // ONEFLOW_CORE_AUTO_PARALLEL_ALGORITHM_UTIL_H_
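The swap-with-back trick in `RemoveFrom` can be sketched in Python (as an illustration, not the OneFlow code): the removed slot is overwritten by the last element, so removal is O(1) at the cost of ordering.

```python
def remove_from(v, i):
    """O(1) unordered removal: overwrite v[i] with the last element, then pop."""
    v[i] = v[-1]
    v.pop()

v = [10, 20, 30, 40]
remove_from(v, 1)   # 20 is replaced by 40
# v is now [10, 40, 30]: order is not preserved, which is the trade-off.
# This is also why multi-element removal must walk from back to front:
# removing index i may move an unvisited element into position i.
```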
143 changes: 143 additions & 0 deletions oneflow/core/auto_parallel/binary_set.cpp
@@ -0,0 +1,143 @@
/*
Copyright 2020 The OneFlow Authors. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/
#include "oneflow/core/auto_parallel/binary_set.h"

namespace oneflow {
namespace auto_parallel {

// A static function for initialization of the log_2 mapping
std::unordered_map<kBinarySetEntryType, int32_t> BinarySet::InitLog2() {
  std::unordered_map<kBinarySetEntryType, int32_t> log_2;
  for (int32_t i = 0; i < BinarySet::bit_entry_type_; i++) {
    log_2[(kBinarySetEntryType)1 << i] = i;
  }
  return log_2;
}

// Initialization of log_2 mapping
const std::unordered_map<kBinarySetEntryType, int32_t> BinarySet::log_2_ = BinarySet::InitLog2();

// Constructor
BinarySet::BinarySet(int32_t size_of_set) : size_of_set_(size_of_set) {
  int32_t k = (size_of_set - 1) / bit_entry_type_ + 1;
  binary_set_values_.resize(k, 0);
}

// Initialization if needed
void BinarySet::Initialize(int32_t size_of_set) {
  size_of_set_ = size_of_set;
  int32_t k = (size_of_set - 1) / bit_entry_type_ + 1;
  binary_set_values_.resize(k, 0);
}

// Clear all the elements in the set
void BinarySet::Clear() { binary_set_values_.assign(binary_set_values_.size(), 0); }

// Check if the i-th element is in this subset
int32_t BinarySet::CheckExistence(int32_t i) const {
  int32_t k = i / bit_entry_type_;
  int32_t j = i % bit_entry_type_;
  return (binary_set_values_[k] >> j) & 1;
}

// Add the i-th element into this subset
void BinarySet::AddEntry(int32_t i) {
  int32_t k = i / bit_entry_type_;
  int32_t j = i % bit_entry_type_;
  binary_set_values_[k] |= ((kBinarySetEntryType)1 << j);
}
// Take the i-th element out from this subset
void BinarySet::DeleteEntry(int32_t i) {
  int32_t k = i / bit_entry_type_;
  int32_t j = i % bit_entry_type_;
  binary_set_values_[k] &= ~((kBinarySetEntryType)1 << j);
}
// Get the union with another subset and store it into u
void BinarySet::UnionTo(const BinarySet& bs, BinarySet& u) {
  for (int32_t k = 0; k < binary_set_values_.size(); k++) {
    u.binary_set_values_[k] = binary_set_values_[k] | bs.binary_set_values_[k];
  }
}
// Whether this binary set intersects another one
bool BinarySet::IfIntersect(const BinarySet& bs) const {
  int32_t min_bs_size = std::min(binary_set_values_.size(), bs.binary_set_values_.size());
  for (int32_t k = 0; k < min_bs_size; k++) {
    if (binary_set_values_[k] & bs.binary_set_values_[k]) { return true; }
  }
  return false;
}
// Get the intersection with another subset and store it into i
void BinarySet::IntersectionTo(const BinarySet& bs, BinarySet& i) const {
  int32_t min_bs_size = std::min(binary_set_values_.size(), bs.binary_set_values_.size());
  if (min_bs_size > i.binary_set_values_.size()) { i.binary_set_values_.resize(min_bs_size, 0); }
  for (int32_t k = 0; k < min_bs_size; k++) {
    i.binary_set_values_[k] = binary_set_values_[k] & bs.binary_set_values_[k];
  }
}
// Count number of elements in this subset
int32_t BinarySet::Total() const {
  int32_t t = 0;
  for (int32_t k = 0; k < binary_set_values_.size(); k++) {
    kBinarySetEntryType bsv = binary_set_values_[k];
    bsv = (bsv & 0x5555555555555555) + ((bsv >> 1) & 0x5555555555555555);
    bsv = (bsv & 0x3333333333333333) + ((bsv >> 2) & 0x3333333333333333);
    bsv = (bsv & 0x0F0F0F0F0F0F0F0F) + ((bsv >> 4) & 0x0F0F0F0F0F0F0F0F);
    bsv = (bsv & 0x00FF00FF00FF00FF) + ((bsv >> 8) & 0x00FF00FF00FF00FF);
    bsv = (bsv & 0x0000FFFF0000FFFF) + ((bsv >> 16) & 0x0000FFFF0000FFFF);
    // bsv = (bsv & 0x00000000FFFFFFFF) + ((bsv >> 32) & 0x00000000FFFFFFFF);
    t += int32_t(bsv);
  }
  return t;
}

// Output all the elements in the subset
void BinarySet::OutPut(std::vector<int32_t>& out) const {
  out.clear();
  for (int32_t i = 0; i < size_of_set_; i++) {
    if (CheckExistence(i)) { out.emplace_back(i); }
  }
}

// Output all the elements in the subset, skipping over empty words
void BinarySet::QuickOutPut(std::vector<int32_t>& out) const {
  out.clear();
  for (int32_t i = 0; i < binary_set_values_.size(); i++) {
    kBinarySetEntryType x = binary_set_values_[i];
    kBinarySetEntryType y = 0;
    while (x) {
      y = x;
      x &= x - 1;  // clear the lowest set bit; y - x isolates it
      out.emplace_back(i * BinarySet::bit_entry_type_ + log_2_.find(y - x)->second);
    }
  }
}

// Add elements of input into this subset
void BinarySet::AddEntries(std::vector<int32_t>& in) {
  for (int32_t i : in) { AddEntry(i); }
}

// Whether two binary sets are equal to each other
bool BinarySet::operator==(const BinarySet& rhs) const {
  if (size_of_set_ != rhs.size_of_set_) { return false; }
  for (int32_t i = 0; i < binary_set_values_.size(); i++) {
    if (binary_set_values_[i] != rhs.binary_set_values_[i]) { return false; }
  }
  return true;
}

} // namespace auto_parallel
} // namespace oneflow
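The word-packed bitset above can be sketched in Python (illustration only, not the OneFlow code): element i lives at bit i % 64 of word i // 64, and enumeration peels the lowest set bit with `x &= x - 1`, just as `QuickOutPut` does.

```python
WORD = 64  # bits per word, matching a 64-bit entry type

def add_entry(words, i):
    """Set bit i of the packed bitset."""
    words[i // WORD] |= 1 << (i % WORD)

def check_existence(words, i):
    """Return 1 if bit i is set, else 0."""
    return (words[i // WORD] >> (i % WORD)) & 1

def quick_output(words):
    """Enumerate set bits word by word, clearing the lowest set bit each step."""
    out = []
    for k, x in enumerate(words):
        while x:
            low = x & -x              # isolate the lowest set bit
            x &= x - 1                # clear it
            out.append(k * WORD + low.bit_length() - 1)
    return out

words = [0, 0]                        # room for a 128-element set
for i in (3, 64, 70):
    add_entry(words, i)
```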