
Refactor Executor and OpGraph #540

Merged: 2 commits, Mar 11, 2019

Conversation

klecki (Contributor) commented Feb 14, 2019

Check some constraints on OpGraph separately from Executor
processing the OpGraph

Add static traits for OpGraph constraints

Unify OpNodes processing for different OpType

Executor "owns" buffer for corresponding TensorNodes,
using data factory and related types

next: #551

dali/common.h Outdated
case DALIOpType::SUPPORT:
return "SUPPORT";
default:
return "INVALID OP TYPE";
Reviewer (Contributor):

I'd prefer something that doesn't contain spaces and is obviously wrong - like "<INVALID>". Motivation for not having spaces is easier parsing, should it ever be necessary.

klecki (Author):

Done

workspace_owner_t op_data;

void Resize(int support, int cpu, int mixed, int gpu) {
std::get<static_cast<int>(DALIOpType::SUPPORT)>(op_data).resize(support);
Reviewer (Contributor):

This hurts my eyes, really. If it were not an enum class the same line would look like this:
std::get<DALI_OP_SUPPORT>(op_data).resize(support)


template <typename Backend>
bool IsPinned(HostWorkspace::output_t<Backend> &t) {
bool is_pinned = true;
Reviewer (Contributor):

I see a contradiction here. On one hand, you assume that pinned is the default, because an empty workspace is implicitly pinned. On the other hand, you implement an all_of predicate here, making it not-so-default after all (it takes just one non-pinned tensor in the workspace to mark it as non-pinned).

klecki (Author):

The thing is, currently we pin the memory for the whole batch, and I wanted to be able to call SetPinned(any_of_workspace_output_types, true) in a consistent manner, instead of having to write

device_output->set_pinned(...)

and

for (auto t : host_output)
   t->set_pinned(...)

mixed_op_data.clear();
gpu_op_data.clear();
support_op_data.clear();
std::get<0>(op_data).clear();
Reviewer (Contributor):

Shouldn't those be enum values?

klecki (Author):

They are, in a later PR; I forgot to propagate it here.

Reviewer (Contributor):

So it may be a good idea to add it here before merging.

// We instantiate the operation of adding the input only for parent op_type and device
// that are specifically allowed
template <DALIOpType op_type, DALIOpType producer_type, DALITensorDevice device>
en_if_t<allows_op_input<op_type>(producer_type) && allows_tensor_input<op_type>(device)> add_input(
Reviewer (Contributor):

I suppose that en_if_t means enable_if_t. I'd rather splurge on having the extra characters :)

Reviewer (Contributor):

...or make it if_t ;)

klecki (Author):

Done

// std::tuple<storage_gen_t<0>, storage_gen_t<1>, storage_gen_t<2>, storage_gen_t<3>,
// storage_gen_t<4>, storage_gen_t<5>, storage_gen_t<6>, storage_gen_t<7>>;

using storage_owner_t =
Reviewer (Contributor):

I had to read through a lot of code to grasp what this "storage owner" is. Consider renaming it to something less generic, e.g. WorkspaceDataStore.

klecki (Author):

Sure. I'm happy for good type name suggestions. :)

klecki (Author):

I went with tensor_data_store_t as it is more about covering the TensorNodes in the graph. I also want to differentiate it from workspace_store_t.

DALI_ENFORCE(device == DALITensorDevice::CPU, "Only CPU outputs allowed");
// Allocate `batch_size` Tensors for this ops
// results and add them to the workspace.
storage_gen_t<GetStorageIndex(DALIOpType::CPU, device)> output(batch_size, nullptr);
Reviewer (Contributor):

This deserves an alias:

template <DALIOpType op_type, DALITensorDevice device>
using WorkspaceStorage = storage_gen_t<GetStorageIndex(op_type, device)>;

klecki force-pushed the executor-refactor branch 2 times, most recently from 5d7f9cd to 1196ec0, on February 18, 2019
klecki changed the title from "[WIP] Refactor Executor and OpGraph" to "Refactor Executor and OpGraph" on Mar 5, 2019
klecki force-pushed the executor-refactor branch 5 times, most recently from b7bc3c4 to f3f022f, on March 6, 2019

std::vector<OpNode> op_nodes_;
std::vector<TensorNode> tensor_nodes_;
std::vector<std::vector<OpNodeId>> op_paritions_;
Reviewer (Contributor):

Suggested change
std::vector<std::vector<OpNodeId>> op_paritions_;
std::vector<std::vector<OpNodeId>> op_partitions_;

Amazing, what code completion can propagate...

klecki (Author):

Done :)

void CheckOpConstraints(const OpSpec &spec) {
const OpSchema &schema = SchemaRegistry::GetSchema(spec.name());

bool allows_multiple_inputs = schema.AllowsMultipleInputSets();
Reviewer (Contributor):

Suggested change
bool allows_multiple_inputs = schema.AllowsMultipleInputSets();
bool allows_multiple_input_sets = schema.AllowsMultipleInputSets();

klecki (Author):

Done

klecki force-pushed the executor-refactor branch 2 times, most recently from fe3b08d to c99e211, on March 7, 2019
@@ -0,0 +1,150 @@
// Copyright (c) 2019, NVIDIA CORPORATION. All rights reserved.
klecki (Author):

TODO: Format this document after changing DALIOpType to OpType.

klecki (Author):

Done

klecki force-pushed the executor-refactor branch 2 times, most recently from f4ca954 to c059e65, on March 8, 2019
klecki (Author) commented Mar 8, 2019:

Build 665966

#include "dali/pipeline/util/event_pool.h"
#include "dali/pipeline/util/stream_pool.h"
#include "dali/pipeline/util/thread_pool.h"


Reviewer (Contributor):

No need for those two empty lines.

klecki (Author):

Done



template <>
inline void Executor::SetupStreamsAndEvents<OpType::MIXED>(MixedWorkspace &ws,
const OpGraph &graph,
Reviewer (Contributor):

indent

klecki (Author):

Done


template <>
inline void Executor::SetupStreamsAndEvents<OpType::GPU>(DeviceWorkspace &ws,
const OpGraph &graph,
Reviewer (Contributor):

indent

klecki (Author):

Done

namespace dali {

std::vector<tensor_data_store_t> CreateBackingStorageForTensorNodes(const OpGraph &op_graph,
int batch_size) {
Reviewer (Contributor):

indent

klecki (Author):

Done

namespace dali {

std::vector<tensor_data_store_t> CreateBackingStorageForTensorNodes(const OpGraph &op_graph,
int batch_size);
Reviewer (Contributor):

indent

klecki (Author):

Done

void CheckArgumentInputConstraints(const OpGraph& op_graph, const OpNode& op) {
static const auto allows_argument_input = ArgumentInputConstraints();
bool arg_in_allowed = allows_argument_input[static_cast<int>(op.op_type)];
if (!arg_in_allowed) {
Reviewer (Contributor):

Make sure that argument inputs are allowed for mixed ops as well.

klecki (Author) replied Mar 8, 2019:

I will do so in a follow-up PR; I do not want to introduce changes that could break something unless it is to fix a bug.

klecki (Author) commented Mar 8, 2019:

Build: 666123 (it's stuck due to not enough free workers).

JanuszL added this to the Release_0.8.0 milestone on Mar 11, 2019
Check some constraints on OpGraph separately from Executor
processing the OpGraph

Add static traits for OpGraph constraints

Unify OpNodes processing for different OpType

Executor "owns" buffer for corresponding TensorNodes,
using data factory and related types

Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>
klecki (Author) commented Mar 11, 2019:

Build: 670584

klecki merged commit 58dceff into NVIDIA:master on Mar 11, 2019
haoxintong pushed a commit to haoxintong/DALI that referenced this pull request Jul 16, 2019
Allow for Argument Inputs in Mixed Ops

Signed-off-by: Krzysztof Lecki <klecki@nvidia.com>