iter-to-iter variable batch size #2481
Conversation
Force-pushed 60bf4f2 to f08e191
Force-pushed 99a4f54 to a15ea34
!build
CI MESSAGE: [1924708]: BUILD STARTED
CI MESSAGE: [1924708]: BUILD FAILED
!build
CI MESSAGE: [1926219]: BUILD STARTED
CI MESSAGE: [1926219]: BUILD FAILED
dali/operators/decoder/nvjpeg/decoupled_api/nvjpeg_decoder_decoupled_api.h (outdated review threads, resolved)
!build
CI MESSAGE: [1963736]: BUILD STARTED
CI MESSAGE: [1963736]: BUILD FAILED
exec_pipelined=False)
with pipe:
    data = fn.external_source(source=input_data, cycle=False, device=device)
    processed = fn.python_function(data, function=resize, num_outputs=1)

Suggested change:
-    processed = fn.python_function(data, function=resize, num_outputs=1)
+    processed = fn.python_function(data, function=resize)

DALI_ENFORCE(bsps[0]->NextBatchSize() == bsps[i]->NextBatchSize(),
             "Batch size must be uniform across an iteration");
}
auto batch_size = bsps[0]->NextBatchSize();
suggestion: move the variable to line 317 and use it in the loop
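The enforcement above can be sketched in Python; `next_uniform_batch_size` and `FixedProvider` are hypothetical names for illustration, not DALI's actual API:

```python
def next_uniform_batch_size(providers):
    """Mirror of the DALI_ENFORCE above: every batch-size provider must agree
    on the size of the upcoming iteration."""
    sizes = [p.next_batch_size() for p in providers]
    if any(s != sizes[0] for s in sizes[1:]):
        raise ValueError("Batch size must be uniform across an iteration")
    return sizes[0]

class FixedProvider:
    """Minimal stand-in for a batch-size provider with a constant size."""
    def __init__(self, size):
        self.size = size

    def next_batch_size(self):
        return self.size
```

With the uniform size computed once up front, the per-element comparison in the loop only needs to check against that single reference value.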
Force-pushed 569af33 to 89ab45e
!build
CI MESSAGE: [1964913]: BUILD STARTED
CI MESSAGE: [1964913]: BUILD FAILED
CI MESSAGE: [1965134]: BUILD STARTED
CI MESSAGE: [1965142]: BUILD STARTED
CI MESSAGE: [1965140]: BUILD STARTED
CI MESSAGE: [1965143]: BUILD STARTED
CI MESSAGE: [1965151]: BUILD FAILED
!build
CI MESSAGE: [1969460]: BUILD STARTED
CI MESSAGE: [1969460]: BUILD PASSED
A few files are still left to review; I would like to check the side-casts a bit more on my own.
My major gripe is mixing the C-API with the C++ API; there are not enough tests for the former.
Also, the prophet part probably doesn't happen in any scenario, but looking at the isolated CachingList it looks like a bug.
template <typename Backend>
struct is_backend {
  static constexpr bool value = std::is_same<Backend, CPUBackend>::value ||

Just a small nitpick, but three specializations plus a default for the fallback could probably be faster (that is just a guess; I rarely measure perf for such things).
namespace dali {

class BatchSizeProvider {

Maybe add some doc on what this class/interface is supposed to represent, instead of usage patterns only?
Done
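For illustration, here is a hedged sketch of what such an interface documentation might express, as a Python ABC. The real interface is C++; the method names are assumed from the surrounding snippets (NextBatchSize, Advance), not taken from DALI's source:

```python
from abc import ABC, abstractmethod

class BatchSizeProvider(ABC):
    """Sketch of the interface's intent: an operator (e.g. an external data
    source) that knows, ahead of execution, how many samples the next
    iteration will carry, so the executor can size buffers per iteration."""

    @abstractmethod
    def next_batch_size(self) -> int:
        """Peek at the batch size of the upcoming iteration without consuming it."""

    @abstractmethod
    def advance(self) -> None:
        """Move on to the following iteration's batch size."""
```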
template <typename WorkspacePolicy, typename QueuePolicy>
template <typename Workspace>
void Executor<WorkspacePolicy, QueuePolicy>::RunHelper(OpNode &op_node, Workspace &ws) {

I would put the RunHelper at the top of the functions that use it, but whatever.
dali/pipeline/executor/executor.cc (outdated)
make_string("Expected batch size lower or equal to max batch size. Actual: ",
            ws.GetInputBatchSize(i), " <= ", max_batch_size_));

I'm wondering whether printing 500 < 128 is the best idea. Maybe:

Suggested change:
- make_string("Expected batch size lower or equal to max batch size. Actual: ",
-             ws.GetInputBatchSize(i), " <= ", max_batch_size_));
+ make_string("Expected batch size lower or equal to max batch size. Expected at most: ",
+             max_batch_size_, ", got: ", ws.GetInputBatchSize(i)));
Done
template <typename WorkspacePolicy, typename QueuePolicy>
void Executor<WorkspacePolicy, QueuePolicy>::RunCPU() {
  PreRun();

I wanted to suggest that PreRun shouldn't be part of RunCPU, but that's probably complicated by the fact that it would need to be adjusted everywhere, as those RunStage functions are used directly by users (in C++).
The implementation looks a bit weird to me with those push/pops; I need to analyze it a bit more.

We decided that PreRun belongs here after all. It is weird, but we don't have a better idea ATM.
#include <string>
#include <vector>
#include <memory>
#include <chrono>

chrrrono

Removed
dali/pipeline/operator/op_spec.h (outdated)
"tensor with ", batch_size, " elements. Got:\n", shape));
}
- return valid_shape;
+ return true;

This worries me. Why don't you do any verification? Where is this used?

Fixed. Added the removed lines back and added one more check to operator.cc::EnforceUniformBatchSize
"Attempted to step over the last element in the list. This operation is forbidden. Add " | ||
"more elements to CachingList before calling AdvanceProphet."); | ||
apprentice_ = prophet_; | ||
advance(prophet_, 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's wrong with prophet_++
?
Also, what if our prophet (or apprentice_
) is popped out of the list and put in the empty list? Wouldn't we start advancing in the set of empty nodes?
Done
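As a toy model of the prophet/apprentice discussion above, the bounds check can be sketched in Python. This is index-based, unlike the real iterator-based CachingList, and all names here are illustrative:

```python
class CachingListSketch:
    """Toy model of a list with a lookahead ("prophet") pointer that peeks
    ahead of the consumed head, plus an "apprentice" trailing one step behind."""
    def __init__(self):
        self.items = []
        self.prophet = 0      # index of the next element to be peeked
        self.apprentice = 0   # index of the most recently peeked element

    def add(self, item):
        self.items.append(item)

    def advance_prophet(self):
        # The check the DALI_ENFORCE-style message above guards against:
        # stepping past the last element of the list.
        if self.prophet >= len(self.items):
            raise RuntimeError(
                "Attempted to step over the last element in the list. "
                "Add more elements before calling AdvanceProphet.")
        self.apprentice = self.prophet
        self.prophet += 1
        return self.items[self.apprentice]
```

The reviewer's concern maps onto this model as: if the element at the apprentice/prophet position is recycled into a free list, the indices (iterators, in the real code) would silently walk over recycled nodes unless they are re-anchored.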
@@ -415,9 +494,10 @@ class ExternalSource : public Operator<Backend> {
 */
struct ExternalSourceState {
  bool copied_shared_data = false;
  size_t batch_size = 0;

I see only modifications putting the current bs in, but no access to it?

Removed
0-255, with shape like [640, 480, 3]) and you want to test default arguments
only, please add a record to the `ops_image_default_args` list
2. If the operator is typically processing image-like data (i.e. 3-dim, uint8,
0-255, with shape like [640, 480, 3]) and you want to specify any number of

How about more demanding scenarios (where maybe the shape is a bit more fuzzy)?
Also, I didn't look for it yet, but do you test stuff like FCHW sequences? There might be some weirdness in sequence processing operators.
Added sequence tests
Finished the rest; there appear to be some issues in ImageDecoder, and I would like to see the tests for sequences and with non-uniform input batches.
if (spec_.GetSchema().HasArgument("hw_decoder_load")) {
  hw_decoder_load_ = spec.GetArgument<float>("hw_decoder_load");
  try_init_hw_decoder = true;
} else {
  hw_decoder_load_ = 0;
}

This part makes it optional without a default. HasArgument is true only for arguments that were specified by the user.
}

- if (hw_decoder_bs_ > 0 &&
+ if (try_init_hw_decoder &&

By removing the part of the code that sets hw_decoder_bs_, you are making sure that nvjpegDecodeBatchedPreAllocate uses batch=0. It probably needs to be adjusted to use max_batch_size * hw_decoder_load_ in the nvjpegDecodeBatchedPreAllocate call.
Done
@@ -909,6 +894,36 @@ class nvJPEGDecoder : public Operator<MixedBackend>, CachedDecoderImpl {
  RegisterDiagnostic("using_hw_decoder", &using_hw_decoder_);
}

int CalcHwDecoderBatchSize(float hw_decoder_load, int curr_batch_size) {
  if (hw_decoder_load == .0f) return 0;

Suggested change:
- if (hw_decoder_load == .0f) return 0;
+ if (hw_decoder_load == 0.f) return 0;

Done
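A hedged Python sketch of what CalcHwDecoderBatchSize presumably computes: routing a fraction of the current batch to the hardware decoder. The exact rounding/alignment in the real code is not visible in this diff, so simple rounding is assumed here:

```python
def calc_hw_decoder_batch_size(hw_decoder_load: float, curr_batch_size: int) -> int:
    """Approximate model of CalcHwDecoderBatchSize: a load of 0 disables the
    HW decoder; otherwise a proportional share of the batch goes to it.
    The remainder (curr_batch_size - result) would be decoded on the GPU path."""
    if hw_decoder_load == 0.0:
        return 0
    return round(hw_decoder_load * curr_batch_size)
```

This also illustrates the pre-allocation concern above: the value passed to nvjpegDecodeBatchedPreAllocate should track max_batch_size * hw_decoder_load rather than a stale or zeroed member.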
if isinstance(sample_shape, tuple):
    size = sample_shape
elif inspect.isgeneratorfunction(sample_shape):
    size = sample_shape()

I thought it would be called for every sample instead of once. That would allow passing a random shape generator and having some variability in shapes.
Maybe it's worth having some more randomness in the input (non-uniform cases)?
Done
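The per-sample-callable behavior requested above can be sketched as follows; the helper name is made up for illustration and is not the test utility's actual code:

```python
def resolve_sample_shapes(sample_shape, num_samples):
    """Return one shape per sample: a tuple is repeated for every sample,
    while a callable is invoked once per sample, which allows random,
    non-uniform shapes within a batch."""
    if isinstance(sample_shape, tuple):
        return [sample_shape] * num_samples
    if callable(sample_shape):
        return [tuple(sample_shape()) for _ in range(num_samples)]
    raise TypeError("`sample_shape` shall be either a tuple or a callable. "
                    "Provide `(val,)` tuple for 1D shape")
```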
shape of every sample.
:param lo:
:param hi:
:param dtype: 'int' for uint8 or 'float' for float32

Those are just numpy types; the doc is wrong.

Done
"`sample_shape` shall be either a tuple or a callable. Provide `(val,)` tuple for 1D shape") | ||
|
||
if np.issubdtype(dtype, np.integer): | ||
return [np.random.randint(lo, hi, size=(bs,) + size, dtype=dtype) for bs in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want non-uniform shapes, we probably need to add another nested list of Tensors instead of generating a numpy array of shape (bs, ...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, differently, but there's a non-uniformity
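A minimal stand-in for the nested-list idea above, using stdlib lists instead of numpy so each sample in the batch can carry its own shape (illustrative only; function and parameter names are made up):

```python
import random

def make_nonuniform_batch(batch_size, lo=0, hi=255, max_len=8):
    """Build a batch as a list of per-sample 1D "tensors" whose lengths are
    drawn independently, so shapes differ within the batch - the structure a
    single numpy array of shape (bs, ...) cannot represent."""
    batch = []
    for _ in range(batch_size):
        length = random.randint(1, max_len)
        batch.append([random.randint(lo, hi) for _ in range(length)])
    return batch
```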
data = data * 2
data = data + 3
data = data - 4
data = data / 5
data = data // 6

Can you check unary -, ternary math.clamp, and throw in a binary op with two DALI tensor arguments, so something like data + data2?
Done
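The extra coverage asked for above — unary minus, a clamp, and a binary op taking two inputs — can be mimicked in plain Python (a stand-in for the DALI arithmetic ops, not their implementation):

```python
def arithmetic_coverage(data, data2, lo=0, hi=10):
    """Exercise the three requested op shapes on list-based "tensors":
    unary negation, a ternary clamp(x, lo, hi), and a binary op with two
    tensor-like arguments (data + data2)."""
    negated = [-x for x in data]                   # unary -
    clamped = [min(max(x, lo), hi) for x in data]  # clamp(x, lo, hi)
    summed = [a + b for a, b in zip(data, data2)]  # data + data2
    return negated, clamped, summed
```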
Force-pushed c777b79 to 8f4e8ca
!build
CI MESSAGE: [2000529]: BUILD STARTED
CI MESSAGE: [2000529]: BUILD FAILED

Some nitpicks for subshape and enforces.
dali/c_api/c_api.cc (outdated)
@@ -183,14 +205,20 @@ void daliPrefetchSeparate(daliPipelineHandle *pipe_handle,
}

void daliSetExternalInputBatchSize(daliPipelineHandle *pipe_handle, const char *name,
                                   int batch_size) {
  auto bs_map = reinterpret_cast<batch_size_map_t *>(pipe_handle->batch_sizes_map);

Suggested change:
- auto bs_map = reinterpret_cast<batch_size_map_t *>(pipe_handle->batch_sizes_map);
+ auto *bs_map = reinterpret_cast<batch_size_map_t *>(pipe_handle->batch_sizes_map);

I like the auto *. :P
include/dali/core/tensor_shape.h (outdated)
assert(begin >= 0);
auto div_ceil = [](int x, int y) { return 1 + ((x - 1) / y); };
int nsamples = div_ceil(end - begin, step);
std::vector<TensorShape<ndims>> dst_shapes(nsamples);

Why not create a TensorListShape directly? As far as I remember it's movable, you already have sample_dim() (ndims can be negative), and you can just .set_tensor_shape(j, in[i]);
dali/pipeline/operator/operator.cc (outdated)
static_assert(!std::is_same<Backend, MixedBackend>::value,
              "MixedBatch doesn't have an accessible output");
for (int i = 0; i < ws.NumOutput(); i++) {
  auto ref_batch_size =

You can move this out of the loop.
dali/pipeline/operator/operator.h (outdated)
@@ -293,6 +296,7 @@ class Operator<CPUBackend> : public OperatorBase {
  SetupSharedSampleParams(ws);
  RunImpl(ws);
  ws.GetThreadPool().WaitForWork();
  // EnforceUniformOutputBatchSize<CPUBackend>(ws);

Why is this commented out?
dali/pipeline/operator/operator.h (outdated)
return SetupImpl(output_desc, ws);
}

void Run(DeviceWorkspace &ws) override {
  CheckInputLayouts(ws, spec_);
  SetupSharedSampleParams(ws);
  RunImpl(ws);
  // EnforceUniformOutputBatchSize<GPUBackend>(ws);

Same here?
dali/pipeline/operator/operator.cc (outdated)

template <typename Backend>
void OperatorBase::EnforceUniformOutputBatchSize(const workspace_t<Backend> &ws) const {
  static_assert(!std::is_same<Backend, MixedBackend>::value,

It should be possible for Mixed AFAIK.
dali/pipeline/pipeline.h (outdated)
@@ -428,7 +429,8 @@ class DLL_PUBLIC Pipeline {
DLL_PUBLIC int num_outputs() const;

/**
- * @brief Returns a string describing the type device type backing the output specified by given id.
+ * @brief Returns a string describing the type device type backing the output specified by given

I still don't understand the type device type.
Force-pushed f60b369 to 5aab6c8
Signed-off-by: Michał Szołucha <mszolucha@nvidia.com>
Force-pushed 5aab6c8 to bc2191b
!build
CI MESSAGE: [2002764]: BUILD STARTED
Signed-off-by: Michał Szołucha <mszolucha@nvidia.com>
!build
CI MESSAGE: [2002862]: BUILD STARTED
CI MESSAGE: [2002862]: BUILD PASSED
TODO:
- pipeline.py::batch_size(self) and add @property max_batch_size()
- batch_size arg in pipeline.py
- _check_data_batch in external_source.py: expected batch size condition and error message (~L24)
- ImageDecoder rework
- max_batch_size setting might explode the memory consumption

Signed-off-by: szalpal mszolucha@nvidia.com
What happened in this PR:
- h and cc file, to make development convenient
- BatchSizeProvider interface, and add it to ExternalSource
- CachingList (inner ExternalSource memory), so that it can traverse over the data list asynchronously w.r.t. the current head and tail of this list

Why we need this PR?
Pick one, remove the rest
JIRA TASK: [Use DALI-XXXX or NA]