Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better error message when insufficient data in cache #2924

Merged
merged 4 commits into from
May 6, 2021

Conversation

szalpal
Copy link
Member

@szalpal szalpal commented May 5, 2021

Signed-off-by: Michał Szołucha mszolucha@nvidia.com

Why we need this PR?

Pick one, remove the rest
This PR adds better error message, when DALI fails on the situation, that there's not enough data in external_source's cache

What happened in this PR?

Fill relevant points, put NA otherwise. Replace anything inside []

  • What solution was applied:
    try-catch block
  • Affected modules and functionalities:
    executor
  • Key points relevant for the review:
    NA
  • Validation and testing:
    NA
  • Documentation (including examples):
    NA

JIRA TASK: [Use DALI-XXXX or NA]

@szalpal
Copy link
Member Author

szalpal commented May 5, 2021

!build

@JanuszL JanuszL self-assigned this May 5, 2021
Comment on lines 370 to 372
DALI_FAIL(
make_string("No more batches in cache left. Make sure, that DALI Pipeline is fed with "
"sufficient amount of data. ", e.what()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the cache? I think both errors should go to the same bag - the user hasn't provided enough data.

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [2337017]: BUILD STARTED

@szalpal
Copy link
Member Author

szalpal commented May 5, 2021

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [2337023]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [2337023]: BUILD FAILED

for (auto &bsp : bsps) {
bsp->Advance();
}
} catch (const std::exception &e) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's dirty to catch a generic exception and assume its meaning. There should be a better error resolution upstream (preferred) or better inspection here (passable, but still ugly). The way it is, we can produce truly ridiculous errors here, like:

"Failed to acquire the next batch. Make sure, that DALI Pipeline is fed with sufficient amount of data. Bad alloc."

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about adding specialized exception to batch_size_provider.h? NextBatchSize and Advance could throw this one then

struct cache_empty : std::out_of_range {};

?

Copy link
Contributor

@mzient mzient May 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that's necessary. However, we already throw out_of_range and we could catch that. This at least eliminates everything else we could possibly catch here.

Suggested change
} catch (const std::exception &e) {
} catch (const std::out_of_range &e) {

Comment on lines +367 to +368
make_string("Failed to acquire the next batch. Make sure, that DALI Pipeline is fed "
"with sufficient amount of data. ", e.what()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have a name of the offending node here? If the node was populated with feed_input, we'll have the name of the very source the user failed to feed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usually the name is generated and doesn't tell much.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for external_source that's manually fed with feed_input, usually by name (like in Triton) - and for others it would at least be "ExternalSource_" and that index would most likely correspond to the ordinal of the external source in order of operator instantiation.

Signed-off-by: Michał Szołucha <mszolucha@nvidia.com>
Signed-off-by: Michał Szołucha <mszolucha@nvidia.com>
@szalpal
Copy link
Member Author

szalpal commented May 6, 2021

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [2340802]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [2340802]: BUILD FAILED

@szalpal
Copy link
Member Author

szalpal commented May 6, 2021

!build

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [2340851]: BUILD STARTED

@dali-automaton
Copy link
Collaborator

CI MESSAGE: [2340851]: BUILD PASSED

@szalpal szalpal merged commit c098946 into NVIDIA:master May 6, 2021
@szalpal szalpal deleted the better_err_msg branch February 9, 2024 00:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants