-
Notifications
You must be signed in to change notification settings - Fork 611
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix handling of non_existing entries in TFRecord reader #2952
Conversation
!build |
CI MESSAGE: [2364396]: BUILD STARTED |
- DALI assumes that all features provided in the TFRecord reader exists in every entry, otherwise protobuf will throw. This PR lifts that assumption makes the reader return empty tensor when the desired entry has not been found. It should be in pair what TensorFlow reader does. Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
!build |
CI MESSAGE: [2364415]: BUILD STARTED |
CI MESSAGE: [2364415]: BUILD PASSED |
(void) output.mutable_data<int64_t>(); | ||
break; | ||
case FeatureType::string: | ||
(void) output.mutable_data<uint8_t>(); | ||
break; | ||
case FeatureType::float32: | ||
(void) output.mutable_data<float>(); | ||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Setting the type as a side effect of calling mutable_data
is ugly.
(void) output.mutable_data<int64_t>(); | |
break; | |
case FeatureType::string: | |
(void) output.mutable_data<uint8_t>(); | |
break; | |
case FeatureType::float32: | |
(void) output.mutable_data<float>(); | |
break; | |
output.Resize({0}, TypeTable::GetTypeInfo(DALI_INT64)); | |
break; | |
case FeatureType::string: | |
output.Resize({0}, TypeTable::GetTypeInfo(DALI_UINT8)); | |
break; | |
case FeatureType::float32: | |
output.Resize({0}, TypeTable::GetTypeInfo(DALI_FLOAT)); | |
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
for tensor in out[0]: | ||
assert len(np.array(tensor)) != 0 | ||
for tensor in out[1]: | ||
assert len(np.array(tensor)) == 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert len(np.array(tensor)) == 0 | |
assert len(np.array(tensor)) == 0 | |
assert np.dtype == np.int64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
for tensor in out[1]: | ||
assert len(np.array(tensor)) == 0 | ||
for tensor in out[2]: | ||
assert len(np.array(tensor)) == 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
assert len(np.array(tensor)) == 0 | |
assert len(np.array(tensor)) == 0 | |
assert np.dtype == np.uint8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
!build |
CI MESSAGE: [2366185]: BUILD STARTED |
CI MESSAGE: [2366185]: BUILD PASSED |
@Riyria-was-taken - this should fix the issue with the TFRecord you mentioned some time ago. |
exists in every entry, otherwise protobuf will throw.
This PR lifts that assumption makes the reader return
empty tensor when the desired entry has not been found.
It should be in pair what TensorFlow reader does.
Signed-off-by: Janusz Lisiecki jlisiecki@nvidia.com
Why we need this PR?
Pick one, remove the rest
What happened in this PR?
Fill relevant points, put NA otherwise. Replace anything inside []
DALI assumes that all features provided in the TFRecord reader exists in every entry, otherwise protobuf will throw.
This PR lifts that assumption makes the reader return empty tensor when the desired entry has not been found.
It should be in pair what TensorFlow reader does.
tfrecord_parser.h
should we return empty tensor and/or warn/enforce that all entries exits
test is added
TFRecord Reader documentation is updated
JIRA TASK: [NA]