TF dataset tests rework #2539
Conversation
Signed-off-by: Albert Wolant <awolant@nvidia.com>
Check out this pull request on ReviewNB: see visual diffs & provide feedback on Jupyter Notebooks.
!build
CI MESSAGE: [1884587]: BUILD STARTED
CI MESSAGE: [1884587]: BUILD PASSED
Signed-off-by: Albert Wolant <awolant@nvidia.com>
!build
CI MESSAGE: [1889921]: BUILD STARTED
CI MESSAGE: [1889921]: BUILD PASSED
Signed-off-by: Albert Wolant <awolant@nvidia.com>
!build
CI MESSAGE: [1892434]: BUILD STARTED
CI MESSAGE: [1892434]: BUILD PASSED
@@ -8,7 +8,7 @@
"\n",
To make the training distributed to multiple GPUs, we use tf.distribute.MirroredStrategy.
Comma?
Done
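For context, a minimal sketch of the pattern the notebook sentence describes: variables are created under a MirroredStrategy scope so they are replicated across visible GPUs. The model and compile arguments here are placeholders, not the notebook's actual code:

```python
import tensorflow as tf

# MirroredStrategy replicates variables across all visible GPUs;
# with no GPU available it falls back to a single CPU replica.
strategy = tf.distribute.MirroredStrategy()
print("Replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    # Anything that creates variables (the model, the optimizer)
    # must live inside the scope to be mirrored.
    model = tf.keras.Sequential([tf.keras.layers.Dense(10, activation="softmax")])
    model.compile(optimizer="sgd", loss="sparse_categorical_crossentropy")
```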
@@ -8,7 +8,7 @@
"\n",
Done
dali_tf_plugin/dali_dataset_op.cc
Outdated
std::stringstream msg;
msg << "TF device and DALI device mismatch. TF device: ";
msg << (dataset()->device_type_ == device_type_t::CPU ? "CPU" : "GPU");
msg << ", DALI device: ";
msg << (dali_device_type == device_type_t::CPU ? "CPU" : "GPU");
msg << " for output " << i;
Use make_string? Not sure how this would affect readability, but that's more the DALI way of doing this.
Done
Signed-off-by: Albert Wolant <awolant@nvidia.com>
dataset_pipeline = PythonOperatorPipeline()
pipeline = Pipeline(1, 1, 0, 0)
with pipeline:
    output = fn.python_function(function=lambda: np.zeros((3, 3, 3)))
It's not clear to me what the expected error is. Maybe the name of the test case should say it.
PythonFunction is not allowed with TF for now. Changed the name of the test to include that information.
images = tf.reshape(images, [BATCH_SIZE, IMAGE_SIZE*IMAGE_SIZE])
labels = tf.reshape(
    tf.one_hot(labels, NUM_CLASSES),
images = tf_v1.reshape(images, [BATCH_SIZE, IMAGE_SIZE*IMAGE_SIZE])
out of curiosity: Why do we need to reshape? Aren't the outputs of the dataset already shaped?
Good catch, no reason for this. Done
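For reference, what the removed reshape did can be shown with a NumPy sketch (the sizes below are illustrative, not the test's actual constants):

```python
import numpy as np

BATCH_SIZE, IMAGE_SIZE, NUM_CLASSES = 4, 28, 10

# A dataset batch already shaped [BATCH_SIZE, IMAGE_SIZE, IMAGE_SIZE]...
images = np.zeros((BATCH_SIZE, IMAGE_SIZE, IMAGE_SIZE), dtype=np.float32)
# ...flattened to [BATCH_SIZE, IMAGE_SIZE * IMAGE_SIZE] by the reshape
flat = images.reshape(BATCH_SIZE, IMAGE_SIZE * IMAGE_SIZE)

# one_hot turns integer labels into [BATCH_SIZE, NUM_CLASSES] rows
labels = np.array([1, 3, 5, 7])
one_hot = np.eye(NUM_CLASSES, dtype=np.float32)[labels]

print(flat.shape, one_hot.shape)  # (4, 784) (4, 10)
```

If the dataset outputs are already shaped this way, the reshape is a no-op and can be dropped, which is what happened here.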
test_data_root = os.environ['DALI_EXTRA_PATH']
file_root = os.path.join(test_data_root, 'db', 'coco_dummy', 'images')
annotations_file = os.path.join(
    test_data_root, 'db', 'coco_dummy', 'instances.json')
nitpick: indent
Done
if (dali_device_type != dataset()->device_type_) {
  auto msg = dali::make_string(
      "TF device and DALI device mismatch. TF device: ",
      (dataset()->device_type_ == device_type_t::CPU ? "CPU" : "GPU"),
nitpick: Add std::string to_string(device_type_t type)
Can you explain what you mean here? I compare two device_type_t values and return a string from the ternary operator.
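The suggestion reads as: factor the repeated ternary into one small helper so the device-name mapping lives in a single place. A Python sketch of the idea (the DeviceType enum and message text here are illustrative, not DALI's actual types):

```python
from enum import Enum

class DeviceType(Enum):
    CPU = 0
    GPU = 1

def to_string(device: DeviceType) -> str:
    # Single place that maps a device type to its display name,
    # instead of repeating the ternary at every call site.
    return "CPU" if device is DeviceType.CPU else "GPU"

msg = ("TF device and DALI device mismatch. TF device: "
       f"{to_string(DeviceType.CPU)}, DALI device: {to_string(DeviceType.GPU)}"
       " for output 0")
print(msg)
```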
nosetests --verbose -s test_dali_tf_dataset.py:_test_tf_dataset_other_gpu
nosetests --verbose -s test_dali_tf_dataset.py:_test_tf_dataset_multigpu
nosetests --verbose -s test_dali_tf_dataset_mnist.py
nosetests --verbose -s test_dali_tf_dataset_graph.py:_test_tf_dataset_other_gpu
Out of curiosity, why do we need to list those tests explicitly?
I made all tests that need more GPUs hidden (the name starts with _), so they don't trigger in L0, where we run on a single-GPU machine. Here I run those tests manually.
An alternative is to create two files with tests, but the end result is the same.
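A sketch of that selection convention (a toy collector, not nose's actual selector logic):

```python
def collect_tests(names, explicit=None):
    """Toy version of the selection rule described above."""
    if explicit is not None:
        # Explicitly requested tests run even if underscore-prefixed,
        # mirroring `nosetests module.py:_test_name`.
        return [n for n in names if n == explicit]
    # Default discovery skips names that don't look like tests,
    # including underscore-prefixed ("hidden") ones.
    return [n for n in names if n.startswith("test")]

names = ["test_tf_dataset", "_test_tf_dataset_multigpu"]
print(collect_tests(names))                                    # default (L0) run
print(collect_tests(names, explicit="_test_tf_dataset_multigpu"))  # manual multi-GPU run
```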
Signed-off-by: Albert Wolant <awolant@nvidia.com>
Signed-off-by: Albert Wolant <awolant@nvidia.com>
!build
CI MESSAGE: [1991279]: BUILD STARTED
CI MESSAGE: [1991279]: BUILD PASSED
* TF dataset tests rework for multi GPU dataset

Signed-off-by: Albert Wolant <awolant@nvidia.com>

Why we need this PR?
Rework of the TF dataset tests.

What happened in this PR?
Added tests for multi GPU with mirrored strategy, added eager mode tests, fixed device mismatch error.
Affected: TF dataset tests.
Tests, docs: updated docs are part of this PR.

JIRA TASK: DALI-1757