TF dataset tests rework #2539

awolant · 2020-12-10T16:25:34Z

Signed-off-by: Albert Wolant <awolant@nvidia.com>

Why do we need this PR?

Refactoring to improve TF dataset tests.

What happened in this PR?

What solution was applied:
Added multi-GPU tests using tf.distribute.MirroredStrategy, added eager-mode tests, and fixed the device mismatch error.
Affected modules and functionalities:
TF dataset tests
Key points relevant for the review:
Tests, docs.
Documentation (including examples):
Updated docs are part of this PR

JIRA TASK: DALI-1757

awolant · 2020-12-10T16:25:45Z

!build

dali-automaton · 2020-12-10T16:39:16Z

CI MESSAGE: [1884587]: BUILD STARTED

dali-automaton · 2020-12-10T18:16:59Z

CI MESSAGE: [1884587]: BUILD PASSED

awolant · 2020-12-10T21:35:54Z

!build

dali-automaton · 2020-12-10T21:51:16Z

CI MESSAGE: [1889921]: BUILD STARTED

dali-automaton · 2020-12-10T23:55:43Z

CI MESSAGE: [1889921]: BUILD PASSED

awolant · 2020-12-11T08:09:14Z

!build

dali-automaton · 2020-12-11T08:15:46Z

CI MESSAGE: [1892434]: BUILD STARTED

dali-automaton · 2020-12-11T09:43:33Z

CI MESSAGE: [1892434]: BUILD PASSED

dali-automaton · 2020-12-11T11:08:16Z

CI MESSAGE: [1892434]: BUILD PASSED

banasraf · 2020-12-11T15:44:51Z

docs/examples/frameworks/tensorflow/tensorflow-dataset-multigpu.ipynb

@@ -8,7 +8,7 @@
    "\n",


To make the training distributed to multiple GPUs, we use tf.distribute.MirroredStrategy. Comma?

banasraf · 2020-12-11T15:44:51Z

docs/examples/frameworks/tensorflow/tensorflow-dataset-multigpu.ipynb

@@ -8,7 +8,7 @@
    "\n",


everyrhing (typo: should be "everything")

banasraf · 2020-12-11T16:06:41Z

dali_tf_plugin/dali_dataset_op.cc

+            std::stringstream msg;
+            msg << "TF device and DALI device mismatch. TF device: ";
+            msg << (dataset()->device_type_ == device_type_t::CPU ? "CPU" : "GPU");
+            msg << ", DALI device: ";
+            msg << (dali_device_type == device_type_t::CPU ? "CPU" : "GPU");
+            msg << " for output " << i;


Use make_string? Not sure how it would affect readability, but it's more the DALI way of doing this.
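For comparison, the same message could be built in one call with a variadic helper in the make_string style. This is a minimal sketch. It assumes the helper streams each argument into a single std::ostringstream in order, as the stringstream code in the diff does. The names make_string_sketch, mismatch_message and the device_type_t stand-in are hypothetical and used only for illustration. They are not DALI's actual API.

```cpp
#include <sstream>
#include <string>

// Hypothetical stand-in for the device_type_t enum used in dali_dataset_op.cc.
enum class device_type_t { CPU, GPU };

// Minimal sketch of a make_string-style helper (assumption: every argument is
// streamed, in order, into one std::ostringstream).
template <typename... Args>
std::string make_string_sketch(const Args&... args) {
  std::ostringstream ss;
  // C++11 pack expansion: a braced list evaluates (ss << arg) left to right.
  int expand[] = {0, ((void)(ss << args), 0)...};
  (void)expand;
  return ss.str();
}

// The mismatch message from the diff, built in a single expression instead of
// six separate msg << ... statements. Parameter names are hypothetical.
std::string mismatch_message(device_type_t tf_device, device_type_t dali_device,
                             int output_idx) {
  return make_string_sketch(
      "TF device and DALI device mismatch. TF device: ",
      (tf_device == device_type_t::CPU ? "CPU" : "GPU"),
      ", DALI device: ",
      (dali_device == device_type_t::CPU ? "CPU" : "GPU"),
      " for output ", output_idx);
}
```

In C++17, the expansion line could instead be written as the fold expression (ss << ... << args);.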

jantonguirao · 2020-12-17T10:06:28Z

dali/test/python/test_dali_tf_dataset.py

-    dataset_pipeline = PythonOperatorPipeline()
+    pipeline = Pipeline(1, 1, 0, 0)
+    with pipeline:
+        output = fn.python_function(function=lambda: np.zeros((3, 3, 3)))


It's not clear to me what the expected error is. Maybe the name of the test case should say it.

PythonFunction is not allowed with TF for now. I changed the name of the test to include that information.

jantonguirao · 2020-12-17T10:11:10Z

dali/test/python/test_dali_tf_dataset_mnist.py

-        images = tf.reshape(images, [BATCH_SIZE, IMAGE_SIZE*IMAGE_SIZE])
-        labels = tf.reshape(
-            tf.one_hot(labels, NUM_CLASSES),
+        images = tf_v1.reshape(images, [BATCH_SIZE, IMAGE_SIZE*IMAGE_SIZE])


out of curiosity: Why do we need to reshape? Aren't the outputs of the dataset already shaped?

Good catch, there's no reason for this. Done.

jantonguirao · 2020-12-17T11:57:03Z

dali/test/python/test_utils_tensorflow.py

+    test_data_root = os.environ['DALI_EXTRA_PATH']
+    file_root = os.path.join(test_data_root, 'db', 'coco_dummy', 'images')
+    annotations_file = os.path.join(
+    test_data_root, 'db', 'coco_dummy', 'instances.json')


nitpick: indent

jantonguirao · 2020-12-17T12:00:35Z

dali_tf_plugin/dali_dataset_op.cc

+          if (dali_device_type != dataset()->device_type_) {
+            auto msg = dali::make_string(
+              "TF device and DALI device mismatch. TF device: ",
+              (dataset()->device_type_ == device_type_t::CPU ? "CPU" : "GPU"),


nitpick: Add std::string to_string(device_type_t type)

Can you explain what you mean here? I compare two device_type_t values and return a string from the ternary operator.
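One plausible reading of the suggestion is to move the repeated ternary into a named helper, so that each side of the message becomes to_string(type). This is an assumption, since the thread does not show the final code. Below is a sketch with a hypothetical device_type_t stand-in and a hypothetical mismatch_message function. The comparison itself stays the same; only the mapping from device type to name moves into the helper.

```cpp
#include <sstream>
#include <string>

// Hypothetical stand-in for DALI's device_type_t.
enum class device_type_t { CPU, GPU };

// The suggested helper: a single place that maps a device type to its name,
// instead of repeating (x == device_type_t::CPU ? "CPU" : "GPU") at each use.
std::string to_string(device_type_t type) {
  switch (type) {
    case device_type_t::CPU: return "CPU";
    case device_type_t::GPU: return "GPU";
  }
  return "<unknown>";  // not reached for valid enum values; avoids -Wreturn-type
}

// Call site (hypothetical signature): the mismatch check is unchanged, and the
// message simply asks each side for its name.
std::string mismatch_message(device_type_t tf_device, device_type_t dali_device,
                             int output_idx) {
  std::ostringstream msg;
  msg << "TF device and DALI device mismatch. TF device: " << to_string(tf_device)
      << ", DALI device: " << to_string(dali_device)
      << " for output " << output_idx;
  return msg.str();
}
```

A side benefit of the switch over the ternary is that compilers can warn when a new enumerator is added but not handled.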

jantonguirao · 2020-12-17T12:03:08Z

qa/TL1_tensorflow_dataset/test.sh

-        nosetests --verbose -s test_dali_tf_dataset.py:_test_tf_dataset_other_gpu
-        nosetests --verbose -s test_dali_tf_dataset.py:_test_tf_dataset_multigpu
-        nosetests --verbose -s test_dali_tf_dataset_mnist.py
+        nosetests --verbose -s test_dali_tf_dataset_graph.py:_test_tf_dataset_other_gpu


Out of curiosity, why do we need to list those tests explicitly?

I made all tests that need more than one GPU hidden (their names start with _), so they are not picked up in L0, where we run on a single-GPU machine. Here I run those tests explicitly.
The alternative is to split the tests into two files, but the end result is the same.

awolant · 2021-01-18T14:37:55Z

!build

dali-automaton · 2021-01-18T14:41:20Z

CI MESSAGE: [1991279]: BUILD STARTED

dali-automaton · 2021-01-18T16:57:21Z

CI MESSAGE: [1991279]: BUILD PASSED

* TF dataset tests rework for multi GPU dataset Signed-off-by: Albert Wolant <awolant@nvidia.com>

TF dataset tests rework

2012fc0

Fix import

7ce359f

Fix compatibility

bd0b0bc

awolant assigned jantonguirao and banasraf Dec 11, 2020

banasraf reviewed Dec 11, 2020

banasraf approved these changes Dec 11, 2020

Fix for review

d0f4481

jantonguirao reviewed Dec 17, 2020

jantonguirao approved these changes Dec 17, 2020

awolant added 2 commits January 18, 2021 14:40

Merge remote-tracking branch 'nvidia/master' into tf_data_multigpu

04587d3

Fix for review

1a9a530

awolant merged commit 9becd97 into NVIDIA:master Jan 18, 2021

TheTimmy pushed a commit to TheTimmy/DALI that referenced this pull request Jan 20, 2021

TF dataset tests rework (NVIDIA#2539)

9bd5e88
