
Python formatting #4035

Merged 11 commits into NVIDIA:main on Jul 7, 2022

Conversation

mzient
Contributor

@mzient mzient commented Jul 4, 2022

Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>

Category:

Refactoring
Formatting
Bug fix

Description:

  • Formatting
  • Minor refactoring to make code easier to read
  • Fixed minor bugs in segmentation pipeline test

Additional information:

Affected modules and functionalities:

Key points relevant for the review:

Tests:

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

Checklist

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: DALI-2824

Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
@mzient mzient marked this pull request as draft July 4, 2022 17:03
@mzient mzient changed the title random_object_bbox tests + test_utils Python formatting Jul 4, 2022
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
@mzient mzient marked this pull request as ready for review July 5, 2022 16:01
Comment on lines 171 to 178
# each GPU needs to iterate from `shard_id * data_size / num_gpus` samples
# to `(shard_id + 1)* data_size / num_gpus`
# after each epoch each GPU moves to the next shard
# epochs_run takes into account that after epoch readers advances to the next shard, if shuffle_after_epoch or stick_to_shard
# if doesn't matter and could/should be 0
# it is relevant only pad_last_batch == False, otherwise each shard has the same size thanks to padding
# epochs_run takes into account that after epoch readers advances to the
# next shard, if shuffle_after_epoch or stick_to_shard if doesn't matter
# and could/should be 0
# it is relevant only if pad_last_batch == False, otherwise each shard has
# the same size thanks to padding
Contributor Author
To someone knowledgeable: what does this mean? The grammar is quite broken and I wasn't able to fix it without the risk of changing the meaning.

Contributor
My understanding, based on the comment and the code (sketched below). Feel free to copy or reword it as you see fit:

  • There's a variable epochs_run that tells you the epoch count.
  • epochs_run is used to know which shard we are currently in, because we move to the next shard at the end of each epoch. Here's an example with 3 shards:
    Epoch 0: [shard0, shard1, shard2] # meaning pipelines are reading those shards
    Epoch 1: [shard1, shard2, shard0]
    Epoch 2: [shard2, shard0, shard1] # ... and so on
  • Such logic does not make sense when shuffle_after_epoch=True or stick_to_shard=True, therefore we just set epochs_run=0 so that it doesn't influence the shard_size calculation.
  • Also, when pad_last_batch=True, all shards have the same size thanks to padding, so this logic also becomes irrelevant.
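
Roughly, the shard-selection logic described above could be sketched like this (a minimal illustration only; the variable names follow the comment, but the exact expressions in the test may differ):

    def shard_bounds(shard_id, num_gpus, data_size, epochs_run,
                     stick_to_shard=False, shuffle_after_epoch=False):
        # With stick_to_shard or shuffle_after_epoch the reader does not rotate
        # through shards, so the epoch count must not shift the shard index.
        if stick_to_shard or shuffle_after_epoch:
            epochs_run = 0
        # After each epoch the reader advances to the next shard.
        shard = (shard_id + epochs_run) % num_gpus
        begin = shard * data_size // num_gpus
        end = (shard + 1) * data_size // num_gpus
        return begin, end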

@mzient
Contributor Author

mzient commented Jul 5, 2022

!build

@dali-automaton
Collaborator

CI MESSAGE: [5261344]: BUILD STARTED

Comment on lines 47 to 48
output_dtypes=(tf.int32),
output_shapes=[(1)])
Contributor Author
These aren't tuples - just values in parentheses. What was the intention?

Contributor
Looks like the author meant [(1,)] and (tf.int32,)

Contributor Author
@mzient mzient Jul 6, 2022
It doesn't work with the trailing commas. Fixing it (if necessary) is beyond the scope of this task.
I'll remove the misleading parentheses, though.
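
For context, a minimal illustration (not from the PR) of why a value in parentheses is not a tuple - only the trailing comma makes a tuple:

    a = (1)    # just the integer 1; the parentheses only group
    b = (1,)   # a one-element tuple
    print(type(a).__name__, type(b).__name__)  # prints: int tuple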

Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>

with assert_raises(
RuntimeError,
glob='"end", "rel_end", "shape", and "rel_shape" arguments are mutually exclusive'):
Contributor Author
Here I deliberately chose the other quotes to get rid of escape sequences.
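
For illustration (simplified, not the exact PR code), the same glob pattern written both ways:

    # Double quotes outside force escape sequences for the inner quotes:
    glob = "\"end\", \"rel_end\", \"shape\", and \"rel_shape\" arguments are mutually exclusive"
    # Single quotes outside need no escaping:
    glob = '"end", "rel_end", "shape", and "rel_shape" arguments are mutually exclusive'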

Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
np.testing.assert_allclose(expected_out_vertices_abs, out_vertices_abs, rtol=1e-4)

# Checking clamping of the relative coordinates
expected_out_vertices_clamped = np.copy(expected_out_vertices)
np.clip(expected_out_vertices_clamped, a_min=0.0, a_max=1.0)
Contributor Author
This is a bug - this line has no effect.
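
For context, a short sketch of the problem: np.clip returns a new array unless an output array is supplied, so calling it without using the result does nothing. A fix could look like either of the following (illustrative only, not necessarily the exact change made in the PR):

    import numpy as np

    expected = np.array([-0.2, 0.5, 1.3])

    np.clip(expected, a_min=0.0, a_max=1.0)                 # result discarded - no effect
    expected = np.clip(expected, a_min=0.0, a_max=1.0)      # assign the result, or
    np.clip(expected, a_min=0.0, a_max=1.0, out=expected)   # clip in place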

np.testing.assert_allclose(expected_out_vertices_abs, out_vertices_abs, rtol=1e-4)

# Checking clamping of the relative coordinates
expected_out_vertices_clamped = np.copy(expected_out_vertices)
np.clip(expected_out_vertices_clamped, a_min=0.0, a_max=1.0)
np.testing.assert_allclose(expected_out_vertices_clamped, out_vertices, rtol=1e-4)
Contributor Author
Comparison of the wrong value (out_vertices) against mistakenly unmodified expected values... It passed because the two errors cancelled each other out.

h, w, _ = image_shape
wh = np.array([w, h])
whwh = np.array([w, h, w, h])
expected_out_vertices_abs = expected_out_vertices * wh
Contributor Author
Refactored - broadcasting works on innermost dimensions, so we can just use this instead of running loops.
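
A minimal sketch of the broadcasting behavior being relied on (the shapes are illustrative, not the actual test data):

    import numpy as np

    h, w = 480, 640
    wh = np.array([w, h])

    # Relative (x, y) vertices for 2 polygons with 3 vertices each.
    vertices_rel = np.random.uniform(size=(2, 3, 2)).astype(np.float32)

    # Broadcasting aligns trailing dimensions, so (2, 3, 2) * (2,) scales
    # every (x, y) pair by (w, h) without an explicit loop.
    vertices_abs = vertices_rel * wh
    assert vertices_abs.shape == (2, 3, 2)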

@mzient
Contributor Author

mzient commented Jul 5, 2022

!build

@dali-automaton
Collaborator

CI MESSAGE: [5261750]: BUILD STARTED

Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
@mzient
Contributor Author

mzient commented Jul 5, 2022

!build

@dali-automaton
Collaborator

CI MESSAGE: [5261788]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [5261788]: BUILD PASSED

@szalpal szalpal self-assigned this Jul 5, 2022
@szalpal
Member

szalpal commented Jul 5, 2022

Triggered L1 tests

Member
@szalpal szalpal left a comment
First batch of remarks (13/20 files viewed)

Comment on lines 19 to 21
def make_batch_select_masks(batch_size,
npolygons_range=(1, 10), nvertices_range=(3, 40),
vertex_ndim=2, vertex_dtype=np.float32):
Member
The batch_size alone in the first line looks weird. How about this?

Suggested change
def make_batch_select_masks(batch_size,
                            npolygons_range=(1, 10), nvertices_range=(3, 40),
                            vertex_ndim=2, vertex_dtype=np.float32):
def make_batch_select_masks(batch_size,
                            npolygons_range=(1, 10),
                            nvertices_range=(3, 40),
                            vertex_ndim=2,
                            vertex_dtype=np.float32):

Comment on lines 60 to 63
get_pipeline_desc=external_source_tester(max_shape,
dtype,
RandomSampleIterator(max_shape, dtype(0), min_shape=min_shape),
iterator,
batch=batch),
Member
These would now fit on a single line, right?

Contributor Author
@mzient mzient Jul 6, 2022
Hmm... I guess it would. However, it would look inconsistent with other invocations of this function in the same file. If readability is the goal, I think a common pattern within this file should have a common look. I'm fine either way.

Comment on lines 88 to 92
get_pipeline_desc=external_source_tester(max_shape,
dtype,
RandomSampleIterator(max_shape, dtype(0), min_shape=min_shape),
iterator,
"gpu",
batch=batch),
Member
As above

Comment on lines 42 to 43
pipe.set_outputs(fn.external_source(lambda i: [cp.array([attempt * 100 + i * 10 + 1.5],
dtype=cp.float32)]))
Member
I believe breaking this lambda in the middle looks confusing. At first glance it looks like dtype is an argument to fn.external_source, not to cp.array.

Suggested change
pipe.set_outputs(fn.external_source(lambda i: [cp.array([attempt * 100 + i * 10 + 1.5],
dtype=cp.float32)]))
pipe.set_outputs(fn.external_source(
lambda i: [cp.array([attempt * 100 + i * 10 + 1.5], dtype=cp.float32)]
))

Contributor
Alternatively you could define the lambda in the line above

Contributor Author
Nice catch. I'll change the lambda to a local function.
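
A sketch of what that change could look like (assuming the surrounding names pipe, fn, cp and the loop variable attempt from the quoted test are in scope; the actual change in the PR may differ):

    def make_sample(i):
        # Named local function replacing the multi-line lambda above.
        return [cp.array([attempt * 100 + i * 10 + 1.5], dtype=cp.float32)]

    pipe.set_outputs(fn.external_source(make_sample))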

Comment on lines 26 to 29
def check_select_masks(batch_size,
npolygons_range=(1, 10), nvertices_range=(3, 40),
vertex_ndim=2, vertex_dtype=np.float32,
reindex_masks=False):
Member
As somewhere above

Suggested change
def check_select_masks(batch_size,
                       npolygons_range=(1, 10), nvertices_range=(3, 40),
                       vertex_ndim=2, vertex_dtype=np.float32,
                       reindex_masks=False):
def check_select_masks(batch_size,
                       npolygons_range=(1, 10),
                       nvertices_range=(3, 40),
                       vertex_ndim=2,
                       vertex_dtype=np.float32,
                       reindex_masks=False):

Comment on lines 37 to 40
source=get_data_source(batch_size, npolygons_range=npolygons_range,
nvertices_range=nvertices_range, vertex_ndim=vertex_ndim, vertex_dtype=vertex_dtype),
nvertices_range=nvertices_range,
vertex_ndim=vertex_ndim,
vertex_dtype=vertex_dtype),
Member
Could you also put npolygons_range on the next line for consistency? I'd create a suggestion, but GitHub doesn't allow suggestions on unchanged lines.

Contributor Author
You can always type a ```suggestion block by hand and it works.

Comment on lines 91 to 95
yield (check_select_masks,
batch_size,
npolygons_range, nvertices_range,
vertex_ndim, vertex_dtype,
reindex_masks)
Member
Similarly here, how about putting every arg on a new line?

Suggested change
yield (check_select_masks,
       batch_size,
       npolygons_range, nvertices_range,
       vertex_ndim, vertex_dtype,
       reindex_masks)
yield (check_select_masks,
       batch_size,
       npolygons_range,
       nvertices_range,
       vertex_ndim,
       vertex_dtype,
       reindex_masks)

Contributor Author
I'm fine either way.

Comment on lines +143 to +145
({"lo_0": 0, "at_0": 0}, "both as an index"),
({"at_0": 0, "step_0": 1}, "cannot have a step")
]:
Member
I believe this indentation may be shrunk

Suggested change
({"lo_0": 0, "at_0": 0}, "both as an index"),
({"at_0": 0, "step_0": 1}, "cannot have a step")
]:
({"lo_0": 0, "at_0": 0}, "both as an index"),
({"at_0": 0, "step_0": 1}, "cannot have a step")
]:

Contributor Author
It cannot - it must be indented more than the following line.
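
For illustration (simplified, not the actual test code): when a multi-line construct ends in a colon and opens a block, PEP 8-style checkers expect the continuation lines to be indented further than the block body, so that the two can be told apart:

    # Ambiguous: the continuation is indented like the body that follows.
    for args, msg in [
        ({"lo_0": 0, "at_0": 0}, "both as an index"),
        ({"at_0": 0, "step_0": 1}, "cannot have a step")
    ]:
        print(args, msg)

    # Clearer: continuation indented more than the following block.
    for args, msg in [
            ({"lo_0": 0, "at_0": 0}, "both as an index"),
            ({"at_0": 0, "step_0": 1}, "cannot have a step")
    ]:
        print(args, msg)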

Comment on lines +152 to +154
{"step_0": 2},
{"step_1": -1},
]:
Member
Similarly here

Suggested change
{"step_0": 2},
{"step_1": -1},
]:
{"step_0": 2},
{"step_1": -1},
]:

Contributor Author
Likewise.

@jantonguirao jantonguirao self-assigned this Jul 6, 2022
Comment on lines +240 to +241
for gpu in range(num_gpus)
]
Contributor
@jantonguirao jantonguirao Jul 6, 2022
Suggested change
for gpu in range(num_gpus)
]
for gpu in range(num_gpus)]

I think most pep8 autoformatters would stick to this.

Contributor Author
@mzient mzient Jul 6, 2022
They wouldn't if the list contents start on a new line.
The other option would be:

    pipes = [COCOReaderPipeline(batch_size=batch_size,
                                num_threads=4,
                                shard_id=gpu,
                                num_gpus=num_gpus,
                                data_paths=datasets[0],
                                random_shuffle=False,
                                stick_to_shard=False,
                                shuffle_after_epoch=True,
                                pad_last_batch=False)
             for gpu in range(num_gpus)]

but it's indented more and the brackets are not quite as pronounced as I think they should be (it's not common to create a list of pipelines like this, so I wanted it to stand out)

Contributor
@jantonguirao jantonguirao left a comment
LGTM, with comments

@@ -99,7 +108,7 @@ def _test_select_masks_wrong_input(data_source_fn, err_regex):
p = wrong_input_pipe(data_source_fn=data_source_fn)
p.build()
with assert_raises(RuntimeError, regex=err_regex):
o = p.run()
_ = p.run()
Contributor
p.run() should be enough. Or am I missing something?

Contributor Author
I guess this is about readability - we say that there is a result, and we deliberately ignore it. It does make a difference when run interactively (a bare p.run() would print the result).

Comment on lines +442 to +443
((1, 0), None), ((0, 1), None)
]:
Contributor
Suggested change
((1, 0), None), ((0, 1), None)
]:
((1, 0), None), ((0, 1), None)]:

Would be preferred, I think

Contributor Author
Again - not when there's a new line after the opening bracket.

Comment on lines 185 to 186
seq = torch.arange(max_len, dtype=seq_len.dtype, device=x.device)
mask = seq.expand(x.size(0), max_len) >= seq_len.unsqueeze(1)
Contributor
It's mostly OK, but I would have preferred not refactoring this code, since it's taken from "the reference" just to check that our suggested DALI pipeline does the same thing.

Contributor Author
I've run this test and it still works...
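
For reference, a small standalone illustration of the masking pattern in the quoted lines (the lengths and sizes here are made up; the test uses real data):

    import torch

    seq_len = torch.tensor([2, 5, 3])   # lengths of 3 padded sequences
    max_len = 5
    seq = torch.arange(max_len, dtype=seq_len.dtype)
    # True wherever a position lies past the end of its sequence (padding).
    mask = seq.expand(seq_len.size(0), max_len) >= seq_len.unsqueeze(1)
    # tensor([[False, False,  True,  True,  True],
    #         [False, False, False, False, False],
    #         [False, False, False,  True,  True]])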

Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
@mzient
Contributor Author

mzient commented Jul 6, 2022

!build

@dali-automaton
Collaborator

CI MESSAGE: [5268841]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [5268841]: BUILD PASSED

Member
@szalpal szalpal left a comment
A few comments; all files reviewed. Please make sure that the L1 tests pass.

Comment on lines 74 to 75
super().__init__(
batch_size, num_threads, device_id, seed=1234)
Member
Suggested change
super().__init__(
batch_size, num_threads, device_id, seed=1234)
super().__init__(batch_size, num_threads, device_id, seed=1234)

Comment on lines 121 to 122
super().__init__(
batch_size, num_threads, device_id, seed=1234)
Member
Suggested change
super().__init__(
batch_size, num_threads, device_id, seed=1234)
super().__init__(batch_size, num_threads, device_id, seed=1234)

Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
@dali-automaton
Collaborator

CI MESSAGE: [5270468]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [5270468]: BUILD FAILED

@dali-automaton
Collaborator

CI MESSAGE: [5270468]: BUILD PASSED

@mzient mzient merged commit dd1bc13 into NVIDIA:main Jul 7, 2022