
Python formatting #4035

Merged 11 commits into NVIDIA:main on Jul 7, 2022

Conversation

mzient
Contributor

@mzient mzient commented Jul 4, 2022

Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>

Category:

Refactoring
Formatting
Bug fix

Description:

  • Formatting
  • Minor refactoring to make code easier to read
  • Fixed minor bugs in segmentation pipeline test

Additional information:

Affected modules and functionalities:

Key points relevant for the review:

Tests:

  • Existing tests apply
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

Checklist

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: DALI-2824

Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
@mzient mzient marked this pull request as draft July 4, 2022 17:03
@mzient mzient changed the title random_object_bbox tests + test_utils Python formatting Jul 4, 2022
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
@mzient mzient marked this pull request as ready for review July 5, 2022 16:01
Comment on lines 171 to 178
# each GPU needs to iterate from `shard_id * data_size / num_gpus` samples
# to `(shard_id + 1)* data_size / num_gpus`
# after each epoch each GPU moves to the next shard
# epochs_run takes into account that after epoch readers advances to the next shard, if shuffle_after_epoch or stick_to_shard
# if doesn't matter and could/should be 0
# it is relevant only pad_last_batch == False, otherwise each shard has the same size thanks to padding
# epochs_run takes into account that after epoch readers advances to the
# next shard, if shuffle_after_epoch or stick_to_shard if doesn't matter
# and could/should be 0
# it is relevant only if pad_last_batch == False, otherwise each shard has
# the same size thanks to padding
Contributor Author
To someone knowledgeable: what does this mean? The grammar is quite broken and I wasn't able to fix it without the risk of changing the meaning.

Contributor
My understanding, based on the comment and the code (sketched below). Feel free to copy or reword it as you see fit:

  • There's a variable epochs_run that tells you the epoch count.
  • epochs_run is used to know which shard we are currently in, because we move to the next shard at the end of each epoch. Here's an example with 3 shards:
    Epoch 0: [shard0, shard1, shard2] # meaning pipelines are reading those shards
    Epoch 1: [shard1, shard2, shard0]
    Epoch 2: [shard2, shard0, shard1] # ... and so on
  • Such logic does not make sense when shuffle_after_epoch=True or stick_to_shard=True, therefore we just set epochs_run=0 so that it doesn't influence the shard_size calculation.
  • Also, when pad_last_batch=True, all shards have the same size thanks to padding, so this logic also becomes irrelevant.
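
Roughly, the shard-selection logic described above could be sketched like this (a minimal illustration only; the variable names follow the comment, but the exact expressions in the test may differ):

    def shard_bounds(shard_id, num_gpus, data_size, epochs_run,
                     stick_to_shard=False, shuffle_after_epoch=False):
        # With stick_to_shard or shuffle_after_epoch the reader does not rotate
        # through shards, so the epoch count must not shift the shard index.
        if stick_to_shard or shuffle_after_epoch:
            epochs_run = 0
        # After each epoch the reader advances to the next shard.
        shard = (shard_id + epochs_run) % num_gpus
        begin = shard * data_size // num_gpus
        end = (shard + 1) * data_size // num_gpus
        return begin, end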

@mzient
Contributor Author

mzient commented Jul 5, 2022

!build

@dali-automaton
Collaborator

CI MESSAGE: [5261344]: BUILD STARTED

Comment on lines 47 to 48
output_dtypes=(tf.int32),
output_shapes=[(1)])
Contributor Author
These aren't tuples - just values in parentheses. What was the intention?

Contributor
Looks like the author meant [(1,)] and (tf.int32,)

Contributor Author
@mzient mzient Jul 6, 2022
It doesn't work with the trailing commas. Fixing it (if necessary) is beyond the scope of this task.
I'll remove the misleading parentheses, though.
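
For context, a minimal illustration (not from the PR) of why a value in parentheses is not a tuple - only the trailing comma makes a tuple:

    a = (1)    # just the integer 1; the parentheses only group
    b = (1,)   # a one-element tuple
    print(type(a).__name__, type(b).__name__)  # prints: int tuple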

Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>

with assert_raises(
RuntimeError,
glob='"end", "rel_end", "shape", and "rel_shape" arguments are mutually exclusive'):
Contributor Author
Here I deliberately chose the other quotes to get rid of escape sequences.
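
For illustration (simplified, not the exact PR code), the same glob pattern written both ways:

    # Double quotes outside force escape sequences for the inner quotes:
    glob = "\"end\", \"rel_end\", \"shape\", and \"rel_shape\" arguments are mutually exclusive"
    # Single quotes outside need no escaping:
    glob = '"end", "rel_end", "shape", and "rel_shape" arguments are mutually exclusive'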

Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
np.testing.assert_allclose(expected_out_vertices_abs, out_vertices_abs, rtol=1e-4)

# Checking clamping of the relative coordinates
expected_out_vertices_clamped = np.copy(expected_out_vertices)
np.clip(expected_out_vertices_clamped, a_min=0.0, a_max=1.0)
Contributor Author
This is a bug - this line has no effect.
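
For context, a short sketch of the problem: np.clip returns a new array unless an output array is supplied, so calling it without using the result does nothing. A fix could look like either of the following (illustrative only, not necessarily the exact change made in the PR):

    import numpy as np

    expected = np.array([-0.2, 0.5, 1.3])

    np.clip(expected, a_min=0.0, a_max=1.0)                 # result discarded - no effect
    expected = np.clip(expected, a_min=0.0, a_max=1.0)      # assign the result, or
    np.clip(expected, a_min=0.0, a_max=1.0, out=expected)   # clip in place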

np.testing.assert_allclose(expected_out_vertices_abs, out_vertices_abs, rtol=1e-4)

# Checking clamping of the relative coordinates
expected_out_vertices_clamped = np.copy(expected_out_vertices)
np.clip(expected_out_vertices_clamped, a_min=0.0, a_max=1.0)
np.testing.assert_allclose(expected_out_vertices_clamped, out_vertices, rtol=1e-4)
Contributor Author
Comparison of the wrong value (out_vertices) against mistakenly unmodified expected values... It passed because the two errors cancelled each other out.

h, w, _ = image_shape
wh = np.array([w, h])
whwh = np.array([w, h, w, h])
expected_out_vertices_abs = expected_out_vertices * wh
Contributor Author
Refactored - broadcasting works on innermost dimensions, so we can just use this instead of running loops.
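
A minimal sketch of the broadcasting behavior being relied on (the shapes are illustrative, not the actual test data):

    import numpy as np

    h, w = 480, 640
    wh = np.array([w, h])

    # Relative (x, y) vertices for 2 polygons with 3 vertices each.
    vertices_rel = np.random.uniform(size=(2, 3, 2)).astype(np.float32)

    # Broadcasting aligns trailing dimensions, so (2, 3, 2) * (2,) scales
    # every (x, y) pair by (w, h) without an explicit loop.
    vertices_abs = vertices_rel * wh
    assert vertices_abs.shape == (2, 3, 2)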

@mzient
Contributor Author

mzient commented Jul 5, 2022

!build

@dali-automaton
Collaborator

CI MESSAGE: [5261750]: BUILD STARTED

Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
@mzient
Contributor Author

mzient commented Jul 5, 2022

!build

@dali-automaton
Collaborator

CI MESSAGE: [5261788]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [5261788]: BUILD PASSED

@szalpal szalpal self-assigned this Jul 5, 2022
@szalpal
Member

szalpal commented Jul 5, 2022

Triggered L1 tests

Member
@szalpal szalpal left a comment
First batch of remarks (13/20 files viewed)

Comment on lines 19 to 21
def make_batch_select_masks(batch_size,
npolygons_range=(1, 10), nvertices_range=(3, 40),
vertex_ndim=2, vertex_dtype=np.float32):
Member
The batch_size alone in the first line looks weird. How about this?

Suggested change
def make_batch_select_masks(batch_size,
                            npolygons_range=(1, 10), nvertices_range=(3, 40),
                            vertex_ndim=2, vertex_dtype=np.float32):
def make_batch_select_masks(batch_size,
                            npolygons_range=(1, 10),
                            nvertices_range=(3, 40),
                            vertex_ndim=2,
                            vertex_dtype=np.float32):

Comment on lines 60 to 63
get_pipeline_desc=external_source_tester(max_shape,
dtype,
RandomSampleIterator(max_shape, dtype(0), min_shape=min_shape),
iterator,
batch=batch),
Member
These would now fit on a single line, right?

Contributor Author
@mzient mzient Jul 6, 2022
Hmm... I guess it would. However, it would look inconsistent with other invocations of this function in the same file. If readability is the goal, I think a common pattern within this file should have a common look. I'm fine either way.

Comment on lines 88 to 92
get_pipeline_desc=external_source_tester(max_shape,
dtype,
RandomSampleIterator(max_shape, dtype(0), min_shape=min_shape),
iterator,
"gpu",
batch=batch),
Member
As above

Comment on lines 42 to 43
pipe.set_outputs(fn.external_source(lambda i: [cp.array([attempt * 100 + i * 10 + 1.5],
dtype=cp.float32)]))
Member
I believe breaking this lambda in the middle looks confusing. At first glance it looks like dtype is an argument to fn.external_source, not to cp.array.

Suggested change
pipe.set_outputs(fn.external_source(lambda i: [cp.array([attempt * 100 + i * 10 + 1.5],
dtype=cp.float32)]))
pipe.set_outputs(fn.external_source(
lambda i: [cp.array([attempt * 100 + i * 10 + 1.5], dtype=cp.float32)]
))

Contributor
Alternatively you could define the lambda in the line above

Contributor Author
Nice catch. I'll change the lambda to a local function.
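
A sketch of what that change could look like (assuming the surrounding names pipe, fn, cp and the loop variable attempt from the quoted test are in scope; the actual change in the PR may differ):

    def make_sample(i):
        # Named local function replacing the multi-line lambda above.
        return [cp.array([attempt * 100 + i * 10 + 1.5], dtype=cp.float32)]

    pipe.set_outputs(fn.external_source(make_sample))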

Comment on lines 26 to 29
def check_select_masks(batch_size,
npolygons_range=(1, 10), nvertices_range=(3, 40),
vertex_ndim=2, vertex_dtype=np.float32,
reindex_masks=False):
Member
As somewhere above

Suggested change
def check_select_masks(batch_size,
                       npolygons_range=(1, 10), nvertices_range=(3, 40),
                       vertex_ndim=2, vertex_dtype=np.float32,
                       reindex_masks=False):
def check_select_masks(batch_size,
                       npolygons_range=(1, 10),
                       nvertices_range=(3, 40),
                       vertex_ndim=2,
                       vertex_dtype=np.float32,
                       reindex_masks=False):

Comment on lines 37 to 40
source=get_data_source(batch_size, npolygons_range=npolygons_range,
nvertices_range=nvertices_range, vertex_ndim=vertex_ndim, vertex_dtype=vertex_dtype),
nvertices_range=nvertices_range,
vertex_ndim=vertex_ndim,
vertex_dtype=vertex_dtype),
Member
Could you also put npolygons_range on the next line for consistency? I'd create a suggestion, but GitHub doesn't allow suggestions on unchanged lines.

Contributor Author
You can always type a ```suggestion block by hand and it works.

Comment on lines 91 to 95
yield (check_select_masks,
batch_size,
npolygons_range, nvertices_range,
vertex_ndim, vertex_dtype,
reindex_masks)
Member
Similarly here, how about putting every arg on a new line?

Suggested change
yield (check_select_masks,
       batch_size,
       npolygons_range, nvertices_range,
       vertex_ndim, vertex_dtype,
       reindex_masks)
yield (check_select_masks,
       batch_size,
       npolygons_range,
       nvertices_range,
       vertex_ndim,
       vertex_dtype,
       reindex_masks)

Contributor Author
I'm fine either way.

Comment on lines +143 to +145
({"lo_0": 0, "at_0": 0}, "both as an index"),
({"at_0": 0, "step_0": 1}, "cannot have a step")
]:
Member
I believe this indentation may be shrunk

Suggested change
({"lo_0": 0, "at_0": 0}, "both as an index"),
({"at_0": 0, "step_0": 1}, "cannot have a step")
]:
({"lo_0": 0, "at_0": 0}, "both as an index"),
({"at_0": 0, "step_0": 1}, "cannot have a step")
]:

Contributor Author
It cannot - it must be indented more than the following line.
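
For illustration (simplified, not the actual test code): when a multi-line construct ends in a colon and opens a block, PEP 8-style checkers expect the continuation lines to be indented further than the block body, so that the two can be told apart:

    # Ambiguous: the continuation is indented like the body that follows.
    for args, msg in [
        ({"lo_0": 0, "at_0": 0}, "both as an index"),
        ({"at_0": 0, "step_0": 1}, "cannot have a step")
    ]:
        print(args, msg)

    # Clearer: continuation indented more than the following block.
    for args, msg in [
            ({"lo_0": 0, "at_0": 0}, "both as an index"),
            ({"at_0": 0, "step_0": 1}, "cannot have a step")
    ]:
        print(args, msg)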

Comment on lines +152 to +154
{"step_0": 2},
{"step_1": -1},
]:
Member
Similarly here

Suggested change
{"step_0": 2},
{"step_1": -1},
]:
{"step_0": 2},
{"step_1": -1},
]:

Contributor Author
Likewise.

@jantonguirao jantonguirao self-assigned this Jul 6, 2022
Comment on lines +240 to +241
for gpu in range(num_gpus)
]
Contributor
@jantonguirao jantonguirao Jul 6, 2022
Suggested change
for gpu in range(num_gpus)
]
for gpu in range(num_gpus)]

I think most pep8 autoformatters would stick to this.

Contributor Author
@mzient mzient Jul 6, 2022
They wouldn't if the list contents start on a new line.
The other option would be:

    pipes = [COCOReaderPipeline(batch_size=batch_size,
                                num_threads=4,
                                shard_id=gpu,
                                num_gpus=num_gpus,
                                data_paths=datasets[0],
                                random_shuffle=False,
                                stick_to_shard=False,
                                shuffle_after_epoch=True,
                                pad_last_batch=False)
             for gpu in range(num_gpus)]

but it's indented more and the brackets are not quite as pronounced as I think they should be (it's not common to create a list of pipelines like this, so I wanted it to stand out)

Contributor
@jantonguirao jantonguirao left a comment
LGTM, with comments

@@ -99,7 +108,7 @@ def _test_select_masks_wrong_input(data_source_fn, err_regex):
p = wrong_input_pipe(data_source_fn=data_source_fn)
p.build()
with assert_raises(RuntimeError, regex=err_regex):
o = p.run()
_ = p.run()
Contributor
p.run() should be enough. Or am I missing something?

Contributor Author
I guess this is about readability - we say that there is a result, and we deliberately ignore it. It does make a difference when run interactively (a bare p.run() would print the result).

Comment on lines +442 to +443
((1, 0), None), ((0, 1), None)
]:
Contributor
Suggested change
((1, 0), None), ((0, 1), None)
]:
((1, 0), None), ((0, 1), None)]:

Would be preferred, I think

Contributor Author
Again - not when there's a new line after the opening bracket.

Comment on lines 185 to 186
seq = torch.arange(max_len, dtype=seq_len.dtype, device=x.device)
mask = seq.expand(x.size(0), max_len) >= seq_len.unsqueeze(1)
Contributor
It's mostly OK, but I would have preferred not refactoring this code, since it's taken from "the reference" just to check that our suggested DALI pipeline does the same thing.

Contributor Author
I've run this test and it still works...
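
For reference, a small standalone illustration of the masking pattern in the quoted lines (the lengths and sizes here are made up; the test uses real data):

    import torch

    seq_len = torch.tensor([2, 5, 3])   # lengths of 3 padded sequences
    max_len = 5
    seq = torch.arange(max_len, dtype=seq_len.dtype)
    # True wherever a position lies past the end of its sequence (padding).
    mask = seq.expand(seq_len.size(0), max_len) >= seq_len.unsqueeze(1)
    # tensor([[False, False,  True,  True,  True],
    #         [False, False, False, False, False],
    #         [False, False, False,  True,  True]])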

Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
@mzient
Contributor Author

mzient commented Jul 6, 2022

!build

@dali-automaton
Collaborator

CI MESSAGE: [5268841]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [5268841]: BUILD PASSED

Member
@szalpal szalpal left a comment
A few comments; all files reviewed. Please make sure that the L1 tests pass.

Comment on lines 74 to 75
super().__init__(
batch_size, num_threads, device_id, seed=1234)
Member
Suggested change
super().__init__(
batch_size, num_threads, device_id, seed=1234)
super().__init__(batch_size, num_threads, device_id, seed=1234)

Comment on lines 121 to 122
super().__init__(
batch_size, num_threads, device_id, seed=1234)
Member
Suggested change
super().__init__(
batch_size, num_threads, device_id, seed=1234)
super().__init__(batch_size, num_threads, device_id, seed=1234)

Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
@dali-automaton
Collaborator

CI MESSAGE: [5270468]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [5270468]: BUILD FAILED

@dali-automaton
Collaborator

CI MESSAGE: [5270468]: BUILD PASSED

@mzient mzient merged commit dd1bc13 into NVIDIA:main Jul 7, 2022