Python formatting #4035
Conversation
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
# each GPU needs to iterate from `shard_id * data_size / num_gpus` samples
# to `(shard_id + 1) * data_size / num_gpus`
# after each epoch each GPU moves to the next shard
# epochs_run takes into account that after epoch readers advances to the
# next shard, if shuffle_after_epoch or stick_to_shard if doesn't matter
# and could/should be 0
# it is relevant only if pad_last_batch == False, otherwise each shard has
# the same size thanks to padding
To someone knowledgeable: what does this mean? The grammar is quite broken and I wasn't able to fix it without the risk of changing the meaning.
My understanding from the comment and the code. Feel free to copy it or reword it as you see fit:
- There's a variable epochs_run that tells you the epoch count.
- epochs_run is used to know which shard we are currently in, because we move to the next shard at the end of each epoch. Here's an example with 3 shards:
  Epoch 0: [shard0, shard1, shard2]  # meaning pipelines are reading those shards
  Epoch 1: [shard1, shard2, shard0]
  Epoch 2: [shard2, shard0, shard1]  # ... and so on
- Such logic does not make sense when shuffle_after_epoch=True or stick_to_shard=True, therefore we just set epochs_run=0 so that it doesn't influence the shard_size calculation.
- Also, when pad_last_batch=True, all shards are the same size thanks to padding, so this logic also becomes irrelevant.
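To make the rotation concrete, here is a minimal sketch of the logic described above (a hypothetical helper, not the actual DALI implementation):

```python
# Hypothetical sketch of the shard rotation described above;
# not the actual DALI code.
def current_shard(shard_id, num_shards, epochs_run,
                  stick_to_shard=False, shuffle_after_epoch=False):
    # With stick_to_shard or shuffle_after_epoch the rotation is disabled,
    # which is equivalent to treating epochs_run as 0.
    if stick_to_shard or shuffle_after_epoch:
        epochs_run = 0
    return (shard_id + epochs_run) % num_shards

# Epoch 0: GPUs 0..2 read shards [0, 1, 2];
# in epoch 1 they read [1, 2, 0], and so on.
print([current_shard(gpu, 3, 1) for gpu in range(3)])  # [1, 2, 0]
```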
!build
CI MESSAGE: [5261344]: BUILD STARTED
output_dtypes=(tf.int32),
output_shapes=[(1)])
These aren't tuples - just values in parentheses. What was the intention?
Looks like the author meant [(1,)] and (tf.int32,).
It doesn't work with the trailing commas. Fixing it (if necessary) is beyond the scope of this task.
I'll remove the misleading parentheses, though.
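For context, a quick illustration of why parentheses alone don't create tuples (plain Python, independent of the TF API):

```python
# Parentheses without a trailing comma are just grouping:
print(type((1)))     # <class 'int'>   - same as plain 1
print(type((1,)))    # <class 'tuple'> - a one-element tuple
print([(1)] == [1])  # True - [(1)] is a list containing the int 1
```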
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
with assert_raises(
        RuntimeError,
        glob='"end", "rel_end", "shape", and "rel_shape" arguments are mutually exclusive'):
Here I deliberately chose the other quotes to get rid of escape sequences.
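A small illustration of the trade-off (plain Python):

```python
# Single quotes outside let the double quotes inside stay unescaped:
glob_single = '"end", "rel_end", "shape", and "rel_shape" arguments are mutually exclusive'
# The double-quoted equivalent needs escape sequences:
glob_double = "\"end\", \"rel_end\", \"shape\", and \"rel_shape\" arguments are mutually exclusive"
assert glob_single == glob_double
```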
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
np.testing.assert_allclose(expected_out_vertices_abs, out_vertices_abs, rtol=1e-4)

# Checking clamping of the relative coordinates
expected_out_vertices_clamped = np.copy(expected_out_vertices)
np.clip(expected_out_vertices_clamped, a_min=0.0, a_max=1.0)
This is a bug - this line has no effect.
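For reference, np.clip returns a new array unless out= is given, so the result has to be captured - a standalone sketch of the fix:

```python
import numpy as np

expected = np.array([-0.5, 0.3, 1.2])
np.clip(expected, a_min=0.0, a_max=1.0)  # result discarded - no effect
print(expected)                          # [-0.5  0.3  1.2]

# Either reassign the result...
expected = np.clip(expected, a_min=0.0, a_max=1.0)
# ...or clip in place: np.clip(expected, 0.0, 1.0, out=expected)
print(expected)                          # [0.   0.3  1. ]
```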
np.testing.assert_allclose(expected_out_vertices_abs, out_vertices_abs, rtol=1e-4)

# Checking clamping of the relative coordinates
expected_out_vertices_clamped = np.copy(expected_out_vertices)
np.clip(expected_out_vertices_clamped, a_min=0.0, a_max=1.0)
np.testing.assert_allclose(expected_out_vertices_clamped, out_vertices, rtol=1e-4)
Comparison of the wrong value (out_vertices) against the mistakenly unmodified expected values... It passed because the two errors cancelled each other out.
h, w, _ = image_shape
wh = np.array([w, h])
whwh = np.array([w, h, w, h])
expected_out_vertices_abs = expected_out_vertices * wh
Refactored - broadcasting works on innermost dimensions, so we can just use this instead of running loops.
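A quick illustration of the broadcasting rule being relied on here (standalone NumPy, with made-up shapes):

```python
import numpy as np

# Broadcasting aligns trailing (innermost) dimensions, so an (N, 2) array
# of relative vertices can be scaled by a (2,) width/height vector at once.
vertices = np.array([[0.5, 0.5],
                     [0.25, 1.0]])  # relative (x, y) coordinates
wh = np.array([640, 480])           # image width and height
print(vertices * wh)
# [[320. 240.]
#  [160. 480.]]
```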
!build
CI MESSAGE: [5261750]: BUILD STARTED
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
!build
CI MESSAGE: [5261788]: BUILD STARTED
CI MESSAGE: [5261788]: BUILD PASSED
Triggered L1 tests
First batch of remarks (13/20 files viewed)
def make_batch_select_masks(batch_size,
                            npolygons_range=(1, 10), nvertices_range=(3, 40),
                            vertex_ndim=2, vertex_dtype=np.float32):
The batch_size in the first line looks weird. How about:
Suggested change:
def make_batch_select_masks(batch_size,
                            npolygons_range=(1, 10),
                            nvertices_range=(3, 40),
                            vertex_ndim=2,
                            vertex_dtype=np.float32):
get_pipeline_desc=external_source_tester(max_shape,
                                         dtype,
                                         RandomSampleIterator(max_shape, dtype(0), min_shape=min_shape),
                                         iterator,
                                         batch=batch),
These would now fit on a single line, right?
Hmmm... I guess they would. However, it would look inconsistent with other invocations of this function in the same file. If readability is the goal, I think that a common pattern within this file should have a common look. I'm fine either way.
get_pipeline_desc=external_source_tester(max_shape,
                                         dtype,
                                         RandomSampleIterator(max_shape, dtype(0), min_shape=min_shape),
                                         iterator,
                                         "gpu",
                                         batch=batch),
As above
pipe.set_outputs(fn.external_source(lambda i: [cp.array([attempt * 100 + i * 10 + 1.5],
                                                        dtype=cp.float32)]))
I believe breaking this lambda in the middle looks confusing. At first glance it looks like dtype is an arg to fn.external_source, not cp.array:
Suggested change:
pipe.set_outputs(fn.external_source(
    lambda i: [cp.array([attempt * 100 + i * 10 + 1.5], dtype=cp.float32)]
))
Alternatively, you could define the lambda on the line above.
Nice catch. I'll change the lambda to a local function.
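Presumably something along these lines (a sketch of the local-function variant; pipe, fn, cp, and attempt are assumed from the snippet above):

```python
def source(i):
    # Same values as the original lambda, but the dtype argument now
    # unambiguously belongs to cp.array.
    return [cp.array([attempt * 100 + i * 10 + 1.5], dtype=cp.float32)]

pipe.set_outputs(fn.external_source(source))
```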
def check_select_masks(batch_size,
                       npolygons_range=(1, 10), nvertices_range=(3, 40),
                       vertex_ndim=2, vertex_dtype=np.float32,
                       reindex_masks=False):
As somewhere above
Suggested change:
def check_select_masks(batch_size,
                       npolygons_range=(1, 10),
                       nvertices_range=(3, 40),
                       vertex_ndim=2,
                       vertex_dtype=np.float32,
                       reindex_masks=False):
source=get_data_source(batch_size, npolygons_range=npolygons_range,
                       nvertices_range=nvertices_range,
                       vertex_ndim=vertex_ndim,
                       vertex_dtype=vertex_dtype),
Could you also put npolygons_range on the next line, for consistency? I'd create a suggestion, but GitHub forbids doing that for unchanged lines.
You can always type ```suggestion by hand and it works.
yield (check_select_masks,
       batch_size,
       npolygons_range, nvertices_range,
       vertex_ndim, vertex_dtype,
       reindex_masks)
Similarly here - how about putting every arg on a new line?
Suggested change:
yield (check_select_masks,
       batch_size,
       npolygons_range,
       nvertices_range,
       vertex_ndim,
       vertex_dtype,
       reindex_masks)
I'm fine either way.
({"lo_0": 0, "at_0": 0}, "both as an index"), | ||
({"at_0": 0, "step_0": 1}, "cannot have a step") | ||
]: |
I believe this indentation could be reduced:
({"lo_0": 0, "at_0": 0}, "both as an index"), | |
({"at_0": 0, "step_0": 1}, "cannot have a step") | |
]: | |
({"lo_0": 0, "at_0": 0}, "both as an index"), | |
({"at_0": 0, "step_0": 1}, "cannot have a step") | |
]: |
It cannot - it must be indented more than the following line.
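This is the PEP 8 continuation-line rule: a continued construct must remain distinguishable from the statement body that follows it. A minimal reconstruction of the surrounding loop (variable names assumed):

```python
# The list items are indented further than the loop body below; if they
# matched the body's indentation, the two would visually blend together.
for args, msg in [
        ({"lo_0": 0, "at_0": 0}, "both as an index"),
        ({"at_0": 0, "step_0": 1}, "cannot have a step")
]:
    print(args, msg)  # loop body at the normal indentation level
```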
{"step_0": 2}, | ||
{"step_1": -1}, | ||
]: |
Similarly here
{"step_0": 2}, | |
{"step_1": -1}, | |
]: | |
{"step_0": 2}, | |
{"step_1": -1}, | |
]: |
Likewise.
    for gpu in range(num_gpus)
]
Suggested change:
    for gpu in range(num_gpus)]
I think most PEP 8 autoformatters would stick to this.
They wouldn't if the list contents start on a new line.
The other option would be:
pipes = [COCOReaderPipeline(batch_size=batch_size,
                            num_threads=4,
                            shard_id=gpu,
                            num_gpus=num_gpus,
                            data_paths=datasets[0],
                            random_shuffle=False,
                            stick_to_shard=False,
                            shuffle_after_epoch=True,
                            pad_last_batch=False)
         for gpu in range(num_gpus)]
but it's indented more and the brackets are not quite as pronounced as I think they should be (it's not common to create a list of pipelines like this, so I wanted it to stand out).
LGTM, with comments
@@ -99,7 +108,7 @@ def _test_select_masks_wrong_input(data_source_fn, err_regex):
     p = wrong_input_pipe(data_source_fn=data_source_fn)
     p.build()
     with assert_raises(RuntimeError, regex=err_regex):
-        o = p.run()
+        _ = p.run()
p.run() should be enough. Or am I missing something?
I guess this is about readability - we state that there is a result and that we deliberately ignore it. It does make a difference when run interactively (the result of a bare p.run() would be printed).
        ((1, 0), None), ((0, 1), None)
]:
Suggested change:
    ((1, 0), None), ((0, 1), None)]:
Would be preferred, I think.
Again - not when there's a newline after the opening bracket.
seq = torch.arange(max_len, dtype=seq_len.dtype, device=x.device)
mask = seq.expand(x.size(0), max_len) >= seq_len.unsqueeze(1)
It's mostly OK, but I would have preferred not to refactor this code, since it's taken from "the reference" just to check that our suggested DALI pipeline does the same thing.
I've run this test and it still works...
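For readers unfamiliar with the idiom, a self-contained sketch of what those two lines compute - a padding mask built from sequence lengths (example values made up):

```python
import torch

seq_len = torch.tensor([2, 4, 1])  # valid length of each batch row
max_len = 4

# Row i is True at positions >= seq_len[i], i.e. where the data is padding.
seq = torch.arange(max_len, dtype=seq_len.dtype)
mask = seq.expand(seq_len.size(0), max_len) >= seq_len.unsqueeze(1)
print(mask)
# tensor([[False, False,  True,  True],
#         [False, False, False, False],
#         [False,  True,  True,  True]])
```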
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
!build
CI MESSAGE: [5268841]: BUILD STARTED
CI MESSAGE: [5268841]: BUILD PASSED
A few comments; all files reviewed. Please make sure that the L1 tests pass.
super().__init__(
    batch_size, num_threads, device_id, seed=1234)
Suggested change:
super().__init__(batch_size, num_threads, device_id, seed=1234)
super().__init__(
    batch_size, num_threads, device_id, seed=1234)
Suggested change:
super().__init__(batch_size, num_threads, device_id, seed=1234)
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
CI MESSAGE: [5270468]: BUILD STARTED
CI MESSAGE: [5270468]: BUILD FAILED
CI MESSAGE: [5270468]: BUILD PASSED
Signed-off-by: Michal Zientkiewicz <michalz@nvidia.com>
Category:
- Refactoring
- Formatting
- Bug fix
Description:
Additional information:
Affected modules and functionalities:
Key points relevant for the review:
Tests:
Checklist
Documentation
DALI team only
Requirements
REQ IDs: N/A
JIRA TASK: DALI-2824