
Add last_batch_policy to the framework iterator #2269

Merged: 8 commits merged into NVIDIA:master from add_drop_last on Oct 17, 2020

Conversation

@JanuszL (Contributor) commented Sep 11, 2020:

  • adds a last_batch_policy, which can be DROP, FILL, or PARTIAL (a usage sketch follows below)
  • replaces fill_last_batch: True maps to FILL, False to PARTIAL
  • when the DROP policy is set, the last batch is dropped if it cannot be fully filled

Signed-off-by: Janusz Lisiecki jlisiecki@nvidia.com
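
For illustration, a minimal usage sketch of the new argument with the PyTorch plugin (the pipeline `pipe`, the output names, and the reader name "Reader" are assumptions made for this sketch; the LastBatchPolicy import path is the one mentioned later in this review):

from nvidia.dali.plugin.base_iterator import LastBatchPolicy
from nvidia.dali.plugin.pytorch import DALIGenericIterator

# `pipe` is assumed to be an already-built DALI pipeline containing a reader op named "Reader"
it = DALIGenericIterator(
    [pipe],
    ["data", "label"],                        # output names, assumed for this sketch
    reader_name="Reader",                     # lets the iterator query the shard size from the reader
    last_batch_policy=LastBatchPolicy.DROP)   # or LastBatchPolicy.FILL / LastBatchPolicy.PARTIAL

for batch in it:
    pass  # consume batches; under the DROP policy the incomplete last batch is never returned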

Why do we need this PR?


  • It adds last_batch_policy to the framework iterator

What happened in this PR?


  • What solution was applied:
    adds a last_batch_policy which can be DROP, FILL, or PARTIAL
    deprecates fill_last_batch
  • Affected modules and functionalities:
    framework iterator
  • Key points relevant for the review:
    NA
  • Validation and testing:
    new test cases are added
  • Documentation (including examples):
    documentation and tutorial notebooks are updated to use last_batch_policy instead of fill_last_batch

JIRA TASK: [NA]

@JanuszL (Contributor, Author) commented Sep 11, 2020: !build

@dali-automaton (Collaborator): CI MESSAGE: [1618455]: BUILD STARTED

@dali-automaton (Collaborator): CI MESSAGE: [1618455]: BUILD FAILED

@dali-automaton (Collaborator): CI MESSAGE: [1618455]: BUILD PASSED

@JanuszL (Contributor, Author) commented Sep 14, 2020: !build

@dali-automaton (Collaborator): CI MESSAGE: [1623519]: BUILD STARTED

@dali-automaton (Collaborator): CI MESSAGE: [1623519]: BUILD FAILED

@JanuszL (Contributor, Author) commented Sep 14, 2020: !build

@dali-automaton (Collaborator): CI MESSAGE: [1623602]: BUILD STARTED

@dali-automaton (Collaborator): CI MESSAGE: [1623602]: BUILD PASSED

@JanuszL (Contributor, Author) commented Sep 15, 2020: !build

@dali-automaton (Collaborator): CI MESSAGE: [1626200]: BUILD STARTED

@dali-automaton (Collaborator): CI MESSAGE: [1626200]: BUILD FAILED

@dali-automaton (Collaborator): CI MESSAGE: [1626200]: BUILD PASSED

@@ -253,18 +256,91 @@ def test_mxnet_iterator_not_fill_last_batch_pad_last_batch():
assert len(next_img_ids_list_set) == data_size
assert len(set(next_mirrored_data)) == 1

def check_mxnet_iterator_pass_reader_name(shards_num, pipes_number, batch_size, stick_to_shard, pad, iters, fill_last_batch):
LastBatchPolicy = None
Contributor: ????!

@JanuszL (Contributor, Author): Fixed


if pad and pipes_number == shards_num:
assert len(set.intersection(*out_set)) == 0, "Shards should not overlap in the epoch"
if last_batch_policy == LastBatchPolicy.DROP:
@mzient (Contributor) commented Sep 23, 2020: This is set afterwards with a global?? Why can't we just import the LastBatchPolicy at the beginning?

@JanuszL (Contributor, Author): I have added just from nvidia.dali.plugin.base_iterator import LastBatchPolicy as LastBatchPolicy

But I don't know if we want to encourage users to use this internal, undocumented base module.

@JanuszL force-pushed the add_drop_last branch 2 times, most recently from b1cf9f8 to 6948e9a on September 23, 2020 at 20:58
@JanuszL (Contributor, Author) commented Sep 23, 2020: !build

@dali-automaton (Collaborator): CI MESSAGE: [1649313]: BUILD STARTED

@dali-automaton (Collaborator): CI MESSAGE: [1649313]: BUILD FAILED

@JanuszL (Contributor, Author) commented Sep 24, 2020: !build

@dali-automaton (Collaborator): CI MESSAGE: [1650799]: BUILD STARTED

@unique
class LastBatchPolicy(Enum):
"""
Describes the last batch policy behavior when there are no enough samples in the epoch
Contributor (suggested change):
Describes the last batch policy behavior when there are no enough samples in the epoch
Describes the last batch policy behavior when there are not enough samples in the epoch

@JanuszL (Contributor, Author): Done

class LastBatchPolicy(Enum):
"""
Describes the last batch policy behavior when there are no enough samples in the epoch
to fully fill it
Contributor (suggested change):
to fully fill it
to fill a whole batch.

@JanuszL (Contributor, Author): Done
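
To make the discussion above easier to follow, here is a rough sketch of the enum being documented (the member values and per-member comments are assumptions for illustration; only the docstring wording comes from this review):

from enum import Enum, unique

@unique
class LastBatchPolicy(Enum):
    """Describes the last batch policy behavior when there are not enough samples in the epoch
    to fill a whole batch."""
    FILL = 0     # the last batch is always full, padded with extra data if needed (assumed value)
    DROP = 1     # the last batch is dropped entirely if it cannot be fully filled (assumed value)
    PARTIAL = 2  # the last batch contains only the remaining real samples (assumed value)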

# calculate each shard size for each id, and check how many samples are left by subtracting
# the shard size from the iterator counter, then go through all GPUs and check how much data needs to be dropped
left = self.batch_size - (self._counter - self._shard_sizes_per_gpu_initial[self._shards_id])
if_drop = np.less(left, self.batch_size)
return if_drop, left

def _advance_and_check_drop_last(self):
"""
Checks if the current batch is not fully filled and if drop it
Contributor (suggested change):
Checks if the current batch is not fully filled and if drop it
Checks whether the current batch is not fully filled and whether it should be dropped.

@JanuszL (Contributor, Author): Done
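
To make the drop-check arithmetic above concrete, a small numeric illustration with assumed values (the names below are simplified stand-ins for `self.batch_size`, `self._counter`, and the per-GPU shard size used in the snippet):

import numpy as np

batch_size = 32
shard_size = 100   # assumed size of this GPU's shard
counter = 128      # samples requested so far: 4 batches of 32, crossing the shard boundary

# real samples left for the last batch; the remaining (counter - shard_size) samples are padding
left = batch_size - (counter - shard_size)   # 32 - 28 = 4
if_drop = np.less(left, batch_size)          # True -> the last batch is only partially filled
# under the DROP policy this batch would be discarded; under PARTIAL only the 4 real samples are returned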

`auto_reset` don't work in such case. It works with only one pipeline inside
the iterator.
Mutually exclusive with `reader_name` argument
reader_name : str, default = None
Name of the reader which will be queried to the shard size, number of shards and
all other properties necessary to count properly the number of relevant and padded
samples that iterator needs to deal with. It automatically sets `fill_last_batch` and
`last_batch_padded` accordingly to match the reader's configuration
samples that iterator needs to deal with. It automatically sets `last_batch_policy` to
Contributor: I don't understand the last sentence. Can you elaborate?

@JanuszL (Contributor, Author): How about now?

@@ -70,7 +70,7 @@ def get_train_dali_loader(args, default_boxes, local_seed):
train_pipe,
["images", "boxes", "labels"],
reader_name="Reader",
fill_last_batch=True)
last_batch_policy=LastBatchPolicy.PARTIAL)
Contributor: You changed the policy. Is it intentional?

@JanuszL (Contributor, Author): Fixed
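
For reference, the mapping implied by the PR description (the reviewer's point is that the old fill_last_batch=True corresponds to FILL, not PARTIAL):

# deprecation mapping, per the PR description:
#   fill_last_batch=True   ->  last_batch_policy=LastBatchPolicy.FILL
#   fill_last_batch=False  ->  last_batch_policy=LastBatchPolicy.PARTIAL
last_batch_policy = LastBatchPolicy.FILL   # the equivalent of fill_last_batch=True at this call site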

"\n",
"`last_batch_padded` and `last_batch_policy` are set here only for the demonstration purposes. The user may write any custom code and change the epoch size epoch to epoch. In that case, it is recommended to set `size` to -1 and let the iterator just wait for StopIteration exception from the `iter_setup`.\n",
"\n",
"The `last_batch_padded` here tells the iterator that the difference between data set size and batch size alignment is padded by real data that could be skipped when provided to the framework (`last_batch_policy`):"
Contributor (suggested change):
"The `last_batch_padded` here tells the iterator that the difference between data set size and batch size alignment is padded by real data that could be skipped when provided to the framework (`last_batch_policy`):"
"The `last_batch_padded` here tells the iterator that the difference between dataset size and batch size alignment is padded by real data that could be skipped when provided to the framework (`last_batch_policy`):"

@JanuszL (Contributor, Author): Done

@@ -141,9 +141,9 @@
"\n",
"In the end, let us see how it works.\n",
"\n",
"`last_batch_padded` and `fill_last_batch` are set here only for the demonstration purposes. The user may write any custom code and change the epoch size epoch to epoch. In that case, it is recommended to set `size` to -1 and let the iterator just wait for StopIteration exception from the `iter_setup`.\n",
"`last_batch_padded` and `last_batch_policy` are set here only for the demonstration purposes. The user may write any custom code and change the epoch size epoch to epoch. In that case, it is recommended to set `size` to -1 and let the iterator just wait for StopIteration exception from the `iter_setup`.\n",
Contributor (suggested change):
"`last_batch_padded` and `last_batch_policy` are set here only for the demonstration purposes. The user may write any custom code and change the epoch size epoch to epoch. In that case, it is recommended to set `size` to -1 and let the iterator just wait for StopIteration exception from the `iter_setup`.\n",
"`last_batch_padded` and `last_batch_policy` are set here only for the demonstration purposes. The user may write any custom code and change the epoch size epoch to epoch. In that case, it is recommended to set `size` to -1 and let the iterator just wait for StopIteration exception from `iter_setup`.\n",

@JanuszL (Contributor, Author): Done

"\n",
"The `last_batch_padded` here tells the iterator that the difference between data set size and batch size alignment is padded by real data that could be skipped when provided to the framework (`fill_last_batch`):"
"The `last_batch_padded` here tells the iterator that the difference between data set size and batch size alignment is padded by real data that could be skipped when provided to the framework (`last_batch_policy`):"
Contributor (suggested change):
"The `last_batch_padded` here tells the iterator that the difference between data set size and batch size alignment is padded by real data that could be skipped when provided to the framework (`last_batch_policy`):"
"The `last_batch_padded` here tells the iterator that the difference between dataset size and batch size alignment is padded by real data that could be skipped when provided to the framework (`last_batch_policy`):"

@JanuszL (Contributor, Author): Done
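
A minimal sketch of the configuration the notebook text describes, in which the size is passed explicitly instead of using `reader_name` (the pipeline `pipe`, the output names, and the reader name "Reader" are assumptions for this sketch; `Pipeline.epoch_size` is used to query the reader's size):

from nvidia.dali.plugin.base_iterator import LastBatchPolicy
from nvidia.dali.plugin.pytorch import DALIGenericIterator

# `pipe` is assumed to be an already-built DALI pipeline with a reader op named "Reader"
it = DALIGenericIterator(
    [pipe],
    ["data"],
    size=pipe.epoch_size("Reader"),             # shard size passed explicitly
    last_batch_padded=True,                     # the tail is padded with real data that can be skipped
    last_batch_policy=LastBatchPolicy.PARTIAL)  # return only the real part of the last batch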

@@ -217,7 +217,10 @@ Here are the iterator options:
- | ``last_batch_padded``: Determines whether the tail of the data consists of data from the next
shard (``False``) or is duplicated dummy data (``True``).
| It is applicable when the shard size is not a multiple of the batch size,

- | ``last_batch_policy`` - whether the last batch should be full no matter if shard size is
Contributor (suggested change):
- | ``last_batch_policy`` - whether the last batch should be full no matter if shard size is
- | ``last_batch_policy`` - Determines the policy about the last batch when the shard size is not

@JanuszL (Contributor, Author): Done


- | ``last_batch_policy`` - whether the last batch should be full no matter if shard size is
divisible by the batch size.
| Only partially filled with the data or dropped entirely if it
Contributor (suggested change):
| Only partially filled with the data or dropped entirely if it
The possible options are:
- FILL ...
- DROP ...
- PARTIAL ...

Something like this would read better

@JanuszL (Contributor, Author): Done
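
A worked example of how the three policies behave, derived from the definitions above (the numbers are illustrative, not measured output). With a shard of 10 samples and a batch size of 4:

- FILL: 3 batches of 4; the last batch holds 2 real samples plus 2 padded samples
- DROP: 2 batches of 4; the incomplete last batch is discarded
- PARTIAL: 2 batches of 4, then a final batch containing only the 2 remaining real samples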

@dali-automaton (Collaborator): CI MESSAGE: [1676861]: BUILD PASSED

try:
    self._first_batch = self.next()
except StopIteration:
    assert False, "It seems that there is not data in the pipeline, please check if last_batch_policy is set PARTIAL and batch size is bigger than the shard size"

Contributor (suggested change):
assert False, "It seems that there is not data in the pipeline, please check if last_batch_policy is set PARTIAL and batch size is bigger than the shard size"
assert False, "It seems that there is no data in the pipeline. This may happen if `last_batch_policy` is set to PARTIAL and the requested batch size is greater than the shard size."

@JanuszL (Contributor, Author): Done

The same snippet and the same suggestion appear for three more occurrences in the review; each was resolved with "Done".

"At the end let us see how it works. Please also notice the usage of `last_batch_padded` that tell iterator that the difference between data set size and batch size alignment is padded by real data that could be skipped at when provided to the framework (`fill_last_batch`):"
"In the end, let us see how it works.\n",
"\n",
"`last_batch_padded` and `last_batch_policy` are set here only for the demonstration purposes. The user may write any custom code and change the epoch size epoch to epoch. In that case, it is recommended to set `size` to -1 and let the iterator just wait for StopIteration exception from `iter_setup`.\n",
Contributor (suggested change):
"`last_batch_padded` and `last_batch_policy` are set here only for the demonstration purposes. The user may write any custom code and change the epoch size epoch to epoch. In that case, it is recommended to set `size` to -1 and let the iterator just wait for StopIteration exception from `iter_setup`.\n",
"`last_batch_padded` and `last_batch_policy` are set here only for the demonstration purposes. The user may write any custom code and change the epoch size from epoch to epoch. In that case, it is recommended to set `size` to -1 and let the iterator just wait for StopIteration exception from `iter_setup`.\n",

@JanuszL (Contributor, Author): Done

@@ -223,4 +224,4 @@
},
"nbformat": 4,
"nbformat_minor": 2
}
}
Contributor: missing empty line

@JanuszL (Contributor, Author): Done

@@ -211,4 +216,4 @@
},
"nbformat": 4,
"nbformat_minor": 2
}
}
Contributor: and here

@JanuszL (Contributor, Author): Done

@JanuszL force-pushed the add_drop_last branch 3 times, most recently from cfcfa47 to 049d631 on October 16, 2020 at 15:54
@JanuszL (Contributor, Author) commented Oct 16, 2020: !build

@dali-automaton (Collaborator): CI MESSAGE: [1708269]: BUILD STARTED

@dali-automaton (Collaborator): CI MESSAGE: [1708269]: BUILD FAILED

- adds a last_batch_policy which can be DROP, FILL, or PARTIAL
- replaces fill_last_batch, True means FILL, False PARTIAL
- when DROP policy is set the last batch is dropped when cannot
  be fully filled

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
@JanuszL (Contributor, Author) commented Oct 16, 2020: !build

@dali-automaton (Collaborator): CI MESSAGE: [1708962]: BUILD STARTED

@dali-automaton (Collaborator): CI MESSAGE: [1708962]: BUILD PASSED

@JanuszL merged commit 1ee090d into NVIDIA:master on Oct 17, 2020
@JanuszL deleted the add_drop_last branch on October 17, 2020 at 20:15