
Update examples with COCO data set and fix reader behavior for padding #1557

Merged
JanuszL merged 2 commits into NVIDIA:master on Jan 9, 2020

Conversation

@JanuszL (Contributor) commented Dec 9, 2019

  • updates the examples that use the COCO data set, adjusting them to new data with meaningful bboxes and segmentation data
  • fixes a problem with wrong cloning of the last sample in the batch when pad_last_batch is enabled and the reader needs to stick to the shard
  • makes the reader pad the whole batch when the number of batches differs between shards, following the PyTorch behavior: each shard is padded to int(ceil(data_size/no_shards)) samples, so the padded data set size is int(ceil(data_size/no_shards))*no_shards
  • adjusts test_operator_reader_shuffling.py to work when the readers for different GPUs have a different number of iterations to make - data_set_size not divisible by the number of requested shards

Signed-off-by: Janusz Lisiecki jlisiecki@nvidia.com

Why do we need this PR?

  • updates examples with the COCO data set
  • adjusts tests to use the old COCO data in a different path
  • fixes a problem with wrong cloning of the last sample in the batch when pad_last_batch is enabled and the reader needs to stick to the shard
  • makes the reader pad the whole batch when the number of batches differs between shards, following the PyTorch behavior: each shard is padded to int(ceil(data_size/no_shards)) samples, so the padded data set size is int(ceil(data_size/no_shards))*no_shards

What happened in this PR?

  • updates the examples that use the COCO data set, adjusting them to new data with meaningful bboxes and segmentation data
  • fixes a problem with wrong cloning of the last sample in the batch when pad_last_batch is enabled and the reader needs to stick to the shard, inside loader.h
  • makes the reader pad the whole batch when the number of batches differs between shards, following the PyTorch behavior: each shard is padded to int(ceil(data_size/no_shards)) samples, so the padded data set size is int(ceil(data_size/no_shards))*no_shards (see the sketch after this list)
  • adjusts test_operator_reader_shuffling.py to work when the readers for different GPUs have a different number of iterations to make - data_set_size not divisible by the number of requested shards; also refactors the test code
  • examples have been regenerated with the new data from "Add a meaningful bboxes and segmentation to COCO dataset" (DALI_extra#36)
  • DALI_EXTRA_VERSION needs to be updated
  • examples using COCO are regenerated, and the pad_last_batch option description is updated

JIRA TASK: [NA]
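
As background for the padding bullets above, a minimal sketch in plain Python (not DALI code; data_size and no_shards are placeholder names) of the PyTorch-style shard padding:

import math

def padded_shard_size(data_size, no_shards):
    # Each shard is padded up to the same length, as PyTorch's
    # DistributedSampler does: ceil(data_size / no_shards) samples.
    return int(math.ceil(data_size / no_shards))

data_size, no_shards = 10, 4
per_shard = padded_shard_size(data_size, no_shards)   # 3 samples per shard
padded_total = per_shard * no_shards                  # 12; the 2 extra samples are clones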

@dali-automaton (Collaborator)
CI MESSAGE: [1024363]: BUILD STARTED

@dali-automaton (Collaborator)
CI MESSAGE: [1024363]: BUILD FAILED

@dali-automaton (Collaborator)
CI MESSAGE: [1026525]: BUILD STARTED

@JanuszL changed the title from "Update examples with COCO data set" to "Update examples with COCO data set and fix tests" on Dec 10, 2019
@dali-automaton (Collaborator)
CI MESSAGE: [1026525]: BUILD FAILED

@dali-automaton (Collaborator)
CI MESSAGE: [1027232]: BUILD STARTED

@dali-automaton (Collaborator)
CI MESSAGE: [1027232]: BUILD FAILED

iterate_over = gpus_arg
img_ids_list = [[] for _ in pipes]

# each GPU needst to iterate from `shard_id * data_size / num_gpus` samples to `(shard_id + 1)* data_size / num_gpus`
Contributor

Typo, I think

Contributor (Author)

Done
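
As a side note, the range from the comment above can be illustrated in plain Python (shard_range is a hypothetical helper, not part of the test):

def shard_range(shard_id, num_gpus, data_size):
    # GPU `shard_id` reads samples [start, end) of the data set.
    start = shard_id * data_size // num_gpus
    end = (shard_id + 1) * data_size // num_gpus
    return start, end

print(shard_range(0, 3, 10))  # (0, 3)
print(shard_range(2, 3, 10))  # (6, 10)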

@dali-automaton (Collaborator)
CI MESSAGE: [1029894]: BUILD STARTED

@dali-automaton (Collaborator)
CI MESSAGE: [1029894]: BUILD FAILED

@dali-automaton (Collaborator)
CI MESSAGE: [1032081]: BUILD STARTED

@dali-automaton (Collaborator)
CI MESSAGE: [1032081]: BUILD FAILED

@dali-automaton (Collaborator)
CI MESSAGE: [1032771]: BUILD STARTED

@dali-automaton (Collaborator)
CI MESSAGE: [1032771]: BUILD FAILED

@dali-automaton (Collaborator)
CI MESSAGE: [1033593]: BUILD STARTED

@dali-automaton (Collaborator)
CI MESSAGE: [1033593]: BUILD FAILED

@dali-automaton (Collaborator)
CI MESSAGE: [1033642]: BUILD STARTED

@dali-automaton (Collaborator)
CI MESSAGE: [1033642]: BUILD FAILED

@dali-automaton (Collaborator)
CI MESSAGE: [1033933]: BUILD STARTED

@dali-automaton (Collaborator)
CI MESSAGE: [1033933]: BUILD PASSED

@dali-automaton (Collaborator)
CI MESSAGE: [1036979]: BUILD STARTED

@dali-automaton (Collaborator)
CI MESSAGE: [1036979]: BUILD FAILED

@JanuszL changed the title from "Update examples with COCO data set and fix tests" to "Update examples with COCO data set and fix reader behavior for padding" on Dec 17, 2019
@dali-automaton (Collaborator)
CI MESSAGE: [1039438]: BUILD STARTED

@dali-automaton (Collaborator)
CI MESSAGE: [1039438]: BUILD PASSED

with the shard size.)code", false);
R"code(If set to true, the Loader will pad the last batch with the last image when the batch size is
not aligned with the shard size. It means that the remainder of the batch or even the whole batch can be
artificially added when data set size is not equally divisible by the number of shards, and the shard is
Contributor

Suggested change
artificially added when data set size is not equally divisible by the number of shards, and the shard is
artificially added when the data set size is not equally divisible by the number of shards, and the shard is

Contributor (Author)

Done
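
To make the docstring concrete, a hedged sketch of the described behavior in plain Python (pad_batches is illustrative, not the Loader implementation):

def pad_batches(samples, batch_size):
    # Split into batches, then fill the last one by cloning its final
    # sample, as the pad_last_batch description above says.
    batches = [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]
    missing = batch_size - len(batches[-1])
    batches[-1] = batches[-1] + [batches[-1][-1]] * missing
    return batches

print(pad_batches([1, 2, 3, 4, 5], 2))  # [[1, 2], [3, 4], [5, 5]]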

@@ -9,7 +9,10 @@ DALI_EXTRA_VERSION_PATH="${DIR}/../DALI_EXTRA_VERSION"
read -r DALI_EXTRA_VERSION < ${DALI_EXTRA_VERSION_PATH}
echo "Using DALI_EXTRA_VERSION = ${DALI_EXTRA_VERSION}"
if [ ! -d "$DALI_EXTRA_PATH" ] ; then
git clone "$DALI_EXTRA_URL" "$DALI_EXTRA_PATH"
git clone https://github.com/JanuszL/DALI_extra.git "$DALI_EXTRA_PATH"
Contributor

needs to be reverted

@@ -1 +1 @@
d61722e9fa6df5379cba68941e3f94bff9814def
e05294a49bea2d0d0da516955eef0bc476c92ae2
Contributor

needs to be bumped up


size_t start_index(const size_t shard_id,
const size_t shard_num,
const size_t size) {
return size * shard_id / shard_num;
}

Index num_samples(const size_t shard_num,
const size_t size) {
return static_cast<size_t>(std::ceil(size * 1.0 / shard_num));
Contributor

Suggested change
return static_cast<size_t>(std::ceil(size * 1.0 / shard_num));
return static_cast<Index>(std::ceil(size * 1.0 / shard_num));

Contributor (Author)

Done
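
For readers following the C++ side, rough Python equivalents of the two helpers above (illustrative only):

def start_index(shard_id, shard_num, size):
    # First sample index of the shard; integer division truncates,
    # matching size * shard_id / shard_num in the C++ code.
    return size * shard_id // shard_num

def num_samples(shard_num, size):
    # Padded per-shard sample count: ceil(size / shard_num).
    return -(-size // shard_num)

print(start_index(2, 3, 10))  # 6
print(num_samples(3, 10))     # 4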

if (!loading_flag_) {
PrepareMetadata();
}
if (!pad_last_batch_) {
Contributor

nitpick: I think it'd read more naturally if you rephrased the logic as

if (pad_last_batch) {
  // .. handle special case
} else {
  return SizeImpl();
}

Contributor (Author)

Done
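
A hedged sketch of the restructured flow, with the padded branch filled in from the PR description (epoch_size, size_impl, and the ceil-based formula are assumptions, not the actual Loader code):

def epoch_size(pad_last_batch, num_shards, size, size_impl):
    if pad_last_batch:
        # Special case from the PR description: every shard is padded to
        # ceil(size / num_shards) samples.
        return -(-size // num_shards) * num_shards
    return size_impl()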

@@ -317,8 +338,9 @@ class Loader {
std::once_flag fetch_cache_;
std::shared_ptr<ImageCache> cache_;

// Counts how many samples reader have read already from this and next epoch
// Counts how many samples reader have read already from this epoch
Contributor

Suggested change
// Counts how many samples reader have read already from this epoch
// Counts how many samples the reader have read already from this epoch

Contributor (Author)

Done

Index read_sample_counter_;
Index returned_sample_counter_;
Contributor

Can you describe this member variable as well?

Contributor (Author)

Done

val = np.concatenate(pipe.outputs()[0].as_array())
yield check, data_set, num_gpus, batch_size, stick_to_shard, shuffle_after_epoch, dry_run_num, len(ref_img_ids)

def check(data_set, num_gpus, batch_size, stick_to_shard, shuffle_after_epoch, dry_run_num, len_ref_img_ids):
Contributor

Suggested change
def check(data_set, num_gpus, batch_size, stick_to_shard, shuffle_after_epoch, dry_run_num, len_ref_img_ids):
def check_shuffling_patterns(data_set, num_gpus, batch_size, stick_to_shard, shuffle_after_epoch, dry_run_num, len_ref_img_ids):

Contributor (Author)

Done

@@ -146,11 +150,14 @@ class Loader {

int samples_to_choose_from = initial_buffer_fill_;
if (shards_.front().start == shards_.front().end) {
if (!is_new_epoch && pad_last_batch_) {
if ((returned_sample_counter_ < num_samples(num_shards_, Size()) || !is_new_epoch) &&
Contributor

I don't fully understand the logic behind this. Can you add a couple of sentences as a comment here?

Contributor (Author)

Done
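
For context, the condition above transcribed into plain Python (names kept from the C++ snippet; the reading of its intent is mine, not the author's added comment): once a shard is exhausted, keep serving the cloned last sample while the padded per-shard length has not been reached, or while the epoch is still in progress.

def should_keep_padding(returned_sample_counter, num_shards, size,
                        is_new_epoch, pad_last_batch):
    # num_samples(num_shards_, Size()) == ceil(size / num_shards)
    padded_shard = -(-size // num_shards)
    return pad_last_batch and (
        returned_sample_counter < padded_shard or not is_new_epoch)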

@JanuszL (Contributor, Author) commented Jan 3, 2020

!build

@dali-automaton (Collaborator)
CI MESSAGE: [1055499]: BUILD STARTED

@dali-automaton (Collaborator)
CI MESSAGE: [1055499]: BUILD FAILED

- updates examples using the COCO data set, adjusting them to new data
  with meaningful bboxes and segmentation data
- fixes a problem with wrong cloning of the last sample in the batch
  when pad_last_batch is enabled and the reader needs to stick to the shard
- makes the reader pad the whole batch when the number of batches differs
  between shards, following the PyTorch behavior: each shard is padded to
  int(ceil(data_size/no_shards)) samples, so the padded data set size is
  int(ceil(data_size/no_shards))*no_shards
- adjusts test_operator_reader_shuffling.py to work when the readers for
  different GPUs have a different number of iterations to make - data_set_size
  not divisible by the number of requested shards

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
@JanuszL (Contributor, Author) commented Jan 3, 2020

!build

@dali-automaton (Collaborator)
CI MESSAGE: [1055995]: BUILD STARTED

@dali-automaton (Collaborator)
CI MESSAGE: [1055995]: BUILD FAILED

@dali-automaton (Collaborator)
CI MESSAGE: [1056700]: BUILD STARTED

@dali-automaton (Collaborator)
CI MESSAGE: [1056700]: BUILD PASSED

@dali-automaton (Collaborator)
CI MESSAGE: [1064194]: BUILD STARTED

@dali-automaton (Collaborator)
CI MESSAGE: [1064194]: BUILD FAILED

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
@dali-automaton (Collaborator)
CI MESSAGE: [1064216]: BUILD STARTED

@dali-automaton (Collaborator)
CI MESSAGE: [1064216]: BUILD PASSED

@JanuszL JanuszL merged commit 0d0cefa into NVIDIA:master Jan 9, 2020
@JanuszL JanuszL deleted the update_coco_ex branch January 9, 2020 12:49