
Update examples with COCO data set and fix reader behavior for padding #1557

Merged
JanuszL merged 2 commits into NVIDIA:master on Jan 9, 2020

Conversation

@JanuszL (Contributor) commented Dec 9, 2019

  • updates the examples that use the COCO data set, adjusting them to new data with meaningful bboxes and segmentation data
  • fixes a problem with wrong cloning of the last sample in the batch when pad_last_batch is enabled and the reader needs to stick to the shard
  • makes the reader pad the whole batch when the number of batches differs between shards, following the PyTorch behavior: each shard is padded to int(ceil(data_size/no_shards)) samples, so the padded data set size is int(ceil(data_size/no_shards))*no_shards
  • adjusts test_operator_reader_shuffling.py to work when the readers for different GPUs have a different number of iterations to make - data_set_size not divisible by the number of requested shards

Signed-off-by: Janusz Lisiecki jlisiecki@nvidia.com

Why do we need this PR?

  • updates examples with the COCO data set
  • adjusts tests to use the old COCO data in a different path
  • fixes a problem with wrong cloning of the last sample in the batch when pad_last_batch is enabled and the reader needs to stick to the shard
  • makes the reader pad the whole batch when the number of batches differs between shards, following the PyTorch behavior: each shard is padded to int(ceil(data_size/no_shards)) samples, so the padded data set size is int(ceil(data_size/no_shards))*no_shards

What happened in this PR?

  • updates the examples that use the COCO data set, adjusting them to new data with meaningful bboxes and segmentation data
  • fixes a problem with wrong cloning of the last sample in the batch when pad_last_batch is enabled and the reader needs to stick to the shard, inside loader.h
  • makes the reader pad the whole batch when the number of batches differs between shards, following the PyTorch behavior: each shard is padded to int(ceil(data_size/no_shards)) samples, so the padded data set size is int(ceil(data_size/no_shards))*no_shards (see the sketch after this list)
  • adjusts test_operator_reader_shuffling.py to work when the readers for different GPUs have a different number of iterations to make - data_set_size not divisible by the number of requested shards; also refactors the test code
  • examples have been regenerated with the new data from "Add a meaningful bboxes and segmentation to COCO dataset" (DALI_extra#36)
  • DALI_EXTRA_VERSION needs to be updated
  • examples using COCO are regenerated, and the pad_last_batch option description is updated

JIRA TASK: [NA]
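
As background for the padding bullets above, a minimal sketch in plain Python (not DALI code; data_size and no_shards are placeholder names) of the PyTorch-style shard padding:

import math

def padded_shard_size(data_size, no_shards):
    # Each shard is padded up to the same length, as PyTorch's
    # DistributedSampler does: ceil(data_size / no_shards) samples.
    return int(math.ceil(data_size / no_shards))

data_size, no_shards = 10, 4
per_shard = padded_shard_size(data_size, no_shards)   # 3 samples per shard
padded_total = per_shard * no_shards                  # 12; the 2 extra samples are clones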

@dali-automaton (Collaborator)
CI MESSAGE: [1024363]: BUILD STARTED

@dali-automaton (Collaborator)
CI MESSAGE: [1024363]: BUILD FAILED

@dali-automaton (Collaborator)
CI MESSAGE: [1026525]: BUILD STARTED

@JanuszL changed the title from "Update examples with COCO data set" to "Update examples with COCO data set and fix tests" on Dec 10, 2019
@dali-automaton (Collaborator)
CI MESSAGE: [1026525]: BUILD FAILED

@dali-automaton (Collaborator)
CI MESSAGE: [1027232]: BUILD STARTED

@dali-automaton (Collaborator)
CI MESSAGE: [1027232]: BUILD FAILED

iterate_over = gpus_arg
img_ids_list = [[] for _ in pipes]

# each GPU needst to iterate from `shard_id * data_size / num_gpus` samples to `(shard_id + 1)* data_size / num_gpus`
Contributor

Typo, I think

Contributor (Author)

Done
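
As a side note, the range from the comment above can be illustrated in plain Python (shard_range is a hypothetical helper, not part of the test):

def shard_range(shard_id, num_gpus, data_size):
    # GPU `shard_id` reads samples [start, end) of the data set.
    start = shard_id * data_size // num_gpus
    end = (shard_id + 1) * data_size // num_gpus
    return start, end

print(shard_range(0, 3, 10))  # (0, 3)
print(shard_range(2, 3, 10))  # (6, 10)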

@dali-automaton (Collaborator)
CI MESSAGE: [1029894]: BUILD STARTED

@dali-automaton (Collaborator)
CI MESSAGE: [1029894]: BUILD FAILED

@dali-automaton (Collaborator)
CI MESSAGE: [1032081]: BUILD STARTED

@dali-automaton (Collaborator)
CI MESSAGE: [1032081]: BUILD FAILED

@dali-automaton (Collaborator)
CI MESSAGE: [1032771]: BUILD STARTED

@dali-automaton (Collaborator)
CI MESSAGE: [1032771]: BUILD FAILED

@dali-automaton (Collaborator)
CI MESSAGE: [1033593]: BUILD STARTED

@dali-automaton (Collaborator)
CI MESSAGE: [1033593]: BUILD FAILED

@dali-automaton (Collaborator)
CI MESSAGE: [1033642]: BUILD STARTED

@dali-automaton (Collaborator)
CI MESSAGE: [1033642]: BUILD FAILED

@dali-automaton (Collaborator)
CI MESSAGE: [1033933]: BUILD STARTED

@dali-automaton (Collaborator)
CI MESSAGE: [1033933]: BUILD PASSED

@dali-automaton (Collaborator)
CI MESSAGE: [1036979]: BUILD STARTED

@dali-automaton (Collaborator)
CI MESSAGE: [1036979]: BUILD FAILED

@JanuszL changed the title from "Update examples with COCO data set and fix tests" to "Update examples with COCO data set and fix reader behavior for padding" on Dec 17, 2019
@dali-automaton (Collaborator)
CI MESSAGE: [1039438]: BUILD STARTED

@dali-automaton (Collaborator)
CI MESSAGE: [1039438]: BUILD PASSED

with the shard size.)code", false);
R"code(If set to true, the Loader will pad the last batch with the last image when the batch size is
not aligned with the shard size. It means that the remainder of the batch or even the whole batch can be
artificially added when data set size is not equally divisible by the number of shards, and the shard is
Contributor

Suggested change
artificially added when data set size is not equally divisible by the number of shards, and the shard is
artificially added when the data set size is not equally divisible by the number of shards, and the shard is

Contributor (Author)

Done
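
To make the docstring concrete, a hedged sketch of the described behavior in plain Python (pad_batches is illustrative, not the Loader implementation):

def pad_batches(samples, batch_size):
    # Split into batches, then fill the last one by cloning its final
    # sample, as the pad_last_batch description above says.
    batches = [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]
    missing = batch_size - len(batches[-1])
    batches[-1] = batches[-1] + [batches[-1][-1]] * missing
    return batches

print(pad_batches([1, 2, 3, 4, 5], 2))  # [[1, 2], [3, 4], [5, 5]]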

@@ -9,7 +9,10 @@ DALI_EXTRA_VERSION_PATH="${DIR}/../DALI_EXTRA_VERSION"
read -r DALI_EXTRA_VERSION < ${DALI_EXTRA_VERSION_PATH}
echo "Using DALI_EXTRA_VERSION = ${DALI_EXTRA_VERSION}"
if [ ! -d "$DALI_EXTRA_PATH" ] ; then
git clone "$DALI_EXTRA_URL" "$DALI_EXTRA_PATH"
git clone https://github.com/JanuszL/DALI_extra.git "$DALI_EXTRA_PATH"
Contributor

needs to be reverted

@@ -1 +1 @@
d61722e9fa6df5379cba68941e3f94bff9814def
e05294a49bea2d0d0da516955eef0bc476c92ae2
Contributor

needs to be bumped up


size_t start_index(const size_t shard_id,
const size_t shard_num,
const size_t size) {
return size * shard_id / shard_num;
}

Index num_samples(const size_t shard_num,
const size_t size) {
return static_cast<size_t>(std::ceil(size * 1.0 / shard_num));
Contributor

Suggested change
return static_cast<size_t>(std::ceil(size * 1.0 / shard_num));
return static_cast<Index>(std::ceil(size * 1.0 / shard_num));

Contributor (Author)

Done
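
For readers following the C++ side, rough Python equivalents of the two helpers above (illustrative only):

def start_index(shard_id, shard_num, size):
    # First sample index of the shard; integer division truncates,
    # matching size * shard_id / shard_num in the C++ code.
    return size * shard_id // shard_num

def num_samples(shard_num, size):
    # Padded per-shard sample count: ceil(size / shard_num).
    return -(-size // shard_num)

print(start_index(2, 3, 10))  # 6
print(num_samples(3, 10))     # 4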

if (!loading_flag_) {
PrepareMetadata();
}
if (!pad_last_batch_) {
Contributor

nitpick: I think it'd read more naturally if you rephrased the logic as

if (pad_last_batch) {
  // .. handle special case
} else {
  return SizeImpl();
}

Contributor (Author)

Done
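
A hedged sketch of the restructured flow, with the padded branch filled in from the PR description (epoch_size, size_impl, and the ceil-based formula are assumptions, not the actual Loader code):

def epoch_size(pad_last_batch, num_shards, size, size_impl):
    if pad_last_batch:
        # Special case from the PR description: every shard is padded to
        # ceil(size / num_shards) samples.
        return -(-size // num_shards) * num_shards
    return size_impl()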

@@ -317,8 +338,9 @@ class Loader {
std::once_flag fetch_cache_;
std::shared_ptr<ImageCache> cache_;

// Counts how many samples reader have read already from this and next epoch
// Counts how many samples reader have read already from this epoch
Contributor

Suggested change
// Counts how many samples reader have read already from this epoch
// Counts how many samples the reader have read already from this epoch

Contributor (Author)

Done

Index read_sample_counter_;
Index returned_sample_counter_;
Contributor

Can you describe this member variable as well?

Contributor (Author)

Done

val = np.concatenate(pipe.outputs()[0].as_array())
yield check, data_set, num_gpus, batch_size, stick_to_shard, shuffle_after_epoch, dry_run_num, len(ref_img_ids)

def check(data_set, num_gpus, batch_size, stick_to_shard, shuffle_after_epoch, dry_run_num, len_ref_img_ids):
Contributor

Suggested change
def check(data_set, num_gpus, batch_size, stick_to_shard, shuffle_after_epoch, dry_run_num, len_ref_img_ids):
def check_shuffling_patterns(data_set, num_gpus, batch_size, stick_to_shard, shuffle_after_epoch, dry_run_num, len_ref_img_ids):

Contributor (Author)

Done

@@ -146,11 +150,14 @@ class Loader {

int samples_to_choose_from = initial_buffer_fill_;
if (shards_.front().start == shards_.front().end) {
if (!is_new_epoch && pad_last_batch_) {
if ((returned_sample_counter_ < num_samples(num_shards_, Size()) || !is_new_epoch) &&
Contributor

I don't fully understand the logic behind this. Can you add a couple of sentences as a comment here?

Contributor (Author)

Done
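
For context, the condition above transcribed into plain Python (names kept from the C++ snippet; the reading of its intent is mine, not the author's added comment): once a shard is exhausted, keep serving the cloned last sample while the padded per-shard length has not been reached, or while the epoch is still in progress.

def should_keep_padding(returned_sample_counter, num_shards, size,
                        is_new_epoch, pad_last_batch):
    # num_samples(num_shards_, Size()) == ceil(size / num_shards)
    padded_shard = -(-size // num_shards)
    return pad_last_batch and (
        returned_sample_counter < padded_shard or not is_new_epoch)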

@JanuszL (Contributor, Author) commented Jan 3, 2020

!build

@dali-automaton (Collaborator)
CI MESSAGE: [1055499]: BUILD STARTED

@dali-automaton (Collaborator)
CI MESSAGE: [1055499]: BUILD FAILED

- updates examples using the COCO data set, adjusting them to new data
  with meaningful bboxes and segmentation data
- fixes a problem with wrong cloning of the last sample in the batch
  when pad_last_batch is enabled and the reader needs to stick to the shard
- makes the reader pad the whole batch when the number of batches differs
  between shards, following the PyTorch behavior: each shard is padded to
  int(ceil(data_size/no_shards)) samples, so the padded data set size is
  int(ceil(data_size/no_shards))*no_shards
- adjusts test_operator_reader_shuffling.py to work when the readers for
  different GPUs have a different number of iterations to make - data_set_size
  not divisible by the number of requested shards

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
@JanuszL (Contributor, Author) commented Jan 3, 2020

!build

@dali-automaton (Collaborator)
CI MESSAGE: [1055995]: BUILD STARTED

@dali-automaton (Collaborator)
CI MESSAGE: [1055995]: BUILD FAILED

@dali-automaton (Collaborator)
CI MESSAGE: [1056700]: BUILD STARTED

@dali-automaton (Collaborator)
CI MESSAGE: [1056700]: BUILD PASSED

@dali-automaton (Collaborator)
CI MESSAGE: [1064194]: BUILD STARTED

@dali-automaton (Collaborator)
CI MESSAGE: [1064194]: BUILD FAILED

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
@dali-automaton (Collaborator)
CI MESSAGE: [1064216]: BUILD STARTED

@dali-automaton (Collaborator)
CI MESSAGE: [1064216]: BUILD PASSED

@JanuszL JanuszL merged commit 0d0cefa into NVIDIA:master Jan 9, 2020
@JanuszL JanuszL deleted the update_coco_ex branch January 9, 2020 12:49