Update examples with COCO data set and fix reader behavior for padding #1557
Conversation
CI MESSAGE: [1024363]: BUILD STARTED
CI MESSAGE: [1024363]: BUILD FAILED
CI MESSAGE: [1026525]: BUILD STARTED
CI MESSAGE: [1026525]: BUILD FAILED
CI MESSAGE: [1027232]: BUILD STARTED
CI MESSAGE: [1027232]: BUILD FAILED
iterate_over = gpus_arg
img_ids_list = [[] for _ in pipes]

# each GPU needst to iterate from `shard_id * data_size / num_gpus` samples to `(shard_id + 1)* data_size / num_gpus`
Typo, I think
Done
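The per-GPU range described in the comment above can be illustrated with a small sketch (plain Python, not DALI code; the helper name `shard_range` is made up for illustration). With integer division, the ranges tile the data set exactly, and any remainder lands in the last shard:

```python
# Sketch of the sharding comment above, assuming integer arithmetic:
# GPU `shard_id` iterates from `shard_id * data_size / num_gpus`
# up to `(shard_id + 1) * data_size / num_gpus` (exclusive).

def shard_range(shard_id, num_gpus, data_size):
    begin = shard_id * data_size // num_gpus
    end = (shard_id + 1) * data_size // num_gpus
    return begin, end

# 10 samples split across 3 GPUs -> (0, 3), (3, 6), (6, 10)
print([shard_range(i, 3, 10) for i in range(3)])
```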
CI MESSAGE: [1029894]: BUILD STARTED
CI MESSAGE: [1029894]: BUILD FAILED
CI MESSAGE: [1032081]: BUILD STARTED
CI MESSAGE: [1032081]: BUILD FAILED
CI MESSAGE: [1032771]: BUILD STARTED
CI MESSAGE: [1032771]: BUILD FAILED
CI MESSAGE: [1033593]: BUILD STARTED
CI MESSAGE: [1033593]: BUILD FAILED
CI MESSAGE: [1033642]: BUILD STARTED
CI MESSAGE: [1033642]: BUILD FAILED
CI MESSAGE: [1033933]: BUILD STARTED
CI MESSAGE: [1033933]: BUILD PASSED
CI MESSAGE: [1036979]: BUILD STARTED
CI MESSAGE: [1036979]: BUILD FAILED
CI MESSAGE: [1039438]: BUILD STARTED
CI MESSAGE: [1039438]: BUILD PASSED
with the shard size.)code", false); | ||
R"code(If set to true, the Loader will pad the last batch with the last image when the batch size is | ||
not aligned with the shard size. It means that the remainder of the batch or even the whole batch can be | ||
artificially added when data set size is not equally divisible by the number of shards, and the shard is |
Suggested change:
- artificially added when data set size is not equally divisible by the number of shards, and the shard is
+ artificially added when the data set size is not equally divisible by the number of shards, and the shard is
Done
qa/setup_dali_extra.sh (Outdated)
@@ -9,7 +9,10 @@ DALI_EXTRA_VERSION_PATH="${DIR}/../DALI_EXTRA_VERSION"
read -r DALI_EXTRA_VERSION < ${DALI_EXTRA_VERSION_PATH}
echo "Using DALI_EXTRA_VERSION = ${DALI_EXTRA_VERSION}"
if [ ! -d "$DALI_EXTRA_PATH" ] ; then
-    git clone "$DALI_EXTRA_URL" "$DALI_EXTRA_PATH"
+    git clone https://github.com/JanuszL/DALI_extra.git "$DALI_EXTRA_PATH"
needs to be reverted
DALI_EXTRA_VERSION (Outdated)
@@ -1 +1 @@
-d61722e9fa6df5379cba68941e3f94bff9814def
+e05294a49bea2d0d0da516955eef0bc476c92ae2
needs to be bumped up
size_t start_index(const size_t shard_id,
                   const size_t shard_num,
                   const size_t size) {
  return size * shard_id / shard_num;
}

Index num_samples(const size_t shard_num,
                  const size_t size) {
  return static_cast<size_t>(std::ceil(size * 1.0 / shard_num));
Suggested change:
-  return static_cast<size_t>(std::ceil(size * 1.0 / shard_num));
+  return static_cast<Index>(std::ceil(size * 1.0 / shard_num));
Done
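The two C++ helpers discussed above can be mirrored in a few lines of Python (a sketch for illustration, not DALI's API): `start_index` gives the first sample of a shard, and `num_samples` gives the padded per-shard sample count via ceiling division, which is why every shard ends up the same size when padding is on.

```python
import math

# Python mirror of the C++ start_index / num_samples helpers above.

def start_index(shard_id, shard_num, size):
    # first sample belonging to this shard (integer arithmetic, as in C++)
    return size * shard_id // shard_num

def num_samples(shard_num, size):
    # padded per-shard count: ceil(size / shard_num)
    return math.ceil(size / shard_num)

# With 10 samples and 3 shards, every shard is padded to 4 samples,
# even though the raw shard ranges cover 3, 3 and 4 samples.
print(start_index(2, 3, 10), num_samples(3, 10))
```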
if (!loading_flag_) {
  PrepareMetadata();
}
if (!pad_last_batch_) {
nitpick: I think it'd read more naturally if you rephrase the logic as
if (pad_last_batch) {
// .. handle special case
} else {
return SizeImpl();
}
Done
@@ -317,8 +338,9 @@ class Loader {
std::once_flag fetch_cache_;
std::shared_ptr<ImageCache> cache_;

-// Counts how many samples reader have read already from this and next epoch
+// Counts how many samples reader have read already from this epoch
Suggested change:
-// Counts how many samples reader have read already from this epoch
+// Counts how many samples the reader have read already from this epoch
Done
Index read_sample_counter_;
Index returned_sample_counter_;
Can you describe this member variable as well?
Done
val = np.concatenate(pipe.outputs()[0].as_array())
yield check, data_set, num_gpus, batch_size, stick_to_shard, shuffle_after_epoch, dry_run_num, len(ref_img_ids)

def check(data_set, num_gpus, batch_size, stick_to_shard, shuffle_after_epoch, dry_run_num, len_ref_img_ids):
Suggested change:
-def check(data_set, num_gpus, batch_size, stick_to_shard, shuffle_after_epoch, dry_run_num, len_ref_img_ids):
+def check_shuffling_patterns(data_set, num_gpus, batch_size, stick_to_shard, shuffle_after_epoch, dry_run_num, len_ref_img_ids):
Done
@@ -146,11 +150,14 @@ class Loader {

int samples_to_choose_from = initial_buffer_fill_;
if (shards_.front().start == shards_.front().end) {
-  if (!is_new_epoch && pad_last_batch_) {
+  if ((returned_sample_counter_ < num_samples(num_shards_, Size()) || !is_new_epoch) &&
I don't fully understand the logic behind this. Can you add a couple of sentences as a comment here?
Done
!build
CI MESSAGE: [1055499]: BUILD STARTED
CI MESSAGE: [1055499]: BUILD FAILED
- updates examples using COCO data set adjusting it to new data with meaningful bboxes and segmentation data
- fixes problem with wrong cloning of the last sample in the batch when pad_last_batch is enabled and the reader needs to stick to the shard
- makes the reader pad the whole batch when the number of batches differs between shards following the PyTorch behavior
- each shard size is calculated as int(ceil(data_size/no_shards))*no_shards
- adjust test_operator_reader_shuffling.py to work when the readers for different GPUs have a different number of iterations to make
- data_set_size not divisible by the number of requested shards

Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
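The padding rule from the commit message above can be sketched in a few lines (plain Python for illustration; `padded_epoch_size` is a made-up name, not a DALI function): with pad_last_batch the data set is treated as if it held `int(ceil(data_size / no_shards)) * no_shards` samples, so every shard contributes the same number of batches.

```python
import math

# Sketch of the shard-padding rule: pad the data set size up to the
# nearest multiple of the shard count so all shards are equally long.

def padded_epoch_size(data_size, no_shards):
    return math.ceil(data_size / no_shards) * no_shards

# 10 samples over 3 shards are padded to 12, i.e. 4 samples per shard;
# a size already divisible by the shard count is left unchanged.
print(padded_epoch_size(10, 3), padded_epoch_size(12, 3))
```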
!build
CI MESSAGE: [1055995]: BUILD STARTED
CI MESSAGE: [1055995]: BUILD FAILED
CI MESSAGE: [1056700]: BUILD STARTED
CI MESSAGE: [1056700]: BUILD PASSED
CI MESSAGE: [1064194]: BUILD STARTED
CI MESSAGE: [1064194]: BUILD FAILED
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>
CI MESSAGE: [1064216]: BUILD STARTED
CI MESSAGE: [1064216]: BUILD PASSED
Signed-off-by: Janusz Lisiecki <jlisiecki@nvidia.com>

Why do we need this PR?
What happened in this PR?
The pad_last_batch option description is updated.

JIRA TASK: [NA]