New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reader: treating empty string range as expected. #2883
Conversation
In a lot of places in the code, an empry string range means default. This leads to a lot of confusion as the user can also pass in empty string ranges and an empty string is a valid coordinate. This change removes the assumption that empty strings are default in a few functions in the dimension class and relies only on std::string_view compare. They are: - Dimension::covered<char> - Dimension::overlap<char> - Dimension::overlap_ratio<char> - Dimension::relevant_ranges<char> - Dimension::covered_vec<char> To fix the default range behavior for strings, two functions needed to be adjusted at the subarray level: compute_relevant_fragments and compute_relevant_fragment_tile_overlap. For the first one, the fix was trivial, simply default the relevant fragment bitmap to 1 for default ranges and skip the computation. For the tile overlap however, it meant that the is_default values need to be passed in through the rtree to the domain as the overlap ratio computation for all dimensions is done at that level. --- TYPE: IMPROVEMENT DESC: Reader: treating empty string range as expected.
This pull request has been linked to Shortcut Story #14611: Unordered reader missing data. |
tiledb/sm/array_schema/dimension.cc
Outdated
const auto r1_after_r2 = r1_start > mbr_end; | ||
const auto mbr_after_r1 = mbr_start > r1_end; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(seems prior to your changes)
curious naming, r1, r2, when there is essentially only 'r', and no r2 anywhere in routine I can see...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some questions/observations that came to mind while looking at the changes.
if (r1_default[d]) | ||
continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Appears this will now exclude a 'default' range from consideration of any overlap and in calculating ratio,
is that desireable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, in this case, if it's default, it means that we include everything in this dimension, which means overlap ration should be 1, which would leave ratio unaltered.
std::vector<bool> is_default(subarray.size(), false); | ||
auto tile_overlap = rtree_.get_tile_overlap(subarray, is_default); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there certainty here that the NDRange& subarray is entirely NON-default, and/or does it not matter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is certainty. This all gets called from Array::get_max_buffer_size, and there is a check in there to not allow var size dimensions. For fixed size dimensions, even if the dimension might be default, there is still values in there representing the non empty domain so we are good.
In a lot of places in the code, an empry string range means default.
This leads to a lot of confusion as the user can also pass in empty
string ranges and an empty string is a valid coordinate. This change
removes the assumption that empty strings are default in a few
functions in the dimension class and relies only on std::string_view
compare. They are:
To fix the default range behavior for strings, two functions needed to
be adjusted at the subarray level: compute_relevant_fragments and
compute_relevant_fragment_tile_overlap. For the first one, the fix was
trivial, simply default the relevant fragment bitmap to 1 for default
ranges and skip the computation. For the tile overlap however, it meant
that the is_default values need to be passed in through the rtree to
the domain as the overlap ratio computation for all dimensions is done
at that level.
TYPE: IMPROVEMENT
DESC: Reader: treating empty string range as expected.