Sparse global order reader: refactor merge algorithm. #3173

Merged
merged 3 commits into from May 17, 2022

KiterLuc
Contributor

While running tests with a real-world array, I noticed that the
optimization that tries to find the length of a cell slab once a cell
to be merged has been found didn't work. Using a binary search across
the whole tile resulted in more comparisons than a linear search, since
cell slabs are never that long. Also, next_cell in
ResultCoords didn't use the bitmap, which caused some issues, so
ResultCoords was split into two classes: ResultCoords and
GlobalOrderResultCoords (which has access to the bitmap). This also
made it possible to push some of the merge logic down into that class,
which greatly simplifies the merge function.
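
To illustrate why a linear scan can beat a binary search here, the following is a minimal sketch, not the TileDB implementation. It assumes cells are plain sorted integers and that a slab ends at the first cell that is not smaller than some bound; `SlabSearch`, `slab_length_linear`, and `slab_length_binary` are hypothetical names introduced only for this example. Both functions count how many cell comparisons they perform.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result type: slab length plus the number of comparisons used.
struct SlabSearch {
  std::size_t length;       // cells in [pos, pos + length) that are < bound
  std::size_t comparisons;  // cell-vs-bound comparisons performed
};

// Linear scan: walks forward from pos until a cell is not smaller than bound.
// Cost is (slab length + 1) comparisons, or slab length if the tile ends first.
SlabSearch slab_length_linear(
    const std::vector<int64_t>& cells, std::size_t pos, int64_t bound) {
  SlabSearch r{0, 0};
  for (std::size_t i = pos; i < cells.size(); ++i) {
    ++r.comparisons;
    if (!(cells[i] < bound))
      break;
    ++r.length;
  }
  return r;
}

// Binary search over the rest of the tile: cost grows with log2 of the
// remaining tile size, regardless of how short the slab actually is.
SlabSearch slab_length_binary(
    const std::vector<int64_t>& cells, std::size_t pos, int64_t bound) {
  SlabSearch r{0, 0};
  std::size_t lo = pos;
  std::size_t hi = cells.size();
  while (lo < hi) {
    const std::size_t mid = lo + (hi - lo) / 2;
    ++r.comparisons;
    if (cells[mid] < bound)
      lo = mid + 1;
    else
      hi = mid;
  }
  r.length = lo - pos;
  return r;
}
```

With a 1024-cell tile holding the values 0 to 1023, a start position of 10, and a bound of 13, both functions report a slab of 3 cells, but the linear scan needs 4 comparisons while the binary search needs roughly log2 of the remaining tile size. The gap widens as tiles grow while slabs stay short.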


TYPE: IMPROVEMENT
DESC: Sparse global order reader: refactor merge algorithm.
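
The bitmap issue described above can be pictured with a small sketch. This is an illustration under assumptions, not the actual GlobalOrderResultCoords code: `Bitmap` and `BitmapCursor` are hypothetical names, the bitmap is assumed to hold one entry per cell (non-zero meaning the cell is a result), and an empty bitmap is assumed to mean that every cell is a result.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a tile's result bitmap: one entry per cell,
// non-zero when the cell is part of the results.
using Bitmap = std::vector<uint8_t>;

// Hypothetical cursor over the cells of one tile.
struct BitmapCursor {
  const Bitmap& bitmap;  // empty is assumed to mean "all cells are results"
  uint64_t cell_num;     // number of cells in the tile
  uint64_t pos;          // current cell position

  // Advances to the next result cell, skipping cells cleared in the bitmap.
  // Returns false once the tile is exhausted.
  bool next_cell() {
    ++pos;
    if (!bitmap.empty()) {
      while (pos < cell_num && bitmap[pos] == 0)
        ++pos;
    }
    return pos < cell_num;
  }
};
```

A cursor that ignores the bitmap would stop on cells that were already filtered out, so the merge would have to re-check every cell it receives. Letting the cursor consult the bitmap keeps that check in one place.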

Contributor

@ypatia ypatia left a comment

Thanks for the refactoring and the fixes! Readability has improved a lot!

A few comments to verify my understanding/correctness and a few about adding UTs for those nice testable new classes you have added.

test/src/unit-sparse-global-order-reader.cc (outdated)
tiledb/sm/query/result_coords.h
tiledb/sm/query/result_coords.h (outdated)
tiledb/sm/query/result_coords.h
tiledb/sm/query/result_coords.h
tiledb/sm/query/result_tile.h (outdated)
std::vector<BitmapType> bitmap_;

/** Number of cells in this bitmap. */
uint64_t bitmap_result_num_;

Can we make those attributes private and access/update them through member functions, so that we can actually follow and test the behavior of that class?
The same goes for GlobalOrderResultTile.

Once this is done, we should also consider adding UTs; again, there are methods that are safer to validate with a UT than by reading.
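
As a rough idea of what this suggestion could look like, here is a minimal sketch. It is a simplified, hypothetical class, not the TileDB ResultTileWithBitmap; the name `OpaqueBitmap` and its member functions are assumptions made for the example. The two members are private and the count is kept consistent by the only function that can modify the bitmap.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for a tile bitmap with private state.
template <class BitmapType>
class OpaqueBitmap {
 public:
  // Starts with every cell marked as a result.
  explicit OpaqueBitmap(uint64_t cell_num)
      : bitmap_(cell_num, 1)
      , result_num_(cell_num) {
  }

  // Number of cells currently marked as results.
  uint64_t result_num() const {
    return result_num_;
  }

  // Whether cell i is currently marked as a result (bounds-checked).
  bool has_cell(uint64_t i) const {
    return bitmap_.at(i) != 0;
  }

  // Clears cell i and keeps the result count in sync. Clearing an
  // already-cleared cell is a no-op.
  void clear_cell(uint64_t i) {
    if (bitmap_.at(i) != 0) {
      bitmap_.at(i) = 0;
      --result_num_;
    }
  }

 private:
  std::vector<BitmapType> bitmap_;
  uint64_t result_num_;
};
```

Because callers can no longer write to the bitmap or the count directly, a unit test can exercise `clear_cell` and check that `result_num` stays consistent, which is harder to guarantee when both members are public.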

Contributor Author

@KiterLuc KiterLuc May 16, 2022

I did some of this, but for ResultTileWithBitmap the change would be too big for a patch release, so I filed the following story: https://app.shortcut.com/tiledb-inc/story/17815/make-resulttilewithbtimap-more-opaque

Contributor Author

@KiterLuc KiterLuc May 16, 2022

Also note that some unit tests are already present in unit-result-tile.cc.

tiledb/sm/query/sparse_global_order_reader.cc (outdated)
tiledb/sm/query/sparse_global_order_reader.cc (outdated)
tiledb/sm/query/sparse_global_order_reader.cc (outdated)
@KiterLuc KiterLuc force-pushed the lr/sparse-global-order-next-fix branch from 41c3a0a to 21c3d7d May 16, 2022 14:13
@KiterLuc KiterLuc force-pushed the lr/sparse-global-order-next-fix branch from 21c3d7d to 10182eb May 16, 2022 14:15
@KiterLuc KiterLuc merged commit 1b9a20e into dev May 17, 2022
@KiterLuc KiterLuc deleted the lr/sparse-global-order-next-fix branch May 17, 2022 10:20
@github-actions
Contributor

The backport to release-2.9 failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-release-2.9 release-2.9
# Navigate to the new working tree
cd .worktrees/backport-release-2.9
# Create a new branch
git switch --create backport-3173-to-release-2.9
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick --mainline 1 1b9a20e85416f7c8fd8286d2352306b652a91fda
# Push it to GitHub
git push --set-upstream origin backport-3173-to-release-2.9
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-release-2.9

Then, create a pull request where the base branch is release-2.9 and the compare/head branch is backport-3173-to-release-2.9.

KiterLuc added a commit that referenced this pull request May 17, 2022
* Sparse global order reader: refactor merge algorithm.

* Addressing feedback from @ypatia.

* Addressing feedback from @ypatia, part 2.
ihnorton pushed a commit that referenced this pull request May 17, 2022
* Sparse global order reader: refactor merge algorithm.

* Addressing feedback from ypatia.

* Addressing feedback from ypatia, part 2.