Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-14211: [C++][Compute] Fixing thread sanitizer problems in hash join node #11350

Closed

Conversation

michalursa
Copy link
Contributor

Fixing 3 issues:

  • one in SchemaProjectionMaps - I simplified all of the code to get rid of thread synchronization at all
  • one in TaskScheduler - added (unnecessary) mutex
  • one in HashJoinImpl - switching from shared byte vector to local bit vectors and merge (for recording if a match for a hash table row has been seen)

@michalursa
Copy link
Contributor Author

@github-actions crossbow submit -g cpp

@github-actions
Copy link

github-actions bot commented Oct 7, 2021

@github-actions
Copy link

github-actions bot commented Oct 7, 2021

Revision: 16ded42

Submitted crossbow builds: ursacomputing/crossbow @ actions-902

Task Status
test-conda-cpp Github Actions
test-conda-cpp-valgrind Azure
test-debian-11-cpp Github Actions
test-fedora-33-cpp Github Actions
test-ubuntu-18.04-cpp Github Actions
test-ubuntu-18.04-cpp-release Github Actions
test-ubuntu-18.04-cpp-static Github Actions
test-ubuntu-20.04-cpp Github Actions
test-ubuntu-20.04-cpp-14 Github Actions
test-ubuntu-20.04-cpp-17 Github Actions
test-ubuntu-20.04-cpp-thread-sanitizer Github Actions

@michalursa
Copy link
Contributor Author

@github-actions crossbow submit -g cpp

@github-actions
Copy link

github-actions bot commented Oct 7, 2021

Revision: 9682388

Submitted crossbow builds: ursacomputing/crossbow @ actions-903

Task Status
test-conda-cpp Github Actions
test-conda-cpp-valgrind Azure
test-debian-11-cpp Github Actions
test-fedora-33-cpp Github Actions
test-ubuntu-18.04-cpp Github Actions
test-ubuntu-18.04-cpp-release Github Actions
test-ubuntu-18.04-cpp-static Github Actions
test-ubuntu-20.04-cpp Github Actions
test-ubuntu-20.04-cpp-14 Github Actions
test-ubuntu-20.04-cpp-17 Github Actions
test-ubuntu-20.04-cpp-thread-sanitizer Github Actions

@pitrou pitrou requested a review from bkietz October 7, 2021 07:42
@michalursa michalursa changed the title ARROW-14197: [C++][Compute] Fixing thread sanitizer problems in hash join node ARROW-14211: [C++][Compute] Fixing thread sanitizer problems in hash join node Oct 7, 2021
@github-actions
Copy link

github-actions bot commented Oct 7, 2021

@michalursa
Copy link
Contributor Author

@github-actions crossbow submit -g cpp

@github-actions
Copy link

Revision: 679309b

Submitted crossbow builds: ursacomputing/crossbow @ actions-908

Task Status
test-conda-cpp Github Actions
test-conda-cpp-valgrind Azure
test-debian-11-cpp Github Actions
test-fedora-33-cpp Github Actions
test-ubuntu-18.04-cpp Github Actions
test-ubuntu-18.04-cpp-release Github Actions
test-ubuntu-18.04-cpp-static Github Actions
test-ubuntu-20.04-cpp Github Actions
test-ubuntu-20.04-cpp-14 Github Actions
test-ubuntu-20.04-cpp-17 Github Actions
test-ubuntu-20.04-cpp-thread-sanitizer Github Actions

@michalursa
Copy link
Contributor Author

@github-actions crossbow submit -g cpp

@github-actions
Copy link

Revision: 3c32de9

Submitted crossbow builds: ursacomputing/crossbow @ actions-910

Task Status
test-conda-cpp Github Actions
test-conda-cpp-valgrind Azure
test-debian-11-cpp Github Actions
test-fedora-33-cpp Github Actions
test-ubuntu-18.04-cpp Github Actions
test-ubuntu-18.04-cpp-release Github Actions
test-ubuntu-18.04-cpp-static Github Actions
test-ubuntu-20.04-cpp Github Actions
test-ubuntu-20.04-cpp-14 Github Actions
test-ubuntu-20.04-cpp-17 Github Actions
test-ubuntu-20.04-cpp-thread-sanitizer Github Actions

@michalursa
Copy link
Contributor Author

@github-actions crossbow submit -g cpp

@github-actions
Copy link

Revision: f1dae79

Submitted crossbow builds: ursacomputing/crossbow @ actions-914

Task Status
test-conda-cpp Github Actions
test-conda-cpp-valgrind Azure
test-debian-11-cpp Github Actions
test-fedora-33-cpp Github Actions
test-ubuntu-18.04-cpp Github Actions
test-ubuntu-18.04-cpp-release Github Actions
test-ubuntu-18.04-cpp-static Github Actions
test-ubuntu-20.04-cpp Github Actions
test-ubuntu-20.04-cpp-14 Github Actions
test-ubuntu-20.04-cpp-17 Github Actions
test-ubuntu-20.04-cpp-thread-sanitizer Github Actions

@michalursa
Copy link
Contributor Author

@github-actions crossbow submit -g cpp

@github-actions
Copy link

Revision: 1deb369

Submitted crossbow builds: ursacomputing/crossbow @ actions-915

Task Status
test-conda-cpp Github Actions
test-conda-cpp-valgrind Azure
test-debian-11-cpp Github Actions
test-fedora-33-cpp Github Actions
test-ubuntu-18.04-cpp Github Actions
test-ubuntu-18.04-cpp-release Github Actions
test-ubuntu-18.04-cpp-static Github Actions
test-ubuntu-20.04-cpp Github Actions
test-ubuntu-20.04-cpp-14 Github Actions
test-ubuntu-20.04-cpp-17 Github Actions
test-ubuntu-20.04-cpp-thread-sanitizer Github Actions

@michalursa
Copy link
Contributor Author

@github-actions crossbow submit -g cpp

@github-actions
Copy link

Revision: 3d1ce0c

Submitted crossbow builds: ursacomputing/crossbow @ actions-917

Task Status
test-conda-cpp Github Actions
test-conda-cpp-valgrind Azure
test-debian-11-cpp Github Actions
test-fedora-33-cpp Github Actions
test-ubuntu-18.04-cpp Github Actions
test-ubuntu-18.04-cpp-release Github Actions
test-ubuntu-18.04-cpp-static Github Actions
test-ubuntu-20.04-cpp Github Actions
test-ubuntu-20.04-cpp-14 Github Actions
test-ubuntu-20.04-cpp-17 Github Actions
test-ubuntu-20.04-cpp-thread-sanitizer Github Actions

@michalursa michalursa force-pushed the ARROW-14211-hash-join-tsan branch 2 times, most recently from 37993cf to c94bd41 Compare October 12, 2021 22:39
@pitrou
Copy link
Member

pitrou commented Oct 14, 2021

@michalursa Is this ready for review? Ideally we would like to ship this fix in 6.0.0, which will hopefully be cut in a few days.

@github-actions
Copy link

Revision: c94bd41

Submitted crossbow builds: ursacomputing/crossbow @ actions-952

Task Status
test-conda-cpp Github Actions
test-conda-cpp-valgrind Azure
test-debian-11-cpp Github Actions
test-fedora-33-cpp Github Actions
test-ubuntu-18.04-cpp Github Actions
test-ubuntu-18.04-cpp-release Github Actions
test-ubuntu-18.04-cpp-static Github Actions
test-ubuntu-20.04-cpp Github Actions
test-ubuntu-20.04-cpp-14 Github Actions
test-ubuntu-20.04-cpp-17 Github Actions
test-ubuntu-20.04-cpp-thread-sanitizer Github Actions

@pitrou
Copy link
Member

pitrou commented Oct 14, 2021

Also, can you rebase from master to ensure that we get the latest CI fixes?

@michalursa
Copy link
Contributor Author

@pitrou I rebased. It is ready for review.

@pitrou
Copy link
Member

pitrou commented Oct 15, 2021

@github-actions crossbow submit -g cpp

@github-actions
Copy link

Revision: 9d5bcee

Submitted crossbow builds: ursacomputing/crossbow @ actions-966

Task Status
test-conda-cpp Github Actions
test-conda-cpp-valgrind Azure
test-debian-11-cpp Github Actions
test-fedora-33-cpp Github Actions
test-ubuntu-18.04-cpp Github Actions
test-ubuntu-18.04-cpp-release Github Actions
test-ubuntu-18.04-cpp-static Github Actions
test-ubuntu-20.04-cpp Github Actions
test-ubuntu-20.04-cpp-14 Github Actions
test-ubuntu-20.04-cpp-17 Github Actions
test-ubuntu-20.04-cpp-thread-sanitizer Github Actions

@pitrou
Copy link
Member

pitrou commented Oct 15, 2021

@github-actions crossbow submit -g r

@pitrou
Copy link
Member

pitrou commented Oct 15, 2021

@bkietz Are you available to give a quick review on this? If CI is ok, it would be nice to have this fix in 6.0.0.

@github-actions
Copy link

Revision: 9d5bcee

Submitted crossbow builds: ursacomputing/crossbow @ actions-967

Task Status
conda-linux-gcc-py36-cpu-r40 Azure
conda-linux-gcc-py37-cpu-r41 Azure
conda-osx-clang-py36-r40 Azure
conda-osx-clang-py37-r41 Azure
conda-win-vs2017-py36-r40 Azure
conda-win-vs2017-py37-r41 Azure
homebrew-r-autobrew Github Actions
test-r-arrow-backwards-compatibility Github Actions
test-r-depsource-auto Azure
test-r-depsource-system Github Actions
test-r-devdocs Github Actions
test-r-gcc-11 Github Actions
test-r-install-local Github Actions
test-r-linux-as-cran Github Actions
test-r-linux-rchk Github Actions
test-r-linux-valgrind Azure
test-r-minimal-build Azure
test-r-offline-maximal Github Actions
test-r-offline-minimal Azure
test-r-rhub-debian-gcc-devel-lto-latest Azure
test-r-rhub-ubuntu-gcc-release-latest Azure
test-r-rocker-r-base-latest Azure
test-r-rstudio-r-base-3.6-bionic Azure
test-r-rstudio-r-base-3.6-centos7-devtoolset-8 Azure
test-r-rstudio-r-base-3.6-centos8 Azure
test-r-rstudio-r-base-3.6-opensuse15 Azure
test-r-rstudio-r-base-3.6-opensuse42 Azure
test-r-ubuntu-21.04 Github Actions
test-r-versions Github Actions
test-ubuntu-18.04-r-sanitizer Azure

Copy link
Member

@bkietz bkietz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one nit, LGTM

cpp/src/arrow/compute/exec/hash_join.cc Outdated Show resolved Hide resolved
@kou
Copy link
Member

kou commented Oct 18, 2021

@github-actions crossbow submit -g cpp -g r

@github-actions
Copy link

Revision: 9fd1c02

Submitted crossbow builds: ursacomputing/crossbow @ actions-994

Task Status
conda-linux-gcc-py36-cpu-r40 Azure
conda-linux-gcc-py37-cpu-r41 Azure
conda-osx-clang-py36-r40 Azure
conda-osx-clang-py37-r41 Azure
conda-win-vs2017-py36-r40 Azure
conda-win-vs2017-py37-r41 Azure
homebrew-r-autobrew Github Actions
test-conda-cpp Github Actions
test-conda-cpp-valgrind Azure
test-debian-11-cpp Github Actions
test-fedora-33-cpp Github Actions
test-r-arrow-backwards-compatibility Github Actions
test-r-depsource-auto Azure
test-r-depsource-system Github Actions
test-r-devdocs Github Actions
test-r-gcc-11 Github Actions
test-r-install-local Github Actions
test-r-linux-as-cran Github Actions
test-r-linux-rchk Github Actions
test-r-linux-valgrind Azure
test-r-minimal-build Azure
test-r-offline-maximal Github Actions
test-r-offline-minimal Azure
test-r-rhub-debian-gcc-devel-lto-latest Azure
test-r-rhub-ubuntu-gcc-release-latest Azure
test-r-rocker-r-base-latest Azure
test-r-rstudio-r-base-3.6-bionic Azure
test-r-rstudio-r-base-3.6-centos7-devtoolset-8 Azure
test-r-rstudio-r-base-3.6-centos8 Azure
test-r-rstudio-r-base-3.6-opensuse15 Azure
test-r-rstudio-r-base-3.6-opensuse42 Azure
test-r-ubuntu-21.04 Github Actions
test-r-versions Github Actions
test-ubuntu-18.04-cpp Github Actions
test-ubuntu-18.04-cpp-release Github Actions
test-ubuntu-18.04-cpp-static Github Actions
test-ubuntu-18.04-r-sanitizer Azure
test-ubuntu-20.04-cpp Github Actions
test-ubuntu-20.04-cpp-14 Github Actions
test-ubuntu-20.04-cpp-17 Github Actions
test-ubuntu-20.04-cpp-thread-sanitizer Github Actions

@kou
Copy link
Member

kou commented Oct 19, 2021

@michalursa Can we merge this?

@michalursa
Copy link
Contributor Author

@michalursa Can we merge this?

@kou Yes, I believe so. This is a relatively simple change (more precisely 3 simple fixes put together) and I don't see any errors that could be related.

@kou
Copy link
Member

kou commented Oct 19, 2021

OK.
I merge this.

@kou kou closed this in d9ef519 Oct 19, 2021
@ursabot
Copy link

ursabot commented Oct 19, 2021

Benchmark runs are scheduled for baseline = 6e1293b and contender = d9ef519. d9ef519 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Finished ⬇️0.51% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.36% ⬆️0.09%] ursa-thinkcentre-m75q
Supported benchmarks:
ursa-i9-9960x: langs = Python, R, JavaScript
ursa-thinkcentre-m75q: langs = C++, Java
ec2-t3-xlarge-us-east-2: cloud = True

lidavidm pushed a commit that referenced this pull request Oct 19, 2021
#11350 moved ThreadIndexer from `aggregate_node.cc` to `compute/exec/util.h` to allow usage from the join node and incremented its capacity, but did not remove the original implementation.

Closes #11463 from bkietz/remove-duplicated-ThreadIndexer

Authored-by: Benjamin Kietzman <bengilgit@gmail.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
ViniciusSouzaRoque pushed a commit to s1mbi0se/arrow that referenced this pull request Nov 3, 2021
apache#11350 moved ThreadIndexer from `aggregate_node.cc` to `compute/exec/util.h` to allow usage from the join node and incremented its capacity, but did not remove the original implementation.

Closes apache#11463 from bkietz/remove-duplicated-ThreadIndexer

Authored-by: Benjamin Kietzman <bengilgit@gmail.com>
Signed-off-by: David Li <li.davidm96@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants