Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Thread Sanitizer #1824

Conversation

ibrhmkuru
Copy link
Contributor

@ibrhmkuru ibrhmkuru commented Dec 13, 2022

Signed-off-by: Ibrahim Kuru ibrahim.kuru@apex.ai

Pre-Review Checklist for the PR Author

  1. Code follows the coding style of CONTRIBUTING.md
  2. Tests follow the best practice for testing
  3. Changelog updated in the unreleased section including API breaking changes
  4. Branch follows the naming format (iox-123-this-is-a-branch)
  5. Commits messages are according to this guideline
  6. Update the PR title
    • Follow the same conventions as for commit messages
    • Link to the relevant issue
  7. Relevant issues are linked
  8. Add sensible notes for the reviewer
  9. All checks have passed (except task-list-completed)
  10. All touched (C/C++) source code files from iceoryx_hoofs are added to ./clang-tidy-diff-scans.txt
  11. Assign PR to reviewer

Notes for Reviewer

What's been done :

  • Thread Sanitizer added only for ubuntu-clang (not for macos, nor for gcc. They have been tried, but it takes too much time to run them)
  • Thread Sanitizer is a separate job. (separated from address-sanitizer to make it run parallel and finish faster)
  • Current warnings are suppressed through a suppression file.
  • Thread Sanitizer also calls run_tests.sh with continue-on-error. So, it will not exit immediately after first test failure, but run all the tests even it fails. This way we'll be able to see total number of thread sanitizer warnings.
  • Follow-up task for investigating the warnings is at: this issue

Checklist for the PR Reviewer

  • Commits are properly organized and messages are according to the guideline
  • Code according to our coding style and naming conventions
  • Unit tests have been written for new behavior
  • Public API changes are documented via doxygen
  • Copyright owner are updated in the changed files
  • All touched (C/C++) source code files from iceoryx_hoofs have been added to ./clang-tidy-diff-scans.txt
  • PR title describes the changes

Post-review Checklist for the PR Author

  1. All open points are addressed and tracked via issues

References

@codecov
Copy link

codecov bot commented Dec 13, 2022

Codecov Report

Merging #1824 (9f96934) into master (6765566) will decrease coverage by 0.22%.
The diff coverage is n/a.

❗ Current head 9f96934 differs from pull request most recent head f8e0f74. Consider uploading reports for the commit f8e0f74 to get more accurate results

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1824      +/-   ##
==========================================
- Coverage   75.68%   75.46%   -0.22%     
==========================================
  Files         375      375              
  Lines       14565    14594      +29     
  Branches     2067     2070       +3     
==========================================
- Hits        11024    11014      -10     
- Misses       2910     2950      +40     
+ Partials      631      630       -1     
Flag Coverage Δ
unittests 75.13% <ø> (-0.22%) ⬇️
unittests_timing 15.30% <ø> (-0.07%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
.../posix_wrapper/shared_memory_object/memory_map.cpp 33.33% <0.00%> (-5.06%) ⬇️
iceoryx_hoofs/source/cxx/requires.cpp 33.33% <0.00%> (-3.04%) ⬇️
...eoryx_posh/source/roudi/memory/memory_provider.cpp 92.47% <0.00%> (-1.98%) ⬇️
iceoryx_dust/source/posix_wrapper/named_pipe.cpp 51.53% <0.00%> (-0.58%) ⬇️
...posh/source/gateway/toml_gateway_config_parser.cpp 90.38% <0.00%> (-0.53%) ⬇️
...h/source/roudi/roudi_config_toml_file_provider.cpp 78.46% <0.00%> (-0.33%) ⬇️
iceoryx_hoofs/source/posix_wrapper/file_lock.cpp 39.58% <0.00%> (-0.15%) ⬇️
...e/iceoryx_hoofs/internal/cxx/storable_function.inl 94.87% <0.00%> (-0.13%) ⬇️
...ofs/include/iceoryx_hoofs/internal/cxx/convert.inl 82.38% <0.00%> (-0.12%) ⬇️
..._hoofs/include/iceoryx_hoofs/internal/cxx/list.inl 90.80% <0.00%> (-0.11%) ⬇️
... and 61 more

@mossmaurice mossmaurice added the tooling All iceoryx related tooling (scripts etc.) label Dec 14, 2022
@dkroenke dkroenke self-requested a review December 14, 2022 17:37
- name: Run Address Sanitizer
run: ./tools/ci/build-test-macos-with-sanitizers.sh asan
- name: Run Thread Sanitizer
run: ./tools/ci/build-test-macos-with-sanitizers.sh tsan
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ibrhmkuru The current OS for Mac is using macOS 11.7 with Clang 13 as default toolchain.

On MacOS 12 we have Clang 14 installed: https://github.com/actions/runner-images/blob/main/images/macos/macos-12-Readme.md

Would you test that on MacOS 12? You just need to change this statement above: runs-on: macos-latest to runs-on: macos-12. And let's see about the differences.

@elBoberido
Copy link
Member

@ibrhmkuru I did not yet look at the changes but would it be possible to split the asan/ubsan and tsan CI jobs? They can run in parallel and decrease the CI time. The clang based job runs extremely long on ubuntu. Have you done some research about what could be the reason.

@elfenpiff since we might end up with additional three CI jobs, do you think it would be sufficient to run only one tsan job, e.g. clang on ubuntu?

cc @dkroenke

@ibrhmkuru
Copy link
Contributor Author

ibrhmkuru commented Jan 10, 2023

@elBoberido We can split the asan and tsan jobs.

For long run time, I even saw more than 5 hours of run time (see : https://github.com/eclipse-iceoryx/iceoryx/actions/runs/3786509420) This job cancelled after time out of 5 hours expired :) This happened only for gcc; the same tests run for ~1.5 hour for clang; but 5 hours was not enough for gcc.

We realized that the test FindInMaximumMixedServices is the main cause why it run so long. This test runs multiple times and one run can be long as ~30 minutes. (see: thread_sanitizer.log )

For thread sanitizer job, we can filter this test but I don't know how important this test and if it makes sense to filter this test for thread sanitizer. If you confirm I can filter it and I can split asan and tsan jobs and run thread sanitizer only with clang.

cc @dkroenke @elfenpiff

@elBoberido
Copy link
Member

@ibrhmkuru I think splitting the jobs to asan and tsan makes sense. Same with skipping the test that causes the long run time. Regarding running only the clang based toolchain, @elfenpiff do you agree to run only clang on ubuntu for now?

cc @dkroenke

@elfenpiff
Copy link
Contributor

do you agree to run only clang on ubuntu for now?

@elBoberido yes no problem. The only restriction I would have is that it runs on the newest clang compiler.

Signed-off-by: Ibrahim Kuru <ibrahim.kuru@apex.ai>
@ibrhmkuru ibrhmkuru force-pushed the iox-692-add-thread-sanitizer-for-iceoryx branch from f40ff85 to 4b8d3c6 Compare January 12, 2023 11:42
@ibrhmkuru ibrhmkuru marked this pull request as ready for review January 12, 2023 12:03
@elBoberido elBoberido self-requested a review January 12, 2023 12:40
…le instead

Signed-off-by: Ibrahim Kuru <ibrahim.kuru@apex.ai>
@elBoberido
Copy link
Member

@ibrhmkuru you mentioned that there is one tests which takes extremely long with the thread sanitizer enabled. Would it be possible to disable that test when the thread sanitizer runs? An intermediate solution would also be to run hoofs tests only and take care of posh in a follow up PR.

Copy link
Member

@elBoberido elBoberido left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Only a minor change ... and maybe a major one regarding the long test run with tsan enabled.

@@ -65,6 +69,10 @@ for arg in "$@"; do
ASAN_ONLY=true
TEST_SCOPE="no_timing_test"
;;
"tsan-only")
CONTINUE_ON_ERROR=true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be removed? If I understood this correctly then the CI would hide detected errors, right? This should be done explicitly by the continue-on-error flag

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@elBoberido You're right. Done!

@ibrhmkuru
Copy link
Contributor Author

ibrhmkuru commented Jan 12, 2023

@ibrhmkuru you mentioned that there is one tests which takes extremely long with the thread sanitizer enabled. Would it be possible to disable that test when the thread sanitizer runs? An intermediate solution would also be to run hoofs tests only and take care of posh in a follow up PR.

@elBoberido This was in gcc. The test name is FindInMaximumServices. It's called many times (16, if I'm not wrong) in a test and each call was about ~20-30 min. So it was causing the test to timeout. But in clang, we don't have this problem; each call is just take 3-5 seconds.

In clang total run time, executing all (contains all_tests, timing_module_tests, timing_integration_tests) takes ~1.5 hour. Is that okay, or should we disable some tests ?

@elBoberido
Copy link
Member

@elBoberido This was in gcc. The test name is FindInMaximumServices. It's called many times (16, if I'm not wrong) in a test and each call was about ~20-30 min. So it was causing the test to timeout. But in clang, we don't have this problem; each call is just take 3-5 seconds.

In clang total run time, executing all (contains all_tests, timing_module_tests, timing_integration_tests) takes ~1.5 hour. Is that okay, or should we disable some tests ?

Well, this triples our CI time. I had a brief look at the tests and it seems some are quite fast, e.g. 1ms and others take 7s and more. My gut feeling is that the integration tests take quite long with the start of the RouDiEnvironment.

Okay, the TheadSanitizer test run just finished after 52 minutes but the job somehow hangs. Let's see how long it takes to fully finish.

@elBoberido
Copy link
Member

Okay, my gut feelings were wrong. It seems to be related to DEATH_TESTS or error handler calls, e.g. expected_test.AccessingErrorOfRValueExpectedWhichContainsValueLeadsToErrorHandlerCall (39339 ms) and all tests for epected are 63 tests from expected_test (500611 ms total)

Comment on lines 112 to 123
build-test-ubuntu-with-thread-sanitizer-clang-latest:
# prevent stuck jobs consuming runners for 3 hours
timeout-minutes: 180
runs-on: ubuntu-latest
needs: pre-flight-check
steps:
- name: Checkout
uses: actions/checkout@v3
- name: Install iceoryx dependencies and clang-tidy
uses: ./.github/actions/install-iceoryx-deps-and-clang
- name: Run Thread Sanitizer
run: ./tools/ci/build-test-ubuntu-with-sanitizers.sh clang tsan
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@elBoberido @ibrhmkuru As mentioned in #1824 (comment) the CI run for the threadsanitizer takes about 50 mins. This is definitely too long for the Pull-Request Pipeline.

We have the following options:

  1. The execution time can be reduced on the job
  2. Move the job to this file: https://github.com/eclipse-iceoryx/iceoryx/blob/master/.github/workflows/lint_master.yml to make it only run on master
  3. Move the job to nigthly execution: https://github.com/eclipse-iceoryx/iceoryx/blob/master/.github/workflows/nightly.yml

I think option 2 is feasible for this job, would that work?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dkroenke

  1. have a least the hoofs test run in the PR CI and the full job on master
  2. split the job into 4
    • first job builds only
    • the other three jobs run in parallel one of hoofs_moduletests, posh_moduletests or posh_integrationtests

I'm leaning towards 4 or 2 in the short term and 1 or 5 in the long term. Whichever is more feasible.

cc @ibrhmkuru

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@elBoberido

I can implement 5 :

  • first job builds only
  • the other three jobs run in parallel one of hoofs_moduletests, posh_moduletests or posh_integrationtests

As I checked, I need to use actions/upload-artifact and actions/upload-artifact to share test binaries between build job and test job.

cc @dkroenke

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ibrhmkuru would be great but since this PR is already in a good state I think we should go for option 2 in this PR and do 5 in a follow up. This has the advantage that someone can look at the warnings and check for false positives.

Btw. I did something like option 5 for iceoryx-rs

cc @dkroenke

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@elBoberido Thanks for the example for inspiration :) I've done option 2 as suggested. It's ready for review.

cc @dkroenke

@ibrhmkuru ibrhmkuru force-pushed the iox-692-add-thread-sanitizer-for-iceoryx branch from bd2ea9b to 53d8d72 Compare January 19, 2023 07:34
Copy link
Member

@dkroenke dkroenke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only a small finding, everything else looks good, thanks!

timeout-minutes: 60
runs-on: macos-latest
runs-on: macos-12
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can use here macos-latest, using version 12 was for testing purposes.
Within the next weeks the migration to MacOS 12 will be done: https://github.blog/changelog/2022-10-03-github-actions-jobs-running-on-macos-latest-are-now-running-on-macos-12/

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dkroenke Right. Done!

@ibrhmkuru
Copy link
Contributor Author

@elBoberido @dkroenke

Do I need to squash the commits, or 5 commits for this PR is okay ?

@dkroenke
Copy link
Member

@ibrhmkuru

Do I need to squash the commits, or 5 commits for this PR is okay ?

I would recommend to fixup these three commits: 71964c6, 9f96934 and 53d8d72 (the last three commits).
Commits addressing review findings are a bit redundant and should be merged with the commit that adds the code, if possible.

elBoberido
elBoberido previously approved these changes Jan 19, 2023
Copy link
Member

@elBoberido elBoberido left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Usually we do not squash to 1 commit if there are meaningful intermediate steps and the commit messages look sensible. The only thing that could be improved are the two commit messages with Address review comments. Usually we write what is done instead of referring just to review comments. Please keep this in mind for the next PR :)

…master branch

Signed-off-by: Ibrahim Kuru <ibrahim.kuru@apex.ai>
@ibrhmkuru
Copy link
Contributor Author

@elBoberido @dkroenke Thanks for information and feedback. I've squashed "Address review comments" commits.

@dkroenke dkroenke self-requested a review January 19, 2023 10:23
@dkroenke
Copy link
Member

@ibrhmkuru One small hint: When the PR is ready and you want to ping the reviewers you can re-request the review by using the small symbol near to the reviewers name with the round arrows.

@ibrhmkuru
Copy link
Contributor Author

@dkroenke Thanks! Do I need to do any additional things for merge? like assigning to marge-bot as in gitlab?

@dkroenke dkroenke merged commit 43f4eaf into eclipse-iceoryx:master Jan 19, 2023
@dkroenke dkroenke deleted the iox-692-add-thread-sanitizer-for-iceoryx branch January 19, 2023 10:52
@dkroenke
Copy link
Member

@ibrhmkuru

Thanks! Do I need to do any additional things for merge? like assigning to marge-bot as in gitlab?

No action point for you, the committer of the project merge the PR.

@elBoberido elBoberido linked an issue Jan 19, 2023 that may be closed by this pull request
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tooling All iceoryx related tooling (scripts etc.)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ThreadSanitizer for iceoryx
5 participants