Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add travis sanitizers tests. #3557

Merged
merged 2 commits into from Aug 19, 2018
Merged

Conversation

trivialfis
Copy link
Member

Early PR. I need to see how's travis reacting.

@trivialfis trivialfis force-pushed the sanitizers-travis branch 2 times, most recently from 385cf5a to 82c8fcc Compare August 5, 2018 22:09
@codecov-io
Copy link

codecov-io commented Aug 6, 2018

Codecov Report

Merging #3557 into master will increase coverage by 0.03%.
The diff coverage is 100%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master    #3557      +/-   ##
============================================
+ Coverage     50.03%   50.06%   +0.03%     
  Complexity      188      188              
============================================
  Files           172      172              
  Lines         13887    13924      +37     
  Branches        457      457              
============================================
+ Hits           6948     6971      +23     
- Misses         6714     6728      +14     
  Partials        225      225
Impacted Files Coverage Δ Complexity Δ
tests/cpp/objective/test_ranking_obj.cc 100% <ø> (ø) 0 <0> (ø) ⬇️
tests/cpp/objective/test_regression_obj.cc 95.86% <100%> (+0.25%) 0 <0> (ø) ⬇️
tests/cpp/common/test_column_matrix.cc 100% <100%> (ø) 0 <0> (ø) ⬇️
tests/cpp/data/test_metainfo.cc 95.83% <100%> (+0.08%) 0 <0> (ø) ⬇️
tests/cpp/objective/test_objective.cc 100% <100%> (ø) 0 <0> (ø) ⬇️
tests/cpp/predictor/test_cpu_predictor.cc 100% <100%> (ø) 0 <0> (ø) ⬇️
tests/cpp/test_learner.cc 100% <100%> (ø) 0 <0> (ø) ⬇️
tests/cpp/metric/test_metric.cc 100% <100%> (ø) 0 <0> (ø) ⬇️
tests/cpp/helpers.cc 95.83% <100%> (ø) 0 <0> (ø) ⬇️
tests/cpp/linear/test_linear.cc 100% <100%> (ø) 0 <0> (ø) ⬇️
... and 22 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 983cb0b...a62491a. Read the comment docs.

@trivialfis
Copy link
Member Author

Seems asan-0.0 comes with GCC 4.8 is not doing its job well.

@trivialfis
Copy link
Member Author

trivialfis commented Aug 8, 2018

@hcho3 @RAMitchell Just tested asan on my machine with gcc-4.9 with Guix (where has no CUDA), which has libasan-1.0. In short, it doesn't work.

To confirm the testxgboost did linked against libasan, do:

ASAN_OPTIONS=verbosity=1 ./testxgboost

I saw outputs like:

==22882==Parsed ASAN_OPTIONS: verbosity=1
==22882==AddressSanitizer: libc interceptors initialized
...

But no error is generated. So I think the problem is indeed on asan.

I attached the complete log for running ./testxgboost built by gcc-7, generated from:

export ASAN_SYMBOLIZER_PATH=$(which llvm-symbolizer)
ASAN_OPTIONS=symbolize=1 ./testxgboost > log 2>&1

You can check the errors, see if it's worthy to upgrade Travis.

log.txt

@RAMitchell
Copy link
Member

I think we should definitely add it to continuous integration. Some of these errors may be quite significant.

This thread has some possibilities for installing different packages on Travis for different parts of the build matrix: travis-ci/travis-ci#3505

Could we install gcc7 in this way?

@hcho3
Copy link
Collaborator

hcho3 commented Aug 14, 2018

@trivialfis Thanks a lot for introducing me to sanitizers. I used address sanitizer to locate a double-free error in MXNet. This is super cool!

@trivialfis
Copy link
Member Author

@hcho3 Nah. Idea of putting it in XGBoost came from @RAMitchell while I did the leg work. I guess he had enough with these bugs in XGBoost.
I am still trying to make it to Travis, lets wait and see.

@trivialfis
Copy link
Member Author

trivialfis commented Aug 16, 2018

Voilà. asan on Travis sorted out.

Other than asan, the thread sanitizer reveals loads of data races, many of them relate to openmp. Some of them might be false positive since tsan still on beta, but seem pretty scary.

Running thread sanitizer would take much longer than running address sanitizer (so long that I never cross the c_api.XGDMatrixCreateFromMat_omp test), and requires a lot more memory too. So it might not be a good fit for Travis.

@hcho3 , @RAMitchell WDYT?

@RAMitchell
Copy link
Member

Very nice! Some of the data races may actually be intentional - for example the shotgun linear models solver uses hog wild parallelism. Can you post the log for tsan?

Let's try to fix some of these leaks in this PR.

It might also be interesting to run the python tests, although they are probably too time consuming to run on Travis.

@trivialfis
Copy link
Member Author

@RAMitchell The attached tsan log is generated without CUDA. The time from shell says that it takes about 45 minutes to get it so you might want a better computer to deal with it. That make sense, shotgun did contribute about half of the warning.

tsan.log

@@ -442,14 +442,14 @@ void PrefixSum(size_t *x, size_t N) {
delete[] suma;
}

XGB_DLL int XGDMatrixCreateFromMat_omp(const bst_float* data, // NOLINT
XGB_DLL int XGDMatrixCreateFromMat_omp(const bst_float* _data, // NOLINT
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we renaming data to _data?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first time I read the code got a little bit confused by namespace data, param data and mat.page_.data. I will try to fix all leaks from tests in this PR and change these thing back after done.

@trivialfis
Copy link
Member Author

If everything looks fine then I will rebase it into two commits, one concerning Travis and another concerning fixes.

* Add gcc-7 in Travis.
* Add SANITIZER_PATH for CMake.
* Enable sanitizer tests in Travis.
* Fix all memory leaks reported by Address Sanitizer.
* tests/cpp/helpers.h/CreateDMatrix now returns raw pointer.
@trivialfis
Copy link
Member Author

Ready now.

@RAMitchell RAMitchell merged commit cf2d86a into dmlc:master Aug 19, 2018
@trivialfis trivialfis deleted the sanitizers-travis branch August 19, 2018 14:24
@lock lock bot locked as resolved and limited conversation to collaborators Nov 17, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants