Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[REVIEW] Fixing umap gtest failure under cuda 11.2. #3696

Merged

Conversation

cjnolet
Copy link
Member

@cjnolet cjnolet commented Apr 1, 2021

Closes #3406.

There's a couple things to note in this PR:

  1. There is a kernel in the UMAP gtest that computes an L1 between two different embeddings and it wasn't using atomics for the addition so we may have just not been seeing these failures until now
  2. The Python code wasn't failing because it's using random init instead of spectral. I'm going to open an issue in RAFT to investigate why the spectral init differs from run to run with the same inputs even when random seed is set.

@cjnolet cjnolet requested a review from a team as a code owner April 1, 2021 15:56
@cjnolet cjnolet added bug Something isn't working non-breaking Non-breaking change labels Apr 1, 2021
@cjnolet cjnolet added this to PR-WIP in v0.19 Release via automation Apr 1, 2021
@cjnolet
Copy link
Member Author

cjnolet commented Apr 1, 2021

This also depends on rapidsai/raft#193

@cjnolet cjnolet moved this from PR-WIP to PR-Needs review in v0.19 Release Apr 1, 2021
@cjnolet cjnolet moved this from PR-Needs review to PR-Reviewer approved in v0.19 Release Apr 1, 2021
@cjnolet cjnolet moved this from PR-Reviewer approved to PR-Needs review in v0.19 Release Apr 1, 2021
Copy link
Contributor

@viclafargue viclafargue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking into this, great fix! Forgot about atomics... sorry for this.

@github-actions github-actions bot added the CMake label Apr 1, 2021
@cjnolet cjnolet requested a review from a team as a code owner April 1, 2021 19:09
v0.19 Release automation moved this from PR-Needs review to PR-Reviewer approved Apr 1, 2021
@@ -36,7 +36,8 @@ namespace Spectral {
* @param out output array for embedding (size n*n_comonents)
*/
void fit_embedding(const raft::handle_t &handle, int *rows, int *cols,
float *vals, int nnz, int n, int n_components, float *out);
float *vals, int nnz, int n, int n_components, float *out,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wanted to leave a comment that one CI job showed a small doxygen issue, so that it doesn't block things later:

Generating docs for c/workspace/cpp/include/cuml/cluster/spectral.hpp:38: error: The following parameter of 
ML::Spectral::fit_embedding(const raft::handle_t &handle, int *rows, int *cols, float *vals, int nnz, int n, int n_components, float 
*out, unsigned long long seed=1234567) is not documented:
17:47:37   parameter 'seed' (warning treated as error, aborting now)

@dantegd dantegd added the 4 - Waiting on Author Waiting for author to respond to review label Apr 2, 2021
@cjnolet cjnolet added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 4 - Waiting on Author Waiting for author to respond to review labels Apr 2, 2021
@github-actions github-actions bot added the Cython / Python Cython or Python issue label Apr 2, 2021
@cjnolet cjnolet requested a review from a team as a code owner April 2, 2021 22:32
@JohnZed JohnZed changed the title [REVIEW] Fixing umap gtest failure under cuda 11.2. [WIP] Fixing umap gtest failure under cuda 11.2. Apr 5, 2021
@codecov-io
Copy link

Codecov Report

Merging #3696 (dbc509f) into branch-0.19 (fd9ec89) will increase coverage by 2.31%.
The diff coverage is n/a.

Impacted file tree graph

@@               Coverage Diff               @@
##           branch-0.19    #3696      +/-   ##
===============================================
+ Coverage        80.70%   83.02%   +2.31%     
===============================================
  Files              227      227              
  Lines            17615    17712      +97     
===============================================
+ Hits             14217    14706     +489     
+ Misses            3398     3006     -392     
Flag Coverage Δ
dask 45.62% <ø> (+0.63%) ⬆️
non-dask 75.11% <ø> (+2.19%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
python/cuml/__init__.py 95.58% <ø> (+0.20%) ⬆️
...cuml/_thirdparty/sklearn/preprocessing/__init__.py 100.00% <ø> (ø)
...on/cuml/_thirdparty/sklearn/preprocessing/_data.py 64.40% <ø> (+1.29%) ⬆️
...hirdparty/sklearn/preprocessing/_discretization.py 83.33% <ø> (-0.88%) ⬇️
...l/_thirdparty/sklearn/preprocessing/_imputation.py 85.54% <ø> (+22.74%) ⬆️
python/cuml/_thirdparty/sklearn/utils/extmath.py 56.89% <ø> (ø)
...cuml/_thirdparty/sklearn/utils/skl_dependencies.py 80.00% <ø> (+26.07%) ⬆️
...ython/cuml/_thirdparty/sklearn/utils/validation.py 18.41% <ø> (-4.04%) ⬇️
python/cuml/cluster/__init__.py 100.00% <ø> (ø)
python/cuml/cluster/agglomerative.pyx 96.47% <ø> (ø)
... and 181 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b45e27b...dbc509f. Read the comment docs.

@dantegd dantegd added 2 - In Progress Currenty a work in progress and removed 5 - Ready to Merge Testing and reviews complete, ready to merge labels Apr 5, 2021
@github-actions github-actions bot removed the CMake label Apr 7, 2021
@cjnolet cjnolet changed the title [WIP] Fixing umap gtest failure under cuda 11.2. [REVIEW] Fixing umap gtest failure under cuda 11.2. Apr 7, 2021
@cjnolet cjnolet added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 2 - In Progress Currenty a work in progress labels Apr 7, 2021
@cjnolet
Copy link
Member Author

cjnolet commented Apr 7, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 1838ff2 into rapidsai:branch-0.19 Apr 8, 2021
v0.19 Release automation moved this from PR-Reviewer approved to Done Apr 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge bug Something isn't working Cython / Python Cython or Python issue non-breaking Non-breaking change
Projects
No open projects
v0.19 Release
  
Done
Development

Successfully merging this pull request may close these issues.

[BUG] CUDA 11.2 libcuml++ C++ test failures EDIT: Updated with 11.2 update 2
5 participants