cuML 23.02.00 (Date TBD)

Please see https://github.com/rapidsai/cuml/releases/tag/v23.02.00a for the latest changes to this development branch.

cuML 22.12.00 (8 Dec 2022)

🚨 Breaking Changes

Change docs theme to pydata-sphinx theme (#4985) @galipremsagar
Remove "Open In Colab" link from Estimator Intro notebook. (#4980) @bdice
Remove CumlArray.copy() (#4958) @madsbk

🐛 Bug Fixes

Remove cupy.cusparse custom serialization (#5024) @dantegd
Restore LinearRegression documentation (#5020) @viclafargue
Don't use CMake 3.25.0 as it has a FindCUDAToolkit show stopping bug (#5007) @robertmaynard
verifying cusparse wrapper revert passes CI (#4990) @cjnolet
Use rapdsi_cpm_find(COMPONENTS ) for proper component tracking (#4989) @robertmaynard
Fix integer overflow in AutoARIMA due to bool-to-int cub scan (#4971) @Nyrio
Add missing includes (#4947) @vyasr
Fix the CMake option for disabling deprecation warnings. (#4946) @vyasr
Make doctest resilient to changes in cupy reprs (#4945) @vyasr
Assign python/ sub-directory to python-codeowners (#4940) @csadorf
Fix for non-contiguous strides (#4736) @viclafargue

📖 Documentation

Change docs theme to pydata-sphinx theme (#4985) @galipremsagar
Remove "Open In Colab" link from Estimator Intro notebook. (#4980) @bdice
Updating build instructions (#4979) @cjnolet

🚀 New Features

Reenable copy_prs. (#5010) @vyasr
Add wheel builds (#5009) @vyasr
LinearRegression: add support for multiple targets (#4988) @ahendriksen
CPU/GPU interoperability POC (#4874) @viclafargue

🛠️ Improvements

Upgrade Treelite to 3.0.1 (#5018) @hcho3
fix addition of nan_euclidean_distances to public api (#5015) @mattf
Fixing raft pin to 22.12 (#5000) @cjnolet
Pin dask and distributed for release (#4999) @galipremsagar
Update dask nightly install command in CI (#4978) @galipremsagar
Improve error message for array_equal asserts. (#4973) @csadorf
Use new rapids-cmake functionality for rpath handling. (#4966) @vyasr
Impl. CumlArray.deserialize() (#4965) @madsbk
Update cuda-python dependency to 11.7.1 (#4961) @galipremsagar
Add check for nsys utility version in the nvtx_benchmarks.py script (#4959) @viclafargue
Remove CumlArray.copy() (#4958) @madsbk
Implement hypothesis-based tests for linear models (#4952) @csadorf
Switch to using rapids-cmake for gbench. (#4950) @vyasr
Remove stale labeler (#4949) @raydouglass
Fix url in python/setup.py setuptools metadata. (#4937) @csadorf
Updates to fix cuml build (#4928) @cjnolet
Documenting hdbscan module to add prediction functions (#4925) @cjnolet
Unpin dask and distributed for development (#4912) @galipremsagar
Use KMeans from Raft (#4713) @lowener
Update cuml raft header extensions (#4599) @cjnolet
Reconciling primitives moved to RAFT (#4583) @cjnolet

cuML 22.10.00 (12 Oct 2022)

🐛 Bug Fixes

Skipping some hdbscan tests when cuda version is <= 11.2. (#4916) @cjnolet
Fix HDBSCAN python namespace (#4895) @cjnolet
Cupy 11 fixes (#4889) @dantegd
Fix small fp precision failure in linear regression doctest test (#4884) @lowener
Remove unused cuDF imports (#4873) @beckernick
Update for thrust 1.17 and fixes to accommodate for cuDF Buffer refactor (#4871) @dantegd
Use rapids-cmake 22.10 best practice for RAPIDS.cmake location (#4862) @robertmaynard
Patch for nightly test&bench (#4840) @viclafargue
Fixed Large memory requirements for SimpleImputer strategy median #4794 (#4817) @erikrene
Transforms RandomForest estimators non-consecutive labels to consecutive labels where appropriate (#4780) @VamsiTallam95

📖 Documentation

Document that minimum required CMake version is now 3.23.1 (#4899) @robertmaynard
Update KMeans notebook for clarity (#4886) @beckernick

🚀 New Features

Allow cupy 11 (#4880) @galipremsagar
Add sample_weight to Coordinate Descent solver (Lasso and ElasticNet) (#4867) @lowener
Import treelite models into FIL in a different precision (#4839) @canonizer
#4783 Added nan_euclidean distance metric to pairwise_distances (#4797) @Sreekiran096
PowerTransformer, QuantileTransformer and KernelCenterer (#4755) @viclafargue
Add "median" to TargetEncoder (#4722) @daxiongshu
New Feature StratifiedKFold (#3109) @daxiongshu

🛠️ Improvements

Updating python to use pylibraft (#4887) @cjnolet
Upgrade Treelite to 3.0.0 (#4885) @hcho3
Statically link all CUDA toolkit libraries (#4881) @trxcllnt
approximate_predict function for HDBSCAN (#4872) @tarang-jain
Pin dask and distributed for release (#4859) @galipremsagar
Remove Raft deprecated headers (#4858) @lowener
Fix forward-merge conflicts (#4857) @ajschmidt8
Update the NVTX bench helper for the new nsys utility (#4826) @viclafargue
All points membership vector for HDBSCAN (#4800) @tarang-jain
TSNE and UMAP allow several distance types (#4779) @tarang-jain
Convert fp32 datasets to fp64 in ARIMA and AutoARIMA + update notebook to avoid deprecation warnings with positional parameters (#4195) @Nyrio

cuML 22.08.00 (17 Aug 2022)

🚨 Breaking Changes

Update Python build to scikit-build (#4818) @dantegd
Bump xgboost to 1.6.0 from 1.5.2 (#4777) @galipremsagar

🐛 Bug Fixes

Revert "Allow CuPy 11" (#4847) @galipremsagar
Fix RAFT_NVTX option not set (#4825) @achirkin
Fix KNN error message. (#4782) @trivialfis
Update raft pinnings in dev yml files (#4778) @galipremsagar
Bump xgboost to 1.6.0 from 1.5.2 (#4777) @galipremsagar
Fixes exception when using predict_proba on fitted Pipeline object with a ColumnTransformer step (#4774) @VamsiTallam95
Regression errors failing with mixed data type combinations (#4770) @shaswat-indian

📖 Documentation

Use common code in python docs and defer js loading (#4852) @galipremsagar
Centralize common css & js code in docs (#4844) @galipremsagar
Add ComplementNB to the documentation (#4805) @lowener
Fix forward-merge branch-22.06 to branch-22.08 (#4789) @divyegala

🚀 New Features

Update Python build to scikit-build (#4818) @dantegd
Vectorizers to accept Pandas Series as input (#4811) @shaswat-indian
Cython wrapper for v-measure (#4785) @shaswat-indian

🛠️ Improvements

Pin dask & distributed for release (#4850) @galipremsagar
Allow CuPy 11 (#4837) @jakirkham
Remove duplicate adj_to_csr implementation (#4829) @ahendriksen
Update conda environment files to UCX 1.13.0 (#4813) @pentschev
Update conda recipes to UCX 1.13.0 (#4809) @pentschev
Fix #3414: remove naive versions dbscan algorithms (#4804) @ahendriksen
Accelerate adjacency matrix to CSR conversion for DBSCAN (#4803) @ahendriksen
Pin max version of cuda-python to 11.7.0 (#4793) @Ethyling
Allow cosine distance metric in dbscan (#4776) @tarang-jain
Unpin dask & distributed for development (#4771) @galipremsagar
Clean up Thrust includes. (#4675) @bdice
Improvements in feature sampling (#4278) @vinaydes

cuML 22.06.00 (7 Jun 2022)

🐛 Bug Fixes

Fix sg benchmark build. (#4766) @trivialfis
Resolve KRR hypothesis test failure (#4761) @RAMitchell
Fix KBinsDiscretizer bin_edges_ (#4735) @viclafargue
FIX Accept small floats in RandomForest (#4717) @thomasjpfan
Remove import of scalar_broadcast_to from stemmer (#4706) @viclafargue
Replace 22.04.x with 22.06.x in yaml files (#4692) @daxiongshu
Replace cudf.logical_not with ~ (#4669) @canonizer

📖 Documentation

Fix docs builds (#4733) @ajschmidt8
Change "principals" to "principles" (#4695) @cakiki
Update pydoc and promote ColumnTransformer out of experimental (#4509) @viclafargue

🚀 New Features

float64 support in FIL functions (#4655) @canonizer
float64 support in FIL core (#4646) @canonizer
Allow "LabelEncoder" to accept cupy and numpy arrays as input. (#4620) @daxiongshu
MNMG Logistic Regression (dask-glm wrapper) (#3512) @daxiongshu

🛠️ Improvements

Pin dask & distributed for release (#4758) @galipremsagar
Simplicial set functions (#4756) @viclafargue
Upgrade Treelite to 2.4.0 (#4752) @hcho3
Simplify recipes (#4749) @Ethyling
Inference for float64 random forests using FIL (#4739) @canonizer
MNT Removes unused optim_batch_size from UMAP's docstring (#4732) @thomasjpfan
Require UCX 1.12.1+ (#4720) @jakirkham
Allow enabling raft NVTX markers when raft is installed (#4718) @achirkin
Fix identifier collision (#4716) @viclafargue
Use raft::span in TreeExplainer (#4714) @hcho3
Expose simplicial set functions (#4711) @viclafargue
Refactor tests in cuml (#4703) @galipremsagar
Use conda to build python packages during GPU tests (#4702) @Ethyling
Update pinning to allow newer CMake versions. (#4698) @vyasr
TreeExplainer extensions (#4697) @RAMitchell
Add sample_weight for Ridge (#4696) @lowener
Unpin dask & distributed for development (#4693) @galipremsagar
float64 support in treelite->FIL import and Python layer (#4690) @canonizer
Enable building static libs (#4673) @trxcllnt
Treeshap hypothesis tests (#4671) @RAMitchell
float64 support in multi-sum and child_index() (#4648) @canonizer
Add libcuml-tests package (#4635) @Ethyling
Random ball cover algorithm for 3D data (#4582) @cjnolet
Use conda compilers (#4577) @Ethyling
Build packages using mambabuild (#4542) @Ethyling

cuML 22.04.00 (6 Apr 2022)

🚨 Breaking Changes

Moving more ling prims to raft (#4567) @cjnolet
Refactor QN solver: pass parameters via a POD struct (#4511) @achirkin

🐛 Bug Fixes

Fix single-GPU build by separating multi-GPU decomposition utils from single GPU (#4645) @dantegd
RF: fix stream bug causing performance regressions (#4644) @venkywonka
XFail test_hinge_loss temporarily (#4621) @lowener
cuml now supports building non static treelite (#4598) @robertmaynard
Fix mean_squared_error with cudf series (#4584) @daxiongshu
Fix for nightly CI tests: Use CUDA_REL variable in gpu build.sh script (#4581) @dantegd
Fix the TargetEncoder when transforming dataframe/series with custom index (#4578) @daxiongshu
Removing sign from pca assertions for now. (#4559) @cjnolet
Fix compatibility of OneHotEncoder fit (#4544) @lowener
Fix worker streams in OLS-eig executing in an unsafe order (#4539) @achirkin
Remove xfail from test_hinge_loss (#4504) @Nanthini10
Fix automerge #4501 (#4502) @dantegd
Remove classmethod of SimpleImputer (#4439) @lowener

📖 Documentation

RF: Fix improper documentation in dask-RF (#4666) @venkywonka
Add doctest (#4618) @lowener
Fix document layouts in Parameters sections (#4609) @Yosshi999
Updates to consistency of MNMG PCA/TSVD solvers (docs + code consolidation) (#4556) @cjnolet

🚀 New Features

Add a dummy argument deep to TargetEncoder.get_params() (#4601) @daxiongshu
Add Complement Naive Bayes (#4595) @lowener
Add get_params() to TargetEncoder (#4588) @daxiongshu
Target Encoder with variance statistics (#4483) @daxiongshu
Interruptible execution (#4463) @achirkin
Configurable libcuml++ per algorithm (#4296) @dantegd

🛠️ Improvements

Adding some prints when hdbscan assertion fails (#4656) @cjnolet
Temporarily disable new ops-bot functionality (#4652) @ajschmidt8
Use CPMFindPackage to retrieve cumlprims_mg (#4649) @trxcllnt
Pin dask & distributed versions (#4647) @galipremsagar
Remove RAFT MM includes (#4637) @viclafargue
Add option to build RAFT artifacts statically into libcuml++ (#4633) @dantegd
Upgrade dask & distributed minimum version (#4632) @galipremsagar
Add .github/ops-bot.yaml config file (#4630) @ajschmidt8
Small fixes for certain test failures (#4628) @vinaydes
Templatizing FIL types to add float64 support (#4625) @canonizer
Fitsne as default tsne method (#4597) @lowener
Add get_feature_names to OneHotEncoder (#4596) @viclafargue
Fix OOM and cudaContext crash in C++ benchmarks (#4594) @RAMitchell
Using Pyraft and automatically cloning when raft pin changes (#4593) @cjnolet
Upgrade Treelite to 2.3.0 (#4590) @hcho3
Sphinx warnings as errors (#4585) @RAMitchell
Adding missing FAISS license (#4579) @cjnolet
Add QN solver to ElasticNet and Lasso models (#4576) @achirkin
Move remaining stats prims to raft (#4568) @cjnolet
Moving more ling prims to raft (#4567) @cjnolet
Adding libraft conda dependencies (#4564) @cjnolet
Fix RF integer overflow (#4563) @RAMitchell
Add CMake install rules for tests (#4551) @ajschmidt8
Faster GLM preprocessing by fusing kernels (#4549) @achirkin
RAFT API updates for lap, label, cluster, and spectral apis (#4548) @cjnolet
Moving cusparse wrappers to detail API in RAFT. (#4547) @cjnolet
Unpin max dask and distributed versions (#4546) @galipremsagar
Kernel density estimation (#4545) @RAMitchell
Update xgboost version in CI (#4541) @ajschmidt8
replaces ccache with sccache (#4534) @AyodeAwe
Remove RAFT memory management (2/2) (#4526) @viclafargue
Updating RAFT linalg headers (#4515) @divyegala
Refactor QN solver: pass parameters via a POD struct (#4511) @achirkin
Kernel ridge regression (#4492) @RAMitchell
QN solvers: Use different gradient norms for different for different loss functions. (#4491) @achirkin
RF: Variable binning and other minor refactoring (#4479) @venkywonka
Rewrite CD solver using more BLAS (#4446) @achirkin
Add support for sample_weights in LinearRegression (#4428) @lowener
Nightly automated benchmark (#4414) @viclafargue
Use FAISS with RMM (#4297) @viclafargue
Split C++ tests into separate binaries (#4295) @dantegd

cuML 22.02.00 (2 Feb 2022)

🚨 Breaking Changes

Move NVTX range helpers to raft (#4445) @achirkin

🐛 Bug Fixes

Always upload libcuml (#4530) @raydouglass
Fix RAFT pin to main branch (#4508) @dantegd
Pin dask & distributed (#4505) @galipremsagar
Replace use of RMM provided CUDA bindings with CUDA Python (#4499) @shwina
Dataframe Index as columns in ColumnTransformer (#4481) @viclafargue
Support compilation with Thrust 1.15 (#4469) @robertmaynard
fix minor ASAN issues in UMAPAlgo::Optimize::find_params_ab() (#4405) @yitao-li

📖 Documentation

Remove comment numerical warning (#4408) @viclafargue
Fix docstring for npermutations in PermutationExplainer (#4402) @hcho3

🚀 New Features

Combine and expose SVC's support vectors when fitting multi-class data (#4454) @NV-jpt
Accept fold index for TargetEncoder (#4453) @daxiongshu
Move NVTX range helpers to raft (#4445) @achirkin

🛠️ Improvements

Fix packages upload (#4517) @Ethyling
Testing split fused l2 knn compilation units (#4514) @cjnolet
Prepare upload scripts for Python 3.7 removal (#4500) @Ethyling
Renaming macros with their RAFT counterparts (#4496) @divyegala
Allow CuPy 10 (#4487) @jakirkham
Upgrade Treelite to 2.2.1 (#4484) @hcho3
Unpin dask and distributed (#4482) @galipremsagar
Support categorical splits in in TreeExplainer (#4473) @hcho3
Remove RAFT memory management (#4468) @viclafargue
Add missing imports tests (#4452) @Ethyling
Update CUDA 11.5 conda environment to use 22.02 pinnings. (#4450) @bdice
Support cuML / scikit-learn RF classifiers in TreeExplainer (#4447) @hcho3
Remove IncludeCategories from .clang-format (#4438) @codereport
Simplify perplexity normalization in t-SNE (#4425) @zbjornson
Unify dense and sparse tests (#4417) @levsnv
Update ucx-py version on release using rvc (#4411) @Ethyling
Universal Treelite tree walk function for FIL (#4407) @levsnv
Update to UCX-Py 0.24 (#4396) @pentschev
Using sparse public API functions from RAFT (#4389) @cjnolet
Add a warning to prefer LinearSVM over SVM(kernel='linear') (#4382) @achirkin
Hiding cusparse deprecation warnings (#4373) @cjnolet
Unify dense and sparse import in FIL (#4328) @levsnv
Integrating RAFT handle updates (#4313) @divyegala
Use RAFT template instantations for distances (#4302) @cjnolet
RF: code re-organization to enhance build parallelism (#4299) @venkywonka
Add option to build faiss and treelite shared libs, inherit common dependencies from raft (#4256) @trxcllnt

cuML 21.12.00 (9 Dec 2021)

🚨 Breaking Changes

Fix indexing of PCA to use safer types (#4255) @lowener
RF: Add Gamma and Inverse Gaussian loss criteria (#4216) @venkywonka
update RF docs (#4138) @venkywonka

🐛 Bug Fixes

Update conda recipe to have explicit libcusolver (#4392) @dantegd
Restore FIL convention of inlining code (#4366) @levsnv
Fix SVR intercept AttributeError (#4358) @lowener
Fix is_stable_build logic for CI scripts (#4350) @ajschmidt8
Temporarily disable rmm devicebuffer in array.py (#4333) @dantegd
Fix categorical test in python (#4326) @levsnv
Revert "Merge pull request #4319 from AyodeAwe/branch-21.12" (#4325) @ajschmidt8
Preserve indexing in methods when applied to DataFrame and Series objects (#4317) @dantegd
Fix potential CUDA context poison when negative (invalid) categories provided to FIL model (#4314) @levsnv
Using sparse expanded distances where possible (#4310) @cjnolet
Fix for mean_squared_error (#4287) @viclafargue
Fix for Categorical Naive Bayes sparse handling (#4277) @lowener
Throw an explicit excpetion if the input array is empty in DBSCAN.fit #4273 (#4275) @viktorkovesd
Fix KernelExplainer returning TypeError for certain input (#4272) @Nanthini10
Remove most warnings from pytest suite (#4196) @dantegd

📖 Documentation

Add experimental GPUTreeSHAP to API doc (#4398) @hcho3
Fix GLM typo on device/host pointer (#4320) @lowener
update RF docs (#4138) @venkywonka

🚀 New Features

Add GPUTreeSHAP to cuML explainer module (experimental) (#4351) @hcho3
Enable training single GPU cuML models using Dask DataFrames and Series (#4300) @ChrisJar
LinearSVM using QN solvers (#4268) @achirkin
Add support for exogenous variables to ARIMA (#4221) @Nyrio
Use opt-in shared memory carveout for FIL (#3759) @levsnv
Symbolic Regression/Classification C/C++ (#3638) @vimarsh6739

🛠️ Improvements

Fix Changelog Merge Conflicts for branch-21.12 (#4393) @ajschmidt8
Pin max dask and distributed to 2012.11.2 (#4390) @galipremsagar
Fix forward merge #4349 (#4374) @dantegd
Upgrade clang to 11.1.0 (#4372) @galipremsagar
Update clang-format version in docs; allow unanchored version string (#4365) @zbjornson
Add CUDA 11.5 developer environment (#4364) @dantegd
Fix aliasing violation in t-SNE (#4363) @zbjornson
Promote FITSNE from experimental (#4361) @lowener
Fix unnecessary f32/f64 conversions in t-SNE KL calc (#4331) @zbjornson
Update rapids-cmake version (#4330) @dantegd
rapids-cmake version update to 21.12 (#4327) @dantegd
Use compute-sanitizer instead of cuda-memcheck (#4324) @teju85
Ability to pass fp64 type to cuml benchmarks (#4323) @teju85
Split treelite fil import from forest object definition (#4306) @levsnv
update xgboost version (#4301) @msadang
Accounting for RAFT updates to matrix, stats, and random implementations in detail (#4294) @divyegala
Update cudf matrix calls for to_numpy and to_cupy (#4293) @dantegd
Update conda recipes for Enhanced Compatibility effort (#4288) @ajschmidt8
Increase parallelism from 4 to 8 jobs in CI (#4286) @dantegd
RAFT distance prims public API update (#4280) @cjnolet
Update to UCX-Py 0.23 (#4274) @pentschev
In FIL, clip blocks_per_sm to one wave instead of asserting (#4271) @levsnv
Update of "Gracefully accept 'n_jobs', a common sklearn parameter, in NearestNeighbors Estimator" (#4267) @NV-jpt
Improve numerical stability of the Kalman filter for ARIMA (#4259) @Nyrio
Fix indexing of PCA to use safer types (#4255) @lowener
Change calculation of ARIMA confidence intervals (#4248) @Nyrio
Unpin dask & distributed in CI (#4235) @galipremsagar
RF: Add Gamma and Inverse Gaussian loss criteria (#4216) @venkywonka
Exposing KL divergence in TSNE (#4208) @viclafargue
Unify template parameter dispatch for FIL inference and shared memory footprint estimation (#4013) @levsnv

cuML 21.10.00 (7 Oct 2021)

🚨 Breaking Changes

RF: python api behaviour refactor (#4207) @venkywonka
Implement vector leaf for random forest (#4191) @RAMitchell
Random forest refactoring (#4166) @RAMitchell
RF: Add Poisson deviance impurity criterion (#4156) @venkywonka
avoid paramsSolver::{n_rows,n_cols} shadowing their base class counterparts (#4130) @yitao-li
Apply modifications to account for RAFT changes (#4077) @viclafargue

🐛 Bug Fixes

Update scikit-learn version in conda dev envs to 0.24 (#4241) @dantegd
Using pinned host memory for Random Forest and DBSCAN (#4215) @divyegala
Make sure we keep the rapids-cmake and cuml cal version in sync (#4213) @robertmaynard
Add thrust_create_target to install export in CMakeLists (#4209) @dantegd
Change the error type to match sklearn. (#4198) @achirkin
Fixing remaining hdbscan bug (#4179) @cjnolet
Fix for cuDF changes to cudf.core (#4168) @dantegd
Fixing UMAP reproducibility pytest failures in 11.4 by using random init for now (#4152) @cjnolet
avoid paramsSolver::{n_rows,n_cols} shadowing their base class counterparts (#4130) @yitao-li
Use the new RAPIDS.cmake to fetch rapids-cmake (#4102) @robertmaynard

📖 Documentation

Expose train_test_split in API doc (#4234) @hcho3
Adding docs for .get_feature_names() inside TfidfVectorizer (#4226) @mayankanand007
Removing experimental flag from hdbscan description in docs (#4211) @cjnolet
updated build instructions (#4200) @shaneding
Forward-merge branch-21.08 to branch-21.10 (#4171) @jakirkham

🚀 New Features

Experimental option to build libcuml++ only with FIL (#4225) @dantegd
FIL to import categorical models from treelite (#4173) @levsnv
Add hamming, jensen-shannon, kl-divergence, correlation and russellrao distance metrics (#4155) @mdoijade
Add Categorical Naive Bayes (#4150) @lowener
FIL to infer categorical forests and generate them in C++ tests (#4092) @levsnv
Add Gaussian Naive Bayes (#4079) @lowener
ARIMA - Add support for missing observations and padding (#4058) @Nyrio

🛠️ Improvements

Pin max dask and distributed versions to 2021.09.1 (#4229) @galipremsagar
Fea/umap refine (#4228) @AjayThorve
Upgrade Treelite to 2.1.0 (#4220) @hcho3
Add option to clone RAFT even if it is in the environment (#4217) @dantegd
RF: python api behaviour refactor (#4207) @venkywonka
Pytest updates for Scikit-learn 0.24 (#4205) @dantegd
Faster glm ols-via-eigendecomposition algorithm (#4201) @achirkin
Implement vector leaf for random forest (#4191) @RAMitchell
Refactor kmeans sampling code (#4190) @Nanthini10
Gracefully accept 'n_jobs', a common sklearn parameter, in NearestNeighbors Estimator (#4178) @NV-jpt
Update with rapids cmake new features (#4175) @robertmaynard
Update to UCX-Py 0.22 (#4174) @pentschev
Random forest refactoring (#4166) @RAMitchell
Fix log level for dask tree_reduce (#4163) @lowener
Add CUDA 11.4 development environment (#4160) @dantegd
RF: Add Poisson deviance impurity criterion (#4156) @venkywonka
Split FIL infer_k into phases to speed up compilation (when a patch is applied) (#4148) @levsnv
RF node queue rewrite (#4125) @RAMitchell
Remove max version pin for dask & distributed on development branch (#4118) @galipremsagar
Correct name of a cmake function in get_spdlog.cmake (#4106) @robertmaynard
Apply modifications to account for RAFT changes (#4077) @viclafargue
Warnings are errors (#4075) @harrism
ENH Replace gpuci_conda_retry with gpuci_mamba_retry (#4065) @dillon-cullinan
Changes to NearestNeighbors to call 2d random ball cover (#4003) @cjnolet
support space in workspace (#3752) @jolorunyomi

cuML 21.08.00 (4 Aug 2021)

🚨 Breaking Changes

Remove deprecated target_weights in UMAP (#4081) @lowener
Upgrade Treelite to 2.0.0 (#4072) @hcho3
RF/DT cleanup (#4005) @venkywonka
RF: memset and batch size optimization for computing splits (#4001) @venkywonka
Remove old RF backend (#3868) @RAMitchell
Enable warp-per-tree inference in FIL for regression and binary classification (#3760) @levsnv

🐛 Bug Fixes

Disabling umap reproducibility tests for cuda 11.4 (#4128) @cjnolet
Fix for crash in RF when max_leaves parameter is specified (#4126) @vinaydes
Running umap mnmg test twice (#4112) @cjnolet
Minimal fix for SparseRandomProjection (#4100) @viclafargue
Creating copy of components in PCA transform and inverse transform (#4099) @divyegala
Fix SVM model parameter handling in case n_support=0 (#4097) @tfeher
Fix set_params for linear models (#4096) @lowener
Fix train test split pytest comparison (#4062) @dantegd
Fix fit_transform on KMeans (#4055) @lowener
Fixing -1 key access in 1nn reduce op in HDBSCAN (#4052) @divyegala
Disable installing gbench to avoid container permission issues (#4049) @dantegd
Fix double fit crash in preprocessing models (#4040) @viclafargue
Always add faiss library alias if it's missing (#4028) @trxcllnt
Fixing intermittent HBDSCAN pytest failure in CI (#4025) @divyegala
HDBSCAN bug on A100 (#4024) @divyegala
Add treelite include paths to treelite targets (#4023) @trxcllnt
Add Treelite_BINARY_DIR include to cuml++ build interface include paths (#4018) @trxcllnt
Small ARIMA-related bug fixes in Hessenberg reduction and make_arima (#4017) @Nyrio
Update setup.py (#4015) @ajschmidt8
Update treelite version in get_treelite.cmake (#4014) @ajschmidt8
Fix build with latest RAFT branch-21.08 (#4012) @trxcllnt
Skipping hdbscan pytests when gpu is a100 (#4007) @cjnolet
Using 64-bit array lengths to increase scale of pca & tsvd (#3983) @cjnolet
Fix MNMG test in Dask RF (#3964) @hcho3
Use nested include in destination of install headers to avoid docker permission issues (#3962) @dantegd
Fix automerge #3939 (#3952) @dantegd
Update UCX-Py version to 0.21 (#3950) @pentschev
Fix kernel and line info in cmake (#3941) @dantegd
Fix for multi GPU PCA compute failing bug after transform and added error handling when n_components is not passed (#3912) @akaanirban
Tolerate QN linesearch failures when it's harmless (#3791) @achirkin

📖 Documentation

Improve docstrings for silhouette score metrics. (#4026) @bdice
Update CHANGELOG.md link (#3956) @Salonijain27
Update documentation build examples to be generator agnostic (#3909) @robertmaynard
Improve FIL code readability and documentation (#3056) @levsnv

🚀 New Features

Add Multinomial and Bernoulli Naive Bayes variants (#4053) @lowener
Add weighted K-Means sampling for SHAP (#4051) @Nanthini10
Use chebyshev, canberra, hellinger and minkowski distance metrics (#3990) @mdoijade
Implement vector leaf prediction for fil. (#3917) @RAMitchell
change TargetEncoder's smooth argument from ratio to count (#3876) @daxiongshu
Enable warp-per-tree inference in FIL for regression and binary classification (#3760) @levsnv

🛠️ Improvements

Remove clang/clang-tools from conda recipe (#4109) @dantegd
Pin dask version (#4108) @galipremsagar
ANN warnings/tests updates (#4101) @viclafargue
Removing local memory operations from computeSplitKernel and other optimizations (#4083) @vinaydes
Fix libfaiss dependency to not expressly depend on conda-forge (#4082) @Ethyling
Remove deprecated target_weights in UMAP (#4081) @lowener
Upgrade Treelite to 2.0.0 (#4072) @hcho3
Optimize dtype conversion for FIL (#4070) @dantegd
Adding quick notes to HDBSCAN public API docs as to why discrepancies may occur between cpu and gpu impls. (#4061) @cjnolet
Update conda environment name for CI (#4039) @ajschmidt8
Rewrite random forest gtests (#4038) @RAMitchell
Updating Clang Version to 11.0.0 (#4029) @codereport
Raise ARIMA parameter limits from 4 to 8 (#4022) @Nyrio
Testing extract clusters in HDBSCAN (#4009) @divyegala
ARIMA - Kalman loop rewrite: single megakernel instead of host loop (#4006) @Nyrio
RF/DT cleanup (#4005) @venkywonka
Exposing condensed hierarchy through cython for easier unit-level testing (#4004) @cjnolet
Use the 21.08 branch of rapids-cmake as rmm requires it (#4002) @robertmaynard
RF: memset and batch size optimization for computing splits (#4001) @venkywonka
Reducing cluster size to number of selected clusters. Returning stability scores (#3987) @cjnolet
HDBSCAN: Lazy-loading (and caching) condensed & single-linkage tree objects (#3986) @cjnolet
Fix 21.08 forward-merge conflicts (#3982) @ajschmidt8
Update Dask/Distributed version (#3978) @pentschev
Use clang-tools on x86 only (#3969) @jakirkham
Promote trustworthiness_score to public header, add missing includes, update dependencies (#3968) @trxcllnt
Moving FAISS ANN wrapper to raft (#3963) @cjnolet
Add MG weighted k-means (#3959) @lowener
Remove unused code in UMAP. (#3931) @trivialfis
Fix automerge #3900 and correct package versions in meta packages (#3918) @dantegd
Adaptive stress tests when GPU memory capacity is insufficient (#3916) @lowener
Fix merge conflicts (#3892) @ajschmidt8
Remove old RF backend (#3868) @RAMitchell
Refactor to extract random forest objectives (#3854) @RAMitchell

cuML 21.06.00 (9 Jun 2021)

🚨 Breaking Changes

Remove Base.enable_rmm_pool method as it is no longer needed (#3875) @teju85
RF: Make experimental-backend default for regression tasks and deprecate old-backend. (#3872) @venkywonka
Deterministic UMAP with floating point rounding. (#3848) @trivialfis
Fix RF regression performance (#3845) @RAMitchell
Add feature to print forest shape in FIL upon importing (#3763) @levsnv
Remove 'seed' and 'output_type' deprecated features (#3739) @lowener

🐛 Bug Fixes

Disable UMAP deterministic test on CTK11.2 (#3942) @trivialfis
Revert #3869 (#3933) @hcho3
RF: fix the bug in pdf_to_cdf device function that causes hang when n_bins > TPB && n_bins % TPB != 0 (#3921) @venkywonka
Fix number of permutations in pytest and getting handle for cuml models (#3920) @dantegd
Fix typo in umap target_weight parameter (#3914) @lowener
correct compliation of cuml c library (#3908) @robertmaynard
Correct install path for include folder to avoid double nesting (#3901) @dantegd
Add type check for y in train_test_split (#3886) @Nanthini10
Fix for MNMG test_rf_classification_dask_fil_predict_proba (#3831) @lowener
Fix MNMG test test_rf_regression_dask_fil (#3830) @hcho3
AgglomerativeClustering support single cluster and ignore only zero distances from self-loops (#3824) @cjnolet

📖 Documentation

Small doc fixes for 21.06 release (#3936) @dantegd
Document ability to export cuML RF to predict on other machines (#3890) @hcho3

🚀 New Features

Deterministic UMAP with floating point rounding. (#3848) @trivialfis
HDBSCAN (#3821) @cjnolet
Add feature to print forest shape in FIL upon importing (#3763) @levsnv

🛠️ Improvements

Pin dask ot 2021.5.1 for 21.06 release (#3937) @dantegd
Upgrade xgboost to 1.4.2 (#3925) @dantegd
Use UCX-Py 0.20 (#3911) @jakirkham
Upgrade NCCL to 2.9.9 (#3902) @dantegd
Update conda developer environments (#3898) @viclafargue
ARIMA: pre-allocation of temporary memory to reduce latencies (#3895) @Nyrio
Condense TSNE parameters into a struct (#3884) @lowener
Update CHANGELOG.md links for calver (#3883) @ajschmidt8
Make sure __init__ is called in graph callback. (#3881) @trivialfis
Update docs build script (#3877) @ajschmidt8
Remove Base.enable_rmm_pool method as it is no longer needed (#3875) @teju85
RF: Make experimental-backend default for regression tasks and deprecate old-backend. (#3872) @venkywonka
Enable probability output from RF binary classifier (alternative implementaton) (#3869) @hcho3
CI test speed improvement (#3851) @lowener
Fix RF regression performance (#3845) @RAMitchell
Update to CMake 3.20 features, rapids-cmake and CPM (#3844) @dantegd
Support sparse input features in QN solvers and Logistic Regression (#3827) @achirkin
Trustworthiness score improvements (#3826) @viclafargue
Performance optimization of RF split kernels by removing empty cycles (#3818) @vinaydes
Correct deprecate positional args decorator for CalVer (#3784) @lowener
ColumnTransformer & FunctionTransformer (#3745) @viclafargue
Remove 'seed' and 'output_type' deprecated features (#3739) @lowener

cuML 0.19.0 (21 Apr 2021)

🚨 Breaking Changes

Use the new RF backend by default for classification (#3686) @hcho3
Deprecating quantile-per-tree and removing three previously deprecated Random Forest parameters (#3667) @vinaydes
Update predict() / predict_proba() of RF to match sklearn (#3609) @hcho3
Upgrade FAISS to 1.7.x (#3509) @viclafargue
cuML's estimator Base class for preprocessing models (#3270) @viclafargue

🐛 Bug Fixes

Fix brute force KNN distance metric issue (#3755) @viclafargue
Fix min_max_axis (#3735) @viclafargue
Fix NaN errors observed with ARIMA in CUDA 11.2 builds (#3730) @Nyrio
Fix random state generator (#3716) @viclafargue
Fixes the out of memory access issue for computeSplit kernels (#3715) @vinaydes
Fixing umap gtest failure under cuda 11.2. (#3696) @cjnolet
Fix irreproducibility issue in RF classification (#3693) @vinaydes
BUG fix BatchedLevelAlgo DtClsTest & DtRegTest failing tests (#3690) @venkywonka
Restore the functionality of RF score() (#3685) @hcho3
Use main build.sh to build docs in docs CI (#3681) @dantegd
Revert "Update conda recipes pinning of repo dependencies" (#3680) @raydouglass
Skip tests that fail on CUDA 11.2 (#3679) @dantegd
Dask KNN Cl&Re 1D labels (#3668) @viclafargue
Update conda recipes pinning of repo dependencies (#3666) @mike-wendt
OOB access in GLM SoftMax (#3642) @divyegala
SilhouetteScore C++ tests seed (#3640) @divyegala
SimpleImputer fix (#3624) @viclafargue
Silhouette Score make_monotonic for non-monotonic label set (#3619) @divyegala
Fixing support for empty rows in sparse Jaccard / Cosine (#3612) @cjnolet
Fix train_test_split with stratify option (#3611) @Nanthini10
Update predict() / predict_proba() of RF to match sklearn (#3609) @hcho3
Change dask and distributed branch to main (#3593) @dantegd
Fixes memory allocation for experimental backend and improves quantile computations (#3586) @vinaydes
Add ucx-proc package back that got lost during an auto merge conflict (#3550) @dantegd
Fix failing Hellinger gtest (#3549) @cjnolet
Directly invoke make for non-CMake docs target (#3534) @wphicks
Fix Codecov.io Coverage Upload for Branch Builds (#3524) @mdemoret-nv
Ensure global_output_type is thread-safe (#3497) @wphicks
List as input for SimpleImputer (#3489) @viclafargue

📖 Documentation

Add sparse docstring comments (#3712) @JohnZed
FIL and Dask demo (#3698) @miroenev
Deprecating quantile-per-tree and removing three previously deprecated Random Forest parameters (#3667) @vinaydes
Fixing Indentation for Docstring Generators (#3650) @mdemoret-nv
Update doc to indicate ExtraTree support (#3635) @hcho3
Update doc, now that FIL supports multi-class classification (#3634) @hcho3
Document model_type='xgboost_json' in FIL (#3633) @hcho3
Including log loss metric to the documentation website (#3617) @lowener
Update the build doc regarding the use of GCC 7.5 (#3605) @hcho3
Update One-Hot Encoder doc (#3600) @lowener
Fix documentation of KMeans (#3595) @lowener

🚀 New Features

Reduce the size of the cuml libraries (#3702) @robertmaynard
Use ninja as default CMake generator (#3664) @wphicks
Single-Linkage Hierarchical Clustering Python Wrapper (#3631) @cjnolet
Support for precomputed distance matrix in DBSCAN (#3585) @Nyrio
Adding haversine to brute force knn (#3579) @cjnolet
Support for sample_weight parameter in LogisticRegression (#3572) @viclafargue
Provide "--ccache" flag for build.sh (#3566) @wphicks
Eliminate unnecessary includes discovered by cppclean (#3564) @wphicks
Single-linkage Hierarchical Clustering C++ (#3545) @cjnolet
Expose sparse distances via semiring to Python API (#3516) @lowener
Use cmake --build in build.sh to facilitate switching build tools (#3487) @wphicks
Add cython hinge_loss (#3409) @Nanthini10
Adding CodeCov Info for Dask Tests (#3338) @mdemoret-nv
Add predict_proba() to XGBoost-style models in FIL C++ (#2894) @levsnv

🛠️ Improvements

Updating docs, readme, and umap param tests for 0.19 (#3731) @cjnolet
Locking RAFT hash for 0.19 (#3721) @cjnolet
Upgrade to Treelite 1.1.0 (#3708) @hcho3
Update to XGBoost 1.4.0rc1 (#3699) @hcho3
Use the new RF backend by default for classification (#3686) @hcho3
Update LogisticRegression documentation (#3677) @viclafargue
Preprocessing out of experimental (#3676) @viclafargue
ENH Decision Tree new backend computeSplit*Kernel histogram calculation optimization (#3674) @venkywonka
Remove check_cupy8 (#3669) @viclafargue
Use custom conda build directory for ccache integration (#3658) @dillon-cullinan
Disable three flaky tests (#3657) @hcho3
CUDA 11.2 developer environment (#3648) @dantegd
Store data frequencies in tree nodes of RF (#3647) @hcho3
Row major Gram matrices (#3639) @tfeher
Converting all Estimator Constructors to Keyword Arguments (#3636) @mdemoret-nv
Adding make_pipeline + test score with pipeline (#3632) @viclafargue
ENH Decision Tree new backend computeSplitClassificationKernel histogram calculation and occupancy optimization (#3616) @venkywonka
Revert "ENH Fix stale GHA and prevent duplicates " (#3614) @mike-wendt
ENH Fix stale GHA and prevent duplicates (#3613) @mike-wendt
KNN from RAFT (#3603) @viclafargue
Update Changelog Link (#3601) @ajschmidt8
Move SHAP explainers out of experimental (#3596) @dantegd
Fixing compatibility issue with CUDA array interface (#3594) @lowener
Remove cutlass usage in row major input for euclidean exp/unexp, cosine and L1 distance matrix (#3589) @mdoijade
Test FIL probabilities with absolute error thresholds in python (#3582) @levsnv
Removing sparse prims and fused l2 nn prim from cuml (#3578) @cjnolet
Prepare Changelog for Automation (#3570) @ajschmidt8
Print debug message if SVM convergence is poor (#3562) @tfeher
Fix merge conflicts in 3552 (#3557) @ajschmidt8
Additional distance metrics for ANN (#3533) @viclafargue
Improve warning message when QN solver reaches max_iter (#3515) @tfeher
Fix merge conflicts in 3502 (#3513) @ajschmidt8
Upgrade FAISS to 1.7.x (#3509) @viclafargue
ENH Pass ccache variables to conda recipe & use Ninja in CI (#3508) @Ethyling
Fix forward-merger conflicts in #3502 (#3506) @dantegd
Sklearn meta-estimators into namespace (#3493) @viclafargue
Add flexibility to copyright checker (#3466) @lowener
Update sparse KNN to use rmm device buffer (#3460) @lowener
Fix forward-merger conflicts in #3444 (#3455) @ajschmidt8
Replace ML::MetricType with raft::distance::DistanceType (#3389) @lowener
RF param initialization cython and C++ layer cleanup (#3358) @venkywonka
MNMG RF broadcast feature (#3349) @viclafargue
cuML's estimator Base class for preprocessing models (#3270) @viclafargue
Make _get_tags a class/static method (#3257) @dantegd
NVTX Markers for RF and RF-backend (#3014) @venkywonka

cuML 0.18.0 (24 Feb 2021)

Breaking Changes 🚨

cuml.experimental SHAP improvements (#3433) @dantegd
Enable feature sampling for the experimental backend of Random Forest (#3364) @vinaydes
re-enable cuML's copyright checker script (#3363) @teju85
Batched Silhouette Score (#3362) @divyegala
Update failing MNMG tests (#3348) @viclafargue
Rename print_summary() of Dask RF to get_summary_text(); it now returns string to the client (#3341) @hcho3
Rename dump_as_json() -> get_json(); expose it from Dask RF (#3340) @hcho3
MNMG KNN consolidation (#3307) @viclafargue
Return confusion matrix as int unless float weights are used (#3275) @lowener
Approximate Nearest Neighbors (#2780) @viclafargue

Bug Fixes 🐛

HOTFIX Add ucx-proc package back that got lost during an auto merge conflict (#3551) @dantegd
Non project-flash CI ml test 18.04 issue debugging and bugfixing (#3495) @dantegd
Temporarily xfail KBinsDiscretizer uniform tests (#3494) @wphicks
Fix illegal memory accesses when NITEMS > 1, and nrows % NITEMS != 0. (#3480) @canonizer
Update call to dask client persist (#3474) @dantegd
Adding warning for IVFPQ (#3472) @viclafargue
Fix failing sparse NN test in CI by allowing small number of index discrepancies (#3454) @cjnolet
Exempting thirdparty code from copyright checks (#3453) @lowener
Relaxing Batched SilhouetteScore Test Constraint (#3452) @divyegala
Mark kbinsdiscretizer quantile tests as xfail (#3450) @wphicks
Fixing documentation on SimpleImputer (#3447) @lowener
Skipping IVFPQ (#3429) @viclafargue
Adding tol to dask test_kmeans (#3426) @lowener
Fix memory bug for SVM with large n_rows (#3420) @tfeher
Allow linear regression for with CUDA >=11.0 (#3417) @wphicks
Fix vectorizer tests by restoring sort behavior in groupby (#3416) @JohnZed
Ensure make_classification respects output type (#3415) @wphicks
Clean Up #include Dependencies (#3402) @mdemoret-nv
Fix Nearest Neighbor Stress Test (#3401) @lowener
Fix array_equal in tests (#3400) @viclafargue
Improving Copyright Check When Not Running in CI (#3398) @mdemoret-nv
Also xfail zlib errors when downloading newsgroups data (#3393) @JohnZed
Fix for ANN memory release bug (#3391) @viclafargue
XFail Holt Winters test where statsmodels has known issues with gcc 9.3.0 (#3385) @JohnZed
FIX Update cupy to >= 7.8 and remove unused build.sh script (#3378) @dantegd
re-enable cuML's copyright checker script (#3363) @teju85
Update failing MNMG tests (#3348) @viclafargue
Rename print_summary() of Dask RF to get_summary_text(); it now returns string to the client (#3341) @hcho3
Fixing make_blobs to Respect the Global Output Type (#3339) @mdemoret-nv
Fix permutation explainer (#3332) @RAMitchell
k-means bug fix in debug build (#3321) @akkamesh
Fix for default arguments of PCA (#3320) @lowener
Provide workaround for cupy.percentile bug (#3315) @wphicks
Fix SVR unit test parameter (#3294) @tfeher
Add xfail on fetching 20newsgroup dataset (test_naive_bayes) (#3291) @lowener
Remove unused keyword in PorterStemmer code (#3289) @wphicks
Remove static specifier in DecisionTree unit test for C++14 compliance (#3281) @wphicks
Correct pure virtual declaration in manifold_inputs_t (#3279) @wphicks

Documentation 📖

Correct import path in docs for experimental preprocessing features (#3488) @wphicks
Minor doc updates for 0.18 (#3475) @JohnZed
Improve Python Docs with Default Role (#3445) @mdemoret-nv
Fixing Python Documentation Errors and Warnings (#3428) @mdemoret-nv
Remove outdated references to changelog in CONTRIBUTING.md (#3328) @wphicks
Adding highlighting to bibtex in readme (#3296) @cjnolet

New Features 🚀

Improve runtime performance of RF to Treelite conversion (#3410) @wphicks
Parallelize Treelite to FIL conversion over trees (#3396) @wphicks
Parallelize RF to Treelite conversion over trees (#3395) @wphicks
Allow saving Dask RandomForest models immediately after training (fixes #3331) (#3388) @jameslamb
genetic programming initial structures (#3387) @teju85
MNMG DBSCAN (#3382) @Nyrio
FIL to use L1 cache when input columns don't fit into shared memory (#3370) @levsnv
Enable feature sampling for the experimental backend of Random Forest (#3364) @vinaydes
Batched Silhouette Score (#3362) @divyegala
Rename dump_as_json() -> get_json(); expose it from Dask RF (#3340) @hcho3
Exposing model_selection in a similar way to scikit-learn (#3329) @ptartan21
Promote IncrementalPCA from experimental in 0.18 release (#3327) @lowener
Create labeler.yml (#3324) @jolorunyomi
Add slow high-precision mode to KNN (#3304) @wphicks
Sparse TSNE (#3293) @divyegala
Sparse Generalized SPMV (semiring) Primitive (#3146) @cjnolet
Multiclass meta estimator wrappers and multiclass SVC (#3092) @tfeher
Approximate Nearest Neighbors (#2780) @viclafargue
Add KNN parameter to t-SNE (#2592) @aleksficek

Improvements 🛠️

Update stale GHA with exemptions & new labels (#3507) @mike-wendt
Add GHA to mark issues/prs as stale/rotten (#3500) @Ethyling
Fix naive bayes inputs (#3448) @cjnolet
Prepare Changelog for Automation (#3442) @ajschmidt8
cuml.experimental SHAP improvements (#3433) @dantegd
Speed up knn tests (#3411) @JohnZed
Replacing sklearn functions with cuml in RF MNMG notebook (#3408) @lowener
Auto-label PRs based on their content (#3407) @jolorunyomi
Use stable 1.0.0 version of Treelite (#3394) @hcho3
API update to match RAFT PR #120 (#3386) @drobison00
Update linear models to use RMM memory allocation (#3365) @lowener
Updating dense pairwise distance enum names (#3352) @cjnolet
Upgrade Treelite module (#3316) @hcho3
Removed FIL node types with _t suffix (#3314) @canonizer
MNMG KNN consolidation (#3307) @viclafargue
Updating PyTests to Stay Below 4 Gb Limit (#3306) @mdemoret-nv
Refactoring: move internal FIL interface to a separate file (#3292) @canonizer
Return confusion matrix as int unless float weights are used (#3275) @lowener
018 add unfitted error pca & tests on IPCA (#3272) @lowener
Linear models predict function consolidation (#3256) @dantegd
Preparing sparse primitives for movement to RAFT (#3157) @cjnolet

cuML 0.17.0 (10 Dec 2020)

New Features

PR #3164: Expose silhouette score in Python
PR #3160: Least Angle Regression (experimental)
PR #2659: Add initial max inner product sparse knn
PR #3092: Multiclass meta estimator wrappers and multiclass SVC
PR #2836: Refactor UMAP to accept sparse inputs
PR #2894: predict_proba in FIL C++ for XGBoost-style multi-class models
PR #3126: Experimental versions of GPU accelerated Kernel and Permutation SHAP

Improvements

PR #3077: Improve runtime for test_kmeans
PR #3070: Speed up dask/test_datasets tests
PR #3075: Speed up test_linear_model tests
PR #3078: Speed up test_incremental_pca tests
PR #2902: matrix/matrix.cuh in RAFT namespacing
PR #2903: Moving linalg's gemm, gemv, transpose to RAFT namespaces
PR #2905: stats prims mean_center, sum to RAFT namespaces
PR #2904: Moving linalg basic math ops to RAFT namespaces
PR #2956: Follow cuML array conventions in ARIMA and remove redundancy
PR #3000: Pin cmake policies to cmake 3.17 version, bump project version to 0.17
PR #3083: Improving test_make_blobs testing time
PR #3223: Increase default SVM kernel cache to 2000 MiB
PR #2906: Moving linalg decomp to RAFT namespaces
PR #2988: FIL: use tree-per-class reduction for GROVE_PER_CLASS_FEW_CLASSES
PR #2996: Removing the max_depth restriction for switching to the batched backend
PR #3004: Remove Single Process Multi GPU (SPMG) code
PR #3032: FIL: Add optimization parameter blocks_per_sm that will help all but tiniest models
PR #3044: Move leftover linalg and stats to RAFT namespaces
PR #3067: Deleting prims moved to RAFT and updating header paths
PR #3074: Reducing dask coordinate descent test runtime
PR #3096: Avoid memory transfers in CSR WeakCC for DBSCAN
PR #3088: More readable and robust FIL C++ test management
PR #3052: Speeding up MNMG KNN Cl&Re testing
PR #3115: Speeding up MNMG UMAP testing
PR #3112: Speed test_array
PR #3111: Adding Cython to Code Coverage
PR #3129: Update notebooks README
PR #3002: Update flake8 Config To With Per File Settings
PR #3135: Add QuasiNewton tests
PR #3040: Improved Array Conversion with CumlArrayDescriptor and Decorators
PR #3134: Improving the Deprecation Message Formatting in Documentation
PR #3154: Adding estimator pickling demo notebooks (and docs)
PR #3151: MNMG Logistic Regression via dask-glm
PR #3113: Add tags and prefered memory order tags to estimators
PR #3137: Reorganize Pytest Config and Add Quick Run Option
PR #3144: Adding Ability to Set Arbitrary Cmake Flags in ./build.sh
PR #3155: Eliminate unnecessary warnings from random projection test
PR #3176: Add probabilistic SVM tests with various input array types
PR #3180: FIL: blocks_per_sm support in Python
PR #3186: Add gain to RF JSON dump
PR #3219: Update CI to use XGBoost 1.3.0 RCs
PR #3221: Update contributing doc for label support
PR #3177: Make Multinomial Naive Bayes inherit from ClassifierMixin and use it for score
PR #3241: Updating RAFT to latest
PR #3240: Minor doc updates
PR #3275: Return confusion matrix as int unless float weights are used

Bug Fixes

PR #3218: Specify dependency branches in conda dev environment to avoid pip resolver issue
PR #3196: Disable ascending=false path for sortColumnsPerRow
PR #3051: MNMG KNN Cl&Re fix + multiple improvements
PR #3179: Remove unused metrics.cu file
PR #3069: Prevent conversion of DataFrames to Series in preprocessing
PR #3065: Refactoring prims metrics function names from camelcase to underscore format
PR #3033: Splitting ml metrics to individual files
PR #3072: Fusing metrics and score directories in src_prims
PR #3037: Avoid logging deadlock in multi-threaded C code
PR #2983: Fix seeding of KISS99 RNG
PR #3011: Fix unused initialize_embeddings parameter in Barnes-Hut t-SNE
PR #3008: Check number of columns in check_array validator
PR #3012: Increasing learning rate for SGD log loss and invscaling pytests
PR #2950: Fix includes in UMAP
PR #3194: Fix cuDF to cuPy conversion (missing value)
PR #3021: Fix a hang in cuML RF experimental backend
PR #3039: Update RF and decision tree parameter initializations in benchmark codes
PR #3060: Speed up test suite test_fil
PR #3061: Handle C++ exception thrown from FIL predict
PR #3073: Update mathjax CDN URL for documentation
PR #3062: Bumping xgboost version to match cuml version
PR #3084: Fix artifacts in t-SNE results
PR #3086: Reverting FIL Notebook Testing
PR #3192: Enable pipeline usage for OneHotEncoder and LabelEncoder
PR #3114: Fixed a typo in SVC's predict_proba AttributeError
PR #3117: Fix two crashes in experimental RF backend
PR #3119: Fix memset args for benchmark
PR #3130: Return Python string from dump_as_json() of RF
PR #3132: Add min_samples_split + Rename min_rows_per_node -> min_samples_leaf
PR #3136: Fix stochastic gradient descent example
PR #3152: Fix access to attributes of individual NB objects in dask NB
PR #3156: Force local conda artifact install
PR #3162: Removing accidentally checked in debug file
PR #3191: Fix repr function for preprocessing models
PR #3175: Fix gtest pinned cmake version for build from source option
PR #3182: Fix a bug in MSE metric calculation
PR #3187: Update docstring to document behavior of bootstrap=False
PR #3215: Add a missing __syncthreads()
PR #3246: Fix MNMG KNN doc (adding batch_size)
PR #3185: Add documentation for Distributed TFIDF Transformer
PR #3190: Fix Attribute error on ICPA #3183 and PCA input type
PR #3208: Fix EXITCODE override in notebook test script
PR #3250: Fixing label binarizer bug with multiple partitions
PR #3214: Correct flaky silhouette score test by setting atol
PR #3216: Ignore splits that do not satisfy constraints
PR #3239: Fix intermittent dask random forest failure
PR #3243: Avoid unnecessary split for degenerate case where all labels are identical
PR #3245: Rename rows_sample -> max_samples to be consistent with sklearn's RF
PR #3282: Add secondary test to kernel explainer pytests for stability in Volta

cuML 0.16.0 (23 Oct 2020)

New Features

PR #2922: Install RAFT headers with cuML
PR #2909: Update allgatherv for compatibility with latest RAFT
PR #2677: Ability to export RF trees as JSON
PR #2698: Distributed TF-IDF transformer
PR #2476: Porter Stemmer
PR #2789: Dask LabelEncoder
PR #2152: add FIL C++ benchmark
PR #2638: Improve cython build with custom build_ext
PR #2866: Support XGBoost-style multiclass models (gradient boosted decision trees) in FIL C++
PR #2874: Issue warning for degraded accuracy with float64 models in Treelite
PR #2881: Introduces experimental batched backend for random forest
PR #2916: Add SKLearn multi-class GBDT model support in FIL

Improvements

PR #2947: Add more warnings for accuracy degradation with 64-bit models
PR #2873: Remove empty marker kernel code for NVTX markers
PR #2796: Remove tokens of length 1 by default for text vectorizers
PR #2741: Use rapids build packages in conda environments
PR #2735: Update seed to random_state in random forest and associated tests
PR #2739: Use cusparse_wrappers.h from RAFT
PR #2729: Replace cupy.sparse with cupyx.scipy.sparse
PR #2749: Correct docs for python version used in cuml_dev conda environment
PR #2747: Adopting raft::handle_t and raft::comms::comms_t in cuML
PR #2762: Fix broken links and provide minor edits to docs
PR #2723: Support and enable convert_dtype in estimator predict
PR #2758: Match sklearn's default n_components behavior for PCA
PR #2770: Fix doxygen version during cmake
PR #2766: Update default RandomForestRegressor score function to use r2
PR #2775: Enablinbg mg gtests w/ raft mpi comms
PR #2783: Add pytest that will fail when GPU IDs in Dask cluster are not unique
PR #2784: Add SparseCumlArray container for sparse index/data arrays
PR #2785: Add in cuML-specific dev conda dependencies
PR #2778: Add README for FIL
PR #2799: Reenable lightgbm test with lower (1%) proba accuracy
PR #2800: Align cuML's spdlog version with RMM's
PR #2824: Make data conversions warnings be debug level
PR #2835: Rng prims, utils, and dependencies in RAFT
PR #2541: Improve Documentation Examples and Source Linking
PR #2837: Make the FIL node reorder loop more obvious
PR #2849: make num_classes significant in FLOAT_SCALAR case
PR #2792: Project flash (new build process) script changes
PR #2850: Clean up unused params in paramsPCA
PR #2871: Add timing function to utils
PR #2863: in FIL, rename leaf_value_t enums to more descriptive
PR #2867: improve stability of FIL benchmark measurements
PR #2798: Add python tests for FIL multiclass classification of lightgbm models
PR #2892: Update ci/local/README.md
PR #2910: Adding Support for CuPy 8.x
PR #2914: Add tests for XGBoost multi-class models in FIL
PR #2622: Simplify tSNE perplexity search
PR #2930: Pin libfaiss to <=1.6.3
PR #2928: Updating Estimators Derived from Base for Consistency
PR #2942: Adding cuml.experimental to the Docs
PR #3010: Improve gpuCI Scripts
PR #3141: Move DistanceType enum to RAFT

Bug Fixes

PR #2973: Allow data imputation for nan values
PR #2982: Adjust kneighbors classifier test threshold to avoid intermittent failure
PR #2885: Changing test target for NVTX wrapper test
PR #2882: Allow import on machines without GPUs
PR #2875: Bug fix to enable colorful NVTX markers
PR #2744: Supporting larger number of classes in KNeighborsClassifier
PR #2769: Remove outdated doxygen options for 1.8.20
PR #2787: Skip lightgbm test for version 3 and above temporarily
PR #2805: Retain index in stratified splitting for dataframes
PR #2781: Use Python print to correctly redirect spdlogs when sys.stdout is changed
PR #2787: Skip lightgbm test for version 3 and above temporarily
PR #2813: Fix memory access in generation of non-row-major random blobs
PR #2810: Update Rf MNMG threshold to prevent sporadic test failure
PR #2808: Relax Doxygen version required in CMake to coincide with integration repo
PR #2818: Fix parsing of singlegpu option in build command
PR #2827: Force use of whole dataset when sample bootstrapping is disabled
PR #2829: Fixing description for labels in docs and removing row number constraint from PCA xform/inverse_xform
PR #2832: Updating stress tests that fail with OOM
PR #2831: Removing repeated capture and parameter in lambda function
PR #2847: Workaround for TSNE lockup, change caching preference.
PR #2842: KNN index preprocessors were using incorrect n_samples
PR #2848: Fix typo in Python docstring for UMAP
PR #2856: Fix LabelEncoder for filtered input
PR #2855: Updates for RMM being header only
PR #2844: Fix for OPG KNN Classifier & Regressor
PR #2880: Fix bugs in Auto-ARIMA when s==None
PR #2877: TSNE exception for n_components > 2
PR #2879: Update unit test for LabelEncoder on filtered input
PR #2932: Marking KBinsDiscretizer pytests as xfail
PR #2925: Fixing Owner Bug When Slicing CumlArray Objects
PR #2931: Fix notebook error handling in gpuCI
PR #2941: Fixing dask tsvd stress test failure
PR #2943: Remove unused shuffle_features parameter
PR #2940: Correcting labels meta dtype for cuml.dask.make_classification
PR #2965: Notebooks update
PR #2955: Fix for conftest for singlegpu build
PR #2968: Remove shuffle_features from RF param names
PR #2957: Fix ols test size for stability
PR #2972: Upgrade Treelite to 0.93
PR #2981: Prevent unguarded import of sklearn in SVC
PR #2984: Fix GPU test scripts gcov error
PR #2990: Reduce MNMG kneighbors regressor test threshold
PR #2997: Changing ARIMA get/set_params to get/set_fit_params

cuML 0.15.0 (26 Aug 2020)

New Features

PR #2581: Added model persistence via joblib in each section of estimator_intro.ipynb
PR #2554: Hashing Vectorizer and general vectorizer improvements
PR #2240: Making Dask models pickleable
PR #2267: CountVectorizer estimator
PR #2261: Exposing new FAISS metrics through Python API
PR #2287: Single-GPU TfidfTransformer implementation
PR #2289: QR SVD solver for MNMG PCA
PR #2312: column-major support for make_blobs
PR #2172: Initial support for auto-ARIMA
PR #2394: Adding cosine & correlation distance for KNN
PR #2392: PCA can accept sparse inputs, and sparse prim for computing covariance
PR #2465: Support pandas 1.0+
PR #2550: Single GPU Target Encoder
PR #2519: Precision recall curve using cupy
PR #2500: Replace UMAP functionality dependency on nvgraph with RAFT Spectral Clustering
PR #2502: cuML Implementation of sklearn.metrics.pairwise_distances
PR #2520: TfidfVectorizer estimator
PR #2211: MNMG KNN Classifier & Regressor
PR #2461: Add KNN Sparse Output Functionality
PR #2615: Incremental PCA
PR #2594: Confidence intervals for ARIMA forecasts
PR #2607: Add support for probability estimates in SVC
PR #2618: SVM class and sample weights
PR #2635: Decorator to generate docstrings with autodetection of parameters
PR #2270: Multi class MNMG RF
PR #2661: CUDA-11 support for single-gpu code
PR #2322: Sparse FIL forests with 8-byte nodes
PR #2675: Update conda recipes to support CUDA 11
PR #2645: Add experimental, sklearn-based preprocessing

Improvements

PR #2336: Eliminate rmm.device_array usage
PR #2262: Using fully shared PartDescriptor in MNMG decomposiition, linear models, and solvers
PR #2310: Pinning ucx-py to 0.14 to make 0.15 CI pass
PR #1945: enable clang tidy
PR #2339: umap performance improvements
PR #2308: Using fixture for Dask client to eliminate possiblity of not closing
PR #2345: make C++ logger level definition to be the same as python layer
PR #2329: Add short commit hash to conda package name
PR #2362: Implement binary/multi-classification log loss with cupy
PR #2363: Update threshold and make other changes for stress tests
PR #2371: Updating MBSGD tests to use larger batches
PR #2380: Pinning libcumlprims version to ease future updates
PR #2405: Remove references to deprecated RMM headers.
PR #2340: Import ARIMA in the root init file and fix the test_fit_function test
PR #2408: Install meta packages for dependencies
PR #2417: Move doc customization scripts to Jenkins
PR #2427: Moving MNMG decomposition to cuml
PR #2433: Add libcumlprims_mg to CMake
PR #2420: Add and set convert_dtype default to True in estimator fit methods
PR #2411: Refactor Mixin classes and use in classifier/regressor estimators
PR #2442: fix setting RAFT_DIR from the RAFT_PATH env var
PR #2469: Updating KNN c-api to document all arguments
PR #2453: Add CumlArray to API doc
PR #2440: Use Treelite Conda package
PR #2403: Support for input and output type consistency in logistic regression predict_proba
PR #2473: Add metrics.roc_auc_score to API docs. Additional readability and minor docs bug fixes
PR #2468: Add _n_features_in_ attribute to all single GPU estimators that implement fit
PR #2489: Removing explicit FAISS build and adding dependency on libfaiss conda package
PR #2480: Moving MNMG glm and solvers to cuml
PR #2490: Moving MNMG KMeans to cuml
PR #2483: Moving MNMG KNN to cuml
PR #2492: Adding additional assertions to mnmg nearest neighbors pytests
PR #2439: Update dask RF code to have print_detailed function
PR #2431: Match output of classifier predict with target dtype
PR #2237: Refactor RF cython code
PR #2513: Fixing LGTM Analysis Issues
PR #2099: Raise an error when float64 data is used with dask RF
PR #2522: Renaming a few arguments in KNeighbors* to be more readable
PR #2499: Provide access to cuml.DBSCAN core samples
PR #2526: Removing PCA TSQR as a solver due to scalability issues
PR #2536: Update conda upload versions for new supported CUDA/Python
PR #2538: Remove Protobuf dependency
PR #2553: Test pickle protocol 5 support
PR #2570: Accepting single df or array input in train_test_split
PR #2566: Remove deprecated cuDF from_gpu_matrix calls
PR #2583: findpackage.cmake.in template for cmake dependencies
PR #2577: Fully removing NVGraph dependency for CUDA 11 compatibility
PR #2575: Speed up TfidfTransformer
PR #2584: Removing dependency on sklearn's NotFittedError
PR #2591: Generate benchmark datsets using cuml.datasets
PR #2548: Fix limitation on number of rows usable with tSNE and refactor memory allocation
PR #2589: including cuda-11 build fixes into raft
PR #2599: Add Stratified train_test_split
PR #2487: Set classes_ attribute during classifier fit
PR #2605: Reduce memory usage in tSNE
PR #2611: Adding building doxygen docs to gpu ci
PR #2631: Enabling use of gtest conda package for build
PR #2623: Fixing kmeans score() API to be compatible with Scikit-learn
PR #2629: Add naive_bayes api docs
PR #2643: 'dense' and 'sparse' values of storage_type for FIL
PR #2691: Generic Base class attribute setter
PR #2666: Update MBSGD documentation to mention that the model is experimental
PR #2687: Update xgboost version to 1.2.0dev.rapidsai0.15
PR #2684: CUDA 11 conda development environment yml and faiss patch
PR #2648: Replace CNMeM with rmm::mr::pool_memory_resource.
PR #2686: Improve SVM tests
PR #2692: Changin LBFGS log level
PR #2705: Add sum operator and base operator overloader functions to cumlarray
PR #2701: Updating README + Adding ref to UMAP paper
PR #2721: Update API docs
PR #2730: Unpin cumlprims in conda recipes for release

Bug Fixes

PR #2369: Update RF code to fix set_params memory leak
PR #2364: Fix for random projection
PR #2373: Use Treelite Pip package in GPU testing
PR #2376: Update documentation Links
PR #2407: fixed batch count in DBScan for integer overflow case
PR #2413: CumlArray and related methods updates to account for cuDF.Buffer contiguity update
PR #2424: --singlegpu flag fix on build.sh script
PR #2432: Using correct algo_name for UMAP in benchmark tests
PR #2445: Restore access to coef_ property of Lasso
PR #2441: Change p2p_enabled definition to work without ucx
PR #2447: Drop nvstrings
PR #2450: Update local build to use new gpuCI image
PR #2454: Mark RF memleak test as XFAIL, because we can't detect memleak reliably
PR #2455: Use correct field to store data type in LabelEncoder.fit_transform
PR #2475: Fix typo in build.sh
PR #2496: Fixing indentation for simulate_data in test_fil.py
PR #2494: Set QN regularization strength consistent with scikit-learn
PR #2486: Fix cupy input to kmeans init
PR #2497: Changes to accomodate cuDF unsigned categorical changes
PR #2209: Fix FIL benchmark for gpuarray-c input
PR #2507: Import treelite.sklearn
PR #2521: Fixing invalid smem calculation in KNeighborsCLassifier
PR #2515: Increase tolerance for LogisticRegression test
PR #2532: Updating doxygen in new MG headers
PR #2521: Fixing invalid smem calculation in KNeighborsCLassifier
PR #2515: Increase tolerance for LogisticRegression test
PR #2545: Fix documentation of n_iter_without_progress in tSNE Python bindings
PR #2543: Improve numerical stability of QN solver
PR #2544: Fix Barnes-Hut tSNE not using specified post_learning_rate
PR #2558: Disabled a long-running FIL test
PR #2540: Update default value for n_epochs in UMAP to match documentation & sklearn API
PR #2535: Fix issue with incorrect docker image being used in local build script
PR #2542: Fix small memory leak in TSNE
PR #2552: Fixed the length argument of updateDevice calls in RF test
PR #2565: Fix cell allocation code to avoid loops in quad-tree. Prevent NaNs causing infinite descent
PR #2563: Update scipy call for arima gradient test
PR #2569: Fix for cuDF update
PR #2508: Use keyword parameters in sklearn.datasets.make_* functions
PR #2587: Attributes for estimators relying on solvers
PR #2586: Fix SVC decision function data type
PR #2573: Considering managed memory as device type on checking for KMeans
PR #2574: Fixing include path in tsvd_mg.pyx
PR #2506: Fix usage of CumlArray attributes on cuml.common.base.Base
PR #2593: Fix inconsistency in train_test_split
PR #2609: Fix small doxygen issues
PR #2610: Remove cuDF tolist call
PR #2613: Removing thresholds from kmeans score tests (SG+MG)
PR #2616: Small test code fix for pandas dtype tests
PR #2617: Fix floating point precision error in tSNE
PR #2625: Update Estimator notebook to resolve errors
PR #2634: singlegpu build option fixes
PR #2641: [Breaking] Make max_depth in RF compatible with scikit-learn
PR #2650: Make max_depth behave consistently for max_depth > 14
PR #2651: AutoARIMA Python bug fix
PR #2654: Fix for vectorizer concatenations
PR #2655: Fix C++ RF predict function access of rows/samples array
PR #2649: Cleanup sphinx doc warnings for 0.15
PR #2668: Order conversion improvements to account for cupy behavior changes
PR #2669: Revert PR 2655 Revert "Fixes C++ RF predict function"
PR #2683: Fix incorrect "Bad CumlArray Use" error messages on test failures
PR #2695: Fix debug build issue due to incorrect host/device method setup
PR #2709: Fixing OneHotEncoder Overflow Error
PR #2710: Fix SVC doc statement about predic_proba
PR #2726: Return correct output type in QN
PR #2711: Fix Dask RF failure intermittently
PR #2718: Fix temp directory for py.test
PR #2719: Set KNeighborsRegressor output dtype according to training target dtype
PR #2720: Updates to outdated links
PR #2722: Getting cuML covariance test passing w/ Cupy 7.8 & CUDA 11

cuML 0.14.0 (03 Jun 2020)

New Features

PR #1994: Support for distributed OneHotEncoder
PR #1892: One hot encoder implementation with cupy
PR #1655: Adds python bindings for homogeneity score
PR #1704: Adds python bindings for completeness score
PR #1687: Adds python bindings for mutual info score
PR #1980: prim: added a new write-only unary op prim
PR #1867: C++: add logging interface support in cuML based spdlog
PR #1902: Multi class inference in FIL C++ and importing multi-class forests from treelite
PR #1906: UMAP MNMG
PR #2067: python: wrap logging interface in cython
PR #2083: Added dtype, order, and use_full_low_rank to MNMG make_regression
PR #2074: SG and MNMG make_classification
PR #2127: Added order to SG make_blobs, and switch from C++ to cupy based implementation
PR #2057: Weighted k-means
PR #2256: Add a make_arima generator
PR #2245: ElasticNet, Lasso and Coordinate Descent MNMG
PR #2242: Pandas input support with output as NumPy arrays by default
PR #2551: Add cuML RF multiclass prediction using FIL from python
PR #1728: Added notebook testing to gpuCI gpu build

Improvements

PR #1931: C++: enabled doxygen docs for all of the C++ codebase
PR #1944: Support for dask_cudf.core.Series in _extract_partitions
PR #1947: Cleaning up cmake
PR #1927: Use Cython's new_build_ext (if available)
PR #1946: Removed zlib dependency from cmake
PR #1988: C++: cpp bench refactor
PR #1873: Remove usage of nvstring and nvcat from LabelEncoder
PR #1968: Update SVC SVR with cuML Array
PR #1972: updates to our flow to use conda-forge's clang and clang-tools packages
PR #1974: Reduce ARIMA testing time
PR #1984: Enable Ninja build
PR #1985: C++ UMAP parametrizable tests
PR #2005: Adding missing algorithms to cuml benchmarks and notebook
PR #2016: Add capability to setup.py and build.sh to fully clean all cython build files and artifacts
PR #2044: A cuda-memcheck helper wrapper for devs
PR #2018: Using cuml.dask.part_utils.extract_partitions and removing similar, duplicated code
PR #2019: Enable doxygen build in our nightly doc build CI script
PR #1996: Cythonize in parallel
PR #2032: Reduce number of tests for MBSGD to improve CI running time
PR #2031: Encapsulating UCX-py interactions in singleton
PR #2029: Add C++ ARIMA log-likelihood benchmark
PR #2085: Convert TSNE to use CumlArray
PR #2051: Reduce the time required to run dask pca and dask tsvd tests
PR #1981: Using CumlArray in kNN and DistributedDataHandler in dask kNN
PR #2053: Introduce verbosity level in C++ layer instead of boolean verbose flag
PR #2047: Make internal streams non-blocking w.r.t. NULL stream
PR #2048: Random forest testing speedup
PR #2058: Use CumlArray in Random Projection
PR #2068: Updating knn class probabilities to use make_monotonic instead of binary search
PR #2062: Adding random state to UMAP mnmg tests
PR #2064: Speed-up K-Means test
PR #2015: Renaming .h to .cuh in solver, dbscan and svm
PR #2080: Improved import of sparse FIL forests from treelite
PR #2090: Upgrade C++ build to C++14 standard
PR #2089: CI: enabled cuda-memcheck on ml-prims unit-tests during nightly build
PR #2128: Update Dask RF code to reduce the time required for GPU predict to run
PR #2125: Build infrastructure to use RAFT
PR #2131: Update Dask RF fit to use DistributedDataHandler
PR #2055: Update the metrics notebook to use important cuML models
PR #2095: Improved import of src_prims/utils.h, making it less ambiguous
PR #2118: Updating SGD & mini-batch estimators to use CumlArray
PR #2120: Speeding up dask RandomForest tests
PR #1883: Use CumlArray in ARIMA
PR #877: Adding definition of done criteria to wiki
PR #2135: A few optimizations to UMAP fuzzy simplicial set
PR #1914: Change the meaning of ARIMA's intercept to match the literature
PR #2098: Renaming .h to .cuh in decision_tree, glm, pca
PR #2150: Remove deprecated RMM calls in RMM allocator adapter
PR #2146: Remove deprecated kalman filter
PR #2151: Add pytest duration and pytest timeout
PR #2156: Add Docker 19 support to local gpuci build
PR #2178: Reduce duplicated code in RF
PR #2124: Expand tutorial docs and sample notebook
PR #2175: Allow CPU-only and dataset params for benchmark sweeps
PR #2186: Refactor cython code to build OPG structs in common utils file
PR #2180: Add fully single GPU singlegpu python build
PR #2187: CMake improvements to manage conda environment dependencies
PR #2185: Add has_sklearn function and use it in datasets/classification.
PR #2193: Order-independent local shuffle in cuml.dask.make_regression
PR #2204: Update python layer to use the logger interface
PR #2184: Refoctor headers for holtwinters, rproj, tsvd, tsne, umap
PR #2199: Remove unncessary notebooks
PR #2195: Separating fit and transform calls in SG, MNMG PCA to save transform array memory consumption
PR #2201: Re-enabling UMAP repro tests
PR #2132: Add SVM C++ benchmarks
PR #2196: Updates to benchmarks. Moving notebook
PR #2208: Coordinate Descent, Lasso and ElasticNet CumlArray updates
PR #2210: Updating KNN tests to evaluate multiple index partitions
PR #2205: Use timeout to add 2 hour hard limit to dask tests
PR #2212: Improve DBScan batch count / memory estimation
PR #2213: Standardized include statements across all cpp source files, updated copyright on all modified files
PR #2214: Remove utils folder and refactor to common folder
PR #2220: Final refactoring of all src_prims header files following rules as specified in #1675
PR #2225: input_to_cuml_array keep order option, test updates and cleanup
PR #2244: Re-enable slow ARIMA tests as stress tests
PR #2231: Using OPG structs from cuml.common in decomposition algorithms
PR #2257: Update QN and LogisticRegression to use CumlArray
PR #2259: Add CumlArray support to Naive Bayes
PR #2252: Add benchmark for the Gram matrix prims
PR #2263: Faster serialization for Treelite objects with RF
PR #2264: Reduce build time for cuML by using make_blobs from libcuml++ interface
PR #2269: Add docs targets to build.sh and fix python cuml.common docs
PR #2271: Clarify doc for _unique default implementation in OneHotEncoder
PR #2272: Add docs build.sh script to repository
PR #2276: Ensure CumlArray provided dtype conforms
PR #2281: Rely on cuDF's Serializable in CumlArray
PR #2284: Reduce dataset size in SG RF notebook to reduce run time of sklearn
PR #2285: Increase the threshold for elastic_net test in dask/test_coordinate_descent
PR #2314: Update FIL default values, documentation and test
PR #2316: 0.14 release docs additions and fixes
PR #2320: Add prediction notes to RF docs
PR #2323: Change verbose levels and parameter name to match Scikit-learn API
PR #2324: Raise an error if n_bins > number of training samples in RF
PR #2335: Throw a warning if treelite cannot be imported and load_from_sklearn is used

Bug Fixes

PR #1939: Fix syntax error in cuml.common.array
PR #1941: Remove c++ cuda flag that was getting duplicated in CMake
PR #1971: python: Correctly honor --singlegpu option and CUML_BUILD_PATH env variable
PR #1969: Update libcumlprims to 0.14
PR #1973: Add missing mg files for setup.py --singlegpu flag
PR #1993: Set umap_transform_reproducibility tests to xfail
PR #2004: Refactoring the arguments to plant() call
PR #2017: Fixing memory issue in weak cc prim
PR #2028: Skipping UMAP knn reproducibility tests until we figure out why its failing in CUDA 10.2
PR #2024: Fixed cuda-memcheck errors with sample-without-replacement prim
PR #1540: prims: support for custom math-type used for computation inside adjusted rand index prim
PR #2077: dask-make blobs arguments to match sklearn
PR #2059: Make all Scipy imports conditional
PR #2078: Ignore negative cache indices in get_vecs
PR #2084: Fixed cuda-memcheck errors with COO unit-tests
PR #2087: Fixed cuda-memcheck errors with dispersion prim
PR #2096: Fixed syntax error with nightly build command for memcheck unit-tests
PR #2115: Fixed contingency matrix prim unit-tests for computing correct golden values
PR #2107: Fix PCA transform
PR #2109: input_to_cuml_array cuda_array_interface bugfix
PR #2117: cuDF array exception small fixes
PR #2139: CumlArray for adjusted_rand_score
PR #2140: Returning self in fit model functions
PR #2144: Remove GPU arch < 60 from CMake build
PR #2153: Added missing namespaces to some Decision Tree files
PR #2155: C++: fix doxygen build break
PR #2161: Replacing depreciated bruteForceKnn
PR #2162: Use stream in transpose prim
PR #2165: Fit function test correction
PR #2166: Fix handling of temp file in RF pickling
PR #2176: C++: fix for adjusted rand index when input array is all zeros
PR #2179: Fix clang tools version in libcuml recipe
PR #2183: Fix RAFT in nightly package
PR #2191: Fix placement of SVM parameter documentation and add examples
PR #2212: Fix DBScan results (no propagation of labels through border points)
PR #2215: Fix the printing of forest object
PR #2217: Fix opg_utils naming to fix singlegpu build
PR #2223: Fix bug in ARIMA C++ benchmark
PR #2224: Temporary fix for CI until new Dask version is released
PR #2228: Update to use reduce_ex in CumlArray to override cudf.Buffer
PR #2249: Fix bug in UMAP continuous target metrics
PR #2258: Fix doxygen build break
PR #2255: Set random_state for train_test_split function in dask RF
PR #2275: Fix RF fit memory leak
PR #2274: Fix parameter name verbose to verbosity in mnmg OneHotEncoder
PR #2277: Updated cub repo path and branch name
PR #2282: Fix memory leak in Dask RF concatenation
PR #2301: Scaling KNN dask tests sample size with n GPUs
PR #2293: Contiguity fixes for input_to_cuml_array and train_test_split
PR #2295: Fix convert_to_dtype copy even with same dtype
PR #2305: Fixed race condition in DBScan
PR #2354: Fix broken links in README
PR #2619: Explicitly skip raft test folder for pytest 6.0.0
PR #2788: Set the minimum number of columns that can be sampled to 1 to fix 0 mem allocation error

cuML 0.13.0 (31 Mar 2020)

New Features

PR #1777: Python bindings for entropy
PR #1742: Mean squared error implementation with cupy
PR #1817: Confusion matrix implementation with cupy (SNSG and MNMG)
PR #1766: Mean absolute error implementation with cupy
PR #1766: Mean squared log error implementation with cupy
PR #1635: cuML Array shim and configurable output added to cluster methods
PR #1586: Seasonal ARIMA
PR #1683: cuml.dask make_regression
PR #1689: Add framework for cuML Dask serializers
PR #1709: Add decision_function() and predict_proba() for LogisticRegression
PR #1714: Add print_env.sh file to gather important environment details
PR #1750: LinearRegression CumlArray for configurable output
PR #1814: ROC AUC score implementation with cupy
PR #1767: Single GPU decomposition models configurable output
PR #1646: Using FIL to predict in MNMG RF
PR #1778: Make cuML Handle picklable
PR #1738: cuml.dask refactor beginning and dask array input option for OLS, Ridge and KMeans
PR #1874: Add predict_proba function to RF classifier
PR #1815: Adding KNN parameter to UMAP
PR #1978: Adding predict_proba function to dask RF

Improvements

PR #1644: Add predict_proba() for FIL binary classifier
PR #1620: Pickling tests now automatically finds all model classes inheriting from cuml.Base
PR #1637: Update to newer treelite version with XGBoost 1.0 compatibility
PR #1632: Fix MBSGD models inheritance, they now inherits from cuml.Base
PR #1628: Remove submodules from cuML
PR #1755: Expose the build_treelite function for python
PR #1649: Add the fil_sparse_format variable option to RF API
PR #1647: storage_type=AUTO uses SPARSE for large models
PR #1668: Update the warning statement thrown in RF when the seed is set but n_streams is not 1
PR #1662: use of direct cusparse calls for coo2csr, instead of depending on nvgraph
PR #1747: C++: dbscan performance improvements and cleanup
PR #1697: Making trustworthiness batchable and using proper workspace
PR #1721: Improving UMAP pytests
PR #1717: Call rmm_cupy_allocator for CuPy allocations
PR #1718: Import using_allocator from cupy.cuda
PR #1723: Update RF Classifier to throw an exception for multi-class pickling
PR #1726: Decorator to allocate CuPy arrays with RMM
PR #1719: UMAP random seed reproducibility
PR #1748: Test serializing CumlArray objects
PR #1776: Refactoring pca/tsvd distributed
PR #1762: Update CuPy requirement to 7
PR #1768: C++: Different input and output types for add and subtract prims
PR #1790: Add support for multiple seeding in k-means++
PR #1805: Adding new Dask cuda serializers to naive bayes + a trivial perf update
PR #1812: C++: bench: UMAP benchmark cases added
PR #1795: Add capability to build CumlArray from bytearray/memoryview objects
PR #1824: C++: improving the performance of UMAP algo
PR #1816: Add ARIMA notebook
PR #1856: Update docs for 0.13
PR #1827: Add HPO demo Notebook
PR #1825: --nvtx option in build.sh
PR #1847: Update XGBoost version for CI
PR #1837: Simplify cuML Array construction
PR #1848: Rely on subclassing for cuML Array serialization
PR #1866: Minimizing client memory pressure on Naive Bayes
PR #1788: Removing complexity bottleneck in S-ARIMA
PR #1873: Remove usage of nvstring and nvcat from LabelEncoder
PR #1891: Additional improvements to naive bayes tree reduction

Bug Fixes

PR #1835 : Fix calling default RF Classification always
PT #1904: replace cub sort
PR #1833: Fix depth issue in shallow RF regression estimators
PR #1770: Warn that KalmanFilter is deprecated
PR #1775: Allow CumlArray to work with inputs that have no 'strides' in array interface
PR #1594: Train-test split is now reproducible
PR #1590: Fix destination directory structure for run-clang-format.py
PR #1611: Fixing pickling errors for KNN classifier and regressor
PR #1617: Fixing pickling issues for SVC and SVR
PR #1634: Fix title in KNN docs
PR #1627: Adding a check for multi-class data in RF classification
PR #1654: Skip treelite patch if its already been applied
PR #1661: Fix nvstring variable name
PR #1673: Using struct for caching dlsym state in communicator
PR #1659: TSNE - introduce 'convert_dtype' and refactor class attr 'Y' to 'embedding_'
PR #1672: Solver 'svd' in Linear and Ridge Regressors when n_cols=1
PR #1670: Lasso & ElasticNet - cuml Handle added
PR #1671: Update for accessing cuDF Series pointer
PR #1652: Support XGBoost 1.0+ models in FIL
PR #1702: Fix LightGBM-FIL validation test
PR #1701: test_score kmeans test passing with newer cupy version
PR #1706: Remove multi-class bug from QuasiNewton
PR #1699: Limit CuPy to <7.2 temporarily
PR #1708: Correctly deallocate cuML handles in Cython
PR #1730: Fixes to KF for test stability (mainly in CUDA 10.2)
PR #1729: Fixing naive bayes UCX serialization problem in fit()
PR #1749: bug fix rf classifier/regressor on seg fault in bench
PR #1751: Updated RF documentation
PR #1765: Update the checks for using RF GPU predict
PR #1787: C++: unit-tests to check for RF accuracy. As well as a bug fix to improve RF accuracy
PR #1793: Updated fil pyx to solve memory leakage issue
PR #1810: Quickfix - chunkage in dask make_regression
PR #1842: DistributedDataHandler not properly setting 'multiple'
PR #1849: Critical fix in ARIMA initial estimate
PR #1851: Fix for cuDF behavior change for multidimensional arrays
PR #1852: Remove Thrust warnings
PR #1868: Turning off IPC caching until it is fixed in UCX-py/UCX
PR #1876: UMAP exponential decay parameters fix
PR #1887: Fix hasattr for missing attributes on base models
PR #1877: Remove resetting index in shuffling in train_test_split
PR #1893: Updating UCX in comms to match current UCX-py
PR #1888: Small train_test_split test fix
PR #1899: Fix dask extract_partitions(), remove transformation as instance variable in PCA and TSVD and match sklearn APIs
PR #1920: Temporarily raising threshold for UMAP reproducibility tests
PR #1918: Create memleak fixture to skip memleak tests in CI for now
PR #1926: Update batch matrix test margins
PR #1925: Fix failing dask tests
PR #1936: Update DaskRF regression test to xfail
PR #1932: Isolating cause of make_blobs failure
PR #1951: Dask Random forest regression CPU predict bug fix
PR #1948: Adjust BatchedMargin margin and disable tests temporarily
PR #1950: Fix UMAP test failure

cuML 0.12.0 (04 Feb 2020)

New Features

PR #1483: prims: Fused L2 distance and nearest-neighbor prim
PR #1494: bench: ml-prims benchmark
PR #1514: bench: Fused L2 NN prim benchmark
PR #1411: Cython side of MNMG OLS
PR #1520: Cython side of MNMG Ridge Regression
PR #1516: Suppor Vector Regression (epsilon-SVR)

Improvements

PR #1638: Update cuml/docs/README.md
PR #1468: C++: updates to clang format flow to make it more usable among devs
PR #1473: C++: lazy initialization of "costly" resources inside cumlHandle
PR #1443: Added a new overloaded GEMM primitive
PR #1489: Enabling deep trees using Gather tree builder
PR #1463: Update FAISS submodule to 1.6.1
PR #1488: Add codeowners
PR #1432: Row-major (C-style) GPU arrays for benchmarks
PR #1490: Use dask master instead of conda package for testing
PR #1375: Naive Bayes & Distributed Naive Bayes
PR #1377: Add GPU array support for FIL benchmarking
PR #1493: kmeans: add tiling support for 1-NN computation and use fusedL2-1NN prim for L2 distance metric
PR #1532: Update CuPy to >= 6.6 and allow 7.0
PR #1528: Re-enabling KNN using dynamic library loading for UCX in communicator
PR #1545: Add conda environment version updates to ci script
PR #1541: Updates for libcudf++ Python refactor
PR #1555: FIL-SKL, an SKLearn-based benchmark for FIL
PR #1537: Improve pickling and scoring suppport for many models to support hyperopt
PR #1551: Change custom kernel to cupy for col/row order transform
PR #1533: C++: interface header file separation for SVM
PR #1560: Helper function to allocate all new CuPy arrays with RMM memory management
PR #1570: Relax nccl in conda recipes to >=2.4 (matching CI)
PR #1578: Add missing function information to the cuML documenataion
PR #1584: Add has_scipy utility function for runtime check
PR #1583: API docs updates for 0.12
PR #1591: Updated FIL documentation

Bug Fixes

PR #1470: Documentation: add make_regression, fix ARIMA section
PR #1482: Updated the code to remove sklearn from the mbsgd stress test
PR #1491: Update dev environments for 0.12
PR #1512: Updating setup_cpu() in SpeedupComparisonRunner
PR #1498: Add build.sh to code owners
PR #1505: cmake: added correct dependencies for prims-bench build
PR #1534: Removed TODO comment in create_ucp_listeners()
PR #1548: Fixing umap extra unary op in knn graph
PR #1547: Fixing MNMG kmeans score. Fixing UMAP pickling before fit(). Fixing UMAP test failures.
PR #1557: Increasing threshold for kmeans score
PR #1562: Increasing threshold even higher
PR #1564: Fixed a typo in function cumlMPICommunicator_impl::syncStream
PR #1569: Remove Scikit-learn exception and depedenncy in SVM
PR #1575: Add missing dtype parameter in call to strides to order for CuPy 6.6 code path
PR #1574: Updated the init file to include SVM
PR #1589: Fixing the default value for RF and updating mnmg predict to accept cudf
PR #1601: Fixed wrong datatype used in knn voting kernel

cuML 0.11.0 (11 Dec 2019)

New Features

PR #1295: Cython side of MNMG PCA
PR #1218: prims: histogram prim
PR #1129: C++: Separate include folder for C++ API distribution
PR #1282: OPG KNN MNMG Code (disabled for 0.11)
PR #1242: Initial implementation of FIL sparse forests
PR #1194: Initial ARIMA time-series modeling support.
PR #1286: Importing treelite models as FIL sparse forests
PR #1285: Fea minimum impurity decrease RF param
PR #1301: Add make_regression to generate regression datasets
PR #1322: RF pickling using treelite, protobuf and FIL
PR #1332: Add option to cuml.dask make_blobs to produce dask array
PR #1307: Add RF regression benchmark
PR #1327: Update the code to build treelite with protobuf
PR #1289: Add Python benchmarking support for FIL
PR #1371: Cython side of MNMG tSVD
PR #1386: Expose SVC decision function value

Improvements

PR #1170: Use git to clone subprojects instead of git submodules
PR #1239: Updated the treelite version
PR #1225: setup.py clone dependencies like cmake and correct include paths
PR #1224: Refactored FIL to prepare for sparse trees
PR #1249: Include libcuml.so C API in installed targets
PR #1259: Conda dev environment updates and use libcumlprims current version in CI
PR #1277: Change dependency order in cmake for better printing at compile time
PR #1264: Add -s flag to GPU CI pytest for better error printing
PR #1271: Updated the Ridge regression documentation
PR #1283: Updated the cuMl docs to include MBSGD and adjusted_rand_score
PR #1300: Lowercase parameter versions for FIL algorithms
PR #1312: Update CuPy to version 6.5 and use conda-forge channel
PR #1336: Import SciKit-Learn models into FIL
PR #1314: Added options needed for ASVDb output (CUDA ver, etc.), added option to select algos
PR #1335: Options to print available algorithms and datasets in the Python benchmark
PR #1338: Remove BUILD_ABI references in CI scripts
PR #1340: Updated unit tests to uses larger dataset
PR #1351: Build treelite temporarily for GPU CI testing of FIL Scikit-learn model importing
PR #1367: --test-split benchmark parameter for train-test split
PR #1360: Improved tests for importing SciKit-Learn models into FIL
PR #1368: Add --num-rows benchmark command line argument
PR #1351: Build treelite temporarily for GPU CI testing of FIL Scikit-learn model importing
PR #1366: Modify train_test_split to use CuPy and accept device arrays
PR #1258: Documenting new MPI communicator for multi-node multi-GPU testing
PR #1345: Removing deprecated should_downcast argument
PR #1362: device_buffer in UMAP + Sparse prims
PR #1376: AUTO value for FIL algorithm
PR #1408: Updated pickle tests to delete the pre-pickled model to prevent pointer leakage
PR #1357: Run benchmarks multiple times for CI
PR #1382: ARIMA optimization: move functions to C++ side
PR #1392: Updated RF code to reduce duplication of the code
PR #1444: UCX listener running in its own isolated thread
PR #1445: Improved performance of FIL sparse trees
PR #1431: Updated API docs
PR #1441: Remove unused CUDA conda labels
PR #1439: Match sklearn 0.22 default n_estimators for RF and fix test errors
PR #1461: Add kneighbors to API docs

Bug Fixes

PR #1281: Making rng.h threadsafe
PR #1212: Fix cmake git cloning always running configure in subprojects
PR #1261: Fix comms build errors due to cuml++ include folder changes
PR #1267: Update build.sh for recent change of building comms in main CMakeLists
PR #1278: Removed incorrect overloaded instance of eigJacobi
PR #1302: Updates for numba 0.46
PR #1313: Updated the RF tests to set the seed and n_streams
PR #1319: Using machineName arg passed in instead of default for ASV reporting
PR #1326: Fix illegal memory access in make_regression (bounds issue)
PR #1330: Fix C++ unit test utils for better handling of differences near zero
PR #1342: Fix to prevent memory leakage in Lasso and ElasticNet
PR #1337: Fix k-means init from preset cluster centers
PR #1354: Fix SVM gamma=scale implementation
PR #1344: Change other solver based methods to create solver object in init
PR #1373: Fixing a few small bugs in make_blobs and adding asserts to pytests
PR #1361: Improve SMO error handling
PR #1384: Lower expectations on batched matrix tests to prevent CI failures
PR #1380: Fix memory leaks in ARIMA
PR #1391: Lower expectations on batched matrix tests even more
PR #1394: Warning added in svd for cuda version 10.1
PR #1407: Resolved RF predict issues and updated RF docstring
PR #1401: Patch for lbfgs solver for logistic regression with no l1 penalty
PR #1416: train_test_split numba and rmm device_array output bugfix
PR #1419: UMAP pickle tests are using wrong n_neighbors value for trustworthiness
PR #1438: KNN Classifier to properly return Dataframe with Dataframe input
PR #1425: Deprecate seed and use random_state similar to Scikit-learn in train_test_split
PR #1458: Add joblib as an explicit requirement
PR #1474: Defer knn mnmg to 0.12 nightly builds and disable ucx-py dependency

cuML 0.10.0 (16 Oct 2019)

New Features

PR #1148: C++ benchmark tool for c++/CUDA code inside cuML
PR #1071: Selective eigen solver of cuSolver
PR #1073: Updating RF wrappers to use FIL for GPU accelerated prediction
PR #1104: CUDA 10.1 support
PR #1113: prims: new batched make-symmetric-matrix primitive
PR #1112: prims: new batched-gemv primitive
PR #855: Added benchmark tools
PR #1149 Add YYMMDD to version tag for nightly conda packages
PR #892: General Gram matrices prim
PR #912: Support Vector Machine
PR #1274: Updated the RF score function to use GPU predict

Improvements

PR #961: High Peformance RF; HIST algo
PR #1028: Dockerfile updates after dir restructure. Conda env yaml to add statsmodels as a dependency
PR #1047: Consistent OPG interface for kmeans, based on internal libcumlprims update
PR #763: Add examples to train_test_split documentation
PR #1093: Unified inference kernels for different FIL algorithms
PR #1076: Paying off some UMAP / Spectral tech debt.
PR #1086: Ensure RegressorMixin scorer uses device arrays
PR #1110: Adding tests to use default values of parameters of the models
PR #1108: input_to_host_array function in input_utils for input processing to host arrays
PR #1114: K-means: Exposing useful params, removing unused params, proxying params in Dask
PR #1138: Implementing ANY_RANK semantics on irecv
PR #1142: prims: expose separate InType and OutType for unaryOp and binaryOp
PR #1115: Moving dask_make_blobs to cuml.dask.datasets. Adding conversion to dask.DataFrame
PR #1136: CUDA 10.1 CI updates
PR #1135: K-means: add boundary cases for kmeans||, support finer control with convergence
PR #1163: Some more correctness improvements. Better verbose printing
PR #1165: Adding except + in all remaining cython
PR #1186: Using LocalCUDACluster Pytest fixture
PR #1173: Docs: Barnes Hut TSNE documentation
PR #1176: Use new RMM API based on Cython
PR #1219: Adding custom bench_func and verbose logging to cuml.benchmark
PR #1247: Improved MNMG RF error checking

Bug Fixes

PR #1231: RF respect number of cuda streams from cuml handle
PR #1230: Rf bugfix memleak in regression
PR #1208: compile dbscan bug
PR #1016: Use correct libcumlprims version in GPU CI
PR #1040: Update version of numba in development conda yaml files
PR #1043: Updates to accomodate cuDF python code reorganization
PR #1044: Remove nvidia driver installation from ci/cpu/build.sh
PR #991: Barnes Hut TSNE Memory Issue Fixes
PR #1075: Pinning Dask version for consistent CI results
PR #990: Barnes Hut TSNE Memory Issue Fixes
PR #1066: Using proper set of workers to destroy nccl comms
PR #1072: Remove pip requirements and setup
PR #1074: Fix flake8 CI style check
PR #1087: Accuracy improvement for sqrt/log in RF max_feature
PR #1088: Change straggling numba python allocations to use RMM
PR #1106: Pinning Distributed version to match Dask for consistent CI results
PR #1116: TSNE CUDA 10.1 Bug Fixes
PR #1132: DBSCAN Batching Bug Fix
PR #1162: DASK RF random seed bug fix
PR #1164: Fix check_dtype arg handling for input_to_dev_array
PR #1171: SVM prediction bug fix
PR #1177: Update dask and distributed to 2.5
PR #1204: Fix SVM crash on Turing
PR #1199: Replaced sprintf() with snprintf() in THROW()
PR #1205: Update dask-cuda in yml envs
PR #1211: Fixing Dask k-means transform bug and adding test
PR #1236: Improve fix for SMO solvers potential crash on Turing
PR #1251: Disable compiler optimization for CUDA 10.1 for distance prims
PR #1260: Small bugfix for major conversion in input_utils
PR #1276: Fix float64 prediction crash in test_random_forest

cuML 0.9.0 (21 Aug 2019)

New Features

PR #894: Convert RF to treelite format
PR #826: Jones transformation of params for ARIMA models timeSeries ml-prim
PR #697: Silhouette Score metric ml-prim
PR #674: KL Divergence metric ml-prim
PR #787: homogeneity, completeness and v-measure metrics ml-prim
PR #711: Mutual Information metric ml-prim
PR #724: Entropy metric ml-prim
PR #766: Expose score method based on inertia for KMeans
PR #823: prims: cluster dispersion metric
PR #816: Added inverse_transform() for LabelEncoder
PR #789: prims: sampling without replacement
PR #813: prims: Col major istance prim
PR #635: Random Forest & Decision Tree Regression (Single-GPU)
PR #819: Forest Inferencing Library (FIL)
PR #829: C++: enable nvtx ranges
PR #835: Holt-Winters algorithm
PR #837: treelite for decision forest exchange format
PR #871: Wrapper for FIL
PR #870: make_blobs python function
PR #881: wrappers for accuracy_score and adjusted_rand_score functions
PR #840: Dask RF classification and regression
PR #870: make_blobs python function
PR #879: import of treelite models to FIL
PR #892: General Gram matrices prim
PR #883: Adding MNMG Kmeans
PR #930: Dask RF
PR #882: TSNE - T-Distributed Stochastic Neighbourhood Embedding
PR #624: Internals API & Graph Based Dimensionality Reductions Callback
PR #926: Wrapper for FIL
PR #994: Adding MPI comm impl for testing / benchmarking MNMG CUDA
PR #960: Enable using libcumlprims for MG algorithms/prims

Improvements

PR #822: build: build.sh update to club all make targets together
PR #807: Added development conda yml files
PR #840: Require cmake >= 3.14
PR #832: Stateless Decision Tree and Random Forest API
PR #857: Small modifications to comms for utilizing IB w/ Dask
PR #851: Random forest Stateless API wrappers
PR #865: High Performance RF
PR #895: Pretty prints arguments!
PR #920: Add an empty marker kernel for tracing purposes
PR #915: syncStream added to cumlCommunicator
PR #922: Random Forest support in FIL
PR #911: Update headers to credit CannyLabs BH TSNE implementation
PR #918: Streamline CUDA_REL environment variable
PR #924: kmeans: updated APIs to be stateless, refactored code for mnmg support
PR #950: global_bias support in FIL
PR #773: Significant improvements to input checking of all classes and common input API for Python
PR #957: Adding docs to RF & KMeans MNMG. Small fixes for release
PR #965: Making dask-ml a hard dependency
PR #976: Update api.rst for new 0.9 classes
PR #973: Use cudaDeviceGetAttribute instead of relying on cudaDeviceProp object being passed
PR #978: Update README for 0.9
PR #1009: Fix references to notebooks-contrib
PR #1015: Ability to control the number of internal streams in cumlHandle_impl via cumlHandle
PR #1175: Add more modules to docs ToC

Bug Fixes

PR #923: Fix misshapen level/trend/season HoltWinters output
PR #831: Update conda package dependencies to cudf 0.9
PR #772: Add missing cython headers to SGD and CD
PR #849: PCA no attribute trans_input_ transform bug fix
PR #869: Removing incorrect information from KNN Docs
PR #885: libclang installation fix for GPUCI
PR #896: Fix typo in comms build instructions
PR #921: Fix build scripts using incorrect cudf version
PR #928: TSNE Stability Adjustments
PR #934: Cache cudaDeviceProp in cumlHandle for perf reasons
PR #932: Change default param value for RF classifier
PR #949: Fix dtype conversion tests for unsupported cudf dtypes
PR #908: Fix local build generated file ownerships
PR #983: Change RF max_depth default to 16
PR #987: Change default values for knn
PR #988: Switch to exact tsne
PR #991: Cleanup python code in cuml.dask.cluster
PR #996: ucx_initialized being properly set in CommsContext
PR #1007: Throws a well defined error when mutigpu is not enabled
PR #1018: Hint location of nccl in build.sh for CI
PR #1022: Using random_state to make K-Means MNMG tests deterministic
PR #1034: Fix typos and formatting issues in RF docs
PR #1052: Fix the rows_sample dtype to float

cuML 0.8.0 (27 June 2019)

New Features

PR #652: Adjusted Rand Index metric ml-prim
PR #679: Class label manipulation ml-prim
PR #636: Rand Index metric ml-prim
PR #515: Added Random Projection feature
PR #504: Contingency matrix ml-prim
PR #644: Add train_test_split utility for cuDF dataframes
PR #612: Allow Cuda Array Interface, Numba inputs and input code refactor
PR #641: C: Separate C-wrapper library build to generate libcuml.so
PR #631: Add nvcategory based ordinal label encoder
PR #681: Add MBSGDClassifier and MBSGDRegressor classes around SGD
PR #705: Quasi Newton solver and LogisticRegression Python classes
PR #670: Add test skipping functionality to build.sh
PR #678: Random Forest Python class
PR #684: prims: make_blobs primitive
PR #673: prims: reduce cols by key primitive
PR #812: Add cuML Communications API & consolidate Dask cuML

Improvements

PR #597: C++ cuML and ml-prims folder refactor
PR #590: QN Recover from numeric errors
PR #482: Introduce cumlHandle for pca and tsvd
PR #573: Remove use of unnecessary cuDF column and series copies
PR #601: Cython PEP8 cleanup and CI integration
PR #596: Introduce cumlHandle for ols and ridge
PR #579: Introduce cumlHandle for cd and sgd, and propagate C++ errors in cython level for cd and sgd
PR #604: Adding cumlHandle to kNN, spectral methods, and UMAP
PR #616: Enable clang-format for enforcing coding style
PR #618: CI: Enable copyright header checks
PR #622: Updated to use 0.8 dependencies
PR #626: Added build.sh script, updated CI scripts and documentation
PR #633: build: Auto-detection of GPU_ARCHS during cmake
PR #650: Moving brute force kNN to prims. Creating stateless kNN API.
PR #662: C++: Bulk clang-format updates
PR #671: Added pickle pytests and correct pickling of Base class
PR #675: atomicMin/Max(float, double) with integer atomics and bit flipping
PR #677: build: 'deep-clean' to build.sh to clean faiss build as well
PR #683: Use stateless c++ API in KNN so that it can be pickled properly
PR #686: Use stateless c++ API in UMAP so that it can be pickled properly
PR #695: prims: Refactor pairwise distance
PR #707: Added stress test and updated documentation for RF
PR #701: Added emacs temporary file patterns to .gitignore
PR #606: C++: Added tests for host_buffer and improved device_buffer and host_buffer implementation
PR #726: Updated RF docs and stress test
PR #730: Update README and RF docs for 0.8
PR #744: Random projections generating binomial on device. Fixing tests.
PR #741: Update API docs for 0.8
PR #754: Pickling of UMAP/KNN
PR #753: Made PCA and TSVD picklable
PR #746: LogisticRegression and QN API docstrings
PR #820: Updating DEVELOPER GUIDE threading guidelines

Bug Fixes

PR #584: Added missing virtual destructor to deviceAllocator and hostAllocator
PR #620: C++: Removed old unit-test files in ml-prims
PR #627: C++: Fixed dbscan crash issue filed in 613
PR #640: Remove setuptools from conda run dependency
PR #646: Update link in contributing.md
PR #649: Bug fix to LinAlg::reduce_rows_by_key prim filed in issue #648
PR #666: fixes to gitutils.py to resolve both string decode and handling of uncommitted files
PR #676: Fix template parameters in bernoulli() implementation.
PR #685: Make CuPy optional to avoid nccl conda package conflicts
PR #687: prims: updated tolerance for reduce_cols_by_key unit-tests
PR #689: Removing extra prints from NearestNeighbors cython
PR #718: Bug fix for DBSCAN and increasing batch size of sgd
PR #719: Adding additional checks for dtype of the data
PR #736: Bug fix for RF wrapper and .cu print function
PR #547: Fixed issue if C++ compiler is specified via CXX during configure.
PR #759: Configure Sphinx to render params correctly
PR #762: Apply threshold to remove flakiness of UMAP tests.
PR #768: Fixing memory bug from stateless refactor
PR #782: Nearest neighbors checking properly whether memory should be freed
PR #783: UMAP was using wrong size for knn computation
PR #776: Hotfix for self.variables in RF
PR #777: Fix numpy input bug
PR #784: Fix jit of shuffle_idx python function
PR #790: Fix rows_sample input type for RF
PR #793: Fix for dtype conversion utility for numba arrays without cupy installed
PR #806: Add a seed for sklearn model in RF test file
PR #843: Rf quantile fix

cuML 0.7.0 (10 May 2019)

New Features

PR #405: Quasi-Newton GLM Solvers
PR #277: Add row- and column-wise weighted mean primitive
PR #424: Add a grid-sync struct for inter-block synchronization
PR #430: Add R-Squared Score to ml primitives
PR #463: Add matrix gather to ml primitives
PR #435: Expose cumlhandle in cython + developer guide
PR #455: Remove default-stream arguement across ml-prims and cuML
PR #375: cuml cpp shared library renamed to libcuml++.so
PR #460: Random Forest & Decision Trees (Single-GPU, Classification)
PR #491: Add doxygen build target for ml-prims
PR #505: Add R-Squared Score to python interface
PR #507: Add coordinate descent for lasso and elastic-net
PR #511: Add a minmax ml-prim
PR #516: Added Trustworthiness score feature
PR #520: Add local build script to mimic gpuCI
PR #503: Add column-wise matrix sort primitive
PR #525: Add docs build script to cuML
PR #528: Remove current KMeans and replace it with a new single GPU implementation built using ML primitives

Improvements

PR #481: Refactoring Quasi-Newton to use cumlHandle
PR #467: Added validity check on cumlHandle_t
PR #461: Rewrote permute and added column major version
PR #440: README updates
PR #295: Improve build-time and the interface e.g., enable bool-OutType, for distance()
PR #390: Update docs version
PR #272: Add stream parameters to cublas and cusolver wrapper functions
PR #447: Added building and running mlprims tests to CI
PR #445: Lower dbscan memory usage by computing adjacency matrix directly
PR #431: Add support for fancy iterator input types to LinAlg::reduce_rows_by_key
PR #394: Introducing cumlHandle API to dbscan and add example
PR #500: Added CI check for black listed CUDA Runtime API calls
PR #475: exposing cumlHandle for dbscan from python-side
PR #395: Edited the CONTRIBUTING.md file
PR #407: Test files to run stress, correctness and unit tests for cuml algos
PR #512: generic copy method for copying buffers between device/host
PR #533: Add cudatoolkit conda dependency
PR #524: Use cmake find blas and find lapack to pass configure options to faiss
PR #527: Added notes on UMAP differences from reference implementation
PR #540: Use latest release version in update-version CI script
PR #552: Re-enable assert in kmeans tests with xfail as needed
PR #581: Add shared memory fast col major to row major function back with bound checks
PR #592: More efficient matrix copy/reverse methods
PR #721: Added pickle tests for DBSCAN and Random Projections

Bug Fixes

PR #334: Fixed segfault in ML::cumlHandle_impl::destroyResources
PR #349: Developer guide clarifications for cumlHandle and cumlHandle_impl
PR #398: Fix CI scripts to allow nightlies to be uploaded
PR #399: Skip PCA tests to allow CI to run with driver 418
PR #422: Issue in the PCA tests was solved and CI can run with driver 418
PR #409: Add entry to gitmodules to ignore build artifacts
PR #412: Fix for svdQR function in ml-prims
PR #438: Code that depended on FAISS was building everytime.
PR #358: Fixed an issue when switching streams on MLCommon::device_buffer and MLCommon::host_buffer
PR #434: Fixing bug in CSR tests
PR #443: Remove defaults channel from ci scripts
PR #384: 64b index arithmetic updates to the kernels inside ml-prims
PR #459: Fix for runtime library path of pip package
PR #464: Fix for C++11 destructor warning in qn
PR #466: Add support for column-major in LinAlg::*Norm methods
PR #465: Fixing deadlock issue in GridSync due to consecutive sync calls
PR #468: Fix dbscan example build failure
PR #470: Fix resource leakage in Kalman filter python wrapper
PR #473: Fix gather ml-prim test for change in rng uniform API
PR #477: Fixes default stream initialization in cumlHandle
PR #480: Replaced qn_fit() declaration with #include of file containing definition to fix linker error
PR #495: Update cuDF and RMM versions in GPU ci test scripts
PR #499: DEVELOPER_GUIDE.md: fixed links and clarified ML::detail::streamSyncer example
PR #506: Re enable ml-prim tests in CI
PR #508: Fix for an error with default argument in LinAlg::meanSquaredError
PR #519: README.md Updates and adding BUILD.md back
PR #526: Fix the issue of wrong results when fit and transform of PCA are called separately
PR #531: Fixing missing arguments in updateDevice() for RF
PR #543: Exposing dbscan batch size through cython API and fixing broken batching
PR #551: Made use of ZLIB_LIBRARIES consistent between ml_test and ml_mg_test
PR #557: Modified CI script to run cuML tests before building mlprims and removed lapack flag
PR #578: Updated Readme.md to add lasso and elastic-net
PR #580: Fixing cython garbage collection bug in KNN
PR #577: Use find libz in prims cmake
PR #594: fixed cuda-memcheck mean_center test failures

cuML 0.6.1 (09 Apr 2019)

Bug Fixes

PR #462 Runtime library path fix for cuML pip package

cuML 0.6.0 (22 Mar 2019)

New Features

PR #249: Single GPU Stochastic Gradient Descent for linear regression, logistic regression, and linear svm with L1, L2, and elastic-net penalties.
PR #247: Added "proper" CUDA API to cuML
PR #235: NearestNeighbors MG Support
PR #261: UMAP Algorithm
PR #290: NearestNeighbors numpy MG Support
PR #303: Reusable spectral embedding / clustering
PR #325: Initial support for single process multi-GPU OLS and tSVD
PR #271: Initial support for hyperparameter optimization with dask for many models

Improvements

PR #144: Dockerfile update and docs for LinearRegression and Kalman Filter.
PR #168: Add /ci/gpu/build.sh file to cuML
PR #167: Integrating full-n-final ml-prims repo inside cuml
PR #198: (ml-prims) Removal of *MG calls + fixed a bug in permute method
PR #194: Added new ml-prims for supporting LASSO regression.
PR #114: Building faiss C++ api into libcuml
PR #64: Using FAISS C++ API in cuML and exposing bindings through cython
PR #208: Issue ml-common-3: Math.h: swap thrust::for_each with binaryOp,unaryOp
PR #224: Improve doc strings for readable rendering with readthedocs
PR #209: Simplify README.md, move build instructions to BUILD.md
PR #218: Fix RNG to use given seed and adjust RNG test tolerances.
PR #225: Support for generating random integers
PR #215: Refactored LinAlg::norm to Stats::rowNorm and added Stats::colNorm
PR #234: Support for custom output type and passing index value to main_op in *Reduction kernels
PR #230: Refactored the cuda_utils header
PR #236: Refactored cuml python package structure to be more sklearn like
PR #232: Added reduce_rows_by_key
PR #246: Support for 2 vectors in the matrix vector operator
PR #244: Fix for single GPU OLS and Ridge to support one column training data
PR #271: Added get_params and set_params functions for linear and ridge regression
PR #253: Fix for issue #250-reduce_rows_by_key failed memcheck for small nkeys
PR #269: LinearRegression, Ridge Python docs update and cleaning
PR #322: set_params updated
PR #237: Update build instructions
PR #275: Kmeans use of faster gpu_matrix
PR #288: Add n_neighbors to NearestNeighbors constructor
PR #302: Added FutureWarning for deprecation of current kmeans algorithm
PR #312: Last minute cleanup before release
PR #315: Documentation updating and enhancements
PR #330: Added ignored argument to pca.fit_transform to map to sklearn's implemenation
PR #342: Change default ABI to ON
PR #572: Pulling DBSCAN components into reusable primitives

Bug Fixes

PR #193: Fix AttributeError in PCA and TSVD
PR #211: Fixing inconsistent use of proper batch size calculation in DBSCAN
PR #202: Adding back ability for users to define their own BLAS
PR #201: Pass CMAKE CUDA path to faiss/configure script
PR #200 Avoid using numpy via cimport in KNN
PR #228: Bug fix: LinAlg::unaryOp with 0-length input
PR #279: Removing faiss-gpu references in README
PR #321: Fix release script typo
PR #327: Update conda requirements for version 0.6 requirements
PR #352: Correctly calculating numpy chunk sizing for kNN
PR #345: Run python import as part of package build to trigger compilation
PR #347: Lowering memory usage of kNN.
PR #355: Fixing issues with very large numpy inputs to SPMG OLS and tSVD.
PR #357: Removing FAISS requirement from README
PR #362: Fix for matVecOp crashing on large input sizes
PR #366: Index arithmetic issue fix with TxN_t class
PR #376: Disabled kmeans tests since they are currently too sensitive (see #71)
PR #380: Allow arbitrary data size on ingress for numba_utils.row_matrix
PR #385: Fix for long import cuml time in containers and fix for setup_pip
PR #630: Fixing a missing kneighbors in nearest neighbors python proxy

cuML 0.5.1 (05 Feb 2019)

Bug Fixes

PR #189 Avoid using numpy via cimport to prevent ABI issues in Cython compilation

cuML 0.5.0 (28 Jan 2019)

New Features

PR #66: OLS Linear Regression
PR #44: Distance calculation ML primitives
PR #69: Ridge (L2 Regularized) Linear Regression
PR #103: Linear Kalman Filter
PR #117: Pip install support
PR #64: Device to device support from cuML device pointers into FAISS

Improvements

PR #56: Make OpenMP optional for building
PR #67: Github issue templates
PR #44: Refactored DBSCAN to use ML primitives
PR #91: Pytest cleanup and sklearn toyset datasets based pytests for kmeans and dbscan
PR #75: C++ example to use kmeans
PR #117: Use cmake extension to find any zlib installed in system
PR #94: Add cmake flag to set ABI compatibility
PR #139: Move thirdparty submodules to root and add symlinks to new locations
PR #151: Replace TravisCI testing and conda pkg builds with gpuCI
PR #164: Add numba kernel for faster column to row major transform
PR #114: Adding FAISS to cuml build

Bug Fixes

PR #48: CUDA 10 compilation warnings fix
PR #51: Fixes to Dockerfile and docs for new build system
PR #72: Fixes for GCC 7
PR #96: Fix for kmeans stack overflow with high number of clusters
PR #105: Fix for AttributeError in kmeans fit method
PR #113: Removed old glm python/cython files
PR #118: Fix for AttributeError in kmeans predict method
PR #125: Remove randomized solver option from PCA python bindings

cuML 0.4.0 (05 Dec 2018)

New Features

Improvements

PR #42: New build system: separation of libcuml.so and cuml python package
PR #43: Added changelog.md

Bug Fixes

cuML 0.3.0 (30 Nov 2018)

New Features

PR #33: Added ability to call cuML algorithms using numpy arrays

Improvements

PR #24: Fix references of python package from cuML to cuml and start using versioneer for better versioning
PR #40: Added support for refactored cuDF 0.3.0, updated Conda files
PR #33: Major python test cleaning, all tests pass with cuDF 0.2.0 and 0.3.0. Preparation for new build system
PR #34: Updated batch count calculation logic in DBSCAN
PR #35: Beginning of DBSCAN refactor to use cuML mlprims and general improvements

Bug Fixes

PR #30: Fixed batch size bug in DBSCAN that caused crash. Also fixed various locations for potential integer overflows
PR #28: Fix readthedocs build documentation
PR #29: Fix pytests for cuml name change from cuML
PR #33: Fixed memory bug that would cause segmentation faults due to numba releasing memory before it was used. Also fixed row major/column major bugs for different algorithms
PR #36: Fix kmeans gtest to use device data
PR #38: cuda_free bug removed that caused google tests to sometimes pass and sometimes fail randomly
PR #39: Updated cmake to correctly link with CUDA libraries, add CUDA runtime linking and include source files in compile target

cuML 0.2.0 (02 Nov 2018)

New Features

PR #11: Kmeans algorithm added
PR #7: FAISS KNN wrapper added
PR #21: Added Conda install support

Improvements

PR #15: Added compatibility with cuDF (from prior pyGDF)
PR #13: Added FAISS to Dockerfile
PR #21: Added TravisCI build system for CI and Conda builds

Bug Fixes

PR #4: Fixed explained variance bug in TSVD
PR #5: Notebook bug fixes and updated results

cuML 0.1.0

Initial release including PCA, TSVD, DBSCAN, ml-prims and cython wrappers

Files

CHANGELOG.md

Latest commit

History

CHANGELOG.md

File metadata and controls

cuML 23.02.00 (Date TBD)

cuML 22.12.00 (8 Dec 2022)

🚨 Breaking Changes

🐛 Bug Fixes

📖 Documentation

🚀 New Features

🛠️ Improvements

cuML 22.10.00 (12 Oct 2022)

🐛 Bug Fixes

📖 Documentation

🚀 New Features

🛠️ Improvements

cuML 22.08.00 (17 Aug 2022)

🚨 Breaking Changes

🐛 Bug Fixes

📖 Documentation

🚀 New Features

🛠️ Improvements

cuML 22.06.00 (7 Jun 2022)

🐛 Bug Fixes

📖 Documentation

🚀 New Features

🛠️ Improvements

cuML 22.04.00 (6 Apr 2022)

🚨 Breaking Changes

🐛 Bug Fixes

📖 Documentation

🚀 New Features

🛠️ Improvements

cuML 22.02.00 (2 Feb 2022)

🚨 Breaking Changes

🐛 Bug Fixes

📖 Documentation

🚀 New Features

🛠️ Improvements

cuML 21.12.00 (9 Dec 2021)

🚨 Breaking Changes

🐛 Bug Fixes

📖 Documentation

🚀 New Features

🛠️ Improvements

cuML 21.10.00 (7 Oct 2021)

🚨 Breaking Changes

🐛 Bug Fixes

📖 Documentation

🚀 New Features

🛠️ Improvements

cuML 21.08.00 (4 Aug 2021)

🚨 Breaking Changes

🐛 Bug Fixes

📖 Documentation

🚀 New Features

🛠️ Improvements

cuML 21.06.00 (9 Jun 2021)

🚨 Breaking Changes

🐛 Bug Fixes

📖 Documentation

🚀 New Features

🛠️ Improvements

cuML 0.19.0 (21 Apr 2021)

🚨 Breaking Changes

🐛 Bug Fixes

📖 Documentation

🚀 New Features

🛠️ Improvements

cuML 0.18.0 (24 Feb 2021)

Breaking Changes 🚨

Bug Fixes 🐛

Documentation 📖

New Features 🚀

Improvements 🛠️

cuML 0.17.0 (10 Dec 2020)

New Features

Improvements