Releases: NVIDIA/spark-rapids-ml

v24.02.0 release

21 Mar 23:45
e0f644d

Release notes:

  • Support feature standardization in logistic regression for dense vectors.
  • Add large scale synthetic sparse data generation for logistic regression testing.
  • Fix handling of tol=0 in KMeans.
  • Add sparse vectors to logistic regression notebook example.
  • Update RAPIDS dependencies to 24.02.
  • Known Issue: RandomForest training will throw an exception if the label column takes on only a single value. This will be fixed in 24.04.
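Until the fix lands, a pre-flight check on the label column can avoid the RandomForest exception. The sketch below is a hypothetical helper (`has_multiple_label_values` is not part of spark-rapids-ml) that works on any iterable of label values; with a Spark DataFrame the same idea would apply to the distinct values of the label column.

```python
from typing import Hashable, Iterable


def has_multiple_label_values(labels: Iterable[Hashable]) -> bool:
    """Return True as soon as a second distinct label value is seen.

    Hypothetical helper for illustration; not part of spark-rapids-ml.
    """
    seen = set()
    for value in labels:
        seen.add(value)
        if len(seen) > 1:
            # Two distinct values are enough; stop scanning early.
            return True
    return False


if __name__ == "__main__":
    # A label column with a single value would hit the known issue.
    print(has_multiple_label_values([1.0, 1.0, 1.0]))
    print(has_multiple_label_values([0, 1, 0]))
```

A training script could skip or fall back to CPU training when this check returns False.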

pip package available at https://pypi.org/project/spark-rapids-ml/24.02.0/

v23.12.0 release

17 Jan 06:10
e8d138b

Release notes:

  • Match Spark's logistic regression fit behavior when data set has only one label value.
  • Support sparse vector based computations through cuML layer in logistic regression fit, transform, and cross validation.
  • Update Dataproc benchmark script.
  • Update Azure Databricks instructions.
  • Update RAPIDS dependencies to 23.12.

pip package available at https://pypi.org/project/spark-rapids-ml/23.12.0/

v23.10.0 release

16 Nov 04:16
5f77d4b

Release Notes:

  • L1 and elastic net regularization for GPU accelerated distributed LogisticRegression, with notebook example.
  • More than 2 classes for GPU accelerated distributed LogisticRegression, with notebook example.
  • Optimized fitMultiple API for LogisticRegression.
  • Accelerated cross validation for LogisticRegression and log loss.
  • Output raw prediction column for logistic regression.
  • Updated Databricks init scripts and benchmarking scripts.
  • Improved API docs.
  • Updated RAPIDS dependencies to 23.10.
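For readers unfamiliar with how L1 and elastic net regularization combine, here is a minimal sketch of the penalty term. It assumes the Spark ML convention, in which `regParam` scales the whole penalty and `elasticNetParam` mixes the L1 and L2 parts; the function and argument names are illustrative, and the release notes do not state how the GPU implementation computes this internally.

```python
from typing import Sequence


def elastic_net_penalty(
    weights: Sequence[float],
    reg_param: float,
    elastic_net_param: float,
) -> float:
    """Penalty under the assumed Spark ML convention.

    reg_param * (elastic_net_param * ||w||_1
                 + (1 - elastic_net_param) / 2 * ||w||_2^2)

    elastic_net_param = 1.0 gives pure L1, 0.0 gives pure L2.
    """
    l1_norm = sum(abs(w) for w in weights)
    l2_norm_squared = sum(w * w for w in weights)
    return reg_param * (
        elastic_net_param * l1_norm
        + (1.0 - elastic_net_param) / 2.0 * l2_norm_squared
    )


if __name__ == "__main__":
    w = [3.0, -4.0]
    for alpha in (0.0, 0.5, 1.0):
        print(alpha, elastic_net_penalty(w, reg_param=0.1, elastic_net_param=alpha))
```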

NOTE: While the runtime is compatible with Spark versions >= 3.3, some scripts in python/tests/ are not compatible with Spark 3.3. This is addressed in 23.12.

pip package available at https://pypi.org/project/spark-rapids-ml/23.10.0/

v23.08.0 release

13 Sep 05:48
5dab107

Release Notes:

  • GPU accelerated distributed Logistic Regression with L2 regularization fit and transform, along with benchmarking and Jupyter notebook examples.
  • GPU accelerated distributed Uniform Manifold Approximation and Projection (UMAP) fit and transform for non-linear dimensionality reduction along with benchmarking and Jupyter notebook examples.
  • Stage level scheduling for training on stand-alone clusters.
  • Improved logging.
  • Preserve input column types during transform.
  • Default to float32 inputs to cuML layer.
  • Support conversion of GPU Logistic Regression models to PySpark ML CPU models.
  • Improved local benchmarking script.
  • Updated RAPIDS and RAPIDS Accelerator for Spark dependencies to 23.08.

pip package available at https://pypi.org/project/spark-rapids-ml/23.8.0/
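The release tag reads v23.08.0 while the PyPI URL uses 23.8.0. Python package version normalization (PEP 440) drops leading zeros from numeric segments, so both spellings identify the same release. A minimal sketch of that mapping follows; `pypi_version_from_tag` is a hypothetical helper that only handles plain dotted numeric tags like the ones on this page.

```python
def pypi_version_from_tag(tag: str) -> str:
    """Map a release tag such as 'v23.08.0' to its normalized form.

    Hypothetical helper: strips an optional leading 'v' and removes
    leading zeros from each numeric segment. Pre-release or local
    version suffixes are not handled.
    """
    version = tag[1:] if tag.startswith("v") else tag
    return ".".join(str(int(part)) for part in version.split("."))


if __name__ == "__main__":
    for tag in ("v23.08.0", "v23.12.0", "v24.02.0"):
        print(tag, "->", pypi_version_from_tag(tag))
```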

v23.06.0 release

13 Jul 07:25
04dffdf

Release Notes:

  • GPU accelerated CrossValidator for RandomForestClassifier, RandomForestRegressor and LinearRegression, with example notebook
  • Support for CUDA unified virtual memory to allow over-subscription of GPU memory
  • Benchmarking scripts and instructions for AWS EMR
  • Distributed synthetic data generation
  • RandomForest example notebooks
  • Support Spark ML parameters in constructors
  • Improved API docs
  • Updated RAPIDS dependencies to 23.06

pip package available at https://pypi.org/project/spark-rapids-ml/23.6.0/

v23.04.0 release

03 May 19:03
b251734

This release includes:

  • Getting started guide and benchmarking scripts on GCP Dataproc
  • Getting started guide on AWS EMR
  • cpu() method to convert Spark RAPIDS ML generated models to Spark ML models
  • Eliminating the need for CUDA on the driver node
  • Example notebook for k-NN
  • Spark 3.4 compatibility
  • Updating RAPIDS dependencies to 23.04

pip package available at https://pypi.org/project/spark-rapids-ml/23.4.0/

v23.02.0 release

03 Apr 01:09
ab575bc

Added GPU-accelerated PySpark-compatible APIs for the following algorithms:

  • K-Means
  • k-NN
  • LinearRegression
  • PCA
  • RandomForestClassifier
  • RandomForestRegressor

Pip package: https://pypi.org/project/spark-rapids-ml/

v22.02.0 release

22 Feb 07:49
9562e97

New functionality and performance improvements for this release include:

  • Refactor PCA training to leverage spark-rapids plugin.
  • Move SVD computation from Driver to Executor.
  • Optimize PCA API.
  • Fix a bug when training on large datasets.

v21.10.0 release

17 Dec 06:41
f445f8b

New functionality and performance improvements for this release include:

  • Leverage spark-rapids plugin to speed up the PCA transform process
  • Link some CUDA libraries statically to avoid multiple jars for different environments

v21.10.0 release

08 Nov 07:24
ca5edb8
Tag for release version v21.10.0