Releases: NVIDIA/spark-rapids-ml
v24.02.0 release
Release notes:
- Support feature standardization in logistic regression for dense vectors.
- Add large scale synthetic sparse data generation for logistic regression testing.
- Fix tol=0 in KMeans.
- Add sparse vectors to logistic regression notebook example.
- Update RAPIDS dependencies to 24.02.
- Known Issue: RandomForest training will throw an exception if the label column takes on only a single value. This will be fixed in 24.04.
pip package available at https://pypi.org/project/spark-rapids-ml/24.02.0/
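For readers unfamiliar with the term, feature standardization on dense vectors can be sketched in plain Python as below. This is a generic illustration, not the library's implementation: the `standardize` helper is hypothetical, and both the centering step and the use of the sample standard deviation (dividing by n - 1) are assumptions that these notes do not confirm.

```python
import math


def standardize(rows):
    """Return rows with each column shifted to zero mean and scaled to unit
    sample standard deviation. Hypothetical helper for illustration only."""
    n = len(rows)          # number of samples; must be at least 2
    dim = len(rows[0])     # number of features per dense vector
    means = [sum(r[j] for r in rows) / n for j in range(dim)]
    stds = [
        math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
        for j in range(dim)
    ]
    # Constant columns (std == 0) are mapped to 0.0 to avoid division by zero.
    return [
        [(r[j] - means[j]) / stds[j] if stds[j] > 0 else 0.0 for j in range(dim)]
        for r in rows
    ]


# Two features on very different scales end up on the same scale.
features = [[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]]
scaled = standardize(features)
```

Putting features on a common scale keeps a regularization penalty from weighing coefficients differently just because their features use different units.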
v23.12.0 release
Release notes:
- Match Spark's logistic regression fit behavior when data set has only one label value.
- Support sparse vector based computations through cuML layer in logistic regression fit, transform, and cross validation.
- Update Dataproc benchmark script.
- Update Azure Databricks instructions.
- Update RAPIDS dependencies to 23.12.
pip package available at https://pypi.org/project/spark-rapids-ml/23.12.0/
v23.10.0 release
Release Notes:
- L1 and elastic net regularization for GPU accelerated distributed LogisticRegression, with notebook example.
- More than 2 classes for GPU accelerated distributed LogisticRegression, with notebook example.
- Optimized fitMultiple API for LogisticRegression.
- Accelerated cross validation for LogisticRegression and log loss.
- Output raw prediction column for logistic regression.
- Updated Databricks init scripts and benchmarking scripts.
- Improved API docs.
- Updated RAPIDS dependencies to 23.10.
NOTE: While the runtime is compatible with Spark versions >= 3.3, some scripts in python/tests/ are not compatible with Spark 3.3. This is addressed in 23.12.
pip package available at https://pypi.org/project/spark-rapids-ml/23.10.0/
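As a rough illustration of what L1 and elastic net regularization add to the training objective, the sketch below computes the penalty term using the convention documented for Spark ML's regParam and elasticNetParam. It assumes the GPU implementation follows that same convention, which these notes do not state. The `elastic_net_penalty` helper is hypothetical and not part of any library.

```python
def elastic_net_penalty(weights, reg_param, elastic_net_param):
    """Penalty = reg_param * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2),
    where alpha is elastic_net_param in [0, 1]. Hypothetical helper."""
    l1 = sum(abs(w) for w in weights)       # L1 norm of the coefficients
    l2_sq = sum(w * w for w in weights)     # squared L2 norm of the coefficients
    alpha = elastic_net_param
    return reg_param * (alpha * l1 + (1.0 - alpha) / 2.0 * l2_sq)


weights = [0.5, -1.0, 0.0]
pure_l1 = elastic_net_penalty(weights, reg_param=0.1, elastic_net_param=1.0)
pure_l2 = elastic_net_penalty(weights, reg_param=0.1, elastic_net_param=0.0)
mixed = elastic_net_penalty(weights, reg_param=0.1, elastic_net_param=0.5)
```

Setting elastic_net_param to 1.0 gives a pure L1 penalty, 0.0 gives a pure L2 penalty, and values in between blend the two.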
v23.08.0 release
Release Notes:
- GPU accelerated distributed Logistic Regression with L2 regularization fit and transform, along with benchmarking and Jupyter notebook examples.
- GPU accelerated distributed Uniform Manifold Approximation and Projection (UMAP) fit and transform for non-linear dimensionality reduction along with benchmarking and Jupyter notebook examples.
- Stage level scheduling for training on stand-alone clusters.
- Improved logging.
- Preserve input column types during transform.
- Default to float32 inputs to cuML layer.
- Support conversion of GPU Logistic Regression models to PySpark ML CPU models.
- Improved local benchmarking script.
- Updated RAPIDS and RAPIDS Accelerator for Spark dependencies to 23.08.
pip package available at https://pypi.org/project/spark-rapids-ml/23.8.0/
v23.06.0 release
Release Notes:
- GPU accelerated CrossValidator for RandomForestClassifier, RandomForestRegressor and LinearRegression, with example notebook
- Support for CUDA unified virtual memory to allow over-subscription of GPU memory
- Benchmarking scripts and instructions for AWS EMR
- Distributed synthetic data generation
- RandomForest example notebooks
- Support Spark ML parameters in constructors
- Improved API docs
- Updated RAPIDS dependencies to 23.06
pip package available at https://pypi.org/project/spark-rapids-ml/23.6.0/
v23.04.0 release
This release includes:
- Getting started guide and benchmarking scripts on GCP Dataproc
- Getting started guide on AWS EMR
- A cpu() method to convert Spark RAPIDS ML generated models to Spark ML models
- Eliminating the need for CUDA on the driver node
- Example notebook for k-NN
- Spark 3.4 compatibility
- Updating RAPIDS dependencies to 23.04
pip package available at https://pypi.org/project/spark-rapids-ml/23.4.0/
v23.02.0 release
Added GPU-accelerated PySpark-compatible APIs for the following algorithms:
- K-Means
- k-NN
- LinearRegression
- PCA
- RandomForestClassifier
- RandomForestRegressor
Pip package: https://pypi.org/project/spark-rapids-ml/
v22.02.0 release
New functionality and performance improvements for this release include:
- Refactor PCA training to leverage spark-rapids plugin.
- Move SVD computation from Driver to Executor.
- Optimize PCA API.
- Fix a bug when training on a large dataset.
v21.10.0 release
New functionality and performance improvements for this release include:
- Leverage spark-rapids plugin to speed up the PCA transform process
- Link some CUDA libraries statically to avoid needing multiple jars for different environments
v21.10.0 release
Tag for release version v21.10.0