TypeError: __init__() got an unexpected keyword argument 'maxIter' #365

Open
johnnyzhon opened this issue Aug 14, 2023 · 3 comments

johnnyzhon commented Aug 14, 2023

environment:
Python 3.9, Linux (Ubuntu 22.04), Spark 3.4.0, spark-rapids-ml branch 23.08

issue:
The statement "lr = LogisticRegression(num_workers=gpu_number, maxIter=10, regParam=0.01)" raises the error below when LogisticRegression is imported from spark_rapids_ml.classification.
In contrast, the statement "lr = LogisticRegression(maxIter=10, regParam=0.01)" succeeds when LogisticRegression is imported from pyspark.ml.classification.
The two APIs do not appear to be compatible here, so I am filing this issue to track it.

error:
    def test_e2e_simple(gpu_number: int) -> None:
        """e2e general training and transforming test"""
        gpu_number = min(gpu_number, 2)

        with CleanSparkSession() as spark:
            training = spark.createDataFrame([
                (1.0, Vectors.dense([0.0, 1.1, 0.1])),
                (0.0, Vectors.dense([2.0, 1.0, -1.0])),
                (0.0, Vectors.dense([2.0, 1.3, 1.0])),
                (1.0, Vectors.dense([0.0, 1.2, -0.5]))], ["label", "features"])
            lr = LogisticRegression(num_workers=gpu_number, maxIter=10, regParam=0.01)

E   TypeError: __init__() got an unexpected keyword argument 'maxIter'

testsuites/test_logistic_regression.py:73: TypeError
======================================== short test summary info =========================================
FAILED testsuites/test_logistic_regression.py::test_e2e_simple - TypeError: __init__() got an unexpected keyword argument 'maxIter'
====================================== 1 failed, 8 passed in 4.67s =======================================

Collaborator

lijinf2 commented Aug 14, 2023

Thank you, Johnny, for reporting this issue!

LogisticRegression requires cuml 23.08 or above, so please try upgrading cuml to 23.08. Note that only fit is currently supported; transform is under review in #363.

The CI has been upgraded to use cuml 23.08, but other parts of spark-rapids-ml (e.g. the installation instructions) still use 23.06. All of spark-rapids-ml should move to cuml 23.08 once cuml 23.08 is officially released.
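
To confirm which cuml version an environment actually has, a small check along these lines may help. It is only a sketch using the standard library's importlib.metadata; the helper names (installed_version, major_minor, meets_minimum) are made up for illustration and are not part of spark-rapids-ml or cuml. It assumes version strings start with "major.minor" as plain integers, so "23.8.0" (as pip reports it) and "23.08" compare as equal.

```python
from importlib import metadata
from typing import Optional, Tuple


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def major_minor(version: str) -> Tuple[int, int]:
    """Parse the leading 'major.minor' of a version string into integers.

    Assumption: both components are plain integers ("23.08" and "23.8.0" both give (23, 8)).
    """
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def meets_minimum(version: Optional[str], minimum: str) -> bool:
    """True if `version` is present and its major.minor is at least `minimum`."""
    if version is None:
        return False
    return major_minor(version) >= major_minor(minimum)


if __name__ == "__main__":
    found = installed_version("cuml")
    print(f"cuml installed: {found}, meets 23.08: {meets_minimum(found, '23.08')}")
```

A passing version check only rules out an outdated cuml; it says nothing about which constructor arguments the spark-rapids-ml branch in use accepts.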

Author

johnnyzhon commented Aug 15, 2023

cuml in my test environment is already 23.08. Details are below.

(base) root@93b04b04d66d:/usr/workspace/spark-rapids-ml/python# pip list|grep cuda
cuda-python 12.2.0
dask-cuda 23.8.0
(base) root@93b04b04d66d:/usr/workspace/spark-rapids-ml/python#
(base) root@93b04b04d66d:/usr/workspace/spark-rapids-ml/python# pip list|grep cuml
cuml 23.8.0
(base) root@93b04b04d66d:/usr/workspace/spark-rapids-ml/python# pip list|grep cupy
cupy 12.1.0
(base) root@93b04b04d66d:/usr/workspace/spark-rapids-ml/python# nvidia-smi
Tue Aug 15 00:13:21 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02 Driver Version: 530.30.02 CUDA Version: 12.1 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |

Collaborator

lijinf2 commented Aug 15, 2023

Got it. spark-rapids-ml LogisticRegression does not support init arguments (e.g. maxIter) yet. That needs PR #363 to be merged.
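
For anyone hitting the same TypeError in the meantime, the sketch below shows the general Python mechanism and one way to see in advance which keyword arguments a constructor declares. StandInEstimator is a hypothetical class written for this example, not the spark_rapids_ml implementation, and split_supported_kwargs is likewise an illustrative helper. The check relies on inspect.signature, so it cannot say anything useful about constructors that take **kwargs and validate the names internally.

```python
import inspect
from typing import Any, Callable, Dict, Tuple


class StandInEstimator:
    """Hypothetical stand-in, NOT the spark_rapids_ml class: it declares only num_workers."""

    def __init__(self, num_workers: int = 1) -> None:
        self.num_workers = num_workers


def split_supported_kwargs(
    ctor: Callable[..., Any], kwargs: Dict[str, Any]
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split kwargs into those the constructor declares by name and those it does not."""
    params = inspect.signature(ctor).parameters
    # A **kwargs catch-all accepts any name, so the signature alone cannot reject anything.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs), {}
    by_name = {
        name
        for name, p in params.items()
        if p.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD, inspect.Parameter.KEYWORD_ONLY)
    }
    supported = {k: v for k, v in kwargs.items() if k in by_name}
    rejected = {k: v for k, v in kwargs.items() if k not in by_name}
    return supported, rejected


wanted = {"num_workers": 2, "maxIter": 10, "regParam": 0.01}
supported, rejected = split_supported_kwargs(StandInEstimator, wanted)

# Passing only the declared arguments avoids the TypeError;
# the rejected ones (here maxIter and regParam) are simply not applied.
estimator = StandInEstimator(**supported)
```

Dropping arguments this way changes what the model is trained with, so it is a diagnostic aid rather than a fix; the actual fix is constructor support for these parameters in the library.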
