there is hardcoding in logistic regression test case when compare model coefficients #402

johnnyzhon · 2023-08-24T07:31:45Z

https://github.com/NVIDIA/spark-rapids-ml/blob/branch-23.08/python/tests/test_logistic_regression.py#L306C9-L306C68
assert array_equal(coefficients, [-2.42377087, 2.42377087])

two concerns:

where did this basis array [-2.42377087, 2.42377087] come from?
Is that making test case difficult to maintain?

lijinf2 · 2023-08-24T19:46:16Z

It is the output of cuml single-gpu class. Copying it here to ensure our code (multiple-GPU class) does not break the output.
Seemed fine for now but would like to learn about potential risk or if there is a better way.

johnnyzhon · 2023-08-25T01:25:40Z

Thanks for @lijinf2 's information.
Shall we compare the coefficients between gpu(spark_rapids_ml) model and cpu(pyspark) model?
Is there a way to achieve that?

lijinf2 · 2023-08-25T02:28:20Z

Yeah, there is a way. In fact, that is achieved in the test_compat, which compares the coefficients between GPU model and CPU model on VectorUDT input type. Currently, every algorithm has a similar "test_compat" test case that ensures the compatibility with CPU on the VectorUDT type. I may be wrong but it seems CPU algorithm accepts VectorUDT input type only.

test_toy_example uses array input type, because GPU does not restrict to VectorUDT type. Spark rapids ml supports three types, i.e. VectorUDT, array, multi-cols. Relevant information is in the test_classifier.

test_toy_example demonstrates a use case of using array input type.

lijinf2 · 2023-08-25T02:30:45Z

Perhaps we should move test_compat up as the first test case in the file test_logistic_regression.py. Let me know if this looks better from a end user point of view.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

there is hardcoding in logistic regression test case when compare model coefficients #402

there is hardcoding in logistic regression test case when compare model coefficients #402

johnnyzhon commented Aug 24, 2023

lijinf2 commented Aug 24, 2023

johnnyzhon commented Aug 25, 2023

lijinf2 commented Aug 25, 2023 •

edited

Loading

lijinf2 commented Aug 25, 2023 •

edited

Loading

there is hardcoding in logistic regression test case when compare model coefficients #402

there is hardcoding in logistic regression test case when compare model coefficients #402

Comments

johnnyzhon commented Aug 24, 2023

lijinf2 commented Aug 24, 2023

johnnyzhon commented Aug 25, 2023

lijinf2 commented Aug 25, 2023 • edited Loading

lijinf2 commented Aug 25, 2023 • edited Loading

lijinf2 commented Aug 25, 2023 •

edited

Loading

lijinf2 commented Aug 25, 2023 •

edited

Loading