Contributor

@zhengruifeng zhengruifeng commented Feb 10, 2025

What changes were proposed in this pull request?

Pin plotly<6.0.0 and torch<2.6.0
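
As a rough illustration of what these two upper bounds allow, here is a minimal sketch. The helper names `parse_version` and `below_upper_bound` are hypothetical and only handle plain dotted numeric versions; real installers follow the full PEP 440 rules (pre-releases, post-releases, and so on), which this sketch does not attempt to cover.

```python
# Hypothetical helpers for illustration only; not part of Spark or pip.

def parse_version(text: str) -> tuple[int, ...]:
    """Turn a plain dotted version such as '5.24.1' into (5, 24, 1)."""
    return tuple(int(part) for part in text.split("."))


def below_upper_bound(installed: str, upper_bound: str) -> bool:
    """Return True if `installed` satisfies a strict '<upper_bound' pin."""
    return parse_version(installed) < parse_version(upper_bound)


# Upper bounds taken from this PR's title.
PINS = {"plotly": "6.0.0", "torch": "2.6.0"}

if __name__ == "__main__":
    # Example versions chosen for illustration.
    for package, version in [("plotly", "5.24.1"), ("plotly", "6.0.0"), ("torch", "2.5.1")]:
        allowed = below_upper_bound(version, PINS[package])
        print(f"{package}=={version} allowed: {allowed}")
```

Comparing tuples of integers avoids the string-comparison trap where "5.24.1" would sort before "5.3.0".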

Why are the changes needed?

The latest plotly 6.0 release has caused many plot-related test failures.

Does this PR introduce any user-facing change?

no

How was this patch tested?

manually checked with

python/run-tests -k --python-executables python3 --testnames 'pyspark.sql.tests.connect.test_parity_frame_plot_plotly FramePlotPlotlyParityTests.test_pie_plot'

before:

(spark_312) ➜  spark git:(pin_plotly) python/run-tests -k --python-executables python3 --testnames 'pyspark.sql.tests.connect.test_parity_frame_plot_plotly FramePlotPlotlyParityTests.test_pie_plot'
Running PySpark tests. Output is in /Users/ruifeng.zheng/Dev/spark/python/unit-tests.log
Will test against the following Python executables: ['python3']
Will test the following Python tests: ['pyspark.sql.tests.connect.test_parity_frame_plot_plotly FramePlotPlotlyParityTests.test_pie_plot']
python3 python_implementation is CPython
python3 version is: Python 3.12.9
Starting test(python3): pyspark.sql.tests.connect.test_parity_frame_plot_plotly FramePlotPlotlyParityTests.test_pie_plot (temp output: /Users/ruifeng.zheng/Dev/spark/python/target/4d10075d-bb7b-4d4b-b17d-edbef2f22227/python3__pyspark.sql.tests.connect.test_parity_frame_plot_plotly_FramePlotPlotlyParityTests.test_pie_plot__6qxzu16x.log)

Running tests...
----------------------------------------------------------------------
WARNING: Using incubator modules: jdk.incubator.vector
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/conf.py:64: UserWarning: Failed to set spark.connect.execute.reattachable.senderMaxStreamDuration to Some(1s) due to [CANNOT_MODIFY_CONFIG] Cannot modify the value of the Spark config: "spark.connect.execute.reattachable.senderMaxStreamDuration".
See also 'https://spark.apache.org/docs/latest/sql-migration-guide.html#ddl-statements'. SQLSTATE: 46110
  warnings.warn(warn)
/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/connect/conf.py:64: UserWarning: Failed to set spark.connect.execute.reattachable.senderMaxStreamSize to Some(123) due to [CANNOT_MODIFY_CONFIG] Cannot modify the value of the Spark config: "spark.connect.execute.reattachable.senderMaxStreamSize".
See also 'https://spark.apache.org/docs/latest/sql-migration-guide.html#ddl-statements'. SQLSTATE: 46110
  warnings.warn(warn)
  test_pie_plot (pyspark.sql.tests.connect.test_parity_frame_plot_plotly.FramePlotPlotlyParityTests.test_pie_plot) ... FAIL (1.760s)

======================================================================
FAIL [1.760s]: test_pie_plot (pyspark.sql.tests.connect.test_parity_frame_plot_plotly.FramePlotPlotlyParityTests.test_pie_plot)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/tests/plot/test_frame_plot_plotly.py", line 318, in test_pie_plot
    self._check_fig_data(fig["data"][0], **expected_fig_data_sales)
  File "/Users/ruifeng.zheng/Dev/spark/python/pyspark/sql/tests/plot/test_frame_plot_plotly.py", line 81, in _check_fig_data
    self.assertEqual(converted_values, expected_value)
AssertionError: Lists differ: [1517356800000000000, 1519776000000000000,[37 chars]0000] != [datetime.datetime(2018, 1, 31, 0, 0), dat[105 chars], 0)]

First differing element 0:
datetime.datetime(2018, 1, 31, 0, 0)

- [1517356800000000000,
-  1519776000000000000,
-  1522454400000000000,
-  1525046400000000000]
+ [datetime.datetime(2018, 1, 31, 0, 0),
+  datetime.datetime(2018, 2, 28, 0, 0),
+  datetime.datetime(2018, 3, 31, 0, 0),
+  datetime.datetime(2018, 4, 30, 0, 0)]

----------------------------------------------------------------------
Ran 1 test in 5.573s

FAILED (failures=1)

Generating XML reports...
Generated XML report: target/test-reports/TEST-pyspark.sql.tests.connect.test_parity_frame_plot_plotly.FramePlotPlotlyParityTests-20250210120410.xml

Had test failures in pyspark.sql.tests.connect.test_parity_frame_plot_plotly FramePlotPlotlyParityTests.test_pie_plot with python3; see logs.
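
The integers on the left-hand side of the diff above appear to be epoch timestamps in nanoseconds: converting them yields exactly the dates the test expects. The sketch below only demonstrates that correspondence. `ns_to_datetime` is a hypothetical helper, not something from PySpark or plotly, and it makes no claim about how either library converts values internally.

```python
from datetime import datetime, timedelta

# Naive (timezone-unaware) Unix epoch, matching the naive datetimes in the test output.
EPOCH = datetime(1970, 1, 1)


def ns_to_datetime(ns: int) -> datetime:
    """Convert epoch nanoseconds to a naive datetime (hypothetical helper).

    Integer division keeps the arithmetic exact; sub-microsecond
    precision is dropped because datetime cannot represent it.
    """
    return EPOCH + timedelta(microseconds=ns // 1000)


# Values copied from the failing assertion above.
observed = [
    1517356800000000000,
    1519776000000000000,
    1522454400000000000,
    1525046400000000000,
]

if __name__ == "__main__":
    for value in observed:
        print(value, "->", ns_to_datetime(value))
```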

after:

(spark_312) ➜  spark git:(pin_plotly) python/run-tests -k --python-executables python3 --testnames 'pyspark.sql.tests.connect.test_parity_frame_plot_plotly FramePlotPlotlyParityTests.test_pie_plot'
Running PySpark tests. Output is in /Users/ruifeng.zheng/Dev/spark/python/unit-tests.log
Will test against the following Python executables: ['python3']
Will test the following Python tests: ['pyspark.sql.tests.connect.test_parity_frame_plot_plotly FramePlotPlotlyParityTests.test_pie_plot']
python3 python_implementation is CPython
python3 version is: Python 3.12.9
Starting test(python3): pyspark.sql.tests.connect.test_parity_frame_plot_plotly FramePlotPlotlyParityTests.test_pie_plot (temp output: /Users/ruifeng.zheng/Dev/spark/python/target/28a7ea8a-6249-4b16-9831-d785b1af2254/python3__pyspark.sql.tests.connect.test_parity_frame_plot_plotly_FramePlotPlotlyParityTests.test_pie_plot__eezgr0hf.log)
Finished test(python3): pyspark.sql.tests.connect.test_parity_frame_plot_plotly FramePlotPlotlyParityTests.test_pie_plot (5s)
Tests passed in 5 seconds

Was this patch authored or co-authored using generative AI tooling?

no

@zhengruifeng
Contributor Author

Also cc @cloud-fan: we probably need to pin plotly in the release Docker image.

Contributor

@LuciferYang LuciferYang left a comment

+1, LGTM

@zhengruifeng
Contributor Author

The update to the Dockerfile triggered a refresh of the cache, so torch was also upgraded, which caused the following failure:

======================================================================
ERROR [42.116s]: test_save_load (pyspark.ml.tests.connect.test_connect_classification.ClassificationTestsOnConnect.test_save_load)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/__w/spark/spark/python/pyspark/ml/tests/connect/test_legacy_mode_classification.py", line 185, in test_save_load
    lor_torch_model = torch.load(
                      ^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. 
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL torch.nn.modules.container.Sequential was not an allowed global by default. Please use `torch.serialization.add_safe_globals([Sequential])` or the `torch.serialization.safe_globals([Sequential])` context manager to allowlist this global if you trust this class/function.

Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
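
The error above comes from an allowlist: with `weights_only=True`, only approved globals may be unpickled. As a rough standard-library analogy (not PyTorch's actual implementation), the sketch below restricts `pickle` in the same spirit by overriding `find_class`. The names `AllowlistUnpickler` and `restricted_loads` are hypothetical.

```python
import io
import pickle
from collections import OrderedDict


class AllowlistUnpickler(pickle.Unpickler):
    """Unpickler that refuses any global not explicitly allowed (illustrative)."""

    def __init__(self, file, allowed):
        super().__init__(file)
        self._allowed = set(allowed)

    def find_class(self, module, name):
        # Called whenever the pickle stream references a class or function.
        if (module, name) in self._allowed:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"Unsupported global: {module}.{name}")


def restricted_loads(data, allowed=()):
    """Load pickled bytes, permitting only the (module, name) pairs in `allowed`."""
    return AllowlistUnpickler(io.BytesIO(data), allowed).load()


if __name__ == "__main__":
    payload = pickle.dumps(OrderedDict(a=1))
    try:
        restricted_loads(payload)  # OrderedDict is not on the allowlist
    except pickle.UnpicklingError as exc:
        print("rejected:", exc)
    # Explicitly trusting the class lets the same payload load.
    print(restricted_loads(payload, [("collections", "OrderedDict")]))
```

Pinning torch<2.6.0, as this PR does, keeps the old `weights_only=False` default in place instead of changing the test to allowlist `Sequential`.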

@zhengruifeng zhengruifeng changed the title from "[SPARK-51143][PYTHON] Pin plotly==5.24.1" to "[SPARK-51143][PYTHON] Pin plotly<6.0.0 and torch<2.6.0" on Feb 10, 2025
@LuciferYang LuciferYang marked this pull request as draft February 10, 2025 08:49
@LuciferYang
Contributor

Set to draft for now to avoid an accidental merge while the build is still failing.

Member

@dongjoon-hyun dongjoon-hyun left a comment

Thank you for checking this, @zhengruifeng .

@zhengruifeng zhengruifeng marked this pull request as ready for review February 11, 2025 00:06
@HyukjinKwon
Member

Merged to master and branch-4.0.

HyukjinKwon pushed a commit that referenced this pull request Feb 11, 2025
Closes #49863 from zhengruifeng/pin_plotly.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit 22d2eb3)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
@zhengruifeng zhengruifeng deleted the pin_plotly branch February 11, 2025 00:24