[SPARK-41775][PYTHON][ML] Adding support for PyTorch functions #39369

rithwik-db · 2023-01-03T18:35:18Z

NOTE: To view only the diff specific to this PR, without the baseline changes from the other WIP PR, look at the LAST COMMIT in this PR's commit history (titled "WIP adding notebook functionality"). Because I am sending out related PRs in parallel, that commit is the one that shows the diff pertaining to this ticket.

What changes were proposed in this pull request?

This PR builds on #39299 by adding support for passing functions as the input for distributed training. Users would follow the first workflow in the design document to run training.

Why are the changes needed?

We want to make it easier for users to run distributed training from a notebook setting.

Does this PR introduce any user-facing change?

Users can now pass a training function to the PyTorchDistributor().run(...) API. This will require substantial additional documentation, though: because we use cloudpickle internally to run training, the user's train() function must be picklable.
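To illustrate the picklability requirement, here is a minimal sketch (not code from this PR). It shows that the standard `pickle` module refuses a nested training function, while cloudpickle can serialize it by value. The names `make_train_fn`, `train`, and `stdlib_can_pickle` are hypothetical, and the sketch assumes either PySpark's bundled `pyspark.cloudpickle` or the standalone `cloudpickle` package is installed; if neither is, the round trip is skipped.

```python
import pickle

# Prefer PySpark's bundled cloudpickle; fall back to the standalone package.
# Assumption: at least one of the two is installed. Otherwise we skip the round trip.
try:
    from pyspark import cloudpickle
except ImportError:
    try:
        import cloudpickle
    except ImportError:
        cloudpickle = None


def make_train_fn(learning_rate):
    """Hypothetical factory: returns a train() function that closes over a hyperparameter."""
    def train(epochs):
        # Stand-in for real training logic; just combines the inputs.
        return learning_rate * epochs
    return train


def stdlib_can_pickle(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


train = make_train_fn(0.5)

# Standard pickle serializes functions by reference (module + qualified name),
# so a function defined inside another function cannot be pickled this way.
stdlib_ok = stdlib_can_pickle(train)

# cloudpickle serializes the function body and its closure by value;
# the resulting bytes can be loaded with the standard pickle module.
roundtrip_result = None
if cloudpickle is not None:
    restored = pickle.loads(cloudpickle.dumps(train))
    roundtrip_result = restored(4)
```

A quick round trip like this is one way for a user to check, before launching a distributed run, that a train() function and everything it captures can be serialized.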

How was this patch tested?

This is a WIP PR so it hasn't been tested yet.

AmplabJenkins · 2023-01-04T04:20:44Z

Can one of the admins verify this patch?

python/pyspark/ml/torch/distributor.py

WeichenXu123 · 2023-01-19T04:23:48Z

overall good.

python/pyspark/ml/torch/distributor.py

python/pyspark/ml/torch/tests/test_distributor.py

HyukjinKwon · 2023-01-23T08:06:31Z

Merged to master.

dongjoon-hyun · 2023-01-24T09:09:44Z

python/pyspark/ml/torch/distributor.py

@@ -15,16 +15,21 @@
 # limitations under the License.
 #

+import cloudpickle  # type: ignore


This should be our cloudpickle.

dongjoon-hyun

Hi, @rithwik-db , @WeichenXu123 , @HyukjinKwon .
Apache Spark should not require two versions of cloudpickle at the same time.
I made a PR, #39715.

…`cloudpickle`

### What changes were proposed in this pull request?

This is a follow-up of #39369 which aims to use `pyspark.cloudpickle` instead of outside `cloudpickle` dependency.

### Why are the changes needed?

Apache PySpark should not use two versions of `cloudpickle`.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

Closes #39715 from dongjoon-hyun/SPARK-41775.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>

### What changes were proposed in this pull request?

This is a follow-up of #39369 which aims to fix `stderr` rerouting to `stdout`, which is required for users to see any errors that come up during distributed training.

### Why are the changes needed?

Previously, the error would be lost since it was never returned to the user. This follow-up fixes that issue.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

N/A

Closes #39724 from rithwik-db/sterr-followup.

Authored-by: Rithwik Ediga Lakhamsani <rithwik.ediga@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
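For readers unfamiliar with the pattern, the following is a minimal sketch of rerouting a child process's `stderr` into `stdout` so that error output reaches the caller. It is an illustration using the standard `subprocess` module, not the code from #39724; the helper name `run_and_capture` and the toy child command are hypothetical.

```python
import subprocess
import sys


def run_and_capture(cmd):
    """Run cmd, merging the child's stderr into stdout, and return (exit code, output lines)."""
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # reroute stderr into the same pipe as stdout
        text=True,
    )
    lines = []
    for line in proc.stdout:
        # A real launcher could also echo each line to the user as it arrives.
        lines.append(line.rstrip("\n"))
    proc.wait()
    return proc.returncode, lines


# Toy child process: writes to stdout, then to stderr, then exits with a failure code.
child = [
    sys.executable,
    "-c",
    "import sys; print('step 1'); sys.stderr.write('boom\\n'); sys.exit(3)",
]
returncode, output = run_and_capture(child)
```

Without `stderr=subprocess.STDOUT`, the message written to `stderr` would not appear in the captured output, which matches the "error would be lost" symptom described above. The relative order of the two streams in the merged output depends on the child's buffering, so it should not be relied on.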

github-actions bot added BUILD CORE ML PYTHON labels Jan 3, 2023

rithwik-db force-pushed the pytorch-functions branch 4 times, most recently from 9becf31 to 0d169aa Compare January 19, 2023 01:00

WeichenXu123 reviewed Jan 19, 2023

python/pyspark/ml/torch/distributor.py

WeichenXu123 reviewed Jan 19, 2023

python/pyspark/ml/torch/distributor.py

rithwik-db force-pushed the pytorch-functions branch from 0d169aa to c6b96a5 Compare January 19, 2023 19:48

rithwik-db mentioned this pull request Jan 19, 2023

[SPARK-41777][PYSPARK][ML] Integration testing for TorchDistributor #39637

Closed

rithwik-db changed the title ~~[WIP][SPARK-41775][PYTHON][ML] Adding support for PyForch functions~~ [SPARK-41775][PYTHON][ML] Adding support for PyForch functions Jan 19, 2023

rithwik-db force-pushed the pytorch-functions branch 2 times, most recently from 2d1ad9f to ad662be Compare January 19, 2023 20:28

WeichenXu123 reviewed Jan 20, 2023

python/pyspark/ml/torch/distributor.py (outdated)

WeichenXu123 approved these changes Jan 20, 2023

WeichenXu123 reviewed Jan 20, 2023

python/pyspark/ml/torch/tests/test_distributor.py (outdated)

srowen changed the title ~~[SPARK-41775][PYTHON][ML] Adding support for PyForch functions~~ [SPARK-41775][PYTHON][ML] Adding support for PyTorch functions Jan 20, 2023

rithwik-db force-pushed the pytorch-functions branch from ad662be to 5f10396 Compare January 20, 2023 17:54

lu-wang-dl reviewed Jan 20, 2023

python/pyspark/ml/torch/tests/test_distributor.py (outdated)

rithwik-db force-pushed the pytorch-functions branch from 5f10396 to d8715ba Compare January 20, 2023 17:59

lu-wang-dl approved these changes Jan 20, 2023

rithwik-db force-pushed the pytorch-functions branch from 36a9a00 to 177b291 Compare January 20, 2023 22:31

rithwik-db added 4 commits January 20, 2023 23:26

Added logging

ac2cd5a

Added changes discussed in call

fd5f3ba

Added notebook functionality and added integration tests

f48e4a3

fix formatting

50b0baa

rithwik-db added 3 commits January 20, 2023 23:28

Addessed comments discussed in meeting

a81b625

addressed minor comments + formatting changes

a2659a6

minor linting

825e986

rithwik-db force-pushed the pytorch-functions branch from 37df2c6 to 691fc3b Compare January 21, 2023 07:34

github-actions bot removed the BUILD label Jan 21, 2023

contextmanager confusion

71f78f7

rithwik-db force-pushed the pytorch-functions branch from 691fc3b to 71f78f7 Compare January 22, 2023 20:04

fixed get_port

73a62f6

rithwik-db force-pushed the pytorch-functions branch from 41aebc2 to 73a62f6 Compare January 23, 2023 03:39

HyukjinKwon closed this in ea5be38 Jan 23, 2023

dongjoon-hyun reviewed Jan 24, 2023

dongjoon-hyun mentioned this pull request Jan 24, 2023

[SPARK-41775][PYTHON][FOLLOWUP] Use pyspark.cloudpickle instead of cloudpickle #39715

Closed

rithwik-db mentioned this pull request Jan 24, 2023

[SPARK-41775][PYTHON][FOLLOWUP] Fix stdout rerouting #39724

Closed
