Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-41775][PYTHON][ML] Adding support for PyTorch functions #39369

Closed
wants to merge 9 commits into from

Conversation

rithwik-db
Copy link
Contributor

NOTE: If you want to only view the diff from the other WIP PR regarding the baseline changes, look at the LAST COMMIT in this PR's commit history (titled "WIP adding notebook functionality"). Since I am sending out parallel PRs that are related, you should view this commit to see the diff pertaining to this ticket.

What changes were proposed in this pull request?

This is an addition to #39299 to add support for using functions as the input for distributed training. The users would follow the first workflow in the design document to run training.

Why are the changes needed?

We want to make it easier for users to run distributed training from a notebook setting.

Does this PR introduce any user-facing change?

Users can now input a training function into the PyTorchDistributor().run(...) api. This will require a lot of additional documentation though since since we are internally using cloudpickle to run training so we need to be able to have the user's train() function be picklable.

How was this patch tested?

This is a WIP PR so it hasn't been tested yet.

@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@rithwik-db rithwik-db force-pushed the pytorch-functions branch 4 times, most recently from 9becf31 to 0d169aa Compare January 19, 2023 01:00
@WeichenXu123
Copy link
Contributor

overall good.

@rithwik-db rithwik-db changed the title [WIP][SPARK-41775][PYTHON][ML] Adding support for PyForch functions [SPARK-41775][PYTHON][ML] Adding support for PyForch functions Jan 19, 2023
@rithwik-db rithwik-db force-pushed the pytorch-functions branch 2 times, most recently from 2d1ad9f to ad662be Compare January 19, 2023 20:28
@srowen srowen changed the title [SPARK-41775][PYTHON][ML] Adding support for PyForch functions [SPARK-41775][PYTHON][ML] Adding support for PyTorch functions Jan 20, 2023
@HyukjinKwon
Copy link
Member

Merged to master.

@@ -15,16 +15,21 @@
# limitations under the License.
#

import cloudpickle # type: ignore
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be our cloudpickle.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 thanks!

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @rithwik-db , @WeichenXu123 , @HyukjinKwon .
Apache Spark should not require two versions of cloudpickle at the same time.
I made a PR, #39715.

dongjoon-hyun added a commit that referenced this pull request Jan 24, 2023
…`cloudpickle`

### What changes were proposed in this pull request?

This is a follow-up of #39369 which aims to use `pyspark.cloudpickle` instead of outside `cloudpickle` dependency.

### Why are the changes needed?

Apache PySpark should not use two versions of `cloudpickle`.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

Closes #39715 from dongjoon-hyun/SPARK-41775.

Authored-by: Dongjoon Hyun <dongjoon@apache.org>
Signed-off-by: Dongjoon Hyun <dongjoon@apache.org>
HyukjinKwon pushed a commit that referenced this pull request Jan 25, 2023
### What changes were proposed in this pull request?

This is a follow-up of #39369 which aims to fix `stderr` rerouting to `stdout` which is required for users seeing any errors that come up due to distributed training,

### Why are the changes needed?

Previously, the error would be lost since it's never returned to the user. We are fixing that issue.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

N/A

Closes #39724 from rithwik-db/sterr-followup.

Authored-by: Rithwik Ediga Lakhamsani <rithwik.ediga@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants