[SPARK-41589][PYTHON][ML] PyTorch Distributor Baseline API Changes#39146

Closed
rithwik-db wants to merge 2 commits into apache:master from rithwik-db:baseline-api

@rithwik-db
Contributor

Just creating a small PR to start progress on the Spark-PyTorch Distributor. This is a WIP project and I left questions and comments to discuss how I will be approaching certain aspects of the code.

What changes were proposed in this pull request?

This just proposes the baseline API for how users will interact with the Spark PyTorch distributor (Design Document).

Why are the changes needed?

The design document's background section goes into more detail about the why.

Does this PR introduce any user-facing change?

Yes, this proposes an API for how users will interact with the PyTorch Distributor. The user workflow is also proposed in that design document.

How was this patch tested?

I just added some basic tests. These will need to be improved to correctly match the style that PySpark requires.

@AmplabJenkins

Can one of the admins verify this patch?

@HyukjinKwon
Member

cc @WeichenXu123 and @mengxr

@rithwik-db force-pushed the baseline-api branch 2 times, most recently from 814820c to 43c2213 (December 23, 2022 21:31)
@zhengruifeng
Contributor

@rithwik-db could you please fix the python lint?

@rithwik-db force-pushed the baseline-api branch 2 times, most recently from 518438f to 3f57b2d (January 6, 2023 06:45)
Contributor

Why do we need to add F403? I saw that other test files only include F401.

Contributor Author

mypy raises errors otherwise

Contributor

I'm also not seeing `# type: ignore` in other test files.

Contributor Author

mypy raises errors otherwise

Contributor

@lu-wang-dl left a comment

In general, LGTM.

@rithwik-db changed the title from "[WIP][SPARK-41589][PYTHON][ML] PyTorch Distributor Baseline API Changes" to "[SPARK-41589][PYTHON][ML] PyTorch Distributor Baseline API Changes" on Jan 9, 2023
@HyukjinKwon
Member

Test failures are not related to this PR.

Merged to master.

HyukjinKwon added a commit that referenced this pull request Jan 11, 2023
…etup.py

### What changes were proposed in this pull request?

This PR is a followup of #39146 that adds `pyspark.ml.torch` to `setup.py`.

### Why are the changes needed?

So that PyPI users can use the `pyspark.ml.torch` package.

### Does this PR introduce _any_ user-facing change?

No, the main change has not been released yet.
It adds the package into PyPI-packaged PySpark.

### How was this patch tested?

The pip packaging test in CI should cover the change.

Closes #39490 from HyukjinKwon/SPARK-41589-followup.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
HyukjinKwon pushed a commit that referenced this pull request Jan 11, 2023
… GPU

### What changes were proposed in this pull request?

This is an addition to #39146 to add support for single node training using PyTorch files. The users would follow the second workflow in the [design document](https://docs.google.com/document/d/1QPO1Ly8WteL6aIPvVcR7Xne9qVtJiB3fdrRn7NwBcpA/edit#heading=h.8yvw9xq428fh) to run training. I added some new utility functions as well as built on top of current functions.
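For readers unfamiliar with single-node training from a PyTorch file, the sketch below shows one plausible way such a launch could be assembled: building a `torch.distributed.run` command line for one machine. This is only an illustration under stated assumptions, not the code added in this change. The helper name `build_local_torchrun_command`, the file name `train.py`, and the `--learning-rate` argument are hypothetical; the `--standalone`, `--nnodes`, and `--nproc_per_node` flags are standard `torch.distributed.run` options.

```python
import sys
from typing import List


def build_local_torchrun_command(
    train_file: str, num_processes: int, *args: str
) -> List[str]:
    """Hypothetical helper (not from the PR): build a single-node
    torch.distributed.run command for a training script."""
    if num_processes < 1:
        raise ValueError("num_processes must be at least 1")
    return [
        sys.executable,                        # current Python interpreter
        "-m",
        "torch.distributed.run",               # PyTorch's elastic launcher
        "--standalone",                        # single node, no external rendezvous
        "--nnodes=1",
        f"--nproc_per_node={num_processes}",   # one worker per process/GPU
        train_file,                            # user's training script (assumed path)
        *args,                                 # arguments forwarded to the script
    ]


# Example: two local worker processes running an assumed train.py.
cmd = build_local_torchrun_command("train.py", 2, "--learning-rate", "1e-3")
print(" ".join(cmd))
```

The command is only constructed here, not executed; running it (for example with `subprocess.run(cmd, check=True)`) would require PyTorch to be installed and the training file to exist.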

### Why are the changes needed?

Look at the [main ticket](https://issues.apache.org/jira/browse/SPARK-41589) for more details.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Some unit tests were added and integration tests will be added in a later PR (https://issues.apache.org/jira/browse/SPARK-41777).

Closes #39188 from rithwik-db/pytorch-file-local-training.

Authored-by: Rithwik Ediga Lakhamsani <rithwik.ediga@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>