[SPARK-44264][ML][PYTHON] Support Distributed Training of Functions Using Deepspeed

### What changes were proposed in this pull request?
Made the `DeepspeedTorchDistributor` `run()` method use the `_run()` function as its backbone.

### Why are the changes needed?
It allows the user to easily run distributed training of a function with Deepspeed.

### Does this PR introduce _any_ user-facing change?
Yes. The user can now pass a function as the `train_object` when calling `DeepspeedTorchDistributor.run()`. The user must have all necessary imports within the function itself, and the function must be picklable. An example use case can be found in the Python file linked in the JIRA ticket.

### How was this patch tested?
Notebook/file linked in the JIRA ticket. Formal e2e tests will come in a future PR.

### Next Steps/Timeline
- [ ] Add more e2e tests for both running a regular PyTorch file and running a function for training
- [ ] Write more documentation

Closes #42067 from mathewjacob1002/add_func_deepspeed.

Authored-by: Mathew Jacob <mathew.jacob@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
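A minimal sketch of the constraint this PR places on `train_object`: the training function must carry all of its imports inside its own body and must survive pickling, since it is serialized and shipped to executors. The `train_fn` below, its parameters, and the commented-out distributor arguments are hypothetical illustrations, not code from this PR; the runnable part only demonstrates the pickle round trip.

```python
import pickle

# The training function must be self-contained: all imports live inside
# the function body, and the function must be picklable so it can be
# serialized and sent to each worker.
def train_fn(learning_rate, num_epochs):
    # Hypothetical body; a real training function would import
    # torch/deepspeed here and return trained artifacts.
    import math
    return math.exp(-learning_rate * num_epochs)

# Verify the function survives a pickle round trip before submitting it.
restored = pickle.loads(pickle.dumps(train_fn))
assert restored(0.1, 10) == train_fn(0.1, 10)

# Illustrative only -- assumes a Spark cluster with GPUs and deepspeed
# installed, and hypothetical argument values:
# from pyspark.ml.deepspeed.deepspeed_distributor import DeepspeedTorchDistributor
# distributor = DeepspeedTorchDistributor(numGpus=2, nnodes=2, useGpu=True)
# output = distributor.run(train_fn, 0.1, 10)
```

If the function closes over non-picklable state (open file handles, module-level models), `run()` will fail at serialization time, which is why the imports-inside-the-function convention is required.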