[SPARK-43081][ML][FOLLOW-UP] Improve torch distributor data loader code #41382

Closed
WeichenXu123 wants to merge 3 commits into apache:master from WeichenXu123:improve-torch-dataloader

@WeichenXu123 (Contributor)

What changes were proposed in this pull request?

Why are the changes needed?

Improve the torch distributor data loader code:

  • Add a check that num_processes matches the number of partitions in the input Spark DataFrame. This makes debugging easier when a user passes a DataFrame with a mismatched partition count; otherwise the torch package raises an obscure error.
  • Improve column value conversion in the torch data loader by avoiding a type comparison for every column value.
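
The two changes above can be illustrated with a minimal, dependency-free sketch. This is not the code from the patch: the function names, the type-name strings, and the converter table are all hypothetical, and the partition count is passed in as a plain integer (in PySpark it could come from something like `df.rdd.getNumPartitions()`). The idea is to fail early with a readable message, and to resolve one converter per column up front instead of inspecting the type for each value.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple


def check_partition_count(num_processes: int, num_partitions: int) -> None:
    """Fail fast when the partition count does not match the process count."""
    if num_partitions != num_processes:
        raise ValueError(
            f"Expected the input DataFrame to have {num_processes} partitions "
            f"(one per training process), but it has {num_partitions}."
        )


# Hypothetical mapping from a column type name to a value converter.
# The real data loader may support a different set of types.
_CONVERTERS: Dict[str, Callable[[Any], Any]] = {
    "int": int,
    "float": float,
    "array<float>": lambda values: [float(v) for v in values],
}


def build_row_converter(
    schema: Sequence[Tuple[str, str]]
) -> Callable[[Sequence[Any]], List[Any]]:
    """Return a function that converts one row using pre-selected converters."""
    # The type lookup happens once per column here, not once per value.
    converters = []
    for name, type_name in schema:
        if type_name not in _CONVERTERS:
            raise ValueError(f"Unsupported type {type_name!r} for column {name!r}")
        converters.append(_CONVERTERS[type_name])

    def convert(row: Sequence[Any]) -> List[Any]:
        # Per-row work is just one function call per column.
        return [conv(value) for conv, value in zip(converters, row)]

    return convert
```

With a schema such as `[("label", "int"), ("features", "array<float>")]`, `build_row_converter` is called once and the returned function is then applied to every row, so the per-value cost no longer includes any type dispatch.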

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit tests.

Signed-off-by: Weichen Xu <weichen.xu@databricks.com>
czxm pushed a commit to czxm/spark that referenced this pull request Jun 12, 2023
Closes apache#41382 from WeichenXu123/improve-torch-dataloader.

Authored-by: Weichen Xu <weichen.xu@databricks.com>
Signed-off-by: Weichen Xu <weichen.xu@databricks.com>