@viirya
Member

@viirya viirya commented Jan 4, 2025

What changes were proposed in this pull request?

This patch removes task serialization when scheduling tasks for the local executor backend. Tasks are passed directly to the local executor to run.

Why are the changes needed?

Spark serializes each task and attaches the serialized task to its task description before submitting it to an executor. This is necessary for remote executor backends. The local executor backend, however, runs in the same JVM as the master, so task serialization appears unnecessary for it.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Manually tested in local mode.

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the CORE label Jan 4, 2025
Member

@dongjoon-hyun dongjoon-hyun left a comment

Could you file a JIRA issue and use its ID, @viirya?

@viirya viirya marked this pull request as draft January 5, 2025 01:30
@viirya
Member Author

viirya commented Jan 5, 2025

Thanks @dongjoon-hyun. I changed this to a draft because of some test failures in CI. I will file a JIRA issue once this is ready.

@viirya
Member Author

viirya commented Jan 6, 2025

Looks like Spark relies on some tricks during serialization in many places. Skipping serialization will cause errors.
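
As an illustration of how code can come to depend on the serialization round trip, here is a minimal sketch in plain Java. It is not Spark code: `DemoTask`, `roundTrip`, and `failsWithoutRoundTrip` are hypothetical names, and the pattern shown (a transient field that is only initialized inside `readObject`) is just one assumed example of such a trick.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializationSideEffectDemo {

    // Hypothetical stand-in for a task-like object; not a Spark class.
    static class DemoTask implements Serializable {
        private static final long serialVersionUID = 1L;

        final String name;

        // transient: never written to the stream, so it is null
        // unless something explicitly initializes it.
        transient StringBuilder buffer;

        DemoTask(String name) {
            this.name = name;
        }

        // Deserialization hook: the only place where buffer is initialized.
        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            buffer = new StringBuilder();
        }

        // Works only on an instance that went through readObject.
        int run() {
            buffer.append(name);
            return buffer.length();
        }
    }

    // Serialize and deserialize within the same JVM.
    static DemoTask roundTrip(DemoTask task)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(task);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (DemoTask) in.readObject();
        }
    }

    // Runs a task that was never serialized and reports whether it failed.
    static boolean failsWithoutRoundTrip() {
        try {
            new DemoTask("task").run();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip(new DemoTask("task")).run()); // prints 4
        System.out.println(failsWithoutRoundTrip());               // prints true
    }
}
```

In this sketch the object is only usable after a serialization round trip, so passing the original instance straight to the code that runs it fails even though both sides live in the same JVM.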

@viirya viirya closed this Jan 6, 2025

2 participants