[SYSTEMDS-3466] Asynchronous (Future-based) execution of Spark instructions #1733

Closed

phaniarnab

This patch introduces a future-based asynchronous execution of Spark actions. We wrap the matrix block with a future, create a matrix object handle, and maintain that in the symbol table. This extension allows triggering a chain of Spark instructions asynchronously and seeking the results only when needed.

TODO: Account the memory required for the future results, maintain the lineage of the broadcast variables to avoid premature removal.
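The mechanism described above (wrap the result in a future, expose it through a handle, keep the handle in the symbol table) can be sketched roughly as follows. This is a language-neutral illustration in Python, not the SystemDS implementation: `MatrixHandle`, `acquire_read`, `collect_from_cluster`, and the plain-dict `symbol_table` are hypothetical names, and the "Spark action" is a local placeholder function.

```python
from concurrent.futures import Future, ThreadPoolExecutor


class MatrixHandle:
    """Hypothetical stand-in for a matrix object whose data may still be computing."""

    def __init__(self, future: Future):
        self._future = future

    def is_ready(self) -> bool:
        # True once the asynchronous action has finished.
        return self._future.done()

    def acquire_read(self):
        # Blocks only here, at the point where the data is actually needed.
        return self._future.result()


def collect_from_cluster(rows, cols):
    # Placeholder for a Spark action that brings a result back to the driver.
    return [[r * cols + c for c in range(cols)] for r in range(rows)]


executor = ThreadPoolExecutor(max_workers=2)
symbol_table = {}  # assumed: variable name -> handle

# Submit the action and register the handle without waiting for the result.
symbol_table["X"] = MatrixHandle(executor.submit(collect_from_cluster, 2, 3))

# ... other local instructions could run here while the action is in flight ...

block = symbol_table["X"].acquire_read()  # wait only when the value is consumed
executor.shutdown()
```

In this sketch the symbol table never stores the raw result, only the handle, so any consumer that reads the variable waits on the future transparently.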

@BACtaki

BACtaki commented Nov 16, 2022

> This extension allows triggering a chain of Spark instructions asynchronously and seeking the results only when needed.

This is really interesting, @phaniarnab. Curious: is there a concrete use case for this feature?

@phaniarnab

@BACtaki Yes, there are several use cases. They arise because we combine Spark's lazy evaluation with eager local execution. Currently, a chain of Spark operations is triggered only when we need the result locally, which leads to inefficient cluster utilization.
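To make the utilization argument concrete, here is a small simulation of the two trigger strategies. It is only a sketch under stated assumptions: `spark_chain` and `local_instructions` are hypothetical placeholders that sleep instead of doing real work, the durations are made up, and no Spark API is involved.

```python
import time
from concurrent.futures import ThreadPoolExecutor

CLUSTER_SECONDS = 0.2  # assumed duration of the Spark chain
LOCAL_SECONDS = 0.2    # assumed duration of unrelated local instructions


def spark_chain():
    # Placeholder for a chain of Spark operations ending in an action.
    time.sleep(CLUSTER_SECONDS)
    return "result"


def local_instructions():
    # Placeholder for eager local work that does not depend on the chain.
    time.sleep(LOCAL_SECONDS)


def lazy_trigger():
    # The chain starts only when the result is requested, after the local work.
    start = time.perf_counter()
    local_instructions()
    result = spark_chain()
    return result, time.perf_counter() - start


def async_trigger():
    # The chain is submitted first and overlaps with the local work.
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(spark_chain)
        local_instructions()
        result = pending.result()  # wait only when the value is needed
    return result, time.perf_counter() - start


lazy_result, lazy_time = lazy_trigger()
async_result, async_time = async_trigger()
```

Both variants produce the same result. In the lazy variant the simulated cluster sits idle during the local work, so the two durations add up, while the asynchronous variant overlaps them.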

fathollahzadeh pushed a commit to fathollahzadeh/systemds that referenced this pull request Dec 7, 2022

Closes apache#1733