Ballista Enhancement Overview #7

Open
2 of 15 tasks
yahoNanJing opened this issue Jan 29, 2022 · 4 comments
Labels
enhancement New feature or request

Comments

@yahoNanJing
Contributor

yahoNanJing commented Jan 29, 2022

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

The current Ballista implementation is closer to a proof of concept (POC) for verifying that DataFusion operators can run in a distributed way. It sets up the overall framework and works well for that verification. However, there is still a long way to go before it can be used in production for real workloads. This issue raises several aspects we need to consider and enhance to build a more robust distributed execution framework.

In the big data era there are many scenarios. Two common ones are queries for interactive analysis and batch processing for ETL. There is no silver bullet: each scenario has its own characteristics and its own needs. Below, I describe some enhancements we can make for each scenario.

For both interactive query and batch processing:

For interactive query:

For batch processing:

  • [Necessary] Support speculative task scheduling
  • [Necessary] Support shuffle fetch failure handling and retry
  • [Necessary] Support reattempting some stages
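
To make the speculative scheduling item concrete, here is a minimal sketch of one common way to pick tasks for speculation: launch a duplicate attempt for any running task whose elapsed time exceeds a multiple of the median duration of tasks that have already completed in the same stage. This is an illustration under stated assumptions, not Ballista's design. `RunningTask`, `median_duration`, `speculation_candidates`, `multiplier`, and `min_completed` are all hypothetical names introduced for this example.

```rust
use std::time::Duration;

/// Hypothetical view of a task that is still running (not a Ballista type).
#[derive(Debug, Clone, PartialEq)]
pub struct RunningTask {
    pub task_id: usize,
    pub elapsed: Duration,
}

/// Median of the completed task durations, or `None` if nothing has finished.
/// For an even number of samples this returns the upper of the two middle values.
pub fn median_duration(completed: &[Duration]) -> Option<Duration> {
    if completed.is_empty() {
        return None;
    }
    let mut sorted = completed.to_vec();
    sorted.sort();
    Some(sorted[sorted.len() / 2])
}

/// Returns the ids of running tasks that look like stragglers.
///
/// Assumed policy (hypothetical): speculate only once at least `min_completed`
/// tasks of the stage have finished, and only for tasks whose elapsed time is
/// strictly greater than `multiplier` times the median completed duration.
/// `multiplier` must be finite and non-negative, otherwise `mul_f64` panics.
pub fn speculation_candidates(
    completed: &[Duration],
    running: &[RunningTask],
    multiplier: f64,
    min_completed: usize,
) -> Vec<usize> {
    if completed.len() < min_completed {
        return Vec::new();
    }
    let threshold = match median_duration(completed) {
        Some(m) => m.mul_f64(multiplier),
        None => return Vec::new(),
    };
    running
        .iter()
        .filter(|t| t.elapsed > threshold)
        .map(|t| t.task_id)
        .collect()
}
```

A scheduler could call `speculation_candidates` periodically per stage and submit a second attempt for each returned id, keeping whichever attempt finishes first. How duplicate attempts are cancelled and how their shuffle output is deduplicated are left open here.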
@yahoNanJing yahoNanJing added the enhancement New feature or request label Jan 29, 2022
@Ted-Jiang
Member

This would be a milestone in Ballista! 👍

@EricJoy2048
Member

Great, I hope I can contribute to these goals as much as I can.

@yahoNanJing
Contributor Author

> Great, I hope I can contribute to these goals as much as I can.

Hi @gaojun2048, which part are you interested in? Feel free to pick up some tasks.

@EricJoy2048
Member

Is Ballista meant to be a data computing engine like Spark, or an ad-hoc query engine like Presto / CK / Impala? I believe the roadmap would differ depending on which goal we choose.

@andygrove andygrove transferred this issue from apache/datafusion May 19, 2022
yahoNanJing referenced this issue in yahoNanJing/arrow-ballista Aug 3, 2022

4 participants