Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TASK][EASY][SPARK] kyuubi.engine.submit.time is wrong for ApplicationMaster retries #5766

Closed
2 of 4 tasks
yaooqinn opened this issue Nov 24, 2023 · 9 comments
Closed
2 of 4 tasks
Assignees
Labels
kind:bug This is a clearly a bug priority:major
Milestone

Comments

@yaooqinn
Copy link
Member

Code of Conduct

Search before asking

  • I have searched in the issues and found no similar issues.

Describe the bug

ERROR ApplicationMaster: User class threw exception: org.apache.kyuubi.KyuubiException: The total engine initialization time (84513834 ms) exceeds kyuubi.session.engine.initialize.timeout (300000 ms), and submitted at 1700714879267.
org.apache.kyuubi.KyuubiException: The total engine initialization time (84513834 ms) exceeds kyuubi.session.engine.initialize.timeout (300000 ms), and submitted at 1700714879267.
	at org.apache.kyuubi.engine.spark.SparkSQLEngine$.main(SparkSQLEngine.scala:319)
	at org.apache.kyuubi.engine.spark.SparkSQLEngine.main(SparkSQLEngine.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)

When Kyuubi spark engine starting failover, the kyuubi.engine.submit.time is wrong to use

Affects Version(s)

1.7.0

Kyuubi Server Log Output

No response

Kyuubi Engine Log Output

No response

Kyuubi Server Configurations

No response

Kyuubi Engine Configurations

No response

Additional context

No response

Are you willing to submit PR?

  • Yes. I would be willing to submit a PR with guidance from the Kyuubi community to fix.
  • No. I cannot submit a PR at this time.
@yaooqinn yaooqinn added kind:bug This is a clearly a bug priority:major labels Nov 24, 2023
@yaooqinn
Copy link
Member Author

cc @wForget

@wForget
Copy link
Member

wForget commented Nov 24, 2023

Will the spark engine still work well after ApplicationMaster failover?

@yaooqinn
Copy link
Member Author

Not now maybe, but we shall make it work well

@pan3793
Copy link
Member

pan3793 commented Nov 28, 2023

I think the current behavior is by design, I'm not sure changing the behavior is good idea since Kyuubi can create a new engine when a new session comes in if the previous engine is broken. If we change the behavior, the attempted engine launched after the new engine will waste resources, such behavior violates the original expectation

@wForget
Copy link
Member

wForget commented Nov 30, 2023

I think the current behavior is by design, I'm not sure changing the behavior is good idea since Kyuubi can create a new engine when a new session comes in if the previous engine is broken. If we change the behavior, the attempted engine launched after the new engine will waste resources, such behavior violates the original expectation

Your concern is reasonable, but the attempted engine failure should not be triggered by timeout. So, I think this fix can be accepted, and we can recommend users to set spark.yarn.maxAppAttempts=1 before supporting attempted engines well. and cc @yaooqinn what do you think?

@yaooqinn
Copy link
Member Author

yaooqinn commented Dec 1, 2023

We can default spark.yarn.maxAppAttempts to 1

wForget added a commit to wForget/kyuubi that referenced this issue Dec 1, 2023
wForget added a commit to wForget/kyuubi that referenced this issue Dec 5, 2023
@wForget
Copy link
Member

wForget commented Dec 7, 2023

@yaooqinn @pan3793 #5798 has been approved, I will merge it later. So is #5776 acceptable? If not I will close it later.

@pan3793
Copy link
Member

pan3793 commented Dec 7, 2023

I think #5776 is not necessary

@wForget
Copy link
Member

wForget commented Dec 7, 2023

I think #5776 is not necessary

Thanks. after setting spark.yarn.maxAppAttempts to 1 by default, it is indeed dispensable. I will close it later.

@wForget wForget self-assigned this Dec 7, 2023
@wForget wForget added this to the v1.8.1 milestone Dec 7, 2023
@wForget wForget closed this as completed in 6a282fc Dec 7, 2023
wForget added a commit that referenced this issue Dec 7, 2023
# 🔍 Description
## Issue References 🔗

This pull request fixes #5766

## Describe Your Solution 🔧

As discussed in #5766 (comment), we should add `spark.yarn.maxAppAttempts=1` for spark engine when `spark.master` is `yarn`.

## Types of changes 🔖

- [X] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)

## Test Plan 🧪

#### Behavior Without This Pull Request ⚰️

#### Behavior With This Pull Request 🎉

#### Related Unit Tests

---

# Checklists
## 📝 Author Self Checklist

- [x] My code follows the [style guidelines](https://kyuubi.readthedocs.io/en/master/contributing/code/style.html) of this project
- [x] I have performed a self-review
- [x] I have commented my code, particularly in hard-to-understand areas
- [ ] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [ ] New and existing unit tests pass locally with my changes
- [x] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)

## 📝 Committer Pre-Merge Checklist

- [x] Pull request title is okay.
- [x] No license issues.
- [x] Milestone correctly set?
- [ ] Test coverage is ok
- [x] Assignees are selected.
- [x] Minimum number of approvals
- [x] No changes are requested

**Be nice. Be informative.**

Closes #5798 from wForget/KYUUBI-5766-2.

Closes #5766

6477dfd [wforget] fix
c50f656 [wforget] fix order
dbc1891 [wforget] comment
a493e29 [wforget] fix style
4fa0651 [wforget] fix test
b899646 [wforget] add test
954a30d [wforget] [KYUUBI #5766] Default `spark.yarn.maxAppAttempts` to 1 for spark engine

Authored-by: wforget <643348094@qq.com>
Signed-off-by: wforget <643348094@qq.com>
(cherry picked from commit 6a282fc)
Signed-off-by: wforget <643348094@qq.com>
@pan3793 pan3793 changed the title [Bug][SPARK] kyuubi.engine.submit.time is wrong for ApplicationMaster retries [TASK][EASY][SPARK] kyuubi.engine.submit.time is wrong for ApplicationMaster retries Dec 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind:bug This is a clearly a bug priority:major
Projects
No open projects
3 participants