-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Process stack overflow panic elegantly #1437
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
8a71251 to
c5cd3f8
Compare
|
I think there are still some places that need to be wrapped by PTAL @alamb @houqp @Dandandan |
|
|
||
| /// Bytes available in the current stack | ||
| pub const STACKER_RED_ZONE: usize = { | ||
| 64 << 10 // 64KB |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This constant is what I feel, I want to know your thoughts.
|
|
||
| /// Allocate a new stack of at least stack_size bytes. | ||
| pub const STACKER_SIZE: usize = { | ||
| 4 << 20 // 4MB |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
datafusion/src/sql/planner.rs
Outdated
| SqlToRel { schema_provider } | ||
| SqlToRel { | ||
| schema_provider, | ||
| project_recursion: ProtectRecursion::new_with_limit(2048), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This constant is what I feel, I want to know your thoughts.
|
I wonder if it makes sense to just limit number of or statements instead? |
First, Second, in some cases, it's preferable to unconditionally grow the stack than returning err or panic. |
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ran out of time today to look at this issue carefully, but will do so tomorrow
houqp
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an acceptable short term workaround, but I think it would be more efficient and more elegant if we rewrite these recursive procedures into iterative procedures.
Do you mean to use push-based model? |
6459001 to
f0593d1
Compare
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I previously fixed a similar issue like this in:
#1047
For the proposed approach (grow the stack if needed), I think this PR is quite elegant -- nicely done @xudong963
However, I think this PR doesn't actually fix the stack overflow, rather it:
- Converts a
panic!into anErrorin some (very important) - Allows the user to set a higher stack limit (but overflows are still possible)
The idea of datafusion protecting itself from pathological input cases is 👍 . The thing I worry about with the approach in this PR is that it relies on us annotating all places in the code (both that currently exist as well as may be added in the future) that do recursive walks of the tree with maybe_grow). It seems almost inevitable that we will end up missing some and I do think this code will likely add a non trivial overhead to datafusion)
I wonder if it makes sense to just limit number of or statements instead?
I think the idea from @jimexist was to limit the depth of any Expr tree by running a checking function like
pub fn check_depth(plan: LogicalPlan) {
// recursively check all Exprs in all LogicalPlans
// for depth greater than 50 (or something)
}I think @houqp's plan of "rewriting to not be recursive" is the best -- I will add a comment describing it shortly
@xudong963 , I think what @houqp is suggesting is to rewrite code that is recursive to not be recursive. The pattern for Datafusion probably looks like taking code like fn visit(expr: &expr) {
for child in expr.children() {
visit(child)
}
// do actual expr logic
}And changing it so the state is tracked with a structure on the heap rather than a stack. I think fn visit(expr: &expr) {
let mut worklist = VecDequeue::new();
worklist.push_back(expr);
while !worklist.is_empty() {
let parent = worklist.pop_front();
for child in parent.children() {
worklist.push_back(child)
}
// do actual expr logic on parent
}
}(aka avoid the call back to |
Very nice conclusion, You said what I was thinking inside👍
Yes, I agree. So I think we may check the depth of exprs using |
Open an issue #1444 to track of this. |
f0593d1 to
bcff782
Compare
bcff782 to
04190c4
Compare
|
Marking as stale pr -- will close it in a week or two unless we plan to keep working on it |
I'll directly close the ticket because I plan to fix it by #1444 |
Which issue does this PR close?
Closes #1434
Rationale for this change
Use
SafeRecursionto support limited recursion to avoid stack overflow.maybe_growsupports unconditionally grow the stack space, by spilling over to the heap if the stack has hit its limit.What changes are included in this PR?
Converts a panic! into an Error in some (very important)
Allows the user to set a higher stack limit (but overflows are still possible)
Are there any user-facing changes?
No