Make `wait()` method wait for query execution #31
With Impala, the `\ThriftSQL\ImpalaQuery` object is ready as soon as the query is parsed/accepted. To match the behavior of the `wait()` method with Hive, fetch up to the first 2 rows during `wait()` so that the query actually gets executed.
```php
// Wait for results
// Wait for query to be ready
```
`s/query/results/`?
This is part of the problem: sometimes the Thrift query handle gets the query state marked as `FINISHED` before the query is ready to return results. Perhaps it gets marked as finished when the fragments finish executing but before the gateway aggregates the results?

This can be seen if we run something like `SELECT sleep(5000)`, which returns almost immediately even with `wait()` called on it. The initial `fetch()` call then takes ~5s before it returns results.

I'm not sure if this change is being too clever by also waiting for the initial fetch, which might just be an edge case with the UDF.
> I'm not sure if this change is being too clever with trying to wait for the initial fetch as well which might be just an edge case with the UDF.

Yeah, my bad: I should have read the whole thing and noticed that there are actually 2 distinct waits (which your comments explained quite well, in fact; I should stop commenting as I go when reviewing and do a first full pass :D). I'd keep it as-is, though. Should we maybe try it with the cupcakes query that triggered this investigation?
src/ThriftSQL/ImpalaQuery.php (outdated)
```diff
@@ -51,35 +53,47 @@ public function wait() {
 } while (true);

 // Wait for results by fetching some rows -- triggers query to run
 $this->_fetchResponse( 2 );
```
I'm not sure how to test it, tbh, but wouldn't this "consume" those 2 results as well as waiting? That is, when you call `fetch()` afterwards, wouldn't you miss the 2 initial rows that were used for waiting?
That's why `_fetchResponse()` stores the result in `$this->_lastResponse`: when `fetch()` gets called, it consumes that cached response first.
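A minimal PHP sketch of the prefetch-and-replay idea, assuming a simplified model: `FakeCursor` and `PrefetchingQuery` are hypothetical stand-ins (not ThriftSQL classes), only the `_fetchResponse()` / `_lastResponse` names come from this thread, and the state polling that `wait()` does before prefetching is omitted. It shows why the 2 rows fetched during `wait()` are not lost.

```php
<?php
// Hypothetical stand-in for the server: hands out rows in order, the way
// repeated Thrift fetch calls would. Not part of ThriftSQL.
final class FakeCursor
{
    /** @var array<int, array> */
    private array $rows;

    public function __construct(array $rows)
    {
        $this->rows = $rows;
    }

    // Remove and return up to $maxRows rows from the front of the result set.
    public function next(int $maxRows): array
    {
        return array_splice($this->rows, 0, $maxRows);
    }
}

// Hypothetical sketch: method/property names mirror the PR discussion,
// the bodies are assumptions, not the actual ImpalaQuery implementation.
final class PrefetchingQuery
{
    private FakeCursor $cursor;
    private ?array $_lastResponse = null;

    public function __construct(FakeCursor $cursor)
    {
        $this->cursor = $cursor;
    }

    // Fetch a batch from the "server" and cache it instead of returning it.
    private function _fetchResponse(int $maxRows): void
    {
        $this->_lastResponse = $this->cursor->next($maxRows);
    }

    // (State polling omitted.) Fetch up to 2 rows so the query actually runs;
    // the rows are kept in _lastResponse rather than discarded.
    public function wait(): self
    {
        $this->_fetchResponse(2);
        return $this;
    }

    // Return the cached batch first if there is one (it may hold up to 2 rows
    // regardless of $maxRows); otherwise fetch a fresh batch.
    public function fetch(int $maxRows): array
    {
        if ($this->_lastResponse === null) {
            $this->_fetchResponse($maxRows);
        }
        $rows = $this->_lastResponse;
        $this->_lastResponse = null;
        return $rows;
    }
}

$query = (new PrefetchingQuery(new FakeCursor([[1], [2], [3], [4], [5]])))->wait();
$first = $query->fetch(100); // the 2 rows prefetched by wait()
$rest  = $query->fetch(100); // the remaining rows from the "server"
```

One consequence of this design, at least in the sketch: the first `fetch()` after `wait()` returns whatever batch was cached, so its size is set by the prefetch rather than by the `$maxRows` argument.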
@bperson so the only required change to make calling …

To make sure it's safe to call … The hacky …
Talked with Xiao in Slack: the plan is to cut the extra complexity of the "wait to fetch the first 2 rows" step for now, since we aren't sure it's actually needed for real queries that aren't just edge cases like `SELECT sleep(10000);`.