Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make wait() method wait for query execution #31

Merged
merged 2 commits into from
May 7, 2019
Merged

Conversation

xyu
Copy link
Contributor

@xyu xyu commented May 6, 2019

With Impala the \ThriftSQL\ImpalaQuery object is ready as soon as the query is parsed / accepted to match up with wait() method with Hive fetch up to the first 2 rows during the wait() method so that the query gets executed.

With Impala the `\ThriftSQL\ImpalaQuery` object is ready as soon as the query is parsed / accepted to match up with `wait()` method with Hive fetch up to the first 2 rows during the `wait()` method so that the query gets executed.
@xyu xyu requested a review from bperson May 6, 2019 21:56

// Wait for results
// Wait for query to be ready
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/query/results/ ?

Copy link
Contributor Author

@xyu xyu May 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is part of the problem, sometimes the thrift query handle gets the query state marked as FINISHED before the query is ready to return results. Perhaps it gets marked as finished when fragments finish executing but before the gateway aggregates the results?

This can be seen if we run something like SELECT sleep(5000) which will return almost immediately even with wait() called on it. Then the initial fetch() call will take ~5s before it returns results.

I'm not sure if this change is being too clever with trying to wait for the initial fetch as well which might be just an edge case with the UDF.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this change is being too clever with trying to wait for the initial fetch as well which might be just an edge case with the UDF.

Yeah my bad, I should have read the whole thing and that there are actually 2 distinct waits ( which your comments explained quite well in fact, I should stop commenting as I go when reviewing and do a first full pass :D ).

I'd keep it as-is though: should we maybe try it with the cupcakes query that triggered this investigation?

@@ -51,35 +53,47 @@ public function wait() {

} while (true);

// Wait for results by fetching some rows -- triggers query to run
$this->_fetchResponse( 2 );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how to test it tbh, but wouldn't this "consume" those 2 results as well as waiting? Such that when you fetch afterwards, you won't get those 2 initial results used for waiting?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So that's why _fetchResponse() stores the result in $this->_lastResponse and when fetch() gets called it consumes that response first.

@xyu
Copy link
Contributor Author

xyu commented May 7, 2019

@bperson so the only required change to make calling wait() in the locked ThriftSQL class work is to add the two ready tests E.g.:

    if ( $this->_ready ) {
      return $this;
    }

To make sure it's safe to call wait() multiple times on the same query.

The hacky fetch() within wait() thing fixes some edge cases but maybe they are not important enough for us to really care. Most "normal" queries that spend most of their time reading HDFS and processing remote fragments do indeed spend most of their time during the existing wait() call. What do you think we should do? Get rid of the hacky thing to make the code a bit less surprising?

Copy link
Contributor

@bperson bperson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Talked with Xiao in Slack, the plan is to cut the extra complexity to add the "wait to fetch the first 2 rows" for now since we aren't sure it's actually needed for real queries that aren't just edge-cases like 'SELECT sleep(10000);"

@xyu xyu merged commit b773771 into master May 7, 2019
@xyu xyu deleted the add/wait-for-impala branch May 7, 2019 17:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants