-
Notifications
You must be signed in to change notification settings - Fork 248
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support Cursor.rowcount and close finished queries #528
Conversation
With current Impala server rowcount support needs DMLs to be closed with CloseImpalaOperation() as there is no simpler way to get the number of modifed rows. See https://issues.apache.org/jira/browse/IMPALA-12647 for alternatives. This change adds option close_finished_queries for cursors with default True. Setting it to False brings back the old behavior. If queries are closed after finishing queries, calling get_log RPC is no longer possible. If close_finished_queries is true then the logs are fetched and stored before closing to query to be able to return the saved results with get_log. Generally get_log shouldn't be a too expensive RPC. Another potential side-effect is that get_profile may fail as Impala can discard the runtime profile after the query is closed (see Impala flag query_log_size). Despite the above side effects closing the queries seems a better default behavior as it helps avoiding queries hanging in the "waiting to be closed" state and provides reliable rowcount. This is also consistent with the way impala-shell works. Testing: - rowcount already had good coverage in DBAPI2 compliance tests (e.g. test_mixedfetch) - new tests were added for some missing rowcount cases and for getting warning/error log for closed queries
c873132
to
c9ebb0a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for putting this together. I think auto-closing is a reasonable behavior.
impala/hiveserver2.py
Outdated
queries with results set: all rows are returned with fetch | ||
DDL/DML: execution is finished | ||
If False, then the query will be only closed when the cursor is closed | ||
or when its destructor is called. Property rowcount will not be available |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor thought: When a cursor is re-used for multiple query executions, it looks like it calls _reset_state() when each new query is executed. _reset_state() calls CloseOperation on any existing operation. Should we change the "If False" case to say that executing a new query closes the old query?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added this case to the comment
I'm not sure what our process is, should we file an issue on github for this? |
There is already an issue about rowcount: #302 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me.
With current Impala server rowcount support needs DMLs to be closed with CloseImpalaOperation() as there is no simpler way to get the number of modifed rows.
See https://issues.apache.org/jira/browse/IMPALA-12647 for alternatives.
This change adds option close_finished_queries for cursors with default True. Setting it to False brings back the old behavior.
If queries are closed after finishing queries, calling get_log RPC is no longer possible. If close_finished_queries is true then the logs are fetched and stored before closing to query to be able to return the saved results with get_log. Generally get_log shouldn't be a too expensive RPC.
Another potential side-effect is that get_profile may fail as Impala can discard the runtime profile after the query is closed (see Impala flag query_log_size).
Despite the above side effects closing the queries seems a better default behavior as it helps avoiding queries hanging in the "waiting to be closed" state and provides reliable rowcount. This is also consistent with the way impala-shell works.
Testing: