Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Cursor.rowcount and close finished queries #528

Merged
merged 2 commits into from
Jan 30, 2024

Conversation

csringhofer
Copy link
Collaborator

With current Impala server rowcount support needs DMLs to be closed with CloseImpalaOperation() as there is no simpler way to get the number of modifed rows.
See https://issues.apache.org/jira/browse/IMPALA-12647 for alternatives.

This change adds option close_finished_queries for cursors with default True. Setting it to False brings back the old behavior.

If queries are closed after finishing queries, calling get_log RPC is no longer possible. If close_finished_queries is true then the logs are fetched and stored before closing to query to be able to return the saved results with get_log. Generally get_log shouldn't be a too expensive RPC.

Another potential side-effect is that get_profile may fail as Impala can discard the runtime profile after the query is closed (see Impala flag query_log_size).

Despite the above side effects closing the queries seems a better default behavior as it helps avoiding queries hanging in the "waiting to be closed" state and provides reliable rowcount. This is also consistent with the way impala-shell works.

Testing:

  • rowcount already had good coverage in DBAPI2 compliance tests (e.g. test_mixedfetch)
  • new tests were added for some missing rowcount cases and for getting warning/error log for closed queries

With current Impala server rowcount support needs DMLs to be
closed with CloseImpalaOperation() as there is no simpler way
to get the number of modifed rows.
See https://issues.apache.org/jira/browse/IMPALA-12647 for
alternatives.

This change adds option close_finished_queries for cursors
with default True. Setting it to False brings back the old
behavior.

If queries are closed after finishing queries, calling get_log
RPC is no longer possible. If close_finished_queries is true
then the logs are fetched and stored before closing to query
to be able to return the saved results with get_log. Generally
get_log shouldn't be a too expensive RPC.

Another potential side-effect is that get_profile may fail as
Impala can discard the runtime profile after the query is
closed (see Impala flag query_log_size).

Despite the above side effects closing the queries seems a better
default behavior as it helps avoiding queries hanging in the
"waiting to be closed" state and provides reliable rowcount. This
is also consistent with the way impala-shell works.

Testing:
- rowcount already had good coverage in DBAPI2 compliance tests
  (e.g. test_mixedfetch)
- new tests were added for some missing rowcount cases and for
  getting warning/error log for closed queries
Copy link
Collaborator

@joemcdonnell joemcdonnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for putting this together. I think auto-closing is a reasonable behavior.

queries with results set: all rows are returned with fetch
DDL/DML: execution is finished
If False, then the query will be only closed when the cursor is closed
or when its destructor is called. Property rowcount will not be available
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor thought: When a cursor is re-used for multiple query executions, it looks like it calls _reset_state() when each new query is executed. _reset_state() calls CloseOperation on any existing operation. Should we change the "If False" case to say that executing a new query closes the old query?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added this case to the comment

impala/hiveserver2.py Outdated Show resolved Hide resolved
impala/tests/test_impala.py Outdated Show resolved Hide resolved
impala/tests/test_impala.py Outdated Show resolved Hide resolved
@joemcdonnell
Copy link
Collaborator

I'm not sure what our process is, should we file an issue on github for this?

@csringhofer
Copy link
Collaborator Author

I'm not sure what our process is, should we file an issue on github for this?

There is already an issue about rowcount: #302
I am also unsure about the process.

Copy link
Collaborator

@joemcdonnell joemcdonnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me.

@csringhofer csringhofer merged commit b941bfc into cloudera:master Jan 30, 2024
@csringhofer csringhofer mentioned this pull request Jan 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants