Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed PostgresToGCSOperator fail on empty resultset for use_server_side_cursor=True #21307

Merged
merged 2 commits into from
Feb 15, 2022

Conversation

kazanzhy
Copy link
Contributor

@kazanzhy kazanzhy commented Feb 3, 2022

Attribute description of simple psycopg2 cursor is available after .execute() method.
But for the named cursor (server-side cursor) any .fetch* method call is needed.

In case of an empty result set like .execute("SELECT 1 LIMIT 0"):

  • for the simple cursor, there are initialized .description and an empty iterator
  • for the server-side cursor, there are not initialized .descriptionand an empty original iterator.
    But also we have self.rows=[None], as a result of fetchone() on an empty iterator of the server-side cursor.
    This None makes _PostgresServerSideCursorDecorator non-empty iterator.

I decided to remove rows attribute because it could consist of a maximum of one row, and also psycopg2 server-side cursor have .scroll() method and it is safe to call .scroll() on empty resultset even without .fetchone()

closes #20007

@boring-cyborg boring-cyborg bot added area:providers provider:google Google (including GCP) related issues labels Feb 3, 2022
self.initialized = True
self.cursor.fetchone()
self.cursor.scroll(0, mode='absolute')
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If .scroll(-1) is more clear then I change

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like your current approach better.

Copy link
Contributor Author

@kazanzhy kazanzhy Feb 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the misleading.

First of all server-side cursor doesn't support relative mode by default. It should be initialized like conn.cursor(name='...', scrollable=True). But absolute mode worked without initialization.

But both scrolls cause a reset of .description attribute. So, unfortunately, they could not be used.

@kazanzhy

This comment was marked as outdated.

@kazanzhy kazanzhy force-pushed the pg_server_side_cursor_on_empty_20007 branch from 80c506a to e6db1c7 Compare February 3, 2022 22:53
@kazanzhy kazanzhy changed the title Fixed error in PostgresToGCSOperator on the empty resultset Fixed PostgresToGCSOperator(use_server_side_cursor=True) fail on empty resultset Feb 3, 2022
@kazanzhy kazanzhy changed the title Fixed PostgresToGCSOperator(use_server_side_cursor=True) fail on empty resultset Fixed PostgresToGCSOperator fail on empty resultset for use_server_side_cursor=True Feb 3, 2022
@uranusjr
Copy link
Member

uranusjr commented Feb 4, 2022

Actually, I think _PostgresServerSideCursorDecorator could be removed if

Are these existing tests to verify the behaviour of use_server_side_cursor? This suggested change makes sense to me but maybe I missed some cases expected by users.

@kazanzhy kazanzhy marked this pull request as ready for review February 4, 2022 17:57
@kazanzhy
Copy link
Contributor Author

kazanzhy commented Feb 4, 2022

Today I made few tests:

Current version (code from main branch):

  • Query returns rows and use_server_side_cursor=False: file with data
  • Query returns rows and use_server_side_cursor=True: file with data
  • Query returns nothing and use_server_side_cursor=False: empty file
  • Query returns nothing and use_server_side_cursor=True: ERROR

Updated version (this PR code):

  • Query returns rows and use_server_side_cursor=False: file with data
  • Query returns rows and use_server_side_cursor=True: file with data
  • Query returns nothing and use_server_side_cursor=False: empty file
  • Query returns nothing and use_server_side_cursor=True: empty file

The last one is not what is expected in the issue but I think creating the empty file is the right behavior

I'm expected, that operator on empty table not creating file and no upload it on Google Cloud.

@kazanzhy kazanzhy force-pushed the pg_server_side_cursor_on_empty_20007 branch 2 times, most recently from 512f28b to 1584d42 Compare February 5, 2022 23:28
@potiuk potiuk closed this Feb 6, 2022
@potiuk potiuk reopened this Feb 6, 2022
@kazanzhy kazanzhy force-pushed the pg_server_side_cursor_on_empty_20007 branch from 1584d42 to 6321d54 Compare February 9, 2022 20:35
@kazanzhy kazanzhy force-pushed the pg_server_side_cursor_on_empty_20007 branch from 6321d54 to ad1d827 Compare February 11, 2022 10:24
@potiuk potiuk merged commit 2eb1056 into apache:main Feb 15, 2022
@kazanzhy kazanzhy deleted the pg_server_side_cursor_on_empty_20007 branch February 16, 2022 09:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:providers provider:google Google (including GCP) related issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

PostgresToGCSOperator fail on empty table and use_server_side_cursor=True
3 participants