
Respect _MAX_RECORDS_LIMIT #24

Closed
qbatten opened this issue Jan 30, 2023 · 7 comments · Fixed by #393

qbatten commented Jan 30, 2023

Since we're overriding get_records from SQLStream, let's add in these two lines so we maintain the same behavior as SQLStream.

brief convo about this here
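For context, here is a rough sketch of what "these two lines" do, not the actual SDK or tap source: when the attribute is set, the SQLAlchemy query is capped with `.limit()` before rows are yielded. Surrounding names such as `self.connector`, `get_table`, and `_connect` follow the singer-sdk's conventions, but the exact code here is illustrative only.

```python
import sqlalchemy


def get_records(self, context):
    """Sketch of a get_records override that mirrors SQLStream's limit handling."""
    table = self.connector.get_table(self.fully_qualified_name)
    query = sqlalchemy.select(table)

    # The two lines this issue asks to carry over from SQLStream.get_records:
    if self._MAX_RECORDS_LIMIT is not None:
        query = query.limit(self._MAX_RECORDS_LIMIT)

    with self.connector._connect() as conn:
        for row in conn.execute(query):
            yield dict(row._mapping)
```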

@tayloramurphy

@qbatten any chance you want to make a PR for this?

@techtangents

The ability to limit the number of rows would be very useful for my team. We need to ingest some large tables and can't read them all in one go due to our Postgres replication timeout.

@tayloramurphy

cc @visch @pnadolny13

visch commented Mar 12, 2024

@tayloramurphy It's a quick one; I can take it.

SQLAlchemy and pre_pool should theoretically already handle this automatically, without a MAX_RECORDS_LIMIT, but I can make a separate issue for that and handle it after we get this one in place.
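For reference, a minimal SQLAlchemy sketch of the pool settings this seems to allude to (assuming "pre_pool" means the engine's pool pre-ping/recycle options; these are standard SQLAlchemy keyword arguments, and whether tap-postgres exposes them is a separate question):

```python
from sqlalchemy import create_engine

# Standard SQLAlchemy connection-pool options; the URL is a placeholder.
engine = create_engine(
    "postgresql://user:pass@host:5432/db",
    pool_pre_ping=True,   # test each connection before handing it out
    pool_recycle=3600,    # recycle connections older than an hour
)
```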

visch commented Mar 13, 2024

@techtangents This issue doesn't address the problem you're after. MAX_RECORDS_LIMIT was put together for running pytest, not for running taps.

I ended up implementing what you were after anyway. Here's the PR: #393

I think the "right" way to approach this issue is to look at the connection pool (#394). It's harder to do, though, as we'd still want to fail eventually if Postgres times out, and it'd require more effort to get in place.

@techtangents

@visch apologies and thank you.

To be clear, we'd just want it to read, say, 500k records and then stop. Then the next time we run the tap, it'd read another 500k records and stop. This is what the "limit" parameter on pipelinewise-tap-postgres does.
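Roughly, the behavior described here is: each run reads up to N records ordered by the replication key, records a bookmark, and the next run resumes past that bookmark. A rough, non-tap-specific sketch (the `emit_record` sink and the 500k batch size are illustrative, not real tap options):

```python
import sqlalchemy

LIMIT = 500_000  # illustrative batch size from the comment above


def sync_batch(engine, table, replication_key, bookmark):
    """Read up to LIMIT rows past the bookmark; return the new bookmark."""
    query = (
        sqlalchemy.select(table)
        .where(table.c[replication_key] > bookmark)
        .order_by(table.c[replication_key])
        .limit(LIMIT)
    )
    last_seen = bookmark
    with engine.connect() as conn:
        for row in conn.execute(query):
            emit_record(dict(row._mapping))  # hypothetical record sink
            last_seen = row._mapping[replication_key]
    return last_seen  # persist as state so the next run continues from here
```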

@techtangents

@visch I'm not a Python guy, but that PR looks like exactly what we want. Really appreciate you jumping in and helping with this.

visch added a commit that referenced this issue Mar 22, 2024
Closes #24

I wanted to get rid of the `get_records` function altogether, but with this addition we have to keep the override.