Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support for pagination for timestream.list_databases list_tables #1846

Merged
merged 4 commits into from
Dec 12, 2022

Conversation

cnfait
Copy link
Contributor

@cnfait cnfait commented Dec 6, 2022

Feature or Bugfix

  • Bugfix

Relates

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@cnfait cnfait added the bug Something isn't working label Dec 6, 2022
@cnfait cnfait added this to the 2.19.0 milestone Dec 6, 2022
@cnfait cnfait self-assigned this Dec 6, 2022
@cnfait cnfait added this to In Review in AWS SDK for pandas roadmap Dec 6, 2022
@malachi-constant
Copy link
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
  • Commit ID: f3731cd
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

Copy link
Contributor

@kukushking kukushking left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

Copy link
Contributor

@jaidisido jaidisido left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, let's just make sure the CB passes

Copy link
Contributor

@LeonLuttenberger LeonLuttenberger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it's worthwhile to create some tests for this? Creating too many timestream databases/tables might take too long, but maybe we could just mock the boto calls in order to make sure that we're actually testing the pagination. As in the first response returns nextToken, but the second one does not.

I've written and deployed code in the past where my while "nextToken" in response was never properly tested, and thus ended up not working when we needed to do pagination.

@cnfait
Copy link
Contributor Author

cnfait commented Dec 7, 2022

Do you think it's worthwhile to create some tests for this? Creating too many timestream databases/tables might take too long, but maybe we could just mock the boto calls in order to make sure that we're actually testing the pagination. As in the first response returns nextToken, but the second one does not.

I've written and deployed code in the past where my while "nextToken" in response was never properly tested, and thus ended up not working when we needed to do pagination.

I'm ok updating the tests to check that pagination works properly - but in that case it is better if I handle MaxResults (add a "limit" argument to list_databases/list_tables perhaps?) as well to avoid having to create tens of databases/tables.

Long term I'm actually wondering if it is worth having list_databases and list_tables at all in our API. People can use straight-boto3 and I'm not sure what value we're adding for these two calls. Any thought on this?

@kukushking
Copy link
Contributor

@cnfait My 2 cents: it's things like handling pagination is the value add here so we still might be a bit more convenient than pure boto3 calls. Also btw we could look into using generators and yielding the pages although it's not a very big win memory-wise but still nice to be able to fetch it dynamically.

@cnfait
Copy link
Contributor Author

cnfait commented Dec 7, 2022

@cnfait My 2 cents: it's things like handling pagination is the value add here so we still might be a bit more convenient than pure boto3 calls. Also btw we could look into using generators and yielding the pages although it's not a very big win memory-wise but still nice to be able to fetch it dynamically.

I thought about it but didn't want to change the type signature of the functions. Perhaps it's okay though, or I can do it for 3.0.0?

@kukushking
Copy link
Contributor

@cnfait I guess type signature could be a Union[pd.DataFrame, Iterator[pd.DataFrame]] in which case it stays backward-compatible and the iterator behaviour could be enabled by chunked parameter that is False by default like we do for some other calls - in that case we could even ship it with main but I do not have strong preference here it's fine to ship in with 3.0, too

@kukushking
Copy link
Contributor

Btw all of that is just a wishlist and can be addressed by the following PRs - this one looks great already!

@malachi-constant
Copy link
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
  • Commit ID: dc9b64e
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@malachi-constant malachi-constant merged commit f1f1090 into main Dec 12, 2022
@malachi-constant malachi-constant deleted the timestream-list-pagination branch December 12, 2022 16:39
@malachi-constant
Copy link
Contributor

AWS CodeBuild CI Report

  • CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
  • Commit ID: 739d264
  • Result: FAILED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@kukushking kukushking moved this from In Review to Done in AWS SDK for pandas roadmap Dec 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Development

Successfully merging this pull request may close these issues.

None yet

5 participants