feat: add read_items to dynamodb module #1877

a-slice-of-py · 2022-12-16T08:31:45Z

Feature or Bugfix

Feature

Detail

Add a new read_items function to the DynamoDB module to handle read operations against a given DynamoDB table, returning the results as a pandas DataFrame.

Relates

A candidate for wr.dynamodb.read_items #1867

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

malachi-constant · 2022-12-16T09:11:06Z

AWS CodeBuild CI Report

CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
Commit ID: 389754c
Result: FAILED
Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

malachi-constant · 2022-12-16T09:39:21Z

AWS CodeBuild CI Report

CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
Commit ID: 3fa32c5
Result: FAILED
Build Logs (available for 30 days)

malachi-constant · 2022-12-16T14:29:13Z

AWS CodeBuild CI Report

CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
Commit ID: fbc75bd
Result: FAILED
Build Logs (available for 30 days)

malachi-constant · 2022-12-16T14:40:02Z

AWS CodeBuild CI Report

CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
Commit ID: f819f8e
Result: FAILED
Build Logs (available for 30 days)

malachi-constant · 2022-12-16T14:52:48Z

AWS CodeBuild CI Report

CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
Commit ID: 3234d41
Result: FAILED
Build Logs (available for 30 days)

malachi-constant · 2022-12-16T15:00:08Z

AWS CodeBuild CI Report

CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
Commit ID: 3dcee5a
Result: FAILED
Build Logs (available for 30 days)

malachi-constant · 2022-12-16T15:10:57Z

AWS CodeBuild CI Report

CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
Commit ID: dd85bf6
Result: FAILED
Build Logs (available for 30 days)

malachi-constant · 2022-12-16T16:15:25Z

AWS CodeBuild CI Report

CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
Commit ID: 63f87c9
Result: FAILED
Build Logs (available for 30 days)

malachi-constant · 2022-12-16T17:43:59Z

AWS CodeBuild CI Report

CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
Commit ID: 0a22032
Result: FAILED
Build Logs (available for 30 days)

malachi-constant · 2022-12-16T18:07:59Z

AWS CodeBuild CI Report

CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
Commit ID: 24152a6
Result: FAILED
Build Logs (available for 30 days)

a-slice-of-py · 2022-12-16T18:12:10Z

OK, after some fighting with my local VSC setup I finally managed to review the code and pass all checks except CodeBuild. I am waiting for feedback from your side because I don't know how to fix it or how to proceed.

Regarding testing the new feature: I should be able to add some mocked tests like the ones added for partiql, but I am not sure how relevant they would be.

For more reliable tests against actual DynamoDB tables in a real AWS account, I probably need some guidance, or someone else should do the job (to avoid author bias in writing the tests, as well as the risk of messing up the datalake/full test environments). I am not sure what to do here.

jaidisido · 2022-12-19T16:20:25Z

Thank you @a-slice-of-py, this is a great start. My team will review and we can either suggest or create some tests. I agree that they should be integration and not unit tests. We already have some but in a separate branch.

Don't worry about the failing CodeBuild (CB) run: it's a flaky test on our side and has nothing to do with your PR.

malachi-constant · 2022-12-19T19:07:52Z

AWS CodeBuild CI Report

CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
Commit ID: 5b1e74b
Result: SUCCEEDED
Build Logs (available for 30 days)

malachi-constant · 2022-12-20T13:13:29Z

AWS CodeBuild CI Report

CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
Commit ID: e42475f
Result: SUCCEEDED
Build Logs (available for 30 days)

jaidisido

This is looking great, left some initial comments. We will crack on with some tests to give you an idea of how to create them

jaidisido · 2022-12-20T15:13:25Z

awswrangler/dynamodb/_read.py

+        # SEE: handle possible unprocessed keys. As suggested in Boto3 docs,
+        # this approach should involve exponential backoff, but this should be
+        # already managed by AWS SDK itself, as stated
+        # [here](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Programming.Errors.html)


I agree that it should be handled by the SDK. If not, we do have a method for exponential backoff, though it requires knowing which exception to retry on.
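As a rough illustration of the fallback discussed in this thread, the sketch below keeps re-requesting keys while the response still reports `UnprocessedKeys`, sleeping with exponential backoff between attempts. It is not the library's own helper: `batch_get_with_backoff`, `max_retries`, `base_delay` and the injectable `sleep` are hypothetical names, and `client` is only assumed to expose boto3's `batch_get_item(RequestItems=...)` call.

```python
import time
from typing import Any, Callable, Dict, List


def batch_get_with_backoff(
    client: Any,
    request_items: Dict[str, Any],
    max_retries: int = 5,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Dict[str, Any]]:
    """Collect items from batch_get_item, retrying unprocessed keys."""
    items: List[Dict[str, Any]] = []
    pending = request_items
    attempt = 0
    while pending:
        response = client.batch_get_item(RequestItems=pending)
        # Responses maps each table name to the items returned for it.
        for table_items in response.get("Responses", {}).values():
            items.extend(table_items)
        # Keys the service could not process in this call, if any.
        pending = response.get("UnprocessedKeys", {})
        if pending:
            if attempt >= max_retries:
                raise RuntimeError("Unprocessed keys remain after retries")
            # Exponential backoff: base_delay, then 2x, 4x, ...
            sleep(base_delay * (2 ** attempt))
            attempt += 1
    return items
```

Passing `sleep` as a parameter is only a testing convenience, so the retry schedule can be checked without actually waiting.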

jaidisido · 2022-12-20T15:37:28Z

awswrangler/dynamodb/_read.py

+    max_items_evaluated: Optional[int] = None,
+    as_dataframe: bool = True,
+    boto3_session: Optional[boto3.Session] = None,
+) -> Union[pd.DataFrame, List[Dict[str, Any]]]:


We usually only return a pandas DataFrame, given that this is supposed to be an AWS SDK for pandas. However, I do see the appeal of also enabling a list of dicts, especially since the as_dataframe argument is True by default.

Yeah, I totally understand the focus of awswrangler, but I thought that providing a pandas opt-out would encourage adoption of this new feature and the related refactoring of pre-existing code.

Users can adopt it step by step: first by replacing their current snippets, which might return "raw" items as a list of dicts, with wr.dynamodb.read_items(..., as_dataframe=False), and only then dealing with the change in data structures.

jaidisido · 2022-12-20T15:44:50Z

awswrangler/dynamodb/_read.py

+
+
+@apply_configs
+def read_items(


One additional argument we usually consider in our read methods is chunked. If set to True then an Iterator of data frames is yielded. It would make the API memory friendly as the user would be able to process items in chunks, particularly when doing a full scan on the table.

This might be a bit more complex so we could consider it later once we have some working tests.
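To make the idea concrete, here is a minimal sketch of what a chunked read could look like, assuming `table` behaves like a boto3 DynamoDB `Table` whose `scan` returns `Items` and, when more data remains, `LastEvaluatedKey`. The name `scan_in_chunks` is hypothetical, and the sketch yields plain lists of items; an actual implementation would presumably wrap each page in a DataFrame.

```python
from typing import Any, Dict, Iterator, List


def scan_in_chunks(table: Any, **scan_kwargs: Any) -> Iterator[List[Dict[str, Any]]]:
    """Yield one page of items per scan call instead of accumulating them."""
    kwargs = dict(scan_kwargs)
    while True:
        response = table.scan(**kwargs)
        # A real implementation would likely wrap each page in a DataFrame.
        yield response.get("Items", [])
        last_key = response.get("LastEvaluatedKey")
        if not last_key:
            break
        # Resume the scan from where the previous page stopped.
        kwargs["ExclusiveStartKey"] = last_key
```

Because this is a generator, only one page is held in memory at a time, which is the memory-friendliness the comment refers to.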

jaidisido · 2022-12-20T15:49:33Z

awswrangler/dynamodb/_read.py

+        kwargs["Limit"] = max_items_evaluated
+
+    # If kwargs are sufficiently informative, proceed with actual read op
+    if partition_values or key_condition_expression or filter_expression or allow_full_scan or max_items_evaluated:


Using any would be simpler perhaps?
if any((partition_values, key_condition_expression,...)):

Definitely, thanks.
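For reference, the suggested simplification could look roughly like the sketch below. The parameter names are taken from the diff above, while the helper `has_read_criteria` and the type hints are hypothetical and only serve to make the check runnable on its own.

```python
from typing import Any, Optional, Sequence


def has_read_criteria(
    partition_values: Optional[Sequence[Any]] = None,
    key_condition_expression: Optional[Any] = None,
    filter_expression: Optional[Any] = None,
    allow_full_scan: bool = False,
    max_items_evaluated: Optional[int] = None,
) -> bool:
    """Return True if at least one argument is truthy."""
    return any(
        (
            partition_values,
            key_condition_expression,
            filter_expression,
            allow_full_scan,
            max_items_evaluated,
        )
    )
```

Like the original `or` chain, `any` tests truthiness, so an empty list of partition values or `max_items_evaluated=0` still counts as "not provided".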

malachi-constant · 2022-12-20T16:17:03Z

AWS CodeBuild CI Report

CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
Commit ID: 8b39ce4
Result: SUCCEEDED
Build Logs (available for 30 days)

a-slice-of-py · 2022-12-21T07:33:38Z

awswrangler/dynamodb/_read.py

+                # SEE: recursive approach guarantees that each reserved keyword will be properly replaced,
+                # even if it will require as many calls as the reserved keywords involved (not so efficient...)
+                return wrapper(*args, **kwargs)


@jaidisido do you have any feedback on this?

I wasn't able to find any way to obtain the list of DynamoDB reserved keywords programmatically (having that beforehand would ease the implementation without requiring recursion).

I did look/ask around and there is no programmatic way to access the list of reserved keywords sadly.

That being said, looking at its GH history that list is static. It has not been changed since its inception.

This makes sense as adding a keyword to the list would be a serious breaking change in production.

All of this to say that I would be ok with creating a separate file hosting the list of keywords and using it for your check (example).
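A minimal sketch of that look-before-you-leap idea follows, assuming the simplest case of a flat, comma-separated ProjectionExpression. `RESERVED_KEYWORDS` here is a tiny illustrative excerpt rather than the full documented list, and `sanitize_projection` is a hypothetical helper, not part of the PR.

```python
from typing import Dict, FrozenSet, List, Tuple

# Tiny excerpt for illustration only; the documented list is much longer.
RESERVED_KEYWORDS: FrozenSet[str] = frozenset({"DATA", "NAME", "STATUS", "YEAR"})


def sanitize_projection(expression: str) -> Tuple[str, Dict[str, str]]:
    """Swap reserved attribute names for '#'-prefixed placeholders."""
    names: Dict[str, str] = {}
    parts: List[str] = []
    for attr in (part.strip() for part in expression.split(",")):
        # Reserved words are matched case-insensitively.
        if attr.upper() in RESERVED_KEYWORDS:
            placeholder = f"#{attr}"
            names[placeholder] = attr
            parts.append(placeholder)
        else:
            parts.append(attr)
    return ", ".join(parts), names
```

The returned mapping is meant to be passed as `ExpressionAttributeNames` alongside the rewritten expression. Nested paths, list indexes and condition expressions are deliberately not handled here.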

Thanks for the reference.

I tried to switch to an LBYL (look before you leap) approach for managing reserved words, but it proved harder than expected, mainly because ProjectionExpression, KeyConditionExpression and FilterExpression can take such diverse formats that it is difficult to find a general preventive action.

So, given that:

* expressions can also be expressed via Key and Attr, which seem to sanitize reserved words too
* the (current) EAFP approach is already in place

I propose to keep it as it is, at least for the moment.

If we are worried about the recursive approach (which makes as many validation client calls as there are reserved words in the given kwargs), we can put it behind an opt-in/opt-out kwarg, something like sanitize_dynamodb_reserved_words: bool.

Agreed, I think it's ok to keep it as is for now as it would be too complex to capture all edge cases from the involved expressions
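For readers following the thread, the EAFP approach being kept can be sketched roughly as below. This is a simplified illustration, not the PR's code: `ValidationError` is a hypothetical stand-in for the client exception, the `reserved keyword: <name>` message format is an assumption, and only ProjectionExpression is rewritten.

```python
import functools
import re
from typing import Any, Callable, Dict

# Assumed message format; check it against the actual service error.
_KEYWORD_RE = re.compile(r"reserved keyword: (\w+)")


class ValidationError(Exception):
    """Hypothetical stand-in for the client-side validation exception."""


def handle_reserved_keywords(func: Callable[..., Any]) -> Callable[..., Any]:
    @functools.wraps(func)
    def wrapper(**kwargs: Any) -> Any:
        try:
            return func(**kwargs)
        except ValidationError as error:
            match = _KEYWORD_RE.search(str(error))
            if match is None:
                raise  # unrelated failure: do not swallow it
            keyword = match.group(1)
            placeholder = f"#{keyword}"
            current = kwargs.get("ProjectionExpression", "")
            # Replace the bare keyword, skipping names already prefixed with '#'.
            pattern = rf"(?<![#\w]){re.escape(keyword)}\b"
            updated = re.sub(pattern, placeholder, current)
            if updated == current:
                raise  # nothing to rewrite: avoid retrying forever
            kwargs["ProjectionExpression"] = updated
            names: Dict[str, str] = dict(kwargs.get("ExpressionAttributeNames", {}))
            names[placeholder] = keyword
            kwargs["ExpressionAttributeNames"] = names
            # One retry per reserved keyword, hence the cost discussed above.
            return wrapper(**kwargs)

    return wrapper
```

Each reserved word costs one failed call before it is replaced, which is the inefficiency the opt-in/opt-out kwarg would let users avoid.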

malachi-constant · 2022-12-21T23:40:40Z

AWS CodeBuild CI Report

CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
Commit ID: 38e7686
Result: SUCCEEDED
Build Logs (available for 30 days)

jaidisido · 2022-12-22T17:17:11Z

> This is looking great, left some initial comments. We will crack on with some tests to give you an idea of how to create them

I have added some initial tests to get us going as discussed @a-slice-of-py. The test coverage report displayed at the end of our CB logs is one measure of the quality of tests that we should implement.

malachi-constant · 2022-12-22T17:39:06Z

AWS CodeBuild CI Report

CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
Commit ID: 1842ee5
Result: SUCCEEDED
Build Logs (available for 30 days)

malachi-constant · 2022-12-23T14:35:40Z

AWS CodeBuild CI Report

CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
Commit ID: 989d788
Result: SUCCEEDED
Build Logs (available for 30 days)

a-slice-of-py · 2022-12-27T09:50:23Z

> This is looking great, left some initial comments. We will crack on with some tests to give you an idea of how to create them

> I have added some initial tests to get us going as discussed @a-slice-of-py. The test coverage report displayed at the end of our CB logs is one measure of the quality of tests that we should implement.

Thanks @jaidisido: I looked at the tests and they already seem to cover nearly all possible usages of read_items (except the ones involving pagination and unprocessed keys), as confirmed by the coverage reported at the end of the logs. Awesome.

I just don't get why the CB logs reported lines 258->269 as missing, even though they should have been hit in all scan/query ops, for example at line 115 of test_dynamodb.py.

jaidisido · 2022-12-28T15:39:50Z

Looking in detail into the coverage report, I can see that the use_query and use_scan cases were hit (green); it's just the surrounding handling of pagination and unprocessed keys that is missed (red).

I am happy with where we are with this, just waiting on others to review before merging. Thanks again @a-slice-of-py

malachi-constant · 2022-12-29T17:27:55Z

AWS CodeBuild CI Report

CodeBuild project: GitHubCodeBuild8756EF16-4rfo0GHQ0u9a
Commit ID: 7e8b4d2
Result: SUCCEEDED
Build Logs (available for 30 days)

malachi-constant

Great work, thanks

Silvio Lugaro added 2 commits December 16, 2022 09:24

feat: add read_items to dynamodb module (aws#1867)

389754c

fix: remove walrus operator to ensure python 3.7+ compatibility

3fa32c5

Silvio Lugaro added 5 commits December 16, 2022 14:24

fix: update type annotations to fix errors highlighted by mypy

ec28bdc

fix: another update to type annotations to fix errors highlighted by mypy

fbc75bd

fix: still updating type annotations

c4ae93b

* explicitly return None instead of return
* switch from Mapping to Dict

fix: add CustomCallable to fix mypy error in decorator type annotations

f819f8e

fix: bound wrapper within decorator to a fixed signature

3234d41

fix: rollback to wrapper generic signature, but now is explicitly casted

3dcee5a

Silvio Lugaro added 2 commits December 16, 2022 15:47

fix: abide to flake8 linting

dd85bf6

fix: abide to flake8 linting reqs also in dynamodb init

aa75605

fix: minor changes to avoid pylint errors

63f87c9

Silvio Lugaro added 2 commits December 16, 2022 18:18

fix: remove wildcard to avoid pylint R0912 (too-many-branches)

0a22032

fix: solve pylint R0912 with return refactoring

7eff24a

fix: sort imports in dynamodb init

24152a6

a-slice-of-py changed the title from "feat: add read_items to dynamodb module (#1867)" to "feat: add read_items to dynamodb module" Dec 19, 2022

malachi-constant requested review from malachi-constant, cnfait and jaidisido December 19, 2022 16:46

malachi-constant requested review from kukushking and LeonLuttenberger December 19, 2022 16:46

malachi-constant assigned a-slice-of-py Dec 19, 2022

malachi-constant added the feature label Dec 19, 2022

Merge branch 'main' into main

5b1e74b

Merge branch 'main' into main

e42475f

jaidisido reviewed Dec 20, 2022

Merge branch 'main' into main

8b39ce4

a-slice-of-py commented Dec 21, 2022

Merge branch 'main' into main

38e7686

(tests): Add initial DynamoDB integration tests

1842ee5

(tests): Add expression tests and increase coverage

989d788

jaidisido approved these changes Dec 28, 2022

Merge branch 'main' into main

7e8b4d2

Merge branch 'main' into main

100090f

malachi-constant approved these changes Dec 29, 2022

Merge branch 'main' into main

ab450f3

jaidisido merged commit bd51887 into aws:main Dec 29, 2022

a-slice-of-py mentioned this pull request Jun 30, 2023

wr.dynamodb.read_items ignores max_items_evaluated kwarg #2371

Closed
