Breaking change in awswrangler.dynamodb.read_items with version 3.5.0 #2605

Closed
ArtOfStew opened this issue Jan 11, 2024 · 1 comment · Fixed by #2607
Labels
bug Something isn't working

Comments


ArtOfStew commented Jan 11, 2024

Describe the bug

The output of the DynamoDB read_items call differs between versions 3.4.2 and 3.5.0. In version 3.5.0, the DynamoDB data types are returned alongside the values for each item in the DataFrame.

This is demonstrated below with AWS Glue version 4.0. The change broke one of our production processes this morning ;)

We can work around this change, but I wanted to check if it was an intended breaking change. There isn't any explicit mention of it in the release documentation (that I could see - read_items doesn't come up).
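For anyone else hit by this, here is a minimal sketch of the kind of workaround we mean. It assumes the 3.5.0 output wraps each value in the standard DynamoDB JSON form (for example {"S": "ABC123"}); the exact shape is not reproduced in this issue, so treat that as an assumption. The helper name strip_dynamodb_types is hypothetical and not part of awswrangler.

```python
from decimal import Decimal
from typing import Any


def strip_dynamodb_types(value: Any) -> Any:
    """Recursively convert DynamoDB-typed values such as {"S": "x"} to plain Python.

    Hypothetical helper: assumes values follow the standard DynamoDB JSON
    layout (a single-key dict whose key is a type tag like "S" or "N").
    """
    # Anything that is not a single-key dict is returned unchanged.
    if not (isinstance(value, dict) and len(value) == 1):
        return value
    ((tag, inner),) = value.items()
    if tag == "S":
        return inner
    if tag == "N":
        # DynamoDB transmits numbers as strings; Decimal keeps the precision.
        return Decimal(inner)
    if tag == "BOOL":
        return bool(inner)
    if tag == "NULL":
        return None
    if tag == "L":
        return [strip_dynamodb_types(v) for v in inner]
    if tag == "M":
        return {k: strip_dynamodb_types(v) for k, v in inner.items()}
    if tag == "SS":
        return set(inner)
    if tag == "NS":
        return {Decimal(n) for n in inner}
    # Unknown tag: leave the value as it is.
    return value
```

With pandas this could be applied per column, for example items_df["UniqueId"].map(strip_dynamodb_types). One caveat: a genuine single-key map whose key happens to look like a type tag would also be unwrapped, so pinning awswrangler==3.4.2 is the safer option where possible.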

How to Reproduce

The following Glue script will demonstrate the issue.

You will require a DynamoDB table MyTable with a partition key Process and a sort key UniqueId.

You can set the awswrangler version used by Glue with the usual --additional-python-modules job parameter:
awswrangler==3.4.2 or awswrangler==3.5.0

import boto3
import sys
import awswrangler as wr
from boto3.dynamodb.conditions import Key
from datetime import datetime

print("Starting custom Glue utility.")

print(wr.__version__)

table_name='MyTable'
sync_key = datetime.now().strftime("%Y%m%d_%H%M%S_%f")
item = {
    "Process": "TEST", # <- Partition Key of the DynamoDB Table
    "UniqueId": "ABC123", # <- Sort Key of the DynamoDB Table
    "SyncKey": sync_key,
}
dynamodb = boto3.resource("dynamodb")
dyn_table = dynamodb.Table(table_name)
dyn_table.put_item(Item=item)

# try to get
items_df = wr.dynamodb.read_items(
    table_name = table_name,
    key_condition_expression=Key('Process').eq("TEST"),
    consistent = True,
)

print(items_df["UniqueId"].to_list())
print("Completed custom Glue utility.")

[Screenshot: job output with awswrangler 3.4.2]

[Screenshot: job output with awswrangler 3.5.0]

Expected behavior

The output of the read_items call should be consistent between V3.4.2 and V3.5.0 (unless change is intended).

Your project

No response

Screenshots

No response

OS

AWS Glue V4.0, Type = Spark

Python version

3.9

AWS SDK for pandas version

3.5.0

Additional context

(We are using a Spark job because our usual process requires Spark; this script doesn't need it, but I've included it to ensure reproducing it has the best chance.)

@ArtOfStew ArtOfStew added the bug Something isn't working label Jan 11, 2024
@LeonLuttenberger
Contributor

Hey,

Thank you for raising this issue. It is indeed a bug introduced in 3.5.0. We are working on a fix and we'll release 3.5.1 as soon as the fix is ready.

Best regards,
Leon
