Describe the bug
The output of the DynamoDB read_items call differs between versions 3.4.2 and 3.5.0. In version 3.5.0, the DynamoDB data types are included in each item of the returned DataFrame.
This is demonstrated below with AWS Glue version 4.0 (one of our production processes was hit by this change this morning ;)
We can work around this change, but I wanted to check if it was an intended breaking change. There isn't any explicit mention of it in the release documentation (that I could see - read_items doesn't come up).
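For anyone needing a stopgap, here is a minimal sketch of the kind of workaround we have in mind. It assumes the 3.5.0 cells arrive in DynamoDB's typed form, for example {"S": "ABC123"}; that shape is my reading of the output and is not confirmed against the library. The helper name strip_dynamodb_type is hypothetical and only covers the common type tags.

```python
from decimal import Decimal


def strip_dynamodb_type(value):
    """Convert a DynamoDB-typed value such as {"S": "ABC123"} to a plain value.

    Hypothetical helper: assumes typed values are single-key dicts whose key
    is a DynamoDB type tag. Anything else is returned unchanged.
    """
    if not (isinstance(value, dict) and len(value) == 1):
        return value  # already plain (the 3.4.2-style output)

    type_tag, inner = next(iter(value.items()))

    if type_tag in ("S", "BOOL"):
        return inner
    if type_tag == "N":
        # DynamoDB transmits numbers as strings; Decimal avoids precision loss.
        return Decimal(inner)
    if type_tag == "NULL":
        return None
    if type_tag == "L":
        return [strip_dynamodb_type(v) for v in inner]
    if type_tag == "M":
        return {k: strip_dynamodb_type(v) for k, v in inner.items()}

    # Unknown tag (sets, binary, or a plain single-key dict): leave untouched.
    return value


print(strip_dynamodb_type({"S": "ABC123"}))
```

Applied to a column, this would look like items_df["UniqueId"].map(strip_dynamodb_type). Because plain values pass through unchanged, the same code should behave identically on 3.4.2. boto3 also ships boto3.dynamodb.types.TypeDeserializer, which may be a better fit if every cell is known to be typed.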
How to Reproduce
The following Glue script will demonstrate the issue.
You will require a DynamoDB table MyTable with a partition key Process and a sort key UniqueId.
You can set the awswrangler version used by Glue with the usual --additional-python-modules job parameter: awswrangler==3.4.2 or awswrangler==3.5.0.
import boto3
import sys
import awswrangler as wr
from boto3.dynamodb.conditions import Key
from datetime import datetime

print("Starting custom Glue utility.")
print(wr.__version__)

table_name = "MyTable"
sync_key = datetime.now().strftime("%Y%m%d_%H%M%S_%f")

item = {
    "Process": "TEST",     # <- Partition Key of the DynamoDB Table
    "UniqueId": "ABC123",  # <- Sort Key of the DynamoDB Table
    "SyncKey": sync_key,
}

# Write a test item to the table
dynamodb = boto3.resource("dynamodb")
dyn_table = dynamodb.Table(table_name)
dyn_table.put_item(Item=item)

# Try to read the item back
items_df = wr.dynamodb.read_items(
    table_name=table_name,
    key_condition_expression=Key("Process").eq("TEST"),
    consistent=True,
)

print(items_df["UniqueId"].to_list())
print("Completed custom Glue utility.")
Expected behavior
The output of the read_items call should be consistent between 3.4.2 and 3.5.0 (unless the change is intended).
Your project
No response
Screenshots
No response
OS
AWS Glue V4.0, Type = Spark
Python version
3.9
AWS SDK for pandas version
3.5.0
Additional context
(We are using a Spark job because our usual process requires Spark. This script doesn't need it, but I've kept the job type the same to give the reproduction the best chance.)