Parse DynamoDB Stream events? #1212
Something similar to https://github.com/awslabs/logstash-input-dynamodb/blob/master/lib/logstash/inputs/DynamoDBLogParser.rb#L121-L161
There is not a public interface for converting DynamoDB stream events to vanilla Ruby hashes and types. I think this would be a helpful interface, though. We can track this as a feature request.
I've moved this to our public backlog.
Thanks @trevorrowe
@phstc Could you provide more information about the usage scenario? I'm going to work on this feature request, so it would be nice to have more detail. For example, how did you receive this JSON output? Perhaps you are integrating a DynamoDB stream with Lambda? Or does it come from somewhere else? Cheers!
Hey @cjyclaire 🍻
I have a stream connected to a Lambda that enqueues DynamoDB records to an SQS queue. Then I process these messages using a Ruby worker, which needs this event parser. But in the Logstash snippet I used as an example above, they receive the JSON from the DynamoDB stream directly.
@phstc, good to know, thanks!
Adding a utility to parse the JSON document passed to the Lambda function would be pretty straightforward. We have two options for this interface (restated in detail below):
Thoughts?
Agreed on preserving backwards compatibility. I'd prefer the 2nd option, which treats "structs-to-vanilla-hash" as an opt-in feature. That way, consistency in the response is maintained and the vanilla-hash enhancement is also available.
I kind of prefer the 1st option, so I can freely use the hash in the same way I use hashes returned from DynamoDB queries, scans, etc.
That's true, in terms of usage, the 1st option does make more sense.
@phstc Can you clarify what you mean? I think maybe the options I proposed above were poorly phrased. Given the following, completely made-up, JSON document as sent to a Lambda function:

```json
{
"awsRegion": "us-east-1",
"dynamodb": {
"ApproximateCreationDateTime": 123456789,
"Keys": {
"id" : {
"S": "abc",
}
},
"NewImage": {
"id" : {
"S": "abc",
},
"name" : {
"S": "new-name"
},
"size" : {
"N": 456
},
"enabled" : {
"BOOL": true
}
},
"OldImage": {
"id" : {
"S": "abc",
},
"name" : {
"S": "old-name"
},
"size" : {
"N": 123
},
"enabled" : {
"BOOL": false
}
},
"SequenceNumber": "seq-num",
"SizeBytes": 1245,
"StreamViewType": "view-type"
},
"eventID": "event-id",
"eventName": "event-name",
"eventSource": "event-source",
"eventVersion": "event-version"
}
```

Option 1: Minimal conversion of attribute values

JSON-parse the document and convert only the attribute values to their Ruby equivalents. Other values, like the `ApproximateCreationDateTime`, would be left as-is:

```ruby
{
"awsRegion" => "us-east-1",
"dynamodb" => {
"ApproximateCreationDateTime" => 123456789,
"Keys" => {
"id" => "abc"
},
"NewImage" => {
"id" => "abc",
"name" => "new-name"
"size" => 456,
"enabled" => true
},
"OldImage" => {
"id" => "abc",
"name" => "old-name"
"size" => 123,
"enabled" => false
},
"SequenceNumber" => "seq-num",
"SizeBytes" => 1245,
"StreamViewType" => "view-type"
},
"eventID" => "event-id",
"eventName" => "event-name",
"eventSource" => "event-source",
"eventVersion" => "event-version"
}
```

Option 2: Convert to SDK struct types + attribute values

Convert the JSON into the types defined by the SDK. Attribute values would be represented with vanilla Ruby hashes / values, not their struct types. Note that non-attribute-value types would be converted (like the `approximate_creation_date_time` below). Also, structs use snake_case instead of the default casing.

```ruby
#<struct Aws::DynamoDBStreams::Types::Record
event_id="event-id",
event_name="event-name",
event_version="event-version",
event_source="event-source",
aws_region="us-east-1",
dynamodb=
#<struct Aws::DynamoDBStreams::Types::StreamRecord
approximate_creation_date_time=1973-11-29 13:33:09 -0800,
keys={"id"=>"abc"},
new_image={"id"=>"abc","name"=>"new-name", "size"=>456, "enabled"=>true},
old_image={"id"=>"abc","name"=>"old-name", "size"=>123, "enabled"=>false},
sequence_number="seq-num",
size_bytes=1245,
stream_view_type="view-type">>
```

Thoughts?
@trevorrowe Sure, I was thinking about the 1st option. For my specific use case, what I honestly need is just something that converts the `NewImage` into a vanilla Ruby hash.
You can use this quick and dirty method to convert:
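A minimal sketch of such a quick-and-dirty recursive converter, assuming a hypothetical top-level helper named `unmarshal_attribute` and covering only the common type descriptors:

```ruby
# Hypothetical quick-and-dirty converter for a single DynamoDB-typed value,
# e.g. {"S"=>"abc"} => "abc", {"M"=>{...}} => nested plain hash.
def unmarshal_attribute(value)
  type, content = value.first
  case type
  when 'S', 'SS', 'BOOL', 'B' then content
  when 'N'    then content.to_s.include?('.') ? content.to_f : content.to_i
  when 'NS'   then content.map { |n| n.include?('.') ? n.to_f : n.to_i }
  when 'NULL' then nil
  when 'M'    then content.transform_values { |v| unmarshal_attribute(v) }
  when 'L'    then content.map { |v| unmarshal_attribute(v) }
  else content # unknown descriptor: pass the raw content through
  end
end

# Converting a whole image (attribute name => typed value):
# new_image.transform_values { |v| unmarshal_attribute(v) }
```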
In case someone else needs it, based on @RedaBenh's example, I created a class for parsing the whole DynamoDB event: https://gist.github.com/alexperto/eb88db235d66bda85979fe08d047b18f
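As an illustration only (the gist above is the real reference), a whole-event parser along those lines might reuse the hypothetical `unmarshal_attribute` helper sketched earlier:

```ruby
# Hypothetical sketch: convert the typed Keys/NewImage/OldImage sections of
# every record in a DynamoDB Stream event into plain Ruby hashes.
class StreamEventParser
  IMAGE_KEYS = %w[Keys NewImage OldImage].freeze

  def parse(event)
    event.fetch('Records', []).map { |record| parse_record(record) }
  end

  private

  def parse_record(record)
    dynamodb = record.fetch('dynamodb', {}).dup
    IMAGE_KEYS.each do |key|
      next unless dynamodb.key?(key)
      dynamodb[key] = dynamodb[key].transform_values { |v| unmarshal_attribute(v) }
    end
    record.merge('dynamodb' => dynamodb)
  end
end
```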
Reopening - deprecating usage of Feature Requests backlog markdown file.
Is this still in the pipeline? I'm really having problems converting DynamoDB JSON to plain Ruby hashes in the Lambda. Also, the examples given cannot handle deeply nested structures.
This isn't actively being worked on. I'm happy to take any pull requests. I imagine the implementation would be very similar to the DynamoDB simple attributes plugin (https://github.com/aws/aws-sdk-ruby/blob/master/gems/aws-sdk-dynamodb/lib/aws-sdk-dynamodb/plugins/simple_attributes.rb). If I'm understanding the problem correctly, the output of DynamoDB streams is similar to that of DynamoDB, and you'd want to use the unmarshaller to convert to simple types.
Just dug through the SDK code and came up with this; it needs some polish but got me unblocked.

```ruby
def extract_item(image)
  # Reuse the GetItemOutput shape so the SDK's own unmarshaller can do the work
  shape_ref = Aws::DynamoDB::ClientApi::Shapes::ShapeRef.new(shape: Aws::DynamoDB::ClientApi::GetItemOutput)
  parser = Aws::Json::Parser.new(shape_ref)
  translator = Aws::DynamoDB::Plugins::SimpleAttributes::ValueTranslator.new(shape_ref, :unmarshal)
  # Round-trip the image through the JSON parser, then unmarshal to simple types
  translator.apply(parser.parse(JSON.generate('Item' => image))).item
end
```

Demo

Creating a complex demo item:

```ruby
ddb.put_item(
  table_name: 'Demo',
  item: {
    pk: 'demo',
    sk: 'watwat3',
    some: 'data',
    nested: {
      array: [1, 2, 3, 'four'],
      hash: { complex: true, set: Set.new([1, 1, 1, 1]) }
    }
  }
)
```

Extracting the new image from each of the context records:

```
[136] pry(main)> context['Records'].map { |record| extract_item(record['dynamodb']['NewImage']) }
=> [{"some"=>"data",
"sk"=>"watwat3",
"pk"=>"demo",
"nested"=>
{"array"=>[0.1e1, 0.2e1, 0.3e1, "four"],
"hash"=>{"set"=>#<Set: {0.1e1}>, "complex"=>true}}}]
[137] pry(main)>
```

DDB Trigger context:

```json
{
"Records": [
{
"eventID": "bc69ab5e64ef472f0a4b330fa3bfe29f",
"eventName": "MODIFY",
"eventVersion": "1.1",
"eventSource": "aws:dynamodb",
"awsRegion": "us-west-2",
"dynamodb": {
"ApproximateCreationDateTime": 1609037147,
"Keys": {
"sk": {
"S": "watwat3"
},
"pk": {
"S": "demo"
}
},
"NewImage": {
"some": {
"S": "data"
},
"sk": {
"S": "watwat3"
},
"pk": {
"S": "demo"
},
"nested": {
"M": {
"array": {
"L": [
{
"N": "1"
},
{
"N": "2"
},
{
"N": "3"
},
{
"S": "four"
}
]
},
"hash": {
"M": {
"set": {
"NS": [
"1"
]
},
"complex": {
"BOOL": true
}
}
}
}
}
},
"OldImage": {
"some": {
"S": "data"
},
"sk": {
"S": "watwat3"
},
"pk": {
"S": "demo"
}
},
"SequenceNumber": "78600000000028145232611",
"SizeBytes": 116,
"StreamViewType": "NEW_AND_OLD_IMAGES"
},
"eventSourceARN": "arn:aws:dynamodb:us-west-2:<not you>:table/Demo/stream/2020-12-27T01:31:08.439"
}
]
}
```
Hello, I was facing the same problem; I needed to parse DynamoDB stream records, so I created these gems: `aws-sdk-dynamodb-attribute-deserializer` and `aws-sdk-dynamodbstreams-event-parser`.
Test file:

```ruby
require 'aws-sdk-dynamodb-attribute-deserializer'
require 'aws-sdk-dynamodbstreams-event-parser'
test_item = {
'string' => {
'S' => 'test string'
},
'list' => {
'L' => [{ 'S' => 's1' }, { 'S' => 's2' }, { 'N' => '5' }]
},
'stringset' => {
'SS' => ['ss1']
},
'numberset' => {
'NS' => ['123']
},
'map' => {
'M' => {
'country_code' => { 'S' => 'FR' },
'sublist' => { 'L' => [{ 'N' => '123' }] },
'ns' => { 'NS' => ['123'] },
'submap' => {
'M' => {
'subkey' => { 'S' => 'substring' },
'sublist' => { 'L' => [{ 'S' => 'test' }] },
'submap2' => { 'M' => { 'submap2_key1' => { 'S' => 'test' } } },
'sublist2' => { 'L' => [{ 'M' => { 'submap2_key1' => { 'S' => 'test' } } }] }
}
}
}
},
'boolean_true' => {
'BOOL' => true
},
'boolean_false' => {
'BOOL' => false
},
'binary' => { 'B' => 'QftNXxB13kBXD2x5ZmYrDQ==' },
'binaryset' => { 'BS' => ['QftNXxB13kBXD2x5ZmYrDQ==', '9ubi62eYsx0H/MK6uQQgDA=='] }
}
raw_event = {
'eventID' => '64f3f4c0c43db06b5c00418a42a6fff2',
'eventName' => 'INSERT',
'eventVersion' => '1.1',
'eventSource' => 'aws:dynamodb',
'awsRegion' => 'eu-west-3',
'dynamodb' => {
'ApproximateCreationDateTime' => 1609926587,
'Keys' => {
'string' => {
'S' => 'test string'
},
'number' => {
'N' => '123456'
}
},
'NewImage' => test_item,
'SequenceNumber' => '82061200000000005864545644',
'SizeBytes' => 405,
'StreamViewType' => 'NEW_AND_OLD_IMAGES'
},
'eventSourceARN' => 'arn:aws:dynamodb:eu-west-3:xxxxxxxx:table/xxxxxxxxx/stream/1970-01-06T00:00:00.000'
}
raw_event_str = raw_event.to_json
puts 'Parse item from raw DynamoDB attributes'
pp Aws::DynamoDB::AttributeDeserializer.call(test_item)
puts '-----------------------------------------------'
puts 'Parse event from JSON object'
event_parser = Aws::DynamoDBStreams::EventParser.new
event = event_parser.from(raw_event)
pp event
puts '-----------------------------------------------'
puts 'Parse event from JSON-encoded string'
event = event_parser.parse(raw_event_str)
pp event
```

Test file results:
Cheers!
Thank you both for providing solutions. I've set aside some time to look at this, and I'm currently investigating how this should fit into the SDK. This could either be a customization (utility class) for parsing, or a plugin that directly modifies the response records. If it were a plugin, it would have to be turned off by default for backwards compatibility.
Hello @mullermp, the best solution would be a utility class for parsing, I think :) Cheers!
@saluzafa Yes, there will be a utility class, but I will also add a plugin. There are two different inputs here: those using Lambda to get an event (a Ruby hash), and those calling the DynamoDB Streams client directly.
We've added a utility that you can use to parse Lambda events into simple attributes. If you're using the SDK with `get_records`, the plugin converts the response attributes for you.
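As a rough usage sketch of both entry points: the constant `Aws::DynamoDBStreams::AttributeTranslator` and the `simple_attributes` client option are assumptions here, so verify them against your installed aws-sdk-dynamodbstreams version; `get_records` is the standard streams client call.

```ruby
require 'json'
require 'aws-sdk-dynamodbstreams'

# Case 1: a stream event delivered to a Lambda handler.
lambda_event = JSON.parse(File.read('event.json')) # placeholder payload

# Assumption: a translator utility that converts the raw event into SDK
# structs whose attribute values are plain Ruby types, not {"S"=>...} hashes.
event = Aws::DynamoDBStreams::AttributeTranslator.from_event(lambda_event)
event.records.each do |record|
  puts record.dynamodb.new_image
end

# Case 2: reading the stream directly with the client.
# Assumption: an opt-in option analogous to aws-sdk-dynamodb's
# simple_attributes plugin, off by default for backwards compatibility.
client = Aws::DynamoDBStreams::Client.new(simple_attributes: true)
resp = client.get_records(shard_iterator: 'placeholder-shard-iterator')
resp.records.each { |r| puts r.dynamodb.new_image }
```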
Hey,
Is there a way to parse DynamoDB Stream events using the SDK? I receive a JSON from a DynamoDB stream in a process, and I would like to parse the `NewImage` into a "normal" hash, without the types `S`, `M`, etc. as root keys.
The JSON I receive:
And I would like to parse the `NewImage` to:
Is there any helper in the SDK that parses it?