Serverless DynamoDB Scanner
This is a Serverless application that scans a given DynamoDB table and inserts every item into a Kinesis Stream. You can then process the Kinesis stream, allowing you to perform an operation on all existing items in a DynamoDB table.
It was inspired by a tweet from Eric Hammond:
We really want to run some code against every item in a DynamoDB table.
Surely there's a sample project somewhere that scans a DynamoDB table, feeds records into a Kinesis Data Stream, which triggers an AWS Lambda function?
We can scale DynamoDB and Kinesis manually. https://t.co/ZyAiLfLpWh
-- Eric Hammond (@esh), February 5, 2019
This project uses the Serverless Framework to deploy a Lambda function and associated AWS resources.
To use it, follow these steps:
Install the Framework and create your service:
# Make sure you have the Serverless Framework installed
$ npm install -g serverless
$ sls create --template-url https://github.com/alexdebrie/serverless-dynamodb-scanner --path serverless-dynamodb-scanner
$ cd serverless-dynamodb-scanner
Update the configuration in serverless.yml:
Add the ARN of the DynamoDB table you want to scan and the ARN of the Kinesis stream where you want the items inserted:
# serverless.yml
custom:
  dynamodbTableArn: 'arn:aws:dynamodb:us-east-1:123456789012:table/my_table'
  kinesisStreamArn: 'arn:aws:kinesis:us-east-1:123456789012:stream/my-stream'
...
Deploy your service:
$ sls deploy
When you're ready, kick off your scan by invoking the function:
$ sls invoke -f scanner
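Once the scan is running, items start arriving on the Kinesis stream. A Lambda function subscribed to that stream might look like the minimal Python sketch below. It assumes each record holds one item serialized as JSON, which this README does not specify, and `process_item` is a hypothetical placeholder for whatever operation you want to run.

```python
import base64
import json


def process_item(item):
    """Hypothetical per-item operation; replace with your own logic."""
    return item


def handler(event, context):
    """Decode each Kinesis record in the Lambda event and process it."""
    processed = []
    for record in event.get("Records", []):
        # Lambda delivers the Kinesis payload base64-encoded.
        payload = base64.b64decode(record["kinesis"]["data"])
        # Assumption: the scanner wrote each item as a JSON document.
        item = json.loads(payload)
        processed.append(process_item(item))
    return {"processed": len(processed)}
```

Keeping the per-item logic in its own function makes it easy to test without a real stream.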
How does it work?
The basic workflow is as follows:
A diagram that's missing a few steps ¯\_(ツ)_/¯
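As a rough companion to the diagram, the loop that the steps below walk through could be sketched in Python like this. It is not the project's actual handler: the SSM parameter name, the JSON serialization, the partition key choice, and the `run_scan` signature are all assumptions made for illustration. The clients are passed in (for example `boto3.client("dynamodb")`, `boto3.client("kinesis")`, `boto3.client("ssm")`, and `boto3.client("lambda")`) so the flow can be exercised with stand-ins.

```python
import json

PARAM_NAME = "/scanner/last-evaluated-key"  # hypothetical SSM parameter name
MIN_REMAINING_MS = 15_000  # hand off once fewer than 15 seconds remain


def run_scan(context, dynamodb, kinesis, ssm, lambda_client, table, stream):
    """Scan `table` page by page, forwarding items to `stream`."""
    # Look for a saved LastEvaluatedKey; none means a fresh scan.
    try:
        saved = ssm.get_parameter(Name=PARAM_NAME)["Parameter"]["Value"]
        start_key = json.loads(saved)
    except ssm.exceptions.ParameterNotFound:
        start_key = None

    while True:
        # Scan one page. Limit=500 keeps a page within one PutRecords batch.
        kwargs = {"TableName": table, "Limit": 500}
        if start_key:
            kwargs["ExclusiveStartKey"] = start_key
        page = dynamodb.scan(**kwargs)

        # Forward the page to Kinesis. Assumption: items are JSON-encoded and
        # the partition key is arbitrary. Failed records are not retried here.
        records = [
            {"Data": json.dumps(item).encode(), "PartitionKey": str(i)}
            for i, item in enumerate(page["Items"])
        ]
        if records:
            kinesis.put_records(StreamName=stream, Records=records)

        # No LastEvaluatedKey means the scan is complete.
        start_key = page.get("LastEvaluatedKey")
        if not start_key:
            return "done"

        # Persist progress so a later invocation can resume.
        ssm.put_parameter(Name=PARAM_NAME, Value=json.dumps(start_key),
                          Type="String", Overwrite=True)

        # Time check: invoke a fresh copy of this function and exit.
        if context.get_remaining_time_in_millis() < MIN_REMAINING_MS:
            lambda_client.invoke(FunctionName=context.function_name,
                                 InvocationType="Event")
            return "continued"
```

Invoking the next function asynchronously (`InvocationType="Event"`) lets the current invocation exit right away instead of waiting on its successor.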
Inside the Lambda function, check AWS SSM for a LastEvaluatedKey parameter that would be sent with our Scan call. If no parameter exists in SSM, we're just starting the scan.
Make a Scan call to our DynamoDB table.
Insert the items returned from our Scan into our Kinesis Stream.
If our Scan call did not return a LastEvaluatedKey, our Scan is done! We can exit the function.
If our Scan did return a LastEvaluatedKey, store the value in SSM.
Do a time check -- if our function has less than 15 seconds of execution time left, we'll invoke another instance of our function and exit the loop for this one. Our next function will pick up where our scan left off by using the