Skip to content
A Serverless project to help you operate on every existing item in a DynamoDB table
Branch: master
Clone or download
Latest commit 879ef00 Mar 6, 2019
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
example
.gitignore
README.md Update README.md Mar 5, 2019
handler.py
serverless.yml Fix service after testing Feb 7, 2019

README.md

Serverless DynamoDB Scanner

This is a Serverless application that scans a given DynamoDB table and inserts every item into a Kinesis Stream. You can then process the Kinesis stream, allowing you to perform an operation on all existing items in a DynamoDB table.

It was inspired by a tweet from Eric Hammond:

We really want to run some code against every item in a DynamoDB table.

Surely there's a sample project somewhere that scans a DynamoDB table, feeds records into a Kinesis Data Stream, which triggers an AWS Lambda function?

We can scale DynamoDB and Kinesis manually. https://t.co/ZyAiLfLpWh

— Eric Hammond (@esh) February 5, 2019

Usage:

This project uses the Serverless Framework to deploy a Lambda function and associated AWS resources.

To use it, follow these steps:

  1. Install the Framework and create your service:

    # Make sure you have the Serverless Framework installed
    $ npm install -g serverless
    
    $ sls create --template-url https://github.com/alexdebrie/serverless-dynamodb-scanner --path serverless-dynamodb-scanner
    
    $ cd serverless-dynamodb-scanner
  2. Update the configuration in serverless.yml.

    Add the ARN of the DynamoDB table you want to scan and the ARN of the Kinesis stream where you want the config added:

    # serverless.yml
    
    custom:
      dynamodbTableArn: 'arn:aws:dynamodb:us-east-1:123456789012:table/my_table'
      kinesisStreamArn: 'arn:aws:kinesis:us-east-1:123456789012:stream/my-stream'
    
    ...
  3. Deploy your service:

    $ sls deploy
  4. When you're ready, kick off your scan by invoking the function:

    $ sls invoke -f scanner

How does it work?

The basic workflow is as follows:

dynamodb scanner

A diagram that's missing a few steps ¯\_(ツ)_/¯

  1. Inside the Lambda function, check AWS SSM for a LastEvaluatedKey parameter that would be sent with our Scan call.

    If no parameter exist in SSM, we're just starting the scan.

  2. Make a Scan call to our DynamoDB table.

  3. Insert the items returned from our Scan into our Kinesis Stream via a PutRecords call.

  4. If the Scan call did not return a LastEvaluatedKey, our Scan is done! We can exit the function.

  5. If the Scan did return a LastEvaluatedKey, store the value in SSM.

  6. Do a time check -- if our function has less than 15 seconds of execution time left, we'll invoke another instance of our function and exit the loop for this one. Our next function will pick up where our scan left off by using the LastEvaluatedKey in SSM.

You can’t perform that action at this time.