Skip to content
A simple tool for replaying S3 file creation Lambda invocations
Python Shell Dockerfile
Branch: master
Clone or download

Latest commit

Fetching latest commit…
Cannot retrieve the latest commit at this time.


Type Name Latest commit message Commit time
Failed to load latest commit information.


A simple tool for replaying S3 file creation Lambda invocations. This is useful for backfill or replay on real-time ETL pipelines that run transformations in Lambdas triggered by S3 file creation events.


  1. Collect inputs from user
  2. Scan S3 for filenames that need to be replayed
  3. Batch S3 files into payloads for Lambda invocations
  4. Spawn workers to handle individual Lambda invocations/retries
  5. Process the work queue, keeping track of progress in a file in case of interrupts

Getting Started

  1. First step is to setup a python3 venv to hold our deps
  1. Run the command and follow the prompts


Run the help command for a full list of available command line options

python --help

Example Command Line Execution

Note that we must escape the $ This example is also included in the file

python3 \
-b gamesight-collection-pipeline-us-west-2-prod \
-p twitch/all/chatters/\$LATEST/objects/dt=2020-01-02-08-00/,twitch/all/chatters/\$LATEST/objects/dt=2020-01-02-09-00/ \
-l gstrans-prod-twitch-all-chatters

Using the command line also allows us to quickly include paths that aren't along / separation lines. For example, we can use the path twitch/all/chatters/\$LATEST/objects/dt=2020-01-02-1 to look at all records between 10:00 and 20:00 on 2020-01-02, or twitch/all/chatters/\$LATEST/objects/dt=2020-01-02 to just run all of the objects from that day.

You can’t perform that action at this time.