Skip to content

bufferapp/restream

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

9 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Restream

Replay Firehose Streams in Kinesis Streams!

Installation

You can use pip to install Restream.

pip install git+https://github.com/bufferapp/restream

If you prefer, you can clone it and run the setup.py file. Use the following commands to install Restream from Github:

git clone https://github.com/bufferapp/restream
cd restream
python setup.py install

Requirements

The script assumes you have data in S3 that follow the Firehose pattern (bucket/key/<year>/<month>/<day>/<hour>). Before running the script make sure you have a Kinesis Stream ready to consume a lot of data. Change the number of shards to fit the script throughput before restreaming large datasets!

Usage

The script requires the name of the bucket and the path (key) of the root Firehose folder as well as a delimited timeframe.

restream [bucket] [key] [start-date] [end-date]

For example, if the Firehose stream saves the records into s3://my-bucket/my-data/ and we want to restream the data from 2017-06-12 20:00:00 to 2017-06-12 22:00:00 we can run the following:

restream my-bucket my-data my-stream-name -sd '2017-06-12 20' -ed '2017 Jun 12 22:00' -y

Date input is flexible and has a resolution up to the hour.

Contributing

We appreciate all kind of contributions! Please open an issue to start a discussion around a feature idea or a bug. 😄

About

Replay Firehose Streams in Kinesis Streams!

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages