
This repo generates data from an existing dataset, either writing it to a file or producing the dataset rows as messages to Kafka in a streaming manner.


dogukannulu/data-generator

 
 


Intro

Data sources for batch processing are easy to find, but the same is not true for real-time processing. This repo aims to make developing real-time data processing easier by streaming static datasets to a file, PostgreSQL, Apache Kafka, or AWS S3/MinIO. There are four Python scripts, one per target:

  • File, written as log files (dataframe_to_log.py)
  • PostgreSQL (dataframe_to_postgresql.py)
  • Kafka (dataframe_to_kafka.py)
  • AWS S3/MinIO (dataframe_to_s3.py)
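The general idea behind all four scripts, replaying a static dataset one row at a time with a pause between rows, can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code of any script in this repo: the function name `stream_rows` and its `delay_seconds` and `sep` parameters are hypothetical, and it uses only the Python standard library.

```python
import csv
import io
import time
from typing import TextIO


def stream_rows(source: TextIO, sink: TextIO,
                delay_seconds: float = 0.5, sep: str = ",") -> int:
    """Copy CSV rows from source to sink one at a time, pausing between rows.

    Hypothetical helper for illustration; not part of this repo.
    Returns the number of data rows written (the header is skipped).
    """
    reader = csv.reader(source)
    header = next(reader, None)  # consume the header row, if any
    if header is None:
        return 0  # empty input: nothing to stream
    count = 0
    for row in reader:
        sink.write(sep.join(row) + "\n")
        sink.flush()  # make each row visible to consumers immediately
        count += 1
        time.sleep(delay_seconds)  # simulate the arrival rate of a live source
    return count


if __name__ == "__main__":
    # In-memory stand-ins for a dataset file and a log file.
    dataset = io.StringIO("id,name\n1,alice\n2,bob\n")
    log = io.StringIO()
    written = stream_rows(dataset, log, delay_seconds=0.1)
    print(written, "rows written")
    print(log.getvalue(), end="")
```

Swapping the sink for a database insert, a Kafka producer call, or an object-store upload would give the other three variants in principle; how the actual scripts do this is not shown here.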

You must use Python 3.

Installation

erkan@ubuntu:~$ git clone https://github.com/erkansirin78/data-generator.git

erkan@ubuntu:~$ cd data-generator/

erkan@ubuntu:~/data-generator$ python3 -m virtualenv datagen

erkan@ubuntu:~/data-generator$ source datagen/bin/activate

(datagen) erkan@ubuntu:~/data-generator$ pip install -r requirements.txt
