Synthetic data generators for simulating real-time data and workloads

Cloudwick Labs Synthetic Data Generators

These data generators mock several real-life situations. They are purpose-built for research and development of big data use cases.

| Generator | Description | Schema | Scope |
|-----------|-------------|--------|-------|
| log | Mocks the logs generated by Apache httpd | Log Schema | 1. Real-time data ingestion (using Flume and MapReduce); 2. Real-time analytics (using Storm and Kafka); 3. Clickstream analytics; 4. NoSQL |
| odvs | Mocks the data generated by a real-life on-demand video service provider like Netflix, Hulu and Amazon Prime | ODVS Schema | 1. NoSQL; 2. Analytics using MapReduce/Hive |
| osge | Mocks the data generated by an online social gaming entertainment provider like Second Life, IMVU, Onverse... | OSGE Schema | 1. NoSQL; 2. Analytics using MapReduce/Hive |
| retail | Mocks the data generated by a large retail store with multiple locations like Target, Wal-Mart ... | Retail Schema | 1. Analytics using MapReduce/Hive; 2. Analytics using Spark and Spark SQL |
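
To give a feel for the kind of record the log generator mocks, here is a minimal Java sketch that formats one line in Apache's Common Log Format. It is not this project's implementation: the class and method names (`MockAccessLog`, `formatLine`, `randomLine`) and the sample paths and status codes are assumptions made for illustration, and the project's actual Log Schema may carry different fields.

```java
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;
import java.util.Random;

// Hypothetical sketch; names and sample values are not taken from the project.
final class MockAccessLog {
    // Common Log Format timestamp, e.g. 10/Oct/2000:13:55:36 +0000
    private static final DateTimeFormatter CLF_TIME =
        DateTimeFormatter.ofPattern("dd/MMM/yyyy:HH:mm:ss Z", Locale.ENGLISH);

    // Assumed sample values to draw from.
    private static final String[] PATHS = {"/", "/index.html", "/login", "/cart"};
    private static final int[] STATUSES = {200, 200, 200, 404, 500};

    // Formats a single request as: host ident user [time] "request" status bytes
    static String formatLine(String ip, ZonedDateTime time, String method,
                             String path, int status, int bytes) {
        return String.format(Locale.ROOT, "%s - - [%s] \"%s %s HTTP/1.1\" %d %d",
            ip, CLF_TIME.format(time), method, path, status, bytes);
    }

    // Builds one line with a random client IP, path, status and response size.
    static String randomLine(Random rnd, ZonedDateTime time) {
        String ip = (1 + rnd.nextInt(223)) + "." + rnd.nextInt(256) + "."
            + rnd.nextInt(256) + "." + (1 + rnd.nextInt(254));
        String path = PATHS[rnd.nextInt(PATHS.length)];
        int status = STATUSES[rnd.nextInt(STATUSES.length)];
        int bytes = 200 + rnd.nextInt(5000);
        return formatLine(ip, time, "GET", path, status, bytes);
    }

    public static void main(String[] args) {
        Random rnd = new Random(42L);
        ZonedDateTime start = ZonedDateTime.now(ZoneOffset.UTC);
        for (int i = 0; i < 5; i++) {
            System.out.println(randomLine(rnd, start.plusSeconds(i)));
        }
    }
}
```

Seeding the `Random` instance keeps a run reproducible, which tends to be useful when the same synthetic data set has to be regenerated for a comparison.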

What it does

It can simulate random events and write them to various destinations, such as the local filesystem, Kafka and Kinesis, in various formats such as text, Avro and others.
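
The idea of generating events once and sending them to interchangeable destinations can be sketched roughly as follows. This is an illustrative Java example under stated assumptions, not the project's code: `RandomEventDemo`, `EventSink`, `LocalFileSink` and the tab-separated event layout are all hypothetical. Only a local-file destination is shown; a Kafka or Kinesis destination would need its own client library and is left out.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Random;

// Hypothetical sketch; none of these names come from the project.
final class RandomEventDemo {

    // A destination for already-formatted records.
    interface EventSink extends AutoCloseable {
        void write(String record) throws IOException;

        @Override
        void close() throws IOException;
    }

    // Writes one record per line to a file on the local filesystem.
    static final class LocalFileSink implements EventSink {
        private final BufferedWriter out;

        LocalFileSink(Path path) throws IOException {
            this.out = Files.newBufferedWriter(path, StandardCharsets.UTF_8);
        }

        @Override
        public void write(String record) throws IOException {
            out.write(record);
            out.newLine();
        }

        @Override
        public void close() throws IOException {
            out.close();
        }
    }

    private static final String[] ACTIONS = {"login", "play", "pause", "logout"};

    // Text format assumed here: timestamp, user id and action, tab-separated.
    static String randomEvent(Random rnd, long timestampMillis) {
        int userId = 1 + rnd.nextInt(1000);
        String action = ACTIONS[rnd.nextInt(ACTIONS.length)];
        return timestampMillis + "\t" + userId + "\t" + action;
    }

    // Generates `count` random events and hands each one to the sink.
    static void generate(EventSink sink, int count, long seed) throws IOException {
        Random rnd = new Random(seed);
        long start = System.currentTimeMillis();
        for (int i = 0; i < count; i++) {
            sink.write(randomEvent(rnd, start + i));
        }
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("events", ".tsv");
        try (EventSink sink = new LocalFileSink(target)) {
            generate(sink, 10, 42L);
        }
        System.out.println("Wrote events to " + target);
    }
}
```

Keeping the sink behind a small interface means the event-generation loop does not change when the destination does.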

Build from source

This project requires sbt. See the sbt documentation for installation instructions.

Once sbt is installed, run the sbt assembly task from the project root to build the jar with dependencies:

git clone
cd generator
sbt assembly

Running the generator

Use the built-in shell wrapper to start the generator:

bin/generator --help

Using individual generators

Each data generator has its own command-line driver. The following links show basic examples:


For more generators, help with your specific use cases, or to leave feedback, contact support.



Apache 2.0. Please see LICENSE.txt. All contents copyright (c) 2013, Cloudwick Labs.