Mocks data generated by a gaming website.
DataSet Format:
column_header | description |
---|---|
cid | customer id |
gender | customer's gender |
age | age of the customer |
country | country to which the customer belongs |
register_date | date on which the user registered with us |
friend_count | number of friends a user has |
lifetime | number of days a user has been active |
citygame_played | number of times citygame has been played by user |
pictionarygame_played | number of times pictionary game has been played by user |
scramblegame_played | number of times scramble game has been played by user |
snipergame_played | number of times sniper game has been played by user |
revenue | revenue generated by the user |
paid_subscriber | whether the customer is paid customer or not, represented by yes or no |
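
As a rough illustration of how a row in this format could be consumed, the sketch below maps one delimited line onto the column names listed above. The tab delimiter, the assumption that fields appear in the same order as the table, the `COLUMNS` constant, and the `parse_record` helper are all assumptions made for this example; check the generator's actual output before relying on them.

```ruby
# Column names in the order listed in the table above
# (assumed, not confirmed, to match the order of fields in the file).
COLUMNS = %w[
  cid gender age country register_date friend_count lifetime
  citygame_played pictionarygame_played scramblegame_played
  snipergame_played revenue paid_subscriber
].freeze

# Hypothetical helper: turns one delimited line into a Hash keyed by column name.
# The tab delimiter is an assumption, not something documented here.
def parse_record(line, delimiter = "\t")
  values = line.chomp.split(delimiter, -1)
  unless values.size == COLUMNS.size
    raise ArgumentError, "expected #{COLUMNS.size} fields, got #{values.size}"
  end
  Hash[COLUMNS.zip(values)]
end

# Made-up sample row, for illustration only.
sample = %w[1001 female 29 India 2012-03-14 12 240 5 3 0 7 19.99 yes].join("\t")
record = parse_record(sample)
puts record["country"]
puts record["paid_subscriber"]
```

All values come back as strings; numeric columns such as `age` or `revenue` would still need converting with `to_i` or `to_f`.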
If the `--extra-data` option is enabled, additional columns are populated as well:
extra_columns | description |
---|---|
name | name of the user |
email | address used to register the account |
phone | contact number of the user |
address | user provided address during registration |
This generator requires Ruby version ≥ 1.9. To install Ruby 1.9.3 using rvm, follow these instructions.
This generator takes in various options for generating data:

    Usage: generator.rb [options]
        -l, --lines LINES                number of lines to generate
        -c LinesPerProcess,              number of lines to generate per process (default: 50,000)
            --lines-per-process
        -m, --multiple-tables            generates data in multi-table format
        -p, --output-path PATH           directory path where output should be written to
        -e, --extra-data                 generates additional user information
        -h, --help

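
The usage text above has the shape produced by Ruby's standard `optparse` library. The following is a minimal sketch of how options like these could be declared and parsed. It is not the generator's actual source: the `parse_options` helper and the keys of the returned Hash are hypothetical names chosen for this example.

```ruby
require 'optparse'

# Hypothetical helper, not taken from generator.rb: parses the flags shown
# in the usage text into a Hash of settings.
def parse_options(argv)
  options = {
    :lines_per_process => 50_000, # default stated in the usage text
    :multiple_tables   => false,
    :extra_data        => false
  }
  parser = OptionParser.new do |opts|
    opts.banner = 'Usage: generator.rb [options]'
    opts.on('-l', '--lines LINES', Integer, 'number of lines to generate') do |v|
      options[:lines] = v
    end
    opts.on('-c', '--lines-per-process LinesPerProcess', Integer,
            'number of lines to generate per process (default: 50,000)') do |v|
      options[:lines_per_process] = v
    end
    opts.on('-m', '--multiple-tables', 'generates data in multi-table format') do
      options[:multiple_tables] = true
    end
    opts.on('-p', '--output-path PATH',
            'directory path where output should be written to') do |v|
      options[:output_path] = v
    end
    opts.on('-e', '--extra-data', 'generates additional user information') do
      options[:extra_data] = true
    end
    opts.on('-h', '--help') do
      puts opts
      exit
    end
  end
  parser.parse!(argv.dup) # work on a copy so the caller's array is untouched
  options
end

parsed = parse_options(%w[--lines 100000 --output-path /tmp --extra-data])
puts parsed.inspect
```

Passing `Integer` to `opts.on` lets `optparse` convert and validate numeric arguments, so a non-numeric value for `--lines` raises `OptionParser::InvalidArgument` instead of silently becoming a string.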
###Generating data in single table mode:

This mode mocks random user interaction data into a single file, which can be loaded into a single table.

The following command generates 100,000 lines into file(s) named `analytics_[process_id].data` in `/tmp` (the directory specified by `--output-path`) and mocks extra user information (such as name, email, phone, address):

ruby generator.rb --lines 100000 --output-path /tmp --extra-data
###Generating data in multi table mode:

This mode mocks random user interaction data into multiple files:

- `analytics_customer[process_id]` stores user information (cid, name, gender, age, register_date, country, total_days)
- `analytics_facts[process_id]` stores user-game facts (cid, game_played, game_played_time)
- `analytics_revenue[process_id]` stores paying users (cid, payed_date, revenue)

ruby generator.rb --lines 100000 --output-path /tmp --extra-data --multiple-tables
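
Because all three files share the `cid` column, they can be joined back together after loading. The sketch below shows one possible way to total revenue per customer from rows shaped like the customer and revenue files. The sample rows are made up, the rows are assumed to have already been parsed into Hashes, and `revenue_by_customer` is a hypothetical helper, not part of the generator.

```ruby
# Made-up rows shaped like the customer and revenue tables described above.
customers = [
  { 'cid' => '1', 'country' => 'India' },
  { 'cid' => '2', 'country' => 'Brazil' }
]
revenue_rows = [
  { 'cid' => '1', 'payed_date' => '2012-05-01', 'revenue' => '5.0' },
  { 'cid' => '1', 'payed_date' => '2012-06-01', 'revenue' => '2.5' }
]

# Hypothetical helper: returns [cid, total_revenue] pairs, one per customer.
# Customers with no revenue rows get a total of 0.0.
def revenue_by_customer(customers, revenue_rows)
  totals = Hash.new(0.0)
  revenue_rows.each { |row| totals[row['cid']] += row['revenue'].to_f }
  customers.map { |customer| [customer['cid'], totals[customer['cid']]] }
end

revenue_by_customer(customers, revenue_rows).each do |cid, total|
  puts "#{cid}: #{total}"
end
```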
###Generating data in multi-process mode:
To generate data using multiple processes, use the `--lines-per-process` option, which specifies the number of lines each process generates. For example, the following command generates 75,000 lines with 25,000 lines per process:
ruby generator.rb --lines 75000 --output-path /tmp --extra-data --multiple-tables --lines-per-process 25000
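
To make the arithmetic concrete, the sketch below computes how many lines each process would handle if the total is split into chunks of at most `--lines-per-process` lines. The `lines_per_worker` helper is hypothetical, and how the real generator treats a remainder that does not divide evenly is an assumption here.

```ruby
# Hypothetical helper: returns the number of lines each process would generate,
# assuming the work is split into chunks of at most lines_per_process and any
# remainder goes to one extra, smaller chunk.
def lines_per_worker(total_lines, lines_per_process)
  unless lines_per_process > 0
    raise ArgumentError, 'lines_per_process must be positive'
  end
  full_chunks, remainder = total_lines.divmod(lines_per_process)
  chunks = Array.new(full_chunks, lines_per_process)
  chunks << remainder if remainder > 0
  chunks
end

chunks = lines_per_worker(75_000, 25_000)
puts "#{chunks.size} processes: #{chunks.inspect}"
# prints "3 processes: [25000, 25000, 25000]"
```

With the values from the command above, 75,000 lines at 25,000 lines per process works out to three chunks under this splitting scheme.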
Note: `[process_id]` represents the id of the invoked process.