The mock-data-generator.py Python script produces mock data for Senzing.
The senzing/mock-data-generator Docker image produces mock data for Senzing for use in Docker formations (e.g. docker-compose, Kubernetes).
mock-data-generator.py has a number of subcommands for performing different types of Senzing mock data creation.
To see all of the subcommands, run
$ ./mock-data-generator.py --help
usage: mock-data-generator.py [-h]
                              {version,random-to-stdout,random-to-kafka,url-to-stdout,url-to-kafka}
                              ...

Generate mock data from a URL-addressable file or templated random data. For
more information, see https://github.com/Senzing/mock-data-generator

positional arguments:
  {version,random-to-stdout,random-to-kafka,url-to-stdout,url-to-kafka}
                        Subcommands (SENZING_SUBCOMMAND):
    version             Print version of mock-data-generator.py.
    random-to-stdout    Send random data to STDOUT
    random-to-kafka     Send random data to Kafka
    url-to-stdout       Send HTTP or file data to STDOUT
    url-to-kafka        Send HTTP or file data to Kafka

optional arguments:
  -h, --help            show this help message and exit
To see the options for a subcommand, run commands like:
./mock-data-generator.py random-to-stdout --help
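The subcommand layout shown in the help output can be sketched with `argparse` sub-parsers. This is a minimal illustration, not the script's actual source: the subcommand names and help strings come from the help text above, while `build_parser`, `resolve_subcommand`, and the rule that a command-line subcommand takes precedence over `SENZING_SUBCOMMAND` are assumptions made for the example.

```python
import argparse
import os

# Subcommand names and help text are copied from the --help output above;
# the parser layout itself is an illustrative assumption.
SUBCOMMANDS = {
    "version": "Print version of mock-data-generator.py.",
    "random-to-stdout": "Send random data to STDOUT",
    "random-to-kafka": "Send random data to Kafka",
    "url-to-stdout": "Send HTTP or file data to STDOUT",
    "url-to-kafka": "Send HTTP or file data to Kafka",
}


def build_parser():
    """Build a parser with one sub-parser per subcommand."""
    parser = argparse.ArgumentParser(prog="mock-data-generator.py")
    subparsers = parser.add_subparsers(dest="subcommand")
    for name, help_text in SUBCOMMANDS.items():
        subparsers.add_parser(name, help=help_text)
    return parser


def resolve_subcommand(argv, environ=None):
    """Return the subcommand from argv, else from SENZING_SUBCOMMAND (assumed order)."""
    environ = os.environ if environ is None else environ
    args = build_parser().parse_args(argv)
    return args.subcommand or environ.get("SENZING_SUBCOMMAND")
```

With this sketch, `resolve_subcommand(["random-to-stdout"])` returns `"random-to-stdout"`, and an empty argument list falls back to whatever `SENZING_SUBCOMMAND` holds.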
The following software programs need to be installed.
-
YUM-based installs - For Red Hat, CentOS, openSUSE and others.
sudo yum -y install epel-release
sudo yum -y install git
-
APT-based installs - For Ubuntu and others
sudo apt update
sudo apt -y install git
-
These variables may be modified, but do not need to be modified. The variables are used throughout the installation procedure.
export GIT_ACCOUNT=senzing
export GIT_REPOSITORY=mock-data-generator
export DOCKER_IMAGE_TAG=senzing/mock-data-generator
-
Synthesize environment variables.
export GIT_ACCOUNT_DIR=~/${GIT_ACCOUNT}.git
export GIT_REPOSITORY_DIR="${GIT_ACCOUNT_DIR}/${GIT_REPOSITORY}"
export GIT_REPOSITORY_URL="https://github.com/${GIT_ACCOUNT}/${GIT_REPOSITORY}.git"
-
Set environment variables described in "Configuration".
-
Get repository.
mkdir --parents ${GIT_ACCOUNT_DIR}
cd ${GIT_ACCOUNT_DIR}
git clone ${GIT_REPOSITORY_URL}
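For readers who script the setup in Python rather than the shell, the synthesized variables above can be derived the same way. The following is only a sketch: the function name `synthesize_git_variables` and its `home` parameter are hypothetical, and the default values mirror the `export` statements shown earlier.

```python
import os


def synthesize_git_variables(git_account="senzing",
                             git_repository="mock-data-generator",
                             home=None):
    """Derive the directory and URL values from the account and repository names."""
    # `home` is a hypothetical parameter; by default it expands "~" like the shell does.
    home = os.path.expanduser("~") if home is None else home
    account_dir = f"{home}/{git_account}.git"
    return {
        "GIT_ACCOUNT_DIR": account_dir,
        "GIT_REPOSITORY_DIR": f"{account_dir}/{git_repository}",
        "GIT_REPOSITORY_URL": f"https://github.com/{git_account}/{git_repository}.git",
    }
```

The returned dictionary can be merged into `os.environ` or passed as the `env` argument of `subprocess.run` when invoking `git clone`.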
-
YUM installs - For Red Hat, CentOS, openSUSE and others.
sudo xargs yum -y install < ${GIT_REPOSITORY_DIR}/src/yum-packages.txt
-
APT installs - For Ubuntu and others
sudo xargs apt -y install < ${GIT_REPOSITORY_DIR}/src/apt-packages.txt
-
PIP installs
sudo pip install -r ${GIT_REPOSITORY_DIR}/requirements.txt
-
Show help. Example:
cd ${GIT_REPOSITORY_DIR}
./mock-data-generator.py --help
./mock-data-generator.py random-to-stdout --help
-
Show random file output. Example:
cd ${GIT_REPOSITORY_DIR}
./mock-data-generator.py random-to-stdout
-
Show random file output with 1 record per second. Example:
cd ${GIT_REPOSITORY_DIR}
./mock-data-generator.py random-to-stdout \
  --records-per-second 1
-
Show repeatable "random" output using a random seed. Example:
cd ${GIT_REPOSITORY_DIR}
./mock-data-generator.py random-to-stdout \
  --random-seed 1
-
Generate 10 repeatable random records at a rate of 2 per second. Example:
cd ${GIT_REPOSITORY_DIR}
./mock-data-generator.py random-to-stdout \
  --random-seed 22 \
  --record-min 1 \
  --record-max 10 \
  --records-per-second 2
-
Send output to a JSON Lines file. Example:
cd ${GIT_REPOSITORY_DIR}
./mock-data-generator.py random-to-stdout \
  --random-seed 22 \
  --record-min 1 \
  --record-max 10 \
  --records-per-second 2 \
  > output-file.jsonlines
-
Read 5 records from a URL-addressable file at a rate of 3 per second. Example:
cd ${GIT_REPOSITORY_DIR}
./mock-data-generator.py url-to-stdout \
  --input-url https://s3.amazonaws.com/public-read-access/TestDataSets/loadtest-dataset-1M.json \
  --record-min 1 \
  --record-max 5 \
  --records-per-second 3
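To make the interaction of the seed, record range, and rate options in the examples above more concrete, here is a rough Python sketch of a seeded, throttled record generator. It is not the script's implementation: the function `random_records` and the `RECORD_ID` and `VALUE` fields are hypothetical, and the handling of 0 values follows the option descriptions (seed 0 uses the clock, maximum 0 means unbounded, rate 0 means unthrottled).

```python
import itertools
import json
import random
import time


def random_records(random_seed=0, record_min=1, record_max=0, records_per_second=0):
    """Yield JSON strings for record numbers record_min..record_max (inclusive)."""
    # A seed of 0 falls back to the system clock, so output differs per run.
    rng = random.Random(random_seed if random_seed > 0 else time.time())
    # A rate of 0 disables throttling.
    delay = 1.0 / records_per_second if records_per_second > 0 else 0.0
    # A maximum of 0 means there is no upper bound.
    if record_max > 0:
        numbers = range(record_min, record_max + 1)
    else:
        numbers = itertools.count(record_min)
    for record_id in numbers:
        # The field names below are hypothetical placeholders.
        yield json.dumps({"RECORD_ID": record_id, "VALUE": rng.randint(0, 999999)})
        if delay:
            time.sleep(delay)
```

Calling `random_records(random_seed=22, record_max=10, records_per_second=2)` twice yields the same ten lines each time, which is the property the `--random-seed` examples rely on.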
-
Build docker image.
sudo docker build --tag senzing/mock-data-generator https://github.com/senzing/mock-data-generator.git
- SENZING_DEBUG - Print debug statements to log.
- SENZING_DATA_SOURCE - If a JSON line does not have the DATA_SOURCE key/value, this value is inserted.
- SENZING_ENTITY_TYPE - If a JSON line does not have the ENTITY_TYPE key/value, this value is inserted.
- SENZING_INPUT_URL - URL of source file. Default: https://s3.amazonaws.com/public-read-access/TestDataSets/loadtest-dataset-1M.json
- SENZING_KAFKA_BOOTSTRAP_SERVER - Hostname and port of Kafka server. Default: "localhost"
- SENZING_KAFKA_TOPIC - Kafka topic. Default: "senzing-kafka-topic"
- SENZING_RANDOM_SEED - Identify seed for random number generator. Value of 0 uses system clock. Values greater than 0 give repeatable results. Default: "0"
- SENZING_RECORD_MAX - Identify highest record number to generate. Value of 0 means no maximum. Default: "0"
- SENZING_RECORD_MIN - Identify lowest record number to generate. Default: "1"
- SENZING_RECORD_MONITOR - Write a log record every N mock records. Default: "10000"
- SENZING_RECORDS_PER_SECOND - Throttle output to a specified records per second. Value of 0 means no throttling. Default: "0"
- SENZING_SUBCOMMAND - Identify the subcommand to be run. See mock-data-generator.py --help for complete list.
-
To determine which configuration parameters are used for each <subcommand>, run:
./mock-data-generator.py <subcommand> --help
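The SENZING_DATA_SOURCE and SENZING_ENTITY_TYPE behavior described above (insert the value only when the key is absent) can be sketched as follows. The helper name `add_defaults` is hypothetical and the code is an illustration of the described rule, not the script's actual logic.

```python
import json
import os


def add_defaults(json_line, environ=None):
    """Insert DATA_SOURCE / ENTITY_TYPE into a JSON line when they are missing."""
    environ = os.environ if environ is None else environ
    record = json.loads(json_line)
    for key, variable in (("DATA_SOURCE", "SENZING_DATA_SOURCE"),
                          ("ENTITY_TYPE", "SENZING_ENTITY_TYPE")):
        value = environ.get(variable)
        # Existing keys are left untouched; unset variables insert nothing.
        if value is not None and key not in record:
            record[key] = value
    return json.dumps(record, sort_keys=True)
```

A line that already carries its own DATA_SOURCE passes through unchanged, so the environment variable acts only as a fallback.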
-
Run the docker container. Example:
export SENZING_SUBCOMMAND=random-to-stdout
export SENZING_RANDOM_SEED=0
export SENZING_RECORD_MIN=1
export SENZING_RECORD_MAX=10
export SENZING_RECORDS_PER_SECOND=0

sudo docker run -it \
  --env SENZING_SUBCOMMAND="${SENZING_SUBCOMMAND}" \
  --env SENZING_RANDOM_SEED="${SENZING_RANDOM_SEED}" \
  --env SENZING_RECORD_MIN="${SENZING_RECORD_MIN}" \
  --env SENZING_RECORD_MAX="${SENZING_RECORD_MAX}" \
  --env SENZING_RECORDS_PER_SECOND="${SENZING_RECORDS_PER_SECOND}" \
  senzing/mock-data-generator
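When the container is launched from a Python script instead of an interactive shell, the same `docker run` invocation can be assembled from a dictionary of settings. This is a sketch under assumptions: `docker_run_command` is a hypothetical helper, and only the `-it`, `--net`, and `--env` flags used in this document are emitted.

```python
import shlex


def docker_run_command(image, env, network=None):
    """Return the argument list for `sudo docker run -it` with one --env per setting."""
    command = ["sudo", "docker", "run", "-it"]
    if network:
        command += ["--net", network]
    for name, value in env.items():
        command += ["--env", f"{name}={value}"]
    command.append(image)
    return command


# Values mirror the random-to-stdout example above.
settings = {
    "SENZING_SUBCOMMAND": "random-to-stdout",
    "SENZING_RANDOM_SEED": 0,
    "SENZING_RECORD_MIN": 1,
    "SENZING_RECORD_MAX": 10,
    "SENZING_RECORDS_PER_SECOND": 0,
}
print(shlex.join(docker_run_command("senzing/mock-data-generator", settings)))
```

The list form can be handed directly to `subprocess.run`, which avoids shell quoting issues; `shlex.join` is used here only to display the command.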
-
Identify the Docker network. Example:
docker network ls

# Choose value from NAME column of docker network ls
export SENZING_NETWORK=nameofthe_network
-
Run the docker container. Example:
export SENZING_SUBCOMMAND=random-to-kafka
export SENZING_KAFKA_BOOTSTRAP_SERVER=senzing-kafka:9092
export SENZING_KAFKA_TOPIC="senzing-kafka-topic"
export SENZING_NETWORK=senzingdockercomposestreamloaderdemo_backend
export SENZING_RANDOM_SEED=1
export SENZING_RECORD_MIN=210
export SENZING_RECORD_MAX=220
export SENZING_RECORDS_PER_SECOND=1

sudo docker run -it \
  --net ${SENZING_NETWORK} \
  --env SENZING_SUBCOMMAND="${SENZING_SUBCOMMAND}" \
  --env SENZING_KAFKA_BOOTSTRAP_SERVER=${SENZING_KAFKA_BOOTSTRAP_SERVER} \
  --env SENZING_KAFKA_TOPIC=${SENZING_KAFKA_TOPIC} \
  --env SENZING_RANDOM_SEED="${SENZING_RANDOM_SEED}" \
  --env SENZING_RECORD_MIN="${SENZING_RECORD_MIN}" \
  --env SENZING_RECORD_MAX="${SENZING_RECORD_MAX}" \
  --env SENZING_RECORDS_PER_SECOND="${SENZING_RECORDS_PER_SECOND}" \
  senzing/mock-data-generator
-
Run the docker container. Example:
export SENZING_SUBCOMMAND=url-to-stdout
export SENZING_INPUT_URL=https://s3.amazonaws.com/public-read-access/TestDataSets/loadtest-dataset-1M.json
export SENZING_RECORD_MIN=240
export SENZING_RECORD_MAX=250
export SENZING_RECORDS_PER_SECOND=0

sudo docker run -it \
  --env SENZING_SUBCOMMAND="${SENZING_SUBCOMMAND}" \
  --env SENZING_INPUT_URL=${SENZING_INPUT_URL} \
  --env SENZING_RECORD_MIN="${SENZING_RECORD_MIN}" \
  --env SENZING_RECORD_MAX="${SENZING_RECORD_MAX}" \
  --env SENZING_RECORDS_PER_SECOND="${SENZING_RECORDS_PER_SECOND}" \
  senzing/mock-data-generator
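The record window in the example above (records 240 through 250 of the input file) can be pictured as a slice over the lines of the source. The sketch below assumes that record numbers correspond to 1-based line numbers in a JSON Lines file, which this document does not state explicitly; `select_records` and `records_from_url` are hypothetical helper names.

```python
import itertools
import urllib.request


def select_records(lines, record_min=1, record_max=0):
    """Return an iterator over lines numbered record_min..record_max (1-based, inclusive)."""
    # A maximum of 0 means "no maximum", so the slice is left open-ended.
    stop = record_max if record_max > 0 else None
    return itertools.islice(lines, max(record_min, 1) - 1, stop)


def records_from_url(url, record_min=1, record_max=0):
    """Stream the selected lines from a URL, stopping once record_max is reached."""
    with urllib.request.urlopen(url) as response:
        for raw_line in select_records(response, record_min, record_max):
            yield raw_line.decode("utf-8").rstrip("\n")
```

Because `itertools.islice` stops reading once the window is exhausted, a small window near the start of a large file does not require downloading the whole file.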
-
Identify the Docker network. Example:
docker network ls

# Choose value from NAME column of docker network ls
export SENZING_NETWORK=nameofthe_network
-
Run the docker container. Example:
export SENZING_SUBCOMMAND=url-to-kafka
export SENZING_INPUT_URL=https://s3.amazonaws.com/public-read-access/TestDataSets/loadtest-dataset-1M.json
export SENZING_KAFKA_BOOTSTRAP_SERVER=senzing-kafka:9092
export SENZING_KAFKA_TOPIC="senzing-kafka-topic"
export SENZING_NETWORK=senzingdockercomposestreamloaderdemo_backend
export SENZING_RECORD_MIN=260
export SENZING_RECORD_MAX=300
export SENZING_RECORD_MONITOR=10
export SENZING_RECORDS_PER_SECOND=10

sudo docker run -it \
  --net ${SENZING_NETWORK} \
  --env SENZING_SUBCOMMAND="${SENZING_SUBCOMMAND}" \
  --env SENZING_INPUT_URL=${SENZING_INPUT_URL} \
  --env SENZING_KAFKA_BOOTSTRAP_SERVER=${SENZING_KAFKA_BOOTSTRAP_SERVER} \
  --env SENZING_KAFKA_TOPIC=${SENZING_KAFKA_TOPIC} \
  --env SENZING_RECORD_MIN="${SENZING_RECORD_MIN}" \
  --env SENZING_RECORD_MAX="${SENZING_RECORD_MAX}" \
  --env SENZING_RECORD_MONITOR="${SENZING_RECORD_MONITOR}" \
  --env SENZING_RECORDS_PER_SECOND="${SENZING_RECORDS_PER_SECOND}" \
  senzing/mock-data-generator
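The example above sets SENZING_RECORD_MONITOR=10, which asks for a log record after every 10 mock records. One way to picture that behavior is a pass-through generator that counts records and reports at each multiple of N. The names `monitored` and `report` are hypothetical, and the log message format is an assumption rather than the script's actual output.

```python
import logging


def monitored(records, record_monitor=10000, report=None):
    """Yield records unchanged, reporting the running count every record_monitor records."""
    if report is None:
        # Default reporter; the message text is an assumed format.
        def report(count):
            logging.info("Records sent: %d", count)
    for count, record in enumerate(records, start=1):
        if record_monitor > 0 and count % record_monitor == 0:
            report(count)
        yield record
```

For the 41 records in the range 260 to 300, a monitor value of 10 would produce four reports, at counts 10, 20, 30, and 40.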
-
See if docker is already installed.
sudo docker --version
-
If needed, install Docker. See HOWTO - Install Docker
-
Option #1 - Using make command
cd ${GIT_REPOSITORY_DIR}
sudo make docker-build
-
Option #2 - Using docker command
cd ${GIT_REPOSITORY_DIR}
sudo docker build --tag ${DOCKER_IMAGE_TAG} .
- See doc/errors.md.