Firehose is a Kafka transfer agent which can do real-time transferring of a Topic from one set of Brokers to another. This is useful when you want the two clusters to remain independent and not use the built in replication process of Kafka. NOTE: This is mostly meant for seeding one cluster with another and not for any process needing data loss guarantees, for that check out uReplicator
- Docker (for local testing)
Clone this repository then:
$ git clone --recursive https://github.com/Kochava/firehose.git
$ cd firehose
$ go build ./cmd/firehose
$ git submodule update --recursive --remote
NAME:
Kochava Kafka Transfer Agent - An agent which consumes a topic from one set of brokers and publishes to another set of brokers
USAGE:
firehose [global options] command [command options] [arguments...]
VERSION:
0.3.4
GLOBAL OPTIONS:
--src-zookeepers value Comma delimited list of zookeeper nodes to connect to for the source brokers [$FIREHOSE_SRC_ZOOKEEPERS]
--dst-zookeepers value Comma delimited list of zookeeper nodes to connect to for the destination brokers [$FIREHOSE_DST_ZOOKEEPERS]
--topic value Topic to transfer (default: "firehose") [$FIREHOSE_TOPIC]
--cg-name-suffix value Suffix to use for the consumer group name in the format <topic>_<suffix> (default: "firehose") [$FIREHOSE_CG_NAME_SUFFIX]
--buffer-size value The number of messages to hold in memory at once (default: 10000) [$FIREHOSE_BUFFER_SIZE]
--max-errors value The maximum number of errors to allow the kafka to experience before quitting (default: 10) [$FIREHOSE_MAX_ERROR]
--max-retry value The maximum number of times to retry sending a message to the destination cluster (default: 5) [$FIREHOSE_MAX_RETRY]
--batch-size value The number of messages to batch together when sending to the destination cluster (default: 500) [$FIREHOSE_BATCH_SIZE]
--flush-interval value The interval (in ms) to flush messages that haven't been sent in a batch yet (default: 10000) [$FIREHOSE_FLUSH_INTERVAL]
--reset-offset Resets the offset in the consumer group to real-time before starting [$FIREHOSE_RESET_OFFSET]
--influx-address value Influx address (default: "http://localhost:8086") [$FIREHOSE_INFLUX_ADDRESS]
--influx-user value Influx user name (default: "firehose") [$FIREHOSE_INFLUX_USER]
--influx-pass value Influx password (default: "firehose") [$FIREHOSE_INFLUX_PASS]
--influx-db value Influx database name (default: "firehose") [$FIREHOSE_INFLUX_DB]
--consumer-concurrency value Number of consumer threads to run (default: 4) [$FIREHOSE_CONSUMER_CONCURRENCY]
--producer-concurrency value Number of producer threads to run (default: 4) [$FIREHOSE_PRODUCER_CONCURRENCY]
--log-file value Main log file to save to (default: "/var/log/firehose/firehose.log") [$FIREHOSE_LOG_FILE]
--stdout-logging Override logging settings to log to STDOUT [$FIREHOSE_STDOUT_LOGGING]
--help, -h show help
--version, -v print the version
Once you have the repo downloaded the following is all that's needed to get started testing locally. First grab your docker machine IP address, then in Docker/test-compose.yml
update the KAFKA_ADVERTISED_HOST_NAME
EnvVar to use this.
$ go build ./cmd/firehose
$ docker-compose -f Docker/test-compose.yml up -d
$ ./firehose --src-zookeepers="<docker-machine-ip>:2181" --dst-zookeepers="<docker-machine-ip>:2182" --stdout-logging
If you want to scale up the kafka clusters you can do so
$ docker-compose -f Docker/test-compose.yml scale kafka_src=3
Once up and running you can access Grafana at http://localhost:3000 with the default username and password. Next just add a new datasource, choosing InfluxDB
and method proxy
. For the location use http://172.18.0.2:8086
with the default Influx db of firehose
and user:pass being firehose:firehose
. Once that's setup you can import the simple dashboard preprepared under dist/dashboard.json
.
You can leverage Google Cloud Building in order to build the binary and generate a docker image.
$ PROJECT_ID=your-project-id BUILD_VERSION=v1 ./cloudbuild.sh
- Fork it
- Create your feature branch (
git checkout -b my-new-feature
) - Make your changes
- Add some tests
- Commit your changes (
git commit -am 'Add some feature'
) - Push to the branch (
git push origin my-new-feature
) - Create new Pull Request
With the new refactor a lot of work was put into performance, because of that the historical transfer aspect was essentially scrapped in this release. It will be added back in with the next release.
The Github version of this repo is a mirror of Master from our internal repo. This means that feature branches are not available here.
As of October 1, 2020, github.com uses the branch name ‘main’ when creating the initial default branch for all new repositories. In order to minimize any customizations in our github usage and to support consistent naming conventions, we have made the decision to rename the ‘master’ branch to be called ‘main’ in all Kochava’s github repos.
For local copies of the repo, the following steps will update to the new default branch:
git branch -m master main
git fetch origin
git branch -u origin/main main
git remote set-head origin -a