Replica Conductor

Creates a ZFS pool on a block device and orchestrates a hierarchy of ZFS datasets, a list of ports, and systemd units.
Includes a tool to control the conductor service (conductorctl.py).

Nomenclature:

Pool -> the ZFS pool
Filesystem -> The original filesystem that will be cloned to create casts
Cast -> Transitory clone of the filesystem; anonymization and other hooks are applied here
Replica -> Replica of the Cast
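
One way to picture how these terms relate is as a small tree of Go types. This is only an illustrative sketch: the type names, fields, and the AddReplica helper are assumptions made for this example, not conductor's actual internals.

```go
package main

import "fmt"

// Hypothetical types mirroring the nomenclature above; not conductor's real API.

// Replica is a replica of a Cast, served on its own port.
type Replica struct {
	Name string
	Port int
}

// Cast is a transitory clone of the Filesystem.
type Cast struct {
	Name     string
	Replicas map[string]*Replica
}

// Filesystem is the original dataset that casts are cloned from.
type Filesystem struct {
	Name  string
	Casts map[string]*Cast
}

// Pool is the ZFS pool that holds the filesystem.
type Pool struct {
	Name       string
	Filesystem *Filesystem
}

// AddReplica registers a replica under an existing cast.
func (fs *Filesystem) AddReplica(cast, replica string, port int) error {
	c, ok := fs.Casts[cast]
	if !ok {
		return fmt.Errorf("cast %q does not exist", cast)
	}
	if _, dup := c.Replicas[replica]; dup {
		return fmt.Errorf("replica %q already exists in cast %q", replica, cast)
	}
	c.Replicas[replica] = &Replica{Name: replica, Port: port}
	return nil
}

func main() {
	pool := Pool{
		Name:       "rootpool",
		Filesystem: &Filesystem{Name: "rootfs", Casts: map[string]*Cast{}},
	}
	pool.Filesystem.Casts["example"] = &Cast{Name: "example", Replicas: map[string]*Replica{}}
	if err := pool.Filesystem.AddReplica("example", "john", 3307); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(pool.Filesystem.Casts["example"].Replicas["john"].Port)
}
```

A replica can only exist under a cast, which is why AddReplica rejects an unknown cast name.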

Notes:

The ZFS pool and the filesystem are initialized on the device if they do not exist.
State of the casts and replicas is kept on the volumes, hence there is no need for an external datastore. All casts and replicas will be loaded on start.

TODO: package to manage casts and pre/post hooks asynchronously from a goroutine

MariaDB 10.5 on Ubuntu is showcased in the Vagrant setup. However, conductor is meant to be agnostic to your database (or whatever else you want to run with it).
Conductor will manage for you:

a block device and the ZFS datasets
a systemd unit (stop before clone, start after)
a systemd template unit (start after creating replica, stop before deleting)
configuration files generated from a provided template

Quickstart:

Start Vagrant with vagrant up, then SSH into the box with vagrant ssh.
An instance of MariaDB 10.5 will be started, with its data in /var/lib/mysql. You can use conductorctl and journalctl to see conductor in action.

First, initialize MariaDB so that you can actually use it.

cat configs/answers.txt | sudo mariadb-secure-installation

Build and then run conductor with the provided configuration. Conductor writes access logs to stdout and JSON-formatted application logs to stderr. Redirect stdout to /dev/null to avoid the clutter.

go build -race cmd/conductor.go
sudo ./conductor -c configs/config.json 1>/dev/null

In a separate terminal, use conductorctl to create casts and replicas of the running database instance.

conductorctl list # list existing replicas
conductorctl create -c example # create a cast named example
conductorctl create -c example -r john # create a replica of the example cast named john

As you may notice, the mariadb service is stopped right before the main dataset is snapshotted and started again right after. When the replica is created, a configuration file is generated at /etc/my.example_john.cnf and mariadb@example_john.service is started.
This works because the mariadb@.service template unit is installed at /etc/systemd/system/mariadb@.service and loaded by systemd.
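
The names in this example follow a visible pattern: cast example plus replica john gives the instance name example_john, which then appears in the systemd instance unit. A minimal sketch of that derivation is below. The function names are hypothetical, and the underscore separator is inferred from the example above rather than taken from conductor's source.

```go
package main

import (
	"fmt"
	"strings"
)

// instanceName joins cast and replica names the way the example suggests
// (example + john -> example_john). The separator is an inference.
func instanceName(cast, replica string) string {
	return cast + "_" + replica
}

// instanceUnit turns a systemd template unit such as "mariadb@.service"
// into an instance unit such as "mariadb@example_john.service".
func instanceUnit(templateUnit, instance string) (string, error) {
	i := strings.Index(templateUnit, "@.")
	if i < 0 {
		return "", fmt.Errorf("%q is not a template unit", templateUnit)
	}
	// Insert the instance name between the "@" and the ".suffix".
	return templateUnit[:i+1] + instance + templateUnit[i+1:], nil
}

func main() {
	name := instanceName("example", "john")
	unit, err := instanceUnit("mariadb@.service", name)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(name)
	fmt.Println(unit)
}
```

systemd passes the part between "@" and the suffix to the template unit as the instance name, which is what lets a single unit file serve every replica.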

vagrant@ubuntu-focal:/vagrant$ conductorctl list
+----------------------+---------+---------+------+
|      Timestamp       |   Cast  | Replica | Port |
+----------------------+---------+---------+------+
| 2021-09-04T20:46:40Z | example |   john  | 3307 |
+----------------------+---------+---------+------+
vagrant@ubuntu-focal:/vagrant$ mysql -P 3307 -e 'status;'
--------------
mysql  Ver 15.1 Distrib 10.5.12-MariaDB, for debian-linux-gnu (x86_64) using readline 5.2

Connection id:          6
Current database:
Current user:           root@localhost
SSL:                    Not in use
Current pager:          stdout
Using outfile:          ''
Using delimiter:        ;
Server:                 MariaDB
Server version:         10.5.12-MariaDB-1:10.5.12+maria~focal mariadb.org binary distribution
Protocol version:       10
Connection:             127.0.0.1 via TCP/IP
Server characterset:    utf8mb4
Db     characterset:    utf8mb4
Client characterset:    utf8
Conn.  characterset:    utf8
TCP port:               3307
Uptime:                 1 min 12 sec

Threads: 1  Questions: 16  Slow queries: 0  Opens: 17  Open tables: 10  Queries per second avg: 0.222
--------------

vagrant@ubuntu-focal:/vagrant$
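
The Port column in the listing above presumably comes from the configured port range (port_from to port_to, described under Configuration). The sketch below shows one simple way such a range could be handed out: lowest free port first, with ports returned to the pool on deletion. PortPool and its methods are hypothetical, the range values are made up, and conductor's own allocation strategy may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrExhausted is returned when every port in the range is in use.
var ErrExhausted = errors.New("no free port in range")

// PortPool hands out ports from the inclusive range [from, to].
// Hypothetical type for illustration only.
type PortPool struct {
	from, to int
	used     map[int]bool
}

// NewPortPool validates the range and returns an empty pool.
func NewPortPool(from, to int) (*PortPool, error) {
	if from < 1 || to > 65535 || from > to {
		return nil, fmt.Errorf("invalid port range %d-%d", from, to)
	}
	return &PortPool{from: from, to: to, used: make(map[int]bool)}, nil
}

// Acquire returns the lowest port in the range that is not in use.
func (p *PortPool) Acquire() (int, error) {
	for port := p.from; port <= p.to; port++ {
		if !p.used[port] {
			p.used[port] = true
			return port, nil
		}
	}
	return 0, ErrExhausted
}

// Release makes a port available again, e.g. after a replica is deleted.
func (p *PortPool) Release(port int) {
	delete(p.used, port)
}

func main() {
	pool, err := NewPortPool(3307, 3310) // example range, not a documented default
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	first, _ := pool.Acquire()
	second, _ := pool.Acquire()
	pool.Release(first)
	third, _ := pool.Acquire()
	fmt.Println(first, second, third)
}
```

Note that a pool like this only tracks its own bookkeeping; it does not check whether another process is already listening on a port.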

Configuration

For this to work, you must have a multi-service setup with systemd. This example provides more than enough to get you started with MariaDB. You need to edit your MariaDB configuration and the replica configuration template to suit your setup. Ideally, the main database instance replicates from a remote source, which decouples this setup from your production systems. None of this is managed by conductor, and it never will be.

NOTE: To use this with real data and several replicas, keep the resource requirements in mind. In practice, with a ~250 GB dataset and 4-5 replicas in use for development and reporting, an instance with 64 GB of RAM was used, with innodb_buffer_pool_size set to 8 GB for the main MariaDB instance and for each of the replicas.

Let's have a look at the available configuration values.

debug is used for the zap logger. It lowers the log level and disables JSON formatting.
address is the host part of the router address string. Default: 127.0.0.1
port is the port part of the router address string. Default: 8080

pool_name is the name of the ZFS pool. Default: rootpool
pool_path is the path of the ZFS pool. Default: /rootpool
pool_dev is the device on which conductor will create the ZFS pool. Required
filesystem_name is the name of the filesystem. Default: rootfs
filesystem_path is the path where the main dataset will be mounted. The service started from your main systemd unit must use this path. Required
cast_path is the path where the casts will be mounted. Default: /rootfs_cast
replica_path is the path where the replicas will be mounted. Default: rootfs_replica
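
To illustrate how the defaults and required keys listed so far could be applied, here is a minimal sketch that decodes a JSON configuration on top of the documented defaults. The Config struct and loadConfig function are hypothetical, the Go types chosen for debug and port are assumptions, and /dev/sdb is a made-up device name. Only the key names and default values come from the list above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Config covers only the keys described so far. The JSON key names follow
// this README; the struct itself and its field types are assumptions.
type Config struct {
	Debug          bool   `json:"debug"`
	Address        string `json:"address"`
	Port           int    `json:"port"`
	PoolName       string `json:"pool_name"`
	PoolPath       string `json:"pool_path"`
	PoolDev        string `json:"pool_dev"`
	FilesystemName string `json:"filesystem_name"`
	FilesystemPath string `json:"filesystem_path"`
	CastPath       string `json:"cast_path"`
	ReplicaPath    string `json:"replica_path"`
}

// loadConfig starts from the documented defaults, lets the JSON override
// them, and then checks the keys marked as required.
func loadConfig(data []byte) (*Config, error) {
	cfg := &Config{
		Address:        "127.0.0.1",
		Port:           8080,
		PoolName:       "rootpool",
		PoolPath:       "/rootpool",
		FilesystemName: "rootfs",
		CastPath:       "/rootfs_cast",
		ReplicaPath:    "rootfs_replica", // default exactly as written in the list above
	}
	// Keys absent from the JSON keep the default values set above.
	if err := json.Unmarshal(data, cfg); err != nil {
		return nil, err
	}
	var missing []string
	if cfg.PoolDev == "" {
		missing = append(missing, "pool_dev")
	}
	if cfg.FilesystemPath == "" {
		missing = append(missing, "filesystem_path")
	}
	if len(missing) > 0 {
		return nil, fmt.Errorf("missing required keys: %s", strings.Join(missing, ", "))
	}
	return cfg, nil
}

func main() {
	raw := []byte(`{"pool_dev": "/dev/sdb", "filesystem_path": "/var/lib/mysql"}`)
	cfg, err := loadConfig(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s:%d pool=%s cast_path=%s\n", cfg.Address, cfg.Port, cfg.PoolName, cfg.CastPath)
}
```

Pre-filling the struct before decoding is a common way to get defaults with encoding/json, since Unmarshal leaves fields untouched when their key is absent.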

port_from is the first port in the allocated range. These ports are not reserved in any way, so you have to make sure the range is not used by other processes. Required
port_to is the last port in the allocated range. Required
main_unit is the name of the main service unit that will be managed. Typically this is your main or replicating database. Its dataset is used for casts and replicas. Required
config_template_path is the template file that will be rendered for your service. Go template syntax is used and the available variables are {{ .Name }}, {{ .Datadir }} and {{ .Port }}. See configs/myservice.cnd.tmpl for a complete example. Required
config_path_template_string is the path where the configuration template will be rendered. Go template syntax is used and the available variables are {{ .Name }}, {{ .Datadir }} and {{ .Port }}. An example of this is /etc/my.{{ .Name }}.cnf. Required
unit_template_string is the systemd template unit that will be managed by conductor. This unit must use the configuration files as configured with config_template_path and config_path_template_string. Required

dnsinogeorgos/conductor
