Scale out a DolphinDB Cluster
1. About DolphinDB Cluster
A DolphinDB cluster has 4 types of nodes: data node, compute node, agent and controller.
- A controller manages the metadata of a cluster and provides the web-based cluster manager.
A controller has the following 3 configuration files:
- controller.cfg: the configuration parameters for the controller, such as IP address, port number, the maximum number of connections, etc.
- cluster.nodes: a list of nodes in the cluster. The controller uses this file to get information about the cluster nodes.
- cluster.cfg: the individual configuration of each node in the cluster, such as the volumes parameter of a node.
- An agent is deployed on each physical machine and is responsible for starting and stopping local data nodes. The configuration file on the agent is:
- agent.cfg: specifies properties relevant to the agent node, such as IP and port, the cluster's controller, etc.
- A data node is used for data storage and computation. You can create distributed databases and tables on a data node.
- A compute node is only used for computation, including stream computing, distributed join and machine learning. Data is not stored on a compute node, and therefore we cannot create a database or table on it.
Take the following steps to add a new physical machine to a cluster:
- Deploy an agent node on the new server and configure the file agent.cfg
- Register information about the new agent node and new data nodes with the controller
- Restart the controller and all data nodes
To expand the storage space of an existing data node, we can modify the node configuration file to add a path to the volumes parameter.
2. Add Data Nodes
The following example adds a new physical server to a cluster and deploys a data node on it. Starting from version 1.30.4, you can add a data node dynamically without restarting the cluster.
The original cluster has the following 4 servers, all running Ubuntu 16.04:
172.18.0.10 : controller
172.18.0.11 : datanode1
172.18.0.12 : datanode2
172.18.0.13 : datanode3
The cluster.nodes file of the cluster is as follows:
localSite,mode
172.18.0.11:8701:agent1,agent
172.18.0.12:8701:agent2,agent
172.18.0.13:8701:agent3,agent
172.18.0.11:8801:node1,datanode
172.18.0.12:8802:node2,datanode
172.18.0.13:8803:node3,datanode
The command line to start the controller:
nohup ./dolphindb -console 0 -mode controller -script dolphindb.dos -config config/controller.cfg -logFile log/controller.log -nodesFile config/cluster.nodes &
The command line to start the agent:
./dolphindb -mode agent -home data -script dolphindb.dos -config config/agent.cfg -logFile log/agent.log
Create a partitioned database in the cluster:
data = table(1..1000 as id, rand(`A`B`C, 1000) as name)
db = database("dfs://scaleout_test_db", RANGE, cutPoints(1..2000, 10))
tb = db.createPartitionedTable(data, "scaleoutTB", `id)
tb.append!(data)
We can browse the data in the DFS Explorer of the cluster web interface.
For more details about cluster configuration, please refer to DolphinDB Multi-Machine Cluster Deployment.
The IP address of the new machine:
172.18.0.14
The IP address, port number and the unique alias name of the new data node in the cluster:
172.18.0.14:8804:node4
2.1 Add a Data Node Offline
2.1.1 Configure the Agent
The agent node is deployed in the directory /home/<DolphinDBRoot>. Copy the files in this directory from any existing server to /home/<DolphinDBRoot> on the new machine, and modify /home/<DolphinDBRoot>/config/agent.cfg as follows:
//specify the IP address and port number of the new agent
localSite=172.18.0.14:8701:agent4
//the controller information
controllerSite=172.18.0.10:8990:ctl8990
Mode=agent
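Because agent.cfg is copied from an existing server, the only line that has to change is localSite. As a minimal sketch of that edit, the shell helper below rewrites the localSite line in place. The function name set_local_site and the scratch file are assumptions made for illustration; they are not part of DolphinDB.

```shell
# set_local_site <agent.cfg> <ip> <port> <alias>   (hypothetical helper name)
# Rewrites the localSite line and leaves every other line untouched.
set_local_site() {
    cfg=$1
    site="$2:$3:$4"
    tmp="$cfg.tmp"
    sed "s|^localSite=.*|localSite=$site|" "$cfg" > "$tmp" && mv "$tmp" "$cfg"
}

# Demo on a scratch file that mimics an agent.cfg copied from 172.18.0.11.
demo_cfg=$(mktemp)
printf '%s\n' \
    'localSite=172.18.0.11:8701:agent1' \
    'controllerSite=172.18.0.10:8990:ctl8990' > "$demo_cfg"
set_local_site "$demo_cfg" 172.18.0.14 8701 agent4
cat "$demo_cfg"    # the first line now names agent4 on 172.18.0.14
rm -f "$demo_cfg"
```

On the new machine, the same call would be pointed at the real config/agent.cfg instead of the scratch file.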
2.1.2 Configure the Controller
Modify the file cluster.nodes to include the new agent node and data node:
localSite,mode
172.18.0.11:8701:agent1,agent
172.18.0.12:8701:agent2,agent
172.18.0.13:8701:agent3,agent
172.18.0.11:8801:node1,datanode
172.18.0.12:8802:node2,datanode
172.18.0.13:8803:node3,datanode
//new agent node and data node
172.18.0.14:8701:agent4,agent
172.18.0.14:8804:node4,datanode
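When cluster.nodes is edited by script rather than by hand, it helps to make the edit repeatable so that running it twice does not register a node twice. The sketch below appends an entry only if that exact line is missing. The helper name add_node_entry is hypothetical, and the agent entry uses the localSite value configured in agent.cfg in Section 2.1.1, since the two must match.

```shell
# add_node_entry <cluster.nodes> <entry>   (hypothetical helper name)
# Appends "host:port:alias,mode" unless that exact line already exists.
add_node_entry() {
    nodes_file=$1
    entry=$2
    grep -qxF -- "$entry" "$nodes_file" || printf '%s\n' "$entry" >> "$nodes_file"
}

# Demo on a scratch file holding part of the original cluster.nodes.
demo_nodes=$(mktemp)
printf '%s\n' 'localSite,mode' \
    '172.18.0.13:8701:agent3,agent' \
    '172.18.0.13:8803:node3,datanode' > "$demo_nodes"
add_node_entry "$demo_nodes" '172.18.0.14:8701:agent4,agent'
add_node_entry "$demo_nodes" '172.18.0.14:8804:node4,datanode'
add_node_entry "$demo_nodes" '172.18.0.14:8804:node4,datanode'   # no-op: already listed
cat "$demo_nodes"
rm -f "$demo_nodes"
```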
2.1.3 Restart the Cluster
Restart the entire cluster, including the controller and all data nodes.
- Close all data nodes on the web interface.
- Execute pkill dolphindb on the controller server (172.18.0.10) to shut down the controller.
- After waiting about half a minute (for the port to be released), restart the controller.
Back in the web interface, you can see that the new agent node has been added and started, and that all data nodes are started.
By now we have successfully added a new server and a new data node to the cluster.
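The shutdown step in Section 2.1.3 relies on waiting for the controller to stop before it is started again. As a rough sketch, the helper below polls until no process with the given name is left, instead of guessing when it has exited. The name wait_for_exit is hypothetical, and the sketch assumes pgrep is available on the server. The half-minute pause for the port to be released is still applied afterwards.

```shell
# wait_for_exit <process_name> <timeout_seconds>   (hypothetical helper name)
# Returns 0 once no process with that exact name is left, 1 on timeout.
wait_for_exit() {
    name=$1
    remaining=$2
    while pgrep -x "$name" > /dev/null 2>&1; do
        [ "$remaining" -le 0 ] && return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}

# Intended use on the controller server, left commented out so that
# running this sketch has no side effects:
#   pkill dolphindb
#   wait_for_exit dolphindb 60 && sleep 30   # extra pause for the port to be released
#   then start the controller with the command line shown in Section 2
```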
2.2 Add a Data Node Online
The configuration changes for adding a data node online are the same as those described in Section 2.1. After configuring agent.cfg and cluster.nodes, start the agent, connect to the controller via the web interface or GUI, and execute the following code:
Back in the web interface, you can see that a new agent node has been added. Execute sh startAgent.sh to start the agent, then start all data nodes via the web interface. By now we have successfully added a new data node online.
Let's verify that the new data node is enabled in the cluster by writing some data to the cluster.
Tb = database("dfs://scaleout_test_db").loadTable("scaleoutTB")
Tb.append!(table(1001..1500 as id, rand(`A`B`C, 500) as name))
In the DFS Explorer, you should see that data has been written to the new data node.
3. Add Hard Disks to a Data Node
The storage capacity of existing data nodes might become insufficient. The following example explains how to add a hard disk to node3.
The storage paths can be specified with the configuration parameter volumes. The default storage path is <HomeDir>/<Data Node Alias>/storage.
If you add a disk to a node that uses the default path, you must explicitly include the original default path when setting the parameter volumes. Otherwise, the metadata under the default path will be lost.
The default value of the parameter volumes is the default storage path given above. Add it to the configuration file if it is not already set there.
To add the new disk path /dev/disk2/node3 to node3, append the path to the volumes setting of node3, separated by a comma.
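As an illustration of the comma-separated form, the sketch below appends a path to a node's volumes line in a scratch copy of cluster.cfg. It assumes the setting is written as node3.volumes=<path>, with the node alias as a prefix, and it keeps <HomeDir> as a placeholder for the actual home directory. The helper name append_volume is hypothetical.

```shell
# append_volume <cluster.cfg> <node_alias> <new_path>   (hypothetical helper name)
# Appends ",<new_path>" to the line starting with "<node_alias>.volumes=".
append_volume() {
    cfg=$1
    tmp="$cfg.tmp"
    awk -v key="$2.volumes=" -v path="$3" '
        index($0, key) == 1 { $0 = $0 "," path }   # literal prefix match, no regex
        { print }
    ' "$cfg" > "$tmp" && mv "$tmp" "$cfg"
}

# Demo on a scratch file; <HomeDir> is a placeholder, not a real path.
demo_cluster_cfg=$(mktemp)
printf '%s\n' 'node3.volumes=<HomeDir>/node3/storage' > "$demo_cluster_cfg"
append_volume "$demo_cluster_cfg" node3 /dev/disk2/node3
cat "$demo_cluster_cfg"   # node3.volumes=<HomeDir>/node3/storage,/dev/disk2/node3
rm -f "$demo_cluster_cfg"
```

The helper is not idempotent: calling it twice with the same path appends the path twice, so run it once per new disk.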
After modifying the configuration file, execute loadClusterNodesConfigs() on the controller to reload the configuration parameters. If the steps above are completed on the web interface, the configuration parameters are reloaded automatically and there is no need to execute loadClusterNodesConfigs(). Once the configuration is complete, the controller does not need to be restarted. Just restart node3 on the web interface so that the new configuration takes effect.
If you would like the new storage to take effect immediately without restarting node3, you can execute addVolumes("/dev/disk2/node3") on node3. However, this change does not persist after the cluster restarts.
After the configuration is completed, execute the following script to write data to the cluster and check if data is written to the new disk.
tb = database("dfs://scaleout_test_db").loadTable("scaleoutTB")
tb.append!(table(1501..2000 as id, rand(`A`B`C, 500) as name))
The data has been written to the new disk.
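One way to confirm this from the shell on the server hosting node3 is to check whether any files have appeared under the new volume path. The sketch below is a simple check, assuming the node stores its data as regular files somewhere under that directory. The helper name volume_has_data is hypothetical.

```shell
# volume_has_data <dir>   (hypothetical helper name)
# Succeeds if <dir> exists and contains at least one regular file at any depth.
volume_has_data() {
    [ -d "$1" ] && [ -n "$(find "$1" -type f | head -n 1)" ]
}

if volume_has_data /dev/disk2/node3; then
    echo "data files found under /dev/disk2/node3"
else
    echo "no data files under /dev/disk2/node3 yet"
fi
```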