New features of cluster scalability and multi-raft #3191

Merged
merged 69 commits into from Jun 27, 2021
Changes from 54 commits
55fc053
init
OneSizeFitsQuorum Dec 11, 2020
d5c075d
add multi-raft except for add/remove node
fanhualta Dec 15, 2020
7ee34da
keep raft leader in one node && balance data to all raft group
OneSizeFitsQuorum Dec 19, 2020
1aea5a7
merge master
fanhualta Dec 21, 2020
2487976
merge master
fanhualta Dec 22, 2020
128e191
fix sync schema bug
fanhualta Dec 22, 2020
2a2e1cd
Merge branch 'fix_sync_schema_bug' into cluster_multi_raft
fanhualta Dec 22, 2020
5119aa7
fix a bug of cal MaxDeduplicatedPathNum
fanhualta Dec 22, 2020
d3cca11
Merge branch 'fix_MaxDeduplicatedPathNum_cal_bug' into cluster_multi_…
fanhualta Dec 22, 2020
72d697a
Merge branch 'master' into cluster_multi_raft
fanhualta Dec 23, 2020
c4a9dcd
fix a bug of multi-raft
fanhualta Dec 28, 2020
fdc8d32
merge master
fanhualta Dec 28, 2020
ae1b6f3
assign slots
fanhualta Dec 29, 2020
25ea701
fix a bug
fanhualta Dec 29, 2020
c0c5dfb
merge master
fanhualta Dec 29, 2020
b717e3f
fix a bug of forward plan
fanhualta Dec 30, 2020
109d432
fix some issues of multi-raft
fanhualta Jan 5, 2021
325e4f5
fix bugs of wrong previous groups, pull snapshot from self and wrong …
fanhualta Jan 11, 2021
452cba3
Reimplement the function of adding and removing nodes
fanhualta Jan 13, 2021
327eb7e
1. fix ut tests
fanhualta Jan 14, 2021
f342899
merge master
fanhualta Jan 14, 2021
f8a1432
fix a bug of TSStatus.OK set redirect
fanhualta Jan 14, 2021
e7f8e28
merge master
fanhualta Jan 19, 2021
dd64f0e
merge master
fanhualta Jan 19, 2021
e5e4411
merge master
fanhualta Jan 21, 2021
124190c
merge master
fanhualta Feb 9, 2021
4609bc0
This pr fix following bugs:
fanhualta Feb 23, 2021
d84f0ff
This commit fix following issues:
fanhualta Feb 24, 2021
8c7f80b
This commit fix following issues:
fanhualta Mar 1, 2021
4d5ca80
This commit fix a serious bug of abnormal cache of asyncServiceMap an…
fanhualta Mar 1, 2021
5c4ed2d
This commit fix following issues:
fanhualta Mar 2, 2021
5576d78
This commit fix following issues:
fanhualta Mar 4, 2021
5da92c5
This commit fix following issues:
fanhualta Mar 4, 2021
4505b95
This commit fix following issues:
fanhualta Mar 6, 2021
9446e77
enable auto create schema in cluster node and add sync meta log for n…
fanhualta Mar 6, 2021
feb1f64
This commit fix following issues:
fanhualta Mar 24, 2021
f3df198
This commit fix following issues:
fanhualta Mar 25, 2021
140cbb5
This commit fix following issues:
fanhualta Mar 29, 2021
82abad0
This commit fix following issues:
fanhualta Apr 1, 2021
eb45f30
This commit fix following issues:
fanhualta Apr 7, 2021
3a91a25
remove useless parameter
fanhualta Apr 19, 2021
b133170
merge master
fanhualta Apr 26, 2021
0b4014b
This commit fixes all issues of ut tests.
fanhualta Apr 29, 2021
b3815ea
fix bugs of checking params
fanhualta May 6, 2021
74cfc17
add a feature of merging schema result for those slots which are in s…
fanhualta May 7, 2021
2c7f2a1
merge master and fix all conflicts
fanhualta May 7, 2021
0792dea
fix a bug of update last cache and wait for setting sg when installin…
fanhualta May 14, 2021
9601a7e
merge master
fanhualta May 14, 2021
3463efe
fix ut bugs of OOM
fanhualta May 19, 2021
fbda1e9
fix sonar issues
fanhualta May 20, 2021
5d44f0f
merge master
fanhualta May 26, 2021
57e0875
fix some issues according to pr comments
fanhualta May 26, 2021
fb482dd
fix some issues according to pr comments
fanhualta May 27, 2021
57cdff5
fix a bug of replca num = 1 and update partition table before applyRe…
fanhualta Jun 1, 2021
3458a8e
This commit fix following issues:
fanhualta Jun 3, 2021
292cf23
fix some issues according pr comments:
fanhualta Jun 4, 2021
a69eab7
merge master
fanhualta Jun 6, 2021
ca0228d
add user guide for cluster scalability and multi-raft in cluster doc
fanhualta Jun 6, 2021
1729a2c
merge master and fix conflicts
fanhualta Jun 9, 2021
f1e98e4
fix issues according comments
fanhualta Jun 9, 2021
30478a6
merge master
fanhualta Jun 21, 2021
5059827
merge master
fanhualta Jun 21, 2021
7452091
fix bug of [IOTDB-1438], path not exists exception
fanhualta Jun 21, 2021
f815fcc
fix a bug of NPE
fanhualta Jun 22, 2021
b2b6ec7
fix a bug of wrong desc query result
fanhualta Jun 23, 2021
56f9dda
modify NodeTool user to root
fanhualta Jun 23, 2021
2248ba1
fix a serve bug of BatchData serialization and deserialization which …
fanhualta Jun 24, 2021
dc7b26d
merge master
fanhualta Jun 27, 2021
6a38a46
fix CI
fanhualta Jun 27, 2021
7 changes: 7 additions & 0 deletions cluster/src/assembly/resources/conf/iotdb-cluster.properties
@@ -72,6 +72,9 @@ seed_nodes=127.0.0.1:9003
# number of replications for one partition
default_replica_num=1

# sub raft num for multi-raft
multi_raft_factor=1

# cluster name to identify different clusters
# all node's cluster_name in one cluster are the same
# cluster_name=default
@@ -169,6 +172,10 @@ default_replica_num=1
# This default value is 1000
# max_read_log_lag=1000

# When a follower tries to sync log with the leader, sync will fail if the log Lag exceeds max_sync_log_lag.
# This default value is 100000
# max_sync_log_lag=100000

# Max number of clients in a ClientPool of a member for one node.
# max_client_pernode_permember_number=1000

2 changes: 1 addition & 1 deletion cluster/src/assembly/resources/sbin/add-node.bat
@@ -19,7 +19,7 @@

@echo off
echo ````````````````````````
-echo Starting IoTDB
+echo Starting IoTDB (Cluster Mode)
echo ````````````````````````

PATH %PATH%;%JAVA_HOME%\bin\
2 changes: 1 addition & 1 deletion cluster/src/assembly/resources/sbin/add-node.sh
@@ -20,7 +20,7 @@


echo ---------------------
-echo Starting IoTDB
+echo "Starting IoTDB (Cluster Mode)"
echo ---------------------

if [ -z "${IOTDB_HOME}" ]; then
117 changes: 117 additions & 0 deletions cluster/src/assembly/resources/sbin/remove-node.bat
@@ -0,0 +1,117 @@
@REM
@REM Licensed to the Apache Software Foundation (ASF) under one
@REM or more contributor license agreements. See the NOTICE file
@REM distributed with this work for additional information
@REM regarding copyright ownership. The ASF licenses this file
@REM to you under the Apache License, Version 2.0 (the
@REM "License"); you may not use this file except in compliance
@REM with the License. You may obtain a copy of the License at
@REM
@REM http://www.apache.org/licenses/LICENSE-2.0
@REM
@REM Unless required by applicable law or agreed to in writing,
@REM software distributed under the License is distributed on an
@REM "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
@REM KIND, either express or implied. See the License for the
@REM specific language governing permissions and limitations
@REM under the License.
@REM

@echo off
echo ````````````````````````
echo Starting to remove a node (Cluster Mode)
echo ````````````````````````

PATH %PATH%;%JAVA_HOME%\bin\
set "FULL_VERSION="
set "MAJOR_VERSION="
set "MINOR_VERSION="


for /f tokens^=2-5^ delims^=.-_+^" %%j in ('java -fullversion 2^>^&1') do (
set "FULL_VERSION=%%j-%%k-%%l-%%m"
IF "%%j" == "1" (
set "MAJOR_VERSION=%%k"
set "MINOR_VERSION=%%l"
) else (
set "MAJOR_VERSION=%%j"
set "MINOR_VERSION=%%k"
)
)

set JAVA_VERSION=%MAJOR_VERSION%

IF NOT %JAVA_VERSION% == 8 (
IF NOT %JAVA_VERSION% == 11 (
echo IoTDB only supports jdk8 or jdk11, please check your java version.
goto finally
)
)

if "%OS%" == "Windows_NT" setlocal

pushd %~dp0..
if NOT DEFINED IOTDB_HOME set IOTDB_HOME=%cd%
popd

set IOTDB_CONF=%IOTDB_HOME%\conf
set IOTDB_LOGS=%IOTDB_HOME%\logs

@setlocal ENABLEDELAYEDEXPANSION ENABLEEXTENSIONS
set CONF_PARAMS=-r
set is_conf_path=false
for %%i in (%*) do (
IF "%%i" == "-c" (
set is_conf_path=true
) ELSE IF "!is_conf_path!" == "true" (
set is_conf_path=false
set IOTDB_CONF=%%i
) ELSE (
set CONF_PARAMS=!CONF_PARAMS! %%i
)
)

if NOT DEFINED MAIN_CLASS set MAIN_CLASS=org.apache.iotdb.cluster.ClusterMain
if NOT DEFINED JAVA_HOME goto :err

@REM -----------------------------------------------------------------------------
@REM JVM Opts we'll use in legacy run or installation
set JAVA_OPTS=-ea^
-Dlogback.configurationFile="%IOTDB_CONF%\logback.xml"^
-DIOTDB_HOME="%IOTDB_HOME%"^
-DTSFILE_HOME="%IOTDB_HOME%"^
-DCLUSTER_CONF="%IOTDB_CONF%"^
-DIOTDB_CONF="%IOTDB_CONF%"

@REM ***** CLASSPATH library setting *****
@REM Ensure that any user defined CLASSPATH variables are not used on startup
set CLASSPATH="%IOTDB_HOME%\lib"

@REM For each jar in the IOTDB_HOME lib directory call append to build the CLASSPATH variable.
set CLASSPATH=%CLASSPATH%;"%IOTDB_HOME%\lib\*"
set CLASSPATH=%CLASSPATH%;iotdb.ClusterMain
goto okClasspath

:append
set CLASSPATH=%CLASSPATH%;%1
goto :eof

@REM -----------------------------------------------------------------------------
:okClasspath

rem echo CLASSPATH: %CLASSPATH%

"%JAVA_HOME%\bin\java" %JAVA_OPTS% %IOTDB_HEAP_OPTS% -cp %CLASSPATH% %IOTDB_JMX_OPTS% %MAIN_CLASS% %CONF_PARAMS%
goto finally

:err
echo JAVA_HOME environment variable must be set!
pause


@REM -----------------------------------------------------------------------------
:finally

pause

ENDLOCAL
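The version check at the top of remove-node.bat can be sketched in Java as follows. This is an illustration only: the class and method names are hypothetical, and the input is assumed to be the quoted version string printed by `java -fullversion` (for example `1.8.0_291-b10` or `11.0.11+9`). It splits on the same delimiter characters the batch `for /f` loop uses.

```java
// Hypothetical helper mirroring the batch script's version detection.
class JavaVersionCheckSketch {

  // Splits on '.', '-', '_' and '+', the delimiters used in the batch script.
  static int majorVersion(String version) {
    String[] parts = version.split("[.\\-_+]");
    int first = Integer.parseInt(parts[0]);
    // Legacy scheme "1.8.0_x": the major version is the second token.
    if (first == 1 && parts.length > 1) {
      return Integer.parseInt(parts[1]);
    }
    // Modern scheme "11.0.x": the major version is the first token.
    return first;
  }

  // The script accepts only JDK 8 and JDK 11.
  static boolean isSupported(int major) {
    return major == 8 || major == 11;
  }
}
```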
88 changes: 88 additions & 0 deletions cluster/src/assembly/resources/sbin/remove-node.sh
@@ -0,0 +1,88 @@
#!/bin/bash
#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
#


echo ---------------------
echo "Starting to remove a node(Cluster Mode)"
echo ---------------------

if [ -z "${IOTDB_HOME}" ]; then
export IOTDB_HOME="`dirname "$0"`/.."
fi

IOTDB_CONF=${IOTDB_HOME}/conf

is_conf_path=false
for arg do
shift
if [ "$arg" == "-c" ]; then
is_conf_path=true
continue
fi

if [ $is_conf_path == true ]; then
IOTDB_CONF=$arg
is_conf_path=false
continue
fi
set -- "$@" "$arg"
done

CONF_PARAMS="-r "$*

if [ -n "$JAVA_HOME" ]; then
for java in "$JAVA_HOME"/bin/amd64/java "$JAVA_HOME"/bin/java; do
if [ -x "$java" ]; then
JAVA="$java"
break
fi
done
else
JAVA=java
fi

if [ -z $JAVA ] ; then
echo Unable to find java executable. Check JAVA_HOME and PATH environment variables. > /dev/stderr
exit 1;
fi

CLASSPATH=""
for f in ${IOTDB_HOME}/lib/*.jar; do
CLASSPATH=${CLASSPATH}":"$f
done
classname=org.apache.iotdb.cluster.ClusterMain

launch_service()
{
class="$1"
iotdb_parms="-Dlogback.configurationFile=${IOTDB_CONF}/logback.xml"
iotdb_parms="$iotdb_parms -DIOTDB_HOME=${IOTDB_HOME}"
iotdb_parms="$iotdb_parms -DTSFILE_HOME=${IOTDB_HOME}"
iotdb_parms="$iotdb_parms -DIOTDB_CONF=${IOTDB_CONF}"
iotdb_parms="$iotdb_parms -DCLUSTER_CONF=${IOTDB_CONF}"
iotdb_parms="$iotdb_parms -Dname=iotdb\.IoTDB"
exec "$JAVA" $iotdb_parms $IOTDB_JMX_OPTS -cp "$CLASSPATH" "$class" $CONF_PARAMS
return $?
}

# Start up the service
launch_service "$classname"

exit $?
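The argument handling in remove-node.sh (consume `-c <conf dir>`, pass everything else through after a leading `-r`) can be sketched in Java as below. The class `RemoveNodeArgsSketch` is hypothetical and exists only to illustrate the loop; the sample arguments in the usage note are illustrative as well.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the script's argument loop.
class RemoveNodeArgsSketch {
  final String confDir;
  final List<String> params;

  private RemoveNodeArgsSketch(String confDir, List<String> params) {
    this.confDir = confDir;
    this.params = params;
  }

  static RemoveNodeArgsSketch parse(String defaultConfDir, String... args) {
    String conf = defaultConfDir;
    List<String> params = new ArrayList<>();
    params.add("-r"); // the script always prepends the remove-mode flag
    boolean expectConfPath = false;
    for (String arg : args) {
      if ("-c".equals(arg)) {
        expectConfPath = true; // the next argument is the conf directory
        continue;
      }
      if (expectConfPath) {
        conf = arg;
        expectConfPath = false;
        continue;
      }
      params.add(arg); // everything else is forwarded to the main class
    }
    return new RemoveNodeArgsSketch(conf, params);
  }
}
```

For example, `parse("/opt/iotdb/conf", "-c", "/tmp/conf", "127.0.0.1", "9003")` would yield the conf directory `/tmp/conf` and the forwarded parameters `-r 127.0.0.1 9003`.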
2 changes: 1 addition & 1 deletion cluster/src/assembly/resources/sbin/start-node.bat
@@ -19,7 +19,7 @@

@echo off
echo ````````````````````````
-echo Starting IoTDB
+echo Starting IoTDB (Cluster Mode)
echo ````````````````````````

PATH %PATH%;%JAVA_HOME%\bin\
25 changes: 20 additions & 5 deletions cluster/src/main/java/org/apache/iotdb/cluster/ClusterMain.java
@@ -51,6 +51,8 @@
import java.util.HashSet;
import java.util.Set;

+import static org.apache.iotdb.cluster.utils.ClusterUtils.UNKNOWN_CLIENT_IP;

public class ClusterMain {

private static final Logger logger = LoggerFactory.getLogger(ClusterMain.class);
@@ -99,8 +101,8 @@ public static void main(String[] args) {
}

String mode = args[0];

logger.info("Running mode {}", mode);

if (MODE_START.equals(mode)) {
try {
metaServer = new MetaClusterServer();
@@ -121,13 +123,19 @@ public static void main(String[] args) {
}
} else if (MODE_ADD.equals(mode)) {
try {
+long startTime = System.currentTimeMillis();
metaServer = new MetaClusterServer();
preStartCustomize();
metaServer.start();
metaServer.joinCluster();
// Currently, we do not register ClusterInfoService as a JMX Bean,
// so we use startService() rather than start()
ClusterInfoServer.getInstance().startService();

+logger.info(
+"Adding this node {} to cluster costs {} ms",
+metaServer.getMember().getThisNode(),
+(System.currentTimeMillis() - startTime));
} catch (TTransportException
| StartupException
| QueryProcessException
@@ -221,7 +229,7 @@ private static void doRemoveNode(String[] args) throws IOException {
TProtocolFactory factory =
config.isRpcThriftCompressionEnabled() ? new TCompactProtocol.Factory() : new Factory();
Node nodeToRemove = new Node();
-nodeToRemove.setInternalIp(ip).setMetaPort(metaPort);
+nodeToRemove.setInternalIp(ip).setMetaPort(metaPort).setClientIp(UNKNOWN_CLIENT_IP);
// try sending the request to each seed node
for (String url : config.getSeedNodeUrls()) {
Node node = ClusterUtils.parseNode(url);
@@ -230,6 +238,7 @@ private static void doRemoveNode(String[] args) throws IOException {
}
AsyncMetaClient client = new AsyncMetaClient(factory, new TAsyncClientManager(), node, null);
Long response = null;
+long startTime = System.currentTimeMillis();
try {
logger.info("Start removing node {} with the help of node {}", nodeToRemove, node);
response = SyncClientAdaptor.removeNode(client, nodeToRemove);
@@ -240,19 +249,25 @@ private static void doRemoveNode(String[] args) throws IOException {
logger.warn("Cannot send remove node request through {}, try next node", node);
}
if (response != null) {
-handleNodeRemovalResp(response, nodeToRemove);
+handleNodeRemovalResp(response, nodeToRemove, startTime);
return;
}
}
}

-private static void handleNodeRemovalResp(Long response, Node nodeToRemove) {
+private static void handleNodeRemovalResp(Long response, Node nodeToRemove, long startTime) {
if (response == Response.RESPONSE_AGREE) {
-logger.info("Node {} is successfully removed", nodeToRemove);
+logger.info(
+"Node {} is successfully removed, cost {}ms",
+nodeToRemove,
+(System.currentTimeMillis() - startTime));
} else if (response == Response.RESPONSE_CLUSTER_TOO_SMALL) {
logger.error("Cluster size is too small, cannot remove any node");
} else if (response == Response.RESPONSE_REJECT) {
logger.error("Node {} is not found in the cluster, please check", nodeToRemove);
+} else if (response == Response.RESPONSE_DATA_MIGRATION_NOT_FINISH) {
+logger.warn(
+"The data migration of the previous membership change operation is not finished. Please try again later");
} else {
logger.error("Unexpected response {}", response);
}
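The removal flow in doRemoveNode and handleNodeRemovalResp (try each seed node in turn, stop at the first non-null response, then report the outcome with the elapsed time) can be sketched as below. This is a simplified model, not the PR's code: the numeric response codes are placeholders invented for the example (the real values live in IoTDB's `Response` class), nodes are plain strings, and the `sender` function stands in for `SyncClientAdaptor.removeNode`.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical, self-contained model of the removal flow shown in the diff.
class RemoveNodeSketch {
  // Placeholder codes for illustration only; not the real Response constants.
  static final long AGREE = 1;
  static final long REJECT = 2;
  static final long CLUSTER_TOO_SMALL = 3;
  static final long DATA_MIGRATION_NOT_FINISH = 4;

  // Maps a response code to the message logged in handleNodeRemovalResp.
  static String describe(long response, String node, long elapsedMs) {
    if (response == AGREE) {
      return "Node " + node + " is successfully removed, cost " + elapsedMs + "ms";
    } else if (response == CLUSTER_TOO_SMALL) {
      return "Cluster size is too small, cannot remove any node";
    } else if (response == REJECT) {
      return "Node " + node + " is not found in the cluster, please check";
    } else if (response == DATA_MIGRATION_NOT_FINISH) {
      return "The data migration of the previous membership change operation"
          + " is not finished. Please try again later";
    }
    return "Unexpected response " + response;
  }

  // Tries each seed in order; a null response means "could not reach this seed".
  // Returns null if no seed answered, matching the silent fall-through in the diff.
  static String removeVia(List<String> seeds, Function<String, Long> sender, String node) {
    for (String seed : seeds) {
      long start = System.currentTimeMillis();
      Long response = sender.apply(seed);
      if (response != null) {
        return describe(response, node, System.currentTimeMillis() - start);
      }
    }
    return null;
  }
}
```

As in the diff, the timer restarts for each seed, so the reported cost covers only the request that actually succeeded.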
@@ -126,9 +126,7 @@ private AsyncClient waitForClient(Deque<AsyncClient> clientStack, ClusterNode cl
this.wait(waitClientTimeutMS);
if (clientStack.isEmpty() && System.currentTimeMillis() - waitStart >= waitClientTimeutMS) {
logger.warn(
-"Cannot get an available client after {}ms, create a new one.",
-waitClientTimeutMS,
-asyncClientFactory);
+"Cannot get an available client after {}ms, create a new one.", waitClientTimeutMS);
AsyncClient asyncClient = asyncClientFactory.getAsyncClient(clusterNode, this);
nodeClientNumMap.computeIfPresent(clusterNode, (n, oldValue) -> oldValue + 1);
return asyncClient;
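The hunk above removes a surplus logging argument: the message has a single `{}` placeholder, so the third argument was never printed. The surrounding wait-then-create pattern can be sketched as a small generic pool. This is an assumption-laden illustration rather than the PR's class: `TimedClientPoolSketch`, `borrow`, `giveBack`, and `createdCount` are hypothetical names, and unlike the diff it recomputes the remaining wait time on every wake-up.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Hypothetical pool: wait up to a timeout for an idle client, then create a new one.
class TimedClientPoolSketch<T> {
  private final Deque<T> idle = new ArrayDeque<>();
  private final Supplier<T> factory;
  private final long waitTimeoutMs;
  private int created = 0;

  TimedClientPoolSketch(Supplier<T> factory, long waitTimeoutMs) {
    this.factory = factory;
    this.waitTimeoutMs = waitTimeoutMs;
  }

  synchronized T borrow() {
    long waitStart = System.currentTimeMillis();
    while (idle.isEmpty()) {
      long remaining = waitTimeoutMs - (System.currentTimeMillis() - waitStart);
      if (remaining <= 0) {
        // Timed out: fall back to creating a fresh client, as the diff does.
        created++;
        return factory.get();
      }
      try {
        wait(remaining); // woken early by giveBack() or spuriously; the loop re-checks
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        created++;
        return factory.get();
      }
    }
    return idle.pop();
  }

  synchronized void giveBack(T client) {
    idle.push(client);
    notifyAll(); // wake any thread blocked in borrow()
  }

  synchronized int createdCount() {
    return created;
  }
}
```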