From f1ebef9693efefaf68e093af16af435580925277 Mon Sep 17 00:00:00 2001
From: Jaehwa Jung
Date: Mon, 6 Oct 2014 00:18:34 +0900
Subject: [PATCH 1/2] TAJO-1069: Add document to explain High Availability support

---
 .../sphinx/configuration/ha_configuration.rst | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/tajo-docs/src/main/sphinx/configuration/ha_configuration.rst b/tajo-docs/src/main/sphinx/configuration/ha_configuration.rst
index 0eaa674dfe..962be8e359 100644
--- a/tajo-docs/src/main/sphinx/configuration/ha_configuration.rst
+++ b/tajo-docs/src/main/sphinx/configuration/ha_configuration.rst
@@ -60,7 +60,7 @@ And then, you need to setup tarball and set configuration files on backup master
 
 .. note::
 
-  If you want to run active master and backup master on the same host, you may find tajo master port conflicts. To avoid this problem, you must convert backup master primary ports to another port in ``tajo-site.xml`` as follows:
+  If you want to run active master and backup master on the same host, you may find TajoMaster port conflicts. To avoid this problem, you must convert backup master primary ports to another port in ``tajo-site.xml`` as follows:
 
 .. code-block:: xml
 
@@ -108,7 +108,7 @@ Then, execute ``start-tajo.sh`` ::
 
 .. note::
 
-  You can't use HA mode in DerbyStore. Currently, just one tajo master invoke the derby. If another master try to invoke it, it never run itself. Also, if you set another catalog uri for backup master, it is a incorrect configuration. Because they are unequal in every way.
+  You can't use HA mode in DerbyStore. Currently, just one TajoMaster invoke the derby. If another master try to invoke it, it never run itself. Also, if you set another catalog uri for backup master, it is a incorrect configuration. Because they are unequal in every way.
 
 ================================================
  Administration HA state
@@ -132,4 +132,16 @@ If you want to initiate HA information, execute ``tajo haadmin -formatHA`` ::
 
 .. note::
 
-  Before format HA, you must shutdown the tajo cluster.
\ No newline at end of file
+  Before format HA, you must shutdown the Tajo cluster.
+
+
+================================================
+ Verify Automatic Failover
+================================================
+
+If you want to verify automatic failover, you must deploy your Tajo cluster with TajoMaster HA enable. And then, you
+need to find which node is active by visiting the Tajo web interfaces.
+
+Once you have located your active TajoMaster, you can cause a failure on that node. For example, you can use kill -9 to simulate a JVM crash. Or you can shutdown the machine or disconnect network interface. And then, the backup TajoMaster should automatically become active within 5 seconds. The amount of time required to detect a failure and trigger a failover depends on the configuration of ``tajo.master.ha.monitor.interval``. If there is running queries, it will be finished successfully. Because your TajoClient will get the result data on TajoWorker. But you can't find already query history. Because TajoMaster stores query history on memory. So, the other master can't access already active master query history. And if there is no running query, the automatic failover run successfully.
+
+For reference, TajoMaster HA doesn't consider TajoWorker failure. It is related with TajoResourceManager and QueryMaster.
\ No newline at end of file

From 0634ac7f467549efeb8f2b0b03443c6a9b94cdb1 Mon Sep 17 00:00:00 2001
From: Jaehwa Jung
Date: Wed, 8 Oct 2014 10:41:59 +0900
Subject: [PATCH 2/2] Update some comments.
---
 .../src/main/sphinx/configuration/ha_configuration.rst | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/tajo-docs/src/main/sphinx/configuration/ha_configuration.rst b/tajo-docs/src/main/sphinx/configuration/ha_configuration.rst
index 962be8e359..8e4149c166 100644
--- a/tajo-docs/src/main/sphinx/configuration/ha_configuration.rst
+++ b/tajo-docs/src/main/sphinx/configuration/ha_configuration.rst
@@ -136,12 +136,11 @@ If you want to initiate HA information, execute ``tajo haadmin -formatHA`` ::
 
 
 ================================================
- Verify Automatic Failover
+ How to Test Automatic Failover
 ================================================
 
-If you want to verify automatic failover, you must deploy your Tajo cluster with TajoMaster HA enable. And then, you
-need to find which node is active by visiting the Tajo web interfaces.
+If you want to verify automatic failover of TajoMaster, you must deploy your Tajo cluster with TajoMaster HA enable. And then, you need to find which node is active from Tajo web UI.
 
-Once you have located your active TajoMaster, you can cause a failure on that node. For example, you can use kill -9 to simulate a JVM crash. Or you can shutdown the machine or disconnect network interface. And then, the backup TajoMaster should automatically become active within 5 seconds. The amount of time required to detect a failure and trigger a failover depends on the configuration of ``tajo.master.ha.monitor.interval``. If there is running queries, it will be finished successfully. Because your TajoClient will get the result data on TajoWorker. But you can't find already query history. Because TajoMaster stores query history on memory. So, the other master can't access already active master query history. And if there is no running query, the automatic failover run successfully.
+Once you find your active TajoMaster, you can cause a failure on that node. For example, you can use kill -9 to simulate a JVM crash. Or you can shutdown the machine or disconnect network interface. And then, the backup TajoMaster will be automatically active within 5 seconds. The amount of time required to detect a failure and trigger a failover depends on the config ``tajo.master.ha.monitor.interval``. If there is running queries, it will be finished successfully. Because your TajoClient will get the result data on TajoWorker. But you can't find already query history. Because TajoMaster stores query history on memory. So, the other master can't access already active master query history. And if there is no running query, the automatic failover run successfully.
 
 For reference, TajoMaster HA doesn't consider TajoWorker failure. It is related with TajoResourceManager and QueryMaster.
\ No newline at end of file
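
Reviewer note: the failover test described in this patch (kill the active TajoMaster, then expect the backup to become active within about 5 seconds, depending on ``tajo.master.ha.monitor.interval``) could be timed with a small polling helper. The sketch below is only an illustration and is not part of Tajo: ``wait_for_failover`` and the ``is_backup_active`` probe are hypothetical names, and how the probe decides that the backup is active (for example, by checking the Tajo web UI) is left to the tester.

```python
import time
from typing import Callable, Optional


def wait_for_failover(is_backup_active: Callable[[], bool],
                      timeout_s: float = 30.0,
                      poll_interval_s: float = 0.5) -> Optional[float]:
    """Poll a caller-supplied probe until it reports the backup master as active.

    Returns the elapsed time in seconds, or None if the timeout expires first.
    The probe is hypothetical: it must be written by the tester and is not a
    Tajo API.
    """
    start = time.monotonic()
    deadline = start + timeout_s
    while True:
        if is_backup_active():
            return time.monotonic() - start
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll_interval_s)
```

Call it right after killing the active TajoMaster and compare the returned duration with the expected failover window. A ``None`` result means the backup did not report as active before the timeout.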