[SCB-548] support gracefully shutdown by zhengyangyong · Pull Request #693 · apache/servicecomb-java-chassis

zhengyangyong · 2018-05-08T12:26:25Z

Signed-off-by: zhengyangyong yangyong.zheng@huawei.com

Follow this checklist to help us incorporate your contribution quickly and easily:

Make sure there is a JIRA issue filed for the change (usually before you start working on it). Trivial changes like typos do not require a JIRA issue. Your pull request should address just this issue, without pulling in other changes.
Each commit in the pull request should have a meaningful subject line and body.
Format the pull request title like [SCB-XXX] Fixes bug in ApproximateQuantiles, where you replace SCB-XXX with the appropriate JIRA issue.
Write a pull request description that is detailed enough to understand what the pull request does, how, and why.
Run mvn clean install to make sure basic checks pass. A more thorough check will be performed on your pull request automatically.
If this contribution is large, please file an Apache Individual Contributor License Agreement.

When the user calls System.exit(0), three things happen:
1. Unregister the microservice instance from Service Center:
this is done while processing ContextClosedEvent; we need to unregister immediately so that the instance stops providing service.
2. Wait for all invocations to finish:
this is done by ShutdownHandler; when all invocations have finished or the deadline is reached, the cc and transport vertx threads are closed.
3. All Spring beans run their close process:
we call registerShutdownHook on the Spring ApplicationContext, so beans can define a 'destroy-method' to do cleanup.
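
The three steps above can be pictured with a minimal sketch. It is not the code in this pull request: the class, field, and method names are hypothetical, and the real work of each step is replaced by a recorded step name. It only illustrates the ordering and the "finished or deadline reached" wait, using a standard JVM shutdown hook.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: none of these names are ServiceComb APIs.
class ShutdownSequenceSketch {
  final List<String> steps = new ArrayList<>();
  // Counted down by whatever tracks invocations once the last one completes.
  final CountDownLatch invocationsDone = new CountDownLatch(1);

  // Runs the three steps in order; returns false if the deadline was reached first.
  boolean shutdown(long deadlineMillis) {
    steps.add("unregister instance");
    boolean drained = false;
    try {
      drained = invocationsDone.await(deadlineMillis, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // keep the interrupt visible to the caller
    }
    steps.add(drained ? "invocations finished" : "deadline reached");
    steps.add("close beans");
    return drained;
  }

  // System.exit(0) runs registered JVM shutdown hooks before the process halts.
  void registerAsShutdownHook(long deadlineMillis) {
    Runtime.getRuntime().addShutdownHook(new Thread(() -> shutdown(deadlineMillis)));
  }
}
```

The deadline keeps a stuck invocation from blocking process exit forever, at the cost of possibly cutting that invocation off.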

Here is an example:

Spring mvc Hello Java Chassis
Pojo Hello person ServiceComb/Java Chassis
Jaxrs Hello person ServiceComb/Java Chassis
Spring mvc Hello person ServiceComb/Java Chassis
2018-05-09 16:02:14,288 [WARN] handler chain is shutting down org.apache.servicecomb.core.handler.ShutdownHookHandler.run(ShutdownHookHandler.java:87)
2018-05-09 16:02:14,289 [INFO] Closing org.springframework.context.support.ClassPathXmlApplicationContext@7cf10a6f: startup date [Wed May 09 16:02:05 CST 2018]; root of context hierarchy org.springframework.context.support.AbstractApplicationContext.doClose(AbstractApplicationContext.java:984)
2018-05-09 16:02:14,290 [WARN] cse is closing now... org.apache.servicecomb.core.CseApplicationListener.onApplicationEvent(CseApplicationListener.java:148)
2018-05-09 16:02:14,291 [INFO] service center task is shutdown. org.apache.servicecomb.serviceregistry.registry.RemoteServiceRegistry.onShutdown(RemoteServiceRegistry.java:72)
2018-05-09 16:02:14,295 [WARN] handler chain is shut down org.apache.servicecomb.core.handler.ShutdownHookHandler.run(ShutdownHookHandler.java:103)
2018-05-09 16:02:14,296 [INFO] Unregister microservice instance success. microserviceId=90b76fd551c511e8b51db4b676a39f40 instanceId=481f630d535f11e8bc19b4b676a39f40 org.apache.servicecomb.serviceregistry.registry.AbstractServiceRegistry.unregisterInstance(AbstractServiceRegistry.java:232)
Process finished with exit code 0

zhengyangyong · 2018-05-09T01:07:08Z

The Spring MVC integration test failed because we now shut down fully, and re-init does not seem to initialize the transport again. I will try to fix this problem.

WillemJiang · 2018-05-09T02:26:22Z

As the server socket may be reused, it is hard to reconnect to a server that is restarted on the same port. I think we need to avoid restarting the server in the unit tests or system tests.

coveralls · 2018-05-09T08:34:30Z

Coverage increased (+0.2%) to 87.551% when pulling 4b5722c on zhengyangyong:SCB-548 into 5c29a7c on apache:master.

liubao68 · 2018-05-10T01:53:56Z

core/src/main/java/org/apache/servicecomb/core/handler/ShutdownHookHandler.java

  @Override
  public void handle(Invocation invocation, AsyncResponse asyncResp) throws Exception {
    if (shuttingDown) {
+      System.out.println("shutting down in progress");


Do not use System.out.println to print logs. We already throw an exception with this message, so I think this log is not necessary.

Sorry, that was for debugging. Thanks!

zhengyangyong · 2018-05-10T07:48:37Z

[ERROR] Failed to execute goal io.fabric8:docker-maven-plugin:0.20.0:start (start) on project dynamic-config-tests: Execution start of goal io.fabric8:docker-maven-plugin:0.20.0:start failed: Start-Job failed with unexpected exception: [nobodyiam/apollo-quick-start] "apollo.servicecomb.apache.org": Timeout after 120033 ms while waiting on log out 'Portal started' and on tcp port '[/172.17.0.3:8080, /172.17.0.3:8070]' -> [Help 1]

Failed by apollo IT

wujimin · 2018-05-10T08:40:15Z

If ContextClosedEvent and handler/ShutdownHookHandler.java run in random order, then it's not OK.

zhengyangyong · 2018-05-11T06:58:49Z

I have used a Semaphore to wait until all invocations have finished. Graceful shutdown now follows this fixed order:

1. Unregister the microservice instance from Service Center and close vertx
2. Wait for all invocations to finish (the timeout mechanism in ShutdownHandler has been removed)
3. Notify all components to do cleanup work via an event
4. Stop the transport and cc vertx to prevent them from blocking exit

zhengyangyong · 2018-05-11T07:39:04Z

Rebase on latest master

zhengyangyong · 2018-05-11T08:44:12Z

Improved the graceful shutdown process order:

  private void gracefullyShutdown() {
    //Step 1: notify all component stop invoke via BEFORE_CLOSE Event
    triggerEvent(EventType.BEFORE_CLOSE);

    //Step 2: Unregister microservice instance from Service Center and close vertx
    RegistryUtils.destroy();
    VertxUtils.closeVertxByName("registry");

    //Step 3: wait all invocation finished
    try {
      ShutdownHookHandler.INSTANCE.ALL_INVOCATION_FINISHED.acquire();
      LOGGER.warn("all invocation finished");
    } catch (InterruptedException e) {
      LOGGER.error("invocation finished semaphore interrupted", e);
    }

    //Step 4: Stop vertx to prevent blocking exit
    VertxUtils.closeVertxByName("config-center");
    VertxUtils.closeVertxByName("transport");

    //Step 5: notify all component do clean works via AFTER_CLOSE Event
    triggerEvent(EventType.AFTER_CLOSE);
  }

wujimin · 2018-05-12T02:06:30Z

ShutdownHookHandler is not so good: sometimes it blocks the shutdown process because of wrong statistics, and shutdown can only continue once the timeout expires.

We should:
1. prevent new consumer invocations from being created
2. prevent others from discovering this instance
3. make the statistics accurate

wujimin · 2018-05-12T02:07:54Z

Also, CI failed. It's related to tracing, but I cannot find what happened; it seems the name of the failing test case was not printed?

zhengyangyong · 2018-05-13T04:05:43Z

@wujimin It failed with 'ExecutionException Error occurred in starting fork, check output in log', and I did not change the zuul test case. I will reopen the PR to restart the CI check.

liubao68 · 2018-05-14T02:26:56Z

core/src/main/java/org/apache/servicecomb/core/handler/ShutdownHookHandler.java

+
  private ShutdownHookHandler() {
+    try {
+      ALL_INVOCATION_FINISHED.acquire();


I think it's better to make the semaphore private and add a method like waitInvocationFinish. Also, the acquire during initialization is not required, and the shutdown hook can be removed too.
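
A minimal sketch of what this suggestion could look like, assuming a semaphore that starts with zero permits so that no acquire is needed at initialization. Only the method name waitInvocationFinish comes from the comment above; the class name, markAllInvocationFinished, and the timeout parameter are hypothetical, and this is not the code that was eventually merged.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the review suggestion, not ServiceComb code.
class InvocationGateSketch {
  // Starts with zero permits, so nothing has to be acquired during initialization.
  private final Semaphore allInvocationFinished = new Semaphore(0);

  // Called by whatever tracks invocations when the last one completes.
  void markAllInvocationFinished() {
    allInvocationFinished.release();
  }

  // Blocks until markAllInvocationFinished() has been called or the timeout elapses.
  boolean waitInvocationFinish(long timeoutMillis) {
    try {
      return allInvocationFinished.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt for the caller
      return false;
    }
  }
}
```

Keeping the semaphore private means callers can only wait or signal through named methods, which avoids outside code acquiring or releasing permits by accident.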

liubao68 · 2018-05-14T02:31:21Z

core/src/main/java/org/apache/servicecomb/core/handler/ShutdownHookHandler.java

    }
  }

+  private synchronized void validAllInvocationFinished() {


This synchronized code blocks all invocations, which is not a good idea.
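
One possible way to avoid the synchronized method is to track in-flight invocations with atomic counters. The sketch below is an assumption about how that could be done, not the approach taken in this pull request; every name in it is hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: lock-free in-flight tracking, not ServiceComb code.
class InFlightCounterSketch {
  private final AtomicLong inFlight = new AtomicLong();
  private final AtomicBoolean shuttingDown = new AtomicBoolean();
  private final CountDownLatch drained = new CountDownLatch(1);

  // Returns false when the invocation is rejected because shutdown has begun.
  boolean tryStart() {
    inFlight.incrementAndGet();   // count first, then check the flag
    if (shuttingDown.get()) {
      finish();                   // undo the increment for the rejected invocation
      return false;
    }
    return true;
  }

  // Called when an invocation completes, successfully or not.
  void finish() {
    if (inFlight.decrementAndGet() == 0 && shuttingDown.get()) {
      drained.countDown();
    }
  }

  // Stops accepting new invocations; signals at once if nothing is in flight.
  void beginShutdown() {
    shuttingDown.set(true);
    if (inFlight.get() == 0) {
      drained.countDown();
    }
  }

  boolean isDrained() {
    return drained.getCount() == 0;
  }
}
```

Incrementing before checking the flag means an invocation that races with beginShutdown is either counted and waited for, or rejected and uncounted, without any thread holding a lock.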

liubao68 · 2018-05-14T02:45:33Z

@wujimin

ShutdownHookHandler is not so good: sometimes it blocks the shutdown process because of wrong statistics, and shutdown can only continue once the timeout expires. ---- I don't know how this happens, because
ShutdownHookHandler waits quite a long time, much longer than the request timeout. If that time is reached, it usually means the timeout is not set properly or there is some other bug.

We should:
1. prevent new consumer invocations from being created ---- This has already been done.
2. prevent others from discovering this instance ---- We cannot actually prevent this from happening, and there may be a time delay.
3. make the statistics accurate ---- The process is shutting down and we force-close some unexpected invocations, so I think precision is not very important; accurate statistics cannot be achieved anyway, for example because metrics have been shut down.

We are seeing a lot of scenarios where the program cannot shut down and has to be forced to exit. A program that does not exit can cause worse consequences.

wujimin · 2018-05-14T02:58:09Z

In performance tests, ShutdownHookHandler always blocks the shutdown process for many seconds (it waits for the timeout because the request/response count statistics are wrong).

The statistics will move from the invocation handler to invocation events.
After that change, rejecting in the invocation handler is too late,
so the shutdown hook will be rewritten.

With the new hook, even in performance tests, we can shut down normally in one or two seconds.

WillemJiang · 2018-05-16T03:50:06Z

core/src/main/java/org/apache/servicecomb/core/SCBEngine.java

+  }
+
+  public synchronized void init() {
+    if (validIsDown()) {


It's better to use the SCBStatus directly; I cannot tell what the method does from its name.

Sounds good, fixed.

wujimin · 2018-05-17T02:59:37Z

core/src/main/java/org/apache/servicecomb/core/SCBEngine.java

+    safeTriggerEvent(EventType.AFTER_CLOSE);
+
+    //Step 6: Clean flags for re-init
+    eventBus.unregister(this);


Once init succeeds, this has already been unregistered; unregistering a subscriber that is not registered will cause an exception.

Yes, it is already unregistered in triggerAfterRegistryEvent after init succeeds. I will delete this line.

wujimin · 2018-05-17T03:38:01Z

core/src/main/java/org/apache/servicecomb/core/SCBEngine.java

+    }
+  }
+
+  private void doUninit() throws Exception {


I think this should not throw an exception.
Even if one step fails, the remaining steps must still run.

OK, I will fix it.
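
The "finish the remaining steps even if one fails" rule can be sketched as below. This is an illustrative helper with hypothetical names, not the doUninit implementation from this pull request.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: run every cleanup step, collecting failures instead of aborting.
class SafeUninitSketch {
  // Runs the steps in order; a failing step is recorded and the rest still run.
  static List<RuntimeException> runAll(Runnable... steps) {
    List<RuntimeException> failures = new ArrayList<>();
    for (Runnable step : steps) {
      try {
        step.run();
      } catch (RuntimeException e) {
        failures.add(e); // in real code this would also be logged
      }
    }
    return failures;
  }
}
```

Returning the collected failures lets the caller log them or decide on a final status without interrupting the shutdown sequence.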

wujimin · 2018-05-17T03:42:28Z

core/src/main/java/org/apache/servicecomb/core/SCBStatus.java

+  //Chassis is Stopping (progressing)
+  STOPPING,
+  //Chassis Init Failed
+  FAILED


It seems that if uninit fails, the status will be set to FAILED.
So what's your design?

I don't know whether init can be run again if uninit fails, so I think it may be better to set it to FAILED?

Currently, we only init and uninit once,
but I remember your UT does this multiple times?

Done, set to DOWN because no exception will be thrown.

zhengyangyong · 2018-05-17T07:25:32Z

Failed by the Apollo IT: [ERROR] Failed to execute goal io.fabric8:docker-maven-plugin:0.20.0:start (start) on project dynamic-config-tests: Execution start of goal io.fabric8:docker-maven-plugin:0.20.0:start failed: Start-Job failed with unexpected exception: [nobodyiam/apollo-quick-start] "apollo.servicecomb.apache.org": Timeout after 120162 ms while waiting on log out 'Portal started' and on tcp port '[/172.17.0.3:8080, /172.17.0.3:8070]' -> [Help 1]. Reopening.

wujimin · 2018-05-17T14:08:12Z

core/src/main/java/org/apache/servicecomb/core/SCBEngine.java

+    VertxUtils.blockCloseVertxByName("registry");
+
+    //Step 3: wait all invocation finished
+    // forbit create new consumer invocation


The "forbit create new consumer invocation" comment should move to the line status = SCBStatus.STOPPING;

Yes, it's my mistake. I will fix it.

wujimin · 2018-05-17T14:11:16Z

core/src/main/java/org/apache/servicecomb/core/provider/consumer/InvokerUtils.java

-    SchemaMeta schemaMeta = referenceConfig.getMicroserviceMeta().ensureFindSchemaMeta(schemaId);
-    Invocation invocation = InvocationFactory.forConsumer(referenceConfig, schemaMeta, operationName, args);
-    return syncInvoke(invocation);
+    validCanInvoke();


Please check validCanInvoke():
it is invoked multiple times for one invocation.

Considering that the check cost is low, I think it's not a problem, because these methods are all public and users may call them directly for general invocation.

OK, it will be changed in a future PR.

"validCanInvoke" is a bad name (it just checks the status of the engine); how about "checkEngineStatus"?

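For illustration, a status check under the suggested name could look like the sketch below. It is an assumption about the shape of such a check, not the code in InvokerUtils or SCBEngine: DOWN, STOPPING and FAILED appear in this review thread, while STARTING, UP, the class name and the exception type are assumed here.

```java
// Hypothetical sketch of an engine status check; not ServiceComb code.
class EngineStatusSketch {
  // DOWN, STOPPING and FAILED appear in the review thread; STARTING and UP are assumed.
  enum Status { DOWN, STARTING, UP, STOPPING, FAILED }

  private volatile Status status = Status.DOWN;

  void setStatus(Status status) {
    this.status = status;
  }

  // Rejects a new consumer invocation unless the engine is fully up.
  void checkEngineStatus() {
    Status current = status; // single volatile read
    if (current != Status.UP) {
      throw new IllegalStateException("engine is not ready, status=" + current);
    }
  }
}
```

Because the check is a single volatile read and a comparison, calling it from several public entry points for the same invocation costs little, which matches the reasoning given above.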

liubao68 reviewed May 10, 2018

liubao68 closed this May 10, 2018

liubao68 reopened this May 10, 2018

zhengyangyong closed this May 10, 2018

zhengyangyong reopened this May 10, 2018

liubao68 approved these changes May 10, 2018

zhengyangyong added 5 commits May 11, 2018 15:23

SCB-548 support gracefully shutdown

da01862

Signed-off-by: zhengyangyong <yangyong.zheng@huawei.com>

SCB-548 improvement and fix it failed

e3e04a5

Signed-off-by: zhengyangyong <yangyong.zheng@huawei.com>

SCB-548 fix pr comment

3f1ccf8

Signed-off-by: zhengyangyong <yangyong.zheng@huawei.com>

SCB-548 improve gracefully shutdown

6eb811f

Signed-off-by: zhengyangyong <yangyong.zheng@huawei.com>

SCB-548 reorganization springmvc it (spring-springmvc-tests and sprin…

d66eb28

…gmvc-tests) make four independent module in order to eliminate disturbance(from gracefully shutdown) Signed-off-by: zhengyangyong <yangyong.zheng@huawei.com>

zhengyangyong force-pushed the SCB-548 branch from 5fd157e to d66eb28 Compare May 11, 2018 07:37

SCB-548 improve gracefully shutdown process order

9eb2fde

Signed-off-by: zhengyangyong <yangyong.zheng@huawei.com>

zhengyangyong closed this May 13, 2018

zhengyangyong reopened this May 13, 2018

SCB-548 fix fork issue for ci

b7b60b9

Signed-off-by: zhengyangyong <yangyong.zheng@huawei.com>

liubao68 reviewed May 14, 2018

SCB-548 refactor for delete ShutdownHandler

ff7b87e

Signed-off-by: zhengyangyong <yangyong.zheng@huawei.com>

WillemJiang reviewed May 16, 2018

zhengyangyong added 2 commits May 16, 2018 17:17

SCB-548 refactor and fix pr comment

7c464f4

Signed-off-by: zhengyangyong <yangyong.zheng@huawei.com>

SCB-548 add SCBEngine UT

7e2cbd6

Signed-off-by: zhengyangyong <yangyong.zheng@huawei.com>

wujimin reviewed May 17, 2018

SCB-548 fix pr comment and update coverage pom

344fe80

Signed-off-by: zhengyangyong <yangyong.zheng@huawei.com>

zhengyangyong closed this May 17, 2018

zhengyangyong reopened this May 17, 2018

wujimin reviewed May 17, 2018

SCB-548 fix pr comment

8f1bc8f

Signed-off-by: zhengyangyong <yangyong.zheng@huawei.com>

wujimin approved these changes May 18, 2018

SCB-548 fix pr comment

4b5722c

Signed-off-by: zhengyangyong <yangyong.zheng@huawei.com>

wujimin merged commit 731e09e into apache:master May 18, 2018

zhengyangyong mentioned this pull request May 18, 2018

How to gracefully shut down the Client? #685

Closed

zhengyangyong mentioned this pull request Jul 20, 2018

Load balancing usage problem #341

Closed
