[SCB-548] support gracefully shutdown #693
|
The Spring MVC integration test failed because we had fully shut down, and re-init does not seem to initialize the transport again. I will try to fix this problem. |
|
Because the server socket may be reused, it is hard to reconnect to a server that has been restarted on the same port. I think we need to avoid restarting the server in unit tests and system tests. |
| @Override | ||
| public void handle(Invocation invocation, AsyncResponse asyncResp) throws Exception { | ||
| if (shuttingDown) { | ||
| System.out.println("shutting down in progress"); |
Do not use System.out.println to print logs. We already throw an exception with this message, so I think this log is unnecessary.
Sorry, that was for debugging. Thanks!
|
[ERROR] Failed to execute goal io.fabric8:docker-maven-plugin:0.20.0:start (start) on project dynamic-config-tests: Execution start of goal io.fabric8:docker-maven-plugin:0.20.0:start failed: Start-Job failed with unexpected exception: [nobodyiam/apollo-quick-start] "apollo.servicecomb.apache.org": Timeout after 120033 ms while waiting on log out 'Portal started' and on tcp port '[/172.17.0.3:8080, /172.17.0.3:8070]' -> [Help 1]
Failed by apollo IT |
|
If the ContextClosedEvent handling and handler/ShutdownHookHandler.java run in random order, that is not OK. |
|
I have used a Semaphore to wait until all invocations have finished. Graceful shutdown now follows this fixed order:
|
Signed-off-by: zhengyangyong <yangyong.zheng@huawei.com>
…gmvc-tests) make four independent modules in order to eliminate interference (from graceful shutdown) Signed-off-by: zhengyangyong <yangyong.zheng@huawei.com>
|
Rebase on latest master |
Signed-off-by: zhengyangyong <yangyong.zheng@huawei.com>
|
Improve the order of the graceful shutdown process:
private void gracefullyShutdown() {
  //Step 1: notify all components to stop invoking via the BEFORE_CLOSE event
  triggerEvent(EventType.BEFORE_CLOSE);
  //Step 2: unregister the microservice instance from Service Center and close the registry vertx
  RegistryUtils.destroy();
  VertxUtils.closeVertxByName("registry");
  //Step 3: wait until all invocations have finished
  try {
    ShutdownHookHandler.INSTANCE.ALL_INVOCATION_FINISHED.acquire();
    LOGGER.warn("all invocation finished");
  } catch (InterruptedException e) {
    LOGGER.error("invocation finished semaphore interrupted", e);
  }
  //Step 4: stop the remaining vertx instances so they do not block exit
  VertxUtils.closeVertxByName("config-center");
  VertxUtils.closeVertxByName("transport");
  //Step 5: notify all components to do cleanup work via the AFTER_CLOSE event
  triggerEvent(EventType.AFTER_CLOSE);
} |
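The order above can be sketched without any ServiceComb classes. This is only an illustration under assumptions: GracefulShutdownSketch, markAllInvocationsFinished and the step labels are hypothetical names, and tryAcquire with a timeout stands in for the plain acquire() so that a missed release cannot block exit forever.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the engine; none of these names are ServiceComb APIs.
class GracefulShutdownSketch {
  // Released by whoever observes that the last in-flight invocation has finished.
  private final Semaphore allInvocationFinished = new Semaphore(0);

  // Records the order in which the steps ran, for illustration only.
  final List<String> steps = new ArrayList<>();

  void markAllInvocationsFinished() {
    allInvocationFinished.release();
  }

  // Runs the five steps in a fixed order; returns true if invocations drained before the timeout.
  boolean gracefullyShutdown(long timeoutMillis) {
    steps.add("BEFORE_CLOSE");     // Step 1: tell components to stop invoking
    steps.add("unregister");       // Step 2: leave the registry and close its vertx
    boolean drained = false;
    try {
      // Step 3: wait for in-flight invocations, but never longer than the timeout
      drained = allInvocationFinished.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // keep the interrupt visible to callers
    }
    steps.add(drained ? "drained" : "timeout");
    steps.add("close-transport");  // Step 4: stop the remaining vertx instances
    steps.add("AFTER_CLOSE");      // Step 5: let components clean up
    return drained;
  }
}
```

Whether the wait ends by release or by timeout, steps 4 and 5 still run, so the process can always exit.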
|
ShutdownHookHandler is not so good: it sometimes blocks the shutdown process because of wrong statistics, and shutdown can then continue only after a timeout. We should: |
|
Also, CI failed. It is related to tracing, but I cannot find what happened; it seems the name of the failing test case was not printed? |
|
@wujimin It failed with 'ExecutionException Error occurred in starting fork, check output in log', and I did not change the zuul test case. I will reopen the PR to restart the CI check.
Signed-off-by: zhengyangyong <yangyong.zheng@huawei.com>
|
|
||
| private ShutdownHookHandler() { | ||
| try { | ||
| ALL_INVOCATION_FINISHED.acquire(); |
I think it's better to make the semaphore private and add a method like waitInvocationFinish. Also, acquire is not required during initialization, and the shutdown hook can be removed too.
| } | ||
| } | ||
|
|
||
| private synchronized void validAllInvocationFinished() { |
This synchronized code blocks all invocations, which is not a good idea.
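One way to combine the two review points (a private semaphore behind a waitInvocationFinish method, and no synchronized block on the invocation path) is an atomic in-flight counter. This is a minimal sketch, not the actual ShutdownHookHandler; InvocationTracker, tryStart, finish and beginShutdown are hypothetical names.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper for illustration; not the real ShutdownHookHandler.
class InvocationTracker {
  private final AtomicInteger inFlight = new AtomicInteger();
  private final AtomicBoolean signalled = new AtomicBoolean();
  private final Semaphore allInvocationFinished = new Semaphore(0);
  private volatile boolean shuttingDown;

  // Hot path: no lock. Counts first, then checks the flag, so an invocation
  // racing with shutdown is either waited for or rejected.
  boolean tryStart() {
    inFlight.incrementAndGet();
    if (shuttingDown) {
      finish(); // undo the count and reject the invocation
      return false;
    }
    return true;
  }

  // Call once for every invocation that was accepted by tryStart.
  void finish() {
    if (inFlight.decrementAndGet() == 0 && shuttingDown) {
      signalOnce();
    }
  }

  void beginShutdown() {
    shuttingDown = true;
    if (inFlight.get() == 0) {
      signalOnce();
    }
  }

  // Returns true if all invocations finished before the timeout elapsed.
  boolean waitInvocationFinish(long timeout, TimeUnit unit) {
    try {
      return allInvocationFinished.tryAcquire(timeout, unit);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
  }

  // Releases the single permit at most once, even if two threads race here.
  private void signalOnce() {
    if (signalled.compareAndSet(false, true)) {
      allInvocationFinished.release();
    }
  }
}
```

The timeout in waitInvocationFinish remains a safety net for invocations that never call finish.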
|
"ShutdownHookHandler is not so good: it sometimes blocks the shutdown process because of wrong statistics, and shutdown can then continue only after a timeout." ---- I don't know how this happens. Regarding "we should:": we are seeing many scenarios in which the program cannot shut down and has to be forced to exit. A program that does not exit can have worse consequences. |
|
In performance tests, ShutdownHookHandler always blocks the shutdown process for many seconds (it waits for the timeout because the request/response count statistics are wrong). I will move the statistics from the invocation handler to invocation events through a new hook; then, even in performance tests, we can shut down normally in one or two seconds. |
Signed-off-by: zhengyangyong <yangyong.zheng@huawei.com>
| } | ||
|
|
||
| public synchronized void init() { | ||
| if (validIsDown()) { |
It's better to use SCBStatus directly; I cannot tell what the method does from its name.
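To illustrate the suggestion, here is a small sketch of an init guard that compares a status enum directly. It is not the real engine code: EngineStatus and EngineSketch are hypothetical names, the STARTING and UP values are assumed, and allowing init only from DOWN is also an assumption.

```java
// STOPPING, FAILED and DOWN appear in this conversation; STARTING and UP are assumed values.
enum EngineStatus { DOWN, STARTING, UP, STOPPING, FAILED }

// Hypothetical engine for illustration only.
class EngineSketch {
  private volatile EngineStatus status = EngineStatus.DOWN;

  EngineStatus getStatus() {
    return status;
  }

  // Comparing the enum directly makes the precondition readable at the call site.
  synchronized boolean init(Runnable startup) {
    if (status != EngineStatus.DOWN) {
      return false; // already starting, running, stopping, or failed
    }
    status = EngineStatus.STARTING;
    try {
      startup.run();
      status = EngineStatus.UP;
      return true;
    } catch (RuntimeException e) {
      status = EngineStatus.FAILED;
      return false;
    }
  }
}
```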
Signed-off-by: zhengyangyong <yangyong.zheng@huawei.com>
| safeTriggerEvent(EventType.AFTER_CLOSE); | ||
|
|
||
| //Step 6: Clean flags for re-init | ||
| eventBus.unregister(this); |
When init succeeds, this has already been unregistered; unregistering a subscriber that is not registered will cause an exception.
Yes, it is already unregistered in triggerAfterRegistryEvent after init succeeds. I will delete this line.
| } | ||
| } | ||
|
|
||
| private void doUninit() throws Exception { |
I think this should not throw an exception.
Even if one step fails, the remaining steps must still finish.
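A minimal sketch of that behaviour, assuming the uninit steps can be modelled as named Runnables (SafeUninitSketch and runAllSteps are hypothetical names, not code from this PR):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only.
class SafeUninitSketch {
  // Runs every step in the map's iteration order. A failing step is recorded
  // and skipped over instead of aborting the remaining steps.
  static List<String> runAllSteps(Map<String, Runnable> steps) {
    List<String> failedSteps = new ArrayList<>();
    for (Map.Entry<String, Runnable> step : steps.entrySet()) {
      try {
        step.getValue().run();
      } catch (RuntimeException e) {
        failedSteps.add(step.getKey()); // remember the failure, keep going
      }
    }
    return failedSteps;
  }
}
```

Passing a LinkedHashMap keeps the steps in the order they were added; the returned list can be logged once all steps have run.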
| //Chassis is Stopping (progressing) | ||
| STOPPING, | ||
| //Chassis Init Failed | ||
| FAILED |
It seems that if uninit fails, the status is set to FAILED.
So what is your design?
I don't know whether init can be run again after uninit fails, so I think we had better set it to FAILED?
Currently, we init and uninit only once,
but I remember your UT does this multiple times?
Done. It is set to DOWN because no exception will be thrown.
Signed-off-by: zhengyangyong <yangyong.zheng@huawei.com>
|
Failed by apollo IT:
[ERROR] Failed to execute goal io.fabric8:docker-maven-plugin:0.20.0:start (start) on project dynamic-config-tests: Execution start of goal io.fabric8:docker-maven-plugin:0.20.0:start failed: Start-Job failed with unexpected exception: [nobodyiam/apollo-quick-start] "apollo.servicecomb.apache.org": Timeout after 120162 ms while waiting on log out 'Portal started' and on tcp port '[/172.17.0.3:8080, /172.17.0.3:8070]' -> [Help 1]
Reopen |
| VertxUtils.blockCloseVertxByName("registry"); | ||
|
|
||
| //Step 3: wait all invocation finished | ||
| // forbit create new consumer invocation |
The "forbit" comment should move to status = SCBStatus.STOPPING;
Yes, it's my mistake. I will fix it.
| SchemaMeta schemaMeta = referenceConfig.getMicroserviceMeta().ensureFindSchemaMeta(schemaId); | ||
| Invocation invocation = InvocationFactory.forConsumer(referenceConfig, schemaMeta, operationName, args); | ||
| return syncInvoke(invocation); | ||
| validCanInvoke(); |
Please check validCanInvoke().
It is invoked multiple times for one invocation.
Considering that the cost of the check is low, I think it is not a problem, because these methods are all public and users may call them directly for general invocations.
OK, this will be changed in a future PR.
"validCanInvoke" is a bad name (it just checks the status of the engine). How about using "checkEngineStatus"?
Signed-off-by: zhengyangyong <yangyong.zheng@huawei.com>
Signed-off-by: zhengyangyong yangyong.zheng@huawei.com
Follow this checklist to help us incorporate your contribution quickly and easily:
[SCB-XXX] Fixes bug in ApproximateQuantiles, where you replace SCB-XXX with the appropriate JIRA issue.
mvn clean install to make sure basic checks pass. A more thorough check will be performed on your pull request automatically.
When the user calls System.exit(0), these three tasks are performed:
1. Unregister the microservice instance from Service Center:
this is done while processing ContextClosedEvent; we need to unregister immediately so that the instance stops providing service.
2. Wait for all invocations to finish:
this is done by ShutdownHandler; when all invocations have finished or the 'deadline time' is reached, the config center (cc) and transport vertx threads are closed.
3. All Spring beans run their close process:
we have called registerShutdownHook on the Spring ApplicationContext, and beans can define a 'destroy-method' to do cleanup.
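For context, the JVM runs registered shutdown hooks when System.exit is called, which is the mechanism the three tasks above rely on. A minimal sketch, assuming a hypothetical ShutdownHookSketch helper that is not part of Java Chassis or Spring:

```java
// Hypothetical helper for illustration; not part of Java Chassis or Spring.
class ShutdownHookSketch {
  // Registers cleanup to run when the JVM begins shutting down,
  // for example after System.exit(0).
  static Thread register(Runnable cleanup) {
    Thread hook = new Thread(cleanup, "graceful-shutdown-hook");
    Runtime.getRuntime().addShutdownHook(hook);
    return hook;
  }

  // Removes a previously registered hook; returns false if it was not registered.
  static boolean unregister(Thread hook) {
    return Runtime.getRuntime().removeShutdownHook(hook);
  }
}
```

The JVM does not guarantee any order between separately registered hooks, which is one reason to drive all shutdown steps from a single entry point.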
Here is an example:
Spring mvc Hello Java Chassis
Pojo Hello person ServiceComb/Java Chassis
Jaxrs Hello person ServiceComb/Java Chassis
Spring mvc Hello person ServiceComb/Java Chassis
2018-05-09 16:02:14,288 [WARN] handler chain is shutting down org.apache.servicecomb.core.handler.ShutdownHookHandler.run(ShutdownHookHandler.java:87)
2018-05-09 16:02:14,289 [INFO] Closing org.springframework.context.support.ClassPathXmlApplicationContext@7cf10a6f: startup date [Wed May 09 16:02:05 CST 2018]; root of context hierarchy org.springframework.context.support.AbstractApplicationContext.doClose(AbstractApplicationContext.java:984)
2018-05-09 16:02:14,290 [WARN] cse is closing now... org.apache.servicecomb.core.CseApplicationListener.onApplicationEvent(CseApplicationListener.java:148)
2018-05-09 16:02:14,291 [INFO] service center task is shutdown. org.apache.servicecomb.serviceregistry.registry.RemoteServiceRegistry.onShutdown(RemoteServiceRegistry.java:72)
2018-05-09 16:02:14,295 [WARN] handler chain is shut down org.apache.servicecomb.core.handler.ShutdownHookHandler.run(ShutdownHookHandler.java:103)
2018-05-09 16:02:14,296 [INFO] Unregister microservice instance success. microserviceId=90b76fd551c511e8b51db4b676a39f40 instanceId=481f630d535f11e8bc19b4b676a39f40 org.apache.servicecomb.serviceregistry.registry.AbstractServiceRegistry.unregisterInstance(AbstractServiceRegistry.java:232)
Process finished with exit code 0