Occasional failure TccLoadBalanceSenderTest.participateFailedThenRetry #430

Closed
coolbeevip opened this issue Mar 15, 2019 · 1 comment

@coolbeevip
Member

coolbeevip commented Mar 15, 2019

Version: alpha server 0.4.0 (master branch)

Reproduced with `mvn clean install -Pdemo,spring-boot-2`

Error log:

```
-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running org.apache.servicecomb.pack.omega.connector.grpc.tcc.GrpcTccClientMessageSenderTest
Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.594 sec - in org.apache.servicecomb.pack.omega.connector.grpc.tcc.GrpcTccClientMessageSenderTest
Running org.apache.servicecomb.pack.omega.connector.grpc.tcc.TccLoadBalanceSenderTest
Tests run: 9, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 10.391 sec <<< FAILURE! - in org.apache.servicecomb.pack.omega.connector.grpc.tcc.TccLoadBalanceSenderTest
participateFailedThenRetry(org.apache.servicecomb.pack.omega.connector.grpc.tcc.TccLoadBalanceSenderTest)  Time elapsed: 2.388 sec  <<< FAILURE!
java.lang.AssertionError: 

Expected: is <3>
     but: was <4>
        at org.apache.servicecomb.pack.omega.connector.grpc.tcc.TccLoadBalanceSenderTest.participateFailedThenRetry(TccLoadBalanceSenderTest.java:213)

Running org.apache.servicecomb.pack.omega.connector.grpc.LoadBalanceContextBuilderTest
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.115 sec - in org.apache.servicecomb.pack.omega.connector.grpc.LoadBalanceContextBuilderTest
Running org.apache.servicecomb.pack.omega.connector.grpc.PushBackReconnectRunnableTest
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.025 sec - in org.apache.servicecomb.pack.omega.connector.grpc.PushBackReconnectRunnableTest
Running org.apache.servicecomb.pack.omega.connector.grpc.saga.RetryableMessageSenderTest
Tests run: 3, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.022 sec - in org.apache.servicecomb.pack.omega.connector.grpc.saga.RetryableMessageSenderTest
Running org.apache.servicecomb.pack.omega.connector.grpc.saga.SagaLoadBalanceSenderWithTLSTest
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1.579 sec - in org.apache.servicecomb.pack.omega.connector.grpc.saga.SagaLoadBalanceSenderWithTLSTest
Running org.apache.servicecomb.pack.omega.connector.grpc.saga.SagaLoadBalancedSenderTest
Tests run: 11, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 14.497 sec - in org.apache.servicecomb.pack.omega.connector.grpc.saga.SagaLoadBalancedSenderTest

Results :

Failed tests: 
  TccLoadBalanceSenderTest.participateFailedThenRetry:213 
Expected: is <3>
     but: was <4>
```

I found a possible cause in `TccLoadBalanceSender.participationStart`:

```java
  @Override
  public AlphaResponse participationStart(ParticipationStartedEvent participationStartedEvent) {
    // Keep retrying (re-picking a sender each time) until a response is returned
    // or the current thread is interrupted.
    do {
      final TccMessageSender messageSender = pickMessageSender();
      Optional<AlphaResponse> response = doGrpcSend(messageSender, participationStartedEvent, new SenderExecutor<ParticipationStartedEvent>() {
        @Override
        public AlphaResponse apply(ParticipationStartedEvent event) {
          return messageSender.participationStart(event);
        }
      });
      // An empty response means this attempt failed, so the loop sends the event again.
      if (response.isPresent()) return response.get();
    } while (!Thread.currentThread().isInterrupted());

    throw new OmegaException("Failed to send event " + participationStartedEvent + " due to interruption");
  }
```

If the response is empty, the loop sends the event again, so `participationStart` can be invoked more than once for a single event. Could this be the reason?
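To make the question concrete, here is a minimal, self-contained sketch of the same retry shape. `FlakySender` and `sendWithRetry` are hypothetical names, not ServiceComb Pack classes, and modelling a failed send as an empty `Optional` is an assumption about how `doGrpcSend` reports errors. It only illustrates that every failed attempt is followed by another call, so one logical send can be counted several times; it does not show which counter the real test asserts on.

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;

public class RetryLoopSketch {

  // Hypothetical sender whose first `failures` calls fail, modelled as an empty
  // Optional (assumed to mirror doGrpcSend returning empty on an exception).
  static final class FlakySender {
    private final int failures;
    final AtomicInteger calls = new AtomicInteger();

    FlakySender(int failures) {
      this.failures = failures;
    }

    Optional<String> send(String event) {
      int attempt = calls.incrementAndGet();
      return attempt <= failures ? Optional.empty() : Optional.of("ack:" + event);
    }
  }

  // Same shape as participationStart: retry until a response is present
  // or the current thread is interrupted.
  static String sendWithRetry(FlakySender sender, String event) {
    do {
      Optional<String> response = sender.send(event);
      if (response.isPresent()) {
        return response.get();
      }
    } while (!Thread.currentThread().isInterrupted());
    throw new IllegalStateException("Failed to send " + event + " due to interruption");
  }

  public static void main(String[] args) {
    FlakySender sender = new FlakySender(1);
    System.out.println(sendWithRetry(sender, "participation-started"));
    // One failed attempt plus one successful retry: two calls for one logical send.
    System.out.println("calls = " + sender.calls.get());
  }
}
```

Running `main` prints `ack:participation-started` followed by `calls = 2`.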

@WillemJiang
Member

WillemJiang commented Mar 16, 2019

The `do ... while` loop is intentional: the sender keeps resending the message until a send completes without an exception.
The test waits at most 2 seconds for the service to restart, so it can fail if the server takes longer than 2 seconds to come back.
I have increased the wait time to 3 seconds so that the server is back up in time, even on a slower machine.
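The fix amounts to giving the restarted server a larger time budget before the test proceeds. A minimal sketch of such a bounded wait follows; `awaitUntil` and the simulated 300 ms restart are hypothetical and are not taken from `TccLoadBalanceSenderTest`, which may wait in a different way.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

public class BoundedWaitSketch {

  // Hypothetical helper: poll `condition` every `pollInterval` until it holds
  // or `timeout` elapses. Returns true only if the condition became true in time.
  static boolean awaitUntil(BooleanSupplier condition, Duration timeout, Duration pollInterval) {
    long deadline = System.nanoTime() + timeout.toNanos();
    while (true) {
      if (condition.getAsBoolean()) {
        return true;
      }
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      try {
        Thread.sleep(pollInterval.toMillis());
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt status
        return false;
      }
    }
  }

  public static void main(String[] args) {
    // Simulated server that becomes ready 300 ms after the "restart".
    long readyAt = System.nanoTime() + Duration.ofMillis(300).toNanos();
    BooleanSupplier serverReady = () -> System.nanoTime() - readyAt >= 0;

    // Budget too small: gives up before the server is back.
    System.out.println(awaitUntil(serverReady, Duration.ofMillis(100), Duration.ofMillis(10)));
    // Larger budget: the server comes back within it.
    System.out.println(awaitUntil(serverReady, Duration.ofSeconds(1), Duration.ofMillis(10)));
  }
}
```

On a normally loaded machine the first call prints `false` and the second prints `true`: the same check fails or passes depending only on the time budget, which is how a slower box can make a fixed-budget test flaky. Polling a readiness condition instead of sleeping for a fixed period also lets a test continue as soon as the server is back.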
