Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NPE is thrown when LoadBalancer returns null server #427

Closed
mikexliu opened this issue Aug 3, 2016 · 5 comments
Closed

NPE is thrown when LoadBalancer returns null server #427

mikexliu opened this issue Aug 3, 2016 · 5 comments
Labels
documentation Issues that require updates to our documentation

Comments

@mikexliu
Copy link
Contributor

mikexliu commented Aug 3, 2016

Cause:
LoadBalancingTarget.java

  @Override
  public Request apply(RequestTemplate input) {
    Server currentServer = lb.chooseServer(null);
    String url = format("%s://%s", scheme, currentServer.getHostPort());
    input.insert(0, url);
    try {
      return input.request();
    } finally {
      lb.getLoadBalancerStats().incrementNumRequests(currentServer);
    }
  }

BaseLoadBalancer

public Server chooseServer(Object key) {
        if(this.counter == null) {
            this.counter = this.createCounter();
        }

        this.counter.increment();
        if(this.rule == null) {
            return null;
        } else {
            try {
                return this.rule.choose(key);
            } catch (Throwable var3) {
                return null;
            }
        }
    }

SynchronousMethodHandler.java

  @Override
  public Object invoke(Object[] argv) throws Throwable {
    RequestTemplate template = buildTemplateFromArgs.create(argv);
    Retryer retryer = this.retryer.clone();
    while (true) {
      try {
        return executeAndDecode(template);
      } catch (RetryableException e) {
        retryer.continueOrPropagate(e);
        if (logLevel != Logger.Level.NONE) {
          logger.logRetry(metadata.configKey(), logLevel);
        }
        continue;
      }
    }
  }

If lb.chooseServer(null); returns null, then an NPE is thrown and it cannot be caught by the retryer, since it can only catch RetryableException.

Stacktrace:

11:45:32.688 [hystrix-distributed-service-1] WARN com.netflix.loadbalancer.RoundRobinRule - No up servers available from load balancer: DynamicServerListLoadBalancer:{NFLoadBalancer:name=distributed-service,current list of Servers=[],Load balancer stats=Zone stats: {},Server stats: []}ServerList:com.netflix.loadbalancer.ConfigurationBasedServerList@2892d68
11:45:32.691 [hystrix-distributed-service-1] DEBUG com.netflix.hystrix.AbstractCommand - Error executing HystrixCommand.run(). Proceeding to fallback logic ...
java.lang.NullPointerException: null
    at feign.ribbon.LoadBalancingTarget.apply(LoadBalancingTarget.java:95)
    at feign.SynchronousMethodHandler.targetRequest(SynchronousMethodHandler.java:156)
    at feign.SynchronousMethodHandler.executeAndDecode(SynchronousMethodHandler.java:88)
    at feign.SynchronousMethodHandler.invoke(SynchronousMethodHandler.java:76)
    at feign.hystrix.HystrixInvocationHandler$1.run(HystrixInvocationHandler.java:98)
    at com.netflix.hystrix.HystrixCommand$1.call(HystrixCommand.java:293)
    at com.netflix.hystrix.HystrixCommand$1.call(HystrixCommand.java:289)
    at rx.internal.operators.OnSubscribeDefer.call(OnSubscribeDefer.java:46)
    at rx.internal.operators.OnSubscribeDefer.call(OnSubscribeDefer.java:35)
    at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:50)
    at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:30)
    at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:50)
    at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:30)
    at rx.Observable.unsafeSubscribe(Observable.java:8460)
    at rx.internal.operators.OnSubscribeDefer.call(OnSubscribeDefer.java:51)
    at rx.internal.operators.OnSubscribeDefer.call(OnSubscribeDefer.java:35)
    at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:50)
    at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:30)
    at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:50)
    at rx.internal.operators.OnSubscribeLift.call(OnSubscribeLift.java:30)
    at rx.Observable.unsafeSubscribe(Observable.java:8460)
    at rx.internal.operators.OperatorSubscribeOn$1.call(OperatorSubscribeOn.java:94)
    at com.netflix.hystrix.strategy.concurrency.HystrixContexSchedulerAction$1.call(HystrixContexSchedulerAction.java:56)
    at com.netflix.hystrix.strategy.concurrency.HystrixContexSchedulerAction$1.call(HystrixContexSchedulerAction.java:47)
    at com.netflix.hystrix.strategy.concurrency.HystrixContexSchedulerAction.call(HystrixContexSchedulerAction.java:69)
    at rx.internal.schedulers.ScheduledAction.run(ScheduledAction.java:55)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
@codefromthecrypt
Copy link
Contributor

codefromthecrypt commented Aug 3, 2016 via email

@mikexliu
Copy link
Contributor Author

mikexliu commented Aug 3, 2016

@codefromthecrypt
Copy link
Contributor

yep open a bug in ribbon:

ILoadBalancer.chooseServer() needs to document how we should interpret a null result. Otherwise, we could retry forever on a bug that isn't transient.

@kdavisk6
Copy link
Member

Looking over the latest version of Ribbon, while it does not say that chooseServer() can return null, it is possible for that to occur, especially if the Round Robin is unable to find a server to use. It does also state in RoundRobinRule that a null should be transient.

https://github.com/Netflix/ribbon/blob/1a4fbcc99da3e376c3e17469bc25caa6cf91fa4f/ribbon-loadbalancer/src/main/java/com/netflix/loadbalancer/RoundRobinRule.java#L51

From my perspective, I think it is acceptable for us to catch this scenario and throw a RetryableException. We may run into an issue if the specified Retryer tries forever, but I think as long as we document that, we should be ok. Thoughts?

@kdavisk6 kdavisk6 added documentation Issues that require updates to our documentation and removed needs info Information is either missing or incomplete. labels May 28, 2019
@kdavisk6
Copy link
Member

Closing due to inactivity.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Issues that require updates to our documentation
Projects
None yet
Development

No branches or pull requests

3 participants