Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Threads stuck when jersey client with apache connector is used concurrently #3772

Closed
jerseyrobot opened this issue Feb 15, 2018 · 11 comments

Comments

@jerseyrobot
Copy link
Collaborator

commented Feb 15, 2018

Description

Using jersey client with apache connector from multiple threads concurrently results in threads that wait forever

How to reproduce

Run the following code

import static javax.ws.rs.HttpMethod.GET;
import static javax.ws.rs.core.MediaType.APPLICATION_JSON_TYPE;

import java.net.URI;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;

import javax.ws.rs.client.Client;
import javax.ws.rs.client.ClientBuilder;
import javax.ws.rs.client.Invocation;
import javax.ws.rs.client.WebTarget;
import javax.ws.rs.core.GenericType;

import org.apache.http.client.config.RequestConfig;
import org.apache.http.config.Registry;
import org.apache.http.config.RegistryBuilder;
import org.apache.http.conn.socket.ConnectionSocketFactory;
import org.apache.http.conn.socket.PlainConnectionSocketFactory;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;
import org.glassfish.jersey.apache.connector.ApacheClientProperties;
import org.glassfish.jersey.apache.connector.ApacheConnectorProvider;
import org.glassfish.jersey.client.ClientConfig;
import org.glassfish.jersey.jackson.JacksonFeature;

import com.spotify.docker.client.LogsResponseReader;
import com.spotify.docker.client.ObjectMapperProvider;
import com.spotify.docker.client.ProgressResponseReader;
import com.spotify.docker.client.UnixConnectionSocketFactory;
import com.spotify.docker.client.messages.Container;
import com.spotify.docker.client.messages.ContainerInfo;

public class Jersey {

  public static void main(String[] args) throws InterruptedException {

    final URI originalUri = URI.create("unix:///var/run/docker.sock");
    final URI uri = UnixConnectionSocketFactory.sanitizeUri(originalUri);

    final Registry<ConnectionSocketFactory> registry = RegistryBuilder
        .<ConnectionSocketFactory>create()
        .register("http", PlainConnectionSocketFactory.getSocketFactory())
        .register("unix", new UnixConnectionSocketFactory(originalUri))
        .build();

    final PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager(registry);
    cm.setMaxTotal(100);
    cm.setDefaultMaxPerRoute(100);

    final RequestConfig requestConfig = RequestConfig.custom()
        .setConnectionRequestTimeout(30000)
        .setConnectTimeout(30000)
        .setSocketTimeout(5000)
        .build();

    final ClientConfig config = new ClientConfig(
        ObjectMapperProvider.class,
        JacksonFeature.class,
        LogsResponseReader.class,
        ProgressResponseReader.class)
        .connectorProvider(new ApacheConnectorProvider())
        .property(ApacheClientProperties.CONNECTION_MANAGER, cm)
        .property(ApacheClientProperties.REQUEST_CONFIG, requestConfig);

    final Client client = ClientBuilder.newBuilder()
        .withConfig(config)
        .build();

    final int threadCount = 100;

    final CountDownLatch startLatch = new CountDownLatch(threadCount);
    final CountDownLatch endLatch = new CountDownLatch(threadCount);
    for (int i = 0; i < threadCount; ++i) {
      new Thread(new Worker(i, client, uri, startLatch, endLatch)).start();
    }
    endLatch.await();
    System.out.println("All done");

    client.close();
  }

  static class Worker implements Runnable {
    private final Client client;
    private final CountDownLatch myEndLatch;
    private final URI uri;
    private final CountDownLatch myStartLatch;
    private final int myI;
    private static final GenericType<List<Container>> CONTAINER_LIST =
        new GenericType<List<Container>>() {
        };

    Worker(final int i,
           final Client client,
           final URI uri,
           final CountDownLatch startLatch,
           final CountDownLatch endLatch) {
      myI = i;
      this.client = client;
      this.uri = uri;
      myStartLatch = startLatch;
      myEndLatch = endLatch;
    }

    public void run() {

      try {
        myStartLatch.countDown();
        myStartLatch.await();
        for (int i = 0; i < 10; i++) {
          List<Container> containers = listContainers();
          for (Container container : containers) {
            // These two methods do the same thing - the first one uses ContainerInfo as result
            // type, while the second one uses String as result type. When using the first one the threads
            // will get stuck, while with the second one program works as expected
            System.out.println(inspectContainer(container.id()));
//            inspectContainerReturnString(container.id());
          }
          ping();
        }
      } catch (final Throwable e) {
        e.printStackTrace();
      } finally {
        System.out.println("done " + myI);
        myEndLatch.countDown();
      }
    }

    List<Container> listContainers() throws InterruptedException {
      final WebTarget resource = client.target(uri).path("v1.27").path("containers").path("json").queryParam("all", 1);

      return request(GET, CONTAINER_LIST, resource.request(APPLICATION_JSON_TYPE));
    }

    ContainerInfo inspectContainer(String containerId) throws InterruptedException {
      final WebTarget resource = client.target(uri).path("v1.27").path("containers").path(containerId).path("json");
      return request(GET, ContainerInfo.class, resource.request(APPLICATION_JSON_TYPE));
    }

    String inspectContainerReturnString(String containerId) throws InterruptedException {
      final WebTarget resource = client.target(uri).path("v1.27").path("containers").path(containerId).path("json");
      return request(GET, String.class, resource.request(APPLICATION_JSON_TYPE));
    }

    String ping() throws InterruptedException {
      final WebTarget resource = client.target(uri).path("v1.27").path("_ping");
      return request(GET, String.class, resource.request(APPLICATION_JSON_TYPE));
    }

    private <T> T request(final String method, final Class<T> clazz, final Invocation.Builder request)
        throws InterruptedException {
      try {
        return request.async().method(method, clazz).get();
      } catch (ExecutionException e) {
        throw new RuntimeException(e);
      }
    }

    private <T> T request(final String method, final GenericType<T> type, final Invocation.Builder request)
        throws InterruptedException {
      try {
        return request.async().method(method, type).get();
      } catch (ExecutionException e) {
        throw new RuntimeException(e);
      }
    }
  }
}

With this maven pom.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>docker-client-debug</groupId>
  <artifactId>docker-client-debug</artifactId>
  <version>1.0-SNAPSHOT</version>

  <properties>
    <jersey.version>2.26</jersey.version>
  </properties>
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <configuration>
          <source>1.8</source>
          <target>1.8</target>
        </configuration>
      </plugin>
    </plugins>
  </build>

  <dependencies>
    <dependency>
      <groupId>com.spotify</groupId>
      <artifactId>docker-client</artifactId>
      <version>8.11.1</version>
    </dependency>
    <dependency>
      <groupId>org.glassfish.jersey.core</groupId>
      <artifactId>jersey-client</artifactId>
      <version>${jersey.version}</version>
    </dependency>
    <dependency>
      <groupId>org.glassfish.jersey.containers</groupId>
      <artifactId>jersey-container-servlet</artifactId>
      <version>${jersey.version}</version>
    </dependency>
    <dependency>
      <groupId>org.glassfish.jersey.connectors</groupId>
      <artifactId>jersey-apache-connector</artifactId>
      <version>${jersey.version}</version>
    </dependency>
    <dependency>
      <groupId>org.glassfish.jersey.inject</groupId>
      <artifactId>jersey-hk2</artifactId>
      <version>${jersey.version}</version>
    </dependency>
  </dependencies>
</project>

Expected result

"All done" is written in the console and (optionally, see notes below) the program finishes

Actual result

One of the threads gets stuck with the following thread dump output

"Thread-1" #11 prio=5 os_prio=31 tid=0x00007f992a19b000 nid=0x3f03 waiting on condition [0x00007000023fc000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x0000000786f7cda0> (a java.util.concurrent.CompletableFuture$Signaller)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
	at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
	at java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
	at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
	at Jersey$Worker.request(Jersey.java:152)
	at Jersey$Worker.inspectContainer(Jersey.java:136)
	at Jersey$Worker.run(Jersey.java:115)
	at java.lang.Thread.run(Thread.java:748)

Prerequisites

  • docker (any version)
  • A few containers shoud be present (can be in stopped state)
docker run busybox /bin/true
docker run busybox /bin/true
docker run busybox /bin/true

Notes

  • There should be at least a few docker containers present on the machine
  • Replacing inspectContainer call with inspectContainerReturnString will make the program run as expected, without threads getting stuck
  • I managed to reproduce this with ApacheConnector, communicating over UNIX socket with docker daemon (this is how communication is performed with docker daemon in general)
  • I couldn't reproduce this with regular http calls, i.e. by using final Client client = ClientBuilder.newBuilder().build(); and communicating with a dummy http server that returns over http the same responses that daemon would return over UNIX socket
  • This can be reproduced with jersey versions >= 2.22.2
  • Jersey 2.22.1 does not cause this issue, although it suffers from JERSEY-2967 which prevents the program from finishing, but it outputs "All done" in the test case and there is no deadlock mentioned above
  • Changes in jersey/jersey@b5b9fb6 could be relevant, but it's just a guess, maybe it's not related to apache connector at all
  • Maybe this is the same as #3558
  • The above thread dump looks slightly different when using other jersey versions, e.g. with 2.22.2 it looks like
"Thread-93" #103 prio=5 os_prio=31 tid=0x00007f9387a36800 nid=0xd203 waiting on condition [0x000070000aba2000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x0000000788278e88> (a jersey.repackaged.com.google.common.util.concurrent.AbstractFuture$Sync)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
	at jersey.repackaged.com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:285)
	at jersey.repackaged.com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
	at Jersey$Worker.request(Jersey.java:152)
	at Jersey$Worker.inspectContainer(Jersey.java:136)
	at Jersey$Worker.run(Jersey.java:115)
	at java.lang.Thread.run(Thread.java:748)
@jerseyrobot

This comment has been minimized.

Copy link
Collaborator Author

commented Apr 20, 2018

@itaibh

This comment has been minimized.

Copy link

commented Aug 29, 2018

Happens here also, and we don't use docker, just TestNG.

We use TestNG to test our SDK which have multiple places it calls Jersey 2.x to perform various server access tasks. Once we added a new feature (which communicates with the server using POST commands), the tests passed when I only ran one at a time, and hanged when I ran the entire suite.

We narrowed it down to one of the new POST commands we added. We saw that removing it completely made the parallel tests pass, and duplicating it made a single test hang. So now we had a way of reproducing it.
However, it doesn't hang there - it hangs in the next POST down the line. We added a lot of logging and also used Fiddler to determine what's going on - we get the results of the two consecutive POST commands and parse them, then we have some more logic. Using 'step-into' we see that the Jersey threads are still active and we can see the place where invoke in called, and then it hangs.

This is the stacktrace:


"main" #1 prio=5 os_prio=0 tid=0x0000000002fa9000 nid=0x55bc waiting on condition [0x0000000002f9c000]
   java.lang.Thread.State: WAITING (parking)
	at sun.misc.Unsafe.park(Native Method)
	- parking to wait for  <0x000000076f6672a0> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
	at org.apache.http.pool.AbstractConnPool.getPoolEntryBlocking(AbstractConnPool.java:380)
	at org.apache.http.pool.AbstractConnPool.access$200(AbstractConnPool.java:69)
	at org.apache.http.pool.AbstractConnPool$2.get(AbstractConnPool.java:246)
	- locked <0x000000076cae6c00> (a org.apache.http.pool.AbstractConnPool$2)
	at org.apache.http.pool.AbstractConnPool$2.get(AbstractConnPool.java:193)
	at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.leaseConnection(PoolingHttpClientConnectionManager.java:303)
	at org.apache.http.impl.conn.PoolingHttpClientConnectionManager$1.get(PoolingHttpClientConnectionManager.java:279)
	at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:191)
	at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
	at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
	at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:111)
	at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
	at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:72)
	at org.glassfish.jersey.apache.connector.ApacheConnector.apply(ApacheConnector.java:435)
	at org.glassfish.jersey.client.ClientRuntime.invoke(ClientRuntime.java:252)
	at org.glassfish.jersey.client.JerseyInvocation$1.call(JerseyInvocation.java:684)
	at org.glassfish.jersey.client.JerseyInvocation$1.call(JerseyInvocation.java:681)
	at org.glassfish.jersey.internal.Errors.process(Errors.java:315)
	at org.glassfish.jersey.internal.Errors.process(Errors.java:297)
	at org.glassfish.jersey.internal.Errors.process(Errors.java:228)
	at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:444)
	at org.glassfish.jersey.client.JerseyInvocation.invoke(JerseyInvocation.java:681)        <<==== This is as far as we stepped into.
	at org.glassfish.jersey.client.JerseyInvocation$Builder.method(JerseyInvocation.java:437)
	at org.glassfish.jersey.client.JerseyInvocation$Builder.post(JerseyInvocation.java:343)
	at com.applitools.eyes.ServerConnector.matchWindow(ServerConnector.java:317)
	at com.applitools.eyes.MatchWindowTask.performMatch(MatchWindowTask.java:117)
	at com.applitools.eyes.MatchWindowTask.tryTakeScreenshot(MatchWindowTask.java:300)
	at com.applitools.eyes.MatchWindowTask.retryTakingScreenshot(MatchWindowTask.java:276)
	at com.applitools.eyes.MatchWindowTask.takeScreenshot(MatchWindowTask.java:250)
	at com.applitools.eyes.MatchWindowTask.matchWindow(MatchWindowTask.java:147)
	at com.applitools.eyes.EyesBase.matchWindow(EyesBase.java:1177)
	at com.applitools.eyes.EyesBase.checkWindowBase(EyesBase.java:1117)
	at com.applitools.eyes.selenium.Eyes.checkRegion(Eyes.java:1193)
	at com.applitools.eyes.selenium.Eyes.check(Eyes.java:913)
	at com.applitools.eyes.selenium.Eyes.check(Eyes.java:853)
	at com.applitools.eyes.selenium.TestFluentApi.TestCheckElementWithIgnoreRegionByElementOutsideTheViewport_Fluent(TestFluentApi.java:162)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:85)
	at org.testng.internal.Invoker.invokeMethod(Invoker.java:639)
	at org.testng.internal.Invoker.invokeTestMethod(Invoker.java:816)
	at org.testng.internal.Invoker.invokeTestMethods(Invoker.java:1124)
	at org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:125)
	at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:108)
	at org.testng.TestRunner.privateRun(TestRunner.java:774)
	at org.testng.TestRunner.run(TestRunner.java:624)
	at org.testng.SuiteRunner.runTest(SuiteRunner.java:359)
	at org.testng.SuiteRunner.runSequentially(SuiteRunner.java:354)
	at org.testng.SuiteRunner.privateRun(SuiteRunner.java:312)
	at org.testng.SuiteRunner.run(SuiteRunner.java:261)
	at org.testng.SuiteRunnerWorker.runSuite(SuiteRunnerWorker.java:52)
	at org.testng.SuiteRunnerWorker.run(SuiteRunnerWorker.java:86)
	at org.testng.TestNG.runSuitesSequentially(TestNG.java:1191)
	at org.testng.TestNG.runSuitesLocally(TestNG.java:1116)
	at org.testng.TestNG.run(TestNG.java:1024)
	at org.testng.IDEARemoteTestNG.run(IDEARemoteTestNG.java:72)
	at org.testng.RemoteTestNGStarter.main(RemoteTestNGStarter.java:123)
@itaibh

This comment has been minimized.

Copy link

commented Aug 29, 2018

P.S. Trying to upgrade to the latest version and even one before (2.26, 2.27) of Jersey failed with

java.lang.IllegalStateException: InjectionManagerFactory not found.
@jansupol

This comment has been minimized.

Copy link
Collaborator

commented Aug 29, 2018

since jersey 2.26 it is needed to add org.glassfish.jersey.inject:jersey-hk2 to the depenecies

@itaibh

This comment has been minimized.

Copy link

commented Aug 30, 2018

I tried that, it indeed solved the InjectionManagerFactory not found issue, but the original problem still exists...

@senivam

This comment has been minimized.

Copy link
Collaborator

commented Aug 30, 2018

As I'm looking at this issue, it seems the problem lies beneath Jersey framework. Trying to reproduce the issue I tried described steps and it really hangs. However it hangs on waiting results of the completable feature. As per specification the get() method of the completable feature waits until the feature is completed. And it shall be completed as soon as the response is received. However response never comes. What says thread dump is that thread is waiting on SessionInputBufferImpl.streamRead (and some steps below). Which is correct. But response never comes. so feature never gets completed thus it looks like everything is hanged. Jersey layer is just a mediator here so it does not cause hanging it just waits for the feature completion as well.

@daniloradenovic

This comment has been minimized.

Copy link

commented Aug 30, 2018

That is a valid point with waiting for completion of a call to an external service (in this case Docker daemon). However, it's strange that the example from reproduction steps works fine with versions of Jersey prior to 2.22.2.

@senivam

This comment has been minimized.

Copy link
Collaborator

commented Aug 31, 2018

well yes, I can confirm that. The issue happens since Jersey 2.22.2. Further more I've localized the reason why it happens and will try to fix that. The issue is caused by wrapping of the response input stream together with response instance itself in attempt to close both stream and response (which does not seam to be a good idea). This causes hanging.

senivam added a commit to senivam/jersey that referenced this issue Sep 5, 2018

fix for issue eclipse-ee4j#3772
Signed-off-by: Maxim Nesen <maxim.nesen@oracle.com>
@jansupol

This comment has been minimized.

Copy link
Collaborator

commented Sep 5, 2018

This seems to be fixed by #3861

@daniloradenovic

This comment has been minimized.

Copy link

commented Oct 14, 2018

I've just tried the changes from #3861 and it is working correctly.

@jansupol

This comment has been minimized.

Copy link
Collaborator

commented Oct 30, 2018

#3861 is merged to 2.29.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
5 participants
You can’t perform that action at this time.