Thread leak if there's no internet connection #1144

@mattias-berglund

If you run ably-java:1.2.54 on a system with a flaky internet connection, ably-java leaks WebSocket-related threads under certain circumstances. Here's a sample program that demonstrates the behavior: it sets up an AblyRealtime instance and logs the number of WebSocket-related threads on each connection-state change.

package se.tremil.ablyleak;

import io.ably.lib.realtime.AblyRealtime;
import io.ably.lib.realtime.Channel;
import io.ably.lib.types.ClientOptions;
import io.ably.lib.types.ErrorInfo;
import io.ably.lib.util.Log;
import java.util.Optional;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class Ablyleak {
    public static void main(String[] args) throws Exception {
        final Logger logger = LoggerFactory.getLogger(Ablyleak.class);

        final ClientOptions clientOptions = new ClientOptions();
        clientOptions.clientId = "********";
        clientOptions.key = "********";
        clientOptions.echoMessages = false;
        clientOptions.logLevel = Log.VERBOSE;
        
        final AblyRealtime ably = new AblyRealtime(clientOptions);
        
        ably.connection.on(csc -> {
            final var webSocketCount = Thread.getAllStackTraces().keySet().stream().map(Thread::getName).filter(s -> s.startsWith("WebSocket")).count();
            final Optional<ErrorInfo> optionalReason = Optional.ofNullable(csc.reason);
            final String errorCode = optionalReason.map(e -> e.code).map(Object::toString).orElse("<n/a>");
            final String errorStatus = optionalReason.map(e -> e.statusCode).map(Object::toString).orElse("<n/a>");
            
            logger.info(String.format("connection state changed from %s to %s, web socket count: %d (code=%s, statusCode=%s)", csc.previous, csc.current, webSocketCount, errorCode, errorStatus));
        });

        final Channel channel = ably.channels.get("remote-punches");
        channel.attach();
    }
}

Running this on a system connected to a Wi-Fi network without an internet connection produces something like this:

19:46:00.280 [Thread-0] INFO  (Ablyleak.java:30) - connection state changed from connecting to disconnected, web socket count: 0 (code=80000, statusCode=503)
19:46:13.646 [Thread-0] INFO  (Ablyleak.java:30) - connection state changed from disconnected to connecting, web socket count: 0 (code=<n/a>, statusCode=<n/a>)
... 
19:48:09.223 [Thread-0] INFO  (Ablyleak.java:30) - connection state changed from connecting to suspended, web socket count: 0 (code=80000, statusCode=503)
19:48:39.229 [Thread-0] INFO  (Ablyleak.java:30) - connection state changed from suspended to connecting, web socket count: 0 (code=<n/a>, statusCode=<n/a>)
...
20:00:07.422 [Thread-0] INFO  (Ablyleak.java:30) - connection state changed from connecting to suspended, web socket count: 1 (code=80000, statusCode=503)
20:17:38.329 [Thread-0] INFO  (Ablyleak.java:30) - connection state changed from suspended to connecting, web socket count: 0 (code=<n/a>, statusCode=<n/a>)
20:17:53.342 [Thread-0] INFO  (Ablyleak.java:30) - connection state changed from connecting to suspended, web socket count: 2 (code=80002, statusCode=503)
20:18:23.353 [Thread-0] INFO  (Ablyleak.java:30) - connection state changed from suspended to connecting, web socket count: 2 (code=<n/a>, statusCode=<n/a>)
20:18:38.364 [Thread-0] INFO  (Ablyleak.java:30) - connection state changed from connecting to suspended, web socket count: 4 (code=80002, statusCode=503)
20:19:08.375 [Thread-0] INFO  (Ablyleak.java:30) - connection state changed from suspended to connecting, web socket count: 4 (code=<n/a>, statusCode=<n/a>)
20:19:23.388 [Thread-0] INFO  (Ablyleak.java:30) - connection state changed from connecting to suspended, web socket count: 6 (code=80002, statusCode=503)
20:19:53.394 [Thread-0] INFO  (Ablyleak.java:30) - connection state changed from suspended to connecting, web socket count: 6 (code=<n/a>, statusCode=<n/a>)

After a while, when the ErrorInfo.code changes from 80000 to 80002, two WebSocket threads are leaked on every attempt to restore the connection. This eventually leads to an OutOfMemoryError, and the app fails.
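
For anyone who wants to check the counting technique on its own, here is a minimal sketch that uses the same `Thread.getAllStackTraces()` approach as the sample program, without any ably-java dependency. The class name `ThreadLeakProbe`, the method names, and the `LeakProbeDemo` prefix are all hypothetical and chosen only for illustration. It starts one parked thread with a known name and counts matching threads before, during, and after its lifetime.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical helper, not part of ably-java: counts live threads by name prefix.
final class ThreadLeakProbe {

    /** Counts live threads whose name starts with the given prefix. */
    static long countThreadsWithPrefix(String prefix) {
        return Thread.getAllStackTraces().keySet().stream()
                .map(Thread::getName)
                .filter(name -> name.startsWith(prefix))
                .count();
    }

    /**
     * Starts one parked thread named "<prefix>-1" and returns the counts
     * observed before it starts, while it is running, and after it exits.
     */
    static long[] runDemo(String prefix) {
        final CountDownLatch started = new CountDownLatch(1);
        final CountDownLatch release = new CountDownLatch(1);
        final long before = countThreadsWithPrefix(prefix);

        final Thread worker = new Thread(() -> {
            started.countDown(); // signal that the thread is running
            try {
                release.await(); // park until the main thread releases us
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, prefix + "-1");
        worker.setDaemon(true);
        worker.start();

        long during = -1;
        try {
            started.await(); // make sure the worker is running before counting
            during = countThreadsWithPrefix(prefix);
            release.countDown();
            worker.join(); // wait for the worker to terminate
        } catch (InterruptedException e) {
            release.countDown(); // still let the worker exit
            Thread.currentThread().interrupt();
        }
        final long after = countThreadsWithPrefix(prefix);
        return new long[] {before, during, after};
    }

    public static void main(String[] args) {
        final long[] counts = runDemo("LeakProbeDemo");
        System.out.printf("before=%d, during=%d, after=%d%n",
                counts[0], counts[1], counts[2]);
    }
}
```

On a JVM where no other thread uses that prefix, the three counts should be 0, 1, and 0. A count that keeps growing across reconnection attempts, as in the log above, means the threads never terminated.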

I'm attaching a detailed log file.

I ran the test program on macOS 15.6 with Java 24, and observed the same behavior on Raspberry Pi OS 12.

detailed-leak-log.txt
