
too many (100k) concurrent queries produce errors #1731

Open
isopov opened this issue Oct 25, 2023 · 3 comments


isopov commented Oct 25, 2023

Please answer these questions before submitting your issue. Thanks!

What version of Cassandra are you using?

4.1.3 or 5.0

What version of Gocql are you using?

1.6.0

What version of Go are you using?

1.21.1

What did you do?
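The reproduction program itself is not included here; the following is a minimal sketch of an equivalent gocql reproduction, assuming the same pargettest/test schema as the Java version further down and the workers/queries constants mentioned in the note below. The names and exact query shapes are assumptions, not the original code.

package main

import (
	"fmt"
	"log"
	"sync"

	"github.com/gocql/gocql"
)

const (
	workers = 100_000
	queries = 100
)

func main() {
	cluster := gocql.NewCluster("127.0.0.1")
	session, err := cluster.CreateSession()
	if err != nil {
		log.Fatal(err)
	}
	defer session.Close()

	// Same schema as in the Java reproduction below.
	for _, stmt := range []string{
		"drop keyspace if exists pargettest",
		"create keyspace pargettest with replication = {'class' : 'SimpleStrategy', 'replication_factor' : 1}",
		"create table pargettest.test (a text, b int, primary key(a))",
		"insert into pargettest.test (a, b) values ('a', 1)",
	} {
		if err := session.Query(stmt).Exec(); err != nil {
			log.Fatal(err)
		}
	}

	// Start `workers` goroutines, each running `queries` sequential selects.
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < queries; j++ {
				var b int
				if err := session.Query("select b from pargettest.test where a = ?", "a").Scan(&b); err != nil {
					fmt.Println(err) // e.g. "gocql: no streams available on connection"
				}
			}
		}()
	}
	wg.Wait()
	fmt.Println("All is done!")
}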

What did you expect to see?

I expect the program to finish successfully.

What did you see instead?

I see errors:

gocql: no hosts available in the pool
gocql: no streams available on connection
gocql: no hosts available in the pool
gocql: no streams available on connection
gocql: no hosts available in the pool
gocql: no hosts available in the pool

or

gocql: no hosts available in the pool
gocql: no hosts available in the pool

Note: lowering the workers const to 50_000 lets the program finish gracefully (increasing the queries const is also possible; the program then runs for minutes instead of seconds, but still finishes gracefully with 50_000 workers).
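As an aside, one knob that may affect how quickly streams run out is gocql's per-host connection count, since each connection multiplexes requests onto a limited pool of stream IDs. A minimal sketch using ClusterConfig.NumConns; the value shown is illustrative, not a recommendation:

cluster := gocql.NewCluster("127.0.0.1")
// More connections per host means more stream IDs to share before
// "gocql: no streams available on connection" is returned.
cluster.NumConns = 8 // gocql's default is 2; 8 is purely illustrative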


isopov commented Oct 25, 2023

It seems that this issue is not a bug, since the Java driver behaves pretty much the same:

import com.datastax.oss.driver.api.core.CqlSession;

import java.net.InetSocketAddress;
import java.util.concurrent.Executors;

public class Main {

    public static final int WORKERS = 2_000;
    public static final int QUERIES = 100;

    public static void main(String[] args) {
        try (var session = CqlSession.builder()
                .addContactPoint(new InetSocketAddress(9042))
                .withLocalDatacenter("datacenter1")
                .build()) {

            session.execute("drop keyspace if exists pargettest");
            session.execute("create keyspace pargettest with replication = {'class' : 'SimpleStrategy', 'replication_factor' : 1}");
            session.execute("use pargettest");
            session.execute("drop table if exists test");
            session.execute("create table test (a text, b int, primary key(a))");
            session.execute("insert into test (a, b) values ( 'a', 1)");

            try (var executor = Executors.newVirtualThreadPerTaskExecutor()) {
                for (int i = 0; i < WORKERS; i++) {
                    executor.submit(() -> {
                        try {
                            for (int j = 0; j < QUERIES; j++) {
                                var res = session.execute("select * from test where a=?", "a");
                                if (res.all().size() != 1) {
                                    System.out.println("WAT!");
                                }
                            }
                        } catch (Exception e) {
                            e.printStackTrace();
                            throw e;
                        }
                    });
                }
            }
        }
        System.out.println("All is done!");
    }
}

1_000 workers are handled gracefully, while 2_000 give errors like

com.datastax.oss.driver.api.core.AllNodesFailedException: All 1 node(s) tried for the query failed (showing first 1 nodes, use getAllErrors() for more): Node(endPoint=0.0.0.0/0.0.0.0:9042, hostId=81c1988d-3075-430a-9177-93c37ebd2b0b, hashCode=697b9b84): [com.datastax.oss.driver.api.core.NodeUnavailableException: No connection was available to Node(endPoint=0.0.0.0/0.0.0.0:9042, hostId=81c1988d-3075-430a-9177-93c37ebd2b0b, hashCode=697b9b84)]
	at com.datastax.oss.driver.api.core.AllNodesFailedException.copy(AllNodesFailedException.java:141)
	at com.datastax.oss.driver.internal.core.util.concurrent.CompletableFutures.getUninterruptibly(CompletableFutures.java:149)
	at com.datastax.oss.driver.internal.core.cql.CqlRequestSyncProcessor.process(CqlRequestSyncProcessor.java:53)
	at com.datastax.oss.driver.internal.core.cql.CqlRequestSyncProcessor.process(CqlRequestSyncProcessor.java:30)
	at com.datastax.oss.driver.internal.core.session.DefaultSession.execute(DefaultSession.java:230)
	at com.datastax.oss.driver.api.core.cql.SyncCqlSession.execute(SyncCqlSession.java:54)
	at com.datastax.oss.driver.api.core.cql.SyncCqlSession.execute(SyncCqlSession.java:104)
	at io.github.isopov.cassandra.Main.lambda$main$0(Main.java:31)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
	at java.base/java.lang.VirtualThread.run(VirtualThread.java:309)
	Suppressed: com.datastax.oss.driver.api.core.NodeUnavailableException: No connection was available to Node(endPoint=0.0.0.0/0.0.0.0:9042, hostId=81c1988d-3075-430a-9177-93c37ebd2b0b, hashCode=697b9b84)
		at com.datastax.oss.driver.internal.core.cql.CqlRequestHandler.sendRequest(CqlRequestHandler.java:256)
		at com.datastax.oss.driver.internal.core.cql.CqlRequestHandler.onThrottleReady(CqlRequestHandler.java:195)
		at com.datastax.oss.driver.internal.core.session.throttling.PassThroughRequestThrottler.register(PassThroughRequestThrottler.java:52)
		at com.datastax.oss.driver.internal.core.cql.CqlRequestHandler.<init>(CqlRequestHandler.java:172)
		at com.datastax.oss.driver.internal.core.cql.CqlRequestAsyncProcessor.process(CqlRequestAsyncProcessor.java:44)
		at com.datastax.oss.driver.internal.core.cql.CqlRequestSyncProcessor.process(CqlRequestSyncProcessor.java:54)


isopov commented Oct 25, 2023

However, with the Java driver, adding the following config to the CqlSession (passed to the builder via withConfigLoader)

        DriverConfigLoader loader =
                DriverConfigLoader.programmaticBuilder()
                        .withClass(DefaultDriverOption.REQUEST_THROTTLER_CLASS, ConcurrencyLimitingRequestThrottler.class)
                        .withInt(DefaultDriverOption.REQUEST_THROTTLER_MAX_CONCURRENT_REQUESTS, 500)
                        .withInt(DefaultDriverOption.REQUEST_THROTTLER_MAX_QUEUE_SIZE, 100_000)
                        .build();

allows it to sustain heavy load from client code. So this issue may not be a bug, but rather a feature request for a similar throttler in gocql.
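gocql has no built-in equivalent of that throttler today (which is what this request is about), but the same effect can be approximated in client code with a counting semaphore around each query. A minimal sketch; the helper name and the limit mirroring REQUEST_THROTTLER_MAX_CONCURRENT_REQUESTS above are illustrative:

package throttle

import "github.com/gocql/gocql"

// maxInFlight mirrors REQUEST_THROTTLER_MAX_CONCURRENT_REQUESTS from the Java
// config above; the value is illustrative.
const maxInFlight = 500

// sem is a counting semaphore implemented as a buffered channel: acquiring a
// slot blocks once maxInFlight queries are already in flight.
var sem = make(chan struct{}, maxInFlight)

// queryThrottled is a hypothetical helper that wraps one select in the
// semaphore, so callers can spawn far more goroutines than the cluster has
// streams without hitting "no streams available on connection".
func queryThrottled(session *gocql.Session, a string) (int, error) {
	sem <- struct{}{}        // acquire a slot
	defer func() { <-sem }() // release it when the query completes

	var b int
	err := session.Query("select b from pargettest.test where a = ?", a).Scan(&b)
	return b, err
}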

@RostislavPorohnya

I would like to handle this issue; I'm working on it.
