Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DRILL-7982: Timed out waiting for container port to open #2287

Merged
merged 1 commit into from
Aug 8, 2021

Conversation

luocooong
Copy link
Member

DRILL-7982: Timed out waiting for container port to open

Description

If the default 60s timeout is not sufficient, it can be altered with the withStartupTimeout() method.

See also Waiting for containers to start or be ready

Then, the testcontainers has merged new feature about the For faster startup.

See also Tune Cassandra options for faster startup

Documentation

N/A

Testing

Need the unit tests can run 3 times without the Timed out waiting for XXX.

@luocooong luocooong self-assigned this Aug 5, 2021
Copy link
Member

@vdiravka vdiravka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @luocooong I have already added fix for that n scope of DRILL-7973.
in the following commit.

But since it is not only the one issue related to CI I didn't publish PR yet. I am going to fix CI failures in scope of that ticket.

.withEnv("CASSANDRA_SNITCH", "GossipingPropertyFileSnitch") // Tune Cassandra options for faster startup
.withEnv("JVM_OPTS", "-Dcassandra.skip_wait_for_gossip_to_settle=0 -Dcassandra.initial_token=0")
.withEnv("HEAP_NEWSIZE", "128M")
.withEnv("MAX_HEAP_SIZE", "1024M");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have not much memory on Github CI. Do we need this additional params for our test container? What about Splunk test container then?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vdiravka
The Cassandra tests seem to keep timing out and breaking the CI. My recollection of the Cassandra plugin was that Cassandra really does take a long time to start up, possibly more than 1 minute. This was why we implemented it to preserve the connection as long as possible rather than creating tons of new connections.

To my knowledge, the Splunk ones are working fine in the CI.

@luocooong Thanks for this. It makes me wonder, should we add the GossipingPropertyFileSnitch parameter to the Cassandra plugin?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Cassandra tests seem to keep timing out and breaking the CI. My recollection of the Cassandra plugin was that Cassandra really does take a long time to start up, possibly more than 1 minute. This was why we implemented it to preserve the connection as long as possible rather than creating tons of new connections.

I am aware about timing out test of container for Cassandra plugin. I also fixed that several days ago in my branch. But I didn't merge it, because this is not only the one issue on CI and don't fix CI build. For instance the CI build is restarted several times for this PR and still not green.


To my knowledge, the Splunk ones are working fine in the CI.

My question was about memory options only in the above comment and luocooong answered that he is going to eliminate them.


It makes me wonder, should we add the GossipingPropertyFileSnitch parameter to the Cassandra plugin?

GossipingPropertyFileSnitch looks Cassandra specific code only.

@luocooong
Copy link
Member Author

@vdiravka Thanks for the comments. I saw that the goal of DRILL-7973 is avoid the OOM (use the direct mem). This pull request is ready to speed up the C* startup. no conflict. I am going to remove the heap options if you already have new ways. (Actually, I think the new version of testcontainers will included these setting for desfult).

@@ -65,7 +66,12 @@ public static void tearDownCluster() {

private static void startCassandra() {
cassandra = new CassandraContainer<>("cassandra")
.withInitScript("queries.cql");
.withInitScript("queries.cql")
.withStartupTimeout(Duration.ofMinutes(3))
Copy link
Member

@vdiravka vdiravka Aug 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It look like 2m is enough. There is no one failure for Cassandra Test Container with this value
vdiravka@b5b3a17

@luocooong luocooong marked this pull request as ready for review August 6, 2021 05:24
Copy link
Member

@vdiravka vdiravka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks fine. Please consider possible additional improvement in my comments

@@ -65,7 +66,10 @@ public static void tearDownCluster() {

private static void startCassandra() {
cassandra = new CassandraContainer<>("cassandra")
.withInitScript("queries.cql");
.withInitScript("queries.cql")
.withStartupTimeout(Duration.ofMinutes(2))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please keep same indent for the methods call chain. For example:

    HiveParquetTableMetadataProvider metadataProvider = builder
        .withEntries(entries)
        .withHivePartitionHolder(hivePartitionHolder)
        .withHiveStoragePlugin(hiveStoragePlugin)
        .withReaderConfig(readerConfig)
        .withSchema(schema)
        .build();

.withInitScript("queries.cql")
.withStartupTimeout(Duration.ofMinutes(2))
.withEnv("CASSANDRA_SNITCH", "GossipingPropertyFileSnitch") // Tune Cassandra options for faster startup
.withEnv("JVM_OPTS", "-Dcassandra.skip_wait_for_gossip_to_settle=0 -Dcassandra.initial_token=0");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly we can do event better here:

-Dcassandra.skip_wait_for_gossip_to_settle=0 -Dcassandra.load_ring_state=false -Dcassandra.initial_token=1 -Dcassandra.num_tokens=nil -Dcassandra.allocate_tokens_for_local_replication_factor=nil

https://github.com/sky-uk/cassandra-operator/issues/103#issuecomment-499040927

@cgivre
Copy link
Contributor

cgivre commented Aug 6, 2021

@luocooong @vdiravka
Should we merge this? @vdiravka Are you ok with this or does this still conflict with DRILL-7973?

@luocooong
Copy link
Member Author

@cgivre Can be merged. Let me try more options (by Vitalii) first. I will use the fastest parameter.

@luocooong
Copy link
Member Author

Done. the current options is faster (included the C* startup and tests). Saved 10 seconds.

current options (JDK 8), spend 48 seconds

2021-08-07T03:55:45.7691691Z [INFO] -------------------------------------------------------
2021-08-07T03:55:45.7692311Z [INFO]  T E S T S
2021-08-07T03:55:45.7693143Z [INFO] -------------------------------------------------------

2021-08-07T03:56:33.1223519Z [INFO] Tests run: 30, Failures: 0, Errors: 0, Skipped: 0

current options (JDK 11), spend 40 seconds

2021-08-07T04:00:34.0149031Z [INFO] -------------------------------------------------------
2021-08-07T04:00:34.0149860Z [INFO]  T E S T S
2021-08-07T04:00:34.0150857Z [INFO] -------------------------------------------------------

2021-08-07T04:01:14.6705624Z [INFO] Tests run: 30, Failures: 0, Errors: 0, Skipped: 0

more options (JDK 8), spend 58 seconds

2021-08-07T02:08:46.6741617Z [INFO] -------------------------------------------------------
2021-08-07T02:08:46.6744463Z [INFO]  T E S T S
2021-08-07T02:08:46.6746502Z [INFO] -------------------------------------------------------

2021-08-07T02:09:44.7628630Z [INFO] Tests run: 30, Failures: 0, Errors: 0, Skipped: 0

more options (JDK 11), spend 38 seconds

2021-08-07T02:00:06.0005436Z [INFO] -------------------------------------------------------
2021-08-07T02:00:06.0006095Z [INFO]  T E S T S
2021-08-07T02:00:06.0006901Z [INFO] -------------------------------------------------------

2021-08-07T02:00:44.6733692Z [INFO] Tests run: 30, Failures: 0, Errors: 0, Skipped: 0

Copy link
Member

@vdiravka vdiravka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is too fast :) Even .withStartupTimeout(Duration.ofMinutes(2)) is not needed. But let's keep it to avoid CI fails in future.
+1 LGTM

@cgivre cgivre merged commit 3c092b3 into apache:master Aug 8, 2021
@luocooong luocooong deleted the DRILL-7982 branch August 20, 2021 02:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants