Increase timeout per upgrade path to 15mins #308

Merged
merged 1 commit into master from mt/increase-timeout on Apr 25, 2024

Conversation

@matriv (Contributor) commented Mar 13, 2024

From recently added logs it seems that the longest upgrade path (starting with 4.0.x) frequently takes a bit more than 10 minutes.

example failure: https://jenkins.crate.io/blue/organizations/jenkins/CrateDB%2Fqa%2Fcrate_qa/detail/crate_qa/793/pipeline

@matriv requested a review from romseygeek on March 13, 2024 12:24
@romseygeek (Contributor) commented:

for version_def in versions[1:]:
    timestamp = datetime.utcnow().isoformat(timespec='seconds')
    print(f"{timestamp} Upgrade to:: {version_def.version}")
    self.assert_data_persistence(version_def, nodes, digest, paths)
# restart with latest version
version_def = versions[-1]
self.assert_data_persistence(version_def, nodes, digest, paths)

I've just realised this is growing as O(n^2), isn't it? Every time we add a new version, it tests an upgrade path to it from every previous supported version. Is it possible to move the timeout onto assert_data_persistence instead? Otherwise we're going to keep hitting this...

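To make the growth concrete, here is a minimal illustrative sketch; the version list and helper are assumptions, not the actual crate_qa code. With one upgrade path per supported starting version, the total number of upgrade hops across the suite is (n-1) + (n-2) + ... + 1, i.e. O(n^2):

# Illustrative only: SUPPORTED and upgrade_hops are assumed names, not crate_qa code.
SUPPORTED = ["4.0", "4.1", "4.2", "4.3", "5.0", "5.1"]

def upgrade_hops(start_index: int) -> int:
    """Number of rolling-upgrade steps from SUPPORTED[start_index] to the latest version."""
    return len(SUPPORTED) - 1 - start_index

# One path per starting version, each path one hop shorter than the previous:
# (n-1) + (n-2) + ... + 1 = n*(n-1)/2 hops in total, i.e. O(n^2).
total = sum(upgrade_hops(i) for i in range(len(SUPPORTED)))
assert total == len(SUPPORTED) * (len(SUPPORTED) - 1) // 2
print(f"{len(SUPPORTED)} supported versions -> {total} upgrade hops overall")
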
@matriv (Contributor, Author) commented Mar 13, 2024

I've just realised this is growing as O(n^2), isn't it? Every time we add a new version, it tests an upgrade path to it from every previous supported version. Is it possible to move the timeout onto assert_data_persistence instead? Otherwise we're going to keep hitting this...

The timeout is on each upgrade path, e.g. from 4.0.x to latest, not on the outer loop over all upgrade paths. If we move it to assert_data_persistence then we exclude all the time spent restarting the cluster, so we won't easily catch timeouts during these operations, which may hide issues. What do you think?

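For illustration, a rough sketch of where the per-path timeout sits under this approach; the SIGALRM-based decorator and run_upgrade_path below are placeholders I am assuming here, not the actual crate_qa helpers. The point is that the 15-minute budget wraps the whole path, cluster restarts included, rather than only the assert_data_persistence call:

# Hypothetical sketch, not the crate_qa implementation.
import functools
import signal

def timeout(seconds):
    """Abort the wrapped call if it runs longer than `seconds` (SIGALRM; POSIX main thread only)."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            def on_alarm(signum, frame):
                raise TimeoutError(f"{fn.__name__} exceeded {seconds}s")
            previous = signal.signal(signal.SIGALRM, on_alarm)
            signal.alarm(seconds)
            try:
                return fn(*args, **kwargs)
            finally:
                signal.alarm(0)
                signal.signal(signal.SIGALRM, previous)
        return wrapper
    return decorator

@timeout(15 * 60)  # budget covers one whole upgrade path, e.g. 4.0.x -> latest
def run_upgrade_path(upgrade_to_and_check, versions):
    # Each restart-and-verify step counts against the same 15-minute budget,
    # so a hang while restarting the cluster still trips the timeout instead
    # of being excluded from it.
    for version_def in versions[1:]:
        upgrade_to_and_check(version_def)
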
@romseygeek (Contributor) left a comment

If we move it to assert_data_persistence then we exclude all the time spent restarting the cluster, so we won't easily catch timeouts during these operations, which may hide issues

Fair point. OK, let's do it this way then :)

@matriv (Contributor, Author) commented Apr 24, 2024

retest this please

From recently added logs it seems that the longest upgrade path
(starting with `4.0.x`) frequently consumes a bit more than 10 mins.
@matriv merged commit 7c61af8 into master on Apr 25, 2024
1 check passed
@matriv deleted the mt/increase-timeout branch on April 25, 2024 11:35