Increase timeout per upgrade path to 15mins #308

Merged
merged 1 commit into master from mt/increase-timeout on Apr 25, 2024

Conversation

@matriv (Contributor) commented Mar 13, 2024

From recently added logs it seems that the longest upgrade path (starting with 4.0.x) frequently takes a bit more than 10 minutes.

example failure: https://jenkins.crate.io/blue/organizations/jenkins/CrateDB%2Fqa%2Fcrate_qa/detail/crate_qa/793/pipeline

@matriv requested a review from romseygeek on March 13, 2024 12:24
@romseygeek (Contributor) commented:

for version_def in versions[1:]:
    timestamp = datetime.utcnow().isoformat(timespec='seconds')
    print(f"{timestamp} Upgrade to:: {version_def.version}")
    self.assert_data_persistence(version_def, nodes, digest, paths)
# restart with latest version
version_def = versions[-1]
self.assert_data_persistence(version_def, nodes, digest, paths)

I've just realised this is growing as O(n^2), isn't it? Every time we add a new version, it tests an upgrade path to it from every previous supported version. Is it possible to move the timeout onto assert_data_persistence instead? Otherwise we're going to keep hitting this...

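To make the growth concrete, here is a minimal illustrative sketch; the version list and helper are assumptions, not the actual crate_qa code. With one upgrade path per supported starting version, the total number of upgrade hops across the suite is (n-1) + (n-2) + ... + 1, i.e. O(n^2):

# Illustrative only: SUPPORTED and upgrade_hops are assumed names, not crate_qa code.
SUPPORTED = ["4.0", "4.1", "4.2", "4.3", "5.0", "5.1"]

def upgrade_hops(start_index: int) -> int:
    """Number of rolling-upgrade steps from SUPPORTED[start_index] to the latest version."""
    return len(SUPPORTED) - 1 - start_index

# One path per starting version, each path one hop shorter than the previous:
# (n-1) + (n-2) + ... + 1 = n*(n-1)/2 hops in total, i.e. O(n^2).
total = sum(upgrade_hops(i) for i in range(len(SUPPORTED)))
assert total == len(SUPPORTED) * (len(SUPPORTED) - 1) // 2
print(f"{len(SUPPORTED)} supported versions -> {total} upgrade hops overall")
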
@matriv (Contributor, Author) commented Mar 13, 2024

I've just realised this is growing as O(n^2), isn't it? Every time we add a new version, it tests an upgrade path to it from every previous supported version. Is it possible to move the timeout onto assert_data_persistence instead? Otherwise we're going to keep hitting this...

The timeout is on each upgrade path, e.g. from 4.0.x to latest, not on the outer loop over all upgrade paths. If we move it to assert_data_persistence then we exclude all the time spent restarting the cluster, so we won't easily catch timeouts during these operations, which may hide issues. What do you think?

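For illustration, a rough sketch of where the per-path timeout sits under this approach; the SIGALRM-based decorator and run_upgrade_path below are placeholders I am assuming here, not the actual crate_qa helpers. The point is that the 15-minute budget wraps the whole path, cluster restarts included, rather than only the assert_data_persistence call:

# Hypothetical sketch, not the crate_qa implementation.
import functools
import signal

def timeout(seconds):
    """Abort the wrapped call if it runs longer than `seconds` (SIGALRM; POSIX main thread only)."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            def on_alarm(signum, frame):
                raise TimeoutError(f"{fn.__name__} exceeded {seconds}s")
            previous = signal.signal(signal.SIGALRM, on_alarm)
            signal.alarm(seconds)
            try:
                return fn(*args, **kwargs)
            finally:
                signal.alarm(0)
                signal.signal(signal.SIGALRM, previous)
        return wrapper
    return decorator

@timeout(15 * 60)  # budget covers one whole upgrade path, e.g. 4.0.x -> latest
def run_upgrade_path(upgrade_to_and_check, versions):
    # Each restart-and-verify step counts against the same 15-minute budget,
    # so a hang while restarting the cluster still trips the timeout instead
    # of being excluded from it.
    for version_def in versions[1:]:
        upgrade_to_and_check(version_def)
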
@romseygeek (Contributor) left a comment

If we move it to assert_data_persistence then we exclude all the time spent restarting the cluster, so we won't easily catch timeouts during these operations, which may hide issues

Fair point. OK, let's do it this way then :)

@matriv (Contributor, Author) commented Apr 24, 2024

retest this please

From recently added logs it seems that the longest upgrade path
(starting with `4.0.x`) frequently consumes a bit more than 10 mins.
@matriv merged commit 7c61af8 into master on Apr 25, 2024
1 check passed
@matriv deleted the mt/increase-timeout branch on April 25, 2024 11:35