
Retry the coffea_rados_parquet job 3 times to overcome the rare OSD segfault issue #482

Merged: lgray merged 3 commits into CoffeaTeam:master on Apr 2, 2021


JayjeetAtGithub
Contributor

No description provided.

Contributor Author

JayjeetAtGithub commented Apr 2, 2021

@nsmith- It looks like the error that occurs while bringing up the Ceph cluster is very transient, and it's hard to pin down the actual cause; there are almost no good answers about it online.

So I added a retry script that tries the docker run command up to 3 times and fails silently if the error happens all three times in a row, which I don't expect, given how rare the error is. This patch should do the job for now. Thanks.

JayjeetAtGithub changed the title from "Retry the docker run command few times" to "Retry the coffea rados parquet 3 times to overcome the rare OSD segfault issue" on Apr 2, 2021
JayjeetAtGithub changed the title from "Retry the coffea rados parquet 3 times to overcome the rare OSD segfault issue" to "Retry the coffea_rados_parquet job 3 times to overcome the rare OSD segfault issue" on Apr 2, 2021
Collaborator

lgray commented Apr 2, 2021

I'd say if it fails three times in a row, let it throw an error so we at least look at it. If there's a real problem and we can't see it, then why test?

Contributor Author

Yeah, you're right. Let me fix it.

Contributor Author

JayjeetAtGithub commented Apr 2, 2021

@nsmith- @lgray fixed
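For reference, here is a minimal Python sketch of the retry-then-fail-loudly behavior discussed above. The PR's actual retry script is not shown on this page, so the helper name run_with_retries, its parameters, and the pause between attempts are all hypothetical. It reruns a command up to three times and, following lgray's suggestion, raises an error if every attempt exits non-zero instead of failing silently.

```python
import subprocess
import sys
import time


def run_with_retries(cmd, attempts=3, delay_s=5.0):
    """Run `cmd` up to `attempts` times; return the attempt number that succeeded.

    Raises RuntimeError if every attempt exits with a non-zero status, so a
    persistent failure surfaces in CI instead of being swallowed.
    Hypothetical helper: not the script actually merged in this PR.
    """
    for attempt in range(1, attempts + 1):
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return attempt
        print(
            f"attempt {attempt}/{attempts} failed with exit code {result.returncode}",
            file=sys.stderr,
        )
        if attempt < attempts:
            # Brief pause in case the failure is transient (the delay is an assumption).
            time.sleep(delay_s)
    # Fail loudly after the last attempt so a real, persistent problem is visible.
    raise RuntimeError(f"command failed {attempts} times in a row: {cmd}")
```

In a CI step this would wrap the job's real docker run invocation, whose arguments are not given in this PR. Raising after the final attempt still absorbs a one-off transient crash while keeping a persistent failure visible in the test results.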

lgray merged commit 2b24f81 into CoffeaTeam:master on Apr 2, 2021