New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixes #9978 - Sets an env content id correctly in cp #5155
Conversation
| # "requestUuid":"13c15fcd-0bc2-4686-a988-e0ba70377b86"} (POST /candlepin/environments/1/content) | ||
| # so as pure conjecture we are using this and just guessing if its a dup id. | ||
| if e.message.include?("org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse") | ||
| add_ids = input[:content_ids] - existing_ids |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not just catch any 500 error and retry, vs trying to parse specific exception strings which seems fragile
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any could be even something basic like Candlepin shut down in the middle. I think looking for the org.postgres narrows it closer. Correct answer would be candlepin complaining..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, so in that case we would be assuming the ID was added even in the case where it was shutdown.
why don't we instead try a GET on the content to see if it exists if we receive an error trying to add it? If add_content = error, and get returns 200 then we can add the ID as existing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code has allowance for that too . so see the line that says
if e.message.include?("org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse")
add_ids = input[:content_ids] - existing_ids
else
that existing_ids does a fresh get. Hmm sounds like your idea might work. Actually I am only catching 500s. 404 wlll raise a different exception anyway.
|
Also, do you have a Candlepin bug to return a better error when this problem occurs? And what about a Katello issue to remove this hack once that's completed? |
|
have also filed https://bugzilla.redhat.com/show_bug.cgi?id=1207822 . Hopefully that will reduce the "hackiness" part |
|
Also, your commit message formatting is awful. Here you go: |
Thanks.. Updated |
When setting the content id for an environment in candlepin a race condition can happen. This because the SetContent action says: 1) Fetch me the list of existing content ids for this environment 2) From the list of content ids to provided to me to set, subtract them from the existing content ids 3) Add the new content ids , and remove the removed content ids Problem is between step 1 and step 3 there can be a race condition if 2 repos belonging to the same environment are clicked quickly, because they run in parallel. A temporary hack is to catch such an error in the response and resend the diffs. This commit intends to do that, with retry limit set to 2.
|
Went with @daviddavis 's suggested changes from https://gist.github.com/daviddavis/3ffeff72e8bf67abf3c0 ty |
|
Created a place holder issue to fix the candlepin exception checking |
|
LGTM. @mccun934 did you want to test this out? |
|
I'll test yes |
|
Ran unit tests locally for this PR and they passed. ACK for merging out of band of the CI |
Fixes #9978 - Sets an env content id correctly in cp
|
I expect the deletion should be treated the similar way as well, as indicates this comment https://bugzilla.redhat.com/show_bug.cgi?id=1205921#c6: so similar case might occur when disabling the environments |
| output[:add_response] = ::Katello::Resources::Candlepin::Environment. | ||
| add_content(input[:cp_environment_id], add_ids) | ||
| retries = 2 | ||
| (retries + 1).times do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might use the retry ruby keyword
retries = 2
begin
add_ids = input[:content_ids] - existing_ids
unless add_ids.empty?
output[:add_response] = ::Katello::Resources::Candlepin::Environment.add_content(input[:cp_environment_id], add_ids)
end
rescue RestClient::InternalServerError
retries -= 1
retry if retries >= 0
endAnd then extract the retry into a method:
def with_retries(retries = 2)
yield
rescue RestClient::InternalServerError => e
retries -= 1
if retries >= 0
retry
else
raise e
end
endso that the code simplifies to this:
with_retries do
add_ids = input[:content_ids] - existing_ids
unless add_ids.empty?
output[:add_response] = ::Katello::Resources::Candlepin::Environment.add_content(input[:cp_environment_id], add_ids)
end
endThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, to improve the correctness, the add_response should be replaces with add_responses as an array, where all the responses from all the tries are captured
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, the suggested code handles better the reraising the error when exceeding the number of retries
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice 👍 with_retries could also accept parameter which exception classes should be catched
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not insisting on the parametrization of the class, until we need it, just to keep it simpler.

When setting the content id for an environment in candlepin a race
condition can happen. This because the set_content rb says
Problem is between step 1 and step 3 there can be a race condition if 2 repos belonging to the same environment
are clicked quickly, because they run in parallel. A temporary hack is to catch such an error in the response
and resend the diffs. This commit intends to do that, with retry limit set to 2