Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes #9978 - Sets an env content id correctly in cp #5155

Merged
merged 1 commit into from Mar 31, 2015

Conversation

parthaa
Copy link
Contributor

@parthaa parthaa commented Mar 31, 2015

When setting the content id for an environment in candlepin a race
condition can happen. This because the set_content rb says

  1. Fetch me the list of existing content ids for this environment
  2. From the list of content ids to provided to me to set, subtract them from the existing content ids
  3. Add the new content ids , and remove the removed content ids
    Problem is between step 1 and step 3 there can be a race condition if 2 repos belonging to the same environment
    are clicked quickly, because they run in parallel. A temporary hack is to catch such an error in the response
    and resend the diffs. This commit intends to do that, with retry limit set to 2

# "requestUuid":"13c15fcd-0bc2-4686-a988-e0ba70377b86"} (POST /candlepin/environments/1/content)
# so as pure conjecture we are using this and just guessing if its a dup id.
if e.message.include?("org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse")
add_ids = input[:content_ids] - existing_ids
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just catch any 500 error and retry, vs trying to parse specific exception strings which seems fragile

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any could be even something basic like Candlepin shut down in the middle. I think looking for the org.postgres narrows it closer. Correct answer would be candlepin complaining..

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, so in that case we would be assuming the ID was added even in the case where it was shutdown.

why don't we instead try a GET on the content to see if it exists if we receive an error trying to add it? If add_content = error, and get returns 200 then we can add the ID as existing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code has allowance for that too . so see the line that says

                if e.message.include?("org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse")
                  add_ids = input[:content_ids] - existing_ids
                else

that existing_ids does a fresh get. Hmm sounds like your idea might work. Actually I am only catching 500s. 404 wlll raise a different exception anyway.

@daviddavis
Copy link
Contributor

Also, do you have a Candlepin bug to return a better error when this problem occurs? And what about a Katello issue to remove this hack once that's completed?

@parthaa
Copy link
Contributor Author

parthaa commented Mar 31, 2015

have also filed https://bugzilla.redhat.com/show_bug.cgi?id=1207822 . Hopefully that will reduce the "hackiness" part

@daviddavis
Copy link
Contributor

Also, your commit message formatting is awful. Here you go:

Fixes #9978 - Sets an env content id correctly in cp

When setting the content id for an environment in candlepin a race
condition can happen. This because the SetContent action says:

1) Fetch me the list of existing content ids for this environment
2) From the list of content ids to provided to me to set, subtract them 
   from the existing content ids
3) Add the new content ids , and remove the removed content ids

Problem is between step 1 and step 3 there can be a race condition if 2
repos belonging to the same environment are clicked quickly, because 
they run in parallel. A temporary hack is to catch such an error in the
response and resend the diffs. This commit intends to do that, with 
retry limit set to 2.

@parthaa
Copy link
Contributor Author

parthaa commented Mar 31, 2015

Fixes #9978 - Sets an env content id correctly in cp

When setting the content id for an environment in candlepin a race
condition can happen. This because the SetContent action says:

Thanks.. Updated

When setting the content id for an environment in candlepin a race
condition can happen. This because the SetContent action says:

1) Fetch me the list of existing content ids for this environment
2) From the list of content ids to provided to me to set, subtract them
   from the existing content ids
3) Add the new content ids , and remove the removed content ids

Problem is between step 1 and step 3 there can be a race condition if 2
repos belonging to the same environment are clicked quickly, because
they run in parallel. A temporary hack is to catch such an error in the
response and resend the diffs. This commit intends to do that, with
retry limit set to 2.
@parthaa
Copy link
Contributor Author

parthaa commented Mar 31, 2015

Went with @daviddavis 's suggested changes from https://gist.github.com/daviddavis/3ffeff72e8bf67abf3c0

ty

@parthaa
Copy link
Contributor Author

parthaa commented Mar 31, 2015

Created a place holder issue to fix the candlepin exception checking
http://projects.theforeman.org/issues/9980

@daviddavis
Copy link
Contributor

LGTM. @mccun934 did you want to test this out?

@mccun934
Copy link
Member

I'll test yes

@mccun934
Copy link
Member

Worked perfectly, enabled ~20 repos in rapid succession on a host that experienced the same error without the patch
mass-save-no-errror

@ehelms
Copy link
Member

ehelms commented Mar 31, 2015

Ran unit tests locally for this PR and they passed. ACK for merging out of band of the CI

parthaa added a commit that referenced this pull request Mar 31, 2015
Fixes #9978 - Sets an env content id correctly in cp
@parthaa parthaa merged commit da50a84 into Katello:master Mar 31, 2015
@parthaa parthaa deleted the race branch March 31, 2015 22:35
@iNecas
Copy link
Member

iNecas commented Apr 1, 2015

I expect the deletion should be treated the similar way as well, as indicates this comment https://bugzilla.redhat.com/show_bug.cgi?id=1205921#c6: so similar case might occur when disabling the environments

output[:add_response] = ::Katello::Resources::Candlepin::Environment.
add_content(input[:cp_environment_id], add_ids)
retries = 2
(retries + 1).times do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might use the retry ruby keyword

retries = 2
begin
  add_ids = input[:content_ids] - existing_ids
  unless add_ids.empty?
    output[:add_response] = ::Katello::Resources::Candlepin::Environment.add_content(input[:cp_environment_id], add_ids)
  end
rescue RestClient::InternalServerError
  retries -= 1
  retry if retries >= 0
end

And then extract the retry into a method:

def with_retries(retries = 2)
  yield
rescue RestClient::InternalServerError => e
  retries -= 1
  if retries >= 0
    retry
  else
    raise e
  end
end

so that the code simplifies to this:

with_retries do
  add_ids = input[:content_ids] - existing_ids
  unless add_ids.empty?
    output[:add_response] = ::Katello::Resources::Candlepin::Environment.add_content(input[:cp_environment_id], add_ids)
  end
end

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, to improve the correctness, the add_response should be replaces with add_responses as an array, where all the responses from all the tries are captured

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the suggested code handles better the reraising the error when exceeding the number of retries

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice 👍 with_retries could also accept parameter which exception classes should be catched

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not insisting on the parametrization of the class, until we need it, just to keep it simpler.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
7 participants