Fixes #18812 - Recreate event queue on upgrade #487
Conversation
|
Talked to bcourt on Candlepin team and he suggested the new bindings and says it looks good after the pr for filtering: Proof it works, tested on katello 3.3 and downstream on 6.2 el6/7: [root@rhel6 jrnl]# inotifywait -m /var/lib/qpidd/qls/jrnl/ if it cant write the installer will fail out with this message, to simultate the error I did a chattr +i on the directory so it got a perm denied when trying to delete/create: Upgrade Step: recreate_event_queue... Before with all messages coming in: https://paste.fedoraproject.org/paste/Mshnk0UBAvV1G5Omh6padl5M1UNdIGYhyRLivL9gydE= After with filtering: https://paste.fedoraproject.org/paste/ejSNbTsHow5L~IL428ZbjV5M1UNdIGYhyRLivL9gydE= Verfied virt-who works after filter and I can attach vdc subs and get content from vm's I didnt use Kafo Helpers because the command is already way too long imho |
hooks/pre/30-upgrade.rb
Outdated
| def recreate_event_queue | ||
| # piping output to /dev/null because on EL6 any qpid-stat/config commands gives a non fatal python traceback. | ||
| cert = '/etc/pki/katello/qpid_client_striped.crt' | ||
| `qpid-config --ssl-certificate #{cert} -b "amqps://localhost:5671" del exchange event --durable > /dev/null 2>&1` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not very familiar with the installer code but am wondering why you use backticks over Kafo::Helpers.execute?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@daviddavis I could not use this with Kafo::Helpers.execute:
Also its starting to wrap around which is not good because of how long.
hooks/pre/30-upgrade.rb
Outdated
| `qpid-config --ssl-certificate #{cert} -b "amqps://localhost:5671" bind event katello_event_queue pool.deleted 2>&1` | ||
| `qpid-config --ssl-certificate #{cert} -b "amqps://localhost:5671" bind event katello_event_queue compliance.created 2>&1` | ||
| `qpid-config --ssl-certificate #{cert} -b "amqps://localhost:5671" queues katello_event_queue > /dev/null 2>&1` | ||
| if $?.success? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This only checks the success of the last command, is that intended? If you look at the execute helper, you can see how it collects results from multiple commands (and I'd reiterate @daviddavis's comment about why we're not using the helper)
|
A few questions from me: Why are the queues being deleted? What's being created? Why is a hook appropriate instead of puppet? Qpid queues etc are all handled in puppet code: https://github.com/Katello/puppet-katello/blob/master/manifests/qpid.pp Is some of this new configuration that belongs there? |
|
@stbenjam Answering questions inline: Why are the queues being deleted? There was an issue found going from 6.1 to 6.2 downstream where candlepin is not able to send messages correctly to katello, after talking with Tom Mckay and Justin it looks like the event queue getting recreated corrects this issue. After talking with bcourt, Katello only cares about 5 messages coming from candlepin on the event queue, this currently is affecting a customer downstream at the moment and while they are fixing this on the candlepin side of things he and Justin suggested to create a filter so there is less events coming into Katello to help with the load in high virt-who/client envs. As far as a hook, I was originally going to say only for existing installs because the original pr in my head had just the '.' in it to accept everything, but after talking with bcourt its best to recreate everything only once and have puppet enforce these new bindings after the hook has made them. What is being created: We are recreating the event queue while tomcat/candlepin are offline so that upon all services coming up events will flow through from Candlepin -> Katello correctly. I have created a puppet pr for this to correct new installs: @daviddavis @stbenjam I updated the PR with Kafo::Helpers and removed the if(status) since Kafo will check each line. |
hooks/pre/30-upgrade.rb
Outdated
| def recreate_event_queue | ||
| # piping output to /dev/null because on EL6 any qpid-stat/config commands gives a non fatal python traceback. | ||
| cert = '/etc/pki/katello/qpid_client_striped.crt' | ||
| Kafo::Helpers.execute("qpid-config --ssl-certificate #{cert} -b 'amqps://localhost:5671' del exchange event --durable > /dev/null 2>&1") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if the following is easier to read:
commands = [
'del exchange event --durable',
'add exchange topic event --durable',
# And the rest
].each do |command|
Kafo::Helpers.execute("qpid-config --ssl-certificate #{cert} -b 'amqps://localhost:5671' #{command} > /dev/null 2>&1")
end|
After talking with @jlsherrill and @stbenjam Going to just have the hook delete the queue/exchange and let puppet do the rest with recreating and doing the binding since we are trying to keep hooks to a minimum. This has to be held until the following PRS are merged: |
hooks/pre/30-upgrade.rb
Outdated
| def delete_event_queue | ||
| # piping output to /dev/null because on EL6 any qpid-stat/config commands gives a non fatal python traceback. | ||
| cert = '/etc/pki/katello/qpid_client_striped.crt' | ||
| Kafo::Helpers.execute("qpid-config --ssl-certificate #{cert} -b 'amqps://localhost:5671' del exchange event --durable > /dev/null 2>&1") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The qpid-config help seems to imply --durable is only valid of creating a queue
Fixes #18812 - Recreate event queue on upgrade
|
The code here LGTM. |
|
We delete the queues, and then start tomcat - is that going to cause any problems? |
|
@stbenjam tomcat will reconnect when it starts. Any messages that tomcat could not send to the queue from before shutdown will be sent upon startup. |
| @@ -164,6 +171,7 @@ def fail_and_exit(message) | |||
| upgrade_step :mark_qpid_cert_for_update | |||
| upgrade_step :migrate_candlepin, :run_always => true | |||
| upgrade_step :remove_gutterball | |||
| upgrade_step :delete_event_queue | |||
| upgrade_step :start_tomcat, :run_always => true | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@barnabycourt This is what I'm talking about, we're starting candlepin right after deleting the katello_event_queue. Isn't this going to cause a problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The risk in this case is that Candlepin will send events on the topic which has nobody listening. Any events sent during this time period are effectively being piped to /dev/null.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stbenjam after talking to @barnabycourt he suggests we create the queue all in this step in the hook. We also have the new PR's to make sure things are correct with just a normal installer run. That way we know the queues/events are maintained by Puppet. Your thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we have to delete the queue in the hook here? So Puppet creates it again with only specific bindings we want? Or some other reason?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stbenjam We want puppet with the other pr's to create the event queue and exchange with the new filters that @barnabycourt and @jlsherrill found while debugging a downstream case. @barnabycourt is also making a change on the Candlepin side of things to help Katello out but this is the action that was deemed needed on the Katello side of things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That doesn't answer why we're deleting the queue in a hook
Do you need to delete the queue for it to get setup correctly again?
If that's the case, it should be the job of Puppet to detect that things are misconfigured and recreate the queue with the right settings.
That could mean exec over here https://github.com/Katello/puppet-katello/blob/master/manifests/qpid.pp#L14
|
Closing out this PR in favor of having puppet do the deletion if it sees if the event queue is bound to '.' Will open a PR in https://github.com/Katello/puppet-katello |
Add validations for product and organization options: - product options required if repo name is specified - org options required if product name is specified
No description provided.