DRB error : connection timeout #13958
Comments
Forgot to say : it happened on |
Thanks for reporting this @fvillain! @blomquisg @Ladas Can you take a look? I'm not quite sure where to start understanding why the |
@jrafanie In cloud providers, the other managers delegate to the CloudManager for authentication. So if the .parent_manager association is missing, .authentications will return nil.
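To illustrate the delegation behavior described above, here is a minimal plain-Ruby sketch. The class names `SketchCloudManager` and `SketchStorageManager` are hypothetical stand-ins, not the real ManageIQ models, and the safe-navigation call only approximates what a nil-tolerant delegation does.

```ruby
# Hypothetical stand-ins for the real models; this is not ManageIQ code.
class SketchCloudManager
  def authentications
    [] # stands in for an empty Rails relation
  end
end

class SketchStorageManager
  attr_reader :parent_manager

  def initialize(parent_manager = nil)
    @parent_manager = parent_manager
  end

  # Assumption: the delegation tolerates a missing parent, so the call
  # returns nil rather than raising when parent_manager is absent.
  def authentications
    parent_manager&.authentications
  end
end

with_parent = SketchStorageManager.new(SketchCloudManager.new)
orphan      = SketchStorageManager.new

with_parent_auths = with_parent.authentications # an empty collection
orphan_auths      = orphan.authentications      # nil, not an empty collection
```

Any caller that iterates over the result without a nil check would fail for the manager with no parent, which matches the symptom reported in this issue.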
Good find @Ladas. I thought we had |
Related to ManageIQ#13958. In the above issue, if ManageIQ::Providers::StorageManager::CinderManager::EventCatcher.sync_workers raises an exception, the server process exits fatally and all workers exit with `Error heartbeating to MiqServer because DRb::DRbConnError: Connection reset by peer Worker exiting.` We now rescue any exception here, log it, and move on to the other worker classes.
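The rescue-and-continue approach described above could look roughly like the following sketch. The helper `sync_all_workers` and the two worker classes are hypothetical names for illustration, not the actual ManageIQ implementation; the sketch rescues `StandardError`, which may be narrower than what the real change rescues.

```ruby
require "logger"

# Hypothetical helper: call sync_workers on each worker class, rescuing
# failures so that one broken class does not abort the whole loop.
def sync_all_workers(worker_classes, logger)
  failed = []
  worker_classes.each do |klass|
    begin
      klass.sync_workers
    rescue StandardError => err
      # Log the failure and move on to the next worker class.
      logger.error("Failed to sync_workers for #{klass.name}: #{err.class}: #{err.message}")
      failed << klass
    end
  end
  failed
end

# Hypothetical worker classes used only to exercise the helper.
class HealthyWorker
  def self.sync_workers
    :ok
  end
end

class BrokenWorker
  def self.sync_workers
    raise "authentications was nil"
  end
end

failed = sync_all_workers([BrokenWorker, HealthyWorker], Logger.new($stderr))
```

Returning the list of failed classes is a design choice of this sketch; it lets the caller decide whether to retry or just report them.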
@jrafanie ensure_managers had a side effect that could actually be causing this. When the managers were being deleted, a running refresh would re-add them without the parent manager. So now ensure_managers runs only on create.
This issue also seems to be causing this BZ https://bugzilla.redhat.com/show_bug.cgi?id=1417171 In this case we were calling After @jrafanie's change I see the following in the logs:
The ensure_managers that was creating managers without a parent manager was fixed here: The general fix for the delegation issues is here: but a bug in Rails prevents that from finishing. Still, after #12878 we should not be seeing managers without a parent manager, so any idea why this is still happening?
@durandom seems like Ansible still does before_validation https://github.com/Ladas/manageiq/blob/2835c365b3f180cd36911a5bd4346c8ef7d11ff3/app/models/manageiq/providers/ansible_tower/provider_mixin.rb#L7 @carbonin @jrafanie can you check which providers are causing this failure? |
@Ladas it was reported on Swift and Cinder providers here. I believe @fvillain also had Swift/Cinder providers in the description. Note, @fvillain, if you use the master branch, we no longer have a fatal error in the server process; instead, the failing worker class will log a message like this in evm.log: See: #13976
Good find, will fix this. |
Looks like this was fixed in #12878 |
Hi,
We got an error with DRb, which doesn't start with the appliance. I got the following logs:
@jrafanie looked into it, and it looks like the server process was failing when trying to sync_workers for one of the worker classes, possibly for the Cinder/Swift providers. For some reason, calling authentications on the provider returns nil instead of an empty array, even though it's a Rails relation. It looks like a bug.
You can see the full discussion and details here: http://talk.manageiq.org/t/drb-error-connection-timeout/2025
Thank you!