proxy: access provider directly not through be_ctx #646

Closed
pbrezina wants to merge 5 commits

pbrezina
Member

@pbrezina pbrezina commented Sep 5, 2018

See commit message for details.

Resolves:
https://pagure.io/SSSD/sssd/issue/3812

@jhrozek jhrozek self-assigned this Sep 6, 2018
@jhrozek
Contributor

jhrozek commented Sep 6, 2018

This improves the situation in the sense that the proxy provider now works, but I'm seeing this crash on shutdown:

Sep 06 14:29:55 client.ipa.test systemd-coredump[11292]: Process 11281 (sssd_be) of user 0 dumped core.

                                                         Stack trace of thread 11281:
                                                         #0  0x00007f477424acf0 _dbus_list_unlink (libdbus-1.so.3)
                                                         #1  0x00007f477424ad69 _dbus_list_remove_link (libdbus-1.so.3)
                                                         #2  0x00007f4774239c71 _dbus_message_remove_counter (libdbus-1.so.3)
                                                         #3  0x00007f477422d49d free_outgoing_message (libdbus-1.so.3)
                                                         #4  0x00007f477422ef81 _dbus_connection_last_unref (libdbus-1.so.3)
                                                         #5  0x00007f4774aa8191 sbus_connection_release (libsss_sbus.so)
                                                         #6  0x00007f4774aa81ef sbus_connection_destructor (libsss_sbus.so)
                                                         #7  0x00007f477446f011 _tc_free_internal (libtalloc.so.2)
                                                         #8  0x00007f4774aa931a sbus_connection_free_handler (libsss_sbus.so)
                                                         #9  0x00007f4774689541 tevent_common_loop_timer_delay (libtevent.so.0)
                                                         #10 0x00007f477468a557 epoll_event_loop_once (libtevent.so.0)
                                                         #11 0x00007f4774688ba7 std_event_loop_once (libtevent.so.0)
                                                         #12 0x00007f4774684fed _tevent_loop_once (libtevent.so.0)
                                                         #13 0x00007f477468520b tevent_common_loop_wait (libtevent.so.0)
                                                         #14 0x00007f4774688b47 std_event_loop_wait (libtevent.so.0)
                                                         #15 0x00007f477856cb15 server_loop (libsss_util.so)
                                                         #16 0x000000000040a4a9 main (sssd_be)
                                                         #17 0x00007f4773b70f2a __libc_start_main (libc.so.6)
                                                         #18 0x0000000000407daa _start (sssd_be)

@jhrozek
Contributor

jhrozek commented Sep 6, 2018

btw I'm seeing the crash with PR #647 applied, but only with the proxy domain, not with e.g. the IPA domain.

Modules are initialized as part of dp_init_send(), but be_ctx->provider is set
only after this request has finished; therefore it is not available here.

Resolves:
https://pagure.io/SSSD/sssd/issue/3812

The backend context is overused inside the sssd code, even during its own
initialization. Some parts of the initialization code require access to
be_ctx->provider, so we must make it available as soon as possible.

A better solution would be to always use 'provider' directly in initialization,
but this change makes future modifications safer because one does not have to
keep in mind when it is safe to use be_ctx->provider and when it is not. Now it
is always safe.

Resolves:
https://pagure.io/SSSD/sssd/issue/3812
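
The ordering problem described in the two commit messages above can be illustrated with a small sketch. Everything here is a hypothetical stand-in, not the actual SSSD code: the struct layouts and the functions `module_init_via_be_ctx`, `module_init_direct` and `be_ctx_publish_provider` are invented for illustration, and only the `be_ctx->provider` field name comes from the commit messages.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the SSSD types. */
struct data_provider {
    const char *name;
};

struct be_ctx {
    struct data_provider *provider; /* NULL until someone assigns it */
};

/* Fragile variant: only works once be_ctx->provider has been assigned.
 * Returns NULL when it is called too early in the initialization. */
const char *module_init_via_be_ctx(const struct be_ctx *be_ctx)
{
    if (be_ctx->provider == NULL) {
        return NULL;
    }
    return be_ctx->provider->name;
}

/* First approach: hand the provider to the module directly, so the
 * module does not care whether be_ctx->provider is set yet. */
const char *module_init_direct(const struct data_provider *provider)
{
    assert(provider != NULL);
    return provider->name;
}

/* Second approach: publish the pointer in be_ctx as early as possible,
 * before any module initialization runs. */
void be_ctx_publish_provider(struct be_ctx *be_ctx,
                             struct data_provider *provider)
{
    be_ctx->provider = provider;
}
```

With a zero-initialized `struct be_ctx`, `module_init_via_be_ctx()` returns NULL until `be_ctx_publish_provider()` has been called, while `module_init_direct()` works at any point. That is roughly the trade-off the second commit message describes.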
@pbrezina
Member Author

pbrezina commented Sep 7, 2018

I cannot reproduce this issue. Do you have a reproducer or a machine where I can debug it?

dbus_message_set_sender may reallocate internal fields, which renders pointers
obtained by dbus_message_get_* invalid.
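
The hazard named in this commit message can be sketched without libdbus. The `fake_msg` type and all `fake_msg_*` functions below are hypothetical and only simulate a message whose accessors return pointers into a buffer that a setter may move. They are not the libdbus API, and how libdbus lays out its fields internally is not something this sketch claims to reproduce.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a message; NOT the libdbus DBusMessage. */
struct fake_msg {
    char *header;      /* destination and sender packed back to back */
    size_t dest_len;   /* length of the destination including its NUL */
    size_t sender_len; /* length of the sender including NUL, 0 if unset */
};

struct fake_msg *fake_msg_new(const char *destination)
{
    struct fake_msg *msg = malloc(sizeof(*msg));
    if (msg == NULL) return NULL;
    msg->dest_len = strlen(destination) + 1;
    msg->sender_len = 0;
    msg->header = malloc(msg->dest_len);
    if (msg->header == NULL) { free(msg); return NULL; }
    memcpy(msg->header, destination, msg->dest_len);
    return msg;
}

/* Accessors return pointers INTO the header buffer. */
const char *fake_msg_get_destination(const struct fake_msg *msg)
{
    return msg->header;
}

const char *fake_msg_get_sender(const struct fake_msg *msg)
{
    return msg->sender_len ? msg->header + msg->dest_len : NULL;
}

/* realloc() may move the buffer, so every pointer previously returned
 * by an accessor must be treated as invalid after this call. */
int fake_msg_set_sender(struct fake_msg *msg, const char *sender)
{
    size_t len = strlen(sender) + 1;
    char *grown = realloc(msg->header, msg->dest_len + len);
    if (grown == NULL) return -1;
    memcpy(grown + msg->dest_len, sender, len);
    msg->header = grown;
    msg->sender_len = len;
    return 0;
}

void fake_msg_free(struct fake_msg *msg)
{
    if (msg != NULL) { free(msg->header); free(msg); }
}

/* Safe pattern: take a private copy BEFORE mutating the message.
 * The caller owns the returned string and must free() it. */
char *copy_destination_then_set_sender(struct fake_msg *msg,
                                       const char *sender)
{
    const char *dest = fake_msg_get_destination(msg);
    char *copy = malloc(strlen(dest) + 1);
    if (copy == NULL) return NULL;
    strcpy(copy, dest);
    if (fake_msg_set_sender(msg, sender) != 0) { free(copy); return NULL; }
    return copy;
}
```

The alternative to copying is to re-read the field through the accessor after the setter has run; either way, a pointer fetched before the setter should not be dereferenced afterwards.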

This may cause trouble if the dbus connection was dropped, as dbus will
actually try to send the messages. Also, when the connection is being
freed, tevent integration is already disabled, so there is no point in
doing this.

We never reproduced this with gdb, but valgrind shows an invalid read in
sbus_watch_handler after the watch_fd was freed. This should not be needed,
since watch_fd is the memory parent of fdevent, but it seems to help.
@pbrezina
Member Author

I pushed new patches after debugging session with Jakub.

@jhrozek
Contributor

jhrozek commented Sep 27, 2018

I'm very sorry for the slow review. Please ping me next time I forget about some patchset like this.

Anyway, I'm happy to confirm that the patches fix the issue. To be sure, I ran the RH QE tests twice without the patches and got two crash reports (jobs 2829402 and 2829398 in case anyone cares), then twice with the patches (jobs 2831660 and 2831658) with no crash.

-> ACK

@jhrozek
Contributor

jhrozek commented Sep 27, 2018

btw I know Sumit wants to get some patches through CI, so I won't push the patches right away in order to not add even more load to CI at the moment..

@jhrozek
Contributor

jhrozek commented Sep 28, 2018

@jhrozek jhrozek closed this Sep 28, 2018
@jhrozek jhrozek added the Pushed label Sep 28, 2018
@pbrezina pbrezina deleted the sbus-proxy branch November 14, 2019 10:34
@alexey-tikhonov
Member

This improves the situation in the sense that the proxy provider now works, but I'm seeing this crash on shutdown:

[sssd_be core dump and stack trace quoted verbatim from jhrozek's Sep 6, 2018 comment above]

For the record: the same crash was seen in https://bugzilla.redhat.com/show_bug.cgi?id=1783169 so I think it was not fixed by this PR.
