Fix group renaming issue when "id_provider = ldap" is set #128
CI: http://sssd-ci.duckdns.org/logs/job/60/62/summary.html (the Debian testing failure is unrelated).
I haven't read the patches; I just realized we might want to have a ticket and not just a downstream bugzilla.
Force-pushed from 940caf7 to bec6586.
Updating the patch set with a test for this fix.
Force-pushed from 990b813 to 984cf33.
The code looks good and manual tests went fine so far. Also the CI run passed: http://sssd-ci.duckdns.org/logs/job/62/55/summary.html
The tests didn't find any regressions, ACK
On (14/02/17 01:25), Jakub Hrozek wrote:
> The tests didn't find any regressions, ACK
I have a tiny question:
what would happen if there are two groups in the directory server
with the same GID?
I know it is a bit of a corner case, but we have hit it a few times.
LS
@lslebodn:
I understand why you're worried, and I see we can hit this situation. But we can hit it even without my fix. So I'd like to propose fixing it when someone has time to work on it, and in a better way than just "don't deal with group renaming". Does this make sense to you?
On (14/02/17 01:57), fidencio wrote:
> @lslebodn:
> Firstly, my answer may be incomplete due to the lack of knowledge, but let's try ...
> 1) As far as I understand SSSD does not deal properly with multiple groups having the same GID and I'm saying that based on both AD's and LDAP's code, where the search is done by the GID and we expect only one result;
Yes, we expect that, but reality is different: we got
bug reports about incomplete groups,
and the bug investigation showed that the cause was colliding GIDs.
The current version detects that there is a GID collision
and will not return any result for the problematic groups.
> 2) We already have at least one bug opened for this situation (https://fedorahosted.org/sssd/ticket/2982) and in case we decide to deal properly with this my feeling is that it will have to be done in all different parts of the code.
> I understand why you're worried and I see we can hit this situation. But we can hit this situation even without my fix. So I'd like to propose to fix this situation when someone has time to work on this and in a better way than just "don't deal with group renaming".
Yes, we can hit this situation without your fix, but I am curious
what the difference will be between the current behaviour and the behaviour with this PR.
LS
On Tue, Feb 14, 2017 at 2:07 PM, lslebodn wrote:
> Yes we can hit this situation without your fix but I am curious
> what will be a difference between current behaviour and with this PR.
With this patch, we will end up removing the first group cached with that
GID and replacing it with the new one.
Yes, as you mentioned, it's a corner case. And yes, as you said, it may
bite us really hard in the future.
So, I'd like to ask for suggestions (@sbose?, @jhrozek?) on how to deal
with this.
In case we get bitten by one of those two bugs, which one would hurt less?
Also, it would be nice to see some bug reports about this (in case you have
those handy, @lslebodn).
Last but not least, @lslebodn suggested (in a face-to-face conversation in
the office) that maybe we could add an option which would be used for
fixing the group renaming for whoever reported this bug (and this option
wouldn't be enabled by default). Opinions on Lukáš' idea?
Best Regards,
On Tue, Feb 14, 2017 at 05:41:34AM -0800, fidencio wrote:
> With this patch we will end up removing one of first group cached with the
> gid and update with the new one.
> Yes, as you mentioned, it's a corner case. And yes, as you said, it may
> bite us really hard in the future.
> So, I'd like to ask for suggestions (@sbose ?, @jhrozek ?) on how to deal
> with this.
I wonder if a low-tech solution would help here. In case we hit this
codepath, issue a really loud debug message saying that a group was
renamed from X to Y: if the group was renamed on the server, this is
expected; otherwise, it's an error.
BTW, we should (unless we already do) check that requests by ID return
only one result.
> In case we get bitten by one of those two bugs, which one would hurt less?
> Also, would be nice to see some bug reports about this (in case you have
> those handy, @lslebodn).
I don't remember those off-hand, but I know there were some, and that's
the reason we added debug messages to the NSS responder informing about
ID duplicates.
> Last but not least, @lslebodn suggested (in face to face conversation in
> the office) that maybe we could add an option which would be used for
> fixing the group renaming for whoever reported this bug (and this option
> wouldn't be enabled by default). Opinions on Lukáš' idea?
I'm not sure. It does steer towards the safe side, but on the other
hand, renaming a group is a perfectly legitimate operation, and I'm not sure I
like an option that the admin must enable in order to proceed with an OK
operation.
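The low-tech approach suggested in the reply above (a loud message when a cached GID comes back under a new name, plus making sure a lookup by ID never silently returns one of several matches) could look roughly like the sketch below. It is a hypothetical, self-contained model: a plain array stands in for the sysdb cache, fprintf() to stderr stands in for SSSD's debug logging, and `cached_group`, `lookup_group_by_gid()` and `report_possible_rename()` are illustrative names, not SSSD functions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical, simplified cache entry (the real cache lives in sysdb/ldb). */
struct cached_group {
    const char *name;  /* group name as last stored in the cache */
    gid_t gid;         /* POSIX GID */
};

enum gid_lookup {
    GID_NOT_FOUND,   /* no cached entry has this GID */
    GID_UNIQUE,      /* exactly one cached entry has this GID */
    GID_DUPLICATE    /* several entries share this GID: refuse to answer */
};

/* Look up a group by GID and insist that the match is unique.
 * On GID_UNIQUE, *out points at the single matching entry; otherwise NULL. */
static enum gid_lookup lookup_group_by_gid(const struct cached_group *groups,
                                           size_t count, gid_t gid,
                                           const struct cached_group **out)
{
    size_t matches = 0;

    assert(out != NULL);
    *out = NULL;
    for (size_t i = 0; i < count; i++) {
        if (groups[i].gid == gid) {
            matches++;
            *out = &groups[i];
        }
    }
    if (matches > 1) {
        *out = NULL;  /* do not hand out an arbitrary one of the colliding entries */
        return GID_DUPLICATE;
    }
    return matches == 1 ? GID_UNIQUE : GID_NOT_FOUND;
}

/* Emit a loud message when the server returns a different name for a cached GID.
 * Returns 1 if a message was printed, 0 if the name is unchanged. */
static int report_possible_rename(const struct cached_group *cached,
                                  const char *new_name)
{
    if (strcmp(cached->name, new_name) == 0) {
        return 0;
    }
    fprintf(stderr,
            "WARNING: group with GID %lu was renamed from [%s] to [%s]; "
            "expected if it was renamed on the server, "
            "otherwise this indicates a GID collision\n",
            (unsigned long)cached->gid, cached->name, new_name);
    return 1;
}
```

Returning GID_DUPLICATE instead of picking one match mirrors the behaviour described earlier in the thread, where colliding GIDs produce no result for the problematic groups.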
On (14/02/17 08:17), Jakub Hrozek wrote:
> Last but not least, @lslebodn suggested (in face to face conversation in
> the office) that maybe we could add an option which would be used for
> fixing the group renaming for whoever reported this bug (and this option
> wouldn't be enabled by default). Opinions on Lukáš' idea?
> I'm not sure..it does steer towards the safe side, but on the other
> hand, renaming a group is a legally fine operation and I'm not sure I
> like an option that the admin must enable in order to proceed with an OK
> operation..
Renaming is fine.
But colliding UIDs/GIDs is not a rare situation,
especially if they use old clients (nss-ldap), which
do not cache entries and do not care about colliding IDs.
ATM we are quite safe in case of colliding IDs.
The main problem is whether this change might result
in more bug reports related to colliding IDs
(a group apparently being renamed very often). It might be difficult to
identify the cause.
BTW, IIRC this use case (colliding IDs) is quite common in /etc/passwd and /etc/group.
LS
So, as far as I remember, the conclusion about this patch is that we should also have a really loud debug message saying that the group has been renamed. Is everyone here in agreement about this? @lslebodn, @jhrozek, @sumit-bose
Force-pushed from 984cf33 to 6b927e3.
Just updated the patches, adding the loud debug message saying that the group has been renamed.
Force-pushed from 6b927e3 to eef7fba.
So I was thinking about this PR a bit more, and I'm no longer sure we can rename groups at will. I think we can only support that if we support multiple objects with the same ID (which we currently don't, and which would be a big task) or if we implement some heuristics to see that it's indeed the same, just renamed, group and not a conflict. So, I suggest that, before renaming the group, we check that at least one of these conditions applies:
If these don't apply, then we can't know whether the object is the same or different, and we throw an error. What do you think?
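A sketch of that check, assuming (as the later commit message in this PR says) that the hints are the group's original DN and its SID. The struct, enum and function names are hypothetical, and the exact string comparison is a simplification: real DNs may need normalization before being compared. The hints are kept in a small table and checked in one loop, so the comparison is written only once rather than copy-pasted per attribute.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical attribute set; the hints mirror those discussed (original DN, SID). */
struct group_attrs {
    const char *name;
    const char *original_dn;  /* may be NULL if unknown */
    const char *sid;          /* may be NULL, e.g. for plain LDAP groups */
};

enum rename_decision {
    RENAME_NOT_NEEDED,  /* names already match */
    RENAME_ALLOWED,     /* a stable hint shows it is the same object */
    RENAME_REFUSED      /* cannot tell a rename from a GID collision: error out */
};

/* One accessor per hint, so the comparison below is written only once. */
static const char *get_original_dn(const struct group_attrs *g) { return g->original_dn; }
static const char *get_sid(const struct group_attrs *g) { return g->sid; }

static enum rename_decision decide_rename(const struct group_attrs *cached,
                                          const struct group_attrs *incoming)
{
    static const char *(*const hints[])(const struct group_attrs *) = {
        get_original_dn,
        get_sid,
    };

    assert(cached != NULL && incoming != NULL);
    if (strcmp(cached->name, incoming->name) == 0) {
        return RENAME_NOT_NEEDED;
    }
    for (size_t i = 0; i < sizeof(hints) / sizeof(hints[0]); i++) {
        const char *a = hints[i](cached);
        const char *b = hints[i](incoming);
        /* A hint only counts if both sides have it and the values match exactly. */
        if (a != NULL && b != NULL && strcmp(a, b) == 0) {
            return RENAME_ALLOWED;
        }
    }
    return RENAME_REFUSED;
}
```

Because a hint only counts when both entries carry it, a group without a SID can still be matched by its original DN, while a totally different DN with no matching SID ends in RENAME_REFUSED, i.e. the error path.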
@jhrozek, your suggestion is good. I've updated the patch set, and I'm waiting for the CI results.
Force-pushed from eef7fba to 24e5381.
src/db/sysdb_ops.c (outdated)
        same_original_dn = true;
    } else {
        same_original_dn = false;
    }
Please avoid copy and paste here.
Have you considered a for loop?
I haven't. I'll re-work the code and re-submit the PR.
@jhrozek, @pbrezina, @sumit-bose, @mzidek-rh: could someone please take a look at #128 (comment)?
So, I don't like passing the provider from everywhere. But because this PR has been going on for so long, I actually prepared an alternative version where the data provider is part of sdap_options and is passed that way. Can you check my branch called "group_rename"? This way we could get rid of all the "SDAP: Pass struct data_provider to XYZ" patches.
Your version seems okay; please force-push it to my branch. Just a note: although I do appreciate that you took the time to come up with a better solution, next time, if possible, I'd strongly prefer that you give me your suggestion and let me rework the patch. That would help me better understand the parts of the project I still lack knowledge of.
OK, I'll prettify the patches and force-push. About just NACKing the patch: I wasn't sure you'd appreciate being told for the 10th time to rework the PR, and at the same time, I wasn't sure the approach was viable...
...viable until I actually sat down and coded the approach myself.
Hmm, I'm getting a permission error when pushing to your repository. Any chance you can just fetch my branch and push it into yours?
Force-pushed from f5432ea to 7cd45e6.
I've squashed your patches into mine and updated the PR. My request for help still stands: #128 (comment) and #128 (comment)
Have you considered changing the ldb expiration timestamps with pyldb or shelling out to ldbmodify?
I don't understand how it would help, Jakub. How would changing the ldb expiration timestamps with pyldb trigger sdap_handle_id_collision_for_incomplete_groups()?
Maybe I don't understand the problem. What is it you're trying to test, the memcache invalidation?
I really failed to understand how to write a test without calling When I reworked the patches, I didn't find an easy way to have sdap_handle_id_collision_for_incomplete_groups() triggered (sorry, I can't give you more details off the top of my head), and the test was not passing when removing the So, it seems there are a lot of things to be understood in this part, and I'd really need someone's help to do so. If you have the time and are willing to do this, please let me know. We can do this over IRC; just give me a one-day heads-up so I can try to remember the last state of the patch set, and then we can talk.
I would suggest sometime this week, because currently I do have the patch in my head :)
Here are the CI results for the current version of this patch set: http://vm-031.${abc}/logs/job/87/73/summary.html
There are some situations where, from the backend, the NSS responder will have to be notified to invalidate a group. In order to achieve this in a clean way, let's add the InvalidateGroupById handler and make use of it later in this very same series.
Related: https://pagure.io/SSSD/sssd/issue/2653
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
This function will be called from the data provider to the NSS responder, which will invalidate a group in the memcache.
Related: https://pagure.io/SSSD/sssd/issue/2653
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
This new error will be returned from sysdb_add_incomplete_group() when renaming a group would cause a GID collision.
Related: https://pagure.io/SSSD/sssd/issue/2653
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
In order to be able to use the Data Provider methods from the SDAP code to e.g. invalidate memcache when needed, add a new field to the sdap_options structure with the data_provider structure pointer. Fill the pointer value for all LDAP-based providers.
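The idea of reaching the Data Provider through the options structure, instead of adding a data_provider parameter to every SDAP function, can be illustrated with the reduced model below. The `_sketch` structs, their fields, `sdap_deep_helper()` and `fake_invalidate()` are hypothetical stand-ins, not the real `struct sdap_options` or `struct data_provider` layouts.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical, heavily reduced stand-in for the Data Provider. */
struct dp_sketch {
    /* Stand-in for a method asking the NSS responder to drop a group from the memcache. */
    int (*invalidate_group_by_gid)(struct dp_sketch *dp, gid_t gid);
    gid_t last_invalidated;  /* only used by the fake method below */
};

/* Hypothetical stand-in for the SDAP options structure. */
struct sdap_options_sketch {
    int some_ldap_option;   /* placeholder for the existing options */
    struct dp_sketch *dp;   /* back-pointer filled in when the provider is set up */
};

/* Deep SDAP-level code only needs the options it already receives,
 * instead of an extra data provider parameter threaded through every caller. */
static int sdap_deep_helper(struct sdap_options_sketch *opts, gid_t gid)
{
    assert(opts != NULL);
    if (opts->dp == NULL || opts->dp->invalidate_group_by_gid == NULL) {
        return -1;  /* provider not wired up */
    }
    return opts->dp->invalidate_group_by_gid(opts->dp, gid);
}

/* Example method implementation: just records which GID it was asked to invalidate. */
static int fake_invalidate(struct dp_sketch *dp, gid_t gid)
{
    dp->last_invalidated = gid;
    return 0;
}
```

The trade-off is that the options structure now carries a pointer that must be filled in for every LDAP-based provider, in exchange for not touching the signature of every function between the provider entry point and the code that needs it.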
This newly added function is a helper to properly handle group ID collisions when renaming incomplete groups. It:
- Deletes the group from sysdb
- Adds the new incomplete group
- Notifies the NSS responder that the entry also has to be deleted from the memory cache
This function will be called from sdap_ad_save_group_membership_with_idmapping() and from sdap_add_incomplete_groups().
Related: https://pagure.io/SSSD/sssd/issue/2653
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
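The three steps of this helper (delete, add, notify) can be modelled as follows. This is a hypothetical, self-contained sketch: a fixed-size array stands in for sysdb, and a function-pointer callback stands in for the Data Provider asking the NSS responder to invalidate the memory cache; `group_cache`, `handle_gid_collision()` and `invalidate_gid_fn` are illustrative names, not the signature of the real helper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

#define MAX_GROUPS 8

/* Hypothetical stand-in for a cached group entry. */
struct cache_entry {
    char name[64];
    gid_t gid;
    int incomplete;  /* 1 = only name and GID are known so far */
};

/* Hypothetical stand-in for the sysdb cache. */
struct group_cache {
    struct cache_entry entries[MAX_GROUPS];
    size_t count;
};

/* Stand-in for "tell the NSS responder to drop this GID from the memcache". */
typedef void (*invalidate_gid_fn)(gid_t gid, void *pvt);

/* Remove every entry with the given GID, keeping the others in order. */
static void cache_delete_by_gid(struct group_cache *cache, gid_t gid)
{
    size_t kept = 0;

    for (size_t i = 0; i < cache->count; i++) {
        if (cache->entries[i].gid != gid) {
            cache->entries[kept++] = cache->entries[i];
        }
    }
    cache->count = kept;
}

/* Append an incomplete group; returns -1 if the cache is full or the name too long. */
static int cache_add_incomplete(struct group_cache *cache, const char *name, gid_t gid)
{
    struct cache_entry *e;

    if (cache->count == MAX_GROUPS || strlen(name) >= sizeof(cache->entries[0].name)) {
        return -1;
    }
    e = &cache->entries[cache->count++];
    memcpy(e->name, name, strlen(name) + 1);
    e->gid = gid;
    e->incomplete = 1;
    return 0;
}

static int handle_gid_collision(struct group_cache *cache, const char *new_name,
                                gid_t gid, invalidate_gid_fn invalidate, void *pvt)
{
    int ret;

    assert(cache != NULL && new_name != NULL);
    cache_delete_by_gid(cache, gid);                  /* 1. delete the old group */
    ret = cache_add_incomplete(cache, new_name, gid); /* 2. add the new incomplete group */
    if (ret != 0) {
        return ret;
    }
    if (invalidate != NULL) {
        invalidate(gid, pvt);                         /* 3. notify the memcache owner */
    }
    return 0;
}

/* Example notifier: records the GID it was asked to invalidate into *pvt. */
static void record_invalidation(gid_t gid, void *pvt)
{
    *(gid_t *)pvt = gid;
}
```

Notifying only after the add succeeds avoids invalidating the memory cache for an operation that failed. In this sketch a failed add still leaves the old entry deleted, so a real implementation would likely want the delete and add to happen atomically.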
Resolves: https://pagure.io/SSSD/sssd/issue/2653
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
This situation can be hit when renaming a group. For now, let's just error this out so the caller can handle it properly on its own layer.
Related: https://pagure.io/SSSD/sssd/issue/2653
Signed-off-by: Fabiano Fidêncio <fidencio@redhat.com>
…initgroups
As we implemented the group renaming heuristics to rename only if we can use another "hint", like the original DN or the SID, to know the group is the same, this patch adds two tests (positive and negative) to make sure that a group with a totally different RDN, and hence a different originalDN, cannot be renamed, but a group whose name changed while the RDN stayed the same can be renamed.
Related: https://pagure.io/SSSD/sssd/issue/3282
Reviewed-by: Fabiano Fidêncio <fidencio@redhat.com>
I pushed the patch that distinguishes a group rename from a GID duplicate to https://github.com/jhrozek/sssd/tree/group_renaming
The rdn_same test now fails when I revert your patches. Thank you for spotting that the test was not hitting the codepath at all. I haven't run any downstream tests yet, just the integration and unit tests, but I would appreciate a code review.
… GID
Reviewed-by: Fabiano Fidêncio <fidencio@redhat.com>
Force-pushed from 7cd45e6 to d58f84b.
@jhrozek, I've updated the branch. Your tests are solid now. Now we're just waiting for the results of downstream tests to be sure that the latest patch in this series is not going to break something else.
I'm sorry about the delay. I had to spend some time triaging the downstream tests, because they were failing and, at the same time, our internal test engine at Red Hat had some scheduling issues. Nonetheless, the tests seem to be passing. The most relevant ones are the LDAP provider test (job ID 2432961) and the AD provider test (job ID 2441955); in both, there were either known failures or intermittent failures that went away on a manual re-run. ACK
Those two patches fix https://bugzilla.redhat.com/show_bug.cgi?id=1401241
The sssd.conf used in order to reproduce this issue looks like:
The reproducer can be found in the bug report.