hbac_eval_user_element returns incorrect group count, HBAC fails #4373

sssd-bot · 2020-05-02T13:37:10Z

Cloned from Pagure issue: https://pagure.io/SSSD/sssd/issue/3342

Created at 2017-03-17 03:56:57 by datakid
Closed at 2017-05-11 16:08:51 as duplicate
Assigned to nobody

env: CentOS 7.3, FreeIPA 4.4, sssd 1.15.1 from COPR

On the IPA server:

"ipa hbactest ..." returns TRUE, so everything seems set up correctly.

When I try to login to the test client, I get denied.

On the test client:

hbac_eval_user_element is returning a wrong value. According to sssd_domain.log it returns 25, but my test user is in 37 groups, as shown by "id username" on the IPA server. On the test client, "id username" returns 36 groups; the one missing is an IPA (not AD) group that was made for HBAC rules. I have sanitized logs available.
Running ldbsearch -H /var/lib/sss/db/cache_domain.com.ldb '(objectclass=user)' and finding the record in question shows the same 36 groups. The missing group shouldn't affect the ability to log in via HBAC.
"getent group (groupname)" works as expected. It is also worth noting that getent does list the user as a member of the group that is missing from the "id username" output.

For reference, the night before, the sssd service on the client was stopped, the cache deleted, and the service started again; after that, nobody accessed the server. I find that this is necessary for the cache to populate.

Attached are all the log files from around that time, with relevant domain removed.

Here are the results of id from both machines:

id from client

[root@vmts-linuxclient1 sss_repeat_logs]# id simpsonlachlan@domain.org.au
uid=1506(lsimpson@domain.org.au) gid=1506(lsimpson) groups=1506(lsimpson),1750692689(covene cohesion standard  users - end users@domain.org.au),1750673921(pmc-cxserver01 - data shares (m)@domain.org.au),1750673801(external - exchange 2010 users@domain.org.au),1750642254(wireless lan - securewireless1@domain.org.au),1750642900(secure file transfer users@domain.org.au),1750640132(internet access - general@domain.org.au),1750645701(next gen seq users group@domain.org.au),1750663628(sp-vccc-referencegroups@domain.org.au),1750692154(_sm_temp_researchers@domain.org.au),1750699277(pmc-res-cluster-user@domain.org.au),1750668949(vpn access - general@domain.org.au),1750603605(.all hospital staff@domain.org.au),1750689356(cnxs_users_pmc_prod@domain.org.au),1750636500(res_cancer genetics@domain.org.au),1750663624(sp-vccc-backofhouse@domain.org.au),1750625322(all domain staff@domain.org.au),1750688798(cnxs_users_pmc_uat@domain.org.au),1750639195(res_bioinformatics@domain.org.au),1750699276(pmc-res-rfc-admin@domain.org.au),1750642781(sp_dashboard_read@domain.org.au),1750699280(pmc-res-ipausers@domain.org.au),1750663625(sp-vccc-research@domain.org.au),1750688388(bioinf-cluster@domain.org.au),1750705769(bioinf_rstudio@domain.org.au),1750688773(res_v7000_vcfg@domain.org.au),1750661167(bioinf - team@domain.org.au),1750687331(bioinf_admins@domain.org.au),1750612248(crystal users@domain.org.au),1750636018(res_all staff@domain.org.au),1750687326(bioinf-staff@domain.org.au),1750600513(domain users@domain.org.au),1750652288(vc_res_users@domain.org.au),1750634622(allresearch@domain.org.au),1750688196(mol-path@domain.org.au),10007(ipa_bioinf_cluster),10005(ipa_bioinf_admins),10004(ipa_bioinf_staff)

id from server

[root@vmdv-linuxidm1 log]# id simpsonlachlan@domain.org.au
uid=1506(lsimpson@domain.org.au) gid=1506(lsimpson) groups=1506(lsimpson),1750673921(pmc-cxserver01 - data shares (m)@domain.org.au),1750642254(wireless lan - securewireless1@domain.org.au),1750642900(secure file transfer users@domain.org.au),1750645701(next gen seq users group@domain.org.au),1750668949(vpn access - general@domain.org.au),1750603605(.all hospital staff@domain.org.au),1750689356(cnxs_users_pmc_prod@domain.org.au),1750636500(res_cancer genetics@domain.org.au),1750663624(sp-vccc-backofhouse@domain.org.au),1750688798(cnxs_users_pmc_uat@domain.org.au),1750639195(res_bioinformatics@domain.org.au),1750663625(sp-vccc-research@domain.org.au),1750688388(bioinf-cluster@domain.org.au),1750705769(bioinf_rstudio@domain.org.au),1750688773(res_v7000_vcfg@domain.org.au),1750661167(bioinf - team@domain.org.au),1750636018(res_all staff@domain.org.au),1750687331(bioinf_admins@domain.org.au),1750600513(domain users@domain.org.au),1750652288(vc_res_users@domain.org.au),1750634622(allresearch@domain.org.au),1750688196(mol-path@domain.org.au),1750673801(external - exchange 2010 users@domain.org.au),1750625322(all domain staff@domain.org.au),1750642781(sp_dashboard_read@domain.org.au),1750663628(sp-vccc-referencegroups@domain.org.au),1750699277(pmc-res-cluster-user@domain.org.au),1750699280(pmc-res-ipausers@domain.org.au),10007(ipa_bioinf_cluster),1750699276(pmc-res-rfc-admin@domain.org.au),1750692689(covene cohesion standard  users - end users@domain.org.au),1750640132(internet access - general@domain.org.au),1750612248(crystal users@domain.org.au),1750692154(_sm_temp_researchers@domain.org.au),10005(ipa_bioinf_admins),1750687326(bioinf-staff@domain.org.au)
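
Comparing two outputs like the ones above by eye is error-prone. Here is a minimal Python sketch, assuming the "id" output format shown above, that parses the groups= field and reports which groups appear on only one side. The helper names parse_id_groups and diff_groups are hypothetical and not part of any SSSD or IPA tooling.

```python
import re

# Matches one "gid(name)" entry. Group names may themselves contain
# parentheses and spaces, so an entry is taken to end only at a ")" that is
# followed by ",<digits>(" or by the end of the string.
_ENTRY = re.compile(r"(\d+)\((.*?)\)(?=,\d+\(|$)")


def parse_id_groups(id_output: str) -> dict[int, str]:
    """Return {gid: group name} from one line of `id` output."""
    _, sep, groups = id_output.strip().partition("groups=")
    if not sep:
        raise ValueError("no 'groups=' field found")
    return {int(gid): name for gid, name in _ENTRY.findall(groups)}


def diff_groups(a: str, b: str) -> tuple[set[str], set[str]]:
    """Return (names only in a, names only in b), matched by gid."""
    ga, gb = parse_id_groups(a), parse_id_groups(b)
    only_a = {ga[g] for g in ga.keys() - gb.keys()}
    only_b = {gb[g] for g in gb.keys() - ga.keys()}
    return only_a, only_b


if __name__ == "__main__":
    # Shortened, made-up sample lines in the same format as the output above.
    client = "uid=1(u) gid=1(u) groups=1(u),20(data shares (m)@example.org),30(ipa_staff)"
    server = "uid=1(u) gid=1(u) groups=1(u),20(data shares (m)@example.org)"
    print(diff_groups(client, server))
```

Matching by gid rather than by name avoids false differences if the two machines ever render the same group name slightly differently.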

Comments

Comment from datakid at 2017-03-17 04:15:20

Here's the conf file

Comment from datakid at 2017-03-17 04:54:21

Looks like adding more than one file at a time doesn't work. Here's the sssd_nss.log from the relevant time stamps.

Comment from datakid at 2017-03-17 05:03:18

Adding the /var/log/dirsrv/slapd_domain/access log from the relevant timestamps.

Comment from lslebodn at 2017-03-17 12:13:31

Could you provide a little bit more info?

What does the problematic IPA HBAC rule look like?
Which group is problematic?

It will simplify analysis of the log files.

Is the problem just with sssd-1.15.1, or does the same problem occur with the default version of sssd on CentOS 7?

Comment from lslebodn at 2017-03-17 12:20:15

BTW, I can see a few errors in the logs:

1002:(Fri Mar 17 09:03:12 2017) [sssd[be[unixdev.domain.org.au]]] [ipa_sudo_fetch_rules_done] (0x0040): Received 1 sudo rules
1193:(Fri Mar 17 09:03:29 2017) [sssd[be[unixdev.domain.org.au]]] [sysdb_mod_group_member] (0x0080): ldb_modify failed: [No such attribute](16)[attribute 'member': no matching attribute value while deleting attribute on 'name=ipa_bioinf_staff@unixdev.domain.org.au,cn=groups,cn=unixdev.domain.org.au,cn=sysdb']
1194:(Fri Mar 17 09:03:29 2017) [sssd[be[unixdev.domain.org.au]]] [sysdb_error_to_errno] (0x0020): LDB returned unexpected error: [No such attribute]
1196:(Fri Mar 17 09:03:29 2017) [sssd[be[unixdev.domain.org.au]]] [sysdb_update_members_ex] (0x0020): Could not remove member [SimpsonLachlan@domain.org.au] from group [name=ipa_bioinf_staff@unixdev.domain.org.au,cn=groups,cn=unixdev.domain.org.au,cn=sysdb]. Skipping
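
To pull lines like these out of a large domain log, a small filter can help. The following Python sketch assumes the line layout visible in the excerpts above, "(timestamp) [component] [function] (0xLEVEL): message", optionally preceded by a "NNNN:" line number as printed by grep -n. It also assumes that smaller level values are more severe, which is consistent with the excerpts (0x0020 on error lines, 0x1000 on trace lines). The function names are hypothetical.

```python
import re
from collections import Counter

# Layout assumed from the excerpts above:
#   (timestamp) [component] [function] (0xLEVEL): message
# An optional "NNNN:" prefix (as printed by `grep -n`) is tolerated.
_LINE = re.compile(
    r"^(?:\d+:)?\((?P<ts>[^)]*)\) \[(?P<component>.+?)\] "
    r"\[(?P<function>\w+)\] \((?P<level>0x[0-9a-fA-F]+)\): (?P<message>.*)$"
)


def parse_line(line: str):
    """Return a dict of fields, or None if the line does not match."""
    m = _LINE.match(line.rstrip("\n"))
    if m is None:
        return None
    fields = m.groupdict()
    fields["level"] = int(fields["level"], 16)
    return fields


def failures_by_function(lines, max_level: int = 0x0080) -> Counter:
    """Count lines whose level is <= max_level, grouped by function name.

    Assumption: smaller level values mean more severe messages.
    """
    counts = Counter()
    for line in lines:
        fields = parse_line(line)
        if fields is not None and fields["level"] <= max_level:
            counts[fields["function"]] += 1
    return counts
```

Passing an open log file to failures_by_function gives a quick per-function tally of where the failures cluster before reading the log in detail.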

Comment from lslebodn at 2017-03-17 12:22:55

BTW, here are a few HBAC-related ones:

sh$ grep get_ipa_groupname sssd_unixdev.domain.org.au.log
(Fri Mar 17 09:03:37 2017) [sssd[be[unixdev.domain.org.au]]] [get_ipa_groupname] (0x1000): Parsing CN=Bioinf_RStudio,OU=Security Groups,DC=domain,DC=org,DC=au
(Fri Mar 17 09:03:37 2017) [sssd[be[unixdev.domain.org.au]]] [get_ipa_groupname] (0x0020): Expected cn in second component, got OU
(Fri Mar 17 09:03:37 2017) [sssd[be[unixdev.domain.org.au]]] [get_ipa_groupname] (0x1000): Parsing CN=.Research Bioinf Cluster,OU=Distribution Groups,OU=Research,OU=User Accounts,OU=User Accounts,DC=domain,DC=org,DC=au
(Fri Mar 17 09:03:37 2017) [sssd[be[unixdev.domain.org.au]]] [get_ipa_groupname] (0x0020): Expected cn in second component, got OU
(Fri Mar 17 09:03:37 2017) [sssd[be[unixdev.domain.org.au]]] [get_ipa_groupname] (0x1000): Parsing CN=res-compute-systems,CN=Users,DC=domain,DC=org,DC=au
(Fri Mar 17 09:03:37 2017) [sssd[be[unixdev.domain.org.au]]] [get_ipa_groupname] (0x0020): Expected groups second component, got Users

and many more
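
The messages above suggest that get_ipa_groupname expects a group DN whose first component is a cn and whose second component is cn=groups, which is the shape of a native IPA group DN, so AD-style DNs are rejected. A rough Python illustration of that kind of check follows. It is not the SSSD implementation: the function name ipa_group_name_from_dn is hypothetical, only the two conditions visible in the log are modelled (plus an analogous check on the first component), and the naive comma split ignores escaped commas.

```python
def ipa_group_name_from_dn(dn: str) -> str:
    """Return the group name if `dn` looks like cn=<name>,cn=groups,...

    Raises ValueError with a message modelled on the log lines otherwise.
    """
    # Naive parsing: split into RDNs on ",", then each RDN on the first "=".
    rdns = [part.strip().partition("=") for part in dn.split(",")]
    if len(rdns) < 2 or any(sep != "=" for _, sep, _ in rdns):
        raise ValueError(f"Cannot parse DN: {dn}")

    (attr1, _, name), (attr2, _, value2) = rdns[0], rdns[1]
    if attr1.lower() != "cn":
        raise ValueError(f"Expected cn in first component, got {attr1}")
    if attr2.lower() != "cn":
        raise ValueError(f"Expected cn in second component, got {attr2}")
    if value2.lower() != "groups":
        raise ValueError(f"Expected groups second component, got {value2}")
    return name


if __name__ == "__main__":
    for dn in (
        "cn=ipa_bioinf_admins,cn=groups,cn=accounts,dc=example,dc=org",  # example IPA-style DN
        "CN=Bioinf_RStudio,OU=Security Groups,DC=domain,DC=org,DC=au",
        "CN=res-compute-systems,CN=Users,DC=domain,DC=org,DC=au",
    ):
        try:
            print(ipa_group_name_from_dn(dn))
        except ValueError as err:
            print(err)
```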

Comment from datakid at 2017-03-19 22:13:00

> Could you provide a little bit more info?
>
> What does the problematic IPA HBAC rule look like?
> Which group is problematic?
>
> It will simplify analysis of the log files.
> Is the problem just with sssd-1.15.1, or does the same problem occur with the default version of sssd on CentOS 7?

Absolutely. This problem has existed for us since SSSD 1.13 iirc. That's why we've been using COPR. We are set up in a one way trust with the AD system.

This issue doesn't happen with "allow all" and we can successfully authenticate and work with allow all. We have been using it for almost a year.

The issue happens with all HBAC rules we've tried over the year. The rule I've chosen is arbitrary, but here are that particular rule's details:

[root@vmdv-linuxidm1 ~]# ipa hbacrule-show
Rule name: access to unrestricted
Rule name: access to unrestricted
Description: Gives users login access to unrestricted servers
Enabled: TRUE
User Groups: ipa_bioinf_admins, ipa_bioinf_cluster, ipa_bioinf_staff
Host Groups: unrestricted
Services: login, sshd

Those three user groups are mapped from AD groups and are set up as per the documentation. The user in question is a member of the AD group Bioinf Admins, which is an external member of external_bioinf_admins, which in turn is a member of the POSIX group ipa_bioinf_admins.
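
To illustrate why an incomplete group list leads to a denial with a rule like this one, here is a simplified Python model of nested membership resolution and rule matching. It is only a sketch under stated assumptions: the Rule class and both functions are hypothetical, and real HBAC evaluation handles more cases (for example "all" categories and directly listed users and hosts) that are left out here.

```python
from dataclasses import dataclass


def effective_groups(direct: set[str], member_of: dict[str, set[str]]) -> set[str]:
    """Expand direct memberships transitively through nested groups."""
    seen: set[str] = set()
    stack = list(direct)
    while stack:
        group = stack.pop()
        if group not in seen:
            seen.add(group)
            stack.extend(member_of.get(group, ()))
    return seen


@dataclass
class Rule:
    user_groups: set[str]
    host_groups: set[str]
    services: set[str]
    enabled: bool = True


def rule_allows(rule: Rule, user_groups: set[str], host_groups: set[str], service: str) -> bool:
    """A rule matches only if user, host and service all match."""
    return (
        rule.enabled
        and bool(rule.user_groups & user_groups)
        and bool(rule.host_groups & host_groups)
        and service in rule.services
    )


if __name__ == "__main__":
    # Nesting as described above: AD group -> external group -> POSIX group.
    member_of = {
        "Bioinf Admins": {"external_bioinf_admins"},
        "external_bioinf_admins": {"ipa_bioinf_admins"},
    }
    rule = Rule(
        user_groups={"ipa_bioinf_admins", "ipa_bioinf_cluster", "ipa_bioinf_staff"},
        host_groups={"unrestricted"},
        services={"login", "sshd"},
    )
    resolved = effective_groups({"Bioinf Admins"}, member_of)
    unresolved = {"Bioinf Admins"}  # nested memberships not followed
    print(rule_allows(rule, resolved, {"unrestricted"}, "sshd"))
    print(rule_allows(rule, unresolved, {"unrestricted"}, "sshd"))
```

In this model the rule matches only when the evaluator sees the fully resolved set, so a client that evaluates the rule against a partial group list would deny the login even though the server-side test succeeds.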

But, as I noted, this is not exclusive to that group, it just happens to be the group I've used to test this time around. Am happy to show this error with other groups.

Comment from datakid at 2017-03-19 22:49:07

I note that this pdf re legacy clients and IPA 3.3 makes a mention of using ipa-adtrust-install --enable-compat and creating an HBAC service system-auth. We haven't done either (not in the recent docs, to the best of my knowledge) - should we have?

Comment from jhrozek at 2017-03-20 09:38:53

Unless you use a legacy client, then no, you don't have to worry about system-auth in HBAC, it's really only used by the slapi-nis plugin.

Comment from jhrozek at 2017-04-25 22:51:20

Hi,
I'm sorry it took us so long to get back to you. Could you please try these test builds? They are based atop the 7.3 baseline:
https://copr.fedorainfracloud.org/coprs/jhrozek/sssd-hbac-7.3/

Please let me know if you'd like me to build the packages for a different distribution.

Comment from datakid at 2017-05-01 03:31:42

It's ok, I've only just got back from annual leave. I'll have to build a new server - we are currently tracking sssd 1.15 from COPR

Comment from jhrozek at 2017-05-01 21:59:00

Please don't, if you are already running 1.15, then I'll build you a different test repo instead. The patch applies seamlessly for both.

Comment from datakid at 2017-05-02 03:57:43

ok, great - thanks.

Comment from jhrozek at 2017-05-02 14:51:04

Please try these builds, I hope I got the release number right so it would be easy to upgrade from these:
https://jhrozek.fedorapeople.org/sssd-test-builds/sssd-1-15-hbac-no-orig-mbof/

Comment from datakid at 2017-05-05 03:07:13

That worked really well. TBH, I've never seen sssd work so quickly and have HBAC work out of the box (as a result of the changes, I presume). Do you want any info?

All I did was install the rpms you supplied onto the test client, stopped sssd, cleared the /var/lib/sss/db, rebooted. It just worked.

I need to do more testing. I've never seen it work so quickly.

Comment from jhrozek at 2017-05-05 09:44:55

I'm actually surprised, because the new builds shouldn't be noticeably faster. Are you sure sssd is not "just" in offline mode, answering everything from cache? (The debug logs would tell.)

Anyway, I'm glad HBAC works for you now.

Comment from lslebodn at 2017-05-05 10:09:39

> All I did was install the rpms you supplied onto the test client, stopped sssd, cleared the /var/lib/sss/db, rebooted. It just worked.

It could not work in offline mode :-)

But it would be good to know more details about the difference. What do you mean by "it works so quickly"? Within a second? How long did it take previously?

Comment from datakid at 2017-05-08 02:01:32

@jhrozek @lslebodn Yep, the caches have been deleted so it's not "offline".

Previous experiences:

with COPR 1.14.x: HBAC was intermittent, logins denied meant we couldn't use at all in production

with COPR 1.15.2: With more investigation, time and understanding of how FreeIPA/SSSD works, HBAC seemed to work, but we needed to wait.

What we found was that users couldn't log in straight away, and sometimes not at all. If we "prompted" the SSSD installation (i.e. tried to log in or ran "id user@domain" on the machine), then we could usually log in 5-10 minutes later. Not always though. Sometimes it would take a day or a couple of days for the login to work properly - I always thought this was while the local cache properly mirrored the AD.

When I say login here I mean "complete the login process successfully". What we were seeing was SSSD returning the "password" prompt quite quickly and then failing relatively quickly.

The impression I got was that - because of the results we were seeing with the prompting - SSSD wasn't correctly going back to IPA to ask every time. When I looked into it, I noticed that hbac_eval_user was not returning the right number of groups. This was the only obvious error/issue - apart from the resulting symptom "users can't login".

But deleting the cache didn't improve the situation, it made it worse, because the local cache didn't seem to cache the whole AD LDAP, nor interrogate the AD long enough to get a full record for any particular user.

What I'm seeing now is that a user who has never logged into a machine (a machine with a cleared cache) can log in successfully after a 10-20 second pause on the command line before the password prompt, and when I look into the logs, hbac_eval_user is returning the correct groups.

So, the TLDR:

the login process is longer (i.e. the time between the "ssh user@server.domain.com" request and the "password" prompt) - from a few seconds to maybe 10-20 or 20-30 seconds. This login is successful.
a successful login is quicker - because while the login process is slower, there is no wait between "prompting" and a successful login (which we were seeing take between 5 minutes and 3 days for first login).

Happy to clarify if this doesn't make sense.

Comment from jhrozek at 2017-05-11 16:06:53

The performance is in my opinion unrelated. In general, logins should reflect the group membership pretty much always.

Unless you apply some tuning, and especially if your AD groups are large, it might take some time during login to cache the groups and, in particular, their members.

But since it seems HBAC works for you reliably now, I would prefer to close this ticket now as a duplicate of ticket #3382, mostly because that one is also linked to a downstream RHEL bugzilla.

Comment from jhrozek at 2017-05-11 16:08:58

Metadata Update from @jhrozek:

Issue close_status updated to: duplicate
Issue status updated to: Closed (was: Open)

sssd-bot added the Closed: Duplicate label May 2, 2020

sssd-bot closed this as completed May 2, 2020
