LDAP authentication with active directory [$10] #1491

Closed
tpetrosy opened this Issue Nov 25, 2015 · 97 comments

@tpetrosy

Hello,
We are trying to integrate Rocket.Chat with AD using LDAP.
Login works, but we have a problem with active sessions.
It seems main.js creates a new session with the LDAP server for each user login and keeps the connection up.
After 15 minutes the LDAP server sends an RST packet to the application and drops the established connection.
As soon as the LDAP server drops the session with the application, all connected clients lose their connection to the Rocket.Chat server.
Here is what I get from the logs when it happens:

Error: read ECONNRESET
at errnoException (net.js:905:11)
at TCP.onread (net.js:559:19)

/var/www/rocket.chat/bundle/programs/server/packages/meteor.js:974
throw new Error("Meteor code must always run within a Fiber. " +
^
Error: Meteor code must always run within a Fiber. Try wrapping callbacks that you pass to non-Meteor libraries with Meteor.bindEnvironment.
at Object.Meteor.nodeCodeMustBeInFiber (packages/meteor/dynamics_nodejs.js:9:1)
at [object Object]._.extend.get (packages/meteor/dynamics_nodejs.js:21:1)
at Object.Meteor.isRestricted (packages/dispatch_run-as-user/packages/dispatch_run-as-user.js:137:1)
at [object Object].Mongo.Collection.(anonymous function) [as update]
at Object.UserPresence.removeConnectionsByInstanceId (packages/konecty_user-presence/packages/konecty_user-presence.js:88:1)
at process.<anonymous> (packages/konecty_user-presence/packages/konecty_user-presence.js:223:1)
at process.emit (events.js:117:20)
at process.exit (node.js:740:17)
at process.catchException (/usr/lib/node_modules/pm2/node_modules/pmx/lib/notify.js:52:15)
at process.g (events.js:180:16)

There is a $10 open bounty on this issue. Add to the bounty at Bountysource.

@srevereault

Hello,
I can confirm I have the same problem:
Error: read ECONNRESET
at errnoException (net.js:905:11)
at TCP.onread (net.js:559:19)

RocketChat worked fine until I connected it to an AD server. I'm running the latest Docker image with docker-compose.

@tpetrosy

Hi,
I tried with the latest version from GitHub and got the same problem: the daemon drops all user connections.

@Megatronic79

I’ve just updated to the latest build and LDAP authentication is still working properly and no crashes.

Just now upgraded the production version and same result: LDAP continues to work and RC is stable. This is 2003 and it's a non-Docker install.

So, not being able to reproduce: was LDAP working for you at all? Is the time on the chat server and the AD server in sync? Can you test your LDAP query in Apache Directory Studio to confirm it's correct?
I've also seen some similar issues before with crashes when AD is used with a local location for avatars. Have your users changed avatars?
Did you set the LDAP sync option?

@tpetrosy

Hello,
Thank you for your reply. Here are the answers to your questions.

  1. I am using AD 2008 R2.
  2. I tried with both a Docker install and a non-Docker install.
  3. Time sync is OK on both servers.
  4. The LDAP query is correct (I am able to log in; and even if the query were not correct, why should Rocket.Chat crash because of that?)
  5. Nobody has tried to change avatars.
  6. I tried with LDAP sync and without; the result is the same.

Here is more info which may be useful.

  1. Rocket.Chat opens a connection to the LDAP server for every login attempt and doesn't close it (even if I use a bad user and password). I was able to open more than 200 connections just by clicking the login button, which is a vulnerability.
  2. On the non-Docker installation the application crashes anyway, but it recovers by itself after 2-3 seconds.
    Below is the output of a monitoring loop, sampled every 2 seconds, from when it happens (login time is 21:48:55, crash time is 22:03:38).

Connection status:
Thu Nov 26 22:03:35 UTC 2015
tcp 0 0 10.136.2.161:39138 10.136.0.101:389 ESTABLISHED 12766/main.js
Thu Nov 26 22:03:37 UTC 2015
tcp 0 0 10.136.2.161:39138 10.136.0.101:389 ESTABLISHED 12766/main.js
Thu Nov 26 22:03:39 UTC 2015
Thu Nov 26 22:03:41 UTC 2015


Open port status:
Thu Nov 26 22:03:35 UTC 2015
tcp 0 0 0.0.0.0:3000 0.0.0.0:* LISTEN 12766/main.js
Thu Nov 26 22:03:37 UTC 2015
tcp 0 0 0.0.0.0:3000 0.0.0.0:* LISTEN 12766/main.js
Thu Nov 26 22:03:39 UTC 2015
Thu Nov 26 22:03:41 UTC 2015
tcp 0 0 0.0.0.0:3000 0.0.0.0:* LISTEN 22220/main.js
Thu Nov 26 22:03:43 UTC 2015
tcp 0 0 0.0.0.0:3000 0.0.0.0:* LISTEN 22220/main.js


Process list status:
Thu Nov 26 22:03:35 UTC 2015
12766 Ssl 0:15 node /var/www/rocket.chat/bundle/main.js
Thu Nov 26 22:03:37 UTC 2015
12766 Ssl 0:15 node /var/www/rocket.chat/bundle/main.js
Thu Nov 26 22:03:39 UTC 2015
22220 Rsl 0:01 node /var/www/rocket.chat/bundle/main.js
Thu Nov 26 22:03:41 UTC 2015
22220 Ssl 0:03 node /var/www/rocket.chat/bundle/main.js


Error output from connected client browser:
GET http://rocketchat.myorg.org:3000/sockjs/info?cb=z9110hil5c 503 (Service Unavailable)
y._start @ 8924e40c19fca54679d60bca73797c01ec281b56.js?meteor_js_resource=true:45
(anonymous function) @ 8924e40c19fca54679d60bca73797c01ec281b56.js?meteor_js_resource=true:45


You can reproduce the case quickly with tcpkill: run it and then try to log in. It will close the newly created session between the application and the LDAP server, and Rocket.Chat will crash on every login attempt.

@Megatronic79

If using LDAP with wrong credentials is letting you in to RC, then it looks like there is an error in the filter and it's not actually doing any LDAP. (I've seen this when no bind is used.)

Can you post your Bind Search field? (Change the username and password, of course.)

And can you check that your RC server can actually resolve your AD server OK?

@tpetrosy

OK, the Bind Search is below.

But if the filter is not correct, why does the login page work correctly?
When I enter a correct username and password I get into the system; if not, it gives me a bad username or password message.

{"filter": "(&(objectCategory=person)(objectclass=user)(memberOf=CN=GRP,OU=Groups,OU=LCC,DC=myorg,DC=org)(mail=#{username}))", "scope": "sub", "userDN": "myuser", "password": "mypassword"}

@Megatronic79

I think there is a default action to accept the logon when the initial LDAP bind fails, probably left over from testing. @rodrigok is best placed to answer that one. We should probably add some form of test to confirm the bind is successful before accepting the changes.

So looking at your bind you are using email to login with right?

Here is one from my dev server: (This one can either authenticate with Email or Username and must be in the group called RC_Users)

In this case:

Bind Account (proxy user) = rocket.service@domain.com
Proxy user password = password
Domain = domain.com
Group = RC_Users,OU=Services,DC=domain,DC=com

{"filter": "(&(objectCategory=person)(objectclass=user)(memberOf=CN=RC_Users,OU=Services,DC=domain,DC=com)(|(mail=#{username})(sAMAccountName=#{username})))", "scope": "sub", "userDN": "rocket.service@domain.com", "password": "password"}
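For anyone adapting these Bind Search strings, here is a minimal sketch of how the #{username} placeholder could be expanded into a concrete filter. It is not Rocket.Chat's actual code: `escapeFilterValue` and `expandBindSearch` are hypothetical helpers, and the escaping of `\`, `*`, `(`, `)` and NUL follows RFC 4515, which is an assumption about what a careful implementation would do.

```javascript
// Hypothetical helpers, not taken from Rocket.Chat.
// Escape the characters RFC 4515 treats as special inside a filter value.
function escapeFilterValue(value) {
  return String(value).replace(/[\\*()\0]/g, (ch) =>
    '\\' + ch.charCodeAt(0).toString(16).padStart(2, '0'));
}

// Parse a Bind Search JSON string and substitute every #{username}.
function expandBindSearch(bindSearchJson, username) {
  const template = JSON.parse(bindSearchJson);
  return {
    filter: template.filter.split('#{username}').join(escapeFilterValue(username)),
    scope: template.scope,
  };
}

const bindSearchJson = JSON.stringify({
  filter: '(&(objectclass=user)(|(mail=#{username})(sAMAccountName=#{username})))',
  scope: 'sub',
});

const expanded = expandBindSearch(bindSearchJson, 'jo*hn(admin)');
console.log(expanded.filter);
```

Escaping matters here because the login name comes straight from the login form; an unescaped `*` or `)` would change the meaning of the filter.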

@tpetrosy

RC doesn't let me into the system with wrong credentials.
It just creates a TCP session with the LDAP server and keeps it established; it creates a new TCP session for every login attempt.

@tpetrosy

Yes, you are right, I use email as the user login parameter and authentication works fine.

@Megatronic79

ok so the issue right now for you is that after 15 minutes the session is dropped for all connected users?

@Megatronic79

and there appears to be no handling of the closure of an LDAP query?

@tpetrosy

I see 2 issues here.

  1. RC creates a new TCP session with the LDAP server for every login attempt and keeps it open.
  2. As soon as the LDAP server closes one of the established connections with RC, RC crashes and restarts.

@Megatronic79

Are you using LDAPS? or LDAP?

@tpetrosy

I am using LDAP

@tpetrosy

One difference is that I did a manual installation inside a Docker container I created myself. I am not sure if this can be the reason.

@Megatronic79

Strange, I can only see an initial 389 connection to the DC, if I kill it the one session will reconnect but all other sessions are still up.

@Megatronic79

I'll try a few things and see if I can reproduce it.

@tpetrosy

Thanks a lot!

@Megatronic79

Ok I can confirm the non closure of ldap connections even on failed logon.

netstat -antup | grep 389

increases over time for the same process if I enter an incorrect password over and over, so we need to check the connection clean-up there.

I'm still not able to reproduce the session closure for all users though. I can of course manually kill -9 the connection, and this will force all clients to terminate and reconnect within a few seconds, but I'm not clear why the server side is resetting this connection. Do you have any throttling or limits set on the server side? How many connections are active before it resets?
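To watch the build-up without counting lines by eye, something like the sketch below could be run over captured netstat output. `countLdapConnections` is a hypothetical helper, and the column layout (proto, recv-q, send-q, local address, foreign address, state, pid/program) is assumed from the netstat samples posted earlier in this thread.

```javascript
// Count ESTABLISHED connections to a remote port, grouped by owning process,
// from `netstat -antup` text. Header lines and non-tcp lines are skipped.
function countLdapConnections(netstatOutput, port = 389) {
  const counts = new Map();
  for (const line of netstatOutput.split('\n')) {
    const cols = line.trim().split(/\s+/);
    if (cols.length < 7 || !cols[0].startsWith('tcp')) continue;
    const [, , , , foreign, state, owner] = cols;
    if (state !== 'ESTABLISHED' || !foreign.endsWith(':' + port)) continue;
    counts.set(owner, (counts.get(owner) || 0) + 1);
  }
  return counts;
}

// Sample lines in the same shape as the output posted above.
const sample = [
  'tcp 0 0 10.136.2.161:39138 10.136.0.101:389 ESTABLISHED 12766/main.js',
  'tcp 0 0 10.136.2.161:39140 10.136.0.101:389 ESTABLISHED 12766/main.js',
  'tcp 0 0 0.0.0.0:3000 0.0.0.0:* LISTEN 12766/main.js',
].join('\n');

console.log(countLdapConnections(sample));
```

If the count for the same PID keeps growing after each failed login, the connections are not being released.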

@tpetrosy

OK, here is what I was able to get from AD.
In the LDAP connection policy we have the parameter
MaxConnIdleTime 900
I think this causes the session closure.
Anyway, I think RC should either open a TCP connection and close it once authentication is done, because it will not use it any more after that, or open a few connections and reuse them only for new authentication requests.

Now, how to reproduce the connection closure. Killing the process will not give you what you are looking for.

  1. Log in to RC and keep the session open.
  2. Run this command on your RC Linux host:
    tcpkill -i eth0 host "ldapserverIP"
  3. Then open a new browser and try to log in.

tcpkill will detect TCP activity from RC to AD and will send an RST to both hosts. This forces the hosts to close the TCP session immediately.

This will crash RC, and you will see that both clients lose their connection to the server.
After a few seconds the server recovers by itself (you can see that the process PID has changed), and then the clients reconnect to the server.

@tpetrosy

As for the number of active sessions, it doesn't matter; it happens even with 1 session.

@Megatronic79

Ok then we need to be looking at why your server is terminating the connection? Sure we can handle this better with ldap but I cannot see the behaviour you are seeing with any of the deployments with AD.

Anyone else able to reproduce this?

@Megatronic79

I'll create a fresh install this weekend to see if I get the same results.

@tpetrosy

As I said,
on AD the LDAP connection policy is
MaxConnIdleTime 900
and this causes the connection to be closed.
The problem is not why the server closes the connection; the problem is why RC crashes because of it.
The connection could be closed for many reasons (network problem, restart of AD, etc.).
Why does all of RC restart because of that? :)

@Megatronic79

I also have the same default timeouts but no crashes.. that's why I said I will create a fresh one and see if I can reproduce it.

I just jumped onto the dev domain, logged on as user X on RC - then completely disconnected the Virtual DC from the domain just after logon (to simulate network problem, restart of AD etc..) and the user did not disconnect nor did RC crash - of course no more users can logon at that stage (as expected) but I do not see any crashes when a DC is removed from RC.

@Megatronic79

if I use tcpkill -i eth0 host "ldapserverIP" and let it listen and kill the connection as it comes in then yep I can confirm RC crashes in that instance, so that defo needs further investigation and should be marked as a BUG. @rodrigok is the best to comment on the ldap code.

@Megatronic79

Related to auto-connect on socket error (seems undocumented but possible to handle it gracefully?)

mcavage/node-ldapjs#318

@engelgabriel engelgabriel changed the title from LDAP authentication with active directory to LDAP authentication with active directory [$10] Dec 3, 2015
@engelgabriel engelgabriel added the bounty label Dec 3, 2015
@switchdk
switchdk commented Dec 9, 2015

I can confirm the exact same problem using the latest RC build using Docker and non-Docker deployment connecting to AD with LDAP and LDAPS. After 15 minutes, the connection is reset and RC crashes with error in OP.

@switchdk
switchdk commented Dec 9, 2015

Additionally with reference to these settings mentioned above:

  In this case:

  Bind Account (proxy user) = rocket.service@domain.com
  Proxy user password = password
  Domain = domain.com
  Group = RC_Users,OU=Services,DC=domain,DC=com
  {"filter": "(&(objectCategory=person)(objectclass=user)(memberOf=CN=RC_Users,OU=Services,DC=domain,DC=com)(|(mail=#{username})(sAMAccountName=#{username})))", "scope": "sub", "userDN": "rocket.service@domain.com", "password": "password"}

I can't find those settings in RC. I have attached a screenshot of the default LDAP config screen.

screenshot

@Megatronic79

examples are here:

https://github.com/RocketChat/Rocket.Chat/wiki/LDAP-Authentication

But from the examples I sent before:

The bind search would then be:

{"filter": "(&(objectCategory=person)(objectclass=user)(memberOf=CN=RC_Users,OU=Services,DC=domain,DC=com)(|(mail=#{username})(sAMAccountName=#{username})))", "scope": "sub", "userDN": "rocket.service@domain.com", "password": "password"}

Distinguished name:

dc=domain,dc=com

If you replace those parts with yours it would work.

So are you able to log on with LDAP?

We have identified a bug in the LDAP code: when a reset is sent to the RC server, in some instances it crashes RC. I've not observed it myself in production, but many have. (We can simulate it, though, if we force a reset with tcpkill as shown above.)

What version of AD are you running?

@Megatronic79

I've just tested this with a 2008 R2 Server and can reproduce the ECONNRESET after the timeout period. This doesn't happen on 2003\2003R2 DC servers so there must be a difference in how idle connections are dealt with from the server side (i.e. sending a RST).

Obviously RC shouldn't crash when the LDAP server terminates the connection, and I think this issue is down to how it's dealt with on the client side, possibly as mentioned here:

mcavage/node-ldapjs#318

@switchdk
switchdk commented Dec 9, 2015

Thanks for your reply. I have followed the wiki LDAP authentication but the settings on the Wiki and the settings you suggest, do not all exist in the current build of RC. I have however adapted the suggested settings to the currently available options in RC:

Bind Search: {"filter": "(&(objectCategory=person)(objectclass=user)(memberOf=CN=Domain Users,CN=Users,DC=mycorp,DC=com)(sAMAccountName=#{username}))", "scope": "sub", "userDN": "serviceaccount@mycorp.com", "password": "password"}
Distinguished Name (DN): dc=mycorp,dc=com
Enable LDAP: True
LDAP Port: 389
Sync Data: False
User Data Field Map: {"cn":"name", "mail":"email"}
LDAP URL: ldap://mydc.mycorp.com

I believe we are running Forest Level 2008R2. I hope this can help with troubleshooting.
Thanks

@Megatronic79

Oh I see, the wiki is probably missing the new stuff like Sync Data and User data field map.

The two entries I gave you were the ones missing from your screenshot; the rest you had filled in or were set to their defaults.

Unfortunately I suspect using 2008R2 server will have the same RST issue and so expect the connection to drop at the timeout period of 15 minutes..

I will test 2012 DC today, currently openldap and 2003 DC servers seem to be ok.

@switchdk
switchdk commented Dec 9, 2015

Thank you for investigating.

@Megatronic79

Just confirmed this also applies to 2012R2 server :(

@miscs
miscs commented Dec 9, 2015

We also have server crashes using LDAP with sync activated. We do use an LDAP Server, no AD.
We are using latest docker images.

@Megatronic79

Using the sync in particular or LDAP in general?

Which LDAP server are you using?

@Megatronic79

and assuming you are able to logon with your LDAP credentials? - we have found if any of the bind search is wrong it will also cause RC to fall over and restart.

@miscs
miscs commented Dec 9, 2015

I haven't tested with LDAP only. Currently we have sync activated. Our backend is an OpenLDAP server. In general we are able to log in using LDAP, but in the logs I found the following, which may help:

Bind before search rocketchat_service@corp.xxx.com xxx
LDAP search dn DC=corp,DC=xxx,DC=com
LDAP search options { filter: '(&(objectCategory=person)(objectclass=user)(memberOf=CN=Employees,OU=Groups,DC=corp,DC=xxx,DC=com)(|(mail=xxx)(sAMAccountName=xxx)))',
scope: 'sub' }
Attempt to bind DC=corp,DC=xxx,DC=com

events.js:72
throw er; // Unhandled 'error' event
^
OperationsError: 00002020: Operation unavailable without authentication
at messageCallback (/app/bundle/programs/server/npm/rocketchat_ldap/node_modules/ldapjs/lib/client/client.js:1419:45)
at Parser.onMessage (/app/bundle/programs/server/npm/rocketchat_ldap/node_modules/ldapjs/lib/client/client.js:1089:14)
at Parser.emit (events.js:95:17)
at Parser.write (/app/bundle/programs/server/npm/rocketchat_ldap/node_modules/ldapjs/lib/messages/parser.js:117:8)
at Socket.onData (/app/bundle/programs/server/npm/rocketchat_ldap/node_modules/ldapjs/lib/client/client.js:1076:22)
at Socket.emit (events.js:95:17)
at Socket.<anonymous> (_stream_readable.js:765:14)
at Socket.emit (events.js:92:17)
at emitReadable_ (_stream_readable.js:427:10)
at emitReadable (_stream_readable.js:423:5)

@Megatronic79

a quick glance, are you sure sAMAccountName is correct for openldap and not uid?

Also memberOf is not natively part of OpenLDAP?

@miscs
miscs commented Dec 9, 2015

I am sorry, we use samba4 for authentication. I just checked with our admin.
Authentication is generally working: we all use LDAP to log in to Rocket.Chat and many other systems with the same settings.
I just tested a few things:
I can use my "uid" to log in successfully ("miscs").
But when I try to log in with miscs@xxx.com the server crashes.

Hope that helps.

@Megatronic79

what is the samba server saying about the incoming authentication requests?

@miscs
miscs commented Dec 10, 2015

You were right!
When I authenticate with uid ("miscs"), samba is able to map my user:
[2015/12/10 09:30:45.634391, 3] ../source4/auth/ntlm/auth.c:270(auth_check_password_send)
auth_check_password_send: Checking password for unmapped user [XXX]\[miscs]@[(null)]
auth_check_password_send: mapped user is: [XXX]\[miscs]@[(null)]

When I authenticate with miscs@xxx.com, samba is not able to map my user and Rocket.Chat crashes:
[2015/12/10 09:27:27.409133, 3] ../source4/auth/ntlm/auth.c:270(auth_check_password_send)
auth_check_password_send: Checking password for unmapped user [XXX]\[]@[(null)]
auth_check_password_send: mapped user is: [XXX]\[]@[(null)]

@Megatronic79

Thanks for the update miscs.

@Sing-Li @rodrigok - I'm guessing based on the issues here that in our implementation of the LDAP we are not catching any errors and RC then crashes when one is thrown? (thrown when config is incorrect, thrown when a RST is sent back from the ldap server?)
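The "Unhandled 'error' event" line in the traces above is standard Node.js behaviour: an EventEmitter that emits 'error' with no listener attached throws. A minimal sketch of that mechanism follows, using a plain EventEmitter as a stand-in for the LDAP client (the real client and where Rocket.Chat would attach the listener are not shown here, so treat `simulateReset` as purely illustrative).

```javascript
const { EventEmitter } = require('events');

// Stand-in for an LDAP client whose socket was reset by the server.
function simulateReset(client) {
  const err = new Error('read ECONNRESET');
  err.code = 'ECONNRESET';
  client.emit('error', err);
}

// 1. No 'error' listener: emit() throws. In a real server this happens
//    inside a socket callback, so nothing catches it and the process exits.
const unguarded = new EventEmitter();
let crashed = false;
try {
  simulateReset(unguarded);
} catch (err) {
  crashed = true;
}

// 2. With a listener the same error is only recorded and the process lives on.
const guarded = new EventEmitter();
const logged = [];
guarded.on('error', (err) => logged.push(err.code));
simulateReset(guarded);

console.log({ crashed, logged });
```

The try/catch only works in this sketch because the emit is synchronous; for a real socket the listener is the part that matters.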

@rodrigok
Member

@Megatronic79 maybe, how can we test this? We don't have any experience with LDAP and we don't have any LDAP server/service.

@Megatronic79

Hey @rodrigok - I can put a 2008R2 Server online to test against like we did before if that will help?

@rodrigok
Member

@Megatronic79 great, sure will help

@Megatronic79

@rodrigok - No probs, give me a little bit and I'll spin one up and PM you the details.

@rodrigok
Member

Hey guys, I fixed a lot of errors in LDAP that were breaking the application. Can you test again?

Thanks @Megatronic79 for the LDAP server to test against.

@Megatronic79

Thanks @rodrigok - just tested on 2008 R2 and this has now been resolved!

Aside from the 15-minute timeout, we also used the tcpkill method, and RC stays up when the connection is forcibly closed.

Looks good to me. Let's wait to hear back from the others: @miscs @switchdk @tpetrosy @srevereault

@miscs
miscs commented Dec 17, 2015

I just pulled docker (rocketchat/rocket.chat:latest)... but still have the server crashing when logging in with email. I think docker may not include the latest LDAP fixes yet... ?

@Megatronic79

Hey @miscs - I'm not sure on that one, sorry, but I suspect your assumption is correct. I tested the develop branch earlier, which included the fixes. Maybe @Sing-Li can answer when the Docker image will reflect the latest develop.

@miscs
miscs commented Dec 18, 2015

just pulled again - no changes, so the server is still crashing.
@rodrigok Do you have a fixed IP? If so I could open our LDAP Server and give you a test-user to check, if you want.

@Megatronic79

What did you just pull? the develop branch?

The fix was tested against a 2008R2 Server and we used the tcpkill connection to simulate the termination of LDAP connections - all passed and the dev machine is still running without crash.

are you 100% sure you have used the develop branch to test the fix?

@Megatronic79

Looking at the branches, this fix is probably only in the develop branch:

https://github.com/RocketChat/Rocket.Chat/branches

@miscs
miscs commented Dec 18, 2015

This is my pull output. I think it should be the develop branch (latest):

Pulling rocketchat (rocketchat/rocket.chat:latest)...
latest: Pulling from rocketchat/rocket.chat

9ee13ca3b908: Already exists
23cb15b0fcec: Already exists
5e5f21412e19: Already exists
df82ac64861d: Already exists
a06b823c6a02: Already exists
29539139bf9d: Already exists
ae5d45c31c7e: Already exists
37d7af71d35e: Already exists
f0ae772097e8: Already exists
38c02af29fa3: Already exists
6280b1f9cfc9: Already exists
26d1cedc055d: Already exists
1a25d8de65ae: Already exists
5a1f4d7b15af: Already exists
3c5b375bdb27: Already exists
439f87fa7c5f: Already exists
6ccc5ce664fd: Already exists
61c91ffda88c: Already exists
319307b83e41: Already exists
5f048411a2fd: Already exists
ee5e683c4cd6: Already exists
b658beb8b973: Already exists
Digest: sha256:fc4b5962998acd16e3079f9c1804416c26c9999b06477e6d2b8df0420eabd9c2
Status: Image is up to date for rocketchat/rocket.chat:latest

I also restarted the server with a complete reboot...

@engelgabriel
Member

rocketchat/rocket.chat:latest = master

@miscs
miscs commented Dec 18, 2015

ouch. I am sorry for the trouble :(

@engelgabriel
Member

Hahaha... don't need to be sorry! :) I've just replied in a hurry...

@miscs
miscs commented Dec 18, 2015

I just changed the image in my docker-compose file to rocketchat/rocket.chat:develop and pulled again. I hope that's all that's needed (I have no Docker knowledge). But after a restart the server still crashes in the above scenario.
A second pull shows I have the latest rocketchat/rocket.chat:develop:

Pulling rocketchat (rocketchat/rocket.chat:develop)...
develop: Pulling from rocketchat/rocket.chat
9ee13ca3b908: Already exists
23cb15b0fcec: Already exists
5e5f21412e19: Already exists
df82ac64861d: Already exists
a06b823c6a02: Already exists
29539139bf9d: Already exists
ae5d45c31c7e: Already exists
37d7af71d35e: Already exists
f0ae772097e8: Already exists
38c02af29fa3: Already exists
9ad394fa0dfc: Already exists
ad415e4f530d: Already exists
f8c7d6e82a92: Already exists
5408020e33cd: Already exists
dc23623ac4a3: Already exists
81146fc8c65d: Already exists
7f658a6353aa: Already exists
fa7832acc85c: Already exists
075da9e1a334: Already exists
a16feb8d7c63: Already exists
0c2720e76a5d: Already exists
7d05b47b4616: Already exists
Digest: sha256:1d4bb19b39e3e060acefe7b1daa22ce8a82fbd2a5664037bd22a1710c2fef1fb
Status: Image is up to date for rocketchat/rocket.chat:develop

Do I need to change something else? Thanks!

@sampaiodiego
Member

in order to use the new image you have to delete your current container and start a new one.

@tpetrosy

Hello guys, thanks for the fast solution. I just built the develop branch and can confirm that the crash problem is solved.
When AD drops the connection I get the message Client Error: { '0': { [Error: read ECONNRESET] code: 'ECONNRESET', errno: 'ECONNRESET', syscall: 'read' } } in the logs, but the daemon stays alive.
I also see that we still have the connection clean-up issue, which I think is a no less important vulnerability for the system.
Why doesn't the system close the connection to the LDAP server after a successful or unsuccessful login attempt?
Every LDAP server has a maximum number of connections, so it may be possible to open a lot of connections to the LDAP server and block the whole LDAP service.
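One way to picture the clean-up being asked for is a fresh connection per attempt that is always released, whether the bind succeeds or fails. This is only a sketch: `FakeLdapClient` is a hypothetical stand-in, and the only thing assumed about a real client is the call shape bind(dn, password, cb) and unbind(cb). It is not the code in Rocket.Chat.

```javascript
// Hypothetical stand-in for an LDAP client connection.
class FakeLdapClient {
  constructor(validPassword) {
    this.validPassword = validPassword;
    this.open = true;
  }
  bind(dn, password, cb) {
    cb(password === this.validPassword ? null : new Error('InvalidCredentials'));
  }
  unbind(cb) {
    this.open = false;
    if (cb) cb(null);
  }
}

// Authenticate on a fresh connection and release it on both outcomes.
function authenticate(createClient, dn, password, done) {
  const client = createClient();
  client.bind(dn, password, (err) => {
    client.unbind(() => done(null, !err));
  });
}

const clients = [];
const createClient = () => {
  const client = new FakeLdapClient('secret');
  clients.push(client);
  return client;
};

const results = [];
authenticate(createClient, 'cn=user', 'secret', (err, ok) => results.push(ok));
authenticate(createClient, 'cn=user', 'wrong', (err, ok) => results.push(ok));

const stillOpen = clients.filter((c) => c.open).length;
console.log(results, stillOpen);
```

With this shape, repeated failed logins cannot pile up idle connections on the directory server, because none outlive the attempt that created them.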

@Sing-Li
Member
Sing-Li commented Dec 19, 2015

Which commit is this fixed on? I'm fascinated with what the original 'problem' might be. Thx.

@tpetrosy

I think this is the commit for that fix.
513034b

@Megatronic79

Checking the LDAP clean-up, the idle sessions are being closed after about 15 mins (which in the case of RC is all the authentication requests). I guess if enough auth attempts were made (failed or not) within the "clean-up" threshold, the build-up could cause any further LDAP requests to fail until the idle connections are closed.

@rodrigok can we look at closing the connection straight after the ldap attempt? I know you tried it in the original fix but was getting an 535X stating the connection was already closed, maybe something to look at further there?

We may need to check with the coder who did the sync component, when this is run and how its handling the connections (or if its part of the same auth request)

@tpetrosy

Note that the 15-minute idle session closures are done on the AD side, not by RC. Anyway, I don't know if there is a reason not to close sessions after getting the authentication result.

@Megatronic79

I'm paying attention thanks, I'm aware that AD closes the session and not RC (this is what it has been doing the whole time) - I didn't say RC did this function, the end result is that the session is closed on both server and client after the idle time - if you read the reply I asked @rodrigok to look at the possibility to close the ldap straight after auth for as long as it doesn't affect the data sync for the other attributes.

@tpetrosy

Thanks Megatronic79, I'm just trying to give a detailed description of the case.

@miscs
miscs commented Dec 21, 2015

I have now updated to "version": "0.10.0", which I confirmed using /api/info.

But unfortunately the server is still crashing when I try to log in with miscs@xxx.com and samba cannot bind a user. login with only miscs still works....

@Megatronic79

Hi @miscs is this a brand new deployment replacing the older one?

What error are you getting in the logs for RC when it crashes? and at the same time what does your samba\ldap server report in its logs?

When you say logging in with just miscs works is this a local account or a domain account? if domain are you saying you can authenticate ok for as long as the @xxxx.com is not present? if it is then it crashes RC?

Can you please show us your LDAP search options? (assuming you tested this with Apache directory studio or equiv and it comes back ok?)

@miscs
miscs commented Dec 21, 2015

Hi,

I use only LDAP users for testing. My miscs LDAP entry has miscs@xxx.com as mail and miscs as sAMAccountName. So I use this LDAP query to enable login with miscs OR miscs@xxx.com for my user. As I said, using miscs works but miscs@xxx.com crashes RC.

{"filter": "(&(objectCategory=person)(objectclass=user)(memberOf=CN=Employees,OU=Groups,DC=corp,DC=xxx,DC=com)(|(mail=#{username})(sAMAccountName=#{username})))", "scope": "sub", "userDN": "rocketchat_service@corp.xxx.com", "password": "mypass"}

Rocket Log output for login with miscs@xxx.com is the same as in my comment above:

Bind before search rocketchat_service@corp.xxx.com mypass
LDAP search dn DC=corp,DC=xxx,DC=com
LDAP search options { filter: '(&(objectCategory=person)(objectclass=user)(memberOf=CN=Employees,OU=Groups,DC=corp,DC=xxx,DC=com)(|(mail=miscs@xxx.com)(sAMAccountName=miscs@xxx.com)))',
  scope: 'sub' }
Attempt to bind DC=corp,DC=xxx,DC=com

events.js:72
        throw er; // Unhandled 'error' event
              ^
OperationsError: 00002020: Operation unavailable without authentication
    at messageCallback (/app/bundle/programs/server/npm/rocketchat_ldap/node_modules/ldapjs/lib/client/client.js:1419:45)
    at Parser.onMessage (/app/bundle/programs/server/npm/rocketchat_ldap/node_modules/ldapjs/lib/client/client.js:1089:14)
    at Parser.emit (events.js:95:17)
    at Parser.write (/app/bundle/programs/server/npm/rocketchat_ldap/node_modules/ldapjs/lib/messages/parser.js:117:8)
    at Socket.onData (/app/bundle/programs/server/npm/rocketchat_ldap/node_modules/ldapjs/lib/client/client.js:1076:22)
    at Socket.emit (events.js:95:17)
    at Socket.<anonymous> (_stream_readable.js:765:14)
    at Socket.emit (events.js:92:17)
    at emitReadable_ (_stream_readable.js:427:10)
    at emitReadable (_stream_readable.js:423:5)

But even if I simplify my LDAP search to

{"filter": "(&(objectCategory=person)(objectclass=user)(memberOf=CN=Employees,OU=Groups,DC=corp,DC=xxx,DC=com)(sAMAccountName=#{username}))", "scope": "sub", "userDN": "rocketchat_service@corp.xxx.com", "password": "mypass"}

RC crashes with the same error when I try to log in with miscs@xxx.com. But in that case it should return "User not found".
Samba Log is always the same as in my comment above:

[2015/12/21 21:19:22.651375,  3] ../source4/auth/ntlm/auth.c:270(auth_check_password_send)
  auth_check_password_send: Checking password for unmapped user [xxx]\[rocketchat_service]@[(null)]
  auth_check_password_send: mapped user is: [xxx]\[rocketchat_service]@[(null)]
[2015/12/21 21:19:22.683116,  3] ../source4/auth/ntlm/auth.c:270(auth_check_password_send)
  auth_check_password_send: Checking password for unmapped user [xxx]\[]@[(null)]
  auth_check_password_send: mapped user is: [xxx]\[]@[(null)]

Using miscs works fine (using LDAP and it's even the same LDAP user). So I think RC has a problem if LDAP can't map any user?

@miscs
miscs commented Dec 21, 2015

That seems to be the problem. If i use "ANonExistingUsername" RC always crashes.
So any Login with a Username non existing in LDAP brings RC down :(

@Megatronic79

hi @miscs - i think your LDAP queries may not be quite right, from the looks of this you are using openldap but you are also using the attribute sAMAccountName=#{username}

sAMAccountName is Active Directory specific.

Shouldn't you be using the following?

uid=#{username}

making your simplified ldap query this:

{"filter": "(&(objectCategory=person)(objectclass=user)(memberOf=CN=Employees,OU=Groups,DC=corp,DC=xxx,DC=com)(uid=#{username}))", "scope": "sub", "userDN": "rocketchat_service@corp.xxx.com", "password": "mypass"}

also, just a note, openldap doesn't natively use the overlay "memberOf" - so did you add this yeah?

easiest confirmation is to use apache directory studio to confirm the queries.

@miscs
miscs commented Dec 21, 2015

The LDAP queries are working. We use Samba4 as backend which supports sAMAccountName=#{username} - if the query would not work I would not be able to login using only "miscs".

I use these queries in a lot of other applications and they are working everywhere just fine :)

Even in RC they are working as long as I use an existing LDAP user. The crash only occurs if I use a username which is not in LDAP and therefore cannot be mapped...

@Megatronic79

hmm, ok, what is Samba4 sending back to RC as a response? can you inspect the traffic and post back?

Ive just tested the latest 0.10 and caused a failed lookup and an incorrect config just results in the correct LDAP response of user not found prompting the following:

failed

Guess we need to know a bit more about how samba is responding to find out why RC crashes.

@engelgabriel
Member

You now have a new setting on the Admin -> General panel

image

@miscs
miscs commented Dec 21, 2015

with debug-level all I get

[methods] UserPresence:online -> userId: null , arguments:  {}
Bind before search rocketchat_service@corp.xxx.com mypass
LDAP search dn DC=corp,DC=xxx,DC=com
LDAP search options { filter: '(&(objectCategory=person)(objectclass=user)(memberOf=CN=Employees,OU=Groups,DC=corp,DC=xxx,DC=com)(sAMAccountName=checked))',
  scope: 'sub' }
Attempt to bind DC=corp,DC=xxx,DC=com

events.js:72
        throw er; // Unhandled 'error' event
              ^
OperationsError: 00002020: Operation unavailable without authentication
    at messageCallback (/app/bundle/programs/server/npm/rocketchat_ldap/node_modules/ldapjs/lib/client/client.js:1419:45)
    at Parser.onMessage (/app/bundle/programs/server/npm/rocketchat_ldap/node_modules/ldapjs/lib/client/client.js:1089:14)
    at Parser.emit (events.js:95:17)
    at Parser.write (/app/bundle/programs/server/npm/rocketchat_ldap/node_modules/ldapjs/lib/messages/parser.js:117:8)
    at Socket.onData (/app/bundle/programs/server/npm/rocketchat_ldap/node_modules/ldapjs/lib/client/client.js:1076:22)
    at Socket.emit (events.js:95:17)
    at Socket.<anonymous> (_stream_readable.js:765:14)
    at Socket.emit (events.js:92:17)
    at emitReadable_ (_stream_readable.js:427:10)
    at emitReadable (_stream_readable.js:423:5)

So if I am reading the log correctly, I have no user (null), which is expected since I used a non-existent login. But then [methods] UserPresence:online -> userId: null , arguments: {} is called, which won't work with userId: null ...
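
A side note on the trace itself: the "throw er; // Unhandled 'error' event" line is what Node prints when an EventEmitter emits 'error' and nothing is listening for it. Below is a minimal sketch of that behaviour. It uses a plain EventEmitter as a stand-in, not the real ldapjs client, and errorEventThrows is a made-up name.

```javascript
const { EventEmitter } = require('events');

// Returns true if emitting 'error' on a fresh emitter throws.
// The emitter is only a stand-in for whatever object raised the LDAP error.
function errorEventThrows(attachListener) {
  const emitter = new EventEmitter();
  if (attachListener) {
    emitter.on('error', function () { /* handled: log it, fail the login */ });
  }
  try {
    emitter.emit('error', new Error('Operation unavailable without authentication'));
    return false;
  } catch (e) {
    return true; // uncaught in a real server, this is what kills the process
  }
}

console.log(errorEventThrows(false)); // true
console.log(errorEventThrows(true));  // false
```

So whatever the root cause turns out to be, an 'error' listener on the emitting object would probably keep the server up.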

@miscs
miscs commented Dec 21, 2015

to intercept the samba4/ldap response I have to wait for our admin :( sorry. but maybe we can log the response using RC?

@Megatronic79

I'll spin up a Samba4 LDAP server tomorrow when I get to the office and see if I can help you out.

@miscs
miscs commented Dec 22, 2015

thank you for all your help to figure this out!
But if it is too much hassle, I'll do it with my admin next year (he is already on holiday)...

@miscs
miscs commented Jan 4, 2016

Happy new year to all of you!

We just tested a little bit more and checked the responses with Wireshark. What we see is the following:

Samba4 Backend, Using LDAP

LogIn with existing User (Success)

  • Bind with service user : Success
  • Search for User : Success -> 1 Result
  • Bind with User-DN from Search-Result : Success
  • Fetch Attributes for User

LogIn with non-existing User (Server-Crash)

  • Bind with service user : Success
  • Search for User : Success -> 0 Result
  • Bind with Base-DN from Search-Result
  • Fetch Attributes for User -> "OperationsError: 00002020: Operation unavailable without authentication"

What RC should do is test whether the result set for User-Search contains exactly 1 entry. If not, it should throw a "User not found" error and abort the login.
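
As a rough sketch of that check (pickUserDn and the shape of the entries array are my own illustration, not RC code):

```javascript
// Hypothetical helper: decide which DN to bind with after the user search.
// `entries` is assumed to be the list of search results, each with a `dn`.
function pickUserDn(entries) {
  if (entries.length !== 1) {
    // 0 results: unknown user. More than 1: ambiguous filter. Abort either way
    // instead of falling back to the Base-DN.
    throw new Error('User not found');
  }
  return entries[0].dn;
}

console.log(pickUserDn([{ dn: 'CN=miscs,OU=Users,DC=corp,DC=xxx,DC=com' }]));
// CN=miscs,OU=Users,DC=corp,DC=xxx,DC=com

try {
  pickUserDn([]);
} catch (err) {
  console.log('login aborted: ' + err.message);
  // login aborted: User not found
}
```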

Hope this helps :)

@engelgabriel
Member

@miscs thanks for the extra info. Hopefully that will help @rodrigok :)

@Megatronic79

great @miscs - strange that it behaves like this for samba\ldap but not for openldap and active directory

anyways - @rodrigok let me know if you need samba\ldap backend to test against.

@rodrigok
Member
rodrigok commented Jan 4, 2016

@Megatronic79 yes, I need :)

I'm trying to simulate this with some online "demo" server but without success.

@Megatronic79

sure thing :) , will get this online when I get to the office in the morning and will pm the details to ya.

@miscs
miscs commented Jan 4, 2016

@rodrigok if you have a static IP I can also open our Samba4/LDAP server for that IP. Maybe that's the easiest?

@Megatronic79

cool.

btw @miscs what kind of setup is your samba4\ldap?

@rodrigok
Member
rodrigok commented Jan 4, 2016

@Megatronic79 thanks

@miscs I do not have a static IP, but I can tell you my dynamic IP, which should work for some time.

@hameno
hameno commented Jan 5, 2016

If I read the code at https://github.com/RocketChat/Rocket.Chat/blob/develop/packages/rocketchat-ldap/ldap_server.js#L158 correctly, I cannot see any check that the returned search result is non-empty. Could that be it?

@miscs
miscs commented Jan 5, 2016

I just modified the code @hameno mentioned a bit (inside my docker image) and the server won't crash any more :)

client.search(options.ldapOptions.dn, opts, function(err, res) {
    if (err) {
        console.log('LDAP: Search Error', err);
        return ldapAsyncFut.return({
            error: err
        });
    }
    // Count the entries so we only bind when the search matched a user
    var res_count = 0;
    var dn = self.options.dn;
    res.on('searchEntry', function(entry) {
        res_count = res_count + 1;
        dn = entry.object.dn;
    });
    res.on('error', function(err) {
        console.log('LDAP: Search on Error', err);
        ldapAsyncFut.return({
            error: err
        });
    });
    res.on('end', function(result) {
        if (res_count == 1) {
            bind(dn);
        }
        //else {
        //    var err = new Error('User not Found');
        //    ldapAsyncFut.return({
        //        error: err
        //    });
        //}
    });
});

But my changes are very dirty and additionally I am too stupid to return a correct LDAP error in case res_count != 1 (this is why the else flow is commented out).
Without the else flow the server stays up, but of course the login hangs and the site needs to be refreshed :(
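
For reference, a self-contained sketch of what the missing else flow could look like. It is not the actual RC code: resolveUserDn and the done(err, dn) callback are invented for the example, and a plain EventEmitter plays the role of the search result so it runs without an LDAP server. In the snippet above, done would map to ldapAsyncFut.return({error: err}) on failure and bind(dn) on success.

```javascript
const { EventEmitter } = require('events');

// Hypothetical helper. `res` is anything that emits 'searchEntry', 'error'
// and 'end' the way the search result in the snippet above does.
function resolveUserDn(res, done) {
  let count = 0;
  let dn = null;
  let settled = false;

  function finish(err, value) {
    if (settled) return; // report a result only once
    settled = true;
    done(err, value);
  }

  res.on('searchEntry', function (entry) {
    count += 1;
    dn = entry.object.dn;
  });
  res.on('error', function (err) {
    finish(err);
  });
  res.on('end', function () {
    if (count === 1) {
      finish(null, dn);
    } else {
      // The else flow: always answer, so the login fails instead of hanging
      finish(new Error('User not found'));
    }
  });
}

// Simulate a search that returns zero entries
const fakeResult = new EventEmitter();
resolveUserDn(fakeResult, function (err, dn) {
  console.log(err ? 'login aborted: ' + err.message : 'bind with ' + dn);
});
fakeResult.emit('end');
// login aborted: User not found
```

The point is only that every path through the 'end' handler reports something back, which is what stops the hang.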

@Megatronic79

Great!

Maybe @rodrigok can clean that code up for you and pass the correct LDAP value back but looks like you found the issue. :)

@rodrigok rodrigok added a commit that closed this issue Jan 5, 2016
@rodrigok rodrigok Fix #1491 via code from @miscs 2541885
@rodrigok rodrigok closed this in 2541885 Jan 5, 2016
@rodrigok
Member
rodrigok commented Jan 5, 2016

@miscs I implemented your code.

As far as I know the error returned is not relevant; it will just cancel the login.

@miscs
miscs commented Jan 6, 2016

cool. thanks a lot!!!
any chance to get this pushed to master soon? ;)

@engelgabriel
Member

We push changes to master on Mondays. Is that OK?

@miscs
miscs commented Jan 6, 2016

of course, thanks for the feedback!

@miscs
miscs commented Jan 8, 2016

I just pulled the latest develop docker image and everything works fine!!!
Many thanks to all, especially @Megatronic79 @rodrigok and @engelgabriel !
