Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to properly optimize the amount of different LDAP queries required #18

Closed
datakurre opened this issue Oct 24, 2016 · 6 comments
Closed

Comments

@datakurre
Copy link
Contributor

The most complex and controversial refactoring I had to make to get node.ext.ldap working with a large local (or just with any remote ldap) was to optimize the amount LDAP queries it does. My approach was to normalize searches so that there was only one different search for each object.

https://github.com/datakurre/node.ext.ldap/commits/master

Any ideas, how that should be done more properly? Mainly

  • I needed to store configured attribute mapping somewhere (search_attrlist) and enforce each query to use that (instead of only what the actual need was or instead of current "*" for attribute sheet)
  • Instead of looking up nodes with root base DN and queryFilter at first and then with direct DN, I required to store queryFilter (search_criteria) and enforce use it instead of direct DN (slower LDAP query as direct DN as such, but because search was always made anyway, the search was already available in cache)
@rnixx
Copy link
Member

rnixx commented Oct 24, 2016

The approach for normalizing the query looks good to me.

Currently there are 3 queries for LDAPNode objects. In LDAPNode.__iter__, LDAPNode.__getitem__ and LDAPNodeAttributes.load

Maybe it would make sense to use the normalized query in __getitem__, store the search result on the node and refactor LDAPNodeAttributes to avoid querying the directory again (the node is accessible via self.parent). This would reduce the searches significantly in the first place.

I'm not really convinced yet whether I like the use of the normalized query on __iter__. How often is __iter__ used in the context of pas.plugins.ldap?

@datakurre
Copy link
Contributor Author

Thanks.

I should be able to return on this next week. Probably iter could be refactored out.

On 24. lokakuuta 2016 klo 9.50 +0300, Robert Niederreiter notifications@github.com, wrote:

The approach for normalizing the query looks good to me.

Currently there are 3 queries for LDAPNode objects. In LDAPNode.iter, LDAPNode.getitem and LDAPNodeAttributes.load

Maybe it would make sense to use the normalized query in getitem, store the search result on the node and refactor LDAPNodeAttributes to avoid querying the directory again (the node is accessible via self.parent). This would reduce the searches significantly in the first place.

I'm not really convinced yet whether I like the use of the normalized query on iter. How often is iter used in the context of pas.plugins.ldap?


You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub (#18 (comment)), or mute the thread (https://github.com/notifications/unsubscribe-auth/AAJyv6OeeZt3EGp0fbTgPvdpVWIGMLVJks5q3FSZgaJpZM4Keab9).

@rnixx
Copy link
Member

rnixx commented Oct 28, 2016

After some rethinking I'm of the mind we need to be able to configure whether the complete entries should already be fetched at __iter__ time or not, so we can experiment with the different behaviours a little bit referring speed, memory footprint and resulting subsequent queries.

The normalized query in __iter__ as implemented in your branch is only effective in conjunction with caching, because subsequent queries happen anyway when accessing the nodes. So we have this options:

  • Add a dedicated config flag whether to normalize queries on __iter__.
  • Normalize the query on __iter__ only if caching is enabled (Currently my preferred variant).
  • Remember the search result on __iter__ to avoid subsequent queries already on __getitem__. But this actually is just another form of caching, and may be a downside in memory consumption.

Nevertheless the separate query on accessing node attributes can be avoided entirely by fetching all attributes of interest in __getitem__, remembering the result on the node object and use it in LDAPNodeAttributes.

@jensens
Copy link
Member

jensens commented Oct 28, 2016

The base idea to keep __iter__ queries simple and it answers small is because if one has 10000 users and iterates over it to get to a batch from 9000 - 9999 we would need to fetch all 9099 entries complete which is expensive.

Also the tree in memory gets larger than needed.

To optimize this we would have to implement slicing and then iterate over the slice, so we can write users[9000:9099] and then we can iterate low-cost to 8999 (because LDAP/pyhon-LDAP does not support slicing/batch-request) and then fetch full entries to avoid query each twice.

The most difficult part is to change PAS to use it in an efficient way.

@datakurre
Copy link
Contributor Author

Agreed. I should be able to return to this in the middle of next week.

On 28. lokakuuta 2016 klo 10.10 +0300, Jens W. Klein notifications@github.com, wrote:

The base idea to keep iter queries simple and it answers small is because if one has 10000 users and iterates over it to get to a batch from 9000 - 9999 (tel:9000%20-%209999) we would need to fetch all 9099 entries complete which is expensive.

Also the tree in memory gets larger than needed.

To optimize this we would have to implement slicing and then iterate over the slice, so we can write users[9000:9099] and then we can iterate low-cost to 8999 (because LDAP/pyhon-LDAP does not support slicing/batch-request) and then fetch full entries to avoid query each twice.

The most difficult part is to change PAS to use it in an efficient way.


You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub (#18 (comment)), or mute the thread (https://github.com/notifications/unsubscribe-auth/AAJyv246GKG7W3KeEkHnM5ieWfvZsPvfks5q4Z_XgaJpZM4Keab9).

@rnixx
Copy link
Member

rnixx commented Apr 16, 2019

closing this in favor of #37

@rnixx rnixx closed this as completed Apr 16, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants