How to properly optimize the amount of different LDAP queries required #18

datakurre · 2016-10-24T05:39:45Z

The most complex and controversial refactoring I had to make to get node.ext.ldap working with a large local (or just with any remote ldap) was to optimize the amount LDAP queries it does. My approach was to normalize searches so that there was only one different search for each object.

https://github.com/datakurre/node.ext.ldap/commits/master

Any ideas, how that should be done more properly? Mainly

I needed to store configured attribute mapping somewhere (search_attrlist) and enforce each query to use that (instead of only what the actual need was or instead of current "*" for attribute sheet)
Instead of looking up nodes with root base DN and queryFilter at first and then with direct DN, I required to store queryFilter (search_criteria) and enforce use it instead of direct DN (slower LDAP query as direct DN as such, but because search was always made anyway, the search was already available in cache)

rnixx · 2016-10-24T06:47:53Z

The approach for normalizing the query looks good to me.

Currently there are 3 queries for LDAPNode objects. In LDAPNode.__iter__, LDAPNode.__getitem__ and LDAPNodeAttributes.load

Maybe it would make sense to use the normalized query in __getitem__, store the search result on the node and refactor LDAPNodeAttributes to avoid querying the directory again (the node is accessible via self.parent). This would reduce the searches significantly in the first place.

I'm not really convinced yet whether I like the use of the normalized query on __iter__. How often is __iter__ used in the context of pas.plugins.ldap?

datakurre · 2016-10-25T18:26:01Z

Thanks.

I should be able to return on this next week. Probably iter could be refactored out.

On 24. lokakuuta 2016 klo 9.50 +0300, Robert Niederreiter notifications@github.com, wrote:

The approach for normalizing the query looks good to me.

Currently there are 3 queries for LDAPNode objects. In LDAPNode.iter, LDAPNode.getitem and LDAPNodeAttributes.load

Maybe it would make sense to use the normalized query in getitem, store the search result on the node and refactor LDAPNodeAttributes to avoid querying the directory again (the node is accessible via self.parent). This would reduce the searches significantly in the first place.

I'm not really convinced yet whether I like the use of the normalized query on iter. How often is iter used in the context of pas.plugins.ldap?

—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub (#18 (comment)), or mute the thread (https://github.com/notifications/unsubscribe-auth/AAJyv6OeeZt3EGp0fbTgPvdpVWIGMLVJks5q3FSZgaJpZM4Keab9).

rnixx · 2016-10-28T06:56:50Z

After some rethinking I'm of the mind we need to be able to configure whether the complete entries should already be fetched at __iter__ time or not, so we can experiment with the different behaviours a little bit referring speed, memory footprint and resulting subsequent queries.

The normalized query in __iter__ as implemented in your branch is only effective in conjunction with caching, because subsequent queries happen anyway when accessing the nodes. So we have this options:

Add a dedicated config flag whether to normalize queries on __iter__.
Normalize the query on __iter__ only if caching is enabled (Currently my preferred variant).
Remember the search result on __iter__ to avoid subsequent queries already on __getitem__. But this actually is just another form of caching, and may be a downside in memory consumption.

Nevertheless the separate query on accessing node attributes can be avoided entirely by fetching all attributes of interest in __getitem__, remembering the result on the node object and use it in LDAPNodeAttributes.

jensens · 2016-10-28T07:10:15Z

The base idea to keep __iter__ queries simple and it answers small is because if one has 10000 users and iterates over it to get to a batch from 9000 - 9999 we would need to fetch all 9099 entries complete which is expensive.

Also the tree in memory gets larger than needed.

To optimize this we would have to implement slicing and then iterate over the slice, so we can write users[9000:9099] and then we can iterate low-cost to 8999 (because LDAP/pyhon-LDAP does not support slicing/batch-request) and then fetch full entries to avoid query each twice.

The most difficult part is to change PAS to use it in an efficient way.

datakurre · 2016-10-28T07:13:49Z

Agreed. I should be able to return to this in the middle of next week.

On 28. lokakuuta 2016 klo 10.10 +0300, Jens W. Klein notifications@github.com, wrote:

The base idea to keep iter queries simple and it answers small is because if one has 10000 users and iterates over it to get to a batch from 9000 - 9999 (tel:9000%20-%209999) we would need to fetch all 9099 entries complete which is expensive.

Also the tree in memory gets larger than needed.

To optimize this we would have to implement slicing and then iterate over the slice, so we can write users[9000:9099] and then we can iterate low-cost to 8999 (because LDAP/pyhon-LDAP does not support slicing/batch-request) and then fetch full entries to avoid query each twice.

The most difficult part is to change PAS to use it in an efficient way.

—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub (#18 (comment)), or mute the thread (https://github.com/notifications/unsubscribe-auth/AAJyv246GKG7W3KeEkHnM5ieWfvZsPvfks5q4Z_XgaJpZM4Keab9).

rnixx · 2019-04-16T09:42:20Z

closing this in favor of #37

rnixx added the enhancement label Oct 25, 2016

rnixx closed this as completed Apr 16, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

How to properly optimize the amount of different LDAP queries required #18

How to properly optimize the amount of different LDAP queries required #18

datakurre commented Oct 24, 2016

rnixx commented Oct 24, 2016

datakurre commented Oct 25, 2016

rnixx commented Oct 28, 2016

jensens commented Oct 28, 2016 •

edited

datakurre commented Oct 28, 2016

rnixx commented Apr 16, 2019

How to properly optimize the amount of different LDAP queries required #18

How to properly optimize the amount of different LDAP queries required #18

Comments

datakurre commented Oct 24, 2016

rnixx commented Oct 24, 2016

datakurre commented Oct 25, 2016

rnixx commented Oct 28, 2016

jensens commented Oct 28, 2016 • edited

datakurre commented Oct 28, 2016

rnixx commented Apr 16, 2019

jensens commented Oct 28, 2016 •

edited