
Define option for toggling implicit decoding / encoding #25

Closed

Conversation


@datakurre datakurre commented Nov 4, 2016

Needs

  • review
  • some tests with the unicode flag disabled
  • benchmark / profiling example with the option on and off

Fixes #16

@coveralls

Coverage Status

Coverage decreased (-2.4%) to 95.664% when pulling 4faf13f on datakurre:datakurre-use-unicode-option into 570e5ec on bluedynamics:master.

@datakurre datakurre force-pushed the datakurre-use-unicode-option branch 2 times, most recently from 57259c2 to b277183 on November 4, 2016 at 21:49
@coveralls

Coverage Status

Coverage decreased (-0.003%) to 98.059% when pulling b277183 on datakurre:datakurre-use-unicode-option into 570e5ec on bluedynamics:master.

@coveralls

Coverage Status

Coverage decreased (-0.003%) to 98.059% when pulling 344147d on datakurre:datakurre-use-unicode-option into 570e5ec on bluedynamics:master.

@coveralls

Coverage Status

Coverage increased (+0.05%) to 98.115% when pulling 0a2c548 on datakurre:datakurre-use-unicode-option into 570e5ec on bluedynamics:master.


datakurre commented Nov 5, 2016

Before

[screenshot: control panel profiling]

After with use_unicode = False

[screenshot: control panel profiling]

Not as dramatic a change as in my first round, after I had normalized the search queries (which I haven't done yet in these upstream pulls), but still significant.

The difference from my original issue #16 is probably that, because of the normalization of attrlist in queries, even simple queries need to decode the same large result blob from the cache as the bigger queries do. So, once I manage to also add normalization as an option, it will probably slow down node.ext.ldap when enabled without disabling implicit unicode decoding.


@jensens jensens left a comment


Overall, @rnixx and I both think that repeating complete code blocks via copy-paste is not an option.
I would see this as a first draft, but not as a solution we want to merge.

Overall we gain 1.75% performance according to your profiling. This might be more in a scenario with many attributes, so it is a good feature we would like to merge once the repeated/copy-pasted code style is refactored.

@@ -207,8 +212,7 @@ def __init__(self, name=None, props=None):
     @finalize
     def __getitem__(self, key):
         # nodes are created for keys, if they do not already exist in memory
-        if isinstance(key, str):
-            key = decode(key)
+        key = decodes(key) if self.root._use_unicode else key

This kind of code repeats and blows up several lines.

In order to follow the DRY paradigm, I propose to pass the current node to decodes and encodes and handle the if in there.
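A minimal sketch of what this suggestion could look like. The Node class below is a simplified stand-in, not the real node.ext.ldap node; the names decodes and _use_unicode come from the diff, everything else is assumed for illustration. The point is that the unicode conditional lives inside decodes(), so call sites such as __getitem__ keep a single code path:

```python
# Hypothetical sketch of the review suggestion, not the actual
# node.ext.ldap code: Node is a minimal stand-in, and decodes()
# receives the node so the _use_unicode check happens in one place.

class Node:
    def __init__(self, use_unicode=True):
        self._use_unicode = use_unicode
        self.root = self        # flat stand-in; real nodes walk up parents
        self._children = {}

    def __getitem__(self, key):
        # single code path for both modes; no inline flag check needed
        return self._children[decodes(key, self)]


def decodes(value, node):
    """Decode bytes to text only when implicit decoding is enabled."""
    if not node.root._use_unicode:
        return value
    return value.decode('utf-8') if isinstance(value, bytes) else value


node = Node(use_unicode=True)
node._children['cn=admin'] = 'admin-node'
print(node[b'cn=admin'])  # prints: admin-node
```

With this shape, switching implicit decoding off turns decodes() into a no-op instead of requiring a branch at every call site.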

    key = explode_dn(dn)[0]
    # do not yield if node is supposed to be deleted
    if key not in self._deleted_children:
        yield key

Here, copy-pasting the whole logic section is not OK. When passing the current node to decode, this is not needed any more. This may cost some milliseconds more, but I doubt it has a real impact.

    if self.root._use_unicode:
        dn = self.DN.encode('ascii', 'replace') or '(dn not set)'
    else:
        dn = decodes(self.DN).encode('ascii', 'replace') or '(dn not set)'  # noqa

same here

    if get_nodes:
        res.append(self.node_by_dn(dn, strict=True))
    else:
        res.append(dn)

This block is a good example of why repeating code is bad.


jensens commented Nov 7, 2016

Further ideas to improve:

  • On a node's __setitem__, copy _use_unicode from the root/parent to the node. A bool should not waste much RAM, and it saves a lot of traversing up the tree. Depending on tree depth, this should reduce the cost of getting the value.

  • Pass the node over and handle the condition centrally:

    def decodes(value, node):
        if not node._use_unicode:
            return value
        ...
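The two ideas above can be combined into one sketch. Everything here is hypothetical scaffolding (the real node.ext.ldap __setitem__ does far more); it only shows the mechanics of propagating the flag downward so that decodes() never has to walk up the tree:

```python
# Hypothetical sketch combining both ideas: __setitem__ copies the
# parent's _use_unicode onto each child (one bool per node, so no
# traversal up the tree on access), and decodes() holds the condition
# centrally. Not the real node.ext.ldap implementation.

class Node:
    def __init__(self, use_unicode=True):
        self._use_unicode = use_unicode
        self._children = {}

    def __setitem__(self, key, child):
        # propagate the flag down; a bool per node costs little RAM
        child._use_unicode = self._use_unicode
        self._children[decodes(key, self)] = child


def decodes(value, node):
    # central conditional: a no-op when implicit decoding is disabled
    if not node._use_unicode:
        return value
    return value.decode('utf-8') if isinstance(value, bytes) else value


root = Node(use_unicode=False)
child = Node()
root[b'ou=users'] = child
# the child inherited the parent's flag, and the key stayed bytes
print(child._use_unicode, b'ou=users' in root._children)  # prints: False True
```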
    


datakurre commented Nov 7, 2016

Thanks for the review. I'll postpone this until I manage to make another pull request with the query normalizations (may take a couple of weeks).

For our current version, removing decode was the last optimization I did, so I was surprised how many decode calls my normalizations added (compared to only 1.75% here). So I need to do the other optimizations properly first, so that I can compare the version without any decode to the version with the _use_unicode-checking decode.

The version without decoding at all remains the prettiest one :)


rnixx commented Dec 17, 2016

The version without decoding at all remains the prettiest one

Agreed. I also think we should remove auto encoding/decoding in the end, but since this is an API break, it should be postponed to 1.1.


datakurre commented Dec 17, 2016 via email


rnixx commented Apr 16, 2019

Hi. Since python-ldap on Python 3 handles everything except attribute values as unicode out of the box, this PR is obsolete. For Python 2 there is a compatibility mode (controlled by the bytes_mode setting). The merge of #44 takes care of Python 3 and bytes_mode.

@rnixx rnixx closed this Apr 16, 2019

Successfully merging this pull request may close these issues.

Why everything is decoded to unicode instead of handling as bytestrings?
4 participants