
Define option for toggling implicit decoding / encoding #25

Closed

Conversation


@datakurre datakurre commented Nov 4, 2016

Needs

  • review
  • some tests with the unicode flag disabled
  • benchmark / profiling example with the option on and off

Fixes #16

@coveralls

Coverage Status

Coverage decreased (-2.4%) to 95.664% when pulling 4faf13f on datakurre:datakurre-use-unicode-option into 570e5ec on bluedynamics:master.

@datakurre datakurre force-pushed the datakurre-use-unicode-option branch 2 times, most recently from 57259c2 to b277183 on November 4, 2016 at 21:49
@coveralls

Coverage Status

Coverage decreased (-0.003%) to 98.059% when pulling b277183 on datakurre:datakurre-use-unicode-option into 570e5ec on bluedynamics:master.

@coveralls

Coverage Status

Coverage decreased (-0.003%) to 98.059% when pulling 344147d on datakurre:datakurre-use-unicode-option into 570e5ec on bluedynamics:master.

@coveralls

Coverage Status

Coverage increased (+0.05%) to 98.115% when pulling 0a2c548 on datakurre:datakurre-use-unicode-option into 570e5ec on bluedynamics:master.


datakurre commented Nov 5, 2016

Before

[screenshot: control panel profiling]

After with use_unicode = False

[screenshot: control panel profiling]

Not as dramatic a change as in my first round, after I had normalized the search queries (which I haven't done yet in these upstream pulls), but still significant.

The difference from my original issue #16 is probably that, because of the normalization of attrlist in queries, even simple queries need to decode the same large result blob from the cache as the bigger queries do. So, once I manage to also add normalization as an option, it will probably slow down node.ext.ldap when enabled without disabling implicit unicode decoding.


@jensens jensens left a comment


Overall, @rnixx and I both think that repeating complete code blocks via copy-paste is not an option.
I would see this as a first draft, but not as a solution we want to merge.

Overall we gain 1.75% performance according to your profiling. This might be more in a scenario with many attributes, so it is a good feature we would like to merge once the repeated/copy-pasted code style is refactored.

@@ -207,8 +212,7 @@ def __init__(self, name=None, props=None):
     @finalize
     def __getitem__(self, key):
         # nodes are created for keys, if they do not already exist in memory
-        if isinstance(key, str):
-            key = decode(key)
+        key = decodes(key) if self.root._use_unicode else key

This kind of code repeats and blows up several lines.

In order to follow the DRY paradigm, I propose to pass the current node to decodes and encodes and handle the if in there.
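A minimal sketch of what this suggestion could look like. The Node class below is a simplified stand-in, not the real node.ext.ldap node; the names decodes and _use_unicode come from the diff, everything else is assumed for illustration. The point is that the unicode conditional lives inside decodes(), so call sites such as __getitem__ keep a single code path:

```python
# Hypothetical sketch of the review suggestion, not the actual
# node.ext.ldap code: Node is a minimal stand-in, and decodes()
# receives the node so the _use_unicode check happens in one place.

class Node:
    def __init__(self, use_unicode=True):
        self._use_unicode = use_unicode
        self.root = self        # flat stand-in; real nodes walk up parents
        self._children = {}

    def __getitem__(self, key):
        # single code path for both modes; no inline flag check needed
        return self._children[decodes(key, self)]


def decodes(value, node):
    """Decode bytes to text only when implicit decoding is enabled."""
    if not node.root._use_unicode:
        return value
    return value.decode('utf-8') if isinstance(value, bytes) else value


node = Node(use_unicode=True)
node._children['cn=admin'] = 'admin-node'
print(node[b'cn=admin'])  # prints: admin-node
```

With this shape, switching implicit decoding off turns decodes() into a no-op instead of requiring a branch at every call site.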

    key = explode_dn(dn)[0]
    # do not yield if node is supposed to be deleted
    if key not in self._deleted_children:
        yield key

Here, copy-pasting the whole logic section is not OK. When passing the current node to decode, this is not needed any more. This may cost some milliseconds more, but I doubt it has a real impact.

    if self.root._use_unicode:
        dn = self.DN.encode('ascii', 'replace') or '(dn not set)'
    else:
        dn = decodes(self.DN).encode('ascii', 'replace') or '(dn not set)'  # noqa

same here

    if get_nodes:
        res.append(self.node_by_dn(dn, strict=True))
    else:
        res.append(dn)

This block is a good example of why repeating code is bad.


jensens commented Nov 7, 2016

Further ideas to improve:

  • On a node's __setitem__, copy _use_unicode from the root/parent to the node. A bool should not waste much RAM, and it saves a lot of traversing up the tree. Depending on tree depth, this should reduce the cost of getting the value.

  • Pass the node over and handle the condition centrally:

    def decodes(value, node):
        if not node._use_unicode:
            return value
        ...
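The two ideas above can be combined into one sketch. Everything here is hypothetical scaffolding (the real node.ext.ldap __setitem__ does far more); it only shows the mechanics of propagating the flag downward so that decodes() never has to walk up the tree:

```python
# Hypothetical sketch combining both ideas: __setitem__ copies the
# parent's _use_unicode onto each child (one bool per node, so no
# traversal up the tree on access), and decodes() holds the condition
# centrally. Not the real node.ext.ldap implementation.

class Node:
    def __init__(self, use_unicode=True):
        self._use_unicode = use_unicode
        self._children = {}

    def __setitem__(self, key, child):
        # propagate the flag down; a bool per node costs little RAM
        child._use_unicode = self._use_unicode
        self._children[decodes(key, self)] = child


def decodes(value, node):
    # central conditional: a no-op when implicit decoding is disabled
    if not node._use_unicode:
        return value
    return value.decode('utf-8') if isinstance(value, bytes) else value


root = Node(use_unicode=False)
child = Node()
root[b'ou=users'] = child
# the child inherited the parent's flag, and the key stayed bytes
print(child._use_unicode, b'ou=users' in root._children)  # prints: False True
```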
    


datakurre commented Nov 7, 2016

Thanks for the review. I'll postpone this until I manage to make another pull request with the query normalizations (may take a couple of weeks).

For our current version, removing decode was the last optimization I did, so I was surprised how many decode calls my normalizations added (compared to only 1.75% here). So I need to do the other optimizations properly first, so that I can compare the version without any decode to the version with the _use_unicode-checking decode.

The version without decoding at all remains the prettiest one :)


rnixx commented Dec 17, 2016

The version without decoding at all remains the prettiest one

Agreed. I also think we should remove auto encoding/decoding in the end, but since this is an API break, it should be postponed to 1.1.


datakurre commented Dec 17, 2016 via email


rnixx commented Apr 16, 2019

Hi. Since python-ldap on Python 3 handles everything except attribute values as unicode out of the box, this PR is obsolete. For Python 2 there is a compatibility mode (controlled by the bytes_mode setting). The merge of #44 takes care of Python 3 and bytes_mode.

@rnixx rnixx closed this Apr 16, 2019

Successfully merging this pull request may close these issues.

Why everything is decoded to unicode instead of handling as bytestrings?
4 participants