Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance of Container class #869

Closed
michallowasrzechonek-silvair opened this issue Jun 26, 2020 · 15 comments
Closed

Performance of Container class #869

michallowasrzechonek-silvair opened this issue Jun 26, 2020 · 15 comments

Comments

@michallowasrzechonek-silvair
Copy link

michallowasrzechonek-silvair commented Jun 26, 2020

Hi,

Disclaimer: I'm using python 3.7.

I've noticed that Container class (esp. __set*__ methods) introduce a performance penalty. Initial profiling suggests it's about 2x:

578010 function calls (558956 primitive calls) in 0.272 seconds
vs
231299 function calls (212245 primitive calls) in 0.105 seconds

To get these results, I simplified the Container class like this (and updated usages in the code):

diff --git a/construct/lib/containers.py b/construct/lib/containers.py
index c3293dd..9a0470c 100644
--- a/construct/lib/containers.py
+++ b/construct/lib/containers.py
@@ -82,20 +82,12 @@ class Container(dict):
             text = u'utf8 decoded string...' (total 22)
             value = 123
     """
-    __slots__ = ["__keys_order__", "__recursion_lock__"]
+    __slots__ = ["__recursion_lock__"]
 
     def __getattr__(self, name):
         try:
             if name in self.__slots__:
-                try:
-                    return object.__getattribute__(self, name)
-                except AttributeError as e:
-                    if name == "__keys_order__":
-                        r = []
-                        object.__setattr__(self, "__keys_order__", r)
-                        return r
-                    else:
-                        raise e
+                return object.__getattribute__(self, name)
             else:
                 return self[name]
         except KeyError:
@@ -119,63 +111,12 @@ class Container(dict):
         except KeyError:
             raise AttributeError(name)
 
-    def __setitem__(self, key, value):
-        if key not in self:
-            self.__keys_order__.append(key)
-        dict.__setitem__(self, key, value)
-
-    def __delitem__(self, key):
-        """Removes an item from the Container in linear time O(n)."""
-        if key in self:
-            self.__keys_order__.remove(key)
-            dict.__delitem__(self, key)
-
-    def __init__(self, *args, **entrieskw):
-        self.__keys_order__ = []
-        for arg in args:
-            if isinstance(arg, dict):
-                for k,v in arg.items():
-                    self[k] = v
-            else:
-                for k,v in arg:
-                    self[k] = v
-        for k,v in entrieskw.items():
-            self[k] = v
-
     def __call__(self, **entrieskw):
         """Chains adding new entries to the same container. See ctor."""
         for k,v in entrieskw.items():
             self[k] = v
         return self
 
-    def keys(self):
-        return iter(self.__keys_order__)
-
-    def values(self):
-        return (self[k] for k in self.__keys_order__)
-
-    def items(self):
-        return ((k, self[k]) for k in self.__keys_order__)
-
-    __iter__ = keys
-
-    def clear(self):
-        """Removes all items."""
-        dict.clear(self)
-        self.__keys_order__ = []
-
-    def pop(self, key):
-        """Removes and returns the value for a given key, raises KeyError if not found."""
-        val = dict.pop(self, key)
-        self.__keys_order__.remove(key)
-        return val
-
-    def popitem(self):
-        """Removes and returns the last key and value from order."""
-        k = self.__keys_order__.pop()
-        v = dict.pop(self, k)
-        return k, v
-
     def update(self, seqordict):
         """Appends items from another dict/Container or list-of-tuples."""
         if isinstance(seqordict, dict):
@@ -183,12 +124,6 @@ class Container(dict):
         for k,v in seqordict:
             self[k] = v
 
-    def __getstate__(self):
-        return self.__keys_order__
-
-    def __setstate__(self, state):
-        self.__keys_order__ = state
-
     def copy(self):
         return Container(self)

Would you be interested in such a patch? It broke one unit test, but rather trivially:

diff --git a/tests/lib/test_containers_dict.py b/tests/lib/test_containers_dict.py
index 1598022..2424288 100644
--- a/tests/lib/test_containers_dict.py
+++ b/tests/lib/test_containers_dict.py
@@ -101,7 +101,7 @@ def test_popitem():
     assert c.popitem() == ("c",3)
     assert c.popitem() == ("b",2)
     assert c.popitem() == ("a",1)
-    assert raises(c.popitem) == IndexError
+    assert raises(c.popitem) == KeyError
@arekbulski
Copy link
Member

I spent a lot of time tweaking this class so I am no so keen on changing it. I also tried deriving it from OrderedDict rather than implementing it myself and it didnt work as expected. So no, I am not accepting this. But thanks for trying.

@michallowasrzechonek-silvair
Copy link
Author

Mhm. You might note that regular dicts are ordered now (from 3.6 if I'm not mistaken).

@arekbulski
Copy link
Member

I think it is a good idea to finally leverage the ordered dict implementation in Py3. Thank you for bringing this up. I will redo your work later (at some point).

@arekbulski
Copy link
Member

arekbulski commented Jun 26, 2020

About dicts: you got it wrong. Since 3.6 its "an implementation detail" (OrderedDict is needed for a guarantee across pythons). Since 3.7 its a guarantee for ordinary dicts. Construct to day supports 3.6.
https://stackoverflow.com/questions/39980323/are-dictionaries-ordered-in-python-3-6

@arekbulski arekbulski reopened this Jun 26, 2020
@arekbulski
Copy link
Member

I am leaving this open as a remainder of what to do. Please do not close.

@arekbulski
Copy link
Member

I am reopening (again) this ticket and giving it real thought. Would you like to PR it? Otherwise I will do it myself.

@arekbulski
Copy link
Member

BTW the difference in performance is even higher than you indicated.

Zrzut ekranu z 2021-01-26 11-03-26

@michallowasrzechonek-silvair
Copy link
Author

Well, I'm still using the patch from this PR ordinary issue, on top of 2.9.45 (I need nesting, sorry), and it still seems to work fine.

@arekbulski
Copy link
Member

What do you mean by "this PR"? This is NOT a pull request, its an ordinary issue. You pasted a diff.

@arekbulski
Copy link
Member

Ah nevermind. After little help from uncle Google I figured how to apply your patch. I will take it from here. Thanks again.
https://stackoverflow.com/questions/12320863/how-do-you-take-a-git-diff-file-and-apply-it-to-a-local-branch-that-is-a-copy-o

@arekbulski
Copy link
Member

arekbulski commented Jan 27, 2021

Your diff was not appliable to my working directory. I had to manually rework your patch. Next time, please do a proper PR.

Merged and tested.

@michallowasrzechonek-silvair
Copy link
Author

The patch was against 2.9.45, I mentioned that. Anyway, I don't think there will be a next time ;)

@arekbulski
Copy link
Member

Oh you mean the library is so spit and polish that there is no need for more patches? :P

@michallowasrzechonek-silvair
Copy link
Author

Exactly! Especially the "polish" part.

@arekbulski
Copy link
Member

Yeah, that joke is as old as me. Funny.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants