Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Manipulating objects is very slow, even without going to database #1624

Closed
davidwneary opened this issue Aug 16, 2017 · 3 comments
Closed

Manipulating objects is very slow, even without going to database #1624

davidwneary opened this issue Aug 16, 2017 · 3 comments

Comments

@davidwneary
Copy link

davidwneary commented Aug 16, 2017

For my current project, I often need to extract data from various mongoengine documents into various dictionaries. I'm finding that there is a very large overhead when doing this, compared with normal objects. I'll illustrate with an example:

I have the following 3 documents:

class Product(Document):
    product_number = IntField()


class ProductFamily(Document):
    name = StringField()
    products = ListField(ReferenceField(Product))


class Order(Document):
    product = ReferenceField(Product)

and I run the following script:

import datetime

connect('test-families')

# Create 300 products
products = []
for i in range(300):
    product = Product(product_number=i)
    products.append(product)

# Create 295 families, each with 5 products
families = []
for i in range(295):
    families.append(ProductFamily(name=str(i), products=products[i:i+5]))

# Create 300 orders
orders = []
for i in range(300):
    orders.append(Order(product=products[i]))

# Create a dictionary of `family name` => List of orders whose product is in that family
start = datetime.datetime.now()
result = {}
for family in families:
    result[family.name] = []
    for order in orders:
        if order.product in family.products:
            result[family.name].append(order)
end = datetime.datetime.now()

print('took {} seconds'.format((end - start).total_seconds()))

Output:

took 4.74825 seconds

If I redefine the documents as normal objects:

class Product(object):
    def __init__(self, product_number):
        self.product_number = product_number


class ProductFamily(object):
    def __init__(self, name, products):
        self.name = name
        self.products = products


class Order(object):
    def __init__(self, product):
        self.product = product

and run the same script, the output is:

took 0.022871 seconds

That's more than 200x slower and we're not even making calls to the database beyond the initial connection.

Obviously I expect some overhead but not this amount. Is there something I'm missing or a way I can quickly speed this up? It's worth noting that I'm only trying to extract data here, not modify it, so if there's a way to cast the document to a normal object or disable validation or something that may be useful.

I ran a quick profile of the script and the top 50 methods by total time are:

>>> p.strip_dirs().sort_stats('tottime').print_stats(50)
Wed Aug 16 10:22:22 2017    stats.txt

         18518514 function calls (18424110 primitive calls) in 9.566 seconds

   Ordered by: internal time
   List reduced from 1320 to 50 due to restriction <50>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   442500    0.804    0.000    1.429    0.000 datastructures.py:113(__getitem__)
  1759390    0.798    0.000    1.704    0.000 datastructures.py:396(get)
    88500    0.736    0.000    5.556    0.000 fields.py:253(__get__)
  2025775    0.729    0.000    1.042    0.000 datastructures.py:382(__getitem__)
  1493890    0.724    0.000    2.170    0.000 fields.py:114(__get__)
   438075    0.702    0.000    2.921    0.000 document.py:264(__eq__)
  3988099    0.654    0.000    0.654    0.000 {built-in method builtins.isinstance}
   531000    0.440    0.000    1.869    0.000 datastructures.py:129(__iter__)
    88795    0.440    0.000    3.932    0.000 dereference.py:14(__call__)
   975293    0.413    0.000    1.057    0.000 {built-in method builtins.hasattr}
        1    0.343    0.343    9.323    9.323 test_reference_field.py:36(run)
2028413/2027518    0.313    0.000    0.319    0.000 {built-in method builtins.getattr}
   800097    0.305    0.000    0.409    0.000 common.py:5(_import_class)
    88795    0.225    0.000    0.531    0.000 datastructures.py:415(<listcomp>)
    88500    0.200    0.000    0.431    0.000 fields.py:1023(__get__)
    88795    0.188    0.000    2.058    0.000 dereference.py:49(<listcomp>)
    88795    0.180    0.000    0.226    0.000 datastructures.py:481(__iter__)
    88795    0.133    0.000    1.021    0.000 datastructures.py:426(__len__)
   179380    0.126    0.000    0.126    0.000 datastructures.py:463(__setattr__)
   895705    0.116    0.000    0.116    0.000 {method 'get' of 'dict' objects}
   355180    0.113    0.000    0.170    0.000 datastructures.py:424(<genexpr>)
181840/179755    0.109    0.000    0.247    0.000 {built-in method builtins.setattr}
   179380    0.078    0.000    0.312    0.000 datastructures.py:389(__setitem__)
    88795    0.072    0.000    0.875    0.000 datastructures.py:414(items)
181519/92552    0.072    0.000    1.081    0.000 {built-in method builtins.len}
    88795    0.050    0.000    0.072    0.000 fields.py:1014(document_type)
      114    0.048    0.000    0.104    0.001 sre_compile.py:248(_optimize_charset)
    88795    0.047    0.000    0.272    0.000 {built-in method builtins.iter}
    88795    0.046    0.000    0.046    0.000 datastructures.py:423(__iter__)
   442500    0.044    0.000    0.044    0.000 {function BaseList.__getitem__ at 0x105e80ae8}
    88795    0.037    0.000    1.118    0.000 document.py:244(__len__)
   132409    0.034    0.000    0.055    0.000 sre_compile.py:101(fixup)
   132447    0.021    0.000    0.021    0.000 {built-in method _sre.getlower}
    88798    0.020    0.000    0.020    0.000 {built-in method builtins.all}
      895    0.018    0.000    0.068    0.000 document.py:37(__init__)
       18    0.016    0.001    0.018    0.001 {built-in method _imp.create_dynamic}
      142    0.015    0.000    0.015    0.000 {built-in method marshal.loads}
88583/88575    0.013    0.000    0.013    0.000 {built-in method builtins.issubclass}
     8350    0.010    0.000    0.021    0.000 document.py:145(__setattr__)
  458/455    0.009    0.000    0.123    0.000 {built-in method builtins.__build_class__}
    159/1    0.008    0.000    9.566    9.566 {built-in method builtins.exec}
      611    0.005    0.000    0.020    0.000 <frozen importlib._bootstrap_external>:1215(find_spec)
      999    0.005    0.000    0.005    0.000 {built-in method posix.stat}
     2085    0.004    0.000    0.011    0.000 fields.py:124(__set__)
        1    0.004    0.004    0.004    0.004 {method 'read' of '_io.TextIOWrapper' objects}
      142    0.003    0.000    0.004    0.000 <frozen importlib._bootstrap_external>:816(get_data)
   174/35    0.003    0.000    0.008    0.000 sre_parse.py:491(_parse)
     3139    0.002    0.000    0.007    0.000 <frozen importlib._bootstrap_external>:50(_path_join)
     3139    0.002    0.000    0.003    0.000 <frozen importlib._bootstrap_external>:52(<listcomp>)
      895    0.002    0.000    0.003    0.000 datastructures.py:435(create)

I'm using mongoengine version 0.13.0

Let me know if you need any more details!

@davidwneary
Copy link
Author

Can anyone shed any light on this?

@touilleMan
Copy link
Member

see #1230, I've submitted PR #1630 to mitigate this

@amcgregor
Copy link
Contributor

I've updated the benchmark a bit and here are my comparative results pre- and post-optimization; a clear improvement! (Apologies for how ugly the code in there is…)

Master branch @ dc8a64f

  • Population: 0.04223203659057617 seconds
  • Insertion (instance.save): 0.37940502166748047 seconds
  • Interrogation: 5.322256803512573 seconds
  • Deserialization: 0.16296911239624023 seconds

Master branch @ 5bdd354

  • Population: 0.037187814712524414 seconds
  • Insertion (instance.save): 0.3511490821838379 seconds
  • Interrogation: 3.0043630599975586 seconds
  • Deserialization: 0.12032485008239746 seconds

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants