Skip to content

Conversation

@sagivmalihi
Copy link
Contributor

This implements issue #624:

  1. replacing _data with a strict dictionary (thus conserving memory spent on storing the keys on instance)
  2. adding slots to base document classes to conserver memory spent on dicts.

This drastically reduces the memory footprint when loading many instances of the same class.

@sagivmalihi
Copy link
Contributor Author

This improved memory utilization in our use case by a factor of x20.
The main memory gain comes from storing the keys only once per class, and store only values on instances.
A secondary gain comes from losing all the unnecessary dicts per instance.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strongly backward incompatible.

@lig
Copy link
Contributor

lig commented Apr 11, 2014

Thank you for such a contribution. It looks like braking a lot of stuff that is used widely.

I think that here is the place for an alternative BaseDocument class. Say StrictDocument. That would allow consume less memory in trade of lack of functionality.

@lig lig added this to the Features we're unsure of milestone Apr 11, 2014
@sagivmalihi
Copy link
Contributor Author

sagivmalihi commented Apr 11, 2014

There is already DynamicDocument which is non strict and also even stores
to DB...

Is the functionality of storing extra unsaved data really used widely?

It's possible to relax this a little and simply ignore any extra args.
Another solution could be to store a dedicated slot to such data that will
only be created if needed.

what do you think?

@sagivmalihi
Copy link
Contributor Author

@lig - thank you for your comments! I changed the implementation such that it now allows for previously undeclared fields (thus breaking nothing!), but still greatly optimizing for declared fields.

A hash table is only created if absolutely necessary.

I also added a class member 'STRICT' to allow using the 'strict' implementation (IMO it's generally safer...)

@ispmarin
Copy link

ispmarin commented Jun 6, 2014

What functionality would be lost if this is used?

@sagivmalihi
Copy link
Contributor Author

@ispmarin - absolutely no functionality would be lost (as far as I understand).
All tests pass without modification.

@lig
Copy link
Contributor

lig commented Jun 9, 2014

There is no 100% coverage in this project.

Ability to save data that is not declared in class attributes will be
completely loss.

Also I think using __slots__ is great for performance and memory
consumption. But this particular realization looks like it will be hard to
maintain it in the future.

I dont like strict_classes and semi_strict_classes globals and
strict_dict function. I think this code has a lack of incapsulation.

Maybe we should put strict_dict somewhere in StrictDict. It may be
__init__ or something. And strict_classes and semi_strict_classes
should be class attributes on my opinion.

@sagivmalihi
Copy link
Contributor Author

@lig - thanks for following up.

  1. The ability to save non-declared class attributes will not be lost. There is no change in that aspect (the default is to use 'semi-strict' classes that allow storing any extra-allocated keys, the space just won't be pre-allocated - they will be stored in a real dict).
  2. using slots is just a minor enhancement really, and is not related to replacing the implementation of _data. However it does have a rather strong effect. Why do you think it will hurt maintainability?
  3. I understand what you're saying about the strict_dict function (it looks a bit non-OO, but that's not really a bad thing per-se...) - but it's necessary since it creates new types on-the-fly (so can't happen in init). It could have been handled by a metaclass but I think a simple function is cleaner and simpler. I also understand what you're saying about globals, but since they are only referenced inside this dedicated module, you can might as well think of the entire module as a class (or namespace) - there is no abstraction leak here - the functionality is completely encapsulated in the strict_dict function (that simply generates and returns the correct type).

@lig
Copy link
Contributor

lig commented Jun 9, 2014

Sagiv, I completely understand you point. It looks like it is good enough
solution. But thinking of MongoEngine hackers out there I prefer Metaclass
over fabric because it could be coded with extensibility in mind unlike
standalone function. I think here is much place for engineering at this
moment.

@sagivmalihi
Copy link
Contributor Author

OK, so I looked a bit deeper into it.
Since strict_dict is actually a class factory (generating classes on-the-fly), it cannot be a class (well, it could be a class with call and an init, but that's just a function in disguise).

It's (as far as I can see) impossible to achieve this by metaclasses, since you need to be able to both
a) pass arguments to class creation (the allowed keys)
and
b) pass arguments to instance creation (the key:value pairs for initial data)

@lig
Copy link
Contributor

lig commented Jun 10, 2014

Sagiv, any function could be a member of the class. Say classmethod.

changed the _data field to static key-value mapping instead of hash table
This implements MongoEngine#624
@sagivmalihi
Copy link
Contributor Author

@lig, I'm fully convinced. Please see updated commit.

@rozza rozza merged commit 9835b38 into MongoEngine:master Jun 26, 2014
rozza added a commit that referenced this pull request Jun 26, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants