Upgrade to UUID based indexing #1
Conversation
Upgrade to ``UUID`` based indexing instead of using the ``id``. The id is not unique and causes problems when multiple lineage subsites with the same id are registered. Furthermore, the UUID can be used to retrieve the lineage childsite object without traversing up the content tree. An upgrade step is included.
+1
Very nice!
thanks for your efforts @thet. i think the upgrade step needs to update the index and metadata for all content objects, not just the childsites. to save memory we could use a generator, as is done in this migration step (never tried out if that really has a smaller memory footprint, though)
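the pattern frisi suggests can be sketched in plain Python. `FakeBrain` here is a hypothetical stand-in for a catalog brain; a real Plone upgrade step would iterate over `portal_catalog` results, call `brain.getObject()` and then `reindexObject()` on each object:

```python
# Hypothetical stand-in for a catalog brain; the real Plone API wakes up
# a persistent content object via brain.getObject().
class FakeBrain:
    def __init__(self, uid):
        self.uid = uid

    def getObject(self):
        return {"UID": self.uid}


def objects(brains):
    """PEP 289-style generator expression: wake up one object per
    iteration instead of materializing the whole list up front."""
    return (brain.getObject() for brain in brains)


brains = [FakeBrain(i) for i in range(3)]
reindexed = []
for obj in objects(brains):
    # a real step would call obj.reindexObject(idxs=["UID"]) here
    reindexed.append(obj["UID"])
```

the generator only keeps one woken-up object referenced at a time, so earlier objects can be garbage collected while the loop is still running.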
@frisi regarding your 3rd statement, the reduction of memory consumption: the upgrade step looks like it introduces the memory problem. when loading all objects before iterating over them, python's garbage collector cannot free any object references.
@frisi please review again |
waking up one object inside the loop, one by one, is perfectly fine.
happy to see that others do not know about python inline generators either (see section "the details"), @thet and @jensens 😃 - just as i did not, before reviewing @hvelarde's PR for imagecropping.
@thet - your updated upgrade step looks fine to me now - thanks again! |
+1 for merging. maybe @hvelarde can tell us why the one form is better than the other.
generators are memory efficient; I used that expression because reindexing a bunch of objects on a large database could leave your server running out of memory; I don't know if they are slower, but it's weird this is taking so much time in your case.
i double checked pep289, and from what i can tell both variants should be completely the same in time and memory consumption. in the second, the wakeup happens in a generator, also one by one. the loop body does the reindexing, then the next obj gets a new reference in the next cycle and the old obj can be garbage collected. since we do not expand the result to a list and need to loop over the generator anyway, its better memory efficiency does not come into play here. so do what you like: both implementations are the same.
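the point of contention can be checked directly by counting how many objects are alive at once during iteration. this is a minimal sketch, not the original Plone code; `Obj` is a hypothetical stand-in for a woken-up content object, and the behavior shown relies on CPython's reference counting:

```python
import gc


class Obj:
    """Stand-in for a woken-up content object; tracks live instances."""
    live = 0

    def __init__(self):
        Obj.live += 1

    def __del__(self):
        Obj.live -= 1


def max_live(factory):
    """Iterate over factory()'s result, recording the peak number of
    simultaneously alive Obj instances seen in the loop body."""
    peak = 0
    for obj in factory():
        peak = max(peak, Obj.live)
    gc.collect()
    return peak


# List comprehension: all 100 objects exist before the loop starts.
peak_list = max_live(lambda: [Obj() for _ in range(100)])
# Generator expression: objects are created on demand; on CPython the
# previous one is freed as soon as the loop variable is rebound.
peak_gen = max_live(lambda: (Obj() for _ in range(100)))
```

so the two spellings are equivalent in *results*, but not in how many objects they keep referenced at once, which is what the rest of the thread is about.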
well, I can confirm that this is not the case: I implemented this because I had issues with the p.a.imagecropping upgrade step mentioned above; without the generator I was running out of memory; after using the generator I was able to reindex all my objects with no problems. the first case will wake all objects in memory at once; the second, only one by one.
look carefully - in both cases it's a wake-up object by object, not all at once. you created a local variable; using an anonymous call helps here. so it does not matter much if you use a generator or iterate directly; if i remember right, loop variables are garbage collected in an optimized way.
@jensens just to let you know that no, the memory consumption is not the same. I made some tests, and accessing a batch of 1000 objects using the list of brains consumes around 200MB more memory than doing it using a generator. the gain was not as big as I expected in the first place, but there is a gain, and it can be the difference between an upgrade step that runs through and a process killed by a hypervisor because of excessive memory consumption. I will share the code later.
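since the measurement code itself was not shared in the thread, here is one plain-Python way such a comparison can be reproduced, using the standard-library `tracemalloc` module; the small nested lists are hypothetical stand-ins for woken-up content objects:

```python
import tracemalloc


def peak_kib(factory):
    """Iterate over factory()'s result while tracing allocations;
    return the peak traced memory in KiB."""
    tracemalloc.start()
    for item in factory():
        pass  # a real upgrade step would reindex `item` here
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak / 1024


N = 100_000
# List comprehension: all N stand-in objects are allocated up front.
peak_list = peak_kib(lambda: [[i] * 10 for i in range(N)])
# Generator expression: roughly one stand-in object alive per iteration.
peak_gen = peak_kib(lambda: ([i] * 10 for i in range(N)))
print(f"list: {peak_list:.0f} KiB, generator: {peak_gen:.0f} KiB")
```

with real persistent Plone objects the absolute numbers would be much larger, but the shape of the result is the same: the list variant's peak grows with N, the generator's does not.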
@frisi @jensens this is a better fix than c091f96
i think it's even safe without modifications for templates using this index, since the vocabulary titles didn't change.