
Upgrade to UUID based indexing #1

Merged: 7 commits merged into master from thet-uuidindexing on Apr 26, 2015

Conversation

@thet (Member) commented Apr 20, 2015

Upgrade to UUID based indexing instead of using the id. The id is not
unique and causes problems when multiple lineage subsites with the same id
are registered. Furthermore, the uuid can be used to retrieve the lineage
childsite object without traversing up the content tree. An upgrade step
is included.

@frisi @jensens this is a better fix than c091f96.
I think it's even safe without modifications for templates using this index, since the vocabulary's title didn't change.

@jensens (Member) commented Apr 20, 2015

+1

@claytron (Contributor)

Very nice!

@frisi (Member) commented Apr 20, 2015

Thanks for your efforts @thet.

I think the upgrade step needs to update the index and metadata for all content objects, not just the childsites.
lineage.index is also used to show which childsite an item belongs to (e.g. in folder listings). Therefore it's important that brain.childsite matches the keys in the lineage.childsites vocabulary.

To save memory we could use a generator, as is done in this migration step (I never tried out whether that really has a smaller memory footprint, though).

@thet (Member, Author) commented Apr 20, 2015

@frisi
Hm... I guess you're right regarding updating all content objects.

Regarding the second statement: brain.childsite is a uuid, and the keys in the lineage.childsites vocab are uuids too.

Regarding the third statement about reducing memory consumption: that upgrade step looks like it introduces the memory leak problem. When all objects are loaded before iterating over them, Python's garbage collector cannot free any object references.
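
To make the garbage-collection point concrete, here is a minimal, hypothetical sketch in plain Python (not the actual upgrade step; `Content`, `reindex`, `process_eagerly`, and `process_lazily` are made-up names, and the behavior assumes CPython's reference counting). A pre-built list pins every object in memory, while a generator lets each object become collectable as soon as the loop moves on:

```python
import gc
import weakref

class Content:
    """Stand-in for a content object woken up from the database."""

def reindex(obj):
    """Stand-in for obj.reindexObject()."""

def process_eagerly(n):
    # All n objects are materialized up front and held by the list.
    objects = [Content() for _ in range(n)]
    refs = [weakref.ref(o) for o in objects]
    for obj in objects:
        reindex(obj)
    # The list still references every object after the loop.
    return sum(r() is not None for r in refs)

def process_lazily(n):
    # Each object is created on demand; once the loop variable is
    # rebound, the previous object has no strong references left.
    refs = []
    for obj in (Content() for _ in range(n)):
        refs.append(weakref.ref(obj))
        reindex(obj)
    del obj  # drop the reference to the last object
    gc.collect()
    return sum(r() is not None for r in refs)

print(process_eagerly(5))  # all 5 still alive, pinned by the list
print(process_lazily(5))   # 0 alive: each was collectable after its turn
```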

@thet (Member, Author) commented Apr 20, 2015

@frisi please review again

@jensens (Member) commented Apr 21, 2015

Waking up one object inside the loop, one by one, is perfectly fine.
Creating a tuple beforehand and iterating over it is the worse practice.

@frisi (Member) commented Apr 21, 2015

Happy to see that others did not know about Python inline generators either (see the section "the details") - @thet and @jensens 😃 - just as I didn't before reviewing @hvelarde's PR for imagecropping.
So it's definitely not waking up all objects before iterating over the tuple. However, I can't tell - I did not test it yet - whether that's really more efficient than iterating over the catalog result (which should be lazy too) and waking up the object inside the for-loop. Maybe @hvelarde can shed some light on this.
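
For reference, PEP 289's evaluation order can be seen with a tiny sketch (`fake_catalog` is a made-up stand-in, not a real Plone API): the outermost iterable of a generator expression is evaluated immediately when the expression is created, but the items are only produced on demand:

```python
calls = []

def fake_catalog():
    # Stand-in for catalog(): record the call, return an iterator of "brains".
    calls.append("catalog")
    return iter(["brain-1", "brain-2"])

gen = (b.upper() for b in fake_catalog())
# fake_catalog() already ran when the expression was built...
print(calls)      # ['catalog']
# ...but no item has been consumed yet; they arrive lazily:
print(next(gen))  # 'BRAIN-1'
```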

@frisi (Member) commented Apr 21, 2015

@thet - your updated upgrade step looks fine to me now - thanks again!
The line "lineage.index is also used to show ..." was meant to explain why the index and metadata need to be updated (so they match the new vocabulary), not to point out something your PR got wrong.

@thet (Member, Author) commented Apr 21, 2015

@frisi Ha, Python inline generators are new to me. Nice!

The refactored upgrade step takes about half an hour on my project site with > 20000 content items, instead of a few seconds like before. But we definitely need to index all content items.

@frisi can we merge it? :)

@frisi (Member) commented Apr 21, 2015

+1 for merging. Maybe @hvelarde can tell us why

objects = (brain.getObject() for brain in catalog())
for obj in objects:
    obj.reindexObject()

is better than

for brain in catalog():
    brain.getObject().reindexObject()

thet added a commit that referenced this pull request Apr 26, 2015
@thet thet merged commit b49423e into master Apr 26, 2015
@thet thet deleted the thet-uuidindexing branch April 26, 2015 22:46
@hvelarde (Member)

Generators are memory efficient; I used that expression because reindexing a bunch of objects on a large database could leave your server running out of memory. I don't know if they are slower, but it's weird that this is taking so much time in your case:

https://www.python.org/dev/peps/pep-0289/

@jensens (Member) commented Apr 27, 2015

I double-checked PEP 289, and from what I can tell both::

for brain in catalog():
    brain.getObject().reindexObject()

and::

objects = (brain.getObject() for brain in catalog())
for obj in objects:
    obj.reindexObject()

should be completely the same in time and memory consumption.

In the first, catalog() itself returns lazy brains. This happens in the second too.
In the first case a brain is fetched and at the same time the object is woken up (anonymously) and immediately reindexed; since no reference is held, it will be garbage collected as soon as possible.

In the second, the wake-up happens in a generator, also one by one. The loop body runs the reindexing, then in the next cycle obj gets a new reference and the old object can be garbage collected.

Since we do not expand the result to a list and need to loop over the generator anyway, its better memory efficiency does not come into play here.

So do what you like: both implementations are the same.
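
One way to check the interleaving claim outside Zope is with an instrumented mock catalog (all names below are hypothetical stand-ins for the real brain/object API, not Plone code). Both variants wake up and reindex one object per iteration, in the same order:

```python
events = []

class Obj:
    def __init__(self, uid):
        self.uid = uid
    def reindexObject(self):
        events.append(("reindex", self.uid))

class Brain:
    def __init__(self, uid):
        self.uid = uid
    def getObject(self):
        # "Waking up" the object from the database.
        events.append(("wake", self.uid))
        return Obj(self.uid)

def catalog():
    return [Brain(i) for i in range(2)]

# Variant 1: plain loop over the brains.
for brain in catalog():
    brain.getObject().reindexObject()
variant1, events = events, []

# Variant 2: generator expression, then loop.
for obj in (brain.getObject() for brain in catalog()):
    obj.reindexObject()
variant2 = events

print(variant1)              # [('wake', 0), ('reindex', 0), ('wake', 1), ('reindex', 1)]
print(variant1 == variant2)  # True: identical interleaving, one object at a time
```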

@hvelarde (Member)

Well, I can confirm that this is not the case: I implemented this because I had issues with the p.a.imagecropping upgrade step mentioned above; without the generator I was running out of memory; after using the generator I was able to reindex all my objects with no problems.

The first case will wake up all objects in memory in the loop; the second, only one by one.

@jensens (Member) commented Apr 27, 2015

Look carefully - in both cases objects are woken up one by one, not all at once.

You created a local variable obj = brain.getObject() - maybe that is not garbage collected immediately because some circular reference was left?

Using an anonymous call helps here. So it does not matter much whether you use a generator or call brain.getObject().reindexObject() directly.

If I remember right, loop variables are garbage collected in an optimized way.

@hvelarde (Member) commented Jun 8, 2015

@jensens just to let you know that no, the memory consumption is not the same.

I made some tests, and accessing a batch of 1000 objects using the list of brains consumes around 200 MB more memory than doing it via a generator.

The gain was not as big as I expected in the first place, but there is a gain, and it can be the difference between an upgrade step that runs to completion and a process killed by a hypervisor because of excessive memory consumption.

I will share the code later.
