Change the PersistenceContext default resetLimit for iteration queries to 100 (from 1000) #2407

rbygrave · 2021-10-07T23:58:30Z

Changes the resetLimit from 1000 to 100 for iteration queries (findEach, findIterate, findStream).

Then with TestPersistenceContext.findWithGcTest() we see

Total instances: 500, instances left in memory: 99
Total instances: 1000, instances left in memory: 99
Total instances: 1500, instances left in memory: 99
Total instances: 2000, instances left in memory: 99
Total instances: 2500, instances left in memory: 99
Total instances: 3000, instances left in memory: 99
Total instances: 3500, instances left in memory: 99
Total instances: 4000, instances left in memory: 99
Total instances: 4500, instances left in memory: 99
Total instances: 5000, instances left in memory: 99
Total instances: 5500, instances left in memory: 99
Total instances: 6000, instances left in memory: 99
Total instances: 6500, instances left in memory: 99
Total instances: 7000, instances left in memory: 99
Total instances: 7500, instances left in memory: 99
Total instances: 8000, instances left in memory: 99
Total instances: 8500, instances left in memory: 99
Total instances: 9000, instances left in memory: 99
Total instances: 9500, instances left in memory: 99
Total instances: 10000, instances left in memory: 99

…s to 100 (from 1000)

rPraml · 2021-10-08T08:16:42Z

Hello Rob, thanks for the fix. The memory leak fix seems to be easier than I thought.
Changing the value to 100 suits us.

But why is this done at all? As far as I understand, you put a lot of effort in findIterate (#2021 and so on) so that it periodically starts a new persistence context.

What do you generally think if we have only one persistence context, that uses weak references, so that it has not the ability to hold beans in memory.

IMHO that would make sense, that the PC holds only the beans in memory, that are strong referenced somewhere else. It would also free up memory if you clear lists (e.g. customer.getOrders().clear() will free all orders in the PC)

I made a proof of concept in this branch https://github.com/FOCONIS/ebean/commits/experimental/use-weak-ref

I do not know every detail (e.g. if PC is used for prefetch/preload and so on) - I also know, that using references may add additional costs (runtime/memory) - but on the other hand, the change will remove complexity and the PC will cache as many beans as the garbage collection allows.

If you think, WeakReferences are the wrong way here, you may take a look at that change FOCONIS@060c385 it will remove code used only in one test.

Cheers
Roland

rbygrave · 2021-10-10T21:29:03Z

But why is this done at all?

Largely because it means people can then use findEach(), findStream() etc even for small results - choose it for the programming model/style rather than "large resultSet" reasons. In doing so they can get the same behaviour wrt persistence context as if they used findList.

That is, people can use findEach(), findStream() without thinking or understanding much and the previous persistence context per top level bean isn't really appropriate/good for that case. I felt we need the default behaviour for small results via findEach/findStream to be just as good as findList.

rbygrave · 2021-10-10T21:30:53Z

persistence context, that uses weak references

Or a specialised persistence context for the iterating queries case that uses weak references. We only get the extra costs of weak references for the iterating queries.

rPraml · 2021-10-11T10:12:47Z

Or a specialised persistence context for the iterating queries case that uses weak references. We only get the extra costs of weak references for the iterating queries.

First, I thought, that can be a good idea, to use the special WeakRef Persistence Context in findIterate/findStream. But there may be the following use cases:

try (Transaction txn = DB.beginTransaction) {
   list = DB.find(AnyClass.class).query....findList(); // here PC1 is active
   resultMap = new HashMap();
   DB.find(AnyClass.class).query....findEach( bean -> {
       // Here, PC2 (which uses weakReferences) is active, but we must know PC1 as parent
       // because 'bean' could be already in 'list'
       if (addToResult(bean)) {
          resultMap.put(bean.getId(), bean); // Here we pass a bean from the findEach loop to the outer context
       }
   });
   list = DB.find(AnyClass.class).query....findList();  // PC1 is active again, but bean could be in "resultMap" (or should PC2 be active now)
}

If we start a new WeakRefPersistenceContext on findEach()/findStream() etc. and there is already a context present, we probably have to copy all elements to the new context. (as it is currently done in DefaultPersistenceContext)
Passing beans outside the findEach loop may lead to problems, as all following queries may use the wrong persistence context.
Starting with a WeakRefPersistenceContext will solve all problems, but will add additional overhead.

I made some quick & dirty performance measurements here:

FOCONIS@3acdd3c

and I think the WeakRefPersistenceContext is about 3 times slower. The measurements are for 100K beans. So DefaultPersistenceContext will ad ~ 130ns per bean get/put and WeakRefPersistenceContext ~ 400ns per bean.

CreateTime: 272 ms, Default: 283 ms (Overhead: 10 ms), Weak: 309 ms (Overhead: 37 ms)
CreateTime: 266 ms, Default: 284 ms (Overhead: 17 ms), Weak: 305 ms (Overhead: 38 ms)
CreateTime: 271 ms, Default: 283 ms (Overhead: 12 ms), Weak: 318 ms (Overhead: 47 ms)
CreateTime: 282 ms, Default: 280 ms (Overhead: -1 ms), Weak: 309 ms (Overhead: 27 ms)
CreateTime: 269 ms, Default: 279 ms (Overhead: 10 ms), Weak: 314 ms (Overhead: 45 ms)
CreateTime: 269 ms, Default: 281 ms (Overhead: 11 ms), Weak: 307 ms (Overhead: 37 ms)
CreateTime: 276 ms, Default: 287 ms (Overhead: 10 ms), Weak: 326 ms (Overhead: 50 ms)
CreateTime: 271 ms, Default: 315 ms (Overhead: 43 ms), Weak: 336 ms (Overhead: 65 ms)
CreateTime: 269 ms, Default: 290 ms (Overhead: 20 ms), Weak: 309 ms (Overhead: 40 ms)

To remove complexity, we could implement a PersistenceContextDelegate:

class PersistenceContextDelegate implements PersistenceContext {
    private PersistenceContext delegate = new DefaultPersistenceContext();
// delegate methods..

    void switchToWeakRef() {
        delegate = new WeakRefPersistenceContext(delegate); // copy all content
    } 
}

This would make it much easier to switch from DefaultPersistenceContext to WeakRefPersistenceContext, if a 'forEach' loop will start. (Especially in ReadJson etc)

Cheers
Roland

rbygrave · 2021-10-12T01:40:45Z

Discard this now.

Change the PersistenceContext default resetLimit for iteration querie…

f2adbc3

…s to 100 (from 1000)

rbygrave self-assigned this Oct 7, 2021

rbygrave mentioned this pull request Oct 8, 2021

Memory leak in Query.findEach #2406

Closed

rPraml mentioned this pull request Oct 11, 2021

Weak reference based persistence context for streaming queries #2411

Merged

rbygrave closed this Oct 12, 2021

rbygrave deleted the feature/findEach-resetLimit-100 branch November 30, 2021 23:08

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Change the PersistenceContext default resetLimit for iteration queries to 100 (from 1000) #2407

Change the PersistenceContext default resetLimit for iteration queries to 100 (from 1000) #2407

rbygrave commented Oct 7, 2021 •

edited

Loading

rPraml commented Oct 8, 2021

rbygrave commented Oct 10, 2021

rbygrave commented Oct 10, 2021

rPraml commented Oct 11, 2021

rbygrave commented Oct 12, 2021

Change the PersistenceContext default resetLimit for iteration queries to 100 (from 1000) #2407

Change the PersistenceContext default resetLimit for iteration queries to 100 (from 1000) #2407

Conversation

rbygrave commented Oct 7, 2021 • edited Loading

rPraml commented Oct 8, 2021

rbygrave commented Oct 10, 2021

rbygrave commented Oct 10, 2021

rPraml commented Oct 11, 2021

rbygrave commented Oct 12, 2021

rbygrave commented Oct 7, 2021 •

edited

Loading