findStream() / findEach() memory leak #2918

Closed
serg1236 opened this issue Dec 15, 2022 · 5 comments · Fixed by #2928
@serg1236
Contributor

Expected behavior

Query::findStream() and Query::findEach() are not supposed to cache any items in the persistence context.

Actual behavior

Items are being cached in the persistence context, causing an OutOfMemoryError after some time.
I ran some simple tests:

//...
myQuery.findStream().forEach(System.out::println);
//...
myQuery.findEach(System.out::println);

Here is a memory snapshot from the profiler after about 20 sec of execution:
[image: profiler memory snapshot]

P.S. I use a PostgreSQL DB.

@rPraml
Contributor

rPraml commented Dec 15, 2022

Hello serg1236,
which Ebean version are you using? Can you check whether issue #2411 (and commit #2413) are in your version? I would expect findEach to use weak references, which should be cleaned up once a bean is no longer referenced.
Do you store the beans in a list or something similar?
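
To illustrate what I mean by weak references being cleaned up, here is a minimal sketch (my simplification, not Ebean's actual DefaultPersistenceContext code) of how a weak-valued persistence context is expected to behave: once nothing else references a bean, the GC can clear its entry.

import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

public class WeakContextSketch {
  public static void main(String[] args) throws InterruptedException {
    // Simplified stand-in for a weak-valued persistence context:
    // ids map to weakly referenced beans.
    Map<Long, WeakReference<Object>> context = new HashMap<>();

    Object bean = new Object();
    context.put(1L, new WeakReference<>(bean));

    bean = null;   // drop the only strong reference to the bean
    System.gc();   // only a hint, but usually sufficient in a toy example
    Thread.sleep(100);

    // With no strong reference left, the weak reference is cleared and
    // the context no longer pins the bean: prints "bean collected: true"
    System.out.println("bean collected: " + (context.get(1L).get() == null));
  }
}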

Roland

@serg1236
Contributor Author

@rPraml I use 13.11.0. I don't collect any data. I also tried running GC explicitly, and it didn't help:

try (var iterator = vectorsQuery.findIterate()) {
  while (iterator.hasNext()) {
    if (counter.incrementAndGet() % 2000 == 0) {
      System.out.println("COLLECTOR");
      System.gc();
    }
    System.out.println(counter.get() + " " + iterator.next());
  }
}

@rPraml
Contributor

rPraml commented Dec 16, 2022

@serg1236 I tried to reproduce that, so I wrote a simple test (see below).

The test uses a simple entity and creates 10M of them in a DB file (H2 in-memory will not work).
It also uses its own serverConfig. Otherwise, the classpath scan will find the TDChangelogListener, and this ChangelogListener will put all models in an internal list (which makes it difficult/impossible to create 10M entries).
Note to @rbygrave: this should be fixed in the test framework. The default server should not find classes like TDChangelogListener and use them.

But back to the test.
After the test set is created (the resulting DB file is ~2.7 GB), I started the second test, which runs findEach, findStream, and findIterate, and then a findList.
The output was:

Doing findEach
Read 10000000 entries
Doing findStream
Read 10000000 entries
Doing findIterate
Read 10000000 entries
Doing FindList

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "h2-ro.heartBeat"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "logback-1"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "h2.heartBeat"
....

So I don't think there is a memory leak in the streaming queries themselves, but somewhere else: the OutOfMemoryError only occurs at findList, which has to hold all 10M beans in memory at once.
I'm sure there is some code in your app, or an optional Ebean feature (maybe the cache), that causes this side effect.
Have you installed any listeners (e.g. ReadAuditLogger) that may keep references to these beans?
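
Independent of which listener it is, the failure mode is the same: any long-lived collection that captures each streamed bean pins all of them, no matter how the query iterates. A hypothetical sketch (the audit list stands in for whatever component holds the references):

// Hypothetical: any component that keeps a strong reference to every
// streamed bean (here a plain list) will grow unbounded and OOM.
List<TestModel> audit = new ArrayList<>();
DB.find(TestModel.class).findEach(audit::add);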

Can you check the test case, adapt it to your application, and maybe provide an example where a "findEach" query runs into an OOM?

Roland

  @Entity
  public static class TestModel extends BasicDomain {
    @Size(max = 255)
    private String someData;

    public String getSomeData() {
      return someData;
    }

    public void setSomeData(String someData) {
      this.someData = someData;
    }
  }

  @Test
  void initDb() {
    DatabaseConfig config = new DatabaseConfig();
    config.setName("h2-batch");
    config.loadFromProperties();
    config.setDdlExtra(false);
    config.getDataSourceConfig().setUsername("sa");
    config.getDataSourceConfig().setPassword("sa");
    config.getDataSourceConfig().setUrl("jdbc:h2:file:./testsFile;DB_CLOSE_ON_EXIT=FALSE;NON_KEYWORDS=KEY,VALUE");
    config.addClass(TestModel.class);
    DatabaseFactory.create(config);

    String base = "x".repeat(240);
    // 10 million TestModel rows - each needs about 1/4 KB -> ~2.5 GB in total
    List<TestModel> batch = new ArrayList<>();
    for (int i = 0; i < 10_000_000; i++) {
      TestModel m = new TestModel();
      m.setSomeData(base + i); // ensure we have no duplicates
      batch.add(m);
      if (i % 1000 == 0) {
        DB.saveAll(batch);
        batch.clear();
      }
      if (i % 100000 == 0) {
        System.out.println(i);
      }
    }
    DB.saveAll(batch);
  }

  @Test
  void testOom() {

    DatabaseConfig config = new DatabaseConfig();
    config.setName("h2-batch");
    config.loadFromProperties();
    config.setDdlRun(false);
    config.getDataSourceConfig().setUsername("sa");
    config.getDataSourceConfig().setPassword("sa");
    config.getDataSourceConfig().setUrl("jdbc:h2:file:./testsFile;DB_CLOSE_ON_EXIT=FALSE;NON_KEYWORDS=KEY,VALUE");
    config.addClass(TestModel.class);
    DatabaseFactory.create(config);

    AtomicInteger i = new AtomicInteger();
    System.out.println("Doing findEach");
    DB.find(TestModel.class).select("*").findEach(c -> i.incrementAndGet());
    System.out.println("Read " + i + " entries");

    i.set(0);
    System.out.println("Doing findStream");
    DB.find(TestModel.class).select("*").findStream().forEach(c -> i.incrementAndGet());
    System.out.println("Read " + i + " entries");

    i.set(0);
    System.out.println("Doing findIterate");
    // close the QueryIterator when done (try-with-resources)
    try (QueryIterator<TestModel> iter = DB.find(TestModel.class).select("*").findIterate()) {
      while (iter.hasNext()) {
        iter.next();
        i.incrementAndGet();
      }
    }
    System.out.println("Read " + i + " entries");

    System.out.println("Doing FindList");
    List<TestModel> lst = DB.find(TestModel.class).select("*").findList();
    System.out.println("Read " + lst.size() + " entries");
  }

@serg1236
Contributor Author

serg1236 commented Jan 3, 2023

@rPraml sorry for the late response. I figured it out. The leak happens only if the items have embedded ids.
Here is what is happening:

  • io.ebeaninternal.server.transaction.DefaultPersistenceContext.ClassContext contains a map of objects like Map<Id, WeakReference>
  • If the id is @Embeddable, Ebean creates an embeddedOwner field that holds a strong reference to the value object. That's why it cannot be cleaned up by the GC.

Even without that, storing the id as a strong reference may cause an OutOfMemoryError, especially when it's an @Embeddable with a bunch of fields.
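
To make this concrete, here is a minimal sketch (class and field names are illustrative, not Ebean's real internals). Because the map strongly references the key, and the key strongly references the bean, the weak value can never be cleared:

import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

public class EmbeddedIdLeakSketch {
  // Stand-in for an @Embeddable id enhanced with an owner back-reference.
  static final class EmbeddedId {
    final String value;
    final Object embeddedOwner; // strong reference back to the owning bean

    EmbeddedId(String value, Object embeddedOwner) {
      this.value = value;
      this.embeddedOwner = embeddedOwner;
    }
  }

  public static void main(String[] args) throws InterruptedException {
    Map<EmbeddedId, WeakReference<Object>> context = new HashMap<>();

    Object bean = new Object();
    EmbeddedId id = new EmbeddedId("key-1", bean); // the id pins the bean
    context.put(id, new WeakReference<>(bean));

    bean = null;
    id = null;   // the map still strongly references the key...
    System.gc();
    Thread.sleep(100);

    // ...and the key strongly references the bean, so the weak
    // reference is never cleared: prints "bean collected: false"
    EmbeddedId key = context.keySet().iterator().next();
    System.out.println("bean collected: " + (context.get(key).get() == null));
  }
}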

@rPraml
Contributor

rPraml commented Jan 4, 2023

@serg1236 thanks for your investigation. I've created PR #2922, which should fix THIS issue.

@rbygrave There may be more places: in DefaultServerCache.put we use soft references as values.
EmbeddedIds are used very often as part of the cache key (HashQuery -> bindValues), so I see a risk of the next memory leak here.
I did not want to apply the "copy trick" in all those places, so I made two other PRs, #2923 and #2924.
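
Roughly, the "copy trick" means putting a detached copy of the embedded id (one without the owner back-reference) into the map, so the key no longer pins the bean. Sketched against the illustrative EmbeddedId class from the sketch above:

// Hypothetical, reusing the illustrative EmbeddedId class from above:
// a copy without the embeddedOwner back-reference is safe as a map key,
// so the weakly referenced bean can be collected again.
EmbeddedId detached = new EmbeddedId(id.value, null);
context.put(detached, new WeakReference<>(bean));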

In our application, we also often store only the IDs in a list of search results, because an ID is small in comparison to the whole bean. But if the EmbeddedId keeps a reference to its owner, this is no longer true.
Fortunately, we do not use embedded ids very often.

My favourite is now #2924, but it may cause some breaking changes whose risk I cannot assess.

To test, I used the test model above and created a model that uses two UUIDs as its key.
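
A sketch of what such a composite-key model might look like (names are illustrative; the actual test is in the commit linked below; imports such as java.util.UUID and java.util.Objects are omitted to match the snippets above):

  @Embeddable
  public static class TwoUuidKey {
    private UUID first;
    private UUID second;

    // equals and hashCode must cover both fields for use as a map key
    @Override
    public boolean equals(Object o) {
      if (!(o instanceof TwoUuidKey)) {
        return false;
      }
      TwoUuidKey other = (TwoUuidKey) o;
      return Objects.equals(first, other.first) && Objects.equals(second, other.second);
    }

    @Override
    public int hashCode() {
      return Objects.hash(first, second);
    }
  }

  @Entity
  public static class UuidKeyModel {
    @EmbeddedId
    private TwoUuidKey id;

    @Size(max = 255)
    private String someData;
  }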

Should I commit the test as well? It will only make sense to run it manually.
/edit: Test is here: FOCONIS@5af6980
