findStream() / findEach() memory leak #2918

Closed
serg1236 opened this issue Dec 15, 2022 · 5 comments · Fixed by #2928
@serg1236
Contributor

Expected behavior

Query::findStream() and Query::findEach() are not supposed to cache any items in the persistence context.

Actual behavior

Items are being cached in the persistence context, causing an OutOfMemoryError after some time.
I ran some simple tests:

//...
myQuery.findStream().forEach(System.out::println);
//...
myQuery.findEach(System.out::println);

Here is a memory snapshot from the profiler after about 20 sec of execution:
[image: profiler memory snapshot]

P.S. I use a PostgreSQL DB.

@rPraml
Contributor

rPraml commented Dec 15, 2022

Hello serg1236,
which Ebean version are you using? Can you check whether issue #2411 (and commit #2413) are in your version? I would expect findEach to use weak references, which should be cleaned up once a bean is no longer referenced.
Do you store the beans in a list or something similar?
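
To illustrate what I mean by weak references being cleaned up, here is a minimal sketch (my simplification, not Ebean's actual DefaultPersistenceContext code) of how a weak-valued persistence context is expected to behave: once nothing else references a bean, the GC can clear its entry.

import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

public class WeakContextSketch {
  public static void main(String[] args) throws InterruptedException {
    // Simplified stand-in for a weak-valued persistence context:
    // ids map to weakly referenced beans.
    Map<Long, WeakReference<Object>> context = new HashMap<>();

    Object bean = new Object();
    context.put(1L, new WeakReference<>(bean));

    bean = null;   // drop the only strong reference to the bean
    System.gc();   // only a hint, but usually sufficient in a toy example
    Thread.sleep(100);

    // With no strong reference left, the weak reference is cleared and
    // the context no longer pins the bean: prints "bean collected: true"
    System.out.println("bean collected: " + (context.get(1L).get() == null));
  }
}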

Roland

@serg1236
Contributor Author

@rPraml I use 13.11.0. I don't collect any data. I also tried running GC explicitly, and it didn't help:

try (var iterator = vectorsQuery.findIterate()) {
  while (iterator.hasNext()) {
    if (counter.incrementAndGet() % 2000 == 0) {
      System.out.println("COLLECTOR");
      System.gc();
    }
    System.out.println(counter.get() + " " + iterator.next());
  }
}

@rPraml
Contributor

rPraml commented Dec 16, 2022

@serg1236 I tried to reproduce that, so I wrote a simple test (see below).

The test uses a simple entity and creates 10M of them in a DB file (H2 in-memory will not work).
It also uses its own serverConfig. Otherwise, the classpath scan will find the TDChangelogListener, and this ChangelogListener will put all models in an internal list (which makes it difficult/impossible to create 10M entries).
Note to @rbygrave: this should be fixed in the test framework. The default server should not find classes like TDChangelogListener and use them.

But back to the test.
After the test set is created (the resulting DB file is ~2.7 GB), I started the second test, which runs findEach, findStream, and findIterate, and then a findList.
The output was:

Doing findEach
Read 10000000 entries
Doing findStream
Read 10000000 entries
Doing findIterate
Read 10000000 entries
Doing FindList

Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "h2-ro.heartBeat"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "logback-1"
Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "h2.heartBeat"
....

So I don't think there is a memory leak in the streaming queries themselves, but somewhere else: the OutOfMemoryError only occurs at findList, which has to hold all 10M beans in memory at once.
I'm sure there is some code in your app, or an optional Ebean feature (maybe the cache), that causes this side effect.
Have you installed any listeners (e.g. ReadAuditLogger) that may keep references to these beans?
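
Independent of which listener it is, the failure mode is the same: any long-lived collection that captures each streamed bean pins all of them, no matter how the query iterates. A hypothetical sketch (the audit list stands in for whatever component holds the references):

// Hypothetical: any component that keeps a strong reference to every
// streamed bean (here a plain list) will grow unbounded and OOM.
List<TestModel> audit = new ArrayList<>();
DB.find(TestModel.class).findEach(audit::add);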

Can you check the test case, adapt it to your application, and maybe provide an example where a "findEach" query runs into an OOM?

Roland

  @Entity
  public static class TestModel extends BasicDomain {
    @Size(max = 255)
    private String someData;

    public String getSomeData() {
      return someData;
    }

    public void setSomeData(String someData) {
      this.someData = someData;
    }
  }

  @Test
  void initDb() {
    DatabaseConfig config = new DatabaseConfig();
    config.setName("h2-batch");
    config.loadFromProperties();
    config.setDdlExtra(false);
    config.getDataSourceConfig().setUsername("sa");
    config.getDataSourceConfig().setPassword("sa");
    config.getDataSourceConfig().setUrl("jdbc:h2:file:./testsFile;DB_CLOSE_ON_EXIT=FALSE;NON_KEYWORDS=KEY,VALUE");
    config.addClass(TestModel.class);
    DatabaseFactory.create(config);

    String base = "x".repeat(240);
    // 10 million TestModel rows - each needs about 1/4 KB -> ~2.5 GB in total
    List<TestModel> batch = new ArrayList<>();
    for (int i = 0; i < 10_000_000; i++) {
      TestModel m = new TestModel();
      m.setSomeData(base + i); // ensure we have no duplicates
      batch.add(m);
      if (i % 1000 == 0) {
        DB.saveAll(batch);
        batch.clear();
      }
      if (i % 100000 == 0) {
        System.out.println(i);
      }
    }
    DB.saveAll(batch);
  }

  @Test
  void testOom() {

    DatabaseConfig config = new DatabaseConfig();
    config.setName("h2-batch");
    config.loadFromProperties();
    config.setDdlRun(false);
    config.getDataSourceConfig().setUsername("sa");
    config.getDataSourceConfig().setPassword("sa");
    config.getDataSourceConfig().setUrl("jdbc:h2:file:./testsFile;DB_CLOSE_ON_EXIT=FALSE;NON_KEYWORDS=KEY,VALUE");
    config.addClass(TestModel.class);
    DatabaseFactory.create(config);

    AtomicInteger i = new AtomicInteger();
    System.out.println("Doing findEach");
    DB.find(TestModel.class).select("*").findEach(c -> i.incrementAndGet());
    System.out.println("Read " + i + " entries");

    i.set(0);
    System.out.println("Doing findStream");
    DB.find(TestModel.class).select("*").findStream().forEach(c -> i.incrementAndGet());
    System.out.println("Read " + i + " entries");

    i.set(0);
    System.out.println("Doing findIterate");
    // close the QueryIterator when done (try-with-resources)
    try (QueryIterator<TestModel> iter = DB.find(TestModel.class).select("*").findIterate()) {
      while (iter.hasNext()) {
        iter.next();
        i.incrementAndGet();
      }
    }
    System.out.println("Read " + i + " entries");

    System.out.println("Doing FindList");
    List<TestModel> lst = DB.find(TestModel.class).select("*").findList();
    System.out.println("Read " + lst.size() + " entries");
  }

@serg1236
Contributor Author

serg1236 commented Jan 3, 2023

@rPraml sorry for the late response. I figured it out. The leak happens only if the items have embedded ids.
Here is what is happening:

  • io.ebeaninternal.server.transaction.DefaultPersistenceContext.ClassContext contains a map of objects like Map<Id, WeakReference>
  • If the id is @Embeddable, Ebean creates an embeddedOwner field that holds a strong reference to the value object. That's why it cannot be cleaned up by the GC.

Even without that, storing the id as a strong reference may cause an OutOfMemoryError, especially when it's an @Embeddable with a bunch of fields.
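
To make this concrete, here is a minimal sketch (class and field names are illustrative, not Ebean's real internals). Because the map strongly references the key, and the key strongly references the bean, the weak value can never be cleared:

import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

public class EmbeddedIdLeakSketch {
  // Stand-in for an @Embeddable id enhanced with an owner back-reference.
  static final class EmbeddedId {
    final String value;
    final Object embeddedOwner; // strong reference back to the owning bean

    EmbeddedId(String value, Object embeddedOwner) {
      this.value = value;
      this.embeddedOwner = embeddedOwner;
    }
  }

  public static void main(String[] args) throws InterruptedException {
    Map<EmbeddedId, WeakReference<Object>> context = new HashMap<>();

    Object bean = new Object();
    EmbeddedId id = new EmbeddedId("key-1", bean); // the id pins the bean
    context.put(id, new WeakReference<>(bean));

    bean = null;
    id = null;   // the map still strongly references the key...
    System.gc();
    Thread.sleep(100);

    // ...and the key strongly references the bean, so the weak
    // reference is never cleared: prints "bean collected: false"
    EmbeddedId key = context.keySet().iterator().next();
    System.out.println("bean collected: " + (context.get(key).get() == null));
  }
}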

@rPraml
Contributor

rPraml commented Jan 4, 2023

@serg1236 thanks for your investigation. I've created PR #2922, which should fix THIS issue.

@rbygrave There may be more places: in DefaultServerCache.put we use soft references as values.
EmbeddedIds are used very often as part of the cache key (HashQuery -> bindValues), so I see a risk of the next memory leak here.
I did not want to apply the "copy trick" in all those places, so I made two other PRs, #2923 and #2924.
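
Roughly, the "copy trick" means putting a detached copy of the embedded id (one without the owner back-reference) into the map, so the key no longer pins the bean. Sketched against the illustrative EmbeddedId class from the sketch above:

// Hypothetical, reusing the illustrative EmbeddedId class from above:
// a copy without the embeddedOwner back-reference is safe as a map key,
// so the weakly referenced bean can be collected again.
EmbeddedId detached = new EmbeddedId(id.value, null);
context.put(detached, new WeakReference<>(bean));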

In our application, we also often store only the IDs in a list of search results, because an ID is small in comparison to the whole bean. But if the EmbeddedId keeps a reference to its owner, this is no longer true.
Fortunately, we do not use embedded ids very often.

My favourite is now #2924, but it may cause some breaking changes whose risk I cannot assess.

To test, I used the test model above and created a model that uses two UUIDs as its key.
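
A sketch of what such a composite-key model might look like (names are illustrative; the actual test is in the commit linked below; imports such as java.util.UUID and java.util.Objects are omitted to match the snippets above):

  @Embeddable
  public static class TwoUuidKey {
    private UUID first;
    private UUID second;

    // equals and hashCode must cover both fields for use as a map key
    @Override
    public boolean equals(Object o) {
      if (!(o instanceof TwoUuidKey)) {
        return false;
      }
      TwoUuidKey other = (TwoUuidKey) o;
      return Objects.equals(first, other.first) && Objects.equals(second, other.second);
    }

    @Override
    public int hashCode() {
      return Objects.hash(first, second);
    }
  }

  @Entity
  public static class UuidKeyModel {
    @EmbeddedId
    private TwoUuidKey id;

    @Size(max = 255)
    private String someData;
  }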

Should I commit the test as well? It will only make sense to run it manually.
/edit: Test is here: FOCONIS@5af6980
