Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GEODE-8029: Allow OplogEntryIdSet to Overflow #5329

Merged
merged 2 commits into from Jul 2, 2020

Conversation

jujoramos
Copy link
Contributor

Do not delete drf files during member startup as that should be only
done by the compactor thread. Instead, allow the OplogEntryIdSet to
grow over the default capacity and log a warning message instructing
the user to manually compact the disk-stores.

  • Added unit tests.
  • Replaced usages of 'junit.Assert' by 'assertj'.
  • Modified DiskStoreImpl.deadRecordCount to return long instead of int.
  • Added internal overflow implementation to the OplogEntryIdSet so it can
    grow above the default limit.

Thank you for submitting a contribution to Apache Geode.

In order to streamline the review of the contribution we ask you
to ensure the following steps have been taken:

For all changes:

  • Is there a JIRA ticket associated with this PR? Is it referenced in the commit message?

  • Has your PR been rebased against the latest commit within the target branch (typically develop)?

  • Is your initial contribution a single, squashed commit?

  • Does gradlew build run cleanly?

  • Have you written or updated unit tests to verify your changes?

  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?

Note:

Please ensure that once the PR is submitted, check Concourse for build issues and
submit an update to your PR as soon as possible. If you need help, please send an
email to dev@geode.apache.org.

Do not delete drf files during member startup as that should be only
done by the compactor thread. Instead, allow the OplogEntryIdSet to
grow over the default capacity and log a warning message instructing
the user to manually compact the disk-stores.

- Added unit tests.
- Replaced usages of 'junit.Assert' by 'assertj'.
- Modified DiskStoreImpl.deadRecordCount to return long instead of int.
- Added internal overflow implementation to the OplogEntryIdSet so it can
  grow above the default limit.
@jujoramos jujoramos changed the title [WIP] - GEODE-8029: Allow OplogEntryIdSet to Overflow GEODE-8029: Allow OplogEntryIdSet to Overflow Jun 30, 2020
@jujoramos jujoramos marked this pull request as ready for review June 30, 2020 12:41
Copy link
Contributor

@dschneider-pivotal dschneider-pivotal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a good quick fix for users hitting this issue. I'm concerned though that if they are having this issue that a large amount of memory will be used to track deleted ids. An int[] of size 805,306,401 consumes 3.2G of memory.
We should consider a more long term fix. We used to have code that would remove drfs that were no longer needed. If we could do that more periodically it seems like it would prevent a large number of drfs from building up.
Also it seems like the OplogEntryIdSet itself could be reimplemented to use less memory. Since the ids for an diskstore always start at 1 and then simply increase, and many use cases will end up with large sequences of ids that have been deleted, we could consider using algorithms like we have on our concurrency check code for tracking what version ids we have seen. Basically you can just have a sequence (start of sequence and length) which can represent a long sequence with just 2 32-bit values (or 2 64-bit values if longs). Anything in the sequence range that is still alive you would note in another data structure as an exception. If exceptions are rare then this can save lots of memory.

Copy link
Contributor

@pivotal-eshu pivotal-eshu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we no longer need the OplogEntryIdSet after the recovery, shall we clear these sets once it is done?

With these many entryIds in drf files, it will also take longer time to recover. I think Darrel's idea for removing unnecessary drf files will help in this case as well.

@jujoramos
Copy link
Contributor Author

@pivotal-eshu

If we no longer need the OplogEntryIdSet after the recovery, shall we clear these sets once it is done?

There are no references to the OplogEntryIdSet instance once we finish the recovery, so the class itself (along with all internal structures) will eventually be removed by the garbage collector.

With these many entryIds in drf files, it will also take longer time to recover. I think Darrel's idea for removing unnecessary drf files will help in this case as well.

Agreed, I'll create another ticket for that.

@jujoramos jujoramos merged commit fdc4401 into apache:develop Jul 2, 2020
@jujoramos jujoramos deleted the feature/GEODE-8029 branch July 2, 2020 12:37
jujoramos added a commit to jujoramos/geode that referenced this pull request Jul 2, 2020
Do not delete drf files during member startup as that should be only
done by the compactor thread. Instead, allow the OplogEntryIdSet to
grow over the default capacity and log a warning message instructing
the user to manually compact the disk-stores.

- Added unit tests.
- Replaced usages of 'junit.Assert' by 'assertj'.
- Modified DiskStoreImpl.deadRecordCount to return long instead of int.
- Added internal overflow implementation to the OplogEntryIdSet so it can
  grow above the default limit.

(cherry picked from commit fdc4401)
jujoramos added a commit to jujoramos/geode that referenced this pull request Jul 2, 2020
Do not delete drf files during member startup as that should be only
done by the compactor thread. Instead, allow the OplogEntryIdSet to
grow over the default capacity and log a warning message instructing
the user to manually compact the disk-stores.

- Added unit tests.
- Replaced usages of 'junit.Assert' by 'assertj'.
- Modified DiskStoreImpl.deadRecordCount to return long instead of int.
- Added internal overflow implementation to the OplogEntryIdSet so it can
  grow above the default limit.

(cherry picked from commit fdc4401)
jujoramos added a commit that referenced this pull request Jul 3, 2020
Do not delete drf files during member startup as that should be only
done by the compactor thread. Instead, allow the OplogEntryIdSet to
grow over the default capacity and log a warning message instructing
the user to manually compact the disk-stores.

- Added unit tests.
- Replaced usages of 'junit.Assert' by 'assertj'.
- Modified DiskStoreImpl.deadRecordCount to return long instead of int.
- Added internal overflow implementation to the OplogEntryIdSet so it can
  grow above the default limit.

(cherry picked from commit fdc4401)
jujoramos added a commit that referenced this pull request Jul 3, 2020
Do not delete drf files during member startup as that should be only
done by the compactor thread. Instead, allow the OplogEntryIdSet to
grow over the default capacity and log a warning message instructing
the user to manually compact the disk-stores.

- Added unit tests.
- Replaced usages of 'junit.Assert' by 'assertj'.
- Modified DiskStoreImpl.deadRecordCount to return long instead of int.
- Added internal overflow implementation to the OplogEntryIdSet so it can
  grow above the default limit.

(cherry picked from commit fdc4401)
bschuchardt pushed a commit to bschuchardt/geode that referenced this pull request Sep 23, 2020
Do not delete drf files during member startup as that should be only
done by the compactor thread. Instead, allow the OplogEntryIdSet to
grow over the default capacity and log a warning message instructing
the user to manually compact the disk-stores.

- Added unit tests.
- Replaced usages of 'junit.Assert' by 'assertj'.
- Modified DiskStoreImpl.deadRecordCount to return long instead of int.
- Added internal overflow implementation to the OplogEntryIdSet so it can
  grow above the default limit.

(cherry picked from commit fdc4401)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants