Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Journal without flush #3979

Merged
merged 1 commit into from
Jun 6, 2023
Merged

Conversation

gaozhangmin
Copy link
Contributor

Motivation

In the given code snippet, a ForceWriteRequest object is created with the flushed flag always set to true when calling the createForceWriteRequest method, reused from a recycle pool. Consequently, when the flushFileToDisk method is invoked, it checks the flushed flag and only performs the file flush if it is false. However, since the flushed flag is always true due to the recycling process, the file flush is skipped.

private ForceWriteRequest createForceWriteRequest(JournalChannel logFile,
                          long logId,
                          long lastFlushedPosition,
                          RecyclableArrayList<QueueEntry> forceWriteWaiters,
                          boolean shouldClose) {
        ForceWriteRequest req = forceWriteRequestsRecycler.get();
        req.forceWriteWaiters = forceWriteWaiters;
        req.logFile = logFile;
        req.logId = logId;
        req.lastFlushedPosition = lastFlushedPosition;
        req.shouldClose = shouldClose;
        journalStats.getForceWriteQueueSize().inc();
        return req;
    }
private void flushFileToDisk() throws IOException {
            if (!flushed) {
                logFile.forceWrite(false);
                flushed = true;
            }
        }

Changes:

To address this issue, the recycle() method should be modified to set the flushed flag to false when recycling a ForceWriteRequest object. This ensures that the flushFileToDisk method will correctly perform the file flush when needed. The updated recycle() method could look like this:

public void recycle() {
    // Reset other fields to their initial state
    forceWriteWaiters = null;
    logFile = null;
    logId = 0;
    lastFlushedPosition = 0;
    shouldClose = false;

    // Reset the flushed flag to false
    flushed = false;

    // Recycle the object back to the pool
    forceWriteRequestsRecycler.recycle(this);
}

By setting flushed to false during recycling, subsequent reuse of the ForceWriteRequest object will correctly trigger the file flush operation in the flushFileToDisk method when necessary.

Copy link
Member

@horizonzy horizonzy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@lhotari
Copy link
Member

lhotari commented Jun 6, 2023

Good catch! Does this bug mean that Bookkeeper hasn't been fsyncing the journal to disk in certain cases?

@horizonzy
Copy link
Member

Good catch! Does this bug mean that Bookkeeper hasn't been fsyncing the journal to disk in certain cases?

Yes.

Copy link
Member

@wenbingshen wenbingshen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good Catch!

Copy link
Contributor

@hangc0276 hangc0276 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a critical bug, and we need to trigger a new release.

@horizonzy
Copy link
Member

This is a critical bug, and we need to trigger a new release.

+1

@hangc0276 hangc0276 added this to the 4.17.0 milestone Jun 6, 2023
@@ -421,6 +421,7 @@ private ForceWriteRequest(Handle<ForceWriteRequest> recyclerHandle) {

private void recycle() {
logFile = null;
flushed = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This kind of bug is completely avoided by removing object recycling. I understand that object recycling was added a while ago, but I wonder if it is still necessary on JVM 17. It'd be valuable to run benchmarks to verify the underlying assumption that we get better performance by recycling objects.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great point

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be easy to test because you can disable the recycler using Netty Java properties

Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

I wonder if we could follow up and add some tests.

We have tests about force() being called (I remember I added them while working on DEFERED_SYNC)

@eolivelli eolivelli merged commit 50bf016 into apache:master Jun 6, 2023
16 checks passed
@horizonzy
Copy link
Member

We can add a test to cover the recycle() case in the important place
By comparing new objects with objects retrieved from the pool.

@nicoloboschi
Copy link
Contributor

@gaozhangmin do we know if it's a regression of a particular BookKeeper version?

gaozhangmin pushed a commit to gaozhangmin/bookkeeper that referenced this pull request Jun 6, 2023
gaozhangmin pushed a commit to gaozhangmin/bookkeeper that referenced this pull request Jun 6, 2023
@tisonkun
Copy link
Member

We can add a test to cover the recycle() case in the important place By comparing new objects with objects retrieved from the pool.

Instead of testing the implementation, perhaps testing the performance a good idea - if missing recycling doesn't cause performance regression, that should be fine.

Does BookKeeper have any kind of performance regression coverage now?

@horizonzy
Copy link
Member

Does BookKeeper have any kind of performance regression coverage now?

It has, I will test the performance without object recycling. But, I am not sure about the performance impact on low-performance CPUs.

zymap pushed a commit that referenced this pull request Jun 19, 2023
Co-authored-by: gavingaozhangmin <gavingaozhangmin@didiglobal.com>
(cherry picked from commit 50bf016)
@hangc0276 hangc0276 mentioned this pull request Jun 27, 2023
Ghatage pushed a commit to sijie/bookkeeper that referenced this pull request Jul 12, 2024
Co-authored-by: gavingaozhangmin <gavingaozhangmin@didiglobal.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet