Sync file system on temp file before moving it. by geomacy · Pull Request #791 · apache/brooklyn-server

geomacy · 2017-08-04T11:15:19Z

See https://issues.apache.org/jira/browse/BROOKLYN-526.

This is a suggestion for a fix. Not sure how to go about testing this!

geomacy · 2017-08-04T13:07:34Z

Interesting;

Test Result (2 failures / -15)
org.apache.brooklyn.policy.loadbalancing.LoadBalancingPolicySoakTest.testLoadBalancingManyItemsQuickTest
org.apache.brooklyn.policy.loadbalancing.LoadBalancingPolicySoakTest.testLoadBalancingQuickTest

These run ok on my laptop. This may be turning up because of the comparatively restricted capability of the build machine. Retest this please.

aledsage · 2017-08-04T21:47:25Z

@geomacy Very cool - I didn't know about that!

Looking at the javadoc (https://docs.oracle.com/javase/8/docs/api/java/io/FileDescriptor.html#sync--):

Throws SyncFailedException - Thrown when the buffers cannot be flushed, or because the system cannot guarantee that all the buffers have been synchronized with physical media.

That suggests there might be two causes:

"cannot guarantee" because of limitations of the physical media? (we should catch + continue); or
io problems (e.g. a network drive that is temporarily unavailable - we should retry, with exponential backoff perhaps?

I guess propagating the exception (as you're doing) without executing move is probably right - we don't want to replace the old file with a file that might not have all the data in it.

Do we go with this code, because it's an improvement; and then later come back to (2) for retries?
Or do we worry about (1), where this might entirely break on some OSs/disks?

Thoughts?

geomacy · 2017-08-09T11:31:56Z

@aledsage very good questions!

I agree it's best to propagate the exception if it happens - if it's due to a scenario like (1) where the FS can't guarantee the sync, then it's the wrong filesystem for doing persistence. A retry is probably a good idea - have added one in latest commit, deliberately keeping it very simple, to avoid possibly introducing other issues. Let me know what you think.

I had a look at testing this but can't think of a way to do it easily. You'd either have to have a file system that you could force to fail sync on demand (something using FUSE?), or mock out the sync itself - I looked into using a dynamic proxy, but FileDescriptor is a static final class, so not suitable for proxying. Would be glad of any suggestions.

aledsage · 2017-08-14T09:00:52Z

@geomacy not convinced it's worth the effort of trying to mock the FileDescriptor to simulate a SyncFailedException. If we were doing more than retry + propagate, then maybe it would. But the current logic is pretty simple.

One can mock final classes with Mockito v2 (we're on v2.7.12 currently): https://stackoverflow.com/a/40018295/1393883, but you need to add a config file to change the MockMaker.

I think this is good as-is.

Have you tried running the integration tests, as well as the unit tests?

geomacy · 2017-08-16T09:28:22Z

I haven't run the integration tests, I shall try to make some time to do so.

See https://issues.apache.org/jira/browse/BROOKLYN-526

geomacy · 2017-08-18T13:44:06Z

rebased against master, still haven't got round to integration tests

geomacy · 2017-09-07T14:02:39Z

@aledsage finally got round to checking this against the integration tests.

It doesn't seem to have caused any significant problems - with a run of the tests against current master I got 7 failures:

grep "^<testng-results" {core,camp/camp-brooklyn,launcher}/target/surefire-reports/testng-results.xml
core/target/surefire-reports/testng-results.xml:<testng-results skipped="0" failed="3" ignored="2230" total="4492" passed="2259">
camp/camp-brooklyn/target/surefire-reports/testng-results.xml:<testng-results skipped="0" failed="3" ignored="755" total="806" passed="48">
launcher/target/surefire-reports/testng-results.xml:<testng-results skipped="0" failed="1" ignored="81" total="101" passed="19">

With this PR I actually got fewer:

grep "^<testng-results" {*,*/*}/target/surefire-reports/testng-results.xml | fgrep -v 'failed="0"'
test-framework/target/surefire-reports/testng-results.xml:<testng-results skipped="0" failed="1" ignored="96" total="119" passed="22">
camp/camp-brooklyn/target/surefire-reports/testng-results.xml:<testng-results skipped="0" failed="3" ignored="753" total="804" passed="48">

Think this should be benign enough to merge.

ahgittin · 2017-09-07T23:20:39Z

@geomacy @aledsage I hadn't realized this either; javadocs and online searches suggest the fd.sync() may be needed.

i'm curious whether this really is the issue -- at least on Mac -- as i've done some testing and the sync seems to be very fast whereas write is what takes the time. this is true even under contention: writes contend with each other (8 files written concurrently take 8+ times as long as writing 1 file; if the OS cache is in play i'd expect CPU contention which with 4 cores we'd expect 8 files to take twice as long, which is what happens if we do a System.arraycopy instead of a write) and if there's background disk activity it also slows down substantially. numbers and code below.

worth having this i agree, based on what i've read, though i have one worry -- if sync is done after the close, i always get the SyncFailedException; and it looks like the code in this PR is synching after close. we might want to understand that before merging?

i also came across the code at https://github.com/apache/activemq-apollo/blob/trunk/apollo-util/src/main/scala/org/apache/activemq/apollo/util/IOHelper.java#L272 : here to get sync they included some custom JNI calls. this makes me wonder whether -- although things acted nicely in my tests -- there aren't actually any strong guarantees with java+mac, and you need that type of low-level library?

although another part of me wonders if https://issues.apache.org/jira/browse/BROOKLYN-526 was in fact caused by something completely different (disk corruption?).

one thread:

t0 took 1.390 seconds to write 1,048,576,000 bytes to file
t0 took 0.010 seconds to sync
t0 took 0.000 seconds to close

8 threads:

t7 took 13.180 seconds to write 1,048,576,000 bytes to file
t7 took 0.099 seconds to sync
t7 took 0.000 seconds to close

code:

public class FileWrite {

    public static void main(String[] args) throws Exception {
        write("warmup");
        int NUM_THREADS = 8;
        for (int x=0; x<NUM_THREADS; x++) {
            final String id = "t"+x;
            new Thread() {
                public void run() {
                    try {
                        write(id);
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            }.start();
        }
    }
    
    public static void write(String id) throws Exception {
        Stopwatch t = Stopwatch.createStarted();
        
        File file = File.createTempFile("deleteme", ".dat"); // put your file here.
        System.out.printf("%s took %.3f seconds to create %s%n", id, t.elapsed(TimeUnit.NANOSECONDS) / 1e9, file.getAbsolutePath()); t.reset(); t.start();
        
        FileOutputStream fos = new FileOutputStream(file);
        byte[] bytes = new byte[32 * 1024];
        // byte[] bytes2 = new byte[32 * 1024];
        for (long l = 0; l < 1000 * 1024 * 1024; l += bytes.length) { 
            fos.write(bytes);
            // System.arraycopy(bytes, 0, bytes2, 0, 32 * 1024);
        }
        System.out.printf("%s took %.3f seconds to write %,d bytes to file%n", id, t.elapsed(TimeUnit.NANOSECONDS) / 1e9, file.length()); t.reset(); t.start();
        
        fos.getFD().sync();
        System.out.printf("%s took %.3f seconds to sync%n", id, t.elapsed(TimeUnit.NANOSECONDS) / 1e9); t.reset(); t.start();
        
        fos.close();
        System.out.printf("%s took %.3f seconds to close%n", id, t.elapsed(TimeUnit.NANOSECONDS) / 1e9); t.reset(); t.start();
        
        File file2 = File.createTempFile("deleteme", ".dat"); // put your file here.
        java.nio.file.Files.move(file.toPath(), file2.toPath(), StandardCopyOption.ATOMIC_MOVE);
        System.out.printf("%s took %.3f seconds to move to %s%n", id, t.elapsed(TimeUnit.NANOSECONDS) / 1e9, file2.getAbsolutePath()); t.reset(); t.start();
    }
    
}

geomacy · 2017-09-08T09:35:20Z

You're right @ahgittin, the sync should be done before the close; I have added another commit to do this.

I agree I'm not sure that the problem was caused by a failure to commit data to disk when @aledsage's laptop crashed, but it's certainly plausible. That's an interesting link you shared above. I've done a bit more digging and come across a very interesting email here:

https://lists.apple.com/archives/darwin-dev/2005/Feb/msg00072.html

It's definitely worth reading; in short, it looks like just adding the sync() in Brooklyn, while it may help, isn't in itself enough to ensure the data is committed to disk on pretty much any system, not just OSX. :-(

In fact the JNI call that you referred to on OSX seems to be the only way to guarantee the data is committed, and of course it's only available on the Mac.

I think this PR may still be something worth doing (it will eliminate one failure point, by ensuring the data is flushed from host memory to the disk device). I don't know what else we might want to do to take this further, would be interested to hear what you think.

ahgittin · 2017-09-08T14:29:22Z

very interesting link @geomacy - the stronger guarantee of fcntl() probably what the activemq jni code does.

i think this PR is good and your fix here for the problem caused by the also-needed #809 looks good. i think merge though let's give @aledsage a few days to respond.

as for any remaining risk ... the documentation for sync is unambiguous (and the post is old) so i tend towards thinking this does what the API says, synching data with the disk. but let's remember this issue is here in case we come across systems where the OS/disk needs additional configuration to ensure sync does what it says it does.

aledsage · 2017-09-11T08:20:08Z

LGTM - merging now.

geomacy force-pushed the brooklyn-526 branch 3 times, most recently from 47ce4c1 to b3f3ce7 Compare August 9, 2017 11:22

Sync file system on temp file before moving it.

9fe0179

See https://issues.apache.org/jira/browse/BROOKLYN-526

geomacy force-pushed the brooklyn-526 branch from b3f3ce7 to 9fe0179 Compare August 18, 2017 13:43

Sync before close

c5cbc08

asfgit merged commit c5cbc08 into apache:master Sep 11, 2017

asfgit pushed a commit that referenced this pull request Sep 11, 2017

This closes #791

06c9f62

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Sync file system on temp file before moving it.#791

Sync file system on temp file before moving it.#791
asfgit merged 2 commits intoapache:masterfrom
geomacy:brooklyn-526

geomacy commented Aug 4, 2017

Uh oh!

geomacy commented Aug 4, 2017

Uh oh!

aledsage commented Aug 4, 2017 •

edited

Loading

Uh oh!

geomacy commented Aug 9, 2017 •

edited

Loading

Uh oh!

aledsage commented Aug 14, 2017

Uh oh!

geomacy commented Aug 16, 2017

Uh oh!

geomacy commented Aug 18, 2017

Uh oh!

geomacy commented Sep 7, 2017

Uh oh!

ahgittin commented Sep 7, 2017

Uh oh!

geomacy commented Sep 8, 2017

Uh oh!

ahgittin commented Sep 8, 2017

Uh oh!

aledsage commented Sep 11, 2017

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

geomacy commented Aug 4, 2017

Uh oh!

geomacy commented Aug 4, 2017

Uh oh!

aledsage commented Aug 4, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

geomacy commented Aug 9, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

aledsage commented Aug 14, 2017

Uh oh!

geomacy commented Aug 16, 2017

Uh oh!

geomacy commented Aug 18, 2017

Uh oh!

geomacy commented Sep 7, 2017

Uh oh!

ahgittin commented Sep 7, 2017

Uh oh!

geomacy commented Sep 8, 2017

Uh oh!

ahgittin commented Sep 8, 2017

Uh oh!

aledsage commented Sep 11, 2017

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

aledsage commented Aug 4, 2017 •

edited

Loading

geomacy commented Aug 9, 2017 •

edited

Loading