Sync file system on temp file before moving it. #791
|
Interesting; these run OK on my laptop. The failures may be showing up because the build machine is comparatively less capable. Retest this please. |
|
@geomacy Very cool - I didn't know about that! Looking at the javadoc (https://docs.oracle.com/javase/8/docs/api/java/io/FileDescriptor.html#sync--): That suggests there might be two causes:
I guess propagating the exception (as you're doing) without executing Do we go with this code, because it's an improvement; and then later come back to (2) for retries? Thoughts? |
|
@aledsage very good questions! I agree it's best to propagate the exception if it happens - if it's due to a scenario like (1) where the FS can't guarantee the sync, then it's the wrong filesystem for doing persistence. A retry is probably a good idea - have added one in latest commit, deliberately keeping it very simple, to avoid possibly introducing other issues. Let me know what you think. I had a look at testing this but can't think of a way to do it easily. You'd either have to have a file system that you could force to fail |
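For readers following along, here is a minimal sketch of what a deliberately simple retry around `FileDescriptor.sync()` could look like. It is not the code from this PR's commit: the class name `SyncWithRetry`, the method `syncWithRetry` and the `maxAttempts` parameter are hypothetical, chosen only for illustration. If every attempt fails, the last `SyncFailedException` is propagated, in line with the point above about not swallowing the failure.

```java
import java.io.FileDescriptor;
import java.io.SyncFailedException;

// Hypothetical helper, not part of the PR: retries sync() a fixed number of times.
class SyncWithRetry {

    /**
     * Calls fd.sync() up to maxAttempts times and returns on the first success.
     * If all attempts fail, the last SyncFailedException is rethrown.
     */
    static void syncWithRetry(FileDescriptor fd, int maxAttempts) throws SyncFailedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        SyncFailedException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                fd.sync();   // ask the OS to write buffered data to the device
                return;      // success, nothing more to do
            } catch (SyncFailedException e) {
                last = e;    // remember the failure and try again
            }
        }
        throw last;          // propagate rather than hide the failure
    }
}
```

There is no back-off or logging here on purpose, to keep the sketch as small as the retry described in the comment.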
|
@geomacy not convinced it's worth the effort of trying to mock the One can mock final classes with Mockito v2 (we're on v2.7.12 currently): https://stackoverflow.com/a/40018295/1393883, but you need to add a config file to change the I think this is good as-is. Have you tried running the integration tests, as well as the unit tests? |
|
I haven't run the integration tests, I shall try to make some time to do so. |
|
rebased against master, still haven't got round to integration tests |
|
@aledsage finally got round to checking this against the integration tests. It doesn't seem to have caused any significant problems - with a run of the tests against current With this PR I actually got fewer: Think this should be benign enough to merge. |
|
@geomacy @aledsage I hadn't realized this either; javadocs and online searches suggest the i'm curious whether this really is the issue -- at least on Mac -- as i've done some testing and the worth having this i agree, based on what i've read, though i have one worry -- if i also came across the code at https://github.com/apache/activemq-apollo/blob/trunk/apollo-util/src/main/scala/org/apache/activemq/apollo/util/IOHelper.java#L272 : here to get sync they included some custom JNI calls. this makes me wonder whether -- although things acted nicely in my tests -- there aren't actually any strong guarantees with java+mac, and you need that type of low-level library? although another part of me wonders if https://issues.apache.org/jira/browse/BROOKLYN-526 was in fact caused by something completely different (disk corruption?). one thread: 8 threads: code: |
|
You're right @ahgittin, the I agree I'm not sure that the problem was caused by a failure to commit data to disk when @aledsage's laptop crashed, but it's certainly plausible. That's an interesting link you shared above. I've done a bit more digging and come across a very interesting email here: https://lists.apple.com/archives/darwin-dev/2005/Feb/msg00072.html It's definitely worth reading; in short, it looks like just adding the In fact the JNI call that you referred to on OSX seems to be the only way to guarantee the data is committed, and of course it's only available on the Mac. I think this PR may still be something worth doing (it will eliminate one failure point, by ensuring the data is flushed from host memory to the disk device). I don't know what else we might want to do to take this further, would be interested to hear what you think. |
|
very interesting link @geomacy - the stronger guarantee of i think this PR is good and your fix here for the problem caused by the also-needed #809 looks good. i think merge though let's give @aledsage a few days to respond. as for any remaining risk ... the documentation for |
|
LGTM - merging now. |
See https://issues.apache.org/jira/browse/BROOKLYN-526.
This is a suggestion for a fix. Not sure how to go about testing this!
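The idea in the title can be sketched roughly as follows. This is an illustrative example under stated assumptions, not the code changed by this PR: the class `SyncBeforeMove` and the method `writeSyncedThenMove` are hypothetical names. It writes the content to a temp file in the target's directory, calls `FileDescriptor.sync()` on it, and only then moves it over the target. As the conversation notes, `sync()` may not be a strong guarantee on every platform.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not part of the PR: sync a temp file before moving it into place.
class SyncBeforeMove {

    /** Writes content to a temp file beside target, syncs it, then moves it over target. */
    static void writeSyncedThenMove(File target, String content) throws IOException {
        Path dir = target.getAbsoluteFile().getParentFile().toPath();
        // Create the temp file in the same directory so the move stays on one file system.
        Path temp = Files.createTempFile(dir, "sync", ".tmp");
        try {
            try (FileOutputStream out = new FileOutputStream(temp.toFile())) {
                out.write(content.getBytes(StandardCharsets.UTF_8));
                out.flush();
                // Throws SyncFailedException if the buffers cannot be synchronized;
                // the exception is propagated to the caller.
                out.getFD().sync();
            }
            Files.move(temp, target.toPath(), StandardCopyOption.REPLACE_EXISTING);
        } finally {
            // No-op after a successful move; cleans up the temp file on failure.
            Files.deleteIfExists(temp);
        }
    }
}
```

Keeping the temp file in the target's directory is a design choice of this sketch, so that the final step is a rename within one file system rather than a copy across two.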