Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Atomic writes #167

Closed
wants to merge 2 commits into from
Closed

Conversation

lehins
Copy link
Contributor

@lehins lehins commented Apr 22, 2019

This PR implements atomic file writes and fixes a bunch of issues with durable atomic and only durable file writes. In order to see the actual locations with short description of each issue see this PR

Below is higher level overview of the issues being fixed:

  • withBinaryFileDurableAtomic fails with nested folders #160 (atomic renames aren't working with relative paths)
  • File can get corrupted upon exception in both atomic writing functions
  • Durability for existing files is broken, since copying is done with copyFile and we cannot guarantee successful fsync when handle is closed
  • Temporary files are not deleted upon an exception
  • withBinaryFileDurable produces an error with ReadMode
  • Possible race condition in between doesFileExist and copyFile, which can cause an unexpected DoesNotExist exception
  • Permissions of files are not copied over during atomic rename
  • Temporary file is visible on the file system while it is being modified, as such it can be tempered with by another thread or even a different process. By using open with O_TMPFILE we can provide isolation, which is the implemented approach in this PR.

Besides addition of writeBinaryFileAtomic and withBinaryFileAtomic, this PR also includes a bunch of tests for the all functions in RIO.Files (close to 100% coverage).

Few general changes:

  • moved -DWINDOWS flag into the library section, all tests are OS agnostic and do not directly depend on unix nor Win32
  • turned on -Wall by default.
  • removed of all compile time warnings
  • Added of readBinaryFile, writeBinaryFile and withBinaryFileLazy as synonyms for consistent naming
  • Re-exported RIO.Prelude.Types from RIO.Prelude (it was imported qualified, which prevented from types being re-exported). import RIO is not affected.
  • Silenced one hlint warning
  • Updated changelog and documentation for RIO.File

This change is Reviewable

Copy link

@nh2 nh2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 12 of 14 files at r1.
Reviewable status: 12 of 14 files reviewed, 19 unresolved discussions (waiting on @lehins)


rio/cbits/rio.c, line 34 at r1 (raw file):

  return S_IWUSR;
}

Calling into C is not necessary, all of this can be done at compile time with a .hsc file.

It would probably be easier there.

If we want to keep calling into C for some reasons, we should probably add haskell_rio into the function names or so to ensure those symbols are there only once, in case some other C library linked into the final project also decides to call their functions __o_tmpfile.


rio/src/RIO/File.hs, line 49 at r1 (raw file):

functionality and the one that is done atomically. Even if the orignal file is
removed while it is being modified, due to atomicity, it will be restored with
all modifications, if any. The important consequence of this fact is that

I recommend to insert the reason here, so that the user understands without doubt what's going on: This is because the file is atomically linked into the file system at the end.


rio/src/RIO/File.hs, line 135 at r1 (raw file):

-- After here, we have our own imports

foreign import ccall unsafe "rio.c __o_tmpfile" o_TMPFILE :: CInt

I find it somewhat confusing that o_TMPFILE can also be 0.
That should have a comment here.


rio/src/RIO/File.hs, line 142 at r1 (raw file):

foreign import ccall safe "fcntl.h openat"
  c_safe_openat :: Fd -> CFilePath -> CInt -> CMode -> IO CInt

If I remember correctly, we intentionally used only C* types in foreign function declarations.

I'm always a bit on the fence with this.

The key thing is that there is no type checking for foreign imports; one can write arbitrary stuff in there and nothing checks it. That's why I always try to make foreign import type signatures look as similar to the man page's types as possible, so that it looks "obviously correct" and nobody has to look up in the docs whether Fd really is CInt.


rio/src/RIO/File.hs, line 151 at r1 (raw file):

foreign import ccall safe "unistd.h linkat"
  c_safe_linkat :: CInt -> CFilePath -> CInt -> CFilePath -> CInt -> IO CInt

If we really wanted to go for Fd, then c_safe_linkat should probably use it too.


rio/src/RIO/File.hs, line 208 at r1 (raw file):

-- | Optionally set file permissions, call @fsync@ on the file handle and then
-- close it.

The comment does not align with the name of this function. If it closes the file as the haddocks say, it should be in the name. But I don't see it closing the file, so maybe the comment is outdated. In the first commit in this series, it did close the file. I think it makes sense that it doesn't close the file.


rio/src/RIO/File.hs, line 215 at r1 (raw file):

-- | Call @fsync@ on the opened directory file descriptor
fsyncDirectoryFd :: String -> Fd -> IO ()
fsyncDirectoryFd fname = fsyncFileDescriptor (fname ++ "/Directory")

This is confusing, as there can be a file called Directory.

How about using /../ instead?

Then the error message will show e.g. path/to/file/../, which makes clear that we're talking about the directory called file (you can even call stat on it and get the expected result).


rio/src/RIO/File.hs, line 228 at r1 (raw file):

--
openFileFromDir :: MonadIO m => Fd -> FilePath -> IOMode -> m Handle
openFileFromDir dirFd (dropDirectoryIfRelative -> fp) iomode =

Personally I'm no fan of view patterns in public type signatures.

They make them harder to parse visually, make that now there isn't actually a variable that corresponds to the actual FilePath that was passed into the function (so you can't trace it easily for debugging or, use ghci's debugger to print the variable), and add more fancy Haskell stuff one has to learn to understand the code for little added benefit.

I'd just let-bind it.


rio/src/RIO/File.hs, line 228 at r1 (raw file):

--
openFileFromDir :: MonadIO m => Fd -> FilePath -> IOMode -> m Handle
openFileFromDir dirFd (dropDirectoryIfRelative -> fp) iomode =

Why do we drop relative parent directories here?

I think that is unnecessary, and it would make sense for the function to support them, just like openat() does.

I would find it very surprising to discover that this function drops part of what I gave it.


rio/src/RIO/File.hs, line 257 at r1 (raw file):

dropDirectoryIfRelative fp
  | isRelative fp = takeFileName fp
  | otherwise = fp

I think this function is unnecessary, see comments on its use sites.


rio/src/RIO/File.hs, line 259 at r1 (raw file):

  | otherwise = fp

-- | Open an anonymous temporary file

... in the given directory? Should the function name also end with FromDir, like the function above, for consistency? What about the other comments on the function above, do they apply here too / should they be copied or referred to?

This function should probably have parameter haddocks for extra clarity.

Especially the first argument, eDir. It should say that the Fd or file path should point to a directory. If it's a path, does it need to exist before?

What happens when an Fd or path pointing to a file, instead of dir, is given?


rio/src/RIO/File.hs, line 262 at r1 (raw file):

openAnonymousTempFile ::
     MonadIO m => Either Fd FilePath -> FilePath -> IOMode -> m Handle
openAnonymousTempFile eDir filePath iomode =

What's the filePath argument good for? Can it be removed?


rio/src/RIO/File.hs, line 280 at r1 (raw file):

        (do fileFd <-
              throwErrnoIfMinus1Retry "openAnonymousTempFile" $
              fopen (o_TMPFILE .|. ioModeToTmpFlags iomode) (s_IRUSR .|. s_IWUSR)

We must handle the situation that O_TMPFILE is not supported on the filesystem.

This means implementing a fallback for the case that EOPNOTSUPP is returned.


rio/src/RIO/File.hs, line 319 at r1 (raw file):

--
-- It is important to note, that whenever a file descriptor for the containing
-- directory is supplied, renaming and linking will be done in it's context,

it's -> its


rio/src/RIO/File.hs, line 323 at r1 (raw file):

atomicTempFileCreate ::
     Maybe Fd
  -- ^ Possible handle for the directory where the target file is located

is located sounds like it must already exist. shall end up in?


rio/src/RIO/File.hs, line 331 at r1 (raw file):

  -- `o_TMPFILE`
  -> FilePath
  -- ^ File path for the target file.

Relative, absolute, and what happens in each case and for either case of what the mDirFd is?


rio/src/RIO/File.hs, line 376 at r1 (raw file):

  -- to atomic rename.
  -> Either Handle FilePath
  -- ^ Temporary file's path

It's not clear from the docs that this file must already exist, or the function will fail.

Let's add that to top-level haddocks of the function. Those haddocks should also say in which situations this function is to be used.


rio/src/RIO/File.hs, line 390 at r1 (raw file):

        Just dirFd ->
          withFilePath (dropDirectoryIfRelative filePath) $ \cToFilePath ->
            withFilePath (dropDirectoryIfRelative tmpFilePath) $ \cTmpFilePath ->

I think both uses of dropDirectoryIfRelative are unnecessary and should be dropped.


rio/src/RIO/File.hs, line 435 at r1 (raw file):

  -- of @open@
  -> FilePath
  -- ^ Source file path. The file may exist or may not.

It's not clear from this that if a directory Fd is given, filePath is completely ignored.

Copy link

@nh2 nh2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 12 of 14 files reviewed, 20 unresolved discussions (waiting on @lehins)


rio/src/RIO/File.hs, line 335 at r1 (raw file):

atomicTempFileCreate mDirFd mFileMode tmpFileHandle filePath =
  withHandleFd tmpFileHandle $ \fd@(Fd cFd) ->
    withFilePath ("/proc/self/fd/" ++ show cFd) $ \cFromFilePath ->

/proc does not exist on the BSDs and OSX.

Please add a stating how it ensured that this path cannot be taken, and also mention it in the haddocks that this function is Linux only (or /proc-only).

Copy link

@nh2 nh2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 12 of 14 files reviewed, 20 unresolved discussions (waiting on @lehins)


rio/src/RIO/File.hs, line 335 at r1 (raw file):

a stating

I mean a comment stating

Copy link
Contributor Author

@lehins lehins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 12 of 14 files reviewed, 20 unresolved discussions (waiting on @lehins and @nh2)


rio/src/RIO/File.hs, line 135 at r1 (raw file):

Previously, nh2 (Niklas Hambüchen) wrote…

I find it somewhat confusing that o_TMPFILE can also be 0.
That should have a comment here.

I'll add a comment about it


rio/src/RIO/File.hs, line 142 at r1 (raw file):

Previously, nh2 (Niklas Hambüchen) wrote…

If I remember correctly, we intentionally used only C* types in foreign function declarations.

I'm always a bit on the fence with this.

The key thing is that there is no type checking for foreign imports; one can write arbitrary stuff in there and nothing checks it. That's why I always try to make foreign import type signatures look as similar to the man page's types as possible, so that it looks "obviously correct" and nobody has to look up in the docs whether Fd really is CInt.

If I can add at least a little bit of type safety to it I always will. I don't care if any body has to lookup the fact that Fd is a newtype wrapper around CInt, as long as it is. If I could, I would make every single argument in this function a separate Haskell type. Most important reason why I feel strongly about it is that within this module it is impossible to mix up the arguments and supply some flag instead of a file descriptor.

To keep us both happy I'll add a function synonym for each FFI with proper types


rio/src/RIO/File.hs, line 151 at r1 (raw file):

Previously, nh2 (Niklas Hambüchen) wrote…

If we really wanted to go for Fd, then c_safe_linkat should probably use it too.

Not quite. Since we use special flag AT_FDCWD in place of Fd in some cases, but in other we use Fd


rio/src/RIO/File.hs, line 208 at r1 (raw file):

Previously, nh2 (Niklas Hambüchen) wrote…

The comment does not align with the name of this function. If it closes the file as the haddocks say, it should be in the name. But I don't see it closing the file, so maybe the comment is outdated. In the first commit in this series, it did close the file. I think it makes sense that it doesn't close the file.

Yes, the comment is old. handle is closed elsewhere


rio/src/RIO/File.hs, line 215 at r1 (raw file):

Previously, nh2 (Niklas Hambüchen) wrote…

This is confusing, as there can be a file called Directory.

How about using /../ instead?

Then the error message will show e.g. path/to/file/../, which makes clear that we're talking about the directory called file (you can even call stat on it and get the expected result).

That is not the filepath. It is fname as in function name. Used for error reporting. Eg.
fsyncDirectoryFd "atomicDurableTempFileCreate" dirFd usage in atomicDurableTempFileCreate function


rio/src/RIO/File.hs, line 228 at r1 (raw file):

Previously, nh2 (Niklas Hambüchen) wrote…

Personally I'm no fan of view patterns in public type signatures.

They make them harder to parse visually, make that now there isn't actually a variable that corresponds to the actual FilePath that was passed into the function (so you can't trace it easily for debugging or, use ghci's debugger to print the variable), and add more fancy Haskell stuff one has to learn to understand the code for little added benefit.

I'd just let-bind it.

That's a pretty subjective opinion :P
In my personal opinion this makes it a lot clearer what is going on. Nice and concise, you do not need to follow extra bindings to see what fp really means. And sure you can trace it:

openFileFromDir dirFd fullPath@(dropDirectoryIfRelative -> fp) iomode = trace fullPath $ 

In any case, it does not affect semantics, so I'll leave it as is.


rio/src/RIO/File.hs, line 228 at r1 (raw file):

Previously, nh2 (Niklas Hambüchen) wrote…

Why do we drop relative parent directories here?

I think that is unnecessary, and it would make sense for the function to support them, just like openat() does.

I would find it very surprising to discover that this function drops part of what I gave it.

We do that to fix: #160 Checkout man openat and what it says about relative directories.


rio/src/RIO/File.hs, line 257 at r1 (raw file):

Previously, nh2 (Niklas Hambüchen) wrote…

I think this function is unnecessary, see comments on its use sites.

You are thinking wrong ;) See my comments at use sites. Also if you don't believe me, try removing it and running the test suite.


rio/src/RIO/File.hs, line 259 at r1 (raw file):

Sure, I renamed the function and added a link. Added arguments haddock too

... in the given directory? Should the function name also end with FromDir, like the function above, for consistency? What about the other comments on the function above, do they apply here too / should they be copied or referred to?

I have no idea. It is not designed to be called on the file Fd, only directory Fd. Glad you brought it up. We can add more types for this.

What happens when an Fd or path pointing to a file, instead of dir, is given?


rio/src/RIO/File.hs, line 262 at r1 (raw file):

Previously, nh2 (Niklas Hambüchen) wrote…

What's the filePath argument good for? Can it be removed?

It is used for giving an arbitrary name to the file descriptor


rio/src/RIO/File.hs, line 280 at r1 (raw file):
Cool. I'll need to do a bit of work on that. Fallback functionality is already available, so it is a matter of catching an exception and retrying it.

EOPNOTSUPP


rio/src/RIO/File.hs, line 319 at r1 (raw file):

Previously, nh2 (Niklas Hambüchen) wrote…

it's -> its

Done.


rio/src/RIO/File.hs, line 323 at r1 (raw file):

Previously, nh2 (Niklas Hambüchen) wrote…

is located sounds like it must already exist. shall end up in?

Well, the file is already in that directory, just without a name. In other words it had to be opened with openAnonymousTempFileFromDir


rio/src/RIO/File.hs, line 390 at r1 (raw file):

Previously, nh2 (Niklas Hambüchen) wrote…

I think both uses of dropDirectoryIfRelative are unnecessary and should be dropped.

They are indeed required


rio/src/RIO/File.hs, line 435 at r1 (raw file):

Previously, nh2 (Niklas Hambüchen) wrote…

It's not clear from this that if a directory Fd is given, filePath is completely ignored.

What do you mean ignored? it is also further passed down into openAnonymousTempFile.

Copy link

@nh2 nh2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 14 files at r1.
Reviewable status: 13 of 14 files reviewed, 35 unresolved discussions (waiting on @lehins and @nh2)


rio/src/RIO/File.hs, line 342 at r1 (raw file):

              throwErrnoIfMinus1Retry
                ("atomicFileCreate - c_safe_linkat - " ++ which) $
              c_safe_linkat at_FDCWD cFromFilePath cDirFd to at_SYMLINK_FOLLOW

This needs a direct reference to the man page, perhaps even a quote, to show that this is the standard way of doing it and not something fancy we came up with.


rio/src/RIO/File.hs, line 347 at r1 (raw file):

          safeLink "anonymous" cToFilePath
        case eExc of
          Right _ -> pure ()

What are we ignoring here? Looks like (). Let's give safeLink a type signature with :: ... (), and then match () here to remove the question.


rio/src/RIO/File.hs, line 367 at r1 (raw file):

      case mDirFd of
        Nothing           -> (at_FDCWD, filePath)
        Just (Fd cDirFd') -> (cDirFd', takeFileName filePath)

Like for dropDirectoryIfRelative, I think that this takeFileName is probably not good.


rio/src/RIO/File.hs, line 407 at r1 (raw file):

-- | Create a temporary file for a matching possibly exiting target file that
-- will be replaced in the future. Temporary file
-- is meant to be renamed afterwards, thus it is only delted upon error.

delted -> deleted

Should also comment exactly on how the created file is called.


rio/src/RIO/File.hs, line 410 at r1 (raw file):

--
-- __Important__: Temporary file is not removed and file handle is not closed if
-- there was no exception thrown by the supplied action.

I'd put that behaviour right into the function name.


rio/src/RIO/File.hs, line 428 at r1 (raw file):

    tmpFileName = "." <> fileName <> ".tmp"

withAnonymousBinaryTempFileFor ::

Haddocks on this one and the following with recommendations on when which one is or should be used.


rio/src/RIO/File.hs, line 446 at r1 (raw file):

    dir = maybe (Right (takeDirectory filePath)) Left mDirFd

withSimpleBinaryTempFileFor ::

Simple doesn't tell me much here, perhaps this should be withImmediatelyDeletedBinaryTempFileFor or withNonAnonymous... to more obviously contrast with the anonymous approach?


rio/src/RIO/File.hs, line 453 at r1 (raw file):

  -- of @open@
  -> FilePath
  -- ^ Source file path. The file may exist or may not.

Source file path doesn't mean anything to me in context of what I'd expect the function to do. Can we clarify this somehow? (Same for function above.)


rio/src/RIO/File.hs, line 522 at r1 (raw file):

  case iomode of
    ReadMode
      -- We do not need to consider an durable operations when we are in a

an durable operations -> durable durable operations


rio/src/RIO/File.hs, line 528 at r1 (raw file):

     ->
      withDirectory (takeDirectory filePath) $ \dirFd ->
        withFileInDirectory dirFd filePath iomode $ \tmpFileHandle -> do

This looks like the place to me where we should takeFileName: Right next to where we do takeDirectory, as opposed to inside the function.


rio/src/RIO/File.hs, line 548 at r1 (raw file):

     ->
      withDirectory (takeDirectory filePath) $ \dirFd ->
        if o_TMPFILE == 0

Comment on what this means (the platform doesn't support O_TMPFILE)


rio/src/RIO/File.hs, line 598 at r1 (raw file):
Let's add here something like

If possible, use one of the other functions in this module that does not perform re-opening of already closed files.


rio/src/RIO/File.hs, line 685 at r1 (raw file):
Let's add here

If you do not need durability but only atomicity, use withBinaryFileAtomic instead, which is faster as it does not perform @fsync()@.


rio/src/RIO/File.hs, line 721 at r1 (raw file):

--
-- It is similar to `withBinaryFileDurableAtomic`, but without the durability
-- part. What it means is that all modification can still disappear after it has

What it means is that -> It means that


rio/src/RIO/File.hs, line 725 at r1 (raw file):
Let's add a Performance considerations section here like in the function above, saying something like:

The same performance caveats for non-truncating writes apply as for withBinaryFileDurableAtomic.

Copy link

@nh2 nh2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 14 files at r1.
Reviewable status: all files reviewed, 37 unresolved discussions (waiting on @lehins and @nh2)


rio/test/RIO/FileSpec.hs, line 69 at r1 (raw file):

      withSystemTempDirectory "rio" $ \dir -> do
        let fp = dir </> fname ++ "-read"
        withHelloFileTestable fp ReadWriteMode (`BS.hGetNonBlocking` BS.length hello) `shouldReturn`

Is this correct? As far as I can tell, hGetNonBlocking might as well return "Hell" if you're unlucky. I think hGet would be correct.


rio/test/RIO/FileSpec.hs, line 121 at r1 (raw file):

    it "exception - Does not corrupt files" $
      bool expectFailure id atomic $ -- should fail for non-atomic
      forAll (elements [WriteMode, ReadWriteMode, AppendMode]) $ \iomode ->

What is the purpose of using a QuickCheck Gen here, why not just run all 3 cases deterministically?

Copy link

@nh2 nh2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 34 unresolved discussions (waiting on @lehins and @nh2)


rio/src/RIO/File.hs, line 215 at r1 (raw file):

Previously, lehins (Alexey Kuleshevich) wrote…

That is not the filepath. It is fname as in function name. Used for error reporting. Eg.
fsyncDirectoryFd "atomicDurableTempFileCreate" dirFd usage in atomicDurableTempFileCreate function

Ah right.


rio/src/RIO/File.hs, line 228 at r1 (raw file):

Previously, lehins (Alexey Kuleshevich) wrote…

We do that to fix: #160 Checkout man openat and what it says about relative directories.

       If the pathname given in pathname is relative, then it is  interpreted
       relative  to  the  directory  referred to by the file descriptor dirfd
       (rather than relative to the current working directory of the  calling
       process, as is done by open() for a relative pathname).

       If  pathname is relative and dirfd is the special value AT_FDCWD, then
       pathname is interpreted relative to the current working  directory  of
       the calling process (like open()).

None of these two seem to forbid to do openat(myDirFd, "path/to/file/from/there").

So what I mean is that this function should support that use case, and that at the top-level place where it's called (I just added a comment in one such location), the split should be done accordingly, like

openat(openDirFd(takeDirName path), takeFileName path)

as opposed to implementing the takeFileName missing up there all the way down here.


rio/src/RIO/File.hs, line 262 at r1 (raw file):

Previously, lehins (Alexey Kuleshevich) wrote…

It is used for giving an arbitrary name to the file descriptor

Could we derive it from the eDir argument (that one's fdName if it's a Left, and takeFileName if it's a Right) instead?

It seems a bit redundant to pass in that extra info to name something that is anonymous, when we already have information about the thing that it's for, and removing an argument would make the function easier to understand.


rio/src/RIO/File.hs, line 323 at r1 (raw file):

Previously, lehins (Alexey Kuleshevich) wrote…

Well, the file is already in that directory, just without a name. In other words it had to be opened with openAnonymousTempFileFromDir

Ah, that makes sense. Let's write it explicitly.

Copy link

@nh2 nh2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 13 of 14 files reviewed, 35 unresolved discussions (waiting on @lehins and @nh2)


rio/src/RIO/File.hs, line 563 at r2 (raw file):

  liftIO $
  withDirectory (takeDirectory filePath) $ \dirFd ->
    withFileInDirectory dirFd filePath ReadMode (const $ return ())

Does this function actually still call fsync() anywhere?

Copy link
Contributor Author

@lehins lehins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 13 of 14 files reviewed, 35 unresolved discussions (waiting on @fsync, @lehins, and @nh2)


rio/cbits/rio.c, line 34 at r1 (raw file):

Previously, nh2 (Niklas Hambüchen) wrote…

Calling into C is not necessary, all of this can be done at compile time with a .hsc file.

It would probably be easier there.

If we want to keep calling into C for some reasons, we should probably add haskell_rio into the function names or so to ensure those symbols are there only once, in case some other C library linked into the final project also decides to call their functions __o_tmpfile.

sounds good. I'll add rio_ to the name, adding haskell into C function names feels weird.


rio/src/RIO/File.hs, line 49 at r1 (raw file):
That's not the reason why though, it is because a copy of the file was made prior to any modification taking place.

This is because the file is atomically linked into the file system at the end.

Moreover I don't want to put "linking" in the public documentation because many will not know what this even means, secondly linking is done only on Linux and only when the file does NOT exist, because with linking you can't atomically replace a file.


rio/src/RIO/File.hs, line 151 at r1 (raw file):

Previously, lehins (Alexey Kuleshevich) wrote…

Not quite. Since we use special flag AT_FDCWD in place of Fd in some cases, but in other we use Fd

I created synonyms for all system calls that improve on type safety. So that should be taken care of


rio/src/RIO/File.hs, line 228 at r1 (raw file):

Previously, nh2 (Niklas Hambüchen) wrote…
       If the pathname given in pathname is relative, then it is  interpreted
       relative  to  the  directory  referred to by the file descriptor dirfd
       (rather than relative to the current working directory of the  calling
       process, as is done by open() for a relative pathname).

       If  pathname is relative and dirfd is the special value AT_FDCWD, then
       pathname is interpreted relative to the current working  directory  of
       the calling process (like open()).

None of these two seem to forbid to do openat(myDirFd, "path/to/file/from/there").

So what I mean is that this function should support that use case, and that at the top-level place where it's called (I just added a comment in one such location), the split should be done accordingly, like

openat(openDirFd(takeDirName path), takeFileName path)

as opposed to implementing the takeFileName missing up there all the way down here.

What you are now recommending is to drop the path for every file, regardless if it's relative or not. That suggestion should work, I think, since we always derive the directory from the file path. I'll change that.

I thought about doing it that way before, but I was worrying a little bit if there is some edge case that I am not thinking of where a file path is absolute and we switch it to relative. But it does make sense, because all files are always opened in the directory, which means this should be ok.

Note that it is different from what you where suggesting to not drop the path and keep the file path in the relative or absolute form, cause using relative path would be (was) a bug. Eg:

openat(openDirFd(takeDirName "foo/bar.txt"), "foo/bar.txt")

while this is ok

openat(openDirFd(takeDirName "/foo/bar.txt"), "/foo/bar.txt")

rio/src/RIO/File.hs, line 257 at r1 (raw file):

Previously, lehins (Alexey Kuleshevich) wrote…

You are thinking wrong ;) See my comments at use sites. Also if you don't believe me, try removing it and running the test suite.

Now that we'll make all files paths to be relative to the directory Fd, that function is unnecessary


rio/src/RIO/File.hs, line 262 at r1 (raw file):

Previously, nh2 (Niklas Hambüchen) wrote…

Could we derive it from the eDir argument (that one's fdName if it's a Left, and takeFileName if it's a Right) instead?

It seems a bit redundant to pass in that extra info to name something that is anonymous, when we already have information about the thing that it's for, and removing an argument would make the function easier to understand.

I got rid of FilePath for the directory. So the argument cannot be dropped, but now instead of an Either DirFd FilePath we have Maybe DirFd, which is certainly an improvement. Thanks.


rio/src/RIO/File.hs, line 446 at r1 (raw file):

Previously, nh2 (Niklas Hambüchen) wrote…

Simple doesn't tell me much here, perhaps this should be withImmediatelyDeletedBinaryTempFileFor or withNonAnonymous... to more obviously contrast with the anonymous approach?

it is not really ImmediatelyDeleted, but immediately closed. Anyways NonAnonymous sounds fine, changed to that


rio/src/RIO/File.hs, line 453 at r1 (raw file):

Previously, nh2 (Niklas Hambüchen) wrote…

Source file path doesn't mean anything to me in context of what I'd expect the function to do. Can we clarify this somehow? (Same for function above.)

Every single exported function in this module supplies a FilePath, that is the one I am talking about. What would you like me to write in this context?


rio/src/RIO/File.hs, line 548 at r1 (raw file):

Previously, nh2 (Niklas Hambüchen) wrote…

Comment on what this means (the platform doesn't support O_TMPFILE)

What's going on in here, the code is outdated?
I changed it to o_TMPFILE == o_TMPFILE_not_supported I Also added a comment describing what that binding means.


rio/src/RIO/File.hs, line 598 at r1 (raw file):

Previously, nh2 (Niklas Hambüchen) wrote…

Let's add here something like

If possible, use one of the other functions in this module that does not perform re-opening of already closed files.

I can add it, but I do not understand what you are trying to say. Is your intention is to say to the user: "This function does not provide the same guarantee as if you would open and modify a file using withBinaryFileDurable or writeBinaryFileDurable, since they ensure that the @fsync()@ is called before the file is closed, so if possible use those instead."


rio/src/RIO/File.hs, line 685 at r1 (raw file):

Previously, nh2 (Niklas Hambüchen) wrote…

Let's add here

If you do not need durability but only atomicity, use withBinaryFileAtomic instead, which is faster as it does not perform @fsync()@.

Done.


rio/src/RIO/File.hs, line 721 at r1 (raw file):

Previously, nh2 (Niklas Hambüchen) wrote…

What it means is that -> It means that

Done.


rio/src/RIO/File.hs, line 725 at r1 (raw file):
I'll rephrase it a bit, since it is not clear which performance caveats. @fsync@ performance or copy file performance

The same performance caveats apply as for withBinaryFileDurableAtomic due
to making a copy of the content of existing files during non-truncating
writes.

Something like that maybe, what do you think?


rio/src/RIO/File.hs, line 563 at r2 (raw file):

Previously, nh2 (Niklas Hambüchen) wrote…

Does this function actually still call fsync() anywhere?

Good catch! Fixed.

Copy link
Contributor Author

@lehins lehins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 11 of 14 files reviewed, 35 unresolved discussions (waiting on @fsync and @nh2)


rio/src/RIO/File.hs, line 280 at r1 (raw file):

Previously, lehins (Alexey Kuleshevich) wrote…

Cool. I'll need to do a bit of work on that. Fallback functionality is already available, so it is a matter of catching an exception and retrying it.

EOPNOTSUPP

Cool, I implemented it by checking if the error is of UnsupportedOperation type, in which case it falls back on a usual named temporary file handling.


rio/src/RIO/File.hs, line 323 at r1 (raw file):

Previously, nh2 (Niklas Hambüchen) wrote…

Ah, that makes sense. Let's write it explicitly.

Done


rio/src/RIO/File.hs, line 331 at r1 (raw file):

Previously, nh2 (Niklas Hambüchen) wrote…

Relative, absolute, and what happens in each case and for either case of what the mDirFd is?

mDirFd will depend on that file path, if supplied, but that invariant is not enforced within this function, since it is internal to this module. It will not matter if it is relative or not, since only the filename will be used whenever we are dealing in a directory context. If mDirFd is nothing and eTmpFile is not a handle, than it is just a usual rename.

I added a bit of comment, but most of that information is a lot easier to understand from just following the code rather than trying to understand my comments


rio/src/RIO/File.hs, line 335 at r1 (raw file):

Previously, nh2 (Niklas Hambüchen) wrote…

a stating

I mean a comment stating

I added a note that it is a Linux only function


rio/src/RIO/File.hs, line 342 at r1 (raw file):

Previously, nh2 (Niklas Hambüchen) wrote…

This needs a direct reference to the man page, perhaps even a quote, to show that this is the standard way of doing it and not something fancy we came up with.

I added a comment to use man for those who do not trust us.


rio/src/RIO/File.hs, line 347 at r1 (raw file):

Previously, nh2 (Niklas Hambüchen) wrote…

What are we ignoring here? Looks like (). Let's give safeLink a type signature with :: ... (), and then match () here to remove the question.

I switched to throwErrnoIfMinus1Retry_, in usage of safeLink, and added pattern match on () so it is easy to see that it is (). Type signature can be inferred easily and as such is unnecessary.


rio/src/RIO/File.hs, line 367 at r1 (raw file):

Previously, nh2 (Niklas Hambüchen) wrote…

Like for dropDirectoryIfRelative, I think that this takeFileName is probably not good.

What do you mean not good? It is required, since it will be a bug otherwise. If I do not call takeFileName here it will only work for absolute file paths and will be broken for relative ones.

Even in case of dropDirectoryIfRelative it was absolutely fine, I only got rid of it cause you did not like it for some reason.


rio/src/RIO/File.hs, line 376 at r1 (raw file):

Previously, nh2 (Niklas Hambüchen) wrote…

It's not clear from the docs that this file must already exist, or the function will fail.

Let's add that to top-level haddocks of the function. Those haddocks should also say in which situations this function is to be used.

Why do you need the docs for something obvious like that? Wouldn't you expect mv foo bar to fail if foo doesn't exists? Well, it is a file renaming function, as such you cannot rename something that wasn't there to begin with. I wouldn't even add such haddock if the function was exported as part of public API.

It shouldn't be used anywhere, where it isn't already used.


rio/src/RIO/File.hs, line 390 at r1 (raw file):

Previously, lehins (Alexey Kuleshevich) wrote…

They are indeed required

I switched to takeFileName in both places.


rio/src/RIO/File.hs, line 407 at r1 (raw file):

Previously, nh2 (Niklas Hambüchen) wrote…

delted -> deleted

Should also comment exactly on how the created file is called.

Fixed spelling error.

The file is called temporary :) But if you are referring to the name of the file, then it is right in the code, few lines below, adding it here is redundant.


rio/src/RIO/File.hs, line 410 at r1 (raw file):

Previously, nh2 (Niklas Hambüchen) wrote…

I'd put that behaviour right into the function name.

Sure, you would. But I wouldn't. I like the way it is named.


rio/src/RIO/File.hs, line 428 at r1 (raw file):

Previously, nh2 (Niklas Hambüchen) wrote…

Haddocks on this one and the following with recommendations on when which one is or should be used.

It's right in the name. One is anonymous, another one is not. Secondly, it shouldn't be used anywhere that it isn't already used. The whole function is literally a one liner and if someone can't understand what it is doing, he/she should not be using it.

All of these functions are internal, so they do not deserve as much haddock.


rio/src/RIO/File.hs, line 522 at r1 (raw file):

Previously, nh2 (Niklas Hambüchen) wrote…

an durable operations -> durable durable operations

you mean just durable operations? if so then fixed.


rio/src/RIO/File.hs, line 528 at r1 (raw file):

Previously, nh2 (Niklas Hambüchen) wrote…

This looks like the place to me where we should takeFileName: Right next to where we do takeDirectory, as opposed to inside the function.

The full path is now used to name the file descriptor, so we don't want to loose the file path here


rio/src/RIO/File.hs, line 548 at r1 (raw file):

Previously, lehins (Alexey Kuleshevich) wrote…

What's going on in here, the code is outdated?
I changed it to o_TMPFILE == o_TMPFILE_not_supported I Also added a comment describing what that binding means.

improved it further, withAnonymousBinaryTempFileFor now returns Nothing when it is not supported.


rio/test/RIO/FileSpec.hs, line 69 at r1 (raw file):

Previously, nh2 (Niklas Hambüchen) wrote…

Is this correct? As far as I can tell, hGetNonBlocking might as well return "Hell" if you're unlucky. I think hGet would be correct.

It was left over from me trying things out, so switched to hGet instead. Thanks


rio/test/RIO/FileSpec.hs, line 121 at r1 (raw file):

Previously, nh2 (Niklas Hambüchen) wrote…

What is the purpose of using a QuickCheck Gen here, why not just run all 3 cases deterministically?

It will run a 100 tests instead of 3, but it's not that important, so I switched it to mapM_.

* Fixed commercialhaskell#160
* Fix race condition where file exists, but possibly removed prior to
  copy
* Fix durability on original content of the file
* Prevent atomic renamed of corrupt files, which could have been
  caused by an exception
* Added atomic file writing without durability
* Make sure there are no temporary dangling files are left around upon
  exception
* Make sure permissions are properly restored on the atomically
  renamed file
* Reorganize the module RIO.Files to improve clarity
* Improve type safety of FFI calls
* For atomic write switch to using O_TMPFILE on Linux, while falling back
  on usual temporary files for OSs and FSs tha do not support that
  feature
* Improve tests for file operations and add more tests for atomic file
  operations
* Move OS specific deps into library.
* Enable and fix all compile time warnings.
* Addition of `readBinaryFile`, `writeBinaryFile` and `withBinaryFileLazy`.
* Blank tests on Windows for atomic+durable writes
Copy link

@nh2 nh2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 3 of 3 files at r3, 4 of 4 files at r4.
Reviewable status: all files reviewed, 10 unresolved discussions (waiting on @fsync and @lehins)


rio/src/RIO/File.hs, line 49 at r1 (raw file):

Previously, lehins (Alexey Kuleshevich) wrote…

That's not the reason why though, it is because a copy of the file was made prior to any modification taking place.

This is because the file is atomically linked into the file system at the end.

Moreover I don't want to put "linking" in the public documentation because many will not know what this even means, secondly linking is done only on Linux and only when the file does NOT exist, because with linking you can't atomically replace a file.

The explanations you added are good, that makes it clear.


rio/src/RIO/File.hs, line 151 at r1 (raw file):

Previously, lehins (Alexey Kuleshevich) wrote…

I created synonyms for all system calls that improve on type safety. So that should be taken care of

Yep, that looks good.


rio/src/RIO/File.hs, line 228 at r1 (raw file):

Previously, lehins (Alexey Kuleshevich) wrote…

What you are now recommending is to drop the path for every file, regardless if it's relative or not. That suggestion should work, I think, since we always derive the directory from the file path. I'll change that.

I thought about doing it that way before, but I was worrying a little bit if there is some edge case that I am not thinking of where a file path is absolute and we switch it to relative. But it does make sense, because all files are always opened in the directory, which means this should be ok.

Note that it is different from what you where suggesting to not drop the path and keep the file path in the relative or absolute form, cause using relative path would be (was) a bug. Eg:

openat(openDirFd(takeDirName "foo/bar.txt"), "foo/bar.txt")

while this is ok

openat(openDirFd(takeDirName "/foo/bar.txt"), "/foo/bar.txt")

Moving from dropDirectoryIfRelative to takeFileName isn't exactly what I meant.

I was arguing that we should support that the given FilePath argument can be a proper relative path (relative to dirFd, and possibly having multiple path components), thus making openFileFromDir a very thin wrapper around openat that supports the same functionality.

For example, I think it would make sense and be intuitive if openFileFromDir could be callled like this:

withDirectory "/home/alexey" $ \homeFd -> do
  handle <- openFileFromDir fd ".aws/credentials" ...

I had in mind that for the situation where you already have a full FilePath like path/to/somedir/somefile or /path/to/somedir/somefile at hand and you "just want to open it with openat", the caller would do the splitting accordingly, providing the corresponding arguments to openFileFromDir. For example withBinaryFileDurablePosix (which calls openFileFromDir via withFileInDirectory) would, instead of the current

      withDirectory (takeDirectory filePath) $ \dirFd ->
        withFileInDirectory dirFd filePath iomode $ \tmpFileHandle -> do
          res <- action tmpFileHandle

use the following:

      withDirectory (takeDirectory filePath) $ \dirFd ->
        withFileInDirectory dirFd (takeFileName filePath) iomode $ \tmpFileHandle -> do
          res <- action tmpFileHandle

Do you see what I mean?

I think this would be better because then openFileFromDir would behave just like openat, thus having familiar behaviour. It would also remove a not-stated invariant from the function (namely dirFd corresponds to takeDirName fileName).

This should be easy to do: All we need is to move the call to takeFileName from here up to the call sites of openFileFromDir (which is currently just uses of withFileInDirectory).

The only thing it'll lack will be the name-of-the-file-path argument to mkHandleFromFD that appears in error messages. I would make that an explicit argument the caller can provide.

Thus:

openFileFromDir :: MonadIO m => DirFd -> FilePath -> FilePath -> IOMode -> m Handle
openFileFromDir dirFd relPathFromDir fullPathForErrors iomode =
  -- implementation stays unchanged (just the variable names get updated)

I can also make a branch to show what I mean if you like.


rio/src/RIO/File.hs, line 280 at r1 (raw file):

Previously, lehins (Alexey Kuleshevich) wrote…

Cool, I implemented it by checking if the error is of UnsupportedOperation type, in which case it falls back on a usual named temporary file handling.

Nice


rio/src/RIO/File.hs, line 331 at r1 (raw file):
I see. I'm happy with that explanation, but still want to to point out:

most of that information is a lot easier to understand from just following the code rather than trying to understand my comments

I still don't buy that: I understand it now, but it's because of your answer to my question.

As you pointed out: There are many cross-function invariants in this 800-lines file which are neither enforced nor stated. It means one cannot judge the correctness of a given function in isolation. Figuring out all invariants from "just following the code" and keeping it all in working memory is hard. It is much easier to understand your comments.


rio/src/RIO/File.hs, line 367 at r1 (raw file):

Previously, lehins (Alexey Kuleshevich) wrote…

What do you mean not good? It is required, since it will be a bug otherwise. If I do not call takeFileName here it will only work for absolute file paths and will be broken for relative ones.

Even in case of dropDirectoryIfRelative it was absolutely fine, I only got rid of it cause you did not like it for some reason.

I think we had a misunderstanding on that one (I've added a longer explanation in the ohter place further up at openFileFromDir).

With "not good" meant, like over there: "This function should support files in relative subdirectories or ../ style paths, like openat() does; there is no reason not to support those."


rio/src/RIO/File.hs, line 376 at r1 (raw file):

Previously, lehins (Alexey Kuleshevich) wrote…

Why do you need the docs for something obvious like that? Wouldn't you expect mv foo bar to fail if foo doesn't exists? Well, it is a file renaming function, as such you cannot rename something that wasn't there to begin with. I wouldn't even add such haddock if the function was exported as part of public API.

It shouldn't be used anywhere, where it isn't already used.

I think you're right, I must have confused myself somehow here.


rio/src/RIO/File.hs, line 390 at r1 (raw file):

Previously, lehins (Alexey Kuleshevich) wrote…

I switched to takeFileName in both places.

I meant this in relation to my comment on openFileFromDir: Like openat(), renameat(..., oldpath, ... newpath) supports giving full relative file paths as oldpath and newpath, not only file names.

So we could remove the invariant Whenever DirFd is supplied, it must be the containgin directory fo this file, and instead simply make filePath relative from dirFd.


rio/src/RIO/File.hs, line 407 at r1 (raw file):

Previously, lehins (Alexey Kuleshevich) wrote…

Fixed spelling error.

The file is called temporary :) But if you are referring to the name of the file, then it is right in the code, few lines below, adding it here is redundant.

What I mean is that the user-facing documentation (e.g. module or exported-function haddocks) should explain how those temporary files are named.

Here is an example of how this information is important, from a production server breakage system that occurred to me 3 days ago:

I had an rsync loop to backup files from one server to another. Software that wrote into the directory was upgraded to do atomic renames, like we add here. Because of the way rsync works (first obtaining file names, then doing the copy), it is possible that it sees a .xxx.tmp file that's gone when it wants to copy it, and thus rsync errored out with file vanished errors, and I got paged. The correct solution here is to not backup temporary files, using the --exclude flag. For that you have to know how the temporary files are named.


rio/src/RIO/File.hs, line 435 at r1 (raw file):

Previously, lehins (Alexey Kuleshevich) wrote…

What do you mean ignored? it is also further passed down into openAnonymousTempFile.

It was passed into the function, but that one didn't do any filesytem-y things with it if a directory Fd was given; it used it only in the fdName used for exceptions, as far as I can tell.

I had just pointed it out because I reviewed it (as you noticed) from the perspective of pointing out all unstated cross-function invariants (as my opinion is that there should be none).


rio/src/RIO/File.hs, line 453 at r1 (raw file):

Previously, lehins (Alexey Kuleshevich) wrote…

Every single exported function in this module supplies a FilePath, that is the one I am talking about. What would you like me to write in this context?

I am not sure what the best name for it should be. But for example in one callee, withBinaryTempFileFor, this becomes a target file instead of a source file.

Doesn't source suggest that we read from that file somehow? As far as I can tell, we don't read from it.


rio/src/RIO/File.hs, line 522 at r1 (raw file):

Previously, lehins (Alexey Kuleshevich) wrote…

you mean just durable operations? if so then fixed.

Ah yes :D


rio/src/RIO/File.hs, line 548 at r1 (raw file):

Previously, lehins (Alexey Kuleshevich) wrote…

improved it further, withAnonymousBinaryTempFileFor now returns Nothing when it is not supported.

That's great


rio/src/RIO/File.hs, line 598 at r1 (raw file):

Previously, lehins (Alexey Kuleshevich) wrote…

I can add it, but I do not understand what you are trying to say. Is your intention is to say to the user: "This function does not provide the same guarantee as if you would open and modify a file using withBinaryFileDurable or writeBinaryFileDurable, since they ensure that the @fsync()@ is called before the file is closed, so if possible use those instead."

Yes


rio/src/RIO/File.hs, line 725 at r1 (raw file):

Previously, lehins (Alexey Kuleshevich) wrote…

I'll rephrase it a bit, since it is not clear which performance caveats. @fsync@ performance or copy file performance

The same performance caveats apply as for withBinaryFileDurableAtomic due
to making a copy of the content of existing files during non-truncating
writes.

Something like that maybe, what do you think?

That's good


rio/src/RIO/File.hs, line 178 at r3 (raw file):

c_renameat :: DirFd -> CFilePath -> DirFd -> CFilePath -> IO CInt
c_renameat (DirFd (Fd fdFrom)) cFpFrom (DirFd (Fd fdTo)) cFdTo =
  c_safe_renameat fdFrom cFpFrom fdTo cFdTo

Is cFdTo a typo of cFpTo (it's a CFilePath, not an Fd)?


rio/src/RIO/File.hs, line 304 at r3 (raw file):
than -> then

There seems to be some word missing in the sentence, I can't quite parse it:

If the file descriptor for the directory the target file is/will be located in

Maybe something like "if given"?


rio/test/RIO/FileSpec.hs, line 121 at r1 (raw file):

Previously, lehins (Alexey Kuleshevich) wrote…

It will run a 100 tests instead of 3, but it's not that important, so I switched it to mapM_.

There's another instance of this just above

@lehins
Copy link
Contributor Author

lehins commented Jul 9, 2019


rio/src/RIO/File.hs, line 228 at r1 (raw file):
I see what you mean. I honestly don't like that suggestion.

IMHO the type signature and functionality of this function is strictly worse:

openFileFromDir :: MonadIO m => DirFd -> FilePath -> FilePath -> IOMode -> m Handle
openFileFromDir dirFd relPathFromDir fullPathForErrors iomode =

than this:

openFileFromDir :: MonadIO m => DirFd -> FilePath -> IOMode -> m Handle
openFileFromDir dirFd filePath@(takeFileName -> fileName) iomode

With the former it is way to easy to mix up the arguments and get into unpredictable behavior. There is still an invariant that the file must exist in the folder that DirFd is pointing to, thus we are not gaining anything there. It is much easier to describe that invariant in the documentation of openFileFromDir

One more important point is that it allows for anybody to think that this is correct, which it is not, because in that case the full path will be used and the dirFd will simply be ignored:

withDirectory "/home/alexey/.aws" $ \dirFd ->
        withFileInDirectory dirFd "/home/alexey/.aws/credentials" "/home/alexey/.aws/credentials"

See man openat:

If pathname is absolute, then dirfd is ignored.

Moreover, I think looking at the call site it is pretty clear that we are dealing with the same file:

      withDirectory (takeDirectory filePath) $ \dirFd ->
        withFileInDirectory dirFd filePath

One more point is, which I am not quite sure about, but I think I'am correct in my assumption is that doing something like that:

withDirectory "/home/alexey" $ \dirFd ->
        withFileInDirectory dirFd ".aws/credentials" "/home/alexey/.aws/credentials"

would mess up durable writes, since than it would fsync the dirFd(/home/alexey), but not the /home/lehins/.aws, which is the containing folder for the credentials file.

Copy link

@nh2 nh2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 9 unresolved discussions (waiting on @fsync and @lehins)


rio/src/RIO/File.hs, line 228 at r1 (raw file):

in that case the full path will be used and the dirFd will simply be ignored

To me it being like openat() would be the main benefit; it would inherit it's behaviour, but you'd have to read only 1 man page and knowledge from it would directly translate to the Haskell functions without surprises.

I would also find it good if we could export such openat wrapper then.

But we can do that once we actually want to give that openat wrapper to somebody.
I'm OK with the current approach; let's keep it as is until we want to export such functionality.

would mess up durable writes

Yes, ensuring that the right dir is fsynced would remain the task of the caller.

Copy link

@nh2 nh2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 7 unresolved discussions (waiting on @fsync and @lehins)


rio/src/RIO/File.hs, line 367 at r1 (raw file):

Previously, nh2 (Niklas Hambüchen) wrote…

I think we had a misunderstanding on that one (I've added a longer explanation in the ohter place further up at openFileFromDir).

With "not good" meant, like over there: "This function should support files in relative subdirectories or ../ style paths, like openat() does; there is no reason not to support those."

Clarified in the other discussion.


rio/src/RIO/File.hs, line 390 at r1 (raw file):

Previously, nh2 (Niklas Hambüchen) wrote…

I meant this in relation to my comment on openFileFromDir: Like openat(), renameat(..., oldpath, ... newpath) supports giving full relative file paths as oldpath and newpath, not only file names.

So we could remove the invariant Whenever DirFd is supplied, it must be the containgin directory fo this file, and instead simply make filePath relative from dirFd.

Clarified in the other discussion.

@lehins
Copy link
Contributor Author

lehins commented Jul 9, 2019


rio/src/RIO/File.hs, line 331 at r1 (raw file):

Previously, nh2 (Niklas Hambüchen) wrote…

I see. I'm happy with that explanation, but still want to to point out:

most of that information is a lot easier to understand from just following the code rather than trying to understand my comments

I still don't buy that: I understand it now, but it's because of your answer to my question.

As you pointed out: There are many cross-function invariants in this 800-lines file which are neither enforced nor stated. It means one cannot judge the correctness of a given function in isolation. Figuring out all invariants from "just following the code" and keeping it all in working memory is hard. It is much easier to understand your comments.

It has 440 lines of code, it is 800 lines file only because of comments. :P
I'll tell you more, I just read what I said above 2 moths ago and it didn't really make sense to me. I then looked at where filePath was used and it was clear to me how it is used.

@lehins
Copy link
Contributor Author

lehins commented Jul 9, 2019


rio/src/RIO/File.hs, line 178 at r3 (raw file):

Previously, nh2 (Niklas Hambüchen) wrote…

Is cFdTo a typo of cFpTo (it's a CFilePath, not an Fd)?

Yes, it was a typo. Thanks.

@lehins
Copy link
Contributor Author

lehins commented Jul 9, 2019


rio/src/RIO/File.hs, line 304 at r3 (raw file):

Previously, nh2 (Niklas Hambüchen) wrote…

than -> then

There seems to be some word missing in the sentence, I can't quite parse it:

If the file descriptor for the directory the target file is/will be located in

Maybe something like "if given"?

I just tried to read that piece of documentation myself and it didn't make any sense to me either. :D
All I had to do was look at the code and I saw what I was trying to say:

     -- ^ If a file descriptor is given for the directory where the target file is/will be
     -- located in, then it will be used for opening an anonymous file. Otherwise
     -- anonymous will be opened unattached to any file path.

See, it is a lot easier to make human language ambiguous, at least for me ;)

Fixed.

@lehins
Copy link
Contributor Author

lehins commented Jul 9, 2019


rio/src/RIO/File.hs, line 407 at r1 (raw file):

Previously, nh2 (Niklas Hambüchen) wrote…

What I mean is that the user-facing documentation (e.g. module or exported-function haddocks) should explain how those temporary files are named.

Here is an example of how this information is important, from a production server breakage system that occurred to me 3 days ago:

I had an rsync loop to backup files from one server to another. Software that wrote into the directory was upgraded to do atomic renames, like we add here. Because of the way rsync works (first obtaining file names, then doing the copy), it is possible that it sees a .xxx.tmp file that's gone when it wants to copy it, and thus rsync errored out with file vanished errors, and I got paged. The correct solution here is to not backup temporary files, using the --exclude flag. For that you have to know how the temporary files are named.

Totally agree with you, if that function was exported, except it is not. So comment here would really be useless for the user and for the developer it is easier to see it in the code, rather than read comments.

At the same time, there is no mention of this temporary file naming pattern anywhere in the functions that are exported, I'll add the naming pattern to the relevant places, which will be included in generated haddock.

@lehins
Copy link
Contributor Author

lehins commented Jul 9, 2019


rio/src/RIO/File.hs, line 453 at r1 (raw file):

Previously, nh2 (Niklas Hambüchen) wrote…

I am not sure what the best name for it should be. But for example in one callee, withBinaryTempFileFor, this becomes a target file instead of a source file.

Doesn't source suggest that we read from that file somehow? As far as I can tell, we don't read from it.

I'll simply remove the comment for this argument. You can't name it, I don't want to name it. This function is not exported, so I really don't care.

@lehins
Copy link
Contributor Author

lehins commented Jul 9, 2019


rio/src/RIO/File.hs, line 598 at r1 (raw file):

Previously, nh2 (Niklas Hambüchen) wrote…

Yes

Done.

@lehins
Copy link
Contributor Author

lehins commented Jul 9, 2019


rio/test/RIO/FileSpec.hs, line 121 at r1 (raw file):

Previously, nh2 (Niklas Hambüchen) wrote…

There's another instance of this just above

Force of habit relying on generators. Fixed.

lehins added a commit to lehins/unliftio that referenced this pull request Jul 10, 2019
* Durability of file writes with the help of `fsync()` syscall on
  the file and containing folder
* Atomicity of file writes with the help of atomic renames with
  `rename()`, `renameat()` and `linkat()` syscalls, depending on how
  it was opened
* Anonymous temp file utilization on Linux for atomic writes with the
  help of `openat()` and `O_TMPFILE`
* Instead of merging this functionality into commercialhaskell/rio#167
  it has been moved upstream into `unliftio`
lehins added a commit to lehins/unliftio that referenced this pull request Jul 10, 2019
* Durability of file writes with the help of `fsync()` syscall on
  the file and containing folder
* Atomicity of file writes with the help of atomic renames with
  `rename()`, `renameat()` and `linkat()` syscalls, depending on how
  it was opened
* Anonymous temp file utilization on Linux for atomic writes with the
  help of `openat()` and `O_TMPFILE`
* Instead of merging this functionality into commercialhaskell/rio#167
  it has been moved upstream into `unliftio`
@lehins
Copy link
Contributor Author

lehins commented Jul 10, 2019

This should be closed in favor of fpco/unliftio#48 Once it is merged in unliftio I'll create a separate PR with simple re-exports.

Copy link

@nh2 nh2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 2 of 2 files at r5.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved

lehins added a commit to lehins/unliftio that referenced this pull request Jul 11, 2019
* Durability of file writes with the help of `fsync()` syscall on
  the file and containing folder
* Atomicity of file writes with the help of atomic renames with
  `rename()`, `renameat()` and `linkat()` syscalls, depending on how
  it was opened
* Anonymous temp file utilization on Linux for atomic writes with the
  help of `openat()` and `O_TMPFILE`
* Instead of merging this functionality into commercialhaskell/rio#167
  it has been moved upstream into `unliftio`
* Dropped support for ghc-7.8
lehins added a commit to lehins/rio that referenced this pull request Jul 15, 2019
* Fixed version of the functions where moved upstream into `unliftio`
  and are now re-exported from `rio`. See commercialhaskell#167 for more info and the
  actual PR review.
* Bump up the version, update changelog and pin newer version of
  `typed-process` in stack.yaml with the hash
@lehins
Copy link
Contributor Author

lehins commented Jul 15, 2019

Modified version of the code from this PR was merged into unliftio: fpco/unliftio#48
Closing in favor of #196

@lehins lehins closed this Jul 15, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants