fix: sync parent dir to ensure data is reliably stored#4774
fix: sync parent dir to ensure data is reliably stored#4774milosgajdos merged 2 commits intodistribution:mainfrom
Conversation
|
ping @thaJeztah |
c9bf64f to
beb96ac
Compare
milosgajdos
left a comment
There was a problem hiding this comment.
Lets make this code more idiomatic
|
ping @thaJeztah can you review this pr ? thanks |
thaJeztah
left a comment
There was a problem hiding this comment.
Left some notes; as this is probably in a hot path, I also had a look at the existing code, and it looks like the writer was doing a flush (and sync) twice; once on Commit (before the "move") and once on Close (after the "move");
distribution/registry/storage/driver/filesystem/driver.go
Lines 401 to 403 in f2c2cf3
distribution/registry/storage/driver/filesystem/driver.go
Lines 435 to 437 in f2c2cf3
As the "move" is a rename, we shouldn't need to flush twice.
| if err := syncDir(filepath.Dir(d.fullPath(subPath))); err != nil { | ||
| return err | ||
| } | ||
| return nil |
There was a problem hiding this comment.
To keep some linters at bay;
| if err := syncDir(filepath.Dir(d.fullPath(subPath))); err != nil { | |
| return err | |
| } | |
| return nil | |
| return syncDir(filepath.Dir(d.fullPath(subPath))) |
| func syncDir(dir string) (err error) { | ||
| dirF, err := os.Open(dir) | ||
| if err != nil { | ||
| return fmt.Errorf("failed to open dir %s: %w", dir, err) | ||
| } | ||
| defer func() { | ||
| if cerr := dirF.Close(); cerr != nil { | ||
| err = errors.Join(err, fmt.Errorf("failed to close dir %s: %w", dir, cerr)) | ||
| } | ||
| }() | ||
| if err = dirF.Sync(); err != nil { | ||
| err = fmt.Errorf("failed to sync dir %s: %w", dir, err) | ||
| } | ||
| return err | ||
| } |
There was a problem hiding this comment.
For the os.Open, it should not be needed to decorate the error with the path; stdlib os package returns a PathError, which is already decorated with the path and the operation; for example; https://go.dev/play/p/XvyK34-YZsU
package main
import (
"fmt"
"os"
)
func main() {
_, err := os.Open("/no/such/dir")
fmt.Println(err)
}Will print:
open /no/such/dir: no such file or directory
We could still decorate it with the operation we are performing; also wondering if we should ignore "not found" errors here;
dirF, err := os.Open(dir)
if err != nil {
if errors.Is(err, os.ErrNotExist) {
return nil
}
return fmt.Errorf("fsync: %w", dir, err)
}It's a small function so not too risky, but generally I prefer using a named output variable for defers that don't use the "common" one (err), as it's easy to get confused, or to be checking the wrong var; a common name for such output var is retErr, which makes it stand out in code "this is special";
func syncDir(dir string) (retErr error) {
dirF, err := os.Open(dir)
if err != nil {
if errors.Is(err, os.ErrNotExist) {
return nil
}
return fmt.Errorf("sync dir: %w", err)
}
defer func() {
if err := dirF.Close(); err != nil {
retErr = errors.Join(retErr, fmt.Errorf("failed to close dir: %w", err))
}
}()
if err := dirF.Sync(); err != nil {
return fmt.Errorf("sync dir: %w", err)
}
return nil
}There was a problem hiding this comment.
Looks like the current code Flushes and Syncs twice - once on Commit (below), and once on Close; we should avoid that; also mark the writer closed on Cancel;
func (fw *fileWriter) Close() (retErr error) {
if fw.closed {
return fmt.Errorf("already closed")
}
fw.closed = true
defer func() {
if err := fw.file.Close(); err != nil {
retErr = errors.Join(retErr, err)
}
}()
if fw.committed {
// Already flushed and synced
return nil
}
if err := fw.bw.Flush(); err != nil {
return err
}
return fw.file.Sync()
}
func (fw *fileWriter) Cancel(ctx context.Context) error {
if fw.closed {
return fmt.Errorf("already closed")
}
fw.cancelled = true
fw.closed = true
_ = fw.file.Close()
return os.Remove(fw.file.Name())
}|
Had the changes locally; so pushed it as a PR against your branch in case useful; If you merge that PR, the commit should show up here (but we can squash the commits later) |
|
done @thaJeztah thanks |
| if cerr := dirF.Close(); cerr != nil { | ||
| err = errors.Join(err, fmt.Errorf("failed to close dir %s: %w", dir, cerr)) | ||
| } |
There was a problem hiding this comment.
FWIW; I think generally it'd be OK to ignore these errors; situations where this produces an error would be either if difF is nil (already handled above by cashing the os.Open error) ; https://github.com/golang/go/blob/456d0fe4092cb794a02027e178486bc31f05a8e0/src/os/file_posix.go#L19-L24
Other cases would be the return f.file.close() - which largely boils down to an error if f.file is nil, or if the file is already being closed (ErrFileClosing ("use of closed file") or ErrNetClosing ("use of closed network connection"). I think theoretically it could get a EINTR from the syscall closing it.
|
Changes LGTM, but waiting for @milosgajdos to do another review because the last changes were authored by me. If it LGTY @milosgajdos, we can do a rebase and squash to get rid of the extra commits. |
There was a problem hiding this comment.
Pull request overview
This PR improves durability for the filesystem storage driver by ensuring directory entries are flushed to disk after atomically moving newly-written content into place (addressing the fsync(2) caveat that syncing the file alone doesn’t guarantee the directory entry is persisted).
Changes:
- After
PutContentperforms the final atomic rename, it fsyncs the parent directory to persist the directory entry. - Introduces a
syncDirhelper to open andSync()a directory FD and to join close errors. - Refines
fileWriter.Close()/Cancel()behavior (avoid redundant flush+sync on already-committed writers; ensure close errors can be returned).
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| dErr := d.Delete(ctx, tempPath) | ||
| return errors.Join(err, dErr) | ||
| } | ||
| return syncDir(filepath.Dir(d.fullPath(subPath))) | ||
| } |
There was a problem hiding this comment.
PutContent returns after syncDir(...), but any error from the deferred writer.Close() is still ignored. Since Close() may surface underlying I/O errors, consider capturing and joining any Close error with the returned error (e.g., via a named return error and a defer that merges errors) so durability-related failures aren’t silently dropped.
| dErr := d.Delete(ctx, tempPath) | ||
| return errors.Join(err, dErr) | ||
| } | ||
| return syncDir(filepath.Dir(d.fullPath(subPath))) |
There was a problem hiding this comment.
PutContent now calls syncDir(filepath.Dir(d.fullPath(subPath))), but fullPath and the rest of this driver use the path package (path.Join, path.Dir) rather than filepath. Mixing path and filepath can produce inconsistent behavior (especially if RootDirectory contains OS-specific separators), and it introduces an extra import just for this call. Prefer using path.Dir(...) here for consistency, or switch fullPath to filepath.Join and use filepath consistently throughout the driver.
There was a problem hiding this comment.
Yeah, existing uses of path should be replaced with filepath - path is intended for forward-slash separated URLs, not for file paths. (Hence the name).
From the path package documentation (emphasis mine);
The path package should only be used for paths separated by forward slashes, such as the paths in URLs. This package does not deal with Windows paths with drive letters or backslashes; to manipulate operating system paths, use the path/filepath package.
|
For #4774 (comment)
We could probably change; diff isn't too large if we do; diff --git a/registry/storage/driver/filesystem/driver.go b/registry/storage/driver/filesystem/driver.go
index 93b3c1664..1b72f32b9 100644
--- a/registry/storage/driver/filesystem/driver.go
+++ b/registry/storage/driver/filesystem/driver.go
@@ -9,7 +9,6 @@ import (
"io"
"net/http"
"os"
- "path"
"path/filepath"
"time"
@@ -213,7 +212,7 @@ func (d *driver) Reader(ctx context.Context, path string, offset int64) (io.Read
func (d *driver) Writer(ctx context.Context, subPath string, append bool) (storagedriver.FileWriter, error) {
fullPath := d.fullPath(subPath)
- parentDir := path.Dir(fullPath)
+ parentDir := filepath.Dir(fullPath)
if err := os.MkdirAll(parentDir, 0o777); err != nil {
return nil, err
}
@@ -285,7 +284,7 @@ func (d *driver) List(ctx context.Context, subPath string) ([]string, error) {
keys := make([]string, 0, len(fileNames))
for _, fileName := range fileNames {
- keys = append(keys, path.Join(subPath, fileName))
+ keys = append(keys, filepath.Join(subPath, fileName))
}
return keys, nil
@@ -301,7 +300,7 @@ func (d *driver) Move(ctx context.Context, sourcePath string, destPath string) e
return storagedriver.PathNotFoundError{Path: sourcePath}
}
- if err := os.MkdirAll(path.Dir(dest), 0o777); err != nil {
+ if err := os.MkdirAll(filepath.Dir(dest), 0o777); err != nil {
return err
}
@@ -337,7 +336,7 @@ func (d *driver) Walk(ctx context.Context, path string, f storagedriver.WalkFn,
// fullPath returns the absolute path of a key within the Driver's storage.
func (d *driver) fullPath(subPath string) string {
- return path.Join(d.rootDirectory, subPath)
+ return filepath.Join(d.rootDirectory, subPath)
}
type fileInfo struct { |
|
done thanks @thaJeztah |
|
Changes LGTM after squashing, but if you could give it a second pair of eyes (as last changes were suggested / partially written by me) @milosgajdos (Happy to do the rebase / squash if it looks good to you) |
milosgajdos
left a comment
There was a problem hiding this comment.
LGTM, can you squash @ningmingxiao
Signed-off-by: ningmingxiao <ning.mingxiao@zte.com.cn>
Co-authored-by: Sebastiaan van Stijn <github@gone.nl> Signed-off-by: ningmingxiao <ning.mingxiao@zte.com.cn> Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
|
@milosgajdos I did a quick rebase and squash (swapped the path -> filepath commit to go first) |
According to https://man7.org/linux/man-pages/man2/fsync.2.html, we need to sync parent dir to ensure data is stored.
fix: #1960 I guess it happended when unexpected poweroff.