Skip to content

fix: sync parent dir to ensure data is reliably stored#4774

Merged
milosgajdos merged 2 commits intodistribution:mainfrom
ningmingxiao:dev01
Feb 12, 2026
Merged

fix: sync parent dir to ensure data is reliably stored#4774
milosgajdos merged 2 commits intodistribution:mainfrom
ningmingxiao:dev01

Conversation

@ningmingxiao
Copy link
Copy Markdown
Contributor

@ningmingxiao ningmingxiao commented Jan 30, 2026

According to https://man7.org/linux/man-pages/man2/fsync.2.html, we need to sync parent dir to ensure data is stored.

   Calling fsync() does not necessarily ensure that the entry in the
   directory containing the file has also reached disk.  For that an
   explicit fsync() on a file descriptor for the directory is also
   needed.

fix: #1960 I guess it happended when unexpected poweroff.

@ningmingxiao
Copy link
Copy Markdown
Contributor Author

ningmingxiao commented Jan 30, 2026

PTAL @thaJeztah @wy65701436 @milosgajdos

@ningmingxiao
Copy link
Copy Markdown
Contributor Author

ping @thaJeztah

@ningmingxiao ningmingxiao force-pushed the dev01 branch 2 times, most recently from c9bf64f to beb96ac Compare February 4, 2026 03:11
Copy link
Copy Markdown
Member

@milosgajdos milosgajdos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets make this code more idiomatic

@ningmingxiao
Copy link
Copy Markdown
Contributor Author

ping @thaJeztah can you review this pr ? thanks

Copy link
Copy Markdown
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some notes; as this is probably in a hot path, I also had a look at the existing code, and it looks like the writer was doing a flush (and sync) twice; once on Commit (before the "move") and once on Close (after the "move");

if err := fw.file.Sync(); err != nil {
return err
}

if err := fw.file.Sync(); err != nil {
return err
}

As the "move" is a rename, we shouldn't need to flush twice.

Comment on lines 168 to 171
if err := syncDir(filepath.Dir(d.fullPath(subPath))); err != nil {
return err
}
return nil
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To keep some linters at bay;

Suggested change
if err := syncDir(filepath.Dir(d.fullPath(subPath))); err != nil {
return err
}
return nil
return syncDir(filepath.Dir(d.fullPath(subPath)))

Comment on lines +174 to +188
func syncDir(dir string) (err error) {
dirF, err := os.Open(dir)
if err != nil {
return fmt.Errorf("failed to open dir %s: %w", dir, err)
}
defer func() {
if cerr := dirF.Close(); cerr != nil {
err = errors.Join(err, fmt.Errorf("failed to close dir %s: %w", dir, cerr))
}
}()
if err = dirF.Sync(); err != nil {
err = fmt.Errorf("failed to sync dir %s: %w", dir, err)
}
return err
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the os.Open, it should not be needed to decorate the error with the path; stdlib os package returns a PathError, which is already decorated with the path and the operation; for example; https://go.dev/play/p/XvyK34-YZsU

package main

import (
	"fmt"
	"os"
)

func main() {
	_, err := os.Open("/no/such/dir")
	fmt.Println(err)
}

Will print:

open /no/such/dir: no such file or directory

We could still decorate it with the operation we are performing; also wondering if we should ignore "not found" errors here;

	dirF, err := os.Open(dir)
	if err != nil {
		if errors.Is(err, os.ErrNotExist) {
			return nil
		}
		return fmt.Errorf("fsync: %w", dir, err)
	}

It's a small function so not too risky, but generally I prefer using a named output variable for defers that don't use the "common" one (err), as it's easy to get confused, or to be checking the wrong var; a common name for such output var is retErr, which makes it stand out in code "this is special";

func syncDir(dir string) (retErr error) {
	dirF, err := os.Open(dir)
	if err != nil {
		if errors.Is(err, os.ErrNotExist) {
			return nil
		}
		return fmt.Errorf("sync dir: %w", err)
	}
	defer func() {
		if err := dirF.Close(); err != nil {
			retErr = errors.Join(retErr, fmt.Errorf("failed to close dir: %w", err))
		}
	}()
	if err := dirF.Sync(); err != nil {
		return fmt.Errorf("sync dir: %w", err)
	}
	return nil
}

Comment on lines 411 to 439
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the current code Flushes and Syncs twice - once on Commit (below), and once on Close; we should avoid that; also mark the writer closed on Cancel;

func (fw *fileWriter) Close() (retErr error) {
	if fw.closed {
		return fmt.Errorf("already closed")
	}
	fw.closed = true
	defer func() {
		if err := fw.file.Close(); err != nil {
			retErr = errors.Join(retErr, err)
		}
	}()

	if fw.committed {
		// Already flushed and synced
		return nil
	}

	if err := fw.bw.Flush(); err != nil {
		return err
	}

	return fw.file.Sync()
}

func (fw *fileWriter) Cancel(ctx context.Context) error {
	if fw.closed {
		return fmt.Errorf("already closed")
	}

	fw.cancelled = true
	fw.closed = true
	_ = fw.file.Close()
	return os.Remove(fw.file.Name())
}

@thaJeztah
Copy link
Copy Markdown
Member

Had the changes locally; so pushed it as a PR against your branch in case useful;

If you merge that PR, the commit should show up here (but we can squash the commits later)

@ningmingxiao
Copy link
Copy Markdown
Contributor Author

done @thaJeztah thanks

Comment on lines +180 to +182
if cerr := dirF.Close(); cerr != nil {
err = errors.Join(err, fmt.Errorf("failed to close dir %s: %w", dir, cerr))
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW; I think generally it'd be OK to ignore these errors; situations where this produces an error would be either if difF is nil (already handled above by cashing the os.Open error) ; https://github.com/golang/go/blob/456d0fe4092cb794a02027e178486bc31f05a8e0/src/os/file_posix.go#L19-L24

Other cases would be the return f.file.close() - which largely boils down to an error if f.file is nil, or if the file is already being closed (ErrFileClosing ("use of closed file") or ErrNetClosing ("use of closed network connection"). I think theoretically it could get a EINTR from the syscall closing it.

@thaJeztah
Copy link
Copy Markdown
Member

Changes LGTM, but waiting for @milosgajdos to do another review because the last changes were authored by me.

If it LGTY @milosgajdos, we can do a rebase and squash to get rid of the extra commits.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves durability for the filesystem storage driver by ensuring directory entries are flushed to disk after atomically moving newly-written content into place (addressing the fsync(2) caveat that syncing the file alone doesn’t guarantee the directory entry is persisted).

Changes:

  • After PutContent performs the final atomic rename, it fsyncs the parent directory to persist the directory entry.
  • Introduces a syncDir helper to open and Sync() a directory FD and to join close errors.
  • Refines fileWriter.Close()/Cancel() behavior (avoid redundant flush+sync on already-committed writers; ensure close errors can be returned).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 165 to +169
dErr := d.Delete(ctx, tempPath)
return errors.Join(err, dErr)
}
return syncDir(filepath.Dir(d.fullPath(subPath)))
}
Copy link

Copilot AI Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PutContent returns after syncDir(...), but any error from the deferred writer.Close() is still ignored. Since Close() may surface underlying I/O errors, consider capturing and joining any Close error with the returned error (e.g., via a named return error and a defer that merges errors) so durability-related failures aren’t silently dropped.

Copilot uses AI. Check for mistakes.
dErr := d.Delete(ctx, tempPath)
return errors.Join(err, dErr)
}
return syncDir(filepath.Dir(d.fullPath(subPath)))
Copy link

Copilot AI Feb 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PutContent now calls syncDir(filepath.Dir(d.fullPath(subPath))), but fullPath and the rest of this driver use the path package (path.Join, path.Dir) rather than filepath. Mixing path and filepath can produce inconsistent behavior (especially if RootDirectory contains OS-specific separators), and it introduces an extra import just for this call. Prefer using path.Dir(...) here for consistency, or switch fullPath to filepath.Join and use filepath consistently throughout the driver.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, existing uses of path should be replaced with filepath - path is intended for forward-slash separated URLs, not for file paths. (Hence the name).

From the path package documentation (emphasis mine);

The path package should only be used for paths separated by forward slashes, such as the paths in URLs. This package does not deal with Windows paths with drive letters or backslashes; to manipulate operating system paths, use the path/filepath package.

@thaJeztah
Copy link
Copy Markdown
Member

For #4774 (comment)

Yeah, existing uses of path should be replaced with filepath - path is intended for forward-slash separated URLs, not for file paths. (Hence the name).

From the path package documentation (emphasis mine);

The path package should only be used for paths separated by forward slashes, such as the paths in URLs. This package does not deal with Windows paths with drive letters or backslashes; to manipulate operating system paths, use the path/filepath package.

We could probably change; diff isn't too large if we do;

diff --git a/registry/storage/driver/filesystem/driver.go b/registry/storage/driver/filesystem/driver.go
index 93b3c1664..1b72f32b9 100644
--- a/registry/storage/driver/filesystem/driver.go
+++ b/registry/storage/driver/filesystem/driver.go
@@ -9,7 +9,6 @@ import (
 	"io"
 	"net/http"
 	"os"
-	"path"
 	"path/filepath"
 	"time"
 
@@ -213,7 +212,7 @@ func (d *driver) Reader(ctx context.Context, path string, offset int64) (io.Read
 
 func (d *driver) Writer(ctx context.Context, subPath string, append bool) (storagedriver.FileWriter, error) {
 	fullPath := d.fullPath(subPath)
-	parentDir := path.Dir(fullPath)
+	parentDir := filepath.Dir(fullPath)
 	if err := os.MkdirAll(parentDir, 0o777); err != nil {
 		return nil, err
 	}
@@ -285,7 +284,7 @@ func (d *driver) List(ctx context.Context, subPath string) ([]string, error) {
 
 	keys := make([]string, 0, len(fileNames))
 	for _, fileName := range fileNames {
-		keys = append(keys, path.Join(subPath, fileName))
+		keys = append(keys, filepath.Join(subPath, fileName))
 	}
 
 	return keys, nil
@@ -301,7 +300,7 @@ func (d *driver) Move(ctx context.Context, sourcePath string, destPath string) e
 		return storagedriver.PathNotFoundError{Path: sourcePath}
 	}
 
-	if err := os.MkdirAll(path.Dir(dest), 0o777); err != nil {
+	if err := os.MkdirAll(filepath.Dir(dest), 0o777); err != nil {
 		return err
 	}
 
@@ -337,7 +336,7 @@ func (d *driver) Walk(ctx context.Context, path string, f storagedriver.WalkFn,
 
 // fullPath returns the absolute path of a key within the Driver's storage.
 func (d *driver) fullPath(subPath string) string {
-	return path.Join(d.rootDirectory, subPath)
+	return filepath.Join(d.rootDirectory, subPath)
 }
 
 type fileInfo struct {

@ningmingxiao
Copy link
Copy Markdown
Contributor Author

done thanks @thaJeztah

@thaJeztah
Copy link
Copy Markdown
Member

Changes LGTM after squashing, but if you could give it a second pair of eyes (as last changes were suggested / partially written by me) @milosgajdos

(Happy to do the rebase / squash if it looks good to you)

Copy link
Copy Markdown
Member

@milosgajdos milosgajdos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, can you squash @ningmingxiao

ningmingxiao and others added 2 commits February 12, 2026 18:14
Signed-off-by: ningmingxiao <ning.mingxiao@zte.com.cn>
Co-authored-by: Sebastiaan van Stijn <github@gone.nl>
Signed-off-by: ningmingxiao <ning.mingxiao@zte.com.cn>
Signed-off-by: Sebastiaan van Stijn <github@gone.nl>
@thaJeztah
Copy link
Copy Markdown
Member

thaJeztah commented Feb 12, 2026

@milosgajdos I did a quick rebase and squash (swapped the path -> filepath commit to go first)

Copy link
Copy Markdown
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@milosgajdos milosgajdos merged commit 423d121 into distribution:main Feb 12, 2026
21 checks passed
@ningmingxiao ningmingxiao deleted the dev01 branch February 13, 2026 08:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

In 2.5.1 garbage-collection stops when running into failed to garbage collect: failed to mark: invalid checksum digest format

4 participants