Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stop stating files in dir when request context is cancelled #5129

Closed
abdusco opened this issue Oct 7, 2022 · 7 comments · Fixed by #5131
Closed

Stop stating files in dir when request context is cancelled #5129

abdusco opened this issue Oct 7, 2022 · 7 comments · Fixed by #5131
Labels
feature ⚙️ New feature or request optimization 📉 Performance or cost improvements
Milestone

Comments

@abdusco
Copy link
Contributor

abdusco commented Oct 7, 2022

I am using Caddy to serve a dir using file_server middleware. The directory is hosted on a network drive. So, stating files aren't really cheap, because each stat requires performing a network IO, and it adds up when listing a directory with hundreds of files.

Caddy doesn't pass down the request context to directory listing

listing, err := fsrv.loadDirectoryContents(dir.(fs.ReadDirFile), root, path.Clean(r.URL.Path), repl)

This means if I close the tab before the page loads (before caddy finishes stating all files in directory), caddy keeps on stating and it slows down the disk access for everything else.

func (fsrv *FileServer) loadDirectoryContents(dir fs.ReadDirFile, root, urlPath string, repl *caddy.Replacer) (browseTemplateContext, error) {
files, err := dir.ReadDir(10000) // TODO: this limit should probably be configurable
if err != nil && err != io.EOF {
return browseTemplateContext{}, err
}
// user can presumably browse "up" to parent folder if path is longer than "/"
canGoUp := len(urlPath) > 1
return fsrv.directoryListing(files, canGoUp, root, urlPath, repl), nil

It should check if context is cancelled, and stop doing needless disk IO.

@abdusco
Copy link
Contributor Author

abdusco commented Oct 7, 2022

I feel like passing request context down to this method and bailing out if it's cancelled should work:

func (fsrv *FileServer) serveBrowse(root, dirPath string, w http.ResponseWriter, r *http.Request, next caddyhttp.Handler) error {
	// ...
	listing, err := fsrv.loadDirectoryContents(r.Context(), ...)
	// ...
}


func (fsrv *FileServer) directoryListing(ctx context.Context, entries []fs.DirEntry, canGoUp bool, root, urlPath string, repl *caddy.Replacer) browseTemplateContext {
	filesToHide := fsrv.transformHidePaths(repl)

	var dirCount, fileCount int
	fileInfos := []fileInfo{}

	for _, entry := range entries {
		if err := ctx.Err(); err != nil {
			break // or return err?
		}

but I'm not sure about how this affects the performance:

files, err := dir.ReadDir(10000) // TODO: this limit should probably be configurable 

How does walking the dir with filepath.WalkDir() compare to this?

@abdusco
Copy link
Contributor Author

abdusco commented Oct 7, 2022

Also, I can take over this issue and submit a PR with the solution I outlined above, unless you have something in mind.

@abdusco abdusco changed the title Stop stat`ing files in dir when request context is cancelled Stop stating files in dir when request context is cancelled Oct 7, 2022
@mholt
Copy link
Member

mholt commented Oct 7, 2022

@abdusco Yes, this sounds like a great idea. Would you like to submit a PR then? Your proposal above sounds like how I'd do it. (For some reason I didn't realize you could just check ctx.Err() to see if it was done, I always selected on ctx.Done() but checking the Err() is even easier!)

You could compare using WalkDir() if you want, I don't have a network drive setup handy to test with right now. I'd be curious which one is faster!

@mholt mholt added feature ⚙️ New feature or request optimization 📉 Performance or cost improvements labels Oct 7, 2022
@mholt mholt added this to the v2.6.2 milestone Oct 7, 2022
@abdusco
Copy link
Contributor Author

abdusco commented Oct 8, 2022

Before the fix:

func (fsrv *FileServer) directoryListing(ctx context.Context, entries []fs.DirEntry, canGoUp bool, root, urlPath string, repl *caddy.Replacer) browseTemplateContext {
	// ...
	go func() {
		select {
		case <-ctx.Done():
			fsrv.logger.Error("listing cancelled")
		}
	}()
	for _, entry := range entries {
		// if err := ctx.Err(); err != nil {
		// 	break
		// }
		
		// ...
	}
	fsrv.logger.Error("listing finished")
2022/10/08 04:38:31.937 ERROR   http.handlers.file_server       listing cancelled
2022/10/08 04:38:59.884 ERROR   http.handlers.file_server       listing finished

Listing continues for ~30s more.

After the fix:

2022/10/08 04:42:55.004 ERROR   http.handlers.file_server       listing finished
2022/10/08 04:42:55.007 ERROR   http.handlers.file_server       listing cancelled

@abdusco
Copy link
Contributor Author

abdusco commented Oct 8, 2022

One more point of optimization, but it's not so straightforward:

Ideally, stat should be performed lazily, and executed after we apply the pagination. However this breaks sorting by size & time, because we wouldn't have those info at hand.

@abdusco
Copy link
Contributor Author

abdusco commented Oct 8, 2022

filepath.WalkDir fs.WalkDir calls dir.ReadDir(-1) internally, so it wouldn't make any difference (or it would be even slower for dirs with >10000 items) if we refactor.

abdusco added a commit to abdusco/caddy that referenced this issue Oct 8, 2022
Prevents caddy from performing disk IO needlessly when the request is cancelled before the listing is finished.

Closes caddyserver#5129
abdusco added a commit to abdusco/caddy that referenced this issue Oct 8, 2022
Prevents caddy from performing disk IO needlessly when the request is cancelled before the listing is finished.

Closes caddyserver#5129
mholt pushed a commit that referenced this issue Oct 8, 2022
Prevents caddy from performing disk IO needlessly when the request is cancelled before the listing is finished.

Closes #5129
@mholt
Copy link
Member

mholt commented Oct 8, 2022

Ah, right, I think I knew that, but forgot.

Thank you for the fix.

I'd be open to paginating the browse listings, if possible. (Could be a future TODO)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature ⚙️ New feature or request optimization 📉 Performance or cost improvements
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants