ls takes a long time for medium-sized directories #39

Closed
jacobsa opened this issue Apr 28, 2015 · 13 comments

jacobsa commented Apr 28, 2015

I hear a second-hand report that doing ls on a directory with a few hundred files takes multiple minutes from GCE. This is probably due to ls doing a ton of stats combined with our consistency guarantees, so it plays into #29 (adding a "go fast" mode with caching and reduced guarantees). Investigate and make sure.

jacobsa self-assigned this Apr 28, 2015
jacobsa added this to the beta milestone Apr 28, 2015

jacobsa commented Apr 28, 2015

Doing a plain ls on a bucket in Asia with 100 files, from asia-east1-a, takes about 173 ms.

ls -l mp/0 for a single file named 0 usually takes about 200 ms, but sometimes spikes to 1-2 s. It involves several requests:

fuse: 2015/04/28 01:58:50.592640 connection.go:65: Received: Getattr [ID=0x3c Node=0x1 Uid=1000 Gid=100 Pid=2408]
fuse: 2015/04/28 01:58:50.593083 ops.go:184: Responding: Getattr {AttrValid:0 Attr:{Inode:1 Size:0 Blocks:0 Atime:0001-01-01 00:00:00 +0000 UTC Mtime:0001-01-01 00:00:00 +0000 UTC Ctime:0001-01-01 00:00:00 +0000 UTC Crtime:0001-01-01 00:00:00 +0000 UTC Mode:drwx------ Nlink:1 Uid:1000 Gid:100 Rdev:0 Flags:0}}
fuse: 2015/04/28 01:58:50.593226 connection.go:65: Received: Lookup [ID=0x3d Node=0x1 Uid=1000 Gid=100 Pid=2408] "0"
fuse: 2015/04/28 01:58:50.663174 ops.go:150: Responding: Lookup {Node:2 Generation:0 EntryValid:0 AttrValid:0 Attr:{Inode:2 Size:0 Blocks:0 Atime:0001-01-01 00:00:00 +0000 UTC Mtime:2015-04-28 01:50:14.659 +0000 UTC Ctime:0001-01-01 00:00:00 +0000 UTC Crtime:0001-01-01 00:00:00 +0000 UTC Mode:-rwx------ Nlink:1 Uid:1000 Gid:100 Rdev:0 Flags:0}}
fuse: 2015/04/28 01:58:50.663284 connection.go:65: Received: Getattr [ID=0x3e Node=0x2 Uid=1000 Gid=100 Pid=2408]
fuse: 2015/04/28 01:58:50.696068 ops.go:184: Responding: Getattr {AttrValid:0 Attr:{Inode:2 Size:0 Blocks:0 Atime:0001-01-01 00:00:00 +0000 UTC Mtime:2015-04-28 01:50:14.659 +0000 UTC Ctime:0001-01-01 00:00:00 +0000 UTC Crtime:0001-01-01 00:00:00 +0000 UTC Mode:-rwx------ Nlink:1 Uid:1000 Gid:100 Rdev:0 Flags:0}}
fuse: 2015/04/28 01:58:50.696225 connection.go:65: Received: Getattr [ID=0x3f Node=0x1 Uid=1000 Gid=100 Pid=2408]
fuse: 2015/04/28 01:58:50.696507 ops.go:184: Responding: Getattr {AttrValid:0 Attr:{Inode:1 Size:0 Blocks:0 Atime:0001-01-01 00:00:00 +0000 UTC Mtime:0001-01-01 00:00:00 +0000 UTC Ctime:0001-01-01 00:00:00 +0000 UTC Crtime:0001-01-01 00:00:00 +0000 UTC Mode:drwx------ Nlink:1 Uid:1000 Gid:100 Rdev:0 Flags:0}}
fuse: 2015/04/28 01:58:50.696640 connection.go:65: Received: Lookup [ID=0x40 Node=0x1 Uid=1000 Gid=100 Pid=2408] "0"
fuse: 2015/04/28 01:58:50.755126 ops.go:150: Responding: Lookup {Node:2 Generation:0 EntryValid:0 AttrValid:0 Attr:{Inode:2 Size:0 Blocks:0 Atime:0001-01-01 00:00:00 +0000 UTC Mtime:2015-04-28 01:50:14.659 +0000 UTC Ctime:0001-01-01 00:00:00 +0000 UTC Crtime:0001-01-01 00:00:00 +0000 UTC Mode:-rwx------ Nlink:1 Uid:1000 Gid:100 Rdev:0 Flags:0}}
fuse: 2015/04/28 01:58:50.755285 connection.go:65: Received: Getattr [ID=0x41 Node=0x1 Uid=1000 Gid=100 Pid=2408]
fuse: 2015/04/28 01:58:50.755584 ops.go:184: Responding: Getattr {AttrValid:0 Attr:{Inode:1 Size:0 Blocks:0 Atime:0001-01-01 00:00:00 +0000 UTC Mtime:0001-01-01 00:00:00 +0000 UTC Ctime:0001-01-01 00:00:00 +0000 UTC Crtime:0001-01-01 00:00:00 +0000 UTC Mode:drwx------ Nlink:1 Uid:1000 Gid:100 Rdev:0 Flags:0}}
fuse: 2015/04/28 01:58:50.755654 connection.go:65: Received: Lookup [ID=0x42 Node=0x1 Uid=1000 Gid=100 Pid=2408] "0"
fuse: 2015/04/28 01:58:50.819987 ops.go:150: Responding: Lookup {Node:2 Generation:0 EntryValid:0 AttrValid:0 Attr:{Inode:2 Size:0 Blocks:0 Atime:0001-01-01 00:00:00 +0000 UTC Mtime:2015-04-28 01:50:14.659 +0000 UTC Ctime:0001-01-01 00:00:00 +0000 UTC Crtime:0001-01-01 00:00:00 +0000 UTC Mode:-rwx------ Nlink:1 Uid:1000 Gid:100 Rdev:0 Flags:0}}

ls -l mp takes about 10 seconds.
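
Back-of-the-envelope: each entry costs at least one serial Lookup round trip to GCS (~60-70 ms in the log above) plus the surrounding Getattr traffic, so something like 100 ms per entry times 100 entries lands right around the 10 s measured.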

jacobsa commented Apr 28, 2015

This appears to scale linearly: ls -l takes ~90 seconds for 1,000 files.

jacobsa commented Apr 28, 2015

Plan: work on #29, then come back and measure again.

jacobsa commented Apr 29, 2015

Looks like dropping the nlink feature (issue #40) helps a lot. Timings for time ls -l mp on a bucket with 500 files:

At 3557488:

  • 1m26.550s
  • 0m58.312s
  • 1m18.659s

At 787c24a:

  • 0m24.479s
  • 0m23.492s
  • 0m23.061s

jacobsa commented May 1, 2015

Reposting from #29:

The only way I see to make the first ls -l very fast is to cache a mapping from name to file/dir/both flags on the directory inode when reading it, then use that to decide what to stat. Yet another kind of caching, with an inconsistent source. Ugh.
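
Roughly the shape I have in mind (hypothetical names, not anything that exists yet): while reading the directory, record whether each name was seen as a file object ("foo"), a directory prefix ("foo/"), or both, and let a later lookup use that to decide which stats it actually needs.

package sketch

// childType says what we last saw for a name while reading the directory.
type childType int

const (
	typeUnknown childType = iota
	typeFile
	typeDir
	typeBoth // both an object "foo" and a prefix "foo/" exist
)

// dirTypeCache would live on the directory inode and be filled in while
// reading the directory from GCS.
type dirTypeCache map[string]childType

// statsNeeded reports which stat requests a later lookup of name still needs.
func (c dirTypeCache) statsNeeded(name string) (statFile, statDir bool) {
	switch c[name] {
	case typeFile:
		return true, false // skip the directory check
	case typeDir:
		return false, true // skip the file check
	case typeBoth:
		return true, true
	default:
		return true, true // unknown name: fall back to statting both ways
	}
}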

jacobsa commented May 1, 2015

I've started on the type cache in branch type-cache, currently at 851c229. Some notes:

  • Wherever we manipulate the bucket in fs.go, we should instead be calling into DirInode, so that it can maintain its cache. I've added panics above each such call site.
  • Ditto in DirInode itself: it will need to update the cache whenever it reads from or writes to the bucket. (Rough sketch of that routing after this list.)
  • In order to make us less likely to screw this up, we probably want to go back to not caring about source generations for directory inodes, so that a given directory name's inode is stable until it is forgotten by the kernel. Luckily DirInode.src is used only for setting Nlink, which is disabled by default anyway. Kill this feature for directory inodes and update the documentation accordingly.
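
To make the routing point concrete (all names below are placeholders, not the real interfaces): because the inode is the only thing that touches the bucket for its children, it can update the type cache in the same place it performs the write.

package sketch

import "context"

// minimalBucket stands in for the real GCS bucket interface; only the one
// call this sketch needs.
type minimalBucket interface {
	CreateEmptyObject(ctx context.Context, name string) error
}

type dirInode struct {
	prefix string          // object name prefix for this directory, e.g. "photos/"
	bucket minimalBucket
	isDir  map[string]bool // the type cache; guarded by the inode's mutex in real code
}

// CreateChildFile is the kind of wrapper fs.go would call instead of touching
// the bucket directly. The write and the cache update happen in one place.
func (d *dirInode) CreateChildFile(ctx context.Context, name string) error {
	if err := d.bucket.CreateEmptyObject(ctx, d.prefix+name); err != nil {
		return err
	}

	// We just created an object for this name, so remember it is a file.
	d.isDir[name] = false
	return nil
}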

jacobsa commented May 1, 2015

Oh, DirInode.SourceGeneration doesn't actually exist any more. Here instead is the code that needs fixing: we mint a new directory inode every single time.

Ah, now it makes sense to me why doing 500 touches in parallel on files under the root inode takes forever blocking on a kernel mutex for the root inode, while doing the same thing in a sub-directory does not. Oops. Well, we'll lose that dubious benefit by doing this. Later we could get it back, if we wanted to, by moving the cache to a central place rather than the inode object, but meh for now.

jacobsa added a commit that referenced this issue May 1, 2015
 *  The inode object is now stable across requests. This is necessary for
    maintaining a cache in the inode object.

 *  There is no longer any concept of source objects, which complicated
    reasoning about caching on the inode object.

For #39.

jacobsa commented May 1, 2015

Current status: I believe the skeleton of this is set up correctly on branch type-cache (currently at 80ebdbf).

To do:

  • In another branch, add a helper struct for type caching in the inode package (rough sketch of what I mean after this list).
  • Rebase type-cache on top of that.
  • Take care of all the "TODO" panics that are currently on the type-cache branch. Don't forget to put a mutex around the type cache. It should be a fast one; can we use DirInode.mu for that? Need to look at where it is held. If it's never held for long, can we just start requiring it for LookUpChild again?
  • Merge into master, description like "set up type caching support".
  • Hook up type caching support to a --type_cache_ttl flag.
  • Document that flag a la --stat_cache_ttl.
  • Test and make sure things work as expected.
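
For the helper-struct and flag items above, something like this is what I have in mind (placeholder names; the real helper would live in the inode package and be protected by the inode's mutex):

package main

import (
	"flag"
	"fmt"
	"time"
)

var fTypeCacheTTL = flag.Duration(
	"type_cache_ttl",
	time.Minute,
	"How long to cache name -> file/dir mappings in directory inodes.")

// typeCacheEntry remembers whether a name was a directory, and until when we
// are willing to believe that without re-checking GCS.
type typeCacheEntry struct {
	isDir   bool
	expires time.Time
}

type typeCache struct {
	ttl     time.Duration
	entries map[string]typeCacheEntry
}

func newTypeCache(ttl time.Duration) *typeCache {
	return &typeCache{ttl: ttl, entries: make(map[string]typeCacheEntry)}
}

// Insert records the observed type for name, valid until now + ttl.
func (tc *typeCache) Insert(now time.Time, name string, isDir bool) {
	tc.entries[name] = typeCacheEntry{isDir: isDir, expires: now.Add(tc.ttl)}
}

// IsDir returns the cached type for name and whether the entry is still fresh.
func (tc *typeCache) IsDir(now time.Time, name string) (isDir, fresh bool) {
	e, ok := tc.entries[name]
	if !ok || now.After(e.expires) {
		return false, false
	}
	return e.isDir, true
}

func main() {
	flag.Parse()

	tc := newTypeCache(*fTypeCacheTTL)
	now := time.Now()
	tc.Insert(now, "photos", true)

	isDir, fresh := tc.IsDir(now, "photos")
	fmt.Println(isDir, fresh) // true true while within the TTL
}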

jacobsa added a commit that referenced this issue May 1, 2015
jacobsa added a commit that referenced this issue May 1, 2015
Instead, send requests via DirInode, allowing it in the future to invalidate
and update its type cache.

For #39.

jacobsa commented May 1, 2015

Still to do:

  • Take care of all the "TODO" panics that are currently on the type-cache branch.
  • Merge into master, description like "set up type caching support".
  • Hook up type caching support to a --type_cache_ttl flag.
  • Document that flag a la --stat_cache_ttl.
  • Test and make sure things work as expected.

jacobsa commented May 4, 2015

This appears to be implemented at b38756f.

Still to do:

  • Why is it taking so long to list a directory containing 100 sub-dirs? It's probably the serial statting needed to overcome "automatic implicit dirs" in ListObjects. Add a --gcs.debug flag that causes every GCS request to be printed, to find out for sure. If so, we'll want DirInode.ReadEntries to deal with filtering out such entries rather than the dir handle, so that it can do the work in parallel (rough sketch after this list). It should probably also consult the type cache and not bother statting if it says IsDir.
  • Test and make sure things work as expected.
  • Merge into master.
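
For the ReadEntries item, roughly the shape I have in mind (hypothetical helper, not what's in the tree): take the names from a ListObjects page, stat the candidates concurrently with a bounded number of workers, and skip the stat entirely when the type cache already says IsDir.

package sketch

import (
	"context"
	"sync"
)

type entry struct {
	name  string
	isDir bool
}

// filterEntries decides, for each listed name, whether it is a directory,
// statting in parallel instead of one at a time. cachedIsDir and statEntry
// are stand-ins for the type cache lookup and the GCS stat.
func filterEntries(
	ctx context.Context,
	names []string,
	cachedIsDir func(name string) bool,
	statEntry func(ctx context.Context, name string) (isDir bool, err error),
) ([]entry, error) {
	entries := make([]entry, len(names))
	errs := make([]error, len(names))

	var wg sync.WaitGroup
	sem := make(chan struct{}, 16) // bound the number of concurrent stats

	for i, name := range names {
		// Trust the type cache when it says this is a directory.
		if cachedIsDir(name) {
			entries[i] = entry{name: name, isDir: true}
			continue
		}

		wg.Add(1)
		go func(i int, name string) {
			defer wg.Done()
			sem <- struct{}{}
			defer func() { <-sem }()

			isDir, err := statEntry(ctx, name)
			entries[i] = entry{name: name, isDir: isDir}
			errs[i] = err
		}(i, name)
	}
	wg.Wait()

	for _, err := range errs {
		if err != nil {
			return nil, err
		}
	}
	return entries, nil
}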

jacobsa commented May 4, 2015

Okay, --gcs.debug shows pretty clearly that this is some sort of livelock or similar. Output after SIGQUIT here.

jacobsa added a commit that referenced this issue May 4, 2015

jacobsa commented May 4, 2015

That fixed it, but ls -1 --color=never still takes 4.23 s on a root directory with 100 sub-dirs. Let's move the statting into DirInode as discussed above, and kill the error-prone bundle usage in dirHandle while we're at it.

jacobsa added a commit that referenced this issue May 4, 2015
jacobsa added a commit that referenced this issue May 4, 2015

jacobsa commented May 4, 2015

Nice, that brings ls -1 --color=never to less than one second, as opposed to 4.23s above.
