Add an index to /var/lib/docker/containers #51
Comments
With modern filesystems, the only point of hashing is to avoid shocking a careless sysadmin who runs `ls` in the wrong directory :-) In other words, you can have millions of files in a single directory without performance degradation. However, some kind of indexed database could definitely help.
Wait, what? Most filesystems don't have a practical files-per-directory limit today. Wouldn't a clean-up procedure, or not logging to the filesystem at all, be a better idea than bloating things with hashing or sqlite? On Tuesday 12 March 2013 at 11:51, jpetazzo wrote:
Ok :-)
To defend the feature: having several thousand directories in /var/lib/docker will make debugging super hard (even a simple `ls` becomes painful to run). And without an index, it's impossible to limit the `docker ps` command to, say, the last 50 containers without walking the entire directory. Do you guys have a suggestion? I'd be glad to open another issue.
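[Editor's note: to make the cost concrete, here is an illustrative sketch of the index-less fallback being described. The path and function name are hypothetical, not Docker's actual code: without an index, returning "the last N containers" requires statting every entry under the containers directory.]

```python
import os

def last_n_containers(base: str, n: int = 50) -> list[str]:
    """Hypothetical sketch: walk the WHOLE directory just to get the newest n.

    Cost is O(total containers ever created), paid on every invocation,
    which is exactly the `docker ps` problem discussed in this thread.
    """
    entries = sorted(
        os.scandir(base),                      # stats every entry
        key=lambda e: e.stat().st_mtime,       # newest first
        reverse=True,
    )
    return [e.name for e in entries[:n]]
```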
There's a limit to how many containers you can run simultaneously on a server; even with, say, 128 GB of RAM you can't really have more than one or two thousand containers running, at least not doing anything useful? On Tuesday 12 March 2013 at 12:04, Sam Alba wrote:
The problem here is not really the running containers (and you're right about that, btw). The main issue is listing all containers and the history of containers with the ps command. Basically, if I run: $ repeat 3 docker run -t base:e9cb4ad9173245ac /bin/true my first 3 lines of `ps -a` will be:
The current implementation of the ps command must walk the entire filesystem because of that (whether it runs with `-a` or not). After several months of using docker regularly, the ps command will quickly become useless and/or super slow. We could garbage-collect the old containers, but I hate losing historical data without a strong reason. Sqlite looks like a nice candidate for this indexing matter; if you have a better idea, I'd be glad to hear it. Your feedback is super useful: we want to build the best container system for users like you, and your comments will help the project thrive! I am changing the title of the issue, since it's not really a filesystem-hashing concern but more about indexing the container data.
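[Editor's note: the index being proposed could look roughly like this. The schema and data below are purely illustrative, not what Docker shipped: a small sqlite table keyed by container ID turns `docker ps -a` limited to the last N into a single indexed query instead of a directory walk.]

```python
import sqlite3

# On disk this would live somewhere under /var/lib/docker (assumption);
# in-memory here so the sketch is self-contained.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE containers (
    id      TEXT PRIMARY KEY,
    image   TEXT,
    command TEXT,
    created REAL
)""")

# Register a few fake containers (hypothetical data).
db.executemany(
    "INSERT INTO containers VALUES (?, ?, ?, ?)",
    [
        ("a1", "base:e9cb4ad9173245ac", "/bin/true", 1.0),
        ("b2", "base:e9cb4ad9173245ac", "/bin/true", 2.0),
        ("c3", "base:e9cb4ad9173245ac", "/bin/true", 3.0),
    ],
)

# "ps -a, last 2 containers" without touching the filesystem at all.
latest = db.execute(
    "SELECT id FROM containers ORDER BY created DESC LIMIT 2"
).fetchall()
print(latest)  # [('c3',), ('b2',)]
```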
How interesting are old processes on an individual docker machine? Shouldn't it be the job of the component administering docker through the mgmt API to store and use that information? The lightweightness of docker today really appeals to me. Adding yet another big dependency such as sqlite defeats that, especially if it's not for a very, very good reason. You know how it goes: sqlite has to be compiled with the correct flags, it gets upgraded and changes schema format, and whatnot. Debugging is harder for new docker users, since they have to know that you use sqlite and then figure out the schema, etc.; the sqlite file gets corrupted and you're in a world of pain, which means you have to have a backup routine for it, and yadda yadda. On Tuesday 12 March 2013 at 13:52, Sam Alba wrote:
I'm not familiar (not yet!) with the directory format; but it could make sense to keep only the last N (10?) entries for each container, and move the older entries to an
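[Editor's note: the "keep only the last N entries" idea could be sketched as follows. The directory layout and helper name are hypothetical, this is not Docker's actual retention logic.]

```python
import os
import shutil

def prune(history_dir: str, keep: int = 10) -> None:
    """Delete all but the newest `keep` entries in a (hypothetical)
    per-container history directory, oldest first."""
    entries = sorted(
        os.scandir(history_dir),
        key=lambda e: e.stat().st_mtime,
        reverse=True,                 # newest first
    )
    for stale in entries[keep:]:      # everything past the first `keep`
        shutil.rmtree(stale.path)
```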
FWIW, docker already uses sqlite for image metadata (see the fs branch). Sqlite definitely feels like an improvement compared to the "big json blob". Another alternative we looked at is LevelDB, but it seemed too low-level. This doesn't necessarily mean we want to use sqlite for container metadata.
Oh, ok. Yes, if a db is needed, then sqlite is certainly the way to go. I was just imagining that this type of data would be stored at a higher level, and that using docker on a single machine, like using the CLI, was an exception. I was thinking of a "docker orchestrator" or something that managed multiple docker machines, where the docker machines were more or less stateless and easily replaceable (due to hardware errors etc.), and the orchestrator stored the metadata of images, stored the logs (if it needed them), and so forth. I guess it's not mutually exclusive, but yeah, I had the view that docker in itself would do as little as possible.
Agreed, currently performance is acceptable in real-world scenarios. We can revive the indexing scenario when we start needing something more advanced. Thanks for catching this! On Tue, Jun 25, 2013 at 5:43 PM, Jérôme Petazzoni
Looks like we have hit this issue.
Docker keeps a lot of data on disk: basically one directory per command run.
It's really important to hash the container directories so as not to reach a critical number of directories inside /var/lib/docker/containers.
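[Editor's note: "hashing" here presumably means fanning containers out into prefix subdirectories, the way git shards its object store by the first two hex characters. A minimal sketch; the base path and helper are illustrative, not Docker's actual layout:]

```python
import os

BASE = "/var/lib/docker/containers"  # hypothetical base path for this sketch

def hashed_path(container_id: str) -> str:
    """Fan out by the first two hex chars of the ID, git-object style,
    so no single directory ever holds every container."""
    return os.path.join(BASE, container_id[:2], container_id)

print(hashed_path("e9cb4ad9173245ac"))
# /var/lib/docker/containers/e9/e9cb4ad9173245ac
```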
The problem with directory hashing is that it makes listing the containers harder (short of walking the whole filesystem). It's then mandatory to index the directories (using a sqlite db?) to keep the ps command fast (and to limit this command to a certain number of containers by default).