Add an index to /var/lib/docker/containers #51

Closed
samalba opened this issue Mar 12, 2013 · 13 comments

@samalba
Contributor

samalba commented Mar 12, 2013

Docker keeps a lot of data on disk, basically one directory per command that is run.

It's really important to hash the container directories so that we never reach a critical number of directories inside /var/lib/docker/containers.

The problem with directory hashing is that it makes listing the containers harder (short of walking the whole filesystem). It then becomes mandatory to index the directories (using a sqlite db?) to keep the ps command fast (and to limit this command to a certain number of containers by default).
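
As an illustration of the hashing part, here is a minimal sketch assuming a git-style two-character prefix shard; the containerDir helper and the layout are hypothetical, not what docker actually does:

package main

import (
    "fmt"
    "path/filepath"
)

// containerDir shards container directories by the first two characters of
// the ID, e.g. /var/lib/docker/containers/af/afd76b15, so that no single
// directory ever holds every container.
func containerDir(root, id string) string {
    return filepath.Join(root, id[:2], id)
}

func main() {
    fmt.Println(containerDir("/var/lib/docker/containers", "afd76b15"))
    // /var/lib/docker/containers/af/afd76b15
}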

@ghost ghost assigned samalba Mar 12, 2013
@jpetazzo
Contributor

With modern filesystems, the only point of hashing is to avoid shocking the careless sysadmin when he issues "ls" in the wrong directory :-)

In other words, you can have millions of files in a single directory without performance degradation.

However, some kind of indexed database can definitely help.

@carlhoerberg

wait, what? most filesystems don't have a files-per-directory limit today, and wouldn't a cleanup procedure, or not logging to the filesystem at all, be a better idea than bloating things with hashing or sqlite?

@samalba
Contributor Author

samalba commented Mar 12, 2013

Ok :-)

@samalba samalba closed this as completed Mar 12, 2013
@samalba
Contributor Author

samalba commented Mar 12, 2013

To defend the feature, I'd say that having several thousand directories in /var/lib/docker will make debugging super hard (a simple ls will become painful to run). And without an index, it's impossible to limit the "docker ps" command to, say, the last 50 containers without walking the entire directory.

Do you guys have suggestions? I'd be glad to open another issue.

@carlhoerberg

there's a limit to how many containers you can run simultaneously on a server; even with, say, 128 GB of RAM you can't really have more than one or two thousand containers running, at least not doing anything useful?

@samalba
Contributor Author

samalba commented Mar 12, 2013

The problem here is not really the running containers (and you're right about that, btw). The main issue is more about all containers and the history of containers listed with the ps command.

Basically if I run:

$ repeat 3 docker run -t base:e9cb4ad9173245ac /bin/true

My first 3 lines of ps -a will be:

$ docker ps -a
ID          IMAGE                             COMMAND                CREATED         STATUS      COMMENT
afd76b15    base:e9cb4ad9173245ac             /bin/true              3 seconds ago   Exit 0
43ae3861    base:e9cb4ad9173245ac             /bin/true              4 seconds ago   Exit 0
e398c0db    base:e9cb4ad9173245ac             /bin/true              5 seconds ago   Exit 0

The current implementation of the ps command must walk the entire filesystem because of that (whether it runs with `-a` or not). After several months of using docker regularly, the ps command will quickly become useless and/or super slow. We could garbage collect the old containers, but I hate losing historical data without a strong reason.

Sqlite seems like a nice candidate for this indexing; if you have a better idea, I'd be glad to hear it.
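
To make the idea concrete, here is a minimal sketch of what such an index could look like, assuming a single sqlite table keyed by container ID; the schema, file path, and driver choice are illustrative assumptions, not an actual docker implementation:

package main

import (
    "database/sql"
    "log"

    _ "github.com/mattn/go-sqlite3" // assumed sqlite driver; any would do
)

func main() {
    // Hypothetical index file living next to the container directories.
    db, err := sql.Open("sqlite3", "/var/lib/docker/containers.db")
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    // One row per container; the created column lets "docker ps" fetch the
    // most recent N containers without walking the filesystem.
    _, err = db.Exec(`CREATE TABLE IF NOT EXISTS containers (
        id      TEXT PRIMARY KEY,
        image   TEXT,
        command TEXT,
        created INTEGER,
        status  TEXT
    )`)
    if err != nil {
        log.Fatal(err)
    }

    // Equivalent of "docker ps -a" limited to the 50 most recent containers.
    rows, err := db.Query(`SELECT id, image, command, status
        FROM containers ORDER BY created DESC LIMIT 50`)
    if err != nil {
        log.Fatal(err)
    }
    defer rows.Close()
    for rows.Next() {
        var id, image, command, status string
        if err := rows.Scan(&id, &image, &command, &status); err != nil {
            log.Fatal(err)
        }
        log.Printf("%-12s %-25s %-20s %s", id, image, command, status)
    }
}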

Your feedback is super useful; we want to build the best container system for users like you, and your comments will help the project go live sooner!

I am changing the title of the issue since it's not really a filesystem hashing concern, but more about indexing the container data.

@samalba samalba reopened this Mar 12, 2013
@carlhoerberg

how interesting are old processes on an individual docker machine? shouldn't it be the job of the component administering docker through the mgmt api to store and use that information?

the lightweightness of docker today really appeals to me. adding yet another big dependency such as sqlite defeats that, especially if it's not for a very, very good reason. you know how it goes: sqlite has to be compiled with the correct flags, it gets upgraded and changes schema format, and what not. debugging is harder for new docker users as they have to know that you use sqlite, and then figure out the schema, etc. the sqlite file gets corrupted and you're in a world of pain, which means you have to have a backup routine for it, and yadda yadda..


@jpetazzo
Contributor

I'm not familiar (not yet!) with the directory format; but it could make sense to keep only the last N (10?) entries for each container, and move the older entries to an attic directory.

  • You can nuke the attic directory without losing recent data.
  • You can see older results by adding --all or something similar to the docker commands.
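
A rough sketch of that rotation, under the assumption that "entries" means the container directories themselves and "last" means most recently modified; the rotate helper and the keep count are hypothetical:

package main

import (
    "log"
    "os"
    "path/filepath"
    "sort"
)

// rotate keeps the `keep` most recently modified container directories in
// dir and moves everything older into dir/attic, which can then be nuked
// without losing recent data.
func rotate(dir string, keep int) error {
    all, err := os.ReadDir(dir)
    if err != nil {
        return err
    }
    type entry struct {
        name string
        mod  int64
    }
    var entries []entry
    for _, e := range all {
        if !e.IsDir() || e.Name() == "attic" {
            continue
        }
        info, err := e.Info()
        if err != nil {
            return err
        }
        entries = append(entries, entry{e.Name(), info.ModTime().UnixNano()})
    }
    // Newest first.
    sort.Slice(entries, func(i, j int) bool { return entries[i].mod > entries[j].mod })
    if len(entries) <= keep {
        return nil
    }
    attic := filepath.Join(dir, "attic")
    if err := os.MkdirAll(attic, 0o755); err != nil {
        return err
    }
    for _, e := range entries[keep:] {
        if err := os.Rename(filepath.Join(dir, e.name), filepath.Join(attic, e.name)); err != nil {
            return err
        }
    }
    return nil
}

func main() {
    // Keep the 10 newest container directories, park the rest in attic/.
    if err := rotate("/var/lib/docker/containers", 10); err != nil {
        log.Fatal(err)
    }
}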

@shykes
Contributor

shykes commented Mar 12, 2013

FWIW, docker already uses sqlite for image metadata (see the fs branch which is about to be merged).

Sqlite definitely feels like an improvement compared to the "big json blob" system that it replaced. I agree that using a binary format and depending on a 3rd-party tool is always something to be cautious about, and I'm not sure how much of a problem schema changes will pose. But it beats re-inventing our own half-assed embedded database, which would probably suffer from similar problems anyway.

Another alternative we looked at is LevelDB, but it seemed too low-level, and less proven than sqlite.

This doesn't necessarily mean we want to use sqlite for container metadata (the container store and image store are 2 distinct components), but it's definitely a technical possibility.

@carlhoerberg

oh, ok, yes, if a db is needed, then sqlite is certainly the way to go.

i was just picturing that this type of data would be stored at a higher level, that using docker on a single machine would be the exception, as would using the CLI. i was thinking of a "docker orchestrator" or something that managed multiple docker machines, where the docker machines were more or less stateless and easily replaceable (due to hardware errors etc), and the orchestrator stored the image metadata, stored the logs (if it needed them), and so forth..

i guess it's not mutually exclusive, but yeah, i had the view that docker in itself would do as little as possible..

@jpetazzo
Contributor

We ditched sqlite to get rid of the dependency; so... maybe this should come back later, if/when performance issues arise? I suggest closing this; what do you think @shykes @creack?

@shykes shykes closed this as completed Jun 26, 2013
@shykes
Contributor

shykes commented Jun 26, 2013

Agreed, currently performance is acceptable in real-world scenarios: I have 1000+ containers and lookup is still fast, since it's either a direct key lookup or a simple tag lookup.

We can revive the indexing scenario when we start needing more advanced filtering and search.

Thanks for catching this!


@anandkumarpatel
Contributor

looks like we have hit this issue
24923 containers == docker ps: real 0m54.406s
29191 containers == docker ps: real 6m42.420s

tiborvass pushed a commit to tiborvass/docker that referenced this issue Sep 24, 2018
…inerd-startup-error

[18.09] backport: Add fail fast path when containerd fails on startup
ndeloof pushed a commit to ndeloof/docker that referenced this issue Aug 10, 2022
Don't try to restore containers on restart with contaienrd
crazy-max pushed a commit to crazy-max/moby that referenced this issue Sep 29, 2022
Don't try to restore containers on restart with contaienrd
Signed-off-by: CrazyMax <crazy-max@users.noreply.github.com>
thaJeztah pushed a commit to thaJeztah/docker that referenced this issue Jun 26, 2023
Refactor libkv to not directly import storage backends
thaJeztah pushed a commit to thaJeztah/docker that referenced this issue Jun 26, 2023
Refactor libkv to not directly import storage backends