Add an index to /var/lib/docker/containers #51

Closed
samalba opened this issue Mar 12, 2013 · 13 comments

@samalba
Contributor

samalba commented Mar 12, 2013

Docker keeps a lot of data on disk, basically one directory per command that is run.

It's really important to hash the container directories so that we never reach a critical number of directories inside /var/lib/docker/containers.

The problem with directory hashing is that it makes listing the containers harder (short of walking the whole filesystem). It then becomes mandatory to index the directories (using a sqlite db?) to keep the ps command fast (and to limit this command to a certain number of containers by default).
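
As an illustration of the hashing part, here is a minimal sketch assuming a git-style two-character prefix shard; the containerDir helper and the layout are hypothetical, not what docker actually does:

package main

import (
    "fmt"
    "path/filepath"
)

// containerDir shards container directories by the first two characters of
// the ID, e.g. /var/lib/docker/containers/af/afd76b15, so that no single
// directory ever holds every container.
func containerDir(root, id string) string {
    return filepath.Join(root, id[:2], id)
}

func main() {
    fmt.Println(containerDir("/var/lib/docker/containers", "afd76b15"))
    // /var/lib/docker/containers/af/afd76b15
}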

@ghost ghost assigned samalba Mar 12, 2013
@jpetazzo
Contributor

With modern filesystems, the only point of hashing is to avoid shocking the careless sysadmin when he issues "ls" in the wrong directory :-)

In other words, you can have millions of files in a single directory without performance degradation.

However, some kind of indexed database can definitely help.

@carlhoerberg

wait, what? most filesystems don't have a files-per-directory limit today, and wouldn't a cleanup procedure, or not logging to the filesystem at all, be a better idea than bloating things with hashing or sqlite?

@samalba
Contributor Author

samalba commented Mar 12, 2013

Ok :-)

@samalba samalba closed this as completed Mar 12, 2013
@samalba
Contributor Author

samalba commented Mar 12, 2013

To defend the feature, I'd say that having several thousand directories in /var/lib/docker will make debugging super hard (a simple ls will become painful to run). And without an index, it's impossible to limit the "docker ps" command to, say, the last 50 containers without walking the entire directory.

Do you guys have suggestions? I'd be glad to open another issue.

@carlhoerberg

there's a limit to how many containers you can run simultaneously on a server; even with, say, 128 GB of RAM you can't really have more than one or two thousand containers running, at least not doing anything useful?

@samalba
Contributor Author

samalba commented Mar 12, 2013

The problem here is not really the running containers (and you're right about that, btw). The main issue is more about all containers and the history of containers listed with the ps command.

Basically if I run:

$ repeat 3 docker run -t base:e9cb4ad9173245ac /bin/true

My first 3 lines of ps -a will be:

$ docker ps -a
ID          IMAGE                             COMMAND                CREATED         STATUS      COMMENT
afd76b15    base:e9cb4ad9173245ac             /bin/true              3 seconds ago   Exit 0
43ae3861    base:e9cb4ad9173245ac             /bin/true              4 seconds ago   Exit 0
e398c0db    base:e9cb4ad9173245ac             /bin/true              5 seconds ago   Exit 0

The current implementation of the ps command must walk the entire filesystem because of that (whether it runs with `-a` or not). After several months of using docker regularly, the ps command will quickly become useless and/or super slow. We could garbage collect the old containers, but I hate losing historical data without a strong reason.

Sqlite seems like a nice candidate for this indexing; if you have a better idea, I'd be glad to hear it.
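
To make the idea concrete, here is a minimal sketch of what such an index could look like, assuming a single sqlite table keyed by container ID; the schema, file path, and driver choice are illustrative assumptions, not an actual docker implementation:

package main

import (
    "database/sql"
    "log"

    _ "github.com/mattn/go-sqlite3" // assumed sqlite driver; any would do
)

func main() {
    // Hypothetical index file living next to the container directories.
    db, err := sql.Open("sqlite3", "/var/lib/docker/containers.db")
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    // One row per container; the created column lets "docker ps" fetch the
    // most recent N containers without walking the filesystem.
    _, err = db.Exec(`CREATE TABLE IF NOT EXISTS containers (
        id      TEXT PRIMARY KEY,
        image   TEXT,
        command TEXT,
        created INTEGER,
        status  TEXT
    )`)
    if err != nil {
        log.Fatal(err)
    }

    // Equivalent of "docker ps -a" limited to the 50 most recent containers.
    rows, err := db.Query(`SELECT id, image, command, status
        FROM containers ORDER BY created DESC LIMIT 50`)
    if err != nil {
        log.Fatal(err)
    }
    defer rows.Close()
    for rows.Next() {
        var id, image, command, status string
        if err := rows.Scan(&id, &image, &command, &status); err != nil {
            log.Fatal(err)
        }
        log.Printf("%-12s %-25s %-20s %s", id, image, command, status)
    }
}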

Your feedback is super useful; we want to build the best container system for users like you, and your comments will help the project go live sooner!

I am changing the title of the issue since it's not really a filesystem hashing concern, but more about indexing the container data.

@samalba samalba reopened this Mar 12, 2013
@carlhoerberg

how interesting are old processes on an individual docker machine? shouldn't it be the job of the component administering docker through the mgmt api to store and use that information?

the lightweightness of docker today really appeals to me. adding yet another big dependency such as sqlite defeats that, especially if it's not for a very, very good reason. you know how it goes: sqlite has to be compiled with the correct flags, it gets upgraded and changes schema format, and what not. debugging is harder for new docker users as they have to know that you use sqlite, and then figure out the schema, etc. the sqlite file gets corrupted and you're in a world of pain, which means you have to have a backup routine for it, and yadda yadda..


@jpetazzo
Contributor

I'm not familiar (not yet!) with the directory format; but it could make sense to keep only the last N (10?) entries for each container, and move the older entries to an attic directory.

  • You can nuke the attic directory without losing recent data.
  • You can see older results by adding --all or something similar to the docker commands.
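
A rough sketch of that rotation, under the assumption that "entries" means the container directories themselves and "last" means most recently modified; the rotate helper and the keep count are hypothetical:

package main

import (
    "log"
    "os"
    "path/filepath"
    "sort"
)

// rotate keeps the `keep` most recently modified container directories in
// dir and moves everything older into dir/attic, which can then be nuked
// without losing recent data.
func rotate(dir string, keep int) error {
    all, err := os.ReadDir(dir)
    if err != nil {
        return err
    }
    type entry struct {
        name string
        mod  int64
    }
    var entries []entry
    for _, e := range all {
        if !e.IsDir() || e.Name() == "attic" {
            continue
        }
        info, err := e.Info()
        if err != nil {
            return err
        }
        entries = append(entries, entry{e.Name(), info.ModTime().UnixNano()})
    }
    // Newest first.
    sort.Slice(entries, func(i, j int) bool { return entries[i].mod > entries[j].mod })
    if len(entries) <= keep {
        return nil
    }
    attic := filepath.Join(dir, "attic")
    if err := os.MkdirAll(attic, 0o755); err != nil {
        return err
    }
    for _, e := range entries[keep:] {
        if err := os.Rename(filepath.Join(dir, e.name), filepath.Join(attic, e.name)); err != nil {
            return err
        }
    }
    return nil
}

func main() {
    // Keep the 10 newest container directories, park the rest in attic/.
    if err := rotate("/var/lib/docker/containers", 10); err != nil {
        log.Fatal(err)
    }
}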

@shykes
Contributor

shykes commented Mar 12, 2013

FWIW, docker already uses sqlite for image metadata (see the fs branch which is about to be merged).

Sqlite definitely feels like an improvement compared to the "big json blob" system that it replaced. I agree that using a binary format and depending on a 3rd-party tool is always something to be cautious about, and I'm not sure how much of a problem schema changes will pose. But it beats re-inventing our own half-assed embedded database, which would probably suffer from similar problems anyway.

Another alternative we looked at is LevelDB, but it seemed too low-level, and less proven than sqlite.

This doesn't necessarily mean we want to use sqlite for container metadata (the container store and image store are 2 distinct components), but it's definitely a technical possibility.

@carlhoerberg

oh, ok, yes, if a db is needed, then sqlite is certainly the way to go.

i was just picturing that this type of data would be stored at a higher level, that using docker on a single machine would be the exception, as would using the CLI. i was thinking of a "docker orchestrator" or something that managed multiple docker machines, where the docker machines were more or less stateless and easily replaceable (due to hardware errors etc), and the orchestrator stored the image metadata, stored the logs (if it needed them), and so forth..

i guess it's not mutually exclusive, but yeah, i had the view that docker in itself would do as little as possible..

@jpetazzo
Contributor

We ditched sqlite to get rid of the dependency; so... maybe this should come back later, if/when performance issues arise? I suggest closing this; what do you think @shykes @creack?

@shykes shykes closed this as completed Jun 26, 2013
@shykes
Contributor

shykes commented Jun 26, 2013

Agreed, currently performance is acceptable in real-world scenarios: I have 1000+ containers and lookup is still fast, since it's either a direct key lookup or a simple tag lookup.

We can revive the indexing scenario when we start needing more advanced filtering and search.

Thanks for catching this!


@anandkumarpatel
Contributor

looks like we have hit this issue
24923 containers == docker ps: real 0m54.406s
29191 containers == docker ps: real 6m42.420s

tiborvass pushed a commit to tiborvass/docker that referenced this issue Sep 24, 2018
…inerd-startup-error

[18.09] backport: Add fail fast path when containerd fails on startup
ndeloof pushed a commit to ndeloof/docker that referenced this issue Aug 10, 2022
Don't try to restore containers on restart with contaienrd
crazy-max pushed a commit to crazy-max/moby that referenced this issue Sep 29, 2022
Don't try to restore containers on restart with contaienrd
Signed-off-by: CrazyMax <crazy-max@users.noreply.github.com>
thaJeztah pushed a commit to thaJeztah/docker that referenced this issue Jun 26, 2023
Refactor libkv to not directly import storage backends
thaJeztah pushed a commit to thaJeztah/docker that referenced this issue Jun 26, 2023
Refactor libkv to not directly import storage backends