Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cache docker info and version #209

Merged
merged 2 commits into from
Jun 27, 2023
Merged

Cache docker info and version #209

merged 2 commits into from
Jun 27, 2023

Conversation

xinfengliu
Copy link
Contributor

  • This is to reduce the load on docker daemon.
    In some cases, frequent docker info calls could lead to the system failure (high CPU, out-of-memory) due to many GetTotalUsedFds goroutines stuck in getdents64 syscall. In addition, docker version calls exec to run OS commands for getting other binaries' version, this can also become a bottleneck on a busy node due to many goroutines contending a lock for forking a new process.
    This PR caches docker info for 1 minute and tries best to get docker version from the cache.

  • Also add version info in logs on starting.

xinfengliu and others added 2 commits June 26, 2023 15:23
Also add version info on starting

Signed-off-by: Xinfeng Liu <XinfengLiu@icloud.com>
@nwneisen nwneisen self-requested a review June 26, 2023 15:34
@vikramhh
Copy link

vikramhh commented Jun 26, 2023

@xinfengliu - what is the worst case scenario for undesirable things happening because somewhere we used an almost one minute old value for docker info? Is it possible to contain those "bad effects" by limiting usage of cache? For example a scenario that may cause usage of a suboptimal logging driver [and its attendant side-effects] or "wrong" information about FS seems to be more serious and must-avoid than someone just getting stale statistics for a minute. Do we know if there are only some particular paths where docker info is used that are getting exercised most of the time - so that usage of cache could be restricted to just that path alone? My motive in asking these questions is to avoid any unexpected surprises to existing users of the API - we would be changing an existing contract with this change.

@afbjorklund
Copy link
Contributor

afbjorklund commented Jun 26, 2023

The FS information is wildly incorrect anyway, and has been for years (ever since storage was moved)

i.e. crictl imagefsinfo

That is, it only reports one subdirectory /var/lib/docker/image - but not the rest of the storage used.

core/docker_service.go Show resolved Hide resolved
core/docker_service.go Show resolved Hide resolved
core/docker_service.go Show resolved Hide resolved
@xinfengliu
Copy link
Contributor Author

@vikramhh
The main usage of docker info is getting the docker root dir (another usage is getting supported logging drivers, while cri-dockerd only supports json logging driver currently and docker supports it by default, so this is not a concern)

Changing docker root dir needs restarting docker daemon. On Linux, we configure the cri-dockerd service to depend on docker service, restarting docker service will restart cri-dockerd service too. So there's no stale data for docker root dir if we use cri-dockerd in this way.

I don't know how other softwares use cri-dockerd, if docker root dir changes and cri-dockerd does not restarts, then there will be stale data for 1 minute with this PR. It affects the reported ImageFsInfoResponse and ContainerStats in regard to FilesystemIdentifier of FilesystemUsage. Would it be a serious issue?

@vikramhh
Copy link

@xinfengliu - thanks for the details above, in light of which I am OK with the change as-is.

@xinfengliu
Copy link
Contributor Author

@afbjorklund

The FS information is wildly incorrect anyway, and has been for years (ever since storage was moved)

Yeah, we should fix it. Do you know if there's any existing github tickets for it in either this repo or kubernetes repo? I seem to not able to find it.

In addition, I'm not sure what's the best way to fix it. According to the definition of UsedBytes represents the bytes used for images on the filesystem. For most used overlay2 graph driver, docker uses overlay2 dir to store layers, but it also contains RW layers of containers. Maybe we should iterate /var/lib/docker/image/overlay2/layerdb/sha256/ to read each size file under it.

@afbjorklund
Copy link
Contributor

afbjorklund commented Jun 27, 2023

Most of CRI is only defined by the implementations, and those are not being tested.

I don't know if there are any issues open, or even where those would be at...


Anyhow, it is a separate topic from this info/version caching which sounds like a good idea.

But I think that docker system df knows how to get the information?

@neersighted neersighted merged commit 210faa6 into Mirantis:master Jun 27, 2023
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants