This repository has been archived by the owner on Mar 9, 2022. It is now read-only.

Add image stats and integration test #257

Merged: Random-Liu merged 3 commits into containerd:master from the add-image-stats branch on Sep 25, 2017

Conversation

Random-Liu
Member

Fixes #195.
Part of #121.
Depends on kubernetes/kubernetes#52635.

This PR:

  1. Update Kubernetes to include the CRI stats fix (kubernetes/kubernetes#52635, "Fix CRI container/imagefs stats").
  2. Add ImageFsInfo support.
  3. Add integration test framework and integration test for ImageFsInfo.

@kubernetes-incubator/maintainers-cri-containerd

@Random-Liu
Member Author

@miaoyq This PR uses the mount.Lookup function in containerd, which may be useful for your PR.

@Random-Liu Random-Liu changed the title from "Add image stats" to "Add image stats and integration test" on Sep 18, 2017
@miaoyq
Member

miaoyq commented Sep 18, 2017

@Random-Liu Yes, I have used the mount.GetMount function in docker, but I think mount.Lookup is better.
I will use it once this PR is merged.

@Random-Liu Random-Liu force-pushed the add-image-stats branch 3 times, most recently from bbce1d3 to 3eb8ebf, on September 19, 2017 00:21
@Random-Liu
Member Author

@abhinandanpb @mikebrow Ready for review.

@Random-Liu Random-Liu force-pushed the add-image-stats branch 3 times, most recently from e2d861f to 2e7a61a, on September 19, 2017 01:26
@Random-Liu Random-Liu force-pushed the add-image-stats branch 3 times, most recently from 3b7ee00 to f695f06, on September 19, 2017 21:51
@Random-Liu
Member Author

Will rebase after #264 is merged.


// Eventually waits for f to return nil, it checks every period, and
// returns error if timeout exceeds.
func Eventually(f func() (bool, error), period, timeout time.Duration) error {
Member

It would be better to define func() (bool, error) as a named type and document it.

Member Author

Will do.

@@ -31,6 +31,8 @@ const configFilePathArgName = "config"

// ContainerdConfig contains config related to containerd
type ContainerdConfig struct {
// ContainerdRootDir is the root directory path for containerd.
ContainerdRootDir string
Member Author

As suggested by @yanxuean, add a toml tag here.

Member Author

Done.
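
For illustration, a toml struct tag on this field could look like the sketch below. The tag value `root` is an assumption, not the key chosen in the PR, and the `tomlKey` helper is hypothetical; it only uses the standard `reflect` package to show that the tag is attached to the field.

```go
package main

import (
	"fmt"
	"reflect"
)

// ContainerdConfig mirrors the struct in the diff. The tag value "root"
// is assumed for illustration only.
type ContainerdConfig struct {
	// ContainerdRootDir is the root directory path for containerd.
	ContainerdRootDir string `toml:"root"`
}

// tomlKey returns the toml tag of the named field of a struct value,
// or "" if the field does not exist or has no toml tag.
func tomlKey(v interface{}, field string) string {
	f, ok := reflect.TypeOf(v).FieldByName(field)
	if !ok {
		return ""
	}
	return f.Tag.Get("toml")
}

func main() {
	fmt.Println(tomlKey(ContainerdConfig{}, "ContainerdRootDir"))
}
```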

@Random-Liu Random-Liu force-pushed the add-image-stats branch 2 times, most recently from efadb8f to e6ab872, on September 22, 2017 20:27
Signed-off-by: Lantao Liu <lantaol@google.com>

// Get returns the snapshot with specified id. Returns store.ErrNotExist if the
// snapshot doesn't exist.
func (s *Store) Get(id string) (Snapshot, error) {
Member

nit: id to key

Member Author

Will do.

Member Author

Done

}

// Delete deletes the snapshot with specified id.
func (s *Store) Delete(id string) {
Member

ditto.

Member Author

Will do.

Member Author

Done

sns = s.List()
assert.Len(sns, 2)

t.Logf("get should return nil after deletion")
Member

get should return empty struct and ErrNotExist after deletion

Member Author

Will do.

Member Author

Done

Member

@abhi abhi left a comment

Sorry for the delay. Just did a first pass

REPORT_DIR=${REPORT_DIR:-"/tmp/test-integration"}

mkdir -p ${REPORT_DIR}
start_cri_containerd ${REPORT_DIR}
Member

Just a thought: should we call this testSetup, and testTeardown (for kill_cri_containerd)? I remember trying to work out why containerd was not being started and then realizing it's started as part of this :)

Member Author

Will do. :)

Member Author

Done


# Start cri-containerd
sudo ${ROOT}/_output/cri-containerd --alsologtostderr --v 4 ${CRI_CONTAINERD_FLAGS} \
&> ${report_dir}/cri-containerd.log &
readiness_check "sudo ${GOPATH}/bin/crictl --runtime-endpoint=${CRICONTAINERD_SOCK} info"
Member

looks cleaner :D

pkg/os/os.go Outdated
}

// DeviceUUID gets device uuid of a device. The passed in device should be
// a aboslute path of the device.
Member

nit: %s/a/an

Member

@mikebrow mikebrow Sep 25, 2017

nit /aboslute/absolute/

Member Author

Thanks!

Member Author

Done.

var usedBytes, inodesUsed uint64
for _, sn := range snapshots {
// Use the oldest timestamp as the timestamp of imagefs info.
if sn.Timestamp < timestamp {
Member

Just for my knowledge: is this being done because we are caching the info stats?

Member Author

We need a timestamp for the imagefs stats, and the imagefs stats are derived from the stats of all snapshots.
Given that, it seems we should use the earliest snapshot timestamp as the timestamp of the imagefs stats?
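
The aggregation described here can be sketched as below, assuming simplified types. `sn.Timestamp` and `sn.Size` appear in the diff; the `Inodes` field, the `FilesystemUsage` struct, the `aggregate` function, and passing the initial timestamp in as `now` are assumptions made for illustration.

```go
package main

import "fmt"

// Snapshot holds per-snapshot stats. Timestamp and Size appear in the
// diff; Inodes is an assumed field name.
type Snapshot struct {
	Timestamp int64
	Size      uint64
	Inodes    uint64
}

// FilesystemUsage is a simplified stand-in for the CRI response entry.
type FilesystemUsage struct {
	Timestamp  int64
	UsedBytes  uint64
	InodesUsed uint64
}

// aggregate sums usage across all snapshots and reports the oldest
// snapshot timestamp, so the result never claims to be fresher than
// its stalest input. now is used when there are no older snapshots.
func aggregate(snapshots []Snapshot, now int64) FilesystemUsage {
	timestamp := now
	var usedBytes, inodesUsed uint64
	for _, sn := range snapshots {
		// Use the oldest timestamp as the timestamp of imagefs info.
		if sn.Timestamp < timestamp {
			timestamp = sn.Timestamp
		}
		usedBytes += sn.Size
		inodesUsed += sn.Inodes
	}
	return FilesystemUsage{
		Timestamp:  timestamp,
		UsedBytes:  usedBytes,
		InodesUsed: inodesUsed,
	}
}

func main() {
	usage := aggregate([]Snapshot{
		{Timestamp: 30, Size: 100, Inodes: 2},
		{Timestamp: 10, Size: 50, Inodes: 1},
	}, 40)
	fmt.Printf("%+v\n", usage)
}
```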

}
resp, err := c.ImageFsInfo(context.Background(), &runtime.ImageFsInfoRequest{})
require.NoError(t, err)
assert.Equal(t, expected, resp.GetImageFilesystems())
Member

Is the order returned by resp.GetImageFilesystems() guaranteed to be the same here? I ask because we return from the cache, which is in the store (a map). I am sure you must have tested this.

Member Author

There will be only one filesystem stat in the list. :)

Member

Agreed :) Maybe just be explicit in this case? Sorry, it's just in case somebody reuses this test to build on top of it; they might expect the result to be in order since it's done here.

Member Author

Will do.

Member Author

Done.


// Store stores all snapshots.
type Store struct {
lock sync.RWMutex
Member

nit: I think we can remove the name and just keep the type?

Member Author

Removing the name means exposing the Lock/Unlock function to the user, which we may want to avoid.

Member

Makes sense.
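
A small sketch of the point made in this thread: embedding `sync.RWMutex` promotes `Lock` and `Unlock` into the struct's own method set, while a named field keeps them private to the package. The type names `namedLock` and `embeddedLock` and the `exposesLocker` helper are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// namedLock keeps the mutex behind an unexported field, so callers
// outside the package cannot lock or unlock it.
type namedLock struct {
	lock sync.RWMutex
}

// embeddedLock embeds the mutex, which promotes Lock, Unlock, RLock
// and RUnlock to methods of *embeddedLock itself.
type embeddedLock struct {
	sync.RWMutex
}

// exposesLocker reports whether v has Lock/Unlock in its method set.
func exposesLocker(v interface{}) bool {
	_, ok := v.(sync.Locker)
	return ok
}

func main() {
	fmt.Println(exposesLocker(&namedLock{}))
	fmt.Println(exposesLocker(&embeddedLock{}))
}
```

With the embedded form, any holder of the struct could call `Lock()` on it directly, which is the exposure the named field avoids.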

tick := time.NewTicker(s.syncPeriod)
go func() {
defer tick.Stop()
for {
Member

so how is this go routine going to exit ? should we pass a context and cancel it ?
Note: I did see the comment above :)

Member

this seems expensive to be running on a timer even if there are no changes.. maybe a TODO to come up with better solution.

Member Author

> so how is this go routine going to exit ? should we pass a context and cancel it ?
> Note: I did see the comment above :)

The goroutine will only stop when the process stops. As mentioned in the code comment: "No stop function is needed because the syncer doesn't update any persistent states, it's fine to let it exit with the process."

> this seems expensive to be running on a timer even if there are no changes.. maybe a TODO to come up with better solution.

Yes, this is expensive, but we have to do it. :( This is also how cadvisor works. Kubernetes retrieves stats at intervals of less than 10 seconds and expects them to be returned immediately, so we have to cache the stats periodically for Kubernetes to use.

Member

right.. I remember that.. just pointing out we could use a todo to come up with a better way.

Member Author

@mikebrow OK. Will do.

Member Author

Done.

if sn.Timestamp < timestamp {
timestamp = sn.Timestamp
}
usedBytes += sn.Size
Member

Was the response supposed to be an aggregate response? If so, why did they create an array of filesystem usage entries? Or is that what the TODO below is for?

Member Author

@Random-Liu Random-Liu Sep 25, 2017

Because there may be multiple filesystems used for image management.

@Random-Liu
Member Author

@miaoyq @abhinandanpb @mikebrow Addressed or replied to all comments.

Member

@abhi abhi left a comment

LGTM. One minor comment.

@Random-Liu
Member Author

@abhinandanpb @mikebrow Addressed comments.

Will squash the latest 2 commits after LGTM.

Signed-off-by: Lantao Liu <lantaol@google.com>
Signed-off-by: Lantao Liu <lantaol@google.com>
@Random-Liu
Member Author

Apply LGTM based on #257 (review).

Will merge it after the tests pass.

Member

@mikebrow mikebrow left a comment

/LGTM

@Random-Liu Random-Liu merged commit b9200ac into containerd:master Sep 25, 2017
@Random-Liu Random-Liu deleted the add-image-stats branch September 25, 2017 22:35
Development

Successfully merging this pull request may close these issues.

Add our own integration test framework.