This repository has been archived by the owner on May 6, 2020. It is now read-only.

vendor: Update virtcontainers vendoring #1044

Merged
merged 1 commit into from Mar 6, 2018

Conversation

sboeuf
Contributor

@sboeuf sboeuf commented Mar 2, 2018

Update virtcontainers version vendored into the runtime, adding the
new nsenter package so that our shims now enter both the network
and PID namespaces.
It also includes new Kata Containers agent protocol updates related
to networking and devices. Those are needed for the latest network
support, block device passthrough, and devicemapper rootfs.
It also fixes some linter errors and improves the docs.

Shortlog:

4f9cb18 kata_agent: Add nouuid option for xfs filesystem
085848b kata_agent: Provide hotplugged device the right rootfs path
9ef18a6 kata_agent: Update host and guest paths for Kata
2c88b8e kata_agent: Wait for rootfs if it is a hotplugged device
15a4d17 kata_agent: Support block device passthrough
fd5b27f vendor: Update Kata agent protocol
0519e89 check: lint: ineffassign in qemu.go
ae661c5 network: Fix lint errors
5c73774 shim: Start shims inside PID namespace
6d7f6e5 shim: Add ability to enter any set of namespaces
f318b77 shim: Add the ability to spawn the shim in new namespaces
fefb087 pkg: nsenter: Introduce a generic nsenter package
43c05c2 utils: setup: Add some more prereq checks to the setup script.
1a25bad docs: developers: Add some developer docs
43c9ed3 kata-agent: add initial test for kata-agent networking
082ec3f kata-agent: add network gRPC support
4b30262 setup: #!/bin/bash in the wrong place
8b7fddf vendor: re-vendor kata-containers agent

Fixes #1043

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
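For context, the nsenter approach described in the commit message boils down to resolving the target process's namespace files under `/proc/<pid>/ns/` and calling setns(2) on each. A minimal sketch of that path handling, with hypothetical helper names (the real code lives in the virtcontainers `pkg/nsenter` package and is not shown here):

```go
package main

import (
	"fmt"
	"os"
)

// nsPath builds the /proc entry for a given namespace of a process.
// Entering it is done by opening this file and calling setns(2)
// (unix.Setns from golang.org/x/sys/unix) on the resulting fd.
func nsPath(pid int, nsType string) string {
	return fmt.Sprintf("/proc/%d/ns/%s", pid, nsType)
}

// enterNamespaces sketches the loop: for each requested namespace
// type, resolve the path and (in a real implementation) setns() into
// it. Here it only collects the paths, since setns requires privileges
// and a live target process.
func enterNamespaces(pid int, nsTypes []string) ([]string, error) {
	var paths []string
	for _, t := range nsTypes {
		p := nsPath(pid, t)
		// A real implementation would os.Open(p) and
		// unix.Setns(int(f.Fd()), 0) here, then exec the shim.
		paths = append(paths, p)
	}
	return paths, nil
}

func main() {
	// Per this PR, shims enter both the network and PID namespaces.
	paths, _ := enterNamespaces(os.Getpid(), []string{"net", "pid"})
	for _, p := range paths {
		fmt.Println(p)
	}
}
```

Note that setns(2) with a PID namespace only affects children of the calling process, which is why the shim must be spawned after the namespaces are joined.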

@sboeuf
Contributor Author

sboeuf commented Mar 2, 2018

Replaces #1039 as it includes it.

@jodh-intel
Contributor

jodh-intel commented Mar 2, 2018

lgtm

Approved with PullApprove

@sboeuf
Contributor Author

sboeuf commented Mar 2, 2018

@grahamwhaley if metrics failures are not significant, please go ahead and merge this one :)

@grahamwhaley
Contributor

Hi @sboeuf, metrics are not currently blocking, but they are worth checking to see what happened (as we improve them to the point where we can make them blocking).
It helps if you have a peek in the logs and see if you can tell why they failed.
In this case almost all the IO metrics fell below where we expected...

@sboeuf @amshinde @devimc , do we think there is anything in this revendor that could hit the storage IO? If so, let me know, and I can do some hand-runs to verify or disprove...

@sboeuf
Contributor Author

sboeuf commented Mar 2, 2018

@grahamwhaley no, nothing in particular should impact the IO, since most of this revendoring is related to Kata Containers (you don't test Kata, do you?). The other main change puts our shims into PID namespaces, which should not change anything regarding the IO.

@amshinde
Contributor

amshinde commented Mar 2, 2018

@grahamwhaley We did move from virtio-block to SCSI, which is supposed to be slower performance-wise. We need to see how this has affected the I/O benchmarks.

@sboeuf
Contributor Author

sboeuf commented Mar 2, 2018

@amshinde yes sure, but this is not part of this vendoring, it has already been merged in a previous PR.

@egernst

egernst commented Mar 2, 2018

@grahamwhaley -- I think this is an example of why we should have our metrics running in virtcontainers instead of the runtime. The majority of metric-degrading features and bugs will be introduced at that level rather than here. What do you think? It'd be easier to catch regressions sooner, rather than only when we revendor.

I guess this won't be an issue once we move to kata?

@sboeuf
Contributor Author

sboeuf commented Mar 2, 2018

Ok so let me paste the errors here:

+------+---------------------------------+------------+------------+------------+---------+-------+-----------+---------+
| P/F  |              NAME               |   FLOOR    |    MEAN    |  CEILING   |   GAP   | ITERS |    SD     |   COV   |
+------+---------------------------------+------------+------------+------------+---------+-------+-----------+---------+
| Fail | storage-IO-linear-read-bs-16k   | 180000.000 | 158467.690 | 300000.000 | 66.7 %  |     1 |     0.000 | 0.00 %  |
| Fail | storage-IO-linear-write-bs-16k  | 150000.000 | 147874.290 | 300000.000 | 100.0 % |     1 |     0.000 | 0.00 %  |
| Fail | storage-IO-random-write-bs-16k  | 150000.000 | 141030.530 | 300000.000 | 100.0 % |     1 |     0.000 | 0.00 %  |

I have noticed that we merged the vendoring PR #1037 from @amshinde, which introduced this metrics failure:

| Fail | storage-IO-random-write-bs-16k  | 150000.000 | 145717.380 | 300000.000 | 100.0 % |     1 |     0.000 | 0.00 %  |

I think these issues have the same cause. I am going to raise a test PR on the runtime repo with virtio-blk as the default driver for block devices, and we'll see if this improves things (here is the test PR).
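For reference, switching the runtime's default block device driver back to virtio-blk is a one-line configuration change. This is a sketch of such a test configuration; treat the exact key name and file path as assumptions, since they vary between the Clear Containers and Kata runtime configurations:

```toml
# configuration.toml (hypervisor section) -- illustrative path/key names
[hypervisor.qemu]
# Selects how block devices are attached to the VM:
# "virtio-scsi" routes through the SCSI controller (new default),
# "virtio-blk" attaches virtio block devices directly (previous default).
block_device_driver = "virtio-blk"
```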

@amshinde
Contributor

amshinde commented Mar 2, 2018

@sboeuf I think the failures are mostly due to the SCSI changes. I am trying to make sense of the failures here, i.e. by how much performance has regressed due to SCSI.
@grahamwhaley Can you provide some insight?

@sboeuf PR #1037 failed for PnP before it was merged; an earlier run did pass, though:
https://clearlinux.org/cc-ci/job/clear-containers-runtime-16.04-PR/150/console

@jodh-intel jodh-intel added the P2 label Mar 5, 2018
@grahamwhaley
Contributor

@amshinde, you asked for some insight. The easiest thing is to look back at a PR that passed. Looking at the logs for #1038, we can see:

+------+---------------------------------+------------+------------+------------+---------+-------+-----------+---------+
| P/F  |              NAME               |   FLOOR    |    MEAN    |  CEILING   |   GAP   | ITERS |    SD     |   COV   |
+------+---------------------------------+------------+------------+------------+---------+-------+-----------+---------+
| Pass | storage-IO-linear-read-bs-16k   | 180000.000 | 219922.530 | 300000.000 | 66.7 %  |     1 |     0.000 | 0.00 %  |
| Pass | storage-IO-linear-write-bs-16k  | 150000.000 | 199772.880 | 300000.000 | 100.0 % |     1 |     0.000 | 0.00 %  |
| Pass | storage-IO-random-read-bs-16k   | 150000.000 | 200365.270 | 300000.000 | 100.0 % |     1 |     0.000 | 0.00 %  |
| Pass | storage-IO-random-write-bs-16k  | 150000.000 | 166210.260 | 300000.000 | 100.0 % |     1 |     0.000 | 0.00 %  |

We should note that storage tests are quite noisy anyhow, so we expect some variation. While we are still refining the tests and their bounds-check values, we do occasionally see a single check fail. But seeing three checks fail, all substantially below both what we normally see and the lower bounds check, does feel like something has regressed.

Let's see if we can get some info from #1046 ... if that also hints at a regression, then we can do multiple hand-runs to get better info.

@grahamwhaley
Contributor

@egernst Yes, we should probably run the metrics on both vc and the runtime(s).
Now that we have the CI working with vc, including the magic re-vendoring, I think that is feasible (I just had a peek at the scripts).
But, given the discussion about moving vc over to the kata-containers home, and that the metrics are not quite stable yet, I vote we delay implementing metrics on vc until it has been decided where it will live for use with Kata - OK?

@devimc

devimc commented Mar 5, 2018

@sboeuf could you please re-vendor, to include the latest virtcontainers changes?

@sboeuf
Contributor Author

sboeuf commented Mar 5, 2018

@devimc yeah I will, but we need to get the metrics CI fixed before we move forward on this.

Update virtcontainers version vendored into the runtime, adding the
new nsenter package so that our shims now enter both the network
and PID namespaces.
It also includes new Kata Containers agent protocol updates related
to networking and devices. Those are needed for the latest network
support, block device passthrough, and devicemapper rootfs.
It fixes some linter errors and improves the docs.
It also adds support for CPU hotplug, and fixes the QEMU SCSI
controller when it runs in a nested environment.

Shortlog:

dba0a6f vendor: update govmm
83c209d qemu: disable modern in SCSI constroller
fef9feb gitignore: Add the test support files dir
12f35ce qemu: enable CPU hotplug
925a70c pod: hot add/remove vCPUs before starting containers
4a84438 qemu: Add support for CPU hot add/remove
8ec5e61 tests: benchmark: update bundles path to local dir
6d75bc5 cni: test: move test base dir to local files
4732b03 utils: Make script install components locally
c17c48d vendor: Constraint containerd vendoring
c9996f3 filesystem: set correct access mode for pod dir tree
4f9cb18 kata_agent: Add nouuid option for xfs filesystem
085848b kata_agent: Provide hotplugged device the right rootfs path
9ef18a6 kata_agent: Update host and guest paths for Kata
2c88b8e kata_agent: Wait for rootfs if it is a hotplugged device
15a4d17 kata_agent: Support block device passthrough
fd5b27f vendor: Update Kata agent protocol
0519e89 check: lint: ineffassign in qemu.go
ae661c5 network: Fix lint errors
5c73774 shim: Start shims inside PID namespace
6d7f6e5 shim: Add ability to enter any set of namespaces
f318b77 shim: Add the ability to spawn the shim in new namespaces
fefb087 pkg: nsenter: Introduce a generic nsenter package
43c05c2 utils: setup: Add some more prereq checks to the setup script.
1a25bad docs: developers: Add some developer docs
43c9ed3 kata-agent: add initial test for kata-agent networking
082ec3f kata-agent: add network gRPC support
4b30262 setup: #!/bin/bash in the wrong place
8b7fddf vendor: re-vendor kata-containers agent

Fixes #1043

Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
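The CPU hotplug commits above ultimately drive QEMU over QMP. A rough sketch of the messages involved, assuming the standard QMP `query-hotpluggable-cpus` and `device_add` commands (the driver and topology values below are illustrative x86_64 ones, and virtcontainers actually issues these through the vendored govmm package rather than raw JSON):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// qmpCommand models a generic QMP request.
type qmpCommand struct {
	Execute   string                 `json:"execute"`
	Arguments map[string]interface{} `json:"arguments,omitempty"`
}

// hotplugCPU builds the device_add request used to hot add a vCPU.
// The driver name and topology values are illustrative.
func hotplugCPU(id string, socket, core, thread int) qmpCommand {
	return qmpCommand{
		Execute: "device_add",
		Arguments: map[string]interface{}{
			"driver":    "host-x86_64-cpu",
			"id":        id,
			"socket-id": socket,
			"core-id":   core,
			"thread-id": thread,
		},
	}
}

func main() {
	// Typical flow: query the free slots, then device_add one
	// hotpluggable CPU object per vCPU to add.
	query := qmpCommand{Execute: "query-hotpluggable-cpus"}
	add := hotplugCPU("cpu-1", 0, 1, 0)
	for _, c := range []qmpCommand{query, add} {
		b, _ := json.Marshal(c)
		fmt.Println(string(b))
	}
}
```

Hot removal is the mirror image: a `device_del` with the same device id.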
@sboeuf
Contributor Author

sboeuf commented Mar 6, 2018

@egernst @amshinde @devimc the re-vendoring PR is ready to be merged.

@devimc

devimc commented Mar 6, 2018

lgtm

Approved with PullApprove

@egernst

egernst commented Mar 6, 2018

Merging these changes even though the merge is blocked on coveralls finishing. I think this is appropriate given the known issue with coveralls never completing (is there an issue for this @sboeuf @jodh-intel?).

Since this is a vendoring pull only, I think the usefulness of coveralls is limited as well.

@jodh-intel
Contributor

Hi @egernst - yep, that's fine. There is a strange issue with coveralls/github reporting somewhere. When coveralls appears to be "stuck", if you look at the end of the 16.04 build log, you'll generally see that coveralls did run, as happened here:


Successfully merging this pull request may close these issues.

vendor: update virtcontainers to get latest Kata agent protocol