kata-agent: add network gRPC support#595
Conversation
80e718e to
1bf0069
Compare
|
Functioning network gRPC calls dependent on kata-containers/agent#126 |
1bf0069 to
693d6b5
Compare
|
This is dependent on the gRPC changes landing in kata-containers/agent and being re-vendorded. |
kata_agent.go
Outdated
| } | ||
|
|
||
| func (k *kataAgent) sendNetworkInterfaceAndRouteRequests(pod Pod) error { | ||
|
|
There was a problem hiding this comment.
Nit: unnecessary blank line. Also, a few missing newlines scattered around.
| var route grpc.Route | ||
| var ipAddresses []*grpc.IpAddress | ||
| for _, addr := range endpoint.Properties().Addrs { | ||
| // Skip IPv6 because not supported |
There was a problem hiding this comment.
Do we have an open issue for lack of IPv6 support?
There was a problem hiding this comment.
Might need separate issues for hyperstart_agent.go and kata_agent.go. And actually we might not support it for hyperstart_agent.go.
| // Skip IPv6 because not supported | ||
| // Skip localhost interface. | ||
| if addr.IP.To4() == nil || addr.IP.IsLoopback() { | ||
| continue |
There was a problem hiding this comment.
Could you split this into two tests and add a log call for the IPv6 case. You can then add an "unsupported" log call as we do for hyperstart:
kata_agent.go
Outdated
|
|
||
| _, err = k.sendReq(ifcReq) | ||
| if err != nil { | ||
| k.Logger().Error("Error sending update interface request: %s", err) |
There was a problem hiding this comment.
Here (and below) - could you use WithError():
k.Logger().WithError(err).Error("failed to send update interface request")
But maybe also add in a WithField() or WithFields() so it is 100% clear which interface failed to update (not sure of the format of error here).
kata_agent.go
Outdated
| hostname = hostname[:maxHostnameLen] | ||
| } | ||
|
|
||
| if err := k.sendNetworkInterfaceAndRouteRequests(pod); err != nil { |
There was a problem hiding this comment.
Could you add some tests to kata_agent_test.go or elsewhere to support this addition?
There was a problem hiding this comment.
Hey @jodh-intel - I started to do this a couple of times and am having a hard time. The current "mocking" in kata_agent_test.go is pretty limited (return nil, nil), so it is pretty hard to exercise without re-implementing a lot of logic from the existing kata agent in the test file. I now recall all of the mock discussions that you were driving back in the CC-3.0 pre-release time frame.
Let me think on how to handle this pragmatically. Let me know if you have suggestion in the meantime.
There was a problem hiding this comment.
I'm not sure I understand. We don't necessarily need 100% coverage but this PR adds a relatively big function with no tests. I don't think we should be attempting to land significant features without tests.
The approach I'd take is to break sendNetworkInterfaceAndRouteRequests() into smaller functions (for improved clarity, code complexity and testing) and you should then be able to unit-test most of those. But even as it is, can't you create test functions to atleast handle?:
fetchPodNetwork()failing.networkNS.NetNsPath == "".addr.IP.IsLoopback() == true(force it withlink, err := netlink.LinkByName("lo")?)
There was a problem hiding this comment.
Yes, you are right. I will refactor and add tests today.
| return nil | ||
| } | ||
|
|
||
| for _, endpoint := range networkNS.Endpoints { |
There was a problem hiding this comment.
Random thought: this code is pretty similar to:
I wonder if it would be worth refactoring to make a generic function in agent.go that accepted a Pod and a function to operate on the Endpoint's?
There was a problem hiding this comment.
It is similar in that it loops over the endpoints and addresses, but with the protocols being different the implementation is different enough (operating on an interface and route at a time) that I think it may be a bit painful for a potentially short term code reuse benefit. WDYT?
cf31760 to
37508b0
Compare
|
Tested with latest agent and this PR now includes revendoring for latest gRPC changes. @sboeuf @jodh-intel can you take a look? Would be good to get working networking in... |
|
@egernst I'll review it today ! |
|
Yep - would be great to get networking in, once we have tests for it. |
4a95abb to
96e346e
Compare
|
@jodh-intel I ended up doing some refactoring to help make it a bit more testable and then added some basic unit tests. This has also been verified with latest agent. The final looping over the resulting routes (and perhaps interfaces) will likely not be necessary in the future once we move to updating all the routes in a single shot. PTAL, and thanks for the feedback. |
96e346e to
27c0071
Compare
sboeuf
left a comment
There was a problem hiding this comment.
Looks good !
Only a few comments.
| var route grpc.Route | ||
| var ipAddresses []*grpc.IpAddress | ||
| for _, addr := range endpoint.Properties().Addrs { | ||
| // Skip IPv6 because not supported |
There was a problem hiding this comment.
Might need separate issues for hyperstart_agent.go and kata_agent.go. And actually we might not support it for hyperstart_agent.go.
| ifc := grpc.Interface{ | ||
| IPAddresses: ipAddresses, | ||
| Device: endpoint.Name(), | ||
| Name: endpoint.Name(), |
There was a problem hiding this comment.
If we don't make any difference between those params (Device and Name), might worth opening an issue on https://github.com/kata-containers/agent/ to remove one of them.
There was a problem hiding this comment.
kata_agent.go
Outdated
| // | ||
| // Setup network interfaces and routes | ||
| // | ||
| networkNS, err := pod.storage.fetchPodNetwork(pod.id) |
There was a problem hiding this comment.
Note: This should be part of the pod and we should be able to access those infos from in-memory. Opened an issue here: #628
Will be fixed in another PR
| interfaces, routes, err := k.generateKataInterfacesAndRoutes(networkNS) | ||
| if err != nil { | ||
| return err | ||
| } |
There was a problem hiding this comment.
Please add a blank line to let the code breathe 😄
And globally, this comment applies to every place in the code where there are not enough blank lines (for readability)
There was a problem hiding this comment.
Will add some air.
kata_agent.go
Outdated
| "interface requested": fmt.Sprintf("%+v", ifc), | ||
| "resulting interface": fmt.Sprintf("%+v", resultingInterface), | ||
| }).WithError(err).Error("update interface request failed") | ||
| continue |
There was a problem hiding this comment.
Why don't we fail here ? Have we agreed on ignoring failures here ?
There was a problem hiding this comment.
I agree on logging the error and continuing to apply remaining interfaces and routes. I think that this is an error, but not necessarily catastrophic -- user can get into the workload and then assess what failed in setup. WDYT?
There was a problem hiding this comment.
Well, only logging without returning this error might be confusing since the user will expect everything to work as expected if he does not check the logs. And he might not understand why he does not get what he expects inside the container.
We could have a retry mechanism (maybe retry 5 times with 100ms between each time), so that we really make sure there is a real issue here, that would worth throwing an error.
There was a problem hiding this comment.
@egernst @sboeuf If an interface or a route failed to be set, it's likely that networking will be at least partly dysfunctional, leaving the end-user with a potentially non working pod. By only logging this and not failing we'd give a broken pod to the upper layers of the stack and pretend it's all fine, which is bad from a end user experience perspective. In other words, we should log and fail here and the pod must not be passed upward. It should be stopped at least.
There was a problem hiding this comment.
+1 for failing - if we continue, surely the system is in an unknown state (bad)? Turning it round, do we ever think this might fail? If so, why?
There was a problem hiding this comment.
Agreed. Thanks guys.
There was a problem hiding this comment.
I should have stopped with error in the first place, mea culpa. It won't work today as I didn't handle conflicting routes (for the initial interface route in the guest). I have a new PR, kata-containers/agent#148, on the agent that I started (not ready just yet) which will act on a set of routes instead, acting fully declarative. Without this I don't think we can really test networking in a container without bypassing conflict errors (because the initial interface route already exists). This'll simplify our handling quite a bit -- I'll try to get this ready for merge today.
kata_agent.go
Outdated
| k.Logger().WithFields(logrus.Fields{ | ||
| "route requested": fmt.Sprintf("%+v", route), | ||
| }).WithError(err).Error("update route request failed") | ||
| continue |
There was a problem hiding this comment.
Why don't we fail here ? Have we agreed on ignoring failures here ?
| } | ||
| ipAddresses = append(ipAddresses, &ipAddress) | ||
| } | ||
| ifc := grpc.Interface{ |
There was a problem hiding this comment.
You can do
ifc := &grpc.Interface{so that later in the code you can do
ifaces = append(ifaces, ifc)27c0071 to
d82bb39
Compare
|
@sboeuf -- updated. |
kata_agent.go
Outdated
| return prepareAndStartShim(pod, k.shim, c.id, req.ExecId, k.state.URL, cmd) | ||
| } | ||
|
|
||
| func (k *kataAgent) generateKataInterfacesAndRoutes(networkNS NetworkNamespace) ([]*grpc.Interface, []*grpc.Route, error) { |
There was a problem hiding this comment.
The Kata part of this function name is redundant: This is a kataAgent method, and it's part of kata_agent.go already.
There was a problem hiding this comment.
apparently I really like to push the kata brand, in this case in error. Will update...
kata_agent.go
Outdated
| } | ||
|
|
||
| route.Gateway = r.Gw.String() | ||
| if route.Gateway == "<nil>" { |
There was a problem hiding this comment.
It would be good to have this <nil> string defined as a constant somewhere.
d82bb39 to
384ef8d
Compare
|
waiting for kata-containers/agent#148 to land so I can update the revendoring. In the meantime, addressed feedback from @sameo and @jodh-intel and @sboeuf . |
25cb547 to
f04e04d
Compare
Re-vendor agent in order to pull in latest gRPC protocol definitions for
network interfaces and routes.
shortlog:
3bfa071 CI: add setup.sh and run.sh to execute integration tests
d41d4e3 grpc: network: remove Add/Update/Remove route support
e16d5b1 network: add tests for updateRoutes
614ad59 grpc: network: add UpdateRoutes
4fc578f agent: Allow to retrieve the process of a stopped container
bfcf5dd network: add initial tests
cab528c grpc: initial changes for networking protocol
27d9f2c agent: run init setup first
8540821 agent: run init setup first
17fdd1b log: disable color
68794dd lint: Remove unused variables and struct fields
acdb463 log: init logrus always before use it
ae7887f vendor: prune unneeded code
8a4d6e4 vendor: fix dep warning
e51c454 doc: add CONTRIBUTING.md
8fba14f agent: Fix the amount of bytes transmitted to the shim
0aef782 CI: Run central go test script
f3926bb build: add ppc64le support
976ef0f agent: enable it to be the init process
54afb09 client: Replace grpc.WithTimeout
0d8ffbf CI: Require doc team signoff for doc changes
ee518f8 grpc: Fix not online Mem
9ed615c CI: switch to central static check script
cac185a CI: Rename static check script
5ebcaec CI: Fix gometalinter errors
fb52144 vendor: Update testify for latest assert
Signed-off-by: Eric Ernst <eric.ernst@intel.com>
f04e04d to
2a6e7a0
Compare
2a6e7a0 to
7ed8276
Compare
|
@sboeuf PTAL? |
sboeuf
left a comment
There was a problem hiding this comment.
One comment but this looks good !
kata_agent.go
Outdated
| // | ||
| // Setup network interfaces and routes | ||
| // | ||
| networkNS, err := pod.storage.fetchPodNetwork(pod.id) |
7ed8276 to
c0f3d04
Compare
|
@jodh-intel @sboeuf - updated based on updated pod structure. PTAL |
kata_agent.go
Outdated
| resultingInterface, err := k.sendReq(ifcReq) | ||
| if err != nil { | ||
| k.Logger().WithFields(logrus.Fields{ | ||
| "interface requested": fmt.Sprintf("%+v", ifc), |
There was a problem hiding this comment.
Nit: there is an informal understanding [*] that all log fields should be single-word (so I'd just replace spaces with hyphens here and below).
[*] - yep, we should make expectations formal by writing a doc! ;)
c0f3d04 to
43c9ed3
Compare
Introduce initial support for setting up interface and routes when working with Kata agent. Fixes: containers#596 Signed-off-by: Eric Ernst <eric.ernst@intel.com>
These tests deal with hand-generated netlink structures and check the resulting agent protocol interface and route structures. Signed-off-by: Eric Ernst <eric.ernst@intel.com>
Introduce support for setting up interface and routes when working with
Kata agent.
Signed-off-by: Eric Ernst eric.ernst@intel.com