Skip to content
This repository was archived by the owner on Apr 3, 2018. It is now read-only.

Comments

kata-agent: add network gRPC support#595

Merged
sameo merged 3 commits intocontainers:masterfrom
egernst:network-agent-changes
Feb 27, 2018
Merged

kata-agent: add network gRPC support#595
sameo merged 3 commits intocontainers:masterfrom
egernst:network-agent-changes

Conversation

@egernst
Copy link
Collaborator

@egernst egernst commented Feb 6, 2018

Introduce support for setting up interface and routes when working with
Kata agent.

Signed-off-by: Eric Ernst eric.ernst@intel.com

@egernst
Copy link
Collaborator Author

egernst commented Feb 6, 2018

Functioning network gRPC calls dependent on kata-containers/agent#126

@egernst
Copy link
Collaborator Author

egernst commented Feb 6, 2018

This is dependent on the gRPC changes landing in kata-containers/agent and being re-vendorded.

kata_agent.go Outdated
}

func (k *kataAgent) sendNetworkInterfaceAndRouteRequests(pod Pod) error {

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: unnecessary blank line. Also, a few missing newlines scattered around.

var route grpc.Route
var ipAddresses []*grpc.IpAddress
for _, addr := range endpoint.Properties().Addrs {
// Skip IPv6 because not supported
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have an open issue for lack of IPv6 support?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes: #579

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might need separate issues for hyperstart_agent.go and kata_agent.go. And actually we might not support it for hyperstart_agent.go.

// Skip IPv6 because not supported
// Skip localhost interface.
if addr.IP.To4() == nil || addr.IP.IsLoopback() {
continue
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you split this into two tests and add a log call for the IPv6 case. You can then add an "unsupported" log call as we do for hyperstart:

kata_agent.go Outdated

_, err = k.sendReq(ifcReq)
if err != nil {
k.Logger().Error("Error sending update interface request: %s", err)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here (and below) - could you use WithError():

k.Logger().WithError(err).Error("failed to send update interface request")

But maybe also add in a WithField() or WithFields() so it is 100% clear which interface failed to update (not sure of the format of error here).

kata_agent.go Outdated
hostname = hostname[:maxHostnameLen]
}

if err := k.sendNetworkInterfaceAndRouteRequests(pod); err != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add some tests to kata_agent_test.go or elsewhere to support this addition?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @jodh-intel - I started to do this a couple of times and am having a hard time. The current "mocking" in kata_agent_test.go is pretty limited (return nil, nil), so it is pretty hard to exercise without re-implementing a lot of logic from the existing kata agent in the test file. I now recall all of the mock discussions that you were driving back in the CC-3.0 pre-release time frame.

Let me think on how to handle this pragmatically. Let me know if you have suggestion in the meantime.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand. We don't necessarily need 100% coverage but this PR adds a relatively big function with no tests. I don't think we should be attempting to land significant features without tests.

The approach I'd take is to break sendNetworkInterfaceAndRouteRequests() into smaller functions (for improved clarity, code complexity and testing) and you should then be able to unit-test most of those. But even as it is, can't you create test functions to atleast handle?:

  • fetchPodNetwork() failing.
  • networkNS.NetNsPath == "".
  • addr.IP.IsLoopback() == true (force it with link, err := netlink.LinkByName("lo") ?)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are right. I will refactor and add tests today.

return nil
}

for _, endpoint := range networkNS.Endpoints {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Random thought: this code is pretty similar to:

I wonder if it would be worth refactoring to make a generic function in agent.go that accepted a Pod and a function to operate on the Endpoint's?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is similar in that it loops over the endpoints and addresses, but with the protocols being different the implementation is different enough (operating on an interface and route at a time) that I think it may be a bit painful for a potentially short term code reuse benefit. WDYT?

@egernst egernst force-pushed the network-agent-changes branch 2 times, most recently from cf31760 to 37508b0 Compare February 14, 2018 18:48
@egernst
Copy link
Collaborator Author

egernst commented Feb 14, 2018

Tested with latest agent and this PR now includes revendoring for latest gRPC changes. @sboeuf @jodh-intel can you take a look? Would be good to get working networking in...

@sboeuf
Copy link
Collaborator

sboeuf commented Feb 14, 2018

@egernst I'll review it today !

@jodh-intel
Copy link
Collaborator

Yep - would be great to get networking in, once we have tests for it.

@egernst egernst force-pushed the network-agent-changes branch 4 times, most recently from 4a95abb to 96e346e Compare February 16, 2018 06:19
@egernst
Copy link
Collaborator Author

egernst commented Feb 16, 2018

@jodh-intel I ended up doing some refactoring to help make it a bit more testable and then added some basic unit tests. This has also been verified with latest agent. The final looping over the resulting routes (and perhaps interfaces) will likely not be necessary in the future once we move to updating all the routes in a single shot.

PTAL, and thanks for the feedback.

@egernst egernst force-pushed the network-agent-changes branch from 96e346e to 27c0071 Compare February 16, 2018 07:16
Copy link
Collaborator

@sboeuf sboeuf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good !
Only a few comments.

var route grpc.Route
var ipAddresses []*grpc.IpAddress
for _, addr := range endpoint.Properties().Addrs {
// Skip IPv6 because not supported
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might need separate issues for hyperstart_agent.go and kata_agent.go. And actually we might not support it for hyperstart_agent.go.

ifc := grpc.Interface{
IPAddresses: ipAddresses,
Device: endpoint.Name(),
Name: endpoint.Name(),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we don't make any difference between those params (Device and Name), might worth opening an issue on https://github.com/kata-containers/agent/ to remove one of them.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks !

kata_agent.go Outdated
//
// Setup network interfaces and routes
//
networkNS, err := pod.storage.fetchPodNetwork(pod.id)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: This should be part of the pod and we should be able to access those infos from in-memory. Opened an issue here: #628
Will be fixed in another PR

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ack

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can update this cause you don't need to fetch the pod network anymore since #628 has been solved by #629

interfaces, routes, err := k.generateKataInterfacesAndRoutes(networkNS)
if err != nil {
return err
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a blank line to let the code breathe 😄
And globally, this comment applies to every place in the code where there are not enough blank lines (for readability)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will add some air.

kata_agent.go Outdated
"interface requested": fmt.Sprintf("%+v", ifc),
"resulting interface": fmt.Sprintf("%+v", resultingInterface),
}).WithError(err).Error("update interface request failed")
continue
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we fail here ? Have we agreed on ignoring failures here ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree on logging the error and continuing to apply remaining interfaces and routes. I think that this is an error, but not necessarily catastrophic -- user can get into the workload and then assess what failed in setup. WDYT?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, only logging without returning this error might be confusing since the user will expect everything to work as expected if he does not check the logs. And he might not understand why he does not get what he expects inside the container.
We could have a retry mechanism (maybe retry 5 times with 100ms between each time), so that we really make sure there is a real issue here, that would worth throwing an error.

Copy link
Collaborator

@sboeuf sboeuf Feb 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@egernst @sboeuf If an interface or a route failed to be set, it's likely that networking will be at least partly dysfunctional, leaving the end-user with a potentially non working pod. By only logging this and not failing we'd give a broken pod to the upper layers of the stack and pretend it's all fine, which is bad from a end user experience perspective. In other words, we should log and fail here and the pod must not be passed upward. It should be stopped at least.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for failing - if we continue, surely the system is in an unknown state (bad)? Turning it round, do we ever think this might fail? If so, why?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Thanks guys.

Copy link
Collaborator Author

@egernst egernst Feb 21, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should have stopped with error in the first place, mea culpa. It won't work today as I didn't handle conflicting routes (for the initial interface route in the guest). I have a new PR, kata-containers/agent#148, on the agent that I started (not ready just yet) which will act on a set of routes instead, acting fully declarative. Without this I don't think we can really test networking in a container without bypassing conflict errors (because the initial interface route already exists). This'll simplify our handling quite a bit -- I'll try to get this ready for merge today.

kata_agent.go Outdated
k.Logger().WithFields(logrus.Fields{
"route requested": fmt.Sprintf("%+v", route),
}).WithError(err).Error("update route request failed")
continue
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we fail here ? Have we agreed on ignoring failures here ?

}
ipAddresses = append(ipAddresses, &ipAddress)
}
ifc := grpc.Interface{
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can do

ifc := &grpc.Interface{

so that later in the code you can do

ifaces = append(ifaces, ifc)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yawn. Sure. :)

@egernst egernst force-pushed the network-agent-changes branch from 27c0071 to d82bb39 Compare February 20, 2018 20:55
@egernst
Copy link
Collaborator Author

egernst commented Feb 20, 2018

@sboeuf -- updated.

kata_agent.go Outdated
return prepareAndStartShim(pod, k.shim, c.id, req.ExecId, k.state.URL, cmd)
}

func (k *kataAgent) generateKataInterfacesAndRoutes(networkNS NetworkNamespace) ([]*grpc.Interface, []*grpc.Route, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Kata part of this function name is redundant: This is a kataAgent method, and it's part of kata_agent.go already.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apparently I really like to push the kata brand, in this case in error. Will update...

kata_agent.go Outdated
}

route.Gateway = r.Gw.String()
if route.Gateway == "<nil>" {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to have this <nil> string defined as a constant somewhere.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed

@egernst
Copy link
Collaborator Author

egernst commented Feb 22, 2018

see kata-containers/agent#148

@egernst
Copy link
Collaborator Author

egernst commented Feb 22, 2018

waiting for kata-containers/agent#148 to land so I can update the revendoring.

In the meantime, addressed feedback from @sameo and @jodh-intel and @sboeuf .

@egernst egernst force-pushed the network-agent-changes branch 3 times, most recently from 25cb547 to f04e04d Compare February 25, 2018 20:06
Re-vendor agent in order to pull in latest gRPC protocol definitions for
network interfaces and routes.

shortlog:

    3bfa071 CI: add setup.sh and run.sh to execute integration tests
    d41d4e3 grpc: network: remove Add/Update/Remove route support
    e16d5b1 network: add tests for updateRoutes
    614ad59 grpc: network: add UpdateRoutes
    4fc578f agent: Allow to retrieve the process of a stopped container
    bfcf5dd network: add initial tests
    cab528c grpc: initial changes for networking protocol
    27d9f2c agent: run init setup first
    8540821 agent: run init setup first
    17fdd1b log: disable color
    68794dd lint: Remove unused variables and struct fields
    acdb463 log: init logrus always before use it
    ae7887f vendor: prune unneeded code
    8a4d6e4 vendor: fix dep warning
    e51c454 doc: add CONTRIBUTING.md
    8fba14f agent: Fix the amount of bytes transmitted to the shim
    0aef782 CI: Run central go test script
    f3926bb build: add ppc64le support
    976ef0f agent: enable it to be the init process
    54afb09 client: Replace grpc.WithTimeout
    0d8ffbf CI: Require doc team signoff for doc changes
    ee518f8 grpc: Fix not online Mem
    9ed615c CI: switch to central static check script
    cac185a CI: Rename static check script
    5ebcaec CI: Fix gometalinter errors
    fb52144 vendor: Update testify for latest assert

Signed-off-by: Eric Ernst <eric.ernst@intel.com>
@egernst egernst force-pushed the network-agent-changes branch from f04e04d to 2a6e7a0 Compare February 25, 2018 20:22
@egernst egernst force-pushed the network-agent-changes branch from 2a6e7a0 to 7ed8276 Compare February 26, 2018 15:38
@egernst
Copy link
Collaborator Author

egernst commented Feb 26, 2018

@sboeuf PTAL?

Copy link
Collaborator

@sboeuf sboeuf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One comment but this looks good !

kata_agent.go Outdated
//
// Setup network interfaces and routes
//
networkNS, err := pod.storage.fetchPodNetwork(pod.id)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can update this cause you don't need to fetch the pod network anymore since #628 has been solved by #629

@egernst egernst force-pushed the network-agent-changes branch from 7ed8276 to c0f3d04 Compare February 27, 2018 17:00
@egernst
Copy link
Collaborator Author

egernst commented Feb 27, 2018

@jodh-intel @sboeuf - updated based on updated pod structure. PTAL

@sboeuf
Copy link
Collaborator

sboeuf commented Feb 27, 2018

LGTM

Approved with PullApprove Approved with PullApprove

kata_agent.go Outdated
resultingInterface, err := k.sendReq(ifcReq)
if err != nil {
k.Logger().WithFields(logrus.Fields{
"interface requested": fmt.Sprintf("%+v", ifc),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: there is an informal understanding [*] that all log fields should be single-word (so I'd just replace spaces with hyphens here and below).


[*] - yep, we should make expectations formal by writing a doc! ;)

Copy link
Collaborator

@jodh-intel jodh-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@egernst egernst force-pushed the network-agent-changes branch from c0f3d04 to 43c9ed3 Compare February 27, 2018 17:52
Eric Ernst added 2 commits February 27, 2018 09:55
Introduce initial support for setting up interface and routes when
working with Kata agent.

Fixes: containers#596

Signed-off-by: Eric Ernst <eric.ernst@intel.com>
These tests deal with hand-generated netlink structures and check the
resulting agent protocol interface and route structures.

Signed-off-by: Eric Ernst <eric.ernst@intel.com>
@sameo
Copy link
Collaborator

sameo commented Feb 27, 2018

LGTM

Approved with PullApprove Approved with PullApprove

@sameo
Copy link
Collaborator

sameo commented Feb 27, 2018

@sboeuf @egernst Thanks for the PR. Let's merge it once CI passes.

@sameo sameo merged commit 6bccdf6 into containers:master Feb 27, 2018
@sameo sameo removed the in progress label Feb 27, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants