endpoint: sync endpoint IP-SecurityID mapping to kvstore #2875

Merged
merged 2 commits into from
Mar 14, 2018

Conversation

ianvernon
Member

@ianvernon ianvernon commented Feb 19, 2018

Add controllers within the endpoint that synchronize its IPv4 and IPv6 addresses with the key-value store.

Add a watcher for this mapping in the key-value store, and cache it locally with the new ipcache package. When a new entry is added to this local cache, trigger policy updates for endpoints.

Signed-off by: Ian Vernon <ian@cilium.io>

Fixes: #2552

Add mapping of endpoint IPs to security identities in the key-value store. Watch the key-value store for updates and cache them locally per agent.

@ianvernon ianvernon added the wip label Feb 19, 2018
@ianvernon ianvernon requested a review from a team February 19, 2018 20:56
@ianvernon ianvernon requested review from a team as code owners February 19, 2018 20:56
@ianvernon ianvernon requested a review from a team February 19, 2018 20:56
@ianvernon
Member Author

test-me-please

@ianvernon
Member Author

test-me-please

@ianvernon
Member Author

test-me-please

@ianvernon ianvernon force-pushed the endpoint-ip-kvstore branch 2 times, most recently from 438f18d to dd6168a Compare February 20, 2018 00:17
@ianvernon ianvernon requested a review from a team February 20, 2018 00:17
@ianvernon ianvernon force-pushed the endpoint-ip-kvstore branch 2 times, most recently from 6fe770a to 6483d56 Compare February 20, 2018 00:21
@ianvernon
Member Author

test-me-please

}
return nil
},
RunInterval: time.Duration(1) * time.Minute,
Contributor

1 * time.Minute is more readable. There's no need to convert the constant.
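
To illustrate the point above: an untyped constant such as `1` converts to `time.Duration` implicitly when multiplied by `time.Minute`, so the explicit conversion adds nothing. A minimal sketch (the helper name `sameInterval` is made up for this example):

```go
package main

import (
	"fmt"
	"time"
)

// sameInterval reports whether the explicit conversion from the diff and the
// plain constant expression yield the same time.Duration value.
func sameInterval() bool {
	converted := time.Duration(1) * time.Minute // form used in the diff
	plain := 1 * time.Minute                    // untyped constant converts implicitly
	return converted == plain && plain == time.Minute
}

func main() {
	fmt.Println(sameInterval())
}
```

Note that the conversion is still required when the multiplier is a typed variable (for example an `int` read from configuration); it is only redundant for constants.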

// NewIPCache returns a new IPCache with the mappings of endpoint IP to security
// identity (and vice-versa) initialized.
// TODO (ianvernon) - populate proxyMutator here / add parameter for it.
func NewIPCache() *IPCache {
Contributor

Pass in the xds.ResourceMutator.

identityToIPCache map[NumericIdentity]map[string]struct{}
// TODO (ianvernon)
proxyMutator xds.ResourceMutator
proxyMutatorTypeURL string
Contributor

Replace this field with the xds.NetworkPolicyHostsTypeURL constant. That doesn't need to be configurable.

// TODO (ianvernon) - mapping of security identity to set of IPs.
identityToIPCache map[NumericIdentity]map[string]struct{}
// TODO (ianvernon)
proxyMutator xds.ResourceMutator
Contributor

proxyMutator -> xdsResourceMutator

// IPIdentityCache caches the mapping of endpoint IPs to their corresponding
// security identities across the entire cluster in which this instance of
// Cilium is running.
IPIdentityCache = NewIPCache()
Contributor

Pass in the xds.ResourceMutator, after FIXME is merged:

IPIdentityCache = NewIPCache(xds.NetworkPolicyHostsCache)

@@ -28,6 +28,11 @@ import (
"github.com/cilium/cilium/pkg/logging/logfields"
"github.com/cilium/cilium/pkg/u8proto"

"encoding/json"
// TODO - uncomment this when we actually update the proxyMutator for IP Cache
//"github.com/cilium/cilium/pkg/envoy/api"
Contributor

FYI, that import path will change with #2781.

// IPIdentityCache caches the mapping of endpoint IPs to their corresponding
// security identities across the entire cluster in which this instance of
// Cilium is running.
IPIdentityCache = NewIPCache()
Contributor

Pass in the cache, which is defined in #2879:

IPIdentityCache = NewIPCache(envoy.NetworkPolicyHostsCache)

"encoding/json"
// TODO - uncomment this when we actually update the proxyMutator for IP Cache
//"github.com/cilium/cilium/pkg/envoy/api"
"github.com/cilium/cilium/pkg/envoy/xds"
Contributor

You'll also need to import "github.com/cilium/cilium/pkg/envoy" to refer to the cache.

@ianvernon
Member Author

test-me-please

ianvernon pushed a commit that referenced this pull request Feb 20, 2018
Factor out code from policy package related to identity allocation, types, etc.
into a separate package. This was motivated by cyclic import issues faced in
PR #2875. Update code to use this package accordingly. No change in
functionality should occur as part of this commit.

Signed-off by: Ian Vernon <ian@cilium.io>
@ianvernon
Member Author

This is blocked by #2890, which was created to resolve cyclic import issues.

ianvernon pushed a commit that referenced this pull request Feb 21, 2018
Factor out code from policy package related to identity allocation, types, etc.
into a separate package. This was motivated by cyclic import issues faced in
PR #2875. Update code to use this package accordingly. No change in
functionality should occur as part of this commit.

Signed-off by: Ian Vernon <ian@cilium.io>
@cilium cilium deleted a comment from houndci-bot Mar 10, 2018
@ianvernon ianvernon requested a review from tgraf March 10, 2018 03:51
return nil
},
StopFunc: func() error {
if err := kvstore.Delete(ipKey); err != nil {
Member

Not related to your PR directly but StopFunc() is not being retried. Your code is an example of where it would make sense to retry for at least a bit. I'm still a bit undecided as the lease will cause it to be removed if the agent reconnects and creates a new lease.

Member Author

I thought the lease was created on a per-key basis? I.e., if the entry fails to be deleted, and there is no such endpoint once the agent reconnects, after the lease expires, the key-value pair will get deleted from the key-value store. Is this assumption incorrect?

Member

Lease is created per client and attached to all keys which request it. Lease renewal thus applies to all keys of an agent right now.

Contributor

Wouldn't it follow our controller pattern to then create a new controller on stop to handle the delete? We've been treating the controllers as both periodic functions and long lived "operations". In any case, we can explicitly retry in StopFunc, since it is called by the goroutine dedicated to this controller, can't we?

@ianvernon
Member Author

@rlenglet can you give a pass at this please to confirm that the XDS cache work is done correctly?

Member

@joestringer joestringer left a comment

common/ changes LGTM. Not sure why bpf-loader or cli owners are being pulled in.

//
// WARNING - STABLE API
// This structure is written as JSON to the key-value store. Do NOT modify this
// structure in ways which are not JSON forward compatible.
Member

Mind adding field tags for the JSON formatting?

Member Author

Sorry about that, done.

Contributor

Should this say "in ways which are not JSON forward and backward compatible"? We would, in any case, roll the version path component for the keys in ipcache when we change this structure, wouldn't we? If we don't, the change would have to be backward compatible to keep working with old nodes (and forward compatibility wouldn't matter).
In any case, it might be worthwhile to mention that we have version variables that might also need to change (assuming what I said above is correct).

@ianvernon
Member Author

test-me-please

Contributor

@raybejjani raybejjani left a comment

nothing blocking :)

func (e *Endpoint) runIPIdentitySync(endpointIP addressing.CiliumIP) {

if endpointIP == nil {
return
Contributor

We should probably at least log a debug message saying that the controller isn't running (although it would be a little odd for someone to call this with nil).

@@ -923,6 +931,74 @@ func (e *Endpoint) runIdentityToK8sPodSync() {
)
}

// FormatGlobalEndpointID returns the global ID of endpoint in the format
// / <global ID Prefix>:<cluster name>:<node name>:<endpoint ID> as a string.
func (e *Endpoint) FormatGlobalEndpointID() string {
Contributor

Do we expect to use this again elsewhere? Or is this a function because node.GetLocalNode() returns a *node.Node that we need to drop?


// Release lock as we do not want to have long-lasting key-value
// store operations resulting in lock being held for a long time.
e.Mutex.RUnlock()
Contributor

Would it be OK if, at the start of DoFunc, you took the lock, grabbed copies of all the relevant things, and released the lock without any branches? I think it's basically e.state, e.SecurityIdentity.ID and e.ID (via FormatGlobalEndpointID) that we need the lock to read. This is mostly to remove the need to reason about the lock while reading this code.
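
A rough sketch of the "copy under the lock, then work without it" shape suggested above. The `endpoint` type, its fields, the `"disconnecting"` state string, the key format, and the `put` callback are simplified, hypothetical stand-ins rather than the real Endpoint or kvstore API.

```go
package main

import (
	"fmt"
	"sync"
)

// endpoint is a simplified, hypothetical stand-in for the real Endpoint.
type endpoint struct {
	mutex      sync.RWMutex
	id         uint16
	state      string
	identityID uint32
}

// endpointSnapshot holds copies of the fields the sync function reads.
type endpointSnapshot struct {
	id         uint16
	state      string
	identityID uint32
}

// snapshot takes the read lock once, copies the fields, and releases the
// lock on return, so there is a single lock/unlock pair with no branches.
func (e *endpoint) snapshot() endpointSnapshot {
	e.mutex.RLock()
	defer e.mutex.RUnlock()
	return endpointSnapshot{id: e.id, state: e.state, identityID: e.identityID}
}

// syncToKVStore works only on the snapshot, so the potentially slow put
// call runs without holding the endpoint lock.
func syncToKVStore(e *endpoint, put func(key string, identity uint32) error) error {
	s := e.snapshot()
	if s.state == "disconnecting" {
		return nil // nothing to publish for an endpoint that is going away
	}
	return put(fmt.Sprintf("endpoint/%d", s.id), s.identityID)
}

func main() {
	e := &endpoint{id: 7, state: "ready", identityID: 42}
	err := syncToKVStore(e, func(key string, identity uint32) error {
		fmt.Println(key, identity)
		return nil
	})
	fmt.Println(err)
}
```

The trade-off is that the published values may be slightly stale by the time the write completes, which a periodic controller run would correct on its next pass.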

@@ -913,6 +983,11 @@ func (e *Endpoint) SetIdentity(owner Owner, identity *identityPkg.Identity) {

e.runIdentityToK8sPodSync()

// Whenever the identity is updated, propagate change to key-value store
Contributor

A more general question: does this mean that we rely on the kvstore access controls to stop someone or something from tampering with the IP -> security identity mapping (and thus with egress security enforcement)?
More specifically, we rely on the v1 part of IPIdentitiesPath to avoid collisions during upgrades between Cilium versions that change this implementation, right? If so, do we need to guard against misconfigurations anywhere? The most likely example I can think of would be multiple clusters using the same IP range and the same etcd.

// GetIPIdentityMapModel returns all known endpoint IP to security identity mappings
// stored in the key-value store.
func GetIPIdentityMapModel() {
// TODO (ianvernon) return model of ip to identity mapping. For use in CLI.
Contributor

I think this was discussed earlier in the PR's life, but I'm not sure if I'm missing something. If we have an issue to implement this, why do we need the stub? From what I can tell, it isn't actually used anywhere.

ipStrings = append(ipStrings, endpointIP)
}
sort.Strings(ipStrings)
envoy.NetworkPolicyHostsCache.Upsert(envoy.NetworkPolicyHostsTypeURL, ipIDPair.ID.StringID(), &envoyAPI.NetworkPolicyHosts{Policy: uint64(ipIDPair.ID), HostAddresses: ipStrings}, false)
Contributor

An alternative is to define a callback type in this package, and have others register conforming callbacks. This has the advantage that the dependency direction reflects the flow of data (in that the envoy package knows about ipcache and is using it, so it imports it), instead of a push scheme like we have now. It also allows this attachment to happen in a third package, allowing generic types to be wired together without the type system needing to know.
If we intended to have a global host cache then what I said is less useful, but it still retains the niceties around reflecting which direction data is flowing.

"github.com/cilium/cilium/pkg/policy"
"github.com/cilium/cilium/pkg/policy/api"

"github.com/sirupsen/logrus"
"strconv"
Contributor

Move this above.

// This structure is written as JSON to the key-value store. Do NOT modify this
// structure in ways which are not JSON forward compatible.
type IPIdentityPair struct {
IP net.IP `json:"IP"`
Contributor

We usually start JSON field names with lowercase, instead of using the Go capitalization. Not a big deal though.


// IPIdentityMappingOwner is the interface the owner of an identity allocator
// must implement
type IPIdentityMappingOwner interface {
Contributor

Rename ...Owner into ...Observer?

cacheChanged = true

// Update XDS Cache as well.
ipStrings := make([]string, 0, len(endpointIPs))
Contributor

Why do you call IPIdentityCache.LookupByIdentity before calling IPIdentityCache.upsert? Aren't you missing the newly added IP?

case kvstore.EventTypeDelete:
if exists {
// Delete from XDS Cache as well.
envoy.NetworkPolicyHostsCache.Delete(envoy.NetworkPolicyHostsTypeURL, cachedIdentity.StringID(), false)
Contributor

This is not correct. You only deleted one IP-ID pair, not the whole identity.
You should delete from this cache only if there are no IPs left associated with the identity.
Otherwise, you must call Upsert with the set that results from removing the IP in the deleted pair.
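
The intended behavior could be sketched as follows. `hostsCache` is a plain map standing in for the xDS cache (it is not the real `envoy.NetworkPolicyHostsCache` API), and `identityIPs` is a hypothetical identity-to-IP-set index. The sketch also rebuilds the IP list after the local insert, so the newly added IP is included in what gets pushed.

```go
package main

import (
	"fmt"
	"sort"
)

// hostsCache is a hypothetical stand-in for the xDS cache:
// identity -> sorted list of host addresses.
type hostsCache map[uint32][]string

// identityIPs is a hypothetical index: identity -> set of IPs.
type identityIPs map[uint32]map[string]struct{}

// sortedIPs returns the members of set as a sorted slice.
func sortedIPs(set map[string]struct{}) []string {
	ips := make([]string, 0, len(set))
	for ip := range set {
		ips = append(ips, ip)
	}
	sort.Strings(ips)
	return ips
}

// upsert records the pair first and only then rebuilds the pushed list,
// so the list always contains the IP that was just added.
func (m identityIPs) upsert(c hostsCache, id uint32, ip string) {
	if m[id] == nil {
		m[id] = map[string]struct{}{}
	}
	m[id][ip] = struct{}{}
	c[id] = sortedIPs(m[id])
}

// remove drops one IP. The identity's entry is deleted from the pushed
// cache only when no IPs remain; otherwise the reduced set is upserted.
func (m identityIPs) remove(c hostsCache, id uint32, ip string) {
	set, ok := m[id]
	if !ok {
		return
	}
	delete(set, ip)
	if len(set) == 0 {
		delete(m, id)
		delete(c, id)
		return
	}
	c[id] = sortedIPs(set)
}

func main() {
	m, c := identityIPs{}, hostsCache{}
	m.upsert(c, 42, "10.0.0.2")
	m.upsert(c, 42, "10.0.0.1")
	m.remove(c, 42, "10.0.0.1")
	fmt.Println(c[42])
	m.remove(c, 42, "10.0.0.2")
	_, stillThere := c[42]
	fmt.Println(stillThere)
}
```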

@ianvernon
Member Author

test-me-please

Ian Vernon added 2 commits March 13, 2018 14:56
Whenever an endpoint is assigned a new security identity, update the key-value
store with a mapping of the endpoint's IP(s) to its newly assigned identity.
Add a new cache which caches this information from the key-value store locally.

Signed-off by: Ian Vernon <ian@cilium.io>
Signed-off by: Ian Vernon <ian@cilium.io>
@ianvernon
Member Author

test-me-please

@ianvernon ianvernon merged commit 2cc32fd into master Mar 14, 2018
@ianvernon ianvernon deleted the endpoint-ip-kvstore branch March 14, 2018 01:04
Labels
area/daemon Impacts operation of the Cilium daemon. release-note/major This PR introduces major new functionality to Cilium. sig/k8s Impacts the kubernetes API, or kubernetes -> cilium internals translation layers.
7 participants