-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
endpoint: sync endpoint IP-SecurityID mapping to kvstore #2875
Conversation
test-me-please |
caa1ecd
to
9247b01
Compare
test-me-please |
9247b01
to
286f076
Compare
test-me-please |
438f18d
to
dd6168a
Compare
6fe770a
to
6483d56
Compare
test-me-please |
pkg/endpoint/policy.go
Outdated
} | ||
return nil | ||
}, | ||
RunInterval: time.Duration(1) * time.Minute, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 * time.Minute
is more readable. There's no need to convert the constant.
pkg/policy/identity.go
Outdated
// NewIPCache returns a new IPCache with the mappings of endpoint IP to security | ||
// identity (and vice-versa) initialized. | ||
// TODO (ianvernon) - populate proxyMutator here / add parameter for it. | ||
func NewIPCache() *IPCache { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pass in the xds.ResourceMutator
.
pkg/policy/identity.go
Outdated
identityToIPCache map[NumericIdentity]map[string]struct{} | ||
// TODO (ianvernon) | ||
proxyMutator xds.ResourceMutator | ||
proxyMutatorTypeURL string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replace this field with the xds.NetworkPolicyHostsTypeURL
constant. That doesn't need to be configurable.
pkg/policy/identity.go
Outdated
// TODO (ianvernon) - mapping of security identity to set of IPs. | ||
identityToIPCache map[NumericIdentity]map[string]struct{} | ||
// TODO (ianvernon) | ||
proxyMutator xds.ResourceMutator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
proxyMutator
-> xdsResourceMutator
pkg/policy/identity.go
Outdated
// IPIdentityCache caches the mapping of endpoint IPs to their corresponding | ||
// security identities across the entire cluster in which this instance of | ||
// Cilium is running. | ||
IPIdentityCache = NewIPCache() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pass in the xds.ResourceMutator
, after FIXME is merged:
IPIdentityCache = NewIPCache(xds.NetworkPolicyHostsCache)
pkg/policy/identity.go
Outdated
@@ -28,6 +28,11 @@ import ( | |||
"github.com/cilium/cilium/pkg/logging/logfields" | |||
"github.com/cilium/cilium/pkg/u8proto" | |||
|
|||
"encoding/json" | |||
// TODO - uncomment this when we actually update the proxyMutator for IP Cache | |||
//"github.com/cilium/cilium/pkg/envoy/api" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, that import path will change with #2781.
pkg/policy/identity.go
Outdated
// IPIdentityCache caches the mapping of endpoint IPs to their corresponding | ||
// security identities across the entire cluster in which this instance of | ||
// Cilium is running. | ||
IPIdentityCache = NewIPCache() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pass in the cache, which is defined in #2879:
IPIdentityCache = NewIPCache(envoy.NetworkPolicyHostsCache)
pkg/policy/identity.go
Outdated
"encoding/json" | ||
// TODO - uncomment this when we actually update the proxyMutator for IP Cache | ||
//"github.com/cilium/cilium/pkg/envoy/api" | ||
"github.com/cilium/cilium/pkg/envoy/xds" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'll also need to import "github.com/cilium/cilium/pkg/envoy"
to refer to the cache.
6483d56
to
a3eb5e7
Compare
test-me-please |
Factor out code from policy package related to identity allocation, types, etc. into a separate package. This was motivated by cyclic import issues faced in PR #2875. Update code to use this package accordingly. No change in functionality should occur as part of this commit. Signed-off by: Ian Vernon <ian@cilium.io>
This is blocked by #2890, which was created to resolve cyclic import issues. |
Factor out code from policy package related to identity allocation, types, etc. into a separate package. This was motivated by cyclic import issues faced in PR #2875. Update code to use this package accordingly. No change in functionality should occur as part of this commit. Signed-off by: Ian Vernon <ian@cilium.io>
Factor out code from policy package related to identity allocation, types, etc. into a separate package. This was motivated by cyclic import issues faced in PR #2875. Update code to use this package accordingly. No change in functionality should occur as part of this commit. Signed-off by: Ian Vernon <ian@cilium.io>
a3eb5e7
to
fe9b234
Compare
return nil | ||
}, | ||
StopFunc: func() error { | ||
if err := kvstore.Delete(ipKey); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not related to your PR directly but StopFunc()
is not being retried. Your code is an example of where it would make sense to retry for at least a bit. I'm still a bit undecided as the lease will cause it to be removed if the agent reconnects and creates a new lease.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the lease was created on a per-key basis? I.e., if the entry fails to be deleted, and there is no such endpoint once the agent reconnects, after the lease expires, the key-value pair will get deleted from the key-value store. Is this assumption incorrect?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lease is created per client and attached to all keys which request it. Lease renewal thus applies to all keys of an agent right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it follow our controller pattern to then create a new controller on stop to handle the delete? We've been treating the controllers as both periodic functions and long lived "operations". In any case, we can explicitly retry in StopFunc, since it is called by the goroutine dedicated to this controller, can't we?
@rlenglet can you give a pass at this please to confirm that the XDS cache work is done correctly? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
common/
changes LGTM. Not sure why bpf-loader or cli owners are being pulled in.
// | ||
// WARNING - STABLE API | ||
// This structure is written as JSON to the key-value store. Do NOT modify this | ||
// structure in ways which are not JSON forward compatible. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mind adding field tags for the JSON formatting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry about that, done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this say" in ways which are not JSON forward and backward compatible"? We would, in any case, roll the version path component for the keys in ipcache when we changed this field, wouldn't we? If we don't, the change would have to be backwards compatible to continue working with old nodes (and forwards compatibility wouldn't matter).
In any case, mentioning that we have version variables that might also need to change might be worthwhile (assuming what I said above is correct).
ff7e4bc
to
b476e48
Compare
test-me-please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nothing blocking :)
func (e *Endpoint) runIPIdentitySync(endpointIP addressing.CiliumIP) { | ||
|
||
if endpointIP == nil { | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably log a debug that the controller isn't running, at least (although if someone is calling this with nil it would be a little odd).
@@ -923,6 +931,74 @@ func (e *Endpoint) runIdentityToK8sPodSync() { | |||
) | |||
} | |||
|
|||
// FormatGlobalEndpointID returns the global ID of endpoint in the format | |||
// / <global ID Prefix>:<cluster name>:<node name>:<endpoint ID> as a string. | |||
func (e *Endpoint) FormatGlobalEndpointID() string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we expect to use this again elsewhere? or is this a function because node.GetLocalNode()
returns an *node.Node
that we need to drop?
|
||
// Release lock as we do not want to have long-lasting key-value | ||
// store operations resulting in lock being held for a long time. | ||
e.Mutex.RUnlock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be ok if, at the start of DoFunc, you took the lock, grabbed copies of all the relevant things, and released the lock without any branches? I think it's basically e.state
, e.SecurityIdentity.ID
and e.ID
(via FormatGlobalEndpointID
that we need to the lock to read. This is mostly to remove the need to reason about the lock while reading this code.
return nil | ||
}, | ||
StopFunc: func() error { | ||
if err := kvstore.Delete(ipKey); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it follow our controller pattern to then create a new controller on stop to handle the delete? We've been treating the controllers as both periodic functions and long lived "operations". In any case, we can explicitly retry in StopFunc, since it is called by the goroutine dedicated to this controller, can't we?
@@ -913,6 +983,11 @@ func (e *Endpoint) SetIdentity(owner Owner, identity *identityPkg.Identity) { | |||
|
|||
e.runIdentityToK8sPodSync() | |||
|
|||
// Whenever the identity is updated, propagate change to key-value store |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A more general question: Does this mean that we rely on the kvstore access controls to stop someone/something messing the IP->Security Identity mapping (and thus, the egress security enforcement)?
More specifically, we rely on the v1
part of IPIdentitiesPath
to avoid collisions during upgrades between cilium versions that change this implementation, right? If so, do we need to guard against misconfigurations anywhere? The most likely example I can think of would be multiple clusters using the same IP range and the same etcd.
// | ||
// WARNING - STABLE API | ||
// This structure is written as JSON to the key-value store. Do NOT modify this | ||
// structure in ways which are not JSON forward compatible. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this say" in ways which are not JSON forward and backward compatible"? We would, in any case, roll the version path component for the keys in ipcache when we changed this field, wouldn't we? If we don't, the change would have to be backwards compatible to continue working with old nodes (and forwards compatibility wouldn't matter).
In any case, mentioning that we have version variables that might also need to change might be worthwhile (assuming what I said above is correct).
// GetIPIdentityMapModel returns all known endpoint IP to security identity mappings | ||
// stored in the key-value store. | ||
func GetIPIdentityMapModel() { | ||
// TODO (ianvernon) return model of ip to identity mapping. For use in CLI. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this was discussed earlier in the PRs life but I'm not sure if I'm missing something. If we have an issue to implement this why do we need the stub? From what I can tell it isn't actually used anywhere.
ipStrings = append(ipStrings, endpointIP) | ||
} | ||
sort.Strings(ipStrings) | ||
envoy.NetworkPolicyHostsCache.Upsert(envoy.NetworkPolicyHostsTypeURL, ipIDPair.ID.StringID(), &envoyAPI.NetworkPolicyHosts{Policy: uint64(ipIDPair.ID), HostAddresses: ipStrings}, false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative is to define a callback type in this package, and have others register conforming callbacks. This has the advantage that the dependency direction reflects the flow of data (in that the envoy package knows about ipcache and is using it, so it imports it), instead of a push scheme like we have now. It also allows this attachment to happen in a third package, allowing generic types to be wired together without the type system needing to know.
If we intended to have a global host cache then what I said is less useful, but it still retains the niceties around reflecting which direction data is flowing.
pkg/endpoint/policy.go
Outdated
"github.com/cilium/cilium/pkg/policy" | ||
"github.com/cilium/cilium/pkg/policy/api" | ||
|
||
"github.com/sirupsen/logrus" | ||
"strconv" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this above.
// This structure is written as JSON to the key-value store. Do NOT modify this | ||
// structure in ways which are not JSON forward compatible. | ||
type IPIdentityPair struct { | ||
IP net.IP `json:"IP"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We usually start JSON field names with lowercase, instead of using the Go capitalization. Not a big deal though.
|
||
// IPIdentityMappingOwner is the interface the owner of an identity allocator | ||
// must implement | ||
type IPIdentityMappingOwner interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename ...Owner into ...Observer?
pkg/ipcache/ipcache.go
Outdated
cacheChanged = true | ||
|
||
// Update XDS Cache as well. | ||
ipStrings := make([]string, 0, len(endpointIPs)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you call IPIdentityCache.LookupByIdentity
before calling IPIdentityCache.upsert
? Aren't you missing the newly added IP?
pkg/ipcache/ipcache.go
Outdated
case kvstore.EventTypeDelete: | ||
if exists { | ||
// Delete from XDS Cache as well. | ||
envoy.NetworkPolicyHostsCache.Delete(envoy.NetworkPolicyHostsTypeURL, cachedIdentity.StringID(), false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not correct. You only deleted one IP-ID pair. Not the whole identity.
You should delete form this cache only if there are no IPs left associated with an identity.
Otherwise, you must call Upsert with the set resulting from removing the IP in the deleted pair.
b476e48
to
a34e2a3
Compare
test-me-please |
Whenever an endpoint is assigned a new security identity, update the key-value store with a mapping of the endpoint's IP(s) to its newly assigned identity. Add a new cache which caches this information from the key-value store locally. Signed-off by: Ian Vernon <ian@cilium.io>
Signed-off by: Ian Vernon <ian@cilium.io>
a34e2a3
to
a827153
Compare
test-me-please |
Add controllers within the endpoint that synchronize its IPv4 and IPv6 addresses with the key-value store.
Add a watcher for this mapping in the key-value store, and cache it locally with the new ipcache package. When a new entry is added to this local cache, trigger policy updates for endpoints.
Signed-off by: Ian Vernon ian@cilium.io
Fixes: #2552