Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix NetworkPolicy optional-vs-empty fields #37289

Closed
wants to merge 4 commits into from

Conversation

danwinship
Copy link
Contributor

@danwinship danwinship commented Nov 22, 2016

Fixes #34641, making sure that JSON NetworkPolicy objects are interpreted in the way the docs say, with unset Rule.Ports and Rule.From being different from zero-length Rule.Ports and Rule.From.

We had been talking about using pointer-to-slice to distinguish "not present" from "zero-length", but we don't actually need that; although len() and append() treat nil the same as a zero-length array, that doesn't mean that we have to, and we just have to be careful to distinguish rule.Ports == nil from len(rule.Ports) == 0 (&& rule.Ports != nil). (This is what is done in the OpenShift example @smarterclayton linked to from #34641 (comment), and at any rate, I couldn't get codegen to work with a pointer-to-slice anyway.)

Needs tests, but I couldn't figure out what APIs I should be using to test JSON/protobuf serialization/deserialization... Can someone point me to another unit test that does that?

@kubernetes/sig-network


This change is Reviewable

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Nov 22, 2016
@k8s-ci-robot
Copy link
Contributor

Jenkins Bazel Build failed for commit cff9b1c41a9062b2f7265cc20c224d1a7d8d119b. Full PR test history.

The magic incantation to run this job again is @k8s-bot bazel test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@k8s-github-robot k8s-github-robot added kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/design Categorizes issue or PR as related to design. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. release-note-label-needed labels Nov 22, 2016
@smarterclayton
Copy link
Contributor

Serialization is tested by pkg/api/serialization_test.go - double check that you aren't using a custom fuzzer that hardcodes network-peers / network-ports (fuzzer automatically tests nil, empty slice, and populated slice).

@danwinship
Copy link
Contributor Author

Serialization is tested by pkg/api/serialization_test.go

There are no tests for specific nuances of serialization of specific types?

(fuzzer automatically tests nil, empty slice, and populated slice)

That doesn't seem to be true; pkg/api/testing/fuzzer.go:FuzzerFor() calls fuzz.New().NilChance(.5).NumElements(1, 1), so it either generates nil or a 1-element slice, never a 0-element slice.

Changing it to NumElements(0, 2) (not "1", because of google/gofuzz#20) makes it generate all three possibilities though, and the tests still pass. Although, master passes with the same change, which suggests that it's not testing what we want it to test, since we don't expect both nil and [] to round-trip correctly in NetworkPolicyIngressRule in master.

@k8s-ci-robot
Copy link
Contributor

Jenkins GCE Node e2e failed for commit 89a0bccdfd144bd196aa9356c0acccdd577f8b97. Full PR test history.

The magic incantation to run this job again is @k8s-bot node e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@smarterclayton
Copy link
Contributor

Do you have a defaulter in play? Try adding an explicit fuzzer function to
verify.

On Nov 22, 2016, at 12:57 PM, Dan Winship notifications@github.com wrote:

Serialization is tested by pkg/api/serialization_test.go

There are no tests for specific nuances of serialization of specific types?

(fuzzer automatically tests nil, empty slice, and populated slice)

That doesn't seem to be true; pkg/api/testing/fuzzer.go:FuzzerFor()
calls fuzz.New().NilChance(.5).NumElements(1,
1), so it either generates nil or a 1-element slice, never a 0-element
slice.

Changing it to NumElements(0, 2) (not "1", because of google/gofuzz#20
google/gofuzz#20) makes it generate all three
possibilities though, and the tests still pass. Although, master passes
with the same change, which suggests that it's not testing what we want it
to test, since we don't expect both nil and [] to round-trip correctly in
NetworkPolicyIngressRule in master.


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#37289 (comment),
or mute the thread
https://github.com/notifications/unsubscribe-auth/ABG_pxS45i-uV-TB1c7TJHjqQyshL63yks5rAy0NgaJpZM4K5i8v
.

@danwinship
Copy link
Contributor Author

ok, pkg/api/serialization_test.go compares the original and serialized-and-deserialized objects using api.Semantic.DeepEqual(), which uses a forked version of reflect.DeepEqual() that considers nil and [] to be equal. So it does not test that this patch works.

I tried adding extra rules to api.Semantic to fix the comparison of NetworkPolicy values, but that creates an import loop between pkg/api and pkg/apis/extensions...

@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 24, 2016
@thockin
Copy link
Member

thockin commented Nov 24, 2016

@danwinship @lavalamp ick. We could do something like this:

  • if semantic.DeepEqual() finds a T.DeepEqual(T) bool method, call that instead of doing it ourselves
  • define a type with this method for these fields

But that feels terrible. Better ideas?

@k8s-github-robot k8s-github-robot added kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/design Categorizes issue or PR as related to design. kind/old-docs labels Dec 1, 2016
@k8s-github-robot
Copy link

Adding label:do-not-merge because PR changes docs prohibited to auto merge
See http://kubernetes.io/editdocs/ for information about editing docs

@k8s-github-robot k8s-github-robot added the do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. label Dec 1, 2016
Rather than having the forked DeepEqual code take a list of override
functions, have it dynamically look up override methods on the types
being compared.

FIXME: pkg/runtime/serializer/codec_test.go does not currently pass
due to weirdness with overrides of embedded types
@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 1, 2016
@danwinship
Copy link
Contributor Author

We could do something like this:

  • if semantic.DeepEqual() finds a T.DeepEqual(T) bool method, call that instead of doing it ourselves
  • define a type with this method for these fields

But that feels terrible. Better ideas?

Well, I tried that, and it is terrible. I ended up needing to replace all the existing overrides as well, because otherwise you end up with a weird mix of global and non-global overrides. (In particular, NetworkPolicyIngressRule's override of DeepEqual needs to recursively call whatever.DeepEqual on its subparts, but in order to do that it would have to have a copy of the right "whatever", but the forked DeepEqual code can't just pass it its Equalities object because that's a private unexported type. So I just made the existing overrides also work by making the types implement their own semantic DeepEqual methods as well so I could get rid of "Equalities".) Except that it doesn't work because I apparently did it wrong because reflect is weird and the code doesn't handle types with embedded types with DeepEqual overrides correctly (see commit message). Also, to do it right we probably need to override DeepDerivative separately from DeepEqual...

Anyway, mucking about with all of this just highlighted the fact that we're trying to make NetworkPolicy violate a fundamental norm of kubernetes ("nil and empty slice are the same thing in API objects") and makes me wonder if maybe we should just go back to the original plan of using a pointer-to-slice so we can be more explicit about the difference between unset and set-but-zero-length. (There is still some code generation problem with that approach, where some code [IIRC protobufs] is failing to insert a * in a place where it's needed, but that ought to be resolvable.)

OTOH, while also implementing NetworkPolicy in OpenShift in parallel with all this, I'm feeling more like maybe we should have gone with the "don't actually make any distinction between unset and set-but-zero-length [just like everyone else]" approach. The idea that a 0-length Ports or From array means to allow connections to 0 ports or from 0 namespaces is nice and consistent, but it's completely useless...

@k8s-ci-robot
Copy link
Contributor

Jenkins unit/integration failed for commit bafa502. Full PR test history.

The magic incantation to run this job again is @k8s-bot unit test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@k8s-ci-robot
Copy link
Contributor

Jenkins verification failed for commit bafa502. Full PR test history.

The magic incantation to run this job again is @k8s-bot verify test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@thockin
Copy link
Member

thockin commented Dec 1, 2016

Hmm. If I recall, protobuf was the holdout for pointer-to-slice, wasn't it? Or am I misremembering? Looking at the code that gets generated for pointer-to-slice, there are some obvious problems.

I'm falling into the "cut our losses" camp. We're not in a rush, I guess, but I am somewhat loathe to commit to fixing teh codegen for this very niche case that has relatively low RoI.

@smarterclayton
Copy link
Contributor

smarterclayton commented Dec 1, 2016 via email

@smarterclayton
Copy link
Contributor

smarterclayton commented Dec 1, 2016 via email

@smarterclayton
Copy link
Contributor

smarterclayton commented Dec 1, 2016 via email

@thockin
Copy link
Member

thockin commented Dec 2, 2016 via email

@caseydavenport
Copy link
Member

The idea that a 0-length Ports or From array means to allow connections to 0 ports or from 0 namespaces is nice and consistent, but it's completely useless.

FWIW, I'm also in the "cut our losses" camp.

@danwinship
Copy link
Contributor Author

Do you want to take a look at Clayton's suggestion - it's a little chintzy but not terrible - or should we cut our losses?

Which suggestion? "The way we fixed optional slices was..." or "just do a specific test and verify that nil and empty are the same"?

Fixing just the test case is easy, but api.Semantic.DeepEqual gets used all over the place so if we don't change how it works then it seems likely that some cache or something elsewhere in kubernetes is going to misinterpret our objects. (In the other case I'm aware of that distinguishes nil and 0-length at deserializing time [DeploymentConfigSpec.Triggers in OpenShift], it distinguishes them only at deserializing time: the defaulting func for DeploymentConfigSpec leaves a 0-length Trigger list as-is, but interprets obj.Triggers == nil as meaning "set Triggers to the default list of triggers". After defaulting happens, there is no longer a distinction between nil and empty.)

The other nice thing about dropping the alleged special behavior for 0-length list is that it just preserves the API we already have in 1.4; right now { "spec": {} } and { "spec": { "ports": [] } } both generate the same NetworkPolicy. Changing things would mean that future releases would interpret those objects differently.

@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 2, 2016
@k8s-github-robot
Copy link

@danwinship PR needs rebase

@danwinship
Copy link
Contributor Author

@eparis points out on IRC that our currently documented distinction between not-present vs present-but-empty matches the definition of LabelSelectors (which may be why we specified it that way?). So we could also do it in a more LabelSelector-like way, having an optional struct containing a slice, instead of an optional slice, so we'd have something like:

# all ports
spec:
  ingress:
  - {}

# some ports
spec:
  ingress:
  - allowPorts:
      ports: [80, 443]

# no ports
spec:
  ingress:
  - allowPorts:
      ports: []

# also no ports
spec:
  ingress:
  - allowPorts: {}

@thockin
Copy link
Member

thockin commented Dec 2, 2016 via email

@smarterclayton
Copy link
Contributor

smarterclayton commented Dec 2, 2016 via email

@smarterclayton
Copy link
Contributor

smarterclayton commented Dec 2, 2016 via email

@thockin
Copy link
Member

thockin commented Dec 2, 2016 via email

@smarterclayton
Copy link
Contributor

smarterclayton commented Dec 2, 2016 via email

@danwinship
Copy link
Contributor Author

I am running a quick test to see what happens if semantic deepEqual stops conflating nil with []. Only a handful of unit tests fail

make sure to include #37823 in that testing

k8s-github-robot pushed a commit that referenced this pull request Dec 3, 2016
Automatic merge from submit-queue (batch tested with PRs 38049, 37823, 38000, 36646)

Test 0-length-arrays in fuzzing tests

While hacking on #37289 I noticed that our fuzzing tests test nil slices and slices of length 1, but not slices of length 0, meaning we aren't testing that 0-length slices get treated the same as nil in all the places we expect them to (and in particular, we aren't ensuring that comparisons always use api.Semantic.DeepEqual rather than reflect.DeepEqual). (Though in fact, changing the fuzzer didn't turn up any bugs, so maybe this effectively gets tested somewhere else...)

`fuzz.New().NilChance(.5).NumElements(0, 1)` means we end up generating `nil` 50% of the time, a length 0 array 25% of the time, and a length 1 array 25% of the time... maybe it should be `fuzz.New().NilChance(.33).NumElements(0, 1)` instead?

The gofuzz rebase is to pull in google/gofuzz#20, and the other fix is just a drive-by.
@danwinship
Copy link
Contributor Author

I'm not against a struct-with-slice member. At least it is not "tricky". Do you want to prototype that to see if the API is really stinky?

See danwinship@de0c4fc. Though of course this would also require changes in third-party implementations of NetworkPolicy as well. And technically I guess we're not supposed to change the v1beta1 definition but would have to add a new API version, and converters, and the like?

@thockin
Copy link
Member

thockin commented Dec 7, 2016 via email

@caseydavenport
Copy link
Member

what do the implementations think? Is the semantic distinction worth making this change for?

As far as Calico is concerned, it would be easy to support the new semantic distinction. I'd still rather not have to support two code paths just to support a semantic change that AFAICT isn't actually useful and we haven't had anyone complain about.

My guess would be other implementors would feel similarly.

@danwinship
Copy link
Contributor Author

As discussed in the SIG, closing this in favor of #39164 "Move NetworkPolicy to v1"

@danwinship danwinship closed this Dec 22, 2016
@danwinship danwinship deleted the fix-network-policy branch May 30, 2017 20:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge DEPRECATED. Indicates that a PR should not merge. Label can only be manually applied/removed. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API kind/design Categorizes issue or PR as related to design. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants