
Abstract iptables in proxier.go #7032

Closed
wants to merge 3 commits into from

Conversation

BenTheElder
Member

Abstract opening/closing/deleting all portals to a ProxyManager
interface so we can later choose between iptables and pf.

See: #2170; in particular: #2170 (comment)
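A minimal sketch of the intended shape (the method names and signatures below are illustrative assumptions, not the actual API in the diff):

package proxy

import "net"

// ProxyManager is a sketch of the abstraction described above: proxier.go
// would open, close, and delete service portals only through this
// interface, so an iptables-backed or pf-backed implementation could be
// chosen later.
type ProxyManager interface {
	// OpenPortal sets up forwarding from the portal address to the proxy.
	OpenPortal(portalIP net.IP, portalPort int, proxyIP net.IP, proxyPort int, service string) error
	// ClosePortal tears down the forwarding for one portal.
	ClosePortal(portalIP net.IP, portalPort int, service string) error
	// DeletePortals removes any portals left over from earlier runs.
	DeletePortals() error
}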

@BenTheElder
Member Author

For previous conversation/PR see: #5961

(I cleaned up the commit history, moved it to a branch, and brought it up to date with master, so I've moved the PR here from the previous one.)

package proxy
Contributor

If there are no dependencies on the proxy package, should we consider moving this to a new portal package?

pkg/proxy/portalmanager

Then we could move this under

pkg/proxy/portalmanager/iptables/
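Roughly, that split might look like this (the path, package name, and fields below are only indicative):

// pkg/proxy/portalmanager/iptables/manager.go (hypothetical path)
package iptables

import "net"

// Manager would be the iptables-backed portal manager, isolated from
// pkg/proxy; a pf-backed sibling package could live next to it.
type Manager struct {
	listenIP net.IP
	hostIP   net.IP
}

// New returns a Manager that programs portals for the given addresses.
func New(listenIP, hostIP net.IP) *Manager {
	return &Manager{listenIP: listenIP, hostIP: hostIP}
}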

Member Author

There are some dependencies. There's the ServicePortName type from loadbalancer.go, and there's Proxier itself, which I think is just used for getting the listen and host IP addresses. It would be pretty trivial to move this into another package, but I'd need to know where to move ServicePortName to.

Member Author

Perhaps ServicePortName could go in /pkg/types?
I'm unsure of where it belongs if not in proxy.
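For context, the type in question is roughly of this shape (a sketch based on the discussion above, not the exact definition in loadbalancer.go):

package proxy

// ServicePortName identifies a single port of a service; both the load
// balancer and the portal code key off it, which is why it needs a home
// that both packages can import. Field names here are a sketch only.
type ServicePortName struct {
	Namespace string
	Name      string
	Port      string
}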

@brendandburns
Contributor

Generally LGTM, but this has to get run through e2e before we submit it.

@BenTheElder
Member Author

Any idea why Shippable just failed?
(I logged in with my GitHub account, but it's not loading the build information for me.)
I only added comment lines...

@ghost

ghost commented Apr 20, 2015

Shippable is currently unreliable as a build signal. But you definitely need to run the e2e tests successfully. Please paste the results of a successful e2e run here.

@BenTheElder
Member Author

It seems to run fine, but then after a bit it goes to start another cluster and fails. This is probably because I'm calling it via sudo; I needed to fix my Docker permissions. Will post back with another attempt later.

@BenTheElder
Member Author

I've been fiddling around with running it; vagrant up / starting the e2e cluster seems to have timeout errors. I'm not very familiar with Vagrant or the e2e tests, but here's a paste of my latest attempt:
sudo env "PATH=$PATH" "KUBERNETES_PROVIDER=vagrant" sh -c 'hack/e2e-test.sh > /e2eout2.txt 2>&1' results in:
https://gist.github.com/BenTheElder/f8eb6a503175916e68a8

@BenTheElder
Member Author

Going to do some more debugging later, but I seem to be having issues with cluster/kube-up.sh.
It gives me the following at the end:

Waiting for each minion to be registered with cloud provider
current-context: "vagrant"
Running: ./cluster/../cluster/vagrant/../../cluster/../cluster/vagrant/../../_output/dockerized/bin/linux/amd64/kubectl --kubeconfig=/root/.kubernetes_vagrant_kubeconfig get minions -o template -t {{range.items}}{{.id}}:{{end}}
F0421 15:45:20.871726   14884 get.go:67] Client error processing command: Get https://10.245.1.2:443/api/v1beta1/minions: dial tcp 10.245.1.2:443: i/o timeout

Since this fails, once I start the e2e tests, it seems to go to start up another cluster partway through and hits this error, stopping the test. I'll try to debug it later; I'm not sure what exactly is up...

@BenTheElder
Member Author

Got a little further. I killed all of the VirtualBox VMs and started over.
From the kubernetes dir:

su
cd <kubernetes dir>
export KUBERNETES_PROVIDER=vagrant
cluster/kube-up
hack/e2e-test.sh > ~/e2eout.txt 2>&1

http://paste.fedoraproject.org/214087/54093142/

errors with:

Running: /home/bentheelder/code/kubernetes/src/github.com/GoogleCloudPlatform/kubernetes/cluster/../cluster/vagrant/../../cluster/vagrant/../../_output/dockerized/bin/linux/amd64/kubectl --kubeconfig=/root/.kubernetes_vagrant_kubeconfig --match-server-version get pods --template={{range.items}}{{.currentState.status}} {{end}}
/home/bentheelder/code/kubernetes/src/github.com/GoogleCloudPlatform/kubernetes/hack/e2e-internal/e2e-status.sh: line 64: statuses[@]: unbound variable
2015/04/21 18:01:30 Error running get status: exit status 1
2015/04/21 18:01:30 Testing requested, but e2e cluster not up!
exit status 1

@BenTheElder
Member Author

Had a little spare time, got Vagrant set up, and ran the tests again on my MacBook; got the same results and eventual error:

 Vagrant doesn't need special preparations for e2e tests
F0424 21:48:36.084088   12114 command.go:442] open /Users/benjamin/.kubernetes_vagrant_kubeconfig: no such file or directory
current-context: ""
Running: /Users/benjamin/code/kubernetes/src/github.com/GoogleCloudPlatform/kubernetes/cluster/../cluster/vagrant/../../cluster/vagrant/../../_output/dockerized/bin/darwin/amd64/kubectl --kubeconfig=/Users/benjamin/.kubernetes_vagrant_kubeconfig --match-server-version version
Error: stat /Users/benjamin/.kubernetes_vagrant_kubeconfig: no such file or directory
2015/04/24 21:48:36 Error running get status: exit status 1
2015/04/24 21:48:36 Running: up
Vagrant doesn't need special preparations for e2e tests
Starting cluster using provider: vagrant
... calling verify-prereqs
... calling kube-up
Using credentials: vagrant:vagrant
Bringing machine 'master' up with 'virtualbox' provider...
Bringing machine 'minion-1' up with 'virtualbox' provider...
==> master: VirtualBox VM is already running.
==> minion-1: VirtualBox VM is already running.
2015/04/24 21:48:43 Error running up: exit status 1
2015/04/24 21:48:43 Error starting e2e cluster. Aborting.
exit status 1

The error seems to be unrelated to the code, though; it looks like it's bugging out trying to set up the Vagrant cluster even though the cluster is already up.

@derekwaynecarr
Member

Right now e2e does not tear down a cluster when it's done, so it's not re-entrant


@BenTheElder
Member Author

@derekwaynecarr so what is the correct way to run the tests?
Are they just broken on Vagrant currently, or is there a workaround?
If not I'd be willing to try to fix it after my finals.

@BenTheElder
Member Author

Success! I'll be debugging exactly what was needed to get it working in a moment (it looks like I just needed to reinstall Vagrant as well as set NUM_MINIONS=2 prior to running the tests to get them working on Vagrant).

I've gotten it running on my laptop, first on 86751e8 and then on BenTheElder@064dbf9.

Full results on BenTheElder@064dbf9: https://gist.github.com/BenTheElder/e64f8805012d367759ce
Short version:

Summarizing 4 Failures:

[Fail] Services [It] should correctly serve identically named services in different namespaces on different external IP addresses 
/go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/test/e2e/service.go:389

[Fail] Density [It] [Performance suite] should allow starting 30 pods per node 
/go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/test/e2e/density.go:200

[Fail] Events [It] should be sent by kubelets and the scheduler about pods scheduling and running 
/go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/test/e2e/events.go:122

[Fail] Services [It] should be able to create a functioning external load balancer 
/go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/test/e2e/service.go:293

Ran 34 of 39 Specs in 1033.850 seconds
FAIL! -- 30 Passed | 4 Failed | 1 Pending | 4 Skipped F0503 21:25:02.427307    3838 driver.go:94] At least one test failed

Probably not necessary, but:
Full results on 86751e8: https://gist.github.com/BenTheElder/d570873ed9a5e2990d81
Short Version:

Summarizing 8 Failures:

[Fail] Density [It] [Performance suite] should allow starting 30 pods per node 
/go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/test/e2e/density.go:136

[Fail] Pods [It] should contain environment variables for services 
/go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/test/e2e/util.go:374

[Timeout...] Services [It] should serve a basic endpoint from pods 
/go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/test/e2e/service.go:256

[Fail] ReplicationController [It] should serve a basic image on each replica with a public image 
/go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/test/e2e/rc.go:137

[Fail] Density [It] [Performance suite] should allow starting 3 pods per node 
/go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/test/e2e/density.go:136

[Fail] kubectl update-demo [It] should scale a replication controller 
/go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/test/e2e/util.go:304

[Fail] Pods [It] should be restarted with a docker exec "cat /tmp/health" liveness probe 
/go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/test/e2e/pods.go:54

[Fail] kubectl update-demo [It] should create and stop a replication controller 
/go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/test/e2e/util.go:304

Ran 37 of 42 Specs in 2198.465 seconds
FAIL! -- 29 Passed | 8 Failed | 1 Pending | 4 Skipped F0503 19:58:03.207184    5230 driver.go:94] At least one test failed

@derekwaynecarr
Member

I have not had an opportunity to review your PR in detail, but feel free to open an issue if you see those failures on a vanilla Vagrant cluster and I will look to try and get them resolved.

@BenTheElder
Member Author

@derekwaynecarr I'll go do that.

Edit: I'm running hack/e2e-test.sh on c1d1f95 to ensure that those errors persist first.

Abstract opening/closing/deleting all portals to a ProxyManager
interface so we can later choose between iptables and pf.

See: #2170; in particular: #2170 (comment)
@BenTheElder
Member Author

After rebasing this up to 1b7749b I have:

Summarizing 1 Failure:

[Fail] Events [It] should be sent by kubelets and the scheduler about pods scheduling and running 
/go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/test/e2e/events.go:122

Ran 38 of 43 Specs in 973.040 seconds
FAIL! -- 37 Passed | 1 Failed | 1 Pending | 4 Skipped F0505 21:12:09.015275   15348 driver.go:94] At least one test failed

I'm not sure why this is failing, but Vagrant as a provider is working more smoothly on Fedora at least, and I'm caught up to the latest changes to proxy.

@BenTheElder
Member Author

I've been working on getting Vagrant to run more smoothly, and I think I've gotten satisfactory results with e2e on Vagrant. After switching to my desktop and deleting all the old VirtualBox VMs, hack/e2e-test.sh gives:

Summarizing 1 Failure:

[Fail] Density [It] [Performance suite] should allow starting 30 pods per node 
/go/src/github.com/GoogleCloudPlatform/kubernetes/_output/dockerized/go/src/github.com/GoogleCloudPlatform/kubernetes/test/e2e/density.go:135

Ran 38 of 43 Specs in 698.795 seconds
FAIL! -- 37 Passed | 1 Failed | 1 Pending | 4 Skipped F0506 14:39:27.966103   12674 driver.go:94] At least one test failed

It looks like these are just performance issues related to running under VirtualBox; no problems with the code here.

See #7719 for discussion about vagrant and e2e.

Create a ProxierIPs struct for embedding in Proxier to contain the
listenIP and hostIP fields for passing to PortalManager.
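A rough sketch of what that commit message describes (the field names come from the message; everything else is an assumption):

package proxy

import "net"

// ProxierIPs groups the listen and host addresses that the PortalManager
// needs, so Proxier can embed it and hand the pair over in one piece.
type ProxierIPs struct {
	listenIP net.IP
	hostIP   net.IP
}

// Proxier would then embed it (other fields elided in this sketch).
type Proxier struct {
	ProxierIPs
}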
@BenTheElder
Member Author

Woohoo! Just finished running e2e on my mac:

Ran 38 of 43 Specs in 773.538 seconds
SUCCESS! -- 38 Passed | 0 Failed | 1 Pending | 4 Skipped I0508 14:00:05.678822   30482 driver.go:96] All tests pass

As a side note: forcing Vagrant to make new VirtualBox VMs for each run seems to do the trick. I'll see if I can find out why that isn't already the case and get that fixed separately.

@BenTheElder
Member Author

@derekwaynecarr I was actually just writing an issue for that.
The second problem is that when I call kube-down manually, it misses the second minion.

@derekwaynecarr
Member

Is NUM_MINIONS not exported? I think e2e may manually be changing the value, and if they are, they need to pass that value back into the invocation of the bash shell.

@BenTheElder
Member Author

I'm not sure. The change I wrote previously sets NUM_MINIONS in config-test.sh prior to sourcing config-default.sh, which in turn exports it in theory; however, if I do echo $NUM_MINIONS I get nothing.

@roberthbailey
Contributor

How are you deleting your e2e cluster? If you run go run hack/e2e.go -v --down then it should pick up the variables from config-test.sh for your provider (see https://github.com/GoogleCloudPlatform/kubernetes/blob/master/docs/devel/development.md#flag-options).

@BenTheElder
Member Author

@roberthbailey I am doing just that, but the vagrant scripts don't do anything when instructed to run test teardown (they echo "Vagrant ignores tear-down"). See #7978, which I've just opened.

Edit: I can delete them manually through the VirtualBox graphical client (my current quick fix), but e2e's --down should be doing this.

@roberthbailey
Contributor

I just saw that one. Gotta love an "echo TODO". :)

@BenTheElder
Member Author

Haha, yeah. I didn't think anything of it at first since I'm new, but I checked the AWS scripts today and they call kube-down.

I'm probably the only one using vagrant to run tests anyhow.
It's handy to be able to test from anywhere on my laptop though, so I'll see if I can get those TODOs sorted.

@roberthbailey
Contributor

I'm probably the only one using vagrant to run tests anyhow.

I think that a couple of Red Hat folks are running e2es on Vagrant as well. For example, see #6817 from @timothysc.

@BenTheElder
Member Author

Good to know. Fix has been merged.

@thockin
Member

thockin commented May 8, 2015

I'm still sort of backlogged, but I will be getting to this soon.


@erictune added this to the v1.0-post milestone on Jun 1, 2015
@BenTheElder
Member Author

This PR is pretty broken at this point considering how many changes have been made to the proxy package since then.

Also, given that rkt and Docker seem to have no plans to run natively on OS X, BSD, etc. anytime soon, we probably won't have any use for this for quite some time, if ever. Boot2docker + native clients for OS X and Windows seem to be staying the accepted solution.

If there is demand for this, I can revisit it later.

@BenTheElder
Member Author

Docker at least may be on FreeBSD at some point soon (https://svnweb.freebsd.org/ports?view=revision&revision=391421) but I have no idea how much of the rest of the stack would need porting.

However, if anyone is interested in supporting PF, ping me, and after #9210 goes through I'd be happy to tinker on porting kube-proxy to PF.

@ibotty

ibotty commented Aug 13, 2015

It would be great to make Kubernetes play well with firewalld as well, so that's another thing to consider.
