Add east-west and IPv4 load balancing #109

Merged: tgraf merged 5 commits into master from lb-updates on Sep 13, 2016

3 participants

tgraf (Member) commented Aug 31, 2016

Reworks the load balancing code and makes most of the code available for reuse.

Adds the following capabilities:

  • Balancing between containers on the same node
  • Balancing between containers on different nodes
  • Balancing from and to arbitrary IPs
  • IPv4 support (limited)

IPv4 limitations:

  • This does not include any work to carry the reverse NAT index in the packet from node to node. Reverse translation with IPv4 currently only works on the same node.

Example config, L3 load balancer based on L4 hash:

SVC_IP6="f00d::1:1"
LB_PORT=0
REVNAT=222
sudo cilium lb create-services-map
sudo cilium lb update-service $SVC_IP6 $LB_PORT 0 2 $REVNAT :: $LB_PORT
sudo cilium lb update-service $SVC_IP6 $LB_PORT 1 2 $REVNAT $SERVER1_IP $LB_PORT
sudo cilium lb update-service $SVC_IP6 $LB_PORT 2 2 $REVNAT $SERVER2_IP $LB_PORT
sudo cilium lb create-rev-nat-map
sudo cilium lb update-rev-nat $REVNAT $SVC_IP6 $LB_PORT
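The example above can be read as populating two maps. The following Go sketch models that reading under stated assumptions: the third argument of `update-service` is taken to be the slave index (0 being the master entry, which carries the slave count), the backend is picked as `hash % count + 1`, and SERVER1_IP/SERVER2_IP are replaced by the placeholders `f00d::2` and `f00d::3`. All type and function names are hypothetical; this is not Cilium's datapath code.

```go
package main

import (
	"fmt"
	"net"
)

// svcKey mirrors the arguments of "cilium lb update-service": service IP,
// port and (assumed) slave index, 0 being the master. Hypothetical names.
type svcKey struct {
	vip   string
	port  uint16
	slave uint16
}

// svcValue is what this sketch assumes each services-map entry holds.
type svcValue struct {
	target net.IP // backend address; :: on the master entry
	port   uint16
	count  uint16 // number of slaves
	revNat uint16 // reverse NAT index shared by all entries of the service
}

// revNatValue is what "update-rev-nat" is assumed to store: the service
// address to restore as the source on replies.
type revNatValue struct {
	vip  string
	port uint16
}

type lbMaps struct {
	services map[svcKey]svcValue
	revNat   map[uint16]revNatValue
}

// exampleConfig reproduces the commands above with placeholder backends.
func exampleConfig() *lbMaps {
	const vip, port, revNat = "f00d::1:1", 0, 222
	m := &lbMaps{services: map[svcKey]svcValue{}, revNat: map[uint16]revNatValue{}}
	backends := []string{"::", "f00d::2", "f00d::3"} // slot 0 is the master
	for slave, addr := range backends {
		m.services[svcKey{vip, port, uint16(slave)}] = svcValue{
			target: net.ParseIP(addr), port: port, count: 2, revNat: revNat,
		}
	}
	m.revNat[revNat] = revNatValue{vip, port}
	return m
}

// selectBackend does two lookups: the master (slave 0) for the slave
// count, then the chosen slave. hash%count+1 is an assumption.
func (m *lbMaps) selectBackend(vip string, port uint16, l4hash uint32) (svcValue, bool) {
	master, ok := m.services[svcKey{vip, port, 0}]
	if !ok || master.count == 0 {
		return svcValue{}, false // the datapath would return DROP_NO_SERVICE
	}
	slave := uint16(l4hash%uint32(master.count)) + 1
	backend, ok := m.services[svcKey{vip, port, slave}]
	return backend, ok
}

func main() {
	m := exampleConfig()
	for _, h := range []uint32{0, 1, 2, 3} {
		b, _ := m.selectBackend("f00d::1:1", 0, h)
		fmt.Printf("hash %d -> %s (revnat %d -> %v)\n", h, b.target, b.revNat, m.revNat[b.revNat])
	}
}
```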
ArgsUsage: "<map file> <maptype> <ipv6 addr> <dport> <state> <count> [<lxc-id> <lxc-port> <node-id> ...]",
Action: lbUpdateKey,
ArgsUsage: "<ipv6 addr> <port> <state> <count> <lxc-id> <node-id>",
Action: lbUpdateService,

mchalla (Contributor) commented Sep 1, 2016

This help needs to be updated.


if (!(svc = lb6_lookup_slave(skb, &key, slave)))
	return DROP_NO_SERVICE;

mchalla (Contributor) commented Sep 1, 2016

So we always need to do this additional lookup. It's definitely cleaner and simpler to use, and I cannot think of a way to avoid it with the new scheme.

tgraf (Author, Member) commented Sep 1, 2016

This is another good example of where masked lookups make sense: if key1 does not match, apply a mask and try again. The same goes for conntrack.
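A masked lookup of this kind can be sketched as follows. This is a hypothetical Go illustration, not the BPF implementation: the exact (address, port) key is tried first, and on a miss the port is masked to 0 before retrying. Treating port 0 as "any port" is an assumption, loosely based on `LB_PORT=0` in the example config.

```go
package main

import "fmt"

// lbKey is a simplified service key: destination address and port.
type lbKey struct {
	addr string
	port uint16
}

// maskedLookup tries the exact key first. On a miss it applies a mask
// (here: clears the port, so port 0 acts as a wildcard) and tries again.
func maskedLookup(m map[lbKey]string, k lbKey) (string, bool) {
	if v, ok := m[k]; ok {
		return v, true
	}
	k.port = 0 // k is a copy, so the caller's key is unchanged
	v, ok := m[k]
	return v, ok
}

func main() {
	services := map[lbKey]string{
		{"f00d::1:1", 0}:  "any-port service",
		{"f00d::1:1", 80}: "port-80 service",
	}
	for _, p := range []uint16{80, 443} {
		v, ok := maskedLookup(services, lbKey{"f00d::1:1", p})
		fmt.Println(p, v, ok)
	}
}
```

The cost is a second lookup only on a miss, which is the same trade-off the review discusses for the master/slave scheme.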

cilium_trace(skb, DBG_GENERIC, state, ret);
/* Service redirection must occur before conntrack to ensure that the
* conntrack entry is created for the translated final endpoint. */
if (!skip_service && (svc = lb6_lookup_service(skb, &key)) != NULL) {

mchalla (Contributor) commented Sep 1, 2016

Isn't this happening after conntrack? Or am I missing something?

tgraf (Author, Member) commented Sep 1, 2016

Good catch. It used to happen before conntrack, so the comment became outdated. I think it makes sense to eventually do it before conntrack again, which would make the actual backend service visible in the CT table at the source. The reverse translation becomes slightly more complex, though.
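To make the ordering question concrete, here is a minimal Go sketch (all names hypothetical) of which tuple the source node would record in its CT table, depending on whether service translation runs before or after conntrack. With translation first, the entry names the real backend; the flip side, as noted above, is that replies from that backend must still be mapped back to the service address, which is where the reverse translation gets more involved.

```go
package main

import "fmt"

// tuple is a simplified connection key as it might appear in a CT table.
type tuple struct {
	saddr, daddr string
	dport        uint16
}

// lbTranslate stands in for service translation: it rewrites a service
// VIP to a backend using a hypothetical VIP -> backend table.
func lbTranslate(t tuple, vipToBackend map[string]string) tuple {
	if b, ok := vipToBackend[t.daddr]; ok {
		t.daddr = b
	}
	return t
}

// ctKeyAtSource returns the tuple under which the source would create its
// CT entry, depending on whether translation runs before conntrack.
func ctKeyAtSource(pkt tuple, vipToBackend map[string]string, lbBeforeCT bool) tuple {
	if lbBeforeCT {
		return lbTranslate(pkt, vipToBackend) // CT sees the real backend
	}
	return pkt // CT sees the VIP; translation happens afterwards
}

func main() {
	vips := map[string]string{"f00d::1:1": "f00d::2"}
	pkt := tuple{saddr: "f00d::a", daddr: "f00d::1:1", dport: 80}
	fmt.Println("LB after CT: ", ctKeyAtSource(pkt, vips, false))
	fmt.Println("LB before CT:", ctKeyAtSource(pkt, vips, true))
}
```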

mchalla (Contributor) commented Sep 1, 2016

In general it looks much cleaner and simpler to use. I have a few comments inline. Thanks.

fmt.Fprintf(os.Stderr, "%s\n", err)
printArgsUsageAndExit(ctx)
return
}
lbval.lxcCount = int32(count)
lbval.count = uint16(count)

mchalla (Contributor) commented Sep 1, 2016

I suppose the count is only needed for the master and not for the slaves. So when a slave is added or deleted, shouldn't we update the count on the master entry only, since only the master changes?

tgraf (Author, Member) commented Sep 1, 2016

Right. We need to figure out the automation part of this anyway.
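One way to keep count updates confined to the master entry is sketched below in Go, assuming (hypothetically) that slots 1..count are kept dense. Adding a slave writes a new slot and bumps the master's count; removing one moves the last slave into the freed slot so that a `hash % count + 1` selection never lands on an empty slot. That move does touch one slave entry, so this is a design trade-off rather than a strictly master-only update.

```go
package main

import "fmt"

// entry is one services-map slot. Slot 0 is the master; slots 1..count
// are slaves. Only the master's count field is meaningful.
type entry struct {
	backend string
	count   uint16
}

type service struct {
	slots map[uint16]entry
}

func newService() *service {
	return &service{slots: map[uint16]entry{0: {}}}
}

// addSlave writes the new slave into slot count+1, then bumps the master's
// count. No existing slave entry is touched.
func (s *service) addSlave(backend string) {
	master := s.slots[0]
	master.count++
	s.slots[master.count] = entry{backend: backend}
	s.slots[0] = master
}

// removeSlave frees slot i by moving the last slave into it, keeping
// slots 1..count dense, then shrinks the master's count.
func (s *service) removeSlave(i uint16) bool {
	master := s.slots[0]
	if i == 0 || i > master.count {
		return false
	}
	last := master.count
	s.slots[i] = s.slots[last] // move the last slave into the freed slot
	master.count--
	s.slots[0] = master // shrink the range before deleting the old slot
	delete(s.slots, last)
	return true
}

func main() {
	s := newService()
	for _, b := range []string{"f00d::2", "f00d::3", "f00d::4"} {
		s.addSlave(b)
	}
	s.removeSlave(1)
	fmt.Println(s.slots[0].count, s.slots[1].backend, s.slots[2].backend)
}
```

If the map were read concurrently by the datapath, the write order would matter: a new slot should exist before the count covers it, and the count should shrink before a slot is deleted, which is the order used above.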

p4 |= (state << 16);
addr->p4 = htonl(p4);
}

mchalla (Contributor) commented Sep 1, 2016

We still need these APIs for VXLAN mode, where the state now needs to be derived from the IPv6 target address, copied to the tunnel metadata, and zeroed in the address.

tgraf (Author, Member) commented Sep 1, 2016

I removed them because the byte order conversion should not happen in the fast path at all. When we do the encap part, let's add it back properly with the minimal number of instructions required.

mchalla (Contributor) commented Sep 1, 2016

Sounds good.
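The point about keeping byte order conversion out of the fast path can be illustrated with a small Go sketch (function names are hypothetical). Converting to network byte order only permutes bytes, so OR-ing `state << 16` into the host-order value and converting back gives the same result as OR-ing a mask that was converted to network order once, ahead of time. The same layout lets the VXLAN path read the state from the top 16 bits of `p4` and zero it, as described above. The sketch works on the 4 wire bytes so it behaves the same on little- and big-endian hosts.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// setStateSlow mirrors the removed helper: convert p4 to host order,
// OR in the state, convert back. Two conversions per packet.
func setStateSlow(p4 [4]byte, state uint16) [4]byte {
	v := binary.BigEndian.Uint32(p4[:])
	v |= uint32(state) << 16
	binary.BigEndian.PutUint32(p4[:], v)
	return p4
}

// stateMask converts state<<16 to network byte order once, outside the
// fast path.
func stateMask(state uint16) [4]byte {
	var m [4]byte
	binary.BigEndian.PutUint32(m[:], uint32(state)<<16)
	return m
}

// setStateFast ORs the precomputed mask into the wire bytes directly.
func setStateFast(p4, mask [4]byte) [4]byte {
	for i := range p4 {
		p4[i] |= mask[i]
	}
	return p4
}

// takeState reads the state from the top 16 bits (the first two wire
// bytes) and zeroes them, as described for the VXLAN path.
func takeState(p4 [4]byte) (uint16, [4]byte) {
	state := binary.BigEndian.Uint16(p4[:2])
	p4[0], p4[1] = 0, 0
	return state, p4
}

func main() {
	p4 := [4]byte{0x00, 0x00, 0x12, 0x34}
	fmt.Printf("% x\n", setStateSlow(p4, 7))
	fmt.Printf("% x\n", setStateFast(p4, stateMask(7)))
}
```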

tgraf force-pushed the lb-updates branch 7 times, most recently from 21c7236 to 79712a7 on Sep 1, 2016

tgraf changed the title from "Add east-west load balancing [WIP]" to "Add east-west and IPv4 load balancing [WIP]" on Sep 7, 2016

tgraf force-pushed the lb-updates branch from 79712a7 to 4cdf770 on Sep 7, 2016

* IP and optional destination port for every IPv4 and
* IPv6 packet recevied. If a matching entry is found, the
* destination address will be written to one of the
* configures slaves. Optionall the destination port can be

aanm (Member) commented Sep 7, 2016

s/Optionall/Optionally


func servicesMap() *bpf.Map {
if ipv4 {
return bpf.NewMap("/sys/fs/bpf/tc/globals/cilium_lb4_services",

aanm (Member) commented Sep 7, 2016

"/sys/fs/bpf/tc/globals/cilium_lb4_services" -> common.BPFCiliumMaps + "/cilium_lb4_services"

int(unsafe.Sizeof(LB4Service{})),
mapSize)
} else {
return bpf.NewMap("/sys/fs/bpf/tc/globals/cilium_lb6_services",

aanm (Member) commented Sep 7, 2016

"/sys/fs/bpf/tc/globals/cilium_lb6_services" -> common.BPFCiliumMaps + "/cilium_lb6_services"


func stateMap() *bpf.Map {
if ipv4 {
return bpf.NewMap("/sys/fs/bpf/tc/globals/cilium_lb4_state",

aanm (Member) commented Sep 7, 2016

"/sys/fs/bpf/tc/globals/cilium_lb4_state" -> common.BPFCiliumMaps + "/cilium_lb4_state"

int(unsafe.Sizeof(LB4State{})),
mapSize)
} else {
return bpf.NewMap("/sys/fs/bpf/tc/globals/cilium_lb6_state",

aanm (Member) commented Sep 7, 2016

"/sys/fs/bpf/tc/globals/cilium_lb6_state" -> common.BPFCiliumMaps + "/cilium_lb6_state"

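The suggested change amounts to deriving all four paths from one base directory. A minimal Go sketch follows; `bpfCiliumMaps` is a local stand-in for `common.BPFCiliumMaps`, and its value is assumed from the hard-coded paths in this diff rather than taken from the Cilium source. `lbMapPath` is a hypothetical helper.

```go
package main

import "fmt"

// bpfCiliumMaps stands in for common.BPFCiliumMaps; the value is assumed
// from the paths hard-coded in this diff.
const bpfCiliumMaps = "/sys/fs/bpf/tc/globals"

// lbMapPath builds the pinned path of a load balancer map of the given
// kind (e.g. "services" or "state") for the selected address family.
func lbMapPath(ipv4 bool, kind string) string {
	family := "lb6"
	if ipv4 {
		family = "lb4"
	}
	return bpfCiliumMaps + "/cilium_" + family + "_" + kind
}

func main() {
	for _, kind := range []string{"services", "state"} {
		fmt.Println(lbMapPath(true, kind))
		fmt.Println(lbMapPath(false, kind))
	}
}
```

`servicesMap()` and `stateMap()` could then pass such a path as the first argument to `bpf.NewMap`, leaving the remaining arguments as they are.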
tgraf force-pushed the lb-updates branch 3 times, most recently from 6c8f0df to 7ff4a5c on Sep 8, 2016

"github.com/cilium/cilium/common/addressing"
"github.com/cilium/cilium/common/bpf"
"github.com/cilium/cilium/common/types"
"github.com/op/go-logging"

aanm (Member) commented Sep 9, 2016

"github.com/op/go-logging" is one line above where it should be.

aanm (Member) commented Sep 13, 2016

Bump

tgraf force-pushed the lb-updates branch 3 times, most recently from e86be57 to 7c5c563 on Sep 11, 2016

tgraf added the pending-review label and removed the wip label on Sep 13, 2016

tgraf force-pushed the lb-updates branch from 7c5c563 to 917ba3e on Sep 13, 2016

tgraf changed the title from "Add east-west and IPv4 load balancing [WIP]" to "Add east-west and IPv4 load balancing" on Sep 13, 2016

@@ -70,7 +70,7 @@ function write_footer() {
cat <<EOF >> "$filename"
sleep 2s
sed -i '/exec/d' /etc/init/cilium-net-daemon.conf
echo 'exec cilium -D daemon run -n ${ipv6_addr} ${ipv4_options}-t vxlan -c "${NODE_IP_BASE}${FIRST_IP_SUFFIX}:8500"' >> /etc/init/cilium-net-daemon.conf
echo 'exec cilium -D daemon run -n ${ipv6_addr} ${ipv4_options}-t vxlan --ipv4 -c "${NODE_IP_BASE}${FIRST_IP_SUFFIX}:8500"' >> /etc/init/cilium-net-daemon.conf

aanm (Member) commented Sep 13, 2016

Why not set IPV4=1 at the top of the file? That option is already checked at lines 66-68.

tgraf (Author, Member) commented Sep 13, 2016

Good point, changing this.

aanm (Member) commented Sep 13, 2016

Aside from my comments, LGTM. Waiting for @mchalla.

tgraf added some commits Sep 13, 2016

vagrant: Enable IPv4 by default
Required for loadbalancer tests. Intermediate solution until we
can enable/disable IPv4 at runtime

Acked-by: André Martins <andre@cilium.io>
Signed-off-by: Thomas Graf <thomas@cilium.io>

map_ctrl: Make tool generic so it can work with any map
Acked-by: André Martins <andre@cilium.io>
Signed-off-by: Thomas Graf <thomas@cilium.io>

bpf: Add east-west and IPv4 load balancing
Reworks the load balancing code and makes most of the code available
for reuse.

Adds the following capabilities:
 * Balancing between containers on the same node
 * Balancing between containers on different nodes
 * Balancing from and to arbitrary IPs
 * IPv4 support (limited)

IPv4 limitations:
This does not include any work to carry the reverse NAT index in the
packet from node to node. Reverse translation with IPv4 currently only
works on the same node.

Example config, L3 load balancer based on L4 hash:
SVC_IP6="f00d::1:1"
LB_PORT=0
REVNAT=222
sudo cilium lb create-services-map
sudo cilium lb update-service $SVC_IP6 $LB_PORT 0 2 $REVNAT :: $LB_PORT
sudo cilium lb update-service $SVC_IP6 $LB_PORT 1 2 $REVNAT $SERVER1_IP $LB_PORT
sudo cilium lb update-service $SVC_IP6 $LB_PORT 2 2 $REVNAT $SERVER2_IP $LB_PORT
sudo cilium lb create-rev-nat-map
sudo cilium lb update-rev-nat $REVNAT $SVC_IP6 $LB_PORT

Signed-off-by: Thomas Graf <thomas@cilium.io>

bpf: Do not convert sec_label byteorder in debug message
Signed-off-by: Thomas Graf <thomas@cilium.io>

bpf: Add policy handler to bpf_overlay
When using overlay mode, loadbalancer punting to stack requires a
policy program to be available even in encap mode.

Signed-off-by: Thomas Graf <thomas@cilium.io>

tgraf force-pushed the lb-updates branch from 917ba3e to fa549ee on Sep 13, 2016

tgraf (Author, Member) commented Sep 13, 2016

I will merge this since it is well tested by now.

tgraf added the acked label and removed the pending-review label on Sep 13, 2016

tgraf merged commit 657cf15 into master on Sep 13, 2016

2 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
continuous-integration/travis-ci/push The Travis CI build passed

tgraf deleted the lb-updates branch on Sep 13, 2016
