UDP load balancing #19

Open
silviud opened this issue Oct 15, 2015 · 9 comments

Comments

@silviud

silviud commented Oct 15, 2015

The UDP load balancer algorithm doesn't account for dead servers.
Example:

./pen -fU 8080 127.0.0.1:8001 127.0.0.1:8002

If servers are up on ports 8001 and 8002, traffic is forwarded to both. However, if the server on port 8001 is down, pen neither detects it nor stops forwarding traffic to it.

Are there any plans to add this kind of detection?

thanks!

-silviu

@UlricE
Owner

UlricE commented Oct 16, 2015

Pen is blind to what happens to UDP traffic after it is forwarded. If there is a way to detect that a back-end is nonresponsive (e.g. a DNS server that doesn't reply), you can use a script to monitor the back-ends. Here's an old example for HTTP which can trivially be updated for other protocols:

http://siag.nu/hypermail/pen/0038.html
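
A minimal sketch of what such a monitoring script could look like for DNS back-ends. It assumes `dig` is installed and that Pen's control port is at `localhost:9000`, as in the command lines elsewhere in this thread. The function names, the probe name `example.com`, and the 30-second blacklist time are illustrative choices, not part of Pen. Only the `penctl HOST:PORT server N blacklist SECONDS` form is taken from this thread.

```shell
#!/bin/sh
# Hypothetical health check for UDP DNS back-ends behind Pen.
# Assumptions: dig is installed; Pen was started with -C localhost:9000.

CTL=${CTL:-localhost:9000}              # Pen control port (assumed)
PROBE_NAME=${PROBE_NAME:-example.com}   # name to resolve (assumed)
BLACKLIST_SECS=${BLACKLIST_SECS:-30}    # how long to blacklist a dead server

# backend_alive HOST PORT: succeed if the back-end answers one DNS query.
# dig exits non-zero when no reply arrives, so check the exit status first
# and then make sure the answer is not empty.
backend_alive() {
    answer=$(dig @"$1" -p "$2" +short +time=1 +tries=1 "$PROBE_NAME" 2>/dev/null) || return 1
    [ -n "$answer" ]
}

# check_backends HOST:PORT ...: probe each back-end in the order it was
# given to Pen (the first argument is server 0) and blacklist the ones
# that do not answer.
check_backends() {
    index=0
    for backend in "$@"; do
        host=${backend%:*}
        port=${backend##*:}
        if ! backend_alive "$host" "$port"; then
            penctl "$CTL" server "$index" blacklist "$BLACKLIST_SECS"
        fi
        index=$((index + 1))
    done
}

# Example (run periodically, e.g. from cron):
# check_backends 127.0.0.1:10000 127.0.0.1:10001 127.0.0.1:10002
```

The blacklist time should be at least as long as the interval between checks, otherwise a dead server comes back into rotation before the next probe.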

@silviud
Author

silviud commented Oct 16, 2015

Hi,

I tried the blacklist but had only partial success. This is what happened:

  1. Started pen
    ./pen -fU -a -dd 8080 127.0.0.1:10000 127.0.0.1:10001 127.0.0.1:10002 -C localhost:9000
  2. Connected a client, which received a response from server 1 (port 10001)
  3. Blacklisted server 1
    ./penctl localhost:9000 server 1 blacklist 30
  4. Connected the client again, but it got nowhere

2015-10-14 10:55:18: add_client: received 4 bytes from client
2015-10-14 10:55:18: Client 127.0.0.1 has index 0
2015-10-14 10:55:18: store_client returns 0
2015-10-14 10:55:18: incrementing connections_used to 1 for connection 0
2015-10-14 10:55:18: store_conn: conn = 0, downfd = 4, connections_used = 1
2015-10-14 10:55:18: store_conn returns 0
2015-10-14 10:55:18: match_acl_ipv4(0, 16777343)
2015-10-14 10:55:18: Will try previous server 1 for client 0
2015-10-14 10:55:18: Trying server 1 for connection 0 at time 1444834518
2015-10-14 10:55:18: Server 1 is blacklisted
2015-10-14 10:55:18: failover_server(0)
2015-10-14 10:55:18: Won't failover from abuse server
2015-10-14 10:55:18: decrementing connections_used to 0 for connection 0
2015-10-14 10:55:18: close_conn: Closing connection 0 to server -3; connections_used = 0
Read 0 from client, wrote 0 to server
Read 0 from server, wrote 0 to client
2015-10-14 10:55:18: No failover server found, giving up

The client does not work again until the blacklist window expires, at which point it reconnects to port 10001.

My expectation was that pen would fail over to a different back-end, since server 1 was blacklisted.
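
The expected behavior can be sketched as a selection rule: start from the client's previous server and move on to the next one that is not blacklisted. This only illustrates the expectation; it is not Pen's actual code, and `pick_server` and its arguments are hypothetical.

```shell
# pick_server PREFERRED NSERVERS [BLACKLISTED...]
# Prints the index of the first usable server, starting at PREFERRED and
# wrapping around. Returns 1 if every server is blacklisted.
pick_server() {
    preferred=$1
    nservers=$2
    shift 2                      # remaining arguments are blacklisted indexes
    tried=0
    while [ "$tried" -lt "$nservers" ]; do
        candidate=$(( (preferred + tried) % nservers ))
        usable=yes
        for bad in "$@"; do
            if [ "$bad" -eq "$candidate" ]; then
                usable=no
            fi
        done
        if [ "$usable" = yes ]; then
            echo "$candidate"
            return 0
        fi
        tried=$((tried + 1))
    done
    return 1
}

pick_server 1 3 1    # server 1 is blacklisted, so this prints 2
```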

@UlricE
Owner

UlricE commented Oct 17, 2015

That would be a reasonable expectation, I think. Let me try to reproduce the problem and see if it is a bug.

UlricE pushed a commit that referenced this issue Oct 29, 2015
@UlricE
Owner

UlricE commented Nov 2, 2015

The latest version in Git fixes this failover problem. Here's what I get:

First prepare three pens proxying dns requests to google (to get something to test against) and verify that they work:

ulric@debtest:~/Git/pen$ ./pen -U 127.0.0.1:10000 8.8.8.8:53
ulric@debtest:~/Git/pen$ ./pen -U 127.0.0.1:10001 8.8.8.8:53
ulric@debtest:~/Git/pen$ ./pen -U 127.0.0.1:10002 8.8.8.8:53
ulric@debtest:~/Git/pen$ dig @127.0.0.1 -p 10000 +short siag.nu
194.9.95.65
ulric@debtest:~/Git/pen$ dig @127.0.0.1 -p 10001 +short siag.nu
194.9.95.65
ulric@debtest:~/Git/pen$ dig @127.0.0.1 -p 10002 +short siag.nu
194.9.95.65

Then start Pen, same command line as you used above:

ulric@debtest:~/Git/pen$ ./pen -fU -a -dd 8080 127.0.0.1:10000 127.0.0.1:10001 127.0.0.1:10002 -C localhost:9000 > log 2>&1

And from another terminal, test failover:

ulric@debtest:~/Git/pen$ dig @127.0.0.1 -p 8080 +short siag.nu
194.9.95.65
ulric@debtest:~/Git/pen$ ./penctl localhost:9000 server 1 blacklist 30
ulric@debtest:~/Git/pen$ dig @127.0.0.1 -p 8080 +short siag.nu
194.9.95.65

So that looks good. The log says:

2015-11-02 10:05:30: add_client: received 36 bytes from client
2015-11-02 10:05:30: Resetting client stats for slot 0
2015-11-02 10:05:30: Client 127.0.0.1 has index 0
2015-11-02 10:05:30: store_client returns 0
2015-11-02 10:05:30: incrementing connections_used to 1 for connection 0
2015-11-02 10:05:30: expanding fd2conn to 10006 bytes
2015-11-02 10:05:30: store_conn: conn = 0, downfd = 6, connections_used = 1
2015-11-02 10:05:30: store_conn returns 0
2015-11-02 10:05:30: match_acl_ipv4(0, 16777343)
2015-11-02 10:05:30: Will try previous server -3 for client 0
2015-11-02 10:05:30: Trying server 1 for connection 0 at time 1446455130
2015-11-02 10:05:30: match_acl_ipv4(0, 16777343)
2015-11-02 10:05:30: socket returns 8, socket_errno=0
2015-11-02 10:05:30: Connecting to 127.0.0.1
2015-11-02 10:05:30: Family: AF_INET
2015-11-02 10:05:30: Port: 10001
2015-11-02 10:05:30: Address: 127.0.0.1
2015-11-02 10:05:30: connect (upfd = 8) returns 0, errno = 0, socket_errno = 0
2015-11-02 10:05:30: epoll_event_add(fd=8, events=65536)
2015-11-02 10:05:30: epoll_event_ctl(fd=8, events=65536, op=1)
2015-11-02 10:05:30: Successful connect to server 1
conns[0].client = 0
conns[0].server = 1
2015-11-02 10:05:30: Setting server 1 for client 0
2015-11-02 10:05:30: add_client: wrote 36 bytes to socket 8
2015-11-02 10:05:30: epoll_event_fd(revents=0x7fffce832c94)
2015-11-02 10:05:30: epoll_event_wait()
2015-11-02 10:05:30: epoll_wait returns 1
2015-11-02 10:05:30: After event_wait()
2015-11-02 10:05:30: epoll_event_fd(revents=0x7fffce832c94)
2015-11-02 10:05:30: event_fd returns fd=8, events=65536
2015-11-02 10:05:30: want to read from upstream socket 8 of connection 0
2015-11-02 10:05:30: copy_down: recv(8, 0x7fffce82ac30, 32768, 0) returns 52
2015-11-02 10:05:30: copy_down sending 52 bytes to socket 6
2015-11-02 10:05:30: epoll_event_delete(fd=8)
2015-11-02 10:05:30: decrementing connections_used to 0 for connection 0
2015-11-02 10:05:30: close_conn: Closing connection 0 to server 1; connections_used = 0
Read 0 from client, wrote 0 to server
Read 0 from server, wrote 0 to client
[...]
2015-11-02 10:05:37: do_cmd(server 1 blacklist 30
, 0x404470, 0x7fffce831c5c)
[...]
2015-11-02 10:05:42: add_client: received 36 bytes from client
2015-11-02 10:05:42: Client 127.0.0.1 has index 0
2015-11-02 10:05:42: store_client returns 0
2015-11-02 10:05:42: incrementing connections_used to 1 for connection 0
2015-11-02 10:05:42: store_conn: conn = 0, downfd = 6, connections_used = 1
2015-11-02 10:05:42: store_conn returns 0
2015-11-02 10:05:42: match_acl_ipv4(0, 16777343)
2015-11-02 10:05:42: Will try previous server 1 for client 0
2015-11-02 10:05:42: Trying server 1 for connection 0 at time 1446455142
2015-11-02 10:05:42: Server 1 is blacklisted
2015-11-02 10:05:42: failover_server(0): server = 1
2015-11-02 10:05:42: Intend to try server 2
2015-11-02 10:05:42: Trying server 2 for connection 0 at time 1446455142
2015-11-02 10:05:42: match_acl_ipv4(0, 16777343)
2015-11-02 10:05:42: socket returns 8, socket_errno=0
2015-11-02 10:05:42: Connecting to 127.0.0.1
2015-11-02 10:05:42: Family: AF_INET
2015-11-02 10:05:42: Port: 10002
2015-11-02 10:05:42: Address: 127.0.0.1
2015-11-02 10:05:42: connect (upfd = 8) returns 0, errno = 0, socket_errno = 0
2015-11-02 10:05:42: epoll_event_add(fd=8, events=65536)
2015-11-02 10:05:42: epoll_event_ctl(fd=8, events=65536, op=1)
2015-11-02 10:05:42: Successful connect to server 2
conns[0].client = 0
conns[0].server = 1
2015-11-02 10:05:42: Setting server 2 for client 0
2015-11-02 10:05:42: add_client: wrote 36 bytes to socket 8
2015-11-02 10:05:42: epoll_event_fd(revents=0x7fffce832c94)
2015-11-02 10:05:42: epoll_event_wait()
2015-11-02 10:05:42: epoll_wait returns 1
2015-11-02 10:05:42: After event_wait()
2015-11-02 10:05:42: epoll_event_fd(revents=0x7fffce832c94)
2015-11-02 10:05:42: event_fd returns fd=8, events=65536
2015-11-02 10:05:42: want to read from upstream socket 8 of connection 0
2015-11-02 10:05:42: copy_down: recv(8, 0x7fffce82ac30, 32768, 0) returns 52
2015-11-02 10:05:42: copy_down sending 52 bytes to socket 6
2015-11-02 10:05:42: epoll_event_delete(fd=8)
2015-11-02 10:05:42: decrementing connections_used to 0 for connection 0
2015-11-02 10:05:42: close_conn: Closing connection 0 to server 2; connections_used = 0
Read 0 from client, wrote 0 to server
Read 0 from server, wrote 0 to client
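
With `-dd` the log is verbose, but only a handful of lines record the failover decision. A small filter can pull them out. It assumes the output was redirected to a file named `log` as in the command above, and that the message wording matches this transcript (it may differ between Pen versions). `server_decisions` is a hypothetical helper name.

```shell
# server_decisions FILE: show only the lines that record which server Pen
# tried, skipped, or connected to. The patterns are copied from the log
# excerpts in this thread.
server_decisions() {
    grep -E 'Trying server|is blacklisted|Intend to try server|Successful connect to server|No failover server found' "$1"
}

# Example:
# server_decisions log
```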

@UlricE
Owner

UlricE commented Nov 11, 2015

Closing since the fix is in 0.31.1.

@UlricE UlricE closed this as completed Nov 11, 2015
@ccs10021

Hi - I'm new to Pen and have just started playing around with DNS load balancing. I can't seem to get the load balancer to adjust for failures within the load-balancing pool.

Am trying to work through your examples from above to get a better handle on health checks and blacklisting.

Have done this config based on your examples...

./pen -U 127.0.0.1:10000 10.10.10.1:53
./pen -U 127.0.0.1:10001 10.10.10.4:53
./pen -fU -a -dd 53 127.0.0.1:10000 127.0.0.1:10001 127.0.0.1:10002 -C localhost:9000 > log 2>&1
./penctl localhost:9000 server 1 blacklist 30

Getting this error on blacklisting:
[root@xxx-xxx01 pen-0.31.1]# penctl localhost:9000 server 1 blacklist 30
error connecting to server

Server is up though:
[root@xxx-xxx01 pen-0.31.1]# dig @127.0.0.1 -p 10000 +short siag.nu
194.9.95.65

Any ideas?

Also, I've only been seeing empty log files so far. Maybe I am looking in the wrong place?

Any help is greatly appreciated!

Thank you,
CCS

@UlricE
Owner

UlricE commented Nov 25, 2016

Looking at your third command line, I see that you're running Pen as root since it's listening on port 53, but in that case it refuses to open the control port. Look near the top of the log file and you should find a line similar to "Won't open control port running as root; use -u to run as different user".

And the error message from penctl simply means the control port isn't listening.
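
A quick way to check for this condition, assuming Pen's output was redirected to a file named `log` as in the command line above. The exact wording of the message may vary, so the sketch matches only on the phrase `control port`. `control_port_refused` is a hypothetical helper name.

```shell
# control_port_refused FILE: succeed (and print the matching lines) if the
# log mentions the control port, which suggests Pen declined to open it.
control_port_refused() {
    grep -i 'control port' "$1"
}

# Example:
# control_port_refused log && echo "restart Pen with -u USER so the control port can open"
```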

@UlricE UlricE reopened this Nov 25, 2016
@ccs10021

Thank you very much for your help. I am now running pen as a non-root user, with an iptables NAT rule redirecting port 53 to 8080 on the listening VIP. The penctl channel is now working fine.
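
A sketch of the kind of redirect described here, assuming Pen listens on UDP port 8080. The rule uses standard `iptables` syntax but is not taken from this thread, and the VIP address `192.0.2.10` is a placeholder. The function only prints the command so it can be reviewed before being run as root.

```shell
# print_redirect_rule VIP FROM_PORT TO_PORT
# Prints an iptables command that redirects UDP traffic arriving for
# VIP:FROM_PORT to local port TO_PORT. Nothing is executed here.
print_redirect_rule() {
    printf 'iptables -t nat -A PREROUTING -d %s -p udp --dport %s -j REDIRECT --to-ports %s\n' "$1" "$2" "$3"
}

print_redirect_rule 192.0.2.10 53 8080
```

PREROUTING only sees packets arriving from the network. Queries generated on the load balancer host itself would need a matching rule in the nat OUTPUT chain.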

I'm still having some issues creating my init.d script so that the pen service starts when the server boots. I seem to be running into permission issues with the pid and log files. I'm not sure who should own those files, i.e., root or the non-root user.

I'm also working out a script to health-check the back ends. It runs DNS queries against the target DNS servers I am load balancing; if those queries fail, the script calls penctl to blacklist the failed server. I just wanted to confirm with you that a script is required for this type of health checking, i.e., that pen cannot health-check downstream servers directly?

Thanks again,
CCS

@UlricE
Owner

UlricE commented Nov 29, 2016

You can get a bunch of hints for the init script here:

https://github.com/UlricE/pen/wiki/Pen-and-Systemd

It's written for systemd but a lot of the principles carry over.

You are right that Pen doesn't know anything about the back end health. Remember that unlike TCP, where the three-way handshake confirms that a connection has been made, there is no corresponding mechanism in UDP.
