Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Losing tons of data, wondering what i'm doing wrong #26

Open
jepries opened this issue Mar 15, 2016 · 5 comments
Open

Losing tons of data, wondering what i'm doing wrong #26

jepries opened this issue Mar 15, 2016 · 5 comments

Comments

@jepries
Copy link

jepries commented Mar 15, 2016

@UlricE HI Ulric, I'm testing PEN to see if it works for me and I'm wondering what I'm doing incorrect.

I would like to start multiple instances of https://github.com/CESNET/ipfixcol on a server and I would like to put PEN as the gatekeeper to redirect incoming packets to the instances.

I'm starting one instance of ipfixcol on port 50000, and another instance on 50001
I start PEN with this command

pen -ddfUs -x 15000 4739 10.73.198.150:51000 10.73.198.150:51001

As you can see, I'm running PEN and the two instances of ipfixcol on the same VM.

When I send packets directly to ipfixcol, I collect about 500,000 unique records. When I send packets through PEN (port 4739), I'm only storing about 5,000 unique records. Somewhere data is getting lost. One issue is that too many "ports" are getting opened

[v795363@enmv-dev002-1 pen]$ pen -ddfUs  -x 15000 4739 10.73.198.150:51000   10.73.198.150:51001
2016-03-15 11:54:14: expand_conntable(15000)
2016-03-15 11:54:14: read_cfg((null))
2016-03-15 11:54:14: open_listener(4739)
2016-03-15 11:54:14: pen_aton(0.0.0.0, 0x7fff2bdc9400)
2016-03-15 11:54:14: family = 2
socktype = 1
protocol = 6
addrlen = 16
sockaddr = 21296320
canonname = (null)
2016-03-15 11:54:14: socket returns 3, socket_errno=0
2016-03-15 11:54:14: local address=[0.0.0.0:4739]
2016-03-15 11:54:14: init(3, 0x7fff2bdc9a18); port = 4739
2016-03-15 11:54:14: Before: conns = 0x7f1b13eb9010, connections_max = 15000, clients = (nil),     clients_max = 0
2016-03-15 11:54:14: After: conns = 0x7f1b13eb9010, connections_max = 15000, clients =  0x7f1b11176010, clients_max = 2048
2016-03-15 11:54:14: server[0] = 10.73.198.150:51000
2016-03-15 11:54:14: n = 2, address = 10.73.198.150, pno = 51000, maxc1 = 0, hard = 0, weight = 0, prio = 0, proto = udp
2016-03-15 11:54:14: pen_aton(10.73.198.150, 0x144f608)
2016-03-15 11:54:14: family = 2
socktype = 1
protocol = 6
addrlen = 16
sockaddr = 21296160
canonname = (null)
2016-03-15 11:54:14: server[1] = 10.73.198.150:51001
2016-03-15 11:54:14: n = 2, address = 10.73.198.150, pno = 51001, maxc1 = 0, hard = 0, weight = 0,  prio = 0, proto = udp
2016-03-15 11:54:14: pen_aton(10.73.198.150, 0x144f990)
2016-03-15 11:54:14: family = 2
socktype = 1
protocol = 6
addrlen = 16
sockaddr = 21296160
canonname = (null)
2016-03-15 11:54:14: pen 0.32.1 starting
2016-03-15 11:54:14: servers:
2016-03-15 11:54:14:  0 10.73.198.150:51000:0:0:0:0
2016-03-15 11:54:14:  1 10.73.198.150:51001:0:0:0:0
2016-03-15 11:54:14: epoll_create1 returns 4
2016-03-15 11:54:14: epoll_event_add(fd=3, events=65536)
2016-03-15 11:54:14: epoll_event_ctl(fd=3, events=65536, op=1)
2016-03-15 11:54:14: mainloop()   
2016-03-15 11:54:14: epoll_event_arm(fd=3, events=65536)
2016-03-15 11:54:14: epoll_event_ctl(fd=3, events=65536, op=3)
2016-03-15 11:54:14: epoll_event_wait()
....
2016-03-15 11:52:59: socket returns 1023, socket_errno=0
2016-03-15 11:52:59: Connecting to 10.73.198.150
2016-03-15 11:52:59: Family: AF_INET
2016-03-15 11:52:59: Port: 51001
2016-03-15 11:52:59: Address: 10.73.198.150
2016-03-15 11:52:59: connect (upfd = 1023) returns 0, errno = 0, socket_errno = 0
2016-03-15 11:52:59: epoll_event_add(fd=1023, events=65536)
2016-03-15 11:52:59: epoll_event_ctl(fd=1023, events=65536, op=1)
2016-03-15 11:52:59: Successful connect to server 1
conns[1018].client = 0
conns[1018].server = 1
2016-03-15 11:52:59: Setting server 1 for client 0
2016-03-15 11:52:59: add_client: wrote 1428 bytes to socket 1023
2016-03-15 11:52:59: epoll_event_fd(revents=0x7fff430dec04)
2016-03-15 11:52:59: epoll_event_wait()
2016-03-15 11:52:59: epoll_wait returns 1
2016-03-15 11:52:59: After event_wait()
2016-03-15 11:52:59: epoll_event_fd(revents=0x7fff430dec04)
2016-03-15 11:52:59: event_fd returns fd=3, events=65536
2016-03-15 11:52:59: check_listen_socket()
2016-03-15 11:52:59: add_client: received 1428 bytes from client
2016-03-15 11:52:59: Client 10.73.198.151 has index 0
2016-03-15 11:52:59: store_client returns 0
2016-03-15 11:52:59: incrementing connections_used to 1020 for connection 1019
2016-03-15 11:52:59: store_conn: conn = 1019, downfd = 3, connections_used = 1020
2016-03-15 11:52:59: store_conn returns 1019
2016-03-15 11:52:59: match_acl_ipv4(0, 2546354442)
2016-03-15 11:52:59: Will try previous server 1 for client 0
2016-03-15 11:52:59: Trying server 1 for connection 1019 at time 1458057179
2016-03-15 11:52:59: match_acl_ipv4(0, 2546354442)
2016-03-15 11:52:59: socket returns -1, socket_errno=24
Error opening socket: Too many open files

All I really want to do is forward UDP incoming on port 4739 to other ports on the same box. I've tried forwarding to localhost 127.0.0.1 and then the eth0 IP (as above), I get the same massively lost amount of data.

Based on everything I read, PEN should be able to handle this no problem, but I can't seem to find the right keys.

Also, I lose the original address of who sent the packet, the new sender is now the address of PEN. Anyway to preserve the actual original packet received so that the original sender is saved?

Thanks!

ps, I also turned off debugging and get same results, I'm missing something for sure.

@UlricE
Copy link
Owner

UlricE commented Mar 16, 2016

I see that you're starting pen with -x 15000. The "too many open files" error suggests that the server isn't configured to allow that.

What is the traffic rate? Did you run top or something during the test to see that the server wasn't overloaded?

You can use the transparent reverse proxy mode to preserve the original source address.

https://github.com/UlricE/pen/wiki/Transparent-Reverse-Proxy

That describes TCP, but UDP is the same but with -U as usual.

@jepries
Copy link
Author

jepries commented Mar 16, 2016

Didn't run top, but the only things running were 2 ipfixcol's and Pen. the 2 ipfixcol's were only getting about 10% of the packets I was expecting they would get. I added the -x15000 after my initial runs thinking I had to increase the configuration. I would think that UDP wouldn't open "connections", it should be stateless so copying the packet and sending it should be the end of the process, didn't think I would run out of "open files" with UDP.

The traffic rate was 1000 packets per second. My goal is 15000 pps. I can receive 15000 pps directly to the ipfixcol software on this server before I begin to lose any data. I know the server can handle the load, so figure PEN would be a nice addition to spread the load a bit.

Thanks for the link to the reverse proxy. I couldn't find information on it yesterday.

@UlricE
Copy link
Owner

UlricE commented Mar 17, 2016

I've been trying to reproduce the problem but not really had any success. Here 192.168.1.1 send "garbage" using iperf to 192.168.1.2, which is Pen load balancing round robin across 192.168.2.2 and 192.168.2.3. There are no iperf servers running on those last two, only tcpdump to see that they get the traffic.

   .---------.
   | debtest |
   `---------´
        |.1
  ------------ 192.168.1.0/24
        |.2
     .-------.
     | test1 |
     `-------´
         |.1
------------------- 192.168.2.0/24
    |.2       |.3
.-------. .-------.
| test3 | | test4 |
`-------´ `-------´

As seen from "debtest":

root@debtest:~# iperf -c 192.168.1.2 -u -b 100M
------------------------------------------------------------
Client connecting to 192.168.1.2, UDP port 5001
Sending 1470 byte datagrams
UDP buffer size:  208 KByte (default)
------------------------------------------------------------
[  3] local 192.168.1.1 port 58306 connected with 192.168.1.2 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec   119 MBytes   100 Mbits/sec
[  3] Sent 85164 datagrams
[  3] WARNING: did not receive ack of last datagram after 10 tries.

And "test1":

ulric@test1:~/Git/pen$ ./pen 5001 -frU 5001 192.168.2.2 192.168.2.3
2015-11-21 11:05:07: Before: conns = (nil), connections_max = 0, clients = (nil), clients_max = 0
2015-11-21 11:05:07: After: conns = 0x1e49970, connections_max = 500, clients = 0x7fbbe776e010, clients_max = 2048
2015-11-21 11:05:10: Server 0 failed, retry in 30 sec: 22
2015-11-21 11:05:10: blacklisting server 0 because connect error 22

That "connect" error is interesting, it looks like a bug. It doesn't seem to affect anything though.

On "test3":

root@test3:~# tcpdump udp port 5001 > udp.log 2>&1
^Croot@test3:~tail udp.log
11:03:26.293483 IP 192.168.2.1.55388 > test3.5001: UDP, length 1470
11:03:26.544617 IP 192.168.2.1.50338 > test3.5001: UDP, length 1470
11:03:27.046233 IP 192.168.2.1.42635 > test3.5001: UDP, length 1470
11:03:27.547368 IP 192.168.2.1.36448 > test3.5001: UDP, length 1470
11:03:28.049268 IP 192.168.2.1.57049 > test3.5001: UDP, length 1470
11:03:28.551355 IP 192.168.2.1.51999 > test3.5001: UDP, length 1470

42412 packets captured
42413 packets received by filter
0 packets dropped by kernel

And finally "test4":

root@test4:~# tcpdump udp port 5001 > udp.log 2>&1
^Croot@test4:~# tail udp.log
11:02:08.574046 IP 192.168.2.1.44670 > test4.5001: UDP, length 1470
11:02:08.575046 IP 192.168.2.1.38798 > test4.5001: UDP, length 1470
11:02:09.078211 IP 192.168.2.1.34101 > test4.5001: UDP, length 1470
11:02:09.577757 IP 192.168.2.1.34233 > test4.5001: UDP, length 1470
11:02:10.079773 IP 192.168.2.1.55955 > test4.5001: UDP, length 1470
11:02:10.581865 IP 192.168.2.1.55055 > test4.5001: UDP, length 1470

42410 packets captured
42410 packets received by filter
0 packets dropped by kernel

@jepries
Copy link
Author

jepries commented Mar 17, 2016

Thanks for looking into this Ulric, the only difference between your test above and min, I was attempting to send to two other ports on the same host.

@UlricE
Copy link
Owner

UlricE commented Mar 18, 2016

Trying that:

root@debtest:~# iperf -c 192.168.1.2 -u -b 200m
------------------------------------------------------------
Client connecting to 192.168.1.2, UDP port 5001
Sending 1470 byte datagrams
UDP buffer size:  208 KByte (default)
------------------------------------------------------------
[  3] local 192.168.1.1 port 45654 connected with 192.168.1.2 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec   240 MBytes   201 Mbits/sec
[  3] Sent 171325 datagrams
[  3] WARNING: did not receive ack of last datagram after 10 tries.

Pen, latest code from git:

ulric@test1:~/Git/pen$ ./pen -fdrU -O epoll 5001 127.0.0.1:5002 127.0.0.1:5003
2016-03-18 14:55:49: read_cfg((null))
2016-03-18 14:55:49: Before: conns = (nil), connections_max = 0, clients = (nil), clients_max = 0
2016-03-18 14:55:49: expand_conntable(500)
2016-03-18 14:55:49: After: conns = 0x1e59780, connections_max = 500, clients = 0x7fab18058010, clients_max = 2048
2016-03-18 14:55:49: pen 0.32.1 starting
2016-03-18 14:55:49: servers:
2016-03-18 14:55:49:  0 127.0.0.1:5002:0:0:0:0
2016-03-18 14:55:49:  1 127.0.0.1:5003:0:0:0:0

Tcpdump:

root@test1:~# tcpdump -i lo -n port 5002 or port 5003 > udp.log 2>&1
^Croot@test1:~# tail udp.log
14:56:02.948424 IP 127.0.0.1.51787 > 127.0.0.1.5002: UDP, length 1470
14:56:03.198966 IP 127.0.0.1.36358 > 127.0.0.1.5003: UDP, length 1470
14:56:03.449192 IP 127.0.0.1.48461 > 127.0.0.1.5002: UDP, length 1470
14:56:03.699962 IP 127.0.0.1.54404 > 127.0.0.1.5003: UDP, length 1470
14:56:03.951191 IP 127.0.0.1.57475 > 127.0.0.1.5002: UDP, length 1470
14:56:04.201031 IP 127.0.0.1.48524 > 127.0.0.1.5003: UDP, length 1470

171323 packets captured
342646 packets received by filter
0 packets dropped by kernel

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants