Troubleshooting

Ilya Baldin edited this page Mar 20, 2025 · 7 revisions

Testing and Troubleshooting

It is not uncommon to experience problems delivering network traffic from sources to destinations when using E2SAR with a load balancer.

Some common problems include:

  • Firewalls
  • Incorrect routing
  • Improper MTU setting
  • IPv4 vs IPv6 connectivity issues

This document presents a few troubleshooting procedures to help rule out simple network or configuration issues.

Testing load balancer control plane reachability

If provided with an admin EJFAT_URI (in the form of ejfats://<admin token>@<control plane host>:<control plane port>/), you should test the reachability of the control plane host and port as follows:

  • Ping the control plane host name
    • If unsuccessful, use the dig command to determine the host IP address and try to ping the address directly
    • Sometimes a host resolves to both an IPv4 and an IPv6 address. Determine which of the two is reachable via ping and see the next section for suggestions
  • Once the ping is successful, use the nc command to try to reach the control plane port. Generally the command hangs silently if the port is reachable, because the connection stays open waiting for input
  • Use the lbadm command with the --version option to query the control plane and make sure it returns a non-error response: lbadm --version -u 'ejfats://<admin token>@<control plane host>:<control plane port>/'
    • If you encounter TLS/SSL issues, try adding the -v flag to the lbadm command line. This disables SSL certificate validation
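The resolve-then-connect part of this procedure can also be scripted. The following is a minimal sketch using only the Python standard library, not part of E2SAR. The function names, the host lb.example.org and the port 18347 are hypothetical and used only for illustration. It assumes the control plane port accepts plain TCP connections, which is what the nc test above checks.

```python
import socket
from urllib.parse import urlsplit


def control_plane_endpoint(ejfat_uri):
    """Extract (host, port) of the control plane from an EJFAT URI."""
    parts = urlsplit(ejfat_uri)
    if parts.hostname is None or parts.port is None:
        raise ValueError("EJFAT URI must include a control plane host and port")
    return parts.hostname, parts.port


def resolve_addresses(host, port):
    """Return sorted (family, address) pairs the host resolves to (like dig)."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    found = set()
    for family, _type, _proto, _canon, sockaddr in infos:
        label = "IPv6" if family == socket.AF_INET6 else "IPv4"
        found.add((label, sockaddr[0]))
    return sorted(found)


def tcp_port_reachable(address, port, timeout=3.0):
    """Try a TCP connection to address:port (like nc); True if it succeeds."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        return False


# Hypothetical URI: substitute the admin EJFAT_URI you were given.
host, port = control_plane_endpoint("ejfats://token@lb.example.org:18347/")
```

Calling resolve_addresses(host, port) and then tcp_port_reachable(address, port) for each returned address shows which of the IPv4 and IPv6 addresses, if any, accepts connections on the control plane port.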

IPv4 vs IPv6 issues

In modern environments it is not uncommon to see two valid networking configurations - IPv4 and IPv6. E2SAR provides controls for choosing which one to use. You can:

  • Replace host name with an appropriate IP address in the admin or instance EJFAT_URI
  • Use the preferV6 flag in the EjfatURI constructor if IPv6 is preferred over IPv4 when using the control plane to reach the load balancer
  • Use the SegmenterFlags.dpV6 flag to select which data= address (v4 or v6) is used to send data to the load balancer. These addresses are returned as part of the instance EJFAT URI, e.g. ejfats://<instance token>@<control plane host>:<control plane port>/lb/20?sync=129.57.177.130:19532&data=129.57.177.66&data=[2620:0:22f0:677::177:42]
  • Use v6 flag on Reassembler constructor to favor IPv6 address for receiving data, when using IP address auto-detection (not specifying IP address explicitly)
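To see which data= addresses an instance URI actually carries before choosing a flag, the query string can be inspected directly. This is a minimal sketch using the Python standard library rather than the E2SAR API. The function name data_addresses, the token, the host lb.example.org and the port 18347 are hypothetical. The sync= and data= values are the ones from the example above.

```python
import ipaddress
from urllib.parse import parse_qs, urlsplit


def data_addresses(instance_uri):
    """Group the data= addresses of an instance EJFAT URI by IP version."""
    query = parse_qs(urlsplit(instance_uri).query)
    by_version = {4: [], 6: []}
    for raw in query.get("data", []):
        # IPv6 addresses appear in brackets in the URI; strip them first.
        addr = ipaddress.ip_address(raw.strip("[]"))
        by_version[addr.version].append(str(addr))
    return by_version


# Hypothetical token, host and port; data= addresses as in the example above.
uri = ("ejfats://token@lb.example.org:18347/lb/20"
       "?sync=129.57.177.130:19532"
       "&data=129.57.177.66&data=[2620:0:22f0:677::177:42]")
addresses = data_addresses(uri)
```

An empty list under key 4 or 6 means the load balancer did not advertise a dataplane address for that IP version, so the corresponding flag setting would have nothing to select.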

Testing the dataplane reachability

An instance EJFAT URI such as ejfats://<instance token>@<control plane host>:<control plane port>/lb/20?sync=129.57.177.130:19532&data=129.57.177.66&data=[2620:0:22f0:677::177:42] may contain both an IPv4 data= address and an IPv6 data= address of the load balancer. First determine whether you will be using IPv4 or IPv6 to send and receive your data. Then, from every sender and receiver node:

  • Determine basic reachability by using ping to the appropriate data= address
  • Determine that the path to the load balancer has an appropriate MTU by using ping as follows:
    • ping -M do -s 8972 -c 5 <selected data= address> if you expect MTU of 9000 to work
    • ping -M do -s 1472 -c 5 <selected data= address> if you expect MTU of 1500 to work
  • Note that it is currently not possible to bridge IPv4 and IPv6 networks through the load balancer. Senders and receivers must all use the same IP version, either IPv4 or IPv6.
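The -s values in the ping commands above are the MTU minus the IP and ICMP headers: 9000 - 20 - 8 = 8972 and 1500 - 20 - 8 = 1472 for IPv4. The sketch below makes that arithmetic explicit and builds the corresponding command line. The function names are hypothetical. It assumes an IPv4 header without options (20 bytes), the fixed 40-byte IPv6 header, and a Linux iputils ping that accepts -M do. On older systems IPv6 targets may require ping6 instead.

```python
import ipaddress

ICMP_HEADER_BYTES = 8                 # ICMP / ICMPv6 echo header
IP_HEADER_BYTES = {4: 20, 6: 40}      # IPv4 without options, fixed IPv6 header


def ping_payload_size(mtu, ip_version=4):
    """Largest ping payload (-s) that fits in one packet of the given MTU."""
    return mtu - IP_HEADER_BYTES[ip_version] - ICMP_HEADER_BYTES


def mtu_probe_command(address, mtu, count=5):
    """Build a 'ping -M do' argument list for a data= address."""
    addr = ipaddress.ip_address(address.strip("[]"))
    size = ping_payload_size(mtu, addr.version)
    # -M do sets the don't-fragment behavior (Linux iputils ping).
    return ["ping", "-M", "do", "-s", str(size), "-c", str(count), str(addr)]


command = mtu_probe_command("129.57.177.66", 9000)
```

The resulting list can be passed to subprocess.run(command). Because the IPv6 header is larger, probing an IPv6 data= address for the same MTU needs a payload 20 bytes smaller than the IPv4 values shown above.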

Testing the basic workflow

Bash scripts installed under /usr/local/bin or found in the source tree under scripts/bash-helpers/ can help validate the basic workflow. Running the scripts with appropriate parameters, in the order described in the README.md file, validates the following pieces:

  • Querying the load balancer
  • Reserving a load balancer instance
  • Checking the status of the instance
  • Sending and receiving event data over the dataplane
  • Freeing the load balancer instance
