Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

findif doesn't properly detect interfaces #52

Closed
b-a-t opened this issue Jan 31, 2012 · 4 comments
Closed

findif doesn't properly detect interfaces #52

b-a-t opened this issue Jan 31, 2012 · 4 comments

Comments

@b-a-t
Copy link

b-a-t commented Jan 31, 2012

I was a bit surprised that despite description of IPAddr[2] just and IP address never was enough for configuration and I had to supply both netmask and NIC name to get things working properly. But a lot of tutorials said that's how it should be done, so I accepted it until got a configuration where NICs had different names on two machines in a cluster and migration of IP didn't work.

I never could get such a configuration to work in Debian squeeze, but on Debian lenny it worked. After digging into the code and configuration it came out that on lenny 'cat /proc/net/route' gives routing table like:

Iface   Destination     Gateway         Flags   RefCnt  Use     Metric  Mask            MTU     Window  IRTT                     
eth0    006BD35F        00000000        0001    0       0       0       E0FFFFFF        0       0       0                        
eth1    0000000A        00000000        0001    0       0       0       00FFFFFF        0       0       0                        
eth2    0001000A        00000000        0001    0       0       0       00FFFFFF        0       0       0                        
eth0    00000000        1E6BD35F        0003    0       0       0       00000000        0       0       0                        

When on squeeze the order of the information is different:

Iface   Destination     Gateway         Flags   RefCnt  Use     Metric  Mask            MTU     Window  IRTT
bond0   00000000        9E77D35F        0003    0       0       0       00000000        0       0       0                                                                         
eth1    0000000A        00000000        0001    0       0       0       00FFFFFF        0       0       0                                                                         
*       FA00000A        00000000        0205    0       0       0       FFFFFFFF        0       0       0                                                                         
bond0   8077D35F        00000000        0001    0       0       0       E0FFFFFF        0       0       0                                                                         

Basically, with the newer kernels the default route 0/0 comes as a first line and matches any supplied IP address/mask combination, giving interface of the default route as a match. It may be suitable result in some cases, but just incidently. Also, the code tries to take route metric into account and rejects anything that is below already found one, although 'man route' says:

Metric The 'distance' to the target (usually counted in hops). It is not used by recent kernels, but may be needed by routing daemons.

And in most cases it's value 0 for all the entries. But current code just picks the first matching line with such a metric rejecting any following better routes with the same metric (0).

So, my proposal is to make this check more relaxed in the first place and consider equal metrics suitable as well. Still in this case default route possibly would be the best choice, so I add additional constrain - pick most specific matching route(with narrower netmask).

Here is the small patch:

--- findif.c.orig       2010-04-14 13:57:20.000000000 +0200
+++ findif.c    2012-01-27 22:06:14.000000000 +0100
@@ -171,11 +171,13 @@
        long            metric = LONG_MAX;
        long            best_metric = LONG_MAX;
        int             rc = 0;
-
+
        char    buf[2048];
        char    interface[MAXSTR];
        FILE *routefd = NULL;

+       *best_netmask = 0;
+
        if ((routefd = fopen(PROCROUTE, "r")) == NULL) {
                snprintf(errmsg, errmsglen
                ,       "Cannot open %s for reading"
@@ -199,17 +201,18 @@
                        ,       PROCROUTE, buf);
                        rc = -1; goto out;
                }
-               if ( (in->s_addr&mask) == (in_addr_t)(dest&mask)
-               &&      metric < best_metric) {
-                       best_metric = metric;
-                       *best_netmask = mask;
-                       strncpy(best_if, interface, best_iflen);
+               if ((in->s_addr&mask) == (in_addr_t)(dest&mask)) {
+                       if(metric <= best_metric && mask >= *best_netmask) {
+                               best_metric = metric;
+                               *best_netmask = mask;
+                               strncpy(best_if, interface, best_iflen);
+                       }
                }
        }

        if (best_metric == LONG_MAX) {
                snprintf(errmsg, errmsglen, "No route to %s\n", address);
-               rc = 1;
+               rc = 1;
        }

   out:

Testing both on lenny and squeeze gave consistent right results :)

Still, this and current code doesn't take into account, for example case when 'reject' route present in the routing table:

*       FA00000A        00000000        0205    0       0       0       FFFFFFFF        0       0       0                                                                 

If desired IP happen to fall into the rejected range an '*' interface returned as a result. Well, it can easily be verified in the code, as well as on a higher level.

Regards,
Timur.

@tserong
Copy link
Member

tserong commented Feb 1, 2012

I hit the same problem with a 3.x kernel on SUSE, and I can confirm that your patch fixes it. I would suggest a small tweak as follows, so it doesn't modify *best_netmask until after the procfile is successfully opened, and removes the nested 'if' (although, really, this is probably mostly academic...)

diff --git a/tools/findif.c b/tools/findif.c
index 72452f7..b911ae2 100644
--- a/tools/findif.c
+++ b/tools/findif.c
@@ -198,6 +198,7 @@ SearchUsingProcRoute (char *address, struct in_addr *in
        ,   PROCROUTE);
        rc = OCF_ERR_GENERIC; goto out;
    }
+   *best_netmask = 0;
    while (fgets(buf, sizeof(buf), routefd) != NULL) {
        if (sscanf(buf, "%[^\t]\t%lx%lx%lx%lx%lx%lx%lx"
        ,   interface, &dest, &gw, &flags, &refcnt, &use
@@ -208,7 +209,7 @@ SearchUsingProcRoute (char *address, struct in_addr *in
            rc = OCF_ERR_GENERIC; goto out;
        }
        if ( (in->s_addr&mask) == (in_addr_t)(dest&mask)
-       &&  metric < best_metric) {
+       &&  metric <= best_metric && mask >= *best_netmask) {
            best_metric = metric;
            *best_netmask = mask;
            strncpy(best_if, interface, best_iflen);

@dmuhamedagic
Copy link
Contributor

Good work! Many thanks. Tim, can you please apply your version of the patch.

@tserong
Copy link
Member

tserong commented Feb 2, 2012

Done: de3074c

@tserong tserong closed this as completed Feb 2, 2012
@b-a-t
Copy link
Author

b-a-t commented Feb 2, 2012

Just want to thank you, guys, for the quick resolution of the problem and commit! Nice team work :)

yuusuke pushed a commit to yuusuke/resource-agents that referenced this issue Nov 21, 2013
See ClusterLabs#52 for
discussion.  Thanks to Timur Bakeyev for original diagnosis and
patch.

Signed-off-by: Tim Serong <tserong@suse.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants