FreeBSD/OSD.cc: add client_messenger to the avoid_ports set. #12463

Merged
merged 1 commit into from Dec 15, 2016


@wjwithagen
Contributor

Observed "feature":
During rebind due to a "wrongly marked down" log message, FreeBSD is
able to bind to the port used by client_messenger.
The Linux variant avoids that port because it is already in use.

Result:
In FreeBSD there would be 2 listeners on the port, and due to the rebind
they have different nonces. (This is written in the logfile)
But they also will expect different protocols on that same port.

This is likely due to an interpretation difference in the SO_REUSEADDR
socket option:

Linux:
SO_REUSEADDR
Indicates that the rules used in validating addresses supplied
in a bind(2) call should allow reuse of local addresses.
For AF_INET sockets this means that a socket may bind,
except when there is an active listening socket bound to the address.
When the listening socket is bound to INADDR_ANY with a specific port
then it is not possible to bind to this port for any local address.
Argument is an integer boolean flag.

FreeBSD:
SO_REUSEADDR
Enables local address reuse
indicates that the rules used in validating addresses supplied in a
bind(2) system call should allow reuse of local addresses.

So FreeBSD does not guarantee that bind(2) is refused when another socket is
already listening on the port. The client_messenger port is therefore best
avoided explicitly during rebinding; otherwise any of the cluster_messengers
may bind to it.
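
A small probe can make the difference described above observable on a given host. The following is only a sketch under stated assumptions, not code from this PR: the names `ReuseProbe`, `bind_reuse` and `probe_reuseaddr` are made up for illustration, and the choice of a wildcard listener plus a second loopback bind is an assumption about how the conflict might arise, not a reproduction of what the OSD messengers do. It sets SO_REUSEADDR on two TCP sockets, puts the first into the listening state, and records whether the second bind(2) on the same port is accepted.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cerrno>
#include <cstring>

// Outcome of the probe (hypothetical type, for illustration only).
struct ReuseProbe {
  bool first_bound = false;   // first socket bound and listening
  bool second_bound = false;  // second bind(2) on the same port accepted
  int second_errno = 0;       // errno of the second bind(2) if it failed
  unsigned short port = 0;    // port chosen by the kernel for the first socket
};

// Create a TCP socket with SO_REUSEADDR set and bind it to addr:port.
// Returns the fd on success, or -1 with errno preserved on failure.
static int bind_reuse(in_addr_t addr, unsigned short port) {
  int fd = ::socket(AF_INET, SOCK_STREAM, 0);
  if (fd < 0) return -1;
  int on = 1;
  ::setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));
  sockaddr_in sa;
  std::memset(&sa, 0, sizeof(sa));
  sa.sin_family = AF_INET;
  sa.sin_addr.s_addr = addr;  // already in network byte order
  sa.sin_port = htons(port);
  if (::bind(fd, reinterpret_cast<sockaddr*>(&sa), sizeof(sa)) < 0) {
    int saved = errno;
    ::close(fd);
    errno = saved;
    return -1;
  }
  return fd;
}

ReuseProbe probe_reuseaddr() {
  ReuseProbe r;
  // First socket: wildcard address, kernel-chosen port, listening.
  int a = bind_reuse(htonl(INADDR_ANY), 0);
  if (a < 0) return r;
  sockaddr_in sa;
  socklen_t len = sizeof(sa);
  if (::listen(a, 1) < 0 ||
      ::getsockname(a, reinterpret_cast<sockaddr*>(&sa), &len) < 0) {
    ::close(a);
    return r;
  }
  r.first_bound = true;
  r.port = ntohs(sa.sin_port);
  // Second socket: loopback address, same port, also with SO_REUSEADDR.
  int b = bind_reuse(htonl(INADDR_LOOPBACK), r.port);
  if (b >= 0) {
    r.second_bound = true;
    ::close(b);
  } else {
    r.second_errno = errno;
  }
  ::close(a);
  return r;
}
```

Calling `probe_reuseaddr()` and inspecting `second_bound` and `second_errno` shows how the local kernel treats the second bind. The result is platform-dependent, so no expected output is given here.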

FreeBSD log with wrong connect:
bb98d80 0 log_channel(cluster) log [WRN] : map e18 wrongly marked me down
bb98d80 1 -- 127.0.0.1:6801/17881 rebind rebind avoid 6801,6802,6803
bb98d80 1 -- 127.0.0.1:6801/17881 shutdown_connections
bb98d80 1 -- 127.0.0.1:6800/1017881 _finish_bind bind my_inst.addr is 127.0.0.1:6800/1017881
bb98d80 1 Processor -- start
bb98d80 1 -- 127.0.0.1:6802/17881 rebind rebind avoid 6801,6802,6803
bb98d80 1 -- 127.0.0.1:6802/17881 shutdown_connections
bb98d80 1 -- 127.0.0.1:0/17881 learned_addr learned my addr 127.0.0.1:0/17881
bb98d80 1 -- 127.0.0.1:6804/1017881 _finish_bind bind my_inst.addr is 127.0.0.1:6804/1017881

FreeBSD with the correct behaviour:
bb98d80 0 log_channel(cluster) log [WRN] : map e17 wrongly marked me down
bb98d80 1 -- 127.0.0.1:6802/15296 rebind rebind avoid 6801,6802,6803,6812
bb98d80 1 -- 127.0.0.1:6802/15296 shutdown_connections
bb98d80 1 -- 127.0.0.1:6806/1015296 _finish_bind bind my_inst.addr is 127.0.0.1:6806/1015296
bb98d80 1 Processor -- start
bb98d80 1 -- 127.0.0.1:6803/15296 rebind rebind avoid 6801,6802,6803,6812
bb98d80 1 -- 127.0.0.1:6803/15296 shutdown_connections
bb98d80 1 -- 127.0.0.1:0/15296 learned_addr learned my addr 127.0.0.1:0/15296
bb98d80 1 -- 127.0.0.1:6807/1015296 _finish_bind bind my_inst.addr is 127.0.0.1:6807/1015296
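
The "rebind avoid ..." lines in the logs above suggest a port search that skips an explicit set of ports. As a rough model only (this is not the messenger code; `pick_port` and the `in_use` set are hypothetical, and `in_use` merely stands in for bind(2) failing on an occupied port), the selection could be sketched as:

```cpp
#include <set>

// Hypothetical helper: return the first port in [min_port, max_port] that is
// neither explicitly avoided nor already in use, or -1 if no port is left.
int pick_port(int min_port, int max_port,
              const std::set<int>& avoid_ports,
              const std::set<int>& in_use) {
  for (int port = min_port; port <= max_port; ++port) {
    if (avoid_ports.count(port)) continue;  // excluded for this rebind
    if (in_use.count(port)) continue;       // models bind(2) being refused
    return port;
  }
  return -1;
}
```

With illustrative values, `pick_port(6800, 6810, {6801, 6802, 6803}, {})` yields 6800, while adding 6800 to the avoid set yields 6804. On a system where bind(2) does not refuse an occupied port, the `in_use` check effectively never triggers, which is why the avoid set has to carry every port that must not be reused.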

Signed-off-by: Willem Jan Withagen wjw@digiware.nl

@wjwithagen wjwithagen FreeBSD/OSD.cc: add client_messenger to the avoid_ports set.
517a77f
@wjwithagen
Contributor

@tchaikov @yuyuyu101
After wading through a lot of debugging, I think that the best direction for FreeBSD is to add this extra port to the avoid_ports set. I guess it would not even do harm on Linux.
The current error is not related to my change (at least I think so)

Would you care to review (and possibly submit?)

@tchaikov tchaikov self-assigned this Dec 13, 2016
@tchaikov
Contributor
tchaikov commented Dec 13, 2016 edited

@wjwithagen I will be AFK for the following two weeks. I will try to review your change on an even more boring weekday during my vacation, or next year if @yuyuyu101 has not reviewed your change by then.

@yuyuyu101 yuyuyu101 was assigned by tchaikov Dec 13, 2016
@wjwithagen
Contributor

@tchaikov
Then I hope that you did not have any time to review :)
Enjoy the time with your family, and happy days.

@liewegas liewegas added the needs-qa label Dec 14, 2016
@liewegas
Member

retest this please

+ // Prevent FreeBSD from grabbing the client_messenger port during
+ // rebinding, in which case a cluster_messenger would also bind
+ // to the same port.
+ avoid_ports.insert(client_messenger->get_myaddr().get_port());
@yuyuyu101
yuyuyu101 Dec 14, 2016 Member

if so, do we need to insert public_messenger port?

@liewegas
Member
@yuyuyu101
Member

agreed

@wjwithagen
Contributor

@liewegas @yuyuyu101
That was my idea as well, but I wanted to be as minimally invasive as possible.
But I think that the troubles are not quite over yet. After running it a few dozen times, it failed again on my Jenkins host.
(I'm starting to become a ctest fan: ctest --repeat-until-fail -V -R )

And that is because there is another port that we open, and that is the command channel (./src/ceph_osd.cc:443). That port is opened very early on, and I'm having a hard time transparently getting the port of external_messenger. But I think that translates to client_messenger?

So I should be covered?

@yuriw
Contributor
yuriw commented Dec 14, 2016

test this please

@yuyuyu101
Member

@wjwithagen i think so

@yuriw yuriw merged commit 8d7d5ae into ceph:master Dec 15, 2016

3 checks passed

Signed-off-by: all commits in this PR are signed
Unmodifed Submodules: submodules for project are unmodified
default: Build finished.
@wjwithagen
Contributor

@yuriw
Hi Yuri, thanks for the merge.

@wjwithagen wjwithagen deleted the wjwithagen:wip-wjw-freebsd-osd-avoidports branch Jan 4, 2017