Mosquitto bridge problem due to non blocking socket connect #478

Closed
gourish2k opened this issue Jun 28, 2017 · 6 comments

Comments

@gourish2k
commented Jun 28, 2017

Hello,

I was setting up a Mosquitto bridge on a Windows 7 x64 system.
Using the latest stable installer build, mosquitto-1.4.12-install-win32.exe, I got the following error:

1498638467: mosquitto version 1.4.12 (build date 28-06-2017 11:31:30.70) starting
1498638467: Config loaded from mosquitto.conf.
1498638467: Opening ipv6 listen socket on port 1883.
1498638467: Opening ipv4 listen socket on port 1883.
1498638467: Bridge local.mosquittoBridgeGourishWindows1 doing local SUBSCRIBE on topic take000/#
1498638467: Connecting bridge local_to_121_mosquitto (10.170.7.121:1884)
1498638467: Bridge mosquittoBridgeGourishWindows1 sending CONNECT
1498638467: Error creating bridge: Unknown error.
1498638467: Warning: Unable to connect to bridge local_to_121_mosquitto.
1498638469: mosquitto version 1.4.12 terminating

After checking the code, I found that the following statement in bridge.c at line 368 makes a non-blocking socket connect:

rc = _mosquitto_socket_connect(context, context->bridge->addresses[context->bridge->cur_address].address, context->bridge->addresses[context->bridge->cur_address].port, NULL, false);

As a result, the next call, on line 387, sometimes fails: because the previous connect is non-blocking, the call below can run before the socket connection has been established.

rc = _mosquitto_send_connect(context, context->keepalive, context->clean_session);

As a quick workaround I put a Sleep call before line 387 to allow some time for the connection. However, the time taken to connect depends on the speed of the network, so no fixed Sleep duration was reliable, and the bridge connection still failed in some cases where the network was slow.
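
A fixed Sleep is fragile for exactly this reason. One common alternative is to wait until the socket reports writable and then read SO_ERROR to learn whether the connect succeeded. The following is a minimal POSIX sketch of that idea, not mosquitto's actual fix; `wait_for_connect` is a hypothetical helper name, and on Windows the equivalent wait would use `select` or `WSAPoll` from Winsock.

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>

/* Hypothetical helper, not part of mosquitto: wait until a non-blocking
 * connect() on fd has finished.
 * Returns 0 if the socket is connected, -1 on timeout or error. */
int wait_for_connect(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int rc;
    int so_error = 0;
    socklen_t len = sizeof(so_error);

    pfd.fd = fd;
    pfd.events = POLLOUT;
    pfd.revents = 0;

    /* Retry if a signal interrupts the wait */
    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    if (rc <= 0) {
        return -1; /* timed out, or poll() itself failed */
    }

    /* Writable does not mean success: fetch the pending connect result */
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &so_error, &len) < 0) {
        return -1;
    }
    if (so_error != 0) {
        errno = so_error;
        return -1;
    }
    return 0;
}
```

In a broker this check would more likely live in the main event loop than block the caller, which may be one reason a fully blocking connect is undesirable there.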

Another solution I tried was to change line 368 to

rc = _mosquitto_socket_connect(context, context->bridge->addresses[context->bridge->cur_address].address, context->bridge->addresses[context->bridge->cur_address].port, NULL, true);

so that the socket connect is now blocking. This solution worked even when the network was slow.

I would like to know whether the second solution is a proper one, or whether there is a better way to solve the problem. I suppose the socket connection was originally made non-blocking for some specific purpose.

Thanks

@ralight

Contributor

commented Jul 16, 2017

Could you please try this again with 1.4.14? Setting the bridge to be a blocking connect isn't a great solution I'm afraid.

@gourish2k

Author

commented Jul 17, 2017

Hello,
I have checked with version 1.4.14.
As with 1.4.12, I had to use a blocking connect to establish the bridge connection; with a non-blocking connect the bridge connection does not succeed.

@taixi112

commented Jul 25, 2017

My solution:
Modify line 656 of the net_mosq.c file as follows:
/*rc = */_mosquitto_socket_connect_step3 (mosq, host, port, bind_address, blocking).

@jsirpoma

commented Feb 7, 2018

I see the same thing happening with 1.4.14, on both Windows 7 x64 and Windows 10.

@jsirpoma

commented Feb 7, 2018

In my case, mosquitto_net_write for the CONNECT returns -1 and WSAGetLastError gives 10057 (WSAENOTCONN). This change allows the CONNECT to proceed:

diff --git a/lib/net_mosq.c b/lib/net_mosq.c
index 063c4a2..3eabc32 100644
--- a/lib/net_mosq.c
+++ b/lib/net_mosq.c
@@ -889,7 +889,7 @@ int _mosquitto_packet_write(struct mosquitto *mosq)
 #ifdef WIN32
                                errno = WSAGetLastError();
 #endif
-                               if(errno == EAGAIN || errno == COMPAT_EWOULDBLOCK){
+                               if(errno == EAGAIN || errno == COMPAT_EWOULDBLOCK || errno == WSAENOTCONN){
                                        pthread_mutex_unlock(&mosq->current_out_packet_mutex);
                                        return MOSQ_ERR_SUCCESS;
                                }else{
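
The patch above treats WSAENOTCONN like EAGAIN, so the queued CONNECT is retried later instead of failing the bridge. A rough, portable sketch of that classification follows, using POSIX errno names (on Windows the value would come from WSAGetLastError, as in the diff). The function name and the `connect_in_progress` flag are assumptions added for illustration and do not appear in mosquitto; the flag guards against retrying forever on a socket that was never connected.

```c
#include <errno.h>

/* Hypothetical helper, not part of mosquitto: decide whether a failed
 * send() should be retried later rather than treated as fatal.
 * Returns 1 for "try again later", 0 for "real error". */
int send_error_is_retryable(int err, int connect_in_progress)
{
    /* The socket buffer is full for now; always safe to retry */
    if (err == EAGAIN || err == EWOULDBLOCK) {
        return 1;
    }
    /* "Not connected" is only transient while a non-blocking
     * connect is still pending on this socket */
    if (err == ENOTCONN && connect_in_progress) {
        return 1;
    }
    return 0;
}
```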
@ralight

Contributor

commented Sep 4, 2019

This is fixed for 1.6.5, thanks for the hints.

@ralight ralight closed this Sep 4, 2019
ralight added a commit that referenced this issue Sep 4, 2019
ralight added a commit that referenced this issue Sep 18, 2019