Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WiFi Library does not always disconnect when signal is lost (when socket buffer of one client is full) #180

Closed
klausj1 opened this issue Jun 16, 2017 · 4 comments

Comments

@klausj1
Copy link

klausj1 commented Jun 16, 2017

Hi,

I use an Arduino Mega with a WiFi shield (bluefly from Watterott, Firmware version 19.5.2) to cyclically measure temperature and light and send these data to Microsoft Azure.

I had stability problems with the usage of the WiFi-Library 0.14.3 (the WiFi signal strength is far from perfect), and sometimes the communication to the router gets lost and is not reestablished again.

I added serial port logging to the WiFi Library and found out that the reason is that if a socket receives data which is not read by the client, almost every WiFi action is blocked. OK, this is documented, and most probably related to #32, but my problem is that I do not know how I can react in my sketch.

What happens:

  1. I send an HTTP-Request, and receive an answer. The receiving is done in this way (simplified):
    unsigned long startTime2 = millis();
    while (_client->available() && ((millis() - startTime2) < httpResponseTimeOut)) {

        char c = _client->read();
        Serial.print(c);
// do something, e.g. evaluate http return code (variable code)
    }

    if ((millis() - startTime2) >= httpResponseTimeOut)
    {
        Serial.println("HTTP request was not answered in time.");
        return 0;
    }

    return code;

From time to time, additional data is received later and the client gets available again, and the socket buffer of this client gets full. If I do something different with the WiFi in the meantime, e.g. transmit an NTP request and wait for the response, or sending something to another client via http, this fails (at least receiving), because in m2m_wifi.c / m2m_wifi_handle_events always returns M2M_ERR_FAIL, and consequently wifi_cb is not called:

	for (i = 0; i < MAX_SOCKET; ++i) {
		if (gastrSocketBuffer[i].flag && (*(gastrSocketBuffer[i].flag) & SOCKET_BUFFER_FLAG_FULL)) {
        (*(volatile unsigned char *)(0XC6)) = ('0' + i);
        return M2M_ERR_FAIL;
		}
	}

(as you can see, I added a log to the serial port - the number of the socket where the buffer is full is written).

In this situation, also a WiFi disconnection will not be recognized, because the method wifi_cb will never be called, and therefore _status will stay at WL_CONNECTED forever.

What I did: When I do not receive an expected response from reading (e.g. no UDP answer received, no HTTP answer received, I call WiFi.disconnect(); This does not help, because m2m_wifi.c / m2m_wifi_handle_events is still blocked.

So in the end, I did this:

    WiFi.disconnect(); // Try to repair with disconnect to Wifi
    Serial.println("Disconnecting done.");
    delay(500);
    Serial.print("WLAN State: ");
    Serial.println(WiFi.status());
    // Some strange error: When the WiFi signal is not perfect, sometimes the disconnect does not work and the connections stays in connected.
    // In this case, try to repair with WiFi.end().

    while (WiFi.status() == WL_CONNECTED)
    {
        Serial.println("WLAN State is still connected, waiting for disconnected!!!");
        delay(250);
        WiFi.disconnect(); // Try to repair with disconnect to Wifi
        Serial.print("WLAN State after disconnect: ");
        Serial.println(WiFi.status());
        if (WiFi.status() == WL_CONNECTED)
        {
            Serial.println("WLAN State is still connected (second try), calling WiFi.end()!!!");
            WiFi.end();
        }
    }

This helped, a disconnection is always recognized, and the reconnection works as well.

I also tried closing all sockets via WiFi._client[i]->stop(); before calling WiFiClass::disconnect(), but also in this case the disconnect did not work. I do not understand why - it looks like there is simply no interrupt fired (I log a '+' in m2m_hif.c / static void isr(void), and this does not show up on the serial port in this situation). Maybe this has something to do with trying to write to WiFi when SOCKET_BUFFER_FLAG_FULL is true. Only the WiFi.End() solved the problem.

I produced a (hopefully self-explaining) logfile. Flag: 3 means, that SOCKET_BUFFER_FULL is set:

"13.06.2017 22:00:19",Disconnecting WiFi ...
"13.06.2017 22:00:19",WiFiClass::disconnect(): Client Nr. 4 is active, client will be stopped. Socket number: 6 Flag: 3
"13.06.2017 22:00:19",WiFiClient::stop: Stopping socket 6
"13.06.2017 22:00:19",WiFiClass::disconnect(): Client Nr. 5 is active, client will be stopped. Socket number: 5 Flag: 1
"13.06.2017 22:00:19",WiFiClient::stop: Stopping socket 5
"13.06.2017 22:00:19",WiFiClass::disconnect(): Client Nr. 6 is active, client will be stopped. Socket number: -1 Flag: 0
"13.06.2017 22:00:19",WiFiClient::stop: Stopping socket -1
"13.06.2017 22:00:29",WiFiClass::disconnect: Could not disconnect!!!
"13.06.2017 22:00:29",Disconnecting done.
"13.06.2017 22:00:30",WLAN State: 3
"13.06.2017 22:00:30",Azure Client connection state: 0
"13.06.2017 22:00:30",Orf Client connection state: 0
"13.06.2017 22:00:30",WLAN State is still connected, waiting for disconnected!!!
"13.06.2017 22:00:30",WiFiClass::disconnect(): Client Nr. 4 is active, client will be stopped. Socket number: -1 Flag: 0
"13.06.2017 22:00:30",WiFiClient::stop: Stopping socket -1
"13.06.2017 22:00:30",WiFiClass::disconnect(): Client Nr. 5 is active, client will be stopped. Socket number: -1 Flag: 0
"13.06.2017 22:00:30",WiFiClient::stop: Stopping socket -1
"13.06.2017 22:00:30",WiFiClass::disconnect(): Client Nr. 6 is active, client will be stopped. Socket number: -1 Flag: 0
"13.06.2017 22:00:30",WiFiClient::stop: Stopping socket -1
"13.06.2017 22:00:40",WiFiClass::disconnect: Could not disconnect!!!
"13.06.2017 22:00:40",WLAN State after disconnect: 3
"13.06.2017 22:00:40",WLAN State is still connected (second try), calling WiFi.end()!!!
"13.06.2017 22:00:40",WiFiClient::stop: Stopping socket -1
"13.06.2017 22:00:40",WiFiClient::stop: Stopping socket -1
"13.06.2017 22:00:40",WiFiClient::stop: Stopping socket -1
"13.06.2017 22:00:40",WiFiClass::init: Initializing Wifi.
"13.06.2017 22:00:41",+-?WLAN State: 0

I see the following issues around this problem:

  • The returncode of m2m_wifi.c / m2m_wifi_handle_events is ignored all over the WiFi Library. Esp. long-lasting timeouts like
	while (!IS_CONNECTED && millis() - start < 20000) {
		m2m_wifi_handle_events(NULL);
	}

do not make sense if some SOCKET_BUFFER_FLAG_FULL is true. There should be a "big" warning written to the serial port when this situation happens.

  • This error code should be propagated back to the code using the libary
  • WiFiClass::disconnect() should close all the sockets before trying to disconnect the WiFi. This is done in wifi_cb when a disconnected-event is received (but only for WiFiClient, not for UDP and not for server), but to avoid the problem above all sockets should be closed before
  • I do not understand why SOCKET_BUFFER_FLAG_FULL is set to false only if the buffer is empty. Shouldn't it be set to false already if the free buffer is larger than the MTU-size?
  • Why doesn't WiFiClass::disconnect() work, when all sockets are closed?

I can do some more investigations (normally I can reproduce the problem in some minutes), but before I would like to get some feedback if I am on the right track and you are interested in a solution.

And can somebody help me with adding logs to m2m_*.c, esp. in the ISRs? Logging one character is more than nothing, but it does not work always (when you write too much, some characters are lost).

If somebody is interested, the WiFi-Library with additional logging is here.

My sketch is here, but I did not reduce it (yet), it is not sooo small.
ArduinoAzureIotTemperature.zip

Thanks for any help/comments,
Klaus

@sandeepmistry
Copy link
Contributor

Unfortunately, this is a consequence of the WINC1500 interface. If there's any data pending, the WINC1500 will not send any command responses until the host reads it and on the Mega there is limited RAM resources to buffer this data.

do not make sense if some SOCKET_BUFFER_FLAG_FULL is true. There should be a "big" warning written to the serial port when this situation happens.

I have mixed feels, if Serial.begin(...) is not called it could lock up the board.

@klausj1
Copy link
Author

klausj1 commented Oct 19, 2017

I know that the root cause of all these points is the WINC1500 interface.

But as far as I understood the code, the WiFi Library could at least recognize this situation and tell the application that nothing can be done until data from socket x are read (or the whole WiFi is disconnected/ended). Even if tracing to Serial is questionable, returncodes would be very helpful to allow the application to react on this situation.

I think that a major part of the stability problems could be related to this issue.

Regards,
Klaus

@klausj1
Copy link
Author

klausj1 commented Oct 19, 2017

BTW: With the changes described above (and the change from #182) my application is running without any problems since months :)

@sandeepmistry
Copy link
Contributor

Closing this since PR #204 has be merged, which based on @klausj1's feedback has resolved this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants