Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework socket buffer layer to prevent/reduce locks up #204

Merged

Conversation

sandeepmistry
Copy link
Contributor

Similar idea to #77 but more focused on preventing/reducing lock ups. On the MKR1000 (or boards with higher RAM), there should be no lock ups now, AVR based boards reduced lock ups.

Related issues: #32, #36, #89, #180, #182.

I think the following PR's can be closed if this is merged, as this will resolve similar issues to them: #200, #202.

Needs thorough testing before merging.

@sandeepmistry
Copy link
Contributor Author

Hi @klausj1, if you have some time if would be great if you can test this and provide your feedback. I think it fixes #182. I think the sends were being blocked because there was some pending recv data from the WINC1500. The timeout you proposed in #202 is effective but this will fail the write if there is a partial recv pending and the write has no buffers available.

@sandeepmistry
Copy link
Contributor Author

@drp0 has tested this an confirms it improves stability with #36 in #36 (comment)

@klausj1
Copy link

klausj1 commented Dec 8, 2017

@sandeepmistry, yes, I will do a test. Latest after christmas, hopefully before. I assume #180 should also be solved, correct?

@klausj1
Copy link

klausj1 commented Dec 8, 2017

@sandeepmistry, OK, I did a quick test just now:

  • Arduino Mega 2560
  • WiFi-Shield firmware updated to 19.5.4
  • To be sure: Test with "my" old, changed WiFi Library -> fine
  • Test with the official 0.14.5-version -> fine

Test with sandeepmistry:socket-buffer-winc1500:

  • Opening connection to WLAN -> fine
  • Fetching the time via UDP -> fine
  • Sending HTTP-Request via first client (and read complete answer) -> fine
  • Sending second HTTP-Request via first client (and read complete answer) -> fine
  • Sending HTTP-Request via second client -> this always (after each restart of the sketch/arduino) hangs. The client can connect, the client gets available, but then no data are received
  • As no data are received, I close the Wifi with WiFi.disconnect();
  • After the disconnect, (WiFi.status() == WL_CONNECTED) is still true
  • If I call WiFi.end();, the connection is disconnected
  • But all further attempts to reconnect fail, uint8_t wlanState = WiFi.begin(ssid, pass); hangs for 60 seconds

The good thing is that it seems to be 100% reproducable, no need to wait for hours or days.

I'll try to make a reduced sketch, or do you have a better idea?

Klaus

@klausj1
Copy link

klausj1 commented Dec 8, 2017

Some more info: The problem was not that the HTTP-Request from the second client was not answered, but with the new library the request takes >30 seconds, with the old one it takes <4 seconds. And I only wait for 10 seconds before I assume that there is no answer. So the above behaviour should be easily reproducable by fetching a not too small website with HTTP get (I use http://wetter.orf.at/burgenland/), and disconnecting the WIFI while data are still pending.

I read the data with

    while (_client->available() && ((millis() - startTime2) < httpResponseTimeOut)) {

        wdt_reset();
        //Serial.print(",");
        char c = _client->read();
        if (mPrintResponse) Serial.print(c);

        if (c == ' ' && !inStatus) {
            inStatus = true;
        }

        if (inStatus && i < 3 && c != ' ') {
            statusCode[i] = c;
            i++;
        }
        if (i == 3) {
            statusCode[i] = '\0';
            code = atoi(statusCode);
        }

        if (httpBody) {
            if (response != NULL) response->concat(c);
        }
        else
        {
            if (c == '\n' && currentLineIsBlank) {
                httpBody = true;
            }

            if (c == '\n') {
                // you're starting a new line
                currentLineIsBlank = true;
            }
            else if (c != '\r') {
                // you've gotten a character on the current line
                currentLineIsBlank = false;
            }
        }
    }

    if ((millis() - startTime2) >= httpResponseTimeOut)
    {
        Serial.println("HTTP request was not answered in time.");
        return 0;
    }

Do you need a reduced sketch, or is this info enough?

Klaus

@sandeepmistry
Copy link
Contributor Author

Hi @klausj1,

Thanks for testing! Yes, #180 should be solved with this (see the changes in WiFiClient::write which calls WiFiSocket::write now).

If you have a full sketch to reproduce the issue you shared, that would be great. I'll try to re-create later today if I don't hear back.

@sandeepmistry
Copy link
Contributor Author

Hi @klausj1,

I've pushed another change in 0159bf8 to discard all socket data when WiFi.disconnect() or WiFi.end() occurs.

I think this will improve behaviour in the setup you mentioned. Looking forward to your feedback and full example sketch as well. Thanks.

@klausj1
Copy link

klausj1 commented Dec 8, 2017

Thanks, now the disconnect works.
But the problem that doing an HTTP get is almost 10 times slower than before is still there. Will you have a look at this? Do you need the example sketch for this problem?

@klausj1
Copy link

klausj1 commented Dec 9, 2017

I made a testprogram showing that HTTP get is 10 times slower, and that a call to a second WiFi-Client still does not deliver data (client does not get available) if there is unread data from the first WiFi-Client (I am not sure if this behavioud is intended with this version or not, probably it is intended).

The sketch does 4 tests:

  1. HTTP get without timeout, measure needed time, call HTTP get with a second client
  2. HTTP get, interrupt reading from socket before all data is read, call WiFi.disconnect() and (if disconnect() does not work) WiFi.end(), reconnect, call HTTP get with a second client
  3. HTTP get, interrupt reading from socket before all data is read, call WiFi.disconnect(), reconnect, call HTTP get with a second client
  4. HTTP get, interrupt reading from socket before all data is read, call HTTP get with a second client

With the original 0.14.5-version, already the second test "hangs" - WiFi.Begin() does not work anymore

With the new version, test 4 cannot read data from the second client, either the client does not get available, or (if the client is not connected anymore) connect() does not work. If I call WiFi.disconnect() afterwards and try again, everything works again.

I attach the sketch (sketch.zip) and the logfiles for original and new WiFi-Library (logs.zip):

sketch.zip
logs.zip

Can you fix 10 slow HTTP get-times? I would like to wait for a long-time-test until this is fixed.

Regards,
Klaus

@hoblins
Copy link

hoblins commented Dec 13, 2017

With this branch my application sending post requests to a web server is working without any problems now for days on a SAMD21.
See #182, #178 and #202.
The only noticeable thing is, that
WiFiSSLClient.connect()
returns false quite often.

UPDATE: Now it is stable running for about 3 weeks.

@sandeepmistry
Copy link
Contributor Author

Hi @hoblins,

Thank you for taking the time to test these changes and provide your feedback!

If you have a (minimal) sketch to reproduce the WiFiSSLClient::connect(...) failing, I would be happy to take a look.

@klausj1
Copy link

klausj1 commented Dec 17, 2017

Hi,

with the LED-changes HTTP get is fast again :)

After some rework my sketch works again.

The behaviour of reading from the socket has changed a little bit, or probably my implementation (copied from some restclient-library) was wrong, but with the old Wifi-Libraries it worked.
In the old version i did the following:

while (_client->available() && ((millis() - startTime2) < maxTime)) {
// read everything until client is not anymore available or timeout is reached
// ...
}

This does not work anymore, because while reading the response from HTTP get, the client becomes unavailable for some (typically 3-5) milliseconds, and then comes available again.

So I changed my implementation a little bit, and now it works as before the LED-changes, but much faster :)

I assume my old implementation was simply not correct. Right? Hopefully not too many people did the same mistake as I did.

I'll start a long-time test now.

Klaus

@sandeepmistry
Copy link
Contributor Author

Hi @klausj1,

Thanks for trying this again, we'll most likely be merging and releasing these changes in January.

I assume my old implementation was simply not correct. Right? Hopefully not too many people did the same mistake as I did.

Yes, available() is not guaranteed to return > 0 while connected. I'll take a quick look to see if things can be sped up so it returns non-zero faster. However, I would strongly suggest sketches or libraries do not rely on this behaviour. There is also an ArduinoHttpClient library that you might be interested in.

when buffer is empty and recv/recvfrom is called on the socket, makes
it a bit less likely WiFiSocketClass::available() returns 0 if data is
in the pipeline.
@sandeepmistry
Copy link
Contributor Author

I did a quick test with a modified version of https://github.com/arduino-libraries/WiFi101/blob/master/examples/WiFiWebClient/WiFiWebClient.ino which does a GET to http://arduino.cc/asciilogo.txt

client.available() returned 0 for 5 to 7 iterations of loop() until the initial response datawas received. With the small change above in 6a47f6e WiFi.available() does not return zero mid-response reading, previously I saw it once on the WiFi101 + Uno. On the MKR1000, since it's a bit faster client.available() returns 0 many times in loop before data is received and once mid-response body with and without he change above.

@hoblins
Copy link

hoblins commented Dec 18, 2017

@sandeepmistry

If you have a (minimal) sketch to reproduce the WiFiSSLClient::connect(...) failing, I would be happy to take a look.

As I connect to a private server I would like to avoid spreading the code. If that could help, I could try to minimize the sketch for reading the code and/or generate some log files.

@klausj1
Copy link

klausj1 commented Dec 20, 2017

My sketch is running for ~3 days now, without any problem :) With the original version, it survived just 2-3 hours.

One more question: If the buffer is full (e.g. because data from one client are not read yet), and then the WLAN connection goes down, should the automatic disconnect work (i.e., will WiFi.Status() deliver not connected, or will WiFi.Status() still dliver connected, because of the full buffer)?

@sandeepmistry
Copy link
Contributor Author

@hoblins

As I connect to a private server I would like to avoid spreading the code. If that could help, I could try to minimize the sketch for reading the code and/or generate some log files.

Understood, minimal sketch and log files are better than nothing, but would be slower. Any chance you can reproduce the issue with a public server?

@sandeepmistry
Copy link
Contributor Author

Hi @klausj1,

My sketch is running for ~3 days now, without any problem :) With the original version, it survived just 2-3 hours.

Excellent news!

One more question: If the buffer is full (e.g. because data from one client are not read yet), and then the WLAN connection goes down, should the automatic disconnect work (i.e., will WiFi.Status() deliver not connected, or will WiFi.Status() still dliver connected, because of the full buffer)?

On AVR boards it would still return connected until all the data in the queue is processed, so up until the last 64 bytes of the socket data chunk. SAMD boards do that have this issue with these changes.

We requested an additional API from Microchip/Atmel to cap the maximum chunk size, but have not heard any news on this.

@sandeepmistry sandeepmistry changed the title [DO NOT MERGE] Rework socket buffer layer to prevent/reduce locks up Rework socket buffer layer to prevent/reduce locks up Dec 20, 2017
@hoblins
Copy link

hoblins commented Dec 28, 2017

The only noticeable thing is, that
WiFiSSLClient.connect()
returns false quite often.

With my new router I can't reproduce the error anymore. So I think was some error with my old AVM Fritzbox 7270.

@sandeepmistry
Copy link
Contributor Author

Many thanks for @klausj1, @hoblins, and @drp0 for testing these changes and providing their feedback. These changes will be included in an upcoming WiFi101 library release.

@klausj1
Copy link

klausj1 commented Jan 6, 2018

Yesterday my sketch blocked after ~20 days. I do not have a serial log, but from the LED state I can see that it hangs in reading the time via UDP (code from this function see below, it hangs between LEDsetAction(timeLED, HIGH); and LEDsetAction(timeLED, LOW);, because the timeLED was on).

I assume it hangs when WLAN is turned off during or shortly before calling this function. The blue WLAN LED on the WLAN shield was on, so somehow the disconnect was not realized.

Maybe you have the chance to find something. Here is the code (again, I disconnect the WiFi if I do not get an answer, trying a "self-repair"):

time_t getNtpTime()
{
    Serial.println("Fetching current time from internet... ");
    if (WiFi.status() != WL_CONNECTED) {
        Serial.println("No WiFi connection, aborting.");
        return 0;
    }
    LEDsetAction(timeLED, HIGH);

    Serial.print("WLAN State: ");
    Serial.println(WiFi.status());

    Udp.begin(localPort);
    delay(100);

    IPAddress ntpServerIP; // NTP server's ip address
    int tries = 0;
    int bytesInBuffer = 0;

    unsigned long startTime;
    unsigned long udpTimeOut = 1500;

    while (tries < 5)
    {
        if (WiFi.status() != WL_CONNECTED)
        {
            LEDsetAction(timeLED, LOW);
            Udp.stop();
            return 0;
        }

        // discard any previously received packets 
        unsigned long startTime = millis();

        bytesInBuffer = Udp.parsePacket();
        wdt_reset();
        while (bytesInBuffer > 0 && ((millis() - startTime) < udpTimeOut))
        {
            Serial.print("reading previously received packets ... Count: ");
            Serial.print(bytesInBuffer);
            if (bytesInBuffer > NTP_PACKET_SIZE) bytesInBuffer = NTP_PACKET_SIZE;
            Udp.read(packetBuffer, NTP_PACKET_SIZE);
            bytesInBuffer = Udp.parsePacket();
            delay(10);
        }
        if ((millis() - startTime) >= udpTimeOut)
            Serial.println("Could not finish reading previously received packets.");

        if (WiFi.status() != WL_CONNECTED)
        {
            LEDsetAction(timeLED, LOW);
            Udp.stop();
            return 0;
        }

        Serial.println("Transmit NTP Request");
        // get a random server from the pool
        WiFi.hostByName(ntpServerName, ntpServerIP);
        Serial.print(ntpServerName);
        Serial.print(": ");
        Serial.println(ntpServerIP);
        sendNTPpacket(ntpServerIP);
        startTime = millis();
        while (millis() - startTime < udpTimeOut) {
            int size = Udp.parsePacket();
            //Serial.print("NTP size: ");
            //Serial.println(size);
            if (size >= NTP_PACKET_SIZE) {
                Serial.println("Receive NTP Response");
                Udp.read(packetBuffer, NTP_PACKET_SIZE);  // read packet into the buffer
                unsigned long secsSince1900;
                // convert four bytes starting at location 40 to a long integer
                secsSince1900 = (unsigned long)packetBuffer[40] << 24;
                secsSince1900 |= (unsigned long)packetBuffer[41] << 16;
                secsSince1900 |= (unsigned long)packetBuffer[42] << 8;
                secsSince1900 |= (unsigned long)packetBuffer[43];
                if (secsSince1900 == 0)
                {
                    Serial.print("Wrong time (zero) received, ignoring time.");
                }
                else
                {
                    time_t t = secsSince1900 - 2208988800UL; // 2208988800UL: difference between 1900 (NTP) and 1970 (unix time)
                    Serial.print("Currenttime: ");
                    Serial.println(currentTime);
                    if ((currentTime < 86400*365) || (abs(t - currentTime) < 86400))
                    {
                        Serial.println("Setting time.");
                        LEDsetAction(timeLED, LOW);
                        Udp.stop();
                        return t;
                    }
                    if (abs(t - currentTime) >= 86400)
                    {
                        Serial.print("Wrong time received, time difference to current time is too high. Ignoring time.");
                    }
                    Serial.print("NTP: Should not be here!!!");
                }
                delay(100);
            }
            else delay(10);
        }
        tries++;
    }
    wdt_reset();
    Serial.println("No valid NTP Response after 5 times :-(");
    Serial.print("WLAN State: ");
    Serial.println(WiFi.status());
    Udp.stop();

    WlanDisconnect();

    LEDsetAction(timeLED, LOW);
    return 0; // return 0 if unable to get the time
}

@sandeepmistry
Copy link
Contributor Author

@klausj1 please create a new issue to track what you mentioned in #204 (comment). Since this PR is merged it would be easier to track in a new issue.

@drp0
Copy link

drp0 commented Jan 15, 2018 via email

@drp0
Copy link

drp0 commented Jan 15, 2018

I had a bit of time, so I now have the callback working:

  1. WiFi.cpp static removed from static void socket_cb(SOCKET sock, uint8 u8Msg, void *pvMsg)

  2. WiFi.h uint32_t _resolve; moved to public

  3. In sketch:
    pre-declare:

String theclient = ""; // callback returned remote ip and port

void socket_cb(SOCKET sock, uint8 u8Msg, void *pvMsg);

static void myResolve_cb(uint8_t * /* hostName */, uint32_t hostIp) {
WiFi._resolve = hostIp;
}
void mySocketBufferCb(SOCKET sock, uint8 u8Msg, void *pvMsg) {
socket_cb(sock, u8Msg, pvMsg); // call the original callback
if( u8Msg == SOCKET_MSG_ACCEPT) {
tstrSocketAcceptMsg *pstrAccept = (tstrSocketAcceptMsg *)pvMsg;
rip = pstrAccept->strAddr.sin_addr.s_addr;
rport = pstrAccept->strAddr.sin_port;
theclient = String(rip[0]) + "." + String(rip[1]) + "." + String(rip[2]) + "." + String(rip[3]) + ":" + String(rport);
}
}

theclient contains the latest ip and port

David

@sandeepmistry
Copy link
Contributor Author

Discussion from #204 (comment) continued in #212

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants