WiFi.status() == WL_CONNECTED although connection lost and WiFi.RSSI() == 0 #86

Closed
michael71 opened this issue Jul 31, 2016 · 30 comments · Fixed by #91

Comments

@michael71

I'm using the MKR1000 with the WiFi101 library (and WiFiUdp) to send multicast UDP messages, which works fine in principle.

But after some time (this can be after 30 min or after 2..4 hours), the UDP messages are no longer sent to the network.

When this happens, WiFi.status() is still equal to WL_CONNECTED (!), but WiFi.RSSI() is equal to 0 (I'm using a Serial (USB) debug output). Calling WiFi.begin() and Udp.beginMulti(..) again does not result in a new, working connection. It happens whether or not a Serial-USB connection is attached. The power supply is either PC/USB or the 800 mAh battery; this also makes no difference.

thx, michael.

P.S.
I uploaded similar test software to both of these boards:

  • "samw25 xplained" (using the "bare" ASF calls)
  • "feather-m0-wifi-atwinc1500" (which does not have the SAMW25 hardware but a similar SAMD21+WINC1500 combination, and which also uses an Arduino lib derived from WiFi101)

Neither of these boards seems to have this problem.

P.S.2
My WiFi RSSIs are in the -55 to -40 range - the AP is in the same room with the boards.

@trlafleur

trlafleur commented Jul 31, 2016

This may be related to what I see in issue #80. Sometimes I can run for a day or two, sometimes just for 10 minutes or less...

I'm testing with both the Feather M0 and MKR1000 board, same issue...

@sandeepmistry
Contributor

Closed via #85.

Please try out the master version of the library.

@sandeepmistry
Contributor

Oops, closed the wrong issue, re-opening.

@sandeepmistry sandeepmistry reopened this Aug 11, 2016
@sandeepmistry
Contributor

@michael71 could you please share an example sketch to reproduce the issue.

Also, it would be great if you could try out #77 (comment).

@michael71
Author

michael71 commented Aug 12, 2016

Here is my example code. (I started 3 MKR1000s yesterday evening: 2 are still running after 32000 seconds, 1 has been showing "rssi=0" for the past 2 hours.)

/*****************************************************************
 RSSI_Test.ino
 version for MKR 1000
 *****************************************************************/
#include <SPI.h>
#include <WiFi101.h>
#include <WiFiUdp.h>

#define _DEBUG      // with Serial output

//************** network constants *****************************
WiFiUDP Udp;                           // using UDP messages
const IPAddress lanbahnip(239, 200, 201, 250);  // lanbahn multicast IP address
const unsigned int lanbahnport = 27027;      // lanbahn port to listen on

String ssid = "lonstoke";
String pass = "mypassword";

char packetBuffer[255]; //buffer to hold incoming packet
#define BUF_LEN   80
char buffer[BUF_LEN];       // for message strings

//******* timers *********************************************
unsigned long announceTimer;   // millis() returns unsigned long

void setup() {

#ifdef _DEBUG
    Serial.begin(57600);
    // wait for the USB serial port to open (blocks until a terminal connects)
    while (!Serial)  {
        delay(100);
    }

    Serial.println("RSSI_Test");

#endif

    updateBuffer();
    announceTimer = millis(); // reset timer
    connectToWiFi();  
}

void connectToWiFi() {

#ifdef _DEBUG
    Serial.print("trying connect to ssid=");
    Serial.println(ssid);
#endif

    // attempt to connect to Wifi network:
    while (WiFi.status() != WL_CONNECTED)  {
        // Connect to WPA/WPA2 network:
        WiFi.begin(ssid, pass);
        delay(5000);  // wait

        Udp.beginMulti(lanbahnip, lanbahnport);
    // the issue happens with AND without "DEEP_AUTOMATIC"
        m2m_wifi_set_sleep_mode(M2M_PS_DEEP_AUTOMATIC, 1);

    }

    if (WiFi.status() == WL_CONNECTED) {
#ifdef _DEBUG
        Serial.print("successfully connected to ");
        Serial.println(ssid);
#endif
    }
}

void updateBuffer() {
    IPAddress ip = WiFi.localIP();
    int rssi = WiFi.RSSI();
    unsigned long secs = millis() / 1000;
    // snprintf bounds the write to BUF_LEN; %lu matches the unsigned long seconds
    snprintf(buffer, BUF_LEN, "A RSSI_TEST_2 %d.%d.%d.%d %d %lu", ip[0], ip[1],
            ip[2], ip[3], rssi, secs);
}

void readUdp() {
    if (WiFi.status() != WL_CONNECTED)
        return; // >>>>>>

    int packetSize = Udp.parsePacket();
    if (packetSize)  // read packet
    {
        // read the packet into packetBuffer, leaving room for the terminator
        int len = Udp.read(packetBuffer, sizeof(packetBuffer) - 1);
        if (len > 0)
            packetBuffer[len] = 0;
    }
}

void sendAnnounceMessage() {
    if ((WiFi.status() == WL_CONNECTED) && (WiFi.RSSI() > -80)
            && (WiFi.RSSI() != 0)) {
        // wifi is o.k., we can send a message
        updateBuffer();
        // send lanbahn announce packet
        Udp.beginPacket(lanbahnip, lanbahnport);
        Udp.write(buffer);
        Udp.endPacket();

    } else {

#ifdef _DEBUG
        Serial.println("ERROR: wifi connection lost, rssi low or zero");
        Serial.print("status=");
        Serial.println(WiFi.status());
        Serial.print("rssi=");
        Serial.println(WiFi.RSSI());
#endif
    }
}

void loop() {

    readUdp();

    // send the announce string every 20sec
    if ((millis() - announceTimer) >= 20000) {
        announceTimer = millis();
        sendAnnounceMessage();
#ifdef _DEBUG
        Serial.println(buffer);
#endif
    }

}

@michael71
Author

I will now start a test with the "Wifi101-Socket-Buffer" lib.

@michael71 michael71 reopened this Aug 12, 2016
@michael71
Author

sorry, hit the wrong button

@sandeepmistry
Contributor

@michael71 thanks. I don't see a loop function in the sketch you provided.

@michael71
Author

sorry, sandeepmistry, I added the loop() to the listing ...

@michael71
Author

I did some more long term experiments, with the #77 lib and with the standard WiFi101 library.

I experienced rssi=0 both with and without the "#77" patch, and both when running the board from battery and from a 5V USB power source. One board still runs o.k. after 194700 seconds; others stop sending multicast UDP messages to the network (with WiFi.status() == WL_CONNECTED and WiFi.RSSI() == 0) after only 14000 secs. (Using or not using "Serial" also made no difference.)

So I cannot (with my 4 MKR1000s) conclude a root cause for this phenomenon. It happens with different (attached) hardware, with different power supplies and with different WiFi101 libs (see above).

However, reconnecting (I have a "config" mode in my full SW, with a "reconnect" command) seems to work, therefore I will call this "reconnect" function whenever RSSI is zero:

void reconnect() {
  // wifiRetries is a global counter declared elsewhere in the full sketch
  WiFi.begin(ssid, pass);
  wifiRetries++;
  delay(1000);  // wait 1 second
  Udp.beginMulti(lanbahnip, lanbahnport);
  m2m_wifi_set_sleep_mode(M2M_PS_DEEP_AUTOMATIC, 1);
  Serial.print("trying reconnect, millis=");
  Serial.println(millis());
  Serial.print("#retries=");
  Serial.println(wifiRetries);
  delay(2000);
}

(I also added an 8-second watchdog that is reset in my loop() function, but it never resets the processor, because loop() keeps running fine the whole time.)
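
The workaround described above (reconnect whenever the status still reads WL_CONNECTED but the RSSI reads zero) can be sketched as a small decision helper. This is a minimal, hardware-independent sketch and not code from this thread: `StaleLinkDetector`, `kConnected`, and the idea of tolerating a few consecutive zero readings before reconnecting are hypothetical choices made for illustration. The value 3 is the numeric value of WL_CONNECTED in WiFi101.

```cpp
#include <cstdint>

// Hypothetical helper: decides when a "connected but dead" link should be
// re-established. Status and RSSI are passed in, so there is no Arduino dependency.
const int kConnected = 3;  // numeric value of WL_CONNECTED in WiFi101

class StaleLinkDetector {
public:
    // threshold: consecutive zero-RSSI readings tolerated before reconnecting
    // (an assumption of this sketch, to ignore a single timed-out RSSI request)
    explicit StaleLinkDetector(int threshold)
        : threshold_(threshold), zeros_(0) {}

    // Feed one sample; returns true when the caller should run its reconnect().
    bool update(int status, int32_t rssi) {
        if (status != kConnected) {
            zeros_ = 0;
            return true;            // link is reported down: reconnect right away
        }
        if (rssi != 0) {
            zeros_ = 0;             // a real reading: the link is alive
            return false;
        }
        ++zeros_;                   // status says connected, but RSSI is zero
        if (zeros_ >= threshold_) {
            zeros_ = 0;
            return true;
        }
        return false;
    }

private:
    int threshold_;
    int zeros_;
};
```

In a sketch, one might call `detector.update(WiFi.status(), WiFi.RSSI())` once per announce interval and invoke `reconnect()` whenever it returns true.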

@sandeepmistry
Contributor

Thanks, I'll set up a similar test here to see if I can reproduce it. Just to confirm, I should be able to reproduce it if I only set up 2 MKR1000s to broadcast?

For now, let's avoid using m2m_wifi_set_sleep_mode(M2M_PS_DEEP_AUTOMATIC, 1); since there's no official API for it, we haven't really tested deep sleep mode.

@trlafleur

The reconnection process works in my test code, but if an external UDP or web access is in progress, it is dropped. This requires the external device to re-establish the connection... Not good.

This is only a PATCH; we need to discover the root cause of these issues...
My issue #80 looks to be similar to this.

I have connections that run fine for a few days; others drop in 2 minutes...

thanks

@sandeepmistry
Contributor

Looking at WiFi::RSSI():

int32_t WiFiClass::RSSI()
{
    // Clear pending events:
    m2m_wifi_handle_events(NULL);

    // Send RSSI request:
    _resolve = 0;
    if (m2m_wifi_req_curr_rssi() < 0) {
        return 0;
    }

    // Wait for connection or timeout:
    unsigned long start = millis();
    while (_resolve == 0 && millis() - start < 1000) {
        m2m_wifi_handle_events(NULL);
    }

    return _resolve;
}

I'm wondering if the request for the RSSI is timing out, in which case _resolve is still 0 when the function returns. This might be caused by the socket RX queue getting full and blocking all communication with the WINC1500 firmware for all other operations until it is cleared. I'll try some tests where I set up a UDP socket on a board and blast some UDP packets from my Mac.
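
One consequence of the code above is that a return value of 0 is ambiguous: it is returned both when the request cannot be sent and when no reply arrives within 1000 ms. The following hardware-independent sketch models the same poll-until-timeout pattern and reports the outcome explicitly. It is an illustration only: `pollRssi`, `RssiOutcome`, `RssiResult`, and the callback parameters are hypothetical and are not part of WiFi101.

```cpp
#include <cstdint>
#include <functional>

enum class RssiOutcome { Ok, RequestFailed, TimedOut };

struct RssiResult {
    RssiOutcome outcome;
    int32_t value;   // only meaningful when outcome == RssiOutcome::Ok
};

// sendRequest: returns a negative value if the request could not be queued
// pollReply:   returns the RSSI once the reply has arrived, 0 until then
// now:         millisecond clock (millis() on a real board)
RssiResult pollRssi(const std::function<int()>& sendRequest,
                    const std::function<int32_t()>& pollReply,
                    const std::function<unsigned long()>& now,
                    unsigned long timeoutMs) {
    if (sendRequest() < 0) {
        return {RssiOutcome::RequestFailed, 0};
    }
    const unsigned long start = now();
    // Unsigned subtraction keeps the comparison correct across clock wraparound.
    while (now() - start < timeoutMs) {
        const int32_t rssi = pollReply();
        if (rssi != 0) {
            return {RssiOutcome::Ok, rssi};
        }
    }
    return {RssiOutcome::TimedOut, 0};
}
```

With an explicit outcome, a caller can tell "no reply from the WINC1500" apart from a genuine reading instead of treating both as rssi=0.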

@trlafleur

If you could enable the debug print facilities in the stack, it might help in tracing this issue.


@sandeepmistry
Contributor

> If you could enable the debug print facilities in the stack, it might help in tracing this issue.

I'll push a branch with them enabled, but from what I remember it's unfortunately not super useful.

@sandeepmistry
Contributor

Info. on enabling debug mode on SAMD (MKR1000) can be found here: #83 (comment)

@michael71
Author

Thanks for the DEBUG prints. However, apart from the startup messages, this is the only debug message within a few hours of operation:

(APP)(ERR)[nm_clkless_wake][174]clocks still OFF. Wake up failed
(APP)(ERR)[nm_clkless_wake][174]clocks still OFF. Wake up failed
(APP)(ERR)[nm_clkless_wake][174]clocks still OFF. Wake up failed
(APP)(ERR)[nm_clkless_wake][174]clocks still OFF. Wake up failed
(APP)(ERR)[nm_clkless_wake][174]clocks still OFF. Wake up failed
(APP)(ERR)[nm_clkless_wake][174]clocks still OFF. Wake up failed

These messages appeared during what I thought was normal operation.

@sandeepmistry
Contributor

@michael71 @trlafleur another branch for you to try out for me: https://github.com/sandeepmistry/WiFi101/tree/samd-debug-enable_socket-buffer-experiments

This should prevent the RSSI from being set to zero if there is pending received UDP data.

@michael71 please also see my reply in #86 (comment) - did you disable deep sleep mode?

@michael71
Author

michael71 commented Aug 18, 2016

thanks for the new test version - any special messages/events which I should look for?

Concerning the deep sleep: I made some tests with and some without "deep sleep" - however, I experienced the same result (after some hours: rssi=0) with both of them, therefore I don't think the problem is located there. (but I can switch off "deep sleep" for the new tests, just to exclude a problem with this mode)

@michael71
Author

michael71 commented Aug 18, 2016

and here we already have a nice ERROR:

ERROR: wifi connection lost, rssi low or zero // output from my program
rssi=0 // output from my program

(APP)(DBG)[hif_send][414]Failed to alloc rx size
(APP)(DBG)[hif_send][414]Failed to alloc rx size
(APP)(DBG)[hif_send][414]Failed to alloc rx size
(APP)(DBG)[hif_send][414]Failed to alloc rx size
(APP)(DBG)[hif_send][414]Failed to alloc rx size
(APP)(DBG)[hif_send][414]Failed to alloc rx size
(APP)(DBG)[hif_send][414]Failed to alloc rx size
(APP)(DBG)[hif_send][414]Failed to alloc rx size
..... // and some more of the same
(APP)(INFO)POWER SAVE 3

(this was still with deep sleep enabled, will now repeat the test without deep sleep)

@michael71
Author

this "Failed to alloc rx size" is also reported once right after the start of the WiFi module:

trying connect to ssid=lonstoke pass=*** #=0 // output of my program

(APP)(DBG)[hif_send][414]Failed to alloc rx size
(APP)(INFO)Chip ID 1503a0
(APP)(DBG)[nm_spi_init][745][nmi spi]: chipid (001003a0)
(APP)(DBG)[wait_for_firmware_start][536]ffff0000 ffff0000 2
(APP)(DBG)[wait_for_firmware_start][536]ffff0000 ffff0000 2
.... some more of these
.....
(APP)(INFO)Firmware ver : 19.4.4
(APP)(INFO)Min driver ver : 19.3.0
(APP)(INFO)Curr driver ver: 19.3.0
(APP)(DBG)[socket][589]1 Socket 7 session ID = 1
NO deep sleep // output of my program
....

@michael71
Author

michael71 commented Aug 18, 2016

After 44000 seconds the MKR1000 is still running well (with https://github.com/sandeepmistry/WiFi101/tree/samd-debug-enable_socket-buffer-experiments and without deep sleep). However, another MKR1000 with the original WiFi101 SW and with deep sleep enabled has also been running for just as long.

@sandeepmistry
Contributor

@michael71 do you have any more updates on your testing?

@michael71
Author

I had to stop the test at ~160000 seconds (two days) without any problems with the software. I can restart it today to see if I can confirm this.

@michael71
Author

the WiFi101/samd-debug-enable_socket-buffer-experiments software looks very promising, @sandeepmistry !
After 120000 secs the MKR1000 is still running well! 3 other MKR1000's with the original WiFi101 lib stopped running at ~11000 secs, 24000 secs and 29000 secs. (all 4 connected to the same network and all transmitting and receiving multicast UDP messages)

@sandeepmistry
Contributor

@michael71 thanks for the feedback. I'll submit a PR with just sandeepmistry@c2be9f6 for more testing. Stay tuned ...

It will have the debug output disabled just to make sure that wasn't affecting the timing.

@mmattocks

I am getting this exact issue with WiFi101 0.14.3, WINC1501 19.5.2 firmware on a WINC1500 breakout connected to an Arduino Mega. WiFi.RSSI() called once in the program loop (loop time ~1 second) guarantees dropping off the network with WiFi.status() == 3 and RSSI == 0 within 2 hours. When I removed the WiFi.RSSI() call I immediately was able to extend uptime to 12+ hours. It seems like this was not completely resolved. It should probably be noted in the documentation that WiFi.RSSI() should not be called in the program loop.
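
If the RSSI still needs to be reported, one way to follow the advice above is to query it rarely and reuse the last value in between. The following is a minimal sketch under stated assumptions: the caller supplies the current time in milliseconds and a function that performs the actual query. `ThrottledRssi` and its parameters are hypothetical and not part of WiFi101, and whether a lower query rate actually avoids the lock-up has not been confirmed in this thread.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical wrapper that limits how often the real RSSI query runs.
class ThrottledRssi {
public:
    ThrottledRssi(std::function<int32_t()> query, unsigned long minIntervalMs)
        : query_(query), minIntervalMs_(minIntervalMs),
          lastQueryMs_(0), cached_(0), hasValue_(false) {}

    // Returns the cached RSSI unless at least minIntervalMs have elapsed
    // since the last real query. Unsigned subtraction handles clock wraparound.
    int32_t read(unsigned long nowMs) {
        if (!hasValue_ || nowMs - lastQueryMs_ >= minIntervalMs_) {
            cached_ = query_();
            lastQueryMs_ = nowMs;
            hasValue_ = true;
        }
        return cached_;
    }

private:
    std::function<int32_t()> query_;
    unsigned long minIntervalMs_;
    unsigned long lastQueryMs_;
    int32_t cached_;
    bool hasValue_;
};
```

On a board, the query function could wrap `WiFi.RSSI()` and `read()` could be called with `millis()`, so that a loop running once per second triggers a real request only once per chosen interval.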

@Keith-OSU

Keith-OSU commented Sep 29, 2017

I'm also experiencing this problem: WiFi.status() == 3 and RSSI == 0. I've attempted to add code to reset the connection if this condition exists, but I'm unable to reset it using WiFi.end() or WiFi.disconnect(). Ultimately, it would be nice if this event never occurred, or automatically triggered some sort of reset, or if there were a way for me to reset the connection. Anyone have a suggestion?

BTW... I have 10 devices running 24/7 and they all exhibit this symptom. Generally, they need to be manually rebooted anywhere from a few times a day to every couple of days.

Adafruit Feather M0 board, 19.5.2 firmware, WiFi101 0.14.3

@michael71
Author

michael71 commented Sep 29, 2017

Keith,
I'm resetting the Wifi part of the WINC1500 with the following code sequence:
WiFi.end(); nm_bsp_reset();
this resets only the wifi part of WINC1500, see #118

Then I reconnect to the WPA/WPA2 network:

WiFi.begin(ssid, pass);

This fixed the "RSSI=0" problem for me - without having to reset the main processor.

@Keith-OSU

Michael,

Thanks for the tip. This does seem to work. I've added code in various locations to check for a non-responsive network connection, then call the reset code you suggest. This solves the problem, but I wish the problem didn't exist in the first place.

-- Keith
