Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

list of detected wifi access points is empty #206

Open
bbinet opened this issue Apr 9, 2018 · 45 comments
Open

list of detected wifi access points is empty #206

bbinet opened this issue Apr 9, 2018 · 45 comments
Assignees
Labels

Comments

@bbinet
Copy link

bbinet commented Apr 9, 2018

We often have the following case: when the rpi starts, it correctly connects to the wifi access point (that was previously configured using wifi-connect captive portal), but after a few minutes or a few hours, the wifi-connect is starting (probably because the connection was temporarily lost), and the captive portal shows an empty list of detected wifis.

But we have a lot of wifi access points around, so there must be an issue here as the list should not be empty...

The rpi never connects back to our initial wifi access point, until we manually reboot the device.

Here is my start.sh script:

$ cat /usr/src/app/start.sh
#!/usr/bin/env bash
export DBUS_SYSTEM_BUS_ADDRESS=unix:path=/host/run/dbus/system_bus_socket
while true
do
    iwgetid -r
    if [ $? -ne 0 ]; then
        printf 'Starting WiFi Connect\n'
        ./wifi-connect --portal-ssid helioslite-tracker --portal-passphrase XXX --activity-timeout 600
    fi
    sleep 10
done

Here are the logs of NetworkManager service:
https://gist.github.com/bbinet/d0ed2e5465824ef2594589b383cc5f93#file-networkmanager-log

Here are the logs of the container that runs wifi-connect:
https://gist.github.com/bbinet/d0ed2e5465824ef2594589b383cc5f93#file-wifi-connect-log

Front logo Front conversations

@bbinet
Copy link
Author

bbinet commented Apr 9, 2018

I'm happy to provide more details as I can connect to the rpi through the wifi-connect access point, so if think some other logs might be helpful to investigate this issue, please tell me.

Also do you know a temporary workaround that I can run to force network manager to reconnect to my initial wifi access point, so that I don't need to hard reboot the device?

@majorz
Copy link
Collaborator

majorz commented Apr 9, 2018

Hi, I will check the logs now and let you know if I find anything.

@majorz
Copy link
Collaborator

majorz commented Apr 9, 2018

The wifi-connect logs definitely reveal an issue: it should not start the access point when no SSIDs are available.

The empty SSIDs list is another issue that we need to look into. I am thinking about elevating the log level of NetworkManager and wpa_supplicant, and running some diagnostic code that we can try a few things with it. I will do some research tomorrow on this and let you know how that goes.

I had a non-related to WiFi Connect report about RPi 3 and NetworkManager not reporting any networks, so I am definitely interested in figuring out how to workaround this issue without reboots or driver reloads.

@majorz majorz self-assigned this Apr 9, 2018
@majorz majorz added the bug label Apr 9, 2018
@bbinet
Copy link
Author

bbinet commented Apr 9, 2018

Ok, thanks

@majorz
Copy link
Collaborator

majorz commented Apr 10, 2018

@bbinet I started implementing this here: https://github.com/resin-io-playground/networkmanager-empty-ssids

It elevates the log levels of NetworkManager and wpa_supplicant, and then starts to check whether the SSID list is empty each 30 seconds. If the list is empty it triggers a rescan and waits 10 seconds to check whether they are available.

I am going to do local tests with it tomorrow and hopefully I will be able to reproduce an empty list. I am thinking to try it using a router and also a hotspot on my phone.

If I am not able to reproduce, I may ask you to run the script on a device that has this issue. But let's see first what will happen with the local testing. I will keep you posted.

@bbinet
Copy link
Author

bbinet commented Apr 11, 2018

Sure, I'm also running your ssids.py script on my rpi zero w since this morning, so I let you know the results too.

@majorz
Copy link
Collaborator

majorz commented Apr 11, 2018

I was able to reproduce zero access points available locally. The RequestScan method brought back the access point list, which is good news. I will check the logs now and do more testing.

@bbinet
Copy link
Author

bbinet commented Apr 11, 2018

I've also been able to reproduce the issue.

Here are the logs of the ssids.py script:
https://gist.github.com/bbinet/e53e33236a9862bb2c70a4c6de4bbf36#file-ssids-log

Here are the logs of NetworkManager service:
https://gist.github.com/bbinet/e53e33236a9862bb2c70a4c6de4bbf36#file-nm-log

Here are the logs of the container that runs wifi-connect:
https://gist.github.com/bbinet/e53e33236a9862bb2c70a4c6de4bbf36#file-container-log

@majorz
Copy link
Collaborator

majorz commented Apr 11, 2018

@bbinet that is really useful! In your case the RequestScan call did not bring back the list with SSIDs, which is more like the actual problem, than what I was able to reproduce here.

I still have not got to chance to check the NM logs you attached for the specifics, but I will do that first thing tomorrow and will follow up with you. Thanks again!

@majorz
Copy link
Collaborator

majorz commented Apr 12, 2018

@bbinet I did different can on the instrumentation script: if the RequestScan does not work, it now tells NM to unmanage and then manage the device again. If that does not work, it then restarts the NM service. I also added timestamped logging.

Can you please run this again, but this time do not start WiFi Connect together with the script (assuming that the connection profile is defined)? Also you may reboot the device after the application update, so that we get only fresher logs.

If you reproduce something, please attach the logs of wpa_supplicant as well. Thanks!

@bbinet
Copy link
Author

bbinet commented Apr 12, 2018

@majorz but if I don't run WiFi Connect together with the script, I won't be able to log in to get the logs...
Why not run WiFi Connect together with the script?

@majorz
Copy link
Collaborator

majorz commented Apr 12, 2018

Actually now on second thought, you may leave it running, it should not be a problem.

@bbinet
Copy link
Author

bbinet commented Apr 12, 2018

@majorz Let's run it

@majorz
Copy link
Collaborator

majorz commented Apr 12, 2018

@bbinet one thing I forgot to mention is that I changed the implementation to use libnm as a higher level alternative to using dbus directly, so the Dockerfile has changed.

@majorz
Copy link
Collaborator

majorz commented Apr 12, 2018

@bbinet I found an issue with this version and applied a fix. Can you please run the latest version instead?

I also made the project to use docker-compose for easier local testing, but the old structure is preserved in the ssids directory.

@bbinet
Copy link
Author

bbinet commented Apr 12, 2018

Traceback (most recent call last):
  File "/ssids.py", line 8, in <module>
    gi.require_version("NM", "1.0")
  File "/usr/lib/python3/dist-packages/gi/__init__.py", line 118, in require_version
    raise ValueError('Namespace %s not available' % namespace)
ValueError: Namespace NM not available

@bbinet
Copy link
Author

bbinet commented Apr 12, 2018

(I've installed python3-gi, but it may not be sufficient?)

@majorz
Copy link
Collaborator

majorz commented Apr 12, 2018

It will need libnm-dev as well for this to work. I am using a Debian Buster base image, so that it is more recent and compatible with the one running on the host OS.

@bbinet
Copy link
Author

bbinet commented Apr 12, 2018

@majorz the new ssids.py script is now running (still on debian jessie image though).

I hope I'll be able to reproduce the issue this afternoon, as I'm leaving tonight for 5 days off.

@bbinet
Copy link
Author

bbinet commented Apr 12, 2018

Sorry, I meant Debian Stretch (not jessie)

@majorz
Copy link
Collaborator

majorz commented Apr 12, 2018

Thanks! I think it should be fine. I am running it locally as well, but not sure whether I will be able to reproduce. I will run it on a few other devices as well later on. Good luck!

@majorz
Copy link
Collaborator

majorz commented Apr 15, 2018

I made some adjustments on the diagnostic script, as there is a caching bug with python-gir, which led to incorrect reporting of access point count.

@bbinet
Copy link
Author

bbinet commented Apr 18, 2018

I'm back at work today, so I've updated the ssids.py script, but I have some encoding issues:

--- Logging error ---
Traceback (most recent call last):
  File "/usr/lib/python3.5/logging/__init__.py", line 983, in emit
    stream.write(msg)
UnicodeEncodeError: 'ascii' codec can't encode character '\u2014' in position 24: ordinal not in range(128)
Call stack:
  File "/ssids.py", line 191, in <module>
    main()
  File "/ssids.py", line 128, in main
    set_nm_log_level("trace")
  File "/ssids.py", line 29, in set_nm_log_level
    debug("Setting NetworkManager log level to", level)
  File "/ssids.py", line 24, in debug
    LOGGER.debug(" ".join([str(arg) for arg in args]))
Message: 'Setting NetworkManager log level to trace'
Arguments: ()
--- Logging error ---
Traceback (most recent call last):
  File "/usr/lib/python3.5/logging/__init__.py", line 983, in emit
    stream.write(msg)
UnicodeEncodeError: 'ascii' codec can't encode character '\u2014' in position 24: ordinal not in range(128)
Call stack:
  File "/ssids.py", line 191, in <module>
    main()
  File "/ssids.py", line 129, in main
    set_wpa_log_level("msgdump")
  File "/ssids.py", line 34, in set_wpa_log_level
    debug("Setting wpa_supplicant log level to", level)
  File "/ssids.py", line 24, in debug
    LOGGER.debug(" ".join([str(arg) for arg in args]))
Message: 'Setting wpa_supplicant log level to msgdump'
Arguments: ()
--- Logging error ---
Traceback (most recent call last):
  File "/usr/lib/python3.5/logging/__init__.py", line 983, in emit
    stream.write(msg)
UnicodeEncodeError: 'ascii' codec can't encode character '\u2014' in position 24: ordinal not in range(128)
Call stack:
  File "/ssids.py", line 191, in <module>
    main()
  File "/ssids.py", line 131, in main
    print_device_info()
  File "/ssids.py", line 86, in print_device_info
    debug("Device:", device.get_iface())
  File "/ssids.py", line 24, in debug
    LOGGER.debug(" ".join([str(arg) for arg in args]))
Message: 'Device: wlan0'
Arguments: ()
--- Logging error ---
Traceback (most recent call last):
  File "/usr/lib/python3.5/logging/__init__.py", line 983, in emit
    stream.write(msg)
UnicodeEncodeError: 'ascii' codec can't encode character '\u2014' in position 24: ordinal not in range(128)
Call stack:
  File "/ssids.py", line 191, in <module>
    main()
  File "/ssids.py", line 131, in main
    print_device_info()
  File "/ssids.py", line 87, in print_device_info
    debug("Driver:", device.get_driver())
  File "/ssids.py", line 24, in debug
    LOGGER.debug(" ".join([str(arg) for arg in args]))
Message: 'Driver: brcmfmac'
Arguments: ()
--- Logging error ---
Traceback (most recent call last):
  File "/usr/lib/python3.5/logging/__init__.py", line 983, in emit
    stream.write(msg)
UnicodeEncodeError: 'ascii' codec can't encode character '\u2014' in position 24: ordinal not in range(128)
Call stack:
  File "/ssids.py", line 191, in <module>
    main()
  File "/ssids.py", line 131, in main
    print_device_info()
  File "/ssids.py", line 88, in print_device_info
    debug("Driver version:", device.get_driver_version())
  File "/ssids.py", line 24, in debug
    LOGGER.debug(" ".join([str(arg) for arg in args]))
Message: 'Driver version: 7.45.41.46'
Arguments: ()
--- Logging error ---
Traceback (most recent call last):
  File "/usr/lib/python3.5/logging/__init__.py", line 983, in emit
    stream.write(msg)
UnicodeEncodeError: 'ascii' codec can't encode character '\u2014' in position 24: ordinal not in range(128)
Call stack:
  File "/ssids.py", line 191, in <module>
    main()
  File "/ssids.py", line 131, in main
    print_device_info()
  File "/ssids.py", line 89, in print_device_info
    debug("Firmware version:", device.get_firmware_version())
  File "/ssids.py", line 24, in debug
    LOGGER.debug(" ".join([str(arg) for arg in args]))
Message: 'Firmware version: 01-f8a78378'
Arguments: ()
--- Logging error ---
Traceback (most recent call last):
  File "/usr/lib/python3.5/logging/__init__.py", line 983, in emit
    stream.write(msg)
UnicodeEncodeError: 'ascii' codec can't encode character '\u2014' in position 24: ordinal not in range(128)
Call stack:
  File "/ssids.py", line 191, in <module>
    main()
  File "/ssids.py", line 134, in main
    wait(10)
  File "/ssids.py", line 119, in wait
    debug("Sleeping", seconds, "seconds...")
  File "/ssids.py", line 24, in debug
    LOGGER.debug(" ".join([str(arg) for arg in args]))
Message: 'Sleeping 10 seconds...'
Arguments: ()

@bbinet
Copy link
Author

bbinet commented Apr 18, 2018

It seems to be fixed if I set the env variable: PYTHONIOENCODING=utf-8

@majorz
Copy link
Collaborator

majorz commented Apr 19, 2018

In my testing here with RPi 3 I also reproduced multiple times a scenario where 0 access points were reported by NetworkManager. However NM detected this and triggered a RequestScan which brought back the list with access points.

I will test today with a RPi 1, since I do not have a zero w, but it has the same wifi chip, so let's see what happens.

@majorz
Copy link
Collaborator

majorz commented Apr 19, 2018

I got confused, the RPi 1 does not have a built-in wireless chip. RPi 3 B has the same wireless chip as Zero W.

@bbinet
Copy link
Author

bbinet commented Apr 19, 2018

@majorz my rpi zero w is currently running ssids.py and I'll drop you a message as soon as I've reproduced the issue.

@majorz
Copy link
Collaborator

majorz commented Apr 19, 2018

Fingers crossed!

@majorz
Copy link
Collaborator

majorz commented Apr 20, 2018

I am thinking about the following modifications in WiFi Connect:

  1. Always trigger RequestScan before retreiving the list of access points. Since with the current implementation there is not a way to subscribe for the event where the scan has completed, I will do some experimentation and see how to do it best with sleeping. Later on when we use libnm this will change. I did a few days ago the first steps towards this: https://github.com/resin-io-playground/libnm-rs. libnm is the NetworkManager official library. nmcli is using it and it handles all D-Bus communication itself + adds useful utility functions on top. Another nice thing about it is that it provides GObject introspection, so the Rust bindings will be autogenerated for the most part. The Rust autogeneration is not perfect, so I will have to do manual stuff as well unfortunately, but hopefully that will not be much effort.

  2. If a list with access points is not available then restart NetworkManager from inside WiFi Connect (if our findings confirm that this will fix the issue).

@bbinet
Copy link
Author

bbinet commented Apr 20, 2018

@majorz this sounds good.

Also, I was able to reproduce the issue this night, but sadly I had used journalctl --vacuum-size and --vacuum-time option to try to purge old logs yesterday, and it seems journalctl also purged the logs from the night, so you might not find the relevant logs in the files attached below.

I will restart completely my rpi, and run ssids.py script again...

Here are the logs of the ssids.py script:
https://gist.github.com/bbinet/aa1c3260f5756b31f20c37276d61ba27#file-ssids-log

Here are the logs of NetworkManager service:
https://gist.github.com/bbinet/aa1c3260f5756b31f20c37276d61ba27#file-nm-log

Here are the logs of wpa_supplicant service:
https://gist.github.com/bbinet/aa1c3260f5756b31f20c37276d61ba27#file-wpa_supplicant-log

Here are the logs of the container that runs wifi-connect:
https://gist.github.com/bbinet/aa1c3260f5756b31f20c37276d61ba27#file-container-log

@majorz
Copy link
Collaborator

majorz commented Apr 20, 2018

Ooops, I left a cleanup() call in the middle and forgot to remove it. The logs revealed that. I will fix it now.

@majorz
Copy link
Collaborator

majorz commented Apr 20, 2018

@bbinet Done! Please run the new version as the previous one did not attempt to unmanaged the device or restart NM. I also added PYTHONIOENCODING in the Dockerfile. And also something in the compose since wifi-connect's dnsmasq needs it (although the last one should not have affected you).

@bbinet
Copy link
Author

bbinet commented Apr 20, 2018

@majorz Thanks, I've just restarted my rpi with the new ssids.py script. Let's wait for the issue to show up again.

@bbinet
Copy link
Author

bbinet commented Apr 30, 2018

Sorry I was not able to reproduce the issue for 10 days...

@majorz
Copy link
Collaborator

majorz commented May 3, 2018

@bbinet alright, I will include the changes we talked above with invoking RequestScan on start and also restarting NetworkManager once if access points are not available.

@bbinet
Copy link
Author

bbinet commented May 3, 2018

@majorz ok.

@bbinet
Copy link
Author

bbinet commented Jul 17, 2018

@majorz Any news on this issue? I'm still getting empty list of wifi access points from time to time: if I manually restart NetworkManager from the hostos then it works again.

@majorz
Copy link
Collaborator

majorz commented Jul 17, 2018

No much progress on this issue yet. I am planning to address it till the end of the month.

@bbinet
Copy link
Author

bbinet commented Jul 17, 2018

Ok, thank you @majorz

@bbinet
Copy link
Author

bbinet commented Oct 16, 2018

I'm still facing this issue from time to time: this is quite painful as we cannot connect to the device until the next reboot (though we have configured a force reboot every night to workaround this issue).
@majorz Any news on this issue and the wifi-connect project in general, as the development seems to have stalled these days?

@majorz
Copy link
Collaborator

majorz commented Oct 17, 2018

There is a new ui branch that should be integrated as soon as I can. And also we will be basing the project on this new Rust binding library which I worked on: https://github.com/resin-io-modules/libnm-rs.

I am going to work on this once we release this bigger update, since I would like to depend on the new library for the scanning notification mechanisms.

Also we are going to integrate WiFi Connect in Resin OS itself. We are figuring the exact details currently.

@bbinet
Copy link
Author

bbinet commented Oct 17, 2018

Ok, thanks for the update, and for your efforts in this project.

@andrewjaykeller
Copy link

I'm still seeing this issue, any idea for a fix?

@majorz
Copy link
Collaborator

majorz commented Jan 18, 2019

@andrewjaykeller I left you a comment in the other thread here (about the same issue): #232 (comment)

@maggie44
Copy link

If you are still experiencing this issue can you try this branch instead and report back on whether it helps: #450

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants