Pico W drops from Web Workflow list of networked CircuitPython devices #7346

Closed
RetiredWizard opened this issue Dec 15, 2022 · 19 comments

@RetiredWizard

RetiredWizard commented Dec 15, 2022

CircuitPython version

Adafruit CircuitPython 8.0.0-beta.5-11-gfc13fba6e-dirty on 2022-12-14; Raspberry Pi Pico W with rp2040

Code/REPL

import wifi
import socketpool
import ssl
import adafruit_requests
from os import getenv

TEXT_URL = "http://wifitest.adafruit.com/testwifi/index.html"

print("Connecting...")
wifi.radio.connect(getenv('CIRCUITPY_WIFI_SSID'),getenv('CIRCUITPY_WIFI_PASSWORD'))
print("My IP address is", wifi.radio.ipv4_address)

pool = socketpool.SocketPool(wifi.radio)
requests = adafruit_requests.Session(pool, ssl.create_default_context())

input("Check web workflow devices on network and then press enter")
response = requests.get(TEXT_URL)
print("Check web workflow devices on network again")

Behavior

When connecting to the Pico W over the USB serial port:
Up until response = requests.get(TEXT_URL) is executed in the test script above, the Pico W is listed by other CircuitPython devices as available on the network. Once the requests.get call is made, the Pico W is no longer listed (after a browser refresh) until a Ctrl-D restart occurs.

If the local web workflow server on the Pico W is accessed via a browser (main page, serial terminal, file browser or full terminal), the Pico W is dropped from the available devices list.

Description

Even though the Pico W is not listed by other devices, it can still be accessed via the microcontroller-hosted site, and the file browser, serial terminal, and info page all work properly.

Additional information

I am testing from a Windows 8 PC and not running mDNS.

@tannewt tannewt added network rp2 Raspberry Pi RP2 Micros labels Dec 15, 2022
@tannewt tannewt added this to the 8.0.0 milestone Dec 15, 2022
@tannewt
Member

tannewt commented Dec 15, 2022

I wonder if the additional requests are turning off the MDNS responder.

@RetiredWizard
Author

I tested this out on a Feather ESP32-S3 and didn't see the same issue.

@anecdata
Member

Does the dual-core S3 have an advantage with blocking Python or core code, vs. underlying priorities of core tasks?

@RetiredWizard
Author

I think the S2 is a single core and I just tested the latest CircuitPython on the QT Py ESP32-S2 and the web workflow didn't drop the device from the list when this code was run.

@dhalbert
Collaborator

Does the dual-core S3 have an advantage with blocking Python or core code, vs. underlying priorities of core tasks?

I would say not really: the RTOS can take care of allowing the IDF network code to run, on a single core or on the other core, if available. If something in the Python code is blocking network code, that is a bug.

@jepler jepler self-assigned this Jan 3, 2023
jepler added a commit to jepler/circuitpython that referenced this issue Jan 4, 2023
This is a speculative fix for adafruit#7346. The theory is that having some
active socket objects uses up the small (4000-byte) pool reserved for
lwip and prevents mdns from operating properly. However, as I wasn't
able to reproduce the problem with the given script, this is just a
guess.
@RetiredWizard
Author

I don't know if it helps, but I traced the steps through the adafruit_requests.py library, and the line that appears to shut down the responder is:

sock.connect((connect_host, port))  # line 524
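
One way to check whether the TCP connect by itself is enough to trigger the drop, with no HTTP or TLS on top, is a connect-only helper. This is a minimal sketch, not code from adafruit_requests: the name connect_only and the timeout default are made up for illustration. It accepts any pool-like object that exposes socket(), AF_INET and SOCK_STREAM, so it can also be exercised on a desktop with CPython's socket module.

```python
def connect_only(pool, host, port, timeout=5):
    """Open one TCP connection and close it again, sending no data.

    `pool` is assumed to be a socketpool.SocketPool on the board, or
    CPython's `socket` module on a desktop (hypothetical helper).
    """
    sock = pool.socket(pool.AF_INET, pool.SOCK_STREAM)
    try:
        sock.settimeout(timeout)
        sock.connect((host, port))
        return True
    except OSError as exc:
        # Connection refused, timed out, host unreachable, etc.
        print("connect failed:", exc)
        return False
    finally:
        sock.close()


# On the Pico W (assumes wifi.radio is already connected):
#   import wifi, socketpool
#   pool = socketpool.SocketPool(wifi.radio)
#   connect_only(pool, "wifitest.adafruit.com", 80)
```

After running it on the Pico W, refreshing another device's Web Workflow page shows whether the Pico W is still listed.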

@RetiredWizard
Author

RetiredWizard commented Jan 5, 2023

After looking through common-hal/socketpool/Socket.c (and selectively commenting out code), I believe

err = tcp_connect(socket->pcb.tcp, &dest, port, _lwip_tcp_connected);  // line 981

is the line that causes the issue. I took a look at the tcp_connect function in tcp.c and compared it to the same function in esp_tls.c, hoping that the comments would suggest a clue. The only thing I noticed (and understood) was that the ESP module sets the socket to "non-blocking" before connecting. That sounds to me like it would only affect the process while attempting to make the connection, and since the Pico W never comes back to the list after executing the requests.get, I'm doubtful the non-blocking setting is relevant.

@RetiredWizard
Author

A couple more observations during testing:

On the first power-up after flashing CircuitPython, Web Workflow doesn't start (the device isn't listed on other boards' pages, the title bar doesn't show the IP address until wifi.radio.connect is called, and the hosted web page isn't available). Ctrl-D soft reboots don't bring it up. The network stack seems to be working: once it's connected to an access point via wifi.radio, the device can be pinged and it can access the network, but none of the Web Workflow services are available.

The next power cycle of the board starts the Web Workflow normally.

Accessing the Pico W's hosted web page has the same effect as response = requests.get(TEXT_URL); that is, once you access the web workflow site on the Pico W, it will no longer be listed on other boards' pages. Unlike with my test code, a soft Ctrl-D reboot of the Pico W does not cause it to show up again on other boards' pages. However, a power cycle will.

@jepler
Member

jepler commented Jan 8, 2023

Immediately after flashing a uf2, what does CircuitPython give as the reset reason?

@jepler
Member

jepler commented Jan 8, 2023

When I upload a uf2, the device restarts with the reason being

>>> microcontroller.cpu.reset_reason
microcontroller.ResetReason.WATCHDOG

the web workflow may skip being started depending on reset reason:

    // Skip starting the workflow if we're not starting from power on or reset.
    const mcu_reset_reason_t reset_reason = common_hal_mcu_processor_get_reset_reason();
    if (reset_reason != RESET_REASON_POWER_ON &&
        reset_reason != RESET_REASON_RESET_PIN &&
        reset_reason != RESET_REASON_UNKNOWN &&
        reset_reason != RESET_REASON_SOFTWARE) {
        return;
    }
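
The same check can be restated in Python to see which reset reasons let the workflow start. This is only an illustrative model of the C snippet above, not CircuitPython code: the function name workflow_would_start is hypothetical, and it assumes the reason is passed either as a bare name or as the dotted string the REPL prints.

```python
# Reset reasons for which the C snippet above does NOT return early.
START_REASONS = {"POWER_ON", "RESET_PIN", "UNKNOWN", "SOFTWARE"}


def workflow_would_start(reset_reason):
    """Return True if the modeled check would let the web workflow start.

    Accepts "WATCHDOG" as well as "microcontroller.ResetReason.WATCHDOG";
    only the part after the last dot is compared.
    """
    name = str(reset_reason).rsplit(".", 1)[-1]
    return name in START_REASONS


for reason in ("POWER_ON", "microcontroller.ResetReason.WATCHDOG"):
    print(reason, "->", workflow_would_start(reason))
```

With WATCHDOG as the reason, as reported after uploading a uf2, the model returns False, which is consistent with the workflow not starting until the next power cycle.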

@RetiredWizard
Author

I also get the WATCHDOG reason after flashing.

the web workflow may skip being started depending on reset reason:

Makes sense to me 😁

@tannewt
Member

tannewt commented Jan 18, 2023

I've fixed the reset reason issue with #7462.

I wasn't able to reproduce the MDNS dropping over the long term. I missed one or two scans but then it came back without any intervention from me. MDNS is unreliable so this is ok by me.

@tannewt tannewt assigned tannewt and unassigned jepler Jan 18, 2023
@RetiredWizard
Author

RetiredWizard commented Jan 18, 2023

I loaded the #7462 artifacts and I'm still seeing the Pico W drop from the list of devices after the test code from above runs...

Refreshed many times and it's still missing after several minutes.

I'm not sure how high a priority this should be. The Pico W functions perfectly; it's just not listed on other boards' pages on my Windows 8 non-mDNS network. On my network the CPYxxxxx host name resolutions don't work either.

dhalbert pushed a commit that referenced this issue Jan 20, 2023
Watchdogs are used to reboot out of the bootloader. There is a
scratch register for user watchdogs. So use sdk functions to better
distinguish these.

Related to #7346
@tannewt tannewt modified the milestones: 8.0.0, 8.x.x Jan 26, 2023
@RetiredWizard
Author

I just spotted @anecdata's Discord question: https://discord.com/channels/327254708534116352/327298996332658690/1072356309715918848

Any chance this issue is related to the number of advertised services on the Pico W?

@anecdata
Member

anecdata commented Feb 7, 2023

I didn't think this was related to #7543, but given that it seems to be triggered when requests grabs a socket (or two), it's possible.

Does it still drop with requests = adafruit_requests.Session(pool, None)?

(I'm not sure where in the code a TLS socket actually grabs two sockets, so this question could be off the deep end)

@RetiredWizard
Author

I'm down another rabbit hole at the moment, but when I come up, I'll take a look.

@RetiredWizard
Author

RetiredWizard commented Feb 8, 2023

Replacing the requests = line with the pool, None version had no impact: the Pico W is still dropped from the list of devices when the response = requests.get(TEXT_URL) call is made.

I did notice that when I was running this latest test, my board was executing a call to adafruit_ntp in code.py, which apparently didn't bother mDNS, as it didn't cause the Pico W to drop from the list.
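
NTP runs over UDP, while requests.get opens a TCP connection, so this observation hints that the drop may be specific to TCP sockets. A rough sketch for checking that with a single UDP datagram follows. The name udp_send_only and the default payload are hypothetical, and the helper assumes a pool-like object exposing socket(), AF_INET and SOCK_DGRAM (a socketpool.SocketPool on the board, or CPython's socket module on a desktop).

```python
def udp_send_only(pool, host, port, payload=b"\x00"):
    """Send one UDP datagram and close the socket.

    Returns the number of bytes handed to the network stack.
    No reply is awaited, so nothing needs to be listening on `port`.
    """
    sock = pool.socket(pool.AF_INET, pool.SOCK_DGRAM)
    try:
        return sock.sendto(payload, (host, port))
    finally:
        sock.close()


# On the Pico W (assumes wifi.radio is already connected):
#   import wifi, socketpool
#   pool = socketpool.SocketPool(wifi.radio)
#   udp_send_only(pool, "wifitest.adafruit.com", 9)
```

If the Pico W stays listed after this call but drops after a TCP connect, that would point at the TCP path rather than at sockets in general.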

@anecdata
Member

anecdata commented Feb 8, 2023

curiouser and curiouser

@RetiredWizard
Author

This appears to have been fixed by PR #7589
