[BUG] DHCP Ethernet crashes after repeated switching of network interface [OPTA/Portenta H7] #891

@Channel59

Description

I am experiencing issues with the network interface of the OPTA (the Portenta H7 probably suffers from the same problem).

I am not sure whether only the OPTA is affected, or whether the issue lies in the OPTA itself, in ArduinoCore-mbed, in mbed-os, or in a combination of those.

Setup:

A PC with multiple Ethernet interfaces, with the OPTA connected directly to eth0. The PC has a static IP and runs a DHCP server. ArduinoCore-mbed version: 4.0.10

Issue:

If eth0 is brought down and up several times from the PC, the OPTA notices: its LAN light turns off while the interface is down and turns back on when the interface comes up.
If the OPTA is configured with a static IP, this causes no problems. With DHCP enabled, however, the OPTA appears unable to rejoin the network and no longer answers pings. The OPTA's network stack does not seem to notice this: the reported state becomes NSAPI_STATUS_GLOBAL_UP even though the device is unreachable.

The same behavior can also be reproduced by repeatedly pulling and re-inserting the Ethernet cable in the RJ-45 port.

With a static IP this seems not to be an issue.
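
Since the reported status alone cannot reveal this state, detecting it needs the status combined with an independent reachability check. Below is a minimal host-side sketch of that decision logic. `LinkStatus`, `Health` and `classify` are hypothetical names, and `LinkStatus` is only a stand-in for mbed's `nsapi_connection_status_t` so that the sketch compiles without mbed headers.

```cpp
// Stand-in for mbed's nsapi_connection_status_t (hypothetical, host-side only).
enum class LinkStatus { LocalUp, GlobalUp, Disconnected, Connecting };

// Possible verdicts of the combined check (hypothetical names).
enum class Health { Ok, Down, UpButUnreachable };

// Combine what the stack reports with the result of an external probe,
// for example whether a ping from the PC got a reply.
inline Health classify(LinkStatus reported, bool probe_succeeded) {
    if (reported != LinkStatus::GlobalUp) {
        return Health::Down;  // the stack itself says the link is not usable
    }
    // The stack claims GLOBAL_UP; only the probe can confirm it.
    return probe_succeeded ? Health::Ok : Health::UpButUnreachable;
}
```

The state described in this issue corresponds to `classify(LinkStatus::GlobalUp, false)`, which yields `Health::UpButUnreachable`.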

Recovering from this state

Resetting the OPTA recovers it from this state, but that is undesirable. Removing and re-inserting the Ethernet cable also seems to recover it.

Code on the OPTA:

The issue occurs with Arduino_ConnectionHandler, with just the PortentaEthernet class from the Arduino core, and also with the plain mbed EthernetInterface.

The simplest way to reproduce the problem is with this code:

#include <Arduino.h>
#include "EthernetInterface.h"

auto eth = EthernetInterface::get_default_instance();

void status_callback(nsapi_event_t status, intptr_t param) ;

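void setup() {
  Serial.begin(115200);            // serial port used by the prints in status_callback()
  eth->set_blocking(true);         // block until connect() has finished
  eth->attach(&status_callback);   // report every connection status change
  eth->set_dhcp(true);
  eth->connect();
  eth->set_blocking(false);
}

void loop() {
  // Nothing to do here: status changes are reported through status_callback().
}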


// For connection status change events, param carries the new nsapi_connection_status_t.
void status_callback(nsapi_event_t status, intptr_t param) {
    Serial.println("Connection status changed!");
    switch (param) {
        case NSAPI_STATUS_LOCAL_UP:
            Serial.println("Local IP address set!");
            break;
        case NSAPI_STATUS_GLOBAL_UP:
            Serial.println("Global IP address set!");
            break;
        case NSAPI_STATUS_DISCONNECTED:
            Serial.println("No connection to network!");
            break;
        case NSAPI_STATUS_CONNECTING:
            Serial.println("Connecting to network!");
            break;
        default:
            Serial.println("Not supported");
            break;
    }
}

The script that triggers the failure on the OPTA:

---break-opta.sh---
#!/bin/bash
for j in {1..5}
do
    ifconfig eth0 down
    sleep 2
    ifconfig eth0 up
    sleep 2
done
if ping -c 1 192.168.0.100
then
    echo "."
else
    echo "X"
    exit 1
fi

Outputs:

First we ping the OPTA to check that it has obtained a DHCP lease:

$ ping 192.168.0.100
PING 192.168.0.100 (192.168.0.100) 56(84) bytes of data.
64 bytes from 192.168.0.100: icmp_seq=1 ttl=255 time=341 ms
64 bytes from 192.168.0.100: icmp_seq=2 ttl=255 time=0.195 ms
64 bytes from 192.168.0.100: icmp_seq=3 ttl=255 time=0.172 ms
^C
--- 192.168.0.100 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2019ms
rtt min/avg/max/mdev = 0.172/113.680/340.674/160.508 ms

This looks normal.
Then we run the break-opta.sh script:

$ sudo ./break-opta-dhcp.sh 
PING 192.168.0.100 (192.168.0.100) 56(84) bytes of data.
From 192.168.0.1 icmp_seq=1 Destination Host Unreachable

--- 192.168.0.100 ping statistics ---
1 packets transmitted, 0 received, +1 errors, 100% packet loss, time 0ms
X

The OPTA can no longer be reached.
Yet the serial output from the OPTA suggests that everything is fine:

Connection status changed!
Connecting to network!
Connection status changed!
Global IP address set!
Connection status changed!
Connecting to network!
Connection status changed!
Global IP address set!
Connection status changed!
Connecting to network!
Connection status changed!
Global IP address set!
Connection status changed!
Connecting to network!
Connection status changed!
Global IP address set!
Connection status changed!
Connecting to network!
Connection status changed!
Global IP address set!

Resetting the OPTA seems to be the only way to recover from this state.

Attempts to recover without resetting

Several things I have tried (behind a button push in the loop() function):

eth->set_blocking(true);
eth->disconnect();
eth->set_dhcp(false);
eth->set_dhcp(true);
eth->disconnect();
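
Note that the sequence above ends with disconnect() and never calls connect() again. For comparison, here is a minimal sketch of a complete cycle that finishes with connect(). `reconnect_with_dhcp` and `FakeInterface` are hypothetical names. The helper is a template so it can be exercised on a host with the fake, and it assumes the interface offers disconnect(), set_dhcp(bool) and connect() returning 0 on success, as mbed's EthernetInterface does. Whether this cycle actually recovers the OPTA from the broken state is untested.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: run one full disconnect/reconnect cycle with DHCP.
// Returns 0 on success, or the first non-zero error code.
template <typename Iface>
int reconnect_with_dhcp(Iface &iface) {
    iface.disconnect();               // result ignored: the interface may already be down
    int err = iface.set_dhcp(true);   // request a fresh DHCP-configured connection
    if (err != 0) {
        return err;
    }
    return iface.connect();           // the step missing from the sequence above
}

// Host-side stand-in that only records the order of the calls.
struct FakeInterface {
    std::vector<std::string> calls;
    int disconnect() { calls.push_back("disconnect"); return 0; }
    int set_dhcp(bool on) { calls.push_back(on ? "set_dhcp(true)" : "set_dhcp(false)"); return 0; }
    int connect() { calls.push_back("connect"); return 0; }
};
```

On the device, the same helper could be called as reconnect_with_dhcp(*eth) after eth->set_blocking(true), so that connect() returns only once the attempt has finished.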

Furthermore, I built the project with the lwIP headers included, by adding these lines to variants/OPTA/includes.txt:

-iwithprefixbefore/mbed/connectivity/lwipstack
-iwithprefixbefore/mbed/connectivity/lwipstack/include
-iwithprefixbefore/mbed/connectivity/lwipstack/include/lwipstack
-iwithprefixbefore/mbed/connectivity/lwipstack/lwip
-iwithprefixbefore/mbed/connectivity/lwipstack/lwip-sys
-iwithprefixbefore/mbed/connectivity/lwipstack/lwip-sys/arch
-iwithprefixbefore/mbed/connectivity/lwipstack/lwip/src
-iwithprefixbefore/mbed/connectivity/lwipstack/lwip/src/include
-iwithprefixbefore/mbed/connectivity/lwipstack/lwip/src/include/compat
-iwithprefixbefore/mbed/connectivity/lwipstack/lwip/src/include/compat/posix
-iwithprefixbefore/mbed/connectivity/lwipstack/lwip/src/include/compat/posix/arpa
-iwithprefixbefore/mbed/connectivity/lwipstack/lwip/src/include/compat/posix/net
-iwithprefixbefore/mbed/connectivity/lwipstack/lwip/src/include/compat/posix/sys
-iwithprefixbefore/mbed/connectivity/lwipstack/lwip/src/include/lwip
-iwithprefixbefore/mbed/connectivity/lwipstack/lwip/src/include/lwip/priv
-iwithprefixbefore/mbed/connectivity/lwipstack/lwip/src/include/lwip/prot
-iwithprefixbefore/mbed/connectivity/lwipstack/lwip/src/include/netif

This gave me access to the lwIP functions below, which I called after the button push.

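#include "lwip/netif.h"     // struct netif, netif_list, NETIF_FLAG_ETHERNET
#include "lwip/dhcp.h"      // dhcp_network_changed(), dhcp_renew()
#include "lwip/timeouts.h"  // lwIP timeout handling

// Return the first Ethernet-capable lwIP interface, or NULL if there is none.
struct netif *find_netif() {
    struct netif *netif = netif_list;
    while (netif) {
        if (netif->flags & NETIF_FLAG_ETHERNET) {
            return netif;
        }
        netif = netif->next;
    }
    return NULL;
}
struct netif *netif = find_netif();

// Calls tried after the button push:
dhcp_network_changed(netif);
sys_check_timeouts();
dhcp_renew(netif);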

None of these attempts restored network connectivity.

Does anyone have an idea as to how to fix this?
