Border router disappears randomly from home assistant thread network #2216

Closed
paolofaz opened this issue Mar 10, 2024 · 5 comments

Comments

@paolofaz

I'm trying the latest official OpenThread Docker image with a sonoff-e dongle flashed with the latest OpenThread firmware (2.4.1).

I've enabled the OTBR integration in Home Assistant (http://my-ip:8081) and it connects fine.

The Thread integration is auto-discovered by Home Assistant and it creates a network with a name like "ha-thread-XXXX", as you can see in the image below:

1

After a few minutes (it varies, but about 5), the border router disappears from the Thread network with the error "no border routers were found", as you can see in the image below:

2

If I click "reset border router", Home Assistant creates a new Thread network with another name (ha-thread-YYYY), and after a few minutes I get the same problem: the border router disappears again, and so on.

The border router web page is always reachable.

The logs don't help because I don't see any errors.

This is my compose YAML:

```yaml
  otbr:
    container_name: otbr
    image: openthread/otbr:latest
    ports:
      - "8086:80"
      - "8081:8081"
    volumes:
      - /dev/ttyACM0:/dev/ttyUSB0
      - /home/paolo/docker/thread:/var/lib/thread
    privileged: true
    cap_add:
      - SYS_ADMIN
      - NET_ADMIN
    command: --radio-url spinel+hdlc+uart:///dev/ttyUSB0?uart-baudrate=460800
    sysctls:
      - net.ipv6.conf.all.disable_ipv6=0
      - net.ipv4.conf.all.forwarding=1
      - net.ipv6.conf.all.forwarding=1
    dns:
      - 127.0.0.1
    networks:
      - rete_otbr
    ulimits: # THIS IS NECESSARY, otherwise rsyslog takes a long time to start; with this it starts in a few seconds. Bug?
      nofile:
        soft: "65536"
        hard: "65536"

networks:
  rete_otbr:
    driver: bridge
    driver_opts:
      com.docker.network.bridge.name: "otbr0"
    enable_ipv6: true
    ipam:
      config:
        - subnet: fd11:db8:1::/64
```

@agners
Contributor

agners commented Mar 12, 2024

The border router list shown in Home Assistant is based on the mDNS/DNS-SD _meshcop._udp service. Usually the timeout is 30 min or so, so if the OTBR/mDNSResponder crashes hard, the border router would disappear only after that period. But if the border router already disappears after a few minutes, it sounds more like it is gracefully announcing a service removal.

Anything in the logs of the otbr container (docker logs otbr)?

@paolofaz
Author

paolofaz commented Mar 14, 2024

@agners I've copied only the relevant sections of the otbr container logs:
```
...
++ RESOLV_CONF_HEAD=/etc/resolvconf/resolv.conf.d/head
+ . script/_firewall
++ FIREWALL_SERVICE=/etc/init.d/otbr-firewall
++ sudo modprobe ip6table_filter
sudo: modprobe: command not found
++ true
++ FIREWALL=1
+ main
...
...
Mar 14 08:20:41 6cd2c7eaf122 otbr-agent[152]: [INFO]-BA------: Result of publish meshcop service OpenThread BorderRouter #6DC7._meshcop._udp.local: OK
Mar 14 08:20:41 6cd2c7eaf122 otbr-agent[152]: [INFO]-BA------: Result of publish meshcop service OpenThread BorderRouter #6DC7._meshcop._udp.local: OK
Mar 14 08:20:41 6cd2c7eaf122 avahi-daemon[136]: Withdrawing address record for fd11:db8:1::2 on eth0.
Mar 14 08:20:41 6cd2c7eaf122 avahi-daemon[136]: Withdrawing address record for ::1 on lo.
Mar 14 08:20:41 6cd2c7eaf122 avahi-daemon[136]: Withdrawing address record for 127.0.0.1 on lo.
Mar 14 08:20:41 6cd2c7eaf122 avahi-daemon[136]: Host name conflict, retrying with 6cd2c7eaf122-3
Mar 14 08:20:41 6cd2c7eaf122 avahi-daemon[136]: Registering new address record for fd11:db8:1::2 on eth0.*.
Mar 14 08:20:41 6cd2c7eaf122 avahi-daemon[136]: Registering new address record for 172.18.0.2 on eth0.IPv4.
Mar 14 08:20:41 6cd2c7eaf122 avahi-daemon[136]: Registering new address record for ::1 on lo.*.
Mar 14 08:20:41 6cd2c7eaf122 avahi-daemon[136]: Registering new address record for 127.0.0.1 on lo.IPv4.
Mar 14 08:20:42 6cd2c7eaf122 avahi-daemon[136]: Withdrawing address record for fd11:db8:1::2 on eth0.
Mar 14 08:20:42 6cd2c7eaf122 avahi-daemon[136]: Withdrawing address record for ::1 on lo.
Mar 14 08:20:42 6cd2c7eaf122 avahi-daemon[136]: Withdrawing address record for 127.0.0.1 on lo.
Mar 14 08:20:42 6cd2c7eaf122 avahi-daemon[136]: Host name conflict, retrying with 6cd2c7eaf122-4
Mar 14 08:20:42 6cd2c7eaf122 avahi-daemon[136]: Registering new address record for fd11:db8:1::2 on eth0.*.
Mar 14 08:20:42 6cd2c7eaf122 avahi-daemon[136]: Registering new address record for 172.18.0.2 on eth0.IPv4.
Mar 14 08:20:42 6cd2c7eaf122 avahi-daemon[136]: Registering new address record for ::1 on lo.*.
Mar 14 08:20:42 6cd2c7eaf122 avahi-daemon[136]: Registering new address record for 127.0.0.1 on lo.IPv4.
Mar 14 08:20:43 6cd2c7eaf122 avahi-daemon[136]: Withdrawing address record for fd11:db8:1::2 on eth0.
Mar 14 08:20:43 6cd2c7eaf122 avahi-daemon[136]: Withdrawing address record for ::1 on lo.
Mar 14 08:20:43 6cd2c7eaf122 avahi-daemon[136]: Withdrawing address record for 127.0.0.1 on lo.
Mar 14 08:20:43 6cd2c7eaf122 avahi-daemon[136]: Host name conflict, retrying with 6cd2c7eaf122-5
Mar 14 08:20:43 6cd2c7eaf122 avahi-daemon[136]: Registering new address record for fd11:db8:1::2 on eth0.*.
Mar 14 08:20:43 6cd2c7eaf122 avahi-daemon[136]: Registering new address record for 172.18.0.2 on eth0.IPv4.
Mar 14 08:20:43 6cd2c7eaf122 avahi-daemon[136]: Registering new address record for ::1 on lo.*.
Mar 14 08:20:43 6cd2c7eaf122 avahi-daemon[136]: Registering new address record for 127.0.0.1 on lo.IPv4.
Mar 14 08:20:45 6cd2c7eaf122 avahi-daemon[136]: Withdrawing address record for fd11:db8:1::2 on eth0.
Mar 14 08:20:45 6cd2c7eaf122 avahi-daemon[136]: Withdrawing address record for ::1 on lo.
Mar 14 08:20:45 6cd2c7eaf122 avahi-daemon[136]: Withdrawing address record for 127.0.0.1 on lo.
Mar 14 08:20:45 6cd2c7eaf122 avahi-daemon[136]: Host name conflict, retrying with 6cd2c7eaf122-6
Mar 14 08:20:45 6cd2c7eaf122 avahi-daemon[136]: Registering new address record for fd11:db8:1::2 on eth0.*.
Mar 14 08:20:45 6cd2c7eaf122 avahi-daemon[136]: Registering new address record for 172.18.0.2 on eth0.IPv4.
Mar 14 08:20:45 6cd2c7eaf122 avahi-daemon[136]: Registering new address record for ::1 on lo.*.
Mar 14 08:20:45 6cd2c7eaf122 avahi-daemon[136]: Registering new address record for 127.0.0.1 on lo.IPv4.
...
...
Mar 14 08:20:46 6cd2c7eaf122 avahi-daemon[136]: Server startup complete. Host name is 6cd2c7eaf122-6.local. Local service cookie is 3781344350.
Mar 14 08:20:46 6cd2c7eaf122 mDNSResponder: Default: mDNSCoreReceiveResponse: Received from 172.18.0.2:5353 22 2.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.1.0.0.0.8.b.d.0.1.1.d.f.ip6.arpa. PTR 6cd2c7eaf122-6.local.
Mar 14 08:20:46 6cd2c7eaf122 mDNSResponder: Default: mDNSCoreReceiveResponse: Unexpected conflict discarding 20 2.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.0.1.0.0.0.8.B.D.0.1.1.D.F.ip6.arpa. PTR 6cd2c7eaf122.local.
Mar 14 08:20:46 6cd2c7eaf122 mDNSResponder: Default: mDNSCoreReceiveResponse: Received from 172.18.0.2:5353 22 2.0.18.172.in-addr.arpa. PTR 6cd2c7eaf122-6.local.
Mar 14 08:20:46 6cd2c7eaf122 mDNSResponder: Default: mDNSCoreReceiveResponse: Unexpected conflict discarding 20 2.0.18.172.in-addr.arpa. PTR 6cd2c7eaf122.local.
Mar 14 08:21:40 6cd2c7eaf122 otbr-agent[152]: 00:01:00.032 [D] P-RadioSpinel-: Trying to get RCP time offset
Mar 14 08:21:40 6cd2c7eaf122 otbr-agent[152]: 00:01:00.032 [D] P-RadioSpinel-: Sent spinel frame, flg:0x2, iid:0, tid:3, cmd:PROP_VALUE_GET, key:TIMESTAMP
Mar 14 08:21:40 6cd2c7eaf122 otbr-agent[152]: 00:01:00.032 [D] P-RadioSpinel-: Wait response: tid=3 key=2050
Mar 14 08:21:40 6cd2c7eaf122 otbr-agent[152]: 00:01:00.034 [D] P-RadioSpinel-: Received spinel frame, flg:0x2, iid:0, tid:3, cmd:PROP_VALUE_IS, key:TIMESTAMP, timestamp:134738041
Mar 14 08:22:40 6cd2c7eaf122 otbr-agent[152]: 00:02:00.035 [D] P-RadioSpinel-: Trying to get RCP time offset
Mar 14 08:22:40 6cd2c7eaf122 otbr-agent[152]: 00:02:00.035 [D] P-RadioSpinel-: Sent spinel frame, flg:0x2, iid:0, tid:4, cmd:PROP_VALUE_GET, key:TIMESTAMP
...
...
Mar 14 08:25:36 6cd2c7eaf122 mDNSResponder: Default: mDNSCoreReceiveResponse: Unexpected conflict discarding 20 2.0.0.0.2.1.E.F.F.F.C.A.2.4.0.0.1.A.8.B.A.9.6.2.3.4.F.1.4.F.D.F.ip6.arpa. PTR 6cd2c7eaf122.local.
Mar 14 08:25:40 6cd2c7eaf122 otbr-agent[152]: 00:05:00.043 [D] P-RadioSpinel-: Trying to get RCP time offset
Mar 14 08:25:40 6cd2c7eaf122 otbr-agent[152]: 00:05:00.043 [D] P-RadioSpinel-: Sent spinel frame, flg:0x2, iid:0, tid:2, cmd:PROP_VALUE_GET, key:TIMESTAMP
Mar 14 08:25:40 6cd2c7eaf122 otbr-agent[152]: 00:05:00.043 [D] P-RadioSpinel-: Wait response: tid=2 key=2050
Mar 14 08:25:40 6cd2c7eaf122 otbr-agent[152]: 00:05:00.044 [D] P-RadioSpinel-: Received spinel frame, flg:0x2, iid:0, tid:2, cmd:PROP_VALUE_IS, key:TIMESTAMP, timestamp:374754361
Mar 14 08:25:43 6cd2c7eaf122 otbr-agent[152]: 00:05:03.542 [I] Mle-----------: Send Advertisement (ff02:0:0:0:0:0:0:1)
Mar 14 08:25:43 6cd2c7eaf122 otbr-agent[152]: 00:05:03.543 [D] P-RadioSpinel-: Sent spinel frame, flg:0x2, iid:0, tid:3, cmd:PROP_VALUE_SET, key:STREAM_RAW, len:69, channel:15, maxbackoffs:4, maxretries:15 ...
Mar 14 08:25:43 6cd2c7eaf122 otbr-agent[152]: 00:05:03.543 [D] P-RadioSpinel-: ... csmaCaEnabled:1, isHeaderUpdated:0, isARetx:0, skipAes:0, txDelay:0, txDelayBase:0
Mar 14 08:25:43 6cd2c7eaf122 otbr-agent[152]: 00:05:03.551 [D] P-RadioSpinel-: Received spinel frame, flg:0x2, iid:0, tid:3, cmd:PROP_VALUE_IS, key:LAST_STATUS, status:OK
Mar 14 08:25:43 6cd2c7eaf122 otbr-agent[152]: 00:05:03.551 [I] MeshForwarder-: Sent IPv6 UDP msg, len:90, chksum:d053, ecn:no, to:0xffff, sec:no, prio:net
Mar 14 08:25:43 6cd2c7eaf122 otbr-agent[152]: 00:05:03.551 [I] MeshForwarder-: src:[fe80:0:0:0:b8db:8a8:e4a6:6dc7]:19788
Mar 14 08:25:43 6cd2c7eaf122 otbr-agent[152]: 00:05:03.551 [I] MeshForwarder-: dst:[ff02:0:0:0:0:0:0:1]:19788
```

### And then the border router drops out at 08:27:25. These are the logs:

```
Mar 14 08:27:14 6cd2c7eaf122 otbr-agent[152]: 00:06:34.595 [D] P-RadioSpinel-: ... csmaCaEnabled:1, isHeaderUpdated:0, isARetx:0, skipAes:0, txDelay:0, txDelayBase:0
Mar 14 08:27:14 6cd2c7eaf122 otbr-agent[152]: 00:06:34.602 [D] P-RadioSpinel-: Received spinel frame, flg:0x2, iid:0, tid:12, cmd:PROP_VALUE_IS, key:LAST_STATUS, status:OK
Mar 14 08:27:14 6cd2c7eaf122 otbr-agent[152]: 00:06:34.602 [I] MeshForwarder-: Sent IPv6 UDP msg, len:90, chksum:cec9, ecn:no, to:0xffff, sec:no, prio:net
Mar 14 08:27:14 6cd2c7eaf122 otbr-agent[152]: 00:06:34.603 [I] MeshForwarder-: src:[fe80:0:0:0:b8db:8a8:e4a6:6dc7]:19788
Mar 14 08:27:14 6cd2c7eaf122 otbr-agent[152]: 00:06:34.603 [I] MeshForwarder-: dst:[ff02:0:0:0:0:0:0:1]:19788
Mar 14 08:27:30 6cd2c7eaf122 otbr-agent[152]: 00:06:50.384 [I] Mle-----------: Send Advertisement (ff02:0:0:0:0:0:0:1)
Mar 14 08:27:30 6cd2c7eaf122 otbr-agent[152]: 00:06:50.384 [D] P-RadioSpinel-: Sent spinel frame, flg:0x2, iid:0, tid:13, cmd:PROP_VALUE_SET, key:STREAM_RAW, len:69, channel:15, maxbackoffs:4, maxretries:15 ...
Mar 14 08:27:30 6cd2c7eaf122 otbr-agent[152]: 00:06:50.384 [D] P-RadioSpinel-: ... csmaCaEnabled:1, isHeaderUpdated:0, isARetx:0, skipAes:0, txDelay:0, txDelayBase:0
Mar 14 08:27:30 6cd2c7eaf122 otbr-agent[152]: 00:06:50.393 [D] P-RadioSpinel-: Received spinel frame, flg:0x2, iid:0, tid:13, cmd:PROP_VALUE_IS, key:LAST_STATUS, status:OK
Mar 14 08:27:30 6cd2c7eaf122 otbr-agent[152]: 00:06:50.393 [I] MeshForwarder-: Sent IPv6 UDP msg, len:90, chksum:db5f, ecn:no, to:0xffff, sec:no, prio:net
Mar 14 08:27:30 6cd2c7eaf122 otbr-agent[152]: 00:06:50.393 [I] MeshForwarder-: src:[fe80:0:0:0:b8db:8a8:e4a6:6dc7]:19788
Mar 14 08:27:30 6cd2c7eaf122 otbr-agent[152]: 00:06:50.393 [I] MeshForwarder-: dst:[ff02:0:0:0:0:0:0:1]:19788
```

@agners
Contributor

agners commented Mar 14, 2024

Uh, I am a bit confused: is that container running mDNSResponder and avahi-daemon at the same time? 🤔

It does seem to me that something is wrong with the OTBR container, as it logs these "Withdrawing" messages. Maybe that bridge configuration is causing problems?
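A quick way to check both points (two mDNS daemons logging side by side, and avahi repeatedly hitting host name conflicts) is to summarize the container log per daemon. The following is only a rough sketch: the regular expression assumes the syslog-style line format shown in the logs above, and `summarize` is a hypothetical helper name, not part of OTBR.

```python
import re
from collections import Counter

# Matches syslog-style lines such as:
#   Mar 14 08:20:41 6cd2c7eaf122 avahi-daemon[136]: Host name conflict, ...
# Assumption: every relevant line follows this "date host daemon[pid]: msg" shape.
LINE_RE = re.compile(
    r"^\s*(?P<ts>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+"
    r"(?P<daemon>[\w.-]+)(?:\[\d+\])?:\s+(?P<msg>.*)$"
)


def summarize(lines):
    """Count log lines per daemon and collect avahi host name conflicts."""
    per_daemon = Counter()
    conflicts = []
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip shell trace output and elided "..." lines
        per_daemon[match["daemon"]] += 1
        if match["daemon"] == "avahi-daemon" and "Host name conflict" in match["msg"]:
            conflicts.append((match["ts"], match["msg"]))
    return per_daemon, conflicts
```

To use it, feed it the output of `docker logs otbr 2>&1`, for example by passing `sys.stdin` to `summarize` in a small script. If both `avahi-daemon` and `mDNSResponder` show up in the per-daemon counts, the container is indeed running two mDNS stacks at once.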

It doesn't look like the official docs on how to use the Docker container use a bridged setup, maybe the container is not designed to be used that way?

I'd suggest monitoring mDNS announcements from the outside to see if there are remove announcements. That would explain the behavior you see in Home Assistant.
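One possible way to watch for such remove announcements, sketched with only the Python standard library: in mDNS a record sent with TTL 0 is a "goodbye" (RFC 6762), so the idea is to join the mDNS multicast group and print any `_meshcop._udp` record that arrives with TTL 0. This is a minimal sketch under several assumptions: IPv4 only, run on another machine in the same LAN (or on the host), and all function names here are made up for illustration.

```python
import socket
import struct
import time

MDNS_GROUP = "224.0.0.251"
MDNS_PORT = 5353
SERVICE = "_meshcop._udp.local."


def read_name(data, offset):
    """Decode a DNS name, following compression pointers.

    Returns (name, offset just past the name at its original position).
    """
    labels, end, hops = [], None, 0
    while True:
        length = data[offset]
        if length & 0xC0 == 0xC0:  # compression pointer
            if end is None:
                end = offset + 2
            offset = ((length & 0x3F) << 8) | data[offset + 1]
            hops += 1
            if hops > 32:
                raise ValueError("too many compression pointers")
            continue
        offset += 1
        if length == 0:
            break
        labels.append(data[offset:offset + length].decode("utf-8", "replace"))
        offset += length
    return ".".join(labels) + ".", (end if end is not None else offset)


def parse_records(data):
    """Return (name, rrtype, ttl) for every resource record in a DNS message."""
    _id, _flags, qd, an, ns, ar = struct.unpack_from("!6H", data, 0)
    offset = 12
    for _ in range(qd):  # skip the question section
        _name, offset = read_name(data, offset)
        offset += 4  # QTYPE + QCLASS
    records = []
    for _ in range(an + ns + ar):
        name, offset = read_name(data, offset)
        rrtype, _rrclass, ttl, rdlen = struct.unpack_from("!HHIH", data, offset)
        offset += 10 + rdlen
        records.append((name, rrtype, ttl))
    return records


def goodbyes(data, service=SERVICE):
    """Names of records under `service` that are announced with TTL 0."""
    return [name for name, _rrtype, ttl in parse_records(data)
            if ttl == 0 and name.lower().endswith(service)]


def listen():
    """Join the IPv4 mDNS group and print goodbye records until interrupted."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", MDNS_PORT))
    mreq = struct.pack("4s4s", socket.inet_aton(MDNS_GROUP), socket.inet_aton("0.0.0.0"))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    while True:
        data, (addr, _port) = sock.recvfrom(9000)
        try:
            for name in goodbyes(data):
                print(f"{time.strftime('%H:%M:%S')} goodbye from {addr}: {name}")
        except (ValueError, IndexError, struct.error):
            pass  # ignore malformed packets
```

Calling `listen()` and leaving it running until the border router vanishes from Home Assistant should show whether a goodbye was sent at that moment and from which address. If nothing is printed, the record more likely expired or never reached the LAN in the first place.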

Then the question becomes why is the OTBR sending remove announcements?

If you want an easy setup that just works, I can recommend using Home Assistant OS. It offers the OpenThread Border Router add-on, which is built from the upstream OpenThread repositories (see https://github.com/home-assistant/addons/tree/master/openthread_border_router). The whole stack of Home Assistant OS + Home Assistant Core + OTBR add-on is well tested and known to work well.

@lineumaciel

lineumaciel commented Mar 21, 2024

It all depends on your type of installation. Can you tell us more? You have the typical symptoms of OTBR in bridge mode and Home Assistant in host mode. In such a configuration you need to build a completely new OTBR image. What on the host is responsible for mDNS?
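To confirm which network mode each container actually runs in, the `HostConfig.NetworkMode` field of `docker inspect` can be read. A small sketch, assuming the containers are named `otbr` and `homeassistant` (adjust to your own names; the helper functions are hypothetical):

```python
import json
import subprocess


def inspect(container):
    """Run `docker inspect <container>` and return its raw JSON output."""
    return subprocess.run(
        ["docker", "inspect", container],
        check=True, capture_output=True, text=True,
    ).stdout


def network_mode(inspect_json):
    """Extract HostConfig.NetworkMode from `docker inspect` JSON output."""
    containers = json.loads(inspect_json)
    return containers[0]["HostConfig"]["NetworkMode"]


def describe(mode):
    """Give a rough hint about what the network mode means for mDNS."""
    if mode == "host":
        return "host networking: the container shares the host's interfaces"
    return f"'{mode}' networking: mDNS multicast may not reach the LAN"
```

For example, `print(describe(network_mode(inspect("otbr"))))` prints the hint for the OTBR container. A value other than `host` for OTBR combined with `host` for Home Assistant would match the bridge/host mix described above.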

OTBR works in host mode without major problems but, as I mentioned, you need to prepare it for this. Below you will find a Dockerfile to build OTBR for host mode. I personally use Avahi instead of mDNSResponder.

My Dockerfile.

ARG BASE_IMAGE=ubuntu:bionic
FROM ${BASE_IMAGE}

ARG INFRA_IF_NAME
ARG BORDER_ROUTING
ARG BACKBONE_ROUTER
ARG OT_BACKBONE_CI
ARG OTBR_OPTIONS
ARG DNS64
ARG NAT64
ARG NAT64_SERVICE
ARG NAT64_DYNAMIC_POOL
ARG REFERENCE_DEVICE
ARG RELEASE
ARG REST_API
ARG WEB_GUI
ARG MDNS
ARG FIREWALL

ENV INFRA_IF_NAME=${INFRA_IF_NAME:-eth0}
ENV BORDER_ROUTING=${BORDER_ROUTING:-1}
ENV BACKBONE_ROUTER=${BACKBONE_ROUTER:-1}
ENV OT_BACKBONE_CI=${OT_BACKBONE_CI:-0}
ENV OTBR_MDNS=${MDNS:-avahi}
ENV OTBR_OPTIONS=${OTBR_OPTIONS:-"-DOT_THREAD_VERSION=1.3 -DOT_FULL_LOGS=ON -DOT_DUA=ON -DOT_MLR=ON -DOTBR_DBUS=OFF -DOTBR_TREL=ON -DOT_DIAGNOSTIC=1 -DOT_LINK_RAW=1 -DOTBR_VENDOR_NAME=HomeAssistant -DOTBR_PRODUCT_NAME=OpenThreadBorderRouter -DBUILD_TESTING=OFF -DCMAKE_INSTALL_PREFIX=/usr -DOTBR_FEATURE_FLAGS=ON -DOTBR_DNSSD_DISCOVERY_PROXY=ON -DOTBR_SRP_ADVERTISING_PROXY=ON -DOTBR_MDNS=avahi -DOTBR_WEB=ON -DOTBR_BORDER_ROUTING=ON -DOTBR_REST=ON -DOTBR_BACKBONE_ROUTER=ON -DOTBR_NAT64=ON -DOT_POSIX_NAT64_CIDR="192.168.255.0/24" -DOTBR_DNS_UPSTREAM_QUERY=ON -DOT_CHANNEL_MONITOR=ON -DOT_COAP=OFF -DOT_COAPS=OFF"}
ENV DEBIAN_FRONTEND noninteractive
ENV PLATFORM ubuntu
ENV REFERENCE_DEVICE=${REFERENCE_DEVICE:-0}
ENV RELEASE=${RELEASE:-1}
ENV NAT64=${NAT64:-1}
ENV NAT64_SERVICE=${NAT64_SERVICE:-openthread}
ENV NAT64_DYNAMIC_POOL=${NAT64_DYNAMIC_POOL:-192.168.255.0/24}
ENV DNS64=${DNS64:-0}
ENV WEB_GUI=${WEB_GUI:-1}
ENV REST_API=${REST_API:-1}
ENV FIREWALL=${FIREWALL:-1}
ENV DOCKER 1

RUN env

ENV OTBR_DOCKER_REQS sudo python3

ENV OTBR_DOCKER_DEPS git ca-certificates

ENV OTBR_BUILD_DEPS apt-utils build-essential psmisc ninja-build cmake wget ca-certificates \
    libreadline-dev libncurses-dev libcpputest-dev libdbus-1-dev libavahi-common-dev \
    libavahi-client-dev libboost-dev libboost-filesystem-dev libboost-system-dev \
    libnetfilter-queue-dev

ENV OTBR_OT_BACKBONE_CI_DEPS curl lcov wget build-essential python3-dbus python3-zeroconf

ENV OTBR_NORELEASE_DEPS \
    cpputest-dev

RUN apt-get update \
    && apt-get install --no-install-recommends -y $OTBR_DOCKER_REQS $OTBR_DOCKER_DEPS \
    && ([ "${OT_BACKBONE_CI}" != "1" ] || apt-get install --no-install-recommends -y $OTBR_OT_BACKBONE_CI_DEPS) \
    && ln -fs /usr/share/zoneinfo/UTC /etc/localtime

COPY ./script /app/script
COPY ./third_party/mDNSResponder /app/third_party/mDNSResponder
WORKDIR /app

RUN ./script/bootstrap
COPY . .
RUN ./script/setup

RUN ([ "${DNS64}" = "0" ] || chmod 644 /etc/bind/named.conf.options) \
    && ([ "${OT_BACKBONE_CI}" = "1" ] || ( \
        mv ./script /tmp \
        && mv ./etc /tmp \
        && find . -delete \
        && rm -rf /usr/include \
        && mv /tmp/script . \
        && mv /tmp/etc . \
        && apt-get purge -y --auto-remove -o APT::AutoRemove::RecommendsImportant=false $OTBR_DOCKER_DEPS \
        && apt-get purge -y --auto-remove -o APT::AutoRemove::RecommendsImportant=false $OTBR_BUILD_DEPS \
        && ([ "${RELEASE}" = 1 ] || apt-get purge -y --auto-remove -o APT::AutoRemove::RecommendsImportant=false "$OTBR_NORELEASE_DEPS";) \
        && rm -rf /var/lib/apt/lists/* \
        && rm -rf /tmp/* \
    ))

ENTRYPOINT ["/app/etc/docker/docker_entrypoint.sh"]

EXPOSE 80

My Docker Compose.

version: "3.4"
services:
  openthread_border_router:
    container_name: openthread
    image: openthread-trel-test:latest #openthread/otbr-trel
    restart: unless-stopped
    network_mode: host
    privileged: true
    devices:
      - /dev/ttyACM1
    volumes:
      - /etc/localtime:/etc/localtime:ro
      - /run/avahi-daemon/socket:/run/avahi-daemon/socket
      - /var/run/dbus:/var/run/dbus:ro
    ports:
      - 8080:8080
      - 8081:8081
    command: ["--radio-url", "spinel+hdlc+uart:///dev/ttyACM1", "--backbone-interface", "${BACKBONE_INTERFACE:-eno1}", "--trel-url", "trel://${BACKBONE_INTERFACE:-eno1}",]
    deploy:
      resources:
        limits:
          memory: 1G

@jwhui
Member

jwhui commented May 23, 2024

Closing stale issue.

@jwhui jwhui closed this as completed May 23, 2024