Software gets stuck until restart #98

Closed
bartekd123 opened this issue Mar 5, 2022 · 20 comments

@bartekd123

Is there anything that can be done to prevent the container from being stuck and not processing anything until I restart? Here is an example of my log:

You can see that on March 2nd it stopped updating, so something is not running. I just restarted it, and it carries on again until the next time it gets stuck. It usually lasts a few hours to a day before it gets stuck like this.

Let me know what info you need from me.

[2022-03-02 02:40:54] {"Time":"2022-03-02T02:40:54.85853604Z","Offset":0,"Length":0,"Type":"SCM+","Message":{"FrameSync":5795,"ProtocolID":30,"EndpointType":171,"EndpointID":12345,"Consumption":189410,"Tamper":18688,"PacketCRC":39682}}
[2022-03-02 02:40:54] Meter "12345" - Consumption 189410. Sending value to MQTT.
[2022-03-02 02:40:54] Predicted reading: 189333.7495830916 - Actual reading: 189410
[2022-03-02 02:40:54] Grow rate avg: 39.44736842105263
[2022-03-02 02:40:54] Possible anomaly detected!
[2022-03-02 02:40:54] Distance from prediction: 76.25041690841317
[2022-03-02 02:40:54] Threshold for anomaly: 189373.19695151263
[2022-03-02 02:40:54] Sending message to MQTT:
[2022-03-02 02:40:54]  > topic => rtlamr/12345/attributes
[2022-03-02 02:40:54]  > payload => {"Message Type": "SCM+", "Predicted": 189333.7495830916, "Anomaly": true, "FrameSync": 5795, "ProtocolID": 30, "EndpointType": 171, "EndpointID": 12345, "Consumption": 189410, "Tamper": 18688, "PacketCRC": 39682}
[2022-03-02 02:40:54]  > retain => True
[2022-03-02 02:40:54] Sending message to MQTT:
[2022-03-02 02:40:54]  > topic => rtlamr/12345/state
[2022-03-02 02:40:54]  > payload => 189410
[2022-03-02 02:40:54]  > retain => True
[2022-03-02 02:40:54] Sleep_for defined, time to sleep!
[2022-03-02 02:40:54] Terminating all subprocess...
[2022-03-02 02:40:54] Kill process called.
[2022-03-02 02:40:54] Killing RTL_TCP...
[2022-03-02 02:40:59] Killed.
[2022-03-02 02:40:59] Killing RTLAMR...
[2022-03-02 02:40:59] Killed in the first attempt.
[2022-03-02 02:40:59] Sleeping for 300 seconds, see you later...
[2022-03-02 02:46:00] Sending message to MQTT:
[2022-03-02 02:46:00]  > topic => rtlamr/status
[2022-03-02 02:46:00]  > payload => online
[2022-03-02 02:46:00]  > retain => True
[2022-03-02 02:46:00] Trying to start RTL_TCP: /usr/bin/rtl_tcp -s 2048000
[2022-03-02 02:46:00] RTL_TCP started with PID 1569
[2022-03-02 02:46:05] RTL_TCP is ready to receive connections!
[2022-03-02 02:46:05] Trying to start RTLAMR: /usr/bin/rtlamr -msgtype=scm+ -format=json -filterid=12345 -unique=true -symbollength=32
[2022-03-02 02:46:05] RTLAMR started with PID 1574
[2022-03-02 02:46:05] 02:46:05.104925 decode.go:45: CenterFreq: 912600155
[2022-03-02 02:46:05] 02:46:05.105145 decode.go:46: SampleRate: 1048576
[2022-03-02 02:46:05] 02:46:05.105158 decode.go:47: DataRate: 32768
[2022-03-02 02:46:05] 02:46:05.105168 decode.go:48: ChipLength: 32
[2022-03-02 02:46:05] 02:46:05.105179 decode.go:49: PreambleSymbols: 16
[2022-03-02 02:46:05] 02:46:05.105189 decode.go:50: PreambleLength: 1024
[2022-03-02 02:46:05] 02:46:05.105199 decode.go:51: PacketSymbols: 128
[2022-03-02 02:46:05] 02:46:05.105209 decode.go:52: PacketLength: 8192
[2022-03-02 02:46:05] 02:46:05.105222 decode.go:59: Protocols: scm+
[2022-03-02 02:46:05] 02:46:05.105233 decode.go:60: Preambles: 0001011010100011
**[2022-03-02 02:46:05] 02:46:05.105244 main.go:124: GainCount: 29
[2022-03-05 00:57:05] Shutdown detected, killing process.**
[2022-03-05 00:57:05] Killing RTL_TCP...
[2022-03-05 00:57:10] Killed.
[2022-03-05 00:57:10] Killing RTLAMR...
[2022-03-05 00:57:10] Killed in the first attempt.
[2022-03-05 00:57:10] Graceful shutdown.
[2022-03-05 00:57:10] Sending message to MQTT:
[2022-03-05 00:57:10]  > topic => rtlamr/status
[2022-03-05 00:57:10]  > payload => offline
[2022-03-05 00:57:10]  > retain => True
[2022-03-05 00:57:14] Configured MQTT sender:
[2022-03-05 00:57:14]  > hostname => 10.150.1.231
[2022-03-05 00:57:14]  > port => 8045
[2022-03-05 00:57:14]  > username => None
[2022-03-05 00:57:14]  > client_id => rtlamr2mqtt
[2022-03-05 00:57:14] Sending message to MQTT:
[2022-03-05 00:57:14]  > topic => rtlamr/status
[2022-03-05 00:57:14]  > payload => online
[2022-03-05 00:57:14]  > retain => True
[2022-03-05 00:57:14] Trying to start RTL_TCP: /usr/bin/rtl_tcp -s 2048000
[2022-03-05 00:57:14] RTL_TCP started with PID 8
[2022-03-05 00:57:19] RTL_TCP is ready to receive connections!
[2022-03-05 00:57:19] Trying to start RTLAMR: /usr/bin/rtlamr -msgtype=scm+ -format=json -filterid=12345 -unique=true -symbollength=32
[2022-03-05 00:57:19] RTLAMR started with PID 13
[2022-03-05 00:57:19] 00:57:19.161118 decode.go:45: CenterFreq: 912600155
[2022-03-05 00:57:19] 00:57:19.162200 decode.go:46: SampleRate: 1048576
[2022-03-05 00:57:19] 00:57:19.162658 decode.go:47: DataRate: 32768
[2022-03-05 00:57:19] 00:57:19.162974 decode.go:48: ChipLength: 32
[2022-03-05 00:57:19] 00:57:19.162999 decode.go:49: PreambleSymbols: 16
[2022-03-05 00:57:19] 00:57:19.163014 decode.go:50: PreambleLength: 1024
[2022-03-05 00:57:19] 00:57:19.163025 decode.go:51: PacketSymbols: 128
[2022-03-05 00:57:19] 00:57:19.163037 decode.go:52: PacketLength: 8192
[2022-03-05 00:57:19] 00:57:19.163052 decode.go:59: Protocols: scm+
[2022-03-05 00:57:19] 00:57:19.163063 decode.go:60: Preambles: 0001011010100011
[2022-03-05 00:57:19] 00:57:19.163075 main.go:124: GainCount: 29
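
For reference, a stall like the one between March 2nd and March 5th can be spotted mechanically by looking for large gaps between consecutive log timestamps. This is only a minimal sketch, assuming the `[YYYY-MM-DD HH:MM:SS]` prefix shown in the log above; the name `find_gaps` and the 15-minute threshold are hypothetical and not part of rtlamr2mqtt.

```python
from datetime import datetime, timedelta
import re

# Matches the "[YYYY-MM-DD HH:MM:SS]" prefix used in the log above.
STAMP = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\]")


def find_gaps(lines, threshold=timedelta(minutes=15)):
    """Return (previous, current, delta) for each pair of consecutive
    timestamped lines that are more than `threshold` apart."""
    gaps = []
    previous = None
    for line in lines:
        match = STAMP.match(line)
        if not match:
            continue  # skip lines without a timestamp prefix
        current = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        if previous is not None and current - previous > threshold:
            gaps.append((previous, current, current - previous))
        previous = current
    return gaps


# A few lines copied from the log above.
sample = [
    "[2022-03-02 02:40:59] Sleeping for 300 seconds, see you later...",
    "[2022-03-02 02:46:00] Sending message to MQTT:",
    "[2022-03-02 02:46:05] 02:46:05.105244 main.go:124: GainCount: 29",
    "[2022-03-05 00:57:05] Shutdown detected, killing process.",
]
for previous, current, delta in find_gaps(sample):
    print(f"No log output between {previous} and {current} ({delta})")
```

The threshold is deliberately longer than the 300-second sleep so that the normal sleep cycle is not reported as a stall.
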
@allangood
Owner

Hello,

Some meters don't advertise all the time, which might be the problem here. My USB device sometimes stops working and I have to do a full reboot of my Pi to make it work again. Since I added a cooler, this problem has gone away.

Have you tried to change the tickle_rtl_tcp parameter? It was reverted to false by default a few versions ago.

Judging by the logs, there is no problem anywhere.

@bartekd123
Author

My meter always sends info every 5 minutes, but the value only changes every hour. From what I saw, the meter never stops transmitting; it is just the container that fails.

Sometimes I have to reboot as well for it to work. So something is getting stuck at the USB level, or maybe the OS? Is there a USB stick that works better?

I didn’t know about that parameter. I’ll give it a shot, thanks!

@allangood
Owner

I am using this device. I have had similar reports from other people using the same device. My guess is a problem with the USB device itself...

My device has been working flawlessly for the past month now. I moved it to a different USB port on my Pi and installed a small fan/cooler.

@bartekd123
Author

I have one from the same company, but it's the Nano 2.

I set the parameter tickle_rtl_tcp to true, and it was still causing me issues. I have since created a script to restart the container every hour, and it has now been running fine for almost a day.

Here is how I did the container restart in my stack:

  restartrtlamr:
    image: docker
    container_name: restartrtlamr
    volumes: ["/var/run/docker.sock:/var/run/docker.sock"]
    command: ["/bin/sh", "-c", "while true; do sleep 3600; docker restart rtlamr; done"]
    restart: unless-stopped

I'll add another note here in a day or two to confirm that things are still working.

@bartekd123
Author

With those two changes (tickle and restarting the container every hour), I still get the issue where it just gets stuck until I reboot. Is there any way of SSHing into the container and running a shell command to see if the USB device is working or connected? Any way to troubleshoot there?

@allangood
Owner

You can access the container with:

docker exec -ti <name of container> /bin/bash

@bartekd123
Author

I know how to shell into the container, but I was wondering if there are any commands I can run inside it to see if the USB device is functioning.

@allangood
Owner

When I was experiencing this behavior, what I did was:

ps -axf       # find the two PIDs to kill: rtl_tcp and rtlamr
kill PID
kill -9 PID   # if needed, usually not
rtl_tcp       # start both again by hand to see whether they error out
rtlamr

This is basically the same thing the Python program does.
On my machine, these commands would return an error until I rebooted my Raspberry Pi.

What fixed the issue was adding a cooling fan to my Pi.
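
The kill sequence above (a plain kill first, kill -9 only if needed) can be sketched in Python roughly as follows. This is an illustration of the pattern, not the project's actual code; the function name `stop_process` and the 5-second timeout are assumptions, and a sleeping Python child stands in for rtl_tcp or rtlamr.

```python
import subprocess
import sys


def stop_process(proc, timeout=5.0):
    """Ask `proc` to exit politely, then force it if it does not comply.

    Returns "terminated" if the first request was enough, "killed" otherwise.
    """
    proc.terminate()  # SIGTERM on POSIX, like a plain `kill PID`
    try:
        proc.wait(timeout=timeout)
        return "terminated"
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX, like `kill -9 PID`
        proc.wait()
        return "killed"


# Stand-in child process; in this thread it would be rtl_tcp or rtlamr.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
result = stop_process(child)
print(result)
```

If the restarted programs still fail afterwards, as described above, the problem is more likely at the USB or hardware level than in the processes themselves.
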

@stu1811

stu1811 commented Mar 29, 2022

I compiled the program at the link below and set it to run against the RTL device every day at 1 AM. This consistently works for me.
https://askubuntu.com/a/661/1154095

#!/bin/bash
date
# Build the device node path (/dev/bus/usb/BUS/DEV) from the lsusb line that mentions "rtl"
RTLDEV=$(lsusb | grep -i rtl | awk '{print "/dev/bus/usb/"$2"/"$4}' | sed 's/:$//')
sudo /home/stu/usbreset "$RTLDEV"

crontab entry:

0 1 * * * /home/stu/resetRTL > /home/stu/rtlResults

@allangood allangood added this to the Version 1.7.0 milestone Apr 1, 2022
@allangood allangood added the enhancement New feature or request label Apr 1, 2022
@allangood
Owner

I am going to run some tests and will try to incorporate this USB reset code into the Docker container.

@allangood
Owner

Just for reference, the code to reset the USB is this:

    printf("Resetting USB device %s\n", filename);
    rc = ioctl(fd, USBDEVFS_RESET, 0);
    if (rc < 0) {
        perror("Error in ioctl");
        return 1;
    }
    printf("Reset successful\n");

@allangood
Owner

Function added in #101.
It is ready to be tested.

If you want to test, use the main tag:
docker pull allangood/rtlamr2mqtt:main

@bartekd123
Author

Thanks, I updated mine and will see how it does over the next couple of days.

@allangood
Owner

allangood commented Apr 2, 2022

I forgot to mention: To test this new feature you need to add a new parameter reset_usb to the general section:

general:
  sleep_for: 300
  verbosity: debug
  tickle_rtl_tcp: false
  reset_usb: "005:002"

You can get the bus and device number from lsusb; the value is written as "bus:device".
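
As a rough illustration of reading those numbers off, the sketch below turns lsusb output into "bus:device" strings suitable for reset_usb. The helper name `find_reset_usb_values`, the "rtl" keyword match, and the sample lsusb text are all made up for the example; check the output of lsusb on your own machine.

```python
import re

# Shape of a line printed by lsusb:
# "Bus <bus> Device <device>: ID <vendor>:<product> <description>"
LSUSB_LINE = re.compile(r"^Bus (\d+) Device (\d+): ID (\S+)\s*(.*)$")


def find_reset_usb_values(lsusb_output, keyword="rtl"):
    """Return "BUS:DEVICE" strings for lsusb lines whose description
    contains `keyword` (case-insensitive)."""
    values = []
    for line in lsusb_output.splitlines():
        match = LSUSB_LINE.match(line.strip())
        if match and keyword.lower() in match.group(4).lower():
            values.append(f"{match.group(1)}:{match.group(2)}")
    return values


# Illustrative sample only, not taken from a real machine in this thread.
sample = """\
Bus 005 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 005 Device 002: ID 0bda:2838 Realtek Semiconductor Corp. RTL2838 DVB-T
"""
print(find_reset_usb_values(sample))
```

Note that the device number can change when the dongle is unplugged or re-enumerated, so the value may need to be checked again after a reboot.
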

@JeffFaer
Contributor

JeffFaer commented Apr 3, 2022

Good timing, I got the alert I mentioned in #85 (comment) this morning. I hadn't updated my Home Assistant add-on, but I ran the Python commands manually:

$ ssh hass -p 22222
[hass] $ docker exec -it $(docker ps --filter 'name=rtlamr' --format '{{.ID}}') /bin/bash
[rtlamr] $ apt update
[rtlamr] $ apt install usbutils
[rtlamr] $ lsusb
Bus 002 Device 001: ID ...
Bus 001 Device 007: ID ... Realtek RTL2838UHIDIR
Bus 001 Device 003: ID ...
Bus 001 Device 002: ID ...
Bus 001 Device 001: ID ...
[rtlamr] $ python3
>>> from fcntl import ioctl
>>> busnum="001"
>>> devnum="007"
>>> filename = "/dev/bus/usb/{:03d}/{:03d}".format(int(busnum), int(devnum))
>>> print('Reseting USB device: {}'.format(filename))
Reseting USB device: /dev/bus/usb/001/007
>>> USBDEVFS_RESET = ord('U') << (4*2) | 20
>>> fd = open(filename, "wb")
>>> ioctl(fd, USBDEVFS_RESET, 0)
0
>>> fd.close()

And that does seem to have fixed the dongle. The Home Assistant add-on was able to get a reading.
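
The manual session above can be wrapped into a small function. This is a hedged sketch rather than the code that shipped in the add-on: the function names are hypothetical, it takes the same "bus:device" form as the reset_usb setting, and the ioctl number assumes the usual `_IO('U', 20)` encoding found on common Linux architectures such as x86 and ARM.

```python
import fcntl
import os

# _IO('U', 20) from <linux/usbdevice_fs.h>:
# type byte in bits 8-15, command number in bits 0-7.
USBDEVFS_RESET = (ord("U") << 8) | 20


def usb_device_path(reset_usb):
    """Turn a "BUS:DEVICE" string such as "001:007" into its device node path."""
    busnum, devnum = reset_usb.split(":")
    return "/dev/bus/usb/{:03d}/{:03d}".format(int(busnum), int(devnum))


def reset_usb_device(reset_usb):
    """Issue USBDEVFS_RESET on the device.

    Needs Linux and write access to the device node (usually root).
    """
    fd = os.open(usb_device_path(reset_usb), os.O_WRONLY)
    try:
        fcntl.ioctl(fd, USBDEVFS_RESET, 0)
    finally:
        os.close(fd)  # always release the descriptor, even if the ioctl fails


print(usb_device_path("001:007"), hex(USBDEVFS_RESET))
```

Calling `reset_usb_device("001:007")` would then perform the same reset as the interactive commands, raising `OSError` if the node is missing or the ioctl is refused.
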

@allangood
Owner

Added in version 1.7.0 by PR #104.

@kspearrin

I too have the "NooElec NESDR Mini" from Amazon. I contacted the manufacturer about the device failing to respond after a few hours of running until I restarted it, and they suggested that it is a power problem with the host it is plugged into (in my case, a Proxmox server). They recommended a powered USB hub. I bought this one on Amazon and have been using it for over a week now without the issue occurring.

@allangood
Owner

Thank you for the heads up.
Since I introduced #101 (USB reset), the problem has gone away for me. Hope this helps someone else! :)

@ADeadPixel

ADeadPixel commented Nov 27, 2022

@kspearrin are you passing the "USB Vendor/Device ID" or the "USB port" to your HA VM? I have a very similar setup and I'm trying to dig into my issue (#181), which may be related to this. Here is how I'm currently passing it in.

Screenshot 2022-11-27 091227

@kspearrin

@ADeadPixel I use Proxmox and pass it through like so:

image
