RPi4 + native-boot SSD sometimes fails to boot from USB3 port, and WiFi freezes when IOTstack comes up #219
I believe this is not directly related, but be careful with the latest firmware pieeprom-2020-12-11.bin in combination with a Kingspec SSD. Before upgrading the firmware, I had no problems with booting. But keep in mind, the Raspberrys worked fine for months with only a few boots. |
Hello. The more reboots, the more problems... My symptoms are, I was thinking that my data traveller could have been damaged by a forced power shutdown. I discovered yesterday evening that the green light blinking periodically is a kind of Morse code for boot trouble. What is clear is that I could never get my IOTstack running again, since even "docker ps" gets stuck, and removing and reinstalling Docker did not help... So the next step is to try booting from USB2, but my Docker config is still wrong... I will try installing Docker on an SD card with a minimal container config (piHole, openhab, nodered, influxDB, grafana) so I can get my Pi 4 running while waiting for help from the community. |
I'm going on the information here but I really don't know what I'm doing. This is what I see on the three RPis:
While there are subtle formatting differences, they all seem to agree on the version signature and timestamp-in-seconds, as well as the "Current" and "Latest" values. I've been interpreting that to mean that, even though the three Pis are running different base builds and updates-since, they have the same EEPROM, and that all the updates have been applied, silently, behind the scenes. Do you think I'm misinterpreting? Should I be looking somewhere else? |
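For anyone who wants to script the same "Current" versus "Latest" comparison, here is a minimal sketch. It assumes the output comes from `rpi-eeprom-update` and that the bootloader's CURRENT and LATEST lines each end with the timestamp-in-seconds in parentheses, as described above. The function name `eeprom_is_current` is made up for illustration.

```shell
# Hypothetical helper: reads `rpi-eeprom-update` output on stdin and succeeds
# only when the first CURRENT and LATEST lines that carry an epoch timestamp
# in parentheses show the same value.
eeprom_is_current() {
  awk '
    /CURRENT:/ && cur == "" { if (match($0, /\([0-9]+\)/)) cur = substr($0, RSTART + 1, RLENGTH - 2) }
    /LATEST:/  && lat == "" { if (match($0, /\([0-9]+\)/)) lat = substr($0, RSTART + 1, RLENGTH - 2) }
    END { exit !(cur != "" && cur == lat) }
  '
}
```

Usage would be something like `sudo rpi-eeprom-update | eeprom_is_current && echo "EEPROM matches latest"`. If the output layout differs on a given release, the pattern would need adjusting.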
i dont know if it was the same problem, but i just now saw this because someone said something on the discord and paraphraser responded with this. anywho... one day after 2 months of running well i noticed my tasmota switches werent working, which led to me discovering the pi was not working. vnc'd in and os was running and responding but not the stack. i rebooted (seemed like the easiest way to restart everything) and found that the pi would not boot off of ssd anymore. i eventually decided to try a fresh sd card instead of usb ssd. booted fine. i then proceeded to rebuild everything this time being sure to backup a whole image to the sd card after i got to the point i liked how things were running. now my suspicions are some sort of corrupt boot partition, but honestly i dont have any expertise in this area with linux. i guess my thought was if all i did was clone an sd card back onto the ssd with a clean image then its probably not hardware related. since the firmware wasnt changed and seemed to boot fine with a new image im assuming this has something to do with a corrupted boot partition |
Hello, yesterday morning my first trial was to copy a well-working SD card I got with my RPi4 originally. I need to check how to add a UniFi controller container to IOTstack, as it is not in the menu... Pierre |
For anyone following this issue, on Jan 6 I rebuilt "new-dev" from the ground up, to boot and run from SSD (ie no SD). I started with the SSD attached to the upper USB3 port. Apart from one reboot during the initial setup which hung and needed a power-cycle, the RPi has performed flawlessly ever since.
Then I restored an IOTstack backup and brought up the stack. Earlier experience was that this was where the WiFi would freeze. WiFi was OK and has stayed OK. Previous experience was that an apt update / upgrade followed by a reboot while the stack was up was likely to cause a freeze. I've done two of those in the intervening period without incident but the third one, a few moments ago, hung and needed two power-cycles plus moving the SSD to USB2 before the RPi came up again.
The OS/EEPROM version situation now is:
Summary: the OS version has changed from 5.10.3-v7l+ to 5.4.83-v7l+ but the EEPROM remains the same.
But, now I'm in the situation where:
So, something about the latest OS update... ... which hasn't also affected sec-dev which is also at the same OS release. I'm in two minds about whether to:
Option 1 implies that the initial apt update & full-upgrade will bring the machine to the same version it is at now, before docker and docker-compose are installed. If that works on USB3 then it kinda implies it is something about applying system updates over the top. Option 2 might (eventually) help isolate whatever is causing this to either the hardware of the new-dev RPi or the boot method, with SD+SSD being "more immune" than SSD only. Still very strange. Still very confused. |
See also Issue 253. I've just implemented the fix described there on new-dev, put the SSD back on the USB3 port, done several reboots with/without the stack running and, thus far, it all seems happy. The ping-reply pattern from the Granted that 3 tests are not statistically significant but it's two more than I could achieve before this patch. Anyone else facing the same problem might want to give it a whirl!!! |
I've implemented the Issue 253 fix on all of my Pis. That's:
All systems and associated IOTstack instances as happy as the proverbial Larry. For the record, what I've actually been doing is:
It's day 4 (give or take). I'm up to about 30 reboots on new-dev scattered across the period, interspersed with two separate apt update / apt upgrades followed by an immediate reboot, always with IOTstack up and running, the whole time with the SSD on the upper USB3 port. The thing hasn't missed a beat. I've also done a couple of reboots of iot-hub and sec-dev for good measure. They're happy too. I think we're on a winner here. @gbsmith ➡️ IOTstack Hall Of Fame! 👏👏👏 My standard build script now includes:
Since installing the "allowinterfaces eth0,wlan0", I've had one firing of
Anyway, it's a week since "allowinterfaces eth0,wlan0" and I haven't had a single freeze anywhere. |
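For readers who have not looked at Issue 253, here is a rough sketch of how the "allowinterfaces eth0,wlan0" line could be applied idempotently. This is an assumption on my part, not a copy of anyone's build script: it presumes the line belongs in the dhcpcd configuration file (typically /etc/dhcpcd.conf on Raspberry Pi OS), and the function name `ensure_allowinterfaces` is hypothetical.

```shell
# Hypothetical helper: append the allowinterfaces line to the given dhcpcd
# config file unless an identical line is already present.
ensure_allowinterfaces() {
  conf="$1"
  line="allowinterfaces eth0,wlan0"
  # -x matches whole lines only, -F treats the pattern as a fixed string
  grep -qxF "$line" "$conf" 2>/dev/null || printf '%s\n' "$line" >> "$conf"
}
```

Run as root against the real file, this would be `ensure_allowinterfaces /etc/dhcpcd.conf` followed by a reboot. Running it twice leaves a single copy of the line.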
Thanks for keeping us updated, @Paraphraser. Back on 23 January I mentioned on Discord that I would report back, and here I am fulfilling my (belated) promise. Since my RPi4 is in a rather awkward position under a desk, I didn't feel like crawling under there to do a series of rigorous tests, but I do have some good news. When I first brought the issue up on Discord, I had the SSD plugged into the top USB3. You suggested I try the bottom one, so I did. At the same time, I did an From my perspective, the problem is solved! |
Here we are - March 1st already. How time flies! I wanted to confirm that I have not had a single freeze, at boot time or at any other time, on any RPi since applying the "allowinterfaces eth0,wlan0" patch. I have rebuilt new-dev at least twice (not to solve any real problem, just refining and testing my build scripts). The most-recent build was on Feb 10. sec-dev used to boot SD, run SSD but I rebuilt that on Feb 23 to be native boot from SSD. For the record, the OS version and EEPROM status is:
I don't think this is really an IOTstack issue as such. I'm describing it here in case anyone else is experiencing problems with an RPi4 + native boot SSD + IOTstack causing odd behaviour like:
I don't think IOTstack causes either problem, but it may be involved in triggering underlying causes.
I have three 4GB RPi4s, named "iot-hub", "sec-dev" and "new-dev".
Each has an SSD attached to a USB3 port. iot-hub has an OWC SSD while the other two have Samsung T5 SSDs.
iot-hub and sec-dev were purchased last year so they boot from SD but run from SSD (built as per this gist). new-dev is a recent purchase and I went through the steps to make it "SSD bootable" but otherwise followed the same gist for add-on packages etc.
All three are Ethernet connected.
All SSDs are bus-powered.
The Raspberry Pi world seems to be full of people who offer "power" as the universal answer to almost any RPi problem. I'd like to address that topic because I'm quite sure "power" isn't the issue:
There are software differences:
apt upgrade broke something (issue reported here)
In terms of uname -a:
I decided not to risk a repeat of apt upgrade breaking iot-hub because that's my "live" system, which is why it's stuck at 5.4.51. But, other than that, the add-on packages are the same and IOTstack is the same.
Aside from the apt upgrade breakage mentioned above, iot-hub and sec-dev have been rock solid since the get-go (over 12 months). The only real issue of note is the WiFi interfaces occasionally going walkabout, the cure for which is documented at this gist.
But I am having terrible trouble with new-dev. Frequently (like maybe every 5th reboot) it refuses to boot, even after a power-cycle. When I put the SD back in, boot from that and nose around, the SSD just isn't there. I can't see it in lsblk and I can't mount either logical volume.
I normally use the upper USB3 port (no particular reason). Sometimes, when new-dev refuses to "see" the SSD, moving the SSD to the lower USB3 port will make it come good. Only "sometimes". Definitely not "always". The most reliable way of recovering is to attach the SSD to either one of the USB2 ports. That works every time!
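A quick way to check whether the kernel can see the SSD at all, sketched under the assumption that the drive shows up with a "usb" transport in lsblk. The helper name `usb_disks` is hypothetical. It filters the two-column output of `lsblk -dn -o NAME,TRAN`.

```shell
# Hypothetical helper: reads "NAME TRAN" pairs on stdin (as printed by
# `lsblk -dn -o NAME,TRAN`) and prints the names of USB-attached disks.
usb_disks() {
  awk '$2 == "usb" { print $1 }'
}
```

After booting from SD, `lsblk -dn -o NAME,TRAN | usb_disks` printing nothing would match the "SSD just isn't there" symptom. Comparing that against `lsusb -t` might show whether the enclosure is enumerating on the bus even when no block device appears.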
All my RPi4s have inline power switches. Any time I'm going to insert/remove the SD, or move the SSD between ports, I turn off power before making the change and turn power back on afterwards. Because of that, I discount possibilities like:
It's downright weird.
Now, here's another wrinkle. I mentioned that iot-hub and sec-dev had occasional WiFi issues. With new-dev, I can pretty much kill WiFi simply by:
By "kill", I mean that:
pings aimed at the WiFi interface from another machine; and
ifconfig wlan0 on new-dev has no IP address (either v4 or v6).
The obvious variation on the theme is:
The ICMP echo reply pattern is: replies come back during the shutdown, and briefly as the machine starts to come back up, but stop pretty much as soon as Docker tries to resume the stack.
If, instead of using docker-compose up -d, I bring up one container at a time (eg docker-compose up -d mosquitto) then WiFi will generally stay up. But WiFi dies quite reliably if the whole stack comes up at once.
The problem goes away completely if I connect the SSD to USB2. I can reboot with IOTstack up or down, or take the stack up or down while the system is running. WiFi remains up and stable.
Hypothesis: it's a load problem - IOTstack is overloading the Pi. Well:
Another thing that makes my eyebrows twitch is that the guts of the gist on WiFi issues mentioned earlier is sudo dhclient wlan0. On iot-hub and sec-dev, this command returns immediately and has the effect of resetting the interface which immediately comes good. When WiFi has gone walkabout on new-dev, the command takes over a minute to return and doesn't cure whatever has caused WiFi to go away. If it was a "load" issue then surely the command would work once IOTstack was up and stable? Why does bringing containers up one at a time sometimes avoid a WiFi freeze?
I really do not know what to make of all this. For the life of me, I can't see why booting SSD+USB3 = problems while SSD+USB2 does not. The only real difference is somewhat slower access. A timing issue?
I've rebuilt the system three times, starting from a clean image with Balena Etcher. That hasn't helped.
I bought new-dev so I'd have a test vehicle for native-boot SSD (after which I was going to redeploy it in another project). I have no idea whether the new-dev hardware is a lemon, whether SSD native boot isn't all it's meant to be, whether this is a Raspbian difference between 2020-08-20 and 2020-12-02, or what. I don't think it's the Samsung SSD because I've got another one of those too and have already tried a swap: the problems follow new-dev.
I'm going to try re-building new-dev as an SD/SSD combo from 2020-12-02, then from 2020-08-20. I don't want to try moving sec-dev to SSD native boot in case this turns out to be a firmware issue and I wind up with two misbehaving RPi4s.
If anyone reading this has any suggestions for things to try out, I'm all ears!
2020 - a year sent to try our patience! With any luck, the expression "2020 hindsight" will take on a whole new meaning.