SCD30 Stops Communicating Sometimes #61

Closed
keenanjohnson opened this issue Oct 27, 2021 · 28 comments · Fixed by #83

@keenanjohnson
Member

As I've deployed more Frog sensors, I've noticed that occasionally some of the SCD30 sensors will stop responding. The condition is resolved after a power cycle, so I don't believe that it's anything physical (loose connector, etc.), but I'm open to being proven wrong.

I haven't been able to correlate this to any particular event.

Theories for cause:

  • Power Instability - Perhaps a power droop or spike can put the SCD30 sensor into a non-responsive state?
  • Software bug in SCD30 Python Library - Perhaps there is a software bug in the SCD30 Python library we use?
  • Software bug in the SCD30 Firmware - Perhaps there is a firmware bug in the SCD30 firmware itself?

Example Error Log

Service exited 'co2 sha256:d3e002ec75c5d21446f919b309b83ef25eac24c214e0c025227e0ef4386e8e2e'
Restarting service 'co2 sha256:d3e002ec75c5d21446f919b309b83ef25eac24c214e0c025227e0ef4386e8e2e'
 co2  Traceback (most recent call last):
 co2    File "/usr/local/lib/python3.9/site-packages/adafruit_bus_device/i2c_device.py", line 154, in __probe_for_device
 co2      self.i2c.writeto(self.device_address, b"")
 co2    File "/usr/local/lib/python3.9/site-packages/busio.py", line 159, in writeto
 co2      return self._i2c.writeto(address, buffer, stop=stop)
 co2    File "/usr/local/lib/python3.9/site-packages/adafruit_blinka/microcontroller/generic_linux/i2c.py", line 49, in writeto
 co2      self._i2c_bus.write_bytes(address, buffer[start:end])
 co2    File "/usr/local/lib/python3.9/site-packages/Adafruit_PureIO/smbus.py", line 314, in write_bytes
 co2      self._device.write(buf)
 co2  TimeoutError: [Errno 110] Connection timed out
 co2  
 co2  During handling of the above exception, another exception occurred:
 co2  
 co2  Traceback (most recent call last):
 co2    File "/usr/local/lib/python3.9/site-packages/adafruit_bus_device/i2c_device.py", line 160, in __probe_for_device
 co2      self.i2c.readfrom_into(self.device_address, result)
 co2    File "/usr/local/lib/python3.9/site-packages/busio.py", line 149, in readfrom_into
 co2      return self._i2c.readfrom_into(address, buffer, stop=stop)
 co2    File "/usr/local/lib/python3.9/site-packages/adafruit_blinka/microcontroller/generic_linux/i2c.py", line 56, in readfrom_into
 co2      readin = self._i2c_bus.read_bytes(address, end - start)
 co2    File "/usr/local/lib/python3.9/site-packages/Adafruit_PureIO/smbus.py", line 181, in read_bytes
 co2      return self._device.read(number)
 co2  TimeoutError: [Errno 110] Connection timed out
 co2  
 co2  During handling of the above exception, another exception occurred:
 co2  
 co2  Traceback (most recent call last):
 co2    File "/usr/src/co2.py", line 55, in <module>
 co2      scd = adafruit_scd30.SCD30(i2c_bus)
 co2    File "/usr/local/lib/python3.9/site-packages/adafruit_scd30.py", line 93, in __init__
 co2      self.i2c_device = i2c_device.I2CDevice(i2c_bus, address)
 co2    File "/usr/local/lib/python3.9/site-packages/adafruit_bus_device/i2c_device.py", line 50, in __init__
 co2      self.__probe_for_device()
 co2    File "/usr/local/lib/python3.9/site-packages/adafruit_bus_device/i2c_device.py", line 163, in __probe_for_device
 co2      raise ValueError("No I2C device at address: 0x%x" % self.device_address)
 co2  ValueError: No I2C device at address: 0x61
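
In the log above, the service exits as soon as the SCD30 constructor fails its address probe. A minimal sketch of a retry wrapper is shown below, assuming the sensor object is built through a zero-argument callable. The helper name `open_with_retry` and its parameters are hypothetical and not part of the Adafruit library. It will not un-wedge a sensor that needs a power cycle, but it keeps the failure in one place and produces a single clear error.

```python
import time


def open_with_retry(factory, attempts=5, delay_s=2.0, sleep=time.sleep):
    """Call factory() until it succeeds or the attempts run out.

    factory is any zero-argument callable that builds the sensor object,
    for example: lambda: adafruit_scd30.SCD30(i2c_bus)
    (open_with_retry itself is a hypothetical helper.)
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return factory()
        except (ValueError, OSError) as err:
            # ValueError: "No I2C device at address" raised by the probe.
            # OSError covers TimeoutError (Errno 110) and Errno 6.
            last_error = err
            if attempt < attempts:
                sleep(delay_s)
    raise RuntimeError(
        f"sensor did not respond after {attempts} attempts"
    ) from last_error
```

Usage might look like `scd = open_with_retry(lambda: adafruit_scd30.SCD30(i2c_bus))`.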
@keenanjohnson keenanjohnson added the bug Something isn't working label Oct 27, 2021
@keenanjohnson keenanjohnson added this to the Frog Sensor V2 milestone Nov 2, 2021
@keenanjohnson keenanjohnson self-assigned this Nov 3, 2021
@keenanjohnson
Member Author

It seems possible that slowing down the I2C bus may also help!

https://learn.adafruit.com/circuitpython-on-raspberrypi-linux/i2c-clock-stretching

@keenanjohnson
Member Author

Seems like this can be set via

# Work around clock stretching by slowing the I2C bus down to 10 kHz
dtparam=i2c_arm_baudrate=10000

@keenanjohnson
Member Author

@mschwanzer this appears to be what's happening with your sensor. I'm going to try slowing the clock speed on I2C to see if this stops it.

@keenanjohnson
Member Author

I changed the clock speed on a few test devices and verified by running the command:

cat /sys/kernel/debug/clk/clk_summary

I'll watch those few to see if I can get the condition to repeat.
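
As a complement to checking the clock summary, here is a rough sketch for confirming what the configuration text actually requests. It assumes the setting lives in a `dtparam=` line as shown earlier. The function name `i2c_arm_baudrate` is hypothetical, and how the config text is obtained on a given device is left open.

```python
def i2c_arm_baudrate(config_text):
    """Return the last i2c_arm_baudrate found in config.txt text, or None."""
    value = None
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # ignore comment text
        if not line.startswith("dtparam="):
            continue
        # A dtparam line may carry several comma-separated parameters.
        for param in line[len("dtparam="):].split(","):
            name, _, val = param.partition("=")
            if name.strip() == "i2c_arm_baudrate" and val.strip().isdigit():
                value = int(val.strip())
    return value
```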

@keenanjohnson
Member Author

Unfortunately, after lowering this setting to 10000, I've still seen the sensor disconnection issue reproduce.

@keenanjohnson
Member Author

@eaudiffred it looks like my software fix here did not resolve the issue with your sensor. I'm going to try a second software fix related to the clock stretching of the communication bus that seems more promising!

More information : https://github.com/RequestForCoffee/rpi-i2c-timings

@eaudiffred
Contributor

Great, thanks! Let me know when to reboot.

@keenanjohnson
Member Author

Will do @eaudiffred!

Note for myself, the CM4 uses bcm2835 (i2c@7e804000)

@keenanjohnson
Member Author

OK, I tried for a while to use the rpi-i2c-timings utility above, but couldn't get it to function (see RequestForCoffee/rpi-i2c-timings#1).

I found this related issue, which suggested using the force_turbo=1 parameter in config.txt, so I'm going to try that.

@keenanjohnson
Member Author

@eaudiffred if you want to give your sensor another reboot, we can see if that helps things :)

@djgood
Contributor

djgood commented Nov 12, 2021

I’m also seeing this occasionally on my sensor (and right now, actually). We should see what’s happening on the bus lines with a scope; I can do that tomorrow. I’m wondering if we’re running into a stuck/blocked I2C bus. A quick Google search shows that the I2C driver for the Raspberry Pi doesn’t implement any kind of I2C recovery routine, which is unfortunate because we’d have to implement our own. Sounds like fun, though :)

@keenanjohnson
Member Author

keenanjohnson commented Nov 12, 2021 via email

@eaudiffred
Contributor

Restarted this morning and moved it back outside. Just checked and unfortunately I don't see it online. I'll give it another restart when I get home from work. In the past it's taken 2 or 3 power cycles before I see the dot on the ribbit network map.

@djgood
Contributor

djgood commented Nov 12, 2021

Scope captures

Looking at the bus while co2 service was reporting ValueError: No I2C device at address: 0x61.

Screen Shot 2021-11-12 at 5 53 28 PM

No clock cycles so it seems like it's not even trying to clock in data. Kind of a misleading error.

I disconnected the DPS310 to see if that would resolve anything, but nothing changed.

After power cycling SCD-30 by disconnecting/reconnecting the qwiic connect cable:

image

Bus looks happy and co2 service is functioning normally. I'd put my money on it being an issue where the clock stretching on the SCD-30 isn't supported. Do you know if the Adafruit library uses hardware or software i2c? The i2c peripheral on the bcm2835 apparently has a buggy implementation of clock stretching, but some software libraries support it better.

Less critical but ideally Ribbit would have a way to recover from a stuck bus, since it's bound to crop up somehow. A potential robustness improvement, maybe.

@keenanjohnson
Member Author

Thanks for those scope traces @djgood! So since you disconnected the SCD30, does that make it most likely that the stuck condition is within the BCM chip on the Raspberry Pi?

I believe it uses the hardware i2c.

I had theorized that adding a power switch to turn power to the SCD30 off and on in case of a stuck bus would resolve the issue. It seems like your testing confirms this, correct? Perhaps there is a better way in software, but as you mentioned, maybe the buggy BCM implementation prevents that.

Sparkfun used to make a nice QWIIC power module but it's out of production it seems. Shouldn't be too hard for us to reproduce if we had to.

@djgood
Contributor

djgood commented Nov 13, 2021

No problem! Hm, I was thinking that it was the SCD-30 that was stuck in that clock stretching condition, which blocks the bus and the Raspberry Pi doesn't know how to detect it. So once the SCD-30 releases the bus when it's powered off the Raspberry Pi can continue driving the bus as normal.

I think if we could power cycle the SCD30 (or even reset) that would fix the issue but it seems like the easiest alternative to me is to use the software I2C bus, more info here: https://github.com/fivdi/i2c-bus/blob/master/doc/raspberry-pi-software-i2c.md

This is a good discussion on I2C clock stretching on the Raspberry Pis: https://raspberrypi.stackexchange.com/questions/127271/does-the-raspberry-pi-i²c-bus-support-clock-stretching

@keenanjohnson
Member Author

keenanjohnson commented Nov 13, 2021 via email

@keenanjohnson
Member Author

Based on this forum post, it seems like it might be possible to use the existing hardware I2C pins as GPIOs with the software I2C driver. That would be rad.

@djgood
Contributor

djgood commented Nov 13, 2021

Yeah, looks like that's totally doable! Awesome!

@keenanjohnson
Member Author

I tried testing the software i2c by adding the following, but it doesn't seem to be working.

dtparam=i2c_arm=off
dtparam=i2c=off
dtoverlay=i2c-gpio,i2c_gpio_sda=2,i2c_gpio_scl=3

I created a balena forum post here to see if anyone else has any additional tips.

@djgood
Contributor

djgood commented Nov 15, 2021

Have you tried specifying different pins? Wondering if the problem is with disabling the hardware i2c or enabling the software i2c. I’ll try getting something working on my device

@keenanjohnson
Member Author

Yes I tried the same configuration with different pins, but same thing

dtparam=i2c_arm=off
dtparam=i2c=off
dtoverlay=i2c-gpio,i2c_gpio_sda=23,i2c_gpio_scl=24

@keenanjohnson
Member Author

Per this forum post discussing the software i2c implementation, it seems like I should be using the settings below instead of my settings above. Will test shortly.

dtparam=i2c_arm=off
dtparam=i2c-gpio=on
dtoverlay=i2c-gpio,i2c_gpio_sda=2,i2c_gpio_scl=3

@keenanjohnson
Member Author

Enabling Software I2C

All right! I was able to successfully enable the i2c-gpio interface on the same pins (2 and 3) as the hardware I2C on the Raspberry Pi via the following configuration:

Define DT parameters = "i2c_arm=off","i2c-gpio=on"
Define DT overlays = "dwc2,dr_mode=host","i2c-gpio,i2c_gpio_delay_us=20,i2c_gpio_sda=2,i2c_gpio_scl=3"

This allowed me to see the I2C bus and verify everything, as shown below:

root@5582468:/usr/src# dmesg | grep i2c
[   10.451224] i2c-gpio ffffffff00000002.i2c: using lines 2 (SDA) and 3 (SCL)
[   15.229958] i2c /dev entries driver
root@5582468:/usr/src# i2cdetect -l
i2c-11  i2c             ffffffff00000002.i2c                    I2C adapter
root@5582468:/usr/src# i2cdetect -y 11
     0  1  2  3  4  5  6  7  8  9  a  b  c  d  e  f
00:                         -- -- -- -- -- -- -- -- 
10: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
20: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
30: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
40: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
50: -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
60: -- 61 -- -- -- -- -- -- -- -- -- -- -- -- -- -- 
70: -- -- -- -- -- -- -- 77
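
The same check can be scripted. Below is a small sketch that pulls the responding addresses out of `i2cdetect -y` output like the table above. The function name `parse_i2cdetect` is hypothetical. Address 0x61 is the SCD30 (it matches the error message earlier in this thread), and 0x77 is presumably the barometer.

```python
import re


def parse_i2cdetect(output):
    """Return the addresses that responded in `i2cdetect -y <bus>` output."""
    found = []
    for line in output.splitlines():
        # Data rows look like "60: -- 61 -- ..."; the header row has no colon.
        row = re.match(r"^([0-9a-f]{2}):(.*)$", line.strip())
        if not row:
            continue
        for token in row.group(2).split():
            # "--" means no response; "UU" means claimed by a kernel driver.
            if re.fullmatch(r"[0-9a-f]{2}", token):
                found.append(int(token, 16))
    return found
```

A service could then refuse to start, or raise an alert, when `0x61 not in parse_i2cdetect(text)`.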

Testing to see if it fixes the issue with I2C bus lock-up

I then had to make a few changes to the source code to allow the SCD30 Python code to use the new software I2C bus.

I first installed the https://github.com/adafruit/Adafruit_Python_Extended_Bus library in order to easily create the I2C object required by the SCD30 sensor constructor. I modified the co2.py file to initialize the I2C bus as shown below:

import adafruit_scd30
from adafruit_extended_bus import ExtendedI2C as I2C

# Specify bus number 11 (the software I2C bus, /dev/i2c-11)
i2c_bus = I2C(11)
scd = adafruit_scd30.SCD30(i2c_bus)

I started up the python script and I was able to connect and read data from the SCD30 and barometer just like before!

Unfortunately, this did not resolve the issue and the bus locked up in the same fashion as before after a few hours.

 co2  Traceback (most recent call last):
 co2    File "/usr/local/lib/python3.10/site-packages/adafruit_bus_device/i2c_device.py", line 154, in __probe_for_device
 co2      self.i2c.writeto(self.device_address, b"")
 co2    File "/usr/local/lib/python3.10/site-packages/busio.py", line 166, in writeto
 co2      return self._i2c.writeto(address, buffer, stop=stop)
 co2    File "/usr/local/lib/python3.10/site-packages/adafruit_blinka/microcontroller/generic_linux/i2c.py", line 49, in writeto
 co2      self._i2c_bus.write_bytes(address, buffer[start:end])
 co2    File "/usr/local/lib/python3.10/site-packages/Adafruit_PureIO/smbus.py", line 314, in write_bytes
 co2      self._device.write(buf)
 co2  OSError: [Errno 6] No such device or address
 co2  
 co2  During handling of the above exception, another exception occurred:
 co2  
 co2  Traceback (most recent call last):
 co2    File "/usr/local/lib/python3.10/site-packages/adafruit_bus_device/i2c_device.py", line 160, in __probe_for_device
 co2      self.i2c.readfrom_into(self.device_address, result)
 co2    File "/usr/local/lib/python3.10/site-packages/busio.py", line 156, in readfrom_into
 co2      return self._i2c.readfrom_into(address, buffer, stop=stop)
 co2    File "/usr/local/lib/python3.10/site-packages/adafruit_blinka/microcontroller/generic_linux/i2c.py", line 56, in readfrom_into
 co2      readin = self._i2c_bus.read_bytes(address, end - start)
 co2    File "/usr/local/lib/python3.10/site-packages/Adafruit_PureIO/smbus.py", line 181, in read_bytes
 co2      return self._device.read(number)
 co2  OSError: [Errno 6] No such device or address
 co2  
 co2  During handling of the above exception, another exception occurred:
 co2  
 co2  Traceback (most recent call last):
 co2    File "/usr/src/co2.py", line 57, in <module>
 co2      scd = adafruit_scd30.SCD30(i2c_bus)
 co2    File "/usr/local/lib/python3.10/site-packages/adafruit_scd30.py", line 93, in __init__
 co2      self.i2c_device = i2c_device.I2CDevice(i2c_bus, address)
 co2    File "/usr/local/lib/python3.10/site-packages/adafruit_bus_device/i2c_device.py", line 50, in __init__
 co2      self.__probe_for_device()
 co2    File "/usr/local/lib/python3.10/site-packages/adafruit_bus_device/i2c_device.py", line 163, in __probe_for_device
 co2      raise ValueError("No I2C device at address: 0x%x" % self.device_address)
 co2  ValueError: No I2C device at address: 0x61

I had to power cycle the full system to recover.

Next Steps

I'm not exactly sure what to try next. I'm going to try slowing down the I2C bus a bit more by raising the i2c_gpio_delay_us parameter above its current value of 20.

@keenanjohnson
Member Author

Setting i2c_gpio_delay_us=100, which should correspond to a 10 kHz I2C speed.

@djgood
Contributor

djgood commented Nov 19, 2021

Argggg! That's frustrating that it didn't solve the problem.

If slowing down the bus doesn't work, another thing we can try as a last resort is to bit bang those GPIOs to generate a bunch of clock cycles if we detect that the bus is locked up. That would hopefully get the SCD30 to free it. Not sure how difficult that would be with our current code; maybe we can interact with the software I2C somehow? Also, that's assuming that the SCD30 isn't just completely wedged and would actually respond.
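
For reference, a minimal sketch of that kind of bit-banged recovery, assuming the pins are reachable as plain open-drain GPIOs. The callbacks `set_scl`, `set_sda` and `read_sda` are hypothetical stand-ins for whatever GPIO access is available; nothing here comes from the current Ribbit code. It pulses SCL up to nine times until SDA is released, then issues a STOP. It can only help when SDA is the line being held low: if the sensor is holding SCL low itself (stretching the clock), the master cannot generate pulses at all.

```python
import time


def recover_bus(set_scl, set_sda, read_sda, pulses=9,
                half_period_s=50e-6, sleep=time.sleep):
    """Pulse SCL until SDA is released, then send a STOP condition.

    set_scl / set_sda take True (release, line floats high) or False
    (drive low); read_sda returns the current SDA level. All three are
    hypothetical callbacks supplied by the caller.
    Returns True if SDA reads high once the sequence is finished.
    """
    set_sda(True)  # make sure the master is not holding SDA low
    for _ in range(pulses):
        if read_sda():
            break  # the device has released SDA
        set_scl(False)
        sleep(half_period_s)
        set_scl(True)
        sleep(half_period_s)

    # STOP condition: SDA goes low -> high while SCL is high.
    set_scl(False)
    sleep(half_period_s)
    set_sda(False)
    sleep(half_period_s)
    set_scl(True)
    sleep(half_period_s)
    set_sda(True)
    sleep(half_period_s)
    return bool(read_sda())
```

A `False` return would be the cue to fall back to power cycling the sensor.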

@keenanjohnson
Member Author

I have great news! After running the test Raspberry Pi for over 48 hours, the I2C connection seems to be going strong with the software GPIO bus at 10 kHz, so I feel comfortable calling this done and starting to roll it out to the wider fleet.

I'm going to clean up the code a bit and do the rollout next.

@keenanjohnson
Member Author

Deployed to the Ribbit Network fleet:

image

Software release: c35bf3ccb713
