Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

module: DS18B20, new module for 1-wire temperature sensor (WIP) #3462

Merged
merged 29 commits into from Feb 2, 2021

Conversation

theopensourcerer
Copy link
Contributor

Initial commit of code to support 1-wire (Dallas) sensors such
as the DS18B20. requires Linux kernel drivers to create a file
in /sysfs which is read by this module, and temperature
typically returned to a temperature_fan.

This code purely for testing/revew. No documentation as yet.

Signed-off-by: Alan Lord alanslists@gmail.com

Initial commit of code to support 1-wire (Dallas) sensors such
as the DS18B20. requires Linux kernel drivers to create a file
in /sysfs which is read by this module, and temperature
typically returned to a temperature_fan.

This code purely for testing/revew. No documentation as yet.

Signed-off-by: Alan Lord <alanslists@gmail.com>
@KevinOConnor
Copy link
Collaborator

Thanks. As we discussed on IRC, I think the file reading should really be done in the "linux mcu" code. Performing the IO from the MCU allows us to keep portability, permissions, and timing consistent. It would also be worthwhile to only perform the open() call once, as that call can have a bit of cpu overhead.

-Kevin

@theopensourcerer
Copy link
Contributor Author

@KevinOConnor Thanks. I made a comment on IRC about the linux/mcu code part. As this reads from a file, a user doesn't actually need to have the Linux MCU configured. The target fan's pin could be on any mcu, but the file read is not reliant on having the RPi configured as an MCU at all.

Secondly, I thought I had updated my code so open is only called once and I am using seek(0)... See line 51.

@KevinOConnor
Copy link
Collaborator

As this reads from a file, a user doesn't actually need to have the Linux MCU configured. The target fan's pin could be on any mcu, but the file read is not reliant on having the RPi configured as an MCU at all.

I still feel this should be implemented from the "linux mcu" code. Handling just this one device differently from all the others is not something I'm comfortable with (it would have less reliable timing, require host code privileges, use a different update thread, be constrained to only support devices directly connected to the host, etc.).

Secondly, I thought I had updated my code so open is only called once and I am using seek(0)... See line 51.

I missed that - thanks.

-Kevin

@theopensourcerer
Copy link
Contributor Author

NP - I guess I'd better start learning how to write C then. Can you point me to a good example of how the python code interfaces with some mcu code? I tried to look for something the other day but I failed to see how, for example, bus.py actually interfaces to the i2c.c code... Sorry for being dense.

@KevinOConnor
Copy link
Collaborator

You'll want to look at src/thermocouple.c and klippy/extras/spi_temperature.py . That code does something very similar.

I put together some demo code to show what the C code would look like - it's on the work-ds18b20-20201024 branch (see 7ac3eef). It is totally untested.

-Kevin

@theopensourcerer
Copy link
Contributor Author

That's awesome. Thanks Kevin.

@KevinOConnor
Copy link
Collaborator

Any further updates on this?

-Kevin

@theopensourcerer
Copy link
Contributor Author

I've not had much time of late but did mention something on irc last Sunday... Am getting a "Rescheduled timer in the past" error, but can't understand from the logs the root cause.

@KevinOConnor
Copy link
Collaborator

Ah, okay. If you update your latest code and attach the log, I'll try to take a look.

Cheers,
-Kevin

@theopensourcerer
Copy link
Contributor Author

Thanks very much Kevin - I have pushed the current code onto this branch. Attached is the klippy.log. the error always happens after the first read. You can see the value from the sensor is being returned. I'm guessing it's something to do with the clocks but I don't "grok" the logs well enough to really work out what's going on.

FWIW, I'm in no rush.

klippy.log

@KevinOConnor
Copy link
Collaborator

Odd. If you change ds18_read() to:

#include "board/misc.h"

// Read temperature and report
static void
ds18_read(struct ds18_s *d, uint32_t next_begin_time, uint8_t oid)
{
    uint32_t t1 = timer_read_time();
    int ret = lseek(d->fd, 0, SEEK_SET);
    uint32_t t2 = timer_read_time();
    if (ret < 0) {
        report_errno("seek ds18b20", ret);
        try_shutdown("Unable to seek DS18B20");
    }
    char data[16];
    uint32_t t3 = timer_read_time();
    ret = read(d->fd, data, sizeof(data)-1);
    uint32_t t4 = timer_read_time();
    if (ret < 0) {
        report_errno("read ds18b20", ret);
        try_shutdown("Unable to read DS18B20");
    }
    data[ret] = '\0';
    double val = atof(data);
    uint32_t ival = (uint32_t)(val * 1000 + .5);

    output("read timing t1=%u t2=%u t3=%u t4=%u", t1, t2, t3, t4);

    sendf("ds18_result oid=%c next_clock=%u value=%u"
          , oid, next_begin_time, ival);
    if (ival < d->min_value || ival > d->max_value)
        try_shutdown("DS18 out of range");
}

What does the output() call report?

-Kevin

@theopensourcerer
Copy link
Contributor Author

lol - was just about to hit the sack but caught your message - have separated your output message for clarity:

`Configured MCU 'mcu' (1024 moves)
Stats 16047.0: gcodein=0 mcu: mcu_awake=0.000 mcu_task_avg=0.000000 mcu_task_stddev=0.000000 bytes_write=767 bytes_read=3991 bytes_retransmit=0 bytes_invalid=0 send_seq=102 receive_seq=102 retransmit_seq=0 srtt=0.000 rttvar=0.000 rto=0.025 ready_bytes=0 stalled_bytes=0 freq=49983258 print_time=0.000 buffer_time=0.000 print_stall=0 sysload=0.07 cputime=0.785 memavail=329752
Stats 16048.0: gcodein=0 mcu: mcu_awake=0.000 mcu_task_avg=0.000000 mcu_task_stddev=0.000000 bytes_write=773 bytes_read=4007 bytes_retransmit=0 bytes_invalid=0 send_seq=103 receive_seq=103 retransmit_seq=0 srtt=0.000 rttvar=0.000 rto=0.025 ready_bytes=0 stalled_bytes=0 freq=49995035 print_time=0.000 buffer_time=0.000 print_stall=0 sysload=0.06 cputime=0.787 memavail=329752
Stats 16049.0: gcodein=0 mcu: mcu_awake=0.000 mcu_task_avg=0.000000 mcu_task_stddev=0.000000 bytes_write=779 bytes_read=4007 bytes_retransmit=7 bytes_invalid=0 send_seq=104 receive_seq=103 retransmit_seq=104 srtt=0.000 rttvar=0.000 rto=0.050 ready_bytes=0 stalled_bytes=0 freq=49995035 print_time=0.000 buffer_time=0.000 print_stall=0 sysload=0.06 cputime=0.790 memavail=329184

#output: read timing t1=2150002040 t2=2150003126 t3=2150003142 t4=2195569798

got {'#receive_time': 16049.048920544, u'next_clock': 2400000000L, u'oid': 0, u'value': 20750000, '#name': u'ds18_result', '#sent_time': 0.0}
MCU 'mcu' shutdown: DS18 out of range
clocksync state: mcu_freq=50000000 last_clock=2142599021 clock_est=(16046.707 2078497851 49995035.239) min_half_rtt=0.000049 min_rtt_time=16046.900 time_avg=16046.707(0.073) clock_avg=2078497851.033(3657801.296) pred_variance=1797264509.661
Dumping serial stats: bytes_write=779 bytes_read=4082 bytes_retransmit=7 bytes_invalid=0 send_seq=104 receive_seq=104 retransmit_seq=104 srtt=0.000 rttvar=0.000 rto=0.050 ready_bytes=0 stalled_bytes=0
`

@theopensourcerer
Copy link
Contributor Author

theopensourcerer commented Nov 10, 2020

The value should be val / 1000 by and not * 1000 on this line I believe:

uint32_t ival = (uint32_t)(val * 1000 + .5);

@theopensourcerer
Copy link
Contributor Author

If i change that to / 1000 then the log prints:

`
4 print_time=0.000 buffer_time=0.000 print_stall=0 sysload=0.35 cputime=0.782 memavail=329820

#output: read timing t1=750002225 t2=750003399 t3=750003409 t4=793569068

got {'#receive_time': 16286.010951027, u'next_clock': 1000000000, u'oid': 0, u'value': 21, '#name': u'ds18_result', '#sent_time': 0.0}
MCU 'mcu' shutdown: Rescheduled timer in the past
clocksync state: mcu_freq=50000000 last_clock=73659
`

@KevinOConnor
Copy link
Collaborator

Unfortunately, in both examples, it took ~900ms to read from the file. That isn't going to work in Klipper (neither in the mcu code nor in the host python code). We can't stop processing everything for that long.

Do you know if there is a way to read from that linux device in such a way that it doesn't force a pause? If we have to do it manually, it would require starting a new thread just to be able to read without blocking, which would be a bit of work.

-Kevin

@KevinOConnor
Copy link
Collaborator

The value should be val / 1000 by and not * 1000 on this line I believe:

Oh - what is the format of the file? If you just cat /some/path/to/devce/file does it report a floating point number or an integer? What are the units it reports (Celsius or milli-Celsius)?

@theopensourcerer
Copy link
Contributor Author

I knew it was slow - but that is very slow... I read something about it taking ~1/2 second after "opening" the file to read from. I'll do some research and come back if I can find anything.

I guess maybe that just 1-wire is too slow? Or maybe this kernel module itself is the delay? There are a few 1-wire C libraries on github (gpl or compatible) which are portable - I did have a thought about trying to integrate one so the 1-wire protocol was available on more MCUs than just the RPi, but my C knowldege is way too light for that right now.

I'll update in a day or two on what I find. Thanks for the help.

@theopensourcerer
Copy link
Contributor Author

theopensourcerer commented Nov 10, 2020

This kernel module outputs ascii. If I read the "temperature" file (rather than w1_slave), then it ONLY sends a 5 digit integer. e.g. 23750 which I guess is milli-Celsius and means 23.750'C

@KevinOConnor
Copy link
Collaborator

Or maybe this kernel module itself is the delay?

It's definitely in the kernel logic. One simple test would be to run the read() first and then do the lseek().

If I read the "temperature" file (rather than w1_slave), then it ONLY sends a 5 digit integer. e.g. 23750 which I guess is milli-Celsius and means 23.750'C

Ah, okay. Then we don't want to multiply or divide in the C code at all. We should just send the integer as it is and the python host code should do the divide by 1000 (and multiply by 1000 when calculating the min/max range).

Something like the following:

#include "board/misc.h"

// Read temperature and report
static void
ds18_read(struct ds18_s *d, uint32_t next_begin_time, uint8_t oid)
{
    // Read data and report
    char data[16];
    uint32_t t1 = timer_read_time();
    int ret = read(d->fd, data, sizeof(data)-1);
    uint32_t t2 = timer_read_time();
    if (ret < 0) {
        report_errno("read ds18b20", ret);
        try_shutdown("Unable to read DS18B20");
    }
    data[ret] = '\0';
    int val = atoi(data);
    sendf("ds18_result oid=%c next_clock=%u value=%u"
          , oid, next_begin_time, val);
    if (val < d->min_value || val > d->max_value)
        try_shutdown("DS18 out of range");

    // Seek file in preparation of next read
    uint32_t t3 = timer_read_time();
    ret = lseek(d->fd, 0, SEEK_SET);
    uint32_t t4 = timer_read_time();
    if (ret < 0) {
        report_errno("seek ds18b20", ret);
        try_shutdown("Unable to seek DS18B20");
    }

    output("read timing t1=%u t2=%u t3=%u t4=%u", t1, t2, t3, t4);
}

@theopensourcerer
Copy link
Contributor Author

Alas, I think it's the sensor... In the data sheet there's a discussion about the resolution:

The resolution of the temperature sensor is user-configurable to 9, 10, 11, or 12 bits, corresponding to increments of 0.5°C, 0.25°C, 0.125°C, and 0.0625°C, respectively. The default resolution at power-up is 12-bit.

And the later on in the datasheet it discusses the conversion time... At 12bit it says 750ms! See below. So unless the resolution is changed (9-bit is < 100ms) maybe this is never going to be that suitable inside Klipper?

Screenshot from 2020-11-11 07-44-42

Some background: I have been using these sensors with other Arduino & RPi projects for years - they are reliable, cheap and accurate. I added one to my printer which was then read and used by an Octoprint plugin to (pwm) control some fans in the electronics bay. Once I'd moved to mainsail/moonraker that is when this idea to integrate it into Klipper came about... Maybe another approach could be to use a standalone python script and poke the moonraker API? Would it be possible to control a temperature_fan via the API?

@theopensourcerer
Copy link
Contributor Author

theopensourcerer commented Nov 11, 2020

FWIW, I did some more investigations. I came across a thread[1] that referenced this bit of code to change the register:

https://github.com/blogmotion/BitBangingDS18B20/blob/master/configDS18B20.c

I compiled it on my RPi and when I run it, it returns the temperature much faster than 750ms (see gif, when I hit enter it comes back pretty much instantly - and I've not changed the resolution).

[1] https://raspberrypi.stackexchange.com/questions/14278/how-to-change-ds18b20-reading-resolution

dstemp-read

@theopensourcerer
Copy link
Contributor Author

Changing the code as you suggested (move the read before seek) the output statement shows:

#output: read timing t1=550003665 t2=595569526 t3=595571804 t4=595572237

@KevinOConnor
Copy link
Collaborator

Even a pause of a couple of milliseconds would not work well. So, I don't think this can be fixed by changing the sampling resolution.

FWIW, the issue isn't how long the sensor takes - a 1 second delay would be fine. The issue is that we can't stop processing to wait for that temperature result. Ideally the code could instruct Linux to take the measurement, and then pickup the result when it's ready. (Or, alternatively, after reading a result, start the next sample so that it is ready the next time we read the sensor.) I'm not sure how one can go about doing that with the ds18b20 Linux kernel module, but it should be possible as it's a pretty common requirement.

-Kevin

@theopensourcerer
Copy link
Contributor Author

Alas I don't think I have enough knowledge or experience to do this kind of thing. Please feel free to close this PR. If I find time I may try and review this again in the future, but I can live without it :-)

@jtrmal
Copy link

jtrmal commented Jan 31, 2021

One thing that comes to my mind that could be improved is error reporting when the temp id is not found -- ends up with exception

@theopensourcerer
Copy link
Contributor Author

theopensourcerer commented Jan 31, 2021

@jtrmal

One thing that comes to my mind that could be improved is error reporting when the temp id is not found -- ends up with exception

Thanks.

What do you mean by the "temp id"?

I miss configured my sensor's serial number and I'm not sure the error is that bad actually. It seems to be doing what I would expect and you do get a sensible message:

Invalid DS18B20 serial id 2

This is what other sensors will do if they can't be found. It seems that shutting down the MCU is the right thing to do if a temperature sensor isn't available?

Transition to shutdown state: MCU 'mcu' shutdown: Invalid DS18B20 serial id 2
Reactor garbage collection: (5435.858255113, 0.0, 0.0)
MCU error during connect
Traceback (most recent call last):
  File "/home/pi/klipper/klippy/klippy.py", line 151, in _connect
    cb()
  File "/home/pi/klipper/klippy/mcu.py", line 582, in _connect
    config_params = self._send_get_config()
  File "/home/pi/klipper/klippy/mcu.py", line 560, in _send_get_config
    self._name, self._shutdown_msg))
error: MCU 'mcu' error during config: Invalid DS18B20 serial id 2

@jtrmal
Copy link

jtrmal commented Jan 31, 2021 via email

@joshhead
Copy link

Adding a declaration for ret prior to its use fixed the problem, e.g. int ret; on line 158

Oops. I deleted the declaration in a "cleanup" commit which I didn't test. Sorry about that.

74de4f8#diff-f525538b58518a4f67b54bb6fbdafbc28d1091e230c1ab03b0ee88ae14b85ff5L153

@theopensourcerer
Copy link
Contributor Author

@joshhead I've merged your work into this branch now so the pull request is now up-to-date.

I have had it running on my printer for a few hours and no problems at all. The sensor is working well and my electronics bay is being kept nice and cool :-)

@jtrmal
Copy link

jtrmal commented Jan 31, 2021 via email

@theopensourcerer
Copy link
Contributor Author

Screenshot from 2021-01-31 15-46-46

@jtrmal
Copy link

jtrmal commented Jan 31, 2021

Sorry if it is off-topic, but it's actually quite trivial to add more thermometers in a very straightforward manner -- that's why I like this PR so much :)

IMG_2791

@KevinOConnor
Copy link
Collaborator

Thanks. In general it looks fine to me. One question - can you confirm that on a FIRMWARE_RESTART command that the thread is properly killed? (That is, that the code doesn't leak threads in that case.) I don't think it would be a problem, but worth double checking.

-Kevin

@jtrmal
Copy link

jtrmal commented Feb 1, 2021 via email

@joshhead
Copy link

joshhead commented Feb 1, 2021

One question - can you confirm that on a FIRMWARE_RESTART command that the thread is properly killed?

@KevinOConnor They should be killed since the linux klipper_mcu program is restarted with execv.

A call to any exec function from a process with more than one thread shall result in all threads being terminated and the new executable image being loaded and executed. No destructor functions or cleanup handlers shall be called.

IEEE Std 1003.1-2017 via stackoverflow

@joshhead
Copy link

joshhead commented Feb 1, 2021

To double check, I started Klipper, opened HTOP and filtered for "klipper_mcu". I edited printer.cfg and sent "FIRMWARE_RESTART" a few times, with 2, 1 or 0 temperature sensors configured. The number of threads changed to match the number of configured sensors (plus the main thread).

@theopensourcerer
Copy link
Contributor Author

@jtrmal, @joshhead @KevinOConnor

FWIW, I got this error, when I messed up configuration of something completely different (for mesh bed leveling, I wrote "mean" instead of "average")

This is interesting... I saw the exact same error on my bench RPi yesterday evening after a complete OS reboot. I only had one AttributeError: DS18B20 instance has no attribute '_report_clock' in my log - I only have one DS18B20 configured. But as I recall I didn't have any configuration issues. I restarted klipper and all was good.

So I'm beginning to wonder if this could be a timing issue? Maybe the w1 module and/or the /sys file system for it, is not yet "ready" when klipper is starting up? I will try and work out how to reliably re-create the problem.

@theopensourcerer
Copy link
Contributor Author

FYI I've tried "breaking" my test klipper and now I can't get that error to reappear.

@KevinOConnor
Copy link
Collaborator

Best to assign all values in the __init__() method to avoid that problem (ie, self._report_clock = 0).

Separately, it would also be good to implement the min_temp/max_temp checks in this PR. For reference, I recently added the checks to the LM75 code (commit 333f8c2).

-Kevin

Signed-off-by: Alan Lord <alanslists@gmail.com>
@theopensourcerer
Copy link
Contributor Author

Thanks. Have updated the PR.

@KevinOConnor KevinOConnor merged commit 7d4df65 into Klipper3d:master Feb 2, 2021
@KevinOConnor
Copy link
Collaborator

Thanks.

-Kevin

@jtrmal
Copy link

jtrmal commented Feb 2, 2021 via email

@theopensourcerer
Copy link
Contributor Author

W00t! Thanks Josh, Kevin.

@theopensourcerer theopensourcerer deleted the work_ds18b20_20201023 branch February 2, 2021 20:10
@joshhead
Copy link

joshhead commented Feb 2, 2021

Thanks all!

tntclaus pushed a commit to tntclaus/klipper that referenced this pull request Apr 18, 2021
Initial commit of code to support 1-wire (Dallas) sensors such
as the DS18B20. Requires Linux kernel drivers to create a file
in /sysfs which is read by this module, and temperature
typically returned to a temperature_fan.

Signed-off-by: Alan Lord <alanslists@gmail.com>
Signed-off-by: Josh Headapohl <joshhead@gmail.com>
@KevinOConnor KevinOConnor mentioned this pull request May 26, 2021
@github-actions github-actions bot locked and limited conversation to collaborators Oct 18, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants