Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix proposal for hardfault with batt_smbus module #15789

Merged
merged 7 commits into from
Sep 28, 2020

Conversation

flochir
Copy link
Contributor

@flochir flochir commented Sep 22, 2020

Added buffer length check to SMBus::block_read (triggered hardfault in batt_smbus module by writing beyond buffer end due to device returning longer byte_count than requested/expected)

I ran into a hardfault when reading Smart Battery data with SMBus::block_read.

The fault is triggered in line 126 of
/Firmware/src/lib/drivers/smbus/SMBus.cpp
when memcpy writes beyond the end of the buffer “data” provided by the caller.

In my case the smbus device returned a byte count larger than the requested length (13 instead of 8 bytes).

I fixed this by adding a check for the length limit before the memcpy call, see

I experienced hardfaults working on a SMBUS battery driver based on batt_smbus. They werde triggered by the device returning a bigger byte_count value than was requested (13 bytes in byte_count versus a buffer length of 8 in my case), which lead to memcpy(data, &rx_data[4], byte_count); writing beyond the limit of the provided buffer.

As a fix I added a buffer length check to SMBus::block_read to force the maximum received data length to the length of the buffer provided by the caller.

I experienced that issue with a 3DR Solo battery among others.

…n batt_smbus module by writing beyond buffer end due to device returning longer byte_count than requested/expected)
@dagar
Copy link
Member

dagar commented Sep 22, 2020

Minor code styles changes.

Screenshot from 2020-09-22 08-37-00

@flochir flochir changed the title Fix proposal for hardfault with battt_smbus module Fix proposal for hardfault with batt_smbus module Sep 22, 2020
@BazookaJoe1900
Copy link
Member

BazookaJoe1900 commented Sep 22, 2020

@flochir Thanks for notice.
before I approve of the change, which seems correct. I have questions.
do you have idea who can it be that byte byte_count from the device (comes from rx_data[3]) doesn't match to the expected length?
have you changed the packets you read from the smart battery? is it possible that you read packet on the wrong expected size?

I have also added commit to fix compilation issue on some boards (airmind_mindpx-v2_default for example).
@dagar any idea why there is difference? shouldn't the calling for math lib be either exist or not on all compilations?

@dagar
Copy link
Member

dagar commented Sep 23, 2020

I have also added commit to fix compilation issue on some boards (airmind_mindpx-v2_default for example).
@dagar any idea why there is difference? shouldn't the calling for math lib be either exist or not on all compilations?

I'm not seeing the difference? What could be different is the linking dependencies, which often aren't 100% correct because you typically get common dependencies (like mathlib) included by something else.

Copy link

@jimdgit jimdgit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But why did you leave cpy_len in?

@BazookaJoe1900
Copy link
Member

BazookaJoe1900 commented Sep 24, 2020

But why did you leave cpy_len in?

Can you explain? what is cpy_len? @jimdgit

@jimdgit
Copy link

jimdgit commented Sep 24, 2020

The code was changed, so it is not longer an issue.
I also do not understand why you would prefer a call to a lib over the conditional operator ? :, which is almost free time wise compared to the lib call. In other contexts I would not question it, but in this one I do.

@BazookaJoe1900
Copy link
Member

BazookaJoe1900 commented Sep 24, 2020

I didn't got answer from @flochir yet how he get to this situation.
but on my side the pr is ready for merge, @dagar can you squash the commits?

BazookaJoe1900
BazookaJoe1900 previously approved these changes Sep 24, 2020
@dagar
Copy link
Member

dagar commented Sep 24, 2020

The code was changed, so it is not longer an issue.
I also do not understand why you would prefer a call to a lib over the conditional operator ? :, which is almost free time wise compared to the lib call. In other contexts I would not question it, but in this one I do.

If you're referring to math::min those are all static constexpr, so it ends up being the same thing, but with a bit of type safety and easier to read. https://github.com/PX4/Firmware/blob/0dc8bb9c864dc49bbbe255ea0177480b46522d7d/src/lib/mathlib/math/Limits.hpp#L54-L57

@jimdgit
Copy link

jimdgit commented Sep 24, 2020

I see ok great!

src/lib/drivers/smbus/SMBus.cpp Outdated Show resolved Hide resolved
Co-authored-by: Beat Küng <beat-kueng@gmx.net>
bkueng
bkueng previously approved these changes Sep 25, 2020
Copy link
Member

@bkueng bkueng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

@flochir
Copy link
Contributor Author

flochir commented Sep 25, 2020

@BazookaJoe1900
The way I understand SMBUS block read, the amount of bytes returned on a specific command is specified by the slave.
See http://smbus.org/specs/SMBus_3_0_20141220.pdf page 42.

So there really is no way to request a specific amount of bytes from the client. The slave will send the amount of bytes associated to that command. I2C::transfer() will limit its amount of read bytes to recv_len, but the byte_count originates directly from the device and will still be the full length of data available. To respect that on a lower level the SMBus driver may need to overwrite I2C::transfer().

Btw. there may be another issue to check. What happens if recv_len passed to I2C::transfer() is longer than the amount of data the SMBUS slave wants to transmit? Timeouts? Maybe there's some error or optimization potential here.

@jimdgit
Copy link

jimdgit commented Sep 25, 2020

I 100% agree. We should not be messing with that.

What happens if recv_len passed to I2C::transfer() is longer than the amount of data the SMBUS slave wants to transmit?

That needs to be handled elsewhere and so far there is no evidence to that effect. We don't want to break other things.
Plus we have no test case for that.

@flochir
Copy link
Contributor Author

flochir commented Sep 25, 2020

That needs to be handled elsewhere and so far there is no evidence to that effect. We don't want to break other things.
Plus we have no test case for that.

It's directly subject to the SMBus protocol, so I don't see where else it could be handled. I'll see if I can run some tests against my setup.

@BazookaJoe1900
Copy link
Member

BazookaJoe1900 commented Sep 25, 2020

@flochir thanks for the notes.
Have you tried to read a packet which isn't on the master code? is it possible that you called it with wrong byte_count?
maybe you are working with something else then smart-battery?
I am working a lot with the SMBUS, and never got hardfault, so I wonder what is the difference

@flochir
Copy link
Contributor Author

flochir commented Sep 25, 2020

@BazookaJoe1900

Have you tried to read a packet which isn't on the master code? is it possible that you called it with wrong byte_count?
maybe you are working with something else then smart-battery?
I am working a lot with the SMBUS, and never got hardfault, so I wonder what is the difference

It happens for me when I read DeviceName (SBS Cmd 0x21) from a smart battery with a TI chipset.
I just inserted a bit of debug code to block_read:

	uint8_t device_address = get_device_address();
	rx_data[0] = (device_address << 1) | 0x00;
	rx_data[1] = cmd_code;
	rx_data[2] = (device_address << 1) | 0x01;
	byte_count = rx_data[3];

	PX4_ERR("Command: 0x%x, length: %d, byte_count: %d",cmd_code, length, byte_count);

And this is what I get:
ERROR [drivers__smbus] Command: 0x21, length: 8, byte_count: 13

The call returns the correct value for device name.

@BazookaJoe1900
Copy link
Member

@flochir

@BazookaJoe1900

Have you tried to read a packet which isn't on the master code? is it possible that you called it with wrong byte_count?
maybe you are working with something else then smart-battery?
I am working a lot with the SMBUS, and never got hardfault, so I wonder what is the difference

It happens for me when I read DeviceName (SBS Cmd 0x21) from a smart battery with a TI chipset.
I just inserted a bit of debug code to block_read:

	uint8_t device_address = get_device_address();
	rx_data[0] = (device_address << 1) | 0x00;
	rx_data[1] = cmd_code;
	rx_data[2] = (device_address << 1) | 0x01;
	byte_count = rx_data[3];

	PX4_ERR("Command: 0x%x, length: %d, byte_count: %d",cmd_code, length, byte_count);

And this is what I get:
ERROR [drivers__smbus] Command: 0x21, length: 8, byte_count: 13

The call returns the correct value for device name.

its seems that getting DeviceName length's should be 21bytes, and not 8.
image

My guess that byte_count is "only" 13 is because this is null terminated string, so TI cut the return to 13.
for some reason its not 'BQ40Z50'/'BQ40Z80' which it might be what you think it should get, can you confirm what you get when letting the byte_count to be 21?
anyway you did found serious bug here.

@flochir
Copy link
Contributor Author

flochir commented Sep 25, 2020

It's a custom BMS for Li-Ion we developed (also not based on BQ40Z50) so it doesn't return the default device string. I get the expected feedback from the device and it's working fine.

Just hit that bug while working on it and wanted to get it discussed/cleared.

@jimdgit
Copy link

jimdgit commented Sep 26, 2020

A copy should always be length checked, but really on the surface it sounds like you only expected 8 bytes when you should have expected 13. No?

@BazookaJoe1900
Copy link
Member

so I think the bug understood, right?
is it ready to merge?
@flochir ? @bkueng ? @dagar ?

@flochir
Copy link
Contributor Author

flochir commented Sep 26, 2020

so I think the bug understood, right?
is it ready to merge?
@flochir ? @bkueng ? @dagar ?

Fine by me!

@flochir
Copy link
Contributor Author

flochir commented Sep 26, 2020

A copy should always be length checked, but really on the surface it sounds like you only expected 8 bytes when you should have expected 13. No?

In my case I only need the first 8 bytes. Of course that's how I hit this issue. It could happen with other batteries/chipsets anytime, though, even with a larger buffer.

@flochir
Copy link
Contributor Author

flochir commented Sep 26, 2020

Btw. there may be another issue to check. What happens if recv_len passed to I2C::transfer() is longer than the amount of data the SMBUS slave wants to transmit? Timeouts? Maybe there's some error or optimization potential here.

Side note: I did a quick test and requested a larger amount of data than returned through byte_count. Everything's stll working, so at least there don't seem to be any serious immediate problems.

@bkueng bkueng merged commit ec71088 into PX4:master Sep 28, 2020
mvacanti pushed a commit to mvacanti/PX4-Autopilot that referenced this pull request Nov 27, 2020
* Added buffer length check to SMBus::block_read (triggered hardfault in batt_smbus module by writing beyond buffer end due to device returning longer byte_count than requested/expected)

Co-authored-by: Florian Olbrich <florian.olbrich@scarabot.de>
Co-authored-by: Daniel Agar <daniel@agar.ca>
Co-authored-by: BazookaJoe1900 <BazookaJoe1900@gmail.com>
Co-authored-by: Beat Küng <beat-kueng@gmx.net>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants