Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replaced parse_descriptor() function, fixed some overruns #1460

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

seanm
Copy link
Contributor

@seanm seanm commented Feb 5, 2024

No description provided.

@mcuee mcuee added the core Related to common codes label Feb 5, 2024
@Youw
Copy link
Member

Youw commented Feb 6, 2024

To my personal taste - I like this (explicit) way better.
Although, I don't have an oppinion if the change is inline with the whole design of this part of the code.

I've reviewed it, and the change looks good.
I also found where the improvements are, but would be great to describe those explicitly in the PR description section.

@seanm
Copy link
Contributor Author

seanm commented Feb 6, 2024

To my personal taste - I like this (explicit) way better.

The change is not just one of taste, but correctness too. The old way was assuming specific struct alignment, and that struct fields did not have padding in between.

I've reviewed it, and the change looks good.

Thanks for reviewing.

I also found where the improvements are, but would be great to describe those explicitly in the PR description section.

I did write pretty detailed commit messages... is it useful to repeat in the PR description?

@Youw
Copy link
Member

Youw commented Feb 6, 2024

Oh, I didn't see the commit message(s).
Yes, that would suffice, but I believe each PR is merged into master as a single commit, so those commit messages would have been lost, unless someone explicitly combines them into a single message for a single (final) commit.

NOTE: libusb does not use squash-merge functionality github provides.

@seanm
Copy link
Contributor Author

seanm commented Feb 6, 2024

I believe each PR is merged into master as a single commit

Really?!?! Why?

That really reduces the usefulness of git bisect for example.

@Youw
Copy link
Member

Youw commented Feb 6, 2024

Really?!?! Why?

A very simple and verified over time philosophy: a single PR should contain a single independent change and solve a single "problem".
This worked very well for many years, not only for libusb.

If a PR is more than one commit - those are either not independent or should be split into separate PRs.

@seanm
Copy link
Contributor Author

seanm commented Feb 6, 2024

A very simple and verified over time philosophy: a single PR should contain a single independent change and solve a single "problem".

This PR does solve a single "problem". The problem of parse_descriptor() being broken. This single problem is divided into smaller subproblems.

If the official libusb policy is indeed 1 commit per PR, then it really should be documented in the HACKING file (or somewhere) because it is very unusual.

@tormodvolden
Copy link
Contributor

No, there is no rule saying a PR gets squashed. We will certainly want to squash fixup commits and other WIP artifacts, but if each commit is an independent change they go in one by one. Sometimes it is for instance good to separate preparatory commits from those commits that really make a change, it makes it easier to review up front, and to understand and verify afterwards.

@tormodvolden
Copy link
Contributor

And if and when we squash commits, no commit messages gets lost. When you use "squash" in git rebase it will add all commit messages, then you edit it before committing.

@Youw
Copy link
Member

Youw commented Feb 6, 2024

Alright, looks like my observatory assumption is not correct. Some habbit of mine to generalize things I see consistently.
Maybe I haven't been around long-enough to see PRs with more that one commit merged with rebase-merge strategy.

But in any case I find it way more convenient to see the summary of the PR upfront, rather than having to look for individual commit messages. Maybe it's just my habbit.

@tormodvolden
Copy link
Contributor

After a quick look at these, I see no need to squash them. OK, they all improve parse_descriptor, but in different ways. Each of them would have made sense without the others.

A bunch of easy-to-review commits is better than huge commits where you have to look back and forth to see what pieces play together. And the larger the commit the more sure there will be something wrong that slips in and is missed in a review. I think most regressions I have found are from mega-commits where too many things were changed at a time.

@Youw
Copy link
Member

Youw commented Feb 6, 2024

A bunch of easy-to-review commits is better than huge commits

Each of them would have made sense without the others

To my experience - that's a good reason to have separate PRs, not just separate commits.
I'm often seen cases when a change request or discussions to one of the commits would delay the merge of the others, not to mention potential inconvenience with interractive rebases/etc., to make changes only to some part of a multi-commit PR.

@tormodvolden
Copy link
Contributor

But in any case I find it way more convenient to see the summary of the PR upfront, rather than having to look for individual commit messages. Maybe it's just my habbit.

If you click on the "..." after each commit summary on this web page (e.g. under "seanm added 6 commits"), there is a drop-down with the full commit message,.

Most of the time it is the opposite problem, people write a lot in the PR description (which doesn't get into the repo) while their commit messages are poor or empty.

@tormodvolden
Copy link
Contributor

Each of them would have made sense without the others

To my experience - that's a good reason to have separate PRs, not just separate commits.

Sometimes they depend on each other to avoid conflicts, and it is just a saner way to work both for coder and reviewer. They fit together logically so they are best treated in the same PR. There is a risk that discussion about one commit delays the others, but it is rarely a big problem. The committer can in such cases also choose to cherry-pick the easy ones, and leave the questionable commits to be fixed up (and rebased if needed).

@Youw
Copy link
Member

Youw commented Feb 7, 2024

This commit, apparently introduces an issue:

C:\projects\libusb\libusb\descriptor.c(276,24): error C2220: the following warning is treated as an error [C:\projects\libusb\msvc\libusb_static.vcxproj]
         			ifp->extra_length = len;
         			                    ^
         
     2>C:\projects\libusb\libusb\descriptor.c(276,24): warning C4244: '=': conversion from 'intptr_t' to 'int', possible loss of data [C:\projects\libusb\msvc\libusb_static.vcxproj]
         			ifp->extra_length = len;
         			                    ^
         
     2>C:\projects\libusb\libusb\descriptor.c(427,[28](https://ci.appveyor.com/project/dickens/libusb/builds/49109043/job/or3trscp22wbflua#L28)): warning C4244: '+=': conversion from 'intptr_t' to 'int', possible loss of data [C:\projects\libusb\msvc\libusb_static.vcxproj]
         			config->extra_length += len;
         			                        ^

At least on MSVC x64 compilers.

@@ -127,7 +126,7 @@ static int parse_endpoint(struct libusb_context *ctx,

/* Copy any unknown descriptors into a storage area for drivers */
/* to later parse */
len = (int)(buffer - begin);
intptr_t len = (intptr_t)(buffer - begin);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now, when I looke a this - what's the point of using intptr_t and not using int as it was originally?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Admittedly, not much.

The old cast is theoretically truncating (64 bit pointers, 32 bit ints), but of course they should never be so far apart.

This way is just slightly more correct. len is now the correct type, and at least the check on the next line (if (len <= 0) is therefore more correct too in the face of overflow.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When we casted to int right away, in case of overflow there was a chance it would be cut off by the <0 check.
If we're to make it property right, then in addition to <0 check we should also check for (at least) INT_MAX.

Unless it is already checked elsewhere for some othe constant (e.g. something usb-specific), in which case casting it to int instead of intptr_t would make more elegant implementation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parse_configuration() or its sole caller raw_desc_to_config() is called with an int size. raw_desc_to_config() is only called from libusb_get_active_config_descriptor() where the size is taken from wTotalLength which therefore cannot be larger than 65535 ever.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was the conclusion here, is the change pointless?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My opinion is still that the change is a small improvement.

@tormodvolden
Copy link
Contributor

The last commit here is marked WIP, I guess the others are ready to go?

@seanm
Copy link
Contributor Author

seanm commented Feb 14, 2024

The last commit here is marked WIP, I guess the others are ready to go?

Let's figure out the WIP one before merging... Let me see if I remember what I was thinking those weeks ago....

@seanm
Copy link
Contributor Author

seanm commented Feb 14, 2024

Ah yes... The old code is this:

		// Second pass: Iterate through desc list, fill IAD structures
		consumed = 0;
		i = 0;
		while (consumed < size) {
			header.bLength = buffer[0];
			header.bDescriptorType = buffer[1];
			if (header.bDescriptorType == LIBUSB_DT_INTERFACE_ASSOCIATION) {
				iad[i].bLength = buffer[0];
				iad[i].bDescriptorType = buffer[1];
				iad[i].bFirstInterface = buffer[2];
				iad[i].bInterfaceCount = buffer[3];
				iad[i].bFunctionClass = buffer[4];
				iad[i].bFunctionSubClass = buffer[5];
				iad[i].bFunctionProtocol = buffer[6];
				iad[i].iFunction = buffer[7];
				i++;
			}
			
			buffer += header.bLength;
			consumed += header.bLength;
		}

We know the length of buffer is size because it's passed as a parameter to this function.

buffer is incremented here by header.bLength (aka buffer[0]). But if the contents of buffer[0] are unexpected/garbage, then it seems to me buffer could get incremented too far...

libusb/descriptor.c Outdated Show resolved Hide resolved
@seanm seanm force-pushed the remove-parse_descriptor branch 3 times, most recently from beb1831 to 8bae9eb Compare April 7, 2024 00:59
@seanm seanm force-pushed the remove-parse_descriptor branch from d324626 to 67ac1a7 Compare May 5, 2024 21:24
libusb/libusb.h Outdated Show resolved Hide resolved
@tormodvolden
Copy link
Contributor

BTW, I will merge #1428 first, and this one will need a small rework afterwards.

@seanm seanm force-pushed the remove-parse_descriptor branch 2 times, most recently from d219558 to 9e52150 Compare May 8, 2024 16:28
@seanm
Copy link
Contributor Author

seanm commented May 8, 2024

BTW, I will merge #1428 first, and this one will need a small rework afterwards.

I've rebased and pushed.

@tormodvolden
Copy link
Contributor

"Fixed potential buffer overread" still talks about WIP.

Here as well as in your other PR it would be good to see in the commit summary what part of the code you are changing. We have established a prefix convention that everybody is using. When reviewing or perusing the git log it has great value e.g. to be able to identify changes in core code vs in examples.

@seanm
Copy link
Contributor Author

seanm commented May 8, 2024

"Fixed potential buffer overread" still talks about WIP.

Yeah, would appreciate thoughts on that. I currently call usbi_warn and return, is that good?

Here as well as in your other PR it would be good to see in the commit summary what part of the code you are changing. We have established a prefix convention that everybody is using. When reviewing or perusing the git log it has great value e.g. to be able to identify changes in core code vs in examples.

Any docs on that? I see nothing in HACKING.

@tormodvolden
Copy link
Contributor

Just look at the git log, i.e. git log --oneline

libusb/descriptor.c Outdated Show resolved Hide resolved
@seanm seanm force-pushed the remove-parse_descriptor branch from 9e52150 to de91ffe Compare May 9, 2024 22:45
@seanm
Copy link
Contributor Author

seanm commented May 9, 2024

Just look at the git log, i.e. git log --oneline

Sure, but if this is required of commit messages, shouldn't it be documented in HACKING, which already discusses commit messages:

"Commit messages should be formatted to 72 chars width and have a
free-standing summary line. See for instance "Commit Guidelines" on
https://git-scm.com/book/en/v2/Distributed-Git-Contributing-to-a-Project
or https://cbea.ms/git-commit/ about how to make well-formed commit
messages.

Put detailed information in the commit message itself, which will end
up in the git history."

libusb/libusb.h Outdated
@@ -335,6 +335,7 @@ enum libusb_descriptor_type {
#define LIBUSB_DT_SS_ENDPOINT_COMPANION_SIZE 6
#define LIBUSB_DT_BOS_SIZE 5
#define LIBUSB_DT_DEVICE_CAPABILITY_SIZE 3
#define LIBUSB_DT_INTERFACE_ASSOCIATION_SIZE 8
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Too many tabs, "8" should be aligned with the rest (with tab stop width 8).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha, to me they all look unaligned because the Xcode project is configured for 4 space tabs.

We really should proceed with #1443 so that human effort reviewing and fixing trivialities like this can be automated.

Anyway, fixed it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I'd recommend using "git diff" before or "git show" after committing, it exposes such whitespace errors easily and clearly and also lets you review other changes and commit messages from another angle. QED GUIs fail miserably on this.

@tormodvolden
Copy link
Contributor

Another tip: To spellcheck your commit messages you can use for instance:
git log origin/master.. | grep -v ^commit | aspell -a --dont-suggest | grep '^#'

{
return (uint32_t)((uint32_t)(p[3] << 24) |
(uint32_t)(p[2] << 16) |
(uint32_t)(p[1] << 8) |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the cast/promotion be done before the shifting?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise I think we would only need the final cast.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cast before shifting is not needed because of implicit promotion: https://www.gnu.org/software/c-intro-and-ref/manual/html_node/Shift-Operations.html (as long as we don't care about 16-bit platforms!).
I wouldn't think casting before bitwise or'ing is needed either.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. I have restored the casting to be as it was in the macro.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why can't we just write
return (p[3] << 24) | (p[2] << 16) | (p[1] << 8) | p[0];
?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would implicitly promote to signed int (IIUC), and it seems safer to keep everything unsigned because shifting a 1 into the sign bit is technically UB. Besides, it was already like that, and presumably correct as it was.

seanm added 6 commits May 13, 2024 13:20
This function had a few problems:

- it takes two buffers as parameters but knows nothing about their length, making it easy to overrun them.
- callers make unwarranted assumptions about the alignment of structures that are passed to it (it assumes there's no padding)
- it has tricky pointer arithmetic and masking

With this new formulation, it's easier to see what's being read/written, especially the destination. It's now very clear that the destination is not being overrun because we are simply assigning to struct fields.

Also converted byte swapping macros to inline functions for more type safety.
This was checking that `size` is at least `LIBUSB_DT_CONFIG_SIZE` (9) bytes long, but then increments the pointer with `buf += header.bLength`. That could end up pointing past of the end of the buffer. There is a subsequest check that would prevent dereferencing it, but it's still UB to even create such a pointer.

Added a check with a similar pattern as elsewhere in this file.
All the right hand side is `dev_cap`, changed one outlier to match.

Also clarified the relationships between some magic numbers.

No change in behaviour here.
The first iteration of this loop was safe because the beginning of the function checked that `size` is at least LIBUSB_DT_CONFIG_SIZE (9) bytes long.

But for subsequent iterations, it could advance the pointer too far (which is undefined behaviour) depending on the content of the buffer itself.
@seanm
Copy link
Contributor Author

seanm commented May 13, 2024

Another tip: To spellcheck your commit messages you can use for instance: git log origin/master.. | grep -v ^commit | aspell -a --dont-suggest | grep '^#'

Spellings fixed.

@@ -1043,7 +1045,14 @@ int API_EXPORTED libusb_get_ssplus_usb_device_capability_descriptor(

// We can only parse the non-variable size part of the SuperSpeedPlus descriptor. The attributes
// have to be read "manually".
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment doesn't make much sense now that we are not "parsing" any longer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please give me the comment text you'd like, and I'll update it.

@seanm
Copy link
Contributor Author

seanm commented May 14, 2024

Just look at the git log, i.e. git log --oneline

So, is it that you want me to add the prefix core: to all these commits?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Related to common codes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants