Fix security issue 22495 #14538
Hmm it appears this solution would have to unfix https://issues.dlang.org/show_bug.cgi?id=13512. So we'd have to somehow know if the file is in UTF-8 or some ASCII-extended code page. How could we reliably detect that? |
Isn't the BOM always present when any utf16 or 32 characters exist within a file? |
The problem exists with UTF-8 too. Thinking about this, the only solution would be to detect the native encoding of the system, and then allow it in the shebang if and only if the rest of the file is pure ASCII. The problem is, that's too complicated to do as a bugfix PR. So essentially two choices:
Doing the latter would not allow "invisible" D code, but it would still allow disguising the shebang line as something other than what it really is. I should note that even with option 1, you can probably (I didn't test) invisibly disable D code at the top of the file by using line endings other than "\n", so that the compiler thinks the following lines belong to the shebang line. |
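For illustration, the "rest of the file is pure ASCII" half of that idea could look roughly like the following. This is only a sketch: `isPureAscii` and `shebangEncodingAcceptable` are hypothetical names that do not exist in dmd, and the harder half (detecting the system's native encoding) is deliberately left out.

```d
/// Hypothetical helper: true if every byte is 7-bit ASCII.
bool isPureAscii(const(ubyte)[] data) pure nothrow @nogc @safe
{
    foreach (b; data)
    {
        if (b >= 0x80)
            return false;
    }
    return true;
}

/// Hypothetical policy check (an assumption, not dmd behaviour):
/// non-ASCII bytes in the shebang line are tolerated only when the
/// remainder of the file is pure ASCII.
bool shebangEncodingAcceptable(const(ubyte)[] shebangLine,
                               const(ubyte)[] rest) pure nothrow @nogc @safe
{
    return isPureAscii(shebangLine) || isPureAscii(rest);
}

unittest
{
    const(ubyte)[] asciiShebang = [0x23, 0x21, 0x78];       // "#!x"
    const(ubyte)[] legacyShebang = [0x23, 0x21, 0xF0];      // "#!" plus one high byte
    const(ubyte)[] asciiBody = [0x69, 0x6E, 0x74];          // "int"
    const(ubyte)[] nonAsciiBody = [0x2F, 0x2F, 0xC3, 0xA9]; // "//" plus a two-byte sequence

    assert(shebangEncodingAcceptable(asciiShebang, nonAsciiBody));
    assert(shebangEncodingAcceptable(legacyShebang, asciiBody));
    assert(!shebangEncodingAcceptable(legacyShebang, nonAsciiBody));
}
```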
|
Thanks for your pull request and interest in making D better, @dukc! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please see CONTRIBUTING.md for more information. If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment. Bugzilla references
Testing this PR locally: if you don't have a local development environment set up, you can use Digger to test this PR: dub run digger -- build "stable + dmd#14538" |
Which UTF-8 character controls the flow of text? Looking at the list of cases in the switch present here, there aren't any. |
|
OK, I see that UTF decided to use a range of high bits to denote a UTF-16 or 32 character, which didn't match my expectation of using bits xFF and xFFFF respectively. :-) I also see that this range of UTF bits are also in conflict with the high end bits used for KOI-8 characters. But could there really be an ambiguity? Just looking at the individual short hex codes against the KOI-8 table. Yeah, if the aim is to only match and reject bidirectional characters, then it looks like there can't be any ambiguity - at least for this KOI-8 encoding. If the problem is that If the problem is that |
|
On POSIX-compliant environments, I'd assume that LC_ALL would be set appropriately if you are not using UTF-8. |
|
If I reject only Bidi characters in the shebang but allow otherwise malformed UTF-8, there will be no problem with KOI-8, but potentially with other code pages such as Windows-1252. Putting just any garbage (from Unicode perspective) at the shebang line will usually work, but someone may someday get a strange error. That's unlikely enough though that it's probably the least evil option here. I'll do it that way. |
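To make the "reject only Bidi, tolerate other garbage" idea concrete, here is a minimal sketch that scans raw bytes for the UTF-8 encodings of the Unicode explicit directional formatting characters (U+202A..U+202E and U+2066..U+2069). It is not the code from this PR: `hasBidiBytes` is a hypothetical name, and which code points to reject is an assumption based on the Unicode Bidi algorithm's explicit formatting characters.

```d
import std.string : representation;

/// Hypothetical helper: true if `line` contains the UTF-8 byte sequence of
/// U+202A..U+202E (E2 80 AA..AE) or U+2066..U+2069 (E2 81 A6..A9).
/// Any other bytes, including malformed UTF-8, are simply ignored.
bool hasBidiBytes(const(ubyte)[] line) pure nothrow @nogc @safe
{
    for (size_t i = 0; i + 2 < line.length; ++i)
    {
        if (line[i] != 0xE2)
            continue;
        const b1 = line[i + 1];
        const b2 = line[i + 2];
        if (b1 == 0x80 && b2 >= 0xAA && b2 <= 0xAE)
            return true;
        if (b1 == 0x81 && b2 >= 0xA6 && b2 <= 0xA9)
            return true;
    }
    return false;
}

unittest
{
    // Escapes are used so this example contains no raw Bidi characters itself.
    assert(hasBidiBytes("#!/usr/bin/env rdmd \u202E".representation));
    assert(hasBidiBytes("#!/usr/bin/env rdmd \u2066".representation));
    assert(!hasBidiBytes("#!/usr/bin/env rdmd".representation));

    // Six high bytes as they might appear in KOI8-R text (assumed sample):
    // no E2 80 / E2 81 prefix, so nothing is flagged.
    static immutable ubyte[] koi8Sample = [0xF0, 0xD2, 0xC9, 0xD7, 0xC5, 0xD4];
    assert(!hasBidiBytes(koi8Sample));

    // Malformed or truncated UTF-8 is tolerated, not reported.
    static immutable ubyte[] malformed = [0x23, 0x21, 0xFF, 0xE2, 0x80];
    assert(!hasBidiBytes(malformed));
}
```

In a single-byte code page, a false positive requires the exact three-byte pattern above, which is why it should be rare but not impossible.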
|
Done. @CyberShadow Got ya! Raw binary data in strings! EDIT: Not binary data, but non-graphic Unicode anyway. |
What is the problem, please? Also, wouldn't it make more sense to look for undesirable characters after decoding, not before? |
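For comparison, checking after decoding could be sketched as below. This is an illustration only, not the dmd lexer: `isBidiControl` and `hasBidiControl` are hypothetical names, and the sketch relies on `std.utf.byDchar`, which decodes lazily and substitutes U+FFFD for invalid sequences instead of throwing.

```d
import std.utf : byDchar;

/// Hypothetical predicate: the Unicode explicit directional formatting
/// characters (embeddings, overrides, isolates and their terminators).
bool isBidiControl(dchar c) pure nothrow @nogc @safe
{
    return (c >= 0x202A && c <= 0x202E) || (c >= 0x2066 && c <= 0x2069);
}

/// Hypothetical helper: decode first, then test each code point.
bool hasBidiControl(const(char)[] text)
{
    foreach (c; text.byDchar)
    {
        if (isBidiControl(c))
            return true;
    }
    return false;
}

unittest
{
    // Escapes keep raw Bidi characters out of this example's own source.
    assert(hasBidiControl("a\u2067b"));
    assert(!hasBidiControl("plain ASCII"));

    // An invalid byte decodes to U+FFFD, which is not a Bidi control.
    const(char)[] bad = ['\xFF', 'a'];
    assert(!hasBidiControl(bad));
}
```

The trade-off is that decoding first keeps the check independent of the byte encoding, at the cost of having to decide what happens to malformed input.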
Oh sorry, I was probably too aggressive because I thought it was funny. Anyway, raw Unicode directionality overriding characters embedded in those strings. |
|
Got it, thanks. Thought it was something more benign like a no-break space. I agree it makes little sense to allow raw control characters like that (or at least only do so for specific circumstances, like if the entire line is part of a string literal). I'll fix it in ae tomorrow, feel free to send a PR to expedite that. |
|
BuildKite doesn't seem to be required, so it can wait until tomorrow, I think. |
Having a further think about it, who's parsing the shebang anyway? Surely not us, so what issue could arise if we just keep on incrementing the buffer pointer until the first newline? It's not a security risk if we don't read it. I know it sounds like I'm saying just make it someone else's problem, but bidirectional controls being parsed in the shebang would be a shell bug, not a compiler bug. |
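A rough sketch of "just advance until the first newline", under the assumption that only "\n" ends the line. `skipShebang` is a hypothetical name and this is not the lexer's actual code.

```d
/// Hypothetical helper: if `src` starts with "#!", return the slice that
/// begins at the first '\n' (or an empty slice if there is none).
/// The skipped bytes are never inspected or decoded.
const(char)[] skipShebang(const(char)[] src) pure nothrow @nogc @safe
{
    if (src.length < 2 || src[0 .. 2] != "#!")
        return src;

    size_t i = 2;
    while (i < src.length && src[i] != '\n')
        ++i;
    return src[i .. $];
}

unittest
{
    assert(skipShebang("#!/usr/bin/env rdmd\nvoid main() {}") == "\nvoid main() {}");
    assert(skipShebang("void main() {}") == "void main() {}");
    assert(skipShebang("#!no newline") == "");

    // With a '\n'-only rule, a lone '\r' does not end the shebang line,
    // so the text after it is skipped as well.
    assert(skipShebang("#!x\rint a;\nint b;") == "\nint b;");
}
```

The last assertion shows, for this sketch only, the effect mentioned earlier in the thread: a different line ending can make following text count as part of the shebang line.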
Fixed and tagged. |
The potential issue is that if the shell has a vulnerability, a shebang line that seems to be doing something completely different can instead call DMD, where it could not if we kept this check. This is very theoretical though. If the shell indeed allows doing that, it has far bigger problems than DMD not rechecking this one. Also, it would be very unlikely for anyone to want to secretly invoke DMD on something that is openly a D source file. So I agree, it's not worth keeping the shebang check around. |
|
Done. I left the refactorings that were needed by the shebang check in place. They are, in my opinion, in the right direction anyway so no point in removing them since they're already in place. I'll remove them if you disagree. |
Maybe, but I'm just seeing it from the view of: if someone invoked the compiler by running
Better I think to focus on the part we have full control over, and directly affects us if we get it wrong. |
Lgtm for now, if someone else has a grand idea for shebangs later, we'll run with that.
|
@dukc a brief changelog entry to make this visible would be appreciated. |
|
I think it's better that someone else writes the entry. Since this is a security fix it may justify some special way to do the changelog entry, so I think it's better that you or another central contributor decides about that and writes it. |
|
Still awaiting answer. |
compiler/src/dmd/lexer.d
    string msg;
    auto result = decodeUTFpure(msg);

    if (msg) error("%.*s", cast(int)msg.length, msg.ptr);
if-statement body should be on its own line
I don't know what kind of special way you're thinking of, but here's my suggestion: |
Red fonts, or possibly asking to update immediately. I don't know whether that would be appreciated or needless drama, so I thought it'd be faster for someone who has authority to decide to word the message himself. Anyway, I added the entry exactly as you wrote it. |
|
ping @ibuclaw |
|
Every security vulnerability disclosed since meltdown that gives itself a name, a website... is needless drama. :-) |
|
I would not call this a vulnerability (or the change a security fix). I think this is better described as a security precaution and improvement. |
I've just noticed the changelog name, that should be properly prefixed so it's picked up. :-)
|
@ibuclaw I assume you mean the casing had to be corrected. Done. Also fixed the if-statement style Dennis mentioned earlier. |
|
Oh it really needed a prefix. No other entries in my local fork have that prefix so I didn't know what you were talking about. |
Yes, see #14600 for the change that enforces this. As part of the dmd/runtime merger we still need a way to differentiate between compiler and runtime changes (moving dmd changelog to compiler was rejected, so this is the next least worst thing we can do to keep mostly the same status quo for now). |
No description provided.