-
Notifications
You must be signed in to change notification settings - Fork 13.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[clang][scan-build] Treat --use-cc and --use-c++ as shell commands #131932
Conversation
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers. If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. |
@llvm/pr-subscribers-clang-static-analyzer-1 @llvm/pr-subscribers-clang Author: Florian Ragwitz (rafl) ChangesSo that things like --use-cc="ccache gcc" work. Fixes #26594. Also use the slightly simpler shellwords instead of quotewords. Cc-ing the Clang static analyzer maintainers @haoNoQ, @Xazax-hun, and @steakhal. Full diff: https://github.com/llvm/llvm-project/pull/131932.diff 1 Files Affected:
diff --git a/clang/tools/scan-build/libexec/ccc-analyzer b/clang/tools/scan-build/libexec/ccc-analyzer
index 74f812aef8fdf..655ded4b102be 100755
--- a/clang/tools/scan-build/libexec/ccc-analyzer
+++ b/clang/tools/scan-build/libexec/ccc-analyzer
@@ -63,6 +63,7 @@ sub SearchInPath {
}
my $Compiler;
+my @CompilerArgs;
my $Clang;
my $DefaultCCompiler;
my $DefaultCXXCompiler;
@@ -89,7 +90,7 @@ if (`uname -s` =~ m/Darwin/) {
}
if ($FindBin::Script =~ /c\+\+-analyzer/) {
- $Compiler = $ENV{'CCC_CXX'};
+ ($Compiler, @CompilerArgs) = shellwords($ENV{'CCC_CXX'});
if (!defined $Compiler || (! -x $Compiler && ! SearchInPath($Compiler))) { $Compiler = $DefaultCXXCompiler; }
$Clang = $ENV{'CLANG_CXX'};
@@ -98,7 +99,7 @@ if ($FindBin::Script =~ /c\+\+-analyzer/) {
$IsCXX = 1
}
else {
- $Compiler = $ENV{'CCC_CC'};
+ ($Compiler, @CompilerArgs) = shellwords($ENV{'CCC_CC'});
if (!defined $Compiler || (! -x $Compiler && ! SearchInPath($Compiler))) { $Compiler = $DefaultCCompiler; }
$Clang = $ENV{'CLANG'};
@@ -199,7 +200,7 @@ sub GetCCArgs {
die "could not find clang line\n" if (!defined $line);
# Strip leading and trailing whitespace characters.
$line =~ s/^\s+|\s+$//g;
- my @items = quotewords('\s+', 0, $line);
+ my @items = shellwords($line);
my $cmd = shift @items;
die "cannot find 'clang' in 'clang' command\n" if (!($cmd =~ /clang/ || basename($cmd) =~ /llvm/));
# If this is the llvm-driver the internal command will look like "llvm clang ...".
@@ -462,9 +463,9 @@ my $Output;
my %Uniqued;
# Forward arguments to gcc.
-my $Status = system($Compiler,@ARGV);
+my $Status = system($Compiler,@CompilerArgs,@ARGV);
if (defined $ENV{'CCC_ANALYZER_LOG'}) {
- print STDERR "$Compiler @ARGV\n";
+ print STDERR "$Compiler @CompilerArgs @ARGV\n";
}
if ($Status) { exit($Status >> 8); }
|
Hey, thanks for coming by. Disclaimer: I'm not familiar with Perl. When I looked at this patch and I had the impression that If we don't currently have a way to pass extra |
Thanks for your feedback, Balázs!
The author of the original code this change is modifying seemed to make the same assumption, but I believe this is not true. All of the following treats CC/CXX as a command with potential arguments:
The fact that all of these tools will consult
Your understanding of what the proposed change does is correct. However, most if not all common build tools I've worked with treat these kinds of options as commands, and that's what users seem to expect as well, as per #26594. This behaviour is very ergonomic, especially when you're truly swapping out the "compiler", such as when replacing "gcc" with "ccache gcc" or similar, without having to write wrapper executables, which is what's currently required by
To me, following the established convention of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the context. It looks good to me now.
@haoNoQ, maybe you know some Perl, could you have a second opinion?
Otherwise, let's merge this in a week.
Thanks for having another look @balazs-benics-sonarsource. While we wait for a potential second opinion, I was curious about what you think of the current behaviour of silently defaulting to Would you have any preference between these options?
My personal preference is option 3. The only downside to this which I see is potentially breaking some existing usage, but it'd do so in an obvious and easy to fix way way. I'm not sure if that's acceptable in the context of this project. Option 2 would be my second choice, with Option 1 being the least preferable. |
I'd prefer option 2, because why else would we have a default compiler if that wasn't used in some workflows. |
So that things like --use-cc="ccache gcc" work. Fixes llvm#26594. Also use the slightly simpler shellwords instead of quotewords.
…ction I want to adjust the logic for CCC_CC and CCC_CXX defaults, so not repeating that logic twice is convenient. As a nice side-effect, we end up with fewer globals and less code. Contains no behavioural changes, except for checking for /usr/bin/xcrun even if we're not on OSX when there's an -isysroot parameter, which should be fine. We also don't call uname -s twice.
To avoid user confusion when we're ignoring the provided option and use a default instead.
If you went out of your way to specify the option, you probably knew what you wanted and didn't want the default you'd get from not specifying the option in the first place. We just let the error from from system() propagate to tell the user that the option they provided wasn't usable: $ scan-build-19 --use-cc="enoent gcc" make scan-build: Using '/usr/lib/llvm-19/bin/clang' for static analysis ... Can't exec "enoent": No such file or directory ...
I've added a few additional commits:
|
Looks good as it is right now. Thanks for putting the effort into this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall I'm satisfied with the direction of the commit and I agree that explicitly specifying an invalid compiler should be a hard error. (I could also accept a warning as long as it's loud and visible, but the current "silently use the default" behavior is terrible and must be replaced.)
|
||
$IsCXX = 0 | ||
sub DetermineClang { | ||
my ($is_cxx) = @_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bikeshedding: Please use $UpperCamelCase
within the new functions DetermineCompiler
/ DetermineClang
because it's a bit jarring when $IsCXX
/ $is_cxx
appears both in $UpperCamelCase
and in $snake_case
. I see that these styles both appear in the script, but in these parts $UpperCamelCase
is more common, so I think it would be better if you used that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm with you on the mix of different styles in this code being far from ideal. I did briefly consider cleaning those things up a bit, but decided to not extend the scope of this changeset further and to perhaps save those kinds of changes for a separate pull-request if there was any interest in that.
Here's how I arrived at the names currently used in this pull-request:
-
Functions
Almost all existing functions use UpperCamelCase, so I stuck with that even though it's not at all typical for the majority of modern(-ish) Perl code -
Variables
These were also quite inconsistent, but I stuck with common modern(-ish) practices, where most regular lexical variables use$snake_case
, constants use$ALL_CAPS
, and "globals" use$UpperCamelCase
as per https://metacpan.org/dist/perl/view/pod/perlstyle.pod, the old Perl Best Practices book, and established community conventions.I'd find it much more jarring if a function was to potentially shadow a variable which reads like a
$Global
from a surrounding scope.
I don't feel particularly strongly about it, and would appreciate your input on the following suggestions for how to best move forwards:
- Just call the parameter
$IsCXX
in the functions as suggested.
This defies the expectations of many Perl programmers when it comes to naming, but so does most of the rest of the program. - Keep things as they are.
This is consistent with modern practices. - Call the local
$is_cxx
variable something else (but not$IsCXX
), or rename the "global"$IsCXX
.
To not potentially confuse people unfamiliar with Perl or its current common practices. - Just use the "global"
$IsCXX
directly.
Reducing some of the global state to make it slightly easier to reason about those individual functions seemed nice, but given how much of it there is anyway, maybe it doesn't even matter in this context.
Thanks for your review and your feedback on the above!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, thanks for the clarifications. I didn't know about these Perl naming conventions (last time I touched Perl was 10+ years ago), but now that you explained them, I agree that it's reasonable to follow them and keep the code as it is currently. (The $is_cxx
/ $IsCXX
difference is a bit strange for the untrained eye, but it won't cause confusion or misunderstandings.)
@rafl Congratulations on having your first Pull Request (PR) merged into the LLVM Project! Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR. Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues. How to do this, and the rest of the post-merge process, is covered in detail here. If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again. If you don't get any reports, no action is required from you. Your changes are working as expected, well done! |
} | ||
|
||
$AnalyzerTarget = $ENV{'CLANG_ANALYZER_TARGET'}; | ||
my $IsCXX = $FindBin::Script =~ /c\+\+-analyzer/; | ||
my ($Compiler, @CompilerArgs) = DetermineCompiler($IsCXX); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @rafl .
I'm not familiar with Perl. Just curious, how does it parse, for example, ccache gcc
? Would $Compiler
hold ccache
and gcc
becomes one of the @CompilerArgs
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ziqingluo-90 that's exactly right. DetermineCompiler
returns a list of values (potentially a singleton list), and the call-site unpacks that return value into the first list element $Compiler
and the rest of the list @CompilerArgs
(which might be empty).
The actual parsing is done using shellwords from Text::ParseWords.
If you'd like to experiment with how it parses certain inputs, you can do something like this, which prints out the parsed list, one item per line:
$ CCC_CC='foo "bar baz" \"quux' perl -MText::ParseWords -le'print for shellwords $ENV{CCC_CC}'
foo
bar baz
"quux
So that things like --use-cc="ccache gcc" work.
Fixes #26594.
Also use the slightly simpler shellwords instead of quotewords.