Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[clang][scan-build] Treat --use-cc and --use-c++ as shell commands #131932

Merged
merged 5 commits into from
Mar 24, 2025

Conversation

rafl
Copy link
Contributor

@rafl rafl commented Mar 18, 2025

So that things like --use-cc="ccache gcc" work.

Fixes #26594.

Also use the slightly simpler shellwords instead of quotewords.

Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:static analyzer labels Mar 18, 2025
@llvmbot
Copy link
Member

llvmbot commented Mar 18, 2025

@llvm/pr-subscribers-clang-static-analyzer-1

@llvm/pr-subscribers-clang

Author: Florian Ragwitz (rafl)

Changes

So that things like --use-cc="ccache gcc" work.

Fixes #26594.

Also use the slightly simpler shellwords instead of quotewords.

Cc-ing the Clang static analyzer maintainers @haoNoQ, @Xazax-hun, and @steakhal.


Full diff: https://github.com/llvm/llvm-project/pull/131932.diff

1 Files Affected:

  • (modified) clang/tools/scan-build/libexec/ccc-analyzer (+6-5)
diff --git a/clang/tools/scan-build/libexec/ccc-analyzer b/clang/tools/scan-build/libexec/ccc-analyzer
index 74f812aef8fdf..655ded4b102be 100755
--- a/clang/tools/scan-build/libexec/ccc-analyzer
+++ b/clang/tools/scan-build/libexec/ccc-analyzer
@@ -63,6 +63,7 @@ sub SearchInPath {
 }
 
 my $Compiler;
+my @CompilerArgs;
 my $Clang;
 my $DefaultCCompiler;
 my $DefaultCXXCompiler;
@@ -89,7 +90,7 @@ if (`uname -s` =~ m/Darwin/) {
 }
 
 if ($FindBin::Script =~ /c\+\+-analyzer/) {
-  $Compiler = $ENV{'CCC_CXX'};
+  ($Compiler, @CompilerArgs) = shellwords($ENV{'CCC_CXX'});
   if (!defined $Compiler || (! -x $Compiler && ! SearchInPath($Compiler))) { $Compiler = $DefaultCXXCompiler; }
 
   $Clang = $ENV{'CLANG_CXX'};
@@ -98,7 +99,7 @@ if ($FindBin::Script =~ /c\+\+-analyzer/) {
   $IsCXX = 1
 }
 else {
-  $Compiler = $ENV{'CCC_CC'};
+  ($Compiler, @CompilerArgs) = shellwords($ENV{'CCC_CC'});
   if (!defined $Compiler || (! -x $Compiler && ! SearchInPath($Compiler))) { $Compiler = $DefaultCCompiler; }
 
   $Clang = $ENV{'CLANG'};
@@ -199,7 +200,7 @@ sub GetCCArgs {
   die "could not find clang line\n" if (!defined $line);
   # Strip leading and trailing whitespace characters.
   $line =~ s/^\s+|\s+$//g;
-  my @items = quotewords('\s+', 0, $line);
+  my @items = shellwords($line);
   my $cmd = shift @items;
   die "cannot find 'clang' in 'clang' command\n" if (!($cmd =~ /clang/ || basename($cmd) =~ /llvm/));
   # If this is the llvm-driver the internal command will look like "llvm clang ...".
@@ -462,9 +463,9 @@ my $Output;
 my %Uniqued;
 
 # Forward arguments to gcc.
-my $Status = system($Compiler,@ARGV);
+my $Status = system($Compiler,@CompilerArgs,@ARGV);
 if (defined $ENV{'CCC_ANALYZER_LOG'}) {
-  print STDERR "$Compiler @ARGV\n";
+  print STDERR "$Compiler @CompilerArgs @ARGV\n";
 }
 if ($Status) { exit($Status >> 8); }
 

@balazs-benics-sonarsource
Copy link
Contributor

Hey, thanks for coming by. Disclaimer: I'm not familiar with Perl.

When I looked at this patch and I had the impression that CCC_CXX is basically what one should use as CXX or CMAKE_CXX_COMPILER, which is just a path to a binary - without any arguments.
This patch would turn it into a combination of what other tools interpret as CXX and CXX_FLAGS, did I get this right?
This would mean we would deviate from the conventions of other tools, making it less ergonomic.

If we don't currently have a way to pass extra CXX flags, we should consider adding something like that instead of hacking the CCC_CXX option. WDYT?

@rafl
Copy link
Contributor Author

rafl commented Mar 19, 2025

Thanks for your feedback, Balázs!

When I looked at this patch and I had the impression that CCC_CXX is basically what one should use as CXX or CMAKE_CXX_COMPILER, which is just a path to a binary - without any arguments.

The author of the original code this change is modifying seemed to make the same assumption, but I believe this is not true. All of the following treats CC/CXX as a command with potential arguments:

CC="ccache gcc" ./configure
make CC="ccache gcc"
CXX="ccache g++" cmake ...

CMAKE_<LANG>_COMPILER also supports potential arguments:

cmake ... -DCMAKE_C_COMPILER='qcc;--arg1;--arg2'

The fact that all of these tools will consult PATH to resolve the compiler executable when it's not a fully-qualified path further supports that it's a command as opposed to a path to an executable.

This patch would turn it into a combination of what other tools interpret as CXX and CXX_FLAGS, did I get this right? This would mean we would deviate from the conventions of other tools, making it less ergonomic.

Your understanding of what the proposed change does is correct. However, most if not all common build tools I've worked with treat these kinds of options as commands, and that's what users seem to expect as well, as per #26594.

This behaviour is very ergonomic, especially when you're truly swapping out the "compiler", such as when replacing "gcc" with "ccache gcc" or similar, without having to write wrapper executables, which is what's currently required by scan-build in these situations. In many situations it's possible to achieve the same goal using a combination of CC and CFLAGS, but this is often less desirable as it can quickly run into issues with accidentally overwriting CFLAGS from other sources or having to merge multiple CFLAGS correctly.

If we don't currently have a way to pass extra CXX flags, we should consider adding something like that instead of hacking the CCC_CXX option. WDYT?

To me, following the established convention of CC/CXX being commands as opposed to paths to executables seems like the most straightforward approach, but I'm certainly open to alternatives if you see any practical downsides to this.

Copy link
Contributor

@balazs-benics-sonarsource balazs-benics-sonarsource left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the context. It looks good to me now.
@haoNoQ, maybe you know some Perl, could you have a second opinion?

Otherwise, let's merge this in a week.

@rafl
Copy link
Contributor Author

rafl commented Mar 20, 2025

Thanks for having another look @balazs-benics-sonarsource.

While we wait for a potential second opinion, I was curious about what you think of the current behaviour of silently defaulting to $Default{CC,CXX}Compiler if CCC_{CC,CXX} isn't executable and can't be resolved to an executable within PATH. I originally found this surprising when CC="ccache gcc" would be effectively ignored. Would it be worth changing this while we're at it?

Would you have any preference between these options?

  1. keeping it as it is
  2. warning when ignoring an option and falling back to a default
  3. providing an obvious error

My personal preference is option 3. The only downside to this which I see is potentially breaking some existing usage, but it'd do so in an obvious and easy to fix way way. I'm not sure if that's acceptable in the context of this project.

Option 2 would be my second choice, with Option 1 being the least preferable.

@balazs-benics-sonarsource
Copy link
Contributor

I'd prefer option 2, because why else would we have a default compiler if that wasn't used in some workflows.
A warning could never hurt. I'm also flexible on the subject.

rafl added 5 commits March 21, 2025 07:29
So that things like --use-cc="ccache gcc" work.

Fixes llvm#26594.

Also use the slightly simpler shellwords instead of quotewords.
…ction

I want to adjust the logic for CCC_CC and CCC_CXX defaults, so not
repeating that logic twice is convenient.

As a nice side-effect, we end up with fewer globals and less code.

Contains no behavioural changes, except for checking for /usr/bin/xcrun
even if we're not on OSX when there's an -isysroot parameter, which
should be fine. We also don't call uname -s twice.
To avoid user confusion when we're ignoring the provided option and use
a default instead.
If you went out of your way to specify the option, you probably knew
what you wanted and didn't want the default you'd get from not
specifying the option in the first place.

We just let the error from from system() propagate to tell the user that
the option they provided wasn't usable:

  $ scan-build-19 --use-cc="enoent gcc" make
  scan-build: Using '/usr/lib/llvm-19/bin/clang' for static analysis
  ...
  Can't exec "enoent": No such file or directory ...
@rafl
Copy link
Contributor Author

rafl commented Mar 21, 2025

I'd prefer option 2, because why else would we have a default compiler if that wasn't used in some workflows. A warning could never hurt.

--use-cc/--use-c++/CCC_CC/CCC_CXX are optional, so the default would still be used when those options are not specified, which might be the majority of use-cases.

I've added a few additional commits:

  • 6806e42 includes some minor simplifications and deduplicates the logic we're talking about without changing behaviour.
  • e101c97 implements option 2, as that was your preference.
  • b086d3e implements my preferred option 3. I'd be happy to drop this commit if we wanna stick with option 2 instead. The commit message illustrates what users would experience when misconfiguring one of the options.
  • 805396f brings the handling of CLANG/CLANG_CXX in line with CCC_CC/CCC_CXX if we end up going for option 3. We'd also drop this if we go for option 2.

@balazs-benics-sonarsource
Copy link
Contributor

Looks good as it is right now. Thanks for putting the effort into this.
I've invited the rest of the folks probably interested in this to get a second opinion.

Copy link
Contributor

@NagyDonat NagyDonat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I'm satisfied with the direction of the commit and I agree that explicitly specifying an invalid compiler should be a hard error. (I could also accept a warning as long as it's loud and visible, but the current "silently use the default" behavior is terrible and must be replaced.)


$IsCXX = 0
sub DetermineClang {
my ($is_cxx) = @_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bikeshedding: Please use $UpperCamelCase within the new functions DetermineCompiler / DetermineClang because it's a bit jarring when $IsCXX / $is_cxx appears both in $UpperCamelCase and in $snake_case. I see that these styles both appear in the script, but in these parts $UpperCamelCase is more common, so I think it would be better if you used that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm with you on the mix of different styles in this code being far from ideal. I did briefly consider cleaning those things up a bit, but decided to not extend the scope of this changeset further and to perhaps save those kinds of changes for a separate pull-request if there was any interest in that.

Here's how I arrived at the names currently used in this pull-request:

  • Functions
    Almost all existing functions use UpperCamelCase, so I stuck with that even though it's not at all typical for the majority of modern(-ish) Perl code

  • Variables
    These were also quite inconsistent, but I stuck with common modern(-ish) practices, where most regular lexical variables use $snake_case, constants use $ALL_CAPS, and "globals" use $UpperCamelCase as per https://metacpan.org/dist/perl/view/pod/perlstyle.pod, the old Perl Best Practices book, and established community conventions.

    I'd find it much more jarring if a function was to potentially shadow a variable which reads like a $Global from a surrounding scope.

I don't feel particularly strongly about it, and would appreciate your input on the following suggestions for how to best move forwards:

  1. Just call the parameter $IsCXX in the functions as suggested.
    This defies the expectations of many Perl programmers when it comes to naming, but so does most of the rest of the program.
  2. Keep things as they are.
    This is consistent with modern practices.
  3. Call the local $is_cxx variable something else (but not $IsCXX), or rename the "global" $IsCXX.
    To not potentially confuse people unfamiliar with Perl or its current common practices.
  4. Just use the "global" $IsCXX directly.
    Reducing some of the global state to make it slightly easier to reason about those individual functions seemed nice, but given how much of it there is anyway, maybe it doesn't even matter in this context.

Thanks for your review and your feedback on the above!

Copy link
Contributor

@NagyDonat NagyDonat Mar 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thanks for the clarifications. I didn't know about these Perl naming conventions (last time I touched Perl was 10+ years ago), but now that you explained them, I agree that it's reasonable to follow them and keep the code as it is currently. (The $is_cxx / $IsCXX difference is a bit strange for the untrained eye, but it won't cause confusion or misunderstandings.)

@balazs-benics-sonarsource balazs-benics-sonarsource merged commit cc5f829 into llvm:main Mar 24, 2025
9 of 11 checks passed
Copy link

@rafl Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!

}

$AnalyzerTarget = $ENV{'CLANG_ANALYZER_TARGET'};
my $IsCXX = $FindBin::Script =~ /c\+\+-analyzer/;
my ($Compiler, @CompilerArgs) = DetermineCompiler($IsCXX);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @rafl .
I'm not familiar with Perl. Just curious, how does it parse, for example, ccache gcc? Would $Compiler hold ccache and gcc becomes one of the @CompilerArgs?

Copy link
Contributor Author

@rafl rafl Mar 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ziqingluo-90 that's exactly right. DetermineCompiler returns a list of values (potentially a singleton list), and the call-site unpacks that return value into the first list element $Compiler and the rest of the list @CompilerArgs (which might be empty).

The actual parsing is done using shellwords from Text::ParseWords.

If you'd like to experiment with how it parses certain inputs, you can do something like this, which prints out the parsed list, one item per line:

$ CCC_CC='foo "bar baz" \"quux' perl -MText::ParseWords -le'print for shellwords $ENV{CCC_CC}'
foo
bar baz
"quux

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:static analyzer clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ccc-analyzer does not accept CCC_CC/CCC_CXX with quotes
5 participants