Support non-self-contained files #45

kadircet · 2019-05-13T16:53:16Z

Currently clangd doesn't work with non-self-contained files, even if people have a way to build their projects with those files, for example by merging some/all of non-self-contained files into a single TU and building that file.

bstaletic · 2019-05-14T18:41:38Z

For posterity, ycmd/libclang allows this by allowing users to return something like { 'flags': list_of_flags, 'override_filename': some_file_name } from the extra confs.

That makes the TU main file be the value of override_filename, but completion is still requested for the original filename. As for the libclang API, this means:

clang_parseTranslationUnit2FullArgv gets passed the value of override_filename
clang_codeCompleteAt gets passed the original filename.

Barricade · 2019-11-07T23:46:52Z

Really waiting for this feature to release!

bstaletic · 2020-02-26T14:52:10Z

For the record, YCM has just heard from another users who needed override_filename. This feature can also be supported for users who use compile_commands.json, if the comp db specification can be extended.

[
  {
    "directory": "/home/bstaletic/test/",
    "command": [ "/usr/bin/c++", "-c", "bar.cpp" ],
    "file": "/home/bstaletic/test/bar.cpp",
    "override_filename": "/home/bstaletic/test/foo.cpp"
  }
]

In the above example:

directory is the same as it always was.
command is the same - contains the actual command line.
file would contain the file containing the cursor.
override_filename would contain the TU to be parsed

This is just an idea that, frankly, hasn't been given much thought and it doesn't address the problem of wiring this throughout the clangd code.

Considering that the latest cmake has implemented support for unity builds and that the comp db generation is broken when used with unity builds, override_filename should probably be considered again.

n00bmind · 2020-05-01T08:54:35Z

+1000 to this. I think unity builds are gaining more and more traction, and this would massively help a lot of people.

intractabilis · 2020-06-08T04:17:15Z

Documentation to llvm 8 said that to achieve what people ask in this bug, one needs to use clangd-indexer. These lines disappeared from llvm 10 documentation, it just says that clangd-indexer is for developers.

Why this feature is so complicated?

HighCommander4 · 2020-06-08T04:47:12Z

Documentation to llvm 8 said that to achieve what people ask in this bug, one needs to use clangd-indexer. These lines disappeared from llvm 10 documentation, it just says that clangd-indexer is for developers.

AFAIK, clangd-indexer has never supported the use cases described in this bug. The "project-wide view" mentioned in the linked doc just refers to e.g. being able to perform Find References and get results in other translation units.

Mention of clangd-indexer has been removed because that functionality is now built into clangd via the background indexing feature.

sam-mccall · 2020-06-08T09:45:52Z

Yes, what Nathan said.

Why this feature is so complicated?

A few main reasons:

clangd relies heavily on the preamble optimization: splitting the file in two parts which can be built separately. This is essential for performance (10-1000x speedup on common operations depending on the codebase) and greatly complicates the compilation model. Cross-file preambles will complicate it further.
preamble building is a part of clang, it's complicated and entangled with other features, none of us understand it well enough that making changes is straightforward.
clangd relies in many places on clang's notion of the main file, untangling the concepts of "interesting file" and "entrypoint" is a lot of work, and we don't have great ideas for isolating the complexity
(dons flame-retardant suit) Self-contained files are good software engineering practice (and don't prevent unity builds AFAIK). Most clangd developers are paid by employers who enforce this practice - in part to make tools easier to write. We do spend lots of time on things to make clangd more well-rounded in general even outside what $employer cares about directly, but it affects prioritization for sure, especially for features like this that are probably many months of work.

intractabilis · 2020-06-08T14:09:34Z

Self-contained files are good software engineering practice

Maybe, but I can't rewrite all C++ libraries. Jumping to definitions is most useful when exploring unfamiliar external code. I have no influence to force all the authors to implement good practices.

intractabilis · 2020-06-08T14:17:38Z

Wait, Nathan? Eclipse CDT supporter, who suggested to use clangd instead of CDT's cpp-indexer? You actually work on clangd? :)

bstaletic · 2020-06-08T15:50:50Z

(and don't prevent unity builds AFAIK)

They kinda do. If your entire codebase is "shoved" into a single TU, your compile_commands.json contains only that single entry which may or may not work for other directories in that codebase.

I've heard that people work around this problem by running cmake twice (now that cmake actually has built-in support for unity builds). Once with unity off, to get a working compile_commands.json and once with unity on, to compile the project. This is of no use to those who don't use cmake.

HighCommander4 · 2020-06-08T15:59:23Z

Wait, Nathan? Eclipse CDT supporter, who suggested to use clangd instead of CDT's cpp-indexer? You actually work on clangd? :)

I contribute to it, yeah :) I switched from Eclipse CDT about a year ago.

HighCommander4 · 2020-06-08T16:10:23Z

Self-contained files are good software engineering practice (and don't prevent unity builds AFAIK).

Clangd does not currently work for unity builds (see e.g. #147 (comment)). Perhaps you mean support for them could be added more easily than by solving the challenging issues you described?

Most clangd developers are paid by employers who enforce this practice

To be fair, LLVM itself does use non-self contained files at times. I'm thinking of things like #include "clang/Basic/TokenKinds.def".

sam-mccall · 2020-06-08T16:25:40Z

Clangd does not currently work for unity builds (see e.g. #147 (comment)).

Right, but if you have self-contained files, you don't have to always do a unity build with them. Wouldn't you only want to do the unity build as a final heavily-optimized, slow-building, non-incremental release build, but use a regular build for development?

To be fair, LLVM itself does use non-self contained files at times

LLVM does lots of questionable things :-) Tablegen .def format falls more into the "stupid pp tricks" bucket than sensible C++ code we're just having trouble parsing, though.

puremourning · 2020-06-08T16:29:10Z

but use a regular build for development?

one of the main purposes of jumbo TU is to reduce compile time where it's dominated by IO (particularly includes, etc.). At work i benchmarked speedup comparable to the preamble trick on compiling a libarary as a single jumbo vs -j on a 40 core system. in this case, dominating factor was the SAN

HighCommander4 · 2020-06-08T16:33:39Z

Wouldn't you only want to do the unity build as a final heavily-optimized, slow-building, non-incremental release build, but use a regular build for development?

No, some projects use unity builds for build performance (since it means common includes only have to be parsed once per unified-TU rather than once per individual-TU).

Some projects take this a step further and only support unity builds, since non-unity builds can easily break due to include-what-you-use violations if devs are doing unity builds locally, and the project may not have the CI resources to test both configurations. Such projects don't even have the option of doing a non-unity build specifically for clangd's consumption.

intractabilis · 2020-06-08T17:44:11Z

What is unity build?

intractabilis · 2020-06-08T17:47:41Z

I switched from Eclipse CDT about a year ago

I am switching now. Especially after C++20 CDT became unbearable.

HighCommander4 · 2020-06-08T18:19:10Z

What is unity build?

It's a technique where groups of source files are combined (by e.g. #include-ing several .cpp files into one file), and the resulting files are compiled, rather than the individual source files. It has several advantages, such as common includes only having to be parsed once per combined source file, more inlining / inter-procedural optimization opportunities, etc. (And also downsides, such as a greater potential to introduce undetected include-what-you-use violations, and incremental builds potentially being slower.)

intractabilis · 2020-06-08T18:39:51Z

It's a technique where groups of source files are combined (by e.g. #include-ing several .cpp files into one file), and the resulting files are compiled, rather than the individual source files.

Wow. Never've heard of this technique.

abpostelnicu · 2020-07-03T15:13:38Z

@sam-mccall so just to understand this correctly, for the moment this is very hard to implement because of #45 (comment). Does what you said earlier also apply in an unified-build environment?

Trass3r · 2020-10-11T21:05:28Z

(and don't prevent unity builds AFAIK)

They kinda do. If your entire codebase is "shoved" into a single TU, your compile_commands.json contains only that single entry which may or may not work for other directories in that codebase.

I've heard that people work around this problem by running cmake twice (now that cmake actually has built-in support for unity builds). Once with unity off, to get a working compile_commands.json and once with unity on, to compile the project. This is of no use to those who don't use cmake.

Vote for https://gitlab.kitware.com/cmake/cmake/-/issues/19826

puremourning · 2022-02-01T21:44:20Z

it needs to not add too much complexity, or at least isolate the complexity to a few places. "Priorities and effort" also means this can't become a maintenance timesink.

Ycmd's implementation was not a timesink at all. In fact, no maintenance was needed after the initial PR. Our implementation does mean that needed a path to the main file now needs two paths - file path to be parsed and a file path of the cursor location.

To confirm: the ycmd implementation to support this took me less than an hour. (apropos nothing; ycmd libclang engine is an invisible fraction of complexity compared with clangd)

Also added includes in *.ipp files to enable jump-to-declaration in IDEs. This sadly results in clangd "main file cannot be included recursively" error annotations in Qt Creator, probably due to clangd/clangd#45 . The definitions in inline-only ipp files will be moved to their corresponding headers so that at least the clangd error annotations can go away in CPPWAMP_COMPILED_LIB mode. Fixes #109

johnhubbard · 2022-09-04T20:36:26Z

I just ran into this problem while trying to look at things like wake_up_var() in today's Linux kernel. This symbol is in a file (wait_bit.c) that is included from within another file (build_utility.c), and so VS Code + clangd cannot browse any of the symbols within wait_bit.c.

This is a serious limitation. If you look at the top of build_utility.c (and also look through "git blame build_utility.c"), you'll notice that this is an intentional, recent effort to reduce build times. And that means that my chances of changing this are basically zero--people want it this way.

So now I'm reduced to maintaining local hacks in order to browse a big chunk of the sched unit in the Linux kernel, with a fair chance that the damage will continue to spread, because this is apparently a popular way to speed up builds.

This is an extremely bad situation that will drive all of us back to horrible older techniques, unless clangd loses the "this is unholy and impure and bad software engineering, so we are not interested" attitude. Please reconsider that, because Real Projects are messy and impure!

HighCommander4 · 2022-09-04T23:01:58Z

unless clangd loses the "this is unholy and impure and bad software engineering, so we are not interested" attitude. Please reconsider that, because Real Projects are messy and impure!

I believe it's been clarified in later comments (particularly this one) that the project is open to contributions to improve support for non-self-contained files.

It's just that there are non-trivial technical challenges to overcome (as detailed in this comment and this one), and someone invested in improving support for these scenarios will need to put in the legwork to solve them.

johnhubbard · 2022-09-05T00:42:31Z

Very good. In that case, I'll tell my compiler engineering colleagues about this, and see if it sparks any interest.

xulongzhe · 2023-02-16T13:09:34Z

nearly 4 years past, is there any body work on it? we have lots of legacy code write as non-self-contained style, looking forward this feature, so we can migrate to vscode from VS or CLion as our main IDE. Vscode + Clangd is so good for daily deploping, is three any plan?

mlabbe · 2023-05-11T19:08:04Z

Since January I have successfully worked around this with a method I have not seen proposed anywhere else, as outlined in this blog post which I wrote at the time. https://www.frogtoss.com/labs/clangd-with-unity-builds.html

simon8233 · 2023-08-31T03:45:58Z

another workaround is -include="xxx.h" in compile_commands or clangd file

detly · 2023-11-22T07:15:55Z

Just to add a data point to this that is not about unity builds: I have seen and used this style for DSP and embedded projects, where

there is a C module exposing a small number of API functions in module_a/include/module_a.h
module_a/src/module_a.c contains those function with external linkage, but also extra helper functions with internal linkage (ie. static functions)
we want to unit test the helper functions, because they're sufficiently complex to warrant it

I mention DSP and embedded as a context because it relates to point #2. It seems like one solution to this would be to have the helper functions have external linkage, and have separate private header files for them (or just put them in a separate module). Unit tests could then use that private module/header's API.

But this means that the functions have to appear in the symbol table, and can't be inlined, or can be inlined but need to be duplicated as non-inlined versions, even though they're not meant to be public. It's bloat and pessimisation we want to avoid for DSP/embedded applications.

Another solution that avoids this is to consider that the unit tests for that module form a kind of single-TU project (or subproject), where only the main function (or top-level test runner function) is exposed, and it has the same set of static "helper" functions as the real project. And so your unit tests include the implementation file (module_a/src/module_a.c):

// module_a/test/test_module_a.c
#include "unit_test_harness_whatever.h"
#include "test_module_a.h"
// Note include of implementation file!
#include "module_a/src/module_a.c"

static void test_module_a_helper_x_but_not_y(void) { /* ... */ }

static void test_module_a_helper_annoying_corner_case(void) { /* ... */ }

void test_module_a_all(void) {
    RUN_TEST(test_module_a_helper_x_but_not_y);
    RUN_TEST(test_module_a_helper_annoying_corner_case);
}

Yes it's niche, yes it looks jarring and unidiomatic at first, but it's valid, it makes sense and works in these contexts, and I've seen it in real, professional, high-quality C code bases. (It also tests that the implementation file is appropriately self-contained!)

There are other solutions to this as well, eg. #define your linkage keyword to be static for library builds but empty for unit test builds. But that has drawbacks, and the benefits of support from a tool that hadn't been invented yet couldn't really be factored in to the decision (and probably aren't motivating enough to do retroactively).

puremourning · 2023-11-22T07:37:45Z

Well yes. This fundamentally applies to any TU which is composed of multiple files (including headers, c, cpp, inc, etc). Clangd models TU=file and as the others have said it's a big refactor to fix it.

detly · 2023-11-22T08:54:40Z

Well yes. This fundamentally applies to any TU which is composed of multiple files

You're not wrong, I just wanted to provide another data point to help avoid some kind of unity-build-specific solution (I have no idea how that would arise, but you never know).

sliedes · 2024-05-08T09:41:24Z

clangd relies heavily on the preamble optimization: splitting the file in two parts which can be built separately. This is essential for performance (10-1000x speedup on common operations depending on the codebase) and greatly complicates the compilation model. Cross-file preambles will complicate it further.

Ok, I'm trying to understand the problem a bit more deeply. Is this a fair restatement of what you said above?:

Let's say we have CPython, which is a bit of a mess of non-self-contained headers, but basically the external interface is that anyone using it is supposed to #include <Python.h>. Probably that header is approximately the only header in the project that can be assumed to be self-contained.

While it would be possible to devise a mechanism to allow the user to say "do not even try to parse this header as a TU, parse Python.h instead"; but this would necessarily result in abysmal performance at least when someone edits one of those headers, as most of the entire Python.h needs to be reparsed. (For users, of course, I'd assume that they don't edit the headers often. For CPython developers, they might.)

Or would it even affect performance for people just using <Python.h> without going deeper into the private headers?

preamble building is a part of clang, it's complicated and entangled with other features, none of us understand it well enough that making changes is straightforward.

clangd relies in many places on clang's notion of the main file, untangling the concepts of "interesting file" and "entrypoint" is a lot of work, and we don't have great ideas for isolating the complexity

Is this more of a problem of "where do we get that information"? I noticed that there's support for IWYU pragmas for specifying things like "this header is private; if you want its symbols, always include header X instead". That seems very much the same kind of configuration data that would be needed by this, but I think it's currently only used for flagging non-direct includes and automatically adding includes.

Or is it something more nuanced?

(dons flame-retardant suit) Self-contained files are good software engineering practice (and don't prevent unity builds AFAIK). Most clangd developers are paid by employers who enforce this practice - in part to make tools easier to write. We do spend lots of time on things to make clangd more well-rounded in general even outside what $employer cares about directly, but it affects prioritization for sure, especially for features like this that are probably many months of work.

No arguments here. I'm a big enough fan of self-contained headers and even C++ modules that I'd generally be willing to spend a non-trivial amount of time to clean up things. With projects like CPython, I suspect selling a major header file refactoring to CPython upstream might be an uphill battle, even if they see the value in good IDE support...

sliedes · 2024-05-08T09:58:44Z

FWIW my best guess is:

we'd want to build a preamble for the entrypoint file and then parse everything from there. This involves eliminating from many places the assumptions that a) the preamble is part of the file we care about, b) the file we care about is the "main file" as far as SourceManager is concerned.

a significant challenge with this solution is the performance where the fragment file has #includes which are not part of the entrypoint file's preamble

the include graph in the background index or the includer cache in TUScheduler are a reasonable initial step for identifying entrypoints but we'll eventually want something better built by fast include-scanning.

I now see that some of my previous comment's questions were addressed in this comment, which I somehow missed on the first skim... Sorry about the noise!

jvikstrom mentioned this issue Jul 31, 2019

Clangd crashes due to toHalfOpenFileRange assertion #113

Closed

kadircet mentioned this issue Sep 9, 2019

Clangd does not resolve include, in spite of correct include path being present in compile command #147

Closed

sam-mccall added the enhancement New feature or request label Nov 28, 2019

kadircet mentioned this issue Jan 20, 2020

Inserting #include "config.h" into the top of all parsed header files? #257

Closed

konradsa mentioned this issue Jan 22, 2020

Issue with template implemention (tcc) files #262

Open

sam-mccall mentioned this issue Apr 14, 2020

Header-only library recursively including main file #337

Open

kadircet mentioned this issue May 26, 2020

Base highlighting and code navigation on all parsed files #402

Closed

kadircet mentioned this issue Jul 29, 2020

Code completion does not list macro after perform go-to-def on macro. #478

Open

HighCommander4 mentioned this issue Oct 11, 2020

Go to definition fails if *.cpp files are included clangd/vscode-clangd#83

Open

This was referenced Dec 20, 2021

Clangd shows errors when opening a standard library implementation header #136

Closed

Error extraneous_closing_brace in includes #200

Closed

HighCommander4 mentioned this issue Dec 20, 2021

Customize background index storage location #184

Open

zhwufd mentioned this issue Jan 10, 2022

clangd 13 crash when background indexing source code in xgboost project #983

Open

matta mentioned this issue Feb 8, 2022

The extractor sometimes picks a non-package-local .cc file's command for header files hedronvision/bazel-compile-commands-extractor#26

Closed

HighCommander4 mentioned this issue Apr 10, 2022

clangd is inferring the wrong source file from a unity build source file #1098

Closed

boxfire mentioned this issue Aug 3, 2022

Support intermediate build commands #1238

Open

HighCommander4 mentioned this issue Aug 29, 2022

clangd cannot infer that a header file was included in another header file (JUCE) #1268

Closed

IlVirtuoso mentioned this issue Jan 4, 2023

clangd crashes on delete[] operator (cuda project) #1452

Open

HighCommander4 mentioned this issue Mar 13, 2023

introduce the ability to browse translation units & inclusions instead of context-free header files #1534

Open

caxapexac mentioned this issue May 23, 2023

Crash on several project files on CLion #1645

Closed

HighCommander4 mentioned this issue Aug 13, 2023

Headers parsing behaviour depends on files opening order #1720

Open

sr-tream mentioned this issue Nov 16, 2023

[clangd] Add includes from source to non-self-contained headers llvm/llvm-project#72479

Closed

HighCommander4 mentioned this issue Dec 27, 2023

qemu clangd error #1880

Closed

mesbahamin mentioned this issue Feb 27, 2024

False Diagnostic: redefinition_different_kind in header file #1949

Open

This was referenced May 11, 2024

clangd do not recognise headers located in pch #2034

Closed

headers from pch are not recognized inside project #2040

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support non-self-contained files #45

Support non-self-contained files #45

kadircet commented May 13, 2019

bstaletic commented May 14, 2019

Barricade commented Nov 7, 2019

bstaletic commented Feb 26, 2020 •

edited

Loading

n00bmind commented May 1, 2020

intractabilis commented Jun 8, 2020

HighCommander4 commented Jun 8, 2020

sam-mccall commented Jun 8, 2020 •

edited

Loading

intractabilis commented Jun 8, 2020

intractabilis commented Jun 8, 2020

bstaletic commented Jun 8, 2020

HighCommander4 commented Jun 8, 2020

HighCommander4 commented Jun 8, 2020

sam-mccall commented Jun 8, 2020

puremourning commented Jun 8, 2020 •

edited

Loading

HighCommander4 commented Jun 8, 2020

intractabilis commented Jun 8, 2020

intractabilis commented Jun 8, 2020

HighCommander4 commented Jun 8, 2020 •

edited

Loading

intractabilis commented Jun 8, 2020

abpostelnicu commented Jul 3, 2020 •

edited

Loading

Trass3r commented Oct 11, 2020 •

edited

Loading

puremourning commented Feb 1, 2022

johnhubbard commented Sep 4, 2022 •

edited

Loading

HighCommander4 commented Sep 4, 2022

johnhubbard commented Sep 5, 2022

xulongzhe commented Feb 16, 2023 •

edited

Loading

mlabbe commented May 11, 2023

simon8233 commented Aug 31, 2023

detly commented Nov 22, 2023 •

edited

Loading

puremourning commented Nov 22, 2023

detly commented Nov 22, 2023

sliedes commented May 8, 2024

sliedes commented May 8, 2024 •

edited

Loading

Support non-self-contained files #45

Support non-self-contained files #45

Comments

kadircet commented May 13, 2019

bstaletic commented May 14, 2019

Barricade commented Nov 7, 2019

bstaletic commented Feb 26, 2020 • edited Loading

n00bmind commented May 1, 2020

intractabilis commented Jun 8, 2020

HighCommander4 commented Jun 8, 2020

sam-mccall commented Jun 8, 2020 • edited Loading

intractabilis commented Jun 8, 2020

intractabilis commented Jun 8, 2020

bstaletic commented Jun 8, 2020

HighCommander4 commented Jun 8, 2020

HighCommander4 commented Jun 8, 2020

sam-mccall commented Jun 8, 2020

puremourning commented Jun 8, 2020 • edited Loading

HighCommander4 commented Jun 8, 2020

intractabilis commented Jun 8, 2020

intractabilis commented Jun 8, 2020

HighCommander4 commented Jun 8, 2020 • edited Loading

intractabilis commented Jun 8, 2020

abpostelnicu commented Jul 3, 2020 • edited Loading

Trass3r commented Oct 11, 2020 • edited Loading

puremourning commented Feb 1, 2022

johnhubbard commented Sep 4, 2022 • edited Loading

HighCommander4 commented Sep 4, 2022

johnhubbard commented Sep 5, 2022

xulongzhe commented Feb 16, 2023 • edited Loading

mlabbe commented May 11, 2023

simon8233 commented Aug 31, 2023

detly commented Nov 22, 2023 • edited Loading

puremourning commented Nov 22, 2023

detly commented Nov 22, 2023

sliedes commented May 8, 2024

sliedes commented May 8, 2024 • edited Loading

bstaletic commented Feb 26, 2020 •

edited

Loading

sam-mccall commented Jun 8, 2020 •

edited

Loading

puremourning commented Jun 8, 2020 •

edited

Loading

HighCommander4 commented Jun 8, 2020 •

edited

Loading

abpostelnicu commented Jul 3, 2020 •

edited

Loading

Trass3r commented Oct 11, 2020 •

edited

Loading

johnhubbard commented Sep 4, 2022 •

edited

Loading

xulongzhe commented Feb 16, 2023 •

edited

Loading

detly commented Nov 22, 2023 •

edited

Loading

sliedes commented May 8, 2024 •

edited

Loading