Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support non-self-contained files #45

Open
kadircet opened this issue May 13, 2019 · 47 comments
Open

Support non-self-contained files #45

kadircet opened this issue May 13, 2019 · 47 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@kadircet
Copy link
Member

Currently clangd doesn't work with non-self-contained files, even if people have a way to build their projects with those files, for example by merging some/all of non-self-contained files into a single TU and building that file.

@bstaletic
Copy link

For posterity, ycmd/libclang allows this by allowing users to return something like { 'flags': list_of_flags, 'override_filename': some_file_name } from the extra confs.

That makes the TU main file be the value of override_filename, but completion is still requested for the original filename. As for the libclang API, this means:

  • clang_parseTranslationUnit2FullArgv gets passed the value of override_filename
  • clang_codeCompleteAt gets passed the original filename.

@Barricade
Copy link

Really waiting for this feature to release!

@bstaletic
Copy link

bstaletic commented Feb 26, 2020

For the record, YCM has just heard from another users who needed override_filename. This feature can also be supported for users who use compile_commands.json, if the comp db specification can be extended.

[
  {
    "directory": "/home/bstaletic/test/",
    "command": [ "/usr/bin/c++", "-c", "bar.cpp" ],
    "file": "/home/bstaletic/test/bar.cpp",
    "override_filename": "/home/bstaletic/test/foo.cpp"
  }
]

In the above example:

  • directory is the same as it always was.
  • command is the same - contains the actual command line.
  • file would contain the file containing the cursor.
  • override_filename would contain the TU to be parsed

This is just an idea that, frankly, hasn't been given much thought and it doesn't address the problem of wiring this throughout the clangd code.

Considering that the latest cmake has implemented support for unity builds and that the comp db generation is broken when used with unity builds, override_filename should probably be considered again.

@n00bmind
Copy link

n00bmind commented May 1, 2020

+1000 to this. I think unity builds are gaining more and more traction, and this would massively help a lot of people.

@intractabilis
Copy link

Documentation to llvm 8 said that to achieve what people ask in this bug, one needs to use clangd-indexer. These lines disappeared from llvm 10 documentation, it just says that clangd-indexer is for developers.

Why this feature is so complicated?

@HighCommander4
Copy link

Documentation to llvm 8 said that to achieve what people ask in this bug, one needs to use clangd-indexer. These lines disappeared from llvm 10 documentation, it just says that clangd-indexer is for developers.

AFAIK, clangd-indexer has never supported the use cases described in this bug. The "project-wide view" mentioned in the linked doc just refers to e.g. being able to perform Find References and get results in other translation units.

Mention of clangd-indexer has been removed because that functionality is now built into clangd via the background indexing feature.

@sam-mccall
Copy link
Member

sam-mccall commented Jun 8, 2020

Yes, what Nathan said.

Why this feature is so complicated?

A few main reasons:

  • clangd relies heavily on the preamble optimization: splitting the file in two parts which can be built separately. This is essential for performance (10-1000x speedup on common operations depending on the codebase) and greatly complicates the compilation model. Cross-file preambles will complicate it further.
  • preamble building is a part of clang, it's complicated and entangled with other features, none of us understand it well enough that making changes is straightforward.
  • clangd relies in many places on clang's notion of the main file, untangling the concepts of "interesting file" and "entrypoint" is a lot of work, and we don't have great ideas for isolating the complexity
  • (dons flame-retardant suit) Self-contained files are good software engineering practice (and don't prevent unity builds AFAIK). Most clangd developers are paid by employers who enforce this practice - in part to make tools easier to write. We do spend lots of time on things to make clangd more well-rounded in general even outside what $employer cares about directly, but it affects prioritization for sure, especially for features like this that are probably many months of work.

@intractabilis
Copy link

Self-contained files are good software engineering practice

Maybe, but I can't rewrite all C++ libraries. Jumping to definitions is most useful when exploring unfamiliar external code. I have no influence to force all the authors to implement good practices.

@intractabilis
Copy link

Wait, Nathan? Eclipse CDT supporter, who suggested to use clangd instead of CDT's cpp-indexer? You actually work on clangd? :)

@bstaletic
Copy link

(and don't prevent unity builds AFAIK)

They kinda do. If your entire codebase is "shoved" into a single TU, your compile_commands.json contains only that single entry which may or may not work for other directories in that codebase.

I've heard that people work around this problem by running cmake twice (now that cmake actually has built-in support for unity builds). Once with unity off, to get a working compile_commands.json and once with unity on, to compile the project. This is of no use to those who don't use cmake.

@HighCommander4
Copy link

Wait, Nathan? Eclipse CDT supporter, who suggested to use clangd instead of CDT's cpp-indexer? You actually work on clangd? :)

I contribute to it, yeah :) I switched from Eclipse CDT about a year ago.

@HighCommander4
Copy link

Self-contained files are good software engineering practice (and don't prevent unity builds AFAIK).

Clangd does not currently work for unity builds (see e.g. #147 (comment)). Perhaps you mean support for them could be added more easily than by solving the challenging issues you described?

Most clangd developers are paid by employers who enforce this practice

To be fair, LLVM itself does use non-self contained files at times. I'm thinking of things like #include "clang/Basic/TokenKinds.def".

@sam-mccall
Copy link
Member

Clangd does not currently work for unity builds (see e.g. #147 (comment)).

Right, but if you have self-contained files, you don't have to always do a unity build with them. Wouldn't you only want to do the unity build as a final heavily-optimized, slow-building, non-incremental release build, but use a regular build for development?

To be fair, LLVM itself does use non-self contained files at times

LLVM does lots of questionable things :-) Tablegen .def format falls more into the "stupid pp tricks" bucket than sensible C++ code we're just having trouble parsing, though.

@puremourning
Copy link

puremourning commented Jun 8, 2020

but use a regular build for development?

one of the main purposes of jumbo TU is to reduce compile time where it's dominated by IO (particularly includes, etc.). At work i benchmarked speedup comparable to the preamble trick on compiling a libarary as a single jumbo vs -j on a 40 core system. in this case, dominating factor was the SAN

@HighCommander4
Copy link

Wouldn't you only want to do the unity build as a final heavily-optimized, slow-building, non-incremental release build, but use a regular build for development?

No, some projects use unity builds for build performance (since it means common includes only have to be parsed once per unified-TU rather than once per individual-TU).

Some projects take this a step further and only support unity builds, since non-unity builds can easily break due to include-what-you-use violations if devs are doing unity builds locally, and the project may not have the CI resources to test both configurations. Such projects don't even have the option of doing a non-unity build specifically for clangd's consumption.

@intractabilis
Copy link

What is unity build?

@intractabilis
Copy link

I switched from Eclipse CDT about a year ago

I am switching now. Especially after C++20 CDT became unbearable.

@HighCommander4
Copy link

HighCommander4 commented Jun 8, 2020

What is unity build?

It's a technique where groups of source files are combined (by e.g. #include-ing several .cpp files into one file), and the resulting files are compiled, rather than the individual source files. It has several advantages, such as common includes only having to be parsed once per combined source file, more inlining / inter-procedural optimization opportunities, etc. (And also downsides, such as a greater potential to introduce undetected include-what-you-use violations, and incremental builds potentially being slower.)

@intractabilis
Copy link

It's a technique where groups of source files are combined (by e.g. #include-ing several .cpp files into one file), and the resulting files are compiled, rather than the individual source files.

Wow. Never've heard of this technique.

@abpostelnicu
Copy link

abpostelnicu commented Jul 3, 2020

@sam-mccall so just to understand this correctly, for the moment this is very hard to implement because of #45 (comment). Does what you said earlier also apply in an unified-build environment?

@Trass3r
Copy link

Trass3r commented Oct 11, 2020

(and don't prevent unity builds AFAIK)

They kinda do. If your entire codebase is "shoved" into a single TU, your compile_commands.json contains only that single entry which may or may not work for other directories in that codebase.

I've heard that people work around this problem by running cmake twice (now that cmake actually has built-in support for unity builds). Once with unity off, to get a working compile_commands.json and once with unity on, to compile the project. This is of no use to those who don't use cmake.

Vote for https://gitlab.kitware.com/cmake/cmake/-/issues/19826

@puremourning
Copy link

it needs to not add too much complexity, or at least isolate the complexity to a few places. "Priorities and effort" also means this can't become a maintenance timesink.

Ycmd's implementation was not a timesink at all. In fact, no maintenance was needed after the initial PR. Our implementation does mean that needed a path to the main file now needs two paths - file path to be parsed and a file path of the cursor location.

To confirm: the ycmd implementation to support this took me less than an hour. (apropos nothing; ycmd libclang engine is an invisible fraction of complexity compared with clangd)

ecorm added a commit to ecorm/cppwamp that referenced this issue Apr 20, 2022
Also added includes in *.ipp files to enable jump-to-declaration
in IDEs. This sadly results in clangd "main file cannot be included
recursively" error annotations in Qt Creator, probably due to
clangd/clangd#45 .

The definitions in inline-only ipp files will be moved to their
corresponding headers so that at least the clangd error annotations
can go away in CPPWAMP_COMPILED_LIB mode.

Fixes #109
@johnhubbard
Copy link

johnhubbard commented Sep 4, 2022

I just ran into this problem while trying to look at things like wake_up_var() in today's Linux kernel. This symbol is in a file (wait_bit.c) that is included from within another file (build_utility.c), and so VS Code + clangd cannot browse any of the symbols within wait_bit.c.

This is a serious limitation. If you look at the top of build_utility.c (and also look through "git blame build_utility.c"), you'll notice that this is an intentional, recent effort to reduce build times. And that means that my chances of changing this are basically zero--people want it this way.

So now I'm reduced to maintaining local hacks in order to browse a big chunk of the sched unit in the Linux kernel, with a fair chance that the damage will continue to spread, because this is apparently a popular way to speed up builds.

This is an extremely bad situation that will drive all of us back to horrible older techniques, unless clangd loses the "this is unholy and impure and bad software engineering, so we are not interested" attitude. Please reconsider that, because Real Projects are messy and impure!

@HighCommander4
Copy link

unless clangd loses the "this is unholy and impure and bad software engineering, so we are not interested" attitude. Please reconsider that, because Real Projects are messy and impure!

I believe it's been clarified in later comments (particularly this one) that the project is open to contributions to improve support for non-self-contained files.

It's just that there are non-trivial technical challenges to overcome (as detailed in this comment and this one), and someone invested in improving support for these scenarios will need to put in the legwork to solve them.

@johnhubbard
Copy link

Very good. In that case, I'll tell my compiler engineering colleagues about this, and see if it sparks any interest.

@xulongzhe
Copy link

xulongzhe commented Feb 16, 2023

nearly 4 years past, is there any body work on it? we have lots of legacy code write as non-self-contained style, looking forward this feature, so we can migrate to vscode from VS or CLion as our main IDE. Vscode + Clangd is so good for daily deploping, is three any plan?

@mlabbe
Copy link

mlabbe commented May 11, 2023

Since January I have successfully worked around this with a method I have not seen proposed anywhere else, as outlined in this blog post which I wrote at the time. https://www.frogtoss.com/labs/clangd-with-unity-builds.html

@simon8233
Copy link

another workaround is -include="xxx.h" in compile_commands or clangd file

@detly
Copy link

detly commented Nov 22, 2023

Just to add a data point to this that is not about unity builds: I have seen and used this style for DSP and embedded projects, where

  1. there is a C module exposing a small number of API functions in module_a/include/module_a.h
  2. module_a/src/module_a.c contains those function with external linkage, but also extra helper functions with internal linkage (ie. static functions)
  3. we want to unit test the helper functions, because they're sufficiently complex to warrant it

I mention DSP and embedded as a context because it relates to point #2. It seems like one solution to this would be to have the helper functions have external linkage, and have separate private header files for them (or just put them in a separate module). Unit tests could then use that private module/header's API.

But this means that the functions have to appear in the symbol table, and can't be inlined, or can be inlined but need to be duplicated as non-inlined versions, even though they're not meant to be public. It's bloat and pessimisation we want to avoid for DSP/embedded applications.

Another solution that avoids this is to consider that the unit tests for that module form a kind of single-TU project (or subproject), where only the main function (or top-level test runner function) is exposed, and it has the same set of static "helper" functions as the real project. And so your unit tests include the implementation file (module_a/src/module_a.c):

// module_a/test/test_module_a.c
#include "unit_test_harness_whatever.h"
#include "test_module_a.h"
// Note include of implementation file!
#include "module_a/src/module_a.c"

static void test_module_a_helper_x_but_not_y(void) { /* ... */ }

static void test_module_a_helper_annoying_corner_case(void) { /* ... */ }

void test_module_a_all(void) {
    RUN_TEST(test_module_a_helper_x_but_not_y);
    RUN_TEST(test_module_a_helper_annoying_corner_case);
}

Yes it's niche, yes it looks jarring and unidiomatic at first, but it's valid, it makes sense and works in these contexts, and I've seen it in real, professional, high-quality C code bases. (It also tests that the implementation file is appropriately self-contained!)

There are other solutions to this as well, eg. #define your linkage keyword to be static for library builds but empty for unit test builds. But that has drawbacks, and the benefits of support from a tool that hadn't been invented yet couldn't really be factored in to the decision (and probably aren't motivating enough to do retroactively).

@puremourning
Copy link

Well yes. This fundamentally applies to any TU which is composed of multiple files (including headers, c, cpp, inc, etc). Clangd models TU=file and as the others have said it's a big refactor to fix it.

@detly
Copy link

detly commented Nov 22, 2023

Well yes. This fundamentally applies to any TU which is composed of multiple files

You're not wrong, I just wanted to provide another data point to help avoid some kind of unity-build-specific solution (I have no idea how that would arise, but you never know).

@sliedes
Copy link

sliedes commented May 8, 2024

  • clangd relies heavily on the preamble optimization: splitting the file in two parts which can be built separately. This is essential for performance (10-1000x speedup on common operations depending on the codebase) and greatly complicates the compilation model. Cross-file preambles will complicate it further.

Ok, I'm trying to understand the problem a bit more deeply. Is this a fair restatement of what you said above?:

Let's say we have CPython, which is a bit of a mess of non-self-contained headers, but basically the external interface is that anyone using it is supposed to #include <Python.h>. Probably that header is approximately the only header in the project that can be assumed to be self-contained.

While it would be possible to devise a mechanism to allow the user to say "do not even try to parse this header as a TU, parse Python.h instead"; but this would necessarily result in abysmal performance at least when someone edits one of those headers, as most of the entire Python.h needs to be reparsed. (For users, of course, I'd assume that they don't edit the headers often. For CPython developers, they might.)

Or would it even affect performance for people just using <Python.h> without going deeper into the private headers?

  • preamble building is a part of clang, it's complicated and entangled with other features, none of us understand it well enough that making changes is straightforward.

  • clangd relies in many places on clang's notion of the main file, untangling the concepts of "interesting file" and "entrypoint" is a lot of work, and we don't have great ideas for isolating the complexity

Is this more of a problem of "where do we get that information"? I noticed that there's support for IWYU pragmas for specifying things like "this header is private; if you want its symbols, always include header X instead". That seems very much the same kind of configuration data that would be needed by this, but I think it's currently only used for flagging non-direct includes and automatically adding includes.

Or is it something more nuanced?

  • (dons flame-retardant suit) Self-contained files are good software engineering practice (and don't prevent unity builds AFAIK). Most clangd developers are paid by employers who enforce this practice - in part to make tools easier to write. We do spend lots of time on things to make clangd more well-rounded in general even outside what $employer cares about directly, but it affects prioritization for sure, especially for features like this that are probably many months of work.

No arguments here. I'm a big enough fan of self-contained headers and even C++ modules that I'd generally be willing to spend a non-trivial amount of time to clean up things. With projects like CPython, I suspect selling a major header file refactoring to CPython upstream might be an uphill battle, even if they see the value in good IDE support...

@sliedes
Copy link

sliedes commented May 8, 2024

FWIW my best guess is:

  • we'd want to build a preamble for the entrypoint file and then parse everything from there. This involves eliminating from many places the assumptions that a) the preamble is part of the file we care about, b) the file we care about is the "main file" as far as SourceManager is concerned.

  • a significant challenge with this solution is the performance where the fragment file has #includes which are not part of the entrypoint file's preamble

  • the include graph in the background index or the includer cache in TUScheduler are a reasonable initial step for identifying entrypoints but we'll eventually want something better built by fast include-scanning.

I now see that some of my previous comment's questions were addressed in this comment, which I somehow missed on the first skim... Sorry about the noise!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help wanted Extra attention is needed
Projects
None yet
Development

Successfully merging a pull request may close this issue.