Multi project (workspaceFolders) support #1549

ghentschke · 2023-03-21T10:30:36Z

Hi,
we are developing a LSP based C/C++ Editor for the Eclipse CDT platform. In Eclipse CDT it's common practise to work on several C/C++ projects simultaneously.

At this point we come into trouble with the current clangd design, which is, when I understand the comments here right, based on the assumption that a connected editor works with one project at the moment.

Now you could argue why don't you use one LS for the whole workspace, since clangd searches and uses the individual compile_commands.json file in the source directory path to determine the CDB. But this has several drawbacks:

Indexing problems occurs when functions with same signature has been defined in separate projects (see this Issue #38)
Determining the CDB for external header files comes into troubles when there are several project roots. (see discussion in this Issue #907) : The "mitigation" below does not work for more than one project in the ws, because clangd infers the CDB from the first opened project source file. This could be a different project in the workspace.

However, during a recent Discord discussion I realized that if CompilationDatabase is specified in your project's .clangd file, it's also scoped to the project's directory, so it will not take effect for headers outside the project directory.

Worth noting this is somewhat mitigated by the TUScheduler::HeaderIncluderCache:

if you've had project/bar.cc open this session (and it has a compile command from a CDB)
and you open <some/header.h> which is transitively included by bar.cc
and we don't find a trustworthy command for header.h
then we'll transfer the bar.cc flags for parsing header.h
This is similar to how headers are typically treated in compile_commands.json, but works even if the header is not from the same project. It's not perfect but it doesn't require manual configuration and still works when multiple CDBs are in play.

A solution to determine the correct CDB for an external header file would be to fetch it from the origin file. This implies that clangd has the knowledge of hyperlinks from a source to an external header file, and from header to header. I'am not sure if the server can do that. If so, then the CDB from the origin file should be used. Otherwise this logic could be implemented in the client.

Are there any activities to support LSP workspace folders?

HighCommander4 · 2023-03-22T07:30:42Z

FWIW, I think that running a single clangd instance for the entire workspace would be the approach that corresponds more closely with Eclipse CDT's current behaviour.

The main reason is that this setup supports cross-project references, e.g. a function defined in one project (say a library) with call sites in another project (say an application). Even if the indexes are stored per project (next to each compile_commands.json), having a single clangd instance load all the indexes means that the symbols are cross-referenced in a single data structure. By contrast, one clangd instance per project means each index is loaded in a separate process and they know nothing about each other.

I appreciate that symbols with the same name in different projects pose a challenge, however I don't think this would be a new problem for CDT -- the current indexer struggles with this too (see for example bug 337583 and its numerous duplicates and linked bugs). Furthermore, I believe there is a path forward to improving the handling of multiple definitions in clangd, as described in #1101 (comment).

A solution to determine the correct CDB for an external header file would be to fetch it from the origin file. This implies that clangd has the knowledge of hyperlinks from a source to an external header file, and from header to header. I'am not sure if the server can do that. If so, then the CDB from the origin file should be used.

Clangd should be able to gather information about inclusion relationships during indexing (at least, for every file transitively included by some source file listed in compile_commands.json), and use this to choose a more accurate source file to infer the compile command from when opening a header. We have #123 open to track this, and it's something I hope to work on as time permits.

Are there any activities to support LSP workspace folders?

I'm not aware of anyone actively working on this, but as a feature specified in LSP, I think we are definitely open to contributions in this area.

I do see the potential for workspaceFolders to provide additional value on top of the improvements described above. One example that comes to mind that is that clangd currently loads compile_commands.json files (and therefore indexes) "lazily", i.e. upon opening the first source file in the relevant directory tree; in a workspace with multiple indexes this can mean e.g. go-to-definition giving a different result depending on whether the index containing the function definition has been loaded. workspaceFolders would allow us to discover and load all indexes up-front, resulting in more consistent behaviour.

ghentschke · 2023-03-23T12:02:42Z

one clangd instance per project means each index is loaded in a separate process and they know nothing about each other.

Why should they know each other? If a project A uses files from project B, this should be stated in the the compile_commands.json of project A.

A single clangd instance would be interesting when workspaceFolders are supported.

I think the use case that a header file has been opened by a hyperlink in the edited file is very common. Therefor clangd needs more context information. If clangd knows the origin, then their compile commands should be used.

Is it correct that the LSP provides no information how a file has been opened? So the LS cannot determine if it has been opened by a hyperlink nor the origin?

Edit: I checked the LSP documentation for linking. It seems that the server has no knowledge about the origin. The server is only informed about opened files.

HighCommander4 · 2023-03-24T05:10:03Z

one clangd instance per project means each index is loaded in a separate process and they know nothing about each other.

Why should they know each other?

My recollection is that when searching for occurrences of a symbol, Eclipse CDT would return results across all projects in the workspace.

If a project A uses files from project B, this should be stated in the the compile_commands.json of project A.

The compile_commands.json of project A will have include paths that point to the headers of project B, and thereby allow you to navigate to e.g. the declaration of a function in those headers. However, you would not be able to go from a use in project A to the definition of the function in project B (nor find references in project A starting from the definition in project B) without having some sort of merged view of the two indexes.

I think the use case that a header file has been opened by a hyperlink in the edited file is very common. Therefor clangd needs more context information. If clangd knows the origin, then their compile commands should be used.

Is it correct that the LSP provides no information how a file has been opened? So the LS cannot determine if it has been opened by a hyperlink nor the origin?

Edit: I checked the LSP documentation for linking. It seems that the server has no knowledge about the origin. The server is only informed about opened files.

Clangd implements a heuristic today where it keeps track of the include graphs of all open files, and uses that to choose a compile command for headers included transitively from an open file.

In the case of hyperlink navigation, the file containing the hyperlink is necessarily open, so this heuristic handles the scenario fairly well.

Concretely, if you're in source file A and you're following an inclusion of a header H, then:

If A is the only open file which includes H, clangd will use A's compile command.
If A is one of several open files which includes H, clangd will use the compile command of one those open files, but not necessarily A.
- This is still more accurate than choosing an arbitrary source file which may not include H at all, since you're going to get some compile command with which H is actually parsed.
- The only case in which this is less accurate than knowing that A is the orignating file, is if A and another open file that includes H have different compile commands, and the difference is relevant to H (e.g. they take different branches of a preprocessor conditional in H due to a different -D flag) -- then it could be that even if you open H from A, you get a different "view" of H than the one seen by A.

ghentschke · 2023-03-24T06:04:12Z

@HighCommander4 Thank you for the fast reply!
I'am a little confused. You said that:

The compile_commands.json of project A will have include paths that point to the headers of project B

I thought that the compile_commands.json has no entries for header files?

The only case in which this is less accurate than knowing that A is the orignating file, is if A and another open file that includes H have different compile commands, and the difference is relevant to H (e.g. they take different branches of a preprocessor conditional in H due to a different -D flag) -- then it could be that even if you open H from A, you get a different "view" of H than the one seen by A.

I assume that this case can be improved when working with workspaceFolders: the heuristic should use the opened files from the current workspace folder only to determine the compile commands.

HighCommander4 · 2023-03-24T06:15:34Z

@HighCommander4 Thank you for the fast reply! I'am a little confused. You said that:

The compile_commands.json of project A will have include paths that point to the headers of project B

I thought that the compile_commands.json has no entries for header files?

I'm not referring to entries for header files, but rather flags in the compile commands for project A source files pointing to project B include directories like so:

{
  "directory": "/workspace/projectA",
  "file": "/workspace/projectA/file1.cpp",
  "command": "/usr/bin/gcc -I/workspace/projectB/include -c /workspace/projectA/file1.cpp"
}

This would allow, even in a clangd-instance-per-project setup, for project A's clangd instance to open a header file located in /workspace/projectB/include containing the declaration of a function called from file1.cpp.

However, opening the definition of such a function inside project B would only work in a setup where the two projects share a clangd instance.

The only case in which this is less accurate than knowing that A is the orignating file, is if A and another open file that includes H have different compile commands, and the difference is relevant to H (e.g. they take different branches of a preprocessor conditional in H due to a different -D flag) -- then it could be that even if you open H from A, you get a different "view" of H than the one seen by A.

I assume that this case can be improved when working with workspaceFolders: the heuristic should use the opened files from the current workspace folder only to determine the compile commands.

Somewhat improved perhaps, but compile commands can also differ between files in the same project.

We may be able to offer a heuristic solution where we choose the source file for which we received the most recent LSP message, which is very likely to be the one you're navigating from.

ghentschke added the enhancement New feature or request label Mar 21, 2023

ghentschke changed the title ~~Multi project support~~ Multi project (workpaceFolder) support Mar 21, 2023

ghentschke changed the title ~~Multi project (workpaceFolder) support~~ Multi project (workspaceFolders) support Mar 21, 2023

ghentschke mentioned this issue Mar 22, 2023

Add hyperlinked documents to origin LS eclipse/lsp4e#555

Closed

ghentschke mentioned this issue Mar 24, 2023

The LS shall examine external file includes by using the projects compile settings eclipse-cdt/cdt-lsp#22

Closed

j-lakeman mentioned this issue May 6, 2023

Restarting language server gives ENOENT error clangd/vscode-clangd#454

Open

tortmayr mentioned this issue Jul 5, 2023

Improve multi-project support in a single workspace clangd/vscode-clangd#498

Open

GeorgeLyon mentioned this issue Oct 20, 2023

Add support for a compile database file not at the root of a workspace apple/sourcekit-lsp#915

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Multi project (workspaceFolders) support #1549

Multi project (workspaceFolders) support #1549

ghentschke commented Mar 21, 2023 •

edited

HighCommander4 commented Mar 22, 2023

ghentschke commented Mar 23, 2023 •

edited

HighCommander4 commented Mar 24, 2023

ghentschke commented Mar 24, 2023

HighCommander4 commented Mar 24, 2023

Multi project (workspaceFolders) support #1549

Multi project (workspaceFolders) support #1549

Comments

ghentschke commented Mar 21, 2023 • edited

HighCommander4 commented Mar 22, 2023

ghentschke commented Mar 23, 2023 • edited

HighCommander4 commented Mar 24, 2023

ghentschke commented Mar 24, 2023

HighCommander4 commented Mar 24, 2023

ghentschke commented Mar 21, 2023 •

edited

ghentschke commented Mar 23, 2023 •

edited