Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multi project (workspaceFolders) support #1549

Open
ghentschke opened this issue Mar 21, 2023 · 5 comments
Open

Multi project (workspaceFolders) support #1549

ghentschke opened this issue Mar 21, 2023 · 5 comments
Labels
enhancement New feature or request

Comments

@ghentschke
Copy link

ghentschke commented Mar 21, 2023

Hi,
we are developing a LSP based C/C++ Editor for the Eclipse CDT platform. In Eclipse CDT it's common practise to work on several C/C++ projects simultaneously.

At this point we come into trouble with the current clangd design, which is, when I understand the comments here right, based on the assumption that a connected editor works with one project at the moment.

Now you could argue why don't you use one LS for the whole workspace, since clangd searches and uses the individual compile_commands.json file in the source directory path to determine the CDB. But this has several drawbacks:

  • Indexing problems occurs when functions with same signature has been defined in separate projects (see this Issue #38)
  • Determining the CDB for external header files comes into troubles when there are several project roots. (see discussion in this Issue #907) : The "mitigation" below does not work for more than one project in the ws, because clangd infers the CDB from the first opened project source file. This could be a different project in the workspace.

However, during a recent Discord discussion I realized that if CompilationDatabase is specified in your project's .clangd file, it's also scoped to the project's directory, so it will not take effect for headers outside the project directory.

Worth noting this is somewhat mitigated by the TUScheduler::HeaderIncluderCache:

if you've had project/bar.cc open this session (and it has a compile command from a CDB)
and you open <some/header.h> which is transitively included by bar.cc
and we don't find a trustworthy command for header.h
then we'll transfer the bar.cc flags for parsing header.h
This is similar to how headers are typically treated in compile_commands.json, but works even if the header is not from the same project. It's not perfect but it doesn't require manual configuration and still works when multiple CDBs are in play.

A solution to determine the correct CDB for an external header file would be to fetch it from the origin file. This implies that clangd has the knowledge of hyperlinks from a source to an external header file, and from header to header. I'am not sure if the server can do that. If so, then the CDB from the origin file should be used. Otherwise this logic could be implemented in the client.

Are there any activities to support LSP workspace folders?

@ghentschke ghentschke added the enhancement New feature or request label Mar 21, 2023
@ghentschke ghentschke changed the title Multi project support Multi project (workpaceFolder) support Mar 21, 2023
@ghentschke ghentschke changed the title Multi project (workpaceFolder) support Multi project (workspaceFolders) support Mar 21, 2023
@HighCommander4
Copy link

FWIW, I think that running a single clangd instance for the entire workspace would be the approach that corresponds more closely with Eclipse CDT's current behaviour.

The main reason is that this setup supports cross-project references, e.g. a function defined in one project (say a library) with call sites in another project (say an application). Even if the indexes are stored per project (next to each compile_commands.json), having a single clangd instance load all the indexes means that the symbols are cross-referenced in a single data structure. By contrast, one clangd instance per project means each index is loaded in a separate process and they know nothing about each other.

I appreciate that symbols with the same name in different projects pose a challenge, however I don't think this would be a new problem for CDT -- the current indexer struggles with this too (see for example bug 337583 and its numerous duplicates and linked bugs). Furthermore, I believe there is a path forward to improving the handling of multiple definitions in clangd, as described in #1101 (comment).

A solution to determine the correct CDB for an external header file would be to fetch it from the origin file. This implies that clangd has the knowledge of hyperlinks from a source to an external header file, and from header to header. I'am not sure if the server can do that. If so, then the CDB from the origin file should be used.

Clangd should be able to gather information about inclusion relationships during indexing (at least, for every file transitively included by some source file listed in compile_commands.json), and use this to choose a more accurate source file to infer the compile command from when opening a header. We have #123 open to track this, and it's something I hope to work on as time permits.

Are there any activities to support LSP workspace folders?

I'm not aware of anyone actively working on this, but as a feature specified in LSP, I think we are definitely open to contributions in this area.

I do see the potential for workspaceFolders to provide additional value on top of the improvements described above. One example that comes to mind that is that clangd currently loads compile_commands.json files (and therefore indexes) "lazily", i.e. upon opening the first source file in the relevant directory tree; in a workspace with multiple indexes this can mean e.g. go-to-definition giving a different result depending on whether the index containing the function definition has been loaded. workspaceFolders would allow us to discover and load all indexes up-front, resulting in more consistent behaviour.

@ghentschke
Copy link
Author

ghentschke commented Mar 23, 2023

one clangd instance per project means each index is loaded in a separate process and they know nothing about each other.

Why should they know each other? If a project A uses files from project B, this should be stated in the the compile_commands.json of project A.

A single clangd instance would be interesting when workspaceFolders are supported.

I think the use case that a header file has been opened by a hyperlink in the edited file is very common. Therefor clangd needs more context information. If clangd knows the origin, then their compile commands should be used.

Is it correct that the LSP provides no information how a file has been opened? So the LS cannot determine if it has been opened by a hyperlink nor the origin?

Edit: I checked the LSP documentation for linking. It seems that the server has no knowledge about the origin. The server is only informed about opened files.

@HighCommander4
Copy link

one clangd instance per project means each index is loaded in a separate process and they know nothing about each other.

Why should they know each other?

My recollection is that when searching for occurrences of a symbol, Eclipse CDT would return results across all projects in the workspace.

If a project A uses files from project B, this should be stated in the the compile_commands.json of project A.

The compile_commands.json of project A will have include paths that point to the headers of project B, and thereby allow you to navigate to e.g. the declaration of a function in those headers. However, you would not be able to go from a use in project A to the definition of the function in project B (nor find references in project A starting from the definition in project B) without having some sort of merged view of the two indexes.

I think the use case that a header file has been opened by a hyperlink in the edited file is very common. Therefor clangd needs more context information. If clangd knows the origin, then their compile commands should be used.

Is it correct that the LSP provides no information how a file has been opened? So the LS cannot determine if it has been opened by a hyperlink nor the origin?

Edit: I checked the LSP documentation for linking. It seems that the server has no knowledge about the origin. The server is only informed about opened files.

Clangd implements a heuristic today where it keeps track of the include graphs of all open files, and uses that to choose a compile command for headers included transitively from an open file.

In the case of hyperlink navigation, the file containing the hyperlink is necessarily open, so this heuristic handles the scenario fairly well.

Concretely, if you're in source file A and you're following an inclusion of a header H, then:

  • If A is the only open file which includes H, clangd will use A's compile command.
  • If A is one of several open files which includes H, clangd will use the compile command of one those open files, but not necessarily A.
    • This is still more accurate than choosing an arbitrary source file which may not include H at all, since you're going to get some compile command with which H is actually parsed.
    • The only case in which this is less accurate than knowing that A is the orignating file, is if A and another open file that includes H have different compile commands, and the difference is relevant to H (e.g. they take different branches of a preprocessor conditional in H due to a different -D flag) -- then it could be that even if you open H from A, you get a different "view" of H than the one seen by A.

@ghentschke
Copy link
Author

@HighCommander4 Thank you for the fast reply!
I'am a little confused. You said that:

The compile_commands.json of project A will have include paths that point to the headers of project B

I thought that the compile_commands.json has no entries for header files?

The only case in which this is less accurate than knowing that A is the orignating file, is if A and another open file that includes H have different compile commands, and the difference is relevant to H (e.g. they take different branches of a preprocessor conditional in H due to a different -D flag) -- then it could be that even if you open H from A, you get a different "view" of H than the one seen by A.

I assume that this case can be improved when working with workspaceFolders: the heuristic should use the opened files from the current workspace folder only to determine the compile commands.

@HighCommander4
Copy link

@HighCommander4 Thank you for the fast reply! I'am a little confused. You said that:

The compile_commands.json of project A will have include paths that point to the headers of project B

I thought that the compile_commands.json has no entries for header files?

I'm not referring to entries for header files, but rather flags in the compile commands for project A source files pointing to project B include directories like so:

{
  "directory": "/workspace/projectA",
  "file": "/workspace/projectA/file1.cpp",
  "command": "/usr/bin/gcc -I/workspace/projectB/include -c /workspace/projectA/file1.cpp"
}

This would allow, even in a clangd-instance-per-project setup, for project A's clangd instance to open a header file located in /workspace/projectB/include containing the declaration of a function called from file1.cpp.

However, opening the definition of such a function inside project B would only work in a setup where the two projects share a clangd instance.

The only case in which this is less accurate than knowing that A is the orignating file, is if A and another open file that includes H have different compile commands, and the difference is relevant to H (e.g. they take different branches of a preprocessor conditional in H due to a different -D flag) -- then it could be that even if you open H from A, you get a different "view" of H than the one seen by A.

I assume that this case can be improved when working with workspaceFolders: the heuristic should use the opened files from the current workspace folder only to determine the compile commands.

Somewhat improved perhaps, but compile commands can also differ between files in the same project.

We may be able to offer a heuristic solution where we choose the source file for which we received the most recent LSP message, which is very likely to be the one you're navigating from.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants