New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More granular symbol kinds for semantic highlighting #239

Closed
HighCommander4 opened this Issue Jan 6, 2018 · 12 comments

Comments

Projects
None yet
3 participants
@HighCommander4
Contributor

HighCommander4 commented Jan 6, 2018

I'm coming to cquery from Eclipse CDT which, while it doesn't support "rainbow semantic highlighting", does have more granular symbol kinds.

  • For variables, in addition to distinguishing between member and freestanding, it distinguishes between local, parameter, and namespace-scope variables.
  • For member functions and variables, it distinguishes between static and non-static.
  • For variables and functions, it distinguishes between declarations of the variable/function (including definitions) and references to it.
  • For types, it distinguishes between classes, enumerations, and typedefs.
  • There are highlightings for enumerators, template parameters, and namespaces.

Would there be interest in bringing some or all of these more granular categories to cquery?

@MaskRay

This comment has been minimized.

Show comment
Hide comment
@MaskRay

MaskRay Jan 6, 2018

Member

src/indexer.c:364

enum class VarClass {
  // probably a variable in system headers
  Unknown = 0,
  // a parameter or function variable
  Local = 1,
  // a macro, ie, #define FOO
  Macro = 2,
  // a global variable
  Global = 3,
  // a member variable of struct/union/class/enum
  Member = 4
};
MAKE_REFLECT_TYPE_PROXY(VarClass, std::underlying_type<VarClass>::type);

They can definitely be extended.

Tagging Index{Type,Func,Var} with more information from fine-grained CXCursorKind and the following clang-c APIs may be useful

CINDEX_LINKAGE enum CXLinkageKind clang_getCursorLinkage(CXCursor cursor);
CINDEX_LINKAGE enum CXVisibilityKind clang_getCursorVisibility(CXCursor cursor);

BTW, I would like to the design of Eclipse CDT PDOM https://github.com/eclipse/cdt/tree/master/core/org.eclipse.cdt.core/parser/org/eclipse/cdt/internal/core/pdom There is apparently very few articles about it https://wiki.eclipse.org/CDT/designs/PDOM/Overview

Member

MaskRay commented Jan 6, 2018

src/indexer.c:364

enum class VarClass {
  // probably a variable in system headers
  Unknown = 0,
  // a parameter or function variable
  Local = 1,
  // a macro, ie, #define FOO
  Macro = 2,
  // a global variable
  Global = 3,
  // a member variable of struct/union/class/enum
  Member = 4
};
MAKE_REFLECT_TYPE_PROXY(VarClass, std::underlying_type<VarClass>::type);

They can definitely be extended.

Tagging Index{Type,Func,Var} with more information from fine-grained CXCursorKind and the following clang-c APIs may be useful

CINDEX_LINKAGE enum CXLinkageKind clang_getCursorLinkage(CXCursor cursor);
CINDEX_LINKAGE enum CXVisibilityKind clang_getCursorVisibility(CXCursor cursor);

BTW, I would like to the design of Eclipse CDT PDOM https://github.com/eclipse/cdt/tree/master/core/org.eclipse.cdt.core/parser/org/eclipse/cdt/internal/core/pdom There is apparently very few articles about it https://wiki.eclipse.org/CDT/designs/PDOM/Overview

@HighCommander4

This comment has been minimized.

Show comment
Hide comment
@HighCommander4

HighCommander4 Jan 6, 2018

Contributor

BTW, I would like to the design of Eclipse CDT PDOM https://github.com/eclipse/cdt/tree/master/core/org.eclipse.cdt.core/parser/org/eclipse/cdt/internal/core/pdom There is apparently very few articles about it https://wiki.eclipse.org/CDT/designs/PDOM/Overview

Yeah, there isn't much documentation on it. I fixed some bugs in it, slowly gaining understanding of it by reading code and looking at previous changes.

Some high-level notes on it:

  • The storage format is a flat buffer which is stored on disk, and pages of it cached into memory.
  • The buffer has a malloc-like allocator interface to allocate arbitrary-size chunks of it.
  • The things which are allocated are mostly "records", which are fixed-sized structures (fixed size for each record type).
  • There are record types corresponding to many kinds of entities in the program, including variables, functions, and classes (and more specific kinds of these, forming an inheritance-like hierarchy, like "function template" and "member function template").
  • Records refer to each other by "pointer", which is to say an offset of bytes from the beginning of the database.
  • Records are linked in semantic ways. For example, the record for a variable refers to the record for the variable's type.
  • The PDOM is populated by walking the AST, and constructing PDOM records correspondings to the entities in the AST. There is a fair bit of duplication between the AST and PDOM representation, which is one of the things I didn't like about this codebase; it made adding new things fairly tedious.

If you have more specific questions about it, feel free to ask and I will try to answer.

Contributor

HighCommander4 commented Jan 6, 2018

BTW, I would like to the design of Eclipse CDT PDOM https://github.com/eclipse/cdt/tree/master/core/org.eclipse.cdt.core/parser/org/eclipse/cdt/internal/core/pdom There is apparently very few articles about it https://wiki.eclipse.org/CDT/designs/PDOM/Overview

Yeah, there isn't much documentation on it. I fixed some bugs in it, slowly gaining understanding of it by reading code and looking at previous changes.

Some high-level notes on it:

  • The storage format is a flat buffer which is stored on disk, and pages of it cached into memory.
  • The buffer has a malloc-like allocator interface to allocate arbitrary-size chunks of it.
  • The things which are allocated are mostly "records", which are fixed-sized structures (fixed size for each record type).
  • There are record types corresponding to many kinds of entities in the program, including variables, functions, and classes (and more specific kinds of these, forming an inheritance-like hierarchy, like "function template" and "member function template").
  • Records refer to each other by "pointer", which is to say an offset of bytes from the beginning of the database.
  • Records are linked in semantic ways. For example, the record for a variable refers to the record for the variable's type.
  • The PDOM is populated by walking the AST, and constructing PDOM records correspondings to the entities in the AST. There is a fair bit of duplication between the AST and PDOM representation, which is one of the things I didn't like about this codebase; it made adding new things fairly tedious.

If you have more specific questions about it, feel free to ask and I will try to answer.

@MaskRay

This comment has been minimized.

Show comment
Hide comment
@MaskRay

MaskRay Jan 6, 2018

Member

Thanks for taking time to write this! I think @khng300 would probably be intested. He mentioned MarkZ3 is developing clangd in his own branch with this model. If this model would ever get done also in cquery, I think we could use lmdb instead of rolling our own. I'm currently thinking about two issues which bother me more: memory footprint and cacheDirectory space consumption.

For your issue, I'll expand current VarClass to clang SymbolKind.

If you'd like to know how cquery indexers work and implement this feature.
Adding volatile static int z=0;while(!z)asm("pause"); (as a poor man's breakpoint) is a trick I often use at the entry points of OnIndexReference and OnIndexDeclaration. Then I can attach it with cgdb -p $(pgrep -fn debug/bin/cquery) (https://github.com/cgdb/cgdb )

These two functions are useful to evaluate with gdb p commands to understand the spelling range and extent of a Clang cursor:

Range loc = cursor.get_spelling_range();
Range extent = cursor.get_extent();
Member

MaskRay commented Jan 6, 2018

Thanks for taking time to write this! I think @khng300 would probably be intested. He mentioned MarkZ3 is developing clangd in his own branch with this model. If this model would ever get done also in cquery, I think we could use lmdb instead of rolling our own. I'm currently thinking about two issues which bother me more: memory footprint and cacheDirectory space consumption.

For your issue, I'll expand current VarClass to clang SymbolKind.

If you'd like to know how cquery indexers work and implement this feature.
Adding volatile static int z=0;while(!z)asm("pause"); (as a poor man's breakpoint) is a trick I often use at the entry points of OnIndexReference and OnIndexDeclaration. Then I can attach it with cgdb -p $(pgrep -fn debug/bin/cquery) (https://github.com/cgdb/cgdb )

These two functions are useful to evaluate with gdb p commands to understand the spelling range and extent of a Clang cursor:

Range loc = cursor.get_spelling_range();
Range extent = cursor.get_extent();
@MaskRay

This comment has been minimized.

Show comment
Hide comment
@MaskRay

MaskRay Jan 7, 2018

Member

Added fine-grained ClangSymbolKind in commit 7e6f0ac
kind will only be sent if serializer.cc is changed accordingly to serialize kind, at the cost of larger cache files.

template <typename TVisitor>
void Reflect(TVisitor& visitor, IndexType& value) {
  REFLECT_MEMBER_START(16);   // 16 -> 17 , and add kind
  REFLECT_MEMBER2("id", value.id);
  REFLECT_MEMBER2("usr", value.usr);
  REFLECT_MEMBER2("short_name", value.def.short_name);
  REFLECT_MEMBER2("detailed_name", value.def.detailed_name);
  REFLECT_MEMBER2("hover", value.def.hover);
  REFLECT_MEMBER2("comments", value.def.comments);
  REFLECT_MEMBER2("definition_spelling", value.def.definition_spelling);
Member

MaskRay commented Jan 7, 2018

Added fine-grained ClangSymbolKind in commit 7e6f0ac
kind will only be sent if serializer.cc is changed accordingly to serialize kind, at the cost of larger cache files.

template <typename TVisitor>
void Reflect(TVisitor& visitor, IndexType& value) {
  REFLECT_MEMBER_START(16);   // 16 -> 17 , and add kind
  REFLECT_MEMBER2("id", value.id);
  REFLECT_MEMBER2("usr", value.usr);
  REFLECT_MEMBER2("short_name", value.def.short_name);
  REFLECT_MEMBER2("detailed_name", value.def.detailed_name);
  REFLECT_MEMBER2("hover", value.def.hover);
  REFLECT_MEMBER2("comments", value.def.comments);
  REFLECT_MEMBER2("definition_spelling", value.def.definition_spelling);
@jacobdufault

This comment has been minimized.

Show comment
Hide comment
@jacobdufault

jacobdufault Jan 7, 2018

Member

kind will only be sent if serializer.cc is changed accordingly to serialize kind, at the cost of larger cache files.

We should probably be serializing kind especially if it is used for semantic highlighting.

Member

jacobdufault commented Jan 7, 2018

kind will only be sent if serializer.cc is changed accordingly to serialize kind, at the cost of larger cache files.

We should probably be serializing kind especially if it is used for semantic highlighting.

@MaskRay

This comment has been minimized.

Show comment
Hide comment
@MaskRay

MaskRay Jan 7, 2018

Member

The support on the server's side is added in 7e6f0ac
1731f1d

VSCode extension needs to incorporate these changes. I am not a fan of VSCode and the Emacs extension uses overlays which have severe performance issues so I cannot help.

Member

MaskRay commented Jan 7, 2018

The support on the server's side is added in 7e6f0ac
1731f1d

VSCode extension needs to incorporate these changes. I am not a fan of VSCode and the Emacs extension uses overlays which have severe performance issues so I cannot help.

@MaskRay

This comment has been minimized.

Show comment
Hide comment
@MaskRay

MaskRay Jan 16, 2018

Member

Experimental fine-grained semantic highlighting in Emacs https://github.com/cquery-project/emacs-cquery/

(cquery-use-default-rainbow-sem-highlight)
Member

MaskRay commented Jan 16, 2018

Experimental fine-grained semantic highlighting in Emacs https://github.com/cquery-project/emacs-cquery/

(cquery-use-default-rainbow-sem-highlight)
@HighCommander4

This comment has been minimized.

Show comment
Hide comment
@HighCommander4

HighCommander4 Jan 24, 2018

Contributor

VSCode extension needs to incorporate these changes.

Done in #345.

Contributor

HighCommander4 commented Jan 24, 2018

VSCode extension needs to incorporate these changes.

Done in #345.

@HighCommander4

This comment has been minimized.

Show comment
Hide comment
@HighCommander4

HighCommander4 Jan 25, 2018

Contributor

@MaskRay, I am curious, for the semantic symbol kinds, why did you make ClangSymbolKind mirror the SymbolKind enum from clang's C++ API (clang/Index/IndexSymbol.h), when cquery's source for the information is the CXIdxEntityKind enum from the libclang API (clang-c/Index.h)?

Is it necessary for ClangSymbolKind to continue to mirror clang's SymbolKind? For example, for #344 we would need to add a TemplateParameter enumerator to ClangSymbolKind, making it not mirror clang's SymbolKind any more. Is that OK?

Contributor

HighCommander4 commented Jan 25, 2018

@MaskRay, I am curious, for the semantic symbol kinds, why did you make ClangSymbolKind mirror the SymbolKind enum from clang's C++ API (clang/Index/IndexSymbol.h), when cquery's source for the information is the CXIdxEntityKind enum from the libclang API (clang-c/Index.h)?

Is it necessary for ClangSymbolKind to continue to mirror clang's SymbolKind? For example, for #344 we would need to add a TemplateParameter enumerator to ClangSymbolKind, making it not mirror clang's SymbolKind any more. Is that OK?

@MaskRay

This comment has been minimized.

Show comment
Hide comment
@MaskRay

MaskRay Jan 26, 2018

Member

There is no Parameter,Macro in CXIdxEntityKind.

Member

MaskRay commented Jan 26, 2018

There is no Parameter,Macro in CXIdxEntityKind.

@HighCommander4

This comment has been minimized.

Show comment
Hide comment
@HighCommander4

HighCommander4 Jan 26, 2018

Contributor

I think we can close this issue. The few things that remain are tracked in their own issues (template parameters #344, local vs. global variables #352).

Contributor

HighCommander4 commented Jan 26, 2018

I think we can close this issue. The few things that remain are tracked in their own issues (template parameters #344, local vs. global variables #352).

@HighCommander4

This comment has been minimized.

Show comment
Hide comment
@HighCommander4

HighCommander4 Jan 26, 2018

Contributor

There is no Parameter,Macro in CXIdxEntityKind.

Indeed. I'm not suggesting that we use CXIdxEntityKind, but rather our own enum (which can then contain new things we add like TemplateParameter).

Contributor

HighCommander4 commented Jan 26, 2018

There is no Parameter,Macro in CXIdxEntityKind.

Indeed. I'm not suggesting that we use CXIdxEntityKind, but rather our own enum (which can then contain new things we add like TemplateParameter).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment