New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve formatting of comments shown in hover pop-ups #187

Closed
wants to merge 3 commits into
base: master
from

Conversation

Projects
None yet
7 participants
@romix
Contributor

romix commented Dec 25, 2017

  • Remove different comment markers from the beginning and end of each comment line, perform some other kinds of preprocessing.
    - This is a very simplistic approach to parse comments. In the future, a more advanced real comment parser may need to be used.

  • Return two marked strings as a response to hover requests:

    • The first line is the formatted comment, which is reported as a markup string of type "text". This ensures that no C/C++ highlighting is used for the comment text.
      • The second line is the declaration/signature, which is reported as a markup string of type "c" or "c++".
@jacobdufault

This comment has been minimized.

Show comment
Hide comment
@jacobdufault

jacobdufault Dec 25, 2017

Member

Please make sure to write some unit tests for the format parser, for this type of thing they are extremely valuable.

Member

jacobdufault commented Dec 25, 2017

Please make sure to write some unit tests for the format parser, for this type of thing they are extremely valuable.

@jacobdufault

This comment has been minimized.

Show comment
Hide comment
@jacobdufault

jacobdufault Dec 25, 2017

Member

Btw comment parsing should happen in the indexer as well (instead of text_document_hover.cc) because the extractor is specific to how c++ comments are formatted.

Member

jacobdufault commented Dec 25, 2017

Btw comment parsing should happen in the indexer as well (instead of text_document_hover.cc) because the extractor is specific to how c++ comments are formatted.

@romix

This comment has been minimized.

Show comment
Hide comment
@romix

romix Dec 25, 2017

Contributor

@jacobdufault So, you mean we should parse format comments in the indexer so that text_document_hover.cc just uses already formatted comments stored in def.comments? Or do you mean something else?

Contributor

romix commented Dec 25, 2017

@jacobdufault So, you mean we should parse format comments in the indexer so that text_document_hover.cc just uses already formatted comments stored in def.comments? Or do you mean something else?

@jacobdufault

This comment has been minimized.

Show comment
Hide comment
@jacobdufault

jacobdufault Dec 25, 2017

Member

So, you mean we should parse format comments in the indexer so that text_document_hover.cc just uses already formatted comments stored in def.comments?

Yep.

Member

jacobdufault commented Dec 25, 2017

So, you mean we should parse format comments in the indexer so that text_document_hover.cc just uses already formatted comments stored in def.comments?

Yep.

@MaskRay

This comment has been minimized.

Show comment
Hide comment
@MaskRay

MaskRay Dec 25, 2017

Member

For this code snippet:

  /** A database transaction.
   *	Every operation requires a transaction handle.
   */
  U y;

CINDEX_LINKAGE CXString clang_Cursor_getRawCommentText(CXCursor C); extracts

/** A database transaction.
   *	Every operation requires a transaction handle.
   */

where the leading space is removed, thus the first line is actually un-indented but indentation of others is kept. We should find someway to un-indent other lines.

I'm still concerned whether it is a good idea to format the lines.

Member

MaskRay commented Dec 25, 2017

For this code snippet:

  /** A database transaction.
   *	Every operation requires a transaction handle.
   */
  U y;

CINDEX_LINKAGE CXString clang_Cursor_getRawCommentText(CXCursor C); extracts

/** A database transaction.
   *	Every operation requires a transaction handle.
   */

where the leading space is removed, thus the first line is actually un-indented but indentation of others is kept. We should find someway to un-indent other lines.

I'm still concerned whether it is a good idea to format the lines.

@romix

This comment has been minimized.

Show comment
Hide comment
@romix

romix Dec 25, 2017

Contributor

@MaskRay But this PR removes all the formatting altogether. In this case it will produce one long line. We can of course teach each about new paragraphs, etc.

My concern is that some comments like Doxygen comment use their special mark-up, which we do not handle yet.

There are also different approaches how to format/render comments. E.g. we may want to show sections like like:

  • brief description
  • detailed description
  • Parameters: ...
  • Exceptions: ...
  • Returns: ...

In ideal case, it would be nice to be able to click on the types and references inside this hover and go to their definitions....

But probably we should do it slowly, step-by-step. It will definitely require multiple PRs.

Contributor

romix commented Dec 25, 2017

@MaskRay But this PR removes all the formatting altogether. In this case it will produce one long line. We can of course teach each about new paragraphs, etc.

My concern is that some comments like Doxygen comment use their special mark-up, which we do not handle yet.

There are also different approaches how to format/render comments. E.g. we may want to show sections like like:

  • brief description
  • detailed description
  • Parameters: ...
  • Exceptions: ...
  • Returns: ...

In ideal case, it would be nice to be able to click on the types and references inside this hover and go to their definitions....

But probably we should do it slowly, step-by-step. It will definitely require multiple PRs.

@MaskRay

This comment has been minimized.

Show comment
Hide comment
@MaskRay

MaskRay Dec 25, 2017

Member

Unless we are open to clang C++ API, what we have are only these three functions:

/**
 * \brief Given a cursor that represents a declaration, return the associated
 * comment's source range.  The range may include multiple consecutive comments
 * with whitespace in between.
 */
CINDEX_LINKAGE CXSourceRange clang_Cursor_getCommentRange(CXCursor C);

/**
 * \brief Given a cursor that represents a declaration, return the associated
 * comment text, including comment markers.
 */
CINDEX_LINKAGE CXString clang_Cursor_getRawCommentText(CXCursor C);

/**
 * \brief Given a cursor that represents a documentable entity (e.g.,
 * declaration), return the associated \\brief paragraph; otherwise return the
 * first paragraph.
 */
CINDEX_LINKAGE CXString clang_Cursor_getBriefCommentText(CXCursor C);

For

//! first TextComment
//!
//! first line in second TextComment
//! second line in second TextComment
int t;

clang -std=c++11 -fsyntax-only -Xclang -ast-dump a.cc gives:

`-VarDecl 0x55ff65dfc928 <a.cc:5:1, col:5> col:5 t 'int'
  `-FullComment 0x55ff65dfcae0 <line:1:4, line:4:37>
    |-ParagraphComment 0x55ff65dfca40 <line:1:4, col:21>
    | `-TextComment 0x55ff65dfca10 <col:4, col:21> Text=" first TextComment"
    `-ParagraphComment 0x55ff65dfcab0 <line:3:4, line:4:37>
      |-TextComment 0x55ff65dfca60 <line:3:4, col:36> Text=" first line in second TextComment"
      `-TextComment 0x55ff65dfca80 <line:4:4, col:37> Text=" second line in second TextComment"

clang has idea of comment components but clang-c loses all of this detailed information. clang_Cursor_getRawCommentText is implemented by https://github.com/llvm-mirror/clang/blob/master/lib/AST/RawCommentList.cpp#L177 . I am playing with clang_Cursor_getCommentRange to see if that gives sufficient information to remove indentation in other lines.

Showing sections (Parameters, Exceptions, Returns...) is unlikely but i'd like to see what we can achieve with minimal effort.

Which editor does you use? For Emacs/Vim, I have reported the UI challenge to the language client plugins upstream (autozimu/LanguageClient-neovim#224 emacs-lsp/lsp-ui#17) sebastiencs has plan to add an interface for multi-line information and SpaceVim's author wsdjeg also plans to support LSP.

Member

MaskRay commented Dec 25, 2017

Unless we are open to clang C++ API, what we have are only these three functions:

/**
 * \brief Given a cursor that represents a declaration, return the associated
 * comment's source range.  The range may include multiple consecutive comments
 * with whitespace in between.
 */
CINDEX_LINKAGE CXSourceRange clang_Cursor_getCommentRange(CXCursor C);

/**
 * \brief Given a cursor that represents a declaration, return the associated
 * comment text, including comment markers.
 */
CINDEX_LINKAGE CXString clang_Cursor_getRawCommentText(CXCursor C);

/**
 * \brief Given a cursor that represents a documentable entity (e.g.,
 * declaration), return the associated \\brief paragraph; otherwise return the
 * first paragraph.
 */
CINDEX_LINKAGE CXString clang_Cursor_getBriefCommentText(CXCursor C);

For

//! first TextComment
//!
//! first line in second TextComment
//! second line in second TextComment
int t;

clang -std=c++11 -fsyntax-only -Xclang -ast-dump a.cc gives:

`-VarDecl 0x55ff65dfc928 <a.cc:5:1, col:5> col:5 t 'int'
  `-FullComment 0x55ff65dfcae0 <line:1:4, line:4:37>
    |-ParagraphComment 0x55ff65dfca40 <line:1:4, col:21>
    | `-TextComment 0x55ff65dfca10 <col:4, col:21> Text=" first TextComment"
    `-ParagraphComment 0x55ff65dfcab0 <line:3:4, line:4:37>
      |-TextComment 0x55ff65dfca60 <line:3:4, col:36> Text=" first line in second TextComment"
      `-TextComment 0x55ff65dfca80 <line:4:4, col:37> Text=" second line in second TextComment"

clang has idea of comment components but clang-c loses all of this detailed information. clang_Cursor_getRawCommentText is implemented by https://github.com/llvm-mirror/clang/blob/master/lib/AST/RawCommentList.cpp#L177 . I am playing with clang_Cursor_getCommentRange to see if that gives sufficient information to remove indentation in other lines.

Showing sections (Parameters, Exceptions, Returns...) is unlikely but i'd like to see what we can achieve with minimal effort.

Which editor does you use? For Emacs/Vim, I have reported the UI challenge to the language client plugins upstream (autozimu/LanguageClient-neovim#224 emacs-lsp/lsp-ui#17) sebastiencs has plan to add an interface for multi-line information and SpaceVim's author wsdjeg also plans to support LSP.

@romix

This comment has been minimized.

Show comment
Hide comment
@romix

romix Dec 25, 2017

Contributor

@MaskRay These days I mostly use Visual Studio Code with cquery. But I also use Spacemacs and NeoVim with ... YouCompleteMe, because I was not able to properly configure cquery in those editors ;-)

I agree that Clang's C++ API is more powerful and could be rather useful for comments processing.

Showing sections (Parameters, Exceptions, Returns...) is unlikely but i'd like to see what we can achieve with minimal effort.

I think it could be done, but I'm not sure about rendering it, as it may very much depend on the editor and how it render's hover responses with multiple strings and with different markup styles. E.g. in theory, one could produce HTML/Markdown markup for the documentation part of the hover and hope that the client would render it properly.

Contributor

romix commented Dec 25, 2017

@MaskRay These days I mostly use Visual Studio Code with cquery. But I also use Spacemacs and NeoVim with ... YouCompleteMe, because I was not able to properly configure cquery in those editors ;-)

I agree that Clang's C++ API is more powerful and could be rather useful for comments processing.

Showing sections (Parameters, Exceptions, Returns...) is unlikely but i'd like to see what we can achieve with minimal effort.

I think it could be done, but I'm not sure about rendering it, as it may very much depend on the editor and how it render's hover responses with multiple strings and with different markup styles. E.g. in theory, one could produce HTML/Markdown markup for the documentation part of the hover and hope that the client would render it properly.

@romix

This comment has been minimized.

Show comment
Hide comment
@romix

romix Dec 25, 2017

Contributor

BTW, Clang comments processing based on CXCursor has its own peculiarities. For some reason, it does not report comments for macro definitions. And I'm sure there are other edge cases like this.

Contributor

romix commented Dec 25, 2017

BTW, Clang comments processing based on CXCursor has its own peculiarities. For some reason, it does not report comments for macro definitions. And I'm sure there are other edge cases like this.

@romix

This comment has been minimized.

Show comment
Hide comment
@romix

romix Dec 25, 2017

Contributor

Another special case: With Clang C APIs it looks like we cannot collect comments like this:

int x; /// This is a comment for x

Even though -ast-dump shows that this comment is a child of a declaration.

Contributor

romix commented Dec 25, 2017

Another special case: With Clang C APIs it looks like we cannot collect comments like this:

int x; /// This is a comment for x

Even though -ast-dump shows that this comment is a child of a declaration.

@MaskRay

This comment has been minimized.

Show comment
Hide comment
@MaskRay

MaskRay Dec 25, 2017

Member

@romix https://github.com/llvm-mirror/clang/blob/master/test/Index/parse-all-comments.c seems to deliberately skip trailing ///. ///< markers should be used for trailing comments per https://www.stack.nl/~dimitri/doxygen/manual/docblocks.html

int trdoxyD;  // Not a Doxygen trailing comment.   trdoxyD NOT_DOXYGEN
              /// This comment doesn't get merged.   trdoxyE IS_DOXYGEN
int trdoxyE;

int trdoxyF;  /// A Doxygen non-trailing comment that gets dropped on the floor.
              // This comment will also be dropped.
int trdoxyG;  // This one won't.  trdoxyG NOT_DOXYGEN
Member

MaskRay commented Dec 25, 2017

@romix https://github.com/llvm-mirror/clang/blob/master/test/Index/parse-all-comments.c seems to deliberately skip trailing ///. ///< markers should be used for trailing comments per https://www.stack.nl/~dimitri/doxygen/manual/docblocks.html

int trdoxyD;  // Not a Doxygen trailing comment.   trdoxyD NOT_DOXYGEN
              /// This comment doesn't get merged.   trdoxyE IS_DOXYGEN
int trdoxyE;

int trdoxyF;  /// A Doxygen non-trailing comment that gets dropped on the floor.
              // This comment will also be dropped.
int trdoxyG;  // This one won't.  trdoxyG NOT_DOXYGEN

MaskRay added a commit that referenced this pull request Dec 26, 2017

@MaskRay

This comment has been minimized.

Show comment
Hide comment
@MaskRay

MaskRay Dec 26, 2017

Member

Adapted the multiple MarkedString part and commited 0c8e95e

Member

MaskRay commented Dec 26, 2017

Adapted the multiple MarkedString part and commited 0c8e95e

@romix

This comment has been minimized.

Show comment
Hide comment
@romix

romix Dec 26, 2017

Contributor

I'm working on an improved version, which understands the typical Doxygen markup and splits complex multi-line comments into "sections". It is somewhat similar to Clang's C++ API internals, but simpler. This should in theory allow for showing sections (Parameters, Exceptions, Returns...).

Contributor

romix commented Dec 26, 2017

I'm working on an improved version, which understands the typical Doxygen markup and splits complex multi-line comments into "sections". It is somewhat similar to Clang's C++ API internals, but simpler. This should in theory allow for showing sections (Parameters, Exceptions, Returns...).

@romix

This comment has been minimized.

Show comment
Hide comment
@romix

romix Dec 26, 2017

Contributor

@MaskRay Have a look at the updated version that I pushed here. It is not rebased on your commits because there are too many changes here. I'd like to hear your opinion about the overall approach.

Basically, I introduced a machinery to parse comments into a set of sections. Once the set of comment sections is created it can be formatted in any way to produce the final response for the hover request.

Contributor

romix commented Dec 26, 2017

@MaskRay Have a look at the updated version that I pushed here. It is not rebased on your commits because there are too many changes here. I'd like to hear your opinion about the overall approach.

Basically, I introduced a machinery to parse comments into a set of sections. Once the set of comment sections is created it can be formatted in any way to produce the final response for the hover request.

@MaskRay

This comment has been minimized.

Show comment
Hide comment
@MaskRay

MaskRay Dec 26, 2017

Member

Sorry if my commits cause any trouble to you. My concern is clang has idea of different comment types (https://github.com/llvm-mirror/clang/blob/master/lib/AST/RawCommentList.cpp https://github.com/llvm-mirror/clang/blob/master/lib/AST/CommentBriefParser.cpp ...). By using CXString clang_Cursor_getRawCommentText(...), we lose information which is available in its backing C++ interface and doing a parser on that seems fragile and a lot of extra work that can be avoided.

@jacobdufault

Member

MaskRay commented Dec 26, 2017

Sorry if my commits cause any trouble to you. My concern is clang has idea of different comment types (https://github.com/llvm-mirror/clang/blob/master/lib/AST/RawCommentList.cpp https://github.com/llvm-mirror/clang/blob/master/lib/AST/CommentBriefParser.cpp ...). By using CXString clang_Cursor_getRawCommentText(...), we lose information which is available in its backing C++ interface and doing a parser on that seems fragile and a lot of extra work that can be avoided.

@jacobdufault

@MaskRay

This comment has been minimized.

Show comment
Hide comment
@MaskRay

MaskRay Dec 26, 2017

Member

I want to see how comments are rendered in VSCode/Emacs/Vim, but sadly the cquery plugin in my vscode no longer works (there is no indexer thread in the release/bin/cquery process for unknown reason). There is some ongoing support in lsp-ui-doc.el emacs-lsp/lsp-ui#19 (thanks so much to sebastiencs!) I really want to see how the the language server (cquery) interoperates with UIs (lsp-ui-doc, neovim floating window SpaceVim/SpaceVim#1102 )

Member

MaskRay commented Dec 26, 2017

I want to see how comments are rendered in VSCode/Emacs/Vim, but sadly the cquery plugin in my vscode no longer works (there is no indexer thread in the release/bin/cquery process for unknown reason). There is some ongoing support in lsp-ui-doc.el emacs-lsp/lsp-ui#19 (thanks so much to sebastiencs!) I really want to see how the the language server (cquery) interoperates with UIs (lsp-ui-doc, neovim floating window SpaceVim/SpaceVim#1102 )

@romix

This comment has been minimized.

Show comment
Hide comment
@romix

romix Dec 26, 2017

Contributor

@MaskRay I rebased on master, so that it is possible to build for others.

Contributor

romix commented Dec 26, 2017

@MaskRay I rebased on master, so that it is possible to build for others.

@romix

This comment has been minimized.

Show comment
Hide comment
@romix

romix Dec 26, 2017

Contributor

As an example of a produced hover, for this function:

/// This is a test function.
///
/// \param[in] x first parameter
/// @param y third param has to
///          be >= 0 
/// @exceptions ex1, ex2
/// \returns the sum of params and is >= 0
int testFunction3(int x, int y) throw(ex1, ex2) {
  return x + y;
}

it produces this hover in VSCode:
hoverwithparametersdescription

Contributor

romix commented Dec 26, 2017

As an example of a produced hover, for this function:

/// This is a test function.
///
/// \param[in] x first parameter
/// @param y third param has to
///          be >= 0 
/// @exceptions ex1, ex2
/// \returns the sum of params and is >= 0
int testFunction3(int x, int y) throw(ex1, ex2) {
  return x + y;
}

it produces this hover in VSCode:
hoverwithparametersdescription

@romix

This comment has been minimized.

Show comment
Hide comment
@romix

romix Dec 26, 2017

Contributor

@jacobdufault @MaskRay BTW, one annoying thing that you can see in the hover image above is that the signature is shown without parameter names, but parameter descriptions refer to them. It would be nice if we could show parameter names in the signature as well, but I don't know if this information is stripped by cquery or clang.

Contributor

romix commented Dec 26, 2017

@jacobdufault @MaskRay BTW, one annoying thing that you can see in the hover image above is that the signature is shown without parameter names, but parameter descriptions refer to them. It would be nice if we could show parameter names in the signature as well, but I don't know if this information is stripped by cquery or clang.

@MaskRay

This comment has been minimized.

Show comment
Hide comment
@MaskRay

MaskRay Dec 26, 2017

Member

The type you see in the image comes from var->def.detailed_name, which in turn comes from
indexer.cc:1152

      std::string type_name =
          ToString(clang_getTypeSpelling(clang_getCursorType(decl->cursor)));
CXString clang_getTypeSpelling(CXType CT) {
  QualType T = GetQualType(CT);
  if (T.isNull())
    return cxstring::createEmpty();

  CXTranslationUnit TU = GetTU(CT);
  SmallString<64> Str;
  llvm::raw_svector_ostream OS(Str);
  PrintingPolicy PP(cxtu::getASTUnit(TU)->getASTContext().getLangOpts());

  T.print(OS, PP);

  return cxstring::createDup(OS.str());
}

I think parameter names are not stored in QualType.

Member

MaskRay commented Dec 26, 2017

The type you see in the image comes from var->def.detailed_name, which in turn comes from
indexer.cc:1152

      std::string type_name =
          ToString(clang_getTypeSpelling(clang_getCursorType(decl->cursor)));
CXString clang_getTypeSpelling(CXType CT) {
  QualType T = GetQualType(CT);
  if (T.isNull())
    return cxstring::createEmpty();

  CXTranslationUnit TU = GetTU(CT);
  SmallString<64> Str;
  llvm::raw_svector_ostream OS(Str);
  PrintingPolicy PP(cxtu::getASTUnit(TU)->getASTContext().getLangOpts());

  T.print(OS, PP);

  return cxstring::createDup(OS.str());
}

I think parameter names are not stored in QualType.

@romix

This comment has been minimized.

Show comment
Hide comment
@romix

romix Dec 27, 2017

Contributor

As for parameter names, I think I know how to produce nicely looking signatures with parameter names, should we want it.

Contributor

romix commented Dec 27, 2017

As for parameter names, I think I know how to produce nicely looking signatures with parameter names, should we want it.

@jacobdufault

This comment has been minimized.

Show comment
Hide comment
@jacobdufault

jacobdufault Dec 27, 2017

Member

I'm ok with the approach here, but the parsing code really needs tests. It shouldn't be too much work to write them.

As for parameter names, I think I know how to produce nicely looking signatures with parameter names, should we want it.

How? I think that'd be a nice improvement.

Member

jacobdufault commented Dec 27, 2017

I'm ok with the approach here, but the parsing code really needs tests. It shouldn't be too much work to write them.

As for parameter names, I think I know how to produce nicely looking signatures with parameter names, should we want it.

How? I think that'd be a nice improvement.

@romix

This comment has been minimized.

Show comment
Hide comment
@romix

romix Dec 27, 2017

Contributor

As for parameter names, I think I know how to produce nicely looking signatures with parameter names, should we want it.

@jacobdufault Here is one possible approach to create function signatures with parameter names. It is inspired by how KDevelop does it.

An alternative approach could be to take the signature that is currently produced by cquery and inject parameter names at the right positions. It would require some logic similar to the current logic used to find the place to inject a fully qualified function name.

Contributor

romix commented Dec 27, 2017

As for parameter names, I think I know how to produce nicely looking signatures with parameter names, should we want it.

@jacobdufault Here is one possible approach to create function signatures with parameter names. It is inspired by how KDevelop does it.

An alternative approach could be to take the signature that is currently produced by cquery and inject parameter names at the right positions. It would require some logic similar to the current logic used to find the place to inject a fully qualified function name.

@jacobdufault

This comment has been minimized.

Show comment
Hide comment
@jacobdufault

jacobdufault Dec 27, 2017

Member

I'm okay with either approach. Make sure to run --test-index which should will hopefully help you identify most of the special cases to consider.

Member

jacobdufault commented Dec 27, 2017

I'm okay with either approach. Make sure to run --test-index which should will hopefully help you identify most of the special cases to consider.

@romix

This comment has been minimized.

Show comment
Hide comment
@romix

romix Dec 27, 2017

Contributor

@jacobdufault For function signatures with parameter names, see #197

Contributor

romix commented Dec 27, 2017

@jacobdufault For function signatures with parameter names, see #197

@romix

This comment has been minimized.

Show comment
Hide comment
@romix

romix Dec 28, 2017

Contributor

Shall we use clang C++ interface for comment parsing (this) , textDocument/formatting , pretty-printing function signatures ?

It is a good question. I guess it is up to @jacobdufault to decide. Should we go for C++ APIs, we'd need to create new PRs for that

Contributor

romix commented Dec 28, 2017

Shall we use clang C++ interface for comment parsing (this) , textDocument/formatting , pretty-printing function signatures ?

It is a good question. I guess it is up to @jacobdufault to decide. Should we go for C++ APIs, we'd need to create new PRs for that

@jacobdufault

This comment has been minimized.

Show comment
Hide comment
@jacobdufault

jacobdufault Dec 29, 2017

Member

I'd like to avoid using the C++ APIs as much as possible, so I'm fine merging this as-is.

Member

jacobdufault commented Dec 29, 2017

I'd like to avoid using the C++ APIs as much as possible, so I'm fine merging this as-is.

@MaskRay

This comment has been minimized.

Show comment
Hide comment
@MaskRay

MaskRay Dec 30, 2017

Member

845e170 adds a minimalist lexer to skip Doxygen comment markers.

Also see
845e170#commitcomment-26556696 for the current approach used in cquery.

Member

MaskRay commented Dec 30, 2017

845e170 adds a minimalist lexer to skip Doxygen comment markers.

Also see
845e170#commitcomment-26556696 for the current approach used in cquery.

@topisani

This comment has been minimized.

Show comment
Hide comment
@topisani

topisani Dec 30, 2017

Contributor

I haven't seen it mentioned yet, but i assume this will be togglable? also, some way to specify a custom parser would be nice, for instance, i use standardese which uses syntax similar to doxygen, but still some custom options would be nice. There are a few options for this:

  1. an option to disable doxygen parsing, do the formatting clientside
  2. custom parsing options (probably hard to make useful)
  3. supplying an external executable name and format using a subprocess (probably too slow)
  4. write a few different parsers for common styles, and select one in the init options or the cli

i think a combination of 1 and 4 would be best

Contributor

topisani commented Dec 30, 2017

I haven't seen it mentioned yet, but i assume this will be togglable? also, some way to specify a custom parser would be nice, for instance, i use standardese which uses syntax similar to doxygen, but still some custom options would be nice. There are a few options for this:

  1. an option to disable doxygen parsing, do the formatting clientside
  2. custom parsing options (probably hard to make useful)
  3. supplying an external executable name and format using a subprocess (probably too slow)
  4. write a few different parsers for common styles, and select one in the init options or the cli

i think a combination of 1 and 4 would be best

@romix

This comment has been minimized.

Show comment
Hide comment
@romix

romix Dec 30, 2017

Contributor

@topisani Good questions.

Yes, we discussed it with @MaskRay. And your conclusion about 1 and 4 is essentially what we thought about. The idea it to make it configurable. It should be possible to disable/enable the formatting on the server-side and to pick the formatting style (should we support multiple styles).

Contributor

romix commented Dec 30, 2017

@topisani Good questions.

Yes, we discussed it with @MaskRay. And your conclusion about 1 and 4 is essentially what we thought about. The idea it to make it configurable. It should be possible to disable/enable the formatting on the server-side and to pick the formatting style (should we support multiple styles).

@sebastiencs

This comment has been minimized.

Show comment
Hide comment
@sebastiencs

sebastiencs Dec 31, 2017

Not sure if it's already assumed, but in addition to strip comment characters, convert doxygen to markdown would be nice too.

sebastiencs commented Dec 31, 2017

Not sure if it's already assumed, but in addition to strip comment characters, convert doxygen to markdown would be nice too.

@MaskRay

This comment has been minimized.

Show comment
Hide comment
@MaskRay

MaskRay Dec 31, 2017

Member

You may need to run git fetch origin; git rebase origin/master after the git filter-branch (or the commit corresponding to your branch point)

Member

MaskRay commented Dec 31, 2017

You may need to run git fetch origin; git rebase origin/master after the git filter-branch (or the commit corresponding to your branch point)

MaskRay added a commit that referenced this pull request Dec 31, 2017

MaskRay added a commit that referenced this pull request Dec 31, 2017

MaskRay added a commit that referenced this pull request Dec 31, 2017

@jacobdufault

This comment has been minimized.

Show comment
Hide comment
@jacobdufault

jacobdufault Dec 31, 2017

Member

Performing formatting in the hover response and indexing the full comment text SGTM since it should make supporting multiple formatters easy.

I think the formatters should be part of cquery since if someone is willing to invest the time to write one contributing upstream is relatively simple. Otherwise maybe embed a lua repl? I'd prefer to avoid that though.

Member

jacobdufault commented Dec 31, 2017

Performing formatting in the hover response and indexing the full comment text SGTM since it should make supporting multiple formatters easy.

I think the formatters should be part of cquery since if someone is willing to invest the time to write one contributing upstream is relatively simple. Otherwise maybe embed a lua repl? I'd prefer to avoid that though.

romix added some commits Dec 26, 2017

Improve formatting of hover documentation, add parameters descriptions
- Support Doxygen like commands, e.g. \param, \returns, \exceptions, @param, @returns, @Exceptions

- Represent parsed comments internally as a set of sections.
- Each section is either named and corresponds to a command like \param
- Or it is unnamed and corresponds e.g. to a paragraph in the comment
- This machinery allows for extensions in the future.

- A new formatted section for parameter descriptions/returns/exceptions is created if they were provided by means of Doxygen-like commands

- The hover response may contain this new parameter descriptions section
wip
@jacobdufault

This comment has been minimized.

Show comment
Hide comment
@jacobdufault

jacobdufault Jan 6, 2018

Member

FYI this code has a lot of complicated parsing logic, it needs tests before merging.

Member

jacobdufault commented Jan 6, 2018

FYI this code has a lot of complicated parsing logic, it needs tests before merging.

@romix

This comment has been minimized.

Show comment
Hide comment
@romix

romix Jan 6, 2018

Contributor

FYI this code has a lot of complicated parsing logic, it needs tests before merging.

Of course! I was just pretty busy with some other things recently.

Contributor

romix commented Jan 6, 2018

FYI this code has a lot of complicated parsing logic, it needs tests before merging.

Of course! I was just pretty busy with some other things recently.

@romix

This comment has been minimized.

Show comment
Hide comment
@romix

romix Jan 20, 2018

Contributor

@MaskRay Wow! What you found is very interesting.

Let's say we have this function declaration:

// Computes the edit distance of strings [a,a+la) and [b,b+lb) with Eugene W.
// Myers' O(ND) diff algorithm.
// Costs: insertion=1, deletion=1, no substitution.
// If the distance is larger than threshold, returns threshould + 1.
// \param a first string
// \param b second string
// \returns edit distance between \p a and \p b.
int MyersDiff(const char* a, int la, const char* b, int lb, int threshold);

I looked at the XML produced by the APIs and it looks like this:

<Function file="/home/xyz/src/cquery/src/working_files.cc" line="45" column="5">
<Name>MyersDiff</Name>
<USR>c:workingfiles.cc@aN@F@MyersDiff#*1C#I#S0#I#I#</USR>
<Declaration>int MyersDiff(const char a, int la, const char b, int lb, int threshold)</Declaration><Abstract>
<Para> Computes the edit distance of strings [a,a+la) and [b,b+lb) with Eugene W. Myers' O(ND) diff algorithm. Costs: insertion=1, deletion=1, no substitution. If the distance is larger than threshold, returns threshould + 1. </Para>
</Abstract>
<Parameters>
  <Parameter>
     <Name>a</Name>
     <Index>0</Index>
     <Direction isExplicit="0">in</Direction>
     <Discussion>
       <Para> first string </Para>
     </Discussion>
  </Parameter>
  <Parameter>
     <Name>b</Name>
     <Index>2</Index>
     <Direction isExplicit="0">in</Direction>
     <Discussion>
       <Para> second string </Para>
     </Discussion>
  </Parameter>
</Parameters>
<ResultDiscussion>
   <Para> edit distance between <monospaced>a</monospaced> and <monospaced>b.
   </monospaced></Para>
</ResultDiscussion>
</Function>

XML format seems to be the only one preserving the semantic structure of the comment. But as we can see, it is rather wasteful and it is also redundant when it comes to certain pieces of information like USR, file, Declaration.

BTW, ycmd seems to use it only to get the Declaration as far as I can see. But cquery does not have this issue, because we found another, less wasteful way to create declarations.

When it comes to comments formatting, the only thing YCMD seems to do is to take the RAW comment (and not what we see above) and strip comment markers at the beginning and at the end of all lines. They do not seem to e.g. highlight the parameter names or show them in bold, etc.

So I wonder what we should do with this API. One issue with it is that you can get the parsed comment only based on the cursor. You cannot get it e.g. from a string with the raw comment. It means that we need to obtain parsed comments during indexing. But then we need to serialize them somehow, I guess. And the only way provided by Clang for the serialization is to produce the XML. Of course, we can strip some parts of this XML, which are redundant. Or we can traverse the CXComment, extract only what we need and then serialize this extracted structural information.

Thoughts about how to proceed?

Contributor

romix commented Jan 20, 2018

@MaskRay Wow! What you found is very interesting.

Let's say we have this function declaration:

// Computes the edit distance of strings [a,a+la) and [b,b+lb) with Eugene W.
// Myers' O(ND) diff algorithm.
// Costs: insertion=1, deletion=1, no substitution.
// If the distance is larger than threshold, returns threshould + 1.
// \param a first string
// \param b second string
// \returns edit distance between \p a and \p b.
int MyersDiff(const char* a, int la, const char* b, int lb, int threshold);

I looked at the XML produced by the APIs and it looks like this:

<Function file="/home/xyz/src/cquery/src/working_files.cc" line="45" column="5">
<Name>MyersDiff</Name>
<USR>c:workingfiles.cc@aN@F@MyersDiff#*1C#I#S0#I#I#</USR>
<Declaration>int MyersDiff(const char a, int la, const char b, int lb, int threshold)</Declaration><Abstract>
<Para> Computes the edit distance of strings [a,a+la) and [b,b+lb) with Eugene W. Myers' O(ND) diff algorithm. Costs: insertion=1, deletion=1, no substitution. If the distance is larger than threshold, returns threshould + 1. </Para>
</Abstract>
<Parameters>
  <Parameter>
     <Name>a</Name>
     <Index>0</Index>
     <Direction isExplicit="0">in</Direction>
     <Discussion>
       <Para> first string </Para>
     </Discussion>
  </Parameter>
  <Parameter>
     <Name>b</Name>
     <Index>2</Index>
     <Direction isExplicit="0">in</Direction>
     <Discussion>
       <Para> second string </Para>
     </Discussion>
  </Parameter>
</Parameters>
<ResultDiscussion>
   <Para> edit distance between <monospaced>a</monospaced> and <monospaced>b.
   </monospaced></Para>
</ResultDiscussion>
</Function>

XML format seems to be the only one preserving the semantic structure of the comment. But as we can see, it is rather wasteful and it is also redundant when it comes to certain pieces of information like USR, file, Declaration.

BTW, ycmd seems to use it only to get the Declaration as far as I can see. But cquery does not have this issue, because we found another, less wasteful way to create declarations.

When it comes to comments formatting, the only thing YCMD seems to do is to take the RAW comment (and not what we see above) and strip comment markers at the beginning and at the end of all lines. They do not seem to e.g. highlight the parameter names or show them in bold, etc.

So I wonder what we should do with this API. One issue with it is that you can get the parsed comment only based on the cursor. You cannot get it e.g. from a string with the raw comment. It means that we need to obtain parsed comments during indexing. But then we need to serialize them somehow, I guess. And the only way provided by Clang for the serialization is to produce the XML. Of course, we can strip some parts of this XML, which are redundant. Or we can traverse the CXComment, extract only what we need and then serialize this extracted structural information.

Thoughts about how to proceed?

@romix

This comment has been minimized.

Show comment
Hide comment
@romix

romix Jan 20, 2018

Contributor

BTW, should we want to get the Declaration from the XML comment, it is a bit different from what we generate. It seems to basically contain the source range for the corresponding declaration rather than being reconstructed from the function types. This way, it looks more similar to what is actually written in the source code as apposed to the reconstructed view as seen by the Clang's type checker ;-)

Contributor

romix commented Jan 20, 2018

BTW, should we want to get the Declaration from the XML comment, it is a bit different from what we generate. It seems to basically contain the source range for the corresponding declaration rather than being reconstructed from the function types. This way, it looks more similar to what is actually written in the source code as apposed to the reconstructed view as seen by the Clang's type checker ;-)

@MaskRay

This comment has been minimized.

Show comment
Hide comment
@MaskRay

MaskRay Jan 21, 2018

Member

We may keep current type signature which uses qualified names. But it is indeed interesting to ask to what extent shall we describe the type signature.

Member

MaskRay commented Jan 21, 2018

We may keep current type signature which uses qualified names. But it is indeed interesting to ask to what extent shall we describe the type signature.

@romix

This comment has been minimized.

Show comment
Hide comment
@romix

romix Jan 21, 2018

Contributor

@MaskRay But what about the parsed comment itself? If we would use it, we wouldn't need a lot of complex code to parse it, split into sections, find parameter references, etc. Clang would do it for us.

Contributor

romix commented Jan 21, 2018

@MaskRay But what about the parsed comment itself? If we would use it, we wouldn't need a lot of complex code to parse it, split into sections, find parameter references, etc. Clang would do it for us.

@MaskRay

This comment has been minimized.

Show comment
Hide comment
@MaskRay

MaskRay Jan 21, 2018

Member

I'm not clear what UI benefits parsed comments will bring..... (probably because I'm using Emacs and the LSP toolchain definitely lacks such thing)

Member

MaskRay commented Jan 21, 2018

I'm not clear what UI benefits parsed comments will bring..... (probably because I'm using Emacs and the LSP toolchain definitely lacks such thing)

@HotelCalifornia

This comment has been minimized.

Show comment
Hide comment
@HotelCalifornia

HotelCalifornia Mar 12, 2018

I'm not clear what UI benefits parsed comments will bring.....

here's an example of some formatting as it is:

note that there seem to be missing newlines (e.g. when I document errno values and between parameters).

as for doxygen parsing, I think that would be a nice-to-have (would certainly look a little nicer without the doxygen directives), but is that something the language server should be responsible for? (I might not be too clear on the division of labor here)

HotelCalifornia commented Mar 12, 2018

I'm not clear what UI benefits parsed comments will bring.....

here's an example of some formatting as it is:

note that there seem to be missing newlines (e.g. when I document errno values and between parameters).

as for doxygen parsing, I think that would be a nice-to-have (would certainly look a little nicer without the doxygen directives), but is that something the language server should be responsible for? (I might not be too clear on the division of labor here)

@edjubuh

This comment has been minimized.

Show comment
Hide comment
@edjubuh

edjubuh Mar 20, 2018

Contributor

For formatting, clang gives us a formatted HTML string: https://github.com/llvm-mirror/clang/blob/master/include/clang-c/Documentation.h#L530.

I wonder if we could make this a config option for clients to select. If the client supports rendering HTML (like VSCode and Atom), then they can enable this option and cquery will call the necessary libclang functions to get the formatted HTML docstring.

Contributor

edjubuh commented Mar 20, 2018

For formatting, clang gives us a formatted HTML string: https://github.com/llvm-mirror/clang/blob/master/include/clang-c/Documentation.h#L530.

I wonder if we could make this a config option for clients to select. If the client supports rendering HTML (like VSCode and Atom), then they can enable this option and cquery will call the necessary libclang functions to get the formatted HTML docstring.

@jacobdufault

This comment has been minimized.

Show comment
Hide comment
@jacobdufault

jacobdufault Mar 20, 2018

Member

I'm willing to merge the PR, but there is non-trivial parsing logic so I'd like to have some unit tests before doing so.

Member

jacobdufault commented Mar 20, 2018

I'm willing to merge the PR, but there is non-trivial parsing logic so I'd like to have some unit tests before doing so.

@jacobdufault

This comment has been minimized.

Show comment
Hide comment
@jacobdufault

jacobdufault Mar 28, 2018

Member

@romix I'll close the PR for now, please reopen if you continue working on it.

Member

jacobdufault commented Mar 28, 2018

@romix I'll close the PR for now, please reopen if you continue working on it.

MaskRay added a commit to MaskRay/ccls that referenced this pull request Mar 31, 2018

MaskRay added a commit to MaskRay/ccls that referenced this pull request Mar 31, 2018

MaskRay added a commit to MaskRay/ccls that referenced this pull request Apr 17, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment