Syntax highlighting #677

Closed
quicknir opened this issue May 9, 2016 · 21 comments

Comments

@quicknir

quicknir commented May 9, 2016

Is it completely insane to ask whether it would be possible to write a major mode for C++ that uses rtags to query the type of token for syntax highlighting, instead of doing it regex based?

I found myself contemplating a switch to spacemacs + rtags this weekend. There are a handful of places where it falls short of Eclipse (and of course places where it's stronger as well). One of these is that in Eclipse, syntax highlighting is based on the AST. Not only is this accurate whenever your index is, but it also lets you make a lot more interesting distinctions. For example, static variables can be italicized (not just at the declaration, but wherever they are used), and member functions can be a different color from free functions (inside another member function of the same class, the two can't be distinguished by eye or by regex).
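As a rough illustration of those distinctions, here is a sketch of two extra faces and a lookup from a symbol's declaration kind to a face. It assumes the highlighter can learn, for each identifier, the libclang cursor kind of the declaration it refers to (strings such as "CXXMethod" or "VarDecl") and whether that declaration is static; all `my-` names are hypothetical and not part of rtags.

```elisp
(require 'cl-lib)

;; Hypothetical faces; neither exists in rtags or in Emacs itself.
(defface my-ast-static-variable-face
  '((t :inherit font-lock-variable-name-face :slant italic))
  "Face for static variables, at their declaration and at every use.")

(defface my-ast-member-function-face
  '((t :inherit font-lock-function-name-face :weight bold))
  "Face for member functions, so they differ from free functions.")

(defun my-ast-face-for (decl-kind static-p)
  "Return a face for an identifier whose declaration has DECL-KIND.
DECL-KIND is assumed to be a libclang cursor kind spelling such as
\"VarDecl\".  STATIC-P non-nil means the declaration has static storage."
  (cond ((and static-p (string= decl-kind "VarDecl"))
         'my-ast-static-variable-face)
        ((string= decl-kind "CXXMethod")
         'my-ast-member-function-face)
        ((member decl-kind '("FunctionDecl" "FunctionTemplate"))
         'font-lock-function-name-face)
        ((member decl-kind '("VarDecl" "FieldDecl" "ParmDecl"))
         'font-lock-variable-name-face)
        (t nil)))
```

Because the lookup keys on the declaration rather than on the text at the use site, a bare call like `foo()` inside a member function gets the member-function face whenever `foo` is a method, which is exactly the case a regex cannot decide.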

Given that rtags has all this information available, it seems like this should be possible in principle. A major mode could try querying rtags for the information by calling the appropriate functions, and then fall back to regex when necessary.

@Andersbakken
Owner

There are, I think, two possible problems:

  1. Performance. This could probably be fixed by allowing elisp to ask for
    the whole syntax tree for a source file (or a range) instead of doing a
    lot of individual lookups.

  2. Clang more or less falls apart when things do not compile. I guess
    falling back to regex could work for that.
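The batching idea in point 1 can be sketched on the elisp side: fetch every token for a range in one request, keep the tokens in a vector sorted by offset, and answer "what is at this offset?" with a binary search instead of a round-trip to the server. The token shape `(OFFSET . ALIST)` with a `length` entry is borrowed from the `rtags-tokens` output that the fontifier later in this thread reads; the `my-` functions are hypothetical.

```elisp
(require 'cl-lib)

(defun my-tokens-to-vector (tokens)
  "Return TOKENS as a vector sorted by offset.
Each token is assumed to be (OFFSET . ALIST), with a 0-based OFFSET and a
`length' entry in ALIST."
  (vconcat (sort (copy-sequence tokens)
                 (lambda (a b) (< (car a) (car b))))))

(defun my-token-at (vec offset)
  "Binary-search the sorted token vector VEC for the token covering OFFSET.
Return that token, or nil if OFFSET falls between tokens."
  (let ((lo 0)
        (hi (1- (length vec)))
        found)
    (while (and (<= lo hi) (not found))
      (let* ((mid (/ (+ lo hi) 2))
             (tok (aref vec mid))
             (start (car tok))
             (end (+ start (alist-get 'length (cdr tok)))))
        (cond ((< offset start) (setq hi (1- mid)))
              ((>= offset end) (setq lo (1+ mid)))
              (t (setq found tok)))))
    found))
```

One request fills the vector per fontification pass; every lookup after that is O(log n) and never leaves Emacs.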

In short, I think it's doable and I can try to put together something that
makes the querying faster.

It's currently possible to use (rtags-symbol-info-internal) to get an
association list with all the necessary information about the symbol under
the cursor.

I'm not sure I'm entirely the right person to write the mode that does
this. Do you feel inclined to give it a go? I'd support the effort with
whatever APIs you need.

Anders

@quicknir
Author

Those are both good points, though it sounds like there are solutions for both. For 2), the last known symbol types could also be kept from the last successful compile. In a typical workflow you will have a successful compile at some point, change something, and then it fails, but 99% of the symbols will still have the same type.
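A minimal sketch of that fallback, assuming tokens shaped like `(OFFSET . ALIST)` with a `spelling` entry and a nested `symbol` alist carrying `kind` (the shape the fontifier later in this thread reads): after a successful index, remember each identifier's kind by spelling, and when the current index has no symbol for a token, reuse the remembered kind. Keying on spelling alone is a simplification, since the same name can mean different things in different scopes; the `my-` names are hypothetical.

```elisp
(require 'cl-lib)

(defun my-remember-kinds (table tokens)
  "Store spelling -> symbol kind for every token in TOKENS that has both.
Call this only after a successful index, so TABLE keeps known-good kinds.
Returns TABLE."
  (dolist (tok tokens table)
    (let ((spelling (alist-get 'spelling (cdr tok)))
          (kind (alist-get 'kind (alist-get 'symbol (cdr tok)))))
      (when (and spelling kind)
        (puthash spelling kind table)))))

(defun my-kind-with-fallback (table token)
  "Return TOKEN's current symbol kind, or the remembered kind for its spelling."
  (or (alist-get 'kind (alist-get 'symbol (cdr token)))
      (gethash (alist-get 'spelling (cdr token)) table)))
```

In practice TABLE would be a per-buffer `(make-hash-table :test #'equal)` refilled after each successful index.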

Truthfully, I am not yet an emacs user, nor have I ever written a line of elisp, so I'm probably not the right person. Where else in the ecosystem do you think I could bring this up? Emacs core, CEDET?

I realize it's a bit of a long shot to bring this up when I'm unlikely to implement it myself, but I figure if nothing else it's a data point.

@dvzubarev

There is an attempt to bring a similar feature to ycmd.
ycm-core/ycmd#291

@Andersbakken
Owner

It's definitely interesting. I certainly wouldn't mind doing the c++ part
of it. I wouldn't quite know where to start on the elisp though.

@JohnC32
Contributor

JohnC32 commented May 11, 2016

This is a nice idea. It would solve issues where cc-mode does not handle C++11 syntax correctly. There are several outstanding issues and not much activity on cc-mode (I think). If we use rtags for syntax highlighting, it would also be great to hook this up with clang-tidy to get indentation correct when the tab key is hit or you do M-x indent-region; the current cc-mode indentation and highlighting are coupled. One concern with doing this is that running clang (rdm/rp) on large files takes many seconds. Clang processes all the headers, so if you've included a lot of STL/Boost, the preprocessor output runs to millions of lines (ignoring blank lines), whereas the current regex-based cc-mode approach is fast on similar files. If there were a way to incrementally analyze a source file, or to analyze code regions in a file without diving into the headers, this approach would be very viable and not too difficult to implement.

@Andersbakken
Copy link
Owner

Yeah. It's a bit problematic. I don't believe clang has a mode to only
check syntax, at least not in the public API. They do have an option called
-fsyntax-only, but it's not noticeably faster and I believe it traverses the
headers as well.

I think any successful mode would have to asynchronously wait for RTags'
syntax info and use cc-mode when it's not available/up-to-date.
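One way to read "asynchronously wait" in elisp, as a sketch: each backend reply is stored together with the `buffer-chars-modified-tick` captured when the request was sent, and the fontify function trusts the tokens only while that tick still matches; otherwise it falls back to `font-lock-default-fontify-region`. How the request is sent is left out, and the `face` entry on each token is a hypothetical precomputed field; in a cc-mode buffer the real fallback would be whatever fontify function cc-mode installed. All `my-` names are hypothetical.

```elisp
(require 'cl-lib)

(defvar-local my-token-cache nil
  "Cons (TICK . TOKENS): TOKENS from the backend, valid for the text at TICK.")

(defun my-store-tokens (buffer tick tokens)
  "Reply handler: cache TOKENS for BUFFER as computed at modification TICK.
TICK should be the `buffer-chars-modified-tick' captured when the request
was sent, so a reply for text that has since changed is never trusted."
  (when (buffer-live-p buffer)
    (with-current-buffer buffer
      (setq my-token-cache (cons tick tokens))
      ;; Ask font-lock to redo the buffer now that better data exists.
      (font-lock-flush))))

(defun my-fresh-tokens ()
  "Return cached tokens if they describe the current buffer text, else nil."
  (and my-token-cache
       (= (car my-token-cache) (buffer-chars-modified-tick))
       (cdr my-token-cache)))

(defun my-fontify-region (beg end loudly)
  "Fontify BEG..END from fresh tokens, or fall back to regex font-lock.
Each token is assumed to be (OFFSET . ALIST) with a 0-based OFFSET, a
`length' entry and a hypothetical precomputed `face' entry."
  (let ((tokens (my-fresh-tokens)))
    (if (null tokens)
        (font-lock-default-fontify-region beg end loudly)
      (with-silent-modifications
        (dolist (tok tokens)
          (let* ((start (1+ (car tok)))
                 (stop (+ start (alist-get 'length (cdr tok)))))
            ;; Only touch tokens that overlap the requested region.
            (when (and (< start end) (> stop beg))
              (put-text-property start stop 'face
                                 (alist-get 'face (cdr tok))))))))))
```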

Anders

@quicknir
Author

quicknir commented May 12, 2016

So I've made a bit of progress research-wise. First off, it seems like one can populate a variable called font-lock-keywords. This is a list of elements; each element can take several forms, but one form of particular interest is a pair whose first element is a function and whose second element is a face. Font-lock calls the function with a search limit; the function finds the next match, describes it by setting the match data, and returns non-nil. Font-lock then applies the face in the second element to the matched area, and it works through the list of elements one by one. (http://www.gnu.org/software/emacs/manual/html_node/elisp/Search_002dbased-Fontification.html#Search_002dbased-Fontification)
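A minimal sketch of such a function matcher, assuming the backend has already produced a sorted list of `(START . END)` buffer regions that should get the type face (the variable and function names are hypothetical). It follows the documented protocol for function matchers in `font-lock-keywords`: it receives a search limit, and on success it sets the match data, moves point past the match and returns non-nil.

```elisp
(require 'cl-lib)

(defvar-local my-type-regions nil
  "Hypothetical sorted list of (START . END) regions reported as types.")

(defun my-match-type (limit)
  "Font-lock matcher: find the next type region starting before LIMIT.
On success, set the match data to that region, move point to its end and
return non-nil; otherwise return nil."
  (let ((hit (cl-find-if (lambda (r) (and (>= (car r) (point))
                                           (< (car r) limit)
                                           ;; Skip empty regions so point always advances.
                                           (< (car r) (cdr r))))
                         my-type-regions)))
    (when hit
      (set-match-data (list (car hit) (cdr hit)))
      (goto-char (cdr hit))
      t)))

;; One `font-lock-keywords' entry: use the function as the matcher, then
;; apply the type face to match group 0.
(defvar my-keywords '((my-match-type 0 'font-lock-type-face)))
```

In a mode, `(font-lock-add-keywords nil my-keywords)` would install the entry for the current buffer.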

In our case, these functions could basically hold references to some array that's populated by querying the backend. One function per face; each function does a single scan over the array and sees which points in the text match the syntactic element it's looking to highlight.
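The "one array per face" layout could look like this sketch: a single pass buckets the tokens by face, and `apply-partially` turns one generic matcher into a matcher per face for `font-lock-keywords`. The token shape `(OFFSET . ALIST)`, the FACE-FN argument and all `my-` names are assumptions for illustration.

```elisp
(require 'cl-lib)

(defvar-local my-face-regions nil
  "Hypothetical hash table mapping a face to a sorted list of (START . END).")

(defun my-bucket-tokens (tokens face-fn)
  "Group TOKENS by the face FACE-FN returns for each one.
Tokens are assumed to be (OFFSET . ALIST) with a 0-based OFFSET and a
`length' entry.  Tokens for which FACE-FN returns nil are skipped."
  (let ((table (make-hash-table :test #'eq)))
    (dolist (tok tokens)
      (let ((face (funcall face-fn tok)))
        (when face
          (let ((start (1+ (car tok))))
            (push (cons start (+ start (alist-get 'length (cdr tok))))
                  (gethash face table))))))
    ;; Sort each bucket once so a matcher can walk it in buffer order.
    (maphash (lambda (face regions)
               (puthash face (sort regions (lambda (a b) (< (car a) (car b))))
                        table))
             table)
    table))

(defun my-match-face (face limit)
  "Match the next region of FACE in `my-face-regions' starting before LIMIT."
  (let ((hit (cl-find-if (lambda (r) (and (>= (car r) (point))
                                           (< (car r) limit)
                                           (< (car r) (cdr r))))
                         (and my-face-regions (gethash face my-face-regions)))))
    (when hit
      (set-match-data (list (car hit) (cdr hit)))
      (goto-char (cdr hit))
      t)))

(defun my-keywords-for (faces)
  "Return `font-lock-keywords' entries with one matcher per face in FACES."
  (mapcar (lambda (face)
            (list (apply-partially #'my-match-face face) 0 (list 'quote face)))
          faces))
```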

A second useful piece of information is that Steve Yegge wrote a proper AST-based syntax highlighter for JS in Emacs (js2-mode). He apparently did not use font-lock at all, and did not fall back to regex parsing. His code is long because he wrote an entire JS parser, but somewhere in that code is the part that actually sets the colors, which we can use.

@Andersbakken
Copy link
Owner

Yeah. I do use js2-mode which is quite the engineering achievement. I agree
that this methodology might work.

I have already started exposing some of the token information to elisp,
so it seems we have most of the pieces we need to get started.

Anders

@gvol
Contributor

gvol commented May 13, 2016

I'm interested in this effort. Perhaps a good start would be handling nothing but raw string literals. It seems like that might simplify things and would solve an important problem area for me.
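For raw string literals, the core job is finding where a literal ends: the closing sequence is `)`, then the delimiter, then `"`, so a bare `)"` inside the literal does not terminate it. The sketch below works on a string and is independent of rtags; the names are hypothetical, and the delimiter rule follows the C++11 grammar (at most 16 characters, excluding space, parentheses, backslash, tab, vertical tab, form feed and newline).

```elisp
(require 'cl-lib)

(defconst my-raw-string-open-re
  "\\(?:L\\|u8\\|u\\|U\\)?R\"\\([^ ()\\\t\n\v\f]\\{0,16\\}\\)("
  "Regexp for the opening of a C++11 raw string literal.
Group 1 captures the delimiter.")

(defun my-raw-string-end (string start)
  "Return the index just past the raw string literal at START in STRING.
Return nil when no raw string literal opens exactly at START, or when
its closing sequence )DELIMITER\" is missing."
  (let ((case-fold-search nil))         ; R, L, U and u are case-sensitive
    (when (eq (string-match my-raw-string-open-re string start) start)
      (let* ((close (concat ")" (match-string 1 string) "\""))
             (pos (cl-search close string :start2 (match-end 0))))
        (and pos (+ pos (length close)))))))
```

For `R"xy(a ")" b)xy"` the first `)"` is skipped because it is not followed by the delimiter `xy`.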

I don't think it's fair to say there is no work on cc-mode (Alan Mackenzie has been quite responsive in my experience), but so far it doesn't support raw string literals. Hopefully that will change soon, but I think it's non-trivial.

In my estimation, as long as it uses clang instead of gcc it won't get into Emacs core--RMS will veto it. There are several threads about this that you can read if you're interested.

@quicknir
Author

Yeah, that's a given. I recently emailed the emacs mailing list. Here was RMS' reply:

We develop GCC as well as Emacs. To adopt a competitor to GCC
as a "solition" would be self defeating.

A proper solution is to extend GCC so that it does the necessary job.

Shrug. Other people were more helpful and pointed me to font lock. But I don't think there's any appetite to do it themselves. gvol, how's your elisp? ;-)

@gvol
Contributor

gvol commented May 14, 2016

My elisp is decent (I have a few MELPA packages). My time is less so. :-( But I'll try to make some time since it would help me immensely. What I don't have a good handle on is calling rtags to get the information. But I'll see what I can get going.

@Andersbakken
Owner

Hi Ivan

Right now one would call (rtags-tokens) to get the info. This is a
blocking call that might take a little while, so I'm working on making it
async.
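Until the call is truly asynchronous, one stopgap (a sketch, not rtags code) is to debounce it: schedule the blocking fetch on an idle timer and cancel any pending request when a new one arrives, so a burst of edits costs one call once Emacs goes idle. This only moves the blocking call out of the typing path; it does not make it non-blocking. FETCH-FN is a hypothetical stand-in for whatever wraps `(rtags-tokens)`.

```elisp
(require 'cl-lib)

(defvar-local my-token-timer nil
  "Pending idle timer that will refresh this buffer's tokens (hypothetical).")

(defun my-run-token-fetch (buffer fetch-fn)
  "Call FETCH-FN in BUFFER if it is still live."
  (when (buffer-live-p buffer)
    (with-current-buffer buffer
      (setq my-token-timer nil)
      (funcall fetch-fn))))

(defun my-schedule-token-refresh (fetch-fn &optional delay)
  "Run FETCH-FN in the current buffer after DELAY idle seconds (default 0.5).
A newer request cancels any pending one for the same buffer."
  (when (timerp my-token-timer)
    (cancel-timer my-token-timer))
  (setq my-token-timer
        (run-with-idle-timer (or delay 0.5) nil
                             #'my-run-token-fetch (current-buffer) fetch-fn)))
```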

Anders

@jeaye

jeaye commented May 18, 2016

I'll point out that https://github.com/jeaye/color_coded handles this quite nicely, in vim. Ideally, YCMD will provide an API which color_coded can use so that both projects don't end up compiling the source, and I encourage each of you to follow up with that request here ycm-core/ycmd#291.

As I'm currently in the middle of switching from vim to spacemacs, I'd really like to port color_coded over to a spacemacs layer; being able to ride on YCMD's tokens would make that easier, but it's not necessary. If someone is looking into implementing semantic highlighting for emacs, I recommend doing so as a port of color_coded; its native code should be able to remain unchanged.

@Andersbakken
Owner

We're certainly not opposed to having rtags output the tokens in some
non-elisp format that color_coded and other tools could use.
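For illustration only, this sketch shows what a JSON rendering of the token list could look like if produced from elisp with the built-in `json.el`. JSON is just one possible choice of non-elisp format, the field names mirror the `(OFFSET . ALIST)` tokens used in this thread, and a real exporter for tools like color_coded would more likely live in rdm itself. The function name is hypothetical.

```elisp
(require 'cl-lib)
(require 'json)

(defun my-tokens-to-json (tokens)
  "Encode TOKENS, each (OFFSET . ALIST), as a JSON array of objects.
Each object gets an \"offset\" field followed by the token's own fields."
  (json-encode
   ;; A vector always encodes as a JSON array, even when TOKENS is empty.
   (vconcat (mapcar (lambda (tok) (cons (cons 'offset (car tok)) (cdr tok)))
                    tokens))))
```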

Anders

@gvol
Contributor

gvol commented May 24, 2016

Here's a very rough first attempt. I'm sure I haven't covered everything; I only tried it on a few cpp files. To test it out, evaluate the function below, open an rtags-enabled cpp file, and do

M-x rtags-fontify-region RET

You can also make it the default fontification function in a cpp file with

M-: (setq-local font-lock-fontify-region-function #'rtags-fontify-region) RET

However, it currently requires that the file be saved and indexed, or all the offsets will be off (and hence the fontification will be off), so it doesn't really work while editing. There are probably other issues that I haven't noticed.

(defun rtags-fontify-region (beg end loudly)
  "Fontify the text between BEG and END based on rtags tokens.
If LOUDLY is non-nil, print status messages while fontifying.
This function is for use as `font-lock-fontify-region-function'."

  ;; This has issues with preprocessor...
  ;; especially ifdef'ed out sections
  ;; Also I haven't tested performance
  ;; Until it's saved, things don't work right...
  (interactive (list (point-min) (point-max) t))
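  ;; Binding `inhibit-point-motion-hooks' together with
  ;; `with-silent-modifications' mirrors font-lock's `save-buffer-state'.
  (let ((inhibit-point-motion-hooks t))
    (with-silent-modifications
      (when loudly
        (message "rtags-fontify-region: fontifying %s" (buffer-name)))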
      (if (not (rtags-is-indexed))
          ;; This is probably the wrong choice...
          (font-lock-default-fontify-region beg end loudly)
        (let* ((tokens (rtags-tokens beg end)))
          ;; (show tokens)
          (save-excursion
            (save-restriction
              (unless font-lock-dont-widen (widen))

              (font-lock-unfontify-region beg end)

              (when (and font-lock-syntactic-keywords
                         (null syntax-propertize-function))
                ;; Ensure the beginning of the file is properly syntactic-fontified.
                (let ((start beg))
                  (when (< font-lock-syntactically-fontified start)
                    (setq start (max font-lock-syntactically-fontified (point-min)))
                    (setq font-lock-syntactically-fontified end))
                  (font-lock-fontify-syntactic-keywords-region start end)))

              (dolist (token tokens)
                (let* ((kind (alist-get 'kind token))
                       (start (1+ (car token)))
                       (stop (+ start (alist-get 'length token))))
                  ;; (show kind)
                  (cond ((string-equal kind "Keyword")
                         ;; Some of these need to be types...
                         (let ((keyword (alist-get 'spelling token)))
                           (cond ((member keyword
                                          '("double" "float"
                                            "unsigned" "int" "char" "bool"
                                            "void"))
                                  (put-text-property start stop 'face 'font-lock-type-face))
                                 ((member keyword
                                          '("nullptr"))
                                  (put-text-property start stop 'face 'font-lock-builtin-face))
                                 (t
                                  (put-text-property start stop 'face 'font-lock-keyword-face)))))

                        ((string-equal kind "Punctuation")
                         ;; Drop any `syntax-table' text properties on punctuation.
                         (remove-text-properties start stop
                                                 '(syntax-table nil)))

                        ((string-equal kind "Comment")
                         (put-text-property start stop 'face 'font-lock-comment-face))

                        ((string-equal kind "Identifier")
                         (let* ((sym (alist-get 'symbol token))
                                (kind (alist-get 'kind sym)))
                           (cond ((string-equal kind "inclusion directive")
                                  (put-text-property start stop 'face 'font-lock-preprocessor-face))
                                 ((string-equal kind "StructDecl")
                                  (put-text-property start stop 'face 'font-lock-type-face))
                                 ((string-equal kind "UsingDeclaration")
                                  ;; ???
                                  (put-text-property start stop 'face 'font-lock-type-face)
                                  )
                                 ((string-equal kind "VarDecl")
                                  (put-text-property start stop 'face 'font-lock-variable-name-face))
                                 ((string-equal kind "FieldDecl")
                                  (put-text-property start stop 'face 'font-lock-variable-name-face))
                                 ((string-equal kind "ParmDecl")
                                  (put-text-property start stop 'face 'font-lock-variable-name-face))
                                 ((string-equal kind "FunctionDecl")
                                  (put-text-property start stop 'face 'font-lock-function-name-face))
                                 ((string-equal kind "FunctionTemplate")
                                  (put-text-property start stop 'face 'font-lock-function-name-face))
                                 ((string-equal kind "TemplateTypeParameter")
                                  (put-text-property start stop 'face 'font-lock-type-face))
                                 ((string-equal kind "NonTypeTemplateParameter")
                                  (put-text-property start stop 'face 'font-lock-variable-name-face))
                                 ((string-equal kind "TemplateRef")
                                  (put-text-property start stop 'face 'font-lock-type-face))

                                 ((string-equal kind "MemberRefExpr")
                                  ;; Usage of a member reference
                                  )

                                 ((string-equal kind "MemberRef")
                                  ;; Usage of a member reference
                                  )

                                 ((string-equal kind "DeclRefExpr")
                                  ;; Variable/function usage
                                  ;; (put-text-property start stop 'face 'font-lock-variable-name-face)
                                  )
                                 ((string-equal kind "OverloadedDeclRef")
                                  ;; Variable/function usage
                                  ;; (put-text-property start stop 'face 'font-lock-variable-name-face)
                                  )
                                 ((string-equal kind "ClassTemplate")
                                  ;; Declaration of a class template
                                  (put-text-property start stop 'face 'font-lock-type-face)
                                  )
                                 ((string-equal kind "CXXConstructor")
                                  (put-text-property start stop 'face 'font-lock-function-name-face)
                                  )
                                 ((string-equal kind "TypeRef")
                                  (put-text-property start stop 'face 'font-lock-type-face))
                                 ((string-equal kind "NamespaceRef")
                                  (put-text-property start stop 'face 'font-lock-builtin-face))

                                 ((string-equal kind "Namespace")
                                  (put-text-property start stop 'face 'font-lock-builtin-face))

                                 ((string-equal kind "macro expansion")
                                  (put-text-property start stop 'face 'font-lock-warning-face))

                                 ((string-equal kind "macro definition")
                                  (put-text-property start stop 'face 'font-lock-variable-name-face))

                                 ((null kind)
                                  ;; Not sure what this is, but it's X and Y in Test(X,Y)
                                  ;; (put-text-property start stop 'face 'font-lock-variable-name-face))
                                  )


                                 (t
                                  (when loudly
                                    (message "rtags-fontify-region: unhandled identifier kind %S: %S"
                                             kind token))))))

                        ((string-equal kind "Literal")
                         ;; Stuff to handle strings..., numbers, etc.
                         (let* ((sym (alist-get 'symbol token))
                                (spelling (alist-get 'spelling token))
                                (type (alist-get 'type sym)))

                           (cond ((string-match "^\"" spelling)
                                  (put-text-property start stop 'face 'font-lock-string-face))
                                 ((string-match "\\(L\\|u8\\|u\\|U\\)?R[\"\"]\\([^(]*\\)("
                                                (alist-get 'spelling token))
                                  (put-text-property start stop 'face 'font-lock-doc-face))
                                 ((string-match "^'" spelling)
                                  (put-text-property start stop 'face 'font-lock-string-face))

                                 ((string-match "^[0-9'xa-f]+$" spelling)
                                  ;; Nothing for
                                  )

                                 (t
                                  (when loudly
                                    (message "rtags-fontify-region: unhandled literal %S (type %S): %S"
                                             spelling type token)))))

                         (when (and nil
                                    (string-match "\\(L\\|u8\\|u\\|U\\)?R[\"\"]\\([^(]*\\)("
                                                  (alist-get 'spelling token)))
                           (let* ((full (match-string 0 (alist-get 'spelling token)))
                                  (delimiter (match-string 2 (alist-get 'spelling token)))
                                  (qualifier (match-string 1 (alist-get 'spelling token)))
                                  ;; TODO: use (rtags-goto-offset) which handles the 1+ and multibyte
                                  (beg-beg (1+ (car token))) ; Is this the right thing?  I think rtags is off by 1 from
                                  (beg-end (+ beg-beg (length full)))
                                  (end-len (+ 2 (length delimiter)))
                                  (end-end (+ beg-beg (alist-get 'length token)))
                                  (end-beg (- end-end end-len)))
                             ;; Use the same 1-based buffer positions as above.
                             (remove-text-properties beg-beg end-end
                                                     '(syntax-table nil))
                             (put-text-property beg-beg (+ beg-beg (length qualifier) 1)
                                                'syntax-table
                                                (string-to-syntax "'"))
                             (put-text-property (+ beg-beg (length qualifier) 1)
                                                (+ beg-beg (length qualifier) 2)
                                                'syntax-table
                                                ;; (string-to-syntax "\"")
                                                (string-to-syntax "|"))
                             (put-text-property (1- end-end)
                                                end-end
                                                'syntax-table
                                                (string-to-syntax "|"))
                             ;; Make " inside the raw string, not string quotes
                             (goto-char beg-end)
                             (while (search-forward-regexp "[\"\n\"]" end-beg t)
                               ;; (show (buffer-substring-no-properties (1- (point)) (point)))
                               (put-text-property (1- (point)) (point)
                                                  'syntax-table
                                                  (string-to-syntax " "))))))

                        (t (when loudly
                             (message "rtags-fontify-region: unhandled token kind %S"
                                      kind))))))

              ;; Return the bounds of what was actually fontified..
              (cons 'jit-lock-bounds (cons beg end)))))))))
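On the TODO about the `1+` and multibyte text: if the token offsets and lengths are 0-based byte counts into the file (clang reports source locations that way, but treating rtags' output the same way is an assumption), the built-in `filepos-to-bufferpos` (Emacs 25.1 and later) converts them while accounting for the file's coding system, so non-ASCII characters earlier in the file don't shift the faces. A sketch with a hypothetical helper name:

```elisp
(require 'cl-lib)

(defun my-token-region (token)
  "Return (START . END) buffer positions for TOKEN, or nil if out of range.
TOKEN is assumed to be (OFFSET . ALIST), where OFFSET is a 0-based byte
offset into the file and ALIST has a `length' entry counted in bytes."
  (let* ((offset (car token))
         (len (alist-get 'length (cdr token)))
         ;; `filepos-to-bufferpos' maps a 0-based file byte offset to a
         ;; buffer position using `buffer-file-coding-system'.
         (start (filepos-to-bufferpos offset 'exact))
         (end (and start (filepos-to-bufferpos (+ offset len) 'exact))))
    (and start end (cons start end))))
```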

@Andersbakken
Owner

Thanks. I am not sure what one can do about the requirement for it being
saved/indexed actually. It's a little hard to imagine a good solution for
that. Maybe detect that it isn't the case and fall back to regular cc-mode.
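A sketch of that fallback, assuming gvol's `rtags-fontify-region` above is loaded: a hypothetical minor mode remembers the fontify function that was already in place (cc-mode may install its own), uses the rtags path only while the buffer is unmodified, and restores the saved function when turned off. "Unmodified" is only a proxy for "matches the index"; `rtags-fontify-region` still checks `rtags-is-indexed` itself. All `my-` names are hypothetical.

```elisp
(require 'cl-lib)

(defvar my-rtags-hl-function 'rtags-fontify-region
  "Function used while the buffer is unmodified (assumed to be gvol's draft).")

(defvar-local my-rtags-hl--fallback nil
  "Fontify function that was in effect before `my-rtags-hl-mode' was enabled.")

(defun my-rtags-hl-fontify (beg end loudly)
  "Use rtags tokens for an unmodified buffer, else the saved fallback."
  (if (buffer-modified-p)
      (funcall (or my-rtags-hl--fallback #'font-lock-default-fontify-region)
               beg end loudly)
    (funcall my-rtags-hl-function beg end loudly)))

(define-minor-mode my-rtags-hl-mode
  "Hypothetical minor mode: AST-based fontification with a regex fallback."
  :lighter " RtHL"
  (if my-rtags-hl-mode
      ;; Guard against saving our own function when enabled twice.
      (unless (eq font-lock-fontify-region-function #'my-rtags-hl-fontify)
        (setq my-rtags-hl--fallback font-lock-fontify-region-function)
        (setq-local font-lock-fontify-region-function #'my-rtags-hl-fontify))
    (setq-local font-lock-fontify-region-function
                (or my-rtags-hl--fallback #'font-lock-default-fontify-region)))
  (font-lock-flush))
```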

Wanna file a pull request to get it into rtags.el?

Anders

On Tue, May 24, 2016 at 10:53 AM, Ivan Andrus notifications@github.com
wrote:

Here's a very rough first attempt. I'm sure I haven't covered
everything, I only tried it on a few cpp files. To test it out, evaluate
the function below, open an rtags-enabled cpp file, and do

M-x rtags-fontify-region RET

You can also set it to be the default fontification function with in a cpp
file with

M-: (setq-local font-lock-fontify-region-function #'rtags-fontify-region) RET

However, it currently requires that the file be saved and indexed, or all
the offsets will be off (and hence fontification will be off), so it
doesn't really work to edit. There are probably other issues that I haven't
noticed.

(defun rtags-fontify-region (beg end loudly)
"Fontify the text between BEG and END based on rtags tokens.
If LOUDLY is non-nil, print status messages while fontifying.
This function is for use as `font-lock-fontify-region-function'."

;; This has issues with preprocessor...
;; especially ifdef'ed out sections
;; Also I haven't tested performance
;; Until it's saved, things don't work right...
(interactive (list (point-min) (point-max) t))
(let ((inhibit-point-motion-hooks t)) ;; <---
(with-silent-modifications ;; These two are (save-buffer-state ...)
(show "rtags-fontify-region")
(if (not (rtags-is-indexed))
;; This is probably the wrong choice...
(font-lock-default-fontify-region beg end loudly)
(let* ((tokens (rtags-tokens beg end)))
;; (show tokens)
(save-excursion
(save-restriction
(unless font-lock-dont-widen (widen))

          (font-lock-unfontify-region beg end)

          (when (and font-lock-syntactic-keywords
                     (null syntax-propertize-function))
            ;; Ensure the beginning of the file is properly syntactic-fontified.
            (let ((start beg))
              (when (< font-lock-syntactically-fontified start)
                (setq start (max font-lock-syntactically-fontified (point-min)))
                (setq font-lock-syntactically-fontified end))
              (font-lock-fontify-syntactic-keywords-region start end)))

          (dolist (token tokens)
            (let* ((kind (alist-get 'kind token))
                   (start (1+ (car token)))
                   (stop (+ start (alist-get 'length token))))
              ;; (show kind)
              (cond ((string-equal kind "Keyword")
                     ;; Some of these need to be types...
                     (let ((keyword (alist-get 'spelling token)))
                       (cond ((member keyword
                                      '("double" "float"
                                        "unsigned" "int" "char" "bool"
                                        "void"))
                              (put-text-property start stop 'face 'font-lock-type-face))
                             ((member keyword
                                      '("nullptr"))
                              (put-text-property start stop 'face 'font-lock-builtin-face))
                             (t
                              (put-text-property start stop 'face 'font-lock-keyword-face)))))

                    ((string-equal kind "Punctuation")
                     (remove-text-properties start stop
                                             '(syntax-table . nil)))

                    ((string-equal kind "Comment")
                     (put-text-property start stop 'face 'font-lock-comment-face))

                    ((string-equal kind "Identifier")
                     (let* ((sym (alist-get 'symbol token))
                            (kind (alist-get 'kind sym)))
                       (cond ((string-equal kind "inclusion directive")
                              (put-text-property start stop 'face 'font-lock-preprocessor-face))
                             ((string-equal kind "StructDecl")
                              (put-text-property start stop 'face 'font-lock-type-face))
                             ((string-equal kind "UsingDeclaration")
                              ;; ???
                              (put-text-property start stop 'face 'font-lock-type-face)
                              )
                             ((string-equal kind "VarDecl")
                              (put-text-property start stop 'face 'font-lock-variable-name-face))
                             ((string-equal kind "FieldDecl")
                              (put-text-property start stop 'face 'font-lock-variable-name-face))
                             ((string-equal kind "ParmDecl")
                              (put-text-property start stop 'face 'font-lock-variable-name-face))
                             ((string-equal kind "FunctionDecl")
                              (put-text-property start stop 'face 'font-lock-function-name-face))
                             ((string-equal kind "FunctionTemplate")
                              (put-text-property start stop 'face 'font-lock-function-name-face))
                             ((string-equal kind "TemplateTypeParameter")
                              (put-text-property start stop 'face 'font-lock-type-face))
                             ((string-equal kind "NonTypeTemplateParameter")
                              (put-text-property start stop 'face 'font-lock-variable-name-face))
                             ((string-equal kind "TemplateRef")
                              (put-text-property start stop 'face 'font-lock-type-face))

                             ((string-equal kind "MemberRefExpr")
                              ;; Usage of a member reference
                              )

                             ((string-equal kind "MemberRef")
                              ;; Usage of a member reference
                              )

                             ((string-equal kind "DeclRefExpr")
                              ;; Variable/function usage
                              ;; (put-text-property start stop 'face 'font-lock-variable-name-face)
                              )
                             ((string-equal kind "OverloadedDeclRef")
                              ;; Variable/function usage
                              ;; (put-text-property start stop 'face 'font-lock-variable-name-face)
                              )
                             ((string-equal kind "ClassTemplate")
                              ;; Class template declaration
                              (put-text-property start stop 'face 'font-lock-type-face)
                              )
                             ((string-equal kind "CXXConstructor")
                              (put-text-property start stop 'face 'font-lock-function-name-face)
                              )
                             ((string-equal kind "TypeRef")
                              (put-text-property start stop 'face 'font-lock-type-face))
                             ((string-equal kind "NamespaceRef")
                              (put-text-property start stop 'face 'font-lock-builtin-face))

                             ((string-equal kind "Namespace")
                              (put-text-property start stop 'face 'font-lock-builtin-face))

                             ((string-equal kind "macro expansion")
                              (put-text-property start stop 'face 'font-lock-warning-face))

                             ((string-equal kind "macro definition")
                              (put-text-property start stop 'face 'font-lock-variable-name-face))

                             ((null kind)
                              ;; Not sure what this is, but it's X and Y in Test(X,Y)
                              ;; (put-text-property start stop 'face 'font-lock-variable-name-face))
                              )


                             (t
                              (show kind)
                              (show token)))))

                    ((string-equal kind "Literal")
                     ;; Stuff to handle strings..., numbers, etc.
                     (let* ((sym (alist-get 'symbol token))
                            (spelling (alist-get 'spelling token))
                            (type (alist-get 'type sym)))

                       (cond ((string-match "^\"" spelling)
                              (put-text-property start stop 'face 'font-lock-string-face))
                             ((string-match "\\(L\\|u8\\|u\\|U\\)?R[\"\"]\\([^(]*\\)("
                                            (alist-get 'spelling token))
                              (put-text-property start stop 'face 'font-lock-doc-face))
                             ((string-match "^'" spelling)
                              (put-text-property start stop 'face 'font-lock-string-face))

                             ((string-match "^[0-9'xa-f]+$" spelling)
                              ;; Numeric literals: no face applied for now
                              )

                             (t
                              (show type)
                              (show spelling)
                              (show token))))

                     (when (and nil
                                (string-match "\\(L\\|u8\\|u\\|U\\)?R[\"\"]\\([^(]*\\)("
                                              (alist-get 'spelling token)))
                       (let* ((full (match-string 0 (alist-get 'spelling token)))
                              (delimiter (match-string 2 (alist-get 'spelling token)))
                              (qualifier (match-string 1 (alist-get 'spelling token)))
                              ;; TODO: use (rtags-goto-offset) which handles the 1+ and multibyte
                              (beg-beg (1+ (car token))) ; Is this the right thing?  I think rtags is off by 1 from
                              (beg-end (+ beg-beg (length full)))
                              (end-len (+ 2 (length delimiter)))
                              (end-end (+ beg-beg (alist-get 'length token)))
                              (end-beg (- end-end end-len)))
                         (remove-text-properties
                          (car token)
                          (+ (car token) (alist-get 'length token))
                          '(syntax-table nil))
                         (put-text-property beg-beg (+ beg-beg (length qualifier) 1)
                                            'syntax-table
                                            (string-to-syntax "'"))
                         (put-text-property (+ beg-beg (length qualifier) 1)
                                            (+ beg-beg (length qualifier) 2)
                                            'syntax-table
                                            ;; (string-to-syntax "\"")
                                            (string-to-syntax "|"))
                         (put-text-property (1- end-end)
                                            end-end
                                            'syntax-table
                                            (string-to-syntax "|"))
                         ;; Make " inside the raw string, not string quotes
                         (goto-char beg-end)
                         (while (search-forward-regexp "[\"\n\"]" end-beg t)
                           ;; (show (buffer-substring-no-properties (1- (point)) (point)))
                           (put-text-property (1- (point)) (point)
                                              'syntax-table
                                              (string-to-syntax " "))))))

                    (t (show kind)))))

          ;; Return the bounds of what was actually fontified.
          (cons 'jit-lock-bounds (cons beg end)))))))))
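The long cond over symbol kinds in the Identifier branch above could also be expressed as a lookup table, which keeps every face choice in one place. This is a sketch, not part of the original code: the names my-rtags-symbol-kind-faces and my-rtags-face-for-token are hypothetical, and the token shape (an alist whose symbol entry is itself an alist with a kind string such as "VarDecl") is assumed from the way the code above reads it.

```elisp
;; Sketch: table-driven faces for rtags Identifier tokens.
;; Hypothetical names; token shape assumed from the fontification code above.
(defvar my-rtags-symbol-kind-faces
  '(("inclusion directive"      . font-lock-preprocessor-face)
    ("StructDecl"               . font-lock-type-face)
    ("UsingDeclaration"         . font-lock-type-face)
    ("VarDecl"                  . font-lock-variable-name-face)
    ("FieldDecl"                . font-lock-variable-name-face)
    ("ParmDecl"                 . font-lock-variable-name-face)
    ("FunctionDecl"             . font-lock-function-name-face)
    ("FunctionTemplate"         . font-lock-function-name-face)
    ("TemplateTypeParameter"    . font-lock-type-face)
    ("NonTypeTemplateParameter" . font-lock-variable-name-face)
    ("TemplateRef"              . font-lock-type-face)
    ("ClassTemplate"            . font-lock-type-face)
    ("CXXConstructor"           . font-lock-function-name-face)
    ("TypeRef"                  . font-lock-type-face)
    ("NamespaceRef"             . font-lock-builtin-face)
    ("Namespace"                . font-lock-builtin-face)
    ("macro expansion"          . font-lock-warning-face)
    ("macro definition"         . font-lock-variable-name-face))
  "Alist mapping rtags symbol kinds to the faces applied to them.
Kinds not listed here (e.g. \"DeclRefExpr\") are left unfontified.")

(defun my-rtags-face-for-token (token)
  "Return the face for identifier TOKEN, or nil if its kind is unmapped.
TOKEN is assumed to be an alist whose `symbol' entry is itself an
alist carrying a `kind' string."
  (let* ((sym (alist-get 'symbol token))
         (kind (alist-get 'kind sym)))
    ;; `assoc' compares keys with `equal', so string keys work here.
    (and (stringp kind)
         (cdr (assoc kind my-rtags-symbol-kind-faces)))))
```

In the Identifier branch, the whole cond would then reduce to (let ((face (my-rtags-face-for-token token))) (when face (put-text-property start stop 'face face))). One difference from the original: unmapped kinds are skipped silently rather than passed to show, so that debugging output is lost.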


@gvol
Contributor

gvol commented May 27, 2016

Having tried it a bit, I don't think I would use it, and it would take a bit of work to get around the requirement that the file be saved and indexed. If we just fall back to cc-mode whenever someone edits the file, then there's not much point. I really wouldn't want my font-lock to "flicker". I also found https://github.com/terranpro/clang-faces today, which likely gives a better start than what I have. It probably makes more sense to improve rdm so that it can produce the output that color_coded and clang-faces consume, or to change those packages to use rtags-formatted output.

@ludwigpacifici

ludwigpacifici commented May 30, 2016

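Hello, I found this discussion via reddit (https://www.reddit.com/r/emacs/comments/4l66pf/my_first_minormode_modern_c_fontlock_for_emacs/d3l5uyg). I recently started Font Lock for "Modern C++" (https://github.com/ludwigpacifici/modern-cpp-font-lock).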

I discovered rtags this weekend and read this thread; however, I am not sure I understood everything, so feel free to correct me if I am mistaken.

As a developer, I expect the font locking of an editor to be simple and quick: I expect it to work out of the box (no extra third-party dependency, such as Clang in this context), and I don't want to wait 1 second (or more) to see a keyword highlighted. I also expect it to work wherever the source file is (on a local or remote machine). These are my main specifications; do they seem reasonable?

If the font lock relies on the Clang AST (http://clang.llvm.org/docs/IntroductionToTheClangAST.html), most (maybe all) of my previous specifications are not satisfied:

  1. "work out of the box": I need to install Clang. On some computers I will not have the rights to install it, or it cannot be installed (old OS, missing dependencies, etc.).
  2. "don't want to wait 1 second": In my previous job, I had to deal with "big" files (more than 10k lines of code). I was using flycheck and it was lagging. That was acceptable because it is not essential for reading or writing code, but I guess font lock based on the Clang AST would rely on the same mechanism. Maybe rtags provides a quick way to get the AST?
  3. "wherever the source file is": this is an open question; will I have syntax highlighting if Clang is not installed on the remote machine?

Anyway, I had a look at the Clang AST to get a better idea, and I noticed a limitation (or maybe I am missing some flags on the command line): it provides processed data. Consider this snippet: int main () { int foo = 0b11; }. If you run clang++ -Xclang -ast-dump -fsyntax-only test.cc, you will see IntegerLiteral 0x10387b388 <col:25> 'int' 3. The AST keeps only the value 3, not the spelling 0b11, so in this example you cannot font lock integer literals (http://en.cppreference.com/w/cpp/language/integer_literal) from it. Also, how do you build and update this AST while the user is writing code? And how do you deal with the noise from included headers (they do not help to font lock the current file)?

Maybe a better solution is to write a C++ lexer in elisp and ship it with your package on Melpa. However, parsing C++ is not an easy task.
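To illustrate the lexer-level alternative: a regular expression sees a literal as it is spelled (0b11), which is exactly the information the AST dump discards. This is a minimal sketch assuming the C++14 integer literal rules (binary, octal, decimal and hexadecimal forms, ' digit separators, u/l/ll suffixes); the name my-cpp-integer-literal-regexp is hypothetical, and the code is not taken from modern-cpp-font-lock or cc-mode.

```elisp
(require 'rx)

;; Hypothetical name.  Matches a whole C++14 integer literal, as spelled.
(defconst my-cpp-integer-literal-regexp
  (rx symbol-start
      (or
       ;; Binary: 0b11, 0B1'0
       (seq "0" (any "bB") (any "01") (zero-or-more (opt "'") (any "01")))
       ;; Hexadecimal: 0x1F, 0xdead'beef
       (seq "0" (any "xX") hex-digit (zero-or-more (opt "'") hex-digit))
       ;; Decimal: no leading zero, e.g. 1'000'000
       (seq (any "1-9") (zero-or-more (opt "'") digit))
       ;; Octal, including a plain 0
       (seq "0" (zero-or-more (opt "'") (any "0-7"))))
      ;; Optional suffix: u, l, ll, ul, llu, ... (same case for ll/LL)
      (opt (or (seq (any "uU") (opt (or "ll" "LL" (any "lL"))))
               (seq (or "ll" "LL" (any "lL")) (opt (any "uU")))))
      symbol-end)
  "Regexp matching a C++14 integer literal exactly as spelled.")
```

It could be registered with font-lock-add-keywords for c++-mode, pairing the regexp with a face such as font-lock-constant-face. The regexp only addresses matching; how the mode's syntax table treats the ' separator is a separate concern it does not change.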

I started Font Lock for "Modern C++" with these specifications in mind. It aims to font lock only the C++ language itself (which is well defined, so there is no need to build an AST). The downside is that user-defined functions, types, etc. are not recognized.

This is where rtags can provide a very good complementary font lock: I guess all user-defined elements (namespaces, functions, variables, etc.) are known to rtags, so it would make sense to font lock them. It would also give the user good feedback (my function is highlighted, so I can do a lookup!).

If I disable only rtags, I will see the highlighting of the C++ language. If I disable only modern-c++-font-lock, I will see the highlighting of the words I can look up. With both enabled, the code is fully highlighted.

Does the user want less font lock from rtags? Then a setting could limit it to highlighting the word under the cursor when a lookup can be performed.

I could help you provide font locking for symbols recognized by rtags.

Please tell me what you think about this approach.

@Andersbakken
Owner

I think this approach is pretty sensible.

Would you be interested in adding some support for using rtags when
available in your package? I have info about the ranges of integer literals
etc. too, so I think everything should be possible if the delay can be hidden
by multiple modes of operation.

You can currently query info about the source file in question
asynchronously using (rtags-tokens &optional FROM TO CALLBACK).

If there's any API or piece of information you'd need I'd definitely be
willing to add it.

Anders

@ludwigpacifici

I was thinking of adding font lock for symbols that can be looked up via rtags.

For example: int foo() {/*...*/}

  1. int would be highlighted by c++-mode or modern-c++-font-lock; it makes no sense to look up int, since it is a C++ language keyword.
  2. foo would be highlighted by rtags, because it makes sense to look up this symbol.

Does this seem logical?
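The split in the int foo() example can be expressed as a predicate over rtags tokens. This is a sketch under assumptions: my-rtags-should-fontify-p is a hypothetical name, and the token alists (a kind string such as "Keyword" or "Identifier", plus a symbol alist with its own kind) are assumed to look like the ones read by the fontification code earlier in this thread.

```elisp
;; Hypothetical helper: should rtags (rather than c++-mode or
;; modern-c++-font-lock) fontify TOKEN?  Token shape is assumed.
(defun my-rtags-should-fontify-p (token)
  "Return non-nil if TOKEN names a symbol that rtags can look up."
  (let ((sym (alist-get 'symbol token)))
    (and (equal (alist-get 'kind token) "Identifier")
         ;; Keywords such as `int' have token kind "Keyword" and fail the
         ;; test above.  Identifiers without a symbol kind (not resolved
         ;; by rtags) are also left to the regex-based mode.
         (stringp (alist-get 'kind sym)))))
```

A fontification callback would skip every token for which this returns nil, so keywords and unresolved names keep their c++-mode or modern-c++-font-lock faces, and only symbols that can actually be looked up get rtags faces.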

@Andersbakken
Owner

I will close this since I think all the machinery that RTags needs to provide is already there.

@ludwigpacifici Sorry about the long delay. I think that approach is sensible.
