Syntax highlighted diffs #3101

niik · 2017-10-19T16:48:06Z

Wouldn't it be nice to have some syntax highlighting in your diffs?

The Problem

Syntax highlighting is a well-understood problem with tons of options. Atom uses TextMate grammars to do theirs but since we're already using CodeMirror I took a stab at implementing ours using that.

Syntax highlighted diffs have long been a much appreciated feature of GitHub.com, and one that I have missed in GitHub Desktop. Highlighting in diffs adds some complexity over highlighting a normal source file, though. Pretty much all languages are contextual, in that what happened on some line "higher up" affects what's going on further down. As such, you can't just pull a line out of a diff and expect it to be highlighted properly. Here's a good example

Had we just tried to highlight individual lines here we wouldn't have been able to infer that the first line was part of a multi-line comment.

Instead, we have to take the contents of the file before the change, and the contents of it after and run highlighting on both versions. Once that's done we can stitch these together to form one syntax highlighted diff.

The Approach

When we are about to perform highlighting on a diff we start out by scanning through the diff to figure out which lines we need from which file. Context lines can be pulled from either version while added/removed lines obviously need to come from a particular version. If we find that a file consists entirely of additions or entirely of deletions we can optimize further by adding a preference for one of the versions and thus getting away with loading just one file.
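A rough sketch of that scanning step, assuming a simplified diff line model. The names `DiffLine`, `LineFilters` and `getLineFilters` are hypothetical and only illustrate the idea; the preference rule is one possible reading of the optimization described above.

```typescript
// Hypothetical, simplified model of a parsed diff line (not the app's real types).
type DiffLineType = 'context' | 'add' | 'delete'

interface DiffLine {
  readonly type: DiffLineType
  /** 1-based line number in the old version, null for added lines. */
  readonly oldLineNumber: number | null
  /** 1-based line number in the new version, null for deleted lines. */
  readonly newLineNumber: number | null
}

interface LineFilters {
  /** Zero-based lines we need tokens for from the old version. */
  readonly oldLines: number[]
  /** Zero-based lines we need tokens for from the new version. */
  readonly newLines: number[]
}

function getLineFilters(lines: ReadonlyArray<DiffLine>): LineFilters {
  const hasAdds = lines.some(l => l.type === 'add')
  const hasDeletes = lines.some(l => l.type === 'delete')

  // Context lines exist in both versions. If the diff only removes lines we
  // take context from the old version so the new one never has to be loaded;
  // otherwise we prefer the new version.
  const contextFromOld = hasDeletes && !hasAdds

  const oldLines: number[] = []
  const newLines: number[] = []

  for (const line of lines) {
    if (line.type === 'delete') {
      if (line.oldLineNumber !== null) oldLines.push(line.oldLineNumber - 1)
    } else if (line.type === 'add') {
      if (line.newLineNumber !== null) newLines.push(line.newLineNumber - 1)
    } else if (contextFromOld) {
      if (line.oldLineNumber !== null) oldLines.push(line.oldLineNumber - 1)
    } else {
      if (line.newLineNumber !== null) newLines.push(line.newLineNumber - 1)
    }
  }

  return { oldLines, newLines }
}
```

An empty `oldLines` or `newLines` array is the signal that the corresponding version of the file doesn't need to be loaded at all.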

Once we've got that settled we load the first 256kb from both versions (256kb picked arbitrarily because I figured it should cover the majority of source files while adding a very manageable memory overhead for the feature). We then pass this content, along with the lines we want tokens for, to one or two web workers which then run the modes. CodeMirror modes are synchronous, but running them in a web worker means we can get on with other things while we're tokenizing up to half a megabyte of content in up to two different threads (threads in JavaScript, what have we come to). It also gives us nice containment of the highlighting process, and we can terminate it should it for some reason end up taking a very long time to complete.
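To illustrate the containment point, here is a minimal sketch of running a request against a worker with a timeout. `WorkerLike` is a hypothetical structural stand-in for the small part of the web worker API the sketch needs, and `highlightWithTimeout` is an illustrative name, not the function in this PR.

```typescript
// Hypothetical stand-in for the subset of the Worker API used below.
interface WorkerLike<Req, Res> {
  onmessage: ((ev: { data: Res }) => void) | null
  postMessage(request: Req): void
  terminate(): void
}

function highlightWithTimeout<Req, Res>(
  worker: WorkerLike<Req, Res>,
  request: Req,
  timeoutMs: number
): Promise<Res> {
  return new Promise<Res>((resolve, reject) => {
    // If the worker takes too long we kill it outright. Since the mode runs
    // synchronously inside the worker, terminating is the only way to stop it.
    const timer = setTimeout(() => {
      worker.terminate()
      reject(new Error('Highlighting worker timed out'))
    }, timeoutMs)

    worker.onmessage = ev => {
      clearTimeout(timer)
      resolve(ev.data)
    }

    worker.postMessage(request)
  })
}
```

A caller that gets a rejection here can simply keep showing the plain diff, since highlighting is applied after the fact anyway.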

When we get the results from the workers we apply our own custom CodeMirror mode which takes the tokens from the language modes and applies them inside of our diff. That means that there's a small window in between when users see the diff and when highlighting gets applied. In my testing it's barely noticeable and it means we can deliver what really matters (the diff) as quickly as we've done before.
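The stitching can be pictured roughly like this. The token shapes below are assumptions modeled on the `IToken` and `ILineTokens` names in this PR, and `getTokensForDiffLine` and `lookup` are hypothetical helpers, not the actual diff mode.

```typescript
// Assumed shapes: tokens keyed by zero-based line, then by start column.
interface IToken {
  readonly length: number
  readonly token: string
}
type ILineTokens = { readonly [startColumn: number]: IToken }
type ITokens = { readonly [line: number]: ILineTokens }

// Hypothetical, simplified diff line model.
interface DiffLine {
  readonly type: 'context' | 'add' | 'delete'
  readonly oldLineNumber: number | null
  readonly newLineNumber: number | null
}

function lookup(tokens: ITokens, lineNumber: number | null): ILineTokens | null {
  if (lineNumber === null) return null
  const lineTokens = tokens[lineNumber - 1]
  return lineTokens === undefined ? null : lineTokens
}

function getTokensForDiffLine(
  line: DiffLine,
  oldTokens: ITokens,
  newTokens: ITokens
): ILineTokens | null {
  // Deleted lines only exist in the old version, added lines only in the new.
  if (line.type === 'delete') return lookup(oldTokens, line.oldLineNumber)
  if (line.type === 'add') return lookup(newTokens, line.newLineNumber)

  // Context lines exist in both; use whichever version we have tokens for.
  return (
    lookup(newTokens, line.newLineNumber) ??
    lookup(oldTokens, line.oldLineNumber)
  )
}
```

A `null` result just means the line is rendered without highlighting, which is also what users see in the window before the workers report back.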

The Performance

So far I've been blown away by how well this performs; even on my 2015 12" MacBook it's real snappy. We do a couple of optimizations behind the scenes, and we keep web workers alive once we've fired them up to avoid paying the penalty of parsing the JavaScript of all the modes again.
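Keeping workers around boils down to a small pool. The sketch below is generic so it can run anywhere; `WorkerPool`, its `maxIdle` limit and the factory callback are assumptions for illustration, loosely following the `highlightWorkers.shift() || new Worker(...)` pattern in worker.ts.

```typescript
// A tiny pool of reusable workers. W would be Worker in the renderer; it's
// generic here so the sketch doesn't depend on the DOM.
class WorkerPool<W> {
  private readonly idle: W[] = []

  public constructor(
    private readonly create: () => W,
    private readonly maxIdle: number
  ) {}

  /** Reuse an idle worker if one exists, otherwise spin up a new one. */
  public acquire(): W {
    return this.idle.shift() ?? this.create()
  }

  /**
   * Hand a worker back once it has finished. Returns false when the pool is
   * already full, in which case the caller should terminate the worker.
   */
  public release(worker: W): boolean {
    if (this.idle.length < this.maxIdle) {
      this.idle.push(worker)
      return true
    }
    return false
  }
}
```

A worker that was terminated because it timed out must not be released back into the pool.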

Languages currently supported

We can obviously support every mode that CodeMirror supports but for now I've opted to include these until we get a sense of how well this is working.

JavaScript, JSON, TypeScript, HTML, Markdown, YAML, XML, Objective-C, Scala, Java, C, C++, sh/bash, Go, Perl, PHP, Python, Ruby.

Feel free to clone https://github.com/niik/highlighter-tests to test some of the modes

Future improvements

TextMate grammars seem to have a ton of traction. We might want to look at using them instead of CodeMirror modes in the future. The problem with them is that they use RegExes that aren't 100% compatible with JavaScript's RegEx engine so we'd have to bring in another tool for that. Atom uses the oniguruma library which is a C++ library. With our current implementation using web workers that's not an option for us since we can't use native modules inside of a web worker safely. There's a neat package that uses the JavaScript implementation of oniguruma but the author does call out that there are incompatibilities between the two. The most likely thing for us to do is run highlighting inside of a renderer instead.
If we decide to stick with the CodeMirror modes and we add a bunch of other languages to the point where spinning up a worker becomes an issue we could consider loading modes on-demand in the worker.
With the performance having improved so much since I initially went down the web worker route it's quite possible that we could move highlighting back into the renderer and manage concurrency by chunking up our tokenization work into animation frames.
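For the last idea, chunking work across animation frames could look something like the sketch below. Nothing like this exists in the PR; `tokenizeInChunks` and the injectable `schedule` callback are hypothetical. In a renderer the scheduler would presumably be `cb => requestAnimationFrame(cb)`.

```typescript
type Schedule = (callback: () => void) => void

// Tokenizes lines a chunk at a time, yielding back to the scheduler between
// chunks so that the UI thread never blocks for long.
function tokenizeInChunks<T>(
  lines: ReadonlyArray<string>,
  tokenizeLine: (line: string, index: number) => T,
  chunkSize: number,
  schedule: Schedule
): Promise<T[]> {
  // Guard against a zero or negative chunk size, which would never finish.
  const size = Math.max(1, chunkSize)

  return new Promise<T[]>((resolve, reject) => {
    const results: T[] = []
    let next = 0

    const runChunk = () => {
      try {
        const end = Math.min(next + size, lines.length)
        for (; next < end; next++) {
          results.push(tokenizeLine(lines[next], next))
        }
      } catch (e) {
        reject(e)
        return
      }

      if (next < lines.length) {
        schedule(runChunk)
      } else {
        resolve(results)
      }
    }

    schedule(runChunk)
  })
}
```

Since modes are contextual, `tokenizeLine` would need to close over the mode state and be called strictly in order, which this loop guarantees.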

Caveats

I've spotted a fairly common bug in the TypeScript mode which can cause the rest of the file after a particular combination of tokens to be classified as a string. This has been fixed in CodeMirror/master but not yet released.
There's a very small race condition window in between when we get the diff from Git and when we read the file contents of a modified file in the working directory where the contents of the file could have changed. The results would be pretty benign with syntax being offset or wrong. We could add some logic to ensure that lines match up after we've tokenized them. I did that in the beginning but removed it due to the overhead it added in the token model. We could also generate diffs with infinite context and manually collapse them in code. That's a much riskier approach from a memory conservation standpoint but it would mean that we could support expanding context dynamically like dotcom does (though there are other ways to achieve that as well).
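The "ensure that lines match up" check mentioned in the second caveat could be as simple as the sketch below. This is not the logic that was removed from the PR; `linesMatchDiff` and `ExpectedLine` are hypothetical names.

```typescript
// A line of text from the diff and where we expect to find it in the file.
interface ExpectedLine {
  /** 1-based line number in the file version that was loaded. */
  readonly lineNumber: number
  /** The line's text as it appears in the diff, without the +/- marker. */
  readonly text: string
}

// Returns true only if every diff line is found verbatim at its expected
// position in the file contents we tokenized. If not, the file changed after
// the diff was generated and the tokens should be discarded.
function linesMatchDiff(
  fileContents: string,
  expected: ReadonlyArray<ExpectedLine>
): boolean {
  const fileLines = fileContents.split(/\r?\n/)
  return expected.every(e => fileLines[e.lineNumber - 1] === e.text)
}
```

This compares raw text only, so it would run outside the token model, at the cost of splitting the file contents a second time.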

Fixes #1312

niik · 2017-10-30T14:09:41Z

Is there anything I can do to help out here @shiftkey?

app/src/lib/highlighter/types.ts

+ * information contained within the ILineTokens interface.
+ */
+export interface IToken {
+  length: number


app/src/lib/git/show.ts

@@ -39,3 +40,24 @@ export async function getBlobContents(

  return Buffer.from(blobContents.stdout, 'binary')
 }
+
+export async function getPartialBlobContents(


app/src/lib/highlighter/worker.ts

+  // exists.
+  const worker =
+    highlightWorkers.shift() ||
+    new Worker(`file:///${__dirname}/highlighter.js`)


niik · 2017-10-30T21:36:19Z

📼

docs/technical/syntax-highlighting.md

+
+### I want to add my favorite language
+
+Cool! As long as it's a language that [CodeMirror supports out of the box](https://codemirror.net/mode/index.html) we should be able to make it work. Open an issue and we'll take it from there. It would be swell if you could also submit a PR with a sample file for the language to [niik/highlighter-tests](https://github.com/niik/highlighter-tests) (we'll find a better spot for this in the future).


docs/technical/syntax-highlighting.md

+
+When we are about to perform highlighting on a diff we start out by scanning through the diff to figure out which lines we need from which file. Context lines can be pulled from either version while added/removed lines obviously need to come from a particular version. If we find that a file consists entirely of additions or entirely of deletions we can optimize further by adding a preference for one of the versions and thus getting away with loading just one file.
+
+Once we've got that settled we load the first 256kb from both versions (256kb picked arbitrarily because I figured it should cover the majority of source files while adding a very manageable memory overhead for the feature). We then pass this content, along with which lines we want to get tokens for to one or two web workers which then run the modes. CodeMirror modes are synchronous but running them in a web worker means we can get on with other things while we're tokenizing up to half a megabyte of content in up to two different threads (threads in javascript, what have we come to). It also means that we have a real nice containment of the highlighting process and that we can terminate it should it for some reason end up taking a very long time to complete.


app/src/lib/highlighter/worker.ts

+    timeout = window.setTimeout(() => {
+      worker.terminate()
+      log.error('Highlighting worker timed out')
+      reject(resolve({}))


app/src/lib/highlighter/types.ts

+   * stream to count columns. See CodeMirror's StringStream
+   * class for more details.
+   */
+  tabSize: number


app/src/lib/git/spawn.ts

-      }
-    })
-  })
+          if (exitCodes.has(code) || signal) {


app/src/ui/diff/index.tsx

+      0,
+      MaxHighlightContentLength - 1
+    )
+  } else if (file instanceof CommittedFileChange) {


niik · 2017-10-31T01:28:05Z

🍸

shiftkey · 2017-10-31T04:06:11Z

xt0rted · 2017-11-07T06:50:51Z

The release notes have #1312 as the issue/pr for the feature instead of this or #2550

shiftkey · 2017-11-07T08:27:54Z

@xt0rted yeah, this was generated by what-the-changelog - I think what happened here was that it chose the only linked issue (which was #1312) as we didn't link to the original issue.

nyssance · 2017-11-08T03:54:36Z

Need a preference to open/close the syntax, it's in a daze.

j-f1 · 2017-11-08T10:51:29Z

@nypisces Can you open a new issue and give us a few more details about your feature request there?

niik added 30 commits October 13, 2017 17:18

Be a little bit more conservative

7fdb158

Hella early syntax highlighting

ef900d9

Preserve commit sha for committed files

39e7cdd

Parse committed files as well

50251c3

🎨 cleanup

472ecd0

🎨 cleanup

3e1060b

Well, that didn't help

1cfb3d6

Don't typecheck all files, just the referenced ones

233b6b3

See https://stackoverflow.com/questions/43234525/awesome-typescript-loader-load-compile-only-refrenced-files

Don't auto-include all types

ce712a5

See microsoft/TypeScript#11917 (comment)

First attempt at a web worker highlighter

baeb942

Add some more modes, lookup by extension

796b476

Move execution logging into GitPerf for reuse

75187f3

🎨 Cleanup

5d066b7

Use GitPerf.measure in spawnAndComplete

dc7c3c1

throw errors and don't auto close in worker

84f6a23

First attempt at using worker for tokenization

f7e9184

Add some css highlighting

0d3ec80

Fix lookahead

3fb80cc

Fix broken markdown mode

5cfba2f

Keep workers around to serve more requests

907bcec

whoops, need these as well

9f253ba

🔥 logging

fc05520

Bail out when the world changes from underneath us

01767eb

Don't need this any more 🎉

6ae8152

Time out long running highlight threads

9b3b790

perf logging

b388000

Merge branch 'master' into shd

2bb0a0c

Bump timeout

340ceaa

Support yaml highlighting

fd26b8a

Happy path for empty content and don't expose worker

e38f118

niik and others added 2 commits October 26, 2017 13:53

You're off to a real bad start eslint

f660381

Degeneralize the ESLint-disabling comment in highlighter/globals.d.ts

5dd6993

Also, move the disable comment in lib/globals.d.ts to the top of the file

j-f1 mentioned this pull request Oct 26, 2017

Degeneralize the ESLint-disabling comment in highlighter/globals.d.ts #3166

Merged

Merge pull request #3166 from j-f1/na-na-i-can't-hear-you

706bead

Degeneralize the ESLint-disabling comment in highlighter/globals.d.ts

joshaber reviewed Oct 30, 2017

app/src/lib/highlighter/types.ts Outdated

joshaber reviewed Oct 30, 2017

app/src/lib/git/show.ts Outdated

joshaber reviewed Oct 30, 2017

app/src/lib/highlighter/worker.ts Outdated

niik added 4 commits October 30, 2017 22:21

Merge branch 'master' into shd

26f7cdc

📖

636644d

These never have to change

4222c64

Safe uri to the worker

d4aa27d

shiftkey suggested changes Oct 31, 2017

niik added 3 commits October 31, 2017 02:20

trololol

e267607

readonly

e8a6cf6

Move this repo into the desktop org

e826814

shiftkey approved these changes Oct 31, 2017

shiftkey merged commit 453aa73 into master Oct 31, 2017

shiftkey deleted the shd branch October 31, 2017 04:09

shiftkey mentioned this pull request Oct 31, 2017

Feature request: syntax highlighting for diffs #2550

Closed

shiftkey mentioned this pull request Nov 7, 2017

use the PR associated with the feature, not the minor issue #3248

Closed

niik mentioned this pull request May 23, 2018

Dynamic importing of CodeMirror modes at runtime #4764

Merged

niik mentioned this pull request Jan 24, 2019

Text diff rendering improvements #6707

Merged
