Prioritize file type matching over first line matching in grammar scoring system #9513

Ingramz · 2015-11-10T19:24:36Z

This is a pull request version of #2734 (comment) in order to work together on solving the currently unfair grammar scoring system. The main problem with the current system is that it gives a higher score to HTML grammar for any file that begins with HTML doctype, but the file might be using PHP or some other (templating) language that utilizes HTML syntax (with a different file type, eg php).

@nathansobo suggested adding an option to fall back to using the old system in case the new one breaks something, but I am not sure how to do that exactly. Also another question would be whether this transition period is needed at all, because all it "breaks" right now is grammar selection for files, for which there is a known file type, but the first line is matched from some other grammar. A fix for that case would be just adding the file type via user configuration, to override the known type from the other language.

nathansobo · 2015-11-10T20:39:22Z

@Ingramz Thanks. I buy your argument that the existing override structure is enough. Let me review this...

nathansobo · 2015-11-10T20:43:35Z

@Ingramz If you look at Travis, It looks like some tests related to the change code are failing in core specs. It looks like we're asserting exactly the opposite of the new behavior, so you'll need to update that. Also, you should add a test that asserts that the path-based selection wins over the first line regexp.

nathansobo · 2015-11-10T20:45:29Z

src/grammar-registry.coffee

    bestMatch

  # Extended: Returns a {Number} representing how well the grammar matches the
  # `filePath` and `contents`.
  getGrammarScore: (grammar, filePath, contents) ->
+    return Infinity if @grammarOverrideForPath(filePath) is grammar.scopeName


This is good.

nathansobo · 2015-11-10T20:48:51Z

You had commented with a long description of this logic elsewhere. Could you transfer that into the body of this PR?

Ingramz · 2015-11-10T20:51:07Z

From the other post.

This will return Infinity as a score for the overridden grammar, which in my opinion is a little more elegant than 2 + length of file path.

The path scoring system is already good as-is. Longer match will get a higher score, plus 0.5 bias for user defined file types. I find it very clever that the user can select between two grammars by adding the file type in question to the user defined list, even if it already was defined in grammar so one of the two grammars always gets priority with the same matched longest file type.

Also there is a bias of 0.25 so that a non-bundle package would win over a bundled package, if their matches were the same length (ergo the file types matched are the same), this is the same as the code that was on lines 39/40. However one thing to note - if we get -1 as a file type score for all grammars and there is a non-bundle grammar, then that would win over all bundled grammars with a score of -0.75, it seems that this oddity exists also in original code. I think that this bias shouldn't be applied if the path score itself is -1. <- update: this has been fixed in PR

Finally a bias for the case where the first line of the file matches a file, which is currently the weakest - 0.125. This should help in case the two grammars were equal in the previous cases. Setting this with the smallest bias makes more sense to me as it should be used only as a last resort, which works really well for files which the type is unknown.

nathansobo · 2015-11-10T20:57:05Z

Cool, thanks for the explanation. @maxbrunsfeld do you think you could take a look at this too? Maybe we could screen share and just try to think through the implications of changing this and be sure I'm not missing anything.

Ingramz · 2015-11-10T21:06:28Z

The numbers come from the sequence 1/pow(2, n), for which the sum for n=1...infinity is 1, so if we keep going smaller, we hopefully never reach 1 (so let's make sure we're not summing all of the possible terms in that sequence). This way the small increments never override the file type's priority. This also applies recursively, sum for n=2..infinity is 1/2 etc, so all increments smaller than 0.5 will also stay smaller than 0.5, which helps with the priority.

nathansobo · 2015-11-10T21:18:07Z

Cool deal about the numbers. Very nice thought process. Now just I need to see tests of new functionality and a green build.

Fix activatePackages

Ingramz · 2015-11-11T07:38:26Z

@nathansobo seems like the tests are passing now.

nathansobo · 2015-11-12T23:12:25Z

Okay. I'm a bit nervous that I'm not thinking through some implication but this is why we have a beta phase. Merging. Thanks @Ingramz for the thoughtful implementation here. Awesome work.

Prioritize file type matching over first line matching in grammar scoring system

aki77 and others added 3 commits November 8, 2015 23:14

Fix activatePackages

396e8a4

Prioritize file type matching over first line matching

49f391f

Only allow bundle bias if a file type was matched

66255ba

nathansobo reviewed Nov 10, 2015
View reviewed changes

kevinsawicki and others added 6 commits November 10, 2015 13:29

⬆️ language-perl@0.31

0778860

Merge pull request #9490 from aki77/fix/activatePackages

91bece4

Fix activatePackages

Prioritize file type matching over first line matching

1239ba5

Only allow bundle bias if a file type was matched

dc4d6ad

Update spec

6520240

Merge remote-tracking branch 'origin/grammar-scores' into grammar-scores

18c2b6a

nathansobo pushed a commit that referenced this pull request Nov 12, 2015

Merge pull request #9513 from Ingramz/grammar-scores

df0671c

Prioritize file type matching over first line matching in grammar scoring system

nathansobo merged commit df0671c into atom:master Nov 12, 2015

Ingramz mentioned this pull request Apr 25, 2016

XML plists opened as Old-Style atom/language-property-list#5

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Prioritize file type matching over first line matching in grammar scoring system #9513

Prioritize file type matching over first line matching in grammar scoring system #9513

Ingramz commented Nov 10, 2015

nathansobo commented Nov 10, 2015

nathansobo commented Nov 10, 2015

nathansobo Nov 10, 2015

nathansobo commented Nov 10, 2015

Ingramz commented Nov 10, 2015

nathansobo commented Nov 10, 2015

Ingramz commented Nov 10, 2015

nathansobo commented Nov 10, 2015

Ingramz commented Nov 11, 2015

nathansobo commented Nov 12, 2015

Prioritize file type matching over first line matching in grammar scoring system #9513

Prioritize file type matching over first line matching in grammar scoring system #9513

Conversation

Ingramz commented Nov 10, 2015

nathansobo commented Nov 10, 2015

nathansobo commented Nov 10, 2015

nathansobo Nov 10, 2015

Choose a reason for hiding this comment

nathansobo commented Nov 10, 2015

Ingramz commented Nov 10, 2015

nathansobo commented Nov 10, 2015

Ingramz commented Nov 10, 2015

nathansobo commented Nov 10, 2015

Ingramz commented Nov 11, 2015

nathansobo commented Nov 12, 2015