Integrate PHP parser server #256

toddmazierski · 2017-10-18T18:37:47Z

Replace all the existing PHP parsing code with an integration of the PHP parser server.

There are some major differences between the old and new parsers worth noting:

1. The AST is now much deeper overall. For example, an `else` branch from a unit test case has gone from 4 to 8 levels of depth. As a result, the masses have increased across the board, invalidating all `fingerprint` values and necessitating a change to the `DEFAULT_MASS_THRESHOLD` to keep the number of issues roughly consistent.
2. The AST now includes `comments` nodes, which need to be filtered out to avoid having them contribute to analysis.
3. The AST now has the correct `end` position for functions (the closing curly brace).
4. The new name for `use` statement nodes is `Stmt_Use`.
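
To illustrate why a deeper AST raises mass, here is a minimal sketch using plain nested arrays. The node names, the two tree shapes, and the "one point per node" definition of mass are assumptions made for illustration; they do not reproduce the real parser output or the 4-to-8 figure quoted above.

```ruby
# Toy AST: each node is [type, *children]; anything else is a leaf value.
# Assumption: mass is simply the number of nodes in the tree.

def depth(node)
  return 0 unless node.is_a?(Array)
  1 + (node.drop(1).map { |child| depth(child) }.max || 0)
end

def mass(node)
  return 0 unless node.is_a?(Array)
  1 + node.drop(1).sum { |child| mass(child) }
end

# Two hypothetical encodings of the same `else { return 1; }` branch.
shallow = [:else, [:return, [:lit, 1]]]
deep    = [:Stmt_Else,
           [:stmts,
            [:Stmt_Return,
             [:expr,
              [:Scalar_LNumber, [:value, 1]]]]]]

puts "shallow: depth=#{depth(shallow)} mass=#{mass(shallow)}"
puts "deep:    depth=#{depth(deep)} mass=#{mass(deep)}"
```

The same source code produces a larger mass in the deeper encoding, so a threshold tuned for the old trees would flag far more code against the new ones.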

Task: codeclimate/app#5898.

TODO

I've been doing a lot of testing with the Symfony project to calibrate the DEFAULT_MASS_THRESHOLD and find bugs. While this is being reviewed, I'm going to find a few more projects to test with.

toddmazierski · 2017-10-18T22:22:10Z

It looks like getting the threshold right is a little more art than science. I collected data from a handful of popular PHP repositories for a few different mass thresholds in this spreadsheet. I'm thinking a value of 75 is the sweet spot, but please let me know if you have any thoughts.

wfleming · 2017-10-19T14:21:42Z

I'm asking Devon to review instead of me if he has time: he helped break ground on switching these languages to the parser, and I don't think I have time to effectively review this today.

dblandin · 2017-10-19T16:52:19Z

lib/cc/engine/analyzers/php/main.rb

          ].freeze
          POINTS_PER_OVERAGE = 100_000
+          REQUEST_PATH = "/php"
+          COMMENT_MATCHER = Sexp::Matcher.parse("(_ (comments ___) ___)")


Is it possible to add this to DEFAULT_FILTERS instead of deleting the nodes?

Unfortunately, it's not. That's the first thing I tried! Adding this to DEFAULT_FILTERS causes the entire function to be omitted from analysis.

Hmmm, what about (comments ___) as a filter? Would that match just the comment instead of the entire function?

It doesn't, please see the test output below.

I discussed this with @wfleming, too, and he said that while this is not entirely desirable behavior, this is how we originally intended filtering to work.

Failures:

  1) CC::Engine::Analyzers::Php::Main#run comments ignores PHPDoc comments
     Failure/Error: expect(issues.length).to be > 0
       expected: > 0
            got:   0
     # ./spec/cc/engine/analyzers/php/main_spec.rb:232:in `block (4 levels) in <top (required)>'
     # ./spec/spec_helper.rb:27:in `block (4 levels) in <top (required)>'
     # ./spec/spec_helper.rb:26:in `chdir'
     # ./spec/spec_helper.rb:26:in `block (3 levels) in <top (required)>'
     # ./spec/spec_helper.rb:23:in `block (2 levels) in <top (required)>'

  2) CC::Engine::Analyzers::Php::Main#run comments ignores one-line comments
     Failure/Error: expect(issues.length).to be > 0
       expected: > 0
            got:   0
     # ./spec/cc/engine/analyzers/php/main_spec.rb:266:in `block (4 levels) in <top (required)>'
     # ./spec/spec_helper.rb:27:in `block (4 levels) in <top (required)>'
     # ./spec/spec_helper.rb:26:in `chdir'
     # ./spec/spec_helper.rb:26:in `block (3 levels) in <top (required)>'
     # ./spec/spec_helper.rb:23:in `block (2 levels) in <top (required)>'

> while this is not entirely desirable behavior, this is how we originally intended filtering to work

Well, it's how the author of the filtering intended it to work, clearly. I'm not sure it's how "we" as in the product team intended it to work. I honestly don't know if it was what we intended or if it was miscommunicated requirements & not appropriately QAed.
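
The difference discussed in this thread can be sketched with plain arrays. This is a toy model, not the engine's `Sexp::Matcher` code: the helper names and the sample tree are hypothetical, and the filter is assumed to discard any matching node together with everything beneath it.

```ruby
# Toy nodes are arrays: [type, *children]. All helper names are hypothetical.

def comments?(node)
  node.is_a?(Array) && node.first == :comments
end

# Filter-style: discard every top-level node that has a direct `comments`
# child, loosely mirroring the pattern (_ (comments ___) ___).
def filter_out(nodes)
  nodes.reject { |node| node.drop(1).any? { |child| comments?(child) } }
end

# Delete-style: keep every node, but strip `comments` children recursively.
def strip_comments(node)
  return node unless node.is_a?(Array)
  node.reject { |child| comments?(child) }.map { |child| strip_comments(child) }
end

function = [:Stmt_Function,
            [:comments, "/** PHPDoc */"],
            [:name, "foo"],
            [:stmts, [:Stmt_Return, [:comments, "// one-liner"], [:lit, 1]]]]

p filter_out([function])   # the whole function is gone
p strip_comments(function) # the function survives without its comments
```

Under these assumptions, filtering leaves nothing to analyze for a documented function, which is consistent with the spec failures above reporting zero issues.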

dblandin · 2017-10-19T16:53:47Z

spec/cc/engine/analyzers/php/main_spec.rb

      })
-      expect(json["remediation_points"]).to eq(900_000)
+      expect(json["remediation_points"]).to eq(2_200_000)


Should we adjust our remediation point multiplier as well so that the same duplication issues are given a similar remediation estimate?

Ah, I didn't know there was a multiplier to adjust. I'll take a look at that. 👍

Please see here for response!
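
As a rough illustration of why the multiplier matters, the sketch below assumes remediation points grow linearly with the amount by which a block's mass exceeds the threshold. The formula, the `base_points` value, and the sample masses and thresholds are all assumptions; only the name `POINTS_PER_OVERAGE` and the values 100_000 and 40_000 come from this pull request.

```ruby
# Assumed model: points = base_points + (mass - threshold) * points_per_overage
def remediation_points(mass:, threshold:, base_points:, points_per_overage:)
  overage = [mass - threshold, 0].max
  base_points + overage * points_per_overage
end

BASE = 200_000 # hypothetical base value

# Hypothetical: the same duplicated code measured by the old and new parsers.
old_points  = remediation_points(mass: 30, threshold: 28, base_points: BASE, points_per_overage: 100_000)
new_default = remediation_points(mass: 90, threshold: 75, base_points: BASE, points_per_overage: 100_000)
new_tuned   = remediation_points(mass: 90, threshold: 75, base_points: BASE, points_per_overage: 40_000)

puts "old parser:           #{old_points}"
puts "new parser, 100_000:  #{new_default}"
puts "new parser, 40_000:   #{new_tuned}"
```

With larger masses the overage grows even after the threshold is raised, so a smaller multiplier is one way to keep estimates for the same duplication in a similar range.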

toddmazierski · 2017-10-19T19:21:50Z

@dblandin, I've adjusted the POINTS_PER_OVERAGE from 100_000 to 40_000, again trying to strike a balance between a couple of projects. It's extremely rough, but we're closer than with the previous value. The values within the test case are also much closer.

Symfony

Cronopio

260,400,000 total points
152 total issues
1,713,157 points per issue

This branch (40,000 points per overage)

318,800,000 total points (+58,400,000)
210 total issues (+58)
1,518,095 points per issue (-195,062)

Codeigniter

Cronopio

565,200,000 total points
184 total issues
3,071,739 points per issue

This branch (40,000 points per overage)

594,360,000 total points (+29,160,000)
148 total issues (-36)
4,015,945 points per issue (+944,206)
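
The per-issue figures and deltas can be recomputed from the totals reported in this comment. The sketch below uses only those totals; the `Run` struct and the `RESULTS` hash are illustrative names, and integer division is used because the reported averages appear to be truncated rather than rounded.

```ruby
Run = Struct.new(:points, :issues) do
  # Integer division truncates the average.
  def points_per_issue
    points / issues
  end
end

RESULTS = {
  "Symfony"     => { cronopio: Run.new(260_400_000, 152), branch: Run.new(318_800_000, 210) },
  "Codeigniter" => { cronopio: Run.new(565_200_000, 184), branch: Run.new(594_360_000, 148) },
}

RESULTS.each do |project, runs|
  before = runs[:cronopio]
  after  = runs[:branch]
  puts format(
    "%s: points %+d, issues %+d, points per issue %+d",
    project,
    after.points - before.points,
    after.issues - before.issues,
    after.points_per_issue - before.points_per_issue
  )
end
```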


dblandin

lgtm!

toddmazierski · 2017-10-24T16:55:58Z

Hi! I've created a new tab on the spreadsheet I've been using for analysis to expand upon the POINTS_PER_OVERAGE determination started in this earlier comment. With a few new projects added, it seems like 35_000 points (from 40_000 in our first pass) may be a better choice. Please let me know what you think!

dblandin · 2017-10-24T17:47:30Z

35_000 sounds good to me 👍

Includes a `POINTS_PER_OVERAGE` adjustment to 35K which unblocks this change (please see #256).

Reverts the following commits from #259 (when we reverted the integration):

* Use `SexpLines` for PHP parser (ef0b926)
* Revert "Integrate PHP parser server" (89e795a)

Original PHP parser server integration commit: 95a6d4e.

toddmazierski requested a review from codeclimate-hermes October 18, 2017 18:37

codeclimate-hermes requested review from wilson and removed request for codeclimate-hermes October 18, 2017 18:37

toddmazierski requested review from codeclimate-hermes and removed request for wilson October 18, 2017 22:02

codeclimate-hermes requested review from chrishulton and removed request for codeclimate-hermes October 18, 2017 22:02

toddmazierski requested a review from wfleming October 18, 2017 22:24

wfleming requested review from dblandin and removed request for wfleming October 19, 2017 14:19

dblandin reviewed Oct 19, 2017


toddmazierski force-pushed the todd/use-php-parser branch from 23f109a to 5332b2a Compare October 19, 2017 19:20

toddmazierski force-pushed the todd/use-php-parser branch from 5332b2a to 26e244d Compare October 19, 2017 19:22

dblandin approved these changes Oct 19, 2017


toddmazierski merged commit 95a6d4e into channel/cronopio Oct 20, 2017

toddmazierski deleted the todd/use-php-parser branch October 20, 2017 13:52

toddmazierski mentioned this pull request Dec 5, 2017

Restore PHP parser server integration #292

Merged
