Skip to content
This repository was archived by the owner on Jul 19, 2025. It is now read-only.

Integrate PHP parser server #256

Merged
merged 1 commit into from
Oct 20, 2017
Merged

Conversation

toddmazierski
Copy link
Contributor

@toddmazierski toddmazierski commented Oct 18, 2017

Replace all the existing PHP parsing code with an integration of the PHP parser server.

There are some major differences between the old and new parsers worth noting:

  1. The AST is now much deeper overall. For example, an else branch from a unit test case has gone from 4 to 8 levels of depth. As a result, the masses have increased across the board, invalidating all fingerprint values and necessitating a change to the DEFAULT_MASS_THRESHOLD to keep the number of issues roughly consistent.

  2. The AST now includes comments nodes, which need to be filtered out to avoid having them contribute to analysis.

  3. The AST now has the correct end position for functions (the closing curly brace).

  4. The new name for use statement nodes is Stmt_Use.

Task: codeclimate/app#5898.

TODO

  • I've been doing a lot of testing with the Symfony project to calibrate the DEFAULT_MASS_THRESHOLD and find bugs. While this is being reviewed, I'm going to find and test with a few more.

@codeclimate-hermes codeclimate-hermes requested review from wilson and removed request for codeclimate-hermes October 18, 2017 18:37
@toddmazierski toddmazierski requested review from codeclimate-hermes and removed request for wilson October 18, 2017 22:02
@codeclimate-hermes codeclimate-hermes requested review from chrishulton and removed request for codeclimate-hermes October 18, 2017 22:02
@toddmazierski
Copy link
Contributor Author

It looks like getting the threshold right is a little more art than science. I collected data from a handful of popular PHP respositories here for a few different mass thresholds in this spreadsheet. I'm thinking a value of 75 is the sweet spot, but please let me know if you have any thoughts.

screen shot 2017-10-18 at 6 21 17 pm

@wfleming wfleming requested review from dblandin and removed request for wfleming October 19, 2017 14:19
@wfleming
Copy link
Contributor

I'm asking Devon to review instead of me if he has time: he helped break ground on switching these languages to the parser, and I don't think I have time to effectively review this today.

].freeze
POINTS_PER_OVERAGE = 100_000
REQUEST_PATH = "/php"
COMMENT_MATCHER = Sexp::Matcher.parse("(_ (comments ___) ___)")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is is possible to add this to DEFAULT_FILTERS instead of deleting the nodes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, it's not. That's the first thing I tried! Adding this to DEFAULT_FILTERS causes the entire function to be omitted from analysis.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, what about (comments ___) as a filter? Would that match just the comment instead of the entire function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't, please see the test output below.

I discussed this with @wfleming, too, and he said that while this is not entirely desirable behavior, this is how we originally intended filtering to work.


Failures:

  1) CC::Engine::Analyzers::Php::Main#run comments ignores PHPDoc comments
     Failure/Error: expect(issues.length).to be > 0
       expected: > 0
            got:   0
     # ./spec/cc/engine/analyzers/php/main_spec.rb:232:in `block (4 levels) in <top (required)>'
     # ./spec/spec_helper.rb:27:in `block (4 levels) in <top (required)>'
     # ./spec/spec_helper.rb:26:in `chdir'
     # ./spec/spec_helper.rb:26:in `block (3 levels) in <top (required)>'
     # ./spec/spec_helper.rb:23:in `block (2 levels) in <top (required)>'

  2) CC::Engine::Analyzers::Php::Main#run comments ignores one-line comments
     Failure/Error: expect(issues.length).to be > 0
       expected: > 0
            got:   0
     # ./spec/cc/engine/analyzers/php/main_spec.rb:266:in `block (4 levels) in <top (required)>'
     # ./spec/spec_helper.rb:27:in `block (4 levels) in <top (required)>'
     # ./spec/spec_helper.rb:26:in `chdir'
     # ./spec/spec_helper.rb:26:in `block (3 levels) in <top (required)>'
     # ./spec/spec_helper.rb:23:in `block (2 levels) in <top (required)>'

Copy link
Contributor

@wfleming wfleming Oct 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while this is not entirely desirable behavior, this is how we originally intended filtering to work

Well, it's how the author of the filtering intended it to work, clearly. I'm not sure it's how "we" as in the product team intended it to work. I honestly don't know if if was what we intended or it was miscommunicated requirements & not appropriately QAed.

})
expect(json["remediation_points"]).to eq(900_000)
expect(json["remediation_points"]).to eq(2_200_000)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we adjust our remediation point multiplier as well so that the same duplication issues are given a similar remediation estimate?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I didn't know there was a multiplier to adjust. I'll take a look at that. 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see here for response!

@toddmazierski
Copy link
Contributor Author

@dblandin, I've adjusted the POINTS_PER_OVERAGE from 100_000 to 40_000, again, trying to strike a balance between a couple projects. It's extremely rough, but we're closer than with the previous value. The values within the test case are also much closer.

Symfony

Cronopio

260,400,000 total points
152 total issues
1,713,157 points per issue

This branch (40,000 points per overage)

318,800,000 total points (+58,400,000)
210 total issues (+58)
1,518,095 points per issue (-195,062)

Codeigniter

Cronopio

565,200,000 total points
184 total issues
3,071,739 points per issue

This branch (40,000 points per overage)

594,360,000 total points (-29,160,000)
148 total issues (-36)
4,015,945 points per issue (+944,206)

Replace all the existing PHP parsing code with an integration of the PHP
parser server.

There are some major differences between the old and new parsers worth
noting:

1. The AST is now much deeper overall. For example, an `else` branch
   from a unit test case has gone from 4 to 8 levels of depth. As a
   result, the masses have increased across the board, invalidating all
   `fingerprint` values and necessitating a change to the
   `DEFAULT_MASS_THRESHOLD` to keep the number of issues roughly
   consistent.

2. The AST now includes `comments` nodes, which need to be filtered out
   to avoid having them contribute to analysis.

3. The AST now has the correct `end` position for functions (the closing
   curly brace).

4. The new name for `use` statement nodes is `Stmt_Use`.

Task: codeclimate/app#5898.

Adjust mass threshold to 75 (for now)

Points
Copy link
Contributor

@dblandin dblandin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

@toddmazierski toddmazierski merged commit 95a6d4e into channel/cronopio Oct 20, 2017
@toddmazierski toddmazierski deleted the todd/use-php-parser branch October 20, 2017 13:52
@toddmazierski
Copy link
Contributor Author

Hi! I've created a new tab on the spreadsheet I've been using for analysis to expand upon the POINTS_PER_OVERAGE determination started in this earlier comment. With a few new projects added, it seems like 35_000 points (from 40_000 in our first pass) may be a better choice. Please let me know what you think!

@dblandin
Copy link
Contributor

35_000 sounds good to me 👍

toddmazierski added a commit that referenced this pull request Dec 5, 2017
Includes a `POINTS_PER_OVERAGE` adjustment to 35K which unblocks this
change (please see #256).

Reverts the following commits from #259 (when we reverted the
integration):

* Use `SexpLines` for PHP parser (ef0b926)
* Revert "Integrate PHP parser server" (89e795a)

Original PHP parser server integration commit: 95a6d4e.
toddmazierski added a commit that referenced this pull request Dec 5, 2017
Includes a `POINTS_PER_OVERAGE` adjustment to 35K which unblocks this
change (please see #256).

Reverts the following commits from #259 (when we reverted the
integration):

* Use `SexpLines` for PHP parser (ef0b926)
* Revert "Integrate PHP parser server" (89e795a)

Original PHP parser server integration commit: 95a6d4e.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants