This repository was archived by the owner on Jul 19, 2025. It is now read-only.

Added Sexp.flatter and use that to optimize non-ruby analysis. #164

Closed
wants to merge 7 commits into from

Conversation

zenspider
Contributor

Specifically, for a fairly normal sized javascript file, the parsed
and converted sexp had a mass of 44381 and a depth of 191 (and it
would get a "stack level too deep" error calculating the depth). This
is because every key/value pair in the json became 2 levels of tree
when converted to a sexp.

Now, calling `sexp.flatter` before passing to flay converts that same
tree to have a mass of 22965 and a depth of 91, all while being
isomorphic to the original.

This takes the time to process that file from 27 seconds to 7. And the
project I was benchmarking went from timing out at 10 minutes to
finishing in 3.
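
For illustration, here is a rough sketch of the shape problem (the node names are made up for the example, not what the JSON bridge actually emits): every key/value pair gets its own wrapper nodes, and every wrapper sexp adds 1 to the mass and another level to the depth.

require "flay"

# Hypothetical sexp for the JSON object {"a": 1, "b": 2}: each pair carries
# two extra wrapper levels, so mass and depth grow much faster than the data.
nested = Sexp.new(:object,
                  Sexp.new(:pair, Sexp.new(:key, "a"), Sexp.new(:value, 1)),
                  Sexp.new(:pair, Sexp.new(:key, "b"), Sexp.new(:value, 2)))
nested.mass # => 7

# A flatter encoding of the same object keeps every key and value but needs
# far fewer nodes, which is the kind of saving flatter goes after.
folded = Sexp.new(:object,
                  Sexp.new(:pair, "a", 1),
                  Sexp.new(:pair, "b", 2))
folded.mass # => 3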
Alpine doesn't have a package for this and we need to pin to a version
Contributor

@wfleming wfleming left a comment

This looks awesome! Great optimization.

One concern I have is that if this is changing the mass of JS expressions, it could dramatically change which expressions actually get reported as issues based on the configured mass threshold. Is that accurate? If so, I think we need to tread carefully. It wouldn't be too bad if we only had to think about changing the default MASS_THRESHOLD in Javascript::Main, but the mass threshold is configurable in users' .codeclimate.yml as well, which complicates things. Remediation points also affect the grades we assign to files/repos, so those suddenly shifting is going to affect grades.

I can see that the relationship between a flattened sexp & the original doesn't look like a simple predictable multiplication or anything: it varies based on the actual sexp content, right? How feasible would it be to track the "original" mass of a node in a new attr during the flattening process, and then use those original mass values for the actual filtering/remediation point calculation?

It would also be nice to make RuboCop happy.
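
A rough sketch of that idea (the attribute name here is made up, not an existing API), assuming the `flatter` method this PR adds is loaded:

require "flay"

class Sexp
  attr_writer :original_mass

  # Fall back to the real mass for nodes that were never flattened.
  def original_mass
    @original_mass || mass
  end
end

# Flatten for analysis but remember the pre-flattening mass on the result,
# so thresholds and remediation points could still use the original value.
def flatten_preserving_mass(sexp)
  pre_flatten_mass = sexp.mass
  flattened = sexp.flatter   # Sexp#flatter as added by this PR
  flattened.original_mass = pre_flatten_mass
  flattened
end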

@maxjacobson
Contributor

⚠️ just want to mention, we shouldn't merge this until we fix the bug on master. Or we should revert #161, which isn't currently deployed. Some context in case you missed it: (slack1) (slack2). Not sure who's owning fixing that, but want to make sure we don't accidentally re-ship it.

@zenspider
Contributor Author

@wfleming the actual structure is isomorphic... the mass is changing because s().mass == 1 and those are getting folded. The current default threshold for javascript is 40 (way more than ruby's 16) which might be high, but I don't have enough data on real world javascript to generate meaningful statistics... It shouldn't matter too much regardless, because that threshold is just the farthest down we recurse, but all those structures are still getting analyzed via their parent nodes... so structural similarities of mass 40+ will still get reported properly... and that hasn't shifted that much given the flatter call...

Happy to change the default if someone wants to run some stats (or come up with something arbitrary... I don't have a vested interest in a number).
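
For reference, a quick illustration of that point (`s(...)` being shorthand for `Sexp.new(...)`): every sexp node, even an empty wrapper, contributes 1 to the mass, and folding a bare wrapper into its parent removes exactly that 1.

require "flay"

Sexp.new(:str).mass                   # => 1, an empty wrapper still costs mass
Sexp.new(:pair, Sexp.new(:str)).mass  # => 2, wrapper plus parent
Sexp.new(:pair, :str).mass            # => 1, the folded equivalent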

@zenspider
Contributor Author

@maxjacobson this is addressing #161, at least partially... on the javascript side at least, some/most of the errors were timeouts. I don't have good data for PHP (don't know how popular it is on your service)

Added git. Needed by composer and/or bundler.
Fixed up variables to work with alpine-node recipe.
Fixed up `cd` sections so they're in a subshell and return to original pwd afterward.
Merged bundler into alpine-node recipe because it needs the compiler tooling.

This also has to juggle where the workdir is in order to properly
clean up after itself. As such, there are some extra WORKDIR commands
and some slightly confusing subshells to go back to /usr/src/app.
@@ -10,13 +10,53 @@ COPY vendor/php-parser/composer.lock /usr/src/app/vendor/php-parser/

COPY package.json /usr/src/app/
Contributor

Would it be possible to move the copies of all of the lock files and package files below the update and install steps? Otherwise, bumping gems will invalidate all of these layers.

Contributor Author

The way the Dockerfile is currently written/architected, no. Bundle is part of the big build in order to get the package deletions to work.

We can if we switch to a bunch of RUN lines and use that --squash command line option... but that needs experienced eyeballs on it.

Contributor

Bummer. I might take a swing at that another time then

Dockerfile Outdated
@@ -26,6 +66,9 @@ RUN adduser -u 9000 -D -h /usr/src/app -s /bin/false app
COPY . /usr/src/app
RUN chown -R app:app /usr/src/app

# ENV PURE_HASH=1
Contributor

We should either uncomment or remove these

lib/ccflay.rb Outdated

class Sexp
def flatter
r = dup.clear
Contributor

These variable names aren't very clear. Can you expand them to be more descriptive?
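
For context, a rough sketch (not necessarily the PR's exact code) of what the excerpt above appears to be doing, with more descriptive names: rebuild each node, folding single-element child sexps into their parent instead of keeping the extra level.

require "sexp_processor"

class Sexp
  def flatter
    flattened = dup.clear   # empty sexp that keeps this node's metadata (line, file)
    each_with_object(flattened) do |child, result|
      if child.is_a?(Sexp)
        flattened_child = child.flatter
        if flattened_child.size == 1
          result.concat(flattened_child)   # fold bare wrappers like s(:obj) into the parent
        else
          result << flattened_child
        end
      else
        result << child                    # type symbols and literals pass through
      end
    end
  end
end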

expect(json["other_locations"]).to eq([
{"path" => "foo.py", "lines" => { "begin" => 2, "end" => 2} },
{"path" => "foo.py", "lines" => { "begin" => 3, "end" => 3} },
])
expect(json["content"]["body"]).to match /This issue has a mass of 6/
expect(json["fingerprint"]).to eq("019118ceed60bf40b35aad581aae1b02")
expect(json["content"]["body"]).to match(/This issue has a mass of 5/)
Contributor

What caused these masses to change?

expect(json["content"]["body"]).to match /This issue has a mass of 6/
expect(json["fingerprint"]).to eq("019118ceed60bf40b35aad581aae1b02")
expect(json["content"]["body"]).to match(/This issue has a mass of 5/)
expect(json["fingerprint"]).to eq("8941b71bb75571fca80cca37a3d23dc1")
Contributor

Same question for these fingerprints - we need to be especially careful about changing fingerprints

Contributor

Fingerprints include mass, so I believe this is a result of the masses changing. I tried to cover what I think is the customer impact of that a bit here, though I was more preoccupied with the mass thresholds changing what's considered an issue: #164 (review).

@wfleming
Contributor

so structural similarities of mass 40+ will still get reported properly... and that hasn't shifted that much given the flatter call...

The trouble is that which nodes have a mass of 40+ will change. Instances of duplication that would be reported as issues by CC today would no longer be reported as issues, which effectively invalidates existing customer configurations.

I constructed a concrete example JS file, inlined below, demonstrating this: with default settings on master today, the engine reports these two functions as similar with a mass of 42. Run using this branch, the engine finds no issues with the default config, but if I change the config to a lower mass threshold, it once again reports these two functions as similar with a mass of 38.

Similar to Gordon's comment about needing to be careful about changing fingerprints, we need to be careful about changing the actual results users will receive.

function a() {
  var a = 0, b = 1;
  console.log("foo");
  if (1 + 1 - 5) {
    console.log("foo");
    console.log("foo");
  }
}

function b() {
  var a = 0, b = 1;
  console.log("foo");
  if (1 + 1 - 5) {
    console.log("foo");
    console.log("foo");
  }
}

This fixes fingerprints and mass threshold issues, but lies about
mass... *shrug* Seems OK.
This allows ruby to be processed as-is, and everything else to be
flatter (as necessary, I have no data on python, for example, but I'm
flattening it for now because it is coming through the json bridge).
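
A rough sketch of that dispatch (the method name and language strings here are illustrative, not the engine's actual API):

# Ruby sexps keep their native shape; everything that arrived over the JSON
# bridge gets flattened before being handed to flay.
def prepare_for_flay(sexp, language)
  language == "ruby" ? sexp : sexp.flatter
end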
@gdiggs
Contributor

gdiggs commented Mar 21, 2017

Closing this in favor of #166 and #167

@gdiggs gdiggs closed this Mar 21, 2017