Added Sexp.flatter and use that to optimize non-ruby analysis. #164
Conversation
Specifically, for a fairly normal-sized javascript file, the parsed and converted sexp had a mass of 44381 and a depth of 191 (and it would get a "stack level too deep" error calculating the depth). This is because every key/value pair in the json became 2 levels of tree when converted to a sexp. Now, calling `sexp.flatter` before passing to flay converts that same tree to have a mass of 22965 and a depth of 91, all while being isomorphic to the original. This takes the time to process that file from 27 seconds to 7. And the project I was benchmarking went from timing out at 10 minutes to finishing in 3.
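The flattening idea can be sketched as follows. This is a simplified, hypothetical illustration (`TinySexp`, its collapse rule, and its `mass` method are invented for this example, not the actual `Sexp#flatter` from sexp_processor): any child whose body is a single nested node gets merged into its parent, which is exactly the two-level shape every JSON key/value pair produced.

```ruby
# Hypothetical, simplified illustration of the flattening idea described
# above -- NOT the real Sexp#flatter implementation.
class TinySexp < Array
  def initialize(*parts)
    super(parts)
  end

  # Recursively merge any child of the form [type, [inner...]] into
  # [type, inner...], removing one tree level without losing information.
  def flatter
    flat = TinySexp.new
    each do |part|
      if part.is_a?(TinySexp)
        child = part.flatter
        if child.size == 2 && child[1].is_a?(TinySexp)
          flat << TinySexp.new(child[0], *child[1])
        else
          flat << child
        end
      else
        flat << part
      end
    end
    flat
  end

  # Mass here is simply the number of sexp nodes in the subtree.
  def mass
    1 + grep(TinySexp).sum(&:mass)
  end
end
```

With a JSON-ish tree like `s(:hash, s(:pair, s(:str, "name")), s(:pair, s(:num, 1)))`, flattening collapses each pair's wrapper, cutting the mass from 5 to 3 while keeping the tree isomorphic to the original.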
Alpine doesn't have a package for this and we need to pin to a version
This looks awesome! Great optimization.
One concern I have is that if this is changing the mass of JS expressions, it could dramatically change which expressions actually get reported as issues based on the configured mass threshold. Is that accurate? If so, I think we need to tread carefully. It wouldn't be too bad if we only had to think about changing the default MASS_THRESHOLD in Javascript::Main, but the mass threshold is configurable in users' .codeclimate.yml as well, which complicates things. Remediation points also affect the grades we assign to files/repos, so those suddenly shifting is going to affect grades.
I can see that the relationship between a flattened sexp & the original doesn't look like a simple predictable multiplication or anything: it varies based on actual sexp content, right? How feasible would it be to track the "original" mass of a node in a new attr during the flattening process, and then use those original mass values for the actual filtering/remediation point calculation?
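That suggestion might look something like the sketch below. All names here (`Node`, `record_original_mass!`) are invented for illustration, not anything from this codebase: walk the tree once before flattening, stamp each node with its current mass, and have the filtering read the recorded value.

```ruby
# Hypothetical sketch of the "track original mass in an attr" idea.
# Node and record_original_mass! are invented names, not real API.
class Node < Array
  # Mass here is simply the number of sexp nodes in the subtree.
  def mass
    1 + grep(Node).sum(&:mass)
  end

  # Walk the tree once *before* flattening and remember each node's mass.
  def record_original_mass!
    @original_mass = mass
    grep(Node).each(&:record_original_mass!)
    self
  end

  # Threshold filtering and remediation-point math would read this
  # instead of the live mass, so flattening speeds up analysis without
  # shifting the reported numbers. Falls back to live mass if unrecorded.
  def original_mass
    @original_mass || mass
  end
end
```

One caveat: since flattening builds new nodes, the recorded attribute would have to be carried across to the flattened node that replaces each original.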
It would also be nice to make RuboCop happy.
@wfleming the actual structure is isomorphic... the mass is changing because flattening removes the intermediate wrapper nodes, so fewer nodes get counted. Happy to change the default if someone wants to run some stats (or come up with something arbitrary... I don't have a vested interest in a number).
@maxjacobson this is addressing #161, at least partially... on the javascript side at least, some/most of the errors were timeouts. I don't have good data for PHP (don't know how popular it is on your service)
Added git. Needed by composer and/or bundler. Fixed up variables to work with alpine-node recipe. Fixed up `cd` sections so they're in a subshell and return to original pwd afterward. Merged bundler into alpine-node recipe because it needs the compiler tooling. This also has to juggle where the workdir is in order to properly clean up after itself. As such, there's some extra WORKDIR commands and some slightly confusing subshells to go back to /usr/src/app.
@@ -10,13 +10,53 @@ COPY vendor/php-parser/composer.lock /usr/src/app/vendor/php-parser/

COPY package.json /usr/src/app/
Would it be possible to move all of the lock file and package file COPY steps below the update and install steps? Otherwise, bumping gems will invalidate all of these layers.
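The layer-caching concern behind this question can be illustrated with a generic ordering sketch (hypothetical, not this repo's actual Dockerfile): Docker caches layers top-down, so instructions that change rarely should come before files that change often.

```dockerfile
# Illustrative only -- not this repo's Dockerfile.
FROM alpine:3.4

# Rarely changes: system packages land in early, stable layers.
RUN apk add --no-cache git build-base

# Changes often: because these COPY steps come after the system
# install, bumping a gem only invalidates the layers from here down.
COPY Gemfile Gemfile.lock /usr/src/app/
RUN cd /usr/src/app && bundle install
```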
The way the Dockerfile is currently written/architected, no. Bundle is part of the big build in order to get the package deletions to work. We can if we switch to a bunch of RUN lines and use that --squash command line option... but that needs experienced eyeballs on it.
Bummer. I might take a swing at that another time then
Dockerfile (Outdated)
@@ -26,6 +66,9 @@ RUN adduser -u 9000 -D -h /usr/src/app -s /bin/false app
COPY . /usr/src/app
RUN chown -R app:app /usr/src/app

# ENV PURE_HASH=1
We should either uncomment or remove these
lib/ccflay.rb (Outdated)

class Sexp
  def flatter
    r = dup.clear
These variable names aren't very clear. Can you expand them to be more descriptive?
expect(json["other_locations"]).to eq([
  {"path" => "foo.py", "lines" => { "begin" => 2, "end" => 2} },
  {"path" => "foo.py", "lines" => { "begin" => 3, "end" => 3} },
])
expect(json["content"]["body"]).to match /This issue has a mass of 6/
expect(json["fingerprint"]).to eq("019118ceed60bf40b35aad581aae1b02")
expect(json["content"]["body"]).to match(/This issue has a mass of 5/)
What caused these masses to change?
expect(json["content"]["body"]).to match /This issue has a mass of 6/
expect(json["fingerprint"]).to eq("019118ceed60bf40b35aad581aae1b02")
expect(json["content"]["body"]).to match(/This issue has a mass of 5/)
expect(json["fingerprint"]).to eq("8941b71bb75571fca80cca37a3d23dc1")
Same question for these fingerprints - we need to be especially careful about changing fingerprints
Fingerprints include mass, so I believe this is a result of the masses changing. I tried to cover what I think is the customer impact of that a bit here, though I was more preoccupied with the mass thresholds changing what's considered an issue: #164 (review).
The trouble is that which nodes have a mass of 40+ will change. Instances of duplication that would be reported as issues by CC today would no longer be reported as issues, effectively invalidating existing customer configurations. I constructed a concrete example JS file, inline below, demonstrating this with default settings. Similar to Gordon's comment about needing to be careful about changing fingerprints, we need to be careful about changing the actual results users will receive.

function a() {
  var a = 0, b = 1;
  console.log("foo");
  if (1 + 1 - 5) {
    console.log("foo");
    console.log("foo");
  }
}

function b() {
  var a = 0, b = 1;
  console.log("foo");
  if (1 + 1 - 5) {
    console.log("foo");
    console.log("foo");
  }
}
This fixes fingerprints and mass threshold issues, but lies about mass... *shrug* Seems OK.
This allows ruby to be processed as-is, and everything else to be flattened as necessary (I have no data on python, for example, but I'm flattening it for now because it is coming through the json bridge).
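The dispatch being described might look something like this minimal sketch; `prepare_sexp` is an invented name, not the PR's actual code:

```ruby
# Hypothetical sketch: only non-Ruby sexps (which arrive via the JSON
# bridge with the extra key/value tree levels) are flattened before
# being handed to flay. Ruby sexps are passed through untouched.
def prepare_sexp(sexp, language)
  language == "ruby" ? sexp : sexp.flatter
end
```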