Implement a `count-offenses` command #340

davidstosik · 2023-03-29T09:07:55Z

What are you trying to accomplish?

As we are planning to track the evolution of Packwerk over time, and to measure the impact of our work, having a Packwerk command that provides a count for offenses per package would be very useful. This PR implements it.

What approach did you choose and why?

We had the choice to count offenses in the YAML file, or to find them all in the code and count that. Each option has its pros and cons:

Counting the values in package_todo.yml files is fast, but the files referenced in package_todo.yml are deduplicated, so the count is less accurate.

We picked the second option, even though it is slower, in order to get a more accurate count and to be able to observe when the number of references grows or shrinks within a single file.
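To make the trade-off concrete, here is a minimal sketch of the first option, counting entries recorded in a package_todo.yml. The sample YAML and the `count_recorded_todos` helper are hypothetical and not part of this PR; the sketch assumes the usual layout of package, then constant, then `violations` and `files` lists.

```ruby
require "yaml"

# Hypothetical todo file content for a package that references
# constants from "components/destination".
sample_todo = <<~YAML
  "components/destination":
    "::Destination::Order":
      violations:
      - dependency
      files:
      - components/source/app/models/cart.rb
      - components/source/app/models/checkout.rb
    "::Destination::Invoice":
      violations:
      - dependency
      files:
      - components/source/app/models/cart.rb
YAML

# Counts one entry per (constant, violation type, file) combination
# recorded in the todo file. Hypothetical helper, for illustration only.
def count_recorded_todos(yaml_text)
  todos = YAML.safe_load(yaml_text) || {}
  todos.sum do |_package, constants|
    constants.sum do |_constant, entry|
      Array(entry["violations"]).size * Array(entry["files"]).size
    end
  end
end

puts count_recorded_todos(sample_todo)
```

Each file is listed once per constant, so a file that references `::Destination::Order` several times still contributes a single entry. That is the inaccuracy that led us to count references found in the code instead.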

What should reviewers focus on?

I needed to expose the package_todos attribute of the OffenseCollection class in order to count offenses. I imagine I could instead define a method such as OffenseCollection#count_package_todo so that I don't have to expose package_todos. Is there a preferred approach?
(Note that I also renamed the @package_todo instance variable to make it plural, as it makes more sense.)

Type of Change

Bugfix
New feature
Non-breaking change (a change that doesn't alter functionality - i.e., code refactor, configs, etc.)

Checklist

I have updated the documentation accordingly.
I have added tests to cover my changes.
It is safe to rollback this change.

davidstosik · 2023-03-29T09:23:51Z

lib/packwerk/package.rb

@@ -25,7 +25,6 @@ def initialize(name:, config: nil)
      @name = name
      @config = T.let(config || {}, T::Hash[String, T.untyped])
      @dependencies = T.let(Array(@config["dependencies"]).freeze, T::Array[String])
-      @public_path = T.let(nil, T.nilable(String))


This is never used.

lib/packwerk/cli.rb

@@ -57,6 +57,8 @@ def execute_command(args)
        output_result(parse_run(args).check)
      when "update-todo", "update"
        output_result(parse_run(args).update_todo)
+      when "count-offenses"


gmcgibbon · 2023-03-29T22:29:14Z

USAGE.md

+
+You can generate a count of all offenses by running:
+
+    bin/packwerk count-offenses


We should use code blocks like

bin/packwerk count-offenses

similar to other examples in the docs

It would also be good to add an example of the output.

gmcgibbon · 2023-03-29T22:34:43Z

test/unit/packwerk/parse_run_test.rb

+      )
+      result = parse_run.count_offenses
+
+      assert_equal "components/source,1\n", result.message


I think it would be useful to test a setup with multiple packages (3?).
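As a rough illustration of what a multi-package case could assert, the sketch below groups offenses by package and renders one `package,count` line per package, matching the format in the assertion above. `SketchOffense` and `format_counts` are hypothetical names, not Packwerk APIs, and sorting by package name is an assumption.

```ruby
# Hypothetical stand-in for an offense; only the package name matters here.
SketchOffense = Struct.new(:package_name, :file)

# Renders one "package,count" line per package, sorted by package name.
def format_counts(offenses)
  offenses
    .group_by(&:package_name)
    .sort_by { |package_name, _group| package_name }
    .map { |package_name, group| "#{package_name},#{group.size}\n" }
    .join
end

offenses = [
  SketchOffense.new("components/source", "components/source/a.rb"),
  SketchOffense.new("components/billing", "components/billing/b.rb"),
  SketchOffense.new("components/source", "components/source/c.rb"),
  SketchOffense.new("components/timeline", "components/timeline/d.rb"),
]

puts format_counts(offenses)
```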

lib/packwerk/parse_run.rb

davidstosik · 2023-03-30T08:40:14Z

USAGE.md

@@ -20,7 +20,6 @@
  * [Understanding how to respond to new violations](#understanding-how-to-respond-to-new-violations)
 * [Recording existing violations](#recording-existing-violations)
  * [Understanding the package todo file](#understanding-the-package-todo-file)
-  * [Understanding the list of deprecated references](#understanding-the-list-of-deprecated-references)


This points to a non-existent entry in the doc.

gmcgibbon

This is looking great! Just one comment about process_file creation.

gmcgibbon · 2023-03-30T20:36:58Z

lib/packwerk/parse_run.rb

+        T.let(proc do |relative_file|
+          run_context.process_file(relative_file: relative_file)
+        end, ProcessFileProc)
+      end


Can we move this to a private method? This is slightly repetitive, but I think it is justified with how many times this gets called.

Do you mean something like this?

    def process_file_proc(&block)
      if block
        T.let(proc do |relative_file|
          run_context.process_file(relative_file: relative_file).tap(&block)
        end, ProcessFileProc)
      else
        T.let(proc do |relative_file|
          run_context.process_file(relative_file: relative_file)
        end, ProcessFileProc)
      end
    end

    # Then I can call it where it's used:

    # ...
    @progress_formatter.started_inspection(@relative_file_set) do
      all_offenses = if @configuration.parallel?
        Parallel.flat_map(@relative_file_set, &process_file_proc(&block))
      else
        serial_find_offenses(&process_file_proc(&block))
      end
    end
    # ...

    # (Or I can keep a local variable if that seems better.)
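For anyone who wants to try the pattern outside the gem, here is a self-contained sketch without Sorbet or Parallel. The `process_file` stub is a hypothetical stand-in for `run_context.process_file`, and its "bad.rb" rule exists only to produce sample data.

```ruby
# Hypothetical stand-in for RunContext#process_file: returns the offenses
# found in one file (here, any file named bad.rb yields one offense).
def process_file(relative_file:)
  relative_file.end_with?("bad.rb") ? ["offense in #{relative_file}"] : []
end

# Builds the per-file proc. When a block is given, it is called with each
# file's offenses (via tap) without changing the returned value.
def process_file_proc(&block)
  if block
    proc { |relative_file| process_file(relative_file: relative_file).tap(&block) }
  else
    proc { |relative_file| process_file(relative_file: relative_file) }
  end
end

files = ["a/good.rb", "a/bad.rb", "b/bad.rb"]
offense_counts = []

# The block plays the role of a progress callback.
callback = process_file_proc { |offenses| offense_counts << offenses.size }
all_offenses = files.flat_map(&callback)

puts all_offenses
```

Because `tap` returns its receiver, the callback can observe each file's offenses without affecting what `flat_map` collects.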

gmcgibbon · 2023-03-30T20:40:54Z

lib/packwerk/cli.rb

@@ -57,6 +57,8 @@ def execute_command(args)
        output_result(parse_run(args).check)
      when "update-todo", "update"
        output_result(parse_run(args).update_todo)
+      when "show-offenses", "show"


I wonder if violations or offenses is the right wording here. We should be more consistent with our wording internally. Since we already have things like "offense formatters", this is probably fine.

I agree it would be nice to define the gem's vocabulary more precisely. It would not only help with understanding how to use it, but also make reading/writing code easier.

Big plus for this! We call these three things:

todos (e.g. update-todo, package_todo.yml)

violations

offenses

IMO violations and offenses are somewhat aggressive terms and TODO captures the desire to change/fix while being less aggressive and also shorter to write!

I'd be in support of changing all language across the board to todos! Some things could be changed right away (e.g. internal implementation variable names), others could probably be changed while not being considered a public API breakage (e.g. output of commands), others would need to be deprecated or have a switchover before a major version change (e.g. CLI commands, offenses formatter, the word violation in package_todo files).

What do you all think? (We could start a discussion too.)

Not sure I agree with using the word TODO for all.
To me, the word todo in Packwerk has the same meaning as in Rubocop's .rubocop_todo.yml, which is there to disable offenses "to be fixed later":

The generated file .rubocop_todo.yml contains configuration to disable cops that currently detect an offense in the code by changing the configuration for the cop, excluding the offending files, or disabling the cop altogether once a file count limit has been reached. (source)

I think we could use the same word for violations and offenses, but would not conflate that concept with todo.
Offense feels a bit more neutral to me (and I guess it also matches Rubocop's vocabulary). The word is also already used more than twice as often as violation:

    ag -i offense | wc -l
    799
    ag -i violation | wc -l
    325

I have a hunch that there might have been the intent to give violation and offense slightly different meanings, but 1. I'm not sure I'm right and 2. this might not be necessary, or could be expressed more explicitly:

offense: anything that's causing an error (config error, reference that violates declared dependencies, etc)

violation: an offense that is not recorded as a todo (and that triggers an error on check).

Interestingly, even though offense is used more often in the source code, violation (at least at Gusto) is used much more often by the client. Not sure if it's the same at Shopify, but here everyone uses the word violation, and it's probably in part because the word violation is in package_todo.yml files explicitly. Also, the output of runs uses the word violation (e.g. "no stale violations detected"), although the word offenses is used too sometimes.

I can buy the different use cases for the words. Something is an offense for example in packwerk if there's a file that can't be parsed correctly. A violation is an offense, but is also what happens when everything works correctly in packwerk, but there's a "violation" of packwerk directives (e.g. dependency usage, private API usage, etc.). TODOs are recorded violations, perhaps short for "violations TODO."

I'm not sure if all of this distinction is useful to the client using this system though, but also I haven't really heard of many folks confused about these terms besides us so perhaps it's not a problem! I'm just glad we replaced deprecated_references with todo though – I've found that has resolved a lot of language confusion on our end.

gmcgibbon · 2023-03-30T20:45:53Z

I'm really curious how this affects the performance of check and update. Can you provide some benchmarks if you have time?

davidstosik · 2023-03-31T03:16:15Z

@gmcgibbon A very quick time check on the Shopify core code base does not seem to show noticeable changes (either good or bad):

spring stop && time bin/packwerk ${command}

Spring stopped.
...
📦 Finished in XXX
...
bin/packwerk ${command} ___s user ___s system __% cpu YYY total

XXX and YYY times below:

| command | `main` | `sto/count-violations` |
| --- | --- | --- |
| `check` | 16.12s / 45.424s | 15.76s / 45.336s |
| `update-todo` | 16.29s / 43.725s | 15.83s / 42.382s |
| `show` | N/A | 14.94s / 39.662s |

(show might be benefitting from not creating and populating the OffenseCollection object though.)

Should I try to properly use benchmark with the gem's fixtures?
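A more careful measurement could start from something like the sketch below, which averages wall-clock time over several runs using Ruby's standard `Benchmark` module. The `average_runtime` helper is hypothetical, and shelling out to `bin/packwerk` in the commented usage line is only an assumption about how the command would be invoked.

```ruby
require "benchmark"

# Runs the given block `runs` times and returns the mean wall-clock time
# in seconds. Hypothetical helper, for illustration only.
def average_runtime(runs: 3)
  times = Array.new(runs) { Benchmark.realtime { yield } }
  times.sum / runs
end

# Example usage (assumes it is run from an application using Packwerk):
#   puts average_runtime { system("bin/packwerk", "check", out: File::NULL) }

# Runnable placeholder workload so the sketch executes anywhere.
elapsed = average_runtime(runs: 2) { (1..10_000).sum }
puts format("%.6fs", elapsed)
```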

rafaelfranca · 2023-04-20T17:17:18Z

What is the output of this command? And where are the tests?

davidstosik · 2023-04-26T07:26:43Z

Hello @rafaelfranca, sorry I forgot to reply to your comment. 🙇🏻‍♂️
We're taking a different route trying to use output formatters instead, but for that I've been working on a refactor of Packwerk's CLI code.

Setting this to draft for now.

Remove unused instance variable from initializer

df580ea

davidstosik requested a review from gmcgibbon March 29, 2023 09:09

davidstosik marked this pull request as ready for review March 29, 2023 09:09

davidstosik requested a review from a team as a code owner March 29, 2023 09:09

davidstosik force-pushed the sto/count-violations branch 2 times, most recently from 16e7f04 to 116876d Compare March 29, 2023 09:17

davidstosik force-pushed the sto/count-violations branch from 0962193 to 1e31c70 Compare March 30, 2023 08:44

davidstosik added 2 commits March 30, 2023 08:50

Remove orphan entry in USAGE.md TOC

a93824d

Refactor ParseRun#find_offenses

d1ac0ea

- Return an `Array` of `Offense`s instead of an `OffenseCollection`
- Take in a proc instead of `show_errors` boolean to make behaviour configurable

davidstosik force-pushed the sto/count-violations branch from 1e31c70 to 2bafd67 Compare March 30, 2023 08:51

Implement packwerk show-offenses command

15cfab5

davidstosik force-pushed the sto/count-violations branch from 2bafd67 to 15cfab5 Compare March 30, 2023 08:58

davidstosik marked this pull request as draft April 26, 2023 07:26

alexevanczuk Mar 31, 2023

davidstosik Apr 2, 2023

alexevanczuk Apr 3, 2023
