Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement a count-offenses command #340

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from
Draft

Conversation

davidstosik
Copy link
Contributor

@davidstosik davidstosik commented Mar 29, 2023

What are you trying to accomplish?

As we are planning to track the evolution of Packwerk over time, and to measure the impact of our work, having a Packwerk command that provides a count for offenses per package would be very useful. This PR implements it.

What approach did you choose and why?

We had the choice to count offenses in the YAML file, or to find them all in the code and count that. Each option has its pros and cons:

  • counting the values in package_todo.yml files is fast
  • but files referenced in package_todo.yml get deduplicated, so the count is less accurate

We picked the second option, even though it is slower, in order to get a more accurate count and to be able to observe when the number of references grows or shrinks within a single file.

What should reviewers focus on?

I needed to expose the package_todos attribute of the OffenseCollection class, in order to count offenses. I imagine I could instead define the method OffenseCollection#count_package_todo so that I don't have to expose package_todo. Is there a preferred approach?
(Note that I also renamed the @package_todo instance variable to make it plural, as it makes more sense.)

Type of Change

  • Bugfix
  • New feature
  • Non-breaking change (a change that doesn't alter functionality - i.e., code refactor, configs, etc.)

Checklist

  • I have updated the documentation accordingly.
  • I have added tests to cover my changes.
  • It is safe to rollback this change.

@davidstosik davidstosik marked this pull request as ready for review March 29, 2023 09:09
@davidstosik davidstosik requested a review from a team as a code owner March 29, 2023 09:09
@davidstosik davidstosik force-pushed the sto/count-violations branch 2 times, most recently from 16e7f04 to 116876d Compare March 29, 2023 09:17
@@ -25,7 +25,6 @@ def initialize(name:, config: nil)
@name = name
@config = T.let(config || {}, T::Hash[String, T.untyped])
@dependencies = T.let(Array(@config["dependencies"]).freeze, T::Array[String])
@public_path = T.let(nil, T.nilable(String))
Copy link
Contributor Author

@davidstosik davidstosik Mar 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is never used.

@@ -57,6 +57,8 @@ def execute_command(args)
output_result(parse_run(args).check)
when "update-todo", "update"
output_result(parse_run(args).update_todo)
when "count-offenses"

This comment was marked as resolved.

USAGE.md Outdated

You can generate a count of all offenses by running:

bin/packwerk count-offenses
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use code blocks like

bin/packwerk count-offenses

similar to other examples in the docs

It would also be good to add an example of the output.

)
result = parse_run.count_offenses

assert_equal "components/source,1\n", result.message
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be useful to test a setup with multiple packages (3?).

@@ -20,7 +20,6 @@
* [Understanding how to respond to new violations](#understanding-how-to-respond-to-new-violations)
* [Recording existing violations](#recording-existing-violations)
* [Understanding the package todo file](#understanding-the-package-todo-file)
* [Understanding the list of deprecated references](#understanding-the-list-of-deprecated-references)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This points to a non-existent entry in the doc.

- Return an `Array` of `Offense`s instead of an `OffenseCollection`
- Take in a proc instead of `show_errors` boolean to make behaviour configurable
Copy link
Member

@gmcgibbon gmcgibbon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking great! Just one comment about process_file creation.

T.let(proc do |relative_file|
run_context.process_file(relative_file: relative_file)
end, ProcessFileProc)
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move this to a private method? This is slightly repetitive, but I think it is justified with how many times this gets called.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean something like this?

def process_file_proc(&block)
  if block
    T.let(proc do |relative_file|
      run_context.process_file(relative_file: relative_file).tap(&block)
    end, ProcessFileProc)
  else
    T.let(proc do |relative_file|
      run_context.process_file(relative_file: relative_file)
    end, ProcessFileProc)
  end
end

# Then I can call it where it's used:
# ...
      @progress_formatter.started_inspection(@relative_file_set) do
        all_offenses = if @configuration.parallel?
          Parallel.flat_map(@relative_file_set, &process_file_proc(&block))
        else
          serial_find_offenses(&process_file_proc(&block))
        end
      end
# ...
# (Or I can keep a local variable if that seems better.)

@@ -57,6 +57,8 @@ def execute_command(args)
output_result(parse_run(args).check)
when "update-todo", "update"
output_result(parse_run(args).update_todo)
when "show-offenses", "show"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if violations or offense is the right wording here. We should be more consistent with out wording internally. Since we've already gotten things like "offense formatters", this is probably fine.

Copy link
Contributor Author

@davidstosik davidstosik Mar 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it would be nice to define the gem's vocabulary more precisely. It would not only help with understanding how to use it, but also make reading/writing code easier.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Big plus for this! We call these three things:

  • todos (e.g. update-todo, package_todo.yml
  • violations
  • offenses

IMO violations and offenses are somewhat aggressive terms and TODO captures the desire to change/fix while being less aggressive and also shorter to write!

I'd be in support of changing all language across the board to todos! Some things could be changed right away (e.g. internal implementation variable names), others could probably be changed while not being considered a public API breakage (e.g. output of commands), others would need to be deprecated or have a switchover before a major version change (e.g. CLI commands, offenses formatter, the word violation in package_todo files).

What do you all think? (We could start a discussion too.)

Copy link
Contributor Author

@davidstosik davidstosik Apr 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I agree with using the word TODO for all.
To me, the word todo in Packwerk has the same meaning as in Rubocop's .rubocop_todo.yml, which is there to disable offenses "to be fixed later":

The generated file .rubocop_todo.yml contains configuration to disable cops that currently detect an offense in the code by changing the configuration for the cop, excluding the offending files, or disabling the cop altogether once a file count limit has been reached. (source)

I think we could use the same word for violations and offenses, but would not conflate that concept with todo.
Offense feels a bit more neutral to me (and I guess it also matches Rubocop's vocabulary). The word is also already used more than twice more than violation:

ag -i offense | wc -l
     799
ag -i violation | wc -l
     325

I have a hunch that there might have been the intent to give violation and offense slightly different meanings, but 1. I'm not sure I'm right and 2. this might not be necessary, or could be expressed more explicitly:

  • offense: anything that's causing an error (config error, reference that violates declared dependencies, etc)
  • violation: an offense that is not recorded as a todo (and that triggers an error on check).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interestingly, even though offense is used more often in the source code, violation (at least at Gusto) is used much more often by the client. Not sure if it's the same at Shopify, but here everyone uses the word violation, and it's probably in part because the word violation is in package_todo.yml files explicitly. Also, the output of runs uses the word violation (e.g. "no stale violations detected"), although the word offenses is used too sometimes.

I can buy the different use cases for the words. Something is an offense for example in packwerk if there's a file that can't be parsed correctly. A violation is an offense, but is also what happens when everything works correctly in packwerk, but there's a "violation" of packwerk directives (e.g. dependency usage, private API usage, etc.). TODOs are recorded violations, perhaps short for "violations TODO."

I'm not sure if all of this distinction is useful to the client using this system though, but also I haven't really heard of many folks confused about these terms besides us so perhaps it's not a problem! I'm just glad we replaced deprecated_references with todo though – I've found that has resolved a lot of language confusion on our end.

@gmcgibbon
Copy link
Member

I'm really curious how this affects the performance of check and update. Can you provide some benchmarks if you have time?

@davidstosik
Copy link
Contributor Author

davidstosik commented Mar 31, 2023

@gmcgibbon A very quick time check on the Shopify core code base does not seem to show noticeable changes (either good or bad):

spring stop && time bin/packwerk ${command}

Spring stopped.
...
📦 Finished in XXX
...
bin/packwerk ${command} ___s user ___s system __% cpu YYY total

XXX and YYY times below:

command main sto/count-violations
check 16.12s / 45.424s 15.76s / 45.336
update-todo 16.29s / 43.725s 15.83s / 42.382
show N/A 14.94s / 39.662

(show might be benefitting from not creating and populating the OffenseCollection object though.)

Should I try to properly use benchmark with the gem's fixtures?

@rafaelfranca
Copy link
Member

What is the output of this command? And where are the tests?

@davidstosik
Copy link
Contributor Author

Hello @rafaelfranca, sorry I forgot to reply to your comment. 🙇🏻‍♂️
We're taking a different route trying to use output formatters instead, but for that I've been working on a refactor of Packwerk's CLI code.

Setting this to draft for now.

@davidstosik davidstosik marked this pull request as draft April 26, 2023 07:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants