Add support for report generator plugins that process modified files as a batch #300

barrywhart · 2022-11-26T22:01:06Z

Fixes #299


barrywhart · 2022-11-26T22:07:21Z

README.rst

@@ -64,7 +64,8 @@ To install the development version:

    git clone https://github.com/Bachmann1234/diff-cover.git
    cd diff-cover
-    python setup.py install
+    poetry install
+    poetry shell


Unrelated to the rest of the PR -- I noticed that the instructions mention setup.py, which no longer exists.

Bachmann1234 · 2022-11-28T04:56:06Z

I'll look at this closely this week. Sorry I haven't gotten to it yet.

barrywhart · 2022-11-28T18:05:19Z

No problem! It's a roadmap item for us, not urgent. Hope you had a great holiday (if you're in the U.S.).


Bachmann1234 · 2022-11-29T01:11:18Z

diff_cover/report_generator.py

            self._diff_violations_dict = {
                src_path: DiffViolations(
-                    self._violations.violations(src_path),
+                    violations.get(src_path, [])
+                    if violations is not None


So the violations reporter is supposed to cache these results so that each file is only processed once.

The function you are calling violations_batch could also be described as warm_cache.

Does that sound right to you? The reason I want to rename this is that I want to hint to the implementor that they should save the results, so we don't have to remember the state later (basically, remove this if check).

Basically:

I'm wondering whether you would go with renaming violations_batch to warm_cache, then removing the "if violations is not None" check and just calling violations() as is (under the assumption that the implementor has saved the results, so the lookup is quick).

My assumption in checking "if violations is not None" is that violations will be None if violations_batch() is not implemented. In that case, wouldn't we need to call violations()?

Bigger picture: For SQLFluff, violations_batch() is not a warm cache. Rather, SQLFluff is a fairly slow-running tool that is often used to process hundreds or even thousands of files. If diff-quality calls SQLFluff one file at a time, there is no opportunity to use parallel processing. Adding a function like violations_batch() provides more opportunity for the implementor to use parallel processing or other clever strategies to improve performance. It can definitely be used for caching as well.

Does this change your thoughts on the above?
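To make the batching argument concrete, here is a minimal sketch of a reporter whose violations_batch() spreads per-file work across a pool. The class, the lint_file() helper, and the Violation stand-in are hypothetical, not diff-cover's or SQLFluff's actual classes. The only things carried over from this PR are that violations_batch() receives the changed source paths and returns a dict mapping each path to its violations (the caller does violations.get(src_path, [])). A thread pool is used here for simplicity; a CPU-bound linter would more likely use a process pool or its own parallel mode.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Violation:
    # Hypothetical stand-in for one reported problem: a line number and a message.
    line: int
    message: str


class ParallelLintReporter:
    """Hypothetical reporter; lint_file() stands in for a slow per-file linter run."""

    def __init__(self, max_workers: int = 4) -> None:
        self.max_workers = max_workers

    def lint_file(self, src_path: str) -> List[Violation]:
        # Placeholder for the expensive work (e.g. invoking a linter on one file).
        return [Violation(1, f"example issue in {src_path}")]

    def violations(self, src_path: str) -> List[Violation]:
        # Per-file entry point: each call handles exactly one file.
        return self.lint_file(src_path)

    def violations_batch(self, src_paths: Sequence[str]) -> Dict[str, List[Violation]]:
        # Batch entry point: all changed files arrive at once, so the work can be
        # spread across a pool instead of running strictly one file after another.
        with ThreadPoolExecutor(max_workers=self.max_workers) as pool:
            results = list(pool.map(self.lint_file, src_paths))
        # Map each path to its violations; the caller looks paths up in this dict.
        return dict(zip(src_paths, results))
```

Because violations_batch() sees the whole list of changed files up front, the implementor can choose any strategy (a pool, a tool's built-in multi-process mode, or a cache); the pool above is just one option.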

It may be helpful (in terms of code clarity) to remove the line violations = None and change the subsequent code to:

    try:
        violations = self._violations.violations_batch(src_paths_changed)
    except NotImplementedError:
        violations = {src_path: self._violations.violations(src_path)}
    self._diff_violations_dict = {
        src_path: DiffViolations(
            violations.get(src_path, []),
            ...

This ensures that violations is always defined and not None.

To be clear, I'm okay renaming the function if you like, but I wanted to explain my thinking, since it's not always going to be used for caching.

I like your updated code block.

Essentially, I was trying to think of a way where, once we do the check for violations_batch, we don't have to think about it anymore.

So the idea behind "warming the cache" was that violations_batch would precompute the violations, and violations itself would then just become a lookup rather than a full computation.

But the block you posted at the end achieves the same goal (I think): violations would come either from violations_batch or from the dict comprehension.

I just want to make sure we only have to check whether this function exists once, and then process as normal from there.
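The "warm the cache" reading described above can be sketched as follows: violations_batch() computes and stores every result, and violations() afterwards is a dictionary lookup. This is a minimal illustration under assumptions, not diff-cover code; CachingReporter, _compute(), and the compute_calls counter are hypothetical names, and violations are plain strings to keep it short.

```python
from typing import Dict, List, Sequence


class CachingReporter:
    """Hypothetical reporter illustrating the 'warm the cache' interpretation."""

    def __init__(self) -> None:
        self._cache: Dict[str, List[str]] = {}
        self.compute_calls = 0  # counts expensive computations, for illustration only

    def _compute(self, src_path: str) -> List[str]:
        # Placeholder for the expensive per-file analysis.
        self.compute_calls += 1
        return [f"issue in {src_path}"]

    def violations_batch(self, src_paths: Sequence[str]) -> Dict[str, List[str]]:
        # Precompute everything up front and remember it.
        for path in src_paths:
            if path not in self._cache:
                self._cache[path] = self._compute(path)
        return {path: self._cache[path] for path in src_paths}

    def violations(self, src_path: str) -> List[str]:
        # After warming, this is a lookup; unseen paths are still computed once.
        if src_path not in self._cache:
            self._cache[src_path] = self._compute(src_path)
        return self._cache[src_path]
```

As barrywhart points out, caching is only one use of the batch hook; the same method can also be where a slow tool gets to parallelize.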

My proposed code above doesn't work "as is" because it uses src_path outside the loop. I'll try to do something similar. I was trying to avoid looping over the list of files twice, but maybe that's the best option.
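One way to fix the proposal above is to build the fallback dict with a comprehension over src_paths_changed, accepting the second loop. The sketch below is standalone and hypothetical: collect_violations() and the two reporter classes are illustrative names, and the real report generator wraps each entry in DiffViolations together with measured lines and changed lines rather than returning plain lists.

```python
from typing import Dict, List, Sequence


class PerFileOnlyReporter:
    # Hypothetical reporter that does not support batching.
    def violations_batch(self, src_paths: Sequence[str]) -> Dict[str, List[str]]:
        raise NotImplementedError

    def violations(self, src_path: str) -> List[str]:
        return [f"issue in {src_path}"]


def collect_violations(reporter, src_paths_changed: Sequence[str]) -> Dict[str, List[str]]:
    try:
        # Preferred path: let the reporter process all changed files at once.
        violations = reporter.violations_batch(src_paths_changed)
    except NotImplementedError:
        # Fallback: build the same mapping one file at a time (first loop).
        violations = {
            src_path: reporter.violations(src_path)
            for src_path in src_paths_changed
        }
    # Past this point it no longer matters which branch ran (second loop).
    return {
        src_path: violations.get(src_path, [])
        for src_path in src_paths_changed
    }
```

With this shape, the existence check happens exactly once, and a batch result that omits a path simply yields an empty list for it.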


barrywhart · 2022-11-29T22:26:04Z

diff_cover/report_generator.py

@@ -172,15 +172,29 @@ def _diff_violations(self):

        To make this efficient, we cache and reuse the result.
        """
+        src_paths_changed = self._diff.src_paths_changed()


I suggest reviewing this file with "Hide whitespace" enabled in GitHub.

barrywhart · 2022-11-29T22:29:23Z

diff_cover/report_generator.py

+                for src_path in src_paths_changed
+                }
+            except NotImplementedError:
+                self._diff_violations_dict = {


This code is essentially identical to main (the only difference is that we're looping over src_paths_changed).

barrywhart · 2022-11-29T22:31:45Z

diff_cover/report_generator.py

-                    self._violations.measured_lines(src_path),
-                    self._diff.lines_changed(src_path),
+            try:
+                violations = self._violations.violations_batch(


This code is very similar to the except block, but it uses results from violations_batch() rather than calling violations().

It duplicates some code from the except block, but overall the duplication seems to make the code more readable by keeping the two implementations (using violations_batch() versus violations()) distinct from one another.

barrywhart · 2022-11-29T22:32:24Z

Ready for another review, I think.

Bachmann1234 · 2022-11-30T03:04:41Z

pyproject.toml

@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "diff_cover"
-version = "7.1.0"
+version = "7.1.1"


Let me handle this. I just do this when I bump versions.

This is a non-breaking feature, so I would probably make this release 7.2.0 anyway.
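The version reasoning matches the usual MAJOR.MINOR.PATCH convention of Semantic Versioning: a backward-compatible feature bumps MINOR and resets PATCH, whereas the 7.1.1 edit in this PR would have been a PATCH bump. Assuming the project follows that convention (the thread does not state it explicitly), the rule can be sketched as below; bump() is a hypothetical helper, not part of diff-cover or Poetry.

```python
def bump(version: str, part: str) -> str:
    # Split a MAJOR.MINOR.PATCH string into integers.
    major, minor, patch = (int(x) for x in version.split("."))
    if part == "major":  # breaking change
        return f"{major + 1}.0.0"
    if part == "minor":  # new, backward-compatible feature
        return f"{major}.{minor + 1}.0"
    if part == "patch":  # backward-compatible fix
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")
```

For example, bump("7.1.0", "minor") gives "7.2.0", the version that was released for this PR.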

Bachmann1234 · 2022-11-30T03:06:37Z

OK, I think this can work. If you update from main (and remove the change to pyproject.toml; I'll do that when I cut the release), I'll go ahead and merge this.

Bachmann1234 · 2022-11-30T03:08:37Z

Wait, I'm an admin, what am I doing? I'll fix the conflict.

Bachmann1234 · 2022-11-30T03:13:01Z

@barrywhart I realized I should probably get a final "this is ready" from you before I merge and release. I tend to be a bit trigger-happy with the release button. So just leave me a note, and I'll get this out ASAP (likely the evening I see it).


barrywhart · 2022-11-30T12:10:20Z

I removed a few lines of commented-out code. This is ready to merge once the build passes.

Thanks for the review! SQLFluff users will appreciate the speedier performance once we update it to take advantage of this. 🙏

Bachmann1234 · 2022-12-01T03:52:43Z

https://pypi.org/project/diff-cover/7.2.0/ is released!

barrywhart · 2022-12-01T12:01:55Z

That's great -- thank you so much!

Add support for report generator plugins that process modified files as a batch (a8609e6)

barrywhart mentioned this pull request Nov 26, 2022 in sqlfluff/sqlfluff#4094, "diff-qualify to take advantage of multiple processes" (closed)

barrywhart commented Nov 26, 2022

barrywhart marked this pull request as draft November 26, 2022 22:26

Add a test that uses violations_batch() (9319713)

barrywhart marked this pull request as ready for review November 26, 2022 22:48

Bachmann1234 reviewed Nov 29, 2022: diff_cover/report_generator.py (outdated, resolved)

Bachmann1234 reviewed Nov 29, 2022

barrywhart commented Nov 29, 2022: diff_cover/report_generator.py (outdated, resolved)

barrywhart added 2 commits November 29, 2022 10:06

Update diff_cover/report_generator.py (0d28aa7)

PR review (de88a39)

barrywhart commented Nov 29, 2022

Tweak to reduce diffs from current code (3bd6f29)

barrywhart commented Nov 29, 2022

Bachmann1234 reviewed Nov 30, 2022

Merge branch 'main' into bhart-issue_299_batch_violations_support (286e3cf)

barrywhart commented Nov 30, 2022: tests/test_report_generator.py (outdated, resolved)

barrywhart commented Nov 30, 2022: tests/test_report_generator.py (outdated, resolved)

Apply suggestions from code review (ab65f85)

Remove commented-out code in test

Bachmann1234 merged commit 66fa119 into Bachmann1234:main Dec 1, 2022
