Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

atddiff: provide output without line/column info for more stable results #377

Closed
mjambon opened this issue Oct 24, 2023 · 0 comments · Fixed by #382
Closed

atddiff: provide output without line/column info for more stable results #377

mjambon opened this issue Oct 24, 2023 · 0 comments · Fixed by #382
Assignees
Labels
feature request Big and small feature requests

Comments

@mjambon
Copy link
Collaborator

mjambon commented Oct 24, 2023

Problem

  • We compare the current version of an ATD file with past versions (using git difftool -x atddiff).
  • Each time something is reported, reviewed, but ignored, it continues to be reported by atddiff.

We want to ignore findings that we already reviewed. Our approach is to keep the atddiff output in git. Each new commit on the ATD file produces a report that diffed against the previous version of the report. We want this diff to be meaningful so as to show only the new findings or findings that disappeared.

Here's an example of a diff on the atddiff output:

 Incompatibility in both directions:
 File "semgrep_output_v1.atd", line 1071, characters 21-35
-File "semgrep_output_v1.atd", line 993, characters 21-28:
+File "semgrep_output_v1.atd", line 957, characters 21-28:
 Incompatible kinds of types: option is now a string.
 The following types are affected:
   project_metadata

The - and + lines exist only because some code was inserted and shifted many of the definitions in the file without changing their contents.

Proposed solution

  1. Support a --no-locations option that suppresses the output of the variable parts (e.g. line 1071, characters 21-35).
  2. If possible (and reasonably easy), output a hash identifying each finding based on the structure of the types being compared.

Producing a stable hash for each pair of types being compared may be a lot of work. An approximate solution may be to identify findings only based on the error message (which may be sufficiently unique in practice) e.g. hash the following text:

 Incompatible kinds of types: option is now a string.
 The following types are affected:
   project_metadata

When multiple findings end up having the same error message and the same identifier, we can report the number of such findings. It is only for diff purposes. For investigating the finding, the user would request the full output with locations anyway.

@mjambon mjambon self-assigned this Oct 24, 2023
@mjambon mjambon added the feature request Big and small feature requests label Oct 24, 2023
mjambon added a commit to mjambon/opam-repository that referenced this issue Oct 27, 2023
CHANGES:

* atddiff: Breaking changes in the JSON output format of atddiff (ahrefs/atd#382)
* atddiff: Fix `atddiff --version` output (ahrefs/atd#379)
* atddiff: New experimental option `--no-locations` aimed at
           producing more stable results that allow diffing successive
           atddiff reports to spot new findings and ignore old ones (ahrefs/atd#377)
nberth pushed a commit to nberth/opam-repository that referenced this issue Jun 18, 2024
CHANGES:

* atddiff: Breaking changes in the JSON output format of atddiff (ahrefs/atd#382)
* atddiff: Fix `atddiff --version` output (ahrefs/atd#379)
* atddiff: New experimental option `--no-locations` aimed at
           producing more stable results that allow diffing successive
           atddiff reports to spot new findings and ignore old ones (ahrefs/atd#377)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request Big and small feature requests
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant