Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build if-diff core feature #637

Closed
2 of 12 tasks
Tracked by #629 ...
jmcook1186 opened this issue Apr 11, 2024 · 23 comments · Fixed by #711
Closed
2 of 12 tasks
Tracked by #629 ...

Build if-diff core feature #637

jmcook1186 opened this issue Apr 11, 2024 · 23 comments · Fixed by #711
Assignees
Milestone

Comments

@jmcook1186
Copy link
Contributor

jmcook1186 commented Apr 11, 2024

Sub of: #629 -> #650

What

A command line utility that compares two manifest files and returns both a code and useful console logs for showing where two manifest files differ.

Why

As a user who needs to compare manifests files I would find it very useful to have a command line tool that can rapidly report whether two manifests are identical and if not how they differ.

Context

We can create a convenience tool for users to quickly see how manifests differ. This tool needs to be designed and specified before we start building. This ticket covers designing and documenting the tool prior to building.
The tool should be able to categorize differences, something along the lines of 'identical', tree match, context-match no-overlap or similar. The user should be able to get granular detail about where and how the manifests differ, similar to git diff.

CLI details:

  • if-diff is a script separate from ie - it is self contained and can be included in if/package.json.
  • if-diff should take two arguments: --source and --target
    • source is the "test case" file that represents the source of truth
    • target is the file that is being compared against source
  • if-diff should only be able to return either FILES MATCH or the details of the first difference it encounters.
  • if-diff should be able to receive a source from stdin as well as from a file path passed on the command line, e.g. ie --manifest source.yml | if-diff --target target.yml
  • if-diff should emit an exit value:
    • 0: files match
    • 1: files did not match
    • 2: some other error occurred during execution

Matching rules

  • all file content should be included in the matching
  • if-diff should support wildcard characters (*) that allow any value to count as a match
    • e.g. if source includes cpu-util: * then both cpu-util: 45 and cpu-util: 55 in the same position in target should be accepted as a match. Pass on any value - no need to check types etc here at the moment.

Output display format

  • If if-diff finds no in-scope differences between the source and target then it should return a success flag: FILES MATCH and exit code 0.
  • If if-diff detects an in-scope difference between the files, it should halt execution, return exit code 1 and report the error to the command line. The report should include the yaml path to the differing element in the tree, the value in the source and the value in the target, using the following schema:
Files do not match!
<yaml path to non-matching element>
source: <value in source file>
target: <value in target file>

If the difference relates to a missing node in the tree for source or target then <value in x file> should be either exists or missing and the yaml path should point to the highest level element that is missing (e.g. if an entire child component is missing, provide the path to the child component).

e.g. difference values detected:

Files do not match!
tree.children.vm1[4].cpu/utilization
source: 45
target:  43

Prerequisites

Related Issues:

SoW

  • Implement acceptance criteria
  • if-diff outputs conform to the schema defined in the rationale section above
  • Unit tests includes positive and negative test cases in if reposiroty
  • unit tests have 100% coverage and passing
  • Create PR
  • Incorporate PR feedback

Acceptance Criteria

Scenario 1: Executing if-diff

  • if-diff exists as a standalone tool that comes bundled with if
    • it is included in if/package.json
    • it is invoked using the if-diff command
    • it takes two arguments: --sourceand --target
    • it returns EITHER; the string FILES MATCH AND exit code 0 OR a difference report describing the first error it encounters while comparing the two files AND exit code 1.

Scenario 2: No change detected

  • GIVEN the user has downloaded and installed if
    WHEN they execute if-diff --source manifest1.yml --target manifest2.yml
    AND both manifest1.yml and manifest2.yml contain:
name: test
description:
tags:
initialize:
  plugins:
    sum:
      method: Sum
      path: "@grnsft/if-plugins"
      global-config:
        input-parameters: ["cpu/energy", "network/energy"]
        output-parameter: "energy"
tree:
  children:
    child:
      pipeline:
        - sum
      config:
        sum:
      inputs:
        - timestamp: 2023-08-06T00:00
          duration: 3600
          cpu/energy: 0.001
          network/energy: 0.001
      outputs:
        - timestamp: 2023-08-06T00:00
          duration: 3600
          cpu/energy: 0.001
          network/energy: 0.001
          energy: 0.002

THEN: if-diff should return FILES MATCH and exit code 0.

Scenario 3: Change detected

GIVEN the user has downloaded and installed if
WHEN they execute if-diff --source manifest1.yml --target manifest2.yml
AND manifest-1.yml contains

name: test
description:
tags:
initialize:
  plugins:
    sum:
      method: Sum
      path: "@grnsft/if-plugins"
      global-config:
        input-parameters: ["cpu/energy", "network/energy"]
        output-parameter: "energy"
tree:
  children:
    child:
      pipeline:
        - sum
      config:
        sum:
      inputs:
        - timestamp: 2023-08-06T00:00
          duration: 3600
          cpu/energy: 0.001
          network/energy: 0.001
      outputs:
        - timestamp: 2023-08-06T00:00
          duration: 3600
          cpu/energy: 0.001
          network/energy: 0.001
          energy: 0.002

but manifest-2.yml contains (different value for energy/network):

name: test
description:
tags:
initialize:
  plugins:
    sum:
      method: Sum
      path: "@grnsft/if-plugins"
      global-config:
        input-parameters: ["cpu/energy", "network/energy"]
        output-parameter: "energy"
tree:
  children:
    child:
      pipeline:
        - sum
      config:
        sum:
      inputs:
        - timestamp: 2023-08-06T00:00
          duration: 3600
          cpu/energy: 0.001
          network/energy: 0.001
      outputs:
        - timestamp: 2023-08-06T00:00
          duration: 3600
          cpu/energy: 0.001
          network/energy: 0.002
          energy: 0.003

THEN if-diff should emit exit code 1 and return

Files do not match!
tree.children.child.outputs[0]['network/energy']
source: 0.001
target: 0.002

Scenario 4: Reading output

  • if-diff can receive --source data from stdin as well as a given filepath
    • this will enable pipes, for example ie --manifest example1.yml | if-diff --target example-target.yml

GIVEN a user has downloaded and installed if
WHEN they run if-diff --source manifest-1.yml --target manifest-2.yml
WITH manifest-1.yml containing:

name: test
description:
tags:
execution:
  command: >-
    /home/joe/.npm/_npx/1bf7c3c15bf47d04/node_modules/.bin/ts-node
    /home/joe/Code/if/src/index.ts --manifest
    /home/joe/Code/if/examples/manifests/basic.yml --stdout
  environment:
    os: linux
    os-version: 5.15.0-102-generic
    node-version: 21.4.0
    date-time: '2024-05-01T08:35:57.060Z'
    dependencies:
      - '@babel/core@7.22.10'
      - '@babel/preset-typescript@7.23.3'
      - '@commitlint/cli@18.6.0'
      - '@commitlint/config-conventional@18.6.0'
      - '@grnsft/if-models@v0.1.6-beta'
      - '@grnsft/if-plugins@v0.3.2'
      - '@grnsft/if-unofficial-plugins@v0.3.1'
      - '@jest/globals@29.7.0'
      - '@types/jest@29.5.8'
      - '@types/js-yaml@4.0.9'
      - '@types/luxon@3.4.2'
      - '@types/node@20.9.0'
      - csv-stringify@6.4.6
      - fixpack@4.0.0
      - gts@5.2.0
      - husky@8.0.3
      - jest@29.7.0
      - js-yaml@4.1.0
      - luxon@3.4.4
      - rimraf@5.0.5
      - ts-command-line-args@2.5.1
      - ts-jest@29.1.1
      - typescript@5.2.2
      - winston@3.11.0
      - zod@3.22.4
initialize:
  plugins:
    sum:
      method: Sum
      path: "@grnsft/if-plugins"
      global-config:
        input-parameters: ["cpu/energy", "network/energy"]
        output-parameter: "energy"
tree:
  children:
    child:
      pipeline:
        - sum
      config:
        sum:
      inputs:
        - timestamp: 2023-08-06T00:00
          duration: 3600
          cpu/energy: 0.001
          network/energy: 0.001
      outputs:
        - timestamp: 2023-08-06T00:00
          duration: 3600
          cpu/energy: 0.001
          network/energy: 0.002
          energy: 0.003

AND manifest-2.yml contains

name: test
description:
tags:
execution:
  command: >-
    /home/joe/.npm/_npx/1bf7c3c15bf47d04/node_modules/.bin/ts-node
    /home/joe/Code/if/src/index.ts --manifest
    /home/joe/Code/if/examples/manifests/basic.yml --stdout
  environment:
    os: macOS
    os-version: darwin
    node-version: 21.3.0
    date-time: '2024-05-01T08:45:57.050Z'
    dependencies:
      - '@babel/core@7.22.10'
      - '@babel/preset-typescript@7.23.3'
      - '@commitlint/cli@18.6.0'
      - '@commitlint/config-conventional@18.6.0'
      - '@grnsft/if-models@v0.1.6-beta'
      - '@grnsft/if-plugins@v0.3.2'
      - '@grnsft/if-unofficial-plugins@v0.3.1'
      - '@jest/globals@29.7.0'
      - '@types/jest@29.5.8'
      - '@types/js-yaml@4.0.9'
      - '@types/luxon@3.4.2'
      - '@types/node@20.9.0'
      - csv-stringify@6.4.6
      - fixpack@4.0.0
      - gts@5.2.0
      - husky@8.0.3
      - jest@29.7.0
      - js-yaml@4.1.0
      - luxon@3.4.4
      - rimraf@5.0.5
      - ts-command-line-args@2.5.1
      - ts-jest@29.1.1
      - typescript@5.2.2
      - winston@3.11.0
      - zod@3.22.4
initialize:
  plugins:
    sum:
      method: Sum
      path: "@grnsft/if-plugins"
      global-config:
        input-parameters: ["cpu/energy", "network/energy"]
        output-parameter: "energy"
tree:
  children:
    child:
      pipeline:
        - sum
      config:
        sum:
      inputs:
        - timestamp: 2023-08-06T00:00
          duration: 3600
          cpu/energy: 0.001
          network/energy: 0.001
      outputs:
        - timestamp: 2023-08-06T00:00
          duration: 3600
          cpu/energy: 0.001
          network/energy: 0.002
          energy: 0.003

THEN if-diff should emit exit code 0 and return FILES MATCH.

  • if-diff conforms to the matching rules described above in rationale section
    e.g. for two manifests whose value in the posiiton tree.children.vm1.inputs[3]['cpu/energy']:
Files do not match!
tree.children.vm1.inputs[3]['cpu/energy']
source: 45
target:  43

for two manifests that have different nodes in their trees:

Files do not match!
tree.children.vm1
source: exists
target:  missing
  • if-diff can accept data from stdin instead of a file on disk
  • GIVEN a user has downloaded and installed if
    WHEN they pipe the output from ie to if-diff, for example using the following command:
    ie --manifest example1.yml --stdout| if-diff --target example-target.yml

THEN: they receive an if-diff output.

@jawache jawache changed the title Build if-diff [WIP] Build if-diff Apr 21, 2024
@jawache
Copy link
Contributor

jawache commented Apr 21, 2024

Thoughts on this tool so far

  • Order of nodes should not matter at all, we should expect that future versions of IF might output data in different order for a variety of reasons.
  • if-diff should have different match modes.
    • exact, means both files have to match exactly.
    • tree. means the tree (inputs, outputs, config etc...) should all match exactly but other nodes (like the proposed execution node could be different), we just care that the computed numbers match?
    • contains, the manifest doesn't have to match exactly but the source should contain at least what's in the target. This will allow us to test more dynamic cases.

Generally we can't really refine this properly until we start collecting the manifests we want to use for testing in #615, understanding the different manifest test files will drive the requirements of the if-diff tool.

@jmcook1186
Copy link
Contributor Author

jmcook1186 commented Apr 29, 2024

Some suggested acceptance criteria (with questions for @jawache or for discussion in spike call):

  • command line tool if-diff exists

    • if-diff does not have to share an entry point with ie - it can be developed separately and be included in if's package.json
    • it should take two arguments - paths to source and target output files, e.g. if-diff --source /home/file.yml --target /home/file2.yml
    • it should return an evaluation of the similarity between the given files.
  • if the files passed to if-diff are not output files (i.e. it is not yaml, or it does not contain the expected context, execution, tree and outputs blocks ) if-diff should error out.

    • Question should if-diff support diff'ing manifests as well as output files? In which case omit outputs block check.
  • if-diff returns a "similarity type" plus details about how files differ:

    • suggested similarity types:
      • precise match: byte-level agreement between two files, orders of nodes/fields match, equivalent to a checksum verification - the files are identical.
      • content-match: trees have identical structure and names, lengths of arrays are equal, keys and values in each array are identical, context keys and values are the same. Execution info allowed to differ. Order of elements in file allowed to differ.
      • tree-match: trees have identical names, array lengths and element keys/values. Order of tree elements is allowed to differ. Context is allowed to differ.
      • custom-match: provide specific elements to compare by passing yaml path.
      • no match: None of precise-match, content-match, tree-match are satisfied.
    • Question what information should be logged along with each similarity type?
      - full git diff style output?

@zanete
Copy link

zanete commented Apr 29, 2024

noting that after #615 is complete, @MariamKhalatova can turn attention to this and help design how this feature can be tested

@jawache
Copy link
Contributor

jawache commented Apr 30, 2024

@jmcook1186 agreed, if-diff is a separate script in the if package.json (all of our tools should be separate scripts, if-diff, if-check, if-blitz, ie (if-run))

As discussed today.

On matching

  • target is a manifest file that validates the source.
  • every node in the target file must exist in the source file with the same exact value OR if the value in the target is * then it must have a value in source but the value can be anything.
  • all nodes in the source must also exist in the target, if there is an additional node in source that does not exist in the target it should error out.
    • The exception to the above rule is the execution node since that contains runtime information, if there do exist nodes in the execution block in the target, these have to match the nodes in the source. if there are nodes in execution block in source but not in target, this can be ignored in the matching.
  • if-diff should return on the first difference, no need to be exhaustive.
  • if-diff needs to return an exit value so it works in other shell scripts (https://stackoverflow.com/questions/6971284/what-are-the-error-exit-values-for-diff) 0 means no difference, 1 means difference and >1 means something bad happened.

On cli

if-diff --source /home/file.yml --target /home/file2.yml

BUT if source is omitted it takes in source from stdin, that way it can be used in a linux pipeline like so.

ie --manifest source.yml | if-diff --target target.yml

if-check might then just be as simple as a bash file which runs this on every file in a folder.

if-env $file && npm install package.json && ie --manifest $file | if-diff $file

On display

if-diff should print out something like so on the difference it finds, perhaps something like the below but @jmcook1186 you should take a look at how other tools like diff display these kinds of differences:

On a difference

tree.children.vm1[4].cp/utilization
source: 45
target:  43

On a deleted node (in target but not in source)

tree.children.vm1[4].cp/utilization
source: ???
target:  43

On an added node (in source but not in target)

tree.children.vm1[4].cp/utilization
source: 45
target:  ???

@jmcook1186
Copy link
Contributor Author

@jawache

There is a bit of a conflict between the idea to provide a matching status (content-match, tree-match etc) and the idea to exit on the first difference if-diff encounters.

This is because this implies you need to know the matching tolerance in advance, maybe passing it as a piece of config, rather than having it being something returned by the tool.

I think scrap the idea of having matching status as an outcome and instead make it config. On the CLI you could pass --tolerance tree or --tolerance content and then that defines on which kinds of differences if-diff errors out.

Then the return types are either SUCCESS or it reports the first identified error within your stated tolerance. Here's a breakdown of how I see it working:

CLI details:

  • if-diff is a script separate from ie - it is self contained and can be included in if/package.json.
  • if-diff should take three arguments: --source, --target and --tolerance
    • source is the "test case" file that represents the source of truth
    • target is the file that is being compared against source
    • tolerance is configuration that determines the type of differences that are checked or ignored in the comparison between source and target. Can be precise, tree, content. This parameter determines what differences are in scope for if-diff.
  • if-diff should only be able to return either SUCCESS or the details of the first in-scope error it encounters.
  • if-diff should be able to receive a source from stdin as well as from a file path passed on the command line, e.g. ie --manifest source.yml | if-diff --target target.yml

Tolerance rules:

  • The tolerance parameter can be one of content-match or tree-match. The choice of tolerance determines when if-diff should report a difference between two files.

  • content-match:

    • report these differences:
      • trees contain different number of nodes
      • nodes in tree have different names
      • nodes in tree contain non-identical fields and/or values
      • any arrays in tree have unequal lengths and/or non-matching values
      • fields, subfields and values in context are non-identical
    • ignore differences in:
      • order of nodes in tree
      • order of fields in context
      • content of execution block
  • tree-match:

    • report these differences:

      • trees contain different number of nodes
      • nodes in tree have different names
      • nodes in tree contain non-identical fields and/or values
      • any arrays in tree have unequal lengths and/or non-matching values
    • ignore differences in:

      • order of nodes in tree
      • order of fields in context
      • content of execution block
      • content in context block

Global matching rules

  • Regardless of the tolerance there should be no elements in target that do not exist in source and vice-versa
  • Matching should not consider the runtime information included in the execution node. It can be completely ignored by if-diff.
  • if-diff should support wildcard characters (*) that allow any value to count as a match
    • e.g. if source includes cpu-util: * then both cpu-util: 45 and cpu-util: 55 in the same position in target should be accepted as a match. Pass on any value - no need to check types etc here at the moment.

Output display format

  • If if-diff finds no in-scope differences between the source and target then it should return a success flag: SUCCESS.

  • If if-diff detects an in-scope difference between the files, it should halt execution and report the error to the command line. The report should include the yaml path to the differing element in the tree, the value in the source and the value in the target.

  • e.g. difference values detected:

tree.children.vm1[4].cpu/utilization
source: 45
target:  43
  • e.g. field present in tree in source but not in target:
tree.children.vm1[4].cpu/utilization
source: missing
target:  43
  • e.g. field present in target but not in source:
tree.children.vm1[4].cpu/utilization
source: 43
target:  missing
  • e.g. entire section of tree present in source but missing from target:
tree.children.vm1
source: tree.children.vm1
target: missing

@jawache
Copy link
Contributor

jawache commented Apr 30, 2024

@jmcook1186 I think at least for the MVP we need to support automated testing, there isn't a need to differentiate between tree, context anymore like we originally discussed. I think we're talking right now about just content-match, tree-match I think we spoke about previously but after looking at the folder of manifests we need to test it's not required at least for automated testing.

I'm not sure I fully understand the issue of why having both causes problems, but if we only have content-match for this initial version, does that resolve your concerns?

@jmcook1186
Copy link
Contributor Author

yes, I think that removing tree-match from the scope of the task should simplify things enough.

@jmcook1186
Copy link
Contributor Author

Updated ticket with acceptance criteria. Happy for this to go to @narekhovhannisyan to think about implementation details and give feedback on the spec.
@zanete

@jmcook1186 jmcook1186 changed the title [WIP] Build if-diff Build if-diff May 1, 2024
@narekhovhannisyan
Copy link
Member

@jawache @jmcook1186 if we want to pipe the ie response to if-diff, then ie should return clean manifest without any logging. To do that we may have to add a logging flag. F.ex. ie --manifest <path> --stdout should log only output, and if we want to see all the other logs like a disclaimer or so on, then we have to pass --logging. Or vice versa.

@narekhovhannisyan
Copy link
Member

For having multiple entry points we can use package.json

{
  "bin": {
    "if-run": "./build/index.js",
    "if-diff": "./build/diff.js",
    "if-env": "./build/env.js"
  }
}

@jmcook1186
Copy link
Contributor Author

ahh I see, because we're logging on stdout and also using stdout to pipe manifests. I guess the two options are to hide our logs behind a --log flag or emit our logs via stderr? @narekhovhannisyan

@jawache
Copy link
Contributor

jawache commented May 7, 2024

Thanks @narekhovhannisyan that's a good point re: needing to have clean stdout to pipe.

@jmcook1186 I do prefer your approach, and in fact thinking about it I imagine that's exactly why STDERR and STDOUT were defined, so you can support piping without sending errors.

Also there are a bunch of errors when we run from some of the modules dependancies which themselves are printed to STDERR (I get rid of them while demoing by redirecting STDERR to /dev/null) so it's something other modules do at least.

@narekhovhannisyan
Copy link
Member

@jmcook1186 I think hiding behind --log or --verbose will be better. STDERR is for errors.

@narekhovhannisyan
Copy link
Member

@jawache @jmcook1186 The project will need minor re-architecture for this feature to support modularity and extension for other features. F.ex. args utility, validation and etc.

@jawache
Copy link
Contributor

jawache commented May 7, 2024

@narekhovhannisyan and @jmcook1186 this discussion supports logging going to stderr by default and "regular output" going to stout, so this looks like a pretty common software pattern: https://stackoverflow.com/questions/4919093/should-i-log-messages-to-stderr-or-stdout#:~:text=Stdout%20is%20the%20%22output%22%20or,why%20stderr%20is%20usually%20unbuffered.)

There is also an argument at the end that since stderr is unbuffered it's also where logging should go.

@jmcook1186
Copy link
Contributor Author

I agree - I looked into this and found several projects explicitly arguing in favour of logging to stderr to separate logs from product outputs, and it solves the problem neatly in this case. I am also not very keen on adding flags/commands for what will be a core feature.

@zanete zanete changed the title Build if-diff Build if-diff core feature May 9, 2024
@jawache
Copy link
Contributor

jawache commented May 9, 2024

@jmcook1186 as discussed let's add an AC that mirrors a negative testing manifest from the folder we created.

@zanete zanete mentioned this issue May 9, 2024
2 tasks
@zanete zanete added this to the Automated Testing milestone May 9, 2024
@zanete
Copy link

zanete commented May 9, 2024

@jmcook1186 please check over the acceptance criteria and updated SoW, I tried to add headings to separate out the scenarios, but got quite confused at the very and as I couldn't get where the last scenario starts, hope you can unpick that one, perhaps I have deleted too much? 🙏

@jmcook1186
Copy link
Contributor Author

yes some of the criteria have gotten scrambled between the issues. I'm cleaning it up now.

@jmcook1186 jmcook1186 changed the title Build if-diff core feature Apply more advanced matching criteria in if-diff May 9, 2024
@jmcook1186 jmcook1186 changed the title Apply more advanced matching criteria in if-diff Build if-diff core feature May 9, 2024
@narekhovhannisyan narekhovhannisyan linked a pull request May 15, 2024 that will close this issue
9 tasks
@zanete
Copy link

zanete commented May 21, 2024

@MariamKhalatova can you please link all the bugs raised around this feature so we are aware of what's left to fix until it can be merged (or what can be deprioritised so it doesn't hold up the release). I've seen #726, are there more and how severe are they. I believe we agreed to do this last bug triage, let me know if you have any questions.
cc @jmcook1186

@MariamKhalatova
Copy link
Contributor

@zanete Currently we have 1 issue #706 ,which I linked to the PR. I will also link it to the task(issue). And maybe 1 more bug (waiting for Joseph's decision). And a few scenarios which are not tested yet (will finish today).

@zanete
Copy link

zanete commented May 21, 2024

@jmcook1186 please review @MariamKhalatova PR #721

@MariamKhalatova
Copy link
Contributor

#733

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

5 participants