Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

borg diff should have --format option #4634

Closed
elho opened this issue Jun 22, 2019 · 23 comments
Closed

borg diff should have --format option #4634

elho opened this issue Jun 22, 2019 · 23 comments

Comments

@elho
Copy link
Contributor

elho commented Jun 22, 2019

The diff sub-command should have a --format option like e.g. the list sub-command has.

Placeholders should include (but not be limited to), path, type (directory entry type, same as with list), change (giving the type of change, ie. whether added, deleted, modified (content change), owner, group, mode, etc. - speaking long or like AME, whatever is deemed preferable) and whatever fields to represent what else diff has at its hands that it can output, e.g. oldsize, newsize, deltasize, etc.
If several aspects of a directory entry were changed, multiple lines, each with the according change value would be printed. This does not sound nice at first, bit fits the format-string style output (as opposed to more template style with conditionals etc.) and suits working on the output with shell tools like grep.

This would allow to do stuff like borg diff --format "{change}{type}:{path}{NUL}" | sed --null-data --quiet --expression='s/(A|M)-://p' | xargs --null dosomethingwithchangedfiles, thus being able to workaround/solve e.g. #3514 and #4191.

@palao
Copy link

palao commented Jan 20, 2021

I'd like to implement this. Is it ok?

@elho
Copy link
Contributor Author

elho commented Jan 20, 2021

I'd like to implement this. Is it ok?

Great, go ahead. 👍

Just note that it is not as trivial as it sounds, the current output is not generated by the diff function but deeper in the code where the actual changes between items are detected.

So based on what I remember from back then, when I did briefly look into the code to see whether I could just fix it quickly instead of opening the bug, I would imagine it to involve these steps:

  1. Redo that internal part to return/communicate changes using some defined proper data structure (e.g. a nested dict of the attributes changed, each with old and new maybe?) instead of strings.
  2. Make the existing diff function build the current output based on that data structure gathered from the internal diffing process.
  3. Make any other part of the code that eventually uses those strings to use the new data structure.
  4. Extent the existing diff function generate --format output based on those very same data structures

Further additions, such as some sort of JSON output would then be quite trivial.

@palao
Copy link

palao commented Feb 1, 2022

One question: what are the expected list of possible placeholders?

@diivi
Copy link
Contributor

diivi commented Feb 25, 2023

I would like to work on this. Please assign this issue to me, I'll ask for help when needed on the #borgbackup IRC channel.

@ThomasWaldmann
Copy link
Member

@diivi you have it, thanks for helping!

@diivi
Copy link
Contributor

diivi commented Feb 25, 2023

Hey @elho, the current ItemDiff methods _something_diff always return a 2-tuple: infos_in_a_dict, infos_formatted_str do you want me to get rid of infos_formatted_str and just use the dict to format outputs?

@diivi
Copy link
Contributor

diivi commented Feb 25, 2023

Also, regarding the placeholders -
I'll have to return the same dict structure from every _diff_* function right? So that I have the right keys to access when I create my own DiffFormatter class later. This means when the file's owner was changed, I will still return all other keys- dict(mode=(st1.mode,st2.mode),user=(st1.user,st.user)...) even though st1.user and st2.user are the same.
Another option-
Just show a message saying key did not change?

@RF-Tar-Railt
Copy link
Contributor

I'd like to get a try for solution of this

@ThomasWaldmann
Copy link
Member

@RF-Tar-Railt maybe @diivi is still working on this? @diivi are you?

@diivi
Copy link
Contributor

diivi commented Mar 10, 2023

@RF-Tar-Railt maybe @diivi is still working on this? @diivi are you?

Nope, I couldn't start working on this as it required some more design discussions.
@RF-Tar-Railt can work on it if they want to.

@RF-Tar-Railt
Copy link
Contributor

what would be the default format of diff?

  +1.7 kB   -1.7 kB [ctime: Wed, 2023-02-22 00:06:51 +0800 -> Sat, 2023-03-11 13:34:35 +0800] [mtime: Wed, 2023-02-22 00:06:51 +0800 -> Sat, 2023-03-11 13:34:35 +0800] dev/Alconna/setup.py
{
    "changes": [
        {
            "added": 1705, 
            "removed": 1705, 
            "type": "modified"
        },
        {
            "new_ctime": "2023-03-11T13:34:35.678141+08:00", 
            "old_ctime": "2023-02-22T00:08:47.979206+08:00", 
            "type": "ctime"
        }, 
        {
            "new_mtime": "2023-03-11T13:34:35.678141+08:00", 
            "old_mtime": "2023-02-22T00:08:47.979206+08:00", 
            "type": "mtime"
        }
    ], 
    "path": "dev/Alconna"
}

if follows what json-lines given, it may be:

"{mode}{size}{ctime}{mtime}{path}{NL}"
  • mode represents the change types added, deleted, modified
  • size represents the change size

@RF-Tar-Railt
Copy link
Contributor

and if don't mind, I'll make some changes to the whole diff instruction handling
It can check my fork to track my progress

@RF-Tar-Railt
Copy link
Contributor

I create a issue in my fork for collect some ideas: #1

@infectormp
Copy link
Contributor

IMHO, it would be better if you will collect your ideas here. For those who track changes in the main repository this is a better place to track changes and comments.

@RF-Tar-Railt
Copy link
Contributor

IMHO, it would be better if you will collect your ideas here. For those who track changes in the main repository this is a better place to track changes and comments.

I will also move the summary ideas here, thank you for your advice

@RF-Tar-Railt
Copy link
Contributor

Some thoughts now:

  1. let ItemDiff.changes return list[dict] or list[Diff] as Diff is a specific class with type and changes attribute
  2. every _diff_* function can return a "unchanged" Diff as fallback, that provide more information to other handlers while being ignored by DiffFormatter
  3. IMHO, all possible placeholders can be:
  • type: the type of difference, like modified
  • content: difference of size change, like +1.7 kB -1.7 kB; and difference of presence change, like added directory
  • mode: specific difference of mode change
  • owner: specifc difference of owner change
  • ctime, mtime: needless to say
  • path: needless to say

@RF-Tar-Railt
Copy link
Contributor

RF-Tar-Railt commented Mar 12, 2023

new idea:
let Diff perform like Item with lots of propertry

class DiffChange:
    type: str
    @property
    def content() -> dict: ...
    ...

and so that it can easily used by DiffFormater

@RF-Tar-Railt
Copy link
Contributor

Can I rewrite part of the BaseFormatter as well? @ThomasWaldmann

@RF-Tar-Railt
Copy link
Contributor

RF-Tar-Railt commented Mar 12, 2023

@elho A question need to resolve:
what is the expected behavior when current changes cannot match the given format pattern? cancel this output or keep placeholder?

@RF-Tar-Railt
Copy link
Contributor

#7440

@RF-Tar-Railt
Copy link
Contributor

#7440

@elho It maybe works

@RF-Tar-Railt
Copy link
Contributor

#7534

ThomasWaldmann pushed a commit that referenced this issue Jun 11, 2023
diff: add --format option

also: refactoring/improvements of BaseFormatter
@ThomasWaldmann
Copy link
Member

fixed by #7534

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants