status support in `gix` (crate) #1049

Byron · 2023-10-05T13:42:55Z

Based on #1030

Improve gix status to the point where it's suitable for use in reset functionality.
Leads to a proper worktree reset implementation, eventually leading to a high-level reset similar to how git supports it.

Architecture

The reason this PR deals quite a bit with gix status is that for a safe implementation of reset() we need to be sure that the files we would want to touch don't carry modifications and aren't untracked files. In order to know what would need to be done, we have to diff the current-index with the target-index. The resulting set of files to touch can then be used to look up information provided by git-status, like worktree modifications, index modifications, and untracked files, to know if we can proceed or not. This is also where the reset-modes would affect the outcome, i.e. what to change and how.

This is a very modular approach which facilitates testing and understanding of what otherwise would be a very complex algorithm. Having a set of changes as output also allows us to one day parallelize applying these changes.
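To make the two steps above more concrete, here is a minimal sketch of how they could fit together. Everything in it is hypothetical and only for illustration: `Index`, `Change`, `Status`, `diff_indices()` and `conflicts()` are not part of any `gix` crate, object ids are plain strings, and reset-modes aren't modelled at all.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical stand-in for an index: path -> object id (a string for brevity).
pub type Index = BTreeMap<String, String>;

/// What a reset would have to do to a single path in the worktree.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Change {
    Add { path: String },
    Remove { path: String },
    Update { path: String },
}

impl Change {
    pub fn path(&self) -> &str {
        match self {
            Change::Add { path } | Change::Remove { path } | Change::Update { path } => {
                path.as_str()
            }
        }
    }
}

/// Hypothetical summary of what a status run reported.
#[derive(Debug, Default)]
pub struct Status {
    pub worktree_modified: BTreeSet<String>,
    pub index_modified: BTreeSet<String>,
    pub untracked: BTreeSet<String>,
}

/// Step 1: diff the current index with the target index to learn which paths to touch.
pub fn diff_indices(current: &Index, target: &Index) -> Vec<Change> {
    let mut changes = Vec::new();
    for (path, id) in target {
        match current.get(path) {
            None => changes.push(Change::Add { path: path.clone() }),
            Some(cur) if cur != id => changes.push(Change::Update { path: path.clone() }),
            Some(_) => {}
        }
    }
    for path in current.keys() {
        if !target.contains_key(path) {
            changes.push(Change::Remove { path: path.clone() });
        }
    }
    changes
}

/// Step 2: look up each path in the status to find those that can't be touched safely.
pub fn conflicts<'a>(changes: &'a [Change], status: &Status) -> Vec<&'a str> {
    changes
        .iter()
        .map(Change::path)
        .filter(|p| {
            status.worktree_modified.contains(*p)
                || status.index_modified.contains(*p)
                || status.untracked.contains(*p)
        })
        .collect()
}
```

If `conflicts()` comes back empty, the list of changes could be applied as is, and since it is just data it could be split up across threads later. A reset-mode would presumably decide what to do with a non-empty result, e.g. abort, skip the entry, or overwrite anyway.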

This leaves us in a situation where the current checkout() implementation wants to become a fastpath for situations where the reset involves an empty tree as source (i.e. create everything and overwrite local changes).

On the way to reset() it's a valid choice to get more familiar with the matter by improving the current gix status implementation and ensuring the correctness of what's there, which currently doesn't seem to be the case in comparison. Further, implementing gix status similarly to git status should be made possible.

Tasks

finally sort out the ctime issue.
port gix_status::fs to gix_index and assure this form of metadata-retrieval is used everywhere (e.g. checkout…).
- Note that it can't go behind a feature toggle (or gix-fs behind feature toggle) due to the per-platform dependencies
generalize rename-tracking to assure we will be able to integrate proper rename tracking later
make sure diffed blobs go through filters
skip diffs on binary files, which should be figured out before we get the data.
keep track of or implement diff-only conversions, like textconv, and gitattributes integration
status in gix crate
fun: a way to apply filters in a cat-file equivalent, and possibly textconv conversions just like in `git cat-file`.
diff index with index to learn what we would want to do in the worktree, or alternatively, diff tree with index (with reverse-diff functionality to simulate diff of index with tree), for better performance as it would avoid having to allocate a whole index even though we are only interested in a diff.

Next PR

reset() that checks whether a worktree modification is allowed, or whether an entry should be skipped. That way we can postpone safety checks like --hard.

Postponed

What follows is important for resets, but won't be needed for cargo worktree resets.

what about index/worktree rename tracking? git2 can do that. Needs generalization of what's available for tree/tree diffs, at least learn from it.
gix status with actual submodule support - needs status in gix (crate) effectively
gix status with actual conflict support
a way to obtain untracked files to learn if changes can be made. What about the untracked files extension?

Limitations

It seems that when CTime is newer than MTime, the Rust std implementation sets CTime to MTime, which then causes us to do extra work and 'fight' git: we will write an index with the normalized CTime, but git will rewrite that the next time it runs. This can be fixed with core.trustCTime=false.
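Why turning off core.trustCTime sidesteps the problem can be illustrated with a reduced stat comparison. This is only a sketch under assumptions: `Stat` and `stat_matches()` are hypothetical names, and a real comparison looks at more fields than the three used here.

```rust
/// Hypothetical, reduced view of the stat information stored with an index entry.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Stat {
    pub mtime_secs: u64,
    pub ctime_secs: u64,
    pub size: u64,
}

/// Returns true if the stored stat information still matches the file on disk.
/// With `trust_ctime` off, a differing ctime alone doesn't mark the entry as changed.
pub fn stat_matches(stored: &Stat, on_disk: &Stat, trust_ctime: bool) -> bool {
    stored.mtime_secs == on_disk.mtime_secs
        && stored.size == on_disk.size
        && (!trust_ctime || stored.ctime_secs == on_disk.ctime_secs)
}
```

With `trust_ctime` set, an index entry whose ctime was normalized to the mtime no longer matches a file whose real ctime is newer, so the entry is considered changed and gets refreshed. Without it, both ways of recording the ctime compare as equal.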

Research

Ignored files are considered expendable and can be overwritten on reset
How to integrate submodules - probably easy to answer once gix status can deal a little better with submodules. Even though in this case a lot of submodule-related information is needed for a complete reset, probably only doable by a higher-level caller which orchestrates it.
How to deal with various modes like merge and keep? How to control refresh? Maybe partial (only the files we touch), and full, to also update the files we don't touch as part of status? Maybe it's part of status if that is run before.
Worthwhile to make explicit the difference between git reset and git checkout in terms of HEAD modifications, with the former changing HEAD's referent, and the latter changing HEAD itself.
figure out how this relates to the current checkout() method as technically that's a reset --hard with optional overwrite check. Could it be rolled into one, with pathspec support added?
- just keep them separate until it's clear that reset() performs just as well, which is unlikely as there is more overhead. But maybe it's not worth maintaining two versions because of it. If so, one should probably rename it.
for git status: what about rename tracking? It's available for tree-diffs and quite complex on its own. Probably only needs HEAD-vs-index rename tracking. No, also can have worktree rename tracking, even though it's hard to imagine how this can be fast unless it's tightly integrated with untracked-files handling. This screams for a generalization of the tracking code though as the testing and implementation is complex, but should be generalisable.
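The difference between reset and checkout in terms of HEAD, as noted in the research items above, can be made explicit with a toy model. `Head`, `Refs`, `reset_to()` and `checkout_branch()` are hypothetical names for illustration only, with commit ids as plain strings and no worktree or index handling.

```rust
use std::collections::BTreeMap;

/// HEAD either points at a branch by name, or directly at a commit (detached).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Head {
    Symbolic(String),
    Detached(String),
}

/// Hypothetical, minimal reference store: HEAD plus branch name -> commit id.
pub struct Refs {
    pub head: Head,
    pub branches: BTreeMap<String, String>,
}

impl Refs {
    /// Like a reset: HEAD keeps pointing at the same branch, but that branch moves.
    /// A detached HEAD has no referent to move, so HEAD itself is updated.
    pub fn reset_to(&mut self, commit: &str) {
        match &mut self.head {
            Head::Symbolic(branch) => {
                self.branches.insert(branch.clone(), commit.to_string());
            }
            Head::Detached(id) => *id = commit.to_string(),
        }
    }

    /// Like a checkout of a branch: HEAD itself changes, no branch is moved.
    pub fn checkout_branch(&mut self, branch: &str) {
        self.head = Head::Symbolic(branch.to_string());
    }
}
```

After `reset_to()` on a symbolic HEAD only the entry in `branches` differs, while after `checkout_branch()` only `head` does.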

Previously the rename tracking engine was integrated with tree-diffs, but already operates in a stand-alone fashion. Now it's officially generalized which allows it to be tested separately and used when tracking renames for diffs between index and tree, index and index, and index and worktree.

Byron force-pushed the gix-status branch 3 times, most recently from d0eef9a to ad5f6b7 Compare October 10, 2023 05:45

Byron force-pushed the gix-status branch 3 times, most recently from ed94986 to 7911093 Compare October 19, 2023 06:04

Byron force-pushed the gix-status branch 6 times, most recently from 9f828e4 to b9b21f3 Compare October 30, 2023 15:22

trulsma mentioned this pull request Nov 1, 2023

Better way of diffing changes in unstaged, staged, and head trulsma/intelligit#5

Open

Byron force-pushed the gix-status branch 4 times, most recently from 539a295 to 339a6c1 Compare November 1, 2023 19:58

EliahKagan mentioned this pull request Nov 2, 2023

Revise comments, docstrings, some messages, and a bit of code gitpython-developers/GitPython#1725

Merged

Byron force-pushed the gix-status branch 6 times, most recently from 630899e to 52b2859 Compare November 6, 2023 12:48

Byron mentioned this pull request Nov 9, 2023

Count removed bytes correctly #1100

Merged

Byron added 4 commits November 11, 2023 16:17

update crate-status with planned features related to status

63fa80e

feat!: Add git-style metadata support.

3c8421f

As opposed to the Rust standard library, this one will get the ctime from the file itself, instead of from the inode. That way, the index file written by `gix` will not continuously be expensively rewritten by `git`, and vice versa.

adapt to changes in gix-index

8134767

fix: remove unused dependency and improve documentation slightly

13ab629

Byron force-pushed the gix-status branch from 52b2859 to b7ba734 Compare November 11, 2023 15:25

Byron force-pushed the gix-status branch from b7ba734 to 4d471cd Compare November 11, 2023 16:37

Byron added 3 commits November 11, 2023 18:46

feat: provide new rename-tracking facilities.

e2745fd

They generalize rename tracking to the point where it can work for different kinds of changes. There is still some way to go until it is truly correct though, as it still lacks worktree conversions and diff filters.

adapt to changes in gix related rename tracking

a28bf90

Byron force-pushed the gix-status branch from 4d471cd to a28bf90 Compare November 11, 2023 17:57

Byron merged commit c87f2cc into main Nov 11, 2023
18 checks passed

Byron mentioned this pull request Nov 11, 2023

diff correctness #1106

Merged

18 tasks
