Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Amending commits #5430

Merged
merged 10 commits into from Mar 4, 2021
Merged

ENH: Amending commits #5430

merged 10 commits into from Mar 4, 2021

Conversation

bpoldrack
Copy link
Member

@bpoldrack bpoldrack commented Feb 22, 2021

Looking into a resolution of #3244 (and subsequently #5412).

Introducing --amend for save. By default this keeps the previous commit message if none was passed to save. --amend can not be applied recursively for now (see #5455 ).

  • minor fix: GitRepo._call_git didn't respect its env parameter
  • minor fix: save exited zero w/o a result when called on empty repo (no commit or empty commit only). Now: either notneeded or proceeds in case of --amend.
  • minor enh: GitRepo.update_ref now optionally takes oldvalue parameter

@bpoldrack bpoldrack marked this pull request as draft February 22, 2021 15:37
@bpoldrack bpoldrack force-pushed the enh-amend-commits branch 4 times, most recently from d415b58 to e758d9a Compare February 22, 2021 17:44
@codecov
Copy link

codecov bot commented Feb 23, 2021

Codecov Report

Merging #5430 (bd1e75d) into master (92bb704) will decrease coverage by 0.00%.
The diff coverage is 87.89%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #5430      +/-   ##
==========================================
- Coverage   89.54%   89.54%   -0.01%     
==========================================
  Files         296      296              
  Lines       41971    42114     +143     
==========================================
+ Hits        37585    37711     +126     
- Misses       4386     4403      +17     
Impacted Files Coverage Δ
datalad/support/annexrepo.py 89.27% <18.18%> (-1.43%) ⬇️
datalad/support/gitrepo.py 91.98% <96.29%> (+0.01%) ⬆️
datalad/core/local/save.py 98.91% <100.00%> (+0.08%) ⬆️
datalad/core/local/tests/test_save.py 96.66% <100.00%> (+0.58%) ⬆️
datalad/support/tests/test_gitrepo.py 99.79% <100.00%> (+<0.01%) ⬆️
datalad/core/local/create.py 94.44% <0.00%> (+1.38%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 92bb704...bd1e75d. Read the comment docs.

@bpoldrack bpoldrack force-pushed the enh-amend-commits branch 6 times, most recently from 38be0fb to 70aecd8 Compare February 24, 2021 19:11
@bpoldrack bpoldrack force-pushed the enh-amend-commits branch 5 times, most recently from 4be189d to 526779d Compare February 25, 2021 12:17
@bpoldrack
Copy link
Member Author

Rebased and cleaned up history after tests passed.

@bpoldrack bpoldrack marked this pull request as ready for review February 25, 2021 12:17
Copy link
Contributor

@kyleam kyleam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neat. Testing it out on my end, that seems to work nicely. Here are my comments from a mostly complete read-through. (I didn't look over the tests very closely.)

datalad/core/local/save.py Outdated Show resolved Hide resolved
datalad/support/annexrepo.py Show resolved Hide resolved
datalad/support/annexrepo.py Outdated Show resolved Hide resolved
datalad/support/gitrepo.py Show resolved Hide resolved
datalad/support/gitrepo.py Outdated Show resolved Hide resolved
datalad/support/annexrepo.py Outdated Show resolved Hide resolved
@bpoldrack bpoldrack force-pushed the enh-amend-commits branch 3 times, most recently from 3552608 to 21dfad4 Compare February 26, 2021 14:15
@bpoldrack
Copy link
Member Author

bpoldrack commented Feb 26, 2021

So, the test for fixing the case where we amend a commit w/o parent fails on crippledFS and windows. The failure is real:

C:\Users\appveyor\test2>git log
commit 3faa0c63c6a3e4b6c48862fa53fde1f57f0f2161 (HEAD -> adjusted/master(unlocked))
Author: Appveyor Almighty <test@appveyor.land>
Date:   Fri Feb 26 14:25:35 2021 +0000

    git-annex adjusted branch

commit a8e13e782900d286caa816948be061144bb3efc9 (master, refs/basis/adjusted/master(unlocked))
Author: Appveyor Almighty <test@appveyor.land>
Date:   Fri Feb 26 14:25:35 2021 +0000

    commit before entering adjusted branch

C:\Users\appveyor\test2>datalad save -m new --amend
C:\Users\appveyor\test2>git log
commit 3faa0c63c6a3e4b6c48862fa53fde1f57f0f2161 (HEAD -> adjusted/master(unlocked))
Author: Appveyor Almighty <test@appveyor.land>
Date:   Fri Feb 26 14:25:35 2021 +0000

    git-annex adjusted branch

commit a8e13e782900d286caa816948be061144bb3efc9 (master, refs/basis/adjusted/master(unlocked))
Author: Appveyor Almighty <test@appveyor.land>
Date:   Fri Feb 26 14:25:35 2021 +0000

    commit before entering adjusted branch

However, that is when amending w/o a modification. With a change to commit it works and once that is done it does so w/o further change:

C:\Users\appveyor\test2>echo some > some
C:\Users\appveyor\test2>datalad save -m new --amend
add(ok): some (file)
save(ok): . (dataset)
action summary:
  add (ok: 1)
  save (ok: 1)

C:\Users\appveyor\test2>git log
commit d5e1ec2a442f790284f9a3c5c0874eee2760b059 (HEAD -> adjusted/master(unlocked))
Author: Appveyor Almighty <test@appveyor.land>
Date:   Fri Feb 26 14:25:35 2021 +0000

    git-annex adjusted branch

commit 7a71da7f1456c5e16ed774316671a48dd1cdd6fb (master, refs/basis/adjusted/master(unlocked))
Author: Appveyor Almighty <test@appveyor.land>
Date:   Fri Feb 26 14:25:35 2021 +0000

    new

C:\Users\appveyor\test2>datalad save -m brandnew --amend
save(ok): . (dataset)

C:\Users\appveyor\test2>git log
commit 49f8fec3a38dbf7d0ff4352caa29d2ce63ad8e3d (HEAD -> adjusted/master(unlocked))
Author: Appveyor Almighty <test@appveyor.land>
Date:   Fri Feb 26 14:25:35 2021 +0000

    git-annex adjusted branch

commit e3c5ec6dfeb24fd78c2179edc7681105b98d770a (master, refs/basis/adjusted/master(unlocked))
Author: Appveyor Almighty <test@appveyor.land>
Date:   Fri Feb 26 14:25:35 2021 +0000

    brandnew

So, I guess the issue here is not the adjusted branch (generally amending the first commit works w/ adjusted branch, too), but the fact, that the to be amended commit as well as the amendment are empty. Therefore there's no tree object to commit.

There seems to be no --allow-empty for git commit-tree, though. Hmmm.

edit:
That's not quite it yet. In regular git repo I can git commit-tree empty^{tree} -m msg.

@kyleam
Copy link
Contributor

kyleam commented Feb 26, 2021

With a change to commit it works and once that is done it does so w/o further change

I believe this is because with an empty tree, Dataset.saves paths_by_ds is empty and GitRepo.save is never called.

@bpoldrack
Copy link
Member Author

I believe this is because with an empty tree, the Dataset.saves paths_by_ds is empty and GitRepo.save is never called.

Yes, that seems to be it. It exits zero with no message and no result, though. So, even w/o amend needs a fix, I guess.

❱ mkdir empty-repo                                                                                                                         128 !
(datalad-dev) ben@tree in /tmp
❱ git -C empty-repo init
Initialized empty Git repository in /tmp/empty-repo/.git/
(datalad-dev) ben@tree in /tmp
❱ git -C empty-repo annex init
init  (scanning for unlocked files...)
ok
(recording state in git...)
(datalad-dev) ben@tree in /tmp
(datalad-dev) ben@tree in /tmp
❱ datalad -f json_pp save -d empty-repo -m empty                                                                                             2 !
(datalad-dev) ben@tree in /tmp
❱ 

@bpoldrack
Copy link
Member Author

Changed back to draft until everything (including CI mess) is sorted again.

Copy link
Member

@mih mih left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very cool! Thx for implementing this!

I took a look at the coverage report -- which seems invalid. The lines reported missing here are reported covered when running on a crippledfs locally.

I gave it a try with a couple of datasets that I am working with and it works smoothly in easy-peasy-mode and with adjusted mode!

Going through the code, I only spotted one piece that seems needlessly wasteful, and suggested a change that should be leaner and still good enough.

Thx again!

if self._git_runner.env else os.environ).copy()
# `message` might be empty - we need to take it from the to be
# amended commit in that case:
msg = message or self.format_commit("%B", org_commit_pointer)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer, if we could limit the amount of git calls to collect this type of info. What about this:

author_name, author_email, author_date, old_parent, old_message = \
    self.format_commit("%an%x00%ae%x00%ad%x00%P%x00%B").split('\0')
msg = message or old_message

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I wondered what to use as separator that is certain to not be included in such fields as author_name. \0 seems like a reasonable solution. Will do.

Copy link
Member Author

@bpoldrack bpoldrack Mar 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hehe, nope. format_commit already splits by \0. Went for % instead. Hope, that's valid.

Forget that. I anticipated failure but format_commit actually does it right :)
So, back to \0.

Copy link
Member

@mih mih left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A conflict appeared now, but otherwise LGTM! Thx!

Introduce --amend option for the save command. By default this keeps the
previous commit message if none was passed to save. This applies
recursively per dataset.

This is involves handling the case of annexes adjusted branches, where
we must not actually commit-amend to git-annex' adjusting commit.
Instead:
 * commit regularly
 * sync to have that commit in the corresponding branch
 * squash the last two commits in the corresponding branch via
   git-commit-tree
 * update refs for that branch and annex' basis ref for adjusting
 * sync back to adjusted branch

Note:
 * When amending recursively it can happen that we commit in a
   subdataset, that was clean before (changing the commit message only).
   In that case, save's modification detection beforehand would lead to
   such a subdataset not being committed in its superdataset, failing to
   anticipate the commit.
   Fix this, by simply setting the status of all subdatasets to
   'modified', when we are saving with --amend --recursive and a message,
   since this will either produce a commit in an otherwise clean
   subdataset or the subdataset was modified anyway.

(Closes datalad#3244)
- amend commit w/o parent
- amend empty commit
- fix save for not returning a result on empty repo (no commit or empty
  commit only)
- with --amend --recursive only fix paths_by_ds for clean subdatasets
In order to proceed, work on recursive amendment is put into dataladgh-5455 and
postponed. Make the two options mutually exclusive for now.
@bpoldrack bpoldrack merged commit 25f9a7c into datalad:master Mar 4, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants