Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VNL and other third-party updates touch a lot of files #2314

Open
phcerdan opened this issue Feb 15, 2021 · 22 comments
Open

VNL and other third-party updates touch a lot of files #2314

phcerdan opened this issue Feb 15, 2021 · 22 comments
Labels
type:Design Improvement in the design of a given area type:Infrastructure Infrastructure/ecosystem related changes, such as CMake or buildbots

Comments

@phcerdan
Copy link
Contributor

For example the last commit updating VNL: df92e582d82

image

12 commits in VNL provoke almost 3 million deletions everywhere in ITK
It's the update from upstream script having some hiccups? Is it expected?

I have noticed it when grabbing the history of a file, it always has these VNL and sometimes MetaIO massive patches on it, that do not seem related to the file.

For example. history ofitkEigenAnalysis2DImageFilter.h:
image

Before 2016-03-02, it wasn't happening.

@blowekamp @hjmjohnson @thewtex

@phcerdan phcerdan added type:Design Improvement in the design of a given area type:Infrastructure Infrastructure/ecosystem related changes, such as CMake or buildbots labels Feb 15, 2021
@thewtex
Copy link
Member

thewtex commented Feb 15, 2021

@phcerdan is your ITK version up-to-date before running the script?

@phcerdan
Copy link
Contributor Author

@thewtex I am not updating VNL or running the update script, I am just browsing the current git history of ITK. df92e582d82

@thewtex
Copy link
Member

thewtex commented Feb 16, 2021

@hjmjohnson are you aware of style updates, etc., that may have caused this?

@hjmjohnson
Copy link
Member

Yes, and a lot of clang-tidy changes as well to VNL core libraries.
Yes, there are a lot of minuscule changes to VNL over the past year to fortify the codebase and start the process of modernizing.

To be honest, I don't quite understand how to resolve this issue.

@phcerdan
Copy link
Contributor Author

phcerdan commented Feb 16, 2021

I don't think it's a VNL issue. It must be about options/usage of UpdateFromUpstream.sh. Same happened with MetaIO here: 17e744

@mathstuf
Copy link
Contributor

Is the UpdateFromUpstream.sh in sync with the upstream script? I don't know how ITK uses it.

@dzenanz
Copy link
Member

dzenanz commented Feb 16, 2021

I think @bradking updated it a week ago via 41f4951.

@bradking
Copy link
Member

Modules/ThirdParty/VNL/UpdateFromUpstream.sh uses the older Utilities/Maintenance/UpdateThirdPartyFromUpstream.sh script. My update was to Utilities/Maintenance/update-third-party.bash.

@bradking
Copy link
Member

I'll look at converting Modules/ThirdParty/VNL/UpdateFromUpstream.sh to use the newer script.

@bradking
Copy link
Member

I've updated #2324 to switch to the new script.

@bradking
Copy link
Member

#2324 demonstrated an update with only a few files from VNL changed. Can this be closed?

@phcerdan
Copy link
Contributor Author

Thanks @bradking for the improvement of the update script in #2324.
Before your changes, the old UpdateThirdPartyFromUpstream.sh run by @thewtex didn't either show the pollution reported in this issue. So maybe it was a local issue caused by running the script with outdated sources.

If at some point in the future ITK rewrites its git history, we could solve the pollution generated from older updates of VNL and MetaIO.

I recommend to keep an eye when updating third parties still using the old UpdateThirdPartyFromUpstream.sh method to check that they don't touch 16.000 files.

@phcerdan
Copy link
Contributor Author

phcerdan commented Oct 22, 2023

A couple of years have passed, and history is still getting polluted when using UpdateFromStream scripts.

Updated history of the same example file shown in my original post: itkEigenAnalysis2DImageFilter.h. The MetaIO commit touches 16K files (from this PR: #4116, in particular this commit: c42090c,
note that the changes are not shown in the PR diff).

image

I am able to reproduce this when trying to update Eigen to another branch (from branch for/itk-3.4 to for/itk-master).

diff --git a/Modules/ThirdParty/Eigen3/UpdateFromUpstream.sh b/Modules/ThirdParty/Eigen3/UpdateFromUpstream.sh
index e6c37c161bc..8e72f68af75 100755
--- a/Modules/ThirdParty/Eigen3/UpdateFromUpstream.sh
+++ b/Modules/ThirdParty/Eigen3/UpdateFromUpstream.sh
@@ -10,7 +10,7 @@ readonly subtree="Modules/ThirdParty/Eigen3/src/itkeigen"
 # readonly repo="https://gitlab.com/libeigen/eigen"
 # readonly repo="https://gitlab.kitware.com/third-party/eigen.git"
 readonly repo="https://gitlab.kitware.com/phcerdan/eigen.git"
-readonly tag="for/itk-3.4"
+readonly tag="for/itk-master"
 readonly paths="
 Eigen/Cholesky
 Eigen/CholmodSupport

Here are the logs after running the UpdateFromStream.sh for Eigen3 after that change.

update_from_upstream_touch16k_files.txt

The update seems successful, but it introduces a commit touching 16K files.

commit bec339ee6f2e809752844a8489761ddfe18271b7 (HEAD -> update_eigen_to_master)
Merge: 217a54a40b7 522968224c9
Author: Pablo Hernandez-Cerdan <pablo.hernandez.cerdan@outlook.com>
Date:   Mon Oct 23 00:35:46 2023 +0200

    Merge branch 'upstream-Eigen3' into update_eigen_to_master

    # By Eigen Upstream
    * upstream-Eigen3:
      Eigen3 2023-10-22 (9cd4aec6)

commit 522968224c93dfd0f2ead1eb24d882d7948df40c
Author: Eigen Upstream <kwrobot@kitware.com>
Date:   Sun Oct 22 01:42:43 2023 +0200

    Eigen3 2023-10-22 (9cd4aec6)

    Code extracted from:

        https://gitlab.kitware.com/phcerdan/eigen.git

    at commit 9cd4aec6122b29a37e8de3fbfff47de3af0fc371 (for/itk-master).

See PR #4265 as an example when updating Eigen.

@phcerdan phcerdan reopened this Oct 22, 2023
@mathstuf
Copy link
Contributor

Was this branch rebased at some point? Third party import merge commits cannot just be rebased; I typically remove the commit from the rebase -i and replace it with exec ThirdParty/import/update.sh to rerun the import during the rebase.

@mathstuf
Copy link
Contributor

Note that the robot does have a check to enforce that the merge happens properly and will detect and block any rebases like this. It does not seem the robot is enforcing this for ITK at the moment. I'll caution that it means that all changes to third parties must go through the import script.

The git-check-third-party script from this repo will check the state of an import. For the Eigen import it reports:

% git check-third-party Modules/ThirdParty/Eigen3/src/itkeigen
fatal: not imported via a merge commit

This import likely needs redone. Once performed, we can start enforcing all such imports.

State of imports:

  • double-conversion: local diffs
  • eigen: not imported via a merge commit
  • gdcm: local diffs
  • googletest: local diffs
  • hdf5: not imported via a merge commit
  • kwsys: ok
  • minc: local diffs
  • metaio: not imported via a separate branch
  • nifti: local diffs
  • vxl: not imported via a separate branch
  • zlib-ng: local diffs

@phcerdan
Copy link
Contributor Author

phcerdan commented Oct 23, 2023

Was this branch rebased at some point? Third party import merge commits cannot just be rebased;

Yes, it was rebased.

Before:

- Commit2
- Commit1
- HEAD external eigen (release 3.4)

I updated the HEAD of external eigen to master:

After:

- NewCommit
- HEAD external eigen (master)

I typically remove the commit from the rebase -i and replace it with exec ThirdParty/import/update.sh to rerun the import during the rebase.

I would like to try this. However, how do you interact with both ITK (or VTK), and the third party module in that way? I see that the update script creates (and then removes) a work/extract and work/upstream. Is in that work/upstream where you perform that rebase?

The git-check-third-party script from this repo will check the state of an import. For the Eigen import it reports:
% git check-third-party Modules/ThirdParty/Eigen3/src/itkeigen
fatal: not imported via a merge commit
This import likely needs redone. Once performed, we can start enforcing all such imports.

It seems that we should enforce that check from now on in ITK.

@mathstuf
Copy link
Contributor

Is in that work/upstream where you perform that rebase?

The import is not (and should not be) rebased. That is what git rebase does not understand. Using exec to rerun the import just redoes the merge using the proper -Xsubtree option (also not transcribed in the history for things like verification or reuse when reapplying the commit via either cherry-pick or rebase).

It seems that we should enforce that check from now on in ITK.

I agree. The first thing to do would be to get the "local diff" instances' diffs upstreamed into the imported repository. These can then be trivially reimported and should be "fine" according the the script.

I would also recommend using (unannotated) tags in the import repos to preserve imported contents "forever". VTK started using for/vtk-YYYYMMDD[.N]-BASE where BASE is either the tag or something like master-gXXXXXXX with XXXXXXX the shorthash of the base commit and master the branch name. This allows the for/vtk branch to be rebased without worrying about "losing" the source for the import due to repository pruning over time.

@phcerdan
Copy link
Contributor Author

I have been trying to remove current subtree (itkeigen), and try a fresh update-from-upstream, but getting conflicts in the git read-tree -u --prefix="$subtree-tmp/" "upstream-$name" with the files I have modified upstream (.gitattributes, CMakeLists.txt).

I typically remove the commit from the rebase -i and replace it with exec ThirdParty/import/update.sh

I still don't know how to perform the operation you suggested:

What commit from the ITK history do you remove and replace with the exec update?

Pinging @thewtex

@mathstuf
Copy link
Contributor

What commit from the ITK history do you remove and replace with the exec update?

What is in master is there. I'm talking about when you use git rebase -i and see a bunch of pick commands. I remove the commit related to the import and replace it with exec path/to/update.sh.

I have been trying to remove current subtree (itkeigen), and try a fresh update-from-upstream, but getting conflicts in the git read-tree -u --prefix="$subtree-tmp/" "upstream-$name" with the files I have modified upstream (.gitattributes, CMakeLists.txt).

Chuck added a mode where it detects an empty directory and redoes the import.

@thewtex
Copy link
Member

thewtex commented Oct 23, 2023

The update seems successful, but it introduces a commit touching 16K files.

We are working with branches that have different histories and subtrees -- the logs may be misleading on how many files are actually being touched.

See PR #4265 as an example when updating Eigen.

The update here looks good at a high level.

I have been trying to remove current subtree (itkeigen), and try a fresh update-from-upstream, but getting conflicts in the git read-tree -u --prefix="$subtree-tmp/" "upstream-$name" with the files I have modified upstream (.gitattributes, CMakeLists.txt).

If both upstream and ITK edited the files, then a conflict resolution may be necessary.

@mathstuf
Copy link
Contributor

the logs may be misleading on how many files are actually being touched.

A rebase put the imported tree into the main ITK history without an -Xsubtree option. The diff of the commit itself is bogus.

If both upstream and ITK edited the files, then a conflict resolution may be necessary.

This ended up being an endless source of extra work when rebasing the import repository. This is why it is preferred that all changes to the import come through update.sh and never through manual patching.

@phcerdan
Copy link
Contributor Author

phcerdan commented Oct 24, 2023

Was this branch rebased at some point? Third party import merge commits cannot just be rebased; I typically remove the commit from the rebase -i and replace it with exec ThirdParty/import/update.sh to rerun the import during the rebase.

I understand now. I thought you were referring about the third-party project being rebased.
So, no, I am not rebasing inside the ITK master branch. But the third-party project was rebased.
Good trick to know though! I tried it for this current situation and get the same result (as it should be, because I am not rebasing ITK history).

The update here looks good at a high level.

It looks fine in the PR, but the first of the two commits is touching all the ITK files.

The diff of the commit itself is bogus.

It might be, but it pollutes the git history of all ITK files, which I think is extremely annoying.


So, at the end I got it solved (I think!).

As background, Eigen used to be stored in bitbucket, and we were using an non-official mirror for git. However, Eigen officially moved to gitlab,, the mirror became obsolete, and all the git hashes changed, like a history rewrite.

That's why I was getting these merge problems, the for/itk-3.4 and for/itk-master have unrelated histories.

So, I locally removed rm -r ./Modules/ThirdParty/Eigen3/src/itkeigen the subtree. Modified https://github.com/phcerdan/ITK/blob/f1c48dc682fbab83563a6455ee548860fda3790f/Utilities/Maintenance/update-third-party.bash#L225

git read-tree -u --prefix="$subtree/" "upstream-$name" for git read-tree -u --prefix="$subtree-delme/" "upstream-$name"
And commented out the last two commands:

    git read-tree -u --prefix="$subtree-delme/" "upstream-$name"
fi
# git commit --no-edit
# git branch -d "upstream-$name"

Then move: mv ./Modules/ThirdParty/Eigen3/src/itkeigen-delme ./Modules/ThirdParty/Eigen3/src/itkeigen

git restore Utilities/Maintenance/update-third-party.bash
git add -A
git commit --no-edit
git branch -d upstream-Eigen3

These changes result in only the itkigen subtree files being touched, instead of all ITK files.

@phcerdan phcerdan changed the title VNL updates touch a lot of files VNL and other third-party updates touch a lot of files Oct 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type:Design Improvement in the design of a given area type:Infrastructure Infrastructure/ecosystem related changes, such as CMake or buildbots
Projects
None yet
Development

No branches or pull requests

6 participants