Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEATURE] Adds fallback for changed files for first push #234

Conversation

Eydamos
Copy link

@Eydamos Eydamos commented Jan 12, 2024

No description provided.

@Eydamos
Copy link
Author

Eydamos commented Jan 12, 2024

Example solution for issue #233
This worked fine in all my local tests including merging and rebasing

return $ranges;
}

private function getHashOfBranchOrigin(): string
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is the actual fix. All other files are just changes because the repository is needed.
With git reflog show I get a list of all changes since the branch was created.
This looks like this:

4cd8bca1a853cde29a8a238ea24a36cf64898661 (HEAD -> feature/test) feature/test@{0}: commit: Improvement 2
05b5141ca91790d2becac4465232fa8ad695793e feature/test@{1}: commit: Improvement 1
3e6bc3932b396aa86b8e7d67188944956701bc13 (origin/master, master) feature/test@{2}: branch: Created from HEAD

The bottom entry of this list is the creation of the branch and it's hash can be used to get a list of changed files since the branch was created.

Comment on lines +90 to +91
$processor = new Processor();
$reflog = $processor->run(sprintf('git reflog show --no-abbrev %s', $currentBranch))
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be moved to SebastianFeldmann\Git\Repository or one of the operator classes

@sebastianfeldmann
Copy link
Collaborator

This is even trickier than I thought.

You tacked the problem at the Rage\Detector level because it is the way the ChangedFile implementation works at the moment.

And your fallback way of using the reflog to find the starting hash of the commit would work in most cases.
But relying on the reflog should really be a "last resort" kind of thing.

I tried to figure out some other ways to identify all the files that changed.

Normally any diff, diff-tree and log command should work since they all support git revision selections.

That means git log --name-only --format='' REMOTE/BRANCH..BRANCH should do the trick.

And that's kind of true with one caveat. There are two ways of referencing the REMOTE/BRANCH and depending on your configuration only one or the other works.

1. Using the upstream HEAD (default branch)

git log --name-only --format='' origin/HEAD..HEAD

This works if you have set a default branch.

$ git branch -la
* demo-branch
  main
  remotes/origin/HEAD -> origin/main <--- this line is important
  remotes/origin/main

You can set the default brach by executing git remote set-head origin --auto.
Using -a or --auto asks the remote for the default branch.

If you don't have this setup properly the log command will fail because origin/HEAD is an unknown/ambiguous revision

2. Using the branch reference directly

git log --name-only --format='' origin/demo-branch..HEAD

This works if you haven't set a default branch. The problem being that it fails if you have done it.

$ git branch -la
* demo-branch
  gh-pages
  main
  remotes/origin/gh-pages
  remotes/origin/main

If you have set a default branch it fails because, again origin/demo-branch is an unknown/ambiguous revision.

Conclusion

My preferred course of action would be:

  1. Use the stdIn data for detecting ranges if possible
  2. On first push (0000000 zero range) use a log/diff command to detect the changed files.
  3. If everything fails use your reflog fallback solution

I'm still struggling with (2).
I could just execute origin/HEAD..HEAD and if it fails run origin/BRANCH_NAME..HEAD but I'm very unhappy with blindly executing commands that I know could fail.

The only way I could find so far to figure out if a default branch is set is running.

git remote show origin -n

If the HEAD branch is listed, the default is set.

$ git remote show origin -n
* remote origin
  Fetch URL: ssh://git@github.com/captainhookphp/captainhook.git
  Push  URL: ssh://git@github.com/captainhookphp/captainhook.git
  HEAD branch: (not queried)                             <---- this line is important
  Remote branches: (status not queried)
    gh-pages
    main
  Local branch configured for 'git pull':
    main merges with remote main
  Local ref configured for 'git push' (status not queried):
    (matching) pushes to (matching)

So in order to detect the changed files I would:

  1. Check Ranges from stdIn
  2. Check the default branch status
  3. Run remote/HEAD or remote/BRANCH according to the result of that check
  4. Check the reflog for any clues

If you know of anything I'm missing here please let me know ;)

The even trickier part if figuring all this out for the first push of an so far empty repository. But I will tackle that problem, after we solved this one because that should really be an extreme edge case were someone has setup the Cap'n before even doing the first push ;)

/cc @shochdoerfer @heiglandreas @bcremer @ramsey maybe you have some better ideas.

@shochdoerfer
Copy link
Contributor

I am not so deep into Git internals that I could provide a better option. The only thing that comes to my mind would be to set CHANGED_FILES to all the files in the repo when Git does not provide the information. You could recursively scan the project without the need for git to "work". Surely not ideal but probably better than the current behaviour.

Also, I am wondering why I haven't been hitting the problem. Need to investigate, I guess.

@sebastianfeldmann
Copy link
Collaborator

@shochdoerfer
If you create feature branches in GitLab and check them out locally there is an origin/BRANCH_NAME reference and git will not use the zero hash for the current remote state and the Cap'n can determine the changed files just fine.

The issue comes up if a new branch gets pushed to a remote that does not exist on the remote so far.

So we can ignore the whole origin/HEAD thing I was rambling about.
It just works by accident, because the demo-branch was created from the default branch.
If you create a branch from a branch that is not the default branch this fails.

The only right way to check is REMOTE/BRANCH_NAME which doesn't exist if the branch was never pushed before.
That's why git outputs the zero commit hash for the remote.

Since we don't know the root-branch for the current branch for sure we can't do something like

git log --name-only --format='' ROOT_BRANCH..BRANCH

So far I could not find a way to reliably figure out the branch point without knowing the parent branch name.

I will go with the reflog solution for now.
As I mentioned in the beginning it should work most of the time.

@Eydamos
Copy link
Author

Eydamos commented Jan 15, 2024

I know the topic is very difficult. It took me a whole day to figure out a way to still get the files, which was quite weird for me as every GUI I know (being it specific git GUIs or an IDE like PhpStorm) shows me the changed files before I push. Even if the remote branch does not exists yet.

Even if there are edge cases where it will not work or not work 100% I think implementing it is better than not having any files at all, because as it is right now the CHANGED_FILES variable is not usable at all for our workflow.

Thank you for considering my solution and for the quick response

@sebastianfeldmann
Copy link
Collaborator

Yes you are absolutely right @Eydamos

I will switch some more things around to streamline the "changed files" thing even a bit more internally.

I will also, just as you suggested, move some of the logic to the sebastianfeldmann/git repo.

A huge thanks for taking the time to investigate this and providing the proof of concept implementation it is highly appreciated.

I think I need at least a day or two to get everything working properly again.

@Eydamos
Copy link
Author

Eydamos commented Jan 15, 2024

Take your time. I will just not use the pre-push for now as the check is not too important. If you need any help e.g. for testing or so let me know

sebastianfeldmann added a commit that referenced this pull request Jan 17, 2024
There was an issue where the changed file list was empty while pushing
branches that do not exist on the remote.

There is a more detailed discussion on the related pull request #234

Fixes: issue #233
sebastianfeldmann added a commit that referenced this pull request Jan 17, 2024
There was an issue were the changed file list was empty while pushing
branches that do not exist on the remote.

There is a more detailed discussion on the related pull request #234

Fixes: issue #233
@sebastianfeldmann
Copy link
Collaborator

It's fixed. Just in dev-master for now.

You can give it a try by requiring CaptainHook version ^6.0.0 in your composer.json.
Maybe you have to set

minimum-stability: dev,
prefer-stable: true,

but then you can test it locally if you want.

I will make some final tests, and then tag a new version.

@sebastianfeldmann
Copy link
Collaborator

I will close this since a lot of it made it's way to 5.20.0

@Eydamos Eydamos deleted the feature/changed_files_for_first_push branch January 22, 2024 08:10
@Eydamos
Copy link
Author

Eydamos commented Jan 22, 2024

Thank you very much for your awesome work. Everything works like a charm now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants