Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New branch creation causes the plugin to iterate over all changesets #1

Open
itay opened this issue Mar 2, 2015 · 30 comments
Open

Comments

@itay
Copy link

itay commented Mar 2, 2015

We have a very large repo (over 100K changes, 10 years of history). If we enable the plugin, then pushing a new branch to the server basically causes the push to hang and eventually die, because it will check every commit since the beginning of time.

The above happens because Stash gives you a from hash of 000000 and a to hash of <sha>, which will give you all changesets when you do the iteration.

There's no good fix, but a good medium term fix will be to just add a limit to how many changes are requested for a new branch push (e.g. 100).

@christiangalsterer
Copy link
Owner

I will look into this the next days.

@rogueluke
Copy link

This seems to also happen when new tags are pushed to the server

@gjoseph
Copy link

gjoseph commented Sep 25, 2015

Hey Christian, any news about this ?

@christiangalsterer
Copy link
Owner

I was on vacation and just returned. I will first concentrate on making the plugin compatible with Bitbucket 4.x (#5). Shortly after that I will look into this improvement. In order to know on which version to concentrate first it would be good to know which versions of Bitbucket you are running and/or if you plan to update to Bitbucket in the near future, so please leave a comment with the version information for Bitbucket.

@gjoseph
Copy link

gjoseph commented Sep 29, 2015

We're currently on Stash v3.11.0, but we'll probably upgrade to BBServer soon. (in fact, I hadn't paid attention to news, and it's your message above that made me aware of the rebranding)

@christiangalsterer
Copy link
Owner

I checked on the Bitbucket API, but I'm not sure yet how to implement this.
Detecting a new branch should be possible but how to limit the number of commits is not clear yet. If somebody has an idea, please share. Will keep on searching.

@gjoseph
Copy link

gjoseph commented Oct 10, 2015

IIRC in the case of new branches the "from" hash was 00000000, which is (again, IIRC) how we detected these cases in our gitolite hooks. The bash snippet to find the revlist to check was like this:


while read before after branch; do                                       # For each commit...
  # echo "$0: testing before='$before' after='$after' branch='$branch'"
  if ! [[ $after =~ ^0+$ ]]; then

    if [[ $before =~ ^0+$ ]]; then
      revlist="$(git rev-list "$after" --not --branches=*)"
    else
      revlist="$(git rev-list "$before..$after")"
    fi

@grumbelbart
Copy link

It seems that the bitbucket API has no way of running

git rev-list $new_commit --not --branches=*

If I understand this correctly. Some workaround to get only those commits that are really "new" to the object store would be required.

@grumbelbart
Copy link

Apparently, there is a way of filtering those commits which are new to the repository: https://answers.atlassian.com/questions/223143/how-to-get-only-new-changesets-in-pre-receive-hook-on-new-branch-push

@tarnowsc
Copy link

tarnowsc commented Feb 8, 2017

I've run into the same performance issue and updated the plugin to use diff-tree. Now on repo with >50k commits and >15k files the response is instant, whereas earlier I was getting timeouts.

Those changes are available here #26.

@christiangalsterer
Copy link
Owner

Thank you very much for the PR.
Currently I'm travelling, but I will look into it in roughly one week.

@tarnowsc
Copy link

I've found a limitation on using the diff-tree, it squashes the delta and omits files that were both added and removed in the range of commits.

My case is to clean up a repository with 5 years of history from binary files and to protect it from further degradation. Unfortunately it takes just hours for people to pollute it with branches containing old history and it makes whole cleanup pointless.

I was trying to modify the diff-tree call a bit unfortunately with no luck.

Now I'm looking into using git-log ...

But maybe there is some better way?

@mithun
Copy link

mithun commented May 18, 2017

Bitbucket Server 5.0 introduced a new ADDED_TO_REPOSITORY filter that can be used to filter and examine only new commits that will be added to the repository

@christiangalsterer
Copy link
Owner

christiangalsterer commented May 23, 2017 via email

@edenzik
Copy link

edenzik commented Jul 27, 2017

Also hitting this bug.

@christiangalsterer
Copy link
Owner

I have started to use the new Bitbucket API based on the changes done in #26, but I will need a few more days.

@christiangalsterer
Copy link
Owner

christiangalsterer commented Nov 11, 2017

@tarnowsc: You mentioned that you had some issues with the PR and that something is squashed.
I'm not quite sure what you mean with squashing in this special case and I tried to reproduce it, but I didn't found any issues. I know that some time has already passed since you provided the PR but it would be great if you can still share some more details on the problem you mentioned.

@christiangalsterer
Copy link
Owner

I think I was able to reproduce now the issue. Will check for alternatives.

@tarnowsc
Copy link

Hi @christiangalsterer to reproduce this just add a violating file and remove it in subsequent commit. If you check those two commits the squashed diff is fine, nevertheless the history contains a file that violates your rules.

@bbgobie
Copy link

bbgobie commented Dec 18, 2017

We are also having this issue.

We are on bitbucket 4.14.2
We used your source to do a similar plugin in which we fixed this bug, but I haven't tested the squash rebase issue mentioned. Are you planning to fix this in a release for version 5 and up only? I can try and apply our fix and send you a PR if you are planning to fix 4.x as well.

@christiangalsterer
Copy link
Owner

If you have a fix/solution available for this problem this would be assume. I and some other contributors tried different approaches and they all had some other drawbacks.

Before opening a PR can you share some code or link to your plugin to have a look how you solved it just to make sure that you don't spend time in creating a PR and it is maybe a approach we have already tried. Of course if you have a PR already available and/or is easy to create you can also create a PR.

Thanks in advance.

@raspy
Copy link
Contributor

raspy commented Dec 22, 2017

It seems that this is popular, so I have made my own updates to the plugin myself as well :-) My attempt however was different than presented here before; instead of rewriting everything and going into diff-tree call (which AFAIK will always produce delta before first and last commit, thus skipping all changes introduced in the meantime - "the squash rebase issue") I have reworked your original approach by reducing number of git operations required for different steps.

I am working on requesting a PR for the change, but I have to deal with some company OSS contributions regulations first, so probably not before January.

Let me just share my results: I have tested scenarios with file size hook with two different patterns set:

  • pushing 3 branches (2244, 1470 and 2130 new commits respectively) with some commits changing over 2000 files - hook time inspection: 4.5 seconds (reduced from 55 seconds),
  • pushing whole repository (2868 branches, 1.8 GiB in total) - hook time inspection: 89 seconds (previously hook was killed by Bitbucket after 3 hours of running).

Hope to get the code introduced into official release next year :-)

@christiangalsterer
Copy link
Owner

Reading the performance improvements, it seems you did a marvellous job and I'm really looking forward for the PR 👍

@bbgobie
Copy link

bbgobie commented Jan 19, 2018

Sorry, went on vacation and forgot about this.

Here's a snippet we use to use rev-list to see what changes are actually new.
We also inspect every commit, for "violations" so in theory I could make a commit that has a violation and a 2nd commit that reverts it and push it in one shot. This would be blocked by the implementation we have even though the "squashed" changes wouldn't be. We felt that every change in our repo should be buildable and correct.

public List<BasicCommit> getNewCommitsFromRefChange(final Repository repository, final RefChange refChange) {
        return ((GitScmCommandBuilder) scmService.createBuilder(repository))
                .revList()
                .format(CommitOutputHandler.FORMAT)
                .revs(refChange.getToHash(), "--not", "--all")
                .build(new CommitOutputHandler())
                .call();
    }

public List<BasicChange> getChangesFromCommits(final Repository repository, final String endCommit) {
        return getChangesFromCommits(repository,endCommit,endCommit);
}       

public List<BasicChange> getChangesFromCommits(final Repository repository, final String startCommit, final String endCommit) {    		
        return ((GitScmCommandBuilder) scmService.createBuilder(repository))
        		.command("diff")
        		.argument(startCommit+ "^1" + ".." + endCommit)
        		.argument("--name-status")
        		.build(new DiffOutputHandler())
        		.call();   	
}  

@bbgobie
Copy link

bbgobie commented Jan 19, 2018

Sorry, not sure why the first function isn't part of the code block

@christiangalsterer
Copy link
Owner

Hi,

thanks for the example. Will hopefully have some time in the next days to look into it.

@christiangalsterer
Copy link
Owner

Quick question I assume that the first and last commit from the result from getNewCommitsFromRefChange are the startCommit and endCommit parameter for getChangesFromCommits?

@raspy
Copy link
Contributor

raspy commented Jan 29, 2018

I have submitted pull request mentioned in December, please review at your spare time. It also uses ScmService, but keeps using Bitbucket API instead of custom git diff calls. I believe this should be sufficient.

raspy added a commit to raspy/stash-filehooks-plugin that referenced this issue Feb 1, 2018
@christiangalsterer
Copy link
Owner

christiangalsterer commented Feb 28, 2018

A version 3.3.0 is now available in the Atlassian Marketplace for Bitbucket versions 5.2.x and newer.
A version for older Bitbucket 4.x versions will be released in the next days.

Thanks again to @raspy for his great contribution to fix this issue.

@Mincol
Copy link

Mincol commented Nov 13, 2018

@christiangalsterer is there still plan to release this fix for 4.x?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests