Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

git history can get borked #39

Closed
dsabeti opened this Issue Mar 4, 2016 · 19 comments

Comments

Projects
None yet
4 participants
@dsabeti
Copy link

dsabeti commented Mar 4, 2016

Our git-resource for cf-deployment is having a few problems.

First, it seems to fetch the wrong branch. The desired branch is develop, which is at SHA 210655a. Instead, it seems to be fetching fa43ce from a 5-month-old WIP branch that we've since deleted. There hasn't been any activity on this branch, but for some reason, the three commits on this branch are the three topmost versions of git resource in the concourse UI.

Second, even without the inclusion of this WIP branch, the remainder of the history is missing the most recent commits on develop.

screen shot 2016-03-04 at 12 12 10 pm
screen shot 2016-03-04 at 12 12 36 pm

As you can see, ref 3e48b is the most recent commit in the history of develop that the git resource his picked up (it doesn't seem to have found refs 2782123, 9f545d2, or 210655a, even though they are all in the history of develop).

We're not sure what's causing this, or how to find out the root cause. (We were trying to delete a bunch of data from the build_events table. We're not sure how this would affect resource versions, but it could be that something was going wrong wit hthe db.) We're going to work around it by re-building the pipeline.

@concourse-bot

This comment has been minimized.

Copy link

concourse-bot commented Mar 4, 2016

Hi there!

We use Pivotal Tracker to provide visibility into what our team is working on. A story for this issue has been automatically created.

The current status is as follows:

This comment, as well as the labels on the issue, will be automatically updated as the status in Tracker changes.

@dsabeti

This comment has been minimized.

Copy link
Author

dsabeti commented Mar 18, 2016

We had another instance of this problem, where version history is out of order.

Git history:
screen shot 2016-02-18 at 3 59 12 pm

Concourse version history:
screen shot 2016-02-18 at 4 01 18 pm

As you can see, 813d597 is a merge commit with 7efa044 as one of its parents. Concourse has them out of order. We had to rename our resource to get this to work properly.

@vito

This comment has been minimized.

Copy link
Member

vito commented Jul 13, 2016

have you seen this recently? honestly not sure if there's much we can do for this without a way to reproduce it

@concourse-bot concourse-bot added in-flight and removed unscheduled labels Jul 13, 2016

@vito

This comment has been minimized.

Copy link
Member

vito commented Jul 24, 2016

closing; don't think there's much we can do for this right now. thanks for the info/screenshots though. i kinda wonder if this was some server-side fluke, to be honest. (rogue github server?)

@vito vito closed this Jul 24, 2016

@chipx86

This comment has been minimized.

Copy link

chipx86 commented Jul 31, 2016

I hit the same problem tonight. I'm leaving it in its broken state so I can gather information for you. The repository in question is open source, and I can provide any information needed to help diagnose this.

I'm running Concourse 1.6. I upgraded to this from 1.1 earlier today.

From the looks of it, the problem has to do with the latest commit on the target branch being a merge commit.

Let's say I have two branches in this example: release-0.7.x and release-0.6.x.

release-0.6.x is very old, containing very few changes. release-0.7.x is the stable release branch. At one point, a commit is made to release-0.6.x, and the branch is then merged into release-0.7.x. The resulting tree looks like this:

*   2cc3a55 - (HEAD, origin/release-0.7.x, release-0.7.x) Merge branch 'release-0.6.x'
|\
| * e12eda8 - (tag: release-0.6.10, origin/release-0.6.x, release-0.6.x) Django Evolution 0.6.10.
* | 9a61baf - Bump to 0.7.7 dev. (4 months ago) <Christian Hammond>

Concourse is set up to define a resource for the release-0.7.x branch, looking like this:

    - name: django-evolution-0.7.x
      type: git
      source:
          uri: git@github.com:beanbaginc/django-evolution.git
          branch: release-0.7.x
          private_key: {{django-evolution-git-private-key}}

It's almost as if the git resource is seeing the merge commit and jumping down to a parent commit on the wrong branch. I decided to run with that theory and dig into the version of the code in the git-resource shipped with Concourse 1.6, running the commands manually. It resulted in the correct SHA (2cc3a55) being fetched. I got the same result manually running check and in with the payloads used for my pipeline.

So it doesn't appear to be some major glitch in the resource scripts.

I dug into the build's history and saw that this began happening yesterday. We have made no commits to the tree in a few months, so it wasn't that. However, I have been working on getting our pipeline wired up, and some of the new jobs referenced the same resource. It's a bit hard right now for me to tell when the first incorrect revision was grabbed, and of course I have no idea why it was the wrong one anyway.

I did look in the reflogs for the affected builds. It appears that the initial clone was done, and then the commit in question was checked out, so it's definitely not starting with that commit. Obviously, there's not a lot of information to work with here, as the reflog doesn't give the commands or any sort of context for this.

A few questions:

  1. Is it possible for a task to alter the cached repository? For instance, if a script for some reason checked out an older commit in the tree it has access to, would this be altering the cached version, affecting other scripts?
  2. Where can I find the cache? Been looking all around, but I seem to be missing it.
  3. Any information I can provide to help diagnose this? It's a scary problem, and doesn't appear to be isolated to @dsabeti at this point. I'd be very interested in helping track it down.
@chipx86

This comment has been minimized.

Copy link

chipx86 commented Jul 31, 2016

3:36AM.. Great time for debugging... I realized that of course that cache data's in the database.

So I have two entries that reference this SHA. One (ID 388, in my case) contains all the information on the commit (in metadata), and the other (ID 401) containingnullinmetadata`. Each referencing a different resource ID.

-[ RECORD 1 ]-...
id            | 401
version       | {"ref":"e12eda87fd41f083fcf744870ffd2039d766aa91"}
metadata      | null
type          | git
enabled       | t
resource_id   | 23
modified_time | 2016-07-31 03:11:58.630715
check_order   | 64

-[ RECORD 2]-+-...
id            | 388
version       | {"ref":"e12eda87fd41f083fcf744870ffd2039d766aa91"}
metadata      | [{"Name":"commit","Value":"e12eda87fd41f083fcf744870ffd2039d766aa91"},{"Name":"author","Value":"Christian Hammond"},{"Name":"author_date","Value":"2016-02-21 23:57:48 -0800"},{"Name":"tags","Value":"release-0.6.10"},{"Name":"message","Value":"Release Django Evolution 0.6.10.\n\nIf you're reading this, you may be wondering what's in this release,\ngiven the commit history. Well, just a README update, actually. Surely\nthat's not worthy of a release, you may think. Well, it's not, but\nhere's the thing...\n\nThe 0.6.9 build was once uploaded to PyPI, but we removed it in favor of\nusing an external index, which is no longer supported by PyPI. So we\nwanted to move back, but PyPI does not allow existing filenames to be\nreleased. So, 0.6.9 is basically gone forever...\n\nSo in order to make a 0.6.x release installable via easy_install or pip,\nwe must produce a new release.\n\nThanks, PyPI.\n"}]
type          | git
enabled       | t
resource_id   | 9
modified_time | 2016-07-31 03:38:38.342691
check_order   | 212

I checked, and the one with the commit information is the resource I expected (django-evolution-0.7.x, above).

The second one with the null metadata references my brand new resource meant to temporarily work around this problem: django-evolution-0.7.x-2. Where is that coming from?

So I looked to see what cached info I had for that resource.

atc=# SELECT * FROM versioned_resource WHERE resource_id=23;

-[ RECORD 1]-...
id            | 398
version       | {"ref":"2cc3a5578f89d21da4d7bd2a6a67a250bf55a208"}
metadata      | [{"Name":"commit","Value":"2cc3a5578f89d21da4d7bd2a6a67a250bf55a208"},{"Name":"author","Value":"Christian Hammond"},{"Name":"author_date","Value":"2016-04-01 18:57:38 -0700"},{"Name":"committer","Value":"Christian Hammond"},{"Name":"committer_date","Value":"2016-04-01 18:57:38 -0700"},{"Name":"branch","Value":"release-0.7.x"},{"Name":"message","Value":"Merge branch 'release-0.6.x' into release-0.7.x\n"}]
type          | git
enabled       | t
resource_id   | 23
modified_time | 2016-07-31 03:38:38.342691
check_order   | 67

-[ RECORD 2]-...
id            | 401
version       | {"ref":"e12eda87fd41f083fcf744870ffd2039d766aa91"}
metadata      | null
type          | git
enabled       | t
resource_id   | 23
modified_time | 2016-07-31 03:11:58.630715
check_order   | 66

That seems odd to me. It's an older entry, with a lower check order (I am making assumptions about what this means, and I'm probably wrong here), enabled, and no metadata. Now, I just created this resource earlier to test things. I literally added the new entry by:

  1. Copying/pasting from the old
  2. Pushing the pipeline config
  3. Realizing I forgot to update a task to include that (alongside the broken one). Updated and re-pushed (in case this matters).
  4. Re-ran a job.

I saw (and continue to see) the django-evolution-0.7.x (with the bad SHA) and django-evolution-0.7.x-2 (with the good SHA) in there. However, I now wonder if it's going to revert to the bad one at some point as well, given that the one entry is in there (still don't know how it got there!).

I checked the tables for the django-evolution-0.7.x resource to see what cached info is in there, ordering by check_order. I see a bunch of entries, some with null metadata, some with legit items. The highest check_order has the bad SHA. I have plenty of other entries in there covering all sorts of commits from many months prior (many the same exact timestamp!), but nothing containing the good commit.

Is there any chance this is in part due to some migration that may have happened when upgrading to the latest release of Concourse? Should I just wipe a bunch of this state and pretend it never happened? I don't know the changes made to the product at any low level to say, so hopefully some of this information is useful to you.

@vito

This comment has been minimized.

Copy link
Member

vito commented Jul 31, 2016

Thanks for looking into this, @chipx86! I'll reopen the issue to keep tabs on it.

A few clarifications that may help:

  • Missing metadata is normal; right now /check just returns the versions alone, in order (this is how check_order is determined), and /in and /out populate it (so when you run a get or put step). This will probably be fixed as part of concourse/concourse#534
  • Having the same timestamp is also normal; that's just the date the entry was inserted or updated in the table, and with commits it's likely that we'll save a bunch at once.
  • Seeing a version repeated in versioned_resources is also normal so long as the resource_id is different. Existing versions with null metadata will be updated in-place once it's available (by being used in a build).
  • check_order is what determines our linear ordering of all of the versions. It is determined by the order of versions returned by /check. Notably, we also converge on this value, i.e. if you were to run fly check-resource -f ref:<initial commit>, you should get a full history of all commits that the resource sees. Doing that may actually be useful for debugging purposes in this case?
  • It's impossible for a task to mutate a cached checkout. Tasks get a copy-on-write volume of the original, and have no side-effects in the database.
  • Just to clarify terminology, let's refer to the versions in the database as the 'version history' or something, not cache, since that means something else in Concourse (the actual cached bits).
  • If you've configured a new resource and were able to cause this to happen with that new resource, that would indicate that this is unrelated to migrations.

I think it's unlikely for this to be related to migrations; if it's picking up commits that it shouldn't see at all, that'd be super weird and likely a bug in the resource. If the commits are in the wrong order, then maybe, but that still seems unlikely.

@vito vito reopened this Jul 31, 2016

@concourse-bot concourse-bot added unscheduled and removed in-flight labels Jul 31, 2016

@chipx86

This comment has been minimized.

Copy link

chipx86 commented Jul 31, 2016

Thanks for the info! Very helpful. I'm brand new to Concourse, so still feeling my way through this.

The new resource definitely has the bad SHA, and shouldn't have. I'm going to try again tonight to make it happen with a third resource, and do a better job of checking state after each operation, so I can help narrow this down..

@chipx86

This comment has been minimized.

Copy link

chipx86 commented Aug 1, 2016

Okay, I ran a test here, and need to get some additional information.

In my test, I created one more copy of the resource (django-evolution-0.7.x-3) at 01:58:13. I did not associate it with any jobs.

I repeatedly queried versioned_resources for the new resource ID, and saw an entry with the correct SHA:

id            | 403
version       | {"ref":"2cc3a5578f89d21da4d7bd2a6a67a250bf55a208"}
metadata      | null
type          | git
enabled       | t
resource_id   | 24
modified_time | 2016-08-01 01:58:48.36734
check_order   | 3

I left everything alone, and then continually queried again. A minute later, a new entry appeared with the wrong SHA:

id            | 404
version       | {"ref":"e12eda87fd41f083fcf744870ffd2039d766aa91"}
metadata      | null
type          | git
enabled       | t
resource_id   | 24
modified_time | 2016-08-01 01:59:49.778289
check_order   | 2

Somehow, an entry with the wrong SHA is appearing. I can reproduce this any number of times. Every new resource ends up with the bad entry, almost exactly one minute apart.

So is this running check the first time to create the cached resource, and then running check again against the clone? Or is there some other mechanism in play that I should be looking into?

I wasn't able to reproduce getting the wrong SHA by hand in a clone, but am still messing around with it. However, I have a successful, reliable repro case, so there's that!

@chipx86

This comment has been minimized.

Copy link

chipx86 commented Aug 1, 2016

I think I see what's happening.

So assuming check, when called a second time, is passed the previous ref in the payload's version.ref, then it looks like the log query is causing this revision to appear. Here, that translates to:

git log --format='%H' --reverse 2cc3a5578f89d21da4d7bd2a6a67a250bf55a208~1..HEAD

Which results in:

e12eda87fd41f083fcf744870ffd2039d766aa91
2cc3a5578f89d21da4d7bd2a6a67a250bf55a208

And that's where the bad SHA is coming from. How that ends up sometimes becoming the primary SHA used for pulling down the repository, I don't know, but at least now I know where that's coming from.

The computed log range is trying to find all ancestors of the given commit, including the commit itself. That is, "Give me all the log entries from the last ref to now, including that ref (using ~1 in order to start at the previous commit, due to the first commit in the range being excluded)." However, in the case of a merge commit, it's including the other parent of that commit in the list.

I think ideally, this should be returning all the merge commits, but not the commits from the other branches being merged in, correct? If so, this can use --first-parent, which will ensure we have those merge commits but not commits actually on those other branches showing up.

Here's how a tree may look without --first-parent:

* |   539aaf4 - Merge branch 'release-0.8.x' into release-0.9.x (9 weeks ago) <Barret
|\ \
| |/
| * a4e336d - Fix up ending tags in config page (9 weeks ago) <Barret Rennie>
* | 310d833 - Only animate the datagrid menu when it is not animating (9 weeks ago) <B
* |   63d521c - Merge branch 'release-0.8.x' into release-0.9.x (9 weeks ago) <Barret
|\ \
| |/
| * 35668b5 - Fix configuration forms not rendering JS config properly (9 weeks ago) <
* | b9dccbf - Add signals and attributes for datagrid mobile mode (3 months ago) <Barr
* |   09df21f - Merge branch 'release-0.8.x' into release-0.9.x (4 months ago) <Christ
|\ \
| |/
| * 15e3ef7 - Bump to 0.8.27 dev. (4 months ago) <Christian Hammond>
* | 24d8e29 - Bump to 0.9.4 dev. (4 months ago) <Christian Hammond>

There's lots of commits on the wrong branches that can end up in the resulting list of refs. But with --first-parent:

* 539aaf4 - Merge branch 'release-0.8.x' into release-0.9.x (9 weeks ago) <Barret Renn
* 310d833 - Only animate the datagrid menu when it is not animating (9 weeks ago) <Bar
* 63d521c - Merge branch 'release-0.8.x' into release-0.9.x (9 weeks ago) <Barret Renn
* b9dccbf - Add signals and attributes for datagrid mobile mode (3 months ago) <Barret
* 09df21f - Merge branch 'release-0.8.x' into release-0.9.x (4 months ago) <Christian
* 24d8e29 - Bump to 0.9.4 dev. (4 months ago) <Christian Hammond>

Now we only get what belongs to this branch.

This nicely solves the problem being hit in this bug, as the log call above becomes:

$ git log --first-parent --format='%H' --reverse 2cc3a5578f89d21da4d7bd2a6a67a250bf55a208~1..HEAD
2cc3a5578f89d21da4d7bd2a6a67a250bf55a208
@chipx86

This comment has been minimized.

Copy link

chipx86 commented Aug 1, 2016

Just to follow up (sorry for the flooded inbox here!) I applied the changes to my git-resource's rootfs.tar.gz, regenerated the version file, and added another copy of my resource.

Gave it a few minutes for the checks to occur, and found that the bad SHA never ended up in the database.

Of course, my original resource is still stuck on the bad ref, but I suspect as soon as I push an update (or wipe the versioned_resource entries), I'll be back in shape, without having to worry about this happening again.

@vito

This comment has been minimized.

Copy link
Member

vito commented Aug 1, 2016

@chipx86 Cheers for the sleuthing! I don't mind the spam at all.

I'm curious about one thing though. It should have at least had the correct commit saved as the latest one, given that it returned them in chronological order. Can you show the resource page listing the versions in order? It should have 2cc3a... up top with e12eda... below it.

--first-parent does sound like it would work, but I wonder if there's a more semantic flag we can pass on, something like "only on this branch". (Maybe --branches?) Can you link me to the repo so I can see how e.g. GitHub interprets the commit history?

@chipx86

This comment has been minimized.

Copy link

chipx86 commented Aug 1, 2016

So I unfortunately screwed up in an attempt to debug something and nuked the entries from versioned_resource corresponding to my initial broken resource... I do fortunately have a verbose SELECT of all the entries in a terminal, though (which I've saved to a text file). I can try to reconstruct their entries in the database from that. Alternatively, I can send you the data dump for everything in that table relevant to that resource ID.

All the dummy resources I've created have listed the refs in the correct order. They just had the bad entry in there, which is what set me toward trying to diagnose that.

@chipx86

This comment has been minimized.

Copy link

chipx86 commented Aug 2, 2016

It is entirely possible I am just really dumb.

As I was working to restore the entries for the database (which I've done), I noticed that the correct ref was set as disabled. That obviously would do it... I don't remember ever turning this off, and I was pretty sure I checked that before, but the database says otherwise. Sure enough, disabling the good ref reproduces the problem, and re-enabling it fixes the problem. Which is as designed.

Facepalm

(I don't suppose there's some automatic way that this would have been turned off without it being my fault? ;)

Anyway, I think the git change above is still relevant, as my version history is littered with commits from wrong branches, and that can lead to confusion and potentially other problems. Disabling the one ref shouldn't ever result in a different branch being built (and if it hadn't, I probably wouldn't have gone down this rabbit hole).

As for --first-parent, my understanding is that it actually is the "only on this branch" flag, in this particular case. When you do git merge some-branch, the merge commit is set up with two parents: Current branch, and some-branch, in that order. The current branch becomes the first parent of the merge commit. When you're on a branch, and you only want to see the individual commits and the merge commits (not the individual commits on a branch being merged in) in the log output, you'd use --first-parent.

That may also still fix the original problem filed by @dsabeti. If you have a complex series of interwoven merge commits, it may walk the graph in such a way where commits appear out of order from what you'd expect, which I think is what's happening above (but I don't have enough of that tree to be able to tell).

@vito

This comment has been minimized.

Copy link
Member

vito commented Aug 2, 2016

@chipx86 Resource versions are only ever manually enabled or disabled. :) But something must have gone weird for you to have ended up here, anyway. I doubt you randomly went and disabled a version. (I guess it could have been a mis-click.)

I'm still curious as to how these merges would show up on GitHub's UI. A commit being merged from branch B to A will/should make it show up in the history for git log and other UIs, but I think the question we're raising here is if that's semantically valuable as far as this resource and CI in general is concerned.

I'm thinking from a CI perspective, since it's monitoring one branch, it's inaccurate to track individual refs from other branches except for the commit merging them in, as there may have been conflicts. If it's a fast-forward merge things shouldn't really matter (there's no real way to tell anyway).

@chipx86

This comment has been minimized.

Copy link

chipx86 commented Aug 2, 2016

I probably did mis-click. It was late at night and who knows what happened. (That or gremlins. Probably gremlins, now that I think about it.)

A merge commit is a valid commit like any other, and an important part of the history of a branch that we'd want to include. It differs from any other commit just in that it has more than one parent, but it still represents all those changes from the branch being merged in, and should be able to be tested like any other.

A UI displaying a flat list of commits would start at HEAD, display info, grab the first parent of the commit, display info on that, grab the first parent of that, etc. Normally we see the graph form where it's traversing all parents of all commits and rendering that as parallel branches of history.

I agree that from a CI perspective, it's inaccurate to track individual refs from other branches. That's what Concourse is currently doing today whenever it finds new merge commits between the last stored ref and HEAD, and what --first-parent prevents.

@vito

This comment has been minimized.

Copy link
Member

vito commented Aug 2, 2016

Yep, confirmed in a local repo that --first-parent does what we want and --branches doesn't.

@chipx86

This comment has been minimized.

Copy link

chipx86 commented Aug 2, 2016

Thanks! :)

@vito

This comment has been minimized.

Copy link
Member

vito commented Aug 2, 2016

np. thanks for looking into it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.