Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add git-based detection of tags at HEAD to improve PublicRelease detection #876

Merged
merged 30 commits into from
Sep 16, 2023

Conversation

georg-jung
Copy link
Contributor

@georg-jung georg-jung commented Dec 15, 2022

Fixes #873.

This implements my second idea described in detail in #873.

My thoughts form there:

  • It would just work without any configuration needed (one might argue it's a breaking change though; could be opt-in to mitigate, wouldn't just work then).
  • It hits the notch that git is the only source of truth.
  • With more complicated pipelines and changing environments (e.g. the nbgv step vs. inside a docker build container), it is still straight forward (if .git is available, it works) and we don't need to worry about is this env var for the override available inside the container that performs the build?
  • It feels natural, because it does what I specified in version.json and what I specified there does work in the typical case, where GITHUB_REF is not stale.
  • It makes local builds where the tag is checked out equivalent to cloud builds initiated by tag push. Currently "^refs/tags/v\\d+\\.\\d+" doesn't have any effect locally if I get it right - it does in CI environments though.

This comes with parsing support for annotated tags for managed git and LibGit2. Lightweight tags pointing HEAD are considered as well as annotated tags. Nested annotated tags that indirectly point at HEAD are not considered (intentionally, changing probably wouldn't be hard). BuildingRef value is not changed.

Currently, the output of nbgv get-version -f json is extended as follows:

  "VersionHeight": 3,
  "VersionHeightOffset": 0,
  "BuildingRef": "2e6084bfeb07d7f8e426b52aa51bd7517faf5610",
  "BuildingTags": [
    "refs/tags/possiblyASecondTag",
    "refs/tags/v0.7.3",
  ],
  "Version": "0.7.3.11872",
  "CloudBuildAllVarsEnabled": false,
  "CloudBuildAllVars": {
    // ...
    "NBGV_BuildingRef": "2e6084bfeb07d7f8e426b52aa51bd7517faf5610",
    "NBGV_BuildingTags": "System.Collections.Generic.List`1[System.String]",
    "NBGV_Version": "0.7.3.11872",
    // ...
  },
  "CloudBuildVersionVarsEnabled": true,

Obviously the NBGV_BuildingTags value isn't here to stay. I thought I'd leave this for discussion, which of these values should be generated and what their value should be. I added an [Ignore] to skip it in a3089b1.

Probably I made some opinionated decisions when putting this together, don't hesitate to change what you don't like.

* Managed git does not support annotated tags yet
* json output reads: "NBGV_BuildingTags": "System.Collections.Generic.List`1[System.String]",
@dnfadmin
Copy link

dnfadmin commented Dec 15, 2022

CLA assistant check
All CLA requirements met.

@georg-jung
Copy link
Contributor Author

In 3c93a89 I added a workaround for GitPack.TryGetObject which kind of violates its implicit contract (it throws if it can not get an object of the requested type). I wasn't sure about the best way to fix this. Also this is kind of unrelated to the rest of this PR.

@AArnott AArnott changed the title Add git-based detection of tags at HEAD to improve RublicRelease detection, fix #873 Add git-based detection of tags at HEAD to improve PublicRelease detection, fix #873 Dec 15, 2022
Throwing was intentional to match HeadCanonicalName behaviour. Throwing fails the tests though. Maybe this is because the tests run in a cloud build environment and HeadCanonicalName isn't used there?

this.BuildingRef = cloudBuild?.BuildingTag ?? cloudBuild?.BuildingBranch ?? context.HeadCanonicalName;

This would mean that the tests would currently fail when executed locally.
Return null if no HEAD could be determined, return an empty collection if there are no matching tags.
@georg-jung
Copy link
Contributor Author

@AArnott anything I could do from my side to assist/simplify getting this merged?

@AArnott
Copy link
Collaborator

AArnott commented Feb 16, 2023

I'm looking at this now and preparing some local changes to push to your PR.

Copy link
Collaborator

@AArnott AArnott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for putting this together. I have several comments I look forward to discussing with you.

src/NerdBank.GitVersioning/ManagedGit/GitAnnotatedTag.cs Outdated Show resolved Hide resolved
Comment on lines 669 to 671
var tagObjId = GitObjectId.Parse(line.Substring(0, 40));
var refName = line.Substring(41);

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to not handle a line pattern that I see in one of my repos:

^3d2137a79ee4e3621696cfbde8d4f1c0e98bc5f6
40614e83b9b3892e05efd6155c3b61035bb5542a refs/tags/drop/a.proddiag/official.00000.00
^2f0a22666a958c15f7e7486a0a14490f7b8e210f
9aa2dc36733d3f3076702b31a3dc754c144a9473 refs/tags/drop/a.proddiag/official.26601.00

See the leading caret? I don't know what that means, but it seems your line parsing assumes no such prefix exists.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ^ prefix indicates a "peeled line" that lists the object ID referenced by the tag object on the previous line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

git's source supports that: https://github.com/git/git/blob/d9d677b2d8cc5f70499db04e633ba7a400f64cbf/refs/packed-backend.c#L542

A "record" here is one line for the reference itself and zero or one peel lines that start with '^'.

Thus, if I understand correctly, simply skipping these lines would be functionally equivalent to if they didn't exist, because we'd peel the refs in the next step. Having them does provide us with a way to speed things up though and hits the notch @AArnott mentioned here - we could decide if we're interested in this ref in place. Right?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried a tag referencing another tag referencing a commit, and the peeled line after the first tag listed the object ID of the second tag, not the commit ID. So if you implement some optimisation that uses those lines, please consider this scenario.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to know, thanks! Actually I think this behaviour makes sense and is also in line with what the code in this PR currently does. My take on "tags at HEAD" behaves like git tag --points-at HEAD would. Means it includes any lightweight as well as annotated tags pointing at HEAD. But not any tags that point at HEAD transitively. A git show HEAD on the other hand does include transitive tags. Not including transitive tags might lead to less lookups => increased performance. In a broken repo, lookup including transitive tags could lead to an infinite loop.

I guess transitive tags are kind of an edge case so my plan was to just not support them at this point. I think specifying "We do what git tag --points-at HEAD does" makes sense too. If the need to support transitive tags would arise, a second PR would probably be easier than this first one. What do you think @AArnott?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main concern is that the behavior should not depend on whether the peeled lines are in packed-refs or not. Whether nbgv recursively follows tags in general is a matter of policy and either choice is fine with me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. You might want to take a look at my two recent commits in that regard.

src/NerdBank.GitVersioning/ManagedGit/GitPack.cs Outdated Show resolved Hide resolved
{
if (objectId.Equals(tagObjId))
{
tags.Add(tagNameCandidate);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filtering the tags at this point is quite late. NB.GV functions in some repos with tens of thousands of tags, and the code would have parsed and allocated multiple strings for all of them. I think we'll need to optimize this by walking the tags and immediately skipping lines with non-matching object IDs before we allocate anything.

Copy link
Contributor Author

@georg-jung georg-jung Feb 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this can be solved without the candidate list and that the number of string allocations can be reduced. The current implementation without tag support does however allocate multiple strings per packed-ref line too and it doesn't seem to be a performance hit or is there some detail I'm missing?

Looking up the target of an annotated tag might lead to an IO operation - read the relevant git pack. Thus, I'm wondering if collecting the candidates first wouldn't have a minor impact compared to opening the file.

I don't have any that large repo at hand. If this turns out to have a substantial negative performance impact, it might be an option to make annotated tag support opt-in/out, as reading the packed-refs file itself (plus the files in refs/tags) is sufficient for lightweight tag lookup. Edit: with the two commits incl. 49b9dc9 we don't need additional IO operations for annotated tags in packed-refs if peel lines are available.

I'm working on a commit to remove the candidate list and reduce the string allocations though. Edit: see a84ad8b

Without this fix we might have considered one level of transitive annotated tags, if a peel line is present in packed-refs. With this fix we also avoid reading the git pack of annotated tags that have peel lines and do not match the object id we are looking for.
@KalleOlaviNiemitalo
Copy link

I thought there should be something to detect if reftable is used, and fall back to LibGit2Context in that case, but…

@AArnott
Copy link
Collaborator

AArnott commented Feb 17, 2023

The build failure reproduces in main, so it's unlikely to be due to this PR. I don't know how that happened, but I'll be investigating it.

@AArnott
Copy link
Collaborator

AArnott commented Feb 23, 2023

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@AArnott AArnott enabled auto-merge (squash) March 7, 2023 20:41
@AArnott AArnott disabled auto-merge March 7, 2023 20:46
@AArnott
Copy link
Collaborator

AArnott commented Mar 7, 2023

@georg-jung I was prepared to merge this, but happened to hit an error related to the change. Can you check this out?

[20:42:24] Error: Unhandled exception: Newtonsoft.Json.JsonSerializationException: Error getting value from 'Tags' on 'Nerdbank.GitVersioning.VersionOracle'.
 ---> System.IndexOutOfRangeException: Index was outside the bounds of the array.
   at Nerdbank.GitVersioning.ManagedGit.GitObjectId.ParseHex(ReadOnlySpan`1 value) in /home/vsts/work/1/s/src/NerdBank.GitVersioning/ManagedGit/GitObjectId.cs:line 131
   at Nerdbank.GitVersioning.ManagedGit.GitAnnotatedTagReader.ReadObject(ReadOnlySpan`1 line) in /home/vsts/work/1/s/src/NerdBank.GitVersioning/ManagedGit/GitAnnotatedTagReader.cs:line 100
   at Nerdbank.GitVersioning.ManagedGit.GitAnnotatedTagReader.Read(ReadOnlySpan`1 tag, GitObjectId sha) in /home/vsts/work/1/s/src/NerdBank.GitVersioning/ManagedGit/GitAnnotatedTagReader.cs:line 72
   at Nerdbank.GitVersioning.ManagedGit.GitAnnotatedTagReader.Read(Stream stream, GitObjectId sha) in /home/vsts/work/1/s/src/NerdBank.GitVersioning/ManagedGit/GitAnnotatedTagReader.cs:line 48
   at Nerdbank.GitVersioning.ManagedGit.GitRepository.LookupTags(GitObjectId objectId) in /home/vsts/work/1/s/src/NerdBank.GitVersioning/ManagedGit/GitRepository.cs:line 678
   at Nerdbank.GitVersioning.Managed.ManagedGitContext.get_HeadTags() in /home/vsts/work/1/s/src/NerdBank.GitVersioning/Managed/ManagedGitContext.cs:line 64
   at Newtonsoft.Json.Serialization.ExpressionValueProvider.GetValue(Object target)

@georg-jung
Copy link
Contributor Author

@dotnet-policy-service agree

@georg-jung
Copy link
Contributor Author

georg-jung commented Mar 10, 2023

Hey @AArnott,

sorry for the delay. As a first measure, I tried to reproduce this and created a branch in my fork that incorporates the recent fixes in main (to make the pipelines in my repo work again) as well as this PR's code.

I wasn't able to reproduce though. The stacktrace points to a piece of code that wasn't touched by this PR - altough I don't really understand the line number in the stacktrace, as that line only reads var objectId = default(GitObjectId);. So I decided to ignore the line number for now and looked at that method in whole but I wasn't able to find any error and still came to the conclusion that this piece wasn't touched here.

What are the chances this was just some temporary hick up with the pipeline? An IO error in the checkout?

Does anything stop us form just running that pipeline again and see if its reproducible?

@AArnott
Copy link
Collaborator

AArnott commented Mar 17, 2023

The stacktrace points to a piece of code that wasn't touched by this PR - altough I don't really understand the line number in the stacktrace, as that line only reads var objectId = default(GitObjectId);

Are you looking at the same error? This is what I see:

[20:46:40] Error: Unhandled exception: Newtonsoft.Json.JsonSerializationException: Error getting value from 'Tags' on 'Nerdbank.GitVersioning.VersionOracle'.
 ---> System.IndexOutOfRangeException: Index was outside the bounds of the array.
   at Nerdbank.GitVersioning.ManagedGit.GitObjectId.ParseHex(ReadOnlySpan`1 value) in D:\a\1\s\src\NerdBank.GitVersioning\ManagedGit\GitObjectId.cs:line 131
   at Nerdbank.GitVersioning.ManagedGit.GitAnnotatedTagReader.ReadObject(ReadOnlySpan`1 line) in D:\a\1\s\src\NerdBank.GitVersioning\ManagedGit\GitAnnotatedTagReader.cs:line 100
   at Nerdbank.GitVersioning.ManagedGit.GitAnnotatedTagReader.Read(ReadOnlySpan`1 tag, GitObjectId sha) in D:\a\1\s\src\NerdBank.GitVersioning\ManagedGit\GitAnnotatedTagReader.cs:line 72
   at Nerdbank.GitVersioning.ManagedGit.GitAnnotatedTagReader.Read(Stream stream, GitObjectId sha) in D:\a\1\s\src\NerdBank.GitVersioning\ManagedGit\GitAnnotatedTagReader.cs:line 48
   at Nerdbank.GitVersioning.ManagedGit.GitRepository.LookupTags(GitObjectId objectId) in D:\a\1\s\src\NerdBank.GitVersioning\ManagedGit\GitRepository.cs:line 678
   at Nerdbank.GitVersioning.Managed.ManagedGitContext.get_HeadTags() in D:\a\1\s\src\NerdBank.GitVersioning\Managed\ManagedGitContext.cs:line 64
   at Newtonsoft.Json.Serialization.ExpressionValueProvider.GetValue(Object target)

That looks extremely relevant to this PR, considering this PR introduces GitAnnotatedTagReader which makes up several frames in the callstack.

What are the chances this was just some temporary hick up with the pipeline? An IO error in the checkout?

Checkout is long behind this failure. And exactly the same error happened on both Windows and linux, so no, I don't think we can right this one off.

Sometimes a failure can occur because of the particular way a git database is represented on disk, so that can be a challenge to repro on another clone of the repo. But it's not something we want to sweep under the rug, because if it can happen in a PR build, it can certainly happen on a dev box among the many thousands that this code runs on.

@AArnott
Copy link
Collaborator

AArnott commented Mar 17, 2023

I've been looking this over to try to figure out what may be going wrong, and it occurs to me that we don't have a single new test for the tags functionality in this PR. Sorry I missed this before, but we will need some tag tests to accept this PR as well.

@AArnott
Copy link
Collaborator

AArnott commented Mar 17, 2023

I added a test for Tags to your PR. If you can think of anything else to test, you're welcome to. We have the facility to check in whole repo databases in order to be able to test particular git db formats (pack files, peeled lines, etc) too.

@AArnott
Copy link
Collaborator

AArnott commented Mar 17, 2023

... and the re-build failed at exactly the same place, in the same way. So, at least it's reproducible. I can't reproduce it locally (yet).

@AArnott
Copy link
Collaborator

AArnott commented Mar 17, 2023

I pushed a commit that will capture the entire clone, including .git directory, next time it fails. That should help us repro it.

@AArnott
Copy link
Collaborator

AArnott commented Mar 20, 2023

Yay! I have a local repro by downloading the drop I changed the pipeline to produce on failure. I'm investigating now.

Copy link
Collaborator

@AArnott AArnott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a legitimate failure, introduced by the GitAnnotatedTagReader code, apparently.

You can repro by following these steps:

  1. Download the drop artifact from here and use 7-zip to extract it. Complete all the rest of these steps within the unzipped directory.
  2. Open VS to the solution within the unzipped directory.
  3. Add Debugger.Launch() to the GitObjectId.ParseHex method.
  4. Run dotnet publish -o ../nerdbank-gitversioning.npm/out/nbgv.cli/tools/net6.0/any from the src\nbgv directory.
  5. Run yarn build from the src\nerdbank-gitversioning.npm directory.
  6. When the JIT debugger attach dialog appears, have the VS with your unzipped solution open attach to the debuggee.
  7. Configure the debugger to break on first chance exceptions, at least for IndexOutOfRangeException.
  8. GitException is thrown a lot due to the issue of relying on exceptions to recognize the wrong object type. You can turn off break on first chance exceptions for this particular exception type.
  9. Hit F5 till you encounter a debug assertion failure and/or the IndexOutOfRangeException that causes the ultimate failure.

Can you please look into this?

Also, given the GitException is thrown about 200 times before this failure, this is likely going to need to be addressed. I already have a local code change that I hope will address this, although it causes other failures, so I'll work through that after we get this fatal error resolved.

@AArnott AArnott enabled auto-merge (squash) September 15, 2023 23:42
@AArnott AArnott merged commit 9675c18 into dotnet:main Sep 16, 2023
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

How to rebuild past tagged builds as public releases? / Add BuildingRef override
5 participants