Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(api/controller): support filtering rules Git subscriptions #1477

Merged
merged 9 commits into from
Mar 22, 2024

Conversation

maksimstankevic
Copy link
Contributor

@maksimstankevic maksimstankevic requested a review from a team as a code owner February 14, 2024 16:33
Copy link

netlify bot commented Feb 14, 2024

Deploy Preview for docs-kargo-akuity-io failed.

Name Link
🔨 Latest commit 5fed993
🔍 Latest deploy log https://app.netlify.com/sites/docs-kargo-akuity-io/deploys/65fdb3b9805fb100086b2b54

@krancour
Copy link
Member

Many apologies for the delay getting to this -- 👀 -- top priority for tomorrow.

Copy link
Contributor

@hiddeco hiddeco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution @maksimstankevic! 🍏

Have made a couple of comments and suggestions, but as I am quite new to the project I would wait for the final suggestions from @krancour before making any adjustments.

Shallow bool
// Shallow indicates depth of cloning. This is useful for speeding up
// cloning process.
Shallow uint8
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would personally rename this to Depth (*uint8) to match the actual argument to git. Shallow could then theoretically continue to be a shorthand for Depth = 1.

However, as this is internal machinery. Just going with Depth would probably be fine as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree

Comment on lines 100 to 115
var cloneShallowness uint8
if sub.ScanPaths != nil || sub.IgnorePaths != nil {
cloneShallowness = 2
} else {
cloneShallowness = 1
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
var cloneShallowness uint8
if sub.ScanPaths != nil || sub.IgnorePaths != nil {
cloneShallowness = 2
} else {
cloneShallowness = 1
}
var cloneDepth uint8 = 1
if sub.ScanPaths != nil || sub.IgnorePaths != nil {
cloneDepth = 2
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree

commit, err := r.getLastCommitIDFn(repo)
// head of the branch unless there are scanPaths/ignorePaths configured to
// handle.
commit, err1 := r.getLastCommitIDFn(repo)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think the error should be dealt with before further looking at the scan or ignore paths. Reason for this is that otherwise the commit you are making use of may actually not be available.

In addition, all err<num> can be changed to err.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree

repo, err := git.Clone(
sub.RepoURL,
*creds,
&git.CloneOptions{
Branch: sub.Branch,
SingleBranch: true,
Shallow: true,
Shallow: cloneShallowness,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(If my Depth comment is taken into account)

Suggested change
Shallow: cloneShallowness,
Depth: &cloneDepth,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree

// strings (regexList) to check if any of them match the stringToMatch string, if
// a regexp fails to compile it returns false and error, if match is found it
// returns true and if match is not found it returns false.
func matchesRegexList(stringToMatch string, regexList []string) (bool, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite a costly function, as recompilation of regular expressions happens repeatedly for every diffPath, while scanPaths and ignorePaths are static.

Given this, I would restructure the code in such a way that the lists of regular expressions are compiled once before attempting to match them against diff paths.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree

@maksimstankevic
Copy link
Contributor Author

@hiddeco thanks for your review! Agree with all points, @krancour , @kencochrane, please, give me a go if you agree as well with proposed changes.

@krancour
Copy link
Member

@maksimstankevic I'm going to look at this myself today as well, but I agree with all of @hiddeco's proposed changes. (He's a new maintainer on our team, btw. 🎉 )

@maksimstankevic
Copy link
Contributor Author

@krancour I had some spare time and implemented @hiddeco proposed amendments as you agreed to them as well, please, let me know what you think

@maksimstankevic
Copy link
Contributor Author

My approach was wrong, now I'm getting it - we need to obtain diffs between the commit that produced last Freight and the HEAD as you say, so Depth of clone loses its sense probably as the mentioned Freight-producing commit is at unknown Depth... Thanks a lot for your review @krancour ! I will reiterate.

@krancour
Copy link
Member

Depth of clone loses its sense

@maksimstankevic right... that was my next bit of feedback was that depth required to figure this out correctly cannot be predicted, but want to make sure we were on the same page with what diffs we need to find first... sounds like we now are.

Can't tell you how much we appreciate your effort on this!

@@ -216,6 +230,8 @@ type WarehouseStatus struct {
// ObservedGeneration represents the .metadata.generation that this Warehouse
// was reconciled against.
ObservedGeneration int64 `json:"observedGeneration,omitempty"`
// FreightReference refers to the last Freight produced by this Warehouse
FreightReference string `json:"freightReference,omitempty"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maksimstankevic there is a FreightReference type. I propose something like this:

Suggested change
FreightReference string `json:"freightReference,omitempty"`
LastFreight *FreightReference `json:"lastFreight,omitempty"`

The benefits of this are twofold:

  1. No need to make a separate call later to retrieve Freight.
  2. Guards against the possibility that the last Freight shipped from the Warehouse has since been deleted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@krancour, thanks, absolutely, I'm completely new to this k8s programming so still need to wrap my mind around a lot of ways :) I will refactor that once testing is successful. But I have a couple of points to raise, during my work on this I saw 2 instances of what I believe are bugs, can you give your take on those:

  1. I think it should not be possible to create a Warehouse with duplicated Git type subscriptions, verification should be added I suppose, below is an excerpt from a Warehouse running in my DEV env:
  uid: 46e4a742-0a97-48d5-8f9a-a719922820cb
spec:
  subscriptions:
  - git:
      branch: helm
      commitSelectionStrategy: NewestFromBranch
      repoURL: https://github.com/maksimstankevic/kargo-demp-876.git
  - git:
      commitSelectionStrategy: NewestFromBranch
      repoURL: https://github.com/maksimstankevic/kargo-demp-877.git
  - git:
      commitSelectionStrategy: NewestFromBranch
      repoURL: https://github.com/maksimstankevic/kargo-demp-877.git
status:
  freightReference: 70c665c38cf6f238e7fb0a6b7763ea902b29547d  
  1. the interesting one: per my testing I see that in case of multiple Git supscriptions in a Warehouse a valid failure to get a new commit from one of Git repo subscriptions blocks normal progress of others, meaning new valid commits in other Git subscriptions will not be able to produce new Freight if one Git subscription fails to select a commit due to valid reasons (scanPaths/ignorePaths or Tag configurations): https://github.com/akuity/kargo/blob/c9aebcd37e20d40f75c31504297126fafc926968/internal/controller/warehouses/git.go#L58C1-L60C29
    Don't you think that in such case we should just let the failing Git repo subscription to stay where it was is but if others have new valid commits - we should produce new Freight?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should not be possible to create a Warehouse with duplicated Git type subscriptions, verification should be added I suppose

Additional validations seldom hurt. I would suggest opening a new issue for it and not dealing with it here.

per my testing I see that in case of multiple Git supscriptions in a Warehouse a valid failure to get a new commit from one of Git repo subscriptions blocks normal progress of others

This is by design. If you think of a piece of Freight as a box containing all the artifacts required for your application, including image, manifests, etc., a failure to fetch any one of those means you have an incomplete set.

Don't you think that in such case we should just let the failing Git repo subscription to stay where it was

I guess this suggests that if I'm looking for versions of x and y and I find x, but don't find y, we should use the last known version of y... the problem is that if there were a version of y that fit your current constraints, it would have been found, meaning the last known version of y does not fit your constraints. The main scenario where I can see this happening is when someone has changed their constraints. They previously wanted ^1.0.0 of y and now want ^2.0.0. If nothing matching ^2.0.0 was found, it's incorrect to move forward with the old y that matched ^1.0.0.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, that makes sense, for validation of duplications I'll open an issue, that was my thought initially

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@krancour I pushed the changes, please, review when you get a chance

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maksimstankevic at a glance, this is looking like it's almost there. I promise a more thorough review tomorrow (Tuesday).

Let me say again how much we appreciate this. This was an ambitious feature to bite off and it's coming along nicely!

selectedCommits, err := r.selectCommitsFn(
ctx,
warehouse.Namespace,
warehouse.Spec.Subscriptions,
repoCommitMappings,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think things would be tidier and more testable overall if we passed the warehouse.Status.LastFreight here and let selectCommits() build the map.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

refactored:
b8667c6

@krancour
Copy link
Member

@maksimstankevic this looks amazing! I commented inline with just small bits of feedback -- nothing major.

An unfortunate thing we do have to contend with here is a lot of merge conflicts. Most of them seem to arise from our codegen process having undergone an extreme makeover in #1515. Those are probably not resolvable by hand.

If you want to resolve them yourself, the easiest approach would be:

  1. Squash everything you've done to date in this PR into a single commit
  2. Remove all changes to the following generated files:
    • api/v1alpha1/types.proto
    • zz_generated.deepcopy.go
    • charts/kargo/crds/kargo.akuity.io_warehouses.yaml
    • internal/api/types/v1alpha1/types.go
    • pkg/api/v1alpha1/types.pb.go
    • ui/src/gen/schema/warehouses.kargo.akuity.io_v1alpha1.json
    • ui/src/gen/v1alpha1/types_pb.ts
  3. Rebase on main -- internal/controller/warehouses will have some conflicts to resolve, but I don't think they'll be too bad compared to generated code conflicts
  4. Re-run make hack-codegen
  5. Force push

Alternatively, I am happy to run point on resolving those because it's my slow review that put this behind #1515 in the first place. 🙏

Just let me know how you want to proceed.

@maksimstankevic
Copy link
Contributor Author

The issue with the orphaned branch is interesting...

I may not have understood completely, but my takeaway was that there's a problem here if the last commit that produced freight and the current commit don't have common history and diffs between the two can therefore not be determined. I think this would only ever occur in a few cases:

Someone has edited the warehouse and pointed it at a different branch -- or a different repo even.

Some naughty developer force-pushed to main or something.

I think in either of these cases, we just need to detect that the two commits are not comparable and default to treating the selected commit as something that should trigger production of new freight.

@krancour this is already being handled by "repo#branch - commit" mappings, such a new wild first commit in a reconfigured repo is not there in our LastFreight history - so it produces Freight by default, please take a look at my testing above.

@maksimstankevic
Copy link
Contributor Author

maksimstankevic commented Mar 22, 2024

On next reconciliations, naturally, the Warehouses gets the error attached that it could not get latest commit because there are no diffs between last commit and HEAD which are the same in this case

This is precisely the thing I find odd. It's already up-to-date, so why should it complain about filtering rules instead of it telling that there are simply no new changes?

I will leave this to be answered by @krancour, because I took the concept from existing commit tag filtering rules that as well result in an error getting last commit when rules are not satisfied.
And I tend to agree with that because it is an error - we cannot get a new Freight from the Warehouse "black box" in given circumstances and we clearly explain why the error is happening in the logs - due to tag or path filtering rules.

@hiddeco
Copy link
Contributor

hiddeco commented Mar 22, 2024

After warehouse is changed to use new orphan branch it produces a new freight because it does not have a baseCommit (this Git Repo Subscription is new and did not have a commit that produced Freight yet):

After a good night of sleep, I can not reproduce the first issue I reported anymore either. Sorry about creating noise, as I probably messed something up due to being tired. 🤦

[..] And I tend to agree with that because it is an error - we cannot get a new Freight from the Warehouse "black box" in given circumstances and we clearly explain why the error is happening in the logs - due to tag or path filtering rules.

But it's not a black box in this case, you can know that HEAD == base commit. Which means the expectation is there to be no divergence, because there simply are not any changes which should be filtered. At present, I find the combination of last freight having a reference to the same commit the error complains about highly confusing. Because it makes me think something is wrong with my filtering configuration that I need to address (which actually is not the case, it's just waiting for a new HEAD).

Status:
  Last Freight:
    Commits:
      Branch:           manifests-2
      Id:               566ef4865767f9876ed6f4a88068449043d80d14
      Message:          Init
      Repo URL:         https://github.com/hiddeco/kargo-filter-example.git
    Name:               91d8238ac6da37f4a5b38a86f06fb7be150d86f5
  Message:              error getting latest Freight from repositories: error syncing git repo subscriptions: error determining latest commit ID of git repo "https://github.com/hiddeco/kargo-filter-example.git": error selecting commit from git repo "https://github.com/hiddeco/kargo-filter-example.git": commit "566ef4865767f9876ed6f4a88068449043d80d14" not applicable due to includePaths/excludePaths configuration in repo "https://github.com/hiddeco/kargo-filter-example.git"
  Observed Generation:  1

Copy link
Contributor

@hiddeco hiddeco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As the error is more of a user experience thing. I am inclined to say that this can be merged as is, and we can do refinements later.

Signed-off-by: Maksim Stankevic <maksim.stankevic1@gmail.com>
@maksimstankevic
Copy link
Contributor Author

maksimstankevic commented Mar 22, 2024

As the error is more of a user experience thing. I am inclined to say that this can be merged as is, and we can do refinements later.

@hiddeco I see your point now and agree actually, I sent a tiny change, now logs are clean on reconciliations when latest commit == baseCommit
thanks for pointing that out!
Please let me know if that is ok
@krancour fyi

@hiddeco
Copy link
Contributor

hiddeco commented Mar 22, 2024

Think this addresses it nicely, thanks 🙇

@hiddeco hiddeco changed the title feat(controller): add scanPaths and ignorePaths filters to Warehouse Git Subscription feat(api/controller): support filtering rules Git subscriptions Mar 22, 2024
@krancour
Copy link
Member

@maksimstankevic and @hiddeco great to sit down this morning and see you guys have hashed everything out!

I want to raise one final UX thing that I am sorry to say I overlooked until just now...

I get that it was likely modeled after tag filters, but are regular expressions really the best way to do path matching? I have some doubts about that. At least it doesn't feel like something I have commonly encountered.

wdyt?

@hiddeco
Copy link
Contributor

hiddeco commented Mar 22, 2024

I was on the fence about this myself as well. Based on experience in another project, we used a .gitignore format in the form of:

spec:
  ignore: |
    */
    !guestbook/

This worked really well, except that extensive ignore rules can become complex in cases where you have patterns in what you want to include (e.g. stages/{a,b,c}).

@maksimstankevic
Copy link
Contributor Author

maksimstankevic commented Mar 22, 2024

@maksimstankevic and @hiddeco great to sit down this morning and see you guys have hashed everything out!

I want to raise one final UX thing that I am sorry to say I overlooked until just now...

I get that it was likely modeled after tag filters, but are regular expressions really the best way to do path matching? I have some doubts about that. At least it doesn't feel like something I have commonly encountered.

wdyt?

My 2 cents, I am working as devops and part-time developer for 10 years now and when I saw this issue open I knew it has to absolutely be regexps in both filters right from the start. It gives me as a user the flexibility I need and if it hadn't regexp support I would bomb you with CRs for implementing it. I'm pretty sure most devops and developers know how to use regexp and that should not be an obstacle. I don't feel it's hitting performance either, but I don't pretend to be an expert here.

@krancour
Copy link
Member

@maksimstankevic I think my worry is that someone who doesn't read the docs might intuitively do something like foo/* which is a pretty common way of saying "everything in foo/," but when compiled as a regex, matches foobar/.

@hiddeco
Copy link
Contributor

hiddeco commented Mar 22, 2024

What I suggest to overcome this issue (and also to potentially allow adding e.g. glob support later) is to require an identifier to be specified.

For example:

includePaths:
- regexp:<actual expression>

Based on experience with these kind of identifiers, I would both allow regexp and regex, as there seem to be two groups of people.

@maksimstankevic
Copy link
Contributor Author

maksimstankevic commented Mar 22, 2024

@krancour yes I understand that perfectly. And I still need regexp here as a user.

Anything you decide, I can work on it. Not before the release though, I actually need a break, can that be done as a refinement after 0.5.0?

@krancour
Copy link
Member

@maksimstankevic you've been so accommodating. We're intent on getting this into v0.5.0. At the start of our current release cycle, we were fully prepared to be implementing this ourselves and it was a surprise to see a community member step up.

@hiddeco will take this the last mile and change nothing other than requiring a prefix to signal use of a regex and will otherwise assume items in the lists to be full paths or path prefixes.

All credit for this PR goes to you. You're going to end up with a big thank you in the release notes.

Signed-off-by: Hidde Beydals <hidde@hhh.computer>
Copy link
Member

@krancour krancour left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This gets an enthusiastic LGTM from me.

@hiddeco kept it as regex matching only, but simply requiring the "regex" or "regexp" prefix.

We can add prefix and glob based path matching in v0.6.0.

@krancour
Copy link
Member

@maksimstankevic thank you so much for the extensive work on this!

@krancour krancour enabled auto-merge March 22, 2024 16:41
@krancour krancour added this pull request to the merge queue Mar 22, 2024
Merged via the queue into akuity:main with commit 6643eb7 Mar 22, 2024
12 of 16 checks passed
@maksimstankevic
Copy link
Contributor Author

This gets an enthusiastic LGTM from me.

@hiddeco kept it as regex matching only, but simply requiring the "regex" or "regexp" prefix.

We can add prefix and glob based path matching in v0.6.0.

@hiddeco @krancour thanks!

I'd like to work on adding the other 2 matching methods if possible...

@krancour
Copy link
Member

@maksimstankevic that sounds great! Thank you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants