Skip to content
This repository has been archived by the owner on Oct 12, 2023. It is now read-only.

Implement Git files discovery #45

Merged
merged 7 commits into from Jan 8, 2021

Conversation

rtnpro
Copy link
Contributor

@rtnpro rtnpro commented Sep 9, 2020

Fixes #29

@rtnpro rtnpro changed the title Implement Git files discovery [WIP] Implement Git files discovery Sep 9, 2020
@dgoodwin
Copy link
Contributor

Looks like the implementation for this PR isn't complete yet but so far looks fine.

Needs a conflict resolved.

@rtnpro
Copy link
Contributor Author

rtnpro commented Sep 24, 2020

@OmerKahani @dgoodwin One of the design decision I am faced with is if we should implement our own git utility for fetching the git repo and scan it's paths for finding the config json files.

Initially, I was leaning on using the argocd reposerver apiclient to fetch all the paths from the target repository, but it seems that ArgoCD repo server does not allow listing paths, but only applications. I am not sure if it will be possible or optimal to try to propose adding an API endpoint in ArgoCD server to list all paths in a git repository.

So, for the time being, I am thinking to just hit the ArgoCD reposerver to fetch the creds needed to clone a git repo like here:
https://github.com/argoproj/argo-cd/blob/89ee234634c10312d2fcb74c30e9d590fe01d4ff/server/application/application.go#L1279

and initialize a git client to clone the git repo. This sounds like a duplicate effort as ArgoCD reposerver already has a cache of the git repos, but I see the only way forward, for now, is to implement our little git client.

What do you think?

@OmerKahani
Copy link
Contributor

OmerKahani commented Sep 25, 2020

We had the same issue in the git directory generator. I think we need to add the functionality to the repo service, but this can take more time.

I did add a repo_service facade, you can add the needed functionality to it, and then we can work on migrating the code to the argocd repo service. This way it will be easy to refactor the code once we have it in the argocd repo service.

@alexec @xianlubird what do you think?

@xianlubird
Copy link
Member

@OmerKahani agree

@rtnpro
Copy link
Contributor Author

rtnpro commented Oct 20, 2020

I was away for 2 weeks, as I have become a father. I am back now, and have resumed work on this pull request.

@OmerKahani
Copy link
Contributor

I was away for 2 weeks, as I have become a father. I am back now, and have resumed work on this pull request.

Mazal Tov!

@rtnpro rtnpro changed the title [WIP] Implement Git files discovery Implement Git files discovery Oct 25, 2020
Copy link
Contributor Author

@rtnpro rtnpro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@OmerKahani I got the Git files discovery generator working. I need a review here before I start working on the writing tests. Also, I am a bit confused on what tests to add for repo_service.go as I have added almost negligible logic in our code, but using the functions exposed by ArgoCD's git util.

May be, we can mock out the git helpers we added: GetPaths, GetFileContent, and do some functional testing in git_test.go.

repositoriesDB RepositoryDB
repoClientset apiclient.Clientset
repositoriesDB RepositoryDB
repoClientset apiclient.Clientset
}

type Apps interface {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should rename this interface to Repos instead of Apps, as we are doing more than just fetch App data now. We also call the reference of this interfaces instance as repos.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for that

@@ -66,3 +72,76 @@ func (a *argoCDService) GetApps(ctx context.Context, repoURL string, revision st

return res, nil
}

func (a *argoCDService) checkoutRepo(gitClient git.Client, revision string) error {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure how best to write unit tests for these. We are not doing anything extra on our side, but just call the ArgoCD git util, which is unit tested.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should aim to 100% coverage,

why does the function get's a git client and not creating one based on the repo?

Copy link
Contributor Author

@rtnpro rtnpro Nov 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was hoping to mock the gitClient for testing. But, just sending the repo object makes more sense. Actually, the gitClient name is a bit misleading, as the gitClient object is already tied to a repo.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gitClient is actually tied to a repo instance, and it's not a generic git client like the name suggests. I sent gitClient with the hope of mocking it out for testing. But, there's not much to test here anyways, as the git client is already tested in ArgoCD codebase. I agree with you that sending repo instance makes more sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@OmerKahani I gave it a lot of thought to pass repo object to checkoutRepo func, but there is a use case where gitClient is used later, e.g., https://github.com/argoproj-labs/applicationset/pull/45/files#diff-004c4dc22d5fb77bff9d0d6fd29d999d0e068fc99b996bf91a75213585a040efR93. It did not make sense to initialize gitClient multiple times.

So, what I did here: e4e1e5d is rename gitClient to getRepoClient which explicitly states the role of the variable. The git client instance is tied to a repo.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, sounds good

return nil
}

func (a *argoCDService) GetPaths(ctx context.Context, repoURL string, revision string, pattern string) ([]string, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use GetPathes to get the applications? Does get path "*" will return all the top level directories and files?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GetPaths uses git ls-files under the hood, and it returns all the paths in the directory tree, matching the specified pattern, and not just the top level files or directories. So, it will need some extra work to retrieve applications using ls-files. We will also need to check if a directory is a valid app directory, by checking if it contains helm or kustomize artifacts. Since, ArgoCD repo server already returns the apps present in a repo, I think we should just piggy back on that functionality.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An application can be plain k8s objects (without Helm / Kustomize). Currently, the repo server will not return such directories, that's was the reason I asked. We should think about it after you merge this code.


return res, nil
}

func (g *GitGenerator) filter(Directories []argoprojiov1alpha1.GitDirectoryGeneratorItem, allApps []string) []string {
func (g *GitGenerator) generateParamsForGitFiles(appSetGenerator *argoprojiov1alpha1.ApplicationSetGenerator) ([]map[string]string, error) {
allPaths := []string{}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is going to create a new git client for every file. I am not sure if this is a costly code (does it uses network / IO for it?), but if so, we should probably initialize it only once

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The git checkout call creates a local tmp directory, and it gets reused for subsequent calls, until the tmp directory is deleted.

Every git fetch call will involve network I/O to the git servers, but it will not pull the repo from scratch. So, I think we are good here.

Comment on lines 91 to 98
content, err := g.repos.GetFileContent(context.TODO(), appSetGenerator.Git.RepoURL, appSetGenerator.Git.Revision, path)
if err != nil {
return nil, err
}

config := make(map[string]interface{})
err = json.Unmarshal(content, &config)
if err != nil {
return nil, err
}

flat, err := flatten.Flatten(config, "", flatten.DotStyle)
if err != nil {
return nil, err
}
params := make(map[string]string)
for k, v := range flat {
params[k] = v.(string)
}

res = append(res, params)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be easier, for testing, to extract this to a new function.

}

for _, c := range cases {
cc := c
t.Run(cc.name, func(t *testing.T) {
argoCDServiceMock := argoCDServiceMock{}
argoCDServiceMock.On("GetApps", mock.Anything, mock.Anything, mock.Anything).Return(c.repoApps, c.repoError)
mockCalls := argoCDServiceMock.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@OmerKahani I started writing test cases for git file generator scenario, however I am not able to use testify.Mock properly. What I am trying to do is setup multiple expectations per test case:

  • One for GetPaths
  • Many for GetFileContent

I tried to use Mock chaining as mentioned here: https://godoc.org/github.com/stretchr/testify/mock#Call.On, but still could not get it working. The test is failing with error below:

WARNING: Package "github.com/golang/protobuf/protoc-gen-go/generator" is deprecated.
        A future release of golang/protobuf will delete this package,
        which has long been excluded from the compatibility promise.

time="2020-11-18T21:58:55+05:30" level=info msg="applications result from the repo service" allAps="[app1 app2 p1/app3]" repoURL=RepoURL revision=Revision total=3
time="2020-11-18T21:58:55+05:30" level=info msg="applications result from the repo service" allAps="[app1 p1/app2 p1/p2/app3 p1/p2/p3/app4]" repoURL=RepoURL revision=Revision total=4
time="2020-11-18T21:58:55+05:30" level=info msg="applications result from the repo service" allAps="[]" repoURL=RepoURL revision=Revision total=0
--- FAIL: TestGitGenerateParams (0.00s)
    --- FAIL: TestGitGenerateParams/happy_flow_-_created_apps (0.00s)
        git_test.go:220: PASS:  GetApps(string,string,string)
        git_test.go:220: FAIL:  GetPaths(string,string,string,string)
                        at: [git_test.go:186]
        git_test.go:220: FAIL: 1 out of 2 expectation(s) were met.
                The code you are testing needs to make 1 more call(s).
                at: [git_test.go:220]
    --- FAIL: TestGitGenerateParams/It_filters_application_according_to_the_paths (0.00s)
        git_test.go:220: PASS:  GetApps(string,string,string)
        git_test.go:220: FAIL:  GetPaths(string,string,string,string)
                        at: [git_test.go:186]
        git_test.go:220: FAIL: 1 out of 2 expectation(s) were met.
                The code you are testing needs to make 1 more call(s).
                at: [git_test.go:220]
    --- FAIL: TestGitGenerateParams/handles_empty_response_from_repo_server (0.00s)
        git_test.go:220: PASS:  GetApps(string,string,string)
        git_test.go:220: FAIL:  GetPaths(string,string,string,string)
                        at: [git_test.go:186]
        git_test.go:220: FAIL: 1 out of 2 expectation(s) were met.
                The code you are testing needs to make 1 more call(s).
                at: [git_test.go:220]
    --- FAIL: TestGitGenerateParams/handles_error_from_repo_server (0.00s)
        git_test.go:220: PASS:  GetApps(string,string,string)
        git_test.go:220: FAIL:  GetPaths(string,string,string,string)
                        at: [git_test.go:186]
        git_test.go:220: FAIL: 1 out of 2 expectation(s) were met.
                The code you are testing needs to make 1 more call(s).
                at: [git_test.go:220]
    --- FAIL: TestGitGenerateParams/handles_error_from_repo_server#01 (0.00s)
        git_test.go:220: PASS:  GetApps(string,string,string)
        git_test.go:220: FAIL:  GetPaths(string,string,string,string)
                        at: [git_test.go:186]
        git_test.go:220: FAIL: 1 out of 2 expectation(s) were met.
                The code you are testing needs to make 1 more call(s).
                at: [git_test.go:220]
FAIL
FAIL    command-line-arguments  1.222s
FAIL

Removing or commenting out the new ArgoCDServiceMock.On calls allows the test to pass.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dgoodwin I need some help here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, sorry for the delay it was a busy week.

The problem is with line 186: https://github.com/argoproj-labs/applicationset/pull/45/files/1f6c6234c6514ba4d0fbf5693699c3c5ece966cc#diff-c1129e2ddfe8072c3267b62d8ac75bcd5dfa92143c40c478527853c08752b7eeR186

I think it will be best to create a new TestGitGenerateParams so there will be two functions:
TestGitGenerateParamsFromDirectory
TestGitGenerateParamsFromFile

everyone will have its own test cases and mock only one of the functions in the argocd service

Does it sound reasonable to you?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Created separate test function for testing Git file generator.

return nil, err
}

paths, err := gitRepoClient.LsFiles(pattern)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The LsFiles implementation from Argo CD's git client does not handle recursive filters (eg**/*.json to recursively match all JSON files in a repo)

LsFiles is a wrapper around git ls-files --full-name -- (filter), which treats **/*.json as equivalent to */*.json (only matching from root folder)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And related: realized that since Argo CD's util/git utility calls to git directly, git needs to be included in the Dockerfile

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jgwest I ran git ls-files --full-name -- **/*.json on https://github.com/rtnpro/argocd-applicationset-examples/ and it returned

cluster-config/engineering/local/config.json

So, it supports searching for matching files in nested directories.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jgwest I have added more .json files to the repo: https://github.com/rtnpro/argocd-applicationset-examples/ to validate the output of git ls-files --full-name -- <pattern> and it seems to work as expected.

image

cc @OmerKahani

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rtnpro Agreed, thanks for investigating, it appears to work as expected for me as well... I'm not able to reproduce with the information I originally provided, and I'm not able to reproduce now (and no longer have the environment I originally used), so this can be resolved.

}

config := make(map[string]interface{})
err = json.Unmarshal(content, &config)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this only handles JSON data from the git repo, whereas the proposal notes that "referenced files would contain a json/yaml list of arbitrary structured objects"."

if err != nil {
return nil, err
}
allPaths = append(allPaths, paths...)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing we may want to remove any duplicate files from the list, in the case where multiple path patterns are specified and they happen to (partially) overlap between one another.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jgwest Fixed by 69a891a

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@rtnpro
Copy link
Contributor Author

rtnpro commented Dec 22, 2020

@OmerKahani @jgwest

Apart from agreeing on whether to use git ls-files ... to find matching paths in a git repo, I have addressed all other review requests. I am confident that we are ready with this pull request.

Regarding the shortcomings mentioned by @jgwest in using git ls-files ... to list file paths matching a pattern, I tried reproducing the issues reported, but could not do so. Ref: #45 (comment)

I am using git version 2.28.0.

Copy link
Member

@jgwest jgwest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @rtnpro, LGTM! Any outstanding items that arise can be handled under separate issues/PRs as needed.

Copy link
Contributor

@OmerKahani OmerKahani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rtnpro sorry for all of the delays. This was a great effort from your side and a huge contribution to the project.

@xianlubird xianlubird merged commit e57aae6 into argoproj:master Jan 8, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement Git Files Discovery
5 participants