Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Reusable workflows #9655

Closed
1 task done
candiduslynx opened this issue Apr 4, 2023 · 17 comments
Closed
1 task done

feat: Reusable workflows #9655

candiduslynx opened this issue Apr 4, 2023 · 17 comments
Assignees

Comments

@candiduslynx
Copy link
Contributor

candiduslynx commented Apr 4, 2023

Which problem is this feature request solving?

I would like to be able to reuse the workflows/jobs to reduce boilerplate code.

Warning
The previous CI issue (#9388) took a couple of weeks to get resolved.

Note
Currently only @erezrokah has the expertise required to review the changes.
Additionally, the changes have to be thoroughly tested.
This makes this issue a slow one to implement as well as review.

Describe the solution you'd like

There's a nice feature GitHub provides called reusable workflows.
This allows to extract the jobs that are copied many times into a separate workflows and instead of copying the workflow logic just call the reusable workflow.

This is extracted from #9388 to be implemented and discussed separately.

Pull request (optional)

  • I can submit a pull request
@candiduslynx candiduslynx self-assigned this Apr 4, 2023
kodiakhq bot pushed a commit that referenced this issue Apr 9, 2023
Extracted from #9518.

Implemented using relative call + added dependency on the reusable workflow file changes.

Part of reusable workflows (#9655) implementation
@candiduslynx
Copy link
Contributor Author

AWS source: validate-release done in #9780

@erezrokah
Copy link
Contributor

erezrokah commented Apr 11, 2023

Hi @candiduslynx I think we missed a crucial piece of logic with the wait for all and re-usable workflows, starting with #9780.

We’ll need to add logic to detect that if a re-usable workflow changes, we wait for all dependent workflows. For example if we change only validate_release.yml we need to wait for all workflows that use it.
WDYT?

I’m worried this might complicate things a bit, unless we find a good automated way to know the dependencies

kodiakhq bot pushed a commit that referenced this issue Apr 11, 2023
…gin workflow" (#9834)

Reverts #9780

See #9655 (comment). We should first take a look at that comment before using re-usable workflows
@candiduslynx
Copy link
Contributor Author

I’m worried this might complicate things a bit, unless we find a good automated way to know the dependencies

@erezrokah take a look at #9842 that does the following:

  1. Check if diff includes changes to workflows
  2. Recursively gets the dependent workflow files (so, if we changed workflow A, that was used in workflow B, that was used in workflow C, we'd add both B & C to the diff)
  3. It also logs the added logic (why the workflows B & C were added)

I think that will unblock implementing reusable workflows.
Additionally, I think I can add the regex logic to the workflow-to-action matching, so that we don't encounter the issues that I had to fix in #9798

@erezrokah
Copy link
Contributor

Hi @candiduslynx, thanks for the PR. I think it might be worth bringing this issue to a wider discussion with the rest of the team. Not sure we would like to make the wait for all logic even more complex than it is now, just to reduce code duplication.

WDYT?

@candiduslynx
Copy link
Contributor Author

Hi @erezrokah, I'm fine with bringing it to a wider discussion.

The wait logic is there only because of the stuff we're missing from GitHub, so it will always be a hack.

I would still want us to reduce the code duplication and maintain a more easily upgradable approach.

@bbernays
Copy link
Collaborator

Wouldn't using not generalized/templated workflows make it more difficult for a user to create a plugin in their own repo or would it be the same?

Beyond the user impact how much time and effort do we as a team spend editing or debugging github workflows? Is the reduction of a few duplicated files worth the custom code we would have to write and own?

@erezrokah
Copy link
Contributor

@bbernays I think this won't impact users writing their own plugins, as re-usable workflows will only be used in the monorepo (this repository). The workflows in the monorepo are already not usable in single plugin repos, and adding/not adding re-usable workflows won't change that I think

@bbernays
Copy link
Collaborator

Thank you @erezrokah for clarifying that for me!

@erezrokah
Copy link
Contributor

Beyond the user impact how much time and effort do we as a team spend editing or debugging github workflows?

On my end I would say I spend the most time when we need to update the wait for all workflow. Otherwise debugging a single plugin workflow (e.g. testing a new destination) is very much scoped to that plugin.

@erezrokah
Copy link
Contributor

erezrokah commented Apr 16, 2023

Maybe it would be helpful to summarize where we would like to use re-suable workflows:

  • resolve-runner - we have 5 of those
  • validate-release - we have 62 of those. Of of the 62, 6 require CGO_ENABLED: 1 so are different an will require a separate workflow
  • Test workflows - only relevant for sources as each destination requires a dedicate service to test it (e.g. PostgreSQL docker). We have 39 sources, out of those DB sources like PostgreSQL, MySQL, Firestore and soon OracleDB could be out of scope as they also require a dedicate container to test. AWS might be out of scope too, as it has a sanity test which needs additional work, see feat(aws): Update AWS source workflow #9541 (comment).

Did I miss anything @candiduslynx?

As for the time to do #9388, I think that's mostly since we misidentified the cause of the caching issue, and attributed it to a caching issue in all plugins, then coupled it to the re-usable workflows change. When we discovered the issue was scoped to AWS, GCP, K8S and Azure, we could submit a much smaller, safer fix.
Relevant comments are:
#9388 (comment)
#9388 (comment)

Please let me know if I missed anything @candiduslynx

@yevgenypats
Copy link
Member

I'll weigh in here a bit out of the blue so sorry about that, just wanted to bring an older discussion we had about a year ago.

We started with reusable workflows and we got to an understanding that it is not a very good idea because plugins even though they are in a single repo are standalone. sometimes they have the same logic sometimes different. In general the approach we take in CI and in code is KISS over DRY. Creating abstraction layers is a complex area and usually creates more harm then good because it makes maintenance much harder, moreover creating those in YAML is a nightmare (given right now GH is YAML based and not code based).

I never had a problem with our CI and it was always super easy to debug given the simplicity of it. I suggest not fixing something that works very well with complex solutions.

That's my 2 cents on the issue but we can discuss again tomorrow.

Re caching - of-course this is a great improvement but it has nothing to do with the introduction of re-usable workflows which seems a bit un-necessary especially if it is a complex task with a lot of gotchas.

@candiduslynx
Copy link
Contributor Author

@yevgenypats the caching is not finished yet, as the caches amount we use at the moment exceeds the quota, so we have the cache files removed & not used effectively.
The reusable workflows allow to address this in a simple change done in the reusable workflow file rather than propagating it over a lot of the workflows & making sure that it won't come up again.

@yevgenypats
Copy link
Member

That's ok - Im just saying that reusable workflow is fine if we are sure that this workflow is exactly the same 200% of the time. the moment there is at least one different we will either have to delete it or just use it for some workflows because a resuable workflow with huge if plugin == x else if plugin == y else statement is unusable and spreads plugin logic between many files.

@candiduslynx
Copy link
Contributor Author

That's ok - Im just saying that reusable workflow is fine if we are sure that this workflow is exactly the same 200% of the time. the moment there is at least one different we will either have to delete it or just use it for some workflows because a reusable workflow with huge if plugin == x else if plugin == y else statement is unusable and spreads plugin logic between many files.

That's 100% true. I don't want to create more work from maintaining CI, but rather make it more straightforward and easy to create new plugin workflow without the need to look through a long copy-pasted workflow & doing the replacing.

We basically want to make this scoped to the places where we can effectively reuse, already mentioned by @erezrokah in #9655 (comment):

  1. resolve-runner – really the same code and the easiest thing to introduce
  2. validate-release – this one has 2 flavors (CGO for several destinations and no CGO for everything else, includingcli & scaffold workflows):
  • We'll be introducing 2 different reusable workflows for this one (CGO & non-CGO)
  1. Source plugin tests:
  • Almost every source plugin has the same test job (apart from directory that it's running from)
  • Quirks like AWS (sanity check) and others can be either extracted into separate jobs or left as is, still, the majority is perfectly replaceable with the reusable one

@erezrokah
Copy link
Contributor

erezrokah commented Apr 16, 2023

@yevgenypats the caching is not finished yet, as the caches amount we use at the moment exceeds the quota, so we have the cache files removed & not used effectively.
The reusable workflows allow to address this in a simple change done in the reusable workflow file rather than propagating it over a lot of the workflows & making sure that it won't come up again.

I don't think we've proven that a different caching strategy than the default one we use works better, so I won't add that to the mix just yet. Even if we do decide on a different caching strategy, we could still do it without re-usable workflows.

I don't want to create more work from maintaining CI, but rather make it more straightforward and easy to create new plugin workflow without the need to look through a long copy-pasted workflow & doing the replacing.

Given the added complexity of re-usable workflows as mentioned here #9655 (comment), I'm not sure using will make it easier to maintain than single plugin scoped workflows.

Quirks like AWS (sanity check) and others can be either extracted into separate jobs or left as is, still, the majority is perfectly replaceable with the reusable one

Extracting those into separate jobs require adding more logic to the wait for all workflow, which means if a developer would like to customize a workflow they either need to copy paste (as we do now), or modify the wait for all logic.
I think I mentioned this here #9541 (comment)

@yevgenypats
Copy link
Member

I see. So if this is the case I think we should 100% put this on hold as really the value to changing something that works very well is minimal and we should focus on other high priority tasks like migrating destinations to the new SDK. We can have a sync call to make sure Im not missing details and we are all aligned.

@yevgenypats
Copy link
Member

Closing this for now given our discussion last week. if it will ever be a priority we can open this again and by then GitHub CI would have new features so prob everything will be different.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

No branches or pull requests

4 participants