[pipelines] Change to a single-source, single-build pipeline, deploy+approve model #10872
Comments
Hey! Great proposal! Coming from the Amazon world myself, I would love to see CDK Pipelines come closer to this model. :) One use case I have for multiple sources/builds is deploying a CDK app with a Java Lambda, given:
I would find it nice to have separate source/build triggers for each of those packages in the pipeline, unless there is a cleaner way to do so. Maybe there's a more idiomatic CDK solution to replicate a JAR artifact across multiple stages and use that as the Lambda code source, but I was unable to find a good example of this.
Is there a special reason you are having two repos? IMHO the CDK way is to have the Java Lambda inside the same repo and reference it as an asset. It will then be built once and the JAR file will be used in all stages.
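A minimal sketch of what that could look like (the paths, construct names, and Maven output location are hypothetical):

```ts
import * as path from 'path';
import * as lambda from '@aws-cdk/aws-lambda';

// The JAR produced by `mvn package` is referenced as an asset: it is
// uploaded once at build time, and the same artifact is reused by every
// stage the pipeline deploys.
const handler = new lambda.Function(this, 'JavaHandler', {
  runtime: lambda.Runtime.JAVA_11,
  handler: 'com.example.Handler::handleRequest',
  code: lambda.Code.fromAsset(
    path.join(__dirname, '../lambda/target/my-lambda-1.0.jar'),
  ),
});
```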
@hoegertn "The CDK way" - this is the first time I've heard of this, to be honest. Are there docs defining this as a best practice? But I guess my reasoning would be separation of concerns - it'd be great to avoid dumping code of different languages, environments and build systems (Java, Scala, Python, TypeScript) in the same codebase. That sounds a bit messy to me (just a personal preference), but maybe I'm not familiar with some tooling that would make it nicer. (but part of the reason is historic - this Lambda repo was never a part of a CDK application or any CI/CD process, which I'm trying to change). So your preferred project structure would be:
In this case, I suppose
I have written a blog post with a sample: https://taimos.de/blog/build-a-basic-serverless-application-using-aws-cdk - I will try to write something more in this ticket later today.
@hoegertn Thanks! I took a brief look. It seems that, like most CDK examples, it assumes my Lambda will be in TypeScript, which is much more straightforward to build, package, and deploy from a TypeScript CDK project than a Java Lambda is. Alternatively, I guess you could have multiple projects in the same repo, if you wanted to:
And the infra package would build the other two projects prior to deploying (roughly like the sketch below). I don't know, to me it feels more natural to allow replicating a CodePipeline artifact to multiple stages. :/ In any case, looking forward to any tips you might have. :)
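For reference, a rough sketch of that "infra builds the siblings" approach using the original CDK Pipelines synth action (all directory names and commands here are assumptions):

```ts
import * as codepipeline from '@aws-cdk/aws-codepipeline';
import * as pipelines from '@aws-cdk/pipelines';

const sourceArtifact = new codepipeline.Artifact();
const cloudAssemblyArtifact = new codepipeline.Artifact();

// The synth step compiles the sibling Lambda project first, so its JAR
// exists on disk when `cdk synth` bundles it as an asset.
const synthAction = pipelines.SimpleSynthAction.standardNpmSynth({
  sourceArtifact,
  cloudAssemblyArtifact,
  subdirectory: 'infra',
  buildCommand: 'cd ../java-lambda && mvn -q package && cd ../infra',
});
```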
@straygar this blog post goes into more detail around how you can build, package, and deploy with the CDK. It uses Golang as an example, but the same concepts can be applied to other languages. https://aws.amazon.com/blogs/devops/building-apps-with-aws-cdk/
@rix0rrr, @hoegertn, currently we have a pattern that includes some DB schema migrations before, and some testing after, the deployment of stages. Would the intention be that, in this mode of working, migrations could occur in "'additional actions' between Prepare and Execute", or would they be before and separate from the AppDeployment within the stage? For the integration testing we are using Jest, which runs after the stage is deployed, so hoping that this will still be supported. Right now, for both migration and testing, we are using ShellScriptAction; would there be a place for something similar in this new overall approach? So, support for something like this per stage (a rough sketch follows). The approval ideas described will address other challenges we have around visibility and control.
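As an illustration of the current pattern this comment describes, using the original CDK Pipelines API (the stage class, artifact, and command names are hypothetical):

```ts
import * as pipelines from '@aws-cdk/pipelines';

// Deploy the application stage, then run Jest integration tests against it.
const testStage = pipeline.addApplicationStage(new MyAppStage(app, 'Test'));

testStage.addActions(new pipelines.ShellScriptAction({
  actionName: 'IntegTest',
  additionalArtifacts: [sourceArtifact], // bring along the repo with the tests
  commands: [
    'yarn install --frozen-lockfile',
    'yarn jest --config jest.integ.config.js',
  ],
}));
```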
Is there a special reason you are not doing the DB migrations within your app deployment, by using CFN custom resources for example?
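A minimal sketch of what that suggestion might look like (the handler code, names, and versioning property are all hypothetical):

```ts
import * as cdk from '@aws-cdk/core';
import * as lambda from '@aws-cdk/aws-lambda';
import * as cr from '@aws-cdk/custom-resources';

// A Lambda that runs the schema migrations (e.g. shells out to TypeORM).
const migrationFn = new lambda.Function(this, 'MigrationFn', {
  runtime: lambda.Runtime.NODEJS_14_X,
  handler: 'index.handler',
  code: lambda.Code.fromAsset('migrations'),
});

const provider = new cr.Provider(this, 'MigrationProvider', {
  onEventHandler: migrationFn,
});

// The custom resource runs as part of the stack deployment, so CFN
// orders (and rolls back) migrations together with the app resources.
new cdk.CustomResource(this, 'RunMigrations', {
  serviceToken: provider.serviceToken,
  properties: { migrationsVersion: '2021-01-15' }, // change to re-run
});
```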
@hoegertn, not so much a special reason, but right now we use a yarn script to run a TypeORM migration, triggered via a shell command in ShellScriptAction. Ideally it would be good to keep this. However, one of the main reasons I asked is to try and ensure that we are looking at all the options, and at options that will have synergy with the project direction.
@nbaillie to be clear:
This was intended to be about locations in the pipeline, not specifically about the action types themselves. Although we won't offer out-of-the-box support for all types of actions, we won't prevent you from adding custom ones. I've updated the comment to reflect that. But they will have to be in one of a couple of predefined positions:
Something like
"I'm looking for compelling use cases that would require us to support multiple sources and/or multiple builds. Please supply them in this ticket if you have them." Sorry i'm coming to the party rather late, - This description is copied from a slack chat, sorry if its overly wordy. I'm working on a project where I'm using cdk pipelines to do a multi-accoutn deployment for dev/test/prod. As part of the stack, i'm deploying containers on ECS.. I have to be able to support container image building, and placement into ECR, when a Github repo is updated. There is multiple images, each with its own github repo. When any of the images are updated through their respective codebuild, I need to do a green blue reployement of that image through the various stages. As far as i could tell/discover, cdk pipelines dont' support multiple sources.. ( please, if they do, let me know how! ).. so, i had to find a way around this.. I thought the solution, I came up with, to work around this limitation was interesting, and i'd not seen it used before ( it may well have been of course ). What I did was to add a few extra lines into the buildspec that is used for the image build.
I'm putting the image tag (which is a commit-ID hash) in an SSM parameter, and then kicking off the pipeline execution from the build (sketched below). The code and Dockerfiles for the various containers come from different repos because the code/repos are owned by different business units, and in one case a different organization entirely. They do allow us to read the repos and to pull from them on commits (we have created CodeStar connections to them).
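Reconstructed as a sketch, not the commenter's actual lines (the parameter name, pipeline name, and reliance on CodeBuild's built-in commit variable are assumptions):

```yaml
# Fragment appended to the image-build project's buildspec.
post_build:
  commands:
    # Record the freshly pushed image tag (the commit hash) where the
    # CDK app can read it at synth time.
    - aws ssm put-parameter --name /images/my-service/tag --value "$CODEBUILD_RESOLVED_SOURCE_VERSION" --type String --overwrite
    # Kick off the CDK pipeline so the new tag gets rolled out.
    - aws codepipeline start-pipeline-execution --name my-cdk-pipeline
```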
"This means we explicitly give up a bunch of freedom that CodePipeline allows, in order to make a simpler application deployment model. We'll reject adding multiple source stages, multiple builds, and many types of actions in weird locations people might want to insert into the pipeline; if they want to do that, they should drop down to building a complete CodePipeline pipeline." I read @clareliguori Blog and couldn't find anywhere specifically in that article that said there should only be a single source.? Edit: Asked Clare what she thinks |
Hey all! Thanks for pinging me @mrpackethead. In attempting to model how we do pipelines at Amazon in CDK, I think the number of source and build actions is less important than the order of stages in the pipeline and how the deployment actions are laid out. Amazon pipelines have one single source stage (as seen in the diagram above), but that source stage typically brings in multiple sources (as seen here). For example, pretty much all pipelines internally at least bring in both the 'primary' source code repo and that repo's dependency libraries in the source stage. A simple example to model in CDK would be bringing in the 'primary' source code (GitHub source action in CodePipeline terms) that will be built and deployed, plus a Docker base image (ECR source action in CodePipeline). Last time I checked, CodePipeline pipelines only have a single source stage for all source actions, so that already matches how we model pipelines internally.

After the source stage, we do all the build steps. Using the simple example I gave above, this would mean compiling the source code and packaging it into a Docker image that is pushed into ECR. Since CodePipeline already does the work for you of passing immutable artifacts between the build actions, I'm not especially opinionated about this being a single build action/stage vs split across multiple build actions/stages. Some build steps, like replicating the Docker image across ECR repos in multiple regions/accounts for deployment, are useful to do in individual build actions (one per region/account). The area where we tend to be opinionated internally is simply doing ALL the build steps before ANY of the deployment steps, such that you have a consistent set of artifacts that will be deployed later in the pipeline.

When it comes to deployment stages, there is generally only one deployment action per AZ/region/account/any-other-unit in a wave's stage in an internal pipeline. The key here for us is whether the operator can roll back a deployment in a single step. To use the example of DB migration scripts from @nbaillie above: in a single internal pipeline, we wouldn't run a database migration script in a separate deployment action from deploying the microservice. Otherwise we can't roll back in a single step; the operator has to roll them back in a specific order manually, which can delay recovery and introduces human error. In that case, we would either a) split database migrations into a separate pipeline from the microservice's pipeline, or b) combine them into a single deployment workflow action that deploys and rolls back in the correct order. We have an internal system for doing that, but @hoegertn's suggestion of using a single CFN stack for both DB migration and microservice deployment is functionally equivalent in this case (using source dependency relationships in the stack, you would ensure CFN will deploy and roll back in the correct order).

Hope that helps!
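To make that last parenthetical concrete, a one-line sketch (the construct names are hypothetical, carrying on from the migration example earlier in the thread):

```ts
// Making the service depend on the migration resource means CloudFormation
// runs migrations before the service on deploy, and reverses that order
// automatically on rollback.
fargateService.node.addDependency(migrationResource);
```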
Thanks for the insight @clareliguori, that's really helpful. @rix0rrr, when you said "we'll reject adding multiple source stages", did this mean a single source as well, or would this be a single source stage with one or more sources? It's quite possible I've misunderstood your intent.
Firstly, CDK Pipelines are awesome and I have really enjoyed using them. I want to add my thoughts on multi-source builds and why they would be a good addition. A few examples have already been given above regarding polyglot projects. I also frequently encounter separate repos for infrastructure and application code in enterprises: there is a separation of responsibility between development and infrastructure/operations, and permissions are also created along these lines. Multi-source builds would be a great batteries-included solution in such cases. The alternatives are workable but fall short of being seamless:
The pace of development and deployment is a little slower in such cases; there is a hand-off between the development and infrastructure teams. By supporting multi-source builds, the pipelines module would be flexible and unopinionated, allowing all kinds of teams to use it. As a reverse question: what is the downside of supporting multiple sources in the pipeline?
Two downsides spring to mind:
To my mind, running
This is potentially a stronger one: the output of every dependency should always be a construct that gets published to a package repository. Every published revision has a version number. Then, in a different package, the application that gets deployed through the pipeline pulls in that dependency by its version number. An upgrade is a bump of that version number as a commit; a rollback is a revert of that commit. A single commit in a single repository always completely defines the software that is getting deployed. Contrast this with having multiple packages: there is really no one commit you can point to and say "install and build that one to get an exact copy of the software that is now running in prod". Having written this out, both of these points come down to reproducibility. At Amazon internally we have a different mechanism to ensure reproducible point-in-time builds across multiple packages, but since that mechanism doesn't exist outside our own build system, doing this would give us the same guarantees.
This issue appears to me to be a severe limitation, and it pops up frequently; see, for example, here and here.
I was thinking about creating an extra git repository that includes all dependencies ( |
The suggested workaround is to use separate pipelines to publish your additional artifacts to package managers (or ECR, or some other artifact store), where they will get a version number, and then refer to that version number from your source. That way, there is a single commit in a single repository that represents your entire application; it's easy to know which commit caused a particular change, and it can easily be rolled back.
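For instance, pinning a container image version from the single CDK source repo might look like this (repository name and tag are hypothetical; the image is assumed to have been published and tagged by its own pipeline):

```ts
import * as ecr from '@aws-cdk/aws-ecr';
import * as ecs from '@aws-cdk/aws-ecs';

// Bumping 'v1.4.2' in a commit is the upgrade; reverting that commit
// is the rollback. One commit fully defines what is deployed.
const repo = ecr.Repository.fromRepositoryName(this, 'ServiceRepo', 'my-service');

const taskDefinition = new ecs.FargateTaskDefinition(this, 'TaskDef');
taskDefinition.addContainer('app', {
  image: ecs.ContainerImage.fromEcrRepository(repo, 'v1.4.2'),
});
```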
@rix0rrr how can this be automated, so that a dev in the asset repo can push a change and have the infrastructure pipeline (in a single repo) get updated to the new asset version?
@gshpychka I guess you would update your internal package version in your dependency manifest. So your workflow would not be
but rather
Indeed, this approach guarantees backward compatibility, as your older package versions are always available and the code in your CDK repository specifies compatibility among these internal package versions. What you give up is a "deploy & test" approach: publishing and deploying a new internal version of some module comes with overhead, and you'd rather have some confidence that your new internal version works as expected.
How would I go about automating that, though?
Add a new, modernized API to the `pipelines` library. Advantages of the new API are:

- Removes the need to interact with the underlying AWS CodePipeline library for `Artifacts` and `Sources`
- A streamlined API for sources (more sensible defaults, allowing you to specify less)
- `Synth` classes hide less from you, allowing you more control and removing the need to decide whether or not to "eject" from the convenience classes of the original API
- Supports parallel deployments (speeding up large pipelines)
- Supports stages of >25 stacks
- Supports multiple sources powering the build
- Gives more control over the CodeBuild projects that get generated

In addition, by clearly separating the generic parts of the library from the CodePipeline/CodeBuild-specific parts, it allows easier development of construct libraries that target alternative deployment systems while reusing large parts of the logic of this library.

This does not remove or deprecate the old API, though starting today its use is discouraged in favor of the new API, which will see more development in the future.

Closes #10872.

----

*By submitting this pull request, I confirm that my contribution is made under the terms of the Apache-2.0 license*
I figured out a hacky way to do this for my service, which deploys an application to ECS Fargate. The trick is to have an AWS custom resource in your application asset pipeline that triggers the infra pipeline by automatically updating the infra pipeline's remote repository. Specifically, you can update a file in the infra pipeline repo with the asset hash of the app's DockerImageAsset. Your app source code repository should contain 3 things:
The `cdk` subdirectory should contain a simple CodePipeline that deploys a single stack (DockerStack). The DockerStack builds your application source code into a DockerImageAsset, copies the DockerImageAsset to an ECR repository, and triggers a Lambda function that updates a file in your CDK infrastructure Git repo with the DockerImageAsset's asset hash. This triggers your CDK infrastructure pipeline, which can deploy an ECS container image from the ECR repository using the asset hash. I created two gists with the DockerStack and the Lambda source code that updates the remote GitHub repository (using GitPython):
Please comment if you can think of a more idiomatic way to do this; regardless, it does work!
Hello, I saw pull request #12326, which solves this problem, but there is no detailed explanation. Are there any best-practice instructions for implementing this process?
Realign the API model of the library more towards what's described in Automating Safe, Hands-off deployments.
This means we explicitly give up a bunch of freedom that CodePipeline allows, in order to make a simpler application deployment model. We'll reject adding multiple source stages, multiple builds, and many types of actions in weird locations people might want to insert into the pipeline; if they want to do that, they should drop down to building a complete CodePipeline pipeline.
This means we do the following things:

- Provide source classes like `GitHubSourceAction` and others which have fewer required parameters. We'll only ever have one source action, so no need to specify Artifacts.

I think the "action" model (with hooks for users to insert approvals) needs to look somewhat like this:
The "application approvals" should probably be configurable in direct style, it's okay if the changeset approvals require implementing an interface of some sort (not ideal, but acceptable).
This might entail making changes to the CodePipeline L2 library, but only in order to make implementing CDK Pipelines easier, NOT to improve the API of the CodePipeline library. Some candidates:

- Moving the `codepipeline_actions.Action` base class to the `codepipeline` library in order to make it easier to implement Actions.

I'm looking for compelling use cases that would require us to support multiple sources and/or multiple builds. Please supply them in this ticket if you have them.
On the deployment front, CDK Pipelines will never do anything more than deploy CDK apps using CloudFormation and run some limited set of validations.