Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using external credential managers for build credentials #291

Closed
xoebus opened this issue Feb 17, 2016 · 64 comments
Closed

Using external credential managers for build credentials #291

xoebus opened this issue Feb 17, 2016 · 64 comments
Assignees
Labels
Milestone

Comments

@xoebus
Copy link
Contributor

xoebus commented Feb 17, 2016

Motivation

There are so many credentials flowing through a Concourse server at the moment that it's keeping me up at night. Some of this risk can be mitigated by fetching the secrets as late as possible and using pipeline templates to insert them into the pipeline. This at least means that you'll never have the pipeline secrets stored on local workstations.

This isn't good enough.

Credentials are still stored in the Concourse database, aren't internally rotated, and do not have any audit trail as to which jobs and builds used them. Let's fix these things.

Goals

  • Build credentials are never stored on a disk (apart from in Vault where they are stored encrypted).
  • Build credentials are never stored locally when setting the pipeline using fly.
  • An audit trail of which build used a secret and when it used it should be visible.

Proposal

I don't want to have to build a secret management system. This means that we're looking to use one of the existing ones. There are a few to choose from in this regard but I'd like to go forward by using Vault. Vault seems to strike the right balance of confidence in implementation, good technical decisions, and generically usable. I'm not ruling out the ability to use a different secret store in the future. In fact, a good implementation of this proposal would allow pluggable secret stores with minimal effort.

So, what would this look like?

resources:
- name: s3-file
  type: s3
  source:
    access_key_id: {{!vault.s3-file.access_key_id}}
    secret_access_key: {{!vault.s3-file.secret_access_key}}

Here we have a snippet of pipeline configuration. Everything looks normal except the secrets on the last 2 lines. This would be a new syntactic convention that the regular templating would ignore and pass directly up to the server. When the server needed to do something that required a realized value of one of these references it would fetch it as late as possible from the server and then handle revoking it once it isn't required anymore. This means that (assuming Vault is set up correctly) each check run and each build would get a single use credential that would become invalid immediately. This is rad. 💯

The syntax shown above is not finalized but needs to have a few components:

  • vault refers to the name of the secret store from which the credential should be fetched. This would allow multiple stores per Concourse server as long as they were all named uniquely. A secret store should probably belong to a team rather than being global.
  • s3-file refers to the name of the secret inside Vault.
  • access_key_id and secret_access_key refer to the attributes on the secret.

Open Questions

  • What if the server goes down in the middle of using one of these secrets. Who revokes it?
  • How does Concourse authenticate with the secret store? Does this just move the problem?

Risks

  • Yet another thing to deploy next to Concourse in order to use the feature. This moves away from secure by default. This could be partially mitigated by making it easy for people to get started with Vault. (BOSH release, adding something to the monolithic binaries)
@concourse-bot
Copy link
Collaborator

concourse-bot commented Feb 17, 2016

Hi there!

We use Pivotal Tracker to provide visibility into what our team is working on. A story for this issue has been automatically created.

The current status is as follows:

  • #141798327 I can configure a CredHub backend for my team
  • #126871499 I can configure a Vault backend for my team
  • #127009961 When the same property is referenced multiple times, it should only be fetched once
  • #140235551 Using Vault for Build Credentials

This comment, as well as the labels on the issue, will be automatically updated as the status in Tracker changes.

@cppforlife
Copy link

one thing to think about is pipeline scoped creds e.g. two pipelines with same config but different creds. (same problem exists for similar style of creds in bosh where each deployment may need different scope of creds).

@zachgersh
Copy link
Contributor

Hey @xoebus I've been thinking about this question:

How does Concourse authenticate with the secret store? Does this just move the problem?

Why doesn't fly handle unsealing / sealing the secret store? Each ATC could also be provided with a read-only token to the secret store. The secret store could be resealed and the token could be set to expire after a reasonable time period (don't know whether the ATC handles these things). Even though you've "moved the problem" you still have made the pipelines more secure by default.

@robdimsdale
Copy link
Contributor

@zachgersh if the ATC can't renew its token automatically it could potentially be a poor user experience: "my builds all stopped working after a month". If it can renew its token, how much more secure is it? I think I agree with you that it's probably more secure, but it doesn't feel like the final answer in security to me.

@eddiewebb
Copy link

thanks for starting this conversation! I love the potential that concourse brings, but it's not viable in our environment until we address the encryption and management of secrets.

@cppforlife - agree on need for pipeline scoped credentials. Seems plausible to align to vaults tree concept.

@xoebus - is the revocation risk not addressable with reasonable lease windows in vault? I would prefer to trust a lease to destroy cred than a final type task in pipeline.

@holstvoogd
Copy link

I have been thinking about how this could be set up with the App ID auth backend. I would say each worker should have an unique User ID and each Pipeline should have an unique App ID. Builds get access to secrets based on their App ID and User ID, so they would be scope-able per pipeline and it ensures only known workers are used.

This all would require some manual configuration whenever you add a new pipeline and/or worker. But I guess that is kind of the point of ensuring it is secure: all secrets are managed out of band in vault as well as the access to it.

I've considered some setups where ATC manages App & User IDs for you, but this just seems to break the security model of vault and it becomes very complicated fast ;-)

@eddiewebb
Copy link

eddiewebb commented Jun 27, 2016

ANyone know how Stark & Wayne's planned summit topic relates?
https://cfsummit2016.sched.org/event/6Zld/air-marshal-no-more-secrets-in-your-manifests-michael-brodhead-stark-wayne

Enter Air Marshal. Air Marshal acts as a proxy between staff workstations and BOSH or Concourse. Manifests checked into source control have placeholders wherever secrets are necessary. Air Marshal reads secrets from a secure back-end, adding them to manifests where needed.

CC: @mkb

@xoebus
Copy link
Contributor Author

xoebus commented Jun 27, 2016

It's an interesting, pragmatic, and novel solution but doesn't solve the core problem of the credentials being stored in plain text in the system.

@eddiewebb
Copy link

Yeah, i agree @xoebus,
I reviewed the video with my team today, might be suitable for some use cases, but not the enterprise ready solution that moves away from assuming such frequent human access over CLI.

We'd like a system (or user) to configure the pipeline independent of cipher values, which should be decrypted at time of use, not time of configuration. This is important to allow key rotation and the like.

So any solution that depends on fly interaction to replace values in manifest is not viable.

@freelock
Copy link
Contributor

I've come up with a similar solution -- I have a bot in a Matrix chat room that listens for git commits, and when it sees them, it uses fly to update the pipeline to the latest version. The bot has access to the secrets, regular users don't. So this allows our regular users that the bot authorizes to update the pipeline without having access to the secrets.

I'm still interested in how secrets may be stored in the ATC, though, especially as we add clients that may have more governance issues to address than we currently have, and might need to have different sets of secrets for different clients/pipelines.

@eddiewebb
Copy link

The use of variables is one area that has pretty wide impact. Validation in fly cli for instance doesn't like {{!secret}} and using {{secret}} requires vars provided at config time.

So do we want to invade fly and other points to tolerate a new format, or find something that flies under the radar of yaml validation, or more simply use a hack of forcing quotes of an alternate format

resources:
- name: s3-file
  type: s3
  source:
    access_key_id: "{{!vault.s3-file.access_key_id}}"
    secret_access_key: "{{!vault.s3-file.secret_access_key}}"

Playing a bit all components support that pattern without validation or configuration errors.

That also lets the altered manifest go unnoticed by all existing systems, and would let us focus enhancements in ATC as the place to trade out properties for actual values (I think -- ATC is aware of pipeline config, assigns workers, and could grant and revoke secrets based on lifecycle of workers.).

I'm still learning/exploring concourse architecture, would love feedback on those thoughts.

@vito
Copy link
Member

vito commented Jul 17, 2016

Doing some more thinking and investigation for what this may look like now.

Here's one syntax I landed on:

yaml_syntax:
  abc: $vault.foo/bar.baz

json_syntax: {abc: $vault.foo/bar.baz}

double_string_yaml_syntax:
  abc: "$vault.foo/bar.baz"

single_string_yaml_syntax:
  abc: '$vault.foo/bar.baz'

double_string_json_syntax: {abc: "$vault.foo/bar.baz"}

single_string_json_syntax: {abc: '$vault.foo/bar.baz'}

Each scenario parses as the same value, so it seems to work pretty well. I also like that it moves away from {{foo}} syntax, and implies that it's bringing in a value in full, whereas {{foo}} implies that it supports interpolation (e.g. {{foo}}bar). The one downside is that it may imply that you can do ${vault.foo/bar.baz} which brings us back to square 1, but hopefully no one tries that. 😉

We may end up deprecating {{params}} functionality altogether and only supporting these credential variables, based on where #360 went.

@vito
Copy link
Member

vito commented Jul 17, 2016

Interesting blog post detailing one approach to the tricky bootstrapping problem with authorizing the ATC with Vault:

https://www.hashicorp.com/blog/vault-cubbyhole-principles.html

I'm a little concerned about how extreme we want to go with this - having Concourse become a pain in the arse to automate would be an unfortunate result. We obviously want to get unencrypted credentials out of the database, though, and storing an unencrypted Vault token in there is a non-solution. We may want to land on a middle ground where we've made it trivial to rotate and automate pipeline credentials via Vault, have removed all unencrypted creds from the DB, but don't try to solve the bootstrapping problem completely. This could mean the ATC storing encrypted Vault tokens in the DB and using a local key, similar to the session signing key. Then the risk is at least localized to the web nodes.

@vito
Copy link
Member

vito commented Jul 18, 2016

Tying in with #360 (comment) which has more thoughts on the templating question (and suggests not doing $vault.foo/bar.baz).

@eddiewebb
Copy link

This could mean the ATC storing encrypted Vault tokens in the DB and using a local key, similar to the session signing key.

@vito - just to clarify, are you saying the encrypted creds that ATC needs are for Vault access, and not the secrets within vault correct?

(if not, why aren't we considering ATC grabbing the creds from vault directly at time of execution?)

@vito
Copy link
Member

vito commented Jul 18, 2016

@eddiewebb Yes, encrypting the credentials to Vault itself. Everything else would be fetched on-demand from Vault.

jtarchie added a commit to vmware-archive/atc that referenced this issue Jun 21, 2017
concourse/concourse#291

Signed-off-by: Alex Suraci <suraci.alex@gmail.com>
jtarchie added a commit to vmware-archive/atc that referenced this issue Jun 21, 2017
also rename --vault-server-url to --vault-url for consistency

concourse/concourse#291

Signed-off-by: Alex Suraci <suraci.alex@gmail.com>
jtarchie added a commit that referenced this issue Jun 21, 2017
#291

Submodule src/github.com/concourse/atc 33758f6..9182887:
  > don't ignore err
  > support configuring TLS for vault
  > noop variable store should never find a value
Submodule src/github.com/concourse/testflight 8d13935f1..b0acfdb1a:
  > waste less engineering time

Signed-off-by: Alex Suraci <suraci.alex@gmail.com>
@vito
Copy link
Member

vito commented Jun 21, 2017

Update on this: we've completed what I think is a reasonable candidate for MVP, and hope to ship this very soon as 3.3!

Here's what we've left out:

  • Absolute path references with ((/foo)). This raised a couple security questions that we don't have great answers for yet. Primarily, it would allow a pipeline in team B to reference credentials from team A, which is pretty bad. We may want to go forward with either this feature being disabled by default (risking "knobs" anti-pattern) or instead have per-team tokens configured. Either way we need to think about it a bit more.
  • For configurations that fetch separate fields off the same key, we will currently fetch the same key each time, meaning if you have keys that rotate on every fetch, the fields won't match up with each other. For example, foo.access_key_id and foo.secret_access_key will result in a secret key for the wrong access key, if you're using Vault's AWS backend with dynamic IAM.

One other thing we may want to tackle in another iteration is automatic revocation of keys after their build has finished. No one's mentioned this since the original opening of the issue, but I imagine people will want it. (I want it, anyway.)

We will address these in later iterations, and ship what we have so far to get quicker feedback. So I propose those be made as separate issues once we close this one.

vito added a commit to vmware-archive/atc that referenced this issue Jun 22, 2017
vito added a commit that referenced this issue Jun 22, 2017
#291

Submodule src/github.com/concourse/atc 9182887..c5d2ecd:
  > log in to vault if client cert/key provided
  > kill unused method
@vito
Copy link
Member

vito commented Jun 23, 2017

Upon attempting to dogfood what we have so far, I found that our auth support wasn't quite realistic. We only supported being given a token, and had no periodic renewal set up. This is unrealistic as Vault tokens are always set to expire after some amount of time (max 32 days by default), and having to re-generate a token and re-deploy the ATC periodically would be quite burdensome.

As part of our implementation for TLS support, we had opted to support configuration for client cert based auth in addition to token-based, as it's a bit easier to securely manage and rotate with our BOSH + CredHub tooling. However upon trying to actually use it we found that you need to explicitly log in to Vault with the client cert/key to acquire a token, so then we added that.

We may want to additionally implement automatic token renewing however, as the initial log in grants a token that expires after some amount of time configured by the Vault operator. It would be a bummer if the ATC suddenly stopped being able to acquire credentials and had to be restarted periodically. We would want to do the same for the client token specified, which should in most cases be a periodic token so that the initially configured token remains valid after an ATC restart (as long as we've been renewing it).

vito added a commit to vmware-archive/atc that referenced this issue Jun 23, 2017
this extracts all the vault-specific code from atccmd/ into the vault
package, paving the way for support for future credential managers

concourse/concourse#291
vito added a commit that referenced this issue Jun 23, 2017
#291

Submodule src/github.com/concourse/atc c5d2ecd..3c7785c:
  > credential managers dynamically register
Submodule src/github.com/concourse/topgun 445f42d..8b558b0:
  > underscorify bosh properties
  > parameterize vault and git server release versions
  > move support code below tests
  > update ops file for new properties
  > split vault test into token auth and TLS auth
  > test vault TLS auth
vito pushed a commit to vmware-archive/atc that referenced this issue Jun 23, 2017
concourse/concourse#291

Signed-off-by: Alex Suraci <suraci.alex@gmail.com>
vito pushed a commit that referenced this issue Jun 23, 2017
#291

Submodule src/github.com/concourse/atc 3c7785c..7342ff4:
  > periodically renew vault token

Signed-off-by: Alex Suraci <suraci.alex@gmail.com>
vito added a commit to vmware-archive/atc that referenced this issue Jun 26, 2017
rather than explicitly supporting cert/approle/etc, just allow the user
to specify the backend and params directly

concourse/concourse#291
vito added a commit that referenced this issue Jun 26, 2017
#291

Submodule src/github.com/concourse/atc 7342ff4..62fc500:
  > make vault auth backend and config generic
vito pushed a commit that referenced this issue Jun 26, 2017
#291

Submodule src/github.com/concourse/topgun 8b558b0..78c872c:
  > ensure renew loop is cleaned up
  > fix redundant testing for vault
  > test for continuous token renewal

Signed-off-by: Alex Suraci <suraci.alex@gmail.com>
@vito
Copy link
Member

vito commented Jun 26, 2017

Slight change from before: we went ahead and just made the auth fully generic. So you specify --vault-auth-backend and --vault-auth-param.

For example, to use TLS auth, you'd just say --vault-auth-backend cert (or whatever you named your cert auth backend mount) and specify a client cert/key. To configure approle you'd do something like --vault-auth-backend approle --vault-auth-param role_id=db02de05-fa39-4855-059b-67221c5c2f63 --vault-auth-param secret_id=6a174c20-f6de-a53c-74d2-6018fcceff64.

We also now test cert, approle, and direct token configurations, and also test that they're renewed by the ATC automatically (using a periodic token for direct token configuration).

Next step is to try dogfooding again now that we've got the auth flow fleshed out.

vito pushed a commit to vmware-archive/topgun that referenced this issue Jun 26, 2017
otherwise this'll fail to resolve `((vault_ip))` in the initial non-TLS
deploy, if the director is configured with a config server

concourse/concourse#291

Signed-off-by: Alex Suraci <suraci.alex@gmail.com>
vito pushed a commit that referenced this issue Jun 26, 2017
#291

Submodule src/github.com/concourse/topgun 78c872c..3bb802b:
  > patch in variables when they're needed

Signed-off-by: Alex Suraci <suraci.alex@gmail.com>
vito pushed a commit that referenced this issue Jun 26, 2017
#291

Submodule src/github.com/concourse/atc a042f5b..266c14303:
  > vendor vault api
  > someone should make a petition for this

Signed-off-by: Alex Suraci <suraci.alex@gmail.com>
vito pushed a commit that referenced this issue Jun 26, 2017
#291

Submodule src/github.com/concourse/atc b65319f..01c22b4:
  > sacrifice our principles
Submodule src/github.com/concourse/topgun ea42d5d..a0b8c03:
  > test plucking fields off of vault provided params

Signed-off-by: Alex Suraci <suraci.alex@gmail.com>
@vito
Copy link
Member

vito commented Jun 27, 2017

I've converted all our pipelines over to Vault, and all is going well so far. I wrote a little tool to make the migration easier.

@vito
Copy link
Member

vito commented Jun 28, 2017

Looks good so far; calling this done as of now! The remaining bits of work will be modeled as separate issues.

@rkoster
Copy link
Contributor

rkoster commented Jul 6, 2017

@vito what is the timeline for credhub support?

xtremerui pushed a commit to vmware-archive/skymarshal that referenced this issue Jan 11, 2018
rather than always evaluating all params for every resource type, defer
it as late as possible, to the point where we actually use the custom
type.

this makes the one-off builds API consistent with build plans created by
the scheduler; both will be uninterpolated, and interpolated at run
time.

concourse/concourse#291

Signed-off-by: Alex Suraci <asuraci@pivotal.io>
ddadlani pushed a commit that referenced this issue Jul 17, 2018
#787

Submodule src/github.com/concourse/atc 3d921fb..74a9d16:
  > Change YAML and JSON marshaller data type to interface{}
  > Rename memory_size to container_limits_marshaller
  > Add ContainerLimit parsing logic to the decoder for TaskConfig
  > Add CPU and memory limit garden options for tasks
  > Merge pull request #291 from EngineerBetter/master

Signed-off-by: Divya Dadlani <ddadlani@pivotal.io>
@SleeperSmith
Copy link

Hi Guys. Tinkering around with Concourse.

I am running concourse workers on ECS and ECS Task Role doesn't seem to be working. From concourse/docker-image-resource#109 it seems that the preference is to explicitly define the credential is it?

  • When executing docker-image push to ecr, it seems to be picking up the EC2 IAM role. This should be disabled if the design goal is to be explicit?
  • Perhaps model the IAM task role permission as a provider and allow it to be used upon explicit reference? The permission is attached to the deployment of worker "groups" (as a service on ECS) so perhaps make it a requirement to specify worker requirement alone with the configuration?

I just find it odd the worker can access the ec2 task role while not the ecs task role.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests