Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(GH-94) Support for service principal authentication #93

Merged
merged 23 commits into from
Sep 23, 2022

Conversation

kev-in-shu
Copy link
Contributor

@kev-in-shu kev-in-shu commented Aug 8, 2022

Implements fetching keyvault access token via service principal authentication.
Allows using this module on agents and servers that cannot use Azure Managed Identities.

This new mode of authentication is optional, and enabled by passing a new key service_principal_credentials to hiera options.
This key contains a path to a YAML file on puppet server where credentials can be found.

Here is a sample hiera.yaml file

---
version: 5
defaults:
  datadir: 'hieradata'
  data_hash: yaml_data
hierarchy:
  - name: "azure_keyvault_poc"
    lookup_key: azure_key_vault::lookup
    options:
      vault_name: my_vault_name
      vault_api_version: '2016-10-01'
      key_replacement_token: '-'
      confine_to_keys:
        - '^azure_kv::.*'
      metadata_api_version: '2018-04-02'
      service_principal_credentials: '/etc/puppetlabs/puppet/azure_keyvault.yaml'

Which works with the following azure_keyvault.yaml file:

azure_tenant_id: '00000000-0000-1234-1234-000000000000'
azure_client_id: '00000000-0000-1234-1234-000000000000'
azure_client_secret: some-secret

Possible enhancements:

  • implement service principal authentication for secret() function
    • however, I don't know how the credentials would be passed
  • metadata_api_version still needs to be passed, even though it is not used anymore.

@TraGicCode TraGicCode added the feature New feature or request label Aug 8, 2022
@TraGicCode
Copy link
Owner

Hey @kev-in-shu ,

Thank you for contributing! Before we can go through the code review, I would like to talk through a couple things to make sure this feels like the right changes.

metadata_api_version still needs to be passed, even though it is not used anymore.

I think we could fix this issue by simply writing some validation to ensure that both matadata_api_version AND service_principal_credentials are not specified. This can be done by following a similiar pattern shown here

if confine_keys
raise ArgumentError, 'confine_to_keys must be an array' unless confine_keys.is_a?(Array)

You can also write a unit test to validate the validation is working as expected as well here.

https://github.com/TraGicCode/tragiccode-azure_key_vault/blob/68c2e2376bb326979e90dda95a59ba8c53d20c95/spec/functions/azure_key_vault_lookup_spec.rb

implement service principal authentication for secret() function. however, I don't know how the credentials would be passed

I think in this case we would not utilize a file to pull from and instead pass the required information as parameters to the function like so.

$important_secret = azure_key_vault::secret('production-vault', 'important-secret', {
  vault_api_version    => '2016-10-01',
  azure_tenant_id      => '00000000-0000-1234-1234-000000000000',
  azure_client_id: '00000000-0000-1234-1234-000000000000',
  azure_client_secret: Sensitive('some-secret'),
})

We would also need to do the same validation check as mentioned above here as well ( ensuring that both matadata_api_version AND service_principal_credentials are not specified ).

You can also write a unit test to validate the validation is working as expected as well here.

https://github.com/TraGicCode/tragiccode-azure_key_vault/blob/master/spec/functions/azure_key_vault_secret_spec.rb

The above change would also mean you would need to perform the yaml file inside of https://github.com/TraGicCode/tragiccode-azure_key_vault/blob/master/lib/puppet/functions/azure_key_vault/lookup.rb instead of https://github.com/TraGicCode/tragiccode-azure_key_vault/blob/master/lib/puppet_x/tragiccode/azure.rb to allow for the secret function to reuse the code for retrieving the access_token.

Documentation is missing for this feature

In order to call the feature complete, the documentation ( in the readme.md ) would need to be updated. Every feature that comes into this module has to have clear and thorough documentation to allow others to consume and use this module correctly and without confusion.

Are there any gotchas or design issues we need to be aware of?

Can you think of any issues that might occur if someone tried to mix the secret function ( non-deffered and deffered ) with MSI and service_principal_credentials credentials? Does this make or not make sense in certain situations? If so we should probably highlight this in the documentation changes for this feature.

Also be aware, there is some caching magic that happens in the function ( along with hiera ) for the access_tokens needed to query key vault.

@TraGicCode TraGicCode changed the title Support for service principal authentication (GH-94) Support for service principal authentication Aug 9, 2022
@TraGicCode
Copy link
Owner

Hey @kev-in-shu,

Let me know if this is something you are able to do.

Thanks!

@kev-in-shu
Copy link
Contributor Author

kev-in-shu commented Aug 10, 2022

Hi @TraGicCode,

Thank you for your feedback! I'm glad that you agree with the general idea. I agree with most of your suggestions regarding secret(), validations and tests. And I can see that you have already started implementing some of it.

Regarding documentation, I think we now have a few distinct use cases. We could document some of these use cases in a examples folder:

  • hiera lookup with MSI
  • secret() non-deferred with MSI
  • secret() deferred with MSI
  • hiera lookup with service principal auth
  • secret() non-deferred with service principal auth
  • secret() deferred with service principal auth

@kev-in-shu
Copy link
Contributor Author

kev-in-shu commented Aug 10, 2022

Regarding mixing things, even without putting deferred functions in the mix, and just focusing on puppet server side, caching of the token is indeed a problem if multiple authentications are used:

  • MSI + service principal
  • or service principal 1 + service principal 2

The current design design does not prevent this...

If this use case needs to be supported, we could change the string that identifies the cached token.

access_token = context.cached_value('access_token')

Something like this:

access_token_id = if using_msi
                    'access_token:msi'
                  else
                    "access_token:sp:#{yaml['azure_tenant_id']}/#{yaml['azure_client_id']}"
                  end
access_token_id = context.cached_value(access_token_id)
if access_token.nil?
  # ...
  context.cache(access_token_id, access_token)
end

Expects at least one of metadata_api_version or service_principal_credentials
@kev-in-shu
Copy link
Contributor Author

kev-in-shu commented Aug 13, 2022

I pushed some changes in this PR, but it is not complete yet.

About the lookup() function:

  • I have made some small changes with a more explicit error message when none of service_principal_credentials or metadata_api_version is specified.
  • I did some tests with caching of authentication token with lookup(). Apparently Puppet LookupContext is already doing the job of contextualizing the cache, so lookup() does not need any change in this regard.
  • except for missing documentation, I think the lookup() function is almost done.

I also started implementing support for service principal authentication in secret(). Instead of adding a new parameter, I reused and renamed parameter api_versions_hash. In this case the cached token code had to be adapted to cope with multiple authentication methods.

Still missing:

  • unit test updates
  • testing behavior of secret() with Puppet deferred function
  • documentation of the new features

# @param secret_version The version of the secret you want to retrieve. This parameter is optional and if not passed the default behavior is to retrieve the latest version.
# @return [Sensitive[String]] Returns the secret as a String wrapped with the Sensitive data type.
dispatch :secret do
cache_param
required_param 'String', :vault_name
required_param 'String', :secret_name
required_param 'Hash', :api_versions_hash
param 'Struct[{
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Enforcing the type clarifies what is expected for this parameter, but there is a chance it breaks the code of current users of the module.

README.md Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
@kev-in-shu
Copy link
Contributor Author

kev-in-shu commented Aug 24, 2022

@TraGicCode,

Regarding mixing things, I tried using all possible combinations in a single puppet manifest, and everything seemed ok.

Table below summarizes all these combinations and which ones are preferred or should not be used.

Syntax MSI SP MSI+deferred SP+deferred
secret() Good Ok Good Probably wrong
lookup() Good Ok N/A N/A

Good = recommended
Ok = usage of service principal, only if MSI is not possible

I cannot see any technical problem with invoking secret() as a deferred function and passing service principal credentials, but it looks like a wrong pattern, because both the keyvault credentials and the protected secrets are going to be seen by both agent and server.

In the end, I think deferred function call only looks reasonable in one scenario. But I do not have much experience with deferred function so I may be wrong.

Not sure how to put that in the documentation.

@kev-in-shu
Copy link
Contributor Author

kev-in-shu commented Sep 1, 2022

Hi,

I just noticed that Deferred functions can be badly broken under certain circumstances, so I would not want to rely on them yet: https://tickets.puppetlabs.com/browse/PUP-10928

As a consequence, I consider the topic of deferred functions out of the scope of this PR, and this Pull Request should be complete as-is.

Let me know if I should change anything

Kind regards,

Kevin

@TraGicCode
Copy link
Owner

Hey @kev-in-shu ,

Sorry for getting back soo late on this. I've been very busy on my side!

I just noticed that Deferred functions can be badly broken under certain circumstances, so I would not want to rely on them yet: https://tickets.puppetlabs.com/browse/PUP-10928

I will take a look at this but the way you are describing it things sound bad!!

I will review this pr and make any remaining documentation changes or touchups that are remaining.

Thanks for being patient!

@kev-in-shu
Copy link
Contributor Author

Hi @TraGicCode

About deferred functions, well, I did make it sound like deferred functions have a big issue. Guess I'm taking as an excuse, and I don't feel like starting the documentation about this usage since I have so little experience with it. The issue I mentioned seems to have some workarounds, but it looked like a lot of work to apply it in my case.

Glad to see you back, and let me know if I can help!

@TraGicCode
Copy link
Owner

Hi @TraGicCode

About deferred functions, well, I did make it sound like deferred functions have a big issue. Guess I'm taking as an excuse, and I don't feel like starting the documentation about this usage since I have so little experience with it. The issue I mentioned seems to have some workarounds, but it looked like a lot of work to apply it in my case.

Glad to see you back, and let me know if I can help!

Hey @kev-in-shu ,

Please bare with me. Very busy on my end. I will plan to hopefully get to this and finish it this weekend...

@TraGicCode
Copy link
Owner

Hey @kev-in-shu ,

Pushed changes to readme and now will be cutting a new release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants