Generic Secret Injection for Connector Configuration #20862
Comments
I am working on this and plan to put up a PR soon to show my implementation and to keep this moving forward.
I don't quite understand what's wrong with the incoming map. Can you explain in detail? BTW: I think you need to mention a PIP first. Thanks.
The map is not the problem. In the section you quoted, I am describing the currently available mechanism to access secrets from a sink or a source. The current flow has two options: Option 1:
Option 2:
The problem with option 2 is that it hasn't been consistently implemented, and it leaves the correct implementation up to the sink/source creator instead of giving users a secure default. For example, the Canal Source connector takes the effort to create the config class and correctly annotate the methods:
pulsar/pulsar-io/canal/src/main/java/org/apache/pulsar/io/canal/CanalSourceConfig.java Lines 39 to 50 in ec102fb
However, when the connector loads the configuration, it does so without the
pulsar/pulsar-io/canal/src/main/java/org/apache/pulsar/io/canal/CanalSourceConfig.java Lines 83 to 92 in ec102fb
As such, a user must put the
There are also connectors that do not use any annotations and just treat passwords like normal config. The
pulsar/pulsar-io/aerospike/src/main/java/org/apache/pulsar/io/aerospike/AerospikeSinkConfig.java Lines 32 to 58 in ec102fb
The above examples are all in the main apache/pulsar repo. We can expect even more confusion from third party connectors. My core thesis is that the existing framework is not well documented and requires connector maintainers to do an extra step that could easily be handled by the framework itself.
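To make the annotation-based flow (option 2) more concrete, here is a minimal, self-contained sketch of how an annotation-driven loader can pull sensitive fields from a secret lookup instead of the plain config map. Everything in it is hypothetical and written only for illustration: the `@Sensitive` annotation, `ExampleConfig`, and this `loadWithSecrets` method are stand-ins, not the classes in the Pulsar code base, and the rule that a secret wins over a config value with the same name is an assumption.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.util.Map;
import java.util.function.Function;

final class AnnotatedConfigSketch {

    /** Hypothetical marker for fields that should come from a secret store. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Sensitive {}

    /** Hypothetical connector config class with one sensitive field. */
    static class ExampleConfig {
        String host;
        @Sensitive String password;
    }

    /**
     * Populates String fields of a config class. Fields marked @Sensitive are
     * looked up through secretLookup first (assumption: secrets take precedence);
     * all other fields, and sensitive fields with no secret, fall back to the map.
     */
    static <T> T loadWithSecrets(Map<String, Object> config,
                                 Class<T> type,
                                 Function<String, String> secretLookup) {
        try {
            Constructor<T> ctor = type.getDeclaredConstructor();
            ctor.setAccessible(true);
            T instance = ctor.newInstance();
            for (Field field : type.getDeclaredFields()) {
                if (field.isSynthetic() || field.getType() != String.class) {
                    continue; // the sketch only handles String fields
                }
                field.setAccessible(true);
                Object value = null;
                if (field.isAnnotationPresent(Sensitive.class)) {
                    value = secretLookup.apply(field.getName());
                }
                if (value == null) {
                    value = config.get(field.getName());
                }
                if (value != null) {
                    field.set(instance, value.toString());
                }
            }
            return instance;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not load " + type.getName(), e);
        }
    }
}
```

The point of the sketch is that the protection only applies if the connector actually calls the secret-aware loader. A connector that annotates its fields but loads the map with a plain deserializer gets none of it.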
A similar feature, #20116, did not need a PIP. I think this feature is small enough that it does not need a PIP. In your opinion, why should this require a PIP? Thanks.
Hi, @michaeljmarshall. Thanks for your explanation. I got it.
I see new configurations introduced in this PR: Also, I do want to discuss the configuration. Maybe we can merge
What do you think?
For what it's worth, I already started a discussion on the mailing list. You raise valid points about the best way to solve this. I am going to take a closer look and see if it might be best solved with better documentation and a few bug fixes to take advantage of the
Hi, @michaeljmarshall
@nlu90 after further exploration, I no longer think option 1 is viable due to wrapper third party connectors.
Are you referring to the pulsar-debezium connectors? Can you give an example that
The Kafka Connect Adapter is a generic wrapper that is designed to use existing connectors written for the Kafka ecosystem in the Pulsar ecosystem. For reference, here is a repo with many examples of wrapping third party connectors: https://github.com/datastax/pulsar-3rdparty-connector. Because we're configuring code meant for another ecosystem, it is not possible to use the
PIP: #20903
Relates to: #20862
### Motivation
The primary motivation is to make it possible to configure Pulsar Connectors in a secure, non-plaintext way. See the PIP for background and relevant details. The new interpolation feature only applies when deploying with functions to Kubernetes.
### Modifications
* Add `SecretsProvider#interpolateSecretForValue` method with a default that maintains the current behavior.
* Override `interpolateSecretForValue` in the `EnvironmentBasedSecretsProvider` so that configuration values formatted as `${my-env-var}` will be replaced with the result of `System.getenv("my-env-var")` if the result is not `null`.
* Implement a recursive string interpolation method that will replace any configuration value that the `interpolateSecretForValue` implementation determines ought to be replaced.
### Verifying this change
Tests are added/modified.
### Documentation
- [x] `doc-required`
### Matching PR in forked repository
PR in forked repository: michaeljmarshall#55
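The recursive interpolation described in the modifications above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the PR: the class and method names are hypothetical, the environment is passed in as a lookup function (in production it could be `System::getenv`) so the example can run without real environment variables, and only values that consist entirely of a single `${...}` placeholder are treated as replaceable.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class InterpolationSketch {

    // Assumption: a value is a placeholder only if the whole string is ${name}.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    /** Returns the replacement for a placeholder value, or null to keep the value as is. */
    static String interpolateSecretForValue(String value, Function<String, String> env) {
        Matcher matcher = PLACEHOLDER.matcher(value);
        return matcher.matches() ? env.apply(matcher.group(1)) : null;
    }

    /** Walks maps and lists recursively and returns a copy with placeholders replaced. */
    static Object interpolate(Object value, Function<String, String> env) {
        if (value instanceof String) {
            String replaced = interpolateSecretForValue((String) value, env);
            return replaced != null ? replaced : value;
        }
        if (value instanceof Map) {
            Map<Object, Object> copy = new LinkedHashMap<>();
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
                copy.put(entry.getKey(), interpolate(entry.getValue(), env));
            }
            return copy;
        }
        if (value instanceof List) {
            List<Object> copy = new ArrayList<>();
            for (Object item : (List<?>) value) {
                copy.add(interpolate(item, env));
            }
            return copy;
        }
        return value; // numbers, booleans, and null pass through unchanged
    }
}
```

Because a `null` lookup result leaves the original value in place, a placeholder that names an unset variable is passed through to the connector unchanged rather than being replaced with `null`.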
PIP: apache#20903
Relates to: apache#20862
The primary motivation is to make it possible to configure Pulsar Connectors in a secure, non-plaintext way. See the PIP for background and relevant details. The new interpolation feature only applies when deploying with functions to Kubernetes.
* Add `SecretsProvider#interpolateSecretForValue` method with a default that maintains the current behavior.
* Override `interpolateSecretForValue` in the `EnvironmentBasedSecretsProvider` so that configuration values formatted as `${my-env-var}` will be replaced with the result of `System.getenv("my-env-var")` if the result is not `null`.
* Implement a recursive string interpolation method that will replace any configuration value that the `interpolateSecretForValue` implementation determines ought to be replaced.
Tests are added/modified.
- [x] `doc-required`
PR in forked repository: michaeljmarshall#55
(cherry picked from commit bfde0de)
Motivation
Provide a generic way to inject secrets into the `config` map for connectors without requiring a rewrite of each source/sink and without leaking them on the command line used to start the connector.
Context
The recent CVE-2023-37579 resulted in the potential to leak source/sink credentials because some credentials are stored in connector configuration instead of in the connector's `secrets` map.
The current way to configure secrets for connectors requires each source or sink to implement correct secret handling, either by getting a secret from the SecretsProvider or by using custom annotations and this special configuration loader. This implementation assumes that users have a `Configuration` class that can be annotated, but that is not always the case because the connector framework passes the configuration as a `Map<String, Object>`. Note that the current mechanism is not well documented and is not used by all of the official Apache Pulsar connectors.
Solution
I propose that we materialize and merge all secrets from the `secrets` map into the `config` that is passed to the connector when we call `Source#open` or `Sink#open`. Materializing the secrets would look like:
The key benefit to this solution is that it will work for all sinks and sources, and it will leverage the `SecretsProvider` interface to materialize the secrets.
This will benefit all deployment methods, but is most helpful for the Kubernetes runtime.
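As a rough illustration of the proposed merge, the sketch below materializes each entry of the `secrets` map through a provider and adds it to a copy of the `config` map. It is a sketch under assumptions, not the actual patch: the nested `SecretsProvider` interface is a local stand-in for the Pulsar interface with only the one method used here, `mergeSecrets` and the `enabled` flag are hypothetical names, and `putIfAbsent` is used so that an existing configuration key is never overwritten by a secret.

```java
import java.util.HashMap;
import java.util.Map;

final class SecretMergeSketch {

    /** Local stand-in for a secrets provider; only the method this sketch needs. */
    interface SecretsProvider {
        String provideSecret(String secretName, Object pathToSecret);
    }

    /**
     * Returns a copy of config with materialized secrets merged in.
     * When enabled is false (the hypothetical feature flag), config is copied unchanged.
     */
    static Map<String, Object> mergeSecrets(Map<String, Object> config,
                                            Map<String, Object> secrets,
                                            SecretsProvider provider,
                                            boolean enabled) {
        Map<String, Object> merged = new HashMap<>(config);
        if (!enabled) {
            return merged;
        }
        for (Map.Entry<String, Object> entry : secrets.entrySet()) {
            String materialized = provider.provideSecret(entry.getKey(), entry.getValue());
            if (materialized != null) {
                // putIfAbsent keeps the existing configuration value on a key collision.
                merged.putIfAbsent(entry.getKey(), materialized);
            }
        }
        return merged;
    }
}
```

The merged map is what would be handed to `Source#open` or `Sink#open`, so the connector needs no secret-specific code of its own.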
Trade off
The one drawback to this solution is that it could theoretically break existing connector configuration. However, I think this is very unlikely because it only breaks when a configuration and a secret are passed with the same `key`.
I was able to resolve this trade off in two ways. First, I put this feature behind a feature flag. Second, I replaced `put` with `putIfAbsent` in the merge logic so that the existing configuration has precedence.
Alternatives
We could consider interpreting configuration values that start with a well-known prefix, like `env:`, as values that need to be read from the environment. The primary drawback to this solution is that there is not an easy way to configure the function at this point in the code, which means that
This solution would look something like adding this code block
to this method
pulsar/pulsar-functions/instance/src/main/java/org/apache/pulsar/functions/instance/JavaInstanceRunnable.java
Lines 884 to 929 in f7c0b3c
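For comparison, the prefix-based alternative could be sketched like this. The class name, the `resolve` method, and the decision to leave a value untouched when the variable is unset are all assumptions made for illustration. The environment is injected as a lookup function (for example `System::getenv`) so the sketch stays self-contained.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

final class EnvPrefixSketch {

    // Hard-coded prefix, mirroring the "well-known prefix" idea from the text.
    static final String PREFIX = "env:";

    /** Returns a copy of config where "env:NAME" string values are read from env. */
    static Map<String, Object> resolve(Map<String, Object> config,
                                       Function<String, String> env) {
        Map<String, Object> resolved = new HashMap<>();
        for (Map.Entry<String, Object> entry : config.entrySet()) {
            Object value = entry.getValue();
            if (value instanceof String && ((String) value).startsWith(PREFIX)) {
                String name = ((String) value).substring(PREFIX.length());
                String fromEnv = env.apply(name);
                // Assumption: keep the original value if the variable is not set.
                resolved.put(entry.getKey(), fromEnv != null ? fromEnv : value);
            } else {
                resolved.put(entry.getKey(), value);
            }
        }
        return resolved;
    }
}
```

Unlike the merge approach, this variant only handles top-level string values and bypasses the `SecretsProvider` interface entirely.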