Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generic Secret Injection for Connector Configuration #20862

Closed
2 tasks done
michaeljmarshall opened this issue Jul 24, 2023 · 10 comments · Fixed by #20903
Closed
2 tasks done

Generic Secret Injection for Connector Configuration #20862

michaeljmarshall opened this issue Jul 24, 2023 · 10 comments · Fixed by #20903
Assignees
Labels
area/connector area/security type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages

Comments

@michaeljmarshall
Copy link
Member

michaeljmarshall commented Jul 24, 2023

Search before asking

  • I searched in the issues and found nothing similar.

Motivation

Provide a generic way to inject secrets into the config map for connectors without requiring a rewrite of each source/sink and without leaking them on the command line used to start the connector.

Context

The recent CVE-2023-37579 resulted in the potential to leak source/sink credentials because some credentials are stored in connector configuration instead of in the connector's secrets map.

The current way to configure secrets for connectors requires each source or sink to implement correct secret handling by either getting a secret from the SecretsProvider or by using custom annotations and this special configuration loader. This implementation assumes that users have a Configuration class that can be annotated, but that is not always the case because the connector framework passes a configuration map of Map<String, Object>. Note that the current mechanism is not well documented and is not used by all of the official Apache Pulsar connectors.

Solution

I propose that we materialize and merge all secrets from the secrets map into the config that is passed to the connector when we call Source#open or Sink#open. Materializing the secrets would look like:

    private void mergeSecretsIntoConfigs(Map<String, Object> secrets, Map<String, Object> configs) {
        for (Map.Entry<String, Object> entry : secrets.entrySet()) {
            Object oldValue = configs.put(entry.getKey(),
                    secretsProvider.provideSecret(entry.getKey(), entry.getValue()));
            if (oldValue != null) {
                log.warn("Config value for {} replaced by secret's configuration value.", entry.getKey());
            }
        }
    }

They key benefit to this solution is that it will work for all sinks and sources, and it will leverage the SecretsProvider interface to materialize the secrets.

This will benefit all deployment methods, but is most helpful for the kubernetes runtime.

Trade off

The one drawback to this solution is that it could theoretically break existing connector configuration. However, I think this is very unlikely because it only breaks when a configuration and a secret are passed with the same key.

I was able to resolve this trade off in two ways. First, I put this feature behind a feature flag. Second, I replaced put with putIfAbsent in the merge logic so that the existing configuration has precedence.

Alternatives

We could consider interpreting configuration values that start with a well known prefix, like env:, as values that need to be read from the environment. The primary drawback to this solution is that there is not an easy way to configure the function at this point in the code, which means that

This solution would look something like adding this code block

    // Replace environment variable pointers with their environment variable values
    for (Map.Entry<String, Object> entry : config.entrySet()) {
        if (entry.getValue() instanceof String && ((String) entry.getValue()).toLowerCase().startsWith("env:")) {
            String envVariableName = ((String) entry.getValue()).substring("env:".length());
            String envVariableValue = System.getenv(envVariableName);
            entry.setValue(envVariableValue);
        }
    }

to this method

static Map<String, Object> parseComponentConfig(String connectorConfigs,
InstanceConfig instanceConfig,
ClassLoader componentClassLoader,
org.apache.pulsar.functions.proto.Function
.FunctionDetails.ComponentType componentType)
throws IOException {
final Map<String, Object> config = ObjectMapperFactory
.getMapper()
.reader()
.forType(new TypeReference<Map<String, Object>>() {})
.readValue(connectorConfigs);
if (instanceConfig.isIgnoreUnknownConfigFields() && componentClassLoader instanceof NarClassLoader) {
final String configClassName;
if (componentType == org.apache.pulsar.functions.proto.Function.FunctionDetails.ComponentType.SOURCE) {
configClassName = ConnectorUtils
.getConnectorDefinition((NarClassLoader) componentClassLoader).getSourceConfigClass();
} else if (componentType == org.apache.pulsar.functions.proto.Function.FunctionDetails.ComponentType.SINK) {
configClassName = ConnectorUtils
.getConnectorDefinition((NarClassLoader) componentClassLoader).getSinkConfigClass();
} else {
return config;
}
if (configClassName != null) {
Class<?> configClass;
try {
configClass = Class.forName(configClassName,
true, Thread.currentThread().getContextClassLoader());
} catch (ClassNotFoundException e) {
throw new RuntimeException("Config class not found: " + configClassName, e);
}
final List<String> allFields = BeanPropertiesReader.getBeanProperties(configClass);
for (String s : config.keySet()) {
if (!allFields.contains(s)) {
log.error("Field '{}' not defined in the {} configuration {}, the field will be ignored",
s,
componentType,
configClass);
config.remove(s);
}
}
}
}
return config;
}

Anything else?

No response

Are you willing to submit a PR?

  • I'm willing to submit a PR!
@michaeljmarshall michaeljmarshall added type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages area/connector area/security labels Jul 24, 2023
@michaeljmarshall michaeljmarshall self-assigned this Jul 24, 2023
@michaeljmarshall
Copy link
Member Author

I am working on this and plan to put up a PR soon to show my implementation and to keep this moving forward.

@shibd
Copy link
Member

shibd commented Jul 25, 2023

@michaeljmarshall hi,

but that is not always the case because the connector framework passes a configuration map of Map<String, Object>.

I don't quite understand what's wrong with the incoming map. Can you explain in detail?

BTW: I think you need to mention a PIP first. Thanks.

@michaeljmarshall
Copy link
Member Author

but that is not always the case because the connector framework passes a configuration map of Map<String, Object>.

I don't quite understand what's wrong with the incoming map. Can you explain in detail?

The map is not the problem. In the section you quoted, I am describing the currently available mechanism to access secrets from a sink or a source. The current flow has two options:

Option 1:

  1. Access all secrets from the context's getSecret(String) method.

Option 2:

  1. Create a class for the configuration of the connector.
  2. Annotate fields that are sensitive correctly. (This annotation is a custom Pulsar annotation and is not documented.)
  3. Convert the Map<String, Object> to the class from step 1 using the IOConfigUtils class, and that class will pull secrets using the getSecret method described in option 1.

The problem with option 2 is that it hasn't been consistently implemented, and it leaves the correct implementation up to the sink/source creator instead of giving users a secure default.

For example, the Canal Source connector takes the effort to create the config class and correctly annotate the methods:

@FieldDoc(
required = true,
defaultValue = "",
sensitive = true,
help = "Username to connect to mysql database")
private String username;
@FieldDoc(
required = true,
defaultValue = "",
sensitive = true,
help = "Password to connect to mysql database")
private String password;

However, when the connector loads the configuration, it does so without the IOConfigUtils, which is not compliant with option 2.

public static CanalSourceConfig load(String yamlFile) throws IOException {
ObjectMapper mapper = new ObjectMapper(new YAMLFactory());
return mapper.readValue(new File(yamlFile), CanalSourceConfig.class);
}
public static CanalSourceConfig load(Map<String, Object> map) throws IOException {
ObjectMapper mapper = new ObjectMapper();
return mapper.readValue(mapper.writeValueAsString(map), CanalSourceConfig.class);
}

As such, a user must put the username and the password in the source configuration instead of in the secrets configuration. Other connectors that use the annotation with sensitive=true but without securely loading the config are SolrSinkConfig, InfluxDBSinkConfig, JdbcSinkConfig, RabbitMQAbstractConfig, and RedisAbstractConfig.

There are also connectors that do not use any annotations and just treat passwords like normal config. The AerospikeSinkConfig is one example:

public class AerospikeSinkConfig implements Serializable {
private static final long serialVersionUID = 1L;
private String seedHosts;
private String keyspace;
private String columnName;
// Optional
private String userName;
private String password;
private String keySet;
private int maxConcurrentRequests = 100;
private int timeoutMs = 100;
private int retries = 1;
public static AerospikeSinkConfig load(String yamlFile) throws IOException {
ObjectMapper mapper = new ObjectMapper(new YAMLFactory());
return mapper.readValue(new File(yamlFile), AerospikeSinkConfig.class);
}
public static AerospikeSinkConfig load(Map<String, Object> map) throws IOException {
ObjectMapper mapper = new ObjectMapper();
return mapper.readValue(mapper.writeValueAsString(map), AerospikeSinkConfig.class);
}
}

The above examples are all in the main apache/pulsar repo. We can expect even more confusion from third party connectors.

My core thesis is that the existing framework is not well documented and requires connector maintainers to do an extra step that could easily be handled by the framework itself.

BTW: I think you need to mention a PIP first. Thanks.

A similar feature did not need a PIP #20116. I think this feature is small enough that it does not need a PIP. In your opinion, why should this require a PIP? Thanks.

@shibd
Copy link
Member

shibd commented Jul 26, 2023

Hi, @michaeljmarshall. Thanks for your explanation. I got it.

A similar feature did not need a PIP #20116. I think this feature is small enough that it does not need a PIP. In your opinion, why should this require a PIP? Thanks.

I see new configurations introduced in this PR: merge_secrets_into_config_map. According to the PIP rules, we need to initiate a PIP discussion.

Also, I do want to discuss the configuration. Maybe we can merge secrets into config map by default. There are no compatibility issues here.

  1. If the user only configures the password in --input-specs , we will not override it, which is compatible.
  2. If the user only configures the password in --secrets , we should merge secrets into config map by default, This doesn't break any compatibility, it's more like fixing a bug.
  3. If the user has configured a password in both --input-specs and --secrets.
    • Point 1: We won't overwrite it, as you did in your PR, print the WARN log.
    • Point 2: We should use secrets to override it. This should be in line with user expectations, not break compatibility. (This is more like fixing the current bug).

What do you think?

@michaeljmarshall
Copy link
Member Author

I see new configurations introduced in this PR: merge_secrets_into_config_map. According to the PIP rules, we need to initiate a PIP discussion.

For what it's worth, I already started a discussion on the mailing list.

You raise valid points about the best way to solve this. I am going to take a closer look and see if it might be best solved with better documentation and a few bug fixes to take advantage of the sensitive annotation. I'll post more information here and on the mailing list, as needed.

@michaeljmarshall
Copy link
Member Author

@shibd - I created a PIP for this change #20903. I also determined a different approach would be best.

@nlu90
Copy link
Member

nlu90 commented Jul 29, 2023

Option 1:

Access all secrets from the context's getSecret(String) method.

Hi, @michaeljmarshall
What's the problem with Option 1? Why do we refrain from using this existing mechanism?

@michaeljmarshall
Copy link
Member Author

@nlu90 after further exploration, I no longer think option 1 is viable due to wrapper third party connectors.

@nlu90
Copy link
Member

nlu90 commented Jul 29, 2023

Are you referring to the pulsar-debezium connectors?

Can you give an example that context.getSecret() doesn't work while the new proposed way can work seamlessly?

@michaeljmarshall
Copy link
Member Author

Are you referring to the pulsar-debezium connectors?

Can you give an example that context.getSecret() doesn't work while the new proposed way can work seamlessly?

The Kafka Connect Adapter is a generic wrapper that is designed to use existing connectors written for the Kafka ecosystem in the Pulsar ecosystem. For reference, here is a repo with many examples of wrapping third party connectors https://github.com/datastax/pulsar-3rdparty-connector. Because we're configuring code meant for another ecosystem, it is not possible to use the sensitive annotation field.

michaeljmarshall added a commit that referenced this issue Aug 16, 2023
PIP: #20903
Relates to: #20862 

### Motivation

The primary motivation is to make it possible to configure Pulsar Connectors in a secure, non-plaintext way. See the PIP for background and relevant details. The new interpolation feature only applies when deploying with functions to Kubernetes.

### Modifications

* Add `SecretsProvider#interpolateSecretForValue` method with a default that maintains the current behavior.
* Override `interpolateSecretForValue` in the `EnvironmentBasedSecretsProvider` so that configuration values formatted as `${my-env-var}` will be replaced with the result of `System.getEnv("my-env-var")` if the result is not `null`.
* Implement a recursive string interpolation method that will replace any configuration value that the `interpolateSecretForValue` implementation determines ought to be replaced.

### Verifying this change

Tests are added/modified.

### Documentation

- [x] `doc-required`

### Matching PR in forked repository

PR in forked repository: michaeljmarshall#55
michaeljmarshall added a commit to datastax/pulsar that referenced this issue Aug 16, 2023
PIP: apache#20903
Relates to: apache#20862

The primary motivation is to make it possible to configure Pulsar Connectors in a secure, non-plaintext way. See the PIP for background and relevant details. The new interpolation feature only applies when deploying with functions to Kubernetes.

* Add `SecretsProvider#interpolateSecretForValue` method with a default that maintains the current behavior.
* Override `interpolateSecretForValue` in the `EnvironmentBasedSecretsProvider` so that configuration values formatted as `${my-env-var}` will be replaced with the result of `System.getEnv("my-env-var")` if the result is not `null`.
* Implement a recursive string interpolation method that will replace any configuration value that the `interpolateSecretForValue` implementation determines ought to be replaced.

Tests are added/modified.

- [x] `doc-required`

PR in forked repository: michaeljmarshall#55

(cherry picked from commit bfde0de)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/connector area/security type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages
Projects
None yet
3 participants