Generic Secret Injection for Connector Configuration #20862

michaeljmarshall · 2023-07-24T17:22:11Z

Search before asking

I searched in the issues and found nothing similar.

Motivation

Provide a generic way to inject secrets into the config map for connectors without requiring a rewrite of each source/sink and without leaking them on the command line used to start the connector.

Context

The recent CVE-2023-37579 resulted in the potential to leak source/sink credentials because some credentials are stored in connector configuration instead of in the connector's secrets map.

The current way to configure secrets for connectors requires each source or sink to implement correct secret handling by either getting a secret from the SecretsProvider or by using custom annotations and this special configuration loader. This implementation assumes that users have a Configuration class that can be annotated, but that is not always the case because the connector framework passes a configuration map of Map<String, Object>. Note that the current mechanism is not well documented and is not used by all of the official Apache Pulsar connectors.

Solution

I propose that we materialize and merge all secrets from the secrets map into the config that is passed to the connector when we call Source#open or Sink#open. Materializing the secrets would look like:

    private void mergeSecretsIntoConfigs(Map<String, Object> secrets, Map<String, Object> configs) {
        for (Map.Entry<String, Object> entry : secrets.entrySet()) {
            Object oldValue = configs.put(entry.getKey(),
                    secretsProvider.provideSecret(entry.getKey(), entry.getValue()));
            if (oldValue != null) {
                log.warn("Config value for {} replaced by secret's configuration value.", entry.getKey());
            }
        }
    }

They key benefit to this solution is that it will work for all sinks and sources, and it will leverage the SecretsProvider interface to materialize the secrets.

This will benefit all deployment methods, but is most helpful for the kubernetes runtime.

Trade off

The one drawback to this solution is that it could theoretically break existing connector configuration. However, I think this is very unlikely because it only breaks when a configuration and a secret are passed with the same key.

I was able to resolve this trade off in two ways. First, I put this feature behind a feature flag. Second, I replaced put with putIfAbsent in the merge logic so that the existing configuration has precedence.

Alternatives

We could consider interpreting configuration values that start with a well known prefix, like env:, as values that need to be read from the environment. The primary drawback to this solution is that there is not an easy way to configure the function at this point in the code, which means that

This solution would look something like adding this code block

    // Replace environment variable pointers with their environment variable values
    for (Map.Entry<String, Object> entry : config.entrySet()) {
        if (entry.getValue() instanceof String && ((String) entry.getValue()).toLowerCase().startsWith("env:")) {
            String envVariableName = ((String) entry.getValue()).substring("env:".length());
            String envVariableValue = System.getenv(envVariableName);
            entry.setValue(envVariableValue);
        }
    }

to this method

pulsar/pulsar-functions/instance/src/main/java/org/apache/pulsar/functions/instance/JavaInstanceRunnable.java

Lines 884 to 929 in f7c0b3c

    
           static Map<String, Object> parseComponentConfig(String connectorConfigs, 
        
                                                           InstanceConfig instanceConfig, 
        
                                                           ClassLoader componentClassLoader, 
        
                                                           org.apache.pulsar.functions.proto.Function 
        
                                                                   .FunctionDetails.ComponentType componentType) 
        
                   throws IOException { 
        
               final Map<String, Object> config = ObjectMapperFactory 
        
                       .getMapper() 
        
                       .reader() 
        
                       .forType(new TypeReference<Map<String, Object>>() {}) 
        
                       .readValue(connectorConfigs); 
        
               if (instanceConfig.isIgnoreUnknownConfigFields() && componentClassLoader instanceof NarClassLoader) { 
        
                   final String configClassName; 
        
                   if (componentType == org.apache.pulsar.functions.proto.Function.FunctionDetails.ComponentType.SOURCE) { 
        
                       configClassName = ConnectorUtils 
        
                               .getConnectorDefinition((NarClassLoader) componentClassLoader).getSourceConfigClass(); 
        
                   } else if (componentType == org.apache.pulsar.functions.proto.Function.FunctionDetails.ComponentType.SINK) { 
        
                       configClassName =  ConnectorUtils 
        
                               .getConnectorDefinition((NarClassLoader) componentClassLoader).getSinkConfigClass(); 
        
                   } else { 
        
                       return config; 
        
                   } 
        
                   if (configClassName != null) { 
        
                       Class<?> configClass; 
        
                       try { 
        
                           configClass = Class.forName(configClassName, 
        
                                   true, Thread.currentThread().getContextClassLoader()); 
        
                       } catch (ClassNotFoundException e) { 
        
                           throw new RuntimeException("Config class not found: " + configClassName, e); 
        
                       } 
        
                       final List<String> allFields = BeanPropertiesReader.getBeanProperties(configClass); 
        
                       for (String s : config.keySet()) { 
        
                           if (!allFields.contains(s)) { 
        
                               log.error("Field '{}' not defined in the {} configuration {}, the field will be ignored", 
        
                                       s, 
        
                                       componentType, 
        
                                       configClass); 
        
                               config.remove(s); 
        
                           } 
        
                       } 
        
                   } 
        
               } 
        
               return config; 
        
           }

Anything else?

No response

Are you willing to submit a PR?

I'm willing to submit a PR!

The text was updated successfully, but these errors were encountered:

michaeljmarshall · 2023-07-24T17:57:21Z

I am working on this and plan to put up a PR soon to show my implementation and to keep this moving forward.

shibd · 2023-07-25T14:14:06Z

@michaeljmarshall hi,

but that is not always the case because the connector framework passes a configuration map of Map<String, Object>.

I don't quite understand what's wrong with the incoming map. Can you explain in detail?

BTW: I think you need to mention a PIP first. Thanks.

michaeljmarshall · 2023-07-25T17:24:14Z

but that is not always the case because the connector framework passes a configuration map of Map<String, Object>.

I don't quite understand what's wrong with the incoming map. Can you explain in detail?

The map is not the problem. In the section you quoted, I am describing the currently available mechanism to access secrets from a sink or a source. The current flow has two options:

Option 1:

Access all secrets from the context's getSecret(String) method.

Option 2:

Create a class for the configuration of the connector.
Annotate fields that are sensitive correctly. (This annotation is a custom Pulsar annotation and is not documented.)
Convert the Map<String, Object> to the class from step 1 using the IOConfigUtils class, and that class will pull secrets using the getSecret method described in option 1.

The problem with option 2 is that it hasn't been consistently implemented, and it leaves the correct implementation up to the sink/source creator instead of giving users a secure default.

For example, the Canal Source connector takes the effort to create the config class and correctly annotate the methods:

pulsar/pulsar-io/canal/src/main/java/org/apache/pulsar/io/canal/CanalSourceConfig.java

Lines 39 to 50 in ec102fb

    
           @FieldDoc( 
        
               required = true, 
        
               defaultValue = "", 
        
               sensitive = true, 
        
               help = "Username to connect to mysql database") 
        
           private String username; 
        
           @FieldDoc( 
        
               required = true, 
        
               defaultValue = "", 
        
               sensitive = true, 
        
               help = "Password to connect to mysql database") 
        
           private String password;

However, when the connector loads the configuration, it does so without the IOConfigUtils, which is not compliant with option 2.

pulsar/pulsar-io/canal/src/main/java/org/apache/pulsar/io/canal/CanalSourceConfig.java

Lines 83 to 92 in ec102fb

    
           public static CanalSourceConfig load(String yamlFile) throws IOException { 
        
               ObjectMapper mapper = new ObjectMapper(new YAMLFactory()); 
        
               return mapper.readValue(new File(yamlFile), CanalSourceConfig.class); 
        
           } 
        
           public static CanalSourceConfig load(Map<String, Object> map) throws IOException { 
        
               ObjectMapper mapper = new ObjectMapper(); 
        
               return mapper.readValue(mapper.writeValueAsString(map), CanalSourceConfig.class); 
        
           }

As such, a user must put the username and the password in the source configuration instead of in the secrets configuration. Other connectors that use the annotation with sensitive=true but without securely loading the config are SolrSinkConfig, InfluxDBSinkConfig, JdbcSinkConfig, RabbitMQAbstractConfig, and RedisAbstractConfig.

There are also connectors that do not use any annotations and just treat passwords like normal config. The AerospikeSinkConfig is one example:

pulsar/pulsar-io/aerospike/src/main/java/org/apache/pulsar/io/aerospike/AerospikeSinkConfig.java

Lines 32 to 58 in ec102fb

    
           public class AerospikeSinkConfig implements Serializable { 
        
               private static final long serialVersionUID = 1L; 
        
               private String seedHosts; 
        
               private String keyspace; 
        
               private String columnName; 
        
               // Optional 
        
               private String userName; 
        
               private String password; 
        
               private String keySet; 
        
               private int maxConcurrentRequests = 100; 
        
               private int timeoutMs = 100; 
        
               private int retries = 1; 
        
               public static AerospikeSinkConfig load(String yamlFile) throws IOException { 
        
                   ObjectMapper mapper = new ObjectMapper(new YAMLFactory()); 
        
                   return mapper.readValue(new File(yamlFile), AerospikeSinkConfig.class); 
        
               } 
        
               public static AerospikeSinkConfig load(Map<String, Object> map) throws IOException { 
        
                   ObjectMapper mapper = new ObjectMapper(); 
        
                   return mapper.readValue(mapper.writeValueAsString(map), AerospikeSinkConfig.class); 
        
               } 
        
           }

The above examples are all in the main apache/pulsar repo. We can expect even more confusion from third party connectors.

My core thesis is that the existing framework is not well documented and requires connector maintainers to do an extra step that could easily be handled by the framework itself.

BTW: I think you need to mention a PIP first. Thanks.

A similar feature did not need a PIP #20116. I think this feature is small enough that it does not need a PIP. In your opinion, why should this require a PIP? Thanks.

shibd · 2023-07-26T15:21:53Z

Hi, @michaeljmarshall. Thanks for your explanation. I got it.

A similar feature did not need a PIP #20116. I think this feature is small enough that it does not need a PIP. In your opinion, why should this require a PIP? Thanks.

I see new configurations introduced in this PR: merge_secrets_into_config_map. According to the PIP rules, we need to initiate a PIP discussion.

Also, I do want to discuss the configuration. Maybe we can merge secrets into config map by default. There are no compatibility issues here.

If the user only configures the password in --input-specs , we will not override it, which is compatible.
If the user only configures the password in --secrets , we should merge secrets into config map by default, This doesn't break any compatibility, it's more like fixing a bug.
If the user has configured a password in both --input-specs and --secrets.
- Point 1: We won't overwrite it, as you did in your PR, print the WARN log.
- Point 2: We should use secrets to override it. This should be in line with user expectations, not break compatibility. (This is more like fixing the current bug).

What do you think?

michaeljmarshall · 2023-07-26T17:23:53Z

I see new configurations introduced in this PR: merge_secrets_into_config_map. According to the PIP rules, we need to initiate a PIP discussion.

For what it's worth, I already started a discussion on the mailing list.

You raise valid points about the best way to solve this. I am going to take a closer look and see if it might be best solved with better documentation and a few bug fixes to take advantage of the sensitive annotation. I'll post more information here and on the mailing list, as needed.

michaeljmarshall · 2023-07-28T21:53:10Z

@shibd - I created a PIP for this change #20903. I also determined a different approach would be best.

nlu90 · 2023-07-29T00:54:13Z

Option 1:

Access all secrets from the context's getSecret(String) method.

Hi, @michaeljmarshall
What's the problem with Option 1? Why do we refrain from using this existing mechanism?

michaeljmarshall · 2023-07-29T01:01:55Z

@nlu90 after further exploration, I no longer think option 1 is viable due to wrapper third party connectors.

nlu90 · 2023-07-29T01:59:27Z

Are you referring to the pulsar-debezium connectors?

Can you give an example that context.getSecret() doesn't work while the new proposed way can work seamlessly?

michaeljmarshall · 2023-08-08T20:12:33Z

Are you referring to the pulsar-debezium connectors?

Can you give an example that context.getSecret() doesn't work while the new proposed way can work seamlessly?

The Kafka Connect Adapter is a generic wrapper that is designed to use existing connectors written for the Kafka ecosystem in the Pulsar ecosystem. For reference, here is a repo with many examples of wrapping third party connectors https://github.com/datastax/pulsar-3rdparty-connector. Because we're configuring code meant for another ecosystem, it is not possible to use the sensitive annotation field.

PIP: #20903 Relates to: #20862 ### Motivation The primary motivation is to make it possible to configure Pulsar Connectors in a secure, non-plaintext way. See the PIP for background and relevant details. The new interpolation feature only applies when deploying with functions to Kubernetes. ### Modifications * Add `SecretsProvider#interpolateSecretForValue` method with a default that maintains the current behavior. * Override `interpolateSecretForValue` in the `EnvironmentBasedSecretsProvider` so that configuration values formatted as `${my-env-var}` will be replaced with the result of `System.getEnv("my-env-var")` if the result is not `null`. * Implement a recursive string interpolation method that will replace any configuration value that the `interpolateSecretForValue` implementation determines ought to be replaced. ### Verifying this change Tests are added/modified. ### Documentation - [x] `doc-required` ### Matching PR in forked repository PR in forked repository: michaeljmarshall#55

PIP: apache#20903 Relates to: apache#20862 The primary motivation is to make it possible to configure Pulsar Connectors in a secure, non-plaintext way. See the PIP for background and relevant details. The new interpolation feature only applies when deploying with functions to Kubernetes. * Add `SecretsProvider#interpolateSecretForValue` method with a default that maintains the current behavior. * Override `interpolateSecretForValue` in the `EnvironmentBasedSecretsProvider` so that configuration values formatted as `${my-env-var}` will be replaced with the result of `System.getEnv("my-env-var")` if the result is not `null`. * Implement a recursive string interpolation method that will replace any configuration value that the `interpolateSecretForValue` implementation determines ought to be replaced. Tests are added/modified. - [x] `doc-required` PR in forked repository: michaeljmarshall#55 (cherry picked from commit bfde0de)

michaeljmarshall added type/enhancement The enhancements for the existing features or docs. e.g. reduce memory usage of the delayed messages area/connector area/security labels Jul 24, 2023

michaeljmarshall self-assigned this Jul 24, 2023

michaeljmarshall mentioned this issue Jul 24, 2023

[feat][fn] Add option to merge secrets into config for sink/source #20863

Closed

1 task

This was referenced Jul 28, 2023

[feat][io] Support configuration secret interpolation #20901

Merged

[fix][io] Improve loading sensitive fields for many connectors #20902

Open

michaeljmarshall mentioned this issue Jul 28, 2023

[pip][design] PIP 289: Secure Pulsar Connector Configuration #20903

Merged

1 task

michaeljmarshall closed this as completed in #20903 Aug 16, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Generic Secret Injection for Connector Configuration #20862

Generic Secret Injection for Connector Configuration #20862

michaeljmarshall commented Jul 24, 2023 •

edited

Loading

michaeljmarshall commented Jul 24, 2023

shibd commented Jul 25, 2023 •

edited

Loading

michaeljmarshall commented Jul 25, 2023

shibd commented Jul 26, 2023

michaeljmarshall commented Jul 26, 2023

michaeljmarshall commented Jul 28, 2023

nlu90 commented Jul 29, 2023

michaeljmarshall commented Jul 29, 2023

nlu90 commented Jul 29, 2023 •

edited

Loading

michaeljmarshall commented Aug 8, 2023

Generic Secret Injection for Connector Configuration #20862

Generic Secret Injection for Connector Configuration #20862

Comments

michaeljmarshall commented Jul 24, 2023 • edited Loading

Search before asking

Motivation

Context

Solution

Trade off

Alternatives

Anything else?

Are you willing to submit a PR?

michaeljmarshall commented Jul 24, 2023

shibd commented Jul 25, 2023 • edited Loading

michaeljmarshall commented Jul 25, 2023

shibd commented Jul 26, 2023

michaeljmarshall commented Jul 26, 2023

michaeljmarshall commented Jul 28, 2023

nlu90 commented Jul 29, 2023

michaeljmarshall commented Jul 29, 2023

nlu90 commented Jul 29, 2023 • edited Loading

michaeljmarshall commented Aug 8, 2023

michaeljmarshall commented Jul 24, 2023 •

edited

Loading

shibd commented Jul 25, 2023 •

edited

Loading

nlu90 commented Jul 29, 2023 •

edited

Loading