Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix][io] Improve loading sensitive fields for many connectors #20902

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

michaeljmarshall
Copy link
Member

@michaeljmarshall michaeljmarshall commented Jul 28, 2023

PIP 289: #20903
Relates to: #20862

Motivation

The primary motivation is to fix Pulsar Connector configurations that do not completely implement the sensitive annotation flag. Once implemented, users will be able to supply secrets in a secure, non-plaintext way. See the PIP for background and relevant details.

Modifications

  • Update how aerospike, canal, influxdb, jdbc/core, rabbitmq, redis, and solr connectors load configuration to allow for correct secret injection.
  • Fix several defaultValue annotation fields to make sure that Jackson can correctly set the defaults.

Verifying this change

Tests are updated and added.

Documentation

  • doc-required

Matching PR in forked repository

PR in forked repository: michaeljmarshall#54

@michaeljmarshall michaeljmarshall added area/connector area/security doc-required Your PR changes impact docs and you will update later. type/PIP labels Jul 28, 2023
@michaeljmarshall michaeljmarshall added this to the 3.2.0 milestone Jul 28, 2023
@michaeljmarshall michaeljmarshall self-assigned this Jul 28, 2023
@github-actions github-actions bot removed the type/PIP label Jul 28, 2023
@michaeljmarshall michaeljmarshall changed the title Improve loading connector secrets [fix][io] Improve loading sensitive fields for many connectors Aug 14, 2023
@michaeljmarshall michaeljmarshall marked this pull request as ready for review August 14, 2023 23:11
<dependency>
<groupId>${project.groupId}</groupId>
<artifactId>pulsar-io-common</artifactId>
<version>${project.version}</version>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Connectors shouldn't bundle Pulsar IO jars.
This must be scope 'provided'

Are we breaking compatibility with existing connectors?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I copied this block from existing connectors. Should I fix it for all of them?

public static CanalSourceConfig load(Map<String, Object> map) throws IOException {
ObjectMapper mapper = new ObjectMapper();
return mapper.readValue(mapper.writeValueAsString(map), CanalSourceConfig.class);
}

public static CanalSourceConfig load(Map<String, Object> map, SourceContext context) {
return IOConfigUtils.loadWithSecrets(map, CanalSourceConfig.class, context);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about adding a method in Context?
Asking developers to use a Util class is a little API design smell to me

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that this is not the ideal solution. I wrote this PR to complete the paradigm started in the project for some but not all connectors. There are already several examples using this "flow", which is to convert the Map<String, Object> into my CustomSinkConfig class. However, I do not think we should make this mechanism easier to use. I would prefer to guide developers to the new generic solution provided in PIP 289 of using env vars instead of adding new methods in the Context.

public static AerospikeSinkConfig load(Map<String, Object> map) throws IOException {
ObjectMapper mapper = new ObjectMapper();
return mapper.readValue(mapper.writeValueAsString(map), AerospikeSinkConfig.class);
}

public static AerospikeSinkConfig load(Map<String, Object> map, SinkContext context) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should just keep the method, all other load methods should be removed, and the relevant unit tests should be fixed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the yamlFile load config is useless. This is not the responsibility of each connector but the responsibility of the connector framework.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should just keep the method, all other load methods should be removed, and the relevant unit tests should be fixed.

I am conflicted about removing those public methods. I agree this class is cluttered, but it introduces an unnecessary breaking change to outright remove them.

From the yamlFile load config is useless. This is not the responsibility of each connector but the responsibility of the connector framework.

What exactly is the connector framework's responsibility?

Copy link
Member

@shibd shibd Aug 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am conflicted about removing those public methods. I agree this class is cluttered, but it introduces an unnecessary breaking change to outright remove them.

For each connector, no user to use its class, right? We just need to make sure that the configuration is compatible.

I don't think removing these public methods would be a breaking change.

What exactly is the connector framework's responsibility?

The io framework is responsible for loading the configuration from the YAML and converting it to the Map. So for the connector, we only need cover to deserialize the correct configuration from the Map.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For each connector, no user to use its class, right? We just need to make sure that the configuration is compatible.

I think you're right here. Part of the issue is likely the ambiguous boundaries for users. It's probably possible to use this class as a dependency in another project, but I doubt there is really an expectation that these classes are meant to be used elsewhere. The primary risk would be for someone slightly modifying a connector.

The io framework is responsible for loading the configuration from the YAML and converting it to the Map. So for the connector, we only need cover to deserialize the correct configuration from the Map.

As long as we assume these classes are for pulsar's internal use, I agree that this is the framework's responsibility.

The last question is really whether we think this change is valuable. @eolivelli is asking in another comment on this PR whether this is the right design.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right here. Part of the issue is likely the ambiguous boundaries for users. It's probably possible to use this class as a dependency in another project, but I doubt there is really an expectation that these classes are meant to be used elsewhere. The primary risk would be for someone slightly modifying a connector.

Maybe we can discuss clearly where the boundaries we offer to users are. I don't think users should use this class as a dependency in another project. We shouldn't have uploaded the connector project to the Maven center, right?

The last question is really whether we think this change is valuable. @eolivelli is asking in another comment on this PR whether this is the right design.

Maybe the IO framework needs to deserialize the configuration object for the connector.

public interface Source<C, T> extends AutoCloseable {

    /**
     * Open connector with configuration.
     *
     * @param config initialization config
     * @param sourceContext environment where the source connector is running
     * @throws Exception IO type exceptions when opening a connector
     */
    void open(C config, SourceContext sourceContext) throws Exception;

    /**
     * Reads the next message from source.
     * If source does not have any new messages, this call should block.
     * @return next message from source.  The return result should never be null
     * @throws Exception
     */
    Record<T> read() throws Exception;
}

@github-actions
Copy link

The pr had no activity for 30 days, mark with Stale label.

@github-actions github-actions bot added the Stale label Sep 17, 2023
@Technoboy- Technoboy- modified the milestones: 3.2.0, 3.3.0 Dec 22, 2023
@coderzc coderzc modified the milestones: 3.3.0, 3.4.0 May 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/connector area/security doc-required Your PR changes impact docs and you will update later. Stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants