Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pulsar-io] pass the pulsar service url to debezium source for history database #11251

Merged

Conversation

nlu90
Copy link
Member

@nlu90 nlu90 commented Jul 7, 2021

Motivation

The Debezium requires pulsar a service URL for history database usage.

In #11056 , the service.url field from PulsarKafkaWorkerConfig is no longer available. And the value is also deleted from multiple yaml config files in this commit. This causes the integration test for Debezium connector to fail.

Based on the Debezium paradigm, all configurations should be passed as strings. There's no easy way to inject a PulsarClient via configuration.

We need to ask user to provide the pulsar url explicitly and probably auth info also.

Modifications

  1. Make the database.history.pulsar.service.url field required
  2. Add the config value back to example yaml files
  3. Update the integration test config

Verifying this change

  • Make sure that the change passes the CI checks.

Copy link
Contributor

@dlg99 dlg99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM and it will fix the tests/allow connector to run.
This returns us to original problem of having to add extra parameters to configure the pulsar client, this can be address later IMO.

Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM is this unblocks Debezium tests

IIUC we wanted to use the client in order to access auth and TLS.

With this change it is not possible to do it.

What is your plan?

@@ -50,7 +50,7 @@ public DebeziumMongoDbSourceTester(PulsarCluster cluster) {
sourceConfig.put("mongodb.password", "dbz");
sourceConfig.put("mongodb.task.id","1");
sourceConfig.put("database.whitelist", "inventory");
sourceConfig.put("database.history.pulsar.service.url", pulsarServiceUrl);
sourceConfig.put("history.database.pulsar.service.url", pulsarServiceUrl);
Copy link
Contributor

@dlg99 dlg99 Jul 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought it should start with DatabaseHistory.CONFIGURATION_FIELD_PREFIX_STRING which is CONFIGURATION_FIELD_PREFIX_STRING = "database.history."
debezium does

Configuration dbHistoryConfig = config.subset(DatabaseHistory.CONFIGURATION_FIELD_PREFIX_STRING, false)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I just updated them.

@nlu90
Copy link
Member Author

nlu90 commented Jul 8, 2021

LGTM is this unblocks Debezium tests

IIUC we wanted to use the client in order to access auth and TLS.

With this change it is not possible to do it.

What is your plan?

Based on Debezium's current paradigm, user can only pass string parameters in the configuration. For our case, it'll be better if they allow passing serialized objects. This will be the most secure way but needs help from Debezium community.

A second way would be we can access the PulsarClient via some static method in PulsarDatabaseHistory. This is a little bit hacky but works quickly if it's possible.

One last way is we add history database specific pulsar auth parameters in PulsarDatabaseHistory. This makes it follow the Debezium paradigm but not secure with pulsar.

@nlu90 nlu90 marked this pull request as ready for review July 9, 2021 00:09
@dlg99
Copy link
Contributor

dlg99 commented Jul 9, 2021

@nlu90

One last way is we add history database specific pulsar auth parameters

adding all parameters is tedious (the way config definition works there needs definition of each parameter with bunch of metadata around).
What if we could define one parameter, String, for serialized config json and then do something like

String clientConfJson = pulsarClient.getConfig();

to pass that to the PulsarDatabaseHistory that can use ClientBuilder.loadConf(..)

@nlu90
Copy link
Member Author

nlu90 commented Jul 9, 2021

@nlu90

One last way is we add history database specific pulsar auth parameters

adding all parameters is tedious (the way config definition works there needs definition of each parameter with bunch of metadata around).
What if we could define one parameter, String, for serialized config json and then do something like

String clientConfJson = pulsarClient.getConfig();

to pass that to the PulsarDatabaseHistory that can use ClientBuilder.loadConf(..)

@dlg99 This sounds an interesting idea!
I just checked the PulsarClient API, it doesn't support getConfig currently. We may want to discuss with the community about adding this API first.

@sijie sijie added this to the 2.9.0 milestone Jul 9, 2021
Copy link
Member

@sijie sijie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nlu90 what is the plan for supporting authentication for the Pulsar cluster?

@sijie
Copy link
Member

sijie commented Jul 9, 2021

A second way would be we can access the PulsarClient via some static method in PulsarDatabaseHistory. This is a little bit hacky but works quickly if it's possible.

@nlu90 I felt this is the most secure solution because the connector users don't need to provide the auth information through function config. The current PR still requires users to submit auth information via function config. Although I admit the code will look a bit hacky, it is still manageable.

@eolivelli
Copy link
Contributor

if this patch is unblocking @dlg99 's work to add back support for running Integration Tests of Pulsar IO
I would prefer to commit the patch in the current form and then follow up.

CI is in a very bad state currently.

cc @sijie @rdhabalia @merlimat @codelipenghui

@sijie
Copy link
Member

sijie commented Jul 13, 2021

@eolivelli How is that related to CI in a very bad state since the integration tests for debezium is not enabled?

I would have a concern about this patch as it doesn't support accessing an authenticated Pulsar cluster yet. Even we merge this pull request for now, the connector can't be used in an authenticated cluster. I think we need to spend more time figuring out how to fix this properly. All the changes related are done in the master which is the development branch for 2.9 (which is targeted to release in September based on a time-based release schedule). I don't see why we need to rush on merging this. We need to fix this properly in order to make the debezium connector work in an authenticated cluster.

@sijie
Copy link
Member

sijie commented Jul 13, 2021

@nlu90 an alternative approach is to expose the client builder via the context. So we can just serialize the client builder to a base64 encoded string and pass it to debezium config. Hence the pulsar database history can get the client builder and create a Pulsar client from that. I created a PR (#11293) to demonstrate the idea. Let me know what you think.

@eolivelli
Copy link
Contributor

@sijie

How is that related to CI in a very bad state since the integration tests for debezium is not enabled?
Sorry, I wasn't clear.

The point is that currently Debezium related tests are not running on CI, and I meant that if CI is not running the tests this is kind of a "bad state" :-)

We cannot enable them (#11154) because currently they will fail due to the problem that this PR is trying to address.

I would feel more confident with this change and any other proposal if we could revert #11056, commit #11154 and then be back in resolving the Client Auth problem.

So my proposal is:

Does it sound like a good plan ?
because without #11154 we cannot validate on CI any patch related to Debezium

@dlg99
Copy link
Contributor

dlg99 commented Jul 13, 2021

@sijie I second @eolivelli here.
We should either merge this change + fix for the CI or revert the breaking change, merge the fix for the CI, and do a follow up for the authentication after that.
At least the follow up change will get tested this time.
I don't find attractive idea of piling up more changes on top of this commit because it:

  • has unpredictable ETA. Whatever way you decide to go with authentication will require separate discussion/CR, maybe even PIP.
  • is blocking other changes (i.e. debezium upgrade)
  • is leaving the debezium connectors in broken state (affects people testing/experimenting with pulsar on 2.9-snapshot)
  • assumes that more changes added with the integration tests not enabled on the CI

We are dealing with a case of broken functionality in the build. I think the first priority should be fixing the regression quickly and new features should be second.

@codelipenghui codelipenghui merged commit d2d192b into apache:master Jul 15, 2021
@dlg99 dlg99 mentioned this pull request Jul 15, 2021
1 task
codelipenghui pushed a commit that referenced this pull request Aug 20, 2021
dlg99 pushed a commit to dlg99/pulsar that referenced this pull request Sep 24, 2021
…y database (apache#11251)

### Motivation

The Debezium requires pulsar a service URL for history database usage. 

In apache#11056 , the `service.url` field from `PulsarKafkaWorkerConfig` is no longer available. And the value is also deleted from multiple yaml config files in this [commit](apache@3ce24c9). This causes the integration test for Debezium connector to fail.

Based on the Debezium [paradigm](https://debezium.io/documentation/reference/1.5/connectors/mysql.html#debezium-mysql-connector-database-history-configuration-properties), all configurations should be passed as strings. There's no easy way to inject a PulsarClient via configuration.

We need to ask user to provide the pulsar url explicitly and probably auth info also.


### Modifications

1. Make the `database.history.pulsar.service.url` field required
2. Add the config value back to example yaml files
3. Update the integration test config

### Verifying this change

- [ ] Make sure that the change passes the CI checks.
eolivelli pushed a commit to datastax/pulsar that referenced this pull request Sep 24, 2021
…y database (apache#11251)

### Motivation

The Debezium requires pulsar a service URL for history database usage. 

In apache#11056 , the `service.url` field from `PulsarKafkaWorkerConfig` is no longer available. And the value is also deleted from multiple yaml config files in this [commit](apache@3ce24c9). This causes the integration test for Debezium connector to fail.

Based on the Debezium [paradigm](https://debezium.io/documentation/reference/1.5/connectors/mysql.html#debezium-mysql-connector-database-history-configuration-properties), all configurations should be passed as strings. There's no easy way to inject a PulsarClient via configuration.

We need to ask user to provide the pulsar url explicitly and probably auth info also.


### Modifications

1. Make the `database.history.pulsar.service.url` field required
2. Add the config value back to example yaml files
3. Update the integration test config

### Verifying this change

- [ ] Make sure that the change passes the CI checks.
bharanic-dev pushed a commit to bharanic-dev/pulsar that referenced this pull request Mar 18, 2022
…y database (apache#11251)

### Motivation

The Debezium requires pulsar a service URL for history database usage. 

In apache#11056 , the `service.url` field from `PulsarKafkaWorkerConfig` is no longer available. And the value is also deleted from multiple yaml config files in this [commit](apache@3ce24c9). This causes the integration test for Debezium connector to fail.

Based on the Debezium [paradigm](https://debezium.io/documentation/reference/1.5/connectors/mysql.html#debezium-mysql-connector-database-history-configuration-properties), all configurations should be passed as strings. There's no easy way to inject a PulsarClient via configuration.

We need to ask user to provide the pulsar url explicitly and probably auth info also.


### Modifications

1. Make the `database.history.pulsar.service.url` field required
2. Add the config value back to example yaml files
3. Update the integration test config

### Verifying this change

- [ ] Make sure that the change passes the CI checks.
bharanic-dev pushed a commit to bharanic-dev/pulsar that referenced this pull request Mar 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants