Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KAFKA-13748: Do not include file stream connectors in Connect's CLASSPATH and plugin.path by default #11908

Conversation

kkonstantine
Copy link
Contributor

@kkonstantine kkonstantine commented Mar 16, 2022

The purpose of this patch is to stop including the non-production grade connectors that are meant to be used for demos and quick starts by default in the CLASSPATH and plugin.path of Connect deployments. The package of these connector still be shipped with the Apache Kafka distribution and will be available for explicit inclusion.

The changes have been tested through the system tests and the existing unit and integration tests

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@kkonstantine kkonstantine force-pushed the KAFKA-13748-remove-filestream-connectors-from-classpath branch 3 times, most recently from da7568e to 380c91f Compare March 18, 2022 17:40
@kkonstantine kkonstantine marked this pull request as ready for review March 21, 2022 17:29
@kkonstantine
Copy link
Contributor Author

Here's a run of all the Connect system tests with the proposed changes in the PR.
https://jenkins.confluent.io/view/All/job/system-test-kafka-branch-builder/4814/

@rhauch the 9 tests that have failed have done so after loading the classes successfully. The failures are not relevant to the changes here and the issues with the test_broker_compatibility seems to make sense to address separately.

Copy link
Member

@mimaison mimaison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @kkonstantine. I've not taken a look at the Python changes yet, but I left a couple of questions.

@@ -74,6 +74,7 @@ <h4><a id="connect_running" href="#connect_running">Running Kafka Connect</a></h
<li><code>config.storage.topic</code> (default <code>connect-configs</code>) - topic to use for storing connector and task configurations; note that this should be a single partition, highly replicated, compacted topic. You may need to manually create the topic to ensure the correct configuration as auto created topics may have multiple partitions or be automatically configured for deletion rather than compaction</li>
<li><code>offset.storage.topic</code> (default <code>connect-offsets</code>) - topic to use for storing offsets; this topic should have many partitions, be replicated, and be configured for compaction</li>
<li><code>status.storage.topic</code> (default <code>connect-status</code>) - topic to use for storing statuses; this topic can have multiple partitions, and should be replicated and configured for compaction</li>
<li><code>plugin.path</code> (default <code>empty</code>) - a list of paths that contain plugins (connectors, converters, transformations). For the purpose of quick starts users will have to add the path that contains the FileStreamSourceConnector and FileStreamSinkConnector packaged in <code>connect-file-"version".jar</code>, because these connectors are not included by default to the <code>CLASSPATH</code> or the <code>plugin.path</code> of the Connect worker</li>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add the same to the Standalone section above?

Also we should update quickstart.html too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comment @mimaison
I've moved this bullet further up where we include properties for both standalone and distributed.

Regarding the quickstart, the reason I didn't change this file is that it's not shown anymore. Which might be something to fix, but possibly not in this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @kkonstantine.

Regarding the quickstart, it's currently not shown on the website, but this commit added it back. So once we sync the docs for 3.2, it should be back.

Copy link
Contributor

@rhauch rhauch Mar 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree that it'd be good for this PR to add this information to the Connect section of the quickstart, since that's been added back for 3.2.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the link @mimaison. In that case, it's worth updating the quick start here too. Let me do that

@kkonstantine kkonstantine force-pushed the KAFKA-13748-remove-filestream-connectors-from-classpath branch from 9fbefd3 to aa443b4 Compare March 23, 2022 22:42
@kkonstantine
Copy link
Contributor Author

I rebased to get the changes from #11933 and get a green run of system tests

Copy link
Contributor

@rhauch rhauch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @kkonstantine. Looks pretty good, but I do have a few comments/suggestions below.

Also, it might be good for this PR to update the Connect section of the quickstart, since that's been added back for 3.2 (see thread above).

Comment on lines 51 to 52
<li><code>plugin.path</code> (default <code>empty</code>) - a list of paths that contain
plugins (connectors, converters, transformations). For the purpose of quick starts users will have to add the path that contains the FileStreamSourceConnector and FileStreamSinkConnector packaged in <code>connect-file-"version".jar</code>, because these connectors are not included by default to the <code>CLASSPATH</code> or the <code>plugin.path</code> of the Connect worker.</li>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previous list item paragraphs are on a single line, and it's probably useful to keep that formatting style. Also, a few nits:

Suggested change
<li><code>plugin.path</code> (default <code>empty</code>) - a list of paths that contain
plugins (connectors, converters, transformations). For the purpose of quick starts users will have to add the path that contains the FileStreamSourceConnector and FileStreamSinkConnector packaged in <code>connect-file-"version".jar</code>, because these connectors are not included by default to the <code>CLASSPATH</code> or the <code>plugin.path</code> of the Connect worker.</li>
<li><code>plugin.path</code> (default <code>empty</code>) - a list of paths that contain Connect plugins (connectors, converters, transformations). Before running quick starts, users must add the relative or absolute path that contains the example FileStreamSourceConnector and FileStreamSinkConnector packaged in <code>connect-file-"version".jar</code>, because these connectors are not included by default to the <code>CLASSPATH</code> or the <code>plugin.path</code> of the Connect worker.</li>

I also have a few questions.

  1. Should the path include the JAR file or the directory that contains the JAR file?
  2. Would it be useful if we supplied an example plugin.path line to add?

I've built the distribution with this PR branch, and verified that the quickstart running the standalone worker fails if the file connectors are not in the plugin.path, but adding plugin.path=libs/connect-file-<version>.jar does work as long as the quickstart is run from the installation directory.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comment @rhauch
One note that I have is that I wouldn't recommend adding the relative path even if it works, because it possibly doesn't lead to robust deployments. plugin.path works when following symlinks so that can help.

plugin.path works with uber-jars but I wouldn't necessarily consider connect-file-<version>.jar an uber jar. I'd recommend the mainstream way which is to add the parent directory of any directories containing connector jars. Let me try to see how an example looks like here, because that's already just a bullet point.

tests/kafkatest/services/connect.py Show resolved Hide resolved
@kkonstantine
Copy link
Contributor Author

@mimaison @rhauch I tried to address your comments. Please take another look when you get the chance. It be great to make 3.2 with this change.

Copy link
Contributor

@rhauch rhauch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @kkonstantine. A long comment/suggestion below about how to reduce the barrier of running the quickstart.

Comment on lines 176 to 179
First, make sure to add <code class="language-bash">connect-file-{{fullDotVersion}}.jar</code> to the <code>plugin.path</code> property in the Connect worker's configuration (see <a href="#connectconfigs_plugin.path">plugin.path</a> for examples).
</p>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will users that are trying this for the first time understand what this means, or will this be enough of a barrier to cause them to abandon the quickstart? I'm not sure, but it might be worth to be more explicit here.

First, since this is a quickstart I think it's okay to duplicate some information here just so that users don't have to go elsewhere to learn what they need to do to run the quickstart.

Second, the referenced plugin.path configuration section says (emphasis mine):

The list should consist of top level directories that include any combination of ...

And, to run the quickstarts we want them to add the path to that JAR file to the plugin.path configuration property, not add the path to the lib/ directory that contains the connect-file-<version>.JAR file, as the referenced documentation suggests to do.

So, I suggest we be much more explicit here, and say something like:

Suggested change
First, make sure to add <code class="language-bash">connect-file-{{fullDotVersion}}.jar</code> to the <code>plugin.path</code> property in the Connect worker's configuration (see <a href="#connectconfigs_plugin.path">plugin.path</a> for examples).
</p>
First, instruct Connect to use the example file system connectors by editing the configuration for the Connect standalone worker to changing the following line from:
</p>
<pre class="brush: bash;">
#plugin.path=</pre>
to:
</p>
<pre class="brush: bash;">
plugin.path=lib/connect-file-{{fullDotVersion}}.jar</pre>

We could make this a little easier if we changed the connect-standalone.properties and connect-distributed.properties files to add a few lines to the existing commented out section, so that all a user has to do is uncomment one line. For example:

...
# Set to a list of filesystem paths separated by commas (,) to enable class loading isolation for plugins
# (connectors, converters, transformations). The list should consist of top level directories that include 
# any combination of: 
# a) directories immediately containing jars with plugins and their dependencies
# b) uber-jars with plugins and their dependencies
# c) directories immediately containing the package directory structure of classes of plugins and their dependencies
# Note: symlinks will be followed to discover dependencies or plugins.
# Examples: 
# plugin.path=/usr/local/share/java,/usr/local/share/kafka/plugins,/opt/connectors,
#
# To run the quickstart that just uses the example file system connectors, uncomment this line:
# plugin.path=lib/connect-file-<version>.jar,
#
#plugin.path=

These are comments, and I think we can improve the comments without a KIP, especially since with this PR we're trying to limit the impact on the quickstarts.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should add that I don't think adding the comments to the config properties is really all that necessary -- it might simplify things a bit, but at the same time we don't want users to do that in production.

Not including it in the configs just means the quickstart needs a bit more content to explain how the user should modify the config. Plus, maybe a note that says not to use these in production.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. Better to add this example to the quick start itself rather than the config.

Again, my dilemma was regarding whether we should suggest a relative path + uber jar value which is not typical for production even on the quick start or direct users to the documentation of the property. I'm also a fan of quick starts that demonstrate a working example step-by-step. So, after your comment I've changed the quick start doc to contain an example. Let me know how it looks after that change when you get a chance.

docs/quickstart.html Outdated Show resolved Hide resolved
Copy link
Contributor

@rhauch rhauch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks, @kkonstantine!

@kkonstantine kkonstantine force-pushed the KAFKA-13748-remove-filestream-connectors-from-classpath branch from fdbe364 to d1f8a10 Compare March 29, 2022 22:20
@kkonstantine
Copy link
Contributor Author

Thanks @rhauch. Just rebased to get the latest changes and squash the last commit with a line edit.

Copy link
Contributor

@rhauch rhauch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, @kkonstantine. Still LGMT.

Copy link
Member

@mimaison mimaison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@kkonstantine kkonstantine merged commit dd62ef2 into apache:trunk Mar 30, 2022
@kkonstantine kkonstantine deleted the KAFKA-13748-remove-filestream-connectors-from-classpath branch March 30, 2022 20:15
kkonstantine added a commit that referenced this pull request Mar 30, 2022
…PATH and plugin.path by default (#11908)

With this change we stop including the non-production grade connectors that are meant to be used for demos and quick starts by default in the CLASSPATH and plugin.path of Connect deployments. The package of these connector will still be shipped with the Apache Kafka distribution and will be available for explicit inclusion. 

The changes have been tested through the system tests and the existing unit and integration tests. 

Reviewers: Mickael Maison <mickael.maison@gmail.com>, Randall Hauch <rhauch@gmail.com>
kkonstantine added a commit that referenced this pull request Mar 30, 2022
…PATH and plugin.path by default (#11908)

With this change we stop including the non-production grade connectors that are meant to be used for demos and quick starts by default in the CLASSPATH and plugin.path of Connect deployments. The package of these connector will still be shipped with the Apache Kafka distribution and will be available for explicit inclusion.

The changes have been tested through the system tests and the existing unit and integration tests.

Reviewers: Mickael Maison <mickael.maison@gmail.com>, Randall Hauch <rhauch@gmail.com>
kkonstantine added a commit that referenced this pull request Mar 30, 2022
…PATH and plugin.path by default (#11908)

With this change we stop including the non-production grade connectors that are meant to be used for demos and quick starts by default in the CLASSPATH and plugin.path of Connect deployments. The package of these connector will still be shipped with the Apache Kafka distribution and will be available for explicit inclusion.

The changes have been tested through the system tests and the existing unit and integration tests.

Reviewers: Mickael Maison <mickael.maison@gmail.com>, Randall Hauch <rhauch@gmail.com>
@kkonstantine
Copy link
Contributor Author

Thanks both. Merged to trunk and cherry-picked to 3.2, 3.1, 3.0. cc @cadonna @tombentley

jeffkbkim pushed a commit to confluentinc/kafka that referenced this pull request May 12, 2022
…PATH and plugin.path by default (apache#11908)

With this change we stop including the non-production grade connectors that are meant to be used for demos and quick starts by default in the CLASSPATH and plugin.path of Connect deployments. The package of these connector will still be shipped with the Apache Kafka distribution and will be available for explicit inclusion.

The changes have been tested through the system tests and the existing unit and integration tests.

Reviewers: Mickael Maison <mickael.maison@gmail.com>, Randall Hauch <rhauch@gmail.com>
jeffkbkim added a commit to confluentinc/kafka that referenced this pull request May 12, 2022
…cs-11-may-2022

* apache-kafka/3.1: (51 commits)
  MINOR: reload4j build dependency fixes (apache#12144)
  KAFKA-13255: Use config.properties.exclude when mirroring topics (apache#11401)
  KAFKA-13794: Fix comparator of inflightBatchesBySequence in TransactionsManager (round 3) (apache#12096)
  KAFKA-13794: Follow up to fix producer batch comparator (apache#12006)
  fix: make sliding window works without grace period (#kafka-13739) (apache#11980)
  3.1.1 release notes (apache#12001)
  KAFKA-13794; Fix comparator of `inflightBatchesBySequence` in `TransactionManager` (apache#11991)
  KAFKA-13782; Ensure correct partition added to txn after abort on full batch (apache#11995)
  KAFKA-13748: Do not include file stream connectors in Connect's CLASSPATH and plugin.path by default (apache#11908)
  KAFKA-13775: CVE-2020-36518 - Upgrade jackson-databind to 2.12.6.1 (apache#11962)
  ...
jeffkbkim added a commit to confluentinc/kafka that referenced this pull request May 12, 2022
…cs-12-may-2022

* apache-kafka/3.0: (14 commits)
  fix: make sliding window works without grace period (#kafka-13739) (apache#11980)
  KAFKA-13794: Follow up to fix producer batch comparator (apache#12006)
  KAFKA-13794; Fix comparator of `inflightBatchesBySequence` in `TransactionManager` (apache#11991)
  KAFKA-13748: Do not include file stream connectors in Connect's CLASSPATH and plugin.path by default (apache#11908)
  KAFKA-13418: Support key updates with TLS 1.3 (apache#11966)
  KAFKA-13770: Restore compatibility with KafkaBasedLog using older Kafka brokers (apache#11946)
  KAFKA-13761: KafkaLog4jAppender deadlocks when idempotence is enabled (apache#11939)
  KAFKA-13759: Disable idempotence by default in producers instantiated by Connect (apache#11933)
  MINOR: Fix `ConsumerConfig.ISOLATION_LEVEL_DOC` (apache#11915)
  KAFKA-13750; Client Compatability KafkaTest uses invalid idempotency configs (apache#11909)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants