-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove the unwanted dependencies in the pulsar function's instance jar and make SchemaInfo an interface #10878
Remove the unwanted dependencies in the pulsar function's instance jar and make SchemaInfo an interface #10878
Conversation
@jerrypeng FYI. the CI of license check failed as following output:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a braking change for the new feature.
Please do not change this behaviour now that we are close to 2.8.0 release.
I also believe that the NAR files should not contain the Pulsar client classes, because this would be to unpredictable behaviour.
Like in a Java EE container, the Container is the owner of the implementation of the JavaEE runtime, and the Web Applications cannot override system libraries
log.info("value schema type {}", kvSchema.getValueSchema()); | ||
log.info("key encoding {}", kvSchema.getKeyValueEncodingType()); | ||
// TODO need to expose KeyValueSchema was an interface in pulsar-client-api | ||
// KeyValueSchema kvSchema = (KeyValueSchema) record.getSchema(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a breaking change for the new feature.
Please do not change this behaviour now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not a breaking change because KeyValueSchema
is not part of public API. Only symbols that are in pulsar-client-api
are. I think the test here should use Schema<KeyValue>
instead.
The NAR does not contain the client class. The function runtime does. The "instance" jar is not supposed to have the client implementation because it resides in the same class path of the function. Apart from the naming |
@merlimat the fact that the Integration test was broken is the real problem to me. Do you suggest to bundle the pulsar-client with all of the dependencies in the NAR ? |
In the Sink you won't have access to the pulsar-client implementation, only the API. The implementation is loaded in a different class loader.
I'm not sure where I would have suggested that.. :) The pulsar-client is included in the worker-runtime. The NAR (or function jar will not have it (or they can add it on their own if they want, and it wouldn't conflict). The problem here is that the pulsar-client was being included in the java-instance.jar which is what gets added to the function/source/sink class loader. |
To reiterate: that class is not part of the API. |
@merlimat the KeyValueSchema class is an important case. something like It was something that I wanted to bring up before, but I did not chime in about the lack of such API I can volunteer to add such class and retrofit the integration tests |
That's fine (although I don't understand exactly why Having said that, I think it's a completely different discussion from this PR. The expectation for functions and connectors is to have access to |
The problem is that you do not have access to "getKeySchema" and "getValueSchema" and so in a Sink you cannot know the Schema for Key and for the Value |
@jerrypeng this is the patch Once we make KeyValueSchema a public API interface I am totally +1 with this change |
Is it better to add such methods in the Schema interface directly:
So that we don't need to convert to KeyValueSchema interface and then to get the key schema and the value schema |
@codelipenghui we can follow up on #10888 You can already use getSchemaType() in order to check if this is a KeyValueSchema Adding those methods to the Schema API may not be a good idea for the API (thinking to the future), because we would add some new methods to the top level API but those methods will be useful only to the KeyValueSchema users. Also it may encourage developers to think that we can nest KeyValueSchema, but the KeyValueSchema is not strictly a simple KeyValue Pair it is something that is strictly related to message encoding. |
@eolivelli makes sense to me. |
@jerrypeng we merged #10888 thank you for taking care of this topic! |
@eolivelli that's because the pulsar-impl willl be using its own dependencies, which can conflict with the function code |
@merlimat that was I am afraid of. |
if this stuff worked on 2.7.2....and now it does not work anymore.. what did we change in 2.8 release line ? |
pulsar-io/kafka/src/main/java/org/apache/pulsar/io/kafka/AvroSchemaCache.java
Show resolved
Hide resolved
@jerrypeng @merlimat here you can find a good way to see the logs of the failing integration tests |
@jerrypeng @merlimat @eolivelli
Logs of the Function worker
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even if I understand the intent of this work and I am supportive of it, I would like to wear a different hat for a moment, thinking about the imminent 2.8.0 release.
This patch is becoming bigger and bigger.
I am not sure that is it safe to modify so many things now that we are doing the release.
I am sure that many people (myself for sure) have been testing their Functions and Pulsar IO connectors for weeks or months against current 2.8.0 codebase.
If we change so many things now it will be probably painful and we can destabilise the system.
I know that if we keep the current status we cannot change it in some 2.8.1 release, but this clean up will be deferred to 2.9.0.
|
… key value schema.
@eolivelli I think we are on the right way to fix the function regression. with my last commit, I think the tests should be passed. |
…r and make SchemaInfo an interface (#10878) ### Motivation The java-instance.jar generated by the pulsar-functions-runtime-all module should only contain interfaces that Pulsar Function's framework uses to interact with user code. The module should on have the following dependencies 1. pulsar-io-core 2. pulsar-functions-api 3. pulsar-client-api 4. slf4j-api 5. log4j-slf4j-impl 6. log4j-api 7. log4j-core *Explain here the context, and why you're making that change. What is the problem you're trying to solve.* ### Modifications Change dep pulsar-client-original to pulsar-client-api Slight changes in the top level pom for what is included in all sub-modules so that additional deps don't land into java-instance.jar There is also a fix for an issue introduced by #9673. The thread context class loader was set incorrectly in ThreadRuntime. ### Future improvements 1. We should also add a test in the future to make sure external libraries don't get add accidentally this module and java-instance.jar 2. Rename the module "pulsar-functions-runtime-all" to something that describes its function better. The current name can be confusing (cherry picked from commit d81b5f8)
Sorry for late reply. Lgtm |
Nice finally got this merged! Good work everyone! |
There's one more follow up in #10918 . The dependencies in the module used for integration tests had invalid dependencies. It was changed by this PR, but there was some need to revisit that. Please check if it's needed for branch-2.8 . |
…r and make SchemaInfo an interface (apache#10878) ### Motivation The java-instance.jar generated by the pulsar-functions-runtime-all module should only contain interfaces that Pulsar Function's framework uses to interact with user code. The module should on have the following dependencies 1. pulsar-io-core 2. pulsar-functions-api 3. pulsar-client-api 4. slf4j-api 5. log4j-slf4j-impl 6. log4j-api 7. log4j-core *Explain here the context, and why you're making that change. What is the problem you're trying to solve.* ### Modifications Change dep pulsar-client-original to pulsar-client-api Slight changes in the top level pom for what is included in all sub-modules so that additional deps don't land into java-instance.jar There is also a fix for an issue introduced by apache#9673. The thread context class loader was set incorrectly in ThreadRuntime. ### Future improvements 1. We should also add a test in the future to make sure external libraries don't get add accidentally this module and java-instance.jar 2. Rename the module "pulsar-functions-runtime-all" to something that describes its function better. The current name can be confusing (cherry picked from commit d81b5f8) (cherry picked from commit 89ac98e)
- breaks since apache/pulsar#10878 changes
…r and make SchemaInfo an interface (apache#10878) ### Motivation The java-instance.jar generated by the pulsar-functions-runtime-all module should only contain interfaces that Pulsar Function's framework uses to interact with user code. The module should on have the following dependencies 1. pulsar-io-core 2. pulsar-functions-api 3. pulsar-client-api 4. slf4j-api 5. log4j-slf4j-impl 6. log4j-api 7. log4j-core *Explain here the context, and why you're making that change. What is the problem you're trying to solve.* ### Modifications Change dep pulsar-client-original to pulsar-client-api Slight changes in the top level pom for what is included in all sub-modules so that additional deps don't land into java-instance.jar There is also a fix for an issue introduced by apache#9673. The thread context class loader was set incorrectly in ThreadRuntime. ### Future improvements 1. We should also add a test in the future to make sure external libraries don't get add accidentally this module and java-instance.jar 2. Rename the module "pulsar-functions-runtime-all" to something that describes its function better. The current name can be confusing
…r and make SchemaInfo an interface (apache#10878) ### Motivation The java-instance.jar generated by the pulsar-functions-runtime-all module should only contain interfaces that Pulsar Function's framework uses to interact with user code. The module should on have the following dependencies 1. pulsar-io-core 2. pulsar-functions-api 3. pulsar-client-api 4. slf4j-api 5. log4j-slf4j-impl 6. log4j-api 7. log4j-core *Explain here the context, and why you're making that change. What is the problem you're trying to solve.* ### Modifications Change dep pulsar-client-original to pulsar-client-api Slight changes in the top level pom for what is included in all sub-modules so that additional deps don't land into java-instance.jar There is also a fix for an issue introduced by apache#9673. The thread context class loader was set incorrectly in ThreadRuntime. ### Future improvements 1. We should also add a test in the future to make sure external libraries don't get add accidentally this module and java-instance.jar 2. Rename the module "pulsar-functions-runtime-all" to something that describes its function better. The current name can be confusing
Motivation
The java-instance.jar generated by the pulsar-functions-runtime-all module should only contain interfaces that Pulsar Function's framework uses to interact with user code. The module should on have the following dependencies
1. pulsar-io-core
2. pulsar-functions-api
3. pulsar-client-api
4. slf4j-api
5. log4j-slf4j-impl
6. log4j-api
7. log4j-core
Explain here the context, and why you're making that change. What is the problem you're trying to solve.
Modifications
Change dep pulsar-client-original to pulsar-client-api
Slight changes in the top level pom for what is included in all sub-modules so that additional deps don't land into java-instance.jar
There is also a fix for an issue introduced by #9673. The thread context class loader was set incorrectly in ThreadRuntime.
Future improvements
We should also add a test in the future to make sure external libraries don't get add accidentally this module and java-instance.jar
Rename the module "pulsar-functions-runtime-all" to something that describes its function better. The current name can be confusing