chore: Introduce Kamelet input/output data types #1162

christophd · 2022-11-16T22:02:50Z

Introduce data type converters
Add data type processor to auto convert exchange message from/to given data type
Let user choose which data type to use (via Kamelet property)
Add data type registry and annotation based loader to find data type implementations by component scheme and name

Relates to CAMEL-18698 and apache/camel-k#1980

This PR introduces input/output data types on Kamelets. Each Kamelet is able to use a specific data type processor and a registry to resolve data types and their conversion logic.

apiVersion: camel.apache.org/v1alpha1
kind: Kamelet
metadata:
  name: aws-s3-source
  labels:
    camel.apache.org/kamelet.type: "source"
spec:
  template:
    beans:
    - name: dataTypeRegistry
      type: "#class:org.apache.camel.kamelets.utils.format.DefaultDataTypeRegistry"
    - name: dataTypeProcessor
      type: "#class:org.apache.camel.kamelets.utils.format.DataTypeProcessor"
      property:
        - key: format
          value: '{{outputFormat}}'
[...]
      steps:
      - process:
          ref: "{{dataTypeProcessor}}"
      - to: "kamelet:sink"

The user chooses the data type in the Kamelet binding by its format name and sets the property outputFormat on the binding source.

apiVersion: camel.apache.org/v1alpha1
kind: KameletBinding
metadata:
  name: aws-s3-uri-binding
spec:
  source:
    ref:
      kind: Kamelet
      apiVersion: camel.apache.org/v1alpha1
      name: aws-s3-source
    properties:
      bucketNameOrArn: myBucket
      accessKey: ...
      secretKey: ...
      region: ...
      outputFormat: json
  sink:
    uri: log:info

The json data type is provided by the aws-s3 component and adds automatic message body conversion logic. The data type and its conversion logic are defined via annotations, and a registry automatically performs a lookup based on the component scheme and the data format name.
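
To make the lookup concrete, here is a minimal sketch of a registry keyed by component scheme and format name. `SketchDataTypeRegistry`, its methods, and the generic `camel` fallback scheme are illustrative assumptions, not the classes added by this PR; converters are reduced to plain String functions.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.UnaryOperator;

// Hypothetical, simplified registry; NOT the PR's DefaultDataTypeRegistry.
class SketchDataTypeRegistry {
    // Assumed name of a generic scheme that holds component-independent converters.
    static final String GENERIC_SCHEME = "camel";

    // Converters keyed by "<scheme>:<name>", e.g. "aws2-s3:json".
    private final Map<String, UnaryOperator<String>> converters = new ConcurrentHashMap<>();

    void register(String scheme, String name, UnaryOperator<String> converter) {
        converters.put(scheme + ":" + name, converter);
    }

    // Prefer a component-specific converter, then fall back to the generic scheme.
    Optional<UnaryOperator<String>> lookup(String scheme, String name) {
        UnaryOperator<String> converter = converters.get(scheme + ":" + name);
        if (converter == null) {
            converter = converters.get(GENERIC_SCHEME + ":" + name);
        }
        return Optional.ofNullable(converter);
    }
}

class SketchRegistryDemo {
    public static void main(String[] args) {
        SketchDataTypeRegistry registry = new SketchDataTypeRegistry();
        registry.register("aws2-s3", "json", body -> "{\"fileContent\":\"" + body + "\"}");
        registry.register(SketchDataTypeRegistry.GENERIC_SCHEME, "string", body -> body);

        // Component-specific lookup.
        System.out.println(registry.lookup("aws2-s3", "json").map(c -> c.apply("hello")).orElse("none"));
        // Falls back to the generic converter because aws2-s3 registers no "string" type here.
        System.out.println(registry.lookup("aws2-s3", "string").map(c -> c.apply("hello")).orElse("none"));
    }
}
```

An unknown format name yields an empty `Optional`, which leaves it to the caller to decide whether to fail or to pass the message through unchanged.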

As an example, the PR introduces some standard data type converters as well as output data types for the aws-s3-source Kamelet and input data types for the aws-ddb-sink Kamelet.

lburgazzoli

Added some initial comments

...main/java/org/apache/camel/kamelets/utils/format/converter/aws2/s3/AWS2S3JsonOutputType.java

lburgazzoli · 2022-11-16T22:31:14Z

...c/main/java/org/apache/camel/kamelets/utils/format/converter/standard/JsonModelDataType.java

+        }
+
+        String type = exchange.getProperty(JSON_DATA_TYPE_KEY, String.class);
+        try (JacksonDataFormat dataFormat = new JacksonDataFormat(new ObjectMapper(), Class.forName(type))) {


Creating an ObjectMapper per invocation seems quite expensive.

Ideally, an object mapper from the registry should be used if present (e.g. you may want to tweak how the object mapper works); if not, a local but cached one can be used.

Yes, at first we can create a shared object mapper at the class level rather than one per exchange.

You should use Camel's ClassResolver API to load classes, not Class.forName.
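
The two suggestions in this thread (share one mapper, resolve classes through a pluggable resolver) can be sketched with JDK-only stand-ins. `SketchMapper` stands in for an expensive object such as Jackson's ObjectMapper and `SketchClassResolver` for the role of Camel's ClassResolver; both are hypothetical types written for this illustration, not the real APIs.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for an object that is costly to build, such as a JSON mapper.
class SketchMapper {
    static final AtomicInteger CREATED = new AtomicInteger();

    SketchMapper() {
        CREATED.incrementAndGet();
    }

    String describe(Object value) {
        return value.getClass().getSimpleName() + "(" + value + ")";
    }
}

// Hypothetical stand-in for a pluggable class resolver.
interface SketchClassResolver {
    // Returns null when the class cannot be found.
    Class<?> resolveClass(String name);

    // Resolver that delegates to the given class loader.
    static SketchClassResolver of(ClassLoader loader) {
        return name -> {
            try {
                return loader.loadClass(name);
            } catch (ClassNotFoundException e) {
                return null;
            }
        };
    }
}

class SketchJsonModelDataType {
    // One mapper for the whole class, reused for every converted message.
    private static final SketchMapper MAPPER = new SketchMapper();

    private final SketchClassResolver resolver;

    SketchJsonModelDataType(SketchClassResolver resolver) {
        this.resolver = resolver;
    }

    String convert(String modelType, Object body) {
        Class<?> type = resolver.resolveClass(modelType);
        if (type == null) {
            throw new IllegalArgumentException("Unknown model type: " + modelType);
        }
        // cast() fails fast with ClassCastException if the body does not match the model type.
        return MAPPER.describe(type.cast(body));
    }
}

class SketchSharedMapperDemo {
    public static void main(String[] args) {
        SketchJsonModelDataType dataType =
                new SketchJsonModelDataType(SketchClassResolver.of(ClassLoader.getSystemClassLoader()));
        System.out.println(dataType.convert("java.lang.String", "a"));
        System.out.println(dataType.convert("java.lang.String", "b"));
        // The mapper was constructed once, no matter how many messages were converted.
        System.out.println("mappers created: " + SketchMapper.CREATED.get());
    }
}
```

Passing the resolver in through the constructor keeps class loading under the control of the runtime that hosts the converter, which is the point of preferring a resolver over a direct `Class.forName` call.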

lburgazzoli · 2022-11-16T22:42:47Z

kamelets/aws-ddb-sink.kamelet.yaml

@@ -107,17 +113,24 @@ spec:
  - "camel:aws2-ddb"
  - "camel:kamelet"
  template:
+    beans:
+    - name: dataTypeRegistry


Not sure if this should be there, as there will be an instance of the registry for each Kamelet instantiation.

Good point. How can I make sure to add this to the Camel registry as a singleton service?

It must be done outside Kamelets, but eventually you could explore not needing a registry at all and lazy loading converters.

We can add this to the Camel registry with the quarkus.camel.service.registry.include-patterns property set on the camel-quarkus extension. But this needs to be done in Camel K when generating the Maven project that builds and runs the integration. I will add a new issue on Camel K for that. Once we have this, we can remove the local beans from the Kamelets.

see apache/camel-k#3845

lburgazzoli · 2022-11-16T22:43:34Z

kamelets/aws-s3-source.kamelet.yaml

@@ -107,13 +107,28 @@ spec:
        description: The number of milliseconds before the next poll of the selected bucket.
        type: integer
        default: 500
+      outputFormat:


The list of possible types must be expressed through an enum.

Why limit this to a predefined enum of formats? The user may provide a custom format, too.

The problem is: how do users and tooling know which converters are available and which are supported?

It's in the Kamelet specification. The default supported output/input types are specified and described in the Kamelet spec. Like this:

apiVersion: camel.apache.org/v1alpha1
kind: Kamelet
metadata:
  name: aws-s3-source
  labels:
    camel.apache.org/kamelet.type: "source"
spec:
  definition:
    title: "AWS S3 Source"
    type: object
    properties:
      bucketNameOrArn:
        title: Bucket Name
        description: The S3 Bucket name or ARN
        type: string
      ...
  output:
    default: binary
    types:
      binary:
        mediaType: application/octet-stream
      json:
        mediaType: application/json
        dependencies:
        - "camel:jackson"
        schema:
          type: object
          properties:
            key:
              title: S3 key
              description: The S3 key identifier
              type: string
            fileContent:
              title: File content
              description: The file content as String
              type: string

This is the place where each data type is able to specify a schema and additional dependencies, too.

Yes, but if we expect the user to provide a custom converter, then the only way to validate it is to wait for runtime failures. I'm thinking about how tooling would provide support/validation for that.
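
One way to picture the trade-off in this thread is a tooling-side check that treats the types declared in the Kamelet spec as known and either tolerates or rejects anything else. The following is only a sketch of that idea; `SketchFormatValidator` and its `allowCustom` switch are hypothetical and not part of this PR or of any existing tooling.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical tooling helper; not part of the PR.
class SketchFormatValidator {
    enum Result { DECLARED, CUSTOM, REJECTED }

    private final Set<String> declaredTypes;
    private final boolean allowCustom;

    SketchFormatValidator(List<String> declaredTypes, boolean allowCustom) {
        this.declaredTypes = new LinkedHashSet<>(declaredTypes);
        this.allowCustom = allowCustom;
    }

    // Declared types validate statically; custom ones can only be verified at runtime.
    Result check(String format) {
        if (declaredTypes.contains(format)) {
            return Result.DECLARED;
        }
        return allowCustom ? Result.CUSTOM : Result.REJECTED;
    }
}

class SketchFormatValidatorDemo {
    public static void main(String[] args) {
        // Types taken from the output section of the spec shown in this thread.
        List<String> declared = Arrays.asList("binary", "json");

        SketchFormatValidator enumLike = new SketchFormatValidator(declared, false);
        SketchFormatValidator open = new SketchFormatValidator(declared, true);

        System.out.println(enumLike.check("json"));
        System.out.println(enumLike.check("avro"));
        System.out.println(open.check("avro"));
    }
}
```

With `allowCustom` set to false the check behaves like the enum proposed in the review; with it set to true, a custom format is accepted but flagged, so tooling could warn that it will only be validated at runtime.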

lburgazzoli · 2022-11-16T22:44:02Z

kamelets/aws-ddb-sink.kamelet.yaml

@@ -97,6 +97,12 @@ spec:
        x-descriptors:
          - 'urn:alm:descriptor:com.tectonic.ui:checkbox'
        default: false
+      inputFormat:


The list of possible types must be expressed through an enum.

Same as for output formats. I think the user may provide a custom format, too.

Yes, but at least as a description or as information, we need a list of default types.

@oscerd Valid concern, and I think we can add a list of default types. This should be done in a separate PR, though, once the Camel K Kamelet CRDs are updated with additional data type fields.

...l-kamelets-utils/src/main/java/org/apache/camel/kamelets/utils/format/DataTypeProcessor.java

lburgazzoli · 2022-11-16T23:53:40Z

...lets-utils/src/main/java/org/apache/camel/kamelets/utils/format/DefaultDataTypeRegistry.java

+ *
+ * The registry is able to retrieve converters for a given data type based on the component scheme and the given data type name.
+ */
+public class DefaultDataTypeRegistry extends ServiceSupport implements DataTypeRegistry, CamelContextAware {


I wonder if we need a registry, or whether we could lazy load the converter when the processor is created and initialized, using the standard Camel factory finder mechanism, which would also work out of the box on Quarkus.

Cool! I will take a look at the factory finder mechanism. The idea with the registry was to have this ready for a broader scope (not only for Kamelets but in Camel core). For the Kamelet use case in particular this might be a heavyweight solution, agreed.

I have added lazy loading of component converters via resource path lookup using the factory finder mechanism.

We need to enable the factory finder in camel-quarkus for this with the quarkus.camel.service.discovery.include-patterns build-time property. This needs to be done in Camel K, so I will add a new issue to track this one in Camel K.

Once we have this we can disable classpath scan here in Kamelets and use lazy loading via factory finder.

see apache/camel-k#3844

Another option would be to create a Quarkus extension that registers the service and also the factory finder, without the need for properties.
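
The lazy-loading idea discussed in this thread can be sketched as a lookup that resolves a converter on first use and caches the result, instead of scanning the classpath up front. In this sketch the `finder` function stands in for a factory-finder style resource lookup; `SketchLazyConverters` and the `<scheme>-<name>` key format are assumptions made for illustration, not the PR's implementation.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.function.UnaryOperator;

// Hypothetical lazy converter lookup; NOT the PR's implementation.
class SketchLazyConverters {
    // Stand-in for a resource-path based lookup; returns null when nothing is found.
    private final Function<String, UnaryOperator<String>> finder;
    private final Map<String, UnaryOperator<String>> cache = new ConcurrentHashMap<>();

    SketchLazyConverters(Function<String, UnaryOperator<String>> finder) {
        this.finder = finder;
    }

    // Resolve a converter on first use and cache it; nothing is scanned up front.
    Optional<UnaryOperator<String>> get(String scheme, String name) {
        // computeIfAbsent records no mapping when the finder returns null, so misses are retried later.
        return Optional.ofNullable(cache.computeIfAbsent(scheme + "-" + name, finder));
    }
}

class SketchLazyConvertersDemo {
    public static void main(String[] args) {
        AtomicInteger lookups = new AtomicInteger();
        SketchLazyConverters converters = new SketchLazyConverters(key -> {
            lookups.incrementAndGet();
            if (key.equals("aws2-s3-json")) {
                return body -> "json:" + body;
            }
            return null;
        });

        converters.get("aws2-s3", "json");
        converters.get("aws2-s3", "json");
        // The second call is served from the cache, so the finder ran only once.
        System.out.println("finder invocations: " + lookups.get());
    }
}
```

Because nothing is discovered until a format is actually requested, this style avoids classpath scanning at startup, which is the property that matters for the Quarkus and native-mode concerns raised in the review.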

oscerd

I just have some worries about leaving the types without a default types list.

oscerd · 2022-11-17T12:54:31Z

kamelets/aws-ddb-sink.kamelet.yaml

@@ -97,6 +97,12 @@ spec:
        x-descriptors:
          - 'urn:alm:descriptor:com.tectonic.ui:checkbox'
        default: false
+      inputFormat:


Yes, but at least as a description or as information, we need a list of default types.

claudio4j · 2022-11-18T14:50:07Z

...ets-utils/src/main/java/org/apache/camel/kamelets/utils/format/AnnotationDataTypeLoader.java

+/**
+ * Data type loader scans packages for {@link DataTypeConverter} classes annotated with {@link DataType} annotation.
+ */
+public class AnnotationDataTypeLoader implements DataTypeLoader, CamelContextAware {


I wonder if this annotation and loader are going to work when running in native mode?

Good point. I'm not sure about this either. We will switch to the factory finder mechanism once it is enabled for the DataTypeConverter in camel-quarkus, and that mechanism will work in native mode, too.

christophd · 2022-11-22T14:57:26Z

@oscerd I was adding some more commits while iterating and improving the PR. Do you want me to squash the changes to some essential commits first?

christophd · 2022-11-22T14:57:57Z

also I see I need to rebase

oscerd · 2022-11-22T14:59:13Z

That's fine, no need for squash at least from my PoV

oscerd · 2022-11-22T15:27:59Z

Let me know once it is ready and I'll review and merge. We also need to identify the default type for each of the Kamelets we're going to modify.

oscerd · 2022-11-23T14:06:06Z

Camel-k-runtime seems to be OK while upgrading to the required bits for Camel 3.19.0 and Camel Quarkus 2.14.0. I would like to include this PR in 0.10.0 and release by the end of this week or the beginning of the next. So we could have a first Camel K release (1.11.0) upstream with this feature.

oscerd · 2022-11-24T09:10:00Z

Can you rebase, @christophd? Thanks.

christophd · 2022-11-24T16:07:38Z

@oscerd rebase is done. Tomorrow I can have a look at the YAKS tests if you have some more time with the release.

oscerd · 2022-11-24T16:08:42Z

No rush. When you think the first iteration is done, I'll review and merge

davsclaus · 2022-11-24T16:45:52Z

Please make sure that this works in standalone Camel as well, e.g. try it via camel-jbang.
Kamelets are universal building blocks and MUST work on camel-spring-boot, camel-kafka-connector, camel-quarkus (without camel-k), and of course camel-k as well.

Previously Kamelets have been kept more basic, where the kamelet-util JAR was what the name said: a utility. With this kind of work, it's no longer a util but a core and API part of Kamelets, which IMHO should later be moved into its own modules.

oscerd · 2022-11-24T16:50:59Z

We could create a kamelet-api module. I'll try the PR on jbang too.

christophd · 2022-11-24T18:53:14Z

@davsclaus absolutely! IMO this should go to Camel core in the long term. That's why I was adding all those spi and api parts as well as the registry with factory finder lookup as part of this solution.

christophd · 2022-12-01T12:59:16Z

@oscerd @lburgazzoli @davsclaus look at this all checks have passed 😄

So what we have right now is the following:

reverted the AWS S3 source and AWS DDB sink Kamelets to not use data types, so they work as before
introduced experimental Kamelets (in an experimental top-level folder) that make use of the data type utils
added some YAKS test coverage for the experimental Kamelets, which is also part of the GitHub Actions CI jobs

The only question is whether we should include the experimental Kamelets in the catalog right now or leave them in the experimental folder for now. I am happy with both ways, but I guess it will be easier for people to give it a try when they are part of the catalog.

oscerd · 2022-12-01T13:01:18Z

Moving them to the official catalog seems better, maybe with something like aws-s3-source-exp.kamelet.yaml instead of .exp.kamelet.yaml, so we could give them to the users.

oscerd · 2022-12-01T13:01:33Z

and have early feedback

oscerd · 2022-12-01T14:02:41Z

So I would prefer to have them in the catalog from the 0.10.0 release.

christophd · 2022-12-01T18:24:59Z

@oscerd the experimental Kamelets are now included in the catalog. Because of the validation scripts I ended up with the following naming:

aws-s3-experimental-source
aws-ddb-experimental-sink

All checks are green! From my side this is good to be merged

davsclaus · 2022-12-01T20:00:52Z

LGTM

Thanks for the effort @christophd and sorry for the bike-shedding, but that's the nature of humans, GitHub and big PRs ;)

oscerd · 2022-12-02T08:08:46Z

Thanks. Nice addition!

lburgazzoli reviewed Nov 16, 2022

christophd force-pushed the issue/ENTESB-19587/data-types branch from a9994f4 to 7dbaaa4 Compare November 17, 2022 08:17

oscerd requested changes Nov 17, 2022

christophd force-pushed the issue/ENTESB-19587/data-types branch 8 times, most recently from 69e1581 to 703c258 Compare November 18, 2022 08:31

claudio4j reviewed Nov 18, 2022

christophd force-pushed the issue/ENTESB-19587/data-types branch from 37cb07b to a441625 Compare November 22, 2022 11:36

oscerd added this to the 0.10.0 milestone Nov 22, 2022

christophd force-pushed the issue/ENTESB-19587/data-types branch from 93f5e80 to 856a8e2 Compare November 24, 2022 15:52

This was referenced Nov 24, 2022

Add data type converter factory finder discovery in camel-quarkus apache/camel-k#3844

Closed

Add DataTypeRegistry as bean in Camel context apache/camel-k#3845

Closed

christophd force-pushed the issue/ENTESB-19587/data-types branch from 856a8e2 to 123f978 Compare November 25, 2022 08:56

christophd added 14 commits December 1, 2022 13:06

Fix cloud event type and do not set data content type

495ddf2

Setting the data content type breaks the Camel Knative producer

Enhance data type AWS S3 YAKS tests

4e28c94

Add option to disable data type registry classpath scan

14cd806

Set proper media types

b67651e

Fix rest-openapi-sink YAKS test

0f2b888

Remove camel-cloudevents dependency

26b6166

- Avoid having the additional dependency in favor of using plain String constants

Move AWS S3 binary output type to generic level

0f99d4b

Do cache ObjectMapper instance in JsonModelDatType converter

4fd0681

Also use Camel ClassResolver API to resolve model class

Enhance documentation on data type SPI

29e2cc9

Improve CloudEvents output produced by AWS S3 source

4cc1de4

- Align with CloudEvents spec in creating proper event type and source values - Enable Knative YAKS tests

Simplify Json model data type

dd0c65e

- Remove JacksonDataFormat in favor of using simple ObjectMapper instance - Reuse ObjectMapper instance for all exchanges processed by the data type

Fix Knative YAKS tests

9dd3251

Revert existing Kamelets to not use data type converter

11a8450

- AWS S3 source Kamelet - AWS DDB sink Kamelet - JsonToDdbModelConverter utility and unit tests

Add experimental Kamelets using data type converter API

c8e3f16

christophd force-pushed the issue/ENTESB-19587/data-types branch from 993229b to c8e3f16 Compare December 1, 2022 12:06

oscerd marked this pull request as ready for review December 1, 2022 12:16

christophd force-pushed the issue/ENTESB-19587/data-types branch 2 times, most recently from bd4b94a to a3cf8d6 Compare December 1, 2022 16:19

Include experimental Kamelets in the catalog

df62f1a

christophd force-pushed the issue/ENTESB-19587/data-types branch from a3cf8d6 to df62f1a Compare December 1, 2022 16:43

oscerd approved these changes Dec 1, 2022

oscerd merged commit b1dec7f into apache:main Dec 2, 2022

lburgazzoli mentioned this pull request Dec 2, 2022

Kamelet CloudEvents set source attribute to the string "source" apache/camel-k#3668

Open
