Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: Introduce Kamelet input/output data types #1162

Merged
merged 28 commits into from
Dec 2, 2022

Conversation

christophd
Copy link
Contributor

  • Introduce data type converters
  • Add data type processor to auto convert exchange message from/to given data type
  • Let user choose which data type to use (via Kamelet property)
  • Add data type registry and annotation based loader to find data type implementations by component scheme and name

Relates to CAMEL-18698 and apache/camel-k#1980

This PR introduces input/output data types on Kamelets. Each Kamelet is able to use a specific data type processor and a registry to resolve data types and its conversion logic.

apiVersion: camel.apache.org/v1alpha1
kind: Kamelet
metadata:
  name: aws-s3-source
  labels:
    camel.apache.org/kamelet.type: "source"
spec:
  template:
    beans:
    - name: dataTypeRegistry
       type: "#class:org.apache.camel.kamelets.utils.format.DefaultDataTypeRegistry"
    - name: dataTypeProcessor
      type: "#class:org.apache.camel.kamelets.utils.format.DataTypeProcessor"
      property:
        - key: format
          value: '{{outputFormat}}'
[...]
      steps:
      - process:
          ref: "{{dataTypeProcessor}}"
      - to: "kamelet:sink"

The user chooses the data type in the Kamelet binding by its format name and sets the property outputFormat on the binding source.

apiVersion: camel.apache.org/v1alpha1
kind: KameletBinding
metadata:
  name: aws-s3-uri-binding
spec:
  source:
    ref:
      kind: Kamelet
      apiVersion: camel.apache.org/v1alpha1
      name: aws-s3-source
    properties:
      bucketNameOrArn: myBucket
      accessKey: ...
      secretKey: ...
      region: 
      outputFormat: json
  sink:
    uri: log:info

The json data type is provided by the aws-s3 component and adds automatic message body conversion logic. The data type and its conversion logic is defined via annotations and a registry automatically performs a lookup based on the component scheme and the data format name.

As an example the PR introduces some standard data type converters as well as output data types for the aws-s3-source and input data types for the aws-ddb-sink Kamelet.

Copy link
Contributor

@lburgazzoli lburgazzoli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some initial comments

}

String type = exchange.getProperty(JSON_DATA_TYPE_KEY, String.class);
try (JacksonDataFormat dataFormat = new JacksonDataFormat(new ObjectMapper(), Class.forName(type))) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Creating an ObjectMapper per invocation seems quite expensive.

Ideally an object mapper from the registry should be used if present (i.e. you may want to tweak how the object mapper works), if not a local but cached one can be used.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Yes at first we can create a shared object mapper on the class level, not per invoked exchange.

  2. You should use Camels' ClassResolver API to load classes, not Class.forName

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -107,17 +113,24 @@ spec:
- "camel:aws2-ddb"
- "camel:kamelet"
template:
beans:
- name: dataTypeRegistry
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if this should be there as there will be an instance of the registry for each kamelet instantiation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. How can I make sure to add this to the Camel registry as a singleton service?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it must be done outside kamelets but eventually you can explore about not needing a registry at all ad lazy loading converters.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can add this to the Camel registry with quarkus.camel.service.registry.include-patterns property set on the camel-quarkus extension. But this needs to be done in Camel L when generating the Maven project that builds and runs the integration. I will add a new issue on Camel K for that. Once we have this we can remove the local beans from the Kamelets.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -107,13 +107,28 @@ spec:
description: The number of milliseconds before the next poll of the selected bucket.
type: integer
default: 500
outputFormat:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the list of possible types must be expressed through an enum

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why limiting this to a predefined enum of formats? The user may provide a custom format, too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the problem is: how the user/tooling know what converters are available and what are supported ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its in the Kamelet specification. The default supported output/input types are specified and described in the Kamelet spec. Like this:

apiVersion: camel.apache.org/v1alpha1
kind: Kamelet
metadata:
  name: aws-s3-source
  labels:
    camel.apache.org/kamelet.type: "source"
spec:
  definition:
    title: "AWS S3 Source"
    type: object
    properties:
      bucketNameOrArn:
        title: Bucket Name
        description: The S3 Bucket name or ARN
        type: string
      ...
  output:
    default: binary
    types:
      binary:
        mediaType: application/octet-stream
      json:
        mediaType: application/json
        dependencies:
          - "camel:jackson"
        schema:
          type: object
          properties:
            key:
              title: S3 key
              description: The S3 key identifier
              type: string
            fileContent:
              title: File content
              description: The file content as String
              type: string

This is the place where each data type is able to specify a schema and additional dependencies, too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes but if we expect the user to provide custom converter, then the only way to validate it is to wait for runtime failures. I'm thinking about how tooling would provide support/validation for that.

@@ -97,6 +97,12 @@ spec:
x-descriptors:
- 'urn:alm:descriptor:com.tectonic.ui:checkbox'
default: false
inputFormat:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the list of possible types must be expressed through an enum

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as for output formats. I think the user may provide a custom format, too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, but at least as description or as information, we need a list of default types.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@oscerd valid concern and I think we can add a list of default types. this is to be done in a separate PR though once Camel K Kamelet CRDs are updated with additional data type fields.

*
* The registry is able to retrieve converters for a given data type based on the component scheme and the given data type name.
*/
public class DefaultDataTypeRegistry extends ServiceSupport implements DataTypeRegistry, CamelContextAware {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we need a registry or we could lazy load the converter when the processor is created and initialized using the standard camel factory finder mechanism which would work out of the box also on quarkus

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool! I will take a look on the factory finder mechanism. The idea with the registry was to have this ready for a broader scope (not only for Kamelets but in Camel core). For the Kamelet use case in particular this might be a heavyweight solution, agreed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added a lazy loading of component converters via resource path lookup using the factory finder mechanism

Copy link
Contributor Author

@christophd christophd Nov 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to enable the factory finder in camel-quarkus for this with quarkus.camel.service.discovery.include-patterns build time property. This needs to be done in Camel K so I will add a new issue to track this one in Camel K.

Once we have this we can disable classpath scan here in Kamelets and use lazy loading via factory finder.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another option would be to create a quarkus extension which would register the service and i.e. also the factory finder without the need to have properties

Copy link
Contributor

@oscerd oscerd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just have some worries about leaving the types without a default types list.

@@ -97,6 +97,12 @@ spec:
x-descriptors:
- 'urn:alm:descriptor:com.tectonic.ui:checkbox'
default: false
inputFormat:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, but at least as description or as information, we need a list of default types.

@christophd christophd force-pushed the issue/ENTESB-19587/data-types branch 8 times, most recently from 69e1581 to 703c258 Compare November 18, 2022 08:31
/**
* Data type loader scans packages for {@link DataTypeConverter} classes annotated with {@link DataType} annotation.
*/
public class AnnotationDataTypeLoader implements DataTypeLoader, CamelContextAware {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder, if this annotation and loader is going to work when running in native mode ?

Copy link
Contributor Author

@christophd christophd Nov 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point. not sure either about this. we will switch to factory finder mechanism once this is enabled for the DataTypeConverter in camel-quarkus and this mechanism will work in native mode, too.

@christophd
Copy link
Contributor Author

christophd commented Nov 22, 2022

@oscerd I was adding some more commits while iterating and improving the PR. Do you want me to squash the changes to some essential commits first?

@christophd
Copy link
Contributor Author

also I see I need to rebase

@oscerd
Copy link
Contributor

oscerd commented Nov 22, 2022

That's fine, no need for squash at least from my PoV

@oscerd
Copy link
Contributor

oscerd commented Nov 22, 2022

Let me know once is ready and I'll review and merge. We need also to identify the default type for each of Kamelets we're going to modify.

@oscerd oscerd added this to the 0.10.0 milestone Nov 22, 2022
@oscerd
Copy link
Contributor

oscerd commented Nov 23, 2022

Camel-k-runtime seems to be ok while upgrading to the required bits for Camel 3.19.0 and Camel Quarkus 2.14.0. I would like to include this PR in 0.10.0 and release by the end of this week or beginning of the next. So we could have a first camel k release (1.11.0) upstream with this feature.

@oscerd
Copy link
Contributor

oscerd commented Nov 24, 2022

Can you rebase @christophd ? Thanks.

@christophd
Copy link
Contributor Author

@oscerd rebase is done. Tomorrow I can have a look at the YAKS tests if you have some more time with the release.

@oscerd
Copy link
Contributor

oscerd commented Nov 24, 2022

No rush. When you think the first iteration is done, I'll review and merge

@davsclaus
Copy link
Contributor

Please make sure that this works in standalone Camel as well, eg such as try via camel-jbang.
Kamelets are universal building blocks and MUST work on camel-spring-boot, camel-kafka-connector, camel-quarkus (without camel-k), and of course camel-k as well.

Previously Kamelets have been kept more basic where the kamelet-util JAR was what the name said - utility. With this kind of work, then its no longer util, but a core and api part of kamelets, which IMHO later should be moved into their own modules.

@oscerd
Copy link
Contributor

oscerd commented Nov 24, 2022

We could create a kamelet-api module. I'll try the PR on jbang too.

@christophd
Copy link
Contributor Author

@davsclaus absolutely! IMO this should go to Camel core in the long term. That's why I was adding all those spi and api parts as well as the registry with factory finder lookup as part of this solution.

Setting the data content type breaks the Camel Knative producer
- Avoid having the additional dependency in favor of using plain String constants
Also use Camel ClassResolver API to resolve model class
- Align with CloudEvents spec in creating proper event type and source values
- Enable Knative YAKS tests
- Remove JacksonDataFormat in favor of using simple ObjectMapper instance
- Reuse ObjectMapper instance for all exchanges processed by the data type
- AWS S3 source Kamelet
- AWS DDB sink Kamelet
- JsonToDdbModelConverter utility and unit tests
@oscerd oscerd marked this pull request as ready for review December 1, 2022 12:16
@christophd
Copy link
Contributor Author

@oscerd @lburgazzoli @davsclaus look at this all checks have passed 😄

So what we have right now is following:

  • reverted AWS S3 source and AWS DDB sink Kamelet to not use data types and they work as it has been before
  • introduced experimental Kamelets (in experimental top level folder) that make use of data types utils
  • add some YAKS test coverage for experimental Kamelets that are also part of the GitHub actions CI jobs

Only question is if we should include the experimental Kamelets into the catalog right now or leave them in experimental folder for now. I am happy with both ways but I guess it will be easier for people to give it a try when they are part of the catalog.

@oscerd
Copy link
Contributor

oscerd commented Dec 1, 2022

Moving them to the official catalog seems to be better maybe something like aws-s3-source-exp.kamelet.yaml instead of .exp.kamelet.yaml. So we could give them to the users.

@oscerd
Copy link
Contributor

oscerd commented Dec 1, 2022

and have early feedback

@oscerd
Copy link
Contributor

oscerd commented Dec 1, 2022

So I would prefer to have them in the catalog from the 0.10.0 release.

@christophd christophd force-pushed the issue/ENTESB-19587/data-types branch 2 times, most recently from bd4b94a to a3cf8d6 Compare December 1, 2022 16:19
@christophd
Copy link
Contributor Author

@oscerd the experimental Kamelets are now included in the catalog. Because of validation scripts I ended up with following naming:

  • aws-s3-experimental-source
  • aws-ddb-experimental-sink

All checks are green! From my side this is good to be merged

@davsclaus
Copy link
Contributor

LGTM

Thanks for the effort @christophd and sorry for the bike-shedding but thats the nature of humans, github and big PRs ;)

@oscerd oscerd merged commit b1dec7f into apache:main Dec 2, 2022
@oscerd
Copy link
Contributor

oscerd commented Dec 2, 2022

Thanks. Nice addition!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants