[Feature] Integrate With OpenSchema #339

Open
qqeasonchen opened this issue May 11, 2021 · 30 comments
Labels: discussion (This issue requires further discussion), feature, GSoC (Google Summer of Code)

Comments

@qqeasonchen
Contributor

qqeasonchen commented May 11, 2021

Option 1: reference OpenMessaging's OpenSchema:
openmessaging/openschema#1
https://github.com/openmessaging/openschema/blob/master/spec.md

@qqeasonchen qqeasonchen added GSoC Google Summer of Code feature labels May 11, 2021
@qqeasonchen qqeasonchen changed the title from "Support OpenSchema" to "Integrate With OpenSchema" May 11, 2021
@yzhao244
Contributor

I am an EventMesh user and I'm very interested in contributing to this project. I will look into an EventMesh Schema Registry implementation that integrates with OpenSchema.
I will follow up and get back to the community for further discussion.

@jinrongluo
Contributor

A preview of the OpenSchema spec has been released here:

https://github.com/openmessaging/openschema/blob/master/spec.md

@qqeasonchen
Contributor Author

> I am a user of eventmesh and I'm extremely interested in contributing to this project. I will go through the project related to EventMesh Schema Registry implementation which integrates with OpenSchema.
> I will follow them and get back to the community for further discussions.

Welcome, and we look forward to your contributions.

@yzhao244
Contributor

yzhao244 commented Jun 28, 2021

What is the question:

I am attempting a design that integrates with OpenSchema and is also easy to extend.

What would you like to be added:
I suggest adding two more modules to the overall EventMesh project:

  1. eventmesh-store-api
    This is an interface module that contains the schema registry persistence APIs, such as the following:

    ```java
    import java.util.List;

    // Persistence API for the schema registry. SchemaRegistry, SchemaRequest and
    // SchemaResponse are types referenced by this proposal but not shown here.
    public interface EventSchemaService extends SchemaRegistry {
        void createSchema(SchemaRequest schemaRequest);

        List<SchemaResponse> readAllSchemas();

        void updateSchema(SchemaRequest schemaRequest, String schemaId);

        void deleteSchema(SchemaRequest schemaRequest, String schemaId);
    }
    ```
  2. eventmesh-store-h2
    This module contains the actual implementation of EventSchemaService, which integrates with OpenSchema. I propose using an H2 database to persist the schema registry in EventMesh.
    However, this is also a pluggable module, so vendors can implement persistence with other techniques, such as the file system or any other data store, as they see fit.
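A minimal sketch of what "pluggable" could mean in practice, assuming a simplified, hypothetical `SchemaStore` interface in which plain strings stand in for the proposal's `SchemaRequest`/`SchemaResponse` types. The in-memory class is illustrative only; an H2, MySQL or file-system plugin would implement the same interface.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified store API: plain strings stand in for the proposal's
// SchemaRequest/SchemaResponse types.
interface SchemaStore {
    String createSchema(String schemaContent);            // returns the new schema ID
    Map<String, String> readAllSchemas();                 // schema ID -> schema content
    void updateSchema(String schemaId, String schemaContent);
    void deleteSchema(String schemaId);
}

// In-memory plugin: nothing survives a restart, so it only suits tests or demos.
class InMemorySchemaStore implements SchemaStore {
    private final Map<String, String> schemas = new LinkedHashMap<>();
    private long nextId = 1;

    @Override
    public synchronized String createSchema(String schemaContent) {
        String id = Long.toString(nextId++);
        schemas.put(id, schemaContent);
        return id;
    }

    @Override
    public synchronized Map<String, String> readAllSchemas() {
        // Return a read-only snapshot so callers cannot modify the store directly.
        return Collections.unmodifiableMap(new LinkedHashMap<>(schemas));
    }

    @Override
    public synchronized void updateSchema(String schemaId, String schemaContent) {
        if (!schemas.containsKey(schemaId)) {
            throw new IllegalArgumentException("Unknown schema ID: " + schemaId);
        }
        schemas.put(schemaId, schemaContent);
    }

    @Override
    public synchronized void deleteSchema(String schemaId) {
        schemas.remove(schemaId);
    }
}
```

Code that depends only on `SchemaStore` could switch to a database-backed implementation without changes, which is the extensibility argument made below.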

Why is this needed:

  1. It ensures the extensibility of the Schema Registry in EventMesh, since vendors may need different technologies, such as an in-memory DB, MySQL, or any other data store, for persisting data.
  2. Furthermore, the store layer can be extended to persist other management information, such as subscriptions and topics. This time we are only doing it for the schema registry. :)

@qqeasonchen qqeasonchen added the discussion This issue requires further discussion label Jun 29, 2021
@qqeasonchen
Contributor Author

@yzhao244 This store should be distinguished from the event store in the connector. What do you think?

@qqeasonchen
Contributor Author

@yzhao244 Would you like to share a new design with us? Thanks.

@yzhao244
Contributor

yzhao244 commented Jul 5, 2021

> @yzhao244 This store here better differ from event-store in connector, what do you think?

Yes, you are right. :) Maybe we could name it something like "eventmesh-database-api". The purpose of this interface module is to introduce an abstraction layer for the registry APIs.

@yzhao244
Contributor

yzhao244 commented Jul 5, 2021

> @yzhao244 Would you like to share some new designs for us? thanks.

What is the purpose of the design:

The purpose of the design is to introduce a Schema Registry as part of EventMesh.
The Schema Registry is a central repository with RESTful interfaces that developers use to define and register standard schemas. It addresses the problem of producers and consumers using different data (event) formats.

What features will the Schema Registry provide:

  1. Persist and share the version history of all schemas (schema lifecycle management) and verify schema compatibility.
  2. Support Avro, JSON, and Protobuf serialization/deserialization.

What is the high-level design to achieve these features:

  1. Define the schema registry data models (subject, schema, version, compatibility) and REST API standards based on the open-source OpenSchema specification.
  2. The eventmesh-database-api module abstracts the CRUD capabilities of the schema registry.
  3. eventmesh-database-h2 contains the actual implementation. I propose using an H2 database and querying it through the JDBC API in EventMesh.
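To make the subject/schema/version data model a little more concrete, here is a small in-memory sketch. The class and method names are hypothetical. Its behavior (a new schema gets the next version number under its subject, while re-registering an identical schema returns the existing version) is an assumption modeled on how schema registries commonly behave, not a quote from the OpenSchema spec.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical in-memory model: each subject owns an ordered list of schema versions.
class SubjectRegistry {
    private final Map<String, List<String>> versionsBySubject = new HashMap<>();

    // Registers a schema under a subject and returns its 1-based version number.
    synchronized int register(String subject, String schema) {
        List<String> versions = versionsBySubject.computeIfAbsent(subject, s -> new ArrayList<>());
        int existing = versions.indexOf(schema);
        if (existing >= 0) {
            return existing + 1; // identical schema already registered: reuse its version
        }
        versions.add(schema);
        return versions.size();
    }

    // Roughly what GET /subjects would return (sorted for stable output).
    synchronized List<String> subjects() {
        return new ArrayList<>(new TreeSet<>(versionsBySubject.keySet()));
    }

    // Roughly what GET /subjects/(subject)/versions would return.
    synchronized List<Integer> versions(String subject) {
        int count = versionsBySubject.getOrDefault(subject, Collections.emptyList()).size();
        List<Integer> result = new ArrayList<>();
        for (int v = 1; v <= count; v++) {
            result.add(v);
        }
        return result;
    }

    // Roughly what GET /subjects/(subject)/versions/(version)/schema would return.
    synchronized String schema(String subject, int version) {
        List<String> versions = versionsBySubject.get(subject);
        if (versions == null || version < 1 || version > versions.size()) {
            throw new IllegalArgumentException("No such subject/version");
        }
        return versions.get(version - 1);
    }
}
```

Version deletion is left out on purpose: with a plain list, removing an entry would renumber later versions, so a real store would keep versions as explicit columns.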

The following is the high-level design diagram:
[Figure: high-level design diagram]
An example of backward compatibility from the OpenSchema specification:
[Figure: backward compatibility example]
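As a deliberately simplified illustration of a backward-compatibility check, the sketch below models a schema as a flat map from field name to a hypothetical `FieldSpec(type, hasDefault)`. It assumes the common definition of BACKWARD compatibility: a consumer using the new schema must be able to read data written with the previous schema. Under that assumption, fields may be removed, fields that already existed must keep their type, and any newly added field needs a default. Real Avro, Protobuf or JSON Schema rules (type promotion, unions, nested records) are more involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical flat field descriptor: a type name plus whether a default value exists.
final class FieldSpec {
    final String type;
    final boolean hasDefault;

    FieldSpec(String type, boolean hasDefault) {
        this.type = type;
        this.hasDefault = hasDefault;
    }
}

final class BackwardCompatibility {
    // Returns human-readable violations; an empty list means the new schema is
    // backward compatible with the old one under the simplified rules above.
    static List<String> check(Map<String, FieldSpec> oldSchema, Map<String, FieldSpec> newSchema) {
        List<String> violations = new ArrayList<>();
        for (Map.Entry<String, FieldSpec> e : newSchema.entrySet()) {
            String name = e.getKey();
            FieldSpec newField = e.getValue();
            FieldSpec oldField = oldSchema.get(name);
            if (oldField == null) {
                // Old data lacks this field, so the new reader needs a default to fill it in.
                if (!newField.hasDefault) {
                    violations.add("added field '" + name + "' has no default");
                }
            } else if (!oldField.type.equals(newField.type)) {
                violations.add("field '" + name + "' changed type from "
                        + oldField.type + " to " + newField.type);
            }
        }
        // Fields present only in the old schema are simply ignored by the new reader.
        return violations;
    }
}
```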

@xwm1992
Contributor

xwm1992 commented Jul 6, 2021

Hi @yzhao244, I have some questions about this design.

  • The interaction between the Schema Registry and the database is not shown in this figure.
  • What is the relationship between a topic and a schema ID? At present, a topic stores messages or events; are these represented as "data" in your figure?
  • What kind of data do the database and the topic need to store?

@yzhao244
Contributor

yzhao244 commented Jul 7, 2021

@xwm1992 Thanks for the questions. :) Here are my replies. Sorry that my drawings are a bit rough. :)

[Figure: design diagram]

  • The interaction between the Schema Registry and the Database is not represented in this figure;
    The database, an H2 in-memory DB in this case, persists schema and subject data. The EventMesh Schema Registry will expose REST APIs that follow the OpenSchema specification for performing CRUD operations against the schema and subject tables in the DB. FYI, the following is the list of REST APIs the EventMesh Schema Registry will expose according to the OpenSchema standard:

POST /subjects/(string: subject)/
POST /subjects/(string: subject)/versions
POST /compatibility/subjects/(string: subject)/versions/(version: version)

GET /subjects
GET /subjects/(string: subject)
GET /subjects/(string: subject)/versions
GET /subjects/(string: subject)/versions/(version: version)/schema
GET /schemas/ids/{string: id}
GET /schemas/ids/{string: id}/subjects
GET /config/(string: subject)

PUT /config/(string: subject)

DELETE /subjects/(string: subject)/versions/(version: version)
DELETE /subjects/(string: subject)

  • What is the relationship between topic and schema id, at present topic store the messages or events, are these represent as data in your figure?
    Yes, a topic still stores messages or events as it does today. Since OpenSchema factors serialization into its specification, it makes sense for the EventMesh Schema Registry to support serializing/deserializing events. Therefore, in the future, a topic can store events serialized in a user-defined format (JSON, Avro, and so on).

  • What kind of data does the database and topic need to store

  1. The database stores subject and schema specifications (name, schema content, versions, compatibility, and so on) defined by end users through the EventMesh Schema Registry REST APIs. The following are my proposed DB schemas for the subject and schema tables:
    [Figures: proposed DB schemas for the subject and schema tables]
  2. The topic stores events and messages.
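As a rough illustration of how the REST path templates listed in the reply above could be dispatched, the sketch below matches an HTTP method and path against regular expressions derived from a subset of those endpoints and extracts the path parameters. The operation names and the `SchemaRoutes` class are hypothetical; a real service would more likely use a web framework's routing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical dispatcher covering a subset of the OpenSchema-style endpoints.
final class SchemaRoutes {
    private static final class Route {
        final String method;
        final Pattern pattern;
        final String operation;

        Route(String method, String regex, String operation) {
            this.method = method;
            this.pattern = Pattern.compile(regex);
            this.operation = operation;
        }
    }

    private static final List<Route> ROUTES = new ArrayList<>();

    static {
        // [^/]+ captures exactly one path segment, e.g. a subject name or a version.
        ROUTES.add(new Route("GET", "/subjects", "listSubjects"));
        ROUTES.add(new Route("GET", "/subjects/([^/]+)/versions", "listVersions"));
        ROUTES.add(new Route("GET", "/subjects/([^/]+)/versions/([^/]+)/schema", "getSchemaByVersion"));
        ROUTES.add(new Route("GET", "/schemas/ids/([^/]+)", "getSchemaById"));
        ROUTES.add(new Route("POST", "/subjects/([^/]+)/versions", "registerSchema"));
        ROUTES.add(new Route("DELETE", "/subjects/([^/]+)", "deleteSubject"));
    }

    // Returns "operation[param1, param2, ...]" for the first matching route, if any.
    static Optional<String> resolve(String method, String path) {
        for (Route route : ROUTES) {
            Matcher m = route.pattern.matcher(path);
            // matches() requires the whole path to match, so no ^/$ anchors are needed.
            if (route.method.equals(method) && m.matches()) {
                List<String> params = new ArrayList<>();
                for (int i = 1; i <= m.groupCount(); i++) {
                    params.add(m.group(i));
                }
                return Optional.of(route.operation + params);
            }
        }
        return Optional.empty();
    }
}
```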

@yzhao244
Contributor

Furthermore, the project currently does not have a layer that exposes APIs following REST best practices. I would also like to propose another module, called something like "eventmesh-rest", which can expose the EventMesh Schema Registry APIs following the OpenSchema RESTful API standard shown above.

@qqeasonchen
Contributor Author


Sure, OK.

@jzhou59
Contributor

jzhou59 commented Jul 19, 2021

Hi, I'm also interested in the schema registry in EventMesh. Thanks for your design and explanations. My understanding so far:

  • a schema represents the format of the messages being transferred
  • the benefit of integrating OpenSchema into EventMesh is that a consumer can dynamically parse any message, as long as it can find the schema ID in the H2 database.

Also, I have some questions:

  • In the design above, does the EventMesh Schema Registry need a separate server to run, or can it run inside the EventMesh Runtime?
  • Does the scope of a schema cover the message content or the whole message?

Am I understanding this correctly?
Next I will go through the code of both the develop branch and PR #434, and I hope I can contribute to it.

@ruanwenjun
Member

@yzhao244 Hi, I have a question: H2 is an in-memory database and doesn't seem to support distributed deployment, so how can different eventmesh-runtime instances sync schema changes?

@qqeasonchen
Contributor Author


Good question. Maybe we need to make the schema workflow clear.

@yzhao244
Contributor

@ruanwenjun @qqeasonchen Hi guys, I think it is better to deliver the OpenSchema integration incrementally to ensure a safe build. :) In total, the OpenSchema APIs fall into three groups: /subject/-related APIs, /schema/-related APIs, and /config/ (compatibility) APIs. I suggest delivering each group in a separate PR. PR #434 currently delivers the /subject/-related APIs.

@yzhao244
Contributor

> Also, I have some questions:
>
>   • in upper design, does EventMesh Schema Registry needs a separate server to run? or it could run inside EventMesh Runtime?
>   • is the scope of Schema lies in content of message or the whole message?
>
> Am I understanding it in the right way?

Thanks for your participation. :) Yes, your understanding is correct. The Schema Registry APIs are part of the admin APIs, so yes, they can run as part of the EventMesh runtime. The scope of a schema is to ensure the consistency and compatibility of the events exchanged between event producers and event consumers.

@qqeasonchen
Contributor Author

qqeasonchen commented Aug 3, 2021

@yzhao244 Sorry, after discussing with the community, we think the Schema Registry needs to run as a separate server. The EventMesh runtime queries and caches schemas and then uses them for schema checks, so producers and consumers do not need to interact with the Schema Registry. What do you think of this? @JunjieChou is also working on the design now.
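A minimal sketch of the "query and cache" step, assuming the runtime looks schemas up by ID. The `Function<String, String>` passed in stands in for a hypothetical remote call to the separate Schema Registry server; the class name is illustrative, and a real client would also need error handling and, if schemas can change, an eviction policy.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Caches schemas fetched from a (hypothetical) remote Schema Registry, keyed by schema ID.
final class CachingSchemaLookup {
    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> fetchFromRegistry;

    CachingSchemaLookup(Function<String, String> fetchFromRegistry) {
        this.fetchFromRegistry = fetchFromRegistry;
    }

    // ConcurrentHashMap.computeIfAbsent runs atomically, so concurrent lookups of the
    // same uncached ID trigger at most one registry call. This assumes a schema ID
    // always refers to the same schema content, so cached entries never go stale.
    String getSchema(String schemaId) {
        return cache.computeIfAbsent(schemaId, fetchFromRegistry);
    }
}
```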

@jzhou59
Contributor

jzhou59 commented Aug 4, 2021

@qqeasonchen @yzhao244 Hi, guys. Below is a high-level design of the Schema Registry and EventMesh-Schema-SPI. It decouples the Schema Registry into a separate runtime, which is currently a single host running the schema registry.
What do you think of this design?

[Figure: EventMeshSchemaRegistryArchitecture_2nd_edition]

@jinrongluo
Contributor

@JunjieChou @qqeasonchen @yzhao244

Hi Junjie, thanks for your proposal on the Schema Registry design. I agree with the overall design and the example steps showing how EventMesh uses the Schema Registry to process events. I have two comments:

  1. The Schema Registry API is part of the EventMesh administrative API. In the future we can have other admin APIs, such as a Topic API and a Subscription API. See issue [Enhancement] How EventMesh offers Administrative API to manage Topic of the eventstore #346, and issue Suggestion of Event Subscription management and persistence #349. All these admin APIs can be grouped into a new EventMesh module, eventmesh-rest, which will run as part of the EventMesh runtime. This module includes the Schema Registry Runtime in @JunjieChou's design. See issue [Enhancement] Expose clientmanager/admin APIs as RESTful APIs #435.

Also, it is much more lightweight to run the Schema Registry Runtime as part of the EventMesh runtime process. Deployment and service upgrades only have to deal with a single runtime process.

When the EventMesh runtime instances scale up, the Schema Registry Runtime scales up along with them, which provides high availability. Since the Schema APIs are stateless, the Schema Registry Runtime can be scaled out this way.

Thus, from the perspectives of extensibility, deployment maintenance, and high availability, I would suggest running the Schema Registry as part of the EventMesh Runtime process.

  2. As for the database, I would say it is not dedicated to the Schema Registry. In the future it can be used to store other EventMesh assets, such as topics and subscriptions; see issue Suggestion of Event Subscription management and persistence #349.

@qqeasonchen
Contributor Author

@jinrongluo @yzhao244 @JunjieChou Hi, here is the difference: does the schema registry run independently or along with the EventMesh runtime? The discussion is open here. I agree with setting up eventmesh-rest and eventmesh-store.

@jzhou59
Contributor

jzhou59 commented Aug 5, 2021

@qqeasonchen @jinrongluo @yzhao244

Agreement

Hi Jinrong, I get what you mean, and I believe you are right with regard to scaling. The model you proposed integrates the Schema Registry API into EventMesh (eventmesh-rest). In that model there is no separate client and server, because eventmesh-rest handles the interaction with the database. The database is independent of EventMesh, so other assets can be stored in it as well.

Question

So here comes another question: which database is suitable for this situation? H2 is a fast in-memory database, while a traditional relational database stores persistent data.

A new concern

Besides, on reconsidering the steps, I think one of them (preparation) is unreasonable. In the preparation step, I assume that the schema and serialization type are set first. However, the serialization type may differ among events (with the same subject/topic) created by different producers, which is precisely why a Schema Registry needs to exist. So I think the schema and serialization type should be decided by producers rather than by EventMesh. What do you think?

@jzhou59
Contributor

jzhou59 commented Aug 5, 2021

Anyway, the question is not urgent, but the concern may need to be resolved before coding. What do you guys think?

@jinrongluo
Contributor

@JunjieChou @qqeasonchen @yzhao244

Thank you, Junjie, for your review and analysis.

As for the database question, I would say the choice depends on the deployment environment. For a dev/test environment, where only a single EventMesh instance is provisioned, an H2 database is sufficient. For a staging/production environment, a distributed database (such as MySQL) is required. So we can have an eventmesh-store-plugin module that allows cloud vendors to provide their own database plugin as the persistence layer for EventMesh. We can provide a MySQL implementation as the reference in this open-source project.
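One way to realize this per-environment choice is a plugin lookup keyed by a configuration value. Everything in the sketch is hypothetical: the `SchemaStorePlugin` interface, the plugin classes, and the `eventmesh.store.plugin` property name are illustrative assumptions rather than existing EventMesh APIs, and a real plugin system might discover implementations with `java.util.ServiceLoader` instead of a hard-coded map.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import java.util.function.Supplier;

// Hypothetical persistence plugin contract.
interface SchemaStorePlugin {
    String name();
}

// Single-instance dev/test setups.
final class H2StorePlugin implements SchemaStorePlugin {
    public String name() { return "h2"; }
}

// Shared store for staging/production.
final class MySqlStorePlugin implements SchemaStorePlugin {
    public String name() { return "mysql"; }
}

final class StorePluginLoader {
    private static final Map<String, Supplier<SchemaStorePlugin>> FACTORIES = new HashMap<>();

    static {
        FACTORIES.put("h2", H2StorePlugin::new);
        FACTORIES.put("mysql", MySqlStorePlugin::new);
    }

    // Reads the hypothetical key "eventmesh.store.plugin", defaulting to h2.
    static SchemaStorePlugin load(Properties config) {
        String key = config.getProperty("eventmesh.store.plugin", "h2");
        Supplier<SchemaStorePlugin> factory = FACTORIES.get(key);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown store plugin: " + key);
        }
        return factory.get();
    }
}
```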

As for schemas and (de-)serialization, I would suggest doing this on the EventMesh SDK side. Producers and consumers can use the EventMesh SDK to (de-)serialize their events with their own schema type.
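To illustrate SDK-side (de-)serialization, the sketch below prefixes each serialized payload with the ID of the schema it was written with, so a consumer can look that schema up before decoding. The layout (one magic byte followed by a 4-byte big-endian schema ID) is the wire format used by Confluent's serializers and is only an assumption here; OpenSchema or EventMesh may define a different envelope, for example carrying the schema ID in a message header. `SchemaFraming` is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

final class SchemaFraming {
    private static final byte MAGIC_BYTE = 0;
    private static final int HEADER_SIZE = 1 + Integer.BYTES;

    // Producer side: [magic byte][4-byte schema ID][serialized payload].
    static byte[] frame(int schemaId, byte[] payload) {
        ByteBuffer buf = ByteBuffer.allocate(HEADER_SIZE + payload.length); // big-endian by default
        buf.put(MAGIC_BYTE).putInt(schemaId).put(payload);
        return buf.array();
    }

    // Consumer side: read the schema ID, which would then be resolved via the registry.
    static int schemaId(byte[] framed) {
        ByteBuffer buf = ByteBuffer.wrap(framed);
        if (buf.get() != MAGIC_BYTE) {
            throw new IllegalArgumentException("Unknown framing");
        }
        return buf.getInt();
    }

    // Consumer side: the remaining bytes are the payload to decode with that schema.
    static byte[] payload(byte[] framed) {
        return Arrays.copyOfRange(framed, HEADER_SIZE, framed.length);
    }
}
```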

I would also love to hear other suggestions. :)

@jzhou59
Contributor

jzhou59 commented Aug 5, 2021

@jinrongluo @qqeasonchen @yzhao244 Hey guys. After discussing with Weiming, I found that my understanding of some terms was not correct, which made my design a bit confusing. I get your points, which match what I had in mind, and I will come back with a new design diagram. Sorry for the confusion.

@jzhou59
Contributor

jzhou59 commented Aug 12, 2021

Hi, @qqeasonchen @jinrongluo @yzhao244.
After comparing the schema integrations in other projects (Kafka, EMQ, Pulsar), I propose splitting OpenSchema into two parts: a server side (OpenSchema Registry), which stores and maintains schemas, and a client side, which provides (de-)serialization and validation.
I have defined the client-side architecture and interfaces accordingly and created PR #498.
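For the client-side validation service, a validation step might look roughly like the sketch below. It assumes a hypothetical, flat schema representation (field name mapped to an expected Java type) and checks that each field is present with the right type; the `EventValidator` name is illustrative, and real Avro, Protobuf or JSON Schema validation would rely on the corresponding libraries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class EventValidator {
    // Returns a list of problems; an empty list means the event matches the schema.
    // Every field in this simplified schema is treated as required.
    static List<String> validate(Map<String, Class<?>> schema, Map<String, Object> event) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, Class<?>> field : schema.entrySet()) {
            Object value = event.get(field.getKey());
            if (value == null) {
                problems.add("missing field '" + field.getKey() + "'");
            } else if (!field.getValue().isInstance(value)) {
                problems.add("field '" + field.getKey() + "' should be "
                        + field.getValue().getSimpleName());
            }
        }
        return problems;
    }
}
```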

@qqeasonchen
Contributor Author

@JunjieChou Nice

qqeasonchen pushed a commit that referenced this issue Aug 23, 2021
* [ISSUE #339] add design doc for integrating OpenSchema

* fix typos and change some representations

* add eventmesh-schemaregistry pictures

* Delete eventmesh-schemaregistry-arch.png

* Delete eventmesh-schemaregistry-process.jpg

* add  eventmesh-schemaregistry pictures

* change representations and process design
xwm1992 pushed a commit that referenced this issue Sep 23, 2021
… and implement most APIs that doesn't need compatibility check. (#525)

* [ISSUE #339] a skeleton of independent openschema registry service

* add license for build.gradle

add license for ```incubator-eventmesh/eventmesh-openschema/build.gradle```

* modify license

modify license of ```incubator-eventmesh/eventmesh-openschema/build.gradle```

* update license

update license of ```build.gradle``` in ```incubator-eventmesh/eventmesh-openschema/eventmesh-openschema-registry```

* Update application.yml

update license of ```application.yml``` in ```incubator-eventmesh/eventmesh-openschema/eventmesh-openschema-registry```

* [ISSUE #339] add FLIP-like progress

* [ISSUE #339] fix allowed-licenses.txt and remove unused dependencies

* [ISSUE #339] remove null in allowed-licenses.txt and move dependency version in root-build.gradle
@qqeasonchen qqeasonchen added this to the 1.3.0 milestone Oct 14, 2021
@xwm1992 xwm1992 changed the title from "Integrate With OpenSchema" to "[Feature] Integrate With OpenSchema" Dec 16, 2021
xwm1992 pushed a commit to xwm1992/EventMesh that referenced this issue Dec 27, 2021

* [ISSUE apache#339] add design doc for integrating OpenSchema

xwm1992 pushed a commit to xwm1992/EventMesh that referenced this issue Dec 27, 2021
…ervice and implement most APIs that doesn't need compatibility check. (apache#525)

* [ISSUE apache#339] a skeleton of independent openschema registry service

@qqeasonchen qqeasonchen modified the milestones: 1.3.0, 1.4.0 Jan 20, 2022
@jzhou59
Contributor

jzhou59 commented Apr 9, 2022

Hi, I would like to continue working on this issue and have created PR #821.
However, @ruanwenjun and I seem to have a different understanding of the responsibilities of the openschema plugin.
I was trying to use the plugin for interacting with openschema implementations such as SchemaRegistry, including registering schemas, retrieving schemas, and so on.
@ruanwenjun suggests providing serialization, deserialization, and validation for the different message types.
What do you think is needed when integrating with openschema? @qqeasonchen

@ruanwenjun
Member

ruanwenjun commented Apr 9, 2022


I read the SchemaRegistry docs. In short, it's a server that provides interfaces to CRUD schemas. This seems too heavyweight to me; our goal is simply to use the OpenSchema specification. As for schema storage, it is better not to rely on an additional web service; storing schemas with our metadata is good enough.

BTW, I don't recommend designing OpenSchema as a plugin; that would be over-design. At least, I don't think we will integrate with other schema specifications.

@jzhou59
Contributor

jzhou59 commented Apr 10, 2022

@ruanwenjun OK, I see. I am going to close PR #821 and add a new module under eventmesh-admin.

xwm1992 pushed a commit that referenced this issue Aug 4, 2022
* [ISSUE #339] add design doc for integrating OpenSchema

xwm1992 pushed a commit that referenced this issue Aug 4, 2022
… and implement most APIs that doesn't need compatibility check. (#525)

* [ISSUE #339] a skeleton of independent openschema registry service

6 participants