Producer Consumer design #8417

Closed
Redirts opened this issue Oct 30, 2020 · 4 comments

Comments

@Redirts

Redirts commented Oct 30, 2020

I am looking for advice on how the following problem is handled in solutions that use Pulsar.

The scenario is: A producer is sending messages (in JSON) and the consumer is taking them and inserting data into database tables (specified in the message).

Suddenly the producer changes the structure of the data sent in the messages (let's say because the table structure in the database has a new column).

So for a moment, Pulsar holds messages with the old data structure while it starts receiving messages with the new data structure.

My question is how the consumer should handle this scenario.
What should be done with the old-structure messages, which are now invalid because they can no longer be inserted into the changed table? Retry and then permanently fail (dead letter queue?).

Also, do you usually send the metadata along with your messages, or do you normally handle this in a separate topic or some other way?

Thanks for any advice

@codelipenghui
Contributor

Pulsar has a built-in schema registry. You can update your schema when the database schema changes. On the consumer side, you can get the schema version that a message was written with. For example, if the schema version changes from 1 to 2, version 1 is the old data structure and version 2 is the new one, so the consumer can decide what to do based on the schema version. With this approach, you need to upgrade the consumer first. For more details on Pulsar schema, see http://pulsar.apache.org/docs/en/schema-get-started/
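
To make the version-based decision concrete, here is a minimal sketch in plain Python. It does not use the Pulsar client: the integer `schema_version`, the `email` column, and the handler names are assumptions for illustration only, and how the version is read from a real message is left out.

```python
import json
from typing import Callable, Dict


def row_from_v1(payload: dict) -> dict:
    # Hypothetical version 1: written before the new column existed,
    # so the missing "email" value is filled with None.
    return {"id": payload["id"], "name": payload["name"], "email": None}


def row_from_v2(payload: dict) -> dict:
    # Hypothetical version 2: carries the new "email" column.
    return {"id": payload["id"], "name": payload["name"], "email": payload["email"]}


# One handler per schema version the upgraded consumer understands.
HANDLERS: Dict[int, Callable[[dict], dict]] = {1: row_from_v1, 2: row_from_v2}


def to_row(schema_version: int, data: bytes) -> dict:
    """Convert a JSON message body into a table row based on its schema version."""
    handler = HANDLERS.get(schema_version)
    if handler is None:
        raise ValueError(f"unsupported schema version: {schema_version}")
    return handler(json.loads(data))
```

The point of the dispatch table is that the consumer has to know about version 2 before the producer starts sending it, which is why the consumer is upgraded first.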

Another way is the DLQ that you mentioned. You can store the old data in the DLQ if needed. Of course, if you don't need to keep the old data, you can acknowledge those messages directly.
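
The retry-then-DLQ flow can be sketched as follows. This is a plain Python simulation of the logic only, not the Pulsar client's dead letter support: `MAX_ATTEMPTS`, `InsertError`, the `insert` callback, and the in-memory `dead_letters` list are all hypothetical stand-ins.

```python
from typing import Callable, List

MAX_ATTEMPTS = 3  # assumed retry limit, chosen only for illustration


class InsertError(Exception):
    """Hypothetical error raised when a message cannot be written to the table."""


def process_with_dlq(
    message: bytes,
    insert: Callable[[bytes], None],
    dead_letters: List[bytes],
    keep_failed: bool = True,
) -> str:
    """Try to insert a message; after repeated failures, park it or drop it."""
    for _ in range(MAX_ATTEMPTS):
        try:
            insert(message)
            return "acked"
        except InsertError:
            continue  # retry until the attempt limit is reached
    if keep_failed:
        # Keep the old-structure message for later inspection or replay.
        dead_letters.append(message)
        return "dead-lettered"
    # Old data is not needed: acknowledge it so it is not redelivered.
    return "dropped"
```

Whether to set `keep_failed` depends on whether the old-structure rows still have value once the table has changed.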

@Redirts
Author

Redirts commented Nov 2, 2020

Thank you for the comment.
What you describe makes total sense.

My idea is that the consumer would not need to be upgraded, since it would just read the metadata inside the message to learn the table name, the structure, and the data it needs to write.
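
A rough sketch of such a metadata-driven consumer, using SQLite from the Python standard library as a stand-in database. The envelope layout (`"table"` and `"data"` keys) is an assumption made for this example, not something Pulsar defines. Because table and column names cannot be bound as SQL parameters, the sketch checks them against the live table definition before building the statement.

```python
import json
import sqlite3


def insert_from_message(conn: sqlite3.Connection, raw: bytes) -> None:
    """Insert one row described by a self-describing JSON message."""
    msg = json.loads(raw)
    table = msg["table"]      # assumed envelope key: target table name
    row = msg["data"]         # assumed envelope key: column -> value mapping
    columns = list(row)

    # Validate identifiers against what actually exists in the database.
    known_tables = {
        r[0] for r in conn.execute("SELECT name FROM sqlite_master WHERE type='table'")
    }
    if table not in known_tables:
        raise ValueError(f"unknown table: {table}")
    known_columns = {r[1] for r in conn.execute(f'PRAGMA table_info("{table}")')}
    unknown = set(columns) - known_columns
    if unknown:
        raise ValueError(f"unknown columns for {table}: {sorted(unknown)}")

    column_list = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(
        f'INSERT INTO "{table}" ({column_list}) VALUES ({placeholders})',
        [row[c] for c in columns],
    )
```

With this shape, a message that names a column the table does not have fails with a `ValueError`, which is the point where a retry or dead letter decision would be made.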

From what I read, Pulsar currently does not support multiple schemas on a topic, so I would need one topic per domain of data produced.

#8301

@codelipenghui
Contributor

codelipenghui commented Nov 6, 2020

@Redirts

> From what I read, pulsar currently does not support multiple schemas on a topic so I would need to have one topic per domain of data produced.

Pulsar already supports multiple schemas on a topic, with one schema compatibility policy per topic. For more details, see http://pulsar.apache.org/docs/en/schema-evolution-compatibility/ and #3876
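
As a simplified illustration of what a compatibility policy guards against, the sketch below checks one rule only: a reader on the new schema can handle old data if every field it adds has a default. This is an assumption-laden toy and not Pulsar's actual compatibility checker; the `NO_DEFAULT` marker and the field names are hypothetical.

```python
from typing import Any, Dict

NO_DEFAULT = object()  # marks a field that has no default value


def can_read_old_data(old_fields: Dict[str, Any], new_fields: Dict[str, Any]) -> bool:
    """Return True if every field added by the new schema has a default.

    Both arguments map a field name to its default value, or to NO_DEFAULT.
    Field types are ignored in this sketch.
    """
    for name, default in new_fields.items():
        if name not in old_fields and default is NO_DEFAULT:
            return False  # old messages cannot supply this required field
    return True
```

Applied to the scenario in this thread, adding the new column with a default would let old and new messages share the topic, while adding it as a required field would not.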

@tisonkun
Member

Closed as answered.
