[Python Client] Python client support using custom Avro schema definition #12516
Merged
codelipenghui
merged 3 commits into
apache:master
from
gaoran10:gaoran/python-custom-avro-schema
Oct 30, 2021
@gaoran10: Thanks for your contribution. For this PR, do we need to update docs?
gaoran10
requested review from
merlimat,
BewareMyPower,
codelipenghui,
congbobo184,
hangc0276 and
zymap
October 28, 2021 05:24
codelipenghui
approved these changes
Oct 29, 2021
merlimat
approved these changes
Oct 29, 2021
merlimat
pushed a commit
that referenced
this pull request
Oct 30, 2021
…tion (#12516)

### Motivation

Currently, the Python client doesn't support using a schema definition to generate `AvroSchema`, so users can't use a schema definition file in the Python client.

### Modifications

Add a new init-param `schema_definition` for `AvroSchema` to support initializing the `AvroSchema` by an Avro schema definition.

```
class AvroSchema(Schema):

    def __init__(self, record_cls, schema_definition=None):
        if record_cls is None and schema_definition is None:
            raise AssertionError("The param record_cls and schema_definition shouldn't be both None.")

        # record_cls takes precedence when both arguments are given.
        if record_cls is not None:
            self._schema = record_cls.schema()
        else:
            self._schema = schema_definition
        super(AvroSchema, self).__init__(record_cls, _pulsar.SchemaType.AVRO, self._schema, 'AVRO')
```

### How to use

Assume that there is a company Avro schema definition file `company.avsc` like this.

```
{
    "doc": "this is doc",
    "namespace": "example.avro",
    "type": "record",
    "name": "Company",
    "fields": [
        {"name": "name", "type": ["null", "string"]},
        {"name": "address", "type": ["null", "string"]},
        {"name": "employees", "type": ["null", {"type": "array", "items": {
            "type": "record",
            "name": "Employee",
            "fields": [
                {"name": "name", "type": ["null", "string"]},
                {"name": "age", "type": ["null", "int"]}
            ]
        }}]},
        {"name": "labels", "type": ["null", {"type": "map", "values": "string"}]}
    ]
}
```

Users can load the schema from a file with `avro.schema` or `fastavro.schema`.

> refer to [load_schema](https://fastavro.readthedocs.io/en/latest/schema.html#fastavro._schema_py.load_schema) or [Avro Schema](http://avro.apache.org/docs/current/gettingstartedpython.html)

```
from fastavro.schema import load_schema

import pulsar
from pulsar.schema import AvroSchema

# Assumed values for illustration; the original snippet left these undefined.
client = pulsar.Client('pulsar://localhost:6650')
topic = 'my-topic'
i = 0

schema_definition = load_schema("examples/company.avsc")
# schema_definition = avro.schema.parse(open("examples/company.avsc", "rb").read()).to_json()
avro_schema = AvroSchema(None, schema_definition=schema_definition)
producer = client.create_producer(
    topic=topic,
    schema=avro_schema)
consumer = client.subscribe(topic, 'test', schema=avro_schema)
company = {
    "name": "company-name" + str(i),
    "address": 'xxx road xxx street ' + str(i),
    "employees": [
        {"name": "user" + str(i), "age": 20 + i},
        {"name": "user" + str(i), "age": 30 + i},
        {"name": "user" + str(i), "age": 35 + i},
    ],
    "labels": {
        "industry": "software" + str(i),
        "scale": ">100",
        "funds": "1000000.0"
    }
}
producer.send(company)
msg = consumer.receive()
# Users could get a dict object by `value()` method.
msg.value()
```
codelipenghui
pushed a commit
that referenced
this pull request
Nov 22, 2021
- This PR adds doc for #12516
- Preview looks good: ![image](https://user-images.githubusercontent.com/50226895/142824597-eab9a2e5-e445-4c88-a536-358f0b5fab71.png)
dlg99
pushed a commit
to dlg99/pulsar
that referenced
this pull request
Nov 23, 2021
- This PR adds doc for apache#12516
- Preview looks good: ![image](https://user-images.githubusercontent.com/50226895/142824597-eab9a2e5-e445-4c88-a536-358f0b5fab71.png)
Anonymitaet
added
doc-complete
Your PR changes impact docs and the related docs have been already added.
and removed
doc-label-missing
labels
Nov 24, 2021
eolivelli
pushed a commit
to eolivelli/pulsar
that referenced
this pull request
Nov 29, 2021
…tion (apache#12516)
eolivelli
pushed a commit
to eolivelli/pulsar
that referenced
this pull request
Nov 29, 2021
- This PR adds doc for apache#12516
- Preview looks good: ![image](https://user-images.githubusercontent.com/50226895/142824597-eab9a2e5-e445-4c88-a536-358f0b5fab71.png)
fxbing
pushed a commit
to fxbing/pulsar
that referenced
this pull request
Dec 19, 2021
- This PR adds doc for apache#12516
- Preview looks good: ![image](https://user-images.githubusercontent.com/50226895/142824597-eab9a2e5-e445-4c88-a536-358f0b5fab71.png)
codelipenghui
pushed a commit
that referenced
this pull request
Dec 20, 2021
…tion (#12516)

(cherry picked from commit 85575f4)
Labels
cherry-picked/branch-2.8
Archived: 2.8 is end of life
cherry-picked/branch-2.9
Archived: 2.9 is end of life
doc-complete
Your PR changes impact docs and the related docs have been already added.
release/2.8.2
release/2.9.2
### Motivation

Currently, the Python client doesn't support using a schema definition to generate `AvroSchema`, so users can't use a schema definition file in the Python client.

### Modifications

Add a new init-param `schema_definition` for `AvroSchema` to support initializing the `AvroSchema` by an Avro schema definition.

### How to use

Assume that there is a company Avro schema definition file `company.avsc` like the one reproduced in the commit message above.

Users can load the schema from a file with `avro.schema` or `fastavro.schema`.
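An `.avsc` file is plain JSON, so the definition can be inspected with the standard library before it is handed to `AvroSchema`. The sketch below writes a trimmed copy of the `Company` schema to a temporary file and loads it back as a dict. The helper names `load_schema_definition` and `field_names` are hypothetical, and whether a dict from `json.load` is interchangeable with the result of `fastavro.schema.load_schema` for every schema (for example, ones that reference other files) is an assumption this PR does not confirm.

```python
import json
import os
import tempfile

# Trimmed copy of the Company definition from this PR (two fields only).
COMPANY_AVSC = """
{
  "namespace": "example.avro",
  "type": "record",
  "name": "Company",
  "fields": [
    {"name": "name", "type": ["null", "string"]},
    {"name": "address", "type": ["null", "string"]}
  ]
}
"""


def load_schema_definition(path):
    """Hypothetical helper: read an .avsc file and return it as a dict."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def field_names(schema_definition):
    """Return the top-level field names of a record schema."""
    return [field["name"] for field in schema_definition["fields"]]


# Write the definition to a temporary file so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp_dir:
    avsc_path = os.path.join(tmp_dir, "company.avsc")
    with open(avsc_path, "w", encoding="utf-8") as f:
        f.write(COMPANY_AVSC)
    schema_definition = load_schema_definition(avsc_path)

print(schema_definition["name"], field_names(schema_definition))
```

With the Pulsar client installed, the resulting dict would then be passed as `AvroSchema(None, schema_definition=schema_definition)`, following the PR's own example.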
### Verifying this change

Add new tests to verify encode and decode using `AvroSchema` generated by the Avro schema definition.

### Does this pull request potentially affect one of the following parts:

If `yes` was chosen, please highlight the changes

### Documentation

Check the box below and label this PR (if you have committer privilege).

Need to update docs?

- `doc-required` (If you need help on updating docs, create a doc issue)
- `no-need-doc` (Please explain why)
- `doc` (If this PR contains doc changes)
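Separately from the encode and decode tests mentioned under "Verifying this change", the argument handling of the new constructor can be checked without a broker. The sketch below copies only the branch that picks the schema source; `resolve_schema` and `FakeRecord` are hypothetical stand-ins and are not part of the Pulsar client. It illustrates that `record_cls` takes precedence when both arguments are given and that passing neither raises `AssertionError`.

```python
def resolve_schema(record_cls, schema_definition=None):
    """Mirror the branch in AvroSchema.__init__ that picks the schema source."""
    if record_cls is None and schema_definition is None:
        raise AssertionError(
            "The param record_cls and schema_definition shouldn't be both None.")
    if record_cls is not None:
        return record_cls.schema()
    return schema_definition


class FakeRecord:
    """Hypothetical stand-in for a pulsar.schema.Record subclass."""

    @classmethod
    def schema(cls):
        return {"type": "record", "name": "FakeRecord", "fields": []}


definition = {"type": "record", "name": "Company", "fields": []}

# Only a definition: it is used as-is.
from_definition = resolve_schema(None, schema_definition=definition)
# Both given: the record class wins.
from_record = resolve_schema(FakeRecord, schema_definition=definition)

# Neither given: the constructor logic rejects the call.
try:
    resolve_schema(None)
    both_none_rejected = False
except AssertionError:
    both_none_rejected = True

print(from_definition["name"], from_record["name"], both_none_rejected)
```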