[Schema Registry + Avro Serializer] 1.0.0b1 #13124

Merged
merged 36 commits into from
Sep 4, 2020
Changes from 23 commits
Commits
56bf5eb
init commit
yunhaoling Aug 14, 2020
d1c96f9
avro serializer structure
yunhaoling Aug 18, 2020
6311e92
adding avro serializer
yunhaoling Aug 20, 2020
0096ef2
tweak api version and fix a typo
yunhaoling Aug 20, 2020
2e95001
test template
yunhaoling Aug 21, 2020
2e8bcc7
avro serializer sync draft
yunhaoling Aug 22, 2020
6248ff1
major azure sr client work done
yunhaoling Aug 25, 2020
f97478e
add sample docstring for sr
yunhaoling Aug 25, 2020
3cf6459
avro serializer async impl
yunhaoling Aug 26, 2020
58fb59f
close the writer
yunhaoling Aug 26, 2020
02c60ec
update avro se/de impl
yunhaoling Aug 27, 2020
85d6766
update avro serializer impl
yunhaoling Aug 27, 2020
5fa4b43
fix apireview reported error in sr
yunhaoling Aug 27, 2020
b910027
srav namespace, setup update
yunhaoling Aug 27, 2020
6b6c8b2
doc update
yunhaoling Aug 28, 2020
c465be1
update doc and api
yunhaoling Aug 30, 2020
63a278c
impl, doc update
yunhaoling Aug 31, 2020
dc363f4
partial update according to laruent's feedback
yunhaoling Sep 1, 2020
740de0e
be consistent with eh extension structure
yunhaoling Sep 1, 2020
7734c42
more update code according to feedback
yunhaoling Sep 1, 2020
92cd385
update credential config
yunhaoling Sep 1, 2020
1c60676
rename package name to azure-schemaregistry-avroserializer
yunhaoling Sep 1, 2020
f20bba0
fix pylint
yunhaoling Sep 1, 2020
c81f16b
try ci fix
yunhaoling Sep 2, 2020
41ee64b
fix test for py27 as avro only accept unicode
yunhaoling Sep 3, 2020
2675331
first round of review feedback
yunhaoling Sep 3, 2020
fb0e6f9
remove temp ci experiment
yunhaoling Sep 3, 2020
0260ea3
init add conftest.py to pass py2.7 test
yunhaoling Sep 3, 2020
bb687cb
laurent feedback update
yunhaoling Sep 3, 2020
d8e0986
remove dictmixin for b1, update comment in sample
yunhaoling Sep 3, 2020
b91fb4f
update api in avroserializer and update test and readme
yunhaoling Sep 4, 2020
929ee68
update test, docs and links
yunhaoling Sep 4, 2020
8fbed90
add share requirement
yunhaoling Sep 4, 2020
01a39a7
update avro dependency
yunhaoling Sep 4, 2020
bde3c24
pr feedback and livetest update
yunhaoling Sep 4, 2020
a2903b6
Merge remote-tracking branch 'central/master' into sr-dev
yunhaoling Sep 4, 2020
@@ -0,0 +1,5 @@
# Release History

## 1.0.0b1 (Unreleased)

Version 1.0.0b1 is the first preview of our efforts to create a user-friendly and Pythonic client library for Azure Schema Registry Avro Serializer.
@@ -0,0 +1,4 @@
include *.md
include azure/__init__.py
recursive-include tests *.py
recursive-include samples *.py
122 changes: 122 additions & 0 deletions sdk/schemaregistry/azure-schemaregistry-avroserializer/README.md
@@ -0,0 +1,122 @@
# Azure Schema Registry Avro Serializer client library for Python

Azure Schema Registry Avro Serializer serializes and deserializes data according
to a given Avro schema. It is integrated with the Azure Schema Registry SDK and automatically registers and retrieves schemas.

## Getting started

### Install the package

Install the Azure Schema Registry Avro Serializer client library for Python with [pip][pip]:

```Bash
pip install azure-schemaregistry-avroserializer
```

### Prerequisites
To use this package, you must have:
* Azure subscription - [Create a free account][azure_sub]
* Azure Schema Registry
* Python 2.7, or 3.5 or later - [Install Python][python]

### Authenticate the client
Interaction with Schema Registry Avro Serializer starts with an instance of the `SchemaRegistryAvroSerializer` class. You need the endpoint, an AAD credential, and the schema group name to instantiate the client object.

**Create client using the azure-identity library:**
Review comment (Member):
Got feedback from cala; we're not supposed to have samples in these sections, we could make it a formal sample and link to it however, which is what I've leaned towards. Feel free to eyeball my current SB PR for an example of this.


```python
from azure.schemaregistry.serializer.avro_serializer import SchemaRegistryAvroSerializer
from azure.identity import DefaultAzureCredential

credential = DefaultAzureCredential()
endpoint = '<< ENDPOINT OF THE SCHEMA REGISTRY >>'
schema_group = '<< GROUP NAME OF THE SCHEMA >>'
schema_registry_client = SchemaRegistryAvroSerializer(endpoint, credential, schema_group)
```

## Key concepts

- Avro: Apache Avro™ is a data serialization system.
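
To make the concept a little more concrete: an Avro schema is written as JSON. The sketch below uses only Python's standard `json` module, not the `avro` library or this package, to inspect the same `User` record schema that the Serialization example uses. The helper name `nullable_fields` is hypothetical and exists only for illustration.

```python
import json

# An Avro record schema is plain JSON; this is the "User" schema
# from the Serialization example in this README.
SCHEMA_STRING = """
{"namespace": "example.avro",
 "type": "record",
 "name": "User",
 "fields": [
     {"name": "name", "type": "string"},
     {"name": "favorite_number", "type": ["int", "null"]},
     {"name": "favorite_color", "type": ["string", "null"]}
 ]
}"""


def nullable_fields(schema_string):
    """Return the names of fields whose type is a union that includes "null".

    Hypothetical helper for illustration; it is not part of any library.
    """
    schema = json.loads(schema_string)
    return [
        field["name"]
        for field in schema["fields"]
        if isinstance(field["type"], list) and "null" in field["type"]
    ]


schema = json.loads(SCHEMA_STRING)
# The full name of a record is its namespace joined to its name.
full_name = "{}.{}".format(schema["namespace"], schema["name"])
print(full_name)
print(nullable_fields(SCHEMA_STRING))
```

A union type such as `["int", "null"]` is how the schema marks a field as optional, which is why two of the three fields are reported as nullable here.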

## Examples

The following sections provide several code snippets covering some of the most common Schema Registry tasks, including:

- [Serialization](#serialization)
- [Deserialization](#deserialization)

### Serialization

```python
import os
from azure.schemaregistry.serializer.avro_serializer import SchemaRegistryAvroSerializer
from azure.identity import DefaultAzureCredential

token_credential = DefaultAzureCredential()
endpoint = os.environ['SCHEMA_REGISTRY_ENDPOINT']
schema_group = "<your-group-name>"

serializer = SchemaRegistryAvroSerializer(endpoint, token_credential, schema_group)

schema_string = """
{"namespace": "example.avro",
 "type": "record",
 "name": "User",
 "fields": [
     {"name": "name", "type": "string"},
     {"name": "favorite_number", "type": ["int", "null"]},
     {"name": "favorite_color", "type": ["string", "null"]}
 ]
}"""

with serializer:
    dict_data = {"name": "Ben", "favorite_number": 7, "favorite_color": "red"}
    encoded_bytes = serializer.serialize(dict_data, schema_string)
```
Review comment by lmazuel (Member), Sep 2, 2020:

For clarity, I would add another quick code block for EventHub usage, since it's the main planned usage (to connect the dots). Something really simple, with a "Look at EH doc for more details about EH"

I would make it a different code block to be sure people would not assume this package is EH specific somehow


### Deserialization

```python
import os
from azure.schemaregistry.serializer.avro_serializer import SchemaRegistryAvroSerializer
from azure.identity import DefaultAzureCredential

token_credential = DefaultAzureCredential()
endpoint = os.environ['SCHEMA_REGISTRY_ENDPOINT']
schema_group = "<your-group-name>"

serializer = SchemaRegistryAvroSerializer(endpoint, token_credential, schema_group)

with serializer:
    encoded_bytes = b'<data_encoded_by_azure_schema_registry_avro_serializer>'
    decoded_data = serializer.deserialize(encoded_bytes)
```

## Troubleshooting

Azure Schema Registry Avro Serializer raises exceptions defined in [Azure Core][azure_core].

## Next steps

### More sample code

Please find further examples in the [samples](./samples) directory demonstrating common Azure Schema Registry Avro Serializer scenarios.
Review comment (Member):

relative links should be absolute per Cala's recommendations (otherwise docs break when we render them out to various sources)


## Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a
Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide
a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions
provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.

[pip]: https://pypi.org/project/pip/
[python]: https://www.python.org/downloads/
[azure_core]: https://github.com/Azure/azure-sdk-for-python/blob/master/sdk/core/azure-core/README.md
[azure_sub]: https://azure.microsoft.com/free/
@@ -0,0 +1,26 @@
# --------------------------------------------------------------------------
#
# Copyright (c) Microsoft Corporation. All rights reserved.
#
# The MIT License (MIT)
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the ""Software""), to
# deal in the Software without restriction, including without limitation the
# rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
# sell copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
# IN THE SOFTWARE.
#
# --------------------------------------------------------------------------
__path__ = __import__("pkgutil").extend_path(__path__, __name__) # type: ignore
@@ -0,0 +1,25 @@
# --------------------------------------------------------------------------
#
# Copyright (c) Microsoft Corporation. All rights reserved.
#
# The MIT License (MIT)
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the ""Software""), to
# deal in the Software without restriction, including without limitation the
# rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
# sell copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
# IN THE SOFTWARE.
#
# --------------------------------------------------------------------------
@@ -0,0 +1,25 @@
# --------------------------------------------------------------------------
#
# Copyright (c) Microsoft Corporation. All rights reserved.
#
# The MIT License (MIT)
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the ""Software""), to
# deal in the Software without restriction, including without limitation the
# rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
# sell copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
# IN THE SOFTWARE.
#
# --------------------------------------------------------------------------
@@ -0,0 +1,34 @@
# --------------------------------------------------------------------------
#
# Copyright (c) Microsoft Corporation. All rights reserved.
#
# The MIT License (MIT)
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the ""Software""), to
# deal in the Software without restriction, including without limitation the
# rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
# sell copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
# IN THE SOFTWARE.
#
# --------------------------------------------------------------------------
from ._version import VERSION

__version__ = VERSION

from ._schema_registry_avro_serializer import SchemaRegistryAvroSerializer

__all__ = [
    "SchemaRegistryAvroSerializer"
]
@@ -0,0 +1,114 @@
# --------------------------------------------------------------------------
#
# Copyright (c) Microsoft Corporation. All rights reserved.
#
# The MIT License (MIT)
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the ""Software""), to
# deal in the Software without restriction, including without limitation the
# rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
# sell copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED *AS IS*, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
# FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
# IN THE SOFTWARE.
#
# --------------------------------------------------------------------------
import abc
from typing import BinaryIO, Union, Type, TypeVar, Optional, Any, Dict
from io import BytesIO
import avro
from avro.io import DatumWriter, DatumReader, BinaryDecoder, BinaryEncoder

try:
    ABC = abc.ABC
except AttributeError:  # Python 2.7, abc exists, but not ABC
    ABC = abc.ABCMeta("ABC", (object,), {"__slots__": ()})  # type: ignore

ObjectType = TypeVar("ObjectType")


class AvroObjectSerializer(ABC):

    def __init__(self, codec=None):
        """An Avro serializer using the avro library from Apache.
        :param str codec: The writer codec. If None, let the avro library decide.
        """
        self._writer_codec = codec
        self._schema_writer_cache = {}  # type: Dict[str, DatumWriter]
        self._schema_reader_cache = {}  # type: Dict[str, DatumReader]

    def serialize(
        self,
        data,  # type: ObjectType
        schema,  # type: Optional[Any]
    ):
        # type: (...) -> bytes
        """Convert the provided value to its binary representation and return it as bytes.
        Schema must be an Avro RecordSchema:
        https://avro.apache.org/docs/1.10.0/gettingstartedpython.html#Defining+a+schema
        :param data: An object to serialize
        :param schema: An Avro RecordSchema
        """
        if not schema:
            raise ValueError("Schema is required in Avro serializer.")

        # Accept either a parsed schema object or its string form.
        if not isinstance(schema, avro.schema.Schema):
            schema = avro.schema.parse(schema)

        # Reuse one DatumWriter per distinct schema, keyed by the schema string.
        try:
            writer = self._schema_writer_cache[str(schema)]
        except KeyError:
            writer = DatumWriter(schema)
            self._schema_writer_cache[str(schema)] = writer

        stream = BytesIO()

        writer.write(data, BinaryEncoder(stream))
        encoded_data = stream.getvalue()

        stream.close()
        return encoded_data

    def deserialize(
        self,
        data,  # type: Union[bytes, BinaryIO]
        schema,  # type:
        return_type=None,  # type: Optional[Type[ObjectType]] # pylint: disable=unused-argument
    ):
        # type: (...) -> ObjectType
        """Read the binary representation into a specific type.
        Return type will be ignored, since the schema is deduced from the provided bytes.
        :param data: A stream of bytes or bytes directly
        :type data: BinaryIO or bytes
        :param schema: An Avro RecordSchema
        :param return_type: Return type is not supported in the Avro serializer.
        :returns: An instantiated object
        :rtype: ObjectType
        """
        # Wrap raw bytes in a stream; objects that already expose read() are used as-is.
        if not hasattr(data, 'read'):
            data = BytesIO(data)

        if not isinstance(schema, avro.schema.Schema):
            schema = avro.schema.parse(schema)

        # Reuse one DatumReader per distinct schema, keyed by the schema string.
        try:
            reader = self._schema_reader_cache[str(schema)]
        except KeyError:
            reader = DatumReader(writers_schema=schema)
            self._schema_reader_cache[str(schema)] = reader

        bin_decoder = BinaryDecoder(data)
        decoded_data = reader.read(bin_decoder)
        data.close()

        return decoded_data
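
Two small patterns recur in the serializer above: building one helper per distinct schema string and reusing it, and accepting either raw bytes or a readable stream. The following is a minimal standard-library sketch of those two patterns in isolation, not the code from this PR. The names `as_stream` and `KeyedCache` are hypothetical, and a plain dict stands in for the `DatumWriter`/`DatumReader` objects so the example runs without the `avro` package.

```python
from io import BytesIO


def as_stream(data):
    """Return a readable binary stream for raw bytes or an existing stream.

    Hypothetical helper mirroring the hasattr(data, 'read') check above.
    """
    # Anything without a read attribute is treated as raw bytes and wrapped.
    if not hasattr(data, "read"):
        data = BytesIO(data)
    return data


class KeyedCache(object):
    """Build one object per distinct key and reuse it on later lookups.

    Hypothetical class mirroring the try/except KeyError caches above.
    """

    def __init__(self, factory):
        self._factory = factory
        self._items = {}
        self.builds = 0  # counts how many times the factory actually ran

    def get(self, key):
        try:
            return self._items[key]
        except KeyError:
            self.builds += 1
            item = self._items[key] = self._factory(key)
            return item


# A dict stands in for the per-schema writer or reader (assumption for illustration).
cache = KeyedCache(factory=lambda schema_text: {"schema": schema_text})
first = cache.get('{"type": "string"}')
second = cache.get('{"type": "string"}')  # same key: served from the cache
third = cache.get('{"type": "int"}')      # new key: factory runs again

stream = BytesIO(b"abc")
print(cache.builds)
print(as_stream(b"abc").read())
```

One consequence of the stream handling is worth noting: because a caller-supplied stream is used directly rather than copied, code that closes the stream after reading, as `deserialize` does, also closes the caller's object.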