AVRO-1938: Add fingerprinting support to Python implementation #1181
With this change, Schema fingerprints can be extracted by invoking the `fingerprint` method on the schema object. By default, fingerprints will be generated with the CRC-64 algorithm. Optionally, the algorithm can be supplied. All algorithms supported by hashlib are available, but Avro recommends using one of CRC-32, MD5, and SHA256, depending on your needs.
This commit addresses review comments and freezes the supported fingerprinting algorithms set.
Good on moving to black! 🎉 🎉
@subhashb could you please take a crack at adding type hints to the changes here?
Addresses PR 1181 review comments. Methods within Fingerprint mixin have been made available at the module level, including static variables used in fingerprinting. This PR has been synced with latest master.
@kojiromike All comments have been addressed, along with type hints where applicable.
@RyanSkraba @kojiromike Can you please help me understand what could be causing this error? Do we need to flush caches? As far as I can see, these failures are not due to changes in this PR's branch.
There has been some problem with Python recently (since today?!).
@martin-g Could it have something to do with this change: pytest-dev/pytest-xdist#821 ?
@RyanSkraba If you agree, we will need to clear the cache manually.
@kojiromike Synced with latest master. Can you please approve workflow runs and review the PR?
@kojiromike Addressed review comments.
@kojiromike Addressed further review comments and left a note about
* AVRO-1938 Add support for fingerprinting schemas
  With this change, Schema fingerprints can be extracted by invoking the `fingerprint` method on the schema object. By default, fingerprints will be generated with the CRC-64 algorithm. Optionally, the algorithm can be supplied. All algorithms supported by hashlib are available, but Avro recommends using one of CRC-32, MD5, and SHA256, depending on your needs.
* AVRO-1938 Fix issue with AbstractSet typecheck
* Format with black
* Freeze Supported Algorithms Set
  This commit addresses review comments and freezes the supported fingerprinting algorithms set.
* Minor lint fix with black
* Address Typecheck issues with Frozenset
* Fold Fingerprint Mixin within Schema
  Addresses PR 1181 review comments. Methods within Fingerprint mixin have been made available at the module level, including static variables used in fingerprinting. This PR has been synced with latest master.
* Add type hints to fingerprint methods/variables
* Fix incorrect import sorting in schema.py to pass lint check
* Address @kojiromike Jul 16 review comments
* Address @kojiromike Jul 16 review comments - 2
* Address @kojiromike Jul 17 review comments
* Fix black lint issue

(cherry picked from commit f504265)
This PR adds support for generating fingerprints from schema objects. Schema fingerprints are extracted by invoking the `fingerprint` method on the schema object. By default, fingerprints are generated with the CRC-64 algorithm. Optionally, an algorithm, specified by the algorithm name used by `hashlib`, can be supplied. All algorithms supported by hashlib are made available, but Avro recommends using one of CRC-32, MD5, and SHA256, depending on your needs.
Make sure you have checked all steps below.
Jira
Tests
`FingerprintTestCase` class
`TestMisc` class: `test_unsupported_fingerprint_algorithm` and `test_less_popular_fingerprint_algorithm` test methods
Commits
Documentation
Functionality conforms to the Avro Specification.