Skip to content

Conversation

@yossariano
Copy link
Contributor

  • LSP tracking ticket
  • We ran into an issue in our BibServiceV2 implementation where we saw extreme latency during avro encoding of records. See this issue for context. Switching to fastavro shows considerable improvements over avro in our benchmarks. The following benchmark is comparing the current avro_encoder in nypl-python-utils with the one in this PR:
-- Python (avro) ------------------------------
2025-06-13 12:05:59,742 | avro_encoder | INFO: Fetching Avro schema from https://platform.nypl.org/api/v0.1/current-schemas/Bib
Encoding sample-bib.json 100 times
2025-06-13 12:06:00,168 | avro_encoder | INFO: Encoding (100) records using Bib schema
Encoded 100 records

real	0m4.427s
user	0m3.952s
sys	0m0.052s
--
-- Python (fastavro) ------------------------------
2025-06-13 12:06:04,153 | avro_client | INFO: Fetching Avro schema from https://platform.nypl.org/api/v0.1/current-schemas/Bib
Encoding sample-bib.json 100 times
2025-06-13 12:06:04,399 | avro_client | INFO: Encoding (100) records using Bib schema
Encoded 100 records

real	0m0.747s
user	0m0.478s
sys	0m0.022s
--
  • I'm making this change with the intent of being fully backward compatible with existing services using this client. See the passing unit tests which are unchanged apart from the new fastavro-aware schema, which won't be exposed to any users of the module. Let me know if you have any concerns in that regard.
  • You'll notice that I'm using schemaless_writer and schemaless_reader in this change. In my testing, using the standard fastavro writer and reader put headers onto the encoded data which didn't match the bytes of the unit test's expected data. In order to adhere to the existing tests (and presumably existing encoders/decoders) I opted to exclude the headers during encoding.
  • Note: it looks like several unit tests in this package for unrelated modules were broken before I made this change (KinesisClient, CloudLibraryClient). Just calling that out lest it appear that these changes introduced test failures

Copy link
Contributor

@charmingduchess charmingduchess left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Smashing! I'd recommend a minor version bump to 1.7, rather than a patch bump.

@yossariano yossariano merged commit ffa087a into main Jun 17, 2025
2 checks passed
@yossariano yossariano deleted the SCC-4748-fastavro branch June 17, 2025 19:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants