Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

standalone deser #239

Merged
merged 2 commits into from
Apr 8, 2020
Merged

standalone deser #239

merged 2 commits into from
Apr 8, 2020

Conversation

noisesmith
Copy link
Contributor

@noisesmith noisesmith commented Apr 2, 2020

Updates some stale comments.

Changes the spec for avro serdes so that the schema document is optional. This allows a reader to continue using a topic as the producer evolves their schema, without updating the schemas documents in lockstep.

Checklist

  • tests
  • updated CHANGELOG (the "unreleased" section)

@noisesmith noisesmith requested a review from a team as a code owner April 2, 2020 23:36
MicahElliott
MicahElliott previously approved these changes Apr 2, 2020
:opt-un [::schema
::schema-filename]
:req-un [::serde-keyword
::key?]))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe worth have the req-un before that opt-un. I feel I see that most commonly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@noisesmith noisesmith changed the title Ug 22 standalone deser standalone deser Apr 2, 2020
@noisesmith
Copy link
Contributor Author

I'm still working on the test coverage and CHANGELOG update

@creese
Copy link
Contributor

creese commented Apr 3, 2020

We may want to create a separate top-level function for when a deserializer-only serde is wanted. That feels a bit safer to me than relaxing the current spec.

@codecov
Copy link

codecov bot commented Apr 3, 2020

Codecov Report

Merging #239 into master will decrease coverage by 0.27%.
The diff coverage is 90.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #239      +/-   ##
==========================================
- Coverage   80.30%   80.02%   -0.28%     
==========================================
  Files          42       42              
  Lines        2660     2673      +13     
  Branches      153      153              
==========================================
+ Hits         2136     2139       +3     
- Misses        371      381      +10     
  Partials      153      153              
Impacted Files Coverage Δ
src/jackdaw/serdes/resolver.clj 90.62% <75.00%> (+0.30%) ⬆️
src/jackdaw/specs.clj 78.78% <88.88%> (+5.71%) ⬆️
src/jackdaw/serdes/avro.clj 89.07% <100.00%> (+0.12%) ⬆️
src/jackdaw/serdes/avro/confluent.clj 100.00% <100.00%> (ø)
src/jackdaw/serdes/edn2.clj 45.45% <0.00%> (-45.46%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8ae7dd6...6df5f40. Read the comment docs.

@noisesmith
Copy link
Contributor Author

We may want to create a separate top-level function for when a deserializer-only serde is wanted. That feels a bit safer to me than relaxing the current spec.

Thanks Charles, I think the current version of the PR does what you want.

@@ -18,3 +18,16 @@
{:avro/schema schema
:key? key?
:avro/coercion-cache coercion-cache})))

(defn reader-serde
"Creates a serde for Avro data with an optional schema.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't take a schema.

:req-un [::serde-keyword
::key?]
:opt-un [::schema
::schema-filename]))
Copy link
Contributor

@creese creese Apr 3, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

::key? is only needed for the Schema Registry. We can probably reuse :jackdaw.serde/serde defined above.
Let's modify the spec to take an optional key named ::reader-only?. When true, ::schema, ::schema-filename, and ::key? are not required. When false or missing, only ::key? and one of ::schema and ::schema-filename are required.

this looser spec (see :jackdaw.serde/confluent-avro-reader-serde and
:jackdaw.serde/confluent-avro-serde)."
[schema-registry-url schema key? & [opts-map]]
(serde schema-registry-url schema key? opts-map))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should call jsa/serde without a schema.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can set a value for :reader-only? so the user doesn't have to.

(throw (ex-info "Invalid serde config."
(s/explain-data :jackdaw.serde/confluent-avro-serde serde-config)))
(s/explain-data spec-key serde-config)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The serde-resolver requires a schema. Can we back out these changes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's only the serialization side that uses it, and the current kafka-productionised code looks up everything using the resolver

An alternate approach would be to allow the resolver (or something with a similar API) to return a wrapper with only the deserializer field.

The current approach is compatible with the existing code paths because the expected deserializer field is present and the resolver adds the expected conversions on read, I'm open to other options but would like to limit the scope of the API change. What about creating a wrapper where the deserializer works via the schema registry and the serializer simply throws an informative error?

@@ -54,12 +54,16 @@
(let [schema (cond
schema-filename (load-schema serde-config)
schema schema)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you plan to handle the case where schema and schema-filename are not supplied?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch - my test should have caught that but did not

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cond just works that way, but I added an explicit nil for clarity

(ins)user=> (cond true 1)
1
(ins)user=> (cond false 1)
nil

@cddr
Copy link
Contributor

cddr commented Apr 3, 2020

I thought that this PR #157 meant that schema was no longer required but looking back, I don't see a test that proves it to be fair.

@noisesmith
Copy link
Contributor Author

I thought that this PR #157 meant that schema was no longer required but looking back, I don't see a test that proves it to be fair.

I think this is true - but the spec used by the resolver would still fail, and I think I want that entry point.

;; by providing :serde/type of :read-only we can allow a serde without local schema
reader-avro-config {:serde-keyword :jackdaw.serdes.avro.confluent/serde
:key? false
:serde-type :read-only}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather have this work like :key?.

Example:

{:serde-keyword :jackdaw.serdes.avro.confluent/serde
 :key? false
 :read-only? true}

Or if :key? is not required:

{:serde-keyword :jackdaw.serdes.avro.confluent/serde
 :read-only? true}

@creese
Copy link
Contributor

creese commented Apr 6, 2020

What happened to jackdaw.serdes.avro.confluent/reader-schema?

@@ -39,6 +39,8 @@
Options:
schema-registry-url - The URL for the schema registry
type-registry - A mapping per jackdaw.serdes.avro/+base-schema-type-registry+>
read-only - Specifies that you will not be using the resulting serializer,
and does not require a schema or schema-filename
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What will happen if you try to write with a read-only avro serdes?
Would be nice to get a clean error message, as debugging avro error is a black art at the best of times!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added an informative message, and a test of that message.

@noisesmith
Copy link
Contributor Author

What happened to jackdaw.serdes.avro.confluent/reader-schema?

With a multi-spec a separate top level definition isn't needed - if the :serde-type keys is present with :read-only as the value, the read-only spec is used. This means that resolver stays simple - it doesn't need to know about any other specs, and the spec itself implements that conditional.

::key?]))

(s/def :jackdaw.serde/confluent-avro-serde
(s/multi-spec avro-serde-type :read-only?))
Copy link
Contributor

@creese creese Apr 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we extend the existing key group?

(s/def :jackdaw.serde/confluent-avro-serde
  (s/keys :req-un [::serde-keyword
                   ::key?
                   (or ::read-only? ::schema ::schema-filename)])

@@ -671,7 +674,8 @@
{:keys [avro/schema
avro/coercion-cache
key?
deserializer-properties]
deserializer-properties
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it pre-exists this pr, introduced in commt #8551c18c836b7b52e0a0b16b9b041d30b3eb4a0c

::read-only?)])
#(exactly-one-true? (string? (:schema %))
(string? (:schema-filename %))
(true? (:read-only? %)))))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we agreed just to check for the key's presence or absence.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I misunderstood - easy enough to fix

creese
creese previously approved these changes Apr 8, 2020
@noisesmith noisesmith merged commit 2dab283 into master Apr 8, 2020
@noisesmith noisesmith deleted the UG-22-standalone-deser branch April 8, 2020 17:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants