Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix dictionary encoding and decoding #234

Merged
merged 8 commits into from
Aug 14, 2023

Conversation

JamieMair
Copy link
Contributor

Fixes issues described in #233

Note that these would be breaking changes for any messages serialised which include dictionaries. Previously stored messages will fail when being deserialised. I am not 100% certain that these changes should be merged, but it is worth considering if the existing code did not match the existing specification. However, changing this may affect downstream users that are storing data which includes Dict types.

@codecov
Copy link

codecov bot commented Jul 18, 2023

Codecov Report

Patch coverage: 98.79% and project coverage change: +0.04% 🎉

Comparison is base (548017c) 92.39% compared to head (d780f7e) 92.43%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #234      +/-   ##
==========================================
+ Coverage   92.39%   92.43%   +0.04%     
==========================================
  Files          25       25              
  Lines        2815     2817       +2     
==========================================
+ Hits         2601     2604       +3     
+ Misses        214      213       -1     
Files Changed Coverage Δ
src/codec/encoded_size.jl 85.41% <94.73%> (+8.27%) ⬆️
src/codec/decode.jl 85.78% <100.00%> (-0.67%) ⬇️
src/codec/encode.jl 99.12% <100.00%> (ø)

... and 1 file with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Also fixed a typo in a comment
@Drvi
Copy link
Collaborator

Drvi commented Jul 18, 2023

Thank you @JamieMair, well spotted. This seems to indeed be a bug in ProtoBuf.jl, the current approach to encoding/decoding maps might be slightly more performant but, as you point out, the spec seems to be pretty adamant that maps should be as similar to repeated messages of pairs as possible (even though they are not practically compatible, repeated messages can have duplicated "keys" while maps presumably do not.)

As for this being a breaking change -- I think this qualifies as a bug and should be released as a patch version. I'll check downstream users on JuliaHub and open issues if necessary, giving them some time to adjust.

Can I trouble you for writing a tests/tests for repeated messages and maps being interchangeable?

@JamieMair
Copy link
Contributor Author

the current approach to encoding/decoding maps might be slightly more performant

I agree, this additional overhead seems a bit redundant, but at least one can use arrays if that is an issue.

Can I trouble you for writing a tests/tests for repeated messages and maps being interchangeable?

I can have a go when I next get some time to work on this. Just to check, would this involve something like

  1. Create a PairStruct similar to TestStruct with a key and a value
  2. Add the encoding/decoding methods
  3. Encode an array of these structs, and then check whether this is equivalent to encoding a Dict
  4. Also check that the array of these structs can be decoded into a Dict?

Is there something simpler I can do, or have I missed anything?

@Drvi
Copy link
Collaborator

Drvi commented Jul 18, 2023

I can have a go when I next get some time to work on this. Just to check, would this involve something like
Create a PairStruct similar to TestStruct with a key and a value
Add the encoding/decoding methods
Encode an array of these structs, and then check whether this is equivalent to encoding a Dict
Also check that the array of these structs can be decoded into a Dict?

Basically, yeah. I think you should be able to write a proto file with a PairStruct to get the methods generated for you, but the test would simply be a matter of encoding a Vector of PairStruct and successfully decoding it as a Dict and vice versa.

I'll try to review the the rest of the PR soon. Again, thanks for your help!

@JamieMair
Copy link
Contributor Author

Great! I'll get on that when I next have a little time. Thanks for the quick response!

@JamieMair
Copy link
Contributor Author

@Drvi I have just found some time to add in the unit tests you asked for. Feel free to make any changes you want. Thanks for the support on this PR.

@Drvi
Copy link
Collaborator

Drvi commented Aug 8, 2023

@JamieMair Thanks a for the test! It made me realize that we were adding a length for the map fields i.e. we weren't treating them like repeated fields of messages which don't use the packed representation either. Can you check my changes and try them in your application?

The JET failures on nightly are unrelated to this PR.

@JamieMair
Copy link
Contributor Author

@JamieMair Can you check my changes and try them in your application?

I have just tried to test it now in our application. I have a test which checks the bytes generated (see https://github.com/JamieMair/TensorBoardLogger.jl/blob/update-to-new-protobuf/test/test_hparams.jl) but it seems that the encoded length byte on the object is wrong by 2. The second byte of the array should be 91 but it is encoding it as 93. I have tried to investigate why this is an issue, but I can't find it. I think possibly there is a problem with _encoded_size being a bit too large for the Dict type. Maybe it is this line - https://github.com/JamieMair/ProtoBuf.jl/blob/a14c2e643f31bf4f0a5cf2475d37cbbdcf8c4745/src/codec/encoded_size.jl#L50 - but I don't know for sure.

@Drvi
Copy link
Collaborator

Drvi commented Aug 8, 2023

Ah, good catch @JamieMair I think I have a fix for that.

@Drvi
Copy link
Collaborator

Drvi commented Aug 8, 2023

@JamieMair Can you try again?

@JamieMair
Copy link
Contributor Author

@Drvi Yes, that's amazing. All works great on my end - I've removed our workaround. Thanks for finishing this one off!

@Drvi Drvi merged commit 851224a into JuliaIO:master Aug 14, 2023
7 of 10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants