Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Show a chunk of json when mapping the object fails #84

Merged
merged 10 commits into from
Apr 28, 2022

Conversation

dblevins
Copy link
Contributor

This appears in a stacktrace like the following

javax.json.bind.JsonbException: Can't map JSON Object to class java.lang.String: {"rendered":"Privacy Policy"}

	at org.apache.johnzon.jsonb.JohnzonJsonb.fromJson(JohnzonJsonb.java:89)

@rmannibucau
Copy link
Contributor

Can we try to not use toString which is slow but direct in mem value?

@dblevins
Copy link
Contributor Author

When you say direct in mem value, do you mean there is already the related chunk of json already available in memory somewhere so it can be displayed?

@rmannibucau
Copy link
Contributor

@dblevins guess we can passthrough the input (stream or reader or string) and if in memory we could just use it but I was more thinking about just giving some context iterating over the object and not serializing it as JSON but more a simple toString for a subpart, something in the spirit of var buffer = new StringBuilder(50); var it = object.entrySet().iterator(); while (buffer.length() < 50 && iterator.hasNext()) { if (buffer.length() > 0) { buffer.append(", "); } var e = iterator.next(); buffer.append(e.getKey()).append(": ").append(e.getValue()); }

It has the avantage to 1. not go through a real JSON serialization, 2. not serialize the full object when you only need one field (take the example of a json object of 1m with an url, here the url is sufficient to fill the 50 chars) so it will give some context to the error but also enables to keep performances fine when error are repeated (including attack cases ;)).

Hope it makes sense.

@dblevins
Copy link
Contributor Author

I like it! I'll properly open a JIRA for this and rework the PR.

@dblevins
Copy link
Contributor Author

All right, I've gone ahead and made this change and squash committed over the previous attempt.

Copy link
Contributor

@rmannibucau rmannibucau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I like it, just a few details to fix for me then we are good:

  1. Instead of object/array cast, jsonp provide jsonvalue.asJsonXxc methods
  2. Some formatting issues?
  3. Please ensure snippet is always closing the generator otherwise it can leak the stream (not a big deal) and memory buffer (big deal ;))

Btw I really like the mix of ideas you did ;)

@dblevins
Copy link
Contributor Author

I've updated the code to remove the casting. On formatting, I'd need some pointers on what should be updated as the checkstyle passed.

There is a call to generator.close() in the get() method in the Snippet class. If you're thinking of something else, let me know and I can adjust.

FYI, none of the existing toString() methods close their generators.

I can send a separate PR for that if you like.

@rmannibucau
Copy link
Contributor

rmannibucau commented Apr 21, 2022

@dblevins

  • about the formatting it is just a matter of not changing the indentation when no change were done (MappingParserImpl.java for ex), we don't have a strong enforcement but "common" PR rule to not change what we don't modify applies when possible (not saying we are super clean but having a style enforcer is worse for contributor from my experience on other projects so we didn't set it up)
  • on the new tests you should close the parser ;) (same, it leaks mem/buffers)

There is a call to generator.close() in the get() method in the Snippet class. If you're thinking of something else, let me know and I can adjust.

Sure, try/finally (or autoclose), no assumption it will be called...cause it can fail in between.
You should also not create the generator in the constructor (think primitive cases you hardcoded) but when hitting the root object probably.
Another related issue is, I think, we should reuse the json generator factory we already have with its config in the mapper (

protected final JsonGeneratorFactory generatorFactory;
) instead of using a banalized one (Json/JsonProviderImpl).

Long story short the idea would be something along this: https://gist.github.com/rmannibucau/bc400921f17dd12eb86491462071b0e9 .

@dblevins
Copy link
Contributor Author

Thanks for the clarification on the formatting -- I hadn't noticed that code was reformatted. Mixing reformatting and real changes is also on my no-no list :) I'll fix and again squash the commit so the reformatting is not in the history.

I'll fix the try/finally close and then take a closer look at the diff.

@dblevins
Copy link
Contributor Author

Ok, all that feedback should be addressed and I must say the result of having bounced this back and forth is really looking great!

Effectively, Snippet.of(object, max) is now new Snippet(max).of(object). You can also pass the JsonGeneratorFactory in as a constructor. All the state that was in the main Snippet class has been moved into an inner-class called Buffer that is closable. Since our design is essentially trying to manage memory I wanted to 1) give the dangerous part a fitting name and 2) ensure all that code is in a class no one can reference. The Buffer class is only ever created in a try-with-resources, so we have 100% certainty no leaks can possibly happen.

Thank you so very much for the code that handled the internal plumbing. That's the kind of code so painful to figure out when you don't know the codebase / hard to describe even when you do. I greatly appreciate the assist and made it clear in the commit message you are the real author.

Such a truly fun collaboration!!

@rmannibucau
Copy link
Contributor

Hi David,

Looked quickly but seems several changes are lost or unintended like the double call to generator.close, the snippet creation using Json and not the configured JsonGeneratorFactory, the parser close call miss in tests or the fix in test/truncation behavior to respect max config.
Is it intended?

@dblevins
Copy link
Contributor Author

You're right -- I missed the code for JohnzonBuilder and the wildcard builder. I went ahead and added it along with a test case to verify the johnzon.snippetMaxLength parameter works.

The supplied JsonGeneratorFactory is getting used. I suspect the feedback is for the second constructor that creates a JsonGeneratorFactory when one isn't available. I updated it to construct JsonGeneratorFactoryImpl directly and documented it is not the preferred constructor. Hopefully that addresses the feedback.

The close call was there due to buffering that would lead to an empty string. Those calls are documented as idempotent and safe to call more than once, so my thinking was "close is better than flush." I dug into that a bit more and found the buffering was causing more significant issues and hiding some bugs. The short version: the 64k buffer in JsonGenerator was causing the whole document to be serialized despite our isComplete() calls. The SnippetOutputStream would never see any bytes until the very end, so ultimately this code was no better than the toString() version I originally submitted.

I've significantly amped up the testing, reworked the code and all is looking good -- famous last words. Give it a good look and let me know if anything looks off.

@rmannibucau
Copy link
Contributor

Second constructor is only used in test this is why i would drop it from main code since we will never want to allow its usage.
The max setting is also wrong due to the usage we do I think (this is not output buffer size but token one).

For close we must find a way to call it once and it should be fine with autocloseable so I suspect buffer was buggy.

Last thing is the bug for primitive we should fix since truncation should be respected there too, no passthrough IMHO ;).

@dblevins
Copy link
Contributor Author

Can you check the code again? The tests don't use the second constructor and close is called only once.

I'm not convinced trimming things like null to things like n... which is the same number of characters is a good idea. Do you have a scenario in mind where a user would actually find that more useful than seeing null?

@rmannibucau
Copy link
Contributor

Ok so the second constructor is never useful since there is always a generator factory in scope so let's drop it.

About null truncation the question is "is a snippet of size < 4 useful" but it can be configured and if so respected (size shouldnt be > its max including the ellipsis if we want to document properly and simply this config IMO).

Copy link
Contributor

@rmannibucau rmannibucau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

two minor comments (mainly dropping the error prone constructor and flushes which are not needed) then looks perfect for me

*
* @param max the maximum length of the serialized json produced via of()
*/
public Snippet(final int max) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd really drop this one since minimum would be to require to copy the config (except the buffer size maybe) from the mapper one which is likely equivalent to use the mapper one as we do in the other constructor.
Also note that the buffer size is not important since before using any output we should flush it as any "output stream" so this one should drop IMHO.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you comment this constructor must not be used in johnzon project please but is only there for convenience for intergators and that using it disable several features of the project?

@dblevins
Copy link
Contributor Author

I'll review the comments in detail, but I'd like to offer a compromise. I'll make the change to "null", etc being truncated that I don't agree with in exchange for you allowing the Snippet(int) constructor and Snippet.of(value, int) static method to live. I use Johnzon in almost every project and I'd really like the ability to use this code elsewhere and have those methods.

@rmannibucau
Copy link
Contributor

@dblevins hmm, what I really dislike in this constructor is the fact I want to totally forbid it to be used inside the project itself for the reason mentionned earlier. I would be happy if you do a johnzon-snippet module exposing such a constructor and using Json.createGeneratorFactory(<....>) while it is not in mapper/jsonb modules. Alternative would be to ensure at build with a maven plugin this constructor is never used outside tests with a comment explaining this is not a reliable API for a generic purposes as the mapper/jsonb instance (agree it is simpler when you know you are UTF-8 for ex but it is generally not true from johnzon standpoint). Guess personally I would expose it in johnzon-json-extras a bit by default since it is already a generic module for such extensions not having other strong dependencies and it requires almost no work except having a new SnippetBis class delegating to Json and Snippet.
Hope it makes sense.

Not my preference, but implementing in the hope of compromise
Call flush only before snippet.terminate() and snippet.get()
Reduce calls to snippet.terminate() to only what is needed
Copy link
Contributor

@rmannibucau rmannibucau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few picky comments but the big one is the change from outpustream to writer which moves the code to use the default charset instead of reusing the one from the generator factory, was it intended? how is it supposed to work now/what am I missing?

*
* @since 1.2.17
*/
public interface Buffered {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

more a question than anything else: shouldnt it be Bufferable or something like that? (not blocking for me but asking before it gets merged)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My first instinct was Bufferable as well. What swayed me away is the "able" interfaces are all describing an action someone can take on the object and if they don't take that action, the thing described is not done. If you don't serialize a Serializable it isn't serialized. If you don't close a Closeable it isn't closed. Flushable objects of course do flush eventually, but the method it has still triggers an action. So calling it Bufferable felt a bit off as there's no action for the user to do. It's buffered whether or not the user does anything and calling bufferSize does nothing. Buffered is a statement of fact and bufferSize returns that fact.

All that said, I'm not super married to it. That's just the logic that got me there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, guess the same reasoning got me asking the question, while not handled by johnzon it is not buffered.
What if we quit that area and call it StreamDescriptor which would be more OO (i am an instance of stream descriptor so I can get meta about the stream/writer).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I personally like Buffered as it is telling Johnzon a fact, "I am buffered". There is a ByteArrayOutputStream inside the supplied SnippetWriter that Johnzon doesn't control. Johnzon can look at the writers and outputstreams it is given and use the interface to ask, "are you buffered?" and if they are Johnzon can optimize its own buffering to match their buffer size. So to me it makes sense as there's a conversation there:

johnzon: are you buffered?
writer: yes, I'm buffered.
johnzon: Ok, what's your buffer size?
writer: It's 50.
johnzon: great, I'll write to you 50 bytes at a time

That said, if you'd like to rename it after the merge I'm ok with that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I see where you coming from. I was more seeing it as:

johnzon: do you want to be buffered?
writer: yes, I need a custom buffer.
johnzon: Ok, what's your buffer requested size?
writer: It's 50.
johnzon: great, I'll write to you 50 bytes at a time

let me try to PR on top of your PR but this is not the critical point, was more curious ;)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. As I said to me it makes sense. We could easily get a Writer that writes to a file and has a buffer in it optimized for the page size of disk. This interface would be a great way for them to tell that fact to Johnzon so it can participate in the optimization. Ideally, whatever interface name use use it is strongly tied to buffering and not generic and therefore has many other purposes in the future. If we did that, then it would get complicated for someone to say, "Yes, I am buffered, but I am not those other things."

@@ -236,7 +238,8 @@ public Mapper build() {
supportEnumContainerDeserialization,
typeLoader, discriminatorMapper, discriminator,
deserializationPredicate, serializationPredicate,
enumConverterFactory),
enumConverterFactory,
new Snippet(snippetMaxLength, generatorFactory)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we must be able to run with any JSON-P impl so we likely need some indirection there (fine to not be optimum if we don't run with johnzon-core but a generic json-p impl)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this might have been a comment for the constructor as it referenced JsonGeneratorFactoryImpl. I updated it back to Json.createGeneratorFactory. If that wasn't what you meant, I sent an invite to the repo so you have access to do any adjustments you'd like.

*
* @param max the maximum length of the serialized json produced via of()
*/
public Snippet(final int max) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you comment this constructor must not be used in johnzon project please but is only there for convenience for intergators and that using it disable several features of the project?

public String of(final JsonValue value) {
try (final Buffer buffer = new Buffer()) {
buffer.write(value);
return buffer.get();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was expecting that moving get() after the auto-close try block would auto-flush it, isn't it right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to have the buffer instance constructed inside the try-with-resources as I like the assurance there can be 100% certain there can be no possible leak. If we don't like flush, we could call close which is documented to idempotent and safe to call more than once. Johnzon properly implements idempotent closes so it is safe. If a JSON-P impl doesn't properly implement close and fails, that'd be their bug IMO.

That said, I'm ok if you'd like to adjust it.

@dblevins
Copy link
Contributor Author

Ok, I've made most the changes. I'm happy for there to be other changes and have provided access to my fork so you have the option to do that before anything is merged. I offer that in the spirit of speed as I only have a few more days to do all the message work and haven't even started yet. Feel free to make any change you need to feel confident merging.

@rmannibucau
Copy link
Contributor

@dblevins added some code on top of yours at #88 (sorry for the second PR but github messed up when I PR-ed on your repo not sure why)

@dblevins
Copy link
Contributor Author

@rmannibucau happy for you to push directly to this repo so I can feel happy seeing a purple "merged" symbol on this PR.

@dblevins
Copy link
Contributor Author

@rmannibucau alternatively, you could merge it and then do a PR on top of master. Personally, I don't think anything in this PR is too dangerous to merge at this point.

@rmannibucau
Copy link
Contributor

ok let's do it

@rmannibucau rmannibucau merged commit 133273d into apache:master Apr 28, 2022
@dblevins
Copy link
Contributor Author

Who hooo!! :) That feels great! :)

@dblevins
Copy link
Contributor Author

Thank you, @rmannibucau !! I really enjoyed the help and collaboration.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants