support reading from/writing to JSON-P parser/generator #224
Hi @emattheis, both Yasson and Johnzon (https://github.com/apache/johnzon/blob/master/johnzon-jsonb/src/main/java/org/apache/johnzon/jsonb/api/experimental/JsonbExtension.java) can map a JsonValue to a POJO (and the opposite) through their JSON-B implementations. wdyt?
@rmannibucau For parsing use cases, getting
Yes and no. Yes, it is an option, but no, because nothing enforces it: JSON-B is in-memory by design, and the node would fit in memory anyway, since otherwise even the mapped POJO wouldn't fit.
@rmannibucau of course, the result of the JSON-B parsing is in memory, but the source certainly is not required to be, nor should it be! One of the great benefits of JSON-B is that one can selectively map the JSON structure into an object graph, as opposed to the JSON-P approach where your only option for structured data is to deserialize the entire target node into a
The use case where the POJO takes less memory than the JsonObject is precisely what I'm interested in solving. In my experience, it is common for a JSON API to return more data than I need, often including unbounded structures that I cannot safely consume. A practical example is a search result. The ability to traverse a structure with JSON-P and selectively consume nodes with JSON-B, while avoiding unmapped structures, is useful. Several popular JSON frameworks (including the JSON-B reference implementation) already include this functionality. Excluding it from the formal specification is a barrier to adoption for those of us who already use this pattern.
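A minimal sketch of the contrast drawn above, assuming the Jakarta JSON-P and JSON-B APIs (older releases use the javax.json packages) with implementations such as Parsson and Yasson on the classpath. `SelectiveMapping` and its `Summary` POJO are hypothetical names. The JSON-P object model builds the whole document, while JSON-B ignores properties the POJO does not declare, so only the mapped values end up in the object graph. Whether an implementation avoids building intermediate structures internally is an implementation detail this sketch does not show.

```java
import jakarta.json.Json;
import jakarta.json.JsonObject;
import jakarta.json.JsonReader;
import jakarta.json.bind.Jsonb;
import java.io.StringReader;

public class SelectiveMapping {

    // Hypothetical POJO that declares only the fields the caller cares about.
    public static class Summary {
        public int total;
        public String query;
    }

    // JSON-P object model: the entire document, including any large or
    // unbounded members, is built as a JsonObject.
    public static JsonObject readTree(String json) {
        try (JsonReader reader = Json.createReader(new StringReader(json))) {
            return reader.readObject();
        }
    }

    // JSON-B: properties without a matching field are ignored by default,
    // so the resulting object graph holds only the mapped values.
    public static Summary readSummary(String json, Jsonb jsonb) {
        return jsonb.fromJson(json, Summary.class);
    }
}
```

For an input such as a search result with a large `facets` member, `readTree` keeps `facets`, while `readSummary` yields a `Summary` holding only `total` and `query`.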
Well, my point is that you can already do it in multiple ways:
Personally, I have used option 3 for a few years and it is quite trivial. So the newly proposed API looks more like it breaks the loose coupling between JSON-B and parser (or generator) handling than like a feature. IMHO the link should stay limited to the loaded objects for the mapping plus the JSON-P provider for parser/reader/generator/writer lookups, and nothing more; otherwise you start constraining the implementation with no real feature gain. Finally, as mentioned, the gain exists in only one case if you don't go through option 3, and not at all in cases where you use deserializers or any other way to forward unmapped values to the mapped bean, which is another common case. To summarize, this feature is already there, and IMHO it would break more things than it would bring.
@rmannibucau can you elaborate, or point me to examples of the existing solutions you propose? I'm totally on board with using existing solutions within JSON-P and JSON-B; I just don't see how you can actually do what I need. Here's an example of what I'm trying to accomplish: https://gist.github.com/emattheis/20910c7034753e443852d924289a8954. Obviously, in a real-world scenario the JSON source would be coming from an actual streaming source and have many more records. The key to the pattern is being able to process the overall document in a streaming fashion and still be able to pull out elements of the array as POJOs without pulling the JSON representation into memory. As I was throwing together the gist, I could see more clearly what you are saying about positioning the JsonParser correctly. Ideally, the parser would provide a way to peek at the upcoming token so you don't have to rely on exception handling to break the loop.
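A sketch of the pattern described here (not the code from the linked gist), assuming jakarta.json and jakarta.json.bind with implementations such as Parsson and Yasson. The hypothetical `PojoIterator` reads one element ahead with `JsonParser.next()`, so reaching `END_ARRAY` ends the iteration cleanly and no exception is needed to detect the end of the array. Because standard JSON-B cannot read from a `JsonParser`, each element still passes through an intermediate `JsonValue` and its JSON text before `Jsonb.fromJson(String, Class)` maps it; that per-element round trip is the overhead this issue is about. `Hit` is a made-up POJO, and the input is assumed to be a top-level array; for a nested array, advance the parser to that array's `START_ARRAY` first.

```java
import jakarta.json.Json;
import jakarta.json.JsonValue;
import jakarta.json.bind.Jsonb;
import jakarta.json.stream.JsonParser;
import jakarta.json.stream.JsonParser.Event;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class StreamingHits {

    // Hypothetical POJO; public fields plus the implicit public no-arg
    // constructor are enough for default JSON-B mapping.
    public static class Hit {
        public String name;
        public int score;
    }

    // Lazily maps the elements of the array the parser is positioned in.
    // Precondition: the parser's current event is START_ARRAY.
    public static class PojoIterator<T> implements Iterator<T> {
        private final JsonParser parser;
        private final Jsonb jsonb;
        private final Class<T> type;
        private JsonValue pending; // next element, read ahead but not yet mapped
        private boolean done;

        public PojoIterator(JsonParser parser, Jsonb jsonb, Class<T> type) {
            this.parser = parser;
            this.jsonb = jsonb;
            this.type = type;
        }

        @Override
        public boolean hasNext() {
            if (pending == null && !done) {
                // END_ARRAY terminates the iteration; no exception handling needed.
                if (parser.next() == Event.END_ARRAY) {
                    done = true;
                } else {
                    // Reads exactly one element and leaves the parser at its end.
                    pending = parser.getValue();
                }
            }
            return pending != null;
        }

        @Override
        public T next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            JsonValue element = pending;
            pending = null;
            // Standard JSON-B has no JsonParser/JsonValue input, hence the text round trip.
            return jsonb.fromJson(element.toString(), type);
        }
    }

    // Reads a top-level JSON array of hits one element at a time.
    public static List<Hit> readAll(String json, Jsonb jsonb) {
        List<Hit> out = new ArrayList<>();
        try (JsonParser parser = Json.createParser(new StringReader(json))) {
            if (parser.next() != Event.START_ARRAY) {
                throw new IllegalArgumentException("expected a JSON array");
            }
            new PojoIterator<>(parser, jsonb, Hit.class).forEachRemaining(out::add);
        }
        return out;
    }
}
```

Collecting into a list in `readAll` only keeps the example easy to check; a real consumer would handle each `Hit` as `next()` returns it, so only one element is held at a time.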
Hello everyone
Hello @emattheis, I'm a bit short on time this week (changing jobs), but try creating a JsonProvider that delegates to JsonProvider.provider() and wraps JsonParser creation (again with a delegate pattern). That is enough to filter out the nodes you don't want, in either blacklist or whitelist mode, which covers both use cases we talked about (yours, which sounds like a whitelist, and mine, which is more of a blacklist to keep setUnknownProperty working, as for OpenAPI).
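A simplified sketch of the filtering idea at the parser level, assuming jakarta.json and an implementation such as Parsson. Rather than a full delegating `JsonProvider` (which would have to wrap every `JsonParser` method), the hypothetical `ParserFilter.filter` walks the events directly and rebuilds only the members whose key passes a `Predicate<String>`; rejected members are stepped over event by event, so no `JsonObject` or `JsonArray` is built for them. A negated predicate gives blacklist mode and `Set.of(...)::contains` gives whitelist mode. The predicate applies at every nesting level (so a whitelist must also name nested keys), and arrays are kept or dropped as a whole.

```java
import jakarta.json.Json;
import jakarta.json.JsonObject;
import jakarta.json.JsonObjectBuilder;
import jakarta.json.stream.JsonParser;
import jakarta.json.stream.JsonParser.Event;
import java.io.StringReader;
import java.util.function.Predicate;

public class ParserFilter {

    // Steps past the value that began with `first` without building it.
    static void skipValue(JsonParser p, Event first) {
        if (first != Event.START_OBJECT && first != Event.START_ARRAY) {
            return; // a scalar is fully consumed by the next() that returned it
        }
        int depth = 1;
        while (depth > 0) {
            Event e = p.next();
            if (e == Event.START_OBJECT || e == Event.START_ARRAY) {
                depth++;
            } else if (e == Event.END_OBJECT || e == Event.END_ARRAY) {
                depth--;
            }
        }
    }

    // Precondition: the parser's current event is START_OBJECT.
    static JsonObject filterObject(JsonParser p, Predicate<String> keep) {
        JsonObjectBuilder builder = Json.createObjectBuilder();
        while (p.next() != Event.END_OBJECT) {
            String key = p.getString(); // inside an object, each member starts with KEY_NAME
            Event value = p.next();
            if (!keep.test(key)) {
                skipValue(p, value);
            } else if (value == Event.START_OBJECT) {
                builder.add(key, filterObject(p, keep)); // filter nested members too
            } else {
                builder.add(key, p.getValue()); // arrays and scalars are kept whole
            }
        }
        return builder.build();
    }

    public static JsonObject filter(String json, Predicate<String> keep) {
        try (JsonParser p = Json.createParser(new StringReader(json))) {
            if (p.next() != Event.START_OBJECT) {
                throw new IllegalArgumentException("expected a JSON object");
            }
            return filterObject(p, keep);
        }
    }
}
```

The reduced `JsonObject` can then be handed to JSON-B (for example via its JSON text), which is roughly the shape of the workaround being debated in this thread.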
While I agree that Jackson is not a great reference for the standard in general due to its sprawling feature set, it does serve as a good example for this use case. Jackson is the de facto standard JSON library in many code bases, and it would be beneficial to depend on a lighter API. Interoperability between streaming and POJO mapping is a common theme among popular JSON libraries, so I still feel it deserves attention in the JSON-B and JSON-P specs. @rmannibucau If I understand correctly, you're suggesting wrapping a
@emattheis it is at the parser level rather than the JsonStructure level, so you don't have unnecessary allocations. Also note that in real life it is quite trivial to implement, because the mapping is known, and in the very worst case it requires a linked list handling around the parser. Finally, a plain deserializer already does exactly that, even if not in JSON-B. Edit: also note that the generator side does not bring anything, so the API wouldn't be as symmetric as the others.
My goal is to produce a POJO. You seem to be suggesting that I rely on the forthcoming support for producing a POJO from a JsonStructure, in which case the JsonStructure is an intermediate representation that is thrown away.
It is certainly easy enough when you are simply trying to ignore/include top-level nodes from a structure, but I would argue that it is non-trivial when you wish to extract nodes at arbitrary depths within a nested structure.
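A sketch of extraction at an arbitrary depth using only the JSON-P streaming API, assuming jakarta.json and an implementation such as Parsson. The hypothetical `PathExtractor.forEachAt` follows a non-empty path of object keys (it does not descend into arrays along the way), steps over every sibling subtree without building it, and hands each element of the array found at the end of the path to a `Consumer` as soon as it is read. Each element is still materialized as a `JsonValue`, which is the point where a JSON-B mapping step would plug in.

```java
import jakarta.json.Json;
import jakarta.json.JsonValue;
import jakarta.json.stream.JsonParser;
import jakarta.json.stream.JsonParser.Event;
import java.io.StringReader;
import java.util.List;
import java.util.function.Consumer;

public class PathExtractor {

    // Steps past the value that began with `first` without building it.
    static void skipValue(JsonParser p, Event first) {
        if (first != Event.START_OBJECT && first != Event.START_ARRAY) {
            return; // a scalar is fully consumed by the next() that returned it
        }
        int depth = 1;
        while (depth > 0) {
            Event e = p.next();
            if (e == Event.START_OBJECT || e == Event.START_ARRAY) {
                depth++;
            } else if (e == Event.END_OBJECT || e == Event.END_ARRAY) {
                depth--;
            }
        }
    }

    // Precondition: the current event is the START_OBJECT of the object at `depth`.
    static void descend(JsonParser p, List<String> path, int depth, Consumer<JsonValue> sink) {
        while (p.next() != Event.END_OBJECT) {
            String key = p.getString();
            Event value = p.next();
            boolean onPath = key.equals(path.get(depth));
            boolean last = depth == path.size() - 1;
            if (onPath && last && value == Event.START_ARRAY) {
                // Emit elements one at a time; END_ARRAY ends the loop.
                while (p.next() != Event.END_ARRAY) {
                    sink.accept(p.getValue());
                }
            } else if (onPath && !last && value == Event.START_OBJECT) {
                descend(p, path, depth + 1, sink);
            } else {
                skipValue(p, value);
            }
        }
    }

    // `path` must be non-empty, e.g. List.of("data", "search", "hits").
    public static void forEachAt(String json, List<String> path, Consumer<JsonValue> sink) {
        try (JsonParser p = Json.createParser(new StringReader(json))) {
            if (p.next() != Event.START_OBJECT) {
                throw new IllegalArgumentException("expected a JSON object");
            }
            descend(p, path, 0, sink);
        }
    }
}
```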
I'm not sure what you mean by "not in jsonb", but the fact that @rmannibucau if your position is that this is not a feature that should be in JSON-B, that's fine, but if you are suggesting it can already be done with the existing spec, please provide a concrete example when you have a chance. This is a solved problem in the broader Java/JSON space. This issue is about making it part of the JSON-B standard.
Did you try using JsonbConfig#withDeserializers?
How do you suppose that will help solve my use case when there is no way to pass a I suppose another option to my proposal would be to expose serializers/deserializers for a given type and let callers use them directly, but that seems silly to me.
It will be used when you pass an InputStream or a Writer to JSON-B, which will then create a parser from it. That is exactly what you ask for (in your use case you don't actually care about the parser).
Please provide a concrete example. The gist I provided is a trivialized example, but it is not clear how you are suggesting to solve it. What type would I associate with the deserializer and use in the call to Bear in mind, I do not want to consume the stream in a single pass; I want to extract POJOs in a streaming fashion in general. My trivial example is NOT the only use case this issue aims to address. The assumption is that the caller does have a need to directly control the
This is exactly what deserializers are about. Registered in the config, they are used automatically and give you all the parser-to-POJO mapping features.
@rmannibucau I understand how serializers and deserializers work. Obviously it is trivial to obtain an entire collection at once. That is not the use case I am trying to solve. If you look at the pattern in the gist I linked to, you will see what I'm trying to accomplish. The example you link to is not applicable as it simply consumes everything into a The key issues with this approach are:
The use cases I need to solve involve collections that cannot be assumed to fit in memory, and the need to process elements as they arrive, e.g. a continuous stream of JSON messages.
No:
FYI, it has been tested with up to 10 GB of data (files) forwarded through a WebSocket (nested objects within nested objects as messages) in a JVM with 4 GB of heap, so I'm pretty sure it already works ;)
I'm sorry, but I do not understand what you are saying here. A code sample would be extremely helpful.
This is only true if you are asking Jsonb to deserialize the entire list, which I am specifically aiming to avoid (again, please refer to the gist I provided).
Please show the code that achieves this.
@emattheis not sure I got 2 right, but there are two options to deserialize: either align on the incoming stream or align on the model. Both are good or bad depending on the input/output (if the payload has 10 attributes and the mapping 2, aligning on the mapping is better, but if it is the opposite it is better to reverse the pattern). JSON-B can't know that, so this will stay an implementation detail, and you may end up needing a custom filtering JsonProvider to achieve what you ask portably and reliably. That said, here is how to use a parser plus JSON-B mapping today:
Okay, I think I understand what you're getting at, and I created a revised gist exploiting a deserializer as you suggest: https://gist.github.com/emattheis/6f92ab0d04584273411a7330e32c0ea8. So I must concede that it is possible to achieve streaming POJO mapping with JSON-P and JSON-B using the current APIs. That being said, I think this illustrates the point of this issue quite clearly. Hacking around the intended use of I think we'll need to agree to disagree on this and see if this issue attracts some other opinions. Maybe @m0mus or @njr-11 can weigh in?
You don't need a ThreadLocal if the instance is passed through the JsonbConfig, or you can just use CDI; I only used it in the standalone sample. Also note that this is not a workaround but the intended use of a deserializer: if you don't want that level of control, you use an adapter. So this is the built-in usage. Now, you are right that JSON-B is about in-memory usage, not streaming, by design (even with your API), so maybe you should use the javax streaming API directly and map it as intended with sub-events? That sounds cleaner to me, whatever API ends up in JSON-B.
Producing a stream of POJOs as a side effect, and then returning
This is exactly why I raised this issue! There is no good reason why future versions of JSON-P and JSON-B couldn't work together with less friction, as demonstrated by the many existing JSON libraries that already do this, including the JSON-B reference implementation. I am not looking for a novel approach to meet my requirements. I already have a working solution in place using Jackson, but I would prefer to be able to rely on standard APIs.
Fact is, you have tons of options already.
Per my many comments above, using
I think it's a stretch to assume that such overhead will be negligible. Certainly the intermediate object overhead is not a major concern assuming the
No, because it means I would need to add significant code to my project to filter arbitrary JSON structures to match my intended POJO model. This functionality is provided by Jackson today, and COULD be easily provided by JSON-B.
With due respect, I am not looking for help solving my application challenges: they are ALREADY solved by Jackson. I would like to be able to use JSON-B instead, but not at the expense of the maintainability of my codebase. Let's just let this issue rest until some more voices can be heard.
Your JSON-B proposal does not solve your need, and it assumes you already handle the parsing yourself. In real code I fear it would not be as smooth as you expect, since you still need to drop the undesired (big) elements.
That is exactly what does not work with your current proposal, IMHO. I assume it works if you pre-handle the parser state, which also means (since JSON ordering is rarely guaranteed) that you would need to wrap the parser as I originally proposed to ensure you skip the keys/values you don't want. JSON-B can't help much there.
Hmm, we are talking about one line; I'm not sure what you have in mind, but reading this I fear you just want to migrate some Jackson code without trying alternatives.
Honestly, I have used all the proposed solutions in real-time and batch apps without memory issues. Code maintenance is a non-issue: OK, I use Lombok, so any delegate pattern is smooth, but even without it, it is code without any cleverness, so the maintenance cost is zero. So I'm not sure how to interpret that, except that you should probably not migrate the app? 🤔
There is clearly a communication issue here. I am not proposing anything other than a feature that already exists in all popular JSON libraries (including the JSON-B reference implementation), namely the ability to use a streaming parser and delegate to a POJO mapper. In real code it is precisely as smooth as I think. The streaming part obviously must handle walking the parse tree to the correct location, but thereafter it can delegate to the POJO mapper and get all the benefits of that library to further extract exactly as much of the tree as desired. I'm not asking for JSON-B to do anything more than it already does with respect to deserializing POJOs; I simply want it to accept an arbitrary
It does not let you avoid materializing the skipped entries; JSON-P does not allow that today, so you can still consume far too much memory. This is partly why this proposal looks closer to a workaround to me. Edit: I created two related issues in JSON-P; let's see how they are received, but I think they are core to this issue, and that JSON-B must rely on JSON-P first of all to solve it properly.
Any efficiency shortcomings in If |
Thanks for raising this enhancement request, @emattheis, but I'm going to close it as a duplicate of #122, which covers the same functionality you are requesting. (#122 is also on the roadmap for the next version of JSON-B.)
For efficient handling of large data sets, it is desirable to mix the JSON-P and JSON-B APIs in the same way that one can use the StAX and JAXB APIs for XML processing. If the Jsonb interface were extended with fromJson and toJson overloads for JsonParser and JsonGenerator, it would be possible to selectively deserialize/serialize JSON from/to arbitrary points in a larger stream.
It looks like Yasson already implements this functionality, and it would be great to see it become part of the formal spec.
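To make the proposal concrete, here is a hypothetical `ParserAwareJsonb` wrapper with the overload shapes described above, backed only by standard JSON-P and JSON-B calls (assuming jakarta.json and jakarta.json.bind with implementations such as Parsson and Yasson). It sketches the desired call sites; it is not a released API. Because it goes through `JsonParser.getValue()` and JSON text, it still builds the intermediate structure, whereas a native implementation could bind straight from the parser. The read side maps the value at the parser's current event and leaves the parser at the end of that value; the write side as written only suits array or root context, since `JsonGenerator.write(JsonValue)` takes no member name.

```java
import jakarta.json.Json;
import jakarta.json.JsonReader;
import jakarta.json.JsonValue;
import jakarta.json.bind.Jsonb;
import jakarta.json.stream.JsonGenerator;
import jakarta.json.stream.JsonParser;
import java.io.StringReader;
import java.lang.reflect.Type;

// Hypothetical wrapper: the method shapes mirror the proposal, not a released JSON-B API.
public final class ParserAwareJsonb {
    private final Jsonb jsonb;

    public ParserAwareJsonb(Jsonb jsonb) {
        this.jsonb = jsonb;
    }

    // Binds the value at the parser's current event (START_OBJECT, START_ARRAY
    // or a scalar) and leaves the parser on the last event of that value.
    public <T> T fromJson(JsonParser parser, Type type) {
        JsonValue value = parser.getValue(); // materializes only this value
        return jsonb.fromJson(value.toString(), type);
    }

    // Writes obj as the next value in an array (or as the root value).
    public void toJson(Object obj, JsonGenerator generator) {
        try (JsonReader reader = Json.createReader(new StringReader(jsonb.toJson(obj)))) {
            generator.write(reader.readValue());
        }
    }
}
```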