Add Skipping #5
Comments
I think maybe you could just go the recursive route and add a parameter to the
I don't think it's possible to do it without either heap allocation or potential stack overflow, since msgpack forces a tree traversal without letting you know the size of the tree in advance. That being said, I'd really like this feature, and I think a configurable max-depth is the way to go, with a callback when the max-depth is reached (so the user can cleanly abort or advance the stream). A lot can be done to mitigate stack usage for larger values of In my case, I know the maximum size of the msgpack stream, so I can just advance the stream myself if max-depth is reached. However, in many cases I just want to skip an object and any sub-parts it may or may not have, and I know it won't be more than 5 layers deep or so. If this is an acceptable solution, I'd be happy to work on a PR. I just don't want to go down that long and painful road if you don't want it in the library.
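The max-depth-plus-callback idea can be sketched in a few dozen lines of C. This sketch is self-contained and does not use CMP. It walks a raw in-memory buffer and understands only a subset of MessagePack markers: fixint, fixmap, fixarray, fixstr, nil, booleans, uint8/16, str8, array16 and map16. The names `reader_t`, `skip_header`, `skip_limited` and `max_depth_cb` are hypothetical. Recursion is bounded by `max_depth`, so worst-case stack use is fixed in advance. When the limit is hit, the callback is invoked and the skip fails, which leaves recovery (aborting, or advancing the stream by a known message size) to the caller.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal cursor over an in-memory MessagePack buffer (illustrative only). */
typedef struct {
    const uint8_t *buf;
    size_t len;
    size_t pos;
} reader_t;

/* Hypothetical hook invoked when the nesting limit is reached. */
typedef void (*max_depth_cb)(reader_t *r, void *user);

static bool read_u8(reader_t *r, uint8_t *out) {
    if (r->pos >= r->len) return false;
    *out = r->buf[r->pos++];
    return true;
}

static bool read_be16(reader_t *r, uint32_t *out) {
    uint8_t hi, lo;
    if (!read_u8(r, &hi) || !read_u8(r, &lo)) return false;
    *out = ((uint32_t)hi << 8) | lo;
    return true;
}

static bool advance(reader_t *r, size_t n) {
    if (r->len - r->pos < n) return false;
    r->pos += n;
    return true;
}

/* Consumes one object's header and any scalar payload. For containers it
   reports how many child objects follow: n for arrays, 2n for maps. */
static bool skip_header(reader_t *r, uint32_t *children) {
    uint8_t m, n8;
    uint32_t n16;
    *children = 0;
    if (!read_u8(r, &m)) return false;
    if (m <= 0x7f || m >= 0xe0) return true;                       /* fixints */
    if (m <= 0x8f) { *children = 2u * (m & 0x0f); return true; }   /* fixmap */
    if (m <= 0x9f) { *children = m & 0x0f; return true; }          /* fixarray */
    if (m <= 0xbf) return advance(r, m & 0x1f);                    /* fixstr */
    switch (m) {
    case 0xc0: case 0xc2: case 0xc3: return true;                  /* nil, bools */
    case 0xcc: return advance(r, 1);                               /* uint8 */
    case 0xcd: return advance(r, 2);                               /* uint16 */
    case 0xd9: return read_u8(r, &n8) && advance(r, n8);           /* str8 */
    case 0xdc:                                                     /* array16 */
        if (!read_be16(r, &n16)) return false;
        *children = n16;
        return true;
    case 0xde:                                                     /* map16 */
        if (!read_be16(r, &n16)) return false;
        *children = 2u * n16;
        return true;
    default: return false;                          /* outside this subset */
    }
}

/* Skips one object, recursing into containers at most max_depth levels
   deep, so the recursion depth cannot be driven by hostile input. */
static bool skip_limited(reader_t *r, size_t depth, size_t max_depth,
                         max_depth_cb on_max_depth, void *user) {
    uint32_t children;
    if (!skip_header(r, &children)) return false;
    if (children == 0) return true;
    if (depth >= max_depth) {
        if (on_max_depth != NULL) on_max_depth(r, user);
        return false;
    }
    for (uint32_t i = 0; i < children; i++) {
        if (!skip_limited(r, depth + 1, max_depth, on_max_depth, user)) return false;
    }
    return true;
}
```

Whatever value is chosen for `max_depth` (5 in the case described above), the recursion depth stays bounded by it no matter what the input contains.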
So I could implement the If the "next" object (the object to skip) contains no other objects, Otherwise, This solves basically all the problems:
I also think this is in scope because you need to know the sizes of MP objects for maximum efficiency (from the I'll work this up and see if it's feasible. If it's not, then we're back where we started, but it seems promising.
For example, let's say we have something like this:
If I call So, if a user wanted to implement a recursive skip, they could do something like this:
If that's the case, then I think it should work. I just didn't like having to know the type beforehand if I just wanted to throw stuff away.
Yeah.
Yeah, but watch out for stack overflow! It's probably better to do a
I'm not super enthused about it either. Maybe I could add a
I suppose you could, but you'd have to keep track of your iteration index at each level so that when you descend, you don't lose your place in an outer array. I suppose this has a smaller impact than recursing, though. I'm guessing it'd look something like this (completely untested, but hopefully the intent is clear):
This has the problem that you burn a bunch of stack at the beginning, but your stack isn't going to grow. This gets more complicated if you need to do non-blocking I/O, in which case Supporting maps is a little more complicated, so
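A minimal sketch of the per-level bookkeeping described in this comment follows. It assumes a self-contained buffer reader rather than CMP's API; `reader_t`, `skip_header`, `skip_iterative` and `MAX_LEVELS` are hypothetical. Only fixint, fixmap, fixarray, fixstr, nil and booleans are recognized. The fixed `remaining` array is the stack burned up front. It holds how many objects are still to be skipped at each open level, so descending never loses the position in an outer array, and the C stack does not grow with nesting.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_LEVELS 8   /* hypothetical nesting limit; sizes the up-front array */

typedef struct {
    const uint8_t *buf;
    size_t len;
    size_t pos;
} reader_t;

/* Consumes one header plus any scalar payload; for containers reports the
   number of child objects (n for arrays, 2n for maps). Small subset only. */
static bool skip_header(reader_t *r, uint32_t *children) {
    *children = 0;
    if (r->pos >= r->len) return false;
    uint8_t m = r->buf[r->pos++];
    if (m <= 0x7f || m >= 0xe0) return true;                      /* fixints */
    if (m <= 0x8f) { *children = 2u * (m & 0x0f); return true; }  /* fixmap */
    if (m <= 0x9f) { *children = m & 0x0f; return true; }         /* fixarray */
    if (m <= 0xbf) {                                              /* fixstr */
        size_t n = m & 0x1f;
        if (r->len - r->pos < n) return false;
        r->pos += n;
        return true;
    }
    return m == 0xc0 || m == 0xc2 || m == 0xc3;                   /* nil, bools */
}

/* Skips one object without recursion. remaining[k] is the number of objects
   still to skip at nesting level k. */
static bool skip_iterative(reader_t *r) {
    uint32_t remaining[MAX_LEVELS];
    size_t level = 0;
    remaining[0] = 1;                       /* the object we were asked to skip */
    for (;;) {
        while (remaining[level] == 0) {     /* climb out of finished containers */
            if (level == 0) return true;
            level--;
        }
        uint32_t children;
        if (!skip_header(r, &children)) return false;
        remaining[level]--;
        if (children > 0) {
            if (level + 1 >= MAX_LEVELS) return false;   /* nested too deeply */
            remaining[++level] = children;
        }
    }
}
```

Here a map of n pairs is counted as 2n child objects. That is enough when the values are only being discarded. Code that needs to tell keys from values would need extra state per level.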
OK, I've implemented this in my local repo. If user code wants to handle arrays and maps, it will have to implement a state machine and should build in a recursion limit, but at least the pieces are now in CMP to make skipping possible. I've also added a small RPC example that uses skipping. Current tests pass, but I need to write a couple more for the new functionality (even though the RPC example does some testing itself). I still need to work on #15 and then write some tests for it too, but progress!
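If the only goal is to discard data, there is another option. It is a different technique from the per-level state machine mentioned in this comment: keep a single count of objects still outstanding. Reading an array of n elements adds n, reading a map of n pairs adds 2n, and every object read subtracts one. Nothing has to be remembered per level, so nesting depth stops mattering. The sketch below assumes a hypothetical self-contained buffer reader (`reader_t`, `skip_header`, `skip_counting`), not CMP's API, and handles the same small marker subset as a fixed-depth version would. Every MessagePack object occupies at least one byte, so a count larger than the bytes remaining can be rejected immediately. That check also keeps the counter from overflowing on hostile input.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const uint8_t *buf;
    size_t len;
    size_t pos;
} reader_t;

/* Consumes one header plus any scalar payload; for containers reports the
   number of child objects (n for arrays, 2n for maps). Small subset only. */
static bool skip_header(reader_t *r, uint32_t *children) {
    *children = 0;
    if (r->pos >= r->len) return false;
    uint8_t m = r->buf[r->pos++];
    if (m <= 0x7f || m >= 0xe0) return true;                      /* fixints */
    if (m <= 0x8f) { *children = 2u * (m & 0x0f); return true; }  /* fixmap */
    if (m <= 0x9f) { *children = m & 0x0f; return true; }         /* fixarray */
    if (m <= 0xbf) {                                              /* fixstr */
        size_t n = m & 0x1f;
        if (r->len - r->pos < n) return false;
        r->pos += n;
        return true;
    }
    return m == 0xc0 || m == 0xc2 || m == 0xc3;                   /* nil, bools */
}

/* Skips one object using a single counter instead of per-level state. */
static bool skip_counting(reader_t *r) {
    size_t remaining = 1;                  /* objects still to skip, at any depth */
    while (remaining > 0) {
        uint32_t children;
        if (!skip_header(r, &children)) return false;
        remaining--;
        /* Each object needs at least one byte, so owing more objects than
           there are bytes left means truncated or hostile input. */
        size_t left = r->len - r->pos;
        if (remaining > left || children > left - remaining) return false;
        remaining += children;
    }
    return true;
}
```

The trade-off is that this only works for throwing data away, because the loop never knows which container the current object belongs to.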
Awesome! I'm currently running this on a microcontroller and throwing away parts (manually) that I don't actually need. Once this is pushed, I'll play with it and see what the damage is in terms of additional design space. It'll definitely clean my code up a bit.
Interesting approach; it could be a good starting point if someone wants to implement skipping 👍
I really liked the idea of @clwi and implemented it. You can currently find it as a gist. Two things that should be thought about:
I'm happy to change my code and open a pull request, if you think it's worth it.
It would be nice to have a pull request opened for it; that would also make the code a bit easier to review :) I looked at your implementation and it looks nice, good job 👍! The only thing I would personally like to see is some error checking on the
This is done pending test coverage.
Skipping is reasonably well covered in the test suite now; closing.
So #3 proposed adding skipping to CMP, and there's some discussion there.
After trying to implement a version, it looks like the only way to fully implement this is by creating a SAX-style state machine.
"WHY!?", you might ask. Well, I'll tell you; hopefully I'm wrong.
It's not a backend-support problem. We can add an optional `skip` callback, and CMP can just set an error whenever `cmp_skip_object` is called on a context where that callback is `NULL`.
The problem is nested arrays and maps. The naive approach is to just have `cmp_skip_object` recursively call itself, but that leaves CMP open to stack overflow attacks via specially crafted data. I absolutely will not do that.
The alternative is to have a bunch of state in `cmp_ctx_t` itself and use the heap. There are a few downsides to this:
I vote against adding skipping to CMP. To use the example in #3 of an RPC server, let's say you're getting MessagePack data as a stream and that's your CMP backend (error handling omitted):
This is pretty simple, and the only difference if CMP added `cmp_skip_next_object` is that `skip_netstream_bytes(str_size)` would be replaced with `cmp_skip_next_object(&cmp)`.
The problem is that `cmp_skip_next_object` might have to skip a map containing 5000 other arrays, each containing 5000 arrays, each containing 5000 arrays that each contain 5000 entries of the Gettysburg Address. Skipping an unwanted string is much simpler than skipping the next object, whatever it might be. Furthermore, skipping is most easily handled by backend APIs designed specifically for that; CMP can add no value there. Therefore, I think adding skipping to CMP is out of scope.
That said, I'm always open to arguments! :) If this is a feature you really need and you've got a cool idea on how to do it, I'm absolutely happy to work on it (or, even better, merge a PR ;) ). I just think it's not feasible.