Add support for user-specified mapping type [was: Parsing into OrderedDict] #7

Closed
jpmckinney opened this issue Sep 11, 2019 · 13 comments
Labels
enhancement New feature or request

Comments

@jpmckinney

I have code that re-orders JSON keys into a standardized order using OrderedDict.move_to_end. I want to use ijson to read the input iteratively. Presently, I think I would need to convert the dict that ijson returns into an OrderedDict, but my data has deep JSON objects, so this would be a fairly expensive operation. It would be faster to parse the data into an OrderedDict directly.

Is there an interest in adding this feature?
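For reference, the after-the-fact conversion described above could look roughly like the sketch below. The helper name `to_ordered` and the sample data are made up for illustration; the point is that every nested object has to be walked and copied a second time after parsing.

```python
from collections import OrderedDict


def to_ordered(value):
    """Recursively copy dicts into OrderedDicts, descending into lists too."""
    if isinstance(value, dict):
        return OrderedDict((k, to_ordered(v)) for k, v in value.items())
    if isinstance(value, list):
        return [to_ordered(v) for v in value]
    return value


# Hypothetical parsed item, standing in for what a parser would hand back.
item = {"b": {"y": 1, "x": [{"k": 2}]}, "a": 3}
ordered = to_ordered(item)

# move_to_end exists on OrderedDict but not on a plain dict.
ordered.move_to_end("b")
print(list(ordered))
```

Each level of nesting adds another full copy, which is the cost the request is trying to avoid.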

@rtobar

rtobar commented Sep 12, 2019

ijson.items (I'm assuming you are using this?) doesn't return a dict by default; it returns an iterator that you can use to navigate individual values. You should be able to use that to build your OrderedDict directly from the values coming out of the iteration, but without more details it is difficult to give more advice.

@jpmckinney
Author

Here is some sample code:

echo "{}{}" > test.json

import ijson.backends.yajl2_cffi as ijson

with open('test.json', 'rb') as f:
    for item in ijson.common.items(ijson.parse(f, multiple_values=True), ''):
        print(type(item))

Output is:

<class 'dict'>
<class 'dict'>

@rtobar

rtobar commented Sep 13, 2019

In your example you are selecting the top-level element, which is an object; thus you get dictionaries. Have you had a look at the examples in https://github.com/ICRAR/ijson/blob/master/README.rst? I think you are basically after the lower-level ijson.parse function, but again without realistic JSON content I can't say for sure.

@rtobar rtobar added the question Further information is requested label Sep 13, 2019
@jpmckinney
Author

I essentially want the option for this line to be map = OrderedDict() instead of map = {}: https://github.com/isagalaev/ijson/blob/e252a50db34b71cc2b5e0b9a77cd76dee8e95005/ijson/common.py#L116

I can re-implement ijson.common.items to use a new sub-class of ObjectBuilder that contains the change above, but that seems like a lot of effort to get a change in behaviour that is very commonly used in the standard library's json module.

@rtobar

rtobar commented Sep 13, 2019

There are a couple of gotchas with modifying the object builder directly:

  • It is used by most, but not all backends (the C back-end re-implements this logic in C for efficiency), so it's not exactly a one-liner.
  • It would affect all levels in your JSON structure, which is not necessarily what you want. As far as I understand it, you want to preserve the order at the highest level, not necessarily in the deeper objects.

So again, I think such a change would be a bit of overkill.

On the other hand, I think you basically want a modified version of this: isagalaev#62 (comment)

from collections import OrderedDict

import ijson
from ijson.common import ObjectBuilder

def objects(data):
    key = '-'
    builder = None
    for prefix, event, value in ijson.parse(data):
        if not prefix and event == 'map_key':
            if builder:
                yield key, builder.value
            key = value
            builder = ObjectBuilder()
        elif prefix.startswith(key):
            builder.event(event, value)
    if builder:
        yield key, builder.value

with open('json.json', 'rb') as data:
    result = OrderedDict(objects(data))
for key, value in result.items():
    print(key, value)

@jpmckinney
Author

I do want it to affect all levels :) (like with the object_pairs_hook in the standard library). Converting to an OrderedDict at the top level is easy – just OrderedDict(item) in my earlier snippet.
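As a point of comparison, here is a small sketch of the standard library behaviour referred to above. `json.loads` and its `object_pairs_hook` parameter are part of the standard `json` module; the sample document is invented for illustration.

```python
import json
from collections import OrderedDict

text = '{"b": {"y": 1, "x": 2}, "a": [{"k": 3}]}'

# The hook is called for every JSON object, at every depth.
doc = json.loads(text, object_pairs_hook=OrderedDict)

print(type(doc).__name__)
print(type(doc["b"]).__name__)
print(type(doc["a"][0]).__name__)
```

All three objects, including the one nested inside the array, are built with the hook rather than as plain dicts.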

@jpmckinney
Author

jpmckinney commented Sep 14, 2019

Right now, I need to do something like this (won't work with all backends):

from collections import OrderedDict

import ijson.backends.yajl2_cffi as ijson

# Copy of ijson.common.items, using different builder.
def items(prefixed_events, prefix):
    prefixed_events = iter(prefixed_events)
    try:
        while True:
            current, event, value = next(prefixed_events)
            if current == prefix:
                if event in ('start_map', 'start_array'):
                    builder = OrderedObjectBuilder()
                    end_event = event.replace('start', 'end')
                    while (current, event) != (prefix, end_event):
                        builder.event(event, value)
                        current, event, value = next(prefixed_events)
                    del builder.containers[:]
                    yield builder.value
                else:
                    yield value
    except StopIteration:
        pass


# Copy of ObjectBuilder, using OrderedDict instead of dict.
class OrderedObjectBuilder(ijson.common.ObjectBuilder):
    def event(self, event, value):
        if event == 'start_map':
            map = OrderedDict()
            self.containers[-1](map)

            def setter(value):
                map[self.key] = value
            self.containers.append(setter)
        else:
            super().event(event, value)

Later, my code calls items.

@rtobar rtobar added enhancement New feature or request and removed question Further information is requested labels Sep 15, 2019
@rtobar

rtobar commented Sep 15, 2019

@jpmckinney thanks for the pointer to the mechanism used by the standard lib, I actually didn't know about it. Such a generic solution sounds good, i.e., providing an object_pairs_hook argument or similar that can be used by the object builder. Do you think you could provide a PR with the change, including a test? Otherwise I could implement it when I get some time, probably next week. Note that the C backend will need this as well.
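To make the idea concrete, here is a minimal sketch of an event-driven builder that takes the mapping type as an argument. It is an illustration only, not ijson's actual ObjectBuilder: the class name `SketchBuilder`, its `map_type` parameter, and the hand-written event list are assumptions made for this example.

```python
from collections import OrderedDict


class SketchBuilder:
    """Illustrative builder; hypothetical, not ijson's ObjectBuilder."""

    def __init__(self, map_type=dict):
        self.map_type = map_type
        self.value = None
        self.key = None

        def initial_set(value):
            self.value = value

        # Stack of callables that store a value into the current container.
        self.containers = [initial_set]

    def event(self, event, value):
        if event == "map_key":
            self.key = value
        elif event == "start_map":
            mapping = self.map_type()  # the only place the type matters
            self.containers[-1](mapping)

            def setter(v, mapping=mapping):
                mapping[self.key] = v

            self.containers.append(setter)
        elif event == "start_array":
            array = []
            self.containers[-1](array)
            self.containers.append(array.append)
        elif event in ("end_map", "end_array"):
            self.containers.pop()
        else:
            self.containers[-1](value)


# Hand-written events for {"b": {"y": 1}, "a": [2]}.
events = [
    ("start_map", None), ("map_key", "b"),
    ("start_map", None), ("map_key", "y"), ("number", 1), ("end_map", None),
    ("map_key", "a"),
    ("start_array", None), ("number", 2), ("end_array", None),
    ("end_map", None),
]

builder = SketchBuilder(map_type=OrderedDict)
for name, val in events:
    builder.event(name, val)
print(builder.value)
```

Because the type is chosen at the single point where a map is created, the same choice applies to every nesting level without any post-processing.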

@rtobar

rtobar commented Sep 15, 2019

@jpmckinney while we are at it, an option for using something other than lists might also be worth considering.

@jpmckinney
Author

I'm not sure that I can get to it this week – and I'm not familiar with Python in C.

Using something other than lists sounds interesting; however, it hasn't come up as an option in the standard library. I think it's fine to start with object_pairs_hook.

The standard library does allow alternative constructors through parse_float, parse_int, parse_constant, and object_hook (which is lower priority than object_pairs_hook). Another optional parameter is cls, which allows even more customization of decoding. However, I think these can be implemented later if there is demand (for example, I prefer ijson's default behaviour of using Decimal instead of float).
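A short sketch of two of those standard library options, using only documented `json.loads` parameters. The sample inputs and the helper names `pairs_hook` and `plain_hook` are made up for illustration.

```python
import json
from decimal import Decimal

# parse_float swaps the constructor used for JSON floats.
doc = json.loads('{"price": 1.10, "count": 3}', parse_float=Decimal)
print(doc["price"], type(doc["count"]).__name__)

# When both hooks are given, only object_pairs_hook is called.
calls = []


def pairs_hook(pairs):
    calls.append("object_pairs_hook")
    return dict(pairs)


def plain_hook(obj):
    calls.append("object_hook")
    return obj


json.loads('{"a": 1}', object_hook=plain_hook, object_pairs_hook=pairs_hook)
print(calls)
```

With `parse_float=Decimal` the trailing zero in `1.10` survives, which is the same precision-preserving behaviour mentioned above as ijson's default.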

@rtobar

rtobar commented Sep 18, 2019

I already implemented a new map_type option; it's on the master branch. Could you give it a try?

@jpmckinney
Author

It works! Thanks

@rtobar

rtobar commented Sep 18, 2019

Great! I'll close this for now, and will also adjust the title for future reference.

@rtobar rtobar closed this as completed Sep 18, 2019
@rtobar rtobar changed the title Parsing into OrderedDict Add support for user-specified mapping type [was: Parsing into OrderedDict] Sep 18, 2019