# How to perform a "streaming transform" on some JSON? #32

@zacharysyoung

I'm struggling to put together the concepts you've laid out well in the docs into a solution where I can stream in some JSON, update some values, and stream it out.

I have this toy example JSON:

{
    "0": {"foo": "bar"},
    "1": {"foo": "bar"},
    "2": {"foo": "bar"},
    "3": {"foo": "bar"},
    "4": {"foo": "bar"},
    "5": {"foo": "bar"},
    "6": {"foo": "bar"},
    "7": {"foo": "bar"},
    "8": {"foo": "bar"},
    "9": {"foo": "bar"}
}

where I want to update the value for every odd (int-ified) key to {"foo": "BAR"}:

{
    "0": {"foo": "bar"},
    "1": {"foo": "BAR"},
    "2": {"foo": "bar"},
    "3": {"foo": "BAR"},
    "4": {"foo": "bar"},
    "5": {"foo": "BAR"},
    "6": {"foo": "bar"},
    "7": {"foo": "BAR"},
    "8": {"foo": "bar"},
    "9": {"foo": "BAR"}
}
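For reference, the transformation itself can be pinned down with a non-streaming baseline that uses only the standard library. This is a minimal sketch: the helper name `update_odd_keys` is hypothetical, and it holds the whole document in memory, so it only defines the expected result and does not address the streaming requirement.

```python
import json


def update_odd_keys(data):
    """Return a new dict where every odd (int-ified) key maps to {"foo": "BAR"}."""
    return {
        key: {"foo": "BAR"} if int(key) % 2 == 1 else value
        for key, value in data.items()
    }


# Rebuild the toy input: keys "0" through "9", each mapping to {"foo": "bar"}.
toy_input = {str(i): {"foo": "bar"} for i in range(10)}
expected = update_odd_keys(toy_input)

print(json.dumps(expected, indent=4))
```

Any streaming solution should produce output that parses to the same value as `expected`.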

The only thing I've made work is:

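import json

import json_stream
from json_stream import streamable_dict


@streamable_dict
def update(data):
    # Yield (key, value) pairs one at a time so json.dump can stream them out.
    for key, value in data.items():
        if int(key) % 2 == 1:
            # Odd keys get the replacement value.
            value = {"foo": "BAR"}
        else:
            # Even keys are copied into a plain dict so json.dump can encode them.
            value = dict(value)

        yield key, value


with open("input.json") as f_in:
    data = json_stream.load(f_in, persistent=True)
    updated_data = update(data)
    with open("output.json", "w") as f_out:
        json.dump(updated_data, f_out, indent=1)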

But I have to use persistent=True to make that work, and that uses 2X the memory of the standard library's load and dump functions.

I've looked at the Encoding json-stream objects section, but cannot figure out what it'd take to make either json-stream's default function or JSONStreamEncoder class work for me. I've also tried to figure out if the visitor pattern is applicable.

Generally, my stumbling block seems to be getting the worker/processor in between json-stream's decoder and the standard library's encoder.
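The encoder half of that bridge can be illustrated with the standard library alone. The sketch below is not json-stream's API: `LazyDict`, `transform`, and `fake_decoder` are hypothetical names, and `fake_decoder` merely simulates a decoder that yields one key/value pair at a time. It relies on the behavior that `json.dump` uses the pure-Python encoder, which calls `.items()` on any `dict` instance, so a generator can sit between the source of pairs and the encoder.

```python
import io
import json


class LazyDict(dict):
    """Hypothetical dict subclass whose items come from a generator."""

    def __init__(self, pairs):
        super().__init__()
        self._pairs = pairs

    def __bool__(self):
        # The encoder writes "{}" for falsy dicts, so always report non-empty.
        return True

    def items(self):
        # Hand the encoder the generator instead of stored items.
        return self._pairs


def transform(pairs):
    """The worker: rewrite values for odd keys, pass the rest through."""
    for key, value in pairs:
        if int(key) % 2 == 1:
            value = {"foo": "BAR"}
        yield key, value


def fake_decoder():
    """Stand-in for a streaming decoder (assumption: one pair at a time)."""
    for i in range(10):
        yield str(i), {"foo": "bar"}


out = io.StringIO()
json.dump(LazyDict(transform(fake_decoder())), out, indent=1)
result = json.loads(out.getvalue())
```

Each pair is pulled from `fake_decoder`, transformed, and written before the next pair is requested, so only one value is held at a time on the encoding side.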

Do you have a concrete example of doing a "streaming transform" of some JSON?
