
Write large strings in bounded memory #30

Closed
chkno opened this issue Dec 3, 2020 · 3 comments

Comments

@chkno
Contributor

chkno commented Dec 3, 2020

jsonstreams is a big win over the built-in json for bounding memory usage when encoding JSON documents that are large because they contain many elements, but it doesn't help for JSON documents that are large because they contain one large element -- the current implementation requires that each element be entirely loaded into memory for encoding.
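Here is a quick standard-library illustration of that distinction. It is not jsonstreams code. `json.JSONEncoder.iterencode()` splits a list into many small chunks, but it returns a single large string as one chunk the size of the whole encoded value:

```python
import json

encoder = json.JSONEncoder()

# Many small elements: iterencode() yields many small chunks.
many = list(encoder.iterencode(["x"] * 1000))

# One large element: the whole encoded string comes back as a single chunk.
one = list(encoder.iterencode("x" * 1_000_000))

print(len(one), len(one[0]))  # 1 1000002
```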

I sketched a method of overcoming this limitation in this string-streams branch. The key thing there is the test_memory_usage test, which verifies that memory usage does not scale with element size. The changes currently in that branch to make that test pass are inelegant.
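The branch itself isn't reproduced here. Below is a hypothetical sketch of the general idea. It assumes the large string's text is available as an iterable of `str` pieces. The names `write_string_chunks`, `NullWriter` and `peak_bytes` are illustrative only and are not jsonstreams API. The memory check uses `tracemalloc` in the spirit of `test_memory_usage`, not that test's actual code:

```python
import io
import json
import tracemalloc
from typing import Iterable


def write_string_chunks(pieces: Iterable[str], fp: io.TextIOBase) -> None:
    """Write a single JSON string value whose text arrives as `pieces`.

    JSON escaping works character by character, so escaping each piece
    separately and concatenating gives the same text as escaping the whole
    string at once, while only one piece is held in memory.
    """
    fp.write('"')
    for piece in pieces:
        # json.dumps escapes the piece and adds surrounding quotes; drop them.
        fp.write(json.dumps(piece)[1:-1])
    fp.write('"')


class NullWriter(io.TextIOBase):
    """A sink that discards output, so only the writer's own memory is measured."""

    def write(self, s: str) -> int:
        return len(s)


def peak_bytes(total_chars: int, piece_size: int = 4096) -> int:
    """Peak traced allocation while writing a string of `total_chars` characters."""
    pieces = ("x" * piece_size for _ in range(total_chars // piece_size))
    tracemalloc.start()
    try:
        write_string_chunks(pieces, NullWriter())
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak
```

Comparing `peak_bytes(1_000_000)` with `peak_bytes(10_000_000)` should give peaks of the same order, a few pieces' worth, rather than peaks that grow tenfold with the string.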

Thoughts?

@dcbaker
Owner

dcbaker commented Dec 3, 2020

Hmmm. I'm just thinking out loud, but if jsonstreams used iterencode() instead of encode(), you could probably (at least for values) just use a custom JSONEncoder class that knows how to handle very large objects. If that works, it would be a more generic solution.
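Here is a rough sketch of that suggestion. It assumes a hypothetical `LargeString` wrapper, which is not part of jsonstreams or the standard library, for values whose text arrives in pieces. `StreamingEncoder` overrides `json.JSONEncoder.iterencode()` so that it emits such a value piece by piece, and it defers everything else to the base class. The sketch only covers top-level values, which matches the "at least for values" caveat. Handling a nested `LargeString` would mean reaching into the standard library's private recursive encoder:

```python
import io
import json
from typing import Iterable, Iterator


class LargeString:
    """Hypothetical wrapper: a JSON string value whose text arrives in pieces."""

    def __init__(self, pieces: Iterable[str]) -> None:
        self.pieces = pieces


class StreamingEncoder(json.JSONEncoder):
    """A JSONEncoder whose iterencode() streams a top-level LargeString.

    Anything else is delegated to the standard implementation. A LargeString
    nested inside a list or dict is NOT handled: the stdlib passes it to
    default(), which raises TypeError.
    """

    def iterencode(self, o, _one_shot=False) -> Iterator[str]:
        if isinstance(o, LargeString):
            yield '"'
            for piece in o.pieces:
                # Escape one piece at a time; strip the quotes json.dumps adds.
                yield json.dumps(piece, ensure_ascii=self.ensure_ascii)[1:-1]
            yield '"'
        else:
            yield from super().iterencode(o, _one_shot)


out = io.StringIO()
# json.dump() writes each chunk from iterencode() to the file as it is produced.
json.dump(LargeString(iter(["ab", '"c'])), out, cls=StreamingEncoder)
```

`json.dump()` writes chunks as they arrive, so a writer that consumed this encoder the same way would hold about one piece at a time. By contrast, `json.dumps()` and `encode()` join all chunks into one string, so they give no memory benefit.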

@chkno
Contributor Author

chkno commented Dec 3, 2020

Yeah, that sounds promising, as long as jsonstreams only holds a bounded number of iterencode() output chunks at a time (probably just one). This would mostly affect the pretty printer, which is the only part of jsonstreams that looks at the encoded data before writing it.
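One possible direction assumes that pretty-printing could be delegated to the standard library's `indent` support. This may not fit jsonstreams directly, since its pretty printer tracks its own nesting. With that assumption, `json.JSONEncoder(indent=...)` combined with `iterencode()` already yields small indented chunks, so a writer can hold one chunk at a time. `write_pretty` is a hypothetical name:

```python
import io
import json


def write_pretty(value, fp, indent: int = 2) -> int:
    """Pretty-print `value` to `fp`, writing each encoder chunk as it arrives.

    Returns the length of the largest single chunk, i.e. the most encoded
    text this loop holds at once.
    """
    largest = 0
    for chunk in json.JSONEncoder(indent=indent).iterencode(value):
        largest = max(largest, len(chunk))
        fp.write(chunk)
    return largest


data = {"items": list(range(10_000))}
out = io.StringIO()
largest = write_pretty(data, out)
```

This bounds memory for documents with many elements. A single huge string value would still arrive as one chunk, so on its own it does not cover the case raised in this issue.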

chkno mentioned this issue Dec 3, 2020
@chkno
Contributor Author

chkno commented Dec 4, 2020

Thanks for your help with this!

#32 serves my use case, so I'm going to close this now.

I feel kinda bad leaving the pretty-printing code still using encode() rather than iterencode(), and so not getting the memory-efficiency benefit. On the other hand, folks using pretty=True are probably using it with human consumption in mind, and so probably aren't using it on enormous elements that cause memory consumption problems.

Looking forward to the next release!

chkno closed this as completed Dec 4, 2020