
Doesn't work with large JSON files #11

Open
colegleason opened this issue Jul 16, 2014 · 8 comments

@colegleason

Trying to compare two files of size 701 MB results in:

FATAL ERROR: CALL_AND_RETRY_0 Allocation failed - process out of memory
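
A common mitigation, though not a fix for the underlying algorithm, is to raise Node's V8 heap limit with the documented --max-old-space-size flag. This assumes json-diff is installed locally via npm (so its CLI shim lives in node_modules/.bin), and the file names here are placeholders:

node --max-old-space-size=8192 node_modules/.bin/json-diff a.json b.json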
@wanderer

Also, it's really slow for large files.

@andreyvit
Owner

Patches are welcome! ;-)

@miranda-zhang

My file is only 854K, but it seems to run forever; I'm going to leave it overnight to see if it finishes...

@mistertest

I think it works better in terminal mode than the JS implementation in the browser, but I don't have large enough files to confirm.

@idanElitzur

Any news regarding this?
I'm facing the same issue right now with two files of 20-30 MB each (almost 1M lines).

@philipborg

philipborg commented Jul 10, 2023

I don't get an out-of-memory exception with my 60 MB JSON files; it just never finishes, and my job agent kills it after an hour. I left it running longer than that locally, but after most of a day I called it quits. Performance seems to degrade much worse than linearly with file size: by comparison, my 7 MB files complete "nicely" after about 150 seconds. It would be nice if this ran in time linear in file size.
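
To get a feel for how the runtime scales, a rough timing sketch along these lines may help. It assumes the json-diff package's diff() export, and the synthetic data shape is made up purely for illustration:

import { diff } from "json-diff";

// Build two arrays of n objects where a small fraction of values differ.
function makeInputs(n: number): [object[], object[]] {
  const a = Array.from({ length: n }, (_, i) => ({ id: i, value: "foo" }));
  const b = Array.from({ length: n }, (_, i) => ({ id: i, value: i % 100 === 0 ? "bar" : "foo" }));
  return [a, b];
}

// Time diff() at doubling sizes: roughly 2x per step would mean linear scaling,
// while roughly 4x per step suggests quadratic array comparison.
for (const n of [1000, 2000, 4000]) {
  const [a, b] = makeInputs(n);
  const t0 = Date.now();
  diff(a, b);
  console.log(`n=${n}: ${Date.now() - t0} ms`);
}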

@mrmianbao

@philipborg I had the same issue... it was working before I made changes to my tsconfig.

It just hangs.

@andreyvit I forget exactly what changes I made, but it was something like changing my tsconfig to target: "esnext".

@philipborg

philipborg commented Nov 13, 2023

I have found a workaround for my needs that I thought I'd share. It seems that comparing large arrays of objects is the issue. Replacing the array with an object, so it only compares key matches, solved it for me. Now it goes decently fast even with 60 MiB files. My objects themselves contain arrays of strings, which doesn't seem to cause any problems.

So instead of

[
    {
        "field1": 2,
        "field2": "foo",
        [...]
    },
    {
        "field1": 1337,
        "field2": "bar",
        [...]
    }
]

I used something more akin to

{
    "MyIdentifier1": {
        "field1": 2,
        "field2": "foo",
        [...]
    },
    "MyIdentifier2": {
        "field1": 1337,
        "field2": "bar",
        [...]
    }
}

This only works if you can find a unique identifier in the data to use as the object key. In my case I had to build a composite key and escape my separators. It coarsens the comparison and requires you to mutate the JSON before comparing, so it's a workaround rather than a fix for this issue.

Essentially it restricts the search scope for what it will compare objects with: it now only compares objects whose keys match (see the sketch below).
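
A minimal sketch of that array-to-object transformation in TypeScript, assuming you choose the key fields yourself (keyByComposite and the field names are hypothetical, not part of json-diff):

type Item = Record<string, unknown>;

// Turn an array of objects into an object keyed by a composite identifier,
// escaping the separator inside each key part so composite keys stay unambiguous.
function keyByComposite(items: Item[], keyFields: string[], sep = "|"): Record<string, Item> {
  const out: Record<string, Item> = {};
  for (const item of items) {
    const key = keyFields
      .map((f) => String(item[f]).split(sep).join("\\" + sep))
      .join(sep);
    if (key in out) throw new Error(`Duplicate composite key: ${key}`);
    out[key] = item;
  }
  return out;
}

Applying the same transform to both documents before diffing, e.g. keyByComposite(items, ["field1", "field2"]), reproduces the keyed shape shown above.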
