Very slow on large JSONs #1

Closed
antonioribeiro opened this Issue Jan 13, 2018 · 9 comments

antonioribeiro commented Jan 13, 2018

I benchmarked it on a fairly large JSON file and this is what I got:

json_decode: 439 records loaded in 5ms
json5_decode: 439 records loaded in 658238ms

Here's the code I'm using to test it:

require '/Users/antoniocarlos/code/pragmarx/pragmarx.com/vendor/pragmarx/countries/vendor/autoload.php';

$contents = file_get_contents('/Users/antoniocarlos/code/pragmarx/pragmarx.com/test.json');

show_diff(function () use ($contents) {
    return json_decode($contents, true);
});

show_diff(function () use ($contents) {
    return json5_decode($contents, true);
});

///-----------------------------------------------------------------

// Current Unix timestamp in seconds, as a float with microsecond precision.
function microtime_float()
{
    return microtime(true);
}

// Runs $runner, then prints how many records it returned and how long it took.
function show_diff($runner) {
    $start = microtime_float();

    $count = count($runner());

    $time = (int) ((microtime_float() - $start) * 1000);

    echo "$count records loaded in {$time}ms\n";
}

The JSON file was generated with JSON Generator, using this template:

[
  '{{repeat(1500, 7)}}',
  {
    _id: '{{objectId()}}',
    index: '{{index()}}',
    guid: '{{guid()}}',
    isActive: '{{bool()}}',
    balance: '{{floating(1000, 4000, 2, "$0,0.00")}}',
    picture: 'http://placehold.it/32x32',
    age: '{{integer(20, 40)}}',
    eyeColor: '{{random("blue", "brown", "green")}}',
    name: '{{firstName()}} {{surname()}}',
    gender: '{{gender()}}',
    company: '{{company().toUpperCase()}}',
    email: '{{email()}}',
    phone: '+1 {{phone()}}',
    address: '{{integer(100, 999)}} {{street()}}, {{city()}}, {{state()}}, {{integer(100, 10000)}}',
    about: '{{lorem(1, "paragraphs")}}',
    registered: '{{date(new Date(2014, 0, 1), new Date(), "YYYY-MM-ddThh:mm:ss Z")}}',
    latitude: '{{floating(-90.000001, 90)}}',
    longitude: '{{floating(-180.000001, 180)}}',
    tags: [
      '{{repeat(7)}}',
      '{{lorem(1, "words")}}'
    ],
    friends: [
      '{{repeat(3)}}',
      {
        id: '{{index()}}',
        name: '{{firstName()}} {{surname()}}'
      }
    ],
    greeting: function (tags) {
      return 'Hello, ' + this.name + '! You have ' + tags.integer(1, 10) + ' unread messages.';
    },
    favoriteFruit: function (tags) {
      var fruits = ['apple', 'banana', 'strawberry'];
      return fruits[tags.integer(0, fruits.length - 1)];
    }
  }
]

Environment

  • colinodell/json5 v1.0.1
  • PHP 7.2.0 (cli) (built: Dec 3 2017 21:46:44) ( NTS )
  • macOS 10.13.1 (17B1003)

Owner

colinodell commented Jan 13, 2018

Thanks for the feedback!

I've started working on some optimizations in #2. So far I'm seeing a massive improvement - 96% according to Blackfire (when testing against a smaller subset of your example JSON).

There's still some room for improvement, especially when it comes to string parsing. I don't know how fast we can realistically get this - probably not on par with the native C code, but hopefully fast enough to be usable in these cases.

antonioribeiro commented Jan 13, 2018

At first I didn't notice any slowness on tiny JSON5 files. Since your package is supposed to parse both JSON and JSON5, I started using it to parse everything instead of json_decode(), and then it hit a huge vanilla JSON file (even bigger than the one in this test). To fix it on my side, I did something that may be an option for you too:

if (is_null($decoded = json_decode($contents = $this->loadFile($file), true))) {
     $decoded = json5_decode($contents, true);
}

Considering that json_decode() can decode a 65,000-line JSON file in 17ms, it may be worth making both calls (every single time) for those who, like me, tried to drop json_decode() completely and just use json5_decode().
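
The try-native-first pattern can be sketched a little more defensively. The snippet above treats a null result as failure, but json_decode() also returns null for the valid document `null`, so this sketch checks json_last_error() instead. The function name decode_with_fallback() and the injected $fallback callable are assumptions for illustration; in real code the callable would be 'json5_decode'.

```php
<?php

/**
 * Try the native decoder first and only fall back to a slower parser when
 * the input is not strict JSON.
 *
 * decode_with_fallback() and $fallback are hypothetical names; pass
 * 'json5_decode' (from colinodell/json5) as $fallback in real code.
 */
function decode_with_fallback($contents, callable $fallback)
{
    $decoded = json_decode($contents, true);

    // json_decode() returns null both on failure and for the literal "null",
    // so the error code is the reliable signal, not the return value.
    if (json_last_error() === JSON_ERROR_NONE) {
        return $decoded;
    }

    // Not strict JSON: hand the same string to the fallback parser.
    return $fallback($contents, true);
}
```

Usage would then be `$decoded = decode_with_fallback($contents, 'json5_decode');`. The cost of the extra native call on JSON5 input should be small, since json_decode() stops at the first syntax error.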

antonioribeiro commented Jan 13, 2018

The optimized version is much faster (20 times or more) than the old one, but it is still a problem for really big JSON files. Here is the result on that huge file:

$ php test.php
265 records loaded in     22ms
265 records loaded in 68.661ms

antonioribeiro commented Jan 13, 2018

json_decode() fails fast when there is a comment on line 32,000:

$ php test.php
0 records loaded in 7ms
265 records loaded in 68853ms
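
To illustrate why the native decoder gives up so quickly here: strict JSON has no comment syntax, so json_decode() returns null at the first comment and reports a syntax error. A minimal sketch, using a made-up input rather than the file from the benchmark:

```php
<?php

// Hypothetical input: valid JSON5, but not valid JSON because of the comment.
$contents = <<<'JSON'
{
    // comments are allowed in JSON5 only
    "name": "test"
}
JSON;

$decoded = json_decode($contents, true);

var_dump($decoded === null);                       // bool(true)
var_dump(json_last_error() === JSON_ERROR_SYNTAX); // bool(true)
echo json_last_error_msg(), "\n";                  // Syntax error
```

On PHP 7.2, count(null) evaluates to 0 (with a warning), which would explain the "0 records" line in the output above.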

colinodell referenced this issue Jan 14, 2018: Optimizations #2 (Merged)

Owner

colinodell commented Jan 14, 2018

Thank you for all of the feedback @antonioribeiro!

I was able to get the benchmark down a little further:

231 records loaded in 5ms
231 records loaded in 1222ms

> Considering that json_decode() can decode a 65,000-line JSON file in 17ms, it may be worth making both calls (every single time) for those who, like me, tried to drop json_decode() completely and just use json5_decode().

I love this idea! This has also been implemented, making the execution time for plain JSON files almost identical :)

I'll have all of these optimizations released in the next few minutes once the tests pass.

Owner

colinodell commented Jan 14, 2018

> I love this idea! This has also been implemented, making the execution time for plain JSON files almost identical :)

Actually, it looks like this is causing failures on PHP 5.x because json_decode() accepts things it shouldn't :-/ So I'm only going to try json_decode() on 7.x.

5.x will still get the benefit of all the other optimizations though!
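
Restricting the native fast path to PHP 7.x could look roughly like the sketch below. This is an illustration of the idea, not the code shipped in the package; native_json_is_strict() and decode_fast() are hypothetical names, and the JSON5 parser is passed in as a callable so the sketch stays self-contained.

```php
<?php

/**
 * Whether json_decode() should be tried before the JSON5 parser.
 * Following the comment above, only PHP 7.0 and later is trusted.
 * The parameter exists only to make the check easy to test.
 */
function native_json_is_strict($versionId = PHP_VERSION_ID)
{
    return $versionId >= 70000;
}

/**
 * Hypothetical wrapper: native fast path on PHP 7.x, JSON5 parser otherwise.
 * No scalar type hints are used, so the file also parses on PHP 5.x.
 */
function decode_fast($contents, $assoc, callable $json5Parser)
{
    if (native_json_is_strict()) {
        $decoded = json_decode($contents, $assoc);

        // Accept the native result only if decoding reported no error.
        if (json_last_error() === JSON_ERROR_NONE) {
            return $decoded;
        }
    }

    return $json5Parser($contents, $assoc);
}
```

A call such as `decode_fast($contents, true, 'json5_decode')` would then skip the native attempt entirely on 5.x and go straight to the JSON5 parser.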

antonioribeiro commented Jan 14, 2018

Great news! ~1 second? Down from 680? Wow, congrats!

PHP 5 is still around, but not for much longer, so I don't think it will be a big problem.

Thanks for the package, I love it.

Owner

colinodell commented Jan 14, 2018

v1.0.2 has been released!

Thanks again for your feedback on this - I really appreciate it!

Owner

colinodell commented Jan 14, 2018

Oops, I had a small bug that prevented the json_decode() fallback from working. Please upgrade to v1.0.3 instead :)
