
Optimize JSON #4077

Merged: 4 commits, Nov 3, 2017
Conversation

@MikeHolman (Contributor) commented Oct 26, 2017:

This change represents a major overhaul to JSON.stringify.

The new architecture works as follows:

We first calculate the length of the string we need and build a side data structure holding the values we will need, but we do not actually create any of the intermediate concat strings the old implementation produced.

If/when the string is needed, we create a buffer using the length and metadata collected in the first pass, filling it element by element.

If the string is passed to JSON.parse, we use our metadata to construct an object without needing to parse the string.

By avoiding creation of all the intermediate concat strings, this makes JSON.stringify much more efficient. In micro-benchmarks I'm seeing JSON.stringify take around 40% less time than today.

And in case the string is actually sent to JSON.parse, that now takes 70% less time.

This improves Speedometer 2 geomean by about 3%. Also improves AcmeAir by about 3%.
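The two-pass design described above can be sketched as follows. This is a minimal illustration with hypothetical names; the actual LazyJSONString machinery in this change records typed JSON metadata per element, not plain string pieces:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical sketch of lazy serialization: pass 1 records pieces and
// their total length; pass 2 materializes the final string in a single
// exact-size allocation, only if/when the string is actually needed.
class LazySerialized {
    std::vector<std::string> pieces_;  // side metadata from pass 1
    size_t totalLength_ = 0;           // precomputed final length
    std::string flat_;                 // filled lazily on demand
    bool materialized_ = false;

public:
    // Pass 1: no concatenation happens here, only bookkeeping.
    void Append(std::string piece) {
        totalLength_ += piece.size();
        pieces_.push_back(std::move(piece));
    }

    size_t Length() const { return totalLength_; }

    // Pass 2: one allocation of the known size, filled piece by piece.
    const std::string& Materialize() {
        if (!materialized_) {
            flat_.reserve(totalLength_);
            for (const auto& p : pieces_) {
                flat_ += p;
            }
            materialized_ = true;
        }
        return flat_;
    }
};
```

The win comes from never allocating the intermediate concatenations: if the string is never requested (e.g. it goes straight to JSON.parse), pass 2 is skipped entirely.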

@@ -77,7 +77,7 @@ typedef unsigned char boolean;
#define __JITTypes_h__

// TODO: OOP JIT, how do we make this better?
-const int VTABLE_COUNT = 48;
+const int VTABLE_COUNT = 49;
@digitalinfinity (Contributor) commented Oct 26, 2017:

Huh, didn't know this existed... is there a compile assert or test that exists to remind us to update this? #Resolved

@MikeHolman (Contributor, author) commented Oct 27, 2017:

Yeah, there is a compile assert. #Closed

@@ -163,6 +163,10 @@ namespace Js
static const size_t GlobalCodeLength = _countof(_u("Global code")) - 1;
static const size_t EvalCodeLength = _countof(_u("eval code")) - 1;
static const size_t UnknownScriptCodeLength = _countof(_u("Unknown script code")) - 1;

static const charcount_t NullStringLength = _countof(_u("Null")) - 1;
@digitalinfinity (Contributor) commented Oct 26, 2017:

The above lengths are all size_t - since you're touching this code anyway, can you update those to be charcount_t too? #Resolved
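For reference, the `_countof(_u("...")) - 1` idiom these constants use computes a string literal's character count at compile time, excluding the trailing null. A standalone sketch, with `countof` and `charcount_t` modeled locally rather than using the ChakraCore definitions:

```cpp
#include <cassert>
#include <cstddef>

// charcount_t in ChakraCore is an unsigned character count; modeled
// here as unsigned int for illustration.
typedef unsigned int charcount_t;

// Stand-in for _countof: deduces the element count of an array,
// which for a string literal includes the trailing '\0'.
template <typename T, size_t N>
constexpr size_t countof(const T (&)[N]) { return N; }

// u"Null" is a char16_t[5] (4 characters + terminator), so the
// length constant evaluates to 4 at compile time.
static const charcount_t NullStringLength =
    static_cast<charcount_t>(countof(u"Null") - 1);
```

Using `charcount_t` consistently (as the reviewer suggests) avoids implicit size_t-to-charcount_t narrowing at each use site.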

@@ -538,27 +538,25 @@ namespace Js
}

// If exoticToPrim is not undefined, then
-if (nullptr != exoticToPrim)
+Assert(nullptr != exoticToPrim);
@digitalinfinity (Contributor) commented Oct 26, 2017:

Should this be a fail-fast? What guarantees that this is always not null? #Resolved

@MikeHolman (Contributor, author) commented Oct 27, 2017:

If this is null then result is nullptr and it's not valid to pass nullptr to GetTypeId (prefast took issue with this due to some of the annotations I added).

If we get to this point, we will always have called JavascriptOperators::GetPropertyReference for varMethod, and we always assign exoticToPrim = varMethod. #Closed

Field(JSONProperty*) jsonContent;
Field(const char16*) gap;

DynamicObject* ParseObject(_In_ JSONObject* valueList) const;
@digitalinfinity (Contributor) commented Oct 27, 2017:

Nit: based on what these functions are doing, I'd prefer something like ReconstructVar/ReconstructObject/ReconstructArray for a slight improvement in readability #Resolved

@MikeHolman (Contributor, author) commented Oct 27, 2017:

Totally agree. We're not really parsing at all, are we? #Closed

{
const uint elementCount = valueList->Count();
PropertyIndex requestedInlineSlotCapacity = static_cast<PropertyIndex>(elementCount);
DynamicObject* obj = this->GetLibrary()->CreateObject(true, requestedInlineSlotCapacity);
@digitalinfinity (Contributor) commented Oct 27, 2017:

Nit: annotate true with the parameter name for readability #Resolved

String
};

typedef DListCounted<JSONProperty, Recycler> JSONObject;
Contributor:

Does this need to be a DList? Or would an SList suffice? As best I can tell, we always just iterate these lists from the beginning.

Contributor:

And actually, I wonder why JSONObject and JSONArray need to be separate types. Can JSONObject just be represented as a JSONArray? They both look like a list of JSONProperties plus a count, and you might be able to avoid a bunch of intermediate allocations.



Contributor (author):

I think it could be, but the problem with our SLists is that we don't have any way to append. Maybe I'm missing something, but it seems like you can only push and pop, which would cause the list to be reversed. I could use SList and then reverse it, but I thought that might be more expensive.
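The reversal concern can be illustrated with std::forward_list standing in for a push/pop-only SList (a sketch, not ChakraCore code): head insertion yields reverse iteration order, so one extra O(n) pass is needed to restore append order.

```cpp
#include <cassert>
#include <forward_list>
#include <vector>

// std::forward_list stands in for a push/pop-only singly linked list:
// only head insertion is available, so elements come out reversed
// unless we pay for one reverse() pass at the end.
std::vector<int> collectInOrder(const std::vector<int>& input) {
    std::forward_list<int> list;
    for (int v : input) {
        list.push_front(v);  // head insertion reverses the order
    }
    list.reverse();          // single O(n) pass to restore append order
    return std::vector<int>(list.begin(), list.end());
}
```

A doubly linked (or tail-pointer) list avoids the extra pass by supporting O(1) append, which is the trade-off being weighed here.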

@MikeHolman (Contributor, author) commented Oct 27, 2017:

I don't think I can know up front how many properties I am going to enumerate for Objects? #Closed

Contributor:

Ah yeah, forgot append semantics were missing in SLists. I wonder how expensive doing two passes with the static enumerator is; if it's not too expensive, you might get nice locality benefits by precomputing the size and allocating an array instead of a list?

Contributor:

Although if possible, I'd do it heuristically here: if the number of properties is small, allocate an array; if it's large, use a list, since a large allocation would likely land in a bucket that is generally sparse and cause fragmentation.

struct JSONProperty
{
Field(JSONContentType) type;
Field(const PropertyRecord*) propertyRecord;
Contributor:

Do we need both the property record and the property name? Can we live with just one? By my rough count, this structure is at 17 bytes, which is unfortunate since it'll get allocated in the 32-byte bucket, and the wasted space can add up.

Contributor (author):

You're right, I can probably remove the PropertyRecord case without any real impact. I was overoptimizing for the Parse scenario.
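The size concern can be checked mechanically. A sketch with stand-in fields (hypothetical types, typical 64-bit alignment) showing that dropping one pointer shrinks each node:

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins for the JSONProperty layout under discussion: a 1-byte
// type tag plus pointer-sized fields. On a typical 64-bit ABI the tag
// pads to 8 bytes, so two pointers give 24 bytes and one gives 16.
struct WithBothFields {
    uint8_t type;
    const void* propertyRecord;
    const void* propertyName;
};

struct WithOneField {
    uint8_t type;
    const void* propertyName;
};

// Dropping the redundant pointer saves a pointer's worth of padding
// and payload per node, which matters in fixed-size allocator buckets.
static_assert(sizeof(WithOneField) < sizeof(WithBothFields),
              "removing a field shrinks the node");
```

With a recycler that rounds allocations up to power-of-two-ish buckets, shaving a node from (say) 24 bytes to 16 can halve the per-node footprint, which is the waste the reviewer is pointing at.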

LazyJSONString::Parse() const
{
// If the gap is a non-space character, then parsing will be a non-trivial transformation
for (charcount_t i = 0; i < this->gapLength; ++i)
Contributor:

I understand the reason behind this check but I have a couple of questions:

  1. If the gap is a non-space character, will JSON.parse ever succeed? If not, can we either throw a straight-up syntax error, or maybe cache some position/token info and then throw?
  2. If the check here fails, rather than just return null, I would eagerly serialize the string here and throw away the metadata since that's no longer needed (actually, GetSz should probably throw away the intermediate metadata anyway since we don't need to extend the lifetime of those objects). We could even just move the check to the constructor (although delay serialization till Parse is called)

@MikeHolman (Contributor, author) commented Oct 27, 2017:

  1. I was thinking you might be able to have something perverse like a gap of {"a":"WHY"}, and so if you serialize {b:"Because I can!"} you get a serialized string of

{
{"a":"WHY"},{"b": "Because I can!"}
}

2a. I considered throwing away the metadata if this fails, but was lazy and thought it wouldn't happen in practice.

2b. I was wary of throwing away the metadata in case people call parse multiple times on the same string (or need the string and then later call parse on it). But it could be that isn't a realistic concern. #Closed

Contributor:

Yuck at 1. For 2, it's probably good to release the metadata as soon as we know it's useless so that we don't waste memory. Also, it would be nice to have an AutoPtr set a flag indicating whether the complex parse failed, and cache that: if it failed, we can throw the syntax error without having to retry the parse.

Contributor (author):

I feel like failing here would be so rare that storing any additional data about it is a waste of memory.

However, I will throw away the metadata after flattening in case of complex gap, because that seems like a much more realistic scenario.
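The gap check under discussion can be sketched like this (hypothetical helper names; the real code iterates this->gap up to gapLength): if any gap character is not JSON whitespace, the serialized text may not round-trip, so the lazy metadata cannot be reused for Parse.

```cpp
#include <cassert>

typedef char16_t char16;

// JSON whitespace per the JSON grammar: space, tab, LF, CR.
static bool IsJsonWhitespace(char16 c) {
    return c == u' ' || c == u'\t' || c == u'\n' || c == u'\r';
}

// Returns true when the gap is trivial, i.e. purely whitespace, so
// parsing the serialized string would reproduce the original value.
// A non-space gap (like the perverse example above) can change what
// the serialized text means, so the fast path must bail out.
bool IsTrivialGap(const char16* gap, unsigned gapLength) {
    for (unsigned i = 0; i < gapLength; ++i) {
        if (!IsJsonWhitespace(gap[i])) {
            return false;
        }
    }
    return true;
}
```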

bool Has(_In_ RecyclableObject* object)
{
JSONObjectStack* stack = this;
while (stack != nullptr)
@digitalinfinity (Contributor) commented Oct 27, 2017:

Nit: Use list iterator macros? #Pending

Contributor (author):

I don't have a real list, so I'm not sure which macros you mean?



class JSONStringBuilder
{
private:
ScriptContext* scriptContext;
@digitalinfinity (Contributor) commented Oct 27, 2017:

Nit: could all of these fields be const? e.g. char16* const buffer, etc.? #Pending

Contributor (author):

I'm filling the buffer, so that one isn't really a const. I can change some of these to be const, but honestly I find const members to be more effort than they're worth.



this->AppendJSONPropertyString(this->jsonContent);
// Null terminate the string
*this->currentLocation = _u('\0');
Assert(this->currentLocation == buffer + bufferLength - 1);
Contributor:

Is bufferLength only ever used in this assert? Should it be covered by an #ifdef DEBUG in that case?
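The suggested pattern, sketched with hypothetical names: keep the end-of-buffer information only for a debug-time assert, so release builds carry no extra field.

```cpp
#include <cassert>

// Sketch: a bounds field kept only for debug-time validation, guarded
// so release builds don't pay for storage that only an assert reads.
class BufferWriter {
    char* current_;
#ifdef DEBUG
    const char* end_;  // only needed to validate the final position
#endif

public:
    BufferWriter(char* buffer, unsigned length)
        : current_(buffer)
#ifdef DEBUG
        , end_(buffer + length)
#endif
    {}

    void Write(char c) { *current_++ = c; }

    void Finish() {
        *current_ = '\0';  // null terminate the string
#ifdef DEBUG
        // The precomputed length must match exactly what was written.
        assert(current_ == end_ - 1);
#endif
    }
};
```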

this->SetStringGap(JavascriptString::FromVar(space));
break;
case TypeIds_StringObject:
this->SetStringGap(JavascriptConversion::ToString(space, this->scriptContext));
@MSLaguana (Contributor) commented Oct 27, 2017:

Is this correct? Some experimentation seems to suggest that non-string non-number values here have no effect. #Closed

@MSLaguana (Contributor) commented Oct 27, 2017:

Assuming this corresponds to the third parameter of JSON.stringify(obj, replacer, space) #Closed

@MSLaguana (Contributor) commented Oct 27, 2017:

Oh, never mind; I misread this as the case for Object, not StringObject. #Closed

const char16*
LazyJSONString::GetSz()
{
const charcount_t allocSize = SafeSzSize();
Contributor:

Should this also check for whether m_pszValue is already set from a previous call to this function, and just return that to avoid re-serializing repeatedly? Or does that already happen somehow?

@MikeHolman (Contributor, author) commented Oct 28, 2017:

I don't think GetSz() is generally called, and GetString() does this check, but there might be some cases where this could happen so I'll add in this check.
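The caching being asked about can be sketched as memoized materialization (hypothetical shape; the real GetSz returns a char16 buffer and the cache lives on the string object):

```cpp
#include <cassert>
#include <string>

// Sketch: serialize at most once, return the cached value on any
// subsequent call so repeated GetSz-style requests stay cheap.
class LazyString {
    std::string cached_;
    bool hasCached_ = false;
    int serializeCount_ = 0;  // instrumentation for this illustration

public:
    const std::string& GetSz() {
        if (!hasCached_) {           // the check the reviewer asks for
            cached_ = Serialize();
            hasCached_ = true;
        }
        return cached_;
    }

    int SerializeCount() const { return serializeCount_; }

private:
    std::string Serialize() {
        ++serializeCount_;
        return "{\"a\":1}";  // placeholder for the real serialization
    }
};
```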

Var toJSON = nullptr;
PolymorphicInlineCache* cache = this->scriptContext->Cache()->toJSONCache;
PropertyValueInfo info;
PropertyValueInfo::SetCacheInfo(&info, nullptr, cache, false);
@MSLaguana (Contributor) commented Oct 27, 2017:

It looks like with your annotation changes SetCacheInfo expects a non-null second parameter, so prefast is complaining #Resolved

}
if (value == nullptr)
{
Assert(holder != nullptr);
@MSLaguana (Contributor) commented Oct 27, 2017:

I think this might need to be an AnalysisAssert since prefast is complaining on the next line that holder might be null. #Resolved

Field(JSONArray*) arr;
};

JSONProperty()
Contributor:

Looks like clang is complaining about the lack of a copy constructor for JSONProperty, which is needed for its use in a DList

Contributor:

Specifically because of const-ness: candidate constructor (the implicit copy constructor) not viable: 1st argument ('const Js::JSONProperty') would lose const qualifier

@obastemur (Collaborator):

@MikeHolman did you look at / do you remember the change in memory consumption?

@MikeHolman force-pushed the lazyjson branch 2 times, most recently from b25030c to 274a1e9 on October 31, 2017 at 02:58
@MikeHolman (Contributor, author):

@obastemur Under what scenario are you interested? I didn't do that much memory testing, but in cases where the JSON string is a temporary object it doesn't seem to make much difference in memory.

@MikeHolman force-pushed the lazyjson branch 3 times, most recently from 0222743 to e3870e2 on October 31, 2017 at 03:08
@obastemur (Collaborator):

@MikeHolman I didn't test it yet, but the description says a 3% AcmeAir win. Curious if the number of allocations is the reason; max peak memory might tell.

@MikeHolman force-pushed the lazyjson branch 4 times, most recently from bf47771 to dfd0d38 on November 1, 2017 at 00:45
@MikeHolman (Contributor, author):

@digitalinfinity I think you're right that we can do better than SList in some cases for objects, but maybe that is worth saving for a future checkin?

@MikeHolman force-pushed the lazyjson branch 2 times, most recently from 9d12daf to 8a4074f on November 3, 2017 at 14:58
@@ -594,6 +594,17 @@ class DListCounted : public DList<TData, TAllocator, RealCount>
} \
}

#define FOREACH_DLISTCOUNTED_ENTRY(T, alloc, data, list) \
Contributor:

Looks like this isn't used anymore now that you switched to SLists

@chakrabot chakrabot merged commit 80f5a23 into chakra-core:master Nov 3, 2017
chakrabot pushed a commit that referenced this pull request Nov 3, 2017
Merge pull request #4077 from MikeHolman:lazyjson
