Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance issue and optimization ideas - TMVCJsonDataObjectsSerializer.JsonObjectToDataSet #553

Closed
MPannier opened this issue May 11, 2022 · 15 comments
Assignees
Labels
accepted Issue has been accepted and inserted in a future milestone enhancement

Comments

@MPannier
Copy link
Contributor

Hello,

I have a JSON array with 7500 elements which I load into a TFDMemTable with 38 fields by using this function FDMemTable1.LoadFromJSONArrayString.
In my case, this takes about 14 sec. I did some research to speed up the data load. I found some optimizations for FDMemTable (BeginBatch/EndBatch and AutoCalcFields := False;).
Now the "load" takes about 12 sec. It is faster but not fast enough.

I have done a test with a classic for i := 1 to 8000 loop to insert some hard coded fake data into my FDMemTable. I would like to know which part is the bottleneck. This takes less than 200 ms!
That means my FDMemTable and also the 8000 records are not the problem.

After that, I have tested the JSON parsing. I've done something like this:

lJsonArray := TJDOJsonArray.Parse(Res.BodyAsString) as TJDOJsonArray;
try
for var J := 0 to Pred(lJsonArray.Count) do
begin
var lJsonObj : TJsonObject := lJsonArray.Items[J].ObjectValue;
DataSet.Append;
DataSetField1.AsInteger := lJsonObj.I['Field1'];
DataSetField2.AsWideString := lJsonObj.S['Field2'];
...
DataSet.Post;
end;

This takes only 200 ms (with all 38 fields). Compared to 12 sec, it is really fast.
Why is LoadFromJSONArrayString slower?

I found the reason in TMVCJsonDataObjectsSerializer.JsonObjectToDataSet. For a new test, I have uncommented the following line:

//lName := GetNameAs(ADataSet.Owner, Field.Name, Field.FieldName);

//if (IsIgnoredAttribute(AIgnoredFields, lName)) or (IsIgnoredComponent(ADataSet.Owner, Field.Name)) then
//continue;

//lName := TMVCSerializerHelper.ApplyNameCase(GetNameCase(ADataSet, ANameCase), lName { Field.FieldName } );

lName := Field.FieldName; //uncomment the previous lines and use only this one

Now the data load takes only 200 ms.
The problem-functions are GetNameAs, IsIgnoredComponent and GetNameCase. All of these functions use RTTI to get some information from TField.
I know we can't remove this functions, but can we speed up these functions?
I have some ideas, but I'm not sure what the best one is.
First a question: Can TField have the attributes MVCNameAsAttribute, MVCDoNotSerializeAttribute and MVCNameCaseAttribute? Maybe we can remove one of these functions?
Is it possible to check for all attributes at the same time? Then the RTTI stuff is done only one time, not three times.
Is caching an option? (field name and a list of attributes)
Is an additional parameter an option? Something like "IgnoreNameCase" or "FieldNameAsIs"?
I have no experience with RTTI. Maybe the RTTI code can be optimized?

What do You think? Any other suggestions?

@fastbike
Copy link
Contributor

Yes, unfortunately this method is called for each item (n) in the json array (from procedure TMVCJsonDataObjectsSerializer.JsonArrayToDataSet)

And the method you referenced calls those RTTI methods for each field (n) so this becomes an O(nm) performance calculations (8,000 rows x 38).

My initial thoughts would be to either build a RTTI structure (via a call to a nested proc) and then pass that structure to the method that converts each element of the array to a dataset row.
Others may have better ideas.

@danieleteti
Copy link
Owner

Yes guys, that part of code has been under my radar for too much time... I've to give it some attention.
@MPannier can you provide a reproducible and selfcontained example so that we can have a reference?
I'll try to think some solution in this week.

@danieleteti danieleteti added enhancement accepted Issue has been accepted and inserted in a future milestone labels May 11, 2022
@danieleteti danieleteti added this to the 3.2.2-nitrogen milestone May 11, 2022
@MPannier
Copy link
Contributor Author

I have made a simple example (console application) with latest DMVC units and Delphi 11.
It seems to be faster than my real project (Delphi 10.4 and DMVC 3_1_0). But the RTTI stuff is still slower than without RTTI.
Example.zip

@danieleteti
Copy link
Owner

@MPannier the repo version contains many improvemenets over the 3.2.1-carbon so sound good that its faster than the latest released version. We can however still use the test you wrote to fine tuning it and, perhaps, put this test in the test suite as performance test.

@fastbike
Copy link
Contributor

Happy to help with testing

@danieleteti
Copy link
Owner

Hi guys, do you found some areas to improve with a reasonable gain? The current version is quite fast in the @MPannier test.

@MPannier
Copy link
Contributor Author

I will do a new test with my real world project this week and give You feedback.

@fastbike
Copy link
Contributor

I wrote a simple VCL app to create and populate a 3 column dataset with 20k records, serialise them out to a JSON string and then read them back.

The deserialisation with the current code base is no faster than the codebase we have in our production code (from Sept 2021). We're seeing around 10 seconds to deserialise the JSON, the root of the issue is the repeated lookup of the dataset field using RTTI. 20k x 3 is 60k lookups where 3 would suffice.

ProjectJSON.zip

@fastbike
Copy link
Contributor

fastbike commented Jun 27, 2022

To improve performance you'd need to providing an overload to
procedure TMVCJsonDataObjectsSerializer.JsonObjectToDataSet(const AJsonObject: TJDOJsonObject; const ADataSet: TDataSet; const AIgnoredFields: TMVCIgnoredList; const ANameCase: TMVCNameCase);
so that it looks more like
procedure TMVCJsonDataObjectsSerializer.JsonObjectToDataSet(const AJsonObject: TJDOJsonObject; const ADataSet: TDataSet; const ADataSetFields: TMVCDataSetFields);

And do the setup of the TMVCDataSetFields before calling this new method once for each record.

@MPannier
Copy link
Contributor Author

Hello,

I have done the tests with the latest version of DMVC Framework from yesterday in my real project. And it was still slow. It needs 15000 milliseconds. My Dataset contains about 7800 records and has 41 columns.
If I comment the lines with GetNameAs, IsIgnoredAttribute und TMVCSerializerHelper.ApplyNameCase in MVCFramework.Serializer.JsonDataObjects.pas → TMVCJsonDataObjectsSerializer.JsonObjectToDataSet
It is much faster. It needs only 300 - 400 milliseconds.

//      lName := GetNameAs(ADataSet.Owner, Field.Name, Field.FieldName);
//
//      if (IsIgnoredAttribute(AIgnoredFields, lName)) or (IsIgnoredComponent(ADataSet.Owner, Field.Name)) then
//        continue;
//
//      lName := TMVCSerializerHelper.ApplyNameCase(GetNameCase(ADataSet, ANameCase), lName { Field.FieldName } );
lName := Field.FieldName;

But why is the test project much faster than my real world project? I have debugged the project and find out, that within the test project the function TMVCAbstractSerializer.GetNameAs gets nil at the line ObjFld := ObjType.GetField(AComponentName);
In my real world project not. Why? I don't know. But this will make the deserialization faster. I decide to make the test project as similar as possible as my real project.
That means I have added a DataModule with a FDMemTable and 40 Fields at Design Time. And now the test project is also slow :-) Please try and verify this statement.

As @fastbike wrote and as I have thought, the problem is the repeated RTTI call for each column and for each record. And there are 3 procedures which calls RTTI. In my case 7800 * 41 * 3.

How can we solve this performance problem?
One easy solution could be to made only one call to RTTI (combine the calls to GetNameAs, IsIgnoredAttribute, IsIgnoredComponent and GetNameCase).
Another idea is some kind of raw-parameter or an ignore-NameCase or ignore-MVCAttributes. In my case I don't need this functionality. See my code sample above. lName := Field.FieldName; works for me.
A universal solution could be some kind of cache (of RTTI calls). In DeserializeDataSet You could read all RTTI stuff of all fields and in JsonObjectToDataSet this cache could be used.

In attachment there is the new test project.
Example2.zip
image

@fastbike
Copy link
Contributor

I've made a small change to use the TMVCDataSetFields list, which is populated by calling GetDataSetFields in the TMVCJsonDataObjectsSerializer.JsonArrayToDataSet method , before enumerating the json array.

Time has now fallen from ~ 10 seconds to ~115 milliseconds. Two orders of magnitude faster which is a massive win !

The only gotcha at this stage is that I commented out the lines handling nested datasets to get the code to compile. Either we take a hit on having to make a separate calls to GetDataSetFields for each record that is a nested dataset or figure out another way of handling what could possibly be a recursive nested dataset scenario.

procedure TMVCJsonDataObjectsSerializer.JsonArrayToDataSet(const AJsonArray: TJDOJsonArray; const ADataSet: TDataSet;
  const AIgnoredFields: TMVCIgnoredList; const ANameCase: TMVCNameCase);
var
  I: Integer;
  Fields:  TMVCDataSetFields;
begin
  Fields := GetDataSetFields(ADataSet, AIgnoredFields, ANameCase);
  for I := 0 to Pred(AJsonArray.Count) do
  begin
    ADataSet.Append;
    JsonObjectToDataSet(AJsonArray.Items[I].ObjectValue, ADataSet, Fields);// AIgnoredFields, ANameCase);
    ADataSet.Post;
  end;
end;

and

procedure TMVCJsonDataObjectsSerializer.JsonObjectToDataSet(const AJsonObject: TJDOJsonObject;
  const ADataSet: TDataSet;   const ADataSetFields: TMVCDataSetFields);
var
  J: Integer;
  Field: TField;
  lName: string;
  SS: TStringStream;
  SM: TMemoryStream;
  NestedDataSet: TDataSet;
begin
  if (ADataSet.State in [dsInsert, dsEdit]) then
  begin
    for J := 0 to ADataSetFields.Count - 1 do
    begin
      lName := ADataSetFields[J].FieldName;
      Field := ADataSet.Fields[ADataSetFields[J].I];
      // case GetNameCase(ADataSet, ANameCase) of
      // ncLowerCase:
      // name := LowerCase(Field.FieldName);
      // ncUpperCase:
      // name := UpperCase(Field.FieldName);
      // end;
      if not AJsonObject.Contains(lName) then
        continue;
      if (AJsonObject[lName].Typ = jdtObject) and (AJsonObject.Values[lName].ObjectValue = nil) then
      // Nullable Type
      begin
        Field.Clear;
        continue;
      end;
      case Field.DataType of
// etc

@fastbike
Copy link
Contributor

I'd also suggest renaming the record property - for clarity - from "I" to "Index" if that does not break too many things

@MPannier
Copy link
Contributor Author

I can confirm. The change from @fastbike works much faster.
This is what I mean with RTTI cache.

@danieleteti
Copy link
Owner

Sounds Good! We need to find a way to get this improvements mantaining the same functionality level. I'll give a look ASAP. Thanks guys.

@danieleteti
Copy link
Owner

Hi guys. I did some changes based on your suggestion. That caused some refactoring about methods visibility, paramaters, types and so on. However, it should be OK. All the unit tests (currently 429) passed OK. Please, check it on your side, and thanks for your support.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
accepted Issue has been accepted and inserted in a future milestone enhancement
Projects
None yet
Development

No branches or pull requests

3 participants