Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of emissary.output.DropOffUtil.processMetadata() by changing loop… #138

Merged
merged 1 commit into from
Nov 6, 2021

Conversation

jdcove2
Copy link
Collaborator

@jdcove2 jdcove2 commented Apr 19, 2021

This method is called for each BaseDataObject. The current "for" loop syntax creates a lot of iterator objects and should be changed so that it does not.

@jdcove2 jdcove2 requested a review from fbruton April 19, 2021 17:38
@jdcove2
Copy link
Collaborator Author

jdcove2 commented Apr 19, 2021

DropOffUtil.parseMetadata() uses the "new" style for loop syntax which creates an iterator object for each execution of the loop. Iterating over Lists should not be done this way since the "old" style for loop syntax achieves the same result without creating an object. This is evidenced using the 20Meg mysqldump file as an example where parseMetadata() creates 1,773,237 objects representing 56,743,584 bytes unnecessarily as shown in the following screen shots.

  1. ArrayList iterator objects created by the old code.

image

  1. No ArrayList iterator objects created by new code.

image

@jdcove2 jdcove2 changed the title Improve performance of DropOffUtil.processMetadata() by changing loop… Improve performance of emissary.output.DropOffUtil.processMetadata() by changing loop… Apr 19, 2021
Copy link
Contributor

@LightSeekerSC LightSeekerSC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. There are no processing differences and this reduces the creation of iterators, resulting in saved memory.

@jdcove2 jdcove2 self-assigned this Jun 9, 2021
@jpdahlke
Copy link
Collaborator

I would like @fbruton to weigh in on this one. I believe he had some concerns.

Copy link
Collaborator

@fbruton fbruton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend updating the loop for parentParams and extended_filetypes and declaring the data type as an ArrayList instead of Lists.

Also, can you profile .clear() change separately. I'm not sure if that change is going to make much of a difference.

Also, recommend holding off on updating the loop for childObjList and payloadList as part of this merge request.

As we continue to profile and look for performance updates, maybe ArrayList isn't the desired data structure for IBDO records.

if (tld.hasParameter(param)) {
parentTypes.put("1" + param, tld.getStringParameter(param));
}
}

for (final IBaseDataObject p : payloadList) {
for (int i = 0; i < payloadList.size(); i++) {
final IBaseDataObject p = payloadList.get(i);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have concerns with the performance of this accessor if the data structure is not an ArrayList or any other data structure that does not perform at O(1). HDMobileAgent currently creates an ArrayList, but the API doesn't prevent the framework from leveraging another data type, such as LinkedList.

@@ -1085,14 +1091,16 @@ public void processMetadata(final List<IBaseDataObject> payloadList) {
if (p.hasExtractedRecords()) {
final List<IBaseDataObject> childObjList = p.getExtractedRecords();
Collections.sort(childObjList, new emissary.util.ShortNameComparator());
for (final IBaseDataObject child : childObjList) {
for (int j = 0; j < childObjList.size(); j++) {
final IBaseDataObject child = childObjList.get(j);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some concerns my previous comment in regards to the potential performance hit if the data type is changed from ArrayList.

@@ -1030,7 +1033,7 @@ public void processMetadata(final List<IBaseDataObject> payloadList) {
}

if (p.getStringParameter("EXTENDED_FILETYPE") == null) {
final List<String> extended_filetypes = new ArrayList<String>();
extended_filetypes.clear();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there will be many cases where this could be a long list, but if so, this could potentially use more memory.

@jdcove2
Copy link
Collaborator Author

jdcove2 commented Jul 1, 2021

To make sure I understand the comments:

  1. "declaring the data type as an ArrayList instead of Lists."
    • Change method argument, "payloadList", to be ArrayList instead of List
  2. "can you profile .clear() change separately. I'm not sure if that change is going to make much of a difference."
    • The "extended_filetypes.clear()" was not a performance enhancement. It was put in because the previous code created a new ArrayList that is by default empty, so I cleared the list to mimic that behaviour.
  3. "recommend holding off on updating the loop for childObjList and payloadList"
    • Revert 1018-1019 (payloadList)
    • Revert 1094-1095 (childObjList)
  4. "As we continue to profile and look for performance updates, maybe ArrayList isn't the desired data structure for IBDO records."
    • I'm not sure what data structure will perform better than ArrayList for just holding a list of IBDO's.

@fbruton
Copy link
Collaborator

fbruton commented Jul 1, 2021

To make sure I understand the comments:

  1. "declaring the data type as an ArrayList instead of Lists."

    • Change method argument, "payloadList", to be ArrayList instead of List
  2. "can you profile .clear() change separately. I'm not sure if that change is going to make much of a difference."

    • The "extended_filetypes.clear()" was not a performance enhancement. It was put in because the previous code created a new ArrayList that is by default empty, so I cleared the list to mimic that behaviour.
  3. "recommend holding off on updating the loop for childObjList and payloadList"

    • Revert 1018-1019 (payloadList)
    • Revert 1094-1095 (childObjList)
  4. "As we continue to profile and look for performance updates, maybe ArrayList isn't the desired data structure for IBDO records."

    • I'm not sure what data structure will perform better than ArrayList for just holding a list of IBDO's.
  1. That's if we decide to update the loop for payloadList.
  2. clear() doesn't resize the array list. It could potentially hold more memory if the size of the Arraylist being created differed drastically. Should not be an issue with smaller list of a similar size, which should be the case for that particular field. Overall, I'm okay with the change.
  3. Yes
  4. That may be the case for specific use cases. As a framework, it should be flexible enough to meet the needs of various types of use cases. For example, will the system perform more reads or writes to the records, may determine if using an ArrayList or LinkedList is more efficient. If it's decided that ArrayList is the best option for most/all cases, then I believe the API should reflect that.

@jdcove2
Copy link
Collaborator Author

jdcove2 commented Jul 1, 2021

Is the following closer to what you want?

  1. Make this code change in a new "issue".
  2. Leave this code change in the current PR/Issue. Note that the ArrayList internal array size can be changed with "trimToSize".
  3. Make this code change in a new "issue".
  4. Make this a consideration in a new "issue".

@jdcove2 jdcove2 force-pushed the Performance_DropOffUtil_Loop branch from 14268cf to c0f907a Compare July 29, 2021 17:14
@jpdahlke jpdahlke added this to the 7.2.0 milestone Oct 4, 2021
@dev-mlb dev-mlb merged commit 13f13ed into master Nov 6, 2021
@jpdahlke jpdahlke deleted the Performance_DropOffUtil_Loop branch June 6, 2023 17:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants