
Improve performance of large local arrays #80

Closed
nars1 opened this issue Nov 7, 2017 · 0 comments

nars1 commented Nov 7, 2017

Final Release Note

Local arrays with a large number of subscripts scale much better. When the number of nodes in a local array is in the millions, node creation time is now noticeably faster (in the example given in this Issue Description, the average $zut column improved from 18 microseconds per node to 0.1 microseconds per node when the Array Size was 16Mi). [#80]

Description

Cycle ; Local Array Scalability Test.
    new Cycle,Iter,Size,Then,Span,Average
    write !,"Cy Array Size Avg $zut",!
    write "-- ---------- ------",!
    for Cycle=1:1:24 do
    .   new Array
    .   set Size=2**Cycle
    .   set Then=$zut
    .   for Iter=1:1:Size set Array(Iter)=Iter
    .   set Span=$zut-Then
    .   set Average=Span/Size
    .   write $justify(Cycle,2)," ",$justify($fnumber(Size,","),10)," ",$justify(Average,6,3),!
    quit

The above M program creates progressively larger local arrays (doubling the size each cycle, according to powers of two) and reports the average time to create each node, in microseconds ($zut units).

Here are the results it prints out:

Cy Array Size Avg $zut
-- ---------- ------
 1          2  0.500
 2          4  0.000
 3          8  0.125
 4         16  0.063
 5         32  0.063
 6         64  0.078
 7        128  0.078
 8        256  0.059
 9        512  0.063
10      1,024  0.062
11      2,048  0.062
12      4,096  0.061
13      8,192  0.062
14     16,384  0.064
15     32,768  0.066
16     65,536  0.071
17    131,072  0.083
18    262,144  0.137
19    524,288  0.260
20  1,048,576  0.466
21  2,097,152  0.857
22  4,194,304  1.696
23  8,388,608  4.620
24 16,777,216 18.002

The average $zut column keeps growing as the number of nodes in the array doubles, which means total creation time grows superlinearly (roughly quadratically). We instead expect the per-node average to stay more or less constant as the array grows.
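One way to see the superlinear behavior is to take ratios of successive per-node averages from the table above. A quick sketch (the numbers are copied from the output shown; if per-node cost were constant, every ratio would be close to 1.0, while a ratio near 2 means the per-node cost doubles each time the array doubles, i.e. quadratic total time):

```python
# Avg $zut per node for cycles 17..24, copied from the table above.
averages = {
    17: 0.083, 18: 0.137, 19: 0.260, 20: 0.466,
    21: 0.857, 22: 1.696, 23: 4.620, 24: 18.002,
}

# Ratio of each cycle's per-node average to the previous cycle's.
ratios = {c: averages[c] / averages[c - 1] for c in range(18, 25)}

for cycle, ratio in sorted(ratios.items()):
    print(f"cycle {cycle}: avg per-node time grew {ratio:.2f}x over previous cycle")
```

Every ratio from cycle 18 onward is well above 1, and the later ones approach or exceed 2, confirming that node creation cost was not scaling flat as expected.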

Draft Release Note

Local arrays with a large number of subscripts scale a lot better. When the number of nodes in a local array is in the millions, node creation time is now noticeably faster (in the example given in this Issue Description, the average $zut column improved from 18 microseconds per node to 0.1 microseconds per node when the Array Size was 16Mi).

@nars1 nars1 self-assigned this Nov 7, 2017
@nars1 nars1 added this to the r120 milestone Nov 7, 2017
nars1 added a commit to nars1/YottaDB that referenced this issue Nov 7, 2017
…inearly) as it affects local array performance when the number of nodes is large (e.g. millions) YottaDB#80
nars1 added a commit that referenced this issue Nov 7, 2017
…inearly) as it affects local array performance when the number of nodes is large (e.g. millions) #80
@nars1 nars1 closed this as completed Nov 7, 2017
nars1 added a commit to nars1/YottaDB that referenced this issue Nov 9, 2017
…pool garbage collection in YottaDB) by choosing a better pivot

The primary enhancement is to stpg_sort.c. Previously, the pivot was chosen as the median of the left, right and (left+right)/2 indices in the array, the commonly recommended pivot rule for quicksort. In practice, however, this often yielded a pivot that landed near one end of the array (too close to left or too close to right), causing quicksort to degenerate into an O(n^2) algorithm; the cost became apparent when a garbage collection had to sort thousands of items. The pivot is now chosen as the median of 9 equally spaced elements in the array spanning the [left,right] indices. This was observed to place the pivot close to the midpoint of the array (around the 45% point most of the time), letting quicksort run in O(n log n). With these changes, a garbage collection that used to take 83 seconds took 0.5 seconds.
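stpg_sort.c itself is C inside YottaDB; the pivot rule described above can be sketched language-neutrally as follows (a Python sketch with illustrative function names, not the actual codebase):

```python
import random

def median_of_9_pivot(a, left, right):
    """Pick a pivot as the median of 9 equally spaced elements of a[left..right]."""
    if right - left < 8:
        return a[(left + right) // 2]
    step = (right - left) // 8
    samples = sorted(a[left + i * step] for i in range(9))
    return samples[4]  # the median of the 9 sampled values

def quicksort(a, left=0, right=None):
    """In-place quicksort using the median-of-9 pivot rule."""
    if right is None:
        right = len(a) - 1
    while left < right:
        pivot = median_of_9_pivot(a, left, right)
        i, j = left, right
        while i <= j:                       # Hoare-style partition around pivot
            while a[i] < pivot:
                i += 1
            while a[j] > pivot:
                j -= 1
            if i <= j:
                a[i], a[j] = a[j], a[i]
                i += 1
                j -= 1
        # Recurse into the smaller side, loop on the larger (bounds the stack).
        if j - left < right - i:
            quicksort(a, left, j)
            left = i
        else:
            quicksort(a, i, right)
            right = j
```

Sampling 9 spread-out elements makes an extreme pivot (and hence the O(n^2) degenerate case) far less likely than the classic median-of-three on nearly sorted or adversarial inputs.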

In addition, the following changes were made.

a) Enhance the stringpool to support more than 4Gi items (the "stp_array_size" variable)
b) stp_expand_array.c : Expand the "stp_array" array (which holds the items for garbage collection) exponentially instead of linearly.
c) lv_getslot.c : Handle an edge case (numElems == MAXINT4) introduced in a prior commit for YottaDB#80
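Point (b) matters because doubling capacity needs only O(log n) reallocations (and O(n) total copying), whereas fixed-increment growth needs O(n) reallocations and O(n^2) total copying. A minimal sketch of the two strategies (function names are illustrative, not the actual YottaDB code):

```python
def grow_geometric(capacity, needed):
    """Double capacity until it covers `needed`; return (new_capacity, reallocs)."""
    reallocs = 0
    while capacity < needed:
        capacity *= 2      # each step reallocates and copies the whole array once
        reallocs += 1
    return capacity, reallocs

def grow_linear(capacity, needed, increment=1024):
    """Grow by a fixed increment; return (new_capacity, reallocs)."""
    reallocs = 0
    while capacity < needed:
        capacity += increment
        reallocs += 1
    return capacity, reallocs
```

Growing from 1 to 16Mi entries takes 24 doublings, versus 16,384 fixed 1Ki increments; since each reallocation copies everything accumulated so far, the linear strategy's total copying cost is quadratic in the final size.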
nars1 added a commit to nars1/YottaDB that referenced this issue Nov 9, 2017
…pool garbage collection in YottaDB) by choosing a better pivot YottaDB#85

nars1 added a commit to nars1/YottaDB that referenced this issue Nov 10, 2017
…in lv_getslot YottaDB#80

These changes address review comments

1) All longcpy usages needed to be replaced with memcpy. Since only a few remained in the codebase, I replaced all of them.
2) lv_getslot.c : In lvtreenode_getslot(), once numElems exceeds (MAXINT4 / 2) but is still below MAXINT4, doubling it passes a value greater than MAXINT4 but less than MAXUINT4 to lvtreenode_newblock(). That function expects a signed int, so it would interpret the value as negative. The fix ensures we never pass more than MAXINT4 to lvtreenode_newblock(). While in this module, also make a few assignments DEBUG_ONLY, move them to their proper scope, and add a few missing asserts in lv_getslot() to match the other two functions (lvtree_getslot() and lvtreenode_getslot()).
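The overflow in (2) can be sketched as follows (Python standing in for the C logic; MAXINT4 here is 2^31 - 1, and the function name is illustrative, not the actual YottaDB symbol):

```python
MAXINT4 = 2**31 - 1    # largest value a signed 32-bit int can hold
MAXUINT4 = 2**32 - 1   # largest value an unsigned 32-bit int can hold

def next_block_size(num_elems):
    """Double the element count, but never exceed what a signed 32-bit int carries.

    Without the min(), doubling any value above MAXINT4 // 2 yields a result in
    (MAXINT4, MAXUINT4], which a C function taking a signed `int` parameter
    would reinterpret as a negative number.
    """
    return min(2 * num_elems, MAXINT4)
```

For example, doubling MAXINT4 // 2 + 1 would give 2,147,483,648, which wraps to a negative signed int in C; the clamp returns MAXINT4 instead.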
nars1 added a commit that referenced this issue Nov 10, 2017
…pool garbage collection in YottaDB) by choosing a better pivot #85

nars1 added a commit that referenced this issue Nov 10, 2017
…in lv_getslot #80

@nars1 nars1 removed this from the r120 milestone Jan 8, 2018
@nars1 nars1 added this to the r120 milestone Jan 8, 2018
@ksbhaskar ksbhaskar changed the title Performance issues in large local arrays Improve performance of large local arrays Mar 19, 2018