In [1]:
from histoptimizer import Histoptimizer

item_sizes = [1.0, 4.5, 6.3, 2.1, 8.4, 3.7, 8.6, 0.3]

(dividers, variance) = Histoptimizer.partition(item_sizes, 3)


In [2]:
list(dividers)

[3, 5]

The algorithm fills out a matrix of sub-problem solutions column-wise from left to right.

It first determines the cost of placing the first divider at every possible item location. For column 1 of the matrix, the row for each item index contains the variance (vs the overall mean) of the first bucket were the first divider placed there. This is handled as part of the initialization.

For the second column, the algorithm again calculates for each item index the cost function for placing the second divider at that location. However, in this case the cost function must be evaluated once for each item in the previous column that comes before the item in column 2.

Once the minimum cost function value is found for item i, the value is stored at min_cost[2,i]. In addition, the item index in column 1 that yielded the minimum cost is stored at divider_location[2,i]. Later, this will allow us to reconstruct the divider set that gave us the lowest value.

Third and subsequent columns are processed the same way as the second.

Finally, after the mth divider/column is processed, an extra, final divider column is processed in the same manner. This column represents a final divider that closes the set of partitioned items. In order to solve the original posed problem, only the value at min_cost[m+1,n] needs to be calculated, because the final divider will always be placed at item n.

By calculating the entire final column, we gain at little extra cost the ability to find the optimal partition of any prefix of the item set: the optimal partition of the first k items into the same number of buckets, for any k <= n.

After the algorithm is complete, the minimum achievable variance is stored at min_cost[m+1,n] and the optimal set of divider locations can be obtained by walking back the chain of divider locations starting at divider_location[m+1,n].
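
The steps above can be sketched in plain Python as follows. This is an illustrative sketch, not Histoptimizer's actual implementation: the function name `partition_sketch`, the prefix-sum list, and the choice of cost (squared deviation of each bucket's total from the mean bucket total, averaged over the buckets at the end) are assumptions made for the example. The matrices are indexed as `min_cost[column][item]`, and the number of columns equals the number of buckets, so the last column plays the role of the closing divider.

```python
def partition_sketch(item_sizes, num_buckets):
    """Return (dividers, variance) for an ordered list of item sizes."""
    n = len(item_sizes)

    # prefix[i] is the total size of the first i items.
    prefix = [0.0]
    for size in item_sizes:
        prefix.append(prefix[-1] + size)
    mean = prefix[n] / num_buckets

    def bucket_cost(start, end):
        # Assumed cost: squared deviation from the mean bucket total
        # for a bucket holding items start+1..end.
        return (prefix[end] - prefix[start] - mean) ** 2

    inf = float("inf")
    # min_cost[j][i]: lowest cost of splitting the first i items into j buckets.
    min_cost = [[inf] * (n + 1) for _ in range(num_buckets + 1)]
    # divider_location[j][i]: position of the previous divider for that optimum.
    divider_location = [[0] * (n + 1) for _ in range(num_buckets + 1)]

    # Column 1 (initialization): one bucket holding the first i items.
    for i in range(1, n + 1):
        min_cost[1][i] = bucket_cost(0, i)

    # Remaining columns, left to right; each reads only the column before it.
    for j in range(2, num_buckets + 1):
        for i in range(j, n + 1):
            for k in range(j - 1, i):
                cost = min_cost[j - 1][k] + bucket_cost(k, i)
                if cost < min_cost[j][i]:
                    min_cost[j][i] = cost
                    divider_location[j][i] = k

    # Walk back the chain of divider locations from the final cell.
    dividers = []
    i = n
    for j in range(num_buckets, 1, -1):
        i = divider_location[j][i]
        dividers.append(i)
    dividers.reverse()
    return dividers, min_cost[num_buckets][n] / num_buckets
```

With the item sizes from the example at the top and three buckets, this sketch places the dividers after items 3 and 5.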

Parallelizing
=============

The value of each cell depends on the values in the column before it, but not on the value of any other row in the same column. So we cannot parallelize along the bucket axis, but we can parallelize by item. This suits the problem, because the item count is expected in most cases to be significantly larger than the number of buckets.

Each column, then, will be handled sequentially, and the parallelization strategy is to deploy cores in parallel to calculate the cost value for each row.
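
A minimal sketch of this column-at-a-time structure, using the standard library's `concurrent.futures`, is shown here. It only illustrates the dependency pattern and is not the library's implementation: `next_column`, `prev_cost`, `prefix`, and `mean` are hypothetical names, the cost is assumed to be the squared deviation of a bucket's total from the mean bucket total, and CPython threads will not actually speed up pure-Python arithmetic like this.

```python
from concurrent.futures import ThreadPoolExecutor

INF = float("inf")


def next_column(prev_cost, prefix, mean, workers=4):
    """Compute one matrix column from the previous one, one task per row.

    prev_cost[k] is the best cost for the first k items in the previous
    column; prefix[i] is the total size of the first i items.
    """
    n = len(prev_cost) - 1

    def best_for_row(i):
        # Row i reads only prev_cost, never other rows of the new column,
        # so all rows of a column can be evaluated independently.
        best = (INF, 0)
        for k in range(1, i):
            cost = prev_cost[k] + (prefix[i] - prefix[k] - mean) ** 2
            if cost < best[0]:
                best = (cost, k)
        return best

    # Columns are processed one after another; rows within a column are
    # handed to the worker pool.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(best_for_row, range(n + 1)))

    costs = [cost for cost, _ in results]
    locations = [k for _, k in results]
    return costs, locations
```

The caller would build the first column during initialization and then call `next_column` once per remaining column, feeding each result into the next call.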

When you are deploying cores in parallel, you want each core to perform the same number of operations, because otherwise some cores will have "wasted" idle time. This is a challenge for the current algorithm, because item 2 only does one cost evaluation, against item 1 in the previous column, while item n must evaluate against every item 1..n-1 in the previous column.

A simple solution here is to pair row 2 with row n, row 3 with row n-1, and so on. Each thread then has the same number of cost evaluations to make.

With one thread of execution for each pair of items, we can keep up to n/2 threads of execution busy most of the time.
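
The pairing can be checked with a few lines of Python. This is a sketch under the simplified counting used above, where row i performs i - 1 cost evaluations; `paired_rows` and `workload` are hypothetical helper names introduced for the example.

```python
def paired_rows(n):
    """Pair row 2 with row n, row 3 with row n-1, and so on."""
    pairs = []
    low, high = 2, n
    while low < high:
        pairs.append((low, high))
        low += 1
        high -= 1
    if low == high:
        # An unpaired middle row remains when rows 2..n are odd in number.
        pairs.append((low,))
    return pairs


def workload(rows):
    """Total cost evaluations for a group of rows (row i needs i - 1)."""
    return sum(row - 1 for row in rows)


pairs = paired_rows(9)
loads = [workload(pair) for pair in pairs]
print(pairs)  # [(2, 9), (3, 8), (4, 7), (5, 6)]
print(loads)  # [9, 9, 9, 9]
```

When n is even, rows 2..n cannot all be paired, and the leftover middle row carries about half the work of a full pair.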





A simple strategy is to