This notebook looks at constructing some of the components to provide the variance processor.

## Online Covariance

Based on the article here: https://en.wikipedia.org/wiki/Algorithms_for_calculating_variance .

In particular, Welford's online algorithm.

We assume our input data is a 1D array. We can always flatten before hand if not. We are calculating the mean and variance of the array.

### From the Article for the 1D Case

We need to convert the algorithm below to the ND case.

In [1]:
# For a new value newValue, compute the new count, new mean, the new M2.
# mean accumulates the mean of the entire dataset
# M2 aggregates the squared distance from the mean
# count aggregates the number of samples seen so far
def update(existingAggregate, newValue):
    (count, mean, M2) = existingAggregate
    count += 1
    delta = newValue - mean
    mean += delta / count
    delta2 = newValue - mean
    M2 += delta * delta2

    return (count, mean, M2)

# Retrieve the mean, variance and sample variance from an aggregate
def finalize(existingAggregate):
    (count, mean, M2) = existingAggregate
    (mean, variance, sampleVariance) = (mean, M2 / count, M2 / (count - 1))
    if count < 2:
        return float('nan')
    else:
        return (mean, variance, sampleVariance)

In [2]:
def parallel_variance(avg_a, count_a, var_a, avg_b, count_b, var_b):
    delta = avg_b - avg_a
    m_a = var_a * (count_a - 1)
    m_b = var_b * (count_b - 1)
    M2 = m_a + m_b + delta ** 2 * count_a * count_b / (count_a + count_b)
    return M2 / (count_a + count_b - 1)

We had when calculating U:
```
[In FOR loop]
    x = x.reshape(x.shape[0], 1)
    x_dash = x - mean
    mean += (x_dash / count)
    covariance += np.dot(x_dash, x_dash.T)
                
covariance = covariance / count
```
This is the same - only in the above update they substract the new mean while we substract the old mean. The finalize is also the same - just dividing the covariance by the count. As n gets large the sample covariance becomes indistinguishable from the actual. This will be the case for us as only a few seconds gets us hundreds of samples so n is 100.

We need to be looking at classes as we can have the aggregate values as state.

I think we can leave not removing the updated mean if we are dealing with large cound values (so the mean update will be small).

In [23]:
import numpy as np

class Covariance_Unit:
    """A model to compute covariance online."""
    def __init__(self, size):
        """Initialise.
        
        size is an integer setting the 1D size of an input."""
        self.size = size
        self.count = 0
        self.mean = np.zeros(shape=(size, 1))
        self.square_sum = np.zeros(shape=(size, size))

    def update(self, x):
        """Add a data point x.
        
        x is a 1D numpy array of length 'size'.
        """
        self.count += 1
        # Remove old mean
        x_dash = x - self.mean
        # Compute mean update
        self.mean += x_dash / self.count
        # Compute covariance update
        self.square_sum += np.dot(x_dash, x_dash.T)
    
    @property
    def covariance(self):
        """Compute covariance when requested."""
        return self.square_sum / self.count
        

In [None]:
from src.var_processor.pb_threshold import get_rand_ints

In [80]:
%%time
size = (3, 3)
length = size[0]*size[1]

cu = Covariance_Unit(length)

for i in range(0, 10000):
    # get a 3x3 grid of random 8-bit integers
    rand_ints = get_rand_ints(8, size)
    flattened = rand_ints.reshape(length, 1).astype(np.uint8)
    cu.update(flattened)

print(cu.count, cu.mean, cu.covariance, sep="\n")

10000
[[127.62  ]
 [126.6561]
 [127.7728]
 [127.8828]
 [126.9163]
 [126.9174]
 [127.9779]
 [127.2481]
 [127.5458]]
[[ 5.44840332e+03  2.73870674e+01 -8.05375009e+01 -4.12921118e+01
   1.35894856e+02 -1.73356832e+01 -9.20126209e+00 -1.00129642e+02
   6.91362604e+00]
 [ 2.73870674e+01  5.47821759e+03  6.49932668e+00 -4.91352540e+01
   4.60117028e+01 -4.21126007e+00 -6.40554166e+01  1.21542875e+00
   5.28829989e+01]
 [-8.05375009e+01  6.49932668e+00  5.51584416e+03  7.02031097e+01
   1.42972897e+01  5.16165061e+01  7.03966528e+01  1.40461628e+01
  -3.46599439e+01]
 [-4.12921118e+01 -4.91352540e+01  7.02031097e+01  5.50359400e+03
   1.73218722e+01  6.02841859e+01  6.02421498e+00 -8.00309152e+01
  -2.41932706e+00]
 [ 1.35894856e+02  4.60117028e+01  1.42972897e+01  1.73218722e+01
   5.53078712e+03 -1.49242178e+01  5.48973093e+01 -4.85907777e+01
   2.60099020e+01]
 [-1.73356832e+01 -4.21126007e+00  5.16165061e+01  6.02841859e+01
  -1.49242178e+01  5.50152820e+03  1.38487274e+01  1.56215213e+0

In [10]:
cu.covariance[0,0]

5591.771157682662

In [11]:
cu.count

10000

In [13]:
cu.square_sum[0,0]

55917711.576826625

In [25]:
cu.covariance[0,0]/cu.covariance[0,0]

1.0

In [26]:
cu.covariance[0,1]/((cu.covariance[0,0]**0.5)*(cu.covariance[1,1]**0.5)) 

0.017259850427445513

In [27]:
cu.square_sum[0,1]/((cu.square_sum[0,0]**0.5)*(cu.square_sum[1,1]**0.5)) 

0.017259850427445513

Questions and points:
* Can we store the mean as an uint8?
    * Not as when we divide by count > 256 we naturally get a not uint8 number
* Can we scale to have unit variance?
    * This is Pearson's correlation - for entry jk we need to divide by the covariance value for sqrt(c\_jj\*c\_kk)
    * Based on here - http://users.stat.umn.edu/~helwig/notes/datamat-Notes.pdf, we just scale using a diagonal matrix
    * We divide x_dash by a vector containing the sqrt of the difference squared
    * Is the divider the skew? https://www.johndcook.com/blog/skewness_kurtosis/
    * Do we also track the SD and divide? Scaling factor is average of squares
    * Scaling factor is the diagonal entries of the covariance matrix
    * Do we just divide by the absolute value of x_dash?

We normalise at each iteration (or each batch)

We could do we scaling at each iteration. If we consider 0 to 1 to be scaled to the bit representation (so for 8-bit we have 0 to 255 etc).

But careful because once we start subtracting we will get unsigned integers.

We can normalise by dividing by the absolute but we need to do a trick for 0 values (or add a small factor)

In [31]:
a = np.array([-1, -2, 4, 5, 0])
norm = np.round(a / (np.abs(a) + 0.0001))
print(norm)

[-1. -1.  1.  1.  0.]


In [34]:
# Version 

class Correlation_Unit(Covariance_Unit):
    """A model to compute correlation online."""

    def update(self, x):
        """Add a data point x.
        
        x is a 1D numpy array of length 'size'.
        """
        self.count += 1
        # Remove old mean
        x_dash = x - self.mean
        # Compute mean update
        self.mean += np.round(x_dash / self.count)
        # Normalise x_dash
        x_dash_norm = np.round(x_dash / (np.abs(x_dash) + 0.0001))
        # Compute covariance update
        self.square_sum += np.dot(x_dash_norm, x_dash_norm.T)
        
        

In [81]:
%%time
size = (3, 3)
length = size[0]*size[1]

cu = Correlation_Unit(length)

for i in range(0, 10000):
    # get a 3x3 grid of random 8-bit integers
    rand_ints = get_rand_ints(8, size)
    flattened = rand_ints.reshape(length, 1).astype(np.uint8)
    cu.update(flattened)

print(cu.count, cu.mean, cu.covariance, sep="\n")

10000
[[127.]
 [131.]
 [133.]
 [122.]
 [123.]
 [133.]
 [133.]
 [130.]
 [126.]]
[[ 9.958e-01 -2.600e-03  2.060e-02  1.390e-02  2.410e-02 -1.950e-02
   1.130e-02 -4.000e-04  1.180e-02]
 [-2.600e-03  9.966e-01  6.700e-03  1.390e-02  1.430e-02 -1.610e-02
   9.800e-03 -1.360e-02 -2.200e-03]
 [ 2.060e-02  6.700e-03  9.945e-01 -1.460e-02  1.480e-02  1.580e-02
  -7.800e-03  1.450e-02 -9.500e-03]
 [ 1.390e-02  1.390e-02 -1.460e-02  9.959e-01  2.600e-03  3.800e-03
  -1.300e-02 -1.450e-02 -4.300e-03]
 [ 2.410e-02  1.430e-02  1.480e-02  2.600e-03  9.971e-01 -1.800e-03
   5.400e-03  4.900e-03  5.900e-03]
 [-1.950e-02 -1.610e-02  1.580e-02  3.800e-03 -1.800e-03  9.965e-01
  -7.700e-03  1.990e-02  5.900e-03]
 [ 1.130e-02  9.800e-03 -7.800e-03 -1.300e-02  5.400e-03 -7.700e-03
   9.951e-01  3.300e-03  8.000e-04]
 [-4.000e-04 -1.360e-02  1.450e-02 -1.450e-02  4.900e-03  1.990e-02
   3.300e-03  9.956e-01  6.100e-03]
 [ 1.180e-02 -2.200e-03 -9.500e-03 -4.300e-03  5.900e-03  5.900e-03
   8.000e-04  6.100e-

In [38]:
cu.covariance[0, 0]

0.9961

Actually faster to normalise at the end.

But we need to be careful of overflow over lots of samples as the sume of squares will get very big for integer values.

## Binary Values and Covariance

Actually, let's see what happens when we apply a binary threshold to the input data and then take the covariance.

In [47]:
#from src.var_processor.pb_threshold import pb_threshold

def pb_threshold(input_values):
    """Apply a probablistic binary threshold to the input_values."""
    input_size = input_values.shape
    data_type = input_values.dtype
    bit_size = data_type.itemsize*8
    rand_ints = get_rand_ints(bit_size, input_size)
    binary_values = np.where(input_values > rand_ints, 1, 0)
    return binary_values

In [82]:
%%time
size = (3, 3)
length = size[0]*size[1]

cu = Covariance_Unit(length)

for i in range(0, 10000):
    # get a 3x3 grid of random 8-bit integers
    rand_ints = get_rand_ints(8, size).astype(np.uint8)
    thresholded = pb_threshold(rand_ints)
    flattened = thresholded.reshape(length, 1)
    cu.update(flattened)

print(cu.count, cu.mean, cu.covariance, sep="\n")

10000
[[0.4958]
 [0.5066]
 [0.5015]
 [0.4992]
 [0.497 ]
 [0.5033]
 [0.4952]
 [0.5028]
 [0.4999]]
[[ 2.50186820e-01  8.36299275e-04 -1.13808804e-03 -1.12136938e-03
  -3.71343386e-03  2.98742430e-03  1.08950235e-03  2.31453833e-03
   3.76951093e-03]
 [ 8.36299275e-04  2.50279490e-01 -2.52116950e-04 -8.22433063e-04
  -2.68356350e-03 -3.48541760e-03 -8.99611734e-04 -6.36173920e-04
  -1.35102301e-03]
 [-1.13808804e-03 -2.52116950e-04  2.50258582e-01 -7.25005193e-05
   2.65364055e-03 -2.38452726e-04 -1.00275129e-03  4.87834905e-05
   9.16106900e-04]
 [-1.12136938e-03 -8.22433063e-04 -7.25005193e-05  2.50239437e-01
   1.29975085e-03  4.72407901e-03  2.91564132e-03  5.22789024e-03
  -3.49098927e-03]
 [-3.71343386e-03 -2.68356350e-03  2.65364055e-03  1.29975085e-03
   2.50207450e-01 -3.54917152e-03  8.13469168e-05 -2.45221585e-03
   4.54400126e-04]
 [ 2.98742430e-03 -3.48541760e-03 -2.38452726e-04  4.72407901e-03
  -3.54917152e-03  2.50251944e-01 -2.48974656e-03  2.13059917e-03
   2.52416170e-0

In [49]:
cu.covariance[0,0]

0.2502339448910254

In [52]:
cu.square_sum[0,0]

2502.339448910254

Using binary values kinda normalises our covariance anyway without a need for additional scaling.

Again it would be good to get rid of the count value - just have normalised on each round. Info on normalsiing here - https://stackoverflow.com/questions/2850743/numpy-how-to-quickly-normalize-many-vectors - fast normalisation is given by np.sqrt(np.einsum('...i,...i', vectors, vectors))

In [None]:
class Correlation_Unit(Covariance_Unit):
    """A model to compute correlation online."""

    def update(self, x):
        """Add a data point x.
        
        x is a 1D numpy array of length 'size'.
        """
        self.count += 1
        # Remove old mean
        x_dash = x - self.mean
        # Compute mean update
        self.mean += x_dash / self.count
        # Normalise x_dash
        x_dash_norm = np.round(x_dash / (np.abs(x_dash) + 0.0001))
        # Compute covariance update
        self.square_sum += np.dot(x_dash_norm, x_dash_norm.T)

In [83]:
%%time
size = (3, 3)
length = size[0]*size[1]

cu = Correlation_Unit(length)

for i in range(0, 10000):
    # get a 3x3 grid of random 8-bit integers
    rand_ints = get_rand_ints(8, size).astype(np.uint8)
    thresholded = pb_threshold(rand_ints)
    flattened = thresholded.reshape(length, 1)
    cu.update(flattened)

print(cu.count, cu.mean, cu.covariance, sep="\n")

10000
[[1.]
 [0.]
 [0.]
 [0.]
 [1.]
 [1.]
 [1.]
 [0.]
 [0.]]
[[ 0.4931 -0.2382 -0.2466 -0.2454  0.2453  0.2411  0.2507 -0.2465 -0.2473]
 [-0.2382  0.4935  0.2495  0.2499 -0.2499 -0.2477 -0.253   0.2429  0.2482]
 [-0.2466  0.2495  0.5002  0.2492 -0.2491 -0.2521 -0.2532  0.2539  0.2549]
 [-0.2454  0.2499  0.2492  0.499  -0.2552 -0.2483 -0.2516  0.2528  0.2496]
 [ 0.2453 -0.2499 -0.2491 -0.2552  0.5029  0.254   0.2519 -0.2505 -0.249 ]
 [ 0.2411 -0.2477 -0.2521 -0.2483  0.254   0.4981  0.2497 -0.2436 -0.2484]
 [ 0.2507 -0.253  -0.2532 -0.2516  0.2519  0.2497  0.5068 -0.2553 -0.2517]
 [-0.2465  0.2429  0.2539  0.2528 -0.2505 -0.2436 -0.2553  0.5027  0.2505]
 [-0.2473  0.2482  0.2549  0.2496 -0.249  -0.2484 -0.2517  0.2505  0.5009]]
CPU times: user 1.12 s, sys: 0 ns, total: 1.12 s
Wall time: 1.12 s


We don't need to normalise x_dash because it will already be small.

We do need to normalise the square_sum...

In [73]:
class Scaled_CU:
    """A model to compute s scaled update online with no count."""

    def __init__(self, size, update_factor=0.01):
        """Initialise.
        
        size is an integer setting the 1D size of an input;
        update_factor is a factor 0 > f <= 1.
        
        The update factor should be ~ batch size."""
        self.size = size
        self.uf = update_factor
        self.mean = np.zeros(shape=(size, 1))
        self.square_sum = np.zeros(shape=(size, size))
        self.reset_count = 0

    def update(self, x):
        """Add a data point x.
        
        x is a 1D numpy array of length 'size'.
        """
        # Remove old mean
        x_dash = x - self.mean
        # Compute mean update
        self.mean += self.uf*(x_dash)
        # Compute square matrix scale factor based on SD - but this division may slow down
        # Extra factor to avoid division by 0
        mask = x_dash != 0
        x_dash[mask] = x_dash[mask] * np.abs(x_dash[mask])**-1
        # Compute covariance update
        self.square_sum += np.dot(x_dash, x_dash.T)
        
        # Now we need to normalise again the square sum
        # We can do this every 1/update_factor samples
        self.reset_count += 1
        correction_factor = self.uf**-1
        if self.reset_count > correction_factor:
            self.reset_count = 0
            self.square_sum = self.square_sum / correction_factor
    
    @property
    def correlation(self):
        """Compute covariance when requested."""
        pass

Nearly there - but because we've scaled our square_sum to have max sum of 1, as we add consecutive, we end up with count values.

In [84]:
%%time
size = (3, 3)
length = size[0]*size[1]

cu = Scaled_CU(length)

for i in range(0, 10000):
    # get a 3x3 grid of random 8-bit integers
    rand_ints = get_rand_ints(8, size).astype(np.uint8)
    thresholded = pb_threshold(rand_ints)
    flattened = thresholded.reshape(length, 1)
    cu.update(flattened)

print(cu.mean, cu.square_sum, sep="\n")

[[0.46948023]
 [0.41075422]
 [0.57070639]
 [0.43224553]
 [0.46227332]
 [0.49466475]
 [0.55275045]
 [0.47899044]
 [0.45956529]]
[[ 2.02020202  1.02890303 -1.11191677  1.11210305 -1.10949301 -0.95088295
   1.17010893 -1.09288509 -0.98950697]
 [ 1.02890303  2.02020202 -1.20949895  1.08849299 -0.96949915 -0.96849301
   0.90970695 -0.98970691 -0.93151307]
 [-1.11191677 -1.20949895  2.02020202 -1.21068689  1.16890897  0.92909911
  -1.11109721  0.91149709  1.09091089]
 [ 1.11210305  1.08849299 -1.21068689  2.02020202 -1.08869889 -0.97130519
   1.07089493 -0.99089897 -1.09191305]
 [-1.10949301 -0.96949915  1.16890897 -1.08869889  2.02020202  0.93069887
  -1.14868085  0.86908901  1.05010297]
 [-0.95088295 -0.96849301  0.92909911 -0.97130519  0.93069887  2.02020202
  -0.94891107  1.03129487  1.12948515]
 [ 1.17010893  0.90970695 -1.11109721  1.07089493 -1.14868085 -0.94891107
   2.02020202 -0.93050101 -0.86907905]
 [-1.09288509 -0.98970691  0.91149709 -0.99089897  0.86908901  1.03129487
  -0.930

In [67]:
cu.square_sum[-1, -1]

100.00000000001425

Some of the values above are too high

In [77]:
class Scaled_CU2:
    """A model to compute s scaled update online with no count."""

    def __init__(self, size):
        """Initialise.
        
        Args:
            size: integer setting the 1D size of an input.
        """
        self.size = size
        self.count = 0
        self.x_sum = np.zeros(shape=(size, 1))
        self.square_sum = np.zeros(shape=(size, size))

    def update(self, x):
        """Add a data point x.
        
        x is a 1D numpy array of length 'size'.
        """
        self.count += 1
        self.x_sum += x
        x_dash = self.x_sum - self.count*x
        scale_factor = self.count*(self.count+1)
        self.square_sum += (scale_factor**-1)*np.dot(x_dash, x_dash.T)
    
    @property
    def mean(self):
        """Compute mean when requested."""
        return self.x_sum / self.count
    
    @property
    def correlation(self):
        """Compute covariance when requested."""
        return self.square_sum / self.count

In [85]:
%%time
size = (3, 3)
length = size[0]*size[1]

cu = Scaled_CU2(length)

for i in range(0, 10000):
    # get a 3x3 grid of random 8-bit integers
    rand_ints = get_rand_ints(8, size).astype(np.uint8)
    thresholded = pb_threshold(rand_ints)
    flattened = thresholded.reshape(length, 1)
    cu.update(flattened)

print(cu.mean, cu.correlation, sep="\n")

[[0.4946]
 [0.4944]
 [0.5032]
 [0.5061]
 [0.5015]
 [0.4909]
 [0.5098]
 [0.4986]
 [0.4931]]
[[ 2.49574795e-01 -3.84130589e-03  4.60581043e-03 -2.06704296e-03
  -2.35228749e-03  2.21821548e-03  2.53346294e-03  1.47247746e-03
  -5.27575357e-04]
 [-3.84130589e-03  2.49542091e-01 -4.37181132e-03 -1.19821839e-03
   7.71378418e-04 -3.13733726e-03  9.30436269e-04 -1.87617167e-03
  -6.54775579e-04]
 [ 4.60581043e-03 -4.37181132e-03  2.49573312e-01 -5.32032097e-04
   7.04100103e-03  1.45058157e-03 -2.41259530e-04  1.32179210e-03
   1.38695247e-04]
 [-2.06704296e-03 -1.19821839e-03 -5.32032097e-04  2.49546800e-01
  -2.82985470e-03  2.89042243e-03 -3.81524404e-03 -4.96131295e-03
  -1.16338861e-03]
 [-2.35228749e-03  7.71378418e-04  7.04100103e-03 -2.82985470e-03
   2.49596163e-01 -1.51390050e-03 -2.66830147e-03 -6.40916506e-03
   6.84623922e-04]
 [ 2.21821548e-03 -3.13733726e-03  1.45058157e-03  2.89042243e-03
  -1.51390050e-03  2.49499076e-01  6.73865266e-03  9.83409233e-04
   2.00258587e-03]
 [ 

In [86]:
%%time
size = (3, 3)
length = size[0]*size[1]

cu = Scaled_CU2(length)

for i in range(0, 30000):
    # get a 3x3 grid of random 8-bit integers
    rand_ints = get_rand_ints(8, size).astype(np.uint8)
    thresholded = pb_threshold(rand_ints)
    flattened = thresholded.reshape(length, 1)
    cu.update(flattened)

print(cu.mean, cu.correlation, sep="\n")

[[0.5013    ]
 [0.49573333]
 [0.49636667]
 [0.4964    ]
 [0.49556667]
 [0.50046667]
 [0.504     ]
 [0.50086667]
 [0.50136667]]
[[ 2.49857525e-01  1.62767234e-03 -2.02911061e-03  1.33547234e-03
   7.73800037e-04 -5.80474215e-04  4.79195511e-04  1.14245432e-04
   5.32403555e-04]
 [ 1.62767234e-03  2.49832204e-01 -1.61735733e-03  5.18056408e-04
   5.23566788e-04 -1.23963515e-03 -3.16587579e-03 -6.28717574e-04
   3.35572578e-03]
 [-2.02911061e-03 -1.61735733e-03  2.49827254e-01  1.42226402e-03
   2.49880375e-04 -5.46248548e-04  1.59746203e-03  7.34813152e-04
   4.46889823e-04]
 [ 1.33547234e-03  5.18056408e-04  1.42226402e-03  2.49827757e-01
  -1.00494204e-03 -2.91974910e-03  1.57329005e-03  8.03026623e-04
   2.80379646e-03]
 [ 7.73800037e-04  5.23566788e-04  2.49880375e-04 -1.00494204e-03
   2.49823633e-01 -6.34483193e-04  5.60258300e-04  2.79203076e-03
   5.45643174e-04]
 [-5.80474215e-04 -1.23963515e-03 -5.46248548e-04 -2.91974910e-03
  -6.34483193e-04  2.49846219e-01  7.01011331e-04  2

The thresholding slows us up more.

Could we combine an online and batch update to allow continuous operation?

E.g. reset every n samples and sum as two n batches?

We have:

```
XX_AB = XX_A + XX_B + (n_A*n_B)/(n_AB) * (x_bar_A - x_bar_B)(x_bar_A - x_bar_B)
```

Or add the covariances then add a factor based on the difference of the means, where the new meanP:
```
x_bar_X = (n_A*x_bar_A + n_b*x_bar_B) / (n_A + n_B)
```

In [None]:
class Continuous_CU:
    """A model to compute a continous mean and covariance."""

    def __init__(self, size, batch_size=1000):
        """Initialise.
        
        size is an integer setting the 1D size of an input."""
        self.size = size
        self.count = 0
        self.x_sum = np.zeros(shape=(size, 1))
        self.square_sum = np.zeros(shape=(size, size))
        # Additional variables to store running mean + covar
        self.mean = np.zeros(shape=(size, 1))
        self.covariance = np.zeros(shape=(size, size))
        # Running count for batch
        self.batch_count = 0
        self.batch_size = batch_size

    def update(self, x):
        """Add a data point x.
        
        x is a 1D numpy array of length 'size'.
        """
        self.count += 1
        self.x_sum += x
        x_dash = self.x_sum - self.count*x
        scale_factor = self.count*(self.count+1)
        self.square_sum += (scale_factor**-1)*np.dot(x_dash, x_dash.T)
        
    def update_mean(self):
        """Update mean after a batch is processed."""
        pass
    
    def update_covariance(self):
        """Update covariance after a batch is processed."""
        pass
    
    @property
    def covariance(self):
        """Compute covariance when requested."""
        return self.square_sum / self.count

## Timing

I think the getting of random integers and the thresholding is reasonably slow. Let's have a look at timing.

In [89]:
import cProfile
cProfile.run("""rand_ints = get_rand_ints(8, size).astype(np.uint8)""")

         63 function calls in 0.000 seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 <string>:1(<module>)
        2    0.000    0.000    0.000    0.000 <string>:12(__new__)
        1    0.000    0.000    0.000    0.000 pb_threshold.py:6(get_rand_ints)
        2    0.000    0.000    0.000    0.000 version.py:271(__init__)
        8    0.000    0.000    0.000    0.000 version.py:282(<genexpr>)
        6    0.000    0.000    0.000    0.000 version.py:420(_parse_letter_version)
        2    0.000    0.000    0.000    0.000 version.py:461(_parse_local_version)
        2    0.000    0.000    0.000    0.000 version.py:474(_cmpkey)
        2    0.000    0.000    0.000    0.000 version.py:48(parse)
        3    0.000    0.000    0.000    0.000 version.py:490(<lambda>)
        1    0.000    0.000    0.000    0.000 version.py:74(__lt__)
        1    0.000    0.000    0.000    0.000 version.

In [90]:
%%timeit
rand_ints = get_rand_ints(8, size).astype(np.uint8)
thresholded = pb_threshold(rand_ints)
flattened = thresholded.reshape(length, 1)

77.9 µs ± 890 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)


In [91]:
%%timeit
rand_ints = get_rand_ints(8, size).astype(np.uint8)

33.6 µs ± 515 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)


In [92]:
%%timeit
thresholded = pb_threshold(rand_ints)

45.8 µs ± 1.5 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
