## Working with NumPy Aggregate Functions

Note that the discussion below applies to most NumPy aggregate functions (such as `np.mean`, `np.max` and `np.min`), not just `np.sum`. We'll use `np.sum` as the example to play with.

### Mindset

> With arrays, avoid thinking in terms of rows, columns, etc. That framing can trick you into believing that one dimension is "higher up" than another, as if rows contained columns/fields.
>
> That is **wrong**: all dimensions are equal. If you have a `3*3*3` array, think of it as a Rubik's cube, where every dimension is just as "important" and none "contains" another.
>
> The same obviously applies when there are more than 3 dimensions.
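
This symmetry is easy to see in code. The sketch below takes a hypothetical array whose dimensions have different lengths, `(2, 3, 4)`, so each axis is easy to tell apart. It sums over each axis in turn. Whichever axis you pick is the one that disappears from the result's shape, and no axis is treated differently from the others.

```python
import numpy as np

# A 3-D array with distinct axis lengths so each axis is easy to identify.
arr = np.ones((2, 3, 4))

# Summing over any single axis removes exactly that axis from the shape.
shapes = {axis: np.sum(arr, axis=axis).shape for axis in range(arr.ndim)}
print(shapes)  # {0: (3, 4), 1: (2, 4), 2: (2, 3)}
```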

### Set Up the Cube

```python
import numpy as np
```

```python
cube = np.ones((3, 3, 3))

cube
```

```
array([[[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]],

       [[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]]])
```

```python
cube.shape
```

```
(3, 3, 3)
```

```python
print(cube[0])
print(cube[0].shape)
```

```
[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]
(3, 3)
```

```python
print(cube[0][0])
print(cube[0][0].shape)
```

```
[1. 1. 1.]
(3,)
```

```python
cube[0][0][0]
```

```
1.0
```
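
Chained indexing like `cube[0][0][0]` peels off one dimension at a time. NumPy also accepts a single tuple index, `cube[0, 0, 0]`, which selects the same element in one step. Tuple indexing with `:` also lets you slice along *any* dimension, which fits the "all dimensions are equal" mindset. Fixing index 0 on dimension 1 or dimension 2 gives a `(3, 3)` array, just as `cube[0]` does.

```python
import numpy as np

cube = np.ones((3, 3, 3))

# One tuple index selects the same element as chained indexing.
assert cube[0, 0, 0] == cube[0][0][0]

# Fix index 0 on each dimension in turn; every slice has shape (3, 3).
print(cube[0, :, :].shape)  # (3, 3)
print(cube[:, 0, :].shape)  # (3, 3)
print(cube[:, :, 0].shape)  # (3, 3)
```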

### Let's update some of the cube values

```python
cube[0][0][0] = 5.0
cube[1][0][0] = 55.0
cube[2][0][0] = 99.0

cube
```

```
array([[[ 5.,  1.,  1.],
        [ 1.,  1.,  1.],
        [ 1.,  1.,  1.]],

       [[55.,  1.,  1.],
        [ 1.,  1.,  1.],
        [ 1.,  1.,  1.]],

       [[99.,  1.,  1.],
        [ 1.,  1.,  1.],
        [ 1.,  1.,  1.]]])
```

### Sum without axis

```python
np.sum(cube)
```

```
183.0
```

...or, equivalently, with the default `axis=None` passed explicitly:

```python
np.sum(cube, axis=None)
```

```
183.0
```
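
With `axis=None` (the default), `np.sum` reduces over every dimension. You get the same scalar by listing all axes explicitly as a tuple, since `axis` also accepts several dimensions at once. The array method `cube.sum()` is equivalent to `np.sum(cube)`. The snippet rebuilds the cube from above so it runs on its own.

```python
import numpy as np

# Rebuild the cube from above: all ones except three corner values.
cube = np.ones((3, 3, 3))
cube[0, 0, 0] = 5.0
cube[1, 0, 0] = 55.0
cube[2, 0, 0] = 99.0

total_default = np.sum(cube)                # axis=None: all dimensions
total_tuple = np.sum(cube, axis=(0, 1, 2))  # every axis listed explicitly
total_method = cube.sum()                   # method form of np.sum

print(total_default, total_tuple, total_method)  # 183.0 183.0 183.0
```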

### Sum with axis 0

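Collapse dimension 0: the values along dimension 0 are added together, so each remaining position `(j, k)` holds `cube[0, j, k] + cube[1, j, k] + cube[2, j, k]` and the result has shape `(3, 3)`.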

```python
np.sum(cube, axis=0)
```

```
array([[159.,   3.,   3.],
       [  3.,   3.,   3.],
       [  3.,   3.,   3.]])
```
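
To check what `axis=0` does, add the three `(3, 3)` slices along dimension 0 element by element. The result should match `np.sum(cube, axis=0)`. The cube is rebuilt so the snippet runs on its own.

```python
import numpy as np

cube = np.ones((3, 3, 3))
cube[0, 0, 0] = 5.0
cube[1, 0, 0] = 55.0
cube[2, 0, 0] = 99.0

# Add the slices along dimension 0 by hand.
manual = cube[0] + cube[1] + cube[2]

# np.sum with axis=0 produces the same (3, 3) array.
assert np.array_equal(np.sum(cube, axis=0), manual)
print(manual[0, 0])  # 159.0
```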

### Sum with axis 1

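Collapse dimension 1: each remaining position `(i, k)` holds `cube[i, 0, k] + cube[i, 1, k] + cube[i, 2, k]`, so the result again has shape `(3, 3)`.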

```python
np.sum(cube, axis=1)
```

```
array([[  7.,   3.,   3.],
       [ 57.,   3.,   3.],
       [101.,   3.,   3.]])
```
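
You can check `axis=1` the same way, by adding slices by hand. This time the slices are taken along dimension 1 (`cube[:, j, :]`). For example, the first entry `7` is `5 + 1 + 1`.

```python
import numpy as np

cube = np.ones((3, 3, 3))
cube[0, 0, 0] = 5.0
cube[1, 0, 0] = 55.0
cube[2, 0, 0] = 99.0

# Add the slices along dimension 1 by hand; each slice has shape (3, 3).
manual = cube[:, 0, :] + cube[:, 1, :] + cube[:, 2, :]

assert np.array_equal(np.sum(cube, axis=1), manual)
print(manual[:, 0].tolist())  # [7.0, 57.0, 101.0]
```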

### "Real World" Example

Below are two batches of people; each person has two attributes: number of eyes and salary.

```python
batch_1 = np.array([
    [2, 2000],
    [1, 2000],
    [2, 2000]
])

batch_2 = np.array([
    [2, 1000],
    [1, 1000],
    [2, 1000]
])

batches = np.array([batch_1, batch_2])
```

```python
# 2 batches, each batch with three people, each person with 2 attributes
batches.shape
```

```
(2, 3, 2)
```

#### Get total eyes and total salary, per batch

That is, collapse dimension 1 (the **people dimension**), so that all we have left is the **batch info** and the **number-of-eyes/salary info**:
```python
np.sum(batches, axis=1)
```

```
array([[   5, 6000],
       [   5, 3000]])
```
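
As noted at the start, the same `axis` logic carries over to other aggregates. In this sketch, `np.mean` over axis 1 gives the average eyes and salary per batch. Passing the tuple `axis=(0, 1)` collapses both the batch and people dimensions, which leaves totals across everyone. The sketch recreates `batches` so it runs standalone.

```python
import numpy as np

batch_1 = np.array([[2, 2000], [1, 2000], [2, 2000]])
batch_2 = np.array([[2, 1000], [1, 1000], [2, 1000]])
batches = np.array([batch_1, batch_2])  # shape (2, 3, 2)

# Average eyes and salary per batch: collapse the people dimension.
per_batch_mean = np.mean(batches, axis=1)  # shape (2, 2)

# Totals across all people in all batches: collapse batch and people dims.
overall_total = np.sum(batches, axis=(0, 1))  # shape (2,)

print(per_batch_mean[:, 1].tolist())  # [2000.0, 1000.0]
print(overall_total.tolist())         # [10, 9000]
```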

### keepdims

Passing `keepdims=True` keeps the same *number* of dimensions as the input. It does not keep each dimension's size: the summed dimension stays in place but shrinks to length 1.
```python
new_array = np.sum(cube, axis=1, keepdims=True)

new_array
```

```
array([[[  7.,   3.,   3.]],

       [[ 57.,   3.,   3.]],

       [[101.,   3.,   3.]]])
```

```python
new_array.shape
```

```
(3, 1, 3)
```
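
Another way to see where the length-1 axis comes from: `keepdims=True` gives the same result as summing normally and then re-inserting the collapsed axis with `np.expand_dims`.

```python
import numpy as np

cube = np.ones((3, 3, 3))
cube[0, 0, 0] = 5.0
cube[1, 0, 0] = 55.0
cube[2, 0, 0] = 99.0

kept = np.sum(cube, axis=1, keepdims=True)  # shape (3, 1, 3)
dropped = np.sum(cube, axis=1)              # shape (3, 3)
restored = np.expand_dims(dropped, axis=1)  # re-insert axis 1 -> (3, 1, 3)

assert kept.shape == restored.shape == (3, 1, 3)
assert np.array_equal(kept, restored)
```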

```python
cube.shape
```

```
(3, 3, 3)
```

```python
print(
    new_array.ndim,  # same as len(new_array.shape)
    cube.ndim
)
```

```
3 3
```
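
A common reason to use `keepdims=True` is to broadcast the result back against the original array. The sketch below uses a hypothetical task: express each value as a fraction of its dimension-1 total. The `(3, 1, 3)` sum lines up with `cube`'s dimensions. The `(3, 3)` sum without `keepdims` is broadcast as `(1, 3, 3)` instead, which silently pairs values with the wrong totals.

```python
import numpy as np

cube = np.ones((3, 3, 3))
cube[0, 0, 0] = 5.0
cube[1, 0, 0] = 55.0
cube[2, 0, 0] = 99.0

# (3, 1, 3) broadcasts along dimension 1, so each value is divided by
# the total of its own (i, :, k) line.
fractions = cube / np.sum(cube, axis=1, keepdims=True)

# (3, 3) is treated as (1, 3, 3) by broadcasting: it runs without error,
# but divides cube[i, j, k] by the total for (j, k) instead of (i, k).
misaligned = cube / np.sum(cube, axis=1)

# Every (i, :, k) line of the correct result sums to 1; the other does not.
assert np.allclose(np.sum(fractions, axis=1), 1.0)
assert not np.allclose(np.sum(misaligned, axis=1), 1.0)
```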
