## Array Summation in Python

In NumPy, `np.sum` is the standard way to sum elements of an array. It works on scalars, 1D vectors, and higher-dimensional arrays.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

# Sum all elements
np.sum(A)                  # 45 (same as np.sum(A, axis=None))

# Sum column-wise (axis=0)
np.sum(A, axis=0)          # array([12, 15, 18])

# Sum row-wise (axis=1)
np.sum(A, axis=1)          # array([ 6, 15, 24])
```

For a 2D array with shape `(rows, columns)`:
- `axis=0` collapses rows and keeps one value per column.
- `axis=1` collapses columns and keeps one value per row.
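The square matrix above hides the difference between the two result shapes. A minimal sketch with a non-square array makes it visible (the name `B` is just for illustration):

```python
import numpy as np

# A 2 x 3 array: 2 rows, 3 columns
B = np.array([[1, 2, 3],
              [4, 5, 6]])

col_totals = np.sum(B, axis=0)  # collapses the 2 rows -> one value per column
row_totals = np.sum(B, axis=1)  # collapses the 3 columns -> one value per row

print(col_totals, col_totals.shape)  # [5 7 9] (3,)
print(row_totals, row_totals.shape)  # [ 6 15] (2,)
```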

**Extra knowledge:**
- `np.sum` is vectorized: the loop runs in compiled code rather than in the Python interpreter, so it is typically much faster than Python's built-in `sum` on large numeric arrays.
- For higher-dimensional tensors in deep learning, `axis` controls whether you reduce over the batch dimension, feature dimension, time steps, etc.
- You can also pass a tuple of axes, e.g. `np.sum(A3d, axis=(1, 2))` to sum over multiple dimensions at once.
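A small sketch of reducing a 3D array, assuming a hypothetical `(batch, time, features)` layout; the shape `(2, 4, 3)` is illustrative only:

```python
import numpy as np

# Hypothetical layout: (batch=2, time=4, features=3)
A3d = np.arange(24).reshape(2, 4, 3)

per_sample = np.sum(A3d, axis=(1, 2))   # reduce time and features -> shape (2,)
per_feature = np.sum(A3d, axis=(0, 1))  # reduce batch and time -> shape (3,)

# Summing over a tuple of axes matches summing one axis at a time
assert np.array_equal(per_sample, A3d.sum(axis=2).sum(axis=1))

print(per_sample.shape, per_feature.shape)  # (2,) (3,)
```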

### Use of `keepdims`

The `keepdims` parameter in many NumPy reduction functions (such as `np.sum`, `np.max`, `np.mean`) is used to **keep the reduced axes with size 1** instead of removing them.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

# Without keepdims (default)
col_sum = np.sum(A, axis=0)            # shape: (3,)
row_sum = np.sum(A, axis=1)            # shape: (3,)

# With keepdims=True
col_sum_kd = np.sum(A, axis=0, keepdims=True)  # shape: (1, 3)
row_sum_kd = np.sum(A, axis=1, keepdims=True)  # shape: (3, 1)
```

Shapes:
- `A.shape` is `(3, 3)`.
- `col_sum.shape` is `(3,)`, but `col_sum_kd.shape` is `(1, 3)`.
- `row_sum.shape` is `(3,)`, but `row_sum_kd.shape` is `(3, 1)`.

**Why is `keepdims` useful?**
- It preserves dimensions so that results can **broadcast back** against the original array.
- This is very common in deep learning when normalizing over an axis (e.g., subtracting a per-row mean or max) while keeping shapes compatible for broadcasting.
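As a minimal sketch of that pattern, here is per-row mean centering. With `keepdims=True` the row means stay a `(3, 1)` column, so each row is shifted by its own mean:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]], dtype=float)

row_means = np.mean(A, axis=1, keepdims=True)  # shape (3, 1): [[2.], [5.], [8.]]
centered = A - row_means                       # (3, 3) - (3, 1) broadcasts along columns

print(centered)
# Each row of `centered` is [-1., 0., 1.], so every row now has mean 0
```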

## Broadcasting Rules in NumPy

Broadcasting is a powerful mechanism in NumPy that allows operations to be performed on arrays of different shapes **without** explicitly copying data.

Broadcasting rules:
1. If the arrays do not have the same number of dimensions, the shape of the smaller array is padded with ones on its **left** side until both shapes have the same number of dimensions.
2. For each dimension, the sizes must be either **equal** or **one**. If they differ and neither is 1, NumPy raises an error.
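3. In each dimension, the result takes the size that is not 1 (or the common size, if both are equal). For non-empty arrays this is simply the element-wise maximum of the padded input shapes.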
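The rules can be written out directly. The helper `broadcast_shape` below is a hypothetical name used only for illustration: a minimal sketch that pads, checks, and stretches size-1 dimensions. NumPy's own `np.broadcast_shapes` (available in NumPy 1.20 and later) is used only to cross-check it:

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: apply the broadcasting rules to two shapes."""
    # Rule 1: pad the shorter shape with ones on the left
    ndim = max(len(shape_a), len(shape_b))
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)

    result = []
    for da, db in zip(a, b):
        # Rule 2: sizes must be equal, or one of them must be 1
        if da != db and da != 1 and db != 1:
            raise ValueError(f"cannot broadcast {shape_a} with {shape_b}")
        # Rule 3: a size-1 dimension stretches to match the other size
        result.append(db if da == 1 else da)
    return tuple(result)

print(broadcast_shape((2, 3), (3,)))      # (2, 3)
print(broadcast_shape((3, 3), (3, 1)))    # (3, 3)
print(np.broadcast_shapes((2, 3), (3,)))  # (2, 3)
```

Stretching the size-1 side, rather than taking a plain maximum, also gives the right answer for size-0 dimensions, where `(0,)` and `(1,)` broadcast to `(0,)`.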

Example:
```python
import numpy as np

vector = np.array([1, 2, 3])          # shape: (3,)
matrix = np.array([[4, 5, 6],
                   [7, 8, 9]])       # shape: (2, 3)

result = matrix + vector             # shape: (2, 3)
print(result)
```
Output:
```
[[ 5  7  9]
 [ 8 10 12]]
```

Here, `vector` (shape `(3,)`) is treated as `(1, 3)` and then broadcast to `(2, 3)` to match `matrix`.
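To see the stretching explicitly, `np.broadcast_to` returns a read-only view of `vector` with the target shape. This sketch checks that adding that view gives the same result as relying on implicit broadcasting:

```python
import numpy as np

vector = np.array([1, 2, 3])          # shape: (3,)
matrix = np.array([[4, 5, 6],
                   [7, 8, 9]])       # shape: (2, 3)

# Read-only view with shape (2, 3); no data is copied, every row refers to `vector`
stretched = np.broadcast_to(vector, matrix.shape)
print(stretched)
# [[1 2 3]
#  [1 2 3]]

# Same result as the implicit broadcast in `matrix + vector`
assert np.array_equal(matrix + stretched, matrix + vector)
```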

**Extra knowledge:**
- You can use `np.newaxis` (or `None`) to insert size-1 dimensions and control broadcasting, e.g. `vector[:, np.newaxis]` gives shape `(3, 1)`.
- Many neural network operations rely on broadcasting to apply biases (`(1, features)`) across a whole batch (`(batch_size, features)`).
- A common error is: `ValueError: operands could not be broadcast together` — check the shapes and see which dimension pair violates the rules above.
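A short sketch of the three points above. The bias values and the `(4, 3)` batch shape are made up for illustration:

```python
import numpy as np

vector = np.array([1, 2, 3])                # shape: (3,)

# np.newaxis turns the vector into a column, shape (3, 1);
# (3, 1) + (3,) broadcasts to (3, 3), giving an "outer sum" table
outer_sum = vector[:, np.newaxis] + vector
print(outer_sum.shape)                      # (3, 3)

# Adding a bias row of shape (1, features) to a batch of shape (batch_size, features)
batch = np.zeros((4, 3))                    # hypothetical batch: 4 samples, 3 features
bias = np.array([[0.1, 0.2, 0.3]])          # shape: (1, 3)
print((batch + bias).shape)                 # (4, 3)

# Incompatible shapes: trailing sizes 3 and 2 differ and neither is 1
try:
    np.ones((2, 3)) + np.ones((2,))
except ValueError as err:
    print("ValueError:", err)
```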

### Example: Subtracting the Maximum Value from Each Row

Let's consider the matrix:

    A = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]

We want to subtract the maximum value of each row from every element in that row:

- From the first row, subtract 3.
- From the second row, subtract 6.
- From the third row, subtract 9.

So we get:

    A = [[1-3, 2-3, 3-3],
         [4-6, 5-6, 6-6],
         [7-9, 8-9, 9-9]]

After subtracting the maximum value from each row, we get:

    A_result = [[-2, -1, 0],
                [-2, -1, 0],
                [-2, -1, 0]]

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

# Subtract each row's maximum; keepdims=True keeps shape (3, 1) so broadcasting works row by row
A_result = A - np.max(A, axis=1, keepdims=True)

print(A_result)
```

### Broadcasting Explanation

#### Case 1: `keepdims=False` (default)

    A = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]

```python
np.max(A, axis=1)  # => [3, 6, 9]
```

- `np.max(A, axis=1)` returns a 1D array with shape `(3,)`.
- During broadcasting, this 1D array is padded on the left to shape `(1, 3)`, i.e. it is treated as a single **row**, when aligned with `A` (shape `(3, 3)`).

So, in the expression:

```python
A - np.max(A, axis=1)
```

the row `[3, 6, 9]` is repeated down all three rows of `A`. As a result, 3 is subtracted from the first **column**, 6 from the second, and 9 from the third, giving `[[-2, -4, -6], [1, -1, -3], [4, 2, 0]]`. The expression runs without error, but it is **not** the row-wise subtraction we wanted.

#### Case 2: `keepdims=True`

    A = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]

```python
np.max(A, axis=1, keepdims=True)  # => [[3], [6], [9]]
```

- `np.max(A, axis=1, keepdims=True)` returns a 2D array with shape `(3, 1)`, i.e. a **column**.
- During broadcasting, this `(3, 1)` array is stretched along the columns to match `A` (shape `(3, 3)`).

So, in the expression:

```python
A - np.max(A, axis=1, keepdims=True)
```

the array `[[3], [6], [9]]` is broadcast across the columns, so every element is reduced by its own row's maximum, and we get the intended `A_result` from the example above: `[[-2, -1, 0], [-2, -1, 0], [-2, -1, 0]]`.
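A quick side-by-side check of both cases, as a minimal sketch. It confirms that only the `keepdims=True` version subtracts each row's own maximum:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

# Case 1: shape (3,) is treated as a row (1, 3); subtracts 3, 6, 9 from the columns
case1 = A - np.max(A, axis=1)
# Case 2: shape (3, 1) is a column; subtracts each row's own maximum
case2 = A - np.max(A, axis=1, keepdims=True)

print(case1)
# [[-2 -4 -6]
#  [ 1 -1 -3]
#  [ 4  2  0]]
print(case2)
# [[-2 -1  0]
#  [-2 -1  0]
#  [-2 -1  0]]
```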

The same reductions, run in a notebook on a nested Python list (`np.sum` accepts any array-like input):

    import numpy as np

    A = [
        [1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]
    ]

    print(np.sum(A))

    print(np.sum(A, axis=0))  # Sum along columns
    print(np.sum(A, axis=0).shape)

    print(np.sum(A, axis=1))  # Sum along rows
    print(np.sum(A, axis=1).shape)

    print(np.sum(A, axis=0, keepdims=True))  # Keep dimensions
    print(np.sum(A, axis=0, keepdims=True).shape)

    print(np.sum(A, axis=1, keepdims=True))  # Keep dimensions
    print(np.sum(A, axis=1, keepdims=True).shape)

Output:

    45
    [12 15 18]
    (3,)
    [ 6 15 24]
    (3,)
    [[12 15 18]]
    (1, 3)
    [[ 6]
     [15]
     [24]]
    (3, 1)
