### Universal Functions (ufuncs) in NumPy

- Universal functions, or ufuncs, are vectorized functions provided by NumPy that operate element-wise on arrays.
- Vectorized operations act on whole arrays at once, with no explicit Python loops.
- These operations are implemented as ufuncs, which:
    - Are written in low-level C (much faster than Python).
    - Loop over the entire array internally, avoiding slow Python loops.
    - Take advantage of CPU-level optimizations and memory efficiency.
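
As a quick check of the points above, the following sketch inspects a few ufunc objects. The variable names are illustrative; `np.ufunc` and the `nin` attribute are part of NumPy's public API.

```python
import numpy as np

# Every ufunc is an instance of the np.ufunc class.
print(type(np.add))
is_ufunc = isinstance(np.add, np.ufunc)

# nin reports how many input arrays a ufunc takes.
binary_inputs = np.add.nin        # 2: np.add is a binary ufunc
unary_inputs = np.negative.nin    # 1: np.negative is a unary ufunc

# The same call accepts nested lists and arrays of any dimension.
squares = np.square([[1, 2], [3, 4]])
print(squares)
```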

In [57]:
import numpy as np

In [131]:
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

# Element-wise addition; np.add(x, y) is equivalent to x + y
z = np.add(x, y)
print(z)

[5 7 9]


#### Slow: Using Python Loops Instead of Vectorization
When you use Python for-loops to operate on each element one by one, the computation becomes much slower.

##### Example of slow computation:

In [132]:
x = [1, 2, 3]
y = [4, 5, 6]
z = []

for i in range(len(x)):
    z.append(x[i] + y[i])
z

[5, 7, 9]

This is slow because:

- Each iteration goes through the Python interpreter.
- Python is not optimized for numeric computation.
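
To see the difference on your own machine, a rough timing sketch might look like the following. The array size, the number of repetitions, and the function names are arbitrary choices for illustration, and the measured times depend on hardware and NumPy version.

```python
import timeit
import numpy as np

n = 100_000
x_list, y_list = list(range(n)), list(range(n))
x_arr, y_arr = np.arange(n), np.arange(n)

def add_with_loop():
    # One trip through the Python interpreter per element.
    z = []
    for i in range(len(x_list)):
        z.append(x_list[i] + y_list[i])
    return z

def add_with_ufunc():
    # A single call; the element loop runs in compiled code.
    return np.add(x_arr, y_arr)

loop_time = timeit.timeit(add_with_loop, number=20)
ufunc_time = timeit.timeit(add_with_ufunc, number=20)
print(f"loop:  {loop_time:.4f} s for 20 runs")
print(f"ufunc: {ufunc_time:.4f} s for 20 runs")

# Both approaches produce the same values.
same_result = add_with_loop() == add_with_ufunc().tolist()
print(same_result)
```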

#### Benefits of ufuncs:
- Fast: Implemented in C for performance.
- Readable: Clean syntax compared to loops.
- Broadcasting: Can handle arrays of different (but compatible) shapes.
- Memory Efficient: Avoids unnecessary copying or looping.
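
One way the memory point shows up in practice is the `out` argument that ufuncs accept, which writes the result into an existing array rather than allocating a new one. A minimal sketch, with illustrative variable names:

```python
import numpy as np

x = np.arange(5)
y = np.empty(5, dtype=x.dtype)

# Write the result straight into y instead of allocating a new array.
result = np.multiply(x, 10, out=y)

print(y)            # [ 0 10 20 30 40]
print(result is y)  # True: the ufunc returns the array passed as out
```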

#### NumPy Operators and Their Equivalent Universal Functions (ufuncs)

| **Operator** | **Equivalent ufunc**   | **Description**                        |
|--------------|------------------------|----------------------------------------|
| `+`          | `np.add`               | Addition (e.g., 1 + 1 = 2)             |
| `-`          | `np.subtract`          | Subtraction (e.g., 3 - 2 = 1)          |
| `-` (unary)  | `np.negative`          | Unary negation (e.g., -2)              |
| `*`          | `np.multiply`          | Multiplication (e.g., 2 * 3 = 6)       |
| `/`          | `np.divide`            | Division (e.g., 3 / 2 = 1.5)           |
| `//`         | `np.floor_divide`      | Floor division (e.g., 3 // 2 = 1)      |
| `**`         | `np.power`             | Exponentiation (e.g., 2 ** 3 = 8)      |
| `%`          | `np.mod`               | Modulus/remainder (e.g., 9 % 4 = 1)    |


In [62]:
# Define two sample arrays
a = np.array([10, 20, 30])
b = np.array([3, 4, 5])

print("a:", a)
print("b:", b)

a: [10 20 30]
b: [3 4 5]


In [63]:
# Addition
print("\nAddition")
print("a + b:", a + b)                  # Using operator
print("np.add(a, b):", np.add(a, b))    # Using ufunc


Addition
a + b: [13 24 35]
np.add(a, b): [13 24 35]


In [64]:
# Subtraction
print("\nSubtraction")
print("a - b:", a - b)
print("np.subtract(a, b):", np.subtract(a, b))


Subtraction
a - b: [ 7 16 25]
np.subtract(a, b): [ 7 16 25]


In [65]:
# Unary Negation
print("\nUnary Negation")
print("-a:", -a)
print("np.negative(a):", np.negative(a))


Unary Negation
-a: [-10 -20 -30]
np.negative(a): [-10 -20 -30]


In [66]:
# Multiplication
print("\nMultiplication")
print("a * b:", a * b)
print("np.multiply(a, b):", np.multiply(a, b))


Multiplication
a * b: [ 30  80 150]
np.multiply(a, b): [ 30  80 150]


In [67]:
# Division
print("\nDivision")
print("a / b:", a / b)
print("np.divide(a, b):", np.divide(a, b))


Division
a / b: [3.33333333 5.         6.        ]
np.divide(a, b): [3.33333333 5.         6.        ]


In [68]:
# Floor Division
print("\nFloor Division")
print("a // b:", a // b)
print("np.floor_divide(a, b):", np.floor_divide(a, b))


Floor Division
a // b: [3 5 6]
np.floor_divide(a, b): [3 5 6]


In [69]:
# Exponentiation
print("\nExponentiation")
print("a ** b:", a ** b)
print("np.power(a, b):", np.power(a, b))


Exponentiation
a ** b: [    1000   160000 24300000]
np.power(a, b): [    1000   160000 24300000]


In [70]:
# Modulus
print("\nModulus")
print("a % b:", a % b)
print("np.mod(a, b):", np.mod(a, b))


Modulus
a % b: [1 0 0]
np.mod(a, b): [1 0 0]


### Aggregation in NumPy

- Often when faced with a large amount of data, a first step is to compute summary statistics for the data in question.
- Perhaps the most common summary statistics are the mean and standard deviation, which allow you to summarize the “typical” values in a dataset, but other aggregates are useful as well (the sum, product, median, minimum and maximum, quantiles, etc.).
- NumPy has fast built-in aggregation functions for working on arrays; we’ll discuss and demonstrate some of them here.
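
As a small illustration of the summary statistics mentioned above, here is a sketch on a made-up eight-element dataset (the values are chosen only so the results come out round):

```python
import numpy as np

data = np.array([2, 4, 4, 4, 5, 5, 7, 9])

print("mean:  ", np.mean(data))     # 5.0
print("std:   ", np.std(data))      # 2.0 (population standard deviation)
print("median:", np.median(data))   # 4.5
print("min/max:", np.min(data), np.max(data))   # 2 9
```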

#### Summing the Values in an Array

Python itself can do this using the built-in `sum` function:

In [1]:
import numpy as np

In [136]:
L = np.arange(1, 10)
L

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [138]:
# using python sum function
print(sum(L))

45


In [139]:
# using NumPy’s sum function
print(np.sum(L))

45


In [142]:
L2 = np.random.random(100)
L2

array([0.83228301, 0.92515159, 0.56283368, 0.73879196, 0.82239864,
       0.66628648, 0.58521561, 0.382839  , 0.82249128, 0.98211114,
       0.55632782, 0.52938123, 0.63812253, 0.33650516, 0.60826023,
       0.28961307, 0.12366061, 0.35848841, 0.70389458, 0.21693803,
       0.70368317, 0.25967293, 0.5189442 , 0.19932382, 0.6621334 ,
       0.69177577, 0.62689146, 0.70194795, 0.03935499, 0.10927607,
       0.29862856, 0.67681751, 0.75933015, 0.89115502, 0.670748  ,
       0.61838308, 0.62311947, 0.37849324, 0.95983144, 0.31659032,
       0.44156135, 0.70815906, 0.74746149, 0.57119435, 0.5982731 ,
       0.07018558, 0.3517191 , 0.47580435, 0.2267789 , 0.3786267 ,
       0.99255809, 0.19057276, 0.96599579, 0.63758636, 0.86652707,
       0.43506456, 0.44497028, 0.55302618, 0.29846117, 0.02510281,
       0.70544558, 0.01491077, 0.96732296, 0.69227771, 0.30786207,
       0.05603385, 0.37101798, 0.36660153, 0.59538903, 0.58331037,
       0.71592892, 0.36366874, 0.86697695, 0.36789483, 0.80155

In [143]:
print(np.sum(L2))

50.79188857611674


In [144]:
print(sum(L2))

50.791888576116726


The two results agree except in the last digits, which differ because of floating-point rounding. However, because it executes the operation in compiled code, NumPy’s version of the operation is computed much more quickly:

In [27]:
big_array = np.random.rand(1000000)

In [145]:
%timeit np.sum(big_array)

2.04 ms ± 244 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [146]:
%timeit sum(big_array)

217 ms ± 43.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
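
Speed is not the only difference between the two functions. A caveat worth checking for yourself: the second positional argument of the built-in `sum` is a start value, while for `np.sum` it is the axis. The sketch below uses a small illustrative array named `grid`.

```python
import numpy as np

grid = np.arange(6).reshape(2, 3)   # [[0 1 2], [3 4 5]]

# np.sum: the second argument is the axis to sum along.
row_sums = np.sum(grid, 1)
print(row_sums)          # [ 3 12]

# Built-in sum: the second argument is the start value, and iteration
# goes over the rows, so this computes 1 + grid[0] + grid[1].
start_plus_rows = sum(grid, 1)
print(start_plus_rows)   # [4 6 8]
```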


#### Minimum and Maximum

Similarly, Python has built-in `min` and `max` functions, used to find the minimum and maximum values of any given array:

In [147]:
L = np.arange(1, 100)
L

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
       35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51,
       52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68,
       69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85,
       86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99])

In [148]:
print(max(L))

99


In [149]:
print(min(L))

1


NumPy’s corresponding functions have similar syntax, and again operate much more quickly:

In [150]:
%timeit min(big_array)

132 ms ± 22.2 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [151]:
%timeit np.min(big_array)

897 μs ± 136 μs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
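
These aggregates are also available as methods on the array object itself, which is often the shorter spelling. A brief sketch with made-up values:

```python
import numpy as np

values = np.array([3.5, -1.0, 8.25, 0.0])

# Method form and function form return the same result.
print(values.min(), values.max(), values.sum())   # -1.0 8.25 10.75
same_min = values.min() == np.min(values)
same_max = values.max() == np.max(values)
```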


### Multidimensional Aggregates

One common type of aggregation operation is an aggregate along a row or column. 

In [45]:
M = np.array([[1,2,3], [4,5,6], [7,8,9]])
print(M)

[[1 2 3]
 [4 5 6]
 [7 8 9]]


In [47]:
M.sum()

np.int64(45)

By default, each NumPy aggregation function returns the aggregate over the entire array, as `M.sum()` does above.

Aggregation functions also take an additional `axis` argument specifying the axis along which the aggregate is computed.

In [49]:
M

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [None]:
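# The rows must be wrapped in one outer list; passing them as separate arguments raises a TypeError
np.array([[1,2,3], [4,5,6], [7,8,9]])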

In [56]:
# sum of the values in each column (axis=0 collapses the rows)
# M.min(axis=0) would instead give the minimum of each column
M.sum(axis=0)

array([12, 15, 18])

In [54]:
# sum of the values in each row (axis=1 collapses the columns)
# M.min(axis=1) would instead give the minimum of each row
M.sum(axis=1)


array([ 6, 15, 24])
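
The `axis` argument works the same way for the other aggregates. The following sketch repeats the matrix `M` from above and applies `min` and `max` along each axis:

```python
import numpy as np

M = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

# axis=0 collapses the rows: one result per column.
print(M.min(axis=0))   # [1 2 3]
print(M.max(axis=0))   # [7 8 9]

# axis=1 collapses the columns: one result per row.
print(M.min(axis=1))   # [1 4 7]
print(M.max(axis=1))   # [3 6 9]

col_min, row_min = M.min(axis=0), M.min(axis=1)
```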

#### NumPy Aggregation Functions Summary

In NumPy, a NaN-safe version of a function is a version that ignores NaN (Not a Number) values when performing computations.

| **Function**      | **NaN-Safe Version** | **Description**                           |
|-------------------|----------------------|-------------------------------------------|
| `np.sum`          | `np.nansum`          | Compute sum of elements                   |
| `np.prod`         | `np.nanprod`         | Compute product of elements               |
| `np.mean`         | `np.nanmean`         | Compute mean of elements                  |
| `np.std`          | `np.nanstd`          | Compute standard deviation                |
| `np.var`          | `np.nanvar`          | Compute variance                          |
| `np.min`          | `np.nanmin`          | Find minimum value                        |
| `np.max`          | `np.nanmax`          | Find maximum value                        |
| `np.argmin`       | `np.nanargmin`       | Find index of minimum value               |
| `np.argmax`       | `np.nanargmax`       | Find index of maximum value               |
| `np.median`       | `np.nanmedian`       | Compute median of elements                |
| `np.percentile`   | `np.nanpercentile`   | Compute rank-based statistics (percentile)|
| `np.any`          | N/A                  | Evaluate whether any elements are true    |
| `np.all`          | N/A                  | Evaluate whether all elements are true    |
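
To make the difference concrete, here is a short sketch comparing a regular aggregate with its NaN-safe counterpart. The array `readings` is invented for illustration.

```python
import numpy as np

readings = np.array([1.0, np.nan, 3.0])

print(np.sum(readings))      # nan: a single NaN propagates
print(np.nansum(readings))   # 4.0: the NaN is ignored
print(np.nanmean(readings))  # 2.0: mean of the two valid values
print(np.nanmax(readings))   # 3.0
```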


### Broadcasting in NumPy

Broadcasting is a powerful feature in NumPy that allows operations between arrays of different shapes, without manually reshaping or replicating data.

`Think of it like this:`
Instead of resizing arrays to the same shape, NumPy "stretches" the smaller one virtually, so operations can happen element-wise without copying data.



In [152]:
# Add scalar to array (simple broadcasting)

a = np.array([1, 2, 3])
b = 5

print(a + b)

[6 7 8]

`NumPy "broadcasts" 5 to [5, 5, 5] behind the scenes and adds it to a.`
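
The "stretching" can be made visible with `np.broadcast_to`, which returns a read-only view rather than a copy. This is only a sketch to illustrate the idea; NumPy does not require you to call it before adding a scalar.

```python
import numpy as np

a = np.array([1, 2, 3])

# Build the virtual [5, 5, 5] explicitly.
stretched = np.broadcast_to(5, a.shape)
print(stretched)           # [5 5 5]
print(stretched.strides)   # (0,): every element reads the same memory

print(a + 5)               # [6 7 8]
same = np.array_equal(a + 5, a + stretched)
```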

#### Broadcasting Rules
Broadcasting in NumPy follows a strict set of rules to determine the interaction between the two arrays:
- `Rule 1:` If the arrays have different numbers of dimensions, the shape of the one with fewer dimensions is padded with ones on the left.
```text
(3, 2) and (2,) → becomes (3, 2) and (1, 2)
```
- `Rule 2:` The shapes are then compared dimension by dimension, from right to left. Where the sizes differ and one of them is 1, that dimension is stretched to match the other.
```text
(3, 1) and (3, 2) → the (3, 1) becomes (3, 2) during the operation
```
You can only stretch a dimension if its size is 1.
- `Rule 3:` If the sizes in a dimension are not equal and neither is 1, the arrays are incompatible and NumPy raises an error.
```text
(3, 2) and (3,) → becomes (3, 2) and (1, 3) → ❌ incompatible: 2 ≠ 3
```
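
The three rules can be written out as a few lines of plain Python. The function `broadcast_shape` below is a hypothetical helper written for this explanation, not part of NumPy; it is a sketch of the rules, not NumPy's actual implementation. NumPy's own `np.broadcast_shapes` (available in NumPy 1.20 and later) is shown for comparison.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: apply the three broadcasting rules to two shapes."""
    # Rule 1: pad the shorter shape with ones on the left.
    ndim = max(len(shape_a), len(shape_b))
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)

    result = []
    for dim_a, dim_b in zip(a, b):
        if dim_a == dim_b:
            result.append(dim_a)          # sizes already agree
        elif dim_a == 1:
            result.append(dim_b)          # Rule 2: stretch a
        elif dim_b == 1:
            result.append(dim_a)          # Rule 2: stretch b
        else:
            # Rule 3: sizes differ and neither is 1.
            raise ValueError(f"incompatible shapes {shape_a} and {shape_b}")
    return tuple(result)

print(broadcast_shape((3, 2), (2,)))     # (3, 2)
print(broadcast_shape((3, 1), (3, 2)))   # (3, 2)

# NumPy's own answer (np.broadcast_shapes needs NumPy 1.20 or later).
print(np.broadcast_shapes((3, 1), (3, 2)))   # (3, 2)

try:
    broadcast_shape((3, 2), (3,))
except ValueError as err:
    print("ValueError:", err)
```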

#### Broadcasting example 1

In [105]:
#  Add 1D and 2D arrays
a = np.array([[1, 2, 3],
              [4, 5, 6]])
b = np.array([10, 20, 30])

print(a + b)

print(a.shape)
print(b.shape)

[[11 22 33]
 [14 25 36]]
(2, 3)
(3,)


By rule 1, the array `b` has fewer dimensions, so we pad its shape on the left with ones:
- a.shape -> (2, 3)
- b.shape -> (1, 3)

By rule 2, the first dimensions disagree (2 versus 1), so we stretch the dimension of size 1 to match:
- a.shape -> (2, 3)
- b.shape -> (2, 3)

The shapes now match, so the final shape is `(2, 3)`.


In [107]:
x = a + b
x
x.shape

(2, 3)
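
A common practical use of this 2D-plus-1D pattern is subtracting the column means from a data matrix. The sketch below uses a made-up array `X`; it is one possible application, not something the example above requires.

```python
import numpy as np

X = np.array([[1.0, 2.0, 3.0],
              [3.0, 4.0, 5.0]])    # shape (2, 3)

col_means = X.mean(axis=0)          # shape (3,): [2. 3. 4.]
centered = X - col_means            # (2, 3) - (3,) broadcasts to (2, 3)

print(centered)
# [[-1. -1. -1.]
#  [ 1.  1.  1.]]
print(centered.mean(axis=0))        # [0. 0. 0.]
```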

#### Broadcasting example 2

In [None]:
# example where both arrays need to be broadcast:
a = np.arange(3).reshape((3, 1))
b = np.arange(3)
print(a)
print(b)
print(a.shape)
print(b.shape)

[[0]
 [1]
 [2]]
[0 1 2]
(3, 1)
(3,)



Step 1: Align dimensions

| Array | Shape |
| ----- | ----- |
| `a`   | (3, 1) |
| `b`   | (3,) → treated as (1, 3) for broadcasting |

So NumPy views this as:
- a.shape = (3, 1)
- b.shape = (1, 3)

Now it applies broadcasting, stretching each dimension of size 1:

| Dimension | `a` | `b` | Result |
| --------- | --- | --- | ------ |
| First     | 3   | 1   | 3      |
| Second    | 1   | 3   | 3      |

Final result shape = (3, 3)

In [114]:
a + b

array([[0, 1, 2],
       [1, 2, 3],
       [2, 3, 4]])
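
The same column-plus-row pattern can be written with `np.newaxis` instead of `reshape`, and the result matches the table of pairwise sums produced by `np.add.outer`. A short sketch, using `v` as an illustrative name:

```python
import numpy as np

v = np.arange(3)               # shape (3,)
column = v[:, np.newaxis]      # shape (3, 1), same as v.reshape((3, 1))

table = column + v             # (3, 1) + (3,) -> (3, 3)
print(table)
# [[0 1 2]
#  [1 2 3]
#  [2 3 4]]

# np.add.outer builds the same table of pairwise sums.
same = np.array_equal(table, np.add.outer(v, v))
```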

#### Broadcasting example 3

In [127]:
# example in which the two arrays are not compatible:

M = np.ones((3, 2), dtype=int)
a = np.arange(3)
print(M)
print(a)
print(M.shape)
print(a.shape)

[[1 1]
 [1 1]
 [1 1]]
[0 1 2]
(3, 2)
(3,)


Step 1: Check the shapes

| Variable | Shape  | Meaning                  |
| -------- | ------ | ------------------------ |
| `M`      | (3, 2) | 3 rows × 2 columns       |
| `a`      | (3,)   | A 1D array of 3 elements |

Step 2: Apply broadcasting rules

Broadcasting aligns shapes from right to left, padding the shorter shape with 1s on the left.

Let’s align the shapes:

| Array | Shape |
| ----- | ----- |
| `M`   | (3, 2) |
| `a`   | (3,) → becomes (1, 3) when padded |

Now compare dimension-wise from the right:

| Dimension | M | a (broadcasted) | Compatible? |
| --------- | - | --------------- | ----------- |
| Last      | 2 | 3               |   No        |
| 2nd Last  | 3 | 1               |   Yes       |

The last dimension sizes are 2 and 3, which are not equal and neither is 1, so they cannot be broadcast together.


In [126]:
M + a

ValueError: operands could not be broadcast together with shapes (3,2) (3,) 
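
If the intent was to add `a[i]` to every element of row `i` (an assumption about what the addition should mean), one way to make the shapes compatible is to turn `a` into a column of shape (3, 1) with `np.newaxis`:

```python
import numpy as np

M = np.ones((3, 2), dtype=int)
a = np.arange(3)

# (3, 2) and (3,) fail: the trailing dimensions are 2 and 3.
try:
    M + a
except ValueError as err:
    print("ValueError:", err)

# Reshape a to (3, 1): now the trailing dimensions are 2 and 1.
fixed = M + a[:, np.newaxis]
print(fixed)
# [[1 1]
#  [2 2]
#  [3 3]]
```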