# Vectorized Operations in NumPy

Vectorized operations in NumPy are usually much faster than equivalent Python loops because of the way NumPy is implemented and how it takes advantage of modern computer architecture. The main reasons are:

### 1. **NumPy is Written in C and Optimized for Performance**
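   - NumPy's core is written primarily in **C**, and the linear algebra libraries it links against (BLAS/LAPACK) are often written in **C** or **Fortran**. Both are compiled languages that run much faster than interpreted Python.
   - When you use vectorized operations in NumPy, the heavy lifting happens in this compiled code, bypassing the slower Python interpreter. This allows for **compiled, low-level execution** directly on the data.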

### 2. **Elimination of Python Loops**
   - **Traditional Python loops** (e.g., `for` loops) involve significant overhead because each iteration needs to go through the Python interpreter.
   - In contrast, **vectorized operations** perform computations on entire arrays in one step, avoiding the need for explicit looping in Python. The operations are executed in **low-level, highly-optimized loops in C**, which are faster than high-level Python loops.

### 3. **Memory Contiguity and Efficient Access**
   - NumPy arrays are stored in **contiguous blocks of memory**, unlike Python lists, which are arrays of pointers. This means:
     - CPU can **cache** and **fetch data more efficiently**.
     - Memory access is faster, as there are fewer indirections.
   - This layout enables **SIMD (Single Instruction, Multiple Data)** optimizations, where a single instruction can process multiple data points simultaneously.
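
The layout difference can be inspected directly. The following is a small sketch (the variable names are illustrative, and an 8-byte integer dtype is assumed explicitly) that looks at the contiguity flag and the strides of an array and of a strided view of it:

```python
import numpy as np

# A freshly created 1-D array occupies one contiguous block of memory.
a = np.arange(6, dtype=np.int64)
print(a.flags["C_CONTIGUOUS"])   # True
print(a.strides)                 # (8,): step 8 bytes to reach the next int64
print(a.nbytes)                  # 48 bytes = 6 elements * 8 bytes

# Taking every second element gives a view that skips over memory,
# so it is no longer contiguous.
every_second = a[::2]
print(every_second.flags["C_CONTIGUOUS"])  # False
print(every_second.strides)                # (16,)
```

Operations on contiguous data generally make better use of the CPU cache than operations on strided views.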

### 4. **Broadcasting**
   - NumPy supports **broadcasting**, which allows you to apply operations to arrays of different shapes without explicitly writing nested loops.
   - Broadcasting virtually stretches the smaller array to the shape of the larger one (internally, by stepping through the stretched axis with a stride of zero) instead of copying its data, which saves time and memory.
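
One way to see that broadcasting does not copy data is `np.broadcast_to`, which exposes the stretched array as a view. This is only an illustrative sketch with made-up variable names:

```python
import numpy as np

row = np.array([1, 2, 3], dtype=np.int64)

# broadcast_to returns a read-only view with the stretched shape.
stretched = np.broadcast_to(row, (4, 3))
print(stretched.shape)                   # (4, 3)
print(stretched.strides)                 # (0, 8): stride 0 reuses the same row
print(np.shares_memory(row, stretched))  # True: no data was copied
```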

### 5. **Low-Level Multi-Threading**
   - Some NumPy operations (like matrix multiplication) are handed off to BLAS/LAPACK libraries, which are often multi-threaded and can use multiple CPU cores at once. Most element-wise operations, however, run on a single thread.
   - Python loops, unless explicitly parallelized (e.g., using `concurrent.futures`), run on a single core.

### 6. **Fewer Function Calls**
   - Python loops involve multiple function calls for indexing and computations in each iteration. Each function call adds overhead.
   - Vectorized operations compute the result in a single function call, reducing overhead significantly.

### 7. **Optimized Math Libraries**
   - NumPy links to **high-performance libraries** like BLAS (Basic Linear Algebra Subprograms) and LAPACK (Linear Algebra Package) for numerical computations. These libraries are designed to perform matrix and vector operations extremely efficiently.

### Results:
- Python loops take significantly longer because they iterate over elements one by one.
- NumPy's vectorized operations compute the result in a fraction of the time because:
  1. Low-level, optimized C code processes the data.
  2. Data is fetched and computed in batches, not one element at a time.

---

### Real-Life Analogy
Imagine you need to move 1 million bricks:
- **Python Loop**: One worker (the Python interpreter) moves the bricks one at a time. Each trip involves picking up a brick, walking to the destination, and placing it.
- **NumPy Vectorized Operation**: A bulldozer (C/Fortran backend) moves the entire batch of bricks in one go, completing the task much faster.

---

### Summary of Key Benefits
1. **Avoids interpreter overhead** by executing operations in C.
2. **Efficient memory access** using contiguous arrays.
3. **Utilizes parallelism** and modern CPU optimizations like SIMD.
4. **Optimized math libraries** handle operations with minimal overhead.

By leveraging these features, NumPy can perform operations far faster than loops in pure Python, often by one to two orders of magnitude (roughly 17x in the timing shown below).

In [1]:
#### Using Loops:
import numpy as np
import time

# Create two arrays
size = 10**6
a = np.arange(size)
b = np.arange(size)

# Traditional Python loop
start = time.time()
result = [a[i] + b[i] for i in range(size)]
end = time.time()
print("Time with Python loop:", end - start)

#### Using Vectorized Operations:
# Vectorized NumPy addition
start = time.time()
result = a + b
end = time.time()
print("Time with NumPy vectorized operation:", end - start)

Time with Python loop: 0.2270195484161377
Time with NumPy vectorized operation: 0.012979269027709961
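
A single `time.time()` measurement can be noisy. As a hedged alternative, the sketch below repeats each measurement with the standard-library `timeit` module and keeps the best run. The function names and the smaller array size are choices made for this example, and the absolute numbers will differ from machine to machine:

```python
import timeit
import numpy as np

size = 10**5  # smaller than above so the loop version finishes quickly
a = np.arange(size)
b = np.arange(size)

def add_with_loop():
    # Element-by-element addition driven by the Python interpreter.
    return [a[i] + b[i] for i in range(size)]

def add_vectorized():
    # The same addition as one vectorized NumPy call.
    return a + b

# Take the best of several repeats to reduce noise from other processes.
loop_time = min(timeit.repeat(add_with_loop, number=1, repeat=3))
vec_time = min(timeit.repeat(add_vectorized, number=1, repeat=3))

print(f"Loop:       {loop_time:.6f} s")
print(f"Vectorized: {vec_time:.6f} s")
print("Same result:", np.array_equal(add_with_loop(), add_vectorized()))
```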


Let us now start with some vectorized operations!

However, before we go to the vectorized operations, let us create some arrays using the knowledge of the previous notebooks!

In [3]:
# Let us convert a 1 D array by reshaping it to a 2 D array 
array_1D = np.linspace(start=2, stop=20, num=9)
print(f"Original 1 D array: \n{array_1D}")

array_1D_reshape_2D = array_1D.reshape((3,3))
print(f"The 1 D array after being reshaped to a 2 D array:\n{array_1D_reshape_2D}")

Original 1 D array: 
[ 2.    4.25  6.5   8.75 11.   13.25 15.5  17.75 20.  ]
The 1 D array after being reshaped to a 2 D array:
[[ 2.    4.25  6.5 ]
 [ 8.75 11.   13.25]
 [15.5  17.75 20.  ]]


In [5]:
# concatenating three 1 D arrays 
array_1 = np.array([1,2,3])
array_2 = np.array([4,5,6])
array_3 = np.array([7,8,9])

array_concat = np.concatenate([array_1, array_2, array_3]) # note pass the three arrays in a list
print(array_concat)

[1 2 3 4 5 6 7 8 9]


In [7]:
# concatenate 2 D arrays 
array_4 = array_concat.reshape((3,3))
print(array_4)

# say I want to concat the array_4 to itself row-wise i.e. first axis 
array_concat_row = np.concatenate([array_4, array_4])
print(array_concat_row)

# say I want to concat the array_4 to itself column-wise i.e. second axis 
array_concat_col = np.concatenate([array_4, array_4], axis=1)
print(array_concat_col)

[[1 2 3]
 [4 5 6]
 [7 8 9]]
[[1 2 3]
 [4 5 6]
 [7 8 9]
 [1 2 3]
 [4 5 6]
 [7 8 9]]
[[1 2 3 1 2 3]
 [4 5 6 4 5 6]
 [7 8 9 7 8 9]]


In [9]:
# Let us see some more vectorized operations
# let us first print the array
print(f"Consider this array:\n{array_1D}")

# reciprocal of the array - vectorized operation
print(f"Reciprocal Array:\n{1/array_1D}")

# divide one array by the other array
print(f'The two arrays we are considering are:{array_1D},{array_concat}')
print(f"Dividing former with the latter array: {array_1D/array_concat}")

# we can do this for the multidimensional array too
print(f"First multidimensional array:\n{array_1D_reshape_2D}")
print(f"Second multidimensional array:\n{array_4}")

# let us add these two arrays
print(f"Addition of the above two arrays:\n{array_4+array_1D_reshape_2D}")

Consider this array:
[ 2.    4.25  6.5   8.75 11.   13.25 15.5  17.75 20.  ]
Reciprocal Array:
[0.5        0.23529412 0.15384615 0.11428571 0.09090909 0.0754717
 0.06451613 0.05633803 0.05      ]
The two arrays we are considering are:[ 2.    4.25  6.5   8.75 11.   13.25 15.5  17.75 20.  ],[1 2 3 4 5 6 7 8 9]
Dividing former with the latter array: [2.         2.125      2.16666667 2.1875     2.2        2.20833333
 2.21428571 2.21875    2.22222222]
First multidimensional array:
[[ 2.    4.25  6.5 ]
 [ 8.75 11.   13.25]
 [15.5  17.75 20.  ]]
Second multidimensional array:
[[1 2 3]
 [4 5 6]
 [7 8 9]]
Addition of the above two arrays:
[[ 3.    6.25  9.5 ]
 [12.75 16.   19.25]
 [22.5  25.75 29.  ]]


## Exploring NumPy ufuncs


#### Universal Functions (ufuncs) in NumPy

A **ufunc** (short for **Universal Function**) in NumPy is a function that operates element-wise on arrays. They are one of the key reasons behind NumPy's speed and efficiency. Let's delve into their significance, working mechanism, and advantages.

---

##### Key Characteristics of ufuncs

1. **Element-wise Operations**: 
   - Ufuncs process each element in an array independently and in a vectorized manner. For example:
     ```python
     import numpy as np
     arr = np.array([1, 2, 3, 4])
     result = np.sqrt(arr)  # Element-wise square root
     print(result)  # Output: [1. 1.41421356 1.73205081 2.]
     ```

2. **Support for Broadcasting**: 
   - Ufuncs automatically apply operations to arrays of different shapes by extending the smaller array (via broadcasting).
     ```python
     arr = np.array([1, 2, 3])
     result = arr + 5  # Broadcasting applies the scalar value 5 to all elements
     print(result)  # Output: [6 7 8]
     ```

3. **Vectorized Operations**:
   - Unlike loops, which iterate element by element, ufuncs perform operations on the entire array at once using optimized C backend code.

4. **Type Flexibility**:
   - Ufuncs support a variety of input types (e.g., integers, floats, and complex numbers) and can return outputs in specified data types.

5. **High Performance**:
   - NumPy's built-in ufuncs are implemented in compiled C code, making them much faster than Python loops. Ufuncs eliminate Python's per-element overhead and operate directly on the array's memory.
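
The type flexibility mentioned above can be checked with a few calls. This is a minimal sketch; the array contents are arbitrary and `roots32` is just an illustrative name:

```python
import numpy as np

ints = np.array([1, 2, 3])

# Mixing an integer array with a float scalar promotes the result to float64.
print(np.add(ints, 0.5).dtype)        # float64

# The dtype argument requests a specific computation/output type.
roots32 = np.sqrt(np.array([1.0, 4.0, 9.0]), dtype=np.float32)
print(roots32.dtype)                  # float32

# Complex input is handled by the same ufunc.
print(np.sqrt(np.array([-1 + 0j])))   # [0.+1.j]
```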

---

##### Types of Ufuncs

1. **Unary Ufuncs**:
   - Operate on a single input array, producing an output array.
     Examples:
     - `np.sqrt`, `np.abs`, `np.log`, `np.exp`, `np.sin`, `np.cos`
     ```python
     arr = np.array([1, 4, 9, 16])
     print(np.sqrt(arr))  # Output: [1. 2. 3. 4.]
     ```

2. **Binary Ufuncs**:
   - Operate on two input arrays element-wise, producing an output array.
     Examples:
     - `np.add`, `np.subtract`, `np.multiply`, `np.divide`, `np.maximum`, `np.minimum`
     ```python
     arr1 = np.array([1, 2, 3])
     arr2 = np.array([4, 5, 6])
     print(np.add(arr1, arr2))  # Output: [5 7 9]
     ```

---

##### Advantages of Ufuncs

1. **Fast Execution**:
   - Ufuncs use highly-optimized C loops, significantly faster than Python loops.

2. **Memory Efficiency**:
   - Operate in place (if specified) to reduce memory usage, avoiding the need to create unnecessary temporary arrays.
     ```python
     arr = np.array([1, 2, 3])
     np.add(arr, 10, out=arr)  # In-place addition
     print(arr)  # Output: [11 12 13]
     ```

3. **Support for Broadcasting**:
   - Seamlessly operate on arrays of different shapes without requiring manual replication of data.

4. **Multiple Outputs**:
   - Some ufuncs return multiple outputs, such as `np.modf`, which separates the fractional and integer parts of an array:
     ```python
     arr = np.array([1.5, 2.3, 3.7])
     fractional, integral = np.modf(arr)
     print(f"Fractional: {fractional}, Integral: {integral}")
     # Output: Fractional: [0.5 0.3 0.7], Integral: [1. 2. 3.]
     ```

5. **Customizable**:
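   - You can wrap your own Python function with `numpy.frompyfunc` or `numpy.vectorize` to apply it element-wise with broadcasting. Note that these wrappers still call the Python function once per element, so they offer convenience rather than the speed of built-in ufuncs.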
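
A minimal sketch of both wrappers is shown here. The helper `clip_to_unit` is hypothetical and exists only for illustration (NumPy already provides `np.clip` for this task):

```python
import numpy as np

def clip_to_unit(x):
    """Hypothetical helper: clamp a single number to the range [0, 1]."""
    return min(max(x, 0.0), 1.0)

values = np.array([-0.5, 0.25, 2.0])

# frompyfunc(func, nin, nout) returns a ufunc whose output has dtype=object.
clip_ufunc = np.frompyfunc(clip_to_unit, 1, 1)
clipped_obj = clip_ufunc(values)
print(clipped_obj.dtype)              # object
clipped = clipped_obj.astype(float)   # convert back to a numeric array

# np.vectorize is a convenience wrapper; otypes fixes the output dtype.
clip_vec = np.vectorize(clip_to_unit, otypes=[float])
print(clip_vec(values))
```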

---

##### Real-Life Analogy of Ufuncs
Think of ufuncs as a **conveyor belt** in a factory:
- Each product (element in the array) is processed simultaneously and independently on the conveyor belt.
- Instead of manually handling each product (using loops), the ufunc (conveyor belt) processes all products in one go, saving time and effort.

---

##### Practical Use Case: Sigmoid Function

Suppose you need to calculate the sigmoid function \( \sigma(x) = \frac{1}{1 + e^{-x}} \) for a large array:

```python
arr = np.array([1, 2, 3, 4])
sigmoid = 1 / (1 + np.exp(-arr))
print(sigmoid)  # Output: [0.73105858 0.88079708 0.95257413 0.98201379]
```

Here:
- The `np.exp` ufunc computes \( e^{-x} \) efficiently for the entire array.
- The negation (`-arr`), the addition (`1 + ...`), and the division (`1 / ...`) are also applied element-wise, through the ufuncs `np.negative`, `np.add`, and `np.divide`.

---

##### Summary

Ufuncs are the backbone of NumPy's speed and efficiency. By eliminating Python loops, leveraging broadcasting, and operating directly on contiguous memory, ufuncs enable fast, memory-efficient, and readable code for numerical computations.

In [10]:
# ARITHMETIC FUNCTIONS
# Define two arrays
arr1 = np.array([10, 20, 30])
arr2 = np.array([3, 4, 5])

# Operator equivalent ufunc examples
addition = np.add(arr1, arr2)        # Equivalent to arr1 + arr2
subtraction = np.subtract(arr1, arr2)  # Equivalent to arr1 - arr2
negation = np.negative(arr1)         # Equivalent to -arr1
multiplication = np.multiply(arr1, arr2)  # Equivalent to arr1 * arr2
division = np.divide(arr1, arr2)     # Equivalent to arr1 / arr2
floor_division = np.floor_divide(arr1, arr2)  # Equivalent to arr1 // arr2
exponentiation = np.power(arr1, 2)   # Equivalent to arr1 ** 2
modulus = np.mod(arr1, arr2)         # Equivalent to arr1 % arr2

# Print results
print(f"Addition: {addition}")         # [13 24 35]
print(f"Subtraction: {subtraction}")   # [7 16 25]
print(f"Negation: {negation}")         # [-10 -20 -30]
print(f"Multiplication: {multiplication}")  # [30 80 150]
print(f"Division: {division}")         # [3.33333333 5.         6.        ]
print(f"Floor Division: {floor_division}")  # [3 5 6]
print(f"Exponentiation: {exponentiation}")  # [100 400 900]
print(f"Modulus: {modulus}")           # [1 0 0]

Addition: [13 24 35]
Subtraction: [ 7 16 25]
Negation: [-10 -20 -30]
Multiplication: [ 30  80 150]
Division: [3.33333333 5.         6.        ]
Floor Division: [3 5 6]
Exponentiation: [100 400 900]
Modulus: [1 0 0]


In [11]:
# Absolute Value
arr_abs = np.array([-10, -20, 30])
absolute_values = np.abs(arr_abs)  # Computes the absolute values
print(f"Absolute Values: {absolute_values}")  # [10 20 30]

# Trigonometric Functions
angles = np.array([0, np.pi/2, np.pi])  # Angles in radians
sin_values = np.sin(angles)             # Sine of angles
cos_values = np.cos(angles)             # Cosine of angles
tan_values = np.tan(angles)             # Tangent of angles
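print(f"Sine Values: {sin_values}")     # approx. [0. 1. 0.]; sin(pi) prints as ~1.2e-16 due to floating-point rounding
print(f"Cosine Values: {cos_values}")   # approx. [1. 0. -1.]; cos(pi/2) prints as ~6.1e-17
print(f"Tangent Values: {tan_values}")  # approx. [0. 1.6e+16 -0.]; tan(pi/2) is very large but finite, not inf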

# Exponents and Logarithms
arr_exp = np.array([1, 2, 3])
exponentials = np.exp(arr_exp)           # Exponentials (e^x)
natural_logs = np.log(arr_exp)           # Natural logarithms (ln(x))
base10_logs = np.log10(arr_exp)          # Logarithms base 10
base2_logs = np.log2(arr_exp)            # Logarithms base 2
print(f"Exponentials: {exponentials}")   # [ 2.71828183  7.3890561  20.08553692]
print(f"Natural Logs: {natural_logs}")   # [0.         0.69314718 1.09861229]
print(f"Base-10 Logs: {base10_logs}")    # [0.         0.30103    0.47712125]
print(f"Base-2 Logs: {base2_logs}")      # [0.         1.         1.5849625 ]

Absolute Values: [10 20 30]
Sine Values: [0.0000000e+00 1.0000000e+00 1.2246468e-16]
Cosine Values: [ 1.000000e+00  6.123234e-17 -1.000000e+00]
Tangent Values: [ 0.00000000e+00  1.63312394e+16 -1.22464680e-16]
Exponentials: [ 2.71828183  7.3890561  20.08553692]
Natural Logs: [0.         0.69314718 1.09861229]
Base-10 Logs: [0.         0.30103    0.47712125]
Base-2 Logs: [0.        1.        1.5849625]


### Advanced ufuncs

In [13]:
# specifying the output 
array_1 = np.arange(9)
print(array_1)

array_2 = np.empty(9)

np.multiply(array_1, 9, out=array_2) # specifying the out parameter 
print(array_2)

[0 1 2 3 4 5 6 7 8]
[ 0.  9. 18. 27. 36. 45. 54. 63. 72.]


In [14]:
arr_x = np.arange(5)
arr_y = np.zeros(10)
np.power(2, arr_x, out=arr_y[::2])
print(arr_y)

[ 1.  0.  2.  0.  4.  0.  8.  0. 16.  0.]


Note, however, we can do this same operation as follows:

In [15]:
arr_x = np.arange(5)
arr_y = np.zeros(10)
arr_y[::2] = 2**arr_x
print(arr_y)

[ 1.  0.  2.  0.  4.  0.  8.  0. 16.  0.]


The difference between `np.power(2, x, out=y[::2])` and `y[::2] = 2 ** x` lies in how NumPy handles memory and computation efficiency.

#### Explanation:

##### `np.power(2, x, out=y[::2])`
- **What happens**: The `out` parameter in `np.power` directly specifies where the result of the computation should be stored. In this case, the result of \(2^x\) is directly written into the specified slice `y[::2]` of the array `y`.
- **Efficiency**: No temporary array is created. The computation and assignment happen in a single step, which is memory-efficient and faster for large arrays.
- **Memory savings**: Since no intermediate array is created, less memory is used. This is particularly important when working with large datasets or arrays.

##### `y[::2] = 2 ** x`
- **What happens**: The expression `2 ** x` creates a temporary array to store the results of \(2^x\). This temporary array is then copied into the slice `y[::2]`.
- **Efficiency**: This involves two steps:
  1. Computation of `2 ** x` and creation of a temporary array.
  2. Copying the values from the temporary array into `y[::2]`.
- **Memory usage**: The temporary array occupies extra memory, which may become significant for large arrays.

---

#### Why it matters:
- **Small computations**: For small arrays, the difference in performance or memory usage is negligible because modern systems can handle small temporary arrays with ease.
- **Large computations**: For very large arrays (e.g., millions of elements), the creation of temporary arrays can significantly impact memory usage and performance. By directly writing to the target array using the `out` parameter, you avoid this overhead.

---

#### Key Takeaways:
- Use the `out` parameter in ufuncs (universal functions like `np.power`, `np.add`, etc.) to directly store the results in an existing array when memory efficiency is critical.
- This technique is particularly valuable in large-scale computations, where reducing temporary memory allocation can improve performance.
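
As an illustrative sketch of this takeaway, a multi-step expression such as `a * b + c` can reuse a single preallocated buffer for every step. The array names and values are made up for the example:

```python
import numpy as np

a = np.arange(5, dtype=np.float64)   # [0. 1. 2. 3. 4.]
b = np.full(5, 2.0)
c = np.ones(5)

# Compute a * b + c while reusing one preallocated buffer for both steps.
buf = np.empty_like(a)
np.multiply(a, b, out=buf)           # buf = a * b
result = np.add(buf, c, out=buf)     # buf = buf + c, written in place

print(result)                        # [1. 3. 5. 7. 9.]
print(result is buf)                 # True: the ufunc returns the out array
```

Writing `a * b + c` directly would allocate one temporary array for `a * b` and another for the final sum.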

### Aggregations

In [16]:
# The reduce method repeatedly applies a binary operation (like addition or multiplication) across all elements of an array, reducing it to a single result.
arr_x = np.arange(1,10)
print(f"Array to be reduced:{arr_x}")
# I want to reduce this array to a single value by adding its elements together
print(f"Reduction by Addition of the array: {np.add.reduce(array=arr_x)}")
print(f"Reduction by Multiplication of the array: {np.multiply.reduce(array=arr_x)}")


Array to be reduced:[1 2 3 4 5 6 7 8 9]
Reduction by Addition of the array: 45
Reduction by Multiplication of the array: 362880


In [17]:
# The accumulate method stores all intermediate results of the binary operation, instead of collapsing them into a single value.

arr_x = np.arange(1,10)
print(f"Array to be aggregated:{arr_x}")

print(f"Accumulation by Addition of the array: {np.add.accumulate(array=arr_x)}")
print(f"Accumulation by Multiplication of the array: {np.multiply.accumulate(array=arr_x)}")

Array to be aggregated:[1 2 3 4 5 6 7 8 9]
Accumulation by Addition of the array: [ 1  3  6 10 15 21 28 36 45]
Accumulation by Multiplication of the array: [     1      2      6     24    120    720   5040  40320 362880]
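
The `reduce` and `accumulate` methods correspond to familiar helper functions, and they accept an `axis` argument for multidimensional input. A short sketch (the names `arr` and `grid` are only for this example):

```python
import numpy as np

arr = np.arange(1, 10)

# reduce/accumulate of np.add and np.multiply match the familiar helpers.
print(np.add.reduce(arr) == np.sum(arr))                             # True
print(np.array_equal(np.add.accumulate(arr), np.cumsum(arr)))        # True
print(np.array_equal(np.multiply.accumulate(arr), np.cumprod(arr)))  # True

# On a 2-D array, axis selects the direction of the reduction.
grid = arr.reshape(3, 3)
print(np.add.reduce(grid, axis=0))   # [12 15 18] (one sum per column)
print(np.add.reduce(grid, axis=1))   # [ 6 15 24] (one sum per row)
```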


In [18]:
# let us take 100 random integers between 0 and 99
L = np.random.randint(100, size=100)
print(f"The sequence of 100 integers are:\n{L}")

# let us find the sum of this sequence 
print(f"The sum is : {sum(L)}")

# However, it is better to use the NumPy version since it is much faster on arrays
print(f"The sum using np.sum is:{np.sum(L)}")

# finding the min and the max
print(f"The minimum in the series is : {np.min(L)} and the maximum is {np.max(L)}")

The sequence of 100 integers are:
[64 80 92 51 24 26 81 37  0 92 97 37 53 59 29 56 44 75  2 48 51 18  4 98
 44 71 69 42 95 80 78 98 26 15 43 17 46  0 11 86 88 19 76 92 61  8 92  9
  7 22 90 54  0 62 50 58 49 15 36 28 22 39 84 78  6 87 24 75 30 81 75 60
 31 27  5 73 94 14 43 34 22 55 16 52 63 37 38  5 17 71 38 66 60 75 41 14
 45 25 13 68]
The sum is : 4758
The sum using np.sum is:4758
The minimum in the series is : 0 and the maximum is 98


In [20]:
L = np.random.randint(9, size=(3,3))
print(f"The 2 D array is:\n{L}")

# summing the array
print(f"The sum of the array is: {np.sum(L)}")

# summing along axis 0 (down the rows): one sum per column
print(f"The sum across the rows is:{np.sum(L, axis=0)}")

# summing along axis 1 (across the columns): one sum per row
print(f"The sum across the column axis is:{np.sum(L, axis=1)}")

# min and max 
print(f"The min of the array is: {np.min(L)} and the max is: {np.max(L)}")
print(f"The min across the row axis is:{np.min(L, axis=0)} and the max along the row is:{np.max(L, axis=0)}")
print(f"The min across the col axis is:{np.min(L, axis=1)} and the max along the col is:{np.max(L, axis=1)}")

# argmin: index of the minimum value in the flattened array
print(f"The index of the min value is : {np.argmin(L)}")

The 2 D array is:
[[0 0 6]
 [3 3 8]
 [1 7 8]]
The sum of the array is: 36
The sum across the rows is:[ 4 10 22]
The sum across the column axis is:[ 6 14 16]
The min of the array is: 0 and the max is: 8
The min across the row axis is:[0 0 6] and the max along the row is:[3 7 8]
The min across the col axis is:[0 3 1] and the max along the col is:[6 8 8]
The index of the min value is : 0


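When no `axis` is given, `np.argmin` returns the **index of the minimum value** in the flattened (1D) version of the array. If the minimum occurs more than once, the index of its first occurrence is returned. For the array printed above, the value `0` sits at flattened index 0, so the result is `0`.

Because the random array changes on every run, let's break it down with a separate example array in which the minimum is not the first element: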

#### Original Array:
```
[[7, 7, 8],
 [1, 0, 1],
 [3, 7, 8]]
```

#### Flattened Version (row-major order):
```
[7, 7, 8, 1, 0, 1, 3, 7, 8]
```

- In this flattened array, the **minimum value is `0`**.
- The index of `0` in this flattened array is **4** (zero-based indexing).

#### Row-Major Order Explanation:
NumPy flattens arrays in row-major order (C-style order), meaning it processes rows sequentially. For the given array:
- First row: `[7, 7, 8]` (indices 0, 1, 2)
- Second row: `[1, 0, 1]` (indices 3, 4, 5)
- Third row: `[3, 7, 8]` (indices 6, 7, 8)

Since `0` is in the second row, second column, its position in the flattened array is 4. That’s why `np.argmin` returns **4**.
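
To turn the flat index back into row and column coordinates, `np.unravel_index` can be used. The sketch below reuses the example array from this explanation; the variable name `example` is introduced only here:

```python
import numpy as np

example = np.array([[7, 7, 8],
                    [1, 0, 1],
                    [3, 7, 8]])

flat_index = np.argmin(example)
print(flat_index)                     # 4

# Convert the flat index back into (row, column) coordinates.
row, col = np.unravel_index(flat_index, example.shape)
print(row, col)                       # 1 1

# With an axis argument, argmin works per column or per row instead.
print(np.argmin(example, axis=0))     # [1 1 1]
print(np.argmin(example, axis=1))     # [0 1 0]
```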

In [21]:
# other aggregations 
data = np.random.randint(15, size=(4,4))
print(data)

print("Using standard NumPy functions:")
print("Sum:", np.sum(data))
print("Product:", np.prod(data))
print("Mean:", np.mean(data))
print("Standard Deviation:", np.std(data))
print("Variance:", np.var(data))
print("Minimum:", np.min(data))
print("Maximum:", np.max(data))
print("Index of Minimum:", np.argmin(data))
print("Index of Maximum:", np.argmax(data))
print("Median:", np.median(data))
print("Percentile (50th):", np.percentile(data, 50))
print("Any True Values:", np.any(data))
print("All True Values:", np.all(data))

[[ 8  2 14  7]
 [ 3 13  0  6]
 [13  7 12  8]
 [ 9 12  6 13]]
Using standard NumPy functions:
Sum: 133
Product: 0
Mean: 8.3125
Standard Deviation: 4.164113801278731
Variance: 17.33984375
Minimum: 0
Maximum: 14
Index of Minimum: 6
Index of Maximum: 2
Median: 8.0
Percentile (50th): 8.0
Any True Values: True
All True Values: False


In [22]:
# Define a 2D array with some NaN values
data = np.array([
    [1, 2, 3],
    [4, np.nan, 6],
    [7, 8, np.nan]
])

# Compute results using standard functions
print("Using standard NumPy functions:")
print("Sum:", np.sum(data))
print("Product:", np.prod(data))
print("Mean:", np.mean(data))
print("Standard Deviation:", np.std(data))
print("Variance:", np.var(data))
print("Minimum:", np.min(data))
print("Maximum:", np.max(data))
print("Index of Minimum:", np.argmin(data))
print("Index of Maximum:", np.argmax(data))
print("Median:", np.median(data))
print("Percentile (50th):", np.percentile(data, 50))
print("Any True Values:", np.any(data))
print("All True Values:", np.all(data))

# Compute results using nan-handling functions
print("\nUsing NumPy nan-handling functions:")
print("Sum (ignoring NaN):", np.nansum(data))
print("Product (ignoring NaN):", np.nanprod(data))
print("Mean (ignoring NaN):", np.nanmean(data))
print("Standard Deviation (ignoring NaN):", np.nanstd(data))
print("Variance (ignoring NaN):", np.nanvar(data))
print("Minimum (ignoring NaN):", np.nanmin(data))
print("Maximum (ignoring NaN):", np.nanmax(data))
print("Index of Minimum (ignoring NaN):", np.nanargmin(data))
print("Index of Maximum (ignoring NaN):", np.nanargmax(data))
print("Median (ignoring NaN):", np.nanmedian(data))
print("Percentile (50th, ignoring NaN):", np.nanpercentile(data, 50))

Using standard NumPy functions:
Sum: nan
Product: nan
Mean: nan
Standard Deviation: nan
Variance: nan
Minimum: nan
Maximum: nan
Index of Minimum: 4
Index of Maximum: 4
Median: nan
Percentile (50th): nan
Any True Values: True
All True Values: True

Using NumPy nan-handling functions:
Sum (ignoring NaN): 31.0
Product (ignoring NaN): 8064.0
Mean (ignoring NaN): 4.428571428571429
Standard Deviation (ignoring NaN): 2.4411439272335804
Variance (ignoring NaN): 5.959183673469389
Minimum (ignoring NaN): 1.0
Maximum (ignoring NaN): 8.0
Index of Minimum (ignoring NaN): 0
Index of Maximum (ignoring NaN): 7
Median (ignoring NaN): 4.0
Percentile (50th, ignoring NaN): 4.0


## Broadcasting Arrays 

### Rule 1
- If the two arrays differ in their number of dimensions, the shape of the one with fewer dimensions is padded with ones on its leading side.

### Rule 2
- If the shape of the two arrays does not match in any dimension, the array with shape equal to 1 in that dimension is stretched to match the other shape.

### Rule 3
- If in any dimension the sizes disagree and neither is equal to 1, an error is raised.
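
The three rules can be written down as a small function that works on shapes alone. `broadcast_shape` below is a hypothetical helper written for this notebook, not part of NumPy; its result is cross-checked against `np.broadcast`:

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: apply the three broadcasting rules to two shapes."""
    # Rule 1: pad the shorter shape with ones on its leading side.
    ndim = max(len(shape_a), len(shape_b))
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)

    result = []
    for size_a, size_b in zip(a, b):
        if size_a == size_b or size_b == 1:
            result.append(size_a)   # Rule 2: stretch b (or sizes already agree)
        elif size_a == 1:
            result.append(size_b)   # Rule 2: stretch a
        else:
            # Rule 3: sizes disagree and neither is 1.
            raise ValueError(f"cannot broadcast {shape_a} with {shape_b}")
    return tuple(result)

print(broadcast_shape((3,), (3, 1)))  # (3, 3)
print(broadcast_shape((2, 3), (3,)))  # (2, 3)

# Cross-check against NumPy's own result for the same shapes.
print(np.broadcast(np.empty((3,)), np.empty((3, 1))).shape)  # (3, 3)
```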

In [23]:

# A is 1-dimensional
A = np.array([1, 2, 3])  # Shape (3,)

# B is 2-dimensional
B = np.array([[1], [2], [3]])  # Shape (3, 1)

# Broadcasting shapes:
# A: (1, 3) (padded with 1 on the left to match dimensions)
# B: (3, 1)

# Resulting operation
result = A + B
print(result)

[[2 3 4]
 [3 4 5]
 [4 5 6]]


Let’s break down the explanation in detail for why the shapes change during broadcasting and why the resulting shape becomes `(3, 3)`:

---

#### **Step 1: Shapes of `A` and `B`**
- `A` has shape `(3,)`. It is 1-dimensional.
- `B` has shape `(3, 1)`. It is 2-dimensional.

To perform addition, the shapes of `A` and `B` must be made compatible using **broadcasting rules**.

---

#### **Step 2: Applying Rule 1**
##### If the two arrays differ in their number of dimensions, the shape of the one with fewer dimensions is padded with ones on its leading side.
- `A` has fewer dimensions, so its shape is padded:
  - Original shape: `(3,)`
  - Padded shape: `(1, 3)` (a leading `1` is added).

- `B`'s shape remains `(3, 1)` since it is already 2-dimensional.

Now we have:
- `A` → Shape `(1, 3)`
- `B` → Shape `(3, 1)`

---

#### **Step 3: Applying Rule 2**
##### If the shape of the two arrays does not match in any dimension, the array with shape equal to `1` in that dimension is stretched to match the other shape.

- First dimension (rows):  
  - `A` has `1`, and `B` has `3`.  
  - The `1` in `A` is stretched to match the `3` in `B`.  

- Second dimension (columns):  
  - `A` has `3`, and `B` has `1`.  
  - The `1` in `B` is stretched to match the `3` in `A`.

After stretching:
- `A` → Shape `(3, 3)` (rows: stretched from 1 to 3).
- `B` → Shape `(3, 3)` (columns: stretched from 1 to 3).

---

#### **Step 4: Element-Wise Addition**
After broadcasting, both arrays have the same shape `(3, 3)`, so element-wise addition proceeds as follows:

**Expanded `A`:**
```
[[1, 2, 3],   # Row 1 (stretched from original row [1, 2, 3])
 [1, 2, 3],   # Row 2
 [1, 2, 3]]   # Row 3
```

**Expanded `B`:**
```
[[1, 1, 1],   
 [2, 2, 2],   
 [3, 3, 3]]   
```

**Result (`A + B`):**
```
[[2, 3, 4],   # (1+1, 2+1, 3+1)
 [3, 4, 5],   # (1+2, 2+2, 3+2)
 [4, 5, 6]]   # (1+3, 2+3, 3+3)
```

---

#### **Conclusion**
The resulting shape is `(3, 3)` because:
- The first dimension (`1` in `A` and `3` in `B`) stretches to `3`.
- The second dimension (`3` in `A` and `1` in `B`) stretches to `3`.  
This makes the final shape `(3, 3)`.
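
The expanded forms of `A` and `B` shown in Step 4 can be inspected with `np.broadcast_arrays`, which returns both inputs stretched to the common shape. A minimal sketch (the `_expanded` names are introduced here):

```python
import numpy as np

A = np.array([1, 2, 3])        # Shape (3,)
B = np.array([[1], [2], [3]])  # Shape (3, 1)

# broadcast_arrays returns views of A and B stretched to the common shape.
A_expanded, B_expanded = np.broadcast_arrays(A, B)
print(A_expanded)
print(B_expanded)

# Adding the expanded arrays gives the same result as A + B.
print(np.array_equal(A_expanded + B_expanded, A + B))
```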

In [25]:
# C is 2-dimensional
C = np.array([[1, 2, 3], [4, 5, 6]])  # Shape (2, 3)

# D is 1-dimensional
D = np.array([10, 20, 30])  # Shape (3,)

# Broadcasting shapes:
# C: (2, 3)
# D: (3,) -> padded to (1, 3), then stretched along the first dimension to (2, 3)

# Resulting operation
result = C + D
print(result)

[[11 22 33]
 [14 25 36]]


Let’s break down why and how the shapes of `C` and `D` are transformed during broadcasting, step by step:

---

#### **Step 1: Shapes of `C` and `D`**
- `C` has shape `(2, 3)`. It is a 2-dimensional array (2 rows and 3 columns).
- `D` has shape `(3,)`. It is a 1-dimensional array (3 elements).

To perform addition, the shapes of `C` and `D` must be made compatible using **broadcasting rules**.

---

#### **Step 2: Applying Rule 1**
##### If the two arrays differ in their number of dimensions, the shape of the one with fewer dimensions is padded with ones on its leading side.
- `D` has fewer dimensions than `C`, so its shape is padded:
  - Original shape: `(3,)`
  - Padded shape: `(1, 3)` (a leading `1` is added to make it 2-dimensional).

- `C`'s shape remains `(2, 3)` since it is already 2-dimensional.

Now we have:
- `C` → Shape `(2, 3)`
- `D` → Shape `(1, 3)`

---

#### **Step 3: Applying Rule 2**
##### If the shape of the two arrays does not match in any dimension, the array with shape equal to `1` in that dimension is stretched to match the other shape.

- **First dimension (rows):**
  - `C` has `2` rows, and `D` has `1` row.  
  - The `1` in `D` is stretched to match the `2` in `C`.  
  - This means `D` is conceptually duplicated along the first dimension.

- **Second dimension (columns):**
  - Both `C` and `D` have `3` columns, so no stretching is needed here.

After stretching:
- `C` → Shape `(2, 3)` (unchanged).
- `D` → Shape `(2, 3)` (rows stretched from 1 to 2).

---

#### **Step 4: Element-Wise Addition**
After broadcasting, both arrays have the same shape `(2, 3)`, so element-wise addition can proceed:

**Array `C`:**
```
[[1, 2, 3],   # Row 1
 [4, 5, 6]]   # Row 2
```

**Broadcasted `D`:**
```
[[10, 20, 30],   # Row 1 (original)
 [10, 20, 30]]   # Row 2 (stretched copy of the first row)
```

**Result (`C + D`):**
```
[[11, 22, 33],   # (1+10, 2+20, 3+30)
 [14, 25, 36]]   # (4+10, 5+20, 6+30)
```

---

#### **Conclusion**
- **Broadcasting Transformation:**
  - `D`'s shape becomes `(1, 3)` after padding.
  - It is then stretched along the first dimension to match `C`'s shape `(2, 3)`.

- **Final Shape of Result:**
  - The resulting shape is `(2, 3)`, and addition is performed element-wise.  
This is because broadcasting stretches arrays only where needed, maintaining computational efficiency and element-wise compatibility.

In [26]:
E = np.array([[1, 2, 3], [4, 5, 6]])  # Shape (2, 3)
F = np.array([1, 2])  # Shape (2,)

# Attempt to add
try:
    result = E + F
except ValueError as e:
    print(f"Error: {e}")

Error: operands could not be broadcast together with shapes (2,3) (2,) 


Let’s analyze why broadcasting fails in this scenario and why the shapes of `E` and `F` are incompatible:

---

#### **Step 1: Shapes of `E` and `F`**
- `E` has shape `(2, 3)`. It is a 2-dimensional array with 2 rows and 3 columns.
- `F` has shape `(2,)`. It is a 1-dimensional array with 2 elements.

---

#### **Step 2: Broadcasting Rules**
To determine whether broadcasting can occur, we apply the rules of broadcasting:

##### **Rule 1: Padding dimensions**
If the arrays differ in their number of dimensions, the shape of the array with fewer dimensions is padded with ones on the **left side** to match the number of dimensions of the other array.

- `E` is already 2-dimensional: `(2, 3)`.
- `F` is 1-dimensional, so it is padded with a leading `1`:
  - Original shape: `(2,)`
  - Padded shape: `(1, 2)`.

Now we have:
- `E` → Shape `(2, 3)`
- `F` → Shape `(1, 2)`

---

##### **Rule 2: Dimension-wise compatibility**
For broadcasting to succeed, the sizes in each dimension must satisfy one of the following conditions:
1. They are equal, or
2. One of the sizes is `1` (so it can stretch to match the other size).

- **First dimension (rows):**
  - `E` has `2` rows, and `F` has `1` row after padding.  
  - `1` can stretch to `2`. ✅ **Compatible**

- **Second dimension (columns):**
  - `E` has `3` columns, and `F` has `2` columns.  
  - These sizes **do not match**, and neither is `1`. ❌ **Incompatible**

Because of this mismatch in the second dimension, broadcasting cannot occur.

---

#### **Step 3: Why Broadcasting Fails**
The shapes `(2, 3)` for `E` and `(1, 2)` for `F` are incompatible because the second dimension of `E` (3 columns) cannot accommodate the second dimension of `F` (2 columns). 

---

#### **Conclusion**
1. **Broadcasting fails** because Rule 3 states:
   > If in any dimension the sizes disagree and neither is equal to `1`, an error is raised.
   
2. **Resolution:**
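   - To make the shapes compatible, the trailing dimension of `F` must be `1` or `3`.  
   - Because `F` has only 2 elements, it cannot be reshaped to `(2, 3)`. Reshape it to `(2, 1)` instead, for example with `F.reshape(2, 1)` or `F[:, np.newaxis]`, so that each element of `F` is added to one row of `E`.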
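
A minimal sketch of this resolution, giving `F` a trailing axis of length 1 (the name `F_column` is introduced only for the example):

```python
import numpy as np

E = np.array([[1, 2, 3], [4, 5, 6]])  # Shape (2, 3)
F = np.array([1, 2])                  # Shape (2,)

# Add a trailing axis so F has shape (2, 1) and lines up with E's rows.
F_column = F[:, np.newaxis]
print(F_column.shape)                 # (2, 1)
print(E + F_column)
# [[2 3 4]
#  [6 7 8]]
```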

### Applications of Broadcasting 

In [27]:
rng = np.random.default_rng(1000)

# Generate random integers
# Syntax: rng.integers(low, high=None, size=None, dtype=int)
random_integers = rng.integers(low=1, high=100, size=(5, 3))
print(random_integers)

# let us center this array
# we need to find the mean for each feature (column), i.e. the mean computed along axis 0
X_mean = np.mean(random_integers, axis=0)
print(f"The mean for each feature is:{X_mean}")

# Center the matrix 
centered_matrix = random_integers - X_mean
print(f"The centered matrix is: \n{centered_matrix}")
print(f"The mean for each feature now in the centered matrix is:{np.mean(centered_matrix, axis=0)}")

[[21 52 84]
 [60 82 47]
 [21 21 50]
 [53 30 19]
 [50 28 26]]
The mean for each feature is:[41.  42.6 45.2]
The centered matrix is: 
[[-20.    9.4  38.8]
 [ 19.   39.4   1.8]
 [-20.  -21.6   4.8]
 [ 12.  -12.6 -26.2]
 [  9.  -14.6 -19.2]]
The mean for each feature now in the centered matrix is:[ 0.00000000e+00 -1.42108547e-15 -2.84217094e-15]
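
Centering each row instead of each column needs the means in shape `(n_rows, 1)` so that they broadcast across the columns. One hedged way to get that shape is `keepdims=True`, sketched here on the first three rows of the matrix above (the names `X`, `row_mean`, and `row_centered` are introduced for the example):

```python
import numpy as np

X = np.array([[21, 52, 84],
              [60, 82, 47],
              [21, 21, 50]])

# keepdims=True keeps the reduced axis with length 1: shape (3, 1), not (3,).
row_mean = X.mean(axis=1, keepdims=True)
print(row_mean.shape)                                # (3, 1)

# (3, 3) - (3, 1) broadcasts the per-row mean across the columns.
row_centered = X - row_mean
print(np.allclose(row_centered.mean(axis=1), 0.0))   # True
```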
