# Vectorization and Broadcasting

Two key features that make NumPy particularly useful are vectorization and broadcasting. These features allow us to write code that is efficient both in terms of computing time and memory usage.


In [1]:
import numpy as np

## Vectorization

Vectorization refers to the technique of performing operations on entire arrays rather than iterating over individual elements.

Let's say we have two large arrays, `a` and `b`, and we want to add them element-wise. With pure Python, we'd have to use a loop:


In [4]:
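# Two plain Python lists with one million integers each
a = list(range(1000000))
b = list(range(1000000, 2000000))

# Element-wise sum: the interpreter visits each pair in turn
result = [x + y for x, y in zip(a, b)]

result[:5]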

[1000000, 1000002, 1000004, 1000006, 1000008]


With NumPy's vectorization, we can perform the operation directly on the arrays:


In [5]:
# Convert the lists to NumPy arrays
a = np.array(a)
b = np.array(b)

# A single expression adds the arrays element-wise
result = a + b
result

array([1000000, 1000002, 1000004, ..., 2999994, 2999996, 2999998])


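Vectorized operations run their loops in NumPy's compiled code over arrays whose elements all share one data type, instead of in the Python interpreter. This avoids per-element interpreter overhead and lets NumPy use low-level optimizations and hardware capabilities, so bulk operations on large arrays are typically much faster than an explicit Python loop.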
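To get a feel for the size of the speed-up, the two approaches can be timed side by side. The following is a minimal sketch using the standard-library `timeit` module; the variable names (`a_list`, `a_arr`, and so on) are chosen here for illustration, and the measured times depend on the machine and NumPy version, so no specific numbers are shown.

```python
import timeit

import numpy as np

n = 1_000_000

# The same data as plain Python lists and as NumPy arrays
a_list = list(range(n))
b_list = list(range(n, 2 * n))
a_arr = np.arange(n)
b_arr = np.arange(n, 2 * n)

# Run each approach three times and keep the fastest run
loop_time = min(
    timeit.repeat(
        lambda: [x + y for x, y in zip(a_list, b_list)], number=1, repeat=3
    )
)
vec_time = min(timeit.repeat(lambda: a_arr + b_arr, number=1, repeat=3))

print(f"list comprehension: {loop_time:.4f} s")
print(f"NumPy addition:     {vec_time:.4f} s")

# Both approaches compute the same values
loop_result = [x + y for x, y in zip(a_list, b_list)]
vec_result = a_arr + b_arr
same = np.array_equal(vec_result, np.array(loop_result))
print(same)
```

In a notebook, the `%timeit` magic is a convenient alternative to calling `timeit` directly.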



## Broadcasting

Broadcasting is another powerful mechanism: it lets NumPy combine arrays of different, but compatible, shapes in element-wise arithmetic operations.

Let's say we want to add the number 5 to every element in an array. With broadcasting, we can do this directly:


In [6]:
a = np.array([1, 2, 3, 4, 5])
result = a + 5
result

array([ 6,  7,  8,  9, 10])


In this example, the scalar `5` is "broadcast" across the array `a` to match its shape, and then the operation is performed.
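The "stretching" is conceptual: NumPy does not need to build a full array of fives in memory. One way to see this is `np.broadcast_to`, which returns a read-only view with the broadcast shape. This is only an illustrative sketch; it is not a claim about what NumPy does internally when it evaluates `a + 5`.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])

# A read-only view of the scalar 5, stretched to the shape of `a`
stretched = np.broadcast_to(5, a.shape)
print(stretched)          # [5 5 5 5 5]

# A stride of 0 means every position reads the same single value,
# so no extra copies of the scalar are stored
print(stretched.strides)  # (0,)

# Adding the view gives the same result as adding the scalar directly
result = a + stretched
print(np.array_equal(result, a + 5))  # True
```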

Broadcasting becomes especially handy when dealing with multi-dimensional arrays:


In [27]:
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = np.array([1, 2, 3])

result = a + b
result

array([[ 2,  4,  6],
       [ 5,  7,  9],
       [ 8, 10, 12]])


Here, the one-dimensional array `b`, with shape `(3,)`, is broadcast across the two-dimensional array `a`, with shape `(3, 3)`: `b` is treated as a single row and added to every row of `a`.
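The general rule is that NumPy compares shapes starting from the last dimension and working backwards; two sizes are compatible when they are equal or when one of them is 1. The sketch below illustrates this with a row vector, a column vector, and an incompatible shape. The names `row`, `col`, `by_row`, and `by_col` are introduced here only for illustration.

```python
import numpy as np

a = np.arange(1, 10).reshape(3, 3)  # shape (3, 3)
row = np.array([1, 2, 3])           # shape (3,), treated as (1, 3)
col = row[:, np.newaxis]            # shape (3, 1)

by_row = a + row  # adds [1, 2, 3] to every row of `a`
by_col = a + col  # adds 1 to row 0, 2 to row 1, 3 to row 2

print(by_row)
print(by_col)

# Shapes (3, 3) and (2,) are incompatible: the trailing sizes 3 and 2
# differ and neither is 1, so NumPy raises a ValueError
error_message = None
try:
    a + np.array([1, 2])
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```

Reshaping with `np.newaxis` is a common way to control the axis along which a one-dimensional array is broadcast.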

## Conclusion

NumPy's vectorization and broadcasting are powerful features that can drastically reduce computation time and memory usage. By avoiding explicit Python loops and allowing operations on arrays of different but compatible shapes, they make numerical computations on large data sets both easier to write and more efficient to run.

In [17]:
# Create a dataset whose three columns have different scales
np.random.seed(42)  # Seed for reproducibility
# The list [10, 100, 1000] is broadcast across the rows of the (5, 3) array
data = np.random.randint(10, size=(5, 3)) * [10, 100, 1000]

In [19]:
data

array([[  60,  300, 7000],
       [  40,  600, 9000],
       [  20,  600, 7000],
       [  40,  300, 7000],
       [  70,  200, 5000]])

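Broadcasting also makes it easy to standardize each column of this dataset with the z-score, where $\bar{x}$ is the column mean and $\sigma$ is the column standard deviation:

$$z=\frac{x-\bar{x}}{\sigma}$$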

In [23]:
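# Column-wise statistics (axis=0), each with shape (3,)
means = np.mean(data, axis=0)
std_devs = np.std(data, axis=0)
# The (3,) statistics are broadcast across every row of the (5, 3) data
normalized_data = (data - means) / std_devs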

In [24]:
normalized_data

array([[ 0.80295507, -0.5976143 ,  0.        ],
       [-0.3441236 ,  1.19522861,  1.58113883],
       [-1.49120227,  1.19522861,  0.        ],
       [-0.3441236 , -0.5976143 ,  0.        ],
       [ 1.3764944 , -1.19522861, -1.58113883]])
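A quick sanity check is that every standardized column has a mean of approximately 0 and a standard deviation of approximately 1. The sketch below hard-codes the dataset printed above so that it runs on its own. It also shows, as an assumption about a likely next step rather than something the example requires, how `keepdims=True` keeps the shapes compatible when centering along rows instead of columns.

```python
import numpy as np

# The dataset printed above, hard-coded so the check is self-contained
data = np.array([
    [60, 300, 7000],
    [40, 600, 9000],
    [20, 600, 7000],
    [40, 300, 7000],
    [70, 200, 5000],
])

means = data.mean(axis=0)     # shape (3,)
std_devs = data.std(axis=0)   # population standard deviation (ddof=0)
normalized = (data - means) / std_devs

# Each column should now have mean ~0 and standard deviation ~1
print(np.allclose(normalized.mean(axis=0), 0.0))  # True
print(np.allclose(normalized.std(axis=0), 1.0))   # True

# Row-wise centering: keepdims=True gives shape (5, 1), which
# broadcasts against (5, 3); a plain (5,) array would not
row_means = data.mean(axis=1, keepdims=True)
centered_rows = data - row_means
print(centered_rows.shape)  # (5, 3)
```

Note that `np.std` computes the population standard deviation by default; pass `ddof=1` for the sample standard deviation.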

> Content created by [**Carlos Cruz-Maldonado**](https://www.linkedin.com/in/carloscruzmaldonado/).  
> I am available to answer any questions or provide further assistance.   
> Feel free to reach out to me at any time.  