<a href="https://www.kaggle.com/code/fareselmenshawii/vectorization?scriptVersionId=184759468" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

<div class="table-of-contents" style="background-color:#000000; padding: 20px; margin: 10px; font-size: 110%; border-radius: 25px; box-shadow: 10px 10px 5px 0px rgba(0,0,0,0.75);">
  <h1 style="color:#17E8C4;">Table of Contents</h1>
  <ol>
    <li><a href="#overview" style="color: #17E8C4;">Overview</a></li>
    <li><a href="#2" style="color: #17E8C4;">Imports</a></li>
    <li><a href="#3" style="color: #17E8C4;">Create Data</a></li>
    <li><a href="#4" style="color: #17E8C4;">Compare Implementations</a></li>
    <li><a href="#5" style="color: #17E8C4;">Conclusion</a></li>
  </ol>
</div>

<a id="overview"></a>
<h1 style="background-color:#000000; border:0; color:black; box-shadow: 10px 10px 5px 0px rgba(0,0,0,0.75); transform: rotateX(10deg);">
  <center style="color: #17E8C4;">Overview</center>
</h1>

**In this notebook, we'll discuss vectorization.**

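**Vectorization means expressing an operation on whole arrays at once instead of looping over individual elements in Python. Libraries such as NumPy then run the loop in optimized compiled code, which makes numerical operations much faster.**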
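Here is a minimal sketch of the idea. It computes the same elementwise addition twice: once with an explicit Python loop and once as a single vectorized NumPy expression. The array names and sizes are illustrative only.

```python
import numpy as np

a = np.arange(5, dtype=np.float64)   # [0., 1., 2., 3., 4.]
b = np.full(5, 10.0)                 # [10., 10., 10., 10., 10.]

# Loop version: the Python interpreter handles one element per iteration
c_loop = np.empty_like(a)
for i in range(len(a)):
    c_loop[i] = a[i] + b[i]

# Vectorized version: one expression; NumPy runs the loop in compiled code
c_vec = a + b

print(c_vec)                           # [10. 11. 12. 13. 14.]
print(np.array_equal(c_loop, c_vec))   # True
```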

**Let's get started!**


<a id="2"></a>
<h1 style='background:#000000;border:0; color:black;
    box-shadow: 10px 10px 5px 0px rgba(0,0,0,0.75);
    transform: rotateX(10deg);
    '><center style='color: #17E8C4;'>Imports</center></h1>



```python
import numpy as np
import time
```

<a id="3"></a>
<h1 style='background:#000000;border:0; color:black;
    box-shadow: 10px 10px 5px 0px rgba(0,0,0,0.75);
    transform: rotateX(10deg);
    '><center style='color: #17E8C4;'>Create Data</center></h1>
    

```python
np.random.seed(1)  # Fix the seed so the random arrays are reproducible

# Small arrays (10,000 elements each) to compare the two implementations at a modest size
x_small = np.random.rand(10_000)
y_small = np.random.rand(10_000)

# Large arrays (100 million float64 elements each, roughly 800 MB per array)
# to show how the gap grows with input size
x_large = np.random.rand(100_000_000)
y_large = np.random.rand(100_000_000)
```

# Define Dot Function

```python
def my_dot(x, y):
    """
    Compute the dot product of two vectors with an explicit Python loop.

    Args:
        x (np.ndarray): First input vector.
        y (np.ndarray): Second input vector.

    Returns:
        float: The dot product of the input vectors.
    """
    total = 0
    # Multiply matching elements and accumulate, one pair per iteration
    for xi, yi in zip(x, y):
        total += xi * yi
    return total
```
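The dot product is $x \cdot y = \sum_{i=1}^{n} x_i y_i$, and `my_dot` accumulates exactly this sum one term at a time. The sketch below is a quick sanity check on a small random vector with its own seed. It confirms that three versions agree: a pure-Python sum, an elementwise multiply followed by `np.sum`, and `np.dot`. It uses `np.isclose` rather than `==` because the three versions may add the terms in different orders, so the last few bits of the floating-point result can differ.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(1_000)
y = rng.random(1_000)

loop_result = sum(xi * yi for xi, yi in zip(x, y))  # Python-level loop, like my_dot
elementwise_result = np.sum(x * y)                   # vectorized multiply, then vectorized sum
dot_result = np.dot(x, y)                            # single vectorized dot product

print(np.isclose(loop_result, elementwise_result), np.isclose(loop_result, dot_result))
```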

<a id="4"></a>
<h1 style='background:#000000;border:0; color:black;
    box-shadow: 10px 10px 5px 0px rgba(0,0,0,0.75);
    transform: rotateX(10deg);
    '><center style='color: #17E8C4;'>Compare Implementations</center></h1>
    

## Compare on small arrays

```python
# Time the vectorized version (NumPy's built-in dot product)
start_time = time.time()
np.dot(x_small, y_small)
end_time = time.time()
print(f"Vectorized version duration: {1000 * (end_time - start_time):.2f} ms")

# Time the non-vectorized version (pure-Python loop)
start_time = time.time()
my_dot(x_small, y_small)
end_time = time.time()
print(f"Non-Vectorized version duration: {1000 * (end_time - start_time):.2f} ms")
```

```text
Vectorized version duration: 0.21 ms
Non-Vectorized version duration: 7.30 ms
```
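A single `time.time()` reading around a sub-millisecond call is easily disturbed by other activity on the machine. `time.time()` can also have coarse resolution on some platforms. The standard library's `timeit` module is designed for this kind of benchmark: it uses `time.perf_counter()`, repeats each call several times, and lets you keep the fastest trial. In the sketch below, `loop_dot` is a hypothetical stand-in that mirrors `my_dot` so the block runs on its own. The trial counts are arbitrary choices, and the printed numbers will vary by machine.

```python
import timeit
import numpy as np

rng = np.random.default_rng(1)  # own generator so the block runs on its own
x = rng.random(10_000)
y = rng.random(10_000)

def loop_dot(a, b):
    """Hypothetical stand-in for my_dot: accumulate the products in a Python loop."""
    total = 0.0
    for ai, bi in zip(a, b):
        total += ai * bi
    return total

number = 20  # calls per trial
# timeit.repeat returns the total time (seconds) of each trial; the minimum is the least disturbed one
vec_best = min(timeit.repeat(lambda: np.dot(x, y), repeat=5, number=number)) / number
loop_best = min(timeit.repeat(lambda: loop_dot(x, y), repeat=5, number=number)) / number

print(f"np.dot:   {vec_best * 1e3:.4f} ms per call")
print(f"loop_dot: {loop_best * 1e3:.4f} ms per call")
print(f"speedup:  {loop_best / vec_best:.0f}x")
```

Taking the minimum rather than the mean follows the `timeit` documentation's advice. Higher values are usually caused by other processes interfering with the measurement, not by variation in the code itself.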


## Compare on large arrays

```python
# Measure the time for the vectorized version on large arrays
start_time = time.time()
np.dot(x_large, y_large)
end_time = time.time()
print(f"Vectorized version duration (large arrays): {1000 * (end_time - start_time):.2f} ms")

# Measure the time for the non-vectorized version on large arrays
start_time = time.time()
my_dot(x_large, y_large)
end_time = time.time()
print(f"Non-Vectorized version duration (large arrays): {1000 * (end_time - start_time):.2f} ms")
```

```text
Vectorized version duration (large arrays): 92.74 ms
Non-Vectorized version duration (large arrays): 55808.14 ms
```
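Computing ratios from the two recorded runs makes the trend explicit. The sketch below only does arithmetic on the timings printed above, which are one machine's measurements rather than general constants. The speedup grows from about 35× to about 600×. This happens largely because the vectorized call's fixed overhead is spread over far more elements, while the loop's cost per element stays in the hundreds of nanoseconds.

```python
# Recorded timings from the two runs above, in milliseconds (machine-specific)
timings_ms = {
    10_000: {"vectorized": 0.21, "loop": 7.30},
    100_000_000: {"vectorized": 92.74, "loop": 55808.14},
}

speedups = {}
for n, t in timings_ms.items():
    speedups[n] = t["loop"] / t["vectorized"]
    # Convert total milliseconds to nanoseconds per element: ms * 1e6 / n
    ns_vec = t["vectorized"] * 1e6 / n
    ns_loop = t["loop"] * 1e6 / n
    print(f"n={n:>11,}: speedup {speedups[n]:6.1f}x | "
          f"vectorized {ns_vec:5.2f} ns/elem | loop {ns_loop:6.1f} ns/elem")
```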


<a id="5"></a>
<h1 style='background:#000000;border:0; color:black;
    box-shadow: 10px 10px 5px 0px rgba(0,0,0,0.75);
    transform: rotateX(10deg);
    '><center style='color: #17E8C4;'>Conclusion</center></h1>
    

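**In these runs, `np.dot` was about 35× faster than the Python loop on 10,000 elements and about 600× faster on 100 million elements.**

**Most of the gap comes from where the loop runs. `my_dot` performs every multiply and add in the Python interpreter and handles each element as a separate Python object. `np.dot` runs the whole loop in compiled code over a contiguous block of `float64` values. For vectors like these, NumPy typically hands the work to an optimized BLAS routine, which can use SIMD instructions (data parallelism) to process several elements per instruction.**

**This matters in machine learning, where datasets are large and operations such as dot products and matrix multiplications run many times during training and inference.**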
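The sketch below illustrates the machine-learning case with made-up shapes that do not come from the runs above. It computes the predictions of a linear model $\hat{y} = Xw + b$ for every sample with a single matrix-vector product, instead of one Python-driven dot product per row. The names `X`, `w`, and `b` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples, n_features = 1_000, 20
X = rng.random((n_samples, n_features))  # one row per sample
w = rng.random(n_features)               # model weights
b = 0.5                                  # bias term

# Loop version: one dot product per sample, driven from Python
preds_loop = np.array([np.dot(row, w) + b for row in X])

# Vectorized version: a single matrix-vector product over all samples
preds_vec = X @ w + b

print(np.allclose(preds_loop, preds_vec))  # True
```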