## Example 1: Summing Elements in a List

In [None]:

#Original Loop Version:

numbers = [1, 2, 3, 4, 5]
total = 0

for num in numbers:
    total += num

print(total)


#Vectorized Version:

import numpy as np

numbers = [1, 2, 3, 4, 5]
total = np.sum(numbers)

print(total)



## Example 2: Element-wise Multiplication of Two Lists

In [None]:

#Original Loop Version:

list1 = [1, 2, 3, 4, 5]
list2 = [2, 3, 4, 5, 6]
result = []

for i in range(len(list1)):
    result.append(list1[i] * list2[i])

print(result)


#Vectorized Version:

list1 = [1, 2, 3, 4, 5]
list2 = [2, 3, 4, 5, 6]
result = np.multiply(list1, list2)

print(result)

## Example 3: Finding Maximum Element in a List

In [None]:
#Original Loop Version:

numbers = [3, 1, 5, 2, 4]
max_value = numbers[0]

for num in numbers:
    if num > max_value:
        max_value = num

print(max_value)


#Vectorized Version:

import numpy as np

numbers = [3, 1, 5, 2, 4]
max_value = np.max(numbers)

print(max_value)


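In these examples, the original code uses explicit loops over Python lists, while the vectorized versions call NumPy functions that produce the same results more concisely. The speed advantage shows up on large inputs; for five-element lists like these, the cost of converting each list to an array can outweigh it.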

By using vectorized operations, you can take advantage of optimized C code execution in NumPy, which leads to faster and more efficient computations. Vectorized code typically eliminates the need for explicit loops and leverages optimized array operations for improved performance.

Not all code can be easily vectorized; feasibility depends on the specific task and data structures involved. For numerical operations on arrays, however, the vectorized functions and operations provided by libraries like NumPy often lead to significant performance improvements.
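
As a rough illustration of the performance claim above, the following sketch times a plain Python loop against `ndarray.sum` on a larger input. The array size (100,000 elements) and the helper name `loop_sum` are assumptions chosen for this example, and the measured times will vary by machine.

```python
import timeit

import numpy as np

# Assumed input size for illustration; large enough for the difference to show
values = np.arange(100_000, dtype=np.int64)
values_list = values.tolist()


def loop_sum(items):
    """Sum the items with an explicit Python loop (hypothetical helper)."""
    total = 0
    for item in items:
        total += item
    return total


# Both approaches must agree on the result
loop_total = loop_sum(values_list)
numpy_total = int(values.sum())

# Time a few repetitions of each approach
loop_seconds = timeit.timeit(lambda: loop_sum(values_list), number=5)
numpy_seconds = timeit.timeit(lambda: values.sum(), number=5)

print(f"loop:  {loop_total} in {loop_seconds:.4f} s")
print(f"numpy: {numpy_total} in {numpy_seconds:.4f} s")
```

The data here is created as a NumPy array up front. Passing a Python list to `np.sum` forces a list-to-array conversion on every call, which can cancel out much of the benefit for small inputs.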

## Vectorize Pandas Code

### Example 1: Iterating over DataFrame Rows

In [8]:

#Inefficient Loop Version:

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

result = []
for index, row in df.iterrows():
    result.append(row['A'] + row['B'])

df['C'] = result


#Vectorized Version:

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

df['C'] = df['A'] + df['B']



### Example 2: Applying Functions to DataFrame Columns

In [None]:
#Inefficient apply Version (calls a Python function once per element):

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3]})

def square(x):
    return x ** 2

df['B'] = df['A'].apply(square)


#Vectorized Version:

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3]})

df['B'] = df['A'] ** 2



### Example 3: Iterating over DataFrame Rows with Conditional Logic

In [None]:
#Inefficient Loop Version:

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

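result = []
for index, row in df.iterrows():
    if row['A'] > 1:
        result.append(row['B'])
    else:
        # Keep one entry per row so the list matches the DataFrame's length
        result.append(float('nan'))

df['C'] = result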


#Vectorized Version:

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

df['C'] = df.loc[df['A'] > 1, 'B']



### Example 4: Grouping and Aggregating DataFrame

In [None]:
#Inefficient Loop Version:

import pandas as pd

df = pd.DataFrame({'Group': ['A', 'A', 'B', 'B'], 'Value': [1, 2, 3, 4]})

result = {}
for group, group_df in df.groupby('Group'):
    result[group] = group_df['Value'].sum()

summary_df = pd.DataFrame.from_dict(result, orient='index', columns=['Sum'])


#Vectorized Version:

import pandas as pd

df = pd.DataFrame({'Group': ['A', 'A', 'B', 'B'], 'Value': [1, 2, 3, 4]})

summary_df = df.groupby('Group')['Value'].sum().reset_index(name='Sum')



In these examples, the original code is inefficient as it relies on iterative operations or explicit loops to perform operations on pandas DataFrames. By leveraging vectorized operations provided by pandas, such as using built-in functions or applying operations on entire columns, the code becomes more concise and efficient. Vectorization allows for faster computations and improved performance, especially when dealing with large datasets.
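
The vectorized conditional assignment in Example 3 relies on index alignment, which is easy to miss. The sketch below reuses that example's DataFrame to show why rows that fail the condition end up as `NaN`; the column name `C_where` is an assumption added for comparison.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Boolean mask: one True/False value per row
mask = df['A'] > 1

# Only the matching rows are selected; they keep their index labels (1 and 2)
selected = df.loc[mask, 'B']

# Assignment aligns on index labels, so row 0 has no match and receives NaN
df['C'] = selected

# Series.where keeps values where the mask is True and fills NaN elsewhere
df['C_where'] = df['B'].where(mask)

print(df)
```

Both columns hold the same values. `Series.where` states the intent in a single step, while the `loc` form depends on pandas aligning the shorter selection back to the full index.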

## Using Numpy to Vectorize

### Example 1: Calculating Euclidean Distance Matrix

In [0]:
#Original Non-Vectorized Version:

import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

dist_matrix = np.zeros((len(df), len(df)))

for i in range(len(df)):
    for j in range(len(df)):
        dist_matrix[i, j] = np.sqrt(np.sum((df.iloc[i] - df.iloc[j]) ** 2))

print(dist_matrix)


#Vectorized Version:

import pandas as pd
import numpy as np

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

dist_matrix = np.sqrt(np.sum((df.values[:, np.newaxis] - df.values) ** 2, axis=2))

print(dist_matrix)



### Example 2: Group-Wise Standardization of DataFrame Columns

In [0]:
#Original Non-Vectorized Version:

df = pd.DataFrame({'Group': ['A', 'A', 'B', 'B'], 'Value': [1, 2, 3, 4]})

result = pd.DataFrame()

for group, group_df in df.groupby('Group'):
    # Work on a copy so the new column is not written to a slice of df
    group_df = group_df.copy()
    group_mean = group_df['Value'].mean()
    group_std = group_df['Value'].std()
    group_df['Standardized'] = (group_df['Value'] - group_mean) / group_std
    result = pd.concat([result, group_df])

print(result)


#Vectorized Version:

import pandas as pd
import numpy as np

df = pd.DataFrame({'Group': ['A', 'A', 'B', 'B'], 'Value': [1, 2, 3, 4]})

group_mean = df.groupby('Group')['Value'].transform('mean')
group_std = df.groupby('Group')['Value'].transform('std')
df['Standardized'] = (df['Value'] - group_mean) / group_std

print(df)



### Example 3: Computing Rolling Mean with Variable Window Size

In [0]:
#Original Non-Vectorized Version:

df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [6, 7, 8, 9, 10]})

window_sizes = [2, 3, 4]
result = pd.DataFrame()

for window_size in window_sizes:
    rolling_mean = df.rolling(window_size).mean()
    result = pd.concat([result, rolling_mean.add_suffix(f'_mean_{window_size}')], axis=1)

print(result)


#More Concise Version (each rolling mean is already vectorized; the comprehension only replaces the outer loop):

df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [6, 7, 8, 9, 10]})

window_sizes = [2, 3, 4]
result = pd.concat([df.rolling(window_size).mean().add_suffix(f'_mean_{window_size}') for window_size in window_sizes], axis=1)

print(result)



In these examples, NumPy and pandas array operations replace explicit loops, resulting in more efficient and concise computations. The vectorized versions perform each calculation on whole arrays or columns at once.
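
The distance-matrix one-liner in Example 1 depends on NumPy broadcasting. The following sketch breaks it into steps and prints the intermediate shapes; the variable names `points`, `expanded`, and `diff` are assumptions introduced for illustration, and the data matches the rows of that example's DataFrame.

```python
import numpy as np

# Rows of the DataFrame from Example 1: one point per row, three coordinates each
points = np.array([[1, 4, 7],
                   [2, 5, 8],
                   [3, 6, 9]], dtype=float)

# Insert a new axis: shape (3, 3) becomes (3, 1, 3)
expanded = points[:, np.newaxis]

# Broadcasting (3, 1, 3) against (3, 3) yields (3, 3, 3):
# diff[i, j] is the coordinate-wise difference between point i and point j
diff = expanded - points

# Sum the squared differences over the coordinate axis, then take the root
dist_matrix = np.sqrt((diff ** 2).sum(axis=2))

print(expanded.shape, diff.shape, dist_matrix.shape)
print(dist_matrix)
```

The intermediate `diff` array holds one entry per pair of points and per coordinate, so memory use grows with the square of the number of rows. For large inputs that trade-off is worth checking before replacing a loop.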

## Examples of Code That Can't Be Vectorized


### Example 1: File I/O Operations

Code involving reading or writing to files typically cannot be vectorized as it relies on external I/O operations, which are inherently sequential and cannot be parallelized efficiently.

In [None]:

with open('input.txt', 'r') as file:
    lines = file.readlines()
    for line in lines:
        # Process each line
        pass
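
Although the read itself stays sequential, any numeric work on the loaded data can still be vectorized afterwards. The sketch below assumes a file with one number per line and uses an in-memory `io.StringIO` object as a stand-in, so the contents shown are made up for illustration.

```python
import io

import numpy as np

# Stand-in for a real file; the contents are invented for this example
fake_file = io.StringIO("1.5\n2.5\n4.0\n")

# The read is I/O: lines are parsed one after another
values = np.loadtxt(fake_file)

# Once the numbers are in an array, the arithmetic is vectorized
doubled = values * 2
total = doubled.sum()

print(doubled, total)
```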



### Example 2: User Input and Interactive Prompts

Code that relies on user input or interactive prompts cannot be vectorized since it depends on real-time interaction with the user, which cannot be parallelized.

In [None]:

name = input("Enter your name: ")
print(f"Hello, {name}!")



### Example 3: Network Operations

Code involving network operations, such as making HTTP requests or socket communications, cannot be vectorized since it depends on external network responses, which are sequential and require I/O operations.

In [None]:
import requests

response = requests.get('https://www.example.com')
print(response.status_code)



### Example 4: Recursive Algorithms

Recursive algorithms, by their nature, rely on sequential function calls and cannot be vectorized. Each recursion depends on the previous recursive call, making it difficult to parallelize.

In [None]:
def fibonacci(n):
    if n <= 1:
        return n
    else:
        return fibonacci(n-1) + fibonacci(n-2)

result = fibonacci(10)
print(result)
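
The dependency described above becomes explicit in an iterative rewrite. This is a sketch rather than part of the original example, and `fibonacci_iterative` is a hypothetical helper name. Each loop iteration needs the two values produced by the previous iterations, so the steps cannot be computed independently as one array operation.

```python
def fibonacci_iterative(n):
    """Return the n-th Fibonacci number using a loop (hypothetical helper)."""
    previous, current = 0, 1
    for _ in range(n):
        # The next value depends on the two values computed just before it
        previous, current = current, previous + current
    return previous


result = fibonacci_iterative(10)
print(result)
```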



### Example 5: Dynamic Code Execution

Code that involves dynamically generated or executed code cannot be easily vectorized. Dynamic code execution often relies on evaluating or executing code based on runtime conditions, making it challenging to parallelize.

In [None]:

code = """
result = 0
for i in range(100):
    result += i
print(result)
"""

exec(code)



### Example 6: Non-Uniform Data Access

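Code that requires non-uniform access to data, such as searching or sorting, is hard to vectorize by hand because each step depends on the outcome of earlier comparisons. In practice, you rely on an optimized library routine, such as Python's `sorted` or NumPy's `np.sort`, rather than writing element-wise array expressions yourself.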

In [None]:
numbers = [5, 2, 8, 3, 1]
sorted_numbers = sorted(numbers)
print(sorted_numbers)
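
For data that already lives in a NumPy array, the library's own sorting and searching routines are the usual choice. The sketch below applies `np.sort`, `np.argsort`, and `np.searchsorted` to the same numbers; the lookup value `3` is an assumption chosen for illustration.

```python
import numpy as np

numbers = np.array([5, 2, 8, 3, 1])

# np.sort returns a sorted copy and leaves the input unchanged
sorted_numbers = np.sort(numbers)

# np.argsort returns the indices that would sort the array
order = np.argsort(numbers)

# np.searchsorted runs a binary search on an already sorted array
position = np.searchsorted(sorted_numbers, 3)

print(sorted_numbers, order, position)
```

These calls do not vectorize the comparison logic itself; they move the whole algorithm into compiled code, which is where the speedup comes from.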


### Example 7: Dynamic Data Structures

Code that involves dynamic data structures, such as linked lists or trees, often relies on non-sequential access patterns and cannot be effectively vectorized for performance improvements.

In [None]:

class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

# Manipulating linked list
head = Node(1)
head.next = Node(2)
head.next.next = Node(3)



In these examples, the code involves operations that rely on external I/O, user interaction, network responses, recursion, dynamic code execution, non-uniform data access, or dynamic data structures. These scenarios do not lend themselves well to vectorization due to their inherent sequential or dynamic nature. It's important to consider the specific requirements and constraints of the problem at hand when determining whether vectorization can be effectively applied for performance improvements.