# <center>Vectorizing</center> 
### <center>a sum function for parallel processing purposes</center> 

<center>Michael Siebel</center>
<center>March 2020</center>

# Goals  
<br>

In this demonstration, I create two sample functions for summing across columns: one that uses a traditional for loop and one that uses a vectorized loop. The four statements in the vectorized loop body could each be attached to a separate thread and run in parallel.

By "vectorization", I am referring to rewriting a loop so that it processes multiple elements of the array within a single loop iteration. Functions written this way lend themselves to parallel and distributed computing tasks.
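As a minimal sketch of that definition on plain Python lists (the names `sum_one_at_a_time`, `sum_chunked`, and `width` are hypothetical and not part of the notebook), the two functions below return the same sums but count a different number of outer loop iterations:

```python
def sum_one_at_a_time(a, b):
    """Add two equal-length lists, one element per loop iteration."""
    out = [0] * len(a)
    iterations = 0
    for i in range(len(a)):
        out[i] = a[i] + b[i]
        iterations += 1
    return out, iterations


def sum_chunked(a, b, width=4):
    """Add two equal-length lists, `width` elements per outer loop iteration."""
    out = [0] * len(a)
    iterations = 0
    for start in range(0, len(a), width):
        # The last chunk may be shorter when len(a) is not a multiple of width
        stop = min(start + width, len(a))
        out[start:stop] = [x + y for x, y in zip(a[start:stop], b[start:stop])]
        iterations += 1
    return out, iterations


a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [8, 7, 6, 5, 4, 3, 2, 1]

plain, n_plain = sum_one_at_a_time(a, b)
chunked, n_chunked = sum_chunked(a, b)
print(n_plain, n_chunked)  # 8 outer iterations versus 2
```

The work inside each chunk is still sequential here; the point is only that each outer iteration now covers a group of elements that could be handed off together.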

The following is meant to demonstrate how a simple loop can be edited so that several values are computed within one iteration. The next step would be to attach each of those computations to a different thread within each loop iteration.

***

## Import Packages

In [1]:
import pandas as pd

***

## Loop Version

Takes 8 loop iterations

In [2]:
# Create Dataset
df = pd.DataFrame({
    "Var1": [1,2,3,4,5,6,7,8],
    "Var2": [8,7,6,5,4,3,2,1],
    "Total": [0,0,0,0,0,0,0,0]
})

# Function that sums across the first two columns, one row per iteration
def sum_func(df):
    for i in range(len(df)):
        print("Ran Iteration", i+1)
        df.iloc[i,2] = df.iloc[i,0] + df.iloc[i,1]
        
    return df
        
sum_func(df)

Ran Iteration 1
Ran Iteration 2
Ran Iteration 3
Ran Iteration 4
Ran Iteration 5
Ran Iteration 6
Ran Iteration 7
Ran Iteration 8


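|   | Var1 | Var2 | Total |
|---|------|------|-------|
| 0 | 1 | 8 | 9 |
| 1 | 2 | 7 | 9 |
| 2 | 3 | 6 | 9 |
| 3 | 4 | 5 | 9 |
| 4 | 5 | 4 | 9 |
| 5 | 6 | 3 | 9 |
| 6 | 7 | 2 | 9 |
| 7 | 8 | 1 | 9 |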
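For comparison, and not as part of the original demonstration, pandas can produce the same `Total` column without an explicit Python loop by adding the two columns directly. The sketch below uses a separate frame named `df_cols` (a hypothetical name) so it does not interfere with the cells in this notebook:

```python
import pandas as pd

# Same data as the loop version, without the placeholder Total column
df_cols = pd.DataFrame({
    "Var1": [1, 2, 3, 4, 5, 6, 7, 8],
    "Var2": [8, 7, 6, 5, 4, 3, 2, 1],
})

# Column-wise addition: pandas applies the sum to every row at once
df_cols["Total"] = df_cols["Var1"] + df_cols["Var2"]
print(df_cols)
```

This is the sense in which "vectorized" is usually meant in pandas code; the hand-written loops in this notebook expose the same idea step by step.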


***

## Vectorized Version

Takes 2 loop iterations. If the four statements in each iteration were run on separate threads, the loop could in principle run up to 4x quicker.

In [3]:
# Create Dataset
df = pd.DataFrame({
    "Var1": [1,2,3,4,5,6,7,8],
    "Var2": [8,7,6,5,4,3,2,1],
    "Total": [0,0,0,0,0,0,0,0]
})

# Function that vectorizes the sum function, calculating 4 data points at a time
# (assumes len(df) is a multiple of 4; otherwise the i+1..i+3 lookups raise IndexError)
def sum_vecd(df):
    rng = range(len(df))
    j = 0
    
    for i in rng[::4]:
        j = j + 1
        print("Ran Iteration", j)  # j counts the loop iterations
        df.iloc[i,2]   = df.iloc[i,  0] + df.iloc[i,  1] # Attach to thread 1
        df.iloc[i+1,2] = df.iloc[i+1,0] + df.iloc[i+1,1] # Attach to thread 2
        df.iloc[i+2,2] = df.iloc[i+2,0] + df.iloc[i+2,1] # Attach to thread 3
        df.iloc[i+3,2] = df.iloc[i+3,0] + df.iloc[i+3,1] # Attach to thread 4
        
    return df
    
sum_vecd(df)

Ran Iteration 1
Ran Iteration 2


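|   | Var1 | Var2 | Total |
|---|------|------|-------|
| 0 | 1 | 8 | 9 |
| 1 | 2 | 7 | 9 |
| 2 | 3 | 6 | 9 |
| 3 | 4 | 5 | 9 |
| 4 | 5 | 4 | 9 |
| 5 | 6 | 3 | 9 |
| 6 | 7 | 2 | 9 |
| 7 | 8 | 1 | 9 |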
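The "Attach to thread" comments in `sum_vecd` could be taken one step further along the following lines. This is only a sketch under stated assumptions: it uses `concurrent.futures.ThreadPoolExecutor` from the standard library, the helper names `sum_offset` and `sum_threaded` are hypothetical, and each worker takes rows `k, k+4, k+8, ...` to mirror the `i`, `i+1`, `i+2`, `i+3` pattern above. On a standard CPython build, the global interpreter lock means threads will not necessarily make arithmetic this small run faster, so treat it as an illustration of the structure rather than a benchmark.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np
import pandas as pd

N_THREADS = 4  # assumption: one worker per statement in the unrolled loop body


def sum_offset(var1, var2, total, offset, step):
    """Fill total[i] for i = offset, offset + step, offset + 2*step, ..."""
    for i in range(offset, len(total), step):
        total[i] = var1[i] + var2[i]


def sum_threaded(df, n_threads=N_THREADS):
    """Return a copy of df whose Total column is computed by n_threads workers."""
    var1 = df["Var1"].to_numpy()
    var2 = df["Var2"].to_numpy()
    # Each worker writes to a disjoint set of positions in this array
    total = np.zeros(len(df), dtype=var1.dtype)

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        futures = [
            pool.submit(sum_offset, var1, var2, total, k, n_threads)
            for k in range(n_threads)
        ]
        for future in futures:
            future.result()  # re-raise any exception from a worker

    out = df.copy()
    out["Total"] = total
    return out


df_threads = pd.DataFrame({
    "Var1": [1, 2, 3, 4, 5, 6, 7, 8],
    "Var2": [8, 7, 6, 5, 4, 3, 2, 1],
    "Total": [0, 0, 0, 0, 0, 0, 0, 0],
})

result = sum_threaded(df_threads)
print(result)
```

Because the workers stride through the rows rather than indexing `i+1` to `i+3` directly, this version also works when the number of rows is not a multiple of 4.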
