# Finding Distance (NumPy or pandas?)

The first task for our superhero (Python) is as follows:
A bunch of villains are spread all over the city, and our superhero knows the location (x, y) of every villain. He has to calculate each villain's distance from his headquarters at (0, 0). Which superpower, NumPy or pandas, serves him best in this situation?
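For a villain at $(x_i, y_i)$, the straight-line (Euclidean) distance from headquarters at $(0, 0)$ is

$$
d_i = \sqrt{x_i^2 + y_i^2}
$$

The functions in the notebook below compute only the expression under the square root, $x_i^2 + y_i^2$, i.e. the squared distance. Because the square root is monotonically increasing, squared distances rank the villains in the same order, and a single `np.sqrt` call turns them into true distances when those are needed.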

```python
# Let's load the superpowers
import numpy as np
import pandas as pd
import time  # time module, used to measure computation time
```

```python
# Generate one million random villain locations inside a 1000 x 1000 city
import random as rd
villains_x = [rd.uniform(0, 1000) for _ in range(1000000)]
villains_y = [rd.uniform(0, 1000) for _ in range(1000000)]
villains = list(zip(villains_x, villains_y))  # list of (x, y) tuples
```
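Generating a million points with list comprehensions is itself a Python loop. As a sketch (not what was timed here), NumPy's random `Generator` can draw all the coordinates in one call and return them already in the `(n, 2)` array layout the distance functions expect. The seed `42` and the name `villains_arr` are arbitrary choices for illustration.

```python
import numpy as np

# Illustrative alternative to the list comprehensions above (not part of the benchmark)
rng = np.random.default_rng(42)  # arbitrary seed, for reproducibility
n_villains = 1_000_000

# One call draws every coordinate: column 0 holds x, column 1 holds y, both in [0, 1000)
villains_arr = rng.uniform(0, 1000, size=(n_villains, 2))

print(villains_arr.shape)  # (1000000, 2)
```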

```python
# Squared distance from (0, 0) using NumPy *without* vectorization:
# a Python loop that reads one element at a time
def dist_numpy_wov(v_np):
    dist = []
    for i in range(len(v_np)):
        dist.append(v_np[i, 0]**2 + v_np[i, 1]**2)  # x^2 + y^2 (no square root)
    return dist
```

```python
# Squared distance from (0, 0) using NumPy *with* vectorization:
# square every element, then sum each row, all inside compiled NumPy code
def dist_numpy_wv(v_np):
    dist = np.sum(np.square(v_np), axis=1)
    return dist
```
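Both NumPy functions return squared distances. If the actual Euclidean distance is wanted, one option is `np.hypot`, which computes `sqrt(x**2 + y**2)` element-wise. `dist_euclidean` is a hypothetical helper name; `np.sqrt(dist_numpy_wv(v))` or `np.linalg.norm(v, axis=1)` would give the same values.

```python
import numpy as np

def dist_euclidean(v_np):
    """Euclidean distance of each (x, y) row of v_np from the origin (0, 0)."""
    # np.hypot(x, y) evaluates sqrt(x**2 + y**2) element-wise, fully vectorized
    return np.hypot(v_np[:, 0], v_np[:, 1])

# Usage: 3-4-5 triangles make the expected values easy to check by hand
points = np.array([[3.0, 4.0], [0.0, 0.0], [6.0, 8.0]])
print(dist_euclidean(points).tolist())  # [5.0, 0.0, 10.0]

# Same result via the squared distance followed by a square root
squared = np.sum(np.square(points), axis=1)
print(np.allclose(dist_euclidean(points), np.sqrt(squared)))  # True
```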

```python
# Squared distance from (0, 0) using pandas, looping over rows with .loc
def dist_pandas(v_df):
    dist = []
    for i in range(len(v_df)):
        # .loc[i, ...] is a label-based lookup; this relies on the default 0..n-1 index
        dist.append(v_df.loc[i, 'x']**2 + v_df.loc[i, 'y']**2)
    return dist
```
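The pandas function above loops over rows and makes two `.loc` lookups per villain, so the benchmark pits a pandas *loop* against vectorized NumPy. Pandas also supports whole-column arithmetic, which would be the like-for-like counterpart of `dist_numpy_wv`. The sketch below shows that form; the name `dist_pandas_vectorized` is hypothetical, and this version was not timed in this notebook.

```python
import pandas as pd

def dist_pandas_vectorized(v_df):
    """Squared distance from (0, 0) computed on whole columns, with no Python-level loop."""
    # Each operation acts on an entire Series at once and returns a new Series
    return v_df["x"] ** 2 + v_df["y"] ** 2

# Usage on a two-row frame
df = pd.DataFrame({"x": [3.0, 1.0], "y": [4.0, 2.0]})
print(dist_pandas_vectorized(df).tolist())  # [25.0, 5.0]
```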

## NumPy without vectorization

```python
start_time = time.time()

# Calculating distances (the list-to-array conversion is included in the timing)
villains = np.array(villains)
dist = dist_numpy_wov(villains)

end_time = time.time()
print("Time taken by numpy without vectorization {0:1.6f}".format(end_time - start_time))
```

```
Time taken by numpy without vectorization 4.024132
```
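Each benchmark in this notebook is a single `time.time()` measurement, and the timed region also includes converting the data (`np.array(...)` or `pd.DataFrame(...)`). A somewhat more careful measurement uses `time.perf_counter()`, the standard library's clock for measuring short intervals, repeats the call, and keeps the best run. `best_time` is a hypothetical helper and the 10,000-row sample is arbitrary; the standard `timeit` module offers the same idea ready-made.

```python
import time
import numpy as np

def best_time(func, *args, repeat=3):
    """Run func(*args) `repeat` times and return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()  # high-resolution clock intended for interval timing
        func(*args)
        timings.append(time.perf_counter() - start)
    return min(timings)

def dist_numpy_wv(v_np):
    # Vectorized squared distance, as in the notebook
    return np.sum(np.square(v_np), axis=1)

# Convert/create the data *before* timing, so only the distance computation is measured
sample = np.random.default_rng(0).uniform(0, 1000, size=(10_000, 2))
print("Best of 3: {0:1.6f} s".format(best_time(dist_numpy_wv, sample)))
```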


## Pandas

```python
start_time = time.time()

# Calculating distances (DataFrame construction is included in the timing)
villains_df = pd.DataFrame(villains, columns=["x", "y"])
dist = dist_pandas(villains_df)

end_time = time.time()
print("Time taken by pandas {0:1.6f}".format(end_time - start_time))
```

```
Time taken by pandas 81.982560
```
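When a row-by-row loop over a DataFrame really is unavoidable, `DataFrame.at` is the accessor pandas documents for getting a single value by row and column label, and it may carry less per-call overhead than `.loc`. It is still one Python-level lookup per value, though, so it should not be expected to approach the vectorized results. The function name below is hypothetical, and like `dist_pandas` it assumes the default `0..n-1` row index.

```python
import pandas as pd

def dist_pandas_at(v_df):
    """Row-by-row squared distance from (0, 0) using .at for scalar lookups."""
    dist = []
    for i in range(len(v_df)):
        # .at[row_label, column_label] fetches exactly one value;
        # row labels equal positions only because the frame has the default RangeIndex
        dist.append(v_df.at[i, "x"] ** 2 + v_df.at[i, "y"] ** 2)
    return dist

# Usage on a two-row frame
df = pd.DataFrame({"x": [3.0, 1.0], "y": [4.0, 2.0]})
print([float(d) for d in dist_pandas_at(df)])  # [25.0, 5.0]
```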


## NumPy with vectorization

```python
start_time = time.time()

# Calculating distances (after the first benchmark villains is already an array,
# so np.array() here just makes a copy)
villains = np.array(villains)
dist = dist_numpy_wv(villains)

end_time = time.time()
print("Time taken by numpy with vectorization {0:1.6f}".format(end_time - start_time))
```

```
Time taken by numpy with vectorization 0.135962
```
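Speed only matters if the three approaches compute the same thing. A quick sanity check on a small random sample (1,000 rows and seed `1`, both arbitrary) confirms that the loop, vectorized, and pandas versions agree. The functions are restated in compact form so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

# The three notebook functions, restated compactly so this cell is self-contained
def dist_numpy_wov(v_np):
    return [v_np[i, 0]**2 + v_np[i, 1]**2 for i in range(len(v_np))]

def dist_numpy_wv(v_np):
    return np.sum(np.square(v_np), axis=1)

def dist_pandas(v_df):
    return [v_df.loc[i, "x"]**2 + v_df.loc[i, "y"]**2 for i in range(len(v_df))]

rng = np.random.default_rng(1)
sample = rng.uniform(0, 1000, size=(1_000, 2))
sample_df = pd.DataFrame(sample, columns=["x", "y"])

d_vec = dist_numpy_wv(sample)
d_loop = np.asarray(dist_numpy_wov(sample))
d_pd = np.asarray(dist_pandas(sample_df))

# allclose tolerates tiny floating-point differences between code paths
print(np.allclose(d_loop, d_vec) and np.allclose(d_pd, d_vec))  # True
```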


## Conclusion
The results above make it clear that NumPy with vectorization takes by far the least time: about 0.14 s, versus about 4 s for the NumPy loop and about 82 s for the pandas loop. The row-by-row pandas version is the slowest by a wide margin, roughly 20 times slower than even the NumPy loop. Note that this benchmark measures pandas used one value at a time through `.loc`, not pandas column arithmetic. The reason boils down to indexing: every scalar lookup into a pandas object carries much more overhead than indexing a NumPy array. See this post (https://penandpants.com/2014/09/05/performance-of-pandas-series-vs-numpy-arrays/) for an explanation. Either way, our superhero (Python) should use NumPy with vectorization to find the distances of the villains from headquarters.
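The size of the gap follows directly from the three printed timings. They come from a single run on one machine, so the ratios are rough indications rather than general constants.

```python
# Timings in seconds, copied from the outputs printed above
timings = {
    "numpy loop": 4.024132,
    "pandas loop": 81.982560,
    "numpy vectorized": 0.135962,
}

fastest = timings["numpy vectorized"]
for name, seconds in timings.items():
    ratio = seconds / fastest  # how many times slower than the vectorized version
    print("{0:<17}{1:10.6f} s  {2:6.1f}x".format(name, seconds, ratio))
```

This puts the NumPy loop at roughly 30 times, and the pandas loop at roughly 600 times, the vectorized running time.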



## Winner for finding_distance: NumPy with vectorization



Thanks..!!!
Have a nice day..!!!