# 01 - Test NumPy and pandas basic operations

This notebook tests some NumPy and pandas operations directly in the [GitHub.dev](https://github.dev) console, using only a browser.

## NumPy quick test

In this quick example I will test Euclidean distance calculations using the following approaches:

* using Python's built-in [math library](https://docs.python.org/3.9/library/math.html),
* using the same approach, but with [NumPy's](https://numpy.org/doc/stable/reference/) built-in functions,
* using the [numpy.linalg.norm](https://numpy.org/doc/stable/reference/generated/numpy.linalg.norm.html) function (see [this answer](https://stackoverflow.com/questions/1401712/how-can-the-euclidean-distance-be-calculated-with-numpy/1401828#1401828) on Stack Overflow for more information),
* using the [scipy.spatial.distance.euclidean](https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.euclidean.html) function, which is designed for exactly this.
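
All four approaches compute the same quantity: the square root of the summed squared differences between matching coordinates. As a minimal sketch of that formula (the helper name `euclidean_distance` is made up for this illustration and is not part of the notebook), here is a worked 3-4-5 triangle in plain Python:

```python
import math


def euclidean_distance(p, q):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# A 3-4-5 right triangle: the distance from (0, 0) to (3, 4) is 5.
d = euclidean_distance([0, 0], [3, 4])
print(d)  # 5.0
```

The same call works for vectors of any length, as long as both have the same number of coordinates.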

Then I use the [timeit](https://docs.python.org/3.9/library/timeit.html) module to benchmark these approaches. *All done in the browser*, yeah!! 🥸
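
For readers new to `timeit`, here is a small sketch of how `timeit.repeat` can be used, assuming a pure-Python distance function and two made-up input lists (`p` and `q` are illustrative only). It uses `functools.partial` to bind the arguments, which plays the same role as the `wrapper()` helper defined later in this notebook, and a much smaller `number` so it finishes quickly:

```python
import math
import timeit
from functools import partial


def euclidean_dist_math(v1, v2):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))


# Illustrative inputs, not the notebook's random vectors.
p = [0.1 * i for i in range(20)]
q = [0.2 * i for i in range(20)]

# partial(...) returns a zero-argument callable, which is what timeit expects.
timed = partial(euclidean_dist_math, p, q)

number = 1000  # calls per measurement (the notebook uses 100000)
runs = timeit.repeat(timed, repeat=3, number=number)

best = min(runs)  # the least-disturbed measurement
per_call_us = best / number * 1e6  # seconds -> microseconds per call
print(f"best of {len(runs)}: {best:.6f} s total, {per_call_us:.2f} µs per call")
```

Each value returned by `timeit.repeat` is the total time for `number` calls, not the time of a single call. The notebook averages the repeats; taking the minimum, as here, is the alternative the `timeit` documentation suggests.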

In [None]:
# Whoa, we're doing Python in the browser!

import math
import numpy as np
import pandas as pd
from scipy.spatial import distance

Definition of all four approaches:

In [None]:
def euclidean_dist_math(v1, v2):
    dist = [math.pow((a - b), 2) for a, b in zip(v1, v2)]
    eudist = math.sqrt(sum(dist))
    return eudist


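def euclidean_dist_numpy_1(v1, v2):
    # Convert the inputs to arrays so that plain lists work as well
    v1_a = np.array(v1)
    v2_a = np.array(v2)
    # Use the converted arrays (the original subtracted v1 and v2 directly)
    sd = np.sum((v1_a - v2_a) ** 2)
    eudist = np.sqrt(sd)
    return eudist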

def euclidean_dist_numpy_2(v1, v2):
    return np.linalg.norm(v1 - v2)

def euclidean_dist_scipy(v1, v2):
    return distance.euclidean(v1, v2)


I generate two random vectors for the tests:

In [8]:
dis1 = np.random.rand(20)
dis2 = np.random.rand(20)
v1, v2 = np.array(dis1), np.array(dis2)
v1, v2

(array([0.6011253 , 0.88268413, 0.72489089, 0.49835124, 0.25734033,
        0.45259221, 0.51388361, 0.95113192, 0.89941948, 0.34807077,
        0.78238147, 0.64165583, 0.27028451, 0.09006867, 0.06878724,
        0.44286608, 0.8847115 , 0.66163305, 0.90433019, 0.22369285]),
 array([0.20453146, 0.35584951, 0.29696016, 0.32267087, 0.52834998,
        0.60505067, 0.74883257, 0.52191055, 0.45883077, 0.74255573,
        0.23080826, 0.01683285, 0.98629357, 0.70079532, 0.89134223,
        0.25795683, 0.71290316, 0.71580891, 0.10940485, 0.5600869 ]))
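
Before timing anything, it is worth checking that the four approaches return the same distance. The sketch below is self-contained and uses compact one-line versions of each approach rather than the notebook's functions. The seeded generator and the names `a`, `b`, and `results` are assumptions made for this illustration:

```python
import math

import numpy as np
from scipy.spatial import distance

# Fixed seed so the check is reproducible (an assumption; the notebook uses unseeded np.random.rand).
rng = np.random.default_rng(0)
a = rng.random(20)
b = rng.random(20)

results = {
    "math": math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b))),
    "numpy sum/sqrt": float(np.sqrt(np.sum((a - b) ** 2))),
    "numpy.linalg.norm": float(np.linalg.norm(a - b)),
    "scipy": float(distance.euclidean(a, b)),
}

# All values should match up to floating-point rounding.
reference = results["numpy.linalg.norm"]
for name, value in results.items():
    assert np.isclose(value, reference), name
    print(f"{name}: {value:.12f}")
```

If any approach disagreed, the assertion would name it, and the benchmark would be comparing different computations.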

And then testing all:

In [None]:
# Inspired by https://stackoverflow.com/questions/37794849/efficient-and-precise-calculation-of-the-euclidean-distance

import timeit

def wrapper(func, *args, **kwargs):
    def wrapped():
        return func(*args, **kwargs)
    return wrapped

wrappered_math = wrapper(euclidean_dist_math, v1, v2)
wrappered_numpy1 = wrapper(euclidean_dist_numpy_1, v1, v2)
wrappered_numpy2 = wrapper(euclidean_dist_numpy_2, v1, v2)
wrappered_scipy = wrapper(euclidean_dist_scipy, v1, v2)
t_math = timeit.repeat(wrappered_math, repeat=3, number=100000)
t_numpy1 = timeit.repeat(wrappered_numpy1, repeat=3, number=100000)
t_numpy2 = timeit.repeat(wrappered_numpy2, repeat=3, number=100000)
t_scipy = timeit.repeat(wrappered_scipy, repeat=3, number=100000)

print(f'math approach: {sum(t_math)/len(t_math)}')
print(f'numpy simple approach: {sum(t_numpy1)/len(t_numpy1)}')
print(f'numpy.linalg.norm approach: {sum(t_numpy2)/len(t_numpy2)}')
print(f'scipy.distance approach: {sum(t_scipy)/len(t_scipy)}')

math approach: 3.3474333330000263
numpy simple approach: 1.5147999999999986
numpy.linalg.norm approach: 1.045099999999972
scipy.distance approach: 1.9831666666666479


The test came out as expected (run on an Apple MacBook Pro with an M1 chip, in the Brave browser). Each figure is the total time for 100,000 calls, averaged over three repeats:

* The fastest is the [numpy.linalg.norm](https://numpy.org/doc/stable/reference/generated/numpy.linalg.norm.html) function, at 1.045100 s,
* followed by [NumPy's](https://numpy.org/doc/stable/reference/) built-in functions, at 1.514800 s,
* then the [scipy.spatial.distance.euclidean](https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.euclidean.html) function, at 1.983167 s,
* and finally Python's built-in [math library](https://docs.python.org/3.9/library/math.html), at 3.347433 s.
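
Because `timeit` reports the total time for all 100,000 calls, it can help to convert the figures to a per-call cost. The sketch below does that arithmetic, using the rounded mean totals from the output above. The dictionary name `totals` and the microsecond formatting are choices made for this illustration:

```python
number = 100_000  # calls per timeit measurement, as in the benchmark cell

# Mean totals in seconds, copied (rounded) from the benchmark output.
totals = {
    "numpy.linalg.norm": 1.045100,
    "numpy sum/sqrt": 1.514800,
    "scipy.spatial.distance.euclidean": 1.983167,
    "math": 3.347433,
}

fastest = min(totals.values())
for name, total in totals.items():
    per_call_us = total / number * 1e6  # seconds -> microseconds per call
    ratio = total / fastest             # how many times slower than the fastest
    print(f"{name}: {per_call_us:.2f} µs per call, {ratio:.2f}x the fastest")
```

Seen this way, every approach costs tens of microseconds per call on 20-element vectors, so the ranking may well change for much larger inputs.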

*All in the browser!* 😊