#### Let's consider the problem of finding the restaurant closest to our current GPS coordinates. Assume the current position is given as an (x, y) coordinate, and that the coordinates of the restaurants are stored in a list of positions.

In [1]:
import math
def closest(position, positions):
    x0, y0 = position
    dbest, ibest = None, None
    # loop over every position in the data set
    for i, (x, y) in enumerate(positions):
        # compute the Euclidean distance
        dist = ((x - x0) ** 2) + ((y - y0) ** 2)
        dist = math.sqrt(dist)

        # keep track of the smallest distance
        if dbest is None or dist < dbest:
            dbest, ibest = dist, i  # replace the best so far whenever a smaller distance is found
    return ibest  # return the index of the closest position
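
As a quick sanity check, the function can be tried on a handful of points before scaling up. This is only a sketch: the three restaurant coordinates are made up for illustration, and `closest` is repeated so the snippet runs on its own.

```python
import math

def closest(position, positions):
    """Same linear scan as above, repeated so this example is self-contained."""
    x0, y0 = position
    dbest, ibest = None, None
    for i, (x, y) in enumerate(positions):
        dist = math.sqrt((x - x0) ** 2 + (y - y0) ** 2)
        if dbest is None or dist < dbest:
            dbest, ibest = dist, i
    return ibest

# Hypothetical restaurant coordinates, invented for this example.
restaurants = [(0.1, 0.9), (0.45, 0.55), (0.8, 0.2)]

nearest = closest((0.5, 0.5), restaurants)
print(nearest, restaurants[nearest])  # index of the nearest point and its coordinates
```

The second point lies only 0.05 away from (0.5, 0.5) on each axis, so index 1 is the expected result.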


#### First we'll create a random list of coordinates. To make it realistic, let's create 10 million coordinates.

In [2]:
import random

In [3]:
positions = [(random.random(), random.random()) for i in range(10000000)] # a list of 10 million random (x, y) tuples

#### Let's see how long it takes to find the position closest to our current coordinates, (0.5, 0.5)

In [4]:
%timeit closest((.5, .5), positions) # time how long it takes to find the position closest to (0.5, 0.5)

11.5 s ± 915 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


## The exercise above shows why we use NumPy rather than lists.

#### Now let's try doing something similar with NumPy. NumPy arrays are much more efficient, and so is NumPy's random number generation.

In [5]:
import numpy as np

In [7]:
positions = np.random.rand(10000000, 2) # 10 million rows of 2 random numbers each, one (x, y) pair per row

In [8]:
positions.ndim, positions.shape

(2, (10000000, 2))

In [9]:
positions.size

20000000

#### Now let's again compute the distances to our position (0.5, 0.5)

In [10]:
x, y = positions[:,0], positions[:,1] # x and y contain the 1st and 2nd cols, respectively.

In [11]:
distances = np.sqrt( (x - 0.5)**2 + (y - 0.5)**2 ) # compute the distance for every (x, y) pair at once

In [13]:
# In[11] holds the source of cell 11 above, so the index must match that cell's number
%timeit exec(In[11])

480 ms ± 32.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
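
Note that the timed cell only computes the distances; it does not yet pick out the closest position the way `closest` did. A minimal sketch of that last step, assuming `np.argmin` is an acceptable way to get the index. The seeded generator and the smaller array are choices made here for a quick, repeatable run and are not part of the original notebook.

```python
import numpy as np

# Assumption: a seeded generator and only 1000 points, to keep the example fast and repeatable.
rng = np.random.default_rng(0)
positions = rng.random((1000, 2))

x, y = positions[:, 0], positions[:, 1]
distances = np.sqrt((x - 0.5) ** 2 + (y - 0.5) ** 2)

# argmin returns the index of the smallest distance, like ibest in closest().
ibest = int(np.argmin(distances))
print(ibest, positions[ibest], distances[ibest])
```

Since the square root does not change which distance is smallest, it could be skipped when only the index is needed.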


<p> Why is NumPy more efficient? Python's core library provides lists. A list is the Python equivalent of an array, but it is resizable and can contain elements of different types. A NumPy array instead stores elements of a single type in one block of memory. </p>
<ul>
<li>Size - NumPy arrays take up less space than lists holding the same numbers.</li>
<li>Performance - operations on whole arrays run in compiled code, so they are faster than Python loops over lists.</li>
<li>Functionality - NumPy and SciPy have optimized functions, such as linear algebra operations, built in.</li>
</ul>
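
The size point can be checked roughly with a sketch like the one below. It assumes CPython, where a list holds pointers to separate float objects; the exact byte counts depend on the interpreter and platform, so only the comparison matters here. The variable names are illustrative.

```python
import sys
import numpy as np

n = 100_000
as_list = [float(i) for i in range(n)]
as_array = np.arange(n, dtype=np.float64)

# A list stores references, so count the list itself plus every float object it points to.
list_bytes = sys.getsizeof(as_list) + sum(sys.getsizeof(v) for v in as_list)

# nbytes is the size of the array's data buffer: n elements of 8 bytes each.
array_bytes = as_array.nbytes

print(list_bytes, array_bytes)
```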