# Assignment 1: Search Algorithms


## Deliverables:
    1) Make five arrays of lengths 512, 1024, 2048, 4096, and 8192, each containing randomly
       generated, uniformly distributed integers from 1 to 10000. 
       You may use the numpy package for this.  

    2) Sort each of the random number arrays from smallest to largest. 
       You may use any algorithm to sort the data.

    3) Execute the base search algorithm (binary search, from the text and GitHub code)
       on each array, noting the execution time for each array.  With each execution,
       use the maximum value of the random number array as the number for which you are searching.

    4) Use Python (perhaps with the pandas package) to prepare a five-column table containing
       the following columns with all times in milliseconds:
    
                    length of the random number array
                    sort time
                    linear search time for the sorted array
                    binary search time for the sorted array
                    binary search plus sort times

    5) Use Python with matplotlib or Seaborn to generate a plot with the size of the random number array on the horizontal axis
     and with execution time in milliseconds on the vertical axis. The plot should show execution time against
     array size for linear and binary search algorithms alone.  Discuss the results.

    6) Use Python with matplotlib or Seaborn to generate a plot with the size of the data set on the horizontal axis
     and with execution time in milliseconds on the vertical axis. The plot should show execution time against
     array size for each form of the algorithm being tested (last four columns of the table).  Discuss the results.


## Code adapted from:
Bhargava, Aditya. Grokking Algorithms. Manning Publications, 2016.

In [6]:
import numpy as np
import pandas as pd
from datetime import datetime




In [7]:
np.random.seed(0)

In [8]:
# GenerateArray builds an array of `size` random integers from 1 to 10,000 (inclusive),
# sorts it in place, and returns the sorted array together with the sort time in milliseconds.
# The sort kind defaults to 'mergesort', but the caller can override it.
def GenerateArray(size, sort_kind='mergesort'):
    # np.random.randint excludes `high`, so 10001 is needed to include 10,000
    GenArray = np.random.randint(low=1, high=10001, size=size)
    start = datetime.now()
    GenArray.sort(kind=sort_kind)
    end = datetime.now()

    ArraySortTime = (end - start).total_seconds() * 1E3  # seconds -> milliseconds
    return (GenArray, ArraySortTime)
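
Several of the times recorded later in this notebook come out as exactly 0.0 ms, which suggests that the `datetime.now()` clock is too coarse for operations this fast. A minimal sketch of an alternative, assuming the standard-library `time.perf_counter` clock and a hypothetical helper named `time_ms` (not part of the original notebook), repeats the call a few times and keeps the fastest run:

```python
import time

def time_ms(func, *args, repeats=5):
    """Return (result, best elapsed time in ms) over `repeats` calls of func(*args).

    Hypothetical helper for illustration; not part of the original notebook.
    """
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()  # high-resolution clock intended for timing
        result = func(*args)
        elapsed = (time.perf_counter() - start) * 1e3  # seconds -> milliseconds
        best = min(best, elapsed)
    return result, best

# Example: time the built-in sorted() on a reversed list
data = list(range(1000, 0, -1))
sorted_data, ms = time_ms(sorted, data)
print(f"sorted {len(sorted_data)} items in {ms:.4f} ms")
```

Because `ndarray.sort` sorts in place, timing it repeatedly this way would need a fresh unsorted copy of the array for each repeat.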


In [10]:
def SimpleSearch(array, item):
    # Counts upward one value at a time, starting from 1, until it reaches `item`.
    # Note: the loop steps through the value range and never reads `array`,
    # so the run time tracks the size of `item`, not the length of the array.
    start = datetime.now()
    register = []  # register of guesses, kept for debugging
    low = 1
    while item > low:
        low += 1  # next guess
        register.append(low)  # record the guess

    end = datetime.now()
    duration = end - start
    MilliElapsed = duration.total_seconds() * 1E3
    # returns a tuple: search time in milliseconds and the register of guesses
    return MilliElapsed, register
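
The function above counts through candidate values rather than walking the array. For comparison, here is a minimal sketch of a linear search that scans the array elements themselves. The name `linear_search` and its `(index, comparisons)` return value are assumptions for illustration, not part of the original notebook:

```python
def linear_search(seq, item):
    """Scan seq from the front.

    Returns (index, comparisons); index is None when item is absent.
    """
    comparisons = 0
    for index, value in enumerate(seq):
        comparisons += 1
        if value == item:
            return index, comparisons
    return None, comparisons

sample = [3, 8, 15, 15, 42]
print(linear_search(sample, 42))  # last element: every element is compared
print(linear_search(sample, 7))   # absent: every element is compared, index is None
```

With this form, searching for the maximum of a sorted array is the worst case, since the maximum sits in the last position.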

In [12]:
def BinarySearch(array, item):
    # Halves the value range [min(array), max(array)] until the guess equals `item`.
    # Note: the guesses are candidate values, not array positions.
    start = datetime.now()
    low = int(np.min(array))
    high = int(np.max(array))
    register = []  # register of guesses, kept for debugging
    while low <= high:
        guess = (low + high) // 2  # integer midpoint of the current range
        register.append(guess)  # record the guess
        if guess == item:
            break
        elif guess > item:
            high = guess - 1  # item lies below the guess
        else:
            low = guess + 1  # item lies above the guess

    end = datetime.now()
    MilliElapsed = (end - start).total_seconds() * 1E3
    # returns a tuple: search time in milliseconds and the register of guesses
    return MilliElapsed, register
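
The function above bisects the range of values between the array minimum and maximum. For comparison, here is a minimal sketch of a binary search over array positions, in the style of the cited text. The name `binary_search` and the extra guess counter are assumptions for illustration:

```python
def binary_search(seq, item):
    """Binary search over the positions of a sorted sequence.

    Returns (index, guesses); index is None when item is absent.
    """
    low, high = 0, len(seq) - 1
    guesses = 0
    while low <= high:
        mid = (low + high) // 2  # middle position of the remaining slice
        guesses += 1
        guess = seq[mid]
        if guess == item:
            return mid, guesses
        if guess > item:
            high = mid - 1  # discard the upper half
        else:
            low = mid + 1   # discard the lower half
    return None, guesses

sample = [1, 3, 5, 7, 9, 11, 13, 15]
print(binary_search(sample, 15))  # found at the last position
print(binary_search(sample, 4))   # absent, so the index is None
```

Because `low` and `high` move past `mid` on every step, the loop always terminates, including when the item is missing.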


In [11]:
# test code for SimpleSearch function

#s = np.random.randint(low=1, high=100, size=10)
#s.sort(kind='mergesort')
#s

#SimpleSearch(list(s),82)

In [13]:
# test code for BinarySearch function
#BinarySearch(five12[0], np.max(five12[0])) #testing the binary search 

In [28]:
# Each variable holds a tuple: (sorted array, sort time in milliseconds)
five12=GenerateArray(512)
ten24 = GenerateArray(1024)
twenty48 = GenerateArray(2048)
forty96 = GenerateArray(4096)
eight192 = GenerateArray(8192)

len(five12[0])

512
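
The five named variables could also be kept in one dictionary keyed by array length, which makes the later search and table steps easy to loop over. This is a sketch under assumptions: `generate_sorted` is a hypothetical pure-Python stand-in for `GenerateArray` that uses the standard `random` module rather than NumPy and does not time the sort.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

def generate_sorted(size, low=1, high=10000):
    """Stand-in for GenerateArray: `size` random integers in [low, high], sorted ascending."""
    # random.randint includes both endpoints
    return sorted(random.randint(low, high) for _ in range(size))

sizes = [512, 1024, 2048, 4096, 8192]
arrays = {n: generate_sorted(n) for n in sizes}

for n in sizes:
    # requested size, actual length, and the maximum (the search target)
    print(n, len(arrays[n]), arrays[n][-1])
```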

## Simple and Binary Search for arrays of 512, 1024, 2048, 4096 elements

In [29]:
simple_512 = SimpleSearch(five12[0], np.max(five12[0]))
binary_512 = BinarySearch(five12[0], np.max(five12[0]))

simple_1024 = SimpleSearch(ten24[0], np.max(ten24[0]))
binary_1024 = BinarySearch(ten24[0], np.max(ten24[0]))

simple_2048 = SimpleSearch(twenty48[0], np.max(twenty48[0]))
binary_2048 = BinarySearch(twenty48[0], np.max(twenty48[0]))

simple_4096 = SimpleSearch(forty96[0], np.max(forty96[0]))
binary_4096 = BinarySearch(forty96[0], np.max(forty96[0]))

## Simple and Binary Search for 8192 elements

In [30]:
simple_8192= SimpleSearch(eight192[0], np.max(eight192[0]))

print('Guesses Made:',len(simple_8192[1]), '   Max Guess:', max(simple_8192[1]),
'  Max in array:', max(eight192[0]), ' Duration (ms):', simple_8192[0])

binary_8192 = BinarySearch(eight192[0], np.max(eight192[0]))

print('Guesses Made:',len(binary_8192[1]), '   Max Guess:', max(binary_8192[1]),
'  Max in array:', max(eight192[0]), ' Duration (ms):', binary_8192[0])


Guesses Made: 9993    Max Guess: 9994   Max in array: 9994  Duration (ms): 1.961
Guesses Made: 14    Max Guess: 9994   Max in array: 9994  Duration (ms): 0.0
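
The two lines above show the gap in step counts: 9993 guesses for the simple search against 14 for the binary search. As a rough cross-check, assuming searches that work over array positions rather than over the value range, the worst-case number of guesses is n for a linear scan and floor(log2 n) + 1 for a binary search:

```python
import math

sizes = [512, 1024, 2048, 4096, 8192]

# Worst-case guess counts for a search over n sorted positions
worst_linear = {n: n for n in sizes}
worst_binary = {n: math.floor(math.log2(n)) + 1 for n in sizes}

for n in sizes:
    print(f"n={n:5d}  linear={worst_linear[n]:5d}  binary={worst_binary[n]:2d}")
```

Applied to a value range of about 10,000 instead of an array length, the same formula gives floor(log2 10000) + 1 = 14, which agrees with the 14 guesses printed above.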


## Summary DataFrame
                length of the random number array
                sort time
                linear search time for the sorted array
                binary search time for the sorted array
                binary search plus sort times

In [33]:
print('Guesses Made:',len(simple_512[1]), '   Max Guess:', max(simple_512[1]),
'  Max in array:', max(five12[0]), ' Duration (ms):', simple_512[0])

Guesses Made: 9991    Max Guess: 9992   Max in array: 9992  Duration (ms): 4.001


In [31]:
Summary = {
    'NumberOfElements': [len(five12[0]), len(ten24[0]), len(twenty48[0]), len(forty96[0]), len(eight192[0])], 
    'SortTime (ms)': [five12[1], ten24[1], twenty48[1], forty96[1], eight192[1]],
    'SimpleSearchTime (ms)': [simple_512[0], simple_1024[0], simple_2048[0], simple_4096[0], simple_8192[0]],
    'BinarySearchTime (ms)': [binary_512[0], binary_1024[0], binary_2048[0], binary_4096[0], binary_8192[0]]
        }

df = pd.DataFrame.from_dict(Summary)

In [20]:
df

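|   | NumberOfElements | SortTime (ms) | SimpleSearchTime (ms) | BinarySearchTime (ms) |
|---|------------------|---------------|-----------------------|-----------------------|
| 0 | 512              | 1.0           | 4.986                 | 0.0                   |
| 1 | 1024             | 0.0           | 4.986                 | 0.0                   |
| 2 | 2048             | 0.0           | 2.992                 | 0.0                   |
| 3 | 4096             | 0.0           | 1.995                 | 0.0                   |
| 4 | 8192             | 0.996         | 1.994                 | 0.999                 |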
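
The assignment asks for a fifth column, binary search plus sort time, which the table above does not yet include. Here is a minimal sketch of how it could be derived from the values shown above. The column name `BinarySearchPlusSortTime (ms)` and the lowercase `summary` dictionary are assumptions for illustration:

```python
# Values copied from the table above
summary = {
    'NumberOfElements': [512, 1024, 2048, 4096, 8192],
    'SortTime (ms)': [1.0, 0.0, 0.0, 0.0, 0.996],
    'SimpleSearchTime (ms)': [4.986, 4.986, 2.992, 1.995, 1.994],
    'BinarySearchTime (ms)': [0.0, 0.0, 0.0, 0.0, 0.999],
}

# Hypothetical fifth column: sort time plus binary search time, row by row
summary['BinarySearchPlusSortTime (ms)'] = [
    round(sort_ms + binary_ms, 3)
    for sort_ms, binary_ms in zip(summary['SortTime (ms)'],
                                  summary['BinarySearchTime (ms)'])
]

print(summary['BinarySearchPlusSortTime (ms)'])
```

Passing the extended dictionary to `pd.DataFrame.from_dict`, as the notebook does for `Summary`, would then give the five-column table.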


In [17]:
### Archive this code
#generates five arrays of length 512, 1024, 2048, 4096, and 8192
#five12 = np.random.randint(low=1, high=10000, size=512)
#ten24 = np.random.randint(low=1, high=10000, size=1024)
#twenty48 = np.random.randint(low=1, high=10000, size=2048)
#forty96 = np.random.randint(low=1, high=10000, size=4096)
#eight192 = np.random.randint(low=1, high=10000, size=8192)


In [18]:
### Archive this code

#sorts 5 arrays and stores duration required to sort in milliseconds
#start = datetime.now()
#five12.sort(kind='mergesort')
#end = datetime.now()
#five12_sort = (end-start).total_seconds()*1E3

#start = datetime.now()
#ten24.sort(kind='mergesort')
#end = datetime.now()
#ten24_sort = (end-start).total_seconds()*1E3

#start = datetime.now()
#twenty48.sort(kind='mergesort')
#end = datetime.now()
#twenty48_sort = (end-start).total_seconds()*1E3

#start = datetime.now()
#forty96.sort(kind='mergesort')
#end = datetime.now()
#forty96_sort = (end-start).total_seconds()*1E3

#start = datetime.now()
#eight192.sort(kind='mergesort')
#end = datetime.now()
#eight192_sort = (end-start).total_seconds()*1E3



In [19]:
### Archive this code.
#start = datetime.now() #time object for starting search
#end = SimpleSearch(eight192, np.max(eight192)) #end collects the register of guesses and time to execute

#duration =end[0] - start #calculates the timedelta between execution and completion
#eight192_simple = duration.total_seconds()*1E3 #stores the time required in milliseconds



#start = datetime.now() #time object for starting search
#end = BinarySearch(eight192, np.max(eight192)) #end collects the register of guesses and time to execute
#duration =end[0] - start #calculates the timedelta between execution and completion
#eight192_binary=duration.total_seconds()*1E3 #stores the time required in milliseconds