## **Binary Search**

Binary Search is a fundamental searching algorithm used to efficiently locate a target value within a sorted array. It operates by repeatedly dividing the search interval in half, narrowing down the possible locations for the target value. The key feature of Binary Search is its logarithmic time complexity O(log n), making it significantly faster than linear search for large datasets.

**Algorithm**

* Step 1: Begin with the entire sorted array.
* Step 2: Define two pointers, 'low' and 'high', representing the start and end of the current search interval, respectively.
* Step 3: Calculate the middle index of the current interval: mid = (low + high) // 2, using integer division so that mid is always a valid index.
* Step 4: Compare the target value with the element at the middle index.
    * If the target matches the middle element, return its index (successful search).
    * If the target is less than the middle element, adjust the 'high' pointer to mid - 1 and repeat Step 3.
    * If the target is greater than the middle element, adjust the 'low' pointer to mid + 1 and repeat Step 3.
* Step 5: Repeat Steps 3-4 until the target is found or the search interval becomes empty.
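
As a rough illustration of the steps above, the sketch below records the (low, mid, high) triple at every iteration. The function name `binary_search_trace` and the sample list are assumptions made for this example; they do not appear elsewhere in the notebook.

```python
def binary_search_trace(arr, target):
    """Return (index, steps); steps holds one (low, mid, high) triple per iteration."""
    low, high = 0, len(arr) - 1
    steps = []
    while low <= high:
        mid = (low + high) // 2  # integer division keeps mid a valid index
        steps.append((low, mid, high))
        if arr[mid] == target:
            return mid, steps
        if target < arr[mid]:
            high = mid - 1  # continue in the left half
        else:
            low = mid + 1   # continue in the right half
    return -1, steps  # the interval became empty, so the target is absent


sample = [2, 5, 8, 12, 16, 23, 38]  # hypothetical sorted input
index, steps = binary_search_trace(sample, 23)
print("index:", index)
for low, mid, high in steps:
    print(f"low={low} mid={mid} high={high}")
```

For this input the search needs two iterations: the first probes index 3, the second probes index 5 and finds the target.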

**Points to be noted**:
* Binary Search requires the array to be sorted beforehand. If the array is not sorted, a sorting algorithm should be applied first.
* It's suitable for large datasets where efficiency is crucial, as its logarithmic time complexity ensures fast searches even in massive arrays.
* Time complexity: O(log n), significantly faster than linear search (O(n)).
* Space complexity: O(1), additional memory usage is constant regardless of array size.
* Versatile algorithm with applications in various domains such as data retrieval, sorting algorithms, and more.
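
The sorted-input requirement can be seen on a small example. The sketch below is only an illustration: `binary_search` and `unsorted_values` are names assumed here, and the list is made-up data.

```python
def binary_search(arr, target):
    """Plain iterative binary search; returns an index, or -1 if nothing is found."""
    low, high = 0, len(arr) - 1
    while low <= high:
        mid = (low + high) // 2
        if arr[mid] == target:
            return mid
        if target < arr[mid]:
            high = mid - 1
        else:
            low = mid + 1
    return -1


unsorted_values = [7, 3, 9, 1, 5]  # hypothetical data, deliberately not sorted
# On unsorted input the halving logic discards the half that holds the target.
print(binary_search(unsorted_values, 1))
# After sorting, the same target is found.
print(binary_search(sorted(unsorted_values), 1))
```

On the unsorted list the search reports -1 even though 1 is present; on the sorted copy it returns index 0. Sorting costs O(n log n), so it pays off mainly when the same list is searched many times.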


In [1]:
import time
import pandas as pd
import numpy as np

### **Generic functions to evaluate the test cases**

In [2]:
def evaluate_test_case(function, test):
    """Run one test case, report pass/fail, and print the execution time."""
    # perf_counter is a high-resolution clock intended for timing short code paths
    start_time = time.perf_counter()
    output = function(**test['input'])
    end_time = time.perf_counter()
    execution_time = end_time - start_time
    print("Test Output is ", output)
    if test['output'] == output:
        print("\033[32mTEST PASSED\033[0m")
    else:
        print("\033[31mTEST FAILED\033[0m")
    print("Function Execution Time: ", execution_time, " seconds")

In [3]:
def evaluate_test_cases(function, tests):
    """Run every test case in tests, printing a separator line after each one."""
    for test in tests:
        # Reuse the single-test helper instead of duplicating its body
        evaluate_test_case(function, test)
        print(50 * "===")

### Loading data from file

In [4]:
data = pd.read_csv("../ISBN_Example.csv", sep="|")
data

Unnamed: 0,isbn,name,author
0,9781492032649,"Hands-On Machine Learning with Scikit-Learn, K...",Aurélien Géron
1,9781789955750,Python Machine Learning,"Sebastian Raschka, Vahid Mirjalili"
2,9780262035613,Deep Learning,"Ian Goodfellow, Yoshua Bengio, Aaron Courville"
3,9780596529321,Programming Collective Intelligence,Toby Segaran
4,9781491957660,Python for Data Analysis,Wes McKinney
5,9781449361327,Data Science for Business,"Foster Provost, Tom Fawcett"
6,9781449369415,Introduction to Machine Learning with Python,"Andreas C. Müller, Sarah Guido"
7,9780999247108,Machine Learning Yearning,Andrew Ng
8,9781617294631,Natural Language Processing in Action,"Lane, Howard, and Hapke"
9,9781492041139,Data Science from Scratch,Joel Grus


In [6]:
isbn = data['isbn'].tolist()
isbn.sort()
isbn

[9780262035613,
 9780596529321,
 9780999247108,
 9781449361327,
 9781449369415,
 9781491957660,
 9781492032649,
 9781492041139,
 9781617294631,
 9781789955750]

### **Binary Search Algorithm**

* Step 1 − Select the middle item of the array and compare it with the key value being searched for. If they match, return the position of the middle item.

* Step 2 − If they do not match, check whether the key is greater than or less than the middle item.

* Step 3 − If the key is greater, continue the search in the right sub-array; if it is smaller, continue the search in the left sub-array.

* Step 4 − Repeat Steps 1, 2 and 3 until the key is found or the sub-array becomes empty.

* Step 5 − If the sub-array becomes empty without a match, the key does not exist in the array and the search is unsuccessful.
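
Because Step 3 is phrased in terms of sub-arrays, the procedure maps naturally onto recursion. The following is a minimal sketch that passes index bounds instead of slicing the list; the name `binary_search_recursive` and its `low`/`high` parameters are assumptions made for this illustration.

```python
def binary_search_recursive(arr, key, low=0, high=None):
    """Recursive binary search over arr[low..high]; returns an index or -1."""
    if high is None:
        high = len(arr) - 1
    if low > high:
        return -1  # empty sub-array: unsuccessful search
    mid = (low + high) // 2
    if arr[mid] == key:
        return mid
    if key > arr[mid]:
        # key can only be in the right sub-array
        return binary_search_recursive(arr, key, mid + 1, high)
    # key can only be in the left sub-array
    return binary_search_recursive(arr, key, low, mid - 1)


values = [10, 20, 30, 40, 50]  # hypothetical sorted input
print(binary_search_recursive(values, 40))
print(binary_search_recursive(values, 35))
```

The recursive form uses O(log n) call-stack space, whereas an iterative loop keeps the additional memory at O(1).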

#### Function Signature:

In [11]:
def find_book(books, query):
    """Find a book in a sorted list of books.
    books -> sorted list of ISBNs
    query -> ISBN of the book to find
    Returns the index of query in books, or -1 if it is absent.
    """
    pass

#### Sample Inputs

In [8]:
query = 9781449369415
output = 4
print("Input List: ", isbn)
print("Query: ", query)
print("Output: ", output)

Input List:  [9780262035613, 9780596529321, 9780999247108, 9781449361327, 9781449369415, 9781491957660, 9781492032649, 9781492041139, 9781617294631, 9781789955750]
Query:  9781449369415
Output:  4


#### Creating Dictionary of Inputs and Outputs

In [9]:
test = {
    'input' : {
        'books': isbn,
        'query': 9781449369415
    },
    'output': 4
}

In [12]:
result = find_book(**test['input'])
result == test['output']

False

#### Creating input dictionary


The test cases cover the following scenarios:

* The expected book is in the middle of the list
* The expected book is first in the list
* The expected book is last in the list
* The book list has only one element, which is the expected book
* The book list does not contain the expected book
* The book list is empty
* The book list contains duplicate entries
* The expected book occurs multiple times in the list

In [13]:
# tests is a list of dictionaries
tests = []

In [14]:
# Expected book is in middle of the list
tests.append({
    'input': {
        'books': isbn,
        'query':9781491957660
    },
    'output': 5
})

In [15]:
# Expected book is first in the list
tests.append({
    'input': {
        'books': isbn,
        'query':9780262035613
    },
    'output': 0
})

In [16]:
# Expected book is last in the list
tests.append({
    'input': {
        'books': isbn,
        'query':9781789955750
    },
    'output': 9
})

In [17]:
# Book list has only one element which is expected book
tests.append({
    'input': {
        'books': [9781789955750],
        'query':9781789955750
    },
    'output': 0
})

In [18]:
# Book list does not contain the expected book
tests.append({
    'input': {
        'books': isbn,
        'query':9781789955751
    },
    'output': -1
})

In [19]:
# Book list is empty
tests.append({
    'input': {
        'books': [],
        'query':9781789955751
    },
    'output': -1
})

In [20]:
# Book list contains duplicate entries
tests.append({
    'input': {
        'books': [9780262035613, 9780596529321, 9780596529321,9780999247108, 9780999247108,9781449361327, 9781449369415, 9781491957660, 9781492032649, 9781492032649,9781492041139, 9781617294631,9781617294631, 9781789955750],
        'query':9781491957660
    },
    'output': 7
})

In [21]:
# Expected book occurs multiple times in the list
tests.append({
    'input': {
        'books': [9780262035613, 9780596529321, 9780596529321,9780999247108, 9780999247108,9781449361327, 9781449369415,9781491957660,9781491957660,9781491957660, 9781491957660, 9781492032649, 9781492032649,9781492041139, 9781617294631,9781617294631, 9781789955750],
        'query':9781491957660
    },
    'output': 7
})

In [22]:
tests

[{'input': {'books': [9780262035613,
    9780596529321,
    9780999247108,
    9781449361327,
    9781449369415,
    9781491957660,
    9781492032649,
    9781492041139,
    9781617294631,
    9781789955750],
   'query': 9781491957660},
  'output': 5},
 {'input': {'books': [9780262035613,
    9780596529321,
    9780999247108,
    9781449361327,
    9781449369415,
    9781491957660,
    9781492032649,
    9781492041139,
    9781617294631,
    9781789955750],
   'query': 9780262035613},
  'output': 0},
 {'input': {'books': [9780262035613,
    9780596529321,
    9780999247108,
    9781449361327,
    9781449369415,
    9781491957660,
    9781492032649,
    9781492041139,
    9781617294631,
    9781789955750],
   'query': 9781789955750},
  'output': 9},
 {'input': {'books': [9781789955750], 'query': 9781789955750}, 'output': 0},
 {'input': {'books': [9780262035613,
    9780596529321,
    9780999247108,
    9781449361327,
    9781449369415,
    9781491957660,
    9781492032649,
    978149204

### Implementation of Binary Search

In [27]:
def find_book_binary(books, query):
    """Function to find a book in books using binary search"""
    l, r = 0, len(books) - 1
    while l <= r:
        mid = (l + r) // 2
        if books[mid] == query:
            return mid
        if query < books[mid]:
            r = mid - 1
        elif query > books[mid]:
            l = mid + 1
    return -1
            

In [28]:
evaluate_test_case(find_book_binary, test)

Test Output is  4
TEST PASSED
Function Execution Time:  3.0994415283203125e-06  seconds


In [29]:
evaluate_test_cases(find_book_binary, tests)

Test Output is  5
TEST PASSED
Function Execution Time:  3.337860107421875e-06  seconds
Test Output is  0
TEST PASSED
Function Execution Time:  2.6226043701171875e-06  seconds
Test Output is  9
TEST PASSED
Function Execution Time:  1.1920928955078125e-06  seconds
Test Output is  0
TEST PASSED
Function Execution Time:  9.5367431640625e-07  seconds
Test Output is  -1
TEST PASSED
Function Execution Time:  2.86102294921875e-06  seconds
Test Output is  -1
TEST PASSED
Function Execution Time:  7.152557373046875e-07  seconds
Test Output is  7
TEST PASSED
Function Execution Time:  3.0994415283203125e-06  seconds
Test Output is  8
TEST FAILED
Function Execution Time:  7.152557373046875e-07  seconds


***
**The last test fails: the expected output is 7, the index of the first occurrence, but the search returned 8, the index of the second occurrence.**
***

In [30]:
def get_location(books, query, mid):
    """Tell the caller whether the first occurrence is at mid, to its left, or to its right"""
    mid_num = books[mid]
    print(f"Mid Number is {mid_num} and mid is {mid}")
    if mid_num == query:
        # An equal left neighbour means an earlier occurrence exists.
        # Use >= 0 so that the neighbour at index 0 is checked as well.
        if mid - 1 >= 0 and books[mid - 1] == query:
            return "left"
        else:
            return "found"
    elif mid_num > query:
        return "left"
    elif mid_num < query:
        return "right"

In [31]:
def find_books_binary(books, query):
    """Return the index of the first occurrence of query in books, or -1 if it is absent"""
    lo, hi = 0, len(books) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        result = get_location(books, query, mid)
        if result == "found":
            return mid
        elif result == "left":
            hi = mid - 1
        elif result == "right":
            lo = mid + 1
    return -1
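
As a cross-check for the first-occurrence behaviour, the standard library's `bisect.bisect_left` returns the leftmost insertion point in a sorted list, which coincides with the index of the first occurrence when the value is present. The sketch below is an alternative for comparison, not the notebook's implementation; the wrapper name `find_first_with_bisect` and the short placeholder numbers are assumptions.

```python
from bisect import bisect_left


def find_first_with_bisect(books, query):
    """Return the index of the first occurrence of query in sorted books, or -1."""
    i = bisect_left(books, query)  # leftmost position where query could be inserted
    if i < len(books) and books[i] == query:
        return i
    return -1


books = [101, 105, 105, 105, 110]  # placeholder values standing in for ISBNs
print(find_first_with_bisect(books, 105))
print(find_first_with_bisect(books, 107))
print(find_first_with_bisect([], 105))
```

Here the three calls give 1, -1 and -1, so duplicates, missing values and the empty list are all covered.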

In [34]:
evaluate_test_case(find_books_binary, test)

Mid Number is 9781449369415 and mid is 4
Test Output is  4
TEST PASSED
Function Execution Time:  6.0558319091796875e-05  seconds


In [35]:
evaluate_test_cases(find_books_binary, tests)

Mid Number is 9781449369415 and mid is 4
Mid Number is 9781492041139 and mid is 7
Mid Number is 9781491957660 and mid is 5
Test Output is  5
TEST PASSED
Function Execution Time:  8.487701416015625e-05  seconds
Mid Number is 9781449369415 and mid is 4
Mid Number is 9780596529321 and mid is 1
Mid Number is 9780262035613 and mid is 0
Test Output is  0
TEST PASSED
Function Execution Time:  8.58306884765625e-06  seconds
Mid Number is 9781449369415 and mid is 4
Mid Number is 9781492041139 and mid is 7
Mid Number is 9781617294631 and mid is 8
Mid Number is 9781789955750 and mid is 9
Test Output is  9
TEST PASSED
Function Execution Time:  1.1682510375976562e-05  seconds
Mid Number is 9781789955750 and mid is 0
Test Output is  0
TEST PASSED
Function Execution Time:  3.0994415283203125e-06  seconds
Mid Number is 9781449369415 and mid is 4
Mid Number is 9781492041139 and mid is 7
Mid Number is 9781617294631 and mid is 8
Mid Number is 9781789955750 and mid is 9


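### Time complexity analysis of Binary Search
Each iteration halves the length of the search interval:

Initial length of array: N\
After iteration 1: N / 2\
After iteration 2: N / 4\
After iteration 3: N / 8\
After iteration k: N / 2^k

The search ends at the latest when the interval has shrunk to a single element, that is, when N / 2^k = 1, which gives k = log2(N). Overall, Binary Search therefore takes O(log N) time.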
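
The halving argument can be checked empirically. The sketch below counts loop iterations for an unsuccessful search in which the target is larger than every element, so the search always moves right and keeps the larger half. The helper name `worst_case_iterations` is an assumption for this illustration.

```python
import math


def worst_case_iterations(n):
    """Count binary search iterations on n items when the target exceeds every element."""
    low, high = 0, n - 1
    iterations = 0
    while low <= high:
        mid = (low + high) // 2
        iterations += 1
        low = mid + 1  # the target is larger than arr[mid], so always go right
    return iterations


for n in (10, 1_000, 1_000_000):
    # floor(log2(n)) + 1 is the count predicted by repeated halving
    print(n, worst_case_iterations(n), math.floor(math.log2(n)) + 1)
```

For these sizes the measured count matches floor(log2(N)) + 1: a list of one million items needs only 20 iterations.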

### Generic Binary Search with Closures