## Linear & Binary Search

### Problem Statement:
> Search for a book by its ISBN in a list of books

#### Importing libraries

In [1]:
import pandas as pd
import numpy as np

#### Loading data from csv file

In [2]:
data = pd.read_csv("./ISBN_Example.csv", sep="|")

In [3]:
data_isbn = data["isbn"].tolist()
data_isbn.sort()
data_isbn

[9780262035613,
 9780596529321,
 9780999247108,
 9781449361327,
 9781449369415,
 9781491957660,
 9781492032649,
 9781492041139,
 9781617294631,
 9781789955750]

#### **Method**


    1. State the problem clearly. Identify the input & output formats.
    2. Come up with some example inputs & outputs. Try to cover all edge cases.
    3. Come up with a correct solution for the problem. State it in plain English.
    4. Implement the solution and test it using example inputs. Fix bugs, if any.
    5. Analyze the algorithm's complexity and identify inefficiencies, if any.
    6. Apply the right technique to overcome the inefficiency. Repeat steps 3 to 6.

    reference: https://jovian.com


#### **1. State the problem clearly with Inputs and Output**

> Write a function to find the position of a book with a given ISBN in a list of ISBNs arranged in ascending order. One constraint is to find the book by accessing the minimum number of elements in the list.

##### **Inputs**:
***
    Input list: Contains the list with ISBN numbers
    
    Query: Book to query in the list
    

##### **Output**:
***
     Position of the book

In [4]:
def find_book(books,query):
    """Function to find a book in the list"""
    pass

#### **2. Some Example inputs and outputs**

In [5]:
query = 9781491957660
output = 5
print("Input List: ",data_isbn)
print("Query: ", query)
print("Output: ", output)

Input List:  [9780262035613, 9780596529321, 9780999247108, 9781449361327, 9781449369415, 9781491957660, 9781492032649, 9781492041139, 9781617294631, 9781789955750]
Query:  9781491957660
Output:  5


In [6]:
result = find_book(data_isbn, query)
result == output

False

#### Creating the dictionary of inputs and outputs

In [7]:
test = {
    'input': {
        'books': data_isbn,
        'query':9781491957660
    },
    'output': 5
}

In [8]:
result = find_book(**test['input'])
result == test['output']

False

#### **Other Inputs**

1. The expected book is in the middle of the list
2. The expected book is the first element of the list
3. The expected book is the last element of the list
4. The book list has only one element, which is the expected book
5. The book list does not contain the expected book
6. The book list is empty
7. The book list contains duplicate entries
8. The expected book occurs multiple times in the list

In [9]:
tests = []

In [10]:
# Expected book is in middle of the list
tests.append({
    'input': {
        'books': data_isbn,
        'query':9781491957660
    },
    'output': 5
})

In [11]:
# Expected book is in the first of the list
tests.append({
    'input': {
        'books': data_isbn,
        'query':9780262035613
    },
    'output': 0
})

In [12]:
# Expected book is in the last of the list
tests.append({
    'input': {
        'books': data_isbn,
        'query':9781789955750
    },
    'output': 9
})

In [13]:
# Book list has only one element which is expected book
tests.append({
    'input': {
        'books': [9781789955750],
        'query':9781789955750
    },
    'output': 0
})

In [14]:
# Book list does not contain the expected book
tests.append({
    'input': {
        'books': data_isbn,
        'query':9781789955751
    },
    'output': -1
})

In [15]:
# Book list is empty
tests.append({
    'input': {
        'books': [],
        'query':9781789955751
    },
    'output': -1
})

In [16]:
#Book list contains duplicate entries
tests.append({
    'input': {
        'books': [9780262035613, 9780596529321, 9780596529321,9780999247108, 9780999247108,9781449361327, 9781449369415, 9781491957660, 9781492032649, 9781492032649,9781492041139, 9781617294631,9781617294631, 9781789955750],
        'query':9781491957660
    },
    'output': 7
})

In [17]:
# Expected book occurs multiple times in the list
tests.append({
    'input': {
        'books': [9780262035613, 9780596529321, 9780596529321,9780999247108, 9780999247108,9781449361327, 9781449369415,9781491957660,9781491957660,9781491957660, 9781491957660, 9781492032649, 9781492032649,9781492041139, 9781617294631,9781617294631, 9781789955750],
        'query':9781491957660
    },
    'output': 7
})

In [18]:
tests

[{'input': {'books': [9780262035613,
    9780596529321,
    9780999247108,
    9781449361327,
    9781449369415,
    9781491957660,
    9781492032649,
    9781492041139,
    9781617294631,
    9781789955750],
   'query': 9781491957660},
  'output': 5},
 {'input': {'books': [9780262035613,
    9780596529321,
    9780999247108,
    9781449361327,
    9781449369415,
    9781491957660,
    9781492032649,
    9781492041139,
    9781617294631,
    9781789955750],
   'query': 9780262035613},
  'output': 0},
 {'input': {'books': [9780262035613,
    9780596529321,
    9780999247108,
    9781449361327,
    9781449369415,
    9781491957660,
    9781492032649,
    9781492041139,
    9781617294631,
    9781789955750],
   'query': 9781789955750},
  'output': 9},
 {'input': {'books': [9781789955750], 'query': 9781789955750}, 'output': 0},
 {'input': {'books': [9780262035613,
    9780596529321,
    9780999247108,
    9781449361327,
    9781449369415,
    9781491957660,
    9781492032649,
    978149204

#### **3. Come up with a correct solution for the problem. State it in plain English**

> Linear search algorithm: check every element sequentially. This is the simplest and most obvious solution to the problem, a brute-force approach.

1. Create a variable to store the current location, starting at 0
2. Check whether the number at that location equals the expected book
3. If they are equal, the location is the answer
4. If they are not equal, increment the location by 1 and repeat from step 2 until the end of the list is reached
5. If the expected book is not in the list, return -1

#### **4. Implement the solution and test it using example inputs. Fix bugs, if any.**

In [19]:
def find_books(books, query):
    """Return the position of query in books, or -1 if it is absent"""
    # Create a variable loc for tracking the location
    loc = 0
    # Iterate until loc runs past the end of the list;
    # an empty list skips the loop entirely
    while loc < len(books):
        # Return the location as soon as the query is found
        if books[loc] == query:
            return loc
        # Otherwise move on to the next element
        loc += 1
    # The query is not in the list
    return -1
    

In [20]:
test

{'input': {'books': [9780262035613,
   9780596529321,
   9780999247108,
   9781449361327,
   9781449369415,
   9781491957660,
   9781492032649,
   9781492041139,
   9781617294631,
   9781789955750],
  'query': 9781491957660},
 'output': 5}

In [21]:
result = find_books(**test['input'])
result == output

True
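
The same linear scan can also be written with a `for` loop. The sketch below is an alternative formulation rather than the notebook's implementation, and the name `find_books_for` is hypothetical. It relies only on the built-in `enumerate`.

```python
def find_books_for(books, query):
    """Linear search written with enumerate (hypothetical alternative to find_books)."""
    for loc, isbn in enumerate(books):
        if isbn == query:
            # The first match wins, so duplicates resolve to the lowest index
            return loc
    # Covers both the empty list and a query that is absent
    return -1


sample = [9780262035613, 9780596529321, 9780999247108]
print(find_books_for(sample, 9780596529321))
print(find_books_for([], 9780596529321))
```

Because the loop body never runs for an empty list, no separate length check is needed.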

#### Creating a custom function to track the start and end time

In [22]:
import time

In [23]:
def evaluate_test_case(find_books, test):
    """This is a custom function to compute the time taken to execute the test"""
    start_time = time.time()
    output = find_books(**test['input'])
    end_time = time.time()
    execution_time = end_time - start_time
    print("Test Output is ", output)
    if test['output'] == output:
        print("\033[32mTEST PASSED\033[0m")
    else:
        print("\033[31mTEST FAILED\033[0m")
    print("Function Execution Time: ", execution_time, " seconds")

In [24]:
evaluate_test_case(find_books, test)

Test Output is  5
TEST PASSED
Function Execution Time:  4.76837158203125e-06  seconds


##### **Creating Function for multiple test cases**

In [25]:
def evaluate_test_cases(find_books, tests):
    """This is a custom function to compute the time taken to execute the test cases"""
    for test in tests:
        start_time = time.time()
        output = find_books(**test['input'])
        end_time = time.time()
        execution_time = end_time - start_time
        print("Test Output is ", output)
        if test['output'] == output:
            print("\033[32mTEST PASSED\033[0m")
        else:
            print("\033[31mTEST FAILED\033[0m")
        print("Function Execution Time: ", execution_time, " seconds")
        print(50 * "===")

In [26]:
evaluate_test_cases(find_books, tests)

Test Output is  5
TEST PASSED
Function Execution Time:  2.6226043701171875e-06  seconds
Test Output is  0
TEST PASSED
Function Execution Time:  1.430511474609375e-06  seconds
Test Output is  9
TEST PASSED
Function Execution Time:  1.9073486328125e-06  seconds
Test Output is  0
TEST PASSED
Function Execution Time:  9.5367431640625e-07  seconds
Test Output is  -1
TEST PASSED
Function Execution Time:  1.6689300537109375e-06  seconds
Test Output is  -1
TEST PASSED
Function Execution Time:  7.152557373046875e-07  seconds
Test Output is  7
TEST PASSED
Function Execution Time:  1.9073486328125e-06  seconds
Test Output is  7
TEST PASSED
Function Execution Time:  1.6689300537109375e-06  seconds
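
Single calls that finish in a microsecond or two are close to the resolution of `time.time()`, so the figures above are noisy. One hedged alternative is to repeat the call many times with the standard-library `timeit` module and report the average. The helper name `average_time`, the stand-in `linear_search`, and the repeat count are assumptions for illustration.

```python
import timeit


def linear_search(books, query):
    """Stand-in linear search so the sketch is self-contained."""
    for loc, isbn in enumerate(books):
        if isbn == query:
            return loc
    return -1


def average_time(func, test, number=10_000):
    """Average seconds per call of func(**test['input']) over `number` calls."""
    total = timeit.timeit(lambda: func(**test['input']), number=number)
    return total / number


sample_test = {
    'input': {'books': [9780262035613, 9780596529321, 9780999247108],
              'query': 9780999247108},
    'output': 2,
}
print("Average time per call:", average_time(linear_search, sample_test), "seconds")
```

Averaging over many calls smooths out timer resolution and scheduling noise, at the cost of a longer test run.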


#### **5. Analysing the algorithm's complexity and inefficiencies**

> The complexity of an algorithm is a measure of the time and space it requires for an input of a given size N. Unless stated otherwise, complexity refers to worst-case complexity.

1. The time complexity of linear search is cN for some fixed constant c, since in the worst case every one of the N elements is checked. This is the running time of the algorithm.
2. The space complexity is a constant c', since only the location variable is stored regardless of N.

For linear search, the time complexity is O(N) and the space complexity is O(1).
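
To make the O(N) claim concrete, the sketch below counts how many elements a linear search inspects. The function `count_linear_steps` is a hypothetical helper, not part of the notebook. In the worst case, when the query is absent, it inspects all N elements.

```python
def count_linear_steps(books, query):
    """Return (position, number of elements inspected) for a linear search."""
    steps = 0
    for loc, isbn in enumerate(books):
        steps += 1
        if isbn == query:
            return loc, steps
    return -1, steps


for n in (10, 100, 1000):
    books = list(range(n))
    # Worst case: the query is absent, so every element is inspected
    _, steps = count_linear_steps(books, -1)
    print(f"N = {n}: {steps} elements inspected")
```

The number of inspected elements grows in direct proportion to N, which is what the cN estimate expresses.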

#### **6. Applying the right technique to overcome inefficiency**

1. Since the list is already sorted, we can use a better technique than linear search
2. Find the middle element of the list
3. If it matches the query, return the middle position as the output
4. If the middle element is greater than the query, search the first half
5. If the middle element is less than the query, search the second half
6. If no elements remain to search, return -1

#### **7. Implement the binary search solution and test it using example inputs**

In [27]:
def find_books_binary(books, query):
    """Function to find if book exists in the books list"""
    lo, hi = 0, len(books) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        mid_num = books[mid]
        print(f"Values: lo = {lo}, hi = {hi}, mid = {mid} and middle number = {mid_num}")

        if mid_num == query:
            return mid
        elif mid_num < query:
            lo = mid + 1
        elif mid_num > query:
            hi = mid - 1
    return -1
    

In [28]:
evaluate_test_cases(find_books_binary, tests)

Values: lo = 0, hi = 9, mid = 4 and middle number = 9781449369415
Values: lo = 5, hi = 9, mid = 7 and middle number = 9781492041139
Values: lo = 5, hi = 6, mid = 5 and middle number = 9781491957660
Test Output is  5
TEST PASSED
Function Execution Time:  0.0003986358642578125  seconds
Values: lo = 0, hi = 9, mid = 4 and middle number = 9781449369415
Values: lo = 0, hi = 3, mid = 1 and middle number = 9780596529321
Values: lo = 0, hi = 0, mid = 0 and middle number = 9780262035613
Test Output is  0
TEST PASSED
Function Execution Time:  9.298324584960938e-06  seconds
Values: lo = 0, hi = 9, mid = 4 and middle number = 9781449369415
Values: lo = 5, hi = 9, mid = 7 and middle number = 9781492041139
Values: lo = 8, hi = 9, mid = 8 and middle number = 9781617294631
Values: lo = 9, hi = 9, mid = 9 and middle number = 9781789955750
Test Output is  9
TEST PASSED
Function Execution Time:  9.775161743164062e-06  seconds
Values: lo = 0, hi = 0, mid = 0 and middle number = 

***
**The last test above fails: we expect 7, the index of the first occurrence, but plain binary search stops at index 8, the second occurrence of the same ISBN.**
***
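
The failure can be reproduced in isolation. The sketch below re-implements the plain binary search without the tracing `print` (the name `plain_binary_search` is hypothetical) and runs it on the list from the last test case. The first midpoint, index 8, already holds the query, so the search stops there even though the first occurrence is at index 7.

```python
def plain_binary_search(books, query):
    """Binary search that returns any matching index, not necessarily the first."""
    lo, hi = 0, len(books) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if books[mid] == query:
            return mid
        elif books[mid] < query:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1


books = [9780262035613, 9780596529321, 9780596529321, 9780999247108,
         9780999247108, 9781449361327, 9781449369415, 9781491957660,
         9781491957660, 9781491957660, 9781491957660, 9781492032649,
         9781492032649, 9781492041139, 9781617294631, 9781617294631,
         9781789955750]
query = 9781491957660

print("Plain binary search:", plain_binary_search(books, query))  # 8
print("First occurrence:   ", books.index(query))                 # 7
```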

In [29]:
def get_location(books, query, mid):
    """Function to get the location of first occurrence"""
    mid_num = books[mid]
    print(f"mid = {mid} and middle number = {mid_num}")
    if mid_num == query:
        if mid - 1 >= 0 and books[mid - 1] == query:
            return 'left'
        else:
            return 'found'
    elif mid_num > query:
        return 'left'
    elif mid_num < query:
        return 'right'


In [30]:
def find_books_binary(books, query):
    """Function to find if book exists in the books list"""
    lo, hi = 0, len(books) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        print(f"Values: lo = {lo}, mid = {mid}, hi = {hi}")
        result = get_location(books, query, mid)
        if result == 'found':
            return mid
        elif result == 'left':
            hi = mid - 1
        elif result == 'right':
            lo = mid + 1
    return -1

In [31]:
evaluate_test_cases(find_books_binary, tests)

Values: lo = 0, mid = 4, hi = 9
mid = 4 and middle number = 9781449369415
Values: lo = 5, mid = 7, hi = 9
mid = 7 and middle number = 9781492041139
Values: lo = 5, mid = 5, hi = 6
mid = 5 and middle number = 9781491957660
Test Output is  5
TEST PASSED
Function Execution Time:  9.918212890625e-05  seconds
Values: lo = 0, mid = 4, hi = 9
mid = 4 and middle number = 9781449369415
Values: lo = 0, mid = 1, hi = 3
mid = 1 and middle number = 9780596529321
Values: lo = 0, mid = 0, hi = 0
mid = 0 and middle number = 9780262035613
Test Output is  0
TEST PASSED
Function Execution Time:  1.5735626220703125e-05  seconds
Values: lo = 0, mid = 4, hi = 9
mid = 4 and middle number = 9781449369415
Values: lo = 5, mid = 7, hi = 9
mid = 7 and middle number = 9781492041139
Values: lo = 8, mid = 8, hi = 9
mid = 8 and middle number = 9781617294631
Values: lo = 9, mid = 9, hi = 9
mid = 9 and middle number = 9781789955750
Test Output is  9
TEST PASSED
Function Execution Time:  2.145

#### **8. Analysing the algorithm's complexity and inefficiencies**

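When we analyse the time complexity of binary search, the number of elements still under consideration halves with each iteration:

    Initial length of array: N
    Iteration 1: N / 2
    Iteration 2: N / 4
    Iteration 3: N / 8
    Iteration k: N / 2^k

The search ends when only one element remains, that is, when N / 2^k = 1, which gives k = log2 N. Overall, binary search takes O(log N) time.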
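
The halving argument can be checked empirically. The sketch below counts the loop iterations of a binary search for several list sizes and compares them with floor(log2 N) + 1, the largest number of iterations possible for N elements. The helper `count_binary_steps` is hypothetical, and it searches for a value larger than every element to force the longest search.

```python
import math


def count_binary_steps(books, query):
    """Return the number of loop iterations a binary search performs."""
    lo, hi = 0, len(books) - 1
    steps = 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if books[mid] == query:
            break
        elif books[mid] < query:
            lo = mid + 1
        else:
            hi = mid - 1
    return steps


for n in (10, 1_000, 1_000_000):
    books = range(n)  # range supports len() and indexing, so no list is built
    steps = count_binary_steps(books, n)  # n is larger than every element
    bound = math.floor(math.log2(n)) + 1
    print(f"N = {n}: {steps} iterations, bound = {bound}")
```

Multiplying N by 1,000 adds only about ten iterations, which is the practical meaning of O(log N).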

#### **9. Comparing Linear and Binary Searches**

In [32]:
stress_test = {
    'input': {
        'books': list(range(0, 10000000)),
        'query': 9999998
    },
    'output': 9999998
}

In [33]:
def find_books_linear(books, query):
    """Function to find if book exists in the books list using linear search"""
    # Create a variable loc for tracking the location
    loc = 0
    # Iterate until loc runs past the end of the list;
    # an empty list skips the loop entirely
    while loc < len(books):
        # Return the location as soon as the query is found
        if books[loc] == query:
            return loc
        # Otherwise move on to the next element
        loc += 1
    # The query is not in the list
    return -1

In [34]:
def get_location(books, query, mid):
    """Function to get the location of first occurrence"""
    mid_num = books[mid]
    #print(f"mid = {mid} and middle number = {mid_num}")
    if mid_num == query:
        if mid - 1 >= 0 and books[mid - 1] == query:
            return 'left'
        else:
            return 'found'
    elif mid_num > query:
        return 'left'
    elif mid_num < query:
        return 'right'

In [35]:
def find_books_binary(books, query):
    """Function to find if book exists in the books list"""
    lo, hi = 0, len(books) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        #print(f"Values: lo = {lo}, mid = {mid}, hi = {hi}")
        result = get_location(books, query, mid)
        if result == 'found':
            return mid
        elif result == 'left':
            hi = mid - 1
        elif result == 'right':
            lo = mid + 1
    return -1

In [36]:
evaluate_test_case(find_books_linear, stress_test)

Test Output is  9999998
TEST PASSED
Function Execution Time:  0.8572938442230225  seconds


In [37]:
evaluate_test_case(find_books_binary, stress_test)

Test Output is  9999998
TEST PASSED
Function Execution Time:  2.193450927734375e-05  seconds


In [38]:
0.9200794696807861 / 1.7404556274414062e-05

52864.28767123288

**Binary search is approximately 52,000 times faster than linear search on this input**

### **Generic Binary Search**

Here is the generic binary search strategy:

1. Come up with a condition to determine whether the answer lies before, after, or at a given position
2. Retrieve the midpoint of the list
3. If the middle element is the answer, return the midpoint
4. If the answer lies before the middle element, repeat the search with the first half of the list
5. If the answer lies after the middle element, repeat the search with the second half of the list

#### Binary Search

In [40]:
def binary_search(lo, hi, condition):
    """Function to perform Binary search"""
    while lo <= hi:
        mid = (lo + hi) // 2
        result = condition(mid)
        if result == 'found':
            return mid
        elif result == 'left':
            hi = mid -1
        elif result == 'right':
            lo = mid + 1
    return -1

Since binary search is now separated out, let's rewrite the find_books function

In [41]:
def find_books(books, query):
    def condition(mid):
        if books[mid] == query:
            if mid - 1 >= 0 and books[mid -1] == query:
                return 'left'
            else:
                return 'found'
        elif books[mid] < query:
            return 'right'
        elif books[mid] > query:
            return 'left'
    return binary_search(0, len(books) -1, condition)

In [42]:
evaluate_test_cases(find_books, tests)

Test Output is  5
TEST PASSED
Function Execution Time:  5.0067901611328125e-06  seconds
Test Output is  0
TEST PASSED
Function Execution Time:  3.814697265625e-06  seconds
Test Output is  9
TEST PASSED
Function Execution Time:  3.5762786865234375e-06  seconds
Test Output is  0
TEST PASSED
Function Execution Time:  1.1920928955078125e-06  seconds
Test Output is  -1
TEST PASSED
Function Execution Time:  2.1457672119140625e-06  seconds
Test Output is  -1
TEST PASSED
Function Execution Time:  9.5367431640625e-07  seconds
Test Output is  7
TEST PASSED
Function Execution Time:  2.1457672119140625e-06  seconds
Test Output is  7
TEST PASSED
Function Execution Time:  4.0531158447265625e-06  seconds
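
Hand-written cases cover the known edge cases, but a randomized cross-check can add confidence. The sketch below is self-contained: it repeats the condition-based search under the hypothetical name `find_first`, generates random sorted lists with duplicates, and compares each result with a straightforward linear lookup. The seed, list sizes, and value range are arbitrary assumptions.

```python
import random


def binary_search(lo, hi, condition):
    """Generic binary search driven by a condition returning 'found', 'left' or 'right'."""
    while lo <= hi:
        mid = (lo + hi) // 2
        result = condition(mid)
        if result == 'found':
            return mid
        elif result == 'left':
            hi = mid - 1
        else:
            lo = mid + 1
    return -1


def find_first(books, query):
    """Index of the first occurrence of query in sorted books, or -1."""
    def condition(mid):
        if books[mid] == query:
            # Keep moving left while an equal neighbour precedes mid
            if mid > 0 and books[mid - 1] == query:
                return 'left'
            return 'found'
        return 'right' if books[mid] < query else 'left'
    return binary_search(0, len(books) - 1, condition)


def linear_reference(books, query):
    """Reference answer: first index of query, or -1 if absent."""
    return books.index(query) if query in books else -1


random.seed(0)
for _ in range(1000):
    size = random.randint(0, 20)
    books = sorted(random.randint(0, 10) for _ in range(size))
    query = random.randint(0, 10)
    assert find_first(books, query) == linear_reference(books, query)
print("All random checks passed")
```

A small value range relative to the list size makes duplicates and missing queries both common, so the first-occurrence logic is exercised often.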


### **Find First and Last Position of Element in Sorted Array**

https://leetcode.com/problems/find-first-and-last-position-of-element-in-sorted-array/description/

Given an array of integers nums sorted in non-decreasing order, find the starting and ending position of a given target value.

If target is not found in the array, return [-1, -1].

You must write an algorithm with O(log n) runtime complexity.

1. The list is in non-decreasing order
2. Find the start and end index of the query

In [59]:
# BINARY SEARCH

def binary_search(lo, hi, condition):
    """Function to perform Binary search"""
    while lo <= hi:
        mid = (lo + hi) // 2
        result = condition(mid)
        if result == 'found':
            return mid
        elif result == 'left':
            hi = mid -1
        elif result == 'right':
            lo = mid + 1
    return -1

In [60]:
def first_position(nums, target):
    def condition(mid):
        if nums[mid] == target:
            if mid > 0 and nums[mid -1] == target:
                return 'left'
            return 'found'
        elif nums[mid] < target:
            return 'right'
        elif nums[mid] > target:
            return 'left'
    return binary_search(0, len(nums) - 1, condition)

In [61]:
def last_position(nums, target):
    def condition(mid):
        if nums[mid] == target:
            if mid < len(nums) -1 and nums[mid + 1] == target:
                return 'right'
            return 'found'
        elif nums[mid] < target:
            return 'right'
        elif nums[mid] > target:
            return 'left'
    return binary_search(0, len(nums) - 1, condition)

In [62]:
def first_last_position(nums, target):
    return first_position(nums, target), last_position(nums, target)

In [63]:
first_last_position([5,7,7,8,8,10], 8)

(3, 4)
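
For comparison, Python's standard library provides `bisect.bisect_left` and `bisect.bisect_right`, which return the insertion points before and after any run of equal values in a sorted list. The sketch below uses them to solve the same problem. The function name `search_range` is hypothetical, and this is an alternative to the condition-based approach above rather than a replacement for it.

```python
from bisect import bisect_left, bisect_right


def search_range(nums, target):
    """Return (first, last) index of target in sorted nums, or (-1, -1)."""
    first = bisect_left(nums, target)
    # bisect_left returns an insertion point, so confirm the target is really there
    if first == len(nums) or nums[first] != target:
        return -1, -1
    last = bisect_right(nums, target) - 1
    return first, last


print(search_range([5, 7, 7, 8, 8, 10], 8))   # (3, 4)
print(search_range([5, 7, 7, 8, 8, 10], 6))   # (-1, -1)
print(search_range([], 0))                    # (-1, -1)
```

Both `bisect` calls run in O(log n), so the overall runtime requirement of the problem is still met.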