## **Interpolation Search**

Interpolation search is an algorithm for finding a specific value (key) in a sorted array. It uses an interpolation formula to estimate the position of the key from its value and the values at the two ends of the current search range. On uniformly distributed keys this makes it a more efficient alternative to binary search.

* **Data Structure**: Requires a sorted array.
* **Average Performance**: O(log(log(n))) when the keys are uniformly distributed, which is faster than both linear search (O(n)) and binary search (O(log(n))).
* **Best-Case Performance**: O(1), when the first probe lands on the target key.
* **Worst-Case Performance**: O(n), when the keys are far from uniformly distributed (for example, values that grow exponentially or contain a large outlier).

**Strengths**:
* More efficient than binary search when data is uniformly distributed.
* Uses the key values themselves to choose each probe, unlike binary search, which uses only their order.

**Weaknesses**:
* Not as efficient as binary search when data is not uniformly distributed.
* More complex to implement than binary search: the probe formula needs guards against division by zero and against keys outside the current range.
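The effect of the key distribution on the number of probes can be checked with a small counter. This is only an illustrative sketch: `count_probes` is a hypothetical helper that is not part of this notebook, and the two sample lists are made up for the comparison.

```python
def count_probes(values, target):
    """Count how many positions interpolation search inspects for target."""
    lo, hi = 0, len(values) - 1
    probes = 0
    # The range check keeps the estimated position inside [lo, hi].
    while lo <= hi and values[lo] <= target <= values[hi]:
        probes += 1
        if values[lo] == values[hi]:
            return probes  # every value left in the range equals target
        pos = lo + (hi - lo) * (target - values[lo]) // (values[hi] - values[lo])
        if values[pos] == target:
            return probes
        if values[pos] < target:
            lo = pos + 1
        else:
            hi = pos - 1
    return probes


uniform = list(range(0, 1000, 10))       # 100 evenly spaced keys
skewed = list(range(1, 100)) + [10**9]   # 100 keys with one huge outlier

uniform_probes = count_probes(uniform, 500)
skewed_probes = count_probes(skewed, 99)
print(uniform_probes, skewed_probes)
```

On the evenly spaced list the first estimate lands on the target. In the second list the outlier pulls every estimate toward the left end of the range, so the search advances one position at a time.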

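Interpolation search finds a particular item by repeatedly computing a probe position. Unlike binary search, which always probes the middle item of the current range, it estimates where the key should lie by linear interpolation between the values at the two ends of the range: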

**mid = lo + ((hi - lo) * (X - A[lo]) / (A[hi] - A[lo]))**

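where\
X -> Value being searched for (the key)\
lo -> Lowest index of the current search range\
hi -> Highest index of the current search range\
A[n] -> Value stored at index n in the list

The division is an integer (floor) division, so mid is always a whole index.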
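As a quick worked example of the formula, the sketch below evaluates it on a small made-up list. `probe_position` is a hypothetical helper name, and the code assumes `A[hi] != A[lo]` so that the divisor is non-zero.

```python
def probe_position(A, lo, hi, X):
    """Estimate the index of X in the sorted list A between lo and hi."""
    # Floor division keeps the result a valid integer index.
    return lo + ((hi - lo) * (X - A[lo])) // (A[hi] - A[lo])


A = [10, 20, 30, 40, 50]
mid = probe_position(A, 0, len(A) - 1, 40)
print(mid, A[mid])  # 3 40
```

Here `(hi - lo) * (X - A[lo])` is `4 * 30 = 120` and `A[hi] - A[lo]` is `40`, so the probe lands on index 3 directly.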

In [1]:
import time
import pandas as pd
import numpy as np

### Generic Functions to Evaluate Test cases

In [2]:
def evaluate_test_case(function, test):
    """This is a custom function to compute the time taken to execute the test"""
    start_time = time.time()
    output = function(**test['input'])
    end_time = time.time()
    execution_time = end_time - start_time
    print("Test Output is ", output)
    if test['output'] == output:
        print("\033[32mTEST PASSED\033[0m")
    else:
        print("\033[31mTEST FAILED\033[0m")
    print("Function Execution Time: ", execution_time, " seconds")

In [3]:
def evaluate_test_cases(function, tests):
    """This is a custom function to compute the time taken to execute the test cases"""
    for test in tests:
        start_time = time.time()
        output = function(**test['input'])
        end_time = time.time()
        execution_time = end_time - start_time
        print("Test Output is ", output)
        if test['output'] == output:
            print("\033[32mTEST PASSED\033[0m")
        else:
            print("\033[31mTEST FAILED\033[0m")
        print("Function Execution Time: ", execution_time, " seconds")
        print(50 * "===")
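Both helpers above repeat the same timing logic. One possible way to share it, shown here only as a sketch, is a small wrapper built on `time.perf_counter`, the standard-library clock intended for measuring short durations. The names `timed_call` and `add` are hypothetical and do not appear elsewhere in this notebook.

```python
import time


def timed_call(function, **kwargs):
    """Call function(**kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = function(**kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


def add(a, b):
    """Tiny stand-in for a search function."""
    return a + b


result, elapsed = timed_call(add, a=2, b=3)
print("Output:", result)
print("Elapsed seconds:", elapsed)
```

A single run of a microsecond-scale function is noisy, so timings like these are best read as rough indications rather than benchmarks.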

### Loading data from file

In [4]:
data = pd.read_csv("../ISBN_Example.csv", sep="|")
data

| Unnamed: 0 | isbn | name | author |
|---|---|---|---|
| 0 | 9781492032649 | Hands-On Machine Learning with Scikit-Learn, K... | Aurélien Géron |
| 1 | 9781789955750 | Python Machine Learning | Sebastian Raschka, Vahid Mirjalili |
| 2 | 9780262035613 | Deep Learning | Ian Goodfellow, Yoshua Bengio, Aaron Courville |
| 3 | 9780596529321 | Programming Collective Intelligence | Toby Segaran |
| 4 | 9781491957660 | Python for Data Analysis | Wes McKinney |
| 5 | 9781449361327 | Data Science for Business | Foster Provost, Tom Fawcett |
| 6 | 9781449369415 | Introduction to Machine Learning with Python | Andreas C. Müller, Sarah Guido |
| 7 | 9780999247108 | Machine Learning Yearning | Andrew Ng |
| 8 | 9781617294631 | Natural Language Processing in Action | Lane, Howard, and Hapke |
| 9 | 9781492041139 | Data Science from Scratch | Joel Grus |


In [5]:
isbn = data['isbn'].tolist()
isbn.sort()
isbn

[9780262035613,
 9780596529321,
 9780999247108,
 9781449361327,
 9781449369415,
 9781491957660,
 9781492032649,
 9781492041139,
 9781617294631,
 9781789955750]

### Interpolation Search:

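1. Set `lo` and `hi` to the first and last indices of the sorted list.
2. Compute the probe position `mid` with the interpolation formula.
3. If the item at `mid` is a match, return its index and exit.
4. If the query is greater than the item at `mid`, continue in the higher sub-list (`lo = mid + 1`).
5. If the query is smaller than the item at `mid`, continue in the lower sub-list (`hi = mid - 1`).
6. Recompute the probe position for the new sub-list.
7. Repeat until a match is found or the sub-list is empty, in which case the item is not present.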
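To make the steps concrete, the sketch below records every probe for a query on the sorted ISBN list shown earlier. `trace_probes` is a hypothetical helper written for illustration; it adds a range check so that the estimate always stays inside the list.

```python
def trace_probes(values, target):
    """Yield (lo, hi, pos) for every probe interpolation search makes."""
    lo, hi = 0, len(values) - 1
    while lo <= hi and values[lo] <= target <= values[hi]:
        if values[lo] == values[hi]:
            yield lo, hi, lo  # single distinct value left, no division needed
            return
        pos = lo + (hi - lo) * (target - values[lo]) // (values[hi] - values[lo])
        yield lo, hi, pos
        if values[pos] == target:
            return
        if values[pos] < target:
            lo = pos + 1   # continue in the higher sub-list
        else:
            hi = pos - 1   # continue in the lower sub-list


sorted_isbn = [9780262035613, 9780596529321, 9780999247108, 9781449361327,
               9781449369415, 9781491957660, 9781492032649, 9781492041139,
               9781617294631, 9781789955750]

for lo, hi, pos in trace_probes(sorted_isbn, 9781449369415):
    print(f"lo={lo} hi={hi} probe={pos} value={sorted_isbn[pos]}")
```

Each printed line corresponds to one pass through the steps: the range narrows around the query until the probe lands on it.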

### Creating Dictionary with inputs and outputs

In [6]:
test = {
    'input' : {
        'books': isbn,
        'query': 9781449369415
    },
    'output': 4
}

#### Creating input dictionary


The test cases cover the following scenarios:

* Expected book is in the middle of the list
* Expected book is the first element of the list
* Expected book is the last element of the list
* Book list has only one element, which is the expected book
* Book list does not contain the expected book
* Book list is empty
* Book list contains duplicate entries
* Expected book occurs multiple times in the list

In [7]:
# tests is a list of dictionaries
tests = []

In [8]:
# Expected book is in middle of the list
tests.append({
    'input': {
        'books': isbn,
        'query':9781491957660
    },
    'output': 5
})

In [9]:
# Expected book is the first element of the list
tests.append({
    'input': {
        'books': isbn,
        'query':9780262035613
    },
    'output': 0
})

In [10]:
# Expected book is the last element of the list
tests.append({
    'input': {
        'books': isbn,
        'query':9781789955750
    },
    'output': 9
})

In [11]:
# Book list has only one element which is expected book
tests.append({
    'input': {
        'books': [9781789955750],
        'query':9781789955750
    },
    'output': 0
})

In [12]:
# Book list does not contain the expected book
tests.append({
    'input': {
        'books': isbn,
        'query':9781789955751
    },
    'output': -1
})

In [13]:
# Book list is empty
tests.append({
    'input': {
        'books': [],
        'query':9781789955751
    },
    'output': -1
})

In [14]:
# Book list contains duplicate entries
tests.append({
    'input': {
        'books': [9780262035613, 9780596529321, 9780596529321, 9780999247108, 9780999247108, 9781449361327, 9781449369415, 9781491957660, 9781492032649, 9781492032649, 9781492041139, 9781617294631, 9781617294631, 9781789955750],
        'query':9781491957660
    },
    'output': 7
})

In [15]:
# Expected book occurs multiple times in the list; the expected output is the index of its first occurrence
tests.append({
    'input': {
        'books': [9780262035613, 9780596529321, 9780596529321, 9780999247108, 9780999247108, 9781449361327, 9781449369415, 9781491957660, 9781491957660, 9781491957660, 9781491957660, 9781492032649, 9781492032649, 9781492041139, 9781617294631, 9781617294631, 9781789955750],
        'query':9781491957660
    },
    'output': 7
})

In [16]:
tests

[{'input': {'books': [9780262035613,
    9780596529321,
    9780999247108,
    9781449361327,
    9781449369415,
    9781491957660,
    9781492032649,
    9781492041139,
    9781617294631,
    9781789955750],
   'query': 9781491957660},
  'output': 5},
 {'input': {'books': [9780262035613,
    9780596529321,
    9780999247108,
    9781449361327,
    9781449369415,
    9781491957660,
    9781492032649,
    9781492041139,
    9781617294631,
    9781789955750],
   'query': 9780262035613},
  'output': 0},
 {'input': {'books': [9780262035613,
    9780596529321,
    9780999247108,
    9781449361327,
    9781449369415,
    9781491957660,
    9781492032649,
    9781492041139,
    9781617294631,
    9781789955750],
   'query': 9781789955750},
  'output': 9},
 {'input': {'books': [9781789955750], 'query': 9781789955750}, 'output': 0},
 {'input': {'books': [9780262035613,
    9780596529321,
    9780999247108,
    9781449361327,
    9781449369415,
    9781491957660,
    9781492032649,
    978149204

In [17]:
len(tests)

8

### Implementation of Interpolation Search

In [18]:
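def find_books_interpolation(books, query):
    """Function to find a book in books using Interpolation search"""
    l, r = 0, len(books) - 1
    while l <= r:
        # Estimate the probe position by linear interpolation between
        # books[l] and books[r].
        # Caveat: the divisor is zero when books[l] == books[r] (for example,
        # a one-element list), and a query outside [books[l], books[r]] can
        # push mid outside the valid index range.
        mid = l + ((r - l) * (query - books[l]) // (books[r] - books[l]))
        if books[mid] == query:
            return mid
        if query < books[mid]:
            r = mid - 1  # continue in the lower sub-list
        else:
            l = mid + 1  # continue in the higher sub-list
    return -1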
        

In [19]:
evaluate_test_case(find_books_interpolation, test)

Test Output is  4
TEST PASSED
Function Execution Time:  4.76837158203125e-06  seconds


In [20]:
evaluate_test_cases(find_books_interpolation, tests)

Test Output is  5
TEST PASSED
Function Execution Time:  5.0067901611328125e-06  seconds
Test Output is  0
TEST PASSED
Function Execution Time:  2.1457672119140625e-06  seconds
Test Output is  9
TEST PASSED
Function Execution Time:  1.430511474609375e-06  seconds


ZeroDivisionError: integer division or modulo by zero
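The `ZeroDivisionError` is raised by the fourth test case: with a one-element list, `l` and `r` are both 0, so `books[r] - books[l]` is zero. A minimal sketch of a guarded variant follows. The name `find_books_interpolation_safe` is hypothetical, and returning the first occurrence for duplicated entries is an assumption based on the expected outputs in the test cases.

```python
def find_books_interpolation_safe(books, query):
    """Interpolation search returning the first index of query, or -1."""
    lo, hi = 0, len(books) - 1
    # Stop as soon as query falls outside the current value range; this
    # also keeps the probe index inside [lo, hi].
    while lo <= hi and books[lo] <= query <= books[hi]:
        if books[lo] == books[hi]:
            # Avoids dividing by zero; the range check guarantees a match.
            return lo
        pos = lo + (hi - lo) * (query - books[lo]) // (books[hi] - books[lo])
        if books[pos] == query:
            # Step left so duplicated entries report their first occurrence.
            while pos > lo and books[pos - 1] == query:
                pos -= 1
            return pos
        if books[pos] < query:
            lo = pos + 1
        else:
            hi = pos - 1
    return -1


print(find_books_interpolation_safe([9781789955750], 9781789955750))  # 0
print(find_books_interpolation_safe([], 9781789955751))               # -1
```

The left-stepping loop costs one extra comparison per duplicate, so it is linear in the number of repeated entries. That is acceptable for short runs of duplicates but worth keeping in mind for heavily repeated keys.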