## Introduction to Binary Search and Complexity Analysis with Python

### Problem
- This course takes a coding-focused approach towards learning. In each notebook, we'll focus on solving one problem, and learn the techniques, algorithms, and data structures to devise an efficient solution. We will then generalize the technique and apply it to other problems.

- In this notebook, we focus on solving the following problem:

- QUESTION 1: Alice has some cards with numbers written on them. She arranges the cards in decreasing order, and lays them out face down in a sequence on a table. She challenges Bob to pick out the card containing a given number by turning over as few cards as possible. Write a function to help Bob locate the card.


- This may seem like a simple problem, especially if you're familiar with the concept of binary search, but the strategy and technique we are learning here will be widely applicable, and we'll soon use them to solve harder problems.

## Solution
# 1. State the problem clearly. Identify the input & output formats.
- You will often encounter detailed word problems in coding challenges and interviews. The first step is to state the problem clearly and precisely in abstract terms.
- ![](image/cards.png)
- In this case, for instance, we can represent the sequence of cards as a list of numbers. Turning over a specific card is equivalent to accessing the value of the number at the corresponding position in the list.
- ![](image/G9fBarb.png)
## Problem
- We need to write a program to find the position of a given number in a list of numbers arranged in decreasing order. We also need to minimize the number of times we access elements from the list.
## Input
- cards: A list of numbers sorted in decreasing order. E.g. [13, 11, 10, 7, 4, 3, 1, 0]
- query: A number, whose position in the list is to be determined. E.g. 7
## Output
- position: The position of query in the list cards, counting from 0. E.g. 3 in the above case

## Tips
- Name your function appropriately and think carefully about the signature.
- Discuss the problem with the interviewer if you are unsure how to frame it in abstract terms.
- Use descriptive variable names, otherwise you may forget what a variable represents.

In [1]:
def locate_card(cards, query):
    pass

# 2. Come up with some example inputs & outputs. Try to cover all edge cases.
Before we start implementing our function, it would be useful to come up with some example inputs and outputs which we can use later to test our function. We'll refer to them as test cases.

Here's the test case described in the example above.

In [2]:
cards = [13, 11, 10, 7, 4, 3, 1, 0]
query = 7
output = 3

We'll represent our test cases as dictionaries to make it easier to test them once we implement our function. For example, the above test case can be represented as follows:

In [3]:
test1 = {
    'input': { 
        'cards': [13, 11, 10, 7, 4, 3, 1, 0], 
        'query': 7
    },
    'output': 3 
}

In [4]:
locate_card(**test1['input']) == test1['output']

False

Our function should be able to handle any set of valid inputs we pass into it. Here's a list of some possible variations we might encounter:

1. The number query occurs somewhere in the middle of the list cards.
2. query is the first element in cards.
3. query is the last element in cards.
4. The list cards contains just one element, which is query.
5. The list cards does not contain number query.
6. The list cards is empty.
7. The list cards contains repeating numbers.
8. The number query occurs at more than one position in cards.
9. (can you think of any more variations?)
    - Edge Cases: It's likely that you didn't think of all of the above cases when you read the problem for the first time. Some of these (like the empty array or query not occurring in cards) are called edge cases, as they represent rare or extreme examples.

While edge cases may not occur frequently, your programs should be able to handle all edge cases, otherwise they may fail in unexpected ways. Let's create some more test cases for the variations listed above. We'll store all our test cases in a list for easier testing.

In [5]:
tests = []
# query occurs in the middle
tests.append(test1)

tests.append({
    'input': {
        'cards': [13, 11, 10, 7, 4, 3, 1, 0],
        'query': 1
    },
    'output': 6
})
# query is the first element
tests.append({
    'input': {
        'cards': [4, 2, 1, -1],
        'query': 4
    },
    'output': 0
})
# query is the last element
tests.append({
    'input': {
        'cards': [3, -1, -9, -127],
        'query': -127
    },
    'output': 3
})
# cards contains just one element, query
tests.append({
    'input': {
        'cards': [6],
        'query': 6
    },
    'output': 0 
})
# cards does not contain query 
tests.append({
    'input': {
        'cards': [9, 7, 5, 2, -9],
        'query': 4
    },
    'output': -1
})
# cards is empty
tests.append({
    'input': {
        'cards': [],
        'query': 7
    },
    'output': -1
})
# numbers can repeat in cards
tests.append({
    'input': {
        'cards': [8, 8, 6, 6, 6, 6, 6, 3, 2, 2, 2, 0, 0, 0],
        'query': 3
    },
    'output': 7
})
# query occurs multiple times
tests.append({
    'input': {
        'cards': [8, 8, 6, 6, 6, 6, 6, 6, 3, 2, 2, 2, 0, 0, 0],
        'query': 6
    },
    'output': 2
})

The problem statement does not specify what to do if the list cards does not contain the number query. In such situations, you can:

1. Read the problem statement again, carefully.
2. Look through the examples provided with the problem.
3. Ask the interviewer/platform for a clarification.
4. Make a reasonable assumption, state it and move forward.

We will assume that our function will return -1 in case cards does not contain query. Similarly, when query occurs more than once in cards, we assume the function should return its first (leftmost) position, as in the last test case above.

In [6]:
tests

[{'input': {'cards': [13, 11, 10, 7, 4, 3, 1, 0], 'query': 7}, 'output': 3},
 {'input': {'cards': [13, 11, 10, 7, 4, 3, 1, 0], 'query': 1}, 'output': 6},
 {'input': {'cards': [4, 2, 1, -1], 'query': 4}, 'output': 0},
 {'input': {'cards': [3, -1, -9, -127], 'query': -127}, 'output': 3},
 {'input': {'cards': [6], 'query': 6}, 'output': 0},
 {'input': {'cards': [9, 7, 5, 2, -9], 'query': 4}, 'output': -1},
 {'input': {'cards': [], 'query': 7}, 'output': -1},
 {'input': {'cards': [8, 8, 6, 6, 6, 6, 6, 3, 2, 2, 2, 0, 0, 0], 'query': 3},
  'output': 7},
 {'input': {'cards': [8, 8, 6, 6, 6, 6, 6, 6, 3, 2, 2, 2, 0, 0, 0],
   'query': 6},
  'output': 2}]

Great, now we have a fairly exhaustive set of test cases to evaluate our function.

Creating test cases beforehand allows you to identify different variations and edge cases in advance so that you can make sure to handle them while writing code. Sometimes, you may start out confused, but the solution will reveal itself as you try to come up with interesting test cases.

Tip: Don't worry if you can't come up with an exhaustive list of test cases right away. You can come back to this section and add more test cases as you discover them. Coming up with good test cases is a skill that takes practice.
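Rather than checking each test case by hand, it can help to have a small helper that runs every case and reports the failures. The function `evaluate_test_cases` below is a hypothetical helper written for this illustration (it is not part of any library); it assumes each test is a dictionary with `'input'` and `'output'` keys, exactly as in the `tests` list above.

```python
def evaluate_test_cases(function, tests):
    """Run function on every test case and report the results.

    Hypothetical helper: assumes each test is a dict of the form
    {'input': {...keyword arguments...}, 'output': expected_value}.
    Returns the number of passing tests.
    """
    passed = 0
    for i, test in enumerate(tests):
        # Unpack the input dictionary as keyword arguments
        actual = function(**test['input'])
        if actual == test['output']:
            passed += 1
        else:
            print(f"Test {i} FAILED: input={test['input']}, "
                  f"expected={test['output']}, actual={actual}")
    print(f"{passed}/{len(tests)} tests passed")
    return passed
```

Once `locate_card` is implemented, calling `evaluate_test_cases(locate_card, tests)` would run all the cases defined above and print any mismatches.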

# 3. Come up with a correct solution for the problem. State it in plain English.
Our first goal should always be to come up with a correct solution to the problem, which may not necessarily be the most efficient solution. The simplest or most obvious solution to a problem, which generally involves checking all possible answers, is called the brute force solution.

In this problem, coming up with a correct solution is quite easy: Bob can simply turn over cards in order one by one, until he finds a card with the given number on it. Here's how we might implement it:

1. Create a variable position with the value 0.
2. Check whether the number at index position in cards equals query.
3. If it does, position is the answer and can be returned from the function.
4. If not, increment the value of position by 1, and repeat steps 2 to 4 until we reach the last position.
5. If the number was not found, return -1.
    - Linear Search Algorithm: Congratulations, we've just written our first algorithm! An algorithm is simply a list of statements which can be converted into code and executed by a computer on different sets of inputs. This particular algorithm is called linear search, since it involves searching through a list in a linear fashion i.e. element after element.

Tip: Always try to express (speak or write) the algorithm in your own words before you start coding. It can be as brief or detailed as you require it to be. Writing is a great tool for thinking clearly. It's likely that you will find some parts of the solution difficult to express, which suggests that you are probably not yet thinking about them clearly. The more clearly you are able to express your thoughts, the easier it will be for you to turn them into code.

# 4. Implement the solution and test it using example inputs. Fix bugs, if any.
Phew! We are finally ready to implement our solution. All the work we've done so far will definitely come in handy, as we now know exactly what we want our function to do, and we have an easy way of testing it on a variety of inputs.

In [6]:
def locate_card(cards, query):
    # Create a variable position with the value 0
    position = 0
    print('cards',cards)
    print('query',query)
    # Set up a loop for repetition
    while position<len(cards):
       
        # Check if element at the current position matches the query
        if cards[position] == query:
            # Answer found! Return and exit..
            return position
        
        # Increment the position
        position += 1
        
        # Check if we have reached the end of the array
        if position == len(cards):
            
            # Number not found, return -1
            return -1

In [8]:
locate_card(**test1['input']) == test1['output']

cards [13, 11, 10, 7, 4, 3, 1, 0]
query 7


True

In [9]:
import time
start_time = time.time()
for test in tests:
    print(locate_card(**test['input']) == test['output'])
end_time = time.time()
print('Time taken',end_time-start_time)

cards [13, 11, 10, 7, 4, 3, 1, 0]
query 7
True
cards [13, 11, 10, 7, 4, 3, 1, 0]
query 1
True
cards [4, 2, 1, -1]
query 4
True
cards [3, -1, -9, -127]
query -127
True
cards [6]
query 6
True
cards [9, 7, 5, 2, -9]
query 4
True
cards []
query 7
False
cards [8, 8, 6, 6, 6, 6, 6, 3, 2, 2, 2, 0, 0, 0]
query 3
True
cards [8, 8, 6, 6, 6, 6, 6, 6, 3, 2, 2, 2, 0, 0, 0]
query 6
True
Time taken 0.003021240234375
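Notice that the test with the empty list fails. When cards is empty, the condition `position < len(cards)` is false from the start, so the loop body never runs, the `return -1` inside the loop is never reached, and the function implicitly returns None instead of -1. One minimal fix, sketched below, is to drop the end-of-array check inside the loop and return -1 once the loop finishes; the debugging prints are left out here.

```python
def locate_card(cards, query):
    """Linear search: return the first position of query in cards, or -1."""
    position = 0
    # Check each card in turn, from left to right
    while position < len(cards):
        if cards[position] == query:
            # Answer found: this is the first matching position
            return position
        position += 1
    # The loop finished without a match (this also covers an empty list)
    return -1
```

With this change, every test case above, including the empty list, should pass.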


Tip: In a real interview or coding assessment, you can skip the step of implementing and testing the brute force solution in the interest of time. It's generally quite easy to figure out the complexity of the brute force solution from the plain English description.

# 5. Analyze the algorithm's complexity and identify inefficiencies, if any.
Recall this statement from the original question: "Alice challenges Bob to pick out the card containing a given number by turning over as few cards as possible." We restated this requirement as: "Minimize the number of times we access elements from the list cards."
![](image/cards.png)

Before we can minimize the number, we need a way to measure it. Since we access a list element once in every iteration, for a list of size N we access the elements from the list up to N times. Thus, Bob may need to overturn up to N cards in the worst case, to find the required card.
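One way to measure this is to instrument the list so that it counts how many times an element is read. The `CountingList` class below is a hypothetical helper for this illustration: it wraps a Python list and increments a counter on every index access. Running linear search on it with a query that is absent (the worst case) shows that all N cards get turned over.

```python
class CountingList:
    """Hypothetical wrapper around a list that counts element accesses."""

    def __init__(self, items):
        self.items = list(items)
        self.accesses = 0

    def __len__(self):
        # Asking for the length does not turn over any card
        return len(self.items)

    def __getitem__(self, index):
        # Every read of an element counts as turning over one card
        self.accesses += 1
        return self.items[index]


def linear_search(cards, query):
    """Return the first position of query in cards, or -1."""
    for position in range(len(cards)):
        if cards[position] == query:
            return position
    return -1


for n in [10, 100, 1000]:
    cards = CountingList(range(n - 1, -1, -1))  # n cards in decreasing order
    linear_search(cards, -1)                    # -1 is absent: worst case
    print(n, cards.accesses)                    # accesses equals n
```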

Suppose he is only allowed to overturn 1 card per minute. Then it may take him 30 minutes to find the required card if 30 cards are laid out on the table. Is this the best he can do? Is there a way for Bob to arrive at the answer by turning over just 5 cards, instead of 30?

The field of study concerned with finding the amount of time, space or other resources required to complete the execution of computer programs is called the analysis of algorithms. And the process of figuring out the best algorithm to solve a given problem is called algorithm design and optimization.

## Complexity and Big O Notation
Complexity of an algorithm is a measure of the amount of time and/or space required by an algorithm for an input of a given size e.g. N. Unless otherwise stated, the term complexity always refers to the worst-case complexity (i.e. the highest possible time/space taken by the program/algorithm to process an input).

In the case of linear search:

1. The time complexity of the algorithm is cN for some fixed constant c that depends on the number of operations we perform in each iteration and the time taken to execute a statement. Time complexity is sometimes also called the running time of the algorithm.

2. The space complexity is some constant c' (independent of N), since we just need a single variable position to iterate through the array, and it occupies a constant space in the computer's memory (RAM).

Big O Notation: Worst-case complexity is often expressed using the Big O notation. In Big O notation, we drop fixed constants and lower powers of variables to capture the trend of the relationship between the size of the input and the complexity of the algorithm, e.g. if the complexity of the algorithm is cN^3 + dN^2 + eN + f, in Big O notation it is expressed as O(N^3).
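To see why constants and lower powers can be dropped, take an example cost function f(N) = 2N^3 + 5N^2 + 7N + 3 (the coefficients are arbitrary, chosen only for illustration). As N grows, the ratio f(N) / N^3 settles at the leading constant 2, because the lower-order terms become negligible; only the N^3 trend remains, which is why f(N) is O(N^3).

```python
def f(n):
    # An arbitrary example cost function: 2N^3 + 5N^2 + 7N + 3
    return 2 * n**3 + 5 * n**2 + 7 * n + 3

for n in [10, 100, 1000, 10**6]:
    # The ratio approaches the leading coefficient 2 as n grows
    print(n, f(n) / n**3)
```

The printed ratios shrink from about 2.57 at N = 10 toward 2 as N increases.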

Thus, the time complexity of linear search is O(N) and its space complexity is O(1).

# 6. Apply the right technique to overcome the inefficiency. Repeat steps 3 to 6.
- At the moment, we're simply going over cards one by one, and not even utilizing the fact that they're sorted. This is called a brute force approach.

- It would be great if Bob could somehow guess the card at the first attempt, but with all the cards lying face down, it's simply impossible to guess the right card.
![](image/cards.png)
- The next best idea would be to pick a random card, and use the fact that the list is sorted to determine whether the target card lies to the left or right of it. In fact, if we pick the middle card, we can reduce the number of additional cards to be tested to half the size of the list. Then, we can simply repeat the process with the half that may still contain the target. This technique is called binary search. Here's a visual explanation of the technique:
- ![](image/soln.png)

# 7. Come up with a correct solution for the problem. State it in plain English.
Here's how binary search can be applied to our problem:

1. Find the middle element of the list.
2. If it matches queried number, return the middle position as the answer.
3. If it is less than the queried number, then search the first half of the list
4. If it is greater than the queried number, then search the second half of the list
5. If no more elements remain, return -1.
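Steps 3 and 4 say "search the first half" or "search the second half", which can be read as applying the same procedure again to a smaller range. That reading leads naturally to a recursive formulation, sketched below as an alternative to the loop used in the next section (`locate_card_recursive` is a hypothetical name). Like the plain-English steps, it returns some position of query, not necessarily the first one when the number repeats. Each recursive call also adds a stack frame, so this version uses O(log N) extra space.

```python
def locate_card_recursive(cards, query, lo=0, hi=None):
    """Return a position of query in the decreasing list cards, or -1."""
    if hi is None:
        hi = len(cards) - 1
    # Step 5: no more elements remain in the range lo..hi
    if lo > hi:
        return -1
    # Step 1: find the middle element of the current range
    mid = (lo + hi) // 2
    if cards[mid] == query:
        # Step 2: the middle element matches
        return mid
    elif cards[mid] < query:
        # Step 3: decreasing order, so larger numbers lie in the first half
        return locate_card_recursive(cards, query, lo, mid - 1)
    else:
        # Step 4: the query is smaller, so it lies in the second half
        return locate_card_recursive(cards, query, mid + 1, hi)
```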

# 8. Implement the solution and test it using example inputs. Fix bugs, if any.
Here's an implementation of binary search for solving our problem. We also print the relevant variables in each iteration of the while loop.

In [14]:
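def locate_card(cards, query):
    # Positions lo..hi (inclusive) may still contain query
    lo, hi = 0, len(cards) - 1
    print('cards', cards)
    print('query', query)
    while lo <= hi:
        # Pick the middle position of the current range
        mid = (lo + hi) // 2
        print(lo, mid, hi)
        if cards[mid] == query:
            return mid
        elif cards[mid] > query:
            # Decreasing order: query can only be to the right
            lo = mid + 1
        else:
            # cards[mid] < query: query can only be to the left
            hi = mid - 1
    return -1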

In [15]:
locate_card(**test1['input']) == test1['output']

cards [13, 11, 10, 7, 4, 3, 1, 0]
query 7
0 3 7


True

In [16]:
import time
start_time = time.time()
for test in tests:
    print(test['output'])
    print(locate_card(**test['input']) == test['output'])
end_time = time.time()
print('Time taken',end_time-start_time)

3
cards [13, 11, 10, 7, 4, 3, 1, 0]
query 7
0 3 7
True
6
cards [13, 11, 10, 7, 4, 3, 1, 0]
query 1
0 3 7
4 5 7
6 6 7
True
0
cards [4, 2, 1, -1]
query 4
0 1 3
0 0 0
True
3
cards [3, -1, -9, -127]
query -127
0 1 3
2 2 3
3 3 3
True
0
cards [6]
query 6
0 0 0
True
-1
cards [9, 7, 5, 2, -9]
query 4
0 2 4
3 3 4
True
-1
cards []
query 7
True
7
cards [8, 8, 6, 6, 6, 6, 6, 3, 2, 2, 2, 0, 0, 0]
query 3
0 6 13
7 10 13
7 8 9
7 7 7
True
2
cards [8, 8, 6, 6, 6, 6, 6, 6, 3, 2, 2, 2, 0, 0, 0]
query 6
0 7 14
False
Time taken 0.0059604644775390625


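The last test case fails. The list contains the query 6 at several positions, and the function returns 7 (the very first middle position, which happens to hold a 6), whereas the expected answer is 2, the first occurrence. To fix this, when the middle element equals query, we also check whether the element just before it equals query. If it does, the first occurrence must lie further to the left, so we continue searching the left half. We move this logic into a helper function test_location, which tells us whether the answer is 'found' at mid, or lies to the 'left' or 'right' of it.

In [17]: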
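def test_location(cards, query, mid):
    mid_number = cards[mid]
    if mid_number == query:
        # A match; if the previous card also matches, an earlier
        # occurrence exists, so keep searching to the left
        if mid-1 >= 0 and cards[mid-1] == query:
            return 'left'
        else:
            return 'found'
    elif mid_number < query:
        # Decreasing order: numbers larger than mid_number are to the left
        return 'left'
    else:
        # mid_number > query: the query can only be to the right
        return 'right'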

def locate_card(cards, query):
    lo, hi = 0, len(cards) - 1
    
    while lo <= hi:
        mid = (lo + hi) // 2
        result = test_location(cards, query, mid)
        
        if result == 'found':
            return mid
        elif result == 'left':
            hi = mid - 1
        elif result == 'right':
            lo = mid + 1
    return -1

In [32]:
import time
start_time = time.time()
for test in tests:
    print(locate_card(**test['input']) == test['output'])
end_time = time.time()
print('Time taken',end_time-start_time)

True
True
True
True
True
True
True
True
True
Time taken 0.00099945068359375


# 9. Analyze the algorithm's complexity and identify inefficiencies, if any.
Once again, let's try to count the number of iterations in the algorithm. If we start out with an array of N elements, then each time the size of the array reduces to half for the next iteration, until we are left with just 1 element.

Initial length - N

Iteration 1 - N/2

Iteration 2 - N/4 i.e. N/2^2

Iteration 3 - N/8 i.e. N/2^3

...

Iteration k - N/2^k

Since the final length of the array is 1, we can find the number of iterations k by solving

N/2^k = 1

Rearranging the terms, we get

N = 2^k

Taking the logarithm (base 2) of both sides

k = log N

Here log refers to the logarithm to base 2. Therefore, our algorithm has the time complexity O(log N). This fact is often stated as: binary search runs in logarithmic time. You can verify that the space complexity of binary search is O(1), since it only needs the variables lo, hi and mid.
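We can check this count empirically. The sketch below (the helper name `count_iterations` is hypothetical) runs the same loop as `locate_card` on a decreasing list of n distinct numbers, searching for -1, which is smaller than every card and therefore never found. This forces the worst case, where the loop runs until the range is empty. The iteration count matches floor(log2 n) + 1; the extra 1 is the final iteration that checks the last remaining element.

```python
import math


def count_iterations(n, query):
    """Count the while-loop iterations binary search needs on the
    decreasing list [n-1, ..., 1, 0] (hypothetical helper)."""
    cards = list(range(n - 1, -1, -1))
    lo, hi = 0, n - 1
    iterations = 0
    while lo <= hi:
        iterations += 1
        mid = (lo + hi) // 2
        if cards[mid] == query:
            break
        elif cards[mid] > query:
            lo = mid + 1
        else:
            hi = mid - 1
    return iterations


for n in [10, 100, 1000, 10**6]:
    # -1 is absent, so each search runs until the range is empty
    print(n, count_iterations(n, -1), math.floor(math.log2(n)) + 1)
```

Even for a million cards, the loop runs only about 20 times.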

# Binary Search vs. Linear Search
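- The timings measured on the small test cases above are tiny and dominated by overhead (printing, function calls, timer resolution), so they don't reveal much. On a large list, however, the binary search version can be over 55,000 times faster than the linear search version.

Furthermore, as the size of the input grows larger, the difference only gets bigger. For a list 10 times the size, linear search would run for about 10 times longer, whereas binary search would only require about 3 additional iterations, since log2(10) ≈ 3.3 (can you verify this?). That's the real difference between the complexities O(N) and O(log N).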
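To compare the two approaches on a larger input, one can time both on the same big list. The sketch below uses the standard library's `timeit` module; the list size, the query and the function names are arbitrary choices for this illustration. The measured times, and hence the speed-up, depend on your machine, so no specific numbers are claimed here.

```python
import timeit


def linear_search(cards, query):
    """Return the first position of query in cards, or -1 (O(N))."""
    for position, card in enumerate(cards):
        if card == query:
            return position
    return -1


def binary_search_decreasing(cards, query):
    """Return the first position of query in decreasing cards, or -1 (O(log N))."""
    lo, hi = 0, len(cards) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if cards[mid] == query:
            # Keep searching left in case the number repeats
            if mid > 0 and cards[mid - 1] == query:
                hi = mid - 1
            else:
                return mid
        elif cards[mid] < query:
            hi = mid - 1
        else:
            lo = mid + 1
    return -1


n = 10**6
cards = list(range(n - 1, -1, -1))   # [999999, ..., 1, 0]
query = 2                            # near the end: costly for linear search

# Both functions must agree before we compare their speed
assert linear_search(cards, query) == binary_search_decreasing(cards, query)

linear_time = timeit.timeit(lambda: linear_search(cards, query), number=5)
binary_time = timeit.timeit(lambda: binary_search_decreasing(cards, query), number=5)
print(f"linear: {linear_time:.4f}s, binary: {binary_time:.6f}s, "
      f"speed-up: {linear_time / binary_time:.0f}x")
```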

Another way to look at it is that binary search runs c * N / log N times faster than linear search, for some fixed constant c. Since log N grows very slowly compared to N, the difference gets larger with the size of the input. Here's a graph comparing common functions for the running time of algorithms:
![](image/z4bbf8o1ly77wmkjdgge.jfif)

# Generic Binary Search implemented in python
Here is the general strategy behind binary search, which is applicable to a variety of problems:

1. Come up with a condition to determine whether the answer lies before, after or at a given position
2. Retrieve the midpoint and the middle element of the list.
3. If it is the answer, return the middle position as the answer.
4. If answer lies before it, repeat the search with the first half of the list
5. If the answer lies after it, repeat the search with the second half of the list.

Here is the generic algorithm for binary search, implemented in Python:

In [58]:
def binary_search(lo, hi, condition):
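    """Search positions lo..hi (inclusive) using a condition function.

    condition(mid) must return 'found' if mid is the answer, 'left' if the
    answer lies before mid, or anything else (e.g. 'right') if it lies after.
    Returns the matching position, or -1 if the range is exhausted.
    """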
    while lo <= hi:
        mid = (lo + hi) // 2
        result = condition(mid)
        if result == 'found':
            return mid
        elif result == 'left':
            hi = mid - 1
        else:
            lo = mid + 1
    return -1

The worst-case complexity or running time of binary search is O(log N), provided the complexity of the condition used to determine whether the answer lies before, after or at a given position is O(1).

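Note that binary_search accepts a function condition as an argument. In Python, functions are first-class objects, so they can be passed as arguments to other functions just like any other value (C++ and Java support this too, via function pointers or lambdas and via functional interfaces, respectively).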
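Because condition is just a function of mid, the same binary_search can answer questions that have nothing to do with cards. As a hypothetical illustration (not from the original notebook), the sketch below computes the integer square root of a non-negative integer n, i.e. the largest x with x * x <= n, by searching the positions 0 to n. The `binary_search` function is repeated so the block runs on its own; for real use, Python 3.8+ already provides `math.isqrt`.

```python
def binary_search(lo, hi, condition):
    """Generic binary search over positions lo..hi (same as above)."""
    while lo <= hi:
        mid = (lo + hi) // 2
        result = condition(mid)
        if result == 'found':
            return mid
        elif result == 'left':
            hi = mid - 1
        else:
            lo = mid + 1
    return -1


def integer_sqrt(n):
    """Largest x such that x * x <= n, for an integer n >= 0 (hypothetical example)."""
    def condition(mid):
        if mid * mid > n:
            return 'left'             # mid is too large
        if (mid + 1) * (mid + 1) > n:
            return 'found'            # mid fits, mid + 1 does not
        return 'right'                # a larger value still fits
    return binary_search(0, n, condition)


print(integer_sqrt(15), integer_sqrt(16), integer_sqrt(10**12))  # 3 4 1000000
```

Since the condition takes O(1) time, this search runs in O(log n) time.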

We can now rewrite the locate_card function more succinctly using the binary_search function.

In [60]:
def locate_card(cards, query):
    
    def condition(mid):
        if cards[mid] == query:
            if mid > 0 and cards[mid-1] == query:
                return 'left'
            else:
                return 'found'
        elif cards[mid] < query:
            return 'left'
        else:
            return 'right'
    
    return binary_search(0, len(cards) - 1, condition)

In [61]:
import time
start_time = time.time()
for test in tests:
    print(locate_card(**test['input']) == test['output'])
end_time = time.time()
print('Time taken',end_time-start_time)

True
True
True
True
True
True
True
True
True
Time taken 0.0060536861419677734


## Question: Given an array of integers nums sorted in ascending order, find the starting and ending position of a given number.

This differs from the card problem in only two significant ways:

1. The numbers are sorted in increasing order.
2. We are looking for both the starting (first) and the ending (last) position of the number.

Here's the full code for solving the question, obtained by making minor modifications to our function:

In [62]:
def first_position(nums, target):
    def condition(mid):
        if nums[mid] == target:
            if mid > 0 and nums[mid-1] == target:
                return 'left'
            return 'found'
        elif nums[mid] < target:
            return 'right'
        else:
            return 'left'
    return binary_search(0, len(nums)-1, condition)

def last_position(nums, target):
    def condition(mid):
        if nums[mid] == target:
            if mid < len(nums)-1 and nums[mid+1] == target:
                return 'right'
            return 'found'
        elif nums[mid] < target:
            return 'right'
        else:
            return 'left'
    return binary_search(0, len(nums)-1, condition)

def first_and_last_position(nums, target):
    return first_position(nums, target), last_position(nums, target)

In [67]:
first_and_last_position([1,2,3,3,5,6,7,8,9,10],5)

(4, 4)
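Python's standard library also ships binary search in the `bisect` module, which works on lists sorted in ascending order. As an alternative sketch (a different technique from the condition-based approach above; the function name is hypothetical), `bisect_left` returns the first position where target could be inserted, and `bisect_right` returns the position just past its last occurrence, which gives the starting and ending positions directly.

```python
from bisect import bisect_left, bisect_right


def first_and_last_position_bisect(nums, target):
    """Return (first, last) positions of target in ascending nums, or (-1, -1)."""
    left = bisect_left(nums, target)
    # target is absent if the insertion point is past the end or holds another value
    if left == len(nums) or nums[left] != target:
        return -1, -1
    # bisect_right points just past the last occurrence
    return left, bisect_right(nums, target) - 1


print(first_and_last_position_bisect([1, 2, 3, 3, 5, 6, 7, 8, 9, 10], 5))  # (4, 4)
print(first_and_last_position_bisect([1, 2, 3, 3, 5, 6, 7, 8, 9, 10], 3))  # (2, 3)
```

Both `bisect` functions run in O(log N), so this version has the same complexity as the one above.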