<table border="0" align="left" width="700" height="144">
<tbody>
<tr>
<td width="120"><img width="100" src="https://static1.squarespace.com/static/5992c2c7a803bb8283297efe/t/59c803110abd04d34ca9a1f0/1530629279239/" /></td>
<td style="width: 600px; height: 67px;">
<h1 style="text-align: left;">Algorithms, Big O, Linear Search, Binary Search</h1>
<p><em>with excerpts from Grokking Algorithms, by Aditya Y. Bhargava</em></p>
<p><a href="https://colab.research.google.com/github/KenzieAcademy/python-notebooks/blob/master/cs_binary_search.ipynb"> <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab" align="left" width="188" height="32" /> </a></p>
</td>
</tr>
</tbody>
</table>

# Exercise: Find the value 7 in the following list of x's
Imagine that each `x` in the list below hides a number. How would you find the value 7?

To make this fair to all the computers out there, since a computer can only see one item of a list at a time, only one new `x` should be revealed at a time!

> # [x, x, x, x, x, x, x]


# Algorithms

An **algorithm** is a series of steps for solving a problem. That's it. There's nothing more to it.

If, in the previous exercise, you looked at each item from the beginning, one at a time, until you found what you were looking for or you reached the end of the list, you performed a **linear search**.

Linear search, as simple as it is, is an example of an algorithm. You could define its steps as:
* *Repeat until the item is found or there are no more items*
  * *Look at the next item*

And, in Python code...


In [None]:
# Example of linear search
nums = [4, 9, 3, 11, 6, 1, 7]
def linear_search(value):
  for n in nums:  # look at the next item, repeating until...
    if n == value:  # ...the item is found
      return True
  return False  # ...or there are no more items (also covers an empty list)

linear_search(7)

# Big O

In computer science, it is important to determine and be able to express how efficient an algorithm is at solving the problem for which it is meant. In doing so, you can make informed choices about which algorithm might be best suited for a problem.

Let's consider the linear search algorithm. For any given list size, what is the number of steps that will need to be taken to accomplish its goal?
* In the best case, the item being searched for is the first one in the list.
* In the worst case, the item being searched for is the last item in the list, or it is not in the list at all.

Well, in order to accurately express an algorithm's efficiency, we have to focus on the *worst case scenario*. In the example of a linear search, that means that every item in the list must potentially be looked at in order to accomplish the goal &mdash; that is, *n* items.

There is a notation used to express the efficiency of an algorithm, and it is referred to as **Big O**.

**Big O** is a shorthand notation and, like any other notation or abbreviation, is meant to succinctly communicate a larger idea. The *O* means "on the order of" as a way to express magnitude. What follows the *O*, in parentheses, is the "worst case scenario" of how many steps the algorithm will require. To express the efficiency of the linear search algorithm, we would write it as follows:
* *O(n)* &mdash; pronounced "Big O of n", meaning "on the order of *n*" (also known as *linear time*)

From just the phrase "linear search is *O(n)*", we can deduce that, however it works, it must potentially look at every single item of a list in order to do its job. As you become familiar with more algorithms, this at-a-glance reminder will help you decide which approach to take in solving a given problem.
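
As a rough illustration of why linear search is called *O(n)*, the sketch below counts how many items get looked at before the search stops. The helper name `count_linear_steps` is made up for this example and is not part of the original notebook.

```python
# Hypothetical helper (not from the original notebook): count the items examined.
def count_linear_steps(items, target):
    steps = 0
    for item in items:
        steps += 1  # one more item looked at
        if item == target:
            break  # found it, so stop looking
    return steps

nums = [4, 9, 3, 11, 6, 1, 7]
print(count_linear_steps(nums, 4))    # best case: the target is the first item
print(count_linear_steps(nums, 7))    # worst case: the target is the last item
print(count_linear_steps(nums, 100))  # target missing: every item is checked
```

In the last two calls the count equals the length of the list, which is the *n* in *O(n)*.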

# A Different Approach to Searching

I'm thinking of a number between 1 and 100. You have to try to guess my number in the fewest tries possible. With every guess, I'll tell you if your guess is too low, too high, or correct.

With a linear search, you could start guessing like this: 1, 2, 3, 4, ...
...and here's how that would go:

* 1 ..."too low"
* 2 ..."too low"
* 3, 4, 5, 6..."too low"
* "AARGGH!"

With each guess, you're eliminating only one number. If my number was 99, it could take you 99 guesses to get there!

## Binary Search

Here's a better way to guess the number above: start with 50.
* 50 ..."too low"
...too low! But, you just eliminated *half* the numbers! Now you know that 1-50 are all too low. Next guess: 75.
* 75 ..."too high"
...too high, but again you cut down half the remaining numbers!

This algorithm is known as **binary search**. You guess the middle number and eliminate half of the remaining numbers every time.

*Spoiler alert*: My number is 57.

Next guess is 63 (halfway between 50 and 75).
* 63 ..."too high"
* 57 ..."yes!"

Here is how the number of remaining candidates shrinks with each guess, and how many steps that takes in the worst case.
* 100 items -> 50 -> 25 -> 13 -> 7 -> 4 -> 2 -> 1

That's at most 7 steps, no matter what number you're searching for!
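
The guessing game can be simulated to check that claim. This is only a sketch: `count_guesses` is a hypothetical name, and because it rounds the midpoint down, its exact guesses can differ slightly from the walkthrough above (for example, 62 rather than 63).

```python
# Hypothetical simulation of the guessing game (names are illustrative only).
def count_guesses(secret, low=1, high=100):
    guesses = 0
    while low <= high:
        guess = (low + high) // 2  # always guess the middle of what's left
        guesses += 1
        if guess == secret:
            return guesses
        if guess < secret:
            low = guess + 1   # "too low": discard the guess and everything below it
        else:
            high = guess - 1  # "too high": discard the guess and everything above it
    return None  # the secret was outside the range

# Try every possible secret number and keep the largest guess count.
worst = max(count_guesses(n) for n in range(1, 101))
print(worst)
```

Trying every secret from 1 to 100 shows how many guesses the hardest-to-find number needs.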

Suppose you were looking for a word in a dictionary that contains 240,000 words. In the worst case, a linear search will take 240,000 steps if the word you're looking for is the very last one in the book. A binary search will take only 18 steps!

* 240k -> 120k -> 60k -> 30k -> 15k -> 7.5k -> 3750 -> 1875 -> 938 -> 469 -> 235 -> 118 -> 59 -> 30 -> 15 -> 8 -> 4 -> 2 -> 1

In general, binary search will take about log<sub>2</sub> *n* (or just log *n*) steps in the worst case.

* Binary search: **O(log *n*)** &mdash; pronounced "Big O of log *n*" (also known as *logarithmic time*)
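
To connect the halving chains above with the logarithm, the sketch below counts the halvings directly and compares the result with `math.log2` from the standard library. The helper `halving_steps` is an illustrative name, and it assumes the same round-up halving used in the chains above.

```python
import math

# Hypothetical helper: count how many times n can be halved (rounding up)
# before only one item remains.
def halving_steps(n):
    steps = 0
    while n > 1:
        n = (n + 1) // 2  # half of the remaining items, rounded up
        steps += 1
    return steps

for n in (100, 240_000):
    # The halving count and the rounded-up base-2 logarithm should agree.
    print(n, halving_steps(n), math.ceil(math.log2(n)))
```

For both list sizes used in this lesson, the two numbers printed on each line match.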

## It's not all rainbows and unicorns...
You may have noticed a potential downside to the whole business of performing a binary search.

**Binary search requires its input to be in sorted order**.

What would happen if the list was not sorted?

### Unsorted
> # [4, 9, 3, 11, 6, 1, 7]

In the list above, it cannot be assumed that everything to the left of the middle item is smaller, nor can it be assumed that everything to the right is larger. Binary search does not work on unsorted data. We will need to rely on the *O(n)* efficiency of a linear search algorithm here.

### Sorted
> # [1, 3, 4, 6, 7, 9, 11]

In this sorted list, we can safely assume that items to the left of a number are smaller and items to the right are larger, so we can proceed with a binary search and take advantage of its *O(log n)* efficiency.
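
A small experiment makes the difference concrete. The function below is a minimal binary search written only for this demonstration (the name `bsearch_demo` is hypothetical); it is run on the unsorted list and on a sorted copy of the same numbers.

```python
# Minimal binary search for illustration; the name is hypothetical.
def bsearch_demo(items, target):
    low, high = 0, len(items) - 1
    while low <= high:
        mid = (low + high) // 2
        if items[mid] == target:
            return mid
        if items[mid] > target:
            high = mid - 1  # assume the target can only be to the left
        else:
            low = mid + 1   # assume the target can only be to the right
    return None

unsorted_nums = [4, 9, 3, 11, 6, 1, 7]
sorted_nums = sorted(unsorted_nums)

print(bsearch_demo(unsorted_nums, 7))  # misses 7 even though it is in the list
print(bsearch_demo(sorted_nums, 7))    # finds 7 in the sorted copy
```

On the unsorted list, the first comparison (11 is greater than 7) throws away the right half, which is exactly where 7 is hiding.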

### Tradeoffs
In the world of computer science, you will often have to choose one benefit at the cost of another. For the unsorted list above, we could decide that it is ultimately better to sort the list first so that a binary search can be performed.

In doing so, we are deciding that the initial cost of sorting the items would be beneficial in the long run &mdash; maybe because the items will be searched over and over again and so that is where we want to focus our optimizations. We trade the simplicity of searching immediately (linear search) for one potentially complex setup step (sorting) up front so that every successive search will be more optimized.

If, in contrast, you know that you will only be performing a single search of the list, the added cost of sorting the data before searching might not be worthwhile.
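
One way to picture the "sort once, search many times" choice is sketched below. It leans on Python's standard-library `bisect` module for the binary search step; the helper `contains_sorted` and the `queries` list are assumptions made up for this example.

```python
import bisect

# Hypothetical helper: membership test on an already sorted list.
def contains_sorted(sorted_items, target):
    # bisect_left returns the leftmost position where target could be inserted
    # while keeping the list sorted.
    i = bisect.bisect_left(sorted_items, target)
    return i < len(sorted_items) and sorted_items[i] == target

nums = [4, 9, 3, 11, 6, 1, 7]
sorted_nums = sorted(nums)  # the one-time setup cost

queries = [7, 2, 11]  # illustrative values to search for
results = [contains_sorted(sorted_nums, q) for q in queries]
print(results)
```

The sorting cost is paid once, and each later lookup takes a logarithmic number of steps. With only a single lookup, a plain linear search over `nums` would likely be the simpler choice.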

These are the kinds of decisions that computer science can enable you to make.

**Algorithms** such as the ones discussed here provide steps to solving common problems.

**Big O** notation allows us to succinctly describe the efficiency of algorithms in order to aid in the decision-making process about which approach might be better for the problem at hand.

The actual implementation of any algorithm can be flexible and takes practice. For example, we'll leave you with two ways to implement a binary search using Python...

## Binary Search in Python &mdash; 2 ways
*Note*: If the element you're looking for is in the list, binary search returns the position where it's located. Otherwise, it will return a *null* value (*None* in Python).

In [None]:
# Binary search, iteratively
def binary_search(list_, item):
    low = 0
    high = len(list_) - 1
    i = 1  # track how many steps it takes to find the item
    while low <= high:
        print(f"Step #{i}: {list_[low:high+1]}")
        mid = (low + high) // 2
        if list_[mid] == item:
            return mid
        if list_[mid] > item:
            high = mid - 1
        else:
            low = mid + 1
        i += 1
    return None

In [None]:
# Binary search, recursively
def r_bsearch(list_, low, high, item):
  if low > high:
    return None

  mid = (low + high) // 2
  if list_[mid] == item:
    return mid
  if list_[mid] > item:
    low_high = (low, mid-1)
  else:
    low_high = (mid+1, high)
  return r_bsearch(list_, *low_high, item)

In [None]:
items = [1, 2, 3, 4, 8, 9, 13, 21, 42, 50]
item = 4
print(f"Searching for {item} in\n{items}")
print(f"\nIterative: found {item} at index {binary_search(items, item)}")
print(f"\nRecursive: found {item} at index {r_bsearch(items, 0, len(items)-1, item)}")