# 1. Let's Play a Game

Imagine you and I are playing a game. You have to guess a number between 1 and 100, and after each guess I'll tell you whether the answer is higher or lower than your guess.

Perhaps your strategy is to start at 1. If 1 isn't the answer, you guess 2, then 3, and so on. This strategy resembles the linear search we learned in our last mission. Because I give you a hint after every guess, though, a linear search is a naive approach to this game: it never makes use of the hints.

# 2. A Better Strategy

Instead, imagine guessing 50 first. I tell you the answer is higher. Suddenly, you've removed half of the original possibilities from consideration. You then guess 75, and I tell you the answer is lower. In only two guesses, you've eliminated about three quarters of the possibilities, and you now know that the answer lies strictly between 50 and 75. That's a significant reduction, and your strategy is very efficient.

**A binary search can help us find an item in a list efficiently if we know the list is ordered. We can check the middle element of the list, compare it to the item we're looking for, and continue narrowing our search in this manner.**
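
To make the difference concrete, here is a minimal sketch of both strategies for the 1-to-100 game. The function names `linear_guesses` and `halving_guesses` are made up for this illustration; each one returns how many guesses it needs to reach a given secret number.

```python
def linear_guesses(secret, low=1, high=100):
    """Guess low, low + 1, ... and count the guesses until we hit the secret."""
    guesses = 0
    for guess in range(low, high + 1):
        guesses += 1
        if guess == secret:
            return guesses
    return None  # secret was outside the range


def halving_guesses(secret, low=1, high=100):
    """Always guess the middle of the remaining range and use the hint."""
    guesses = 0
    while low <= high:
        guess = (low + high) // 2
        guesses += 1
        if guess == secret:
            return guesses
        elif guess < secret:
            low = guess + 1   # the hint was "higher"
        else:
            high = guess - 1  # the hint was "lower"
    return None  # secret was outside the range


print(linear_guesses(87))   # 87
print(halving_guesses(87))  # 7
```

With these bounds the halving strategy also opens with 50 and then 75, as in the example above, and it never needs more than 7 guesses for any secret between 1 and 100.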

# 3. When Can We Use Binary Search?

So if binary search is more efficient than linear search, why ever bother with linear search at all?

The answer is that we can **only perform a binary search on ordered data**.

**To order data, we must be able to compare two elements and determine which is greater**, or if they're equal. We can compare two strings the same way we compare integers. For instance, "A" is less than "Z", and `"A" < "Z"` would evaluate to `True`.
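
As a small illustration, the snippet below compares a few strings in Python, which orders strings character by character using their Unicode code points. The "Last, First" name format is an assumption made here so that the names sort by last name.

```python
# Strings compare character by character, like words in a dictionary
print("A" < "Z")                                # True
print("Johnson-Odom, Darius" < "Young, Nick")   # True, because "J" < "Y"

# Caveat: every uppercase ASCII letter sorts before every lowercase one
print("apple" < "Zebra")                        # False

# Because strings can be compared, a list of them can be put in order
names = ["Young, Nick", "Adrien, Jeff", "Johnson-Odom, Darius"]
ordered = sorted(names)
print(ordered)
```

Once the names are in order like this, a binary search over them becomes possible.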

# 4. Implementing Binary Search: Part 1

## TODO:
* Write the function `player_age`, which takes in `name` as a parameter. For now, start your guess at the middle of the list. Return "later" if the name we want is later in the list, "earlier" if it's earlier in the list, and "found" if you've found the right name.

* Store the result of calling `player_age` on "Darius Johnson-Odom" in `johnson_odom_age`.

* Store the result of calling `player_age` on "Nick Young" in `young_age`.

* Store the result of calling `player_age` on "Jeff Adrien" in `adrien_age`.

In [1]:
import csv
import math

# Read every row of the CSV file; row 0 is the header row
with open('nba_2013.csv') as f:
    nba = list(csv.reader(f))
print(nba[:3])

[['player', 'pos', 'age', 'bref_team_id', 'g', 'gs', 'mp', 'fg', 'fga', 'fg.', 'x3p', 'x3pa', 'x3p.', 'x2p', 'x2pa', 'x2p.', 'efg.', 'ft', 'fta', 'ft.', 'orb', 'drb', 'trb', 'ast', 'stl', 'blk', 'tov', 'pf', 'pts', 'season', 'season_end'], ['Quincy Acy', 'SF', '23', 'TOT', '63', '0', '847', '66', '141', '0.468', '4', '15', '0.266666666666667', '62', '126', '0.492063492063492', '0.482', '35', '53', '0.66', '72', '144', '216', '28', '23', '26', '30', '122', '171', '2013-2014', '2013'], ['Steven Adams', 'C', '20', 'OKC', '81', '20', '1197', '93', '185', '0.503', '0', '0', 'NA', '93', '185', '0.502702702702703', '0.503', '79', '136', '0.581', '142', '190', '332', '43', '40', '57', '71', '203', '265', '2013-2014', '2013']]


In [2]:
# Reformat a player's name as "Last, First" so that names compare in last-name order
def format_name(name):
    return name.split(" ")[1] + ", " + name.split(" ")[0]

# The length of the data set
length = len(nba)

# Implement the player_age function. For now, just return what the instructions specify
def player_age(name):
    # We need to format our name appropriately for successful comparison
    name = format_name(name)
    # First guess halfway through the list
    first_guess_index = math.floor(length/2)
    first_guess = format_name(nba[first_guess_index][0])
    # Check where we should continue searching
    if name < first_guess:
        return "earlier"
    elif name > first_guess:
        return "later"
    else:
        return "found"
    
johnson_odom_age = player_age("Darius Johnson-Odom")
young_age = player_age("Nick Young")
adrien_age = player_age("Jeff Adrien")

# 5. Implementing Binary Search: Part 2

We've found our first guess and figured out where to keep looking. The next step is to continue our binary search.

Let's imagine a round of our game from before. You guess 50, and I tell you the answer is higher. Now what do you do? You guess 75 - but how did you calculate that value? This is the step we'll focus on in part two of our implementation.

We can calculate the index of the next split in several ways. Whichever method we use, we must keep track of the upper and lower bounds of our search. At the beginning of our game, the lower bound is 1, and the upper bound is 100. After I tell you the answer is greater than 50, the lower bound becomes 51 while the upper bound remains 100.

The bounds will look slightly different in our binary search implementation, but only because the data set's index starts at 0 instead of 1. It's important to note that our bounds are inclusive.
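
The bookkeeping described above can be sketched for the 1-to-100 game as follows. The helper `trace_bounds` is a hypothetical name used only for this illustration; it records the inclusive bounds and the guess made at each step, taking the midpoint of the bounds as the next guess.

```python
def trace_bounds(secret, lower_bound=1, upper_bound=100):
    """Return a (lower_bound, upper_bound, guess) triple for every guess made."""
    history = []
    while lower_bound <= upper_bound:
        # The next guess is the midpoint of the current inclusive bounds
        guess = (lower_bound + upper_bound) // 2
        history.append((lower_bound, upper_bound, guess))
        if guess == secret:
            break
        if guess < secret:
            lower_bound = guess + 1   # answer is higher: raise the lower bound
        else:
            upper_bound = guess - 1   # answer is lower: drop the upper bound
    return history


for lower, upper, guess in trace_bounds(60):
    print(lower, upper, guess)
# 1 100 50
# 51 100 75
# 51 74 62
# 51 61 56
# 57 61 59
# 60 61 60
```

Because the bounds are inclusive, a wrong guess is excluded by moving a bound one step past it, which is why the lower bound becomes 51 rather than 50.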

In [3]:
# Reformat a player's name as "Last, First" so that names compare in last-name order
def format_name(name):
    return name.split(" ")[1] + ", " + name.split(" ")[0]

# The length of the data set
length = len(nba)

# Implement the player_age function. For now, just return what the instructions specify

def player_age(name):
    # We need to format our name appropriately for successful comparison
    name = format_name(name)
    # Inclusive bounds of the search
    upper_bound = length - 1
    lower_bound = 0
    # First guess halfway through the list
    first_guess_index = math.floor(length/2)
    first_guess = format_name(nba[first_guess_index][0])
    # If the name comes before our guess
    if name < first_guess:
        # Adjust the bounds as needed
        upper_bound = first_guess_index - 1
    # Else if the name comes after our guess
    elif name > first_guess:
        # Adjust the bounds as needed
        lower_bound = first_guess_index + 1
    # Else
    else:
        # Player found, so return first guess
        return first_guess
    # Second guess halfway between the adjusted bounds
    second_guess_index = math.floor((lower_bound + upper_bound) / 2)
    second_guess = format_name(nba[second_guess_index][0])
    return second_guess
    
gasol_age = player_age("Pau Gasol")
pierce_age = player_age("Paul Pierce")

# 6. Pseudo-Code

**Writing algorithms is less an exercise in coding than an exercise in reasoning.** It's important to train your ability to develop and visualize algorithms. Pseudo-code is a powerful, easy-to-use tool that will help you do this. You've already seen plenty of pseudo-code, even in this mission.

Pseudo-code comments reflect the code we want to write, but describe it in a high-level human language. For example, we saw the following code snippet on the previous screen:


    # If the name comes before our guess
        # Adjust the bounds as needed
    # Else if the name comes after our guess
        # Adjust the bounds as needed
    # Else
        # Player found, so return first guess

The comments in this snippet serve as placeholders for code we haven't written yet. **Writing pseudo-code like this can often help us plan and visualize an algorithm before worrying about syntactic details.**

**Pseudo-code is a great tool for all aspects of programming, and we'll use it in this mission to indicate where we need to write certain code.**

# 7. Implementing Binary Search: Part 3

We've implemented a binary search function that runs for two iterations. It guesses twice, but if it doesn't find the answer in those two guesses, it gives up. This isn't robust, and we shouldn't stop until we've found our answer.

We've also seen that the guessing code is very repetitive. After each guess, we check whether it's correct, adjust our bounds as needed, and then guess again. This is precisely the logic we need, and we can run that logic over and over again. Next, we'll translate it into a loop.

In [4]:
# A function to extract a player's last name
def format_name(name):
    return name.split(" ")[1] + ", " + name.split(" ")[0]

# The length of the data set
length = len(nba)

# Implement the player_age function. For now, just return what the instructions specify

def player_age(name):
    # We need to format our name appropriately for successful comparison
    name = format_name(name)
    # Bounds of the search
    upper_bound = length - 1
    lower_bound = 0
    # Index of first split
    index = math.floor((lower_bound + upper_bound) / 2)
    # First guess halfway through the list
    guess = format_name(nba[index][0])
    # Search until it finds the name (there is no exit yet if the name is missing)
    while name != guess:
        if name < guess:
            upper_bound = index - 1
        else:
            lower_bound = index + 1
        index = math.floor((lower_bound + upper_bound) / 2)
        guess = format_name(nba[index][0])
    return "found"

carmelo_age = player_age("Carmelo Anthony")

# 8. Implementing Binary Search: Part 4

We're almost finished implementing our binary search. We still have to retrieve the player's age if we find him, and return -1 if we don't. We can tell when the function doesn't find a player by adding a small condition to our search.

We should continue to search until we find the player, or until our list of possible answers is depleted. If we deplete all possible answers, the final step of our search, when upper_bound is equal to lower_bound (and also equal to index), will result in either upper_bound being decremented, or lower_bound being incremented. When this happens, lower_bound will be above upper_bound. We can easily check for this in our loop. It's very important to understand this nuance of our algorithm in order to take advantage of it.
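
The crossing of the bounds can be seen on a toy example. The sketch below uses a short, made-up list of last names and a hypothetical helper, `search_bounds`, that returns the final bounds along with the result, so we can inspect them after a failed search.

```python
def search_bounds(items, target):
    """Binary-search a sorted list; return (index_or_None, lower_bound, upper_bound)."""
    lower_bound, upper_bound = 0, len(items) - 1
    # Keep going only while the inclusive bounds still contain candidates
    while lower_bound <= upper_bound:
        index = (lower_bound + upper_bound) // 2
        if items[index] == target:
            return index, lower_bound, upper_bound
        if target < items[index]:
            upper_bound = index - 1
        else:
            lower_bound = index + 1
    # The bounds have crossed, so the target is not in the list
    return None, lower_bound, upper_bound


names = ["Adams", "Curry", "Gasol", "Griffin", "Pierce"]  # toy data, already sorted
print(search_bounds(names, "Curry"))   # (1, 1, 1)
print(search_bounds(names, "Jordan"))  # (None, 4, 3)
```

For the missing name, the search ends with `lower_bound` at 4 and `upper_bound` at 3: the bounds have crossed, which is the signal to stop and report that nothing was found.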

Because these additions are short, we've also left it up to you to fill in the missing components of our algorithm.

In [5]:
# Reformat a player's name as "Last, First" so that names compare in last-name order
def format_name(name):
    return name.split(" ")[1] + ", " + name.split(" ")[0]

# The length of the data set
length = len(nba)

# Implement the player_age function: return the player's age, or -1 if the player isn't found

def player_age(name):
    # We need to format our name appropriately for successful comparison
    name = format_name(name)
    # Inclusive bounds of the search. Row 0 of nba is the CSV header,
    # which format_name can't handle, so the search starts at row 1
    upper_bound = length - 1
    lower_bound = 1
    # Keep searching while the bounds still contain candidates
    while upper_bound >= lower_bound:
        index = math.floor((lower_bound + upper_bound) / 2)
        guess = format_name(nba[index][0])
        if name == guess:
            # Player found, so return the age column
            return nba[index][2]
        elif name < guess:
            upper_bound = index - 1
        else:
            lower_bound = index + 1
    # The bounds crossed, so the player isn't in the data set
    return -1
    
curry_age = player_age("Stephen Curry")
griffin_age = player_age("Blake Griffin")
jordan_age = player_age("Michael Jordan")

# 9. Binary Search Time Complexity Analysis

We've established that every iteration of the algorithm reduces the size of our problem by a factor of two. Because the algorithm's time complexity depends on the input size, we can conclude that it's not constant time. It's not linear time either, though, because it's more efficient than a linear search.
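
To get a feel for how slowly the work grows, the sketch below counts the guesses a halving search needs in the worst case for a few input sizes. The function name `worst_case_guesses` is made up for this illustration, and it assumes each wrong guess leaves at most half of the candidates (rounded down).

```python
def worst_case_guesses(n):
    """Count the guesses needed to exhaust n candidates by repeated halving."""
    guesses = 0
    while n > 0:
        n //= 2        # a wrong guess leaves at most half of the candidates
        guesses += 1
    return guesses


for size in (100, 1000, 1_000_000):
    print(size, worst_case_guesses(size))
# 100 7
# 1000 10
# 1000000 20
```

Doubling the input from 100 to 200 items adds just one guess, whereas a linear search would need up to 100 more.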

It turns out that binary search runs in logarithmic time, which we denote as O(log(n)). Logarithms are the inverse of exponentiation: the base-2 logarithm of n is the number of times we can halve n before reaching 1. It makes sense, then, that an algorithm that cuts its problem size in half (or by any constant fraction) with each iteration will be logarithmic. Here's a graph of constant, linear, and logarithmic time: