An algorithm is a well-defined series of steps for performing a task, such as making calculations or processing data. An algorithm usually has an input and an output. In reality, any code we write performs an algorithm, whether it be simple or complicated.

In real life, we perform algorithms daily. Following a cookie recipe is an example of a series of steps that takes an input (the ingredients) and produces an output (the cookies).

Linear search checks a list of items for a particular value by reviewing each item in the list until it finds the one it's looking for. If it doesn't find a matching item, we can conclude that there's no matching item in the list.
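A minimal sketch of linear search on a plain Python list could look like this. The name `linear_search` is hypothetical. Returning the matching index, or -1 when nothing matches, is an assumption that mirrors the -1 convention used later in this lesson.

```python
def linear_search(items, target):
    # Review each item in order until we find the one we're looking for
    for index, item in enumerate(items):
        if item == target:
            return index
    # We reviewed every item without a match, so the target isn't in the list
    return -1

print(linear_search(["milk", "eggs", "butter"], "eggs"))   # 1
print(linear_search(["milk", "eggs", "butter"], "flour"))  # -1
```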

As algorithms become more complex, it's important to make sure the code remains modular.

Modular code consists of smaller chunks that we can reuse for other things. The most common way to make code modular is to use functions.

Abstraction is the idea that someone can use our code to perform an operation without having to worry about how we wrote or implemented it.

The sum() function exhibits both modularity and abstraction. We don't know exactly how the function is implemented, and we don't need to; we only need to know what it does. That makes it abstract. It also saves us the work of having to manually compute sums in many parts of our code. That makes it modular.
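As a small illustration, a hypothetical `average` function can rely on `sum()` and `len()` without knowing how either one is implemented (abstraction). `average` then becomes a reusable chunk of its own that we can call anywhere (modularity). The function name and the choice to raise an error on an empty list are assumptions made for this sketch.

```python
def average(numbers):
    # Reuse built-in functions instead of writing our own summing loop
    if len(numbers) == 0:
        raise ValueError("average() needs at least one number")
    return sum(numbers) / len(numbers)

print(average([1, 2, 3, 4]))  # 2.5
```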

In [1]:
import pandas as pd
nba = pd.read_csv('nba_2013.csv')

In [2]:
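def player_age(name):
    # Check each player in order until we find a matching name
    for player, age in zip(nba["player"], nba["age"]):
        if player == name:
            return age
    # We checked every row without finding the player
    return -1

allen_age = player_age("Ray Allen")
durant_age = player_age("Kevin Durant")
shaq_age = player_age("Shaquille O'Neal")

In [3]:
print(allen_age)
print(durant_age)
print(shaq_age)  # -1: Shaquille O'Neal retired in 2011, so he isn't in the 2013-2014 data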


In [4]:
nba.head()

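|   | player | pos | age | bref_team_id | g | gs | mp | fg | fga | fg. | ... | drb | trb | ast | stl | blk | tov | pf | pts | season | season_end |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Quincy Acy | SF | 23 | TOT | 63 | 0 | 847 | 66 | 141 | 0.468 | ... | 144 | 216 | 28 | 23 | 26 | 30 | 122 | 171 | 2013-2014 | 2013 |
| 1 | Steven Adams | C | 20 | OKC | 81 | 20 | 1197 | 93 | 185 | 0.503 | ... | 190 | 332 | 43 | 40 | 57 | 71 | 203 | 265 | 2013-2014 | 2013 |
| 2 | Jeff Adrien | PF | 27 | TOT | 53 | 12 | 961 | 143 | 275 | 0.52 | ... | 204 | 306 | 38 | 24 | 36 | 39 | 108 | 362 | 2013-2014 | 2013 |
| 3 | Arron Afflalo | SG | 28 | ORL | 73 | 73 | 2552 | 464 | 1011 | 0.459 | ... | 230 | 262 | 248 | 35 | 3 | 146 | 136 | 1330 | 2013-2014 | 2013 |
| 4 | Alexis Ajinca | C | 25 | NOP | 56 | 30 | 951 | 136 | 249 | 0.546 | ... | 183 | 277 | 40 | 23 | 46 | 63 | 187 | 328 | 2013-2014 | 2013 |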


So far, we've been working with linear search, which is a fairly basic algorithm. When we need to perform more complicated tasks, algorithms can become very involved, especially considering that many different ones can achieve the same result.

With multiple algorithms to choose from, a programmer has to make trade-offs and decide which algorithm best suits his or her needs. The most common factor to consider is time complexity.

##### Time complexity

Time complexity is a measurement of how much time an algorithm takes with respect to its input size. Algorithms with smaller time complexities generally take less time and are more desirable.

A constant-time algorithm takes the same amount of time to complete, regardless of the input size.

For example, let's consider an algorithm that returns the first element of a list:

In [5]:
def first(ls):
    return ls[0]

Regardless of list size, the algorithm returns the first element in constant time. It only takes one operation to retrieve this element, no matter how large the list.

We tend to think of algorithms in terms of steps. We consider any basic operation like setting a variable or performing arithmetic a step. Algorithms that take a constant number of steps are always constant time, even if that constant number is not 1.

Most complicated algorithms are not constant time. However, many operations within larger algorithms are constant time. Since we don't particularly care about what the constant is, we don't need to tediously count steps, as long as we're certain we'll get a constant.
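For instance, the hypothetical function below performs a few steps (two lookups and building a tuple) no matter how long the list is. It is therefore constant time, even though it takes more than one step. This sketch relies on Python list indexing, including negative indexing such as `ls[-1]`, being constant time.

```python
def first_and_last(ls):
    # Two index lookups and one tuple: the same number of steps for any list length
    return ls[0], ls[-1]

print(first_and_last([3, 1, 4, 1, 5]))         # (3, 5)
print(first_and_last(list(range(1_000_000))))  # (0, 999999)
```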

An example of an operation that's not constant time is a loop that touches every element in an input list. Since a larger input would necessitate more steps, we can't treat this operation as a constant.
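As a sketch of that kind of operation, the hypothetical `double_all` below touches every element exactly once. A list twice as long therefore needs twice as many loop iterations.

```python
def double_all(ls):
    doubled = []
    # One iteration per element: the number of steps grows with len(ls)
    for value in ls:
        doubled.append(value * 2)
    return doubled

print(double_all([1, 2, 3]))  # [2, 4, 6]
```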

We said earlier that we often consider small steps in an algorithm to be constant time. However, be careful not to assume that every small operation is. Function calls and built-in Python operators, for instance, often aren't constant time, because the function or operator may itself have to do work proportional to the size of its input.



In [7]:
def has_milk(fridge_items):
    return "milk" in fridge_items

It's easy to mistake the function above for a constant time algorithm. However, Python's in operator has to search through the list we passed in to check whether the element "milk" exists. This can take more or less time, depending on the size of the list. Therefore, this algorithm is not constant time.
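To see where the hidden cost comes from, here is a sketch of roughly what `in` has to do for a list, written out as an explicit loop. `has_milk_explicit` is a hypothetical name. CPython's real implementation is written in C, but it also checks the items one at a time.

```python
def has_milk_explicit(fridge_items):
    # Compare items one at a time; in the worst case we check every item
    for item in fridge_items:
        if item == "milk":
            return True
    return False

print(has_milk_explicit(["eggs", "milk"]))    # True
print(has_milk_explicit(["eggs", "butter"]))  # False
```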

Now let's consider the linear search we wrote earlier. It looked something like this:

In [8]:
def player_age(name):
    for player, age in zip(nba["player"], nba["age"]):
        if player == name:
            return age
    return -1

The code above stops executing and returns immediately when it finds the NBA player. If the algorithm performs a linear search and the element we're looking for happens to be first on the list, then the search is very quick.

However, that case isn't very interesting, and it doesn't tell us very much about what trade-offs we're really making by choosing that specific algorithm.

The opposite scenario occurs when the element is very far down on the list, or doesn't exist at all. This is the case we care about, because accounting for the worst case scenario will ensure that the algorithm we choose or build is more robust.

In the worst case scenario for a list of size n, the algorithm has to check n elements. We refer to this time complexity as linear time because the runtime grows at a constant rate with respect to the size of the input.
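One way to see the best and worst cases is to count comparisons. The sketch below is a hypothetical instrumented linear search over a list of 1,000 numbers. Finding the first element takes a single comparison, while a missing element forces all n = 1,000 comparisons.

```python
def linear_search_comparisons(items, target):
    # Count how many items we compare before stopping
    comparisons = 0
    for item in items:
        comparisons += 1
        if item == target:
            return comparisons
    return comparisons

items = list(range(1000))
print(linear_search_comparisons(items, 0))    # best case: 1
print(linear_search_comparisons(items, 999))  # last element: 1000
print(linear_search_comparisons(items, -5))   # not in the list: 1000
```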

Algorithms that take constant multiples of n steps (where n is the input size) are still linear time. For instance, an algorithm that takes 5n steps, or even 0.5n steps, is linear time. If we have an algorithm that prints the first half of a list (and we know the length of the list ahead of time), the algorithm will take 0.5n steps. Even though it takes fewer than n steps, we still consider it linear.
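Here is a sketch of the first-half example. The hypothetical `first_half` returns the elements instead of printing them, which makes the result easy to check. It visits `len(ls) // 2` elements, roughly 0.5n steps, and that is still linear.

```python
def first_half(ls):
    half = []
    # Loop over only the first len(ls) // 2 positions: about 0.5n iterations
    for i in range(len(ls) // 2):
        half.append(ls[i])
    return half

print(first_half([1, 2, 3, 4, 5, 6]))  # [1, 2, 3]
```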

It's also worth noting that we only care about performance at a large scale. At a small scale, most algorithms will run pretty quickly, and it's only when n becomes large that we worry about time complexity.

Consequently, we only consider the highest-order term in n for time complexity, and we ignore constant multipliers. That means that an algorithm that runs in 9n + 20 steps is linear: for large values of n, the constant 20 is negligible, and the multiplier 9 doesn't change how the runtime grows.
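A quick numeric check of that claim uses a hypothetical step-count function, `steps(n) = 9n + 20`. As n grows, the number of steps per element settles toward the constant 9, so the + 20 stops mattering. Dropping the constant multiplier 9 as well leaves n, which is why we call the algorithm linear.

```python
def steps(n):
    # Hypothetical step count for an algorithm that runs in 9n + 20 steps
    return 9 * n + 20

for n in (10, 1_000, 1_000_000):
    # Steps per input element: approaches the constant 9 as n grows
    print(n, steps(n), steps(n) / n)
```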

In [9]:
# Find the length of a list
def length(ls):
    count = 0
    # Visits every element once, so the work grows with len(ls)
    for elem in ls:
        count = count + 1
    return count
length_time_complexity = "linear"

# Check whether a list is empty -- Implementation 1
def is_empty_1(ls):
    # Calls length(), which loops over the whole list
    if length(ls) == 0:
        return True
    else:
        return False
is_empty_1_complexity = "linear"

# Check whether a list is empty -- Implementation 2
def is_empty_2(ls):
    # Returns on the first element (if any), so at most one loop iteration runs
    for element in ls:
        return False
    return True
is_empty_2_complexity = "constant"

When discussing time complexity, we should use the proper notation. Most commonly, we use Big-O Notation.

To denote constant time, we would write O(1), because 1 is the simplest constant.

To denote linear time, we would write O(n), because n is the simplest example of linearity.

Big-O notation follows a similar pattern for other time complexities. For example, O(n^2), O(2^n), and O(log(n)) are all valid notation. We don't need to worry about most of these at the moment, although the binary search we build later in this lesson runs in O(log(n)) time.
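To make O(n^2) a little more concrete: a common source of quadratic time is a nested loop. The hypothetical `has_duplicates` below compares every pair of elements. For a list of n items with no duplicates, it makes n(n - 1)/2 comparisons, which grows like n^2.

```python
def has_duplicates(ls):
    # Compare every pair of positions (i, j) with i < j
    for i in range(len(ls)):
        for j in range(i + 1, len(ls)):
            if ls[i] == ls[j]:
                return True
    return False

print(has_duplicates([1, 2, 3, 2]))  # True
print(has_duplicates([1, 2, 3]))     # False
```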

Time complexity is an important consideration when we're analyzing real-world data. An inefficient algorithm will perform very slowly on a large data set.

Algorithms with lower-order time complexities are more efficient, at least on large inputs. Constant time algorithms, which we denote with O(1), are more efficient than linear time algorithms, which we denote with O(n). Similarly, an algorithm with complexity O(n^2) is more efficient than one with complexity O(n^3).

When considering algorithms, we usually want to choose the one with the lowest time complexity. It may not always be the easiest one to implement, but on large data sets the extra effort is usually worth the resulting efficiency.

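Consider a guessing game: I'm thinking of a number between 1 and 100, and after each guess I tell you whether the answer is higher or lower. You could guess 1, then 2, then 3, and so on, but that's a linear search, and it could take 100 guesses. Instead, imagine guessing 50 first. I tell you the answer is higher. Suddenly, you've removed half of the original possibilities for consideration. You then guess 75, and I tell you the answer is lower. In only two guesses, you've eliminated 3/4 of the possibilities, and you now know that the answer lies between 50 and 75. That's a significant reduction, and your strategy is very efficient.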

This is the strategy a binary search uses. A binary search can help us find an item in a list efficiently if we know the list is ordered. We can check the middle element of the list, compare it to the item we're looking for, and continue narrowing our search in this manner.
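The guessing-game strategy can be simulated with a short sketch. It assumes the secret number lies between 1 and 100, and the function name `count_guesses` is hypothetical. Each guess is the midpoint of the remaining range, rounded down, so the first guess is 50 and, if the answer is higher, the second is 75, just as in the game. In the worst case it needs only 7 guesses, compared with 100 for guessing 1, 2, 3, and so on.

```python
import math

def count_guesses(secret, low=1, high=100):
    guesses = 0
    while low <= high:
        # Guess the middle of the remaining range, rounding down
        guess = math.floor((low + high) / 2)
        guesses += 1
        if guess == secret:
            return guesses
        elif guess < secret:
            low = guess + 1   # "The answer is higher"
        else:
            high = guess - 1  # "The answer is lower"
    raise ValueError("secret is outside the range [low, high]")

print(count_guesses(50))                                 # 1
print(max(count_guesses(s) for s in range(1, 101)))     # 7
```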

So if binary search is more efficient than linear search, why ever bother with linear search at all?

The answer is that we can only perform a binary search on ordered data. Recall that in our game, the key to our strategy was that we knew exactly how our guess compared to the correct number. We only knew this because there was an order to the "data."

To order data, we must be able to compare two elements and determine which is greater, or if they're equal. We can compare two strings with the same comparison operators we use for integers. For instance, "A" is less than "Z", so "A" < "Z" evaluates to True.
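Python compares strings character by character (lexicographically), using each character's Unicode code point. The examples below show this. One caveat: every uppercase letter sorts before every lowercase letter. If a data set's alphabetical order was produced by a different rule, such as a case-insensitive sort, Python's comparisons might not agree with it.

```python
print("A" < "Z")           # True
print("apple" < "banana")  # True: "a" comes before "b"
print("Adams" < "Adrien")  # True: the first difference is "a" vs "r" at position 2
print("Z" < "a")           # True: uppercase letters come before lowercase ones
print(ord("Z"), ord("a"))  # 90 97
```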

Next, we'll search a data set for the names of specific athletes who played in the NBA during the 2013-2014 season. The data set is sorted alphabetically by last name, then first name, but each row lists the first name first. As a result, we can't directly compare the names in their current, raw format. Instead, we'll need to format them as last_name, first_name.
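Two neighboring rows from the `nba.head()` output show why the raw names fail. Steven Adams comes before Jeff Adrien in the data set, but the raw strings compare the other way round. The `format_name` below uses the same reordering as the lesson's own `format_name`.

```python
def format_name(name):
    # Same reordering as the lesson's format_name: "First Last" -> "Last, First"
    return name.split(" ")[1] + ", " + name.split(" ")[0]

# Steven Adams comes before Jeff Adrien in the data set (sorted by last name)
print("Steven Adams" < "Jeff Adrien")                            # False: "S" comes after "J"
print(format_name("Steven Adams") < format_name("Jeff Adrien"))  # True: "Adams, Steven" < "Adrien, Jeff"
```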

Before moving on, be sure you understand why reformatting the names is important, and why it will allow us to compare names properly.

Let's start implementing a binary search on our list of NBA players.

We'll need to divide by two to perform binary search. To make sure we get a valid index, we'll convert the result of this division to an integer using the math.floor() function, which rounds down to the nearest integer.

We need to do this because dividing an odd number by two produces a fraction (for example, 5 / 2 is 2.5), and a fractional index is meaningless when indexing a data set. The choice to round down rather than up is arbitrary, but we'll use it throughout our implementation.
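A quick check of the rounding behavior: in Python 3, `/` always returns a float, while `math.floor()` returns an `int`. Floor division with `//` gives the same result for the integer indices used here. The lesson sticks with `math.floor()`.

```python
import math

length = 5
print(length / 2)              # 2.5: true division returns a float
print(math.floor(length / 2))  # 2: rounded down to an int
print(length // 2)             # 2: floor division gives the same index
```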

Because this is a fairly involved algorithm, we'll implement it piece by piece. First, we need to understand what step to take after each guess. We've created the format_name function to save you from tedious string manipulation. We've also loaded the nba data set for you.



In [12]:
import math
nba = pd.read_csv('nba_2013.csv')

In [16]:
# A function that reorders a "First Last" name into "Last, First"
def format_name(name):
    return name.split(" ")[1] + ", " + name.split(" ")[0]

# The length of the data set
length = len(nba)

# Implement the player_age function. For now, just report which direction to search next
def player_age(name):
    # We need to format our name appropriately for successful comparison
    name = format_name(name)
    # First guess halfway through the list
    first_guess_index = math.floor(length/2)
    first_guess = format_name(nba["player"].iloc[first_guess_index])
    # Check where we should continue searching
    if name < first_guess:
        return "earlier"
    elif name > first_guess:
        return "later"
    else:
        return "found"
    
# johnson_odom_age = player_age("Darius Johnson-Odom")
# young_age = player_age("Nick Young")
# adrien_age = player_age("Jeff Adrien")

In [18]:
# A function that reorders a "First Last" name into "Last, First"
def format_name(name):
    return name.split(" ")[1] + ", " + name.split(" ")[0]

# The length of the data set
length = len(nba)

def player_age(name):
    # We need to format our name appropriately for successful comparison
    name = format_name(name)
    # Bounds of the search
    upper_bound = length - 1
    lower_bound = 0
    # Index of the first split. It's important to understand how we compute this
    index = math.floor((lower_bound + upper_bound) / 2)
    # First, guess halfway through the list
    guess = format_name(nba["player"].iloc[index])
    # Keep guessing until we find the name. If we had found it, we wouldn't be in this loop.
    # This version assumes the player is in the data set; the next step handles a missing player.
    while name != guess:
        # Adjust our bounds based on where the guess is relative to the name
        if name < guess:
            upper_bound = index - 1
        else:
            lower_bound = index + 1
        # Find the new index of our guess, then find and format the new guess value
        index = math.floor((lower_bound + upper_bound) / 2)
        guess = format_name(nba["player"].iloc[index])
    # When our loop terminates, we have found the right NBA player's name
    return "found"

# carmelo_age = player_age("Carmelo Anthony")

We still have to retrieve the player's age if we find him, and return -1 if we don't. We can tell when the function doesn't find a player by adding a small condition to our search.

We should continue to search until we find the player, or until our list of possible answers is depleted. When the possible answers run out, the final step of our search either decrements upper_bound to one less than lower_bound, or increments lower_bound to one more than upper_bound. For example, if upper_bound, lower_bound, and index are all equal and the guess still doesn't match, one bound moves past the other. Either way, lower_bound ends up above upper_bound, and we can easily check for this in our loop. It's very important to understand this nuance of our algorithm in order to take advantage of it.
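To watch the bounds cross, here is a sketch of a hypothetical `binary_search_trace` on a small sorted list of numbers. It records `(lower_bound, upper_bound, index)` at each step. Searching for 25 ends with a step where all three values are equal (1, 1, 1), after which lower_bound moves to 2, past upper_bound. Searching for 35 ends differently: upper_bound drops to 2, just below lower_bound (3).

```python
import math

def binary_search_trace(sorted_items, target):
    """Binary search that records (lower_bound, upper_bound, index) at each step."""
    lower_bound = 0
    upper_bound = len(sorted_items) - 1
    steps = []
    while lower_bound <= upper_bound:
        index = math.floor((lower_bound + upper_bound) / 2)
        steps.append((lower_bound, upper_bound, index))
        guess = sorted_items[index]
        if guess == target:
            return index, steps
        elif target < guess:
            upper_bound = index - 1
        else:
            lower_bound = index + 1
    # The candidates are exhausted: lower_bound has passed upper_bound
    steps.append((lower_bound, upper_bound, None))
    return -1, steps

items = [10, 20, 30, 40, 50]
for target in (25, 35):
    result, steps = binary_search_trace(items, target)
    print(target, result, steps)
```

Unlike the lesson's version, this sketch tests the bounds before computing each guess, so it never looks up an index outside the list. The loop still ends under the same condition: lower_bound is greater than upper_bound.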

Because these additions are short, we've also left it up to you to fill in the missing components of our algorithm.

In [23]:
# A function that reorders a "First Last" name into "Last, First"
def format_name(name):
    return name.split(" ")[1] + ", " + name.split(" ")[0]

# The length of the data set
length = len(nba)

def player_age(name):
    name = format_name(name)
    # Set the initial upper and lower bounds of the search
    upper_bound = length - 1
    lower_bound = 0
    # Set the index of the first split
    index = math.floor((upper_bound + lower_bound) / 2)
    # First guess at index, formatted for comparison
    guess = format_name(nba["player"].iloc[index])
    # Search until the name equals the guess, or the upper bound drops below the lower bound
    while name != guess and upper_bound >= lower_bound:
        if name < guess:
            # The name comes before the guess
            upper_bound = index - 1
        else:
            # The name comes after the guess
            lower_bound = index + 1
        # Set the index of our next guess, then retrieve and format it
        index = math.floor((upper_bound + lower_bound) / 2)
        guess = format_name(nba["player"].iloc[index])

    ### Now that our loop has terminated, we must find out why ###
    if name == guess:
        # Return the player's age from the "age" column
        return nba["age"].iloc[index]
    else:
        # The function didn't find our player
        return -1

# curry_age = player_age("Stephen Curry")
# griffin_age = player_age("Blake Griffin")
# jordan_age = player_age("Michael Jordan")