# Algorithms

## What is an algorithm?

An algorithm is a well-defined series of steps for performing a task, such as making calculations or processing data. An algorithm usually has an input and an output. 

**In reality, any code we write performs an algorithm, whether it be simple or complicated.**

In real life, we perform algorithms daily. Following a cookie recipe is an example of a series of steps that takes an input (the ingredients) and produces an output (the cookies).<br>

You may have seen machine learning algorithms on Dataquest. These are a special type of algorithm. There are many other kinds of algorithms. In this mission, we'll show you **a few examples** of what algorithms look like, and introduce **some methods for evaluating their efficiency**.

## Implementing an Algorithm

Let's start with a simple algorithm that searches for a value in a list. We could use a linear search algorithm to do this. Remember that an algorithm is a particular method for performing a task, and linear search is only one of several algorithms that can solve this problem.<br>

Linear search checks a list of items for a particular value by reviewing each item in the list until it finds the one it's looking for. If it doesn't find a matching item, we can conclude that there's no matching item in the list.
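
As a minimal sketch of the idea, assuming the items live in a plain Python list, linear search might look like the following. The function name `linear_search` and the sample list are illustrative and are not part of the `nba` data set.

```python
def linear_search(items, target):
    """Return the index of the first item equal to target, or -1 if absent."""
    for index, item in enumerate(items):
        if item == target:
            return index  # stop as soon as a match is found
    return -1  # every item was checked without finding a match


# Hypothetical sample data, for illustration only
names = ["Quincy Acy", "Steven Adams", "Jeff Adrien"]
print(linear_search(names, "Steven Adams"))  # 1
print(linear_search(names, "Nobody"))        # -1
```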

You'll be working with `nba`, a data set containing the names and ages of National Basketball Association (NBA) players from 2013, along with some statistics.

* Write a linear search algorithm to find `"Kobe Bryant"` in the `nba` data set.
  * The first column (index 0) contains each player's name.
  * Once the algorithm finds `"Kobe Bryant"`, store his position (from the second column) in the variable `kobe_position`.

In [1]:
# When the algorithm finds Kobe in the data set, store his position in kobe_position
kobe_position = ""

# Find Kobe in the data set

In [2]:
import pandas as pd

nba = pd.read_csv('data/nba_2013.csv')
nba.head()

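|   | player | pos | age | bref_team_id | g | gs | mp | fg | fga | fg. | ... | drb | trb | ast | stl | blk | tov | pf | pts | season | season_end |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Quincy Acy | SF | 23 | TOT | 63 | 0 | 847 | 66 | 141 | 0.468 | ... | 144 | 216 | 28 | 23 | 26 | 30 | 122 | 171 | 2013-2014 | 2013 |
| 1 | Steven Adams | C | 20 | OKC | 81 | 20 | 1197 | 93 | 185 | 0.503 | ... | 190 | 332 | 43 | 40 | 57 | 71 | 203 | 265 | 2013-2014 | 2013 |
| 2 | Jeff Adrien | PF | 27 | TOT | 53 | 12 | 961 | 143 | 275 | 0.52 | ... | 204 | 306 | 38 | 24 | 36 | 39 | 108 | 362 | 2013-2014 | 2013 |
| 3 | Arron Afflalo | SG | 28 | ORL | 73 | 73 | 2552 | 464 | 1011 | 0.459 | ... | 230 | 262 | 248 | 35 | 3 | 146 | 136 | 1330 | 2013-2014 | 2013 |
| 4 | Alexis Ajinca | C | 25 | NOP | 56 | 30 | 951 | 136 | 249 | 0.546 | ... | 183 | 277 | 40 | 23 | 46 | 63 | 187 | 328 | 2013-2014 | 2013 |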


In [4]:
nba_player = nba.iloc[:,0]
kobe_row = nba[nba_player == 'Kobe Bryant']

In [8]:
kobe_position = kobe_row['pos']
kobe_position

68    SG
Name: pos, dtype: object
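
The cells above find the row with a pandas boolean mask, which hides the search inside the library. For comparison, here is a sketch of an explicit linear search over rows. It assumes each row is a list whose first two entries are the player's name and position, and it uses a small hand-typed sample (copied from the outputs above) rather than the full CSV file.

```python
# Small sample of [name, position] rows copied from the outputs above;
# the real data set has many more rows and columns.
sample_rows = [
    ["Quincy Acy", "SF"],
    ["Steven Adams", "C"],
    ["Kobe Bryant", "SG"],
]

kobe_position = ""
for row in sample_rows:
    if row[0] == "Kobe Bryant":  # column 0 holds the name
        kobe_position = row[1]   # column 1 holds the position
        break                    # no need to look at later rows

print(kobe_position)  # SG
```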

## The Importance of Modularity and Abstraction

As algorithms become more complex, it's important to make sure the code remains modular.<br>

**Modular** code consists of smaller chunks that we can reuse for other things. The most common way to make code modular is to use functions.<br>

**Abstraction** is the idea that someone can use our code to perform an operation without having to worry about how we wrote or implemented it.<br>

The `sum()` function exhibits both **modularity** and **abstraction**. We don't know exactly how the function is implemented, and we don't need to; we only need to know what it does. That makes it **abstract**. It also saves us the work of having to manually compute sums in many parts of our code. That makes it **modular**.
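
To make the two ideas concrete, here is a small sketch. The helper name `average` and the sample ages are made up for illustration.

```python
def average(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Reusing the same function in two places is the modular part;
# calling sum() without knowing how it works is the abstract part.
guard_ages = [23, 28, 35]   # illustrative values
center_ages = [20, 25]      # illustrative values
print(average(guard_ages))
print(average(center_ages))  # 22.5
```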

## Linear Search with Modular Code

Now let's try writing a modular search function that can find the age of any player in our data set without having to repeat code.

* Write a function called `player_age` that takes in a name parameter.
  * The function should return the player's age from the `nba` data set, which we've loaded in for you.
  * If the function doesn't find the player, it should return `-1`.
  * The third column of `nba` (index 2) contains the players' ages.
* Store the age of `"Ray Allen"` in the variable `allen_age`.
* Store the age of `"Kevin Durant"` in the variable `durant_age`.
* Store the age of `"Shaquille O'Neal"` in the variable `shaq_age`.

In [24]:
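def player_age(name):
    # Select the rows whose 'player' column matches, then take the first age.
    try:
        return nba[nba['player'] == name]['age'].values[0]
    except IndexError:
        # No matching row: indexing the empty result raises IndexError.
        return -1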

In [25]:
allen_age = player_age('Ray Allen')
durant_age = player_age('Kevin Durant')
shaq_age = player_age("Shaquille O'Neal")

In [26]:
print(allen_age, durant_age, shaq_age)

38 25 -1


## What Makes an Algorithm Smart?

So far, we've been working with linear search, which is a fairly basic algorithm. When we need to perform more complicated tasks, algorithms can become very involved, especially considering that many different ones can achieve the same result.<br>

With multiple algorithms to choose from, a programmer has to make trade-offs and decide which algorithm best suits their needs. The most common factor to consider is **time complexity**.<br>

**Time complexity** is a measurement of how much time an algorithm takes with respect to its input size. Algorithms with smaller time complexities generally take less time and are more desirable.

## Constant Time Algorithms

A constant time algorithm takes the same amount of time to complete, regardless of the input size.<br>

For example, let's consider an algorithm that returns the first element of a list:

```python
def first(ls):
    return ls[0]
```

Regardless of list size, the algorithm returns the first element in **constant time**. 

**It only takes one operation to retrieve this element, no matter how large the list.**

We tend to think of algorithms in terms of steps. We consider any basic operation like setting a variable or performing arithmetic a step. Algorithms that take a constant number of steps are always constant time, even if that constant number is not 1.<br>

Most complicated algorithms are not constant time. However, many operations within larger algorithms are constant time. Since we don't particularly care about what the constant is, we don't need to tediously count steps, as long as we're certain we'll get a constant.<br>

An example of an operation that's not constant time is a loop that touches every element in an input list. Since a larger input would necessitate more steps, we can't treat this operation as a constant. We'll look closely at cases like this soon.
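
One way to see the difference is to count steps explicitly. The sketch below is illustrative only: the function names are hypothetical, and a "step" is simplified to one index lookup or one loop iteration.

```python
def first_with_steps(ls):
    """Return (first element, steps taken). Always one step."""
    steps = 1  # a single index lookup
    return ls[0], steps


def total_with_steps(ls):
    """Return (sum of elements, steps taken). One step per element."""
    total = 0
    steps = 0
    for value in ls:
        total += value
        steps += 1
    return total, steps


small = list(range(10))
large = list(range(1000))
print(first_with_steps(small)[1], first_with_steps(large)[1])  # 1 1
print(total_with_steps(small)[1], total_with_steps(large)[1])  # 10 1000
```

The step count of the first function stays at 1, while the step count of the second grows with the length of the list.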


## Exercise: Recognizing Constant Time Algorithms
It's important to recognize the time complexity of your algorithms. This exercise will help you learn to identify them.

* Read the function implementations in the code cell. Of A, B, and C, one implementation is not constant time.
* Indicate which one is not constant time by setting `not_constant` to the value `"A"`, `"B"`, or `"C"`.

In [27]:
# Implementation A: Convert degrees Celsius to degrees Fahrenheit
def celsius_to_fahrenheit(degrees):
    step_1 = degrees * 1.8
    step_2 = step_1 + 32
    return step_2

# Implementation B: Reverse a list
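def reverse(ls):
    length = len(ls)
    new_list = []
    for i in range(length):
        # Walk backward from the last index (length - 1) and append,
        # since assigning to an index of an empty list raises IndexError
        new_list.append(ls[length - 1 - i])
    return new_list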

# Implementation C: Print a blastoff message after a countdown
def blastoff(message):
    count = 10
    for i in range(count):
        print(count - i)
    print(message)

not_constant = "B"

## A Common Pitfall

We said earlier that we often consider small steps in an algorithm to be constant time. However, be careful not to assume that every small-looking operation is. A single function call or built-in Python operation can hide a loop over its input, in which case its cost grows with the input size.


```python
def has_milk(fridge_items):
    if "milk" in fridge_items:
        return True
    else:
        return False
```

It's easy to mistake the function above for a constant time algorithm. However, Python's `in` operator has to search through the list we passed in to check whether the element `"milk"` exists. **This can take more or less time, depending on the size of the list. Therefore, this algorithm is not constant time**.
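
To make the hidden work visible, the sketch below spells out roughly what `in` does for a list: it scans from left to right until it finds a match. The function name `contains` and the comparison counter are illustrative additions, not part of Python itself.

```python
def contains(items, target):
    """Roughly what `target in items` does for a list: scan left to right.

    Returns (found, comparisons) so the amount of work is visible.
    """
    comparisons = 0
    for item in items:
        comparisons += 1
        if item == target:
            return True, comparisons
    return False, comparisons


small_fridge = ["eggs", "butter", "jam"]
big_fridge = ["eggs", "butter", "jam"] * 100  # 300 items, still no milk
print(contains(small_fridge, "milk"))  # (False, 3)
print(contains(big_fridge, "milk"))    # (False, 300)
```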

## Linear Time Algorithms

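Now let's consider the linear search we wrote earlier. Written as an explicit loop over the rows of `nba`, it looks something like this:

```python
def player_age(name):
    # itertuples(index=False) yields one tuple per row of the DataFrame
    for row in nba.itertuples(index=False):
        if row[0] == name:   # column 0: player name
            return row[2]    # column 2: age
    return -1
```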

The code above stops executing and returns immediately when it finds the NBA player. If the algorithm performs a linear search and the element we're looking for happens to be first on the list, then the search is very quick.<br>

However, that case isn't very interesting, and it doesn't tell us very much about what trade-offs we're really making by choosing that specific algorithm.<br>

The opposite scenario occurs when the element is very far down on the list, or doesn't exist at all. This is the case we care about, because accounting for the worst case scenario will ensure that the algorithm we choose or build is more robust.<br>

In the worst case scenario for a list of size n, the algorithm has to check n elements. We refer to this time complexity as **linear time** because the runtime grows in direct proportion to the size of the input.<br>

Algorithms that take constant multiples of n steps (where n is the input size) are still linear time. For instance, an algorithm that takes 5n steps, or even 0.5n steps, is linear time. If we have an algorithm that prints the first half of a list (and we know the length of the list ahead of time), the algorithm will take 0.5n time. Even though it takes less than n time, we still consider it linear.<br>

It's also worth noting that we only care about performance at a large scale. At a small scale, most algorithms will run pretty quickly, and it's only when n becomes large that we worry about time complexity.<br>

Consequently, we only consider the highest order of n for time complexity. That means that an algorithm that runs in 9n + 20 time is linear, because the constant component is negligible for large values of n.
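
As a quick numerical check of that claim, the sketch below evaluates a hypothetical step count of 9n + 20 for a few input sizes and reports what fraction of the total comes from the constant 20. The function name `steps` is made up for illustration.

```python
def steps(n):
    """Step count of a hypothetical algorithm that takes 9n + 20 steps."""
    return 9 * n + 20


for n in (10, 1_000, 1_000_000):
    total = steps(n)
    share = 20 / total  # fraction of the work due to the constant 20
    print(n, total, f"{share:.6f}")
# 10 110 0.181818
# 1000 9020 0.002217
# 1000000 9000020 0.000002
```

For n = 10 the constant accounts for about 18% of the steps, but for a million elements it is practically invisible.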

## Some Other Algorithms
So far, we've only seen linear time and constant time algorithms. While there are infinitely many categories of algorithms and time complexities, these two cover a large variety of possibilities.

Read the following implementations of algorithms, and indicate their time complexities by setting the corresponding variables to `"linear"` or `"constant"`.

In [37]:
# Find the length of a list
def length(ls):
    count = 0
    for elem in ls:
        count = count + 1
    return count

length_time_complexity = "linear"

# Check whether a list is empty -- Implementation 1
def is_empty_1(ls):
    if length(ls) == 0:
        return True
    else:
        return False

# length(ls) loops over every element of the list,
# so is_empty_1(ls) takes time proportional to the list's size.
is_empty_1_complexity = "linear"

# Check whether a list is empty -- Implementation 2
def is_empty_2(ls):
    for element in ls:
        return False
    return True

# The loop body returns on the very first element, so at most one
# iteration runs no matter how long the list is.
is_empty_2_complexity = "constant"

## Notation for Time Complexity

When discussing time complexity, we should use the proper notation. Most commonly, we use **Big-O Notation**.<br>

To denote **constant time**, we would write **O(1)**, because 1 is the simplest constant.<br>

To denote **linear time**, we would write **O(n)**, because n is the simplest example of linearity.<br>

Big-O Notation follows a similar pattern for other time complexities. For example, **O(n^2), O(2^n), and O(log(n))** are all valid notation. The algorithms with these complexities are probably rather complicated, and we don't need to worry about them at the moment. For now, it's enough to be able to recognize **Big-O Notation** so we can use it to describe time complexities in future missions.
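
To get a feel for how differently these functions grow, the sketch below tabulates a few values. The helper name `growth` is hypothetical, and using base-2 logarithms is an assumption; a different base only changes the result by a constant factor.

```python
import math


def growth(n):
    """Return (log2(n), n, n^2, 2^n) for a given input size n."""
    return math.log2(n), n, n ** 2, 2 ** n


for n in (2, 8, 16, 32):
    log_n, linear, quadratic, exponential = growth(n)
    print(f"n={n:<3} log2(n)={log_n:<4.0f} n^2={quadratic:<5} 2^n={exponential}")
```

Even at n = 32, the exponential column is already in the billions, while the logarithmic column has only reached 5.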

## Why Time Complexity Matters

Time complexity is an important consideration when we're analyzing real-world data. An inefficient algorithm will perform very slowly on a large data set.<br>

Algorithms with lower-order time complexities are more efficient. Constant time algorithms, which we denote with **O(1)**, are more efficient than linear time algorithms, which we denote with **O(n)**. Similarly, an algorithm with complexity **O(n^2)** is more efficient than one with complexity **O(n^3)**.<br>

When considering algorithms for large inputs, we generally want to choose the one with the lowest time complexity. It may not always be the easiest one to implement, but the extra effort is usually worth the resulting efficiency.