**Python algorithms**

Resources:

 - [MIT Introduction to Algorithms](https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-046j-introduction-to-algorithms-sma-5503-fall-2005/)
 - [PDF notes from MIT class](https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-046j-introduction-to-algorithms-sma-5503-fall-2005/video-lectures/lecture-1-administrivia-introduction-analysis-of-algorithms-insertion-sort-mergesort/lec1.pdf)
 - [Runestone course (notes below follow this course)](https://runestone.academy/runestone/books/published/pythonds/Introduction/GettingStartedwithData.html)
 
- Objective is to cover this in 6 weeks (Nov 3 - Dec 22, account for Thanksgiving)
- There are 112 subsections in sections 1-7:
    - Cover 19 sections per week or about 4 per day (M-F)
    - Move quickly through sections 1-2 (try to do 6 per day)
    - Generally prioritize sections 3-7
- Only included subsections where notes are needed

Goals:
- By week 1 (11/8): finish sections 1 and 2
- By week 2 (11/15): finish section 3
- By week 3 (11/22): finish half of section 4
- By week 4 (11/29): finish section 4
- By week 5 (12/6): finish section 5
- By week 6 (12/13): finish section 6, start section 7
- By week 7 (12/20): finish section 7

Log:

- 11/6/19: halfway through section 1.13
- 11/12/19: finished a quick pass through everything; now starting over in detail, focusing on what Insight prioritizes


Insight advice:

*Action Item: Code the examples in Problem Solving with Algorithms and Data Structures in Python. In particular, become familiar with:
- stacks
- queues
- linked lists
- merge sort
- quick sort
- searching and hashing

If you prefer to learn by watching lectures, check out the MIT Introduction to Algorithms course. Bonus: For each algorithm or data structure you learn about, try to program it from scratch in Python, from memory. Many Fellows have also found Leetcode useful in the interview prep for their CS section.*

In [7]:
# Code formatting Jupyter black
%load_ext nb_black


[Writing math symbols in markdown](https://jupyter-notebook.readthedocs.io/en/stable/examples/Notebook/Typesetting%20Equations.html)

# Introduction

17 subsections (try to go quickly; come back to it if needed)


**1.8. Getting Started with Data**

Nice overview of methods for lists, dictionaries, sets
https://runestone.academy/runestone/books/published/pythonds/Introduction/GettingStartedwithData.html
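
A few of the methods that page covers, as a quick refresher; the variable names and values here are made up for illustration:

```python
# Lists: ordered and mutable
nums = [3, 1, 2]
nums.append(4)        # add to the end -> [3, 1, 2, 4]
nums.insert(0, 5)     # insert at index 0 -> [5, 3, 1, 2, 4]
last = nums.pop()     # remove and return the last item -> 4
nums.sort()           # sort in place -> [1, 2, 3, 5]

# Dictionaries: key -> value lookups
ages = {"ann": 30, "bob": 25}
ages["cal"] = 41                  # add or update a key
bob_age = ages.get("bob")         # -> 25
dan_age = ages.get("dan", 0)      # default when the key is missing -> 0
names = sorted(ages.keys())       # -> ['ann', 'bob', 'cal']

# Sets: unordered, no duplicates
a = {1, 2, 3}
b = {2, 3, 4}
both = a & b          # intersection -> {2, 3}
either = a | b        # union -> {1, 2, 3, 4}
only_a = a - b        # difference -> {1}
```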

**1.9. Input and Output**

Helpful notes for string formatting (such as with % operator)
https://runestone.academy/runestone/books/published/pythonds/Introduction/InputandOutput.html
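
A small sketch comparing the `%` operator with `str.format` and f-strings (f-strings need Python 3.6+); the names and values are placeholders:

```python
name = "Ada"
score = 93.5

# % operator: %s for strings, %d for integers, %.2f for 2 decimal places
old_style = "%s scored %.2f" % (name, score)     # 'Ada scored 93.50'

# The same output with str.format and with an f-string
new_style = "{} scored {:.2f}".format(name, score)
f_style = f"{name} scored {score:.2f}"

# Width: %5d right-aligns an integer in a field of 5 characters
padded = "[%5d]" % 42                            # '[   42]'
```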

**1.13. Object-Oriented Programming in Python: Defining Classes**

Helpful for me to go over
https://runestone.academy/runestone/books/published/pythonds/Introduction/ObjectOrientedProgramminginPythonDefiningClasses.html


 - We use abstract data types to provide the logical description of what a data object looks like (its state) and what it can do (its methods).
 - An object is an instance of a class
 
 **1.13.1. A Fraction Class (an example)**

In [1]:
# Example of creating a class

class Fraction:    
    # Constructor method: it defines how new data objects are created
    # self is always the first parameter; it refers to the object itself
    def __init__(self,top,bottom):
        # The numerator and denominator represent "state" data
        # This provides the fraction object its starting value
        self.num = top
        self.den = bottom
        

In [2]:
# Create an instance of the class
myfraction = Fraction(3,5)
myfraction

<__main__.Fraction at 0x1075e5358>

In [3]:
# Example of creating a class and then adding a method to display it

class Fraction:    
    # Constructor method: it defines how new data objects are created
    # self is always the first parameter; it refers to the object itself
    def __init__(self,top,bottom):
        # The numerator and denominator represent "state" data
        # This provides the fraction object its starting value
        self.num = top
        self.den = bottom
        
    def show(self):
        print(str(self.num) + '/' + str(self.den))

In [45]:
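# Note on the output below: print(myfraction) shows only the default object
# representation because Fraction has no __str__ method yet, and
# print(myfraction.show()) prints an extra "None" because show() prints the
# fraction itself and then returns None. Adding __str__ (next cells) fixes this.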

In [26]:
# Create an instance of the class
myfraction = Fraction(3,5)
print(myfraction)
print(myfraction.show())

<__main__.Fraction object at 0x10c949080>
3 / 5
None


In [59]:
# Including a multiplication method (__mul__) for fractions

def gcd(m, n):
    # Euclid's algorithm: greatest common divisor of m and n
    while m % n != 0:
        oldm = m
        oldn = n

        m = oldn
        n = oldm % oldn
    return n

class Fraction:
    # Constructor method: it defines how new data objects are created
    # self is always the first parameter; it refers to the object itself
    def __init__(self, top, bottom):
        # The numerator and denominator represent "state" data
        # This provides the fraction object its starting value
        self.num = top
        self.den = bottom

    def __str__(self):
        # Used by print() and str(), replacing the default <__main__.Fraction object at ...>
        return str(self.num) + "/" + str(self.den)

    def show(self):
        print(self.num, "/", self.den)

    def __mul__(self, otherfraction):
        # Called for f1 * f2: multiply numerators and denominators
        newnum = self.num * otherfraction.num
        newden = self.den * otherfraction.den
        # Reduce the result to lowest terms
        common = gcd(newnum, newden)
        return Fraction(newnum // common, newden // common)


In [60]:
f1 = Fraction(3,5)
f2 = Fraction(1,2)
f3 = f1*f2
print(f3)

3/10


In [47]:
# Below is taken entirely from the course notes, for completeness

def gcd(m,n):
    while m%n != 0:
        oldm = m
        oldn = n

        m = oldn
        n = oldm%oldn
    return n

class Fraction:
    def __init__(self,top,bottom):
        self.num = top
        self.den = bottom

    def __str__(self):
        return str(self.num)+"/"+str(self.den)

    def show(self):
        print(self.num,"/",self.den)

    def __add__(self,otherfraction):
        newnum = self.num*otherfraction.den + \
                      self.den*otherfraction.num
        newden = self.den * otherfraction.den
        common = gcd(newnum,newden)
        return Fraction(newnum//common,newden//common)

    def __eq__(self, other):
        firstnum = self.num * other.den
        secondnum = other.num * self.den

        return firstnum == secondnum

    
x = Fraction(1,2)
y = Fraction(2,3)
print(x+y)
print(x == y)
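
For comparison, Python's standard library has `fractions.Fraction`, which reduces to lowest terms on its own, and `math.gcd`, which does the job of the hand-written `gcd`. A quick check, importing the standard class as `StdFraction` so it does not replace the notebook's own `Fraction`:

```python
import math
from fractions import Fraction as StdFraction

# The standard library's Fraction reduces to lowest terms automatically
total = StdFraction(1, 2) + StdFraction(2, 3)
print(total)                                  # 7/6
print(StdFraction(3, 5) * StdFraction(1, 2))  # 3/10
print(StdFraction(2, 4))                      # 1/2

# math.gcd plays the same role as the gcd function above
print(math.gcd(20, 8))                        # 4
```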

**1.13.2. Inheritance: Logic Gates and Circuits**
    

- inheritance: the ability for one class to be related to another class
- example: a Python list IS-A sequential collection (list is a subclass of sequential collections); this is an `IS-A Relationship`
- the concept of inheritance extends to a hierarchy of logic gates (AND, OR, NOT)

- From notes:
*Now, with the Connector class, we say that a Connector **HAS-A LogicGate** meaning that connectors will have instances of the LogicGate class within them but are not part of the hierarchy. When designing classes, it is very important to distinguish between those that have the IS-A relationship (which requires inheritance) and those that have HAS-A relationships (with no inheritance).*
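
A minimal sketch of the two relationships, loosely modelled on the course's gate classes but heavily simplified: pin values are passed to the constructor instead of being read with `input()`, and `transfer()` is a hypothetical helper, not the course's code.

```python
class LogicGate:
    """Base class: every gate has a label and can compute an output."""
    def __init__(self, label):
        self.label = label

    def get_output(self):
        # Subclasses supply perform_gate_logic()
        return self.perform_gate_logic()


class BinaryGate(LogicGate):
    """A BinaryGate IS-A LogicGate with two input pins."""
    def __init__(self, label, pin_a, pin_b):
        super().__init__(label)
        self.pin_a = pin_a
        self.pin_b = pin_b


class AndGate(BinaryGate):
    def perform_gate_logic(self):
        return int(self.pin_a and self.pin_b)


class OrGate(BinaryGate):
    def perform_gate_logic(self):
        return int(self.pin_a or self.pin_b)


class Connector:
    """A Connector HAS-A source gate and a target gate; it is not a LogicGate."""
    def __init__(self, from_gate, to_gate):
        self.from_gate = from_gate
        self.to_gate = to_gate

    def transfer(self):
        # Hypothetical helper: feed the source gate's output into the target's pin_a
        self.to_gate.pin_a = self.from_gate.get_output()


g1 = AndGate("G1", 1, 1)
g2 = OrGate("G2", 0, 0)
c = Connector(g1, g2)
c.transfer()
print(g2.get_output())             # 1
print(isinstance(g1, LogicGate))   # True  (IS-A)
print(isinstance(c, LogicGate))    # False (HAS-A only)
```

`AndGate` and `OrGate` inherit from `BinaryGate` (IS-A), while `Connector` only stores references to gates (HAS-A), which is why `isinstance(c, LogicGate)` is `False`.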

Possibly come back to this if needed

**1.15. Key Terms**

(good to quiz self)

- abstract data type
- abstraction
- algorithm
- class
- computable
- data abstraction
- data structure
- data type
- deep equality
- dictionary
- encapsulation
- exception
- format operator
- formatted strings
- HAS-A relationship
- implementation-independent
- information hiding
- inheritance
- inheritance hierarchy
- interface
- IS-A relationship
- list
- list comprehension
- method
- mutability
- object
- procedural abstraction
- programming
- prompt
- self
- shallow equality
- simulation
- string
- subclass
- superclass
- truth table

**Probably good to do programming exercises in this chapter - come back to it**

# A Proper Class

2 subsections

More in-depth `class` exercises

Come back to this if needed

# Analysis

11 subsections (goes over Big-O)

## Objectives

- To understand why algorithm analysis is important.
- To be able to use “Big-O” to describe execution time.
- To understand the “Big-O” execution time of common operations on Python lists and dictionaries.
- To understand how the implementation of Python data impacts algorithm analysis.
- To understand how to benchmark simple Python programs.

## What is algorithm analysis?

When two programs solve the same problem but look different, is one program better than the other?

In order to answer this question, we need to remember that there is an important difference between a program and the underlying algorithm that the program is representing.

Criteria to determine if a program is "better" (all are important):
- How readable it is
- How much space it takes
- Real-world execution time (although this is system dependent)
- Big-O

*The benchmark technique computes the actual time to execute. It does not really provide us with a useful measurement, because it is dependent on a particular machine, program, time of day, compiler, and programming language. Instead, we would like to have a characterization that is independent of the program or computer being used. This measure would then be useful for judging the algorithm alone and could be used to compare algorithms across implementations.*

**Compare using Big-O instead of time since there are different system configurations**

In [103]:
# One program to sum numbers using iteration
def sumOfN(n):
    theSum = 0
    for i in range(1, n + 1):
        # print(i)
        theSum = theSum + i

    return theSum


print(sumOfN(10000000))

50000005000000



In [104]:
# A different program to sum numbers without using iteration - compare execution time with iterative approach
# This is a clever way, using multiplication


def sumOfN3(n):
    # For n=5, it's doing (5*6)//2 = 30//2 = 15
    # Integer division (//) keeps the result an int instead of a float
    return (n * (n + 1)) // 2


print(sumOfN3(10000000))

50000005000000
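
A rough way to actually compare the two, assuming `time.perf_counter` is precise enough for single calls. Both functions are repeated so the cell runs on its own, and `time_it` is a hypothetical helper. Timings will vary by machine, which is the point made above about benchmarks:

```python
import time

def sumOfN(n):
    # Iterative version: n additions
    theSum = 0
    for i in range(1, n + 1):
        theSum = theSum + i
    return theSum

def sumOfN3(n):
    # Closed-form version: a fixed number of operations
    return (n * (n + 1)) // 2

def time_it(func, n):
    """Return (result, elapsed seconds) for a single call of func(n)."""
    start = time.perf_counter()
    result = func(n)
    return result, time.perf_counter() - start

for n in [10_000, 100_000, 1_000_000]:
    r1, t1 = time_it(sumOfN, n)
    r3, t3 = time_it(sumOfN3, n)
    print(f"n={n:>9,}  loop: {t1:.6f}s  formula: {t3:.6f}s  same result: {r1 == r3}")
```

The loop's time should grow roughly tenfold with each tenfold increase in `n`, while the formula's time stays roughly flat.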



## Big-O Notation


...it is important to quantify the number of operations or steps that the algorithm will require. If each of these steps is considered to be a basic unit of computation, then the execution time for an algorithm can be expressed as the number of steps required to solve the problem...
A good basic unit of computation for comparing the summation algorithms shown earlier might be to count the number of assignment statements performed to compute the sum.

When estimating running time, we only need to consider the dominant term; the "O" in Big-O refers to the order of magnitude of that term.


**Common functions for Big-O**
<br>
In order of increasing run time


| f(n) | Name |
|------|------|
| $1$ | Constant |
| $\log n$ | Logarithmic |
| $n$ | Linear |
| $n \log n$ | Log Linear |
| $n^2$ | Quadratic |
| $n^3$ | Cubic |
| $2^n$ | Exponential |
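
To see how quickly these grow, a small sketch that evaluates each function for a few sizes (using base-2 logarithms; the base does not change the Big-O class):

```python
import math

# Each common Big-O function, keyed by its name in the table
functions = {
    "1": lambda n: 1,
    "log n": lambda n: math.log2(n),
    "n": lambda n: n,
    "n log n": lambda n: n * math.log2(n),
    "n^2": lambda n: n ** 2,
    "n^3": lambda n: n ** 3,
    "2^n": lambda n: 2 ** n,
}

for n in [2, 8, 16, 32]:
    row = ", ".join(f"{name}={f(n):,.0f}" for name, f in functions.items())
    print(f"n={n}: {row}")
```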

In [107]:
# Example code where T(n) = 3 + 3n^2 + 2n + 1
# Match the term in the equation to the steps in the program

# This value of n is arbitrary; it only needs to be defined so the code runs
n = 10

# Three assignment steps
a = 5
b = 6
c = 10

# The nested loops run three assignments n*n times: the 3n^2 term
for i in range(n):
    for j in range(n):
        x = i * i
        y = j * j
        z = i * j

# This loop forms the 2n term
for k in range(n):
    w = a * k + 45
    v = b * b

# One assignment step
d = 33


*The number of assignment operations is the sum of four terms. The first term is the constant 3, representing the three assignment statements at the start of the fragment. The second term is $3n^2$, since there are three statements that are performed $n^2$ times due to the nested iteration. The third term is $2n$, two statements iterated $n$ times. Finally, the fourth term is the constant 1, representing the final assignment statement. This gives us $T(n) = 3 + 3n^2 + 2n + 1 = 3n^2 + 2n + 4$. By looking at the exponents, we can easily see that the $n^2$ term will be dominant and therefore this fragment of code is $O(n^2)$. Note that all of the other terms as well as the coefficient on the dominant term can be ignored as $n$ grows larger.*
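
The count can be checked by instrumenting the fragment with a counter. This sketch follows the same convention as the text: only the explicit assignment statements are counted, not the loop variables, and `count_assignments` is a hypothetical helper name:

```python
def count_assignments(n):
    """Count the assignment statements the fragment above executes for a given n."""
    count = 0
    a, b, c = 5, 6, 10
    count += 3                  # three initial assignments
    for i in range(n):
        for j in range(n):
            x = i * i
            y = j * j
            z = i * j
            count += 3          # three assignments per inner iteration -> 3n^2
    for k in range(n):
        w = a * k + 45
        v = b * b
        count += 2              # two assignments per iteration -> 2n
    d = 33
    count += 1                  # final assignment
    return count

for n in [1, 10, 100]:
    # The two columns should match: counted vs 3n^2 + 2n + 4
    print(n, count_assignments(n), 3 * n ** 2 + 2 * n + 4)
```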

**Write two Python functions to find the minimum number in a list. The first function should compare each number to every other number on the list, $O(n^2)$. The second function should be linear, $O(n)$.**

In [119]:
def find_min_number1(arr):
    """
    Nested loop - O(n^2). For each number, count how many other
    numbers it is less than. The minimum is the number with the
    highest count. Keep the counts in a dictionary.
    """

    arr_dict = dict()
    for i in arr:
        counter = 0

        for j in arr:
            if i < j:
                counter += 1

        arr_dict[i] = counter

    # Find the key with the highest count
    # (track the best count and its key separately; don't compare counts to keys)
    k_min = None
    max_count = -1
    for k in arr_dict:
        if arr_dict[k] > max_count:
            max_count = arr_dict[k]
            k_min = k

    return k_min


**Note: runestone video has a different solution**

In [122]:
my_arr = [2, 5, 3, 1]
find_min_number1(my_arr)

1


In [121]:
def find_min_number2(arr):
    """
    Single loop - O(n). Keep track of the smallest value seen so far.
    """

    min_val = arr[0]  # Initialize the value
    for i in arr:
        if i < min_val:
            min_val = i

    return min_val


In [123]:
my_arr = [2, 5, 3, 1]
find_min_number2(my_arr)

1


In [None]:
# two sequential (not nested) loops: 2n steps, but still linear - O(n)
test = 0
for i in range(n):
    test = test + 1

for j in range(n):
    test = test - 1

In [None]:
# i is halved on every pass, so the loop runs about log2(n) times - O(log n)
i = n
while i > 0:
    k = 2 + 2
    i = i // 2
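
A quick way to confirm the logarithmic behavior is to count the loop's iterations; for $n \ge 1$ the count should be $\lfloor \log_2 n \rfloor + 1$. `count_halvings` is a hypothetical helper:

```python
import math

def count_halvings(n):
    """Count how many times the halving while loop above runs for a given n."""
    steps = 0
    i = n
    while i > 0:
        i = i // 2
        steps += 1
    return steps

for n in [1, 8, 1000, 1_000_000]:
    # Columns: n, counted iterations, floor(log2(n)) + 1
    print(n, count_halvings(n), math.floor(math.log2(n)) + 1)
```

Going from a thousand to a million only doubles the iteration count (10 to 20), which is what $O(\log n)$ means in practice.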

## An Anagram Detection Example

*A good example problem for showing algorithms with different orders of magnitude is the classic anagram detection problem for strings. One string is an anagram of another if the second is simply a rearrangement of the first. For example, 'heart' and 'earth' are anagrams. The strings 'python' and 'typhon' are anagrams as well. For the sake of simplicity, we will assume that the two strings in question are of equal length and that they are made up of symbols from the set of 26 lowercase alphabetic characters. Our goal is to write a boolean function that will take two strings and return whether they are anagrams.*



**3.4.1. Solution 1: Checking Off**

*Our first solution to the anagram problem will check the lengths of the strings and then to see that each character in the first string actually occurs in the second. If it is possible to “checkoff” each character, then the two strings must be anagrams. Checking off a character will be accomplished by replacing it with the special Python value None. However, since strings in Python are immutable, the first step in the process will be to convert the second string to a list. Each character from the first string can be checked against the characters in the list and if found, checked off by replacement.*

In [5]:
def check_off_1(word1, word2):
    """
    Checking-off anagram test - O(n^2).
    For each character in word1, look for an unchecked matching
    character in word2 and check it off by replacing it with None.
    """
    # check the lengths of the strings
    if len(word1) != len(word2):
        return False

    # strings are immutable, so convert word2 to a list that can be modified
    word2_as_list = list(word2)

    # see that each character in the first string actually occurs in the second
    pos_word1 = 0
    still_ok = True
    while pos_word1 < len(word1) and still_ok:
        pos_word2 = 0
        found = False
        while pos_word2 < len(word2_as_list) and not found:
            if word1[pos_word1] == word2_as_list[pos_word2]:
                found = True
            else:
                pos_word2 += 1

        if found:
            # check off only this one matching character
            word2_as_list[pos_word2] = None
        else:
            still_ok = False

        pos_word1 += 1

    return still_ok

In [6]:
print(check_off_1('heart', 'earth'))
print(check_off_1('heart', 'hearr'))

True
False


**This is an O(n^2) algorithm: each of the n characters in the first string may require a scan of up to n positions in the list.**

**3.4.2. Solution 2: Sort and Compare**
    
Another solution to the anagram problem will make use of the fact that even though `s1` and `s2` are different, they are anagrams only if they consist of exactly the same characters. So, if we begin by sorting each string alphabetically, from a to z, we will end up with the same string if the original two strings are anagrams. Again, in Python we can use the built-in `sort` method on lists by simply converting each string to a list at the start.

In [38]:
def sort_and_compare(s1, s2):
    '''
    Use of sort and compare method for anagram detection
    '''
    
    # .sort is not a method for a string! convert to a list
    s1_list = list(s1)
    s2_list = list(s2)
    
    # .sort() works in place!
    s1_list.sort()
    s2_list.sort()
    
    pos=0
    matches=True
    
    while pos < len(s1_list) and matches:
        print(pos)
        if s1_list[pos]==s2_list[pos]:
            pos += 1
        else:
            matches=False
        
    return matches

In [40]:
sort_and_compare('sdfa', 'afdd')

0
1
2


False

**Big-O assessment**

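On the surface, it looks like an O(n) algorithm, since the comparison loop runs at most n times, but the `.sort` method has its own cost. Python's sort is O(n log n), so the sorting step dominates and the whole algorithm is O(n log n).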
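
The same idea can be written more compactly with the built-in `sorted()`, which accepts a string and returns a new sorted list, so no explicit conversion or loop is needed. A sketch (the function name is hypothetical); the cost is still dominated by sorting:

```python
def is_anagram_sorted(s1, s2):
    """Sort-and-compare anagram test using sorted(); O(n log n)."""
    # sorted("heart") -> ['a', 'e', 'h', 'r', 't']
    return len(s1) == len(s2) and sorted(s1) == sorted(s2)

print(is_anagram_sorted("heart", "earth"))   # True
print(is_anagram_sorted("sdfa", "afdd"))     # False
```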

**Below are notes I made for testing**

In [35]:
s1 = 'sdfa'
# .sort is not a method for a string! convert to a list
s1_list = list(s1)
s1_list

['s', 'd', 'f', 'a']

In [36]:
# .sort() works in place!
s1_list.sort()
s1_list

['a', 'd', 'f', 's']

**3.4.3. Solution 3: Brute Force**

Generate a list of all possible strings using the characters from `s1` and then see if `s2` occurs. But suppose the following:
```
s1 = 'blah'
s2 = 'whatsapp'
```

This means you have to generate the following combinations for `s1`:
```
- blah
- balh
- bhla
- etc.
```

In other words, there are 4 possibilities for the first position, 3 for the second, 2 for the third, and the last position gets the remaining letter, so `'blah'` alone has `4! = 24` orderings. In general there are `n!` possibilities, and as `n` gets large, `n!` grows even faster than `2^n`.



**This would take forever. It might be good as a first attempt on a coding interview but not an optimal solution.**
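
For completeness, a sketch of the brute-force approach using `itertools.permutations`, which generates the orderings lazily instead of building the whole list first. It is still $O(n!)$ in the worst case, so it is only practical for very short strings; the function name is hypothetical:

```python
from itertools import permutations

def is_anagram_brute_force(s1, s2):
    """Try every ordering of s1 (n! of them) and check whether s2 is one of them."""
    if len(s1) != len(s2):
        return False
    for perm in permutations(s1):
        # each perm is a tuple of characters, e.g. ('b', 'l', 'a', 'h')
        if "".join(perm) == s2:
            return True
    return False

print(is_anagram_brute_force("heart", "earth"))   # True
print(len(list(permutations("blah"))))           # 24 orderings = 4!
```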
    


## Performance of Python Data Structures

## Lists


Interesting performance comparison of four ways to grow a list, shown in order of decreasing run time (concatenation is much, much slower than the others):

- concatenation
- append
- list comprehension
- range

https://runestone.academy/runestone/books/published/pythonds/AlgorithmAnalysis/Lists.html
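
A sketch of that comparison with `timeit`, modelled on the course's four test functions but with hypothetical names and an assumed list size of 1000. Absolute numbers depend on the machine; the ordering is what to look at:

```python
import timeit

N = 1000

def build_concat():
    result = []
    for i in range(N):
        result = result + [i]   # builds a brand-new list on every step
    return result

def build_append():
    result = []
    for i in range(N):
        result.append(i)        # amortized O(1) per append
    return result

def build_comprehension():
    return [i for i in range(N)]

def build_range():
    return list(range(N))

for func in [build_concat, build_append, build_comprehension, build_range]:
    # timeit.timeit accepts a zero-argument callable as its statement
    seconds = timeit.timeit(func, number=200)
    print(f"{func.__name__:<20} {seconds * 1000 / 200:.4f} ms per call")
```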


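O(1) operations

- index
- index assignment
- append
- pop() (from the end)


O(n) operations

- pop(i)
- insert(i, item)
- del (delete by index)
- iteration
- contains (in)
- del slice

O(k) operations (k = length of the slice, or of the list being concatenated)

- get slice [x:y]
- concatenate


O(n log n) operations

- sort

O(nk) operations (k = the multiplier)

- multiply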
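
A sketch showing the `pop()` vs `pop(i)` difference with `timeit`. Each pop is paired with an `append` so the list keeps its size; `time_pop` and the sizes are assumptions for illustration. `pop()` should stay roughly flat as the list grows, while `pop(0)` should grow roughly linearly because every remaining element has to shift:

```python
import timeit

def time_pop(size, from_front, repeats=1000):
    """Average seconds per pop+append on a list of `size` items (hypothetical helper)."""
    setup = f"x = list(range({size}))"
    # append(0) after each pop keeps the list length constant between pops
    stmt = "x.pop(0); x.append(0)" if from_front else "x.pop(); x.append(0)"
    return timeit.timeit(stmt, setup=setup, number=repeats) / repeats

for size in [10_000, 100_000, 1_000_000]:
    end = time_pop(size, from_front=False)
    front = time_pop(size, from_front=True)
    print(f"size={size:>9,}  pop(): {end:.2e}s  pop(0): {front:.2e}s")
```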

## Dictionaries

The thing that is most important to notice right now is that the get item and set item operations on a dictionary are O(1). Another important dictionary operation is the contains operation. Checking to see whether a key is in the dictionary or not is also O(1).


O(1) operations

- get item
- set item
- delete item
- contains (in)

O(n) operations

- copy
- iteration
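
A sketch comparing the `in` (contains) operator on a list and on a dictionary with the same keys; the sizes and the random target are arbitrary choices. The list time should grow with size while the dictionary time stays roughly constant:

```python
import random
import timeit

for size in [10_000, 100_000, 1_000_000]:
    as_list = list(range(size))
    as_dict = {k: None for k in range(size)}
    target = random.randrange(size)
    # timeit accepts a zero-argument callable; each lambda is timed within this iteration
    t_list = timeit.timeit(lambda: target in as_list, number=100)   # O(n) scan
    t_dict = timeit.timeit(lambda: target in as_dict, number=100)   # O(1) hash lookup
    print(f"size={size:>9,}  list: {t_list:.2e}s  dict: {t_dict:.2e}s")
```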

![3.7_fig4.png](attachment:3.7_fig4.png)

## Summary

- Algorithm analysis is an implementation-independent way of measuring an algorithm.
- Big-O notation allows algorithms to be classified by their dominant process with respect to the size of the problem.

## Key terms

- average case
- Big-O notation
- brute force
- checking off
- exponential
- linear
- log linear
- logarithmic
- order of magnitude
- quadratic
- time complexity
- worst case

## Discussion questions

## Programming exercises