# What is an Algorithm anyway?

An algorithm is an explicit, precise, unambiguous, mechanically-executable
sequence of elementary instructions, usually intended to accomplish a specific purpose.

Source: https://jeffe.cs.illinois.edu/teaching/algorithms/

## Al-gebra and al-gorithm and al-gorists (not Al-Gorists)

![Muhammad ibn Musa al-Khwarizmi](https://upload.wikimedia.org/wikipedia/commons/a/a6/Khwarizmi_Amirkabir_University_of_Technology.png)

https://en.wikipedia.org/wiki/Muhammad_ibn_Musa_al-Khwarizmi

Muhammad ibn Musa al-Khwarizmi was a Persian mathematician, astronomer, geographer, and scholar in the House of Wisdom in Baghdad. His name means 'the native of Khwarazm', a region that was part of Greater Iran and is now in Uzbekistan. He lived in Baghdad, working under the patronage of the Abbasid Caliphate.

He is considered the father of algebra, a name that comes from the title of his book, Kitab al-Jabr. His pioneering work offered practical solutions to mathematical problems in areas such as algebra, trigonometry, and Indian and Greek arithmetic.

The years of his activity are not entirely certain, but it is known that he wrote his Al-Jabr wa-l-Muqabala (algebra) in 820, and his Al-Zhir wa-l-Takhtit (astronomy) in 830, and he died around 850.

## Decent Algorithm

BottlesOfBeer(n):

```
For i ← n down to 1

  Sing “i bottles of beer on the wall, i bottles of beer,”

  Sing “Take one down, pass it around, i − 1 bottles of beer on the wall.”

Sing “No bottles of beer on the wall, no bottles of beer,”

Sing “Go to the store, buy some more, n bottles of beer on the wall.”
```

Note that not all the constraints are specified (for example, that n is a non-negative integer), but it is still a decent algorithm.
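As a rough illustration, the pseudocode can be translated into Python almost line by line. This is only a sketch: the function name `bottles_of_beer` and the choice to return the lyrics as a list of strings are assumptions made here, and the closing verse is sung once, after the loop.

```python
def bottles_of_beer(n):
    """Return the song as a list of lines; assumes n is a non-negative integer."""
    lines = []
    for i in range(n, 0, -1):  # i goes from n down to 1
        lines.append(f"{i} bottles of beer on the wall, {i} bottles of beer,")
        lines.append(f"Take one down, pass it around, {i - 1} bottles of beer on the wall.")
    # the closing verse comes once, after the loop has finished
    lines.append("No bottles of beer on the wall, no bottles of beer,")
    lines.append(f"Go to the store, buy some more, {n} bottles of beer on the wall.")
    return lines


for line in bottles_of_beer(2):
    print(line)
```

Every step is elementary and unambiguous, which is what makes it an algorithm rather than a wish.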


## Not a real algorithm 

BeAMillionaireAndNeverPayTaxes():


```

Get a million dollars. 

If the tax man comes to your door and says, “You have never paid taxes!”

  Say “I forgot.”
```




### How about a "Get a million dollars" algorithm?
1. Collect underpants
2. ?
3. Profit

Source: https://en.wikipedia.org/wiki/Gnomes_(South_Park)

Still not a real algorithm.


## Describing Algorithms

The skills required to effectively design and analyze algorithms are entangled with the skills required to effectively describe algorithms. A complete description of any algorithm has four components:

 *  **What:** A precise specification of the problem that the algorithm solves.
 *  **How:** A precise description of the algorithm itself.
 * **Why:** A proof that the algorithm solves the problem it is supposed to solve.
 * **How fast:** An analysis of the running time of the algorithm.
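To make the four components concrete, here is a small sketch that attaches all of them to one tiny function. The function `smallest` and its docstring layout are illustrative assumptions, not a standard convention.

```python
def smallest(items):
    """
    What:     return the smallest element of a non-empty iterable of comparable items.
    How:      keep the best candidate seen so far and compare it with every later element.
    Why:      after each step, `best` is the minimum of the elements examined so far,
              so after the last step it is the minimum of all of them.
    How fast: one comparison per element after the first, so O(n) time, O(1) extra space.
    """
    it = iter(items)
    best = next(it)          # assumes at least one element
    for item in it:
        if item < best:
            best = item
    return best


print(smallest([4, 1, 67, 2]))  # prints 1
```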

## Heuristic - not an algorithm

### See lion - run!

Above is a rule of thumb. It is not an algorithm. It is a heuristic.

A heuristic is a technique designed for solving a problem more quickly when classic methods are too slow, or for finding an approximate solution when classic methods fail to find any exact solution.

A heuristic is not an algorithm. It is an aid in the process of finding a solution to a problem: a mental shortcut, an educated guess, or a rule of thumb. "Heuristics", used as a noun, is another name for heuristic techniques.

# How to find the largest number in an unsorted list (array)?

In [1]:
mylist = [1,4,67,2,7,9000,2] # a Python list: what most languages call an array (not a linked list!), though with more overhead than a C array
print(max(mylist)) # built in max function in Python

9000


In [2]:
# let's implement our own naive max algorithm
# (a brute-force linear scan)
def find_my_max(seq):
    # start with no maximum; alternatives: start with the first value or with negative infinity
    my_max = None
    # loop through seq
    for n in seq:
        # if any number is larger than the current max, it becomes the new max
        if my_max is None or n > my_max:
            my_max = n
    # return max (None for an empty seq)
    return my_max

In [3]:
print(find_my_max(mylist))

9000
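The comment in `find_my_max` mentions an alternative: start with the first value instead of `None`. A minimal sketch of that variant follows; the name `find_max_first` is made up here, and raising `ValueError` on empty input is a design choice that mirrors the built-in `max`.

```python
def find_max_first(seq):
    """Linear scan for the maximum, seeded with the first item of seq."""
    it = iter(seq)
    try:
        best = next(it)                      # the first value is the initial candidate
    except StopIteration:
        raise ValueError("find_max_first() arg is an empty sequence") from None
    for n in it:                             # remaining values
        if n > best:
            best = n
    return best


print(find_max_first([1, 4, 67, 2, 7, 9000, 2]))  # prints 9000
```

Seeding with the first value removes the `None` check from every loop iteration, at the cost of having to decide what an empty input should mean.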


In [4]:
# before proving things a good idea is to run a suite of tests
print(find_my_max([-4,23,-6,2,0,0,5,99953252523525252,-3523525253]))

99953252523525252


In [5]:
import random # random provides umm random functions in Python
random.seed(42) # answer to the Universe
# note randint is inclusive on both ends (rare for a function / method )
big_list = [random.randint(1,100_000_000) for _ in range(1_000_000)]  # list comprehension, _ signifies that iterator is not important
len(big_list) # so 1 million items of random numbers from 1 to 100million
# btw random is not truly random in most architectures it is pseudo-random

1000000

In [6]:
big_list[:5] # first 5

[85822413, 14942604, 3356887, 99529224, 36913811]

In [7]:
big_list[-5:] # last 5

[4235542, 37491616, 60085083, 52300740, 44742665]

In [8]:
max(big_list)

99999938

In [9]:
find_my_max(big_list)

99999938

## Checking the speed/time of an algorithm

In [10]:
%%timeit
max(mylist)

222 ns ± 8.38 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)


In [11]:
%%timeit
find_my_max(mylist)

549 ns ± 33.6 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)


In [12]:
print(f"Length of mylist is {len(mylist)}")

Length of mylist is 7


In [13]:
%%timeit
max(big_list) # each loop rescans the whole list; the result is not cached between runs

15.4 ms ± 704 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [14]:
%%timeit
find_my_max(big_list)

49.7 ms ± 1.53 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [15]:
huge_list = [random.randint(1,1_000_000_000) for _ in range(10_000_000)] # lists are not as efficient as c arrays


In [16]:
len(huge_list)

10000000

In [17]:
%%timeit
max(huge_list)

149 ms ± 4.07 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [18]:
%%timeit
find_my_max(huge_list)

502 ms ± 17.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [19]:
%%timeit
find_my_max(huge_list)

497 ms ± 15.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
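`%%timeit` only works inside IPython/Jupyter. Outside a notebook, roughly the same measurement can be made with the standard `timeit` module. The sketch below assumes a smaller list than above so it runs quickly, and `linear_max` is a stand-in for `find_my_max`. Both functions are O(n); in CPython the built-in is typically faster by a constant factor because its loop runs in C.

```python
import random
import timeit


def linear_max(seq):
    """Plain-Python linear scan, same idea as find_my_max above."""
    best = None
    for n in seq:
        if best is None or n > best:
            best = n
    return best


random.seed(42)
data = [random.randint(1, 1_000_000) for _ in range(100_000)]

runs = 20
# timeit.timeit accepts a callable and returns the total time for `number` calls
builtin_time = timeit.timeit(lambda: max(data), number=runs) / runs
manual_time = timeit.timeit(lambda: linear_max(data), number=runs) / runs

print(f"built-in max: {builtin_time * 1e3:.3f} ms per call")
print(f"linear_max:   {manual_time * 1e3:.3f} ms per call")
```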


## Can we find max (or min) of a list in a faster way?

In [21]:
sorted_big_list = sorted(big_list)  # sorted has its own complexity (about that later)
print(f"We have {len(sorted_big_list)} items in our list")
sorted_big_list[:5]


We have 1000000 items in our list


[102, 187, 476, 615, 684]

In [22]:
%%timeit
max(sorted_big_list)

96.2 ms ± 4.21 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [23]:
%%timeit
min(sorted_big_list)

105 ms ± 12.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [24]:
%%timeit
find_my_max(sorted_big_list)

230 ms ± 10.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [25]:
sorted_big_list[0], sorted_big_list[-1]  # we can find min and max in O(1) CONSTANT time! assuming the list is sorted
# we could have as many items as memory allows

(102, 99999938)

In [27]:
%%timeit
## thus I could find max in sorted list in O(1) time - just get the last item
sorted_big_list[-1]

40.2 ns ± 1.63 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)


In [28]:
# How about finding some value in a list?
# I make a list of booleans showing which values from 9000 to 9199 are in big_list
over_9000 = [n in big_list for n in range(9000,9200)]
over_9000

[False,
 False,
 False,
 ...


In [29]:
True in over_9000

True

In [30]:
over_9000.index(True)

129

In [31]:
# how many are in range 9000 to 9199
sum(over_9000)

1

In [32]:
9128 in big_list, 9129 in big_list

(False, True)

In [33]:
%%timeit
9129 in big_list # so this check took a while

10.4 ms ± 314 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [34]:
%%timeit
9129 in sorted_big_list

921 ns ± 21.4 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)


In [35]:
# let's check indexes of both lists
print("Index of 9129 in unsorted list", big_list.index(9129))
print("Index of 9129 in sorted list", sorted_big_list.index(9129))

Index of 9129 in unsorted list 891828
Index of 9129 in sorted list 94


In [37]:
# why is finding 9129 so much faster in the sorted list in this case? `in` scans from the front, so the index matters
big_list.index(9129),sorted_big_list.index(9129)

(891828, 94)

In [38]:
sorted_big_list[-5:] # last five values

[99999266, 99999353, 99999851, 99999855, 99999938]

In [39]:
%%timeit
99999938 in sorted_big_list

68 ms ± 9.08 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [40]:
%%timeit
99999938 in big_list

12.7 ms ± 175 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [43]:
# so in this case finding something in unsorted list was faster
# because it was found earlier, naturally when it was sorted it was at the very end
big_list.index(99999938), sorted_big_list.index(99999938)

(544926, 999999)

In [None]:
# Can we do better for membership testing in sorted list?
# in an unsorted list we have to spend O(n) time, meaning the more items we have the longer we search

In [44]:
# this is the best way if you need to do a lot of membership testing
big_set = set(big_list) # another data structure, based on a hash table

In [45]:
%%timeit
99999938 in big_set


53.5 ns ± 0.96 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)


In [46]:
%%timeit
-14124215151 in big_set

57.6 ns ± 1.84 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)


In [None]:
# Key takeaway - choosing the right data structure can be crucial
# `in` on a set is an O(1) lookup on average
# `in` on a list is O(n): sometimes the value is found quickly, sometimes you have to go through the whole list

In [None]:
%%timeit
9000 in big_list

20.8 ms ± 782 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [None]:
%%timeit
9000 in big_set

90.3 ns ± 10.2 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
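Building the set is itself an O(n) step, so it pays off when there are many lookups. The sketch below makes that trade-off visible on a smaller, made-up workload; the names `items`, `queries`, `count_hits_list` and `count_hits_set` are assumptions for illustration only.

```python
import random
import timeit

random.seed(42)
items = [random.randint(1, 10_000_000) for _ in range(50_000)]
queries = [random.randint(1, 10_000_000) for _ in range(100)]


def count_hits_list():
    # each `in` scans the list from the front: O(n) per query
    return sum(q in items for q in queries)


def count_hits_set():
    lookup = set(items)                       # one-time O(n) build cost
    return sum(q in lookup for q in queries)  # O(1) on average per query


assert count_hits_list() == count_hits_set()  # same answers, different cost

list_time = timeit.timeit(count_hits_list, number=1)
set_time = timeit.timeit(count_hits_set, number=1)
print(f"list: {list_time * 1e3:.1f} ms, set (including build): {set_time * 1e3:.1f} ms")
```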


In [None]:
%%timeit
count = 100_000
nums = [] # let's imagine we do not know about list comprehensions
for n in range(count):
    nums.append(n)
nums.reverse() # in-place reversal
# of course list(range(count - 1, -1, -1)) would build the same list directly
# nums[:10]

11.2 ms ± 86.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [None]:
# can we do better? why reverse at the end if we could build the list in reverse order right away...

In [None]:
%%timeit
count = 100_000
nums2 = []
for n in range(count):
    nums2.insert(0, n)
    

3.78 s ± 9.68 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [None]:
nums[:5],nums2[:5]

([999999, 999998, 999997, 999996, 999995], [99999, 99998, 99997, 99996, 99995])
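The slowdown comes from `list.insert(0, n)`: every existing element has to be shifted one slot to the right, so building n items this way costs O(n²) in total, while `append` is amortized O(1). If items really must be added at the front, `collections.deque` from the standard library supports O(1) appends at both ends. A minimal sketch, where `nums3` is a name introduced here:

```python
from collections import deque

count = 100_000
nums3 = deque()
for n in range(count):
    nums3.appendleft(n)   # O(1) at the left end, unlike list.insert(0, n)

print(list(nums3)[:5])    # prints [99999, 99998, 99997, 99996, 99995]
```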

In [None]:
# if we did not have set we could use binary search in a sorted list

In [None]:
# pseudo code for binary search
# compare the needle with the middle element of the sorted collection
# if they are equal we are done; otherwise repeat in the half that could still contain the needle

In [47]:
import math

In [48]:
math.log2(256), math.log2(1024)

(8.0, 10.0)

In [49]:
10**100 # Googol https://en.wikipedia.org/wiki/Googol

10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

In [50]:
math.log2(1_000_000),math.log2(1_000_000_000),math.log2(10**100) # 1 with 100 zeroes

(19.931568569324174, 29.897352853986263, 332.19280948873626)
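These logarithms matter because binary search halves the remaining range on every step. The sketch below counts loop iterations for the worst case (the needle is never found and is larger than everything) without building a list at all. `binary_search_steps` is a hypothetical helper, and the second printed column is the closed form floor(log2(n)) + 1.

```python
import math


def binary_search_steps(n):
    """Count loop iterations of a binary search over n items that always goes right."""
    start, end = 0, n - 1
    steps = 0
    while start <= end:
        steps += 1
        mid = (start + end) // 2
        start = mid + 1      # pretend the needle is larger than seq[mid]
    return steps


for n in (256, 1024, 1_000_000, 10**100):
    print(binary_search_steps(n), math.floor(math.log2(n)) + 1)
```

Even a googol of sorted items would need only a few hundred comparisons.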

In [55]:
def is_needle_in_seq(seq, needle, debug=False):
    """
    seq should be sorted and indexable!!!
    needle can be or not be in the seq
    """
    start = 0
    end = len(seq)-1 # len can hide linear complexity in some structures... not here
    
    # now we will make a while loop
    while start <= end:
        mid = (end+start) // 2  # // is integer (floor) division: the remainder is dropped
        if needle == seq[mid]:
            if debug:
                print("Found it at index", mid)
            return True
        elif needle > seq[mid]:
            start = mid + 1
        elif needle < seq[mid]: # else would work here just as well
            end = mid - 1
    if debug:
        print("Sorry, did not find your needle", needle)
    return False

In [52]:
is_needle_in_seq(sorted_big_list, 9129, debug=True)

Found it at index 94


True

In [53]:
9024 in sorted_big_list

False

In [56]:
%%timeit
is_needle_in_seq(sorted_big_list, 9129)

5.96 µs ± 65.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
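The standard library offers the same halving search in the `bisect` module. A membership test can be sketched on top of `bisect_left`; the wrapper name `contains_sorted` is an assumption made here, and the input must already be sorted.

```python
from bisect import bisect_left


def contains_sorted(sorted_seq, needle):
    """Membership test on a sorted, indexable sequence using binary search."""
    i = bisect_left(sorted_seq, needle)   # leftmost index where needle could be inserted
    return i < len(sorted_seq) and sorted_seq[i] == needle


data = [102, 187, 476, 615, 684]
print(contains_sorted(data, 476))   # prints True
print(contains_sorted(data, 500))   # prints False
```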


# What is GCD (greatest common divisor)?
## The largest number that divides two numbers evenly (no remainder)
## 12 and 8 have a GCD of 4; 21 and 18 have a GCD of 3

# One algorithm to find it
* factor both numbers into primes
* the GCD is the product of the prime factors common to both numbers (counting repeats)

In [None]:
# GCD of 30 and 12 would be 6 because:
# 30 = 2*3*5
# 12 = 2*2*3
# so GCD is 2*3
# the only catch is that you first have to find the prime factors of each number,
# which gets slow for large numbers
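The prime-factor approach can be sketched as follows. This is only an illustration under assumptions: it uses simple trial division, expects positive integers, and the names `prime_factors` and `gcd_by_factoring` are invented for this example.

```python
from collections import Counter


def prime_factors(n):
    """Trial division: map each prime factor of n to how many times it occurs."""
    factors = Counter()
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] += 1
            n //= d
        d += 1
    if n > 1:                 # whatever is left over is itself prime
        factors[n] += 1
    return factors


def gcd_by_factoring(x, y):
    """Multiply the prime factors shared by x and y (positive integers assumed)."""
    common = prime_factors(x) & prime_factors(y)   # Counter intersection keeps the smaller count
    result = 1
    for prime, power in common.items():
        result *= prime ** power
    return result


print(gcd_by_factoring(30, 12))  # prints 6
```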

In [None]:
# if we did not know about Euclid
# brute force algorithm
def naiveGCD(x, y):
    gcd = 1 # 1 divides everything, so it is always a fallback GCD
    for n in range(2, min(x,y)+1): # +1 because range excludes its end; watch out for off-by-one errors
        if x % n == 0 and y % n == 0:
            gcd = n # found a new, larger common divisor
    return gcd

In [None]:
naiveGCD(12,8)

4

In [None]:
naiveGCD(30,12)

6

In [None]:
%%timeit
naiveGCD(10_000_200,900_000)

78.6 ms ± 1.28 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [None]:
# This algorithm is more than 2000 years old
# https://en.wikipedia.org/wiki/Euclidean_algorithm

In [None]:
def gcd(x, y):
    while y:
        x, y = y, x % y # compute the remainder and swap in one step; in other languages you'd use a temp variable
    return x

In [None]:
gcd(10_000_200,900_000)

600

In [None]:
%%timeit
gcd(10_000_200,900_000)

581 ns ± 10 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)


In [None]:
gcd(12, 8)

4

In [None]:
gcd(21,18)

3

In [None]:
gcd(100, 93)

1

In [None]:
gcd(10, 8)

2

In [None]:
gcd(24,12)

12
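Spot checks like these are a good start. In the same spirit as running a suite of tests before proving things, the hand-written function can also be compared against the standard library's `math.gcd` on many random pairs. A small sketch, with `gcd` repeated so the block runs on its own:

```python
import math
import random


def gcd(x, y):
    while y:
        x, y = y, x % y   # Euclid's step: replace (x, y) with (y, x mod y)
    return x


random.seed(42)
for _ in range(1_000):
    a = random.randint(1, 10**9)
    b = random.randint(1, 10**9)
    assert gcd(a, b) == math.gcd(a, b)

print("all 1000 random pairs agree with math.gcd")
```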