# What is an Algorithm?

At its most basic level, an algorithm is a step-by-step procedure for solving a problem or carrying out a task. Although algorithms can in principle be used for almost any application, they are most commonly used in computer science and mathematics.

The following picture is a basic example of an algorithm: 

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/9/91/LampFlowchart.svg/1200px-LampFlowchart.svg.png" width="300" height="450" >

Algorithms are the cornerstone of the programming world, as every program revolves around the use of algorithms. Furthermore, algorithms are created independently from underlying languages and can therefore be implemented in every programming language. The following is a basic outline of how to create an algorithm for computer science:
1. Understand the problem
2. Think of different ways to solve it, and pick what you believe to be the most efficient way
3. Map it out, using pseudo-code, a flowchart, or something similar
4. Implement the solution, i.e. translate your method into an actual programming language
5. Test and debug your code

## Practical example

This is Bob. Bob has a walnut tree in his garden. Bob loves walnuts. He knows that his tree has already dropped some nuts on the ground. While some of them are already rotten, the others are ready to be picked up and eaten. Bob knows that over the next week, more and more walnuts will fall from the tree onto the ground. Bob now wants to solve the problem of collecting the walnuts. Since he's a computer scientist, he tackles the problem the way a proper computer scientist would tackle any problem:

1. Bob tries to understand the problem:
Bob wants to collect the walnuts. Some are still up on the tree ripening, the others are lying on the ground in the moist grass: fresh and rotten ones. In addition, the nuts are hard to spot in the grass, and to distinguish between fresh and rotten ones, Bob has to pick them up to get a closer look. Bob also has a bucket for collecting. Of course, Bob is only interested in the non-rotten nuts.
2. Bob thinks about different ways to solve the problem of collecting nuts:
Theoretically 

## Algorithm Implementation

Today, we want to focus on the fourth step of algorithm creation: implementation. In this step, we translate our method into our programming language of choice and ensure that it is written in an elegant and efficient way. In other words, we want the program to be compact and fast. Often these goals are aligned, but sometimes they are not. These ideas are discussed in terms of time complexity and space complexity, defined as follows:
- Time complexity: the amount of time it takes to run an algorithm as a function of the amount of input
- Space complexity: the amount of space/memory taken by an algorithm as a function of the amount of input

In this tutorial, we will primarily discuss time. Although time complexity is difficult to measure directly, in Python we can get a rough idea of the time demands of an algorithm by using the time module. The time module can measure several kinds of time, but we focus on wall-clock time and CPU time. Wall-clock time is the time we, as humans, are familiar with: the time a stopwatch would read if you started it at the beginning of the process and stopped it exactly as it finished. CPU time is the time the processor actually spends on the process. As computers run multiple processes at once, not 100% of the wall-clock time is dedicated to the process you are measuring. CPU time excludes the time spent on other processes and is therefore a more comparable measure.
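The difference between the two kinds of time can be made visible with a small sketch. It assumes that `time.perf_counter` is used as the wall-clock stopwatch and `time.process_time` as the CPU clock; `busy_work` is a hypothetical helper that exists only for this illustration.

```python
import time

def busy_work(n):
    """Hypothetical helper: burn CPU cycles with a simple arithmetic loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total

wall_start = time.perf_counter()   # stopwatch-style clock, includes waiting
cpu_start = time.process_time()    # CPU time of this process only

time.sleep(0.2)       # waiting: wall-clock time passes, almost no CPU time is used
busy_work(200_000)    # computing: both clocks advance

wall_elapsed = time.perf_counter() - wall_start
cpu_elapsed = time.process_time() - cpu_start

print(f"wall-clock: {wall_elapsed:.3f} s, CPU: {cpu_elapsed:.3f} s")
```

The wall-clock reading contains the 0.2 seconds of sleeping, while the CPU reading should be much smaller, because sleeping does not use the processor.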

In [None]:
#Presenting a problem 

In [None]:
# Make three algorithms

In [None]:
# Compare speed and talk about "jit"

In [None]:
#Talk about the payoff between spending time optimizing the code and the actual benefits of more efficient code
# Make another parallel to the nut problem. 

In [None]:
#Talk about stackoverflow with a new problem to solve that is better suited to be a subject of stackoverflow.

In [2]:
#Here we create a list that will then be sorted by our algorithms
import random as rdm

N = 10       #How many numbers should be sorted
a = 1        #Lower bound for the generated numbers (inclusive)
b = N*5      #Upper bound for the generated numbers (exclusive)

list1 = rdm.sample(range(a, b), N)   #N distinct random integers from a up to b-1
sortedlist = sorted(list1)           #Python's built-in sort, kept as a reference result
print(list1)

#Now we are going to sort the list again, but this time using our own algorithms.


[11, 16, 27, 48, 13, 1, 46, 40, 41, 8]


In [None]:
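#Algorithm 1: A so-called insertion sort
def isort(the_list):
    """Sort the_list in place using insertion sort."""
    for i in range(1, len(the_list)):
        value = the_list[i]   #Element to insert into the sorted part on its left
        spot = i
        while spot > 0 and the_list[spot-1] > value:
            the_list[spot] = the_list[spot-1]   #Shift the larger element one place to the right
            spot -= 1                           #Move left; without this step the loop never ends
        the_list[spot] = value
the_list = list1.copy()   #Work on a copy so that list1 stays unsorted for the other algorithms
isort(the_list)
print(the_list)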
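Step 5 of the outline (test and debug) can be sketched for a sorting algorithm by comparing it with Python's built-in `sorted` on many random inputs. The function `insertion_sort` below is a self-contained variant written for this check (an assumption of this sketch, not the notebook's `isort`): it returns a sorted copy instead of sorting in place.

```python
import random

def insertion_sort(values):
    """Return a sorted copy of `values` using insertion sort."""
    result = list(values)
    for i in range(1, len(result)):
        value = result[i]
        spot = i
        # Shift larger elements to the right until the slot for `value` is found
        while spot > 0 and result[spot - 1] > value:
            result[spot] = result[spot - 1]
            spot -= 1
        result[spot] = value
    return result

random.seed(0)  # fixed seed so the check is reproducible
for _ in range(100):
    sample = [random.randint(1, 50) for _ in range(random.randint(0, 20))]
    assert insertion_sort(sample) == sorted(sample)
print("insertion_sort agrees with sorted() on 100 random lists")
```

Random lists of different lengths, including the empty list and lists with duplicates, cover cases that a single hand-picked example can easily miss.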

In [None]:
#Algorithm 2

In [None]:
#Algorithm 3    SAM YOU DO THIS ONE

<div class="alert alert-block alert-info">
# Summary
<li> Do not trust Python code if you do not know exactly what it is doing. It might do something unexpected, like filling an "empty" vector with arbitrary numbers.
<li> Performance can be measured by checking how long code takes to run, for example with time.time(), time.perf_counter() or time.process_time(). When comparing the performance of code, CPU time is a better basis than wall-clock time.
<li> Even though code is often faster with Python's optimized library functions, a programmer should trade off his / her own effort against the running time of the code.
<li> Computers store numbers in binary with a limited number of bits, which can cause an overflow in calculations with huge numbers.
<li> In class, two Python libraries were used: "numpy" and "time". Some of their commands are explained in chapter 3.
<li> Homework for next week includes plotting and creating fake $AR(1)$ data as well as finding the parameter $\alpha$.
</div>

# Compute Polynomial
This exercise shows whether some ways of writing Python code are faster than others. To test this, a polynomial is evaluated once with a "for loop" and once with "matrix algebra". <br> <br>
The exercise: <br>
Write an algorithm that given a set of coefficients $\{a_{i}\}_{i=0}^{n}$ evaluates the following polynomial at any point $x$:<br>
<center> $p(x)=a_{0}x^{0}+a_{1}x^{1}+\cdots+a_{n}x^{n}$ </center>
Equivalent to:
<center>
$p(x)=
\begin{pmatrix}
a_{0} & a_{1} & a_{2} & \cdots & a_{n}
\end{pmatrix} \times 
\begin{pmatrix}
x^{0}\\
x^{1}\\
x^{2}\\
\vdots\\
x^{n}
\end{pmatrix}$
</center>
<br> <br>
First, there is a short review of the code; then the focus shifts to measuring performance and discussing the results. A digression explains binary numbers.

In [None]:
# Loop implementation
def p_loop(x,coef):
    total = 0 # Keep track of the sum
    for i, a in enumerate(coef): # Yields each index i (starting at 0) together with its coefficient a
        total = total + (a*(x**i)) # Adds the term a_i * x^i to the sum
    return total

# Matrix algebra implementation
import numpy as np # Numpy allows using vectors and matrix algebra
    # Note: Python is case-sensitive, so X ≠ x
def p_ma(x, coef): 
    X = np.empty(len(coef)) # Creates an uninitialized vector with the same length as coef
    X[0] = 1 # Defines the first element of the vector as 1
    X[1:] = x # Defines every other element of the vector as x
    y = np.cumprod(X) # Cumulative product of the vector: [1, x, x^2, ..., x^n]
    return coef @ y # Dot product of the coefficients with the powers of x
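Before timing the two implementations, it is worth checking that they agree with each other and with a value computed by hand. The sketch below restates both functions so that it runs on its own. It uses `np.ones` instead of `np.empty` and the name `powers` instead of `X`; both are choices of this sketch. It evaluates $p(x)=1+2x+3x^{2}$ at $x=2$.

```python
import numpy as np

def p_loop(x, coef):
    """Evaluate the polynomial term by term in a Python loop."""
    total = 0
    for i, a in enumerate(coef):
        total += a * x**i
    return total

def p_ma(x, coef):
    """Evaluate the polynomial as a dot product with the powers of x."""
    powers = np.ones(len(coef))   # [1, 1, ..., 1]
    powers[1:] = x                # [1, x, ..., x]
    powers = np.cumprod(powers)   # [1, x, x^2, ..., x^n]
    return coef @ powers

coef = np.array([1.0, 2.0, 3.0])
x = 2.0
expected = 1 + 2 * 2 + 3 * 2**2   # 1 + 4 + 12 = 17, computed by hand

assert np.isclose(p_loop(x, coef), expected)
assert np.isclose(p_ma(x, coef), expected)
print("Both implementations return", expected)
```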

<div class="alert alert-block alert-info">
<b>Empty vectors</b> sound as if they were filled with zeros. In NumPy, however, np.empty returns a vector whose entries are whatever happened to be in memory, i.e. arbitrary numbers. Unless every element is set explicitly afterwards, the output of such code should not be trusted. It is better to define the vector exactly as needed or to use np.zeros for a vector filled with zeros.
</div>

## Measure performance
Performance can be measured by checking the time used. Below is the code, with comments explaining the reasoning behind it. It is important that the time of both implementations (for loop and matrix algebra) is measured in the same cell, one right after the other. This reduces differences caused by background tasks on the computer. The print commands are placed outside the timed section, so that no time is spent on printing while measuring. In this case, time.time is used, which registers the real time at the start and after the task has finished. Other possible functions are discussed in section [3. Programming codes of this lecture](#3).

<div class="alert alert-block alert-info">
time.time registers the wall-clock time (also called real time or natural time). <br><br>
<b>[Wall-clock time](https://wiki.scinet.utoronto.ca/wiki/index.php/Wallclock_time)</b>: This is the kind of time people are used to. It is the actual time a clock shows the user. Depending on the computer system, it may only be exact to whole seconds, not to fractions of a second. 
<br><br>
<b>[CPU time](https://www.techopedia.com/definition/2858/cpu-time)</b>: This measures the time a computer spends on running a certain task. If the CPU is used for another program or process in the meantime, that time does not count towards the time measured in Python. 
</div>

In [None]:
import time # Package to measure time
n_coef = 10000000 # define a certain number of coefficients
sample_coef = np.random.uniform(1,50,n_coef) # Random uniform distributed coefficients between 1 and 50
eval_x = 1 # X at which polynomial is evaluated

# Measure time performance of our loop implementation
start = time.time() # registers the time of start
loop_result = p_loop(eval_x,sample_coef) # calculates the result of the loop version
end = time.time() # registers the time of end
print("Our loop implementation returns as output {} in {} seconds".format(loop_result,(end - start)))

# Measure time performance of our matrix algebra implementation
start = time.time() # registers the time of start
ma_result = p_ma(eval_x,sample_coef) # calculates the result of the matrix version
end = time.time() # registers the time of end
print("Our matrix algebra implementation returns as output {} in {} seconds".format(ma_result,(end - start)))

## Results
The results of the two implementations are remarkable. The loop version takes much longer than the matrix version. More generally, NumPy's built-in functions are so heavily optimized that they will usually be faster than a hand-written Python loop. time.time is not very precise: its resolution depends on the system, and on some systems only the whole second is reliable. This does not matter in this context, because the difference between the loop implementation and the matrix algebra implementation is huge. For less clear-cut results, other possibilities are discussed in chapter 3.<br>
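When the difference between two implementations is smaller, a single measurement can be dominated by noise. One possible approach, sketched here, is to repeat each measurement with the standard library's `timeit.repeat` and compare the best runs. The functions are restated so that the sketch runs on its own, and the sample size of 50,000 coefficients is an arbitrary choice that keeps the run short.

```python
import timeit
import numpy as np

def p_loop(x, coef):
    """Loop version of the polynomial evaluation."""
    total = 0
    for i, a in enumerate(coef):
        total += a * x**i
    return total

def p_ma(x, coef):
    """Matrix algebra version of the polynomial evaluation."""
    powers = np.ones(len(coef))
    powers[1:] = x
    return coef @ np.cumprod(powers)

rng = np.random.default_rng(0)       # seeded generator for reproducible coefficients
coef = rng.uniform(1, 50, 50_000)
x = 1.0

# Each implementation is run 5 times; every entry is the time of one call in seconds
loop_times = timeit.repeat(lambda: p_loop(x, coef), repeat=5, number=1)
ma_times = timeit.repeat(lambda: p_ma(x, coef), repeat=5, number=1)

print(f"loop:   best of 5 = {min(loop_times):.6f} s")
print(f"matrix: best of 5 = {min(ma_times):.6f} s")
```

Taking the minimum rather than the mean is a common convention, because background activity on the computer can only make a run slower, never faster.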
<div class="alert alert-block alert-info">
Even though there are faster and slower ways of writing a program, the programmer has to trade off his / her own programming time against the running time of the code: it is better to write correct code, knowing exactly what the computer is doing, than to use a function that gives wrong results. An example of wrong results was the empty vector that was filled with arbitrary numbers.
</div>

## Digression: Binary numbers
[<b>Binary numbers</b>](https://math.tutorvista.com/number-system/binary-numbers.html) consist only of the digits 0 and 1.<br>
To convert a binary number into a decimal number, number the digit positions from right to left, starting at 0. Each digit is multiplied by 2 raised to the power of its position, and the results are added up. For a binary number with n digits, the highest power is therefore n-1. <br> <br>
Examples: <br>
$000_{2} = 0\cdot 2^{2}+0\cdot 2^{1}+0\cdot 2^{0}=0+0+0=0$ <br>
$001_{2} = 0\cdot 2^{2}+0\cdot 2^{1}+1\cdot 2^{0}=0+0+1=1$ <br>
$010_{2} = 0\cdot 2^{2}+1\cdot 2^{1}+0\cdot 2^{0}=0+2+0=2$ <br>
$011_{2} = 0\cdot 2^{2}+1\cdot 2^{1}+1\cdot 2^{0}=0+2+1=3$ <br>
$100_{2} = 1\cdot 2^{2}+0\cdot 2^{1}+0\cdot 2^{0}=4+0+0=4$ <br>
$101_{2} = 1\cdot 2^{2}+0\cdot 2^{1}+1\cdot 2^{0}=4+0+1=5$ <br>
$110_{2} = 1\cdot 2^{2}+1\cdot 2^{1}+0\cdot 2^{0}=4+2+0=6$ <br>
$111_{2} = 1\cdot 2^{2}+1\cdot 2^{1}+1\cdot 2^{0}=4+2+1=7$
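
The conversion in the examples above can be sketched in a few lines of Python. `binary_to_decimal` is a hypothetical helper written for this illustration; Python's built-in `int(bits, 2)` performs the same conversion and is used here as a cross-check.

```python
def binary_to_decimal(bits):
    """Convert a string of 0s and 1s into the corresponding decimal integer."""
    total = 0
    # reversed() starts at the rightmost digit, which has position 0
    for position, digit in enumerate(reversed(bits)):
        total += int(digit) * 2**position
    return total

for bits in ["000", "001", "010", "011", "100", "101", "110", "111"]:
    value = binary_to_decimal(bits)
    assert value == int(bits, 2)   # cross-check with the built-in conversion
    print(bits, "->", value)
```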
<div class="alert alert-block alert-info">
[<b>Overflow</b>](https://www.allaboutcircuits.com/textbook/digital/chpt-2/binary-overflow/) can be caused by mathematical operations that generate huge binary numbers. The computer's capacity (number of bits) is then not sufficient to store such a large binary number. If this happens in Python, close the Jupyter notebook file that is currently in use. Then tick the box next to the file in the project overview and click on shutdown. Afterwards, it should be possible to reopen the file.
</div>
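
As a small illustration of what running out of bits looks like, the sketch below compares a plain Python integer with NumPy's fixed-width `int64` and `float64` types. The specific numbers are chosen only for this example.

```python
import numpy as np

big = 2**63 - 1   # largest value that fits into a signed 64-bit integer

# Plain Python integers grow as needed, so nothing overflows here
print(big + 1)

# A NumPy int64 array has exactly 64 bits per element: adding 1 wraps around
arr = np.array([big], dtype=np.int64)
wrapped = arr + 1
print(wrapped[0])   # the most negative 64-bit integer

# Floating-point numbers overflow to infinity instead of wrapping around
largest = np.finfo(np.float64).max
with np.errstate(over="ignore"):   # silence the overflow warning for this demo
    too_big = np.array([largest]) * 10
print(too_big[0])
```

The integer case is the more dangerous one in practice, because the array operation wraps around without any warning.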

# Functions of this lecture (numpy & time)
This chapter gives a better understanding of the libraries "numpy" and "time". Numpy allows using vectors and matrix algebra. Time provides functions for measuring time. 

In [None]:
# NUMPY

import numpy as np # allows using numpy in this code and renames it as np

v = np.empty(3) # create an empty vector for later usage # pay attention: this vector is uninitialized, so it contains arbitrary numbers (often very small ones), not necessarily zeros or ones
print('This is an empty vector {}'.format(v))

v[1:] = 3 # fill the empty vector # except for the first element, every element is set to 3
print('This is the partly filled vector {}'.format(v))

z = np.zeros(3) # create a zero vector
print('This is a zero vector {}'.format(z))

s = np.array([1, 3, 2]) # create a specific vector
print('This is a specific vector {}'.format(s))

i = np.identity(3) # create an identity matrix
print('This is an identity matrix {}'.format(i))

# Numpy can do calculations with vectors and matrices
# there are two ways of using numpy. The first one is with vector.command()
s.sort() # sorts all elements of the vector in place
print('This is the sorted vector {}'.format(s))

ss = s.sum() # sums up all elements of the vector
print('This is the sum of the vector elements {}'.format(ss))

ma = s.max() # identify the largest number in the vector
mi = s.min() # identify the smallest number in the vector
print('{} is the largest number and {} the smallest one of the vector'.format(ma, mi))

cs = s.cumsum() # running sum: each element is the sum of all elements up to that position (e.g. [1,2,3] -> [1,1+2,1+2+3] = [1,3,6])
print('This is the result of cumsum: {}'.format(cs))

cp = s.cumprod() # running product: each element is the product of all elements up to that position (e.g. [1,2,3] -> [1,1*2,1*2*3] = [1,2,6])
print('This is the result of cumprod: {}'.format(cp))

# another way to write this: instead of vector.command(), you can write np.command(vector). See example below:
ss2 = np.sum(s)
print('This is the sum of the vector elements created with the alternative code {}'.format(ss2))

In [None]:
# TIME

import time
help(time)

# There are four main functions for measuring performance
time.time() # measures wall-clock time as seconds since the epoch (a float); it can jump if the system clock is changed
# time.clock() used to serve this purpose, but it was deprecated in Python 3.3 and removed in Python 3.8
# its replacements are time.perf_counter and time.process_time
time.monotonic() # a clock that cannot go backwards, so it is not affected by changes to the system clock
time.perf_counter() # highest-resolution clock for measuring short durations; includes time spent sleeping
time.process_time() # measures the CPU time of the current process in fractional seconds; excludes time spent sleeping

<div class="alert alert-block alert-info">
Measuring time performance is possible with several functions. time.time is not preferable, because firstly, it measures wall-clock time based on the system clock, which can be changed while the code runs, and secondly, its resolution depends on the system and may be coarse.<br>
Whether [time.perf_counter or time.process_time](https://stackoverflow.com/questions/25785243/understanding-time-perf-counter-and-time-process-time) is preferable depends on the exact code. Both return fractional seconds, but time.perf_counter measures elapsed time including time spent sleeping, while time.process_time counts only the CPU time of the current process. For CPU-bound code on an otherwise idle computer, both should lead to similar conclusions when comparing performance. 
</div>