# Lecture 3 - *is_prime()*, debugging, benchmarking and counting

## `is_prime` - continued from last week
We can achieve faster processing times by not processing what we don't have to.

* 1 should always be skipped, because every number is divisible by 1
* the number itself should be skipped, because every number is divisible by itself
* all numbers that are greater than half of the value should be skipped
* early termination: once a divisor is found (for example, the number is divisible by 3), there's no need to check the remaining numbers. This will not confirm a prime any faster, but it will reject a non-prime much sooner
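
To see why candidates above half of the value can be skipped, the following minimal sketch lists the divisors of a number by brute force. The helper name `proper_divisors` is an assumption made for illustration; it is not one of the lecture's functions.

```python
def proper_divisors(n):
    """Return every divisor of n other than 1 and n itself (hypothetical helper)."""
    return [d for d in range(2, n) if n % d == 0]

divisors = proper_divisors(30)
print(divisors)                    # [2, 3, 5, 6, 10, 15]

# The largest divisor never exceeds half of the value.
print(max(divisors) <= 30 // 2)    # True
```

Nothing between 16 and 29 divides 30, so testing those candidates is wasted work.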

Let's take the latest function we wrote in the previous notebook and name it `is_prime_v1()`.

In [1]:
def is_prime_v1(left):  
    right_numbers = range(2, left - 1)
    output = 0
    for right in right_numbers:
        result = left % right

        if result == 0:  # compare values with ==, `is` checks identity
            output += 1

    return output == 0

In [2]:
is_prime_v1(15485863)

True

Let's start improving the function

In [3]:
def is_prime_v2(left):
    left_half  = left // 2 # <-- changed, integer division

    right_numbers = range(2, left_half) # <-- changed, test divisibility up to half of the numbers
    output = 0
    
    for right in right_numbers:
        result = left % right

        if result == 0:
            output += 1

    return output == 0


In [4]:
is_prime_v2(15485863)

True

Let's make one more improvement. We can skip even numbers while generating the candidates. The `range()` function accepts a `step` as its third argument.

In [5]:
list(range(3,10,2))

[3, 5, 7, 9]
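
As a quick reminder, `range(start, stop, step)` never includes `stop` itself, and it is empty when `start` is not smaller than `stop` (for a positive step). A small sketch of both cases, with values chosen only for illustration:

```python
# The stop value is exclusive: 9 is not generated here.
odd_numbers = list(range(3, 9, 2))
print(odd_numbers)        # [3, 5, 7]

# When start >= stop the range is empty,
# so a for loop over it runs zero times.
empty = list(range(3, 3, 2))
print(empty)              # []
print(len(range(3, 1, 2)))  # 0
```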

In [6]:
def is_prime_v3(left):
    left_half  = left // 2

    right_numbers = range(3, left_half, 2) # <-- changed, generate odd numbers only (minor problem about 2)
    output = 0
    
    for right in right_numbers:
        result = left % right

        if result == 0:
            output += 1

    return output == 0

In [7]:
%time is_prime_v3(15485863)

CPU times: user 314 ms, sys: 0 ns, total: 314 ms
Wall time: 313 ms


True

In [8]:
is_prime_v3(15485863)

True

In [9]:
def is_prime_v3_1(left):
    left_half  = left // 2

    right_numbers = range(3, left_half, 2) 
    output = 0
    
    for right in right_numbers:
        result = left % right

        if result == 0:
            output += 1

        if output > 0:
            break       # <-- changed, if output is above 0, then stop the for loop

    return output == 0

In [10]:
%time is_prime_v3_1(15485869)

CPU times: user 7 µs, sys: 1 µs, total: 8 µs
Wall time: 9.78 µs


False

In [11]:
def is_prime_v3_2(left):
    left_half  = left // 2

    right_numbers = range(3, left_half, 2) 
    output = 0
    
    for right in right_numbers:
        result = left % right

        if result == 0:
            output += 1

        if output > 0:
            return False       # <-- changed, if output is above 0, then
                               # return False and ignore rest of the steps in function
    return output == 0

In [12]:
%time is_prime_v3_2(15485869)

CPU times: user 10 µs, sys: 0 ns, total: 10 µs
Wall time: 14.5 µs


False

In [13]:
is_prime_v3_1(1)

True

How can you fix the minor problem about 2? Let's use a similar approach to the early termination. If the number to be checked is 2, then the function should return `True` and exit.

In [14]:
def is_prime_v4(left):
    if left == 2:           # <-- changed, if number to be checked is 2
        return True         # then return True and ignore rest of the steps in function
    left_half  = left // 2

    right_numbers = range(3, left_half, 2) 
    output = 0
    
    for right in right_numbers:
        result = left % right

        if result == 0:
            output += 1

        if output > 0:
            return False

    return output == 0

> Adding one more `if` statement to gain speed should be done with caution, because the interpreter has to evaluate every `if` it reaches. A single check at the top of a function costs very little, but an `if` placed inside a long loop is evaluated on every iteration and can introduce a noticeable penalty.
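
The cost of a check inside a loop can be measured directly. The sketch below is only an illustration: the two functions `sum_plain` and `sum_with_check` are hypothetical, and the timing uses `time.perf_counter` from the standard library. Actual numbers depend on the machine, so none are shown.

```python
import time

def sum_plain(n):
    """Add up 0..n-1 with no extra check inside the loop."""
    total = 0
    for i in range(n):
        total += i
    return total

def sum_with_check(n):
    """Same sum, but with an additional `if` evaluated on every iteration."""
    total = 0
    for i in range(n):
        if i < 0:       # never true here; it only adds per-iteration work
            break
        total += i
    return total

start = time.perf_counter()
sum_plain(1_000_000)
plain_time = time.perf_counter() - start

start = time.perf_counter()
sum_with_check(1_000_000)
check_time = time.perf_counter() - start

print(f"without if: {plain_time:.4f} s, with if: {check_time:.4f} s")
```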

## Benchmarking

We already used the `%time` magic to measure how long a statement takes to run. If a function runs very quickly, e.g. on the nanosecond (`ns`) scale, a single `%time` measurement won't help us measure the run time reliably. For that there is another magic command, `%timeit`, which runs the statement many times (it chooses the number of loops automatically, as the outputs below show) and reports the mean run time along with the standard deviation.

In [15]:
%timeit is_prime_v1(511)

36.2 µs ± 684 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)


In [16]:
%timeit is_prime_v2(511)

16.3 µs ± 70.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [17]:
%timeit is_prime_v3(511)

8.68 µs ± 43.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


In [18]:
%timeit is_prime_v4(511)

792 ns ± 13 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
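
Outside a notebook, where the `%timeit` magic is not available, the standard-library `timeit` module can be used for a similar measurement. The following is a minimal sketch; `count_divisors` is a hypothetical stand-in for the function being measured, and the loop counts are arbitrary choices. Unlike `%timeit`, this sketch reports the best run instead of the mean.

```python
import timeit

def count_divisors(n):
    """Hypothetical stand-in for the function being measured."""
    return sum(1 for d in range(1, n + 1) if n % d == 0)

# Run the call 1000 times per measurement and repeat the measurement 5 times.
runs = timeit.repeat(lambda: count_divisors(511), number=1000, repeat=5)

# Each entry is the total time for 1000 calls; divide to get the time per call.
best_per_call = min(runs) / 1000
print(f"best of 5: {best_per_call * 1e6:.2f} µs per call")
```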


## Debugging

This is an essential and inevitable part of coding. When we make a syntax error, the interpreter tells us the line number and the type of the error. Apart from those, we can also make logical mistakes, which do not generate an error message but produce an unexpected result. Even worse, the code might fail silently, without any indication.

In those cases, we have a "bug" in our code and we have to debug it.  

There are a couple of ways to debug code. A simple approach is to add `print()` statements to the code to check whether a variable has the expected value at a certain step.
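
As a minimal sketch of this approach, the hypothetical function `count_even` below prints its variables on every pass through the loop, so each step can be compared against what we expect:

```python
def count_even(numbers):
    """Count the even values in a list, printing the state at each step."""
    counter = 0
    for number in numbers:
        remainder = number % 2
        # Temporary debugging output: show what the loop sees on each pass.
        print(f"number={number} remainder={remainder} counter={counter}")
        if remainder == 0:
            counter += 1
    return counter

print(count_even([3, 4, 7, 10]))
```

Once the bug is found, the `print()` lines should be removed again.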

Another way is to use a "variable inspector", which shows the current state of the variables at each step of the code. http://pythontutor.com can be used for that purpose. We'll be using that site to debug a function.

## Counting

When there's a loop, there's usually a counter. A counter is usually used to direct the flow of the program. For example, "exit the loop when a certain number is reached" can be achieved with a counter.

In many languages you'll see such lines:

```
counter++
```

This is the short form of `counter = counter + 1`. Python does not support that notation; its short form of the increment operation is `counter += 1`.
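
A minimal sketch of a counter directing the flow of a loop: it stops as soon as three multiples of 7 have been found. The variable names and the numbers are chosen only for illustration.

```python
found = []      # multiples of 7 collected so far
counter = 0     # how many multiples have been found

for number in range(1, 100):
    if number % 7 == 0:
        found.append(number)
        counter += 1        # short form of counter = counter + 1
    if counter == 3:        # the counter decides when the loop ends
        break

print(found)    # [7, 14, 21]
```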

Let's count how many prime numbers smaller than 20 are found:

In [19]:
counter = 0
for i in range(2,20):
    if is_prime_v4(i):
        counter+=1
print(counter)

14


That's far more than expected: there are only 8 primes below 20. Let's see what is going on by checking what the function says for each number up to 20.

In [20]:
for i in range(2,20):
    print(i,is_prime_v4(i))

2 True
3 True
4 True
5 True
6 True
7 True
8 True
9 False
10 True
11 True
12 False
13 True
14 True
15 False
16 True
17 True
18 False
19 True


In [21]:
is_prime_v1(4)

False

It is time for debugging. Please head over to http://pythontutor.com and debug the `is_prime_v4()` function.

## Exercise

1. Twin primes are [defined](https://en.wikipedia.org/wiki/Twin_prime) as "a prime number that is either 2 less or 2 more than another prime number". Please write code that returns the twin primes up to 1000. Count the pairs of primes and check online whether you get the correct count.