# Recap of Lecture 08: pandas

```python
import pandas as pd # great package for tabular data!
```
**Methods & functions:**
* Reading & writing data `read_csv(), .to_csv()` 
* Exploring a data set `.head(), .tail(), .describe()`
* Modifying and copying a data set `.drop(), .sort_values(), .copy()`
* Checking for missing values `.isna(), .notna()`

**Attributes** of data frames: `.index, .columns, .dtypes`

**Indexing:** `[]`, `.loc[]`

In [1]:
# importing packages for today - good practice: 
# import them AT THE BEGINNING of your notebook
import random
import time

# Lecture 09

* Recursive functions
* Binary search algorithm 
    * How to put a stop watch within your code (easily)
    * How to make your own modules
* Managing environments with conda

# Recursive functions

In [2]:
def magic_function(n):
    if n <= 1:
        return 1
    else:
        return magic_function( n-1 ) * n

In [7]:
# what the hell?
magic_function(5)

120

> show on blackboard

In [None]:
def magic_function_with_statements(n):
    print(f"calling function, n is {n}")
    if n <= 1:
        print("returning 1")
        return 1
    else:
        print(f"will now compute magic_function_with_statements({n-1})*{n}")
        return magic_function_with_statements(n-1) * n
magic_function_with_statements(5)
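
The printed trace shows that `magic_function(5)` unrolls to 5 * 4 * 3 * 2 * 1, which is the factorial of 5. A minimal sketch to check this, assuming we compare against the standard library's `math.factorial` and against a loop-based version (the name `factorial_loop` is made up for this illustration):

```python
import math

def magic_function(n):
    # same recursive definition as in the lecture
    if n <= 1:
        return 1
    return magic_function(n - 1) * n

def factorial_loop(n):
    # hypothetical loop-based equivalent: multiply 2 * 3 * ... * n
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result

# all three versions should agree for small n
for n in range(8):
    assert magic_function(n) == factorial_loop(n) == math.factorial(n)

print(magic_function(5))  # 120
```

The recursive and the loop version compute the same thing; the recursive one simply lets the function call stack remember the pending multiplications.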

## **0, 1**, 1, 2, 3, 5, 8, 13, 21, 34, ... ?

Can you come up with a rule for finding the next number?

The same rule applies for all numbers, except for the first 2 (0 and 1)

> shown in class

We wrote our own module for this!

In [18]:
# import my own module (just a python file called "fibo.py" 
# in the same location as the notebook; 
# which contains the function definition def fib(...))
# import a function from that module: "from fibo import fib"
import fibo
fibo.fib(10)

55
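
To try the whole "write a module, then import it" workflow without leaving the notebook, here is a rough sketch. It writes a tiny module file into a temporary folder and imports it in both ways mentioned in the comments above. The module name `fibo_demo` and the temporary folder are assumptions made for this example; in the lecture, the file simply sits next to the notebook.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# source code of a tiny module; "fibo_demo" is a made-up name for this sketch
module_source = '''
def fib(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fib(n - 1) + fib(n - 2)
'''

# write the module into a temporary folder and make that folder importable
module_dir = tempfile.mkdtemp()
Path(module_dir, "fibo_demo.py").write_text(module_source)
sys.path.insert(0, module_dir)
importlib.invalidate_caches()

import fibo_demo            # option 1: import the whole module
print(fibo_demo.fib(10))    # 55

from fibo_demo import fib   # option 2: import just the function
print(fib(10))              # 55
```

With option 1 you always write `fibo_demo.fib(...)`; with option 2 the name `fib` is available directly.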

In [20]:
from fibo import fib  # import the function itself, so we can call fib(...) directly
for i in range(11):
    print(f"fib {i} is: {fib(i)}")

fib 0 is: 0
fib 1 is: 1
fib 2 is: 1
fib 3 is: 2
fib 4 is: 3
fib 5 is: 5
fib 6 is: 8
fib 7 is: 13
fib 8 is: 21
fib 9 is: 34
fib 10 is: 55


In [23]:
# try for different values of n:
fib(40)

102334155

<p style="text-align:left;">
    <img src="images/fibo.png" alt="The Fibonacci sequence" width=1000px>
</p>

[image credit & more on fibonacci](https://tecadmin.net/what-is-fibonacci-sequence/)

In [None]:
# Recursive function that returns the n-th number
# from the fibonacci sequence: 
# (this is the code from the fibo.py script)
def fib(n):
    # base case:
    if n == 1:
        return 1
    elif n == 0:
        return 0
    else:
        return fib(n-1) + fib(n-2)

In [None]:
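# One problem with recursive functions: they can get very slow, because fib(n)
# recomputes the same values again and again; very deep recursion also hits
# Python's recursion limit (each pending call takes up memory)
# "Memoization" >> see Exercise 09!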

***
# Binary search

> cf. slides: 09a_binarysearch

# Break until 11:05

***

# Takeaway 1: searching a sorted list is much faster!

# Takeaway 2: binary search is much faster than checking every item!

But **how much** faster?

```python
import time # import the time module
time.time() # gives you the time RIGHT NOW in seconds
# time.time_ns() # if you want to be VERY precise
```

**fun fact:** When did [(Unix) time](https://en.wikipedia.org/wiki/Unix_time) start? January 1st, 1970!

In [27]:
# we imported the time module at the beginning of the notebook
# use the time.time() function twice:
# run the cell once with the next line active, then comment it out and run again
# time1 = time.time()
time2 = time.time()
# compute the difference
print("Time that passed in seconds: ", time2 - time1)

Time that passed in seconds:  16.52408790588379


In [42]:
# time.time() gives you seconds, as float; to avoid float imprecisions:
# time.time_ns() gives you nanoseconds, as integer
time1 = time.time_ns()
time2 = time.time_ns()
# compute the difference
print("Time that passed in NANOseconds: ", time2 - time1)

Time that passed in NANOseconds:  25000
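
Besides `time.time()`, the `time` module also offers `time.perf_counter()`, a clock meant for measuring short durations. A small sketch of a reusable stop watch built on it; the helper name `stopwatch` is made up for this example:

```python
import time

def stopwatch(func, *args):
    # hypothetical helper: run func(*args) once and
    # return the result together with the elapsed seconds
    t_start = time.perf_counter()
    result = func(*args)
    t_finish = time.perf_counter()
    return result, t_finish - t_start

total, elapsed = stopwatch(sum, range(10**6))
print(f"sum = {total}, took {elapsed:.6f} seconds")
```

The pattern is the same as before (time before, time after, difference); wrapping it in a function just saves retyping it for every measurement.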


In [36]:
# import my own module (just a python file in the same location; 
# which contains the function definition def binary_search(...))
import my_binary_search
?my_binary_search.binary_search
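
The source of `my_binary_search.py` is not shown in this notebook. As a rough idea of what such a function could look like, here is a minimal sketch. It assumes the same argument order as the calls in this notebook (target first, sorted list second), a loop instead of recursion, and a return value of the index or `None`; the actual module from class may differ in all of these.

```python
def binary_search_sketch(target, sorted_items):
    """Return an index of target in sorted_items, or None if it is absent."""
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2          # middle of the remaining range
        if sorted_items[mid] == target:
            return mid
        elif sorted_items[mid] < target:
            low = mid + 1                # target can only be in the upper half
        else:
            high = mid - 1               # target can only be in the lower half
    return None

numbers = [61, 140, 308, 601, 625, 691, 857, 891, 966, 984]
print(binary_search_sketch(625, numbers))    # 4
print(binary_search_sketch(12345, numbers))  # None
```

Every pass through the loop cuts the remaining range in half, which is why the list has to be sorted first.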

In [37]:
# generate a long list of random numbers
many_numbers = random.choices(range(10**9), k=10**7)
# sort the list
many_numbers.sort()
# show the first couple of items on the list
many_numbers[0:10]

[61, 140, 308, 601, 625, 691, 857, 891, 966, 984]

In [38]:
# Iterative search
my_number = 12345
t_start = time.time()
for item in many_numbers:
    if item==my_number:
        print("Found it!")
        break
t_finish = time.time()
t_iterative = t_finish - t_start
print("Iterative search: ", t_iterative, "seconds")
print(f"In nanoseconds: {t_iterative*10**9}")

Iterative search:  1.3899939060211182 seconds
In nanoseconds: 1389993906.0211182


In [39]:
t_start = time.time()
my_number in many_numbers
t_finish = time.time()
t_keyword = t_finish - t_start
print("Keyword search:", t_keyword, "seconds")
print(f"In nanoseconds: {t_keyword*10**9}")

Keyword search: 1.4055328369140625 seconds
In nanoseconds: 1405532836.9140625


In [43]:
t_start = time.time()
my_binary_search.binary_search(my_number, many_numbers)
t_finish = time.time()
t_binary = t_finish - t_start
print("Binary search: ", t_binary, "seconds")
print(f"In nanoseconds: {t_binary*10**9}")

Binary search:  5.888938903808594e-05 seconds
In nanoseconds: 58889.38903808594
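
Python's standard library also ships a binary search in the `bisect` module. A short sketch of a membership test for sorted lists built on `bisect.bisect_left`; the helper name `contains_sorted` is an assumption for this example and not part of the lecture's module:

```python
import bisect

def contains_sorted(sorted_items, target):
    # bisect_left returns the leftmost position where target could be
    # inserted while keeping the list sorted
    i = bisect.bisect_left(sorted_items, target)
    return i < len(sorted_items) and sorted_items[i] == target

numbers = [61, 140, 308, 601, 625, 691, 857, 891, 966, 984]
print(contains_sorted(numbers, 857))    # True
print(contains_sorted(numbers, 12345))  # False
```

Like any binary search, this only gives correct answers if the list is sorted.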


In [46]:
# compare the execution times
# t_iterative / t_keyword # almost equally long
# t_keyword / t_binary # 18.000 times faster
t_iterative / t_binary # almost 20.000 times faster

23603.46963562753
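
Where does such a large factor come from? A back-of-the-envelope sketch, counting comparisons instead of seconds. It assumes the worst case for the item-by-item search, i.e. the number is not in the list (no "Found it!" appears in the outputs above), so every item gets checked.

```python
import math

n = 10**7  # length of many_numbers in this notebook

# linear search: in the worst case every item is compared once
linear_steps = n

# binary search: each comparison halves the remaining range,
# so at most floor(log2(n)) + 1 comparisons are needed
binary_steps = math.floor(math.log2(n)) + 1

print(linear_steps, binary_steps)
```

The ratio of measured times will not match the ratio of step counts exactly, since a single binary search step does more work than a single loop step and very short timings are noisy, but the order of magnitude shows why binary search wins so clearly.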

# Cell magic in Jupyter notebook

[read more here](https://ipython.readthedocs.io/en/stable/interactive/magics.html#)

(You already know the line magic `%run` for running python files)

Cell magic: needs to go in the very first line of the cell.

`%%time` gives you the run time of the entire cell (one single run)

`%%timeit` runs the entire cell several times and reports the mean ± standard deviation

In [47]:
%%time
for item in many_numbers:
    if item==my_number:
        print("Found it!")
        break

CPU times: user 1.46 s, sys: 9.92 ms, total: 1.47 s
Wall time: 1.47 s


In [48]:
%%timeit
my_number in many_numbers

1.39 s ± 8.64 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [49]:
%%timeit
my_binary_search.binary_search(my_number, many_numbers)

2.18 µs ± 24.2 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
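
`%%timeit` only works in Jupyter/IPython. In a plain `.py` script, the standard library's `timeit` module does a similar job. A minimal sketch, using a small made-up list instead of `many_numbers` so that it runs quickly:

```python
import timeit

numbers = list(range(100_000))  # small stand-in for many_numbers

# number=... sets how often the statement is executed; the return value
# is the TOTAL time in seconds for all executions
runs = 20
t_total = timeit.timeit(lambda: 99_999 in numbers, number=runs)
print(f"average per run: {t_total / runs:.6f} seconds")
```

Dividing the total by the number of runs gives an average per run, comparable to the mean that `%%timeit` reports.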


# Takeaways:
* **Writing your own modules:** put your function definitions in a `.py` file in the same folder as the notebook and import it with `import filename` (without the `.py`)
* Binary search algorithm: simple, FAST, and you can implement it yourself!
* Timing code, option 1: from the `time` module, `time.time()` (CURRENT TIME in seconds before & after code)
* Timing code, option 2: `%%time` / `%%timeit` (times entire cell), just for Jupyter notebook

***
# Conda

> cf. slides 09b_environments + CLI commands shown in class