# COMP 364: Advanced Python Structures

Today we will briefly cover some intermediate-to-advanced Python concepts.

Topics:

* iterators and generators
* context managers
* decorators



Knowledge of these concepts will take your Python to a more advanced level.


## Lazy lists: generators

Remember: Any data Python computes during runtime is stored as an object in **memory** (RAM).

**Not to be confused with long-term storage.**

Objects take up **space**, just like books on a bookshelf.

In general, we want to keep the memory footprint of our programs to a minimum.

Misuse of memory can cause:

1) Program and system slowdowns
2) Full program crashes (`MemoryError`)

In today's world of "Big Data", memory usage is a growing concern.

### Problem

A typical computer has roughly 4-16 GB of memory.

What happens when you need to process a 50 GB data file?

Generators to the rescue!

### Pre-computing vs lazy computing

Under the hood, we've been dealing with lazy objects all along (for example `range` and open files) but thinking of them as lists.

A **list** is a pre-computed container object because all its values exist in memory all at once.


In [None]:
# this is a list of the first 10 numbers 

nums = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

We can access each element directly as they are all stored in memory.

In [None]:
nums[6]

I can also create this list with a loop.

In [None]:
def firstn(n):
    nums = []
    count = 0
    while count < n:
        nums.append(count)
        count += 1
    return nums

In [None]:
nums = firstn(10)
print(nums)

But what if we don't need all the numbers stored all at once?

If we know the rule for computing the next number in the sequence, we can do no work up front and produce the next number only when asked for it.

This is what **generator** functions do.

**Generator** functions return **generator** objects and can **yield** values during their execution.

In this case, the rule is "the next number is the previous number + 1"

In [None]:
def firstn_gen(n):
    count = 0
    while count < n:
        yield count
        count += 1

In [None]:
nums = firstn_gen(10)

In [None]:
print(nums)

Ok so we got a `generator` object instead of a list.
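To get a rough feel for the difference in footprint, here is a sketch using `sys.getsizeof`. Note that `getsizeof` only reports the size of the container itself (not the integers it refers to), so the list figure is an underestimate. The value of `n` and the variable names are arbitrary choices for this example.

```python
import sys

def firstn(n):
    """Build the whole list up front."""
    nums = []
    count = 0
    while count < n:
        nums.append(count)
        count += 1
    return nums

def firstn_gen(n):
    """Yield the same numbers one at a time."""
    count = 0
    while count < n:
        yield count
        count += 1

n = 1_000_000  # arbitrary size chosen for illustration
as_list = firstn(n)
as_gen = firstn_gen(n)

list_bytes = sys.getsizeof(as_list)  # grows with n
gen_bytes = sys.getsizeof(as_gen)    # small, does not depend on n

print(f"list:      {list_bytes} bytes")
print(f"generator: {gen_bytes} bytes")
```

The exact numbers depend on the Python version, but the generator's size stays the same no matter how large `n` is.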

`generator` objects have a method called `__next__()` that returns the next item in the sequence.

In [None]:
nums.__next__()

If we put a generator in a `for` loop, Python repeatedly calls `__next__()` for you until it runs out of items.

In [None]:
nums = firstn_gen(10)
for i in nums:
    print(i)
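Roughly, the `for` loop above behaves like the following sketch. The built-in `next(obj)` is the usual way to call `obj.__next__()`, and a generator signals that it has run out of items by raising `StopIteration`. Strictly speaking, `for` first calls `iter()` on the object, which for a generator returns the generator itself. `firstn_gen` is repeated here only so the block runs on its own.

```python
def firstn_gen(n):
    count = 0
    while count < n:
        yield count
        count += 1

gen = firstn_gen(3)
collected = []
while True:
    try:
        item = next(gen)      # same as gen.__next__()
    except StopIteration:     # raised when the generator is finished
        break
    collected.append(item)

print(collected)  # [0, 1, 2]
```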

Note: once we have pulled all the items out of a generator, we can't reuse it.

In [None]:
for i in nums:
    print(i)
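A quick way to check this: turning an exhausted generator into a list gives an empty list. To iterate again, call the generator function again to get a fresh generator object. This is a small sketch; the variable names are made up for the example.

```python
def firstn_gen(n):
    count = 0
    while count < n:
        yield count
        count += 1

gen = firstn_gen(3)
first_pass = list(gen)       # consumes every item
second_pass = list(gen)      # nothing left to give

fresh = list(firstn_gen(3))  # a new generator object starts from the top
print(first_pass, second_pass, fresh)
```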

Okay back to the funny `yield` statement.

In [None]:
def firstn_gen(n):
    count = 0
    print("before loop")
    while count < n:
        print("yielding number")
        yield count
        print("i'm back. incrementing count")
        count += 1

In [None]:
nums = firstn_gen(10)

`nums` is a generator and we can call its `__next__` method.

Generators behave like functions except their execution can be interrupted and re-started.

When we reach a `yield` statement, the function pauses, hands the yielded value back to the caller, and keeps all of its variable assignments.

With a normal function, once the function returns, all of its local variables are lost.

`__next__` executes the function code. 

If it's the first time `__next__` is called, execution begins at the top of the definition and stops when `yield` is reached.

Subsequent calls resume after the `yield`.

In [None]:
#first call
nums.__next__()

In [None]:
#second call, resumes after 'yield'
nums.__next__()

Because the generator remembers its current state, we can continue with the rest of the numbers with a for loop.

In [None]:
for n in nums:
    print(n)
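The pause-and-resume order can also be recorded instead of printed. This sketch uses a hypothetical generator `traced_gen` that appends to a list named `events` (both names are invented for this example), so the interleaving of caller and generator shows up in one place.

```python
events = []

def traced_gen(n):
    events.append("start")              # runs on the first next() only
    count = 0
    while count < n:
        events.append(f"yield {count}")
        yield count                     # pause here; locals are kept
        events.append(f"resume after {count}")
        count += 1
    events.append("done")

g = traced_gen(2)
events.append("created")                # no generator code has run yet
next(g)
events.append("caller got 0")
next(g)
events.append("caller got 1")

for event in events:
    print(event)
```

Notice that `"created"` is recorded before `"start"`: calling a generator function only builds the generator object, and the body does not run until the first `next`.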

### Handling large data files with generators

Often we want to read from a very large file and do something with each line but we don't need the whole file loaded all at once.

I [downloaded](https://www.kaggle.com/stackoverflow/pythonquestions/data) all the questions about Python on Stack Overflow, a website I'm sure you're familiar with by now.

In [2]:
filepath = "Questions.csv"

Here is the wrong way: read every line into a list at once.

In [3]:
def load_data_list(path, encoding="utf-8"):
    file_handle = open(path, "r", encoding=encoding)
    file_lines = file_handle.readlines()
    file_handle.close()
    return file_lines

In [4]:
# this file is encoded as "latin-1" rather than UTF-8, so I have to specify that. don't worry about file encodings.
questions = load_data_list(filepath, encoding="latin-1")

We can use `memory_profiler` (doesn't work in Notebooks) to see how much memory this uses.

In [None]:
print(questions[:10])

In [None]:
type(questions)
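As an alternative that needs no extra package, the standard library's `tracemalloc` module can give a rough comparison. This sketch uses made-up in-memory lines rather than `Questions.csv`, and the helper names `make_lines` and `peak_bytes` are assumptions for illustration. Absolute numbers will vary by machine and Python version.

```python
import tracemalloc

def make_lines(n):
    """Yield n fake lines of text, one at a time."""
    for i in range(n):
        yield f"row {i}, some text\n"

def peak_bytes(work):
    """Run work() and return the peak memory traced during the call."""
    tracemalloc.start()
    work()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak

n = 100_000  # arbitrary size chosen for illustration

def eager():
    lines = list(make_lines(n))          # every line in memory at once
    return sum(len(line) for line in lines)

def lazy():
    return sum(len(line) for line in make_lines(n))  # one line at a time

peak_eager = peak_bytes(eager)
peak_lazy = peak_bytes(lazy)
print(f"eager peak: {peak_eager} bytes")
print(f"lazy peak:  {peak_lazy} bytes")
```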

Now let's lazily read the lines from the file.

In [5]:
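def lazy_read(filepath, lines, encoding="utf-8"):
    file_handle = open(filepath, "r", encoding=encoding)
    try:
        line_count = 0
        while line_count < lines:
            line = file_handle.readline()
            if not line:  # an empty string means end of file
                break
            yield line
            line_count += 1
    finally:
        # runs even if the caller stops early and the generator is closed
        file_handle.close()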

In [6]:
g = lazy_read(filepath, 10, encoding="latin-1")

In [7]:
for line in g:
    print(line)

Id,OwnerUserId,CreationDate,Score,Title,Body

469,147,2008-08-02T15:11:16Z,21,How can I find the full path to a font from its display name on a Mac?,"<p>I am using the Photoshop's javascript API to find the fonts in a given PSD.</p>



<p>Given a font name returned by the API, I want to find the actual physical font file that that font name corresponds to on the disc.</p>



<p>This is all happening in a python program running on OSX so I guess I'm looking for one of:</p>



<ul>

<li>Some Photoshop javascript</li>

<li>A Python function</li>



Looping over a file object reads one line at a time in the same lazy way, and opening the file using `with open()` also closes it for you.
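As a sketch, the lazy reader can be written with `with open()`: a file object is itself a lazy iterator over its lines, so a generator can loop over it inside a `with` block and let Python close the file. The function name `lazy_lines` and the temporary demo file are assumptions made for this example, and `itertools.islice` limits how many lines are taken.

```python
import itertools
import os
import tempfile

def lazy_lines(path, encoding="utf-8"):
    """Yield lines one at a time; the file closes when the generator finishes or is closed."""
    with open(path, "r", encoding=encoding) as file_handle:
        for line in file_handle:      # reads one line per iteration
            yield line

# Build a small throwaway file so the example runs anywhere.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, encoding="utf-8") as tmp:
    tmp.write("Id,Title\n1,first\n2,second\n3,third\n")
    demo_path = tmp.name

gen = lazy_lines(demo_path)
first_two = list(itertools.islice(gen, 2))
gen.close()   # we stopped early, so release the file now
print(first_two)
os.remove(demo_path)
```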

### Generator comprehensions

Just like list comprehensions, we can create generator comprehensions.

Generator comprehensions (officially called generator expressions) look exactly like list comprehensions except they use round brackets `()` instead of square brackets `[]`.

In [12]:
evens = (x for x in range(100) if x % 2 == 0)

This is roughly equivalent to:

In [13]:
def evens():
    x = 0
    while x < 100:
        if x % 2 == 0:
            yield x
        x += 1
e = evens()
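One way to check the equivalence, and to see a generator expression handed straight to a function such as `sum`. The names `evens_expr` and `evens_func` are chosen here to avoid using `evens` for two different things.

```python
evens_expr = (x for x in range(100) if x % 2 == 0)

def evens_func():
    x = 0
    while x < 100:
        if x % 2 == 0:
            yield x
        x += 1

same = list(evens_expr) == list(evens_func())
print(same)   # True

# When a generator expression is the only argument, the extra parentheses can be dropped.
total = sum(x for x in range(100) if x % 2 == 0)
print(total)  # 2450
```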

Generators can describe an infinite sequence of values while requiring almost no memory, because only the current state is stored!

Example: [Fibonacci](https://en.wikipedia.org/wiki/Fibonacci_number) numbers

$F_n = F_{n-1} + F_{n-2}$ where $F_0 = 0$ and $F_1 = 1$

e.g. $0, 1, 1, 2, 3, 5, 8, 13, 21, ...$

![](http://seyferseed.ru/wp-content/uploads/2017/03/Fibonacci-Spiral.png)

![](https://qph.ec.quoracdn.net/main-qimg-0281d782e4ec471ce2d5091d2c40f1b5-c)

In [19]:
def fib():
    a = 0
    b = 1
    while True:
        yield a
        a, b = b, a + b

In [21]:
f = fib()
for i in range(10):
    print(next(f))

0
1
1
2
3
5
8
13
21
34


So now whenever we need the next Fibonacci number, we can just call `next` on our generator object, with practically nothing stored in memory.
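Because `fib()` never ends, a plain `for` loop over it would run forever. One common way to take a bounded number of items is `itertools.islice` from the standard library. This is a sketch, with `fib` repeated so the block runs on its own.

```python
from itertools import islice

def fib():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

first_ten = list(islice(fib(), 10))  # stop after 10 items
print(first_ten)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```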


## Wrapping functions: Decorators


In the spirit of Christmas, let's talk about **wrapping** functions and **decorators**.

![](http://mybbaddict.com/wp-content/uploads/2017/10/delightful-decoration-snoopy-christmas-tree-274-best-peanuts-images-on-pinterest-charlie.jpg)

First let's do a little recap on functions.

**Functions are objects**

Just like other objects, functions can be:

* passed as arguments
* bound to names
* returned


In [29]:
def foo(x):
    return x
f = foo
b = foo
f(5)
b(10)

10

In [28]:
def caller(func, arg):
    return f"calling {func.__name__} with input {arg} result is: {func(arg)}"

def is_even(num):
    return not num % 2

def is_odd(num):
    return bool(num % 2)

print(caller(is_even, 5))
print(caller(is_odd, 5))

calling is_even with input 5 result is: False
calling is_odd with input 5 result is: True


We can also define functions inside other functions and return them.

In [37]:
def foo():
    def boo():
        return 5
    return boo

five = foo()
five()

5
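An inner function also remembers variables from the call that created it (this is known as a closure), which is what a function-returning function relies on when it needs to keep hold of its inputs. A small sketch with a hypothetical `make_adder`:

```python
def make_adder(n):
    def add(x):
        return x + n   # n is remembered from the enclosing call
    return add

add_five = make_adder(5)
add_ten = make_adder(10)
print(add_five(1), add_ten(1))  # 6 11
```

Each call to `make_adder` produces a separate `add` function with its own remembered `n`.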

What if I want to time how long a function takes to run?

In [30]:
import time
import random

# this function sleeps for 0 to 4 seconds (randrange(5) excludes 5)
def foo():
    print("doing some stuff..")
    time.sleep(random.randrange(5))


In [32]:
start = time.time() #get current time
foo()
print(f"time elapsed: {time.time() - start} seconds")

doing some stuff..
time elapsed: 3.0057389736175537 seconds


That's nice but what happens next time I want to time a different function?

I have to write the same code again... repetitive code is no good.

In [33]:
def boo():
    print("doing some other stuff")
    time.sleep(random.randrange(3))
start = time.time()
boo()
print(f"time elapsed: {time.time() - start} seconds")

doing some other stuff
time elapsed: 0.0006089210510253906 seconds


What if I want to make it so that I can automatically modify any function with timing functionality?

This is where decorators come in.

You can think of decorators as functions that create functions but with some useful "decorations".

Since we know that we can return functions, why don't we write a function that takes a function as input and returns a decorated version of it?

In [44]:
def timer(func):
    def wrapped():
        start = time.time()
        func()
        print(f"{func.__name__} took: {time.time() - start} seconds")
    #return the decorated function
    return wrapped

In [45]:
def stuff():
    print("doing some stuff..")
    time.sleep(random.randrange(3))

Let's transform our boring `stuff` function into a decorated version of itself.

In [46]:
timed_stuff = timer(stuff)

Now we can use the improved version of our function.

In [47]:
timed_stuff()

doing some stuff..
stuff took: 0.00034308433532714844 seconds


And we can use the decorator to transform any function (**that takes no arguments**) without having to re-write any code.

In [48]:
def other_stuff():
    print("doing some other stuff..")
    time.sleep(random.randrange(3))


In [52]:
timed_other_stuff = timer(other_stuff)
timed_other_stuff()

doing some other stuff..
other_stuff took: 0.003927707672119141 seconds
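To lift the no-arguments restriction, the wrapper can accept `*args` and `**kwargs`, pass them through, and return the result. This is a sketch rather than the version used in the cells here: `timer_any` and `add` are names invented for the example, and `time.perf_counter` is swapped in for `time.time` because it is intended for measuring short durations.

```python
import time

def timer_any(func):
    def wrapped(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)   # forward whatever the caller passed
        elapsed = time.perf_counter() - start
        print(f"{func.__name__} took: {elapsed} seconds")
        return result                    # hand the original return value back
    return wrapped

def add(a, b=0):
    return a + b

timed_add = timer_any(add)
print(timed_add(2, b=3))
```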


### The @ syntax

Python makes using decorators a little easier with the `@` syntax.

Instead of creating a new function explicitly, we can just put `@decoratorname` on the line before the definition of any function we want to decorate.

In [63]:
@timer
def fibonaccis():
    f = fib()
    max_fib = 10
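    # enumerate already advances i, so no manual increment is needed
    for i, num in enumerate(f):
        print(num)
        if i > max_fib:  # stops after printing index max_fib + 1 (12 numbers)
            break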

In [64]:
fibonaccis()

0
1
1
2
3
5
8
13
21
34
55
89
fibonaccis took: 0.0017268657684326172 seconds
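Writing `@decoratorname` above a definition has the same effect as calling the decorator on the function and rebinding the result. The sketch below shows both spellings side by side. The decorator `shout` and the functions `greet` and `greet_plain` are hypothetical names for this example, and `functools.wraps` is used so the wrapper keeps the original function's name.

```python
import functools

def shout(func):
    """Hypothetical decorator: upper-case the wrapped function's return value."""
    @functools.wraps(func)            # keep func's name and docstring on the wrapper
    def wrapped(*args, **kwargs):
        return func(*args, **kwargs).upper()
    return wrapped

@shout
def greet(name):
    return f"hello {name}"

def greet_plain(name):
    return f"hello {name}"

greet_manual = shout(greet_plain)     # what @shout does, spelled out

print(greet("ada"), greet_manual("ada"), greet.__name__)
```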


Another good use of decorators is argument type checking. (source: https://www.python-course.eu/python3_decorators.php)
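A minimal sketch of that idea, not the code from the linked page: a hypothetical `require_int` decorator that raises `TypeError` unless the single argument is an `int`. The function `is_even_checked` is also invented for the example.

```python
import functools

def require_int(func):
    @functools.wraps(func)
    def wrapped(x):
        # bool is a subclass of int, so reject it explicitly
        if not isinstance(x, int) or isinstance(x, bool):
            raise TypeError(f"{func.__name__} expects an int, got {type(x).__name__}")
        return func(x)
    return wrapped

@require_int
def is_even_checked(num):
    return num % 2 == 0

print(is_even_checked(4))
try:
    is_even_checked("4")
except TypeError as err:
    print(err)
```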

## Context Managers