### Advanced Python

### Classes

We have seen the basic built-in data types of Python, and you have probably noticed a few other types, such as the linear regression object from statsmodels and the NumPy array.

Python is an object-oriented language. We work with objects, which are instances of types (classes); objects interact with each other and can carry data, known as attributes. Compare this with languages such as R or Scala, which lean towards a more functional style: there we have data, and we modify it with functions.

This gets a bit theoretical, so let's use Pac-Man as an example. In Pac-Man, we can say there is a class of ghosts, and each ghost (Inky, Blinky, Pinky and Clyde) is an instance of that class. They share general attributes, such as wanting to chase Pac-Man and having a similar appearance and general behaviour. However, they differ in other attributes, such as colour, speed and specific behaviour.

If we treat these ghosts as objects, it is easier to talk about them and how they work than if they were arrays updated by functions in every frame. However, if we want to treat their behaviour strictly mathematically, it is easier to treat them functionally.

Recall that some of the methods we have seen are specific to certain types of object:

In [None]:
x = [1,2,3,4]
print(x.pop())
x = 'this is a string!'
print(x.pop())  # raises AttributeError: 'str' object has no attribute 'pop'

How is this implemented? There is a class, `list`, of which `x` is an instance. The `list` class implements a method, `pop`, which works on any instance of that class. The `str` class has no such method, so the second call fails with an `AttributeError`.
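A quick way to check which methods a type provides is the built-in `hasattr` function. The snippet below is a small illustrative sketch, not part of the original example:

```python
# Check whether each built-in type defines a `pop` method
list_has_pop = hasattr(list, 'pop')
str_has_pop = hasattr(str, 'pop')

print(list_has_pop)  # True
print(str_has_pop)   # False

# Calling a missing method raises AttributeError, which we can catch
try:
    'this is a string!'.pop()
except AttributeError as err:
    message = str(err)
    print(message)
```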

So, let's make our own class in order to understand how it works:

In [38]:
# Create a Robot class that inherits from 'object'
class Robot(object):
    
    def __init__(self, myName, myColor, myHeight = 6):
        self.name = myName
        self.color = myColor
        self.height = myHeight
        
    
    def welcome_msg(self):
        print(f"Hi! My name is {self.name} \n and I'm a {self.color} robot!")
        print(f"I am {self.height} feet tall!")
        




# class Basket(object): #convention is to use capitals
#     pass

# x = Basket()
# print(x)
# print(type(x))

In [41]:
robot1 = Robot('Ali','red', 7)

In [42]:
robot1.welcome_msg()


Hi! My name is Ali 
 and I'm a red robot!
I am 7 feet tall!


We just made a new class, Robot, with a single instance, robot1.

Next, let's build up a second class, Basket, one piece at a time, starting with attributes. We can think of attributes as slots in an object that hold the values describing it. Think back to our linear model object: it had attributes like p-values, residuals and so on.


In [43]:
class Basket(object): #convention is to use capitals
    '''
    We can use docstrings in objects, just like functions
    '''
    max_size = 100

my_basket = Basket()
print(my_basket.max_size)
#help(Basket) # we have help implemented using our docstring and defined attributes

100


It is not super useful to have a hardcoded attribute. We probably want to have different values for different objects.

The usual way to set up a new instance of a class is the `__init__` method. In Python, names that begin and end with two underscores designate special attributes or methods, and are often pronounced 'dunder' (for example, 'dunder init'). A single leading underscore is a common convention for marking a protected or internal item.

The `__init__` method looks just like a function; its first argument is `self`, which refers back to the particular instance being created.

In [89]:
# Define the class Basket

class Basket(object): 
    '''
    A class to represent the basket for a single transaction
    '''
    # max_size used to be a class attribute defined here, outside __init__;
    # it is now an __init__ argument so each basket can set its own limit
    # max_size = 100
    def __init__(self, items, amounts, costs, max_size = 10):
        '''
        __init__ is a method, that we call when we make an instance of a class
        '''
        #assert that items is a list
        assert isinstance(items, list), 'Items must be a list'
        #assert that each item is a str
        assert all(isinstance(item, str) for item in items), 'Items must be a list of strings'
        
        assert isinstance(amounts, list), 'Amounts must be a list'
        assert all(isinstance(i, int) for i in amounts), 'Amounts must be a list of integers'
        assert all(i>0 for i in amounts), 'All elements of Amounts must be positive'
        
        assert isinstance(costs, list), 'Costs must be a list'
        assert all(isinstance(i, float) for i in costs), 'Costs must be a list of floats'
        assert all(i>0 for i in costs), 'All elements of Costs must be positive'
        
        self.items = items
        self.amounts = amounts
        self.costs = costs
        self.max_size = max_size
        
        assert isinstance(self.max_size, int), 'max_size must be an integer'
        if self.max_size < 1: raise Exception('max_size must be at least 1!')
        if sum(self.amounts) >self.max_size: raise Exception(f'Cannot have an amount greater than {self.max_size}!')
        

In [93]:
        
# Create an instance of the Basket class, then print out the items it contains
my_basket = Basket(['Razor', 'Milk', 'Bread'], [1, 6, 3], [12.99, 4.99, 2.99])
print(f"Items: {my_basket.items}")


Items: ['Razor', 'Milk', 'Bread']


In [94]:
# Create another instance of a Basket
other_basket = Basket(['apples'], [3], [0.70])
print(f"Items: {other_basket.items}")

Items: ['apples']


We now have a mildly useful class. We can hold customer transactions as baskets, and enforce the types and data structures we expect.
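One caveat: `assert` statements are skipped entirely when Python is run with the `-O` flag, so for checks that must always run it is common to raise exceptions explicitly. The sketch below shows one way this could look; the helper name `check_list_of` is hypothetical and not part of the class above.

```python
def check_list_of(values, expected_type, name):
    """Raise a descriptive error unless `values` is a list of `expected_type`."""
    if not isinstance(values, list):
        raise TypeError(f'{name} must be a list')
    if not all(isinstance(v, expected_type) for v in values):
        raise TypeError(f'{name} must contain only {expected_type.__name__} values')
    return values

# Valid input passes straight through
items = check_list_of(['Razor', 'Milk'], str, 'items')

# Invalid input raises, even when assertions are disabled
try:
    check_list_of(['Razor', 3], str, 'items')
except TypeError as err:
    error_message = str(err)
    print(error_message)
```

A helper like this could be called from `__init__` in place of the individual `assert` lines.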

### Exercise

1. We can use assertions inside classes and methods, just like in functions. Add assertions inside the `__init__` to ensure that items is a list with all strings, amounts is a list with all integers, and costs is a list with all floats.

2. Write an assertion, inside the `__init__`, which enforces the max_size attribute (hint, access it using self.max_size).

**SEE ABOVE**

As well as `__init__`, we can implement any other methods we find useful:

In [103]:
class Basket(object): 
    '''
    A class to represent the basket for a single transaction
    '''
    max_size = 100
    def __init__(self, items, amounts, costs):
        '''
        __init__ is a method, that we call when we make an instance of a class
        '''
        self.items = items
        self.amounts = amounts
        self.costs = costs
    
    def cost(self):
        '''
        gives the total cost of a basket
        '''
        import numpy as np
        return round(np.sum(np.array(self.amounts) * np.array(self.costs)),2)

    
# Create an instance of the Basket class and calculate the cost of its items    
my_basket = Basket(['Razor', 'Milk', 'Bread'], [1, 2, 3], [12.99, 4.99, 2.99])
print(my_basket.cost())

# Do the same, but for a new instance of a Basket
other_basket = Basket(['apples'], [3], [0.70])
print(other_basket.cost())

31.94
2.1


### Exercise

Complete the class below, a Customer, which holds multiple baskets (passed either to the `__init__` call or to the `addBaskets` method) and calculates totals using the `total_spent` method.

In [137]:

class Customer(object):
    
    def __init__(self, custId, baskets = None):
        self.id = custId
        self.baskets = []
        if baskets is not None:  # check the argument, not self.baskets (which is always a list)
            self.addBaskets(baskets)
    
    def addBaskets(self, baskets):
        for b in baskets:
            self.baskets.append(b)
            
    def total_spent(self):
        '''
        Total cost across all of this customer's baskets
        '''
        total_costs = 0
        for i in self.baskets:
            # We could recompute amounts * costs here, but the Basket class
            # already has a cost() method that we can call instead
            total_costs += i.cost()
        return total_costs
    
    def total_baskets(self):
        return len(self.baskets)

    
# Create an instance of a Basket    
my_basket = Basket(['Razor', 'Milk', 'Bread'], [1, 2, 3], [12.99, 4.99, 2.99])
# Create an instance of a Customer
my_cust = Customer(1, [my_basket])

print( f"Total cost for all {my_cust.total_baskets()} basket(s): {my_cust.total_spent()}" ) # Should return 31.94


# Create another instance of a Basket and add it to our Customer's list of Baskets 
other_basket = Basket(['apples'], [3], [0.70])
my_cust.addBaskets([other_basket])

print( f"Total cost for all {my_cust.total_baskets()} basket(s): {my_cust.total_spent()}" )  # Should return 34.04


Total cost for all 1 basket(s): 31.94
Total cost for all 2 basket(s): 34.04


### Special Methods

As well as `__init__`, there are many other special methods; [you can see the docs here](https://docs.python.org/3/reference/datamodel.html#basic-customization). A useful one is `__str__`, which determines how our object is printed. There are also the comparison methods such as `__eq__`, which we could implement to compare baskets by cost.

In [141]:
a = [5,1]
print(a[::-1])

print(my_cust)

[1, 5]
<__main__.Customer object at 0x000000F0BE808828>


In [145]:
class FunList(list):
    
    # Without the method below, print would just show '[1, 2, 3, 4, 5]'
    # Defining __str__ changes what print displays
    def __str__(self):
        return 'Hey Im a fun list, look at my values: ' + super().__str__()
    
    # Deliberately silly: 'equal' means our first item matches the other's last item
    def __eq__(self, other):
        return self[0] == other[-1]

In [147]:
x = FunList([1,2,3,4,5])
print(x)

Hey Im a fun list, look at my values: [1, 2, 3, 4, 5]


In [148]:
x = FunList([1,2,3,4,5])
y = FunList([5,1])
print(x == y)

x = FunList([1,2,3,4,5])
y = FunList([5,4,3,2,1])
print(x == y)

x = list([1,2,3,4,5])
y = list([5,4,3,2,1])
print(x == y)

True
True
False
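The same special methods can be applied to a class like `Basket`. The following is a minimal sketch, assuming we want two baskets to count as equal when their total costs match; the class name `PrintableBasket` is made up for this illustration.

```python
class PrintableBasket:
    """A pared-down basket with custom printing and comparison (illustrative only)."""

    def __init__(self, items, amounts, costs):
        self.items = items
        self.amounts = amounts
        self.costs = costs

    def cost(self):
        # Total cost: sum of amount * unit cost, rounded to two decimal places
        return round(sum(a * c for a, c in zip(self.amounts, self.costs)), 2)

    def __str__(self):
        # Controls what print() and str() display
        return f'Basket of {len(self.items)} item type(s) costing {self.cost()}'

    def __eq__(self, other):
        # Assumption: baskets are "equal" when their total costs match
        if not isinstance(other, PrintableBasket):
            return NotImplemented
        return self.cost() == other.cost()


a = PrintableBasket(['apples'], [3], [0.70])
b = PrintableBasket(['pears'], [1], [2.10])
print(a)
print(a == b)
```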


### Inheritance

Inheritance refers to classes which are 'subtypes' of other classes. When we defined our classes, we put `object` in brackets because we were inheriting from the generic object type (in Python 3 this is the default, so the brackets can be omitted).

If we put another class there, we inherit from it instead. We keep all of the attributes and methods of the parent class, but add or override any that we define in our new class:

In [None]:
class FunList(list):
    def __str__(self):
        return 'Hey Im a fun list, look at my values: ' + super().__str__()
            
x = FunList([1,2,3,4,5])

#Just like an old list:
x.append(100)
print(x.pop())

#But with the cool print:
print(x)

## Advanced Functions

### Functions-from-Functions
As well as ordinary values, a function can return another function. This is useful for caching, and for when we want a version of another function with different defaults.

In [149]:
def make_power(n):
    def local_func(x):
        return x ** n
    return local_func

In [150]:
my_squarer = make_power(2)
my_squarer(3)

9
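For the 'different defaults' use case, the standard library's `functools.partial` builds such a function for us. A small sketch, where `power` is a hypothetical helper written for this example:

```python
from functools import partial

def power(x, n):
    return x ** n

# Fix the keyword argument n=3, leaving x to be supplied later
my_cuber = partial(power, n=3)
cubed = my_cuber(2)
print(cubed)  # 8
```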

### Lambda functions
A *lambda function* is a small anonymous function, typically one we want to use once and never again. Lambda functions work the same way as the functions we've seen before (taking some arguments as input, doing something to them, and returning an output), but they are limited to a single expression and the syntax is more compact.

In [151]:
my_func = lambda x,y: x**y
my_func(2,3)

8

The case above is a good example to show the syntax of lambda functions, but in practice, we would almost never give a lambda function a name to store it. They're intended to be disposable and single-use. We more often use lambda functions when working with *iterators*, some of which take in functions as arguments. Let's start exploring iterators and then we'll revisit lambda functions in the context that they're intended to be used.
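As a taste of that intended use, here is a small sketch that passes a lambda as the `key` argument of the built-in `sorted` and `max` functions. The data is invented for the example.

```python
# (name, price) pairs, invented for illustration
prices = [('Razor', 12.99), ('Milk', 4.99), ('Bread', 2.99)]

# Sort by the second element of each tuple; the lambda is never given a name
by_price = sorted(prices, key=lambda pair: pair[1])
print(by_price)

# max accepts a key function in the same way
most_expensive = max(prices, key=lambda pair: pair[1])
print(most_expensive)
```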

### Iterators

Throughout this course we have repeatedly encountered the `range` function; it is now time to explore it in more depth. In Python 2, `range` returned a full list. In Python 3 it was reimplemented to return a lazy `range` object, which is why it gives a strange print out:

In [152]:
range(10)

range(0, 10)

What exactly are we doing? Let's take a look at a more interesting example:

In [153]:
range(10**100)

range(0, 10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000)

Given that there are only about $10^{80}$ atoms in the universe, we clearly have not constructed a list with $10^{100}$ elements in memory.

Here are a couple of other built-in functions that work in a similar manner:

In [154]:
print(map(lambda x: x*2, [1,2,3,4,5]))
print(zip([1,2,3,4,5], [1,2,3,4,5]))

<map object at 0x000000F0BE81CEF0>
<zip object at 0x000000F0BE7FF748>


These are both iterators (N.B. this is distinct from iterables, which are any objects we can loop over).

Internally, Python has a method for generating the next object from an iterator, and lazily evaluates the next item only when we need it. This is extremely useful for conserving memory when working with large data. When we ask for an item after the last one, a `StopIteration` exception is raised, which signals the end of the iteration to most Python constructs.

In [155]:
my_doubled = map(lambda x: x*2, [1,2,3,4,5])
print(next(my_doubled))
print(next(my_doubled))
list(my_doubled) # only the items not yet consumed are left

2
4


[6, 8, 10]

In [156]:
my_doubled_2 = map(lambda x: x*2, [1])
next(my_doubled_2)
next(my_doubled_2) # raises StopIteration: the only item was already consumed

StopIteration: 

Once an iterator has raised `StopIteration`, it is exhausted, and we have to create a new one to iterate again.
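This is also how a `for` loop knows when to stop. The sketch below spells out, by hand, roughly what a `for` loop does with an iterator:

```python
doubled = map(lambda x: x * 2, [1, 2, 3])

results = []
while True:
    try:
        item = next(doubled)   # ask for the next value
    except StopIteration:
        break                  # the iterator is exhausted, so stop looping
    results.append(item)

print(results)        # [2, 4, 6]
print(list(doubled))  # [] because nothing is left
```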

Map here is mapping the given function onto each item in the iterable.

Zip `zips` together items, which is useful if we want to iterate over more than one sequence at once:

In [157]:
list(zip([1,2,3,4,5], ['a','b','c','d','e']))

[(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (5, 'e')]

Range is not 100% the same: it is an iterable but not an iterator, so we cannot call `next` on it directly, and it has some extra attributes:

In [158]:
x = range(10)

#Get some attributes of the range
print(x.start)
print(x.step)

#But it's not an iterator...
next(x)

0
1


TypeError: 'range' object is not an iterator
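Because a `range` is a lazy sequence and not an iterator, it supports a few things iterators do not, without ever building the full list. A brief sketch:

```python
big = range(10**100)

print(len(range(10)))      # 10
print(big[5])              # indexing works without building a list
print(10**50 in big)       # membership is computed arithmetically

# Unlike an iterator, a range can be looped over more than once
small = range(3)
first_pass = list(small)
second_pass = list(small)
print(first_pass == second_pass)
```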

We can make iterators in one of two ways. The first is by wrapping an iterable in the `iter` function:

In [159]:
x = range(10)
range_iterator = iter(x)

print(next(range_iterator))
print(next(range_iterator))

0
1


The second is by making our own: we write a function that uses the `yield` keyword in place of `return`. Such a function is called a generator function, and calling it gives us an iterator:

In [160]:
def my_func(x):
    vals = [1,2,3,4,5]
    for i in vals:
        yield x * i

range_iterator = my_func(5)
print(next(range_iterator))
print(next(range_iterator))

5
10


Generator functions are special functions which pause, handing back a value, whenever they reach a `yield`. Every time we call `next` (or an equivalent operation like a for-loop), the function resumes where it left off and runs until it hits `yield` again. Once the function finishes without reaching another `yield`, `StopIteration` is raised.

Iterators do not need to be finite. The following iterator will run forever; it just keeps yielding the same value over and over and over...

In [161]:
def forever() :
    while True :
        yield 1
        
forever_iterator = forever()

for i in range(500) :
    next(forever_iterator)
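To take a finite slice of an infinite iterator without writing the loop ourselves, `itertools.islice` from the standard library is handy. A sketch, using a hypothetical `count_up` generator:

```python
from itertools import islice

def count_up(start=0):
    # Infinite generator: yields start, start + 1, start + 2, ...
    n = start
    while True:
        yield n
        n += 1

first_five = list(islice(count_up(), 5))
print(first_five)  # [0, 1, 2, 3, 4]
```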

### Exercises

In data science we will often have to create combinations of variables, columns or data points. Iterators give us a fast and efficient way of doing this.

The `itertools` package is part of the standard library, and contains many useful combinatorial functions, which mostly produce iterators. Take a look at the [documentation here](https://docs.python.org/3/library/itertools.html).

1. Create a nested for loop to generate all pairwise combinations of range(10) and range(10)
2. Create the same using an itertools function
3. Create all combinations of 3 items from range(10)
4. Use itertools for the same problem.

In [166]:
#1
#METHOD ONE
combos = []
for i in range(10):
    for j in range(10):
        combos.append((i,j))
        
print(combos)

#Method two
import itertools

combos = list(itertools.product(range(10), repeat=2))
        
print(combos)



[(0, 0), (0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6), (0, 7), (0, 8), (0, 9), (1, 0), (1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (1, 7), (1, 8), (1, 9), (2, 0), (2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6), (2, 7), (2, 8), (2, 9), (3, 0), (3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6), (3, 7), (3, 8), (3, 9), (4, 0), (4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6), (4, 7), (4, 8), (4, 9), (5, 0), (5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6), (5, 7), (5, 8), (5, 9), (6, 0), (6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6), (6, 7), (6, 8), (6, 9), (7, 0), (7, 1), (7, 2), (7, 3), (7, 4), (7, 5), (7, 6), (7, 7), (7, 8), (7, 9), (8, 0), (8, 1), (8, 2), (8, 3), (8, 4), (8, 5), (8, 6), (8, 7), (8, 8), (8, 9), (9, 0), (9, 1), (9, 2), (9, 3), (9, 4), (9, 5), (9, 6), (9, 7), (9, 8), (9, 9)]
[(0, 0), (0, 1), (0, 2), (0, 3), (0, 4), (0, 5), (0, 6), (0, 7), (0, 8), (0, 9), (1, 0), (1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (1, 7), (1, 8), (1, 9), (2, 0), (2, 1), (2, 2), (2, 3), (2, 4)

In [171]:
#3
combos_3 = list(itertools.combinations(range(10), 3))
print(combos_3)

[(0, 1, 2), (0, 1, 3), (0, 1, 4), (0, 1, 5), (0, 1, 6), (0, 1, 7), (0, 1, 8), (0, 1, 9), (0, 2, 3), (0, 2, 4), (0, 2, 5), (0, 2, 6), (0, 2, 7), (0, 2, 8), (0, 2, 9), (0, 3, 4), (0, 3, 5), (0, 3, 6), (0, 3, 7), (0, 3, 8), (0, 3, 9), (0, 4, 5), (0, 4, 6), (0, 4, 7), (0, 4, 8), (0, 4, 9), (0, 5, 6), (0, 5, 7), (0, 5, 8), (0, 5, 9), (0, 6, 7), (0, 6, 8), (0, 6, 9), (0, 7, 8), (0, 7, 9), (0, 8, 9), (1, 2, 3), (1, 2, 4), (1, 2, 5), (1, 2, 6), (1, 2, 7), (1, 2, 8), (1, 2, 9), (1, 3, 4), (1, 3, 5), (1, 3, 6), (1, 3, 7), (1, 3, 8), (1, 3, 9), (1, 4, 5), (1, 4, 6), (1, 4, 7), (1, 4, 8), (1, 4, 9), (1, 5, 6), (1, 5, 7), (1, 5, 8), (1, 5, 9), (1, 6, 7), (1, 6, 8), (1, 6, 9), (1, 7, 8), (1, 7, 9), (1, 8, 9), (2, 3, 4), (2, 3, 5), (2, 3, 6), (2, 3, 7), (2, 3, 8), (2, 3, 9), (2, 4, 5), (2, 4, 6), (2, 4, 7), (2, 4, 8), (2, 4, 9), (2, 5, 6), (2, 5, 7), (2, 5, 8), (2, 5, 9), (2, 6, 7), (2, 6, 8), (2, 6, 9), (2, 7, 8), (2, 7, 9), (2, 8, 9), (3, 4, 5), (3, 4, 6), (3, 4, 7), (3, 4, 8), (3, 4, 9), (3, 5, 6)

In [172]:
list_of_atts = ['height','weight','shoesize']
list(itertools.combinations(list_of_atts,2 ))

[('height', 'weight'), ('height', 'shoesize'), ('weight', 'shoesize')]

### Recursion

A common interview question is to compute the nth number in the Fibonacci sequence:

$$ F_n = F_{n-1} + F_{n-2}, \quad F_1 = F_2 = 1 $$

There are many, many ways of implementing this.

One of the main things interviewers look for is clean, efficient code (and working code: it is often claimed that a large share of people interviewing for programming jobs cannot implement FizzBuzz).

One way to measure efficiency is run time.

We have seen the magic functions, which begin with `%`, in a few places.

In Jupyter notebooks, we can use `%timeit` to see how long a line takes to run, or `%%timeit` to time the whole cell.
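Outside a notebook, the standard-library `timeit` module does a similar job. A minimal sketch (the statement being timed is arbitrary):

```python
import timeit

# Run the statement 1000 times and report the total elapsed seconds
elapsed = timeit.timeit('sum(range(100))', number=1000)
print(f'{elapsed:.6f} seconds for 1000 runs')
```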

In [173]:
def fib1(x):
    # iterative version; fib1(1) == fib1(2) == 1, matching the recursive versions below
    n = 3
    a, b = 1, 1
    if x < 3:
        return a
    while n <= x:
        a, b = b, a + b
        n += 1
    return b

In [174]:
%%timeit
[fib1(i) for i in range(10)]

6.84 µs ± 697 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


This is nice code, but with a little more thought we can write a recursive solution.

Recursion means a function calls itself on a smaller version of the problem. Fibonacci is the classic example of recursion:

In [175]:
def fib2(x):
    if x < 3:
        return 1
    else:
        return fib2(x - 1) + fib2(x - 2)

In [176]:
%%timeit
[fib2(i) for i in range(10)] 

28.7 µs ± 1.69 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)


Cool, but what if we want a few more numbers:

In [None]:
%timeit [fib1(i) for i in range(30)]

In [None]:
%timeit [fib2(i) for i in range(30)]

Our performance is terrible! With a bit of thinking, we can see that if we call fib2(5), we have to calculate fib2(4) and fib2(3), but fib2(4) calls fib2(3) again, and so on: the number of calls grows exponentially with x.
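To make the blow-up concrete, the sketch below counts how many times a naive recursive Fibonacci calls itself. `fib_counted` and `calls` are names introduced for this illustration; the logic mirrors `fib2`.

```python
calls = {'count': 0}

def fib_counted(x):
    # Same logic as fib2, plus a counter that records every call
    calls['count'] += 1
    if x < 3:
        return 1
    return fib_counted(x - 1) + fib_counted(x - 2)

result = fib_counted(10)
print(result, calls['count'])  # 55 109
```

Computing a single value already takes 109 calls for x = 10, and the count roughly doubles every time x grows by a small amount.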

One of the key parts of recursive programming is to make sure we do not run the same function on the same value more than once.

We can cache the results!

In [177]:
fib_cache = {}
def fib3(x):
    if x in fib_cache:
        return fib_cache[x]
    else:
        if x < 3:
            fib_cache[x] = 1
        else:
            fib_cache[x] = fib3(x - 1) + fib3(x - 2)
    return fib_cache[x]

print(fib3(10))
print(fib_cache)       

55
{2: 1, 1: 1, 3: 2, 4: 3, 5: 5, 6: 8, 7: 13, 8: 21, 9: 34, 10: 55}


What have we done here? We have a function which first checks whether the value already exists in a dictionary. If it is found, we return the value from the dict; otherwise we run the function and store the result in the dict.

Let's time it:

In [180]:
%%timeit

[fib3(i) for i in range(30)]

7.53 µs ± 296 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)


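So, we can see this is super fast (note that `fib_cache` persists between timing loops, so after the first pass every call is a plain dictionary lookup), but it was a bit of work!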

Can we generalize this?

The answer is yes, we have a way of adding this functionality to arbitrary functions:

In [None]:
from functools import lru_cache

@lru_cache()
def fib4(x):
    if x < 3:
        return 1
    else:
        return fib4(x - 1) + fib4(x - 2)

In [None]:
%%timeit 
[fib4(i) for i in range(100)]

What exactly have we done here? And what is that `@` symbol?

If a function takes a function as an argument and returns a new function, we can use it to modify our own functions. For example:

In [None]:
def my_decorator(function):
    
    def my_func(x):
        print('running your function now')
        output_of_function = function(x)
        print('done running')
        return output_of_function * output_of_function
    
    return my_func

def my_adder(x):
    return x + 1

wrapped_function = my_decorator(my_adder)
wrapped_function(5)

A shortcut for this is to use the decorator notation:

In [None]:
@my_decorator
def my_adder(x):
    return x + 1

In [None]:
my_adder(5)
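One practical detail: wrapping replaces the original function's name and docstring with those of the inner function. The standard library's `functools.wraps` preserves them. A sketch, with `announce` as a hypothetical decorator:

```python
from functools import wraps

def announce(function):
    @wraps(function)  # copy __name__, __doc__ etc. from the wrapped function
    def wrapper(*args, **kwargs):
        print(f'calling {function.__name__}')
        return function(*args, **kwargs)
    return wrapper

@announce
def add_one(x):
    """Return x plus one."""
    return x + 1

value = add_one(5)
print(value)             # 6
print(add_one.__name__)  # add_one
```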

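`lru_cache` stands for 'least recently used' cache. It keeps the results of the n most recently used calls (128 by default) and returns a stored result when it finds one, rather than running the function again; when the cache is full, the least recently used entry is discarded. Like our naive cache, it is backed by a dictionary keyed on the function's arguments.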
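Functions wrapped with `lru_cache` also expose a `cache_info()` method, which lets us see how much work the cache saved. A sketch, with `fib_cached` as a name chosen for this example:

```python
from functools import lru_cache

@lru_cache(maxsize=128)
def fib_cached(x):
    if x < 3:
        return 1
    return fib_cached(x - 1) + fib_cached(x - 2)

answer = fib_cached(30)
info = fib_cached.cache_info()
print(answer)  # 832040
print(info)    # each of the 30 distinct inputs is computed only once
```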

Today we have had a whirlwind tour of some of the more Pythonic features of the language. Don't worry too much if you didn't follow everything. Python skills come with practice; some good resources are:
* [Stack Overflow](https://stackoverflow.com/questions/tagged/python), where you can see the most common, or most recent python questions from users around the world. 
* [Project Euler](https://projecteuler.net/) which is a series of math and programming problems with a focus on smart implementations.
* [Rosalind](http://rosalind.info/about/) which is a series of questions based on biology.
* [HackerRank](https://www.hackerrank.com/), which is a site used to screen applicants, but has practice question banks.

The best resource, however, is just practice. Work on your project, google around, and always be thinking about whether there is a better way.

### Exercises

1. Make a new implementation of fibonacci. Can you beat the timing of the examples given above?
2. Make an implementation of FizzBuzz (for a given list of numbers: if a number is divisible by both 3 and 5, print fizzbuzz; if only by 3, print fizz; if only by 5, print buzz; otherwise print the number). Make sure you return the correct value. Walk through your version and a partner's version: did you get any good ideas? Can your partner break your function with weird values?
3. Work in pairs to create a function to find the prime factors of a given integer (the prime factors of a number are the prime numbers that multiply together to give the number; for example, 9 gives [3,3] and 12 gives [2,2,3]). Feel free to use Google. Check with neighbors: whose function is faster? How about with larger numbers?
4. (Advanced!) Write a function to find the smallest number that is a multiple of every integer in a given list. For example, [2,3,4,5,6] gives 60. Check against known implementations. Your answer to step 3 might help. Talk to your neighbors, use Google and work together.