# Python Bootcamp Day 2 Afternoon

* Instructor:  Andrew Yarmola [andrew.yarmola@gmail.com](mailto:andrew.yarmola@gmail.com)
* Bootcamp files: [github.com/andrew-yarmola/python-bootcamp](https://github.com/andrew-yarmola/python-bootcamp)

## Classes

So far, we have been mostly using built-in objects available to us from within python itself. Now, we will learn how to use classes to build our own objects. Let's start with a simple example.

In [None]:
class Dog :
    # Global class values
    breed = 'husky'
    tricks = ['sit']
    
    # instance initialization method
    def __init__(self, dog_name) :
        self.name = dog_name # <--- just attach a new attribute
    
    # instance method
    def say_name(self) :
        print("My name is", self.name)

The code above gives a description of how to build an object of type `Dog`. An object constructed by this class is called an **instance** of type `Dog`.

Let's explore what we have built. We have defined a general object called `Dog` that describes how to build an **instance** of a dog.

In [None]:
# create a Dog instance
my_dog = Dog('Sam')

print(type(my_dog))
print(my_dog.name)
print(my_dog.breed)

my_dog.say_name()

Above, when we call `Dog('Sam')`, we create a new object of type `Dog`. During this creation, python calls the `__init__()` method.

Notice the presence of the `self` variable in the function declaration of `__init__`. The name `self` (a strong convention rather than a keyword) references the object **instance** we have just created, and the instance **must be accepted as the first argument of any instance method**.
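
To make the role of `self` concrete, here is a minimal sketch. It uses a separate, made-up `Cat` class with a hypothetical `describe` method so that it runs on its own and does not interfere with `Dog`. It shows that calling a method on an instance is the same as calling the function on the class and passing the instance explicitly.

```python
class Cat :
    def __init__(self, cat_name) :
        self.name = cat_name

    def describe(self) :
        return "My name is " + self.name

my_cat = Cat('Tom')

# the usual call: python passes my_cat as `self` for us
bound_result = my_cat.describe()

# the same call written out by hand
explicit_result = Cat.describe(my_cat)

print(bound_result)
print(bound_result == explicit_result)
```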

#### Instances have local attributes

The `name` attribute is an example of a local instance attribute. That is, we can create two different objects of type `Dog` with different names.

In [None]:
other_dog = Dog('jack')
print("Other dog is named", other_dog.name)

# We should probably capitalize the name
other_dog.name = 'Jack'
other_dog.say_name()

# By the way, other_dog knows the same tricks as *all* dogs!
print(other_dog.tricks)

Notice that above we have **direct access** to the name attribute! Even though the `__init__` method set it to `'jack'`, I redefined it without the object `other_dog` knowing about it.

#### Class global attributes are shared by everyone
The attributes `breed` and `tricks` are **class** variables, which means that if we change them, they change everywhere.

In [None]:
print(my_dog.breed)
Dog.breed = 'terrier' # I change the class!
print(my_dog.breed) # but the instance changes too!

# Similarly, tricks can be modified
my_dog.tricks.append('down')
print(Dog.tricks)
print(other_dog.tricks)
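
One subtlety is worth a separate sketch: *mutating* a class attribute through an instance affects everyone, while *assigning* to it on an instance only creates a new instance attribute that shadows the class one. The `Cat` class and the `own_tricks` attribute below are made up for illustration.

```python
class Cat :
    tricks = ['sit']           # class attribute shared by all instances

    def __init__(self, name) :
        self.name = name
        self.own_tricks = []   # a fresh list for each instance

tom = Cat('Tom')
kit = Cat('Kit')

# mutating the shared list is visible through every instance
tom.tricks.append('roll')
print(kit.tricks)

# assigning on an instance creates an instance attribute
# that shadows the class attribute for that instance only
tom.tricks = ['jump']
print(tom.tricks)
print(kit.tricks)
print(Cat.tricks)

# per-instance lists stay independent
tom.own_tricks.append('purr')
print(kit.own_tricks)
```

If every instance should have its own list, create it inside `__init__` as done for `own_tricks`.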

### `dir` function

The `dir()` function lists the attributes of an object. This can be helpful for inspection.

In [None]:
print(dir(my_dog))

In [None]:
print(dir(object()))

All classes in python 3 are derived from `object`, and any class you declare without specifying a base class will inherit directly from `object`.
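
A quick way to check the relationship with `object` is sketched below, using a made-up `Empty` class. The exact list of extra attributes depends on the python version, so no output is shown.

```python
class Empty :
    pass

# the only base class is object
print(Empty.__bases__)
print(issubclass(Empty, object))

# everything that object provides is also available on an Empty instance
inherited = set(dir(object()))
print(inherited <= set(dir(Empty())))

# attributes that an Empty instance has in addition to those of object
extra = sorted(set(dir(Empty())) - inherited)
print(extra)
```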

## Inheritance 

We can build a derived class very easily with the `class NewClass(Parent)` syntax.

In [None]:
class Terrier(Dog) :
    def __init__(self, dog_name) :
        super().__init__(dog_name) # <---- This is how you call your parent methods
        self.breed = 'terrier'
    def is_happy(self) :
        return True
    
terry = Terrier("Terry")

print(terry.is_happy())
terry.say_name()

Notice the call to `super().__init__(*args)`. Be sure to call this method, as it is not automatically called for you when you define your own `__init__`. Similarly, if you are overriding some method and require the base class to do something, call `super().base_class_method()`.
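
The same pattern works for ordinary methods. In this small sketch (the classes `Animal` and `LoudAnimal` are made up), the derived class reuses the result of the parent method and then extends it.

```python
class Animal :
    def __init__(self, name) :
        self.name = name

    def describe(self) :
        return "I am " + self.name

class LoudAnimal(Animal) :
    # __init__ is not redefined, so Animal.__init__ is used as is

    def describe(self) :
        # reuse the parent's result, then extend it
        base = super().describe()
        return base.upper() + "!"

rex = LoudAnimal("Rex")
print(rex.describe())
```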


Another example inherits from `Exception`, which is itself derived from `BaseException` and, ultimately, from `object`.

In [None]:
class SeriousError(Exception):
    pass

raise SeriousError
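
In practice, a custom exception is usually raised with a message and caught with `try`/`except`. Here is a minimal sketch that repeats the class definition so the cell is self-contained; the message text is made up.

```python
class SeriousError(Exception):
    pass

# the full chain of base classes
print([cls.__name__ for cls in SeriousError.__mro__])

try :
    raise SeriousError("something went wrong")
except SeriousError as err :
    message = str(err)
    print("Caught:", message)
```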

### A comment about abstract base classes

Python provides a tool for creating **abstract base classes**, which are classes that cannot be instantiated. For example, the class below cannot be instantiated and only exists to be inherited from.

In [None]:
from abc import ABC, abstractmethod

class Employee(ABC):
    def __init__(self, id, name):
        self.id = id
        self.name = name

    @abstractmethod
    def calculate_payroll(self):
        pass

In [None]:
employee = Employee(42,'Sam')
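
The cell above is expected to fail with a `TypeError`, because `Employee` still has an abstract method. The sketch below shows the same behaviour with a made-up `Shape` class, catching the error so that the cell runs to completion, and then a concrete subclass that can be instantiated.

```python
from abc import ABC, abstractmethod

class Shape(ABC) :
    @abstractmethod
    def area(self) :
        pass

try :
    Shape()
    instantiated = True
except TypeError as err :
    instantiated = False
    print("TypeError:", err)

# a subclass becomes instantiable once every abstract method is implemented
class Square(Shape) :
    def __init__(self, side) :
        self.side = side

    def area(self) :
        return self.side ** 2

print(Square(3).area())
```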

In [None]:
class HourlyEmployee(Employee):
    def __init__(self, id, name, hours_worked, hour_rate):
        super().__init__(id, name)
        self.hours_worked = hours_worked
        self.hour_rate = hour_rate

    def calculate_payroll(self):
        return self.hours_worked * self.hour_rate
    
employee = HourlyEmployee(42,'Sam', 20, 50)
print(employee.calculate_payroll())

### Multiple inheritance and MRO

Python supports inheriting from multiple classes at the same time. First, let's define a few more classes.

In [None]:
class SalaryEmployee(Employee):
    def __init__(self, id, name, weekly_salary):
        super().__init__(id, name)
        self.weekly_salary = weekly_salary

    def calculate_payroll(self):
        return self.weekly_salary

    
class Secretary(SalaryEmployee):
    def work(self, hours):
        print(f'{self.name} expends {hours} hours doing office paperwork.')

In [None]:
class TemporarySecretary(Secretary, HourlyEmployee):
    def __init__(self, id, name, hours_worked, hour_rate):
        # what happens if we call super().__init__(*args)?
        HourlyEmployee.__init__(self, id, name, hours_worked, hour_rate) 

    def calculate_payroll(self):
        return HourlyEmployee.calculate_payroll(self)

In [None]:
employee = TemporarySecretary(42,'Sam', 20, 50)
print(employee.calculate_payroll())

Python allows us to inspect the **method resolution order** (MRO).

In [None]:
TemporarySecretary.mro()

So if we call `super().__init__()` in the declaration of `TemporarySecretary`, we will actually be calling the `SalaryEmployee.__init__()` method! In the above example, we bypass the MRO by hand to solve the inherent **diamond problem.**
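
To see how `super()` walks the MRO, here is a small diamond built from made-up classes (`Base`, `Left`, `Right`, `Bottom`). Each `__init__` takes no extra arguments, which is what lets every class simply call `super().__init__()`. This is a sketch of the cooperative style, not the approach used in the employee example.

```python
class Base :
    def __init__(self) :
        self.calls = ['Base']

class Left(Base) :
    def __init__(self) :
        super().__init__()       # next class in the MRO, not necessarily Base
        self.calls.append('Left')

class Right(Base) :
    def __init__(self) :
        super().__init__()
        self.calls.append('Right')

class Bottom(Left, Right) :
    def __init__(self) :
        super().__init__()
        self.calls.append('Bottom')

mro_names = [cls.__name__ for cls in Bottom.mro()]
print(mro_names)

# the order in which the __init__ bodies finished
print(Bottom().calls)
```

Notice that inside `Left`, the call `super().__init__()` reaches `Right` when the instance is a `Bottom`, so `Base.__init__` runs exactly once.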

For (much) more on inheritance and the notion of "compositions" between objects, see the guide [realpython.com/inheritance-composition-python/](https://realpython.com/inheritance-composition-python/)

### Getters, setters, and @property





If you are working on a project with other people, you may want to protect your classes from tampering by others. Python does not have an explicit notion of public and private attributes. One way to "hide" instance variables is, by convention, to use an `_` before the variable name.

Additionally, we will use the `@property` decorator along with getter and setter methods. This also allows you to make certain checks and control what you return. Here is an example.

In [None]:
class Dog :
    # instance initialization method
    def __init__(self, dog_name) :
        self.name = dog_name

    # Properties
    # will be called whenever .name is used
    @property
    def name(self) :
        return self._name
    
    # will be called whenever .name = value is used
    @name.setter
    def name(self, dog_name) :
        if not isinstance(dog_name, str) :
            raise ValueError("Dog name must be a string")
        self._name = dog_name.title()
    
    # instance method
    def say_name(self) :
        print("My name is", self.name)

In the new `Dog` class, whenever we read `.name` we are now calling the special function defined right after the `@property` decorator. As you can see, when setting the name, I am checking the type and normalizing the case. Let's see how this works.

In [None]:
my_dog = Dog('sam')
my_dog.say_name()

In [None]:
my_dog.name = 1234

As you see above, when I try to set the name `my_dog.name = 1234`, my `@name.setter` method is called instead of a direct access to a data attribute!
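
Two related patterns are sketched below with a made-up `Temperature` class: catching the error raised by a setter, and a **read-only** property, which is simply a property with no setter.

```python
class Temperature :
    def __init__(self, kelvin) :
        self.kelvin = kelvin      # goes through the setter below

    @property
    def kelvin(self) :
        return self._kelvin

    @kelvin.setter
    def kelvin(self, value) :
        if value < 0 :
            raise ValueError("Temperature in kelvin cannot be negative")
        self._kelvin = float(value)

    # read-only : computed from the stored value, no setter defined
    @property
    def celsius(self) :
        return self._kelvin - 273.15

t = Temperature(300)
print(t.kelvin)

try :
    t.kelvin = -5
except ValueError as err :
    print("Rejected:", err)

try :
    t.celsius = 0
except AttributeError as err :
    print("Read-only:", type(err).__name__)
```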

Let's look at a slightly more complicated example: a class that stores the data of a graph.

In [None]:
class Graph :
    def __init__(self, verts, edges) :
        self.vertices = verts
        self.edges = edges
    
    @property
    def vertices(self) :
        # return a *copy* of your internal data
        return set(self._vertices)
    
    @vertices.setter
    def vertices(self, verts) :
        self._vertices = set(verts)
    
    @property
    def edges(self) :
        # return a *copy* of your internal data
        return set(self._edges)
    
    @edges.setter
    def edges(self, edges) :
        # let's check that edge endpoints are
        # in vertices
        endpts = set()
        for e in edges :
            if len(e) != 2 :
                raise ValueError("Edges must be pairs")
            endpts.update(e)
        if not endpts.issubset(self.vertices) :
            raise ValueError("All edge endpoints must be in vertices")
        self._edges = set(edges)
        
    def num_components(self) :
        """ Returns the number of connected components. 
        Note : not efficient """
        # we will start with giving each vertex its own *cluster*
        # as we go through the edges, we will merge clusters
        vert_to_clust = { v : {v}   for v in self.vertices }
        for v1,v2 in self.edges :
            c1 = vert_to_clust[v1]
            c2 = vert_to_clust[v2]
            if c1 != c2 :
                c1.update(c2)
                for v in c2 :
                    vert_to_clust[v] = c1
        # we must now count the distinct sets we have
        # in vert_to_clust.values(). 
        clusters = frozenset(map(frozenset,vert_to_clust.values()))
        return len(clusters)

In [None]:
my_graph = Graph((0,1,2,3), { (0,1), (2, 1) })
print(my_graph.edges)
print(my_graph.vertices)

print('-'*20)

print("""We can still access the internal data, but good
        programmers will not accidentally do that!\n""")
print(my_graph._vertices)

print('-'*20)

print("""Notice that we are really returning a copy !\n""")
print(id(my_graph.vertices))
print(id(my_graph._vertices))

print('-'*20)

print("""We can see how many connected components there are.\n""")
print("My graph has" , my_graph.num_components(), "connected components.")

## Overloading operators

You might want to write a class where you can use the operators `+`, `-`, `*`,`/`, `//`, `%`, `pow`, etc. For example, you might want to build a (multiplicative) cyclic group. 

In [None]:
class CyclicGroupElement :
    group_order = 10
    def __init__(self, power = 1) :
        self._power = power % CyclicGroupElement.group_order
    def __mul__(self, other) :
        new_power = self._power + other._power
        return CyclicGroupElement(new_power)
    def __repr__(self) :
        # return a string showing a human readable description of self
        return 'CyclicGroupElement x^{}'.format(self._power)

In [None]:
a = CyclicGroupElement()
b = CyclicGroupElement(3)

In [None]:
a*b

In [None]:
print(a*b*b*b*b)

Above, I defined functions `__mul__(self, other)` and `__repr__(self)`. As you may have guessed, `__mul__` is the function that is called when I use the `*` operator. Notice that since the arguments of `__mul__` are ordered, we can, if we wish, define non-commutative multiplication.

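The `__repr__` method is what the interpreter shows when you evaluate an object, and `print` falls back to it when `__str__` is not defined, as is the case here. The return value should always be a string that is an unambiguous representation of your object. If you want something more human readable, implement `__str__`.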
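
The difference between the two is easiest to see on a class that defines both. The `Point` class below is made up for this illustration.

```python
class Point :
    def __init__(self, x, y) :
        self.x = x
        self.y = y

    def __repr__(self) :
        # unambiguous : looks like the code that would rebuild the object
        return 'Point({!r}, {!r})'.format(self.x, self.y)

    def __str__(self) :
        # human readable
        return '({}, {})'.format(self.x, self.y)

p = Point(1, 2)
print(repr(p))      # Point(1, 2)
print(str(p))       # (1, 2)
print(p)            # print uses __str__ : (1, 2)
print([p])          # containers use __repr__ for their items : [Point(1, 2)]
```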

If we try to use any other operators, we will fail :

In [None]:
a+b

Here is a list of **some** operators you know and their corresponding function names. Below, everywhere you see objects `a,b` in the methods you have `self == a` and `other == b`

   * `a + b` corresponds to `__add__(self, other)` 
   * `a - b` corresponds to `__sub__(self, other)`
   * `a*b` corresponds to `__mul__(self, other)`
   * `a/b` corresponds to `__truediv__(self, other)`
   * `a//b` corresponds to `__floordiv__(self, other)`
   * `a % b` corresponds to `__mod__(self, other)`
   * `divmod(a,b)` corresponds to `__divmod__(self, other)`
      * you should have that `divmod(a,b) == (a//b, a % b)` for your implementation
   * `a ** b` or `pow(a,b,n)` corresponds to `__pow__(self, other[, modulo])` where `modulo == n`
   * `len(a)` corresponds to `__len__(self)` (if your object has some sense of "length")
   * `a < b` corresponds to `__lt__(self, other)`
   * `a <= b` corresponds to `__le__(self, other)`
   * `a == b` corresponds to `__eq__(self, other)`
   * `a != b` corresponds to `__ne__(self, other)`
   * `a > b` corresponds to `__gt__(self, other)`
   * `a >= b` corresponds to `__ge__(self, other)`
   * `repr(a)` corresponds to `__repr__(self)`
      * this is the string you see in your interpreter if you just type `a` followed by ENTER
   * `str(a)` corresponds to `__str__(self)`
      * this is the string you see when you call `print(a)`

You can find a full list at : https://docs.python.org/3/reference/datamodel.html#special-method-names
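
As a small illustration of a few entries from the list, here is a sketch of a made-up `Vec` class supporting `+`, `==`, and `len`. It is only meant to show the mechanics, not to be a complete vector type.

```python
class Vec :
    def __init__(self, *coords) :
        self.coords = tuple(coords)

    def __len__(self) :
        return len(self.coords)

    def __add__(self, other) :
        # only add vectors of the same length
        if not isinstance(other, Vec) or len(other) != len(self) :
            return NotImplemented
        return Vec(*(a + b for a, b in zip(self.coords, other.coords)))

    def __eq__(self, other) :
        if not isinstance(other, Vec) :
            return NotImplemented
        return self.coords == other.coords

    def __repr__(self) :
        return 'Vec{}'.format(self.coords)

u = Vec(1, 2)
v = Vec(3, 4)

print(u + v)
print(u + v == Vec(4, 6))
print(len(u))
```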

Let's add some more of these to our `CyclicGroupElement` object.

In [None]:
def CyclicGroup(group_order) :
    class CyclicGroupElement :
        def __init__(self, power = 1) :
            assert isinstance(power, int)
            self._power = power % self.group_order
            
        @property
        def group_order(self) :
            return CyclicGroupElement._group_order
        
        def __mul__(self, other) :
            # so that we don't crash comparing group orders
            if not isinstance(other, CyclicGroupElement) :
                # this tells python that we can't do the operation
                return NotImplemented
            return CyclicGroupElement(self._power + other._power)
        
        def __truediv__(self, other) :
            if not isinstance(other, CyclicGroupElement) :
                return NotImplemented
            return CyclicGroupElement(self._power - other._power)
        
        def __str__(self) :
            return 'x^{}'.format(self._power)
        
        def __repr__(self) :
            return 'CyclicGroupElement x^{}'.format(self._power)
    
    CyclicGroupElement._group_order = group_order
    return CyclicGroupElement
    

In [None]:
# the cyclic group class that lets us build elements
C_10 = CyclicGroup(10)

# some elements
a = C_10(4)
b = C_10(12)

print(a)
print(b)

In [None]:
a*b

In [None]:
a/b

In [None]:
a*2
# tries to call a.__mul__(2)
# where other == 2, since we return
# NotImplemented, we get a TypeError

There are several key things we are doing here at the same time. Let us first focus on the `class CyclicGroupElement`. This class tells us how to build, multiply, divide, and represent elements of a cyclic group. Notice that when I write the multiplication or division functions, I am careful to check the type of `other`. The argument `self` is always guaranteed to be of the correct type; however, we don't know about `other`. Returning the built-in constant `NotImplemented` tells python that this method does not know how to perform the operation. Python then tries the reflected method on `other` (here `__rmul__`) and raises a `TypeError` if that fails as well.
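
The fallback to the reflected method can be seen in the following sketch. The `Meters` class is made up; it supports multiplication by a plain number on either side by reusing `__mul__` as `__rmul__`.

```python
class Meters :
    def __init__(self, value) :
        self.value = value

    def __mul__(self, other) :
        if not isinstance(other, (int, float)) :
            return NotImplemented
        return Meters(self.value * other)

    # used for `2 * m` : int does not know about Meters,
    # so python tries Meters.__rmul__(m, 2) next
    __rmul__ = __mul__

    def __repr__(self) :
        return 'Meters({})'.format(self.value)

m = Meters(3)
print(m * 2)
print(2 * m)

# neither operand knows how to multiply two Meters objects
try :
    m * m
    failed = False
except TypeError as err :
    failed = True
    print("TypeError:", err)
```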


The second key point is the **nesting of a class inside a function**. Let's look at what the function `CyclicGroup(group_order)` actually returns.

In [None]:
repr(CyclicGroup(10))

So `CyclicGroup(10)` is the `class CyclicGroupElement`; however, it is more than that. In fact, every time we call `CyclicGroup(n)` we obtain a **new class (description) object** with the specific condition that `CyclicGroup(n)._group_order == n`. 

In [None]:
C_10 = CyclicGroup(10)
C_10_again = CyclicGroup(10)

In [None]:
id(C_10)

In [None]:
id(C_10_again)

You should literally think of `C_10` and `C_10_again` as two **different** groups of order 10.

In [None]:
a = C_10(4)
b = C_10_again(5)
a*b

The reason we get an error here is that `a` and `b` are in different groups. Programmatically speaking, they are of different types (i.e. instances of different classes). Thus, when in `__mul__` I check `isinstance(other, CyclicGroupElement)` where `other == b`, I get `False`.

In [None]:
type(a) == type(b)

Thus, every time we call `CyclicGroup(n)` we get a **new recipe** on how to make `CyclicGroupElement`'s. This is why we can have different groups with different orders and nothing will collide!
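
If you would rather have two calls with the same order return the *same* class, one possible variation (not what the notebook code does) is to cache the factory with `functools.lru_cache`. The names `cached_cyclic_group`, `Element`, `D_10`, and `D_10_again` below are made up for this sketch.

```python
from functools import lru_cache

@lru_cache(maxsize = None)
def cached_cyclic_group(group_order) :
    class Element :
        order = group_order

        def __init__(self, power = 1) :
            self.power = power % self.order

        def __mul__(self, other) :
            if not isinstance(other, Element) :
                return NotImplemented
            return Element(self.power + other.power)

        def __repr__(self) :
            return 'x^{} in C_{}'.format(self.power, self.order)

    return Element

D_10 = cached_cyclic_group(10)
D_10_again = cached_cyclic_group(10)

# the second call returns the cached class object
print(D_10 is D_10_again)

# so elements built from either name can be multiplied
print(D_10(4) * D_10_again(5))

# a different order still gives a different class
print(cached_cyclic_group(127) is D_10)
```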

In [None]:
C_127 = CyclicGroup(127)

## Writing Packages

Packages are simply a collection of modules structured in a way to allow for “dotted module names”. For example, the module `sound.formats` corresponds to a submodule `formats` of package named `sound`. This helps control namespaces, so someone working on `formats` does not need to know what the `sound` namespace looks like.

Suppose you want to design a collection of modules for handling sound files and sound data. There are many different sound file formats, so you may need to create and maintain modules for the conversion between the various file formats. You will also want to add modules for different audio processing tasks.

Here is a possible structure for your package :

```
sound/                          Top-level package
      __init__.py               Initialize the sound package
      formats/                  Subpackage for file format conversions
              __init__.py
              wavread.py
              wavwrite.py
              ...
      effects/                  Subpackage for sound effects
              __init__.py
              echo.py
              surround.py
              ...
      filters/                  Subpackage for filters
              __init__.py
              equalizer.py
              karaoke.py
              ...
tests/                           Tests directory
```

When importing the package, Python searches through the directories in `sys.path` looking for the package subdirectory.

The `__init__.py` files are required to make python treat directories containing the file as regular packages (directories without one can still be imported as namespace packages, a more advanced feature). This prevents directories with a common name unintentionally hiding valid modules that occur later on the module search path.

One can import submodules via 

```python
import sound.effects.echo
```

In the simplest case, `__init__.py` can just be an empty file. But you can also execute initialization code in `__init__.py` for when your package is imported. For example, the line

```python
__all__ = ["effects", "filters"]
```

makes `from sound import *` import only the `effects` and `filters` submodules. It does not block an explicit `from sound import formats`.
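
The effect of `__all__` can be checked with a throwaway package built in a temporary directory. Everything in this sketch is made up for the demonstration: the package name `demo_sound`, the one-line modules, and the `NAME` variables.

```python
import importlib
import os
import sys
import tempfile

# build a tiny package on disk
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, 'demo_sound')
os.mkdir(pkg_dir)

files = {
    '__init__.py' : '__all__ = ["effects", "filters"]\n',
    'effects.py'  : 'NAME = "effects"\n',
    'filters.py'  : 'NAME = "filters"\n',
    'formats.py'  : 'NAME = "formats"\n',
}
for file_name, contents in files.items() :
    with open(os.path.join(pkg_dir, file_name), 'w') as fp :
        fp.write(contents)

# make the package importable
sys.path.insert(0, root)
importlib.invalidate_caches()

# run the wildcard import in a fresh namespace so we can inspect it
star_namespace = {}
exec('from demo_sound import *', star_namespace)
star_names = sorted(n for n in star_namespace if not n.startswith('__'))
print(star_names)

# an explicit import of the submodule still works
from demo_sound import formats
print(formats.NAME)
```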

Other options include things like
```python
import sound.filters as fltrs
from .formats import wav_read
```
where inside the `formats` subpackage's `__init__.py` we have
```python
import sound.filters as fltrs
```

Let's take a look at the `sound` module example included in the repository.

**Upshot**. You should use `__init__.py` to set up the namespace of your package when it is imported. If you don't need anything extra, you can leave it blank.

**More about imports**. You can read more about the structure of `import` at [realpython.com/python-modules-packages/](https://realpython.com/python-modules-packages/)

**Remark**. There is no fully agreed-upon convention for what to put in `__init__.py`. See the following post for a discussion on the matter [pcarleton.com/2016/09/06/python-init/](https://pcarleton.com/2016/09/06/python-init/)

### Installable packages

Installable packages that are hosted in repositories are also an option. The details are somewhat involved, but essentially you create . See [realpython.com/pypi-publish-python-package/](https://realpython.com/pypi-publish-python-package/) and [docs.python-guide.org/writing/structure/](https://docs.python-guide.org/writing/structure/) for guides and comments.


## Managing several repositories

Since you will most likely not be creating installable and distributed packages, I would suggest you stick to simple python packages and use `git submodule` to track all of your internal dependencies.

To use `import`, you will need to add the root module directory to your path using `sys.path.append()` as necessary.

If you really need to install a package using `pipenv` from a `git` repository, this can be done with

```bash
pipenv install -e git+https://github.com/requests/requests.git@v2.20.1#egg=requests
```

where `#egg=package_name` specifies the package name used by `pipenv`.

## Scripts

Scripts are non-graphical standalone programs for doing a specific task.

For example, here are the contents of `three_powers_of_two.py` : a script that prints the first 3 powers of 2.

```python
#!/usr/bin/env python3

def generate_powers() :
    return [ 2**x for x in range(3) ]

print("Global __name__ is :", __name__)

if __name__ == '__main__' :
    print(*generate_powers(), sep = '\n')
```

Now, if I import this file, the `if` block will not run: only the top-level `print` executes and `generate_powers` gets defined.

In [None]:
import three_powers_of_two as tpt

As you can see, when importing, the global variable `__name__` is set to the module name, i.e. the filename without the `.py` extension.

However, if I now go to a console/terminal and **run** the script using
```bash
python3 three_powers_of_two.py
```
or, because we have `#!/usr/bin/env python3` at the top of the file, we can use
```bash
./three_powers_of_two.py
```

You will see the commands in the `if` statement executed.

```bash
$ ./three_powers_of_two.py 
Global __name__ is : __main__
1
2
4
```

So a script can be used both as a module and a tool. However, a program isn't very useful if you can't give it input.

### Arguments and Options

The standard way to pass arguments to a script is to give a space separated list after the command call :

```bash
python three_powers_of_two.py arg1 arg2
```

To read the arguments in, we will use the `sys` module's `sys.argv` attribute. We update our `three_powers_of_two_new.py` with : 

```python
#!/usr/bin/env python3

import sys

def generate_powers() :
    return [ 2**x for x in range(3) ]

if __name__ == '__main__' :

    print(sys.argv)
    
    print(*generate_powers(), sep = '\n')
```

When we run this using the above command in a terminal, we will see :
```bash
$ python three_powers_of_two.py arg1 arg2
['three_powers_of_two.py', 'arg1', 'arg2']
1
2
4
```
In particular, `sys.argv` is a list that starts with the **name** of the program and then gives **all space separated arguments**.

**Remark** if you need to have a space in an argument, you can use (double) quotation marks :
```bash
$ python three_powers_of_two.py arg1 "arg2 with a space"
['three_powers_of_two.py', 'arg1', 'arg2 with a space']
1
2
4
```
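
Since every entry of `sys.argv` is a string, a script usually has to convert and validate its arguments. Here is a minimal sketch in which an optional first argument sets how many powers to print. The `main(argv)` helper, the script name `powers_of_two.py`, and the default of 3 are assumptions made for this illustration. Passing the argument list explicitly makes the function easy to try out without a terminal.

```python
import sys

def generate_powers(count) :
    return [ 2**x for x in range(count) ]

def main(argv) :
    """ argv has the same layout as sys.argv.
    Returns an exit code : 0 on success, 2 on bad usage. """
    if len(argv) > 2 :
        print("Usage: python powers_of_two.py [count]")
        return 2
    try :
        # argv[1] is a string, so convert it explicitly
        count = int(argv[1]) if len(argv) == 2 else 3
    except ValueError :
        print("count must be an integer, got", repr(argv[1]))
        return 2
    print(*generate_powers(count), sep = '\n')
    return 0

# in a real script, the last line would be : sys.exit(main(sys.argv))
exit_code = main(['powers_of_two.py', '5'])
bad_exit_code = main(['powers_of_two.py', 'five'])
```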

Let us make a slightly more useful script `count_vowels.py` that counts vowels in a file

```python
#!/usr/bin/env python3

import sys

def vowels_in_string(data) :
    return { v : data.count(v) for v in 'aeiou' }

if __name__ == '__main__' :
    if len(sys.argv) < 2 :
        print("Usage: python count_vowels.py file")
        sys.exit(2)
        
    file_name = sys.argv[1]

    with open(file_name, 'r') as fp :
        data = fp.read()
        print(vowels_in_string(data))
```

Notice that I am doing minimal input checking. This allows me to both inform the user how to use the program and also to check for bad input.

We can run our program to get :
```bash
$ python count_vowels.py "morning.ipynb"
{'u': 5197, 'a': 5898, 'i': 5305, 'o': 4934, 'e': 6094}
```

### Options and `getopt`

Using `sys.argv` gives us only **positional** arguments for our program. There is a better way using the `getopt` module. The idea is to specify a **flag** or **keyword** using a `-` or `--` prefix. We would like to do something like this : 

```bash
./hanoi_gif.py -v -d 4 --fps 4 awesome_hanoi_4.gif
```

Let's see a simple example of how `getopt` works.

In [None]:
import getopt

argv_list = '-v -d 4 --fps 5 -w something --write nothing arg1 arg2 arg3'.split()

opts, args = getopt.getopt(argv_list, 'vd:w:',
                           ['verbose', 'disk=', 'fps=', 'write='])

print(opts)
print(args)

As you can see there are **three** types of options/arguments here. The key thing to understand is the line
```python
opts, args = getopt.getopt(argv_list, 'vd:w:',
                           ['verbose', 'disk=', 'fps=', 'write='])
```
The string `'vd:w:'` indicates how to parse **single letter** options preceded by a `-` symbol, while the list `['verbose', 'disk=', 'fps=', 'write=']` indicates how to parse **keyword** options preceded by a `--` symbol.

The options here are :
* **flags** : these are the `-v` or `--verbose` options
   * they have no value, their **presence** is all you need
   * they are declared by a letter **without** a colon or a word **without** an `=`

* **keyword arguments** : these are the `-d`, `-w`, `--disk`, `--write`, and `--fps` options
   * they require a value to follow them when calling the program
   * their declaration is followed by a colon after a letter or an `=` after a keyword
   
* **positional arguments** : these are `arg1`, `arg2`, and `arg3`
   * must **follow** all flag and keyword arguments
   
When `getopt.getopt` parses `argv_list`, it returns a **list of tuples** and a **list of strings**. The list of tuples pairs each option with its value (an empty string for flags), and the list of strings is the list of positional arguments.

To apply this to script arguments, we simply need to call `getopt` on `sys.argv[1:]` (dropping the program name).
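
A common way to interpret `opts` is to loop over the pairs and fill in a dictionary of settings. The sketch below is not the code of `hanoi_gif.py`; the `parse_args` helper, the setting names, and the default values are assumptions made for illustration.

```python
import getopt

def parse_args(argv) :
    """ argv should be sys.argv[1:] (program name dropped). """
    # hypothetical defaults, overridden by the options given
    settings = { 'verbose' : False, 'disk' : 3, 'fps' : 10 }
    opts, args = getopt.getopt(argv, 'vd:', ['verbose', 'disk=', 'fps='])
    for opt, value in opts :
        if opt in ('-v', '--verbose') :
            settings['verbose'] = True      # flags carry no value
        elif opt in ('-d', '--disk') :
            settings['disk'] = int(value)   # values arrive as strings
        elif opt == '--fps' :
            settings['fps'] = int(value)
    return settings, args

settings, args = parse_args('-v -d 4 --fps 5 out.gif'.split())
print(settings)
print(args)

# unknown options raise getopt.GetoptError
try :
    parse_args(['--unknown'])
    rejected = False
except getopt.GetoptError as err :
    rejected = True
    print("Bad option:", err)
```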

We can now implement a `hanoi_gif.py` containing the following code. Pay close attention to how I interpret the contents of `opts` and `args`.

In [None]:
%cat hanoi_gif.py