# Functions and Classes

This notebook covers:

* The elements of structure in Python: functions and classes
* Defining functions (including the idea of variable scope, nested functions, lambdas and recursion)
* Defining classes and creating objects

Classes can be confusing if you haven't met object-oriented programming before. Fortunately one of the great things about Python is that it lets you solve problems in the way you find least confusing. Sometimes this will involve classes, but only if that approach suits the problem and you. Python doesn't force you to understand the mechanics or the philosophy of defining your own classes to write sophisticated programs. 

## Program Structure

Here is a representative skeleton of a long program, showing almost everything we're going to cover. The rest of the notebook will explain the different features in more detail.

```python
# Import statements
import stuff

# Variables at the module level
DEFINE_STUFF = 1

# Class definitions
class Stuff:
    # Functions within the class definition
    def __init__(self,arguments...):
        "stuff"
    def function_in_class(self,arguments..):
        "stuff"
        return
        
# Function definitions
def function_in_module(arguments...):
    "stuff happens in the function"
    return stuff
   
if __name__ == '__main__':
    "stuff to run from the command line"
```


A `.py` file defines a scope for a **module**. Python has a simple structure of **one module per file**. 

A module is made up of **function definitions** and **class definitions** as well as **module-level statements** which can include any valid Python, including variable definitions (this is really what you're doing when you're working interactively). The idea of modules, `import` statements and the `if __name__ == '__main__'` block at the end are explained in a separate notebook.

**Function definitions** contain variable assignments and, optionally, other function definitions.

**Class definitions** contain variable assignments (referred to in this case as attributes) and function definitions (referred to in this case as methods) that are 'glued' to objects of that class. This is confusing at first and will be explained later in this notebook.

### A realistic example

This is a full program with some realistic complexity. There are a lot of new things in here -- the point is not to understand what it's doing but to spot the elements in the skeleton example above.

The example is one (not optimal) solution to the challenge of reporting which of a number of random bears is celebrating an important milestone in its life (for example, if it can vote at age 18, or when it becomes a senior bear at age 80).

In [355]:
# Import statements
import os
import random
import numpy as np
from   numpy.random import normal
import datetime

# Variables defined at the module level. 
# Capital letters for these are just my personal style.
THIS_YEAR     = datetime.datetime.now().year
BEAR_FILE     = 'my_bear_names.txt'

# Simple one-line exception definition -- just a class like any other,
# but inherits from Exception
class BearNamesError(Exception): pass

# A normal Class definition
class Bear:
    # A special function within the class definition
    def __init__(self,name,birth_year):
        """
        Args:
            name: name of bear
            birth_year: year in which the bear was born
        """
        # Variables can be glued to objects using the . notation
        # (variables glued to objects are called attributes)
        self.name       = name
        self.birth_year = birth_year
    
        self.milestones = [('first year',     lambda x: x == 0),
                           ('decade',         lambda x: x%10 == 0),
                           ('can vote',       lambda x: x == 18),
                           ('midlife crisis', lambda x: x == 35),
                           ('senior bear',    lambda x: x >= 80)]
        
    # A normal function within the class definition
    def celebrating_milestone_in_life(self,current_year):
        """
        Compares age of this bear to a set of personal milestones
        and returns the first milestone that it has reached.
        
        Args:
            current_year: the current year used to calculate age.
            
        Returns:
            str (name of milestone) if a milestone is reached this year
            None if no milestone is reached
        """
        full_years = int(np.floor(current_year - self.birth_year))
        
        for milestone, test in self.milestones:
            if test(full_years):
                return milestone
        return None
                
# A function at the module level
def get_names(reset=False):
    """
    Get some bear names from Wikipedia 
    (beware, this might break if the Wiki page changes...)
    
    Returns:
        list of str (bear names)
    """
    # Import statements inside a function
    import requests
    from   lxml import html
    
    # Return the data from local file if it already exists
    if os.path.isfile(BEAR_FILE) and not reset:
        with open(BEAR_FILE,'r') as f:
            bears = [l.strip() for l in f.readlines()]
        return bears
    
    # Otherwise download and save
    print('Downloading names...')
    
    xpath       = '//ul[(((count(preceding-sibling::*) + 1) = 17) and parent::*)]//li'
    pageContent = requests.get('https://en.wikipedia.org/wiki/List_of_fictional_bears')
    bears       = [b.text_content().split(',')[0] for b in html.fromstring(pageContent.content).xpath(xpath)]
    
    # Simple error checking
    if 'Kumamon' not in bears:
        raise BearNamesError("Bear list doesn't include Kumamon, maybe the XPath is wrong?")
        
    # Unbelievably, some significant bears are not in Wikipedia!
    bears.extend(['Oh Bear', 'Hero the Bear', 'Bravo the Bear'])
    
    with open(BEAR_FILE,'w') as f:
        for b in bears:
            f.write('{0}\n'.format(b))
    print('Saved bears to {0}'.format(BEAR_FILE))
        
    return bears

# Another function at the module level
def generate_bears(n_bears=100,names=None):
    """
    n_bears (int): number of bears to generate.
    names (list of str): list of bear names.
    
    Returns:
        list of Bear objects
    """
    if names is None:
        names = ['Unknown Bear']
    
    # Uses an imported function and variable from the module scope
    random_birth_years = 1970 + np.minimum(THIS_YEAR-1970,
                                           normal(loc=0,scale=30,size=n_bears))

    random_names = [random.choice(names) for i in range(0,n_bears)]
    
    # Build a list of Bear objects
    bears = list()
    for n, y in zip(random_names,random_birth_years):
        bears.append(Bear(n,y))
    return bears

# And one more function at the module level
def report_random_bear_events(n_bears=100):
    """
    Makes up some random bears and prints a report on those
    celebrating an important birthday milestone this year.
    """
    names = get_names(False)
    bears = generate_bears(n_bears,names=names) 
    
    ages  = np.array([THIS_YEAR - bear.birth_year for bear in bears])
    names = [bear.name for bear in bears]
    
    celebrations = np.array([bear.celebrating_milestone_in_life(THIS_YEAR) for bear in bears])

    n_old, a_old = names[ages.argmax()], int(ages[ages.argmax()])
    print('Oldest bear is {} ({:d} yrs)'.format(n_old,a_old))
    
    n_young, a_young = names[ages.argmin()], int(ages[ages.argmin()])
    if n_young == n_old: n_young = 'is also called %s'%(n_young) 
    print('Youngest bear is {} ({:d} yrs)'.format(n_young,a_young))
 
    events         = celebrations[np.flatnonzero(celebrations)]
    event_bears    = [names[i] for i in np.flatnonzero(celebrations)]
    n_celebrations = len(event_bears)
    
    if n_celebrations > 0:
        print('{} bears celebrating. Congratulations to:'.format(n_celebrations)) 
        for i in np.argsort(event_bears):
            print('   {} ({})'.format(event_bears[i],events[i]))                                                               
    else:
        print('No bears are celebrating a special birthday this year.')

If we put this in `bears.py`, we can add the following `if __name__ == '__main__':` block (it is only useful in a script: in a Jupyter notebook cell, `argparse` would try to parse the arguments used to start the notebook kernel rather than our own):

```python
if __name__ == '__main__':
    import argparse
    parser = argparse.ArgumentParser()
    # nargs='?' makes the positional argument optional, so the default applies
    parser.add_argument('n_bears',type=int,nargs='?',default=10)
    args = parser.parse_args()
    
    report_random_bear_events(args.n_bears)
``` 

We would then have a complete Python program that we can run to figure out which bears to congratulate. 
```bash
> python bears.py 10
```
Since we've defined all the functions in the cell above, we can do this inside the notebook by calling the `report_random_bear_events()` function directly. Since this is random you can run it several times and get different results.

In [345]:
report_random_bear_events(10)

Oldest bear is The Grupo Bimbo mascot (86 yrs)
Youngest bear is United Buddy Bears initiated by Klaus and Eva Herlitz (0 yrs)
4 bears celebrating. Congratulations to:
   Bely Mishka (decade)
   Oski the Bear (decade)
   The Grupo Bimbo mascot (senior bear)
   United Buddy Bears initiated by Klaus and Eva Herlitz (first year)


You might be wondering if the order of the different components is important -- the answer is *yes, but not that much*. If you stick to the generally recommended order above (import stuff first, then define module-level variables, classes and functions) you shouldn't run into trouble. If you want to break those rules, you still *probably* won't get into trouble, but you can if you try hard enough. The advantage of this order is mainly that it makes logical sense to someone reading your code, which is the most important thing.
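
As a rough illustration of when order matters, here is a small sketch using three made-up functions (`first`, `second` and `third` are hypothetical and not part of `bears.py`). Names inside a function body are only looked up when the function is called, so `first` can refer to `second` before `second` is defined. A call at the module level, on the other hand, fails if it comes before the definition.

```python
def first():
    # 'second' is only looked up when first() is actually called
    return second() + 1

def second():
    return 1

print(first())  # works: both functions exist by the time we call first()

# Calling a function at the module level before it is defined fails
try:
    third()
except NameError as err:
    print('NameError:', err)

def third():
    return 3

print(third())  # works now that the definition has been executed
```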

Now we'll look at each element in turn, starting with functions, then classes.

## Defining functions

This is a simple definition of a function, using the `def` statement:

In [2]:
def my_function(first_argument, another_argument):
    """
    my_function adds together the first and second arguments 
    and returns the result.
    
    Arguments:
        first_argument   : the first argument
        another_argument : this is the other argument
    
    The two arguments can be any two things that python can 
    add together using the '+' operator.
    
    Returns: 
        the sum of the two arguments
    """
    print('I got %s and %s'%(first_argument, another_argument))
    some_result = first_argument + another_argument
    
    return some_result

Note the following:

- the body of the function (from the line after the `:`) is indented.
- the first thing in the function is a multi-line string that explains the purpose of the arguments. This is called the docstring. Notice that it isn't assigned to anything -- Python automatically recognizes it as a docstring.
- the variable after `return` is returned by the function.

Of course you don't *have* to write a docstring, but Python makes it so easy that it's standard practice to do so.

Good docstrings are concise and about 80 characters wide. Apart from that, it's up to you what if anything to write there, but it's a good idea to write something, even in your own code. Docstrings are picked up by Python's interactive help system. 

In [44]:
help(my_function)

Help on function my_function in module __main__:

my_function(first_argument, another_argument)
    my_function adds together the first and second arguments 
    and returns the result.
    
    Arguments:
        first_argument   : the first argument
        another_argument : this is the other argument
    
    The two arguments can be any two things that python can 
    add together using the '+' operator.
    
    Returns: 
        the sum of the two arguments



Now we've defined this function, we can call it:

In [46]:
my_function(1,2)

I got 1 and 2


3

In [47]:
my_function(1,True)

I got 1 and True


2

In [48]:
my_function('astro','physics')

I got astro and physics


'astrophysics'

Think about why `%s` is used to format `x` in the string that gets printed here...

In [3]:
x = my_function(7,8) # assign the result to a variable

print('So x = %s'%(x))

I got 7 and 8
So x = 15


Answer: we don't know what kind of thing we're going to get back from `my_function()`, because the way we've written it that depends on the arguments -- numbers can be turned into strings automatically, but strings can't be turned into numbers automatically.
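
To make this concrete, here is a minimal sketch contrasting `%s` with `%d`. The helper name `describe` is made up for this illustration.

```python
def describe(value):
    # %s calls str() on whatever it is given, so it works for any object
    return 'So x = %s' % (value,)

print(describe(15))              # a number
print(describe('astrophysics'))  # a string

# %d insists on a number, so formatting a string with it fails
try:
    print('So x = %d' % ('astrophysics',))
except TypeError as err:
    print('TypeError:', err)
```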

This is the simplest possible function:

In [4]:
def simplest_possible_function(): 
    pass

# If we call it, nothing happens.
simplest_possible_function()

`pass` is a keyword that does nothing. It's needed here because otherwise the function definition is incomplete. Functions don't have to `return` anything, but their body has to contain at least one statement (even if that statement is just `pass`).
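
A function that never reaches a `return` statement still hands something back to the caller: the special value `None`. A minimal sketch (the name `returns_nothing` is made up for this illustration):

```python
def returns_nothing():
    pass

result = returns_nothing()
print(result)          # prints None
print(result is None)  # prints True
```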

This example shows how to define **default** values for function arguments, which means that you don't *have* to give those arguments when you call the function. If you want to miss any out, you have to give the explicit names for the others.

In [281]:
def another_function(alpha=2,beta=None):
    if beta is not None:
        return alpha*beta
    else:
        return str(alpha**2)
    
print(another_function()) # No explicit arguments
print(another_function(3))
print(another_function(3,2))
print(another_function(beta=2,alpha=3))
print(another_function(beta=4))

4
9
6
6
8


There is another way to define very simple one-line functions, called `lambda`:

In [282]:
add_together = lambda x,y : x+y
add_together(1,2)

3

There is no deep difference between functions and `lambda`s. As it says [here](https://docs.python.org/3/faq/design.html#why-can-t-lambda-expressions-contain-statements):

> Unlike lambda forms in other languages, where they add functionality, Python lambdas are only a shorthand notation if you’re too lazy to define a function.

I tend to use `lambda` for simple one-line expressions. You can see an example of this in my `bears.py` code above.  The `milestones` variable is a list of tuples of the form `(name, logical_test)`, where `logical_test` is a `lambda` function. The `celebrating_milestone_in_life` function (in the `Bear` class definition) passes the same argument to each of these tests in turn. You can see it would take a lot more lines of code to `def` separate functions for each of these tests instead of using `lambda`.

Also notice how `lambda`s are treated like objects in that example. This is an important point: **functions are objects too**. They even have attributes and methods associated with them, although these are 'hidden' by giving them names starting with `__`.

In [283]:
print(my_function.__name__.upper())

MY_FUNCTION


This means we can pass functions as arguments.

In [285]:
def uppercase_name_of_function(f):
    print(f.__name__.upper())

uppercase_name_of_function(my_function) # Try this with a function
uppercase_name_of_function(add_together) # Try this with a lambda

MY_FUNCTION
<LAMBDA>


`lambda` expressions don't have names, and don't really need them -- but we can give them names if we want, since they're no different from other functions.

In [286]:
add_together.__name__ = 'Add Together!'
uppercase_name_of_function(add_together)

ADD TOGETHER!


### Implicit lists of arguments and sets of keyword arguments

You'll often see this in Python code. As well as explicit definitions of function arguments, you can also ask Python functions to 'grab' any other arguments that get passed to the function using `*args` and `**kwargs`.

In [2]:
def func(a,b,c,an_option=0,*args,**kwargs):
    """
    Print each kind of argument that was passed in.
    """
    print('Regular arguments {} {} {}'.format(a,b,c))
    
    print('Regular argument with default: an_option = {}'.format(an_option))
    
    if len(args) > 0:
        print('args is a tuple that holds the remaining non-keyword arguments: {}'.format(args))
    
    for k,v in kwargs.items():
        print('kwargs is a dict that holds keyword arguments: {} = {}'.format(k,v))

`*args` captures any non-keyword arguments after the mandatory ones (in this case a,b,c) and those with defaults (`an_option` here). They are stored in a tuple called `args`.

`**kwargs` captures any **keyword arguments** after those you've defined, and stores them in a dictionary called `kwargs`.

In [None]:
func(1,2,3,5,6,7,8,alpha=1,beta=2)

This is useful if you want to 'pass on' arguments from one function to another that it calls. If you refer to `args` with `*args` inside the function, the arguments in `args` will be treated as if you'd typed them out one by one with a comma between them (if you use `args` without the `*`, as in the cells above, then it will be treated as a tuple). A similar logic applies to `**kwargs`.
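
Here is a minimal sketch of 'passing on' arguments in this way. The two functions `receive` and `pass_on` are made up for this illustration.

```python
def receive(a, b, c=0, **kwargs):
    # Return everything we were given so that we can inspect it
    return a, b, c, kwargs

def pass_on(*args, **kwargs):
    # Here args is a tuple and kwargs is a dict...
    print('pass_on got args={} and kwargs={}'.format(args, kwargs))
    # ...and the * and ** unpack them again in the call to receive
    return receive(*args, **kwargs)

result = pass_on(1, 2, c=3, colour='brown')
print(result)
```

Notice that `pass_on` doesn't need to know anything about the arguments of `receive`: `c=3` is matched to the named argument `c`, and the leftover keyword `colour` ends up in the `kwargs` dictionary of `receive`.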

## Scope

The value associated with a variable name at any specific point in the code depends on the block of code in which that name was last defined -- in jargon, the **scope** of the variable. The 'main' block of each module/script (commands with no indentation) defines a scope for that module, and all functions and classes define their own separate scope. 

Variables defined in an 'outer' scope (lower indentation level) are accessible in 'inner' scopes (higher indentation levels) unless the variable name is reassigned. By default assigning a variable name doesn't affect any assignments to the same name in the enclosing scope or any separate scopes at the same level.

In [287]:
y = 1
x = 10 # This 'x' ...
def a_function(z):
    x = 1000  # ...is in a different scope to this x 
    return z + x
print(x)

10


The most confusing cases occur inside class definitions, lambdas and nested functions; otherwise the rules that determine the scope of variables in Python are straightforward. There is a discussion of the general rules of Python scope in [this StackOverflow post](http://stackoverflow.com/questions/291978/short-description-of-scoping-rules). The main reason to be aware of this is that there are some traps.

Before you run this next cell, think about what you expect it to print for `y` and `x`:

In [309]:
x = 100
y = 0
for x in range(0,5):
    y = y + x
print(y)
print(x)

10
4


Loop variables do not live in a separate scope, unlike function arguments and unlike some other languages. Watch out for this.
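
One related detail that is easy to trip over: in Python 3, the loop variable of a list comprehension *does* live in its own scope, unlike the variable of an ordinary `for` loop. A small sketch of the difference:

```python
x = 100
for x in range(0, 5):
    pass
print(x)  # 4: the for loop overwrote the module-level x

x = 100
squares = [x**2 for x in range(0, 5)]
print(x)  # 100: the x inside the comprehension is a separate variable
```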

If a variable is used that isn't defined in the current scope, Python will look for it in a higher-level enclosing scope:

In [289]:
x = 10
def a_function(z):
    return z+x
print(a_function(10))
print(x)

20
10


In [290]:
x = list()
def a_new_function(z):
    return x.append(z)

a_new_function(10)
print(x)
a_new_function(20)
print(x)
a_new_function(30)
print(x)

[10]
[10, 20]
[10, 20, 30]


You can explicitly force a variable to be 'global' (bound in the module-level scope rather than in the function's own scope) like this:

In [291]:
x = 10
def a_function(z):
    global x
    x = z
    return x

print(a_function(100))
print(x) # is x still = 10 or did it change?
    

100
100


What happens if you remove the `global`?

What happens if you change the function argument from `z` to `x` in the `def` statement in the example above?

**Tip:** It's quite rare to see `global` in real code for simple scientific applications.

### Nested functions

When small functions are defined inside other functions, they're usually written using `lambda`. But sometimes we want complicated functions to be defined inside other functions -- **nested** functions. There is nothing very complicated about that:

In [310]:
def outer_function(x):
    # define a nested function
    y = 10
    def inner_function(x,y):
        # The inner function inherits variables from the scope of 
        # the outer function, unless they're redefined in the body 
        # of the inner function
        z = x*y
        print("I'm the inner function, and I calculated z=%d given x,y = (%d,%d)"%(z,x,y))
        return z
    
    # call the nested function twice
    a = inner_function(2*x,y)
    b = inner_function(4*x,y)
    return a+b

print('Result of outer function with argument x=3: %d'%(outer_function(3)))

I'm the inner function, and I calculated z=60 given x,y = (6,10)
I'm the inner function, and I calculated z=120 given x,y = (12,10)
Result of outer function with argument x=3: 180


The inner function is only defined inside the scope of the outer function, so we can't call it from a higher scope.

In [311]:
inner_function(4)

NameError: name 'inner_function' is not defined

On the other hand, different outer functions can define nested functions with the same name without causing problems.
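
The sketch below illustrates both points with two made-up outer functions (`area_report` and `perimeter_report` are hypothetical names). Each defines its own nested `describe`, and each nested function reads variables from the scope of its outer function without having them passed in as arguments.

```python
def area_report(width, height):
    units = 'm^2'
    def describe():
        # width, height and units are read from the enclosing scope
        return '%d %s' % (width * height, units)
    return describe()

def perimeter_report(width, height):
    units = 'm'
    def describe():
        # Same name as the nested function above, but no clash
        return '%d %s' % (2 * (width + height), units)
    return describe()

print(area_report(2, 3))       # 6 m^2
print(perimeter_report(2, 3))  # 10 m
```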

## Classes and Objects

Classes are an abstraction of the idea of basic data types like `int`, `float` and `str`. For example, we've already seen that strings are objects of the `str` type (type and class are the same thing here).

In [295]:
name_of_bear = 'Bravo the bear' # a string
print(type(name_of_bear))

<class 'str'>


We can also ask 'is the variable a string?'

In [296]:
isinstance(name_of_bear,str)

True

Objects of the same class ('instances' of the class) behave in the same way (in Python jargon, they have the same methods) but have different values. We already saw some methods of the `str` class, accessed through a '`.`'. For example

In [298]:
name_of_bear.upper()

'BRAVO THE BEAR'

At this point you might not see much difference between the idea of making the string upper-case by calling a method that belongs to the string and the idea of using a function that operates on the string, e.g. `upper(name_of_bear)`. You would be right. There isn't really any deep fundamental difference in what's happening. We'll carry on with useful stuff for now, but there are some more comments on the philosophical ideas below.

In the language of classes and objects, `str` is the class and `name_of_bear` is an object of that class. When we said `name_of_bear = 'Bravo the bear'` we made a new instance of the `str` class and set its value. We can make multiple objects of the same class with the same data.

In [299]:
name_of_first_bear  = 'Bravo' # a bear
name_of_second_bear = 'Bravo' # also a bear

These have equal values:

In [300]:
name_of_first_bear == name_of_second_bear

True

In general two objects of the same class don't live in the same location of memory even if their values are equal. Just to make my life in writing this tutorial harder, this general rule **isn't** true for a few basic data types, just for optimization.

In [301]:
id(name_of_first_bear) == id(name_of_second_bear)

True

In [302]:
name_of_first_bear is name_of_second_bear

True
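
To see the general rule at work, we can repeat the experiment with lists, which are not given this special treatment. In this sketch (the variable names are made up), the two lists have equal values but are separate objects, so changing one does not change the other.

```python
first_list  = ['Bravo', 'Hero']
second_list = ['Bravo', 'Hero']

print(first_list == second_list)  # True: the values are equal
print(first_list is second_list)  # False: they are two separate objects

# Changing one list leaves the other untouched
second_list.append('Oh Bear')
print(first_list)
print(second_list)
```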

**Python lets you make your own classes**.

The next cell defines the simplest possible class. This class does nothing except 'be a class with a name' in the most minimal way (like `simplest_possible_function` we saw above).

In [366]:
class Bear:
    pass

(make sure you run the cell above, because we've already defined a class named Bear at the start of the notebook and we want to use this simpler definition below)

**Tip:** By convention, class names start with a capital letter. Multi-word class names are usually written in so-called CamelCase, whereas, by convention, functions are named_like_this, with underscores.

This definition lets us make Bear objects ('instances of class `Bear`') as follows:

In [360]:
first_bear = Bear()
second_bear = Bear()

Even though we know no details about either, we can tell these are not the same bear:

In [436]:
first_bear is second_bear

False

Really we'd like to associate some **attributes** with `Bear` -- meaningful properties that are specific to individual Bears, like their name and age. This is the basic practical reason for user-defined classes: they are basically structured collections of data. The next example shows how to do this.

In [374]:
first_bear = Bear()
first_bear.name = 'Bravo the Bear'
first_bear.age  = 10
second_bear = Bear()
second_bear.name = 'Hero the Bear'
second_bear.age  = 10

The `.name` and `.age` here are called **attributes** of the `Bear` objects. You can add whatever attributes you like to an object just by assignments of the form `name_of_object.name_of_attribute = value`, and the attribute values can be anything.
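
Attributes added like this belong to one particular object, not to every object of the class. The sketch below uses a made-up class `Panda` (so as not to disturb the `Bear` objects defined above) and the built-in functions `vars()` and `hasattr()` to check which attributes each object has.

```python
class Panda:  # hypothetical class, just for this illustration
    pass

first_panda = Panda()
first_panda.name = 'First Panda'
second_panda = Panda()

print(vars(first_panda))              # the attributes stored on this object
print(hasattr(second_panda, 'name'))  # False: the other object is unaffected
```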

Now we can, for example, write a function that does something with our Bear objects:

In [376]:
def print_bear_age(bear):
    print('%s is %d years old'%(bear.name,bear.age))
    
print_bear_age(first_bear)
print_bear_age(second_bear)

Bravo the Bear is 10 years old
Hero the Bear is 10 years old


We can also add functions as attributes:

In [437]:
first_bear.greet = lambda : print("Hi, I'm Bravo the Bear!")
first_bear.greet()

Hi, I'm Bravo the Bear!


Functions that are attributes are sometimes called `methods` of the class; this is just what (some) people do, nothing to do with Python.

It would be good if the greeting could use the `.name` of the specific bear, and/or be something more complicated than we can stick in a `lambda`. The next example shows how to add a function like this to the class definition.

In [391]:
class Bear:
    def greet(self):
        print("Hi, I'm %s"%(self.name))
        
first_bear = Bear()
first_bear.name = 'Bravo the Bear'
first_bear.greet()

second_bear = Bear()
second_bear.name = 'Hero the Bear'
second_bear.greet()

Hi, I'm Bravo the Bear
Hi, I'm Hero the Bear


You'll notice that a mysterious `self` appears as the first argument in the definition of `Bear.greet()`, but we don't give `greet()` any arguments when we call it. 

**The `self` appearing as the first argument of `greet` represents one of the most fundamental things about classes**, namely, that you define a **class** but then **create objects that belong to that class**, and **every one of those objects is different**.

In this case we want to get the name associated with the specific instance of `Bear` that we're asking to `greet()`. Python will always silently add a variable holding the current instance of the class as the first argument when you call any of its methods. By convention, this is called `self` (you can call it something else, but you'll confuse other people reading your code). You can use `self` in the body of your function to get the attributes of the specific instance on which the function was called.
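
One way to see what Python does 'silently' is to pass the instance by hand. In this sketch (the variable `bravo` is made up, and `greet` returns the string rather than printing it), calling the method through the object and calling the function through the class with the object as an explicit argument give the same result.

```python
class Bear:
    def greet(self):
        return "Hi, I'm %s" % (self.name)

bravo = Bear()
bravo.name = 'Bravo the Bear'

# These two calls are equivalent
print(bravo.greet())      # Python fills in self for us
print(Bear.greet(bravo))  # we pass the instance explicitly as self
```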

You need to include `self` explicitly in the function definitions you add to your classes, even if the functions don't use it. The following example shows what happens if you forget this.


In [392]:
class Bear:
    def greet():
        print("Hi")
    
first_bear = Bear()
first_bear.name = 'Bravo the Bear'
first_bear.greet()

TypeError: greet() takes 0 positional arguments but 1 was given

1 argument was passed to `greet`, even though `greet` didn't define any arguments and we didn't explicitly give it any. The missing argument is `self`.

Adding attributes by hand in the way we did above is obviously tedious, and also error prone. For example:

In [394]:
class Bear:
    def greet(self):
        print("Hi, I'm %s and I'm %d years old"%(self.name,self.age))

third_bear = Bear()
third_bear.name = 'Oh Bear'
third_bear.greet()

AttributeError: 'Bear' object has no attribute 'age'

Instead we'd like to enforce that Bears should *always* have a name and an age and assign those things in a shorter way. To do this we have to add a special function called `__init__` to the class definition.

In [395]:
class Bear:
    def __init__(self,name,age):
        self.name = name
        self.age  = age
        
    def greet(self):
        print("Hi, I'm %s and I'm %d years old"%(self.name,self.age))


Now we can create bears in this much more compact way.

In [382]:
first_bear  = Bear('Bravo the Bear', 10)
second_bear = Bear('Hero the Bear', 25)

`__init__` is a **special function** that is called when you write brackets directly after the name of a class you've defined.

Even the basic version of the `Bear` class we defined first had an `__init__` method; it just didn't do anything. What we've just done is to **override** that default `__init__` method with our own version that does something useful.

Inside `__init__` all we do is use `.attribute_name = ` assignments to put the values passed as arguments to attributes of `self`.

### Hidden attributes and methods

What's going on with the `__` around `__init__`?

If you call `help()` on a class, you get a list of its methods. You'll see that many of these start with `__`. These are 'hidden' methods -- hidden only in the sense that they don't show up when you use tab completion. Apart from that they are identical to 'normal' methods in most respects. **However**, many of these methods are treated in some special way by Python. That's why they're 'hidden'. These are methods that you are not expected to call directly yourself (for example, we didn't explicitly call `Bear.__init__()`, just `Bear()`), but you can if you want.
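
As a small sketch of what 'treated in some special way' means, the built-in `len()` function and the `+` operator both end up calling a hidden method on the object. You can call those methods directly, although there is rarely any reason to.

```python
name_of_bear = 'Bravo the bear'

print(len(name_of_bear))       # the usual way
print(name_of_bear.__len__())  # the hidden method that len() relies on

print(1 + 2)                   # the usual way
print((1).__add__(2))          # the hidden method behind the + operator
```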

Another example of this is how to define 'equality' between two instances of a class by defining the `__eq__` method.

In [396]:
Bear.__eq__??

In [397]:
class Bear:
    def __init__(self,name,age):
        self.name = name
        self.age  = age
    
    def __eq__(self,other):
        return (self.age == other.age) and (self.name == other.name)
           
    def greet(self):
        print("Hi, I'm %s and I'm %d years old"%(self.name,self.age))

In [400]:
first_bear  = Bear('Bravo the Bear', 10)
second_bear = Bear('Hero the Bear', 25)
third_bear  = Bear('Bravo the Bear', 10)

print(third_bear is first_bear)
print(third_bear == first_bear)

False
True


Here `__eq__` takes another Bear (`other`) and returns `True` if the condition we specify is satisfied. Python knows to call this special method when we write the equality operator `==`.

Even though the two `Bear` objects are different (their values can be changed independently) we can still test if they are 'equal' by some criteria we make up for ourselves.
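
One thing to be aware of: the `__eq__` above assumes that `other` is always a `Bear`, so comparing a `Bear` with, say, a string fails with an `AttributeError`. The following is only a sketch of one common way to handle that, not the only option: return the special value `NotImplemented` for anything that isn't a `Bear`, and let Python decide what to do (for `==` it falls back to `False`).

```python
class Bear:
    def __init__(self, name, age):
        self.name = name
        self.age  = age

    def __eq__(self, other):
        # Only compare with other Bears; leave anything else to Python
        if not isinstance(other, Bear):
            return NotImplemented
        return (self.name, self.age) == (other.name, other.age)

first_bear = Bear('Bravo the Bear', 10)
third_bear = Bear('Bravo the Bear', 10)

print(first_bear == third_bear)        # True: same name and age
print(first_bear == 'Bravo the Bear')  # False, rather than an error
```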

That's pretty much it -- we've now covered the most important things about defining classes. The next section is extended reading, you might want to skip it for now.

### Philosophy of classes and objects

Everything in Python, more or less, is an object of some sort. Python is sometimes described as an 'object-oriented' language. However, unlike some other languages, Python does not insist that you know or care much about the philosophy of the object-oriented approach to programming. It is very common to write Python programs without defining any classes yourself. In my experience most Python code ends up being a mix of 'object oriented' and straightforward (imperative) styles depending on how the user feels like solving a particular problem.

It would be nice if this basic tutorial could say exactly when you're supposed to do stuff with classes and when you're supposed to use functions. Unfortunately there is no such rule. There are a huge number of different strategies and Python is flexible enough to use just about all of them. For quick programs this doesn't matter at all -- just do whatever is clearest and most natural. For longer programs (longer code and/or taking longer to run) the design choices become more important. The best guide is to see how other people do it, by reading their code and some books, and by experimenting.

This is nicely illustrated by something that seems like a contradiction in the approach to something very basic in Python, which is getting the length of a string:

In [26]:
name_of_bear = 'Bravo the bear'
len(name_of_bear)

14

Why isn't this done by `name_of_bear.len()`? The [answer](https://docs.python.org/2/faq/design.html#why-does-python-use-methods-for-some-functionality-e-g-list-index-but-functions-for-other-e-g-len-list) is rather subtle. That itself is a sign that this stuff is complicated and sometimes subjective -- complex coding is a creative and expressive exercise and there is often more than one correct and/or 'neat' way to solve a problem.

When you write code using classes, you'll face similar design choices -- should I use methods of the class or functions that operate on instances of the class? The important thing, at least in the beginning, is to be consistent within one bit of code, not to get too hung up on abstract philosophy or 'design patterns' for their own sake, and to know that you can always change your approach later. The guiding philosophy behind the design of Python itself is basically to support that idea of being flexible.

Some comments from a data science point of view: heavy use of classes tends to be hard to mesh with very intensive computations because objects are intrinsically less structured in memory and therefore slower to process and more wasteful of RAM than arrays of uniform data type. Code that loops over lists of millions of objects will never scale as well as code that uses vectorized operations on arrays. For example, contrast this:

In [305]:
class Bear():
    def __init__(self,name,age):
        self.name = name
        self.age  = age

some_bears = [Bear('Bravo',5),Bear('Hero',7),Bear('O-Bear',9),Bear('Kumamon',12)]
total_age  = 0.0
for b in some_bears:
    total_age += float(b.age)
mean_age = total_age/len(some_bears)
print('The mean age of the bears (%s) is %f'%(', '.join([b.name for b in some_bears]), mean_age))

The mean age of the bears (Bravo, Hero, O-Bear, Kumamon) is 8.250000


with this:

In [306]:
import numpy as np
names = np.array(['Bravo','Hero','O-Bear','Kumamon'])
ages  = np.array([5,7,9,12])

print('The mean age of the bears (%s) is %f'%(', '.join(names), np.mean(ages)))

The mean age of the bears (Bravo, Hero, O-Bear, Kumamon) is 8.250000


If all you wanted to do was find the mean age of the bears, the first approach would be vastly less efficient if you had 1000 bears to deal with (for several reasons -- the `for` loop, the creation of the list of objects, the overhead of the `.` attribute lookups, and the lack of vectorization).

Also, there is often little to be gained from using classes rather than nested dictionaries if all you want to do is store attributes. It's purely a matter of style. The main advantage of classes is that you can associate *methods* with objects, not just data.
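
As a rough sketch of that difference, the same bear can be stored either way, but only the class carries behaviour along with the data. The `age_next_year` method is a hypothetical example, not something defined elsewhere in this notebook.

```python
# The same data stored two ways
bear_as_dict = {'name': 'Bravo', 'age': 5}

class Bear:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    # Hypothetical method: behaviour that travels with the data
    def age_next_year(self):
        return self.age + 1

bear_as_object = Bear('Bravo', 5)

print(bear_as_dict['age'])             # key lookup on the dictionary
print(bear_as_object.age)              # attribute lookup on the object
print(bear_as_object.age_next_year())  # only the class offers this directly
```

With the dictionary you would write a separate function that takes the dictionary as an argument, which is a perfectly reasonable style too.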

The bottom line is that the use of classes is one of several choices of approach that you can use to structure your code, so that the **way** you solve a problem makes sense in the context of the problem itself. Where you're processing large amounts of highly structured data in predictable ways, collections of arrays (or structured arrays like `np.recarray`, Pandas' `DataFrame` or Astropy's `Table`) might be a better representation than collections of individual objects. All these approaches are object-oriented, but represent different ideas about what the relevant objects are.
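
To make the structured-array idea a little more concrete, here is a minimal sketch using a NumPy structured array. It assumes NumPy is installed; the field names (`name`, `age`) and their types (`U10`, `i4`) are choices made for this example.

```python
import numpy as np

# One array with two named fields, instead of a list of Bear objects
bears = np.array(
    [('Bravo', 5), ('Hero', 7), ('O-Bear', 9), ('Kumamon', 12)],
    dtype=[('name', 'U10'), ('age', 'i4')],
)

# Each field behaves like an ordinary array, so operations are vectorized
mean_age = bears['age'].mean()
older_names = bears['name'][bears['age'] > 8]

print(mean_age)
print(list(older_names))
```

Here the 'object' is the whole table of bears rather than each individual bear, which is the shift in viewpoint described above.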

#### Duck Typing

You won't see the `isinstance` check very often in Python code, because the recommended approach is not to care what class an object actually *is*, only how it behaves (i.e. what attributes and methods it has). This is called 'duck typing' (from this [idiom](https://en.wikipedia.org/wiki/Duck_test)).

The next two cells show an example. First we define two classes, `Bear` and `Duck`, and a function, `can_vote`:

In [410]:
class Bear:
    def __init__(self,name,age):
        self.name = name
        self.age  = age
        
class Duck:
    def __init__(self,name,age):
        self.name = name
        self.age  = age
        
def can_vote(citizen):
    return citizen.age > 18

Now we create a collection of `Bear`s and `Duck`s and check if they are old enough to vote.

In [414]:
citizens = [Bear('Bravo',10), Bear('Hero',20), Duck('A. Duck',10)]

for c in citizens:
    print('%s is old enough to vote: %s'%(c.name,can_vote(c)))

Bravo is old enough to vote: False
Hero is old enough to vote: True
A. Duck is old enough to vote: False


The `can_vote` function doesn't care if the citizen is a `Bear`, `Duck` or anything else. So long as it has an `age` attribute, it can be checked (only an age is necessary, the fact that these classes also define `name` attributes is irrelevant).
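
A small sketch of the 'anything else' case: `types.SimpleNamespace` from the standard library gives a quick object with whatever attributes you like, and a plain string shows what happens when there is no `age` at all. The `visitor` name is just an illustration.

```python
from types import SimpleNamespace

def can_vote(citizen):
    return citizen.age > 18

# Not a Bear or a Duck, but it has an age, so can_vote accepts it
visitor = SimpleNamespace(age=30)
print(can_vote(visitor))

# A plain string has no age, so the attribute lookup fails
try:
    can_vote('Bravo')
except AttributeError as err:
    print('Cannot check: %s' % err)
```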

This idea is quite common in Python. If for some reason you wanted a version of `can_vote` that checks it is going to operate on something with an age, the way is obviously not this:

In [None]:
def can_vote(citizen):
    # Type check: only accepts the classes listed here
    if isinstance(citizen, (Duck, Bear)):
        return citizen.age > 18
    else:
        raise TypeError('Need an age to check voting status!')

but this:

In [420]:
def can_vote(citizen):
    # Behaviour check: accepts anything that has an age
    if hasattr(citizen,'age'):
        return citizen.age > 18
    else:
        raise AttributeError('Need an age to check voting status!')
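
Another variant you may come across, shown here only as a sketch of an alternative style, is to skip the up-front check, attempt the comparison, and catch the failure if the attribute turns out to be missing. The choice of `TypeError` for the re-raised error is an assumption made for this example.

```python
class Bear:
    def __init__(self, name, age):
        self.name = name
        self.age = age

def can_vote(citizen):
    try:
        # Just try it: works for anything with an age attribute
        return citizen.age > 18
    except AttributeError:
        # Replace the low-level error with a more helpful message
        raise TypeError('Need an age to check voting status!') from None

print(can_vote(Bear('Hero', 20)))
```

Both versions are duck typing, since neither asks what class the citizen belongs to.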

### Inheritance

Beyond this point you're on your own with investigating how to use classes and objects, but it's worth pointing out that classes can be 'sub-classed' -- you can make more specific class definitions that automatically inherit their attribute definitions from a 'parent' class, unless you explicitly override those definitions.

The following example shows how this works.

In [434]:
class Citizen:
    def __init__(self,name,age):
        self.name = name
        self.age  = age

    def can_vote(self):
        return self.age > 18
    
    def greeting(self):
        print('How do you do! My name is %s.'%(self.name))
        
class Bear(Citizen):
    def greeting(self):
        print("Quack! I'm a bear called %s."%(self.name))

class Duck(Citizen):
    def greeting(self):
        print("Grrr! I'm a duck! My name is %s."%(self.name))

In [435]:
citizens = [Bear('Bravo',10), Bear('Hero',20), Duck('A. Duck',10)]

for c in citizens:
    c.greeting()
    print("Can I vote? %s"%(c.can_vote()))

Quack! I'm a bear called Bravo.
Can I vote? False
Quack! I'm a bear called Hero.
Can I vote? True
Grrr! I'm a duck! My name is A. Duck.
Can I vote? False


`Bear` and `Duck` both automatically have the `can_vote` method because they inherit from `Citizen`. But they both override the `greeting` method to give distinct greetings.
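
Overriding does not have to mean replacing a method completely. As a hedged sketch, a subclass can extend the parent's behaviour by calling it through `super()`. The `favourite_food` attribute is hypothetical and exists only for this example.

```python
class Citizen:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def can_vote(self):
        return self.age > 18

class Bear(Citizen):
    # Hypothetical extension: bears also record a favourite food
    def __init__(self, name, age, favourite_food):
        super().__init__(name, age)  # let Citizen set name and age
        self.favourite_food = favourite_food

hero = Bear('Hero', 20, 'honey')
print(hero.can_vote())            # inherited from Citizen
print(isinstance(hero, Citizen))  # a Bear is also a Citizen
```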

There is more to learn here, but it's beyond the scope of this tutorial.

## End of notebook