# Table of Contents

1. General characteristics
2. Functions
    - Functions should have descriptive names
    - Functions should be short
    - Functions should do one thing
3. Classes
    - When to use classes
    - Encapsulation
    - When to use inheritance
    - `super` function
4. Exceptions
5. Documentation / comments
6. Modules and packages
7. Standard libraries (Intro): os, sys, datetime, shutil, glob, re, logging, urllib, subprocess, pickle, ... (move this to language overview)
8. [Unit] Testing
9. PEP8 - coding conventions
10. Version Control Tools (Git)
11. Virtual environments
12. ...
13. Miscellany:
    - comprehensions (list/dict/set)
    - string formatting
    - lambda functions
    - decorators
    - logical operators
    - variable scope
    - mutable vs immutable
    - `*args` and `**kwargs`
    - floating point arithmetic

# PEP 8 - Python Style Guide

A PEP is a Python Enhancement Proposal. PEP 8 (the eighth PEP) describes how to write Python
code in a common style that will be easily readable by other programmers. If this seems
unnecessary, consider that programmers spend much more time reading code than writing it.

You can read PEP 8 here: https://www.python.org/dev/peps/pep-0008/

## pycodestyle

Wouldn't it be nice if you didn't need to remember all of these silly rules
for how to write PEP 8-consistent code? What if there was a tool that would
tell you if your code matches PEP 8 conventions or not?

There is such a tool, called [`pycodestyle`](https://pypi.python.org/pypi/pycodestyle).

In [1]:
"""
This is some ugly code that does not conform to PEP 8.

Check me with pycodestyle:
    pycodestyle ../resources/pep8_example.py
"""
from string import *
import math, os, sys

def f(x):
    """This function has lines that are just too long. The maximum suggested line length is 80 characters."""
    return 4.27321*x**3 - 8.375134*x**2 + 7.451431*x + 2.214154 - math.log(3.42153*x) + (1 + math.exp(-6.231452*x**2))
def g(x,
     y):
    print("Bad splitting of arguments")

# examples of bad spacing
mydict  =  { 'ham' : 2,  'eggs'  : 7  }#this is badly spaced
mylist=[ 1 , 2 , 3 ]

myvar   = 7
myvar2  = myvar*myvar
myvar10 = myvar**10

# badly formatted math
a= myvar+7 *  18-myvar2  /  2

l = 1 # l looks like 1 in some fonts
I = l # also bad
O = 0 # O looks like 0 in some fonts

# bad variable names
kMyUglyVariableName  = 18
The_Meaning_Of_Life  = 42


In [2]:
!pycodestyle ../resources/pep8_example.py

../resources/pep8_example.py:8:12: E401 multiple imports on one line
../resources/pep8_example.py:10:1: E302 expected 2 blank lines, found 1
../resources/pep8_example.py:11:80: E501 line too long (109 > 79 characters)
../resources/pep8_example.py:12:80: E501 line too long (118 > 79 characters)
../resources/pep8_example.py:13:1: E302 expected 2 blank lines, found 0
../resources/pep8_example.py:14:6: E128 continuation line under-indented for visual indent
../resources/pep8_example.py:18:1: E305 expected 2 blank lines after class or function definition, found 1
../resources/pep8_example.py:18:7: E221 multiple spaces before operator
../resources/pep8_example.py:18:10: E222 multiple spaces after operator
../resources/pep8_example.py:18:13: E201 whitespace after '{'
../resources/pep8_example.py:18:19: E203 whitespace before ':'
../resources/pep8_example.py:18:33: E203 whitespace before ':'
../resources/pep8_example.py:18:38: E202 whitespace before '}'
../resources/pep8_example.p
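
For comparison, here is one possible cleanup of part of the example above. This is a sketch rather than the only PEP 8-compliant form: the new names (`my_dict`, `my_list`, `my_var`, `my_var_squared`) are illustrative choices, the unused imports are dropped, and the long expression in `f()` is split using an intermediate variable.

```python
import math


def f(x):
    """Evaluate an example expression with polynomial, log and exp terms."""
    polynomial = 4.27321*x**3 - 8.375134*x**2 + 7.451431*x + 2.214154
    return polynomial - math.log(3.42153*x) + (1 + math.exp(-6.231452*x**2))


def g(x, y):
    print("Arguments on a single line")


# consistent spacing in literals and assignments
my_dict = {'ham': 2, 'eggs': 7}
my_list = [1, 2, 3]

my_var = 7
my_var_squared = my_var * my_var

# spacing that reflects operator precedence
a = my_var + 7*18 - my_var_squared/2
```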

# Naming conventions

Use descriptive names for your variables, functions, and classes. In Python,
the following conventions are usually observed:
* Variables, functions, and function arguments are lower-case, with underscores to separate words.
    ```python
    index = 0
    num_columns = 3
    length_m = 7.2   # you can add units to a variable name
    ```
* Constants can be written in all-caps.
    ```python
    CU_SPECIFIC_HEAT_CAPACITY = 376.812   # J/(kg K)
    ```
* Class names are written with the CapWords convention:
    ```python
    class MyClass:
    ```
    
Programmers coming from other programming languages (especially FORTRAN and C/C++) should avoid using
special encodings (e.g., [Hungarian notation](https://en.wikipedia.org/wiki/Hungarian_notation)) in their
variable names:
```python
# don't do this!
iLoopVar = 0      # i indicates integer
szName = 'Test'   # sz means 'string'
gGlobalVar = 7    # g indicates a global variable
```

# Comments

Comments are helpful when they clarify code. They should be used *sparingly*. Why?
* If code is so difficult to read that it needs a comment to explain it, it should probably be rewritten.
* Someone may update the code and forget to update a comment, making it misinformation.
* Comments tend to clutter the code and make it difficult to read.

Consider this example:

In [3]:
# this function does foo to the bar!
def foo(bar):
    bar = not bar   # bar is active low, so we invert the logic
    if bar == True:   # bar can sometimes be true
        print("The bar is True!")   # success!
    else:   # sometimes bar is not true
        print("Argh!")   # I hate it when the bar is not true!    

Only one of these comments is helpful. This code is much easier to read when written properly:

In [4]:
def foo(bar):
    """
    This function does foo to the bar!
    
    Bar is active low, so we invert the logic.
    """
    bar = not bar    # logic inversion
    if bar:
        print("The bar is True!")
    else:
        print("Argh!")

# Doc strings

Doc-strings are a useful way to document what a function (or class) does.

In [5]:
def add_two_numbers(a, b):
    """This function returns the result of a + b."""
    return a + b

In a Jupyter notebook (like this one) or an IPython shell, you can get information
about what a function does and what arguments it takes by reading its doc-string:

In [16]:
add_two_numbers?
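
Outside of Jupyter and IPython, the same information is available in any Python interpreter through the built-in `help()` function, or directly from the function's `__doc__` attribute. A minimal sketch, reusing `add_two_numbers()` from above:

```python
def add_two_numbers(a, b):
    """This function returns the result of a + b."""
    return a + b


help(add_two_numbers)             # prints the signature and the doc-string
print(add_two_numbers.__doc__)    # the raw doc-string is stored on the function
```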

Doc-strings can be several lines long:

In [6]:
def analyze_data(data, old_format=False, make_plots=True):
    """
    This function analyzes our super-important data.
    
    If you want to use the old data format, set old_format to True.
    Set make_plots to False if you do not want to plot the data.
    """
    # analysis ...

If you are working on a large project, there may be project-specific conventions
on how to write doc-strings. For example:

In [7]:
def google_style_doc_string(arg1, arg2):
    """Example Google-style doc-string.
    
    Put a brief description of what the function does here.
    In this case, the function does nothing.
    
    Args:
        arg1 (str): Your full name (name + surname)
        arg2 (int): Your favorite number

    Returns:
        bool: The return value. True for success, False otherwise.
    """

def scipy_style_doc_string(x, y):
    """This is a SciPy/NumPy-style doc-string.
    
    All of the functions in SciPy and NumPy use this format for their
    doc-strings.
    
    Parameters
    ----------
    x : float
        Description of parameter `x`.
    y :
        Description of parameter `y` (with type not specified)

    Returns
    -------
    err_code : int
        Non-zero value indicates error code, or zero on success.
    err_msg : str or None
        Human readable error message, or None on success.
    """

# Functions

## When to use functions

Python can be used as a scripting language (like Bash or Perl), and Python programs often start out
as scripts. Here is an example of a script that renames image files (call it `image_renamer.py`):

In [8]:
#!/usr/bin/env python3

from glob import glob
import os

jpeg_file_list = glob('Image_*.jpg')
for old_file_name in jpeg_file_list:
    fname_parts = old_file_name.split('_')
    new_file_name = fname_parts[0] + '_0' + fname_parts[1]   # add leading zero: 01 -> 001
    os.rename(old_file_name, new_file_name)

The first line indicates to the shell that this is a Python 3 script (the `#!` combination is called a [shebang](https://en.wikipedia.org/wiki/Shebang_(Unix))).

You can run this script as an executable from the shell, just like any other program:
```bash
chmod a+x image_renamer.py
./image_renamer.py
```

Often, this is all you need. However, it has several disadvantages:
1. The flow proceeds linearly, from top to bottom. If you want to repeat some idea (maybe renaming files is only part of the script, and you need to do this several times) you end up pasting the same code in several parts of your script, making the script longer and more error prone.
2. To reuse this code, you need to cut and paste it. This is fine for short bits of code; for longer codes, it becomes tedious. Also, if you fix a mistake in the code in one place, you need to remember to fix it (by hand!) anywhere else that code is used.
3. Scripts can become quite long (thousands of lines). With nothing to break up the program, it is like reading a technical book without chapters or headings.
4. The only way to test that this code works correctly is to run it in a directory with images.

Functions solve all four of these problems. Consider this code:

In [9]:
"""
image_renamer.py -- simple script to rename images.
"""
from glob import glob
import sys
import os


def rename_images(image_list, test=False):
    for old_file_name in image_list:
        fname_parts = old_file_name.split('_')
        new_file_name = fname_parts[0] + '_0' + fname_parts[1]   # add leading zero: 01 -> 001
        if test:
            print(new_file_name)
        else:
            os.rename(old_file_name, new_file_name)
        

if __name__ == '__main__':   # only run this part if the file is being executed as a script
    directory = './'
    if len(sys.argv) == 2:
        directory = sys.argv[1]
    jpeg_file_list = glob(directory + '/Image_*.jpg')
    rename_images(jpeg_file_list)

To be fair, the code is now longer, and in some ways more complicated. However, it has several advantages over the simple script. Recalling our previous list, note that:

1. The flow is now non-linear. Nothing actually happens until the `if __name__ == '__main__'` statement, which calls the `rename_images()` function.
2. To reuse this code, from some other script you can do `from image_renamer import rename_images`, and then use the `rename_images()` function as if you had copied it into your new piece of code.
3. The parts of this script are now easy to identify. If you wanted to rename images in multiple directories, it would not be difficult to do.
4. You can now test this code to see what it does:

In [10]:
rename_images(['Image_01.jpg', 'Image_02.jpg'], test=True)

Image_001.jpg
Image_002.jpg
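
Printing the new names is one way to check the result. A possible refinement, sketched here with a hypothetical helper `add_leading_zero()`, separates computing the new name from the actual renaming, so the naming logic can be checked with plain `assert` statements. Splitting the path with `os.path.split()` is an added assumption, so that underscores in directory names do not interfere.

```python
import os


def add_leading_zero(file_name):
    """Return file_name with a zero added to its index: 01 -> 001."""
    directory, base_name = os.path.split(file_name)
    prefix, index_part = base_name.split('_', 1)
    return os.path.join(directory, prefix + '_0' + index_part)


def rename_images(image_list):
    for old_file_name in image_list:
        os.rename(old_file_name, add_leading_zero(old_file_name))


# the naming logic can now be checked without touching any files
assert add_leading_zero('Image_01.jpg') == 'Image_001.jpg'
```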


## Functions should have descriptive names

Functions should have names that describe what they are for.

For example, what does this function do?

In [None]:
def myfunc(mylist):
    import re
    f = re.compile('([0-9]+)_.*')
    return [int(f.findall(mystr)[0]) for mystr in mylist]

myfunc(['000_Image.png', '123_Image.png', '054_Image.png'])

A better name could be:
```python
def extract_integer_index(file_list):
```
If you name things well, it makes comments unnecessary. Your code will speak for itself!
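
Putting this together, the whole function might be rewritten as follows. This is only a sketch: the names `LEADING_INDEX_PATTERN` and `name` are suggestions, and compiling the pattern once at module level is a design choice rather than a requirement.

```python
import re

# one or more digits, followed by an underscore and the rest of the name
LEADING_INDEX_PATTERN = re.compile(r'([0-9]+)_.*')


def extract_integer_index(file_list):
    """Return the leading integer index of each file name in file_list."""
    return [int(LEADING_INDEX_PATTERN.findall(name)[0]) for name in file_list]


print(extract_integer_index(['000_Image.png', '123_Image.png', '054_Image.png']))
# prints [0, 123, 54]
```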

## Functions should be short

Here is an example of a function that is a bit too long.
It is not very long because it is an example, but in real physics code it is not uncommon
to find single functions that are hundreds of lines long!

In [12]:
def analyze():
    print("******************************")
    print("    Starting the Analysis!    ")
    print("******************************")

    # create fake data
    x = [4.1, 2.8, 6.7, 3.5, 7.9, 8.0, 2.1, 6.3, 6.6, 4.2, 1.5]
    y = [2.2, 5.3, 6.3, 2.4, 0.1, 0.67, 7.8, 9.1, 7.1, 4.9, 5.1]
    
    # make tuple and sort
    data = list(zip(x, y))
    data.sort()
    
    # calculate statistics
    y_sum = 0
    xy_sum = 0
    xxy_sum = 0
    for xx, yy in data:
        y_sum += xx
        xy_sum += xx*yy
        xxy_sum += xx*xx*yy
    xbar = xy_sum / y_sum
    x2bar = xxy_sum/y_sum
    std_dev = (x2bar - xbar**2)**0.5
    
    # print the results
    print("Mean:   ", xbar)
    print("Std Dev:", std_dev)

    print("Analysis successful!")

analyze()

******************************
    Starting the Analysis!    
******************************
Mean:    4.272253258845437
Std Dev: 2.2108824184193927
Analysis successful!


How can we improve this code? Our `analyze()` function is really doing three things:
1. Creating fake data
2. Calculating some statistics
3. Printing the status and results

Each of these things can be put in a separate function.

In [13]:
def generate_fake_data():
    x = [4.1, 2.8, 6.7, 3.5, 7.9, 8.0, 2.1, 6.3, 6.6, 4.2, 1.5]
    y = [2.2, 5.3, 6.3, 2.4, 0.1, 0.67, 7.8, 9.1, 7.1, 4.9, 5.1]
    data = list(zip(x, y))
    data.sort()
    return data

def calculate_mean_and_stddev(xy_data):
    y_sum = 0
    xy_sum = 0
    xxy_sum = 0
    for xx, yy in xy_data:
        y_sum += xx
        xy_sum += xx*yy
        xxy_sum += xx*xx*yy
    xbar = xy_sum / y_sum
    x2bar = xxy_sum/y_sum
    std_dev = (x2bar - xbar**2)**0.5
    return xbar, std_dev

def analyze():
    data = generate_fake_data()
    mean, std_dev = calculate_mean_and_stddev(data)
    print("Mean:   ", mean)
    print("Std Dev:", std_dev)

analyze()

Mean:    4.272253258845437
Std Dev: 2.2108824184193927


We note three important results of this code restructuring:
1. It is much easier to tell at a glance what `analyze()` does.
2. The comments (which we used to organize our code before) are no longer needed.
3. `generate_fake_data()` and `calculate_mean_and_stddev()` can now be reused elsewhere.

## Functions should do one thing

We now know we should break up big functions into smaller ones, but how do we decide
how to break them up, and how small should they be?

A useful principle for guiding the creation of functions is that functions should
do one thing. In the previous section, our large `analyze()` function was doing
several things, so we broke it up into smaller functions.

But wait! You may notice that `calculate_mean_and_stddev()` does two things! Should we
break it up into two functions, `calculate_mean()` and `calculate_stddev()`?
The answer depends on two things:
1. Will you ever want to calculate the mean and standard deviation separately?
2. Will splitting the function into two result in a large amount of duplicated code?

Another important consequence of the "do one thing" principle is that it can help
you avoid cases where a function does more than what you would expect it to do.

For example, this function claims to just write data to a file; however, it also modifies the data!

In [14]:
def write_data_to_file(data, filename='data.dat'):
    with open(filename, 'w') as f:
        data *= 2
        f.write(data)

Try to imagine a much larger code where you have
a factor of two introduced, and you can't figure out where it came from (imagine searching a large code for the number 2).
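
A version that does only what its name promises might look like this. It is a sketch that assumes `data` is a string; any scaling or other modification is left to the caller, where it is visible.

```python
def write_data_to_file(data, filename='data.dat'):
    """Write data (a string) to filename without modifying it."""
    with open(filename, 'w') as f:
        f.write(data)
```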

# Classes

## When to use classes

The question "When should I use classes?" is more difficult to answer than "When should I use functions?" (answer: almost always). Classes are generally used in Object-Oriented Programming (OOP). A full discussion of OOP is beyond the scope of this course, so we will just give some general guidance here.

You should consider using classes when:
1. You have several functions manipulating the same set of data.
2. You find that you are passing the same arguments to several functions.
3. You want parts of your code to be responsible for maintaining their own internal state.
4. You want your code to have an easy-to-use interface that doesn't require understanding exactly what the code does.

Consider this code:

In [15]:
import random

def create_data_set(length, lower_bound=0, upper_bound=10, seed_value=None):
    random.seed(seed_value)
    return [random.uniform(lower_bound, upper_bound) for i in range(length)]
    
def shuffle(data):
    random.shuffle(data)
    return data

def mean(data):
    return sum(data)/len(data)
    
def display(data):
    print(data)
    
def analyze(data):
    print(mean(data))    
    display(data)
    new_data = shuffle(data)
    display(new_data)
    
data = create_data_set(5)
analyze(data)

5.3133667102310085
[1.7726318094009863, 1.4196366428391571, 9.755901417796744, 3.879247191800456, 9.739416489317696]
[3.879247191800456, 9.739416489317696, 1.4196366428391571, 1.7726318094009863, 9.755901417796744]


The first function creates a data set (initialization), while the other functions manipulate this data set.
In this case, it may make sense to create a class:

In [16]:
import random

class DataSet:
    def __init__(self, length, lower_bound=0, upper_bound=10, seed_value=None):
        random.seed(seed_value)
        self.data = [random.uniform(lower_bound, upper_bound) for i in range(length)]
    
    def shuffle(self):
        random.shuffle(self.data)

    def mean(self):
        return sum(self.data)/len(self.data)

    def display(self):
        print(self.data)
        
    def analyze(self):
        print(self.mean())
        self.display()
        self.shuffle()
        self.display()
        
a = DataSet(5)
a.analyze()

5.043469129010969
[7.792623266150359, 2.5203796891727936, 3.1271510969213425, 7.505545791356189, 4.27164580145416]
[2.5203796891727936, 7.505545791356189, 4.27164580145416, 7.792623266150359, 3.1271510969213425]


Compared to the function version, the class version:
1. Has slightly cleaner code (especially the `analyze` function).
2. Has a nicer interface (compare the last two lines).
3. Is less flexible (what if you want to analyze some other data set?).
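
One possible way to recover some of that flexibility is to let the constructor accept existing data and to move the random generation into an alternative constructor. The sketch below uses hypothetical names (`FlexibleDataSet`, `from_random`) and a private `random.Random` instance instead of the module-level generator; it is not part of the example above.

```python
import random


class FlexibleDataSet:
    def __init__(self, data):
        self.data = list(data)    # copy, so the caller's list is untouched

    @classmethod
    def from_random(cls, length, lower_bound=0, upper_bound=10,
                    seed_value=None):
        """Hypothetical alternative constructor that generates random data."""
        rng = random.Random(seed_value)
        values = [rng.uniform(lower_bound, upper_bound)
                  for _ in range(length)]
        return cls(values)

    def mean(self):
        return sum(self.data)/len(self.data)


measured = FlexibleDataSet([1.0, 2.0, 3.0])      # some other data set
simulated = FlexibleDataSet.from_random(5)       # random data, as before
print(measured.mean())
```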

In the simple example above, it is not clear whether the class version or the function version is better. Let's consider something a bit more complex... 

In [10]:
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import matplotlib.animation as animation
from math import sin, cos, atan2
import random


class PacMan:
    RADIUS = 0.1           # size of pacman
    ANGLE_DELTA = 5        # degrees; controls how fast pacman's mouth opens/closes
    MAX_MOUTH_ANGLE = 30   # degrees; maximum mouth opening half-angle
    MAX_SPEED = 0.02       # controls how fast pacman moves
    X_BOUNDS = 1           # controls x-axis display range
    Y_BOUNDS = 0.5         # controls y-axis display range
    
    def __init__(self, waypoints=None):
        self._init_figure()
        self._init_pacman()
        if waypoints:
            self.waypoints = waypoints
            self.go_home()
        else:
            self.waypoints = []
        self._show_animation()
        
    def _init_figure(self):
        self.fig = plt.figure()
        self.ax = self.fig.add_subplot(111, aspect='equal')
        self.ax.set_xlim(-self.X_BOUNDS, self.X_BOUNDS)
        self.ax.set_ylim(-self.Y_BOUNDS, self.Y_BOUNDS)
        
    def _init_pacman(self):
        self.x = 0
        self.y = 0
        self.angle = 0
        self.angle_set = False
        self.mouth_closing = True
        self.mouth_open_angle = 30
        pacman_patch = patches.Wedge((self.x, self.y), self.RADIUS, 
                                     self.mouth_open_angle, -self.mouth_open_angle,
                                     color="yellow", ec="none")
        self.pacman = self.ax.add_patch(pacman_patch)
    
    def _animate_mouth(self):
        if self.mouth_closing:
            self.mouth_open_angle -= self.ANGLE_DELTA
        else:
            self.mouth_open_angle += self.ANGLE_DELTA
        if self.mouth_open_angle <= 0:
            self.mouth_open_angle = 1
            self.mouth_closing = False
        if self.mouth_open_angle >= self.MAX_MOUTH_ANGLE:
            self.mouth_closing = True
        self.pacman.set_theta1(self.mouth_open_angle)
        self.pacman.set_theta2(-self.mouth_open_angle)

    def _calculate_angle_to_point(self, x, y):
        dx = x - self.x
        dy = y - self.y
        angle_rad = atan2(dy, dx)
        return angle_rad
        
    def _animate_motion(self):
        if not self.waypoints:
            return
        way_x, way_y = self.waypoints[0]
        if (self.x == way_x) and (self.y == way_y):
            self.waypoints.pop(0)
            self.angle_set = False
            return
        if not self.angle_set:
            self.angle = self._calculate_angle_to_point(way_x, way_y)
            self.angle_set = True
        dx = self.MAX_SPEED*cos(self.angle)
        dy = self.MAX_SPEED*sin(self.angle)
        if abs(way_x - (self.x + dx)) >= self.MAX_SPEED:
            self.x += dx
        else:
            self.x = way_x
        if abs(way_y - (self.y + dy)) >= self.MAX_SPEED:
            self.y += dy
        else:
            self.y = way_y
        tx = mpl.transforms.Affine2D().rotate(self.angle) + \
             mpl.transforms.Affine2D().translate(self.x, self.y) + self.ax.transData
        self.pacman.set_transform(tx)
    
    def _next_frame(self, i):
        self._animate_mouth()
        self._animate_motion()
        return [self.pacman]
    
    def _show_animation(self):
        # keep a reference so the animation is not garbage-collected
        self.ani = animation.FuncAnimation(self.fig, self._next_frame, interval=30)
        if mpl.get_backend() == u'MacOSX':
            plt.show(block=False)
        else:
            plt.show()
            
    def add_waypoint(self, x, y):
        """Add a point where pacman should go. This function is non-blocking."""
        self.waypoints.append((x, y))
    
    @staticmethod
    def generate_random_path(num_points):
        """Generate a random list of points pacman should visit."""
        xlim = (-PacMan.X_BOUNDS, PacMan.X_BOUNDS)
        ylim = (-PacMan.Y_BOUNDS, PacMan.Y_BOUNDS)
        waypoints = []
        for i in range(num_points):
            waypoints.append((random.uniform(*xlim), random.uniform(*ylim)))
        return waypoints
    
    def add_random_path(self, num_points):
        """Add a list of random points to pacman's waypoint list."""
        random_points = self.generate_random_path(num_points)
        self.waypoints.extend(random_points)
        
    def go_home(self):
        """Send pacman back to the origin (0, 0)."""
        self.add_waypoint(-self.MAX_SPEED, 0)
        self.add_waypoint(0, 0)


pac = PacMan(PacMan.generate_random_path(num_points=10))

Pacman is responsible for maintaining his own state. This sort of program lends itself very well to object-oriented programming.

## Private methods

Functions inside classes are called methods (variables inside classes are called fields).

By convention, methods that start with an underscore (e.g., `_init_pacman()`) are "private", although not in the way that Java or C++ methods are private. These methods can still be accessed by users, but the underscore indicates that users should not generally mess with them (they are not part of the public API).

Methods that start with two underscores can also be considered private, but the two leading underscores have a particular use in Python: they trigger name mangling, which is only needed to prevent name conflicts during inheritance (very advanced Python!). Unless you know what you are doing, stick to single underscores.

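Methods that start and end with two underscores (e.g., `__init__()`) are special methods that Python itself calls in particular situations (`__init__()`, for example, runs when an object is created). Names of this form are reserved by the language, so don't invent your own.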
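
The underscore conventions can be seen in a small example. The classes `Base` and `Child` and their attributes are hypothetical and exist only to show how Python treats each kind of name.

```python
class Base:
    def __init__(self):
        self._internal = 1     # "private" by convention only
        self.__mangled = 2     # actually stored as _Base__mangled


class Child(Base):
    def __init__(self):
        super().__init__()
        self.__mangled = 3     # stored as _Child__mangled: no clash with Base


obj = Child()
print(obj._internal)           # still accessible, but not part of the public API
print(obj._Base__mangled)      # the value set in Base
print(obj._Child__mangled)     # the value set in Child
```

Both mangled attributes coexist on the same object, which is exactly the conflict that name mangling is designed to avoid.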

Going back to the Pacman example, we note that there are only three methods needed to make pacman move: `add_waypoint()`, `add_random_path()`, and `go_home()`. Each of these can be easily used without any knowledge of the complicated class internals. It is good programming to provide a simple, easy-to-use interface to classes that is difficult to use incorrectly.

## Encapsulation

Encapsulation is the object-oriented programming principle that users should be prevented from meddling with the internals of your class, except via an approved external interface.

In traditional OO languages like Java and C++, encapsulation is strongly encouraged, while Python is less strict.

Here is an example of how Python classes are typically written:

In [66]:
class Rect:
    def __init__(self, width, height):
        self.width = width
        self.height = height
        
    def area(self):
        return self.width*self.height
    
    def perimeter(self):
        return 2*self.width + 2*self.height

This has a minimum of extra code ("boilerplate" in programmer-speak) and is generally the right way to make a Python class. However, note that we can do the following:

In [67]:
a = Rect(3, -1)    # fine
print('Area of a:', a.area())

b = Rect(2, 's')   # also fine?
print('Area of b:', b.area())

Area of a: -3
Area of b: ss


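In some cases, it is desirable to ensure that the values stored in a class are valid. One way to do this is to make the attributes internal and change them only through setter methods that check their arguments: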

In [68]:
from numbers import Number

class EncapsulatedRect:
    def __init__(self, width, height):
        self.set_width(width)
        self.set_height(height)
        
    def area(self):
        return self._width*self._height
    
    def perimeter(self):
        return 2*self._width + 2*self._height
    
    def get_width(self):
        return self._width
    
    def get_height(self):
        return self._height
    
    def set_width(self, width):
        if isinstance(width, Number) and width > 0:
            self._width = width
        else:
            raise ValueError('set_width: value should be a positive number.')
        
    def set_height(self, height):
        if isinstance(height, Number) and height > 0:
            self._height = height
        else:
            raise ValueError('set_height: value should be a positive number.')

Here, `_width` and `_height` are internal variables, which are meant to be changed only through the approved setters that check that the values are valid.

Unlike in C++ and Java, however, even in our `EncapsulatedRect` we can still modify `_width` and `_height` directly:

In [78]:
d = EncapsulatedRect(4, 5)
d._width = 2
print(d.area())

10


In general, the more "Pythonic" approach is actually `Rect` rather than `EncapsulatedRect`. In particular, Python encourages directly accessing fields rather than using getters and setters, which add boilerplate and clutter the code. Python expects users to be smart enough to use classes correctly.
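
If validation does turn out to be necessary, Python's built-in `property` decorator offers a middle ground: users keep plain attribute access, while assignments are routed through a checking function. The sketch below uses a hypothetical class `ValidatedRect` and validates only `width` to keep it short; it is one possible approach, not a replacement for the classes above.

```python
from numbers import Number


class ValidatedRect:
    """Hypothetical variant of Rect that validates width via a property."""

    def __init__(self, width, height):
        self.width = width      # goes through the property setter below
        self.height = height    # plain attribute, as in Rect

    @property
    def width(self):
        return self._width

    @width.setter
    def width(self, value):
        if not isinstance(value, Number) or value <= 0:
            raise ValueError('width should be a positive number.')
        self._width = value

    def area(self):
        return self.width*self.height


r = ValidatedRect(3, 4)
r.width = 5         # looks like direct access, but the value is checked
print(r.area())     # prints 20
```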

## When to use inheritance

## `super` Function

# Exceptions

Exceptions are a mechanism for handling errors. Traditionally, errors were handled with return codes, like this:

In [4]:
def example_only_does_not_work():
    fin = open('does_not_exist.txt', 'r')
    if not fin:
        return -1
    # ... do stuff with file
    fin.close()
    return 0

This kind of code is problematic for three reasons:
1. The return codes (and therefore errors) can be ignored/forgotten.
2. The return code must be either checked by the function that calls it, or explicitly passed to higher level functions.
3. Return codes are generally integers, so they must be looked up in a table. They also can't provide any specific details.

To illustrate point #2, consider the following code:

In [5]:
def foo():
    return -1   # error code!

def bar():
    foo()
    return 0    # return success?

def baz():
    bar()
    return 0    # no errors, right?

Exceptions offer an elegant solution to all three of the problems listed above.

## Raising exceptions

Exceptions must derive from the `BaseException` class (user-defined exceptions should be derived from `Exception`). It is common to use one of the [built-in exception subclasses](https://docs.python.org/3/library/exceptions.html). Common examples include:

1. `ImportError` - raised when trying to import an unknown module.
2. `IndexError` - raised when trying to access an invalid index in a sequence (e.g., a list).
3. `KeyError` - raised when trying to use an invalid key with a dictionary.
4. `NameError` - raised when trying to use a variable that hasn't been defined.
5. `TypeError` - raised when trying to use an object of the wrong type.
6. `ValueError` - raised when an argument has the correct type but a bad value.
7. `IOError` - raised when there is a problem with reading/writing a file (in Python 3, `IOError` is an alias of `OSError`).
8. `RuntimeError` - catch-all class for errors while code is running.

In general, you can use these built-in exceptions when there is one that suits the problem. For instance, you might raise a `ValueError` or `TypeError` when checking arguments to a function:

In [28]:
def foobar(value):
    if not isinstance(value, int):
        raise TypeError("foobar requires an int!")
    if value < 0:
        raise ValueError("foobar argument 'value' should be >= 0; you passed: %i" % value)
    
# uncomment to test:
# foobar(2.7)
# foobar(-7)

You do not need to add a string argument when raising an exception. This works fine:

In [25]:
raise Exception

Exception: 

However, this is not very helpful. In general, you should add some descriptive text to your exceptions to explain to the user what exactly went wrong.

To make your exceptions even more useful, or when there isn't a built-in exception that meets your needs, you can roll your own by sub-classing `Exception` or one of the other built-in exceptions:

In [36]:
class MyCustomException(Exception):
    pass

# using a doc-string instead of 'pass' is more helpful
class CorruptFile(IOError):
    """Raise this exception when attempting to read a file that is corrupt."""
    
# uncomment to test...
# raise MyCustomException("Test")
# raise CorruptFile("Oh no, the file is corrupted!")
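
Because `CorruptFile` derives from `IOError`, code that already catches `IOError` will also catch it. The sketch below illustrates this with a hypothetical `check_header()` function and a made-up `b'DATA'` header; neither is part of any real file format.

```python
class CorruptFile(IOError):
    """Raise this exception when attempting to read a file that is corrupt."""


def check_header(header, expected=b'DATA'):
    """Hypothetical check: raise CorruptFile on an unexpected header."""
    if not header.startswith(expected):
        raise CorruptFile("Bad header: %r (expected %r)" % (header, expected))
    return True


try:
    check_header(b'JUNK1234')
except IOError as err:      # also catches CorruptFile, a subclass of IOError
    caught = err
    print("Could not read file:", err)
```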

## Handling exceptions

## Understanding stack traces

Consider this line from some earlier code:

In [6]:
fin = open('does_not_exist.txt', 'r')

IOError: [Errno 2] No such file or directory: 'does_not_exist.txt'

The file does not exist, so Python raises an error -- in this case an `IOError` (IO = input/output). Here, we have not "handled" this exception, so Python prints a "stack trace" or "traceback" (basically unrolling your code to show you where the error occurred).

These tracebacks are an excellent way to figure out what went wrong in your program. However, they can appear to be a little cryptic to the uninitiated, so we will look at how to understand them.

Consider this example, where you are trying to fit a quadratic function to two data points:

In [39]:
from scipy.optimize import curve_fit

def f(x, a, b, c):
    return a*x**2 + b*x + c

x = [0, 1]
y = [2, 3]
curve_fit(f, x, y)

TypeError: Improper input: N=3 must not exceed M=2

The traceback indicates that the error is a `TypeError`, and then starts in the current file (listed in green), where the offending call is made. It tells you that the error originates on line 8 (in this case, of the notebook cell).

*Aside*: you can view line numbers in a notebook by selecting a cell, pressing escape, and then pressing the (lowercase) 'L' key. Press 'L' again to turn the line numbers off.

The traceback then goes to the file where the offending function resides (in this case, in `minpack.pyc` in the scipy library). The exception originated during a call to `leastsq()`.

Finally, the traceback shows you where the actual `TypeError` exception was raised (also in the `minpack.pyc` file, just at a different line). The `TypeError` tells you that N=3 must not exceed M=2.

This doesn't seem very helpful at first. What actually went wrong? What are N and M? In fact, the problem is one of basic linear algebra: we are trying to fit three unknowns (from our quadratic) with only two equations (one from each (x,y) data point). We need more data! Try adding another junk data point, and you will see that the error goes away.

To summarize, we note the following useful lessons:
1. Tracebacks appear cryptic, and can be quite long, but once you understand them they are very helpful!
2. Exceptions allow you to propagate an error from where it actually occurs to where the function is used, much higher up, without any additional code (unlike return codes).
3. Make sure that your error messages are helpful! The message "Improper input: N=3 must not exceed M=2" probably seemed very clear to the original authors, but it is much less clear to users. How could you make the error message easier to understand?
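
Lesson 2 can be seen by revisiting the `foo()`, `bar()`, `baz()` example from the start of this chapter, this time raising an exception instead of returning an error code. This is a minimal sketch; the choice of `RuntimeError` and the message text are arbitrary.

```python
def foo():
    raise RuntimeError("foo failed!")   # the error is raised here...


def bar():
    foo()       # no error-handling code is needed here


def baz():
    bar()       # ...or here


try:
    baz()
except RuntimeError as err:
    message = str(err)
    print("Caught:", message)   # ...and it is handled here, at the top level
```

Unlike the return-code version, the error cannot be silently lost on its way up through `bar()` and `baz()`.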

# Unit Testing