In [1]:
import numpy as np

# Exceptions

An exception is an event, which occurs during the execution of a program, that disrupts the normal flow of the program's instructions.

You've already seen some exceptions in the **Debugging** lesson.

Many programs need to know when an exception occurs. For example, suppose the input to a program is a file path. If the user supplies an invalid or non-existent path, the program generates an exception, and it may be desirable to respond to the user in that case. Exceptions are a way of indicating that there is an error in the inputs provided. In general, raising an exception is the preferred style for dealing with invalid inputs or states inside a Python function, rather than returning an error value.

## Catching Exceptions

Python provides a way to detect when an exception occurs: put the code that might fail in a `try` block and handle the failure in an `except` block.

In [2]:
def divide(numerator, denominator):
    result = numerator / denominator
    print("result = %f" % result)

In [3]:
divide(1.0, 0)

ZeroDivisionError: float division by zero

In [4]:
def divide1(numerator, denominator):
    try:
        result = numerator/denominator
        print("result = %f" % result)
    except:
        print("You can't divide by 0!")

In [5]:
divide1(1.0, 0)

You can't divide by 0!


In [6]:
divide1(1.0, 'a')

You can't divide by 0!


While this does catch the exception, the error message doesn't match the condition, because `'a' != 0`. The bare `except:` catches every exception, not only division by zero.

In [7]:
divide1("x", 2)

You can't divide by 0!


This message is also wrong: the denominator is `2`, not `0`, and the real error is in the first argument! Naming the exception types to catch, and printing the exception itself, gives an accurate message.

In [8]:
def divide2(numerator, denominator):
    try:
        result = numerator / denominator
        print("result = %f" % result)
    except (ZeroDivisionError, TypeError) as err:
        print("Got an exception: %s" % err)

In [9]:
divide2(1, "X")

Got an exception: unsupported operand type(s) for /: 'int' and 'str'


In [10]:
divide2("x", 2)

Got an exception: unsupported operand type(s) for /: 'str' and 'int'


In [11]:
divide2(1, 0)

Got an exception: division by zero


#### What do you do when you get an exception?

First, you can feel relieved that you caught a problematic element of your software! Yes, relieved. Silent failures are much worse. (Again, another plug for testing.)

You should then decide what to do in the problematic case: your software can do something different, like print a message or stop executing.
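The two responses mentioned above (print a message, or stop executing) can be sketched as follows. This is only an illustration: the function names `read_text_or_warn` and `read_text_or_stop` are hypothetical, and the choice to fall back to empty text is an assumption rather than a recommendation.

```python
import sys


def read_text_or_warn(path):
    """Print a message and carry on with empty text if the file is missing."""
    try:
        with open(path) as f:
            return f.read()
    except FileNotFoundError as err:
        # Respond to the problem, then keep running with a fallback value.
        print("Warning: %s" % err)
        return ""


def read_text_or_stop(path):
    """Stop the program with an error message if the file is missing."""
    try:
        with open(path) as f:
            return f.read()
    except FileNotFoundError as err:
        # sys.exit raises SystemExit, which ends the program unless caught.
        sys.exit("Cannot continue: %s" % err)
```

Which response is appropriate depends on whether the rest of the program can do anything useful without the missing input.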

## Generating Exceptions

#### Why *generate* exceptions? (Don't I have enough unintentional errors?)

In [12]:
import pandas as pd
def validateDF(df):
    if "hours" not in df.columns:
        raise ValueError("DataFrame should have a column named 'hours'.")

In [13]:
df = pd.DataFrame({'hours': range(10) })
validateDF(df)

In [14]:
df = pd.DataFrame({'years': range(10) })
validateDF(df)

ValueError: DataFrame should have a column named 'hours'.

What's the difference between using an assertion and raising an exception?

* Assertions can be turned off (Python skips them when run with the `-O` flag), and often are in production code. They are helpful for debugging but will NOT provide helpful information in production.
* Raising an exception also works in production code, which helps calling code detect and automatically recover from errors.
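A small sketch of the difference, using two hypothetical functions (`mean_hours_assert` and `mean_hours_raise` are made up for illustration). The fallback to `0.0` in the calling code is just one possible recovery.

```python
def mean_hours_assert(hours):
    # This check is skipped when Python runs with the -O flag.
    assert len(hours) > 0, "hours must not be empty"
    return sum(hours) / len(hours)


def mean_hours_raise(hours):
    # This check always runs, with or without -O.
    if len(hours) == 0:
        raise ValueError("hours must not be empty")
    return sum(hours) / len(hours)


# Calling code can detect the problem and recover from it.
try:
    average = mean_hours_raise([])
except ValueError as err:
    print("Falling back to 0.0: %s" % err)
    average = 0.0
```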

The most common types of errors (and the ones we'll use in this class) are `TypeError` - used when the caller has provided arguments of the incorrect types - and `ValueError` - used when the caller has provided argument values that don't make sense.
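As a rough illustration of the distinction, here is a hypothetical `safe_sqrt` function (not part of the class material) that checks the type of its argument first and its value second:

```python
import math


def safe_sqrt(x):
    # Wrong type: the caller passed something that is not a real number.
    if not isinstance(x, (int, float)):
        raise TypeError("x must be an int or float, got %s" % type(x).__name__)
    # Right type, but a value that doesn't make sense for a square root.
    if x < 0:
        raise ValueError("x must be non-negative, got %s" % x)
    return math.sqrt(x)
```

Checking the type before the value means the comparison `x < 0` only runs on arguments that support it.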

## Class exercise

1. Convert the entropy exercise from the `Debugging` section to raise specific `ValueError`s with helpful messages rather than using assertions. You should ALSO validate that the input is a list of numbers - but make sure to use `TypeError` for incorrect types!

In [3]:
import numbers

def entropy(p):
    # Check types first, then values.
    if any(not isinstance(p_i, numbers.Number) for p_i in p):
        raise TypeError("At least one input is not a number")
    if any((p_i < 0.0) or (p_i > 1.0) for p_i in p):
        raise ValueError("At least one input is out of range [0...1]")
    if not np.isclose(1, np.sum(p), atol=1e-08):
        raise ValueError("The list of input probabilities does not sum to 1")

    items = []
    for p_i in p:
        if p_i > 0:
            interm = p_i * np.log2(p_i)
            items.append(interm)
    return np.abs(np.sum(items))

In [21]:
entropy([1.2, 0.5])

ValueError: At least one input is out of range [0...1]

In [4]:
entropy([0.2,"x"])

TypeError: At least one input is not a number

2. For the CSV parser exercise from last week, modify it to raise an appropriate exception for bad inputs. There are multiple forms of bad input - what could be a bad input?

  * The file may not exist
  * The rows may not have the same number of commas/not be the same length

In [23]:
def read_csv(file_name):
    # one option:
    # if not os.path.isfile(file_name):
    
    try:
        with open(file_name) as file:
            rows = []
            header = file.readline().rstrip('\n').split(',')
            for line in file:
                cols = line.rstrip('\n').split(',')
                row = {}
                if len(cols) != len(header):
                    raise ValueError(f"Header has {len(header)} columns but at least one row has {len(cols)} columns")
                for i in range(0, len(header)):
                    row[header[i]] = cols[i]
                rows.append(row)
            return rows
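    except FileNotFoundError:
        # It is better to try/except rather than checking that the file
        # exists first, because the file could be moved between when you
        # check and when you open the file.
        # This is called a "race condition"
        # Catch only FileNotFoundError: a bare `except:` would also catch
        # the ValueError raised above and mislabel it as a missing file.
        raise ValueError(f"File {file_name} does not exist")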