SUMMARY
Evaluate the readability, complexity and performance of a function.
Write docstrings for functions following the NumPy/SciPy format.
Write comments within a function to improve readability.
Write and design functions with default arguments.
Explain the importance of scoping and environments in Python as they relate to functions.
Formulate test cases to prove a function design specification.
Use assert statements to formulate a test case to prove a function design specification.
Use test-driven development principles to define a function that accepts parameters, returns values and passes all tests.
Handle errors gracefully via exception handling.

In the last module, we were introduced to the DRY principle and how creating functions helps comply with it.

Let’s do a little bit of a recap.

DRY stands for Don’t Repeat Yourself.

We can avoid writing repetitive code by creating a function that takes in arguments, performs some operations, and returns the results.

The example in Module 5 converted code that creates a list of squared elements from an existing list of numbers into a function.

In [2]:
#example loop
numbers = [2, 3, 5]
squared = list()
for number in numbers: 
    squared.append(number ** 2)
squared


[4, 9, 25]

In [3]:
#ex1 loop as function
def squares_a_list(numerical_list):  # function name and argument
    new_squared_list = list() #initialize output list
    for number in numerical_list:
        new_squared_list.append(number ** 2)
    return new_squared_list
squares_a_list(numbers) #function call

[4, 9, 25]

This function lets us perform the same operation on multiple lists just by calling it, without having to rewrite any code.

In [5]:
larger_numbers = [5, 44, 55, 23, 11]
promoted_numbers = [73, 84, 95]
executive_numbers = [100, 121, 250, 103, 183, 222, 214]



In [6]:
squares_a_list(larger_numbers)

[25, 1936, 3025, 529, 121]

In [7]:
squares_a_list(promoted_numbers)

[5329, 7056, 9025]

In [8]:
squares_a_list(executive_numbers)

[10000, 14641, 62500, 10609, 33489, 49284, 45796]

It’s important to know what exactly is going on inside and outside of a function.

In our function squares_a_list() we saw that we created a variable named new_squared_list.

We can print this variable and watch all the elements be appended to it as we loop through the input list.

But what happens if we try to print this variable outside of the function?

Yikes! Where did new_squared_list go?

It doesn’t seem to exist! That’s not entirely true.

In Python, new_squared_list is something we call a local variable.

Local variables are any objects that have been created within a function and only exist inside the function where they are made.

Code within a function is described as a local environment.

Since we called new_squared_list outside of the function’s body, Python fails to recognize it.

In [10]:
def squares_a_list(numerical_list):
    new_squared_list = list()
    for number in numerical_list:
        new_squared_list.append(number ** 2)
        print(new_squared_list)
    return new_squared_list

In [11]:
squares_a_list(numbers)

[4]
[4, 9]
[4, 9, 25]


[4, 9, 25]

In [12]:
new_squared_list

NameError: name 'new_squared_list' is not defined
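To use the result outside the function, we bind the value that the function returns to a name in the global environment. Here is a minimal sketch of that idea; the name squared_numbers is made up for illustration.

```python
def squares_a_list(numerical_list):
    new_squared_list = list()  # local: only exists while the function runs
    for number in numerical_list:
        new_squared_list.append(number ** 2)
    return new_squared_list

numbers = [2, 3, 5]

# The local list is handed back to the caller and bound to a global name.
squared_numbers = squares_a_list(numbers)
print(squared_numbers)
```

The local name new_squared_list is still unavailable outside the function, but the list it referred to lives on under the name squared_numbers.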

Let’s talk more about function arguments.

Arguments play a paramount role when it comes to adhering to the DRY principle as well as adding flexibility to your code.

Let’s bring back the function we made named squares_a_list().

The reason we made this function in the first place was to DRY out our code and avoid repeating the same for loop for any additional list we wished to operate on.

What happens if we no longer want to square each number but instead want to raise each element to some other power, perhaps (n^3) or (n^4)?

Would we need a new function?

We could make a similar new function for cubing the numbers.

But this feels repetitive.


A better solution that adheres to the DRY principle is to tweak our original function but add an additional argument.

Take a look at exponent_a_list(), which now takes 2 arguments: the original numerical_list and a new argument named exponent.

This gives us a choice of the exponent. We can now use the same function for any exponent we want instead of making a new function for each.

This makes sense to do if we foresee needing this versatility; otherwise, the additional argument isn’t necessary.

In [14]:
def exponent_a_list(numerical_list, exponent):
    new_exponent_list = list()

    for number in numerical_list:
        new_exponent_list.append(number ** exponent)

    return new_exponent_list

In [13]:
numbers = [2, 3, 5]
exponent_a_list(numbers, 3)  # the 2nd argument allows us to specify an exponent value

[8, 27, 125]

In [None]:
exponent_a_list(numbers, 5)

Functions can have any number of arguments and any number of optional arguments, but we must be careful with the order of the arguments.

When we define our arguments in a function, all arguments with default values (aka optional arguments) need to be placed after required arguments.

If any required arguments follow any arguments with default values, an error will occur.

Let’s take our original function exponent_a_list() and re-order it so the optional exponent argument is defined first.

We will see Python throw an error.

In [15]:
def exponent_a_list(exponent=2, numerical_list):
    new_exponent_list = list()

    for number in numerical_list:
        new_exponent_list.append(number ** exponent)

    return new_exponent_list

SyntaxError: non-default argument follows default argument (<ipython-input-15-edf6d51d5c6a>, line 1)

Up to this point, we have been calling functions with multiple arguments in a single way.

When we call our function, we have been ordering the arguments in the order the function defined them in.

So, in exponent_a_list(), the argument numerical_list is defined first, followed by the argument exponent.

Naturally, we have been calling our function with the arguments in this order as well.

In [16]:
def exponent_a_list(numerical_list, exponent=2):
    new_exponent_list = list()

    for number in numerical_list:
        new_exponent_list.append(number ** exponent)

    return new_exponent_list
exponent_a_list([2, 3, 5], 5)

[32, 243, 3125]

We showed earlier that we could also call the function by specifying exponent=5.

Another way of calling this would be to also specify any of the argument names that do not have default values, in this case, numerical_list.

What happens if we switch up the order of the arguments and put exponent=5 followed by numerical_list=numbers?

It still works!

In [None]:
exponent_a_list(numerical_list=[2, 3, 5], exponent=5)

In [None]:
exponent_a_list(exponent=5, numerical_list=[2, 3, 5])

What if we switch up the ordering of the arguments without specifying any of the argument names?

Our function doesn’t recognize the input arguments, and an error occurs because the two arguments are being swapped - it thinks 5 is the list, and [2, 3, 5] is the exponent.

It’s important to take care when ordering and calling a function.

The rule of thumb to remember is if you are going to call a function where the arguments are in a different order from how they were defined, you need to assign the argument name to the value when you call the function.

In [None]:
exponent_a_list(5, [2, 3, 5])  # this won't work because it thinks that 5 is the list
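To see exactly what goes wrong, the sketch below catches the error from the swapped call and prints its message, then repeats the call with named arguments. The try/except wrapper and the names swapped_error and named_result are only there for illustration.

```python
def exponent_a_list(numerical_list, exponent=2):
    new_exponent_list = list()
    for number in numerical_list:
        new_exponent_list.append(number ** exponent)
    return new_exponent_list

try:
    # Positional call: 5 is bound to numerical_list, [2, 3, 5] to exponent.
    exponent_a_list(5, [2, 3, 5])
except TypeError as error:
    swapped_error = str(error)
    print(swapped_error)  # Python cannot loop over the integer 5

# Naming the arguments makes their order irrelevant.
named_result = exponent_a_list(exponent=5, numerical_list=[2, 3, 5])
print(named_result)
```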

Functions can get very complicated, so it is not always obvious what they do just from looking at the name, arguments, or code.

Therefore, people like to explain what the function does.

The standard format for doing this is called a docstring.

A docstring is a literal string that comes directly after the function def and documents the function’s purpose and usage.

Writing a docstring documents what your code does so that collaborators (and you in 6 months’ time!) are not struggling to decipher and reuse your code.

In the last section we had our function squares_a_list().

Although our function name is quite descriptive, it could mean various things.

How do we know what data type it takes in and returns?

Having documentation for it can be useful in answering these questions.

Here is the code for a function from the pandas package called truncate().

You can view the complete code here. https://github.com/pandas-dev/pandas/blob/v1.1.0/pandas/core/generic.py#L9258

I think we can all agree that it would take a bit of time to figure out what the function is doing, the expected input variable types, and what the function is returning.

Luckily pandas provides detailed documentation to explain the function’s code.

Ah. This documentation gives us a much clearer idea of what the function is doing and how to use it.

We can see what it requires as input arguments and what it returns.

It also explains the expectations of the function.

Reading this instead of the code saved us some time and definitely potential confusion.

There are several styles of docstrings; the one shown here, which is also the one we’ll be using, is called the NumPy style.

All docstrings, not just the NumPy-formatted ones, are enclosed in triple quotes ("""). We discussed in module 4 that this is one of the ways to write string values.

Adding this additional string to our function has no effect on our code, and the sole purpose of the docstring is for human consumption.

The NumPy format includes 4 main sections:
- A brief description of the function
- Explaining the input Parameters
- What the function Returns
- Examples

In [17]:
string1 = """This is a string"""
type(string1)

str

Writing documentation for squares_a_list() using the NumPy style takes the following format.

We can identify the brief description of the function at the top, the parameters that it takes in, and what object type they should be, as well as what to expect as an output.

Here we can even see examples of how to run it and what is returned.

In [18]:
def squares_a_list(numerical_list):
    """
    Squares every element in a list.

    Parameters
    ----------
    numerical_list : list
        The list from which to calculate squared values 

    Returns
    -------
    list
        A new list containing the squared value of each of the elements from the input list 

    Examples
    --------
    >>> squares_a_list([1, 2, 3, 4])
    [1, 4, 9, 16]
    """
    new_squared_list = list()
    for number in numerical_list:
        new_squared_list.append(number ** 2)
    return new_squared_list
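The Examples section is more than decoration: Python’s built-in doctest module can run the lines that start with >>> and compare what comes back with the output written underneath. The sketch below is one possible way to do this, using a shortened version of the docstring above.

```python
import doctest

def squares_a_list(numerical_list):
    """
    Squares every element in a list.

    Examples
    --------
    >>> squares_a_list([1, 2, 3, 4])
    [1, 4, 9, 16]
    """
    new_squared_list = list()
    for number in numerical_list:
        new_squared_list.append(number ** 2)
    return new_squared_list

# Runs the >>> examples found in the docstring.
# Nothing is printed when every example produces the documented output.
doctest.run_docstring_examples(squares_a_list, {"squares_a_list": squares_a_list})
```

If the documented output ever drifts away from what the function really returns, this call prints a report of the failing example.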

Using exponent_a_list(), a function from the previous section as an example, we include an optional note in the parameter definition and an explanation of the default value in the parameter description.


In [None]:
def exponent_a_list(numerical_list, exponent=2):
    """
    Creates a new list containing specified exponential values of the input list. 

    Parameters
    ----------
    numerical_list : list
        The list from which to calculate exponential values
    exponent : int or float, optional
        The exponent value (the default is 2, which implies the square).

    Returns
    -------
    new_exponent_list : list
        A new list containing the exponential value specified of each 
        of the elements from the input list 

    Examples
    --------
    >>> exponent_a_list([1, 2, 3, 4])
    [1, 4, 9, 16]
    """

    new_exponent_list = list()

    for number in numerical_list:
        new_exponent_list.append(number ** exponent)

    return new_exponent_list

Ah, remember how we talked about side effects back at the beginning of this module?

Although we recommend avoiding side effects in your functions, there may be occasions where they’re unavoidable or required.

In these cases, we must make it clear in the documentation so that the user of the function knows that their objects are going to be modified. (As an analogy: If someone wants you to babysit their cat, you would probably tell them first if you were going to paint it red while you had it!)

So how do we include side effects in our docstrings?

It’s best to include your function side effects in the first sentence of the docstring.

In [None]:
def function_name(param1, param2):
    """The first line is a short description of the function. 

    If your function includes side effects, explain it clearly here.


    Parameters
    ----------
    param1 : datatype
        A description of param1.

    .
    .
    .
    Etc.
    """
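As a concrete illustration, here is a sketch of a function whose whole purpose is a side effect, with that fact stated in the first sentence of its docstring. The function squares_a_list_in_place() is hypothetical and was invented for this example.

```python
def squares_a_list_in_place(numerical_list):
    """
    Squares every element of a list, modifying the input list in place.

    Parameters
    ----------
    numerical_list : list
        The list whose elements are replaced by their squared values

    Returns
    -------
    None
    """
    for index, number in enumerate(numerical_list):
        numerical_list[index] = number ** 2

numbers = [2, 3, 5]
squares_a_list_in_place(numbers)
print(numbers)  # the caller's list has been changed
```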

Ok great! Now that we’ve written and explained our functions with a standardized format, we can read it in our file easily, but what if our function is located in a different file?

How can we learn what it does when reading our code?

We learned in the first assignment that we can read more about built-in functions using the question mark before the function name.

This returns the docstring of the function.
?function_name

In [20]:
# For example, if we want the docstring for the function len():
?len
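The question mark syntax only works in IPython and Jupyter. In plain Python, the same text is stored in the function’s __doc__ attribute, and the built-in help() function displays it in a formatted way. A small sketch, using a one-line docstring to keep it short:

```python
def squares_a_list(numerical_list):
    """Squares every element in a list."""
    new_squared_list = list()
    for number in numerical_list:
        new_squared_list.append(number ** 2)
    return new_squared_list

# The docstring is stored on the function object itself.
print(squares_a_list.__doc__)

# Built-in functions carry docstrings too.
print(len.__doc__)
```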


We all know that mistakes are a regular part of life.

In coding, every line of code is at risk for potential errors, so naturally, we want a way of defending our functions against potential issues.

Defensive programming is code written in such a way that, if errors do occur, they are handled in a graceful, fast and informative manner.

If something goes wrong, we don’t want the code to crash on its own terms - we want it to fail gracefully, in a way we pre-determined.

To help soften the landing, we write code that throws our own Exceptions.

Exceptions are used in defensive programming to disrupt the normal flow of instructions. When Python encounters code that it cannot execute, it throws an exception.

Before we dive into exceptions, let’s revisit our function exponent_a_list().

It works somewhat well, but what happens if we try to use it with a string as input instead of a list?

We get an error that explains a little bit of what’s causing the issue but not directly.

This error, called a TypeError here, is itself a Python exception. But the error message, which is a default Python message, is not super clear.

This is where raising our own Exception steps in to help.

In [21]:
def exponent_a_list(numerical_list, exponent=2):
    new_exponent_list = list()
    for number in numerical_list:
        new_exponent_list.append(number ** exponent)
    return new_exponent_list
numerical_string = "123"
exponent_a_list(numerical_string)

TypeError: unsupported operand type(s) for ** or pow(): 'str' and 'int'

In [22]:
def exponent_a_list(numerical_list, exponent=2):

    if type(numerical_list) is not list:
        raise Exception("You are not using a list for the numerical_list input.")

    new_exponent_list = list()
    for number in numerical_list:
        new_exponent_list.append(number ** exponent)
    return new_exponent_list

Exceptions disrupt the regular execution of our code. When we raise an Exception, we are forcing our own error with our own message.

If we wanted to raise an exception to solve the problem on the last slide, we could do the following.

In [23]:
numerical_string = "123"
exponent_a_list(numerical_string)

Exception: You are not using a list for the numerical_list input.

Let’s take a closer look.

The first line of code is an if statement - what needs to occur to trigger this new code we’ve written.

This code translates to “If numerical_list is not of the type list…”.

The second line does the complaining.

We tell it to raise an Exception (throw an error) with this message.

Now we get an error message that is straightforward on why our code is failing.

Exception: You are not using a list for the numerical_list input.

I hope we can agree that this message is easier to decipher than the original.

The new message made the cause of the error much clearer to the user, making our function more usable.

Let’s now learn more about the possible different types of Exceptions.

The exception type called Exception is a generic, catch-all exception type.

There are also many other exception types; for example, you may have encountered ValueError or a TypeError at some point.

Exception, which is used in our previous examples, may not be the best option for the raises we made.

Let’s take a look now at the exception we wrote that checks if the input value for numerical_list was the correct type.

Since this is a type error, a better exception to raise than Exception would be TypeError.

Let’s make our correction here and change Exception in our function to TypeError.

In [24]:
# excerpt from inside exponent_a_list(); it is not runnable on its own
if type(numerical_list) is not list:
    raise Exception("You are not using a list for the numerical_list input.")

In [25]:
def exponent_a_list(numerical_list, exponent=2):

    if type(numerical_list) is not list:
        raise TypeError("You are not using a list for the numerical_list input.")

    new_exponent_list = list()
    for number in numerical_list:
        new_exponent_list.append(number ** exponent)
    return new_exponent_list

In [26]:
numerical_string = "123"
exponent_a_list(numerical_string)

TypeError: You are not using a list for the numerical_list input.
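Raising a specific exception type also helps whoever calls the function, because they can catch exactly that type and decide what to do next. The sketch below assumes the caller wants to fall back to an empty list, which is an arbitrary choice made only for illustration.

```python
def exponent_a_list(numerical_list, exponent=2):
    if type(numerical_list) is not list:
        raise TypeError("You are not using a list for the numerical_list input.")

    new_exponent_list = list()
    for number in numerical_list:
        new_exponent_list.append(number ** exponent)
    return new_exponent_list

try:
    result = exponent_a_list("123")
except TypeError as error:
    # Only TypeError is handled here; any other exception would still stop the program.
    print(f"Could not compute exponents: {error}")
    result = list()  # fallback chosen for this sketch

print(result)
```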

Now that we can write exceptions, it’s important to document them.

It’s a good idea to include details of any included exceptions in our function’s docstring.

Under the NumPy docstring format, we explain our raised exception after the “Returns” section.

We first specify the exception type and then an explanation of what causes the exception to be raised.

For example, we’ve added a “Raises” section in our exponent_a_list docstring here.

In [None]:
def exponent_a_list(numerical_list, exponent=2):
    """
    Creates a new list containing specified exponential values of the input list. 

    Parameters
    ----------
    numerical_list : list
        The list from which to calculate exponential values
    exponent : int or float, optional
        The exponent value (the default is 2, which implies the square).

    Returns
    -------
    new_exponent_list : list
        A new list containing the exponential value specified of each 
        of the elements from the input list 

    Raises
    ------
    TypeError
        If the input argument numerical_list is not of type list

    Examples
    --------
    >>> exponent_a_list([1, 2, 3, 4])
    [1, 4, 9, 16]
    """

In the last section, we learned about raising exceptions, which, in a lot of cases, helps the function user identify if they are using it correctly.

But there are still some questions remaining:

How can we be so sure that the code we wrote is doing what we want it to?

Does our code work 100% of the time?

These questions can be answered by using something called unit tests.

We’ll be implementing unit tests in Python using assert statements. assert statements are just one way of implementing unit tests.

Let’s first discuss the syntax of an assert statement and then how they can be applied to the bigger concept, which is unit tests.

assert statements can be used as sanity checks for our program.

We implement them as a “debugging” tactic to make sure our code runs as we expect it to.

When Python reaches an assert statement, it evaluates the condition to a Boolean value.

If the statement is True, Python will continue to run. However, if the Boolean is False, the code stops running, and an error message is printed.

Let’s take a look at one.

Here we have the keyword assert that checks if 1 == 2. Since this is False, an error is thrown, and the message beside the condition, "1 is not equal to 2.", is displayed.

In [27]:
assert 1 == 2 , "1 is not equal to 2."

AssertionError: 1 is not equal to 2.

https://prog-learn.mds.ubc.ca/module6/assert2.png

Let’s take a look at an example where the Boolean is True.

Here, since the condition in the assert statement evaluates to True, Python continues to run, and the next line of code is executed.

When an AssertionError is raised because the condition evaluates to False, the next line of code does not get an opportunity to be executed.

In [28]:
assert 1 == 1 , "1 is not equal to 1."
print('Will this line execute?')

Will this line execute?


In [29]:
assert 1 == 2 , "1 is not equal to 2."
print('Will this line execute?')

AssertionError: 1 is not equal to 2.

Not all assert statements need to have a message.

We can re-write the statement from before without one.

This time you’ll notice that the error doesn’t contain the particular message beside AssertionError like we had before.

In [30]:
assert 1 == 2 

AssertionError: 
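One way to think about an assert statement is as shorthand for an if statement that raises an AssertionError. The helper check_equal() below is hypothetical and written only to show roughly what an assert does behind the scenes.

```python
def check_equal(left, right, message):
    # Roughly what `assert left == right, message` does.
    if not left == right:
        raise AssertionError(message)

check_equal(1, 1, "1 is not equal to 1.")  # condition is True, so nothing happens

try:
    check_equal(1, 2, "1 is not equal to 2.")
except AssertionError as error:
    caught_message = str(error)
    print(caught_message)
```

One difference worth knowing: real assert statements are skipped entirely when Python is run with the -O option, which is part of why they are used for tests and sanity checks rather than for validating user input.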

Where do assert statements come in handy?

Up to this point, we have been creating functions, and only after we have written them, we’ve tested if they work.

Some programmers use a different approach: writing tests before the actual function. This is called Test-Driven Development.

This may seem a little counter-intuitive, but we’re creating the expectations of our function before the actual function code.

Often we have an idea of what our function should be able to do and what output is expected.

If we write our tests before the function, it helps us understand exactly what code we need to write, and it helps us avoid large, time-consuming bugs down the line.

Once we have a series of tests for the function, we can put them into assert statements as an easy way of checking that all the tests pass.

https://prog-learn.mds.ubc.ca/module6/why.png

So, what kind of tests do we want?

We want to keep these tests simple - things that we know are true or could be easily calculated by hand.

For example, let’s look at our exponent_a_list() function.

Easy cases for this function would be lists containing numbers that we can easily square or cube.

For example, we expect the square output of [1, 2, 4, 7] to be [1, 4, 16, 49].

The test for this would look like the one shown here.

It is recommended to write multiple tests.

Let’s write another test for a differently sized list as well as different values for both input arguments numerical_list and exponent.

Let’s make another test for exponent = 3. Again, we use numbers that we know the cube of.

We can also test that the type of the returned object is correct.

In [31]:
def exponent_a_list(numerical_list, exponent=2):
    new_exponent_list = list()

    for number in numerical_list:
        new_exponent_list.append(number ** exponent)

    return new_exponent_list

In [32]:
assert exponent_a_list([1, 2, 4, 7], 2) == [1, 4, 16, 49], "incorrect output for exponent = 2"

In [33]:
assert exponent_a_list([1, 2, 3], 3) == [1, 8, 27], "incorrect output for exponent = 3"

In [34]:
assert type(exponent_a_list([1,2,4], 2)) == list, "output type not a list"
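Tests can also cover the default argument and the exceptions we raise. The sketch below assumes the version of exponent_a_list() that raises a TypeError for non-list input; the flag raised_type_error is a made-up name used to record whether the exception occurred.

```python
def exponent_a_list(numerical_list, exponent=2):
    if type(numerical_list) is not list:
        raise TypeError("You are not using a list for the numerical_list input.")

    new_exponent_list = list()
    for number in numerical_list:
        new_exponent_list.append(number ** exponent)
    return new_exponent_list

# Test that leaving out exponent squares the values.
assert exponent_a_list([1, 2, 4, 7]) == [1, 4, 16, 49], "incorrect output for default exponent"

# Test that a non-list input raises a TypeError.
raised_type_error = False
try:
    exponent_a_list("123")
except TypeError:
    raised_type_error = True

assert raised_type_error, "a non-list input should raise a TypeError"
```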

Just because all our tests pass, this does not mean our program is necessarily correct.

It’s common that our tests can pass, but our code contains errors.

Let’s take a look at the function bad_function(). It’s very similar to exponent_a_list except that it separately computes the first entry before doing the rest in the loop.

This function looks like it would work perfectly fine, but what happens if we get an input argument for numerical_list that has no first element to index?

Let’s write some unit tests using assert statements and see what happens.

Here, it looks like our tests pass at first.

But what happens if we try our function with an empty list?

We get an unexpected error! How do we avoid this?

Write a lot of tests and don’t be overconfident, even after writing a lot of tests!

Checking an empty list in our bad_function() function is an example of checking a corner case.

A corner case is an input that is reasonable but a bit unusual and may trip up our code.

In [35]:
def bad_function(numerical_list, exponent=2):
    new_exponent_list = [numerical_list[0] ** exponent] # seed list with first element
    for number in numerical_list[1:]:
        new_exponent_list.append(number ** exponent)
    return new_exponent_list

In [37]:
assert bad_function([1, 2, 4, 7], 2) == [1, 4, 16, 49], "incorrect output for exponent = 2"
assert bad_function([2, 1, 3], 3) == [8, 1, 27], "incorrect output for exponent = 3"

In [38]:
bad_function([], 2)

IndexError: list index out of range
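One possible repair is to drop the special handling of the first element, so that an empty list simply produces an empty list. The name safer_function() is made up for this sketch, and the corner case is now included in the tests.

```python
def safer_function(numerical_list, exponent=2):
    new_exponent_list = list()  # start empty instead of indexing element 0
    for number in numerical_list:
        new_exponent_list.append(number ** exponent)
    return new_exponent_list

assert safer_function([1, 2, 4, 7], 2) == [1, 4, 16, 49], "incorrect output for exponent = 2"
assert safer_function([2, 1, 3], 3) == [8, 1, 27], "incorrect output for exponent = 3"

# Corner cases: an empty list and a list with a single element.
assert safer_function([], 2) == [], "incorrect output for an empty list"
assert safer_function([7], 2) == [49], "incorrect output for a single-element list"
```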

Often, we will be making functions that work on data.

For example, perhaps we want to write a function called column_stats that returns some summary statistics in the form of a dictionary.

The function here is something we might have envisioned. (Note that if we’re using test-driven development, this function will just be an idea, not completed code.)

In these situations, we need to invent some sort of data so that we can easily calculate the max, min, range, and mean and write unit tests to check that our function does the correct operations.

The data can be made from scratch using functions such as pd.DataFrame() or pd.DataFrame.from_dict() which we learned about in module 4.

You can also upload a very small slice of an existing dataframe.

In [39]:
def column_stats(df, column):
    stats_dict = {'max': df[column].max(),
                  'min': df[column].min(),
                  'mean': round(df[column].mean()),
                  'range': df[column].max() - df[column].min()}
    return stats_dict

The values we chose in our columns should be simple enough to easily calculate the expected output of our function.

Just like how we made unit tests using calculations we know to be true, we do the same using a simple dataset we call helper data.

The dataframe must have a small dimension to keep the calculations simple.

The tests we write for the function column_stats() are now easy to calculate since the values we are using are few and simple.

We wrote tests that check different columns in our forest dataframe.

In [41]:
import pandas as pd
data = {'name': ['Cherry', 'Oak', 'Willow', 'Fir', 'Oak'], 
        'height': [15, 20, 10, 5, 10], 
        'diameter': [2, 5, 3, 10, 5], 
        'age': [0, 0, 0, 0, 0], 
        'flowering': [True, False, True, False, False]}

forest = pd.DataFrame.from_dict(data)
forest

     name  height  diameter  age  flowering
0  Cherry      15         2    0       True
1     Oak      20         5    0      False
2  Willow      10         3    0       True
3     Fir       5        10    0      False
4     Oak      10         5    0      False


In [42]:
assert column_stats(forest, 'height') == {'max': 20, 'min': 5, 'mean': 12.0, 'range': 15}
assert column_stats(forest, 'diameter') == {'max': 10, 'min': 2, 'mean': 5.0, 'range': 8}
assert column_stats(forest, 'age') == {'max': 0, 'min': 0, 'mean': 0, 'range': 0}


We use a systematic approach to design our function using a general set of steps to follow when writing programs.

The approach we recommend includes 5 steps:

1. Write the function stub: a function that does nothing but accepts all input parameters and returns the correct datatype.

This means we are writing the skeleton of a function.

We include the line that defines the function with the input arguments and the return statement returning the object with the desired data type.

Using our exponent_a_list() function as an example, we include the function’s first line and the return statement.

In [43]:
def exponent_a_list(numerical_list, exponent=2):
    return list()

2. Write tests to satisfy the design specifications.

This is where our assert statements come in.

We write tests that we want our function to pass.

In our exponent_a_list() example, we expect that our function will take in a list and an optional argument named exponent and then return a list with the exponential value of each element of the input list.

Here we can see our code fails since we have no function code yet!

In [44]:
def exponent_a_list(numerical_list, exponent=2):
    return list()

assert type(exponent_a_list([1,2,4], 2)) == list, "output type not a list"
assert exponent_a_list([1, 2, 4, 7], 2) == [1, 4, 16, 49], "incorrect output for exponent = 2"
assert exponent_a_list([1, 2, 3], 3) == [1, 8, 27], "incorrect output for exponent = 3"

AssertionError: incorrect output for exponent = 2

3. Outline the program with pseudo-code.

Pseudo-code is an informal but high-level description of the code and operations that we wish to implement.

In this step, we are essentially writing the steps that we anticipate needing to complete our function as comments within the function.

So for our function pseudo-code includes:

In [45]:
def exponent_a_list(numerical_list, exponent=2):

    # create a new empty list
    # loop through all the elements in numerical_list
    # for each element calculate element ** exponent
    # append it to the new list 

    return list()

assert type(exponent_a_list([1,2,4], 2)) == list, "output type not a list"
assert exponent_a_list([1, 2, 4, 7], 2) == [1, 4, 16, 49], "incorrect output for exponent = 2"
assert exponent_a_list([1, 2, 3], 3) == [1, 8, 27], "incorrect output for exponent = 3"

AssertionError: incorrect output for exponent = 2

4. Write code and test frequently.

Here is where we fill in our function.

As you work on the code, more and more of the tests that you wrote will pass until, finally, all your assert statements no longer produce any error messages.

In [46]:
def exponent_a_list(numerical_list, exponent=2):
    new_exponent_list = list()

    for number in numerical_list:
        new_exponent_list.append(number ** exponent)

    return new_exponent_list

assert type(exponent_a_list([1,2,4], 2)) == list, "output type not a list"
assert exponent_a_list([1, 2, 4, 7], 2) == [1, 4, 16, 49], "incorrect output for exponent = 2"
assert exponent_a_list([1, 2, 3], 3) == [1, 8, 27], "incorrect output for exponent = 3"


5. Write documentation.

Finally, we finish writing our function with a docstring.

In [None]:
def exponent_a_list(numerical_list, exponent=2):
    """ Creates a new list containing specified exponential values of the input list. 

    Parameters
    ----------
    numerical_list : list
        The list from which to calculate exponential values
    exponent : int or float, optional
        The exponent value (the default is 2, which implies the square).

    Returns
    -------
    new_exponent_list : list
        A new list containing the exponential value specified of each of
        the elements from the input list 

    Examples
    --------
    >>> exponent_a_list([1, 2, 3, 4])
    [1, 4, 9, 16]
    """
    new_exponent_list = list()
    for number in numerical_list:
        new_exponent_list.append(number ** exponent)
    return new_exponent_list


This has been quite a full module!

We’ve learned how to make functions, how to handle errors gracefully, how to test our functions, and write the necessary documentation to keep our code comprehensible.

These skills will all contribute to writing effective code.

One thing we have not discussed yet is the actual code within a function.

What makes a function useful?

Is a function more useful when it performs more operations?

Does adding parameters make your functions more or less useful?

These are all questions we need to think about when writing functions.

We are going to list some habits to adopt when writing and designing your functions.

Hard coding is the process of embedding values directly into your code without saving them in variables.

When we hardcode values into our code, it decreases flexibility.

Being inflexible can cause you to end up writing more functions and/or violating the DRY principle.

This, in turn, can decrease readability and make code difficult to maintain. In short, hard coding is a breeding ground for bugs.

Remember our function squares_a_list()?

In this function, we “hard-coded” in 2 when we calculated number ** 2.

There are a couple of approaches to improving the situation. One is to assign 2 to a variable in the function before doing this calculation. That way, if you need to reuse that number later on, you can just refer to the variable, and if you need to change the 2 to a 3, you only need to change it in one place. Another benefit is that you’re giving it a variable name, which acts as a little bit of documentation.

The other approach is to turn the value into an argument like we did when we made exponent_a_list().

This new function now gives us more flexibility with our code.

If we now encounter a situation where we need to raise each element to a different exponent, like 4 or 0, we can do so without writing new code and potentially introducing a new error in the process.

We reduce our long-term workload.

The first approach, assigning the value to a variable inside the function, gives more maintainable code, but it doesn’t give the function caller any flexibility. Which approach you choose depends on how you expect your function to be used.

In [47]:
def squares_a_list(numerical_list):
    new_squared_list = list()

    for number in numerical_list:
        new_squared_list.append(number ** 2)

    return new_squared_list

In [None]:
def exponent_a_list(numerical_list, exponent):
    new_exponent_list = list()

    for number in numerical_list:
        new_exponent_list.append(number ** exponent)

    return new_exponent_list
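To illustrate that flexibility, the short sketch below repeats the definition of exponent_a_list() so that it runs on its own, then calls it with two different exponents. The variable names fourth_powers and zeroth_powers are only illustrative.

```python
def exponent_a_list(numerical_list, exponent):
    new_exponent_list = list()

    for number in numerical_list:
        new_exponent_list.append(number ** exponent)

    return new_exponent_list

numbers = [2, 3, 5]

# The same function handles any exponent without new code
fourth_powers = exponent_a_list(numbers, 4)  # [16, 81, 625]
zeroth_powers = exponent_a_list(numbers, 0)  # [1, 1, 1]
```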

Although a function that acts as a one-stop shop and does everything you want may seem useful, it also limits your ability to reuse the code that lies within it.

Ideally, functions should serve a single purpose.

For example, let’s say we have a function that reads in a CSV file, finds the mean of each group in a column, and plots a specified variable.

Although this may seem nice, we may want to break this up into multiple smaller functions. For example, what if we don’t want the plot? Perhaps the plot is just something we wanted a single time, and now we are committed to it for each time we use the function.

Another problem with this function is that the means are never returned; they are only used to build the plot. Thus, we have no way of accessing the statistics to use further in our code (we would have to repeat ourselves and rewrite the calculation).

In [49]:
import altair as alt
import pandas as pd

def load_filter_and_average(file, grouping_column, ploting_column):
    df = pd.read_csv(file)
    source = df.groupby(grouping_column).mean().reset_index()
    chart = alt.Chart(source, width = 500, height = 300).mark_bar().encode(
                      x=alt.X(grouping_column),
                      y=alt.Y(ploting_column)
            )
    return chart
bad_idea = load_filter_and_average('cereal.csv', 'mfr', 'rating')
bad_idea

In this case, you want to simplify the function.

Having a function that only calculates the mean values of the groups in the specified column is much more usable.

A preferred function would look something like this, where the input is a dataframe we have already read in, and the output is the dataframe of mean values for all the columns.

In [50]:
def grouped_means(df, grouping_column):
    grouped_mean = df.groupby(grouping_column).mean().reset_index()
    return grouped_mean

cereal = pd.read_csv('cereal.csv')  # load the data before passing it in
cereal_mfr = grouped_means(cereal, 'mfr')
cereal_mfr

If we wanted, we could then make a second function that handles the plotting part of the previous function.

In [51]:
def plot_mean(df, grouping_column, ploting_column):
    chart = alt.Chart(df, width = 500, height = 300).mark_bar().encode(
                      x=alt.X(grouping_column),
                      y=alt.Y(ploting_column)
            )
    return chart

plot1 = plot_mean(cereal_mfr, 'mfr', 'rating')
plot1
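The benefit of returning the means can be shown with a small pandas-free sketch. Everything in it is an assumption made for illustration: the function names mean_by_group() and describe_means() are hypothetical, and the records are made up rather than taken from cereal.csv.

```python
def mean_by_group(records, grouping_key, value_key):
    """Return a dict mapping each group to the mean of value_key."""
    totals = {}
    counts = {}
    for record in records:
        group = record[grouping_key]
        totals[group] = totals.get(group, 0) + record[value_key]
        counts[group] = counts.get(group, 0) + 1
    return {group: totals[group] / counts[group] for group in totals}


def describe_means(means):
    """Return one line of text per group, sorted by group name."""
    return [f"{group}: {mean:.1f}" for group, mean in sorted(means.items())]


# Made-up records for illustration only
toy_records = [
    {"mfr": "A", "rating": 50.0},
    {"mfr": "A", "rating": 60.0},
    {"mfr": "B", "rating": 30.0},
]

toy_means = mean_by_group(toy_records, "mfr", "rating")

# The returned means can be reused without any display step...
best_group = max(toy_means, key=toy_means.get)

# ...and displayed only when we actually want that
report = describe_means(toy_means)
```

Because each function does one job, the calculation can feed a comparison, a report, or a plot without being rewritten.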

3. Return a single object

For the most part, we have only lightly touched on the fact that functions can return multiple objects, and with good reason.

Although functions are capable of returning multiple objects, that doesn’t mean that it’s the best option.

For instance, what if we converted our function load_filter_and_average() so that it returns a dataframe and a plot?

In [53]:
def load_filter_and_average(file, grouping_column, ploting_column):
    df = pd.read_csv(file)
    source = df.groupby(grouping_column).mean().reset_index()
    chart = alt.Chart(source, width = 500, height = 300).mark_bar().encode(
                      x=alt.X(grouping_column),
                      y=alt.Y(ploting_column)
            )
    return chart, source

another_bad_idea = load_filter_and_average('cereal.csv', 'mfr', 'rating')
another_bad_idea

(alt.Chart(...),
   mfr    calories   protein       fat      sodium     fiber      carbo  \
 0   A  100.000000  4.000000  1.000000    0.000000  0.000000  16.000000   
 1   G  111.363636  2.318182  1.363636  200.454545  1.272727  14.727273   
 2   K  108.695652  2.652174  0.608696  174.782609  2.739130  15.130435   
 3   N   86.666667  2.833333  0.166667   37.500000  4.000000  16.000000   
 4   P  108.888889  2.444444  0.888889  146.111111  2.777778  13.222222   
 5   Q   95.000000  2.625000  1.750000   92.500000  1.337500  10.250000   
 6   R  115.000000  2.500000  1.250000  198.125000  1.875000  17.625000   
 
      sugars      potass   vitamins     shelf    weight      cups     rating  
 0  3.000000   95.000000  25.000000  2.000000  1.000000  1.000000  54.850917  
 1  7.954545   85.227273  35.227273  2.136364  1.049091  0.875000  34.485852  
 2  7.565217  103.043478  34.782609  2.347826  1.077826  0.796087  44.038462  
 3  1.833333  121.000000   8.333333  1.666667  0.971667  0.778333

Since our function returns a tuple, we can obtain the plot by selecting the first element of the output.

This can be quite confusing. We recommend separating the code into two functions, each returning a single object.

It’s best to think of programming functions in the same way as mathematical functions, which usually return a single value.

In [54]:
another_bad_idea[0]
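The same issue can be seen in a small self-contained sketch. The functions mean_and_largest() and mean_of() are hypothetical and exist only to contrast a tuple return with a single-object return.

```python
def mean_and_largest(values):
    """Hypothetical example: returns two objects packed in a tuple."""
    return sum(values) / len(values), max(values)


result = mean_and_largest([2, 4, 9])
result[0]  # the mean, but only if you remember the order of the tuple

# Unpacking gives each piece a name, but the caller still has to know the order
average, largest = mean_and_largest([2, 4, 9])


def mean_of(values):
    """Single-object alternative: the name says what comes back."""
    return sum(values) / len(values)
```

With mean_of(), there is no position to remember, and a caller who only needs the mean does not pay for the extra object.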

It’s generally bad form to include objects in a function that were created outside of it.

Take our grouped_means() function.

What if instead of including df as an input argument, we just used cereal that we loaded earlier?

The number one problem with doing this is that our function now only works on the cereal data; it’s not usable on other data.

In [None]:
def grouped_means(df, grouping_column):
    grouped_mean = df.groupby(grouping_column).mean().reset_index()
    return grouped_mean

In [None]:
cereal = pd.read_csv('cereal.csv')

def bad_grouped_means(grouping_column):
    grouped_mean = cereal.groupby(grouping_column).mean().reset_index()
    return grouped_mean

OK, let’s say we still use it. What happens then?

Although it does work, global variables can be altered in the global environment.

When we change the global variable outside the function and try to use the function again, it will refer to the new global variable and potentially no longer work.
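A minimal sketch of this failure, using a made-up global list instead of the cereal dataframe, might look like the following. The names toy_ratings, bad_mean() and good_mean() are hypothetical.

```python
toy_ratings = [40, 50, 60]  # made-up global data for illustration


def bad_mean():
    # Relies on the global variable toy_ratings
    return sum(toy_ratings) / len(toy_ratings)


def good_mean(values):
    # Relies only on its own argument
    return sum(values) / len(values)


before = bad_mean()  # works while the global is still a list of numbers

toy_ratings = "oops"  # the global is reassigned somewhere else in the code

try:
    bad_mean()
    bad_mean_failed = False
except TypeError:
    # sum() cannot add the characters of a string to a number
    bad_mean_failed = True

after = good_mean([40, 50, 60])  # unaffected by the change to the global
```

Here bad_mean() stops working even though its own code never changed, while good_mean() keeps working because everything it needs is passed in.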

Of course, as always, these habits are suggestions and not strict rules.

There will be times when adhering to one of these is not possible or will hinder your code instead of enhancing it.

The rule of thumb is to ask yourself how helpful your function would be if you or someone else wished to reuse it.