# 3.2 Functions

Functions are the primary and most important method of code organization and reuse in Python. As a rule of thumb, if you anticipate needing to repeat the same or very similar code more than once, it may be worth writing a reusable function. Functions can also help make your code more readable by giving a name to a group of Python statements.

Functions are declared with the def keyword. A function contains a block of code with an optional use of the return keyword:

In [2]:
def my_function(x, y):
    return x + y

When a line with return is reached, the value or expression after return is sent back to the context where the function was called. For example:

In [5]:
my_function(1, 2)

3

In [7]:
result = my_function(1, 2)

In [9]:
result

3

There is no issue with having multiple return statements. If Python reaches the end of a function without encountering a return statement, None is returned automatically. For example:

In [12]:
def function_without_return(x):
    print(x)

In [14]:
result = function_without_return("hello!")

hello!


In [16]:
print(result)

None
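
As a small illustration of both points, here is a sketch with several return statements and one path that falls off the end of the function. The name describe_sign is made up for this example.

```python
def describe_sign(x):
    # Hypothetical helper, not from the text: two explicit returns,
    # plus an implicit None when neither branch is taken.
    if x > 0:
        return "positive"
    if x < 0:
        return "negative"
    # No return statement is reached for x == 0, so None is returned


print(describe_sign(3))   # positive
print(describe_sign(-2))  # negative
print(describe_sign(0))   # None
```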


Each function can have positional arguments and keyword arguments. Keyword arguments are most commonly used to specify default values or optional arguments. Here we will define a function with an optional z argument with the default value 1.5:

In [19]:
def my_function2(x, y, z=1.5):
    if z > 1:
        return z * (x + y)
    else:
        return z / (x + y)

While keyword arguments are optional, all positional arguments must be specified when calling a function.

You can pass values to the z argument with or without the keyword provided, though using the keyword is encouraged:

In [23]:
my_function2(5, 6, z=0.7)

0.06363636363636363

In [25]:
my_function2(3.14, 7, 3.5)

35.49

In [27]:
my_function2(10, 20)

45.0

The main restriction on function arguments is that the keyword arguments must follow the positional arguments (if any). You can specify keyword arguments in any order. This frees you from having to remember the order in which the function arguments were specified. You need to remember only what their names are.
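
A brief sketch of these rules, reusing my_function2 from above (the variable names r1 and r2 are only for illustration):

```python
def my_function2(x, y, z=1.5):
    if z > 1:
        return z * (x + y)
    else:
        return z / (x + y)


# Keyword arguments may be given in any order
r1 = my_function2(x=5, y=6, z=7)
r2 = my_function2(z=7, y=6, x=5)
print(r1, r2)  # 77 77

# Leaving out a required positional argument raises TypeError
try:
    my_function2(5)
except TypeError as exc:
    print("TypeError:", exc)
```

Placing a positional argument after a keyword argument, as in my_function2(x=5, 6), is rejected as a SyntaxError before any code runs, so it cannot be shown inside a runnable example.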

## Namespaces, Scope, and Local Functions

Functions can access variables created inside the function as well as those outside the function in higher (or even global) scopes. An alternative and more descriptive name describing a variable scope in Python is a namespace. Any variables that are assigned within a function by default are assigned to the local namespace. The local namespace is created when the function is called and is immediately populated by the function’s arguments. After the function is finished, the local namespace is destroyed (with some exceptions that are outside the purview of this chapter). Consider the following function:

In [32]:
def func():
    a = []
    for i in range(5):
        a.append(i)

When func() is called, the empty list a is created, five elements are appended, and then a is destroyed when the function exits. Suppose instead we had declared a as follows:

In [35]:
a = []

In [37]:
def func():
    for i in range(5):
        a.append(i)

Each call to func will modify list a:

In [40]:
func()

In [42]:
a

[0, 1, 2, 3, 4]

In [44]:
func()

In [46]:
a

[0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
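
For contrast, here is a sketch of what happens when a function assigns to a name instead of mutating the object it refers to: the assignment creates a new local variable and leaves the outer one alone. The names values and rebind_locally are invented for this illustration.

```python
values = [1, 2, 3]


def rebind_locally():
    # Assignment binds a new local name; the global list is untouched
    values = []
    values.append(99)
    return values


local_result = rebind_locally()
print(local_result)  # [99]
print(values)        # [1, 2, 3]
```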

Assigning variables outside of the function’s scope is possible, but those variables must be declared explicitly using either the global or nonlocal keywords:

In [49]:
a = None

In [51]:
def bind_a_variable():
    global a
    a = []

In [53]:
bind_a_variable()

In [55]:
print(a)

[]


nonlocal allows a function to modify variables defined in a higher-level scope that is not global. Since its use is somewhat esoteric (I never use it in this book), I refer you to the Python documentation to learn more about it.
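
For readers who want a quick picture anyway, a minimal sketch of nonlocal might look like this. The function names make_counter and increment are hypothetical.

```python
def make_counter():
    count = 0  # lives in make_counter's namespace, not the global one

    def increment():
        nonlocal count  # rebind the enclosing variable, not a new local
        count += 1
        return count

    return increment


counter = make_counter()
print(counter())  # 1
print(counter())  # 2
```

Without the nonlocal declaration, count += 1 would be treated as an assignment to a local variable and raise UnboundLocalError.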

### Caution
I generally discourage use of the global keyword. Typically, global variables are used to store some kind of state in a system. If you find yourself using a lot of them, it may indicate a need for object-oriented programming (using classes).
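
As a rough sketch of that alternative, state that might otherwise live in a global variable can be kept on an instance. The class name Accumulator and its method are hypothetical and only mirror the earlier func example.

```python
class Accumulator:
    """Holds a running list of values instead of a module-level global."""

    def __init__(self):
        self.values = []

    def add_range(self, n=5):
        # Same work as the earlier func(), but the state belongs to the object
        for i in range(n):
            self.values.append(i)


acc = Accumulator()
acc.add_range()
acc.add_range()
print(acc.values)  # [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
```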

## Returning Multiple Values

When I first programmed in Python after having programmed in Java and C++, one of my favorite features was the ability to return multiple values from a function with simple syntax. Here’s an example:

In [61]:
def f():
    a = 5
    b = 6
    c = 7
    return a, b, c

In [63]:
a, b, c = f()

In data analysis and other scientific applications, you may find yourself doing this often. What’s happening here is that the function is actually just returning one object, a tuple, which is then being unpacked into the result variables. In the preceding example, we could have done this instead:

In [66]:
return_value = f()

In this case, return_value would be a 3-tuple containing the three returned values. A potentially attractive alternative to returning multiple values like before might be to return a dictionary instead:

In [69]:
def f():
    a = 5
    b = 6
    c = 7
    return {"a" : a, "b" : b, "c" : c}

This alternative technique can be useful depending on what you are trying to do.
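
To make the trade-off concrete, here is a side-by-side sketch. The names f_tuple and f_dict are used only so both variants can coexist in one example.

```python
def f_tuple():
    return 5, 6, 7  # packed into a single tuple


def f_dict():
    return {"a": 5, "b": 6, "c": 7}


return_value = f_tuple()
print(return_value)     # (5, 6, 7)
print(return_value[1])  # 6, accessed by position

result = f_dict()
print(result["b"])      # 6, accessed by name
```

The tuple form unpacks neatly into variables, while the dictionary form lets callers pick out values by name without remembering their order.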

## Functions Are Objects

Since Python functions are objects, many constructs can be easily expressed that are difficult to do in other languages. Suppose we were doing some data cleaning and needed to apply a bunch of transformations to the following list of strings:

In [74]:
states = ["   Alabama ", "Georgia!", "Georgia", "georgia", "FlOrIda",
    "south   carolina##", "West virginia?"]

Anyone who has ever worked with user-submitted survey data has seen messy results like these. Lots of things need to happen to make this list of strings uniform and ready for analysis: stripping whitespace, removing punctuation symbols, and standardizing proper capitalization. One way to do this is to use built-in string methods along with the re standard library module for regular expressions:

In [77]:
import re

def clean_strings(strings):
    result = []
    for value in strings:
        value = value.strip()
        value = re.sub("[!#?]", "", value)
        value = value.title()
        result.append(value)
    return result

The result looks like this:

In [80]:
clean_strings(states)

['Alabama',
 'Georgia',
 'Georgia',
 'Georgia',
 'Florida',
 'South   Carolina',
 'West Virginia']

An alternative approach that you may find useful is to make a list of the operations you want to apply to a particular set of strings:

In [83]:
def remove_punctuation(value):
    return re.sub("[!#?]", "", value)

clean_ops = [str.strip, remove_punctuation, str.title]

def clean_strings(strings, ops):
    result = []
    for value in strings:
        for func in ops:
            value = func(value)
        result.append(value)
    return result

Then we have the following:

In [86]:
clean_strings(states, clean_ops)

['Alabama',
 'Georgia',
 'Georgia',
 'Georgia',
 'Florida',
 'South   Carolina',
 'West Virginia']

A more functional pattern like this enables you to easily modify how the strings are transformed at a very high level. The clean_strings function is also now more reusable and generic.
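
As one possible illustration, adding a step is just a matter of adding a function to the list. The collapse_whitespace helper below is an assumption made for this sketch, not part of the original example.

```python
import re

states = ["   Alabama ", "Georgia!", "Georgia", "georgia", "FlOrIda",
          "south   carolina##", "West virginia?"]


def remove_punctuation(value):
    return re.sub("[!#?]", "", value)


def collapse_whitespace(value):
    # Hypothetical extra step: squeeze runs of whitespace into one space
    return re.sub(r"\s+", " ", value)


def clean_strings(strings, ops):
    result = []
    for value in strings:
        for func in ops:
            value = func(value)
        result.append(value)
    return result


clean_ops = [str.strip, remove_punctuation, collapse_whitespace, str.title]
cleaned = clean_strings(states, clean_ops)
print(cleaned)
```

With the extra step in place, 'South   Carolina' becomes 'South Carolina', and clean_strings itself did not need to change.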

You can use functions as arguments to other functions like the built-in map function, which applies a function to a sequence of some kind:

In [90]:
for x in map(remove_punctuation, states):
    print(x)

   Alabama 
Georgia
Georgia
georgia
FlOrIda
south   carolina
West virginia


map can be used as an alternative to list comprehensions without any filter.
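
A short sketch of that equivalence, using a shortened copy of the states list. In Python 3, map returns a lazy iterator, so it is wrapped in list here for comparison.

```python
import re

states = ["   Alabama ", "Georgia!", "FlOrIda", "south   carolina##"]


def remove_punctuation(value):
    return re.sub("[!#?]", "", value)


via_map = list(map(remove_punctuation, states))
via_comprehension = [remove_punctuation(x) for x in states]

print(via_map == via_comprehension)  # True
```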

## Anonymous (Lambda) Functions

Python has support for so-called anonymous or lambda functions, which are a way of writing functions consisting of a single expression, the result of which is the return value. They are defined with the lambda keyword, which has no meaning other than “we are declaring an anonymous function”:

In [95]:
def short_function(x):
    return x * 2

In [97]:
equiv_anon = lambda x: x * 2

I usually refer to these as lambda functions in the rest of the book. They are especially convenient in data analysis because, as you’ll see, there are many cases where data transformation functions will take functions as arguments. It’s often less typing (and clearer) to pass a lambda function as opposed to writing a full-out function declaration or even assigning the lambda function to a local variable. Consider this example:

In [100]:
def apply_to_list(some_list, f):
    return [f(x) for x in some_list]

In [102]:
ints = [4, 0, 1, 5, 6]

In [104]:
apply_to_list(ints, lambda x: x * 2)

[8, 0, 2, 10, 12]

You could also have written [x * 2 for x in ints], but here we were able to succinctly pass a custom operator to the apply_to_list function.

As another example, suppose you wanted to sort a collection of strings by the number of distinct letters in each string:

In [108]:
strings = ["foo", "card", "bar", "aaaa", "abab"]

Here we could pass a lambda function to the list’s sort method:

In [111]:
strings.sort(key=lambda x: len(set(x)))

In [113]:
strings

['aaaa', 'foo', 'abab', 'bar', 'card']
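
The same key can be supplied as a named function, and the built-in sorted function returns a new list instead of sorting in place. A sketch, where the name distinct_letters is chosen for this example:

```python
def distinct_letters(s):
    # Named equivalent of: lambda x: len(set(x))
    return len(set(s))


strings = ["foo", "card", "bar", "aaaa", "abab"]

by_lambda = sorted(strings, key=lambda x: len(set(x)))
by_named = sorted(strings, key=distinct_letters)

print(by_lambda)  # ['aaaa', 'foo', 'abab', 'bar', 'card']
print(strings)    # ['foo', 'card', 'bar', 'aaaa', 'abab'] (unchanged)
```

Because Python's sort is stable, 'foo' stays ahead of 'abab': both have two distinct letters, so their original relative order is kept.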

## Generators

Many objects in Python support iteration, such as over objects in a list or lines in a file. This is accomplished by means of the iterator protocol, a generic way to make objects iterable. For example, iterating over a dictionary yields the dictionary keys:

In [117]:
some_dict = {"a": 1, "b": 2, "c": 3}

In [119]:
for key in some_dict:
    print(key)

a
b
c


When you write for key in some_dict, the Python interpreter first attempts to create an iterator out of some_dict:

In [122]:
dict_iterator = iter(some_dict)

In [124]:
dict_iterator

<dict_keyiterator at 0x138d5f060>

An iterator is any object that will yield objects to the Python interpreter when used in a context like a for loop. Most methods expecting a list or list-like object will also accept any iterable object. This includes built-in functions such as min, max, and sum, and type constructors like list and tuple:

In [127]:
list(dict_iterator)

['a', 'b', 'c']
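
To see roughly what a for loop does behind the scenes, you can drive an iterator by hand with the built-in next function. This is a minimal sketch; the exhausted flag exists only to make the outcome visible.

```python
some_dict = {"a": 1, "b": 2, "c": 3}

it = iter(some_dict)
print(next(it))  # a
print(next(it))  # b
print(next(it))  # c

exhausted = False
try:
    next(it)  # nothing left to yield
except StopIteration:
    exhausted = True
    print("iterator exhausted")
```

A for loop catches StopIteration for you, which is why iteration simply ends instead of raising an error.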

A generator is a convenient way, similar to writing a normal function, to construct a new iterable object. Whereas normal functions execute and return a single result at a time, generators can return a sequence of multiple values by pausing and resuming execution each time the generator is used. To create a generator, use the yield keyword instead of return in a function:

In [130]:
def squares(n=10):
    print(f"Generating squares from 1 to {n ** 2}")
    for i in range(1, n + 1):
        yield i ** 2

When you actually call the generator, no code is immediately executed:

In [133]:
gen = squares()

In [135]:
gen

<generator object squares at 0x138ceba00>

It is not until you request elements from the generator that it begins executing its code:

In [138]:
for x in gen:
    print(x, end=" ")

Generating squares from 1 to 100
1 4 9 16 25 36 49 64 81 100 

### Note
Since generators produce output one element at a time rather than an entire list all at once, they can help your program use less memory.
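
A rough way to see the difference, assuming CPython, is to compare object sizes with sys.getsizeof. It reports only the size of the container object itself, not the integers inside, so treat the numbers as indicative. The squares function here is a simplified variant of the one above, without the print call.

```python
import sys


def squares(n):
    for i in range(1, n + 1):
        yield i ** 2


n = 100_000
as_list = list(squares(n))  # all values materialized at once
as_gen = squares(n)         # nothing computed yet

list_size = sys.getsizeof(as_list)
gen_size = sys.getsizeof(as_gen)
print(list_size, gen_size)

# Both produce the same values when consumed
print(sum(as_gen) == sum(as_list))  # True
```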

## Generator expressions


Another way to make a generator is by using a generator expression. This is a generator analogue to list, dictionary, and set comprehensions. To create one, enclose what would otherwise be a list comprehension within parentheses instead of brackets:

In [143]:
gen = (x ** 2 for x in range(100))

In [145]:
gen

<generator object <genexpr> at 0x138d020c0>

This is equivalent to the following more verbose generator:

In [148]:
def _make_gen():
    for x in range(100):
        yield x ** 2

In [150]:
gen = _make_gen()

Generator expressions can be used instead of list comprehensions as function arguments in some cases:

In [153]:
sum(x ** 2 for x in range(100))

328350

In [155]:
dict((i, i ** 2) for i in range(5))

{0: 0, 1: 1, 2: 4, 3: 9, 4: 16}

Depending on the number of elements produced by the comprehension expression, the generator version can sometimes be meaningfully faster.
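
One practical difference to keep in mind is that a generator can be consumed only once, whereas a list can be traversed repeatedly. A minimal sketch:

```python
gen = (x ** 2 for x in range(5))

first_pass = sum(gen)   # consumes 0, 1, 4, 9, 16
second_pass = sum(gen)  # the generator is already exhausted

print(first_pass)   # 30
print(second_pass)  # 0
```

If you need the values more than once, build a list instead or re-create the generator.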

## itertools module

The standard library itertools module has a collection of generators for many common data algorithms. For example, groupby takes any sequence and a function, grouping consecutive elements in the sequence by the return value of the function. Here’s an example:

In [160]:
import itertools

In [162]:
def first_letter(x):
    return x[0]

In [164]:
names = ["Alan", "Adam", "Wes", "Will", "Albert", "Steven"]

In [166]:
for letter, group in itertools.groupby(names, first_letter):
    print(letter, list(group)) # group is an iterator over one run of names

A ['Alan', 'Adam']
W ['Wes', 'Will']
A ['Albert']
S ['Steven']
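
Notice that A appears twice, because groupby only groups consecutive elements. If you want a single group per key, one common approach is to sort by the same key first. A sketch (the grouped dictionary is introduced here for illustration):

```python
import itertools


def first_letter(x):
    return x[0]


names = ["Alan", "Adam", "Wes", "Will", "Albert", "Steven"]

grouped = {}
sorted_names = sorted(names, key=first_letter)  # brings equal keys together
for letter, group in itertools.groupby(sorted_names, first_letter):
    grouped[letter] = list(group)

print(grouped)
# {'A': ['Alan', 'Adam', 'Albert'], 'S': ['Steven'], 'W': ['Wes', 'Will']}
```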


See Table 3-2 for a list of a few other itertools functions I’ve frequently found helpful. You may like to check out the official Python documentation for more on this useful built-in utility module.

### Table 3-2. Some useful itertools functions

|Function|Description|
|---|---|
|chain(*iterables)|Generates a sequence by chaining iterators together. Once elements from the first iterator are exhausted, elements from the next iterator are returned, and so on.|
|combinations(iterable, k)|Generates a sequence of all possible k-tuples of elements in the iterable, ignoring order and without replacement (see also the companion function combinations_with_replacement).|
|permutations(iterable, k)|Generates a sequence of all possible k-tuples of elements in the iterable, respecting order.|
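|groupby(iterable[, keyfunc])|Generates (key, sub-iterator) for each run of consecutive elements that share a key; sort the input by the key first if each unique key should appear only once.|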
|product(*iterables, repeat=1)|Generates the Cartesian product of the input iterables as tuples, similar to a nested for loop.|
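
The following sketch exercises several of the functions in the table on small inputs. The variable names are arbitrary.

```python
import itertools

# chain: one iterable after another
chained = list(itertools.chain([1, 2], (3, 4), "ab"))
print(chained)  # [1, 2, 3, 4, 'a', 'b']

# combinations: order ignored, no replacement
pairs = list(itertools.combinations("abc", 2))
print(pairs)  # [('a', 'b'), ('a', 'c'), ('b', 'c')]

# permutations: order respected
ordered = list(itertools.permutations("abc", 2))
print(len(ordered))  # 6

# product: like a nested for loop
grid = list(itertools.product([0, 1], repeat=2))
print(grid)  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```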

## Errors and Exception Handling

Handling Python errors or exceptions gracefully is an important part of building robust programs. In data analysis applications, many functions work only on certain kinds of input. As an example, Python’s float function is capable of casting a string to a floating-point number, but it fails with ValueError on improper inputs:

In [172]:
float("1.2345")

1.2345

In [174]:
float("something")

ValueError: could not convert string to float: 'something'

Suppose we wanted a version of float that fails gracefully, returning the input argument. We can do this by writing a function that encloses the call to float in a try/except block (execute this code in IPython):

In [177]:
def attempt_float(x):
    try:
        return float(x)
    except:
        return x

The code in the except part of the block will only be executed if float(x) raises an exception:

In [180]:
attempt_float("1.2345")

1.2345

In [182]:
attempt_float("something")

'something'

You might notice that float can raise exceptions other than ValueError:

In [185]:
float((1, 2))

TypeError: float() argument must be a string or a real number, not 'tuple'

You might want to suppress only ValueError, since a TypeError (the input was not a string or numeric value) might indicate a legitimate bug in your program. To do that, write the exception type after except:

In [188]:
def attempt_float(x):
    try:
        return float(x)
    except ValueError:
        return x

We have then:

In [191]:
attempt_float((1, 2))

TypeError: float() argument must be a string or a real number, not 'tuple'

You can catch multiple exception types by writing a tuple of exception types instead (the parentheses are required):

In [194]:
def attempt_float(x):
    try:
        return float(x)
    except (TypeError, ValueError):
        return x
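
With both exception types caught, this version hands back any input that float cannot convert. A quick check:

```python
def attempt_float(x):
    try:
        return float(x)
    except (TypeError, ValueError):
        return x


print(attempt_float("1.2345"))     # 1.2345
print(attempt_float("something"))  # something
print(attempt_float((1, 2)))       # (1, 2)
```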

In some cases, you may not want to suppress an exception, but you want some code to be executed regardless of whether or not the code in the try block succeeds. To do this, use finally:

In [197]:
f = open(path, mode="w")

try:
    write_to_file(f)
finally:
    f.close()

NameError: name 'path' is not defined

Here, the file object f will always get closed. Similarly, you can have code that executes only if the try: block succeeds using else:

In [200]:
f = open(path, mode="w")

try:
    write_to_file(f)
except:
    print("Failed")
else:
    print("Succeeded")
finally:
    f.close()

NameError: name 'path' is not defined
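
Both cells above stop with NameError because path, like write_to_file, is a placeholder that the text never defines. A self-contained sketch, assuming a temporary directory and a made-up write_to_file helper, might look like the following. It also narrows the bare except to OSError, which is a choice made for this example and not something the text prescribes.

```python
import os
import shutil
import tempfile


def write_to_file(f):
    # Hypothetical stand-in for the helper used in the text
    f.write("hello\n")


# Assumption: any writable location works; a temporary directory is used here
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "example.txt")

f = open(path, mode="w")
try:
    write_to_file(f)
except OSError:
    status = "Failed"
else:
    status = "Succeeded"  # runs only if the try block raised nothing
finally:
    f.close()             # runs in every case

print(status)    # Succeeded
print(f.closed)  # True

shutil.rmtree(tmp_dir)  # clean up the temporary directory
```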

## Exceptions in IPython

If an exception is raised while you are %run-ing a script or executing any statement, IPython will by default print a full call stack trace (traceback) with a few lines of context around the position at each point in the stack:

In [204]:
%run ../examples/ipython_bug.py

AssertionError: 

Having additional context by itself is a big advantage over the standard Python interpreter (which does not provide any additional context). You can control the amount of context shown using the %xmode magic command, from Plain (same as the standard Python interpreter) to Verbose (which inlines function argument values and more). As you will see later in Appendix B, you can step into the stack (using the %debug or %pdb magics) after an error has occurred for interactive postmortem debugging.