# Lesson 5: Functions and Modules

## Topics:
- Function definitions
- Function documentation
- Functions within functions
- Modules
- Modules of Interest: sys, math, collections
- Making your own libraries

## Introduction

This afternoon we'll concentrate on our last fundamental programming concept for the course. To date, we've been writing all of our program logic in the main body of our scripts. And we've seen how built-in python __functions__ like __raw_input()__ are used to operate on variables and their values. In this session, we'll learn how to write __functions__ of our own, how to properly document them for ourselves and other users, and how to collect them into __modules__, and make our own local repositories, or __libraries__.

If you properly leverage a well-designed function, writing the main logic of your programs becomes almost-too-easy. Instead of writing out meticulous logical statements and loops for every task, you just call forth your previously-crafted logic, which you've vested in well-made __functions__.

## Functions

Functions are the basic means to manage complexity in your programs, allowing you to avoid nesting and repeating large chunks of code that could otherwise make your tasks unmanageable. They allow you to bundle code with a defined input and output into single lines, and you should use them frequently from now on.

```python
def hello(name):
 greeting = "Hello %s!" % (name)
 return greeting
```

To define a function, you use the keyword __def__. Then comes the function name, in this case __hello__, with parentheses containing any input __arguments__ the function might need. In this case, we need a name to form a proper greeting, so we're giving the __hello()__ function a variable __argument__ called __name__. After that, the function does its thing, executing the indented block of code immediately below. In this case, it creates a greeting _Hello "name"!_. The last thing that it does is return that greeting to the rest of the program.

Once created, you call on the function as you normally would:

```python
myinput = "class"
myoutput = hello(myinput)

print myoutput

>> Hello class!
```

Technically speaking, a function does not need to explicitly return something, although it's uncommon that you'll write any that don't. If you don't return something explicitly, Python will nevertheless return the special object None. None is logically false (for if statements), and printing it displays the word None (although None is not the same thing as the string "None"). It's easy to forget to return a value, so this is an easy first thing to check in case your functions don't work as expected.

```python
def hello(name):
 greeting = "Hello %s!" % (name)
 ##return greeting

myinput = "class"
myoutput = hello(myinput)

print myoutput

>> None
```
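
Because a forgotten return silently hands back None, one way to catch the mistake early is to test the result with `is None` before using it. The sketch below is only an illustration: `hello_broken` and `hello_fixed` are made-up names, and `print` is written with parentheses around a single value so the snippet should run under Python 2 (as used in this lesson) or Python 3.

```python
def hello_broken(name):
    greeting = "Hello %s!" % name
    # Oops: there is no return statement, so Python returns None.

def hello_fixed(name):
    greeting = "Hello %s!" % name
    return greeting

broken_result = hello_broken("class")
fixed_result = hello_fixed("class")

# 'is None' is the usual way to test for a forgotten return value.
if broken_result is None:
    print("hello_broken returned None; did we forget 'return'?")

print(fixed_result)
```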

In [None]:
def hello(name):
 greeting = "Hello %s!" % (name)
 return greeting

myinput = "class"
myoutput = hello(myinput)

print myoutput

Now, let's look at another function:

```python
def hello(name):
 greeting = "Hello %s!" % (name)
 testVariable = """The hotel room is a mess, there's a chicken hangin'
                   out, somebody's baby is in the closet, there's a
                   tiger in the bathroom that Mike Tyson wants back, Stu
                   lost a tooth and eloped, and Doug is missing."""
 print 'Inside of the function:', testVariable
 return greeting
 
testVariable = "What happens in Vegas stays in Vegas."
grt = hello("Stu Price")
print 'Outside of the function:', testVariable
```

Even though the epic story of a bachelor party gone horrifically awry was assigned to a variable called __testVariable__ inside the function, nothing happened to that variable outside the function. Variables created inside a function occupy their own __namespace__ in memory distinct from variables outside of the function, and so reusing names between the two can be done without you having to keep track of it. (Refer to the article http://bytebaker.com/2008/07/30/python-namespaces/ about __namespace__ for more information.) That means you can use functions written by other people without having to keep track of what variables those functions are using internally. Just like a sleazy town in Nevada, what happens in the function stays in the function. (An important exception lies with lists and dictionaries, which you will examine in the exercises.)
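
Here is a small sketch of both behaviors described above: a name assigned inside a function stays inside, while changes made to a list that was passed in are visible outside. The function and variable names (`rename_locally`, `add_sample`, `label`, `samples`) are invented for this illustration, and the parenthesized single-value `print` is used so it should run under Python 2 or Python 3.

```python
def rename_locally(label):
    # Assigning to 'label' only rebinds the function's own local name.
    label = "changed inside"
    return label

def add_sample(samples):
    # append() changes the list object itself, so the caller sees the change.
    samples.append("new sample")

label = "original"
inside_value = rename_locally(label)

samples = ["sample1"]
add_sample(samples)

print(label)     # the outer variable still holds its original value
print(samples)   # the outer list now has two elements
```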

1. What happens if you try to print **testVariable** outside of the function and you don't assign anything to it?

Everything you have learned to write in python so far can go into a function -- they are a means of organizing your code, and are best used after you write a series of commands you expect to often use. Turning that series of commands into a function makes it easy to repeat that set of commands over and over, while keeping the code easy to read. 

In [None]:
def hello(name):
 greeting = "Hello %s!" % (name)
 testVariable = """The hotel room is a mess, there's a chicken hangin'
                   out, somebody's baby is in the closet, there's a
                   tiger in the bathroom that Mike Tyson wants back, Stu
                   lost a tooth and eloped, and Doug is missing."""
 #print 'Inside of the function:', testVariable
 return greeting
 
testVariable = "What happens in Vegas stays in Vegas."
grt = hello("Stu Price")
print 'Outside of the function:', testVariable

Here are some other examples, involving **print**.

In [None]:
def useless():
    print 'What was the point of that?'
    print
    
useless()

'''
def countToTen():
    for i in range(10):
        print i
 
countToTen()

print "Call function within function"
def calluseless():
    print "Let's use the function useless()"
    useless()
 
calluseless()
'''

What happens in the above commands? Notice that what you print inside the function gets printed if you call on the function, even if you don't return anything. However, it won't print anything inside the function unless you call on the function. Finally, you can call on functions from inside functions!

We've shown examples with one input variable and one return value, but functions can accept zero, one, or multiple input variables. Likewise, functions don't need to return anything to the program, but they can also return multiple values.

Here's an example with multiple input variables and multiple output variables.

In [None]:
def doLaundry(amtDetergent, dirtyClothes): ##amtDetergent is integer, dirtyClothes is list
    if type(amtDetergent) != int: return (None, "type error")
    cleanClothes = []
    for load in dirtyClothes:
        amtDetergent -= 1
        cleanClothes.append(load)
    return (amtDetergent, cleanClothes)
 
amtTide = 5
print "Starting amount of Tide:",amtTide
print "Let's do some laundry!"
dirtyLaundry = ['socks','shirts','pants']
(amtTide, cleanLaundry) = doLaundry(amtTide, dirtyLaundry)
print "Amount of Tide left:", amtTide
print "Clean laundry includes:", cleanLaundry

Above, in __doLaundry()__, I returned a __tuple__ of the two variables enclosed in parentheses. You could also return a __list__, which works much the same way. You could return other objects as well, like __dictionaries__. Below is an example where we return a list.

```python
def returnStuff():
    a = '>Gene1'
    b = 'ATGGTGGG'
    return [a,b] 

## We can assign the whole result to a single variable, as a list.
both = returnStuff()

## Or we can immediately assign to multiple variables.
name, seq = returnStuff()

## We can index the output the same as any list.
returnStuff()[0]
returnStuff()[1]

## We can make a dictionary directly from the function.
dictOfStuff = {returnStuff()[0][1:]:returnStuff()[1]}

```
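
Returning a dictionary works the same way as returning a list or tuple, and it lets you label each piece of the result. The following is a minimal sketch; `describe_sequence` and its keys are names invented for this example, not part of the lesson's scripts.

```python
def describe_sequence(name, seq):
    # Bundle several results into one dictionary with labeled keys.
    gc_count = seq.count("G") + seq.count("C")
    return {"name": name, "length": len(seq), "gc_count": gc_count}

info = describe_sequence("Gene1", "ATGGTGGG")

# Each result is looked up by its key instead of by position.
print(info["name"])
print(info["length"])
print(info["gc_count"])
```

The trade-off is that you can no longer unpack the result directly into several variables the way you can with a list or tuple.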

In [None]:
def returnStuff():
    a = '>Gene1'
    b = 'ATGGTGGG'
    return [a,b] 

## We can assign the whole result to a single variable, as a list.
both = returnStuff()
print both

## Or we can immediately assign to multiple variables.
#name, seq = returnStuff()
#print name
#print seq

## We can index the output the same as any list.
#print returnStuff()[0]
#print returnStuff()[1]

## We can make a dictionary directly from the function.
#dictOfStuff = {returnStuff()[0][1:]:returnStuff()[1]}
#print dictOfStuff

So how do functions make our lives easier? We can exploit functions to break difficult tasks into a number of easier tasks, and then these easier tasks into ones easier still, and so on. Large code blocks, with a few function calls, are only tens of lines long, and many functions are only a handful of lines. This allows us to program in large, structural sweeps, rather than getting lost in the details. This makes programs both easier to write and easier to read:

```python
##Don't copy this into a script!
 
def publishAPaper(authors,topic,journal):
 data = doWork(topic)
 figures = analyze(data)
 paper = writePaper(data,figures)
 submit(authors,paper,journal)
```
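
The outline above isn't runnable, so here is a much smaller version of the same idea that is. It is only a sketch: the three functions (`clean_sequence`, `gc_fraction`, `summarize`) are hypothetical helpers made up for this illustration. Notice that the top-level function reads like a to-do list, with the details tucked away in the helpers.

```python
def clean_sequence(raw_seq):
    # Strip surrounding whitespace and standardize on upper case.
    return raw_seq.strip().upper()

def gc_fraction(seq):
    # Fraction of bases that are G or C; an empty sequence gives 0.0.
    if len(seq) == 0:
        return 0.0
    gc = seq.count("G") + seq.count("C")
    return float(gc) / len(seq)

def summarize(name, raw_seq):
    # The top-level function reads like an outline of the task.
    seq = clean_sequence(raw_seq)
    return "%s: %d bases, GC fraction %.2f" % (name, len(seq), gc_fraction(seq))

summary = summarize("Gene1", "  atgcatgc\n")
print(summary)
```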

And, a big part of that ease comes with the use of:

## Modules

In all of the examples above, we defined our functions right above the code that we hoped to execute. If you have many functions, you can see how this would get messy in a hurry. Furthermore, part of the benefit of functions is that you can call them multiple times within a program to execute the same operations without tiresomely writing them all out again. But wouldn't it be nice to share functions across programs, too? For example, working with genomic data means lots of time getting sequence out of FASTA files, and shuttling that sequence from program to program. Many of the programs we work with overlap to a significant degree, as they need to parse FASTA files, calculate evolutionary rates, and interface with our lab servers, for example -- all of which means that many of them share functions. And if the same function exists in two or more different programs, we hit the same problems that we hit before: complex debugging, decreased readability, and, of course, too much typing.

__Modules__ solve these problems. In short, they're collections of functions and variables (and often objects, which we'll get to towards the end of the course) that are kept together in a single file that can be read and imported by any number of programs.

### Using a module: the basics

To illustrate the basics, we'll go through the use of two modules, __sys__ and __math__, one of which we use almost all the time. In fact, it's a very rare program indeed that doesn't use the __sys__ module. __sys__ contains a lot of really esoteric functions, but it also contains a simple, everyday thing -- what you typed on the command line.

As we learned on the first day of class, we are going to write a script to a file and then run it using %%bash in the cell below it.

### A quick aside on variables in bash!

I gave you some basic commands in Unix the first day, but have not said much else. Hongru will give you a more advanced lesson in Unix, but a short lesson here is on variables. 
```bash
x='myfile'

echo ${x}

>> myfile
```

The line initializing the variable must have no spaces around the **=** sign, and you write the variable name within the braces of **${ }** wherever you want the variable's contents to appear. 

As in python, I did not need to assign the directory path to a variable. I just like to do so because it helps me see my code more clearly.

In [None]:
%%writefile /Users/melyang/Desktop/PythonBootcamp2017/lessons/lesson5_example1.py

import sys # gaining access to the module
 
# you can access variables stored in the module by using a dot
# to get at the variable 'argv' which is stored in 'sys', type:
 
commandLine = sys.argv
 
print commandLine

In [None]:
%%bash
pd="/Users/melyang/Desktop/PythonBootcamp2017/lessons/"
python ${pd}lesson5_example1.py
##Now try adding more things after the ".py" script!

The __sys__ module contains a variable __argv__, which is a list of strings made from what was typed on the command line, split wherever there is whitespace. The first element is the name of the script itself. We can access this list from our program by importing the module sys and referring to __sys.argv__.

This is a lot more convenient than **raw_input**, which we have been using, as it requires only one line. However, you may need extra documentation, in comments or otherwise, to remind yourself what arguments the script expects. 

This method of quickly assigning variables is incredibly useful, as it allows you to run the same script multiple times without manually editing the code each time. 
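
One way to document what a script expects is to check the length of __sys.argv__ and print a usage message when it doesn't match. The sketch below assumes a script that takes exactly one argument; the function `parse_args`, the `USAGE` text, and the script name `myscript.py` are all invented for this illustration. It uses parenthesized single-value `print` so it should run under Python 2 or Python 3.

```python
import sys

USAGE = "usage: python myscript.py <name>"

def parse_args(argv):
    # argv[0] is the script name; we expect exactly one argument after it.
    if len(argv) != 2:
        return None
    return argv[1]

name = parse_args(sys.argv)
if name is None:
    # Wrong number of arguments: remind the user how to run the script.
    print(USAGE)
else:
    print("Hello %s!" % name)
```

Putting the check in a function that takes the list as an argument also makes it easy to try out with a hand-made list, such as `parse_args(["myscript.py", "class"])`.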

With the **sys** module we accessed a variable. We can also access functions stored inside modules. To demonstrate this, I'll use the module __math__. The following script is written to _lesson5_example2.py_. Try typing __python lesson5_example2.py 3__ into the terminal.

In [None]:
%%writefile /Users/melyang/Desktop/PythonBootcamp2017/lessons/lesson5_example2.py
import sys
import math
 
# sys.argv contains only strings, even if you type integers.
# And, remember, the first element is the command itself-- usually
# not very useful.
##x = 15.0
x = float(sys.argv[1]) # argv stores the command line arguments as
                       # strings, but python isn't especially clever,
                       # so we can't do math with strings
logX = math.log(x)

print logX

In [None]:
%%bash
pd="/Users/melyang/Desktop/PythonBootcamp2017/lessons/"
python ${pd}lesson5_example2.py 3

The **math** module provides many more functions that help you perform numerical operations. You can visit [this link](https://docs.python.org/2/library/math.html) to find a full list of functions. Also, don't forget that you can use tab completion and the "??" option in the notebook to investigate interactively yourself. Just write "math." and then press tab to see the list of available functions. 
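
As a quick taste, here are a few other things stored in **math**, shown as a short sketch. The variable names are arbitrary, and the values are computed rather than taken from the command line so that the cell runs on its own.

```python
import math

radius = 2.0
area = math.pi * radius ** 2     # pi is a variable stored in the module
root = math.sqrt(16)             # square root
log2_of_8 = math.log(8, 2)       # an optional second argument sets the base

print(area)
print(root)
print(log2_of_8)
```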

### Modules have more than just functions: The collections module

We already knew this: **sys.argv** is a list. Another thing that modules often contain is datatypes. Just as Python has some built-in datatypes (like int, list, str, and dict), it's also possible (although outside the scope of this course) to create full-fledged data types of your own.

One of the more useful modules of this kind is collections. It has a bunch of new data types that are, as you might guess from the name, collections of other things. There are two of them that I use with some regularity: **collections.Counter** and **collections.defaultdict**. Let's start with **collections.Counter**, which counts things.

In [None]:
import collections
 
my_genera = ['Helicobacter', 'Escherichia', 'Lactobacillus', 'Lactobacillus', 'Oryza',
 'Wolbachia', 'Oryza', 'Rattus', 'Lactobacillus', 'Drosophila']

b = {}
for genus in my_genera:
    if genus not in b:
        b[genus] = 0
    b[genus] += 1

c = collections.Counter(my_genera)
    
d = collections.Counter()
for genus in my_genera:
    d[genus] += 1
    
print b
#print c
#print d

Above, we have a list of genera. If we wanted a dictionary including the number of times each genus appears in the list, we would do as written above for **b** - iterate through the list, check if the genus name is already in the initialized dictionary **b**, and if not, make a category. 

With the **collections** module we can do this much more quickly. For instance, calling **collections.Counter** on a list produces a Counter data type (see **c**), which acts like a dictionary, except that its keys are always the things you are counting and its values are the count of each key. 

You can also create an empty Counter, as we do in **d**. In this case, we can immediately add a count as we iterate, as the check for the key is done automatically, and a dictionary entry is created by default if the key is not present. 

Using a __Counter__ is faster to write and makes it more obvious that we are counting, as opposed to a dictionary, which could be used for almost anything. Another big advantage of the __Counter__ type is that it makes it really easy to sort by frequency:

```python
print c
print c.most_common()

>> Counter({'Lactobacillus': 3, 'Oryza': 2, 'Drosophila': 1, 'Escherichia': 1, 'Rattus': 1, 'Wolbachia': 1, 'Helicobacter': 1})
>> [('Lactobacillus', 3), ('Oryza', 2), ('Drosophila', 1), ('Escherichia', 1), ('Rattus', 1), ('Wolbachia', 1), ('Helicobacter', 1)]
```
The **most_common** method for the Counter data object creates a list of tuples of (key, value), sorted from the highest value to the lowest value. 
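
**most_common** also accepts an optional number, in which case it returns only that many of the top entries. Here is a small sketch using a made-up DNA string (`sequence` and the other variable names are just for illustration); a Counter will happily count the characters of a string as well as the items of a list.

```python
import collections

sequence = "ATGGTGGGCAA"
base_counts = collections.Counter(sequence)   # counts each character

top_two = base_counts.most_common(2)          # only the two most frequent bases
missing = base_counts["N"]                    # a key we never saw counts as 0

print(top_two)
print(missing)
```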

In [None]:
print c
print c.most_common()

The other collections type I really like is the __defaultdict__, which is also like a dictionary, but has a default type for a key that we haven't seen before (with a normal dictionary, if you try to read something where the key isn't in the dict, then you get an error). Let's think about how we'd make a dictionary where each key is a genus, and the value is a list of species in that genus:

```python

my_species_list = [('Helicobacter','pylori'), ('Escherichia','coli'),
              ('Lactobacillus', 'helveticus'), ('Lactobacillus', 'acidophilus'),
              ('Oryza', 'sativa'), ('Wolbachia', 'pipientis'), ('Oryza', 'glaberrima'),
              ('Rattus', 'norvegicus'), ('Lactobacillus','casei'),
              ('Drosophila','melanogaster')]

d2 = collections.defaultdict(list)
 
for genus, species in my_species_list:
    d2[genus].append(species)
```

Here, if you specify your values are lists, it will, like in Counter, automatically check for you if the key is present and initialize it as an empty list if it is not. Likewise, you can specify any default data type. 
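
To illustrate other default types, the sketch below uses **int** (new keys start at 0) and **set** (new keys start as an empty set). The short list and the variable names `counts` and `genera_by_length` are invented for this example.

```python
import collections

my_genera = ['Oryza', 'Rattus', 'Oryza']

# With int as the default, a missing key starts at int(), which is 0.
counts = collections.defaultdict(int)
for genus in my_genera:
    counts[genus] += 1

# With set as the default, a missing key starts as an empty set.
genera_by_length = collections.defaultdict(set)
for genus in my_genera:
    genera_by_length[len(genus)].add(genus)

print(sorted(counts.items()))
print(sorted(genera_by_length.keys()))
```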

In [None]:
my_species_list = [('Helicobacter','pylori'), ('Escherichia','coli'),
              ('Lactobacillus', 'helveticus'), ('Lactobacillus', 'acidophilus'),
              ('Oryza', 'sativa'), ('Wolbachia', 'pipientis'), ('Oryza', 'glaberrima'),
              ('Rattus', 'norvegicus'), ('Lactobacillus','casei'),
              ('Drosophila','melanogaster')]

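d2={}  # a plain dictionary: the append below fails with a KeyError
#d2 = collections.defaultdict(list)  # use this line instead to make it work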

for genus, species in my_species_list:
    d2[genus].append(species)

print d2

## Making a module

Here, we have been using modules developed by other people, but we can make our own modules too!

Any file of python code with a _.py_ extension can be imported as a module from your script. When you invoke an import operation from a program, all the statements in the imported module are executed immediately. The program also gains access to names assigned in the file (names can be functions, variables, classes, etc.), which can be invoked in the program using the syntax __module.name__. Make the following script in the file *greetings.py* in the lessons/ folder:

In [None]:
%%writefile /Users/melyang/Desktop/PythonBootcamp2017/lessons/greetings.py


print 'The top of the greeting_module has been read.'
 
def hello(name):
 greeting = "Hello %s!" % name
 return greeting
 
def nihao(name):
 greeting = "Ni hao from China, %s!" % name
 return greeting
 
x = 5
 
print 'The bottom of the greeting_module has been read.'

Let's also make a script called "lesson5_example3.py" where we use the **greetings** module. 



In [None]:
%%writefile /Users/melyang/Desktop/PythonBootcamp2017/lessons/lesson5_example3.py

import greetings
 
hi = greetings.hello('person')
print hi
print greetings.x

# What happens if you try 'print x' here?

In [None]:
%%bash

pD="/Users/melyang/Desktop/PythonBootcamp2017/lessons/"

python ${pD}lesson5_example3.py

Notice that it runs through all of the greetings module first, so anything that is printed out in *greetings.py* is also printed out before the rest of *lesson5_example3.py* is run.

1. Can you think of a fast and easy way to make it print out a greeting to whomever you might meet?

And that's it! See-- no more messy function declarations at the beginning of your script. Now if you need any other program to say hi to you, all you need to do is import the greetings module.

## Using modules: slightly more than just 'import'

Although creating a basic module is easy, sometimes you want more than just the basics. And although using a module in the most basic manner is easy, it's best to get a more thorough picture of how modules behave.

First, what if you only want one function from a given module? For instance, in China most people wouldn't understand what you meant by "hello", so "nihao" might be a better greeting. We need to use a modified syntax for retrieving _only_ the __nihao__ function from the module, without cluttering up our program's namespace with the English __hello__ function.

Let's make "lesson5_example4.py"

In [None]:
%%writefile /Users/melyang/Desktop/PythonBootcamp2017/lessons/lesson5_example4.py

from greetings import nihao
 
hi = nihao('everybody')
print hi

In [None]:
%%bash

pD="/Users/melyang/Desktop/PythonBootcamp2017/lessons/"

python ${pD}lesson5_example4.py

We see that we can now write __nihao('everybody')__ directly, instead of having to write __greetings.nihao('everybody')__. And if we wanted to access both functions this way, we could import them both in one statement by changing the import line to the following:

```python
from greetings import nihao, hello
```

Or, what if there were a lot of functions from the __greetings__ module we wanted to use, but didn't want to write out the full name? Rather than writing out all of the function names to import individually (there could be a lot of them), we can use the asterisk wildcard (*) symbol to refer to them.

```python
from greetings import *
```

While this may be useful if we are familiar with the contents of the __module__, including all of the __names__ inside, there are a few reasons to be careful about using the __from modulename import *__ syntax. First, if the module contains a lot of names that we don't need, they all get copied into our program's namespace, which makes it harder to tell where any given name came from. Second, and perhaps more importantly, if the module being imported contains names that match ones already defined in your program, the imported versions replace yours, and you will lose access to the original values of those variables.

For example, you might have a problem if _yourprogram.py_ and _yourmodule.py_ each define distinct functions called __hello()__. If instead you use the syntax __import yourmodule__, then you can call the function in _yourprogram.py_ using __hello()__ and you can call the function in _yourmodule.py_ using __yourmodule.hello()__. If you want to import a whole module, but don't want to type out its full name every time, you can use the syntax: __import a_long_module_name as mname__.

```python
import greetings as greets
```
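
To see the name clash in action without needing any extra files, here is a sketch that uses the standard **math** module in place of _yourmodule.py_. The local `sqrt` function and the variable names are invented for the illustration; the import in the middle of the code is there only to show the effect, and normally imports belong at the top of a script.

```python
import math as m

def sqrt(x):
    # Our own function that happens to share a name with math.sqrt.
    return "my own sqrt of %s" % x

ours = sqrt(16)        # calls the function defined just above
theirs = m.sqrt(16)    # the module prefix keeps math's version separate

# Importing the bare name rebinds 'sqrt', so our own version
# can no longer be reached by that name.
from math import sqrt
after_import = sqrt(16)

print(ours)
print(theirs)
print(after_import)
```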

Finally, you can also import variables from modules and assign them new names in your program using the syntax __from modulename import variablename as newvariablename__.

```python
from greetings import nihao as chinesehello
```

## Where to Store Your Modules: using PYTHONPATH

Over time, you'll end up accumulating lots of these modules, and they'll tend to fall together in meaningful collections. For example, you might have a module for all your functions related to reading and parsing files, called *files_tools.py*. You might have another for common sequence-related tasks, called *sequence_tools.py*. Python keeps its modules installed in a system directory that you may or may not have access to on a remote server. Therefore, it's useful and simpler to just create your own python modules directory and then let your operating system environment know about it. 

Here, I accomplish this by placing my modules in a new folder I make called pylib. The pylib folder can be placed anywhere, but keep track of the path to the directory and consider whether it is a convenient place to keep the modules. Then, you can add a command to the hidden file *.bash_profile* in your home directory. 

You will be copy/pasting the following commands into the Terminal, which will add new lines to your *~/.bash_profile* that direct python to look in your pylib folder. Make sure you replace the placeholder with the actual path to your pylib folder (you can use __pwd__ in your terminal to check the path of the pylib directory).

```bash
echo 'PYTHONPATH=$PYTHONPATH:<your path to the pylib directory>' >> ~/.bash_profile
echo 'export PYTHONPATH' >> ~/.bash_profile
source ~/.bash_profile
```

NOTE: The **source** makes your current shell process the changes you made to the *.bash_profile*. If you are getting errors, exiting out of the notebook and opening it again in a new terminal might help. 

NOTE: *.bash_profile* vs. _.bashrc_: In Linux, *.bash_profile* is run upon login while _.bashrc_ is run each time a new terminal is opened. Thus, __if you are using Linux and it isn't working__, try inputting the following commands and see if it works. [This link](http://www.joshstaiger.org/archives/2005/07/bash_profile_vs.html) gives a pretty good summary of the difference between the two hidden files.

```bash
echo 'PYTHONPATH=<your path to the pylib directory>' >> ~/.bashrc
echo 'export PYTHONPATH' >> ~/.bashrc
source ~/.bashrc
```
And with that, any file that ends up in this directory will be treated as a module by Python. And though this is a good final resting place for your polished modules, you can also prototype them by simply saving them in your current working directory, and moving them over when you're happy with them.
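
If an import from your pylib folder fails, one thing to check is whether Python actually picked up the directory. Directories listed in PYTHONPATH show up in the list **sys.path**, which is where Python searches for modules. The sketch below is one way to check; the function name `on_module_path` and the location `~/pylib` are assumptions for illustration, so substitute your own path.

```python
import os
import sys

def on_module_path(directory):
    # Compare absolute paths so that relative entries in sys.path still match.
    target = os.path.abspath(directory)
    return any(os.path.abspath(entry) == target for entry in sys.path)

# Hypothetical location of the pylib folder; change this to your own path.
pylib = os.path.join(os.path.expanduser("~"), "pylib")

# Prints True if Python will search this folder for modules.
print(on_module_path(pylib))
```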