# Modules and math libraries
## Topics
- Modules - independent collections of related functions
- Using Modules - math & collections

## Introduction



First up we'll be discussing modules and how they are used to organize code.


## Modules

In all of the examples so far, we defined our functions right above the code that we hoped to execute. If you have many functions, you can see how this would get messy in a hurry. 
Furthermore, part of the benefit of functions is that you can call them multiple times within a program to execute the same operations without writing them all out again. 

But wouldn't it be nice to share functions across programs, too? 

For example, working with genomic data means spending a lot of time pulling sequence out of FASTA files and shuttling that sequence from program to program. Many of the programs we work with overlap significantly: they need to parse FASTA files, calculate evolutionary rates, and interface with our lab servers, for example, which means that many of them share functions. And if the same function exists in two or more programs, we hit the same problems as before: harder debugging, worse readability, and, of course, too much typing.

Modules solve these problems. In short, they're collections of code that are kept together in a single file that can be read and __import__ed by any number of programs.

### The Basics: Using the math module

To illustrate the basics, we'll go through the use of the __math__ module, a module which we use almost all the time. To use a function or variable in the __math__ module, use the syntax 
__math.__*NameOfThing*.

In [None]:
##to use a module in your code, first import it
import math

x = 5

##Modules usually contain functions
log10 = math.log10(x)
cos = math.cos(x)

##sometimes modules contain data
pi = math.pi
e = math.e

print log10, cos, pi, e

In [None]:
math.cos?
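The help text shown by __math.cos?__ notes that the angle is measured in radians, which trips up a lot of people. Here is a minimal sketch of converting between degrees and radians with __math.radians()__ and __math.degrees()__; the 60-degree angle is just an illustrative value.

```python
import math

# math.cos, math.sin, etc. expect angles in radians, not degrees.
angle_degrees = 60
angle_radians = math.radians(angle_degrees)   # 60 degrees -> pi/3 radians

# Very close to 0.5 (tiny floating-point rounding errors are normal).
print(math.cos(angle_radians))

# Converting the other way: pi radians is 180 degrees.
print(math.degrees(math.pi))
```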

### The collections module

Another useful module is the __collections__ module. It has five new data types that are, as you might guess from the name, collections of other things. The full documentation is [here](https://docs.python.org/2/library/collections.html). We will cover two of the most commonly used objects: __Counter__ and __defaultdict__. Let's start with __Counter__, which counts things.

In [None]:
import collections
 
my_genera = ['Helicobacter', 'Escherichia', 'Lactobacillus',
             'Lactobacillus', 'Oryza', 'Wolbachia', 'Oryza',
             'Rattus', 'Lactobacillus', 'Drosophila']
 
c = collections.Counter(my_genera)
print c
##Note that placing the list into Counter() immediately gets
##you the count.

The collections module gives us a new data type, __Counter__, that counts things. It is essentially a dictionary where the key is some element we are recording and the value is the count of how often it appears. 
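Because a __Counter__ is a dictionary underneath, you can look up counts with the usual square-bracket syntax. A sketch using the __my_genera__ list from above ('Homo' is just an example of a genus that isn't in the list):

```python
import collections

my_genera = ['Helicobacter', 'Escherichia', 'Lactobacillus',
             'Lactobacillus', 'Oryza', 'Wolbachia', 'Oryza',
             'Rattus', 'Lactobacillus', 'Drosophila']
c = collections.Counter(my_genera)

# Look up a count just like a dictionary value.
print(c['Lactobacillus'])   # 3

# Unlike a plain dict, a missing key gives a count of 0 instead of a KeyError.
print(c['Homo'])            # 0

# update() adds the counts from another list to the existing counts.
c.update(['Oryza', 'Homo'])
print(c['Oryza'])           # 3
print(c['Homo'])            # 1

# A Counter really is a dict subclass.
print(isinstance(c, dict))  # True
```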

In [None]:
# The hard way - setting up a dictionary and keeping manual track of counts.
counts = {}
 
for genus in my_genera:
    if genus not in counts:
        counts[genus] = 0
    counts[genus] += 1

print "The dictionary", counts

Using a __Counter__ is faster to write and saves us from rewriting this bit of code every time we want to count something. Another big advantage of the __Counter__ type is that it makes it really easy to sort by frequency:

In [None]:
my_seq = ['MET', 'GLU', 'VAL', 'LYS', 'ARG', 'GLU', 'HIS', 'TRP', 'ALA',
          'THR', 'ARG', 'LEU', 'GLY', 'LEU', 'ILE', 'LEU', 'ALA', 'MET',
          'ALA', 'GLY', 'ASN', 'ALA', 'VAL', 'GLY', 'LEU', 'GLY', 'ASN',
          'PHE', 'LEU', 'ARG', 'PHE', 'PRO', 'VAL', 'GLN', 'ALA', 'ALA',
          'GLU', 'ASN', 'GLY', 'GLY', 'GLY', 'ALA', 'PHE', 'MET', 'ILE',
          'PRO', 'TYR', 'ILE', 'ILE', 'ALA', 'PHE', 'LEU', 'LEU', 'VAL',
          'GLY', 'ILE', 'PRO', 'LEU', 'MET', 'TRP', 'ILE', 'GLU', 'TRP',
          'ALA', 'MET', 'GLY', 'ARG', 'TYR', 'GLY', 'GLY', 'ALA', 'GLN',
          'GLY', 'HIS', 'GLY', 'THR', 'THR', 'PRO', 'ILE', 'VAL', 'PHE',
          'LEU', 'ILE', 'THR', 'MET', 'PHE', 'ILE', 'ASN', 'VAL', 'SER',
          'ILE', 'LEU', 'ILE', 'ARG', 'GLY', 'ILE', 'SER', 'LYS', 'GLY',
          'ILE', 'GLU', 'ARG', 'PHE', 'ALA', 'LYS', 'ILE', 'ALA', 'MET',
          'PRO', 'THR', 'LEU', 'PHE', 'ILE', 'LEU', 'ALA', 'VAL', 'PHE',
          'LEU', 'VAL', 'ILE', 'ARG', 'VAL', 'PHE', 'LEU', 'LEU', 'GLU',
          'THR', 'PRO', 'ASN', 'GLY', 'THR', 'ALA', 'ALA', 'ASP']

c = collections.Counter(my_seq)
 
print c
print
print
print c.most_common()
print "\n\nThis is the most common: ",c.most_common()[0][0],
print " which shows up", c.most_common()[0][1],"times."

__Counter.most_common()__ returns a list of (element, count) tuples, sorted from highest count to lowest.
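If you only want the top few entries, __most_common()__ also accepts a number n and returns just the n most frequent pairs. A short sketch with the __my_genera__ list from earlier:

```python
import collections

my_genera = ['Helicobacter', 'Escherichia', 'Lactobacillus',
             'Lactobacillus', 'Oryza', 'Wolbachia', 'Oryza',
             'Rattus', 'Lactobacillus', 'Drosophila']
c = collections.Counter(my_genera)

# Passing n returns only the n most frequent (element, count) pairs.
top_two = c.most_common(2)
print(top_two)   # [('Lactobacillus', 3), ('Oryza', 2)]

# Each pair unpacks neatly into a name and a count.
for genus, count in top_two:
    print(genus + ': ' + str(count))
```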

The other __collections__ type we will cover is __defaultdict__, which also works like a dictionary but supplies a default value for any key it hasn't seen before. (With a normal dictionary, trying to read a key that isn't in the dict raises a __KeyError__.) Let's think about how we'd make a dictionary where each key is a genus and the value is a list of species in that genus:

In [None]:
# Let's set up a list of tuples. 
# Format is (genus, species)
my_species = [('Helicobacter','pylori'), 
              ('Escherichia','coli'),
              ('Lactobacillus', 'helveticus'),
              ('Lactobacillus', 'acidophilus'),
              ('Oryza', 'sativa'), 
              ('Wolbachia', 'pipientis'),
              ('Oryza', 'glaberrima'),
              ('Rattus', 'norvegicus'),
              ('Lactobacillus','casei'), 
              ('Drosophila','melanogaster')]

In [None]:
# Review: we can unpack a tuple into several variables at once...
foo, bar = ("genus","species")
print foo
print bar

In [None]:
# we can loop over them, too:
for genus, species in my_species:
    print "Genus: ", genus
    print "Species: ", species


In [None]:
# let's build a dictionary with keys that are genera and 
# values that are lists of species.

old_style_dict = {}
for genus, species in my_species:
    if genus not in old_style_dict:
        old_style_dict[genus] = []
    old_style_dict[genus].append(species)

print "normal dictionary -- ", old_style_dict

Let's try that in the [debugger](http://www.pythontutor.com/visualize.html#mode=edit)

With a __defaultdict__, we can once again save the line in the for loop where we check for a non-existent key:

In [None]:
import collections
    
default_style_dict = collections.defaultdict(list)
 
for genus, species in my_species:
    default_style_dict[genus].append(species)

print "default dict -- ", default_style_dict

Moreover, if we look up a genus that isn't in the dictionary, we no longer get an error: the plain dictionary raises a __KeyError__, while the __defaultdict__ hands back an empty list.

In [None]:
print old_style_dict['no-such-thing']  # raises KeyError: a plain dict has no default

In [None]:
print default_style_dict['no-such-thing']
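One side effect worth knowing: reading a missing key from a __defaultdict__ doesn't just return the default, it also stores it. The sketch below uses a shortened version of __my_genera__'s companion list, __my_species__, and shows how to check for a key with __in__ (which adds nothing) and how a plain dict's __.get()__ returns a fallback without raising __KeyError__.

```python
import collections

# A shortened version of the my_species list from above.
my_species = [('Helicobacter', 'pylori'),
              ('Oryza', 'sativa'),
              ('Oryza', 'glaberrima')]

default_style_dict = collections.defaultdict(list)
for genus, species in my_species:
    default_style_dict[genus].append(species)

print('no-such-thing' in default_style_dict)  # False: "in" does not add the key
print(default_style_dict['no-such-thing'])    # []: reading a missing key...
print('no-such-thing' in default_style_dict)  # True: ...quietly stored an empty list

# With a normal dict, .get() returns a fallback value without raising
# KeyError and without adding the key.
plain = dict(default_style_dict)
print(plain.get('still-missing', []))         # []
print('still-missing' in plain)               # False
```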

One thing to look at is the line where we actually declare the defaultdict: there we've passed it another type, __list__. Whenever we use a key that's not already in the dictionary, the defaultdict calls that type with no arguments and stores the result as the new value, so here every new key starts out as an empty list. Most often you'll use __list__, but you could imagine uses for other types, like __str__ (an empty string), __int__ (here "empty" actually means 0), or even __dict__. It's even possible to have a defaultdict of defaultdicts!
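Here is a sketch of two of those other uses. With __int__ as the default type, a defaultdict works as a simple counter (the sequence 'GATTACA' is just an illustrative string). For a defaultdict of defaultdicts, the default must be something that can be called with no arguments, so the inner defaultdict is wrapped in a __lambda__.

```python
import collections

# int() returns 0, so every new key starts at 0 and += 1 just works.
letter_counts = collections.defaultdict(int)
for base in 'GATTACA':
    letter_counts[base] += 1
print(dict(letter_counts))   # A: 3, T: 2, G: 1, C: 1 (display order may vary)

# A defaultdict of defaultdicts: genus -> species -> count.
nested = collections.defaultdict(lambda: collections.defaultdict(int))
nested['Oryza']['sativa'] += 1
nested['Oryza']['glaberrima'] += 2
print(nested['Oryza']['glaberrima'])   # 2
print(nested['Rattus']['norvegicus'])  # 0: both levels are created on demand
```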

## Finding modules

There are literally thousands of Python modules available, and most of them can be installed from the Python Package Index (PyPI) with the "pip" utility that we ran on the first day of class.

For example, I needed a way to work with YAML files (YAML is a simple, human-readable format for configuration files and structured data)...
[so I looked it up...](http://bfy.tw/HMqn)

And found that PyYAML is a thing! Hooray!

```bash
pip install pyyaml
```

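Running that gave me access to PyYAML, and then all I had to do was import it. Note that the package you install is called __pyyaml__, but the module you import is called __yaml__ (as in __import yaml__).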

## Making modules


Now that you know the basics of __import__ing and using a module, you will learn how to write your own modules, which is almost as easy!

Any file of Python code with a *.py* extension can be __import__ed as a module from your script, as long as Python can find it (for example, when it sits in the same directory as your script or notebook). The first time a program __import__s a module, all the statements in that module are executed immediately; later imports in the same session reuse the module that is already loaded. The program also gains access to names assigned in the module (names can be functions, variables, classes, etc.), which can be invoked in the program using the syntax *module.name*. Find the following script in the file *greeting_module.py*:

```python
print 'The top of the greeting_module has been read.'
 
def hello(name):
    greeting = "Hello {}!".format(name)
    return greeting

def ahoy(name):
    greeting = "Ahoy-hoy {}!".format(name)
    return greeting
 
x = 5
 
print 'The bottom of the greeting_module has been read.'
```

The following script will __import__ the module **greeting_module** and use the functions and variables defined within it.

In [None]:
# Make sure you reset your kernel before running this cell,
# or you won't see the output from greeting_module.
import greeting_module

greeting = greeting_module.hello('Christopher')
print greeting
print

x = 1
print 'x within greeting_module:', greeting_module.x
print 'x within the __main__ module:', x
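The reason the cell above asks you to reset the kernel is that Python runs a module's top-level code only the first time it is imported; after that, the loaded module is cached in __sys.modules__ and reused. The sketch below demonstrates this without needing *greeting_module.py*: it writes a tiny, hypothetical module called *demo_module.py* into a temporary directory, adds that directory to Python's search path, and imports it twice.

```python
import os
import sys
import tempfile

# Write a tiny, hypothetical module file into a temporary directory.
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, 'demo_module.py'), 'w') as handle:
    handle.write("print('demo_module is being executed')\n")
    handle.write("x = 5\n")

# Put that directory at the front of Python's module search path.
sys.path.insert(0, module_dir)

import demo_module   # the module's top-level code runs, so its message prints
import demo_module   # nothing prints: Python reuses the cached module

print(demo_module.x)                 # 5
print('demo_module' in sys.modules)  # True
```

In Python 3, __importlib.reload(demo_module)__ forces a module's code to run again; restarting the kernel, as the cell above suggests, has the same effect.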

### Other stuff to know about modules

You can import just a few things from a module:

```python
from greeting_module import ahoy
```

and you can rename a module...

```python
import greeting_module as insult_module
```

In [None]:
# And we can use a clever bit of IPython magic, %who, to list the variables and modules currently defined in the notebook...
import collections
import greeting_module
%who

In [None]:
# and, we can look at the contents of a module...
dir(greeting_module)

In [None]:
# some of them get complex!
dir(collections)

### Whoa - a bit about naming

[PEP 8](https://www.python.org/dev/peps/pep-0008/) tells us: "Use one leading underscore only for non-public methods and instance variables."

That means that the variables and methods whose names start with '_' aren't meant to be used outside the module that defines them (here, the collections module).
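Following that convention, you can trim __dir()__'s output down to the public names by skipping anything that starts with an underscore. A minimal sketch (the exact list you get depends on your Python version):

```python
import collections

# Keep only the names that don't start with an underscore, i.e. the public ones.
public_names = [name for name in dir(collections) if not name.startswith('_')]
print(public_names)
```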

Generally speaking, you wouldn't use __dir()__ to explore a module like __collections__; you'd look up the module's documentation on the web instead. As you can see, modules get hairy under the covers!

The *double*-underscore methods are called "dunder" methods (double-under). They let an object hook into Python's built-in behaviour, such as what happens when you call len() on it or add two of them with +; this is a fairly advanced topic. For now, just know that you wouldn't normally invoke those methods directly.
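To make that concrete, here is a small sketch with a __Counter__: the built-in len() function calls the object's `__len__` method for you, so you call len() rather than the dunder method itself.

```python
import collections

c = collections.Counter(['Oryza', 'Oryza', 'Rattus'])

# len() calls c.__len__() behind the scenes; it counts distinct keys.
print(len(c))        # 2

# Calling the dunder directly gives the same answer but isn't idiomatic.
print(c.__len__())   # 2
```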

## More about import

__import__ can bring in whole modules (as above, where we imported all of collections).

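It can also bring in just one function and place its name directly in your current namespace, so you can call it without the module prefix. And it can rename it!

Renaming is useful in case of a naming conflict, when two things you want to import share the same name.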

In [None]:
import greeting_module
from greeting_module import ahoy

In [None]:
%who

In [None]:
greeting_module.ahoy("joe")

In [None]:
ahoy("joe")

In [None]:
from greeting_module import ahoy as pirate_hello

In [None]:
%who

In [None]:
pirate_hello("joe")
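Here is a naming conflict you might actually run into: the standard-library modules __math__ and __cmath__ both provide a function called __sqrt__, but only __cmath.sqrt__ handles negative numbers (it returns a complex number). Importing both as plain __sqrt__ would let the second import silently replace the first, so this sketch renames them (the names __real_sqrt__ and __complex_sqrt__ are just illustrative choices).

```python
from math import sqrt as real_sqrt
from cmath import sqrt as complex_sqrt

print(real_sqrt(16))      # 4.0
print(complex_sqrt(-1))   # 1j

# math.sqrt refuses negative numbers with a ValueError.
try:
    real_sqrt(-1)
except ValueError as err:
    print('math.sqrt failed: ' + str(err))
```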