# Modules and math libraries
## Topics
- Modules - independent collections of related functions
- Using Modules - math & collections

## Introduction



First up we'll be discussing modules and how they are used to organize code.


## Modules

In all of the examples so far, we defined our functions right above the code that we hoped to execute. If you have many functions, you can see how this would get messy in a hurry. 
Furthermore, part of the benefit of functions is that you can call them multiple times within a program to execute the same operations without writing them all out again. 

But wouldn't it be nice to share functions across programs, too? 

For example, working with genomic data means spending a lot of time pulling sequences out of FASTA files and shuttling them from program to program. Many of the programs we work with overlap to a significant degree: they need to parse FASTA files, calculate evolutionary rates, and interface with our lab servers, so many of them share functions. And if the same function exists in two or more different programs, we hit the same problems that we hit before: complex debugging, decreased readability, and, of course, too much typing.

Modules solve these problems. In short, a module is a collection of code kept together in a single file that any number of programs can read and __import__.

### The Basics: Using the math module

To illustrate the basics, we'll go through the use of the __math__ module, which we use almost all the time. To use a function or variable in the __math__ module, use the syntax 
__math.__*NameOfThing*

In [None]:
##to use a module in your code, first import it
import math

x = 5

##Modules usually contain functions
log10 = math.log10(x)
cos = math.cos(x)

##sometimes modules contain variables
pi = math.pi
e = math.e

print log10, cos, pi, e

In [None]:
math.cos?
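The `?` suffix is a feature of IPython and Jupyter notebooks. As a small sketch, assuming you are running plain Python rather than a notebook, the same description can be read from the function's docstring:

```python
import math

# Every function in math carries a short description in its docstring
doc = math.cos.__doc__
print(doc)

# help(math.cos) shows the same text in a plain Python session
```

The `print` call is written with parentheses so the snippet runs under both Python 2 and Python 3.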

### The collections module

Another useful module is the __collections__ module. It has a bunch of new data types that are, as you might guess from the name, collections of other things. We will cover two of the most commonly used objects: __Counter__ and __defaultdict__. Let's start with __Counter__, which counts things.

In [None]:
import collections
 
my_genera = ['Helicobacter', 'Escherichia', 'Lactobacillus',
             'Lactobacillus', 'Oryza', 'Wolbachia', 'Oryza',
             'Rattus', 'Lactobacillus', 'Drosophila']
 
c = collections.Counter(my_genera)
print c
##Note that placing the list into Counter() immediately gets
##you the count.

The collections module gives us a new data type, __Counter__, that counts things. It is essentially a dictionary where each key is an element we are recording and each value is the number of times that element appears. Remember that list of amino acids we got the count for in the exercises in Section 2.1? There, we created a dictionary where every key was initialized with a value of zero, and then added one for each occurrence. Here, we can just use a __Counter__ to get the count of each unique element in the list.
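Because a __Counter__ behaves like a dictionary, individual counts can be looked up by key. Here is a minimal sketch, using a shortened genus list made up for this illustration (the `print` calls use parentheses so it runs under both Python 2 and Python 3):

```python
import collections

# Shortened example list (hypothetical, for illustration only)
genera = ['Oryza', 'Lactobacillus', 'Oryza', 'Rattus',
          'Lactobacillus', 'Lactobacillus']

c = collections.Counter(genera)

# Look up a count by key, just like a dictionary
lacto_count = c['Lactobacillus']
print(lacto_count)

# Unlike a normal dictionary, a missing key gives 0 rather than an error
yeast_count = c['Saccharomyces']
print(yeast_count)
```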

In [None]:
##This is how we did a count in a dictionary. Many more lines of code!
counts = {}
 
for genus in my_genera:
    if genus not in counts:
        counts[genus] = 0
    counts[genus] += 1

print "The dictionary", counts

Using a __Counter__ is faster to write and saves us from rewriting this bit of code every time we want to count something. Another big advantage of the __Counter__ type is that it makes it really easy to sort by frequency:

In [None]:
my_seq = ['MET', 'GLU', 'VAL', 'LYS', 'ARG', 'GLU', 'HIS', 'TRP', 'ALA',
          'THR', 'ARG', 'LEU', 'GLY', 'LEU', 'ILE', 'LEU', 'ALA', 'MET',
          'ALA', 'GLY', 'ASN', 'ALA', 'VAL', 'GLY', 'LEU', 'GLY', 'ASN',
          'PHE', 'LEU', 'ARG', 'PHE', 'PRO', 'VAL', 'GLN', 'ALA', 'ALA',
          'GLU', 'ASN', 'GLY', 'GLY', 'GLY', 'ALA', 'PHE', 'MET', 'ILE',
          'PRO', 'TYR', 'ILE', 'ILE', 'ALA', 'PHE', 'LEU', 'LEU', 'VAL',
          'GLY', 'ILE', 'PRO', 'LEU', 'MET', 'TRP', 'ILE', 'GLU', 'TRP',
          'ALA', 'MET', 'GLY', 'ARG', 'TYR', 'GLY', 'GLY', 'ALA', 'GLN',
          'GLY', 'HIS', 'GLY', 'THR', 'THR', 'PRO', 'ILE', 'VAL', 'PHE',
          'LEU', 'ILE', 'THR', 'MET', 'PHE', 'ILE', 'ASN', 'VAL', 'SER',
          'ILE', 'LEU', 'ILE', 'ARG', 'GLY', 'ILE', 'SER', 'LYS', 'GLY',
          'ILE', 'GLU', 'ARG', 'PHE', 'ALA', 'LYS', 'ILE', 'ALA', 'MET',
          'PRO', 'THR', 'LEU', 'PHE', 'ILE', 'LEU', 'ALA', 'VAL', 'PHE',
          'LEU', 'VAL', 'ILE', 'ARG', 'VAL', 'PHE', 'LEU', 'LEU', 'GLU',
          'THR', 'PRO', 'ASN', 'GLY', 'THR', 'ALA', 'ALA', 'ASP']

c = collections.Counter(my_seq)
 
print c
print
print
print c.most_common()

*Counter.*__most_common()__ returns a list of (element, count) tuples, sorted from the highest count to the lowest.
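As a quick sketch of working with that list, `most_common()` also accepts a number to limit how many entries come back, and each tuple can be unpacked in a for loop. The short residue list here is made up for illustration:

```python
import collections

# Small made-up list so the counts are easy to check by eye
residues = ['GLY', 'ALA', 'GLY', 'LEU', 'GLY', 'ALA']
c = collections.Counter(residues)

# Passing a number returns only that many of the most frequent elements
top_two = c.most_common(2)
print(top_two)

# Each tuple unpacks into the element and its count
for residue, count in c.most_common():
    print("%s appears %d times" % (residue, count))
```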

The other __collections__ type we will cover is __defaultdict__, which is also like a dictionary, but supplies a default value for any key that it hasn't seen before (with a normal dictionary, if you try to read a key that isn't in the dict, you get an error). Let's think about how we'd make a dictionary where each key is a genus, and the value is a list of species in that genus:

In [None]:
import collections
 
my_species = [('Helicobacter','pylori'), ('Escherichia','coli'),
              ('Lactobacillus', 'helveticus'),
              ('Lactobacillus', 'acidophilus'),
              ('Oryza', 'sativa'), ('Wolbachia', 'pipientis'),
              ('Oryza', 'glaberrima'), ('Rattus', 'norvegicus'),
              ('Lactobacillus','casei'), ('Drosophila','melanogaster')]
 
# Below, we put the list into a normal dictionary,
# with genera as keys and lists of species as values
d1 = {}
for genus, species in my_species:
    if genus not in d1:
        d1[genus] = []
    d1[genus].append(species)

print "normal dictionary -- ", d1

With a __defaultdict__, we can once again skip the lines in the for loop that check for a missing key:

In [None]:
d2 = collections.defaultdict(list)
 
for genus, species in my_species:
    d2[genus].append(species)

print "default dict -- ", d2

Moreover, if we look up a genus that has no species recorded, we no longer receive an error. We get an empty list back instead, and that genus is added to the dictionary as a new key.

In [None]:
print d2['Saccharomyces']
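One side effect is worth seeing in action: reading a missing key from a defaultdict stores that key, while testing with `in` does not. A minimal sketch, using a fresh dictionary `d` that is not part of the examples above:

```python
import collections

d = collections.defaultdict(list)
d['Oryza'].append('sativa')

# Checking with "in" does not create a new key
has_yeast_before = 'Saccharomyces' in d

# Reading a missing key returns the default value AND stores it
species = d['Saccharomyces']

has_yeast_after = 'Saccharomyces' in d
print(sorted(d.keys()))
```

If you only want to ask whether a genus is present, `in` is the safer choice because it leaves the dictionary unchanged.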

One thing to look at is the line where we actually create the defaultdict: here we've given it a type, and whenever we use a key that's not already in the dictionary, it calls that type with no arguments to make the initial value. Most often this will be a list, but you could imagine uses for other types, like a string, an integer (here "empty" actually means 0), or even another dict. It's even possible to have a defaultdict of defaultdicts!
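As a rough sketch of those other default types, the example below counts with `int` and builds a defaultdict of defaultdicts. The grouping by first letter is a hypothetical layout chosen only for illustration, and the nested version relies on `lambda`, a way of writing a small unnamed function:

```python
import collections

pairs = [('Oryza', 'sativa'), ('Oryza', 'glaberrima'),
         ('Rattus', 'norvegicus')]

# int() returns 0, so every new key starts at zero
species_per_genus = collections.defaultdict(int)
for genus, species in pairs:
    species_per_genus[genus] += 1

# Outer default: a new defaultdict(list), built by the lambda when needed
# Hypothetical layout: first letter of genus -> genus -> list of species
nested = collections.defaultdict(lambda: collections.defaultdict(list))
for genus, species in pairs:
    nested[genus[0]][genus].append(species)

print(dict(species_per_genus))
print(nested['O']['Oryza'])
```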

## Finding modules

Placeholder - googling modules, using pip (did we install pip on windows machines?)


## Making modules

Hand-wave that this is possible