# Sequence Alignment Booster
There is a Python library called `alignment` that compares strings of text and detects cases of significant overlap. Basic information on the library, including the example pasted below, can be found at https://pypi.python.org/pypi/alignment/1.0.9.

Before you can use the `alignment` library, though, you'll need to install it. Open Terminal on OS X or the Command Prompt on Windows and run: `pip install alignment`

Note that the `alignment` library was written for Python 2 rather than Python 3. The names of several functions changed between these versions, so there are two options:

1. Run the alignment tool from Python 2
2. Fix the alignment tool to run with Python 3

We're going to go with the fix because Python 3 is the way forward, so you might as well get used to it and, well, given the context of this class it will be good for you. =)

There are only three changes that need to be made to `alignment`:

1. correct the `print` statements in `sequence.py` by adding parentheses
2. replace references to `izip` with `zip` and remove the reference to `itertools` in `profile.py`
3. replace references to `xrange` with `range` in `sequencealigner.py`
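As a rough illustration of what each of these three changes looks like, here is a small Python 3 sketch. The lines are made-up examples, not copied from the library's source; the Python 2 spelling is shown in the comment above each one.

```python
# 1. print is a function in Python 3, so it needs parentheses.
#    Python 2:  print 'Alignment score:', 3
print('Alignment score:', 3)

# 2. itertools.izip is gone; the built-in zip is already lazy in Python 3.
#    Python 2:  from itertools import izip
#               pairs = list(izip('abc', 'abd'))
pairs = list(zip('abc', 'abd'))

# 3. xrange is gone; range is already lazy in Python 3.
#    Python 2:  indices = list(xrange(3))
indices = list(range(3))

print(pairs)
print(indices)
```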

Look at the code below and the output below it _before_ running the code. When you run the code it will crash, but the error message will tell you why it crashed and which file contains the problem. Use a text editor to fix each file as each error occurs. TextWrangler/BBEdit is a great choice on OS X. Sublime Text and Notepad++ are good on Windows.
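If you would rather find the three files before the first crash, a minimal sketch like the one below may help. It only uses the standard library's `importlib.util.find_spec`, which locates a package without running it. The helper name `package_dir` is made up for this example, and the sketch assumes the three files sit directly inside the installed `alignment` folder.

```python
import importlib.util
import os


def package_dir(name):
    """Return the folder of an installed package, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.submodule_search_locations:
        return None
    return list(spec.submodule_search_locations)[0]


folder = package_dir("alignment")
if folder is None:
    print("alignment is not installed for this Python interpreter")
else:
    # Assumption: the files named in the list above live in this folder.
    for filename in ("sequence.py", "profile.py", "sequencealigner.py"):
        print(os.path.join(folder, filename))
```

Run it with the same Python you installed the library into; a different interpreter may report that the package is missing.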

[Aside: after the course I'll submit these changes to the developer so that a Python 3 variant is available for everyone in the future.]

```python
from alignment.sequence import Sequence
from alignment.vocabulary import Vocabulary
from alignment.sequencealigner import SimpleScoring, GlobalSequenceAligner

# Create sequences to be aligned.
a = Sequence('what a beautiful day'.split())
b = Sequence('what a disappointingly bad day'.split())

# Create a vocabulary and encode the sequences.
v = Vocabulary()
aEncoded = v.encodeSequence(a)
bEncoded = v.encodeSequence(b)

# Create a scoring and align the sequences using global aligner.
scoring = SimpleScoring(2, -1)
aligner = GlobalSequenceAligner(scoring, -2)
score, encodeds = aligner.align(aEncoded, bEncoded, backtrace=True)

# Iterate over optimal alignments and print them.
for encoded in encodeds:
    alignment = v.decodeSequenceAlignment(encoded)
    print(alignment)
    print('Alignment score:', alignment.score)
    print('Percent identity:', alignment.percentIdentity())
```

```
what a -               beautiful day
what a disappointingly bad       day
Alignment score: 3
Percent identity: 60.0
```
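To see where a score of 3 can come from, here is a minimal sketch of global alignment scoring using the standard Needleman-Wunsch recurrence. It is not the library's own implementation. It assumes that `SimpleScoring(2, -1)` means +2 for a matching word and -1 for a mismatch, and that the `-2` passed to `GlobalSequenceAligner` is the penalty for a gap. The function name `global_alignment_score` is made up for this example.

```python
def global_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best global alignment score of two word lists (Needleman-Wunsch)."""
    rows, cols = len(a) + 1, len(b) + 1
    # table[i][j] holds the best score for aligning a[:i] with b[:j].
    table = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        table[i][0] = i * gap  # a[:i] aligned against nothing but gaps
    for j in range(1, cols):
        table[0][j] = j * gap  # b[:j] aligned against nothing but gaps
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            table[i][j] = max(
                table[i - 1][j - 1] + pair,  # align the two words
                table[i - 1][j] + gap,       # gap in b
                table[i][j - 1] + gap,       # gap in a
            )
    return table[-1][-1]


a_words = 'what a beautiful day'.split()
b_words = 'what a disappointingly bad day'.split()
print(global_alignment_score(a_words, b_words))
```

Under those assumptions this prints 3: three matching words (+6), one gap (-2), and one mismatch (-1). Three of the five aligned columns hold identical words, which is consistent with the reported percent identity of 60.0.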


### Now what?

The next step is to plan how you will expand the use of this library in small steps to get it to do what you want.  Come up with that plan and then come and talk with me and I'll help you implement it.