# Object Oriented Aligner

Objects are just things that have *attributes* (properties, characteristics...) and *methods* (actions they can do). For example, a cat has:
- Attributes: color, breed, name, eye color, age...
- Methods: being angry, eat, sleep, meow

And an apple:
- Attributes: color, origin, price...
- Methods: grow, ripen...

In this notebook we will write the templates for the objects we need to build a proper multiple sequence aligner.

The class is the template for making all these objects. A created object is called an **instance**.
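
As a minimal sketch of the class/instance distinction, the cat from the list above could be written like this. The `Cat` class and its attribute and method names are made up for illustration and are not used by the aligner.

```python
class Cat:
    # The class is the template: it says which attributes and methods every cat has
    def __init__(self, name, color, age):
        self.name  = name
        self.color = color
        self.age   = age

    def meow(self):
        return self.name + ' says meow'

    def birthday(self):
        # Methods can change the attributes of the instance
        self.age += 1


# Two instances built from the same template
tom   = Cat('Tom', 'grey', 3)
felix = Cat('Felix', 'black', 5)

tom.birthday()
print(tom.meow())          # Tom says meow
print(tom.age, felix.age)  # 4 5
```

Each instance keeps its own attribute values, so Tom's birthday does not change Felix's age.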

## Sequence

The first thing we need to align sequences are sequences. Surprise! For now a sequence has a name, the sequence itself and some features as attributes, and a single method to append more residues to it. We can add more methods later on.

To keep people outside the class from using the attributes directly, and so make the code safer, we prefix internal attributes with **_** and expose them through *properties*.

In [80]:
class Sequence:
    def __init__(self, name, sequence, features):
        self._name = name
        self._seq  = sequence
        self._feat = features
    
    @property
    def name(self):
        return self._name
    
    @property
    def sequence(self):
        return self._seq
    
    @property
    def features(self):
        return self._feat
    
    def appendSequence(self, newSeq):
        self._seq += newSeq

In [58]:
seq1 = Sequence('seq1', 'ACGT', 'First sequence')
print(seq1._name)
print(seq1.name)
# Same output! But seq1.name is safer to use!

seq1
seq1


In [59]:
seq1.sequence + 'dmskl'

'ACGTdmskl'
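
Why is the property safer? A property without a setter is read-only, so an accidental assignment fails loudly instead of silently changing the object. The sketch below uses a reduced copy of the class, called `ReadOnlySequence` here only so it does not clash with the real one; that name is an assumption of this example.

```python
class ReadOnlySequence:
    def __init__(self, name, sequence):
        self._name = name
        self._seq  = sequence

    @property
    def name(self):
        return self._name

    @property
    def sequence(self):
        return self._seq


seq = ReadOnlySequence('seq1', 'ACGT')

# Concatenating builds a new string; the stored sequence is untouched
longer = seq.sequence + 'dmskl'
print(longer)        # ACGTdmskl
print(seq.sequence)  # ACGT

# Assigning through the property is rejected with an AttributeError
try:
    seq.name = 'other'
    renamed = True
except AttributeError:
    renamed = False
print(renamed)       # False
```

Note that `seq._name = 'other'` would still work: the underscore is a convention, not a lock.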

## Tree

#### Nodes

Then, we need a tree. Trees are built from nodes and branches. We could write a template for both; however, we will never use branches and I am lazy, so I will only code the template for nodes.
So, a node is a point inside a tree. Nodes have a parent node. They may have a right and a left child. If the node has no child, it will be 0! They may also have a name, but only if they are leaves. They have a method to print themselves and a method to print their neighbours. Indeed, we do not even need the tree if we program the nodes correctly! The less the better!!

In [105]:
class Node:
    def __init__(self, name=None, leftChild=0, rightChild=0, parent=0):
        self._name    = name
        self._left    = leftChild
        self._right   = rightChild
        self._parent  = parent
        self.distance = 0

    def printNode(self):
        print('Name: '   + str(self._name))
    
    def printNeighbours(self):
        print('Parent: ' + str(self._parent))
        print('Right: '  + str(self._right))
        print('Left: '   + str(self._left))
        
    @property
    def rightChild(self):
        return self._right
    
    @property
    def leftChild(self):
        return self._left
    
    @property
    def parent(self):
        return self._parent

In [106]:
node1 = Node()
node1.printNode()
node1.printNeighbours()

Name: None
Parent: 0
Right: 0
Left: 0
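
To illustrate the claim that linked nodes are enough, here is a small sketch that wires three nodes together and visits them children first (post-order). That is the order a progressive aligner would need, since both children must be ready before their parent. `MiniNode` and `postOrder` are hypothetical names for this example; it keeps the convention of using `0` for a missing child.

```python
class MiniNode:
    def __init__(self, name=None, leftChild=0, rightChild=0, parent=0):
        self.name   = name
        self.left   = leftChild
        self.right  = rightChild
        self.parent = parent


def postOrder(node, visited=None):
    # Visit both children before the node itself; 0 means "no child"
    if visited is None:
        visited = []
    if node == 0:
        return visited
    postOrder(node.left, visited)
    postOrder(node.right, visited)
    visited.append(node.name)
    return visited


# Two leaves joined by a root
leafA = MiniNode(name='seqA')
leafB = MiniNode(name='seqB')
root  = MiniNode(name='root', leftChild=leafA, rightChild=leafB)
leafA.parent = root
leafB.parent = root

print(postOrder(root))  # ['seqA', 'seqB', 'root']
```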


## The Multiple Sequence Alignment!

Yes! We are here! Crazy, right? Let's remember what is needed for a multiple sequence alignment:
- Attributes:
    - Sequences: All the sequences in a dictionary from `name` to `Sequence instance`.
    - A matrix: It is a two-dimensional dictionary, from `(A to B)` to the `score` of that substitution.
    - Nodes: All the nodes will be saved in a dictionary from `nodeId` to the `Node instance`.
    - Alignment: Yes, it is what you guessed.
- Methods:
    - A function to read the FASTA file.
    - A function to read an already made matrix.
    - A function to read a Newick tree.
    - A function to do the pairwise alignments.
    - A function to iterate through the tree and know when to align.
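
The pairwise step is not written yet in this notebook. As one possible sketch, a global alignment in the style of Needleman-Wunsch can read its scores straight from the two-dimensional dictionary described above. The function name `pairwiseAlign`, the toy matrix and the linear gap penalty of -2 are all assumptions of this example, not choices made by the notebook.

```python
def pairwiseAlign(seqA, seqB, matrix, gap=-2):
    rows, cols = len(seqA) + 1, len(seqB) + 1

    # score[i][j] = best score aligning seqA[:i] with seqB[:j]
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + matrix[seqA[i - 1]][seqB[j - 1]]
            up   = score[i - 1][j] + gap   # gap in seqB
            left = score[i][j - 1] + gap   # gap in seqA
            score[i][j] = max(diag, up, left)

    # Trace back from the bottom-right corner to rebuild the alignment
    alignedA, alignedB = [], []
    i, j = len(seqA), len(seqB)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and \
           score[i][j] == score[i - 1][j - 1] + matrix[seqA[i - 1]][seqB[j - 1]]:
            alignedA.append(seqA[i - 1])
            alignedB.append(seqB[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            alignedA.append(seqA[i - 1])
            alignedB.append('-')
            i -= 1
        else:
            alignedA.append('-')
            alignedB.append(seqB[j - 1])
            j -= 1

    return ''.join(reversed(alignedA)), ''.join(reversed(alignedB)), \
        score[-1][-1]


# Toy matrix: +1 for a match, -1 for a mismatch
letters = 'ACGT'
toyMatrix = {a: {b: (1 if a == b else -1) for b in letters} for a in letters}

print(pairwiseAlign('ACGT', 'AGT', toyMatrix))  # ('ACGT', 'A-GT', 1)
```

The lookup `matrix[a][b]` is the only place the substitution scores enter, so swapping the toy matrix for one read from a file should not require touching the rest of the function.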

In [104]:
import re


class MultipleSequenceAlignment:
    def __init__(self, seqFile, matFile, newkFile):
        # Input files
        self._seqPath  = seqFile
        self._matPath  = matFile
        self._newkPath = newkFile

        # Parse data
        self._sequences = self.readFastaFile(seqFile)
        self._matrix    = self.readMatrixFile(matFile)
        self._nodes     = self.readNewickFile(newkFile)

        # Data
        self._numNodes = len(self._nodes) - 1

    @staticmethod
    def readFastaFile(file):
        sequences = {}
        name = None
        with open(file) as f:
            for line in f:
                # Header line: '>name optional features'
                matchObject = re.match(r'^>(\S*)\s*(.*)', line)
                if matchObject:
                    name = matchObject.group(1)
                    feat = matchObject.group(2)
                    sequences[name] = \
                        Sequence(name=name, sequence='', features=feat)
                # Otherwise it is a sequence line of the current record
                elif name is not None:
                    sequences[name].appendSequence(line.strip())
        return sequences

    @staticmethod
    def readMatrixFile(filename):
        # Read the file, skipping blank lines
        with open(filename, 'r') as handle:
            content = [line.split() for line in handle if line.strip()]

        # Set up the matrix: the first column holds the letter of each row
        matrix  = {}
        letters = []
        for splt in content:
            a = splt[0]
            if a not in matrix:
                matrix[a] = {}
                letters.append(a)

        # Go through the file and save the values in both directions.
        # Assumption: the scores after the letter follow the row order,
        # so full and lower-triangular matrices both work.
        for splt in content:
            aa1 = splt[0]
            for aa2, score in zip(letters, splt[1:]):
                matrix[aa1][aa2] = float(score)
                matrix[aa2][aa1] = float(score)

        return matrix

    @staticmethod
    def readNewickFile(filename):
        # Not written yet: placeholder so the missing piece fails loudly
        raise NotImplementedError('The Newick reader is still to be written')
    
    

    

In [96]:
MSA1 = MultipleSequenceAlignment('s', 's', 's')

In [98]:
sequences = readFastaFile('cyc1_sequences.txt')

In [99]:
for i in sequences:
    print(i, sequences[i].sequence)

tr|D7NYA8|D7NYA8_CYNSP VMLSALGMLAAGGAGLAVALHSAVSASDLELHPPSYPWSHRGLLSSLDHTSIRRGFQVYKQVCSSCHSMDYVAYRHLVGVCYTEEEAKALAEEVEVQDGPNEDGEMFMRPGKLSDYFPKPYANPEAARAANNGALPPDLSYIVRARHGGEDYVFSLLTGYCEPPTGVSLREGLYFNPYFPGQAIGMAPPIYDEVLEFDDGTPASMSQVAK
tr|E2REM0|E2REM0_CANLF MAVAAASLRGAVLGPRGAGLPGAGARGLLCGARLGQLPLRTSQAVPLSSKAGPSRGRKVMLSALGMLAAGGAGLAVALHSAVSASDLELHPPSYPWSHRGLLSSLDHTSIRRGFQVYKQVCSSCHSMDYVAYRHLVGVCYTEDEAKALAEEVEVQDGPNEDGEMFMRPGKLSDYFPKPYPNPEAARAANNGALPPDLSYITRARHGGEDYVFSLLTGYCEPPTGVSLREGLYFNPYFPGQAIGMAPPIYDEVLEFDDGTPATMSQVAKDVCTFLRWASEPEHDHRKRMGLKMLMMMGLLFPLIYAMKRHKWSVLKSRKLAYRPPK
tr|F6VRY7|F6VRY7_HORSE SGLSRGRKVMLSALGMLAAGGAGLAVALHSAVSASDLELHPPSYPWSHRGLLSSLDHTRSSGRGFQVYKQVCSSCSMDYVAYRLVGVCYTEDEAKALAEEVEVQDGPNEDGEMFMRPEKLSDYFPKPYPNPEAARAANNGALPPDLSYIIRARHGGEDYVFSLLTGYCEPPTGVSLREGLYFNPYFPGQAIGMAPPIYNEVLEFDDGTPATMSQVAKDVCTFLRWASEPEHDHRKRMGLKMLMMMGLLLPLVYAMKRHKWSVLKSRKLAYRPPK
tr|G1LHJ1|G1LHJ1_AILME HGRSAARRNPALKDRPRPAPALPGSRGCGLGAGGFPRRARPGPLPLRPSRLGLRPEAVSLSSKAGLSRGRKVMLSALGMLAAGGAGLAVALHSAVS
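
The names printed above are whatever follows `>` up to the first whitespace in each header line; anything after that is kept as the features. A quick way to check that split in isolation is to run the same regular expression on a couple of header lines. The two headers below are invented for this example.

```python
import re

headerPattern = r'^>(\S*)\s*(.*)'

# A header with a description after the name
withFeatures = re.match(headerPattern, '>seq1 First sequence\n')
print(withFeatures.group(1))   # seq1
print(withFeatures.group(2))   # First sequence

# A header with only a name: the features come back as an empty string
nameOnly = re.match(headerPattern, '>tr|D7NYA8|D7NYA8_CYNSP\n')
print(nameOnly.group(1))       # tr|D7NYA8|D7NYA8_CYNSP
print(repr(nameOnly.group(2))) # ''

# A sequence line does not match, so it is treated as residues
print(re.match(headerPattern, 'ACGT\n'))  # None
```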

In [None]:
import re