## Object-oriented programming

The aim of this workbook is to facilitate assignment 5.


We will start by defining some terminology. <br><br>

<font color=blue>**Class**</font><br>
A `class` is a blueprint for defining objects. It is important to understand that defining a class does not create an `object` directly; it only tells us what an `instance` of that class will look like. We can have as many instances of a class as we like. An object is an instance of a class, and the two terms are used interchangeably. <br><br>

<font color=blue>**Method**</font><br>
We can think of a `method` in the same way as a function, except that it is defined inside a class. A `method` performs an operation on our object. <br><br>

<font color=blue>**Constructor**</font><br>
A `constructor` is a special kind of method for initializing the object. It runs when a new object is created, and its job is to set all of the object's variables. In Python the constructor method is named `__init__`.<br><br>

<font color=blue>**Instance variables**</font><br>
`Instance variables` are variables that are attached to a particular object. They can be `strings`, `integers`, `lists`, `file objects`, etc.<br><br>

<font color=blue>**Self**</font><br>
`self` is how we refer to the object from within one of its methods.
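
As a minimal sketch of how these terms fit together, the block below defines a small class and creates two instances of it. The `Primer` class, its `length` method and the example values are hypothetical and only serve to illustrate the terminology.

```python
class Primer:
    """Hypothetical class, used only to illustrate the terminology."""

    def __init__(self, name, sequence):
        # Constructor: runs when Primer(...) is called and sets the
        # instance variables on the newly created object.
        self.name = name
        self.sequence = sequence

    def length(self):
        # Method: `self` is the object the method was called on.
        return len(self.sequence)


# Two instances (objects) built from the same class
p1 = Primer("fwd", "ACGTAC")
p2 = Primer("rev", "TTGACCA")

print(p1.name, p1.length())
print(p2.name, p2.length())
```

Each object carries its own `name` and `sequence`, so calling `length()` on `p1` and on `p2` gives a different answer.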


Let's define two functions: <br>
1st) Get the AT content of a DNA sequence <br>
2nd) Reverse complement the DNA sequence  

In [1]:
def get_AT(dna): 
    length = len(dna)
    a_count = float(dna.count('A'))
    t_count = float(dna.count('T'))
    at_content = ((a_count + t_count) / length)
    return at_content

def complement(dna): 
    # Despite its name, this returns the reverse complement of the sequence
    pairs = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'} # dictionary mapping each base to its complement
    # dna[::-1] reads the string from right to left; the list comprehension complements each base
    return ''.join([pairs[base] for base in dna[::-1]])


In [2]:
dna_sequence = "ACTGATCGTTACGTACGAGTCAT"
print(get_AT(dna_sequence))
print(complement(dna_sequence))

0.5652173913043478
ATGACTCGTACGTAACGATCAGT


Often we would like to attach metadata to our data. For example, our DNA sequence might be a gene from a particular species.

In [9]:
dna_sequence = "ACTGATCGTTACGTACGAGTCAT"
species = "Drosophila melanogaster (fruitfly)"
gene_name = "ABC1"
print("Looking at the {} {} gene".format(species, gene_name))
print("AT content is {}".format(get_AT(dna_sequence)))
print("complement is {} ".format(complement(dna_sequence)))

Looking at the Drosophila melanogaster (fruitfly) ABC1 gene
AT content is 0.5652173913043478
complement is ATGACTCGTACGTAACGATCAGT 


This is all well and good, but if we have several thousand genes and multiple species this approach is no longer feasible. We could use a dictionary to store our data, but dictionary keys need to be unique, so, for example, a dictionary keyed on gene name could not hold the same gene from several species.
<br><br>
It would be nice if we could put all of our information in one block-like unit. We could define a complex data structure, such as a list of dictionaries, to do this, but a better way is to define a class and create distinct objects. <br> <br>
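
To make the dictionary problem concrete, here is a small sketch. It assumes we key the dictionary on gene name alone, which is one possible layout rather than the only one; the names `by_gene` and `records` are made up for this illustration.

```python
# Keying on gene name alone: the second assignment silently
# overwrites the first because the key "ABC1" already exists.
by_gene = {}
by_gene["ABC1"] = "ACTGATCGTTACGTACGAGT"       # Drosophila melanogaster
by_gene["ABC1"] = "GCTACTGACATCGTTACCGTAGT"    # Ba humbugi
print(len(by_gene))  # only one entry survives

# A list of dictionaries keeps both records, but every function that
# uses them has to know the key names and look them up by hand.
records = [
    {"sequence": "ACTGATCGTTACGTACGAGT",
     "gene_name": "ABC1",
     "species_name": "Drosophila melanogaster"},
    {"sequence": "GCTACTGACATCGTTACCGTAGT",
     "gene_name": "ABC1",
     "species_name": "Ba humbugi"},
]
for record in records:
    print(record["species_name"], record["gene_name"], len(record["sequence"]))
```

The list of dictionaries works, but the data and the functions that operate on it still live in separate places, which is the gap a class closes.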
In the class below, the `constructor method` defines **3** instance variables. 

In [4]:
class DNARecord(): 
    def __init__(self, sequence, gene_name, species_name):
        self.sequence = sequence
        self.gene_name = gene_name
        self.species_name = species_name
    
    def complement(self):
        comple = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
        return ''.join([comple[base] for base in self.sequence[::-1]])
    
    def get_AT(self):
        length = len(self.sequence)
        a_count = float(self.sequence.count('A'))
        t_count = float(self.sequence.count('T'))
        at_content = ((a_count + t_count) / length)
        return at_content
    

d1 = DNARecord('ATATATTATTATATTATA', 'COX1', 'Homo sapiens')
print(d1.complement()) 
    


TATAATATAATAATATAT


In [8]:
# imperative code (see lecture 19)
dna_sequence = "ACTGATCGTTACGTACGAGT"
species = "Drosophila melanogaster"
gene_name = "ABC1"
print("Looking at the {} {} gene".format(species,gene_name))
print("AT content is {}".format(get_AT(dna_sequence)))
print("complement is {}".format(complement(dna_sequence)))

# object oriented code
d1 = DNARecord("ACTGATCGTTACGTACGAGT", "ABC1", "Drosophila melanogaster")
print("Looking at the {} {} gene".format(d1.species_name, d1.gene_name))
print("AT content is {}".format(d1.get_AT()))
print("complement is {}".format(d1.complement()))

Looking at the Drosophila melanogaster ABC1 gene
AT content is 0.55
complement is ACTCGTACGTAACGATCAGT
Looking at the Drosophila melanogaster ABC1 gene
AT content is 0.55
complement is ACTCGTACGTAACGATCAGT


The difference is that in the imperative code we stored 3 pieces of data separately and passed them to functions to get the answers we wanted. In the object-oriented code we packaged our 3 pieces of data into an object and asked the object directly for what we were looking for. 
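
The benefit becomes clearer with more than one record. The sketch below repeats a trimmed copy of `DNARecord` so that it runs on its own, then loops over a list of objects. The list name `records` is an assumption made for this example, and the code assumes Python 3 division.

```python
class DNARecord:
    """Trimmed copy of the class above so this block is self-contained."""

    def __init__(self, sequence, gene_name, species_name):
        self.sequence = sequence
        self.gene_name = gene_name
        self.species_name = species_name

    def get_AT(self):
        # Fraction of bases in the sequence that are A or T
        a_count = self.sequence.count('A')
        t_count = self.sequence.count('T')
        return (a_count + t_count) / len(self.sequence)


# Each object carries its own sequence and metadata
records = [
    DNARecord("ACTGATCGTTACGTACGAGT", "ABC1", "Drosophila melanogaster"),
    DNARecord("GCTACTGACATCGTTACCGTAGT", "ABC1", "Ba humbugi"),
    DNARecord("GCTTACACAGCTACTACGGGCAATAT", "ABC1", "Veni, vidi, vici"),
]

# The same loop works for 3 records or several thousand
for record in records:
    print("{} {}: AT content {:.2f}".format(
        record.species_name, record.gene_name, record.get_AT()))
```

Because every object knows its own data, the loop body never has to keep separate sequence, species and gene variables in step with each other.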

We can pass other values to each method defined in our class. These values can be `strings`, `lists` or even other `objects`. <br>
Let's define a new method in our class that takes another object and compares the length of the DNA sequence between the 2 objects. 

In [6]:
class DNARecord():
    def __init__(self, sequence, gene_name, species_name):
        self.sequence = sequence
        self.gene_name = gene_name
        self.species_name = species_name
    
    def complement(self):
        comple = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A'}
        return ''.join([comple[base] for base in self.sequence[::-1]])
    
    def get_AT(self):
        length = len(self.sequence)
        a_count = float(self.sequence.count('A'))
        t_count = float(self.sequence.count('T'))
        at_content = ((a_count + t_count) / length)
        return at_content
    
    def comp_AT(self,other):
        # Note: despite its name, this method compares sequence lengths, not AT content
        if len(self.sequence) == len(other.sequence):
            return "{} for species {} is equal in length to {} in {}".format(self.gene_name, self.species_name, other.gene_name, other.species_name)
        elif len(self.sequence) < len(other.sequence):
            return "{} for species {} is longer than {} in {}".format(other.gene_name, other.species_name, self.gene_name, self.species_name)
        else:
            return "{} for species {} is longer than {} in {}".format(self.gene_name, self.species_name, other.gene_name, other.species_name)

In [7]:
d1 = DNARecord("ACTGATCGTTACGTACGAGT", "ABC1", "Drosophila melanogaster")
d2 = DNARecord("GCTACTGACATCGTTACCGTAGT", "ABC1", "Ba humbugi")
d3 = DNARecord("GCTTACACAGCTACTACGGGCAATAT", "ABC1", "Veni, vidi, vici")
print(d1.comp_AT(d1))
print(d1.comp_AT(d2))
print(d1.comp_AT(d3))
print(d2.comp_AT(d1))
print(d2.comp_AT(d1))
print(d3.comp_AT(d2))


ABC1 for species Drosophila melanogaster is equal in length to ABC1 in Drosophila melanogaster
ABC1 for species Ba humbugi is longer than ABC1 in Drosophila melanogaster
ABC1 for species Veni, vidi, vici is longer than ABC1 in Drosophila melanogaster
ABC1 for species Ba humbugi is longer than ABC1 in Drosophila melanogaster
ABC1 for species Ba humbugi is longer than ABC1 in Drosophila melanogaster
ABC1 for species Veni, vidi, vici is longer than ABC1 in Ba humbugi
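
Since `comp_AT` compares sequence lengths rather than AT content, here is a sketch of what a method that compares AT content between two objects could look like. The method name `at_difference` is hypothetical and not part of the workbook's class; the class is repeated in trimmed form so the block runs on its own, and Python 3 division is assumed.

```python
class DNARecord:
    """Trimmed copy of the class above so this block is self-contained."""

    def __init__(self, sequence, gene_name, species_name):
        self.sequence = sequence
        self.gene_name = gene_name
        self.species_name = species_name

    def get_AT(self):
        # Fraction of bases in the sequence that are A or T
        a_count = self.sequence.count('A')
        t_count = self.sequence.count('T')
        return (a_count + t_count) / len(self.sequence)

    def at_difference(self, other):
        # Hypothetical method: `other` is a second DNARecord, and we can
        # call its methods in exactly the same way as our own.
        return self.get_AT() - other.get_AT()


d1 = DNARecord("ACTGATCGTTACGTACGAGT", "ABC1", "Drosophila melanogaster")
d2 = DNARecord("GCTACTGACATCGTTACCGTAGT", "ABC1", "Ba humbugi")

# Positive when d1 has the higher AT content, negative when d2 does
print("AT content difference: {:.3f}".format(d1.at_difference(d2)))
```

Returning a number instead of a formatted string leaves it to the caller to decide how to report the result.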
