# Intermediate Python

---

## Object Oriented Programming

[Source](https://www2.unil.ch/phylo/teaching/python/lecture3.pdf)

Object-oriented programming: a programming paradigm based on the concept of “objects”, which are data structures containing data (or attributes) and methods (or procedures).

### Objects in Python

We have already used objects.

    n = 12                          # n is an object of type integer
    s = 'ACAGATC'                   # s is an object of type string
    l = [12, 'A', 21121, 'ACCAT']   # l is an object of type list

These objects

- contain data (the number 12, the string 'ACAGATC', ...)
- can be modified/manipulated

        s.count('A')
        s.lower()
        l.append('121')
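
A quick way to confirm that these built-in values are objects is to ask for their type and call their methods. The sketch below reuses the variables `n`, `s`, and `l` from above; the names `a_count` and `lower_s` are introduced here only for illustration. Note that string methods return new values, whereas `list.append` changes the list in place.

```python
n = 12
s = 'ACAGATC'
l = [12, 'A', 21121, 'ACCAT']

# Every value has a type (its class)
print(type(n), type(s), type(l))

# Strings are immutable: methods return a new value
a_count = s.count('A')   # number of 'A' characters in s
lower_s = s.lower()      # a new, lower-case string; s itself is unchanged

# Lists are mutable: append() modifies the list in place
l.append('121')
print(a_count, lower_s, l)
```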
   
    
### Extending data types

Python standard data types

For most simple programs, we can usually survive well with standard Python data types. This includes
- numbers, strings
- tuples, lists, sets, dictionaries

__Defining your own data types__

It might be useful though to create your own data types: objects built to your own specifications and organised in the way convenient to you.
This is done through object definitions known as classes.
                        
#### Class vs object

Implementation vs instantiation

The class is the definition of a particular kind of object in terms of its component features and how it is constructed or implemented in the code.
The object is a specific instance of the thing which has been made according to the class definition. Everything that exists in Python is an object.
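
The distinction can be sketched with an empty class and two objects built from it. The `Sequence` class here is a deliberately minimal placeholder, not the full class developed later in these notes.

```python
class Sequence:
    """A minimal placeholder class used only for illustration."""
    pass

# Two separate objects (instances) built from the same class definition
seq_a = Sequence()
seq_b = Sequence()

print(isinstance(seq_a, Sequence))    # is seq_a a Sequence object?
print(seq_a is seq_b)                 # are the two names the same object?
print(isinstance(Sequence, object))   # is the class itself an object?
```

The class is written once; each call to `Sequence()` creates a new, independent object.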


#### Classes vs functions

Functions _do_ specific things, classes _are_ specific things that can also _do_ specific things.

Classes can have methods, which are functions that are associated with a particular class, and do things associated with the thing that the class is - but if all you want is to do something, a function is all you need.
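
As a rough illustration of the difference, the sketch below computes a GC fraction once as a plain function and once as a method of a hypothetical `Read` class. Both names, `gc_fraction` and `Read`, are assumptions made for this example.

```python
# A plain function: it only *does* something
def gc_fraction(seq):
    """Return the fraction of G and C characters in seq."""
    seq = seq.upper()
    return (seq.count('G') + seq.count('C')) / len(seq)


# A class: it *is* something (a named read) and can also *do* things
class Read:
    def __init__(self, name, seq):
        self.name = name
        self.seq = seq

    def gc_fraction(self):
        # Inside a method, the bare name refers to the module-level function
        return gc_fraction(self.seq)


read = Read('read1', 'ACGT')
print(gc_fraction('ACGT'), read.gc_fraction())
```

If the name and the sequence never need to travel together, the function alone is enough.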


#### OOP in Python

A common principle of OOP is that a class definition
- makes available certain functionality
- hides internal information about how a specific class is implemented

This is called encapsulation and information hiding.

Python is however quite permissive and you can access any element of an object if you know how to do that.
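
A small sketch of what "permissive" means in practice: Python marks internal attributes by naming convention only. The attribute names `_seq` and `__checksum` below are hypothetical and chosen just to show the two conventions.

```python
class Sequence:
    """Illustrates Python's naming conventions for internal attributes."""

    def __init__(self, seq):
        self._seq = seq                # single underscore: internal by convention only
        self.__checksum = len(seq)     # double underscore: stored as _Sequence__checksum


s = Sequence('atcg')

# Nothing stops outside code from reading the "internal" attribute
internal_seq = s._seq

# The mangled name is still reachable if you know how it is spelled
mangled_checksum = s._Sequence__checksum

# The unmangled name does not exist on the object
has_plain_name = hasattr(s, '__checksum')
print(internal_seq, mangled_checksum, has_plain_name)
```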
                        

### An example

__A sequence object__

We need to store data:

- species name
- sequence in DNA and amino-acid
- protein name
- length of the sequences
- percentage of GC
- ...

We need to be able to manipulate the data using methods:
- add/remove nucleotide or amino-acid (and update the other data dependent on it)
- translate DNA to amino-acid and inversely
- print the sequence in various ways
- calculate some characteristics
- ...
                      
          

### Class definition

    class Sequence:
        # some statements
        
        
- It is common practice to save each class in its own file and then import it with, for example, `from Sequence import Sequence`.

### Inheritance

    class MultipleSeqAl(Sequence):
        # some statements

- inheriting methods from a superclass
- classes can have more than one superclass
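
To illustrate the last point, here is a minimal sketch of a class with two superclasses. `Annotated` is a hypothetical mix-in class invented for this example, and the method bodies are placeholders.

```python
class Sequence:
    def describe(self):
        return 'a sequence'


class Annotated:
    """Hypothetical mix-in class, not part of the original notes."""

    def annotate(self, note):
        return f'note: {note}'


# MultipleSeqAl inherits from both superclasses
class MultipleSeqAl(Sequence, Annotated):
    pass


msa = MultipleSeqAl()
print(msa.describe())          # inherited from Sequence
print(msa.annotate('draft'))   # inherited from Annotated

# The method resolution order (MRO) lists where Python looks for attributes
mro_names = [cls.__name__ for cls in MultipleSeqAl.__mro__]
print(mro_names)
```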
                        

### Class functions

Providing object capabilities

- functions are defined within the construction of a class
- defined in the same way as ordinary functions (indented within the class code block)
- accessed from the variable representing the object via 'dot' syntax

        name = mySequence.getName()

- getName() knows which Sequence object to use when fetching the name

The first argument is special: it is the object the function is called from (`self`).

In [None]:
class Sequence:
    def getName(self):
        return self.name

    def getCapitalisedName(self):
        name = self.getName()
        if name:
            return name.capitalize()
        else:
            return name

Remarks on functions

__Order of functions__

- order of functions does not matter
- if a function definition appears more than once, the last one replaces the previous ones
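
Both remarks can be checked with a short sketch. The method names `getLabel` and `getSuffix` are made up for this example.

```python
class Sequence:
    def getName(self):
        return 'first definition'

    # Same name defined again: this one silently replaces the one above
    def getName(self):
        return 'second definition'

    # Order does not matter: getLabel() calls a method defined below it
    def getLabel(self):
        return self.getSuffix().upper()

    def getSuffix(self):
        return 'seq'


s = Sequence()
print(s.getName())
print(s.getLabel())
```

Python raises no warning for the duplicate definition, so an accidental redefinition can be hard to spot.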

        class MultipleSeqAl(Sequence):
            def getMSA(self):
                # function implementation
                pass

            def getSequenceIdentity(self):
                # function implementation
                pass

Using subclasses
- call specific functions as usual: msa.getMSA()
- can also call msa.getName() from Sequence directly because of inheritance
- however, .getMSA() cannot be accessed from an object Sequence
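
The three points above can be sketched as follows. The constructor and the placeholder return value of `getMSA()` are assumptions added so that the example runs on its own.

```python
class Sequence:
    def __init__(self, name):
        self.name = name

    def getName(self):
        return self.name


class MultipleSeqAl(Sequence):
    def getMSA(self):
        # Placeholder body; a real implementation would return the alignment
        return []


msa = MultipleSeqAl('globins')
seq = Sequence('hbb')

print(msa.getMSA())    # defined on the subclass
print(msa.getName())   # inherited from Sequence

# The superclass knows nothing about methods added by its subclasses
try:
    seq.getMSA()
except AttributeError as err:
    print('AttributeError:', err)
```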

__Object attributes__
- Variables tied to the object
- attributes hold information useful for the object and its functions
- e.g. associate a variable storing sequence name in Sequence objects

__Instance attributes__
- specific to a particular object
- defined inside class functions
- accessed through `self` (a naming convention for the first argument, not a keyword)

__Class attributes__
- available to all instances of a class
- defined outside all function blocks
- usually used for variables that do not change
- accessed directly using the variable name inside the class body, and through the class or an instance elsewhere
- bare function names are also class attributes

__Examples of class attributes__
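
A minimal sketch, assuming a hypothetical class attribute named `alphabet`, of how an attribute shared by the class differs from one stored on each object:

```python
class Sequence:
    # Class attribute: one value shared by every instance
    alphabet = 'acgt'

    def __init__(self, seq):
        # Instance attribute: each object gets its own value
        self.seq = seq


s1 = Sequence('atcg')
s2 = Sequence('ttaggg')

# Both instances see the same class attribute
shared = (s1.alphabet, s2.alphabet, Sequence.alphabet)

# Assigning through an instance creates a new instance attribute
# that shadows the class attribute for that object only
s1.alphabet = 'acgu'
after = (s1.alphabet, s2.alphabet, Sequence.alphabet)
print(shared, after)
```

This shadowing behaviour is one reason class attributes are usually reserved for values that do not change.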

#### Object life cycle

__Birth, life and death__
- creation of an object is handled in a special function called the constructor
- removal is handled by a function called the destructor
- Python has automatic garbage collection, usually no need to define a destructor

__Class constructor__
- called whenever the corresponding object is created
- use a special name: `__init__`
- first argument is the object itself (i.e. self)
- any other arguments you need to create the object
- good idea to introduce a key that uniquely identifies objects of a given class
                        

When to create attributes
- attributes can be created in any class function (or directly on the object)
- convention to create most of them in the constructor either directly or through the call to a function
- set it to None if it cannot be set at object creation 
- constructors are inherited by subclasses
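
These conventions can be sketched as follows. The `compute_gc` helper and the `ProteinCodingSequence` subclass are hypothetical names used only to show an attribute that starts as `None` and a constructor that is inherited unchanged.

```python
class Sequence:
    def __init__(self, seq, name=None):
        self.seq = seq
        self.name = name   # may be unknown when the object is created
        self.gc = None     # cannot be set yet; filled in by compute_gc()

    def compute_gc(self):
        """Store and return the GC fraction (hypothetical helper)."""
        if self.seq:
            self.gc = (self.seq.count('g') + self.seq.count('c')) / len(self.seq)
        return self.gc


class ProteinCodingSequence(Sequence):
    # No __init__ here, so Sequence.__init__ is inherited and used as-is
    pass


cds = ProteinCodingSequence('atgc', name='demo')
gc_before = cds.gc
gc_after = cds.compute_gc()
print(gc_before, gc_after)
```

Declaring `gc` in the constructor, even as `None`, means every method can rely on the attribute existing.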
                      

In [None]:
class Sequence:

    # A class attribute. It is shared by all instances of this class
    sequence_type = "sequence"

    # Basic initializer, this is called when this class is instantiated.
    # Note that the double leading and trailing underscores denote objects
    # or attributes that are used by python but that live in user-controlled
    # namespaces. Methods (or objects or attributes) like: __init__, __str__,
    # __repr__ etc. are called magic methods (or sometimes called dunder methods)
    # You should not invent such names on your own.

    def __init__(self, string):

        # Assign the argument (string) to the instance's seq attribute
        self.seq = string

        # Store the length in a private attribute; it is exposed through
        # the read-only `length` property below
        self._length = len(self.seq)

        # Initialize the source through the property setter below
        self.source = ""

    # A class method receives the class (cls), not an instance, as its first
    # argument, so it can only see class attributes such as sequence_type
    @classmethod
    def get_sequence_type(cls):
        return cls.sequence_type

    # A property works like a getter.
    # It exposes the private _length as a read-only attribute named length.
    @property
    def length(self):
        return self._length

    # This property exposes the private _source as an attribute named source.
    @property
    def source(self):
        return self._source

    # This allows the property to be set
    @source.setter
    def source(self, source):
        self._source = source

In [None]:
# Instantiate a class
seq1 = Sequence("atcg")
seq1.source = "NIH"

seq2 = Sequence("ttaggg")
seq2.source = "UTSW"

# seq1 and seq2 are instances of type Sequence, or in other words: they are Sequence objects

# Read the source property and the seq attribute of each instance
print(seq1.source)
print(seq1.seq)
print(seq2.source)
print(seq2.seq)
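
To show what a read-only property looks like in isolation, here is a stripped-down sketch. It uses a simplified `Sequence` with a single private attribute `_seq`, which is an assumption made for this example rather than the class defined above.

```python
class Sequence:
    def __init__(self, string):
        self._seq = string

    @property
    def length(self):
        # Computed on access, so it can never get out of sync with _seq
        return len(self._seq)


seq = Sequence('atcg')
print(seq.length)   # accessed like an attribute, without parentheses

# Without a setter, assignment is rejected
try:
    seq.length = 10
    assignment_failed = False
except AttributeError:
    assignment_failed = True
print(assignment_failed)
```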

## Inheritance in Practice

In [None]:
# Another class definition
class DNA_Sequence(Sequence):

    sequence_type = "DNA"

    def __init__(self, seq, gc_percentage=0.0):
        self.gc = gc_percentage
        self.adapter = False
        super().__init__(seq)

    # And its own method as well
    def get_gc(self):
        return self.gc

In [None]:
dna = DNA_Sequence("atgc", 0.5)
dna.source = "NIH"

print(dna.get_gc())
print(dna.source)
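
The effect of overriding a class attribute and of calling `super().__init__()` can be seen in a reduced sketch. The `describe` method is a hypothetical addition, and the classes are trimmed versions of the ones above.

```python
class Sequence:
    sequence_type = 'sequence'

    def __init__(self, seq):
        self.seq = seq

    def describe(self):
        # The lookup starts at the instance's own class, then its parents
        return f'{self.sequence_type} of length {len(self.seq)}'


class DNA_Sequence(Sequence):
    sequence_type = 'DNA'   # overrides the class attribute of Sequence

    def __init__(self, seq, gc_percentage=0.0):
        super().__init__(seq)   # let the parent class set self.seq
        self.gc = gc_percentage


generic = Sequence('atgc')
dna = DNA_Sequence('atgc', 0.5)
print(generic.describe())
print(dna.describe())
```

`describe()` is written once in the parent class but reports the subclass value, because attribute lookup begins with the class of the object it is called on.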

To take advantage of modularization by file, you could place the classes above in their own files, say `sequence.py` and `dna_sequence.py`.

To import functions or classes from other files, use the following format:

    from <filename-without-extension> import <function-or-class>

    # dna_sequence.py
    from sequence import Sequence

#### Advantages and Disadvantages of Object-Oriented Programming (OOP)
[Source](https://www.saylor.org/site/wp-content/uploads/2013/02/CS101-2.1.2-AdvantagesDisadvantagesOfOOP-FINAL.pdf)

Some of the advantages of object-oriented programming include:
1. Improved software-development productivity: modularity, extensibility, and reusability
2. Improved software maintainability
3. Faster development: Reuse enables faster development
4. Lower cost of development: Reuse of software also lowers the cost of development
5. Higher-quality software: More time for verification 


Disadvantages of object-oriented programming include:
1. Steep learning curve
2. Larger program size: object-oriented programs typically involve more lines of code than procedural programs
3. Slower programs
4. Not suitable for all types of problems: There are problems that lend themselves well to functional-programming style, logic-programming style, or procedure-based programming style, and applying object-oriented programming in those situations will not result in efficient programs.  

---

# Advanced DNA class

[Source](http://www.mularoni.com/python_course/advanced.html)

In [None]:
%%writefile DNA_class.py
class DNA:

    """Class representing DNA as a string sequence."""

    # "n" (unknown base) is kept by cleandna(), so it needs a complement too
    basecomplement = {"a": "t", "c": "g", "t": "a", "g": "c", "n": "n"}

    standard = {
        "ttt": "F",
        "tct": "S",
        "tat": "Y",
        "tgt": "C",
        "ttc": "F",
        "tcc": "S",
        "tac": "Y",
        "tgc": "C",
        "tta": "L",
        "tca": "S",
        "taa": "*",
        "tga": "*",
        "ttg": "L",
        "tcg": "S",
        "tag": "*",
        "tgg": "W",
        "ctt": "L",
        "cct": "P",
        "cat": "H",
        "cgt": "R",
        "ctc": "L",
        "ccc": "P",
        "cac": "H",
        "cgc": "R",
        "cta": "L",
        "cca": "P",
        "caa": "Q",
        "cga": "R",
        "ctg": "L",
        "ccg": "P",
        "cag": "Q",
        "cgg": "R",
        "att": "I",
        "act": "T",
        "aat": "N",
        "agt": "S",
        "atc": "I",
        "acc": "T",
        "aac": "N",
        "agc": "S",
        "ata": "I",
        "aca": "T",
        "aaa": "K",
        "aga": "R",
        "atg": "M",
        "acg": "T",
        "aag": "K",
        "agg": "R",
        "gtt": "V",
        "gct": "A",
        "gat": "D",
        "ggt": "G",
        "gtc": "V",
        "gcc": "A",
        "gac": "D",
        "ggc": "G",
        "gta": "V",
        "gca": "A",
        "gaa": "E",
        "gga": "G",
        "gtg": "V",
        "gcg": "A",
        "gag": "E",
        "ggg": "G",
    }

    def __init__(self, s="", name=""):
        """Create DNA instance initialized to string s."""
        self.seq = s.lower()
        self.seq = self.cleandna(self.seq)
        self.len = len(self.seq)
        self.name = name

    def cleandna(self, s):
        """Return dna only composed by letters ['a', 'c', 'g', 't', 'n']."""
        new_sequence = ""
        nucleotides = ["a", "c", "g", "t", "n"]
        for c in s:
            if c not in nucleotides:
                continue
            new_sequence += c
        return new_sequence

    def getname(self):
        """Return the name of the sequence."""
        if self.name:
            return self.name
        else:
            return "unknown"

    def getsequence(self):
        """Return the dna sequence."""
        return self.seq

    def setname(self, name):
        """Set the name of the sequence."""
        self.name = name

    def setsequence(self, s):
        """Set the sequence content."""
        self.seq = self.cleandna(s.lower())
        self.len = len(self.seq)

    def transcribe(self):
        """Return as rna string."""
        return self.seq.replace("t", "u")

    def reverse(self):
        """Return dna string in reverse order."""
        letters = list(self.seq)
        letters.reverse()
        return "".join(letters)

    def complement(self):
        """Return the complementary dna string."""
        comp = ""
        letters = list(self.seq)
        for base in letters:
            comp += self.basecomplement[base]
        return comp

    def reversecomplement(self):
        """Return the reverse complement of the dna string."""
        revcomp = ""
        letters = list(self.seq)
        letters.reverse()
        for base in letters:
            revcomp += self.basecomplement[base]
        return revcomp

    def gc_percentage(self):
        """Return the percentage of dna composed of G+C."""
        s = self.seq
        if not s:
            return 0.0  # avoid dividing by zero on an empty sequence
        gc = s.count("g") + s.count("c")
        return gc * 100.0 / len(s)

    def translate(self, frame=1):
        """Translate a DNA (e.g. cDNA) sequence to a protein.

        Possible frames: 1, 2, 3, -1, -2, -3.
        """
        possibleframe = (1, 2, 3, -1, -2, -3)
        if frame not in possibleframe:
            frame = 1  # First frame
        if frame < 0:
            cdna = self.reversecomplement()
            frame = abs(frame) - 1
        else:
            cdna = self.seq
            frame = frame - 1
        code = self.standard
        prot = ""
        i = frame  # Starting frame
        while i <= len(cdna) - 3:  # While there are at least 3 letters
            prot += code.get(cdna[i : i + 3], "?")
            i += 3
        return prot

In [None]:
# This is how we load the class
from DNA_class import DNA

dir(DNA)
[
    "__doc__",
    "__init__",
    "__module__",
    "basecomplement",
    "cleandna",
    "complement",
    "gc_percentage",
    "getname",
    "getsequence",
    "reverse",
    "reversecomplement",
    "setname",
    "setsequence",
    "standard",
    "transcribe",
    "translate",
]

# Now let's use it!
dna1 = DNA("CGACAAGGATTAGTAGTTTAC", "mydna1")
dna2 = DNA("gcctgaaattgcgcgc")
dna1.getname()
"mydna1"

dna2.getname()
"unknown"

dna1.getsequence()
"cgacaaggattagtagtttac"

dna2.getsequence()
"gcctgaaattgcgcgc"

dna3 = DNA()
dna3.getsequence()
""

dna3.getname()
"unknown"

dna3.setsequence("gcCVnn % tgacKLtcg")
dna3.setname("dna3")
dna3.getname()
"dna3"

dna3.getsequence()
"gccnntgactcg"


dna = DNA("CGACAAGGATTAGTAGTTTAC", "mydna")
dna.getsequence()
"cgacaaggattagtagtttac"

dna.transcribe()
"cgacaaggauuaguaguuuac"

dna.reverse()
"catttgatgattaggaacagc"

dna.complement()
"gctgttcctaatcatcaaatg"

dna.reversecomplement()
"gtaaactactaatccttgtcg"

dna.gc_percentage()
38.095238095238095

dna.translate()  # default frame = 1
"RQGLVVY"
dna.translate(1)  # frame 1
"RQGLVVY"
dna.translate(2)  # frame 2
"DKD**F"
dna.translate(3)  # frame 3
"TRISSL"
dna.translate(-1)  # frame complement 1
"VNY*SLS"
dna.translate(-2)  # frame complement 2
"*TTNPC"
dna.translate(-3)  # frame complement 3
"KLLILV"
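
As an aside, some of the loops in the class can be written more compactly with built-in string tools. The sketch below is one possible alternative, not the course author's implementation: `COMPLEMENT`, `reverse_complement`, and the standalone `gc_percentage` are names chosen for this example, and the functions assume an already cleaned, lower-case sequence.

```python
# Translation table mapping each base to its complement ('n' stays 'n')
COMPLEMENT = str.maketrans('acgtn', 'tgcan')


def reverse_complement(seq):
    """Return the reverse complement of a lower-case DNA string."""
    # translate() swaps the bases; the [::-1] slice reverses the result
    return seq.translate(COMPLEMENT)[::-1]


def gc_percentage(seq):
    """Return the percentage of G+C, or 0.0 for an empty sequence."""
    if not seq:
        return 0.0
    return 100.0 * (seq.count('g') + seq.count('c')) / len(seq)


example = 'cgacaaggattagtagtttac'
print(reverse_complement(example))
print(gc_percentage(example))
```

`str.maketrans` and `str.translate` avoid building the result one character at a time, which matters for long sequences.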