## Object-oriented programming
### Creating your own object types

### BIOINF 575

---
##### Adapted from material created by Marcus Sherman
---


You can do perfectly good data science _without_ ever writing a `class`. 

However, using `Object-Oriented Programming` can make your data science <u>easier to write</u>, <u>easier to read</u>, and <u>more intuitive</u> while also making it **more shareable/extensible**.

---
#### Object-Oriented Programming

Whenever you code in Python, there are two questions you should keep asking yourself during your workflow: "What do I have?" and "What do I need?". While working on the subcomponents of a function, also ask yourself: "What ***kind*** of object am I working with, and what does it do?"

In Python, ***EVERYTHING*** is an object!

In [2]:
# Different types

type(list)

type

In [4]:
{1:"d"}

{1: 'd'}

In [6]:
type({1:"d"})

dict

In [8]:
dir(list)

['__add__',
 '__class__',
 '__class_getitem__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__imul__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__reversed__',
 '__rmul__',
 '__setattr__',
 '__setitem__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'append',
 'clear',
 'copy',
 'count',
 'extend',
 'index',
 'insert',
 'pop',
 'remove',
 'reverse',
 'sort']
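
To make the claim a bit more concrete, here is a small sketch showing that numbers, functions, and even classes all have a type and carry attributes and methods. The helper name `gc_count` is made up for this illustration.

```python
# Numbers, functions, and classes are all objects

def gc_count(seq):
    """Count the G and C bases in a sequence (hypothetical helper)."""
    return seq.count("G") + seq.count("C")

print(type(3))             # <class 'int'>
print(type(gc_count))      # <class 'function'>
print(type(int))           # <class 'type'>

# Objects carry attributes and methods
print((10).bit_length())   # 4
print(gc_count.__name__)   # gc_count
print(gc_count.__doc__)

# Every value is an instance of `object`
print(all(isinstance(x, object) for x in [3, "AACGT", gc_count, list]))   # True
```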

### So what _are_ objects?

<img width = 450 src='https://ih1.redbubble.net/image.9426655.9925/fc,550x550,silver.jpg'/>

---
### A New Frontier

Up to this point, we have used objects already defined for us. However, we are not limited to those: we can *make* our own objects. This is done with the `class` keyword.

<img src='https://ds055uzetaobb.cloudfront.net/image_optimizer/9996aa83f77a2837f41a4de7f2ab517168716532.png' width = 500/>

Defining a `class` looks much like defining a function with `def`. However, later on we get to play around with some of those 'dunder' (\_\_) methods we have been steering you away from.

### First, the syntax

<img src='class_def.png' width=700 align='left'/>

### The Big Idea 
> The idea behind objects is to **bundle** coherent <u>methods</u> (things the object can _do_) and <u>attributes</u> (things the object _has_) that logically go together into a well-defined _interface_.

Objects are a data abstraction with 2 main jobs:
1. Capture the internal *representation* of the data being abstracted
2. Create an *interface* for the abstracted data

#### Let's think about a gene and basic information we want to store about it.
We could make a dictionary:

```python

{"symbol": "Gene1", "seq": "AACGT"}

```


While this structure works perfectly fine, it becomes hard to keep track of everything once we want to add new elements or a function specific to the gene.
It makes sense to wrap all of these things up into a single object (_mainly because a bare dictionary is hard to manage and to extend with new functionality_).
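
As a rough illustration of the problem, the sketch below keeps a gene in a plain dictionary and adds a standalone helper. The helper name `get_startseq_from_dict` is hypothetical. Nothing ties it to the data, and a differently named key only fails at run time.

```python
gene = {"symbol": "Gene1", "seq": "AACGT"}

def get_startseq_from_dict(gene_dict, n):
    """Return the first n bases (hypothetical helper, kept apart from the data)."""
    return gene_dict["seq"][:n]

print(get_startseq_from_dict(gene, 3))   # AAC

# Any dictionary is accepted, even one that uses a different key name
other = {"symbol": "Gene2", "sequence": "TTTT"}
try:
    get_startseq_from_dict(other, 2)
except KeyError as err:
    print("Missing key:", err)           # Missing key: 'seq'
```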

#### Let's make a `Gene` object 

- <font color = "red">Use the `__init__` method to define instance variables/attributes</font>
- Write functions in the class for extra functionality
- Use `self` to refer to the object you are creating



In [42]:
class Gene:
    def __init__(self, psymbol = "Gene1", pseq = "AACGT"):
        self.symbol = psymbol
        self.sequence = pseq

    def __repr__(self):
        return f"Gene('{self.symbol}','{self.sequence}')"
        
    def __str__(self):
        return self.symbol
    
    def get_startseq(self,n):
        return self.sequence[:n]
  

In [44]:
g = Gene()
g

Gene('Gene1','AACGT')

In [54]:
g.get_startseq(3)

'AAC'

In [36]:
g1 = Gene("Test", "TTTT")
g1

Gene('Test','TTTT')

In [38]:
print(g)

Gene1
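
The two displays differ because each one calls a different dunder method. A short sketch, repeating the `Gene` class from above so the block runs on its own:

```python
class Gene:
    def __init__(self, psymbol="Gene1", pseq="AACGT"):
        self.symbol = psymbol
        self.sequence = pseq

    def __repr__(self):
        return f"Gene('{self.symbol}','{self.sequence}')"

    def __str__(self):
        return self.symbol

g = Gene()
print(str(g))    # Gene1                    (__str__, also used by print())
print(repr(g))   # Gene('Gene1','AACGT')    (__repr__, used when a cell echoes g)
print([g])       # [Gene('Gene1','AACGT')]  (containers show their items with __repr__)
```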


In [16]:
type(g)

__main__.Gene

In [18]:
type(Gene)

type

In [20]:
dir(g)

['__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 'get_startseq',
 'sequence',
 'symbol']

In [22]:
dir(Gene)

['__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 'get_startseq']
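
Comparing the two listings, `symbol` and `sequence` appear only in `dir(g)`: they are created by `__init__` on each instance, while methods live on the class. A small sketch using the built-in `vars()` and set arithmetic makes that visible (the class is repeated so the block is self-contained):

```python
class Gene:
    def __init__(self, psymbol="Gene1", pseq="AACGT"):
        self.symbol = psymbol
        self.sequence = pseq

    def get_startseq(self, n):
        return self.sequence[:n]

g = Gene()

# Names present on the instance but not on the class
print(sorted(set(dir(g)) - set(dir(Gene))))   # ['sequence', 'symbol']

# The instance's own data
print(vars(g))                                # {'symbol': 'Gene1', 'sequence': 'AACGT'}

# Methods are stored on the class, not on each instance
print("get_startseq" in vars(Gene))           # True
print("get_startseq" in vars(g))              # False
```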

---
We need to take a second to talk about 3 things real quick:
1. Functions within `class`es (like `get_startseq`) are called ***methods*** or procedural attributes
2. `self.symbol` and `self.sequence` are called ***attributes*** since they only contain data
3. What in the world is `self`?

**PS**: `self` is a parameter that allows an object to look back at itself, specifically at the current *instance*. However, outside of writing the class, you never have to pass `self` into the methods yourself: Python supplies it when you call a method on an instance.
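
One way to see what `self` is: calling a method on an instance is equivalent to calling the function stored on the class and passing the instance as the first argument. A minimal sketch, with the class repeated so the block runs on its own:

```python
class Gene:
    def __init__(self, psymbol="Gene1", pseq="AACGT"):
        self.symbol = psymbol
        self.sequence = pseq

    def get_startseq(self, n):
        return self.sequence[:n]

g = Gene()
g1 = Gene("Test", "TTTT")

# Usual call: Python passes g as `self` for us
print(g.get_startseq(3))         # AAC

# Equivalent explicit call through the class
print(Gene.get_startseq(g, 3))   # AAC

# A different instance means a different `self`
print(g1.get_startseq(3))        # TTT
```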

Let's use our new class.

In [56]:
dir(Gene)

['__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 'get_startseq']

In [58]:
type(Gene)

type

In [60]:
g = Gene()

In [62]:
g

Gene('Gene1','AACGT')

In [64]:
str(g)

'Gene1'

In [66]:
g.sequence

'AACGT'

In [68]:
g.symbol

'Gene1'

In [70]:
len(g)

TypeError: object of type 'Gene' has no len()

In [72]:
g + g1

TypeError: unsupported operand type(s) for +: 'Gene' and 'Gene'

In [74]:
g

Gene('Gene1','AACGT')

In [76]:
g.symbol = "New symbol"
g

Gene('New symbol','AACGT')

____

#### Let's add more functionality to the `Gene` class
- Implement specific dunder methods for added functionality 
    - the dunder method name reflects the function/operation it supports, e.g. `__len__` for `len()` and `__add__` for `+`
- Use `@property` to set up read-only attributes
- Add documentation using docstrings

In [94]:
# Gene object
class Gene:
    """
    Contains the information about a Gene such as the symbol, description,
    exon number, and sequence 
    
    Attributes:
    symbol (str):  gene symbol
    description (str):  gene description
    exon_no (int): total number of exons for the gene
    sequence (str): the gene sequence
    status (str): the gene status, one of 'new', 'current', 'deprecated', 'in process' 
    """
    def __init__(self, psymbol = "Gene1", pdesc = "Gene for testing", 
                 pexon_no = 1, pseq = "AACGT"):
        self.symbol = psymbol
        self.description = pdesc
        self.exon_no = pexon_no
        self.sequence = pseq
        self.__status = "current"
        
    def __str__(self):
        return f"Gene('{self.symbol}','{self.description}',{self.exon_no},'{self.sequence}')"
    
    def __repr__(self):
        return f"Gene('{self.symbol}','{self.description}',{self.exon_no},'{self.sequence}')"

    def __len__(self):
        return len(self.sequence)
    
    def __add__(self, gene):
        new_gene = Gene(self.symbol + gene.symbol,
                    self.description + gene.description,
                    self.exon_no + gene.exon_no,
                    self.sequence + gene.sequence)
        new_gene.update_status("new")
        return new_gene

        
    @property # getter
    def status(self):
        """
        Get the status for the gene
        """
        return self.__status
    
    # @status.setter # property setter - same name as the property
    # def status(self, pstatus):
    #    self.__status = pstatus
             
    def update_status(self, pstatus):
        """
        Updates the status of a gene
        """
        self.__status = pstatus
        
        

In [96]:
# Use Gene - explore
# create/init Gene objects, we need the __init__ method
# The constructor without arguments works if we have default values
# The cell will use the __repr__ method to display the representation of the object

g = Gene()
g

Gene('Gene1','Gene for testing',1,'AACGT')

In [98]:
g.status

'current'

In [100]:
g.__status

AttributeError: 'Gene' object has no attribute '__status'

In [102]:
dir(g)

['_Gene__status',
 '__add__',
 '__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__len__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 'description',
 'exon_no',
 'sequence',
 'status',
 'symbol',
 'update_status']
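
The `_Gene__status` entry in the listing is the result of name mangling: inside a class body, an attribute name that starts with two underscores (and does not end with two) is rewritten to `_ClassName__name`. The reduced class below is a sketch for illustration only. It shows that the attribute is hidden rather than truly private.

```python
class Gene:
    def __init__(self, psymbol="Gene1"):
        self.symbol = psymbol
        self.__status = "current"   # stored on the instance as _Gene__status

    @property
    def status(self):
        return self.__status

g = Gene()
print(g.status)                     # current

try:
    g.__status                      # outside the class body no mangling happens
except AttributeError as err:
    print("AttributeError:", err)

print(g._Gene__status)              # current (reachable, but by convention left alone)
print("_Gene__status" in vars(g))   # True
```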

In [104]:
g.update_status("deprecated")
print(g.status)
g.status = "current"
g.status


deprecated


AttributeError: property 'status' of 'Gene' object has no setter

In [106]:
g.status

'deprecated'
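
The commented-out `@status.setter` in the class above would make `g.status = ...` work. One reason to write a setter instead of exposing a plain attribute is that it can check the new value first. The sketch below is one possible version, not the notebook's implementation: the `VALID_STATUSES` tuple is an assumption taken from the values listed in the class docstring, and the class is reduced to the parts needed here.

```python
class Gene:
    # Assumed set of allowed values, copied from the class docstring
    VALID_STATUSES = ("new", "current", "deprecated", "in process")

    def __init__(self, psymbol="Gene1"):
        self.symbol = psymbol
        self.__status = "current"

    @property
    def status(self):
        """Get the status for the gene"""
        return self.__status

    @status.setter
    def status(self, pstatus):
        # Reject values outside the assumed list
        if pstatus not in self.VALID_STATUSES:
            raise ValueError(f"Unknown status: {pstatus!r}")
        self.__status = pstatus

g = Gene()
g.status = "deprecated"    # goes through the setter
print(g.status)            # deprecated

try:
    g.status = "testing"
except ValueError as err:
    print(err)             # Unknown status: 'testing'
```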

In [108]:
len(g)

5

_____

See, I didn't need to use `self` on the outside.
_____

Wait, with just that, we made a new object? I don't believe you...

In [110]:
# what did we create? type()

type(g)


__main__.Gene

In [114]:
# the class itself is an object too - its type is `type`

type(Gene)



type

In [116]:
# dir
dir(Gene)


['__add__',
 '__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__len__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 'status',
 'update_status']

In [118]:
dir(g)

['_Gene__status',
 '__add__',
 '__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__len__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 'description',
 'exon_no',
 'sequence',
 'status',
 'symbol',
 'update_status']

In [120]:
# check attributes

g.exon_no

1

In [122]:
g.exon_no = 10

In [124]:
g.exon_no

10

In [126]:
# check the status
g.status


'deprecated'

In [128]:
# try to change the status
g.status= "testing"


AttributeError: property 'status' of 'Gene' object has no setter

In [130]:
# Use the update_status method to change the status of the gene

g.update_status("testing")
g.status

'testing'

In [132]:
# get the gene length - works only if __len__ is implemented
len(g)

5

In [134]:
# add two genes - works only if __add__ is implemented

g1 = Gene()
g1

Gene('Gene1','Gene for testing',1,'AACGT')

In [136]:
g1.symbol = "New_Gene"
g1.description = "Desc of new gene"
g1.exon_no = 5
g1.sequence = "GGGGGGG"
g1

Gene('New_Gene','Desc of new gene',5,'GGGGGGG')

In [138]:
g


Gene('Gene1','Gene for testing',10,'AACGT')

In [140]:
g + g1

Gene('Gene1New_Gene','Gene for testingDesc of new gene',15,'AACGTGGGGGGG')

In [142]:
g1 + g

Gene('New_GeneGene1','Desc of new geneGene for testing',15,'GGGGGGGAACGT')

In [144]:
# multiply two genes - works only if __mul__ is implemented
g1 * g


TypeError: unsupported operand type(s) for *: 'Gene' and 'Gene'
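
The error goes away once the class defines `__mul__`. What multiplying genes should mean is a design decision the notebook leaves open. The sketch below assumes, purely for illustration, that multiplying a gene by an integer repeats its sequence, and it returns `NotImplemented` for anything else so that Python raises the usual `TypeError`.

```python
class Gene:
    def __init__(self, psymbol="Gene1", pseq="AACGT"):
        self.symbol = psymbol
        self.sequence = pseq

    def __repr__(self):
        return f"Gene('{self.symbol}','{self.sequence}')"

    def __mul__(self, n):
        # Assumed meaning: repeat the sequence n times
        if not isinstance(n, int):
            return NotImplemented
        return Gene(self.symbol, self.sequence * n)

g = Gene()
print(g * 2)       # Gene('Gene1','AACGTAACGT')

try:
    g * g          # Gene * Gene is still unsupported
except TypeError as err:
    print("TypeError:", err)
```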

In [146]:
class Gene:
    def __init__(self, psymbol , pseq ):
        self.symbol = psymbol
        self.sequence = pseq

    def __repr__(self):
        return f"Gene('{self.symbol}','{self.sequence}')"
        
    def __str__(self):
        return self.symbol
    
    def get_startseq(self,n):
        return self.sequence[:n]

In [148]:
Gene

__main__.Gene

In [158]:
g = Gene("MyGene", "CCCC")
g

Gene('MyGene','CCCC')

In [152]:
dict()

{}

#### Resources
https://docs.python.org/3/tutorial/classes.html     
https://docs.python.org/3/reference/datamodel.html      
https://python-textbok.readthedocs.io/en/1.0/Classes.html      
https://www.w3schools.com/python/python_classes.asp          
https://www.geeksforgeeks.org/python-classes-and-objects/          
https://www.tutorialspoint.com/python/python_classes_objects.htm   
https://python-course.eu/oop/inheritance.php     
https://www.geeksforgeeks.org/python-oops-concepts/      
https://gist.github.com/rurtubia/f5c506f414bb85efc4d8
