## Object oriented programming 
### Creating your own object types

### BIOINF 575 - Fall 2023

---
##### Adapted from material created by Marcus Sherman
---


You can do perfectly good data science _without_ ever writing a `class`. 

However, using `Object-Oriented Programming` can make your data science <u>easier to write</u>, <u>easier to read</u>, and <u>more intuitive</u> while also making it **more shareable/extensible**.

---
#### Object-Oriented Programming

Whenever you code in Python, you should ask yourself the same two questions throughout your workflow: "What do I have?" and "What do I need?". While working on subcomponents of a function, you should also ask yourself "What ***kind*** of object am I working with, and what does it do?"

In Python, ***EVERYTHING*** is an object!

In [1]:
# Different types

type(list)

type

In [2]:
{1:"d"}

{1: 'd'}

In [4]:
type({1:"d"})

dict

In [5]:
dir(list)

['__add__',
 '__class__',
 '__class_getitem__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__imul__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__reversed__',
 '__rmul__',
 '__setattr__',
 '__setitem__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'append',
 'clear',
 'copy',
 'count',
 'extend',
 'index',
 'insert',
 'pop',
 'remove',
 'reverse',
 'sort']
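
The claim that everything is an object can be checked directly. The snippet below is a minimal sketch; the function `double` and the variable names are made up for illustration.

```python
import math

def double(x):
    """A plain function, used only to show that functions are objects too."""
    return 2 * x

# Integers, strings, functions, classes and modules are all instances of `object`.
things = [42, "ACGT", double, list, math]
all_objects = all(isinstance(thing, object) for thing in things)
print(all_objects)

# Because a function is an object, it has attributes and can be stored in a variable.
alias = double
print(alias.__name__, alias(21))
```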

### So what _are_ objects?

<img width = 450 src='https://ih1.redbubble.net/image.9426655.9925/fc,550x550,silver.jpg'/>

---
### A New Frontier

Up to this point, we have used objects already defined for us. However, we are not limited to those: we can *make* our own objects. This is done with the `class` keyword.

<img src='https://ds055uzetaobb.cloudfront.net/image_optimizer/9996aa83f77a2837f41a4de7f2ab517168716532.png' width = 500/>

Defining a type with `class` is much like defining a function with `def`. However, later on we get to play around with some of those 'dunder' (\_\_) methods we have been steering you away from.

The dunder methods (\_\_) implement functionality for initializing the object and for applying common operators and functions to it.  

For example:
- to support addition (`+`) between two objects, we implement \_\_add\_\_
- to display the string representation of the object (that is, to apply the `str` function), we implement the \_\_str\_\_ method
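
To see the link between operators and dunder methods, here is a small sketch using only built-in types. Calling a dunder method directly is done here purely for illustration; in normal code you would use the operator or function.

```python
left = [1, 2]
right = [3]

# The + operator on lists is implemented by list.__add__
via_operator = left + right
via_dunder = left.__add__(right)
print(via_operator == via_dunder)

# str() on an object is implemented by that object's __str__ method
number = 575
print(str(number) == number.__str__())
```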

### First, the syntax

<img src='class_def.png' width=700 align='left'/>

### The Big Idea 
> The idea behind objects is to **bundle** coherent <u>methods</u> (things the object can _do_) and <u>attributes</u> (things the object _has_) that logically go together into a well-defined _interface_.

An object is a data abstraction that has 2 main jobs:
1. Capture the internal *representation* of the data it is abstracting
2. Create an *interface* for the abstracted data

#### Let's think about a gene and basic information we want to store about it.
We could make a dictionary:

```python

{"symbol": "Gene1", "seq": "AACGT"}

```


While this structure works perfectly fine, if we want to add new elements or a function specific to the gene, it will be hard to keep track of it all.
It makes sense to wrap all of these things up into a single object (_mainly because a bare dictionary is hard to manage and hard to add new functionality to_).
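
As a rough sketch of why the dictionary approach gets hard to manage, consider a gene stored as a dictionary together with a free-standing helper function. The helper `start_of_sequence` and the second dictionary are hypothetical, made up for this illustration.

```python
# A gene stored as a plain dictionary (same keys as the dictionary example).
gene = {"symbol": "Gene1", "seq": "AACGT"}

def start_of_sequence(gene_dict, n):
    """Hypothetical helper: return the first n bases of a dictionary-based gene."""
    return gene_dict["seq"][:n]

print(start_of_sequence(gene, 3))

# Nothing stops a different dictionary, with a different key, from being passed in.
other = {"symbol": "Gene2", "sequence": "GGGTACG"}
try:
    start_of_sequence(other, 3)
except KeyError as err:
    print("Missing key:", err)
```

The data and the function that works on it live in separate places, and nothing guarantees that they agree on the key names. A class keeps both together.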

#### Let's make a `Gene` object 

- <font color = "red">Use the `__init__` method to define the attributes (instance variables) of the object</font>
- Write functions in the class for extra functionality
- Use `self` to refer to the object you are creating



In [7]:
class Gene:
    def __init__(self, psymbol = "Gene1", pseq = "AACGT"):
        print("Init is called")
        self.symbol = psymbol
        self.sequence = pseq
        
    def __str__(self):
        return self.symbol
    
    def get_startseq(n):
        return self.sequence[:n]
  

In [13]:
g = Gene()
str(g)

Init is called


'Gene1'

In [14]:
g = Gene("Gene100")
str(g)

Init is called


'Gene100'

In [15]:
g = Gene(pseq = "GGGTACG", psymbol = "Gene50")
str(g)

Init is called


'Gene50'

In [16]:
g.sequence

'GGGTACG'

In [None]:
# Double underscore (dunder) methods implement the functionality of common 
# functions and operators for the objects of the type we define 
# For instance:
# __add__ implements the addition (+) operator
# __gt__ implements the greater than (>) operator
# __len__ implements the functionality for calling the len function on the object
# __str__ implements the functionality for calling the str function on the object

# __init__ is a special dunder method that runs when we call the class name as a function 
# this method initializes the newly created object of this new type

In [8]:
# this is the class ... the blueprint for genes
dir(Gene)

['__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 'get_startseq']

In [3]:
Gene.symbol

AttributeError: type object 'Gene' has no attribute 'symbol'
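
The `AttributeError` above arises because `symbol` is assigned to `self` inside `__init__`, so it exists only on objects created from the class, not on the class itself. The sketch below contrasts the two kinds of attribute; the class `Sample` and its attribute names are hypothetical.

```python
class Sample:
    # Class attribute: defined once on the class and shared by all instances.
    kind = "DNA"

    def __init__(self, name):
        # Instance attribute: created separately for each object when __init__ runs.
        self.name = name

print(Sample.kind)               # available on the class itself
print(hasattr(Sample, "name"))   # the class has no `name`; only instances do

s = Sample("sample_1")
print(s.name, s.kind)            # an instance sees both
```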

In [4]:
type(Gene)

type

In [8]:
Gene()

Init is called


<__main__.Gene at 0x7f93f9f4e5b0>

In [9]:
str(Gene())

Init is called


'Gene1'

In [10]:
g = Gene()
str(g)

Init is called


'Gene1'

In [11]:
dir(g)

['__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 'get_startseq',
 'sequence',
 'symbol']

In [13]:
g.sequence

'AACGT'

In [15]:
g1 = Gene("Gene100")
str(g1)

'Gene100'

In [16]:
g1.sequence

'AACGT'

In [26]:
class Gene:
    def __init__(self, psymbol, pseq = ""):
        self.symbol = psymbol
        self.sequence = pseq
        
    def __str__(self):
        return self.symbol
    
    def __repr__(self):
        return "Gene('" + self.symbol + "','" + self.sequence + "')"
    
    def get_startseq(n):
        return self.sequence[:n]

In [27]:
str(Gene("Gene2"))

'Gene2'

In [28]:
g2 = Gene("Gene20", "AGACGGGTTGAT")
g2

Gene('Gene20','AGACGGGTTGAT')

In [29]:
g3 = Gene('Gene20','AGACGGGTTGAT')
g3.symbol

'Gene20'

In [30]:
g2.get_startseq(4)

TypeError: get_startseq() takes 1 positional argument but 2 were given

In [31]:
Gene.get_startseq(4)

NameError: name 'self' is not defined

In [33]:
class Gene:
    def __init__(self, psymbol, pseq = ""):
        self.symbol = psymbol
        self.sequence = pseq
        
    def __str__(self):
        return self.symbol
    
    def __repr__(self):
        return "Gene('" + self.symbol + "','" + self.sequence + "')"
    
    def get_startseq(self, n):
        return self.sequence[:n]
    
    def general_computation(seq1, seq2):
        """
        returns the shortest sequence
        """
        res = seq2
        if len(seq1) < len(seq2):
            res = seq1
        return res
        

In [37]:
g = Gene("Gene20", "AACGT")

In [38]:
g

Gene('Gene20','AACGT')

In [39]:
g.get_startseq(4)

'AACG'

In [40]:
Gene.get_startseq(g,4)

'AACG'

In [43]:
g.general_computation("AAA", "CGTTA")

TypeError: general_computation() takes 2 positional arguments but 3 were given

In [44]:
Gene.general_computation("AAA", "CGTTA")

'AAA'

In [None]:
# if we call the method from the object (g), the object itself is passed as the first argument
# if we call the method from the class (Gene), the first argument is whatever we provide first


In [45]:
g.general_computation("AAA")

TypeError: object of type 'Gene' has no len()
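
`general_computation` takes no `self`, which is why it behaves differently when called from an instance and from the class. One common way to make such a helper callable from both is the built-in `@staticmethod` decorator. This is not what the class above does; it is a sketch of an alternative, and the class `SeqTools` is hypothetical.

```python
class SeqTools:
    """Hypothetical class used only to illustrate @staticmethod."""

    @staticmethod
    def shortest(seq1, seq2):
        """Return the shorter of two sequences (seq2 when the lengths are equal)."""
        return seq1 if len(seq1) < len(seq2) else seq2

# Works from the class ...
print(SeqTools.shortest("AAA", "CGTTA"))
# ... and from an instance, because the instance is not passed implicitly.
tools = SeqTools()
print(tools.shortest("AAA", "CGTTA"))
```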

In [46]:
class Gene:
    def __init__(self, psymbol, pseq = ""):
        self.symbol = psymbol
        self.sequence = pseq
        
    def __str__(self):
        return self.symbol
    
    def __repr__(self):
        return "Gene('" + self.symbol + "','" + self.sequence + "')"
    
    def __len__(self):
        return len(self.sequence)
    
    def get_startseq(self, n):
        return self.sequence[:n]
    
    def general_computation(seq1, seq2):
        """
        returns the shortest sequence
        """
        res = seq2
        if len(seq1) < len(seq2):
            res = seq1
        return res
        

In [47]:
g = Gene("EGFR_ex", "AAATTGGCAGT")
g

Gene('EGFR_ex','AAATTGGCAGT')

In [48]:
len(g)

11

---
We need to take a second to talk about 3 things real quick:
1. Functions within `class`es (like `self.get_startseq`) are called ***methods*** or procedural attributes
2. `self.symbol` and `self.sequence` are called ***attributes*** since they only contain data
3. What in the world is `self`?

**PS**: `self` is a parameter that allows an object to refer to itself, specifically to the current *instance* of the class. However, when you call a method on an object, you never have to pass `self` yourself; Python does that for you.
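
To make `self` more concrete, here is a minimal sketch with a hypothetical `Counter` class. It shows that `self` is simply the object the method was called on, and that two instances keep separate data.

```python
class Counter:
    """Hypothetical class used only to show what `self` refers to."""

    def __init__(self):
        self.count = 0

    def increment(self):
        # `self` is the specific object the method was called on.
        self.count += 1
        return self

a = Counter()
b = Counter()

returned = a.increment()      # Python passes `a` as `self`
Counter.increment(a)          # the same call written out explicitly

print(returned is a)          # the method received the very object `a`
print(a.count, b.count)       # only `a` changed; `b` is a separate instance
```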

Let's use our new class.

In [49]:
dir(Gene)

['__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__len__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 'general_computation',
 'get_startseq']

In [50]:
type(Gene)

type

In [52]:
g = Gene("Gene_Symbol1", "GCTTTA")

In [53]:
g

Gene('Gene_Symbol1','GCTTTA')

In [54]:
str(g)

'Gene_Symbol1'

In [55]:
g.sequence

'GCTTTA'

In [60]:
g.sequence = "CCC"

In [61]:
g

Gene('Gene_Symbol1','CCC')

In [62]:
f"Testing {g.symbol}, there we go"

'Testing Gene_Symbol1, there we go'

In [63]:
f"Testing {[1,2,3]}, there we go"

'Testing [1, 2, 3], there we go'

In [64]:
f"Testing {2}, there we go"

'Testing 2, there we go'

In [65]:
f"Testing {g}, there we go"

'Testing Gene_Symbol1, there we go'
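
The f-string above shows the symbol rather than the full representation because f-strings use `__str__` by default. The sketch below, with a hypothetical `Tag` class, shows how to ask for `__repr__` instead with the `!r` conversion.

```python
class Tag:
    """Hypothetical class with different __str__ and __repr__ outputs."""

    def __init__(self, label):
        self.label = label

    def __str__(self):
        return self.label

    def __repr__(self):
        return f"Tag('{self.label}')"

t = Tag("exon")
print(f"{t}")      # uses __str__
print(f"{t!r}")    # !r asks for __repr__ instead
print([t])         # containers display their items with __repr__
```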

____

#### Let's add more functionality to the `Gene` class
- Implement specific dunder methods for added functionality 
    - the name of a dunder method reflects the function/operation it implements
- Use @property to set up read-only attributes
- Add documentation using docstrings

In [86]:
# Gene type
class Gene:
    """
    Contains the information about a Gene such as the symbol, description,
    exon number, status and sequence 
    
    Attributes:
    symbol (str):  gene symbol
    description (str):  gene description
    exon_no (int): total number of exons for the gene
    sequence (str): the gene sequence
    status (str): the gene status, one of 'new', 'current', 'deprecated', 'in process' 
    
    Methods:
    update_status: updates the gene status to the one provided as a parameter
    """
    def __init__(self, psymbol = "Gene1", pdesc = "Gene for testing", 
                 pexon_no = 1, pseq = "AACGT"):
        self.symbol = psymbol
        self.description = pdesc
        self.exon_no = pexon_no
        self.sequence = pseq
        self.__status = "current"
        
    def __str__(self):
        return f"Gene('{self.symbol}','{self.description}',{self.exon_no},'{self.sequence}')"
    
    def __repr__(self):
        return f"Gene('{self.symbol}','{self.description}',{self.exon_no},'{self.sequence}')"

    def __len__(self):
        return len(self.sequence)
    
    def __add__(self, gene):
        new_gene = Gene(self.symbol + gene.symbol,
                    self.description + gene.description,
                    self.exon_no + gene.exon_no,
                    self.sequence + gene.sequence)
        new_gene.update_status("new")
        return new_gene

        
    @property # getter
    def status(self):
        """
        Get the status for the gene
        """
        return self.__status
    
    # @status.setter # property setter - same name as the property
    # def status(self, pstatus):
    #    self.__status = pstatus
             
    def update_status(self, pstatus):
        """
        Updates the status of a gene
        """
        self.__status = pstatus
        
        

In [59]:
help(Gene)

Help on class Gene in module __main__:

class Gene(builtins.object)
 |  Gene(psymbol='Gene1', pdesc='Gene for testing', pexon_no=1, pseq='AACGT')
 |  
 |  Contains the information about a Gene such as the symbol, description,
 |  exon number, status and sequence 
 |  
 |  Attributes:
 |  symbol (str):  gene symbol
 |  description (str):  gene description
 |  exon_no (int): total number of exons for the gene
 |  sequence (str): the gene sequence
 |  status (str): the gene status, one of 'new', 'current', 'deprecated', 'in process' 
 |  
 |  Methods:
 |  update_status: updates the gene status to the one provided as a parameter
 |  
 |  Methods defined here:
 |  
 |  __add__(self, gene)
 |  
 |  __init__(self, psymbol='Gene1', pdesc='Gene for testing', pexon_no=1, pseq='AACGT')
 |      Initialize self.  See help(type(self)) for accurate signature.
 |  
 |  __len__(self)
 |  
 |  __repr__(self)
 |      Return repr(self).
 |  
 |  __str__(self)
 |      Return str(self).
 |  
 |  update_stat

In [87]:
# Use Gene - explore
# create/init Gene objects, we need the __init__ method
# The constructor without arguments works if we have default values
# The cell will use the __repr__ method to display the representation of the object

g = Gene()
g

Gene('Gene1','Gene for testing',1,'AACGT')

In [88]:
g.status

'current'

In [89]:
g.sequence

'AACGT'

In [90]:
dir(g)

['_Gene__status',
 '__add__',
 '__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__len__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 'description',
 'exon_no',
 'sequence',
 'status',
 'symbol',
 'update_status']

In [91]:
g.update_status("deprecated")
print(g.status)
g.status = "current"
g.status


deprecated


AttributeError: can't set attribute

In [92]:
g._Gene__status = "current"

g.status

'current'
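
The name `_Gene__status` comes from Python's name mangling: inside a class body, an attribute name that starts with two underscores (and does not end with two) is rewritten with the class name in front. The sketch below uses a hypothetical `Record` class to show the effect. Mangling makes accidental access less likely, but it does not make the attribute truly private.

```python
class Record:
    """Hypothetical class used only to illustrate name mangling."""

    def __init__(self):
        self.__state = "current"   # stored on the object as _Record__state

r = Record()

print(hasattr(r, "__state"))          # the unmangled name does not exist outside the class
print(hasattr(r, "_Record__state"))   # the mangled name does
print(r._Record__state)
```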

_____

See, I didn't need to use `self` on the outside.
_____

Wait, with just that, we made a new object? I don't believe you...

In [93]:
# what did we create? type()

type(g)


__main__.Gene

In [94]:
# print uses the __str__ method (falling back to __repr__ if __str__ is not defined)


print(g)


Gene('Gene1','Gene for testing',1,'AACGT')


In [95]:
# dir

dir(g)

['_Gene__status',
 '__add__',
 '__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__len__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 'description',
 'exon_no',
 'sequence',
 'status',
 'symbol',
 'update_status']

In [96]:
# check attributes

g.symbol

'Gene1'

In [97]:
# check the status
g.status


'current'

In [98]:
# try to change the status

g.status = "new"

AttributeError: can't set attribute

In [99]:
# Use the update_status method to change the status of the gene

g.update_status("new")

In [100]:
g

Gene('Gene1','Gene for testing',1,'AACGT')

In [101]:
g.status

'new'

In [102]:
# get the gene length - works only if __len__ is implemented

In [103]:
len(g)

5

In [105]:
# add two genes - works only if __add__ is implemented

g + g

Gene('Gene1Gene1','Gene for testingGene for testing',2,'AACGTAACGT')

In [106]:
# multiply two genes - works only if __mul__ is implemented

g * g

TypeError: unsupported operand type(s) for *: 'Gene' and 'Gene'
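
What `*` should mean for a type is up to whoever writes the class. As a sketch only, the hypothetical `Seq` class below gives `*` the same meaning it has for strings (repetition by an integer) and returns `NotImplemented` for anything else, so Python raises the usual `TypeError`.

```python
class Seq:
    """Hypothetical sequence wrapper used only to illustrate __mul__."""

    def __init__(self, bases):
        self.bases = bases

    def __mul__(self, times):
        # Only repetition by an integer is given a meaning here (an assumption of this sketch).
        if not isinstance(times, int):
            return NotImplemented   # lets Python raise the usual TypeError
        return Seq(self.bases * times)

s = Seq("AC")
print((s * 3).bases)

try:
    s * s
except TypeError as err:
    print("TypeError:", err)
```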

In [107]:
# Gene type
class Gene:
    """
    Contains the information about a Gene such as the symbol, description,
    exon number, status and sequence 
    
    Attributes:
    symbol (str):  gene symbol
    description (str):  gene description
    exon_no (int): total number of exons for the gene
    sequence (str): the gene sequence
    status (str): the gene status, one of 'new', 'current', 'deprecated', 'in process' 
    
    Methods:
    update_status: updates the gene status to the one provided as a parameter
    """
    def __init__(self, psymbol = "Gene1", pdesc = "Gene for testing", 
                 pexon_no = 1, pseq = "AACGT"):
        self.symbol = psymbol
        self.description = pdesc
        self.exon_no = pexon_no
        self.sequence = pseq
        self.__status = "current"
        
    def __str__(self):
        return f"Gene('{self.symbol}','{self.description}',{self.exon_no},'{self.sequence}')"
    
    def __repr__(self):
        return f"Gene('{self.symbol}','{self.description}',{self.exon_no},'{self.sequence}')"

    def __len__(self):
        return len(self.sequence)
    
    def __add__(self, gene):
        new_gene = Gene(self.symbol + gene.symbol,
                    self.description + gene.description,
                    self.exon_no + gene.exon_no,
                    self.sequence + gene.sequence)
        new_gene.update_status("new")
        return new_gene

        
    @property # getter
    def status(self):
        """
        Get the status for the gene
        """
        return self.__status
    
    @status.setter # property setter - same name as the property
    def status(self, pstatus):
        self.__status = pstatus
             
    def update_status(self, pstatus):
        """
        Updates the status of a gene
        """
        self.__status = pstatus
        
        

In [108]:
g = Gene()

In [109]:
g

Gene('Gene1','Gene for testing',1,'AACGT')

In [110]:
g.status

'current'

In [111]:
g.status = "new"

In [112]:
g.status

'new'
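
A property setter is also a natural place to validate input. The sketch below is one possible approach, not part of the `Gene` class above: the hypothetical `TrackedGene` class only accepts the four statuses listed in the `Gene` docstring.

```python
class TrackedGene:
    """Hypothetical class: a status property whose setter validates its input."""

    # The four statuses listed in the Gene docstring.
    ALLOWED = ("new", "current", "deprecated", "in process")

    def __init__(self):
        self._status = "current"

    @property
    def status(self):
        return self._status

    @status.setter
    def status(self, value):
        if value not in self.ALLOWED:
            raise ValueError(f"status must be one of {self.ALLOWED}, got {value!r}")
        self._status = value

tg = TrackedGene()
tg.status = "deprecated"      # accepted
print(tg.status)

try:
    tg.status = "retired"     # not in ALLOWED, so the setter rejects it
except ValueError as err:
    print("ValueError:", err)
```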

#### Resources
https://docs.python.org/3/tutorial/classes.html     
https://docs.python.org/3/reference/datamodel.html      
https://python-textbok.readthedocs.io/en/1.0/Classes.html      
https://www.w3schools.com/python/python_classes.asp          
https://www.geeksforgeeks.org/python-classes-and-objects/          
https://www.tutorialspoint.com/python/python_classes_objects.htm   
https://python-course.eu/oop/inheritance.php     
https://www.geeksforgeeks.org/python-oops-concepts/      
https://gist.github.com/rurtubia/f5c506f414bb85efc4d8
