## Object oriented programming 
### Creating your own object types

### BIOINF 575 - Fall 2023

---
##### Adapted from material created by Marcus Sherman
---


You can do perfectly good data science _without_ ever writing a `class`. 

However, using `Object-Oriented Programming` can make your data science <u>easier to write</u>, <u>easier to read</u>, and <u>more intuitive</u> while also making it **more shareable/extensible**.

---
#### Object-Oriented Programming

Whenever you code in Python, keep asking yourself two questions throughout your workflow: "What do I have?" and "What do I need?". While working on the subcomponents of a function, also ask yourself: "What ***kind*** of object am I working with, and what can it do?"

In Python, ***EVERYTHING*** is an object!
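A quick way to check this claim is `isinstance(x, object)`. It is `True` for plain values, and also for functions, classes and modules. The `square` function below is only a throwaway example.

```python
import math


def square(x):
    """Throwaway example function."""
    return x * x


# plain values, a function, a class and a module
things = [42, "AACGT", [1, 2], {1: "d"}, square, dict, math]

for thing in things:
    # every one of them is an instance of the base class `object`
    print(type(thing).__name__, isinstance(thing, object))
```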

In [1]:
# Different types

type(list)

type

In [2]:
{1:"d"}

{1: 'd'}

In [3]:
type({1:"d"})

dict

In [None]:
# dir(list)

### So what _are_ objects?

<img width = 450 src='https://ih1.redbubble.net/image.9426655.9925/fc,550x550,silver.jpg'/>

---
### A New Frontier

Up to this point, we have only used objects that were already defined for us. We are not limited to those, though: we can *make* our own object types with the `class` keyword.

<img src='https://ds055uzetaobb.cloudfront.net/image_optimizer/9996aa83f77a2837f41a4de7f2ab517168716532.png' width = 500/>

Defining a `class` is much like defining a function with `def`. Later on, we also get to play around with some of those 'dunder' (double underscore, \_\_) methods we have been steering you away from.

### First, the syntax

<img src='class_def.png' width=700 align='left'/>
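In case the figure does not render, here is roughly the same structure written out as code. `ClassName`, `value` and `method_name` are placeholder names, not anything Python requires.

```python
class ClassName:
    """Optional docstring describing what the class represents."""

    def __init__(self, value):
        # runs automatically when ClassName(...) is called;
        # stores data on the newly created instance
        self.value = value

    def method_name(self, other):
        # an ordinary function whose first parameter receives the instance
        return self.value + other


obj = ClassName(10)        # create an instance
print(obj.value)           # 10
print(obj.method_name(5))  # 15
```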

### The Big Idea 
> The idea behind objects is to **bundle** coherent <u>methods</u> (things the object can _do_) and <u>attributes</u> (things the object _has_) that logically go together into a well-defined _interface_.

They are a data abstraction with 2 main jobs:
1. Capture the internal *representation* of the data being abstracted
2. Provide an *interface* to the abstracted data
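To make the split between representation and interface concrete, here is a hypothetical sketch. The two classes `StringSeq` and `ListSeq` are illustrations only: they store a DNA sequence in different ways but expose the same `gc_content()` method. Code that only calls that method does not need to know which representation it is using.

```python
class StringSeq:
    """Hypothetical: stores the sequence as one string."""

    def __init__(self, seq):
        self.seq = seq                    # internal representation: a str

    def gc_content(self):
        gc = self.seq.count("G") + self.seq.count("C")
        return gc / len(self.seq)


class ListSeq:
    """Hypothetical: stores the sequence as a list of bases."""

    def __init__(self, seq):
        self.bases = list(seq)            # internal representation: a list

    def gc_content(self):
        gc = sum(1 for base in self.bases if base in ("G", "C"))
        return gc / len(self.bases)


# code that only uses the interface does not care which one it gets
for s in (StringSeq("AACGT"), ListSeq("AACGT")):
    print(type(s).__name__, s.gc_content())   # both print 0.4
```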

#### Let's think about a gene and basic information we want to store about it.
We could make a dictionary:

```python

{"symbol": "Gene1", "seq": "AACGT"}

```


This structure works, but as we add more fields, and functions that only make sense for genes, it becomes hard to keep track of which data and which functions belong together.
It makes sense to wrap all of this into a single object, which is easier to manage and to extend with new functionality.
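Here is a small sketch of the dictionary approach. The helper `gene_startseq` is hypothetical. The data lives in the dictionary and the behaviour lives in a separate function. Only a naming convention links them, and nothing fixes which keys a gene dictionary must have.

```python
gene = {"symbol": "Gene1", "seq": "AACGT"}


def gene_startseq(gene_dict, n):
    """Hypothetical helper: first n bases of a gene stored as a dict."""
    # relies on the caller having used exactly the key "seq"
    return gene_dict["seq"][:n]


print(gene_startseq(gene, 3))   # AAC

# a second gene built with a slightly different key breaks the helper
other = {"symbol": "Gene2", "sequence": "GGTCA"}
try:
    gene_startseq(other, 3)
except KeyError as err:
    print("KeyError:", err)     # KeyError: 'seq'
```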

#### Let's make a `Gene` object 

- <font color = "red">Use the `__init__` method to define the instance attributes</font>
- Write functions in the class for extra functionality
- Use `self` to refer to the object you are creating



In [4]:
class Gene:
    pass

In [5]:
dir(Gene)

['__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__']

In [10]:
g = Gene()
g

<__main__.Gene at 0x7fb471a400a0>

In [None]:
# the double underscore (dunder) methods implement operators and
# built-in functions for that type of object

# for instance
# - if we implement __add__, we can add objects with obj1 + obj2
# - if we implement __len__, we can call len(obj) on the object

# __init__ is the special dunder method Python calls to initialize a new object
# when the class name is called like a function, e.g. list() or Gene()
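A minimal sketch of the two examples in the comments above. `Fragment` is a hypothetical class used only to show that `len()` and `+` work once `__len__` and `__add__` are defined.

```python
class Fragment:
    """Hypothetical minimal class, used only to show two operator dunders."""

    def __init__(self, seq):
        self.seq = seq

    def __len__(self):
        # len(fragment) calls this
        return len(self.seq)

    def __add__(self, other):
        # fragment + other calls this and uses whatever it returns
        return Fragment(self.seq + other.seq)


a = Fragment("AAC")
b = Fragment("GT")
print(len(a))        # 3
print((a + b).seq)   # AACGT
```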

In [8]:
# dir("test")

In [9]:
class Gene:
    def __init__(self, psymbol = "Gene1", pseq = "AACGT"):
        print("called init")
        self.symbol = psymbol
        self.sequence = pseq
        
    def __str__(self):
        return self.symbol
    
    def __repr__(self):
        return "Gene('" + self.symbol + "','" + self.sequence + "')"
    
    def get_startseq(n):  # deliberately missing `self`: see the errors below
        return self.sequence[:n]
  

In [10]:
g = Gene(psymbol="Gene100")
g

called init


Gene('Gene100','AACGT')

In [11]:
str(g)

'Gene100'

In [12]:
g.get_startseq(4)

TypeError: get_startseq() takes 1 positional argument but 2 were given

In [13]:
g.get_startseq()

NameError: name 'self' is not defined

In [6]:
o = eval("Gene('Gene100','AACGT')")
type(o)

__main__.Gene
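The `eval` call works because `__repr__` returns text that looks like the call that built the object. The sketch below shows a variant of `__repr__` that some people prefer; it is not the one used in these cells. It uses `!r` inside an f-string, which calls `repr()` on each value and so adds and escapes the string quotes for you. Note that this version puts a space after the comma.

```python
class Gene:
    def __init__(self, psymbol="Gene1", pseq="AACGT"):
        self.symbol = psymbol
        self.sequence = pseq

    def __repr__(self):
        # !r formats each value with repr(), so string quotes are added for us
        return f"Gene({self.symbol!r}, {self.sequence!r})"


g = Gene("Gene100", "AACGT")
text = repr(g)
print(text)                           # Gene('Gene100', 'AACGT')

rebuilt = eval(text)                  # builds a new, separate Gene from its repr
print(rebuilt.symbol, rebuilt.sequence)  # Gene100 AACGT
print(rebuilt is g)                   # False
```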

In [3]:
str(g)

'Gene100'

In [4]:
Gene('Gene100','AACGT')

Gene('Gene100','AACGT')

In [7]:
g.symbol

'Gene100'

In [8]:
g.sequence

'AACGT'

In [24]:
class Gene:
    def __init__(self, psymbol, pseq):
        self.symbol = psymbol
        self.sequence = pseq
        
    def __str__(self):
        return self.symbol
    
    def __repr__(self):
        return "Gene('" + self.symbol + "','" + self.sequence + "')"
    
    def get_startseq(n):  # deliberately missing `self`: see the errors below
        return self.sequence[:n]
  

In [27]:
g = Gene("Gene_example", "AAACGGGTTTTTT")
g

Gene('Gene_example','AAACGGGTTTTTT')

In [28]:
dir(g)

['__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 'get_startseq',
 'sequence',
 'symbol']

In [29]:
g.sequence

'AAACGGGTTTTTT'

In [30]:
g.symbol

'Gene_example'

In [32]:
g.pseq

AttributeError: 'Gene' object has no attribute 'pseq'
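`pseq` was only the name of a parameter inside `__init__`. What survives on the object are the names used after `self.`. The built-in `vars()` shows an instance's attributes as a dictionary:

```python
class Gene:
    def __init__(self, psymbol, pseq):
        self.symbol = psymbol     # attribute name: symbol
        self.sequence = pseq      # attribute name: sequence ('pseq' is only a local name)


g = Gene("Gene_example", "AAACGGGTTTTTT")
print(vars(g))             # {'symbol': 'Gene_example', 'sequence': 'AAACGGGTTTTTT'}
print(hasattr(g, "pseq"))  # False
```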

In [34]:
g.get_startseq(4)

TypeError: get_startseq() takes 1 positional argument but 2 were given

In [35]:
Gene.get_startseq(g,4)

TypeError: get_startseq() takes 1 positional argument but 2 were given

In [36]:
Gene.get_startseq(4)

NameError: name 'self' is not defined

In [21]:
class Gene:
    def __init__(self, psymbol, pseq):
        self.symbol = psymbol
        self.sequence = pseq
        
    def __str__(self):
        return self.symbol
    
    def __repr__(self):
        return "Gene('" + self.symbol + "','" + self.sequence + "')"
    
    def get_startseq(self, n):
        return self.sequence[:n]

In [22]:
g = Gene("testG", "GAGTTCCCAA")

In [23]:
g

Gene('testG','GAGTTCCCAA')

In [24]:
g.get_startseq(6)

'GAGTTC'

In [25]:
import builtins  # standard module name; the __builtin__ alias only exists in IPython/Jupyter
dir(builtins)

['ArithmeticError',
 'AssertionError',
 'AttributeError',
 'BaseException',
 'BlockingIOError',
 'BrokenPipeError',
 'BufferError',
 'ChildProcessError',
 'ConnectionAbortedError',
 'ConnectionError',
 'ConnectionRefusedError',
 'ConnectionResetError',
 'EOFError',
 'Ellipsis',
 'EnvironmentError',
 'Exception',
 'False',
 'FileExistsError',
 'FileNotFoundError',
 'FloatingPointError',
 'GeneratorExit',
 'IOError',
 'ImportError',
 'IndentationError',
 'IndexError',
 'InterruptedError',
 'IsADirectoryError',
 'KeyError',
 'KeyboardInterrupt',
 'LookupError',
 'MemoryError',
 'ModuleNotFoundError',
 'NameError',
 'None',
 'NotADirectoryError',
 'NotImplemented',
 'NotImplementedError',
 'OSError',
 'OverflowError',
 'PermissionError',
 'ProcessLookupError',
 'RecursionError',
 'ReferenceError',
 'RuntimeError',
 'StopAsyncIteration',
 'StopIteration',
 'SyntaxError',
 'SystemError',
 'SystemExit',
 'TabError',
 'TimeoutError',
 'True',
 'TypeError',
 'UnboundLocalError',
 ...]

---
We need to take a second to talk about 3 things real quick:
1. Functions defined inside a `class` (like `get_startseq`) are called ***methods*** or procedural attributes
2. `self.symbol` and `self.sequence` are called ***attributes*** (data attributes), since they only hold data
3. What in the world is `self`?

**PS**: `self` is the first parameter of every method. It refers to the current *instance* of the class, which is how an object looks back at itself. When you call a method on an instance, as in `g.get_startseq(6)`, Python passes the instance in as `self` automatically. Outside of writing the class, you never pass `self` yourself.
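One way to see what `self` is: calling a method on an instance is equivalent to calling the function stored on the class and passing the instance as the first argument. This also explains the earlier `TypeError`s. The broken `get_startseq(n)` received the instance as `n`, plus one argument it had no parameter for.

```python
class Gene:
    def __init__(self, psymbol, pseq):
        self.symbol = psymbol
        self.sequence = pseq

    def get_startseq(self, n):
        return self.sequence[:n]


g = Gene("testG", "GAGTTCCCAA")

# usual call: Python passes g as `self` automatically
print(g.get_startseq(6))        # GAGTTC

# the same call spelled out through the class, passing the instance by hand
print(Gene.get_startseq(g, 6))  # GAGTTC
```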

Let's use our new class.

In [26]:
dir(Gene)

['__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 'get_startseq']

In [27]:
type(Gene)

type

In [28]:
dir(g)

['__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 'get_startseq',
 'sequence',
 'symbol']
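Comparing the two listings: `get_startseq` is defined on the class, so it appears in both. `symbol` and `sequence` only appear on an instance, because `__init__` creates them when the object is built. The check below uses the same class as above:

```python
class Gene:
    def __init__(self, psymbol, pseq):
        self.symbol = psymbol
        self.sequence = pseq

    def get_startseq(self, n):
        return self.sequence[:n]


g = Gene("gsymbol", "GGTCAG")
for name in ("get_startseq", "symbol", "sequence"):
    # methods are defined on the class; data attributes only appear
    # on an instance once __init__ has assigned them
    print(name, name in dir(Gene), name in dir(g))
```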

In [29]:
g = Gene()

TypeError: __init__() missing 2 required positional arguments: 'psymbol' and 'pseq'

In [30]:
g = Gene("gsymbol", "GGTCAG")

In [31]:
g

Gene('gsymbol','GGTCAG')

In [32]:
str(g)

'gsymbol'

In [33]:
g.sequence

'GGTCAG'

In [34]:
g.symbol = "Gene1"
g

Gene('Gene1','GGTCAG')

In [42]:
class Gene:
    def __init__(self, psymbol, pseq):
        self.symbol = psymbol
        self.sequence = pseq
        self.length = len(self.sequence)
        
    def __str__(self):
        return self.symbol
    
    def __repr__(self):
        return "Gene('" + self.symbol + "','" + self.sequence + "')"
    
    def __len__(self):
        return len(self.sequence)
    
    def get_startseq(self, n):
        return self.sequence[:n]

In [43]:
g = Gene("gsymbol", "GGTCAGAACT")

In [44]:
g

Gene('gsymbol','GGTCAGAACT')

In [45]:
len(g)

10

In [46]:
g.length 

10

In [47]:
g.length = 100

In [48]:
g

Gene('gsymbol','GGTCAGAACT')

In [49]:
g.length

100
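The `g.length = 100` cell shows one problem with storing `length` as a plain attribute. A second problem appears when the sequence changes after creation. The value computed in `__init__` is never updated, while `__len__` recomputes the length on every call. The class below is the one above, minus the methods not needed here:

```python
class Gene:
    def __init__(self, psymbol, pseq):
        self.symbol = psymbol
        self.sequence = pseq
        self.length = len(self.sequence)   # computed once, when the object is created

    def __len__(self):
        return len(self.sequence)          # computed again on every len() call


g = Gene("gsymbol", "GGTCAGAACT")
g.sequence = "GGT"         # change the data after creation
print(len(g), g.length)    # 3 10 -> the stored attribute is now stale
```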

____

#### Let's add more functionality to the `Gene` class
- Implement specific dunder methods for added functionality 
    - each dunder method's name indicates the function or operator it supports (e.g. `__len__` for `len()`, `__add__` for `+`)
- Use `@property` to set up read-only attributes
- Add documentation using docstrings

In [50]:
# f-strings allow us to include values from different types of objects in our string

"test" + str(1)

'test1'

In [51]:
f"test {1}"

'test 1'

In [52]:
f"test {[1,2,3]}"

'test [1, 2, 3]'

In [53]:
g

Gene('gsymbol','GGTCAGAACT')

In [54]:
f"test {g}"

'test gsymbol'

In [55]:
f"test {g.sequence}"

'test GGTCAGAACT'
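Inside the braces, an f-string uses `__str__` by default. Adding the `!r` conversion asks for `__repr__` instead, and a format spec after `:` controls the layout. The example below reuses the same `__str__` and `__repr__` as the earlier `Gene` cells:

```python
class Gene:
    def __init__(self, psymbol, pseq):
        self.symbol = psymbol
        self.sequence = pseq

    def __str__(self):
        return self.symbol

    def __repr__(self):
        return "Gene('" + self.symbol + "','" + self.sequence + "')"


g = Gene("gsymbol", "GGTCAGAACT")
print(f"test {g}")                      # test gsymbol  (uses __str__)
print(f"test {g!r}")                    # test Gene('gsymbol','GGTCAGAACT')  (!r uses __repr__)
print(f"length: {len(g.sequence):>4}")  # format spec :>4 right-aligns in 4 characters
```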

In [65]:
# Gene object
class Gene:
    """
    Contains the information about a Gene such as the symbol, description,
    exon number, and sequence 
    
    Attributes:
    symbol (str):  gene symbol
    description (str):  gene description
    exon_no (int): total number of exons for the gene
    sequence (str): the gene sequence
    status (str): the gene status, one of 'new', 'current', 'deprecated', 'in process' 
    
    Methods: 
    update_status: updates gene status to a new given status
    """
    def __init__(self, psymbol = "Gene1", pdesc = "Gene for testing", 
                 pexon_no = 1, pseq = "AACGT"):
        self.symbol = psymbol
        self.description = pdesc
        self.exon_no = pexon_no
        self.sequence = pseq
        self.__status = "current"
        
    def __str__(self):
        print("called str")
        return f"Gene('{self.symbol}','{self.description}',{self.exon_no},'{self.sequence}')"
    
    def __repr__(self):
        print("called repr")
        return f"Gene('{self.symbol}','{self.description}',{self.exon_no},'{self.sequence}')"

    def __len__(self):
        return len(self.sequence)
    
    def __add__(self, gene):
        new_gene = Gene(self.symbol + gene.symbol,
                    self.description + gene.description,
                    self.exon_no + gene.exon_no,
                    self.sequence + gene.sequence)
        new_gene.update_status("new")
        return new_gene

        
    @property # getter
    def status(self):
        """
        Get the status for the gene
        """
        return self.__status
    
    # @status.setter # property setter - same name as the property
    # def status(self, pstatus):
    #    self.__status = pstatus
             
    def update_status(self, pstatus):
        """
        Updates the status of a gene
        """
        self.__status = pstatus
        
        

In [66]:
# Use Gene - explore
# create/init Gene objects, we need the __init__ method
# The constructor without arguments works if we have default values
# The cell will use the __repr__ method to display the representation of the object

g = Gene()
g

called repr


Gene('Gene1','Gene for testing',1,'AACGT')

In [67]:
len(g)

5

In [68]:
g.status

'current'

In [69]:
g.update_status("deprecated")
print(g.status)
g.status = "current"
g.status


deprecated


AttributeError: can't set attribute

In [70]:
g.status

'deprecated'

In [71]:
dir(g)

['_Gene__status',
 '__add__',
 '__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__len__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 'description',
 'exon_no',
 'sequence',
 'status',
 'symbol',
 'update_status']
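The docstrings in the class above are more than comments. Python stores them on the class and its members, and `help(Gene)` displays them. The sketch below is a trimmed-down version of the class (only `symbol` and `status` kept) that reads them back with the standard library's `inspect.getdoc`:

```python
import inspect


class Gene:
    """Contains the information about a Gene such as the symbol."""

    def __init__(self, psymbol="Gene1"):
        self.symbol = psymbol
        self.__status = "current"

    @property
    def status(self):
        """Get the status for the gene"""
        return self.__status


# inspect.getdoc returns a docstring with its indentation cleaned up
print(inspect.getdoc(Gene))          # the class docstring
print(inspect.getdoc(Gene.status))   # the property uses its getter's docstring
```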

In [72]:
# Gene object
class Gene:
    """
    Contains the information about a Gene such as the symbol, description,
    exon number, and sequence 
    
    Attributes:
    symbol (str):  gene symbol
    description (str):  gene description
    exon_no (int): total number of exons for the gene
    sequence (str): the gene sequence
    status (str): the gene status, one of 'new', 'current', 'deprecated', 'in process' 
    
    Methods: 
    update_status: updates gene status to a new given status
    """
    def __init__(self, psymbol = "Gene1", pdesc = "Gene for testing", 
                 pexon_no = 1, pseq = "AACGT"):
        self.symbol = psymbol
        self.description = pdesc
        self.exon_no = pexon_no
        self.sequence = pseq
        self.__status = "current"
        
    def __str__(self):
        print("called str")
        return f"Gene('{self.symbol}','{self.description}',{self.exon_no},'{self.sequence}')"
    
    def __repr__(self):
        print("called repr")
        return f"Gene('{self.symbol}','{self.description}',{self.exon_no},'{self.sequence}')"

    def __len__(self):
        return len(self.sequence)
    
    def __add__(self, gene):
        new_gene = Gene(self.symbol + gene.symbol,
                    self.description + gene.description,
                    self.exon_no + gene.exon_no,
                    self.sequence + gene.sequence)
        new_gene.update_status("new")
        return new_gene

        
    @property # getter
    def status(self):
        """
        Get the status for the gene
        """
        return self.__status
    
    @status.setter # property setter - same name as the property
    def status(self, pstatus):
        self.__status = pstatus
             
    def update_status(self, pstatus):
        """
        Updates the status of a gene
        """
        self.__status = pstatus
        
        

In [74]:
g = Gene()
g

called repr


Gene('Gene1','Gene for testing',1,'AACGT')

In [75]:
print(g)

called str
Gene('Gene1','Gene for testing',1,'AACGT')


In [76]:
g.status

'current'

In [77]:
g.status = "new"

In [78]:
g

called repr


Gene('Gene1','Gene for testing',1,'AACGT')

In [79]:
g.status

'new'
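One reason to route assignment through a setter, instead of using a plain attribute, is that the setter can check the value. The class docstring lists the allowed statuses, but the setter above accepts any string. The sketch below adds a validating setter. The `VALID_STATUSES` name is hypothetical, and raising `ValueError` is a design choice, not something the original class does.

```python
class Gene:
    # assumption: the allowed values are the ones listed in the class docstring
    VALID_STATUSES = ("new", "current", "deprecated", "in process")

    def __init__(self, psymbol="Gene1"):
        self.symbol = psymbol
        self.__status = "current"

    @property
    def status(self):
        return self.__status

    @status.setter
    def status(self, pstatus):
        # reject anything outside the documented set of statuses
        if pstatus not in Gene.VALID_STATUSES:
            raise ValueError(f"invalid status: {pstatus!r}")
        self.__status = pstatus


g = Gene()
g.status = "deprecated"
print(g.status)          # deprecated

try:
    g.status = "retired"
except ValueError as err:
    print(err)           # invalid status: 'retired'
```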

In [80]:
g._Gene__status

'new'

In [81]:
g.__status

AttributeError: 'Gene' object has no attribute '__status'
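The last two cells show name mangling at work. Inside a class body, an attribute name that starts with two underscores (and does not end with two) is stored under `_ClassName__name`. A minimal class is enough to see it:

```python
class Gene:
    def __init__(self):
        # stored on the instance as _Gene__status ("name mangling")
        self.__status = "current"


g = Gene()
print(vars(g))                 # {'_Gene__status': 'current'}
print(hasattr(g, "__status"))  # False
print(g._Gene__status)         # current -> still reachable, so this is not true privacy
```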

In [82]:
# Gene object
class Gene:
    """
    Contains the information about a Gene such as the symbol, description,
    exon number, and sequence 
    
    Attributes:
    symbol (str):  gene symbol
    description (str):  gene description
    exon_no (int): total number of exons for the gene
    sequence (str): the gene sequence
    status (str): the gene status, one of 'new', 'current', 'deprecated', 'in process' 
    
    Methods: 
    update_status: updates gene status to a new given status
    """
    def __init__(self, psymbol = "Gene1", pdesc = "Gene for testing", 
                 pexon_no = 1, pseq = "AACGT"):
        self.symbol = psymbol
        self.description = pdesc
        self.exon_no = pexon_no
        self.sequence = pseq
        self.__status = "current"
        
    def __str__(self):
        print("called str")
        return f"Gene('{self.symbol}','{self.description}',{self.exon_no},'{self.sequence}')"
    
    def __repr__(self):
        print("called repr")
        return f"Gene('{self.symbol}','{self.description}',{self.exon_no},'{self.sequence}')"

    def __len__(self):
        return len(self.sequence)
    
    def __add__(self, gene):
        new_gene = Gene(self.symbol + gene.symbol,
                    self.description + gene.description,
                    self.exon_no + gene.exon_no,
                    self.sequence + gene.sequence)
        new_gene.update_status("new")
        return new_gene

        
    @property # getter
    def status(self):
        """
        Get the status for the gene
        """
        return self.__status
    
    @status.setter # property setter - same name as the property
    def status(self, pstatus):
        self.__status = pstatus
        
    @property # getter
    def length(self):
        """
        Get the length of the gene sequence
        """
        return len(self.sequence)
             
    def update_status(self, pstatus):
        """
        Updates the status of a gene
        """
        self.__status = pstatus
        
        

In [83]:
g = Gene()

In [84]:
g

called repr


Gene('Gene1','Gene for testing',1,'AACGT')

In [85]:
g.length

5

In [86]:
g.length = 100

AttributeError: can't set attribute

In [87]:
g.sequence = "AACGGTTAGTTT"
g.length

12

_____

See, I didn't need to use `self` on the outside.
_____

Wait, with just that, we made a new object? I don't believe you...

In [89]:
# what did we create? type()

type(g)


__main__.Gene

In [90]:
# the class itself is also an object: its type is `type`

type(Gene)



type
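The same checks can be done with `is` comparisons, using a bare throwaway `Gene` class. An instance's type is its class, and the class's type is `type`.

```python
class Gene:
    pass


g = Gene()
print(type(g) is Gene)           # True: g is an instance of Gene
print(type(Gene) is type)        # True: the class Gene is itself an instance of type
print(isinstance(Gene, object))  # True: classes are objects too
```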

In [91]:
# dir
dir(g)


['_Gene__status',
 '__add__',
 '__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__le__',
 '__len__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 'description',
 'exon_no',
 'length',
 'sequence',
 'status',
 'symbol',
 'update_status']

In [94]:
# check attributes

g.exon_no


1

In [93]:
# check the status

g.status

'current'

In [95]:
# try to change the status
g.status = "new"


In [96]:
g.status

'new'

In [97]:
# Use the update_status method to change the status of the gene

g.update_status("deprecated")

In [98]:
g.status

'deprecated'

In [None]:
# get the gene length - works only if __len__ is implemented

In [99]:
len(g)

12

In [100]:
g.length

12

In [101]:
# add two genes - works only if __add__ is implemented

g + g


called repr


Gene('Gene1Gene1','Gene for testingGene for testing',2,'AACGGTTAGTTTAACGGTTAGTTT')

In [102]:
# multiply two genes - works only if __mul__ is implemented

g * g

TypeError: unsupported operand type(s) for *: 'Gene' and 'Gene'
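Python only knows how to multiply `Gene` objects if the class says how. Here is a hypothetical example; the meaning chosen (repeating the sequence) is an assumption for illustration, not an operation from the lecture. `__mul__` accepts an integer and returns `NotImplemented` for anything else. Python then raises the same `TypeError` as above.

```python
class Gene:
    def __init__(self, psymbol, pseq):
        self.symbol = psymbol
        self.sequence = pseq

    def __mul__(self, times):
        # hypothetical semantics: gene * n repeats the sequence n times
        if not isinstance(times, int):
            return NotImplemented       # lets Python raise the usual TypeError
        return Gene(self.symbol, self.sequence * times)


g = Gene("Gene1", "AC")
print((g * 3).sequence)    # ACACAC

try:
    g * g
except TypeError as err:
    print("TypeError:", err)
```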

#### Resources
https://docs.python.org/3/tutorial/classes.html     
https://docs.python.org/3/reference/datamodel.html      
https://python-textbok.readthedocs.io/en/1.0/Classes.html      
https://www.w3schools.com/python/python_classes.asp          
https://www.geeksforgeeks.org/python-classes-and-objects/          
https://www.tutorialspoint.com/python/python_classes_objects.htm   
https://python-course.eu/oop/inheritance.php     
https://www.geeksforgeeks.org/python-oops-concepts/      
https://gist.github.com/rurtubia/f5c506f414bb85efc4d8
