In [1]:
object

object

## Autometa 2.0 Design Foundations

### Aims: 


1. To learn Object Oriented Programming Basics
    1. Modularity. Structuring our code so it can be imported as a library and used by anyone. Making the code modular enables re-usability and minimizes duplication.
    1. Class vs. Instance Attributes. Class attributes belong to the class and are shared among all instances; instance attributes belong to a single object and are not shared.
    1. Inheritance. The capability of one class to derive (inherit) its properties from another class. This provides an additional layer of code re-usability and is transitive: if Class B inherits from Class A, then all subclasses of B automatically inherit from Class A as well.

### Towards a [programming paradigm](https://en.wikipedia.org/wiki/Programming_paradigm)

We are striving towards writing modular code, i.e. code that is easily used by others in our own or others' projects. It is important to note that Python is written entirely around objects. In fact, *object* is the root of all classes, and built-ins like list(), dict(), and set() are themselves classes whose instances are objects!

### Object-Oriented Programming (OOP)

We are planning on moving towards a _more_ Object-Oriented Programming (OOP) approach. So, what is OOP? It is a programming paradigm based on the concept of "objects". These objects can contain **data** and **code**, which correspond to **fields** and **methods**, respectively. Objects bring a great deal of flexibility when it comes to organizing our programs: we can group together the operations that apply to an object's fields. Each instance of a class also has its *own* state, meaning we can select and apply operations on different objects depending on their state. For example, assuming we have a class with a metagenome field, the state could record whether the metagenome has been filtered by length, whether any markers are present, whether a user has supplied their own set of markers, or whether ORFs have been called and annotated. The essential point is that all operations associated with a class can be organized together, which can equally help structure our thinking.
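As a minimal sketch of per-instance state, assume a hypothetical `Metagenome` class (the names `length_filtered`, `orfs_called`, and `next_step` are illustrative and not part of Autometa). Each instance tracks its own progress, so the operation to apply can be chosen from its state:

```python
class Metagenome:
    """Hypothetical class used only to illustrate per-instance state."""

    def __init__(self, assembly):
        # Fields (data): each instance gets its own values
        self.assembly = assembly
        self.length_filtered = False
        self.orfs_called = False

    def filter_by_length(self):
        # Method (code): changes the state of this instance only
        self.length_filtered = True

    def next_step(self):
        # Choose the next operation from the current state
        if not self.length_filtered:
            return "filter_by_length"
        if not self.orfs_called:
            return "call_orfs"
        return "done"


raw = Metagenome("sample_a.fasta")
filtered = Metagenome("sample_b.fasta")
filtered.filter_by_length()

print(raw.next_step())       # filter_by_length
print(filtered.next_step())  # call_orfs
```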

### Software architecture using Unified Modeling Language design approach

To organize our thinking, generally speaking we can identify class structure from the NSF project outline/requirements by:
- Nouns --> potential Classes, objects, fields
- Verbs --> potential methods or responsibilities of a class

A typical exercise when laying out a system's structure is writing down all of the classes and next to each class' name listing its responsibilities (problems to be solved; short verb phrases) and collaborators (other classes sent messages by this class).
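The noun/verb exercise can be sketched directly in code. The requirement sentence, class names, and method names below are made up for illustration; the point is only the mapping from nouns to classes, verbs to methods, and collaborators to the classes a method calls on:

```python
# Hypothetical requirement: "A user submits a metagenome; the metagenome's
# contigs are filtered by length and then binned."

class Contig:
    """Noun -> class. Responsibility: report its own length."""

    def __init__(self, name, sequence):
        self.name = name
        self.sequence = sequence

    def length(self):
        return len(self.sequence)


class Metagenome:
    """Noun -> class. Collaborator: Contig."""

    def __init__(self, contigs):
        self.contigs = list(contigs)

    def filter_by_length(self, cutoff):
        # Verb -> method. Delegates the length question to each Contig.
        return Metagenome(c for c in self.contigs if c.length() >= cutoff)

    def bin_contigs(self):
        # Verb -> method. Left as a placeholder in this sketch.
        raise NotImplementedError


mg = Metagenome([Contig("c1", "ACGT"), Contig("c2", "ACGTACGTAC")])
print([c.name for c in mg.filter_by_length(5).contigs])  # ['c2']
```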



## Appendix

### Functional Programming

- Pure Functions - do not have side effects, that is, they do not change the state of the program. Given the same input, a pure function will always produce the same output.
- Immutability - data cannot be changed after it is created
- Higher Order Functions - functions can accept other functions as parameters and functions can return new functions as output. This allows us to abstract over actions, giving us flexibility in our code's behavior.

### References:

- [PEP8](https://www.python.org/dev/peps/pep-0008/ "PEP8 Style Guide") Style Guide for Python Code
- [Understanding Code Reuse and Modularity in Python 3](https://www.geeksforgeeks.org/understanding-code-reuse-modularity-python-3/)
- [Class vs. Instance Attributes](https://www.geeksforgeeks.org/class-instance-attributes-python/)
- [properties vs. attributes](https://stackoverflow.com/questions/7374748/whats-the-difference-between-a-python-property-and-attribute)
- [Python property function](https://www.geeksforgeeks.org/python-property-function/)
- [Inheritance in python](https://www.geeksforgeeks.org/inheritance-in-python/)
- [Inheritance of protected and private class properties](https://stackoverflow.com/questions/20261517/inheritance-of-private-and-protected-methods-in-python)
- [Operator Overloading](https://www.programiz.com/python-programming/operator-overloading)
- [Functional Programming](https://stackabuse.com/functional-programming-in-python/#:~:targetText=Python%20is%20not%20a%20functional,for%20the%20task%20at%20hand.)
- [Lambda Functions](https://stackabuse.com/lambda-functions-in-python/)
- Blog post by Python Creator: [Origins of Python's "Functional" Features](http://python-history.blogspot.com/2009/04/origins-of-pythons-functional-features.html)
- [python mutable vs. immutable objects](https://towardsdatascience.com/https-towardsdatascience-com-python-basics-mutable-vs-immutable-objects-829a0cb1530a)

### Walk-through of OOP
#### 1.A Initializing and defining an object class

In Python, every class inherits from a built-in base class `object`. The `__init__` function of a class is invoked when we create an object instance of the class.

In [5]:
class Autometa:
    """docstring for Autometa Class."""
    
    def __init__(self, metagenome):
        """Autometa class *constructor* function

        Input:
            metagenome - the metagenome this instance will hold
        Returns:
            None (`__init__` only initializes the new instance)
        """
        print('Initialized Autometa Class Instance')
        self.metagenome = metagenome

In [6]:
autometa = Autometa('metagenome')
autometa?

Initialized Autometa Class Instance


#### 1.B Class vs Instance Attributes
##### Defining a class

In [39]:
class Autometa:
    """docstring for Autometa Class."""
    # Class attribute
    n_datasets = 0
    
    def __init__(self, metagenome):
        "Autometa class *constructor* function"
        print('Initialized Autometa Class Instance')
        self.metagenome = metagenome
        Autometa.n_datasets += 1
    
    def num_datasets(self):
        return self.n_datasets

##### Instantiating class objects

In [41]:
# Instantiate our class    
am1 = Autometa('metagenome1')
print(f"am1 (object) attribute call: {am1.num_datasets()}")


Initialized Autometa Class Instance
am1 (object) attribute call: 2


In [42]:
# Instantiate a second object
am2 = Autometa('metagenome2')
print(f"am2 (object) attribute call: {am2.num_datasets()}")
am2.metagenome

Initialized Autometa Class Instance
am2 (object) attribute call: 3


'metagenome2'

In [43]:
# Check the class attribute n_datasets
print(f"am1 (object) attribute call: {am1.num_datasets()}")
print(f"Autometa (class) attribute call: {Autometa.n_datasets}")

am1 (object) attribute call: 3
Autometa (class) attribute call: 3


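Notice that `am1` now reports a larger count than it did when it was first created: instantiating `am2` incremented the value seen by `am1` as well. (The first call above does not print 1, presumably because instances had already been created by an earlier run of these cells in the same session.)
This is due to the `n_datasets` variable being a class attribute rather than an instance attribute.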

To define an instance attribute, you must refer to the instance itself. By convention the instance is referenced as `self`. Any name may be used as long as it is used consistently, but just use `self`.
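One consequence worth seeing in a small sketch: assigning to a class attribute through an instance does not update the shared value. The `Counter` class below is hypothetical and exists only to show the difference:

```python
class Counter:
    """Hypothetical class: `total` is a class attribute shared by all instances."""
    total = 0

    def __init__(self, label):
        self.label = label      # instance attribute: unique to each object
        Counter.total += 1      # updates the shared class attribute


first = Counter("first")
second = Counter("second")
print(first.total, second.total, Counter.total)  # 2 2 2

# Assigning through an instance does NOT update the class attribute.
# It creates a new instance attribute that shadows it on `first` only.
first.total = 99
print(first.total, second.total, Counter.total)  # 99 2 2
```

This is why the constructor above increments `Autometa.n_datasets` rather than `self.n_datasets`.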

#### 1.C Inheritance (subclassing)

- Single inheritance: when a class inherits from only one parent class
- Multiple inheritance: when a class inherits from multiple classes (like C++, unlike Java)

As was mentioned above, in Python every class inherits from a built-in base class called `object`. The `__init__` function of a class is invoked when we create an object instance of the class. In Python 3.x, when you define a class like `class Autometa:`, the implicit declaration is `class Autometa(object):` where `object` is this root class.
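A quick way to confirm this in Python 3 is to inspect the `__bases__` attribute of two otherwise identical classes (the class names here are placeholders):

```python
class Implicit:
    pass


class Explicit(object):
    pass


# Both declarations produce a class whose only base is `object`
print(Implicit.__bases__)            # (<class 'object'>,)
print(Explicit.__bases__)            # (<class 'object'>,)
print(issubclass(Implicit, object))  # True
```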

You will notice the `super` function in the examples that follow. This is a built-in function in Python that gives access to the parent class's methods and attributes.

##### Example of single inheritance and ability to create private instance variables

In [51]:
class A():
    def __init__(self, name):
        self.name = name
    def get_name(self):
        # Make sure you have self here
        return self.name
a = A(name='Jerry')
a.get_name()

'Jerry'

In [None]:
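class User:
    def __init__(self, name='JD'):
        # Leading underscore marks the attribute as private by convention
        self._name = name

    @property
    def name(self):
        """Getter: called when reading user.name"""
        return self._name

    @name.setter
    def name(self, value):
        # Setter: called on user.name = value
        self._name = value

    @name.deleter
    def name(self):
        # Deleter: called on del user.name
        del self._name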

In [64]:
import uuid

class User(object):
    """Docstring for User Class"""
    n_users = 0
    users = []

    def __init__(self, name='Jane Doe', 
                 affiliation='UW-Madison', country='USA'):
        self.name = name
        self.affiliation = affiliation
        self.country = country
        self._countries = [self.country]
        self.id = uuid.uuid1()
        #Class attribute
        User.n_users += 1
        User.users.append(self.name)

class Researcher(User):
    """Docstring for Researcher Class"""
    def __init__(self, name, affiliation, country, dataset):
        super().__init__(name, affiliation, country)
        self.dataset = dataset
        
    def get_dataset(self):
        return self.dataset

class Admin(User):
    """Docstring for Admin Class"""
    def __init__(self, name, affiliation, country):
        super().__init__(name, affiliation, country)
    
    def n_users(self):
        """Returns the number of users (admins and researchers) of autometa"""
        return super().n_users
    def get_users(self):
        """Returns a list of the names of all users of autometa"""
        return super().users


In [65]:
# Instantiate some researchers and an Admin
jason = Admin(name='jason', affiliation='UW-Madison', country='USA')
jane = Researcher(name='jane',affiliation='UW-Madison', country='USA', dataset='cranberries.fasta')
jeremy = Researcher(name='jeremy',affiliation='UW-Madison', country='USA', dataset='mendota.fasta')
janick = Researcher(name='janick',affiliation='Uppsala', country='Sweden', dataset='alps.fasta')
james = Researcher(name='james',affiliation='UW-Milwaukee', country='USA', dataset='monona.fasta')

#### Some methods and attributes are now specific to their respective class

Use Cases:
- Teachers with class (for Tiny Earth Initiative)
- Admin at a University
- PI managing cluster or server

##### Some examples: Different methods and attributes are now available to Admin

In [66]:
jason.get_users()

['jason', 'jane', 'jeremy', 'janick', 'james']

In [67]:
jason.get_dataset()

AttributeError: 'Admin' object has no attribute 'get_dataset'

In [68]:
jason.n_users()

5

In [69]:
jane.get_users()

AttributeError: 'Researcher' object has no attribute 'get_users'

In [70]:
jane.get_dataset()

'cranberries.fasta'

In [71]:
jeremy.name

'jeremy'

In [81]:
# vars(jane)
dir(jason)
# jason.get_users?

These are examples of single inheritance, i.e. Admin and Researcher each inherit from only one parent, the User class.

However, Python is also capable of multi-level inheritance, allowing derived classes to be subclassed (inherited from) just like their parent class. These classes are defined as above, except that the `object` in the parentheses is replaced by the previously defined class. This may be helpful when organizing relations of data to one another. For example, we should not have to re-write multiple methods to access a MAG, Metagenome, or Pangenome's GC%, contigs, markers, etc.

Topic for discussion: 

Whether we should implement a base class *similar* to SeqIO or AntiSMASH, where there is a base class for SeqRecords, ORFs, Features, etc.

##### Demonstration of Multilevel inheritance

In [82]:
# A demonstration of multi-level inheritance  

# Base or Super class. (Note object in parentheses)
class Project(object):
    # Constructor
    def __init__(self, user):
        self.user = user

    # To get username
    def get_user(self):
        return self.user
  
  
# Inherited or Sub class (Note Project in parentheses)
class Metagenome(Project):
    # Constructor
    def __init__(self, user, dataset):
        # Equivalent to: Project.__init__(self, user)
        super().__init__(user)
        # At this stage, we have now inherited get_user
        self.dataset = dataset
        self.n50 = 0
        self.contigs = dict()
    
    # To get dataset
    def get_dataset(self):
        return self.dataset
    
    def get_n50(self):
        return self.n50
    
    def gc_content(self):
        return 'GC%'
    
    def get_contigs(self):
        return self.contigs
    
# Inherited or Sub class (Note Metagenome in parentheses)
class Mag(Metagenome):

    # Constructor
    def __init__(self, user, dataset):
        # Equivalent to: Metagenome.__init__(self, user, dataset)
        super().__init__(user, dataset)


In [83]:
# Checking method resolution order
Mag.mro()

[__main__.Mag, __main__.Metagenome, __main__.Project, object]

In [84]:
# Main Code
mag = Mag('Evan', 'metagenome.fasta')

# We can now call functions previously defined in other classes from the Mag class. These are inherited methods
print(
    f"User: {mag.get_user()}\n"
    f"Dataset: {mag.get_dataset()}\n"
    f"Contigs: {mag.get_contigs()}\n"    
    f"N50: {mag.get_n50()}\n"
)

User: Evan
Dataset: metagenome.fasta
Contigs: {}
N50: 0



In the case of the previously outlined inheritance scheme of `Project > Metagenome > Mag`, the usage may not be as clear. 

For handling our metagenomes, it may make more sense to have a base Contig class, or something of the like.

However, an appropriate multi-level inheritance use case could be dependency checking and exception handling.

Briefly, we could have a Dependency class whose children are Database and Executable classes. These classes inherit sanity checks (Dependency methods) that ensure they are appropriate for the autometa pipeline. Children of the Database class may then be Markers, NCBI, GTDBK, etc. This way, any base-level checks required by the Dependency class must be satisfied when calling any of the children that inherit from it.

When *any* new database file or executable is added, it will need to pass the same checks: that it is accessible, that the file exists, and that it is formatted correctly.
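A minimal sketch of that hierarchy might look like the following. Every class, method name, and path here is an assumption made for illustration, not an existing Autometa API; the only checks shown are "is the executable on PATH" and "does a non-empty file exist":

```python
import os
import shutil


class Dependency:
    """Hypothetical base class holding checks shared by all dependencies."""

    def __init__(self, name):
        self.name = name

    def is_available(self):
        # Subclasses decide what "available" means
        raise NotImplementedError

    def check(self):
        # Base-level sanity check inherited by every child class
        if not self.is_available():
            raise FileNotFoundError(f"Missing dependency: {self.name}")
        return True


class Executable(Dependency):
    def is_available(self):
        # shutil.which returns None when the program is not on PATH
        return shutil.which(self.name) is not None


class Database(Dependency):
    def __init__(self, name, path):
        super().__init__(name)
        self.path = path

    def is_available(self):
        # The file must exist and be non-empty
        return os.path.isfile(self.path) and os.path.getsize(self.path) > 0


class Markers(Database):
    """Inherits Database.is_available and Dependency.check unchanged."""


# Placeholder path that is not expected to exist
markers = Markers("markers", "/hypothetical/path/markers.hmm")
try:
    markers.check()
except FileNotFoundError as err:
    print(err)  # Missing dependency: markers
```

Because `check` lives on the base class, a new database or executable subclass only has to describe what "available" means for it.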

#### 1.D Operator Overloading: converting an object instance to a string

In [86]:
print(jason)

<__main__.Admin object at 0x10abd2950>


In [87]:
class Autometa:
    def __init__(self, metagenome):
        print('Init Autometa Class')
        self.metagenome = metagenome

    def __str__(self):
        return f"{self.metagenome}"
    
# Instantiate our class    
autometa = Autometa('metagenome')
# Use the print or str operator
print(f'str operator: {str(autometa)}')
print(f'print operator: {autometa}')

Init Autometa Class
str operator: metagenome
print operator: metagenome
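
`__str__` is one of many special methods. As a sketch of overloading other built-in operations, a hypothetical `Bin` class (not an Autometa class) could define `__len__` and `__add__` so that `len()` and `+` work on its instances:

```python
class Bin:
    """Hypothetical container of contig names."""

    def __init__(self, contigs):
        self.contigs = list(contigs)

    def __len__(self):
        # Called by len(some_bin)
        return len(self.contigs)

    def __add__(self, other):
        # Called by bin_a + bin_b; returns a new Bin and leaves both inputs unchanged
        if not isinstance(other, Bin):
            return NotImplemented
        return Bin(self.contigs + other.contigs)

    def __str__(self):
        # Called by str(some_bin) and print(some_bin)
        return f"Bin({len(self)} contigs)"


merged = Bin(["c1", "c2"]) + Bin(["c3"])
print(len(merged))  # 3
print(merged)       # Bin(3 contigs)
```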


In [88]:
print(object)

<class 'object'>


# Appendix

#### Example of Functional Programming

##### Example of a Pure Function:
That is, the input is not changed, and the same input always produces the same output.

In [None]:
# Pure Function
def multiply_2_pure(numbers):
    new_numbers = []
    for n in numbers:
        new_numbers.append(n * 2)
    return new_numbers

original_numbers = [1, 3, 5, 10]
changed_numbers = multiply_2_pure(original_numbers)
print(original_numbers) # [1, 3, 5, 10]
print(changed_numbers)

##### Basic example of Mutability vs. Immutability

In [None]:
mutable = ['mixed_data', 12.55, ['mutable','data','structure']]
immutable = ('mixed_data', 12.55, ['mutable','data','structure'])

# Reading from either data type is essentially the same:
print(mutable[2])    # ['mutable', 'data', 'structure']
print(immutable[2])  # ['mutable', 'data', 'structure']

# Changing collection value from 12.55 to 15
mutable[1] = 15

# This fails with the tuple: raises TypeError
immutable[1] = 15

- We **can** change the **contents** of a mutable object stored in a tuple
- We **cannot** change which object the tuple **references** at a given position
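
Both points can be checked with a short sketch (the tuple contents are placeholder values):

```python
record = ('metagenome.fasta', ['contig_1', 'contig_2'])
list_id = id(record[1])

# Changing the contents of the list inside the tuple is allowed
record[1].append('contig_3')
print(record)  # ('metagenome.fasta', ['contig_1', 'contig_2', 'contig_3'])
print(id(record[1]) == list_id)  # True: still the same list object

# Replacing the reference stored in the tuple is not allowed
try:
    record[1] = ['other_contig']
except TypeError as err:
    print(f"TypeError: {err}")
```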

##### Example of immutability
When all of the elements in an immutable data structure are themselves immutable (doubly immutable, if you will), the data structure is unaffected by later updates to the variables that were used to build it. Each update below rebinds the variable to a new object, while the tuple keeps referencing the original objects.

In [None]:
objs = ('this','is','immutable')
truth = True
other_variable = 24

double_immutable = (truth, objs, other_variable)

truth = False
other_variable += 10
objs += ('more','stuff','here')

print(double_immutable)

##### Example of mutability
When some of the elements in an immutable data structure are mutable, those elements can still be mutated in place. Below, `objs += ...` extends the list in place, so the change is visible through the tuple. This can cause some unexpected behavior!

In [None]:
objs = ['this','is','mutable']
truth = True
other_variable = 24

immutable = (truth, objs, other_variable)

truth = False
other_variable += 10
objs += ('more','stuff','here')

print(immutable)


In [None]:
# Example Usage for immutable collection: User Setting Autometa Parameters
parameters = ('metagenome.fasta', 'coverage_table', 'length_cutoff')

# Example Usage for mutable collection: 

#### Higher Order Functions:

##### Accept a function as input

In [None]:
def hof_write_repeat(message, n, action):
    for i in range(n):
        action(message)

prog = 'diamond'
hof_write_repeat(f'AutometaCheckpoint: {prog}', 2, print)

import logging
# Log the output as an error instead
hof_write_repeat(f'AutometaDebug: {prog}', 2, logging.error)

##### Return a function as output for further processing (these are similar to decorators)

In [None]:
from Bio import SeqIO

def hof_length_filter(cutoff):
    # Create a function that keeps only contigs at least `cutoff` bases long
    def filter_contigs(contigs):
        return [record for record in contigs if len(record.seq) >= cutoff]
    # We return the function as we do any other value
    return filter_contigs

cutoff = 3000
contigs = list(SeqIO.parse('../tests/dev/demo.fasta', 'fasta'))
length_filter = hof_length_filter(cutoff)
filtered_contigs = length_filter(contigs)
print(
    f"Length cutoff:\t{cutoff}\n"
    f"Total contigs:\t{len(contigs)}\n"
    f"Filtered contigs: {len(filtered_contigs)}"
)
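
To make the comparison with decorators concrete, here is a minimal sketch of one. A decorator is a higher-order function that accepts a function and returns a wrapped version of it. The names `log_calls` and `count_long` are hypothetical, and the example uses plain integers so that it runs without Biopython or a FASTA file:

```python
import functools


def log_calls(func):
    """Hypothetical decorator: takes a function and returns a wrapped function."""
    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        print(f"Calling {func.__name__}")
        return func(*args, **kwargs)
    return wrapper


@log_calls  # same as: count_long = log_calls(count_long)
def count_long(lengths, cutoff=3000):
    """Count lengths that are at least `cutoff`."""
    return sum(1 for length in lengths if length >= cutoff)


print(count_long([500, 3000, 12000]))
# Calling count_long
# 2
```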

##### Example of Modularity

##### To conclude our overview of thinking about designing/organizing/structuring the Autometa 2 code base

In [106]:
import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!
