# Submodule 3 Tutorial 1: Object-Oriented Programming in Python

## Overview
Objects in Python (and in Object-Oriented Programming) allow for modular, efficient, and scalable code by organizing data and functions into reusable units. This is especially useful in bioinformatics, where handling large datasets and complex analyses efficiently is critical.

## Learning Outcomes
*After this tutorial, you should be able to:*
    - Define class, object, methods, attributes, inheritance
    - Write or edit these elements of a Python class 

## Prerequisites

- Submodule 1 of Intro to Python (basic Python)

## Getting Started

Run the next code cell to install and import the libraries this tutorial needs.

In [None]:
%pip install jupyterquiz
from jupyterquiz import display_quiz
from datetime import date

## Introduction

Let’s talk about **objects**, the software kind. This relates to biology rather nicely.
<br>
Long-time specialists in the art of writing software consider <u>the adoption of object-oriented techniques to be one of the most important developments in software engineering</u> (maybe THE most important development in the past 50 years).

That's because the use of software **objects** allows for much more sophisticated software to exist, for a number of reasons:
  - Objects hide complexity. Someone can create very complex code and hide the implementation behind a simple interface to that code.
  - Objects allow for more reusable code. A third party can create an object and then pass it off to multiple consumers. Developers no longer have to "reinvent" every software routine, such as displaying a type of plot on a chart or importing data from a database. They can just incorporate the necessary functionality from a software vendor or open-source repository and focus on their specific problem.
  - Objects from outside sources do not need to be debugged and tested (in theory) like homegrown software, leading to shorter development times, and fewer bugs.

You have already seen this when importing whole libraries (import Bio, import matplotlib) or specific tools from them. Learning how such tools are constructed means that, even if you never build your own Python tools, you can READ the scripts others write and see how they might need to be adapted or used.

## The "genetics" of OOP

To introduce you to the fundamental pieces of OOP in Python, let's relate them to genetics. After this overview, we will dissect each piece.

*Summary of the Analogy*

| OOP Concept    | Genetics Equivalent                |
|:--------------:|:-----------------------------------| 
|Class           | Gene Family                        |
|Object          | Specific Gene                      |
|Attributes      | DNA Sequence                       |
|Methods         | Gene Function (e.g., transcription)|
|Inheritance     | Gene Duplication & Mutation        |

This analogy shows how OOP helps organize bioinformatics data efficiently, just like genetics organizes biological information. 

### A Class is Like a Gene Family

In genetics, a gene family is a group of related genes that share a common function (e.g., hemoglobin genes). 
<br>
Similarly, in OOP, a class is a blueprint for creating multiple related objects with shared properties and behaviors.

In [None]:
class Gene:
    def __init__(self, name, sequence):
        self.name = name
        self.sequence = sequence

<u>Analogy:</u> The Gene class is like the "hemoglobin gene family," providing a general structure for individual genes.

### Object

An **Object** is like a specific gene.
<br>
Each object created from a class is like an individual gene, with its own sequence but following the same general structure.

To make an object with Gene characteristics, we call its class ("Gene"), much as you might previously have made a string with jersey_number = str("12").
<br>
We will delve deeper a little later, but for now notice that the definition of Gene above includes name & sequence in the def __init__ block.

In [None]:
gene1 = Gene("HBB", "ATGGTG...TAA")
gene2 = Gene("HBA1", "ATGGTG...TAG")

<u>Analogy:</u> gene1 (HBB) and gene2 (HBA1) are like specific genes within the hemoglobin family—same blueprint but unique sequences.
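
To see that the two objects share one blueprint but hold their own data, here is a minimal sketch. It repeats the Gene class from above so the cell runs on its own, and reads each object's attributes with dot notation:

```python
class Gene:
    def __init__(self, name, sequence):
        self.name = name
        self.sequence = sequence

gene1 = Gene("HBB", "ATGGTG...TAA")
gene2 = Gene("HBA1", "ATGGTG...TAG")

# Each object stores its own values for the same attribute names
print(gene1.name, gene1.sequence)
print(gene2.name, gene2.sequence)

# Both objects were built from the same class
print(type(gene1) is type(gene2))
```

Changing gene1.sequence would leave gene2.sequence untouched, because each object keeps its own attributes.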

### Methods are Like Gene Functions

Genes produce proteins with specific functions. 
<br>
In OOP, **methods** (functions inside of a class) define how objects *behave.* 

In the following code, the method inside of the Gene class is the transcribe method: def transcribe(self)

<u>Analogy:</u> Just like the HBB gene codes for beta-globin, the transcribe() method defines how the Gene object converts DNA to mRNA. 

In [None]:
class Gene:
    def __init__(self, name, sequence):
        self.name = name
        self.sequence = sequence.upper()
    
    def transcribe(self):
        """Simulates transcription by converting DNA to mRNA."""
        return self.sequence.replace("T", "U")  # mRNA uses Uracil instead of Thymine

gene1 = Gene("HBB", "ATGGTG...TAA")

print(gene1.name, "would be transcribed to:", gene1.transcribe())


### Inheritance is Like Gene Duplication & Mutation

In genetics, gene duplication and mutation lead to new genes with modified functions. 
<br>
In OOP, inheritance allows a new class to inherit properties from an existing class but modify or extend its behavior.

In [None]:
class ProteinCodingGene(Gene):  # (Gene) names the parent class; the declaration of Gene itself had nothing in parentheses
    def translate(self):
        """Simulates translation into a protein (simplified)."""
        return "Protein sequence based on " + self.sequence   # Placeholder for actual translation logic that is not present

gene2 = ProteinCodingGene("HBA1", "ATGGTG...TAG")
print(gene2.translate())  

<u>Analogy:</u> ProteinCodingGene inherits properties from Gene (shown by including (Gene) in this class definition).
<br>
That is analogous to how HBA1 evolved from ancestral hemoglobin genes.

The translate() method adds a new function, just like gene mutations can lead to new functions. Since ProteinCodingGene inherited EVERYTHING already in the Gene class, there is no need to repeat the def __init__ or the transcribe() method.
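
As a rough check on what "inherited EVERYTHING" means, the sketch below repeats both classes so it runs on its own. The variable names pc_gene and plain_gene and the short sequence "atggtg" are made up for illustration:

```python
class Gene:
    def __init__(self, name, sequence):
        self.name = name
        self.sequence = sequence.upper()

    def transcribe(self):
        return self.sequence.replace("T", "U")

class ProteinCodingGene(Gene):
    def translate(self):
        return "Protein sequence based on " + self.sequence

pc_gene = ProteinCodingGene("HBA1", "atggtg")  # uses the inherited __init__
print(pc_gene.name)
print(pc_gene.transcribe())       # inherited method, not redefined in the child
print(isinstance(pc_gene, Gene))  # a ProteinCodingGene is also a Gene

# Inheritance only flows one way: the parent does not gain translate()
plain_gene = Gene("HBB", "atggtg")
print(hasattr(plain_gene, "translate"))
```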

### Test your Knowledge

Please do a quick check of the vocabulary.

In [None]:
from jupyterquiz import display_quiz
vocab="PythonQuizQuestions/oop_vocab1.json"
display_quiz(vocab)


<div class="alert alert-block alert-info"> Now that the overview is done, we delve deeper into the individual pieces. </div>

## The Object

An “object” is an “instantiation” (or creation) of a variable based upon a class. We can then use the variable/object in our code.

Python has many built-in "constructors" that make variables of standard classes (e.g., integers, strings). Constructors can also be written for classes with more complex properties, such as the Sequence type of variable from the Bio library (Submodule 1).

Here is a set of information that might be part of the metadata for a patient. You will recognize that most of these are strings, age is an integer, and the last 2 could be stored as type "date".

|Variable          |Some Values               |
|:----------------:|:-------------------------|
|patient_id        |"P123456"                 |
|name              |"John Doe"                |
|age               |45                        |
|sex               |"Male", "Female", "Other" |
|ethnicity         |"Hispanic", "Caucasian"   |
|date_of_birth     |"1979-06-15"              |
|date_of_diagnosis |"2011-03-04"              |

One option for holding this kind of data would be a dictionary or, if we have lots of such data, a NumPy array. 
We will see that we can construct a particular data class, Patient, that would define all those attributes. 

Unlike a simple dictionary or an array, though, a class definition means that we can also provide rule checks, for example to ensure that the patient's age is an integer between 0 and 120 and that the dates are in the correct format. 

It might look like this:

**Patient1=Patient("P123456", "John Doe",45,"Male", "Hispanic","1979-06-15","2011-03-04")**
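
A minimal sketch of such a class is below. It is only an illustration of the rule-check idea, not a finished design: the error message and the second patient's values are made up, and the date check relies on date.fromisoformat from the standard library.

```python
from datetime import date

class Patient:
    """Hypothetical sketch of a Patient class with simple rule checks."""

    def __init__(self, patient_id, name, age, sex, ethnicity,
                 date_of_birth, date_of_diagnosis):
        # Rule check: age must be a whole number from 0 to 120
        if not isinstance(age, int) or not 0 <= age <= 120:
            raise ValueError("age must be an integer between 0 and 120")
        self.patient_id = patient_id
        self.name = name
        self.age = age
        self.sex = sex
        self.ethnicity = ethnicity
        # date.fromisoformat raises ValueError if the text is not a valid
        # ISO-format date such as "1979-06-15"
        self.date_of_birth = date.fromisoformat(date_of_birth)
        self.date_of_diagnosis = date.fromisoformat(date_of_diagnosis)

patient1 = Patient("P123456", "John Doe", 45, "Male", "Hispanic",
                   "1979-06-15", "2011-03-04")
print(patient1.date_of_birth.year)

# An impossible age is rejected before any object is created
try:
    Patient("P999999", "Jane Doe", 150, "Female", "Caucasian",
            "1979-06-15", "2011-03-04")
except ValueError as err:
    print("Rejected:", err)
```

A plain dictionary would have accepted the age of 150 without complaint; here the check lives with the data it protects.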

Let's look at how to make a new & useful Class. 

## The Class

A “class” is a design document (in software) we use to create objects. It defines methods, properties, permissions, behaviors and other object-related attributes.

A class in Python is like a blueprint for creating objects. It defines a structure by bundling data (attributes) and functions (methods) that work together. Think of it as a recipe that describes what an object is and how it behaves.

Earlier, we said a class was *like* a gene or a protein-coding gene: it has its own attributes (like length) and functions (like transcription or an ORF). 

To create a class in Python, use the *class* keyword followed by the class name and a colon. If the class inherits from another class, that parent class goes in parentheses before the colon.

Then, one defines the class's methods and attributes within an indented block.

But, let's start with the simplest class definitions.

In [None]:
class Mutation:
    """A class to represent a genetic mutation."""

class PhylogeneticTree:
    """Represents evolutionary relationships between species."""

If you "run" that box, nothing obvious will happen. The classes have merely been defined.
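
To confirm that the definition did register even though nothing was printed, a small sketch like the following can help. The variable name empty_mutation is made up for illustration:

```python
class Mutation:
    """A class to represent a genetic mutation."""

# The class name now exists, and its docstring is stored on it
print(Mutation.__doc__)

# Even without an __init__ of its own, the class can make an (empty) object
empty_mutation = Mutation()
print(type(empty_mutation).__name__)
print(empty_mutation.__dict__)  # no attributes have been stored yet
```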

To make objects that hold data, each class needs a method that initializes a new object:

def __ init __ (self, x, y, z): 
<br>
(the __ are two underscores, typed with no space before or after init)

**self** refers to the object being created (Python supplies it automatically), and x, y & z represent the values that will be stored as parts of that object.

For Patient, we'd have
<br>
def __init__(self, patient_id, name, age, sex, ethnicity, DOB, date_of_diagnosis)

Let's work with a mutation class


In [None]:
class Mutation:  #A class to represent a genetic mutation.
    def __init__(self, gene_name, position, original_base, mutated_base, mutation_type):
        """
        Initialize a Mutation object with gene name, mutation position, 
        original base, mutated base, and type of mutation.
        """
        self.gene_name = gene_name  # The affected gene
        self.position = position  # Position in the DNA sequence
        self.original_base = original_base.upper()  # The original nucleotide (A, T, C, G)
        self.mutated_base = mutated_base.upper()  # The new nucleotide after mutation
        self.mutation_type = mutation_type  # Example: "Missense", "Nonsense", "Silent", "Frameshift"

# Make a mutation object
mut1 = Mutation("BRCA1", 123, "A", "G", "Missense")
print(mut1.__dict__) # a way to see all of mut1's attributes as a dictionary
print(mut1.mutation_type) # to report only the mutation type


<div class="alert alert-block alert-info"> <b>Tip:</b> Try these in the above mutation class code block:

    - make a second mutation object (mut2) 
    - query other pieces of mut1 and mut2, such as the position.
    - see what happens if you do NOT give all expected/required pieces of information. 
    - add a new attribute to the class ("disease_associated") </div>

*You should be able to tell that Python is using known classes to deal with the values entered, such as integers or strings*
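
One way to check this, sketched under the assumption that the Mutation class is defined as above (repeated here without comments so the cell runs alone), is to ask each attribute for its type. The last lines show what Python reports when a required value is left out:

```python
class Mutation:
    def __init__(self, gene_name, position, original_base, mutated_base, mutation_type):
        self.gene_name = gene_name
        self.position = position
        self.original_base = original_base.upper()
        self.mutated_base = mutated_base.upper()
        self.mutation_type = mutation_type

mut1 = Mutation("BRCA1", 123, "a", "G", "Missense")

# Each attribute is itself an object of a built-in class
print(type(mut1.gene_name).__name__)
print(type(mut1.position).__name__)
print(mut1.original_base)  # .upper() is a method of the built-in str class

# Leaving out mutation_type raises a TypeError naming the missing argument
try:
    Mutation("BRCA1", 123, "A", "G")
except TypeError as err:
    print("TypeError:", err)
```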


## Methods in classes

**Methods** in a class are functions that define the behavior of an object. They allow objects to interact with their attributes and perform actions specific to their purpose. 
<br>
Methods make classes more powerful and reusable by encapsulating logic within objects, keeping code organized, modular, and easy to maintain.
<br>
Each method typically takes <u>self</u> (the object the method is called on) as its first parameter, giving it access to the object's attributes. 
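
A short sketch of what that first parameter does, reusing the Gene class from the overview with an illustrative short sequence: calling a method on an object is the same as calling the function on the class and handing the object in yourself.

```python
class Gene:
    def __init__(self, name, sequence):
        self.name = name
        self.sequence = sequence.upper()

    def transcribe(self):
        return self.sequence.replace("T", "U")

gene1 = Gene("HBB", "ATGGTG")

# The usual call: Python passes gene1 in as self automatically
usual = gene1.transcribe()

# The same call written out explicitly
explicit = Gene.transcribe(gene1)

print(usual == explicit)
```

This is why the method is defined with one parameter (self) but called with empty parentheses.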


### How to create a method

A method in Python is a function inside a class that defines behavior for an object. Every method must have a few key parts:

**def keyword**

🔹 Every method starts with def, just like the functions we learned about that are NOT inside a class.
🔹 This keyword signals that we are defining a method inside the class.

**Method Name**
🔹 The name of the method should be descriptive of its purpose.
🔹 Follows standard Python naming conventions (lowercase, with underscores if needed).

In the Mutation class, we can make a method that will report the type of mutation, with the keyword def, a logical name, and what information it would need (self) followed by a colon:

**def get_type(self):**

Next, we need to tell how to *get the type*

**def get_type(self):
   return self.mutation_type**

The "return" tells Python to immediately hand back the value "self.mutation_type" to whatever called the method. 

Add that method to the class definition below, then use the mut1 mutation object to query its mutation type.

In [None]:
class Mutation:  #A class to represent a genetic mutation.
    def __init__(self, gene_name, position, original_base, mutated_base, mutation_type):
        """
        Initialize a Mutation object with gene name, mutation position, 
        original base, mutated base, and type of mutation.
        """
        self.gene_name = gene_name  # The affected gene
        self.position = position  # Position in the DNA sequence
        self.original_base = original_base.upper()  # The original nucleotide (A, T, C, G)
        self.mutated_base = mutated_base.upper()  # The new nucleotide after mutation
        self.mutation_type = mutation_type  # Example: "Missense", "Nonsense", "Silent", "Frameshift"

mut1 = Mutation("BRCA1", 123, "A", "G", "Missense")



<div class="alert alert-block alert-info"> <b>Tip:</b> Try this above: it would be nice to have a well-formatted description of the mutation, so you could call mut1.describe() </div>

So, create a method called describe that will give the following output for mut1:

'Mutation in BRCA1: Position 123, A → G (Missense)'

In [None]:
# Code for good formatting: paste this method inside the Mutation class, with def indented one level
def describe(self):
        """Returns a formatted description of the mutation."""
        return (f"Mutation in {self.gene_name}: Position {self.position}, "
                f"{self.original_base} → {self.mutated_base} ({self.mutation_type})")



Look at the next example, which creates a class Enzyme that stores the name, EC number, Km, and Vmax of an enzyme AND provides some useful enzyme methods.

In [None]:
class Enzyme:
    """A class to represent an enzyme with kinetic properties."""
    
    def __init__(self, name, ec_number, km, vmax):
        """Initialize an enzyme with its name, EC number, Km, and Vmax."""
        self.name = name  # Enzyme name
        self.ec_number = ec_number  # Enzyme Commission (EC) number
        self.km = km  # Michaelis constant (substrate concentration at half Vmax)
        self.vmax = vmax  # Maximum reaction velocity
  
    def get_kinetic_parameters(self):
        """Returns Km and Vmax as a tuple."""
        return self.km, self.vmax
    
    def calculate_velocity(self, substrate_concentration):
        """Calculates reaction velocity using the Michaelis-Menten equation: 
           v = (Vmax * [S]) / (Km + [S])
        """
        v = (self.vmax * substrate_concentration) / (self.km + substrate_concentration)
        return v

# Creating an enzyme object
enzyme1 = Enzyme("Hexokinase", "EC 2.7.1.1", 0.15, 10.0)  # arguments can be given in order (positional)...
enzyme2 = Enzyme("Alcohol_dehydrogenase", "EC 1.1.1.1", vmax=3.0, km=0.9)  # ...or by name with =, in any order
print(enzyme1.ec_number)
v = enzyme2.calculate_velocity(0.3)
print(v)

Try to do the following:
1. Make another enzyme object
2. Add an attribute **concentration** to the Enzyme class and be sure to adjust the def __init__ to match
3. Create a new method to change the units of concentration by a provided factor (e.g., to convert ug/mL to mg/mL by dividing by 1000)
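
Before changing the class, it can help to sanity-check the Michaelis-Menten formula that calculate_velocity uses. The sketch below uses a standalone function with a made-up name (michaelis_menten) rather than the Enzyme class, and compares results with math.isclose because floating-point arithmetic may not give exact decimals:

```python
import math

def michaelis_menten(vmax, km, substrate_concentration):
    """v = (Vmax * [S]) / (Km + [S])"""
    return (vmax * substrate_concentration) / (km + substrate_concentration)

# When [S] equals Km, the velocity should be half of Vmax
half = michaelis_menten(vmax=10.0, km=0.15, substrate_concentration=0.15)
print(math.isclose(half, 10.0 / 2))

# Same numbers as enzyme2 above: Vmax = 3.0, Km = 0.9, [S] = 0.3
# By hand: (3.0 * 0.3) / (0.9 + 0.3) = 0.9 / 1.2 = 0.75
v = michaelis_menten(vmax=3.0, km=0.9, substrate_concentration=0.3)
print(math.isclose(v, 0.75))
```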

#### Test Your Knowledge

After you work through those changes, take the following quiz:

In [None]:
from jupyterquiz import display_quiz
classquiz="PythonQuizQuestions/class_qz1.json"
display_quiz(classquiz)

#### Exercise 1: Create a Simple RNASequence Class

**Goal:** <u>Construct</u> a class that represents an RNA sequence and includes basic operations.
Use the template below to do the following: 

Steps:
1. Define a class called RNASequence.
2. Add an __init__ method that takes an RNA sequence as input and stores it.
3. Create a method called get_length() that returns the length of the sequence.
4. Create a method called RevTranscribe() that converts U → T (simulating cDNA production).


In [None]:
# Edit the following to match the exercise directions, you need to replace all ______________
class ___________________:
    """A class to represent an RNA sequence."""
    
    def __init__(_________________):  # what parameters must it take?
        self.sequence = sequence.______()  # Ensure uppercase
    
    def get_length(self):
        """Returns the length of the sequence."""
        return _____  # Replace with the correct code for the length of a string
    
    def RevTranscribe(self):
        """Returns the DNA sequence (U → T)."""
        return _____  # Replace with the correct code

#### Test your code for RNASequence
Create an RNA object with the RNA sequence augcguua to test the class and its methods and make sure your code works. 

In [None]:
#Code to test your class
#rna = 
print("Sequence length:", rna.get_length())  # Expected output: 8
print("cDNA sequence:", rna.RevTranscribe())  # Expected output: ATGCGTTA

## Conclusion

In this tutorial, you have:
- learned some OOP vocabulary.
- defined a new class.
- created your own objects.

You may be ready to try some advanced techniques in OOP in [Tutorial 2: OOP2](Submodule_3_Tutorial_2_OOP2.ipynb)

OR

Work with [modules and packages](Submodule_3_Tutorial_3_Modules&Packages.ipynb).

## Clean up
Remember to shut down your Jupyter Notebook instance when you are done for the day to avoid unnecessary charges. You can do this by stopping the notebook instance from the Cloud console.