<a href="https://colab.research.google.com/github/SunmoonTao/colab/blob/master/Bridging_9005_to_9006_notebook.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction:

In 9006 you learn about Object-Oriented Programming (OOP). In 9005 we used classes and OOP concepts without realizing it! Let's go ahead and take a look at some of those examples:


Here we defined an Object a, which belongs to the Integer class!

In [None]:
a = 28
print(type(a))

#output:
# <class 'int'>

Here we defined an Object l, that belongs to the List class.

In [None]:
l = [1, 2, 3]
print(type(l))

# output:
# <class 'list'>

In [None]:
# we can use dir() to see all of the attributes and methods available on the object
dir(l)

In [None]:
# what are some other possible built-in Python Classes that we have worked with?

## Methods
A method is a special kind of function: one that belongs to a specific class of objects. Methods usually perform specific tasks involving that type of object. Though all methods are functions, not all functions are methods. Python has a number of built-in methods for its built-in classes (such as strings and lists).

[Methods for strings](https://www.w3schools.com/python/python_strings_methods.asp)

[Methods for lists](https://www.w3schools.com/python/python_lists_methods.asp)

Check out more about the list method .append() [here](https://www.w3schools.com/python/ref_list_append.asp)

In this example, we are using the ```append()``` method (a special kind of function) on the object ```list1```, which is an instance of the list class.

In [None]:
# using the list method .append() this will add an element to our list
list1 = [1, 2, 3]
list1.append('yay') # we want to add 'yay' to our list
print(list1)
print(type(list1))
print(len(list1)) # to get the length of the list

In this case, ```list1``` is the Object, that belongs to the List Class. And we can use specific methods, such as ```append()```, on that object ```list1```.
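
To make the difference between a function and a method more concrete, here is a small sketch. The variable names are only for illustration: ```len()``` is a built-in function that works on many kinds of objects, while ```upper()``` and ```append()``` are methods that belong to the str and list classes.

```python
word = "gene"
numbers = [1, 2, 3]

# a function is called on its own, with the object passed in as an argument
print(len(word))      # works on a string
print(len(numbers))   # ...and on a list

# a method is called on an object, using the dot
print(word.upper())   # upper() belongs to the str class
numbers.append(4)     # append() belongs to the list class
print(numbers)

# a method only exists for the class that defines it
print(hasattr(word, "append"))  # strings have no append() method
```

The last line prints ```False```, because ```append()``` is defined for lists but not for strings.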

# A few more examples of Classes we used in 9005

**File handling with ```open()```**

When you open a file in Python, the ```open()``` function returns a file object, which is an instance of a class (in Python 3, opening a file in text mode gives you an ```io.TextIOWrapper```). You can use that object's methods to read from or write to the file.

In [None]:
file = open("example.txt", "r")
content = file.read()
file.close()
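
If you would like to check the class of a file object yourself, the sketch below may help. It first writes a small throwaway file (the path and contents are made up for this example) so that it can run anywhere, and it uses a ```with``` block, which closes the file for you.

```python
import os
import tempfile

# make a small throwaway file so the example can run anywhere
path = os.path.join(tempfile.mkdtemp(), "example.txt")
with open(path, "w") as f:
    f.write("ATGC\n")

# 'with' closes the file automatically, even if an error occurs
with open(path, "r") as f:
    file_class_name = type(f).__name__   # the class of the file object
    content = f.read()                   # read() is a method of that class

print(file_class_name)
print(repr(content))
print(f.closed)   # the file was closed when the 'with' block ended
```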


**Working with Time and Dates**

Python's ```datetime``` module uses classes to represent dates and times. You can create instances of classes like ```datetime.datetime``` to work with dates and times.

In [None]:
from datetime import datetime

now = datetime.now()
print(now.year, now.month, now.day)
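
Here is one more small sketch with ```datetime```, this time creating an instance for a fixed date instead of "now" so the result is the same every time it runs. The date itself is arbitrary and only chosen for illustration.

```python
from datetime import datetime, timedelta

# call the class like a function to create an instance for a specific date
sample_day = datetime(2023, 9, 22)

# attributes store data about the object
print(sample_day.year, sample_day.month, sample_day.day)

# methods do something with that data
print(sample_day.strftime("%Y-%m-%d"))

# objects from the same module can work together
next_week = sample_day + timedelta(days=7)
print(next_week.strftime("%Y-%m-%d"))
```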


**PANDAS Data Input and Output Functions:**

A class we encountered when working with pandas is the ```DataFrame```.

Pandas provides functions such as ```read_csv()``` and ```read_excel()``` that read data from various file formats and return a ```DataFrame``` object. That object in turn has methods such as ```to_csv()``` and ```to_excel()``` for writing the data back out. Together they facilitate data import and export.

In [None]:
import pandas as pd

# Reading data from a CSV file
df = pd.read_csv('data.csv')

# Writing data to a CSV file
df.to_csv('output.csv', index=False)


**Matplotlib example: ```matplotlib.figure.Figure```**

The Figure class represents the entire figure or window in which your plot is displayed. It acts as a container for one or more Axes objects.


In [None]:
import matplotlib.pyplot as plt

fig = plt.figure()


**```matplotlib.axes.Axes```:**

The Axes class represents a subplot or individual plot within a figure. It's where you add and customize various plot elements, like lines, points, and labels.


In [None]:
import matplotlib.pyplot as plt

# example data to plot
x = [1, 2, 3, 4]
y = [1, 4, 9, 16]

fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_xlabel('X-axis')
ax.set_ylabel('Y-axis')
ax.set_title('Simple Line Plot')


# OOP basics

Imagine you have a box of toy cars. Each toy car can do different things. Some can go fast, some can honk, and some can even have their lights turned on.

Now, think of Object-Oriented Programming (OOP) like this:

1. **Objects**: Each toy car is like an "object" in OOP. It's a special thing that can do stuff.

2. **Classes**: The word "class" is like a blueprint. It tells us how to make a toy car. So, if you have a class for toy cars, you can make many toy cars that all work the same way.

3. **Attributes**: Just like real cars have different colors, toy cars can have things that make them unique. In OOP, we call these things "attributes." So, some toy cars might have a red color, while others have a blue color.

4. **Methods**: These are like actions the toy cars can do. For example, a method could be "go fast" or "honk." It's like telling the toy car what to do.

5. **Inheritance**: Imagine you have a big toy car box, and inside it, there are small boxes for different types of cars, like sports cars and trucks. OOP lets us make new toy cars that are like the ones in those small boxes. We can use the blueprint (class) of the small box cars to make more cars that work the same way.

So, Object-Oriented Programming is like playing with toy cars. You have **different types of cars (objects) with special things they can do (methods) and different looks (attributes)**, and you can make more cars that are just like the ones you already have. It helps organize and make sense of things in computer programs, just like you organize your toy cars when you play with them.
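
The toy car analogy above could be sketched in Python roughly like this. Every name here (```ToyCar```, ```SportsCar```, ```honk```, and so on) is made up for illustration, and this is only one of many ways to write it.

```python
# a sketch of the toy car analogy; all names here are made up
class ToyCar:
    def __init__(self, color):
        self.color = color          # attribute: what makes this car unique

    def honk(self):                 # method: something every toy car can do
        return "Beep beep!"

    def describe(self):
        return f"A {self.color} toy car"


class SportsCar(ToyCar):            # inheritance: a SportsCar is a kind of ToyCar
    def go_fast(self):              # an extra method only sports cars have
        return "Zoom!"


red_car = ToyCar("red")             # objects made from the blueprints
blue_racer = SportsCar("blue")

print(red_car.describe())
print(blue_racer.describe())        # inherited from ToyCar
print(blue_racer.honk())            # inherited from ToyCar
print(blue_racer.go_fast())         # defined in SportsCar
```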

# Classes
A class is a description of an object's characteristics.

A Class is code that specifies the **data attributes** (data) and **methods** (functionality) for a particular type of object.

Think of a Class as a "blueprint" from which objects may be created.

(You could think of a Class as a cookie cutter, and the objects created from the class as cookies.)

As we mentioned earlier: Classes are a fundamental concept in Python, and you may be using them without even realizing it. Here are some more common examples of how classes are used in Python:

Built-in Classes:
Python provides many built-in classes that you use regularly. For instance:

* ```str```: When you create a string like ```my_string = "Hello, World!"```, you are using the ```str``` class.
* ```list``` and ```dict```: Similarly, when you create lists and dictionaries, you are using the ```list``` and ```dict``` classes, respectively.
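
As a quick illustration, the sketch below creates instances of these built-in classes in two ways: with the usual literal syntax and by calling the class name directly. The variable names are just examples.

```python
# a literal and a call to the class name both create an instance
my_string = "Hello, World!"
also_a_string = str(28)          # calling the str class directly
my_list = list("ACGT")           # calling the list class directly
my_dict = dict(A="T", G="C")     # calling the dict class directly

print(type(my_string), type(also_a_string))
print(my_list)
print(my_dict)

# isinstance() checks whether an object belongs to a class
print(isinstance(my_string, str))
print(isinstance(my_list, dict))
```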

# Creating Custom Classes
You can define your own classes to represent real-world objects or abstract concepts. For example:

In [None]:
# Define the 'Insect' class
# The Insect class holds the insect information
class Insect:
    # the __init__ method initializes the attributes (the data)
    def __init__(self, name, legs, wings=False):
        self.name = name
        self.legs = legs
        self.wings = wings

    # the make_sound method returns a description of the insect's sound
    def make_sound(self):
        return "No specific sound"

# Create instances of 'Insect' to represent different insects
# Each object that is created from a Class is called an instance of the class
butterfly = Insect("Butterfly", 6, True)
ant = Insect("Ant", 6, False)
bee = Insect("Bee", 6, True)

# Access attributes and methods
print(f"{butterfly.name} has {butterfly.legs} legs and {'wings' if butterfly.wings else 'no wings'}.")
print(f"{ant.name} has {ant.legs} legs and {'wings' if ant.wings else 'no wings'}.")
print(f"{bee.name} has {bee.legs} legs and {'wings' if bee.wings else 'no wings'}.")
print(f"{butterfly.name} makes the sound: {butterfly.make_sound()}")
print(f"{ant.name} makes the sound: {ant.make_sound()}")
print(f"{bee.name} makes the sound: {bee.make_sound()}")


In this example:

* We have a single class called ```Insect``` that represents insects. The class has the attributes ```name``` and ```legs```, plus ```wings```, which defaults to ```False``` when it is not supplied. It also has a ```make_sound``` method, which returns the placeholder string "No specific sound".
* We create instances of the ```Insect``` class (```butterfly```, ```ant```, and ```bee```) to represent different insects.
* We access the attributes and methods of each ```Insect``` instance to display information about the insects, including their names, the number of legs, the presence of wings, and the sounds they make.

This code demonstrates how you can create instances of a single class to represent various insects, each with its own attribute values.





# Let's revisit Project 1:

# Project 1, Gene Finding!

## Step one: duplicate this notebook and set the sharing settings to COMMENT!

In [None]:
# you can run this cell to import the fasta file we are using:
!wget https://raw.githubusercontent.com/Peziza/306/main/X73525.fasta

--2023-09-22 18:28:29--  https://raw.githubusercontent.com/Peziza/306/main/X73525.fasta
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 6547 (6.4K) [text/plain]
Saving to: ‘X73525.fasta’


2023-09-22 18:28:29 (61.1 MB/s) - ‘X73525.fasta’ saved [6547/6547]



### 1. Reading the file
Write a function that reads a fasta file that has one sequence in it. It should take as input a filename, and it should return a string of DNA. Call your function with the input `'X73525.fasta'` and store the output in a variable called `dna_string`. Test that it worked by printing the length of `dna_string` after you call the function. You should get `6387`.
* Remember to use .strip() as you read through the lines of the file to remove newlines!

In [None]:
def fasta_reader(fname):
    '''
    given a fasta file name input with one sequence, return a string with all
    the bases. It records the sequence name line, but doesn't output it
    '''
    with open(fname, 'r') as f: # open() returns a file object (an instance of a class); 'with' closes it for us
        DNA_string = ''
        for line in f:
            if line[0] == '>':
                title_line = line.strip()
            else:
                DNA_string += line.strip()
    return DNA_string

dna_string = fasta_reader('X73525.fasta') # here we have defined an object dna_string that belongs to the String class. We can use string methods on this!
print(type(dna_string))
print(len(dna_string))

<class 'str'>
6387


### 2. Making the reverse complement
Since genes can also occur on the other strand of DNA (also in the 5'->3' direction), we need to look for genes in the reverse complement too. In the block below, write a function (or you can copy-and-paste from your homework) that takes as input a string of DNA and returns a string that is the reverse complement. Call this function with `dna_string` as your input and store the output in a variable called `r_dna_string`. Again, check its length, which should also be `6387`.

In [None]:
def reverse_comp(s):
    '''returns the reverse complement of a DNA string'''
    new_s = ''               # make an empty string to hold our reverse-complemented DNA
    reversed_s = s[::-1]     # reverses DNA string
    for base in reversed_s:  # loop over bases in the reversed string
        if base == 'A':
            new_s += 'T'
        elif base == 'T':
            new_s += 'A'
        elif base == 'G':
            new_s += 'C'
        elif base == 'C':
            new_s += 'G'
    return new_s    # return the new string once we're done with the loop

r_dna_string = reverse_comp(dna_string)
print(len(r_dna_string))
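
Since a DNA string is an object of the str class, we could also lean on string methods here. The sketch below is an alternative, not the version the project expects, and the name ```reverse_comp_translate``` is made up. One difference to be aware of: the loop above drops any character that is not A, T, G, or C, while this version leaves such characters unchanged.

```python
def reverse_comp_translate(s):
    '''returns the reverse complement of a DNA string using string methods'''
    # maketrans() builds a lookup table: A->T, T->A, G->C, C->G
    complement_table = str.maketrans('ATGC', 'TACG')
    # translate() swaps every base, then the slice reverses the result
    return s.translate(complement_table)[::-1]

print(reverse_comp_translate('ATGC'))
print(reverse_comp_translate('AAGCTT'))
```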


Up until this point, we have written two functions: ```fasta_reader``` and ```reverse_comp```.

If we were to break it down even further we have:


1. The `fasta_reader` function reads a FASTA file, extracts the DNA sequence (without the title line), and returns it as a string.

2. The `reverse_comp` function takes a DNA string as input, computes its reverse complement, and returns the reverse complemented DNA string.

3. Outside of these functions, the code calls `fasta_reader` to read a FASTA file called 'X73525.fasta' and stores the DNA sequence in the `dna_string` variable. Then, it calls the `reverse_comp` function to compute the reverse complement of `dna_string` and stores it in the `r_dna_string` variable.

4. Finally, the code prints the length of `r_dna_string`.


The code is written in a **procedural style**: it doesn't define any classes of its own, so it doesn't use the main tools of **Object-Oriented Programming (OOP)**, even though it relies on built-in classes like ```str``` behind the scenes. In procedural programming, code is organized around functions, and data is often passed between functions as arguments.



While this code is not based on OOP principles, it is a straightforward, working way to accomplish the task of reading a FASTA file, computing a reverse complement, and displaying the result. OOP would introduce the concept of classes and objects to encapsulate and organize the functionality, but in this case, a procedural approach is sufficient.

# Below is an example using OOP for the first part of Project 1:

In [None]:
# Defines the 'DNASequence' class
# The DNASequence class holds the information (data and functionality) of DNA sequences
class DNASequence:
    def __init__(self, fname):
        # Constructor: Initializes the DNASequence object with a filename (fname).
        # It also calls the _read_fasta method to read the DNA sequence from the file.
        self.fname = fname # attributes
        self.sequence = self._read_fasta() # calls a method to read the DNA sequence from the file

    def _read_fasta(self):
        '''
        Given a fasta file name input with one sequence, return a string with all
        the bases. It records the sequence name line, but doesn't output it.
        '''
        with open(self.fname, 'r') as f:
            DNA_string = ''
            for line in f:
                if line[0] == '>':
                    title_line = line.strip()  # Record the title line (sequence name)
                else:
                    DNA_string += line.strip()  # Concatenate DNA sequence lines
        return DNA_string

    def length(self):
        # Returns the length of the DNA sequence.
        return len(self.sequence)

    def reverse_complement(self):
        '''Returns the reverse complement of the DNA sequence'''
        new_s = ''               # Initialize an empty string to hold the reverse complemented DNA
        reversed_s = self.sequence[::-1]     # Reverse the DNA sequence
        for base in reversed_s:  # Loop over bases in the reversed string
            if base == 'A':
                new_s += 'T'     # Complement 'A' with 'T'
            elif base == 'T':
                new_s += 'A'     # Complement 'T' with 'A'
            elif base == 'G':
                new_s += 'C'     # Complement 'G' with 'C'
            elif base == 'C':
                new_s += 'G'     # Complement 'C' with 'G'
        return new_s    # Return the new string once the loop is done

# Create an instance (an object) of the DNASequence class and perform operations
dna_sequence = DNASequence('X73525.fasta')  # Create an object with a specified FASTA file
print("Length of DNA sequence:", dna_sequence.length())  # Print the length of the DNA sequence

reverse_complement_sequence = dna_sequence.reverse_complement()  # Compute reverse complement
print("Length of reverse complemented DNA sequence:", len(reverse_complement_sequence))  # Print its length
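
Built-in classes like str and list respond to ```len()``` directly. A custom class can do the same if it defines the special method ```__len__```. The sketch below shows the idea with a hypothetical ```SimpleDNA``` class that takes the sequence as a string instead of reading a file, so it can run without ```X73525.fasta```.

```python
class SimpleDNA:
    def __init__(self, sequence):
        self.sequence = sequence    # each instance keeps its own data

    def __len__(self):
        # called automatically by the built-in len() function
        return len(self.sequence)


first = SimpleDNA('ATGC')
second = SimpleDNA('GGGCCCAT')

# the same class, reused for two different sequences
print(len(first), len(second))
```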


For our project 1, Procedural Programming worked just fine!


Whether the Object-Oriented Programming (OOP) approach is "better" than a procedural approach depends on various factors, including the specific problem you're trying to solve, the complexity of your code, and your team's coding style and preferences. Both OOP and procedural programming have their strengths and use cases:

**Advantages of the OOP Approach:**

1. **Modularity and Encapsulation:** OOP encourages the organization of code into classes and objects, which can encapsulate data and behavior together. This makes it easier to manage and maintain code as it grows.

2. **Reusability:** OOP promotes code reusability. Once you've defined a class, you can create multiple instances of that class, reducing code duplication.

3. **Abstraction:** OOP allows you to abstract complex systems into simpler, more understandable components (objects). This can improve code readability and comprehension.

4. **Inheritance:** Inheritance allows you to create new classes that inherit attributes and methods from existing classes. This promotes code reuse and helps represent real-world relationships.

5. **Polymorphism:** Polymorphism enables objects of different classes to be treated as objects of a common superclass, allowing for more flexible and generic code.
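
Inheritance and polymorphism can be sketched with the ```Insect``` class from earlier (repeated here so the cell runs on its own). The ```Bee``` subclass and its "Buzz" sound are assumptions added for illustration and are not part of the original example.

```python
class Insect:
    def __init__(self, name, legs, wings=False):
        self.name = name
        self.legs = legs
        self.wings = wings

    def make_sound(self):
        return "No specific sound"


class Bee(Insect):                      # hypothetical subclass
    def __init__(self):
        # reuse the parent's __init__ with values that fit every bee
        super().__init__("Bee", 6, True)

    def make_sound(self):               # override the inherited method
        return "Buzz"


insects = [Insect("Ant", 6), Bee()]

# polymorphism: the same call works on both, each class gives its own answer
for insect in insects:
    print(f"{insect.name}: {insect.make_sound()}")
```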

**Advantages of the Procedural Approach:**

1. **Simplicity:** Procedural code tends to be simpler and easier to understand for smaller scripts and straightforward tasks.

2. **Performance:** In some cases, procedural code can be more efficient than OOP code, especially for low-level operations.

3. **Compatibility:** Procedural code may be more compatible with certain programming paradigms or environments where OOP is not well-suited.

**Which Approach to Choose:**

- For simple scripts and tasks, the procedural approach may be more straightforward and efficient.

- For complex systems, software architecture, or large-scale projects, OOP is often preferred because it provides a more organized and maintainable structure. It encourages best practices such as encapsulation, modularity, and abstraction.

- It's also common to use a combination of both paradigms when appropriate. For instance, a large OOP-based system might use procedural-style functions for certain utility tasks.

In summary, the choice between OOP and procedural programming depends on the context and requirements of your project. OOP is typically favored for larger, more complex applications where organization and scalability are crucial, while procedural programming can be suitable for smaller, more focused tasks or when performance is a primary concern. Ultimately, the "better" approach is the one that best suits your specific development needs.

# Additional Resources:

Runestone: Chapter 20 Defining your own Classes

Chapter 53 from the book A Smarter Way to Learn Python by Mark Myers. It also has online exercises to go along with the chapter: http://www.asmarterwaytolearn.com/python/53.html
(I'm trying to find a way to attach the chapter)

These next two websites are places I usually go when I'm working with a new concept or need a refresher:

https://www.w3schools.com/python/python_classes.asp

https://www.geeksforgeeks.org/python-classes-and-objects/

This might be helpful, too:
https://towardsdatascience.com/introduction-to-python-classes-da526ff745df

Here is a link to a free html version of Think Python (I really like this book). Here is the chapter on Classes that might be useful (this is the second edition, which uses Python 3):
https://greenteapress.com/thinkpython2/html/thinkpython2016.html

**Videos:**

4 Pillars of OOP
(programming with Mosh)
https://www.youtube.com/watch?v=pTB0EiLXUC8

Classes and Objects with Python
(CS Dojo)
https://www.youtube.com/watch?v=wfcWRAxRVBA


# The rest of project 1:

### 3. Now it's time for the hard part: finding ORFs.
If you want a challenge without spoilers, try to come up with your own method to find ORFs! Otherwise, **check out the instructions for this part in the Part3_instructions.ipynb file**!

### 4. Remember: there are 6 reading frames
Now you need to use your function (which you don't need to copy and paste below, it is already defined above!) to find the ORFs in the other 5 reading frames. You can store the output of each of these function calls in variables like `orfs2`, `orfs3`, `r_orfs1`, `r_orfs2`, and `r_orfs3`. You can add all these lists together to get one big list of ORFs! (You may have a different way to do this that doesn't use all these variables - that's ok too!). If you followed the algorithm above this list should have 115 entries.
* (Note: if you used your own method for #3 you may have already thought about reading frames - this assumes you made a function to find ORFs in one reading frame at a time)
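
If the idea of a reading frame is still fuzzy, the sketch below shows one way to picture it: slicing the string at offsets 0, 1, and 2 gives the three frames of one strand, and doing the same to the reverse complement gives the other three. The function names and the example sequence are made up, and this is not the ORF-finding step itself.

```python
def reading_frames(dna):
    '''returns the three reading frames of one strand as a list of strings'''
    # frame 1 starts at index 0, frame 2 at index 1, frame 3 at index 2
    return [dna[offset:] for offset in range(3)]

def split_codons(frame):
    '''splits a frame into complete 3-base codons, dropping leftovers'''
    return [frame[i:i+3] for i in range(0, len(frame) - 2, 3)]

example_dna = 'ATGAAATGA'   # made-up sequence for illustration
for frame in reading_frames(example_dna):
    print(split_codons(frame))
```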

### 5. Filtering by length and translating
Based on our discussions in class, there is reason to believe an ORF is more likely to be a gene if it is longer. Here, you should make a new list called `genes` that has translated versions of all the ORFs you think are long enough to be likely to be real genes (based on our discussion in class or your own simulations of ORF lengths in random DNA (doing these simulations is extra credit)). To do the filtering and translating, you will probably want to:
* Use a for loop
* Use an if statement
* Write a function to translate from DNA sequences to amino acid sequences (which will use the dictionary provided below)

In [None]:
aa_dict = {
    'AAA': 'K',  'AAC': 'N',  'AAG': 'K',  'AAT': 'N',  'ACA': 'T',  'ACC': 'T',
    'ACG': 'T',  'ACT': 'T',  'AGA': 'R',  'AGC': 'S',  'AGG': 'R',  'AGT': 'S',
    'ATA': 'I',  'ATC': 'I',  'ATG': 'M',  'ATT': 'I',  'CAA': 'Q',  'CAC': 'H',
    'CAG': 'Q',  'CAT': 'H',  'CCA': 'P',  'CCC': 'P',  'CCG': 'P',  'CCT': 'P',
    'CGA': 'R',  'CGC': 'R',  'CGG': 'R',  'CGT': 'R',  'CTA': 'L',  'CTC': 'L',
    'CTG': 'L',  'CTT': 'L',  'GAA': 'E',  'GAC': 'D',  'GAG': 'E',  'GAT': 'D',
    'GCA': 'A',  'GCC': 'A',  'GCG': 'A',  'GCT': 'A',  'GGA': 'G',  'GGC': 'G',
    'GGG': 'G',  'GGT': 'G',  'GTA': 'V',  'GTC': 'V',  'GTG': 'V',  'GTT': 'V',
    'TAA': '*',  'TAC': 'Y',  'TAG': '*',  'TAT': 'Y',  'TCA': 'S',  'TCC': 'S',
    'TCG': 'S',  'TCT': 'S',  'TGA': '*',  'TGC': 'C',  'TGG': 'W',  'TGT': 'C',
    'TTA': 'L',  'TTC': 'F',  'TTG': 'L',  'TTT': 'F'
}
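
As a rough sketch of how a dictionary like this can drive translation, the example below walks through a DNA string three bases at a time and looks each codon up. It uses a tiny made-up subset of the table (```mini_aa_dict```) so the cell runs on its own, and the function name and signature are assumptions, so your own version may look different.

```python
# a tiny made-up subset of the codon table, just for this sketch
mini_aa_dict = {'ATG': 'M', 'AAA': 'K', 'GGC': 'G', 'TAA': '*'}

def translate(dna, codon_table):
    '''translates a DNA string into amino acids, one codon at a time'''
    protein = ''
    for i in range(0, len(dna) - 2, 3):      # step through complete codons
        codon = dna[i:i+3]
        protein += codon_table[codon]        # dictionary lookup
    return protein

print(translate('ATGAAAGGCTAA', mini_aa_dict))
```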


### 6. Outputting your list of genes in a fasta file
This one is easy - we've provided a function that takes as input a list of genes and an output filename (your output filename should be `X73525_genes.fasta`) and writes the gene sequences to a fasta file you can use on the BLAST website.

In [None]:
def fasta_writer(gene_list, output_filename):
    '''
    given a list of genes and an output_filename, output those genes to a fasta file
    '''
    with open(output_filename, 'w') as f:
        gene_counter = 1
        for gene in gene_list:
            f.write('>gene_'+str(gene_counter)+'\n')
            for i in range(0, len(gene), 80): # writes up to 80 characters per line, with no empty line when the length is a multiple of 80
                f.write(gene[i:i+80] + '\n')
            gene_counter += 1



### 7. One function to rule them all
Look back at the code you've written. Your goal now is to make one function that takes as input an input fasta file name (`X73525.fasta`) and an output fasta file name (`X73525_genes.fasta`) and does all the steps at once. Hint: you will want to use all the code above that is **not** in another function. You do not need to copy-and-paste the functions down here, they are already defined above!

#### Why????
This may seem redundant, but it is actually useful. Now if we were to give you another file with a mystery DNA sequence, you could simply call this one function and quickly have an output file with all the genes your algorithm found in that file!

### Remember:
* Read the instructions!
* Think through your plan and write it out before getting into the coding (we will work on this a bit together in class!)
* Ask for help when you get stuck!

Resources for this notebook:

https://towardsdatascience.com/explaining-python-classes-in-a-simple-way-e3742827c8b5

Gaddis, Starting out with Python 6e, Pearson, 2023

ChatGPT3
