# Lecture 4 - Introduction to Python (part IV)
--- 

 

# Example 2: Write a Function to Find the Reverse Complement of a DNA Sequence

In this exercise, you will write a Python function that takes a DNA sequence as input and returns its **reverse complement**.

## What Is the Reverse Complement?

The **reverse complement** of a DNA sequence is formed by two steps:
1. **Complement**: Replace each nucleotide with its complement:
    - `A` (adenine) → `T` (thymine)
    - `T` (thymine) → `A` (adenine)
    - `C` (cytosine) → `G` (guanine)
    - `G` (guanine) → `C` (cytosine)

2. **Reverse**: Reverse the order of the complemented sequence.

### Example:
Original Sequence:  
`ATGC`

Complement:  
`TACG`

Reverse Complement:  
`GCAT`
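The two steps can be traced directly in Python. This is a minimal illustration of the worked example rather than the exercise solution; the names `COMPLEMENT`, `complemented`, and `reverse_comp` are chosen here for the sketch, and uppercase DNA input is assumed.

```python
# Mapping from each nucleotide to its complement (uppercase DNA only).
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

original = "ATGC"

# Step 1: replace each nucleotide with its complement.
complemented = "".join(COMPLEMENT[base] for base in original)

# Step 2: reverse the complemented string with a slice.
reverse_comp = complemented[::-1]

print(complemented)  # TACG
print(reverse_comp)  # GCAT
```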

## Task

1. Write a function named `reverse_complement` that:
   - Accepts a DNA sequence as input.
   - Returns its reverse complement.

2. Write test cases for your function to make sure it works as expected.

3. Read the DNA sequence from a **FASTA file** named `Ecoli_genome.fasta`. Assume the file contains a valid DNA sequence.

4. Think about how many operations it takes to generate the reverse complement of a DNA sequence of length n.

### Steps to Implement:

1. **Parse the FASTA File**:
   Use Python to open and read the `Ecoli_genome.fasta` file. Skip the header line (starting with `>`), then join the remaining lines, with their newline characters stripped, into a single sequence string.

2. **Write the Function**:
   Define the `reverse_complement` function to compute the reverse complement of the sequence.

3. **Test the Function**:
   Print the reverse complement of the sequence read from the FASTA file.


### Challenge:
Write your solution so that it handles long sequences, such as a whole genome, efficiently!
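One possible approach is sketched below, assuming a single-record FASTA file with uppercase or lowercase bases. The helper name `read_fasta_sequence` and the table name `_COMPLEMENT_TABLE` are illustrative choices, not part of the exercise statement. Both `str.translate` and slicing visit each base once, so the work grows linearly with the sequence length n.

```python
# Translation table built once; str.translate then maps every base in one pass.
_COMPLEMENT_TABLE = str.maketrans("ACGTacgt", "TGCAtgca")


def read_fasta_sequence(path):
    """Return the sequence of a single-record FASTA file as one string."""
    parts = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            # Skip the header line and any blank lines.
            if not line or line.startswith(">"):
                continue
            parts.append(line)
    # Joining once at the end avoids repeated string concatenation.
    return "".join(parts)


def reverse_complement(sequence):
    """Complement every base, then reverse the result."""
    return sequence.translate(_COMPLEMENT_TABLE)[::-1]


# Test cases on small inputs with known answers.
assert reverse_complement("ATGC") == "GCAT"
assert reverse_complement("AAAA") == "TTTT"
assert reverse_complement("") == ""

# Usage, assuming Ecoli_genome.fasta is in the working directory:
# genome = read_fasta_sequence("Ecoli_genome.fasta")
# print(reverse_complement(genome)[:60])
```

Building the result with `translate` and a single slice avoids growing a string one character at a time inside a Python loop, which matters for genome-sized inputs.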


Let's spend some time working on this problem.











# Functions vs. Methods in Python

Functions and methods are both used to perform specific tasks in Python, but they differ in their context and how they are called.

## What Are Classes and Instances?

In Python, a **class** is a blueprint for creating objects. Objects are instances of classes, and they encapsulate data (attributes) and behaviors (methods). Classes are important because they help organize and structure your code around real-world entities.

### Example: `str` as a Class
The `str` type in Python is an example of a class. When you define a string, like:
```python
my_string = "Hello, World!"
```
`my_string` is an **instance** of the `str` class. As an instance, it has access to all the methods and properties defined in the `str` class.

### Why Are Methods Defined?
**Methods** are functions that are defined within a class and operate on instances of that class. They are bound to the object and are called using the syntax:
```python
instance.method()
```
Methods allow objects to act in ways specific to their class. For example, the `count()` method for strings can count the occurrences of a substring.
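As a quick illustration, the snippet below calls `count()` on the string defined earlier and checks which class the instance belongs to.

```python
my_string = "Hello, World!"

# count() is a method: it is called on the instance with dot syntax.
print(my_string.count("l"))  # 3
print(my_string.count("o"))  # 2

# type() reports the class that the instance belongs to.
print(type(my_string))       # <class 'str'>
```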

---

## Difference Between Functions and Methods

| **Feature**       | **Functions**                             | **Methods**                               |
|--------------------|-------------------------------------------|-------------------------------------------|
| Definition         | Defined with `def` outside any class.    | Defined with `def` inside a class.        |
| Binding            | Not bound to any object.                 | Bound to an instance of a class.          |
| Call Syntax        | `function_name(arguments)`               | `instance.method(arguments)`              |
| Example            | `len(my_string)`                         | `my_string.count('a')`                    |

---
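The rows of the table can be checked side by side. The string `"banana"` is an arbitrary example value chosen for this sketch.

```python
my_string = "banana"

# Function: called by name, with the object passed as an argument.
print(len(my_string))             # 6

# Method: called on the instance with dot syntax.
print(my_string.count("a"))       # 3

# A method is a function stored on the class; calling it through the
# class with the instance as the first argument gives the same result.
print(str.count(my_string, "a"))  # 3
```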

## Exercise: Implement GC Content Using String Methods

We will now reimplement the GC content example using the methods defined for strings.

### Task
Write a function that calculates the GC content of a DNA sequence. Use string methods like `count` to simplify your implementation.

### Example Implementation:
```python
def gc_content(sequence):
    """
    Calculate the GC content of a DNA sequence using string methods.
    """
    g_count = sequence.count('G')
    c_count = sequence.count('C')
    gc_percentage = (g_count + c_count) / len(sequence) * 100
    return round(gc_percentage, 2)

# Example usage:
sequence = "ATGCGCGT"
print(gc_content(sequence))  # Output: 62.5
```

### Why Use String Methods?
In CPython, built-in string methods like `count` are implemented in C, so they are usually faster than an equivalent Python-level loop. They also make the code more concise and readable, because you no longer iterate through the sequence by hand.
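To see what `count` replaces, here is a small sketch comparing an explicit loop with the `count`-based version. The function names `gc_content_loop` and `gc_content_count` and the test sequence are illustrative, not from the lecture.

```python
def gc_content_loop(sequence):
    """GC content computed with an explicit loop over the bases."""
    gc = 0
    for base in sequence:
        if base == "G" or base == "C":
            gc += 1
    return round(gc / len(sequence) * 100, 2)


def gc_content_count(sequence):
    """GC content computed with str.count."""
    gc = sequence.count("G") + sequence.count("C")
    return round(gc / len(sequence) * 100, 2)


sequence = "GGCCAATT"
print(gc_content_loop(sequence))   # 50.0
print(gc_content_count(sequence))  # 50.0
```

Both versions return the same value; the second simply delegates the counting to the string itself.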

---

### Challenge:
Refactor the function into a method of a custom `DNA` class. This will demonstrate how to encapsulate sequence-specific behaviors in a class structure.
```python
class DNA:
    def __init__(self, sequence):
        self.sequence = sequence

    def gc_content(self):
        """
        Calculate the GC content of the DNA sequence.
        """
        g_count = self.sequence.count('G')
        c_count = self.sequence.count('C')
        gc_percentage = (g_count + c_count) / len(self.sequence) * 100
        return round(gc_percentage, 2)

# Example usage:
my_dna = DNA("ATGCGCGT")
print(my_dna.gc_content())  # Output: 62.5
```
This approach demonstrates the power of object-oriented programming by binding data (the sequence) and behavior (methods) into a single object.

---

Understanding the difference between functions and methods, as well as the concept of classes, is essential for writing efficient and organized Python code.


# Example 3: Define a `Sequence` Class

In this exercise, you will define a class named `Sequence` to represent biological sequences. The class will encapsulate attributes and methods to handle DNA and RNA sequences and perform common operations.

---

## Task

1. Create a class called `Sequence` with the following:
    - **Attributes**:
        - `sequence`: The nucleotide sequence (string).
        - `is_rna`: A boolean flag indicating whether the sequence is RNA (`True`) or DNA (`False`).

    - **Methods**:
        1. **`reverse`**:
            - Returns the reverse of the sequence.
        2. **`complement`**:
            - Returns the complement of the sequence.
            - Use `A ↔ T`, `C ↔ G` for DNA and `A ↔ U`, `C ↔ G` for RNA.
        3. **`reverse_complement`**:
            - Returns the reverse complement of the sequence.
        4. **`gc_content`**:
            - Calculates and returns the GC content of the sequence as a percentage.

---

## Example Usage:

```python
# Create a DNA sequence instance
dna_seq = Sequence("ATGCGCGT", is_rna=False)

# Reverse the sequence
print(dna_seq.reverse())  # Output: TGCGCGTA

# Complement the sequence
print(dna_seq.complement())  # Output: TACGCGCA

# Reverse complement the sequence
print(dna_seq.reverse_complement())  # Output: ACGCGCAT

# Calculate GC content
print(dna_seq.gc_content())  # Output: 62.5

# Create an RNA sequence instance
rna_seq = Sequence("AUGCGCGU", is_rna=True)

# Perform similar operations for RNA
print(rna_seq.reverse())  # Output: UGCGCGUA
print(rna_seq.complement())  # Output: UACGCGCA
print(rna_seq.reverse_complement())  # Output: ACGCGCAU
print(rna_seq.gc_content())  # Output: 62.5
```
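One possible sketch of a class that supports the calls above is shown here. It is one of several reasonable designs, not the reference solution: the class-level tables `_DNA_TABLE` and `_RNA_TABLE` are names chosen for this sketch, uppercase input is assumed, and returning `0.0` for an empty sequence is an assumption made to avoid a division by zero.

```python
class Sequence:
    """A DNA or RNA sequence with a few common operations."""

    # Translation tables for complementing; uppercase input is assumed.
    _DNA_TABLE = str.maketrans("ATCG", "TAGC")
    _RNA_TABLE = str.maketrans("AUCG", "UAGC")

    def __init__(self, sequence, is_rna=False):
        self.sequence = sequence
        self.is_rna = is_rna

    def reverse(self):
        """Return the sequence read back to front."""
        return self.sequence[::-1]

    def complement(self):
        """Return the base-by-base complement (A<->T or A<->U, C<->G)."""
        table = self._RNA_TABLE if self.is_rna else self._DNA_TABLE
        return self.sequence.translate(table)

    def reverse_complement(self):
        """Return the complement, reversed."""
        return self.complement()[::-1]

    def gc_content(self):
        """Return the percentage of G and C bases, rounded to 2 decimals."""
        if not self.sequence:
            return 0.0  # assumption: an empty sequence reports 0% GC
        gc = self.sequence.count("G") + self.sequence.count("C")
        return round(gc / len(self.sequence) * 100, 2)


dna = Sequence("GATTACA")
print(dna.reverse_complement())  # TGTAATC
print(dna.gc_content())          # 28.57
```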

---

## Challenge:
1. Extend the `Sequence` class to include additional methods, such as:
    - **Transcription**: Convert a DNA sequence to RNA.
    - **Translation**: Translate an RNA sequence into a protein sequence.
2. Validate sequences on initialization to ensure they contain only valid nucleotides (A, T, C, G for DNA; A, U, C, G for RNA).
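As a starting point for the challenge, validation and transcription could look like the sketch below. It uses standalone functions with hypothetical names (`validate`, `transcribe`) that you would adapt into methods of `Sequence`. Two assumptions are made here: input is uppercase, and the DNA is treated as the coding strand, so transcription only replaces `T` with `U`. Translation is left out because it needs a full codon table.

```python
DNA_ALPHABET = set("ATCG")
RNA_ALPHABET = set("AUCG")


def validate(sequence, is_rna=False):
    """Raise ValueError if the sequence has characters outside its alphabet."""
    allowed = RNA_ALPHABET if is_rna else DNA_ALPHABET
    invalid = set(sequence) - allowed
    if invalid:
        kind = "RNA" if is_rna else "DNA"
        raise ValueError(f"Invalid {kind} characters: {sorted(invalid)}")


def transcribe(dna_sequence):
    """Convert a coding-strand DNA sequence to RNA by replacing T with U."""
    validate(dna_sequence, is_rna=False)
    return dna_sequence.replace("T", "U")


print(transcribe("ATGCGCGT"))  # AUGCGCGU

try:
    validate("ATGX")
except ValueError as err:
    print(err)  # Invalid DNA characters: ['X']
```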


Let's spend some time working on this example.






