# Lecture 4 - Introduction to Python (part IV)
--- 

 

# Lists, Sets, and Dictionaries
## 1. Lists
- **Definition**: An ordered collection of items (can contain duplicates).
- **Key Features**:
  - Mutable (modifiable after creation).
  - Items can be of any data type.
  - Supports indexing and slicing.

### Common Operations
```python
# Creating a list
my_list = [1, 2, 3, 4, 5]

# Accessing elements
print(my_list[0])  # Output: 1

# Modifying elements
my_list[2] = 10

# Adding elements
my_list.append(6)

# Removing elements
my_list.remove(2)  # removes the first item equal to 2 (a value, not an index)

# List comprehension
squared = [x**2 for x in my_list]
```
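
The feature list above mentions slicing, which the operations block does not show. A minimal sketch of the `list[start:stop:step]` syntax follows; the `nucleotides` list is a made-up example, not data from the lecture.

```python
# Slicing returns a new list and leaves the original unchanged.
nucleotides = ["A", "C", "G", "T", "A", "C"]

first_three = nucleotides[:3]       # items at index 0, 1, 2
last_two = nucleotides[-2:]         # negative indices count from the end
every_other = nucleotides[::2]      # a step of 2 skips every second item
reversed_copy = nucleotides[::-1]   # a step of -1 walks the list backwards

print(first_three)    # ['A', 'C', 'G']
print(last_two)       # ['A', 'C']
print(every_other)    # ['A', 'G', 'A']
print(reversed_copy)  # ['C', 'A', 'T', 'G', 'C', 'A']
```

The same `[::-1]` idiom works on strings, which is how the reverse complement example later in this lecture reverses a sequence.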

---

## 2. Sets
- **Definition**: An unordered collection of unique items.
- **Key Features**:
  - Mutable (but elements must be hashable, e.g., no lists).
  - Automatically removes duplicates.
  - No indexing or slicing (since unordered).

### Common Operations
```python
# Creating a set
my_set = {1, 2, 3, 4}

# Adding elements
my_set.add(5)

# Removing elements
my_set.remove(2)

# Set operations
set_a = {1, 2, 3}
set_b = {3, 4, 5}

union = set_a | set_b         # {1, 2, 3, 4, 5}
intersection = set_a & set_b  # {3}
difference = set_a - set_b    # {1, 2}
```
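
Two of the key features above, automatic duplicate removal and the hashability requirement, are easier to see in action. The sketch below uses a made-up `reads` list; it also contrasts `remove` (raises `KeyError` for a missing item) with `discard` (does nothing for a missing item).

```python
# Converting a list to a set collapses duplicates.
reads = ["ATG", "CGT", "ATG", "TTA", "CGT"]
unique_reads = set(reads)
print(len(reads), len(unique_reads))  # 5 3

# discard() ignores missing items; remove() raises KeyError instead.
unique_reads.discard("GGG")
try:
    unique_reads.remove("GGG")
except KeyError:
    removed = False
else:
    removed = True
print(removed)  # False

# Lists are mutable and therefore unhashable, so they cannot be set elements.
try:
    bad_set = {[1, 2]}
except TypeError as err:
    print("TypeError:", err)
```

A tuple such as `(1, 2)` is hashable and could be stored in a set instead of the list.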

---

## 3. Dictionaries
- **Definition**: A collection of key-value pairs.
- **Key Features**:
  - Keys must be unique and immutable.
  - Values can be of any type.
  - Provides fast lookups by key.

### Common Operations
```python
# Creating a dictionary
my_dict = {'name': 'Alice', 'age': 25}

# Accessing values
print(my_dict['name'])  # Output: Alice

# Modifying values
my_dict['age'] = 26

# Adding new key-value pairs
my_dict['city'] = 'New York'

# Removing key-value pairs
del my_dict['city']

# Iterating through keys and values
for key, value in my_dict.items():
    print(f"{key}: {value}")
```
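
Accessing a missing key with `my_dict[key]` raises `KeyError`. As a small sketch of two common ways around that, the example below uses the `in` operator and `dict.get` with a default value to count nucleotides; the `sequence` string is only an illustration.

```python
sequence = "ATGCGCGT"

# Count each nucleotide; get() returns 0 when the key is not present yet.
counts = {}
for nt in sequence:
    counts[nt] = counts.get(nt, 0) + 1

print(counts)              # {'A': 1, 'T': 2, 'G': 3, 'C': 2}
print("N" in counts)       # False: membership test on the keys
print(counts.get("N", 0))  # 0: safe lookup with a default
```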

---

## Summary Table

| Feature         | List                  | Set                   | Dictionary                    |
|------------------|-----------------------|-----------------------|-------------------------------|
| Ordered          | Yes                  | No                    | Yes, by insertion order (guaranteed since Python 3.7) |
| Allows Duplicates| Yes                  | No                    | Keys: No; Values: Yes        |
| Mutable          | Yes                  | Yes                   | Yes                          |
| Indexing         | Yes                  | No                    | No                           |
| Common Use Case  | Sequential data      | Unique items          | Key-value lookups            |

---

### Practical Examples
1. Use **lists** when maintaining an ordered collection or sequence (e.g., processing a queue).
2. Use **sets** to handle unique elements and perform mathematical set operations.
3. Use **dictionaries** for mapping relationships or performing fast lookups.

```python
my_list = ["Name", 12, True, 12.1]
my_list.append("Age")
my_list.append("Age")
my_list.append("Age")

print(my_list)

my_list.insert(0, 5)
print(my_list)

squares = [x**2 for x in range(1, 21)]
print(squares)


set_a = {1, 2, 3}
set_b = {3}

set_b.issubset(set_a)
set_b <= set_a


my_dict = {"names": ["Ali", "Sarah"], "age": [23, 22]}

print(my_dict)
print(my_dict['names'])

my_dict["names"][0] = "Mohammad"


print(my_dict.keys())
print(my_dict.values())
```

Output:

```text
['Name', 12, True, 12.1, 'Age', 'Age', 'Age']
[5, 'Name', 12, True, 12.1, 'Age', 'Age', 'Age']
[1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225, 256, 289, 324, 361, 400]
{'names': ['Ali', 'Sarah'], 'age': [23, 22]}
['Ali', 'Sarah']
dict_keys(['names', 'age'])
dict_values([['Mohammad', 'Sarah'], [23, 22]])
```


# Example 1: Write a Function to Find the Reverse Complement of a DNA Sequence

In this exercise, you will write a Python function that takes a DNA sequence as input and returns its **reverse complement**.

## What Is the Reverse Complement?

The **reverse complement** of a DNA sequence is formed by two steps:
1. **Complement**: Replace each nucleotide with its complement:
    - `A` (adenine) → `T` (thymine)
    - `T` (thymine) → `A` (adenine)
    - `C` (cytosine) → `G` (guanine)
    - `G` (guanine) → `C` (cytosine)

2. **Reverse**: Reverse the order of the complemented sequence.

### Example:
Original Sequence:  
`ATGC`

Complement:  
`TACG`

Reverse Complement:  
`GCAT`

## Task

1. Write a function named `reverse_complement` that:
   - Accepts a DNA sequence as input.
   - Returns its reverse complement.

2. Write test cases for your function to make sure it works as expected.

3. Read the DNA sequence from a **FASTA file** named `Ecoli_genome.fasta`. Assume the file contains a valid DNA sequence.

4. Let's spend some time thinking about how many operations it takes to generate the reverse complement of a DNA sequence of length `n`.

### Steps to Implement:

1. **Parse the FASTA File**:
   Use Python to open and read the `Ecoli_genome.fasta` file. Skip the header line (starting with `>`), and extract the sequence.

2. **Write the Function**:
   Define the `reverse_complement` function to compute the reverse complement of the sequence.

3. **Test the Function**:
   Print the reverse complement of the sequence read from the FASTA file.
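
The parsing step can be sketched as follows. This is one possible approach, not the only one: the helper name `read_fasta_sequence` is hypothetical, and the sketch assumes the file holds a single record (it stops at a second `>` header if one appears). The demo writes a tiny made-up file to a temporary location so the block runs without `Ecoli_genome.fasta`.

```python
import os
import tempfile


def read_fasta_sequence(path):
    """Return the sequence of the first FASTA record in `path` as one string."""
    parts = []
    seen_header = False
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                if seen_header:
                    break  # a second header starts a new record; stop here
                seen_header = True
                continue  # header lines are not part of the sequence
            parts.append(line.upper())
    # Joining once at the end avoids repeated string concatenation.
    return "".join(parts)


# Demo on a small made-up file (not the real genome).
with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as tmp:
    tmp.write(">demo_record made-up example\nATGC\natgc\n")
demo_sequence = read_fasta_sequence(tmp.name)
os.remove(tmp.name)
print(demo_sequence)  # ATGCATGC
```

For the exercise, the call would be `read_fasta_sequence("Ecoli_genome.fasta")`, assuming that file is in the working directory.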


### Challenge:
Write your solution to handle longer sequences efficiently!


```python
# Let's spend some time working on this problem

complement = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(DNA):
    reverse_DNA = ""
    for nt in DNA:
        comp_DNA = complement[nt]
        reverse_DNA = reverse_DNA + comp_DNA
    reverse_DNA = reverse_DNA[::-1]
    return reverse_DNA
```

```python
test_DNA = "ATTTCGTTTAA"

rev_comp_DNA = reverse_complement(test_DNA)

print(rev_comp_DNA)
```

Output:

```text
TTAAACGAAAT
```
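
On the operation count: a loop over the sequence does one dictionary lookup per nucleotide, so the complement step needs about n lookups, and the reversal copies n characters. Building the result with repeated `+` can cost more, because strings are immutable and each concatenation may copy everything accumulated so far; in the worst case that adds up to roughly n²/2 character copies. One alternative, sketched below, is the built-in `str.maketrans` / `str.translate` pair, which performs the whole substitution in a single pass. The name `reverse_complement_fast` is hypothetical, and the asserts are a few example test cases for Task 2.

```python
# Translation table mapping each nucleotide to its complement.
COMPLEMENT_TABLE = str.maketrans("ATCG", "TAGC")


def reverse_complement_fast(dna):
    """Complement every nucleotide in one pass, then reverse the result."""
    return dna.translate(COMPLEMENT_TABLE)[::-1]


# Test cases
assert reverse_complement_fast("ATGC") == "GCAT"
assert reverse_complement_fast("ATTTCGTTTAA") == "TTAAACGAAAT"
assert reverse_complement_fast("") == ""
# Applying the operation twice should give back the original sequence.
assert reverse_complement_fast(reverse_complement_fast("GATTACA")) == "GATTACA"

print(reverse_complement_fast("ATTTCGTTTAA"))  # TTAAACGAAAT
```

Note that `translate` leaves characters without a mapping (for example `N`) unchanged instead of raising `KeyError`, so input validation, if needed, has to happen separately.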


# Functions vs. Methods in Python

Functions and methods are both used to perform specific tasks in Python, but they differ in their context and how they are called.

## What Are Classes and Instances?

In Python, a **class** is a blueprint for creating objects. Objects are instances of classes, and they encapsulate data (attributes) and behaviors (methods). Classes are important because they help organize and structure your code around real-world entities.

### Example: `str` as a Class
The `str` type in Python is an example of a class. When you define a string, like:
```python
my_string = "Hello, World!"
```
`my_string` is an **instance** of the `str` class, so it can use every method that the `str` class defines.

### Why Are Methods Defined?
**Methods** are functions that are defined within a class and operate on instances of that class. They are bound to the object and are called using the syntax:
```python
instance.method()
```
Methods allow objects to act in ways specific to their class. For example, the `count()` method for strings can count the occurrences of a substring.
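
As a small illustration of the relationship between a class, an instance, and a method, the sketch below inspects a string and calls `count` in two equivalent ways. Calling the method through the class (`str.count(my_string, ...)`) is shown only to make the binding visible; the `instance.method()` form is the usual one.

```python
my_string = "Hello, World!"

print(type(my_string))             # <class 'str'>
print(isinstance(my_string, str))  # True

# Method call: the instance is passed to the method automatically.
print(my_string.count("l"))        # 3

# Same method looked up on the class, with the instance passed explicitly.
print(str.count(my_string, "l"))   # 3

# A function, by contrast, is called on its own and takes the object as an argument.
print(len(my_string))              # 13
```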

---

## Difference Between Functions and Methods

| **Feature**       | **Functions**                             | **Methods**                               |
|--------------------|-------------------------------------------|-------------------------------------------|
| Definition         | Defined with `def` outside any class.    | Defined with `def` inside a class.        |
| Binding            | Not bound to any object.                 | Bound to an instance of a class.          |
| Call Syntax        | `function_name(arguments)`               | `instance.method(arguments)`              |
| Example            | `len(my_string)`                         | `my_string.count('a')`                    |

---

## Exercise: Implement GC Content Using String Methods

We will now reimplement the GC content example using the methods defined for strings.

### Task
Write a function that calculates the GC content of a DNA sequence. Use string methods like `count` to simplify your implementation.

### Example Implementation:
```python
def gc_content(sequence):
    """
    Calculate the GC content of a DNA sequence using string methods.
    """
    g_count = sequence.count('G')
    c_count = sequence.count('C')
    gc_percentage = (g_count + c_count) / len(sequence) * 100
    return round(gc_percentage, 2)

# Example usage:
sequence = "ATGCGCGT"
print(gc_content(sequence))  # Output: 62.5
```

### Why Use String Methods?
String methods like `count` are optimized for operations on strings and make the code more concise and readable. They also reduce the need for manual iteration through the sequence.
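
To make the comparison concrete, here is a sketch of the same calculation written with a manual loop next to a `count`-based version. Both function names are hypothetical, and both assume a non-empty, uppercase sequence.

```python
def gc_content_loop(sequence):
    """GC percentage computed by visiting every nucleotide explicitly."""
    gc = 0
    for nt in sequence:
        if nt == "G" or nt == "C":
            gc += 1
    return round(gc / len(sequence) * 100, 2)


def gc_content_count(sequence):
    """GC percentage computed with str.count, which does the iteration internally."""
    gc = sequence.count("G") + sequence.count("C")
    return round(gc / len(sequence) * 100, 2)


sequence = "ATGCGCGT"  # 3 G + 2 C out of 8 nucleotides
print(gc_content_loop(sequence))   # 62.5
print(gc_content_count(sequence))  # 62.5
```

Both versions return the same value; the `count` version is shorter and leaves the loop to the string implementation.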

---

### Challenge:
Refactor the function into a method of a custom `DNA` class. This will demonstrate how to encapsulate sequence-specific behaviors in a class structure.
```python
class DNA:
    def __init__(self, sequence):
        self.sequence = sequence

    def gc_content(self):
        """
        Calculate the GC content of the DNA sequence.
        """
        g_count = self.sequence.count('G')
        c_count = self.sequence.count('C')
        gc_percentage = (g_count + c_count) / len(self.sequence) * 100
        return round(gc_percentage, 2)

# Example usage:
my_dna = DNA("ATGCGCGT")
print(my_dna.gc_content())  # Output: 62.5
```
This approach demonstrates the power of object-oriented programming by binding data (the sequence) and behavior (methods) into a single object.

---

Understanding the difference between functions and methods, as well as the concept of classes, is essential for writing efficient and organized Python code.


# Example 3: Define a `Sequence` Class

In this exercise, you will define a class named `Sequence` to represent biological sequences. The class will encapsulate attributes and methods to handle DNA and RNA sequences and perform common operations.

---

## Task

1. Create a class called `Sequence` with the following:
    - **Attributes**:
        - `sequence`: The nucleotide sequence (string).
        - `is_rna`: A boolean flag indicating whether the sequence is RNA (`True`) or DNA (`False`).

    - **Methods**:
        1. **`reverse`**:
            - Returns the reverse of the sequence.
        2. **`complement`**:
            - Returns the complement of the sequence.
            - Use `A ↔ T`, `C ↔ G` for DNA and `A ↔ U`, `C ↔ G` for RNA.
        3. **`reverse_complement`**:
            - Returns the reverse complement of the sequence.
        4. **`gc_content`**:
            - Calculates and returns the GC content of the sequence as a percentage.

---

## Example Usage:

```python
# Create a DNA sequence instance
dna_seq = Sequence("ATGCGCGT", is_rna=False)

# Reverse the sequence
print(dna_seq.reverse())  # Output: TGCGCGTA

# Complement the sequence
print(dna_seq.complement())  # Output: TACGCGCA

# Reverse complement the sequence
print(dna_seq.reverse_complement())  # Output: ACGCGCAT

# Calculate GC content
print(dna_seq.gc_content())  # Output: 62.5

# Create an RNA sequence instance
rna_seq = Sequence("AUGCGCGU", is_rna=True)

# Perform similar operations for RNA
print(rna_seq.reverse())  # Output: UGCGCGUA
print(rna_seq.complement())  # Output: UACGCGCA
print(rna_seq.reverse_complement())  # Output: ACGCGCAU
print(rna_seq.gc_content())  # Output: 62.5
```

---

## Challenge:
1. Extend the `Sequence` class to include additional methods, such as:
    - **Transcription**: Convert a DNA sequence to RNA.
    - **Translation**: Translate an RNA sequence into a protein sequence.
2. Validate sequences on initialization to ensure they contain only valid nucleotides (A, T, C, G for DNA; A, U, C, G for RNA).
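
One way to approach the validation step is with the set operations from earlier in this lecture: the characters of a sequence minus the allowed alphabet gives exactly the offending symbols. The helper name `invalid_nucleotides` is hypothetical, and this is only a sketch of the idea, independent of how the class itself performs the check.

```python
DNA_ALPHABET = set("ATCG")
RNA_ALPHABET = set("AUCG")


def invalid_nucleotides(sequence, is_rna=False):
    """Return the set of characters that are not valid for the chosen molecule type."""
    alphabet = RNA_ALPHABET if is_rna else DNA_ALPHABET
    # Set difference: symbols in the sequence that the alphabet does not contain.
    return set(sequence.upper()) - alphabet


print(invalid_nucleotides("ATGCGCGT"))                    # set()
print(invalid_nucleotides("AUGC"))                        # {'U'}
print(sorted(invalid_nucleotides("ATXGN", is_rna=True)))  # ['N', 'T', 'X']
```

An empty result means the sequence is valid; a constructor could raise `ValueError` when the result is non-empty.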


```python
# Let's spend some time working on this example
class Sequence:
    def __init__(self, sequence, is_rna=False):
        self.sequence = sequence.upper()
        self.is_rna = is_rna
        self.validate_sequence()

    def validate_sequence(self):
        # Reject any character outside the alphabet of the chosen molecule type
        valid_nucleotides = "AUCG" if self.is_rna else "ATCG"
        for nucleotide in self.sequence:
            if nucleotide not in valid_nucleotides:
                raise ValueError(f"Invalid nucleotide {nucleotide} for {'RNA' if self.is_rna else 'DNA'} sequence.")

    def reverse(self):
        return self.sequence[::-1]

    def complement(self):
        # Pick the pairing table that matches the molecule type
        if self.is_rna:
            complement_map = {"A": "U", "U": "A", "C": "G", "G": "C"}
        else:
            complement_map = {"A": "T", "T": "A", "C": "G", "G": "C"}
        return "".join(complement_map[nuc] for nuc in self.sequence)

    def reverse_complement(self):
        return self.complement()[::-1]

    def gc_content(self):
        gc_count = self.sequence.count("G") + self.sequence.count("C")
        return (gc_count / len(self.sequence)) * 100

    def transcribe(self):
        if self.is_rna:
            raise ValueError("Transcription is not applicable for RNA sequences.")
        return self.sequence.replace("T", "U")

    def translate(self):
        if not self.is_rna:
            raise ValueError("Translation requires an RNA sequence.")
        codon_table = {
            "UUU": "F", "UUC": "F", "UUA": "L", "UUG": "L",
            "UCU": "S", "UCC": "S", "UCA": "S", "UCG": "S",
            "UAU": "Y", "UAC": "Y", "UAA": "Stop", "UAG": "Stop",
            "UGU": "C", "UGC": "C", "UGA": "Stop", "UGG": "W",
            "CUU": "L", "CUC": "L", "CUA": "L", "CUG": "L",
            "CCU": "P", "CCC": "P", "CCA": "P", "CCG": "P",
            "CAU": "H", "CAC": "H", "CAA": "Q", "CAG": "Q",
            "CGU": "R", "CGC": "R", "CGA": "R", "CGG": "R",
            "AUU": "I", "AUC": "I", "AUA": "I", "AUG": "M",
            "ACU": "T", "ACC": "T", "ACA": "T", "ACG": "T",
            "AAU": "N", "AAC": "N", "AAA": "K", "AAG": "K",
            "AGU": "S", "AGC": "S", "AGA": "R", "AGG": "R",
            "GUU": "V", "GUC": "V", "GUA": "V", "GUG": "V",
            "GCU": "A", "GCC": "A", "GCA": "A", "GCG": "A",
            "GAU": "D", "GAC": "D", "GAA": "E", "GAG": "E",
            "GGU": "G", "GGC": "G", "GGA": "G", "GGG": "G"
        }
        protein = []
        # Read complete codons only; a trailing partial codon is ignored
        for i in range(0, len(self.sequence) - 2, 3):
            codon = self.sequence[i:i+3]
            if codon_table.get(codon) == "Stop":
                break
            protein.append(codon_table.get(codon, "?"))
        return "".join(protein)

# Example usage
# DNA Sequence
dna_seq = Sequence("ATGCGCGT", is_rna=False)
print(dna_seq.reverse())             # Output: TGCGCGTA
print(dna_seq.complement())          # Output: TACGCGCA
print(dna_seq.reverse_complement())  # Output: ACGCGCAT
print(dna_seq.gc_content())          # Output: 62.5
print(dna_seq.transcribe())          # Output: AUGCGCGU

# RNA Sequence
rna_seq = Sequence("AUGCGCGU", is_rna=True)
print(rna_seq.reverse())             # Output: UGCGCGUA
print(rna_seq.complement())          # Output: UACGCGCA
print(rna_seq.reverse_complement())  # Output: ACGCGCAU
print(rna_seq.gc_content())          # Output: 62.5
print(rna_seq.translate())           # Output: MR
```

Output:

```text
TGCGCGTA
TACGCGCA
ACGCGCAT
62.5
AUGCGCGU
UGCGCGUA
UACGCGCA
ACGCGCAU
62.5
MR
```
