<img src="https://raw.githubusercontent.com/AI-MIND-Lab/MedPython/main/Images/ai-mind.png"
     alt="AI MIND Lab Logo"
     width="150"
     height="150">
<br>

**Repository Developed by Pegah Khosravi, Principal Investigator of the AI MIND Lab**

Welcome to this repository! This repository is a result of collaborative efforts from our dedicated team at the lab. We are committed to advancing the field of Medical AI and pushing the boundaries of medical data analysis. Your interest and contributions to our work are greatly appreciated. For more information about our lab and ongoing projects, please visit the [AI MIND Lab website](https://sites.google.com/view/aimindlab/home). Thank you for your interest and support!


#Week 2: Data Structures and Functions

**Objective: Write structured, reusable code and use core data structures.**

- Defining and calling functions; parameters and return values
- Lists, tuples, and dictionaries for storing biological data
- Practical examples (sequences, annotations, patient records)
- Basic error handling and debugging
- Short exercises to reinforce concepts


#Introduction to Functions

**Overview:**

Functions let you encapsulate a block of code into a reusable unit that performs a specific task. This modular approach keeps your code organized, easier to read and maintain, and free of unnecessary repetition.

**a) Defining Functions**

**What is a Function?**

A function is a named block of code designed to perform a specific task. Functions can take inputs (called parameters), process them, and return a result. Once defined, a function can be used (or called) as many times as needed.

**Syntax:**

def function_name(parameters):
    # Code block that performs a specific task
    return value  # Optional, returns the result

- def: This keyword is used to define a function.
- function_name: This is the name you give to your function. It should be descriptive of what the function does.
- parameters: These are the named inputs the function accepts. They are optional; a function can be defined with or without parameters.
- return: This statement is optional. It is used to send a result back to the caller of the function.

**Example:** Function to Calculate GC Content

In [1]:
# Example: Function to calculate GC content
def calculate_gc_content(dna_sequence):
    g_count = dna_sequence.count('G')
    c_count = dna_sequence.count('C')
    gc_content = (g_count + c_count) / len(dna_sequence) * 100
    return gc_content

**Explanation:**

- Function Name: calculate_gc_content is the name of the function, which clearly indicates its purpose.
- Parameter: dna_sequence is the input parameter. This allows you to pass different DNA sequences to the function.
- Code Block: The function counts the occurrences of 'G' and 'C' in the sequence, calculates the GC content, and stores it in the gc_content variable.
- Return Statement: The function returns the gc_content value as a percentage.
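
The function above assumes an uppercase, non-empty sequence: lowercase bases are not counted, and an empty string causes a division by zero. The following is a minimal sketch of a more defensive variant. The name calculate_gc_content_safe and the choice to return 0.0 for an empty sequence are assumptions made for illustration, not part of the original lesson.

```python
def calculate_gc_content_safe(dna_sequence):
    """Hypothetical defensive variant of calculate_gc_content."""
    # Normalize case so that 'g' and 'c' are counted as well
    sequence = dna_sequence.upper()

    # Assumption: report 0.0 for an empty sequence instead of dividing by zero
    if len(sequence) == 0:
        return 0.0

    gc_count = sequence.count('G') + sequence.count('C')
    return gc_count / len(sequence) * 100


print(f"{calculate_gc_content_safe('atgcgc'):.2f}%")
print(f"{calculate_gc_content_safe(''):.2f}%")
```

Whether an empty sequence should yield 0.0, None, or an error depends on how the result is used downstream.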


**b) Calling Functions**

Once you've defined a function, you can use it in your code by calling it. Calling a function involves using the function's name followed by parentheses. If the function requires parameters, you pass them inside the parentheses.

**Syntax:**

function_name(arguments)

- function_name: The name of the function you want to call.
- arguments: The actual values or variables you pass to the function's parameters.

**Example:**

In [2]:
# Example: Calling the function
sequence = "ATGCGCGATCGTACGCTAGC"
gc_content = calculate_gc_content(sequence)
print(f"GC Content: {gc_content:.2f}%")

GC Content: 60.00%


**Explanation:**

- Calling the Function: The function calculate_gc_content is called with the DNA sequence "ATGCGCGATCGTACGCTAGC" as the argument.
- Storing the Result: The result of the function (GC content) is stored in the variable gc_content.
- Printing the Result: The GC content is printed, formatted to two decimal places using :.2f.


**c) Parameters and Return Values**

**Parameters:**

Parameters are the inputs that you provide to a function. They allow the function to work with different data each time it's called. You can define functions with multiple parameters if needed.

**Example with Multiple Parameters:**

In [3]:
def greet(name, age):
    print(f"Hello, {name}! You are {age} years old.")

# Calling the function with different arguments
greet("Alice", 30)
greet("Bob", 25)

Hello, Alice! You are 30 years old.
Hello, Bob! You are 25 years old.


**Return Values:**

The return statement is used to send a value back to the caller of the function. Without a return statement, the function will return None by default. You can return any type of value (e.g., numbers, strings, lists, dictionaries).

**Example:**

In [4]:
def add_numbers(a, b):
    return a + b

# Calling the function and using the return value
result = add_numbers(5, 3)
print("The sum is:", result)


The sum is: 8


**Key Points:**

- Parameters allow you to pass data into the function.
- Return Values allow you to get data back from the function after it has performed its task.
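
To illustrate the point that a function without a return statement returns None, here is a small sketch contrasting printing with returning. The function names greet_only and make_greeting are made up for this example.

```python
def greet_only(name):
    # Prints a greeting but has no return statement
    print(f"Hello, {name}!")


def make_greeting(name):
    # Builds the greeting and returns it to the caller
    return f"Hello, {name}!"


printed = greet_only("Alice")      # the greeting is printed here
returned = make_greeting("Alice")  # nothing is printed here

print(printed)   # None
print(returned)  # Hello, Alice!
```

A value that is only printed cannot be reused later in the program, whereas a returned value can be stored, compared, or passed to another function.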

**Exercise:**

Write a Python function that checks if a DNA sequence contains any stop codons and returns the position of the first stop codon found.

In [5]:
def find_first_stop_codon(dna_sequence):
    stop_codons = ["TAA", "TAG", "TGA"]
    position = 0

    while position <= len(dna_sequence) - 3:
        codon = dna_sequence[position:position+3]
        if codon in stop_codons:
            return position
        position += 3

    return -1  # Return -1 if no stop codon is found

**Example Usage:**

In [6]:
sequence = "ATGCGCGATCGTAACTAGCTAGGCGTGA"
position = find_first_stop_codon(sequence)

if position != -1:
    print(f"Stop codon found at position {position}")
else:
    print("No stop codon found.")

Stop codon found at position 15


**Explanation:**

- Function: find_first_stop_codon takes a DNA sequence as input and looks for the first occurrence of a stop codon (TAA, TAG, TGA).
- Loop: The loop steps through the sequence three nucleotides (one codon) at a time, starting at position 0, so only codons in the first reading frame are checked.
- Return: If a stop codon is found, the function returns its zero-based position in the sequence. If no stop codon is found, it returns -1.

**Example Usage:** The script checks the provided sequence for a stop codon and prints the position of the first one found (TAG at position 15).
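
Because the loop advances three nucleotides at a time from position 0, stop codons in the other two reading frames are not detected. As a sketch of one possible extension, the hypothetical function first_stop_per_frame below repeats the same scan for frames 0, 1, and 2 and reports the first stop codon in each. It is an illustration under that assumption, not the exercise's required solution.

```python
def first_stop_per_frame(dna_sequence):
    """Return {frame: position of first stop codon, or -1} for frames 0, 1, 2."""
    stop_codons = {"TAA", "TAG", "TGA"}
    result = {}

    for frame in range(3):
        result[frame] = -1  # -1 means no stop codon in this frame
        # Step through the sequence one codon at a time, starting at the frame offset
        for position in range(frame, len(dna_sequence) - 2, 3):
            if dna_sequence[position:position + 3] in stop_codons:
                result[frame] = position
                break

    return result


sequence = "ATGCGCGATCGTAACTAGCTAGGCGTGA"
print(first_stop_per_frame(sequence))  # {0: 15, 1: 19, 2: 11}
```

For this sequence, frame 0 gives the same answer as find_first_stop_codon (position 15), while frame 2 contains an earlier stop codon (TAA at position 11).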

#Practical Examples and Exercises

The following exercises give you practice with conditionals, loops, and functions in Python.


**Exercise 1:** Calculating GC Content for a List of DNA Sequences

Task: Write a function that takes a list of DNA sequences and returns a list of their GC contents.

In [7]:
def gc_content_list(sequences):
    gc_contents = []
    for seq in sequences:
        gc_contents.append(calculate_gc_content(seq))
    return gc_contents

# Example usage:
sequences = ["ATGCGCGATCGTACGCTAGC", "ATATATATAT", "GGGGCCCC"]
gc_contents = gc_content_list(sequences)
print("GC Contents:", gc_contents)


GC Contents: [60.0, 0.0, 100.0]


**Explanation:**

- Purpose: The function gc_content_list is designed to process a list of DNA sequences and calculate the GC content (percentage of G and C nucleotides) for each sequence.
- Looping through Sequences: The function iterates over each sequence in the sequences list. For each sequence, it calculates the GC content using the calculate_gc_content function defined earlier and appends the result to the gc_contents list.
- Returning Results: After processing all sequences, the function returns the list of GC content values.


**Example Usage:**

- The first sequence has a GC content of 60%.
- The second sequence has no G or C nucleotides, so its GC content is 0%.
- The third sequence is entirely made up of G and C nucleotides, giving it a GC content of 100%.

**Exercise 2:** Reversing a DNA Sequence Without Built-in Functions

Task: Write a script that uses a loop to reverse a DNA sequence without using Python’s built-in reverse functions.

In [8]:
def reverse_sequence(dna_sequence):
    reversed_sequence = ""
    for nucleotide in dna_sequence:
        reversed_sequence = nucleotide + reversed_sequence
    return reversed_sequence

# Example usage:
sequence = "ATGCGCGATCGTACGCTAGC"
reversed_sequence = reverse_sequence(sequence)
print("Reversed Sequence:", reversed_sequence)


Reversed Sequence: CGATCGCATGCTAGCGCGTA


**Explanation:**

- Purpose: The reverse_sequence function reverses the order of nucleotides in a given DNA sequence.
- Building the Reversed Sequence: The function initializes an empty string reversed_sequence. It then iterates over each nucleotide in the original dna_sequence, prepending each nucleotide to the beginning of reversed_sequence. This effectively builds the reversed sequence one nucleotide at a time.
- Returning the Result: After the loop completes, the function returns the fully reversed DNA sequence.

**Example Usage:**

The original sequence "ATGCGCGATCGTACGCTAGC" is reversed to produce "CGATCGCATGCTAGCGCGTA".

**Exercise 3:** Counting Nucleotides

Task: Write a function that takes a DNA sequence as input and returns a dictionary with the counts of each nucleotide (A, T, G, C).

In [9]:
def count_nucleotides(dna_sequence):
    nucleotide_counts = {'A': 0, 'T': 0, 'G': 0, 'C': 0}

    for nucleotide in dna_sequence:
        if nucleotide in nucleotide_counts:
            nucleotide_counts[nucleotide] += 1

    return nucleotide_counts

# Example usage:
sequence = "ATGCGCGATCGTACGCTAGC"
nucleotide_counts = count_nucleotides(sequence)
print("Nucleotide Counts:", nucleotide_counts)


Nucleotide Counts: {'A': 4, 'T': 4, 'G': 6, 'C': 6}


**Explanation:**

- Purpose: The count_nucleotides function counts the occurrences of each nucleotide (A, T, G, C) in a given DNA sequence and stores these counts in a dictionary.
- Initial Counts: The function initializes a dictionary nucleotide_counts with keys for each nucleotide and sets their initial counts to 0.
- Counting Loop: The function iterates over each nucleotide in the DNA sequence. If the nucleotide is one of the recognized nucleotides (A, T, G, C), its count in the dictionary is incremented.
- Returning the Dictionary: After processing the entire sequence, the function returns the dictionary containing the nucleotide counts.

**Example Usage:**

The dictionary shows that the sequence contains 4 adenines (A), 4 thymines (T), 6 guanines (G), and 6 cytosines (C).
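
The same counts can also be obtained with Counter from Python's standard collections module. This sketch is offered as an alternative for comparison, not as a replacement for the loop the exercise asks for.

```python
from collections import Counter

sequence = "ATGCGCGATCGTACGCTAGC"

# Counter tallies every distinct character in the string
counts = Counter(sequence)

print(counts["A"], counts["T"], counts["G"], counts["C"])  # 4 4 6 6

# A character that never appears has a count of 0 rather than raising KeyError
print(counts["N"])  # 0
```

Unlike count_nucleotides, Counter also tallies any unexpected characters in the input, which can be useful for spotting invalid bases.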


**Exercise 4:** Finding the Longest DNA Sequence

Task: Write a function that takes a list of DNA sequences and returns the longest sequence.



In [10]:
def find_longest_sequence(sequences):
    longest_sequence = ""

    for seq in sequences:
        if len(seq) > len(longest_sequence):
            longest_sequence = seq

    return longest_sequence

# Example usage:
sequences = ["ATGCGC", "ATCG", "ATGCGCGATCGTACGCTAGC"]
longest_sequence = find_longest_sequence(sequences)
print("Longest Sequence:", longest_sequence)


Longest Sequence: ATGCGCGATCGTACGCTAGC


**Explanation:**

- Purpose: The find_longest_sequence function iterates through a list of DNA sequences and keeps track of the longest sequence found.
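- Comparison: For each sequence in the list, the function compares its length to the length of the current longest_sequence. If it is strictly longer, it replaces longest_sequence, so when several sequences share the maximum length the first one is kept. For an empty list, the function returns an empty string.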
- Returning the Longest Sequence: After checking all sequences, the function returns the longest one.


**Example Usage:**

The function identifies and returns the longest sequence from the list.
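
For comparison, Python's built-in max() can perform the same search when given len as its key function. This is a brief sketch of that alternative; the variable names are chosen for illustration.

```python
sequences = ["ATGCGC", "ATCG", "ATGCGCGATCGTACGCTAGC"]

# key=len tells max() to compare the sequences by their length
longest = max(sequences, key=len)
print("Longest Sequence:", longest)  # Longest Sequence: ATGCGCGATCGTACGCTAGC

# max() raises ValueError on an empty list unless a default is supplied
longest_or_empty = max([], key=len, default="")
print(repr(longest_or_empty))  # ''
```

Like the loop version, max() returns the first sequence when several share the maximum length.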

**Exercise 5:** Checking for Palindromic Sequences

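Task: Write a function that checks if a DNA sequence is palindromic. In this exercise, a palindromic sequence is one that reads the same forward and backward as a plain string; in molecular biology, the term usually refers to a sequence that equals its own reverse complement.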

In [11]:
def is_palindromic(dna_sequence):
    reversed_sequence = dna_sequence[::-1]
    return dna_sequence == reversed_sequence

# Example usage:
sequence = "ATCGCTA"
result = is_palindromic(sequence)
print(f"Is the sequence palindromic? {result}")


Is the sequence palindromic? True


**Explanation:**

- Purpose: The is_palindromic function checks if a DNA sequence is the same when read forwards and backwards.
- Reversing the Sequence: The function reverses the DNA sequence using slicing ([::-1]).
- Comparison: It then compares the original sequence to the reversed one and returns True if they are identical (indicating that the sequence is palindromic) and False otherwise.


**Example Usage:**

The function correctly identifies "ATCGCTA" as a palindromic sequence.
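
In molecular biology, a palindromic site is usually one that equals its own reverse complement (for example, the EcoRI recognition site GAATTC). The sketch below shows how that stricter check might look. The function name is_reverse_palindrome is hypothetical, and the code assumes the input contains only A, T, G, and C.

```python
def is_reverse_palindrome(dna_sequence):
    """Return True if the sequence equals its own reverse complement."""
    complement = {'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G'}
    # Read the sequence backwards and swap each base for its complement
    reverse_complement = ''.join(complement[base] for base in reversed(dna_sequence))
    return dna_sequence == reverse_complement


print(is_reverse_palindrome("GAATTC"))   # True
print(is_reverse_palindrome("ATCGCTA"))  # False
```

Note that "ATCGCTA" is a palindrome as a string but not in the reverse-complement sense, so the two definitions can disagree.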


**Exercise 6:** Counting Substrings in a DNA Sequence

Task: Write a function that takes a DNA sequence and a substring (such as a codon) and returns the number of times the substring appears in the sequence.

In [12]:
def count_substring(dna_sequence, substring):
    count = 0
    start = 0

    while start <= len(dna_sequence) - len(substring):
        if dna_sequence[start:start+len(substring)] == substring:
            count += 1
        start += 1

    return count

# Example usage:
sequence = "ATGCGCATGCGCGATG"
substring = "ATG"
count = count_substring(sequence, substring)
print(f"The substring '{substring}' appears {count} times in the sequence.")


The substring 'ATG' appears 3 times in the sequence.


**Explanation:**

- Purpose: The count_substring function counts the occurrences of a specific substring (e.g., a codon like "ATG") in a DNA sequence.
- Looping Through the Sequence: The function uses a while loop to check each possible position in the DNA sequence where the substring might start.
- Counting Matches: If the substring matches the corresponding part of the DNA sequence, the count is incremented.
- Returning the Count: After checking the entire sequence, the function returns the total count of occurrences.

**Example Usage:**

The function correctly counts that the substring "ATG" appears three times in the given sequence.
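
Because the loop advances one position at a time, it counts overlapping matches. Python's built-in str.count() counts only non-overlapping matches, so the two can differ. The sketch below repeats the sliding-window logic (written with a for loop so the block runs on its own) and compares the results.

```python
def count_substring(dna_sequence, substring):
    # Same sliding-window idea as above: test every possible start position
    count = 0
    for start in range(len(dna_sequence) - len(substring) + 1):
        if dna_sequence[start:start + len(substring)] == substring:
            count += 1
    return count


sequence = "AAAA"
print(count_substring(sequence, "AA"))  # 3 (matches at positions 0, 1, and 2)
print(sequence.count("AA"))             # 2 (non-overlapping matches only)
```

For the "ATG" example above, both approaches give 3, because the matches do not overlap.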

**Exercise 7:** Transcribing DNA to RNA

Task: Write a function that transcribes a DNA sequence into RNA by replacing all occurrences of thymine (T) with uracil (U).



In [13]:
def transcribe_dna_to_rna(dna_sequence):
    rna_sequence = dna_sequence.replace('T', 'U')
    return rna_sequence

# Example usage:
sequence = "ATGCGCATGCGCGATG"
rna_sequence = transcribe_dna_to_rna(sequence)
print("RNA Sequence:", rna_sequence)


RNA Sequence: AUGCGCAUGCGCGAUG


**Explanation:**

- Purpose: The transcribe_dna_to_rna function converts a DNA sequence into an RNA sequence by substituting thymine (T) with uracil (U).
- Using replace: The function uses Python’s built-in replace() method to replace all occurrences of 'T' with 'U' in the DNA sequence.
- Returning the RNA Sequence: The function then returns the transcribed RNA sequence.


**Example Usage:**

The function correctly transcribes the DNA sequence into the corresponding RNA sequence.

**Exercise 8:** Validating a DNA Sequence

Task: Write a function that checks whether a given string is a valid DNA sequence (i.e., it only contains the characters A, T, G, and C).

In [14]:
def is_valid_dna_sequence(dna_sequence):
    valid_nucleotides = {'A', 'T', 'G', 'C'}
    for nucleotide in dna_sequence:
        if nucleotide not in valid_nucleotides:
            return False
    return True

# Example usage:
sequence = "ATGCGCATGCGZ"
is_valid = is_valid_dna_sequence(sequence)
print(f"Is the sequence valid? {is_valid}")


Is the sequence valid? False


**Explanation:**

- Purpose: The is_valid_dna_sequence function verifies whether the input string contains only valid DNA nucleotides (A, T, G, C).
- Checking Each Nucleotide: The function loops through each character in the DNA sequence and checks if it is in the set of valid nucleotides. If it encounters an invalid character, it returns False.
- Returning the Result: If all characters are valid nucleotides, the function returns True.


**Example Usage:**

The function correctly identifies that "ATGCGCATGCGZ" is not a valid DNA sequence due to the presence of the character 'Z'.
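
The same check can be expressed with set operations instead of an explicit loop. The following is a sketch of that alternative; the name is_valid_dna_set is chosen here for illustration.

```python
def is_valid_dna_set(dna_sequence):
    # The sequence is valid if its set of characters is a subset of the allowed bases
    return set(dna_sequence) <= {'A', 'T', 'G', 'C'}


print(is_valid_dna_set("ATGCGCATGCGZ"))  # False
print(is_valid_dna_set("ATGCGCATGCG"))   # True
```

Like the loop version, this treats an empty string as valid and is case-sensitive, so lowercase bases are rejected.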

#Introduction to Lists and List Operations

**a) What are Lists?**

**Overview:**

A list is an ordered collection of items that can store multiple values, including numbers, strings, and even other lists. Lists are mutable, meaning you can modify their content after creation.

**Syntax:**

Creating a list

my_list = [item1, item2, item3]

**Example:**

In [15]:
# Example: Creating a list of DNA sequences
dna_sequences = ["ATGCGC", "CGTA", "GCTAGC", "TGCATG"]
print("DNA Sequences:", dna_sequences)

DNA Sequences: ['ATGCGC', 'CGTA', 'GCTAGC', 'TGCATG']


**b) Basic List Operations**

**1. Accessing List Elements**

- Lists use zero-based indexing to access elements. You can use both positive and negative indices.

In [16]:
# Accessing elements using positive and negative indices
first_sequence = dna_sequences[0]
last_sequence = dna_sequences[-1]
print("First Sequence:", first_sequence)
print("Last Sequence:", last_sequence)

First Sequence: ATGCGC
Last Sequence: TGCATG


**2. Modifying List Elements**

- Lists are mutable, so you can change their contents.

In [17]:
# Modifying an element in the list
dna_sequences[1] = "ATCG"
print("Modified DNA Sequences:", dna_sequences)

Modified DNA Sequences: ['ATGCGC', 'ATCG', 'GCTAGC', 'TGCATG']


**3. Adding and Extending Lists**

- You can add elements to a list using append() or extend().

In [18]:
# Adding a single element
dna_sequences.append("CGTAGC")
print("After Appending:", dna_sequences)

# Extending a list with another list
additional_sequences = ["GATCGA", "CTAGTC"]
dna_sequences.extend(additional_sequences)
print("After Extending:", dna_sequences)

After Appending: ['ATGCGC', 'ATCG', 'GCTAGC', 'TGCATG', 'CGTAGC']
After Extending: ['ATGCGC', 'ATCG', 'GCTAGC', 'TGCATG', 'CGTAGC', 'GATCGA', 'CTAGTC']


**4. Removing Elements from a List**

- You can remove elements using remove(), pop(), or del.

In [19]:
# Removing by value
dna_sequences.remove("CGTAGC")
print("After Removing by Value:", dna_sequences)

# Removing by index
removed_sequence = dna_sequences.pop(2)
print("After Popping:", dna_sequences)
print("Removed Sequence:", removed_sequence)

# Deleting an element by index
del dna_sequences[0]
print("After Deleting by Index:", dna_sequences)

After Removing by Value: ['ATGCGC', 'ATCG', 'GCTAGC', 'TGCATG', 'GATCGA', 'CTAGTC']
After Popping: ['ATGCGC', 'ATCG', 'TGCATG', 'GATCGA', 'CTAGTC']
Removed Sequence: GCTAGC
After Deleting by Index: ['ATCG', 'TGCATG', 'GATCGA', 'CTAGTC']


**5. Slicing Lists**

- Slicing allows you to access a subset of a list.

In [20]:
# Slicing a list
first_two_sequences = dna_sequences[:2]
print("First Two Sequences:", first_two_sequences)

First Two Sequences: ['ATCG', 'TGCATG']
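
Slices take the general form list[start:stop:step], where stop is excluded. The sketch below shows a few more variations on a fresh list (named sequences here so it does not depend on earlier cells).

```python
sequences = ['ATCG', 'TGCATG', 'GATCGA', 'CTAGTC']

print(sequences[1:3])   # ['TGCATG', 'GATCGA']  (index 1 up to, not including, 3)
print(sequences[-2:])   # ['GATCGA', 'CTAGTC']  (last two elements)
print(sequences[::2])   # ['ATCG', 'GATCGA']    (every second element)
print(sequences[::-1])  # ['CTAGTC', 'GATCGA', 'TGCATG', 'ATCG']  (reversed copy)

# Unlike indexing, slicing past the end of the list does not raise an error
print(sequences[2:10])  # ['GATCGA', 'CTAGTC']
```

A slice always produces a new list, so modifying the result leaves the original list unchanged.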


**c) Advanced List Operations**

**1. List Comprehensions**

- List comprehensions provide a concise way to create lists.

In [21]:
# Creating a list of sequence lengths using list comprehension
sequence_lengths = [len(seq) for seq in dna_sequences]
print("Sequence Lengths:", sequence_lengths)

Sequence Lengths: [4, 6, 6, 6]
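
A list comprehension can also transform or filter items as it builds the new list. The following sketch shows both, alongside the equivalent for loop; the list is redefined here so the block runs on its own.

```python
dna_sequences = ['ATCG', 'TGCATG', 'GATCGA', 'CTAGTC']

# Transforming: transcribe every sequence to RNA
rna_sequences = [seq.replace('T', 'U') for seq in dna_sequences]
print(rna_sequences)  # ['AUCG', 'UGCAUG', 'GAUCGA', 'CUAGUC']

# Filtering: keep only sequences longer than 4 nucleotides
long_sequences = [seq for seq in dna_sequences if len(seq) > 4]
print(long_sequences)  # ['TGCATG', 'GATCGA', 'CTAGTC']

# The filtering comprehension is equivalent to this loop
long_sequences_loop = []
for seq in dna_sequences:
    if len(seq) > 4:
        long_sequences_loop.append(seq)
print(long_sequences_loop == long_sequences)  # True
```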


**2. Nested Lists**

- Lists can contain other lists, allowing you to create complex data structures.

In [22]:
# Example of a nested list
nested_sequences = [["ATG", "CGC"], ["CGTA", "TGCATG"]]
print("Nested Sequences:", nested_sequences)

# Accessing elements in a nested list
first_nested_sequence = nested_sequences[0][0]
print("First Nested Sequence:", first_nested_sequence)

Nested Sequences: [['ATG', 'CGC'], ['CGTA', 'TGCATG']]
First Nested Sequence: ATG


**Exercise:**

Write a Python script that creates a list of dictionaries, where each dictionary contains a DNA sequence and its reverse complement.

In [23]:
def reverse_complement(sequence):
    complement = {'A': 'T', 'T': 'A', 'G': 'C', 'C': 'G'}
    return ''.join(complement[base] for base in reversed(sequence))

dna_sequences = ["ATGCGC", "CGTA", "GCTAGC", "TGCATG"]
sequence_data = [{"sequence": seq, "reverse_complement": reverse_complement(seq)} for seq in dna_sequences]
print("Sequence Data:", sequence_data)

Sequence Data: [{'sequence': 'ATGCGC', 'reverse_complement': 'GCGCAT'}, {'sequence': 'CGTA', 'reverse_complement': 'TACG'}, {'sequence': 'GCTAGC', 'reverse_complement': 'GCTAGC'}, {'sequence': 'TGCATG', 'reverse_complement': 'CATGCA'}]


#Working with Tuples and Dictionaries

**a) Tuples**

**Overview:**

Tuples are ordered collections like lists, but they are immutable. This means that once a tuple is created, it cannot be modified. Tuples are useful for storing data that should not change throughout the program.

**Syntax:**

Creating a tuple

my_tuple = (item1, item2, item3)


**Example:**

In [24]:
# Example: Creating a tuple of nucleotide pairs
nucleotide_pairs = ("A-T", "C-G", "G-C", "T-A")
print("Nucleotide Pairs:", nucleotide_pairs)

Nucleotide Pairs: ('A-T', 'C-G', 'G-C', 'T-A')
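
To make immutability concrete, the sketch below attempts to change a tuple element, which raises a TypeError, and then builds a new tuple instead. It uses a try/except block, covered in the error handling section of this lesson, only so the example runs to completion.

```python
nucleotide_pairs = ("A-T", "C-G", "G-C", "T-A")

try:
    nucleotide_pairs[0] = "A-U"  # item assignment is not allowed on tuples
except TypeError as error:
    print("Cannot modify a tuple:", error)

# To "change" a tuple, build a new one from pieces of the old one
updated_pairs = ("A-U",) + nucleotide_pairs[1:]
print(updated_pairs)      # ('A-U', 'C-G', 'G-C', 'T-A')
print(nucleotide_pairs)   # ('A-T', 'C-G', 'G-C', 'T-A')  (original is unchanged)
```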


**1. Accessing Tuple Elements**

- Access tuple elements using indices, similar to lists.

In [25]:
# Accessing elements in a tuple
first_pair = nucleotide_pairs[0]
print("First Nucleotide Pair:", first_pair)

First Nucleotide Pair: A-T


**2. Tuple Unpacking**

- Tuple unpacking allows you to assign the elements of a tuple to multiple variables in a single statement.

In [26]:
# Example: Tuple unpacking
a, b = ("A-T", "C-G")
print("a:", a)
print("b:", b)

a: A-T
b: C-G


**3. Tuples as Dictionary Keys**

- Tuples can be used as keys in dictionaries because they are immutable and therefore hashable, provided every element inside the tuple is itself hashable.

In [27]:
# Example: Using tuples as dictionary keys
codon_frequencies = {
    ("AUG", "Methionine"): 10,
    ("UAA", "Stop"): 5
}
print("Codon Frequencies:", codon_frequencies)

Codon Frequencies: {('AUG', 'Methionine'): 10, ('UAA', 'Stop'): 5}
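
By contrast, a list cannot serve as a dictionary key because it is mutable. The sketch below looks up a value by its tuple key and then shows the TypeError raised when a list is used as a key. The try/except block is there only so the example runs to completion.

```python
codon_frequencies = {
    ("AUG", "Methionine"): 10,
    ("UAA", "Stop"): 5
}

# Look up a value by supplying the full tuple as the key
print(codon_frequencies[("AUG", "Methionine")])  # 10

try:
    codon_frequencies[["UGA", "Stop"]] = 3  # a list is not hashable
except TypeError as error:
    print("Lists cannot be dictionary keys:", error)

print(len(codon_frequencies))  # 2 (the failed assignment added nothing)
```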


**b) Dictionaries**

**Overview:**

Dictionaries are collections of key-value pairs, where each key maps to a specific value. They are ideal for storing data with a clear relationship between elements, such as gene names and their annotations or sequences.

**Syntax:**

Creating a dictionary

my_dict = {
    "key1": "value1",
    "key2": "value2"
}

**Example:**


In [28]:
# Example: Creating a dictionary to store gene annotations
gene_annotations = {
    "BRCA1": "Breast cancer susceptibility gene",
    "TP53": "Tumor protein p53",
    "EGFR": "Epidermal growth factor receptor"
}
print("Gene Annotations:", gene_annotations)

Gene Annotations: {'BRCA1': 'Breast cancer susceptibility gene', 'TP53': 'Tumor protein p53', 'EGFR': 'Epidermal growth factor receptor'}


**1. Accessing and Modifying Dictionary Values**

- Access values using keys, and modify them by assigning new values to existing keys.

In [29]:
# Accessing a value in a dictionary
annotation = gene_annotations["BRCA1"]
print("BRCA1 Annotation:", annotation)

# Modifying a value in a dictionary
gene_annotations["BRCA1"] = "Updated description for BRCA1"
print("Updated Gene Annotations:", gene_annotations)

BRCA1 Annotation: Breast cancer susceptibility gene
Updated Gene Annotations: {'BRCA1': 'Updated description for BRCA1', 'TP53': 'Tumor protein p53', 'EGFR': 'Epidermal growth factor receptor'}


**2. Adding and Removing Key-Value Pairs**

- Add new key-value pairs or remove existing ones.

In [30]:
# Adding a new key-value pair
gene_annotations["MYC"] = "Myelocytomatosis oncogene"
print("After Adding MYC:", gene_annotations)

# Removing a key-value pair
del gene_annotations["TP53"]
print("After Removing TP53:", gene_annotations)

After Adding MYC: {'BRCA1': 'Updated description for BRCA1', 'TP53': 'Tumor protein p53', 'EGFR': 'Epidermal growth factor receptor', 'MYC': 'Myelocytomatosis oncogene'}
After Removing TP53: {'BRCA1': 'Updated description for BRCA1', 'EGFR': 'Epidermal growth factor receptor', 'MYC': 'Myelocytomatosis oncogene'}


**3. Dictionary Methods**

- Explore commonly used dictionary methods like keys(), values(), and items().

In [31]:
# Getting all keys, values, and items
keys = gene_annotations.keys()
values = gene_annotations.values()
items = gene_annotations.items()

print("Keys:", keys)
print("Values:", values)
print("Items:", items)

Keys: dict_keys(['BRCA1', 'EGFR', 'MYC'])
Values: dict_values(['Updated description for BRCA1', 'Epidermal growth factor receptor', 'Myelocytomatosis oncogene'])
Items: dict_items([('BRCA1', 'Updated description for BRCA1'), ('EGFR', 'Epidermal growth factor receptor'), ('MYC', 'Myelocytomatosis oncogene')])
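
In practice these methods are most often used in loops. The sketch below iterates over items() to print each gene with its description and converts keys() to a regular list; the dictionary is redefined so the block runs on its own.

```python
gene_annotations = {
    "BRCA1": "Updated description for BRCA1",
    "EGFR": "Epidermal growth factor receptor",
    "MYC": "Myelocytomatosis oncogene"
}

# items() yields (key, value) pairs that can be unpacked in the loop header
for gene, description in gene_annotations.items():
    print(f"{gene}: {description}")

# keys() returns a view object; wrap it in list() when a real list is needed
gene_list = list(gene_annotations.keys())
print(gene_list)  # ['BRCA1', 'EGFR', 'MYC']
```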


**Exercise:**

Write a Python script that creates a dictionary where each key is a gene name, and the value is a tuple containing the gene's sequence and its GC content.

In [32]:
def calculate_gc_content(sequence):
    g_count = sequence.count('G')
    c_count = sequence.count('C')
    return (g_count + c_count) / len(sequence) * 100

gene_data = {
    "BRCA1": ("ATGCGC", calculate_gc_content("ATGCGC")),
    "TP53": ("CGTA", calculate_gc_content("CGTA")),
    "EGFR": ("GCTAGC", calculate_gc_content("GCTAGC"))
}
print("Gene Data Dictionary:", gene_data)

Gene Data Dictionary: {'BRCA1': ('ATGCGC', 66.66666666666666), 'TP53': ('CGTA', 50.0), 'EGFR': ('GCTAGC', 66.66666666666666)}


#Practical Examples Involving Biological Data

**Example 1:** Storing and Accessing Gene Expression Data

Task: Store gene expression levels in a dictionary and retrieve the expression level of a specific gene.

**Code:**

In [33]:
# Gene expression levels stored in a dictionary
gene_expression = {
    "BRCA1": 12.5,
    "TP53": 8.3,
    "EGFR": 14.7,
    "MYC": 20.2
}

# Accessing the expression level of a specific gene
gene = "TP53"
expression_level = gene_expression.get(gene, "Gene not found")
print(f"Expression level of {gene}: {expression_level}")

Expression level of TP53: 8.3


**Example 2:** Creating a List of Tuples to Store Multiple Data Points

Task: Store information about multiple genes, including their name, sequence, and expression level, using a list of tuples.

**Code:**

In [34]:
# List of tuples storing gene data
gene_data = [
    ("BRCA1", "ATGCGC", 12.5),
    ("TP53", "CGTA", 8.3),
    ("EGFR", "GCTAGC", 14.7)
]

# Accessing the data for a specific gene
for gene in gene_data:
    name, sequence, expression = gene
    print(f"Gene: {name}, Sequence: {sequence}, Expression Level: {expression}")

Gene: BRCA1, Sequence: ATGCGC, Expression Level: 12.5
Gene: TP53, Sequence: CGTA, Expression Level: 8.3
Gene: EGFR, Sequence: GCTAGC, Expression Level: 14.7


**Example 3:** Combining Lists, Tuples, and Dictionaries

Task: Combine lists, tuples, and dictionaries to create a complex data structure for storing and analyzing biological data.

**Code:**

In [35]:
# Complex data structure combining lists, tuples, and dictionaries
biological_data = {
    "genes": [
        {"name": "BRCA1", "sequence": "ATGCGC", "expression_level": 12.5},
        {"name": "TP53", "sequence": "CGTA", "expression_level": 8.3},
        {"name": "EGFR", "sequence": "GCTAGC", "expression_level": 14.7}
    ],
    "annotations": ("BRCA1", "Breast cancer susceptibility gene"),
    "experiment_dates": ["2023-01-01", "2023-02-01", "2023-03-01"]
}

# Accessing specific parts of the complex data structure
first_gene = biological_data["genes"][0]["name"]
experiment_date = biological_data["experiment_dates"][1]
annotation = biological_data["annotations"][1]

print(f"First Gene: {first_gene}")
print(f"Second Experiment Date: {experiment_date}")
print(f"Annotation for BRCA1: {annotation}")

First Gene: BRCA1
Second Experiment Date: 2023-02-01
Annotation for BRCA1: Breast cancer susceptibility gene


#Exercises to Solidify Understanding

**Exercise 1:**

Create a list of gene names and a corresponding list of gene sequences. Write a function that takes these lists as input and returns a dictionary where each gene name is a key and its corresponding sequence is the value.

In [36]:
def create_gene_dict(gene_names, gene_sequences):
    return dict(zip(gene_names, gene_sequences))

gene_names = ["BRCA1", "TP53", "EGFR"]
gene_sequences = ["ATGCGC", "CGTA", "GCTAGC"]
gene_dict = create_gene_dict(gene_names, gene_sequences)
print("Gene Dictionary:", gene_dict)

Gene Dictionary: {'BRCA1': 'ATGCGC', 'TP53': 'CGTA', 'EGFR': 'GCTAGC'}


**Exercise 2:**

Given a list of tuples where each tuple contains a gene name and its expression level, write a script that filters out genes with expression levels below a certain threshold and stores the remaining genes in a dictionary.

In [37]:
def filter_gene_expression(gene_data, threshold):
    return {name: expression for name, expression in gene_data if expression >= threshold}

gene_data = [("BRCA1", 12.5), ("TP53", 8.3), ("EGFR", 14.7), ("MYC", 7.1)]
filtered_genes = filter_gene_expression(gene_data, 10.0)
print("Filtered Genes:", filtered_genes)

Filtered Genes: {'BRCA1': 12.5, 'EGFR': 14.7}


**Exercise 3:**

Write a Python script that creates a nested dictionary where each key is a gene name, and the value is another dictionary containing the gene's sequence, its length, and its GC content.

In [38]:
def calculate_gc_content(sequence):
    g_count = sequence.count('G')
    c_count = sequence.count('C')
    return (g_count + c_count) / len(sequence) * 100

gene_sequences = {
    "BRCA1": "ATGCGC",
    "TP53": "CGTA",
    "EGFR": "GCTAGC"
}

gene_info = {gene: {
    "sequence": seq,
    "length": len(seq),
    "gc_content": calculate_gc_content(seq)
} for gene, seq in gene_sequences.items()}

print("Gene Information Dictionary:", gene_info)

Gene Information Dictionary: {'BRCA1': {'sequence': 'ATGCGC', 'length': 6, 'gc_content': 66.66666666666666}, 'TP53': {'sequence': 'CGTA', 'length': 4, 'gc_content': 50.0}, 'EGFR': {'sequence': 'GCTAGC', 'length': 6, 'gc_content': 66.66666666666666}}


**Exercise 4:**

Given a list of gene names, create a dictionary where each key is a gene name, and the value is a list of experiments (stored as tuples) that were performed on that gene.

In [39]:
gene_names = ["BRCA1", "TP53", "EGFR"]
experiments = [
    ("Experiment1", "2023-01-01", "Positive"),
    ("Experiment2", "2023-02-01", "Negative"),
    ("Experiment3", "2023-03-01", "Positive")
]

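# Give each gene its own copy of the list so that later edits for one gene do not affect the others
gene_experiments = {gene: list(experiments) for gene in gene_names}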
print("Gene Experiments Dictionary:", gene_experiments)

Gene Experiments Dictionary: {'BRCA1': [('Experiment1', '2023-01-01', 'Positive'), ('Experiment2', '2023-02-01', 'Negative'), ('Experiment3', '2023-03-01', 'Positive')], 'TP53': [('Experiment1', '2023-01-01', 'Positive'), ('Experiment2', '2023-02-01', 'Negative'), ('Experiment3', '2023-03-01', 'Positive')], 'EGFR': [('Experiment1', '2023-01-01', 'Positive'), ('Experiment2', '2023-02-01', 'Negative'), ('Experiment3', '2023-03-01', 'Positive')]}
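
One subtlety when building a dictionary like this: if the very same list object is used as the value for every key, all keys refer to one shared list, so appending through one key changes what every other key sees. The sketch below, with hypothetical variable names shared and independent, contrasts that behavior with giving each gene its own copy via list().

```python
gene_names = ["BRCA1", "TP53"]
experiments = [("Experiment1", "2023-01-01", "Positive")]

# Every key points to the SAME list object
shared = {gene: experiments for gene in gene_names}
shared["BRCA1"].append(("Experiment2", "2023-02-01", "Negative"))
print(len(shared["TP53"]))  # 2 (TP53 sees the experiment added via BRCA1)

# Each key gets its OWN copy of the list
experiments = [("Experiment1", "2023-01-01", "Positive")]
independent = {gene: list(experiments) for gene in gene_names}
independent["BRCA1"].append(("Experiment2", "2023-02-01", "Negative"))
print(len(independent["TP53"]))  # 1 (TP53 is unaffected)
```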


#Error Handling and Debugging

**1. Understanding Python Errors**

**Overview:**

In Python, errors are often referred to as exceptions. When Python encounters an error, it raises an exception, which can halt the execution of your code. Understanding these errors is the first step in debugging.

**Common Error Types:**

1. SyntaxError:

- What It Is: Raised when the Python interpreter encounters incorrect syntax.

- Example:

In [40]:
# Example of a SyntaxError
print("Hello, world"  # Missing closing parenthesis

SyntaxError: incomplete input (ipython-input-3033378288.py, line 2)

**Explanation:**

The missing closing parenthesis causes a SyntaxError, which prevents the code from running.

2. TypeError:

- What It Is: Raised when an operation or function is applied to an object of inappropriate type.

- Example:

In [41]:
# Example of a TypeError
result = "2" + 2  # Trying to add a string and an integer

TypeError: can only concatenate str (not "int") to str

**Explanation:**

You cannot add a string and an integer directly, so Python raises a TypeError.

3. ValueError:

- What It Is: Raised when a function receives an argument of the correct type but an inappropriate value.

- Example:

In [42]:
# Example of a ValueError
number = int("abc")  # Trying to convert a non-numeric string to an integer

ValueError: invalid literal for int() with base 10: 'abc'

**Explanation:**

The string "abc" cannot be converted to an integer, resulting in a ValueError.

4. IndexError:

- What It Is: Raised when you try to access an index that is out of the range of a list or another sequence.

- Example:

In [43]:
# Example of an IndexError
my_list = [1, 2, 3]
print(my_list[5])  # Trying to access an index that doesn't exist

IndexError: list index out of range

**Explanation:**

The list my_list has only 3 elements, so trying to access the 6th element (index 5) raises an IndexError.

5. KeyError:

- What It Is: Raised when trying to access a dictionary key that doesn't exist.

- Example:

In [44]:
# Example of a KeyError
my_dict = {"name": "Alice", "age": 25}
print(my_dict["gender"])  # Key "gender" doesn't exist in the dictionary

KeyError: 'gender'

**Explanation:**

Since "gender" is not a key in my_dict, accessing it raises a KeyError.
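Lookup errors such as IndexError and KeyError can often be avoided by checking before accessing. The sketch below reuses my_list and my_dict from the examples above; the fallback values (None and "unknown") are arbitrary choices for illustration.

```python
my_list = [1, 2, 3]
my_dict = {"name": "Alice", "age": 25}

# Lists: compare the index with len() before indexing.
index = 5
item = my_list[index] if 0 <= index < len(my_list) else None
print(item)  # None

# Dictionaries: get() returns a default instead of raising KeyError.
gender = my_dict.get("gender", "unknown")
print(gender)  # unknown

# The "in" operator tests for a key without accessing it.
print("age" in my_dict)     # True
print("gender" in my_dict)  # False
```

Checking first works well when a missing item is expected and has a sensible fallback; when a missing item signals a real problem, letting the exception surface is usually more informative.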

**2. Using Try/Except Blocks**

**Overview:**

The try and except blocks in Python allow you to handle exceptions gracefully. Instead of crashing your program, you can catch the error and decide what to do next.

**Syntax:**

try:
    # Code that might raise an exception
except ExceptionType:
    # Code that runs if the exception occurs


**Example:** Handling a ValueError

In [45]:
# Example: Handling a ValueError
try:
    number = int("abc")
except ValueError:
    print("A ValueError occurred: Cannot convert 'abc' to an integer.")

A ValueError occurred: Cannot convert 'abc' to an integer.


**Explanation:**

- The try block contains code that might raise an exception.
- If a ValueError occurs, the except block is executed, printing a user-friendly error message.

**Handling Multiple Exceptions:**

You can handle different exceptions using multiple except blocks.

In [46]:
# Example: Handling multiple exceptions
try:
    my_list = [1, 2, 3]
    print(my_list[5])  # This will raise an IndexError
    number = int("abc")  # This will raise a ValueError
except IndexError:
    print("An IndexError occurred: List index out of range.")
except ValueError:
    print("A ValueError occurred: Cannot convert to an integer.")

An IndexError occurred: List index out of range.


**Explanation:**

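The code tries to access an out-of-range index in a list, raising an IndexError, which is caught by the first except block. Because execution leaves the try block as soon as the exception is raised, the int("abc") line never runs, so the ValueError handler is not triggered.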
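When several exception types should be handled the same way, they can be grouped in a tuple in a single except clause, and `as` binds the exception object so its message can be inspected. A minimal sketch; describe_failure is a hypothetical helper introduced only for this illustration:

```python
def describe_failure(action):
    """Run a zero-argument callable and report which exception it raised, if any."""
    try:
        action()
    except (IndexError, ValueError) as error:
        # type(error).__name__ is the exception class name, str(error) its message
        return f"{type(error).__name__}: {error}"
    return "no error"

print(describe_failure(lambda: [1, 2, 3][5]))   # IndexError: list index out of range
print(describe_failure(lambda: int("abc")))
print(describe_failure(lambda: int("42")))      # no error
```

Separate except blocks remain the better choice when each exception type needs a different response.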

**Using finally and else Blocks:**

- finally Block: This block runs no matter what, whether an exception occurs or not.
- else Block: This block runs only if no exceptions were raised.

In [47]:
# Example: Using try/except with finally and else
try:
    number = int("123")
except ValueError:
    print("A ValueError occurred.")
else:
    print("Conversion successful:", number)
finally:
    print("This runs no matter what.")

Conversion successful: 123
This runs no matter what.


**Explanation:**

The else block runs because no exception occurs, and the finally block runs regardless.
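To see the other path, the same structure can be wrapped in a function and called with both a valid and an invalid input. This is a sketch; the convert function and its list of messages are illustrative additions, not part of the original example.

```python
def convert(text):
    """Return the messages produced while converting text to an integer."""
    messages = []
    try:
        number = int(text)
    except ValueError:
        messages.append("A ValueError occurred.")
    else:
        # Runs only when the try block raised no exception
        messages.append(f"Conversion successful: {number}")
    finally:
        # Runs on both the success path and the failure path
        messages.append("This runs no matter what.")
    return messages

print(convert("123"))  # ['Conversion successful: 123', 'This runs no matter what.']
print(convert("abc"))  # ['A ValueError occurred.', 'This runs no matter what.']
```

With the invalid input, the else block is skipped while the finally block still runs.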

**3. Debugging Techniques**

**Overview:**

Debugging is the process of identifying and fixing errors in your code. Here are some techniques to help you debug more effectively.

**a) Using Print Statements**

One of the simplest debugging techniques is to insert print statements at key points in your code to check the values of variables and the flow of execution.

In [48]:
# Example: Using print statements for debugging
def calculate_gc_content(sequence):
    print("Sequence:", sequence)
    g_count = sequence.count('G')
    c_count = sequence.count('C')
    gc_content = (g_count + c_count) / len(sequence) * 100
    print("GC Content:", gc_content)
    return gc_content

sequence = "ATGCGCGATCGTACGCTAGC"
calculate_gc_content(sequence)

Sequence: ATGCGCGATCGTACGCTAGC
GC Content: 60.0


60.0

**Explanation:**

Print statements help you understand what the code is doing at each step and verify if it’s working as expected.
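A small convenience for print debugging, assuming Python 3.8 or later: the `=` specifier inside an f-string prints both the expression and its value, which saves typing labels by hand. A sketch based on the function above:

```python
def calculate_gc_content(sequence):
    g_count = sequence.count('G')
    c_count = sequence.count('C')
    # "=" inside the braces prints the expression text followed by its value
    print(f"{g_count=}, {c_count=}, {len(sequence)=}")
    return (g_count + c_count) / len(sequence) * 100

gc = calculate_gc_content("ATGCGCGATCGTACGCTAGC")
# prints: g_count=6, c_count=6, len(sequence)=20
```

Remember to remove or comment out debugging prints once the problem is fixed.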

**b) Using the pdb Module**

Python’s built-in pdb module is a powerful tool for debugging. It allows you to step through your code, inspect variables, and understand the flow of execution.

In [49]:
# Example: Using pdb for debugging
import pdb

def calculate_gc_content(sequence):
    pdb.set_trace()  # Set a breakpoint here
    g_count = sequence.count('G')
    c_count = sequence.count('C')
    gc_content = (g_count + c_count) / len(sequence) * 100
    return gc_content

sequence = "ATGCGCGATCGTACGCTAGC"
calculate_gc_content(sequence)

> /tmp/ipython-input-2485670083.py(6)calculate_gc_content()
      4 def calculate_gc_content(sequence):
      5     pdb.set_trace()  # Set a breakpoint here
----> 6     g_count = sequence.count('G')
      7     c_count = sequence.count('C')
      8     gc_content = (g_count + c_count) / len(sequence) * 100

ipdb> c


60.0

**How to Use:**

- Run the code, and when it hits pdb.set_trace(), you’ll enter the interactive debugger.
- Commands like n (next), c (continue), q (quit), and p variable_name (print variable) help you step through the code and inspect variables.

**c) Debugging Tools in Google Colab**

Google Colab offers built-in debugging tools that allow you to set breakpoints, step through code, and inspect variables visually.

**Steps to Use Colab Debugger:**

- Click on the line number where you want to set a breakpoint.
- Run the cell.
- The debugger will pause at the breakpoint, and you can inspect variables, step through the code, or continue execution.

**4. Why It’s Important**

**Overview:**

As you move into more complex programming tasks, the ability to troubleshoot and fix errors becomes crucial. Understanding how to manage errors will make you a more confident and efficient programmer.

**Key Points:**

- Preventing Program Crashes: Proper error handling prevents your program from crashing unexpectedly and allows it to handle issues gracefully.
- Building Robust Code: By anticipating and handling potential errors, you can create more reliable and user-friendly programs.
- Saving Time: Effective debugging techniques can save hours of frustration by helping you quickly identify and fix problems in your code.

**Exercise:** Practice Error Handling and Debugging

Task: Write a Python script that:

1. Prompts the user to enter a DNA sequence.
2. Attempts to calculate the GC content of the sequence.
3. Handles any potential errors (e.g., the user enters an empty string or a sequence with invalid characters).
4. Uses print statements or pdb to debug if necessary.

In [51]:
def calculate_gc_content(sequence):
    if not sequence:
        raise ValueError("The sequence is empty.")
    if not set(sequence).issubset({"A", "T", "G", "C"}):
        raise ValueError("The sequence contains invalid characters.")

    g_count = sequence.count('G')
    c_count = sequence.count('C')
    gc_content = (g_count + c_count) / len(sequence) * 100
    return gc_content

try:
    sequence = input("Enter a DNA sequence: ").upper()
    gc_content = calculate_gc_content(sequence)
    print(f"GC Content: {gc_content:.2f}%")
except ValueError as e:
    print(f"Error: {e}")
finally:
    print("Thank you for using the GC content calculator.")

Enter a DNA sequence: 
Error: The sequence is empty.
Thank you for using the GC content calculator.


**Explanation:**

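This script handles potential errors (like an empty sequence or invalid characters) and provides user feedback. Calling .upper() on the input lets lowercase sequences pass validation, and the try/except block ensures the program reports the problem and finishes cleanly instead of stopping with a traceback when the input is invalid.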
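Because input() requires typing, it can be convenient to check each path with fixed sample inputs instead. The sketch below repeats the validation logic from the exercise; the report helper and the sample strings are assumptions made for illustration.

```python
def calculate_gc_content(sequence):
    if not sequence:
        raise ValueError("The sequence is empty.")
    if not set(sequence).issubset({"A", "T", "G", "C"}):
        raise ValueError("The sequence contains invalid characters.")
    return (sequence.count("G") + sequence.count("C")) / len(sequence) * 100

def report(raw_text):
    """Hypothetical helper: return the message the script would print."""
    try:
        gc_content = calculate_gc_content(raw_text.upper())
    except ValueError as e:
        return f"Error: {e}"
    return f"GC Content: {gc_content:.2f}%"

# One valid sequence, one empty string, one sequence with an invalid character
for sample in ["atgc", "", "ATGX"]:
    print(repr(sample), "->", report(sample))
```

The three samples exercise the success path and both ValueError messages without any manual input.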

**Conclusion:**

This session covered a range of list, tuple, and dictionary operations, from basic usage to more complex, combined data structures. The examples and exercises offer hands-on practice with each of them.

By the end of this session, you should have a strong grasp of how to effectively use Python’s core data structures to store, manipulate, and analyze biological data, setting a solid foundation for more advanced bioinformatics work.

This Colab session should also help you understand the importance of error handling and debugging in Python. By practicing these concepts, you will become more confident in writing robust, reliable code, which is essential for any bioinformatics work.