# Lecture 3: Dictionaries - Storing Key-Value Pairs
## Python for Biology
**Learning Objectives:**
- Understand what dictionaries are and when to use them
- Create and access dictionary values
- Loop through dictionaries
- Use dictionaries to translate DNA sequences (genetic code)

---

## Section 1: What is a Dictionary?

### Guided Example 1.1: Creating a simple dictionary

A dictionary stores pairs of information: a **key** and a **value**

In [None]:
# A dictionary mapping nucleotides to their full names
nucleotide_names = {
    'A': 'Adenine',
    'T': 'Thymine',
    'C': 'Cytosine',
    'G': 'Guanine'
}

print(nucleotide_names)

**What's happening here?**
- Dictionaries use curly braces `{}`
- Each entry has a **key** (like `'A'`) and a **value** (like `'Adenine'`)
- We write them as `key: value`
- Separate entries with commas

### Guided Example 1.2: Accessing values

We use the key to look up the value

In [None]:
nucleotide_names = {
    'A': 'Adenine',
    'T': 'Thymine',
    'C': 'Cytosine',
    'G': 'Guanine'
}

# Look up what 'A' stands for
name = nucleotide_names['A']
print(f"A stands for: {name}")

# Look up 'G'
print(f"G stands for: {nucleotide_names['G']}")

**What's happening here?**
- Use square brackets with the key: `dictionary[key]`
- This gives us the value for that key
- Like looking up a word in a real dictionary!
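
One detail worth seeing in action: the key has to match exactly. The sketch below is an illustration only; it assumes `try`/`except`, which this lecture has not introduced, and the variable `base` is a made-up name for the example.

```python
nucleotide_names = {
    'A': 'Adenine',
    'T': 'Thymine',
    'C': 'Cytosine',
    'G': 'Guanine'
}

# Keys must match exactly: lowercase 'a' is a different key from 'A'
try:
    print(nucleotide_names['a'])
except KeyError as error:
    # Python raises KeyError when the key is not in the dictionary
    print(f"No entry for key: {error}")

# Converting the input to uppercase first avoids the problem
base = 'a'
print(f"{base} stands for: {nucleotide_names[base.upper()]}")
```

If your sequence data might mix upper and lower case, normalizing it before the lookup is usually the simplest fix.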

### Practice Example 1.1: Create and access a dictionary

Create a dictionary and look up values

In [None]:
# Dictionary mapping blood types to their approximate frequency (percent of the population)
blood_type_frequency = {
    'O+': 37,
    'A+': 36,
    'B+': 8,
    'AB+': 3
}

# YOUR CODE HERE: 
# Print the frequency of blood type 'A+'
# Print the frequency of blood type 'O+'

---

## Section 2: Adding and Modifying Entries

### Guided Example 2.1: Adding new entries

We can add new key-value pairs to a dictionary

In [None]:
# Start with a small dictionary
amino_acid_codes = {
    'Ala': 'A',
    'Cys': 'C'
}

print("Before:", amino_acid_codes)

# Add new entries
amino_acid_codes['Asp'] = 'D'
amino_acid_codes['Glu'] = 'E'

print("After:", amino_acid_codes)

**What's happening here?**
- To add: `dictionary[new_key] = new_value`
- If the key doesn't exist, it creates a new entry
- If the key already exists, the same syntax replaces (modifies) its value
- The dictionary grows as we add more entries
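
Assigning to a key that is already in the dictionary replaces its value instead of adding a second entry. Here is a small sketch of that behavior; the lowercase `'d'` is a deliberate mistake made up for the example.

```python
amino_acid_codes = {
    'Ala': 'A',
    'Cys': 'C'
}

# Add a new entry, but with a mistake (lowercase letter)
amino_acid_codes['Asp'] = 'd'
print("With mistake:", amino_acid_codes)

# Assigning to the same key again overwrites the old value
amino_acid_codes['Asp'] = 'D'
print("Corrected:   ", amino_acid_codes)

# Still three entries: each key appears only once in a dictionary
print("Number of entries:", len(amino_acid_codes))
```

Because each key can appear only once, the same line of code either adds or modifies, depending on whether the key is already there.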

### Practice Example 2.1: Add entries

Add two more amino acids to the dictionary

In [None]:
amino_acid_codes = {
    'Ala': 'A',
    'Cys': 'C'
}

# YOUR CODE HERE:
# Add 'Phe' with code 'F'
# Add 'Gly' with code 'G'
# Print the dictionary

### Guided Example 2.2: Checking if a key exists

Before accessing a key, we can check if it exists

In [None]:
nucleotide_names = {
    'A': 'Adenine',
    'T': 'Thymine',
    'C': 'Cytosine',
    'G': 'Guanine'
}

# Check if 'A' is in the dictionary
if 'A' in nucleotide_names:
    print(f"A = {nucleotide_names['A']}")

# Check if 'U' is in the dictionary (it's not - U is RNA!)
if 'U' in nucleotide_names:
    print(f"U = {nucleotide_names['U']}")
else:
    print("U not found (it's an RNA nucleotide!)")

**What's happening here?**
- Use `if key in dictionary:` to check
- This prevents a `KeyError`, the error Python raises when you look up a key that doesn't exist
- Very useful when processing data
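
A related tool, not covered above, is the built-in dictionary method `.get()`. It looks up a key and returns a fallback value instead of raising an error when the key is missing. A minimal sketch (the fallback text is just an example):

```python
nucleotide_names = {
    'A': 'Adenine',
    'T': 'Thymine',
    'C': 'Cytosine',
    'G': 'Guanine'
}

# .get() returns the value when the key exists
print(nucleotide_names.get('A'))

# When the key is missing, .get() returns None instead of raising an error
print(nucleotide_names.get('U'))

# An optional second argument sets the fallback value
print(nucleotide_names.get('U', 'not a DNA nucleotide'))
```

Use `if key in dictionary:` when you want to run different code for missing keys, and `.get()` when a default value is enough.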

### Practice Example 2.2: Check for keys

Check if specific enzymes are in the dictionary

In [None]:
enzyme_functions = {
    'Helicase': 'Unwinds DNA',
    'Polymerase': 'Synthesizes DNA',
    'Ligase': 'Joins DNA fragments'
}

# YOUR CODE HERE:
# Check if 'Polymerase' is in the dictionary and print its function
# Check if 'Nuclease' is in the dictionary and print an appropriate message

---

## Section 3: Looping Through Dictionaries

### Guided Example 3.1: Looping through keys

We can iterate through all the keys in a dictionary

In [None]:
nucleotide_names = {
    'A': 'Adenine',
    'T': 'Thymine',
    'C': 'Cytosine',
    'G': 'Guanine'
}

print("Nucleotides:")
for nucleotide in nucleotide_names:
    print(f"  {nucleotide}")

**What's happening here?**
- Looping through a dictionary gives us the **keys**
- We can then use the key to access the value if needed
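
To illustrate the second point, here is a short sketch that loops through the keys and uses each one to look up its value. It assumes Python 3.7 or newer, where dictionaries keep the order in which entries were added.

```python
nucleotide_names = {
    'A': 'Adenine',
    'T': 'Thymine',
    'C': 'Cytosine',
    'G': 'Guanine'
}

for nucleotide in nucleotide_names:
    # Use the key from the loop to look up its value
    name = nucleotide_names[nucleotide]
    print(f"  {nucleotide} -> {name}")

# Converting the dictionary to a list also gives just the keys
print(list(nucleotide_names))
```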

### Guided Example 3.2: Looping through keys and values

Usually we want both the key AND the value

In [None]:
nucleotide_names = {
    'A': 'Adenine',
    'T': 'Thymine',
    'C': 'Cytosine',
    'G': 'Guanine'
}

print("Nucleotide names:")
for nucleotide, name in nucleotide_names.items():
    print(f"  {nucleotide} = {name}")

**What's new here?**
- `.items()` gives us both key and value
- We use two variables: one for the key, one for the value
- Just like `enumerate()` gave us index and item!

### Practice Example 3.1: Loop through dictionary

Print all vitamins and their functions

In [None]:
vitamin_functions = {
    'Vitamin A': 'Vision',
    'Vitamin C': 'Immune system',
    'Vitamin D': 'Bone health',
    'Vitamin E': 'Antioxidant'
}

# YOUR CODE HERE:
# Loop through and print each vitamin and its function
# Format: "Vitamin A: Vision"

---

## Section 4: Using Dictionaries with DNA Sequences

### Guided Example 4.1: Nucleotide complements

Let's use a dictionary to store complement rules

In [None]:
# Dictionary of complement pairs
complement_dict = {
    'A': 'T',
    'T': 'A',
    'C': 'G',
    'G': 'C'
}

dna = "ATCG"
complement = ""

# Use dictionary to build complement
for nucleotide in dna:
    complement += complement_dict[nucleotide]

print(f"Original:   {dna}")
print(f"Complement: {complement}")

**What's happening here?**
- The dictionary stores the pairing rules
- We look up each nucleotide to get its complement
- Much cleaner than using if/elif statements!
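
To see why the dictionary is cleaner, here is a side-by-side sketch of both approaches. It assumes you are comfortable with defining functions; the function names `complement_if_elif` and `complement_dict_lookup` are made up for this comparison.

```python
def complement_if_elif(dna):
    """Build the complement with an if/elif chain (for comparison only)."""
    result = ""
    for nucleotide in dna:
        if nucleotide == 'A':
            result += 'T'
        elif nucleotide == 'T':
            result += 'A'
        elif nucleotide == 'C':
            result += 'G'
        elif nucleotide == 'G':
            result += 'C'
    return result


def complement_dict_lookup(dna):
    """Build the complement with a dictionary lookup."""
    complement_dict = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}
    result = ""
    for nucleotide in dna:
        result += complement_dict[nucleotide]
    return result


dna = "ATCG"
print(f"if/elif:    {complement_if_elif(dna)}")
print(f"dictionary: {complement_dict_lookup(dna)}")
```

Both versions give the same answer for valid DNA, but the dictionary version keeps the pairing rules as data in one place and the loop body stays a single line. Note one difference: the if/elif version silently skips unexpected characters, while the dictionary version raises a `KeyError`.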

### Practice Example 4.1: Create complement using dictionary

Use the complement dictionary to create the complement of this sequence

In [None]:
complement_dict = {
    'A': 'T',
    'T': 'A',
    'C': 'G',
    'G': 'C'
}

dna = "GCGCATAT"

# YOUR CODE HERE: create and print the complement

### Guided Example 4.2: Simple genetic code

The genetic code maps codons (3 nucleotides) to amino acids. Let's use a dictionary!

In [None]:
# A small genetic code table
genetic_code = {
    'ATG': 'M',  # Methionine (start codon)
    'GCA': 'A',  # Alanine
    'GCC': 'A',  # Alanine
    'TGC': 'C',  # Cysteine
    'GAT': 'D',  # Aspartate
    'GAC': 'D',  # Aspartate
    'TAA': '*',  # Stop
    'TAG': '*',  # Stop
    'TGA': '*'   # Stop
}

# Look up a single codon
codon = 'ATG'
amino_acid = genetic_code[codon]
print(f"{codon} codes for {amino_acid}")

**What's happening here?**
- Keys are codons (DNA triplets)
- Values are amino acid codes (single letters)
- '*' represents stop codons
- We can look up any codon in the table instantly (a codon that is missing from the table raises a `KeyError`)

### Guided Example 4.3: Translating a sequence

Now let's translate a whole DNA sequence to amino acids

In [None]:
genetic_code = {
    'ATG': 'M',
    'GCA': 'A',
    'GCC': 'A',
    'TGC': 'C',
    'GAT': 'D',
    'GAC': 'D',
    'TAA': '*',
    'TAG': '*',
    'TGA': '*'
}

dna = "ATGGCATGCTAA"

# Split into codons and translate
protein = ""

for i in range(0, len(dna), 3):
    codon = dna[i:i+3]
    
    # Check if codon is in our genetic code (codons missing from the
    # table, or an incomplete codon at the end, are skipped silently)
    if codon in genetic_code:
        amino_acid = genetic_code[codon]
        
        # Stop if we hit a stop codon
        if amino_acid == '*':
            break
        
        protein += amino_acid

print(f"DNA:     {dna}")
print(f"Protein: {protein}")

**What's new here?**
- We split the DNA into codons (3 at a time)
- Look up each codon in the genetic code dictionary
- Stop when we hit a stop codon ('*')
- Build up the protein sequence
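- This is a simplified model of translation: cells translate mRNA (with U in place of T), and the full genetic code has 64 codons, not just the 9 in our table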
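
The loop above silently skips any codon that is not in our small table. One possible variation, sketched below, wraps the same steps in a function and marks unknown codons with a placeholder instead. The function name `translate`, its parameters, and the choice of `'X'` as the placeholder are assumptions made for this example.

```python
genetic_code = {
    'ATG': 'M',
    'GCA': 'A',
    'GCC': 'A',
    'TGC': 'C',
    'GAT': 'D',
    'GAC': 'D',
    'TAA': '*',
    'TAG': '*',
    'TGA': '*'
}


def translate(dna, code, unknown='X'):
    """Translate DNA codon by codon, stopping at the first stop codon."""
    protein = ""
    # len(dna) - 2 makes the loop ignore an incomplete codon at the end
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i+3]
        # .get() returns the placeholder for codons missing from the table
        amino_acid = code.get(codon, unknown)
        if amino_acid == '*':
            break
        protein += amino_acid
    return protein


print(translate("ATGGCATGCTAA", genetic_code))  # same sequence as above
print(translate("ATGGGGTGCTAA", genetic_code))  # GGG is not in our table
print(translate("ATGGCATG", genetic_code))      # trailing 'TG' is ignored
```

Marking unknown codons keeps the protein the right length, so it is easier to spot where the table is incomplete.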

### Practice Example 4.2: Translate a sequence

Translate this DNA sequence to amino acids

In [None]:
genetic_code = {
    'ATG': 'M',
    'GCA': 'A',
    'GCC': 'A',
    'TGC': 'C',
    'GAT': 'D',
    'GAC': 'D',
    'TAA': '*',
    'TAG': '*',
    'TGA': '*'
}

dna = "ATGGACTGCTAG"

# YOUR CODE HERE:
# Split into codons
# Translate each codon using the genetic code
# Stop at stop codon
# Print the protein sequence

### Guided Example 4.4: Counting amino acids in a protein

Let's count how many times each amino acid appears

In [None]:
protein = "MACDDA"

# Create empty dictionary to store counts
amino_acid_counts = {}

for amino_acid in protein:
    # If we haven't seen this amino acid yet, start counting at 0
    if amino_acid not in amino_acid_counts:
        amino_acid_counts[amino_acid] = 0
    
    # Add 1 to the count
    amino_acid_counts[amino_acid] += 1

print("Amino acid composition:")
for aa, count in amino_acid_counts.items():
    print(f"  {aa}: {count}")

**What's new here?**
- We create an empty dictionary to store our counts
- For each amino acid, we first add it with a count of 0 if it's new, then add 1 to its count
- This gives us the composition of the protein
- Very useful for analyzing sequences!
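
The same counting pattern can be written more compactly. The sketch below shows two alternatives that are not covered above: the dictionary method `.get()` with a default of 0, and `Counter` from Python's standard `collections` module. The variable names are made up for the example.

```python
from collections import Counter

protein = "MACDDA"

# Alternative 1: .get() returns 0 for amino acids we haven't seen yet
counts_get = {}
for amino_acid in protein:
    counts_get[amino_acid] = counts_get.get(amino_acid, 0) + 1
print(counts_get)

# Alternative 2: Counter does the whole loop for us
counts_counter = Counter(protein)
print(dict(counts_counter))
```

Both give the same counts as the longer version. Writing the loop by hand first is still worthwhile, because it shows what `Counter` is doing behind the scenes.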

### Practice Example 4.3: Count nucleotides

Count how many times each nucleotide appears in this DNA sequence

In [None]:
dna = "ATCGATCGATCG"

# YOUR CODE HERE:
# Create an empty dictionary
# Count each nucleotide
# Print the counts

---

## Section 5: Practice Challenges

### Challenge 1: Reverse Genetic Code

Given an amino acid, find all codons that code for it

In [None]:
genetic_code = {
    'ATG': 'M',
    'GCA': 'A',
    'GCC': 'A',
    'GCG': 'A',
    'GCT': 'A',
    'TGC': 'C',
    'TGT': 'C'
}

target_amino_acid = 'A'

# YOUR CODE HERE:
# Find all codons that code for 'A' (Alanine)
# Hint: loop through the dictionary and check values
# Store matching codons in a list
# Print the list

### Challenge 2: Codon Usage

Count how many times each codon appears in a DNA sequence

In [None]:
dna = "ATGGGCATGGCAGGCTAA"

# YOUR CODE HERE:
# Split into codons
# Count how many times each codon appears
# Store in a dictionary
# Print the codon usage

### Challenge 3: DNA to Amino Acid Properties

Translate DNA and calculate what percentage of the protein's amino acids are hydrophobic

In [None]:
genetic_code = {
    'ATG': 'M',
    'GCA': 'A',
    'TGC': 'C',
    'TAA': '*'
}

# Amino acid properties
hydrophobic = ['A', 'C', 'M']

dna = "ATGGCATGCTAA"

# YOUR CODE HERE:
# 1. Translate the DNA to protein
# 2. Count how many amino acids are hydrophobic
# 3. Calculate percentage: (hydrophobic_count / total_count) * 100
# 4. Print the result

---

## Summary

Congratulations! You've learned the basics of dictionaries in Python:

**Dictionary basics:**
- ✅ Creating dictionaries with key-value pairs
- ✅ Accessing values using keys
- ✅ Adding and modifying entries
- ✅ Checking if keys exist

**Looping:**
- ✅ Looping through keys
- ✅ Looping through keys and values with `.items()`

**Biological applications:**
- ✅ Using dictionaries for complement rules
- ✅ Using dictionaries for the genetic code
- ✅ Translating DNA to amino acids
- ✅ Counting composition (nucleotides, amino acids, codons)

Dictionaries are incredibly powerful for bioinformatics! They let you:
- Store lookup tables (like the genetic code)
- Count occurrences efficiently
- Map between different representations

**Next steps:** Learn about reading files and processing real biological data from databases!