# Lecture 2, Notebook 3: Dictionaries for Biological Data

**Learning Objectives:**
- Create dictionaries to store biological key-value pairs
- Access dictionary values using keys
- Use `.keys()`, `.values()`, and `.items()` methods
- Add and modify dictionary entries
- Use safe operations like `.get()` and `.pop()`
- Apply dictionaries to translate DNA codons to amino acids

## 1. Creating Your First Codon Dictionary

Dictionaries are perfect for storing biological relationships like codon → amino acid mappings.

In [1]:
# Create a small codon dictionary
codon_table = {
    'ATG': 'M',  # Methionine (Start codon)
    'TGG': 'W',  # Tryptophan
    'TTT': 'F',  # Phenylalanine
    'TAA': '*'   # Stop codon
}

print("Codon dictionary:")
print(codon_table)

Codon dictionary:
{'ATG': 'M', 'TGG': 'W', 'TTT': 'F', 'TAA': '*'}


**Exercise 1:** Create your own dictionary called `base_pairs` that maps DNA bases to their complements:
- A → T
- T → A  
- G → C
- C → G

In [None]:
# Your code here
base_pairs = {
    # Fill in the base pair mappings
}

print(base_pairs)

## 2. Accessing Dictionary Values

Use square brackets `[key]` to get specific values, or use `.keys()`, `.values()`, and `.items()` to get collections.

In [2]:
# Access individual values
print("ATG codes for:", codon_table['ATG'])
print("TTT codes for:", codon_table['TTT'])

print("\nAll codons:", codon_table.keys())
print("All amino acids:", codon_table.values())

print("\nCodon → Amino acid pairs:")
for codon, amino_acid in codon_table.items():
    print(f"{codon} → {amino_acid}")

ATG codes for: M
TTT codes for: F

All codons: dict_keys(['ATG', 'TGG', 'TTT', 'TAA'])
All amino acids: dict_values(['M', 'W', 'F', '*'])

Codon → Amino acid pairs:
ATG → M
TGG → W
TTT → F
TAA → *
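
The objects returned by `.keys()` and `.values()` are views, not lists. The sketch below assumes the same four-codon table as above. It shows how to turn a view into a sorted list, how `in` tests the keys, and what happens when square brackets are used with a missing key.

```python
# Same small codon table as in the cell above
codon_table = {'ATG': 'M', 'TGG': 'W', 'TTT': 'F', 'TAA': '*'}

# Convert the keys view to a regular (sorted) list when you need indexing
codon_list = sorted(codon_table.keys())
print(codon_list)

# 'in' checks the keys, not the values
print('ATG' in codon_table)   # 'ATG' is a key
print('M' in codon_table)     # 'M' is a value, not a key

# Square brackets raise a KeyError for a missing key
try:
    codon_table['GGG']
except KeyError as err:
    print("Missing codon:", err)
```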


**Exercise 2:** Using your `base_pairs` dictionary from above:
1. Print what 'A' pairs with
2. Print what 'G' pairs with
3. Loop through all base pairs and print them in the format "X pairs with Y"

In [None]:
# Your code here


## 3. Adding and Modifying Dictionary Entries

You can add new entries or change existing ones using the same `dict[key] = value` syntax.

In [None]:
# Start with our original codon table
codon_table = {
    'ATG': 'M',
    'TGG': 'W',
    'TTT': 'F',
    'TAA': '*'
}

print("Original:", codon_table)

# Add new codons
codon_table['AAA'] = 'K'  # Lysine
codon_table['CCC'] = 'P'  # Proline

print("After adding:", codon_table)

# Change existing value (for demonstration)
codon_table['ATG'] = 'Start'
print("After changing ATG:", codon_table)
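
Several entries can also be added in one step with `.update()`. It follows the same rule as assignment: new keys are added and existing keys are overwritten. A small sketch, using a shortened table for illustration:

```python
# Shortened codon table, for illustration only
codon_table = {'ATG': 'M', 'TGG': 'W'}

# Add two new codons and overwrite one existing entry in a single call
codon_table.update({'AAA': 'K', 'CCC': 'P', 'ATG': 'Start'})

print(codon_table)
print("Number of entries:", len(codon_table))
```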

**Exercise 3:** Create a dictionary called `nucleotide_info` that stores information about DNA bases:
1. Start with: `{'A': 'Adenine', 'T': 'Thymine'}`
2. Add entries for 'G' (Guanine) and 'C' (Cytosine)
3. Print the final dictionary

In [None]:
# Your code here
nucleotide_info = {'A': 'Adenine', 'T': 'Thymine'}

# Add G and C entries

print(nucleotide_info)

## 4. Safe Dictionary Operations

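Use `.get()` to avoid errors when a key might not exist, and `.pop()` to remove an entry and get its value back. Note that `.pop()` raises a `KeyError` for a missing key unless you pass a default value as the second argument.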

In [None]:
# Reset our codon table
codon_table = {
    'ATG': 'M',
    'TGG': 'W',
    'TTT': 'F',
    'TAA': '*'
}

# Safe lookup with .get()
print("ATG codes for:", codon_table.get('ATG'))        # Found
print("XYZ codes for:", codon_table.get('XYZ'))        # Missing - returns None
print("XYZ codes for:", codon_table.get('XYZ', '?'))   # Missing - returns default

# Remove an entry with .pop()
removed = codon_table.pop('TAA')
print(f"\nRemoved {removed} from the table")
print("Updated table:", codon_table)
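
A short sketch of how `.pop()` behaves when the key is missing, with and without a default. The three-codon table is a shortened copy used only for this illustration.

```python
# Shortened codon table, for illustration only
codon_table = {'ATG': 'M', 'TGG': 'W', 'TTT': 'F'}

# Without a default, popping a missing key raises a KeyError
try:
    codon_table.pop('GGG')
except KeyError:
    print("GGG is not in the table")

# With a default, .pop() returns the default instead of raising an error
removed = codon_table.pop('GGG', None)
print("Removed:", removed)

# The table is unchanged because nothing was removed
print(codon_table)
```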

**Exercise 4:** Using the `codon_table` from above:
1. Use `.get()` to safely look up 'TTT' and 'GGG' 
2. For missing codons, return 'Unknown' as the default
3. Use `.pop()` to remove 'TGG' from the table and print what was removed

In [None]:
# Your code here
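
The same `.get()` pattern can be used to translate a short DNA sequence codon by codon. This is a minimal sketch rather than a complete translator: the function name `translate`, the `'?'` placeholder for unknown codons, and the example sequences are assumptions made for illustration, and the table only contains the four codons used in this notebook.

```python
codon_table = {'ATG': 'M', 'TGG': 'W', 'TTT': 'F', 'TAA': '*'}

def translate(dna, table):
    """Translate a DNA string codon by codon; '?' marks codons not in the table."""
    protein = ""
    # Step through the sequence three bases at a time,
    # ignoring an incomplete codon at the end
    for i in range(0, len(dna) - len(dna) % 3, 3):
        codon = dna[i:i + 3]
        amino_acid = table.get(codon, '?')  # safe lookup with a default
        if amino_acid == '*':               # stop codon: end translation
            break
        protein += amino_acid
    return protein

print(translate("ATGTTTTGGTAA", codon_table))
print(translate("ATGGGGTTT", codon_table))
```

Because `GGG` is not in the small table, the second call shows how the default keeps the translation running instead of stopping with a `KeyError`.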


## 5. Practical Application: Analysing Plant Pollinator Relationships

Let's use our dictionary skills to analyse a small database of plants and the pollinators that visit them.

In [12]:
plant_pollinators = {
    "sunflower": ["bee", "butterfly", "beetle"],
    "lavender": ["bee", "hummingbird"],
    "cherry_blossom": ["bee", "fly"],
    "orchid": ["butterfly", "moth", "hummingbird"],
    "dandelion": ["bee", "fly", "beetle", "ant"],
    "rose": ["bee", "butterfly"],
    "lily": ["butterfly", "moth", "bee"],
}


### Question 1: How many plants and pollinators are present?

To get the number of plants (dictionary keys), we can use the `len()` function.
Counting the pollinators is trickier because the same pollinator can appear under several plants, so each one must be counted only once.

In [16]:
# Count pollinator diversity


all_pollinators = []
for pollinator_list in plant_pollinators.values():
    for pollinator in pollinator_list:
        if pollinator not in all_pollinators:
            all_pollinators.append(pollinator)

print(f"There are {len(plant_pollinators)} plants in the database")
print(f"There are {len(all_pollinators)} pollinators in the database")



There are 7 plants in the database
There are 7 pollinators in the database


### Pro-tip
Using sets can simplify this code! A set is an unordered collection that stores each value only once.
We can use this property to filter out duplicates.


In [17]:
all_pollinators = set()
for pollinator_list in plant_pollinators.values():
    all_pollinators.update(pollinator_list)  # update() adds every item; duplicates are ignored

print(f"There are {len(plant_pollinators)} plants in the database")
print(f"There are {len(all_pollinators)} pollinators in the database")


There are 7 plants in the database
There are 7 pollinators in the database
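
The loop can be shortened further with a set comprehension. This is one possible shortcut, sketched with the same `plant_pollinators` data; the result is sorted only to make the printed order predictable.

```python
plant_pollinators = {
    "sunflower": ["bee", "butterfly", "beetle"],
    "lavender": ["bee", "hummingbird"],
    "cherry_blossom": ["bee", "fly"],
    "orchid": ["butterfly", "moth", "hummingbird"],
    "dandelion": ["bee", "fly", "beetle", "ant"],
    "rose": ["bee", "butterfly"],
    "lily": ["butterfly", "moth", "bee"],
}

# Collect every pollinator from every list; the set keeps each name once
all_pollinators = {
    pollinator
    for pollinator_list in plant_pollinators.values()
    for pollinator in pollinator_list
}

print(sorted(all_pollinators))
print(f"There are {len(all_pollinators)} pollinators in the database")
```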


### Question 2: How many plants does each pollinator visit, and which pollinator visits the most plants?

In [19]:
# Count how many plants each pollinator visits
pollinator_visits = {}
for pollinator_list in plant_pollinators.values():
    for pollinator in pollinator_list:
        if pollinator in pollinator_visits:
            pollinator_visits[pollinator] += 1
        else:
            pollinator_visits[pollinator] = 1

print(pollinator_visits)

{'bee': 6, 'butterfly': 4, 'beetle': 2, 'hummingbird': 2, 'fly': 2, 'moth': 2, 'ant': 1}
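
The `if`/`else` counting pattern can be written more compactly with `.get()` and a default of `0`. The sketch below also compares the result with `collections.Counter` from the standard library, which is one alternative for this kind of tally.

```python
from collections import Counter

plant_pollinators = {
    "sunflower": ["bee", "butterfly", "beetle"],
    "lavender": ["bee", "hummingbird"],
    "cherry_blossom": ["bee", "fly"],
    "orchid": ["butterfly", "moth", "hummingbird"],
    "dandelion": ["bee", "fly", "beetle", "ant"],
    "rose": ["bee", "butterfly"],
    "lily": ["butterfly", "moth", "bee"],
}

# .get(pollinator, 0) returns 0 the first time a pollinator is seen
pollinator_visits = {}
for pollinator_list in plant_pollinators.values():
    for pollinator in pollinator_list:
        pollinator_visits[pollinator] = pollinator_visits.get(pollinator, 0) + 1

print(pollinator_visits)

# Counter performs the same tally from a stream of pollinator names
visit_counter = Counter(
    pollinator
    for pollinator_list in plant_pollinators.values()
    for pollinator in pollinator_list
)
print(dict(visit_counter) == pollinator_visits)
```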


Finding the pollinator that visits the most plants takes a few more lines, but again there is a pro-tip for that!


In [20]:
key_pollinator = ""
max_visits = 0
for pollinator, visits in pollinator_visits.items():
    if visits > max_visits:
        max_visits = visits
        key_pollinator = pollinator
print(f"The most important pollinator is {key_pollinator} with {max_visits} visits")

The most important pollinator is bee with 6 visits


### Pro-tip: using the `max()` function with dictionaries

In [24]:
key_pollinator = max(pollinator_visits, key=pollinator_visits.get)
max_visits = max(pollinator_visits.values())
print(f"The most important pollinator is {key_pollinator} with {max_visits} visits")



The most important pollinator is bee with 6 visits
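
`max()` returns only one key, even when several pollinators share the top count. As a sketch, the counts from the output above can be ranked with `sorted()`, and all pollinators tied for first place can be collected with a list comprehension. The names `ranking` and `top_pollinators` are chosen here for illustration.

```python
# Counts copied from the output of the counting cell above
pollinator_visits = {'bee': 6, 'butterfly': 4, 'beetle': 2,
                     'hummingbird': 2, 'fly': 2, 'moth': 2, 'ant': 1}

# Sort the (pollinator, visits) pairs by the count, highest first
ranking = sorted(pollinator_visits.items(), key=lambda item: item[1], reverse=True)
for pollinator, visits in ranking:
    print(f"{pollinator}: {visits}")

# Collect every pollinator that reaches the maximum, in case of a tie
max_visits = max(pollinator_visits.values())
top_pollinators = [p for p, v in pollinator_visits.items() if v == max_visits]
print("Top pollinator(s):", top_pollinators)
```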


# 🧠 Dictionary Challenge: Neural Network Analysis
Your neuroscience lab has recorded data from different brain regions during various cognitive tasks. Use your dictionary skills to analyze this neural activity data!


In [23]:
# Neural activity data (spikes per second) during different cognitive tasks
brain_regions = {
    "prefrontal_cortex": {
        "memory_task": 45,
        "attention_task": 67,
        "decision_task": 52,
        "rest": 12,
    },
    "hippocampus": {
        "memory_task": 89,
        "spatial_task": 76,
        "rest": 8,
        "learning_task": 94,
    },
    "visual_cortex": {
        "visual_task": 134,
        "attention_task": 45,
        "rest": 15,
        "memory_task": 23,
    },
    "motor_cortex": {
        "movement_task": 98,
        "coordination_task": 87,
        "rest": 9,
        "decision_task": 34,
    },
}
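
`brain_regions` is a dictionary of dictionaries, so reaching a single number takes two keys. The sketch below uses a shortened two-region copy of the data and shows the access patterns the tasks rely on, without solving any of them. The `"not recorded"` default is an assumption made for illustration.

```python
# Shortened copy of brain_regions, for illustration only
brain_regions = {
    "prefrontal_cortex": {"memory_task": 45, "attention_task": 67, "rest": 12},
    "hippocampus": {"memory_task": 89, "spatial_task": 76, "rest": 8},
}

# Chain two keys: first the region, then the task
print(brain_regions["hippocampus"]["memory_task"])

# Not every region has every task, so use .get() on the inner dictionary
print(brain_regions["hippocampus"].get("attention_task", "not recorded"))

# A nested loop visits every (region, task, activity) combination
for region, tasks in brain_regions.items():
    for task, spikes in tasks.items():
        print(f"{region} / {task}: {spikes} spikes per second")
```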


## 🎯 Challenge Tasks
### Task 1: Task-Specific Activation (Beginner)
Write code to find:

- Which brain region shows the highest activity during any single task?
- Which brain region has the lowest resting activity?
- What is the average neural activity across all regions during rest?

Hint: You'll need to loop through each brain region and examine the activity levels.

### Task 2: Task Distribution Analysis (Beginner-Intermediate)
Create a new dictionary that shows:

- How many different brain regions are active during each task type
- Which task activates the most brain regions
- Which tasks are performed by only one brain region

### Task 3: Activation vs Baseline (Intermediate)
Using the `baseline_activity` dictionary, determine:

- For each brain region, which tasks show activity ABOVE baseline?
- Which brain region shows the biggest increase from baseline during its most active task?
- Create a new dictionary showing the "activation ratio" (task activity ÷ baseline) for each region's highest task

### Task 4: Neurotransmitter Profile (Intermediate)
Analyze the neurotransmitter data:

- Which neurotransmitter appears in the most brain regions?
- Which brain region has the highest total neurotransmitter concentration?
- Create a dictionary showing the average concentration of each neurotransmitter across all regions where it's found

### Task 5: Specialization Index (Advanced)
Calculate a "specialization score" for each brain region:

- Find the difference between the highest and lowest task activities for each region
- A higher difference means a more specialized region (big difference between tasks)
- Rank the brain regions from most specialized to least specialized

### Task 6: Cross-Reference Challenge (Advanced)
Combine multiple dictionaries to find:

- Which brain region has both high dopamine concentration AND high decision-making activity?
- Do regions with higher baseline activity also tend to have higher peak task activity?
- Create a "neural profile" dictionary for each region combining: highest task, neurotransmitter count, and baseline activity

### 🔬 Bonus Challenge: Neural Circuit Mapping
Create a function called `analyze_neural_circuit()` that takes a task name as input and returns:

- All brain regions active during that task
- The total neural activity for that task across all regions
- The most active region for that specific task

## Summary

In this notebook, you've learned:

✅ **Dictionary Creation**: Using `{key: value}` syntax for biological data  
✅ **Accessing Values**: Using `[key]`, `.keys()`, `.values()`, and `.items()`  
✅ **Modifying Dictionaries**: Adding and changing entries with `dict[key] = value`  
✅ **Safe Operations**: Using `.get()` and `.pop()` to handle missing keys  
✅ **Practical Application**: Translating DNA codons to amino acids, analysing plant-pollinator relationships, and analysing neural activity data



Dictionaries are essential for bioinformatics because they let you efficiently store and look up biological relationships, which is exactly what you need for analyzing sequences!