# Python Data Structures - Lists, Tuples, Sets, and Dictionaries
**Date:** [02/19/2025]
**Course:** Genomic Data Science (Python) 

---

## **1. Lists**
**Definition**
- Lists are **ordered collections** that can contain multiple data types.
- They are **mutable**, meaning elements can be changed.
- Lists are created using **square brackets `[]`**.

In [60]:
# Creating a list
gene_expression = ["gene", 2.34e-05, 1.22e-03, 7.33e-08]
print(gene_expression)

# Accessing elements
print("Third element:", gene_expression[2])  #Index starts from 0
print("last element:", gene_expression[-1])  #Using negative index

#Modifying an element
gene_expression[0] = "lif"  #Changing "gene" to "lif"
print(gene_expression)

['gene', 2.34e-05, 0.00122, 7.33e-08]
Third element: 0.00122
last element: 7.33e-08
['lif', 2.34e-05, 0.00122, 7.33e-08]


# List Slicing:
- Extract a subset of a list.
- Uses the format `[start:end]` (end index is **exclusive**).

In [61]:
#Slicing a list
subset = gene_expression[-3:]  #Last 3 elements
print(subset)

#Creating a full copy of the list
copy_list = gene_expression[:]
print(copy_list)

[2.34e-05, 0.00122, 7.33e-08]
['lif', 2.34e-05, 0.00122, 7.33e-08]
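
The `[start:end]` rule is easiest to check on a small list. The sketch below uses a made-up `expression_values` list (an assumption for illustration, not course data) to show that the end index is excluded and that a `[:]` copy is independent of the original.

```python
# Hypothetical expression values, used only for illustration
expression_values = [0.5, 1.2, 3.4, 0.9, 2.1]

# Elements at indices 1 and 2; index 3 is excluded
middle = expression_values[1:3]
print(middle)  # [1.2, 3.4]

# Omitting start or end slices from the beginning or up to the end
first_two = expression_values[:2]
from_third = expression_values[2:]
print(first_two, from_third)

# A [:] slice builds a new list, so editing the copy leaves the original intact
copy_values = expression_values[:]
copy_values[0] = 99.0
print(expression_values[0])  # 0.5
```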


# List Operations:
- **Concatenation (`+`)**: Combines two lists
- **Length (`len()`)**: Returns the number of elements

In [62]:
#Concatenating lists
more_genes = ["BRCA1", "BRCA2"]
combined_list = gene_expression + more_genes
print(combined_list)

#Length of a list
print("Length of gene_expression:", len(gene_expression))

['lif', 2.34e-05, 0.00122, 7.33e-08, 'BRCA1', 'BRCA2']
Length of gene_expression: 4
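
Note that `+` returns a new list and leaves both operands unchanged, which is why the length printed above is still 4. A minimal sketch with hypothetical gene lists (the names are chosen only for illustration):

```python
# Hypothetical gene lists for illustration
tumor_suppressors = ["TP53", "RB1"]
oncogenes = ["MYC", "KRAS"]

# Concatenation builds a third list
all_genes = tumor_suppressors + oncogenes
print(all_genes)               # ['TP53', 'RB1', 'MYC', 'KRAS']

# The original lists keep their lengths
print(len(tumor_suppressors))  # 2
print(len(all_genes))          # 4
```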


# List Methods:
- `.append()`: Adds an element to the end
- `.extend()`: Adds each element of another list (or other iterable) to the end
- `.pop()`: Removes and returns the last element
- `.sort()`: Sorts the list in place

In [63]:
#Using list methods
gene_expression.append("new_gene")  #Adds a new element
print(gene_expression)

gene_expression.extend(["extra1", "extra2"])  #Extends with multiple elements
print(gene_expression)

gene_expression.pop()  #Removes the last element
print(gene_expression)

numbers = [3, 1, 4, 1, 5]
numbers.sort()  #Sorts the list
print(numbers)

['lif', 2.34e-05, 0.00122, 7.33e-08, 'new_gene']
['lif', 2.34e-05, 0.00122, 7.33e-08, 'new_gene', 'extra1', 'extra2']
['lif', 2.34e-05, 0.00122, 7.33e-08, 'new_gene', 'extra1']
[1, 1, 3, 4, 5]
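
Two points about these methods are easy to miss: `.append()` and `.extend()` behave differently when given a list, and `.sort()` modifies the list in place while the built-in `sorted()` returns a new one. The sketch below uses hypothetical sample names and counts to show both.

```python
# Hypothetical sample names, used only for illustration
samples = ["s1", "s2"]

# append() adds its argument as ONE element, even when it is a list
appended = samples.copy()
appended.append(["s3", "s4"])
print(appended)  # ['s1', 's2', ['s3', 's4']]

# extend() adds each element of the iterable separately
extended = samples.copy()
extended.extend(["s3", "s4"])
print(extended)  # ['s1', 's2', 's3', 's4']

# pop() returns the element it removes
last = extended.pop()
print(last)  # s4

# sorted() returns a new list and leaves the original order untouched
counts = [3, 1, 4, 1, 5]
ordered = sorted(counts)
print(counts)   # [3, 1, 4, 1, 5]
print(ordered)  # [1, 1, 3, 4, 5]
```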


## **2. Tuples**

**Definition**:
- Like lists, but **immutable** (cannot be changed)
- Defined using **commas** (with or without parentheses)

In [64]:
#Creating a tuple
t = 10, 20, 30
print(t)

#Creating a tuple with parentheses (same result)
t = (10, 20, 30)
print(t)

#Trying to modify a tuple will raise an error
#t[0] = 100  #uncomment to see the TypeError

(10, 20, 30)
(10, 20, 30)
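
Immutability can be demonstrated safely by catching the error. The sketch below assumes a hypothetical genomic interval stored as `(chromosome, start, end)`; it also shows tuple unpacking, which assigns each element to its own variable.

```python
# Hypothetical genomic interval: (chromosome, start, end)
interval = ("chr17", 100, 200)

# Item assignment on a tuple raises TypeError
error_message = None
try:
    interval[1] = 150
except TypeError as err:
    error_message = str(err)
    print("TypeError:", error_message)

# Unpacking assigns each element to its own variable
chrom, start, end = interval
print(chrom, end - start)  # chr17 100
```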


## **3. Sets** 

**Definition**
- Unordered collection of **unique** elements.
- **No duplicate values, no indexing**.
- Useful for **removing duplicates**. 

In [65]:
#Creating a set (duplicates are removed)
brca1_go_terms = {"DNA repair", "cell cycle regulation", "DNA repair"}
print(brca1_go_terms)  #"DNA repair" appears only once

#Adding an element to a set
brca1_go_terms.add("protein binding")
print(brca1_go_terms)

#Removing an element
brca1_go_terms.remove("cell cycle regulation")
print(brca1_go_terms)

{'cell cycle regulation', 'DNA repair'}
{'cell cycle regulation', 'protein binding', 'DNA repair'}
{'protein binding', 'DNA repair'}
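
A common use of sets is removing duplicates from a list. The sketch below uses a made-up `gene_ids` list (an assumption for illustration) and also contrasts `.remove()`, which raises `KeyError` for a missing element, with `.discard()`, which does nothing in that case.

```python
# Hypothetical list of gene IDs containing duplicates
gene_ids = ["BRCA1", "TP53", "BRCA1", "MYC", "TP53"]

# Converting to a set keeps one copy of each ID
unique_ids = set(gene_ids)
print(len(gene_ids), len(unique_ids))  # 5 3

# Sets are unordered, so sort when a stable order is needed
print(sorted(unique_ids))  # ['BRCA1', 'MYC', 'TP53']

# discard() ignores a missing element; remove() raises KeyError
unique_ids.discard("EGFR")
remove_failed = False
try:
    unique_ids.remove("EGFR")
except KeyError:
    remove_failed = True
    print("EGFR is not in the set")
```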


# Set Operations:
- **Union (`|`)**: Combines two sets
- **Intersection (`&`)**: Finds common elements
- **Difference (`-`)**: Finds elements in one set but not the other 

In [66]:
# BRCA1 and BRCA2 gene annotation sets
brca2_go_terms = {"DNA repair", "protein binding", "transcription regulation"}

#Union: All unique terms from both genes
print(brca1_go_terms | brca2_go_terms)

#Intersection: Terms common to both genes 
print(brca1_go_terms & brca2_go_terms)

# Difference: Terms in BRCA2 but not in BRCA1
print(brca2_go_terms - brca1_go_terms)

{'transcription regulation', 'protein binding', 'DNA repair'}
{'protein binding', 'DNA repair'}
{'transcription regulation'}
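
Unlike union and intersection, difference depends on the order of the operands. The sketch below uses two hypothetical annotation sets (`gene_a_terms` and `gene_b_terms` are illustrative names) and also shows the method forms `.union()` and `.intersection()`, which are equivalent to `|` and `&` for sets.

```python
# Hypothetical annotation sets, used only for illustration
gene_a_terms = {"DNA repair", "protein binding"}
gene_b_terms = {"DNA repair", "transcription regulation"}

# Difference is not symmetric: swapping the operands changes the result
only_a = gene_a_terms - gene_b_terms
only_b = gene_b_terms - gene_a_terms
print(only_a)  # {'protein binding'}
print(only_b)  # {'transcription regulation'}

# Method forms of intersection and union
shared = gene_a_terms.intersection(gene_b_terms)
combined = gene_a_terms.union(gene_b_terms)
print(shared)            # {'DNA repair'}
print(sorted(combined))  # sorted for a stable display order
```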


## **4. Dictionaries**

**Definition**:
- Stores **key-value pairs** (like a real dictionary)
- Keys must be **unique and hashable**, which in practice means immutable types (e.g., strings, numbers, tuples).
- Values can be **any data type**.

In [67]:
#Creating a dictionary
TF_motif = {"ATF": "TGAC GTCA", "c-Myc": "CACGTG", "SP1": "GGGCGG"}
print(TF_motif)

#Accessing a value using its key
print("ATF motif:", TF_motif["ATF"])

#Checking if a key exists
print("NF-1" in TF_motif)  #False (not in dictionary)

{'ATF': 'TGAC GTCA', 'c-Myc': 'CACGTG', 'SP1': 'GGGCGG'}
ATF motif: TGAC GTCA
False
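
Checking with `in` matters because indexing a missing key raises `KeyError`. As a sketch, the example below uses a separate `tf_motifs` dictionary (an illustrative name, with two motifs copied from the cell above) to contrast square-bracket access with `.get()`, which returns `None` or a chosen default instead of raising.

```python
# Separate dictionary for illustration, with motifs copied from the notes
tf_motifs = {"c-Myc": "CACGTG", "SP1": "GGGCGG"}

# Square-bracket access raises KeyError for a missing key
lookup_failed = False
try:
    tf_motifs["NF-1"]
except KeyError:
    lookup_failed = True
    print("NF-1 is not in the dictionary")

# get() returns None, or the supplied default, instead of raising
motif = tf_motifs.get("NF-1")
fallback = tf_motifs.get("NF-1", "unknown")
print(motif, fallback)  # None unknown
```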


# Modifying a Dictionary:
- **Adding a new key-value pair**: `TF_motif["AP-1"] = "TGACTCA"`
- **Updating a value**: `TF_motif["SP1"] = "GGGCCC"`
- **Removing a key**: `del TF_motif["SP1"]`

In [68]:
#Adding a new transcription factor
TF_motif["AP-1"] = "TGACTCA"
print(TF_motif)

#Modifying an existing entry 
TF_motif["SP1"] = "GGGCCC"
print(TF_motif)

#Deleting a key-value pair
del TF_motif["SP1"]
print(TF_motif)

{'ATF': 'TGAC GTCA', 'c-Myc': 'CACGTG', 'SP1': 'GGGCGG', 'AP-1': 'TGACTCA'}
{'ATF': 'TGAC GTCA', 'c-Myc': 'CACGTG', 'SP1': 'GGGCCC', 'AP-1': 'TGACTCA'}
{'ATF': 'TGAC GTCA', 'c-Myc': 'CACGTG', 'AP-1': 'TGACTCA'}


## **5. Dictionary Methods**
- `.keys()`: Get all keys
- `.values()`: Get all values
- `.update()`: Add multiple key-value pairs
- `len(dictionary)`: Get number of key-value pairs

In [69]:
#Get all keys
print(list(TF_motif.keys()))

#Get all values
print(list(TF_motif.values()))

#Update dictionary with new key-value pairs
TF_motif.update({"EGR1": "GCGTGGGCG", "NF-kB" : "GGGRNNYYCC"})
print(TF_motif)

#Get dictionary size
print("Number of motifs:", len(TF_motif))

['ATF', 'c-Myc', 'AP-1']
['TGAC GTCA', 'CACGTG', 'TGACTCA']
{'ATF': 'TGAC GTCA', 'c-Myc': 'CACGTG', 'AP-1': 'TGACTCA', 'EGR1': 'GCGTGGGCG', 'NF-kB': 'GGGRNNYYCC'}
Number of motifs: 5
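
Keys and values are often needed together. The sketch below uses `.items()`, a standard dictionary method not listed above, to loop over key-value pairs, and shows that `.update()` overwrites the value of a key that already exists. The `tf_motifs` and `motif_lengths` names are illustrative.

```python
# Separate dictionary for illustration, with motifs copied from the notes
tf_motifs = {"c-Myc": "CACGTG", "SP1": "GGGCGG", "AP-1": "TGACTCA"}

# items() yields (key, value) pairs
motif_lengths = {}
for name, motif in tf_motifs.items():
    motif_lengths[name] = len(motif)
print(motif_lengths)  # {'c-Myc': 6, 'SP1': 6, 'AP-1': 7}

# update() replaces the value of an existing key rather than adding a duplicate
tf_motifs.update({"SP1": "GGGCCC"})
print(tf_motifs["SP1"], len(tf_motifs))  # GGGCCC 3
```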


## Summary and Reflection
- **Lists**: Ordered, mutable collections with methods such as `.append()`, `.extend()`, `.pop()`, and `.sort()`.
- **Tuples**: Like lists, but immutable.
- **Sets**: Unordered, unique elements; useful for operations like union and intersection.
- **Dictionaries**: Key-value pairs for fast lookups.

**Questions for further exploration**
- When should we use tuples instead of lists?
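
One possible starting point for this question: because tuples are immutable, they are hashable and can serve as dictionary keys, whereas lists cannot. The sketch below assumes a hypothetical `coverage` dictionary keyed by `(chromosome, position)` pairs; it is an illustration, not a complete answer.

```python
# Hypothetical read-coverage table keyed by (chromosome, position)
coverage = {("chr1", 1000): 35, ("chr2", 500): 12}
print(coverage[("chr1", 1000)])  # 35

# A list cannot be used as a key because lists are not hashable
list_key_failed = False
try:
    bad = {["chr1", 1000]: 35}
except TypeError:
    list_key_failed = True
    print("Lists cannot be dictionary keys")
```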