# A Primer on Python Data Types

Python offers several data types that are useful for different purposes. Here, we compare and contrast some of the most commonly used: the built-in lists, tuples, and dictionaries, plus NumPy arrays from the NumPy library.

## Key Differences
- **Mutability**: Lists and dictionaries are mutable, while tuples are immutable. NumPy arrays are mutable but have a fixed size.
- **Ordering**: Lists, tuples, and NumPy arrays are ordered, meaning the order of elements is preserved. Dictionaries preserve insertion order as of Python 3.7; in earlier versions their order was not guaranteed.
- **Performance**: NumPy arrays provide better performance for numerical operations compared to lists due to optimized C code under the hood.
- **Functionality**: Dictionaries offer a unique key-value pair structure, making them suitable for different use cases than lists, tuples, and arrays.

The choice of data type in Python largely depends on the specific requirements of the application, such as whether you need ordered or unordered data, mutable or immutable structures, or efficient numerical computations.
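
The mutability differences listed above can be seen in a few lines. This is a minimal sketch; the variable names and values are made up for illustration.

```python
import numpy as np

# A list can change in both content and length.
bases_list = ["A", "T", "C"]
bases_list.append("G")      # grows to four items
bases_list[0] = "U"         # replaces the first item

# A tuple rejects any modification after creation.
bases_tuple = ("A", "T", "C", "G")
try:
    bases_tuple[0] = "U"
    tuple_changed = True
except TypeError:
    tuple_changed = False

# A NumPy array allows element updates, but it cannot grow in place:
# np.append returns a new array and leaves the original untouched.
counts = np.array([1, 2, 3])
counts[0] = 10
longer_counts = np.append(counts, 4)

print(bases_list)
print(tuple_changed)
print(counts, longer_counts)
```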


## Lists
- **Mutable**: Lists can be modified after creation (add, remove, or change items).
- **Ordered**: The order of items is maintained, and items can be accessed by their position.
- **Syntax**: Created using square brackets `[]`.
- **Use Case**: Ideal for collections of items where the order matters and contents might change.

In [1]:
# Lists: Dynamic arrays, useful for storing collections of items.
# --------------------------------------------------------------
# Creating a list of common enzymes
enzymes = ["Ligase", "Helicase", "Polymerase", "Nuclease"]
print("List of Enzymes:", enzymes)

# Adding an enzyme to the list
enzymes.append("Transferase")
print("Updated List of Enzymes:", enzymes)

# Accessing a specific enzyme by index
print("Second Enzyme in the List:", enzymes[1])

List of Enzymes: ['Ligase', 'Helicase', 'Polymerase', 'Nuclease']
Updated List of Enzymes: ['Ligase', 'Helicase', 'Polymerase', 'Nuclease', 'Transferase']
Second Enzyme in the List: Helicase


### Some Useful Functions for Using Lists
There are two types of syntax to consider: 
- **Methods called on the list**: These are written as `my_list.method()`. Note that most of these (for example `append()` and `remove()`) change the list in place.
- **Functions that take a list (or any iterable) as input**: These are written as `function(my_list)`. Note that functions such as `sorted()` return a new object and leave the original list unchanged, so assign the result to a new variable.
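
To make the two call styles concrete, here is a small sketch comparing the `sorted()` function with the `.sort()` method. The list values are arbitrary example numbers.

```python
masses = [18.0, 2.0, 44.0]   # arbitrary example values

# Function style: sorted() returns a new list; the original is unchanged.
ordered = sorted(masses)

# Method style: .sort() reorders the list in place and returns None.
in_place = [18.0, 2.0, 44.0]
result = in_place.sort()

print(masses, ordered)
print(in_place, result)
```

Because `.sort()` returns `None`, writing `my_list = my_list.sort()` is a common mistake that discards the list.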

1. **Appending and Extending:**
   - `append(item)`: Adds an item to the end of the list.
   - `extend(iterable)`: Extends the list by appending all the items from the iterable (e.g., another list).

2. **Inserting and Removing Elements:**
   - `insert(index, item)`: Inserts an item at a specified index.
   - `remove(item)`: Removes the first occurrence of an item.
   - `pop([index])`: Removes and returns an item at a given index (or the last item if index is not specified).

3. **Finding Elements:**
   - `index(item)`: Returns the index of the first occurrence of an item.
   - `count(item)`: Returns the number of times an item appears in the list.

4. **Sorting:**
   - `sorted(iterable)`: Returns a new sorted list and leaves the original unchanged.
   - `sort()`: Sorts the list in place.

5. **Copying:**
   - `copy()`: Returns a shallow copy of the list.

6. **Clearing:**
   - `clear()`: Removes all items from the list.

7. **List Comprehensions:**
   - A concise way to create lists. Uses a `for` clause, optionally followed by an `if` condition, to build a list iteratively.
   - `my_list = [x**2 for x in range(1, 6)]` creates a list called `my_list` and fills it with the squares of 1 through 5 (`range(1, 6)` stops before 6).
   - `my_list = [x**2 for x in range(1, 6) if x % 2 == 0]` keeps only the squares of the even numbers, giving `[4, 16]`. The condition `x % 2 == 0` uses the modulo operator `%`, which returns the remainder of a division. Here it means: if the remainder of `x` divided by 2 is 0, then `x` is even, so its square is kept in the list.

8. **Slicing:**
   - Used for accessing a subset of list elements. Syntax: `my_list[start:stop:step]`, where `start` is included, `stop` is excluded, and `step` is the interval between selected elements.

9. **Conversion to/from Other Data Types:**
   - `list(iterable)`: Converts an iterable (like a tuple, string, set, or dictionary) to a list.
   - `separator.join(iterable)`: Concatenates an iterable of strings into a single string, with elements separated by `separator` (for example `", ".join(my_list)`).

10. **Length:**
    - `len(list)`: Returns the number of items in the list.
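
Several of the operations above (inserting, extending, removing, slicing, comprehensions, and joining) can be tried together. The following is a short sketch with illustrative enzyme names, not a complete tour of the list API.

```python
enzymes = ["Ligase", "Helicase", "Polymerase"]

enzymes.insert(1, "Nuclease")          # place an item at index 1
enzymes.extend(["Kinase", "Lyase"])    # add several items at the end
enzymes.remove("Helicase")             # drop the first matching item
last = enzymes.pop()                   # remove and return the final item

# Slicing: start is included, stop is excluded.
numbers = list(range(10))
every_other = numbers[2:8:2]
reversed_numbers = numbers[::-1]

# List comprehension with a condition.
even_squares = [x**2 for x in range(1, 6) if x % 2 == 0]

# Joining a list of strings into one string.
joined = ", ".join(enzymes)

print(enzymes, last)
print(every_other, reversed_numbers)
print(even_squares)
print(joined)
```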


In [24]:
# A few examples
my_list = [2,3,5,3,1,4,6,2,8] # Create the list
print(my_list) # print the list

print(len(my_list)) #print the length of the list

print(my_list.count(2)) # return the number of times '2' appears in the list

print(my_list.index(5)) # returns the index of 5; Note python starts counting at 0

sorted_list = sorted(my_list, reverse = False) # sort the list; Also try this: sorted(my_list, reverse = True)
print(sorted_list)

[2, 3, 5, 3, 1, 4, 6, 2, 8]
9
2
2
[1, 2, 2, 3, 3, 4, 5, 6, 8]


### Exercise 1: Lists
- **Task 1**: Create a list, and try some of these functions out
- **Task 2**: Create a list using a list comprehension.

In [8]:
# Your Answers Here; Create Additional Cells as Needed

## Tuples
- **Immutable**: Once a tuple is created, it cannot be modified.
- **Ordered**: Like lists, tuples maintain the order of items.
- **Syntax**: Created using parentheses `()`.
- **Use Case**: Suitable for fixed data sets, like coordinates or RGB color values.
- We won't focus too much on these here (b/c they're not very exciting), but wanted you to know about them.
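
Even so, a quick sketch shows what immutability means in practice, using the RGB use case mentioned above. The color value and its name are arbitrary choices for illustration.

```python
rgb = (255, 128, 0)

# Unpacking assigns each item of the tuple to its own variable.
red, green, blue = rgb

# Trying to change an item raises a TypeError.
try:
    rgb[0] = 0
    message = "no error"
except TypeError as err:
    message = str(err)

# Because tuples cannot change, they can serve as dictionary keys.
color_names = {rgb: "orange"}

print(red, green, blue)
print(message)
print(color_names[(255, 128, 0)])
```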

## Dictionaries
- **Mutable**: Can change, add, or delete key-value pairs.
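- **Insertion-ordered (Python 3.7+)**: Items keep the order in which they were added, but they are accessed via keys rather than by position. Before Python 3.7, order was not guaranteed.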
- **Syntax**: Created using curly braces `{}` with key-value pairs.
- **Use Case**: Perfect for associating keys with values, like mapping names to phone numbers.

In [39]:
# Dictionaries: Key-value pairs, great for mapping relationships.
# ----------------------------------------------------------------
# Creating a dictionary to map enzymes to their functions
enzyme_functions = {
    "Ligase": "Joining of DNA strands",
    "Helicase": "Unwinding DNA helix",
    "Polymerase": "Polymerizing nucleotides",
    "Nuclease": "Cutting DNA strands"
}
print("Enzyme Functions:", enzyme_functions)

# Accessing a function by enzyme name
print("Function of Helicase:", enzyme_functions["Helicase"])

# Adding a new key-value pair
enzyme_functions["Transferase"] = "Transfer of functional groups"
print("Updated Enzyme Functions:", enzyme_functions)

Enzyme Functions: {'Ligase': 'Joining of DNA strands', 'Helicase': 'Unwinding DNA helix', 'Polymerase': 'Polymerizing nucleotides', 'Nuclease': 'Cutting DNA strands'}
Function of Helicase: Unwinding DNA helix
Updated Enzyme Functions: {'Ligase': 'Joining of DNA strands', 'Helicase': 'Unwinding DNA helix', 'Polymerase': 'Polymerizing nucleotides', 'Nuclease': 'Cutting DNA strands', 'Transferase': 'Transfer of functional groups'}


### Some Useful Functions for Using Dictionaries

1. **Creating a Dictionary:**
   - `{}` or `dict()`: Create an empty dictionary.
   - `dict.fromkeys(sequence, value)`: Create a new dictionary with keys from `sequence` and values set to `value`.

2. **Accessing Elements:**
   - `dict[key]`: Access the item with key `key`. Raises a `KeyError` if the key is not found.
   - `get(key, default=None)`: Returns the value for `key` if `key` is in the dictionary, else `default`.

3. **Adding and Updating Elements:**
   - `dict[key] = value`: Sets `dict[key]` to `value`, overwriting any existing value.
   - `update([other])`: Updates the dictionary with the key/value pairs from `other`, overwriting existing keys.

4. **Removing Elements:**
   - `pop(key[, default])`: Remove the item with key `key` and return its value, or `default` if `key` is not found.
   - `popitem()`: Removes and returns the most recently inserted `(key, value)` pair as a 2-tuple (Python 3.7+).
   - `del dict[key]`: Removes `dict[key]` from the dictionary.
   - `clear()`: Removes all items from the dictionary.

5. **Keys, Values, and Items:**
   - `keys()`: Returns a new view of the dictionary's keys.
   - `values()`: Returns a new view of the dictionary's values.
   - `items()`: Returns a new view of the dictionary’s items (`(key, value)` pairs).

6. **Copying:**
   - `copy()`: Returns a shallow copy of the dictionary.

7. **Merging Dictionaries (Python 3.5+):**
   - `{**d1, **d2}`: Creates a new dictionary by merging `d1` and `d2`.

8. **Dictionary Comprehensions:**
   - `{key: value for (key, value) in iterable}`: Similar to list comprehensions, but for dictionaries.
   - `zip()`: Pairs up separate lists of keys and values so they can be passed to a dictionary comprehension or directly to `dict()`.
   - Example: `my_dict = dict(zip(list_of_keys, list_of_values))`

9. **Length:**
   - `len(dict)`: Returns the number of items in the dictionary.

10. **Membership Test:**
    - `key in dict`: Returns `True` if `dict` has a key `key`, else `False`.

11. **Nested Dictionaries:**
    - Used for storing hierarchical or structured data.

12. **Sorting:**
    - `sorted(dict)`: Returns a sorted list of the dictionary's keys.
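
A few of these operations are easy to mix up, so here is a brief sketch of safe lookup, updating, removal, membership tests, iteration, and nesting. The dictionary contents are example values chosen for illustration.

```python
weights = {"Alanine": 89.1, "Cysteine": 121.2}

# get() returns a default instead of raising KeyError for a missing key.
missing = weights.get("Glycine", "Unknown")

# update() adds new pairs (and would overwrite existing keys).
weights.update({"Glycine": 75.1})

# pop() removes a key and returns its value.
removed = weights.pop("Cysteine")

# Membership tests check the keys.
has_alanine = "Alanine" in weights

# items() yields (key, value) pairs for looping.
pairs = [f"{name}: {weight}" for name, weight in weights.items()]

# sorted() on a dictionary returns its keys in sorted order.
names_sorted = sorted(weights)

# Nested dictionaries hold structured data.
enzyme_info = {"Amylase": {"pH_optimum": 6.8, "substrates": ["Starch"]}}
amylase_ph = enzyme_info["Amylase"]["pH_optimum"]

print(missing, removed, has_alanine)
print(pairs)
print(names_sorted, amylase_ph)
```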


In [42]:
# Some useful examples

# Create a dictionary where the keys all have the same value.
keys = ['Alanine', 'Tyrosine', 'Methionine']
values = 'amino acid'
my_dict = dict.fromkeys(keys, values)
print(my_dict)

# Create another dictionary where the keys all have the same value.
keys = ['Adenosine', 'Cytosine', 'Guanosine']
values = 'nucleic acid'
new_dict = dict.fromkeys(keys, values)
print(new_dict)

# Merge the two dictionaries into one
combined_dict = {**my_dict, **new_dict}
print(combined_dict)

# -----------------------------------------------------------------

# Creating a dictionary from lists of keys and values using a dictionary comprehension
keys = ['alanine', 'adenosine', 'pyruvate']
values = ['amino acid', 'nucleic acid', 'keto acid']

# Create a dictionary with different values for each key
created_dict = {key: value for key, value in zip(keys, values)}
print(created_dict) 

{'Alanine': 'amino acid', 'Tyrosine': 'amino acid', 'Methionine': 'amino acid'}
{'Adenosine': 'nucleic acid', 'Cytosine': 'nucleic acid', 'Guanosine': 'nucleic acid'}
{'Alanine': 'amino acid', 'Tyrosine': 'amino acid', 'Methionine': 'amino acid', 'Adenosine': 'nucleic acid', 'Cytosine': 'nucleic acid', 'Guanosine': 'nucleic acid'}
{'alanine': 'amino acid', 'adenosine': 'nucleic acid', 'pyruvate': 'keto acid'}


### Exercise 2: Dictionaries
- **Task 1**: Create two dictionaries, combine them into one, and print a specific key:value.
- **Task 2**: Create a dictionary from a list of keys and a list of values using a dictionary comprehension

In [36]:
# Your Answers Here

## NumPy Arrays
- **Mutable**: Elements can be modified, but the array's size is fixed.
- **Ordered**: Elements are stored in a specific order.
- **Syntax**: Created using `numpy.array()` function.
- **Use Case**: Ideal for numerical operations, especially in scientific computing, due to its efficiency and the availability of vectorized operations.
- **Note**: Every element in an array has the same data type; mixed inputs are converted to a common type when the array is created.
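
The single-data-type rule can be checked through the `dtype` attribute. This small sketch uses made-up inputs to show how NumPy picks a common type when the inputs are mixed.

```python
import numpy as np

# All integers: the array gets an integer dtype.
ints = np.array([1, 2, 3])

# Integers mixed with a float: everything is stored as float64.
mixed = np.array([1, 2.5, 3])

# A number mixed with a string: everything is stored as text.
text = np.array([1, "a"])

print(ints.dtype, mixed.dtype, text.dtype)
print(mixed)
print(text)
```

The last case is usually unintended, since arithmetic no longer works on an array of strings.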

In [41]:
# NumPy Arrays: Efficient arrays for numerical data.
# import the library
import numpy as np
# Creating a NumPy array of pH values
ph_values = np.array([7.2, 7.4, 6.8, 7.0, 7.3])
print("pH Values:", ph_values)

# Performing calculations on the entire array
average_ph = np.mean(ph_values)
print("Average pH:", average_ph)

pH Values: [7.2 7.4 6.8 7.  7.3]
Average pH: 7.140000000000001


### Some Useful Functions for Using NumPy Arrays

NumPy is a fundamental package for scientific computing in Python. It provides a high-performance multidimensional array object and tools for working with these arrays.

1. **Creating Arrays:**
   - `np.array(list)`: Creates a NumPy array from a list or list of lists.
   - `np.zeros(shape)`: Creates an array filled with zeros.
   - `np.ones(shape)`: Creates an array filled with ones.
   - `np.arange(start, stop, step)`: Creates an array with values from `start` up to, but not including, `stop` in increments of `step`.
   - `np.linspace(start, stop, num)`: Creates an array with `num` evenly spaced values over the interval, including both endpoints by default.

2. **Array Attributes:**
   - `ndarray.shape`: Tuple of array dimensions.
   - `ndarray.size`: Number of elements in the array.
   - `ndarray.dtype`: Data type of the array elements.
   - `ndarray.ndim`: Number of array dimensions.

3. **Indexing and Slicing:**
   - `array[index]`: Accesses an element.
   - `array[start:stop:step]`: Slices the array.

4. **Reshaping and Transposing:**
   - `reshape(shape)`: Gives a new shape to an array without changing its data.
   - `transpose()`: Permute the dimensions of an array.

5. **Mathematical Operations:**
   - `np.add()`, `np.subtract()`, `np.multiply()`, `np.divide()`: Basic arithmetic operations.
   - `np.sqrt()`, `np.log()`, `np.exp()`: Square root, natural logarithm, and exponential.
   - `np.sin()`, `np.cos()`, `np.tan()`: Trigonometric functions.

6. **Aggregation Functions:**
   - `np.sum()`, `np.mean()`, `np.median()`: Summation, mean, and median.
   - `np.min()`, `np.max()`: Minimum and maximum.
   - `np.std()`: Standard deviation.

7. **Linear Algebra:**
   - `np.dot()`: Dot product of two arrays.
   - `np.linalg.inv()`: Inverse of a matrix.
   - `np.linalg.eig()`: Eigenvalues and eigenvectors of a matrix.

8. **Random Module:**
   - `np.random.rand()`: Random values in a given shape, drawn uniformly from [0, 1).
   - `np.random.randint()`: Random integers from `low` (inclusive) to `high` (exclusive).

9. **Comparisons:**
   - `np.equal()`, `np.greater()`, `np.less()`: Element-wise comparisons.

10. **Combining/Splitting:**
    - `np.concatenate()`, `np.vstack()`, `np.hstack()`: Combine arrays.
    - `np.split()`, `np.vsplit()`, `np.hsplit()`: Split arrays.

11. **Saving and Loading:**
    - `np.save()`, `np.load()`: Save and load arrays to and from disk.

Remember to import NumPy using `import numpy as np` to access these functions.
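
As a rough sketch of how several of these functions fit together, the example below creates arrays, reshapes one, aggregates along an axis, filters with a comparison, and stacks two arrays. All values are arbitrary.

```python
import numpy as np

# Creating arrays: arange excludes the stop value, linspace includes it.
steps = np.arange(0, 10, 2)
grid = np.linspace(0.0, 1.0, 5)

# Reshaping and transposing.
matrix = np.arange(6).reshape(2, 3)
transposed = matrix.T

# Aggregating down each column (axis=0).
col_sums = matrix.sum(axis=0)

# Element-wise comparison produces a boolean mask for filtering.
mask = np.greater(steps, 4)
selected = steps[mask]

# Stacking two 1D arrays into a 2D array.
stacked = np.vstack([steps, steps * 2])

print(steps, grid)
print(matrix.shape, transposed.shape, col_sums)
print(mask, selected)
print(stacked)
```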


In [49]:
# Some code examples of numpy array uses

# Create a 1D array
my_list = [1,2,3,4,5]
array_1D = np.array(my_list)
print(array_1D)

# Do math operations on the array; This is a very useful property of arrays
print(array_1D * 2)

first_list = [1,2,3,4,5]
second_list = [5,4,3,2,1]
array_2D = np.array([first_list, second_list]) # notice the use of []
print(array_2D)


[1 2 3 4 5]
[ 2  4  6  8 10]
[[1 2 3 4 5]
 [5 4 3 2 1]]


## Code Examples

In [None]:
# Importing necessary libraries
import numpy as np

# Python Data Types in Biochemistry
# ----------------------------------



# Tuples: Immutable sequences, useful for fixed data sets.
# ---------------------------------------------------------
# Defining a tuple of nucleotide bases
nucleotides = ("Adenine", "Thymine", "Cytosine", "Guanine")
print("Nucleotide Bases:", nucleotides)

# Accessing elements in a tuple
print("First Nucleotide Base:", nucleotides[0]) # Notice that Python starts counting from 0.

# Tuples are immutable, so you can't change them after creation
# This is useful for data that shouldn't be modified

# Dictionaries: Key-value pairs, great for mapping relationships.
# ----------------------------------------------------------------
# Creating a dictionary to map enzymes to their functions
enzyme_functions = {
    "Ligase": "Joining of DNA strands",
    "Helicase": "Unwinding DNA helix",
    "Polymerase": "Polymerizing nucleotides",
    "Nuclease": "Cutting DNA strands"
}
print("Enzyme Functions:", enzyme_functions)

# Accessing a function by enzyme name
print("Function of Helicase:", enzyme_functions["Helicase"])

# Adding a new key-value pair
enzyme_functions["Transferase"] = "Transfer of functional groups"
print("Updated Enzyme Functions:", enzyme_functions)

# NumPy Arrays: Efficient arrays for numerical data.
# --------------------------------------------------
# Creating a NumPy array of pH values
ph_values = np.array([7.2, 7.4, 6.8, 7.0, 7.3])
print("pH Values:", ph_values)

# Performing calculations on the entire array
average_ph = np.mean(ph_values)
print("Average pH:", average_ph)

# Conclusion
# ----------
# This code block demonstrates the use of different data types in Python,
# such as lists, tuples, dictionaries, and NumPy arrays. Each data type serves a specific
# purpose and can be used to efficiently store and manipulate data relevant
# to biochemistry applications.

# Python Data Types Exercises

## Exercise 1: List Manipulation
- **Task**: Create a list of five different proteins found in the human body. Then, write a function to add a new protein to the list and print the updated list.
- **Hint**: Use the `append()` method to add items to a list.

## Exercise 2: Tuple Operations
- **Task**: Define a tuple containing the names of four different vitamins. Write a loop to iterate through the tuple and print each vitamin name.
- **Hint**: Remember, tuples are immutable, so you cannot modify them after creation.

## Exercise 3: Dictionary Handling
- **Task**: Create a dictionary mapping three amino acids to their respective molecular weights. Then, write a function that takes an amino acid name as input and returns its molecular weight.
- **Hint**: Use the amino acids 'Alanine', 'Cysteine', and 'Aspartic Acid' with arbitrary weights.

## Exercise 4: NumPy Array Calculations
- **Task**: Create a NumPy array of ten random enzyme activity values. Calculate and print the mean and standard deviation of these values.
- **Hint**: Use `np.random.rand()` to create random values and `np.mean()`, `np.std()` for calculations.

## Exercise 5: Advanced Data Structure Challenge
- **Task**: Combine the concepts of lists, tuples, and dictionaries. Create a dictionary where each key is an enzyme, and its value is a tuple containing the enzyme's pH optimum and a list of substrates. Include 3 enzymes in your data structure.

Good luck!


In [12]:
# Your answers here; Create new cells as needed    

## Solutions

In [None]:
# Import necessary libraries
import numpy as np

# Exercise 1: List Manipulation
def add_protein(proteins, new_protein):
    proteins.append(new_protein)
    return proteins

proteins = ["Hemoglobin", "Insulin", "Keratin", "Collagen", "Myosin"]
new_protein = "Actin"
updated_proteins = add_protein(proteins, new_protein)
print("Updated Protein List:", updated_proteins)

# Exercise 2: Tuple Operations
vitamins = ("Vitamin A", "Vitamin B", "Vitamin C", "Vitamin D")
for vitamin in vitamins:
    print("Vitamin:", vitamin)

# Exercise 3: Dictionary Handling
amino_acid_weights = {
    "Alanine": 89.1,
    "Cysteine": 121.2,
    "Aspartic Acid": 133.1
}

def get_molecular_weight(amino_acid):
    return amino_acid_weights.get(amino_acid, "Unknown")

print("Molecular Weight of Alanine:", get_molecular_weight("Alanine"))

# Exercise 4: NumPy Array Calculations
enzyme_activities = np.random.rand(10)
print("Enzyme Activities:", enzyme_activities)
print("Mean Activity:", np.mean(enzyme_activities))
print("Standard Deviation:", np.std(enzyme_activities))

# Exercise 5: Advanced Data Structure Challenge
enzymes = {
    "Amylase": (6.8, ["Starch", "Glycogen"]),
    "Lipase": (7.0, ["Triglycerides"]),
    "Protease": (6.5, ["Proteins"])
}

for enzyme, (pH, substrates) in enzymes.items():
    print(f"Enzyme: {enzyme}, pH Optimum: {pH}, Substrates: {substrates}")

