
# Bioinformatics in Health Science Course - Python Fundamentals
# School of Medicine, University of Minho, Braga, Portugal

Welcome to our Python Fundamentals practical class! This session will cover several practical exercises to help you with the basics of Python programming. You will learn how to use:

1. Print statements and variables to output information to the console and store data for use in your programs.

2. Basic arithmetic operations to perform mathematical calculations in your code.

3. Conditional statements to control the flow of your program based on certain conditions.

4. Functions to organize and reuse code.

5. Lists and list manipulation to work with collections of data.

6. Dictionaries to store data in a key-value format.

7. For loops to repeat actions in your code. 

8. Importing modules to access pre-existing code and functionality.

Throughout the day, you will have the opportunity to apply your knowledge through hands-on coding exercises to build your skills and confidence as a Python programmer. We're here to help!


# 1. Print statements and variables

📝In Python, "print" is a built-in function that is used to output text or variables to the console. 
- You can use the print function to print strings, numbers, or variables. 
- You can also use print(x) to print the value of a variable x. 
- Variables in Python are "containers" that are used to store data, such as numbers or strings. You can then use the print function to display the contents of a variable.
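
As a minimal sketch of the points above, the cell below stores three different kinds of data in variables and prints them. The names `sample_id`, `read_count`, and `gc_content` and their values are hypothetical, chosen only for illustration.

```python
# Hypothetical example values, not taken from a real dataset
sample_id = "patient_01"   # a string (text)
read_count = 1500          # an integer (whole number)
gc_content = 0.41          # a float (decimal number)

# print() accepts several arguments and separates them with a space
print(sample_id, read_count, gc_content)

# A variable can be reassigned; the name now refers to the new value
read_count = 2000
print(read_count)
```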


In [None]:
genename1 = "DMD"
gene1description = "gene that codes for the protein dystrophin, which is involved in muscle function. The DMD gene is located on the X chromosome, and it is the largest known gene in the human genome, spanning about 2.3 million base pairs."
print(genename1, gene1description)

❓ Create new variables for the gene "BRCA" and use print to write a description of the gene, similar to what was done above for "DMD".

In [None]:
#your code here

- You can also interpolate your variables inside a text to build a more complex output and print the result. To do this, precede the text with f' and place each variable to be printed inside {} (don't forget to close the text by adding ' at the end):

In [None]:
print(f'Our lab studies {genename1}, which is a {gene1description}')

❓ Now try here adding your text for the BRCA gene variables you created above:

In [None]:
#your code here

🎁 Bonus : note that the text you create can also be stored in a variable:

In [None]:
general_gene_desc = f'Our lab studies {genename1}, which is a {gene1description}'
print(general_gene_desc)

# 2. Basic arithmetic operations

📝 In Python, basic arithmetic operations include addition (+), subtraction (-), multiplication (*), division (/), floor division (//), and modulus (%). 
- These operations can be performed on **numbers** (integers and floats) and can be used to perform mathematical calculations in your code. 
- Python supports the use of parentheses to control the order of operations and the use of shorthand operators such as +=, -=, *=, and /= to perform arithmetic operations and assign the result back to a variable in a single step.  

📌 These basic arithmetic operations are the building blocks of any mathematical computation and are used in every area of programming, so a good understanding of them is important for any Python developer or advanced user.
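
The operators described above that are less familiar from everyday arithmetic (floor division, modulus, parentheses, and the shorthand operators) can be sketched as follows. The variables `sequence_length`, `codon_size`, and `total` are hypothetical values chosen only to illustrate each operator.

```python
# Hypothetical values chosen only to illustrate the operators
sequence_length = 100   # bases
codon_size = 3

full_codons = sequence_length // codon_size     # floor division -> 33
leftover_bases = sequence_length % codon_size   # modulus (remainder) -> 1
print(full_codons, leftover_bases)

# Parentheses change the order of operations
print(2 + 3 * 4)     # multiplication happens first -> 14
print((2 + 3) * 4)   # parentheses happen first -> 20

# Shorthand operators calculate and assign in a single step
total = 10
total += 5    # same as total = total + 5 -> 15
total *= 2    # -> 30
total -= 10   # -> 20
total /= 4    # division always produces a float -> 5.0
print(total)
```

Note that `/` returns a float even when the result is a whole number, whereas `//` discards the fractional part.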


In [None]:
DMD = 2.3  # Million base-pairs
BRCA = 0.1 # Million base-pairs
GENOME = 3000 # Million base-pairs

# Addition example: DMD + BRCA
# Subtraction example: DMD - BRCA
# Multiplication example: DMD * GENOME
# Division example: DMD / GENOME

# DMD represents what % of the human genome?
print("The DMD gene represents", DMD / GENOME * 100, "% of the human genome")

❓ What fraction of the human genome does BRCA represent?

In [None]:
#your code here

# 3. Conditional statements

📝 Conditional statements are used to control the flow of a program based on certain conditions. 
- They allow you to check if a certain condition is true or false, and execute different code depending on the result. 
- The most common conditional statements are the "if-elif-else" statements.

📌 Conditional statements are a fundamental concept in programming. They provide a way to control the flow of the program and are widely used in various programming fields.


In [None]:
if DMD > BRCA:
    print(f"DMD gene ({DMD} Mb) is larger than BRCA ({BRCA} Mb)")
elif DMD == BRCA:
    print(f"DMD gene ({DMD} Mb) is equal in size to BRCA ({BRCA} Mb)")
else:
    print(f"DMD gene ({DMD} Mb) is smaller than BRCA ({BRCA} Mb)")

# 4. Functions:

📝Functions are blocks of reusable code that can be called by name. 
- They are used to organize and structure your code, make it more readable, and promote code reusability. 
- In Python, a function is defined using the "def" keyword, followed by the function name and a set of parentheses that may contain parameters. 
- The code block inside the function is indented, and it is executed every time the function is called. Functions can also return a value using the "return" statement. 
- They can be used to encapsulate complex logic, and make it easier to test and debug your code.  

📌Functions are a fundamental concept in Python, and they are widely used in various programming fields.

In [None]:
def calculate_genome_percentage(gene_size):
    percentage = gene_size / GENOME * 100
    return percentage

# Example usage for DMD and BRCA genes

print(f"The DMD gene represents {calculate_genome_percentage(DMD)}% of the human genome")
print(f"The BRCA gene represents {calculate_genome_percentage(BRCA)}% of the human genome")


❓ What percentage of the genome is represented by a hypothetical gene with 90 million base pairs?

In [None]:
#your code here

🎁 Bonus : note that numbers can be rounded to a chosen number of decimal places with the round function, by passing the value and the number of decimals inside the parentheses:

In [None]:
def calculate_genome_percentage(gene_size):
    percentage = gene_size / GENOME * 100
    return round(percentage, 3)

# Example usage for DMD and BRCA genes

print(f"The DMD gene represents {calculate_genome_percentage(DMD)}% of the human genome")
print(f"The BRCA gene represents {calculate_genome_percentage(BRCA)}% of the human genome")

# 5. Lists and list manipulation:

📝In Python, lists are a built-in data structure that allow you to store and organize collections of items. 
- Lists are defined using square brackets and items within the list are separated by commas. Lists can store any type of data, such as numbers, strings, and even other lists.
- List manipulation is the act of modifying or manipulating lists in various ways. This can include adding, removing, or updating items in a list, as well as sorting, reversing or slicing a list. 
- Python provides a variety of built-in methods and functions that can be used to manipulate lists. For example, the **"append()"** method can be used to add an item to the end of a list, the **"remove()"** method can be used to remove an item from a list, the **"sort()"** method can be used to sort the items in a list in ascending or descending order, and the **"slice"** operator can be used to select a specific part of the list.

📌List manipulation is an important concept in Python, as lists are widely used in various programming fields. Understanding how to manipulate lists is crucial for any advanced Python user, as it allows you to organize and manage data in an efficient and flexible way. With this knowledge, you will be able to work with large sets of data and perform complex operations on them.
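
Slicing and descending sorts are mentioned above, so here is a minimal sketch of both. The list `example_genes` is a hypothetical list used only for this illustration ("TP53" is added just to make it longer), and `sorted()` is shown as an assumption about what you may want when the original order must be kept.

```python
# Hypothetical list used only for this illustration
example_genes = ["DMD", "BRCA", "IRF1", "TP53"]

# Slicing: list[start:stop] returns the items from start up to, but not including, stop
print(example_genes[0:2])   # ['DMD', 'BRCA']

# Negative indexes count from the end of the list
print(example_genes[-1])    # TP53

# len() returns the number of items
print(len(example_genes))   # 4

# sorted() returns a new sorted list and leaves the original unchanged
alphabetical = sorted(example_genes)
print(alphabetical)         # ['BRCA', 'DMD', 'IRF1', 'TP53']

# sort() changes the list in place; reverse=True sorts in descending order
example_genes.sort(reverse=True)
print(example_genes)        # ['TP53', 'IRF1', 'DMD', 'BRCA']
```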



In [None]:
# create list
genes = ["DMD", "BRCA", "Hypothetical"]

# access elements
print(genes[0])

# modify elements
genes[2] = "IRF1"
print(genes)

# add elements
genes.append("Hypothetical")
print(genes)

# remove elements
genes.remove("Hypothetical")
print(genes)

# sort elements
genes.sort()
print(genes)

# 6. Dictionaries:

📝In Python, dictionaries are a built-in data structure that allow you to store and organize collections of items in a key-value format. 
- Dictionaries are defined using curly braces {} and items within the dictionary are separated by commas. 
- Each item in a dictionary is made up of a key-value pair, where the key is a unique identifier for the value. Dictionaries can store any type of data, such as numbers, strings, and even other dictionaries.

📌Dictionaries are very useful as they provide a way to store and retrieve data in a very efficient way. They are widely used in various programming fields. Python provides a variety of built-in methods and functions that can be used to manipulate dictionaries. With the knowledge of dictionaries, you will be able to work with large sets of data and perform complex operations on them, making your code more powerful and efficient.
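
Since a dictionary value can itself be a dictionary, several pieces of information can be grouped under one key. The sketch below is a hypothetical example: the name `gene_info` and the keys `size_mbp` and `chromosome` are assumptions made for illustration, with gene sizes taken from this notebook.

```python
# Hypothetical nested dictionary: each gene name maps to another dictionary
gene_info = {
    "DMD": {"size_mbp": 2.3, "chromosome": "X"},
    "IRF1": {"size_mbp": 0.03, "chromosome": "5"},
}

# Access a nested value by chaining the keys
print(gene_info["DMD"]["chromosome"])   # X

# get() returns a default value instead of raising an error when the key is missing
print(gene_info.get("TP53", "not in dictionary"))   # not in dictionary

# The "in" operator checks whether a key exists
print("IRF1" in gene_info)   # True
```

Using `get()` with a default is a convenient way to avoid a `KeyError` when you are not sure that a key is present.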


In [None]:
# create dictionary
genesizes = {
  "DMD": 2.3,
  "BRCA": 0.1,
  "IRF1": 0.03
}
print(genesizes)

# access elements
print(genesizes["IRF1"])
print(genesizes.keys())
print(genesizes.values())
print(genesizes.get("IRF1"))

# modify elements
genesizes["IRF1"] = 0.0333
print(genesizes)

# add elements
genesizes["Hypothetical"] = 99
print(genesizes)

# remove elements
del genesizes["Hypothetical"]
print(genesizes)


# 7. Loops:

📝 Loops are used to execute a block of code repeatedly. The most common type of loop is the **"for"** loop: 
- A **"for"** loop is used to iterate over a sequence of items, such as a list or a range of numbers. The general syntax is "for variable in sequence:", where the variable takes on each value in the sequence, one at a time, and the code block following the "for" statement is executed for each value.
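
The two kinds of sequence named above, a list and a range of numbers, can be sketched as follows. The list `example_genes` and the counter `total_length` are hypothetical names used only for this illustration.

```python
# Hypothetical list used only for this illustration
example_genes = ["DMD", "BRCA", "IRF1"]

# Iterate over a list: "gene" takes each value in turn
for gene in example_genes:
    print(gene)

# Iterate over a range of numbers: range(3) produces 0, 1 and 2
for i in range(3):
    print(i, example_genes[i])

# A loop can also accumulate a result, here the total number of letters
total_length = 0
for gene in example_genes:
    total_length += len(gene)
print(total_length)   # 3 + 4 + 4 = 11
```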

In [None]:
for gene, gene_size in genesizes.items():
    print(f"The {gene} gene represents {calculate_genome_percentage(gene_size)}% of the human genome.")


# 8. Importing modules:

📝 A module is a collection of code that can be imported and used in other code. 
- Importing modules allows you to access pre-existing code and functionality, which can save you time and effort when writing your own code. 
- Python has a large number of built-in modules that can be imported, as well as a vast number of third-party modules that can be installed using package managers such as pip.

To import a module, you use the "import" keyword, followed by the name of the module. Once a module is imported, you can access the functions and variables defined in the module by prefixing them with the name of the module. 
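
As a minimal sketch of the module-name prefix described above, the cell below uses two modules from the Python standard library, `math` and `statistics`, so nothing needs to be installed. The variable names and the alias `stats` are choices made for this illustration; the gene sizes are the ones used earlier in this notebook.

```python
import math

# DMD size from this notebook, expressed in base pairs
gene_size_bp = 2_300_000

# Functions are accessed with the module name as a prefix
print(math.log10(gene_size_bp))
print(math.sqrt(16))   # 4.0

# A module can be given a shorter alias with "as"
import statistics as stats

sizes_mbp = [2.3, 0.1, 0.03]   # DMD, BRCA, IRF1 in million base pairs
print(stats.mean(sizes_mbp))
```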

There are several popular collections of modules (libraries) in bioinformatics that are widely used in the field. Some of the most popular ones are:
- **Biopython**: a collection of modules for bioinformatics, including tools for sequence analysis, structure analysis, and biological data parsing.
- **PyBioMed**: A module that provides various bioinformatics tools, such as protein and DNA sequence analysis, molecular docking, and pharmacological prediction. 
- **DeepMosaic**: A module that provides tools for genomics and imaging analysis, such as mosaic identification, variant calling, and annotation.
- **PyMOL**: A molecular visualization system that can be used to create high-quality images of molecules and visualize protein structures.
- **scikit-learn**: A machine learning module that has been widely used in bioinformatics. It contains a variety of tools for classification, regression, clustering and feature selection.

📌These are just a few examples of the many libraries available for bioinformatics, and new ones are constantly being developed. These provide powerful tools and functionality that can be used to analyze and visualize biological data, making it easier for bioinformaticians to work with large and complex datasets.


In [None]:
#Install the Biopython module collection
!pip install -U Biopython

In [None]:
#From the Bio.Seq module import the Seq class
from Bio.Seq import Seq

nucleotide_sequence = 'GCCATGGCCTGTGCTTGTGCAATCGATGCAGAGGTGGAGCGGATGGAGCTGCATGCA'

# Create a Seq object from the given nucleotide sequence
nucleotide_seq = Seq(nucleotide_sequence)
print(nucleotide_seq)

# Use the translate method to translate the nucleotide sequence into an amino acid sequence
amino_acid_seq = nucleotide_seq.translate()
print(amino_acid_seq)

# Find the secret message. It is in Portuguese...
# Create a list of positions where spaces should be added
space_positions = [1, 5, 10, 11]

# Initialize an empty string to store the modified amino acid sequence
modified_sequence = ""

# Use a for loop to iterate over the amino acid sequence and add spaces at the specified positions
for i, amino_acid in enumerate(amino_acid_seq):
    modified_sequence += amino_acid
    if i+1 in space_positions:
        modified_sequence += " "

# Print the modified amino acid sequence
print(modified_sequence)


# ❓ Final Challenge: Apply what you've learned 
(can be done after the class...)

Now that you have learned about the fundamentals of Python programming, it's time to put your new skills to the test and create your own program!  

Try to write code combining the power of print statements and variables, basic arithmetic operations, conditional statements, functions, for loops, lists and list manipulation, dictionaries and/or importing modules to create a program that can solve one of your real-world problems.  

🏁 The possibilities are endless! So go ahead, be creative, and start coding! 🏁

In [None]:
#your code here

Copyright 2023 Nuno S. Osório

This work is licensed under a Creative Commons Attribution 4.0 International License.

You are free to share and adapt this work, provided you give appropriate credit to the original author and indicate if changes were made.