# University of Guelph MBINF 2024 - Introduction to Python Workshop

Hello Guelph MBINF Program 2024! Today we will introduce Python for bioinformatics. This workshop is designed to gently introduce Python programming concepts used in bioinformatics, then apply them to do some real-world science with biological data. 

---

## Part 1. Introduction to Python
- Why use Python for Bioinformatics?
- Fundamentals of Python programming
    - Data types: strings, integers, floats, booleans
    - Data structures: lists, dictionaries
    - Conditional statements, loops, and comprehensions
    - Functions 
    - Installing and importing modules
---

### Why use Python for Bioinformatics?

<img src="images/python_usage.png" alt = "python usage">


Learning Python for bioinformatics offers numerous advantages for students in the field. There are many reasons why a bioinformatician would benefit from investing in learning Python:

1. **Versatile Language:** Python is a versatile programming language that is widely used in bioinformatics due to its simplicity and readability. It's an excellent choice for beginners and experienced programmers alike. Many bioinformaticians are introduced to programming through a scripting language like R. Python works well for scripting too (e.g., in Jupyter notebooks), but it is also used to build full production applications and APIs.

2. **Rich Libraries and Modules:** Python has a vast ecosystem of libraries and modules specifically designed for bioinformatics tasks. These libraries, such as Biopython and scikit-learn, provide pre-built functions and tools for tasks like sequence analysis, data manipulation, and statistical analysis.

3. **Data Manipulation and Analysis:** Python's data manipulation capabilities make it well-suited for handling large datasets commonly found in bioinformatics. It provides tools for data cleaning, transformation, and statistical analysis, enabling researchers to draw meaningful insights from biological data. Pandas, the most commonly used Python package for tabular data, is often reported to be fast compared with base data frames in R, although relative performance depends on the task and on the libraries being compared.

4. **Sequence Analysis:** Python's string manipulation capabilities are invaluable for sequence analysis tasks, such as DNA, RNA, and protein sequence alignment, motif searching, and mutation analysis.

5. **Visualization:** Python offers a variety of libraries, such as Matplotlib and Seaborn (easy interface for Matplotlib), that allow bioinformaticians to create informative and visually appealing plots and graphs to present their findings effectively.

6. **Machine Learning and AI:** Bioinformatics often involves analyzing complex biological data, and Python's integration with machine learning and artificial intelligence libraries allows students to apply advanced algorithms for tasks like classification, clustering, and prediction. The world of biological data science requires knowledge of ML workflows that are highly developed in Python. 

7. **Scripting and Automation:** Python is well-suited for automating repetitive tasks, which is a common need in bioinformatics. Bioinformaticians can create scripts and workflows to process data, perform analyses, and generate reports, saving time and reducing human error. The ability to launch cloud-based workflows directly from Python can take your bioinformatics to the next level.

8. **Job Opportunities:** Bioinformatics is a rapidly growing field, and proficiency in Python is often a prerequisite for positions in academia, pharmaceuticals, biotechnology, and healthcare. Learning Python can enhance a student's employability and career prospects. There are many flavours of bioinformatics roles, including bioinformatics engineer (akin to a software engineer for biological data), data engineer, and data scientist. For each, industry hiring managers are increasingly looking for solid Python skills.

In summary, learning Python **empowers** bioinformaticians to efficiently analyze biological data, perform complex computations, and develop innovative solutions to challenges in the field. Its user-friendly syntax, extensive libraries, and broad applications make it an essential tool for any aspiring bioinformatician.


**Exercises:** 

<div style="background-color: black; padding: 15px;">

> Exercise 1: Discuss what bioinformatics skills you think will be valuable to learn for your career.

> Exercise 2: Discuss when you would use Python versus other languages (e.g., C, R, JavaScript).

</div>

---

## Fundamentals of Python programming

### Data types: strings, integers, floats, booleans


**Strings**

In Python, a string is a sequence of **characters** enclosed within either single quotes (' ') or double quotes (" "). Strings are used to represent textual data and can contain letters, numbers, symbols, and spaces. Here's how you can define a string in Python:

In Python strings use the type hint `str`.

In [None]:
"this is a string with double quotes"

'this is a string with single quotes'
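Because a string is a sequence of characters, it can be measured, indexed, and searched. The sketch below is only an illustration: the variable `dna` and its value are made up for this example.

```python
# A hypothetical DNA fragment stored as a string
dna = "ATGCGTAC"

length = len(dna)          # number of characters in the string
first_base = dna[0]        # positions are counted from 0
lowercase = dna.lower()    # returns a new string; dna itself is unchanged
gc_count = dna.count("G") + dna.count("C")  # occurrences of G plus occurrences of C

print(length, first_base, lowercase, gc_count)
```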

**Integers**

An integer is a whole number (without decimals). Here's how you can define an integer in Python:

In Python integers use the type hint `int`.

In [None]:
11

1000000000

**Floats**

In Python, a float is a data type used to represent floating-point numbers, that is, real numbers written with a decimal point such as `3.14` or `1000000.0`. Numbers written in scientific notation (for example `1e9`) are also floats, even when their value is a whole number. Here's how you can define a float in Python:

In Python floats use the type hint `float`.

In [None]:
3.14

1000000.0

**Booleans**

A boolean is a data type that represents a binary value, typically indicating either `true` or `false`. Booleans are used in logical operations, conditional statements, and comparisons. The two possible boolean values are `True` and `False`. Here's how you can define a boolean in Python:

In Python booleans use the type hint `bool`.

In [None]:
True

False
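To check which of these four data types a value has, the built-in `type()` function can be used. The following is a minimal sketch; the variable name `is_long` is chosen only for illustration.

```python
# type() reports the data type of any value
print(type("ATGC"))   # <class 'str'>
print(type(11))       # <class 'int'>
print(type(3.14))     # <class 'float'>
print(type(True))     # <class 'bool'>

# Scientific notation produces a float, even for a whole value
print(type(1e9))      # <class 'float'>

# Comparisons evaluate to booleans
is_long = len("ATGC") > 3
print(is_long)        # True
```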

**Variables**

A variable is a name that you can use to refer to a value or an object in memory. Variables allow you to store and manipulate data in your programs. To define a variable in Python, you need to provide a name for the variable and assign a value to it using the assignment operator `=`. Here's the general syntax:

In [None]:
name = "Alice" 

age = 28

grade = 91.5

graduated = False

Variable names in Python must follow these rules:
- They can contain letters, numbers, and underscores.
- They must start with a letter (a-z, A-Z) or an underscore (_).
- They are case-sensitive (e.g., my_variable and My_Variable are different variables).
- They cannot be a reserved keyword (e.g., if, while, for, etc.).
- They should be descriptive and follow a readable naming convention (e.g., user_name instead of u).

**Type Hints**

Python is a **dynamically typed** language. In a dynamically typed language, the **type** of a value (string, boolean, integer, float, etc.) is determined at runtime rather than being explicitly declared when a variable is defined. You can change the type a variable refers to simply by assigning it a value of a different type: the name is rebound to the new value and no declaration is needed. Other languages, such as C, Java, or TypeScript, are **statically typed**, meaning a variable's type is fixed and checked before the program runs.
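As a small sketch of dynamic typing, the same variable name can refer to values of different types over time. The names `sample` and `count_text` are illustrative only. Note that Python does not silently convert between unrelated types, so mixing a string and an integer with `+` raises an error unless you convert explicitly.

```python
sample = "ATGC"           # sample refers to a str
print(type(sample))       # <class 'str'>

sample = 42               # the same name now refers to an int
print(type(sample))       # <class 'int'>

# Mixing types is not converted automatically
try:
    "count: " + 5
    raised_error = False
except TypeError:
    raised_error = True

# An explicit conversion with str() works
count_text = "count: " + str(5)
print(count_text)         # count: 5
```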

However, it is good practice (and a growing requirement in industry) to provide `type hints` when using Python. Type hints can help catch bugs early, especially when combined with a type checker, and they make your code more readable. A type hint is written with a colon (`:`) after the variable name and before the equals sign.

For more information, check out Software Carpentry's page on [Data Types](http://swcarpentry.github.io/python-novice-gapminder/03-types-conversion.html).

Examples of type hints in Python include:

In [None]:
string_variable: str = "this is a string"

number: int = 111
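One point worth keeping in mind is that the Python interpreter does not enforce type hints when the code runs. They are read by people and by separate tools such as a static type checker. The sketch below uses made-up variable names and values to show this.

```python
gene_name: str = "BRCA1"     # hint matches the value
gene_count: int = 20000      # hint matches the value

# Python itself does not enforce hints: this line runs without error,
# although a static type checker would report the mismatch.
read_depth: int = "thirty"

print(type(read_depth))      # <class 'str'>
```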

**Exercises:** 

> Exercise 3: In a new cell below, create variables that store a string, integer, float, and boolean. If you need to create a new cell, select the `+ Code` button in Notebook.

> Exercise 4: Add type hints to the variables provided in the example above and to the variables created in Exercise 3.

> Exercise 5: Use the `print()` function to print your variables to the screen. 

**A note about string formatting**

It is useful to be able to include variables inside strings. There are multiple ways to do this. The most Pythonic method is currently to use `f-strings`, which have the built-in ability to cast other variable types to strings! Use an `f` at the beginning of the string (before the quotation marks), and curly brackets (`{}`) inside the string to insert a variable into a string: 

In [None]:
number_of_cats: int = 17

print(f"Wow, he has {number_of_cats} cats in his dorm room!")
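The curly brackets in an f-string can also hold a format specifier after a colon, or a whole expression. The following sketch uses hypothetical values to show a few common cases.

```python
gene: str = "TP53"
gc_fraction: float = 0.48732
read_count: int = 1250000

print(f"{gene} has a GC fraction of {gc_fraction:.2f}")  # two decimal places
print(f"{gene} has a GC content of {gc_fraction:.1%}")   # shown as a percentage
print(f"Total reads: {read_count:,}")                     # thousands separator
print(f"Double the reads: {read_count * 2}")              # expressions are allowed
```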

### Data structures: lists, dictionaries

**Lists**

A list is a built-in Python data structure that allows you to store and manage a collection of items. Lists are ordered, mutable (meaning you can change their contents after creation), and can contain elements of different data types. Each element in a list is assigned an index, starting from 0 for the first element. Lists are defined using square brackets `[]`. Here's how you can define a list in Python:

Lists use the type hint `list`. You can also type-hint the expected contents of a list with square brackets: `list[str]` or `list[int]`.

In [None]:
my_list = [1, 2, 3, 4, 5]

Items in a list can be accessed by index, using square brackets after the list name. Note that Python uses zero (`0`) as the first index. The same notation lets you modify an element by assigning a new value to its index:

In [None]:
first_element = my_list[0]  # Retrieves the first element (1)
second_element = my_list[1] # Retrieves the second element (2)

my_list[2] = 10  # Changes the third element from 3 to 10

You can subset lists using a range of indices separated by a colon (`:`). The start index is included and the end index is excluded:

In [None]:
subset = my_list[1:4]     # Retrieves elements from index 1 to 3 (inclusive)
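A few more slicing patterns are sketched below, using an illustrative list called `letters`. Leaving out the start or end of a slice means "from the beginning" or "to the end", and negative indices count from the end of the list.

```python
letters: list[str] = ["a", "b", "c", "d", "e"]

middle = letters[1:4]        # index 1 up to, but not including, index 4
first_three = letters[:3]    # from the start up to index 2
last_two = letters[-2:]      # the final two elements
every_other = letters[::2]   # every second element
last_item = letters[-1]      # a single negative index returns one element

print(middle, first_three, last_two, every_other, last_item)
```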

Common list operations include adding (appending) or removing items from a list: 

In [None]:
my_list.append(6)         # Adds 6 to the end of the list
my_list.remove(4)         # Removes the element 4 from the list
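Some other everyday list operations are sketched here with a hypothetical list of tissue names. Note that `remove()` raises a `ValueError` if the value is not in the list, so checking membership with `in` first can be useful.

```python
samples: list[str] = ["liver", "kidney"]

samples.append("heart")       # add to the end
samples.insert(0, "brain")    # add at a specific index
removed = samples.pop()       # remove and return the last element

has_liver = "liver" in samples   # membership test gives a boolean
if "lung" in samples:            # avoids a ValueError from remove()
    samples.remove("lung")

print(samples, removed, has_liver, len(samples))
```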

**Exercises:** 

> Exercise 6: In a new cell below, create a variable storing a list containing at least 4 elements of any type. 

> Exercise 7: Access the 1st element in the list using its index inside square brackets. Is it the item you expected? If not, why?

---

**Dictionaries**

In Python, a dictionary (`dict`) is a built-in data structure that allows you to store and manage collections of **key-value pairs.** Each key in a dictionary is unique and maps to a specific value. Dictionaries are defined using curly braces `{}` containing comma-separated key-value pairs, with each key separated from its value by a colon (`:`).

Dictionaries offer fast and efficient look-up times based on their keys, making them suitable for scenarios where you need to quickly access and manipulate data based on unique identifiers (keys). 

Here's how you can define a dictionary in Python:

In [None]:
my_dict = {"name": "Alice", "age": 30, "is_student": False}

You can access values in a dictionary using their keys:

In [None]:
name = my_dict["name"]       # Retrieves the value associated with the key "name"
age = my_dict["age"]         # Retrieves the value associated with the key "age"

You can also modify values and add new key-value pairs to a dictionary:

In [None]:
my_dict["age"] = 31         # Changes the value associated with the key "age" to 31
my_dict["city"] = "New York"  # Adds a new key-value pair "city": "New York" to the dictionary
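Looking up a key that does not exist with square brackets raises a `KeyError`. The sketch below, which uses a small illustrative codon table, shows the `get()` method for supplying a default value, the `in` operator for checking keys, and `items()` for looping over key-value pairs.

```python
codon_table: dict[str, str] = {"ATG": "Met", "TGG": "Trp", "TAA": "Stop"}

amino_acid = codon_table["ATG"]               # direct look-up by key
missing = codon_table.get("CCC", "unknown")   # default value instead of a KeyError
has_stop = "TAA" in codon_table               # checks the keys

# items() yields each key together with its value
for codon, residue in codon_table.items():
    print(f"{codon} -> {residue}")
```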

Dictionaries use the type hint `dict`. You can specify the types of the dictionary keys and values separately using square bracket notation: `dict[str, int]`. It is common to use string keys with any type of value. To type hint this you can use:

In [None]:
from typing import Any

dict[str, Any]

(Imports will be covered later in the workshop)

**Exercises:** 

> Exercise 8: Building a Basic Contact List

Imagine you're building a simple contact list program using dictionaries. Each contact will be a dictionary with a name as the key and a phone number as the value. In a new code cell below, you will perform various operations on this contact list.

1. Initialize an empty list called `contact_list`.

2. Add at least three contacts as dictionaries to `contact_list`. Each contact should have a name as the key and a phone number as the value.

3. Print the entire `contact_list`.

Here's a starting point for the exercise:


In [None]:
contact_list: list[dict] = []


### Conditional statements, loops, and comprehensions


**Conditional Statements**

Conditional statements are an essential part of programming that allow you to make decisions and control the flow of your code based on certain conditions. In Python, you can use the `if`, `elif` (short for "else if"), and `else` statements to create conditional logic.

When writing a conditional statement, the structure is important! After you write the condition to evaluate, it must be followed by a colon (`:`). The block of code that runs when the condition is `True` must be indented on the following lines.

Here's an introduction to conditional statements in Python:

**The if statement:**

The if statement is used to execute a block of code if a certain condition is true. If the condition is `False`, the block of code is skipped.

In [None]:
x: int = 10

if x > 5:
    print("x is greater than 5")

**The else statement:**

The else statement is used to define a block of code that will be executed if the condition of the preceding if statement is `False`.

In [None]:
x: int = 3

if x > 5:
    print("x is greater than 5")
else:
    print("x is not greater than 5")

**The elif statement:**

The elif statement allows you to check additional conditions after the initial if statement. It's used when you want to check multiple possibilities.

In [None]:
x: int = 7

if x > 10:
    print("x is greater than 10")
elif x > 5:
    print("x is greater than 5 but not greater than 10")
else:
    print("x is not greater than 5")


There are many ways to use flow control in your code -- anything that can evaluate to `True` or `False` can be used in a conditional statement!

Here is a nonsense example: 

In [None]:
name: str = "Jeff" 

if name == "David":
    print("Hello David!")

elif name in "Jeffery":
    print(f"What's up {name}!")

elif (len(name) > 6):
    print(f"Your name is long {name}!")

else:
    print(f"Yoooooooo {name}!")
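To unpack what can count as a condition, here is a short sketch with made-up variable names. It shows that `in` on two strings is a substring test (which is why `"Jeff" in "Jeffery"` is `True`), that empty containers are treated as `False`, and that conditions can be combined with `and`, `or`, and `not`.

```python
# Comparison and membership operators return booleans
print("Jeff" in "Jeffery")   # True: "Jeff" is a substring of "Jeffery"
print(len("Jeff") > 6)       # False

# Empty lists, empty strings, 0, and None are all treated as False
reads: list[str] = []
if reads:
    status = "reads found"
else:
    status = "no reads"

# Conditions can be combined with and, or, and not
quality: int = 35
length: int = 150
passes_filter: bool = quality >= 30 and length >= 100

print(status, passes_filter)
```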

**Exercises:** 

> Exercise 9: Assign a random number **between 1 and 100** to a variable. Evaluate if it is greater than, less than, or equal to 50. How many conditional statements do you need? 

Hint: a random number (in this example, between 1 and 10) can be generated using: 

In [None]:
import random

random.randint(1, 10)  # Generates a random integer between 1 and 10

(did you type hint your variable assignment?)

**Loops**

In Python, loops are used to execute a block of code repeatedly. There are two main types of loops: `for` loops and `while` loops. We will mainly use `for` loops in this workshop.

**For Loops:**

`for` loops are used to iterate over an **iterable** (such as a list, tuple, or string) and perform a certain action for each element in it.

Example:

In [None]:
fruits: list[str] = ["apple", "banana", "cherry"]
for fruit in fruits:
    print(fruit)
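Strings are iterable too, so a `for` loop can walk through a sequence one character at a time. The sketch below uses a made-up sequence to count G and C bases, then uses the built-in `enumerate()` function to get each position alongside its character.

```python
dna: str = "ATGCGC"

# Count G and C bases by visiting each character in turn
gc_count: int = 0
for base in dna:
    if base in "GC":
        gc_count += 1

print(gc_count)  # 4

# enumerate() yields (index, element) pairs
for position, base in enumerate(dna):
    print(position, base)
```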

**While Loops:**

While loops are used to repeatedly execute a block of code as long as a specified condition evaluates to `True`. 

Example:

In [None]:
count = 0
while count < 5:
    print("Count:", count)
    count += 1 

**Exercises:** 

> Exercise 10: In a cell below, write a program that calculates the sum of all numbers from 1 to 10 using a for loop.

Hint: `range(1, 11)` produces the integers from 1 to 10 in order (the end value, 11, is not included).

**Comprehensions**

Comprehensions are one of the most useful tools in Python for keeping your code compact and clear. 

List comprehension is a concise and powerful way to create lists in Python. It allows you to create a new list by applying an expression to each item in an existing iterable (like a list, tuple, or range) while also filtering elements based on a condition. List comprehensions are a more compact alternative to using traditional for loops to generate lists.

Here's a basic introduction to list comprehensions:

Syntax:

```python
new_list = [expression for item in iterable if condition]
```

Where:
- `expression`: The operation you want to perform on each item.
- `item`: Represents each element in the iterable.
- `iterable`: The collection you want to iterate over.
- `condition` (optional): A condition that filters which elements are included in the new list.

Example:

In [None]:
numbers = [1, 2, 3, 4, 5]
even_numbers = [x for x in numbers if x % 2 == 0]
divided_even_numbers = [x/2 for x in even_numbers]
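To see what a list comprehension replaces, the sketch below builds the same list of even numbers with a traditional `for` loop. It also shows a dictionary comprehension, which follows the same pattern with curly braces and a `key: value` expression. The names `even_numbers_loop` and `lengths` are illustrative only.

```python
numbers = [1, 2, 3, 4, 5]

# Traditional loop: create an empty list, then append matching items
even_numbers_loop = []
for x in numbers:
    if x % 2 == 0:
        even_numbers_loop.append(x)

# Equivalent list comprehension
even_numbers = [x for x in numbers if x % 2 == 0]

# Dictionary comprehension: map each word to its length
lengths = {word: len(word) for word in ["apple", "banana", "cherry"]}

print(even_numbers_loop, even_numbers, lengths)
```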

**Exercises:** 

> Exercise 11: Generating Even Squares

In a cell below:
1. Create a list of numbers from 1 to 10 using the range() function.

2. Use a list comprehension to generate a new list containing the squares of even numbers from the list you created in step 1.

3. Print the original list and the new list of squared even numbers.

Here's a starting point for the exercise:

In [None]:
# Step 1: Create a list of numbers from 1 to 10 using range()
numbers = list(range(1, 11))

# Step 2: Use a list comprehension to generate squared even numbers

# Step 3: Print the original list and the new list of squared even numbers

### Functions

In Python, a function is a reusable block of code that performs a specific task. Functions are a fundamental concept in programming because they allow you to break down your code into smaller, manageable pieces, which makes your code more organized, modular, and easier to maintain.

Here's an introduction to functions in Python:

Function Definition:
In Python, you can define a function using the following parts:
- the `def` keyword followed by the function name 
- parameters (if any) inside parentheses
- a colon (`:`) marking the end of the function definition line
- an indented block of code that makes up the function's body
- an optional `return` statement to send a value from the function back to the code that called it

Return Statement:

Functions can use the return statement to send a value back to the caller. This value can then be assigned to a variable or used in further calculations.

Function Parameters:

Functions can take parameters, which are placeholders for values that you pass when you call the function. Parameters allow you to customize the behavior of the function for different inputs.

The syntax is as follows:

```python
def function_name(parameters):
    # Function body
    # Code to perform the desired task
    return result  # Optional
```

Example:

Here's a simple example of a function that calculates the square of a number:

In [None]:
def square(x):
    return x ** 2

**Calling a Function:**

To use a function, you "call" it by using its name followed by parentheses. If the function takes parameters, you provide the values for those parameters within the parentheses. The result of the function can be assigned to a variable or used directly.

In [None]:
result = square(5) # Calling the 'square' function with parameter 5
print(result) # Prints 25

Advanced function definitions additionally include: 
- type hints for the parameters defined using a colon (`:`),
- default values defined using an equals sign (`=`),
- type hints for return values defined using an arrow (`->`),
- a `docstring` between triple quotation marks (`"""`). Docstrings allow you to document your code. 

An example is provided below: 

In [None]:
def greet(name: str, greeting: str = "Hello") -> str:
    """
    Generates a personalized greeting.

    Parameters:
    name (str): The name of the person to greet.
    greeting (str): The greeting message (default is "Hello").

    Returns:
    str: The personalized greeting message.
    """
    return f"{greeting}, {name}!"

# Calling the function with both parameters provided
result1 = greet("Alice", "Hi")

# Calling the function with only the 'name' parameter provided, using the default 'greeting'
result2 = greet("Bob")

print(result1)  # Output: "Hi, Alice!"
print(result2)  # Output: "Hello, Bob!"
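Arguments can also be passed by name (keyword arguments), which makes calls easier to read when a function has several parameters. The function below is a hypothetical example written for this illustration, not part of any library.

```python
def gc_content(sequence: str, as_percent: bool = False) -> float:
    """
    Calculates the proportion of G and C bases in a sequence.

    Parameters:
    sequence (str): The DNA sequence to analyze.
    as_percent (bool): Return a percentage instead of a fraction (default is False).

    Returns:
    float: The GC content, or 0.0 for an empty sequence.
    """
    if not sequence:
        return 0.0
    upper = sequence.upper()
    fraction = (upper.count("G") + upper.count("C")) / len(sequence)
    return fraction * 100 if as_percent else fraction

# Positional argument, using the default for 'as_percent'
print(gc_content("ATGC"))                            # 0.5

# Keyword arguments make the meaning of each value explicit
print(gc_content(sequence="atgc", as_percent=True))  # 50.0
```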

**Exercises:**

Exercise 12: In a Code cell below, write two functions:
1. a simple function that calculates the area of a rectangle 
2. another function that calculates the area of a circle. 

Then, we'll call these functions with different inputs.

Hint: how would you type-hint these functions? 

### Installing and importing modules

In Python, a module is a file containing Python definitions and statements. Modules allow you to organize your code into separate files, making it easier to manage and reuse. Python comes with a rich standard library that includes many modules for various tasks, such as working with files, handling dates and times, and performing mathematical operations. Additionally, you can install and use third-party modules to extend Python's functionality.

Here's an introduction to installing and importing modules in Python:

Installing Modules:

**Standard Library Modules:** Python's standard library modules come pre-installed with Python. You don't need to install them separately; you can directly import and use them in your code.

**Third-Party Modules:** Third-party modules are not part of the standard library and need to be installed before you can use them. You can use a package manager like `pip` to install third-party modules. For example, to install the requests module, you can run the following command in your terminal:

<div style="background-color: rgb(50, 50, 50); padding: 15px;">

```
pip install requests
```
</div>

**Importing Modules:**

After installing a module, you need to import it into your Python script to use its functionality. You can import the entire module or specific items (such as functions, classes, or constants) from the module.

Importing the Whole Module:
To import the entire module, you use the import keyword followed by the module's name:

<div style="background-color: rgb(50, 50, 50); padding: 15px;">

```python
import module_name
```
</div>

Example:

<div style="background-color: rgb(50, 50, 50); padding: 15px;">

```python
import math
```
</div>

Importing Specific Items:
To import specific items from a module, you can use the from keyword followed by the module's name and the items you want to import:

<div style="background-color: rgb(50, 50, 50); padding: 15px;">

```python
from module_name import item1, item2, ...
```
</div>

Example:

<div style="background-color: rgb(50, 50, 50); padding: 15px;">

```python
from math import pi, sqrt
```
</div>

Using Imported Modules:

Once you've imported a module or its specific items, you can use their functions, classes, and constants in your code.

Example:

<div style="background-color: rgb(50, 50, 50); padding: 15px;">

```python
import math

radius = 5
circumference = 2 * math.pi * radius
```
</div>
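The two import styles can be compared side by side, as in the sketch below. It also uses the `as` keyword, which is not covered above, to give a module a shorter alias. Both `math` and `statistics` are standard library modules; the variable names and numbers are made up.

```python
import math
import statistics as stats     # 'as' gives the module a shorter alias
from math import pi, sqrt      # import specific items by name

# Items imported with 'from' are used without the module prefix
hypotenuse = sqrt(3 ** 2 + 4 ** 2)

# Items from a whole-module import need the module (or alias) prefix
mean_depth = stats.mean([10, 20, 30])

# Both import styles refer to the same object
same_pi = math.pi == pi

print(hypotenuse, mean_depth, same_pi)
```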


Note: When using third-party modules, make sure they are installed in the same environment where your Python script runs.

In summary, modules are an essential part of Python that allow you to organize and extend your code's functionality. By installing and importing modules, you can easily leverage existing code and build more powerful and efficient programs.


**Exercises:**

<div style="background-color: black; padding: 15px;">

Exercise 13: In a Code cell below, install the `requests` module using `pip` -- you may have to add an exclamation mark at the front (`!pip install ...`) for it to run within a notebook.

Exercise 14: In a separate Code cell, use the `requests.get()` function to request the website `https://www.uoguelph.ca/` -- assign the result to a variable called `response`. What is in the `response` variable?
</div>


A note on **"coding style"** from [Software Carpentry](http://swcarpentry.github.io/python-novice-gapminder/18-style.html):

> A consistent coding style helps others (including our future selves) read and understand code more easily. Code is read much more often than it is written, and as the Zen of Python states, “Readability counts”. Python proposed a standard style through one of its first Python Enhancement Proposals (PEP), PEP8.

- Document your code and ensure that assumptions, internal algorithms, expected inputs, expected outputs, etc., are clear
- Use clear, semantically meaningful variable names
- Use white-space (spaces), not tabs, to indent lines (tabs can cause problems across different text editors, operating systems, and version control systems)

BUT don't stress too much about coding style. Instead, let automated tools (formatters, linters, and type checkers) do the work!
- [Black](https://github.com/ambv/black) - Formats Python code without compromise
- [Isort](https://github.com/timothycrosley/isort) - Formats imports by sorting alphabetically and separating into sections
- [MyPy](http://mypy-lang.org/) - Checks for optionally-enforced static types

