### MEDC0106: Bioinformatics in Applied Biomedical Science

<p align="center">
  <img src="../../resources/static/Banner.png" alt="MEDC0106 Banner" width="90%"/>
  <br>
</p>

---------------------------------------------------------------

# 01 - Introduction to Python

*Written by:* Oliver Scott

**This notebook provides a general introduction to Python.**

Do not be afraid to make changes to the code cells to explore how things work!

### What is Python?

**Python** is a popular general-purpose, high-level programming language. It is particularly widely used in the scientific community because of its readability.

It is commonly used for:

- web development (server-side),
- software development,
- system scripting,
- science,
- ...and much more!


### What is Jupyter?

**Jupyter** is an open-source web application that allows the creation and sharing of documents that contain live code, equations, visualizations and explanatory text.

Uses include:

- data exploration and visualisation
- numerical simulation
- statistical modeling
- machine learning
- ...and much more!

-----

## Contents

1. [Writing code](#Writing-code)
2. [Comments](#Comments)
3. [Variables and data types](#Variables-and-data-types)
4. [Operators](#Operators)
5. [Control flow](#Control-flow)
6. [Reading files](#Reading-files)
7. [Discussion](#Discussion)

-----

### Extra resources:

This introduction to Python is by no means comprehensive. Below are some links to external resources for learning Python if you are interested.

- [Real Python](https://realpython.com/) - Free Python tutorials
- [Codecademy](https://www.codecademy.com/learn/learn-python-3) - Python lessons
- [Cheat-Sheets](https://ehmatthes.github.io/pcc_2e/cheat_sheets/cheat_sheets/) - Python reference sheets

-----

## Writing code

We can write sections of code in blocks called code cells.

Try running the cell below (click the run/play button in the toolbar).

Make sure the cell is selected first by clicking on it.

*Keyboard shortcuts:*

- Windows: `ctrl` + `enter`
- MacOS: `⌘` + `enter`
- Binder: `ctrl` + `enter`

In [None]:
print("This is a code cell!")

## Comments

To enhance readability of code, programmers use comments. These have no effect on the running of the program, but are important to make your code understandable.

```python
# We can use the hash symbol to define a comment
```

Comments will be used in the code blocks below to help you understand what is going on.

In [None]:
# This is a comment
# Notice that this cell outputs nothing when run

Comments make it clear what sections of code are doing:

In [None]:
# This code line will print the sum of two numbers
print(10 + 12)

## Variables and data types

Variables are one of the most important components of a programming language. A variable stores information under a name, which gives us a shorthand way to refer to data that may be very large.

We can assign data to variables using the `=` symbol:

```python
variable = data
```

Once we have assigned data to a variable we can then access that data using the variable name.

When naming a variable it is best practice to give it a name which describes the data it holds. We use underscores `_` to make variable names easier to read (blank spaces cannot be used) e.g.:

```python
my_long_variable_name
```

Also note that Python has reserved keywords that **cannot** be used as variable names, because they have special meanings in the language (trying to do so gives a `SyntaxError`). In this editor (Jupyter), keywords are automatically highlighted, so you can spot them as you type. Alternatively, [here](https://www.w3schools.com/python/python_ref_keywords.asp) is a list of reserved keywords.
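
If you are unsure whether a word is reserved, the standard-library `keyword` module can check it for you. A minimal sketch (`my_string` is just an example name):

```python
import keyword

# keyword.kwlist holds every reserved keyword for the Python version you are running
print(keyword.kwlist)

# keyword.iskeyword() tells us whether a word is reserved
print(keyword.iskeyword('for'))        # True: `for` is a keyword
print(keyword.iskeyword('my_string'))  # False: safe to use as a variable name

# Uncommenting the line below would give a SyntaxError
# for = 5
```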

*Note: Python is a dynamically typed language, so the type of a variable does not need to be declared. If you are familiar with statically typed languages (e.g. C, C++, Java) this may seem a little unusual.*
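
To see dynamic typing in action, the sketch below rebinds one illustrative name, `value`, to data of two different types. The type belongs to the data, not to the name, so Python accepts both assignments.

```python
value = 42            # `value` is an example name; it currently refers to an int
print(type(value))    # <class 'int'>

value = 'forty-two'   # the same name now refers to a str
print(type(value))    # <class 'str'>
```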

In [None]:
# In this line we assign the text "Hello, World!" to the variable `my_string`
my_string = "Hello, World!"

# We can print the contents of the variable using the print function
print(my_string)

# We can also create an alias: a second name that refers to the same data
my_string_alias = my_string
print(my_string_alias)

# Several names can also be assigned the same data in a single line
my_string = my_string_alias = "Hello, World!"

**Python** has several built-in data types/objects which are useful to a programmer.

In this session we will look at:

- strings 
- numerical types
- booleans
- collections

*Note that this is only an introduction. There is much more you can do with these data types!*

Check out these [Cheat-Sheets](https://ehmatthes.github.io/pcc_2e/cheat_sheets/cheat_sheets/), for easy-to-use reference.

### Strings

Strings in Python hold text data (`str`).

Strings can be enclosed with either single or double quotation marks (`''` or `""`). Multi-line strings can be defined using triple quotes. Sometimes multi-line strings can be useful for making long comments or documenting code.

In [None]:
# Double quote syntax
string_one = "This is a Python string"
print(string_one)

# Single quote syntax
string_two = 'This is also a string'
print(string_two)

# Triple quote syntax
string_three = """This is also a string,
but can span multiple lines"""
print(string_three)

### Numerical or scalar types

**Python** has three built-in numerical types:

**Integer** `int`:

Int, or integer, is a whole number, positive or negative, without decimals, of unlimited length.
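
In practice "unlimited length" means limited only by your computer's memory. A quick way to see this is to compute a number far larger than a calculator could show exactly, such as 2 to the power 100 (`**` is the exponentiation operator, covered in the Operators section):

```python
big_number = 2 ** 100    # 2 raised to the power 100, stored exactly
print(big_number)        # 1267650600228229401496703205376
print(type(big_number))  # <class 'int'>
```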

**Float** `float`:

Float, or "floating point number" is a number, positive or negative, containing one or more decimals.

**Complex** `complex`:

Complex numbers are written with a `j` suffix marking the imaginary part:

In [None]:
x = 42    # Integer (int)
y = 42.0  # Float (float)
z = 2j    # Complex (complex)

# We can also print multiple things at the same time
print(x, y, z)

# To verify the types of the above data we can use the `type` function 
print(type(x))
print(type(y))
print(type(z))

Floats can also be written in scientific notation, with an "e" to indicate the power of 10.

In [None]:
big_float = 1e9
small_float = -82.7e10  # Negative number

print(big_float)
print(small_float)

### Type conversion

You can convert from one type to another using:

```python
int()
float()
complex()
```

This conversion is called *casting*.

*Casting is useful for more than just numerical types. We will see examples of this later.*

In [None]:
x = 1    # int
y = 2.8  # float

# Convert from int to float
a = float(x)

# Convert from float to int
b = int(y)

print(x, "->", a)
print(y, "->", b)

# We can verify that the type has changed
print(type(a))
print(type(b))

*Note: complex numbers cannot be converted into other numerical types (although float and int can be converted to complex)*
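
The cell above shows that `int()` drops the decimal part of `2.8` rather than rounding it. The sketch below makes this explicit and also checks the note about complex numbers. The `try`/`except` lines (not covered in this notebook) simply catch the error so the example keeps running; `conversion_failed` is an illustrative name.

```python
print(int(2.8))    # 2  (the decimal part is discarded, not rounded)
print(int(-2.8))   # -2 (truncation is towards zero)
print(round(2.8))  # 3  (use round() to get the nearest whole number)

print(complex(3))    # (3+0j): int -> complex works
print(complex(2.5))  # (2.5+0j): float -> complex works

try:
    int(2j)  # complex -> int is not allowed
    conversion_failed = False
except TypeError as error:
    conversion_failed = True
    print('TypeError:', error)
```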

### Booleans

*Booleans* `bool` represent one of two values: `True` or `False`.

In programming you often need to know if an expression is `True` or `False`. 

When you compare two values, the expression is evaluated and Python returns the **Boolean** answer.

*We will learn more about comparisons in the [Operators](#Operators) section. Booleans are also essential for [control flow](#Control-flow)*.

In [None]:
# We will take a closer look at these operators in a later section

print(10 > 9)   # Greater than `>`
print(10 == 9)  # Equal to `==`
print(10 < 9)   # Less than `<`

# We can also assign booleans to variables
true = True
false = False

# Verify that the type is indeed boolean `bool`
print(type(true))
print(type(false))

We can also convert booleans to and from other types.

Try to **guess the output** of the cell below before running it.

*Hint: think binary!*

In [None]:
true_int = int(True)    # Convert the boolean `True` to an integer
false_int = int(False)  # Convert the boolean `False` to an integer

# What will these lines print?
print(true_int)
print(false_int)
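
Conversion also works in the other direction with `bool()`. As a rule of thumb for the built-in types, zero and empty values convert to `False` and everything else converts to `True`:

```python
print(bool(0))       # False
print(bool(42))      # True
print(bool(0.0))     # False
print(bool(''))      # False: an empty string
print(bool('text'))  # True: a non-empty string
print(bool([]))      # False: an empty list (lists are covered below)
```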

### Collections

There are four core collection data types in Python:

- **Lists** (`list`) are **ordered** and **changeable**. Duplicate members are allowed.
- **Tuples** (`tuple`) are **ordered** and **unchangeable**. Duplicate members are allowed.
- **Sets** (`set`) are **unordered** and **unindexed**. Duplicate members are not allowed.
- **Dictionaries** (`dict`) are **changeable** and **indexed** by keys. Duplicate keys are not allowed. Since Python 3.7, dictionaries keep their items in the order they were inserted.

When choosing a collection type, it is useful to understand the properties of that type.

*Sometimes you will see the terms changeable and unchangeable written as mutable and immutable, respectively.*
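
A short side-by-side sketch, built from the same illustrative DNA-like data, shows these properties in practice (each type is explained in detail below):

```python
bases = ['A', 'T', 'G', 'A']  # illustrative example data containing a duplicate 'A'

bases_list = list(bases)    # ordered and changeable; keeps both 'A's
bases_tuple = tuple(bases)  # ordered but unchangeable; keeps both 'A's
bases_set = set(bases)      # duplicates removed; print order is not guaranteed

# Dictionary keys must be unique, but values may repeat
base_counts = {'A': 2, 'T': 1, 'G': 1}

print(bases_list)   # ['A', 'T', 'G', 'A']
print(bases_tuple)  # ('A', 'T', 'G', 'A')
print(bases_set)    # 'A', 'T' and 'G' in some order
print(base_counts)  # {'A': 2, 'T': 1, 'G': 1}
```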

----

### Lists
A `list` is **ordered** and **changeable**.

In Python, lists are written with square brackets:

```python
my_list = ['hello', 'John', 22]
```

A list can contain any type of Python object.

In [None]:
# Define and print a list containing colours
colour_list = ['red', 'green', 'blue', 'yellow', 'white', 'black']
print(colour_list)

We can use **indexes** to access items in a `list`.

In Python, **indexes start at 0**, which refers to the **first** item in the list.

We can also use **negative indexes** to access items from the end of the list. For example, -1 refers to the last item in the list and -2 refers to the second-to-last item.
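
One way to think about negative indexes: Python adds the length of the list to them, so `-1` becomes the last valid index. The sketch below checks this with an illustrative list, `letters`; `len()` returns the number of items and is introduced a little further on.

```python
letters = ['a', 'b', 'c', 'd']  # illustrative example list

# A negative index -k points at the same item as len(letters) - k
print(letters[-1], letters[len(letters) - 1])  # d d
print(letters[-2], letters[len(letters) - 2])  # c c

# An index outside the list raises an IndexError, e.g.:
# letters[4]
```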

Take a look at the reference image below:

![PythonIndexing](https://railsware.com/blog/wp-content/uploads/2018/10/positive-indexes.png)

[Image source](https://railsware.com/blog/python-for-machine-learning-indexing-and-slicing-for-lists-tuples-strings-and-other-sequential-types/)

To use indexes we use a square bracket syntax:

```python
my_list[0]
```

The above example will access the first item in the list.

*Try to guess the output of the cell below before running!*

In [None]:
print(colour_list[0])
print(colour_list[1])
print(colour_list[3])
print(colour_list[-1])
print(colour_list[-3])

# We can also change a value using indexing (the index must already exist!)
colour_list[4] = 'purple'

# Let's verify this change
print(colour_list)

# What do you think this will output? Hint: think ranges!
print(colour_list[0:3])

*The [n:m] syntax generates a slice of the data, where n and m are numerical indexes forming a ranged query.*

#### Slicing 

The last example with the colon (`[n:m]`) is called slicing. Slicing uses the following syntax:

```
[lower-bound : upper-bound : step-size] 
```

...and can be used like so:

```python
a[start:stop]      # items start through stop - 1
a[start:]          # items start through the rest of the collection
a[:stop]           # items from the beginning through stop - 1
a[:]               # a copy of the whole collection
a[start:stop:step] # start through not past stop, by step
```

The key point to remember is that the `stop` value represents the **first value that is not in the selected slice**. So, the difference between `stop` and `start` is the number of elements selected (if `step` is 1, i.e. the default).

*`step` is the size of the increment!* 
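
The sketch below checks the rule that `stop - start` gives the number of selected items (with the default `step` of 1), using an illustrative list of numbers. It also shows that, unlike indexing, a slice whose `stop` lies past the end of the list does not raise an error.

```python
numbers = [10, 20, 30, 40, 50, 60]  # illustrative example list

selection = numbers[2:5]        # indexes 2, 3 and 4 (index 5 is excluded)
print(selection)                # [30, 40, 50]
print(len(selection) == 5 - 2)  # True: stop - start items were selected

print(numbers[1:6:2])   # [20, 40, 60]: every second item, starting at index 1
print(numbers[4:100])   # [50, 60]: a stop past the end is simply clipped
```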

`start` or `stop` may also be a negative number, which means it counts from the end of the collection instead of the beginning:

```python
a[-1]    # last item in the collection
a[-2:]   # last two items in the collection
a[:-2]   # everything except the last two items
```

Let's see how this works:

In [None]:
# Simple slicing
print('colour_list     ', colour_list)
print('colour_list[1:3]', colour_list[1:3])
print('colour_list[3:] ', colour_list[3:])
print('colour_list[:3] ', colour_list[:3])
print('colour_list[:]  ', colour_list[:])

# Slicing with negative indexes
print('\ncolour_list[-2:]  ', colour_list[-2:])
print('colour_list[:-1]  ', colour_list[:-1])

# Slicing with steps
print('\ncolour_list[0:-1:2]  ', colour_list[0:-1:2])

**Strings can also be sliced!**

In [None]:
string = 'Hello, World!'

print(string[:5])
print(string[7:])

new_string = 'Bonjour,' + string[6:] 
print(new_string)

#### Finding the number of items in a list

Finding the number of items in a list is as easy as using the built-in `len()` function.

```python
n_items = len(my_list) 
```

The `len()` function can also be used with many other collection types and with strings.
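
For example, with a few illustrative values (tuples, sets and dictionaries are introduced in the sections that follow):

```python
dna = 'ATGC'                        # a string
coords = (3.01, 1.23, -1.22)        # a tuple
fruits = {'apple', 'banana'}        # a set
user = {'name': 'John', 'age': 22}  # a dictionary

print(len(dna))     # 4: characters in the string
print(len(coords))  # 3: items in the tuple
print(len(fruits))  # 2: items in the set
print(len(user))    # 2: key-value pairs in the dictionary
```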

In [None]:
n_colors = len(colour_list)
print('Number of colours:', n_colors)

#### Adding items to a list

There are numerous ways to add and remove items from a list. For now we will only mention one of these: the `append()` method.

```python
my_list.append('item to append')
```

This method adds an item to the end of a list.
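
Note that `append()` changes the list in place and returns `None`, so its result should not be assigned back to the list. A minimal sketch with an illustrative list of sample IDs:

```python
samples = ['S1', 'S2']  # illustrative example list

samples.append('S3')    # modifies `samples` directly
print(samples)          # ['S1', 'S2', 'S3']

result = samples.append('S4')
print(result)           # None: append() does not return the list
print(samples)          # ['S1', 'S2', 'S3', 'S4']
```

Writing `samples = samples.append('S4')` would therefore replace the list with `None`.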

To see other methods check out the [Cheat-Sheet](https://github.com/ehmatthes/pcc_2e/releases/download/v1.0.1/beginners_python_cheat_sheet_pcc_lists.pdf) later.

In [None]:
# Add an item to the list
colour_list.append('pink')

# Let's verify that we added 'pink' to the end of the list.
print(colour_list)

### Tuples

A `tuple` is **ordered** and **unchangeable**.

In Python, tuples are written with round brackets:

```python
my_tuple = ('John', 'Doe', 22)
```

- Like lists, tuples can contain strings and any other type of Python object.
- Like lists, we can also use indexing to access items in the tuple.
- Unlike lists, we cannot change, add, or remove items in a tuple.
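
If you do need a modified version of a tuple, one option is to cast it to a list, change the list, and cast it back. This creates a new tuple rather than changing the original. A sketch reusing the example tuple above (`as_list` and `new_tuple` are illustrative names):

```python
my_tuple = ('John', 'Doe', 22)

as_list = list(my_tuple)    # cast tuple -> list (a new, changeable object)
as_list[2] = 23             # change the age in the list
new_tuple = tuple(as_list)  # cast list -> tuple

print(my_tuple)   # ('John', 'Doe', 22): the original tuple is unchanged
print(new_tuple)  # ('John', 'Doe', 23)
```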

In [None]:
xyz = (3.01, 1.23, -1.22)  # Define a tuple holding three-dimensional coordinates.

print(xyz[0], xyz[1], xyz[2])

Remember that tuples are **unchangeable** so trying to change items will give us an error message.

Error messages in Python are designed to be understandable and relatively easy to read.

Let's try it out (uncomment and run):

In [None]:
# xyz[1] = 2.90

### Sets

A `set` is **unordered** and **unindexed**.

Sets also do not contain duplicate items.

In Python, sets are written with curly brackets.

```python
my_set = {'John', 'Doe', 22}
```

- Sets can contain strings and any hashable* Python object.
- Like lists, items can be added and removed from a set.
- Unlike lists and tuples, sets cannot be indexed as they have no order.
- Unlike lists and tuples, sets do not hold duplicate items.


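\* Roughly speaking, a Python object is **hashable** if it is unchangeable, so that its hash value (a number Python uses to find the object quickly) never changes. Therefore, we can add a tuple to a set but not a list. Numerical data types and strings are also unchangeable, so we can add these to sets too. (A tuple is only hashable if every item inside it is hashable.)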
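
Trying to add each type to a set shows the difference. In this sketch, `coordinates` and `list_added` are illustrative names, and the `try`/`except` lines simply catch the error so the example keeps running:

```python
coordinates = set()

coordinates.add((1, 2))  # a tuple is hashable, so this works
print(coordinates)       # {(1, 2)}

try:
    coordinates.add([3, 4])  # a list is unhashable
    list_added = True
except TypeError as error:
    list_added = False
    print('TypeError:', error)  # unhashable type: 'list'
```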

In [None]:
# Define a set holding the names of fruits
fruit_set = {'cherry', 'banana', 'apple'}
print(fruit_set)

# We can add items to the set using the `add` method
fruit_set.add('mango')
print(fruit_set)

# Let's try adding another apple to the set; can you guess what will happen?
fruit_set.add('apple')
print(fruit_set)

# We can remove items using the `remove` method
fruit_set.remove('cherry')
print(fruit_set)

**Sets** also have more useful methods.

**Intersection** and **union** are particularly useful.

- The intersection of two sets is a new set that contains all of the elements that are in both sets.
- The union of two sets is a new set that contains all of the elements that are in at least one of the two sets.

Try to work out the output before you run the cell below.

In [None]:
a = {1, 2, 3}  # Define a set of numbers
b = {3, 4, 5}  # Define another set of numbers

# Compute the intersection "a and b"
inter = a.intersection(b)
print(inter)

# Compute the union "a or b"
union = a.union(b)
print(union)

### Dictionaries

A **dictionary** `dict` is **changeable** and **indexed** by keys. Since Python 3.7, dictionaries also preserve the order in which items are inserted.

In Python dictionaries are written with curly brackets, and they have **keys** and **values**.

```python 
my_dict = {'my_key_1': 1, 'my_key_2': 2}
```

- Values can be strings or any other Python object. Keys, however, must be hashable.
- Like lists and sets, dictionaries allow items to be added and removed.
- Like lists and tuples, dictionaries can be indexed, but using keys rather than numerical positions.

Dictionaries cannot contain duplicate keys but can contain duplicated values.
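
The sketch below illustrates this with a small, illustrative codon lookup: if the same key appears twice, the later value silently replaces the earlier one, while repeated values are kept. It also uses the `in` operator to check whether a key exists, since indexing with a missing key raises a `KeyError`.

```python
# Illustrative codon -> amino acid lookup (only a few entries)
codons = {
    'TTT': 'Phe',
    'TTC': 'Phe',  # repeated value: allowed
    'ATG': 'Met',
    'ATG': 'M',    # repeated key: this later value replaces 'Met'
}
print(codons)         # {'TTT': 'Phe', 'TTC': 'Phe', 'ATG': 'M'}
print(len(codons))    # 3

print('TTT' in codons)  # True
print('AAA' in codons)  # False, so codons['AAA'] would raise a KeyError
```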

In [None]:
# Define a dictionary describing a user
user_dict = {
    'name': 'John',
    'last_name': 'Doe',
    'age': 22
}
print(user_dict)

# We can index the dictionary using keys
age = user_dict['age']  # Get the age of the user
print(age)

# We can add items using a similar syntax
user_dict['bought'] = ['apple', 'mango']
print(user_dict)

# We can also change values easily
user_dict['age'] = 23  # It is John's birthday! 
print(user_dict)

Dictionaries are a very versatile data type.

To learn more about the utilities of dictionaries download the dictionary [Cheat-Sheet](https://github.com/ehmatthes/pcc_2e/releases/download/v1.0.1/beginners_python_cheat_sheet_pcc_dictionaries.pdf).

----

## Operators

Operators are used to perform operations on variables and values.

Python provides multiple types of operators which can be used with multiple data types. Here, we will look at:

- arithmetic operators,
- assignment operators,
- and comparison operators.

----

#### Arithmetic operators

Arithmetic operators are used with numeric values to perform common mathematical operations. Most of these are fairly intuitive.

| Operator | Name           | Example |
|:---------|:---------------|:--------|
| `+`      | Addition       | x + y   |
| `-`      | Subtraction    | x - y   |
| `*`      | Multiplication | x * y   |
| `/`      | Division       | x / y   |
| `%`      | Modulus        | x % y   |
| `**`     | Exponentiation | x ** y  |
| `//`     | Floor division | x // y  |
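
Floor division rounds *down* to the nearest whole number (towards minus infinity), which matters for negative numbers. Floor division and modulus are linked: for integers `x` and `y` (with `y` not zero), `(x // y) * y + x % y` equals `x`. A sketch with illustrative values:

```python
print(7 // 2, 7 % 2)    # 3 1
print(-7 // 2, -7 % 2)  # -4 1  (-3.5 rounded down is -4)

x = -7
y = 2
print((x // y) * y + x % y == x)  # True
```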


In [None]:
x = 10  # Define an integer 10.
y = 2   # Define an integer 2.

print(x + y)   # Addition
print(x - y)   # Subtraction
print(x * y)   # Multiplication
print(x / y)   # Division
print(x % y)   # Modulus (remainder) 
print(x ** y)  # Exponentiation

y = 3          # Redefine y so we can see the effect of floor division 
print(x / y)   # Division
print(x // y)  # Floor division

*Note: Some arithmetic operators can also be used with other data types like strings. Maybe you could try and see what happens if you add two strings or multiply a string by an integer.*

----

#### Assignment operators

Assignment operators are used to assign values to variables.

We already know the `=` assignment operator, though there are multiple others.

Notice that numeric types are **unchangeable** so if we want to change the number assigned to a variable we need to reassign it like this:

```python
x = 3      # We have assigned 3 to x
x + 3      # Here x has not changed
x = x + 3  # Here we have reassigned x to x + 3 (or 6)
```

Assignment operators offer a shorter way of writing the same expression:

```python
x = 3   # We have assigned 3 to x
x + 3   # Here x has not changed
x += 3  # Here we have reassigned x to x + 3 (or 6)
```
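
Every arithmetic operator in the table above has a matching assignment form (`-=`, `/=`, `//=`, `%=`, `**=` and so on). A short sketch, with `count` and `ratio` as illustrative names:

```python
count = 10
count -= 4    # same as count = count - 4   -> 6
count **= 2   # same as count = count ** 2  -> 36
count //= 5   # same as count = count // 5  -> 7
count %= 4    # same as count = count % 4   -> 3
print(count)  # 3

ratio = 9
ratio /= 2    # `/` always produces a float -> 4.5
print(ratio)  # 4.5
```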

Try some other assignment operators to get used to the syntax.

In [None]:
x = 5  # Assign 5 to x
x * 2  # Multiply by 2 (no change to x)

# Let's verify that nothing has changed
print(x)

# Now we will try an assignment operator
x *= 2    
print(x)  # Notice that x is now 10

# We use the other syntax when we want to store the answer in a different variable
y = x / 2
print(y)

#### Comparison operators

Comparison operators are used to **compare** values.

Comparison operators return **Booleans**, True or False.

They are useful for controlling the flow of a program. We will learn more about this in the [Control flow](#Control-flow) section.

| Operator | Name                       | Example |
|:---------|:---------------------------|:--------|
| `==`     | Equal                      | x == y  |
| `!=`     | Not equal                  | x != y  |
| `>`      | Greater than               | x > y   |
| `<`      | Less than                  | x < y   |
| `>=`     | Greater than or equal to   | x >= y  |
| `<=`     | Less than or equal to      | x <= y  |

Notice how the equal operator is `==` rather than `=`. Why do you think this is the case?

*Hint: remember the assignment operators.*

Before running the cell below try to work out what the output will be.

In [None]:
x = 15
y = 5.5

print(x == y)
print(x != y)
print(x > y)
print(x < y)
print(x >= y)
print(x <= y)

Comparison operators are not limited to numerical types.

Try comparing some strings!
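
For strings, comparisons work character by character using each character's Unicode code point. They are therefore case-sensitive, and every uppercase ASCII letter sorts before every lowercase one. A few examples:

```python
print('apple' == 'apple')  # True
print('apple' == 'Apple')  # False: comparisons are case-sensitive
print('apple' < 'banana')  # True: 'a' comes before 'b'
print('Zebra' < 'apple')   # True: uppercase 'Z' sorts before lowercase 'a'
print('ATG' < 'ATGC')      # True: a prefix sorts before the longer string
```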

----

## Control flow

Python also offers a number of ways to control how and what code is executed.

**Conditionals:**

```python
if ... else
```

**Loops:**

```python
for ...
while ...
```

----

### Conditionals

We can combine comparison operators with `if` statements to control how code is executed:

```python
if CONDITION:
    # do something
```
    
We can chain multiple comparisons together using `elif` and `else`, giving us even more control over the execution of code:

```python
if CONDITION:
    # do something
elif CONDITION:
    # do something
else:
    # do something
```
    
`elif` is short for 'else if'. Python checks the conditions from top to bottom: as soon as one evaluates to `True`, its block runs and any remaining `elif` and `else` branches are skipped. If the later conditions were written as separate `if` statements instead, each of them would still be evaluated.
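
The difference is easiest to see side by side. In the sketch below, the same checks on an illustrative `score` are written once with `elif` and once as separate `if` statements; each branch that runs records a message in a list.

```python
score = 85  # illustrative value

# With elif: only the first matching branch runs
elif_result = []
if score > 50:
    elif_result.append('above 50')
elif score > 80:
    elif_result.append('above 80')  # skipped, because the first condition was True

# With separate ifs: every condition is checked
if_result = []
if score > 50:
    if_result.append('above 50')
if score > 80:
    if_result.append('above 80')

print(elif_result)  # ['above 50']
print(if_result)    # ['above 50', 'above 80']
```

When using `elif`, it is therefore worth putting the most specific condition first.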

Try adjusting the code below to change the output.

In [None]:
x = 100
y = 2

if x > y:
    print('x is greater than y')

Again try adjusting the code below to change the output.

In [None]:
x = 100
y = 200

if x == y:
    print('x is equal to y')
else:
    print('x is not equal to y')

And now with `elif`...

Again try changing the numbers to change the output.

In [None]:
x = 200
y = 33

if x < y:
    print("x is less than y")
elif x == y:
    print("x and y are equal")
else:
    print("x is greater than y")

### Loops

Python has two primitive loops:

- `for` loops
- `while` loops

A `for` loop is used to iterate over an iterable object, such as a list, a tuple, a dictionary, a set, or a string.

With the `for` loop we can execute a set of statements, once for each item in a list, tuple, set etc.

```python
sequence = [1, 2, 3, 4]
for x in sequence:
    # do something
```
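
The same `for` syntax works for any iterable. Looping over a string gives one character at a time, looping over a dictionary gives its keys, and the built-in `range()` produces a sequence of numbers. A sketch with illustrative data:

```python
for base in 'ATG':  # a string yields one character per iteration
    print(base)

ages = {'John': 22, 'Jane': 25}  # illustrative dictionary
for name in ages:                # a dictionary yields its keys
    print(name, ages[name])

total = 0
for number in range(1, 5):  # 1, 2, 3, 4 (stop is excluded, as in slicing)
    total += number
print(total)  # 10
```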

In [None]:
shopping_list = ['apples', 'cereal', 'beer', 'milk']  # Define a sequence of items.

"""
As we iterate through the list, each item is evaluated individually.
After the evaluation of an item is complete we move to the next item automatically.

In this case, the loop will assign each item in the list to the variable `item`,
so that we can use it in our code.

Here, we use a conditional to control how we process each item.
"""

for item in shopping_list:
    if item == 'beer' or item == 'milk':
        print('We need to buy:', item, '(Drink)')
    else:
        print('We need to buy:', item, '(Food)')

With a `while` loop we can execute a set of statements as long as a condition evaluates to True.

In [None]:
i = 1  # We will start at 1

"""
Here we evaluate code until a condition is met.
In this case we are waiting until i > 6.

Remember we need to increment i. We use the assignment
operator here. If we forget to increment i, the loop
will run forever!

If this happens you can press stop in the toolbar above
to interrupt code execution.
"""

while i <= 6:
    print(i)
    i += 1

Check out the `while` loop [Cheat-Sheet](https://github.com/ehmatthes/pcc_2e/releases/download/v1.0.1/beginners_python_cheat_sheet_pcc_if_while.pdf).

## Reading files

Often we will need to read data from files for further processing with Python. 

It is best practice to read files using the following syntax:

```python
with open('location_of_file') as f:
    # do something with the file
```    
    
- The `with` statement means that we do not have to remember to close the file; Python closes it automatically when the indented block ends.
- `'location_of_file'` stands for the location (path) of the file on your computer.
- The `as f` means the file object can be referenced using the variable `f`.
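
The sketch below is self-contained: it first writes a small example file (the name `example_sequences.txt` and its contents are made up for illustration) into a temporary folder, then reads it back with the `with open(...)` pattern. Looping over the file object gives one line at a time, and `.strip()` removes the trailing newline.

```python
import os
import tempfile

# Create a throwaway folder and an illustrative file to read
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'example_sequences.txt')
with open(path, 'w') as f:  # 'w' opens the file for writing
    f.write('ATGCGT\nTTGACA\nGGCCAA\n')

# Read the file back one line at a time
sequences = []
with open(path) as f:  # the default mode 'r' opens the file for reading
    for line in f:
        sequences.append(line.strip())  # strip() removes the trailing newline

print(sequences)  # ['ATGCGT', 'TTGACA', 'GGCCAA']
print(f.closed)   # True: the with block closed the file automatically
```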

To check out more about reading files download the [Cheat-Sheet](https://github.com/ehmatthes/pcc_2e/releases/download/v1.0.1/beginners_python_cheat_sheet_pcc_files_exceptions.pdf).

## Discussion

That was it for the Python introduction.

Feel free to add more code cells and experiment with the concepts you have learnt.

In the next notebook we will look at how to use functions to define reusable blocks of code. You can use this notebook as a reference if you need to refresh your knowledge of any of the concepts explored.

If you want to learn more there are some extra external resources linked at the beginning of this notebook. You can click [here](#Contents) to go back to the top.