### MEDC0106: Bioinformatics in Applied Biomedical Science

<p align="center">
  <img src="../../resources/static/Banner.png" alt="MEDC0106 Banner" width="90%"/>
  <br>
</p>

---------------------------------------------------------------

# 01 - Introduction to Python

*Written by:* Oliver Scott

**This notebook provides a general introduction to Python.**

Do not be afraid to make changes to the code cells to explore how things work!

### What is Python?

**Python** is a popular general-purpose, high-level programming language. It is particularly popular amongst the scientific community due to its inherent readability.

It is commonly used for:

- Web development (server-side),
- Software development,
- System scripting,
- Science,
- Much more!


### What is Jupyter?

**Jupyter** is an open-source web application that allows the creation and sharing of documents that contain live code, equations, visualizations and explanatory text.

Uses include:

- Data exploration and visualisation
- Numerical simulation
- Statistical modeling
- Machine learning
- Much more!

-----

## Contents

- [Writing Code](#Writing-code)
- [Comments](#Comments)
- [Variables and Datatypes](#Variables-and-Datatypes)

-----

#### Extra Resources:

This introduction to Python is by no means comprehensive. Below are some links to resources for learning Python if you are interested.

- [RealPython](https://realpython.com/) - Free Python tutorials from beginner to advanced
- [Codecademy](https://www.codecademy.com/learn/learn-python-3) - Python lessons
- [Cheat-Sheets](https://ehmatthes.github.io/pcc_2e/cheat_sheets/cheat_sheets/) - Python reference sheets

-----

## Writing code

We can write sections of code in blocks called code cells.

Try running the cell below: click the cell to select it, then click the run/play button in the toolbar.

*Shortcuts:*

- WIN: ctrl + enter
- OSX: ⌘ + enter


In [None]:
print("This is a code cell!")

## Comments

To enhance readability of code, programmers use comments. These have no effect on the running of the program, but are important to make your code understandable.

`# We can use the hash symbol to define a comment`

Comments will be used in the code blocks below to help you understand what is going on.

In [None]:
# This is a comment
# Notice that this cell outputs nothing when run

Comments make it clear what sections of code are doing:

In [None]:
# This code line will print the sum of two numbers
print(10 + 12)

## Variables and Datatypes

Variables are one of the most important components of a programming language. They are used to store information, which gives us a shorthand notation for referring to potentially very large amounts of data!

We can assign data to variables using the `=` symbol:

`variable = data`

Once we have assigned data to a variable we can then access that data using the variable name.

When naming a variable, it is best practice to give it a name which describes the data it holds. We use underscores `_` to make variable names easier to read (spaces cannot be used), e.g.

`my_long_variable_name`

Also note that Python contains keywords that **cannot** be used as variable names (doing so raises a `SyntaxError`). This is because they have a special meaning in the language. In this editor (Jupyter) you can spot a keyword because it is automatically highlighted. Alternatively, [here](https://www.w3schools.com/python/python_ref_keywords.asp) is a list of reserved keywords.
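
As a quick check that does not rely on highlighting, the standard-library `keyword` module can report which words are reserved. This is only a small sketch; the names tested here are arbitrary examples.

```python
import keyword  # standard-library module describing Python's reserved words

# `keyword.iskeyword` reports whether a string is a reserved keyword
print(keyword.iskeyword("for"))        # `for` is a keyword
print(keyword.iskeyword("my_string"))  # an ordinary variable name

# `keyword.kwlist` holds every reserved keyword for the running Python version
print(keyword.kwlist)
```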

*Note: Python is a dynamically typed language and therefore the type of a variable does not need to be specified. If you are familiar with statically typed languages (e.g. C, C++, Java) this may seem a little unusual.*

In [None]:
# In this line we assign the text "Hello, World!" to the variable `my_string`
my_string = "Hello, World!"

# We can print the contents of the variable using the print function
print(my_string)

# We can also create aliases to the same data
my_string_alias = my_string

# We could also do both assignments in a single line
my_string = my_string_alias = "Hello, World!"
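
The note on dynamic typing can be illustrated with a short sketch. The variable name `value` is made up for this example; the point is simply that one name can be rebound to data of different types without declaring a type first.

```python
# The same name can be rebound to values of different types
value = 42
print(type(value))   # `type` reports the type of the data held by `value`

value = "forty-two"
print(type(value))

value = 42.0
print(type(value))
```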

**Python** has several built in datatypes/objects which are useful to a programmer.

In this session we will look at:

- Strings
- Numerical Types
- Booleans
- Collections

*Note that this is only an introduction. There is much more you can do with these data types!*

Check out these [cheat-sheets](https://ehmatthes.github.io/pcc_2e/cheat_sheets/cheat_sheets/) for an easy-to-use reference.

### Strings

Strings in Python hold text data (`str`).

Strings can be enclosed with either single or double quotation marks. Multi-line strings can be defined using triple quotes. Sometimes multi-line strings can be useful for making long comments or documenting code.

In [None]:
# Double quote syntax
string_one = "This is a Python string"
print(string_one)

# Single quote syntax
string_two = 'This is also a string'
print(string_two)

# Triple quote syntax
string_three = """This is also a string,
but can span multiple lines"""
print(string_three)
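
One practical reason to have both quote styles is that each lets you use the other kind of quotation mark inside the text. The sketch below uses made-up variable names (`quote_one`, `quote_two`, `quote_three`) to show this.

```python
# Double quotes let us use an apostrophe inside the string
quote_one = "It's a Python string"
print(quote_one)

# Single quotes let us use double quotes inside the string
quote_two = 'She said "hello"'
print(quote_two)

# Triple quotes keep the line break as part of the string
quote_three = """line one
line two"""
print(quote_three)
```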

### Numerical/Scalar types

**Python** has three built-in numerical types:

**Integer** `int`:

Int, or integer, is a whole number, positive or negative, without decimals, of unlimited length.

**Float** `float`:

Float, or "floating point number" is a number, positive or negative, containing one or more decimals.

**Complex** `complex`:

Complex numbers are written with a "j" as the imaginary part:

In [None]:
x = 42    # Integer (int)
y = 42.0  # Float (float)
z = 2j    # Complex (complex)

# We can also print multiple things at the same time
print(x, y, z)

# To verify the types of the above we can use the `type` function 
print(type(x))
print(type(y))
print(type(z))
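
The descriptions of the numerical types above can be explored with a few more values. This is a minimal sketch with hypothetical variable names; `**` is Python's power operator, and `.real` / `.imag` give the two parts of a complex number.

```python
# Integers can grow as large as memory allows
big_int = 2 ** 100  # 2 to the power of 100
print(big_int)

# Floats have limited precision, so small rounding errors can appear
float_sum = 0.1 + 0.2
print(float_sum)

# A complex number has a real part and an imaginary part
c = 3 + 2j
print(c.real, c.imag)
```

Try to guess what `float_sum` will print before running the cell.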

Floats can also be written in scientific notation, with an "e" to indicate the power of 10.

In [None]:
big_float = 1e9
negative_float = -82.7e10  # negative number

print(big_float)
print(negative_float)

### Type Conversion

You can convert from one type to another using `int()`, `float()`, and `complex()`.

This conversion is called *casting*.

*Casting is useful for more than just numerical types, we will see examples of this later.*

In [None]:
x = 1    # int
y = 2.8  # float

# Convert from int to float
a = float(x)

# Convert from float to int
b = int(y)

print(x, "->", a)
print(y, "->", b)

# We can verify that the type has changed
print(type(a))
print(type(b))

*Note: complex numbers cannot be converted to `int` or `float`, although `int` and `float` values can be converted to `complex`.*
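
As a sketch of casting beyond numerical types, the cell below converts between strings and numbers and then tries the complex conversion described in the note. The variable names are made up, and `try`/`except` is used here only so that the cell keeps running after the error.

```python
# Strings that look like numbers can be converted to numerical types
count = int("42")
ratio = float("2.5")
print(count, ratio)

# Numbers can be converted to strings
label = str(3.14)
print(label, type(label))

# int() drops the decimal part rather than rounding
print(int(2.8), int(-2.8))

# int and float can be converted to complex ...
print(complex(3))

# ... but converting complex to int raises a TypeError
try:
    int(2j)
except TypeError as error:
    print("TypeError:", error)
```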

### Booleans

*Booleans* `bool` represent one of two values: `True` or `False`.

In programming you often need to know if an expression is `True` or `False`. 

When you compare two values, the expression is evaluated and Python returns the **Boolean** answer.

*We will learn more about comparisons in the [Operators](#Operators) section. Booleans are also essential for [Control Flow](#Control-Flow).*

In [None]:
# We will take a look at these operators in a later section

print(10 > 9)   # Greater than `>`
print(10 == 9)  # Equal to `==`
print(10 < 9)   # Less than `<`

# We can also assign booleans to variables.
true = True
false = False

# Verify that the type is indeed boolean `bool`.
print(type(true))
print(type(false))

We can also convert booleans to and from other types.

Try to **guess the output** of the cell below before running it.

*Hint: Think Binary!*

In [None]:
true_int = int(True)    # Convert the boolean `True` to an integer
false_int = int(False)  # Convert the boolean `False` to an integer

# What will these lines print?
print(true_int)
print(false_int)
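
Conversion also works in the other direction, using `bool()`. The sketch below shows which values Python treats as `False` and which as `True`; the variable name `text_flag` is hypothetical.

```python
# Zero and empty values convert to False
print(bool(0), bool(0.0), bool(""))

# Non-zero numbers and non-empty strings convert to True
print(bool(1), bool(-3.5), bool("text"))

# Careful: any non-empty string is True, even the text "False"
text_flag = bool("False")
print(text_flag)
```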

### Collections

There are four core collection data types in Python:

- **Lists** `list` is a collection which is **ordered** and **changeable**. Allows duplicate members.
- **Tuples** `tuple` is a collection which is **ordered** and **unchangeable**. Allows duplicate members.
- **Sets** `set` is a collection which is **unordered** and **unindexed**. No duplicate members.
- **Dictionaries** `dict` is a collection of key-value pairs which is **changeable** and **indexed** by key. No duplicate keys. (Since Python 3.7, dictionaries also keep the order in which items were inserted.)

When choosing a collection type, it is useful to understand the properties of that type.

*Sometimes you will see the terms changeable and unchangeable written as mutable and immutable respectively.*
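
To make the comparison concrete, here is a small sketch that stores the same items in each of the four collection types. The variable names are made up for illustration; notice what happens to the repeated `'a'` in each case.

```python
# The same items stored in each of the four collection types
items_list = ['a', 'b', 'a']           # list: duplicates are kept
items_tuple = ('a', 'b', 'a')          # tuple: duplicates are kept
items_set = {'a', 'b', 'a'}            # set: duplicates are dropped
items_dict = {'a': 1, 'b': 2, 'a': 3}  # dict: a repeated key keeps only its last value

print(items_list)
print(items_tuple)
print(items_set)
print(items_dict)
```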

### Lists
A **list** `list` is ordered and changeable.

In Python lists are written with square brackets:

`my_list = ['hello', 'John', 22]`

A list can contain any type of Python object.

In [None]:
# Define and print a list containing colours
colour_list = ['red', 'green', 'blue', 'yellow', 'white', 'black']
print(colour_list)

We can use **indexes** to access items in a `list`.

In Python, **indexes start at 0**, so index 0 refers to the **first** item in the list.

We can also use **negative indexes** to access items from the end of the list, i.e. -1 refers to the last item in the list and -2 refers to the item before the last item.

Take a look at the reference image below:

![PythonIndexing](https://railsware.com/blog/wp-content/uploads/2018/10/positive-indexes.png)

[Image source](https://railsware.com/blog/python-for-machine-learning-indexing-and-slicing-for-lists-tuples-strings-and-other-sequential-types/)

To use indexes we use a square-bracket syntax:

`my_list[0]`

The above example will access the first item in the list.

*Try to guess the output of the cell below before running!*

In [None]:
print(colour_list[0])
print(colour_list[1])
print(colour_list[3])
print(colour_list[-1])
print(colour_list[-3])

# We can also change a value using indexing (the index must already exist!)
colour_list[4] = 'purple'

# Let's verify this change
print(colour_list)

# Advanced - What do you think this will output? Hint: think ranges!
print(colour_list[0:3])

*The `[n:m]` syntax generates a slice of the data, where n and m are numerical indexes: the slice starts at index n and stops just before index m (the item at m is not included).*
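
A few more slices, as a sketch using a made-up list called `letters`. Either index can be left out, and negative indexes work in slices too.

```python
letters = ['a', 'b', 'c', 'd', 'e', 'f']

print(letters[1:4])   # items at indexes 1, 2 and 3 (index 4 is not included)
print(letters[:2])    # leaving out n starts from the beginning
print(letters[4:])    # leaving out m runs to the end
print(letters[-2:])   # negative indexes count from the end
```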

----

There are numerous ways to add and remove items from a list. For now we will only mention one of these: the `append` method.

`my_list.append('item to append')`

The append method adds an item to the end of a list.

To see other methods check out the list [cheat-sheet](https://github.com/ehmatthes/pcc_2e/releases/download/v1.0.1/beginners_python_cheat_sheet_pcc_lists.pdf) later (click the link to download the sheet).

Let's try out the append method!

In [None]:
# Add an item to the list
colour_list.append('pink')

# Let's verify that we added 'pink' to the end of the list.
print(colour_list)
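
For the curious, here is a brief sketch of three other built-in list methods: `insert`, `remove` and `pop`. It uses a separate, made-up list (`shapes`) so that `colour_list` is left alone.

```python
shapes = ['circle', 'square', 'triangle']

# `insert` adds an item at a given index
shapes.insert(1, 'hexagon')
print(shapes)

# `remove` deletes the first item equal to the given value
shapes.remove('square')
print(shapes)

# `pop` removes the last item and returns it
last_shape = shapes.pop()
print(last_shape)
print(shapes)
```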

### Tuples

A **tuple** `tuple` is ordered and unchangeable.

In Python tuples are written with round brackets.

`my_tuple = ('John', 'Doe', 22)`

- Like lists, tuples can contain strings and any other type of Python object.
- Like lists, we can also use indexing to access items in the tuple.
- Unlike lists, we cannot change, add, or remove items in a tuple.

In [None]:
xyz = (3.01, 1.23, -1.22)  # define a tuple holding 3D coordinates.

print(xyz[0], xyz[1], xyz[2])

Remember that tuples are **unchangeable/immutable**: trying to change an item will give us an error message.

Error messages in Python are designed to be understandable and relatively easy to read.

Let's try it:

In [None]:
xyz[1] = 2.90
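
Running the cell above should raise a `TypeError`. If a changed value is needed, one option (sketched here with made-up names `point` and `new_point`) is to build a new tuple from the items of the old one.

```python
point = (3.01, 1.23, -1.22)  # same coordinates as in the example above

# Build a new tuple that reuses the items we want to keep
new_point = (point[0], 2.90, point[2])

print(point)      # the original tuple is untouched
print(new_point)
```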

### Sets

A **set** `set` is **unordered** and **unindexed**.

Sets also do not contain duplicate items.

In Python, sets are written with curly brackets.

`my_set = {'John', 'Doe', 22}`

- Sets can contain strings and any hashable* Python object.
- Like lists, items can be added and removed from a set.
- Unlike lists and tuples, sets cannot be indexed as they have no order.
- Unlike lists and tuples, sets do not hold duplicate items.


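\* For a Python object to be **hashable**, its hash value must never change during its lifetime. In practice, Python's built-in unchangeable types are hashable and its changeable collections are not. Therefore we can add a tuple to a set (provided everything inside the tuple is also hashable) but not a list. Numerical types and strings are also unchangeable, so we can add these to sets too.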

In [None]:
# Define a set holding the names of fruits.
fruit_set = {'cherry', 'banana', 'apple'}
print(fruit_set)

# We can add items to the set using the `add` method.
fruit_set.add('mango')
print(fruit_set)

# Let's try to add another apple to the set. Can you guess what will happen?
fruit_set.add('apple')
print(fruit_set)

# We can remove items using the `remove` method.
fruit_set.remove('cherry')
print(fruit_set)
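
The footnote on hashable objects can be checked with a short sketch. The set name `coordinate_set` is made up, and `try`/`except` is used only so that the cell keeps running after the error.

```python
# Tuples are unchangeable, so they can be stored in a set
coordinate_set = {(0, 0), (1, 2)}
coordinate_set.add((1, 2))  # already present, so the set does not grow
print(coordinate_set)

# Lists are changeable, so Python refuses to put them in a set
try:
    coordinate_set.add([3, 4])
except TypeError as error:
    print("TypeError:", error)
```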

**Sets** also have other useful methods.

**intersection** and **union** are particularly useful.

- The intersection of two sets is a new set that contains all of the elements that are in both sets.
- The union of two sets is a new set that contains all of the elements that are in at least one of the two sets.

Let's try it. Try to work out the output before you run the cell.

In [None]:
a = {1, 2, 3}  # define a set of numbers (A)
b = {3, 4, 5}  # define a set of numbers (B)

# Compute the intersection "A and B"
inter = a.intersection(b)
print(inter)

# Compute the union "A or B"
union = a.union(b)
print(union)
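
Python also provides operator shorthands for these set methods. The sketch below redefines the same two sets and adds one more operation, the difference.

```python
a = {1, 2, 3}
b = {3, 4, 5}

print(a & b)  # intersection, same as a.intersection(b)
print(a | b)  # union, same as a.union(b)
print(a - b)  # difference: items in a that are not in b
```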