# Week 1 Tutorial - Data Types and Basic Operations

For the first class, we will cover the following topics:

1. Python Programming in Biology 
2. Using Jupyter Notebook
3. Data Types and Basic Operations 
    * Numbers and arithmetic operations
    * Strings and string operations
    * Booleans, comparison operations, and logical operations 
    * Variables 
    * Lists, tuples, dictionaries, and sets 
    * Indexing and slicing of sequence data types 
4. Input and Output 
    * input()
    * print()
5. Special Characters 
    * Backslash 

## Python Programming in Biology

Python is a versatile and widely used programming language with a wide range of applications in biology research. Some of the common uses of Python in biology research include:

* Data analysis: Python can be used to analyse large datasets, such as those generated by high-throughput sequencing technologies, microarrays, or proteomics experiments. 
* Computational modelling: Python can be used to develop mathematical models of biological systems, such as protein-protein interactions, metabolic pathways, or gene regulatory networks. 
* Machine learning: Python can be used for machine learning applications in biology, such as predicting protein structure or function, identifying disease biomarkers, or classifying cell types. 
* Image analysis: Python can be used to process and analyse biological images, such as those generated by microscopy or medical imaging. 

Python allows researchers to automate repetitive tasks, process large datasets efficiently, and develop and simulate complex models. Python is also easy to learn and use, making it accessible to a wide range of researchers, including those with little or no programming experience. Additionally, the large and active Python community means that there are many resources, libraries, and packages available to support biological research.

## Using Jupyter Notebook 

__About Jupyter Notebook:__

Jupyter Notebook is an open-source web-based interactive computing environment that allows you to create, run, and share live code, equations, visualizations, and narrative text in a single document. It supports multiple programming languages, including Python, R, Julia, and many others. Jupyter Notebook is commonly used in data science, machine learning, scientific computing, and other domains for interactive data analysis, visualization, and exploration. 

Jupyter Notebook provides a browser-based interface where you can create "notebooks" that contain code cells, text cells, and other types of cells. Code cells allow you to write and execute code directly in the notebook, while text cells allow you to write narrative text, equations, and markdown for documentation and explanations. This combination of code and text cells makes Jupyter Notebook an excellent tool for creating reproducible research, documenting data analysis workflows, and sharing interactive notebooks with others.

There are other editors and development environments you can use for Python programming as well, such as Visual Studio Code, Spyder, and PyCharm.

__Run your first code:__

Put the cursor in the box below and click the "run" button (or press `shift+enter` on your keyboard) to see the result.

In [3]:
print("Hi")

Hi


The above cell is a code cell. Every code cell has an `In [ ]` label displayed before it.

__Text cells:__

You can double-click a text cell to edit it, and click the "Run" button or press `shift+enter` to display it properly. We're not going to cover how to write formatted text in Jupyter Notebook in this course, because our main purpose is to practise coding.

__The two modes of a cell:__

* When you're in the edit mode of a cell, the cell will show a __green__ border. You can edit the contents of the cell.
* When you're in the command mode of a cell, the cell will show a __blue__ border. You can perform various operations on the notebook, such as adding cells, deleting cells, and changing cell types.

__Shortcuts:__

When you're in the edit mode:
* Press `shift+enter` to run the cell. 
* Press `esc` to go to the command mode.

When you're in the command mode:
* Press `b` to add a new cell below the current one.
* Press `d` twice to delete the cell.
* Press `y` to change the cell type to code.
* Press `m` to change the cell type to text.
* Press `enter` to go to the edit mode.

If you forget the shortcuts, you can always use the toolbar.

__Exercise: play with the above shortcuts, create some new cells, delete some existing cells, and change between cell types.__

## Data Types and Basic Operations 
### Integers `int`

In Python, integers (int) are a built-in data type that represents whole numbers without any fractional or decimal parts. Integers can be positive, negative, or zero, and can have an unlimited range of values, limited only by the available system memory.

You can define an integer variable in Python by assigning a whole number to a variable name. For example:

In [4]:
x = 5
y = -10

In [5]:
x

5
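As a quick illustration of the unlimited range mentioned above, the sketch below computes a number far larger than a 64-bit integer can hold. The variable name `big` is made up for this example.

```python
# Python integers grow as needed, so very large values do not overflow
big = 2 ** 100

print(big)
print(type(big))  # the result is still an int
```
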

### Floating-point numbers `float`

Floating-point numbers in Python represent real numbers that have a fractional part. They are implemented as a built-in data type called `float` and are used for arithmetic on values that are not whole numbers, such as measurements or ratios.

In Python, you can create a floating-point number by simply including a decimal point in a numeric value. For example:

In [7]:
a = 3.14 
b = 2.0 

In [9]:
type(x)

int
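To compare the two numeric types side by side, here is a small sketch that applies `type()` to the variables defined above. The assignments are repeated so the cell runs on its own.

```python
x = 5      # an integer
a = 3.14   # a floating-point number
b = 2.0    # still a float, even though its value is a whole number

print(type(x))
print(type(a))
print(type(b))
```
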

### Arithmetic operations 

Arithmetic operations can be performed on integers and floating-point numbers in Python using the standard arithmetic operators.

In [11]:
# addition 
10+3

13

In [12]:
# subtraction
10-3

7

In [13]:
# multiplication
10*3

30

In [14]:
# division 
10/3

3.3333333333333335

In [15]:
# exponentiation
10**3

1000

In [16]:
# modulo operator 
10%3

1

In [17]:
# floor division operator 
10//3

3

In [18]:
# calculate the quotient and remainder
divmod(17,5)

(3, 2)
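The floor division, modulo, and `divmod()` examples above are closely related. The following sketch, using the made-up variable names `a`, `b`, `quotient`, and `remainder`, shows how they fit together:

```python
a = 17
b = 5

quotient = a // b    # floor division
remainder = a % b    # modulo

# divmod() returns both values at once as a pair
print(divmod(a, b) == (quotient, remainder))

# The original number can be rebuilt from the two parts
print(quotient * b + remainder == a)
```

Both lines print `True`.
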

__Commenting in Python:__ anything after a `#` on a line is a comment. Python ignores comments when running the code, so they have no effect on the program. They are used to explain what the code does or why it is written in a particular way. Normally, we use `#` for single-line comments. For longer notes, a triple-quoted string `""" comments """` that is not assigned to anything is often used as a multi-line comment. For example:

In [19]:
print("Start")

"""
Hi!
Let's do some calculation!
"""

print("End")

Start
End


You may think we are never going to use these functions... However, for DNA sequences we often care about multiples of three, because each codon is made up of three bases:

In [20]:
divmod(1464, 3)

(488, 0)
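As a sketch of how this might look with an actual sequence, the example below counts complete codons in a short, made-up DNA string. The sequence and the variable names are invented for illustration, and `len()` (which returns the number of characters in a string) is used here ahead of its formal introduction.

```python
# Hypothetical sequence, made up for this illustration
seq = "ATGGCCATTGTAATGGGCCGC"

# divmod() gives the number of complete codons and the bases left over
n_codons, leftover = divmod(len(seq), 3)

print("complete codons:", n_codons)
print("leftover bases:", leftover)
```

A leftover of `0` means the sequence length is an exact multiple of three.
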

You can perform arithmetic operations on floating-point numbers too, try it yourself:

In [None]:
# addition 

In [None]:
# subtraction

In [None]:
# multiplication 

In [None]:
# division 

In [None]:
# exponentiation 

In [None]:
# modulo operator 

In [None]:
# floor division operator 

__When you perform an arithmetic operation on floating-point numbers, what is the type of the result number?__

It will be a floating-point number.

__Limited precision of floating-point numbers:__

Floating-point numbers in Python, as well as in most other programming languages, have inherent limitations due to the way they are represented in computer hardware. Floating-point numbers are represented with a fixed number of bits, which limits their precision. As a result, some numbers cannot be exactly represented, and rounding errors can occur in arithmetic operations. For example:

In [21]:
0.1+0.2

0.30000000000000004

In [22]:
0.3-0.1

0.19999999999999998
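Because of these rounding errors, comparing floats with `==` can give surprising results. One common way to handle this, sketched below, is to compare with a small tolerance using `math.isclose()` from the standard library, or to round a value before displaying it. The variable name `total` is made up for this example.

```python
import math

total = 0.1 + 0.2

exact_match = total == 0.3               # exact comparison
close_match = math.isclose(total, 0.3)   # comparison with a small tolerance

print(exact_match)
print(close_match)
print(round(total, 2))                   # rounding for display
```

The exact comparison is `False`, while the comparison with a tolerance is `True`.
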

__What would you get if you calculate an integer with a floating-point number?__ Please try it yourself:

In [23]:
# please do some calculations on integers with floating-point numbers 
3 + 3.0 

6.0

The result will be a floating-point number, even if its value is a whole number.

### Strings `str`

In Python, a string is a sequence of characters enclosed in either single quotes '' or double quotes "". Strings are one of the built-in data types in Python and are used to represent text and manipulate it in various ways. 

In [24]:
"Hello, world!"

'Hello, world!'

In [25]:
'Hello, world!'

'Hello, world!'

In [26]:
"My name is ..."

'My name is ...'

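You can put single quotes inside a double-quoted string and vice versa, but using the same kind of quote inside the string ends the string early and causes a `SyntaxError`.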

In [27]:
"It's Harvey's birthday today."

"It's Harvey's birthday today."

In [28]:
'It's Harvey's birthday today.'

SyntaxError: invalid syntax (1417220977.py, line 1)

In [29]:
"I was "surprised" when I saw her."

SyntaxError: invalid syntax (634155479.py, line 1)

In [30]:
'I was "surprised" when I saw her.'

'I was "surprised" when I saw her.'

### String Operations 

__Concatenation__

Strings can be concatenated using the `+` operator, which combines two or more strings into a single string.

In [32]:
"Hello," + "world!"

'Hello,world!'

When we use the `+` operator to concatenate strings, it doesn't insert a space between them, so we need to add it ourselves.

In [33]:
# add the space
"Hello, " + "world!"

'Hello, world!'

In [34]:
"Hi, " + "my name is ..."

'Hi, my name is ...'

In [38]:
x = 6

In [40]:
"I have " + str(x) + " apples."

'I have 6 apples.'

Note that we can't concatenate a string with an integer directly using the `+` operator: `"I have " + x` raises a `TypeError`. That is why `x` is converted to a string with `str()` first.

__Repetition__

Strings can be repeated using the `*` operator, which creates multiple copies of a string. 

In [41]:
"Hello!" * 3

'Hello!Hello!Hello!'

In [42]:
"Let's go!" * 2

"Let's go!Let's go!"

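__Converting cases and replacing substrings__

Strings have built-in methods for common tasks: `upper()` and `lower()` convert the case of a string, and `replace()` substitutes one substring for another.
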
In [43]:
"The weather is good today!".upper()

'THE WEATHER IS GOOD TODAY!'

In [44]:
"LOL".lower()

'lol'

In [45]:
"My name is Jiajia.".replace("Jiajia", "your name")

'My name is your name.'
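These methods return a new string and leave the original untouched. As a small sketch with a made-up DNA fragment, `replace()` can rewrite a sequence in the RNA alphabet by swapping `T` for `U`:

```python
# Hypothetical DNA fragment, made up for this illustration
dna = "ATGCGT"

rna = dna.replace("T", "U")   # the same sequence written with U instead of T
lowercase = dna.lower()

print(rna)
print(lowercase)
print(dna)   # the original string is unchanged
```
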

### Booleans `bool`

Booleans in Python are a data type that can have one of two possible values: True or False. 

They are commonly used in conditional statements, comparisons, and logical operations. Booleans in Python are a subclass of integers, with True representing the integer 1 and False representing the integer 0.
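The link between booleans and integers can be seen directly. In the sketch below, the three `passed_sample_*` variables are hypothetical quality-check results, and adding them up counts how many are `True`:

```python
is_int = isinstance(True, int)   # bool is a subclass of int
print(is_int)
print(True + True)
print(int(False))

# Hypothetical quality-check results for three samples
passed_sample_1 = True
passed_sample_2 = False
passed_sample_3 = True

# Each True counts as 1 and each False as 0
n_passed = passed_sample_1 + passed_sample_2 + passed_sample_3
print(n_passed)
```
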

### Comparison Operations

Comparison operations, also known as relational operators, are used in Python to compare values and determine their relationship. They return a boolean value, either `True` or `False`, depending on whether the comparison holds.

__Equality: `==`__

In [46]:
1 == 1

True

In [47]:
1.0 == 1

True

In [48]:
1 == 3

False

In [62]:
66 == "66"

False

In [50]:
"Blue" == "Red"

False

In [51]:
"Blue" == 'Blue'

True

Think of some examples yourself and try the equal to `==` operator:

__Inequality: `!=`__

In [63]:
1 != 1

False

In [64]:
1 != 2.0

True

In [65]:
66 != "66"

True

In [66]:
"Blue" != "Red"

True

In [67]:
"Blue" != "Blue"

False

Think of some examples yourself and try the not equal to `!=` operator:

__Greater than: `>`__

In [None]:
1 > 2

In [None]:
3 > 2

Comparing an integer with a string:

In [68]:
66 > "55"

TypeError: '>' not supported between instances of 'int' and 'str'

Turns out we can't compare integers with strings.

Comparing a string with a string:

In [69]:
"123" > "100"

True

In [70]:
"10000" > "101"

False

In [71]:
"apple" > "banana"

False

In [None]:
"appleapple" > "banana"

Python compares strings lexicographically, based on the Unicode code points of their characters (for English letters, digits, and common punctuation these are the same as the ASCII values). The comparison starts with the first character of the strings and proceeds to subsequent characters until a difference is found or until the end of one of the strings is reached.

In [75]:
"Aa" > "aa"

False
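The numeric value behind each character can be inspected with the built-in `ord()` function. This short sketch (the variable names are made up) shows why uppercase letters sort before lowercase ones, and digits before both:

```python
code_upper = ord("A")
code_lower = ord("a")
code_digit = ord("1")

print(code_upper, code_lower, code_digit)

# "A" has a smaller value than "a", so "Aa" sorts before "aa"
print("Aa" < "aa")
```
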

If there is no difference until the end of one string, the longer string is considered greater.

In [76]:
"1231" > "123"

True

Think of some examples yourself and try the greater than `>` operator:

__Less than: `<`__

The usage is the same as above.

Think of some examples yourself and try the less than `<` operator:

__Greater than or equal to: `>=`__

In [77]:
2.0 >= 2

True

In [None]:
2.0 >= 1

In [None]:
2.0 >= 3.0

In [78]:
66 >= "66"

TypeError: '>=' not supported between instances of 'int' and 'str'

In [79]:
"abc" >= "ab"

True

In [80]:
"abc" >= "abcf"

False

Think of some examples yourself and try the greater than or equal to `>=` operator:

__Less than or equal to: `<=`__

Think of some examples yourself and try the less than or equal to `<=` operator:

__Chain multiple comparisons together:__

In [81]:
1 < 2 < 3

True

In [82]:
1.0 < 1.0 < 2

False

In [83]:
"123" < "1234" < "12345"

True

Think of some examples yourself and try to chain multiple comparisons together:

### Logical Operations 

Logical operations are used to combine or modify boolean values. There are three primary logical operators `and`, `or`, and `not`. They are used to perform conjunction, disjunction, and negation operations, respectively.

__`and` (conjunction):__ 

You have to satisfy both conditions to be True. 

In [89]:
True and True

True

In [90]:
True and False

False

In [91]:
False and False 

False

__`&` behaves like `and` when both sides are booleans. Give it a try:__

In [92]:
True & False

False

__`or` (disjunction):__

You only need to meet one of the two conditions to be True.

In [93]:
True or True

True

In [94]:
True or False

True

In [95]:
False or False

False

__`|` behaves like `or` when both sides are booleans. Give it a try:__

In [96]:
True | False

True
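A word of caution: `&` and `|` are bitwise operators, and they bind more tightly than comparison operators such as `>`. They therefore only match `and` and `or` when both sides are already booleans. The sketch below, with made-up variable names, shows how the results can differ when comparisons are combined without parentheses:

```python
with_and = 2 > 1 and 3 > 2   # evaluated as (2 > 1) and (3 > 2)
with_amp = 2 > 1 & 3 > 2     # evaluated as 2 > (1 & 3) > 2, that is 2 > 1 > 2

print(with_and)
print(with_amp)

# Parentheses make & act on the two comparison results
with_parentheses = (2 > 1) & (3 > 2)
print(with_parentheses)
```

The first and last lines print `True`, while the middle one prints `False`. For combining conditions, `and` and `or` are usually the safer choice.
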

__`not` (negation):__

In [97]:
not True

False

In [98]:
not False 

True

### Variables 

In Python, variables are used to store values or references to objects, making it easier to use, manipulate, and reference data throughout your code. Variables can store data of various types, such as integers, floats, strings, lists, tuples, dictionaries, and more. 

To create a variable in Python, we use the assignment operator `=` to assign a value to a variable name. 

The variable name must follow these naming rules:
* It must start with a letter (a-z or A-Z) or an underscore (`_`); it cannot start with a digit.
* It can consist of letters, digits, and underscores.
* It cannot contain spaces, dashes, or hyphens (`-`).
* There is no restriction on its length.
* It is case-sensitive, so `age` and `Age` are different variables.

For example:

In [102]:
age = 90

In [103]:
age

90

In [None]:
name = "Jiajia"

In [None]:
name

In [None]:
greeting = "Hello, world!"

In [None]:
greeting
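The naming rules for variables can be checked with the string method `isidentifier()`, as in this sketch. The candidate names are made up for illustration. Note that reserved words such as `class` pass this check but still cannot be used as variable names, which is what `keyword.iskeyword()` from the standard library tests for.

```python
import keyword

# str.isidentifier() reports whether a string follows the naming rules
valid_plain = "gene_1".isidentifier()
valid_underscore = "_hidden".isidentifier()
starts_with_digit = "1gene".isidentifier()
has_hyphen = "gene-1".isidentifier()

print(valid_plain, valid_underscore)
print(starts_with_digit, has_hyphen)

# Reserved words cannot be used as variable names
is_reserved = keyword.iskeyword("class")
print(is_reserved)
```
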

__Using variables in expressions:__

We can use variables in expressions and pass them as arguments to functions or methods. 

In [104]:
price = 26.99
total_price = price * 1.1 # 10% GST

In [105]:
total_price

29.689

__Reassigning a value:__

We can also reassign a value to a variable name.

In [None]:
age = 26
age = "27"

In [None]:
age

### Lists 

In Python, a list is a mutable, ordered collection of items. Lists can contain items of different data types, including other lists. They are created using square brackets `[]` and items are separated by commas. 

__What does mutable mean?__

Mutable means you can edit the list after you have created it. An immutable data type is one that you cannot edit once it has been created.

__What does ordered collection of items mean?__

An ordered collection of items refers to a data structure where the items are stored in a specific order or sequence. The order of the items is maintained, and each item has a unique index based on its position in the collection. For example, you can access an item by telling Python whether it is the first, second, or third item in the list. We will cover how to access the items in a list later.

__Create a list with integers:__

In [106]:
numbers = [1, 2, 3, 4, 5]

In [107]:
numbers

[1, 2, 3, 4, 5]

__Create a list with mixed data types:__

In [None]:
mixed_list = [1, "Hi", 99.99, True]

In [None]:
mixed_list

__Empty list:__

You can also create an empty list and add the elements later.

In [108]:
empty_list = []

In [109]:
empty_list

[]

Append an item to the empty list:

In [118]:
empty_list.append("apple")
empty_list.append(1)
empty_list.append(False)

In [119]:
empty_list

['apple', 1, False]

Remove an item from the list:

In [113]:
empty_list.remove("apple")

In [114]:
empty_list

[1, False]

### Tuples

A tuple is an immutable, ordered collection of items. Tuples can contain elements of different data types, including other tuples and lists. Tuples are similar to lists, but unlike lists, their elements cannot be changed once they are created. Tuples are created using parentheses `()` and items are separated by commas.

__Create a tuple with integers:__

In [None]:
numbers = (1, 2, 3, 4, 5)

In [None]:
numbers

__Create a tuple with mixed data types:__

In [None]:
mixed_tuple = (1, "Hello", 2.33, False)

In [None]:
mixed_tuple

__You can also create an empty tuple:__

But can you append data to it?

In [120]:
empty_tuple = ()

In [None]:
empty_tuple

In [121]:
empty_tuple.append(66)

AttributeError: 'tuple' object has no attribute 'append'

As mentioned before, tuples are immutable, so we can't add elements to a tuple once it has been created.
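What we can do instead is build a new tuple from existing ones. In this sketch, the `+` operator joins two tuples into a new one, and the name `new_tuple` is made up for the example. Note the trailing comma in `(66,)`, which is how a tuple with a single item is written.

```python
empty_tuple = ()

# Concatenation creates a new tuple; empty_tuple itself is not changed
new_tuple = empty_tuple + (66,)

print(empty_tuple)
print(new_tuple)
```
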

### Dictionaries 

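A dictionary is a mutable collection of key-value pairs, where each key is unique. Dictionaries are created using curly brackets `{}`, with key-value pairs separated by commas and a colon separating each key from its associated value. Unlike lists and tuples, items in a dictionary are accessed by key rather than by position (since Python 3.7, dictionaries do keep their items in insertion order).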

__Creating a dictionary:__

You can create a dictionary in this way:

In [122]:
person = {"name": "John", "age": 30, "city": "New York"}

In [126]:
person = {"name": "John", "age": 30, "age": 44, "city": "New York"}

In [127]:
person

{'name': 'John', 'age': 44, 'city': 'New York'}

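So, the item before the colon is the key and the item after the colon is the value: `"name"` is a key and `"John"` is its value. Keys must be unique, so when the same key appears more than once, as `"age"` does above, only the last value is kept.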

Or you can write a dictionary in this way:

In [None]:
movie = {
    "name": "Inception",
    "year": 2010,
    "review": 8.8
}

In [None]:
movie

The second way is preferred when you have many key-value pairs, because it is easier to read.

__Creating a dictionary with mixed data types:__ 

* Both keys and values can contain different types of data. 
* Keys can be strings, integers, floats, booleans, or tuples. __However__, it is recommended to use strings as keys where possible, because mixing other key types can make your code harder to read and maintain.
* Values have no restrictions on data types. 

Here is an example dictionary containing values of more complex data types.

In [129]:
company = {
    'name': 'TechCorp',
    'founded': 2005,
    'employees': [
        {'id': 1, 'name': 'John', 'position': 'CEO'},
        {'id': 2, 'name': 'Jane', 'position': 'CTO'},
        {'id': 3, 'name': 'Bob', 'position': 'Software Engineer'}
    ],
    'locations': [
        {'city': 'New York', 'state': 'NY'},
        {'city': 'San Francisco', 'state': 'CA'}
    ],
    'revenue': 25000000.0
}

In [130]:
company

{'name': 'TechCorp',
 'founded': 2005,
 'employees': [{'id': 1, 'name': 'John', 'position': 'CEO'},
  {'id': 2, 'name': 'Jane', 'position': 'CTO'},
  {'id': 3, 'name': 'Bob', 'position': 'Software Engineer'}],
 'locations': [{'city': 'New York', 'state': 'NY'},
  {'city': 'San Francisco', 'state': 'CA'}],
 'revenue': 25000000.0}

__Creating an empty dictionary and adding data later:__

A dictionary is a mutable data type, so you can change and add data after it has been created.

In [None]:
empty_dict = {}

In [None]:
empty_dict

In [None]:
empty_dict["key1"] = "to_my_home"
empty_dict["key2"] = "to_the_office"
empty_dict["key3"] = "bike"

In [None]:
empty_dict

__Accessing values in the dictionary:__

We can access values in a dictionary by using the keys associated with the values.

In [None]:
movie["name"]

In [None]:
company["name"]

Access the value of "year" in the dictionary movie:

What's the ID number of the employee Jane? How to access it?

### Sets 

In Python, a set is an unordered collection of unique items. Sets are mutable. 

Sets are particularly useful when you need to store or operate on a collection of distinct elements, such as performing set operations like union, intersection, and difference.

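To create a set, you can use curly braces `{}` with comma-separated values, or use the `set()` constructor with an iterable (e.g., a list or tuple) as an argument. Note that empty braces `{}` create an empty dictionary, so use `set()` to create an empty set.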

__Creating a set:__

In [None]:
my_set = {1, 2, 3, 4, 5}

In [None]:
my_set

__Creating a set with mixed values:__

Sets can contain different data types, but the elements must be hashable (immutable) data types like strings, numbers (integers and floats), and tuples containing only hashable elements. Lists and dictionaries cannot be elements of a set.

In [None]:
mixed_set = {42, 'hello', 3.14, ('a', 'b', 'c'), True}

In [None]:
mixed_set
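The set operations mentioned above (union, intersection, and difference) can be sketched as follows. The gene lists are hypothetical, and `sorted()` is used only to print the results in a predictable order, since sets themselves are unordered.

```python
# Hypothetical gene lists from two experiments
experiment_a = {"TP53", "BRCA1", "EGFR"}
experiment_b = {"EGFR", "KRAS", "TP53"}

in_either = experiment_a.union(experiment_b)         # items in a, b, or both
in_both = experiment_a.intersection(experiment_b)    # items found in both
only_in_a = experiment_a.difference(experiment_b)    # items in a but not in b

print(sorted(in_either))
print(sorted(in_both))
print(sorted(only_in_a))

# Building a set from a list drops the duplicates
bases = set(["A", "T", "A", "G", "T"])
print(sorted(bases))
```
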

Sets are used less often than the other data types, so we will not cover them in much detail in this class.

### Indexing

Indexing refers to accessing individual elements of a sequence data type (strings, lists, and tuples) based on their position.

In Python, indexing is zero-based, meaning the first element has an index of 0, the second element has an index of 1, and so on. 

__Indexing with strings:__

In [None]:
gene = "ATGCGTAAGCTTAGACCGATCATGCCGCTAGGCTACGAAGTCCATAG"

In [None]:
first_base = gene[0]
first_base

In [None]:
second_base = gene[1]
second_base

__Negative indexing:__

Negative indexing can be used to access elements from the end of the sequence. For example, an index of -1 refers to the last element, -2 refers to the second-to-last element, and so on. 

In [None]:
last_base = gene[-1]
last_base

In [None]:
second_to_last_base = gene[-2]
second_to_last_base

__Indexing with lists:__

Indexing with lists works the same way as with strings.

In [None]:
amino_acids = ['Alanine', 'Arginine', 'Asparagine', 'Aspartic acid', 'Cysteine',
               'Glutamine', 'Glutamic acid', 'Glycine', 'Histidine', 'Isoleucine',
               'Leucine', 'Lysine', 'Methionine', 'Phenylalanine', 'Proline',
               'Serine', 'Threonine', 'Tryptophan', 'Tyrosine', 'Valine']

To access the first amino acid in the list:

To access the second amino acid in the list:

To access the last amino acid in the list:

To access the second to last amino acid in the list:

__Indexing with tuples:__

Indexing with tuples works the same way as with strings and lists.

In [None]:
dna_bases = ('Adenine', 'Thymine', 'Cytosine', 'Guanine')

To access the first DNA base in the tuple:

To access the second DNA base in the tuple:

To access the last DNA base in the tuple:

To access the second to last DNA base in the tuple:

### Slicing 

Slicing is a technique in Python that allows you to extract a portion (a continuous subsequence) of a sequence data type such as a string, list, or tuple. Slicing uses the colon `:` operator to indicate the start and end position of the slice.

__The basic syntax for slicing is:__

In [None]:
sequence[start:end]

Where `start` is the index of the first element you want to __include__ in your slice, and `end` is the index of the first element you want to __exclude__ from the slice.

The resulting slice will include elements from the `start` index (inclusive) to the `end` index (exclusive). 
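A useful consequence of the inclusive start and exclusive end is that the length of a slice equals `end - start`, and that consecutive slices line up without overlapping. The sketch below uses a short, made-up sequence to pull out the first two codons:

```python
# Hypothetical sequence, made up for this illustration
seq = "ATGCGTAAG"

first_codon = seq[0:3]    # indices 0, 1, 2
second_codon = seq[3:6]   # indices 3, 4, 5

print(first_codon)
print(second_codon)

# The length of a slice equals end - start
print(len(seq[2:7]))
```
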

__Slicing with strings:__

Let's continue to use the string we created earlier.

In [None]:
gene

To get the first 5 bases from the string:

In [None]:
gene[0:5]

To get the 5th to 13th bases from the string:

In [None]:
gene[4:13]

You can also leave the start or end index blank. If the start is blank, the slice begins at the beginning of the sequence; if the end is blank, the slice runs to the end of the sequence.

To get the slice from the beginning to the 11th base:

In [None]:
gene[:11]

To get the slice from the 18th base to the end:

In [None]:
gene[17:]

__Negative indexing:__

We can also use negative indexing when slicing. Negative indexing allows you to reference elements from the end of the sequence. When using negative indexing in slices, -1 refers to the last element, -2 refers to the second-to-last element, and so on.

To get the slice from the sixth-to-last element to the second-to-last element of the string:

In [None]:
gene[-6:-1]

To get the last 5 elements of the string:

In [None]:
gene[-5:]

To get the slice from the beginning to the 11th-to-last element of the string:

In [None]:
gene[:-10]

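__Reverse a sequence:__

A slice can take a third value, the step. A step of `-1` walks through the sequence backwards, which reverses it: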

In [None]:
gene[::-1]
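The general form with a step is `sequence[start:end:step]`. As a small sketch on a short, made-up sequence, a step of `-1` reverses the sequence, while a positive step skips elements:

```python
# Hypothetical sequence, made up for this illustration
seq = "ATGCGT"

reversed_seq = seq[::-1]      # step of -1: walk backwards through the whole string
every_second = seq[::2]       # step of 2: indices 0, 2, 4
codon_starts = seq[0:6:3]     # step of 3: indices 0 and 3

print(reversed_seq)
print(every_second)
print(codon_starts)
```
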

__Slicing with lists:__

Slicing lists is similar to how we slice strings. We will use the amino_acids list we created before for practising.

In [None]:
amino_acids

Slicing the first 3 amino acids from the list:

Slicing the 13th to 16th amino acids from the list:

Slicing from the 8th-to-last element to the end of the list:

Reverse `amino_acids`:

__Slicing with tuples:__

Slicing tuples is similar to how we slice strings and lists. We will practise on the dna_bases tuple we created before. 

In [None]:
dna_bases

Get the first 2 elements of the tuple:

Get the second and the third elements of the tuple:

Get the last 3 elements of the tuple:

Reverse `dna_bases`:

## Input and Output 

In Python, there are several ways to handle input and output of data. 

__`input()` function for input:__

We can use the `input()` function to read input from the user through `stdin`. The `input()` function displays a prompt message and waits for the user to enter a value. The value entered by the user is returned as a string. For example:

In [None]:
name = input("What is your name?")

In [None]:
name

In [None]:
supervisor = input("Who is your supervisor?")

In [None]:
supervisor
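Because `input()` always returns a string, a number typed by the user has to be converted with `int()` or `float()` before it can be used in arithmetic. In this sketch, the variable `answer` is a stand-in for a value the user would type, so that the cell runs without waiting for input:

```python
# This string stands in for a value typed by the user,
# for example the result of input("How many samples? ")
answer = "12"

print(type(answer))

# Convert the text to a number before doing arithmetic
n_samples = int(answer)
print(n_samples + 1)

# float() does the same for numbers with a decimal point
half = float("0.5")
print(half * 2)
```
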

__`print()` function for output:__

The `print()` function is used to output data to the console. For example, to display the two values we entered with the `input()` function:

In [None]:
print(name)
print(supervisor)

We can pass one or more arguments to `print()`, separated by commas. It will convert each one to text and display them on one line, separated by spaces.

In [None]:
print("My name is", name, "and my supervisor is", supervisor, ".")

__The dot at the end is not in the right place. Why is that?__

By default, when printing multiple objects, the `print()` function puts a space between them. The variable `supervisor` and the dot are two separate objects, so there is a space between them.

How can we remove the additional space?

__`print()` with option `sep=`:__

The option `sep=` allows us to choose which delimiter to use rather than the default space character. For example:

In [None]:
print("My name is", name, "and my supervisor is", supervisor, ".", sep="!")

__Exercise: use the `sep=` option to make the additional space disappear:__

## Special Characters

__Backslash `\`:__

In Python, a backslash is used as an escape character to represent certain special characters that have a specific meaning. When a backslash is followed by a character, it creates a special sequence that represents a character that cannot be typed directly in a string.

The commonly used backslash escape sequences are:
* `\n`: represents a new line character
* `\t`: represents a tab character 
* `\"`: represents a double quote character without the meaning of creating a string 
* `\'`: represents a single quote character without the meaning of creating a string 

For example, use the `print()` function to create a table:

In [None]:
print("Title1\tTitle2\tTitle3\nd\te\tf")

__Exercise: print a table like the image below__

![table](./figures/table.png)

Write your answer here:

Example of escaping double quotes:

In [None]:
print("She said \"Hello\"")

__Exercise: practise the escape of single quotes__