# Workshop 1 - Data Types and Basic Operations

### Topics

1. Python Programming in Biology 
2. Using Jupyter Notebook
3. Getting Help with a Function
4. Data Types and Basic Operations 
    * Numbers and arithmetic operations
    * Strings and string operations
    * Variables 
    * Lists, tuples, dictionaries, and sets
5. Indexing and Slicing  

## Python Programming in Biology

Python is a versatile and widely used programming language with a wide range of applications in biology research. Some of the common uses of Python in biology research include:

* Data analysis: Python can be used to analyse large datasets, such as those generated by high-throughput sequencing technologies, microarrays, or proteomics experiments. 
* Computational modelling: Python can be used to develop mathematical models of biological systems, such as protein-protein interactions, metabolic pathways, or gene regulatory networks. 
* Machine learning: Python can be used for machine learning applications in biology, such as predicting protein structure or function, identifying disease biomarkers, or classifying cell types. 
* Image and Sound Analysis: Python can be used to process and analyse biological sounds and images, such as animal sounds and images generated by microscopy or medical imaging. 

Python allows you to automate repetitive tasks, process large datasets efficiently, and develop and simulate complex models. Python is also easy to learn and use, making it accessible to a wide range of researchers, including those with little or no programming experience. Additionally, the large and active Python community means that there are many resources, libraries, and packages available to support biological research. 

## Using Jupyter Notebook 

__About Jupyter Notebook:__

Jupyter Notebook is an open-source web-based interactive computing environment that allows you to create, run, and share live code, equations, visualizations, and narrative text in a single document. It supports multiple programming languages, including Python, R, Julia, and many others. Jupyter Notebook is commonly used in data science, machine learning, scientific computing, and other domains for interactive data analysis, visualization, and exploration. 

Jupyter Notebook provides a browser-based interface where you can create "notebooks" that contain code cells, text cells, and other types of cells. 

__Code cells__ allow you to write and execute code directly in the notebook, while __text cells__ allow you to write narrative text, equations, and markdown for documentation and explanations. This combination of code and text cells makes Jupyter Notebook an excellent tool for creating reproducible research, documenting data analysis workflows, and sharing interactive notebooks with others.

There are other IDEs (Integrated Development Environments) you can use for Python programming, such as Visual Studio Code, Spyder, and PyCharm. 

__Run your first code:__

Put the cursor in the box below and click the "run" button (or press `shift+enter` on your keyboard) to see the result.

In [None]:
print("Hi")

The above cell is a code cell. Every code cell has an `In []` label displayed before it. 

__Text cells:__

You can double-click a text cell to edit it, then click the "Run" button or press `shift+enter` to display it properly. We're not going to cover how to write formatted text in Jupyter Notebook in this course; our main purpose is to practise coding. 

__The two modes of a cell:__

* When you're in the edit mode of a cell, the cell will show a __green__ border. You can edit the contents of the cell.
* When you're in the command mode of a cell, the cell will show a __blue__ border. You can perform various operations on the notebook, such as adding cells, deleting cells, and changing cell types. 

__Shortcuts:__

When you're in the edit mode:
* Press `shift+enter` to run the cell. 
* Press `esc` to go to the command mode.

When you're in the command mode:
* Press `b` to add a new cell below the current one.
* Press `d` twice to delete the cell.
* Press `y` to change the cell type to code.
* Press `m` to change the cell type to text.
* Press `enter` to go to the edit mode.

If you forget the shortcuts, you can always use the toolbar.

__Exercise: play with the above shortcuts, create some new cells, delete some existing cells, and change between cell types.__

## Getting help with a function in Python

Before we start to learn any functions in Python, we need to know how to get help, especially when working with unfamiliar functions or libraries. Python provides several ways to access documentation and information about functions.

__Using the `help()` function:__

You can use the built-in `help()` function to access documentation for any Python function. Simply pass the function name as an argument to `help()`.

In [None]:
help(print) # provides information about the print function 

__Interactive help in IDEs and Editors:__

Many IDEs and code editors provide built-in support for accessing help and documentation. In Jupyter Notebook, you can use a question mark before a function to get its documentation:

In [None]:
?print

## Data Types and Basic Operations 
### Integers `int`

In Python, integers are a built-in data type that represents whole numbers without any fractional or decimal parts. Integers can be positive, negative, or zero, and can have an unlimited range of values, limited only by the available system memory.

You can define an integer variable in Python by assigning a whole number to a variable name. For example:

In [None]:
x = 5
y = -10

In [None]:
x

### Floating-point numbers `float`

Floating-point numbers in Python represent real numbers with a fractional part. They are implemented as the built-in data type `float` and are used for arithmetic on values that whole numbers cannot represent.

In Python, you can create a floating-point number by simply including a decimal point in a numeric value. For example:

In [None]:
a = 3.14 
b = 2.0 

In [None]:
type(a)

### Arithmetic operations 

Arithmetic operations can be performed on integers and floating-point numbers in Python using the standard arithmetic operators.

In [None]:
# addition 
10 + 3

In [None]:
# subtraction 
10 - 3

In [None]:
# multiplication
10 * 3

In [None]:
# division 
10 / 3

In [None]:
# exponentiation
10 ** 3

In [None]:
# modulo operator 
10 % 3

In [None]:
# floor division operator 
10 // 3

In [None]:
# calculate the quotient and remainder
divmod(10, 3)

__Commenting in Python:__ anything after a `#` on a line is a comment. Python ignores comments when it runs the code; they are there to explain to human readers why the code is written in a particular way. Normally, we use `#` for single-line comments and a triple-quoted string `""" comments """` for multi-line comments (strictly speaking this is a string that Python evaluates and then discards). For example:

In [None]:
print("Start")

"""
Hi!
Let's do some calculation!
"""

print("End")

You may think we are never going to use these functions... However, for DNA sequences we often care about the multiples of three:

In [None]:
divmod(1464, 3)
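
As a small illustration of why multiples of three matter, the sketch below checks whether a sequence splits into whole codons. The sequence `cds` and the variable names are made up for this example; they are not real data.

```python
# Hypothetical coding sequence, invented for illustration only
cds = "ATGGCTAGCTAA"

# divmod returns (quotient, remainder) in one call
n_codons, leftover = divmod(len(cds), 3)

print(n_codons)   # number of complete codons
print(leftover)   # bases left over; 0 means the length is a multiple of three
```

A non-zero remainder would suggest that the sequence is incomplete or that it does not start at a codon boundary.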

You can perform arithmetic operations on floating-point numbers too, please give it a try.

In [None]:
# addition 

In [None]:
# subtraction 

In [None]:
# multiplication 

In [None]:
# division 

In [None]:
# exponentiation 

In [None]:
# modulo operator 

In [None]:
# floor division operator 

__Q: When you perform an arithmetic operation on floating-point numbers, what is the type of the result number?__

__Limited precision of floating-point numbers:__

Floating-point numbers in Python, as well as in most other programming languages, have inherent limitations due to the way they are represented in computer hardware. Floating-point numbers are represented with a fixed number of bits, which limits their precision. As a result, some numbers cannot be exactly represented, and rounding errors can occur in arithmetic operations. For example:

In [None]:
0.1 + 0.2

In [None]:
from decimal import Decimal, getcontext

# set the precision for all Decimal operations
getcontext().prec = 10

a = Decimal("0.1")
b = Decimal('0.2')

print(a + b)

In [None]:
print(0.3 - 0.1)
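
One practical consequence of these rounding errors is that comparing floats with `==` can give surprising results. The following is a minimal sketch of two common workarounds, using `math.isclose` from the standard library and the built-in `round()`; the variable name `total` is only illustrative.

```python
import math

total = 0.1 + 0.2

# Direct comparison is affected by the rounding error
print(total == 0.3)

# math.isclose compares within a small tolerance instead
print(math.isclose(total, 0.3))

# round() is another option when you only need a few decimal places
print(round(total, 2) == 0.3)
```

Which approach fits best depends on the analysis; `Decimal`, shown above, is the heavier option when exact decimal arithmetic is required.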

__Q: What type of number would you get if you combine an integer and a floating-point number in one calculation?__

### Strings `str`

In Python, a string is a sequence of characters enclosed in either single quotes `''` or double quotes `""`. Strings are one of the built-in data types in Python and are used to represent text and manipulate it in various ways. 

In [None]:
print("Hello, world!")

In [None]:
print('Hello, world!')

In [None]:
print("My name is ...")

You can put double quotes inside single quotes and vice versa. 

In [None]:
print("It's Harvey's birthday today.")

In [None]:
print('I was "surprised" when I saw her.')

But you can't put single quotes inside single quotes or double quotes inside double quotes; doing so causes a syntax error. 

In [None]:
print('It's Harvey's birthday today.')

In [None]:
print("I was "surprised" when I saw her.")

### String Operations 

__Concatenation__

Strings can be concatenated using the `+` operator, which combines two or more strings into a single string.

In [None]:
print("Hello," + "world!")

When we use the `+` operator to concatenate strings, it does not insert a space or any other separator between them, so we need to add one manually.

In [None]:
# add the space
print("Hello, " + "world!")

In [None]:
print("Hi, " + "my name is ...")

The `+` operator can only concatenate strings to strings. 

In [None]:
x = 6

In [None]:
print("I have " + x + " apples.")
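
The cell above raises a `TypeError` because `x` is an integer, not a string. One way around this, sketched below, is to convert the number with the built-in `str()` function; an f-string is a common alternative. The variable names `sentence` and `sentence_f` are just for illustration.

```python
x = 6

# Convert the integer to a string before concatenating
sentence = "I have " + str(x) + " apples."
print(sentence)

# An f-string inserts the value directly and handles the conversion
sentence_f = f"I have {x} apples."
print(sentence_f)
```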

__Repetition__

Strings can be repeated using the `*` operator, which creates multiple copies of a string. 

In [None]:
"Hello!" * 3

In [None]:
"Let's go!" * 2

__String Methods__

Python provides built-in methods for every data type. We will talk about Python methods later in the course. These built-in methods offer various operations on the data.

Here are some methods specific to the string data type. 

In [None]:
# An example string to try the methods on
text = "  Hello, World!  "

# Convert a string to uppercase. 
print(text.upper())

# Convert a string to lowercase.
print(text.lower())

# Remove leading and trailing whitespace.
print(text.strip())

# Split a string into a list based on a delimiter.
print(text.split(","))

# Replace occurrences of a substring with another substring. 
print(text.replace("World", "Python"))

# Find the index of the first occurrence of a substring. 
print(text.find("World"))

__Exercises:__

01. Convert the string `The weather is good today!` to uppercase.
02. Convert the string `LOL` to lowercase.
03. Remove the leading and trailing whitespaces of the string `   Accidentally typed more spaces oops!   `
04. Split the string `Delimiter,is,comma`.
05. Replace `apple` with your favorite fruit in the string `I love apple!`.
06. Find the index of the first occurrence of `ATG` in the string `GCTAGCTAGCTAATGGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAG`. 

In [None]:
# Exercise 01
"The weather is good today!".upper()

In [None]:
# Exercise 02
"LOL".lower()

In [None]:
# Exercise 03
"   Accidentally typed more spaces oops!   ".strip()

In [None]:
# Exercise 04

In [None]:
# Exercise 05
"I love apple!".replace("apple", "grapefruit")

In [None]:
# Exercise 06
"GCTAGCTAGCTAATGGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAG".find("ATG")

### Variables 

In Python, variables are used to store values or references to objects, making it easier to use, manipulate, and reference data throughout your code. Variables can store data of various types, such as integers, floats, strings, lists, tuples, dictionaries, and more. 

To create a variable in Python, we use the assignment operator `=` to assign a value to a variable name. 

In [None]:
age = 90

To access the value of a variable, we simply write the variable name:

In [None]:
age

You can use it with a function:

In [None]:
type(age)

Or use it for operations, for example arithmetic operations:

In [None]:
age + 10

__Naming conventions of Python variables:__

* __Valid Characters:__ Variable names can contain letters (a-z, A-Z), digits (0-9), and underscores (`_`). They must start with a letter or an underscore. For example, `my_variable`, `x`, `_count`, and `var123` are valid variable names.
* __Reserved Words:__ You cannot use Python's reserved words (also called keywords) as variable names. For example, you cannot name a variable `if`, `else`, `for`, `while`, `import`, `def`, etc. Here's a list of reserved words in Python:

```
False    await    else     import   pass
None     break    except   in       raise
True     class    finally  is       return
and      continue for      lambda   try
as       def      from     nonlocal while
assert   del      global   not      with
async    elif     if       or       yield
```

* __PEP 8 Style Convention:__ While not strictly a rule, it is a convention to follow the PEP 8 style guide for Python. Variables should be lowercase with words separated by underscores. For example, `my_variable`, `user_name`, `total_count`. 
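
If you are unsure whether a name is allowed, Python can check it for you. The sketch below uses the string method `isidentifier()` and the standard-library `keyword` module; the example names are arbitrary.

```python
import keyword

print("my_variable".isidentifier())   # follows the character rules
print("2fast".isidentifier())         # starts with a digit, so not valid
print(keyword.iskeyword("for"))       # a reserved word
print(keyword.iskeyword("age"))       # not a reserved word
```

A name is usable as a variable when `isidentifier()` is `True` and `keyword.iskeyword()` is `False`.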

In [None]:
my_name = "Jiajia"
my_uni_id = "u1133"

__Re-assign a value:__

We can also re-assign a value to an existing variable name.

In [None]:
age = 26
age = "27"

In [None]:
age

### Lists 

In Python, a list is a mutable, ordered collection of items. Lists can contain items of different data types, including other lists. They are created using square brackets `[]` and items are separated by commas. 

__What does mutable mean?__

Mutable means you can edit the list after you have created it. An immutable data type is one you cannot edit once it has been created.

__What does ordered collection of items mean?__

An ordered collection of items refers to a data structure where the items are stored in a specific order or sequence. The order of the items is maintained, and each item has a unique index based on its position in the collection. For example, you can access an item by telling Python whether it is the first, second, or third in the list. We will cover how to access the items in a list later. 

__Create a list with integers:__

In [None]:
numbers = [1, 2, 3, 4, 5]

In [None]:
numbers = [1, 1, 1, 0, 0]

In [None]:
numbers

__Create a list with mixed data types:__

In [None]:
mixed_list = [1, "Hi", 99.99, True]

In [None]:
mixed_list

__Empty list:__

You can also create an empty list and add the elements later.

In [None]:
empty_list = []

In [None]:
empty_list

Append an item to the empty list:

In [None]:
empty_list.append("apple")
empty_list.append(1)
empty_list.append(False)

In [None]:
empty_list

In [None]:
empty_list.remove("apple")

In [None]:
empty_list

### Tuples

A tuple is an immutable, ordered collection of items. Tuples can contain elements of different data types, including other tuples and lists. Tuples are similar to lists, but unlike lists, their elements cannot be changed once they are created. Tuples are created using parentheses `()` and items are separated by commas.

__Create a tuple with integers:__

In [None]:
numbers = (1, 2, 3, 4, 5)

In [None]:
numbers

__Create a tuple with mixed data types:__

In [None]:
mixed_tuple = (1, "Hello", 2.33, False)

In [None]:
mixed_tuple

__You can also create an empty tuple:__

But can you append data to it?

In [None]:
empty_tuple = ()

In [None]:
empty_tuple

In [None]:
empty_tuple.append(66)

As we mentioned before, tuples are immutable, so we can't add elements to a tuple once it has been created. 
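
To see what the failed `append` looks like without stopping the notebook, the sketch below catches the error with `try`/`except` (a construct not covered in this workshop, used here only for illustration). It also shows that concatenation with `+` builds a new tuple rather than changing the existing one. The name `longer_tuple` is hypothetical.

```python
empty_tuple = ()

# Tuples have no append method, so this raises an AttributeError
try:
    empty_tuple.append(66)
except AttributeError as error:
    print("Error:", error)

# Concatenation creates a new tuple; the original is not modified
longer_tuple = empty_tuple + (66,)
print(longer_tuple)
print(empty_tuple)
```

Note the trailing comma in `(66,)`: without it, Python reads `(66)` as the integer 66 in parentheses.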

### Dictionaries 

A dictionary is a mutable collection of key-value pairs, where each key is unique. Since Python 3.7, dictionaries keep their items in insertion order, but items are accessed by key rather than by position. Dictionaries are created using curly brackets `{}`; key-value pairs are separated by commas, and a colon separates each key from its associated value. 

__Creating a dictionary:__

You can create a dictionary in this way:

In [None]:
person = {"name": "John", "age": 30, "city": "New York"}

In [None]:
person

Or you can write a dictionary in this way:

In [None]:
movie = {
    "name": "Inception",
    "year": 2010,
    "review": 8.8
}

In [None]:
movie

The second way is preferred because it is easier to read when the dictionary has many key-value pairs. 

__Creating a dictionary with various data types:__

In [None]:
company = {
    'name': 'TechCorp',
    'founded': 2005,
    'employees': [
        {'id': 1, 'name': 'John', 'position': 'CEO'},
        {'id': 2, 'name': 'Jane', 'position': 'CTO'},
        {'id': 3, 'name': 'Bob', 'position': 'Software Engineer'}
    ],
    'locations': [
        {'city': 'New York', 'state': 'NY'},
        {'city': 'San Francisco', 'state': 'CA'}
    ],
    'revenue': 25000000.0
}

In [None]:
company

__Q: How many key-value pairs does this dictionary have?__

__Creating an empty dictionary and adding data later:__

A dictionary is a mutable data type, so you can change and add data after it has been created. 

In [None]:
empty_dict = {}

In [None]:
empty_dict

In [None]:
empty_dict["key1"] = "to_my_home"
empty_dict["key2"] = "to_the_office"
empty_dict["key3"] = "bike"

In [None]:
empty_dict

In [None]:
empty_dict["key3"] = "car"

__Accessing values in the dictionary:__

We can access values in a dictionary by using the keys associated with the values.

In [None]:
movie["name"]

In [None]:
company["name"]

Accessing the value of "year" in the dictionary movie:

In [None]:
movie["year"]
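
Asking for a key that does not exist with square brackets raises a `KeyError`. The sketch below shows two ways to avoid that, the `in` operator and the `get()` method. The dictionary is redefined so the example runs on its own, and the key `"director"` is a made-up missing key.

```python
movie = {"name": "Inception", "year": 2010, "review": 8.8}

# Check whether a key exists before using it
print("year" in movie)
print("director" in movie)

# get() returns a default value instead of raising a KeyError
director = movie.get("director", "unknown")
print(director)
```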

### Sets 

In Python, a set is an unordered collection of unique items. Sets are mutable. 

Sets are particularly useful when you need to store or operate on a collection of distinct elements, such as performing set operations like union, intersection, and difference.

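To create a set, you can use curly braces `{}` with comma-separated values, or use the `set()` constructor with an iterable (e.g., a list or tuple) as an argument. Note that empty braces `{}` create an empty dictionary, so use `set()` to create an empty set.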

__Creating a set:__

In [None]:
my_set = {1, 2, 3, 4, 5}

In [None]:
my_set = {1,0,1}

In [None]:
type(my_set)

__Creating a set with a list:__

In [None]:
my_list = [0, 1, 1, 66, 55]

In [None]:
my_set = set(my_list)

In [None]:
my_set

__Creating a set with a tuple:__
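
A minimal sketch of this case, using a made-up tuple `my_tuple`: the `set()` constructor accepts a tuple just as it accepts a list, and duplicate values are dropped.

```python
my_tuple = (0, 1, 1, 66, 55)   # hypothetical example values

set_from_tuple = set(my_tuple)
print(set_from_tuple)

# The tuple has five items, the set only four, because 1 appears twice
print(len(my_tuple), len(set_from_tuple))
```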

__Creating a set with mixed values:__

Sets can contain different data types, but the elements must be hashable (immutable) data types like strings, numbers (integers and floats), and tuples containing only hashable elements. Lists and dictionaries cannot be elements of a set.

In [None]:
mixed_set = {42, 'hello', 3.14, ('a', 'b', 'c'), True}

In [None]:
mixed_set
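
The set operations mentioned at the start of this section (union, intersection, and difference) can be sketched as follows. The two sets and their names are invented for illustration and do not come from a real experiment.

```python
# Two small example sets of gene names (values chosen for illustration)
sample_a = {"BRCA1", "TP53", "EGFR"}
sample_b = {"TP53", "EGFR", "MYC"}

print(sample_a | sample_b)   # union: items in either set
print(sample_a & sample_b)   # intersection: items in both sets
print(sample_a - sample_b)   # difference: items only in sample_a
```

Because sets are unordered, the items may be printed in a different order each time.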

## Indexing and Slicing 
### Indexing

Indexing refers to accessing individual elements of a sequence data type (strings, lists, and tuples) based on their position. 

In Python, indexing is zero-based, meaning the first element has an index of 0, the second element has an index of 1, and so on. 

__Indexing with strings:__

In [1]:
gene = "ATGCGTAAGCTTAGACCGATCATGCCGCTAGGCTACGAAGTCCATAG"

In [2]:
first_base = gene[0]
first_base

'A'

In [3]:
second_base = gene[1]
second_base

'T'

Q: What is the 10th base?

In [4]:
gene[9]

'C'

__Negative indexing:__

Negative indexing can be used to access elements from the end of the sequence. For example, an index of -1 refers to the last element, -2 refers to the second-to-last element, and so on. 

In [5]:
last_base = gene[-1]
last_base

'G'

In [6]:
second_to_last_base = gene[-2]
second_to_last_base

'A'

Q: What is the 5th-to-last base?

In [7]:
gene[-5]

'C'
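
One way to think about negative indices, sketched below, is that an index of `-n` points at the same position as `len(gene) - n`. The string is repeated so the example runs on its own.

```python
gene = "ATGCGTAAGCTTAGACCGATCATGCCGCTAGGCTACGAAGTCCATAG"

# Total number of bases in the string
print(len(gene))

# gene[-5] and gene[len(gene) - 5] refer to the same base
print(gene[-5] == gene[len(gene) - 5])
```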

__Indexing with lists:__

Indexing with lists works the same way as indexing with strings.

In [None]:
amino_acids = ['Alanine', 'Arginine', 'Asparagine', 'Aspartic acid', 'Cysteine',
               'Glutamine', 'Glutamic acid', 'Glycine', 'Histidine', 'Isoleucine',
               'Leucine', 'Lysine', 'Methionine', 'Phenylalanine', 'Proline',
               'Serine', 'Threonine', 'Tryptophan', 'Tyrosine', 'Valine']

Q: What is the 6th amino acid in the list?

In [None]:
amino_acids[5]

Q: What is the 3rd-to-last amino acid in the list?

In [None]:
amino_acids[-3]

__Indexing with tuples:__

Indexing with tuples works the same way as indexing with strings and lists. 

In [18]:
dna_bases = ('Adenine', 'Thymine', 'Cytosine', 'Guanine')

Q: What is the second DNA base in the tuple?

In [None]:
dna_bases[1]

### Slicing 

Slicing is a technique in Python that allows you to extract a portion (a continuous subsequence) of a sequence data type (strings, lists, and tuples). Slicing uses the colon `:` operator to indicate the start and end position of the slice. 

__The syntax for slicing is:__

In [None]:
sequence[start:end]

Where `start` is the index of the first element you want to __include__ in your slice, and `end` is the first element you want to __exclude__ from the slice. 

The resulting slice will include elements from the `start` index (inclusive) to the `end` index (exclusive). 
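
A handy consequence of excluding `end` is that a slice contains `end - start` elements. The sketch below checks this on a short made-up sequence; the names `sequence`, `start`, `end`, and `piece` are only illustrative.

```python
sequence = "ATGCGTAAGC"   # short made-up sequence
start, end = 2, 6

piece = sequence[start:end]
print(piece)

# Because end is excluded, the slice holds end - start elements
print(len(piece) == end - start)
```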

__Slicing with strings:__

Let's continue to use the string we created before.

In [8]:
print(gene)

ATGCGTAAGCTTAGACCGATCATGCCGCTAGGCTACGAAGTCCATAG


Q: What are the first 5 bases of the string `gene`? 

In [9]:
gene[0:5]

'ATGCG'

Q: What are the 5th to 13th bases of the string?

In [10]:
gene[4:13]

'GTAAGCTTA'

You can also leave the start or end index blank. With a blank start, the slice runs from the beginning of the sequence to the end index; with a blank end, it runs from the start index to the end of the sequence.

In [None]:
sequence[:end]
sequence[start:]

Q: What are the first 11 bases of the string?

In [None]:
gene[:11]

Q: What is the sequence from the 18th base to the end?

In [None]:
gene[17:]

__Negative indexing:__

We can also use negative indexing when slicing. Negative indexing allows you to reference elements from the end of the sequence. When using negative indexing in slices, -1 refers to the last element, -2 refers to the second-to-last element, and so on. 

Q: What is the sequence from the 6th-to-last base to the 2nd-to-last base?

In [11]:
print(gene)

ATGCGTAAGCTTAGACCGATCATGCCGCTAGGCTACGAAGTCCATAG


In [12]:
gene[-6:-1]

'CCATA'

Q: What are the last 5 bases of the string?

In [13]:
gene[-5:]

'CATAG'

Q: What is the sequence from the beginning to the 11th-to-last base of the string?

In [14]:
gene[:-10]

'ATGCGTAAGCTTAGACCGATCATGCCGCTAGGCTACG'

__To reverse a sequence:__

In [15]:
gene[::-1]

'GATACCTGAAGCATCGGATCGCCGTACTAGCCAGATTCGAATGCGTA'

In [16]:
print(gene)

ATGCGTAAGCTTAGACCGATCATGCCGCTAGGCTACGAAGTCCATAG
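
The reversal above relies on an optional third slice value, the step, written as `sequence[start:end:step]`. The sketch below is a minimal illustration; the variable names `every_third` and `reversed_gene` are hypothetical.

```python
gene = "ATGCGTAAGCTTAGACCGATCATGCCGCTAGGCTACGAAGTCCATAG"

# A step of 3 picks the bases at index 0, 3, 6, ...
every_third = gene[::3]

# A step of -1 walks backwards through the whole string
reversed_gene = gene[::-1]

print(every_third)
print(reversed_gene)
```

Slicing always returns a new object, which is why printing `gene` afterwards shows the original, unreversed sequence.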


__Slicing with lists:__

Slicing lists is similar to how we slice strings. We will use the amino_acids list we created before for practising.

In [None]:
print(amino_acids)

Slicing the first 3 amino acids from the list:

In [None]:
amino_acids[:3]

Slicing the 13th to 16th amino acids from the list:

In [None]:
amino_acids[12:16]

Slicing from the 8th-to-last element to the end of the list:

In [None]:
amino_acids[-8:]

Reverse `amino_acids`:

In [None]:
amino_acids[::-1]

__Slicing with tuples:__

Slicing tuples is similar to how we slice strings and lists. We will practise on the dna_bases tuple we created before. 

In [None]:
print(dna_bases)

Get the first 2 elements of the tuple:

In [19]:
dna_bases[:2]

('Adenine', 'Thymine')