# Welcome to Python programming via the Jupyter interface
***
### Advice
* We are going to spend 6 sessions (9 hours) on this interface. If you want to make the most of these sessions, or take a first step toward mastering Python, practice on your own and learn from other sources alongside this course.
* One good place to start is MIT OCW's [6.0001 course](https://ocw.mit.edu/courses/6-0001-introduction-to-computer-science-and-programming-in-python-fall-2016/)
***
### Our objectives
1. Get you familiar with general features of computer programming
2. Let you practice editing existing code to get the desired results
3. Introduce you to data analysis techniques and their implementation in Python

## First, there are two main types of panels (called *cells*) in the Jupyter interface
* This text-only panel is called a **markdown** panel (or **text** panel on Google Colab)
* The panel below, which contains runnable commands, is called a **code** panel
  * Even in the code panel, you can add descriptions of your code using the **#** symbol

In [None]:
x = 5            ## define x
x = x * 20       ## some calculation
print('x is', x) ## print something to the screen

### Markdown helps readers, including your future self, understand what the code does
You can make your markdown panel very [__fancy__](https://medium.com/ibm-data-science-experience/markdown-for-jupyter-notebooks-cheatsheet-386c05aeebed)

## Next is a markdown demo
Double-click on it to see how **boldface**, *italic*, and tables are formatted

## Lab note for Nov 5th, 2023
### Data were acquired by two laboratories on two sequencing platforms
*   **M0** dataset was acquired on an **Illumina** platform by a group at CU
*   **JI** and **KI** datasets were acquired by a group at JP on **454** and **Illumina** platforms, respectively

### *M0* dataset was acquired in 3 sequencing batches
*    Batch ID can be found in **M0_manifest.tsv**

### For each dataset, there are two files:
1.   **_feature.tsv** files contain an OTU abundance table

| OTU ID   | sample-1 | sample-2 |    ...   | sample-N |
| :-: | :-: | :-: | :-: | :-: |
| OTU-1   | 39 | 214 |    ...   | 380 |
| OTU-2   | 359 | 0 |    ...   | 495 |
| ...   | ... | ... |    ...   | ... |
| OTU-M   | 155 | 108 |    ...   | 12 |

2.   **_taxonomy.tsv** files contain the taxonomy mapping for each OTU

| OTU ID   | taxonomy | confidence |
|  :-:  | :- | -: |
| OTU-1   | D_1__Actinobacteria;D_2__Coriobacteriia;D_3__Coriobacteriales;D_4__Coriobacteriaceae;D_5__Collinsella | 0.99 |
| OTU-2   | D_1__Firmicutes;D_2__Clostridia;D_3__Clostridiales;D_4__Lachnospiraceae;D_5__Fusicatenibacter | 0.971 |
| ...   | ... | ... |
| OTU-M   | D_1__Firmicutes;D_2__Clostridia;D_3__Clostridiales;D_4__Ruminococcaceae;D_5__Subdoligranulum | 0.98 |

## Now, let's dive into some code
### Tips:
* Use **print()** when you are unsure about what the value of a variable is
* **print(x, y, z)** will print *x*, *y*, and *z* at once
***
### Variable assignments
In Python, a variable can hold a value of any data type, and it can later be reassigned to a value of a different type

In [None]:
x = 7
print(x)

x = 'computational biology'
print(x)
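To check which data type a variable currently holds, the built-in **type()** function can help. A small sketch (the value 7.5 is just an illustration):

```python
x = 7
print(type(x))  # <class 'int'>

x = 'computational biology'
print(type(x))  # <class 'str'>

x = 7.5
print(type(x))  # <class 'float'>
```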

#### We can assign one variable the value of another

In [None]:
y = x
print(y)

#### Reassigning one of them afterwards does not affect the other

In [None]:
x = 5
y = x
print('y =', y, ', x =', x)

x = 10
print('y =', y, ', x =', x)

### Numerical operations
What is the difference between **x / 2** and **x // 2**?

In [None]:
x = 7

print(x + 1)
print(x * 10)
print(x / 2)
print(x // 2)
print(x ** 2)

print('x / 2 is', x / 2)
print('x // 2 is', x // 2)
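After running the cell above, a short sketch like this one makes the difference visible: **/** always produces a float, while **//** rounds the result down to a whole number.

```python
half = 7 / 2          # "/" is true division and always returns a float
floor_half = 7 // 2   # "//" is floor division: it rounds the result down

print(half, type(half))              # 3.5 <class 'float'>
print(floor_half, type(floor_half))  # 3 <class 'int'>
print(8 / 2)                         # 4.0: still a float, even when the division is exact
print(-7 // 2)                       # -4: rounding down goes toward negative infinity
```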

#### Can you guess what the % operator does?

In [None]:
print(15 % 4)
print(20 % 7)
print(100 % 13)

#### How about the ** operator?

In [None]:
x = 3
print(x ** 3)
print('What does "**" do?')

### Tips:
* A variable can be updated with the result of an operation on its own value, e.g. x = x + 2

In [None]:
x = 7.9
x = x + 2
print('x =', x)

y = 7.9
y = y / 2 # divide y by 2
y = y - 5
print('y =', y)
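Python also has shorthand *augmented assignment* operators (+=, -=, *=, /=, //=, %=) that combine an operation with the update, so x += 2 means the same as x = x + 2. A minimal sketch with integers (the variable name n is just an example):

```python
n = 10
n += 5   # same as n = n + 5  -> 15
n *= 2   # same as n = n * 2  -> 30
n //= 4  # same as n = n // 4 -> 7
n %= 4   # same as n = n % 4  -> 3
print('n =', n)  # n = 3
```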

### String operations
* A string can be defined with either single (**'**) or double (**"**) quotes
* Strings can be added (concatenated) with **+**

In [None]:
msg1 = 'hello'
msg2 = "world"
print(msg1)

msg = msg1 + ' ' + msg2
print(msg)
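Two more string operations that often come in handy: **\*** repeats a string, and **len()** counts its characters. A small sketch:

```python
msg1 = 'hello'
msg2 = "world"

print(msg1 + ', ' + msg2 + '!')  # hello, world!
print('AT' * 3)                   # ATATAT: "*" repeats a string
print(len(msg1 + msg2))           # 10: len() counts the characters
```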

#### A string can be reformatted in several ways
Notice that the original string remains unaltered: these methods return a new string instead of changing the original one

In [None]:
msg = 'deep learning'
print('capitalized:', msg.capitalize())
print('uppercase:', msg.upper())
print()
print('original value:', msg)

#### Reassign the result to the variable if you want the change to be permanent

In [None]:
msg = 'deep learning'
msg = msg.capitalize()
print('capitalized:', msg)

#### str.replace(x, y) will replace all occurrences of x with y in the string

In [None]:
msg = 'deep learning with deep network'
print(msg.replace('deep', 'machine'))
print(msg.replace('learning', 'tech').upper())
print()
print(msg)
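**str.replace()** also accepts an optional third argument that limits how many occurrences are replaced, and **str.count()** reports how many times a substring appears. A short sketch:

```python
msg = 'deep learning with deep network'

print(msg.count('deep'))                  # 2
print(msg.replace('deep', 'machine', 1))  # machine learning with deep network (first occurrence only)
```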

#### str.strip() removes all whitespace (spaces, tabs, and newlines) from the beginning and the end of the string

In [None]:
msg = '    DNA-RNA   '
print('|' + msg + '|')          ## the | marks show where the string begins and ends
print('|' + msg.strip() + '|')
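Related methods **lstrip()** and **rstrip()** trim only one side, and **strip()** can also be given the characters to remove. The FASTA-style header '>seq1' below is a made-up example:

```python
msg = '    DNA-RNA   '
print('|' + msg.lstrip() + '|')  # |DNA-RNA   |  (left side only)
print('|' + msg.rstrip() + '|')  # |    DNA-RNA|  (right side only)

header = '>seq1\n'               # hypothetical FASTA header line, newline included
print(header.strip().strip('>'))  # seq1: remove whitespace first, then the ">" character
```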

#### str.startswith() and str.endswith() can be used to check the prefix and suffix of a string

In [None]:
gene = 'ENST000123129'
print(gene.startswith('ENST'))

In [None]:
file_name = 'hg19.genomic.fasta'
print(file_name.endswith('.fasta'))
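Both methods also accept a tuple of alternatives, which is handy when a file can have several extensions. The file name 'hg19.genomic.fa' is only an illustration:

```python
file_name = 'hg19.genomic.fa'

print(file_name.endswith(('.fasta', '.fa')))  # True: matches any suffix in the tuple
print(file_name.startswith('hg38'))            # False
```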

### Different variable types may not be compatible
Adding a number to a string does not make sense, so the cell below generates our first **Error message** (a *TypeError*)

In [None]:
print(2 + 'two')
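One way around this kind of error is to convert one value to the other's type first, using the built-in **int()**, **float()**, or **str()** functions. A minimal sketch:

```python
print(2 + int('3'))      # 5: convert the string to a number first
print(str(2) + 'two')    # 2two: or convert the number to a string
print(float('4.5') * 2)  # 9.0
```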

## Error messages
### Tips:
* Do not be scared of them. Error messages often provide helpful information and will point you to where the problem is
* Look for the **line numbers** and the **--> arrows pointing to the issue**
* Read the error's type (*NameError*, *SyntaxError*, etc.) and the details
* **Searching these errors on the internet or asking ChatGPT often yields a solution**

[An example of Google search result](https://www.google.com/search?q=NameError%3A+name+%27unknown1%27+is+not+defined&oq=NameError%3A+name+%27unknown1%27+is+not+defined&gs_lcrp=EgZjaHJvbWUyBggAEEUYOTIGCAEQRRg60gEHNjA2ajBqN6gCALACAA&sourceid=chrome&ie=UTF-8) for the error below

In [None]:
print('NameError occurs when you use an undefined variable')
x = unknown1

print('The code will terminate before this part gets printed')

#### A variable name cannot begin with a number
Also notice the green highlight for the number **1**

In [None]:
1x = 4.89

#### An illegal mathematical operation, such as division by zero, will also produce an error

In [None]:
x = 6 / 0

### Some names are reserved
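Do not use names that are automatically highlighted in color by the Jupyter interface. Keywords such as **with**, **del**, **and**, **or**, **not**, **True**, and **False** cannot be used as variable names at all, while built-in names such as **int**, **list**, **print**, and **open** can be overwritten but should not be. The cell below only demonstrates the highlighting: running it raises a *SyntaxError*, because keywords like **with** cannot stand alone.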

In [None]:
int
float
str
bool

list
dict
tuple

print
open
with
del

and
or
not

True
False
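If you are unsure whether a name is safe, Python can tell you. This sketch uses the standard **keyword** and **builtins** modules; the helper **classify_name()** is a hypothetical function written for this example:

```python
import builtins
import keyword


def classify_name(name):
    """Hypothetical helper: report whether a name is a keyword, a built-in, or free to use."""
    if keyword.iskeyword(name):
        return 'reserved keyword (cannot be used as a variable name)'
    if hasattr(builtins, name):
        return 'built-in name (avoid overwriting it)'
    return 'free to use'


for name in ['int', 'list', 'print', 'with', 'and', 'True', 'gene_count']:
    print(name, '->', classify_name(name))
```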

## Warm-up task: Python as a calculator
### Find the value of A $= \left(\frac{2.45 \times 3.7}{8.1 - 0.5 \times (9.4 + 1.2)}\right)^{1.4}$

Handling complex mathematical operations requires *careful balancing of parentheses*
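Parentheses matter because Python applies operators in a fixed order: **\*\*** first, then **\***, **/**, **//**, **%**, then **+** and **-**. The values below are arbitrary and only illustrate that order:

```python
a = 2 + 3 * 4      # multiplication first: 14
b = (2 + 3) * 4    # parentheses first: 20
c = 2 ** 3 ** 2    # ** groups right to left: 2 ** 9 = 512
d = -2 ** 2        # ** is applied before the minus sign: -4
e = (-2) ** 2      # 4
print(a, b, c, d, e)  # 14 20 512 -4 4
```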

In [None]:
### fill in your answer here

### To compute advanced functions, we need the [math module](https://docs.python.org/3/library/math.html)

In [None]:
import math

print(math.log10(1000))
print(math.sin(math.pi / 2))
print(math.e)
print(math.exp(5))
print(math.pow(2, 6))

#### When using a function from a module, always include the module name
Python understands *math.log10* as the function *log10* from the *math* module. Without the prefix, as in the second line below, Python raises a *NameError*

In [None]:
print(math.log10(1000))
print(log10(1000))
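Alternatively, specific functions can be imported by name with **from ... import ...**, after which the prefix is no longer needed. The trade-off is that it becomes less obvious which module a function came from:

```python
from math import log10, sqrt

# Names imported with "from ... import ..." can be used without the "math." prefix
print(log10(1000))  # 3.0
print(sqrt(16))     # 4.0
```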

### Find the value of B $= \frac{1 - e^{-2}}{\ln(7)}$

In [None]:
### fill in your answer here

## Defining your own functions
Instead of using only existing functions, like print() or math.log(), you can define custom functions with **def** (define)

In [None]:
def special_add(v1, v2):
    return v1 * 3 + v2 * 5

In [None]:
print(special_add(3.6, 10.2))

#### A function does not have to return anything (it then returns *None*)

In [None]:
def wanna_print(message):
    print('I want to', message)

In [None]:
wanna_print('sleep')
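To see what such a function gives back, store its result and print it. A short sketch reusing **wanna_print()**:

```python
def wanna_print(message):
    print('I want to', message)


result = wanna_print('eat')  # prints: I want to eat
print(result)                # None: the function has no return statement
```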

## Task 1: Write a simple calculator
* The following function takes three inputs from the user: two numbers, x and y, and a mathematical command given as a string: *add*, *subtract*, *multiply*, or *divide*
* Then, the output of the calculation is returned
* However, if the user enters an unexpected command, a message is printed and *None* is returned by default

**if-elif-else** is a control structure that lets us define how the code should behave under each condition

Note that **elif** is short for *else if*

In [None]:
def simple_calculator(x, y, command):
    if command == 'add':
        return ### fill in your answer here
    elif command == 'subtract':
        return ### fill in your answer here
    elif command == 'multiply':
        return ### fill in your answer here
    elif command == 'divide':
        return ### fill in your answer here
    else:
        print('Error, unknown command: ', command)

#### Test your work here

In [None]:
print(simple_calculator(5, 10, 'divide'))     ## should get 0.5
print(simple_calculator(2, 3, 'multiply'))    ## should get 6
print(simple_calculator(2, 8, 'exponential')) ## should get an error message (and None)

## A function can return multiple values (packed into a tuple)

In [None]:
def my_swap(x, y):
    return y, x

In [None]:
print(my_swap(1, 2))
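The returned values come back as a single tuple, which can be *unpacked* into separate variables in one line. A small sketch:

```python
def my_swap(x, y):
    return y, x


pair = my_swap(1, 2)
print(type(pair))            # <class 'tuple'>

a, b = my_swap(1, 2)         # unpack the returned tuple into two variables
print('a =', a, ', b =', b)  # a = 2 , b = 1
```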

## Task 2: Write a function that returns the arithmetic, geometric, and harmonic means
* Arithmetic mean of x and y is $\frac{x + y}{2}$
* Geometric mean of x and y is $\sqrt{xy}$
* Harmonic mean of x and y is $\frac{2}{\frac{1}{x} + \frac{1}{y}}$

In [None]:
def get_means(x, y):
    return ### fill in your answer here

In [None]:
print(get_means(1000, 10)) ## should get (505.0, 100.0, 19.801980198019802)
print(get_means(200, 30))  ## should get (115.0, 77.45966692414834, 52.173913043478265)

## If-else control statement
Let's explore the building blocks behind the if-else statement: condition checking and booleans

#### Notice the difference between assignment and condition checking
* command = 'add' is an assignment
* command == 'add' is condition checking (whether the value of *command* variable is the string 'add')
* **a != b** (not equal) gives the same result as **not a == b**

#### The output of condition checking is either *True* or *False*; these two values are called **booleans**

In [None]:
command = 'add'

print(command == 'add')
print(command == 'subtract')

print(command != 'add')
print(not command == 'add')
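Besides **==** and **!=**, numbers can be compared with **<**, **>**, **<=**, and **>=**, which also produce booleans. The values below are arbitrary examples:

```python
guess = 5
target = 7

print(guess < target)   # True: less than
print(guess > target)   # False: greater than
print(guess <= 5)       # True: less than or equal to
print(guess >= target)  # False: greater than or equal to
```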

## Boolean

In [None]:
t = True

if t:
    print('"t" is True')
else:
    print('"t" is False')
    
f = False

if f:
    print('"f" is True')
else:
    print('"f" is False')

#### Boolean operations: and, or, not

In [None]:
print(t and f)
print(t or f)
print(not t)

#### Be careful about the order in which boolean operations are applied (like the order of +, -, *, /)
**not** is applied first, then **and**, then **or**. Use parentheses to make the order of operations clear

In [None]:
print(not t and f)
print(not (t and f))
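The same priority rule applies between **and** and **or**: **and** is evaluated before **or**. A short sketch:

```python
t = True
f = False

print(t or t and f)    # True: "and" goes first, so this is t or (t and f)
print((t or t) and f)  # False: the parentheses force "or" to go first
```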

## Task 3: Let's create a simple guessing game
* You define a secret target value of 7
* The player will enter a guess
* Your function should print feedback on whether the guess is *higher than*, *lower than*, or *equal to* the secret target

In [None]:
target = 7

def guessing_game(guess):
    if ### fill in your answer here
        print() ### fill in your answer here
    elif ### fill in your answer here
        print() ### fill in your answer here
    else:
        print() ### fill in your answer here

## Next, we are moving on from primitive variables to objects
Objects are more complex data structures that can hold multiple values, such as a list, a dictionary (mapping), or a tuple

## List
A list is a 1D data structure that stores multiple elements in order: [a, b, ...]

A string can be thought of as a list of characters

### Tips:
* In Python (and many other programming languages), indexing begins at 0, not 1
* *a*[0] is the first element of a list *a*, not *a*[1]
* Likewise, *a*[-1] is the last element of a list *a*

In [None]:
mylist = [1, 1, 2, 3, 5, 8, 13, 21, 34]

print(mylist)
print('first element:', mylist[0])
print('second element:', mylist[1])
print('third element:', mylist[2])
print('sixth element:', mylist[5])

#### List members can be conveniently accessed from the end as well

In [None]:
print(mylist)
print('last element:', mylist[-1])
print('second last element:', mylist[-2])

#### Multiple consecutive members can be extracted via slicing with the : symbol
### Tips:
* *a*[i:j] will extract *a*[i], *a*[i+1], ..., up to *a*[j - 1], but will not include *a*[j]

In [None]:
mylist = [1, 1, 2, 3, 5, 8, 13, 21, 34]
print(mylist)
print('third and fourth elements:', mylist[2:4])
print('the first three elements:', mylist[:3])
print('the last five elements:', mylist[-5:])

#### We can control the slicing by specifying start:stop:step

In [None]:
mylist = [1, 1, 2, 3, 5, 8, 13, 21, 34]
print(mylist[1:6:2])
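Any of the three parts can be left out (start defaults to the beginning, stop to the end, step to 1), and a negative step walks backwards. A small sketch on the same list:

```python
mylist = [1, 1, 2, 3, 5, 8, 13, 21, 34]

print(mylist[1:6:2])  # [1, 3, 8]: indices 1, 3, 5
print(mylist[::2])    # [1, 2, 5, 13, 34]: every 2nd element of the whole list
print(mylist[::-1])   # [34, 21, 13, 8, 5, 3, 2, 1, 1]: a negative step reverses the list
```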

## Assigning a list to a new variable does not copy it: both names refer to the same list
### Tips:
* Use *a2* = *a1*[:] instead of *a2* = *a1* (*a1*[:] creates a new list containing all elements of *a1*, instead of a second name for the same list)

In the first cell below, notice that changing one list also changes the other. In the second cell, which copies the list with [:], the two lists stay independent

In [None]:
mylist = [1, 1, 2, 3, 5]
yourlist = mylist
print(mylist)
print(yourlist)

mylist[0] = 'my_new_entry'
print(mylist)
print(yourlist)

yourlist[0] = 'your_new_entry'
print(mylist)
print(yourlist)

In [None]:
mylist = [1, 1, 2, 3, 5]
yourlist = mylist[:]
print(mylist)
print(yourlist)

mylist[0] = 'my_new_entry'
print(mylist)
print(yourlist)

yourlist[0] = 'your_new_entry'
print(mylist)
print(yourlist)
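Lists also have a **copy()** method that does the same job as [:], and the **is** operator checks whether two names point to the very same object (while **==** compares contents). Note that both [:] and copy() make *shallow* copies: lists nested inside the list are still shared. A sketch:

```python
mylist = [1, 1, 2, 3, 5]
alias = mylist          # another name for the same list
copy1 = mylist[:]       # a new list made by slicing
copy2 = mylist.copy()   # a new list made with the copy() method

print(alias is mylist)  # True: both names refer to the same object
print(copy1 is mylist)  # False: a different object
print(copy2 == mylist)  # True: "==" compares contents, not identity
```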

### Some list operations

In [None]:
mylist = []
mylist.append(1)
print(mylist)

mylist.extend([2, 3, 4])
print(mylist)
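The difference between **append()** and **extend()** shows up when the argument is itself a list: append() adds it as one element, while extend() adds each of its elements. A short sketch:

```python
a = [1, 2]
a.append([3, 4])  # adds the whole list as ONE new element
print(a)          # [1, 2, [3, 4]]

b = [1, 2]
b.extend([3, 4])  # adds each element of [3, 4] separately
print(b)          # [1, 2, 3, 4]

print(len(a), len(b))  # 3 4
```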

### list.index() returns the position (index) of the first occurrence of an element in a list

In [None]:
department = ['medicine', 'radiology', 'pathology', 'pediatrics']
print(department.index('pediatrics'))

#### However, a *ValueError* is raised when the element cannot be found

In [None]:
print(department.index('AI'))

## Task 4: Handle element not found error from list.index()
Using your knowledge of **if-else**, write a function that avoids the error raised by list.index() when the element is missing (hint: **element in input_list** is *True* if the element is present)

In [None]:
def error_free_index(input_list, element):
    ### fill in your answer here

In [None]:
print(error_free_index(department, 'radiology')) ## should get 1
print(error_free_index(department, 'AI'))        ## should not raise an error

### A string can be thought of as a list of characters
A substring, or the character at a specific position, can be extracted with the same indexing and slicing syntax

In [None]:
seq = 'ATGAACGGGTAG'
print(seq[:3])
print(seq[7])
print(seq[-2:])
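One important difference from a list: a string cannot be changed in place, so assigning to seq[0] raises a *TypeError*. A sketch of the usual workaround, building a new string from slices (the C substitution is only an example):

```python
seq = 'ATGAACGGGTAG'

# seq[0] = 'C' would raise a TypeError, so build a new string instead
mutated = 'C' + seq[1:]
print(mutated)        # CTGAACGGGTAG
print(len(seq))       # 12
print(list(seq[:3]))  # ['A', 'T', 'G']: list() splits a string into characters
```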

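## range: a built-in function for generating a sequence of integers
**range(start, stop, step)** produces start, start + step, ... up to, but not including, stop. It returns a *range* object, which can be converted into a list with **list()**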

In [None]:
print(range(5))
print(list(range(5)))
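A few more calls showing how start, stop, and step interact (the numbers are arbitrary):

```python
print(list(range(2, 10, 3)))   # [2, 5, 8]: start at 2, step by 3, stop before 10
print(list(range(5, 10)))      # [5, 6, 7, 8, 9]: step defaults to 1
print(len(range(0, 100, 10)))  # 10
```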

## For control statement
A for loop runs a block of code repeatedly, once for each item in the sequence it is given

In [None]:
for i in range(3):
    print('wake up')

In [None]:
for i in range(5):
    print(i)

#### The step in range() can be negative to generate a descending sequence

In [None]:
for i in range(3, 0, -1):
    print(i)

## For statement with list
A for loop is a convenient way to go through the contents of a list one element at a time

In [None]:
more_department = ['medicine', 'radiology', 'pathology', 'pediatrics', 'surgery', 'immunology', 
                   'microbiology', 'anesthesiology']

for i in range(0, len(more_department), 2):
    print(i, more_department[i])

#### We can also iterate through elements of a list directly

In [None]:
for x in more_department:
    print(x[:5])
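When both the position and the element are needed, the built-in **enumerate()** yields (index, element) pairs, so no separate range(len(...)) is required. A sketch on the first three departments:

```python
more_department = ['medicine', 'radiology', 'pathology', 'pediatrics', 'surgery', 'immunology',
                   'microbiology', 'anesthesiology']

for i, name in enumerate(more_department[:3]):
    print(i, name)
# 0 medicine
# 1 radiology
# 2 pathology
```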

## Task 5: Print out every department name that ends with -logy

In [None]:
for x in more_department:
    ### fill in your answer here

# Exercises: Let's apply what we learned so far to do some actual analyses

In [None]:
patient_age = [18,        47,   12,     8,      4,     65,      17,      34,      77]
patient_name = ['Alice', 'Bob', 'Clare', 'Don', 'Eric', 'Fei', 'Gabriel', 'Henry', 'Ivan']

## Count the number of patients

In [None]:
number_of_patient = 9 ### replace 9 with an expression that works for any list length (hint: len())
print('there are', number_of_patient, 'patients')

## Calculate the geometric mean of the patients' ages
The example below computes the arithmetic mean; adapt it

In [None]:
total_age = 0

for x in patient_age:
    total_age += x
    
average_age = total_age / number_of_patient ## this uses number_of_patient from the cell above
    
print('average patient age is', average_age)

## Find the lowest age
The example below finds the highest age; adapt it

In [None]:
max_age = 0

for x in patient_age:
    if x > max_age:
        max_age = x
    
print('the oldest patient\'s age is', max_age)

## Find the name of the youngest patient

## Count the number of patients above 60 years old

## List the names of all patients above 60 years old

## And now, the BEST feature of Python
This is a technique called **list comprehension**

In [None]:
y = [i for i in range(5) if i % 2 == 0]
print(y)

#### List comprehension combines a for loop, an if condition, and list creation into one expression
The above code is a shortened form of the following full version

In [None]:
y = []

for i in range(5):
    if i % 2 == 0:
        y.append(i)

print(y)
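The if part is optional, and the expression before **for** can transform each element. A small sketch (the department names are reused only as sample data):

```python
squares = [i ** 2 for i in range(5)]  # no if condition: every element is kept
print(squares)                        # [0, 1, 4, 9, 16]

codes = [name[:3].upper() for name in ['medicine', 'radiology', 'pathology']]
print(codes)                          # ['MED', 'RAD', 'PAT']
```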

### Previous tasks, but with list comprehension
#### Find departments with names ending with -logy

In [None]:
print([x for x in more_department if x[-4:] == 'logy'])

#### Number of patients above 60 years old

In [None]:
age_above_60 = [x for x in patient_age if x > 60]

print('there are', len(age_above_60), 'patients above 60 years old')

#### The names of patients above 60 years old

In [None]:
patient_above_60 = [patient_name[i] for i in range(len(patient_name)) if patient_age[i] > 60]
print(patient_above_60)
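An alternative to indexing with range(len(...)) is the built-in **zip()**, which pairs the two lists element by element. This sketch should give the same result as the cell above:

```python
patient_age = [18, 47, 12, 8, 4, 65, 17, 34, 77]
patient_name = ['Alice', 'Bob', 'Clare', 'Don', 'Eric', 'Fei', 'Gabriel', 'Henry', 'Ivan']

# zip() yields pairs: ('Alice', 18), ('Bob', 47), ...
patient_above_60 = [name for name, age in zip(patient_name, patient_age) if age > 60]
print(patient_above_60)  # ['Fei', 'Ivan']
```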