# Jupyter Notebook
---
Jupyter notebooks let you run code interactively and inspect the results in blocks called cells.

## Interface

#### Kernel
- Restart the Kernel: Use this option if your notebook is stuck or you want to clear all variables and start fresh.
- Shut Down the Kernel: Stops execution and discards all variables held in memory; the notebook's contents (code, text, and saved outputs) are retained.
- Select a Kernel: You can switch between different kernels (e.g., Python, R) if you have multiple environments installed.

#### Cell
- add / delete
- copy / paste
- move
- undo / redo

## Operations

#### Switching Modes:
- Press `Enter` to enter Edit Mode (where you can type inside a cell).
- Press `Esc` to exit Edit Mode and return to Command Mode (where you can use shortcuts to manipulate cells).

#### Running a Cell:
- Press `Shift-Enter` to execute the code or render text in the current cell and automatically move to the next cell.

#### Changing Cell Type:
- In Command Mode, press `y` to change the current cell to a Code cell (used for writing code).
- In Command Mode, press `m` to change the current cell to a Markdown cell (used for writing formatted text, headings, lists, etc.).

#### Adding New Cells:
- In Command Mode, press `a` to create a new cell above the current one.
- In Command Mode, press `b` to create a new cell below the current one.

#### Deleting a Cell:
- Press `d` twice (`d`, `d`) while in Command Mode to delete the current cell.

In [None]:
print('Hello world')

# Fundamentals
---

## Variables
- integer
- float
- boolean
- string

### integer

In [None]:
a = 1
print(type(a))
print(a)

In [None]:
a = a + 3
print(a)

In [None]:
a = a**2
print(a)

### float

In [None]:
a = 1.5
print(type(a))
print(a)

In [None]:
a = a / 3
print(a)

### boolean

In [None]:
a = True
b = False
print('a =', a)
print('b =', b)

In [None]:
print(a & b)  # logical AND; for plain booleans, `a and b` gives the same result

In [None]:
print(a | b)

### string

In [None]:
a = 'This is a string'
print(type(a))
print(a)

In [None]:
print(len(a))

In [None]:
b = ', please enjoy it'
print(a + b)

In [None]:
a[0]

In [None]:
a[0:4]  # same as a[:4]: the first four characters

## Data Types and Structures
- List: Ordered collection of items
- Set: Unordered collection of unique items
- Tuple: Ordered, immutable collection of items
- Dictionary: Collection of key-value pairs with unique keys (preserves insertion order since Python 3.7)

| Feature | List | Set | Tuple | Dict |
| --- | --- | --- | --- | --- |
| Ordered | Yes | No | Yes | Insertion order (Python 3.7+) |
| Indexing | Yes | No | Yes | Keys act as indexes |
| Mutable | Yes | Yes | No | Yes |
| Allows Duplicates | Yes | No | Yes | Keys: No, Values: Yes |
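
The properties summarized above can be checked directly. The following is a minimal sketch; the variable names (`items`, `unique`, `point`, `ages`) are made up for illustration and are not used elsewhere in this notebook. Note that dictionaries keep their keys in insertion order on Python 3.7 and later.

```python
# List: ordered, mutable, keeps duplicates
items = [3, 1, 3]
items[0] = 9
print(items)

# Set: duplicates collapse into a single element
unique = set([3, 1, 3])
print(len(unique))

# Tuple: ordered and indexable, but item assignment raises TypeError
point = (1, 2)
tuple_error = None
try:
    point[0] = 5
except TypeError as err:
    tuple_error = str(err)
print(tuple_error)

# Dictionary: keys are unique, so assigning to an existing key overwrites it
ages = {'b': 2, 'a': 1}
ages['b'] = 20
print(list(ages))  # keys come back in insertion order
```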

### list

In [None]:
l = [1, 2, 3, 4]
print(l)

In [None]:
# indexing

print(l[1])

In [None]:
# mutable

l[2] = 5
print(l)

In [None]:
# duplicates are allowed: 5 now appears twice

l.append(5)
print(l)

### set

In [None]:
s = {1, 2, 3, 4}
print(s)

In [None]:
# mutable

s.add(5)
print(s)

In [None]:
s.remove(4)
print(s)

### tuple

In [None]:
t = (1, 2, 3, 4)
print(t)

In [None]:
# indexing

print(t[2])

### dictionary

In [None]:
d = {'a': 1, 'b': 2, 'c': 3}
print(d)

In [None]:
# indexing

print(d['c'])

In [None]:
# mutable

d['b'] = 4
print(d)

In [None]:
# duplicate values are allowed (keys must be unique)

d['d'] = 1
print(d)

## Conditional Statements

Conditional statements allow your program to make decisions based on certain conditions.

- `if`: Executes a block of code if the condition is `True`.
- `elif`: (short for "else if") Checks another condition if the previous `if` or `elif` was `False`.
- `else`: Executes a block of code if all previous conditions were `False`.

In [None]:
today = "friday"

if today == "friday":
    print("Thank goodness it's Friday!")
elif today == "thursday":
    print("One more day to Friday!")
else:
    print("Can't wait until it's Friday!")

## Looping and Iteration

Looping allows you to execute a block of code repeatedly, either a specific number of times or until a condition is met.

- `for` Loop: Used for iterating over a sequence (e.g., list, tuple, string, range, dictionary). Executes a block of code for each element in the sequence.

- `while` Loop: Repeats as long as a specified condition evaluates to `True`. Useful when the number of iterations is not predetermined.

In [None]:
# for loop

s = [1, 2, 3, 4]
for i in s:
    print(i)

In [None]:
# while loop

k = 0
while k < 10:
    print(k)
    k += 1
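
A `for` loop works the same way over the other sequence types mentioned above. This is a small sketch with made-up names (`squares`, `letters`, `counts`, `pairs`) that iterates over a `range`, a string, and a dictionary.

```python
# range: yields the numbers 0, 1, 2
squares = []
for n in range(3):
    squares.append(n ** 2)
print(squares)

# string: yields one character at a time
letters = []
for ch in 'DNA':
    letters.append(ch)
print(letters)

# dictionary: .items() yields (key, value) pairs
counts = {'A': 2, 'C': 1}
pairs = []
for base, count in counts.items():
    pairs.append((base, count))
print(pairs)
```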

## Functions

A function is a block of reusable code that performs a specific task.

In [None]:
def F(x):
    return x + 2

x = 1
print(F(x))

In [None]:
def F(x):
    if x%2 == 0:
        print("x is even")
    else:
        print("x is odd")

x = 4
F(x)

**NOW YOU TRY: Write a Function to List All Even Numbers**

Task:
Create a function that takes a list of numbers as input and returns a list of all even numbers from the input.

The function should:
- Accept a single list of numbers as input.
- Use the modulo operator `%` to check if a number is even.
- Return a new list containing only the even numbers from the input list.

# Packages
---

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

## Numpy
A library for high-performance operations on multi-dimensional arrays and matrices, along with mathematical functions.

In [None]:
li = [1,2,3,4]
arr = np.array(li)

In [None]:
# sum

np.sum(arr)

In [None]:
# mean

np.mean(arr)

**NOW YOU TRY: Write a Function to Calculate Standard Deviation**

Task:
Create a function that takes a list of numbers as input and returns the standard deviation (STD).

$$
\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}
$$

In [None]:
def STD(li):
    # calculate the mean
    
    # calculate the squared difference
    
    # compute the average of squared difference
    
    # take the square root
    
    return std

In [None]:
# using numpy

np.std(arr)
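
One detail worth knowing when comparing your result with NumPy: by default `np.std` divides by N, as in the formula above (the population standard deviation). Passing `ddof=1` divides by N - 1 instead (the sample standard deviation). The sketch below rebuilds the same four-element array; the names `population_std` and `sample_std` are illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3, 4])

# Default (ddof=0): divide the summed squared differences by N
population_std = np.std(arr)

# ddof=1: divide by N - 1 instead
sample_std = np.std(arr, ddof=1)

print(population_std)
print(sample_std)
```

For this array the squared differences from the mean sum to 5, so the two results are the square roots of 5/4 and 5/3 respectively.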

## Pandas
A library for data manipulation and analysis, particularly for tabular data.

In [None]:
# read a tsv file
file = 'data/gene_chrom.tsv'
gene_chrom_table = pd.read_csv(file, sep='\t')

### check out the data

In [None]:
gene_chrom_table.shape

In [None]:
gene_chrom_table.columns

In [None]:
gene_chrom_table.head()

In [None]:
gene_chrom_table.tail()

### indexing

In [None]:
gene_chrom_table.loc[3]

In [None]:
gene_chrom_table.loc[3, 'geneSymbol']

### filtering

In [None]:
gene_chrom_table[gene_chrom_table.geneSymbol == 'TP53']

In [None]:
gene_chrom_table[(gene_chrom_table.geneSymbol == 'TP53') | (gene_chrom_table.geneSymbol == 'BRCA1')]

In [None]:
gene_chrom_table[gene_chrom_table.geneSymbol.isin(['TP53', 'BRCA1'])]

### grouping

In [None]:
gene_chrom_table.groupby('chromosome').count()

---

### read an atypical tsv file

In [None]:
file = 'data/chrom_lengths.tsv'
chrom_length_table = pd.read_csv(file, sep=' ', names=['chrom', 'length'])
chrom_length_table.head()

In [None]:
chrom_length_table['length'].dtype

In [None]:
# parse numbers written with thousands separators (e.g. 1,000) as integers
chrom_length_table = pd.read_csv(file, sep=' ', names=['chrom', 'length'], thousands=',')
chrom_length_table.head()

In [None]:
chrom_length_table['length'].dtype
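
The effect of `thousands=','` can be reproduced without the data file. The sketch below assumes the file holds space-separated rows with comma-grouped numbers; the two inline rows in `text` are stand-ins for illustration, not the actual contents of `data/chrom_lengths.tsv`.

```python
import io
import pandas as pd

# Illustrative stand-in for a space-separated file with comma-grouped numbers
text = "1 248,956,422\n2 242,193,529\n"

# Without `thousands`, the commas force the column to be read as text
raw = pd.read_csv(io.StringIO(text), sep=' ', names=['chrom', 'length'])

# With `thousands=','`, the commas are stripped and the column becomes integers
parsed = pd.read_csv(io.StringIO(text), sep=' ', names=['chrom', 'length'], thousands=',')

print(raw['length'].dtype)
print(parsed['length'].dtype)
print(parsed['length'].sum())
```

Only the parsed column supports arithmetic such as `sum()` or division.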

### modify data

In [None]:
# string
'chr' + chrom_length_table['chrom']

In [None]:
# float
chrom_length_table['length']/(10**6)

In [None]:
chrom_length_table['length(Mb)'] = chrom_length_table['length']/(10**6)
chrom_length_table.head()

### set and reset index

In [None]:
chrom_length_table = chrom_length_table.set_index('chrom')
chrom_length_table.tail()

In [None]:
chrom_length_table = chrom_length_table.reset_index()
chrom_length_table.tail()

---

### Gene density

**NOW YOU TRY: Compute the Gene Density**

Task: Compute the gene density for each chromosome.
$$
\text{gene density} = \frac{\text{gene count}}{\text{length (Mb)}}
$$
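
To see the formula in action before working with the real tables, here is a sketch on toy data. Everything in it is made up for illustration (`toy_genes`, `toy_lengths`, the gene and chromosome names, and the lengths); only the column names mirror the ones used in this notebook.

```python
import pandas as pd

# Toy gene table: two genes on chrA, one on chrB
toy_genes = pd.DataFrame({
    'geneSymbol': ['G1', 'G2', 'G3'],
    'chromosome': ['chrA', 'chrA', 'chrB'],
})

# Toy chromosome lengths, already expressed in Mb
toy_lengths = pd.DataFrame({
    'chrom': ['chrA', 'chrB'],
    'length(Mb)': [4.0, 0.5],
})

# Count genes per chromosome
toy_counts = (
    toy_genes.groupby('chromosome')
    .size()
    .rename('gene_count')
    .reset_index()
)

# Join counts with lengths, then apply the formula row by row
toy_merged = toy_counts.merge(toy_lengths, left_on='chromosome', right_on='chrom')
toy_merged['gene_density'] = toy_merged['gene_count'] / toy_merged['length(Mb)']

print(toy_merged[['chrom', 'gene_count', 'gene_density']])
```

Here chrA has 2 genes over 4 Mb and chrB has 1 gene over 0.5 Mb, so the shorter chromosome ends up with the higher density.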

In [None]:
# rename chromosome in chrom_length_table


In [None]:
# get number of genes of each chromosome using groupby


In [None]:
# merge tables


In [None]:
# compute gene density


## Seaborn
A library for the creation of statistical plots.

### Bar plot

In [None]:
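# merge_table is the merged table from the gene density exercise above;
# it needs the columns 'chrom' and 'gene_density' (either may be the index)
plot_df = merge_table.reset_index()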
sns.barplot(data=plot_df, x='chrom', y='gene_density')

In [None]:
# rotate the x-axis labels
sns.barplot(data=plot_df, x='chrom', y='gene_density')
_ = plt.xticks(rotation=45)

### Scatter plot

In [None]:
sns.scatterplot(x=[1,2,3,4], y=[4,3,2,1])

**NOW YOU TRY: Visualize the Distribution of Gene Counts and Chromosome Lengths**

Task: Create a scatter plot to visualize the relationship between gene counts (x-axis) and chromosome lengths (y-axis).