# Jupyter Notebook
---
Jupyter notebooks let you run code interactively and inspect the results one block (cell) at a time.

## Interface

#### Kernel
- Restart the Kernel: Use this option if your notebook is stuck or you want to clear all variables and start fresh.
- Shut Down the Kernel: Stops the notebook's execution and clears its variables from memory; the notebook's content (cells and saved outputs) is retained.
- Select a Kernel: You can switch between different kernels (e.g., Python, R) if you have multiple environments installed.

#### Cell
- add / delete
- copy / paste
- move
- undo / redo

## Operations

#### Switching Modes:
- Press `Enter` to enter Edit Mode (where you can type inside a cell).
- Press `Esc` to exit Edit Mode and return to Command Mode (where you can use shortcuts to manipulate cells).

#### Running a Cell:
- Press `Shift-Enter` to execute the code or render text in the current cell and automatically move to the next cell.

#### Changing Cell Type:
- Press `y` to change the current cell to a Code cell (used for writing code).
- Press `m` to change the current cell to a Markdown cell (used for writing formatted text, headings, lists, etc.).

#### Adding New Cells:
- Press `a` to create a new cell above the current one.
- Press `b` to create a new cell below the current one.

#### Deleting a Cell:
- Press `d` twice (`d`, `d`) while in Command Mode to delete the current cell.

In [64]:
print('Hello world 111')

Hello world 111


# Fundamentals
---

## Variables
- integer
- float
- boolean
- string

### integer

In [2]:
a = 1
print(type(a))
print(a)

<class 'int'>
1


In [3]:
a = a + 3
print(a)

4


In [4]:
a = a**2
print(a)

16


### float

In [5]:
a = 1.5
print(type(a))
print(a)

<class 'float'>
1.5


In [6]:
a = a / 3
print(a)

0.5


### boolean

In [7]:
a = True
b = False
print('a =', a)
print('b =', b)

a = True
b = False


In [8]:
print(a & b)

False


In [9]:
print(a | b)

True
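
The cells above combine booleans with `&` and `|`, which are Python's bitwise operators. The sketch below compares them with the logical keywords `and`, `or`, and `not`; the variable `x` is introduced only for illustration.

```python
a = True
b = False

# `&` and `|` are bitwise operators; applied to two bools they return a bool.
print(a & b)    # False
print(a | b)    # True

# `and`, `or`, and `not` are the logical keywords for plain booleans.
print(a and b)  # False
print(a or b)   # True
print(not a)    # False

# `&` and `|` bind tighter than comparisons such as `>` and `<`,
# so each comparison needs its own parentheses when they are combined.
x = 3
print((x > 1) & (x < 5))  # True
print(x > 1 and x < 5)    # True
```

The parenthesized form is the one that appears again in the pandas filtering section, where `|` combines two column comparisons element by element.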


### string

In [10]:
a = 'This is a string'
print(type(a))
print(a)

<class 'str'>
This is a string


In [11]:
print(len(a))

16


In [12]:
b = ', please enjoy it'
print(a + b)

This is a string, please enjoy it


In [13]:
a[0]

'T'

In [14]:
a[0:4] # a[:4]

'This'

## Data Types and Structures
- List: Ordered, mutable collection of items
- Set: Unordered collection of unique items
- Tuple: Ordered, immutable collection of items
- Dictionary: Collection of key-value pairs with unique keys (keeps insertion order since Python 3.7)

| Feature | List | Set | Tuple | Dict |
| --- | --- | --- | --- | --- |
| Ordered | Yes | No | Yes | Insertion order (Python 3.7+) |
| Indexing | By position | No | By position | By key |
| Mutable | Yes | Yes | No | Yes |
| Allows Duplicates | Yes | No | Yes | Keys: No, Values: Yes |
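
The properties in the table can be checked directly. The following is a minimal sketch; the names `items`, `point`, `unique`, and `counts` are illustrative and do not appear elsewhere in this notebook.

```python
# Lists and tuples are indexed by position.
items = [3, 1, 2]
point = (3, 1, 2)
print(items[0], point[0])      # 3 3

# Tuples are immutable: item assignment raises a TypeError.
try:
    point[0] = 9
except TypeError as err:
    print("TypeError:", err)

# Sets drop duplicate items.
unique = set([1, 1, 2, 2, 3])
print(len(unique))             # 3

# Dictionary keys are unique: assigning to an existing key overwrites its value.
counts = {"a": 1, "b": 2}
counts["a"] = 10
print(len(counts))             # 2
print(counts["a"])             # 10
```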

### list

In [15]:
l = [1, 2, 3, 4]
print(l)

[1, 2, 3, 4]


In [16]:
# indexing

print(l[1])

2


In [17]:
# mutable

l[2] = 5
print(l)

[1, 2, 5, 4]


In [18]:
# append (a duplicate value would also be accepted, e.g. l.append(1))

l.append(6)
print(l)

[1, 2, 5, 4, 6]


### set

In [19]:
s = {1, 2, 3, 4}
print(s)

{1, 2, 3, 4}


In [20]:
# mutable

s.add(5)
print(s)

{1, 2, 3, 4, 5}


In [21]:
s.remove(4)
print(s)

{1, 2, 3, 5}


### tuple

In [22]:
t = (1, 2, 3, 4)
print(t)

(1, 2, 3, 4)


In [23]:
# indexing

print(t[2])

3


### dictionary

In [24]:
d = {'a': 1, 'b': 2, 'c': 3}
print(d)

{'a': 1, 'b': 2, 'c': 3}


In [25]:
# indexing

print(d['c'])

3


In [26]:
# mutable

d['b'] = 4
print(d)

{'a': 1, 'b': 4, 'c': 3}


In [27]:
# duplicate values are allowed (keys must be unique)

d['d'] = 1
print(d)

{'a': 1, 'b': 4, 'c': 3, 'd': 1}


## Conditional Statements

Conditional statements allow your program to make decisions based on certain conditions.

- `if`: Executes a block of code if the condition is `True`.
- `elif`: (short for "else if") Checks another condition if the previous `if` or `elif` was `False`.
- `else`: Executes a block of code if all previous conditions were `False`.

In [28]:
today = "friday"

if today == "friday":
    print("Thank goodness it's Friday!")
elif today == "thursday":
    print("One more day to Friday!")
else:
    print("Can't wait until it's Friday!")

Thank goodness it's Friday!


## Looping and Iteration

Looping allows you to execute a block of code repeatedly, either a specific number of times or until a condition is met.

- `for` Loop: Used for iterating over an iterable (e.g., list, tuple, string, range, dictionary). Executes a block of code once for each element.

- `while` Loop: Repeats as long as a specified condition evaluates to `True`. Useful when the number of iterations is not predetermined.

In [29]:
# for loop

s = [1, 2, 3, 4]
for i in s:
    print(i)

1
2
3
4


In [30]:
# while loop

k = 0
while k < 10:
    print(k)
    k += 1

0
1
2
3
4
5
6
7
8
9
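
The two cells above loop over a list and count with `while`. As a further sketch, the same `for` syntax works on the other iterables listed earlier, and `break` ends a loop early. The names `ch`, `key`, `value`, and `scores` are chosen here for illustration.

```python
# range: repeat a fixed number of times
for i in range(3):
    print(i)                   # prints 0, 1, 2 on separate lines

# string: one character per iteration
for ch in "abc":
    print(ch)

# dictionary: .items() yields (key, value) pairs
scores = {'a': 1, 'b': 2}
for key, value in scores.items():
    print(key, value)

# while loop that stops early with break
k = 0
while True:
    if k >= 3:
        break
    k += 1
print(k)                       # 3
```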


## Functions

A function is a block of reusable code that performs a specific task.

In [31]:
def F(x):
    return x + 2

x = 1
print(F(x))

3


In [32]:
def F(x):
    if x%2 == 0:
        print("x is even")
    else:
        print("x is odd")

x = 4
F(x)

x is even


**NOW YOU TRY: Write a Function to List All Even Numbers**

Task:
Create a function that takes a list of numbers as input and returns a list of all even numbers from the input.

The function should:
- Accept a single list of numbers as input.
- Use the modulo operator `%` to check if a number is even.
- Return a new list containing only the even numbers from the input list.
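
One possible solution is sketched below; try the exercise first. The function name `get_even_numbers` is an illustrative choice, not something the task prescribes.

```python
def get_even_numbers(numbers):
    """Return a new list containing only the even numbers from `numbers`."""
    evens = []
    for n in numbers:
        # A number is even when dividing by 2 leaves no remainder.
        if n % 2 == 0:
            evens.append(n)
    return evens


print(get_even_numbers([1, 2, 3, 4, 5, 6]))  # [2, 4, 6]
```

The input list is left unchanged because the function builds and returns a separate list.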

# Packages
---

In [33]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

## Numpy
A library for high-performance operations on multi-dimensional arrays and matrices, along with mathematical functions.

In [34]:
li = [1,2,3,4]
arr = np.array(li)

In [35]:
# sum

np.sum(arr)

np.int64(10)

In [36]:
# mean

np.mean(arr)

np.float64(2.5)
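
Beyond `np.sum` and `np.mean`, arrays support element-wise arithmetic and multi-dimensional shapes. The following is a small sketch using the same four values; the name `mat` is introduced here for illustration.

```python
import numpy as np

arr = np.array([1, 2, 3, 4])

# Arithmetic is applied element by element, without a Python loop.
print(arr * 2)          # [2 4 6 8]
print(arr + arr)        # [2 4 6 8]

# Reshape the same four values into a 2 x 2 matrix.
mat = arr.reshape(2, 2)
print(mat.shape)        # (2, 2)

# Aggregate along an axis: axis=0 sums down each column,
# axis=1 sums across each row.
print(mat.sum(axis=0))  # [4 6]
print(mat.sum(axis=1))  # [3 7]
```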

**NOW YOU TRY: Write a Function to Calculate Standard Deviation**

Task:
Create a function that takes a list of numbers as input and returns the standard deviation (STD).

$$
\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}
$$

In [37]:
def STD(li):
    # calculate the mean
    
    # calculate the squared difference
    
    # compute the average of squared difference
    
    # take the square root
    
    return std

In [38]:
# using numpy

np.std(arr)

np.float64(1.118033988749895)
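
One possible way to complete the `STD` exercise, following its four commented steps, is sketched below in plain Python. The name `population_std` is an illustrative choice. It divides by N, as in the formula above, which is also what `np.std` does by default.

```python
import math


def population_std(values):
    """Standard deviation with a 1/N normalization."""
    # calculate the mean
    mean = sum(values) / len(values)
    # calculate the squared differences from the mean
    squared_diffs = [(x - mean) ** 2 for x in values]
    # compute the average of the squared differences (the variance)
    variance = sum(squared_diffs) / len(values)
    # take the square root
    return math.sqrt(variance)


print(population_std([1, 2, 3, 4]))  # about 1.118, in line with np.std(arr)
```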

## Pandas
A library for data manipulation and analysis, particularly for tabular data.

In [39]:
# read a tsv file
file = 'data/gene_chrom.tsv'
gene_chrom_table = pd.read_csv(file, sep='\t')

### check out the data

In [40]:
gene_chrom_table.shape

(30619, 2)

In [41]:
gene_chrom_table.columns

Index(['geneSymbol', 'chromosome'], dtype='object')

In [42]:
gene_chrom_table.head()

| | geneSymbol | chromosome |
| --- | --- | --- |
| 0 | DDX11L1 | chr1 |
| 1 | OR4F5 | chr1 |
| 2 | DQ597235 | chr1 |
| 3 | DQ599768 | chr1 |
| 4 | LOC100132062 | chr1 |


In [43]:
gene_chrom_table.tail()

| | geneSymbol | chromosome |
| --- | --- | --- |
| 30614 | U3 | chrX |
| 30615 | SNORD112 | chrX |
| 30616 | U7 | chrX |
| 30617 | Mir_105 | chrX |
| 30618 | U6 | chrY |


### indexing

In [44]:
gene_chrom_table.loc[3]

geneSymbol    DQ599768
chromosome        chr1
Name: 3, dtype: object

In [45]:
gene_chrom_table.loc[3, 'geneSymbol']

'DQ599768'

### filtering

In [46]:
gene_chrom_table[gene_chrom_table.geneSymbol == 'TP53']

| | geneSymbol | chromosome |
| --- | --- | --- |
| 11939 | TP53 | chr17 |


In [47]:
gene_chrom_table[(gene_chrom_table.geneSymbol == 'TP53') | (gene_chrom_table.geneSymbol == 'BRCA1')]

| | geneSymbol | chromosome |
| --- | --- | --- |
| 11939 | TP53 | chr17 |
| 12301 | BRCA1 | chr17 |


In [48]:
gene_chrom_table[gene_chrom_table.geneSymbol.isin(['TP53', 'BRCA1'])]

| | geneSymbol | chromosome |
| --- | --- | --- |
| 11939 | TP53 | chr17 |
| 12301 | BRCA1 | chr17 |


### grouping

In [49]:
gene_chrom_table.groupby('chromosome').count()

| chromosome | geneSymbol |
| --- | --- |
| chr1 | 2735 |
| chr10 | 1162 |
| chr11 | 1817 |
| chr12 | 1382 |
| chr13 | 566 |
| chr14 | 1048 |
| chr15 | 1441 |
| chr16 | 1148 |
| chr17 | 1647 |
| chr17_ctg5_hap1 | 26 |


---

### read an atypical tsv file

In [50]:
file = 'data/chrom_lengths.tsv'
chrom_length_table = pd.read_csv(file, sep=' ', names=['chrom', 'length'])
chrom_length_table.head()

| | chrom | length |
| --- | --- | --- |
| 0 | 1 | 249698942 |
| 1 | 2 | 242508799 |
| 2 | 3 | 198450956 |
| 3 | 4 | 190424264 |
| 4 | 5 | 181630948 |


In [51]:
chrom_length_table['length'].dtype

dtype('O')

In [52]:
# parse numbers that use ',' as the thousands separator
chrom_length_table = pd.read_csv(file, sep=' ', names=['chrom', 'length'], thousands=',')
chrom_length_table.head()

| | chrom | length |
| --- | --- | --- |
| 0 | 1 | 249698942 |
| 1 | 2 | 242508799 |
| 2 | 3 | 198450956 |
| 3 | 4 | 190424264 |
| 4 | 5 | 181630948 |


In [53]:
chrom_length_table['length'].dtype

dtype('int64')
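
The effect of `thousands=','` can be reproduced without the data file. The sketch below assumes the file writes lengths with comma separators (for example `249,698,942`), which is what the `thousands` argument suggests; the two-line `sample` string is a hypothetical stand-in for `data/chrom_lengths.tsv`.

```python
import io

import pandas as pd

# Hypothetical sample: space-separated columns, numbers written with commas.
sample = "1 249,698,942\nX 156,040,895\n"

# Without `thousands`, the commas keep the column from being parsed as numbers.
raw = pd.read_csv(io.StringIO(sample), sep=' ', names=['chrom', 'length'])
print(pd.api.types.is_numeric_dtype(raw['length']))      # False

# With thousands=',', pandas strips the separators and parses integers.
parsed = pd.read_csv(io.StringIO(sample), sep=' ', names=['chrom', 'length'],
                     thousands=',')
print(pd.api.types.is_integer_dtype(parsed['length']))   # True
print(parsed['length'].tolist())                         # [249698942, 156040895]
```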

### modify data

In [54]:
# string
'chr' + chrom_length_table['chrom']

0      chr1
1      chr2
2      chr3
3      chr4
4      chr5
5      chr6
6      chr7
7      chr8
8      chr9
9     chr10
10    chr11
11    chr12
12    chr13
13    chr14
14    chr15
15    chr16
16    chr17
17    chr18
18    chr19
19    chr20
20    chr21
21    chr22
22     chrX
23     chrY
Name: chrom, dtype: object

In [55]:
# float
chrom_length_table['length']/(10**6)

0     249.698942
1     242.508799
2     198.450956
3     190.424264
4     181.630948
5     170.805979
6     159.345973
7     145.138636
8     138.688728
9     133.797422
10    135.186938
11    133.275309
12    114.364328
13    108.136338
14    102.439437
15     92.211104
16     83.836422
17     80.373285
18     58.617616
19     64.444167
20     46.709983
21     51.857516
22    156.040895
23     57.264655
Name: length, dtype: float64

In [56]:
chrom_length_table['length(Mb)'] = chrom_length_table['length']/(10**6)
chrom_length_table.head()

| | chrom | length | length(Mb) |
| --- | --- | --- | --- |
| 0 | 1 | 249698942 | 249.698942 |
| 1 | 2 | 242508799 | 242.508799 |
| 2 | 3 | 198450956 | 198.450956 |
| 3 | 4 | 190424264 | 190.424264 |
| 4 | 5 | 181630948 | 181.630948 |


### set and reset index

In [57]:
chrom_length_table = chrom_length_table.set_index('chrom')
chrom_length_table.tail()

| chrom | length | length(Mb) |
| --- | --- | --- |
| 20 | 64444167 | 64.444167 |
| 21 | 46709983 | 46.709983 |
| 22 | 51857516 | 51.857516 |
| X | 156040895 | 156.040895 |
| Y | 57264655 | 57.264655 |


In [58]:
chrom_length_table = chrom_length_table.reset_index()
chrom_length_table.tail()

| | chrom | length | length(Mb) |
| --- | --- | --- | --- |
| 19 | 20 | 64444167 | 64.444167 |
| 20 | 21 | 46709983 | 46.709983 |
| 21 | 22 | 51857516 | 51.857516 |
| 22 | X | 156040895 | 156.040895 |
| 23 | Y | 57264655 | 57.264655 |


---

### Gene density

**NOW YOU TRY: Compute the Gene Density**

Task: Compute the gene density for each chromosome.
$$
\text{gene density} = \frac{\text{gene count}}{\text{length (Mb)}}
$$

In [59]:
# rename chromosome in chrom_length_table


In [60]:
# get number of genes of each chromosome using groupby


In [61]:
# merge tables


In [62]:
# compute gene density
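
One way the four steps could fit together is sketched below on toy tables; try the exercise on the real data first. The gene names, counts, and lengths here are made up for illustration and are not taken from the data files. The column name `gene_count` is an assumption, while `merge_table`, `chrom`, and `gene_density` are the names the Seaborn bar plot cell expects.

```python
import pandas as pd

# Toy stand-ins for the two tables (made-up values, not the real data).
gene_chrom_table = pd.DataFrame({
    'geneSymbol': ['g1', 'g2', 'g3', 'g4', 'g5'],
    'chromosome': ['chr1', 'chr1', 'chr1', 'chr2', 'chr2'],
})
chrom_length_table = pd.DataFrame({
    'chrom': ['1', '2'],
    'length': [3_000_000, 4_000_000],
})
chrom_length_table['length(Mb)'] = chrom_length_table['length'] / 10**6

# rename chromosome in chrom_length_table so both tables use the 'chr' prefix
chrom_length_table['chrom'] = 'chr' + chrom_length_table['chrom']

# get number of genes of each chromosome using groupby
gene_counts = (
    gene_chrom_table.groupby('chromosome')['geneSymbol']
    .count()
    .rename('gene_count')
    .reset_index()
)

# merge tables on the chromosome name
merge_table = chrom_length_table.merge(
    gene_counts, left_on='chrom', right_on='chromosome'
)

# compute gene density = gene count / length (Mb)
merge_table['gene_density'] = merge_table['gene_count'] / merge_table['length(Mb)']
print(merge_table[['chrom', 'gene_count', 'length(Mb)', 'gene_density']])
```

The default inner merge keeps only chromosomes present in both tables, so alternate contigs such as `chr17_ctg5_hap1` would drop out if the length table has no matching row.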


## Seaborn
A library for the creation of statistical plots.

### Bar plot

In [63]:
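# merge_table is the merged table built in the "Compute the Gene Density" exercise above;
# it must exist, with 'chrom' and 'gene_density' columns, before this cell is run
plot_df = merge_table.reset_index()
sns.barplot(data=plot_df, x='chrom', y='gene_density')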

In [None]:
# rotate the x-axis labels
sns.barplot(data=plot_df, x='chrom', y='gene_density')
_ = plt.xticks(rotation=45)

### Scatter plot

In [None]:
sns.scatterplot(x=[1,2,3,4], y=[4,3,2,1])

**NOW YOU TRY: Visualize the Relationship Between Gene Counts and Chromosome Lengths**

Task: Create a scatter plot to visualize the relationship between gene counts (x-axis) and chromosome lengths (y-axis).