# Import

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib

# Overall promoter methylation

We'll analyse DNA methylation in Lower Grade Glioma (LGG), a type of brain cancer.

First, let's look at the overall methylation landscape in LGG.

We have the methylation signal for 1000 cytosines located in CpG islands in gene promoters.

In [None]:
island_promoter_methylation = pd.read_csv('./1000_island_promoter_probes.tsv', sep='\t', index_col=0)
island_promoter_methylation.shape

In [None]:
island_promoter_methylation.head()

In [None]:
sns.clustermap(island_promoter_methylation, cmap=matplotlib.cm.Greens, figsize=(6, 6))

We see two clusters of samples: with low and high methylation. What is the biology of these two clusters?

Many LGG samples carry a mutated IDH gene.

In [None]:
idh_annotation = pd.read_csv('./IDH_mutation_annotation.tsv', sep='\t', index_col=0)['IDH_status']
idh_annotation.shape

In [None]:
idh_annotation

We see two methylation clusters: hypermethylated and hypomethylated samples. How are these clusters associated with IDH mutation status?

In [None]:
sns.clustermap(island_promoter_methylation, cmap=matplotlib.cm.Greens,
              col_colors=idh_annotation.map({'WT': "green", 'Mutant': "red"}),
              figsize=(6, 6))

IDH-mutant samples have a hypermethylated phenotype, while IDH-wild-type samples are hypomethylated. This happens because mutated IDH inhibits demethylation in cancer cells.

**Task 1**

Calculate the mean island promoter methylation in the IDHmut and IDHwt groups.
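
One possible way to approach this, sketched on made-up toy data rather than the real tables. The names `toy_methylation` and `toy_idh` and all values are illustrative assumptions; the layout (probes in rows, samples in columns) mirrors how the heatmap above is annotated.

```python
import pandas as pd

# Toy stand-in for the real data: rows are probes, columns are samples.
toy_methylation = pd.DataFrame(
    {"s1": [0.8, 0.6], "s2": [0.9, 0.7], "s3": [0.1, 0.3], "s4": [0.2, 0.2]},
    index=["probe_a", "probe_b"],
)
# Toy IDH annotation, indexed by sample name.
toy_idh = pd.Series(
    ["Mutant", "Mutant", "WT", "WT"],
    index=["s1", "s2", "s3", "s4"],
    name="IDH_status",
)

# Average over probes to get one value per sample,
# then average those values within each IDH group.
per_sample_mean = toy_methylation.mean(axis=0)
group_means = per_sample_mean.groupby(toy_idh).mean()
print(group_means)
```

Grouping by a second Series works here because pandas aligns the two objects on their shared sample index.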

# MLH3 methylation

Next we'll analyse methylation of the MLH3 gene.

The MLH3 gene helps maintain genomic integrity during DNA replication as part of the DNA mismatch repair pathway. Cancer cells often silence this gene, which lets them acquire more mutations. MLH3 can be silenced by mutation, deletion or hypermethylation.

## Promoter

Let's look at methylation of the MLH3 promoter.

In [None]:
mlh3_methylation_promoter = pd.read_csv('./MLH3_promoter_probes_methylation.tsv', sep='\t', index_col=0)
mlh3_methylation_promoter.shape

In [None]:
mlh3_methylation_promoter

In [None]:
mlh3_expression = pd.read_csv('./MLH3_expression.tsv', sep='\t', index_col=0)['MLH3_expression']
mlh3_expression.shape

In [None]:
mlh3_expression

Calculate the mean promoter methylation for each sample.

In [None]:
mlh3_methylation_promoter_mean = mlh3_methylation_promoter.mean()
mlh3_methylation_promoter_mean = mlh3_methylation_promoter_mean.rename("MLH3_mean_promoter_methylation")

In [None]:
mlh3_methylation_promoter_mean

Is there a correlation between promoter methylation and expression?

In [None]:
# Pass the data as keywords: recent seaborn versions no longer accept x and y positionally
sns.jointplot(x=mlh3_methylation_promoter_mean, y=mlh3_expression, kind='reg')

**Task 2**  
Calculate the Spearman correlation between promoter methylation and expression.
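
As a hint, here is a minimal sketch on invented numbers. The series `toy_meth` and `toy_expr` are hypothetical and only stand in for the real data. It uses the fact that the Spearman coefficient is the Pearson correlation of the ranks; `Series.corr(method='spearman')` is an alternative.

```python
import pandas as pd

# Toy per-sample values (made up for illustration).
samples = ["s1", "s2", "s3", "s4"]
toy_meth = pd.Series([0.1, 0.2, 0.4, 0.8], index=samples)
toy_expr = pd.Series([9.0, 7.5, 7.0, 2.0], index=samples)

# Spearman correlation = Pearson correlation computed on the ranks.
spearman_rho = toy_meth.rank().corr(toy_expr.rank())
print(spearman_rho)
```

In this toy example expression falls monotonically as methylation rises, so the coefficient is at its negative extreme.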

**Task 3.1**    
How many samples have a methylated MLH3 promoter, and how many an unmethylated one?  
To answer this, first find a threshold for promoter methylation using sns.distplot() (or sns.histplot() in recent seaborn versions, where distplot is deprecated). Select the threshold from 0, 0.1, 0.2, ..., 0.9, 1

**Task 3.2**    
Now calculate how many samples have a methylated MLH3 promoter and how many have an unmethylated one.
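
The counting step could look roughly like the sketch below. The values in `toy_promoter_mean` and the `threshold` of 0.3 are placeholders chosen for illustration only; the real threshold should come from your own distribution plot.

```python
import pandas as pd

# Toy per-sample mean promoter methylation (made up for illustration).
toy_promoter_mean = pd.Series(
    [0.05, 0.10, 0.65, 0.80, 0.08],
    index=["s1", "s2", "s3", "s4", "s5"],
)
threshold = 0.3  # placeholder value, not derived from the real data

# Boolean mask: True where the promoter counts as methylated.
is_methylated = toy_promoter_mean > threshold
n_methylated = int(is_methylated.sum())
n_unmethylated = int((~is_methylated).sum())
print(n_methylated, n_unmethylated)
```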

## Gene body

Let's look at methylation of the MLH3 gene body.

In [None]:
mlh3_methylation_body = pd.read_csv('./MLH3_gene_body_probes_methylation.tsv', sep='\t', index_col=0)
mlh3_methylation_body.shape

Calculate the mean gene body methylation for each sample.

In [None]:
mlh3_methylation_body_mean = mlh3_methylation_body.mean()
mlh3_methylation_body_mean = mlh3_methylation_body_mean.rename("MLH3_mean_gene_body_methylation")

In [None]:
# Pass the data as keywords: recent seaborn versions no longer accept x and y positionally
sns.jointplot(x=mlh3_methylation_body_mean, y=mlh3_expression, kind='reg')

**Task 4**  

Calculate the Spearman correlation between gene body methylation and expression.

## MLH3 methylation and IDH mutation

Is there any difference in MLH3 methylation between IDH-mutant and IDH-wild-type samples?

Let's build a dataframe with the data needed for plotting.

In [None]:
mlh3_methylation = pd.DataFrame()

In [None]:
mlh3_methylation['Promoter'] = mlh3_methylation_promoter_mean

In [None]:
mlh3_methylation['Body'] = mlh3_methylation_body_mean

In [None]:
mlh3_methylation['IDH_status'] = idh_annotation

In [None]:
mlh3_methylation
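
The column-by-column assignment above relies on pandas aligning each Series on the sample index. The same table could also be assembled in one step with `pd.concat`, shown here on hypothetical toy Series (names and values are invented) whose indexes are deliberately in different orders.

```python
import pandas as pd

# Toy per-sample Series; note that `body` lists the samples in a different order.
promoter = pd.Series([0.7, 0.1, 0.2], index=["s1", "s2", "s3"], name="Promoter")
body = pd.Series([0.85, 0.9, 0.8], index=["s3", "s1", "s2"], name="Body")
idh = pd.Series(["Mutant", "WT", "WT"], index=["s1", "s2", "s3"], name="IDH_status")

# axis=1 places each Series in its own column and matches rows by index label,
# so values end up on the correct sample regardless of the original order.
toy_table = pd.concat([promoter, body, idh], axis=1)
print(toy_table)
```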

In [None]:
mlh3_methylation.boxplot(column='Promoter', by='IDH_status', grid=False, figsize=(4, 5))

MLH3 promoter methylation is higher in IDH-mutant samples.

**Task 5**  

What % of IDHmut samples have hypermethylated MLH3?
What % of IDHwt samples have hypermethylated MLH3?
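
A rough sketch of one way to get these percentages, again on invented data. `toy_table` and the `threshold` of 0.3 are assumptions for illustration; use the threshold you selected in Task 3.1.

```python
import pandas as pd

# Toy table with the same two columns used in the boxplot above (values made up).
toy_table = pd.DataFrame(
    {
        "Promoter": [0.7, 0.6, 0.1, 0.2, 0.5, 0.1],
        "IDH_status": ["Mutant", "Mutant", "Mutant", "WT", "WT", "WT"],
    },
    index=["s1", "s2", "s3", "s4", "s5", "s6"],
)
threshold = 0.3  # placeholder value, not derived from the real data

# The mean of a 0/1 indicator is a fraction, so the group-wise mean times 100
# gives the percentage of hypermethylated samples in each IDH group.
is_hyper = (toy_table["Promoter"] > threshold).astype(float)
percent_hyper = is_hyper.groupby(toy_table["IDH_status"]).mean() * 100
print(percent_hyper.round(1))
```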

What about the gene body? Let's compare MLH3 gene body methylation between the groups.

In [None]:
mlh3_methylation.boxplot(column='Body', by='IDH_status', grid=False, figsize=(4, 5))

Methylation of the MLH3 gene body is similar in IDH-mutant and IDH-wild-type samples.