> We have already established that, using annotated HeLa m6A sites, we can observe changes in genes with m6A sites in HL-60 cells. To confirm these m6A sites, we performed MeRIP-seq in treated and untreated cells and observed a general increase in m6A levels upon treatment for a large number of annotated sites. Here, our goal is to independently analyze the MeRIP data without relying on HeLa annotations and use it to define a set of **treatment-induced hyper-methylation sites**. We will then assess the location and behaviour of these targets across the other datasets generated in this study.

## Test enrichment of treatment-induced hyper/hypo-methylation sites

### Goal
Here, I aim to define the hyper- and hypo-methylated genes as gene sets and test whether they are enriched across all datasets. The input tables list genes with control vs. treated fold changes in RNA expression, RNA stability, and translational efficiency.
### Steps 
1. Prepare inputs  
    - Select genes with $\Delta$methylation >= 2 as hyper-methylated (p-value < 0.01)
    - Select genes with $\Delta$methylation <= -2 as hypo-methylated (p-value < 0.01)
2. Run `run_mi_gene_list.pl` command 


In [9]:
import sys 
import pandas as pd 
import numpy as np
sys.path.append('../')

from util import *


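def write_gene_file(df,file_name):
    """Write one Ensembl ID per line (no header, no index)."""
    # ens[:-3] drops the last three characters of each ID, presumably a version suffix such as '.12'
    df = pd.DataFrame({'ensembl':[ens[:-3] for ens in df.ensembl.tolist()]})#.drop_duplicates('ensembl')
    df.to_csv(file_name,sep='\t',index=False,header=False)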
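
The slice `ens[:-3]` only works when every ID ends in a three-character suffix. If the version suffixes in the input vary in length (an assumption, since the table is not shown here), splitting on the dot is a safer variant. `strip_version` is a hypothetical helper and not part of `util`.

```python
def strip_version(ensembl_id):
    """Return an Ensembl ID without its version suffix (hypothetical helper)."""
    # 'ENSG00000141510.17' -> 'ENSG00000141510'; IDs without a dot are unchanged
    return ensembl_id.split('.')[0]


# Example IDs with one-digit, two-digit and missing version suffixes
ids = ['ENSG00000141510.17', 'ENSG00000000003.5', 'ENSG00000012048']
stripped = [strip_version(i) for i in ids]
print(stripped)
```

With this variant, a single-digit version such as `.5` no longer removes a character from the gene ID itself.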


In [10]:
%%time 
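# Load the differential-methylation results; keep gene ID, log fold change and p-value
data = pd.read_csv('../meRIP-seq/radar/result.all.txt',sep='\t').loc[:,['ensembl','logFC','p_value']]#.set_index('ensembl')
# For each gene keep only the row with the largest |logFC| (relies on the default integer index)
data = data.iloc[[int(data[(data.ensembl == gene)].logFC.abs().idxmax()) for gene in set(data.ensembl)],:]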

CPU times: user 32 s, sys: 69 µs, total: 32 s
Wall time: 32.1 s
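
The per-gene loop above rescans the whole table once for every gene, which probably explains the 32 s runtime. A minimal sketch of the same selection with a single `groupby`, using a made-up toy table rather than the real MeRIP results:

```python
import pandas as pd

# Toy stand-in for the differential-methylation table (values are made up)
toy = pd.DataFrame({
    'ensembl': ['A.1', 'A.1', 'B.2', 'B.2', 'B.2'],
    'logFC':   [0.5, -2.0, 1.0, 3.0, -1.5],
    'p_value': [0.2, 0.001, 0.05, 0.004, 0.3],
})

# Row label of the largest |logFC| within each gene, computed in one pass
strongest = toy['logFC'].abs().groupby(toy['ensembl']).idxmax()

# Pull those rows out of the original table
top = toy.loc[strongest].reset_index(drop=True)
print(top)
```

The rows come out sorted by gene ID rather than in `set` order, which should not matter for the filtering that follows.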


In [14]:
!mkdir -p mtyl-enrichment

In [15]:
hyper, hypo = two_sided_mtyl(data,fcthr=2)
write_gene_file(hyper,'mtyl-enrichment/hyper_mtyl.txt')
write_gene_file(hypo,'mtyl-enrichment/hypo_mtyl.txt')
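
`two_sided_mtyl` is imported from `util` and its body is not shown in this notebook. Based on the thresholds listed under Steps, a filter of this kind might look like the sketch below. The function name `two_sided_filter`, the `pvthr` parameter, and the toy values are all assumptions for illustration.

```python
import pandas as pd


def two_sided_filter(df, fcthr=2, pvthr=0.01):
    """Hypothetical stand-in for util.two_sided_mtyl: split rows by logFC and p-value."""
    # Keep only rows that pass the significance cutoff
    significant = df[df['p_value'] < pvthr]
    # Split the significant rows by direction of the methylation change
    hyper = significant[significant['logFC'] >= fcthr]
    hypo = significant[significant['logFC'] <= -fcthr]
    return hyper, hypo


# Toy table: one hyper, one hypo, one non-significant, one small change
toy = pd.DataFrame({
    'ensembl': ['A.1', 'B.2', 'C.3', 'D.4'],
    'logFC':   [2.5, -3.0, 2.2, 0.4],
    'p_value': [0.001, 0.005, 0.2, 0.0001],
})
hyper_toy, hypo_toy = two_sided_filter(toy)
print(hyper_toy.ensembl.tolist(), hypo_toy.ensembl.tolist())
```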

2. Use a [TEISER](https://github.com/goodarzilab/TEISER) script to run the enrichment test
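
The script name and its `--ebins=7` option suggest that the test bins the continuous values and measures the mutual information between bin and gene-set membership. The sketch below is a simplified illustration of that idea only. It is not TEISER's implementation, it omits any significance testing, and the function name and toy data are assumptions.

```python
import numpy as np


def binned_mutual_information(values, in_set, n_bins=7):
    """MI (in bits) between equal-population bins of `values` and set membership."""
    values = np.asarray(values, dtype=float)
    in_set = np.asarray(in_set, dtype=bool)
    # Rank-based binning: each bin receives roughly the same number of genes
    ranks = np.argsort(np.argsort(values))
    bins = ranks * n_bins // len(values)
    mi = 0.0
    for b in range(n_bins):
        for member in (False, True):
            p_joint = np.mean((bins == b) & (in_set == member))
            if p_joint > 0:
                p_bin = np.mean(bins == b)
                p_mem = np.mean(in_set == member)
                mi += p_joint * np.log2(p_joint / (p_bin * p_mem))
    return mi


rng = np.random.default_rng(0)
values = rng.normal(size=700)
# A gene set concentrated in the top bin versus a shuffled copy of the same set
enriched = values > np.quantile(values, 6 / 7)
shuffled = rng.permutation(enriched)
mi_enriched = binned_mutual_information(values, enriched)
mi_shuffled = binned_mutual_information(values, shuffled)
print(mi_enriched, mi_shuffled)
```

A gene set that clusters in one bin carries much more information about the bins than the same set shuffled across genes, which is the signal an enrichment heatmap of this kind visualizes.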



In [16]:
%%bash

declare -a Genesets=('hyper_mtyl' 'hypo_mtyl')
declare -a Experiments=(
# Ribo-seq
'../Ribo-seq/hl60_delta_te.txt'

## HL-60 RNA-seq 
# RNA expression 
'../RNA-seq/hl60-exp/6h_delta_exp.txt' 
'../RNA-seq/hl60-exp/72h_delta_exp.txt' 
'../RNA-seq/hl60-exp/120h_delta_exp.txt' 
# RNA stability  
'../RNA-seq/hl60-stbl/120h_delta_stbl.txt'  
'../RNA-seq/hl60-stbl/6h_delta_stbl.txt'

## 5 other AML cell lines RNA-seq
# RNA expression 
'../RNA-seq/other-exp/kg1_delta_exp.txt' 
'../RNA-seq/other-exp/molm14_delta_exp.txt'
'../RNA-seq/other-exp/ociaml2_delta_exp.txt' 
'../RNA-seq/other-exp/ociaml3_delta_exp.txt'
'../RNA-seq/other-exp/thp1_delta_exp.txt'
# RNA stability  
'../RNA-seq/other-stbl/kg1_delta_stbl.txt' 
'../RNA-seq/other-stbl/molm14_delta_stbl.txt' 
'../RNA-seq/other-stbl/ociaml2_delta_stbl.txt' 
'../RNA-seq/other-stbl/ociaml3_delta_stbl.txt'
'../RNA-seq/other-stbl/thp1_delta_stbl.txt'
)

for exp in "${Experiments[@]}"; do
    for geneset in "${Genesets[@]}"; do
    
        echo $exp $geneset
        base=`basename $exp`
        base=${base/.txt/}
        
        # get intersect 
        awk 'NR==FNR{A[$1];next}$1 in A' $exp mtyl-enrichment/${geneset}.txt > mtyl-enrichment/${geneset}_${base}.txt
        
        perl $TEISERDIR/run_mi_gene_list.pl \
            --expfile=$exp \
            --genefile=mtyl-enrichment/${geneset}_${base}.txt \
            --exptype=continuous \
            --ebins=7 \
            --species=human \
            --doremovedups=0 \
            --doremoveextra=0 &> mtyl-enrichment/${geneset}_${base}.log
#         # remove results from previous run 
#         rm -fr mtyl-enrichment/${geneset}_${base}_GENESET
        
        rm mtyl-enrichment/${geneset}_${base}.txt
        mv ${exp}_GENESET mtyl-enrichment/${geneset}_${base}_GENESET
        
        echo 'done!'
        
    done 

done

../Ribo-seq/hl60_delta_te.txt hyper_mtyl
done!
../Ribo-seq/hl60_delta_te.txt hypo_mtyl
done!
../RNA-seq/hl60-exp/6h_delta_exp.txt hyper_mtyl
done!
../RNA-seq/hl60-exp/6h_delta_exp.txt hypo_mtyl
done!
../RNA-seq/hl60-exp/72h_delta_exp.txt hyper_mtyl
done!
../RNA-seq/hl60-exp/72h_delta_exp.txt hypo_mtyl
done!
../RNA-seq/hl60-exp/120h_delta_exp.txt hyper_mtyl
done!
../RNA-seq/hl60-exp/120h_delta_exp.txt hypo_mtyl
done!
../RNA-seq/hl60-stbl/120h_delta_stbl.txt hyper_mtyl
done!
../RNA-seq/hl60-stbl/120h_delta_stbl.txt hypo_mtyl
done!
../RNA-seq/hl60-stbl/6h_delta_stbl.txt hyper_mtyl
done!
../RNA-seq/hl60-stbl/6h_delta_stbl.txt hypo_mtyl
done!
../RNA-seq/other-exp/kg1_delta_exp.txt hyper_mtyl
done!
../RNA-seq/other-exp/kg1_delta_exp.txt hypo_mtyl
done!
../RNA-seq/other-exp/molm14_delta_exp.txt hyper_mtyl
done!
../RNA-seq/other-exp/molm14_delta_exp.txt hypo_mtyl
done!
../RNA-seq/other-exp/ociaml2_delta_exp.txt hyper_mtyl
done!
../RNA-seq/other-exp/ociaml2_delta_exp.txt hypo_mtyl
done!
../RNA-
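
The `awk` one-liner in the loop reads the experiment table first, remembers its first column, and then prints only the gene-set lines whose ID was seen there. A minimal Python sketch of the same intersection, with made-up IDs and a hypothetical function name:

```python
def intersect_with_experiment(geneset_ids, experiment_ids):
    """Keep gene-set IDs that also occur in the experiment table, in gene-set order."""
    measured = set(experiment_ids)  # first column of the experiment file
    return [gene for gene in geneset_ids if gene in measured]


# Made-up IDs: ENSG02 is in the gene set but missing from the experiment
geneset = ['ENSG01', 'ENSG02', 'ENSG03']
experiment = ['ENSG03', 'ENSG01', 'ENSG09']
kept = intersect_with_experiment(geneset, experiment)
print(kept)
```

Restricting the gene set this way means that every ID passed to `--genefile` also appears in `--expfile`.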

3. Merge hypo and hyper results

In [17]:
import os 
from glob import glob 

comps = [(
    comp.split('/')[1].split('_mtyl_')[0],
    comp.split('/')[1].split('_mtyl_')[1].replace('_GENESET','')
) for comp in glob('mtyl-enrichment/*_GENESET')]

comps.sort(key=lambda a: len(a[1]))

expfiles = {}
summaryfiles = {}
pvmatrixfiles = {}

for c in {comp for _,comp in comps}:
    
    path = f'mtyl-enrichment/{c}'
    os.makedirs(path, exist_ok=True)  # do not fail when the cell is re-run
    expfiles[c] = {}
    summaryfiles[c] = {}
    pvmatrixfiles[c] = {}
    
    for m in {mtyl for mtyl,_ in comps}:
        expfiles[c][m] = f'mtyl-enrichment/{m}_mtyl_{c}_GENESET/{c}.txt'
        summaryfiles[c][m] = f'mtyl-enrichment/{m}_mtyl_{c}_GENESET/{c}.txt.summary'
        pvmatrixfiles[c][m] = f'mtyl-enrichment/{m}_mtyl_{c}_GENESET/{c}.txt.matrix'
    
    exp_df = pd.read_csv(expfiles[c]['hyper'],sep='\t', header=None)
    exp_df.to_csv(f'{path}/{c}.txt',header=None,index=False,sep='\t')
    
    sum_df = pd.concat([pd.read_csv(summaryfiles[c]['hyper'],sep='\t'),pd.read_csv(summaryfiles[c]['hypo'],sep='\t')])
    sum_df['index'] = ['Hyper-methylated geneset','Hypo-methylated geneset']
    sum_df.to_csv(f'{path}/{c}.txt.summary',index=False,sep='\t')
    
    
    mtx_df = pd.concat([pd.read_csv(pvmatrixfiles[c]['hyper'],sep='\t'),pd.read_csv(pvmatrixfiles[c]['hypo'],sep='\t')])
    mtx_df.MOTIF = ['Hyper-methylated geneset','Hypo-methylated geneset']
    mtx_df.to_csv(f'{path}/{c}.txt.matrix',index=False,sep='\t')
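
The merge cell depends on the output folders being named `<geneset>_mtyl_<experiment>_GENESET`. The name parsing done in the list comprehension can be sketched as a small function (`parse_geneset_dir` is a hypothetical name):

```python
def parse_geneset_dir(path):
    """Split 'mtyl-enrichment/<set>_mtyl_<experiment>_GENESET' into (set, experiment)."""
    name = path.rstrip('/').split('/')[-1]
    # Split on the first '_mtyl_' only, so experiment names may contain underscores
    geneset, experiment = name.split('_mtyl_', 1)
    suffix = '_GENESET'
    if experiment.endswith(suffix):
        experiment = experiment[:-len(suffix)]
    return geneset, experiment


pair = parse_geneset_dir('mtyl-enrichment/hyper_mtyl_6h_delta_exp_GENESET')
print(pair)
```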

In [18]:
%%bash 
cd mtyl-enrichment
for exp in `ls | grep -v '_GENESET' |  grep -v '.log' | grep 'delta'`; do 

    echo $exp

    cd $exp

    perl /flash/bin/TEISERv1.1/Scripts/teiser_draw_matrix.pl \
        --pvmatrixfile=${exp}.txt.matrix \
        --summaryfile=${exp}.txt.summary \
        --expfile=${exp}.txt \
        --quantized=0 \
        --order=0 \
        --min=-10 --max=10 --cluster=5 \
        --colmap=/flash/bin/TEISERv1.1/Scripts/HEATMAPS/cmap_1.txt 
    cd ..
    echo "------------------------------------------"

done 
cd ../

120h_delta_exp
Reading MI data ... Done.
Start drawing
1.36	-1
Hyper-methylated geneset
Hypo-methylated geneset
Outputing EPS file 120h_delta_exp.txt.summary.eps
Convert to PDF 120h_delta_exp.txt.summary.pdf
ps2pdf -dEPSCrop -dAutoRotatePages=/None 120h_delta_exp.txt.summary.eps 120h_delta_exp.txt.summary.pdf
Finished.
------------------------------------------
120h_delta_stbl
Reading MI data ... Done.
Start drawing
0.54	-0.54
Hyper-methylated geneset
Hypo-methylated geneset
Outputing EPS file 120h_delta_stbl.txt.summary.eps
Convert to PDF 120h_delta_stbl.txt.summary.pdf
ps2pdf -dEPSCrop -dAutoRotatePages=/None 120h_delta_stbl.txt.summary.eps 120h_delta_stbl.txt.summary.pdf
Finished.
------------------------------------------
6h_delta_exp
Reading MI data ... Done.
Start drawing
1.06	-0.76
Hyper-methylated geneset
Hypo-methylated geneset
Outputing EPS file 6h_delta_exp.txt.summary.eps
Convert to PDF 6h_delta_exp.txt.summary.pdf
ps2pdf -dEPSCrop -dAutoRotatePages=/None 6h_delta_exp.txt.s

4. Redraw the heatmaps with `--min=-3 --max=3` for the plots whose signals span a smaller range:

In [19]:
%%bash 
cd mtyl-enrichment

declare -a Experiments=(
'6h_delta_stbl' '120h_delta_stbl' 
'kg1_delta_stbl' 'ociaml2_delta_stbl' 'molm14_delta_stbl' 
'ociaml3_delta_stbl' 'thp1_delta_stbl'
'hl60_delta_te'
)
for exp in "${Experiments[@]}"; do

    echo $exp

    cd $exp

    perl /flash/bin/TEISERv1.1/Scripts/teiser_draw_matrix.pl \
        --pvmatrixfile=${exp}.txt.matrix \
        --summaryfile=${exp}.txt.summary \
        --expfile=${exp}.txt \
        --quantized=0 \
        --order=0 \
        --min=-3 --max=3 --cluster=5
    cd ..
    echo "------------------------------------------"

done 
cd ../

6h_delta_stbl
Reading MI data ... Done.
Start drawing
0.26	-0.32
Hyper-methylated geneset
Hypo-methylated geneset
Outputing EPS file 6h_delta_stbl.txt.summary.eps
Convert to PDF 6h_delta_stbl.txt.summary.pdf
ps2pdf -dEPSCrop -dAutoRotatePages=/None 6h_delta_stbl.txt.summary.eps 6h_delta_stbl.txt.summary.pdf
Finished.
------------------------------------------
120h_delta_stbl
Reading MI data ... Done.
Start drawing
0.54	-0.54
Hyper-methylated geneset
Hypo-methylated geneset
Outputing EPS file 120h_delta_stbl.txt.summary.eps
Convert to PDF 120h_delta_stbl.txt.summary.pdf
ps2pdf -dEPSCrop -dAutoRotatePages=/None 120h_delta_stbl.txt.summary.eps 120h_delta_stbl.txt.summary.pdf
Finished.
------------------------------------------
kg1_delta_stbl
Reading MI data ... Done.
Start drawing
0.74	-1.04
Hyper-methylated geneset
Hypo-methylated geneset
Outputing EPS file kg1_delta_stbl.txt.summary.eps
Convert to PDF kg1_delta_stbl.txt.summary.pdf
ps2pdf -dEPSCrop -dAutoRotatePages=/None kg1_delta_stbl

In [21]:
!mkdir -p mtyl-enrichment/log
!mv mtyl-enrichment/*_GENESET mtyl-enrichment/log
!mv mtyl-enrichment/*.log mtyl-enrichment/log

4. Make `png` figures:

In [22]:
%%bash 
for pdf in mtyl-enrichment/*/*.txt.summary.pdf; do 
    png=${pdf/.pdf/.png}
    di=`dirname $pdf`
    out=`basename $di`
    
    bash /rumi/shams/abe/GitHub/Abe/my_scripts/pdf2png.sh $pdf 

    mv $pdf mtyl-enrichment/${out}.pdf
    mv $png mtyl-enrichment/${out}.png
done 

mtyl-enrichment/120h_delta_exp/120h_delta_exp.txt.summary.pdf > mtyl-enrichment/120h_delta_exp/120h_delta_exp.txt.summary.png
done!
mtyl-enrichment/120h_delta_stbl/120h_delta_stbl.txt.summary.pdf > mtyl-enrichment/120h_delta_stbl/120h_delta_stbl.txt.summary.png
done!
mtyl-enrichment/6h_delta_exp/6h_delta_exp.txt.summary.pdf > mtyl-enrichment/6h_delta_exp/6h_delta_exp.txt.summary.png
done!
mtyl-enrichment/6h_delta_stbl/6h_delta_stbl.txt.summary.pdf > mtyl-enrichment/6h_delta_stbl/6h_delta_stbl.txt.summary.png
done!
mtyl-enrichment/72h_delta_exp/72h_delta_exp.txt.summary.pdf > mtyl-enrichment/72h_delta_exp/72h_delta_exp.txt.summary.png
done!
mtyl-enrichment/hl60_delta_te/hl60_delta_te.txt.summary.pdf > mtyl-enrichment/hl60_delta_te/hl60_delta_te.txt.summary.png
done!
mtyl-enrichment/kg1_delta_exp/kg1_delta_exp.txt.summary.pdf > mtyl-enrichment/kg1_delta_exp/kg1_delta_exp.txt.summary.png
done!
mtyl-enrichment/kg1_delta_stbl/kg1_delta_stbl.txt.summary.pdf > mtyl-enrichment/kg1_delta_stbl/k
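
The loop above moves each figure one level up and names it after its parent folder. A minimal sketch of that naming rule with `pathlib`; the function name is hypothetical and no files are touched:

```python
from pathlib import PurePosixPath


def flattened_names(pdf_path):
    """Return the (pdf, png) targets one level up, named after the parent folder."""
    pdf = PurePosixPath(pdf_path)
    out_dir = pdf.parent.parent   # e.g. mtyl-enrichment
    stem = pdf.parent.name        # e.g. 6h_delta_exp
    return str(out_dir / f'{stem}.pdf'), str(out_dir / f'{stem}.png')


targets = flattened_names('mtyl-enrichment/6h_delta_exp/6h_delta_exp.txt.summary.pdf')
print(targets)
```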

In [1]:
%%bash 
cd mtyl-enrichment
zip merged.zip *pdf
cd ../

  adding: 120h_delta_exp.pdf (deflated 24%)
  adding: 120h_delta_stbl.pdf (deflated 24%)
  adding: 6h_delta_exp.pdf (deflated 24%)
  adding: 6h_delta_stbl.pdf (deflated 24%)
  adding: 72h_delta_exp.pdf (deflated 24%)
  adding: hl60_delta_te.pdf (deflated 24%)
  adding: kg1_delta_exp.pdf (deflated 24%)
  adding: kg1_delta_stbl.pdf (deflated 24%)
  adding: molm14_delta_exp.pdf (deflated 24%)
  adding: molm14_delta_stbl.pdf (deflated 24%)
  adding: ociaml2_delta_exp.pdf (deflated 24%)
  adding: ociaml2_delta_stbl.pdf (deflated 24%)
  adding: ociaml3_delta_exp.pdf (deflated 24%)
  adding: ociaml3_delta_stbl.pdf (deflated 24%)
  adding: thp1_delta_exp.pdf (deflated 24%)
  adding: thp1_delta_stbl.pdf (deflated 24%)


5. Write README.md draft
    - Write HTML snippets that link every plot, in `README.md` format, to prepare a GitHub-friendly report

In [None]:
# %%bash 
# readme='mtyl-enrichment.md'
# touch $readme
# for f in mtyl-enrichment/*.png; do 
#     b=`basename $f`
#     t=${b/.png/}
#     echo '#### '$t >> $readme
#     echo -e "<img src=\""$f"\" title=\""$t"\" style=\"width:1000px\">\n" >> $readme
# done 
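
The commented-out loop writes one heading and one `<img>` tag per figure. The same text can be built in Python, as in this minimal sketch; `readme_section` is a hypothetical helper, and the 1000 px width is taken from the shell version.

```python
def readme_section(png_path, width_px=1000):
    """Build one Markdown heading plus an HTML <img> tag for a plot."""
    title = png_path.split('/')[-1]
    if title.endswith('.png'):
        title = title[:-len('.png')]
    return (f'#### {title}\n'
            f'<img src="{png_path}" title="{title}" style="width:{width_px}px">\n')


section = readme_section('mtyl-enrichment/6h_delta_exp.png')
print(section)
```

Joining the sections for all PNG files and writing them to `mtyl-enrichment.md` would reproduce the draft that the shell loop was meant to generate.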

In [52]:
# https://github.com/artemy-bakulin/iPAGE-2/


# import numpy as np
# import pandas as pd
# import matplotlib.pyplot as plt
# from matplotlib.gridspec import GridSpec
# import matplotlib
# import copy



# def columnwise_heatmap(array, ax=None, expression=False, cmap_main='RdBu_r', cmap_reg='YlOrBr', **kw):
#     #ax = ax or plt.gca()
#     images = []
#     if expression:
#         current_cmap = copy.copy(matplotlib.cm.get_cmap(cmap_reg))
#         current_cmap.set_bad(color='black')
#         im = ax[0].imshow(np.atleast_2d(array[:, 0]).T, cmap=current_cmap, **kw)
#         images.append(im)
#         im = ax[1].imshow(np.atleast_2d(array[:, 1:]), cmap=cmap_main, **kw)
#     else:
#         im = ax.imshow(np.atleast_2d(array[:, :]), cmap=cmap_main, **kw)

#     images.append(im)
#     return images


# def add_colorbar(fig, ims, n):
#     fig.subplots_adjust(left=0.06, right=0.65)
#     rows = n
#     cols = 1
#     gs = GridSpec(rows, cols)
#     gs.update(left=0.7, right=0.75, wspace=1, hspace=0.3)
#     if n == 0:
#         colorbar_names = ['']
#         colorbar_images = []
#     elif n == 1:
#         colorbar_names = ['Regulon\'s \n enrichment']
#         colorbar_images = [-1]
#     elif n == 2:
#         colorbar_names = ['Regulator\'s \n expression', 'Regulon\'s \n enrichment']
#         colorbar_images = [0, 1]
#     for i in colorbar_images:
#         cax = fig.add_subplot(gs[i // cols, i % cols])
#         fig.colorbar(ims[i], cax=cax)
#         cax.set_title(colorbar_names[i], fontsize=10)


# def draw_heatmap(names, values, output_name='output_ipage', expression=None, cmap_main='RdBu_r', cmap_reg='RdBu_r'):

#     if type(names[0]) != list:
#         df = pd.DataFrame(values, index=names)
#     else:
#         df = pd.DataFrame(values, index=names[0], columns=names[1])

#     if expression:
#         df.insert(0, 'regulator', expression)
#     plt.rcParams.update({'font.weight': 'roman'})
#     plt.rcParams.update({'ytick.labelsize': 10})
#     fontsize_pt = plt.rcParams['ytick.labelsize']
#     dpi = 72.27
#     matrix_height_pt = (fontsize_pt+30/2) * df.shape[0]
#     matrix_height_in = matrix_height_pt / dpi
#     matrix_width_pt = (fontsize_pt+50/2) * df.shape[1]
#     matrix_width_in = matrix_width_pt / dpi
#     top_margin = 0.04  # in percentage of the figure height
#     bottom_margin = 0.04  # in percentage of the figure height / (1 - top_margin - bottom_margin)
#     figure_height = matrix_height_in
#     figure_width = matrix_width_in

#     if expression:
#         fig, ax = plt.subplots(1, 2, figsize=(figure_width, figure_height), gridspec_kw={'width_ratios': [1, df.shape[1]-1]})
#         fig.subplots_adjust(wspace=0.05)
#     else:
#         fig, ax = plt.subplots(1, 1, figsize=(figure_width, figure_height))

#     ims = columnwise_heatmap(df.values, ax=ax, aspect="auto", expression=bool(expression),
#                              cmap_main=cmap_main, cmap_reg=cmap_reg)
#     if expression:
#         ax[0].set(xticks=[], yticks=np.arange(len(df)), yticklabels=df.index, xlabel='Regulator')
#         ax[0].xaxis.set_label_position('top')
#         ax[1].set(xticks=[], yticks=[], xlabel='Regulon')
#         ax[1].xaxis.set_label_position('top')
#     else:

#         ax.set(xticks=[], yticks=np.arange(len(df)), yticklabels=df.index, xlabel='Regulon')
#         ax.xaxis.set_label_position('top')
#         plt.xticks(rotation=90)



#     # ax.tick_params(bottom=False, top=False,
#     #               labelbottom=False, labeltop=True, left=False)
#     if expression:
#         n = 2
#     else:
#         n = 1
#     add_colorbar(fig, ims, n)
#     if output_name == 'stdout':
#         plt.show(block=False)
#     else:
#         plt.savefig('%s.svg' % output_name, bbox_inches='tight')
#         plt.close()
