> We have already established that, using annotated HeLa m6A sites, we can observe changes in genes with m6A sites in HL-60 cells. To confirm these m6A sites, we performed MeRIP-seq in treated and untreated cells and observed a general increase in m6A levels upon treatment at a large number of annotated sites. Here, our goal is to independently analyze the MeRIP-seq data without relying on HeLa annotations and use it to define a set of **treatment-induced hyper-methylation sites**. We will then assess the location and behaviour of these targets across the other datasets generated in this study.

## Test enrichment of treatment-induced hyper/hypo-methylation sites

### Goal
Here, I aim to define the hyper- and hypo-methylated genes as gene sets and test whether they are enriched across all datasets. Each input table is a list of genes with the control vs. treated fold change in RNA expression, RNA stability, or translational efficiency.

### Steps
1. Prepare inputs  
    - Filter genes with $\Delta$methylation >= 2 as hyper-methylation sites (P-value < 0.01)
    - Filter genes with $\Delta$methylation <= -2 as hypo-methylation sites (P-value < 0.01)
2. Run `run_mi_gene_list.pl` command 
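
The filtering in step 1 can be sketched in plain Python as follows. This is only an illustration of the thresholds listed above, assuming that $\Delta$methylation corresponds to the `logFC` column of the MeRIP-seq table. The function `split_by_methylation` and the toy records are hypothetical; this is not the `two_sided_mtyl` helper imported from `util`, whose implementation is not shown in this notebook.

```python
def split_by_methylation(records, fc_thr=2.0, p_thr=0.01):
    """Split (gene, logFC, p_value) records into hyper and hypo lists.

    Hypothetical helper: hyper if logFC >= fc_thr, hypo if logFC <= -fc_thr,
    both requiring p_value < p_thr.
    """
    hyper = [r for r in records if r[1] >= fc_thr and r[2] < p_thr]
    hypo = [r for r in records if r[1] <= -fc_thr and r[2] < p_thr]
    return hyper, hypo


# Toy records with made-up values, for illustration only
records = [
    ("geneA", 2.5, 0.001),   # passes both thresholds: hyper
    ("geneB", -3.1, 0.004),  # passes both thresholds: hypo
    ("geneC", 2.2, 0.2),     # large change but p-value too high
    ("geneD", 0.4, 0.0001),  # significant but change too small
]
hyper_example, hypo_example = split_by_methylation(records)
print([gene for gene, _, _ in hyper_example])
print([gene for gene, _, _ in hypo_example])
```

Genes that fail either the fold-change or the P-value cutoff end up in neither list.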


In [6]:
import os 
from glob import glob 
import sys 
import pandas as pd 
import numpy as np
sys.path.append('../')

from util import *

In [18]:
for exp in [
    '../RNA-seq/exp/hl60_6h_delta_exp.txt',
    '../RNA-seq/exp/hl60_72h_delta_exp.txt',
    '../RNA-seq/exp/hl60_72h_only_delta_exp.txt',
    '../RNA-seq/exp/hl60_120h_delta_exp.txt',
    '../RNA-seq/exp/kg1_delta_exp.txt',
    '../RNA-seq/exp/molm14_delta_exp.txt',
    '../RNA-seq/exp/ociaml2_delta_exp.txt',
    '../RNA-seq/exp/ociaml3_delta_exp.txt',
    '../RNA-seq/exp/thp1_delta_exp.txt',
    '../DAC-rg3039/RNA-seq/comb_vs_decitabine_delta_exp.txt',
    '../DAC-rg3039/RNA-seq/comb_vs_dmso_delta_exp.txt'

]: 
    pd.read_csv(exp,sep='\t').drop_duplicates('gene_name').to_csv(exp.replace('.txt','.c.txt'),sep='\t',index=None)
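
The cell above writes a `.c.txt` copy of each expression table with one row per `gene_name`; by default, pandas `drop_duplicates` keeps the first occurrence. A minimal plain-Python sketch of the same idea, using a hypothetical `drop_duplicate_genes` function and made-up rows:

```python
def drop_duplicate_genes(rows):
    """Keep the first row seen for each gene name (first field of the row)."""
    seen = set()
    kept = []
    for row in rows:
        if row[0] not in seen:
            seen.add(row[0])
            kept.append(row)
    return kept


# Made-up (gene_name, value) rows, for illustration only
rows = [("MYC", 0.8), ("TP53", -0.3), ("MYC", 1.4)]
deduplicated = drop_duplicate_genes(rows)
print(deduplicated)
```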


In [7]:
%%time 
data = pd.read_csv('../meRIP-seq/hl60_delta_mtyl_table.txt',sep='\t').loc[:,['ensembl','name','logFC','p_value']]
data = data.iloc[[int(data[(data.ensembl == gene)].logFC.abs().idxmax()) for gene in set(data.ensembl)],:].set_index('ensembl')

CPU times: user 2.35 s, sys: 26.3 ms, total: 2.37 s
Wall time: 2.37 s
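
The cell above keeps, for each gene, the single site with the largest absolute `logFC`. A minimal plain-Python sketch of that selection is shown below; `strongest_site_per_gene` is a hypothetical name and the rows are made up.

```python
def strongest_site_per_gene(rows):
    """Keep, for each gene ID, the row with the largest |logFC|.

    Rows are (ensembl, name, logFC, p_value) tuples. On ties, the first
    row encountered is kept.
    """
    best = {}
    for gene, name, logfc, pval in rows:
        if gene not in best or abs(logfc) > abs(best[gene][2]):
            best[gene] = (gene, name, logfc, pval)
    return best


# Made-up rows, for illustration only
site_rows = [
    ("ENSG01", "siteA", 1.2, 0.03),
    ("ENSG01", "siteB", -2.8, 0.002),
    ("ENSG02", "siteC", 0.7, 0.5),
]
selected = strongest_site_per_gene(site_rows)
print(selected["ENSG01"])
```

Because the comparison uses the absolute value, a strongly negative site can represent a gene that also has weaker positive sites.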


In [20]:
!mkdir -p mtyl-enrichment

In [8]:
hyper, hypo = two_sided_mtyl(data,fcthr=2)

hyper.reindex(
    hyper.logFC.abs().sort_values(ascending=False).index
).to_csv('mtyl-enrichment/hyper_mtyl.txt',sep='\t',index=None,header=None)

hypo.reindex(
    hypo.logFC.abs().sort_values(ascending=False).index
).to_csv ('mtyl-enrichment/hypo_mtyl.txt', sep='\t',index=None,header=None)

2. Use a [TEISER](https://github.com/goodarzilab/TEISER) script to run the enrichment test



In [3]:
ls ~/Workflows

iPAGEv1.0.zip  QoRTs-STABLE.jar  TEISERv1.1/  TEISER.zip


In [36]:
%%bash

# export PAGEDIR='/data_gilbert/home/aarab/iPAGE'
export TEISERDIR='/data_gilbert/home/aarab/Workflows/TEISERv1.1'

declare -a Genesets=('hyper_mtyl' 'hypo_mtyl')
declare -a Experiments=(
# # Ribo-seq
# '../Ribo-seq/hl60_delta_te.txt'

# ## HL-60 RNA-seq 
# # RNA expression
# '../RNA-seq/exp/hl60_6h_delta_exp.c.txt' 
# '../RNA-seq/exp/hl60_72h_delta_exp.c.txt' 
# '../RNA-seq/exp/hl60_72h_only_delta_exp.c.txt' 
# '../RNA-seq/exp/hl60_120h_delta_exp.c.txt' 
# # RNA stability  
# '../RNA-seq/stbl/hl60_120h_delta_stbl.txt'  
# '../RNA-seq/stbl/hl60_72h_delta_stbl.txt'
# '../RNA-seq/stbl/hl60_6h_delta_stbl.txt'

# ## 5 other AML cell lines RNA-seq
# # RNA expression
# '../RNA-seq/exp/kg1_delta_exp.c.txt'
# '../RNA-seq/exp/molm14_delta_exp.c.txt'
# '../RNA-seq/exp/ociaml2_delta_exp.c.txt' 
# '../RNA-seq/exp/ociaml3_delta_exp.c.txt'
# '../RNA-seq/exp/thp1_delta_exp.c.txt'
# # RNA stability  
# '../RNA-seq/stbl/kg1_delta_stbl.txt' 
# '../RNA-seq/stbl/molm14_delta_stbl.txt' 
# '../RNA-seq/stbl/ociaml2_delta_stbl.txt' 
# '../RNA-seq/stbl/ociaml3_delta_stbl.txt'
# '../RNA-seq/stbl/thp1_delta_stbl.txt'

# ## drug combination
# # RNA expression
# '../DAC-rg3039/RNA-seq/comb_vs_decitabine_delta_exp.c.txt'
# '../DAC-rg3039/RNA-seq/comb_vs_dmso_delta_exp.c.txt'
## DNA and RNA methylation
'../DNA-RNA-mtyl/CpG-nearend.txt'
'../DNA-RNA-mtyl/CpG-promoter.txt'
)

for exp in "${Experiments[@]}"; do
    for geneset in "${Genesets[@]}"; do
    
        echo $exp $geneset
        base=`basename $exp`
        base=${base/.txt/}
        
#         # remove results from previous run 
#         rm -rf ${exp}_GENESET
        
#         # remove results from previous run 
#         rm -fr mtyl-enrichment/${geneset}_${base}

        # get intersect
        awk 'NR==FNR{A[$1];next}$1 in A' $exp mtyl-enrichment/${geneset}.txt > mtyl-enrichment/${geneset}_${base}.txt
        
        perl ${TEISERDIR}/run_mi_gene_list.pl \
            --expfile=$exp \
            --genefile=mtyl-enrichment/${geneset}_${base}.txt \
            --exptype=continuous \
            --ebins=11 \
            --species=human \
            --doremovedups=0 \
            --doremoveextra=0 &> mtyl-enrichment/${geneset}_${base}.log
        
        
        rm mtyl-enrichment/${geneset}_${base}.txt #mtyl-enrichment/${geneset}_${base}.log
        mv ${exp}_GENESET mtyl-enrichment/${geneset}_${base}_GENESET
        
        echo 'done!'
        
    done

done

../DNA-RNA-mtyl/CpG-nearend.txt hyper_mtyl
done!
../DNA-RNA-mtyl/CpG-nearend.txt hypo_mtyl
done!
../DNA-RNA-mtyl/CpG-promoter.txt hyper_mtyl
done!
../DNA-RNA-mtyl/CpG-promoter.txt hypo_mtyl
done!
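
The `awk 'NR==FNR{A[$1];next}$1 in A'` line in the loop above restricts each gene set to the genes present in the experiment file: it first collects the first field of every line of the experiment file, then prints only the gene-set lines whose first field was collected. A rough Python equivalent, with a hypothetical function name and made-up lines, and assuming blank lines can be ignored:

```python
def intersect_first_column(exp_lines, geneset_lines):
    """Return gene-set lines whose first field also starts a line in exp_lines."""
    known = {line.split()[0] for line in exp_lines if line.strip()}
    return [
        line for line in geneset_lines
        if line.strip() and line.split()[0] in known
    ]


# Made-up tab-separated lines, for illustration only
exp_lines = ["geneA\t0.5", "geneB\t-1.2", "geneC\t0.1"]
geneset_lines = ["geneB\t-3.1\t0.004", "geneZ\t-2.4\t0.001"]
kept_lines = intersect_first_column(exp_lines, geneset_lines)
print(kept_lines)
```

The order of the gene-set file is preserved, so the earlier sorting by absolute `logFC` carries over to the intersected list.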


In [27]:
%%bash

# export PAGEDIR='/data_gilbert/home/aarab/iPAGE'
export TEISERDIR='/data_gilbert/home/aarab/Workflows/TEISERv1.1'
perl /data_gilbert/home/aarab/Workflows/TEISERv1.1/Scripts/teiser_draw_matrix.pl --pvmatrixfile=../DNA-RNA-mtyl/CpG-promoter.txt_GENESET/CpG-promoter.txt.matrix --summaryfile=../DNA-RNA-mtyl/CpG-promoter.txt_GENESET/CpG-promoter.txt.summary --expfile=../DNA-RNA-mtyl/CpG-promoter.txt_GENESET/CpG-promoter.txt --quantized=0 --colmap=/data_gilbert/home/aarab/Workflows/TEISERv1.1/Scripts/HEATMAPS/cmap_1.txt --order=0 --min=-10 --max=10 --cluster=5 --suffix=


Reading MI data ... 

Option suffix requires an argument
Table.pm: cannot open file "../DNA-RNA-mtyl/CpG-promoter.txt_GENESET/CpG-promoter.txt.matrix" ..


CalledProcessError: Command 'b"\n# export PAGEDIR='/data_gilbert/home/aarab/iPAGE'\nexport TEISERDIR='/data_gilbert/home/aarab/Workflows/TEISERv1.1'\nperl /data_gilbert/home/aarab/Workflows/TEISERv1.1/Scripts/teiser_draw_matrix.pl --pvmatrixfile=../DNA-RNA-mtyl/CpG-promoter.txt_GENESET/CpG-promoter.txt.matrix --summaryfile=../DNA-RNA-mtyl/CpG-promoter.txt_GENESET/CpG-promoter.txt.summary --expfile=../DNA-RNA-mtyl/CpG-promoter.txt_GENESET/CpG-promoter.txt --quantized=0 --colmap=/data_gilbert/home/aarab/Workflows/TEISERv1.1/Scripts/HEATMAPS/cmap_1.txt --order=0 --min=-10 --max=10 --cluster=5 --suffix=\n"' returned non-zero exit status 2.
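
The first message in the output above reports that `--suffix=` was passed without a value. One way to avoid emitting empty options when assembling such a command from Python is sketched below. `build_args` is a hypothetical helper, and the option names are simply the ones that appear in the command above; nothing here is part of TEISER itself.

```python
def build_args(options):
    """Format a dict as --key=value arguments, dropping empty or None values."""
    return [f"--{key}={value}" for key, value in options.items()
            if value not in ("", None)]


draw_args = build_args({
    "quantized": 0,
    "order": 0,
    "min": -10,
    "max": 10,
    "cluster": 5,
    "suffix": "",  # empty, so it is left out of the argument list
})
print(draw_args)
```

Numeric zeros are kept, since only the empty string and `None` are treated as missing.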

3. Merge hypo and hyper results

In [None]:
# comps = [(
#     comp.split('/')[1].split('_mtyl_')[0],
#     comp.split('/')[1].split('_mtyl_')[1].replace('_GENESET','')
# ) for comp in glob('mtyl-enrichment/*delta*')]

# comps.sort(key=lambda a: len(a[1]))

# comps

In [25]:
comps = [(
    comp.split('/')[1].split('_mtyl_')[0],
    comp.split('/')[1].split('_mtyl_')[1].replace('_GENESET','')
) for comp in glob('mtyl-enrichment/*CpG*')]

comps.sort(key=lambda a: len(a[1]))
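
For reference, the parsing in the cell above can be written as a small function. The sketch below uses a hypothetical `parse_comparison` helper and the directory names that the enrichment loop produces for the two CpG inputs.

```python
def parse_comparison(path):
    """Split 'dir/<geneset>_mtyl_<experiment>_GENESET' into (geneset, experiment)."""
    folder = path.split("/")[1]
    geneset, experiment = folder.split("_mtyl_")
    return geneset, experiment.replace("_GENESET", "")


example_paths = [
    "mtyl-enrichment/hypo_mtyl_CpG-promoter_GENESET",
    "mtyl-enrichment/hyper_mtyl_CpG-nearend_GENESET",
]
# Sort by the length of the experiment name, as in the cell above
example_comps = sorted(
    (parse_comparison(p) for p in example_paths),
    key=lambda pair: len(pair[1]),
)
print(example_comps)
```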

In [26]:
expfiles = {}
summaryfiles = {}
pvmatrixfiles = {}

for c in {comp for _,comp in comps}:
    
    path = f'mtyl-enrichment/{c}'
    os.makedirs(path, exist_ok=True)  # do not fail if the directory exists from a previous run
    expfiles[c] = {}
    summaryfiles[c] = {}
    pvmatrixfiles[c] = {}
    
    for m in {mtyl for mtyl,_ in comps}:
        expfiles[c][m] = f'mtyl-enrichment/{m}_mtyl_{c}_GENESET/{c}.txt'
        summaryfiles[c][m] = f'mtyl-enrichment/{m}_mtyl_{c}_GENESET/{c}.txt.summary'
        pvmatrixfiles[c][m] = f'mtyl-enrichment/{m}_mtyl_{c}_GENESET/{c}.txt.matrix'
    
    exp_df = pd.read_csv(expfiles[c]['hyper'],sep='\t', header=None)
    exp_df.to_csv(f'{path}/{c}.txt',header=None,index=False,sep='\t')
    
    sum_df = pd.concat([pd.read_csv(summaryfiles[c]['hyper'],sep='\t'),pd.read_csv(summaryfiles[c]['hypo'],sep='\t')])
    sum_df['index'] = ['Hyper-methylated geneset','Hypo-methylated geneset']
    sum_df.to_csv(f'{path}/{c}.txt.summary',index=False,sep='\t')
    
    
    mtx_df = pd.concat([pd.read_csv(pvmatrixfiles[c]['hyper'],sep='\t'),pd.read_csv(pvmatrixfiles[c]['hypo'],sep='\t')])
    mtx_df.MOTIF = ['Hyper-methylated geneset','Hypo-methylated geneset']
    mtx_df.to_csv(f'{path}/{c}.txt.matrix',index=False,sep='\t')

FileNotFoundError: [Errno 2] No such file or directory: 'mtyl-enrichment/hyper_mtyl_CpG-nearend_GENESET/CpG-nearend.txt.summary'
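
Because the merge reads three files per gene set, a quick existence check before merging can make a failure like the one above easier to diagnose. The sketch below is a hypothetical helper, `missing_outputs`, that lists the expected files that are absent; it follows the file-name pattern used in the merge cell and is demonstrated on a temporary directory.

```python
import os
import tempfile


def missing_outputs(root, experiment, genesets=("hyper", "hypo")):
    """List expected per-geneset output files that do not exist under root."""
    missing = []
    for geneset in genesets:
        folder = os.path.join(root, f"{geneset}_mtyl_{experiment}_GENESET")
        for suffix in (".txt", ".txt.summary", ".txt.matrix"):
            path = os.path.join(folder, experiment + suffix)
            if not os.path.exists(path):
                missing.append(path)
    return missing


# Demonstration on a temporary directory holding only one of the six files
with tempfile.TemporaryDirectory() as root:
    folder = os.path.join(root, "hyper_mtyl_CpG-nearend_GENESET")
    os.makedirs(folder)
    open(os.path.join(folder, "CpG-nearend.txt"), "w").close()
    gaps = missing_outputs(root, "CpG-nearend")

for path in gaps:
    print("missing:", path)
```

An empty list means all six inputs for that experiment are in place and the merge can proceed.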

In [24]:
%%bash 
cd mtyl-enrichment
for exp in `ls | grep -v '_GENESET' | grep 'delta'`; do 

    echo $exp

    cd $exp

    # assumes TEISERDIR is exported in the environment of this shell
    perl ${TEISERDIR}/Scripts/teiser_draw_matrix.pl \
        --pvmatrixfile=${exp}.txt.matrix \
        --summaryfile=${exp}.txt.summary \
        --expfile=${exp}.txt \
        --quantized=0 \
        --order=0 \
        --min=-10 --max=10 --cluster=5 \
        --colmap=${TEISERDIR}/Scripts/HEATMAPS/cmap_1.txt &> ${exp}.log

    rm ${exp}.log
    cd ..
    echo "done!"

done 
cd ../

comb_vs_decitabine_delta_exp.c
done!
comb_vs_dmso_delta_exp.c
done!
hl60_120h_delta_exp.c
done!
hl60_120h_delta_stbl
done!
hl60_6h_delta_exp.c
done!
hl60_6h_delta_stbl
done!
hl60_72h_delta_exp.c
done!
hl60_72h_delta_stbl
done!
hl60_72h_only_delta_exp.c
done!
hl60_delta_te
done!
kg1_delta_exp.c
done!
kg1_delta_stbl
done!
molm14_delta_exp.c
done!
molm14_delta_stbl
done!
ociaml2_delta_exp.c
done!
ociaml2_delta_stbl
done!
ociaml3_delta_exp.c
done!
ociaml3_delta_stbl
done!
thp1_delta_exp.c
done!
thp1_delta_stbl
done!


4. Redraw heatmaps with `--min=-3 --max=3` for the plots whose signals span a smaller range:

In [25]:
%%bash 
cd mtyl-enrichment

declare -a Experiments=(
'hl60_6h_delta_stbl' 'hl60_72h_delta_stbl' 'hl60_120h_delta_stbl' 
'kg1_delta_stbl' 'ociaml2_delta_stbl' 'molm14_delta_stbl' 
'ociaml3_delta_stbl' 'thp1_delta_stbl'
'hl60_delta_te'
)
for exp in "${Experiments[@]}"; do

    echo $exp

    cd $exp

    perl /flash/bin/TEISERv1.1/Scripts/teiser_draw_matrix.pl \
        --pvmatrixfile=${exp}.txt.matrix \
        --summaryfile=${exp}.txt.summary \
        --expfile=${exp}.txt \
        --quantized=0 \
        --order=0 \
        --min=-3 --max=3 --cluster=5 \
        --colmap=${TEISERDIR}/Scripts/HEATMAPS/cmap_1.txt &> ${exp}.log

    rm ${exp}.log
    cd ..
    echo "done!"

done 
cd ../

hl60_6h_delta_stbl
done!
hl60_72h_delta_stbl
done!
hl60_120h_delta_stbl
done!
kg1_delta_stbl
done!
ociaml2_delta_stbl
done!
molm14_delta_stbl
done!
ociaml3_delta_stbl
done!
thp1_delta_stbl
done!
hl60_delta_te
done!


In [26]:
# !mkdir mtyl-enrichment/log
# !mv mtyl-enrichment/*_GENESET mtyl-enrichment/log
# !mv mtyl-enrichment/*.log mtyl-enrichment/log
!rm -r mtyl-enrichment/*_GENESET

5. Make `png` figures:

In [None]:
%%bash 
for pdf in mtyl-enrichment/*/*.txt.summary.pdf; do 
    png=${pdf/.pdf/.png}
    di=`dirname $pdf`
    out=`basename $di`
    
    bash /rumi/shams/abe/GitHub/Abe/my_scripts/pdf2png.sh $pdf 

    mv $pdf mtyl-enrichment/${out}.pdf
    mv $png mtyl-enrichment/${out}.png
    
done 
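
The renaming in the loop above moves each heatmap out of its experiment folder and names it after that folder. A small sketch of how the target names are derived, using a hypothetical `flattened_names` function:

```python
from pathlib import PurePosixPath


def flattened_names(pdf_path):
    """Map '<root>/<exp>/<file>.pdf' to ('<root>/<exp>.pdf', '<root>/<exp>.png')."""
    pdf = PurePosixPath(pdf_path)
    exp_dir = pdf.parent   # e.g. mtyl-enrichment/kg1_delta_stbl
    root = exp_dir.parent  # e.g. mtyl-enrichment
    return (
        str(root / f"{exp_dir.name}.pdf"),
        str(root / f"{exp_dir.name}.png"),
    )


targets = flattened_names(
    "mtyl-enrichment/kg1_delta_stbl/kg1_delta_stbl.txt.summary.pdf"
)
print(targets)
```

For an experiment folder such as `hl60_6h_delta_exp.c`, the same rule yields `hl60_6h_delta_exp.c.pdf`, which is why a later cell strips the `.c.` from those file names.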

In [42]:
rm mtyl-enrichment/*.png

In [45]:
%%bash 
for f in mtyl-enrichment/*.c.pdf; do o=${f/.c./.}; mv $f $o; done 

In [None]:
rm mtyl-enrichment/hyper_mtyl.txt mtyl-enrichment/hypo_mtyl.txt

In [40]:
!date

Wed Dec 22 18:10:46 PST 2021
