# Building a methylation-informed regulatory network
Daniel C Morgan<sup>1</sup>

<sup>1</sup>Channing Division of Network Medicine, Harvard Medical School, Boston, MA.

## Introduction

Inexpensive, high-quality methylation platforms (Illumina 450k/850k arrays) have made it practical to integrate DNA methylation with the more traditional data sources used to infer gene regulatory networks (GRNs). The recent literature has expanded in many directions to take advantage of this now widely available data source, and a wide range of approaches has been proposed. Publications range from use cases where accounting for DNA methylation in a regulatory framework makes the most sense, to methodological approaches for doing so independent of any particular research question. This chapter is organized along that continuum: it starts with the use cases best suited to DNA methylation information and moves toward more recent methodologies and frameworks for incorporating it appropriately and accurately. Lastly, we work through a modern example of both, namely a meGRN framework and a use case, to see what methylation can add to an investigation.

## Loading libraries

In [1]:
import matplotlib.pyplot as plt
import numpy as np
import os
import seaborn as sns
import scipy
import pandas as pd
import pybedtools

## Motivation

The task is to find regions of known transcription factor (TF) binding that also have methylation events proximal to them. For this, one can rely either on chromosome coordinates from WGBS or on array data, for which Illumina provides cg probe names that map to comparable genomic locations.
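
The overlap described above can be sketched without any external tools. The snippet below is a minimal, dependency-free illustration, not the chapter's pipeline (which uses (py)bedtools): the function name `mean_methylation_per_site`, the tuple layouts, and the toy coordinates are all assumptions made for this example. It averages the methylation of CpGs that fall inside each motif site, optionally padded by a buffer such as the 0 bp and 100 bp settings used in the benchmark runs.

```python
from bisect import bisect_left
from collections import defaultdict


def mean_methylation_per_site(sites, cpgs, buffer_bp=0):
    """Average methylation of the CpGs inside each (optionally padded) site.

    sites: iterable of (chrom, start, end, tf, gene), BED-style half-open.
    cpgs:  iterable of (chrom, pos, beta) with beta in [0, 1].
    Returns {(tf, gene, chrom, start, end): mean beta} for sites with >= 1 CpG.
    """
    # Index CpGs per chromosome, sorted by position, for binary search.
    by_chrom = defaultdict(list)
    for chrom, pos, beta in cpgs:
        by_chrom[chrom].append((pos, beta))
    for chrom in by_chrom:
        by_chrom[chrom].sort()
    positions = {c: [p for p, _ in v] for c, v in by_chrom.items()}

    out = {}
    for chrom, start, end, tf, gene in sites:
        if chrom not in positions:
            continue
        lo = bisect_left(positions[chrom], max(0, start - buffer_bp))
        hi = bisect_left(positions[chrom], end + buffer_bp)  # end is exclusive
        hits = by_chrom[chrom][lo:hi]
        if hits:
            out[(tf, gene, chrom, start, end)] = sum(b for _, b in hits) / len(hits)
    return out


# Toy data (hypothetical coordinates and gene names).
sites = [("chr1", 1000, 1020, "CTCF", "GENE_A"),
         ("chr1", 5000, 5015, "SP1", "GENE_B")]
cpgs = [("chr1", 1005, 0.8), ("chr1", 1010, 0.6), ("chr1", 5050, 0.2)]

no_buffer = mean_methylation_per_site(sites, cpgs, buffer_bp=0)
with_buffer = mean_methylation_per_site(sites, cpgs, buffer_bp=100)
```

With no buffer only the first site picks up CpGs; with a 100 bp buffer the second site also captures the nearby CpG at position 5050.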


## Outline:
- Brief literature review
- Data formats -- what is required, depending on availability
  - Chromosome, start, stop
  - cg annotation (Illumina)
  - Other → some method for relating genome location to binding
- Pipeline:
  0. FIMO scan & open-access data
  1. (py)bedtools
  2. integrating
    1. Integrating into netzoo bipartite
    1. Integrating into other GRN framework
      1. Smaller GRN models (TF-gene subsets) could simply merge with the methyl-motif and downweight their estimates where the two overlap (GRN estimate × (1 − average methylation ratio))
  3. Benchmarking against ChIP-seq
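
The downweighting rule mentioned in the outline, GRN estimate × (1 − average methylation ratio), can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `downweight_by_methylation`, the dictionary layout, and the toy edges are hypothetical, and edges with no methylation measurement are left unchanged, which is one possible choice rather than a prescribed one.

```python
def downweight_by_methylation(grn_edges, methyl_ratio):
    """Scale each edge weight by (1 - mean methylation ratio) where available.

    grn_edges:    {(tf, gene): weight} from any GRN method.
    methyl_ratio: {(tf, gene): mean methylation in [0, 1]} at the TF's motif
                  site(s) for that gene. Edges without an entry are unchanged.
    """
    adjusted = {}
    for edge, weight in grn_edges.items():
        ratio = methyl_ratio.get(edge)
        if ratio is None:
            adjusted[edge] = weight  # no overlap with the methyl-motif
            continue
        if not 0.0 <= ratio <= 1.0:
            raise ValueError(f"methylation ratio out of range for {edge}: {ratio}")
        adjusted[edge] = weight * (1.0 - ratio)
    return adjusted


# Toy example (hypothetical TF and gene names).
grn = {("TF1", "GENE_A"): 2.0, ("TF1", "GENE_B"): 1.5, ("TF2", "GENE_A"): 0.5}
meth = {("TF1", "GENE_A"): 0.75, ("TF2", "GENE_A"): 0.0}
adjusted = downweight_by_methylation(grn, meth)
```

A fully methylated site (ratio 1) removes the edge's weight entirely, while an unmethylated site (ratio 0) leaves it untouched.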

## Gathering data

In [2]:
## A549 WGBS
# !wget https://www.encodeproject.org/files/ENCFF005TID/@@download/ENCFF005TID.bed.gz
# !gunzip ENCFF005TID.bed.gz

## PPI
# !wget https://granddb.s3.amazonaws.com/gpuPANDA/ppi2015_freezeCellLine.txt

## ReMap ChIP (filter to A549)
# !wget http://remap.univ-amu.fr/storage/remap2020/hg38/MACS2/remap2020_nr_macs2_hg38_v1_0.bed.gz
# !gunzip remap2020_nr_macs2_hg38_v1_0.bed.gz
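
Filtering the ReMap peaks to A549 could look roughly like the sketch below. It assumes that the fourth BED column of the non-redundant file has the form `TF:biotype1,biotype2,...`; check this against the downloaded file before relying on it. The function name `filter_remap_by_biotype` and the example rows are hypothetical.

```python
def filter_remap_by_biotype(lines, biotype="A549"):
    """Yield (chrom, start, end, tf) for peaks annotated with `biotype`.

    ASSUMPTION: column 4 is 'TF:biotype1,biotype2,...'; verify on the real file.
    """
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and BED header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue
        tf, _, biotypes = fields[3].partition(":")
        if biotype in biotypes.split(","):
            yield fields[0], int(fields[1]), int(fields[2]), tf


# Made-up rows in the assumed layout.
example = [
    "chr1\t100\t200\tCTCF:A549,HepG2\t2\t.\t150\t151\t0,0,0\n",
    "chr1\t300\t400\tESR1:MCF-7\t1\t.\t350\t351\t0,0,0\n",
    "chr2\t500\t650\tFOXA1:A549\t1\t.\t560\t561\t0,0,0\n",
]
a549_peaks = list(filter_remap_by_biotype(example))
```

Because the function only needs an iterable of lines, the real file can be streamed with `gzip.open(path, "rt")` instead of being decompressed first.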

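In [None]:
# NOTE: `methyl_motif` (motif hits annotated with methylation) and `motif_file`
# (the reference TF-gene motif prior) are assumed to be DataFrames loaded earlier.

# Strip parenthesized suffixes from TF names: "NAME(...)" -> "NAME"
methyl_motif['TF'] = methyl_motif['TF'].str.replace(r"\(.*\)", "", regex=True)
m_motif = methyl_motif[['TF', 'gene', 'weight', 'W1', 'ChIPTF']]
# um_motif = m_motif.drop_duplicates()

# Keep only edges present in the reference prior. Both frames carry a `weight`
# column, so the merge suffixes them: weight_x (motif_file), weight_y (m_motif).
min_um_motif = motif_file.merge(m_motif, on=['TF', 'gene'])
min_um_motif['W1'] = np.round(min_um_motif['W1'], decimals=3)

# Collapse multiple hits per TF-gene pair by taking the maximum
mm = min_um_motif.groupby(['TF', 'gene']).max()
mmm = mm.reset_index()
mmm[['TF', 'gene', 'W1']].to_csv('min_0max_methyl_motif.txt', sep='\t', header=False, index=False)
mmm[['TF', 'gene', 'weight_y']].to_csv('min_0max_pwm_motif.txt', sep='\t', header=False, index=False)
# mmm[['TF','gene','ChIPTF']].to_csv('min_0max_ChIPTF_motif.txt',sep='\t',header=False,index=False)

# Binary prior: every retained edge gets weight 1
mmm['weight_y'] = 1
mmm[['TF', 'gene', 'weight_y']].to_csv('min_0one_motif.txt', sep='\t', header=False, index=False)

# Same collapse, using the mean instead (numeric columns only)
nn = min_um_motif.groupby(['TF', 'gene']).mean(numeric_only=True)
nnn = nn.reset_index()
nnn[['TF', 'gene', 'W1']].to_csv('min_0mean_methyl_motif.txt', sep='\t', header=False, index=False)
nnn[['TF', 'gene', 'weight_y']].to_csv('min_0mean_pwm_motif.txt', sep='\t', header=False, index=False)
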


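In [8]:
from netZooPy.panda import Panda

def test_panda(motif, size):
    """Run PANDA with the given motif prior and save the resulting edge table.

    motif: file name of the motif prior, e.g. 'min_0mean_methyl_motif.txt'
    size:  benchmark subdirectory, e.g. '0bp_bench'
    """
    panda_objC = Panda(expression_file=None,  # 'drive/MyDrive/Colab Notebooks/milipede_bench/Hugo_exp1_lcl.txt',
                       motif_file='drive/MyDrive/Colab Notebooks/milipede_bench/' + size + '/' + motif,
                       ppi_file='ppi2015_freezeCellLine.txt',
                       computing='cpu', modeProcess='intersection', save_memory=False, save_tmp=False,
                       precision='single', keep_expression_matrix=False)
    # Write the edge list next to the other outputs for this buffer size
    panda_objC.export_panda_results.to_csv('bench/' + size + '/output_' + size + '/' + motif,
                                           sep='\t', header=True, index=False)
    return panda_objC.export_panda_results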

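In [None]:
# File names and directories are passed as strings. They must match the files
# placed under .../milipede_bench/<size>/ (note that the cell above writes the
# binary prior as 'min_0one_motif.txt').

## mean, no buffer
test_panda('min_0mean_methyl_motif.txt', '0bp_bench')
test_panda('min_0mean_pwm_motif.txt', '0bp_bench')
test_panda('min_0max_one_motif.txt', '0bp_bench')

## max, no buffer
test_panda('min_0max_methyl_motif.txt', '0bp_bench')
test_panda('min_0max_pwm_motif.txt', '0bp_bench')
# test_panda('min_0max_one_motif.txt', '0bp_bench')  # not needed: same binary prior as above

## mean, 100bp buffer
test_panda('min_100mean_methyl_motif.txt', '100bp_bench')
test_panda('min_100mean_pwm_motif.txt', '100bp_bench')
test_panda('min_100mean_one_motif.txt', '100bp_bench')

## max, 100bp buffer
test_panda('min_100max_methyl_motif.txt', '100bp_bench')
test_panda('min_100max_pwm_motif.txt', '100bp_bench')
test_panda('min_100max_one_motif.txt', '100bp_bench')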
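
One way to carry out the ChIP-seq benchmarking step listed in the outline is to score each network's edge weights against binary ChIP labels. The sketch below uses AUROC, which is an assumption on our part since the notebook does not name a metric, and the helper `auroc` is a hypothetical, dependency-free implementation based on the rank-sum (Mann-Whitney U) formulation.

```python
def auroc(scores, labels):
    """Rank-based AUROC (Mann-Whitney U), using average ranks for ties.

    scores: predicted edge weights (higher = more confident).
    labels: 1/True if the edge is supported by ChIP-seq, else 0/False.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: p[0])
    n_pos = sum(1 for _, y in pairs if y)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative edge")

    rank_sum_pos = 0.0
    i = 0
    while i < len(pairs):
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            j += 1  # extend over a run of tied scores
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        rank_sum_pos += avg_rank * sum(1 for _, y in pairs[i:j] if y)
        i = j

    u = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


# Toy edge weights and ChIP labels (made-up numbers).
mixed = auroc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
perfect = auroc([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])
uninformative = auroc([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0])
```

Applied to the outputs of the `test_panda` runs, the same function would let the methylation-weighted, PWM-weighted, and binary priors be compared on a common scale, provided each edge can be matched to a ChIP label.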
