> We have already established that, using annotated HeLa m6A sites, we can observe changes in genes carrying m6A sites in HL-60 cells. To confirm these m6A sites, we performed MeRIP-seq in treated and untreated cells and indeed observed a general increase in m6A levels upon treatment for a large number of annotated sites. Here, our goal is to analyze the MeRIP data independently, without relying on HeLa annotations, and use it to define a set of **treatment-induced hyper-methylation sites**. We will then assess the location and behaviour of these targets across the other datasets generated in this study.

## Hypergeometric test

### Goal
Here, I aim to run a _hypergeometric test_ on the gene sets found enriched by iPAGE in the CRISPR screening experiment (rho scores), to test whether they are enriched for hyper- or hypo-methylated genes.
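As a rough sketch of the intended test, assuming each enriched gene set is available as a plain collection of gene symbols and that `hyper`, `hypo` and `background` are sets of gene symbols, the per-gene-set over-representation p-values could be computed with `scipy.stats.hypergeom`. The helper name `methylation_enrichment` and the toy genes are hypothetical, not part of the original analysis.

```python
import pandas as pd
from scipy.stats import hypergeom


def methylation_enrichment(gene_sets, hyper, hypo, background):
    """Hypothetical helper: one-sided hypergeometric test per gene set.

    gene_sets  : dict mapping gene-set name -> iterable of gene symbols
    hyper/hypo : sets of hyper-/hypo-methylated gene symbols
    background : set of all genes considered (the population)
    """
    background = set(background)
    N = len(background)
    rows = []
    for name, genes in gene_sets.items():
        genes = set(genes) & background          # restrict the set to the background
        n = len(genes)                            # draws: genes in this gene set
        row = {'gene_set': name, 'size': n}
        for label, marked in (('hyper', hyper), ('hypo', hypo)):
            marked = set(marked) & background
            A = len(marked)                       # "successes" in the population
            x = len(genes & marked)               # observed overlap
            row[f'{label}_overlap'] = x
            # P(X >= x) is the survival function evaluated at x - 1
            row[f'{label}_p'] = hypergeom.sf(x - 1, N, A, n)
        rows.append(row)
    return pd.DataFrame(rows).set_index('gene_set')


# Toy example with made-up genes g0..g19
background = {f'g{i}' for i in range(20)}
toy_sets = {'setA': ['g0', 'g1', 'g2', 'g5'], 'setB': ['g10', 'g12']}
df = methylation_enrichment(toy_sets, hyper={'g0', 'g1', 'g2', 'g3'},
                            hypo={'g10', 'g11'}, background=background)
```

Restricting every set to the same background keeps N, A and n consistent; which genes form that background (for example, all genes in the screen) is a choice this sketch leaves open.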

1. Read iPAGE results into Python

In [None]:
# https://bioinformatics.stackexchange.com/questions/5400/how-to-convert-data-in-gmt-format-to-dataframe
# https://gseapy.readthedocs.io/en/latest/gseapy_tutorial.html
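For reference, the GMT format used by MSigDB stores one gene set per tab-delimited line: the set name, a description (often a URL), then the member genes. The minimal reader below is a hypothetical stand-in, independent of `ipage_down.read_gmt` (whose exact behaviour is not shown here); the `'genes'` key simply mirrors how this notebook later indexes `gmt[pw]['genes']`.

```python
import io
import pandas as pd


def parse_gmt(handle):
    """Parse GMT text into {gene_set_name: {'description': str, 'genes': [...]}}."""
    out = {}
    for line in handle:
        fields = line.rstrip('\r\n').split('\t')
        if len(fields) < 3:  # skip blank lines and sets that list no genes
            continue
        name, desc, *genes = fields
        out[name] = {'description': desc, 'genes': [g for g in genes if g]}
    return out


def gmt_to_long_df(gmt):
    """Flatten the parsed dict into a long DataFrame, one (gene_set, gene) pair per row."""
    records = [(name, g) for name, entry in gmt.items() for g in entry['genes']]
    return pd.DataFrame(records, columns=['gene_set', 'gene'])


# Toy example with two made-up gene sets
toy = io.StringIO('SET_A\thttp://example.org\tTP53\tMYC\nSET_B\tna\tEZH2\n')
gmt = parse_gmt(toy)
long_df = gmt_to_long_df(gmt)
```

The long layout makes it easy to merge gene sets against per-gene tables (for example, methylation calls) with ordinary pandas joins.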

In [28]:
import pandas as pd
from glob import glob 
import sys

sys.path.append('/rumi/shams/abe/GitHub/Abe/my_scripts/')

import ipage_down as ipd

In [32]:
type(ipd.make_page_dict)

function

In [31]:
[
    'make_page_dict',
    'read_gmt',
    'read_page_annotations',
    'read_page_index',
    'read_page_names',
    'subset_dict'
]

['make_page_dict',
 'read_gmt',
 'read_page_annotations',
 'read_page_index',
 'read_page_names',
 'subset_dict']

In [2]:
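def get_enriched(pagefolder, side='L'):
    """Collect the enriched gene sets from every iPAGE run under `pagefolder`.

    side: 'L' or 'R', selecting pvmatrix.L.txt or pvmatrix.R.txt.
    Returns {gs_name: [index entry for each gene set listed in the pvmatrix]}.
    """
    paths = glob(f'{pagefolder}/*/pvmatrix.{side}.txt')

    out = {}
    for f in paths:
        page = ipd.make_page_dict(f)  # the module is imported as `ipd`
        # Only runs whose annotations include an index can be mapped back to genes
        if 'index' in page['annotations']:
            key = page['gs_name']
            out[key] = [page['annotations']['index'][gs] for gs in page['data'].index.tolist()]

    return out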

In [None]:
D = ipd.read_page_index('/flash/bin/iPAGEv1.0/PAGE_DATA/ANNOTATIONS/human_c2_gs/human_c2_gs_index.txt')

In [None]:
enrich = get_enriched('screen/CRISPRi_HL60_rho/')

In [None]:
import numpy as np

In [None]:
enrich.values()  # `out` only exists inside get_enriched; inspect the returned dict instead

In [None]:
l_page = [ipd.make_page_dict(path) for path in glob('screen/CRISPRi_HL60_rho/*/pvmatrix.L.txt')]
r_page = [ipd.make_page_dict(path) for path in glob('screen/CRISPRi_HL60_rho/*/pvmatrix.R.txt')]

2. Run the hypergeometric test

https://github.com/JohnDeJesus22/DataScienceMathFunctions/blob/master/hypergeometricfunctions.py#L38

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from scipy.special import comb

def hypergeom_pmf(N, A, n, x):
    '''
    Probability Mass Function for Hypergeometric Distribution
    :param N: population size
    :param A: total number of desired items in N
    :param n: number of draws made from N
    :param x: number of desired items in our draw of n items
    :returns: PMF computed at x
    '''
    Achoosex = comb(A,x)
    NAchoosenx = comb(N-A, n-x)
    Nchoosen = comb(N,n)
    
    return (Achoosex)*NAchoosenx/Nchoosen
    
    
def hypergeom_plot(N, A, n):
    '''
    Visualization of Hypergeometric Distribution for given parameters
    '''
    x = np.arange(0, n+1)
    y = [hypergeom_pmf(N, A, n, x) for x in range(n+1)]
    plt.plot(x, y, 'bo')
    plt.vlines(x, 0, y, lw=2)
    plt.xlabel('# of desired items in our draw')
    plt.ylabel('Probabilities')
    plt.title('Hypergeometric Distribution Plot')
    plt.show()
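As a sanity check, the hand-written `hypergeom_pmf` should agree with `scipy.stats.hypergeom.pmf`, whose arguments are ordered `(k, M, n, N)`, i.e. (observed overlap, population size, successes in the population, draws). The sketch below maps the notebook's `N, A, n, x` onto that order; `matches_scipy` is a hypothetical helper.

```python
import numpy as np
from scipy.special import comb
from scipy.stats import hypergeom


def hypergeom_pmf(N, A, n, x):
    """Same formula as above: C(A, x) * C(N - A, n - x) / C(N, n)."""
    return comb(A, x) * comb(N - A, n - x) / comb(N, n)


def matches_scipy(N, A, n):
    """Compare both implementations over x = 0..n (impossible x give 0 in both)."""
    xs = np.arange(0, n + 1)
    ours = np.array([hypergeom_pmf(N, A, n, x) for x in xs])
    theirs = hypergeom.pmf(xs, N, A, n)  # SciPy order: (k, M, n, N)
    return np.allclose(ours, theirs)


print(matches_scipy(100, 10, 5), matches_scipy(30, 12, 15))
```

For large gene counts, SciPy's log-space implementation is the safer choice, since `comb` with floats can overflow.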


In [None]:
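# two_sided_mtyl() is not defined in this section; presumably it comes from `utility` (star-imported further down)
hyper, hypo = [set(mtyl.name) for mtyl in two_sided_mtyl()]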

In [None]:
l_page

In [None]:
gmt = l_page[3]['annotations']['gmt']
first_gs = next(iter(gmt))  # first gene set in this GMT
pw = gmt[first_gs]['genes']

In [None]:
def intersection(lst1, lst2): 
    lst3 = [value for value in lst1 if value in lst2] 
    return lst3 
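Because `hyper` is already a set and `pw` is a list, the comprehension above scans `lst2` once per item. A sketch of two alternatives that use a set for constant-time lookups; both also drop duplicates, which matters if `lst1` ever repeats a gene, since the list version would then count it twice. The function names are hypothetical.

```python
def intersection_ordered(lst1, lst2):
    """Unique items of lst1, in first-seen order, that also occur in lst2."""
    lookup = set(lst2)  # constant-time membership tests
    return [g for g in dict.fromkeys(lst1) if g in lookup]


def intersection_set(lst1, lst2):
    """Same members as intersection_ordered, but as an unordered set."""
    return set(lst1) & set(lst2)


shared = intersection_ordered(['KRAS', 'TP53', 'KRAS', 'MYC'], ['MYC', 'KRAS'])
```

For counting the overlap `x`, `len(intersection_set(hyper, pw))` is enough; the ordered variant is only useful when the gene order should be kept for display.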

In [None]:
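# Match the definitions below: N = all genes, A = methylated genes, n = pathway genes, x = overlap
A = len(hyper)
n = len(pw)
x = len(intersection(hyper, pw))
# N (the background of all genes) still has to be defined before calling hypergeom_pmf(N, A, n, x)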

In [None]:
help(hypergeom_pmf)

In this setting:

    ○ N = # of all genes
    ○ n, A = # of methylated genes and # of genes in the enriched pathway (the two can be swapped; the distribution is symmetric in n and A)
    ○ x = # of genes in the overlap of n and A
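Written out with these symbols, the probability implemented by `hypergeom_pmf` and the upper-tail sum usually reported as the enrichment p-value are:

$$
P(X = x) = \frac{\binom{A}{x}\binom{N-A}{n-x}}{\binom{N}{n}}, \qquad
p_{\text{enrich}} = P(X \ge x) = \sum_{k=x}^{\min(n,\,A)} \frac{\binom{A}{k}\binom{N-A}{n-k}}{\binom{N}{n}}
$$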

In [None]:
hypergeom_pmf(100,10,5,4)
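A single PMF value such as `hypergeom_pmf(100,10,5,4)` is the probability of observing exactly that overlap; for an enrichment test the usual quantity is the upper tail P(X ≥ x). A minimal sketch that sums the hand-written PMF and cross-checks it against `scipy.stats.hypergeom.sf`, evaluated at `x - 1` because `sf(k)` is P(X > k). The helper name `enrichment_pvalue` is hypothetical.

```python
from scipy.special import comb
from scipy.stats import hypergeom


def hypergeom_pmf(N, A, n, x):
    """C(A, x) * C(N - A, n - x) / C(N, n), as defined above."""
    return comb(A, x) * comb(N - A, n - x) / comb(N, n)


def enrichment_pvalue(N, A, n, x):
    """One-sided P(X >= x) for an observed overlap of x genes."""
    upper = min(n, A)  # largest possible overlap
    return sum(hypergeom_pmf(N, A, n, k) for k in range(x, upper + 1))


p_manual = enrichment_pvalue(100, 10, 5, 4)
p_scipy = hypergeom.sf(4 - 1, 100, 10, 5)  # SciPy order: (k, M, n, N)
print(p_manual, p_scipy)
```

Both values come out at about 2.54e-4, i.e. P(X = 4) + P(X = 5).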

In [None]:
from utility import *

In [None]:
screen = pd.read_excel('screen/CRISPRi_HL60_DAC_genetable_collapsed.xlsx')

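# `Data` and `make_score_df` are not defined in this section; presumably they come from `utility` (star-imported above)
Data['hl60']['rho'] = make_score_df(screen, 'rho')
Data['hl60']['gamma'] = make_score_df(screen, 'gamma')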

OK! I need to write something that can read the index and names files in iPAGE format. Easy!
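A first sketch of such readers, under assumptions that should be checked against the actual files in `PAGE_DATA/ANNOTATIONS`: each line of the index file is taken to be a gene followed by the tab-separated IDs of the gene sets it belongs to, and each line of the names file a gene-set ID followed by its description. The index is inverted to gene set → genes, which is the direction needed for the overlap counts above. Function names and toy data are hypothetical.

```python
import io
from collections import defaultdict


def read_index(handle):
    """Assumed layout: gene<TAB>set_id<TAB>set_id... -> {set_id: [genes]}."""
    sets = defaultdict(list)
    for line in handle:
        fields = line.rstrip('\r\n').split('\t')
        gene, set_ids = fields[0], fields[1:]
        if not gene:
            continue  # skip blank lines
        for sid in set_ids:
            if sid:
                sets[sid].append(gene)
    return dict(sets)


def read_names(handle):
    """Assumed layout: set_id<TAB>description[<TAB>...] -> {set_id: description}."""
    names = {}
    for line in handle:
        fields = line.rstrip('\r\n').split('\t')
        if len(fields) >= 2 and fields[0]:
            names[fields[0]] = fields[1]
    return names


# Toy files standing in for *_index.txt and *_names.txt
idx = read_index(io.StringIO('TP53\tS1\tS2\nMYC\tS1\n\n'))
names = read_names(io.StringIO('S1\tapoptosis\tc2\nS2\tcell cycle\n'))
```

Keeping the two readers separate means the names can be attached to the hypergeometric results afterwards without touching the gene lists.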