# Convert ChIP-Seq Signals to a Pattern of Mark States

Given a BedGraph file containing "fold-change over control" signals for arbitrary genomic intervals, generate a sequence of methylation states, one per nucleosome, where 0 indicates that neither histone tail is methylated, 1 indicates that one tail is methylated, and 2 indicates that both tails are methylated.

<br>

### Before running this notebook:

- Download the desired BigWig file for an epigenetic mark with "fold signal over control" values from the ENCODE database.
- Move the BigWig file to the directory containing the UCSC sequence tool executables.
- Run the UCSC `bigWigToBedGraph` executable from the directory that contains it: `./bigWigToBedGraph <BigWigFileName> <OutputFileName> -chrom=chr##`.
- Fill in the inputs to this notebook and run all cells.
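
The conversion step above can also be scripted. The following is a minimal sketch, assuming the `bigWigToBedGraph` executable sits in the current directory; the helper names and the example file names are hypothetical and are not part of this repository.

```python
import shutil
import subprocess


def build_bedgraph_command(bigwig_path, bedgraph_path, chrom,
                           executable="./bigWigToBedGraph"):
    """Build the argument list for UCSC bigWigToBedGraph (hypothetical helper)."""
    # -chrom restricts the output to a single chromosome
    return [executable, f"-chrom={chrom}", bigwig_path, bedgraph_path]


def convert_bigwig(bigwig_path, bedgraph_path, chrom,
                   executable="./bigWigToBedGraph"):
    """Run the conversion, failing early if the executable cannot be found."""
    if shutil.which(executable) is None:
        raise FileNotFoundError(f"Executable not found: {executable}")
    command = build_bedgraph_command(bigwig_path, bedgraph_path, chrom, executable)
    # check=True raises CalledProcessError if the tool exits with an error
    subprocess.run(command, check=True)


# Example file names are placeholders, not real ENCODE accessions
print(build_bedgraph_command("example.bigWig", "example.bedGraph", "chr16"))
```

Building the command as a list, rather than a single shell string, avoids quoting problems when paths contain spaces.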

### Setup

Import necessary modules.

In [1]:
from typing import Optional

import numpy as np
import pandas as pd
import bioframe as bf

import analyses.modifications.bedgraph_to_sequence as b2s

### Specify Inputs

Conversion from raw ChIP-seq to modification sequence depends on the following input parameters.

In [2]:
# Location of BedGraph file; should contain "fold-change over control" values
file_path = "/scratch/users/jwakim/chromo_two_mark_phase_transition/output/example_bedgraph/ENCFF919DOR_H3K27me3_Bedgraph.bed"

# Path at which to save modification sequence
out_path = "/scratch/users/jwakim/chromo_two_mark_phase_transition/output/example_bedgraph/ENCFF919DOR_H3K27me3_methyl.txt"

# Fraction of all histone TAILS modified with methylation mark
fraction_methylated = 0.4

# Max num. iters to adjust thresholds to converge on fraction methylated target
max_iters = 1000

# Relative tolerance around fraction methylated target
rel_tol = 0.001

# Chromosome number for sequence of interest
chromosome = "chr16"

# Bead discretization in units of base pairs
bp_per_nucleosome = 200

### Calibrate Conversion

Signal cutoffs will be defined such that a specified fraction of histone tails are methylated. Cutoffs are defined by percentile values in the overall distribution of signals.

In [3]:
pct_cutoffs = b2s.get_cutoffs(fraction_methylated)
print("Percentile Cutoffs between 0/1 and 1/2 tails methylated:")
print(pct_cutoffs)

Percentile Cutoffs between 0/1 and 1/2 tails methylated:
[0.26666667 0.53333333]
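
The printed cutoffs are consistent with one simple reading: an equal share of nucleosomes, `2 * fraction_methylated / 3`, is assigned to each of states 1 and 2, which gives the target fraction of tails when averaged over two tails per nucleosome. This is an inference from the output above, not a description of how `b2s.get_cutoffs` is implemented; the function names below are hypothetical.

```python
import numpy as np


def sketch_cutoffs(fraction_methylated):
    """Hypothetical reconstruction of the cutoffs; NOT the b2s implementation."""
    # Assumption: states 1 and 2 each receive the same share of nucleosomes
    share = 2 * fraction_methylated / 3
    return np.array([share, 2 * share])


def implied_fraction_of_tails(cutoffs):
    """Fraction of tails methylated implied by cumulative percentile cutoffs."""
    share_a = cutoffs[0]               # nucleosomes in one methylated state
    share_b = cutoffs[1] - cutoffs[0]  # nucleosomes in the other methylated state
    # One state contributes 2 tails and the other 1; with equal shares the
    # result does not depend on which is which. Each nucleosome has 2 tails.
    return (2 * share_a + share_b) / 2


cutoffs = sketch_cutoffs(0.4)
print(cutoffs)
print(implied_fraction_of_tails(cutoffs))
```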


### Redistribute Intervals to Match Desired Discretization

The BedGraph file reports signals over intervals of arbitrary width. Redistribute the signals into fixed-width bins of `bp_per_nucleosome` base pairs, so that each bin corresponds to one nucleosome.

In [4]:
signals = b2s.read_signals_from_bedgraph(file_path)
signals.head()

Unnamed: 0,chrom,start,end,value
0,chr16,0,10279,0.0
1,chr16,10279,10489,0.78156
2,chr16,10489,10858,0.0
3,chr16,10858,11068,0.78156
4,chr16,11068,12789,0.0


In [5]:
processed_signals = b2s.rediscretize_signals(signals, chromosome, bp_per_nucleosome)
processed_signals.head()

Unnamed: 0,chrom,start,end,value_scaled
0,chr16,0,200,0.0
1,chr16,200,400,0.0
2,chr16,400,600,0.0
3,chr16,600,800,0.0
4,chr16,800,1000,0.0
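
One way to redistribute piecewise-constant interval signals into fixed-width bins is to integrate the signal along the chromosome and difference the integral at the bin edges. The sketch below illustrates that idea on a toy input; it assumes sorted, contiguous intervals starting at position 0, and `rebin_signals` is a hypothetical helper, not the `b2s.rediscretize_signals` implementation.

```python
import numpy as np
import pandas as pd


def rebin_signals(intervals, bin_width):
    """Overlap-weighted mean signal per fixed-width bin (illustrative sketch)."""
    starts = intervals["start"].to_numpy()
    ends = intervals["end"].to_numpy()
    values = intervals["value"].to_numpy()
    # Cumulative integrated signal at every interval boundary
    edges = np.append(starts, ends[-1])
    cumulative = np.concatenate([[0.0], np.cumsum(values * (ends - starts))])
    # Evaluate the cumulative signal at the new bin edges
    n_bins = int(np.ceil(ends[-1] / bin_width))
    bin_edges = np.arange(n_bins + 1) * bin_width
    cumulative_at_bins = np.interp(bin_edges, edges, cumulative)
    # Signal gained across each bin, divided by the bin width
    binned = np.diff(cumulative_at_bins) / bin_width
    return pd.DataFrame({
        "start": bin_edges[:-1],
        "end": bin_edges[1:],
        "value_scaled": binned,
    })


toy = pd.DataFrame({"start": [0, 300], "end": [300, 400], "value": [1.0, 3.0]})
rebinned = rebin_signals(toy, bin_width=200)
print(rebinned)
```

In the toy input, the second bin spans 100 bp at a signal of 1.0 and 100 bp at 3.0, so its overlap-weighted mean is 2.0.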


### Check Rediscretization

In [6]:
valid = b2s.check_rediscretization(signals, processed_signals)
assert valid, "Error with rediscretization. Inconsistent total signal."
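
The assertion message suggests the check compares total signal before and after rediscretization. A minimal sketch of such a check follows, assuming total signal means the sum of value times interval width; the helper names are hypothetical and may differ from what `b2s.check_rediscretization` actually does.

```python
import numpy as np
import pandas as pd


def total_signal(frame, value_col):
    """Integrated signal: sum of value times interval width."""
    widths = frame["end"] - frame["start"]
    return float((frame[value_col] * widths).sum())


def totals_match(original, rebinned, rel_tol=1e-6):
    """True if both tables carry the same integrated signal within rel_tol."""
    return bool(np.isclose(
        total_signal(original, "value"),
        total_signal(rebinned, "value_scaled"),
        rtol=rel_tol,
    ))


# Toy tables: the same signal expressed on two different grids
original = pd.DataFrame({"start": [0, 300], "end": [300, 400], "value": [1.0, 3.0]})
regridded = pd.DataFrame({"start": [0, 200], "end": [200, 400], "value_scaled": [1.0, 2.0]})
print(totals_match(original, regridded))
```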

### Generate Sequence of Modification States

In [7]:
methyl = b2s.get_modification_pattern(
    processed_signals, fraction_methylated, pct_cutoffs
)
print(f"Fraction Tails Methylated: {round(np.sum(methyl) / (2 * len(methyl)), 3)}")

Convergence Successful!
Fraction Tails Methylated: 0.401
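
The inputs `max_iters` and `rel_tol` indicate that thresholds are adjusted iteratively until the fraction of methylated tails is close to the target. The sketch below shows one possible scheme, a bisection on a common scale factor applied to both cutoffs, run on synthetic data. It is an assumption for illustration and not the algorithm inside `b2s.get_modification_pattern`; all function names are hypothetical.

```python
import numpy as np


def assign_states(values, top_shares):
    """State 2 for the top `top_shares[0]` share of signals, state 1 for the
    next signals down to `top_shares[1]`, and state 0 otherwise."""
    quantiles = np.clip(1 - np.asarray(top_shares), 0, 1)
    high_cut, low_cut = np.quantile(values, quantiles)
    return (values > low_cut).astype(int) + (values > high_cut).astype(int)


def converge_on_fraction(values, target, max_iters=1000, rel_tol=0.001):
    """Bisect a scale factor on the cutoffs until the target fraction is met."""
    base = np.array([2 * target / 3, 4 * target / 3])  # assumed starting cutoffs
    low, high = 0.0, 1.0 / base[1]  # keep the larger share at or below 1
    states = np.zeros(len(values), dtype=int)
    for _ in range(max_iters):
        scale = 0.5 * (low + high)
        states = assign_states(values, base * scale)
        fraction = states.sum() / (2 * len(states))
        if abs(fraction - target) <= rel_tol * target:
            return states, True
        # Larger shares lower the thresholds and methylate more tails
        if fraction < target:
            low = scale
        else:
            high = scale
    return states, False


rng = np.random.default_rng(0)
synthetic_signal = rng.gamma(shape=2.0, scale=1.0, size=10_000)  # not real ChIP-seq data
states, converged = converge_on_fraction(synthetic_signal, target=0.4)
print(converged, states.sum() / (2 * len(states)))
```

With real ChIP-seq signals, many bins share the same value (for example, zero), so the achievable fractions are coarser and a scheme like this may stop at `max_iters` without meeting the tolerance.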


In [8]:
# Save methylation pattern
np.savetxt(out_path, methyl, fmt='%i', delimiter="\n")
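
A quick way to confirm that a saved pattern can be read back by downstream code is a round trip through `np.loadtxt`. The sketch below uses a temporary file and a toy pattern rather than the real `out_path`.

```python
import os
import tempfile

import numpy as np

toy_pattern = np.array([0, 2, 1, 1, 0])  # toy states, not real output

with tempfile.TemporaryDirectory() as tmp_dir:
    example_path = os.path.join(tmp_dir, "methyl.txt")
    # One integer state per line, matching the format written above
    np.savetxt(example_path, toy_pattern, fmt="%i")
    reloaded = np.loadtxt(example_path, dtype=int)

round_trip_ok = np.array_equal(toy_pattern, reloaded)
fraction_tails = reloaded.sum() / (2 * reloaded.size)
print(round_trip_ok, fraction_tails)
```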

### Summary:

Given ChIP-seq outputs reporting "fold-change over control" signals for arbitrary genomic intervals, we redistributed those signals into nucleosome-scale bins and applied cutoffs so that a target fraction of histone tails is methylated. The resulting methylation sequence is saved to the specified output path for use in future simulations.