# Preprocessing

This folder contains two scripts:
- feature_selection.py, which performs feature selection, i.e., selects the CpG sites with the highest variance across cell types in the methylation reference matrix.
- intersect_bed.py, which intersects the reference and the samples with one another, so that they all share the same CpG sites and can be used by the DeMethify algorithm.

Both scripts expect bedmethyl files as input, so they assume that the first three columns are the chromosome, start position and end position, in that order. They could be adapted to other layouts, but no other input format is supported at this time.

To use intersect_bed.py, bedtools must be installed, and the script must be edited to point to the location of the bedtools executable.
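
Before editing the script, it can help to check where bedtools lives on your machine. The snippet below is a minimal sketch using only the Python standard library; the helper name `find_tool` is hypothetical and is not part of intersect_bed.py, and it assumes bedtools is reachable through your `PATH`.

```python
import shutil
from typing import Optional


def find_tool(name: str = "bedtools") -> Optional[str]:
    """Return the path of an executable found on PATH, or None if it is missing."""
    # Hypothetical helper for illustration; intersect_bed.py may locate bedtools differently.
    return shutil.which(name)


bedtools_path = find_tool()
if bedtools_path is None:
    print("bedtools was not found on PATH; install it or note its location by hand.")
else:
    print(f"bedtools found at {bedtools_path}")
```

If a path is printed, that is the value to put in the script.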

## Feature selection

In [1]:
!python feature_selection.py -h

usage: feature_selection.py [-h] [--bedfile BEDFILE] [--n N] [--out [OUT]]

Select top N rows with highest variance from a BED file.

options:
  -h, --help         show this help message and exit
  --bedfile BEDFILE  Path to the input BED file
  --n N              Number of top rows to select
  --out [OUT]        Path to output folder


Here, let's select the 100,000 CpG sites with the highest variance in bed1.bed.

In [12]:
!python feature_selection.py --bedfile bed1.bed --n 100000 --out .

A file named bed1_select_ref.bed has been created.
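
To make the selection criterion concrete, here is a minimal sketch of variance-based row selection with pandas. It is an illustration, not the code of feature_selection.py: the function name `select_top_variance`, the toy column names, and the use of the sample variance (pandas' default, `ddof=1`) are all assumptions. It only relies on the layout described above, where the first three columns are coordinates and the remaining columns hold one methylation value per cell type.

```python
import pandas as pd


def select_top_variance(df: pd.DataFrame, n: int) -> pd.DataFrame:
    """Keep the n rows whose values (columns 4 onward) vary most across cell types."""
    # Assumption: columns 0-2 are chrom/start/end, the rest are numeric methylation values.
    values = df.iloc[:, 3:].astype(float)
    row_variance = values.var(axis=1)  # sample variance per CpG site
    top_index = row_variance.nlargest(n).index
    # Sort the kept labels so the rows stay in their original file order.
    return df.loc[top_index.sort_values()]


# Toy reference with three hypothetical cell types.
toy = pd.DataFrame({
    "chrom": ["chr1", "chr1", "chr2", "chr2"],
    "start": [100, 200, 300, 400],
    "end": [101, 201, 301, 401],
    "cell_a": [0.1, 0.5, 0.9, 0.2],
    "cell_b": [0.1, 0.5, 0.1, 0.8],
    "cell_c": [0.1, 0.6, 0.5, 0.5],
})

selected = select_top_variance(toy, 2)
print(selected)
```

In this toy example the first site is constant across cell types and the second barely changes, so the two sites on chr2 are the ones kept.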

## Intersection

In [13]:
!python intersect_bed.py -h

usage: intersect_bed.py [-h] [--bed BED [BED ...]] [--out [OUT]]

Intersect multiple BED files using bedtools.

options:
  -h, --help           show this help message and exit
  --bed BED [BED ...]  List of BED files to intersect (at least two files
                       required).
  --out [OUT]          Path to output folder


Let's intersect our new reference bed1_select_ref.bed with the rest of the bed files in the folder:

In [15]:
!python intersect_bed.py --bed bed1_select_ref.bed bed2.bed bed3.bed bed4.bed --out .

Intersected files created:  ['bed1_select_ref_intersect.bed', 'bed2_intersect.bed', 'bed3_intersect.bed', 'bed4_intersect.bed']
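
The effect of the intersection can be pictured with a small pandas sketch. Note that this is a different technique from the one the script uses: intersect_bed.py relies on bedtools, whereas the example below uses an inner merge on exact coordinates. The function name `intersect_sites` and the toy column names are assumptions made for illustration.

```python
from functools import reduce

import pandas as pd

KEYS = ["chrom", "start", "end"]  # assumed names for the first three bedmethyl columns


def intersect_sites(frames):
    """Restrict every frame to the (chrom, start, end) rows present in all frames."""
    # Inner-merge the coordinate columns of all frames to get the shared sites.
    common = reduce(
        lambda left, right: left.merge(right, on=KEYS),
        (frame[KEYS].drop_duplicates() for frame in frames),
    )
    # Filter each frame down to the shared sites, in a consistent row order.
    return [
        frame.merge(common, on=KEYS).sort_values(KEYS).reset_index(drop=True)
        for frame in frames
    ]


reference = pd.DataFrame({
    "chrom": ["chr1", "chr1", "chr2"],
    "start": [100, 200, 300],
    "end": [101, 201, 301],
    "cell_a": [0.1, 0.5, 0.9],
})
sample = pd.DataFrame({
    "chrom": ["chr1", "chr2", "chr2"],
    "start": [200, 300, 400],
    "end": [201, 301, 401],
    "methylation": [0.4, 0.7, 0.2],
})

reference_common, sample_common = intersect_sites([reference, sample])
print(reference_common)
print(sample_common)
```

After this step both tables contain the same two sites in the same order, which is the consistency the intersected files are meant to provide.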


Everything is now ready to run DeMethify!