# CGI_Finder API usage

## Import module

In [2]:
# Import main module 
from pycoMeth.CGI_Finder import CGI_Finder

# Optionally import Jupyter helper functions
from pycoMeth.common import head, jhelp

## Getting help

In [3]:
jhelp(CGI_Finder)

**CGI_Finder** (ref_fasta_fn, output_tsv_fn, output_bed_fn, merge_gap, min_win_len, min_CG_freq, min_obs_CG_ratio, verbose, quiet, progress, kwargs)

Simple method to find putative CpG islands in DNA sequences by using a sliding window and merging overlapping windows that satisfy the CpG island definition. Results can be saved in BED and TSV formats.

---

* **ref_fasta_fn** (required) [str]

Reference file used for alignment, in FASTA format (ideally already indexed with samtools faidx)

* **output_tsv_fn** (default: None) [str]

Path to write a more extensive result report in TSV format (at least 1 output file is required)

* **output_bed_fn** (default: None) [str]

Path to write a summary result file in BED format (at least 1 output file is required)

* **merge_gap** (default: 0) [int]

Merge neighboring CpG islands that lie within the given distance (in bases) of each other

* **min_win_len** (default: 200) [int]

Length of the minimal window containing CpG. Used as the sliding window length

* **min_CG_freq** (default: 0.5) [float]

Minimal C+G frequency in a window to be counted as a valid CpG island

* **min_obs_CG_ratio** (default: 0.6) [float]

Minimal observed CG dinucleotide frequency over the expected distribution in a window to be counted as a valid CpG island

* **verbose** (default: False) [bool]

* **quiet** (default: False) [bool]

* **progress** (default: False) [bool]

* **kwargs**
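
The sliding-window approach described in the help text can be illustrated with a minimal pure-Python sketch. This is not pycoMeth's implementation: the function name `find_cpg_islands` is hypothetical, and the observed/expected ratio uses the commonly cited Gardiner-Garden and Frommer formula (CpG count × window length / (C count × G count)), which may differ from what `CGI_Finder` computes internally. The keyword arguments simply mirror the documented parameters and their defaults.

```python
def find_cpg_islands(seq, min_win_len=200, min_cg_freq=0.5,
                     min_obs_cg_ratio=0.6, merge_gap=0):
    """Return half-open (start, end) intervals of putative CpG islands.

    Illustrative sketch only; assumes the Gardiner-Garden and Frommer
    observed/expected definition, not necessarily pycoMeth's.
    """
    seq = seq.upper()
    islands = []
    # Slide a fixed-length window one base at a time along the sequence
    for start in range(len(seq) - min_win_len + 1):
        end = start + min_win_len
        win = seq[start:end]
        n_c = win.count("C")
        n_g = win.count("G")
        n_cpg = win.count("CG")

        # C+G frequency in the window
        cg_freq = (n_c + n_g) / min_win_len
        # Expected CpG count if C and G occurred independently
        expected = n_c * n_g / min_win_len
        ratio = n_cpg / expected if expected > 0 else 0.0

        if cg_freq >= min_cg_freq and ratio >= min_obs_cg_ratio:
            # Merge with the previous island when the windows overlap
            # or lie within merge_gap bases of each other
            if islands and start <= islands[-1][1] + merge_gap:
                islands[-1] = (islands[-1][0], max(islands[-1][1], end))
            else:
                islands.append((start, end))
    return islands


# Toy sequence: a CG-rich block flanked by AT repeats
toy_seq = "AT" * 100 + "CG" * 100 + "AT" * 100
toy_islands = find_cpg_islands(toy_seq, min_win_len=50)
```

Recounting every window from scratch keeps the sketch short but costs O(sequence length × window length); a production implementation would more likely update the counts incrementally as the window moves.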



## Example usage

#### Basic usage with yeast genome

In [6]:
ff = CGI_Finder(
    ref_fasta_fn="./data/yeast.fa",
    output_bed_fn="./results/yeast_CGI.bed",
    output_tsv_fn="./results/yeast_CGI.tsv",
    progress=True)

head("./results/yeast_CGI.tsv")
head("./results/yeast_CGI.bed")

## Checking options and input files ##
## Parsing reference fasta file ##
	Parsing Reference sequence: I
100%|██████████| 230k/230k [00:00<00:00, 769k bases/s] 
	Parsing Reference sequence: II
100%|██████████| 813k/813k [00:00<00:00, 932k bases/s] 
	Parsing Reference sequence: III
100%|██████████| 316k/316k [00:00<00:00, 865k bases/s] 
	Parsing Reference sequence: IV
100%|██████████| 1.53M/1.53M [00:01<00:00, 830k bases/s]
	Parsing Reference sequence: V
100%|██████████| 577k/577k [00:00<00:00, 852k bases/s] 
	Parsing Reference sequence: VI
100%|██████████| 270k/270k [00:00<00:00, 621k bases/s] 
	Parsing Reference sequence: VII
100%|██████████| 1.09M/1.09M [00:01<00:00, 866k bases/s]
	Parsing Reference sequence: VIII
100%|██████████| 562k/562k [00:00<00:00, 921k bases/s] 
	Parsing Reference sequence: IX
100%|██████████| 440k/440k [00:00<00:00, 826k bases/s] 
	Parsing Reference sequence: X
100%|██████████| 746k/746k [00:00<00:00, 855k bases/s] 
	Parsing Reference sequence: XI
100%|██████

chromosome start end   length num_CpG CG_freq obs_exp_freq 
I          17    333   316    4       0.509   0.614        
I          1804  2170  366    14      0.495   0.650        
I          25527 25912 385    16      0.488   0.776        
I          31835 32949 1114   59      0.497   0.876        
I          33497 34371 874    39      0.506   0.715        
I          38163 38471 308    13      0.487   0.715        
I          44294 44565 271    12      0.487   0.747        
I          44730 44988 258    9       0.481   0.608        
I          45308 45526 218    12      0.495   0.908        

track name=CpG_islands
I	17	333
I	1804	2170
I	25527	25912
I	31835	32949
I	33497	34371
I	38163	38471
I	44294	44565
I	44730	44988
I	45308	45526
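
For downstream processing, the BED output shown above can be read with the standard library alone. The sketch below is an assumption-laden illustration: `read_bed_intervals` is a hypothetical helper (not part of pycoMeth), and it presumes the three-column layout and `track` header line seen in the preview. For the rows displayed, the `length` column of the TSV report equals `end - start`, which the helper recomputes.

```python
import csv
import io


def read_bed_intervals(handle):
    """Parse 3-column BED lines into (chrom, start, end, length) tuples."""
    intervals = []
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and BED header/comment lines
        if not row or row[0].startswith(("track ", "browser ", "#")):
            continue
        chrom, start, end = row[0], int(row[1]), int(row[2])
        intervals.append((chrom, start, end, end - start))
    return intervals


# First two records copied from the preview above
bed_text = "track name=CpG_islands\nI\t17\t333\nI\t1804\t2170\n"
intervals = read_bed_intervals(io.StringIO(bed_text))
```

To read the real file, pass an open file handle instead of the `io.StringIO` object, for example `open("./results/yeast_CGI.bed", newline="")`.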

