# Process Coverage Notebook

This notebook demonstrates how to load `.cov` single-cell methylome files generated by [Bismark](https://www.bioinformatics.babraham.ac.uk/projects/bismark/) and convert them into the binarized methylation matrices required as input to the `run_scAge` function. Refer to the README file on the GitHub page for alternative ways to process single-cell methylation data. <br><br> Three single-cell methylation profiles from the [Gravina et al. (2016)](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-016-1011-3) study are provided as raw `.cov` files in `sc_data_raw` and as processed `.tsv` files in `sc_data_processed`.

## Import required packages

In [1]:
import numpy as np
import pandas as pd
import os
import scAge

## Check coverage files

In [14]:
# designate input directory of .cov files
input_coverage_directory = "./sc_data_raw/"

# get .cov files
input_coverage_files = sorted(os.listdir(input_coverage_directory))
print("Coverage file input directory: '%s'" % input_coverage_directory)

# cycle through .cov files
for count, file in enumerate(input_coverage_files):
    print("\tRaw .cov file #%s --> '%s'" % (count + 1, file))
    
# denote output path for processed .tsv files
output_path = "./sc_data_processed/"
print("\nProcessed file output directory: '%s'" % output_path)

Coverage file input directory: './sc_data_raw/'
	Raw .cov file #1 --> 'SRR3136627_bismark_bt2.deduplicated.bismark.cov.gz'
	Raw .cov file #2 --> 'SRR3136628_bismark_bt2.deduplicated.bismark.cov.gz'
	Raw .cov file #3 --> 'SRR3136659_bismark_bt2.deduplicated.bismark.cov.gz'

Processed file output directory: './sc_data_processed/'
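
Before processing, it can help to look at what a raw coverage file contains. The sketch below is not part of scAge. It assumes the standard Bismark coverage layout (six tab-separated columns, no header row); the column labels, the `read_cov` helper, and the toy rows are made up for illustration.

```python
import io
import pandas as pd

# Labels chosen for this sketch; Bismark .cov files carry no header row.
COV_COLUMNS = ["chr", "start", "end", "met_pct", "count_met", "count_unmet"]

def read_cov(path_or_buffer):
    """Hypothetical helper: read a tab-separated Bismark coverage file."""
    return pd.read_csv(
        path_or_buffer,
        sep="\t",
        header=None,
        names=COV_COLUMNS,
        dtype={"chr": str},  # keep chromosome names as strings
    )

# Toy rows (not real data) standing in for a .cov file on disk.
toy_cov = io.StringIO(
    "chr1\t3000827\t3000827\t100\t1\t0\n"
    "chr1\t3001007\t3001007\t0\t0\t2\n"
    "chr2\t3050115\t3050115\t50\t1\t1\n"
)
cov_df = read_cov(toy_cov)
print(cov_df)
```

To inspect one of the real files, a path such as `input_coverage_directory + input_coverage_files[0]` could be passed instead of the buffer; pandas infers gzip compression from the `.gz` extension by default.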


## Process coverage files

In [11]:
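# run process_coverage on every .cov file in the input directory
scAge.process_coverage(cov_directory = input_coverage_directory,
                       n_cores = 3,               # number of parallel worker processes
                       max_met = 100,             # Bismark reports methylation as a percentage (0-100)
                       split = "_",               # delimiter used to derive cell names from file names
                       chunksize = 1,             # files handed to each worker at a time
                       binarization = "round",    # binarization method for methylation values
                       output_path = output_path) # directory for the processed .tsv files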

process_coverage function starting!

----------------------------------------------------------
Loading .cov files from './sc_data_raw/'
Number of Bismark .cov files = 3
First .cov file name: 'SRR3136627_bismark_bt2.deduplicated.bismark.cov.gz'
----------------------------------------------------------

----------------------------------------------------------
Starting parallel loading and processing of .cov files...


Single-cell loading progress :   0%|          | 0/3 [00:00<?, ? cell methylomes/s]


Parallel loading complete!
Processed binary methylation matrices written to './sc_data_processed/'
----------------------------------------------------------

Time elapsed to process coverage files = 13.429 seconds

process_coverage run complete!
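
To make the `max_met` and `binarization = "round"` settings more concrete, here is a minimal sketch of what rounding-based binarization could look like. It does not reproduce scAge's implementation: `binarize_round` is a hypothetical helper, the toy values are invented, and the handling of sites at exactly 50% (sent to 1 here) is an assumption of this sketch.

```python
import numpy as np
import pandas as pd

def binarize_round(met_values, max_met=100):
    """Hypothetical helper: scale methylation to [0, 1], then threshold at 0.5."""
    fraction = np.asarray(met_values, dtype=float) / max_met
    # Assumption of this sketch: a site at exactly 0.5 is assigned to 1.
    return (fraction >= 0.5).astype(int)

# Toy per-CpG methylation percentages (not real data).
toy = pd.DataFrame({
    "chr": ["chr1", "chr1", "chr2", "chr2"],
    "pos": [3000827, 3001007, 3050115, 3050350],
    "met_pct": [100.0, 0.0, 75.0, 33.3],
})
toy["met_binary"] = binarize_round(toy["met_pct"], max_met=100)
print(toy)
```

With `max_met = 100`, the input is treated as a percentage; if methylation were stored as a ratio between 0 and 1, the same helper would be called with `max_met = 1`.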
