# Setup and Configuration File Extraction

A configuration file, `config.cnvkit.yml`, is provided in the `config/` directory. It specifies the file paths, the references to build, the comparisons to analyze, the chromosomes to plot, and the number of cores to use for parallelization.

All analyses follow the same pattern: parameters are extracted from the configuration file, Python loops over them, and each `cnvkit.py` call is run as a shell command through Python's `os` module.

In [None]:
import yaml
import os

In [None]:
# safe_load parses plain YAML without constructing arbitrary Python objects
with open('config-cnvkit.yml') as file:
    config = yaml.safe_load(file)

In [None]:
bed = config['bed_path']
chromosomes = config['chromosomes']
comparisons = config['comparisons']
cores = config['cores']
fasta = config['fasta_path']
output = config['output_path']
references = config['references']
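
The lookups in this notebook imply a particular layout for the configuration file. As a minimal sketch, that layout can be checked before any command runs. The key names are taken from the cells in this notebook; the `validate_config` helper and all sample values are hypothetical and only illustrate the expected nesting.

```python
# Top-level keys read by this notebook.
REQUIRED_KEYS = ['bed_path', 'chromosomes', 'comparisons', 'cores',
                 'fasta_path', 'output_path', 'references']

def validate_config(config):
    """Return a list of problems found in a parsed configuration dict."""
    problems = ['missing key: %s' % key
                for key in REQUIRED_KEYS if key not in config]
    # Each reference needs its input files and an output file name.
    for name, ref in config.get('references', {}).items():
        for key in ('files_for_ref', 'output_ref'):
            if key not in ref:
                problems.append('references[%s] lacks %s' % (name, key))
    # Each comparison needs a mutant BAM and a reference to compare against.
    for variety, entries in config.get('comparisons', {}).items():
        for name, entry in entries.items():
            for key in ('mutant', 'reference'):
                if key not in entry:
                    problems.append('comparisons[%s][%s] lacks %s'
                                    % (variety, name, key))
    return problems

# Hypothetical example of a parsed configuration (values are placeholders).
example_config = {
    'bed_path': 'targets.bed',
    'chromosomes': ['NC_039902.1'],
    'comparisons': {'catuai': {'catuai-vs-1-C7': {
        'mutant': 'bam/1-C7.bam', 'reference': 'catuai.cnn'}}},
    'cores': 4,
    'fasta_path': 'genome.fna',
    'output_path': 'output',
    'references': {'catuai': {
        'files_for_ref': ['bam/catuai-1.bam', 'bam/catuai-2.bam'],
        'output_ref': 'catuai.cnn'}},
}
print(validate_config(example_config))  # an empty list means no problems
```

An empty list means every key the later cells rely on is present; any entry in the list names the missing key.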

# Reference Creation

This step compiles a copy-number reference from the given files or from a directory containing normal samples. The reference can be built from zero, one, or multiple control samples. If a reference genome is given, the GC content and repeat-masked proportion of each region are calculated as well. Files needed:

- BAM files of the normal/control sample(s)
- FASTA file
- BED file with target regions

### Command:

**Option 1**:

Use the wildcard `*` to select all normal/control files for building the reference.

```
cnvkit.py batch --normal normalFile*.bam \
--output-reference /output/path/nameOfReferenceToCreate.cnn \
--fasta /path/fastaFile.fna \
--targets /path/bedFile.bed \
--output-dir /output/path \
-p numberOfCoresToUseForParallelization
```

**Option 2**:

List each normal/control file separately if a wildcard cannot be applied.

```
cnvkit.py batch --normal normalFile1.bam normalFile2.bam normalFileN.bam \
--output-reference /output/path/nameOfReferenceToCreate.cnn \
--fasta /path/fastaFile.fna \
--targets /path/bedFile.bed \
--output-dir /output/path \
-p numberOfCoresToUseForParallelization
```

In [None]:
for ref in references:
    cmd = """cnvkit/cnvkit.py batch --normal %s \
--output-reference %s/%s \
--fasta %s \
--targets %s \
--output-dir %s \
-p %d""" % (' '.join(references[ref]['files_for_ref']),
            output, references[ref]['output_ref'], 
            fasta, 
            bed,
            output,
            cores)
    os.system(cmd)
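
Building the command by string interpolation breaks if any path contains a space. As a sketch of one alternative, the same call can be assembled as an argument list, which can be printed for inspection or handed to `subprocess.run`. The `build_reference_cmd` helper and the sample paths are hypothetical; the flags are the ones used in the cell above.

```python
import shlex

def build_reference_cmd(normal_bams, reference_path, fasta, bed,
                        output_dir, cores):
    """Assemble the `cnvkit.py batch` call for building a reference."""
    return (['cnvkit/cnvkit.py', 'batch', '--normal'] + list(normal_bams) +
            ['--output-reference', reference_path,
             '--fasta', fasta,
             '--targets', bed,
             '--output-dir', output_dir,
             '-p', str(cores)])

# Placeholder paths; the first one contains a space on purpose.
cmd = build_reference_cmd(['bam/normal 1.bam', 'bam/normal2.bam'],
                          'output/catuai.cnn', 'genome.fna',
                          'targets.bed', 'output', 4)

# shlex.join (Python 3.8+) quotes arguments so the line is safe to paste
# into a shell; nothing is executed here.
print(shlex.join(cmd))
```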

# Comparisons

This step uses a previously built reference to calculate coverage in the given regions from the BAM read depths of each mutant sample.

### Command:

```
cnvkit.py batch mutantFile.bam \
-r /output/reference/path/referenceFile.cnn \
-d /output/path \
-p numberOfCoresToUseForParallelization
```

In [None]:
for variety in comparisons:
    for comparison in comparisons[variety]:
        cmd = """cnvkit/cnvkit.py batch %s \
-r %s/%s \
-d %s/%s \
-p %d""" % (comparisons[variety][comparison]['mutant'], 
            output, comparisons[variety][comparison]['reference'],
            output, variety, 
            cores)
        os.system(cmd)
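
`os.system` returns the exit status of the command, but the loop above discards it, so a failed run goes unnoticed until a later step cannot find its input. A minimal sketch of a stricter runner, using the standard `subprocess` module, is shown below. The `run_checked` helper is hypothetical, and the commands it runs here are stand-ins so the example works without CNVkit installed.

```python
import subprocess
import sys

def run_checked(cmd):
    """Run an argument list and raise if the command exits with an error."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError('command failed (%d): %s\n%s'
                           % (result.returncode, ' '.join(cmd), result.stderr))
    return result.stdout

# Stand-in for a successful command.
print(run_checked([sys.executable, '-c', 'print("ok")']))

# Stand-in for a failing command: the error is raised instead of ignored.
try:
    run_checked([sys.executable, '-c', 'import sys; sys.exit(3)'])
except RuntimeError as err:
    print('caught:', str(err).splitlines()[0])
```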

# Plotting

This step plots bin-level log2 coverages and segmentation calls together. Without any further arguments, it plots the genome-wide copy number in a form familiar to those who have used array comparative genomic hybridization (aCGH). The `--chromosome` (`-c`) option focuses the plot on the specified chromosome or region.

### Command:

```
cnvkit.py scatter /output/path/mutantFileName.cnr \
-s /output/path/mutantFileName.cns \
-c chromosomeName \
-o /output/path/nameOfPlot.png
```

In [None]:
for variety in comparisons:
    for comparison in comparisons[variety]:
        filename = comparison.replace('%s-vs-' % variety, '')
        for chromosome in chromosomes:
            cmd = """cnvkit/cnvkit.py scatter %s/%s/%s.cnr \
-s %s/%s/%s.cns \
-c %s \
-o %s/%s/%s-%s.png""" % (output, variety, filename,
            output, variety, filename,
            chromosome,
            output, variety, comparison, chromosome)
            os.system(cmd)
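
The loop above relies on a naming convention: each comparison key appears to have the form `<variety>-vs-<sample>`, and the `.cnr` and `.cns` files are named after the sample. That convention is inferred from the `replace` call, not stated in the configuration. The sketch below isolates the path logic in a hypothetical `plot_paths` helper so it can be checked without running CNVkit.

```python
import os

def plot_paths(output, variety, comparison, chromosome):
    """Return the (.cnr, .cns, .png) paths used for one scatter plot."""
    # Strip the '<variety>-vs-' prefix to recover the mutant sample name.
    sample = comparison.replace('%s-vs-' % variety, '')
    base = os.path.join(output, variety)
    return (os.path.join(base, sample + '.cnr'),
            os.path.join(base, sample + '.cns'),
            os.path.join(base, '%s-%s.png' % (comparison, chromosome)))

# Example values borrowed from the zoom cells in this notebook.
cnr, cns, png = plot_paths('output', 'catuai', 'catuai-vs-1-C7', 'NC_039902.1')
print(cnr)
print(cns)
print(png)
```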

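# Zooming into Interesting Regions

Passing a region of the form `chromosome:start-end` to `-c` restricts the plot to that interval. The two cells below plot the whole chromosome and then a zoomed-in region of it.

### Command Example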

In [None]:
cmd = """cnvkit/cnvkit.py scatter \
-s ./output-using-bedtools-merge-count/catuai/1-C7.cns \
-c NC_039902.1 \
-o ./output-using-bedtools-merge-count/catuai-vs-1-C7-NC_039902.1.no_zoom.png \
./output-using-bedtools-merge-count/catuai/1-C7.cnr"""

os.system(cmd)

In [None]:
cmd = """cnvkit/cnvkit.py scatter \
-s ./output-using-bedtools-merge-count/catuai/1-C7.cns \
-c NC_039902.1:10000000-12000000 \
-o ./output-using-bedtools-merge-count/catuai-vs-1-C7-NC_039902.1.zoom.png \
./output-using-bedtools-merge-count/catuai/1-C7.cnr"""

os.system(cmd)
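
The two cells differ only in the value passed to `-c`. As a small sketch, that value can be built by a helper that returns either the bare chromosome name or a `chromosome:start-end` region. The `region` function is hypothetical, and its range check is an added safeguard, not something CNVkit requires.

```python
def region(chromosome, start=None, end=None):
    """Format a value for `-c`: a whole chromosome or `chrom:start-end`."""
    if start is None and end is None:
        return chromosome
    if start is None or end is None or not 0 <= start < end:
        raise ValueError('start and end must satisfy 0 <= start < end')
    return '%s:%d-%d' % (chromosome, start, end)

print(region('NC_039902.1'))                      # NC_039902.1
print(region('NC_039902.1', 10000000, 12000000))  # NC_039902.1:10000000-12000000
```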