# **Automation of the CLI commands of the Snakemake workflow**

This Jupyter Notebook automates the Snakemake commands of the workflow by calling them from Python with `subprocess`.

This is convenient for quick, repetitive tasks, but running Snakemake this way may not give you access to some of its advanced features, which are easier to use directly from the command line.

* Environment to use: [Greference_tools](code/environments/Greference_tools.yml)
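Every cell in this notebook calls `subprocess.run` with a hand-built argument list. The sketch below shows one way to build those lists in one place. `snakemake_cmd` and `run_snakemake` are hypothetical helpers, not part of the original notebook. The flags are the ones the notebook already uses (`--cores`, `--use-conda`, `--force`) plus Snakemake's `--dry-run`. Passing `check=True` makes a failed rule raise an exception instead of being silently ignored.

```python
import subprocess


def snakemake_cmd(*targets, cores=1, use_conda=False, force=False, dry_run=False):
    """Build a Snakemake command line as an argument list (hypothetical helper)."""
    cmd = ["snakemake", "--cores", str(cores)]
    if use_conda:
        cmd.append("--use-conda")  # use the conda environments declared by the rules
    if force:
        cmd.append("--force")      # re-run the target even if its output already exists
    if dry_run:
        cmd.append("--dry-run")    # only show what would be executed
    cmd.extend(targets)            # rule names or output files to build
    return cmd


def run_snakemake(*targets, **options):
    """Run Snakemake and raise CalledProcessError if it exits with an error."""
    return subprocess.run(snakemake_cmd(*targets, **options), check=True)
```

For example, `run_snakemake("vep_install_db", use_conda=True)` issues the same command as the VEP download cell further down.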

In [2]:
## Libraries
import subprocess
import os
import re

In [None]:
## lists for the loops
chr_list        = ["chr3", "chr5", "chr7", "chr12", "chr17"]

sample_list     = ["ERR696683", "ERR753368", "ERR753369", "ERR753370", "ERR753371", "ERR753372",
                   "ERR753373", "ERR753374", "ERR753375", "ERR753376", "ERR753377", "ERR753378"]


* **rule download_data**

In [None]:
subprocess.run(["snakemake", "--cores", "1", "download_data"])

* rule pre_processing

Filters the BAM file by chromosome. Before running this step, make sure the chromosome set in the config file is **chr3**.
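This requirement can be checked before running the cells. The sketch below is a hypothetical pre-flight check, not part of the original notebook, that scans `config.yaml` for the chromosome names used here. It treats the file as plain text and does not parse the YAML. It assumes the config should mention only **chr3** at the start. Later in the notebook, after partial `sed` edits, the file can legitimately mention more than one chromosome.

```python
import re

CHROMOSOMES = ["chr3", "chr5", "chr7", "chr12", "chr17"]


def chromosomes_in_config(text, known=CHROMOSOMES):
    """Return the set of known chromosome names (e.g. 'chr3') mentioned in the text."""
    # chr\d+ is greedy, so 'chr12' is matched whole rather than as 'chr1'
    found = set(re.findall(r"chr\d+", text))
    return found & set(known)


def assert_config_chromosome(path, expected):
    """Raise ValueError if the file mentions any known chromosome other than `expected`."""
    with open(path) as handle:
        found = chromosomes_in_config(handle.read())
    if found != {expected}:
        raise ValueError(f"{path} mentions {sorted(found)}, expected only {expected!r}")
```

Calling `assert_config_chromosome("config.yaml", "chr3")` before the filtering cell stops the run early if the config was left on another chromosome.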

In [None]:
## Filtering chromosome 3
subprocess.run(["snakemake", "--cores", "1", "data/original_bam/filtering/ERR696683_chr3_sorted.bam"])

## Filtering the remaining chromosomes
# For each consecutive pair (chr3 -> chr5, chr5 -> chr7, ...), replace the chromosome
# name on lines 6-18 and line 39 of config.yaml, then build the filtered BAM
for chr1, chr2 in zip(chr_list, chr_list[1:]):
    print("Chromosomes transformed:", chr1, "->", chr2)

    subprocess.run(["sed", "-i", f"6,+12s/{chr1}/{chr2}/g", "config.yaml"])
    subprocess.run(["sed", "-i", f"39s/{chr1}/{chr2}/g", "config.yaml"])

    subprocess.run(["snakemake", "--cores", "1", f"data/original_bam/filtering/ERR696683_{chr2}_sorted.bam"])

In [None]:
## Reset config.yaml back to chr3
subprocess.run(["sed", "-i", "6,+12s/chr17/chr3/g", "config.yaml"])
subprocess.run(["sed", "-i", "39s/chr17/chr3/g", "config.yaml"])
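
The `sed -i` calls above edit `config.yaml` in place by line number. `6,+12s/old/new/g` substitutes on line 6 and the 12 lines after it, and `39s/old/new/g` substitutes on line 39 only. `-i` without a suffix argument is GNU sed syntax, and BSD/macOS sed may reject these calls. The sketch below does the same edit in pure Python. `sed_like_substitute` and `edit_config` are hypothetical names. The sketch uses plain substring replacement, whereas sed treats the pattern as a regular expression, so it is only equivalent for literal patterns such as chromosome and gene names.

```python
def sed_like_substitute(text, first_line, extra_lines, old, new):
    """Mimic `sed "FIRST,+EXTRAs/OLD/NEW/g"` on a string (1-based line numbers).

    Unlike sed, `old` is a plain substring, not a regular expression.
    """
    lines = text.splitlines(keepends=True)
    last_line = first_line + extra_lines
    # Lines beyond the end of the file are ignored, as sed does
    for number in range(first_line, min(last_line, len(lines)) + 1):
        lines[number - 1] = lines[number - 1].replace(old, new)
    return "".join(lines)


def edit_config(path, edits):
    """Apply a list of (first_line, extra_lines, old, new) edits to a file in place."""
    with open(path) as handle:
        text = handle.read()
    for first_line, extra_lines, old, new in edits:
        text = sed_like_substitute(text, first_line, extra_lines, old, new)
    with open(path, "w") as handle:
        handle.write(text)
```

With these helpers, the reset cell becomes `edit_config("config.yaml", [(6, 12, "chr17", "chr3"), (39, 0, "chr17", "chr3")])`.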

* rules reference_genome; fastqc; fastp; fastqc_trimmed; bwa_mapping; sam_to_bam; delete_duplicates; extracting_variants; vep_install; vep_cli; parsing_dataR

In [None]:
## Running the workflow for chromosome 3

subprocess.run(["snakemake", "--cores", "1", "data/reference/genome.fa"]) ## rule reference_genome 

for sample in sample_list:
    ## Quality inspection
    subprocess.run(["snakemake", "--cores", "1", "--use-conda",
                    "results/fastqc_result/" + sample + "_chr3_1_fastqc.html",   
                    "data/processed/" + sample + "_chr3_1_fastp.fastq.gz",   
                    "results/fastqc_result/trimmed/" + sample + "_chr3_1_fastp_fastqc.html"])
    
for sample in sample_list:
    ## Mapping reads
    subprocess.run(["snakemake", "--cores", "1", "--use-conda",
                    "results/mapped_reads/" + sample + "_chr3_sorted.sam",   
                    "results/mapped_reads/bam_files/" + sample + "_chr3_sorted.bam",   
                    "results/mapped_reads/bam_files/" + sample + "_chr3_dedup.bam"])
    
for sample in sample_list:
    # Extracting variants  
    subprocess.run(["snakemake", "--cores", "1", "--use-conda", "results/variants/" + sample + "_chr3.vcf"])   

## Downloading VEP
subprocess.run(["snakemake", "--cores", "1", "--use-conda", "vep_install_db"])
## VEP CLI
for sample in sample_list:
    subprocess.run(["snakemake", "--cores", "1", "--use-conda", "results/variants/vep/" + sample + "_chr3.txt"])

## Extracting the gene PIK3CA from the VEP files
subprocess.run(["snakemake", "--cores", "1", "parsing_dataR"])
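
The targets requested in the chromosome 3 cell follow a fixed `{sample}_{chrom}` naming pattern. The same cell is repeated below for chromosomes 5, 7, 12 and 17. The sketch below builds the ordered list of Snakemake commands for one chromosome using the target paths from this cell. `chromosome_plan` is a hypothetical function, not part of the workflow. It leaves out the one-off `vep_install_db` step, and `force_reference` adds the `--force` flag that the later chromosome cells pass when rebuilding the reference genome.

```python
def chromosome_plan(chrom, samples, force_reference=False):
    """Return the Snakemake argument lists that one chromosome cell runs, in order."""
    base = ["snakemake", "--cores", "1"]
    conda = base + ["--use-conda"]
    # rule reference_genome (re-forced when switching chromosomes)
    plan = [base + (["--force"] if force_reference else []) + ["data/reference/genome.fa"]]
    for sample in samples:  # quality inspection: fastqc, fastp, fastqc_trimmed
        plan.append(conda + [
            f"results/fastqc_result/{sample}_{chrom}_1_fastqc.html",
            f"data/processed/{sample}_{chrom}_1_fastp.fastq.gz",
            f"results/fastqc_result/trimmed/{sample}_{chrom}_1_fastp_fastqc.html",
        ])
    for sample in samples:  # mapping reads: SAM, sorted BAM, deduplicated BAM
        plan.append(conda + [
            f"results/mapped_reads/{sample}_{chrom}_sorted.sam",
            f"results/mapped_reads/bam_files/{sample}_{chrom}_sorted.bam",
            f"results/mapped_reads/bam_files/{sample}_{chrom}_dedup.bam",
        ])
    for sample in samples:  # variant calling
        plan.append(conda + [f"results/variants/{sample}_{chrom}.vcf"])
    for sample in samples:  # VEP annotation
        plan.append(conda + [f"results/variants/vep/{sample}_{chrom}.txt"])
    plan.append(base + ["parsing_dataR"])  # gene extraction
    return plan
```

To reproduce the chromosome 5 cell, run `subprocess.run(cmd, check=True)` for each `cmd` in `chromosome_plan("chr5", sample_list, force_reference=True)`.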


## Updating the config file for chromosome 5

In [5]:
# Lines 6-18, 24-36 and 39: chromosome name; line 46: reference FASTA; line 65: target gene
changes_chr3_to_5 = ["6,+12s/chr3/chr5/g", "24,+12s/chr3/chr5/g", "39s/chr3/chr5/g",
                     "46s/chromosome.3.fa.gz/chromosome.5.fa.gz/g", "65s/PIK3CA/APC/g"]
for change in changes_chr3_to_5:
    subprocess.run(["sed", "-i", change, "config.yaml"])
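
The five `changes_*` lists in this notebook differ only in the source and destination chromosome and the gene on line 65. The sketch below derives them from one chromosome-to-gene table taken from those lists. `GENES` and `config_changes` are hypothetical names, and the line numbers are the ones hard-coded in the notebook's `sed` expressions.

```python
# Target gene per chromosome, as written to line 65 of config.yaml by this notebook
GENES = {"chr3": "PIK3CA", "chr5": "APC", "chr7": "BRAF", "chr12": "KRAS", "chr17": "TP53"}


def config_changes(src, dst, genes=GENES):
    """Build the sed expressions that switch config.yaml from chromosome `src` to `dst`."""
    src_num, dst_num = src[3:], dst[3:]  # "chr12" -> "12"
    return [
        f"6,+12s/{src}/{dst}/g",
        f"24,+12s/{src}/{dst}/g",
        f"39s/{src}/{dst}/g",
        f"46s/chromosome.{src_num}.fa.gz/chromosome.{dst_num}.fa.gz/g",
        f"65s/{genes[src]}/{genes[dst]}/g",
    ]
```

For example, `config_changes("chr5", "chr7")` reproduces `changes_chr5_to_7`, and each expression can be passed to `sed -i` exactly as in the cell above.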

In [None]:
## Running the workflow for chromosome 5

subprocess.run(["snakemake", "--cores", "1", "--force", "data/reference/genome.fa"]) ## rule reference_genome 

for sample in sample_list:
    ## Quality inspection
    subprocess.run(["snakemake", "--cores", "1", "--use-conda",
                    "results/fastqc_result/" + sample + "_chr5_1_fastqc.html",   
                    "data/processed/" + sample + "_chr5_1_fastp.fastq.gz",   
                    "results/fastqc_result/trimmed/" + sample + "_chr5_1_fastp_fastqc.html"])
    
for sample in sample_list:
    ## Mapping reads
    subprocess.run(["snakemake", "--cores", "1", "--use-conda",
                    "results/mapped_reads/" + sample + "_chr5_sorted.sam",   
                    "results/mapped_reads/bam_files/" + sample + "_chr5_sorted.bam",   
                    "results/mapped_reads/bam_files/" + sample + "_chr5_dedup.bam"])
    
for sample in sample_list:
    # Extracting variants  
    subprocess.run(["snakemake", "--cores", "1", "--use-conda", "results/variants/" + sample + "_chr5.vcf"])   

## VEP CLI
for sample in sample_list:
    subprocess.run(["snakemake", "--cores", "1", "--use-conda", "results/variants/vep/" + sample + "_chr5.txt"])

## Extracting the gene APC from the VEP files
subprocess.run(["snakemake", "--cores", "1", "parsing_dataR"])

## Updating the config file for chromosome 7

In [6]:
changes_chr5_to_7 = ["6,+12s/chr5/chr7/g", "24,+12s/chr5/chr7/g", "39s/chr5/chr7/g",
                     "46s/chromosome.5.fa.gz/chromosome.7.fa.gz/g", "65s/APC/BRAF/g"]
for change in changes_chr5_to_7:
    subprocess.run(["sed", "-i", change, "config.yaml"])   

In [None]:
## Running the workflow for chromosome 7

subprocess.run(["snakemake", "--cores", "1", "--force", "data/reference/genome.fa"]) ## rule reference_genome 

for sample in sample_list:
    ## Quality inspection
    subprocess.run(["snakemake", "--cores", "1", "--use-conda",
                    "results/fastqc_result/" + sample + "_chr7_1_fastqc.html",   
                    "data/processed/" + sample + "_chr7_1_fastp.fastq.gz",   
                    "results/fastqc_result/trimmed/" + sample + "_chr7_1_fastp_fastqc.html"])
    
for sample in sample_list:
    ## Mapping reads
    subprocess.run(["snakemake", "--cores", "1", "--use-conda",
                    "results/mapped_reads/" + sample + "_chr7_sorted.sam",   
                    "results/mapped_reads/bam_files/" + sample + "_chr7_sorted.bam",   
                    "results/mapped_reads/bam_files/" + sample + "_chr7_dedup.bam"])
    
for sample in sample_list:
    # Extracting variants  
    subprocess.run(["snakemake", "--cores", "1", "--use-conda", "results/variants/" + sample + "_chr7.vcf"])   

## VEP CLI
for sample in sample_list:
    subprocess.run(["snakemake", "--cores", "1", "--use-conda", "results/variants/vep/" + sample + "_chr7.txt"])

## Extracting the gene BRAF from the VEP files
subprocess.run(["snakemake", "--cores", "1", "parsing_dataR"])

## Updating the config file for chromosome 12

In [7]:
changes_chr7_to_12 = ["6,+12s/chr7/chr12/g", "24,+12s/chr7/chr12/g", "39s/chr7/chr12/g",
                      "46s/chromosome.7.fa.gz/chromosome.12.fa.gz/g", "65s/BRAF/KRAS/g"]
for change in changes_chr7_to_12:
    subprocess.run(["sed", "-i", change, "config.yaml"])     

In [None]:
## Running the workflow for chromosome 12

subprocess.run(["snakemake", "--cores", "1", "--force", "data/reference/genome.fa"]) ## rule reference_genome 

for sample in sample_list:
    ## Quality inspection
    subprocess.run(["snakemake", "--cores", "1", "--use-conda",
                    "results/fastqc_result/" + sample + "_chr12_1_fastqc.html",   
                    "data/processed/" + sample + "_chr12_1_fastp.fastq.gz",   
                    "results/fastqc_result/trimmed/" + sample + "_chr12_1_fastp_fastqc.html"])
    
for sample in sample_list:
    ## Mapping reads
    subprocess.run(["snakemake", "--cores", "1", "--use-conda",
                    "results/mapped_reads/" + sample + "_chr12_sorted.sam",   
                    "results/mapped_reads/bam_files/" + sample + "_chr12_sorted.bam",   
                    "results/mapped_reads/bam_files/" + sample + "_chr12_dedup.bam"])
    
for sample in sample_list:
    # Extracting variants  
    subprocess.run(["snakemake", "--cores", "1", "--use-conda", "results/variants/" + sample + "_chr12.vcf"])   

## VEP CLI
for sample in sample_list:
    subprocess.run(["snakemake", "--cores", "1", "--use-conda", "results/variants/vep/" + sample + "_chr12.txt"])

## Extracting the gene KRAS from the VEP files
subprocess.run(["snakemake", "--cores", "1", "parsing_dataR"])

## Updating the config file for chromosome 17

In [8]:
changes_chr12_to_17 = ["6,+12s/chr12/chr17/g", "24,+12s/chr12/chr17/g", "39s/chr12/chr17/g",
                       "46s/chromosome.12.fa.gz/chromosome.17.fa.gz/g", "65s/KRAS/TP53/g"]
for change in changes_chr12_to_17:
    subprocess.run(["sed", "-i", change, "config.yaml"])

In [None]:
## Running the workflow for chromosome 17

subprocess.run(["snakemake", "--cores", "1", "--force", "data/reference/genome.fa"]) ## rule reference_genome 

for sample in sample_list:
    ## Quality inspection
    subprocess.run(["snakemake", "--cores", "1", "--use-conda",
                    "results/fastqc_result/" + sample + "_chr17_1_fastqc.html",   
                    "data/processed/" + sample + "_chr17_1_fastp.fastq.gz",   
                    "results/fastqc_result/trimmed/" + sample + "_chr17_1_fastp_fastqc.html"])
    
for sample in sample_list:
    ## Mapping reads
    subprocess.run(["snakemake", "--cores", "1", "--use-conda",
                    "results/mapped_reads/" + sample + "_chr17_sorted.sam",   
                    "results/mapped_reads/bam_files/" + sample + "_chr17_sorted.bam",   
                    "results/mapped_reads/bam_files/" + sample + "_chr17_dedup.bam"])
    
for sample in sample_list:
    # Extracting variants  
    subprocess.run(["snakemake", "--cores", "1", "--use-conda", "results/variants/" + sample + "_chr17.vcf"])   

## VEP CLI
for sample in sample_list:
    subprocess.run(["snakemake", "--cores", "1", "--use-conda", "results/variants/vep/" + sample + "_chr17.txt"])

## Extracting the gene TP53 from the VEP files
subprocess.run(["snakemake", "--cores", "1", "parsing_dataR"])

In [9]:
## Reset config.yaml back to chr3 and PIK3CA
changes_chr17_to_3 = ["6,+12s/chr17/chr3/g", "24,+12s/chr17/chr3/g", "39s/chr17/chr3/g",
                      "46s/chromosome.17.fa.gz/chromosome.3.fa.gz/g", "65s/TP53/PIK3CA/g"]
for change in changes_chr17_to_3:
    subprocess.run(["sed", "-i", change, "config.yaml"])
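
The reset cell above only runs when executed manually. If a run fails part-way through, `config.yaml` is left pointing at an intermediate chromosome. The sketch below wraps the whole loop in `try`/`finally` so the config is always switched back to the first chromosome. `run_all_chromosomes`, `switch_config` and `run_chromosome` are hypothetical names. The caller supplies the two hooks: `switch_config` could run the same `sed -i` expressions as the cells above, and `run_chromosome` the per-chromosome Snakemake calls.

```python
def run_all_chromosomes(chromosomes, switch_config, run_chromosome):
    """Run every chromosome in turn, always restoring the config to the first one.

    switch_config(src, dst) edits the config file; run_chromosome(chrom) runs the workflow.
    """
    current = chromosomes[0]
    try:
        for chrom in chromosomes:
            if chrom != current:
                switch_config(current, chrom)
                current = chrom
            run_chromosome(chrom)
    finally:
        # Reset the config even if a Snakemake run raised an exception
        if current != chromosomes[0]:
            switch_config(current, chromosomes[0])
```

The exception from a failed run still propagates after the reset, so the error is not hidden.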

## **Last rule**

### **rule R_plotting**

Plots the data from the five gene tables.

In [None]:
subprocess.run(["snakemake", "--cores", "1", "R_plotting"])