# Running FastQC through Python
- This notebook contains code for running FastQC from Python using the `subprocess` module.

## Installing FastQC

In [1]:
# Checking the Java version
! java -version

openjdk version "11.0.14" 2022-01-18
OpenJDK Runtime Environment (build 11.0.14+9-Ubuntu-0ubuntu2.18.04)
OpenJDK 64-Bit Server VM (build 11.0.14+9-Ubuntu-0ubuntu2.18.04, mixed mode, sharing)
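
The same check can be scripted from Python, which helps when later cells should stop early if Java is missing. This is a minimal sketch: `tool_version` is a hypothetical helper name, and it looks at both output streams because `java -version` writes its banner to stderr rather than stdout.

```python
import shutil
import subprocess


def tool_version(program, flag):
    """Return the first line a tool prints for its version flag, or None.

    Hypothetical helper: `program` is looked up on PATH; if it is not
    found, or prints nothing, None is returned instead of raising.
    """
    if shutil.which(program) is None:
        return None
    result = subprocess.run([program, flag], capture_output=True, text=True)
    # `java -version` writes to stderr, many other tools write to stdout,
    # so both streams are inspected.
    output = (result.stdout or result.stderr).strip()
    return output.splitlines()[0] if output else None


print(tool_version("java", "-version"))
```

A `None` result means the program was not found on `PATH`, so installation should be fixed before continuing.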


In [2]:
## Based On
## https://raw.githubusercontent.com/s-andrews/FastQC/master/INSTALL.txt

# Downloading FastQC package
!wget https://www.bioinformatics.babraham.ac.uk/projects/fastqc/fastqc_v0.11.9.zip

# Unzipping 
!unzip /content/fastqc_v0.11.9.zip

# Making file executable
!chmod 755 /content/FastQC/fastqc

# Placing a symbolic link in /usr/local/bin so `fastqc` is on the PATH
!ln -s /content/FastQC/fastqc /usr/local/bin/fastqc

--2022-03-15 19:02:10--  https://www.bioinformatics.babraham.ac.uk/projects/fastqc/fastqc_v0.11.9.zip
Resolving www.bioinformatics.babraham.ac.uk (www.bioinformatics.babraham.ac.uk)... 149.155.133.4
Connecting to www.bioinformatics.babraham.ac.uk (www.bioinformatics.babraham.ac.uk)|149.155.133.4|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 10249221 (9.8M) [application/zip]
Saving to: ‘fastqc_v0.11.9.zip’


2022-03-15 19:02:16 (1.81 MB/s) - ‘fastqc_v0.11.9.zip’ saved [10249221/10249221]

Archive:  /content/fastqc_v0.11.9.zip
  inflating: FastQC/cisd-jhdf5.jar   
   creating: FastQC/Configuration/
  inflating: FastQC/Configuration/adapter_list.txt  
  inflating: FastQC/Configuration/contaminant_list.txt  
  inflating: FastQC/Configuration/limits.txt  
  inflating: FastQC/fastqc           
  inflating: FastQC/fastqc_icon.ico  
   creating: FastQC/Help/
   creating: FastQC/Help/1 Introduction/
   creating: FastQC/Help/1 Introduction/.svn/
  inflating: FastQC/He
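
The unzip and `chmod` steps above can also be done with the standard library, for environments where shell commands are not available. The sketch below is illustrative only: `unpack_and_mark_executable` and its parameter names are hypothetical, and the download and symlink steps are left out.

```python
import os
import zipfile


def unpack_and_mark_executable(zip_path, dest_dir, relative_exec):
    """Extract `zip_path` into `dest_dir` and make one member executable.

    Returns the path of the executable. The parameter names are
    illustrative and not part of FastQC.
    """
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
    exec_path = os.path.join(dest_dir, relative_exec)
    # zipfile.extractall does not restore Unix permission bits, so the
    # equivalent of `chmod 755` is applied explicitly.
    os.chmod(exec_path, 0o755)
    return exec_path


# Possible call, mirroring the paths used in the cell above:
# unpack_and_mark_executable("/content/fastqc_v0.11.9.zip", "/content", "FastQC/fastqc")
```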

In [3]:
!fastqc --help


            FastQC - A high throughput sequence QC analysis tool

SYNOPSIS

	fastqc seqfile1 seqfile2 .. seqfileN

    fastqc [-o output dir] [--(no)extract] [-f fastq|bam|sam] 
           [-c contaminant file] seqfile1 .. seqfileN

DESCRIPTION

    FastQC reads a set of sequence files and produces from each one a quality
    control report consisting of a number of different modules, each one of 
    which will help to identify a different potential type of problem in your
    data.
    
    If no files to process are specified on the command line then the program
    will start as an interactive graphical application.  If files are provided
    on the command line then the program will run with no user interaction
    required.  In this mode it is suitable for inclusion into a standardised
    analysis pipeline.
    
    The options for the program as as follows:
    
    -h --help       Print this help file and exit
    
    -v --version    Print the version of the program and exit
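
The options in the synopsis map naturally onto an argument list for `subprocess`. The following is a sketch under stated assumptions: `build_fastqc_args` and its keyword names are hypothetical, and only the options visible in the synopsis above (`-o`, `--extract`/`--noextract`, `-f`, `-c`) are covered. It refuses an empty file list because, per the description, FastQC would otherwise start as an interactive graphical application.

```python
def build_fastqc_args(seqfiles, output_dir=None, extract=None,
                      file_format=None, contaminant_file=None):
    """Assemble a FastQC argument list from the options in the synopsis.

    `extract` is tri-state: True adds --extract, False adds --noextract,
    None leaves FastQC's default untouched.
    """
    if not seqfiles:
        raise ValueError("at least one sequence file is required")
    args = ["fastqc"]
    if output_dir is not None:
        args += ["-o", output_dir]
    if extract is not None:
        args.append("--extract" if extract else "--noextract")
    if file_format is not None:
        if file_format not in ("fastq", "bam", "sam"):
            raise ValueError("format must be fastq, bam or sam")
        args += ["-f", file_format]
    if contaminant_file is not None:
        args += ["-c", contaminant_file]
    return args + list(seqfiles)


print(build_fastqc_args(["/content/VIC385_R1.fq"], output_dir="out", extract=True))
# → ['fastqc', '-o', 'out', '--extract', '/content/VIC385_R1.fq']
```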

## Python for FastQC

### Sequence File

In [4]:
! cp /content/drive/MyDrive/04-Work/Genomiki-Task/Sequences/VIC385_R1.fq /content/

In [5]:
! du -sh /content/VIC385_R1.fq

155M	/content/VIC385_R1.fq


In [6]:
! head -20 /content/VIC385_R1.fq 

@NB551233:185:H2HLKAFX2:1:11101:1131:9762/1
CCTTCAGTTGAACAGAGAAAACAAGATGATAAGAAAATCAAAGCTTGTGTTGAAGAAGTTACAACAACTCTGGAAGAAACTAAGTTCCTCACAGAAAACTTGTTACTTTATATTGACATTAATGGCAATCTTCATCCAGATTCTGCCACT
+
AAAAAEEEEEEEEEEAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE<EEEEEEEEEEEEEEEEEEEEEEEEEEEEEAAEEEEEEEE<<EEE
@NB551233:185:H2HLKAFX2:1:11101:1134:14269/1
ACATAGTGCTTAGCACGTAATCTGGCATTGACAACACTCAAATCATAATTTGTGGCCATTGAAATTTCATAAAAGACAACTATATCTGCTGTCGTCTCAGGCAATGCATTTACAGTACAAAAGACATACTGTTCTAATGTTGAATTCACTT
+
/AAAAEEEEE6EEAEEEEEEEEAEEAEEEA/EA/A<EEE6EE6AEEAE/EE6E66EEEAEEE6EEEEEEE6EEEAEEEEE/E<EEEEEE///E//A//EAA<AE/EEE6/EEA<//EEEAEEEAEA/EE</E/6EE///AE/E///6A<<E
@NB551233:185:H2HLKAFX2:1:11101:1139:14506/1
AGGGTGTTCACTTTGTTTGCAACTTGCTGTTGTTGTTTGTAACAGTTTACTCACACCTTTTGCTCGTTGCTGCTGGCCTTGAAGCCCCTTTTCTCTATCTTTATGCTTTAGTCTACTTCTTGCAGAGTATAAACTTTGTAAGAATAATAAT
+
AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE<EEEEEEEEEEEEEEE
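
Each record in the listing above spans four lines: an `@` header, the sequence, a `+` separator, and a quality string of the same length as the sequence. A minimal reader for that layout might look like the sketch below. It assumes Phred+33 quality encoding (ASCII code minus 33), which is not stated in this notebook, and it stops at the first empty line. `read_fastq` is a hypothetical name.

```python
import io


def read_fastq(handle):
    """Yield (read_id, sequence, scores) from a 4-line-per-record FASTQ."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file (or an empty line)
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record: " + header)
        # Assumption: Phred+33 encoding (ASCII code minus 33).
        scores = [ord(ch) - 33 for ch in quality]
        yield header[1:], sequence, scores


example = io.StringIO("@read1/1\nACGT\n+\nAE6/\n")
for read_id, seq, scores in read_fastq(example):
    print(read_id, seq, scores)
# → read1/1 ACGT [32, 36, 21, 14]
```

For a real file, the same generator can be fed an open file handle, for example `open("/content/VIC385_R1.fq")`.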

In [7]:
# ! fastqc /content/VIC385_R1.fq -o /content/fastqc_op --extract

### Main Code

In [8]:
import os
import subprocess

In [9]:
# Sequence File Path
fastq_path = "/content/VIC385_R1.fq"

# Sequence file name
file_name = os.path.basename(fastq_path).split('.')[0]

# Output directory 
output_dir_name = "{}/{}_fastqc_output".format(os.getcwd(), file_name)

# Creating Output Directory (no error if it already exists)
os.makedirs(output_dir_name, exist_ok=True)

# Creating Command as an argument list, so no shell quoting is needed
options = ['--extract']
fastqc_command = ["fastqc", fastq_path, "-o", output_dir_name, *options]

# Running FastQC (raises CalledProcessError on a non-zero exit code)
subprocess.check_call(fastqc_command)

0
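
A return value of `0` only says that FastQC exited without error. With `--extract`, the per-module results can be read back from the extracted report. The sketch below assumes the extracted folder contains a `summary.txt` whose lines are tab-separated as status, module name, file name, and that the folder is named after the input file with a `_fastqc` suffix. Both are assumptions about FastQC's output layout that should be checked against the actual output directory. `read_fastqc_summary` is a hypothetical helper.

```python
import os


def read_fastqc_summary(summary_path):
    """Return {module name: status} from a FastQC summary.txt file.

    Assumes each line is tab-separated as: status, module, file name.
    """
    statuses = {}
    with open(summary_path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 2:
                statuses[fields[1]] = fields[0]
    return statuses


# Hypothetical path, following the variables used in the cell above:
# summary = read_fastqc_summary(
#     os.path.join(output_dir_name, file_name + "_fastqc", "summary.txt"))
# failed = [module for module, status in summary.items() if status == "FAIL"]
```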

## References

1. https://www.youtube.com/watch?v=2Fp1N6dof0Y
2. https://www.geeksforgeeks.org/python-subprocess-module-to-execute-programs-written-in-different-languages/
3. https://geekflare.com/learn-python-subprocess/