<a href="https://colab.research.google.com/github/Aksinhaa/ColabFold/blob/main/NGS_collab_snp_identification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Variant calling is the process of identifying genetic variants, such as single nucleotide polymorphisms (SNPs) and small insertions or deletions (indels), from next-generation sequencing (NGS) data. It compares an individual's sequencing reads with a reference genome and reports the positions where they differ. In this tutorial, we focus on detecting SNPs and indels.
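To make the idea of comparing reads with a reference concrete, here is a toy sketch in Python. It is not how Strelka or bcftools work internally: it assumes gapless reads that are already aligned, ignores base qualities, and uses made-up thresholds (`min_depth`, `min_alt_fraction`) and made-up sequences purely for illustration.

```python
from collections import Counter, defaultdict


def call_snvs(reference, aligned_reads, min_depth=3, min_alt_fraction=0.3):
    """Toy single-nucleotide caller.

    aligned_reads is a list of (start, sequence) pairs, where start is the
    0-based reference position of the first base and reads contain no gaps.
    """
    # Build a pileup: for every reference position, count the observed bases.
    pileup = defaultdict(Counter)
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if pos < len(reference):
                pileup[pos][base] += 1

    calls = []
    for pos in sorted(pileup):
        counts = pileup[pos]
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to say anything
        ref_base = reference[pos]
        alt_counts = {b: n for b, n in counts.items() if b != ref_base}
        if not alt_counts:
            continue  # every read agrees with the reference
        alt_base, alt_n = max(alt_counts.items(), key=lambda kv: kv[1])
        if alt_n / depth >= min_alt_fraction:
            # Report 1-based positions, as VCF files do.
            calls.append((pos + 1, ref_base, alt_base, alt_n, depth))
    return calls


# Invented example data: three of the four reads carry a T at position 5.
reference = "ACGTACGTAC"
reads = [(0, "ACGTAC"), (2, "GTTCGT"), (3, "TTCGTA"), (4, "TCGTAC")]
print(call_snvs(reference, reads))  # [(5, 'A', 'T', 3, 4)]
```

Real callers replace the simple fraction threshold with statistical models that account for sequencing error, mapping quality, and ploidy.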

In this step:

a) Call variants (SNPs and indels) using tools such as bcftools or Strelka

b) Produce a VCF file (Variant Call Format) with detailed information about each variant

c) Optionally apply variant filtering to remove false positives or low-confidence variants

Why it's important: identifying the genetic differences across samples is the core step of population genetics.
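As a rough illustration of steps (b) and (c), the sketch below reads the fixed columns of VCF records and keeps only those whose FILTER column is `PASS`. The helper names (`parse_vcf_records`, `keep_passing`, `variant_type`) and the three example records are invented for this tutorial; for real data, a dedicated tool such as bcftools is the safer choice.

```python
def parse_vcf_records(vcf_text):
    """Parse the first seven fixed VCF columns of every data line."""
    records = []
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines, meta lines (##) and the header (#CHROM)
        chrom, pos, _id, ref, alt, qual, flt = line.split("\t")[:7]
        records.append({
            "chrom": chrom,
            "pos": int(pos),
            "ref": ref,
            "alt": alt.split(","),          # ALT may list several alleles
            "qual": None if qual == "." else float(qual),
            "filter": flt,
        })
    return records


def keep_passing(records):
    """Keep records that passed every filter applied by the caller."""
    return [r for r in records if r["filter"] == "PASS"]


def variant_type(record):
    """Classify a record as SNP (single-base REF and ALT) or indel."""
    if len(record["ref"]) == 1 and all(len(a) == 1 for a in record["alt"]):
        return "SNP"
    return "indel"


# Invented example records, not real calls.
vcf_text = "\n".join([
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t1000\t.\tA\tG\t50\tPASS\t.",
    "chr1\t2000\t.\tCT\tC\t12\tLowGQX\t.",
    "chr1\t3000\t.\tG\tGA\t40\tPASS\t.",
])

passing = keep_passing(parse_vcf_records(vcf_text))
for record in passing:
    print(record["chrom"], record["pos"], variant_type(record))
```

Here the record at position 2000 is dropped because its FILTER column names a failed filter instead of `PASS`.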


For this tutorial we will use Strelka (Strelka2), a variant caller that supports both germline and somatic variant calling.

1: Germline calling: uses a haplotype-based model to detect inherited variants accurately.

(A haplotype is a set of DNA variants inherited together on the same chromosome copy.)

2: Somatic calling: identifies mutations that arise in somatic (non-germline) cells. These mutations are not inherited from the parents and are not passed on to offspring.

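Workflow execution: Strelka2 is run in two steps. Configuration specifies the input data and analysis options and generates a `runWorkflow.py` script in the run directory. Execution runs that script and specifies how the workflow is executed, for example locally and with how many parallel jobs.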
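The two steps can be sketched as a pair of command lines. The Python helper below only builds and prints the commands; it does not run them. The helper name and the file names `sample.bam`, `reference.fa`, and `strelka_germline_run` are placeholders, not files provided by this tutorial. Strelka expects the BAM file and the reference FASTA to be indexed.

```python
import shlex


def build_strelka_germline_commands(bam, reference_fasta, run_dir, jobs=2):
    """Return (configure_cmd, execute_cmd) as argument lists (hypothetical helper)."""
    # Step 1: configuration. Declares the inputs and writes runWorkflow.py into run_dir.
    configure_cmd = [
        "configureStrelkaGermlineWorkflow.py",
        "--bam", bam,
        "--referenceFasta", reference_fasta,
        "--runDir", run_dir,
    ]
    # Step 2: execution. Runs the generated script locally with `jobs` parallel jobs.
    execute_cmd = [f"{run_dir}/runWorkflow.py", "-m", "local", "-j", str(jobs)]
    return configure_cmd, execute_cmd


configure_cmd, execute_cmd = build_strelka_germline_commands(
    "sample.bam", "reference.fa", "strelka_germline_run", jobs=2
)
print(shlex.join(configure_cmd))
print(shlex.join(execute_cmd))
```

In a notebook, the printed lines could be pasted into a `!bash -c "..."` cell after activating the environment that contains Strelka.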


The first step is to download the Miniconda installer for Linux using `wget`, install it, and create a conda environment that contains Strelka.


In [None]:
# Miniconda installation and environment setup for Colab NGS Workshop

# Download Miniconda and install it in batch mode (-b) under /usr/local/miniconda
!wget -q https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
!bash miniconda.sh -b -p /usr/local/miniconda

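import sys, os
# Assumes the installed Miniconda ships Python 3.8; adjust the version in this path if it differs
sys.path.append('/usr/local/miniconda/lib/python3.8/site-packages')
# Put the conda executables first on PATH so that the `!conda` commands below find them
os.environ['PATH'] = "/usr/local/miniconda/bin:" + os.environ['PATH']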

# Explicitly clear potentially problematic environment variables from Python's os.environ
for var in ('CONDA_PREFIX', 'CONDA_ENVS_PATH'):
    os.environ.pop(var, None)

# Accept ToS for main and R conda channels
!conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/main
!conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/r

# Create a dedicated environment containing Strelka.
# conda-forge is listed before bioconda so that it takes priority for dependencies;
# no specific build string is pinned.
# -y answers the confirmation prompt, which would otherwise wait for input.
!conda create -y -n strelka -c conda-forge -c bioconda strelka


In [None]:
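# Activate the environment and call the germline configuration script to confirm that Strelka is installed and on the PATH
!bash -c "source /usr/local/miniconda/bin/activate strelka && configureStrelkaGermlineWorkflow.py"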