Skip to content

TonyKess/genotyping_hpc

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

50 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Genotyping in an HPC environment

Preamble

What is this?

This repo is to provide example code for a training module on running a genotyping and genotype likelihoods pipeline in an HPC environment. It also works other places. Go here for varying versions of the code used in Atlantic Salmon and Lumpfish. Go here for info on how these tools will run in a cloud virtual machine. The code used here is to show one way of running a genotyping pipeline with a specific set of tools. There are many different ways to approach genotyping, but the general steps involve some version of: cleaning up raw reads, aligning them to a genome, updating and improving those alignments a bit, and calling variant sites of different kinds.. Often these approaches are cited as the "GATK best practices", but they're definitely usually not, unless it's a human genomics project. This paper remains my go-to manuscript for getting a better handle on the general ideas underlying most of these steps.

What isn't this!?

The point of this repo and training is not to turn you into an expert bioinformatcian, or software engineer. I definitely don't expect that of myself, and this code won't get you there. The goal here is build some general familiarty with genomic data processing, genotyping tools, and submitting jobs to a cluster. There is likely a tradeoff in time investment with developing elegant code and generating biologically and scientifically meaningful results:

Bioinformatics_laffercurve

But building these skills can be rewarding and fun. So, my only advice here is try to pick up new things while developing code that looks good enough to you to get the information from the natural world that you care about.

Misc

Also some misc. links that cover tools used that I have found helpful:

The most useful (and basically only) training resource I've used.

GATK parallelism and multithreading

GATK best practices in practice

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages