This repository contains all of the resources I have found helpful in studying genomics from the ground up.
- A Brief Guide to Genomics by the National Human Genome Research Institute
- Overview of genomic file formats. Introduction to common file types for genomics.
- FASTQ file specification. Illumina's specification for its raw read format. FASTQ is generally treated as the industry standard for unaligned sequence data.
- SAM/BAM file specification. This is the industry standard file type for aligned sequence data. Know this format like the back of your hand.
- Samtools: manipulate and perform common tasks for SAM/BAM/CRAM files.
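- BWA aligner: Industry standard aligner for whole-genome (WGS) and whole-exome (WXS) sequencing data
- STAR aligner: Industry standard transcriptome (RNA-seq) aligner
- Picard: Swiss-army knife of genomics, a collection of command-line tools for manipulating sequencing data in formats such as SAM/BAM/CRAM and VCF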
- GATK: Genome Analysis Toolkit, industry standard variant caller.
- FastQC: Industry standard raw data quality check software.
- HTSeq: Useful for counting reads per gene to quantify gene expression
- 1000 Genomes: The "1000 Genomes Project" is a well-known project that houses raw and variant data from people across the world. Despite the name, the final phase 3 release covers 2,504 individuals.
- dbGaP: NIH (U.S. based) genomics repository
- EGA: EBI (European based) genotype-phenotype repository
- dbSNP: The de facto single nucleotide polymorphism database. Mostly research oriented.
- OMIM: The "Online Mendelian Inheritance in Man" is a catalog that maps genotypes to phenotypes. A great resource of handcrafted articles and curated literature.
- ClinVar: Public archive of the clinical significance of human genetic variants.
- PolyPhen-2: The "Polymorphism Phenotyping v2" tool attempts to predict the functional/structural effects of a variant on a human protein.
- SIFT: The SIFT tool attempts to predict whether an amino acid substitution affects protein function.
- FATHMM: FATHMM stands for "Functional Analysis through Hidden Markov Models". This tool attempts to predict the functional effect of a variant on the resulting protein.
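
To make the FASTQ format listed above a little more concrete, here is a minimal Python sketch of a reader for the common four-line record layout (header starting with `@`, sequence, separator starting with `+`, quality string). It assumes Phred+33 quality encoding, which is what current Illumina output uses, and it does not handle multi-line records. The names `FastqRecord` and `parse_fastq` are made up for illustration; for real work, an established library is a better choice.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    """One read from a FASTQ file (hypothetical helper type)."""
    name: str
    sequence: str
    qualities: List[int]


def parse_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a four-line-per-record FASTQ stream.

    Assumption: qualities are Phred+33 encoded (ASCII code minus 33).
    """
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        header = header.rstrip("\r\n")
        if not header:
            continue  # tolerate blank lines between records
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")

        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"Malformed FASTQ record near: {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"Sequence/quality length mismatch in: {header!r}")

        # Convert each quality character to its Phred score.
        scores = [ord(char) - 33 for char in quality]
        yield FastqRecord(header[1:], sequence, scores)


if __name__ == "__main__":
    example = io.StringIO("@read1\nACGT\n+\nII5!\n")
    for record in parse_fastq(example):
        print(record.name, record.sequence, record.qualities)
```

Working through a toy record like this by hand is a quick way to check your understanding of the specification before moving on to SAM/BAM, where the same quality string appears as one column of each alignment line.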