AgResearch/DECONVQC

This repository contains scripts used to process and manage the sequence data and metadata output by Illumina HiSeq and MiSeq machines. Because most of this sequencing output is GBS-related (genotyping by sequencing), the scripts also run a number of GBS-specific Q/C steps. The project focuses on immediate upstream Q/C and sequence delivery rather than custom downstream analyses.

Refer to DECONVQC.docx for more details.

This resource contributes to, and is partly supported by, the MBIE programme "Genomics for Production & Security in a Biological Economy".

Below is a summary of the contents of the repository:

#Database

##Schema

  • setup.psql
  • setup_quick_reports.psql

#Keyfile and run data management

  • deleteKeyfile.psql
  • get_keyfilename.psql
  • updateFastQLocationInKeyFile.psql
  • checkKeyFiles.psql
  • extractKeyfile5.psql
  • extract_sample_species.psql
  • extractKeyfile.psql
  • get_fastq_link.psql
  • addRun.sh
  • get_enzyme_count_from_database.sh
  • get_lane_from_database.sh
  • is_keyfile_in_database.sh
  • deleteKeyfile.sh
  • get_enzyme_from_database.sh
  • importKeyfile.sh
  • is_run_in_database.sh
  • updateFastqLocations.sh
  • extractKeyfile.sh
  • get_flowcellid_from_database.sh
  • importOrUpdateKeyfile.sh
  • listDBKeyfile.sh
  • sanitiseKeyFile.py

#Generic Sequence Q/C (i.e. not GBS-specific)

##Fastqc

##Alignment of sample against references

  • run_mapping_preview.sh

##Contamination check - alignment of random sample of reads against nt

  • run_sample_contamination_checks.sh
  • summarise_global_hiseq_taxonomy.sh
  • summarise_global_hiseq_taxonomy.py
  • taxonomy_clustering.r
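
The idea of drawing a random sample of reads before aligning against nt can be sketched as below. This is a minimal illustration only and is not taken from the scripts listed above: the function names (`read_fastq`, `sample_reads`, `to_fasta`) are hypothetical, and reservoir sampling is an assumed technique chosen because it samples uniformly in a single pass without loading the whole file into memory.

```python
import random
from typing import Iterable, Iterator, List, Optional, Tuple

# (header, sequence, separator, quality) -- one 4-line FASTQ record
FastqRecord = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield 4-line FASTQ records from an iterable of text lines."""
    it = iter(lines)
    while True:
        header = next(it, None)
        if header is None:
            return
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        seq = next(it, "").rstrip("\n")
        sep = next(it, "").rstrip("\n")
        qual = next(it, "").rstrip("\n")
        yield (header, seq, sep, qual)


def sample_reads(records: Iterable[FastqRecord], n: int,
                 seed: Optional[int] = None) -> List[FastqRecord]:
    """Return up to n records chosen uniformly at random (reservoir sampling)."""
    rng = random.Random(seed)
    sample: List[FastqRecord] = []
    for i, rec in enumerate(records):
        if i < n:
            sample.append(rec)       # fill the reservoir first
        else:
            j = rng.randint(0, i)    # inclusive on both ends
            if j < n:
                sample[j] = rec      # replace with probability n / (i + 1)
    return sample


def to_fasta(records: Iterable[FastqRecord]) -> str:
    """Format records as FASTA, the usual input for an alignment against nt."""
    return "".join(f">{header[1:]}\n{seq}\n" for header, seq, _, _ in records)
```

A typical use would be to open a FASTQ file, pass the file object to `read_fastq`, sample a few thousand reads with `sample_reads`, and write the `to_fasta` output to a query file for the aligner. The sample size and the aligner are left open here because the listing above does not specify them.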

##GBS Q/C

###Tassel

  • link_key_files.sh
  • summarise_global_hiseq_reads_tags_cv.sh
  • summarise_read_and_tag_counts.py
  • summarise_hiseq_taxonomy.py
  • get_reads_tags_per_sample.py
  • tags_plots.r

###KGD

  • batch_kgd.sh
  • run_kgd.sh
  • GBS-Chip-Gmatrix.R
  • run_kgd.R

###k-mer analysis

  • kmer_entropy.py
  • kmer_plots_gbs.r
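
As a rough illustration of what a k-mer entropy statistic measures, the sketch below counts k-mers across a set of sequences and computes the Shannon entropy of the resulting frequency distribution. It assumes that "entropy" here means Shannon entropy in bits; it is not the code in kmer_entropy.py, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable


def kmer_counts(sequences: Iterable[str], k: int) -> Counter:
    """Count every overlapping k-mer across all sequences (case-insensitive)."""
    counts: Counter = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def shannon_entropy(counts: Counter) -> float:
    """Shannon entropy (in bits) of the k-mer frequency distribution."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

Under this definition, a sample dominated by a few repeated k-mers scores low, while a sample whose k-mers are spread evenly scores close to the maximum of `log2` of the number of distinct k-mers.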

#Overall workflow

  • process_hiseq1.0.sh
  • gbs_hiseq1.0.sh
  • archive_hiseq1.0.sh
  • process_hiseq1.0.mk
  • gbs_hiseq1.0.mk
  • archive_hiseq1.0.mk
  • get_processing_parameters.py
  • species_config.txt

#Presenting Results to Users

  • extract_peacock.psql
  • make_peacock_plots.sh
  • make_peacock_plots.py
  • make_run_plots.py

#Project Lifecycle

  • clean_hiseq_run.sh
  • fix_archived_run_fastq_links.sh

#Utilities

  • cat_tag_count.sh
  • tags_to_fasta.py
  • prbdf.py

#Documentation

  • DECONVQC.docx