This repository contains scripts used to process and manage sequence data and metadata output by Illumina HiSeq and MiSeq machines. Because most of that output is GBS-related (genotyping-by-sequencing), the scripts include a number of GBS-specific Q/C steps. The project focuses on immediate upstream Q/C and sequence delivery rather than custom downstream analyses.
Refer to DECONVQC.docx for more details.
This resource contributes to, and is partly supported by, the MBIE programme "Genomics for Production & Security in a Biological Economy".
Below is a summary of the contents of the repository.
#Database
##Schema
- setup.psql
- setup_quick_reports.psql
#Keyfile and run data management
- deleteKeyfile.psql
- get_keyfilename.psql
- updateFastQLocationInKeyFile.psql
- checkKeyFiles.psql
- extractKeyfile5.psql
- extract_sample_species.psql
- extractKeyfile.psql
- get_fastq_link.psql
- addRun.sh
- get_enzyme_count_from_database.sh
- get_lane_from_database.sh
- is_keyfile_in_database.sh
- deleteKeyfile.sh
- get_enzyme_from_database.sh
- importKeyfile.sh
- is_run_in_database.sh
- updateFastqLocations.sh
- extractKeyfile.sh
- get_flowcellid_from_database.sh
- importOrUpdateKeyfile.sh
- listDBKeyfile.sh
- sanitiseKeyFile.py
#Generic Sequence Q/C (i.e. not GBS-specific)
##Fastqc
##Alignment of sample against references
- run_mapping_preview.sh
##Contamination check - alignment of random sample of reads against nt
- run_sample_contamination_checks.sh
- summarise_global_hiseq_taxonomy.sh
- summarise_global_hiseq_taxonomy.py
- taxonomy_clustering.r
##GBS Q/C
###Tassel
- link_key_files.sh
- summarise_global_hiseq_reads_tags_cv.sh
- summarise_read_and_tag_counts.py
- summarise_hiseq_taxonomy.py
- get_reads_tags_per_sample.py
- tags_plots.r
###KGD
- batch_kgd.sh
- run_kgd.sh
- GBS-Chip-Gmatrix.R
- run_kgd.R
###k-mer analysis
- kmer_entropy.py
- kmer_plots_gbs.r
#Overall workflow
- process_hiseq1.0.sh
- gbs_hiseq1.0.sh
- archive_hiseq1.0.sh
- process_hiseq1.0.mk
- gbs_hiseq1.0.mk
- archive_hiseq1.0.mk
- get_processing_parameters.py
- species_config.txt
#Presenting Results to Users
- extract_peacock.psql
- make_peacock_plots.sh
- make_peacock_plots.py
- make_run_plots.py
#Project Lifecycle
- clean_hiseq_run.sh
- fix_archived_run_fastq_links.sh
#Utilities
- cat_tag_count.sh
- tags_to_fasta.py
- prbdf.py
#Documentation
- DECONVQC.docx