
This is the pipeline we use to test MAPGD.

Reference simulation: By and large, genomes do not have a random structure. Instead, transpositions, duplications, selective constraint, etc. create substantial sequence similarity between different regions of the genome. This may create systematic errors when aligning reads to a reference genome or population, and we would like to capture these potential errors in our simulation. Ideally the de novo assembly of the genome would also be simulated, so that artifacts created by the assembly process are correctly modeled, but we do not currently model this process.

NEEDS: NOTHING MAKES: REFERENCE FILE STATUS: DONE
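As an illustration of the idea above, here is a minimal sketch of a reference with non-random structure: a random sequence into which slightly diverged copies of its own segments are pasted. The function name `simulate_reference` and all of its parameters are hypothetical and are not taken from the scripts in `reference_simulation`.

```python
import random


def simulate_reference(length, n_duplications, dup_length, divergence, seed=0):
    """Return a DNA string containing diverged copies of some of its own segments.

    Hypothetical sketch: the real pipeline may build its reference differently.
    """
    rng = random.Random(seed)
    genome = [rng.choice("ACGT") for _ in range(length)]
    for _ in range(n_duplications):
        # Pick a source segment and a destination of the same size.
        src = rng.randrange(0, length - dup_length + 1)
        dst = rng.randrange(0, length - dup_length + 1)
        copy = genome[src:src + dup_length]
        # Mutate each copied base with probability `divergence`.
        for i, base in enumerate(copy):
            if rng.random() < divergence:
                copy[i] = rng.choice([b for b in "ACGT" if b != base])
        # Overwrite the destination so the total length is unchanged.
        genome[dst:dst + dup_length] = copy
    return "".join(genome)


reference = simulate_reference(length=1000, n_duplications=5,
                               dup_length=50, divergence=0.05, seed=1)
```

Reads drawn from one copy of a duplicated segment can then align almost equally well to the other copy, which is the kind of systematic error the simulation is meant to capture.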
Population simulation: Individuals within a population are related to each other by a pedigree structure. For instance, in finite populations we cannot model the genotypes of two individuals as being independent samples. Nor can we assume that the coalescent trees between freely recombining sites are independent of each other, because the underlying pedigree which relates individuals influences the coalescent tree. We explicitly model this pedigree, but currently can only model freely recombining loci.

NEEDS: NOTHING MAKES: VARIANT PRESENCE/ABSENCE, PHENOTYPIC SCORES TODO: ADD LINKAGE
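A minimal sketch of what modeling the pedigree explicitly can look like: one pedigree is drawn once, and every freely recombining locus is then transmitted independently through that same pedigree, so the loci share ancestry even though they are unlinked. All names here are hypothetical, random mating with possible selfing is a simplifying assumption, and phenotypic scores are not modeled.

```python
import random


def simulate_pedigree(n_individuals, n_generations, rng):
    """Draw two parent indices (from the previous generation) for every individual."""
    return [[(rng.randrange(n_individuals), rng.randrange(n_individuals))
             for _ in range(n_individuals)]
            for _ in range(n_generations)]


def drop_genes(pedigree, founder_genotypes, rng):
    """Transmit one locus through the pedigree; a genotype is a pair of 0/1 alleles."""
    current = founder_genotypes
    for generation in pedigree:
        # Each parent passes on one of its two alleles at random.
        current = [(rng.choice(current[first]), rng.choice(current[second]))
                   for first, second in generation]
    return current


def simulate_loci(n_individuals, n_generations, n_loci, freq, seed=0):
    """Simulate unlinked loci that all share a single pedigree."""
    rng = random.Random(seed)
    pedigree = simulate_pedigree(n_individuals, n_generations, rng)
    loci = []
    for _ in range(n_loci):
        founders = [(int(rng.random() < freq), int(rng.random() < freq))
                    for _ in range(n_individuals)]
        loci.append(drop_genes(pedigree, founders, rng))
    return loci


genotypes = simulate_loci(n_individuals=20, n_generations=10, n_loci=100, freq=0.3)
```

Because the pedigree is fixed before any locus is simulated, individuals that share recent ancestors are correlated across all loci at once, which sampling genotypes independently per locus would miss.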
Variant simulation: Variants themselves are not random, but occur at different rates depending on the type of variant and the local sequence context.

NEEDS: REFERENCE, VARIANT PRESENCE/ABSENCE MAKES: INDIVIDUAL FASTAS TODO: ADD CONTEXT MODIFIERS
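The step above can be sketched as follows, assuming only single-base substitutions and a single type-dependent rate (transitions versus transversions). The names `choose_alternate`, `apply_variants` and `ts_fraction` are hypothetical, and context modifiers are left out, as in the TODO.

```python
import random

# Transition partner of each base (purine <-> purine, pyrimidine <-> pyrimidine).
TRANSITIONS = {"A": "G", "G": "A", "C": "T", "T": "C"}


def choose_alternate(ref_base, ts_fraction, rng):
    """Pick an alternate base; a fraction `ts_fraction` of variants are transitions."""
    if rng.random() < ts_fraction:
        return TRANSITIONS[ref_base]
    transversions = [b for b in "ACGT" if b not in (ref_base, TRANSITIONS[ref_base])]
    return rng.choice(transversions)


def apply_variants(reference, variants, carried):
    """Build one haplotype sequence from the reference.

    variants: list of (position, alternate_base) pairs
    carried:  one presence/absence flag per variant for this haplotype
    """
    sequence = list(reference)
    for (position, alt), present in zip(variants, carried):
        if present:
            sequence[position] = alt
    return "".join(sequence)


rng = random.Random(0)
reference = "ACGTACGT"
variants = [(pos, choose_alternate(reference[pos], 0.7, rng)) for pos in (0, 3)]
haplotype = apply_variants(reference, variants, carried=[True, False])
```

Writing each individual's haplotypes out as FASTA records would then give the INDIVIDUAL FASTAS that the sequencing step consumes.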
Sequencing simulation: Sequencers do not just make random errors when sequencing DNA. Luckily many programs have been developed to simulate sequencing, and we can just make use of the programs that already exist.

NEEDS: INDIVIDUAL FASTAS, REFERENCE MAKES: BAMFILES STATUS: DONE
Analysis pipelines:
Labeled frequency calling: NEEDS: BAMFILES, REFERENCE STATUS: MAPGD, GATK, BCFTOOLS, ANGSD
Relatedness: NEEDS: BAMFILES, REFERENCE, PHENOTYPIC SCORES STATUS: MAPGD, GCTA, PLINK
Pooled frequency calling: NEEDS: BAMFILE, REFERENCE STATUS: MAPGD, BRESEQ
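The NEEDS and MAKES entries above imply an order in which the stages can run. The sketch below encodes them as a dictionary and derives one valid order with the standard library's `graphlib` (Python 3.9 or later). The `STAGES` dictionary and `stage_order` function are hypothetical illustrations, not part of the repository's scripts.

```python
from graphlib import TopologicalSorter

# Inputs and outputs of each stage, transcribed from the NEEDS/MAKES lines.
STAGES = {
    "reference_simulation": {"needs": set(), "makes": {"reference"}},
    "population_simulation": {"needs": set(),
                              "makes": {"variant_presence", "phenotypic_scores"}},
    "variant_simulation": {"needs": {"reference", "variant_presence"},
                           "makes": {"individual_fastas"}},
    "sequencing_simulation": {"needs": {"individual_fastas", "reference"},
                              "makes": {"bamfiles"}},
    "labeled_frequency_calling": {"needs": {"bamfiles", "reference"}, "makes": set()},
    "relatedness": {"needs": {"bamfiles", "reference", "phenotypic_scores"},
                    "makes": set()},
    "pooled_frequency_calling": {"needs": {"bamfiles", "reference"}, "makes": set()},
}


def stage_order(stages):
    """Return the stage names in an order that satisfies every NEEDS entry."""
    # Map each product to the stage that makes it.
    producers = {product: name
                 for name, stage in stages.items()
                 for product in stage["makes"]}
    # A stage depends on the producers of everything it needs.
    graph = {name: {producers[item] for item in stage["needs"]}
             for name, stage in stages.items()}
    return list(TopologicalSorter(graph).static_order())


order = stage_order(STAGES)
```

Any order returned this way runs the two input-free simulations first and the analysis pipelines last.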

alignment
analysis_pipelines
common
data
fast_correl
figures
gsl_correl
marcus
papers
population_simulation
real_genomes
reference_simulation
variant_simulation
.gitignore
Allele_Frequency_Spectrum.py
BayesB_Trait_set1.fitting
Checks.ods
LD_Frequency_Spectrum.py
README.md
Rscript.rscript
Rscript2.rscript
TODO
analysis_files
end.sh
graph.vg
header.txt
make_fam.py
make_ms.sh
mapped.gam
ms_to_state.py
run_admixture.sh
run_admixture2.sh
run_cov_test.sh
run_cov_test2.sh
run_cov_test3.sh
run_cov_test3_fast.sh
run_gwas_simulation.sh
run_iota_scan.py
run_labeled_simulation.sh
run_ld_simulation.sh
run_ld_simulation2.sh
run_pooled_simulation.sh
run_simulated_pedigree.sh
run_vg_test.sh
sequences
sequencing_simulation
settings.sh
simulate_on_pedigree.py

LynchLab/genomics_simulation
