-
Notifications
You must be signed in to change notification settings - Fork 0
albodrug/haplopath
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
# ASSOCIATED SCIENTIFIC PUBLICATION Bodrug-Schepers A, Stralis-Pavese N, Buerstmayr H, Dohm JC, Himmelbauer H. Quinoa genome assembly employing genomic variation for guided scaffolding. Theor Appl Genet. 2021 Nov; DOI: 10.1007/s00122-021-03915-x # INSTALLATION REQUIREMENTS # Haplopath runs on Linux machines (tested on CentOS release 6.10 with 32 cores and 128 GB memory). It makes use of Perl libraries usually available with Perl 5.10 distributions. In addition it appeals to the system to use bcftools (view and query), vcf-annotate, vcftools, and sed. # RUNNING HAPLOPATH # To see the help of haplopath_connect.pl: >perl haplopath_connect.pl --help To run haplopath_connect.pl with the test data: First unpack the .gz test_data/sample.test.vcf.gz (gzip -dc). >perl haplopath_connect.pl --buildblock --cleanblock --buildlink --threads 30 --file test_data/sample.test.vcf You can only run the script within this folder because the functions in Pattsfunc.pm are indicated to be found in the path 'code_varpat/' (line 5). If you wish to execute it from somewhere else, change this path. To see the results expected from this run, look into the folder test_results/. You can browse the found relations (called links within the script and the output) by looking into the file links*hps. 'hps' stands for human perl script (output of dump from the Dumper package). 'bps' stands for binary perl script and will be used as input in the following step. You can see the intermediate steps before the relations files (links*) which are the raw fingerprints (block*) and the curated fingerprints, as well as the occurence of variation patterns throughout the vcf. The name of the output contains the timestamp from when it was generated. The stderr will detail the major steps of the script, you can save it by redirecting it with '&> out.stderr' # FILTERING RELATIONS/LINKS # Once the relations found within the vcf file, to filter sufficiently supported ones, use haplopath_validator.pl: >perl haplopath_validator.pl --help >perl haplopath_validator.pl --file test_output/links_unset_unset_6_10_2020_12_33_18.bps --jaccard 0.1 The output should be (contig1, contig2, average Jaccard index, number of relations, most abundant orientation): qpac010tig00005444len=60711 qpac010tig00005445len=129041 0.947916666666667 2 hh qpac010tig00005445len=129041 qpac010tig00005764len=450333 0.621668261990843 3 hh qpac010tig00000175len=540389 qpac010tig00005445len=129041 0.549448830713748 4 na qpac010tig00005444len=60711 qpac010tig00005764len=450333 0.702702702702703 1 na qpac010tig00000175len=540389 qpac010tig00005764len=450333 0.604737448370188 4 na # VISUALIZING RELATIONS The output of haplopath_validator.pl can be used as input for haplopath_viz.pl in order to visualize the relations obtained. haplopath_viz.pl is used as follows: >perl haplopath_viz.pl --help >perl haplopath_viz.pl --file test_output/validator.out --jaccard 0.1 --exclude exclude.list --start qpac010tig00005444len=60711 The exclude.list should contain any contig ID that you wish to exclude (for example contigs with known misassemblies). The starting contig for building the graph is passed with --start. All contigs directly or indirectly related to the starting contig will appear in the graph. The graph is in dot langage and can be simply passed to neato or dot interpreter to make a pdf or a png, as follows: >perl haplopath_viz.pl --file test_output/validator.out --jaccard 0.1 --exclude exclude.list --start qpac010tig00005444len=60711 | dot -n -Tpng > sample_graph.png You can find the png of the test data in test_output/. # SAMPLINGS Samplings used for testing and benchmarking come from thale cress 1135 accessions vcf file and are found in the thalecress_samplings folder. ######## For questions email alexandrina.bodrug@boku.ac.at 14.10.2020
About
perl module and program to compute population variation based connections between scaffolds of a fragmented assembly
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published