
Converts files in Variant-Calling Format (VCF) to Heterozygous Allele Depth Format (HAD), the format required for GBS2Ploidy. Also converts VCF files to the format for COLONY software.


BioMatt/VCF-File-Converter

 
 


VCF-File-Converter


This program must be run on a Windows machine.

Converting to COLONY format:

I wrote this program so that it can separate the data for offspring and (potential) maternal specimens. If you want the program to make separate maternal and offspring files automatically, you must use our naming convention. We label our maternal individuals with their group name followed by their number in the group. For example, the sample names "abc1", "abc2", and "abc3" are maternal specimens 1, 2, and 3 from project "abc". Offspring are named with the maternal individual's name, followed by the letter "e" (for egg), followed by the offspring's own number. For example, if the maternal individual's name is "abc1", her offspring will be named "abc1e1", "abc1e2", "abc1e3", etc.

What's important is that each offspring individual's name ends in a number, followed by the letter "e", followed by another number. I used the regular expressions library to search through the individuals' names in the VCF file with the pattern "\de\d*?". If that search returns a match, the program writes that individual's information into the offspring file. Otherwise, it goes into the maternal file.
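The sorting rule described above can be sketched in a few lines of Python. This is only an illustration, not the code from the program itself: the function names `is_offspring` and `split_samples` are hypothetical, and it assumes the search is done with `re.search`, so the pattern may match anywhere in the name.

```python
import re

# Pattern quoted in the README: a digit, the letter "e", then a lazy run of digits.
OFFSPRING_PATTERN = re.compile(r"\de\d*?")


def is_offspring(sample_name):
    """Return True if the name matches the offspring pattern (hypothetical helper)."""
    return OFFSPRING_PATTERN.search(sample_name) is not None


def split_samples(sample_names):
    """Sort sample names into (maternal, offspring) lists, preserving input order."""
    maternal, offspring = [], []
    for name in sample_names:
        if is_offspring(name):
            offspring.append(name)
        else:
            maternal.append(name)
    return maternal, offspring


names = ["abc1", "abc2", "abc1e1", "abc1e2", "abc2e10"]
maternal, offspring = split_samples(names)
print("maternal:", maternal)
print("offspring:", offspring)
```

Note that because `\d*?` is lazy and may match zero digits, this pattern effectively matches any digit immediately followed by "e". A maternal name such as "site3east" would therefore be treated as offspring, so it is worth checking your sample names before relying on the automatic split.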

If you would like to edit my code to fit your project's naming convention, go to line 254. That line, in createColonyFile(), is where the regular expression search for offspring names determines which file each specimen's data goes into.
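As one possible starting point for such an edit, here is a minimal sketch of a stricter pattern. It is an assumption for illustration only and is not taken from the program: the names `STRICT_OFFSPRING_PATTERN` and `is_offspring_strict` are hypothetical. It requires the name to end in a digit, then "e", then at least one more digit.

```python
import re

# Hypothetical stricter alternative: digit, "e", one or more digits, end of name.
STRICT_OFFSPRING_PATTERN = re.compile(r"\de\d+$")


def is_offspring_strict(sample_name):
    """Return True only if the name ends in <digit>e<digits> (hypothetical helper)."""
    return STRICT_OFFSPRING_PATTERN.search(sample_name) is not None


# Compare how a few names are classified under the stricter rule.
for name in ["abc1", "abc1e1", "abc2e10", "site3east1", "abc1e"]:
    label = "offspring" if is_offspring_strict(name) else "maternal"
    print(name, "->", label)
```

Anchoring the pattern with `$` and requiring `\d+` means a name like "site3east1" is no longer mistaken for offspring. If your project marks offspring some other way, replace the pattern with one that describes your own convention.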

