code
hewm2008 committed May 29, 2019
1 parent fd76db2 commit bac3e47
Showing 16 changed files with 1,606 additions and 0 deletions.
44 changes: 44 additions & 0 deletions INSTALL.txt
@@ -0,0 +1,44 @@

VCF2Dis: a simple and efficient tool to calculate a p-distance matrix from a Variant Call Format (VCF) file


1) Introduction
------------

This software relies on the library package [zlib].

---------------------- zlib information ----------------------------
If the [zlib] library does not work,
you can download it from this website and install it:
http://www.zlib.net/


2) Linux/Unix/macOS INSTALL
--------------------------------------

Just execute as follows:
tar -zxvf VCF2DisXXX.tar.gz
cd VCF2DisXXX
make ; make clean
./bin/VCF2Dis

#Note: If linking fails, re-install the zlib library and copy it to the library directory:

VCF2Dis-xx/src/include/zlib


#step3 : rebuild
sh make.sh # or [make && make clean]

4) Contact
email: hewm2008@gmail.com / hewm2008@qq.com
join the QQ Group : 125293663



######################swimming in the sky and flying in the sea ########################### ##




15 changes: 15 additions & 0 deletions Makefile
@@ -0,0 +1,15 @@
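CXX=g++
CXXFLAGS= -g -O2
BIN := ./bin
SRC := src
LDFLAGS=-lz
# Despite its name, INCLUDE is a library search path (-L) for a bundled zlib
INCLUDE=-L./src/zlib/

.PHONY: all clean
all: $(BIN)/VCF2Dis

# Link the object file into the final binary
$(BIN)/VCF2Dis: $(SRC)/VCF2Dis.o
	$(CXX) $^ -o $@ $(INCLUDE) $(LDFLAGS)

# Compile each source file in src/ to an object file next to it
$(SRC)/%.o: $(SRC)/%.cpp
	$(CXX) -c $(CXXFLAGS) $< -o $@

clean:
	$(RM) $(BIN)/*.o $(SRC)/*.o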
56 changes: 56 additions & 0 deletions Readme
@@ -0,0 +1,56 @@

1 Introduction ( VCF2Dis version <1.20)

VCF2Dis creates a p_distance matrix from a VCF file. For more information
about the p_distance matrix, see this website:
http://evolution.genetics.washington.edu/phylip/doc/distance.html

The SNPs in the VCF file are used to calculate the p-distance between individuals. The genetic distance between sample i and sample j is given by the following formula:

D_ij = (1/L) * sum_{l=1..L} d(l)_ij

where L is the length of the regions in which SNPs can be identified. Given that the alleles at position l are A/C:
d(l)_ij=0.0 if the genotypes of the two individuals were AA and AA;
d(l)_ij=0.5 if the genotypes of the two individuals were AA and AC;
d(l)_ij=0.0 if the genotypes of the two individuals were AC and AC;
d(l)_ij=1.0 if the genotypes of the two individuals were AA and CC;
d(l)_ij=0.0 if the genotypes of the two individuals were CC and CC;
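
The per-site rule above can be checked with a small worked example. The Python sketch below is not taken from the VCF2Dis source. The function names `site_distance` and `p_distance`, and the representation of genotypes as two-character strings, are assumptions made for illustration. It scores a site as half the absolute difference in the number of copies of one allele, which reproduces the five cases listed above. The AC versus CC case is not listed in this Readme; the sketch assumes it scores 0.5 by symmetry. Missing genotypes are not handled.

```python
def site_distance(g1: str, g2: str, alt: str = "C") -> float:
    """Distance at one biallelic site for two unphased diploid genotypes.

    Hypothetical helper: genotypes are strings such as "AA", "AC" or "CC".
    """
    # Count copies of the chosen allele in each genotype (0, 1 or 2).
    n1 = g1.count(alt)
    n2 = g2.count(alt)
    # Half the difference in allele count gives 0.0, 0.5 or 1.0.
    return abs(n1 - n2) / 2.0


def p_distance(sample_i, sample_j, alt: str = "C") -> float:
    """D_ij = (1/L) * sum of the per-site distances over L sites."""
    if len(sample_i) != len(sample_j):
        raise ValueError("both samples must cover the same sites")
    n_sites = len(sample_i)
    if n_sites == 0:
        raise ValueError("at least one site is required")
    total = sum(site_distance(a, b, alt) for a, b in zip(sample_i, sample_j))
    return total / n_sites


if __name__ == "__main__":
    sample_i = ["AA", "AA", "AC", "AA", "CC"]
    sample_j = ["AA", "AC", "AC", "CC", "CC"]
    # Per-site distances: 0.0, 0.5, 0.0, 1.0, 0.0, so D_ij = 1.5 / 5
    print(p_distance(sample_i, sample_j))
```

With the five example sites, the per-site distances sum to 1.5 over L = 5 sites, giving D_ij = 0.3.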

Once the p_distance matrix is done, PHYLIP 3.69 (http://evolution.genetics.washington.edu/phylip.html) can be used to construct a phylogenetic tree from this matrix with the neighbor-joining method:
PHYLIPNEW-3.69.650/bin/fneighbor -datafile p_dis.matrix -outfile tree.out1.txt -matrixtype s -treetype n -outtreefile tree.out2.tre
MEGA6 (http://www.megasoftware.net/) can then be used to display the phylogenetic tree stored in the file [tree.out2.tre].


2 Install

Run [make] or [sh make.sh] to compile this software.
The compiled program can be found at [bin/VCF2Dis].


3 Usage

3.1 Parameter description:
Usage: VCF2Dis -InPut <in.vcf> -OutPut <p_dis.mat>

-InPut <str> Input GATK VCF genotype File
-OutPut <str> OutPut Sample p-Distance matrix

-SubPop <str> SubGroup SampleList of VCFFile [ALLsample]
-KeepMF Keep the Middle File diff & Use matrix

-help Show more help [hewm2008 v1.10]

3.2 To create the p_distance matrix for all samples in the VCF, run VCF2Dis directly

./bin/VCF2Dis -InPut in.vcf.gz -OutPut p_dis.mat

3.3 To create the p_distance matrix for a subgroup of samples, put their sample names into the file sample.list

./bin/VCF2Dis -InPut in.vcf.gz -OutPut p_dis.mat -SubPop sample.list



4 Contact
email: hewm2008@gmail.com / hewm2008@qq.com
join the QQ Group : 125293663

Binary file added bin/VCF2Dis
Binary file not shown.
79 changes: 79 additions & 0 deletions bin/percentageboostrapTree.pl
@@ -0,0 +1,79 @@
#!/usr/bin/perl -w
use strict;
#explanation: this program rescales the value after each ":" in a tree file to a percentage of <RepeatTime>
#edit by hewm; Wed Feb 20 11:02:07 HKT 2019
#Version 1.0 hewm@genomics.org.cn

die "Version 1.0\t2019-02-20;\nUsage: $0 <merge.tre><RepeatTime><boostrap.tre>\n" unless (@ARGV ==3);

#############Before Start , open the files ####################

open (IA,"<",$ARGV[0]) || die "input file can't open $!";
my $TotalRepeat=$ARGV[1];
open (OA,">",$ARGV[2]) || die "output file can't open $!" ;

################ Do what you want to do #######################
$/=";";	# read one tree (terminated by ";") per record

while(<IA>)
{
	$_=~s/\n//g;
	next if ($_ eq "");
	my $Start=0;
	my $Now=$Start;
	my $True=1;
	my $Str=$_ ;

	while($True==1)
	{
		$Now=index($Str,":",$Start);
		if ($Now==-1)
		{
			$True=0;	# no more ":" in this tree
		}
		else
		{
			# the text before the ":" is copied unchanged
			my $Length=$Now-$Start;
			my $AAA=substr($Str,$Start,$Length);
			$Start=$Now+1;
			# the value after ":" ends at the next "," or ")", whichever comes first
			my $NowA=index($Str,",",$Start);
			my $NowB=index($Str,")",$Start);
			if ($NowA!=-1 && $NowB!=-1)
			{
				$Now=($NowA>$NowB) ? $NowB : $NowA;
			}
			elsif ($NowA==-1 && $NowB==-1)
			{
				# stop here: continuing would rescan the same ":" forever
				die "bad Format,some thing wrong!!!\n";
			}
			elsif ($NowA==-1)
			{
				$Now=$NowB;
			}
			else
			{
				$Now=$NowA;
			}
			$Length=$Now-$Start;
			my $BBB=substr($Str,$Start,$Length);
			# rescale the value to a percentage of the total number of repeats
			$BBB=sprintf ("%.1f",$BBB*100.0/$TotalRepeat);
			$Start=$Now;
			print OA "$AAA:$BBB";
		}
	}
	# copy whatever follows the last value, including the closing ";"
	my $BBB=substr($Str,$Start);
	print OA "$BBB\n";
}
$/="\n";
close IA;
close OA ;

######################swimming in the sky and flying in the sea ###########################
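
The rescaling done by the script above can also be sketched in a few lines of Python, which may help when checking its output by hand. This is an illustrative equivalent, not part of the repository. The function name `to_percent` is hypothetical, and the sketch assumes that every value following a ":" is a plain non-negative decimal number, whereas the Perl script takes everything up to the next "," or ")" as the value.

```python
import re

# Matches ":" followed by a plain decimal number, e.g. ":87" or ":87.5".
_VALUE = re.compile(r":([0-9]+(?:\.[0-9]+)?)")


def to_percent(tree: str, total_repeat: float) -> str:
    """Rescale every ':value' in a tree string to a percentage of total_repeat."""

    def rescale(match):
        value = float(match.group(1))
        # Same formula and one-decimal formatting as the Perl sprintf call.
        return ":%.1f" % (value * 100.0 / total_repeat)

    return _VALUE.sub(rescale, tree)


if __name__ == "__main__":
    # A toy tree in which each value is a count out of 50 repeats.
    print(to_percent("((A:50,B:50):25,C:50);", 50))
```

A regular expression keeps the sketch short; the index-based scan in the Perl script does the same job without assuming that the values are numeric.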
9 changes: 9 additions & 0 deletions exemple/Run.sh
@@ -0,0 +1,9 @@
#!/bin/sh
#$ -S /bin/sh
#Version1.0 hewm@genomics.org.cn 2017-06-13
echo Start Time :
date
../bin/VCF2Dis -InPut in.vcf.gz -OutPut p_dis.mat
#../bin/VCF2Dis -InPut in.vcf.gz -OutPut p_dis.mat -SubPop sample.list
echo End Time :
date
Binary file added exemple/in.vcf.gz
Binary file not shown.