codinghedgehog/snp_swapper
SNP Swapper

This script reconstructs the base sequence using a base reference sequence in FASTA format, and a SNP loci file and indel file produced by prephix. It then substitutes, inserts, or deletes bases at the given locations in the provided reference base sequence. It writes out a separate regenerated base sequence for each strain in the SNP input file. The output is suitable for use by mlstar (i.e. it is a FASTA-formatted reference base sequence).

The reference base sequence input file should be in FASTA format. The first base is assumed to be at locus 1.

The SNP input file should contain three TAB-delimited columns and no headers:

    SNP_ID (i.e. strain id) [TAB] Base position [TAB] Base

So something like:

    A12    1045     G
    A12    4056     A
    A12    13004    T
    A35    4        A
    A35    401      C

This is the same format as the SNP files produced by the prephix program.

The indel input file is in the format produced by the prephix program and contains modified VAAL4 K28 and NUCMER lines. The modification adds two leading columns: the strain ID in the first column, and one of k28, nuc, or vcf in the second column.
The indel input file has no header lines, e.g.:

    STRAIN_ID k28 0 316 left=CAGGTATTTGACATATAGAG sample=A ref=G right=ACTGAAAAAGTATAATTGTG
    STRAIN_ID k28 0 419 left=CTGTGCATAACTAATAAGCA sample= ref=ACG right=GATAAAGTTATCCACCGATT
    STRAIN_ID k28 0 929 left=GACACTTTTGTAATCGGACC sample= ref=C right=GGTAACCGCTTTCCACATGC
    STRAIN_ID k28 0 953 left=AACCGCTTTCCACATGCAGC sample=A ref= right=AGTTTAGCTGTGGCCGAAGC
    STRAIN_ID k28 0 965 left=CATGCAGCGAGTTTAGCTGT sample=AAT ref= right=GCCGAAGCACCAGCCAAAGC
    STRAIN_ID k28 0 1013 left=CCATTATTTATCTATGGAGG sample=G ref= right=GTTGGTTTAGGAAAAACCCA
    STRAIN_ID nuc 759437 A G 732302 1 732302 1 1 NC007793 JKD6159
    STRAIN_ID nuc 759441 T A 732306 3 732306 1 1 NC007793 JKD6159
    STRAIN_ID nuc 759444 T C 732309 3 732309 1 1 NC007793 JKD6159
    STRAIN_ID nuc 759456 G A 732321 6 732321 1 1 NC007793 JKD6159
    STRAIN_ID nuc 759462 A T 732327 6 732327 1 1 NC007793 JKD6159
    STRAIN_ID nuc 759504 A G 732369 36 732369 1 1 NC007793 JKD6159
    STRAIN_ID nuc 759540 T C 732405 15 732405 1 1 NC007793 JKD6159
    STRAIN_ID vcf NZ_CP014696.2 17 . C CT 3218.2 . AC=1;AF=1;LEN=1;TYPE=ins GT 1 INS
    STRAIN_ID vcf NZ_CP014696.2 18 . CG C 1959.23 . AC=1;AF=1;LEN=1;TYPE=del GT 1 DEL
    STRAIN_ID vcf NZ_CP014696.2 19 . A AGCGCCTT 1049.14 . AC=1;AF=1;LEN=7;TYPE=ins GT 1 INS
    STRAIN_ID vcf NZ_CP014696.2 20 . GGT G 2246.71 . AC=1;AF=1;LEN=2;TYPE=del GT 1 DEL
    STRAIN_ID vcf NZ_CP014696.2 21 . G GA 1936.54 . AC=1;AF=1;LEN=1;TYPE=ins GT 1 INS

Usage:

    $0 <reference base file> <prephix SNP loci input file> [prephix indel file]
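
As an illustration of the SNP substitution step only, here is a minimal Python sketch. It is not the script's own code: the function names (`parse_snp_lines`, `apply_snps`, `regenerate_sequences`) and the toy reference sequence are hypothetical, and the sketch assumes the reference has already been read from its FASTA file into a plain string. Indel handling is left out.

```python
from collections import defaultdict


def parse_snp_lines(lines):
    """Group (position, base) pairs by strain ID.

    Each line is expected to hold three TAB-delimited columns with no
    header: strain ID, 1-based base position, base.
    """
    snps = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        strain_id, position, base = line.split("\t")
        snps[strain_id].append((int(position), base))
    return dict(snps)


def apply_snps(reference, snps):
    """Return a copy of the reference with each SNP base substituted.

    Positions are 1-based, so position 1 is the first base.
    """
    bases = list(reference)
    for position, base in snps:
        if not 1 <= position <= len(bases):
            raise ValueError(f"position {position} is outside the reference")
        bases[position - 1] = base
    return "".join(bases)


def regenerate_sequences(reference, snp_lines):
    """Build one regenerated sequence per strain found in the SNP lines."""
    by_strain = parse_snp_lines(snp_lines)
    return {strain: apply_snps(reference, snps) for strain, snps in by_strain.items()}


# Toy data (hypothetical): a 10-base reference and SNPs for two strains.
reference = "ACGTACGTAC"
snp_lines = ["A12\t2\tG", "A12\t5\tT", "A35\t4\tA"]
sequences = regenerate_sequences(reference, snp_lines)
for strain, sequence in sequences.items():
    print(f">{strain}")
    print(sequence)
```

Each strain is rebuilt from an untouched copy of the reference, so SNPs from one strain never leak into another. That matches the one-sequence-per-strain output described above.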