INMEGEN/CovDif

CovDif

cov_dif.sh

Software Description

CovDif is a tool to obtain and visualize the genetic differences between a reference genome and one or several groups of target genomes, as well as across several groups of target genomes. CovDif is useful for analyzing genomic conservation at base-pair resolution. It generates one conservation landscape for each group of genomes and one global differential landscape.

CovDif has two different conservation tracks:

  • Conservative conservation landscape: a genome counts as a match if it either matches the reference genome exactly or matches it imperfectly with every mismatch being an N. All N’s are therefore treated as reference bases, which gives the most conservative estimate of the frequency of the mutant allele.
  • Relaxed conservation landscape: only exact matches are counted, and all N’s are treated as mutant alleles, which gives a relaxed estimate of the frequency of the mutant allele.
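The difference between the two tracks can be sketched for a single kmer as follows. This is only an illustration of the rule described above, not CovDif's own code: the function names classify_kmer, is_match_conservative and is_match_relaxed are hypothetical, and the sketch assumes uppercase sequences of equal length.

```shell
# Hypothetical helper, not part of CovDif: compare one genome kmer with the
# reference kmer. Prints "exact", "n_only" (every mismatch is an N) or "mutant".
classify_kmer() {
    awk -v ref="$1" -v obs="$2" 'BEGIN {
        if (ref == obs) { print "exact"; exit }
        if (length(ref) != length(obs)) { print "mutant"; exit }
        for (i = 1; i <= length(ref); i++) {
            r = substr(ref, i, 1)
            o = substr(obs, i, 1)
            # A mismatch that is not an N is a real difference.
            if (r != o && o != "N") { print "mutant"; exit }
        }
        print "n_only"
    }'
}

# Conservative track: exact matches and N-only mismatches count as reference.
is_match_conservative() {
    [ "$(classify_kmer "$1" "$2")" != "mutant" ]
}

# Relaxed track: only exact matches count as reference.
is_match_relaxed() {
    [ "$(classify_kmer "$1" "$2")" = "exact" ]
}

classify_kmer ACGTACGT ACGTACGT   # exact
classify_kmer ACGTACGT ACNTACGT   # n_only
classify_kmer ACGTACGT ACCTACGT   # mutant
```

A kmer classified as n_only is the only case where the two tracks disagree: it is a match in the conservative landscape and a mutant in the relaxed one.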

It also has a differential landscape, which represents the maximum difference in frequency per reference kmer between all genomes being compared. It ranges from -1 to 1 when only two groups are compared: 1 means that the kmer is found in all genomes from group 1 and in no genome from group 2, and -1 means the same behavior in the opposite direction.
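For the two-group case, the value at one reference kmer can be illustrated with a small calculation. This is a minimal sketch under the assumption that the difference is the frequency in group 1 minus the frequency in group 2; the function name kmer_frequency_difference and its arguments are hypothetical and not part of CovDif.

```shell
# Hypothetical helper, not part of CovDif.
# Arguments: genomes in group 1 that contain the kmer, size of group 1,
#            genomes in group 2 that contain the kmer, size of group 2.
kmer_frequency_difference() {
    awk -v hits1="$1" -v total1="$2" -v hits2="$3" -v total2="$4" 'BEGIN {
        if (total1 <= 0 || total2 <= 0) {
            print "group sizes must be positive" > "/dev/stderr"
            exit 1
        }
        # Frequency in group 1 minus frequency in group 2.
        printf "%.3f\n", hits1 / total1 - hits2 / total2
    }'
}

kmer_frequency_difference 10 10 0 8   # 1.000  (all of group 1, none of group 2)
kmer_frequency_difference 0 10 8 8    # -1.000 (none of group 1, all of group 2)
kmer_frequency_difference 5 10 2 8    # 0.250
```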

Software requirements:

Linux-based operating system.

Prerequisites:

How to Download this Repository

Use the following command in the desired directory:

git clone https://github.com/pipecedeno/CovDif.git

After downloading the repository, make sure to give execution permission to the Bash and Python files. If you are located in the program's directory on your computer, this can be done with the following commands:

chmod +x bin/*.sh
chmod +x bin/*.py

Add to path

This step is required for the program to work.

These instructions were obtained from https://gist.github.com/nex3/c395b2f8fd4b02068be37c961301caa7

1.- Open the .bashrc file in your home directory (for example, /home/your-user-name/.bashrc) in a text editor.
2.- Add export PATH=$PATH:your-dir as the last line of the file, where your-dir is the directory you want to add.
3.- Save the .bashrc file.
4.- Restart your terminal.

To test whether the path was added, restart the terminal and run echo $PATH to see if the directory is listed.
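Reading the output of echo $PATH by eye is error-prone when the list is long, so the check can also be scripted. The sketch below is only a convenience: the function name path_contains is hypothetical, and the location $HOME/CovDif/bin is an assumption (the chmod commands above suggest the scripts live in bin/, but the clone location depends on where you ran git clone).

```shell
# Hypothetical helper: succeed if the given directory is an entry of PATH.
path_contains() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Assumption: the repository was cloned into $HOME/CovDif; adjust to your clone.
covdif_bin="$HOME/CovDif/bin"
if path_contains "$covdif_bin"; then
    echo "$covdif_bin is on PATH"
else
    echo "$covdif_bin is not on PATH; add this line to ~/.bashrc:"
    echo "export PATH=\$PATH:$covdif_bin"
fi
```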

Program usage

The program has 3 options: two_groups, many_groups and snp_files.

two_groups usage

Example:

cov_dif.sh two_groups -c virus_genome_directory/ -n non_virus_directory/ -r reference_genome/reference.fasta -s 20,21,22 -p 10

Options:
-h Displays help message.
-c The directory of the group of interest.
-n The other group directory.
-r The file of the reference genome (.fasta file).
-s (Optional) The kmer sizes the program will use, separated by commas. Example: 20,21,23. If not given, the default sizes are 20 to 25.
-p (Optional) The number of cores/threads to use. If not given, 1 is used.
-o The location where the directory with the output files will be saved.
-x Fill the gaps of kmers with N in the group of interest.
-y Fill the gaps of kmers with N in the other group.
-d If this flag is used, the intermediate folder won't be deleted.

It’s important that all the fasta files in each group's directory have the “.fasta” extension; any file with another extension will be ignored.
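Because files without the “.fasta” extension are skipped, it can be useful to list them before a run. The following is a minimal sketch, not part of CovDif; the function name list_non_fasta_files is hypothetical, and the demonstration uses a temporary directory with made-up file names.

```shell
# Hypothetical helper: print the regular files directly inside a group
# directory whose names do not end in ".fasta".
list_non_fasta_files() {
    find "$1" -maxdepth 1 -type f ! -name '*.fasta' | sort
}

# Demonstration on a throwaway directory with made-up file names.
demo_dir="$(mktemp -d)"
touch "$demo_dir/a.fasta" "$demo_dir/b.fa" "$demo_dir/c.fasta.gz"
list_non_fasta_files "$demo_dir"   # prints the paths of b.fa and c.fasta.gz
rm -r "$demo_dir"
```

On real data, the function would be called on each group directory, for example the ones passed with -c and -n.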

many_groups usage

Example:

cov_dif.sh many_groups -m genomes_directory/ -r reference_genome/reference.fasta -s 21,22 -p 4

Options:
-h Displays help message.
-m The directory that contains only the directories of the groups that are going to be used, one directory per group.
-r The file of the reference genome (fasta file).
-s (Optional) The kmer sizes the program will use, separated by commas. Example: 20,21,23. If not given, the default sizes are 20 to 25.
-p (Optional) The number of cores/threads to use. If not given, 1 is used.
-o The location where the directory with the output files will be saved. To save it in your current directory, use ".".
-d If this flag is used, the intermediate folder won't be deleted.

Note: cov_dif.sh -h will display a help message with information about the 2 flows.
It’s important that all the fasta files in each group's directory have the “.fasta” extension; any file with another extension will be ignored. The directory given by -m should contain only one directory per group. The names of these directories matter, because they are used as the names of the output files, and the group folders shouldn't contain any further directories.
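The layout rules for the -m directory can be checked before launching a run. This is a sketch under the assumptions stated in the note above (only group directories at the top level, no directories nested inside a group); the function name check_groups_layout is hypothetical and not part of CovDif.

```shell
# Hypothetical helper: verify the directory layout expected by many_groups.
# Returns 0 if the layout looks valid, 1 otherwise.
check_groups_layout() {
    local top="${1%/}" status=0 stray nested
    # Anything directly under the top directory that is not a directory.
    stray="$(find "$top" -mindepth 1 -maxdepth 1 ! -type d)"
    if [ -n "$stray" ]; then
        echo "non-directory entries in $top:"
        echo "$stray"
        status=1
    fi
    # Directories nested inside a group folder.
    nested="$(find "$top" -mindepth 2 -type d)"
    if [ -n "$nested" ]; then
        echo "directories nested inside group folders:"
        echo "$nested"
        status=1
    fi
    return "$status"
}

# Usage (genomes_directory/ is the example name used above):
# check_groups_layout genomes_directory/ && echo "layout looks valid"
```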

snp_files usage

Example:

cov_dif.sh snp_files -l g_snps/ v_snps/ l_snps/ -r reference_genome/covid_reference.fasta -o test_many_20 -k 20

Options:
-k (Optional) The size of the kmer used to simulate the conservation landscapes. If not given, it defaults to 1, which produces a landscape of base conservation.
-l List of directories that contain the .snps files to process.
Note: only files that end with .snps are used by the program.
-o Output directory where the files will be saved (if it doesn't exist, the program will create it).
-r Reference genome (used to determine the length of the genome and to get the header used for the wig files).

create_genomeview_session

Prerequisites:

java 7+. It can be installed using these instructions: https://phoenixnap.com/kb/how-to-install-java-ubuntu
Or using these: https://openjdk.java.net/install/

Usage

Example:

create_genomeview_session.sh -r reference_genome.fasta -f output_files/ -s test_session.gvs 

Options:
-h Display this message
-r Reference genome file
-f Directory where the wig files are located
-s Name of the session file that is going to be created (use the .gvs extension in the name)

Note: the session file and the .tdf files that genomeview needs will be saved in the same directory where the wig files are located.

This program uses the Java programs needed to convert the wig files to the .tdf format that genomeview needs to load the information. Information about that software can be found here: http://genomeview.org/manual/Wig2tdf

Authors

Luis Felipe Cedeño Pérez
Email: pipecedeno@gmail.com

Corresponding Authors

Laura Lucila Gómez Romero
Email: lgomez@inmegen.gob.mx

About

Software to compare conservation between groups of genomes
