DHAMMAPALB/seq2net

seq2net: a tool for evolutionary protein interactions prediction and analysis of bacterial genomes

This pipeline predicts protein-protein interactions (PPIs) in bacterial genomes using evolutionary approaches. It combines five methods: Gene Neighborhood (GN), Phylogenetic Profile (PP), Gene Fusion (GF), Gene Coevolution (GC) and Interolog (IN). PPIs that are consistently predicted, and those among them that are already known, are used to build protein interaction networks. Each network is then analyzed to annotate its nodes with functions and pathways.
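To give a feel for one of these methods, the following is a minimal sketch of the general phylogenetic profile idea: proteins whose orthologs are present and absent in the same set of genomes are candidate interaction partners. It is an illustration only, not the code in PP.pl. The function name `profile_pairs`, the `min_present` parameter and the toy data are all hypothetical.

```python
from itertools import combinations


def profile_pairs(presence, genomes, min_present=2):
    """Pair proteins that share an identical presence/absence profile.

    presence: dict mapping a protein ID to the set of genomes that
              contain an ortholog of it (toy input, hypothetical format).
    genomes:  ordered list of genome names defining the profile columns.
    min_present: ignore profiles present in fewer genomes than this,
                 since near-empty profiles match by chance.
    """
    # Build one boolean tuple per protein: True where an ortholog exists.
    profiles = {
        protein: tuple(genome in hits for genome in genomes)
        for protein, hits in presence.items()
    }
    pairs = []
    for a, b in combinations(sorted(profiles), 2):
        if profiles[a] == profiles[b] and sum(profiles[a]) >= min_present:
            pairs.append((a, b))
    return pairs


# Toy example with made-up protein and genome names.
presence = {
    "protA": {"g1", "g3"},
    "protB": {"g1", "g3"},
    "protC": {"g2"},
    "protD": {"g1", "g2", "g3"},
}
print(profile_pairs(presence, ["g1", "g2", "g3"]))
```

Real implementations usually relax the exact-match rule (for example by scoring profile similarity), so treat the identical-profile test here as the simplest possible variant.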

Requirements

  • Ncbi-blast+
  • Clustalo
  • Perl: List::Util subroutines
  • Python modules: sys, requests, pandas, matplotlib, biopython
  • R packages: igraph
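Before running the pipeline, it can help to confirm that the requirements are visible on the current machine. The sketch below is a hypothetical helper, not part of seq2net. It assumes `blastp` as a representative BLAST+ executable and `Rscript` as the R launcher, and it does not check the R package igraph.

```python
import importlib.util
import shutil

# Executable names are assumptions: blastp stands in for the BLAST+ suite.
TOOLS = ["blastp", "clustalo", "perl", "Rscript"]
# Import names of the Python modules (biopython is imported as "Bio").
MODULES = ["requests", "pandas", "matplotlib", "Bio"]


def missing_requirements(tools=TOOLS, modules=MODULES):
    """Return (missing_tools, missing_modules) for the current environment."""
    missing_tools = [t for t in tools if shutil.which(t) is None]
    missing_modules = [
        m for m in modules if importlib.util.find_spec(m) is None
    ]
    return missing_tools, missing_modules


if __name__ == "__main__":
    tools, modules = missing_requirements()
    print("Missing executables:", ", ".join(tools) or "none")
    print("Missing Python modules:", ", ".join(modules) or "none")
```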

Installation

sudo chmod +x -R seq2net/Scripts/*
echo 'export PATH="your-dir/seq2net/Scripts/:$PATH"' >> ~/.bashrc
source ~/.bashrc

Usage

Usage: seq2net [-h] [-f1 <file1>] [-Ortho] [-GN] [-PP] [-f2 <file2>] [-OrthoPara] [-GF] [-GC] [-IN <PPIs_faa> <PPIs.txt>] [-r <reference_genome>] [-e <value1> [<value2>]] [-c <coverage1> [<coverage2>]] [-t <number>] [-kPPIs] [-enrich] [-o <outdir>]

positional arguments:
	-f1 <file1>			: Input file with a list of fasta (.faa) files for Ortho, GN and PP
	-Ortho				: Predict Orthologs of the reference genome
	-GN				: Predict PPIs using Gene Neighborhood method
	-PP				: Predict PPIs using Phylogenetic Profile method
	-f2 <file2>			: Input file with a list of fasta (.faa) files for OrthoPara, GF and GC
	-OrthoPara			: Predict Orthologs and Paralogs of the reference genome
	-GF				: Predict PPIs using Gene Fusion method
	-GC				: Predict PPIs using Gene Coevolution method
	-IN <PPIs_faa> <PPIs.txt>	: Predict PPIs using Interolog method; requires input protein fasta sequences (.faa) and a list of known PPIs
	-r <reference_genome>		: Reference genome name

optional arguments:
	-e <value1> [<value2>]		: E-value for Orthologs and/or Paralogs, respectively
	-c <coverage1> [<coverage2>]	: Sequence coverage for Orthologs and/or Paralogs, respectively
	-t <number>			: Minimum number of methods that must consistently predict a PPI
	-kPPIs				: List the known PPIs among the predicted PPIs using the STRING database
	-enrich				: Enrichment of proteins in PANTHER GO-Slim terms and Pathways
	-o <outdir>			: Output directory
	-h, --help			: Show this help message and exit

Others: Orthologs.pl <file1> <reference_genome> <value1> <coverage1> <outdir>
	GN.pl <file1> <reference_genome> <outdir>
	PP.pl <reference_genome> <outdir>
	OrthoParalogs.pl <file1> <reference_genome> <value1> <coverage1> <outdir>
	GF.pl <reference_genome> <outdir>
	GC.pl <file2> <reference_genome> <outdir>
	IN.pl <reference_genome> <PPIs_faa> <PPIs.txt> <value1> <coverage1> <outdir>
	Consistency.pl <number> <outdir>
	Networks.R <PPIs_output_file> <color>
	kPPIs.py <reference_genome> <PPIs_output_filename> <Known_PPIs_output_filename> <outdir>
	Enrich.py <reference_genome> <PPIs_output_filename> <outdir>
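The `-t <number>` option keeps only PPIs supported by several methods. The sketch below shows one way such a consistency filter can work, assuming each method yields a list of unordered protein pairs. It is not the code in Consistency.pl; the function name `consistent_ppis` and the protein IDs are hypothetical.

```python
from collections import defaultdict


def consistent_ppis(predictions, min_methods=2):
    """Keep PPIs predicted by at least `min_methods` different methods.

    predictions: dict mapping a method name (e.g. "GN") to an iterable
                 of (protein_a, protein_b) pairs.
    Returns a dict mapping each retained pair to its supporting methods.
    """
    support = defaultdict(set)
    for method, pairs in predictions.items():
        for a, b in pairs:
            # Sort the pair so (a, b) and (b, a) count as the same PPI.
            support[tuple(sorted((a, b)))].add(method)
    return {
        pair: sorted(methods)
        for pair, methods in support.items()
        if len(methods) >= min_methods
    }


# Toy predictions with made-up protein IDs.
predictions = {
    "GN": [("p1", "p2"), ("p2", "p3")],
    "PP": [("p2", "p1")],
    "GF": [("p3", "p4")],
}
print(consistent_ppis(predictions, min_methods=2))
```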

Example

cd Example
seq2net -f1 Gseq_names1.txt -r Mycobacterium_tuberculosis_H37Rv -Ortho -GN -o Mtb_PINs
seq2net -f1 Gseq_names1.txt -r Mycobacterium_tuberculosis_H37Rv -Ortho -PP -o Mtb_PINs

or

seq2net -f1 Gseq_names1.txt -r Mycobacterium_tuberculosis_H37Rv -Ortho -GN -PP -o Mtb_PINs

Similarly,

seq2net -f2 Gseq_names2.txt -r Mycobacterium_tuberculosis_H37Rv -OrthoPara -GF -GC -o Mtb_PINs
seq2net -r Mycobacterium_tuberculosis_H37Rv -IN DbPPIs/DIP.faa DbPPIs/DIP_PPIs.txt -o Mtb_PINs

A single command for the entire analysis, including identification of known PPIs and enrichment analysis, would be:

seq2net -f1 Gseq_names1.txt -Ortho -GN -PP -f2 Gseq_names2.txt -OrthoPara -GF -GC -IN DbPPIs/DIP.faa DbPPIs/DIP_PPIs.txt -r Mycobacterium_tuberculosis_H37Rv -e 0.001 0.01 -c 40 30 -t 2 -kPPIs -enrich -o Mtb_PINs
  • The reference genome name should match the NCBI taxonomic name, with white spaces replaced by underscores. This ensures that the reference genome maps to the NCBI Taxonomy database, which is needed to identify known PPIs from the STRING database and to retrieve gene ontologies and pathways from the PANTHER classification system. Protein tables (NCBI format) are required to map gene identifiers to their gene symbols or synonyms; each protein table should have the same name as its fasta file, but with the .csv extension.
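The naming rules above can be sketched in a few lines. These helpers are hypothetical and not part of seq2net; they only illustrate the underscore convention and the .faa to .csv pairing.

```python
from pathlib import Path


def reference_name(taxonomic_name):
    """Turn an NCBI taxonomic name into the underscore form used with -r."""
    # split() with no argument also collapses repeated white space.
    return "_".join(taxonomic_name.split())


def protein_table_path(fasta_path):
    """Return the expected protein table path for a fasta (.faa) file."""
    # Only the final extension is swapped, so dots inside the name survive.
    return Path(fasta_path).with_suffix(".csv")


print(reference_name("Mycobacterium tuberculosis H37Rv"))
print(protein_table_path("Mycobacterium_tuberculosis_H37Rv.faa"))
```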

The example figures:

Figure 1: Network of PPIs consistently predicted by at least two methods.

Figure 2: Network of PPIs that were consistently predicted by at least two methods and were already reported in the STRING database.

Figure 3: Enrichment of network proteins in PANTHER GO-Slim Biological Process.

Figure 4: Enrichment of network proteins in PANTHER GO-Slim Molecular Function.

Figure 5: Enrichment of network proteins in PANTHER GO-Slim Cellular Component.

Figure 6: Enrichment of network proteins in PANTHER Pathways.
