This customized version of ClusterFinder now takes command-line arguments:
USAGE: ClusterFinder.py
The organism name is used to name the output files, which go in the current working directory.
The input table uses a special format, outlined below. To make this table, use the script clusterfinder_make_table.pl, which accepts a table of gene coordinates (generated by either prokka or prodigal) and the domain table from an hmmscan search against Pfam-A (made with the --domtblout option).
USAGE: clusterfinder_make_table.pl [OPTIONS] --gene_positions --hmmscan_table --organism_name <genus species, can be in quotes if it includes spaces> --organism_id --output
Options:
--type <prokka|prodigal> (table input type, default: prokka)
--status <finished|draft> (sequencing status, default: draft)
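As a rough illustration of the options above, the following sketch assembles an argument list for clusterfinder_make_table.pl. It makes several assumptions that the usage line does not confirm: that each flag takes its value as the following argument, that the script is launched through perl, and that the file names and organism shown are placeholders invented for the example. Passing the arguments as a list means an organism name containing spaces needs no extra quoting.

```python
def build_make_table_command(gene_positions, hmmscan_table, organism_name,
                             organism_id, output,
                             table_type="prokka", status="draft"):
    """Return an argument list for one clusterfinder_make_table.pl run.

    Assumption: every flag is followed by its value as a separate argument.
    """
    if table_type not in ("prokka", "prodigal"):
        raise ValueError("--type must be 'prokka' or 'prodigal'")
    if status not in ("finished", "draft"):
        raise ValueError("--status must be 'finished' or 'draft'")
    return [
        "perl", "clusterfinder_make_table.pl",
        "--type", table_type,
        "--status", status,
        "--gene_positions", gene_positions,
        "--hmmscan_table", hmmscan_table,
        "--organism_name", organism_name,
        "--organism_id", organism_id,
        "--output", output,
    ]


# All values below are hypothetical placeholders.
cmd = build_make_table_command(
    gene_positions="genes.tsv",
    hmmscan_table="pfam.domtblout",
    organism_name="Genus species",
    organism_id="ORG1",
    output="ORG1_table.txt",
    table_type="prodigal",
)

# To actually run the script (not done here):
#   import subprocess
#   subprocess.check_call(cmd)
```

The list can be inspected or logged before running, which helps when checking that the right gene table type was selected.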
The original ClusterFinder Readme follows...
Predicting biosynthetic gene clusters in genomes.
Authors: Peter Cimermancic & Michael Fischbach
Requirements:
- python (2.X)
- numpy
Instructions:
- an example of an input file: example_input.txt
COLUMN DESCRIPTION:
- GeneID
- Sequencing status
- Organism name
- Scaffold OID
- Organism OID
- Locus Tag
- Gene Start
- Gene End
- Strand
- Pfam Template Start
- Pfam Template End
- Pfam Start
- Pfam End
- PfamID
- Pfam E-score
- Enzyme ID
- if your input file format differs from the one above, please modify the file or lines 51-55 of the ClusterFinder.py script
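To check whether a table matches the 16-column layout before running ClusterFinder, a small parser along these lines may help. It is only a sketch: the tab delimiter is an assumption (the column description does not name one), the field names are Python-friendly renderings of the column list, the sample row is invented, and all values are kept as strings because the expected types are not specified. It avoids version-specific syntax so it should run under Python 2.7 as well as Python 3.

```python
from collections import namedtuple

# Field names follow the column description, in the same order.
COLUMNS = [
    "gene_id", "sequencing_status", "organism_name", "scaffold_oid",
    "organism_oid", "locus_tag", "gene_start", "gene_end", "strand",
    "pfam_template_start", "pfam_template_end", "pfam_start", "pfam_end",
    "pfam_id", "pfam_escore", "enzyme_id",
]
DomainRow = namedtuple("DomainRow", COLUMNS)


def parse_row(line, sep="\t"):
    """Split one table line into a DomainRow (separator is an assumption)."""
    fields = line.rstrip("\r\n").split(sep)
    if len(fields) != len(COLUMNS):
        raise ValueError("expected %d columns, found %d"
                         % (len(COLUMNS), len(fields)))
    return DomainRow(*fields)


# Invented sample row, for illustration only.
sample = "\t".join([
    "637000001", "draft", "Genus species", "scaffold_1", "org_1",
    "EX_0001", "1200", "2450", "+", "1", "250", "10", "260",
    "PF00109", "1e-50", "n/a",
])
row = parse_row(sample)
```

Running parse_row over every line of a table is a quick way to find rows with missing or extra columns.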
- an example of running ClusterFinder is shown in the ClusterFinder.py script
DESCRIPTION:
- modify paths (if not running from ClusterFinder directory - lines 7, 19 & 20)
- name the organism and the input file - lines 15 & 16
- run: python ClusterFinder.py
- test run: run python ClusterFinder.py without making any changes to the files
- OUTPUT1 [organism_name.out]: same as input + a column with probability values
- OUTPUT2 [organism_name.clusters.out]: same as OUTPUT1, but only for the domains from gene clusters that have passed the filtering steps.
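For downstream work with the output files, a reader along the following lines could pull out the probability for each domain. This is a sketch under assumptions not stated above: that the files are tab-delimited, and that the probability is the last column on each line. The locus tag and PfamID positions (6th and 14th columns) come from the column description; the sample line is invented.

```python
def read_probabilities(lines, sep="\t"):
    """Yield (locus_tag, pfam_id, probability) for each non-empty line.

    Assumptions: tab-delimited lines, probability stored in the last column.
    """
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        fields = line.split(sep)
        # Locus Tag is the 6th column and PfamID the 14th (0-based 5 and 13).
        yield fields[5], fields[13], float(fields[-1])


# Invented example line: the 16 input columns plus one probability value.
sample_fields = [
    "637000001", "draft", "Genus species", "scaffold_1", "org_1",
    "EX_0001", "1200", "2450", "+", "1", "250", "10", "260",
    "PF00109", "1e-50", "n/a",
]
lines = ["\t".join(sample_fields + ["0.93"]), ""]
results = list(read_probabilities(lines))
```

With a real file, the same function can be applied to an open file handle in place of the in-memory list.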