Punchline Pfam domain searches


punchline /ˈpʌn(t)ʃlʌɪn/ noun

the final phrase or sentence of a joke or story, providing the humour or some other crucial element.

Dependencies:
HMMER3 (see the HMMER installation instructions)
Python2.7 or Python3
Biopython
R corrplot library: run install.packages("corrplot") from within R

First step:
Gather your files:
1. Download the Pfam-A database from the Pfam website, using the FTP tab at the top of the page. Make a note of the version number in case you need to quote it later.
2. Collect all the proteins you want to search into a single protein FASTA file. Every protein needs a unique name that also tells you which strain it came from, for example H10407|H10407_0003: the strain name comes first, followed by a | character and then the name of the coding sequence or locus tag. ** more details on how to do this if requested **
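The naming scheme in item 2 can be applied with a few lines of Python. This is only a minimal sketch and not a script from this repository: the function name prefix_fasta_headers is hypothetical, and it assumes the locus tag is the first whitespace-delimited word of each FASTA header.

```python
def prefix_fasta_headers(lines, strain):
    """Yield FASTA lines with every header rewritten as >strain|locus_tag.

    Assumption: the locus tag is the first word after '>' in each header.
    """
    seen = set()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError("FASTA header without a name")
            name = "{}|{}".format(strain, fields[0])
            # Every protein must end up with a unique name.
            if name in seen:
                raise ValueError("duplicate protein name: " + name)
            seen.add(name)
            yield ">" + name
        elif line:
            # Sequence lines are passed through unchanged.
            yield line


# Made-up records, for illustration only.
example = [
    ">H10407_0003 hypothetical protein",
    "MKKTAIAIAV",
    ">H10407_0004",
    "MSTNPKPQRK",
]
renamed = list(prefix_fasta_headers(example, "H10407"))
print("\n".join(renamed))
```

Run it once per strain and concatenate the results to get the single FASTA file that hmmsearch expects.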
Second step:
Run the Pfam search. The fastest way to search a large number of protein sequences against a large number of Pfam motifs is to use hmmsearch:
hmmsearch --noali --notextw --cpu [insert number of cpus here] --domtblout your_outfile_name Pfam_A_hmm_profiles your_protein_fasta
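If you prefer to launch the search from Python, the same command can be assembled as an argument list. This is a sketch under stated assumptions: the helper names build_hmmsearch_command and run_hmmsearch are hypothetical, the file names are the placeholders from the command above, and hmmsearch must already be on your PATH.

```python
import subprocess


def build_hmmsearch_command(hmm_profiles, protein_fasta, domtblout, cpus=1):
    """Return the hmmsearch call shown above as a list of arguments."""
    return [
        "hmmsearch",
        "--noali",            # do not print alignments
        "--notextw",          # do not wrap long output lines
        "--cpu", str(cpus),
        "--domtblout", domtblout,
        hmm_profiles,         # HMM file comes first
        protein_fasta,        # sequence file comes second
    ]


def run_hmmsearch(command):
    """Run the command; raises CalledProcessError if hmmsearch fails."""
    subprocess.check_call(command)


command = build_hmmsearch_command(
    "Pfam_A_hmm_profiles", "your_protein_fasta", "your_outfile_name", cpus=4
)
print(" ".join(command))
```

Passing a list rather than a single shell string avoids quoting problems when file names contain spaces.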

Third step:
Run hmmsearch_outfile Pfam_domain_names

This will create a large number of files, each with the species in question on its top line. The Pfam_domain_names file is a text file listing all the Pfam domain short names you are searching with, one name per line. An example file is provided in the examples folder; it is specific to version 32.0 of Pfam-A, released Sept. 2018.
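To check what the hmmsearch output contains before running this step, the --domtblout table can be tallied by hand. The sketch below is not the repository's script: count_domains_per_strain is a hypothetical name, the rows are made up, and no E-value filtering is applied. It relies on the domtblout layout in which column 1 is the target (protein) name and column 4 is the query (Pfam profile) name, and on the strain|locus_tag naming from the first step.

```python
from collections import Counter, defaultdict


def count_domains_per_strain(domtblout_lines):
    """Count domain hits per strain from hmmsearch --domtblout lines."""
    counts = defaultdict(Counter)
    for line in domtblout_lines:
        # Skip blank lines and the '#' comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        protein_name = fields[0]   # e.g. H10407|H10407_0003
        domain_name = fields[3]    # Pfam domain short name
        strain = protein_name.split("|", 1)[0]
        counts[strain][domain_name] += 1
    return counts


# Made-up rows in domtblout layout (DomA and DomB are not real Pfam names).
rows = [
    "# target name  accession  tlen  query name  ...",
    "H10407|H10407_0003 - 310 DomA - 137 1e-30 105.1 0.0 1 2 3e-20 2e-16 60.1 0.0 1 137 25 170 25 170 0.95 -",
    "H10407|H10407_0003 - 310 DomA - 137 1e-30 105.1 0.0 2 2 4e-15 3e-11 44.0 0.0 1 137 180 300 180 300 0.93 -",
    "H10407|H10407_0004 - 220 DomB - 90 2e-12 45.3 0.1 1 1 5e-16 2e-12 45.0 0.1 1 90 10 100 10 100 0.90 -",
    "K12|K12_0001 - 305 DomA - 137 3e-28 98.2 0.0 1 1 6e-32 3e-28 98.0 0.0 1 137 20 165 20 165 0.96 -",
]
counts = count_domains_per_strain(rows)
for strain in sorted(counts):
    print(strain, dict(counts[strain]))
```

Each row of the table is one domain occurrence, so a protein carrying the same domain twice contributes two counts.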

Fourth step:
We join these files to an R matrix with:
Kruskal-Wallis significance testing using R
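For readers who want to see what the Kruskal-Wallis test computes, here is a minimal pure-Python sketch of the H statistic with the standard tie correction. It is an illustration only, not the R code used by this pipeline; the function name kruskal_wallis_h and the example counts are made up. In R, kruskal.test reports the same statistic together with a p-value from a chi-squared distribution with (number of groups - 1) degrees of freedom.

```python
def kruskal_wallis_h(groups):
    """Return the tie-corrected Kruskal-Wallis H statistic.

    groups: list of lists of numbers, one list per group of strains.
    """
    # Pool all observations, remembering which group each came from.
    pooled = sorted(
        (value, index) for index, values in enumerate(groups) for value in values
    )
    n_total = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0
    i = 0
    while i < n_total:
        # Find the run of tied values starting at position i.
        j = i
        while j < n_total and pooled[j][0] == pooled[i][0]:
            j += 1
        # Tied values share the average of the 1-based ranks i+1 .. j.
        average_rank = (i + 1 + j) / 2.0
        for k in range(i, j):
            rank_sums[pooled[k][1]] += average_rank
        ties = j - i
        tie_term += ties ** 3 - ties
        i = j
    h = 12.0 / (n_total * (n_total + 1)) * sum(
        r * r / len(g) for r, g in zip(rank_sums, groups)
    ) - 3.0 * (n_total + 1)
    correction = 1.0 - tie_term / float(n_total ** 3 - n_total)
    if correction == 0:
        raise ValueError("all values are identical; H is undefined")
    return h / correction


# Made-up counts of one domain in three hypothetical groups of strains.
h_statistic = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(round(h_statistic, 4))
```

A large H means the domain counts differ between groups more than their ranks would by chance.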

** What to do if the plot needs resizing: coming soon!! **

Fifth step:
Heatmap and correlation plot using Rscript
Rscript Heatmapper_punchline.R

Sixth step:
Phylogeny clustering
*still under construction
You will be able to build the input phylogeny file by running plot_pfam_phylog.R. Then run phylip neighbour on that file, and afterwards run plot_pfam_phylog.R. If you get errors (usually about the strain names) you can try:

Look at the resulting tree with your favourite tree viewer program, such as FigTree,
or try color_tree_labels.R in the useful_scripts folder on GitHub.


Punchline for pangenomes






