MakeHub User Guide
Author and Contact Information
Katharina J. Hoff, University of Greifswald, Institute for Mathematics and Computer Science, Bioinformatics Group (firstname.lastname@example.org)
- What is MakeHub?
- Data preparation
- Running MakeHub
- Example data
- Output of MakeHub
- How to use MakeHub output with UCSC Genome Browser
- Bug reporting
- Citing MakeHub
What is MakeHub?
MakeHub is a command line tool for the fully automatic generation of track data hubs [1] for visualizing genomes with the UCSC Genome Browser [2]. Track data hubs are data structures that contain all information about a genome that is required for visualizing it with the UCSC Genome Browser.
Assembly hubs need to be hosted on a publicly available webspace (that might be user/password protected) for usage with the UCSC genome browser.
MakeHub is implemented in Python3 and automatically executes tools provided by UCSC for generation of assembly hubs (http://hgdownload.soe.ucsc.edu/admin/exe) on Linux and MacOS X x86_64 computers. For visualization of RNA-Seq alignment data from BAM files, MakeHub uses Samtools [3]. If installed, the AUGUSTUS [4] tool bam2wig (https://github.com/Gaius-Augustus/Augustus) is used to speed up BAM to wig format conversion; otherwise, MakeHub performs the conversion with a built-in function.
MakeHub can either be used to create entirely new assembly hubs, or it can be used to add tracks to hubs that were previously created by MakeHub.
For display by the UCSC Genome Browser, assembly hubs need to be hosted on a publicly accessible web server.
MakeHub is a Python3 script for Linux or MacOS X with x86-64 architecture. It requires Python3, Biopython, gzip, sort and - in the case that BAM files are provided - samtools, and optionally the AUGUSTUS tool bam2wig.
Many users who create the input data for MakeHub, e.g. with BRAKER [5], have the required dependencies already installed on their system and may thus skip ahead to section Running MakeHub. In case of doubt, read the following sections about installation of Dependencies and MakeHub installation.
In the following, we give instructions on where dependencies can be obtained, and how they may be installed on Ubuntu Linux.
Python3 is available from https://www.python.org/downloads/, or as a package for many Unix systems. Choose version 3.5 or newer, because the subprocess module is not fully functional in older versions.
For example, on Ubuntu, install Python3 with:
sudo apt install python3
We recommend using pip for installing further Python modules. pip is available at https://pypi.org/project/pip/. It is also available as a package for many Unix systems.
For example, on Ubuntu, install pip with:
sudo apt install python3-pip
Further, MakeHub uses Biopython (e.g. for parsing a genome file in order to determine which parts of the genome have been masked for repeats). Install biopython with pip as follows:
pip3 install biopython
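To illustrate what "masked for repeats" means in practice, the following minimal sketch reports lowercase (softmasked) stretches of a sequence as zero-based, half-open intervals. It is not MakeHub's implementation: the function name softmasked_intervals is hypothetical, and the sketch uses a plain regular expression instead of Biopython so that it runs without additional dependencies.

```python
import re


def softmasked_intervals(sequence):
    """Return (start, end) pairs of lowercase runs, 0-based and half-open."""
    return [(m.start(), m.end()) for m in re.finditer(r"[a-z]+", sequence)]


# Hypothetical toy sequence with two softmasked stretches
print(softmasked_intervals("ACGTacgtnnACGTtt"))
```

Intervals in this form correspond to the coordinate convention of BED-like formats, which is why half-open coordinates were chosen for the sketch.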
MakeHub uses the following tools provided by UCSC at http://hgdownload.soe.ucsc.edu/admin/exe:
You may download these binaries and make them available in your $PATH. However, if you skip installing these tools, they will be downloaded during MakeHub execution, automatically.
In rare cases, particularly on older x86_64 Unix systems, the UCSC tools might throw errors because they are not statically linked in all parts, i.e. they will try to use some old system libraries and crash. If you observe this, try downloading the sources of KentUtils from GitHub and compiling them yourself. We have had the best experience with compiling Kent tools for MakeHub from https://github.com/ENCODE-DCC/kentUtils/.
MakeHub uses Samtools for BAM file sorting and conversion. Samtools is available at https://github.com/samtools/. It is also available as a package with many Linux distributions.
For example, on Ubuntu, install samtools with:
sudo apt install samtools
MakeHub has been tested with Samtools 1.8-20-g4ff8062. It is not fully backward compatible with older versions (for example, samtools 1.1 is incompatible).
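A version string such as the one quoted above can be compared against the tested release with a few lines of Python. This is only a sketch: the function names are hypothetical, and the threshold (1, 8) is an assumption derived from the tested version, since this guide only states that samtools 1.1 is known to be incompatible.

```python
import re


def parse_samtools_version(text):
    """Extract (major, minor) from a string such as 'samtools 1.8-20-g4ff8062'."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError("no version number found in: " + repr(text))
    return int(match.group(1)), int(match.group(2))


def is_tested_or_newer(text, minimum=(1, 8)):
    """True if the version is at least `minimum` (assumed threshold, see above)."""
    return parse_samtools_version(text) >= minimum
```

Comparing integer tuples rather than strings ensures that a release such as 1.10 is correctly treated as newer than 1.8.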
MakeHub uses gzip for compressing wig files that were created from BAM files. gzip is available at https://ftp.gnu.org/gnu/gzip/. It is often installed by default on Unix systems. If not, it is usually available as a package.
If missing, on Ubuntu, install with:
sudo apt install gzip
MakeHub uses Unix sort. sort should be installed by default on all Unix systems.
MakeHub can use the AUGUSTUS tool bam2wig, if that tool is available in the $PATH. bam2wig is available as part of AUGUSTUS at https://github.com/Gaius-Augustus/Augustus. Please follow the compilation instructions in Augustus/auxprogs/bam2wig/README.txt in case the default make command fails.
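Before running MakeHub, it can be convenient to check which of the dependencies listed above are present. The following is a minimal sketch under stated assumptions, not part of MakeHub itself: the names REQUIRED_TOOLS, OPTIONAL_TOOLS, missing_tools and check_environment are hypothetical, and the UCSC tools are left out because MakeHub downloads them automatically if they are missing.

```python
import importlib.util
import shutil
import sys

REQUIRED_TOOLS = ["gzip", "sort"]
OPTIONAL_TOOLS = ["samtools", "bam2wig"]  # samtools is only needed for BAM input


def missing_tools(tools, which=shutil.which):
    """Return the subset of `tools` that cannot be found in $PATH."""
    return [tool for tool in tools if which(tool) is None]


def check_environment():
    """Collect human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < (3, 5):
        problems.append("Python 3.5 or newer is required")
    if importlib.util.find_spec("Bio") is None:
        problems.append("Biopython is not importable")
    problems += ["missing tool: " + tool for tool in missing_tools(REQUIRED_TOOLS)]
    return problems


for problem in check_environment():
    print(problem)
print("optional tools not found:", missing_tools(OPTIONAL_TOOLS))
```

The lookup function is passed as a parameter so that the check can be exercised without depending on what is installed on a particular machine.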
MakeHub is a python3 script named make_hub.py. It does not require a particular installation procedure after download.
It can be executed by passing the script to the Python3 interpreter. Alternatively, if you add make_hub.py to your $PATH (e.g. by adding a line similar to PATH=/path/to/MakeHub:$PATH at the bottom of your ~/.bashrc file, followed by loading the ~/.bashrc file with source ~/.bashrc in case you did not open a new bash session) and make it executable (e.g. with chmod u+x make_hub.py), it can be executed by name from any location on your computer.
MakeHub accepts files in the following formats:
- genome file in FASTA format (simple FASTA headers without whitespaces or special characters); if the file is softmasked, a track with repeat information will automatically be generated. Note that the FASTA headers must be consistent with BAM-, hints- and gene prediction files.
- BAM file(s) with RNA-Seq to genome alignments
- gene prediction file(s) in GTF-format, e.g. from BRAKER
- AUGUSTUS hints files in BRAKER-specific GFF hints format
- gene prediction files in GFF3-format from MAKER [6] and Gemoma [7]
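The requirement of simple FASTA headers can be checked before running MakeHub. The sketch below is an illustration only: the function name problematic_headers is hypothetical, and the definition of "simple" (letters, digits, underscore, dot and hyphen) is an assumption, because this guide does not list the permitted characters explicitly.

```python
import re

# Assumed definition of a "simple" header; adjust if your data requires more.
SIMPLE_HEADER = re.compile(r"^[A-Za-z0-9_.\-]+$")


def problematic_headers(fasta_lines):
    """Return all FASTA headers that contain whitespace or special characters."""
    bad = []
    for line in fasta_lines:
        if line.startswith(">"):
            header = line[1:].rstrip("\r\n")
            if not SIMPLE_HEADER.match(header):
                bad.append(header)
    return bad


# Hypothetical toy input; with a real file, pass the open file handle instead.
example = [">LN902858_1\n", "ACGT\n", ">chr1 some description\n", "acgt\n"]
print(problematic_headers(example))
```

Because the function only iterates over lines, it can be applied to an open file handle without loading a large genome into memory.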
MakeHub can be used either to create new assembly hubs, or to add tracks to assembly hubs that had previously been created.
Creating a new hub
The essential arguments for creating a new assembly hub are:
--email EMAIL: Contact e-mail address for the assembly hub. This e-mail address will be displayed on all HTML pages that describe this hub and its tracks. Providing an e-mail address is a requirement for UCSC assembly hubs, as described e.g. at http://genomewiki.ucsc.edu/index.php/Assembly_Hubs and http://genomewiki.ucsc.edu/index.php/Public_Hub_Guidelines#Track_description_page_recommendations.
--genome GENOME: Genome file in FASTA format. If the file contains softmasked repeats, a repeat masking track with softmasking information will automatically be generated.
--short_label SHORT_LABEL: Short label (without whitespace and special characters) for identifying the assembly hub; it will also be used as the directory name of the hub.
When creating an assembly hub, we strongly recommend additionally using:
--long_label LONG_LABEL: Long label for the hub, e.g. the English organism name. If it contains whitespace, pass it in quotation marks:
--long_label "fruit fly"
When creating a hub, you may already supply information about all gene prediction and evidence tracks that you would like to see in your final hub. Please have a look at the section Options Explained for information about possible tracks. That section also describes how to add the Latin species name and the assembly version.
Usage example 1:
make_hub.py -l hmi1 -L "Rodent tapeworm" -g data/genome.fa -e \
    email@example.com
The resulting hub is trivial, as it only displays very basic information about the genome, such as the GC-content, restriction enzyme sites and repeat masking segments.
If you want to visualize the result, connect the following hub with the UCSC genome browser (see section How to use MakeHub output with UCSC Genome Browser): http://augustus.uni-greifswald.de/bioinf/makehub/examples/hmi1/hub.txt
Usage example 2:
make_hub.py -l hmi2 -L "Rodent tapeworm" -g data/genome.fa -e \
    firstname.lastname@example.org -a data/annot.gtf -b data/rnaseq.bam \
    -d
In comparison to the first example, the resulting hub has a track with reference annotation genes and a track with coverage information from RNA-Seq data, and it displays the native BAM file.
If you want to visualize the result, connect the following hub with the UCSC genome browser (see section How to use MakeHub output with UCSC Genome Browser): http://augustus.uni-greifswald.de/bioinf/makehub/examples/hmi2/hub.txt
Usage example 4:
make_hub.py -l hmi4 -L "Rodent tapeworm" -g data/genome.fa -e \
    email@example.com -a data/annot.gtf -b data/rnaseq.bam \
    -d -X data -M data/maker.gff -E data/gemoma.gff \
    -N "Hymenolepis microstoma" -V GCA_000469805.2
In comparison to the first two examples, the resulting hub has a large number of evidence and gene prediction tracks from BRAKER, MAKER and Gemoma.
If you want to visualize the result, connect the following hub with the UCSC genome browser (see section How to use MakeHub output with UCSC Genome Browser): http://augustus.uni-greifswald.de/bioinf/makehub/examples/hmi4/hub.txt
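When make_hub.py is called from another Python script, assembling the argument list programmatically avoids quoting mistakes with labels that contain whitespace. The following sketch uses only the short options shown in the usage examples above (-l, -L, -g, -e, -a, -b, -d); the function name build_make_hub_command and its parameters are hypothetical and not part of MakeHub.

```python
import shlex


def build_make_hub_command(short_label, genome, email, long_label=None,
                           annot=None, bams=(), display_bam=False):
    """Return the make_hub.py call as an argument list (hypothetical helper)."""
    cmd = ["make_hub.py", "-l", short_label, "-g", genome, "-e", email]
    if long_label:
        cmd += ["-L", long_label]
    if annot:
        cmd += ["-a", annot]
    if bams:
        cmd += ["-b", *bams]  # several BAM files are passed space separated
    if display_bam:
        cmd.append("-d")
    return cmd


cmd = build_make_hub_command("hmi1", "data/genome.fa", "email@example.com",
                             long_label="Rodent tapeworm")
# Shell-safe rendering, e.g. for logging; the list itself can be handed
# to subprocess.run(cmd, check=True) once make_hub.py is in your $PATH.
print(" ".join(shlex.quote(part) for part in cmd))
```

Passing a list to subprocess.run instead of a single string means that no shell is involved, so a long label such as "Rodent tapeworm" reaches make_hub.py as one argument.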
Adding tracks to existing hub
If a hub already exists, you may add tracks to this existing hub using the option --add_track. Besides the information about the tracks that you would like to add, the minimal required arguments are:
--email EMAIL: Contact e-mail address for the assembly hub.
--short_label SHORT_LABEL: Short label (without whitespace and special characters) for identifying the assembly hub.
--add_track: Add track(s) to an existing hub.
Usage example 3:
First, we create a novel track hub hmi3 that is identical to Usage example 2:
make_hub.py -l hmi3 -L "Rodent tapeworm" -g data/genome.fa -e \
    firstname.lastname@example.org -a data/annot.gtf -b data/rnaseq.bam \
    -d
Subsequently, we add a number of tracks:
make_hub.py -l hmi3 -e email@example.com -i data/hintsfile.gff \
    -A -M data/maker.gff -X data
The resulting hub has many gene prediction tracks from the BRAKER output directory data and from the MAKER output file.
Let's add one more track (only for the sake of demonstration; this track could, of course, have been included in the previous example or when the hub was created):
make_hub.py -l hmi3 -e firstname.lastname@example.org -i data/hintsfile.gff \
    -A -E data/gemoma.gff
If you want to visualize the result, connect the following hub with the UCSC genome browser (see section How to use MakeHub output with UCSC Genome Browser): http://augustus.uni-greifswald.de/bioinf/makehub/examples/hmi3/hub.txt
In the following, we explain all options of make_hub.py:
-h, --help: Print help message and exit.
-p, --printUsageExamples: Print usage examples for make_hub.py to command line (for demonstration).
-e EMAIL, --email EMAIL: Contact e-mail address for the assembly hub. This is a requirement for all publicly listed assembly hubs. It is obligatory for make_hub.py.
-g GENOME, --genome GENOME: Genome file in FASTA format. If the file is softmasked for repeats, a repeat masking track will automatically be generated, unless the following option is activated:
-n, --no_repeats: Disable repeat track generation from the softmasked genome sequence (this may save runtime, particularly for large genomes).
-L LONG_LABEL, --long_label LONG_LABEL: Long label for the hub, e.g. the English organism name. If it contains whitespace, pass it in quotation marks:
--long_label "fruit fly"
-l SHORT_LABEL, --short_label SHORT_LABEL: Short label (without whitespace and special characters) for identifying the assembly hub. The short label will also be used as assembly version, unless the following option is specified:
-V ASSEMBLY_VERSION, --assembly_version ASSEMBLY_VERSION: Assembly version, e.g. "BDGP R4/dm3". This argument must be provided if the hub is supposed to be added to the public UCSC list.
-N LATIN_NAME, --latin_name LATIN_NAME: Latin species name, e.g. "Drosophila melanogaster". This argument must be provided if the hub is supposed to be added to the public UCSC list.
-s SAMTOOLS_PATH, --SAMTOOLS_PATH SAMTOOLS_PATH: Path to samtools executable. By default, make_hub.py will search for a samtools executable in your $PATH. On some systems, e.g. high performance compute clusters, it may be more convenient to specify the path to samtools with this option while calling make_hub.py.
-B BAM2WIG_PATH, --BAM2WIG_PATH BAM2WIG_PATH: Path to bam2wig executable. bam2wig from AUGUSTUS auxprogs is not required for converting a BAM to a WIG file with make_hub.py. It may be a little faster than the built-in conversion function, though. By default, make_hub.py will search for a bam2wig executable in your $PATH. On some systems, e.g. high performance compute clusters, it may be more convenient to specify the path to bam2wig with this option while calling make_hub.py.
-b BAM [BAM ...], --bam BAM [BAM ...]: BAM file(s), space separated, with RNA-Seq information; they will be displayed as BigWig coverage tracks.
-d, --display_bam_as_bam: Display BAM file(s) as bam tracks (in addition to BigWig coverage tracks).
-c CORES, --cores CORES: Number of cores for samtools sort processes that are used for producing BAM tracks. Usage of more than one core may significantly speed up track generation.
-a ANNOT, --annot ANNOT: GTF file with reference annotation (may be particularly interesting to visualize in case of re-annotation of genomes).
-X BRAKER_OUT_DIR, --braker_out_dir BRAKER_OUT_DIR: BRAKER output directory with GTF files. If this option is specified, the following options are set automatically, using the files in BRAKER_OUT_DIR (if these files exist):
-i HINTS, --hints HINTS
-t TRAINGENES, --traingenes TRAINGENES
-m GENEMARK, --genemark GENEMARK
-w AUG_AB_INITIO, --aug_ab_initio AUG_AB_INITIO
-x AUG_HINTS, --aug_hints AUG_HINTS
-y AUG_AB_INITIO_UTR, --aug_ab_initio_utr AUG_AB_INITIO_UTR
-z AUG_HINTS_UTR, --aug_hints_utr AUG_HINTS_UTR
-i HINTS, --hints HINTS: GFF file with BRAKER hints (AUGUSTUS-specific GFF format of BRAKER).
-t TRAINGENES, --traingenes TRAINGENES: GTF file with training genes.
-m GENEMARK, --genemark GENEMARK: GTF file with GeneMark predictions.
-w AUG_AB_INITIO, --aug_ab_initio AUG_AB_INITIO: GTF file with ab initio AUGUSTUS predictions.
-x AUG_HINTS, --aug_hints AUG_HINTS: GTF file with AUGUSTUS predictions with hints.
-y AUG_AB_INITIO_UTR, --aug_ab_initio_utr AUG_AB_INITIO_UTR: GTF file with ab initio AUGUSTUS predictions with UTRs.
-z AUG_HINTS_UTR, --aug_hints_utr AUG_HINTS_UTR: GTF file with AUGUSTUS predictions with hints with UTRs.
-M MAKER_GFF, --maker_gff MAKER_GFF: MAKER2 output file in GFF3 format. This file could be the result of a gff3_merge -d *_master_datastore_index.log command.
-E GEMOMA_FILTERED_PREDICTIONS, --gemoma_filtered_predictions GEMOMA_FILTERED_PREDICTIONS: GFF3 output file of Gemoma (filtered_predictions.gff).
-G GENE_TRACK [GENE_TRACK ...], --gene_track GENE_TRACK [GENE_TRACK ...]: Gene track with user specified label. For adding a single track, the argument must be formatted as follows:
--gene_track file.gtf tracklabel
-A, --add_track: Add track(s) to an existing hub.
-o OUTDIR, --outdir OUTDIR: Output directory to write hub to (default is the current working directory). This directory must be writable.
-r, --no_tmp_rm: Do not delete temporary files (e.g. for debugging purposes).
-v VERBOSITY, --verbosity VERBOSITY: If INT VERBOSITY > 0, verbose logging output is produced (e.g. for debugging purposes).
Example data is located in the directory
It consists of the following files:
genome.fa: sequence LN902858_1 of Hymenolepis microstoma, assembly version GCA_000469805.2 from GenBank.
rnaseq.bam: RNA-Seq reads of library ERR337976 that were mapped to sequence LN902858_1 with Hisat2.
annot.gtf: NCBI reference annotation of scaffold LN902858_1.
augustus.ab_initio.gtf: AUGUSTUS ab initio gene predictions from a BRAKER run (run was performed on the complete genome, predictions corresponding to LN902858_1 were extracted) with Hisat2 alignments from RNA-Seq library ERR337976.
augustus.hints.gtf: AUGUSTUS gene predictions with hints from a BRAKER run (run was performed on the complete genome, predictions corresponding to LN902858_1 were extracted) with Hisat2 alignments from RNA-Seq library ERR337976.
GeneMark-ET/genemark.gtf: GeneMark-ES/ET predictions from a BRAKER run (run was performed on the complete genome, predictions corresponding to LN902858_1 were extracted) with Hisat2 alignments from RNA-Seq library ERR337976.
hintsfile.gff: Hints from a BRAKER run (run was performed on the complete genome, hints corresponding to LN902858_1 were extracted) with Hisat2 alignments from RNA-Seq library ERR337976.
gemoma.gff: Gemoma predictions from a Gemoma run with Hisat2 alignments from RNA-Seq library ERR337976 and proteins of Echinococcus multilocularis. (Run was performed on the complete genome, predictions corresponding to LN902858_1 were extracted)
maker.gff: MAKER2 predictions from a run with BRAKER gene models as model_gff, Cufflinks assembly of Hisat2 alignments of RNA-Seq library ERR337976, a custom repeat library for RepeatMasker, AUGUSTUS with BRAKER-trained parameters, BUSCO predictions as evidence, and GeneMark-ES/ET predictions with BRAKER-trained parameters.
Output of MakeHub
make_hub.py produces a directory that is named identically to the argument of option --short_label. In the following, we assume the short label had been species. The directory species contains the following files:
hub.txt - this file contains basic information about the assembly hub, for example, the short and long labels, a reference to genomes.txt, and contact information.
genomes.txt - this file contains references to the hub configuration files (such as groups.txt), as well as, for example, a default browsing location.
aboutHub.html - this file should contain a meaningful description of your assembly hub. Please edit this file manually.
The directory species contains another directory, in which the hub configuration files (such as groups.txt), as well as all files that are required for browsing tracks, reside. The number of files may differ depending on how many tracks have actually been created.
species also contains *.html files for all tracks. These files should be edited manually so that their content is meaningful.
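Before uploading a hub, the presence of the top-level files described above can be verified with a short script. This is a minimal sketch: the names TOP_LEVEL_FILES and missing_hub_files are hypothetical, and the check covers only hub.txt, genomes.txt and aboutHub.html, not the track files in the nested directory.

```python
from pathlib import Path

TOP_LEVEL_FILES = ("hub.txt", "genomes.txt", "aboutHub.html")


def missing_hub_files(hub_dir):
    """Return the expected top-level files that are absent from `hub_dir`."""
    hub_dir = Path(hub_dir)
    return [name for name in TOP_LEVEL_FILES if not (hub_dir / name).is_file()]


# "species" is the example short label used in this section.
print(missing_hub_files("species"))
```

An empty list indicates that all three files exist; it does not say anything about whether aboutHub.html has already been edited.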
How to use MakeHub output with UCSC Genome Browser
Copy the complete hub folder (e.g. species) to a publicly accessible web server.
Go to https://genome.ucsc.edu/index.html, click on Track Hubs -> My Hubs and add the link to your hub.txt file into the URL window.
Subsequently, click on
Before reporting bugs, please check that you are using the most recent version of MakeHub. Also, check the open and closed issues on GitHub at https://github.com/Gaius-Augustus/MakeHub/issues for possible solutions to your problem.
Reporting bugs on GitHub
Information worth mentioning in your bug report:
make_hub.py prints information about separate steps on STDOUT. Please let us know at which step and with what error message make_hub.py caused problems.
Hoff KJ, “MakeHub: Fully automated generation of UCSC Genome Browser Assembly Hubs.” bioRxiv: doi: https://doi.org/10.1101/550145
All source code is under the GNU General Public License 3.0 (see https://www.gnu.org/licenses/gpl-3.0.de.html).
[1] Raney BJ, Dreszer TR, Barber GP, Clawson H, Fujita PA, Wang T, Nguyen N, Paten B, Zweig AS, Karolchik D, Kent WJ. 2014. “Track Data Hubs.” Bioinformatics 30(7):1003-1005.
[2] Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, Haussler D. 2002. “UCSC Genome Browser.” Genome Res. 12(6):996-1006.
[3] Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R. 2009. “The sequence alignment/map format and SAMtools.” Bioinformatics 25(16):2078-2079.
[4] Stanke M, Diekhans M, Baertsch R, Haussler D. 2008. “Using native and syntenically mapped cDNA alignments to improve de novo gene finding.” Bioinformatics 24(5):637-644.
[5] Hoff KJ, Lange S, Lomsadze A, Borodovsky M, Stanke M. 2015. “BRAKER1: unsupervised RNA-Seq-based genome annotation with GeneMark-ET and AUGUSTUS.” Bioinformatics 32(5):767-769.
[6] Holt C, Yandell M. 2011. “MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects.” BMC Bioinformatics 12(1):491.
[7] Keilwagen J, Hartung F, Paulini M, Twardziok SO, Grau J. 2018. “Combining RNA-seq data and homology-based gene prediction for plants, animals and fungi.” BMC Bioinformatics 19(1):189.