The main features of BioJazz are:
- evolves both network topology and connection weights
- designs a network *de novo*, or starting from a user-specified seed network
- uses workstation clusters to speed up the design
- produces a human-readable model of the winning network
- highly configurable and parameterized
Song Feng, Julien F. Ollivier, Peter S. Swain, and Orkun S. Soyer. BioJazz: *in silico* evolution of cellular networks with unbounded complexity using rule-based modeling. *Nucleic Acids Res.* 2015: gkv595.
BioJazz requires the ANC and Facile tools. You can tell BioJazz where to find them by setting the ANC_HOME and FACILE_HOME environment variables to point to the appropriate directories. It is recommended to add the following lines to your `~/.bashrc` file:
```shell
export ANC_HOME=~/workspace/anc
export FACILE_HOME=~/workspace/facile
alias anc='$ANC_HOME/anc.pl'
alias facile='$FACILE_HOME/facile.pl'
export BIOJAZZ_HOME=~/workspace/biojazz
alias biojazz='$BIOJAZZ_HOME/biojazz.pl'
```
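To confirm that these variables point at the right places, a small check like the following can help. This is a sketch, not part of BioJazz: `check_biojazz_env` and `check_one` are hypothetical helper names, and the check only assumes the script names `anc.pl`, `facile.pl` and `biojazz.pl` that the aliases above call.

```shell
# Hypothetical helpers (not part of BioJazz): verify that each *_HOME
# variable is set and that the script its alias calls exists there.

# check_one NAME VALUE SCRIPT  (NAME is only used in error messages)
check_one() {
    if [ -z "$2" ]; then
        echo "$1 is not set" >&2
        return 1
    elif [ ! -x "$2/$3" ]; then
        echo "$2/$3 is missing or not executable (check $1)" >&2
        return 1
    fi
    return 0
}

# Runs every check (so all problems are reported), then fails if any did.
check_biojazz_env() {
    _status=0
    check_one ANC_HOME "$ANC_HOME" anc.pl || _status=1
    check_one FACILE_HOME "$FACILE_HOME" facile.pl || _status=1
    check_one BIOJAZZ_HOME "$BIOJAZZ_HOME" biojazz.pl || _status=1
    return "$_status"
}
```

After `source ~/.bashrc`, running `check_biojazz_env && echo ok` prints `ok` only when all three scripts are found; each missing piece is reported on stderr.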
BioJazz requires MATLAB to be installed on all nodes used for computation, and assumes that MATLAB can be started with the command `matlab`. Here is an example configuration in the `~/.bashrc` file (on Mac OS X):
```shell
export MATLAB_HOME=/Applications/MATLAB_R2011b.app/bin
alias matlab='$MATLAB_HOME/matlab'
export PATH=$MATLAB_HOME:$PATH
DYLD_LIBRARY_PATH=/Applications/MATLAB_R2011b.app/bin/maci64:/Applications/MATLAB_R2011b.app/sys/os/maci64:/Applications/MATLAB_R2011b.app/runtime/maci64:$DYLD_LIBRARY_PATH
export DYLD_LIBRARY_PATH
```
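A similar sketch can check the MATLAB part of the setup. It assumes the `MATLAB_HOME` variable from the example above; `check_matlab` is a hypothetical helper, not a BioJazz command. Because bash does not expand aliases in non-interactive shells by default, the sketch checks that `$MATLAB_HOME` is on `PATH` rather than relying on the `matlab` alias.

```shell
# Hypothetical helper (not part of BioJazz): check the MATLAB setup from the
# ~/.bashrc example. Aliases only help interactive shells, so the PATH entry
# is what lets scripts find `matlab`.
check_matlab() {
    if [ -z "$MATLAB_HOME" ]; then
        echo "MATLAB_HOME is not set" >&2
        return 1
    fi
    if [ ! -x "$MATLAB_HOME/matlab" ]; then
        echo "$MATLAB_HOME/matlab is missing or not executable" >&2
        return 1
    fi
    # Surround with colons so that only whole PATH entries match.
    case ":$PATH:" in
        *":$MATLAB_HOME:"*) ;;
        *)
            echo "MATLAB_HOME is not on PATH" >&2
            return 1
            ;;
    esac
    echo "$MATLAB_HOME/matlab"
}
```

On success the function prints the path of the `matlab` executable it found. Since every workstation needs the same setup, it is worth running on each node.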
Note that if you decide to use a cluster of workstations, these installation instructions apply to all workstations used.
##### CPAN modules
CPAN is an online archive of Perl modules. BioJazz, ANC and Facile use several of them, and they must be installed prior to use. Installing them system-wide requires system administrator privileges; if you do not have these, the modules can instead be installed in your home directory. You or your sysadmin will typically need to run the following commands on each system used (prefix them with `sudo` if available):
```shell
cpan -i Class::Std
cpan -i Class::Std::Storable
cpan -i String::CRC32
cpan -i Expect
cpan -i Carp
cpan -i WeakRef
cpan -i IPC::Shareable
cpan -i Linux::Pid
cpan -i Text::CSV
```
Test your installation by running Facile, ANC and BioJazz without any arguments:
```shell
$FACILE_HOME/facile.pl
$ANC_HOME/anc.pl
$BIOJAZZ_HOME/biojazz.pl
```
An error will be reported if any of the required modules are still missing. Simply run CPAN again to install the missing module.
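Rather than waiting for an error at startup, you can ask `perl` directly which of the modules are missing: `perl -MModule -e 1` exits with a non-zero status when `Module` cannot be loaded. The helper name `missing_modules` and the variable `BIOJAZZ_MODULES` are hypothetical; the list simply repeats the modules from the `cpan` commands above.

```shell
# Hypothetical helper (not part of BioJazz): print each module that perl
# cannot load. `perl -MModule -e 1` exits non-zero when Module is missing.
missing_modules() {
    for _mod in "$@"; do
        perl -M"$_mod" -e 1 2>/dev/null || echo "$_mod"
    done
}

# The modules installed with cpan in the CPAN section.
BIOJAZZ_MODULES="Class::Std Class::Std::Storable String::CRC32 Expect Carp
WeakRef IPC::Shareable Linux::Pid Text::CSV"
```

For example, `for m in $(missing_modules $BIOJAZZ_MODULES); do cpan -i "$m"; done` installs only what is still missing (the unquoted expansion is intentional, to split the list into words).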
If you would like ANC to generate diagrams of the reaction network and species, you will need the `dot` application (part of Graphviz) and the following CPAN module:
```shell
cpan -i GraphViz
```
#### Workspace Creation
Depending on your specific application, BioJazz will require some customized configuration and scoring functions. Also, during a single design run, BioJazz will generate a large number of files. For this reason, the user must create a properly configured workspace that contains the appropriate configuration files, scoring functions, and design files.
To facilitate this, BioJazz can create the workspace for you and populate it with the required directories and with template files to get you started. To do this, run the following command:
This will create the directory `bjazz` and various sub-directories including `config` and `custom`. Your configuration files go in the `config` directory, while your custom scoring functions go in the `custom` directory. At this point, you should familiarize yourself with some of the template files that are provided, and try to run BioJazz.
The template file will try to design a network which contains a high concentration of dimers, and demonstrates how to use some of the functions available to the user.
```shell
cd bjazz
less config/template.cfg    # template configuration file
less config/Template.pm     # template application-specific scoring function
```
After taking a look around, try to run BioJazz. The `cluster_type` and `cluster_size` arguments override the specification contained in the configuration file, and will launch both slave nodes of the cluster on your machine:

```shell
biojazz --config=config/template.cfg --tag=first_try --cluster_type="LOCAL" --cluster_size=2
```
This will evolve the network for only a couple of generations. The `tag` argument is very important. In BioJazz, each design attempt is associated with a specific, user-specified tag. BioJazz will create a directory in your workspace containing all the results and other files generated during the optimization. This allows the user to attempt several optimizations simultaneously without fear of accidentally losing files. The name of the design's working directory is `<work_dir>/<tag>`, where the `work_dir` parameter is specified in your configuration file (and has a value of `template` in this example). Thus the results of the above run are contained in the directory `template/first_try`. A listing of such a directory (here from a run tagged `test_main`) looks like this:
```
[user@host bjazz]$ ls -la template/test_main/
total 168
drwx------ 5 user group  4096 2008-06-03 14:53 .
drwx------ 3 user group  4096 2008-06-03 14:51 ..
drwx------ 2 user group  4096 2008-06-03 14:53 matlab
drwx------ 2 user group  4096 2008-06-03 14:53 obj
-rw------- 1 user group 71904 2008-06-03 14:53 ScorNode.0.20080603_145159.log
-rw------- 1 user group 67215 2008-06-03 14:53 ScorNode.1.20080603_145159.log
drwx------ 2 user group  4096 2008-06-03 14:51 source_2008-06-03-14:51:58
```
In this directory:
- the `obj` directory contains all the genomes generated, in machine-readable form;
- the `matlab` directory contains the models generated by ANC and the MATLAB scripts generated by Facile;
- the `ScorNode*` files log the activity of each node in the cluster as it compiles and scores genomes;
- the `source*` directory is a snapshot of the source code used for that run, including your configuration and custom scoring files.
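When comparing several tagged runs, a quick summary of each results directory can be handy. The sketch below assumes the layout shown in the listing above (`<work_dir>/<tag>` containing an `obj` directory and `ScorNode.*.log` files) and that genome files end in `.obj`, as in `G199_I00.obj`; `summarize_run` is a hypothetical helper, not a BioJazz command.

```shell
# Hypothetical helper (not part of BioJazz): count genome objects and
# scoring-node logs in <work_dir>/<tag>, assuming the layout of the
# example listing.
summarize_run() {
    _run_dir="$1/$2"
    if [ ! -d "$_run_dir" ]; then
        echo "no such run directory: $_run_dir" >&2
        return 1
    fi
    # tr strips the padding that some versions of wc add to the count.
    _n_obj=$(find "$_run_dir/obj" -name '*.obj' 2>/dev/null | wc -l | tr -d ' ')
    _n_logs=$(find "$_run_dir" -maxdepth 1 -name 'ScorNode.*.log' | wc -l | tr -d ' ')
    echo "genomes: $_n_obj scoring-node logs: $_n_logs"
}
```

Run from the workspace, `summarize_run template first_try` prints one line of the form `genomes: N scoring-node logs: M`.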
Now try modifying the configuration file to use other available workstations and run BioJazz again...
#### Using a Makefile
The program `make` is a UNIX classic and is useful in the context of BioJazz, not only for running BioJazz's suite of tests but also as a sophisticated aliasing mechanism. A Makefile is provided when your workspace is created; it contains some useful functions and can serve as a template for customization. The table below documents a number of Makefile targets that correspond to the most common tasks you will perform. Explaining how GNU make works is well beyond the scope of this document, but one of make's most useful switches is the dry-run (`-n`) switch, which shows the commands that would be executed without actually executing them. You can use it to see the BioJazz command and arguments generated for each of the targets below.
| Target (arguments) | Example | Description |
|---|---|---|
| `test_custom` (module) | `make test_custom module=MyDesign` | Executes the `run_testcases` routine that should be contained in your application-specific design class. Typically, `run_testcases` should hand-craft a genome, then run your scoring function on it. This allows you to test your scoring function before trying to evolve any networks. |
| `evolve` (config, tag, cluster_type, cluster_size) | `make evolve config=ultrasensitive.cfg tag=08jan01 cluster_type=SSH cluster_size=5` | Starts BioJazz and evolves a circuit as specified by the given configuration file. All output is echoed to your terminal and saved to a logfile for subsequent analysis. |
| `score` (config, tag, genome) | `make score config=ultrasensitive.cfg tag=08jun25c genome=ultrasensitive/08jan01/obj/G199_I00.obj` | Runs your custom scoring function on the specified genome. All files are created in the scratch space for this design. This target is useful for debugging unanticipated problems that crop up for a specific genome during an evolution run. |
| `collect` (config, tag) | `make collect config=ultrasensitive.cfg tag=08jan01` | Collects population data from the logfile generated during an evolution run, and saves them to a file in object form. |
| `analyze` (config, tag) | `make analyze config=ultrasensitive.cfg tag=08jan01` | Reads the population data collected above and performs some simple analysis tasks, such as finding the top-scoring individual and computing statistics. The results are saved to an Excel spreadsheet for charting. |
| `shell` (config, tag) | | Starts BioJazz in shell mode. |
| `load` (config, tag, genome) | | Starts BioJazz, loads the specified genome, and enters shell mode. |
| `retag` (config, old_tag, new_tag) | `make retag config=ultrasensitive.cfg old_tag=temp new_tag=08jan01` | Moves and renames design files as appropriate so that they appear under `new_tag`. |
| `clean` (config, tag) | `make clean config=ultrasensitive.cfg tag=08jan01` | Deletes all design files of the given tag. |
| `clean_all` (config, tag) | `make clean_all config=ultrasensitive.cfg tag=08jan01` | Deletes all design files of the given tag, including those in the scratch space (generated by the `score` target). |
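The `evolve`, `collect` and `analyze` targets are often run in sequence for a single tag. Below is a minimal sketch of a wrapper, assuming the argument names shown in the table; `run_pipeline` is hypothetical, and the defaults `LOCAL` and `2` are borrowed from the earlier `biojazz` example rather than from the provided Makefile.

```shell
# Hypothetical wrapper (not part of the provided Makefile): run evolve,
# collect and analyze for one tag, stopping at the first failing step.
# usage: run_pipeline CONFIG TAG [CLUSTER_TYPE] [CLUSTER_SIZE]
run_pipeline() {
    _config=$1
    _tag=$2
    _cluster_type=${3:-LOCAL}   # assumed default, as in the biojazz example
    _cluster_size=${4:-2}       # assumed default, as in the biojazz example
    make evolve config="$_config" tag="$_tag" \
        cluster_type="$_cluster_type" cluster_size="$_cluster_size" &&
    make collect config="$_config" tag="$_tag" &&
    make analyze config="$_config" tag="$_tag"
}
```

For example, `run_pipeline ultrasensitive.cfg 08jan01 SSH 5` would issue the `evolve`, `collect` and `analyze` commands from the table. To preview the generated BioJazz commands instead, the `make` calls can be changed to `make -n`.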
#### Scoring a specific genome
#### Collecting and Analyzing Statistics
#### Writing and Testing Application Specific Functions
##### Debugging when things go wrong
#### Workspace Directory Structure
```
bjazz                                # workspace home
    config                           # configuration files
    custom                           # application-specific modules and functions (incl. scoring function)
    test/custom                      # recommended location for test results of custom modules
    test/modules                     # BioJazz module test results
    mydesign                         # application-specific directory
        mydesign.08jun01.log         # master node logfile
        08jun01                      # results directory for run with TAG=08jun01
            ScorNode.i.timestamp.log # log file for slave node i
            matlab                   # ANC genome models, eqn files, and matlab files
            obj                      # genome objects in binary form
```
#### Authors and Contributors
The project was first developed by Julien Ollivier, then modified and currently maintained by Song Feng (@LifeWorks).
#### Support or Contact
If you have any problems, please contact @LifeWorks or email firstname.lastname@example.org