This README contains terse notes for developers. See smrtflow.readthedocs.io for the full documentation.

"SMRT" refers to PacBio's sequencing technology; "smrtflow" is a collection of common models, the Resolved Tool Contract/Tool Contract interface, commandline analysis tools, a service orchestration engine, and web services, all written in Scala. PacBio pipelines can be run by leveraging the PacBio workflow engine, pbsmrtpipe (which must be installed).
This code is written in Scala and organized as an SBT multi-project.

Requires:

- java >= 1.8.0_71
- sbt == 0.13.11
```bash
# clone the repo
git clone https://github.com/PacificBiosciences/smrtflow.git

# use SBT to build and run tests
sbt clean pack test

# see also interactive `sbt` commands: `help`, `project`, `test`, `coverage`, `run`, ...
# sbt
# > help
```
Build the commandline tools:

```bash
make tools
# or
sbt pack
```

Add the tools to your PATH:

```bash
source setup-tools-env.sh
pbservice --help
fasta-to-reference --help
```
See the full docs for details and examples of using the SL tools, such as `pbservice` or `fasta-to-reference`.
## Running Postgres
On the cluster:

```bash
module load jdk/1.8.0_71 postgresql
export PGDATA=/localdisk/scratch/$USER/pg
mkdir -p $PGDATA
# on a shared machine, choose a PGPORT that's not already in use
export PGPORT=5442
initdb
perl -pi.orig -e "s/#port\s*=\s*(\d+)/port = $PGPORT/" $PGDATA/postgresql.conf
pg_ctl -l $PGDATA/postgresql.log start
createdb smrtlinkdb
psql -d smrtlinkdb < extras/db-init.sql       # for the run services, or
psql -d smrtlinkdb < extras/test-db-init.sql  # for the test DB used in the *Spec.scala tests;
                                              # the DB tables are dropped and the migrations
                                              # are run before each Spec
export SMRTFLOW_DB_PORT=$PGPORT
```
Other custom DB values:

| ENV | Property (`-D<key>=<value>`) |
| --- | --- |
| `SMRTFLOW_DB_USER` | `smrtflow.db.properties.user` |
| `SMRTFLOW_DB_PASSWORD` | `smrtflow.db.properties.password` |
| `SMRTFLOW_DB_PORT` | `smrtflow.db.properties.portNumber` |
| `SMRTFLOW_DB_HOST` | `smrtflow.db.properties.serverName` |
| `SMRTFLOW_DB_NAME` | `smrtflow.db.properties.databaseName` |
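As the table above suggests, each setting can be supplied either as an environment variable or as the corresponding JVM system property. A minimal sketch of the two equivalent forms (the host name `my-db-host` is a made-up value for illustration):

```shell
# Hypothetical illustration: override DB settings via environment variables,
# which the server's config maps onto the smrtflow.db.properties.* keys.
export SMRTFLOW_DB_HOST=my-db-host
export SMRTFLOW_DB_PORT=5442
echo "env override: $SMRTFLOW_DB_HOST:$SMRTFLOW_DB_PORT"

# Equivalent system-property form (shown here, not executed):
echo 'sbt -Dsmrtflow.db.properties.serverName=my-db-host -Dsmrtflow.db.properties.portNumber=5442 "smrt-server-analysis/run"'
```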
To run tests, also set:

```bash
export SMRTFLOW_TEST_DB_PORT=$PGPORT
```
Test DB configuration for running unit tests:

| ENV | Property (`-D<key>=<value>`) |
| --- | --- |
| `SMRTFLOW_TEST_DB_USER` | `smrtflow.test-db.properties.user` |
| `SMRTFLOW_TEST_DB_PASSWORD` | `smrtflow.test-db.properties.password` |
| `SMRTFLOW_TEST_DB_PORT` | `smrtflow.test-db.properties.portNumber` |
| `SMRTFLOW_TEST_DB_HOST` | `smrtflow.test-db.properties.serverName` |
| `SMRTFLOW_TEST_DB_NAME` | `smrtflow.test-db.properties.databaseName` |
## Launching SMRT Link/Analysis Services
```bash
sbt "smrt-server-analysis/run"
```

Set a custom port:

```bash
export PB_SERVICES_PORT=9997
sbt "smrt-server-analysis/run"
```
See the full docs for details, and `reference.conf` for more configuration parameters.
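Once the services are running, a quick smoke test is to hit the status endpoint. This is a sketch under assumptions: that the services expose `/status` on the configured port, and that `PB_SERVICES_PORT` was exported as above (adjust host and port to your deployment):

```shell
# Sketch of a smoke test; assumes the services expose /status on
# PB_SERVICES_PORT on localhost. Adjust to your configuration.
PB_SERVICES_PORT=${PB_SERVICES_PORT:-9997}
STATUS_URL="http://localhost:${PB_SERVICES_PORT}/status"
echo "GET ${STATUS_URL}"
# curl -s "${STATUS_URL}"   # uncomment once the server is up
```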
The SMRT Link Analysis Services are documented using the Swagger Specification. To validate the swagger file:

```bash
npm install swagger-cli
node_modules/swagger-cli/bin/swagger.js validate /path/to/smrtlink_swagger.json
```

The Swagger UI editor can also be used to import and edit the swagger file from a file or URL.
Many core data models are described using XSDs. See the resources dir for details, and the README there for generating the Java classes from the XSDs. Also see the common model schemas (e.g., Report, ToolContract, DataStore, Pipeline, PipelineView Rules).
Interactively load the smrtflow library code and execute expressions:

```bash
sbt smrtflow/test:console
```

```
Welcome to Scala 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_101).
Type in expressions for evaluation. Or try :help.

welcomeBanner: Some[String] = Some(Welcome to the smrtflow REPL)
import ammonite.repl._
import ammonite.ops._
res0: Any = ()

@ import java.nio.file.Paths
import java.nio.file.Paths

@ val f = "/Users/mkocher/gh_mk_projects/smrtflow/PacBioTestData/data/SubreadSet/m54006_160504_020705.tiny.subreadset.xml"
f: String = "/Users/mkocher/gh_mk_projects/smrtflow/PacBioTestData/data/SubreadSet/m54006_160504_020705.tiny.subreadset.xml"

@ val px = Paths.get(f)
px: java.nio.file.Path = /Users/mkocher/gh_mk_projects/smrtflow/PacBioTestData/data/SubreadSet/m54006_160504_020705.tiny.subreadset.xml

@ import com.pacbio.secondary.analysis.datasets.io._
import com.pacbio.secondary.analysis.datasets.io._

@ val sset = DataSetLoader.loadSubreadSet(px)
sset: com.pacificbiosciences.pacbiodatasets.SubreadSet = com.pacificbiosciences.pacbiodatasets.SubreadSet@62c0ff68

@ sset.getName
res5: String = "subreads-sequel"

@ println("Services Example")
Services Example

@ import akka.actor.ActorSystem
import akka.actor.ActorSystem

@ implicit val actorSystem = ActorSystem("demo")
actorSystem: ActorSystem = akka://demo

@ import com.pacbio.secondary.smrtserver.client.{AnalysisServiceAccessLayer => Sal}
import com.pacbio.secondary.smrtserver.client.{AnalysisServiceAccessLayer => Sal}

@ val sal = new Sal("smrtlink-bihourly", 8081)
sal: com.pacbio.secondary.smrtserver.client.AnalysisServiceAccessLayer = com.pacbio.secondary.smrtserver.client.AnalysisServiceAccessLayer@8639ea4

@ val fx = sal.getStatus
fx: concurrent.Future[com.pacbio.common.models.ServiceStatus] = Success(ServiceStatus(smrtlink_analysis,Services have been up for 46 minutes and 59.472 seconds.,2819472,6d87566f-3433-4d73-8953-92673cc50f80,0.1.10-c63303e,secondarytest))

@ actorSystem.shutdown

@ exit
Bye!

scala> :quit
```
At a minimum, running integration-test analysis jobs requires installing pbsmrtpipe (in a virtualenv). Specific pipelines will have dependencies on external executables, such as samtools or blasr.
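The dependencies above can be checked before kicking off jobs. A hypothetical pre-flight sketch (not part of the repo) that verifies the external executables are on PATH:

```shell
# Hypothetical pre-flight check: verify that the external executables
# integration-test pipelines shell out to are available on PATH.
missing=""
for exe in pbsmrtpipe samtools blasr; do
  if command -v "$exe" >/dev/null 2>&1; then
    echo "found:   $exe"
  else
    echo "missing: $exe"
    missing="$missing $exe"
  fi
done
```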
- Set up a PostgreSQL 9.6.1 instance (see configuration above)
- Install pbsmrtpipe in a virtualenv
- Enable the Scala tools via `make tools`
- Add the tools to your PATH using `source setup-tools-env.sh`
- Test with `which pbservice` or `pbservice --help`
- Fetch PacBioTestData: `make PacBioTestData`
- Launch services: `make start-smrt-server-analysis` or `make start-smrt-server-analysis-jar`
- Import PacBioTestData: `make import-pbdata`
- Import the canned ReferenceSet and SubreadSet: `make test-int-import-data`
- Run the dev_diagnostic_stress test: `make test-int-run-analysis-stress`