TechnicalManual

hupponen edited this page Feb 23, 2018 · 127 revisions

Technical manual for Chipster

The manual covers Chipster platform version 3.11 and later. It explains how to set up your own Chipster server, add your own tools to Chipster, and more. For the user manual, please see http://chipster.csc.fi/manual/. The technical manual for older versions is here.

Introduction

In the basic setup, Chipster is a client-server system. Chipster server can be run on a single server computer or even a laptop. The Chipster server itself actually contains multiple independent services, so it can be scaled across a cluster of servers to distribute computational and data transfer load.

The system consists of compute, authentication and management services. The services are independent and connected by message and file brokers.

System installation

The recommended way to get Chipster server running is virtual machine installation, as it comes with all dependencies bundled.

Hardware requirements

Some of the tools in Chipster require a significant amount of memory. Recommended hardware for a server installation:

  • 16 GB RAM
  • 500 GB storage
  • 2 CPU cores

If you want to be able to run multiple large analyses simultaneously, double or triple these figures. Chipster itself is lightweight and can be run even on a laptop for development purposes with minimal hardware (2 GB RAM, 1 CPU core), but this won't be enough for most analysis tools.
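
The recommended figures above can be checked on a Linux host before installing anything. The following is only a minimal sketch and not part of Chipster: the function names are hypothetical, and it assumes a Linux host where /proc/meminfo, nproc and df are available.

```shell
#!/bin/sh
# Recommended minimums from the list above.
REQ_RAM_GB=16
REQ_DISK_GB=500
REQ_CORES=2

# check_requirement NAME ACTUAL REQUIRED
# Prints one line ending in "ok" or "LOW" (hypothetical helper).
check_requirement() {
    if [ "$2" -ge "$3" ]; then
        echo "$1: $2 (recommended $3) ok"
    else
        echo "$1: $2 (recommended $3) LOW"
    fi
}

# check_host [DIR]
# Compares this host with the recommendations. DIR is the directory
# where the virtual machine disks will be stored (default: current).
check_host() (
    # MemTotal is in kB; round to the nearest GB because the kernel
    # reserves part of the physical memory.
    ram_gb=$(awk '/^MemTotal:/ {printf "%d", $2 / 1048576 + 0.5}' /proc/meminfo)
    cores=$(nproc)
    # Column 4 of "df -P -k" is the available space in kB.
    disk_gb=$(df -P -k "${1:-.}" | awk 'NR == 2 {printf "%d", $4 / 1048576}')
    check_requirement "RAM (GB)" "$ram_gb" "$REQ_RAM_GB"
    check_requirement "CPU cores" "$cores" "$REQ_CORES"
    check_requirement "Free disk (GB)" "$disk_gb" "$REQ_DISK_GB"
)
```

After loading the functions, running check_host with the intended storage directory as its argument prints one line per resource.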

Virtual machine installation

Chipster is packaged as complete virtual machine images that can be deployed to a variety of virtualisation platforms. The images are based on Ubuntu Linux 16.04. Images are available in ova, vmdk and qcow2 format, supporting most common platforms.

Image formats

It's advisable to use ova format in VirtualBox and VMware Player. In ova format, a single image file is all that is needed.

Vmdk and qcow2 are useful for making custom server installations. Chipster virtual machine consists of three disk images:

  1. Root
  2. Tools
  3. Data

In vmdk and qcow2 format, these three images are in separate files.
Root image contains the Ubuntu operating system and the Chipster software without any external tools. Both tools and data images are initially only empty disks. Tools image will be used for installing the external tools package, in step 4 below. The data image will be used for storing users' datasets.

To use the Chipster virtual machine, you need to:

  1. Install virtualisation software such as VirtualBox or VMware Player
  2. Download Chipster virtual machine
  3. Start the Chipster virtual machine
  4. Download tools package
  5. Start Chipster client

These steps are now described in detail.

Installing virtualisation software

The best option for running a Chipster server is to run it in a cloud IaaS service, so that the cloud service takes care of the server hardware and the virtualisation software for you. In this case you can continue directly with the instructions for the cloud.

To run the Chipster virtual machine on a traditional physical server, you need a virtualisation software installed on that server. VirtualBox and VMware Player are two common virtualisation software products, which are known to work with Chipster.

  • VirtualBox for Linux, Mac and Windows, free
  • VMware Player for Linux and Windows, free for personal non-commercial use

KVM, OpenNebula and VMware Enterprise are also supported.

To try out Chipster, you can of course install a virtualisation software and the Chipster virtual machine on your workstation, but please note the hardware requirements above.

Instructions for VirtualBox

The preferred method for installing Chipster is the so-called online installation, where you first set up a relatively small virtual machine and then download the larger tools package from the internet. If you need to install Chipster without an internet connection, please see the instructions for offline installation instead.

Download Chipster virtual machine

Download file Chipster.ova under the desired version from:

Add Chipster virtual machine to VirtualBox

  • Open VirtualBox
  • Select "File"->"Import appliance..."
  • Click the folder icon, go to the folder where you downloaded the Chipster virtual machine files, select Chipster.ova and click "Open"
  • Click "Next" and "Import"

VirtualBox won't let you import the same machine again with the same name "Chipster". You can find the existing virtual machines in the folder "VirtualBox VMs" under your home directory. In this case, repeat the steps above and rename the machine before clicking "Import".

Configure Chipster virtual machine

The default network settings in VirtualBox won't work with Chipster. Depending on your needs, choose one of the following configuration options:

Network option 1: Host-only

The virtual machine will be visible only to your host. If you are running a Chipster VM on your laptop, this is what you most likely want. We have to configure two network adapters. The first one (called host-only), will allow you to connect to the VM and the second (NAT) offers an internet connection to the VM.

Open the general settings in File -> Preferences -> Network -> Host-only Networks

  • Click "Add host-only network" (if the vboxnet0 doesn't exist already)
  • Click "Edit host-only network"
  • Check that there is a valid IP configuration with DHCP enabled, for example like this:

Adapter

IPv4 Address: 			192.168.59.3
IPv4 Network Mask: 		255.255.255.0
IPv6 Address:
IPv6 Network Mask Length: 0

DHCP Server

Enable Server selected
Server Address: 			192.168.59.99
Server Mask: 				255.255.255.0
Lower Address Bound: 		192.168.59.103
Upper Address Bound: 		192.168.59.254 
  • Click "Ok" twice to close dialogs

Open the VM settings in Settings -> Network -> Adapter 1

  • In "Attached to", select "Host-only Adapter"

Select the second tab "Adapter 2"

  • Select "Enable Network Adapter"
  • In "Attached to", select "NAT" and click "Ok"

Start the VM by clicking "Start". Log in with username ubuntu and password chipster.

Find out the host-only network ip of the vm by running

hostname -I

This lists both IP addresses of the VM (host-only and NAT). The host-only address is probably the first one, starting with 192.168.

You can open the Chipster startup page by entering this address into a browser on your host machine.

http://192.168.<fill in the rest>:8081

You can start the client to check that your network configuration works, but you won't be able to run most of the tools until you download the tools package.
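
Picking the host-only address out of the hostname -I output can be sketched as follows. This is an illustrative helper, not part of Chipster; the function names are hypothetical, and it assumes that the host-only network uses the 192.168. range as in the example configuration above.

```shell
#!/bin/sh
# pick_host_only_ip ADDRESS...
# Prints the first address that starts with 192.168. and returns 0;
# prints nothing and returns 1 if there is no such address.
pick_host_only_ip() {
    for addr in "$@"; do
        case "$addr" in
            192.168.*) echo "$addr"; return 0 ;;
        esac
    done
    return 1
}

# chipster_start_url ADDRESS
# Builds the startup page URL on port 8081.
chipster_start_url() {
    echo "http://$1:8081"
}
```

Inside the VM, the two can be combined as chipster_start_url "$(pick_host_only_ip $(hostname -I))", where the inner command substitution is left unquoted on purpose so that each address becomes a separate argument.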

Network option 2: Bridged

In most networks (having a DHCP server), your VM will automatically get an IP address.

Open the VM settings in Settings -> Network -> Adapter 1

  • In "Attached to", select "Bridged Adapter" and click "Ok"

Start the VM by clicking "Start" and log in with username ubuntu and password chipster. Find out the IP address of the VM with command

ifconfig

You can open the Chipster startup page by entering this address into a browser on your host machine.

http://<IP_ADDRESS>:8081

You can start the client to check that your network configuration works, but you won't be able to run most of the tools until you download the tools package.

Network option 3: NAT

VirtualBox will give the VM an internal address, and clients will connect to the host's address. You must configure port forwarding rules in VirtualBox to relay clients' traffic from the host to the VM, and the server components need a special configuration to work behind NAT.

Select "Settings" and "Network".

  • Set "Attached to" to "NAT"
  • Click "Port forwarding"
  • Click "+" icon to insert three port forwarding rules
  • Rule 1: Protocol TCP, Host Port 8080, Guest Port 8080
  • Rule 2: Protocol TCP, Host Port 8081, Guest Port 8081
  • Rule 3: Protocol TCP, Host Port 61616, Guest Port 61616
  • Leave "Host IP" and "Guest IP" empty (unless you have assigned a static IP for your VM)
  • Click "Start" to boot the virtual machine
  • Follow NAT instructions to configure Chipster for NAT after starting the VM
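
The same three forwarding rules can also be set from the command line with VBoxManage. The sketch below only prints the commands so that they can be reviewed first. It assumes that the VM is named "Chipster", that adapter 1 is the one attached to NAT, and that the VM is powered off; the rule names and the function name are hypothetical.

```shell
#!/bin/sh
# natpf_commands VM_NAME
# Prints (does not run) one VBoxManage command per forwarding rule.
# Rule format: name,protocol,host IP,host port,guest IP,guest port.
# Both IP fields are left empty, as in the instructions above.
natpf_commands() {
    for port in 8080 8081 61616; do
        echo "VBoxManage modifyvm \"$1\" --natpf1 \"chipster-$port,tcp,,$port,,$port\""
    done
}
```

Running natpf_commands Chipster shows the commands; piping the output to sh applies them.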

Next step in the installation is to download tools.

Instructions for VMware Player

Download Chipster virtual machine

Download file Chipster.ova under the desired version from:

Add Chipster virtual machine to VMware Player

  • Start VMware Player
  • Click "Open a New Virtual Machine"
  • Locate the downloaded Chipster.ova file and click Open
  • Give a name for your VM, select a storage path which has enough free space (at least 200GB) and click Import
  • If the import fails because of OVF specification checks, just click "Retry"
  • Click "Edit virtual machine settings"
  • Set the VM memory size. Host RAM size minus 1 GB is a good start if you have a dedicated machine for the server (or minus 2 GB if you run the client on the same machine)
  • Set the number of processor cores. It should be ok to utilize all the cores on the host.

Start Chipster virtual machine

  • Click "Play virtual machine"

Next step in the installation is to download tools.

Instructions for KVM (libvirt)

Unless you consider yourself a hacker, we recommend VirtualBox or VMware instead of KVM.

Download Chipster virtual machine

Download files root.qcow2, data.qcow2 and tools-empty.qcow2 from misc directory under the desired version from:

Add Chipster virtual machine

We are going to use bridged network. If you are using RedHat or Fedora Linux, you first need to disable NetworkManager, as it does not support bridged mode:

sudo service NetworkManager stop
sudo service network restart

Before adding the Chipster virtual machine (or domain, as it is called in virsh lingo), create a bridge named brv on your network device:

sudo virsh iface-bridge <YOUR NETWORK DEVICE> brv

Next edit chipster.xml and update paths to disk images to the directory where you have put them. Paths need to be full.

Now we can add the virtual machine (define domain):

sudo virsh define chipster.xml

Start Chipster virtual machine

Start the virtual machine:

sudo virsh start chipster

Depending on your setup, you might get an error stating that the disk images cannot be read (Permission denied).

To fix this, edit /etc/libvirt/qemu.conf and set user=root and group=root, then restart libvirtd:

sudo nano /etc/libvirt/qemu.conf
sudo service libvirtd restart

Now you should be able to start the domain.

Access Chipster virtual machine

To access Chipster server console, use VNC:

vncviewer 0.0.0.0:27277

Shutdown Chipster virtual machine

To shut down the server, use:

sudo virsh shutdown chipster

To restore your original unbridged network configuration, use:

sudo virsh iface-unbridge brv

And in Red Hat or Fedora, restart NetworkManager:

sudo service NetworkManager start
sudo service network restart

Next step in the installation is to download tools.

Instructions for OpenNebula

To get you started with OpenNebula integration, here is a .vmdef template file to use as a reference:

NAME = chipster

CPU    = 8
VCPU   = 8
MEMORY = 8000

CONTEXT = [
#  INIT_SCRIPT_URL = "http://yourhost.com/init.sh",
  HOSTNAME = "chipster",
  AUTHORIZED_KEYS = ""
]


OS = [
    BOOT   = "hd",
    ARCH   = "x86_64"
]

DISK = [
  TYPE     = "disk",
  TARGET   = "vda",
  SOURCE   = "root.qcow2",
  DRIVER   = "qcow2"
]

DISK = [
  TYPE     = "disk",
  TARGET   = "vdb",
  SOURCE   = "data.qcow2",
  DRIVER   = "qcow2"
]

DISK = [
  TYPE     = "disk",
  TARGET   = "vdc",
  SOURCE   = "tools-empty.qcow2",
  DRIVER   = "qcow2"
]

NIC = [
  network_id = "1",
#  ip         = "",
  model      = "virtio"
]

GRAPHICS = [
  TYPE    = "vnc",
#  KEYMAP  = <your keymap>,
  LISTEN  = "0.0.0.0"
]

Unlike in the previous example, here we have less conservative CPU and memory settings.

You can also specify full URLs as the SOURCE of the disk images. That way machines can be booted from a centralised file server.

Next step in the installation is to download tools.

Instructions for cloud (OpenStack)

Download Chipster virtual machine

These instructions are written for OpenStack cloud using its Horizon user interface, but the process should be very similar also in other clouds. The key difference from the VirtualBox or VMware installation is that only the root image is used and user data and tools are stored on volumes.

Download root-cloud.qcow2 from disk_images directory under the desired version from:

Add Chipster virtual machine to OpenStack

  • Log in to Horizon
  • Select "Image & Snapshots" -> "Create image"
  • Give a name for the image
  • Click the "Browse..." and select root-cloud.qcow2 file you downloaded. Select qcow2 format and click "Create image"
  • It will take a while until the image is uploaded

Configure Chipster virtual machine

Security group

Select "Access & Security" and create a security group to allow the following connections. Your default security group may already allow the first two.

  • All outgoing (Egress) traffic
  • All incoming (Ingress) traffic from this security group
  • Incoming (Ingress) TCP traffic to ports 8080, 8081, 8084 and 61616 from the IP addresses where the clients will be running
  • Incoming (Ingress) TCP traffic to port 22 for SSH from the IP addresses where you will manage these nodes
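
If you prefer the command line to Horizon, the two port-specific ingress rules above can be sketched with the openstack command line client. This is an illustration under assumptions: the client is installed and authenticated, the security group already exists, and the group name and address ranges passed in are yours to choose. The function only prints the commands; its name is hypothetical.

```shell
#!/bin/sh
# Client-facing ports from the list above.
CLIENT_PORTS="8080 8081 8084 61616"

# security_group_commands GROUP CLIENT_CIDR ADMIN_CIDR
# Prints (does not run) one "openstack security group rule create"
# command per client port, plus one for SSH from the admin range.
security_group_commands() {
    for port in $CLIENT_PORTS; do
        echo "openstack security group rule create --ingress --protocol tcp --dst-port $port --remote-ip $2 $1"
    done
    echo "openstack security group rule create --ingress --protocol tcp --dst-port 22 --remote-ip $3 $1"
}
```

The egress rule and the rule for traffic inside the security group are not generated here, because the default security group may already allow them.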

Instance

  • Select "Instances" -> "Launch instance"
  • Select the image you just created
  • Give a name for the instance
  • On "Access & Security" tab, select your SSH key pair and the security group you created
  • On "Networking" tab, add a network for your instance
  • Click "Launch"

Floating IP

  • Usually a floating IP address has to be added to make the instance visible outside of the cloud network
  • Select "Access & Security" and tab "Floating IPs"
  • You can use any floating IP which isn't yet associated with an instance, or click "Allocate IP To Project" to add a new one
  • Click "Associate Floating IP", select your instance and click "Associate"

Volumes

  • Select "Volumes"
  • Create two volumes: tools and data. The size of the tools volume must be at least 200 GB and the size of the data volume depends on the datasets you are going to analyze. Setting both disks to 500 GB is a good start and leaves some space for updates.
  • On the data volume, click "Edit attachments", select your instance, give a device name (e.g. /dev/vdc) and click "Attach Volume". Repeat on the tools volume (device name e.g. /dev/vdd)

Log in

  • Log in to your instance using the ssh key you selected when launching the VM. This cloud image contains the cloud-init program, which fetches the ssh key from OpenStack and adds it to the user's .ssh/authorized_keys file, allowing you to access the instance safely. If cloud-init doesn't work in the cloud you are using, use the root.qcow2 image instead and log in with the default credentials: username ubuntu, password chipster.

    ssh ubuntu@FLOATING_IP -i ~/.ssh/YOUR_SSH_KEY

Most common errors:

  • Connection timed out or Connection closed by remote host: Your security group doesn't allow you to access port 22
  • Permission denied (publickey): ssh didn't use the correct key. Check which key was offered with ssh -v and give the correct one with -i
  • Permission denied (publickey) even after using the correct key: Most likely your security groups didn't allow cloud-init to fetch the ssh key when the VM started. Fix the security groups and reboot the VM.

Configure

Move existing data directories (even if they are empty) and create a mount point

sudo mv /mnt/data /mnt/data_old
sudo mkdir /mnt/data

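Create filesystems on the attached volumes. Use the device names you gave when attaching the volumes (in the example above, /dev/vdc for data and /dev/vdd for tools), because mkfs erases the device it is given:

sudo mkfs.xfs -f -L data /dev/vdc
sudo mkfs.xfs -f -L tools /dev/vdd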

Labels "data" and "tools" are defined in /etc/fstab and should be automatically mounted in a few seconds. Run command df -h and it should print something like this:

Filesystem      Size  Used Avail Use% Mounted on
/dev/vda1       9.4G  6.5G  2.5G  73% /
udev            1.7G  8.0K  1.7G   1% /dev
tmpfs           344M  236K  343M   1% /run
none            5.0M     0  5.0M   0% /run/lock
none            1.7G     0  1.7G   0% /run/shm
/dev/vdc        500G   33M  500G   1% /mnt/data
/dev/vdd        500G   33M  500G   1% /mnt/tools
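
Instead of reading the df listing by eye, the two mount points can be checked with a small script. This is a sketch with hypothetical function names; it assumes only that df and awk are available on the VM.

```shell
#!/bin/sh
# is_mounted_on MOUNT_POINT
# Reads df style output on stdin and succeeds if some filesystem is
# mounted exactly on MOUNT_POINT (the last column of each row).
is_mounted_on() {
    awk -v mp="$1" 'NR > 1 && $NF == mp {found = 1} END {exit !found}'
}

# check_chipster_mounts
# Reports the state of the two mount points used by Chipster.
check_chipster_mounts() {
    for mp in /mnt/data /mnt/tools; do
        if df -P | is_mounted_on "$mp"; then
            echo "$mp is mounted"
        else
            echo "$mp is NOT mounted"
        fi
    done
}
```

If a mount point is reported as not mounted, check that the filesystem labels match the entries in /etc/fstab.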

Move data directories back in place

sudo chown chipster:chipster /mnt/data
sudo mv /mnt/data_old/* /mnt/data
sudo rmdir /mnt/data_old

Change the owner of the tools directory to user ubuntu

sudo chown ubuntu:ubuntu /mnt/tools

Configure the new floating IP in Chipster. Give your floating IP when the configuration tool asks for your public host/ip. It's the first question it asks and you can leave all other questions to default values.

cd /opt/chipster
sudo bash configure.sh

Restart Chipster's server components

sudo systemctl restart chipster

Wait a couple of seconds and check that all of them are running

sudo systemctl status chipster-activemq
sudo systemctl status chipster-auth
sudo systemctl status chipster-comp
sudo systemctl status chipster-filebroker
sudo systemctl status chipster-jobmanager
sudo systemctl status chipster-toolbox
sudo systemctl status chipster-webstart
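
The same check can be done in one pass with a loop. The sketch below uses systemctl is-active, which prints a one-word state for a unit; the unit names are taken from the commands above, and the function name is hypothetical.

```shell
#!/bin/sh
# Unit names used in the status commands above.
CHIPSTER_UNITS="chipster-activemq chipster-auth chipster-comp chipster-filebroker chipster-jobmanager chipster-toolbox chipster-webstart"

# report_units
# Prints one "unit: state" line per Chipster unit. A unit whose state
# cannot be read is reported as "unknown".
report_units() {
    for unit in $CHIPSTER_UNITS; do
        # is-active exits non-zero for inactive units, so ignore the
        # exit status and keep only the printed state.
        state=$(systemctl is-active "$unit" 2>/dev/null) || true
        echo "$unit: ${state:-unknown}"
    done
}
```

Every line should end in "active" when the server is healthy.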

Next step in the installation is to download tools.

Download tools package

The Chipster virtual machine images contain only the Chipster installation. The analysis tools, databases, genomes, indexes and example sessions are installed in a separate package. If your internet connection is slow or unreliable, you should see the next chapter instead.

  • Login to VM using username: ubuntu, password: chipster
  • Go to chipster directory: cd /opt/chipster
  • Download tools: bash download-tools.sh

This downloads about 120 GB from servers in Finland. This step is needed because it would be impractical to handle virtual machine images that big. At least comp and file broker need to be restarted to load the new tools and example sessions.

sudo systemctl restart chipster

It will take a minute for the file broker to import all the example sessions. You can follow file broker's log to see how it is proceeding (tail -f /opt/chipster/fileserver/logs/chipster.log). If you start the client (instructions in the next chapter) before the import is completed, the example sessions link in the client may be hidden, or only part of the example sessions may be visible. Restarting the client after the import is completed will fix this.

If you skip this download step, you can only run a few tools implemented in Java or Python, like sort tools (NGS -> Utilities -> Sort BED/GTF/TSV/VCF). If you have trouble with the virtual machine, you can use these tools to make sure that Chipster itself is working, even before you have downloaded the tools package.

Download tools package manually

For performance reasons, the download-tools.sh script above downloads and extracts the tools at the same time. If your internet connection isn't very fast or reliable, it's safer to do these steps manually to make sure the tools package is intact before trying to extract it.

Download

Download tools.tar.gz from tools directory under the right version in http://bio.nic.funet.fi/pub/sci/molbio/chipster/dist/virtual_machines/

cd /opt/chipster/tools
wget http://bio.nic.funet.fi/pub/sci/molbio/chipster/dist/virtual_machines/CHIPSTER_VERSION/tools/tools.tar.gz
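
On an unreliable connection it helps to let wget continue a partial file instead of starting over. The sketch below wraps the command above with the wget options -c (continue) and --tries (retry count). The function names and the retry count of 20 are arbitrary choices for illustration, and the version string must be replaced with the real one.

```shell
#!/bin/sh
# tools_url VERSION [HOST]
# Builds the download URL used above. HOST defaults to bio.nic.funet.fi.
tools_url() {
    echo "http://${2:-bio.nic.funet.fi}/pub/sci/molbio/chipster/dist/virtual_machines/$1/tools/tools.tar.gz"
}

# download_tools VERSION
# Downloads tools.tar.gz into the current directory. Running it again
# after an interruption resumes the partial file.
download_tools() {
    wget -c --tries=20 "$(tools_url "$1")"
}
```

For example, run download_tools CHIPSTER_VERSION in /opt/chipster/tools, with CHIPSTER_VERSION replaced by the version you are installing.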

Alternative download methods

If you have trouble downloading the file with a simple wget, there is an alternative server and a few other client programs available. You can try different options to see which one gives you the best performance. There are two servers serving the same files:

wget http://www.nic.funet.fi/pub/sci/molbio/chipster/dist/virtual_machines/CHIPSTER_VERSION/tools/tools.tar.gz
wget http://bio.nic.funet.fi/pub/sci/molbio/chipster/dist/virtual_machines/CHIPSTER_VERSION/tools/tools.tar.gz

You can try also ftp or rsync protocols:

wget ftp://www.nic.funet.fi/pub/sci/molbio/chipster/dist/virtual_machines/CHIPSTER_VERSION/tools/tools.tar.gz
wget ftp://bio.nic.funet.fi/pub/sci/molbio/chipster/dist/virtual_machines/CHIPSTER_VERSION/tools/tools.tar.gz

rsync -avz --progress rsync.nic.funet.fi::ftp/pub/sci/molbio/chipster/dist/virtual_machines/CHIPSTER_VERSION/tools/tools.tar.gz .
rsync -avz --progress bio.nic.funet.fi::ftp/pub/sci/molbio/chipster/dist/virtual_machines/CHIPSTER_VERSION/tools/tools.tar.gz .

You could also install a program called lftp, which is great for transferring huge files over the internet. It tries to make the download faster by opening multiple connections and retries on connection failures.

sudo apt-get install lftp
lftp -c pget ftp://bio.nic.funet.fi/pub/sci/molbio/chipster/dist/virtual_machines/CHIPSTER_VERSION/tools/tools.tar.gz

Of course you can also check whether you get a better transfer rate on the host machine than in the VM, but then make sure you have a plan for getting the file from the host to the VM.

Verify

Next to the tools.tar.gz package, there is a .md5 file. When the download completes, you can calculate the md5 of the downloaded file and check that it matches. This verifies that the file is complete and hasn't been corrupted during the download.

md5sum tools.tar.gz
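
The comparison can be automated so that extraction only proceeds when the checksums agree. In this sketch the expected checksum is passed in as an argument, because the exact layout of the .md5 file is not described here; the function name is hypothetical.

```shell
#!/bin/sh
# verify_md5 FILE EXPECTED_SUM
# Succeeds only if the MD5 checksum of FILE equals EXPECTED_SUM.
# The body runs in a subshell, so "exit" only sets the return status.
verify_md5() (
    actual=$(md5sum "$1" | awk '{print $1}')
    if [ "$actual" = "$2" ]; then
        echo "OK: $1"
    else
        echo "MISMATCH: $1 (expected $2, got $actual)" >&2
        exit 1
    fi
)
```

For example, verify_md5 tools.tar.gz EXPECTED_SUM, where EXPECTED_SUM is the value read from the .md5 file, can be chained with && in front of the tar command so that a corrupted package is never extracted.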

Extract

Finally just extract the package:

tar -zxf tools.tar.gz -C /opt/chipster/tools/

Restart the Chipster services as instructed in the previous chapter.

Start Chipster client

Chipster servers are configured to start when the virtual machine is started. Client is supported on any platform with Java 1.7 (Windows, OS X and Linux) and installed automatically with Java Web Start.

After you have the Chipster virtual machine running, start the Chipster client by pointing your web browser to

http://<hostname or IP address of the virtual machine>:8081

and clicking on the Launch Chipster link. Login with chipster/chipster. To get started, you can open an example session (link in Datasets panel).

If you don't know the hostname or IP address of the virtual machine you have started, see instructions in the next section.

If you fail to start the client, there is typically something wrong in the network settings. See next section on how to automatically reconfigure the network as a quick fix.

Before starting to actually use Chipster, it is highly recommended to update the installation to get latest bug fixes. See Upgrading server installation.

User documentation can be found at

http://<hostname or IP address of the virtual machine>:8081/manual/

Configuring Chipster

  • Login to VM using username: ubuntu, password: chipster

  • Check the IP address of the vm

    • IP address is printed in the "message of the day" when you login

    • Or you can use:

        hostname -I
      
    • or

        ifconfig
      
  • For convenience, it is recommended to set the keyboard layout and time zone

    • Instructions are printed in "message of the day" when you login
  • Configure Chipster to use the given IP address:

    cd /opt/chipster;./configure.sh
    
  • You can also use

    cd /opt/chipster;./configure.sh auto
    

which auto detects the IP address and uses default values for other settings.

  • Restart Chipster:

    sudo systemctl restart chipster
    
  • Using a web browser go to the Chipster start page:

    http://<vm ip address>:8081
    

There are two accounts by default:

  • Username: ubuntu, password: chipster
    • has sudo rights for administering the OS installation of the virtual machine
  • Username: chipster, password: chipster
    • for running the chipster service
    • su or sudo rights are not required for running Chipster

Using Chipster server in EGI Federated Cloud

Preparatory steps

In order to launch Chipster servers in the EGI Federated Cloud environment, the machine that is used to manage these servers must have the following tools and files installed:

  • A valid X.509 certificate.
  • rOCCI command line client for managing cloud computing environment
  • voms-proxy-init command to create proxy certificates
  • Settings to connect the VOMS server hosting chipster.csc.fi VO

In addition, you must join the chipster.csc.fi Virtual Organization. The manager of the virtual Chipster servers needs to do these preparatory steps, described in more detail below, only once. After that, Chipster servers can be managed with the FedCloud_chipster_manager tool.

Note that end users who just wish to use a Chipster server running in the Federated Cloud do not need to do any of these preparatory steps.

Grid certificates

Federated Cloud, like most of the services hosted by EGI, uses X.509 certificates for user authentication.

Certificates are granted by a certification authority (CA), which acts as a trusted third party that checks that the certificate is based on valid identity information. Many European university researchers can use GEANT as the certification authority. In these cases you can use the DigiCert SSO portal to obtain a certificate based on the information provided by the local university authentication system.

https://www.digicert.com/sso

Users from other countries should use their local certification authorities.

Joining chipster.csc.fi Virtual organization

To join the chipster.csc.fi VO go to Chipster Virtual Organization web page: https://voms.fgi.csc.fi:8443/voms/chipster.csc.fi

This web page authenticates you using the certificate installed in your browser. Therefore it is preferred that you use the same machine and browser both for obtaining the certificate and for joining the VO. On the Chipster VO web page, fill in the form with your personal information, then read and accept the Acceptable Use Policy. After that you will receive an e-mail confirmation request in your mailbox. Please follow the instructions in that e-mail. After finishing the VO membership application process, it will take some time before the VO membership is activated. Normally the membership will be activated within one working day after the application process is finished.

Installing rOCCI and VOMS client

The management of Federated Cloud resources is done using rOCCI, a Ruby-based implementation of the OCCI standard. The authentication is done using proxy certificates generated with the command voms-proxy-init. The instructions for installing these tools on your local Linux machine can be found at:

https://wiki.egi.eu/wiki/Fedcloud-tf:CLI_Environment

Once you have installed the voms-proxy-init command, you must still define the connection to Chipster.csc.fi VO management server (VOMS).

To do this, first create directory /etc/grid-security/vomsdir/chipster.csc.fi and go to this directory:

sudo mkdir -p /etc/grid-security/vomsdir/chipster.csc.fi
cd /etc/grid-security/vomsdir/chipster.csc.fi

Then create a file "voms.fgi.csc.fi.lsc" that contains the following 2 lines:

/O=Grid/O=NorduGrid/CN=host/voms.fgi.csc.fi
/O=Grid/O=NorduGrid/CN=NorduGrid Certification Authority

If you already have a file /etc/vomses, move it to "/etc/vomses/old_vomses" (vomses will be a directory now). Create a file "chipster.csc.fi-voms.fgi.csc.fi" in "/etc/vomses" and write the following line in it:

"chipster.csc.fi" "voms.fgi.csc.fi" "15010" "/O=Grid/O=NorduGrid/CN=host/voms.fgi.csc.fi" "chipster.csc.fi"

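The two files described above can also be written by a script. The following is a sketch with a hypothetical function name. It takes an optional directory prefix so that the result can be inspected in a scratch directory first; writing under the real /etc requires root privileges. It fails if /etc/vomses still exists as a plain file, so move that file away first as described above.

```shell
#!/bin/sh
# write_voms_config [PREFIX]
# Writes the .lsc file and the vomses file under PREFIX/etc.
# With no PREFIX the files go to the real /etc.
write_voms_config() (
    prefix=${1:-}
    lsc_dir="$prefix/etc/grid-security/vomsdir/chipster.csc.fi"
    vomses_dir="$prefix/etc/vomses"
    mkdir -p "$lsc_dir" "$vomses_dir" || exit 1
    cat > "$lsc_dir/voms.fgi.csc.fi.lsc" <<'EOF'
/O=Grid/O=NorduGrid/CN=host/voms.fgi.csc.fi
/O=Grid/O=NorduGrid/CN=NorduGrid Certification Authority
EOF
    cat > "$vomses_dir/chipster.csc.fi-voms.fgi.csc.fi" <<'EOF'
"chipster.csc.fi" "voms.fgi.csc.fi" "15010" "/O=Grid/O=NorduGrid/CN=host/voms.fgi.csc.fi" "chipster.csc.fi"
EOF
)
```

Calling write_voms_config with a scratch directory as its argument gives a dry run whose output can be compared with the lines above before the function is run as root without arguments.
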
Using FedCloud_chipster_manager

FedCloud_chipster_manager is a help tool that can be used to launch Chipster instances to EGI Federated Cloud. It constructs the rOCCI commands needed to launch, list or delete virtual machines that have Chipster server running inside. It can also be used to restart or check the status of the Chipster servers running inside the virtual machines.

Creating an encryption key pair

Some of the FedCloud_chipster_manager operations require that the user provides an encryption key pair that is used to access the virtual machine. On a Linux machine the key pair can be created with command:

ssh-keygen -t rsa -b 2048 -f keyname

The command above asks you to define a password for your key files and then creates two files: a private key and a public key. For example command:

ssh-keygen -t rsa -b 2048 -f FedCloudKey

will create two files called FedCloudKey and FedCloudKey.pub. You should use a proper password for your key files. The key files need to be created only once: you can use the same key files for several virtual machines.

Setting up VOMS proxy

Before launching virtual Chipster servers, you have to create a proxy certificate that is used to authenticate to the FedCloud environment. If you have the voms-proxy-init command installed and a valid X.509 certificate in your .globus directory, you can create a temporary proxy certificate with command:

voms-proxy-init --voms chipster.csc.fi --rfc --dont_verify_ac

The command asks for the password of your X.509 certificate and creates a proxy certificate that is valid for 12 hours. Note that voms-proxy-init requires that you are using an OpenJDK based Java environment. Other Java environments cause error messages like: Credentials couldn't be loaded.
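
Because the proxy expires after 12 hours, it can be useful to check the remaining time before starting a longer management task. The helpers below are a sketch with hypothetical names. They work on a number of seconds, which is assumed to come from voms-proxy-info --timeleft in the VOMS client tools; the one hour threshold is an arbitrary choice.

```shell
#!/bin/sh
# format_timeleft SECONDS
# Formats a number of seconds as HH:MM:SS.
format_timeleft() {
    printf '%02d:%02d:%02d\n' $(($1 / 3600)) $(($1 % 3600 / 60)) $(($1 % 60))
}

# proxy_needs_renewal SECONDS_LEFT [MIN_SECONDS]
# Succeeds if fewer than MIN_SECONDS (default 3600) remain.
proxy_needs_renewal() {
    [ "$1" -lt "${2:-3600}" ]
}
```

For example, store the output of voms-proxy-info --timeleft in a variable and run voms-proxy-init again only when proxy_needs_renewal succeeds for that value.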

Managing a Chipster server in Federated Cloud

Launch

Once you have a valid proxy certificate generated, you can launch a new Chipster server to EGI Federated Cloud. First download the FedCloud_chipster_manager tool with command:

wget https://raw.githubusercontent.com/chipster/chipster/master/src/main/admin/fedcloud/FedCloud_chipster_manager

and give execution permission for the file

chmod u+x FedCloud_chipster_manager

Now you can launch a new Chipster Virtual Server with command (assuming you have the FedCloud_chipster_manager tool in your current working directory):

./FedCloud_chipster_manager -key keyfile -launch

Below you can find a sample session where a Chipster server is launched with key file FedCloudKey.

$ ./FedCloud_chipster_manager -key FedCloudKey -launch
------------------------------------------------------------
 Remaining validity time for your proxy certificate:
 11:49:07
------------------------------------------------------------
New virtual machine launched with following ID:
https://prisma-cloud.ba.infn.it:8787/compute/237de31f-335b-469a-9c79-edea1427e13b
Linked volume:
https://prisma-cloud.ba.infn.it:8787/storage/5554f22e-6849-47f2-9a94-b3f9e7c9dd22
Getting the specs of the new virtual machine.

The IP-addess of the chipster virtual server is:
212.189.205.205

After few minutes you can connect your virtual machine with command:

  ssh -i FedCloudKey ubuntu@212.189.205.205

The Chipster server can be connected with URL:

  http://212.189.205.205:8081

As shown in the example above, after launching a virtual Chipster server, FedCloud_chipster_manager prints out instructions how to connect the virtual machine and the Chipster server. Note however that it can take some 15 minutes before the server is fully operational so that it can be connected.

List

To list your virtual Chipster servers running in the EGI Federated Cloud, give command:

 ./FedCloud_chipster_manager -list

Example:

 $ ./FedCloud_chipster_manager -list
------------------------------------------------------------
 Remaining validity time for your proxy certificate:
 11:31:30
------------------------------------------------------------
Listing Virtual Machines with name: chipster-vm-kkmattil-at-csc.fi
in endpoint https://prisma-cloud.ba.infn.it:8787/
This may take some time.

https://prisma-cloud.ba.infn.it:8787/compute/237de31f-335b-469a-9c79-edea1427e13b occi.compute.hostname = chipster-vm-kkmattil-at-csc.fi IP: 212.189.205.205

The listing shows the ID of the VM running Chipster in the Federated Cloud and the IP address that the Chipster server is using. In this example we can see that the ID of the Chipster VM is: https://prisma-cloud.ba.infn.it:8787/compute/237de31f-335b-469a-9c79-edea1427e13b and the Chipster server can be accessed with URL: http://212.189.205.205:8081 (8081 is the default port of Chipster).
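
When several servers are running, the URLs can be pulled out of the listing with a short filter. This is a sketch with a hypothetical function name; it assumes that each VM line ends with an "IP: a.b.c.d" field, as in the example output above.

```shell
#!/bin/sh
# chipster_urls_from_list
# Reads "FedCloud_chipster_manager -list" output on stdin and prints
# one Chipster URL (port 8081) for every line ending in "IP: a.b.c.d".
# Lines without such a field are skipped.
chipster_urls_from_list() {
    sed -n 's|.*IP: *\([0-9][0-9.]*\) *$|http://\1:8081|p'
}
```

It is meant to be used as the last stage of a pipe, for example ./FedCloud_chipster_manager -list | chipster_urls_from_list.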

Status

The option -status makes FedCloud_chipster_manager look for Chipster VMs launched by the user and check the status of the Chipster server running in each VM found. To check the status of a Chipster server, FedCloud_chipster_manager needs to open an ssh connection to the VM. Because of that, you must define the key file that was used to launch the server with the option -key. The password for the key file is asked for each server to be connected.

$ ./FedCloud_chipster_manager -key FedCloudKey -status
------------------------------------------------------------
 Remaining validity time for your proxy certificate:
 07:02:41
------------------------------------------------------------
Listing Virtual Machines with name: chipster-vm-kkmattil-at-csc.fi
in endpoint https://prisma-cloud.ba.infn.it:8787/
This may take some time.
--------------------------------------------------------------

https://prisma-cloud.ba.infn.it:8787/compute/86b97ed5-e256-4bce-83b5-aa3a41920975 occi.compute.hostname =chipster-vm-kkmattil-at-csc.fi IP: 90.147.102.3
Enter passphrase for key 'FedCloudKey': ********
ActiveMQ Broker is running (5995).
Chipster Fileserver Service is running (PID:6118).
Chipster Webstart Service is running (PID:6229).
Chipster Authentication Service is running (PID:6345).
Chipster Computing Service is running (PID:6883).
Chipster Manager Service is running (PID:6579).
chipster-jobmanager              RUNNING    pid 6635, uptime 1:10:06

Restart

The option -restart makes FedCloud_chipster_manager restart the Chipster server running in the given Federated Cloud VM instance. This option can be used, for example, to fix the server if the Chipster Fileserver Service is not running in the VM (a common problem in Chipster servers running in FedCloud). To restart the Chipster server, FedCloud_chipster_manager needs to open an ssh connection to the VM. You must therefore specify the key file that was used to launch the server with the option -key. The passphrase for the key file is asked when the connection is opened.

For example, restarting the Chipster server running in instance https://prisma-cloud.ba.infn.it:8787/compute/86b97ed5-e256-4bce-83b5-aa3a41920975 can be done with the command:

$ ./FedCloud_chipster_manager -key FedCloudKey -restart \
    https://prisma-cloud.ba.infn.it:8787/compute/86b97ed5-e256-4bce-83b5-aa3a41920975
------------------------------------------------------------
 Remaining validity time for your proxy certificate:
 06:49:57
------------------------------------------------------------
Restarting chipster server running in instance: https://prisma-cloud.ba.infn.it:8787/compute/86b97ed5-e256-4bce-83b5-aa3a41920975
in endpoint https://prisma-cloud.ba.infn.it:8787/
This may take some time.
-------------------------------------------------------------

https://prisma-cloud.ba.infn.it:8787/compute/86b97ed5-e256-4bce-83b5-aa3a41920975 occi.compute.hostname = chipster-vm-kkmattil-at-csc.fi IP: 90.147.102.3
Enter passphrase for key 'FedCloudKey':********
chipster-jobmanager: stopped
Stopping Chipster Authentication Service...
Stopped Chipster Authentication Service.
Stopping Chipster Computing Service...
Stopped Chipster Computing Service.  
Stopping Chipster Manager Service...
Stopped Chipster Manager Service.
Stopping Chipster Fileserver Service...
Stopped Chipster Fileserver Service.
Stopping Chipster Webstart Service...
Stopped Chipster Webstart Service.
Stopping ActiveMQ Broker...
Stopped ActiveMQ Broker.
Starting ActiveMQ Broker...
Waiting 10 seconds for ActiveMQ to start...
Starting Chipster Fileserver Service...
Starting Chipster Webstart Service...
Starting Chipster Authentication Service...
Starting Chipster Computing Service...
Starting Chipster Manager Service...
chipster-jobmanager: started
Stopping Chipster Computing Service...
Stopped Chipster Computing Service.
Starting Chipster Computing Service...
ActiveMQ Broker is running (13047).
Chipster Fileserver Service is running (PID:13170).
Chipster Webstart Service is running (PID:13281).
Chipster Authentication Service is running (PID:13397).
Chipster Computing Service is running (PID:13912).
Chipster Manager Service is running (PID:13631).
chipster-jobmanager              RUNNING    pid 13656, uptime 0:00:05

System administration

Chipster architecture

The shortest description of the Chipster architecture is that it is very flexible. The Chipster environment is based on a message-oriented architecture (also called message-passing architecture or message-oriented middleware architecture). Components are connected through a message broker (ActiveMQ), which results in a loosely coupled distributed system. The design is based on the idea of broadcast, which allows components to be unaware of each other. The system also does not depend on the protocol used for communication.

The Chipster environment consists of the following components:

  • message broker (1 to many)
  • file broker (1)
  • compute service (1 to many)
  • authentication service (1)
  • jobmanager service (1)
  • toolbox service (1)
  • manager service (1 to many)
  • client (many)

All components can be added or removed dynamically. When multiple instances of the same component are running, no extra configuration is needed: multiple compute services, for example, can function without being aware of each other. This allows the system administrator to add compute components on the fly when extra processing power is needed. Currently the exception is that there can be only one authentication service.

One of the key ideas in designing the Chipster architecture was to consider carefully where each bit of the system's state is managed. The Chipster client follows the thick client paradigm, where the client is functionally rich. This decision was made to keep the server environment simple and lightweight, to reduce the number of messages, to distribute processing load (especially data visualisation) to clients, and to improve the user experience, as the client application is mostly independent of the server components.

Server components explained

Message broker (ActiveMQ) acts as the central point of the system, passing messages between components. ActiveMQ supports broker distribution for improving scalability and reliability, so multiple brokers can be used simultaneously.

The file broker distributes files to other components, supplementing the message broker. File distribution is based on a pull mechanism, where components retrieve files from the file broker. This way compute servers and clients can be behind firewalls. Using a separate file broker also allows compute servers to use minimal disk space, as files are cached at the file broker.

The authenticator processes requests from clients. Each request is examined, and if a valid session exists for that client, the request is allowed to continue. Otherwise the user is asked to authenticate, and after successful authentication a session is created. The authentication service supports many types of authentication sources (Unix passwd, JAAS, LDAP...) and can use them simultaneously. Server components authenticate to the broker using server-specific keys and are allowed to communicate directly without going through the authenticator. The authentication service is a separate component so that it can be deployed inside an intranet, as it might need access to sensitive information such as user databases.

The compute service listens for computation requests. When a client initiates a new task, all compute services with free resources reply, and the client decides which service gets to process the task. This way there is no single point of failure in distributing tasks to the server environment, and compute services can be modified easily on the fly.
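
The offer-and-select idea described above can be illustrated with a small simulation. This is only a toy sketch, not Chipster code: the class ComputeService, the function choose_service and the random selection rule are all hypothetical, since this manual does not specify how the client picks among the replies.

```python
from dataclasses import dataclass
import random


@dataclass
class ComputeService:
    """Hypothetical stand-in for one compute service instance."""
    name: str
    max_jobs: int
    running_jobs: int = 0

    def offer(self):
        """Reply to a broadcast job request only if there is free capacity."""
        return self.name if self.running_jobs < self.max_jobs else None


def choose_service(services, rng=random):
    """Broadcast a request, collect the offers, and let the client pick one."""
    offers = [name for name in (s.offer() for s in services) if name is not None]
    if not offers:
        return None  # no service has free resources at the moment
    # Assumption: the client picks at random; the real selection rule may differ.
    return rng.choice(offers)


services = [
    ComputeService("comp-1", max_jobs=2, running_jobs=2),  # full, stays silent
    ComputeService("comp-2", max_jobs=2, running_jobs=1),
    ComputeService("comp-3", max_jobs=4, running_jobs=0),
]
print(choose_service(services))  # prints comp-2 or comp-3
```

Because services that are full simply do not reply, a stopped or overloaded compute service needs no special handling on the client side.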

Simple server installation

The simplest way to install the Chipster environment is to deploy all components to a single server and to distribute clients by using Java Web Start.

All server components run inside their own directories, so having them on a single server does not require any special arrangements. The message broker and file broker listen on their respective ports, and other components connect to them over the local network.

Advanced server installation

A good guideline for an advanced installation is to dedicate an untrusted server to the message broker and file broker components, as they are the only components that have open server ports. That server should be placed outside the organisation's internal firewall, i.e. in a DMZ network. To secure user credentials, the authenticator should be installed separately on a strongly protected machine.

It is possible to deploy multiple compute servers. All of them should have the same tool descriptions, but the active tools can be selected per server. The maximum job count can also be configured per server. If you have many nodes available but they are also used for purposes other than Chipster, it is recommended to deploy compute servers on as many nodes as possible but to limit the per-server job count, to keep Chipster from hogging all the resources. If there are memory-intensive tools, it might be a good idea to dedicate a node with a large amount of memory and a low maximum job count to them. Independent compute services can also be deployed to a batch processing system (LSF etc.), following a worker paradigm.

The Distributed comp tutorial has more detailed instructions for configuring this kind of setup.

Running components

To start all the Chipster services, switch to /opt/chipster and run:

./chipster start

In addition to start, you can also use stop, restart, and status. Restart runs stop and start consecutively, and status reports whether the services are running (and their process IDs).

The chipster script is a high-level tool for managing all services. For each service, it checks for the corresponding subdirectory and passes on the command. If a subdirectory does not exist, that service is skipped. This means that components can be removed from a node and the chipster script can still be used to run the remaining ones.
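
The skip-if-missing behaviour can be sketched as follows. This is an illustration of the idea only, not the actual chipster script: the function plan_commands is hypothetical, and the component list is an assumption assembled from directory names that appear in this manual (the ActiveMQ start script in particular may be named differently).

```python
from pathlib import Path

# Assumed component directory names; adjust to match your installation.
COMPONENTS = ["activemq", "fileserver", "webstart", "auth", "comp", "manager"]


def plan_commands(root, command, components=COMPONENTS):
    """Return the per-component script invocations, skipping missing directories."""
    planned = []
    for name in components:
        component_dir = Path(root) / name
        if not component_dir.is_dir():
            continue  # component is not installed on this node
        script = component_dir / "bin" / f"chipster-{name}"
        planned.append([str(script), command])
    return planned


# On a node that only has the comp directory, only chipster-comp would be called.
for invocation in plan_commands("/opt/chipster", "status"):
    print(" ".join(invocation))
```

The function only builds the list of invocations, so it can be run safely on any machine to see what would be called.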

To control an individual service, for example to restart the compute service, use:

./comp/bin/chipster-comp restart

So the script to run is in the bin directory under the component directory and has a component-specific suffix.

If you are using the virtual machine environment or have otherwise configured Chipster as a daemon, you can use the systemctl command from anywhere:

systemctl start chipster

Or to restart the compute service, use:

systemctl restart chipster-comp

So the service name is

chipster-<component directory name>

If any of these gives you the error "Could not detect hardware architecture, please set platform manually.", the hardware architecture (which determines the binary to run) was not detected automatically. It can be set manually by editing all instances of chipster-generic.sh: change the PLATFORM line to match your hardware architecture (see the comment above the line for options). To just get things running, you can use the architecture-specific scripts under

<component>/bin/<architecture>/chipster-<component>

Upgrading server installation

Upgrading VM bundled installation

Chipster VM bundle comes with an automatic update tool that allows you to update the installation without downloading everything again. Updates do not happen automatically, but must be initiated manually. Before the update, you should stop Chipster services.

./chipster stop
./update.sh
./chipster start

The update.sh script is just a bootstrap script that downloads the actual update script and executes it. This way the update system itself also gets updated when needed.

The actual update script is called update-exec*.sh and is located at

http://bio.nic.funet.fi/pub/sci/molbio/chipster/dist/virtual_machines/updates/

When run, update-exec.sh downloads files, unpacks them, moves things around when needed and does other required setup steps.

The Chipster update system only manages the Chipster installation and tool dependencies. You should also keep the operating system of the VM installation up to date, using the normal Debian/Ubuntu package tools, such as aptitude.

sudo aptitude upgrade

Operating system packages get updated and a reboot might be necessary.

Upgrading other installations

The automatic update mechanism is made for the virtual machine installation running on a single node. If you have a distributed installation where server components run on different servers, the update will probably fail on the nodes where the tools don't exist or are mounted read-only.

Without cloud sessions, there is no other persistent data in Chipster than your own configurations. In this case the easiest way to handle updates is to automate all your installation and configuration steps, so that you can simply start the new version and throw away the old one.

To move relevant functionality over from the previous installation, you should check at least these locations:

  • chipster/*/conf/chipster-config.xml - custom configuration
  • chipster/comp/conf/runtimes.xml - custom analysis tool runtimes
  • chipster/comp/modules - custom tool scripts
  • chipster/webstart/web-root/manual - custom manual pages
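
As a starting point for such a check, the locations listed above can be collected with a short script. This is a minimal sketch: the function find_custom_items is hypothetical, and it only reports what exists under the given base directory without deciding whether a file actually differs from the distributed default.

```python
from pathlib import Path

# Locations from the list above, relative to the directory that contains "chipster".
PATTERNS = [
    "chipster/*/conf/chipster-config.xml",  # custom configuration
    "chipster/comp/conf/runtimes.xml",      # custom analysis tool runtimes
    "chipster/comp/modules",                # custom tool scripts
    "chipster/webstart/web-root/manual",    # custom manual pages
]


def find_custom_items(base):
    """Return existing files and directories that may hold local customisations."""
    base = Path(base)
    found = []
    for pattern in PATTERNS:
        found.extend(sorted(base.glob(pattern)))
    return found


# With the typical /opt/chipster installation the base directory is /opt.
for item in find_custom_items("/opt"):
    print(item)
```

The printed paths can then be compared against a fresh installation, for example with diff, before copying anything over.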

For exact details on changes between versions, look at the update-exec3.sh script at http://bio.nic.funet.fi/pub/sci/molbio/chipster/dist/virtual_machines/updates/.

Directory layout

The Chipster directory layout is different on the client and server sides. On the client side the goal has been to make the placement of files and directories compatible with operating system specific conventions. On the server side the goal has been to make the layout as coherent as possible (especially to integrate well with the Java Service Wrapper that wraps all server components).

Client

Application data (logs, SSL keys, user preferences) is stored in one place and user data (sessions, workflows) in another.

  • Windows
    • Application data stored in Local Settings\Application Data\Chipster inside user's home directory (in Windows XP)
    • Application data stored in AppData\Local\Chipster inside user's home directory (in Windows 7)
    • User data stored in My Documents inside user's home directory
  • Mac OS X
    • Application data stored in Library/Application Support/Chipster inside user's home directory
    • User data stored in My Documents inside user's home directory
  • Linux/Unix
    • Application data stored in .chipster inside user's home directory
    • User data stored in home directory, or Document or My Documents inside the home directory if they exist

If the operating system is not recognised, Chipster falls back to the Linux/Unix layout, because less common operating systems are most often Unix variants.

Server on Linux

Typically Chipster is installed to /opt/chipster. Inside the installation directory there is a shared directory and several independent component directories (that depend on the shared directory). The contents of the shared directory are:

* chipster/shared
  * bin - generic executable files
  * lib - Java JAR and platform specific libraries
  * lib-src - source codes for libraries that require source code to be distributed together (LGPL)

All of the component directories follow the same basic layout. The contents of the component directories are given below. "Wrapper" means here Java Service Wrapper, which is bundled with Chipster server installation.

* chipster/<component name>
  * bin - executable files and utility scripts
    * chipster-<component name> - main executable script (use this)
    * linux-x86-<32 | 64> - platform specific executables
      * chipster-<component name> - platform specific executable script
      * wrapper - wrapper binary
  * logs - log files for wrapper (console output) and Chipster itself
    * wrapper.log
    * chipster.log
    * messages.log
    * jobs.log
    * security.log
    * status.log
  * security - files related to encryption (and authentication on authentication service)
    * keystore.ks - automatically generated dummy key for SSL
    * users - flat file user database
  * conf - component's configuration
    * chipster-config.xml - main Chipster configuration
    * wrapper.conf - wrapper configuration
    * jaas.config - JAAS authenticator configuration
    * runtimes.xml - compute service runtime environments' configuration (compute service)
    * environment.xml - description of tool runtime environment (compute service)
  * file-root - www-root of file cache (file broker)
  * web-root - www-root of Web Start files (webstart service)
  * jobs-data - working directory for jobs (compute service)
  * modules - directory containing analysis tools (compute service)
    * microarray - microarray tools, in tool type specific subdirectories
       * R-<version>
       * bsh
       * java
       * microarray-module.xml - tool configuration for this module
    * ngs - NGS tools, in tool type specific subdirectories
       * R-<version>
       * java
       * ngs-module.xml - tool configuration for this module
    * sequence - sequence analysis tools, in tool type specific subdirectories
       * shell
       * sequence-module.xml - tool configuration for this module
    * <third party modules>
  * database - monitoring database (manager)
  * database-backups - backups for monitoring database (manager)

ActiveMQ uses its own directory layout. See the ActiveMQ documentation for more information.

Configuration system

Configuring Chipster

If you just want to get your Chipster up and running, execute the configure.sh script and you're done! If you want to know more about the Chipster configuration system, read on.

Chipster stores application configuration in a file called chipster-config.xml. It is either located in the conf subdirectory or loaded dynamically via URL. The former approach is meant for server components and the latter for clients started over Java Web Start.

Configuration is loaded in two steps. First an internal default configuration is loaded (chipster-config-specification.xml, located inside the Chipster JAR) and then the normal configuration file chipster-config.xml. The latter contains only information that needs to be set on a per-instance basis, so it is quite minimal. However, it is possible to override entries of the internal default configuration using the normal configuration file. Just include the entry in the file and it will replace the default one.
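
The two-step loading can be pictured as a dictionary overlay. The sketch below is not the Chipster implementation (which is written in Java); the functions read_entries and load_configuration are hypothetical, and the default values in DEFAULTS are made up for illustration rather than copied from chipster-config-specification.xml.

```python
import xml.etree.ElementTree as ET

# Illustrative stand-in for the internal default configuration.
DEFAULTS = """
<configuration content-version="3">
  <configuration-module moduleId="messaging">
    <entry entryKey="broker-protocol"><value>ssl</value></entry>
    <entry entryKey="broker-port"><value>61616</value></entry>
  </configuration-module>
</configuration>
"""

# Illustrative stand-in for an instance-specific chipster-config.xml.
INSTANCE = """
<configuration content-version="3">
  <configuration-module moduleId="messaging">
    <entry entryKey="broker-host"><value>chipster.example.com</value></entry>
    <entry entryKey="broker-protocol"><value>tcp</value></entry>
  </configuration-module>
</configuration>
"""


def read_entries(xml_text):
    """Return {(moduleId, entryKey): [values]} for one configuration document."""
    entries = {}
    root = ET.fromstring(xml_text)
    for module in root.findall("configuration-module"):
        for entry in module.findall("entry"):
            values = [value.text or "" for value in entry.findall("value")]
            entries[(module.get("moduleId"), entry.get("entryKey"))] = values
    return entries


def load_configuration(default_xml, instance_xml):
    """Load the defaults first, then let the instance file replace matching entries."""
    merged = read_entries(default_xml)
    merged.update(read_entries(instance_xml))
    return merged


config = load_configuration(DEFAULTS, INSTANCE)
print(config[("messaging", "broker-protocol")])  # ['tcp'], overridden by the instance file
print(config[("messaging", "broker-port")])      # ['61616'], taken from the defaults
```

An entry present only in the defaults is kept, and an entry present in both is taken from the instance file.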

The recommended way to configure a new Chipster instance is to use the configure.sh script located at the installation root directory. It will configure all the components and the Web Start client descriptor. You can also modify the configuration files manually. For information on meaning of the different configuration entries, please refer to https://github.com/chipster/chipster/blob/master/src/main/resources/chipster-config-specification.xml in the code repository.

Loading configuration over URL

Each Chipster component (client, analysis server, file broker etc.) has its own configuration file. If a configuration file is not explicitly specified, chipster-config.xml is used. Configuration can be loaded over URL by passing the argument -config at component startup. You can also specify a local file (e.g. -config file:/path/to/config.xml). For Web Start clients the configuration file can be set in the chipster.jnlp descriptor file. This mechanism allows the administrator to manage configuration (such as the address of the broker server) centrally.

The configuration file

The configuration file chipster-config.xml contains all configuration entries that different components require. See below for an example configuration file of a file broker component.

<configuration content-version="3">

    <configuration-module moduleId="messaging">

        <entry entryKey="broker-host">
            <value></value>
        </entry>

        <entry entryKey="broker-protocol">
            <value></value>
        </entry>

        <entry entryKey="broker-port">
            <value></value>
        </entry>

    </configuration-module>

    <configuration-module moduleId="security">

        <entry entryKey="username">
            <value>filebroker</value>
        </entry>

        <entry entryKey="password">
            <value>filebroker</value>
        </entry>

    </configuration-module>

    <configuration-module moduleId="filebroker">

        <entry entryKey="url">
            <value>http://chipster.example.com:8080</value>
        </entry>

        <entry entryKey="port">
            <value>8080</value>
        </entry>

    </configuration-module>

</configuration>

The file contains several modules (XML element configuration-module), and the selection of modules varies between components. The modules security and messaging define how a Chipster node connects to the messaging fabric and are always required. Additionally, there are node-specific modules, such as filebroker in the example.

Inside the module, there are configuration entries (XML element entry). Every entry has a key (XML attribute entryKey) and it contains one or more values (XML element value).
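
The structure described above is easy to inspect programmatically. The following sketch checks that the two always-required modules are present and lists entries whose values are still blank. The helper names missing_modules and empty_entries are hypothetical, and treating a blank value as something to report is an assumption made for the example.

```python
import xml.etree.ElementTree as ET

REQUIRED_MODULES = {"messaging", "security"}

# Shortened, illustrative configuration with one module missing.
SAMPLE = """
<configuration content-version="3">
  <configuration-module moduleId="messaging">
    <entry entryKey="broker-host"><value></value></entry>
  </configuration-module>
  <configuration-module moduleId="filebroker">
    <entry entryKey="port"><value>8080</value></entry>
  </configuration-module>
</configuration>
"""


def missing_modules(xml_text):
    """Return the required module ids that are not present in the document."""
    root = ET.fromstring(xml_text)
    present = {m.get("moduleId") for m in root.findall("configuration-module")}
    return REQUIRED_MODULES - present


def empty_entries(xml_text):
    """List (moduleId, entryKey) pairs whose values are all blank."""
    root = ET.fromstring(xml_text)
    blank = []
    for module in root.findall("configuration-module"):
        for entry in module.findall("entry"):
            values = [(value.text or "").strip() for value in entry.findall("value")]
            if not any(values):
                blank.append((module.get("moduleId"), entry.get("entryKey")))
    return blank


print(missing_modules(SAMPLE))  # {'security'}
print(empty_entries(SAMPLE))    # [('messaging', 'broker-host')]
```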

Firewalls and proxies

In a typical setup, the following TCP ports need to be open in the firewall:

  • 61616 for message broker service (Openwire or Openwire/SSL)
  • 8080 for file broker service (HTTP or HTTP/SSL)
  • 8081 for webstart service, optional (HTTP or HTTP/SSL)
  • 8082 for admin web console, optional (HTTP or HTTP/SSL)
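
Whether these ports are reachable from a client network can be checked with a few lines of Python. This is a minimal sketch that assumes the default port numbers listed above; the function names port_is_open and check_ports are illustrative, and a successful TCP connection only shows that the port is open, not that the service behind it works correctly.

```python
import socket

# Default ports from the list above; adjust if your installation differs.
DEFAULT_PORTS = {
    "message broker": 61616,
    "file broker": 8080,
    "webstart": 8081,
    "admin web console": 8082,
}


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_ports(host, ports=DEFAULT_PORTS):
    """Return {service name: reachable?} for the given host."""
    return {name: port_is_open(host, port) for name, port in ports.items()}
```

Calling check_ports with the public hostname of the server from a client workstation returns a dictionary that shows which services the firewall lets through.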

One of the design guidelines in Chipster was to make it easily adaptable to various firewall configurations. Even though there are many server components, only the message and file brokers listen on open ports. In other words, they act as a hub to which the other components connect. Both components are designed so that they can be installed on an "untrusted" machine located in the DMZ. Compute and authentication services often have to be located inside an intranet, which is not a problem as they do not act as servers from a networking point of view.

The client uses TCP or SSL to connect to the message and file brokers. This communication can be configured to use ports 80 and 443 to get through strict firewalls. In some high-security environments practically all network access is disabled, except for HTTP through a local proxy. Currently Chipster does not use HTTP for messaging, so in this extreme case deployment is not possible without changes to the firewall configuration. However, routing messages through HTTP is supported by the ActiveMQ message broker, so in the future these scenarios might also be supported directly.

By default Chipster ignores Java proxy settings and always uses a direct connection. This is because many proxies are not truly HTTP/1.1 compatible and mess up communication. It is possible to disable the override and make Chipster use Java proxy settings. In chipster-config.xml, add the following under the module messaging:

<entry entryKey="disable-proxy" type="boolean" description="should we ignore Java proxy settings and connect directly">
<value>false</value>
</entry>

The change needs to be made to the chipster-config.xml of the clients. In normal setups it is served by the webstart server and takes effect when clients are restarted.

NAT

Sometimes the server environment needs to be installed behind NAT (network address translation). This usually happens when the server environment connects to internal network, which is not visible to public internet. To make the system accessible externally, NAT host is added that directs traffic between internal and external networks. Nowadays such a setup is typical especially for cloud installation (e.g. OpenStack based environment).

Operating through NAT makes network configuration more complicated, because servers need to bind to addresses that differ from the ones clients connect to. Chipster's configure.sh tool asks separately for the public and private IP/hostname.

  • Run configure.sh
    • Use external address for public ip/host
    • Use internal address for private ip/host
  • Restart chipster services
    • sudo systemctl restart chipster

Internal address is the actual IP or host name of the node that the server is running on. External address is the NAT'ed externally visible IP or host name that is mapped to the internal address. Same port numbers must be used internally and externally.

Behind NAT, the client is able to communicate with the server only through the public address. Therefore, the public address is written to the files webstart/web-root/chipster.jnlp and webstart/web-root/chipster-config.xml. All server components use the private address to connect to the message broker.

Most communication happens through the message broker, but file transfers between client, file broker and comp are a special case. Both clients and comp server use file broker's public IP address for file transfers. If you want to optimize network traffic between file broker and comp servers and use internal IP instead, set this internal file broker IP in comp configuration:

<entry entryKey="overriding-filebroker-ip" type="string" description="connect to filebroker using this ip address instead of its public address">
<value>INTERNAL_IP</value>
</entry>

Secure communications

Setting up SSL

By default the Chipster virtual machine is configured to use SSL encrypted communication with self-signed keys. These keys are generated when the virtual machine is started for the first time. All communication is encrypted, but you should get your keys signed by a Certificate Authority (CA) to make sure you are connecting to an authentic server. Two components need keys, the message broker and the file broker, and you may want to create a separate key for each of them. See the Java Security documentation for how to get your keys signed by a CA.

Step 1. Locate keystore

You can use the self-signed keystores and truststore generated at the first boot of the virtual machine, generate your own (see Generating SSL keys), or use keys signed by a CA. Using CA signed keys is more secure and easier to configure, at least once you have acquired the signed certificate.

There are two parts in SSL: encryption and authentication. Encryption ensures the confidentiality of the communication and is based on private keys. The message broker and file broker have keystore files that contain their private keys. These files must be kept secret and must be available only to these two server processes.

Authentication ensures that the clients are communicating with an authentic server. When using CA signed keys, the client can check that the server has an authentic certificate for that hostname. The authenticity of CA signed certificates is verified using the certificate chain stored in the server's keystore and the root CA certificates included in Java. This verification happens automatically whenever a client connects to an SSL-secured server.

As the name suggests, self-signed keys aren't signed by any CA and thus the authenticity of the server can be verified only by having a certificate of the self-signed key on the client side. In Chipster, these certificates are saved in a single truststore file called client.ts, which is copied to all other components.
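
The difference between the two verification modes can be illustrated with Python's standard ssl module. This is only an analogy under stated assumptions, not how Chipster itself is implemented: Chipster uses Java keystores, whereas Python expects certificates in PEM format (keytool -exportcert -rfc writes PEM output), and the function make_client_context and its parameters are hypothetical.

```python
import ssl


def make_client_context(truststore_pem=None, verify_hostname=True):
    """Build a TLS client context for CA signed or self-signed server certificates."""
    # With truststore_pem=None the system's default root CA certificates are used.
    # With a PEM file, only the certificates in that file are trusted.
    context = ssl.create_default_context(cafile=truststore_pem)
    # Hostname verification can be switched off separately from certificate checks.
    context.check_hostname = verify_hostname
    return context


# CA signed server certificate: default roots, hostname verified.
ca_signed = make_client_context()

# Self-signed server certificate: in practice, pass the exported certificate as
# truststore_pem; it is omitted here so that the example runs without any files.
self_signed = make_client_context(verify_hostname=False)

print(ca_signed.check_hostname, self_signed.check_hostname)  # True False
```

In both cases the certificate itself is still verified; only the source of trust and the hostname check differ.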

Step 2. Configure message broker

By default, message broker's keystore is called broker.ks.

  • copy broker.ks to chipster/activemq/conf
  • open chipster/activemq/bin/<platform>/wrapper.conf and edit the following settings (uncomment if needed)
    • javax.net.ssl.keyStorePassword=password (or whatever you have used)
    • javax.net.ssl.keyStore=%ACTIVEMQ_BASE%/conf/broker.ks
  • open chipster/activemq/conf/activemq.xml and check that the protocol is "ssl" (you can change port also)

Step 3. Configure file broker

When you have done step 2, then all confidential information and metadata (including file names and owners) will be encrypted. You can also encrypt the payload of file transfers, though it will have impact on performance. To do so, you need to also configure the file broker to use SSL.

First, you need to have SSL keystore set up (step 1). Then you just have to

  • copy filebroker.ks to chipster/fileserver/security
  • open chipster/fileserver/conf/chipster-config.xml and in module "filebroker" within entry "url" change protocol from http to https (you can change port also)

By default, file broker's keystore file is called filebroker.ks and its password is password. If you have used something else, copy these configuration items to chipster/fileserver/conf/chipster-config.xml and edit accordingly.

<entry entryKey="filebroker-keystore" type="string" description="filebroker keystore file for SSL">
	<value>${chipster_security_dir}/filebroker.ks</value>
</entry>    
<entry entryKey="storepass" type="string" description="keystore password for SSL">
	<value>password</value>
</entry>

Step 4. Configure Chipster components

All components communicating with the message broker and file broker must check the identity of these servers. The configuration is a little different depending on whether you are using a CA signed or a self-signed certificate.

For a CA signed certificate, the virtual machine's default configuration for self-signed certificates must be removed. This is easiest to do with the configure.sh tool, but you can also remove these configuration items manually from each configuration file. The default values for these configuration items, when not overridden by the virtual machine configuration, are compatible with a CA signed certificate: no truststore is used and hostname verification is enabled.

cd /opt/chipster

bash configure.sh edit client remove security/client-truststore
bash configure.sh edit servers    remove security/server-truststore

bash configure.sh edit client remove security/verify-hostname
bash configure.sh edit servers    remove security/verify-hostname

In the case of a self-signed certificate, the virtual machine already has a suitable configuration. By default, the truststore file is called client.ts, its password is password, and hostname verification is disabled. For reference, these are the commands that create this default configuration, in case you want to edit any of the default values.

cd /opt/chipster

bash configure.sh edit client set security/client-truststore client.ts
bash configure.sh edit servers    set security/server-truststore '${chipster_security_dir}/client.ts'

bash configure.sh edit client set security/verify-hostname false
bash configure.sh edit servers    set security/verify-hostname false

Restart all server components.

sudo systemctl restart chipster

Restart the client as well, and the configuration is complete.

If the client application fails to start with UnknownHostException, the hostname cannot be resolved on the workstation. Java SSL requires that hostnames can be resolved for both endpoints. This can happen on Linux, so try "host foobar" in a shell, where foobar is your hostname. If it says "host not found", name resolution is not working for that hostname. You can add "foobar" to your /etc/hosts after localhost, like "127.0.0.1 localhost foobar", and it should work. You can also contact your system administrator to find out why your hostname cannot be resolved.

Some international versions of the Java Runtime do not have all the strong security components in place. If this is the case, you will get "RSA premaster secret error" when trying to run Chipster server. Installing "Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files" should fix it. They can be installed using your system's package manager (if available there) or from Oracle Java site.

How to disable SSL

When troubleshooting SSL, it is sometimes a good idea to check that everything works with the plain unencrypted TCP communication. It may be a simpler option also for small test systems in closed network environment. This is easiest to do with the configure.sh tool. Just change the message broker protocol from ssl to tcp and the file broker protocol from https to http.

cd /opt/chipster
bash configure.sh

Generating SSL keys

Chipster comes with a self-signed keystore that gets you going with SSL. Chipster uses Java's built-in SSL implementation. Keystore can be manipulated as explained in Java Security documentation, so you can also use your existing keys.

You should use CA signed keys whenever possible, but here we describe how you can generate your own self-signed SSL keys. Please note that these keys are not approved by any Certificate Authority and cause warnings if used outside the Chipster environment.

Step 1. Generate keys

Keys can be generated using Java's keytool application:

# use RSA certs, because there have been problems with DSA in Jetty according to Jetty docs
keytool -genkeypair -alias broker -dname "cn=chipster activemq self-signed" -validity 1800 -keypass password -storepass password -keyalg RSA -keystore broker.ks
keytool -genkeypair -alias filebroker -dname "cn=chipster filebroker self-signed" -validity 1800 -keypass password -storepass password -keyalg RSA -keystore filebroker.ks

Step 2. Export certificates and create truststore

keytool -exportcert -alias broker -storepass password -file broker-cert -keystore broker.ks
keytool -exportcert -alias filebroker -storepass password -file filebroker-cert -keystore filebroker.ks

keytool -importcert -alias activemq -storepass password -file broker-cert -keystore client.ts -noprompt
keytool -importcert -alias filebroker -storepass password -file filebroker-cert -keystore client.ts -noprompt

rm broker-cert
rm filebroker-cert

Step 3. Set server keys

mv broker.ks /opt/chipster/activemq/conf/
mv filebroker.ks /opt/chipster/fileserver/security/

Step 4. Distribute truststores

cp client.ts /opt/chipster/auth/security/
cp client.ts /opt/chipster/comp/security/
cp client.ts /opt/chipster/fileserver/security/
cp client.ts /opt/chipster/manager/security/
cp client.ts /opt/chipster/webstart/security/
cp client.ts /opt/chipster/webstart/web-root/

Step 5. Restart servers and clients

sudo systemctl restart chipster

Authentication

Users file

The simplest supported authentication mechanism is the user file in auth/security/users. The format is:

<username>:<password>:<exp. date as YYYY-MM-DD>:comment

Only username and password are required. Blank lines and comment lines starting with # are allowed.
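To make the format concrete, here is a minimal sketch of a parser for such a file. It is an illustration only and not how the authentication service reads the file: the function name parse_users is hypothetical, and it assumes that usernames and passwords contain no colons.

```python
from datetime import date


def parse_users(text):
    """Parse users-file content into a dict keyed by username."""
    users = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comment lines are allowed
        # At most four fields; any further colons stay inside the comment.
        fields = line.split(":", 3)
        if len(fields) < 2 or not fields[0] or not fields[1]:
            raise ValueError("username and password are required: %r" % line)
        expires = None
        if len(fields) > 2 and fields[2]:
            expires = date.fromisoformat(fields[2])  # YYYY-MM-DD
        comment = fields[3] if len(fields) > 3 else ""
        users[fields[0]] = {
            "password": fields[1],
            "expires": expires,
            "comment": comment,
        }
    return users
```

A line such as bob:pw:2030-01-31:temporary account yields an entry with an expiration date, while alice:secret yields one without.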

LDAP

See Authentication via LDAP.

Server components

Chipster server components can be divided into services and brokers. Services are independent components that perform tasks related to their roles. They do not open server sockets and for that reason can be deployed behind a firewall. When services are started, they "call back" to broker components that take care of communication between the services. Broker components do use server sockets.

Look at Directory layout to see where each of the components is located on the Chipster installation.

Message broker

Message broker is the hub of the system. It is based on the ActiveMQ server that implements the JMS standard.

Message broker is required. Logically there is only one message broker. ActiveMQ also supports clusters of brokers, so that the message broker can be run on multiple servers for improved performance and fault tolerance.

File broker

File broker is the central file repository of the system. It is based on the Jetty server and uses HTTP or HTTPS protocols for communication.

File broker is required. There can be only one of them in the installation.

Cloud sessions

Starting from Chipster 3.7, there is an experimental feature called Cloud sessions. It enables users to save analysis sessions on the server side. Cloud sessions are disabled by default, because the server administrator has to take care of safe storage of the data first. Offering persistent storage in a VM isn't trivial, and we can't do much to automate it, because every cloud has different options for it. Before enabling cloud sessions, you should consider:

  • Where are the files and the database stored?
  • How do you update the VM?
  • How do you backup these files?
  • How do you restore the backups?

Some ideas about these topics are provided below. To enable the Cloud sessions, add this configuration entry to the file /opt/chipster/webstart/web-root/chipster-config.xml and restart the client.

    <configuration-module moduleId="client">
	        <entry entryKey="enable-cloud-sessions" type="boolean" description="Enable cloud sessions">
		        <value>true</value>
	        </entry>
    </configuration-module>

Storage

Make sure the database files in /opt/chipster/fileserver/db-root and the actual dataset files in /opt/chipster/fileserver/file-root/storage are stored on some persistent disk (i.e. a 'volume' in a cloud) that can be moved to the new VM.

Migration

If you have the files on a volume, you should be able to do Chipster updates like this:

  • terminate the old VM
  • launch the new version and attach the old volume to it
  • mount and link the above folders from the old volume

This works as long as we don't change the database schema. If we do, we will provide some conversion script or instructions how to migrate the database to the new format.

Backups

Every disk will fail eventually, bugs may delete data, and administrators can make mistakes. If you enable Cloud sessions and your users assume that the files stored in Chipster will stay there, you should make backups regularly. You can make backup copies of the dataset files directly from the above folder. However, you should take the database backups from /opt/chipster/fileserver/metadata-backups instead of db-root to make sure that you get a consistent copy of the database. Store the backups on storage that is unlikely to fail at the same time as your primary storage. Please make sure that you can also restore the server from your backups.

By default the database is backed up at 10 past midnight every day, and 100 backups are kept by deleting the oldest backups as needed. This results in daily backups covering about 3 months in this folder. If the database is corrupted, these backups may help you to fix it as long as you don't lose the corresponding dataset files.

You can change metadata backup settings in fileserver/conf/chipster-config.xml.

Restoring filebroker database backup

There are two ways to restore the backup.

Option 1:

  • Stop the Chipster filebroker: systemctl stop chipster-fileserver or /opt/chipster/fileserver/bin/linux-x86-64/chipster-fileserver stop
  • Delete the corrupted or old database db-root/ChipsterFilebrokerMetadataDatabase
  • Copy the backed-up database metadata-backups/filebroker-metadata-db-backup-yyyy-mm-dd_hh-mm:ss/ChipsterFilebrokerMetadataDatabase to db-root
  • Start the Chipster filebroker again: systemctl start chipster-fileserver or /opt/chipster/fileserver/bin/linux-x86-64/chipster-fileserver start
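When scripting the restore, the newest backup directory has to be picked first. The sketch below shows one way to do it, assuming that the timestamp in the directory name is zero-padded so that alphabetical order equals chronological order. The function name latest_backup is hypothetical.

```python
PREFIX = "filebroker-metadata-db-backup-"


def latest_backup(names):
    """Return the newest backup directory name, or None if there is none.

    Relies on the assumption that the timestamp part of the name is
    zero-padded and ordered from year down to second, so that plain
    string comparison sorts the names by time.
    """
    backups = [name for name in names if name.startswith(PREFIX)]
    return max(backups) if backups else None
```

For example, latest_backup(os.listdir("/opt/chipster/fileserver/metadata-backups")) would return the name of the directory to copy from, after importing os.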

Option 2:

Add this configuration entry to the file /opt/chipster/fileserver/conf/chipster-config.xml. Add the name of directory in the metadata-backups folder that you want to restore inside the <value> tag. Restart the file broker and remove this entry from the configuration file, otherwise it will replace your database on the next restart also.

<configuration-module moduleId="filebroker">
	<entry entryKey="metadata-restore-path" type="string" description="directory name in the metadata-backup-dir to restore. This will overwrite the current database">
		<value></value>
	</entry>
</configuration-module>

Example sessions

Example sessions make it possible to try different client features and tools even when there isn't a suitable dataset at hand. Example sessions are stored on the filebroker. There is a special username example_session_owner, whose cloud sessions are shown to all other users as example sessions.

To modify these sessions, create a password for this special account just like for any other account and log in to the client with this username. The file menu contains Open cloud session, Save cloud session and Manage cloud sessions for managing these sessions. It is always safe to add new sessions, but removal of datasets or sessions should be done only during a service break to avoid causing problems for users that are accessing those datasets at the same time.

For programmatic access, it's easier to handle example sessions as zip files in /opt/chipster-beta/fileserver/file-root/example-session/. Any modifications to these files are applied to the example sessions when the filebroker is started. All modifications done by example_session_owner are also exported to this directory.

Additionally, example sessions can be published over http by setting configuration item example-session-path to public/example-session. After that it's possible to download all example sessions in a single tar archive at

http://<filebroker host>:<port>/public/example-session/all-example-sessions.tar

Importing large files

Importing tens or hundreds of gigabytes of files isn't always easy. The easiest way is to import files using the Chipster GUI on your laptop. Just make sure you have a 1 Gbit/s wired network connection on your laptop and make sure your power saving settings don't allow the laptop to go to sleep during the transfer. This allows transfer speeds up to 100 MB/s. In general purpose storage, be it a single hard drive or a big enterprise storage system, the throughput of a single stream is rarely more than this in real life.
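To get a feeling for the numbers: a 1 Gbit/s link can carry at most 125 MB/s (1000 Mbit/s divided by 8 bits per byte), so about 100 MB/s is a realistic ceiling. The small sketch below estimates the transfer time for a given amount of data. The function name transfer_minutes is hypothetical and the calculation ignores protocol overhead and storage speed.

```python
def transfer_minutes(size_gb, throughput_mb_per_s=100.0):
    """Estimate the transfer time in minutes for `size_gb` gigabytes.

    Uses decimal units (1 GB = 1000 MB), like network speeds are quoted.
    """
    if throughput_mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    seconds = size_gb * 1000.0 / throughput_mb_per_s
    return seconds / 60.0
```

With the default 100 MB/s, 60 GB takes about 10 minutes and 300 GB about 50 minutes.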

If you know that the storage and network on both your own file server and the Chipster server can do a lot better, your laptop becomes the bottleneck. Here are some ideas how to go around it:

Command line client: If you have SSH access to a server which has your files, you can use the Chipster command line client directly on the server to upload the files to Chipster and save them as a cloud session. You can then open the cloud session in the normal Chipster client.

Chipster with remote desktop software: If you have remote desktop access to a server which has your files, you can run the Chipster client on the server, import the files, save a cloud session and then open that cloud session on your own laptop.

Web server: If you can share your files with a web server, you can use the "Import from URL to server" tool in Chipster to copy the files. The firewall of the web server must allow connections from the Chipster comp server. The comp downloads the files directly from the web server, without moving them through your laptop. You have to consider whether it's ok to make your files available to the Chipster server.

Future improvements: It would be nice to mount a network drive to the file broker and use the files directly without copying them at all in the import phase. However, Chipster is a multiuser system and usually the files on the network drive cannot be public to all the users. We haven't implemented this, because we haven't found a generic way to configure the access rights for different Chipster users. It would be easier to build a feature which shows all the mounted files to all users, but this would be useful only to a limited set of server maintainers.

Compute service

Compute service takes care of all processing (tools in the client). It calls various backend applications and runtimes to do the actual computation.

Compute service is required. There can be many of them. Typically compute service is the only component that is distributed over multiple servers. When multiple services are added, clients negotiate with them and take care of load distribution over the services.

Compute service state management

The simplest way to run compute service is to make it completely stateless. For each job, it fetches inputs, does the processing and uploads outputs to file broker.

For better performance, compute service can access file broker repository directly (both inputs and outputs). By default the service will access files on the file broker directly from disk if they are installed on the same server. To disable the optimisation, you can clear the following entry from chipster/comp/config/chipster-config.xml:

<entry entryKey="local-filebroker-user-data-path" type="string" description="path to local filebroker's user data directory">
<value></value>
</entry>

Compute service cleans up after each job. For debugging purposes this can be disabled by adding the following entry to comp module in chipster/comp/config/chipster-config.xml:

<entry entryKey="sweep-work-dir" type="boolean">
<value>false</value>
</entry>

Compute service resource utilization

Many compute intensive tools like aligners can utilize more than one CPU core. For tools that parallelize well, doubling the number of cores may almost halve the runtime as long as there are free cores. The maximum number of cores each job can use can be altered by adding the following entry to the comp configuration module in /opt/chipster/comp/config/chipster-config.xml:

<entry entryKey="job-threads-max" type="int" description="max number or threads that single job should use">
	<value>2</value>
</entry>

By default each job can utilize 2 cores. If you let a single job utilize all the cores, it will run at maximum speed when alone, but the runtime will vary a lot depending on how many jobs are running at the same time on a compute node.

You should make sure the jobs won't run out of memory. Limit the number of jobs by adding the following entry to the comp configuration module in /opt/chipster/comp/config/chipster-config.xml:

<entry entryKey="max-jobs" type="int" mustBeSet="true" description="maximum number of jobs run simultaneously">
	<value>5</value>		
</entry>

By default this limit is disabled. You could start by setting this to the amount of RAM in gigabytes divided by 8 to make sure there is at least 8 GB of memory for each job. If there isn't enough memory to run two jobs at the same time, it's better to run them one after another than to use swap, which would slow down both jobs considerably.
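The rule of thumb for max-jobs can be written out as a small calculation. This is only a sketch of the arithmetic described above. The function name suggest_max_jobs is hypothetical, and the floor of one job is an added assumption so that a small server can still run something.

```python
def suggest_max_jobs(ram_gb, gb_per_job=8):
    """Suggest a starting value for max-jobs from the amount of RAM."""
    # Integer division rounds down, so each job gets at least gb_per_job.
    # Never suggest less than one job, otherwise nothing could run.
    return max(1, int(ram_gb // gb_per_job))
```

For a server with 16 GB of RAM this suggests 2 simultaneous jobs, and for 64 GB it suggests 8.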

Please note that virtualisation systems like VirtualBox have their own limits for the number of virtual cores and RAM that should be adjusted accordingly.

Authentication service

Authentication service checks each message from a client, requests authentication if needed and forwards the message to the authenticated area. JMS message topics in Chipster are divided into unauthenticated and authenticated. Only server components can write to authenticated topics, so all client messaging needs to pass through the authentication service to be picked up by other server components.

Authentication service is required. There can be only one of them in the installation.

Manager service

Manager server listens to the logging topics and writes log information to a database. It also offers web interfaces for accessing the database and monitoring the system.

Manager service is not required. In principle there can be multiple services running, but that would not be very useful.

By default, manager takes a backup of the database daily at 0:05 am and saves it under /opt/chipster/manager/database-backups. Daily backups are stored for 30 days, after which only the first backup of each month is retained.
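The retention rule can be illustrated with a short sketch. It is not the manager's actual code: the function name backups_to_keep is hypothetical, and treating a backup that is exactly 30 days old as still recent is an assumption.

```python
from datetime import date, timedelta


def backups_to_keep(backup_dates, today, daily_days=30):
    """Return the set of backup dates retained under the described policy."""
    cutoff = today - timedelta(days=daily_days)
    keep = set()
    first_of_month = {}
    for day in sorted(backup_dates):
        if day >= cutoff:
            keep.add(day)  # recent backups are all kept
        else:
            # Of the older backups, remember only the earliest one per month.
            first_of_month.setdefault((day.year, day.month), day)
    keep.update(first_of_month.values())
    return keep
```

Given daily backups from January 1 to March 15, the function keeps every backup from the last 30 days plus the first backups of January and February.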

When the service is monitored by automatically running jobs, these test jobs will eventually make the job database unnecessarily large. This can be avoided by defining the test accounts in manager configuration admin-test-account-list (see Configuration system). This will delete the test jobs from the database after 30 days and exclude also those jobs from the statistics in admin-web by default.

Admin web

Admin web is a web user interface for administration of the Chipster servers. It is disabled by default, but can be enabled in manager configuration /opt/chipster/manager/conf/chipster-config.xml:

<configuration-module moduleId="manager">
	<entry entryKey="start-admin" type="boolean" description="start admin web console">
		<value>true</value>
	</entry>

	<entry entryKey="admin-username" type="string" description="admin web console username">
		<value>chipster</value>
	</entry>

	<entry entryKey="admin-password" type="string" description="admin web console password">
		<value>chipster</value>
	</entry>
</configuration-module>

Restarting manager will start the Admin web in

http://<manager host>:8083/admin-web/

Features of admin web include:

  • list of connected servers and clients
  • view storage usage and delete cloud sessions
  • view list of running jobs
  • view, sort and filter content of a job database
  • view various job statistics
  • maintenance tools, like stop comp server gracefully

If your installation is visible to the internet, set firewall rules to restrict access to this port.

H2 console

Admin web has a user interface for viewing, sorting and filtering a job database and various job statistics. Alternatively, H2 console allows you to write SQL queries directly to the database. H2 console can be enabled in manager configuration:

<entry entryKey="database-username" type="string" description="username for JDBC">
	<value>chipster</value>
</entry>

<entry entryKey="database-password" type="string" description="password for JDBC">
	<value></value>
</entry>

<entry entryKey="start-web-console" type="boolean" description="is web console enabled">
	<value>true</value>
</entry>

Restarting manager after these changes will start the H2 console in

http://<manager host>:8082/

On the login page, set the connection URL to jdbc:h2:database/chipster-manager and enter the default credentials mentioned in the above config. If your installation is visible to the internet, set firewall rules to restrict access to this port.

Job Manager

Job Manager (JM) is a job scheduler for Chipster which receives authenticated job submissions from clients and executes the jobs on the Chipster Compute service on the client's behalf. Once a job has been submitted, the client can shut down, and it can check and fetch the results the next time it connects to the Chipster service.

Tool development

Writing Chipster tools

Basically, you have to do three things:

  • provide the tool itself (command line executable, R script, Java class etc.)
  • write a tool description in SADL format (see Describing tools with SADL), so that the script can be run and shown in the client application
  • make compute service aware of the tool

You should also follow conventions for Chipster analysis tools.

Adding and modifying tools

Chipster tools are divided into modules. Modules are high level packages that cover some specific area of data analysis, such as next generation sequencing. Tool modules are located at the frontend server in chipster/toolbox/tools directory. Each module has its own subdirectory, where the tools are located in tool type specific subdirectories. Tools can be R scripts, BeanShell scripts, or header stubs that define how command line tools are invoked etc. Besides the tools themselves, each module has a configuration file -module.xml that lists all tools, maps them to runtimes (configured at compute service level) and gives tool specific parameters, if needed.

To get started, go and have a look at the tools directory. After editing a tool script, take changes into use by running chipster/toolbox/reload-tools.sh. Run the job in the client to see the results. Please note that changes to script codes are in use immediately after running the reload-tools.sh script, but changes to tool headers and module configuration files additionally require a client restart.

Writing SADL header

SADL (Simple Analysis Description Language) is a simple notation for describing analysis tools so that they can be used in Chipster environment. SADL describes what input files the tool takes, what output files it produces, and what parameters are needed for running it. For the syntax of SADL please see Describing tools with SADL.

How SADL is embedded into a script depends on the script type. For example, in R scripts you start each line with hash (#), the comment notation of R. The SADL snippet must be the first thing in the script and there must not be any empty lines in it.
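The embedding convention can be illustrated by extracting the header from a script. The sketch below is not Chipster's own parser: the function name extract_sadl is hypothetical, and it simply collects the leading lines that start with a hash. An ordinary comment placed directly after the header without an empty line would therefore be picked up as well.

```python
def extract_sadl(script_text):
    """Return the SADL lines embedded as leading '#' comments of a script."""
    sadl = []
    for line in script_text.splitlines():
        if not line.startswith("#"):
            break  # an empty line or the first line of code ends the header
        sadl.append(line[1:].strip())  # drop the hash and surrounding spaces
    return sadl
```

Because the first empty line ends the header, an empty line inside the SADL snippet would cut the description short, which is why empty lines are not allowed there.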

Making R scripts Chipster compatible

Chipster uses regular R scripts. The only thing to remember is that interactive functions cannot be used.

Before running the script, the system runs the following initialisation snippet:

setwd(".")

The script should output results in table format to a file specified in description header. So, for example like this:

write.table(mytable, file="results.txt", quote=FALSE, col.names=FALSE, row.names=FALSE)

Paths and special variables in scripts

A set of Chipster related variables are made available in R and python scripts.

When referencing tool binaries or other scripts in tool script code, avoid hard coded absolute paths and prefer the following path related variables:

Variable | Description
chipster.tools.path | path to tools binaries (chipster/tools)
chipster.common.path | path to common module scripts (chipster/toolbox/tools/common/R)
chipster.module.path | path to the module to which this script belongs (chipster/toolbox/tools/modulename)
chipster.java.libs.path | path to jar files (chipster/shared/libs)

Other variables:

Variable | Description
chipster.threads.max | max number of threads a tool should use
chipster.memory.max | max amount of memory a tool should use

Creating manual pages

Manual pages are being delivered from the webstart server:

/opt/chipster/webstart/web-root/manual/

Chipster client maps manual pages to tools by using the ID of the tool. Postfix, if present, is removed and replaced with ".html". So if you have a tool with ID "example_tool.R", you need to create manual page called "example_tool.html" to the manual folder. Pages are shown in user's default browser, so all available web tricks can be used. Supporting material, like images, can be stored in the same directory or a subdirectory can be created.
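The mapping from tool ID to manual page name can be sketched as follows. The function name manual_page_name is hypothetical, and the sketch assumes that the postfix is whatever follows the last dot in the ID.

```python
import os.path


def manual_page_name(tool_id):
    """Map a tool ID to the file name of its manual page."""
    # splitext drops a trailing postfix such as ".R" if there is one.
    stem, _postfix = os.path.splitext(tool_id)
    return stem + ".html"
```

For the tool ID example_tool.R this gives example_tool.html, which is the file to create under /opt/chipster/webstart/web-root/manual/.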

Describing tools with SADL

SADL (Simple Analysis Description Language) is a simple notation for describing analysis tools so that they can be used in the Chipster framework. SADL describes what input files the tool takes, what output files it produces, and what parameters are needed to run it. In Chipster inputs are selected by user, clicking on datasets at the GUI. Parameters are used to create the parameter panel, and outputs are the datasets produced by the tool.

In SADL, each line describes one thing. The general format of a line is: 1) what it is 2) optionality 3) internal name 4) display name 5) type 6) type details 7) description. By default inputs and parameters are required to be set by the user, but they can also be declared optional. All display names are in quotes and descriptions are in parentheses, but the quotes and parentheses can be omitted when the string does not contain whitespace or operator characters. The required order of the lines is: TOOL, INPUT*, OUTPUT*, PARAMETER*. An example of a SADL description for a simple concatenation tool is given below.

TOOL concat.R: "Concatenate tool" (Concatenates two files.)
INPUT file1.txt: "First input" TYPE GENERIC (First file to concatenate.)
INPUT file2.txt: "Second input" TYPE GENERIC (Second file to concatenate.)
OUTPUT concatenated.txt: "Concatenated file" (The concatenated result file.)

The concatenation tool is very simple. It defines the tool name and description and then the two inputs we are going to concatenate and, finally, the single output. Read further to understand the syntax that is used to define names (first there is the technical name, a colon and then the human readable name).

Names

All names in SADL have the same syntax. They can have two parts: ID (technical name) and human readable name (shown in GUI). IDs should not be changed without a very good reason, as they are used to identify tools, parameters etc. in the Chipster framework. In particular, it is best not to change the IDs so that users' workflows remain valid. Human readable names can be changed freely.

Example of name without and with human readable part:

p_value
p_value: "The P-value"

The ID part of the name can be followed by a colon and a human readable name. The ID is required, but the human readable name is not. Both parts can be written in quotes, but the quotes can be omitted if the part is a simple string without spaces or operator characters.
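The two-part structure of a name can be illustrated with a very small sketch. It is not the SADL tokeniser used by Chipster: the function name parse_name is hypothetical, and it assumes that the ID itself contains no colon and that no escaped characters are present.

```python
def parse_name(text):
    """Split a SADL name into (ID, human readable name or None)."""
    # Everything before the first colon is the ID; the rest is the
    # optional human readable part.
    ident, colon, display = text.partition(":")
    ident = ident.strip().strip('"')
    if not colon:
        return ident, None
    return ident, display.strip().strip('"')
```

Applied to the two example names above, the sketch returns the ID p_value in both cases, together with either no display name or The P-value.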

Input and output files

Inputs are the data files that are being processed, and outputs are the results that are returned to user as data files. Input and output definition formats are:

INPUT (META) (OPTIONAL) name TYPE type (description)
OUTPUT (META) (OPTIONAL) name (description)

File names follow the normal conventions, with one addition. File name can contain the special string {...}, which makes it an input file set. Chipster binds all matching inputs and gives them names with numbering 1... replacing the special string.

Most tools should simply use type GENERIC for the input type. If you have an input file set, be careful with the input types and the order of the input definitions. You may have to define a stricter input type for the input file set to prevent it from binding all other datasets that should be bound to other inputs. The stricter input types are also used for backwards compatibility in some older tools. An up-to-date list of the available input types can be found from the source code:

Parameters

Parameters allow user to tune behavior of an analysis tool. They are shown in the graphical parameter panel in the Chipster user interface and stored to variables or given as arguments when running the tool.

Parameter definition format is:

PARAMETER (OPTIONAL) name TYPE type FROM min_value TO max_value DEFAULT def_value (description)

FROM, TO and DEFAULT are optional. Description can be left blank.

Valid parameter types are:

  • INTEGER
    • For integer values
    • Represented as a text box in GUI
  • DECIMAL
    • For decimal values
    • Represented as a text box in GUI
  • PERCENT
    • For percentages (integer between 0 and 100)
    • Might be removed in future, if there is no need for this
    • Represented as a slider in GUI
  • STRING
    • For free string values
    • Represented as a text box in GUI
  • [key1:val1, key2:val2, key3:val3]
    • For enumerated values (selection from a predefined list)
    • Valid values are given in square brackets
    • Represented as a drop-down list in GUI
    • The first part of each name is the actual technical value of the selection; if a second part is given, it is used in the GUI
  • COLUMN_SEL
    • For selecting one column from the input dataset
    • Possible values are read from the input dataset
      • In case of multiple inputs, present in all of them
    • Can also be empty
    • Represented as a drop-down list in GUI
  • METACOLUMN_SEL
    • For selecting one column from the phenodata
    • Behaves exactly like COLUMN_SEL, but uses phenodata as input dataset

Numeric parameters allow also minimum and maximum values to be set, by using keywords FROM and TO after the parameter type. For enumeration type, FROM and TO can be used to specify the minimum and maximum number of selections the user can make (by default one selection can be made).

All parameters allow a default value, which is given by using the keyword DEFAULT. The default value must be a valid value for the parameter. The user interface implements validity checking in real time, so writing "one" to an INTEGER text box or "10" to an INTEGER text box with a maximum of 5 results in an immediate error shown in the parameter panel and the run button being blocked.
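The kind of check described above can be sketched as follows. This is an illustration of the rule and not the client's implementation: the function name validate_integer and the error messages are hypothetical.

```python
def validate_integer(text, minimum=None, maximum=None):
    """Return None if `text` is an acceptable INTEGER value, else an error message."""
    try:
        value = int(text.strip())
    except ValueError:
        return "not an integer: %r" % text
    # FROM and TO are optional, so each bound is checked only when given.
    if minimum is not None and value < minimum:
        return "%d is below the minimum %d" % (value, minimum)
    if maximum is not None and value > maximum:
        return "%d is above the maximum %d" % (value, maximum)
    return None
```

Both validate_integer("one") and validate_integer("10", maximum=5) return an error message, whereas validate_integer("10", 0, 200) returns None.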

Create enumerated values automatically from files

It's possible to fill the enumerated values automatically according to files that are available on the server. This allows the server administrator to add or remove e.g. reference genomes on the server and the tool will automatically update the parameter options of the tool using those genomes.

# PARAMETER organism: "Organism" TYPE [other: "Own BED file", "FILES genomes/bed .bed"] DEFAULT other (Choose one of the reference organisms or provide your own BED file.)

Here the second option is FILES genomes/bed .bed. It will list all file names in the directory /opt/chipster/tools/genomes/bed ending with .bed. A parameter option will be added for each matching file, omitting the .bed part. These automatically generated options will be shown directly with their technical name and it's not possible to provide another human friendly name for them. However, it is possible to provide fixed options together with the automatically generated options, like the other option in the above example.
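The effect of a FILES option can be reproduced with a few lines of code, which is handy for checking what options a directory will produce. This is a sketch and not the toolbox implementation: the function name file_options is hypothetical and the alphabetical ordering is an assumption.

```python
import os


def file_options(directory, suffix):
    """List option names for the files in `directory` ending with `suffix`."""
    options = []
    for name in sorted(os.listdir(directory)):
        if name.endswith(suffix):
            # Omit the suffix, e.g. ".bed", from the option name.
            options.append(name[: len(name) - len(suffix)])
    return options
```

For example, file_options("/opt/chipster/tools/genomes/bed", ".bed") lists the names that would appear next to the fixed other option.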

How about having one of the automatically generated options to be the default? Writing one of the file names directly to the script is one way to do it, but often it would be easier to define it on the file system. You can create a symlink pointing to the file that is your default option.

ln -s Homo_sapiens.GRCh38.87.fa default

And reference that symlink in your SADL with SYMLINK_TARGET genomes/indexes/bowtie/default .fa:

# PARAMETER organism: "Genome" TYPE ["FILES genomes/indexes/bowtie .fa"] DEFAULT "SYMLINK_TARGET genomes/indexes/bowtie/default .fa" (Genome or transcriptome that you would like to align your reads against.)

This will locate the symlink genomes/indexes/bowtie/default, read the symlink's target file name (Homo_sapiens.GRCh38.87.fa), remove the .fa from the end of the name and use the remaining string as a default value.
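The resolution of the default value can be sketched in the same way. Again this is an illustration and not the toolbox code: the function name symlink_default is hypothetical.

```python
import os


def symlink_default(symlink_path, suffix):
    """Return the default option named by a symlink, without `suffix`."""
    # Read where the symlink points to and keep only the file name.
    target = os.path.basename(os.readlink(symlink_path))
    if target.endswith(suffix):
        target = target[: len(target) - len(suffix)]
    return target
```

With the symlink created above, symlink_default("genomes/indexes/bowtie/default", ".fa") returns Homo_sapiens.GRCh38.87 when run in the tools directory.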

The parameters will be updated when you run the toolbox reload:

/opt/chipster/toolbox/reload-tools.sh    

Advanced example

Below is an example of an imaginary tool that highlights the different features that can be described with the language.

TOOL util-test.R: "Test tool" (This tool description is shown to the user in the GUI (note that certain operators must be escaped\).)
INPUT microarray{...}.tsv: "Raw data files" TYPE CDNA (A set of 1 or more raw data files that are given as input.)
INPUT META phenodata.tsv: "Experiment description" TYPE GENERIC (Meta-level description of the input files.)
OUTPUT result.txt: "Result file" (The output file that this tool always produces.)
OUTPUT OPTIONAL warnings.txt: "Warning file" (The tool might produce warnings while running and then they are returned also.)
PARAMETER value1: "The first value" TYPE INTEGER FROM 0 TO 200 DEFAULT 10 (Description of this parameter)
PARAMETER OPTIONAL value2: "The second value" TYPE DECIMAL FROM 0 TO 200 DEFAULT 20.2 (Description of this parameter)
PARAMETER method: "The method" TYPE [linear: "Linear scale", logarithmic: "Logarithmic scale"] DEFAULT logarithmic (Description of this parameter)
PARAMETER genename: "Gene name" TYPE STRING DEFAULT at_1234 (Description of this parameter)
PARAMETER key: "Key column" TYPE COLUMN_SEL (Which column is used as a key)

Format of SADL syntax description

For geek users, a more formal syntax definition is below. It is in the form of rewrite rules. The first rule in the list is the initial rule where rewriting is started. Quoted texts are snippets of SADL. For example, TOOL is a term that is rewritten using the given rules, but "TOOL" is a string that should be found in the source code. Operators ?, +, * and | have their common semantics. The canonical syntax definition is maintained in the Javadoc documentation of the class SADLSyntax.

-> TOOL+
TOOL -> "TOOL" NAME DESCRIPTION INPUT* OUTPUT* PARAMETER*
INPUT -> "INPUT" META? OPTIONALITY? NAME "TYPE" TYPE_NAME DESCRIPTION
OUTPUT -> "OUTPUT" META? OPTIONALITY? NAME DESCRIPTION
PARAMETER -> "PARAMETER" OPTIONALITY? NAME "TYPE" PARAMETER_TYPE PARAMETER_FROM? PARAMETER_TO? PARAMETER_DEFAULT? DESCRIPTION
PARAMETER_TYPE -> TOKEN | PARAMETER_TYPE_ENUM
PARAMETER_TYPE_ENUM -> "[" PARAMETER_TYPE_ENUM_ELEMENTS "]"
PARAMETER_TYPE_ENUM_ELEMENTS -> NAME | NAME "," PARAMETER_TYPE_ENUM_ELEMENTS
PARAMETER_FROM -> "FROM" TOKEN
PARAMETER_TO -> "TO" TOKEN
PARAMETER_DEFAULT -> "DEFAULT" PARAMETER_DEFAULT_ELEMENT
PARAMETER_DEFAULT_ELEMENT -> TOKEN | TOKEN "," PARAMETER_DEFAULT_ELEMENT
OPTIONALITY -> "OPTIONAL"
META -> "META"
NAME -> TOKEN | TOKEN ":" TOKEN
DESCRIPTION -> TOKEN
TYPE_NAME -> TOKEN (see SADLSyntax.InputType for declaration, implementations pluggable)
TOKEN -> any single token produced by tokeniser

Output file names

Output file names generated by tools must be named to match the output names defined in the SADL header. It is, however, possible to alter the GUI names (the names shown to user).

When a tool is run the following steps take place:

  • A temporary work directory is created
  • The selected input files are copied to the working directory and named to match definitions in the SADL header as assigned
  • File "chipster-inputs.tsv" is created in the working directory

File "chipster-inputs.tsv" contains:

  • Optional header lines starting with #
  • A line for each input file with tab separated entries. The first column has the input file name as defined in SADL. The second column has the name as shown to the user in the GUI.

To change the GUI names of outputs, create a matching file called "chipster-outputs.tsv" in the working directory. Add a line for each output you want to rename: the first column should have the output name as defined in SADL, the second column the desired GUI name. Both the file and its individual lines are optional. Any output without a line is shown with its original name.
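
As a minimal sketch of this mechanism, a Python tool could read the GUI names of its inputs and write "chipster-outputs.tsv" along the following lines. The function names and the way the working directory is passed in are assumptions made for illustration. Only the two file names and the two-column, tab separated layout come from the description above.

```python
from pathlib import Path
from typing import Dict


def read_input_names(work_dir: Path) -> Dict[str, str]:
    """Map SADL input names to GUI names, as listed in chipster-inputs.tsv."""
    names: Dict[str, str] = {}
    with open(work_dir / "chipster-inputs.tsv", newline="") as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            # Skip blank lines and the optional header lines starting with #.
            if not line or line.startswith("#"):
                continue
            fields = line.split("\t")
            if len(fields) < 2:
                continue
            names[fields[0]] = fields[1]
    return names


def write_output_names(work_dir: Path, renames: Dict[str, str]) -> None:
    """Write chipster-outputs.tsv with one line per output to rename."""
    with open(work_dir / "chipster-outputs.tsv", "w", newline="") as handle:
        for sadl_name, gui_name in renames.items():
            handle.write(f"{sadl_name}\t{gui_name}\n")
```

A tool with a hypothetical input "input.tsv" and output "result.tsv" could then call read_input_names(Path(".")), look up the GUI name of "input.tsv", and pass something like {"result.tsv": "filtered_" + gui_name} to write_output_names so that the result carries a name the user recognises.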

Tool conventions

The goal in Chipster is to always produce a coherent user experience. The conventions below are useful when integrating tools into Chipster, and should be followed when writing tools that are to be integrated into the Chipster main repository.

NGS analysis module

  • Tools should accept and produce read data in FASTQ and BAM format when possible

Microarray analysis module

  • The default data format is TSV (tab separated values), with one row for each gene or probeset
  • The first column should be unnamed or "identifier" and contain the gene/probeset name
  • Tools should not remove any existing columns unless the row structure is changed. In other words, inputs can contain annotation and other data that simply passes through the analysis steps
  • See AnalysisToolInputsAndOutputs for more information
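
The pass-through convention can be sketched in Python as follows. This is only an illustration under stated assumptions: the function add_column is hypothetical, the header row is assumed to name every column (with "identifier" first), and the column names "chip.sample1", "symbol" and "flag" are made up for the example.

```python
import csv
import io
from typing import Callable, Dict


def add_column(tsv_text: str, new_column: str,
               compute: Callable[[Dict[str, str]], str]) -> str:
    """Append one column and pass every existing column through unchanged."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # Keep the original column order and put the new column last.
    fieldnames = list(reader.fieldnames or []) + [new_column]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[new_column] = compute(row)
        writer.writerow(row)
    return out.getvalue()


# Hypothetical input: one expression column and one annotation column.
source = "identifier\tchip.sample1\tsymbol\ng1\t2.0\tABC\ng2\t0.5\tDEF\n"
result = add_column(
    source, "flag",
    lambda row: "yes" if float(row["chip.sample1"]) > 1 else "no")
```

The annotation column "symbol" is never inspected by the computation, yet it is still present in the output, which is the behaviour the convention asks for.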

Troubleshooting

For getting support, please use the chipster-tech mailing list. You don't need to subscribe to send or view messages.

If you have problems with getting the Chipster VM working, the following steps may help in troubleshooting the problem:

  1. Check that the VM is running

    • the virtualisation software should say that the VM is running
    • log in to the VM using username: ubuntu, password: chipster
  2. Check that the VM network is working

    • see the networking part of the virtualisation software
    • log in to the VM; the IP address should be visible in the login message
    • run ifconfig to get more information; eth0 should be configured and running
    • check that the VM responds to ping: run ping <vm ip> from outside the VM
  3. Check that all the Chipster services are running

    • in the VM, run: sudo systemctl status chipster
    • if one or more of the services are not running, try sudo systemctl restart chipster or sudo systemctl restart chipster-<service>, for example sudo systemctl restart chipster-auth if chipster-auth is not running
    • if a service is not starting, see /opt/chipster/<service>/logs/chipster.log and /opt/chipster/<service>/logs/wrapper.log for hints about what could be wrong
  4. Check that Chipster front page is reachable from within the VM:

    • in the VM, run wget http://localhost:8081
    • this should retrieve the index.html file
    • if it does, the Chipster front page service is running ok (but is maybe not reachable from outside the VM)
    • if it doesn't, the Chipster front page service (chipster-webstart) is not working properly, see /opt/chipster/webstart/logs/chipster.log and /opt/chipster/webstart/logs/wrapper.log for hints about the problem
  5. Reconfigure Chipster (NOTE: this will replace current Chipster configuration)

    • in the VM, run cd /opt/chipster/; sudo ./configure.sh auto; sudo systemctl restart chipster
    • this could be needed, for example, after changing the IP address of the VM
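
Steps 2 and 4 can be complemented with a check run from outside the VM. The Python sketch below only tests whether a TCP connection to a given port can be opened. It is not part of Chipster, the function name port_is_open is hypothetical, and port 8081 is taken from the wget command in step 4.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable host names.
        return False
```

For example, call port_is_open("<vm ip>", 8081) with the IP address of the VM filled in. If the wget test inside the VM succeeds but this check fails from outside, the problem is more likely in the network configuration of the virtualisation software or a firewall than in the Chipster services themselves.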