
Ensembl/webvm

-*- org -*-

Heading levels are indicated by lines =~ m{^\*+}

Documentation

What is this config?

This project contains an Apache2 configuration suitable for development (on a deskpro or laptop) and live use (on a virtual machine), including

  • an httpd.conf laced with “Include”s and ${ENVIRONMENT_VARIABLE} substitutions
  • supporting shell script
  • Perl libraries to bootstrap the provision of APIs needed by the Otter Server
  • Perl libraries to support internal development

Other components, which it does not contain but can bring in, are

  • the httpd, by default from an Ubuntu package
  • larger Perl libraries (webvm-deps.git)
    • WTSI Single Sign On
    • Ensembl
    • BioPerl

Further notes are at http://mediawiki.internal.sanger.ac.uk/wiki/index.php/Anacode:_Web_VMs

Where did it come from?

The Git history shows the details.

It is based on a build-from-tarball Apache 2.2.17, plus modifications until it looked likely to behave as we need.

How do I install it?

Assuming you have the latest setup.*.sh scripts handy and wish to install files for your own area:

If you are root (e.g. installing to a laptop), use

sudo ./setup.root.sh

sudo aptitude install apache2-mpm-prefork # Linux
sudo port -v install apache2 +preforkmpm # MacPorts - mca’s guess

Then install the config and the (large!) Perl libraries:

./setup.user.sh

(Optional) Build httpd from the tarball, then link it in:

cd $WEBDIR
rm -rf httpd
ln -s ~/my-httpd/ httpd

Running under MacOS X

Running on a Mac is not (yet) supported; it would require chasing out the Ubuntu dependencies, some of which are marked with [X]XX:UBUNTU.

What should it do?

Run for anyone, independent of location, like this

/usr/sbin/apache2ctl -f /www/$USER/ServerRoot/conf/httpd.conf …

which is cumbersome, so use

/www/$USER/start
/www/$USER/stop

Possible actions/options for that script are likely to change.

The easiest way to add new operations (which need to share config) is to put them in tools/ .

Which httpd?

Ubuntu Lucid

Use of /usr/sbin/apache2 is the default, to match the production web servers.

Reasons to share httpd builds

  • Builds take time, and we can reuse them portably (per arch)
  • Configuration has many options, and the defaults are wrong
  • making the LoadModule directives match
    • modules come from the build, as defined by ./configure
    • LoadModule directives come with webvm.git
    • keeping them in sync is messy, but try “./APACHECTL checkmods”
  • So Macs & deskpros can have matching httpd

Run with home-installed / local Apache

You can run with a locally built and installed Apache, provided it includes the necessary features. Note that the default build does not include mod_rewrite.

You can (p)reset the environment seen by the first call to apache2 by writing to the file APACHECTL.sh. This file is listed in .gitignore, is sourced by APACHECTL and does not need to be executable.

This is useful for running with a locally installed Apache2 binary. See branch mca/deskpro for how I do it.

Accumulated wisdom for ./configure of 2.2.x

Configurations used in the past,

  • ./configure --prefix=$HOME/_httpd --exec-prefix=$HOME/_httpd/i386 && make && make install
    • lacks mod_rewrite
    • putting binaries down a level might allow Mac support, but I didn’t use that
    • mca’s first “local Apache”
  • ./configure --enable-rewrite --enable-so --with-mpm=prefork --prefix=$HOME/_httpd/httpd --enable-pie --enable-mods-shared=most
    • PIE is for security (location in memory is randomised)
    • these enable switches may be redundant
    • an attempt to use for webvm

Environment variables for web server

Apache’s configuration files interpolate ${ENVIRONMENT_VARIABLES} written like that. This avoids having to generate them from a template with hardwired paths (as the shipped default seems to do).

APACHE2_MODS

The path to files for the LoadModule directive. Default is for Ubuntu.

APACHE2_SHARE

The path to icons and errors. Default is for Ubuntu.

WEBDIR

WEBDIR points to the git working copy, which contains your ServerRoot, htdocs etc.

This variable is derived from $0, the path of the script being run.
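A minimal sketch of deriving WEBDIR from $0, assuming the script being run (e.g. APACHECTL) sits at the top of the working copy. The function name webdir_from_script is hypothetical and not taken from APACHECTL.

```sh
#!/bin/sh
# Hypothetical sketch: WEBDIR is the directory holding the running script.
# Assumes the script sits at the top of the working copy; a script kept in a
# subdirectory (e.g. utilities/) would need one more "/.." step.
webdir_from_script() {
    # Run in a subshell so the caller's working directory is untouched,
    # and print the physical (symlink-resolved) absolute path.
    (cd "$(dirname "$1")" && pwd -P)
}

WEBDIR=$(webdir_from_script "$0")
export WEBDIR
```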

WEBTMPDIR

By the Web Team’s convention, host-specific writable files (including logs and lockfiles) are kept separate. This makes it clear what files should not be copied, when cloning a machine.

WEBTMPDIR is Anacode’s name for that directory. It defaults to something like /www/tmp/$USER, being derived as a niece of WEBDIR. An override value may be passed in or set from APACHECTL.sh.

This directory must exist, be writable and be stored on a local filesystem.
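A sketch of the niece derivation and the three requirements, assuming WEBDIR has the form /www/$USER. The helper names are hypothetical, and using df -l as a local-filesystem test is a heuristic, not necessarily what APACHECTL does.

```sh
#!/bin/sh
# Hypothetical sketch: WEBTMPDIR as a "niece" of WEBDIR, i.e. the child of
# WEBDIR's sibling "tmp": /www/mca -> /www/tmp/mca.
niece_tmpdir() {
    printf '%s/tmp/%s\n' "$(dirname "$1")" "$(basename "$1")"
}

# Check the stated requirements: exists, writable, on a local filesystem.
check_tmpdir() {
    [ -d "$1" ] || { echo "$1: does not exist" >&2; return 1; }
    [ -w "$1" ] || { echo "$1: not writable" >&2; return 1; }
    # Heuristic: "df -l" only reports local filesystems.
    df -l "$1" >/dev/null 2>&1 || { echo "$1: not on a local filesystem?" >&2; return 1; }
}

# Keep any value passed in or set from APACHECTL.sh; otherwise derive it.
if [ -z "${WEBTMPDIR:-}" ] && [ -n "${WEBDIR:-}" ]; then
    WEBTMPDIR=$(niece_tmpdir "$WEBDIR")
fi
```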

Not exported to httpd

These variables are used within the APACHECTL script

APACHE2

The path to the apache2 httpd executable, looked up via the prevailing $PATH. Default is for Ubuntu.

This is word-expanded before use, so it can also do environment setup.
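A sketch of what word expansion allows, assuming the script expands $APACHE2 unquoted (as the note above implies). A leading VAR=value inside the variable would not work after expansion (unless the script uses eval), because the shell recognises assignments before it expands variables, so env is used instead. run_apache2 and the override paths are illustrative only.

```sh
#!/bin/sh
# Hypothetical sketch: APACHE2 may hold several words, e.g. an "env" prefix.
: "${APACHE2:=/usr/sbin/apache2}"    # Ubuntu default

run_apache2() {
    # Deliberately unquoted: the value is split on whitespace (and globbed),
    # so "env VAR=value /path/to/httpd" runs httpd with VAR set.
    $APACHE2 "$@"
}

# Illustrative override for a home-built httpd (paths are examples only):
#   APACHE2="env LD_LIBRARY_PATH=$HOME/my-httpd/lib $HOME/my-httpd/bin/httpd"
```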

op

The operation APACHECTL is about to perform.

WEBDEFS

WEBDEFS takes comma-separated keywords to pass as “apache2 -D” flags.

It currently defaults to “vanilla”, but this may change.

Useful options are

DEVEL
enable the server-status & server-info pages. This comes from our config.
DEBUG
to run a single Apache process and stop it going into the background. This is an Apache option.
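A sketch of turning WEBDEFS into "-D" flags; the function name and the exact command line in the comment are assumptions, not copied from APACHECTL.

```sh
#!/bin/sh
# Hypothetical sketch: "DEVEL,DEBUG" -> "-D DEVEL -D DEBUG".
webdefs_to_flags() {
    old_ifs=$IFS
    IFS=,            # split on commas only
    set -f           # no filename globbing while splitting
    flags=
    for def in $1; do
        if [ -n "$def" ]; then
            flags="$flags -D $def"
        fi
    done
    set +f
    IFS=$old_ifs
    printf '%s\n' "${flags# }"    # drop the leading space
}

: "${WEBDEFS:=vanilla}"    # the current default
# The result is left unquoted so each word becomes its own argument, e.g.
#   $APACHE2 -f "$WEBDIR/ServerRoot/conf/httpd.conf" $(webdefs_to_flags "$WEBDEFS") -k start
```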

Perl environment for Otter Server

TL;DR

@INC and taint mode

The main problem is configuring our @INC while also enabling taint mode.

  • we run root’s Perl but we are not root, so we cannot add modules to the existing @INC
  • taint mode means we cannot add to @INC from the environment (PERL5LIB and PERLLIB are ignored under -T)
  • a wrapper script between httpd and Perl (to run “perl -T -I… $script”) works
    • it adds complexity
    • there may be a small performance penalty
    • isn’t clean enough to run in production
    • we don’t need to keep it in the long term
  • self-wrapping Perl scripts can do it
    • Perl starts untainted
    • use a module to find the libs and “exec $^X -I… -T $0” unless ${^TAINT}
    • neat but slightly slower (+0.0043s real time)
    • a chance to forget to run in taint mode
    • code checkers looking for “#! perl -T” won’t see it
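The wrapper-script approach might look like this minimal sketch. OTTER_LIBS (a colon-separated list), REAL_SCRIPT and the function name are hypothetical, and the real cgi_wrap/otterlace_cgi_wrap may differ. Because PERL5LIB is ignored under -T, the directories go on the command line as -I switches.

```sh
#!/bin/sh
# Hypothetical wrapper sketch: re-run a Perl CGI script under taint mode, with
# extra library directories passed as -I switches (PERL5LIB is ignored under -T).
# OTTER_LIBS is an assumed colon-separated list; PERL defaults to the OS Perl.
run_tainted() {
    # usage: run_tainted SCRIPT [ARGS...]
    nargs=$#
    old_ifs=$IFS
    IFS=:
    set -f
    for dir in $OTTER_LIBS; do
        if [ -n "$dir" ]; then
            set -- "$@" "-I$dir"    # append, keeping the listed order
        fi
    done
    set +f
    IFS=$old_ifs
    # Rotate SCRIPT and ARGS from the front to the end: -I... SCRIPT ARGS...
    while [ "$nargs" -gt 0 ]; do
        set -- "$@" "$1"
        shift
        nargs=$((nargs - 1))
    done
    exec "${PERL:-/usr/bin/perl}" -T "$@"
}

# e.g. as the last line of a wrapper (REAL_SCRIPT is hypothetical):
#   run_tainted "$REAL_SCRIPT"
```

Order matters here because earlier -I directories take precedence in @INC, hence the append-then-rotate dance rather than simply prepending each flag.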

Which Perl to run?

Options for Perl are

  1. /usr/bin/perl (OS Perl)
  2. /software/perl-*/bin/perl . Not available on web VMs.
  3. /usr/local/bin/perl (may point to OS Perl or /software)
  4. compile or install our own, with attendant libraries.

Until Otter v67 we used /usr/local/bin/perl, which points to the OS Perl on webservers and to /software/perl-5.8.8 on deskpros (local Apache) via the /software/bin/perl symlink.

Options for choosing Perl are

  1. hardwired #! line (one size fits all)
  2. hardwired #! line (script installer overwrites it)
  3. #!/usr/bin/env perl (not compatible with “perl -T”)
  4. scripts can self-wrap when they don’t like their environment

For production servers we want a hardwired #! for simplicity. Development servers can self-wrap when configured to do so.
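For option 2, an installer could rewrite the interpreter on the #! line while keeping switches such as -Tw. This is a sketch with a hypothetical function name, not an existing installer step.

```sh
#!/bin/sh
# Hypothetical installer step: point each script's #! line at PERL_PATH,
# keeping any switches, e.g. "#!/usr/local/bin/perl -Tw" -> "#!/usr/bin/perl -Tw".
# Lines such as "#!/usr/bin/env perl" are left alone. PERL_PATH must not
# contain "|" or "&", which are special in this sed replacement.
set_perl_shebang() {
    perl_path=$1
    shift
    for f in "$@"; do
        tmp=$(mktemp) || return 1
        if sed "1s|^#![^ ]*perl|#!$perl_path|" "$f" >"$tmp"; then
            cat "$tmp" >"$f"    # rewrite in place, keeping the file's mode
        fi
        rm -f "$tmp"
    done
}
```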

Environments to support - details

perl on intwebdev & others

Current (2012-06) live webservers find Otter Server like this.

  • scripts have #!/usr/local/bin/perl -Tw
    • calls /usr/bin/perl which is 5.8.8
  • @INC contains
    • /etc/perl
    • /usr/local/lib/perl/5.8.8
    • /usr/local/share/perl/5.8.8
    • /usr/lib/perl5
    • /usr/share/perl5
    • /usr/lib/perl/5.8
    • /usr/share/perl/5.8
    • /usr/local/lib/site_perl
  • script uses SangerPaths to add to @INC
    • /usr/lib/perl/5.8/SangerPaths.pm takes config from the webteam, of which we use (some of) these keywords:
      core
        /WWW/SHARED_docs/lib/core
        /WWW/SANGER_docs/perl
        /WWW/SANGER_docs/bin-offline
        (plus /usr/local/oracle/lib)
      bioperl123
        /WWW/SHARED_docs/lib/bioperl-1.2.3
      ensembl65
        /WWW/SHARED_docs/lib/ensembl-branch-65/ensembl-draw/modules
        /WWW/SHARED_docs/lib/ensembl-branch-65/ensembl-variation/modules
        /WWW/SHARED_docs/lib/ensembl-branch-65/ensembl-compara/modules
        /WWW/SHARED_docs/lib/ensembl-branch-65/modules
        /WWW/SHARED_docs/lib/ensembl-branch-65/ensembl-external/modules
        /WWW/SHARED_docs/lib/ensembl-branch-65/ensembl/modules
        /WWW/SHARED_docs/lib/ensembl-branch-65/ensembl-pipeline/modules
        /WWW/SHARED_docs/lib/ensembl-branch-65/ensembl-webcode/modules
        /WWW/SHARED_docs/lib/ensembl-branch-65/ensembl-functgenomics/modules
      otter$N
        /WWW/SANGER_docs/lib/otter/$N
    • root put that there, we don’t (in general) have that option

Otter Server on local Apache (2011 vintage)

In order to run existing code on local Apache, we did (in pseudo-code and omitting error handling etc.) this,

  • httpd.conf uses
    • SetEnv to supply OTTERLACE_SERVER_ROOT and proxy settings
    • ScriptAliasMatch to send requests through team_tools/otterlace/server/cgi-bin/cgi_wrap
  • cgi_wrap
    • inspects $REQUEST_URI
    • locates the real CGI script under configured $OTTERLACE_SERVER_ROOT
    • runs team_tools/otterlace/server/bin/otterlace_cgi_wrap
  • otterlace_cgi_wrap
    • locates team_tools/otterlace/server/perl/
    • adds that and /software/anacode/lib{/site_perl} to @INC using “perl -I”
    • runs the real CGI script under “perl -T”
  • team_tools/otterlace/server/perl/ contains “fake” modules
    SangerPaths.pm
    • provides requested libraries, from the subset we need
    • patches up %ENV to meet expectations of Bio::Otter::ServerScriptSupport
    • passes $OTTERLACE_ERROR_WRAPPING_ENABLED to B:O:SSS
    SangerWeb.pm
    provides the minimum we need, with “always enabled” authentication

web-ottersand01

The environment is more restrictive,

  • /software does not exist in production
  • /usr/local/bin/perl points to /software/bin/perl (does not exist)
  • /usr/bin/perl is Ubuntu OS Perl, 5.10.1
  • vanilla @INC is
    • /etc/perl
    • /usr/local/lib/perl/5.10.1
    • /usr/local/share/perl/5.10.1
    • /usr/lib/perl5
    • /usr/share/perl5
    • /usr/lib/perl/5.10
    • /usr/share/perl/5.10
    • /usr/local/lib/site_perl
    • .
  • the taint-mode @INC is the same, but without the final . entry
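To reproduce these lists on another host, a small sketch; PERL defaults to /usr/bin/perl and the helper names are made up. On Perl 5.26 and later, . is no longer in @INC by default, so the only difference there may be any PERL5LIB entries, which -T drops.

```sh
#!/bin/sh
# Sketch: list @INC one entry per line. PERL defaults to the OS Perl;
# pass -T to see the taint-mode list.
show_inc() {
    "${PERL:-/usr/bin/perl}" "$@" -e 'print "$_\n" for @INC'
}

# Entries in the normal list that are missing under -T.
inc_dropped_by_taint() {
    normal=$(mktemp)
    tainted=$(mktemp)
    show_inc >"$normal"
    show_inc -T >"$tainted"
    grep -vxF -f "$tainted" "$normal"
    rm -f "$normal" "$tainted"
}
```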

Unlike the deskpro /usr/bin/perl, this one includes a full set of DBI modules.

Running Otter Server directly from a Git clone

This can be done, but is potentially fragile:

ln -s $EO/modules $WEBDIR/lib/otter/74
ln -s $EO/scripts/apache $WEBDIR/cgi-bin/otter/74

The differences between this and an install with `otterlace_build --server-only` are

  • includes GUI-only modules not in the Bio:: namespace
  • doesn’t contain Bio::Otter::Git::Cache
    • this absence can cause taint failures
    • since 3020f3f1 it shouldn’t break at compile time

It works, but what happens when the checkout branch jumps to a new version and the symlink becomes stale?

Containerised web apps

  • We do not expect to be able to use virtual hosts in this setting.
  • We do not want per-application edits to the main httpd.conf file.

Lacking (knowledge of) a standard scheme for putting multiple applications on a server in a self-contained way, we push the responsibility for configuring url:file mappings onto the webapp.

Apps are expected to

  • be a directory ${WEBDIR}/apps/$APPNAME/ (most likely a git working copy)
  • supply Apache config in apps/$APPNAME.conf
    • this should be derived from the template config “apps/$APPNAME/app.conf”, process to be defined
  • accept the namespace given to them as $APPNAME, possibly with some interpretation
  • apps whose files all reside externally can be added by dropping in one file that is not derived from a template; index.conf does this.

Caveats

Use of non-Sanger domains, including localhost

Note that the Otter Client writes the WTSISignOn cookie from enigma with a .sanger.ac.uk domain, so it will not be used if you are connecting to localhost.

(If you need to do that, making a duplicate cookie to offer to localhost should work.)

The symptom is being asked for the password repeatedly, getting 200 OK each time.

Open questions

Do we want host-dependent environment setup?

As in “source APACHECTL.$(hostname -s).sh” or something more generalised for a class of machines, possibly also providing the notion of dev/live

See commits eadc4d27 (“wrapper script config for webteam standard VMs”, adding www-dev.sh) and 13d197ce (rename to setup/www-dev.sh)

The recipe matches that used by the webteam, e.g. for detecting that it is running in /www/www-live.
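One possible shape for host-dependent setup, as a sketch only: the per-host file name follows the example in the question above, and source_host_setup is a hypothetical helper, not current behaviour.

```sh
#!/bin/sh
# Sketch only (the question above is still open): source the general fragment,
# then an optional per-host one such as APACHECTL.web-ottersand-01.sh.
source_host_setup() {
    dir=$1
    host=${2:-$(hostname -s)}
    for f in "$dir/APACHECTL.sh" "$dir/APACHECTL.$host.sh"; do
        if [ -f "$f" ]; then
            . "$f"    # sourced, so assignments affect this shell
        fi
    done
}
```

Sourcing the per-host file second lets it refine, rather than replace, the general settings.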

trigger DEVEL mode

This is done by supplying WEBDEFS=DEVEL either directly from the calling environment or via a shell fragment at setup/APACHECTL.sh, which should exist only on developers’ branches, e.g. mca/deskpro.

How do we get HTTP_CLIENTREALM set?

It is passed down from the ZXTM front-end proxy as the Clientrealm: HTTP header, which Apache presents to CGI scripts as HTTP_CLIENTREALM.

SangerWeb.pm (& related) expect it, and it is used (only by testing for qr/sanger/) in parts of ensembl-otter.

On developers’ Apache configs, for now, HTTP_CLIENTREALM can be set by including “lib/devstubs” in OTTER_PERL_INC. See commit 312b916f.
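The HTTP_ prefix comes from the CGI rule for request headers (RFC 3875, section 4.1.18). A small sketch of the mapping; the function name is hypothetical, and the commented curl line (sending the header directly, bypassing ZXTM) is an untested assumption.

```sh
#!/bin/sh
# Sketch of the CGI rule: header "Clientrealm" -> variable HTTP_CLIENTREALM
# (upper-case the name, turn "-" into "_", prefix "HTTP_").
cgi_var_for_header() {
    printf 'HTTP_%s\n' "$(printf '%s' "$1" | tr 'a-z-' 'A-Z_')"
}

# Untested assumption: when talking to Apache directly (not via ZXTM), a test
# client could send the header itself, e.g.
#   curl -H 'Clientrealm: sanger' "http://localhost:$MY_PORT/"
```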

ServerConf/conf/ contents

What is /ServerRoot_B0RK ?

The Apache build process hardwires various filenames into the config it generates.

I replaced (15474d66639df5ec8f558b30e930e6102c765156) these paths with something apparently valid but not existing on the filesystem.

This prevents accidental dependency on something we should not be using and makes for an easy grep token.

Then I started replacing them with things that work.

Fixmes and grubbiness that may be worth improving later

git grep -hE '[X]XX:|[B]0RK/|[T]ODO' | sed -e 's/^[ #]*//; s/XXX/xXX/; s/[T]ODO/tODO/; s/^/ /' | LC_ALL=C sort -uf

*** tODO avoid tripping Bio::Otter::Git up, by deploying via a git repo
*** tODO rescue content from intweb
*** tODO Set MaxClients in conf/extra/httpd-mpm.conf
ServerRoot_B0RK/error/include files and copying them to your/include/path,
<Directory "/ServerRoot_B0RK/cgi-bin">
<Directory "/ServerRoot_B0RK/error">
<Directory "/ServerRoot_B0RK/manual">
<Directory "/ServerRoot_B0RK/uploads">
Alias uploads "/ServerRoot_B0RK/uploads"
AliasMatch ^/manual(?:/(?:de|en|es|fr|ja|ko|pt-br|ru|tr))?(.*)?$ "/ServerRoot_B0RK/manual$1"
AuthUserFile "/ServerRoot_B0RK/user.passwd"
CustomLog "/ServerRoot_B0RK/logs/ssl_request_log" \
DavLockDB "/ServerRoot_B0RK/var/DavLock"
DocumentRoot "/ServerRoot_B0RK/docs/dummy-host.example.com"
DocumentRoot "/ServerRoot_B0RK/docs/dummy-host2.example.com"
DocumentRoot "/ServerRoot_B0RK/htdocs"
ErrorLog "/ServerRoot_B0RK/logs/error_log"
htdigest -c "/ServerRoot_B0RK/user.passwd" DAV-upload admin
SSLCACertificateFile "/ServerRoot_B0RK/conf/ssl.crt/ca-bundle.crt"
SSLCACertificatePath "/ServerRoot_B0RK/conf/ssl.crt"
SSLCARevocationFile "/ServerRoot_B0RK/conf/ssl.crl/ca-bundle.crl"
SSLCARevocationPath "/ServerRoot_B0RK/conf/ssl.crl"
SSLCertificateChainFile "/ServerRoot_B0RK/conf/server-ca.crt"
SSLCertificateFile "/ServerRoot_B0RK/conf/server-dsa.crt"
SSLCertificateFile "/ServerRoot_B0RK/conf/server.crt"
SSLCertificateKeyFile "/ServerRoot_B0RK/conf/server-dsa.key"
SSLCertificateKeyFile "/ServerRoot_B0RK/conf/server.key"
SSLMutex "file:/ServerRoot_B0RK/logs/ssl_mutex"
SSLSessionCache "dbm:/ServerRoot_B0RK/logs/ssl_scache"
SSLSessionCache "shmcb:/ServerRoot_B0RK/logs/ssl_scache(512000)"
sub code_root { # xXX:DUP Bio::Otter::Server::Config->data_dir
tODO: Push environment cleaning up to APACHECTL, so it happens in production also
TransferLog "/ServerRoot_B0RK/logs/access_log"
xXX: something smarter, based on designations.txt ?
xXX:DUP from APACHECTL
xXX:UBUNTU GNU-ism.

Set MaxClients in conf/extra/httpd-mpm.conf

jh13 suggests “MaxClients 150” to avoid bringing the machine down.

rescue content from intweb

  • [ ] /nfs/WWWdev/INTWEB_docs/htdocs/Teams/Team71/vertann
  • [ ] /nfs/WWWdev/INTWEB_docs/cgi-bin/users
    • [ ] jh13
    • [ ] mca
    • [ ] jgrg
    • [ ] ml6
    • [ ] ck1

avoid tripping Bio::Otter::Git up, by deploying via a git repo

  • jh13 would be happy to have B:O:G (caching mechanism?) replaced with something better

Git branch structure

Initial plan…

master: minimal Apache config and a consensus set of tools. This branch should be useful for merging into any other.

$USER or $USER/*: your own stuff, e.g. mca/sandbox, mca/deskpro

www-anacode: one branch for both dev & live, because we have internal structure to separate them

$USER branches which collect a large number of commits for dev Otterlace builds could be reset by:

a) cherry pick or otherwise copy useful stuff onto master or $USER/master

b) rewind the branch on intcvs1 to master or $USER/master

c) have the deployment scripts accept forced updates
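Steps (b) and (c) could be done with plain git, sketched here assuming the central repository on intcvs1 is the remote "origin". The function names are hypothetical and not part of the deployment scripts.

```sh
#!/bin/sh
# Hypothetical sketch of steps (b) and (c); assumes the central repository
# (intcvs1) is the remote "origin". Do step (a) first: (b) discards commits.

# (b) rewind the shared branch, e.g. "mca", to master (or e.g. "mca/master")
rewind_user_branch() {
    git push --force origin "${2:-master}:refs/heads/$1"
}

# (c) on the deployed copy, take the forced update rather than merging it
deploy_forced_update() {
    git fetch origin &&
        git reset --hard "origin/$1"
}
```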

Installation to sandbox

Each developer has a “sandbox” Apache server, in which Otter Server versions can be installed as necessary.

Note that this setup should also run on a deskpro or any laptop. See “git diff master mca/deskpro” on webvm.git for clues.

Install Apache config and tools

ssh web-ottersand-01
cd /www/$USER

ls -lart www-dev # should be empty
git clone intcvs1:/repos/git/anacode/webvm.git www-dev

cd www-dev
utilities/start

The webteam have various tools in utilities, of which we need a few.

Tail the logfiles

tail -F /www/tmp/$USER/www-dev/logs/*log

Find it

MY_PORT=$( grep Listen ServerRoot/conf/user/$USER.conf | cut -f2 -d' ' )
http://web-ottersand-01.internal.sanger.ac.uk:$MY_PORT/

which is visible at http://$USER-otter.sandbox.sanger.ac.uk/ via the front-end proxy (ZXTM).

The two are equivalent, except that the latter goes through the ZXTM. This means:

  • the client IP is in a different variable
  • the URLs /server-status and /server-info will give a zero-sized reply
  • HTTP_CLIENTREALM will be set, which is needed for authentication
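The MY_PORT one-liner above assumes a line of the form "Listen 8080" and would also match a commented-out Listen. A slightly sturdier sketch (listen_port is a hypothetical helper) skips comments and copes with an address prefix such as "Listen 0.0.0.0:8080".

```sh
#!/bin/sh
# Hypothetical helper: print the port from the first uncommented Listen line,
# accepting "Listen 8080", "Listen 0.0.0.0:8080" or "Listen [::1]:8080".
listen_port() {
    awk '$1 == "Listen" { sub(/.*:/, "", $2); print $2; exit }' "$1"
}

# e.g.  MY_PORT=$(listen_port "ServerRoot/conf/user/$USER.conf")
```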

Install crontab

http://$USER-otter.sandbox.sanger.ac.uk/cgi-bin/crontab should now show links to three operations.

wget -q -O- --no-proxy http://localhost:$MY_PORT/cgi-bin/crontab/want | crontab

Now you have log rotation and automatic restarts.

Supply other repositories

git clone intcvs1:/repos/git/anacode/server-config.git data/otter
(cd apps; git clone intcvs1:/repos/git/anacode/webvm-deps.git)

The top level has an idea of which commit id should be HEAD in apps/webvm-deps, but it regards data/otter as cruft. This arose because webvm-deps should be fairly stable while server-config is likely to update frequently.

XXX: how to address this inconsistency?

Testing with server-config.git

This is documented in more detail at http://mediawiki.internal.sanger.ac.uk/index.php/Otter_Server_configuration#Testing_your_work_before_pushing

The “front door” to testing incomplete server-config files is with client url=http://mca-otter.sandbox.sanger.ac.uk/cgi-bin/otter~mca or similar.

mca-otter defines the Otter Server while ~mca defines the server-config, or that part of it which is checked out while taking the remainder from data/otter.

See cgi-bin/otter~mca/$NN/test near “Server::Config” for an explanation of what config is being used.

data/otter may instead be a symlink to something else. Note that the Otter Server requires that HEAD be the dev or live branch, or an equivalent.

Supply Otter Server

Copy in cgi-bin/otter/$NN/ and lib/otter/$NN/ as usual.

The script otter_server_build.sh at the top of webvm.git may help, but this is a work-in-progress and should probably move to a better place.

mca also has an unpushed component in ensembl-otter called “shove” which will send the development clone to otter_server_build.sh

Installation to dev

The details are likely to change rapidly during 2013-09 to 2013-10.

There is one dev server URL, so it is shared across the team.

Which dev server

Currently, there is only one dev server (web-otterdev-01) and no staging or live servers.

Operations on dev, staging and live servers are mediated with ssh key pairs. See team_tools.git bin/smithssh for details and useful calls.

There is a wrapper on top of that, to copy the legacy Otter Servers into web-otterdev-01. See team_tools.git bin/pubweblish for details.

Tail the logfiles

The logfiles are in /www/tmp/www-dev/logs/*log which may be reached by ssh to the box.

More convenient is

smithssh web-otterdev-01 utilities/taillog -h

Option -b is broken, pending RT#353732.

Sizing of machines

Legacy service, 2013-09-25

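| Service   | Host          | Cores | CPU type     | Bogomips/CPU | RAM/GiB |
|-----------+---------------+-------+--------------+--------------+---------|
| intwebdev | webdev2       |     4 | Opteron 2218 |         5200 |      14 |
|           | webdev3       |     4 | Opteron 2218 |         5200 |      14 |
| intweb    | web-wwwold-01 |     1 | Xeon X5670   |         5900 |       8 |
|           | web-wwwold-02 |     1 | Xeon E5504   |         4000 |       8 |
| dev       | webdev2       |       |              |              |         |
|           | webdev3       |       |              |              |         |
| www       | web-wwwold-01 |       |              |              |         |
|           | web-wwwold-02 |       |              |              |         |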

Virtual machines, 2013-09-25

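| Service | Host             | Cores | CPU type   | Bogomips/CPU | RAM/GiB |
|---------+------------------+-------+------------+--------------+---------|
|         | web-ottersand-01 |     2 | Xeon X5670 |         5900 |       1 |
|         | web-otterdev-01  |     2 | Xeon X5670 |         5900 |       1 |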

License

Apache-2.0 (see LICENCE). The repository also contains a COPYING file whose licence was not identified.
