-*- org -*-
Heading levels are indicated by lines =~ m{^\*+}
This project contains an Apache2 configuration suitable for dev (on deskpro or laptop) and live use (on virtual machine), including
- an httpd.conf laced with “Include”s and ${ENVIRONMENT_VARIABLE} substitutions
- supporting shell script
- Perl libraries to bootstrap the provision of APIs needed by the Otter Server
- Perl libraries to support internal development
Other components it does not contain but can bring in are
- the httpd, by default from an Ubuntu package
- larger Perl libraries (webvm-deps.git)
- WTSI Single Sign On
- Ensembl
- BioPerl
Further notes are at http://mediawiki.internal.sanger.ac.uk/wiki/index.php/Anacode:_Web_VMs
The Git history shows the details.
It is based on the configuration from a build-from-tarball Apache 2.2.17, modified until it looked likely to behave for our purposes.
Assuming you have the latest setup.*.sh scripts handy and wish to install files for your own area:
If you are root (e.g. install to laptop), use
sudo ./setup.root.sh
  sudo aptitude install apache2-mpm-prefork # Linux
  sudo port -v install apache2 +preforkmpm # MacPorts - mca’s guess
Then install the config and (large!) Perl libraries
  ./setup.user.sh
(Optional) build httpd from tarball
  cd $WEBDIR
  rm -rf httpd
  ln -s ~/my-httpd/ httpd
Note that running on Mac is not (yet) supported and would require chasing out the Ubuntu dependencies. Some of these are marked with [X]XX:UBUNTU.
Run for anyone, independent of location, like this
/usr/sbin/apache2ctl -f /www/$USER/ServerRoot/conf/httpd.conf …
which is cumbersome, so use
  /www/$USER/start
  /www/$USER/stop
Possible actions/options for that script are likely to change.
The easiest way to add new operations (which need to share config) is to put them in tools/ .
Use of /usr/sbin/apache2 is the default, to match the production web servers.
- Builds take time, and we can reuse them portably (per arch)
- Configuration has many options, and the defaults are wrong
- making the LoadModule directives match
- modules come from the build, as defined by ./configure
- LoadModule directives come with webvm.git
- keeping them in sync is messy, but try “./APACHECTL checkmods”
- So Macs & deskpros can have matching httpd
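The mismatch between LoadModule directives and the modules a build actually provides can be checked roughly as below. This is only a minimal sketch and is not the real “./APACHECTL checkmods”: the function name `missing_modules`, its arguments and the assumption that each directive names a file found directly in one modules directory are all hypothetical.

```shell
#!/bin/sh
# Hypothetical helper, NOT the real "APACHECTL checkmods".
# Usage: missing_modules CONF_FILE MODULES_DIR
# Prints the file named by each LoadModule directive in CONF_FILE
# that does not exist in MODULES_DIR.
missing_modules() {
    conf=$1
    moddir=$2
    # A directive looks like: LoadModule rewrite_module modules/mod_rewrite.so
    # Strip the keyword and module identifier, leaving the file path.
    sed -n 's/^[[:space:]]*LoadModule[[:space:]][[:space:]]*[^[:space:]]*[[:space:]][[:space:]]*//p' "$conf" |
    tr -d '"' |
    while read -r path; do
        file=${path##*/}    # keep only the filename part
        [ -e "$moddir/$file" ] || printf '%s\n' "$file"
    done
}
```

Called as `missing_modules ServerRoot/conf/httpd.conf /path/to/modules`, it prints nothing when every listed module file is present. Directives pulled in through “Include” are not followed.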
You can run with a locally built and installed Apache, provided it includes the necessary features. Note that the default build does not include mod_rewrite.
You can (p)reset the environment seen by the first call to apache2 by writing to the file APACHECTL.sh . This file is listed in .gitignore, is sourced by APACHECTL and does not need to be executable.
This is useful for running with a locally installed Apache2 binary. See branch mca/deskpro for how I do it.
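For illustration, an APACHECTL.sh might look something like the fragment below. The values are assumptions chosen for the example, not recommended settings, and only variables named elsewhere in this file are used. The variable that selects the httpd executable is deliberately left out, since its exact name should be taken from the APACHECTL script itself.

```shell
# Example APACHECTL.sh: sourced by APACHECTL, so it need not be executable.
# All values here are illustrative assumptions.

# Enable the server-status and server-info pages (see WEBDEFS).
WEBDEFS=DEVEL

# Keep host-specific writable files in a directory of our choosing,
# unless the caller has already passed one in.
WEBTMPDIR=${WEBTMPDIR:-/www/tmp/${USER:-$(id -un)}}

export WEBDEFS WEBTMPDIR
```

Because the file is sourced rather than executed, plain assignments are enough to change what APACHECTL sees. The `export` is there so that the values also reach the first apache2 process.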
Configurations used in the past,
- ./configure --prefix=$HOME/_httpd --exec-prefix=$HOME/_httpd/i386 && make && make install
- lacks mod_rewrite
- putting binaries down a level might allow Mac support, but I didn’t use that
- mca first “local Apache”
- ./configure --enable-rewrite --enable-so --with-mpm=prefork --prefix=$HOME/_httpd/httpd --enable-pie --enable-mods-shared=most
- PIE is for security (location in memory is randomised)
- these enable switches may be redundant
- attempt to use for webvm
Apache’s configuration files will interpolate ${ENVIRONMENT_VARIABLES} like that. This avoids having to write them with a template to hardwire paths (as the shipped default seems to be).
The path to files for the LoadModule directive. Default is for Ubuntu.
The path to icons and errors. Default is for Ubuntu.
WEBDIR points to the git working copy. It contains your ServerRoot, htdocs etc.
This variable is built from $0
By the Web Team’s convention, host-specific writable files (including logs and lockfiles) are kept separate. This makes it clear what files should not be copied, when cloning a machine.
WEBTMPDIR is Anacode’s name for that directory. It defaults to something like /www/tmp/$USER , being derived as a niece of WEBDIR. An override value may be passed in or set from APACHECTL.sh .
This directory must exist, be writable and be stored on a local filesystem.
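The “niece of WEBDIR” default and the existence check can be sketched as follows. This is a guess at the simplest case, assuming WEBDIR is /www/$USER. The function names are hypothetical and the real derivation in APACHECTL may differ.

```shell
#!/bin/sh
# Hypothetical sketch: a "niece" of WEBDIR is taken here to mean
# <parent of WEBDIR>/tmp/<basename of WEBDIR>.
default_webtmpdir() {
    webdir=${1%/}    # tolerate one trailing slash
    printf '%s/tmp/%s\n' "$(dirname "$webdir")" "$(basename "$webdir")"
}

# Succeeds only if the directory exists and is writable.
# Whether it is on a local filesystem is NOT checked here.
check_webtmpdir() {
    [ -d "$1" ] && [ -w "$1" ]
}
```

For example, `default_webtmpdir /www/alice` prints `/www/tmp/alice`, and `check_webtmpdir "$WEBTMPDIR" || echo "WEBTMPDIR unusable" >&2` gives an early warning before httpd is started.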
These variables are used within the APACHECTL script
The path to the apache2 httpd executable, in the context of the prevailing $PATH . Default is for Ubuntu.
This is word expanded before use, so it can also do environment setup.
The operation APACHECTL is about to perform.
WEBDEFS takes comma-separated keywords to pass as “apache2 -D” flags.
It currently defaults to “vanilla”, but this may change.
Useful options are
- DEVEL
- enable the server-status & server-info pages. This comes from our config.
- DEBUG
- to run a single Apache thread and stop it going into the background. This is an Apache option.
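The mapping from WEBDEFS keywords to “apache2 -D” flags can be pictured with the sketch below. The function name `webdefs_to_flags` is hypothetical and this is not necessarily how APACHECTL does it.

```shell
#!/bin/sh
# Hypothetical sketch: turn "DEVEL,DEBUG" into "-D DEVEL -D DEBUG".
webdefs_to_flags() {
    flags=""
    old_ifs=$IFS
    IFS=,                       # split the argument on commas
    for def in ${1:-}; do
        [ -n "$def" ] || continue    # skip empty keywords such as "a,,b"
        flags="${flags:+$flags }-D $def"
    done
    IFS=$old_ifs
    printf '%s\n' "$flags"
}
```

With this, `webdefs_to_flags DEVEL,DEBUG` prints `-D DEVEL -D DEBUG`, which would then be appended to the apache2 command line.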
- Our CGI scripts now take detainted @INC elements from $OTTER_PERL_INC
- http://git.internal.sanger.ac.uk/cgi-bin/gitweb.cgi?p=anacode/ensembl-otter.git;h=be62b9f1
- webvm.git/lib/bootstrap/ contains minimal initialisation code
- They run under /usr/bin/perl directly
- http://git.internal.sanger.ac.uk/cgi-bin/gitweb.cgi?p=anacode/ensembl-otter.git;h=3c761714
- this generally gives us 5.10.1 but could change with OS upgrade
- Configure to override the shebang Perl for development on deskpros
(where /usr/bin/perl has no DBI.pm, or you )
- OTTER_PERL_EXE=/software/bin/perl-5.12.2
- http://git.internal.sanger.ac.uk/cgi-bin/gitweb.cgi?p=anacode/webvm.git;h=82d8a3a0
- First developed near the cgi_wrap code, http://git.internal.sanger.ac.uk/cgi-bin/gitweb.cgi?p=anacode/team_tools.git;a=history;f=otterlace/server/perl/SangerPaths.pm;hb=647b3fcf
- webvm-deps.git contains Ensembl API etc. which were previously provided by the web team.
- beware, this repository contains large commits which can break gitk’s display of patches
- Otter::Paths expects it to be provided as $WEBDIR/apps/webvm-deps/ but will also accept the webteam supplied copies
- Existing “otterlace in local Apache on a deskpro” with otterlace_cgi_wrap continues to work, but is superseded
The main problem is configuring our @INC while also enabling taint mode.
- we run root’s Perl but we are not root, so we cannot add modules to the existing @INC
- taint mode means we cannot add to @INC from the environment
- a wrapper script between httpd and Perl (to run “perl -T -I… $script”) works
- it adds complexity
- there may be a small performance penalty
- isn’t clean enough to run in production
- we don’t need to keep it in the long term
- self-wrapping Perl scripts can do it
- Perl starts untainted
- use a module to find the libs and “exec $^X -I… -T …” when ${^TAINT} is not set
- neat but slightly slower (+0.0043s real time)
- a chance to forget to run in taint mode
- code checkers looking for “#! perl -T” won’t see it
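The wrapper-script option works because library directories given with -I on the command line are accepted under taint mode, whereas PERL5LIB is ignored. A minimal sketch of such a wrapper's core is below. It only prints the command it would exec. The function name, the colon-separated list format and the fixed /usr/bin/perl are assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical sketch of the wrapper idea: build "perl -T -I... script".
# Assumes INC_PATH is colon-separated and no directory contains spaces.
perl_taint_command() {
    script=$1
    inc_path=${2:-}
    opts=""
    old_ifs=$IFS
    IFS=:
    for dir in $inc_path; do
        [ -n "$dir" ] || continue    # ignore empty path elements
        opts="$opts -I$dir"
    done
    IFS=$old_ifs
    printf '%s\n' "/usr/bin/perl -T$opts $script"
}
```

A real wrapper would end with `exec` instead of `printf`, which is where the extra process start-up cost mentioned above comes from.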
Options for Perl are
- /usr/bin/perl (OS Perl)
- /software/perl-*/bin/perl . Not available on web VMs.
- /usr/local/bin/perl (may point to OS Perl or /software)
- compile or install our own, with attendant libraries.
Until Otter v67 we used /usr/local/bin/perl, which points to OS Perl on webservers and to /software/perl-5.8.8 on deskpro (local Apache) via the /software/bin/perl symlink.
Options for choosing Perl are
- hardwired #! line (one size fits all)
- hardwired #! line (script installer overwrites it)
- #!/usr/bin/env perl (not compatible with “perl -T”)
- scripts can self-wrap when they don’t like their environment
For production servers we want a hardwired #! for simplicity. Development servers can self-wrap when configured to do so.
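The choice between the hardwired Perl and a configured override can be sketched like this. It is an illustration only: the actual self-wrapping is done in Perl by the bootstrap code, and the function name `choose_perl` is hypothetical. OTTER_PERL_EXE is the variable described earlier.

```shell
#!/bin/sh
# Hypothetical sketch: prefer OTTER_PERL_EXE when it is set and executable,
# otherwise fall back to the hardwired OS Perl.
choose_perl() {
    if [ -n "${OTTER_PERL_EXE:-}" ] && [ -x "$OTTER_PERL_EXE" ]; then
        printf '%s\n' "$OTTER_PERL_EXE"
    else
        printf '%s\n' /usr/bin/perl
    fi
}
```

On a production server OTTER_PERL_EXE is simply left unset, so the hardwired path is always used.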
Current (2012-06) live webservers find Otter Server like this.
- scripts have #!/usr/local/bin/perl -Tw
- calls /usr/bin/perl which is 5.8.8
- @INC contains
- /etc/perl
- /usr/local/lib/perl/5.8.8
- /usr/local/share/perl/5.8.8
- /usr/lib/perl5
- /usr/share/perl5
- /usr/lib/perl/5.8
- /usr/share/perl/5.8
- /usr/local/lib/site_perl
- script uses SangerPaths to add to @INC
- /usr/lib/perl/5.8/SangerPaths.pm takes config from the webteam, of which we use (some of)
- core
- /WWW/SHARED_docs/lib/core /WWW/SANGER_docs/perl /WWW/SANGER_docs/bin-offline (plus /usr/local/oracle/lib)
- bioperl123
- /WWW/SHARED_docs/lib/bioperl-1.2.3
- ensembl65
- /WWW/SHARED_docs/lib/ensembl-branch-65/ensembl-draw/modules /WWW/SHARED_docs/lib/ensembl-branch-65/ensembl-variation/modules /WWW/SHARED_docs/lib/ensembl-branch-65/ensembl-compara/modules /WWW/SHARED_docs/lib/ensembl-branch-65/modules /WWW/SHARED_docs/lib/ensembl-branch-65/ensembl-external/modules /WWW/SHARED_docs/lib/ensembl-branch-65/ensembl/modules /WWW/SHARED_docs/lib/ensembl-branch-65/ensembl-pipeline/modules /WWW/SHARED_docs/lib/ensembl-branch-65/ensembl-webcode/modules /WWW/SHARED_docs/lib/ensembl-branch-65/ensembl-functgenomics/modules
- otter$N
- /WWW/SANGER_docs/lib/otter/$N
- root put that there, we don’t (in general) have that option
In order to run existing code on local Apache, we did (in pseudo-code and omitting error handling etc.) this,
- httpd.conf uses
- SetEnv to supply OTTERLACE_SERVER_ROOT and proxy settings
- ScriptAliasMatch to send requests through team_tools/otterlace/server/cgi-bin/cgi_wrap
- cgi_wrap
- inspects $REQUEST_URI
- locates the real CGI script under configured $OTTERLACE_SERVER_ROOT
- runs team_tools/otterlace/server/bin/otterlace_cgi_wrap
- otterlace_cgi_wrap
- locates team_tools/otterlace/server/perl/
- adds that and /software/anacode/lib{/site_perl} to @INC using “perl -I”
- runs the real CGI script under “perl -T”
- team_tools/otterlace/server/perl/ contains “fake” modules
- SangerPaths.pm
- provides requested libraries, from the subset we need
- patch up %ENV to meet expectations of Bio::Otter::ServerScriptSupport
- pass $OTTERLACE_ERROR_WRAPPING_ENABLED to B:O:SSS
- SangerWeb.pm
- provides the minimum we need, with “always enabled” authentication
The environment is more restrictive,
- /software does not exist in production
- /usr/local/bin/perl points to /software/bin/perl (does not exist)
- /usr/bin/perl is Ubuntu OS Perl, 5.10.1
- vanilla @INC is
- /etc/perl
- /usr/local/lib/perl/5.10.1
- /usr/local/share/perl/5.10.1
- /usr/lib/perl5
- /usr/share/perl5
- /usr/lib/perl/5.10
- /usr/share/perl/5.10
- /usr/local/lib/site_perl
- .
- under taint mode, @INC is the same but without .
Unlike the deskpro /usr/bin/perl, this one includes a full set of DBI modules.
This can be done, but is potentially fragile
  ln -s $EO/modules $WEBDIR/lib/otter/74
  ln -s $EO/scripts/apache $WEBDIR/cgi-bin/otter/74
The differences between this and an install with `otterlace_build --server-only` are
- includes GUI-only modules not in the Bio:: namespace
- doesn’t contain Bio::Otter::Git::Cache
- this absence can cause taint failures
- since 3020f3f1 it shouldn’t break at compile time
It works, but what happens when the checkout branch jumps to a new version and the symlink becomes stale?
- We do not expect to be able to use virtual hosts in this setting.
- We do not want per-application edits to the main httpd.conf file.
Lacking (knowledge of) a standard scheme for putting multiple applications on a server in a self-contained way, we push the responsibility for configuring url:file mappings onto the webapp.
Apps are expected to
- be a directory ${WEBDIR}/apps/$APPNAME/ (most likely a git working copy)
- supply Apache config in apps/$APPNAME.conf
- this should be derived from the template config “apps/$APPNAME/app.conf”, process to be defined
- accept the namespace given to them as $APPNAME, possibly with some interpretation
- apps whose files all reside externally can be added by dropping one file, not derived from a template. The index.conf does this.
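Given that layout, a quick check for apps that have a directory but no Apache config yet might look like this. It is a sketch under the conventions listed above, and the function name `apps_missing_conf` is hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch: list apps under WEBDIR/apps/ that have a directory
# but no matching apps/APPNAME.conf file.
apps_missing_conf() {
    webdir=$1
    for appdir in "$webdir"/apps/*/; do
        [ -d "$appdir" ] || continue     # glob matched nothing
        app=$(basename "$appdir")
        [ -f "$webdir/apps/$app.conf" ] || printf '%s\n' "$app"
    done
}
```

Running `apps_missing_conf "$WEBDIR"` prints one app name per line, or nothing when every app directory has its config. Apps that consist only of a dropped-in .conf file are not reported, since they have no directory.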
Note that the Otter Client writes the WTSISignOn cookie from enigma with a .sanger.ac.uk domain, so it will not be used if you are connecting to localhost.
(If you need to do that, making a duplicate cookie to offer to localhost should work.)
Symptoms are repeatedly asking for the password, getting 200 OK each time.
As in “source APACHECTL.$(hostname -s).sh” or something more generalised for a class of machines, possibly also providing the notion of dev/live
See commits eadc4d27 (“wrapper script config for webteam standard VMs”, adding www-dev.sh) and 13d197ce (rename to setup/www-dev.sh)
The recipe matches that used by the webteam, e.g. for detecting run in /www/www-live
This is done by supplying WEBDEFS=DEVEL either from the calling environment directly or via a shell fragment at setup/APACHECTL.sh - which should exist only on developers’ branches, e.g. mca/deskpro .
It is passed down from the ZXTM front end proxy, as the Clientrealm: HTTP header.
SangerWeb.pm (& related) expect it, and it is used (only by testing for qr/sanger/) in parts of ensembl-otter.
On developers’ apache configs, for now, HTTP_CLIENTREALM can be set by including “lib/devstubs” on OTTER_PERL_INC. See commit 312b916f .
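The test applied to the header is only a substring match. A shell rendering of the same check, for use in ad hoc debugging of a CGI environment, could look like the sketch below. The function name is hypothetical and the real test is the Perl qr/sanger/ in ensembl-otter.

```shell
#!/bin/sh
# Hypothetical sketch: succeed when HTTP_CLIENTREALM contains "sanger",
# mirroring a match against qr/sanger/.
clientrealm_matches_sanger() {
    case "${HTTP_CLIENTREALM:-}" in
        *sanger*) return 0 ;;
        *)        return 1 ;;
    esac
}
```

When connecting directly to the backend port rather than through the ZXTM, the variable is unset and the check fails, which is the situation “lib/devstubs” is meant to cover.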
The Apache build process hardwires various filenames into the config it generates.
I replaced (15474d66639df5ec8f558b30e930e6102c765156) these paths with something apparently valid but not existing on the filesystem.
This prevents accidental dependency on something we should not be using and makes for an easy grep token.
Then I started replacing them with things that work.
  git grep -hE '[X]XX:|[B]0RK/|[T]ODO' | sed -e 's/^[ #]*//; s/XXX/xXX/; s/[T]ODO/tODO/; s/^/ /' | LC_ALL=C sort -uf
 *** tODO avoid tripping Bio::Otter::Git up, by deploying via a git repo
 *** tODO rescue content from intweb
 *** tODO Set MaxClients in conf/extra/httpd-mpm.conf
 ServerRoot_B0RK/error/include files and copying them to your/include/path,
 <Directory "/ServerRoot_B0RK/cgi-bin">
 <Directory "/ServerRoot_B0RK/error">
 <Directory "/ServerRoot_B0RK/manual">
 <Directory "/ServerRoot_B0RK/uploads">
 Alias uploads "/ServerRoot_B0RK/uploads"
 AliasMatch ^/manual(?:/(?:de|en|es|fr|ja|ko|pt-br|ru|tr))?(.*)?$ "/ServerRoot_B0RK/manual$1"
 AuthUserFile "/ServerRoot_B0RK/user.passwd"
 CustomLog "/ServerRoot_B0RK/logs/ssl_request_log" \
 DavLockDB "/ServerRoot_B0RK/var/DavLock"
 DocumentRoot "/ServerRoot_B0RK/docs/dummy-host.example.com"
 DocumentRoot "/ServerRoot_B0RK/docs/dummy-host2.example.com"
 DocumentRoot "/ServerRoot_B0RK/htdocs"
 ErrorLog "/ServerRoot_B0RK/logs/error_log"
 htdigest -c "/ServerRoot_B0RK/user.passwd" DAV-upload admin
 SSLCACertificateFile "/ServerRoot_B0RK/conf/ssl.crt/ca-bundle.crt"
 SSLCACertificatePath "/ServerRoot_B0RK/conf/ssl.crt"
 SSLCARevocationFile "/ServerRoot_B0RK/conf/ssl.crl/ca-bundle.crl"
 SSLCARevocationPath "/ServerRoot_B0RK/conf/ssl.crl"
 SSLCertificateChainFile "/ServerRoot_B0RK/conf/server-ca.crt"
 SSLCertificateFile "/ServerRoot_B0RK/conf/server-dsa.crt"
 SSLCertificateFile "/ServerRoot_B0RK/conf/server.crt"
 SSLCertificateKeyFile "/ServerRoot_B0RK/conf/server-dsa.key"
 SSLCertificateKeyFile "/ServerRoot_B0RK/conf/server.key"
 SSLMutex "file:/ServerRoot_B0RK/logs/ssl_mutex"
 SSLSessionCache "dbm:/ServerRoot_B0RK/logs/ssl_scache"
 SSLSessionCache "shmcb:/ServerRoot_B0RK/logs/ssl_scache(512000)"
 sub code_root { # xXX:DUP Bio::Otter::Server::Config->data_dir
 tODO: Push environment cleaning up to APACHECTL, so it happens in production also
 TransferLog "/ServerRoot_B0RK/logs/access_log"
 xXX: something smarter, based on designations.txt ?
 xXX:DUP from APACHECTL
 xXX:UBUNTU GNU-ism.
jh13 comments “MaxClients 150” to avoid bringing the machine down
- [ ] /nfs/WWWdev/INTWEB_docs/htdocs/Teams/Team71/vertann
- [ ] /nfs/WWWdev/INTWEB_docs/cgi-bin/users
- [ ] jh13
- [ ] mca
- [ ] jgrg
- [ ] ml6
- [ ] ck1
- jh13 would be happy to have B:O:G (caching mechanism?) replaced with something better
Initial plan…
- master :: Minimal Apache config and consensus set of tools. This branch should be useful for merge into any other.
- $USER or $USER/* :: your stuff, e.g. mca/sandbox, mca/deskpro
- www-anacode :: One branch for both dev & live, because we have internal structure to separate them
$USER branches which collect a large number of commits for dev Otterlace builds could be reset by
a) cherry pick or otherwise copy useful stuff onto master or $USER/master
b) rewind the branch on intcvs1 to master or $USER/master
c) have the deployment scripts accept forced updates
Each developer has a “sandbox” Apache server, in which Otter Server versions can be installed as necessary.
Note that this setup should also run on a deskpro or any laptop. See “git diff master mca/deskpro” on webvm.git for clues.
  ssh web-ottersand-01
  cd /www/$USER
  ls -lart www-dev # should be empty
  git clone intcvs1:/repos/git/anacode/webvm.git www-dev
  cd www-dev
  utilities/start
The webteam have various tools in utilities, of which we need a few.
tail -F /www/tmp/$USER/www-dev/logs/*log
  MY_PORT=$( grep Listen ServerRoot/conf/user/$USER.conf | cut -f2 -d' ' )
  http://web-ottersand-01.internal.sanger.ac.uk:$MY_PORT/
which is visible at http://$USER-otter.sandbox.sanger.ac.uk/ via the front-end proxy (ZXTM)
The two are equivalent, except that the latter goes through the ZXTM. This means
- the client IP is in a different variable
- the URLs /server-status and /server-info will give zero-sized reply
- HTTP_CLIENTREALM will be set, which is needed for authentication
http://$USER-otter.sandbox.sanger.ac.uk/cgi-bin/crontab should now show links to three operations
  wget -q -O- --no-proxy http://localhost:$MY_PORT/cgi-bin/crontab/want | crontab
Now you have logrotation and automatic restarts.
  git clone intcvs1:/repos/git/anacode/server-config.git data/otter
  (cd apps; git clone intcvs1:/repos/git/anacode/webvm-deps.git)
The toplevel has an idea of what commitid should be HEAD in apps/webvm-deps but it regards data/otter as cruft. This arose because one should be fairly stable while the other is likely to update frequently.
XXX: how to address this inconsistency?
This is documented in more detail at http://mediawiki.internal.sanger.ac.uk/index.php/Otter_Server_configuration#Testing_your_work_before_pushing
The “front door” to testing incomplete server-config files is with client url=http://mca-otter.sandbox.sanger.ac.uk/cgi-bin/otter~mca or similar.
mca-otter defines the Otter Server while ~mca defines the server-config, or that part of it which is checked out while taking the remainder from data/otter.
See cgi-bin/otter~mca/$NN/test near “Server::Config” for an explanation of what config is being used.
data/otter may instead be a symlink to something else. Note that the Otter Server requires that HEAD be the dev or live branch, or an equivalent.
Copy in cgi-bin/otter/$NN/ and lib/otter/$NN/ as usual.
The script otter_server_build.sh at the top of webvm.git may help, but this is a work-in-progress and should probably move to a better place.
mca also has an unpushed component in ensembl-otter called “shove” which will send the development clone to otter_server_build.sh
The details are likely to change rapidly 2013-09 .. 2013-10.
There is one dev server URL, so it is shared across the team.
Currently, there is only one dev server (web-otterdev-01) and no staging or live servers.
Operations on dev, staging and live servers are mediated with ssh key pairs. See team_tools.git bin/smithssh for details and useful calls.
There is a wrapper on top of that, to copy the legacy Otter Servers into web-otterdev-01. See team_tools.git bin/pubweblish for details.
The logfiles are in /www/tmp/www-dev/logs/*log which may be reached by ssh to the box.
More convenient is
smithssh web-otterdev-01 utilities/taillog -h
Option -b is broken, pending RT#353732.
| Service | Host | Cores | CPU type | Bogomips/CPU | RAM/GiB |
|---|---|---|---|---|---|
| intwebdev | webdev2 | 4 | Opteron 2218 | 5200 | 14 |
| | webdev3 | 4 | Opteron 2218 | 5200 | 14 |
| intweb | web-wwwold-01 | 1 | Xeon X5670 | 5900 | 8 |
| | web-wwwold-02 | 1 | Xeon E5504 | 4000 | 8 |
| dev | webdev2 | | | | |
| | webdev3 | | | | |
| www | web-wwwold-01 | | | | |
| | web-wwwold-02 | | | | |
| Service | Host | Cores | CPU type | Bogomips/CPU | RAM/GiB |
|---|---|---|---|---|---|
| | web-ottersand-01 | 2 | Xeon X5670 | 5900 | 1 |
| | web-otterdev-01 | 2 | Xeon X5670 | 5900 | 1 |