This repository has been archived by the owner on May 24, 2022. It is now read-only.

Server Install

Bo Ferri edited this page Nov 18, 2016 · 66 revisions

These are the installation instructions for a server. They add some steps aimed at a production environment that are not described in the [Developer-Install](developer install).

Initial Installation

Premises:

  • currently, we have deployed the application only on Ubuntu Linux distributions
  • a fresh installation of Ubuntu 14.04 (trusty) or later
  • three additional partitions (so that each one can be resized independently):
    • /data/log, 1-5GB depending on usage
    • /data/mysql, up to 5GB
    • /data/neo4j, up to 20GB
  • let the $HOME of the less privileged user be '/home/user'
  • all command boxes start as the less privileged user and in that user's $HOME
  • steps that require root are explicitly marked (with su rather than sudo; note that prefixing each line with sudo is not sufficient to yield the same results as with su. Using su may require you to set a password for root if you have not already done so!)

Note: some commands require user input; this is not an unattended installation.


This step requires root level access

1. install system packages required for building the software

su
apt-get install --no-install-recommends --yes git-core maven nodejs npm build-essential
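
Before continuing, it can help to confirm that the build tools actually ended up on the PATH. The following is a minimal sketch, not part of the original instructions; `missing_commands` is a hypothetical helper name, and the binary names in the usage line (`mvn`, `nodejs`, `make`) are assumptions about what the packages above provide on Ubuntu.

```shell
# Hypothetical helper: print every command from the argument list
# that cannot be found on the PATH (prints nothing if all are present).
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}
```

For example, `missing_commands git mvn nodejs npm make` should print nothing once step 1 has succeeded.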

These steps require less privileged access

2. clone repositories (not as root!)

make sure to use the correct path (/home/user)

cd /home/user

git clone --depth 1 --branch builds/unstable https://github.com/dswarm/dswarm.git
git clone --depth 1 --branch master https://github.com/dswarm/dswarm-graph-neo4j.git
git clone --depth 1 --branch builds/unstable https://github.com/dswarm/dswarm-backoffice-web.git

These steps require root level access

3. install Java and Tomcat for the backend

  • D:SWARM requires Java 8, which is no longer available in the default package sources. Follow these steps:
su
add-apt-repository ppa:webupd8team/java
apt-get update
apt-get install oracle-java8-installer oracle-java8-set-default

You can verify your java version with

java -version 2>&1 | grep -q "1.8" && echo "OK, Java 8 is available" || echo "Uh oh, Java 8 is not available"
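
The grep above matches "1.8" anywhere in the output, and the dot matches any character, so it can report false positives. A slightly stricter check, sketched here under the assumption that the output contains a line of the form `... version "X.Y.Z"`, extracts the major version from the quoted version string. `java_major_version` is a hypothetical helper name.

```shell
# Hypothetical helper: print the Java major version found in the text
# produced by `java -version` (passed in as the first argument).
# Returns non-zero if no version string is found.
java_major_version() {
    local version
    # Take the quoted version string from the first matching line.
    version=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
    case "$version" in
        '')  return 1 ;;
        # Old scheme: 1.8.0_111 means major version 8.
        1.*) version=${version#1.}; printf '%s\n' "${version%%.*}" ;;
        # New scheme: 11.0.2 means major version 11.
        *)   printf '%s\n' "${version%%[.-]*}" ;;
    esac
}
```

Usage: `[ "$(java_major_version "$(java -version 2>&1)")" = "8" ] && echo "OK, Java 8 is available"`.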

Add $JAVA_HOME in /etc/environment:

JAVA_HOME="/usr/lib/jvm/java-8-oracle"

Earlier versions of Tomcat (< 7.0.30) do not run with Java 8 albeit being advertised to do so (related to this bug). Ubuntu 12.04 Precise includes Tomcat in version 7.0.26, which therefore must be updated. If you run precise, execute these steps:

wget https://launchpad.net/ubuntu/+archive/primary/+files/libservlet3.0-java_7.0.52-1ubuntu0.1_all.deb
wget https://launchpad.net/ubuntu/+archive/primary/+files/libtomcat7-java_7.0.52-1ubuntu0.1_all.deb
wget https://launchpad.net/ubuntu/+archive/primary/+files/tomcat7-admin_7.0.52-1ubuntu0.1_all.deb
wget https://launchpad.net/ubuntu/+archive/primary/+files/tomcat7-common_7.0.52-1ubuntu0.1_all.deb
wget https://launchpad.net/ubuntu/+archive/primary/+files/tomcat7_7.0.52-1ubuntu0.1_all.deb
su
dpkg -i libservlet3.0-java_7.0.52-1ubuntu0.1_all.deb
dpkg -i libtomcat7-java_7.0.52-1ubuntu0.1_all.deb
dpkg -i tomcat7-common_7.0.52-1ubuntu0.1_all.deb
dpkg -i tomcat7-admin_7.0.52-1ubuntu0.1_all.deb
dpkg -i tomcat7_7.0.52-1ubuntu0.1_all.deb

If you run a more recent version, just install tomcat from the official sources:

su
apt-get install tomcat7

If the installation fails with a message stating that tomcat7 cannot find $JAVA_HOME, then set $JAVA_HOME in /etc/default/tomcat7 and try the installation again:

JAVA_HOME=/usr/lib/jvm/java-8-oracle

4. install system packages required for running the software

su
apt-get install --no-install-recommends --yes mysql-server nginx curl

5. install Data Hub (Neo4j)

currently, we rely on Neo4j version 2.3.2

su
wget -O - http://debian.neo4j.org/neotechnology.gpg.key | apt-key add -
echo 'deb http://debian.neo4j.org/repo stable/' > /etc/apt/sources.list.d/neo4j.list
apt-get update
apt-get install --no-install-recommends --yes neo4j=2.3.2

You can open the Neo4j Browser at http://localhost:7474/browser/ to check that the correct version has been installed.

Make sure Neo4j does not get updated when updating packages. You can use apt-pinning to do so. As root, create a file

su
touch /etc/apt/preferences.d/neo4j.pref

and add the following lines to this file.

Package: neo4j
Pin: version 2.3.2
Pin-Priority: 1000
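
Instead of creating the file with touch and editing it by hand, the pin stanza can be generated and redirected into the file. This is only a sketch; `neo4j_pin_stanza` is a hypothetical helper name, and the stanza it prints is the one shown above with the version passed in as a parameter.

```shell
# Hypothetical helper: print an apt pin stanza for the given Neo4j version.
neo4j_pin_stanza() {
    printf 'Package: neo4j\nPin: version %s\nPin-Priority: 1000\n' "$1"
}
```

As root, `neo4j_pin_stanza 2.3.2 > /etc/apt/preferences.d/neo4j.pref` writes the file in one step.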

6. make sure permissions are set correctly

su
chown -R tomcat7:tomcat7 /data/log
chown -R mysql:mysql /data/mysql
chown -R neo4j:adm /data/neo4j

7. install build environment for frontend

su
ln -s /usr/bin/nodejs /usr/bin/node
npm install -g grunt-cli karma bower

8. setup Metadata Repository (MySQL)

Create a database and a user for d:swarm. To customize the settings, edit dswarm/persistence/src/main/resources/create_database.sql. Do not check in this file if you modify it. Hint: remember these settings for step 12 (configure d:swarm).

mysql -uroot -p < dswarm/persistence/src/main/resources/create_database.sql

Then, open /etc/mysql/my.cnf (Ubuntu 14.04) or /etc/mysql/mysql.conf.d/mysqld.cnf (Ubuntu 16.04) and add the following line to the section [mysqld] (around line 45)

wait_timeout = 1209600

in the same file and the same section, change datadir to /data/mysql (around line 40)

datadir         = /data/mysql

add some performance tweaks for innodb for MySQL > 5.5

innodb-read-io-threads=1
innodb-write-io-threads=1

add this directory to AppArmor

su
echo "alias /var/lib/mysql/ -> /data/mysql/," >> /etc/apparmor.d/tunables/alias
/etc/init.d/apparmor reload

and copy the whole MySQL data directory to the new location (after stopping the mysql service)

service mysql stop
cp -pr /var/lib/mysql/* /data/mysql/
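
After editing the MySQL config file, the values in the [mysqld] section can be read back to catch typos before the service is started again. The following is a minimal sketch that assumes simple `key = value` lines and an exact `[mysqld]` header; `mysqld_setting` is a hypothetical helper name, and it does not follow include directives.

```shell
# Hypothetical helper: print the value of a key from the [mysqld] section
# of a MySQL config file. $1 is the file, $2 is the key.
# Commented-out lines are ignored; the last assignment wins.
mysqld_setting() {
    awk -v key="$2" '
        /^\[/ { in_section = ($0 == "[mysqld]"); next }
        in_section && index($0, "=") > 0 && $0 !~ /^[ \t]*#/ {
            name = substr($0, 1, index($0, "=") - 1)
            value = substr($0, index($0, "=") + 1)
            gsub(/[ \t]/, "", name)
            gsub(/^[ \t]+/, "", value)
            gsub(/[ \t]+$/, "", value)
            if (name == key) { result = value; found = 1 }
        }
        END { if (!found) exit 1; print result }
    ' "$1"
}
```

For example, on Ubuntu 14.04 `mysqld_setting /etc/mysql/my.cnf datadir` should print /data/mysql once the change is in place.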

9. setup Nginx

create /etc/nginx/sites-available/dswarm and add the following block

server {
    listen 80 default_server;
    root /var/www/dswarm;

    location / {
        try_files $uri $uri/ =404;
    }

    location /dmp {
        # Allow OPTIONS without Authorization
        # This is required for proper CORS preflight support
        # see http://www.w3.org/TR/cors/#preflight-request ('Exclude user credentials')
        if ($request_method = OPTIONS) {
            # return 599;
            add_header Access-Control-Allow-Origin "http://localhost:9999";
            add_header Access-Control-Allow-Methods "GET, OPTIONS, HEAD, PUT, POST, DELETE";
            add_header Access-Control-Allow-Headers "Accept, Authorization, Origin, X-Requested-With, Content-Type";
            add_header Access-Control-Allow-Credentials "true";
            add_header Content-Length 0;
            add_header Content-Type text/plain;
            return 200;
        }
        client_max_body_size 100M;
        proxy_pass http://127.0.0.1:8080$uri$is_args$args;
        proxy_read_timeout 600s;
        proxy_send_timeout 300s;
    }

    location /neo {
        auth_basic "Restricted";
        auth_basic_user_file /data/.htaccess;
        proxy_pass http://127.0.0.1:7474/browser;
    }

    location /db/ {
        proxy_pass http://127.0.0.1:7474/db/;
    }

    location /dmp/api-docs {
        client_max_body_size 100M;
        proxy_pass http://127.0.0.1:8080$uri$is_args$args;
    }

    location /docs {
        alias /home/user/dswarm/controller/src/docs/ui/dist;
    }

    error_page 599 = @dmpnoauth;

    location @dmpnoauth {
        client_max_body_size 100M;
        proxy_pass http://127.0.0.1:8080$uri$is_args$args;
    }
}

for very long running processes, add appropriate settings for timeouts such as the proxy_read_timeout, see http://nginx.org/en/docs/http/ngx_http_proxy_module.html.

su
ln -s /etc/nginx/sites-available/dswarm /etc/nginx/sites-enabled/000-dswarm
mkdir /var/www

note: replace user in /home/user/... with the home directory of the user of your d:swarm installation. You may also need to delete the default site (reference) of nginx:

su 
rm /etc/nginx/sites-enabled/default
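
Once nginx is running with the server block above, the CORS preflight handling of the /dmp location can be checked from the command line. This sketch only inspects response headers read from standard input; `has_cors_origin` is a hypothetical helper name, and the URL in the usage line is an assumption about where the backend is reachable.

```shell
# Hypothetical helper: read HTTP response headers on stdin and succeed
# if Access-Control-Allow-Origin equals the expected origin ($1).
has_cors_origin() {
    tr -d '\r' | awk -v want="$1" '
        tolower($1) == "access-control-allow-origin:" && $2 == want { ok = 1 }
        END { exit(ok ? 0 : 1) }
    '
}
```

Usage: `curl -s -o /dev/null -D - -X OPTIONS http://localhost/dmp/ | has_cors_origin "http://localhost:9999" && echo "preflight OK"`.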

10. setup tomcat

open /etc/tomcat7/server.xml at line 33 and add driverManagerProtection="false" so that the line reads

<Listener className="org.apache.catalina.core.JreMemoryLeakPreventionListener" driverManagerProtection="false" />

at line 73, same file, add this option maxPostSize="104857600", so that the Connector block reads

    <Connector port="8080" protocol="HTTP/1.1"
               connectionTimeout="20000"
               maxPostSize="104857600"
               URIEncoding="UTF-8"
               redirectPort="8443" />

then, give tomcat some more memory

su
echo 'export CATALINA_OPTS="-Xms4G -Xmx4G -XX:+CMSClassUnloadingEnabled -XX:+UseConcMarkSweepGC -XX:MaxPermSize=512M"' >> /usr/share/tomcat7/bin/setenv.sh
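
Note that the echo with >> appends the line every time it is run, so repeating this step leaves duplicate entries in setenv.sh. A minimal sketch of an idempotent variant follows; `append_once` is a hypothetical helper name.

```shell
# Hypothetical helper: append a line ($1) to a file ($2) only if the file
# does not already contain exactly that line.
append_once() {
    grep -qxF -- "$1" "$2" 2>/dev/null || printf '%s\n' "$1" >> "$2"
}
```

As root, `append_once 'export CATALINA_OPTS="-Xms4G -Xmx4G -XX:+CMSClassUnloadingEnabled -XX:+UseConcMarkSweepGC -XX:MaxPermSize=512M"' /usr/share/tomcat7/bin/setenv.sh` can then be run repeatedly without side effects.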

And finally, you have to tell Tomcat about Java 8. Open the file /etc/default/tomcat7 and around line 12, add this setting

# The home directory of the Java development kit (JDK). You need at least
# JDK version 1.5. If JAVA_HOME is not set, some common directories for
# OpenJDK, the Sun JDK, and various J2SE 1.5 versions are tried.
JAVA_HOME=/usr/lib/jvm/java-8-oracle

11. setup Data Hub (Neo4j)

increase the limit of open file handles in /etc/security/limits.conf

root   soft    nofile  40000
root   hard    nofile  40000

plus add ulimit -n 40000 into your neo4j service script (under /etc/init.d, e.g., /etc/init.d/neo4j-service) before starting the daemon
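
Whether the new limit is actually in effect can be checked by comparing the value reported by ulimit -n with the required minimum. This is only a sketch; `limit_at_least` is a hypothetical helper name, and the check is meaningful only when it runs in the same context (user and shell) that starts the Neo4j daemon.

```shell
# Hypothetical helper: succeed if the reported limit ($1, as printed by
# `ulimit -n`) is "unlimited" or at least the required minimum ($2).
limit_at_least() {
    [ "$1" = "unlimited" ] || [ "$1" -ge "$2" ]
}
```

Usage: `limit_at_least "$(ulimit -n)" 40000 || echo "open file limit is too low"`.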

edit /etc/neo4j/neo4j.properties and:

  • insert some storage tweaks
dbms.pagecache.memory=8g
keep_logical_logs=false

edit /etc/neo4j/neo4j-server.properties and:

  • change the database location
org.neo4j.server.database.location=/data/neo4j/data/graph.db
  • disable authentication
dbms.security.auth_enabled=false
  • change the rrd database location
org.neo4j.server.webadmin.rrdb.location=/data/neo4j/data/rrd
  • add our graph extension
org.neo4j.server.thirdparty_jaxrs_classes=org.dswarm.graph.resources=/graph
  • (optional) specify IP address
org.neo4j.server.webserver.address=0.0.0.0

edit /etc/neo4j/neo4j-wrapper.conf and:

  • insert an additional parameter (if your server is x64)
wrapper.java.additional.1=-d64
  • tweak the Java heap size to a value appropriate for your server's RAM, e.g.,
wrapper.java.initmemory=512
wrapper.java.maxmemory=8192

then, create a symlink from the previous log location to the external partition

su
mv /var/lib/neo4j/data/log{,-old}
mkdir /data/neo4j/log
chown -R neo4j:adm /data/neo4j/log
ln -s /data/neo4j/log /var/lib/neo4j/data/log
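
The same relocation can be written so that it is safe to run more than once. The following is a minimal sketch under the assumption that the old location is either a real directory or already a symlink; `relocate_with_symlink` is a hypothetical helper name, and ownership still has to be set separately with chown.

```shell
# Hypothetical helper: create the new directory ($2), move an existing
# real directory at the old location ($1) aside as "<old>-old", and
# leave a symlink at the old location that points to the new one.
relocate_with_symlink() {
    local old=$1 new=$2
    mkdir -p "$new" || return 1
    if [ -e "$old" ] && [ ! -L "$old" ]; then
        mv "$old" "$old-old" || return 1
    fi
    # Only create the symlink if it is not there yet.
    [ -L "$old" ] || ln -s "$new" "$old"
}
```

As root: `relocate_with_symlink /var/lib/neo4j/data/log /data/neo4j/log`.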

By default, the Neo4j Server is bundled with a web server that binds to host localhost on port 7474 and answers only requests from the local machine. If you need remote access to the Neo4j Browser or the D:SWARM Graph Extension API, see Secure the port and remote client connection accepts.


These steps require less privileged access

12. configure d:swarm

Follow the instructions in d:swarm Configuration to create a custom d:swarm config file. Make sure that you've added a reference to your d:swarm config file in the Context section of context.xml configuration of Tomcat. Furthermore, make sure that the user that runs your Tomcat server (e.g. tomcat7) has read access to the d:swarm config file and read and write access to the folders that are configured in the paths section of your d:swarm config. You can do this, for example, by changing the owner of these folders to the user that runs your Tomcat server:

su
chown -R tomcat7:tomcat7 /path/to/your-folder-in-paths-section-of-dswarm-config

13. build D:SWARM Graph Extension

pushd dswarm-graph-neo4j
mvn -U -PRELEASE -DskipTests clean package
popd
mv dswarm-graph-neo4j/target/graph-1.3-jar-with-dependencies.jar dswarm-graph-neo4j.jar
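
The jar name above contains the version number (1.3), so the mv fails once the version changes. One way to avoid hard-coding it, sketched here with a hypothetical helper named `find_single_artifact`, is to look the file up by pattern and refuse to continue unless exactly one file matches.

```shell
# Hypothetical helper: print the single regular file in directory $1
# whose name matches the glob pattern $2. Fails if there are zero or
# several matches.
find_single_artifact() {
    local matches count
    matches=$(find "$1" -maxdepth 1 -type f -name "$2")
    # Count non-empty lines; "|| true" keeps a zero count from being an error.
    count=$(printf '%s' "$matches" | grep -c . || true)
    [ "$count" -eq 1 ] || return 1
    printf '%s\n' "$matches"
}
```

Usage: `jar=$(find_single_artifact dswarm-graph-neo4j/target '*-jar-with-dependencies.jar') && mv "$jar" dswarm-graph-neo4j.jar`.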

14. build backend

pushd dswarm
mvn -U -DskipTests clean install -Dconfig.file=/path/to/dswarm.conf
pushd controller
mvn -U -DskipTests war:war -Dconfig.file=/path/to/dswarm.conf
popd; popd
mv dswarm/controller/target/dswarm-controller-0.1-SNAPSHOT.war dmp.war

note: Please specify the path to your custom d:swarm config, if it is not located at the root directory of the d:swarm backend repository. Otherwise, you can run the maven task with argument -Pdswarm-conf (which looks at the root directory of the d:swarm backend repository for a d:swarm config named dswarm.conf)

15. build frontend

pushd dswarm-backoffice-web; pushd yo
npm install
bower install
STAGE=unstable DMP_HOME=../../dswarm grunt build
popd
rsync --delete --verbose --recursive yo/dist/ yo/publish
popd

note: npm install may need to be executed as root

set symbolic link to web root directory of the frontend (this step only needs to be done once)

su
ln -s /home/user/dswarm-backoffice-web/yo/publish /var/www/dswarm
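
Because this link can be lost later (for example when the nginx package is updated), a quick check that it still points to the right place can be useful. This is a small sketch; `symlink_points_to` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if $1 is a symlink whose target is exactly $2.
symlink_points_to() {
    [ -L "$1" ] && [ "$(readlink "$1")" = "$2" ]
}
```

Usage: `symlink_points_to /var/www/dswarm /home/user/dswarm-backoffice-web/yo/publish || echo "web root link is missing or wrong"`.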

These steps require root level access

16. wire everything together

make sure to use the correct path (/home/user)

su
rm /var/lib/tomcat7/webapps/dmp.war
rm -r /var/lib/tomcat7/webapps/dmp
cp /home/user/dmp.war /var/lib/tomcat7/webapps/
cp /home/user/dswarm-graph-neo4j.jar /usr/share/neo4j/plugins/

17. restart everything, if needed

su
/etc/init.d/mysql restart
/etc/init.d/neo4j-service restart
/etc/init.d/nginx restart
/etc/init.d/tomcat7 restart

18. initialize/reset Metadata Repository + Data Hub

This step requires less privileged access

When running the backend for the first time, the Metadata Repository (MySQL database) needs to be initialized. After an update, a reset is required if the schema or the initial data has changed. Make sure to use the correct path (/home/user).

pushd dswarm/dev-tools
python reset-dbs.py \
  --persistence-module=../persistence \
  --user=dmp \
  --password=dmp \
  --db=dmp \
  --neo4j=http://localhost:7474/graph

Or provide the credentials and values you configured. Check python reset-dbs.py --help for additional information.

Install Task Processing Unit (TPU)

The Task Processing Unit (TPU) allows for processing larger amounts of data with mappings that were created via the d:swarm Back Office. Please have a look at the TPU documentation for further details on its usage and install instructions.

(Optional) Install local Swagger UI

.. to easily explore the d:swarm backend HTTP API.

1. Go to the root directory of your d:swarm backend repository

cd /home/user/dswarm

2. Fetch submodules of this repository

git submodule update --init --recursive

(this command fetches a copy of the Swagger UI, which is linked as sub module from the backend controller)

3. Edit /etc/nginx/sites-available/default and add this just below the location / block

location /dmp/api-docs {
    client_max_body_size 100M;
    proxy_pass http://127.0.0.1:8080$uri$is_args$args;
}

to forward the Swagger description of the d:swarm backend HTTP API and

location /docs {
    alias [INSERT_HERE_THE_ROOT_DIRECTORY_OF_YOUR_DSWARM_BACKEND_REPOSITORY]/controller/src/docs/ui/dist;
}

to point to the local Swagger UI installation.

note: you need to insert the correct path to the d:swarm backend repository (e.g. /home/user/dswarm)

4. Edit /home/user/dswarm/controller/src/docs/ui/dist/index.html line 31 to insert the URL of your local d:swarm backend HTTP API Swagger description:

     url = "http://localhost:/dmp/api-docs/"

Now you should be able to open and explore the d:swarm backend HTTP API via

http://localhost/docs

Update the System

1. update repository contents

pushd dswarm; git pull; popd
pushd dswarm-graph-neo4j; git pull; popd
pushd dswarm-backoffice-web; git pull; popd

2. repeat steps 13 (Building D:SWARM Graph Extension) to 18 (Init/reset Metadata Repository + Data Hub) from the installation as necessary

Checklist on Errors

First of all it's a good idea to know which of the four components frontend, backend, Metadata Repository (MySQL) and Data Hub (Neo4j database) does not run. If you already know, skip this list.

  • frontend: open http://localhost:9999 (port defaults to 80 for server installation) in a browser. The front end should be displayed.
  • backend: open http://localhost:8087/dmp/_ping (port defaults to 8080 for server installation) in a browser. The expected response is a page with the word pong.
  • Metadata Repository (MySQL database): open a terminal and type mysql -udmp -p dmp to open a connection to MySQL and select the database dmp. Hint: check for correct user name, password and database name in case you did not use the default values. If you can log in, type select * from DATA_MODEL;. At least three internal data models should be listed.
  • Data Hub (Neo4j database): open http://localhost:7474/browser/ in a browser. The Neo4j browser should be opened.
  • D:SWARM Graph Extension: open http://localhost:7474/graph/gdm/ping in a browser. The expected response is a page with the word pong.
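
The two ping checks in the list above can also be run from a terminal instead of a browser. The following is a minimal sketch; `expect_pong` and the `FETCH` variable are hypothetical names, and the default fetch command assumes that curl is installed (see step 4).

```shell
# Command used to fetch a URL; can be overridden, e.g. for testing.
FETCH=${FETCH:-curl -fsS --max-time 10}

# Hypothetical helper: succeed if fetching the URL ($1) works and the
# response body contains the word "pong".
expect_pong() {
    local body
    body=$($FETCH "$1" 2>/dev/null) || return 1
    case "$body" in
        *pong*) return 0 ;;
        *)      return 1 ;;
    esac
}
```

Usage on a server installation: `expect_pong http://localhost:8080/dmp/_ping || echo "backend is not responding"` and `expect_pong http://localhost:7474/graph/gdm/ping || echo "graph extension is not responding"`.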

Now that you know which component does not run, go through the following checklist:

  • is curl installed?
  • Did you choose a database name other than the default? If yes, you currently have to modify the init_internal_schema.sql, which is internally used by the script reset-dbs.py and change the USE database statement (this should be improved).
  • when building the projects with maven, did you use the -U option to update project dependencies?
  • Check your dswarm Configuration. Are database name and password correct, i.e., the ones used when installing the Metadata Repository (MySQL; step Setup Metadata Repository)? Compare dswarm/persistence/src/main/resources/create_database.sql with dswarm/dswarm.conf or any other configuration option you use.
  • Can Tomcat read the d:swarm configuration file?
  • initialize/reset the Metadata Repository + Data Hub. They may be empty or contain corrupted data caused by failed unit tests.
  • Did you miss an update of, e.g., the neo4j version? Compare your installed version with the required version (see step 5)
  • Do the folders that you've configured in the paths section of your d:swarm config exist, and are they accessible (read + write) for the user that runs your Tomcat server?
    • If you specified a root path folder in the config, make sure it contains a tmp/resources and log folder (if you've not specified other folders in the paths section of your d:swarm config)
  • Did you set the maximum file-size for uploads (see Step 9) to a sufficient value for your scenario?
  • Did you set the server proxy timeout (see Step 9) to a sufficient value for your scenario?
  • In order to access the D:SWARM Graph Extension (e.g. .../graph/gdm/ping) you may have to allow access from other than localhost (see step Setup Data Hub).
  • if the nginx package was updated, then the nginx web server root was (probably) overwritten as well, i.e., you need to set the symbolic link to your d:swarm Back Office UI directory again (see step 15)