Server Install

Bo Ferri edited this page Nov 18, 2016 · 66 revisions

These are the installation instructions for a server. They add some steps aimed at a production environment that are not described in the Developer Install instructions.

Initial Installation

Premises:

  • so far, we have deployed the application only on Ubuntu Linux distributions
  • a fresh installation of Ubuntu Trusty (14.04) or later
  • three additional partitions (so that each partition can be resized separately):
    • /data/log, 1-5GB depending on usage
    • /data/mysql, up to 5GB
    • /data/neo4j, up to 20GB
  • let the $HOME of the less privileged user be '/home/user'
  • all command boxes start as the less privileged user and in their $HOME
  • commands that require root are explicitly marked (with su rather than sudo; note that prefixing each line with sudo is not sufficient to yield the same results as with su. Using su may require you to set a password for root if not already done!)

Note: some commands require user input; this is not an unattended installation.
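
Before starting, it can help to confirm that the three partitions from the premises are actually mounted. The following is a minimal sketch, not part of d:swarm; the function name missing_dirs is hypothetical, and the paths in the example call are the ones assumed in the premises.

```shell
#!/bin/sh
# Hypothetical helper: print every path from the argument list
# that is not an existing directory (prints nothing if all exist).
missing_dirs() (
    for dir in "$@"; do
        [ -d "$dir" ] || printf '%s\n' "$dir"
    done
)

# Example call (paths taken from the premises above):
#   missing_dirs /data/log /data/mysql /data/neo4j
```

An empty output means all listed directories exist; it does not check partition sizes.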


This step requires root level access

1. install system packages required for building the software

su
apt-get install --no-install-recommends --yes git-core maven nodejs npm build-essential

These steps require less privileged access

2. clone repositories (not as root!)

Make sure to use the correct path (/home/user).

cd /home/user

git clone --depth 1 --branch builds/unstable https://github.com/dswarm/dswarm.git
git clone --depth 1 --branch master https://github.com/dswarm/dswarm-graph-neo4j.git
git clone --depth 1 --branch builds/unstable https://github.com/dswarm/dswarm-backoffice-web.git

These steps require root level access

3. install Java and Tomcat for the backend

  • D:SWARM requires Java 8, which is no longer available in the default package sources. Follow these steps:
su
add-apt-repository ppa:webupd8team/java
apt-get update
apt-get install oracle-java8-installer oracle-java8-set-default

You can verify your java version with

java -version 2>&1 | grep -q "1.8" && echo "OK, Java 8 is available" || echo "Uh oh, Java 8 is not available"

Add $JAVA_HOME in /etc/environment:

JAVA_HOME="/usr/lib/jvm/java-8-oracle"

Earlier versions of Tomcat (< 7.0.30) do not run with Java 8 albeit being advertised to do so (related to this bug). Ubuntu 12.04 Precise includes Tomcat in version 7.0.26, which therefore must be updated. If you run precise, execute these steps:

wget https://launchpad.net/ubuntu/+archive/primary/+files/libservlet3.0-java_7.0.52-1ubuntu0.1_all.deb
wget https://launchpad.net/ubuntu/+archive/primary/+files/libtomcat7-java_7.0.52-1ubuntu0.1_all.deb
wget https://launchpad.net/ubuntu/+archive/primary/+files/tomcat7-admin_7.0.52-1ubuntu0.1_all.deb
wget https://launchpad.net/ubuntu/+archive/primary/+files/tomcat7-common_7.0.52-1ubuntu0.1_all.deb
wget https://launchpad.net/ubuntu/+archive/primary/+files/tomcat7_7.0.52-1ubuntu0.1_all.deb
su
dpkg -i libservlet3.0-java_7.0.52-1ubuntu0.1_all.deb
dpkg -i libtomcat7-java_7.0.52-1ubuntu0.1_all.deb
dpkg -i tomcat7-common_7.0.52-1ubuntu0.1_all.deb
dpkg -i tomcat7-admin_7.0.52-1ubuntu0.1_all.deb
dpkg -i tomcat7_7.0.52-1ubuntu0.1_all.deb

If you run a more recent version, just install tomcat from the official sources:

su
apt-get install tomcat7

If you encounter an install failure stating that $JAVA_HOME was not found by tomcat7, then change $JAVA_HOME in /etc/default/tomcat7 and try the installation again:

JAVA_HOME=/usr/lib/jvm/java-8-oracle

4. install system packages required for running the software

su
apt-get install --no-install-recommends --yes mysql-server nginx curl

5. install Data Hub (Neo4j)

currently, we rely on Neo4j version 2.3.2

su
wget -O - http://debian.neo4j.org/neotechnology.gpg.key | apt-key add -
echo 'deb http://debian.neo4j.org/repo stable/' > /etc/apt/sources.list.d/neo4j.list
apt-get update
apt-get install --no-install-recommends --yes neo4j=2.3.2

You can open the Neo4j Browser at http://localhost:7474/browser/ to check that the correct version has been installed.

Make sure Neo4j does not get updated when updating packages. You can use apt-pinning to do so. As root, create a file

su
touch /etc/apt/preferences.d/neo4j.pref

and add the following lines to this file.

Package: neo4j
Pin: version 2.3.2
Pin-Priority: 1000
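
Creating and filling the pin file can also be done in one step. This is a minimal sketch under the assumptions above (file name and version as given in this step); the function name write_neo4j_pin is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: write an apt pin for the neo4j package.
# $1 = target file, $2 = version to pin
write_neo4j_pin() (
    cat > "$1" <<EOF
Package: neo4j
Pin: version $2
Pin-Priority: 1000
EOF
)

# As root:
#   write_neo4j_pin /etc/apt/preferences.d/neo4j.pref 2.3.2
#   apt-cache policy neo4j   # shows installed and candidate versions
```

The helper overwrites the target file, so running it again with a different version replaces the old pin.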

6. make sure permissions are correct

su
chown -R tomcat7:tomcat7 /data/log
chown -R mysql:mysql /data/mysql
chown -R neo4j:adm /data/neo4j

7. install build environment for frontend

su
ln -s /usr/bin/nodejs /usr/bin/node
npm install -g grunt-cli karma bower

8. setup Metadata Repository (MySQL)

Create a database and a user for d:swarm. To customize the settings, edit dswarm/persistence/src/main/resources/create_database.sql. Do not commit this file if you modify it. Hint: remember these settings for step 12 (configure d:swarm).

mysql -uroot -p < dswarm/persistence/src/main/resources/create_database.sql

Then, open /etc/mysql/my.cnf (Ubuntu 14.04) or /etc/mysql/mysql.conf.d/mysqld.cnf (Ubuntu 16.04) and add the following line to the section [mysqld] (around line 45)

wait_timeout = 1209600

In the same file and the same section, change datadir to /data/mysql (around line 40)

datadir         = /data/mysql

add some InnoDB performance tweaks (for MySQL > 5.5)

innodb-read-io-threads=1
innodb-write-io-threads=1

add this directory to AppArmor

su
echo "alias /var/lib/mysql/ -> /data/mysql/," >> /etc/apparmor.d/tunables/alias
/etc/init.d/apparmor reload

and copy the whole MySQL data directory to the new location (after stopping the mysql service)

su
service mysql stop
cp -pr /var/lib/mysql/* /data/mysql/

9. setup Nginx

create /etc/nginx/sites-available/dswarm and add the following block

server {
    listen 80 default_server;
    root /var/www/dswarm;

    location / {
        try_files $uri $uri/ =404;
    }

    location /dmp {
        # Allow OPTIONS without Authorization
        # This is required for proper CORS preflight support
        # see http://www.w3.org/TR/cors/#preflight-request ('Exclude user credentials')
        if ($request_method = OPTIONS) {
            # return 599;
            add_header Access-Control-Allow-Origin "http://localhost:9999";
            add_header Access-Control-Allow-Methods "GET, OPTIONS, HEAD, PUT, POST, DELETE";
            add_header Access-Control-Allow-Headers "Accept, Authorization, Origin, X-Requested-With, Content-Type";
            add_header Access-Control-Allow-Credentials "true";
            add_header Content-Length 0;
            add_header Content-Type text/plain;
            return 200;
        }
        client_max_body_size 100M;
        proxy_pass http://127.0.0.1:8080$uri$is_args$args;
        proxy_read_timeout 600s;
        proxy_send_timeout 300s;
    }

    location /neo {
        auth_basic "Restricted";
        auth_basic_user_file /data/.htaccess;
        proxy_pass http://127.0.0.1:7474/browser;
    }

    location /db/ {
        proxy_pass http://127.0.0.1:7474/db/;
    }

    location /dmp/api-docs {
        client_max_body_size 100M;
        proxy_pass http://127.0.0.1:8080$uri$is_args$args;
    }

    location /docs {
         alias /home/user/git/dswarm/controller/src/docs/ui/dist;
    }

    error_page 599 = @dmpnoauth;

    location @dmpnoauth {
        client_max_body_size 100M;
        proxy_pass http://127.0.0.1:8080$uri$is_args$args;
    }
}

For very long-running processes, adjust the timeout settings such as proxy_read_timeout accordingly; see http://nginx.org/en/docs/http/ngx_http_proxy_module.html.

su
ln -s /etc/nginx/sites-available/dswarm /etc/nginx/sites-enabled/000-dswarm
mkdir -p /var/www

note: replace user in /home/user/... with the home directory of the user of your d:swarm installation. You may also need to delete the default site of nginx:

su 
rm /etc/nginx/sites-enabled/default
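
The linking and clean-up commands of this step can be sketched as a helper that is safe to run repeatedly. The function name enable_site is hypothetical; the site and link names in the example are the ones used above, and nginx -t is the standard nginx configuration test.

```shell
#!/bin/sh
# Hypothetical helper: enable a site and remove the default site link.
# $1 = absolute path of the nginx configuration root
# $2 = file name in sites-available, $3 = link name in sites-enabled
enable_site() (
    ln -sf "$1/sites-available/$2" "$1/sites-enabled/$3"
    rm -f "$1/sites-enabled/default"
)

# As root:
#   enable_site /etc/nginx dswarm 000-dswarm
#   nginx -t && service nginx reload
```

Testing the configuration before reloading avoids taking down a running server because of a typo in the server block.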

10. setup tomcat

open /etc/tomcat7/server.xml at line 33 and add a driverManagerProtection="false" so that the line reads

<Listener className="org.apache.catalina.core.JreMemoryLeakPreventionListener" driverManagerProtection="false" />

at line 73, same file, add this option maxPostSize="104857600", so that the Connector block reads

    <Connector port="8080" protocol="HTTP/1.1"
               connectionTimeout="20000"
               maxPostSize="104857600"
               URIEncoding="UTF-8"
               redirectPort="8443" />

then, give tomcat some more memory

su
echo 'export CATALINA_OPTS="-Xms4G -Xmx4G -XX:+CMSClassUnloadingEnabled -XX:+UseConcMarkSweepGC -XX:MaxPermSize=512M"' >> /usr/share/tomcat7/bin/setenv.sh

And finally, you have to tell Tomcat about Java 8. Open the file /etc/default/tomcat7 and around line 12, add this setting

# The home directory of the Java development kit (JDK). You need at least
# JDK version 1.5. If JAVA_HOME is not set, some common directories for
# OpenJDK, the Sun JDK, and various J2SE 1.5 versions are tried.
JAVA_HOME=/usr/lib/jvm/java-8-oracle

11. setup Data Hub (Neo4j)

increase the limit on open file handles in /etc/security/limits.conf

root   soft    nofile  40000
root   hard    nofile  40000

plus add ulimit -n 40000 into your neo4j service script (under /etc/init.d, e.g., /etc/init.d/neo4j-service) before starting the daemon

edit /etc/neo4j/neo4j.properties and:

  • insert some storage tweaks
dbms.pagecache.memory=8g
keep_logical_logs=false

edit /etc/neo4j/neo4j-server.properties and:

  • change the database location
org.neo4j.server.database.location=/data/neo4j/data/graph.db
  • disable authentication
dbms.security.auth_enabled=false
  • change the rrd database location
org.neo4j.server.webadmin.rrdb.location=/data/neo4j/data/rrd
  • add our graph extension
org.neo4j.server.thirdparty_jaxrs_classes=org.dswarm.graph.resources=/graph
  • (optional) specify IP address
org.neo4j.server.webserver.address=0.0.0.0

edit /etc/neo4j/neo4j-wrapper.conf and:

  • insert an additional parameter (if your server is x64)
wrapper.java.additional.1=-d64
  • tweak the Java heap size to a value appropriate for your server's RAM, e.g.,
wrapper.java.initmemory=512
wrapper.java.maxmemory=8192
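
The key=value edits in the three Neo4j configuration files above can also be scripted. This is a minimal sketch that assumes GNU sed (as shipped with Ubuntu) and values without the characters | or &; the function name set_property is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: set key=value in a properties-style file.
# An existing line for the key (also a commented-out one) is replaced,
# otherwise the setting is appended at the end of the file.
# $1 = file, $2 = key, $3 = value
set_property() (
    file=$1
    key=$2
    value=$3
    if grep -q "^[#[:space:]]*$key=" "$file"; then
        sed -i "s|^[#[:space:]]*$key=.*|$key=$value|" "$file"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
)

# As root, for example:
#   set_property /etc/neo4j/neo4j-server.properties dbms.security.auth_enabled false
#   set_property /etc/neo4j/neo4j-wrapper.conf wrapper.java.maxmemory 8192
```

Note that the dots in the key are treated as regular-expression wildcards, which is harmless for the keys listed above but worth keeping in mind for other files.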

then, create a symlink from the previous log location to the external partition

su
mv /var/lib/neo4j/data/log{,-old}
ln -s /data/neo4j/log /var/lib/neo4j/data/log
mkdir /data/neo4j/log
chown -R neo4j:adm /data/neo4j/log

By default, the Neo4j Server is bundled with a Web server that binds to host localhost on port 7474, answering only requests from the local machine. If you need remote access to the Neo4j Browser or the D:SWARM Graph Extension API, see Secure the port and remote client connection accepts


These steps require less privileged access

12. configure d:swarm

Follow the instructions in d:swarm Configuration to create a custom d:swarm config file. Make sure that you've added a reference to your d:swarm config file in the Context section of context.xml configuration of Tomcat. Furthermore, make sure that the user that runs your Tomcat server (e.g. tomcat7) has read access to the d:swarm config file and read and write access to the folders that are configured in the paths section of your d:swarm config. You can do this, for example, by changing the owner of these folders to the user that runs your Tomcat server:

su
chown -R tomcat7:tomcat7 /path/to/your-folder-in-paths-section-of-dswarm-config

13. build D:SWARM Graph Extension

pushd dswarm-graph-neo4j
mvn -U -PRELEASE -DskipTests clean package
popd
mv dswarm-graph-neo4j/target/graph-1.3-jar-with-dependencies.jar dswarm-graph-neo4j.jar
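
The mv command above hardcodes the version number in the jar name. If the version of the graph extension differs in your checkout, the artifact can be located by pattern instead. This is a minimal sketch; the function name find_artifact and the glob pattern in the example are assumptions, not part of the d:swarm build.

```shell
#!/bin/sh
# Hypothetical helper: print the single file in a directory that matches
# a glob pattern; fail if there is no match or more than one match.
# $1 = directory, $2 = glob pattern (pass it quoted)
find_artifact() (
    set -- "$1"/$2
    [ $# -eq 1 ] && [ -f "$1" ] && printf '%s\n' "$1"
)

# Example:
#   jar=$(find_artifact dswarm-graph-neo4j/target 'graph-*-jar-with-dependencies.jar') \
#     && mv "$jar" dswarm-graph-neo4j.jar
```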

14. build backend

pushd dswarm
mvn -U -DskipTests clean install -Dconfig.file=/path/to/dswarm.conf
pushd controller
mvn -U -DskipTests war:war -Dconfig.file=/path/to/dswarm.conf
popd; popd
mv dswarm/controller/target/dswarm-controller-0.1-SNAPSHOT.war dmp.war

note: Please specify the path to your custom d:swarm config, if it is not located at the root directory of the d:swarm backend repository. Otherwise, you can run the maven task with argument -Pdswarm-conf (which looks at the root directory of the d:swarm backend repository for a d:swarm config named dswarm.conf)

15. build frontend

pushd dswarm-backoffice-web; pushd yo
npm install
bower install
STAGE=unstable DMP_HOME=../../dswarm grunt build
popd
rsync --delete --verbose --recursive yo/dist/ yo/publish
popd

note: npm install may need to be executed as root

set symbolic link to web root directory of the frontend (this step only needs to be done once)

su
ln -s /home/user/dswarm-backoffice-web/yo/publish /var/www/dswarm

These steps require root level access

16. wire everything together

Make sure to use the correct path (/home/user).

su
rm -f /var/lib/tomcat7/webapps/dmp.war
rm -rf /var/lib/tomcat7/webapps/dmp
cp /home/user/dmp.war /var/lib/tomcat7/webapps/
cp /home/user/dswarm-graph-neo4j.jar /usr/share/neo4j/plugins/

17. restart everything, if needed

su
/etc/init.d/mysql restart
/etc/init.d/neo4j-service restart
/etc/init.d/nginx restart
/etc/init.d/tomcat7 restart

18. initialize/reset Metadata Repository + Data Hub

This step requires less privileged access

When running the backend for the first time, the Metadata Repository (MySQL database) needs to be initialized. After an update, a reset is required if the schema or the initial data has changed. Make sure to use the correct path (/home/user).

pushd dswarm/dev-tools
python reset-dbs.py \
  --persistence-module=../persistence \
  --user=dmp \
  --password=dmp \
  --db=dmp \
  --neo4j=http://localhost:7474/graph

Or provide the credentials and values you configured. Check python reset-dbs.py --help for additional information.

Install Task Processing Unit (TPU)

The Task Processing Unit (TPU) allows for processing larger amounts of data with mappings that were created via the d:swarm Back Office. Please have a look at the TPU documentation for further details on its usage and install instructions.

(Optional) Install local Swagger UI

.. to easily explore the d:swarm backend HTTP API.

1. Go to the root directory of your d:swarm backend repository

cd /home/user/dswarm

2. Fetch submodules of this repository

git submodule update --init --recursive

(this command fetches a copy of the Swagger UI, which is linked as sub module from the backend controller)

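3. Edit /etc/nginx/sites-available/default (or /etc/nginx/sites-available/dswarm if you followed step 9 of the initial installation, where both blocks are already included) and add this just below the location / block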

location /dmp/api-docs {
                  client_max_body_size 100M;
                  proxy_pass http://127.0.0.1:8080$uri$is_args$args;
          }

to forward the Swagger description of the d:swarm backend HTTP API and

location /docs {
                  alias [INSERT_HERE_THE_ROOT_DIRECTORY_OF_YOUR_DSWARM_BACKEND_REPOSITORY]/controller/src/docs/ui/dist;
          }

to point to the local Swagger UI installation.

note: you need to insert the correct path to the d:swarm backend repository (e.g. /home/user/dswarm)

4. Edit /home/user/dswarm/controller/src/docs/ui/dist/index.html line 31 to insert the URL of your local d:swarm backend HTTP API Swagger description:

     url = "http://localhost:/dmp/api-docs/"

Now you should be able to open and explore the d:swarm backend HTTP API via

http://localhost/docs

Update the System

1. update repository contents

pushd dswarm; git pull; popd
pushd dswarm-graph-neo4j; git pull; popd
pushd dswarm-backoffice-web; git pull; popd

2. repeat steps 13 (Building D:SWARM Graph Extension) to 18 (Init/reset Metadata Repository + Data Hub) from the installation as necessary

Checklist on Errors

First of all it's a good idea to know which of the four components frontend, backend, Metadata Repository (MySQL) and Data Hub (Neo4j database) does not run. If you already know, skip this list.

  • frontend: open http://localhost:9999 (port defaults to 80 for server installation) in a browser. The front end should be displayed.
  • backend: open http://localhost:8087/dmp/_ping (port defaults to 8080 for server installation) in a browser. The expected response is a page with the word pong.
  • Metadata Repository (MySQL database): open a terminal and type mysql -udmp -p dmp to open a connection to MySQL and select the database dmp. Hint: check for correct user name, password and database name in case you did not use the default values. If you can log in, type select * from DATA_MODEL;. At least three internal data models should be listed.
  • Data Hub (Neo4j database): open http://localhost:7474/browser/ in a browser. The Neo4j browser should be opened.
  • D:SWARM Graph Extension: open http://localhost:7474/graph/gdm/ping in a browser. The expected response is a page with the word pong.
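
The two ping checks above can be run from a terminal as well. This is a minimal sketch; the function name expect_pong is hypothetical, and the URLs in the example assume the default ports of a server installation as stated in the list.

```shell
#!/bin/sh
# Hypothetical helper: read a response body from stdin and report
# whether it contains the word "pong".
# $1 = label of the component being checked
expect_pong() (
    body=$(cat)
    case $body in
        *pong*) printf '%s: OK\n' "$1" ;;
        *)      printf '%s: FAILED\n' "$1"; exit 1 ;;
    esac
)

# Example:
#   curl -s http://localhost:8080/dmp/_ping | expect_pong backend
#   curl -s http://localhost:7474/graph/gdm/ping | expect_pong graph-extension
```

An unreachable service yields an empty body and is therefore reported as FAILED, the same as a wrong response.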

Now that you know which component does not run, go through the following checklist:

  • is curl installed?
  • Did you choose a database name other than the default? If yes, you currently have to modify the init_internal_schema.sql, which is internally used by the script reset-dbs.py and change the USE database statement (this should be improved).
  • when building the projects with maven, did you use the -U option to update project dependencies?
  • Check your dswarm Configuration. Are database name and password correct, i.e., the ones used when installing the Metadata Repository (MySQL; step Setup Metadata Repository)? Compare dswarm/persistence/src/main/resources/create_database.sql with dswarm/dswarm.conf or any other configuration option you use.
  • Can Tomcat read the d:swarm configuration file?
  • initialize/reset the Metadata Repository + Data Hub. They may be empty or contain corrupted data caused by failed unit tests.
  • Did you miss an update of, e.g., the neo4j version? Compare your installed version with the required version (see step 5)
  • Do the folders that you've configured in the paths section of your d:swarm config exist, and are they accessible (read + write) for the user that runs your Tomcat server?
    • If you specified a root path folder in the config, make sure it contains a tmp/resources and log folder (if you've not specified other folders in the paths section of your d:swarm config)
  • Did you set the maximum file-size for uploads (see Step 9) to a sufficient value for your scenario?
  • Did you set the server proxy timeout (see Step 9) to a sufficient value for your scenario?
  • In order to access the D:SWARM Graph Extension (e.g. .../graph/gdm/ping) you may have to allow access from other than localhost (see step Setup Data Hub).
  • if the nginx package was updated, then (probably) the nginx webserver root was overwritten as well, i.e., you need to set the symbolic link to your d:swarm backoffice ui directory again (see Step 15)