Installation Guide

Idoia edited this page Mar 30, 2016 · 82 revisions


The following section will guide you through the installation steps of each module.

Prerequisites

The following software is assumed to be available on the target machine. At the end of this page you can find a set of download links.

  • JDK 8
  • MySQL 5.5.x: this is not a strict requirement, since we are using JDBC and plain standard SQL. However, the provided DDL scripts have been tested only on MySQL, so other databases may require a bit of adaptation work;
  • Virtuoso Open-Source Edition v7.1
  • Apache Tomcat 7.0.50 or above: as with the database, this is not a strict requirement because ALIADA modules are plain, standard JEE 6 modules. That means you could use your favourite Servlet Engine or Application Server (compliant with that spec level). However, this guide will refer to Apache Tomcat.

Step 1: Install and configure Open Link Virtuoso

After installing Virtuoso Open-Source Edition v7.1, it must be configured by executing the following ISQL commands from Virtuoso's ISQL command interface:

 isql localhost:1111 dba dba aliada_config_virtuoso.sql -i VIRTUOSO_DIRS_ALLOWED

where VIRTUOSO_DIRS_ALLOWED is a folder name included in the DirsAllowed parameter of the virtuoso.ini file, and the content of the aliada_config_virtuoso.sql file is:

  --------------------------------
  --Install Faceted Browser VAD Package
  --------------------------------
  vad_install('$ARGV[6]/fct_dav.vad', 0);
  --------------------------------
  --Upload Aliada ontology and MARC Codes List
  --------------------------------
  delete from db.dba.load_list;
  SPARQL CLEAR GRAPH  <http://aliada-project.eu/2014/aliada-ontology#>; 
  ld_dir ('$ARGV[6]', 'aliada-ontology.owl', 'http://aliada-project.eu/2014/aliada-ontology#');
  ld_dir ('$ARGV[6]', 'languages.rdf', 'http://id.loc.gov/vocabulary/languages');
  ld_dir ('$ARGV[6]', 'aliada_languages_spanish.nt', 'http://id.loc.gov/vocabulary/languages');
  ld_dir ('$ARGV[6]', 'countries.rdf', 'http://id.loc.gov/vocabulary/countries');
  ld_dir ('$ARGV[6]', 'gacs.rdf', 'http://id.loc.gov/vocabulary/geographicAreas');
  rdf_loader_run ();
  DB.DBA.RDF_LOAD_RDFXML (http_get ('http://www.w3.org/2002/07/owl.rdf'),'no', 'http://aliada-project.eu/2014/aliada-ontology#');
  DB.DBA.RDF_LOAD_RDFXML (http_get ('http://erlangen-crm.org/current/'),'no', 'http://aliada-project.eu/2014/aliada-ontology#');
  DB.DBA.RDF_LOAD_RDFXML (http_get ('http://erlangen-crm.org/efrbroo/'),'no', 'http://aliada-project.eu/2014/aliada-ontology#');
  --------------------------------
  --Create user aliada_dev for SPARQL_UPDATE point
  --------------------------------
  DB.DBA.USER_CREATE ("SPARQL-AUTH USERNAME", "SPARQL-AUTH PASSWORD", vector('SQL_ENABLE', 1, 'DAV_ENABLE', 1));
  GRANT SPARQL_UPDATE TO "SPARQL-AUTH USERNAME";
  --------------------------------
  --Add resource Type (mime type) for downloading correctly the generated triples from CKAN Datahub
  --------------------------------
  INSERT INTO WS.WS.SYS_DAV_RES_TYPES (T_EXT, T_TYPE) VALUES ('gz', 'application/x-ntriples');

The "aliada-ontology.owl" file contains the ALIADA ontology.

The "languages.rdf", "countries.rdf" and "gacs.rdf" files are dumps of the Library of Congress MARC Codes of the languages, countries and geographic areas, respectively.

The "aliada_languages_spanish.nt" file contains some triples to be added to the graph "http://id.loc.gov/vocabulary/languages", which contains the MARC codes of the languages. These new triples contain the names of well-known languages translated into Spanish.

The "SPARQL-AUTH USERNAME" and "SPARQL-AUTH PASSWORD" placeholders should be replaced by the username and password to be used for the authorised SPARQL user.
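As a pre-flight sketch, you can check that the data files loaded by the script are actually present in the DirsAllowed folder before invoking isql. The directory name below is an assumption for illustration only:

```shell
# Sketch only: the files referenced by aliada_config_virtuoso.sql must sit in
# the folder passed via -i (shown here under /tmp for illustration).
DATA_DIR=/tmp/virtuoso-files     # stands in for VIRTUOSO_DIRS_ALLOWED
mkdir -p "$DATA_DIR"
touch "$DATA_DIR/aliada-ontology.owl" "$DATA_DIR/languages.rdf" \
      "$DATA_DIR/aliada_languages_spanish.nt" "$DATA_DIR/countries.rdf" \
      "$DATA_DIR/gacs.rdf"       # in a real setup these are the actual dumps
for f in aliada-ontology.owl languages.rdf aliada_languages_spanish.nt \
         countries.rdf gacs.rdf; do
    [ -f "$DATA_DIR/$f" ] || echo "missing: $f"
done
# isql localhost:1111 dba dba aliada_config_virtuoso.sql -i "$DATA_DIR"
```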

Step 2: Create the "linking" user on the machine

A new user must be created on the machine where the ALIADA tool resides, in order to execute the scheduled linking processes via crontab. The name of this user is indicated in the linking_client_app_user field of the aliada.organisation DB table, as described below. We have called it "linking", but any other name can be used.

This user must have the following privileges:

  • read/write privileges on the temporary folder indicated in the tmp_dir field of the aliada.organisation DB table, because it will read and write files located in this folder;
  • execution privileges over the shell script links-discovery-task-runner.sh of Links Discovery Application Client module. This shell script is located in the folder indicated in the field linking_client_app_bin_dir of aliada.organisation DB table;
  • read privileges over the *.jar libraries used by this shell script (see below the installation guidelines of Links Discovery Application Client module);

The following changes must be applied to the /etc/sudoers file of the system, so that the tomcat user, which executes the Links Discovery component, can schedule the linking processes for this new user via crontab:

  • Comment out the following line so that the web application can execute a "su" command without a tty:

    #Defaults    requiretty
    
  • Insert the following lines in the file so that the web application can execute via "su" the "crontab" command of the new user:

    tomcat ALL=(linking) NOPASSWD: /usr/bin/crontab
    linking ALL=(linking) NOPASSWD: /usr/bin/crontab
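With the sudoers entries above in place, installing the linking user's crontab needs no password. A hypothetical sketch of what the module's scheduling amounts to (the schedule and paths below are illustrative assumptions, not the module's real output):

```shell
# Sketch only: generate a crontab file and show how it would be installed
# for the "linking" user. Schedule and script path are assumptions.
CRON_FILE=/tmp/aliada_links_discovery.cron
cat > "$CRON_FILE" <<'EOF'
# run the scheduled linking process, e.g. daily at 02:30
30 2 * * * /home/aliada/links-discovery/bin/links-discovery-task-runner.sh
EOF
# the tomcat user would then install it without a password:
# su linking -c "crontab /tmp/aliada_links_discovery.cron"
cat "$CRON_FILE"
```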
    

Step 3: Create CKAN Datahub User and Organization

To be able to use the CKAN Datahub RESTful API to create dataset pages, you must register in CKAN Datahub and create an organization linked to that user. These steps are explained next:

  • User creation. It is required to create a user ID in CKAN. This is done by using the “Register” link at the top of any page of CKAN Datahub (http://datahub.io/user/register). The following data will be requested: username, full name, e-mail address and password. When a new user is created, CKAN provides an API key, which must be used whenever an API function requires authorization. This key will be the value for the organisation.ckan_api_key DB field.

  • Organization creation. To create a dataset, it must belong to an organization, so an organization must be created first. Organizations act like publishing departments for datasets (for example, if CKAN is being used as a data portal by a national government, the organizations might be different government departments, each of which publishes data: the department of Health, etc.). This means that datasets can be published by and belong to a department instead of an individual user. The URL of the REST service that allows creating an organization is “http://datahub.io/api/action/organization_create”. At the moment, this REST service is only available to the system administrators of CKAN Datahub, and the only way to create an organization is the following, as explained in http://help.datahub.io/kb/general/creating-a-dataset-on-the-datahub-december-2013:

    • Choose the title and "slug" (for the url) for your organization (e.g. “My New Organization” and “my-new-organization”). The chosen "slug" will be the value for the organisation.org_name DB field.

    • Open a new ticket at http://help.datahub.io/discussion/new and provide that information along with the DataHub username (so that the user becomes the administrator of that organization). They suggest titling the ticket “New Organization Request” so they can act on it quickly.
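As an illustration of how the API key is used, an authorised CKAN API call carries it in the Authorization header. The sketch below only prints the command, nothing is sent; the action name organization_show and the slug are examples:

```shell
# Sketch only: compose (but do not execute) an authorised CKAN API call.
CKAN_API_URL="https://datahub.io/api/action"
CKAN_API_KEY="xxxxxx"            # the value stored in organisation.ckan_api_key
echo curl -X POST \
     -H "Authorization: ${CKAN_API_KEY}" \
     -d '{"id": "my-new-organization"}' \
     "${CKAN_API_URL}/organization_show" > /tmp/ckan_call.txt
cat /tmp/ckan_call.txt
```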

Step 4: Create database tables

The DDL scripts and, in general, the scripts needed to get a database schema ready can be found under

https://github.com/ALIADA/aliada-tool/tree/master/aliada/src/site/database

Assuming you have a database user with the right grants, execute the following scripts in order:

00-aliada-admin.sql
01-aliada-rdfizer.sql
02-aliada-linksdiscovery.sql
03-aliada-linkeddataserver.sql
04-aliada-ckancreation.sql

At the end you will have the required tables for each module.
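The sequence above can be sketched as a loop, assuming the mysql command-line client; HOST and USERNAME are placeholders. For safety, the sketch only writes the commands to a helper script instead of executing them:

```shell
# Sketch only: write the execution commands, in order, to a helper script.
# Remove the echo/redirection to run the scripts directly.
: > /tmp/run_ddl.sh
for script in 00-aliada-admin.sql 01-aliada-rdfizer.sql \
              02-aliada-linksdiscovery.sql 03-aliada-linkeddataserver.sql \
              04-aliada-ckancreation.sql; do
    echo "mysql -h HOST -u USERNAME -p aliada < $script" >> /tmp/run_ddl.sh
done
cat /tmp/run_ddl.sh
```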

The 00-aliada-admin.sql script will create an administrator user with user/password admin/admin in the aliada.user table, and an organisation associated with it in the aliada.organisation table. The values of the aliada.organisation table fields can be changed by hand in this script before executing it. Only the org_name, org_catalog_url and org_logo fields can be changed from the "admin" area of the ALIADA tool UI.

The fields in aliada.organisation table are the following:

  • org_name: 'artium' (organization name in CKAN Datahub)
  • org_path: '/usr/share/tomcat/upload/' (folder where to upload the records files to import in ALIADA)
  • org_logo: content of the organization logo.
  • org_catalog_url: 'http://aliada.artium.org' (URL of the catalog where the original records reside)
  • org_description: 'Basque Museum-Center of Contemporary Art' (organization description)
  • org_home_page: 'http://www.artium.org/' (organization home page)
  • aliada_ontology: 'http://aliada-project.eu/2014/aliada-ontology#' (The URI of ALIADA ontology)
  • tmp_dir: '/home/aliada/tmp' (temporary folder used by ALIADA modules to create temporal files)
  • linking_client_app_bin_dir: '/home/aliada/links-discovery/bin/' (the name of the folder where the links-discovery-task-runner.sh shell script has been installed for the Links Discovery module)
  • linking_client_app_user: 'linking' (the machine's user for whom the crontab will be scheduled to execute the links-discovery-task-runner.sh shell script of the Links Discovery module)
  • store_ip: 'localhost' (IP address of the machine where the RDF store resides)
  • store_sql_port: 1111 (port of the RDF store for SQL access)
  • sql_login: 'dba' (the login of the SQL access)
  • sql_password: 'dba' (the password of the SQL access)
  • isql_command_path: '/home/virtuoso/bin/isql' (full path to the ISQL command)
  • isql_commands_file_dataset_default: '/home/aliada/linked-data-server/config/isql_rewrite_rules_global.sql' (full path of the ISQL commands default file to execute for the dataset from the Linked Data Server module. If the dataset.isql_commands_file_dataset field is null or it does not exist, this one will be used)
  • isql_commands_file_subset_default: '/home/aliada/linked-data-server/config/isql_rewrite_rules_subset_default.sql' (full path of the ISQL commands default file to execute for the subset from the Linked Data Server module. If the subset.isql_commands_file_subset field is null or it does not exist, this one will be used)
  • isql_commands_file_graph_dump: '/home/aliada/ckan-datahub-page-creation/config/dump_one_graph_nt.sql' (full path of the ISQL commands file to dump the triples of a graph in Virtuoso into a compressed file, used in CKAN Datahub Page Creation module)
  • virtuoso_http_server_root: '/home/virtuoso/var/lib/virtuoso/vsp' (full path of Virtuoso HTTP server root folder, where the web page for the dataset will be created from the Linked Data Server module)
  • ckan_api_url: 'https://datahub.io/api/action' (URL of the RESTful API of CKAN Datahub, used in CKAN Datahub Page Creation module)
  • ckan_api_key: 'xxxxxx' (Key to use the RESTful API of CKAN Datahub, used in CKAN Datahub Page Creation module)
  • ckan_org_url: URL of the organization in CKAN Datahub. The value of this field will be written by the CKAN Datahub Page Creation module.
  • dataset_author: 'Aliada Consortium' (author of the generated RDF dataset with ALIADA tool)
  • isql_commands_file_dataset_creation: '/home/aliada/bin/aliada_new_dataset.sql' (full path of the ISQL commands file to create a dataset physically in Virtuoso. It will be called when the dataset is created from the admin area of the UI of ALIADA tool).

This script will also create several rows in the aliada.t_external_dataset table, one for each external dataset in which the Links Discovery process searches for links: Europeana, BNB, BNE, Freebase, DBpedia, NSZL, Geonames and MARC Code Lists, OpenLibrary, Lobid, VIAF and LCSH. This table contains the following fields:

  • external_dataset_name: name of the external dataset (e.g.: DBPedia).
  • external_dataset_description: description of the external dataset (e.g.: Linked Data version of Wikipedia).
  • external_dataset_homepage: home page of the external dataset (e.g.: http://dbpedia.org).
  • external_dataset_linkingfile: template file to create the SILK configuration file (e.g.: /home/aliada/links-discovery/config/silk/aliada_dbpedia_config.xml).
  • external_dataset_linkingnumthreads: number of threads to configure the SILK process for searching links in the corresponding external dataset.
  • external_dataset_linkingreloadtarget: flag indicating whether the external dataset should be reloaded or not. It is 0 by default, as the external dataset won't change much and takes a long time to load. It is enough to enable this flag about once a year.
  • external_dataset_linkingreloadsource: flag indicating whether the ALIADA dataset should be reloaded or not. It will be 1 by default, as the imported dataset with ALIADA tool may change.

Step 5: Modules installation

With few exceptions, ALIADA modules are web applications, so the installation itself is quite easy: it is just a matter of deployment.

User Interface

Datasource

User Interface needs a datasource with a JNDI name "jdbc/aliada" defined in the Servlet Engine / Application Server. This is how you can do that in Tomcat (either in server.xml or in context.xml)

    <Resource
	name="jdbc/aliada"
	auth="Container"
	type="javax.sql.DataSource"
	username="USERNAME HERE" password="PASSWORD HERE"
	driverClassName="com.mysql.jdbc.Driver"
	url="jdbc:mysql://HOST:PORT/aliada"
	maxActive="2"
	maxIdle="2"/>

Data directories

User Interface needs a directory where files will be imported. This path is defined by the user of the application.

The user who is running the User Interface must have all permissions on this folder.

Deployment

The User Interface is a standard JEE 6 web module, so the deployment task is quite trivial; it effectively depends on the target Servlet Engine / Application Server. First, download the war from here (link is not working because at the time of writing the first milestone hasn't been reached). On Apache Tomcat you can just drop the war file into the webapps folder. Once the server has started you should see the following messages:

 ...
 12:47:45,229 INFO  [LogonInterceptor] <UserInterface-00001> : User Interface is starting...
 12:47:45,229 INFO  [LogonInterceptor] <UserInterface-00003> : Intializing LogonInterceptor.

RDFizer

The RDFizer is configured in two different places. First, it needs, as described below, some setup in the hosting environment: a datasource reference within the hosting Application Server or Servlet Engine, and several directories for working with data files.

Then, it can be configured using a simple configuration file described here.

Datasource

RDFizer needs a datasource with a JNDI name "jdbc/aliada" defined in the Servlet Engine / Application Server. This is how you can do that in Tomcat (either in server.xml or in context.xml)

    <Resource
	name="jdbc/aliada"
	auth="Container"
	type="javax.sql.DataSource"
	username="USERNAME HERE" password="PASSWORD HERE"
	driverClassName="com.mysql.jdbc.Driver"
	url="jdbc:mysql://HOST:PORT/aliada"
	maxActive="2"
	maxIdle="2"/>

Data directories

RDFizer needs several work directories where datafiles will be placed and moved.

  • /var/data/pipeline/input/marcxml: MARCXML datafiles working directory;
  • /var/data/pipeline/input/auth: MARCXML authorities datafiles working directory;
  • /var/data/pipeline/input/lido: LIDO datafiles working directory;
  • /var/data/pipeline/input/dc: Dublin Core datafiles working directory;

The user who is running the RDFizer must have all permissions on those folders.

It also uses a directory for the Named Entity Recognition, where the classifier will be located.

All these work directories are specified here.
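Creating the work directories above can be sketched as follows (shown under /tmp for illustration; the real root from the list above is /var/data/pipeline/input, and the RDFizer user needs full permissions on each):

```shell
# Sketch only: create the four input working directories under one root.
ROOT=/tmp/pipeline/input   # illustrative stand-in for /var/data/pipeline/input
for fmt in marcxml auth lido dc; do
    mkdir -p "$ROOT/$fmt"
done
ls "$ROOT"
```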

Deployment

The RDFizer is a standard JEE 6 web module, so the deployment task is quite trivial; it effectively depends on the target Servlet Engine / Application Server. First, download the war from here (link is not working because at the time of writing the first milestone hasn't been reached). On Apache Tomcat you can just drop the war file into the webapps folder. Once the server has started you should see the following messages:

 ...
 08:37:32,167 INFO  [ApplicationLifecycleListener] <RDF-IZER-00001> : RDF-izer is starting...    
 08:37:32,168 INFO  [ApplicationLifecycleListener] <RDF-IZER-00010> : RDF-izer open for e-business.    

Links Discovery

Datasource

The Links Discovery module needs a datasource with a JNDI name "jdbc/aliada" defined in the Servlet Engine / Application Server. This is how you can do that in Tomcat (either in server.xml or in context.xml)

    <Resource
	name="jdbc/aliada"
	auth="Container"
	type="javax.sql.DataSource"
	username="USERNAME HERE" password="PASSWORD HERE"
	driverClassName="com.mysql.jdbc.Driver"
	url="jdbc:mysql://HOST:PORT/aliada"
	maxActive="2"
	maxIdle="2"/>

It also needs these datasource parameters defined as context parameters, which are used to pass them on to the subjobs created by the Links Discovery module. This is how you can do that in Tomcat (either in server.xml or in context.xml):

<Parameter name="ddbb.username" value="USERNAME HERE" override="false"/>
<Parameter name="ddbb.password" value="PASSWORD HERE" override="false"/>
<Parameter name="ddbb.driverClassName" value="com.mysql.jdbc.Driver" override="false"/>
<Parameter name="ddbb.url" value="jdbc:mysql://HOST:PORT/aliada" override="false"/>

Data directory

Links Discovery needs several files to configure the linking processes. These files, located in the resources folder, are the following:

  • aliada_XXXX_config.xml template files to create the SILK configuration files. XXXX stands for the name of the external datasource to link with. Examples of these files are aliada_dbpedia_config.xml, aliada_europeana_config.xml, etc. The location of each file is specified in the external_dataset_linkingfile field of the aliada.t_external_dataset table for the corresponding external dataset.

Links Discovery needs a working directory where several files will be created and removed. These files are:

  • crontab file "aliada_links_discovery.cron" for programming the SILK processes;
  • properties files for each subjob created by this module, which contain the parameters to connect to the DB. The DB contains the configuration parameters of the corresponding subjob;
  • SILK configuration files;
  • files generated by the SILK processes, containing the newly generated triples that link to external datasources in the Linked Open Data Cloud.

The user who is running the Links Discovery module (tomcat in our case) must have all permissions on that folder. The path to that folder is specified in the tmp_dir column of the aliada.linksdiscovery_job_instances DB table of the corresponding job.

Deployment

The Links Discovery module is a standard JEE 6 web module, so the deployment task is quite trivial; it effectively depends on the target Servlet Engine / Application Server. First, download the war from here (link is not working because at the time of writing the first milestone hasn't been reached). On Apache Tomcat you can just drop the war file into the webapps folder. Once the server has started you should see the following message:

 ...
 14:24:53,140 INFO  [ApplicationLifecycleListener] <LinksDiscovery-00001> : LinksDiscovery is starting...  

The module also consists of a standalone Java application that is invoked from the crontab scheduled by the Links Discovery web application. To deploy it, just download the zip from here (link is not working because at the time of writing the first milestone hasn't been reached). Unzip the file and specify the path where the links-discovery-task-runner.sh shell script resides in the client_app_bin_dir column of the aliada.linksdiscovery_job_instances DB table of the corresponding job.

Links Discovery Application Client

The Links Discovery Application Client module is not a web application but a plain Java application. It consists of the following files, included in the target file aliada-links-discovery-application-client-2.0-standalone.zip:

  • bin/links-discovery-task-runner.sh : a shell script that executes the Java application. The path to the folder where this file resides must be indicated in the field linking_client_app_bin_dir of the aliada.organisation DB table.
  • lib/aliada-links-discovery-application-client-2.0.jar : contains the class with the main method;
  • lib/*.jar : the libraries used by the Java application;

The /lib and /bin folders must be placed under the same folder, so that the shell script bin/links-discovery-task-runner.sh is able to find the *.jar libraries.

Linked Data Server

Datasource

The Linked Data Server module needs a datasource with a JNDI name "jdbc/aliada" defined in the Servlet Engine / Application Server. This is how you can do that in Tomcat (either in server.xml or in context.xml)

    <Resource
	name="jdbc/aliada"
	auth="Container"
	type="javax.sql.DataSource"
	username="USERNAME HERE" password="PASSWORD HERE"
	driverClassName="com.mysql.jdbc.Driver"
	url="jdbc:mysql://HOST:PORT/aliada"
	maxActive="2"
	maxIdle="2"/>

Data directory

Linked Data Server needs several files to configure Virtuoso for URI dereferencing. These files, located in the resources folder, are the following:

  • isql_rewrite_rules_dataset_default: ISQL commands default file executed to configure the URL rewrite rules for the dataset. This file name is specified in the linkeddataserver_job_instances.isql_commands_file_dataset_default field. If the dataset.isql_commands_file_dataset field is null or does not exist, linkeddataserver_job_instances.isql_commands_file_dataset_default will be used.
  • isql_rewrite_rules_subset_default: ISQL commands default file executed to configure the URL rewrite rules for the subset. This file name is specified in the linkeddataserver_job_instances.isql_commands_file_subset_default field. If the subset.isql_commands_file_subset field is null or does not exist, linkeddataserver_job_instances.isql_commands_file_subset_default will be used.

The Linked Data Server module will create the dataset web page folder under the folder indicated by the linkeddataserver_job_instances.virtuoso_http_server_root field of the corresponding job, and will create the index page of the dataset web site there. It also needs a working directory, indicated by the linkeddataserver_job_instances.tmp_dir DB field, to store the organisation logo image temporarily; afterwards, the logo will be copied to the web page folder of the dataset. Because of this, the user who is running the Linked Data Server module (tomcat in our case) must have all permissions on those two folders.

Deployment

The Linked Data Server module is a standard JEE 6 web module, so the deployment task is quite trivial; it effectively depends on the target Servlet Engine / Application Server. First, download the war from here (link is not working because at the time of writing the first milestone hasn't been reached). On Apache Tomcat you can just drop the war file into the webapps folder. Once the server has started you should see the following messages:

 ...
 14:24:53,140 INFO  [ApplicationLifecycleListener] <LinkedDataServer-00001> : LinkedDataServer is starting...  

CKAN Datahub Page Creation

Datasource

The CKAN Datahub Page Creation module needs a datasource with a JNDI name "jdbc/aliada" defined in the Servlet Engine / Application Server. This is how you can do that in Tomcat (either in server.xml or in context.xml)

    <Resource
	name="jdbc/aliada"
	auth="Container"
	type="javax.sql.DataSource"
	username="USERNAME HERE" password="PASSWORD HERE"
	driverClassName="com.mysql.jdbc.Driver"
	url="jdbc:mysql://HOST:PORT/aliada"
	maxActive="2"
	maxIdle="2"/>

Data directory

CKAN Datahub Page Creation needs the following file, located in the resources folder:

  • dump_one_graph_nt.sql: ISQL commands file which contains the procedure to dump the triples of a graph in Virtuoso into a compressed file. This file name is specified in ckancreation_job_instances.isql_commands_file_graph_dump field.

CKAN Datahub Page Creation needs a working directory, indicated by the ckancreation_job_instances.tmp_dir DB field of the corresponding job, to store the organisation logo image temporarily; afterwards, the logo will be copied to the web page folder of the dataset, indicated by the ckancreation_job_instances.virtuoso_http_server_root DB field. It will also generate the dataset dump files and the VoID file describing the dataset in the folder indicated by the same field. Because of this, the user who is running the CKAN Datahub Page Creation module (tomcat in our case) must have all permissions on those two folders.

Deployment

The CKAN Datahub Page Creation module is a standard JEE 6 web module, so the deployment task is quite trivial; it effectively depends on the target Servlet Engine / Application Server. First, download the war from here (link is not working because at the time of writing the first milestone hasn't been reached). On Apache Tomcat you can just drop the war file into the webapps folder. Once the server has started you should see the following messages:

 ...
 10:06:11,130 INFO  [ApplicationLifecycleListener] <CKANCreation-00001> : CKANCreation is starting...  

Step 6: Configure organization and datasets from admin area of UI of ALIADA tool

When the ALIADA tool is installed and running in the web application server, some data must be configured from the admin area of the UI of the ALIADA tool before starting to import records.

The URL of the tool will be http://localhost:8080/aliada-user-interface-2.0/. After logging in as admin/admin, access the "admin" area. Here, the organization can be configured (name, logo, catalog URL, etc.) and the dataset-subsets can be created.

To create a dataset, the following values must be provided, which correspond to the following aliada.dataset table fields:

  • dataset_desc: dataset name/short description.
  • domain_name: dataset domain name, e.g. data.artium.org. This value must be configured by the organization's systems administrator so that the domain name exists and corresponds with the port number specified in the dataset.listening_host field.
  • uri_id_part: used to generate Identifier URIs, e.g. "id", URI: http://data.szepmuveszeti.hu/id/museumcollection/E18_Physical_Thing/szepmuveszeti.hu_object_29
  • uri_doc_part: used to generate Document URIs, e.g. "doc", URI: http://data.szepmuveszeti.hu/doc/museumcollection/E18_Physical_Thing/szepmuveszeti.hu_object_29 . It can be NULL only if dataset.uri_concept_part is not NULL.
  • uri_def_part: used to generate the Ontology URIs, e.g. "def", URI: http://data.szepmuveszeti.hu/def/museumcollection
  • uri_concept_part: used in all URI types as a prefix to give a description of the dataset in the URI, e.g. "data", URI: http://data.szepmuveszeti.hu/id/data/museumcollection/E18_Physical_Thing/szepmuveszeti.hu_object_29 . It can be NULL only if dataset.uri_doc_part is not NULL.
  • uri_set_part: used to generate the subset URIs, e.g. "set", URI: http://data.artium.org/set/library/bib
  • listening_host: the address of the network interface the Virtuoso HTTP server uses to listen and accept connections. It will have the form ":port_number". A new dataset must never use the default port of Virtuoso indicated in the virtuoso.ini file (8890 by default), because the default page of Virtuoso would be overridden. Possible values: 8891, 8892, 8893, 8894 ...
  • virtual_host: the virtual host name that the browser presents in the Host: entry of the request headers, i.e. name-based virtual hosting. It will have the same value as dataset.domain_name.
  • sparql_endpoint_uri: SPARQL endpoint URI. The value of this field will be: http:// + dataset.virtual_host + “/sparql-auth”.
  • sparql_endpoint_login: SPARQL endpoint user name. The value of this field will be the "SPARQL-AUTH USERNAME" specified in Step 1: Install and configure Open Link Virtuoso.
  • sparql_endpoint_password: SPARQL endpoint password. The value of this field will be the "SPARQL-AUTH PASSWORD" specified in Step 1: Install and configure Open Link Virtuoso.
  • public_sparql_endpoint_uri: public SPARQL endpoint URI. The value of this field will be: http:// + dataset.virtual_host + “/sparql”.
  • ckan_dataset_name: dataset name in CKAN datahub. It cannot contain upper-case characters or spaces. E.g.: datos-artium-org
  • dataset_long_desc: dataset long description to be used in CKAN datahub.
  • dataset_source_url: URL of the data source from where the dataset has been generated.
  • license_ckan_id: CKAN license identifier of the dataset to be published in CKAN datahub. E.g.: cc-zero. See http://opendefinition.org/licenses/ and http://datahub.io/api/action/license_list for possible values of the licences in CKAN Datahub.
  • license_url: license URL of the dataset to be published in CKAN datahub. E.g.: http://creativecommons.org/publicdomain/zero/1.0/
  • isql_commands_file_dataset: full path of the ISQL commands file to execute for the dataset. If it is null or it does not exist, the organisation.isql_commands_file_dataset_default field will be used.
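The derived SPARQL endpoint fields described above can be sketched as a simple composition from dataset.virtual_host (the value below is the example from these fields):

```shell
# Sketch of how sparql_endpoint_uri and public_sparql_endpoint_uri are
# composed from dataset.virtual_host (example value).
virtual_host="data.artium.org"
sparql_endpoint_uri="http://${virtual_host}/sparql-auth"
public_sparql_endpoint_uri="http://${virtual_host}/sparql"
printf '%s\n%s\n' "$sparql_endpoint_uri" "$public_sparql_endpoint_uri" > /tmp/endpoints.txt
cat /tmp/endpoints.txt
```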

When a dataset is created, the UI executes the ISQL commands file specified in the organisation.isql_commands_file_dataset_creation DB field, whose value is the path of the aliada_new_dataset.sql file located in the resources folder of the UI module. This file is executed in the following way:

${organisation.isql_command_path} ${organisation.store_ip}:${organisation.store_sql_port} ${organisation.sql_login} ${organisation.sql_password} ${organisation.isql_commands_file_dataset_creation} -i -u lhost=${dataset.listening_host} vhost=${dataset.virtual_host}

E.g.:

/home/virtuoso/bin/isql localhost:1111 dba dba /home/aliada/bin/aliada_new_dataset.sql -i -u lhost=:8892 vhost=data.artium.org

To create a subset, the following values must be provided, which correspond to the following aliada.subset table fields:

  • subset_desc: subset name/short description.
  • uri_concept_part: used in all URI types as a prefix to give a description of the subset in the URI, e.g. "museumcollection", URI: http://data.szepmuveszeti.hu/id/data/museumcollection/E18_Physical_Thing/szepmuveszeti.hu_object_29 . It can be NULL.
  • graph_uri: URI of the graph in Virtuoso where the generated RDF triples are saved. The subset.links_graph_uri field will be updated automatically with the following value: subset.graph_uri + “/links”.
  • isql_commands_file_subset: full path of the ISQL commands file to execute for the subset. If it is null or it does not exist, the organisation.isql_commands_file_subset_default field will be used.

When a subset is created, the UI creates the following two graphs in Virtuoso:

  • a graph where the generated RDF triples are saved, whose name is the value of the field subset.graph_uri.
  • a graph for saving the discovered links, whose name is subset.graph_uri + “/links”.

Download links