ConedaKOR
ConedaKOR is a web-based application that allows you to store arbitrary documents and interconnect them with relationships. You can build huge semantic networks for an unlimited number of domains. It integrates a sophisticated ontology management tool with an easy-to-use media database.
Table of Contents
- Table of Contents
- Features
- User documentation
- Changelog
- License
- Requirements
- Installation
- Configuration
- Customization
- Logging
- Backups
- Authentication
- Interfaces
- Generating a virtual appliance
- Generating docker images
- Command line tool
- Development
Features
- Instead of filling countless lists with your metadata, shape it as entities within a graph ... never repeat yourself!
- Add relationships between your entities
- A carefully designed user interface
- Upload any kind of media (pictures, video, spreadsheets, …), also many at a time
- Images and videos are automatically converted for playback on the web
- Define which kinds of entities can be related by which relations
- Put your entities in one or many groups and share them with other users
- A fine-grained permission system with user groups and entity collections
- Easy extension of the schema for every kind of entity: Add fields for all entities of a specific kind or occasionally add data to arbitrary entities
- Tagging with autocomplete and sensible permissions
- Full text search through all your metadata
- A rich API facilitating additional frontends and data harvesting
- Excel import and export
- Deliver one-click zip downloads to your users
- Identify isolated entities
- Merge entities to further normalize your data
- External authentication (for example LDAP) by simple shell scripts
- Easy identifier management and resolution
- Many configurable aspects (welcome page, terms of use, help, primary relations, brand, …)
- Access data via an OAI-PMH interface
- Vagrant dev environment
- Good unit and integration test coverage
- Checked for security problems with Brakeman
- Support for using Erlangen CRM and similar standards as a basis for your ontology (including a convenient OWL import tool)
User documentation
Please check out our user guide (German, for version 3)
Changelog
We keep it updated at CHANGELOG.md
License
See file COPYING
Requirements
- ruby 2.4.4 (it should also run with slightly older or more recent versions but we only tested with 2.4.4)
- mysql server (>= 5.5)
- elasticsearch (>= 5.0.0, < 6.0.0)
- web server (optional but highly recommended)
- neo4j (optional)
Installation
Please refer to INSTALL.md
Configuration
Server configuration such as the database connection and the data directory is set via environment variables. For example, to set the database connection string, you may put the following in your Apache VirtualHost:
SetEnv "DATABASE_URL" "mysql2://<user>:<password>@..."
If you prefer file-based configuration, you may also create a file .env in the kor application directory (where config.ru is located) and add the settings there like in a bash script, for example:
export DATABASE_URL="mysql2://<user>:<password>@..."
The export keyword is optional, but it allows you to source this file with bash, should the need arise.
For a list of configuration options, please refer to .env.example (of the ConedaKOR version deployed).
Customization
All customization can be done via the web interface. As an admin, follow the "Settings" navigation link.
Logging
Log messages are sent to log/production.log. The log level can be configured in config/environments/production.rb.
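To follow the log of a running production installation, you can, for example, tail it from the application directory:
tail -f log/production.log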
Backups
Backups consist of
- a database dump file
- the configuration file .env (if you use one, see above)
- a copy of the data directory
- a reference to the version of ConedaKOR
All other parts of the installation directory are either considered source code or temporary. Temporary files can be regenerated via a task. If you made modifications to the source code, you may have to back those up as well. We recommend modifying the source code only if those modifications are embedded within a development process that includes regular reconciliation with upstream.
For a consistent backup, you should aim to create the dump and file copies at the same point in time (or at least be confident that few changes happened in between). The dump file is created with mysqldump and the files can simply be copied with cp or rsync.
If you used the scripted install, it creates a "shared" directory and symlinks the data and the configuration from that directory to the current deployment's directory. In this case, it is sufficient to back up the shared folder and to create a database dump.
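As an illustration, here is a minimal backup sketch for a standard (non-scripted) install; the database name, credentials and all paths are placeholders and have to be adapted to your installation:
#!/bin/bash
# hypothetical backup sketch: adjust database name, credentials and paths
BACKUP_DIR=/var/backups/kor/$(date +%F)
mkdir -p "$BACKUP_DIR"
# database dump via mysqldump (will prompt for the password)
mysqldump -u kor -p kor_production > "$BACKUP_DIR/kor.sql"
# copy the data directory and the configuration file
rsync -a /path/to/kor/data/ "$BACKUP_DIR/data/"
cp /path/to/kor/.env "$BACKUP_DIR/"
# record the deployed ConedaKOR version
git -C /path/to/kor describe --tags > "$BACKUP_DIR/VERSION"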
Restore
To restore from a previous backup
- restore data (and potentially the config file) from the backup
- import the database dump and modify the configuration accordingly
- deploy the relevant version of ConedaKOR
- refresh the elasticsearch index (if you are using elasticsearch):
RAILS_ENV=production bundle exec bin/kor index-all
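Continuing the hedged backup sketch from above (paths, credentials and the backup date are placeholders):
# copy data and configuration back into place
rsync -a /var/backups/kor/2024-06-01/data/ /path/to/kor/data/
cp /var/backups/kor/2024-06-01/.env /path/to/kor/
# import the database dump (will prompt for the password)
mysql -u kor -p kor_production < /var/backups/kor/2024-06-01/kor.sql
# rebuild the elasticsearch index
cd /path/to/kor && RAILS_ENV=production bundle exec bin/kor index-all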
Authentication
On new installations, the default user is admin with password admin. This user has all permissions to grant permissions and to administer the installation. They are also a member of the Administrators credential, which allows them to see and edit all data.
Authentication is performed via a web form at http://kor.example.com#/login or by providing a valid api_key as a GET or POST parameter. You may also specify a header api-key containing the key (make sure not to use an underscore). Every user has a key which can be looked up by an administrator on the user's administration page.
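For example, with curl a request could be authenticated either via the header or via the parameter (the hostname and key are placeholders):
curl -H "api-key: <your api key>" "https://kor.example.com/entities.json"
curl "https://kor.example.com/entities.json?api_key=<your api key>"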
In order to be able to create user accounts, a user needs the User admin role.
Unauthenticated access
If you create a user with the username guest, every unauthenticated access will be treated as if the user guest had made it. This enables you to define exactly what permissions to grant in this case. If that user doesn't exist, the app will require authentication.
Permission inheritance
You may enter a parent username for every user. This way, the user will also be able to access all parts of the application their parent has access to.
External authentication
Additionally, one or more scripts may be written to carry out authentication with the credentials provided by the user from the login form. This allows flexible authentication via external sources such as PAM, LDAP or ActiveDirectory.
Internal users take precedence over users authenticating via a script.
The script can be written in the language of your choice. Username and password are passed to it through two environment variables, KOR_USERNAME_FILE and KOR_PASSWORD_FILE, which point to files from which the values can be read. The script is expected to terminate with exit code 0 if authentication was successful and 1 otherwise. In the positive case, a valid JSON hash has to be written to STDOUT. The hash must contain attributes to create/update the user record; only 'email' is required. A user record is created unless the username already exists.
To activate the script as an authenticator, configure it within your environment. Have a look at .env.example for examples and a description of the available options.
Example authenticator script simple_auth.sh:
#!/bin/bash
# read the credentials from the files indicated by ConedaKOR
KOR_USERNAME=$(cat "$KOR_USERNAME_FILE")
KOR_PASSWORD=$(cat "$KOR_PASSWORD_FILE")
if [ "$KOR_USERNAME" == "jdoe" ] && [ "$KOR_PASSWORD" == "mysecret" ]; then
  # success: print the user attributes as JSON and exit with 0
  echo "{\"email\": \"jdoe@example.com\"}"
else
  # failure: exit with a non-zero status
  echo "{}"
  exit 1
fi
The authentication system might need to create users when they authenticate via external sources. The above configuration would create new users and set their parent to simple_user or ldap_user, depending on which authentication source succeeded. This allows you to grant default permissions to new users.
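As a further illustration, the same interface could back an LDAP source. The following is only a sketch; the LDAP host, base DN and email domain are assumptions and have to be adapted:
#!/bin/bash
# hypothetical LDAP authenticator; host, base DN and email domain are assumptions
USERNAME=$(cat "$KOR_USERNAME_FILE")
PASSWORD=$(cat "$KOR_PASSWORD_FILE")
# try to bind as the user; a successful bind means the credentials are valid
if ldapsearch -x -H ldaps://ldap.example.com \
     -D "uid=${USERNAME},ou=people,dc=example,dc=com" \
     -w "${PASSWORD}" -b "" -s base > /dev/null 2>&1; then
  # print the attributes for the user record ('email' is the only required one)
  echo "{\"email\": \"${USERNAME}@example.com\"}"
else
  exit 1
fi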
Authentication via request env
It is also possible to configure variables to be used for authentication based on the request environment. A common use case is Apache authentication modules that set the REMOTE_USER environment variable. Have a look at .env.example for examples and a description of the available options.
This may be combined with script-based authentication sources. Authentication is only triggered on GET /env_auth, which falls back to the login form if the environment authentication was not successful.
If you configure any env auth sources, a button will appear above the login form to notify users of that possibility. If they choose to use it, they are redirected to /env_auth where the magic happens. The label for the button can be customized via the web interface.
Interfaces
OAI-PMH
ConedaKOR spawns four OAI-PMH endpoints for entities, kinds, relations and relationships:
- http://kor.example.com/oai-pmh/entities?verb=Identify
- http://kor.example.com/oai-pmh/kinds?verb=Identify
- http://kor.example.com/oai-pmh/relations?verb=Identify
- http://kor.example.com/oai-pmh/relationships?verb=Identify
Please refer to the OAI-PMH specification for further information on available verbs and on how to use them.
The API retrieves entities and relationships according to the authenticated user's permissions. Kinds and relations are available without authentication. Please check out Authentication for how to use an API key.
Two formats are available: oai_dc and kor. While the former is only maintained to fulfill the OAI-PMH specification, the latter gives full access to all content within the ConedaKOR installation. According to the specification, you must choose the format via the request parameter metadataPrefix=kor. The kor format adheres to a schema that is included in ConedaKOR. It can be found at https://kor.example.com/schema/1.0/kor.xsd as part of every installation (version 2.0.0 and above). We will add new versions, should the need arise.
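For example, harvesting all entities in the kor format could look like this (the hostname and API key are placeholders; ListRecords and metadataPrefix are standard OAI-PMH):
curl -H "api-key: <your api key>" "https://kor.example.com/oai-pmh/entities?verb=ListRecords&metadataPrefix=kor"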
REST (JSON)
This API is undergoing a lot of change, which is why we are not showing all of the possible requests here. Instead, we'll just list the ones that we hope will not change anymore in the foreseeable future.
The request paths all end with .json to hint at the desired JSON content type for the response. The request body also has to be in JSON format, and the request's content-type header has to be application/json.
Have a look at Authentication to see how you can provide authentication credentials.
In general, there are three types of responses:
Requests that retrieve a single record will always be answered with a simple object containing just that record, e.g.
GET /kinds/1.json
{
"id": 123,
"name": "person",
"plural_name": "people",
...
}
Requests that retrieve a series of records (resultsets) will always be answered with the objects themselves, but also with the total number of records, the current page and the number of records per page. Sometimes full records are not returned but only their ids, in which case the records array will be empty and there will be an ids array instead, e.g.
GET /kinds.json
{
"records": [...],
"total": 120,
"per_page": 10,
"page": 7
}
Requests that modify a record will always be answered with the modified record as well as a message indicating the modification applied. The response code will also reflect a successful change (200) or incorrect new data (422). This applies to create (POST), update (PATCH) and destroy (DELETE) requests, e.g.
POST /kinds.json
with JSON {"kind": {"name": "person", "plural_name": "people"}}
{
"message": "the kind 'person' has been created",
"record": {
"id": 123,
"name": "person",
"plural_name": "people",
...
}
}
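Using curl, such a create request could look like this (the hostname and API key are placeholders):
curl -X POST "https://kor.example.com/kinds.json" \
  -H "Content-Type: application/json" \
  -H "api-key: <your api key>" \
  -d '{"kind": {"name": "person", "plural_name": "people"}}'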
- GET /kinds.json: returns an array of all kinds
- GET /kinds/1.json: returns the kind with id 1
- GET /relations.json: returns an array of all relations
- GET /relations/1.json: returns the relation with id 1
- GET /entities.json: searches for entities, returns only viewable content, returns a resultset of entities
  - terms: searches for entities with matching name or synonyms (uses the elasticsearch index)
  - relation_name: limits to entities that can be used as targets for the given relation name
  - kind_id: limits to entities that are of the given kind
  - include_media: whether to include media entities (default: false)
  - include: a list of aspects to include within each entity, comma separated; choose one or more of technical, synonyms, datings, dataset, properties, relations, media_relations, related, kind, collection, user_groups, groups, degree, users, fields, generators and all
  - page: requests a specific page from the resultset (default: 1)
  - per_page: sets the page size (default: 10, max: 500)
  - related_kind_id: sets a filter for kind ids on related entities
  - related_relation_name: sets a filter for relation names on related entities
  - related_per_page: sets the page size for related entities (default: 1, max: 4)
- GET /entities/1.json: returns the entity with id 1, requires view permissions for that entity
  - include: see parameters for /entities.json
  - related_kind_id: see parameters for /entities.json
  - related_relation_name: see parameters for /entities.json
  - related_per_page: see parameters for /entities.json (different max of 500)
- GET /relationships.json: returns the relationships, returns only viewable content, returns a resultset of directed relationships
  - from_entity_id: limits by the source entity, comma-separated
  - to_entity_id or entity_id: limits by the target entity, comma-separated
  - relation_name: limits by relation name, comma-separated
  - from_kind_id: limits by the source's kind, comma-separated
  - to_kind_id: limits by the target's kind, comma-separated
  - page: requests a specific page from the resultset (default: 1)
  - per_page: sets the page size (default: 10, max: 500)
- POST /wikidata/import: imports a wikidata item. This also creates potential relationships towards previously imported items
  - id: the wikidata id of the item to import, e.g. Q762
  - kind: the kind of the new entity, e.g. person
  - collection: the collection in which to create the new entity (the 'create' right is required within that collection)
  - locale: the locale to import from wikidata, defaults to en
- POST /wikidata/preflight: simulates the previous method without actually changing any data
Be aware that, if you are requesting related entities to be embedded within other entities, those are embedded as a list of directed relationships which in turn contain the entity itself.
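For example, a search request that embeds the kind and the related entities of each match could look like this (hostname, API key and search term are placeholders):
curl -H "api-key: <your api key>" \
  "https://kor.example.com/entities.json?terms=leonardo&include=kind,related&per_page=20"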
Generating a virtual appliance
Versions after and including 1.9.2 can be packaged into a VirtualBox appliance automatically. The version is specified as a shell parameter:
./deploy/build.sh v1.9.2
The ova file and a checksum are generated within deploy/build/. Instead of v1.9.2 you may choose any tag or branch available in the repository, although very old versions might not work because of unsatisfiable dependencies. Make sure you have pulled the most recent commits when using branches!
Generating docker images
deploy/dockerize.sh can be used to build docker images for commits >= v2.0.0.
Because each environment requires a different set of tools, the environment has to be selected when building the image. For example:
./deploy/dockerize.sh master production
will build a production image based on the master branch. You may also base images on tags or commits.
Command line tool
The kor command provides access to functionality which is not easily provided from a web page. For example, the Excel export potentially generates many large files which are impractical to download. You may call the command like this:
bundle exec bin/kor --help
from within the ConedaKOR installation directory to obtain a detailed description of all the tasks and options.
Excel import and export
Please refer to the command line tool for available command line options. In principle, the export produces several spreadsheets containing all entities. Those sheets may be modified and imported later on.
- identification columns (id and uuid) are not imported: they are only used to identify existing records on imports. Leave them empty when adding new data.
- when creating new records, you have to fill in at least the columns for kind_id, collection_id and name (or no_name_statement). For the serialized columns, please use their "natural" empty value if you don't use them: {} for dataset and properties, [] for synonyms and datings.
- the deleted column is not imported: enter any non-empty value in order to delete the entity on import.
- timestamps are not imported: they will be changed if the entity is changed by the import.
Importing Erlangen CRM classes
The task imports all classes from http://erlangen-crm.org/ontology/ecrm/ecrm_current.owl (documented at http://erlangen-crm.org/docs/ecrm/current/index.html) into the installation as entity types. The types are set up according to their hierarchy and marked as "abstract", which prevents them from showing up in the interface.
Rebuilding elastic index
Sometimes the elasticsearch index has to be rebuilt from scratch. This is done like so:
bundle exec bin/kor index-all
Development
The easiest way to get started hacking on kor is to use the included vagrant test environment. Make sure vagrant and VirtualBox are installed:
- install VirtualBox (https://www.virtualbox.org/)
- install vagrant (https://www.vagrantup.com/)
Also, install the guest additions plugin:
vagrant plugin install vagrant-vbguest
Then bring up the vagrant VM:
vagrant up dev
SSH into the resulting VM and start the KOR development server:
vagrant ssh
cd /vagrant
...
bundle exec rails s -b 0.0.0.0
This uses the code from the current working directory on your dev machine. Go to http://localhost:3000 with your browser to see the development page. As with all new installations of ConedaKOR, you can log in with user admin and password admin.
Running the test suites
There are two test suites: rspec unit tests and cucumber integration tests. Change to the /vagrant directory within the dev VM first and then run the unit tests:
bundle exec rspec spec/
or the integration tests:
bundle exec cucumber features/
Be aware that this will spawn a real browser to conduct the tests. If you prefer headless testing, you may use phantomjs by setting an environment variable:
HEADLESS=true bundle exec cucumber features/
Some tests have external dependencies and are therefore unreliable. We excluded them from the default test suite. To force running them, set an environment variable like this:
KOR_BRITTLE=true bundle exec rspec spec/
Coverage reports
You may run rspec or cucumber tests with the COVERAGE environment variable set, which will generate a coverage report to ./coverage. For example:
COVERAGE=true HEADLESS=true bundle exec cucumber features/
Profiling
ConedaKOR will generate a detailed per-action profile when the environment variable PROFILE is set, for example in development:
PROFILE=true bundle exec rails s
The reports will be generated in ./tmp/profiles.
Showing media in development
In the development environment, images are not shown. Instead, an icon representing the medium's content_type is displayed. If you'd like to see the actual images nevertheless, use
SHOW_MEDIA=true bundle exec rails s