Spanish Entity Extraction Service

Named Entity Recognition for Spanish text. The service uses a dictionary of phrases to be treated as entities; these phrases were extracted from DBpedia. End users can add their own.
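To illustrate what dictionary-based detection can look like, here is a minimal sketch. It is not the service's actual algorithm: the function name find_entities, the max_words limit and the longest-match strategy are all assumptions made for illustration.

```python
def find_entities(text, dictionary, max_words=4):
    """Return dictionary phrases found in text, preferring the longest match.

    Hypothetical helper: whitespace tokenization and greedy longest-match
    are illustrative choices, not taken from the service's code.
    """
    words = text.lower().split()
    found = []
    i = 0
    while i < len(words):
        match_len = 0
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_words, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if candidate in dictionary:
                found.append(candidate)
                match_len = n
                break
        # Skip past the matched phrase, or advance one word if nothing matched.
        i += match_len or 1
    return found


dictionary = {"cristiano ronaldo", "real madrid"}
print(find_entities("Cristiano Ronaldo juega en el Real Madrid", dictionary))
# prints ['cristiano ronaldo', 'real madrid']
```

Storing the phrases in a set keeps each lookup constant-time, so the cost grows with the text length and max_words rather than with the dictionary size.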

A Docker image of this module, as it is here, is available at


Python 3 is necessary to run the service. Required libraries are:

  • tornado

Starting and stopping the service

There is a script that starts and stops the service with the desired configuration:

./ start
./ stop

The same script also holds the configuration for running the service as a given number of separate processes over a defined port range.

Configuring the service

The service needs a set of entities with inlinks in the data/pagelinks_all.tsv file. Each line of that file is an entry consisting of a phrase to be detected and an inlink count, which is the number of articles pointing to that entity.

Additionally, the inlinks threshold must be set in the file as INLINKS_THRESHOLD. The default value is 400. Entities with an inlink count below the threshold are ignored.
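The filtering described above could be sketched as follows. This is an illustration under assumptions, not the service's own loader: it assumes tab-separated lines with the phrase in the first column and the inlink count in the second, and load_entities is a hypothetical name.

```python
import io

INLINKS_THRESHOLD = 400  # default value stated in this README


def load_entities(lines, threshold=INLINKS_THRESHOLD):
    """Map each phrase to its inlink count, dropping entries below threshold.

    Assumes "phrase<TAB>count" per line; the column order is an assumption.
    """
    entities = {}
    for line in lines:
        parts = line.rstrip("\r\n").split("\t")
        if len(parts) < 2 or not parts[1].strip().isdigit():
            continue  # skip blank or malformed lines
        phrase, inlinks = parts[0].strip().lower(), int(parts[1])
        if inlinks >= threshold:
            entities[phrase] = inlinks
    return entities


sample = io.StringIO("cristiano ronaldo\t1520\nalgun pueblo\t12\n")
print(load_entities(sample))
# prints {'cristiano ronaldo': 1520}
```

With a real file, the same function can be called on an open file object, for example open("data/pagelinks_all.tsv", encoding="utf-8").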

Calling the service

This service admits both GET and POST requests.

Calling the service via GET

An example call would be:


The inlinks_threshold can also be set in the query.
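As a rough sketch of how such a query string might be assembled from Python: only inlinks_threshold comes from this README. The base URL, the port and the "text" parameter name below are hypothetical placeholders and should be replaced with the values your deployment actually uses.

```python
from urllib.parse import urlencode


def build_get_url(base_url, text, inlinks_threshold=None):
    """Build a GET URL for the service.

    "text" is a hypothetical parameter name; inlinks_threshold is the
    optional per-request override mentioned in this README.
    """
    params = {"text": text}
    if inlinks_threshold is not None:
        params["inlinks_threshold"] = inlinks_threshold
    # urlencode percent-encodes spaces and non-ASCII characters (UTF-8).
    return base_url + "?" + urlencode(params)


# Hypothetical host and port, for illustration only.
print(build_get_url("http://localhost:8000/", "Cristiano Ronaldo", 100))
# prints http://localhost:8000/?text=Cristiano+Ronaldo&inlinks_threshold=100
```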


This call returns a JSON object with the elapsed time, the detected entities and some additional information.

The response is a JSON map, e.g.:

  concepts: [
    "cristiano ronaldo",

Calling the service via POST

The POST body is expected to contain one text per line.

The response is a JSON object with one item per line of the request body.

{
  "response": [
    {
      "text": "alguien como Einstein\r",
      "concepts": []
    },
    {
      "text": "nada por aquí\r",
      "concepts": []
    },
    {
      "text": "otra frase",
      "concepts": []
    }
  ]
}
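A client for the POST interface might prepare the body and read the reply roughly as sketched below. The helper names are hypothetical, and the response shape (a "response" list of items with "text" and "concepts") is inferred from the sample above rather than from a specification. The trailing "\r" in the sample texts suggests the request was sent with CRLF line endings, so this sketch joins lines with "\n" and strips any "\r" on the way back.

```python
import json


def build_post_body(texts):
    """Join texts into a one-text-per-line body, encoded as UTF-8 bytes."""
    # Embedded newlines would split one text into several entries.
    return "\n".join(t.replace("\n", " ") for t in texts).encode("utf-8")


def concepts_by_text(response_body):
    """Map each returned text to its list of detected concepts."""
    payload = json.loads(response_body)
    return {
        item["text"].rstrip("\r"): item["concepts"]
        for item in payload["response"]
    }


body = build_post_body(["alguien como Einstein", "otra frase"])
print(body)
# prints b'alguien como Einstein\notra frase'

reply = '{"response": [{"text": "otra frase\\r", "concepts": []}]}'
print(concepts_by_text(reply))
# prints {'otra frase': []}
```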

Creating a Docker Image

To create a Docker image, configure your own data files (pagelinks_all.tsv and stopwords.txt) and set INLINKS_THRESHOLD in Then execute:

docker build -t {name} .


This module was developed by Paradigma Digital. This development has been partially funded by the European Union through the MixedEmotions Project (project number H2020 655632), as part of the RIA ICT 15 Big data and Open Data Innovation and take-up programme.



