Service for entity extraction
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
data
.gitignore
Dockerfile
LICENSE
README.md
conf.py
entity_service.py
inlinks.py
launcher.sh
ner.py
requirements.txt

README.md

Spanish Entity Extraction Service

Named Entity Recognition. Uses a dictionary for phrases to be considered entities. Those phrases have been extracted from DBpedia. A final user can add its own.

A docker image with this modules as it is here is available at https://hub.docker.com/r/mixedemotions/08_entity_extraction_pt/

Installation

Python 3 is necessary to run the service. Required libraries are:

  • tornado

Starting and stopping the service

There is a script that starts and stops the service with the desired configuration:

./launcher.sh start
./launcher.sh stop

This same script contains the configuration for running the service with a given number of separated processes, using the port range as defined.

Configuring the service

The service needs a set of entities with inlinks in the data/pagelinlks_all.tsv file. In that file, each line is an entry consisting on a phrase to be detected and an inlink number, which is the number of articles pointing to that entity.

Additionaly, the inlinks threshold must be set in the conf.py file as INLINKS_THRESHOLD. The default value is 400. Entities with a inlinks count below inlinks threshold will be ignored.

Calling the service

This service admits both GET and POST requests.

Calling the service via GET

An example call would be:

http://[ip_address]:[port]/?text=EEUU%20cierra%20la%20investigación%20sobre%20las%20torturas%20de%20la%20CIA%20sin%20acusados

The inlinks_threshold can also be set in the query.

http://[ip_address]:[port]/?inlinks_threshold=100&text=EEUU%20cierra%20la%20investigación%20sobre%20las%20torturas%20de%20la%20CIA%20sin%20acusados

This call returns a JSON object with the elapsed time, the detected entities and some additional information.

The response is a json map. E.g.:

{
  concepts: [
    "cristiano ronaldo",
    "messi"
  ]
}

Calling the service via POST

The POST expects a text per line.

The response is a json with an item per entry in the body.

{
  "response": [
    {
      "text": "alguien como Einstein\r",
      "concepts": []
    },
    {
      "text": "nada por aquí\r",
      "concepts": []
    },
    {
      "text": "otra frase",
      "concepts": []
    }
  ]
}

Creating a Docker Image

For creating a docker image, just configure your own data files (pagelinks_all.tsv and stopwords.txt) and set up an INLINKS_THRESHOLD in conf.py. Then execute:

docker build -t {name} .

Acknowledgement

This module was developed by Paradigma Digital. This development has been partially funded by the European Union through the MixedEmotions Project (project number H2020 655632), as part of the RIA ICT 15 Big data and Open Data Innovation and take-up programme.

MixedEmotions

EU

http://ec.europa.eu/research/participants/portal/desktop/en/opportunities/index.html