[Document/NLP preprocessing] Part-of-Speech tagger for Spanish medical-domain corpora, based on FreeLing.


SPACCC_POS-TAGGER: Spanish Clinical Case Corpus Part-of-Speech Tagger

Digital Object Identifier (DOI)

https://doi.org/10.5281/zenodo.2542812

Introduction

This repository contains a Part-of-Speech (POS) tagger for Spanish medical-domain text, based on FreeLing 3.1. It also includes a Python wrapper that makes the tool easier to use.

Demo

Here you can find a demonstration of the Part-of-Speech Tagger: http://temu.bsc.es/pos/

Prerequisites

To use the SPACCC_POS-TAGGER, the following resources are required:

Directory structure

  • compila_freeling.sh: builds the Docker image of the adapted FreeLing 3.1
  • config.cfg: FreeLing configuration file
  • Dockerfile: Dockerfile used to build the image
  • llamada_freeling.sh: script that runs the adapted FreeLing on a text
  • README.md: this file
  • singlewords.dat: file with the normalized resources (words, acronyms, and abbreviations)
  • splitter.dat: sentence segmentation rules
  • tokenizer.dat: tokenization rules
  • usermap.dat: rules for POS assignment (regular expressions)
  • Med_Tagger: folder containing the Python wrapper for this tool

Usage

To build the Docker image of the adapted FreeLing 3.1, run the following command from this directory:

$> bash compila_freeling.sh

This produces the Docker image med-tagger:1.0.0.
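The image only needs to be built once. A minimal sketch of a helper that skips the build when the image is already present follows; `image_exists` and `ensure_image` are hypothetical names, and the sketch assumes Docker is installed and that it is run from this directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): build med-tagger:1.0.0
# only when it is missing. The image name is taken from this README.
IMAGE="med-tagger:1.0.0"

image_exists() {
  # `docker image inspect` exits non-zero if the image is unknown
  # (or if Docker itself is not available); all output is discarded.
  docker image inspect "${1:-$IMAGE}" >/dev/null 2>&1
}

ensure_image() {
  if image_exists "$IMAGE"; then
    echo "Image $IMAGE is already built"
  else
    # Assumes compila_freeling.sh is in the current directory.
    bash compila_freeling.sh
  fi
}
```

Calling `ensure_image` before tagging avoids rebuilding the image on every run.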

Examples

To analyse a text, pipe it into llamada_freeling.sh:

$> echo 'Este es un texto de prueba.' | bash llamada_freeling.sh
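To tag many documents, the same command can be applied to each file in a directory. This sketch assumes, as the example suggests, that llamada_freeling.sh reads plain text on standard input and writes its analysis to standard output; `tag_dir`, the `TAGGER_CMD` override and the `.pos` output extension are hypothetical choices, not part of this repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run the tagger on every .txt file in a directory.
# Assumes llamada_freeling.sh reads text on stdin and writes to stdout.
# TAGGER_CMD can override the command (e.g. to test without Docker).
tag_dir() {
  local in_dir="$1" out_dir="$2"
  local cmd="${TAGGER_CMD:-bash llamada_freeling.sh}"
  mkdir -p "$out_dir"
  local f
  for f in "$in_dir"/*.txt; do
    # If there are no .txt files the glob stays literal; skip it.
    [ -e "$f" ] || continue
    # One output file per input: notes.txt -> notes.pos
    bash -c "$cmd" < "$f" > "$out_dir/$(basename "$f" .txt).pos"
  done
}
```

For example, `tag_dir textos salida` would write one `.pos` file per input text. Setting `TAGGER_CMD` to another command (for instance `cat`) makes it possible to check the loop without running Docker.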

Performance

Task (tagger vs. gold standard)   Accuracy
Sentence splitting                98.85%
Tokenization                      99.47%
Part-of-Speech tagging            99.87%
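The README does not state exactly how these accuracies were computed. As an illustration only, token-level accuracy is the share of tokens whose predicted tag equals the gold-standard tag. The hypothetical `accuracy` function here assumes two aligned files with one tag per line; lines present in only one file count as mismatches.

```shell
#!/usr/bin/env bash
# Hypothetical helper: token-level accuracy = matching tags / total tokens.
# Assumes $1 (gold standard) and $2 (tagger output) are aligned,
# one tag per line.
accuracy() {
  paste "$1" "$2" | awk -F '\t' '
    { total++; if ($1 == $2) correct++ }
    END {
      if (total == 0) { print "no tokens"; exit 1 }
      printf "%.2f%%\n", 100 * correct / total
    }'
}
```

For instance, if 3 of 4 tags agree, `accuracy gold.tags pred.tags` prints `75.00%`.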

Python wrapper

See the Med_Tagger folder in this directory.

Contact

Felipe Soares (felipe.soares@bsc.es)

License

FreeLing is licensed under the GPL (https://www.gnu.org/licenses/gpl-3.0.en.html).

The modules and linguistic data developed in this project are licensed under the MIT License (https://opensource.org/licenses/MIT).