Python package to evaluate the quality of semantic engines
Python
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
semaeval
.gitignore
LICENSE
MANIFEST.in
README.md
config_template.yml
setup.py

README.md

About

Semaeval is a python package to evaluate the quality of semantic engines. We support evaluation of the following engines:

Semaeval also offers tools to get example texts from the following sources:

Installation

Semaeval uses Python 2.7 and needs numpy and matplotlib installed. To install these dependencies you can do the following

  • Ubuntu/Debian:

      apt-get update
      apt-get install g++ git
      apt-get install python-dev python-pip python-numpy python-matplotlib
    
  • Fedora/CentOS:

      yum update
      yum install gcc-c++ git
      yum install python-devel numpy python-matplotlib
    
  • Mac OS X with Homebrew:

      brew update
      brew install git
      brew install python
      brew tap homebrew/python
      brew install numpy
      brew install matplotlib
    
  • Inside a virtual environment with Python 2.7 and pip/setuptools installed:

      pip install numpy matplotlib
    

Then clone the git repository and execute the following command in the repository folder:

python setup.py install

This will install the semaeval package and all the other necessary dependencies into your python environment.

To use the package you have to create a file config.yml holding the configuration for all the semantic engines you want to use. To do this copy the template file config_template.yml and edit it with an editor of your choice:

cp config_template.yml config.yml
nano config.yml

In the config file you will find configuration options for all the supported engines:

folder_input: "input"
folder_output: "output"
# categories can be one of these:
# PERSON, GEO, ORG, PRODUCT, EVENT, KEYWORD, FUNCTION, NUMBER, DATE, CURRENCY, URL, QUOTE
categories:
  - PERSON
  - GEO
  - ORG
engines:
  simple:
    labels:
      "GPE": "GEO"
      "LOCATION": "GEO"
      "ORGANIZATION": "ORG"
      "PERSON": "PERSON"
      "FACILITY": "KEYWORD"
      "GSP": "ORG"
#  alchemy:
#    key:
#    # see http://www.alchemyapi.com/api/entity/types
#    labels:
#      "City": "GEO"
#      "Facility": "GEO"
#      "StateOrCounty": "GEO"
#      "Country": "GEO"
#      "Region": "GEO"
#      "Continent": "GEO"
#      "GeographicFeature": "GEO"
#      "Person": "PERSON"
#      "Company": "ORG"
#      "Organization": "ORG"
#      "PrintMedia": "ORG"
#      "JobTitle": "FUNCTION"
#      "Quantity": "NUMBER"
#      "SportingEvent": "EVENT"
#      "Drug": "KEYWORD"
#      "HealthCondition": "KEYWORD"
#      "FieldTerminology": "KEYWORD"
#      "Sport": "KEYWORD"
#      "Technology": "KEYWORD"
#      "EntertainmentAward": "KEYWORD"
#      "Holiday": "EVENT"
#      "TelevisionStation": "ORG"
#      "Crime": "KEYWORD"

To activate e.g. alchemy uncomment the alchemy section in the configuration file and enter your alchemy key like

folder_input: "input"
folder_output: "output"
# categories can be one of these:
# PERSON, GEO, ORG, PRODUCT, EVENT, KEYWORD, FUNCTION, NUMBER, DATE, CURRENCY, URL, QUOTE
categories:
  - PERSON
  - GEO
  - ORG
engines:
  simple:
    labels:
      "GPE": "GEO"
      "LOCATION": "GEO"
      "ORGANIZATION": "ORG"
      "PERSON": "PERSON"
      "FACILITY": "KEYWORD"
      "GSP": "ORG"
  alchemy:
    key: 1234567890
    # see http://www.alchemyapi.com/api/entity/types
    labels:
      "City": "GEO"
      "Facility": "GEO"
      "StateOrCounty": "GEO"
      "Country": "GEO"
      "Region": "GEO"
      "Continent": "GEO"
      "GeographicFeature": "GEO"
      "Person": "PERSON"
      "Company": "ORG"
      "Organization": "ORG"
      "PrintMedia": "ORG"
      "JobTitle": "FUNCTION"
      "Quantity": "NUMBER"
      "SportingEvent": "EVENT"
      "Drug": "KEYWORD"
      "HealthCondition": "KEYWORD"
      "FieldTerminology": "KEYWORD"
      "Sport": "KEYWORD"
      "Technology": "KEYWORD"
      "EntertainmentAward": "KEYWORD"
      "Holiday": "EVENT"
      "TelevisionStation": "ORG"
      "Crime": "KEYWORD"

If the content of your configuration file is like above, you have activated two engines for evaluation: simple (which simply uses Python NLTK, this engines serves mainly as a baseline comparison) and alchemy. For both engines label conversion is also configured. This is necessary to allow comparison of the different results of each engine.

Getting started

You can use the semaeval package as follows:

>>> import semaeval.source.welt as welt
>>> import semaeval.evaluate as eval
>>> import semaeval.statistics as stats
>>> articles = welt.articles_from_feed()     # get the latest 20 articles from Welt RSS feed
Wed, 08 Apr 2015 16:59:23 +0200
http://www.welt.de/article139215840.html
Wed, 08 Apr 2015 14:34:17 +0200
http://www.welt.de/article139250110.html
...
>>> articles_enriched = eval.detect_entities(articles,"de")  # this takes quite a while (depending on the number of configured engines and the number of articles)
URL: http://www.welt.de/article139215840.html
Collecting results:
semaeval.engine.bitext.bitext
semaeval.engine.meaningcloud.meaningcloud
semaeval.engine.simple.simple
semaeval.engine.linguasys.linguasys
semaeval.engine.basistech.basistech
semaeval.engine.semant.semant
semaeval.engine.alchemy.alchemy
semaeval.engine.txtrazor.txtrazor
semaeval.engine.retresco.retresco
...
>>> data = stats.collect_data(articles_enriched)    # Compute precision and recall for each category for each article and each engine
>>> results = stats.aggregate_result(data)          # computing mean values for each category (PERSON, GEO, ORG, ...)
>>> results.extend(stats.compute_total(results))    # compute an overall total average over all categories and add this TOTAL to the results
>>> stats.plot_results(results)                     # show a plot of the results

Contact

Andreas Maier (andreas.maier@asideas.de).