This repository has been archived by the owner on Oct 20, 2018. It is now read-only.

Glossary

sandroacoelho edited this page Jul 26, 2013 · 3 revisions

C

  • Context: the context refers to "the parts of something written or spoken that immediately precede and follow a word or passage and clarify its meaning" (source).

O

  • OntologyClass: an ontology class represents a set of resources sharing similar characteristics. Resources can be of several types: Person, Organisation, Location, FloweringPlant, etc. All of these classes are organized in a domain model (i.e. schema, ontology). The "type" or the "ontology class" of a resource comes from this ontology.

P

  • Phrase Recognition: see Spotting.

R

  • Resource: a resource is any entity or concept in our target knowledge base (e.g. DBpedia). We take this name from RDF (Resource Description Framework), as a generic name for things, concepts, ideas "that can be identified on the Web, even when they cannot be directly retrieved on the Web."

S

  • Spotting: We call Spotting or Phrase Recognition the task of selecting, from some textual document given as input, phrases that should be annotated by the system. This is closely related to Keyphrase Extraction and Named Entity Recognition, for instance. In Keyphrase Extraction, the system tries to guess the "important" phrases, according to some definition of importance. Meanwhile, in Named Entity Recognition, the system focuses on specific entity types (commonly Person, Location and Organization), and the notion of importance is usually irrelevant. We describe some of these and several other strategies for phrase recognition below.
  • SurfaceForm: a surface form is the phrase used to refer to a resource in text. For example: "Barack Obama", "President Obama" and "Obama" are all surface forms for the resource `dbpedia:Obama`.
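To make the relation between Spotting and SurfaceForm concrete, here is a minimal sketch of one simple spotting strategy: looking up phrases in a lexicon of known surface forms and preferring the longest match. The lexicon contents, the `spot` function and the `max_len` parameter are assumptions made for illustration, not the system's actual implementation.

```python
import re

# Hypothetical lexicon mapping lowercased surface forms to a resource.
SURFACE_FORMS = {
    "barack obama": "dbpedia:Obama",
    "president obama": "dbpedia:Obama",
    "obama": "dbpedia:Obama",
}

def spot(text, lexicon=SURFACE_FORMS, max_len=3):
    """Return (start, end, surface form, resource) tuples, longest match first."""
    # Each word with its character offsets in the input text.
    words = [(m.start(), m.end()) for m in re.finditer(r"\w+", text)]
    spots = []
    i = 0
    while i < len(words):
        matched = False
        # Try the longest phrase starting at word i, then shorter ones.
        for n in range(min(max_len, len(words) - i), 0, -1):
            start, end = words[i][0], words[i + n - 1][1]
            candidate = text[start:end]
            key = " ".join(candidate.lower().split())
            if key in lexicon:
                spots.append((start, end, candidate, lexicon[key]))
                i += n  # skip the words covered by this spot
                matched = True
                break
        if not matched:
            i += 1
    return spots

print(spot("President Obama met Obama."))
```

Under these assumptions, both "President Obama" and the later "Obama" are spotted as surface forms of the same resource, and the longer phrase wins over the shorter one it contains.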

T

  • Token: each individual element produced by tokenizing the text. Tokens are the individual words in the context, or slightly modified versions of these words (e.g. running -> run).
  • Topic: a topic is a broad categorization of knowledge into areas of interest. For example, text can belong to Business, Politics, Sports or Arts topics.
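The Token entry above can be illustrated with a short sketch that splits a context into words and applies a small normalization step. The `tokenize` and `normalize` functions are hypothetical, and the suffix rule is a toy stand-in for a real stemmer or lemmatizer, chosen only so that it reproduces the "running -> run" example.

```python
import re

def normalize(word):
    """Toy normalizer: lowercase and strip a trailing "ing" (not a real stemmer)."""
    word = word.lower()
    if word.endswith("ing") and len(word) > 5:
        word = word[:-3]
        # Undo consonant doubling, e.g. "runn" -> "run".
        if len(word) >= 2 and word[-1] == word[-2] and word[-1] not in "aeiou":
            word = word[:-1]
    return word

def tokenize(text):
    """Split text into alphabetic words and normalize each one."""
    return [normalize(w) for w in re.findall(r"[A-Za-z]+", text)]

print(tokenize("Obama was running"))  # ['obama', 'was', 'run']
```

A production system would more likely rely on an established stemming or lemmatization library; the rule here only handles the one pattern needed for the example.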