The EUROSENTIMENT format for services and corpora

The Eurosentiment format is an extension of the NIF format data model for use in Sentiment Analysis. It includes properties from Marl, Onyx and other ontologies that complement those in NIF for sentiment and emotion tagging. The two differ in one respect: Eurosentiment sets JSON-LD as its primary serialisation format, whereas NIF defaults to RDF/XML or Turtle.

JSON-LD is a JSON-based format that makes it possible to embed semantic information in plain JSON objects. Every JSON-LD document is also valid JSON, so it retains full compatibility with existing JSON tools while adding that information.

By using this serialisation format, Eurosentiment targets both semantic web developers and traditional developers.
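As a rough illustration of that compatibility, a consumer with no semantic tooling can read a Eurosentiment document with an ordinary JSON parser. The response below is a shortened, hypothetical one written for this sketch; only the key names are taken from the format described in this page.

```python
import json

# Hypothetical, shortened Eurosentiment response used only for illustration.
raw = """
{
  "@context": ["http://demos.gsi.dit.upm.es/eurosentiment/static/context.jsonld"],
  "entries": [
    {
      "nif:isString": "My ipad is an awesome device",
      "opinions": [
        {"marl:hasPolarity": "marl:Positive", "marl:polarityValue": 9}
      ]
    }
  ]
}
"""

# Plain JSON parsing: no JSON-LD library is involved and the context is ignored.
doc = json.loads(raw)

# Collect the polarity label of every opinion in every entry.
polarities = [
    opinion["marl:hasPolarity"]
    for entry in doc["entries"]
    for opinion in entry["opinions"]
]
print(polarities)
```

A semantic web client would instead feed the same document to a JSON-LD processor, which resolves the `@context` and yields RDF.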

Overview

{
  "@context": [
    "http://demos.gsi.dit.upm.es/eurosentiment/static/context.jsonld"
],
"@id": {{ processID }},
"analysis": [
  {
    "@id": {{ analysisID }},
    "@type": [
      {{ analysisType }}
    ],
    "prov:wasAssociatedWith": {{ agent }},
    "dc:language": {{ language }},
    "marl:maxPolarityValue": {{ maxValue }},
    "marl:minPolarityValue": {{ minValue }}
  }
  [...]
],
"domain": {{ domain }},
"entries": [
  {
    "@id": {{ entry_id }},
    "dc:subject": {{ topic }},
    "emotions": [
      {
        "prov:generatedBy": {{ analysisID }},
        "onyx:hasEmotion": [
          {
            "onyx:hasEmotionCategory": {{ emotions[i].category }},
            "onyx:intensity": {{ emotions[i].emotion_intensity }}
          },
          [...]
        ]

      }
      [...]
    ],
    "opinions": [
      {
        "prov:generatedBy": {{ analysisID }},
        "marl:polarityValue": {{ opinions[i].polarityValue }},
        "marl:hasPolarity": {{ opinions[i].polarity }},
        "marl:describesObject": {{ opinions[i].described_object }}
      },
      [...]
    ],
    "nif:isString": {{ string_representation }},
    "strings": [
      {
        "nif:anchorOf": {{ strings[i].value }},
        "itsrdf:taIdentRef": {{ strings[i].entity }},
        "nif:posTag": {{ strings[i].posTag }},
        "nif:lemma": {{ strings[i].lemma }}
      },
      [...]
    ]
  },
  [...]
]
}
processID

ID of the process that gathered the results.

domain

Domain detected in the entries, or used by the analysis.

analysis

A set of results can be produced by combining the results from several analysis processes. Each of them needs to be described here.

analysisID

Each analysis needs a unique URI so that the generated opinions/emotions can be linked to it. A set of results may aggregate the results from independent analyses (e.g. a sentiment analysis and an emotion analysis).

analysisType

Example: marl:SentimentAnalysis or onyx:EmotionAnalysis

algorithm

[In marl] Algorithm that was used to generate the results

agent

Responsible for or creator of the analysis

language

Language that the analysis uses. e.g. "es"

minValue

[In marl opinions] Minimum value that the polarity of an opinion can take

maxValue

[In marl opinions] Maximum value that the polarity of an opinion can take

domain

Domain where the analysis was run. e.g. wnd:electronics

entry_id

Each entry must have a unique URI

topic

The subject or subjects of the entry. e.g. wnd:electronics

emotions

The emotions found in the context. Depending on the theory of emotions used, emotions can be categorised and/or defined by different dimensions. This example represents the usual case, a model that uses categories.

category

Category of the emotion. e.g. wna:Hatred

emotion_intensity

Intensity of the emotion as defined by the algorithm

opinions

The opinions found in the context.

polarity

Polarity of the opinion. e.g. marl:Positive

polarityValue

Numerical value of the polarity, as a floating point

described_object

Object that the opinion is about

string_representation

Plain text representation

strings

A NIF context can be subdivided into substrings, which have their own properties. This is usually done to associate a particular string with an entity in Named Entity Recognition.

strings[i].value

Text representation

strings[i].entity

Entity the string represents

strings[i].posTag

Part-of-speech tag

strings[i].lemma

Lemma of the word
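The link between an opinion and the analysis that produced it can be sketched as follows. This is only an illustration, assuming a document laid out as in the Overview template: each opinion is resolved to its analysis through `prov:generatedBy`, and its `marl:polarityValue` is rescaled to the range [0, 1] using that analysis's minimum and maximum values. The function name `rescale_opinions` and the sample data are hypothetical.

```python
def rescale_opinions(document):
    """Return (entry id, polarity, value rescaled to [0, 1]) for each opinion."""
    # Index analyses by URI so each opinion can be resolved to its analysis.
    analyses = {a["@id"]: a for a in document.get("analysis", [])}
    rows = []
    for entry in document.get("entries", []):
        for opinion in entry.get("opinions", []):
            analysis = analyses[opinion["prov:generatedBy"]]
            low = analysis["marl:minPolarityValue"]
            high = analysis["marl:maxPolarityValue"]
            scaled = (opinion["marl:polarityValue"] - low) / (high - low)
            rows.append((entry["@id"], opinion["marl:hasPolarity"], scaled))
    return rows


# Hypothetical document following the Overview template (values are made up).
document = {
    "analysis": [
        {
            "@id": "http://example.com/analyse",
            "@type": ["marl:SentimentAnalysis"],
            "marl:minPolarityValue": 0.0,
            "marl:maxPolarityValue": 10.0,
        }
    ],
    "entries": [
        {
            "@id": "http://example.com/entry/1",
            "opinions": [
                {
                    "prov:generatedBy": "http://example.com/analyse",
                    "marl:polarityValue": 9,
                    "marl:hasPolarity": "marl:Positive",
                }
            ],
        }
    ],
}

rows = rescale_opinions(document)
print(rows)
```

Rescaling is one reason to give every analysis its own URI: results from analyses with different value ranges can then be compared on a common scale.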

Context

The JSON-LD context contains semantic information about the properties in the JSON document, including convenient prefixes or namespaces. The Eurosentiment context would look like this:

{
  "@context": {
      "dc": "http://purl.org/dc/terms/",
      "dc:subject": {
        "@type": "@id"
      },
      "emotions": {
        "@container": "@list",
        "@id": "onyx:hasEmotionSet",
        "@type": "onyx:EmotionSet"
      },
      "marl": "http://www.gsi.dit.upm.es/ontologies/marl#",
      "nif": "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#",
      "onyx": "http://www.gsi.dit.upm.es/ontologies/onyx#",
      "opinions": {
        "@container": "@list",
        "@id": "marl:hasOpinion",
        "@type": "marl:Opinion"
      },
      "prov": "http://www.w3.org/ns/prov#",
      "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
      "analysis": {
        "@id": "prov:wasInformedBy"
      },
      "entries": {
        "@id": "prov:generated"
      },
      "strings": {
        "@reverse": "nif:hasContext",
        "@type": "nif:String"
      },
      "wnaffect": "http://www.gsi.dit.upm.es/ontologies/wnaffect#",
      "xsd": "http://www.w3.org/2001/XMLSchema#"
  }
}
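To show what the prefixes in the context buy, the sketch below expands compact names such as `marl:Positive` into full IRIs using a few of the prefix entries listed above. It is a deliberately small approximation written for this page: the helper `expand` is hypothetical, and it handles prefixes only. Term definitions, `@type` coercion, `@container` and `@reverse` are left to a real JSON-LD processor.

```python
# A few prefix entries copied from the context above, plus one term
# definition ("entries") to show that non-prefix entries are left alone.
CONTEXT = {
    "dc": "http://purl.org/dc/terms/",
    "marl": "http://www.gsi.dit.upm.es/ontologies/marl#",
    "nif": "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#",
    "onyx": "http://www.gsi.dit.upm.es/ontologies/onyx#",
    "prov": "http://www.w3.org/ns/prov#",
    "entries": {"@id": "prov:generated"},
}


def expand(term, context=CONTEXT):
    """Expand a compact IRI such as 'marl:Positive' using the prefix map."""
    prefix, sep, suffix = term.partition(":")
    namespace = context.get(prefix)
    # Only string-valued context entries act as prefixes here. Absolute IRIs
    # ("http://...") and plain terms without a colon are returned unchanged.
    if sep and isinstance(namespace, str) and not suffix.startswith("//"):
        return namespace + suffix
    return term


print(expand("marl:Positive"))
print(expand("http://example.com/analyse"))
```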

Examples

* Annotating one entry using a fictitious service (http://example.com/analyse) provided by http://example.com. Input: "My ipad is an awesome device".

  .. code-block:: javascript

     {
       "@context": [
         "http://demos.gsi.dit.upm.es/eurosentiment/static/context.jsonld"
       ],
       "results": {
         "analysis": [
           {
             "@id": "http://example.com/analyse",
             "@type": [
               "marl:SentimentAnalysis"
             ],
             "dc:language": "en",
             "marl:maxPolarityValue": 10.0,
             "marl:minPolarityValue": 0.0,
             "prov:wasAssociatedWith": "http://example.com"
           }
         ],
         "entries": [
           {
             "@id": "http://example.com/analyse?input=My%20ipad%20is%20an%20awesome%20device",
             "opinions": [
               {
                 "marl:polarityValue": 9,
                 "marl:hasPolarity": "marl:Positive",
                 "marl:describesObject": "http://dbpedia.org/page/IPad",
                 "prov:generatedBy": "http://example.com/analyse"
               }
             ],
             "nif:isString": "My ipad is an awesome device",
             "strings": [
               {
                 "@id": "http://example.com/analyse?input=My%20ipad%20is%20an%20awesome%20device#char=3,6",
                 "nif:anchorOf": "ipad",
                 "itsrdf:taIdentRef": "http://dbpedia.org/page/IPad"
               }
             ]
           }
         ]
       }
     }

* Annotating complex emotions in Spanish. Input: "Mi ipad me tiene harto".

  .. code-block:: javascript

     {
       "@context": [
         "http://demos.gsi.dit.upm.es/eurosentiment/static/context.jsonld"
       ],
       "results": {
         "analysis": [
           {
             "@id": "http://example.com/analyse",
             "@type": [
               "onyx:EmotionAnalysis"
             ],
             "dc:language": "es",
             "onyx:maxEmotionIntensity": 1.0,
             "onyx:minEmotionIntensity": 0.0,
             "prov:wasAssociatedWith": "http://example.com/"
           }
         ],
         "entries": [
           {
             "@id": "http://example.com/analyse?input=Mi%20ipad%20me%20tiene%20harto",
             "dc:language": "es",
             "opinions": [],
             "emotions": [
               {
                 "onyx:aboutObject": "http://dbpedia.org/page/IPad",
                 "prov:generatedBy": "http://example.com/analyse",
                 "onyx:hasEmotion": [
                   {
                     "onyx:hasEmotionCategory": "wna:dislike",
                     "onyx:hasEmotionIntensity": 0.7
                   },
                   {
                     "onyx:hasEmotionCategory": "wna:despair",
                     "onyx:hasEmotionIntensity": 0.1
                   }
                 ]
               }
             ],
             "nif:isString": "Mi ipad me tiene harto",
             "prov:generatedBy": "http://example.com/analyse",
             "strings": [
               {
                 "@id": "http://example.com/analyse?input=Mi%20ipad%20me%20tiene%20harto#char=3,6",
                 "nif:anchorOf": "ipad",
                 "itsrdf:taIdentRef": "http://dbpedia.org/page/IPad"
               }
             ]
           }
         ]
       }
     }
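A client consuming an emotion annotation like the one in the Spanish example might pick the strongest emotion of each emotion set. The sketch below assumes the intensity key used in that example (`onyx:hasEmotionIntensity`). The key is a parameter because the Overview template names it `onyx:intensity`. The helper `strongest_emotions` and the trimmed entry are hypothetical.

```python
def strongest_emotions(entry, intensity_key="onyx:hasEmotionIntensity"):
    """Return (category, intensity) of the strongest emotion in each emotion set."""
    result = []
    for emotion_set in entry.get("emotions", []):
        emotions = emotion_set.get("onyx:hasEmotion", [])
        if emotions:  # skip emotion sets that carry no emotions
            top = max(emotions, key=lambda emotion: emotion[intensity_key])
            result.append((top["onyx:hasEmotionCategory"], top[intensity_key]))
    return result


# Entry trimmed from the Spanish example to the fields this sketch reads.
entry = {
    "nif:isString": "Mi ipad me tiene harto",
    "emotions": [
        {
            "prov:generatedBy": "http://example.com/analyse",
            "onyx:hasEmotion": [
                {"onyx:hasEmotionCategory": "wna:dislike",
                 "onyx:hasEmotionIntensity": 0.7},
                {"onyx:hasEmotionCategory": "wna:despair",
                 "onyx:hasEmotionIntensity": 0.1},
            ],
        }
    ],
}

strongest = strongest_emotions(entry)
print(strongest)
```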

Other serialisation formats

The Eurosentiment format is semantic, as is the NIF format. Although the preferred and most widely used serialisation format is JSON-LD, other serialisation formats could be used as well.

For instance, it is particularly interesting to convert corpora to N-Triples for storage in a semantic server such as Virtuoso.

NIF

http://persistence.uni-leipzig.org/nlp2rdf/

JSON-LD

http://json-ld.org