JohnDaws edited this page Aug 13, 2018 · 1 revision

Baleen 2.6 introduces Event Extraction to the Baleen annotators. In this context, an event is a relationship that involves both a location and a time or date.

Mongo Events

A new consumer, MongoEvents, has been added to write the events extracted by Baleen to MongoDB.

Simple Event Extraction

A basic method of extracting events is implemented in the events.SimpleEventExtractor annotator. It can be configured to detect events in sentences or paragraphs that contain both a temporal and a location annotation, as well as at least one entity.

The language.OpenNLP annotator must also be included in the pipeline so that sentences or paragraphs can be parsed for events.
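The co-occurrence test described above can be sketched in a few lines of Python. This is an illustration written for this page, not the code of events.SimpleEventExtractor: the `Annotation` class, the function names and the type strings are hypothetical, and we assume that the required entity must be something other than the Temporal or Location annotation itself.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Annotation:
    """Hypothetical typed annotation with character offsets [begin, end)."""
    type: str  # e.g. "Temporal", "Location", "Person"
    begin: int
    end: int


def is_event_candidate(annotations: List[Annotation]) -> bool:
    """True if the annotations include a Temporal, a Location and at least one other entity."""
    types = {a.type for a in annotations}
    has_time = "Temporal" in types
    has_location = "Location" in types
    # Assumption: the "at least one entity" must be something other than the time or place.
    has_other_entity = bool(types - {"Temporal", "Location"})
    return has_time and has_location and has_other_entity


def find_event_spans(spans: List[Tuple[int, int]],
                     annotations: List[Annotation]) -> List[Tuple[int, int]]:
    """Keep the sentence (or paragraph) spans whose contained annotations qualify."""
    events = []
    for begin, end in spans:
        inside = [a for a in annotations if begin <= a.begin and a.end <= end]
        if is_event_candidate(inside):
            events.append((begin, end))
    return events


# "On 3 May, Alice met Bob in Paris. It rained in Paris."
text_spans = [(0, 33), (34, 53)]  # sentence boundaries, e.g. from a sentence splitter
anns = [
    Annotation("Temporal", 3, 8),    # 3 May
    Annotation("Person", 10, 15),    # Alice
    Annotation("Person", 20, 23),    # Bob
    Annotation("Location", 27, 32),  # Paris
    Annotation("Location", 47, 52),  # Paris
]
print(find_event_spans(text_spans, anns))  # [(0, 33)]
```

Only the first sentence qualifies; the second has a location but no time or other entity. Passing paragraph spans instead of sentence spans gives the paragraph-level variant.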

Odin Event Extraction

Odin has been integrated into Baleen to provide a rule-based system for extracting events. A full description of the rule language can be found in Description of the Odin Event Extraction Framework and Rule Language.

Two simple examples are given below:

- name: token_example
  label: LivesIn
  priority: 2
  type: token
  pattern: |
    (?<resident>Oscar) (?<trigger>[lemma=live]) on [tag=DT]? (?<location>[tag=/^N/]+)

Each rule has a name and a label. The label (or list of labels) is assigned to the mentions found by the rule. The priority controls the order in which rules are run. Odin runs all the rules on each sentence repeatedly until no new mentions are found, so a priority is not required, but assigning priorities can improve performance when some rules depend on others. The type is either ‘token’ or ‘dependency’ and determines the rule syntax used.

In this example we have a token rule. To satisfy a rule of this type, a sequence of tokens must be found in which each token matches the corresponding token constraint of the pattern in turn. The constraints are separated by spaces. The simplest constraint in the pattern above is ‘on’, which matches the word ‘on’ exactly. Taking each in turn:

  • (?<resident>Oscar) matches the word Oscar and assigns that match to the role resident in the mention.
  • (?<trigger>[lemma=live]) matches any token whose lemma is live and assigns it the role trigger. Trigger is a special role in an event: every event must have a trigger.
  • on matches the word ‘on’.
  • [tag=DT]? matches a token with the part-of-speech tag DT, i.e. a determiner. The ‘?’ at the end makes it optional, so the pattern also matches when no determiner is present.
  • (?<location>[tag=/^N/]+) matches one or more tokens whose part-of-speech tag matches the regular expression ^N, i.e. nouns, and assigns them the role location. If Location entities already exist, or a rule defines Locations, this could be replaced with @location:Location.

This rule would match the following sentences:

  • Oscar lives on Sesame Street.
  • Oscar lives on the Sesame Street in a trash can.
  • Back then, Oscar lived on a Sesame Street.
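To make the left-to-right matching of the token rule concrete, the following Python sketch applies it to the first example sentence. It is an illustration written for this page, not Odin's implementation (Odin is written in Scala): tokens are assumed to be pre-tagged (word, lemma, tag) triples, and `Token`, `Step`, `find_matches` and `LIVES_IN` are hypothetical names. `?` is treated as optional and `+` as greedy one-or-more, with backtracking if a later constraint fails.

```python
import re
from typing import Callable, Dict, List, NamedTuple, Optional, Tuple


class Token(NamedTuple):
    word: str
    lemma: str
    tag: str


class Step(NamedTuple):
    test: Callable[[Token], bool]  # one token constraint, e.g. [lemma=live]
    quant: str = ""                # "", "?" (optional) or "+" (one or more)
    capture: Optional[str] = None  # role name from (?<role> ...)


Captures = Dict[str, Tuple[int, int]]  # role -> (begin, end) token span


def match_at(steps: List[Step], tokens: List[Token], start: int) -> Optional[Captures]:
    """Match all steps from tokens[start], backtracking over ? and +."""
    def go(i: int, pos: int, caps: Captures) -> Optional[Captures]:
        if i == len(steps):
            return caps
        step = steps[i]
        # How many tokens this step may consume, tried longest first (greedy).
        if step.quant == "+":
            counts = range(len(tokens) - pos, 0, -1)
        elif step.quant == "?":
            counts = range(1, -1, -1)  # one token, then zero
        else:
            counts = range(1, 0, -1)   # exactly one token
        for n in counts:
            span = tokens[pos:pos + n]
            if len(span) == n and all(step.test(t) for t in span):
                new_caps = dict(caps)
                if step.capture:
                    new_caps[step.capture] = (pos, pos + n)
                result = go(i + 1, pos + n, new_caps)
                if result is not None:
                    return result
        return None
    return go(0, start, {})


def find_matches(steps: List[Step], tokens: List[Token]) -> List[Captures]:
    """Try the pattern at every start position, left to right."""
    matches = []
    for start in range(len(tokens)):
        caps = match_at(steps, tokens, start)
        if caps is not None:
            matches.append(caps)
    return matches


# (?<resident>Oscar) (?<trigger>[lemma=live]) on [tag=DT]? (?<location>[tag=/^N/]+)
LIVES_IN = [
    Step(lambda t: t.word == "Oscar", capture="resident"),
    Step(lambda t: t.lemma == "live", capture="trigger"),
    Step(lambda t: t.word == "on"),
    Step(lambda t: t.tag == "DT", quant="?"),
    Step(lambda t: re.match(r"^N", t.tag) is not None, quant="+", capture="location"),
]

sentence = [Token("Oscar", "Oscar", "NNP"), Token("lives", "live", "VBZ"),
            Token("on", "on", "IN"), Token("Sesame", "Sesame", "NNP"),
            Token("Street", "Street", "NNP"), Token(".", ".", ".")]
print(find_matches(LIVES_IN, sentence))
# [{'resident': (0, 1), 'trigger': (1, 2), 'location': (3, 5)}]
```

The other two example sentences yield the same roles; a sentence such as ‘Oscar lives in a trash can.’ produces no match because the literal ‘on’ constraint fails.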

This next rule uses a dependency pattern; we omit type: dependency because this is the default.

- name: dependency_example
  label: LivesIn
  priority: 2
  pattern: |
    trigger = [lemma=live]
    resident:Person = nsubj
    location:Location = prep pobj

In this rule pattern we start with a trigger, here any word with the lemma live. Each role in the mention is then filled by traversing the dependency graph from the trigger along the specified path. We assume that Person and Location entities have already been extracted, which allows those types to be required at the end of a path. The role resident is filled by the ‘nsubj’ of the verb, and the location by following a ‘prep’ edge and then a ‘pobj’ edge of the dependency graph. Because the rule follows dependency edges rather than word order, it can also match more complex sentence structures, such as:

  • Back then Oscar lived in a small grey trash can on Sesame Street.
  • Oscar, the green-bodied Grouch, lives, with Big Bird, on Sesame Street.

These examples just scratch the surface of what can be expressed with these rule languages. Complex domain knowledge can be captured and used to extract a rich set of events from a corpus.
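The graph traversal performed by the dependency rule (trigger = [lemma=live], resident:Person = nsubj, location:Location = prep pobj) can be sketched in Python as follows. This is not Odin's code: the parse is a hand-written, hypothetical set of (head, label, dependent) edges for a shortened version of the Big Bird sentence, each entity mention is simplified to its head token, and `build_graph`, `follow` and `lives_in` are names invented for the illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[int, str, int]  # (head index, dependency label, dependent index)


def build_graph(edges: Iterable[Edge]) -> Dict[int, List[Tuple[str, int]]]:
    """Index the outgoing edges of each head token."""
    graph: Dict[int, List[Tuple[str, int]]] = defaultdict(list)
    for head, label, dep in edges:
        graph[head].append((label, dep))
    return graph


def follow(graph: Dict[int, List[Tuple[str, int]]], start: int, path: List[str]) -> Set[int]:
    """Tokens reachable from `start` by following the edge labels in `path`, in order."""
    frontier = {start}
    for label in path:
        frontier = {dep for node in frontier for lab, dep in graph.get(node, []) if lab == label}
    return frontier


def lives_in(lemmas: List[str], graph, entities: Dict[int, str]) -> List[Dict[str, int]]:
    """trigger = [lemma=live]; resident:Person = nsubj; location:Location = prep pobj"""
    events = []
    for i, lemma in enumerate(lemmas):
        if lemma != "live":  # trigger constraint
            continue
        residents = sorted(j for j in follow(graph, i, ["nsubj"]) if entities.get(j) == "Person")
        locations = sorted(j for j in follow(graph, i, ["prep", "pobj"])
                           if entities.get(j) == "Location")
        # Both roles are required, so emit one event per (resident, location) pair.
        for r in residents:
            for loc in locations:
                events.append({"trigger": i, "resident": r, "location": loc})
    return events


# "Oscar, the Grouch, lives, with Big Bird, on Sesame Street" (hand-written parse)
lemmas = ["Oscar", ",", "the", "Grouch", ",", "live", ",",
          "with", "Big", "Bird", ",", "on", "Sesame", "Street"]
graph = build_graph([
    (5, "nsubj", 0), (0, "appos", 3), (3, "det", 2),
    (5, "prep", 7), (7, "pobj", 9), (9, "nn", 8),
    (5, "prep", 11), (11, "pobj", 13), (13, "nn", 12),
])
entities = {0: "Person", 9: "Person", 13: "Location"}  # head token of each entity
print(lives_in(lemmas, graph, entities))
# [{'trigger': 5, 'resident': 0, 'location': 13}]
```

Big Bird is also reached by the prep, pobj path, but it is discarded because it is a Person rather than a Location; the commas and the appositive make no difference because only edges are followed.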

Odin can be used with a predefined rules file within a Baleen pipeline as follows:

collectionreader:
# A collection reader
annotators:
# Usual extraction pipeline including either:
#- language.OdinParser
# or
#- language.OpenNLP
#- language.MaltParser

- class: events.Odin
  rules: ./rulesFile.yml
# types:
# - Event
# - Entity

consumers:
- class: MongoEvents
#- class: print.Events

A simple Odin rules file might be:

rules:

- name: event
  label: Event
  priority: 2
  pattern: |
    trigger = [tag=/^V/]
    location: Location = >>
    time: Temporal = >>
    involved: Entity* = >> [!entity=/Location|Temporal/]

A more complex rules file might include:

taxonomy:
- List
- Meeting
- Communication
- Actor
- Number
- Quote
- Group
- Effect:
  - Killing

rules:
- name: numbers
  label: Number
  priority: 1
  type: token
  pattern: |
    [tag=CD]

- name: single-quotes
  label: Quote
  priority: 1
  type: token
  pattern: /[']/ /[^']+/+ /[']/

- name: double-quotes
  label: Quote
  priority: 1
  type: token
  pattern: /["]/ /[^"]+/+ /["]/

- name: group
  label: Group
  priority: 2
  keep: false
  pattern: |
    trigger = [lemma=/people|civilian/]
    number: Number = /num/

- name: actor
  label: Actor
  priority: 2
  type: token
  pattern: |
      [entity=/Person|Organisation/]

- name: list
  label: List
  priority: 2
  type: token
  pattern: |
     @item:Entity ("," @item:Entity)+ (and @item:Entity)?

- name: said
  label: Communication
  priority: 3
  pattern: |
    trigger = [tag=/^V/ & lemma=/say|declare/ & !outgoing=neg]
    subject: Actor = nsubj
    quote: Quote = >> | <<

- name: killed
  label: Killing
  priority: 3
  pattern: |
    trigger = [lemma=/kill/]
    subject: Actor? = /nsubj/
    target: Group = dobj

- name: event
  label: Event
  priority: 2
  pattern: |
    trigger = [tag=/^V/]
    time: Temporal+ = >>
    location: Location+ = >>
    involved: Entity* = >> [!entity=/Location|Temporal/]

- name: meeting
  label: Meeting
  priority: 3
  pattern: |
    trigger = [tag=/^V/ & lemma=/met|meet|gather|host|assemble/ & !mention=Meeting]
    subject: Actor = nsubj prep? pobj?
    object: Actor? = dobj
    participant: Actor* = /nsubj|dobj/ prep? pobj? /appos|conj/
    location: Location? = xcomp? prep pobj
    time: Temporal? = xcomp? prep pobj

- name: meeting-of
  label: Meeting
  priority: 3
  pattern: |
    trigger = [lemma=/met|meet|gather|host|assemble/ & !mention=Meeting]
    subject: Actor = (prep pobj nn) | (prep obj) | (/nmod/ compound|amod)

- name: meeting-located
  label: Meeting
  priority: 3
  pattern: |
    trigger = [tag=/^V/ & lemma=/take|took/]
    subject: Meeting = nsubj
    object: Actor? = dobj
    participant: Actor* = /nsubj|dobj/ prep? pobj? /appos|conj/
    location: Location = xcomp? prep pobj
    time: Temporal? = xcomp? prep pobj

- name: communicating
  label: Communication
  priority: 3
  pattern: |
    trigger = [tag=/^V/ & lemma=/talk|speak|argue|chat|communicate|tell|converse|say|shout|utter|discus/]
    participant: Actor{2,} = (/nsubj|dobj/) | (/nsubj|dobj/ (prep pobj)? /appos|conj/)
    location: Location? = xcomp? prep pobj
    time: Temporal? = xcomp? prep pobj
    
- name: born
  label: Event
  priority: 3
  pattern: |
    trigger = [tag=/^V/ & lemma=/bear|born/]
    subject: Person = /nsubj/
    location: Location? = xcomp? prep? pobj
    time: Temporal? = xcomp? prep? pobj
    
- name: attack
  label: Event
  priority: 4
  pattern: |
    trigger = [tag=/^V/ & lemma=/attack/]
    subject: Actor = /nsubj/
    effect: Effect? = >>
    location: Location? = xcomp? prep? pobj
    time: Temporal? = xcomp? prep? pobj
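The priorities in this file follow the dependencies between rules: group uses Number mentions, and killed needs a Group as its target. The Python sketch below is a toy illustration of the behaviour described earlier (all rules applied repeatedly until no new mentions appear) and of why running rules in priority order helps. It is not Odin's engine: the three rule functions are hypothetical, heavily simplified stand-ins for the numbers, group and killed rules (they match adjacent tokens rather than dependency edges), and `run` simply applies the rules in list order on each pass. `labels_for` shows our reading of the taxonomy: a Killing mention also counts as an Effect, which is what would let the attack rule's effect: Effect? argument pick it up.

```python
from typing import Callable, Dict, List, Set, Tuple

Token = Tuple[str, str, str]    # (word, lemma, tag)
Mention = Tuple[str, int, int]  # (label, first token, end token exclusive)
Rule = Callable[[List[Token], Set[Mention]], Set[Mention]]

# A slice of the taxonomy block above, stored as child -> parent.
TAXONOMY: Dict[str, str] = {"Killing": "Effect"}


def labels_for(label: str) -> List[str]:
    """The label followed by its ancestors, e.g. Killing -> [Killing, Effect]."""
    labels = [label]
    while labels[-1] in TAXONOMY:
        labels.append(TAXONOMY[labels[-1]])
    return labels


def number_rule(tokens: List[Token], mentions: Set[Mention]) -> Set[Mention]:
    # Like 'numbers': [tag=CD]
    return {("Number", i, i + 1) for i, (_, _, tag) in enumerate(tokens) if tag == "CD"}


def group_rule(tokens: List[Token], mentions: Set[Mention]) -> Set[Mention]:
    # Toy stand-in for 'group': people/civilian directly preceded by a Number mention
    return {("Group", b, i + 1)
            for i, (_, lemma, _) in enumerate(tokens) if lemma in ("people", "civilian")
            for label, b, e in mentions if label == "Number" and e == i}


def killing_rule(tokens: List[Token], mentions: Set[Mention]) -> Set[Mention]:
    # Toy stand-in for 'killed': lemma kill directly followed by a Group mention
    return {("Killing", i, e)
            for i, (_, lemma, _) in enumerate(tokens) if lemma == "kill"
            for label, b, e in mentions if label == "Group" and b == i + 1}


def run(rules: List[Tuple[int, Rule]], tokens: List[Token]) -> Tuple[Set[Mention], int]:
    """Apply the rules in list order, pass after pass, until a pass adds no new mention."""
    mentions: Set[Mention] = set()
    passes = 0
    while True:
        passes += 1
        before = len(mentions)
        for _priority, rule in rules:
            mentions |= rule(tokens, mentions)
        if len(mentions) == before:
            return mentions, passes


tokens = [("Gunmen", "gunman", "NNS"), ("killed", "kill", "VBD"),
          ("12", "12", "CD"), ("people", "people", "NNS")]
rules = [(1, number_rule), (2, group_rule), (3, killing_rule)]
by_priority = sorted(rules, key=lambda r: r[0])
mentions, passes = run(by_priority, tokens)
print(sorted(mentions), passes)
# [('Group', 2, 4), ('Killing', 1, 4), ('Number', 2, 3)] 2
print(run(by_priority[::-1], tokens)[1])  # 4
print(labels_for("Killing"))  # ['Killing', 'Effect']
```

Both orderings find the same three mentions, but in reverse priority order each dependent rule has to wait for another pass, so four passes are needed instead of two.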