# Harnessing diversity in crowds and machines for better NER performance


This repository contains the experimental results of identifying and typing named entities in English Wikipedia sentences. Even though current named entity recognition (NER) tools achieve nearly human-like performance on particular data types or domains, they remain highly dependent on the gold standard used for training and testing. The mainstream approach to gathering ground truth, or a gold standard, for training and evaluating NER tools still relies on experts, who are typically expensive and hard to find. Furthermore, a new gold standard needs to be created for each new input type and each new domain. Experts typically follow over-generalized annotation guidelines, meant to increase the <b>inter-annotator agreement</b>. Such guidelines tend to deny the intrinsic ambiguity of language and the multitude of perspectives and interpretations. As a result, ground truth datasets might not always be 'gold' or 'true' in terms of capturing the real text meaning and the diversity of interpretations. In the last decade, crowdsourcing has also proven to be a suitable method for gathering such ground truth, but data ambiguity is still not handled.

In our work, we instead focus on capturing the <b>inter-annotator disagreement</b> to provide a new type of ground truth, i.e., crowd truth, by applying the CrowdTruth metrics and methodology, where language features are taken into consideration. All the crowdsourcing experiments were performed through the CrowdTruth platform, while the results were processed and analyzed using the CrowdTruth methodology and metrics. For more information, check the <b><a href="http://crowdtruth.org/">CrowdTruth</a></b> website. For gathering the annotated data, we used the <b><a href="http://crowdflower.com/">CrowdFlower</a></b> marketplace.

We propose a novel approach for extracting and typing named entities in texts, i.e., a hybrid multi-machine-crowd approach where state-of-the-art NER tools are combined and their aggregated output is validated and improved through crowdsourcing. We report here the results of:
1. Five state-of-the-art named entity recognition tools (Single-NER)
2. The combined output of the five state-of-the-art named entity recognition tools (Multi-NER)
3. Crowdsourcing experiments for correcting and improving the Multi-NER output and also for improving the expert-based gold standard (Multi-NER+Crowd). 
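
As an illustration of item 2, the sketch below merges the entity lists of several tools into one candidate set. This README does not specify how the tool outputs are combined, so the plain union used here, the function name `combine_ner_outputs`, the tuple layout and the toy data are all assumptions made for the example.

```python
from collections import defaultdict


def combine_ner_outputs(outputs):
    """Merge per-tool entity lists into a single candidate set (plain union).

    `outputs` maps a tool name to a list of (start, end, surface, type) tuples.
    The result maps each (start, end, surface) span to the tools that proposed
    it and the types they assigned. This is an assumed combination strategy,
    not necessarily the one used to produce the Multi-NER results.
    """
    merged = defaultdict(lambda: {"tools": set(), "types": set()})
    for tool, entities in outputs.items():
        for start, end, surface, entity_type in entities:
            key = (start, end, surface)
            merged[key]["tools"].add(tool)
            merged[key]["types"].add(entity_type)
    return dict(merged)


# Toy input (invented offsets and tool names, not taken from the OKE data).
outputs = {
    "ToolA": [(0, 12, "Barack Obama", "person")],
    "ToolB": [(0, 12, "Barack Obama", "person"), (7, 12, "Obama", "person")],
}
combined = combine_ner_outputs(outputs)
```

A union keeps every span proposed by any tool, which would raise recall while admitting more false positives; overlapping spans such as "Barack Obama" and "Obama" stay as separate candidates for later validation.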


## Check the Results & Download the Data: <a href="https://github.com/CrowdTruth/Crowdsourcing-Improved-NE-Gold-Standard">Crowdsourcing-Improved-NE-Gold-Standard</a>


## Table of Contents:

* [Experimental Data](#experimentaldata)
* [Dataset Files](#datasetfiles)
* [Crowdsourcing Experiments](#crowdsourcingexperiments)
   * [Crowdsourcing Experimental Data](#crowdsourcingdata)
   * [Crowdsourcing Annotation Task](#crowdsourcingtask)
* [Experiments Results](#results)
   * [Single-NER vs. Multi-NER (entity surface)](#SingleNER-MultiNER-surface)
   * [Single-NER vs. Multi-NER (entity surface and entity type)](#SingleNER-MultiNER-surface-type)
   * [Crowd-Enhanced Multi-NER](#Crowd-MultiNER)


<a id="experimentaldata"></a>

## Experimental Data:

We performed named entity extraction with five state-of-the-art NER tools: <a href="http://nerd.eurecom.fr">NERD-ML</a>, <a href="https://www.textrazor.com">TextRazor</a>, <a href="http://ner.vse.cz/thd/">THD</a>, <a href="http://dbpedia-spotlight.github.io/demo/">DBpediaSpotlight</a>, and <a href="http://nlp.vse.cz/SemiTags/">SemiTags</a>. We performed a comparative analysis of (1) their performance (output) and (2) their combined performance (output), on <b>two ground truth (GT) evaluation datasets</b> used during Task 1 of the Open Knowledge Extraction (OKE) semantic challenge at ESWC in 2015 (<i>OKE2015</i>) and 2016 (<i>OKE2016</i>) respectively. The datasets can be checked here:
1. <b>OKE2015</b>: Open Knowledge Extraction 2015 (OKE2015) semantic challenge: https://github.com/anuzzolese/oke-challenge
2. <b>OKE2016</b>: Open Knowledge Extraction 2016 (OKE2016) semantic challenge: https://github.com/anuzzolese/oke-challenge-2016

In summary, there are 156 Wikipedia sentences with 1007 annotated named entities of types <i>place</i>, <i>person</i>, <i>organization</i> and <i>role</i>, distributed across the datasets in the following way:

<table class="tg">
  <tr>
    <th class="tg-s6z2" rowspan="6"></th>
    <th class="tg-s6z2" colspan="3" style="text-align:center;">OKE2015</th>
    <th class="tg-s6z2" colspan="3" style="text-align:center;">OKE2016</th>
  </tr>
  <tr>
    <td class="tg-baqh">Sentences</td>
    <td class="tg-baqh" colspan="2">Named Entities</td>
    <td class="tg-baqh">Sentences</td>
    <td class="tg-baqh" colspan="2">Named Entities</td>
  </tr>
  <tr>
    <td class="tg-baqh" rowspan="4">101</td>
    <td class="tg-baqh">Place</td>
    <td class="tg-baqh" style="text-align:center;">120</td>
    <td class="tg-baqh" rowspan="4" style="text-align:center;">55</td>
    <td class="tg-baqh">Place</td>
    <td class="tg-baqh" style="text-align:center;">44</td>
  </tr>
  <tr>
    <td class="tg-baqh">Person</td>
    <td class="tg-baqh" style="text-align:center;">304</td>
    <td class="tg-baqh">Person</td>
    <td class="tg-baqh" style="text-align:center;">105</td>
  </tr>
  <tr>
    <td class="tg-baqh">Organization</td>
    <td class="tg-baqh" style="text-align:center;">139</td>
    <td class="tg-baqh">Organization</td>
    <td class="tg-baqh" style="text-align:center;">105</td>
  </tr>
  <tr>
    <td class="tg-baqh">Role</td>
    <td class="tg-baqh" style="text-align:center;">103</td>
    <td class="tg-baqh">Role</td>
    <td class="tg-baqh" style="text-align:center;">86</td>
  </tr>
  <tr>
    <td class="tg-baqh">Total</td>
    <td class="tg-baqh" style="text-align:center;">101</td>
    <td class="tg-baqh" colspan="2" style="text-align:center;">664</td>
    <td class="tg-baqh" style="text-align:center;">55</td>
    <td class="tg-baqh" colspan="2" style="text-align:center;">340</td>
  </tr>
</table>

      
<a id="datasetfiles"></a>

## Dataset Files:

```
|--/aggregate
```
Various aggregated datasets collected as part of the workflow for collecting the salient features in news articles and tweets. We describe here the most important files:

```
|--/aggregate/aggregatedResults_newsArticles.csv
```
This file contains the processed ground truth for the news articles related to the whaling event, in comma-separated format. The file contains the aggregated results for snippet relevance and for the sentiment and intensity of the snippets and of the relevant event mentions. The columns are:

* *Dataset*: reference to the dataset, news - DS1
* *Unit Id*: unique ID of the data entry
* *Title Id*: news article unique title ID
* *Title*: news article title
* *Snippet Id*: news article unique snippet ID
* *Snippet*: news article snippet
* *Overlapping Snippet*: binary value describing whether the snippet contains overlapping tokens with the title (1) or not (0)
* *Snippet Relevance Score*: the snippet relevance score; computed using the cosine similarity measure, shows the likelihood that the given snippet is relevant for the news article title
* *Number of Relevant Mentions*: total number of relevant event mentions identified by the crowd in the given snippet
* *Overall Sentiment-Intensity*: binary value describing whether the following columns contain sentiment and intensity scores for the snippet (1) or for the relevant event mentions identified in the given snippet (0)
* *Relevant Mention*: relevant event mention
* *Relevant Mention Score*: the event mention relevance score; computed using the cosine similarity measure, shows the likelihood that the given mention in the snippet is relevant for the news article title
* *Positive Sentiment, Negative Sentiment, Neutral Sentiment*: the sentiment scores of the snippets and event mentions; computed using the cosine similarity measure, shows the likelihood that the given snippet or mention expresses the given sentiment
* *High Intensity, Low Intensity, Medium Intensity*: the intensity scores of the snippets and event mentions; computed using the cosine similarity measure, shows the likelihood that the given snippet or mention expresses a sentiment with the given intensity 
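
The scores above are described as cosine similarities. The sketch below shows one way such a score could be derived from crowd judgments: the workers' binary annotation vectors are summed and the result is compared with the one-hot vector of each label. The function names, the label order and the toy judgments are assumptions for illustration, and the exact vectors used to build the files may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def annotation_scores(worker_vectors, labels):
    """Score each label by the cosine between the summed worker vectors
    and the one-hot vector of that label."""
    summed = [sum(column) for column in zip(*worker_vectors)]
    scores = {}
    for i, label in enumerate(labels):
        one_hot = [1 if j == i else 0 for j in range(len(labels))]
        scores[label] = cosine(summed, one_hot)
    return scores


# Toy judgments from three hypothetical workers on one snippet.
labels = ["positive", "negative", "neutral"]
worker_vectors = [[1, 0, 0], [1, 0, 0], [0, 0, 1]]
scores = annotation_scores(worker_vectors, labels)
```

With two of three workers choosing "positive", that label scores about 0.89, "neutral" about 0.45 and "negative" 0, so disagreement among workers is kept as a graded score instead of being collapsed by majority vote.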

```
|--/aggregate/aggregatedResults_tweets2014&2015.csv
```
This file contains the processed ground truth for the tweets related to the whaling event, from 2014 and 2015, in comma-separated format. The file contains the aggregated results for tweet relevance, the relevant event mentions in tweets, and the sentiment and intensity of the overall tweet and of the event mentions. The columns are:

* *Dataset*: reference to the dataset, tweets 2014 - DS2, tweets 2015 - DS3 
* *Tweet Id*: unique ID of the tweet data entry
* *Tweet Author*: tweet author
* *Tweet Date*: tweet date
* *Tweet Seed Index*: unique tweet-event ID
* *Tweet Content*: tweet content
* *Tweet Event Relevance Score*: the tweet relevance score with regard to the whaling event; computed using the cosine similarity measure, shows the likelihood that the given tweet is relevant for the whaling event
* *Number of Relevant Mentions*: total number of relevant event mentions identified by the crowd in the given tweet
* *Overall Sentiment-Intensity*: binary value describing whether the following columns contain sentiment and intensity scores for the tweet (1) or for the relevant event mentions identified in the given tweet (0)
* *Relevant Mention*: relevant event mention
* *Relevant Mention Score*: the event mention relevance score; computed using the cosine similarity measure, shows the likelihood that the given mention in the tweet is relevant for the whaling event
* *Positive Sentiment, Negative Sentiment, Neutral Sentiment*: the sentiment scores of the tweet and event mentions; computed using the cosine similarity measure, shows the likelihood that the given tweet or mention expresses the given sentiment
* *High Intensity, Low Intensity, Medium Intensity*: the intensity scores of the tweet and event mentions; computed using the cosine similarity measure, shows the likelihood that the given tweet or mention expresses a sentiment with the given intensity 


```
|--aggregate/orderedSnippetsByRelevance.csv
```
The file contains the relevant news snippets ordered by their relevance score. The overlapping news snippets are sorted in descending order, while the non-overlapping news snippets are sorted in ascending order.
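
A minimal sketch of this ordering, assuming rows are dictionaries keyed by the column names listed for `aggregatedResults_newsArticles.csv`. Placing the overlapping group before the non-overlapping group is an assumption, since the description does not state how the two groups are arranged in the file; the helper name and sample rows are hypothetical.

```python
def order_snippets(rows):
    """Sort overlapping snippets by descending relevance and
    non-overlapping snippets by ascending relevance."""
    overlapping = sorted(
        (r for r in rows if r["Overlapping Snippet"] == 1),
        key=lambda r: r["Snippet Relevance Score"],
        reverse=True,
    )
    non_overlapping = sorted(
        (r for r in rows if r["Overlapping Snippet"] == 0),
        key=lambda r: r["Snippet Relevance Score"],
    )
    # Assumed layout: overlapping snippets first, then the rest.
    return overlapping + non_overlapping


# Invented rows for illustration only.
rows = [
    {"Snippet Id": "s1", "Overlapping Snippet": 1, "Snippet Relevance Score": 0.4},
    {"Snippet Id": "s2", "Overlapping Snippet": 0, "Snippet Relevance Score": 0.7},
    {"Snippet Id": "s3", "Overlapping Snippet": 1, "Snippet Relevance Score": 0.9},
    {"Snippet Id": "s4", "Overlapping Snippet": 0, "Snippet Relevance Score": 0.2},
]
ordered_ids = [r["Snippet Id"] for r in order_snippets(rows)]
```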

```
|--aggregate/snippetsPositionInArticle.csv
```
The file contains measures for snippets relevance with regard to their position in the news articles.

```
|--aggregate/orderedNewsSnippetsBySentiments.csv
```
The file contains the relevant news snippets ordered by their sentiments: positive sentiment - descending, negative sentiment - ascending. 

```
|--aggregate/orderedTweetsByRelevance.csv
```
The file contains the relevant tweets ordered by their relevance score.

```
|--aggregate/orderedTweetsByMentions.csv
```
The file contains the relevant tweets ordered by their total number of relevant event mentions.

```
|--aggregate/histogramRelevantTweets.csv
```
The file contains the number of relevant tweets for each relevance score interval.

```
|--aggregate/orderedTweetsBySentiments.csv
```
The file contains the relevant tweets ordered by their sentiments: positive sentiment - descending, negative sentiment - ascending. 

```
|--aggregate/tweetsChangeInSentiment.csv
```
The file contains relevant event mentions in tweets that refer to "*whaling ban*". Each such relevant event mention has the associated sentiment and intensity scores.

```
|--/input
|  |--/seedWords_domainExperts.csv
```
The file contains relevant seed words for the whaling event, obtained from the social sciences domain experts. Each column of the file represents a type: *Event*, *Location*, *Actor/Organization*, *Other*

```
|--/raw
|  |--/Relevance Analysis
|  |  |--/News
|  |  |--/Tweets
|  |--/Sentiment Analysis
|  |  |--/News
|  |  |--/Tweets
```
The raw data collected from crowdsourcing for each of the 2 tasks.


<a id="crowdsourcingexperiments"></a>

## Crowdsourcing Experiments:

Overall, the aim of the crowdsourcing experiments is to:
1. correct the mistakes of the NER tools
2. identify the ambiguities in the ground truth and provide a better ground truth 

<a id="crowdsourcingdata"></a>

### Crowdsourcing Experimental Data

We select every entity in the ground truth for which the NER tools provided alternatives. We have the following two cases:  
* <i>Crowd reduces the number of FP</i>: For each named entity in the ground truth that has multiple alternatives (span alternative) we create an entity cluster. We also add the largest span among all the alternatives.  
* <i>Crowd reduces the number of FN</i>: For each named entity in the ground truth that was not extracted, we create an entity cluster that contains the FN named entity and the alternatives returned by the NER tools. Further, we add every other combination of words contained in all the alternatives. This step is necessary because we do not want to introduce bias in the task, i.e., the crowd should see all the possibilities, not only the expected one.  
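
The cluster construction can be sketched as follows. The sketch assumes that "every other combination of words" means every contiguous word sequence inside the token range covered by the alternatives; the function name, the token-offset representation and the example sentence are hypothetical.

```python
def build_entity_cluster(tokens, spans):
    """Enumerate candidate expressions for one entity cluster.

    `tokens` is the tokenized sentence; `spans` holds (start, end) token
    offsets (end exclusive) of the ground-truth entity and of the
    alternatives returned by the NER tools. Assumption: candidates are all
    contiguous sub-spans of the range covered by these spans.
    """
    lo = min(start for start, _ in spans)
    hi = max(end for _, end in spans)
    candidates = []
    for start in range(lo, hi):
        for end in range(start + 1, hi + 1):
            candidates.append(" ".join(tokens[start:end]))
    return candidates


# Invented example sentence, not taken from the OKE datasets.
tokens = "He studied at the University of Amsterdam".split()
spans = [(4, 7), (6, 7)]  # "University of Amsterdam" and "Amsterdam"
cluster = build_entity_cluster(tokens, spans)
```

Under this assumption the cluster also contains the largest span (`tokens[lo:hi]`), which corresponds to the extra candidate added in the FP case.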

<a id="crowdsourcingtask"></a>

### Crowdsourcing Annotation Task

For the two cases described above, the goal of the crowdsourcing task is two-fold: 
* identification of the valid expressions, from a list, that refer to a phrase highlighted in yellow (Step 2 from the crowdsourcing template below)
* selection of the type for each expression in the list, from a predefined set of choices - place, person, organization, role and other (Step 3 from the crowdsourcing template below). 

The input of the crowdsourcing task consists of a sentence and a named entity for which multiple expressions were given by the five state-of-the-art NER tools.

Check the crowdsourcing templates below. To enlarge the picture and read the crowdsourcing task instructions click <a href="https://raw.githubusercontent.com/CrowdTruth/Salience-In-News-And-Tweets/master/img/taskrelevancenews.jpg" target="_blank">here</a>.



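<a id="results"></a>

## Experiments Results:

<a id="SingleNER-MultiNER-surface"></a>

### Single-NER vs. Multi-NER (entity surface)

<style type="text/css">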
.tg  {border-collapse:collapse;border-spacing:0;}
.tg td{font-family:Arial, sans-serif;font-size:14px;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;}
.tg th{font-family:Arial, sans-serif;font-size:14px;font-weight:normal;padding:10px 5px;border-style:solid;border-width:1px;overflow:hidden;word-break:normal;}
.tg .tg-baqh{text-align:center;vertical-align:top}
.tg .tg-yw4l{vertical-align:top}
</style>
<table class="tg">
  <tr>
    <th class="tg-baqh" rowspan="2"></th>
    <th class="tg-yw4l" colspan="6">OKE2015</th>
  </tr>
  <tr>
    <td class="tg-yw4l">TP</td>
    <td class="tg-yw4l">FP</td>
    <td class="tg-yw4l">FN</td>
    <td class="tg-yw4l">Precision</td>
    <td class="tg-yw4l">Recall</td>
    <td class="tg-yw4l">F1-score</td>
  </tr>
  <tr>
    <td class="tg-yw4l">NERD</td>
    <td class="tg-yw4l">401</td>
    <td class="tg-yw4l">93</td>
    <td class="tg-yw4l">263</td>
    <td class="tg-yw4l">0.812</td>
    <td class="tg-yw4l">0.604</td>
    <td class="tg-yw4l">0.693</td>
  </tr>
  <tr>
    <td class="tg-yw4l">SemiTags</td>
    <td class="tg-yw4l">366</td>
    <td class="tg-yw4l">37</td>
    <td class="tg-yw4l">298</td>
    <td class="tg-yw4l">0.908</td>
    <td class="tg-yw4l">0.551</td>
    <td class="tg-yw4l">0.686</td>
  </tr>
  <tr>
    <td class="tg-yw4l">THD</td>
    <td class="tg-yw4l">199</td>
    <td class="tg-yw4l">114</td>
    <td class="tg-yw4l">465</td>
    <td class="tg-yw4l">0.636</td>
    <td class="tg-yw4l">0.3</td>
    <td class="tg-yw4l">0.407</td>
  </tr>
  <tr>
    <td class="tg-yw4l">DBpediaSpotlight</td>
    <td class="tg-yw4l">411</td>
    <td class="tg-yw4l">234</td>
    <td class="tg-yw4l">253</td>
    <td class="tg-yw4l">0.637</td>
    <td class="tg-yw4l">0.619</td>
    <td class="tg-yw4l">0.628</td>
  </tr>
  <tr>
    <td class="tg-yw4l">TextRazor</td>
    <td class="tg-yw4l">431</td>
    <td class="tg-yw4l">177</td>
    <td class="tg-yw4l">232</td>
    <td class="tg-yw4l">0.709</td>
    <td class="tg-yw4l">0.65</td>
    <td class="tg-yw4l">0.678</td>
  </tr>
  <tr>
    <td class="tg-yw4l">Multi-NER</td>
    <td class="tg-yw4l">555</td>
    <td class="tg-yw4l">403</td>
    <td class="tg-yw4l">109</td>
    <td class="tg-yw4l">0.579</td>
    <td class="tg-yw4l">0.836</td>
    <td class="tg-yw4l">0.684</td>
  </tr>
</table>
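
The Precision, Recall and F1-score columns follow from the TP, FP and FN counts through the standard definitions. The short sketch below (the function name is chosen for this example) recomputes the NERD row of the OKE2015 table.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard precision, recall and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# NERD on OKE2015: TP=401, FP=93, FN=263 (values from the table above).
p, r, f1 = precision_recall_f1(401, 93, 263)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.812 0.604 0.693
```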


<table class="tg">
  <tr>
    <th class="tg-yw4l" rowspan="2"></th>
    <th class="tg-yw4l" colspan="6">OKE2016</th>
  </tr>
  <tr>
    <td class="tg-yw4l">TP</td>
    <td class="tg-yw4l">FP</td>
    <td class="tg-yw4l">FN</td>
    <td class="tg-yw4l">Precision</td>
    <td class="tg-yw4l">Recall</td>
    <td class="tg-yw4l">F1-score</td>
  </tr>
  <tr>
    <td class="tg-yw4l">NERD</td>
    <td class="tg-yw4l">209</td>
    <td class="tg-yw4l">37</td>
    <td class="tg-yw4l">131</td>
    <td class="tg-yw4l">0.85</td>
    <td class="tg-yw4l">0.615</td>
    <td class="tg-yw4l">0.713</td>
  </tr>
  <tr>
    <td class="tg-yw4l">SemiTags</td>
    <td class="tg-yw4l">161</td>
    <td class="tg-yw4l">14</td>
    <td class="tg-yw4l">179</td>
    <td class="tg-yw4l">0.92</td>
    <td class="tg-yw4l">0.474</td>
    <td class="tg-yw4l">0.625</td>
  </tr>
  <tr>
    <td class="tg-yw4l">THD</td>
    <td class="tg-yw4l">122</td>
    <td class="tg-yw4l">73</td>
    <td class="tg-yw4l">218</td>
    <td class="tg-yw4l">0.626</td>
    <td class="tg-yw4l">0.359</td>
    <td class="tg-yw4l">0.456</td>
  </tr>
  <tr>
    <td class="tg-yw4l">DBpediaSpotlight</td>
    <td class="tg-yw4l">228</td>
    <td class="tg-yw4l">119</td>
    <td class="tg-yw4l">112</td>
    <td class="tg-yw4l">0.657</td>
    <td class="tg-yw4l">0.671</td>
    <td class="tg-yw4l">0.664</td>
  </tr>
  <tr>
    <td class="tg-yw4l">TextRazor</td>
    <td class="tg-yw4l">207</td>
    <td class="tg-yw4l">105</td>
    <td class="tg-yw4l">133</td>
    <td class="tg-yw4l">0.663</td>
    <td class="tg-yw4l">0.609</td>
    <td class="tg-yw4l">0.635</td>
  </tr>
  <tr>
    <td class="tg-yw4l">Multi-NER</td>
    <td class="tg-yw4l">299</td>
    <td class="tg-yw4l">218</td>
    <td class="tg-yw4l">41</td>
    <td class="tg-yw4l">0.578</td>
    <td class="tg-yw4l">0.879</td>
    <td class="tg-yw4l">0.698</td>
  </tr>
</table>


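<a id="SingleNER-MultiNER-surface-type"></a>

### Single-NER vs. Multi-NER (entity surface and entity type)

<table class="tg">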
  <tr>
    <th class="tg-yw4l" rowspan="3"><br></th>
    <th class="tg-yw4l" colspan="15">OKE2015</th>
  </tr>
  <tr>
    <td class="tg-yw4l" colspan="5">TP</td>
    <td class="tg-yw4l" colspan="5">FP</td>
    <td class="tg-yw4l" colspan="5">FN</td>
  </tr>
  <tr>
    <td class="tg-yw4l">Place</td>
    <td class="tg-yw4l">People</td>
    <td class="tg-yw4l">Organization</td>
    <td class="tg-yw4l">Role</td>
    <td class="tg-yw4l">Total</td>
    <td class="tg-yw4l">Place</td>
    <td class="tg-yw4l">People</td>
    <td class="tg-yw4l">Organization</td>
    <td class="tg-yw4l">Role</td>
    <td class="tg-yw4l">Total</td>
    <td class="tg-yw4l">Place</td>
    <td class="tg-yw4l">People</td>
    <td class="tg-yw4l">Organization</td>
    <td class="tg-yw4l">Role</td>
    <td class="tg-yw4l">Total</td>
  </tr>
  <tr>
    <td class="tg-yw4l">NERD</td>
    <td class="tg-yw4l">90</td>
    <td class="tg-yw4l">142</td>
    <td class="tg-yw4l">106</td>
    <td class="tg-yw4l">65</td>
    <td class="tg-yw4l">403</td>
    <td class="tg-yw4l">22</td>
    <td class="tg-yw4l">21</td>
    <td class="tg-yw4l">42</td>
    <td class="tg-yw4l">17</td>
    <td class="tg-yw4l">102</td>
    <td class="tg-yw4l">30</td>
    <td class="tg-yw4l">162</td>
    <td class="tg-yw4l">33</td>
    <td class="tg-yw4l">38</td>
    <td class="tg-yw4l">263</td>
  </tr>
  <tr>
    <td class="tg-yw4l">SemiTags</td>
    <td class="tg-yw4l">100</td>
    <td class="tg-yw4l">168</td>
    <td class="tg-yw4l">100</td>
    <td class="tg-yw4l">0</td>
    <td class="tg-yw4l">368</td>
    <td class="tg-yw4l">16</td>
    <td class="tg-yw4l">2</td>
    <td class="tg-yw4l">19</td>
    <td class="tg-yw4l">2</td>
    <td class="tg-yw4l">39</td>
    <td class="tg-yw4l">20</td>
    <td class="tg-yw4l">136</td>
    <td class="tg-yw4l">39</td>
    <td class="tg-yw4l">103</td>
    <td class="tg-yw4l">298</td>
  </tr>
  <tr>
    <td class="tg-yw4l">THD</td>
    <td class="tg-yw4l">62</td>
    <td class="tg-yw4l">35</td>
    <td class="tg-yw4l">55</td>
    <td class="tg-yw4l">49</td>
    <td class="tg-yw4l">201</td>
    <td class="tg-yw4l">17</td>
    <td class="tg-yw4l">17</td>
    <td class="tg-yw4l">62</td>
    <td class="tg-yw4l">29</td>
    <td class="tg-yw4l">125</td>
    <td class="tg-yw4l">58</td>
    <td class="tg-yw4l">269</td>
    <td class="tg-yw4l">84</td>
    <td class="tg-yw4l">54</td>
    <td class="tg-yw4l">465</td>
  </tr>
  <tr>
    <td class="tg-yw4l">DBpediaSpotlight</td>
    <td class="tg-yw4l">99</td>
    <td class="tg-yw4l">156</td>
    <td class="tg-yw4l">81</td>
    <td class="tg-yw4l">77</td>
    <td class="tg-yw4l">413</td>
    <td class="tg-yw4l">26</td>
    <td class="tg-yw4l">62</td>
    <td class="tg-yw4l">124</td>
    <td class="tg-yw4l">26</td>
    <td class="tg-yw4l">238</td>
    <td class="tg-yw4l">21</td>
    <td class="tg-yw4l">148</td>
    <td class="tg-yw4l">58</td>
    <td class="tg-yw4l">26</td>
    <td class="tg-yw4l">253</td>
  </tr>
  <tr>
    <td class="tg-yw4l">TextRazor</td>
    <td class="tg-yw4l">110</td>
    <td class="tg-yw4l">174</td>
    <td class="tg-yw4l">109</td>
    <td class="tg-yw4l">40</td>
    <td class="tg-yw4l">434</td>
    <td class="tg-yw4l">31</td>
    <td class="tg-yw4l">14</td>
    <td class="tg-yw4l">118</td>
    <td class="tg-yw4l">24</td>
    <td class="tg-yw4l">187</td>
    <td class="tg-yw4l">9</td>
    <td class="tg-yw4l">130</td>
    <td class="tg-yw4l">30</td>
    <td class="tg-yw4l">63</td>
    <td class="tg-yw4l">232</td>
  </tr>
  <tr>
    <td class="tg-yw4l">Multi-NER</td>
    <td class="tg-yw4l">117</td>
    <td class="tg-yw4l">219</td>
    <td class="tg-yw4l">130</td>
    <td class="tg-yw4l">92</td>
    <td class="tg-yw4l">558</td>
    <td class="tg-yw4l">54</td>
    <td class="tg-yw4l">91</td>
    <td class="tg-yw4l">214</td>
    <td class="tg-yw4l">66</td>
    <td class="tg-yw4l">425</td>
    <td class="tg-yw4l">4</td>
    <td class="tg-yw4l">85</td>
    <td class="tg-yw4l">9</td>
    <td class="tg-yw4l">11</td>
    <td class="tg-yw4l">108</td>
  </tr>
</table>

<table class="tg">
  <tr>
    <th class="tg-yw4l" rowspan="3"><br></th>
    <th class="tg-yw4l" colspan="15">OKE2016</th>
  </tr>
  <tr>
    <td class="tg-yw4l" colspan="5">TP</td>
    <td class="tg-yw4l" colspan="5">FP</td>
    <td class="tg-yw4l" colspan="5">FN</td>
  </tr>
  <tr>
    <td class="tg-yw4l">Place</td>
    <td class="tg-yw4l">People</td>
    <td class="tg-yw4l">Organization</td>
    <td class="tg-yw4l">Role</td>
    <td class="tg-yw4l">Total</td>
    <td class="tg-yw4l">Place</td>
    <td class="tg-yw4l">People</td>
    <td class="tg-yw4l">Organization</td>
    <td class="tg-yw4l">Role</td>
    <td class="tg-yw4l">Total</td>
    <td class="tg-yw4l">Place</td>
    <td class="tg-yw4l">People</td>
    <td class="tg-yw4l">Organization</td>
    <td class="tg-yw4l">Role</td>
    <td class="tg-yw4l">Total</td>
  </tr>
  <tr>
    <td class="tg-yw4l">NERD</td>
    <td class="tg-yw4l">40</td>
    <td class="tg-yw4l">47</td>
    <td class="tg-yw4l">71</td>
    <td class="tg-yw4l">51</td>
    <td class="tg-yw4l">209</td>
    <td class="tg-yw4l">1</td>
    <td class="tg-yw4l">3</td>
    <td class="tg-yw4l">30</td>
    <td class="tg-yw4l">6</td>
    <td class="tg-yw4l">40</td>
    <td class="tg-yw4l">4</td>
    <td class="tg-yw4l">58</td>
    <td class="tg-yw4l">34</td>
    <td class="tg-yw4l">35</td>
    <td class="tg-yw4l">131</td>
  </tr>
  <tr>
    <td class="tg-yw4l">SemiTags</td>
    <td class="tg-yw4l">36</td>
    <td class="tg-yw4l">57</td>
    <td class="tg-yw4l">67</td>
    <td class="tg-yw4l">1</td>
    <td class="tg-yw4l">161</td>
    <td class="tg-yw4l">5</td>
    <td class="tg-yw4l">2</td>
    <td class="tg-yw4l">7</td>
    <td class="tg-yw4l">1</td>
    <td class="tg-yw4l">15</td>
    <td class="tg-yw4l">8</td>
    <td class="tg-yw4l">48</td>
    <td class="tg-yw4l">38</td>
    <td class="tg-yw4l">85</td>
    <td class="tg-yw4l">179</td>
  </tr>
  <tr>
    <td class="tg-yw4l">THD</td>
    <td class="tg-yw4l">36</td>
    <td class="tg-yw4l">12</td>
    <td class="tg-yw4l">33</td>
    <td class="tg-yw4l">41</td>
    <td class="tg-yw4l">122</td>
    <td class="tg-yw4l">3</td>
    <td class="tg-yw4l">1</td>
    <td class="tg-yw4l">55</td>
    <td class="tg-yw4l">14</td>
    <td class="tg-yw4l">73</td>
    <td class="tg-yw4l">8</td>
    <td class="tg-yw4l">93</td>
    <td class="tg-yw4l">72</td>
    <td class="tg-yw4l">45</td>
    <td class="tg-yw4l">218</td>
  </tr>
  <tr>
    <td class="tg-yw4l">DBpediaSpotlight</td>
    <td class="tg-yw4l">38</td>
    <td class="tg-yw4l">70</td>
    <td class="tg-yw4l">56</td>
    <td class="tg-yw4l">64</td>
    <td class="tg-yw4l">228</td>
    <td class="tg-yw4l">5</td>
    <td class="tg-yw4l">7</td>
    <td class="tg-yw4l">93</td>
    <td class="tg-yw4l">14</td>
    <td class="tg-yw4l">119</td>
    <td class="tg-yw4l">6</td>
    <td class="tg-yw4l">35</td>
    <td class="tg-yw4l">49</td>
    <td class="tg-yw4l">22</td>
    <td class="tg-yw4l">112</td>
  </tr>
  <tr>
    <td class="tg-yw4l">TextRazor</td>
    <td class="tg-yw4l">36</td>
    <td class="tg-yw4l">57</td>
    <td class="tg-yw4l">83</td>
    <td class="tg-yw4l">31</td>
    <td class="tg-yw4l">207</td>
    <td class="tg-yw4l">15</td>
    <td class="tg-yw4l">4</td>
    <td class="tg-yw4l">79</td>
    <td class="tg-yw4l">12</td>
    <td class="tg-yw4l">110</td>
    <td class="tg-yw4l">8</td>
    <td class="tg-yw4l">48</td>
    <td class="tg-yw4l">22</td>
    <td class="tg-yw4l">55</td>
    <td class="tg-yw4l">133</td>
  </tr>
  <tr>
    <td class="tg-yw4l">Multi-NER</td>
    <td class="tg-yw4l">44</td>
    <td class="tg-yw4l">78</td>
    <td class="tg-yw4l">100</td>
    <td class="tg-yw4l">77</td>
    <td class="tg-yw4l">299</td>
    <td class="tg-yw4l">21</td>
    <td class="tg-yw4l">13</td>
    <td class="tg-yw4l">157</td>
    <td class="tg-yw4l">34</td>
    <td class="tg-yw4l">225</td>
    <td class="tg-yw4l">0</td>
    <td class="tg-yw4l">27</td>
    <td class="tg-yw4l">5</td>
    <td class="tg-yw4l">9</td>
    <td class="tg-yw4l">41</td>
  </tr>
</table>


Precision (P), recall (R) and F1-score (F1) per entity type on both datasets:

<table class="tg">
  <tr>
    <th class="tg-baqh" rowspan="2"></th>
    <th class="tg-baqh" colspan="3">Place</th>
    <th class="tg-baqh" colspan="3">People</th>
    <th class="tg-baqh" colspan="3">Organization</th>
    <th class="tg-baqh" colspan="3">Role</th>
  </tr>
  <tr>
    <td class="tg-baqh">P</td>
    <td class="tg-baqh">R</td>
    <td class="tg-baqh">F1</td>
    <td class="tg-baqh">P</td>
    <td class="tg-baqh">R</td>
    <td class="tg-baqh">F1</td>
    <td class="tg-baqh">P</td>
    <td class="tg-baqh">R</td>
    <td class="tg-baqh">F1</td>
    <td class="tg-baqh">P</td>
    <td class="tg-baqh">R</td>
    <td class="tg-baqh">F1</td>
  </tr>
  <tr>
    <td class="tg-baqh">OKE2015</td>
    <td class="tg-baqh">0.69</td>
    <td class="tg-baqh">0.98</td>
    <td class="tg-baqh">0.81</td>
    <td class="tg-baqh">0.70</td>
    <td class="tg-baqh">0.72</td>
    <td class="tg-baqh">0.71</td>
    <td class="tg-baqh">0.38</td>
    <td class="tg-baqh">0.94</td>
    <td class="tg-baqh">0.54</td>
    <td class="tg-baqh">0.59</td>
    <td class="tg-baqh">0.89</td>
    <td class="tg-baqh">0.71</td>
  </tr>
  <tr>
    <td class="tg-baqh">OKE2016</td>
    <td class="tg-baqh">0.68</td>
    <td class="tg-baqh">1.00</td>
    <td class="tg-baqh">0.81</td>
    <td class="tg-baqh">0.86</td>
    <td class="tg-baqh">0.74</td>
    <td class="tg-baqh">0.80</td>
    <td class="tg-baqh">0.39</td>
    <td class="tg-baqh">0.95</td>
    <td class="tg-baqh">0.55</td>
    <td class="tg-baqh">0.70</td>
    <td class="tg-baqh">0.90</td>
    <td class="tg-baqh">0.79</td>
  </tr>
</table>

