
Strange RT2KB and Typing scores #226

Open
jplu opened this issue Nov 25, 2017 · 16 comments

jplu commented Nov 25, 2017

Hello,

The RT2KB and Typing process gives strange scores compared to other scorers. Every time I run an RT2KB process on a NIF dataset, I get exactly the same score for Precision, Recall and F1, which is quite odd (see this example). If I evaluate the same output with two other scorers (neleval and conlleval), both give me identical results that are much higher than what RT2KB gives me (P = 0.717, R = 0.765, F1 = 0.740).

The description of RT2KB says "the annotator gets a text and shall recognize the entities inside and their types", so I'm curious how the three measures can be equal for Typing when they differ for Recognition.

Any light on this would be welcome :)

Thanks!

MichaelRoeder (Member) commented Nov 27, 2017

Thanks for that question. I can only give a general answer since you have uploaded a larger dataset. I think uploading an example with a single document for which the evaluation results differ would give us an easier way of comparing the evaluations 😉

In general, RT2KB does the following:

  1. it identifies entities that have been recognized correctly (Recognition step)
  2. from these correctly identified entities it takes the types and calculates the hierarchical F-measure for the types. (Errors in the recognition will lead to lower precision/recall in this calculation as well, since expected type information won't be available, etc.)

From the results for these two single steps, you can see that the benchmarked system gets a 0.76 F1-measure for each step. The combination of both therefore cannot score higher than that, and will most probably have a lower F1-measure, since correctly identified entities might have a (partly) wrong type.
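
To make the two-step logic concrete, here is a minimal Python sketch (the data structures and the flat type comparison are illustrative simplifications, not GERBIL's actual Java implementation, which compares type hierarchies):

def rt2kb_sketch(gold, system):
    """Sketch of RT2KB; gold/system are lists of (start, end, type_set) triples."""
    # Step 1 (Recognition): match mentions by exact character offsets.
    gold_spans = {(s, e): types for (s, e, types) in gold}
    sys_spans = {(s, e): types for (s, e, types) in system}
    matched = gold_spans.keys() & sys_spans.keys()

    # Step 2 (Typing): compare the types of the correctly recognized mentions.
    # (A flat set intersection stands in for the hierarchical F-measure here.)
    tp = fp = fn = 0
    for span in matched:
        shared = gold_spans[span] & sys_spans[span]
        tp += len(shared)
        fp += len(sys_spans[span] - shared)
        fn += len(gold_spans[span] - shared)

    # Recognition errors propagate: expected types of missed gold mentions
    # become fn, types of spurious system mentions become fp.
    fn += sum(len(gold_spans[s]) for s in gold_spans.keys() - matched)
    fp += sum(len(sys_spans[s]) for s in sys_spans.keys() - matched)
    return tp, fp, fn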

However, I would be happy to dig into this when you can provide a single example with the results from the other scorers 😃

jplu (Author) commented Nov 27, 2017

Thanks @MichaelRoeder. I will prepare a single document with specific details on how to reproduce this with GERBIL and the two other scorers ASAP and share it in this thread :)

jplu (Author) commented Nov 27, 2017

The result with the conlleval scorer was a coincidence: it does not evaluate by "offset" but by "token", so the way it evaluates the recognition is different. Sorry for that.

However, the neleval scorer has a behavior similar to RT2KB and still produces a different result on this single document. Here are the GERBIL results and here is the TAC output (understood by the neleval scorer):

Gold Standard in TAC:

document-75	0	14	NIL0	0.0	http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Person
document-75	33	38	http://dbpedia.org/resource/Paris	0.0	http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Place
document-75	74	77	NIL0	0.0	http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Person
document-75	92	94	NIL0	0.0	http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Person
document-75	116	132	NIL1	0.0	http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Organization
document-75	136	156	http://dbpedia.org/resource/Thessaloniki	0.0	http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Place
document-75	158	161	NIL0	0.0	http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Person
document-75	162	168	http://dbpedia.org/resource/Mother	0.0	http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Role
document-75	170	184	NIL2	0.0	http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Person
document-75	205	212	http://dbpedia.org/resource/Actor	0.0	http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Role
document-75	227	241	NIL3	0.0	http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Person

The equivalent in NIF:

@prefix nif:		<http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix itsrdf:		<http://www.w3.org/2005/11/its/rdf#> .
@prefix dul:		<http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#> .
@prefix xsd:		<http://www.w3.org/2001/XMLSchema#> .
@prefix dbpedia:	<http://dbpedia.org/resource/> .
@prefix rdf:		<http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix oke:		<http://aksw.org/notInWiki/> .

<http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=0,242>
		a				nif:String, nif:RFC5147String, nif:Context ;
		nif:beginIndex	"0"^^xsd:nonNegativeInteger ;
		nif:endIndex	"242"^^xsd:nonNegativeInteger ;
		nif:isString	"Albert Modiano (1912–77, born in Paris), was of Italian Jewish origin; on his paternal side he was descended from a Sephardic family of Thessaloniki, Greece. His mother, Louisa Colpijn (1918-2015), was an actress also known as Louisa Colpeyn."@en .

<http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=0,14>
		a						nif:String, nif:RFC5147String, nif:Phrase ;
		nif:anchorOf			"Albert Modiano"@en ;
		nif:beginIndex			"0"^^xsd:nonNegativeInteger ;
		nif:endIndex			"14"^^xsd:nonNegativeInteger ;
		nif:referenceContext	<http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=0,242> ;
		itsrdf:taIdentRef		oke:Albert_Modiano ;
		itsrdf:taClassRef		dul:Person ;
		itsrdf:taSource			"DBpedia 2014"^^xsd:string .

<http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=33,38>
		a						nif:String, nif:RFC5147String, nif:Phrase ;
		nif:anchorOf			"Paris"@en ;
		nif:beginIndex			"33"^^xsd:nonNegativeInteger ;
		nif:endIndex			"38"^^xsd:nonNegativeInteger ;
		nif:referenceContext	<http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=0,242> ;
		itsrdf:taIdentRef		dbpedia:Paris ;
		itsrdf:taClassRef		dul:Place ;
		itsrdf:taSource			"DBpedia 2014"^^xsd:string .

<http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=74,77>
		a						nif:String, nif:RFC5147String, nif:Phrase ;
		nif:anchorOf			"his"@en ;
		nif:beginIndex			"74"^^xsd:nonNegativeInteger ;
		nif:endIndex			"77"^^xsd:nonNegativeInteger ;
		nif:referenceContext	<http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=0,242> ;
		itsrdf:taIdentRef		oke:Albert_Modiano ;
		itsrdf:taClassRef		dul:Person ;
		itsrdf:taSource			"DBpedia 2014"^^xsd:string .

<http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=92,94>
		a						nif:String, nif:RFC5147String, nif:Phrase ;
		nif:anchorOf			"he"@en ;
		nif:beginIndex			"92"^^xsd:nonNegativeInteger ;
		nif:endIndex			"94"^^xsd:nonNegativeInteger ;
		nif:referenceContext	<http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=0,242> ;
		itsrdf:taIdentRef		oke:Albert_Modiano ;
		itsrdf:taClassRef		dul:Person ;
		itsrdf:taSource			"DBpedia 2014"^^xsd:string .

<http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=116,132>
		a						nif:String, nif:RFC5147String, nif:Phrase ;
		nif:anchorOf			"Sephardic family"@en ;
		nif:beginIndex			"116"^^xsd:nonNegativeInteger ;
		nif:endIndex			"132"^^xsd:nonNegativeInteger ;
		nif:referenceContext	<http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=0,242> ;
		itsrdf:taIdentRef		oke:Sephardi_family ;
		itsrdf:taClassRef		dul:Organization ;
		itsrdf:taSource			"DBpedia 2014"^^xsd:string .

<http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=136,156>
		a						nif:String, nif:RFC5147String, nif:Phrase ;
		nif:anchorOf			"Thessaloniki, Greece"@en ;
		nif:beginIndex			"136"^^xsd:nonNegativeInteger ;
		nif:endIndex			"156"^^xsd:nonNegativeInteger ;
		nif:referenceContext	<http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=0,242> ;
		itsrdf:taIdentRef		dbpedia:Thessaloniki ;
		itsrdf:taClassRef		dul:Place ;
		itsrdf:taSource			"DBpedia 2014"^^xsd:string .

<http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=158,161>
		a						nif:String, nif:RFC5147String, nif:Phrase ;
		nif:anchorOf			"His"@en ;
		nif:beginIndex			"158"^^xsd:nonNegativeInteger ;
		nif:endIndex			"161"^^xsd:nonNegativeInteger ;
		nif:referenceContext	<http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=0,242> ;
		itsrdf:taIdentRef		oke:Albert_Modiano ;
		itsrdf:taClassRef		dul:Person ;
		itsrdf:taSource			"DBpedia 2014"^^xsd:string .

<http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=162,168>
		a						nif:String, nif:RFC5147String, nif:Phrase ;
		nif:anchorOf			"mother"@en ;
		nif:beginIndex			"162"^^xsd:nonNegativeInteger ;
		nif:endIndex			"168"^^xsd:nonNegativeInteger ;
		nif:referenceContext	<http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=0,242> ;
		itsrdf:taIdentRef		dbpedia:Mother ;
		itsrdf:taClassRef		dul:Role ;
		itsrdf:taSource			"DBpedia 2014"^^xsd:string .

<http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=170,184>
		a						nif:String, nif:RFC5147String, nif:Phrase ;
		nif:anchorOf			"Louisa Colpijn"@en ;
		nif:beginIndex			"170"^^xsd:nonNegativeInteger ;
		nif:endIndex			"184"^^xsd:nonNegativeInteger ;
		nif:referenceContext	<http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=0,242> ;
		itsrdf:taIdentRef		oke:Louisa_Colpijn ;
		itsrdf:taClassRef		dul:Person ;
		itsrdf:taSource			"DBpedia 2014"^^xsd:string .

<http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=205,212>
		a						nif:String, nif:RFC5147String, nif:Phrase ;
		nif:anchorOf			"actress"@en ;
		nif:beginIndex			"205"^^xsd:nonNegativeInteger ;
		nif:endIndex			"212"^^xsd:nonNegativeInteger ;
		nif:referenceContext	<http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=0,242> ;
		itsrdf:taIdentRef		dbpedia:Actor ;
		itsrdf:taClassRef		dul:Role ;
		itsrdf:taSource			"DBpedia 2014"^^xsd:string .

<http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=227,241>
		a						nif:String, nif:RFC5147String, nif:Phrase ;
		nif:anchorOf			"Louisa Colpeyn"@en ;
		nif:beginIndex			"227"^^xsd:nonNegativeInteger ;
		nif:endIndex			"241"^^xsd:nonNegativeInteger ;
		nif:referenceContext	<http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=0,242> ;
		itsrdf:taIdentRef		oke:Louisa_Colpeyn ;
		itsrdf:taClassRef		dul:Person ;
		itsrdf:taSource			"DBpedia 2014"^^xsd:string .

System output in TAC:

document-75	170	184	http://dbpedia.org/resource/National_Register_of_Historic_Places_listings_in_Iowa	5.4756873E-7	http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Person
document-75	136	156	http://dbpedia.org/resource/Greece	1.4326925E-5	http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Place
document-75	0	14	http://dbpedia.org/resource/University_of_Chicago	5.789066E-6	http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Person
document-75	116	135	http://dbpedia.org/resource/Family_(biology)	3.2394513E-5	http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Organization
document-75	205	212	http://dbpedia.org/resource/Actor	2.6748134E-5	http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Role
document-75	33	39	http://dbpedia.org/resource/Paris	4.2364663E-5	http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Place
document-75	158	161	http://dbpedia.org/resource/Hit_(baseball)	2.1313697E-6	http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Person
document-75	92	94	http://dbpedia.org/resource/Netherlands	1.5448735E-5	http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Person
document-75	74	86	http://dbpedia.org/resource/Rhineland-Palatinate	4.3240807E-6	http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Person
document-75	227	233	http://dbpedia.org/resource/List_of_Animaniacs_characters	4.727223E-7	http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Role
document-75	234	241	NILfbc8560d-e7b1-4207-8856-0de7b142075f	0.0	http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Person
document-75	162	168	http://dbpedia.org/resource/Scotland	2.1532596E-5	http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#Role

Here is the equivalent NIF output:

@prefix rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd:   <http://www.w3.org/2001/XMLSchema#> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
@prefix nif:   <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix dul:   <http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#> .

<http://localhost/entity/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=170,184>
        a                     nif:String , nif:RFC5147String , nif:Phrase ;
        nif:anchorOf          "Louisa Colpijn" ;
        nif:beginIndex        "170"^^xsd:nonNegativeInteger ;
        nif:endIndex          "184"^^xsd:nonNegativeInteger ;
        nif:referenceContext  <http://localhost/context/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=0,242> ;
        itsrdf:taClassRef     dul:Person .

<http://localhost/entity/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=92,94>
        a                     nif:String , nif:RFC5147String , nif:Phrase ;
        nif:anchorOf          "he" ;
        nif:beginIndex        "92"^^xsd:nonNegativeInteger ;
        nif:endIndex          "94"^^xsd:nonNegativeInteger ;
        nif:referenceContext  <http://localhost/context/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=0,242> ;
        itsrdf:taClassRef     dul:Person .

<http://localhost/entity/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=0,14>
        a                     nif:String , nif:RFC5147String , nif:Phrase ;
        nif:anchorOf          "Albert Modiano" ;
        nif:beginIndex        "0"^^xsd:nonNegativeInteger ;
        nif:endIndex          "14"^^xsd:nonNegativeInteger ;
        nif:referenceContext  <http://localhost/context/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=0,242> ;
        itsrdf:taClassRef     dul:Person .

<http://localhost/entity/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=136,156>
        a                     nif:String , nif:RFC5147String , nif:Phrase ;
        nif:anchorOf          "Thessaloniki, Greece" ;
        nif:beginIndex        "136"^^xsd:nonNegativeInteger ;
        nif:endIndex          "156"^^xsd:nonNegativeInteger ;
        nif:referenceContext  <http://localhost/context/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=0,242> ;
        itsrdf:taClassRef     dul:Place .

<http://localhost/entity/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=33,39>
        a                     nif:String , nif:RFC5147String , nif:Phrase ;
        nif:anchorOf          "Paris)" ;
        nif:beginIndex        "33"^^xsd:nonNegativeInteger ;
        nif:endIndex          "39"^^xsd:nonNegativeInteger ;
        nif:referenceContext  <http://localhost/context/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=0,242> ;
        itsrdf:taClassRef     dul:Place .

<http://localhost/entity/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=234,241>
        a                     nif:String , nif:RFC5147String , nif:Phrase ;
        nif:anchorOf          "Colpeyn" ;
        nif:beginIndex        "234"^^xsd:nonNegativeInteger ;
        nif:endIndex          "241"^^xsd:nonNegativeInteger ;
        nif:referenceContext  <http://localhost/context/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=0,242> ;
        itsrdf:taClassRef     dul:Person .

<http://localhost/entity/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=158,161>
        a                     nif:String , nif:RFC5147String , nif:Phrase ;
        nif:anchorOf          "His" ;
        nif:beginIndex        "158"^^xsd:nonNegativeInteger ;
        nif:endIndex          "161"^^xsd:nonNegativeInteger ;
        nif:referenceContext  <http://localhost/context/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=0,242> ;
        itsrdf:taClassRef     dul:Person .

<http://localhost/entity/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=205,212>
        a                     nif:String , nif:RFC5147String , nif:Phrase ;
        nif:anchorOf          "actress" ;
        nif:beginIndex        "205"^^xsd:nonNegativeInteger ;
        nif:endIndex          "212"^^xsd:nonNegativeInteger ;
        nif:referenceContext  <http://localhost/context/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=0,242> ;
        itsrdf:taClassRef     dul:Role .

<http://localhost/context/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=0,242>
        a               nif:String , nif:RFC5147String , nif:Context ;
        nif:beginIndex  "0"^^xsd:nonNegativeInteger ;
        nif:endIndex    "242"^^xsd:nonNegativeInteger ;
        nif:isString    "Albert Modiano (1912–77, born in Paris), was of Italian Jewish origin; on his paternal side he was descended from a Sephardic family of Thessaloniki, Greece. His mother, Louisa Colpijn (1918-2015), was an actress also known as Louisa Colpeyn."@en .

<http://localhost/entity/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=74,86>
        a                     nif:String , nif:RFC5147String , nif:Phrase ;
        nif:anchorOf          "his paternal" ;
        nif:beginIndex        "74"^^xsd:nonNegativeInteger ;
        nif:endIndex          "86"^^xsd:nonNegativeInteger ;
        nif:referenceContext  <http://localhost/context/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=0,242> ;
        itsrdf:taClassRef     dul:Person .

<http://localhost/entity/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=227,233>
        a                     nif:String , nif:RFC5147String , nif:Phrase ;
        nif:anchorOf          "Louisa" ;
        nif:beginIndex        "227"^^xsd:nonNegativeInteger ;
        nif:endIndex          "233"^^xsd:nonNegativeInteger ;
        nif:referenceContext  <http://localhost/context/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=0,242> ;
        itsrdf:taClassRef     dul:Role .

<http://localhost/entity/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=162,168>
        a                     nif:String , nif:RFC5147String , nif:Phrase ;
        nif:anchorOf          "mother" ;
        nif:beginIndex        "162"^^xsd:nonNegativeInteger ;
        nif:endIndex          "168"^^xsd:nonNegativeInteger ;
        nif:referenceContext  <http://localhost/context/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=0,242> ;
        itsrdf:taClassRef     dul:Role .

<http://localhost/entity/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=116,135>
        a                     nif:String , nif:RFC5147String , nif:Phrase ;
        nif:anchorOf          "Sephardic family of" ;
        nif:beginIndex        "116"^^xsd:nonNegativeInteger ;
        nif:endIndex          "135"^^xsd:nonNegativeInteger ;
        nif:referenceContext  <http://localhost/context/1bb2b337-4a09-49b1-b2b3-374a0909b17a#char=0,242> ;
        itsrdf:taClassRef     dul:Organization .

The scorer is available here and the command line to run the evaluation is:

./nel evaluate -m strong_typed_mention_match -f tab -g gold_standard.tac system_output.tac

And here is the output I get:

ptp     fp      rtp     fn      precis  recall  fscore  measure
7       5       7       4       0.583   0.636   0.609   strong_typed_mention_match

rtroncy commented Nov 27, 2017

It seems to me that the neleval output:

ptp     fp      rtp     fn      precis  recall  fscore  measure
7       5       7       4       0.583   0.636   0.609   strong_typed_mention_match

corresponds to the "Entity Recognition" score provided by GERBIL at http://gerbil.aksw.org/gerbil/experiment?id=201711270022

However, the strong_typed_mention_match SHOULD correspond to the "Entity Typing". Is this the issue?

jplu (Author) commented Nov 27, 2017

No, it should correspond to the first line, where 0.4375 | 0.4375 | 0.4375 is written. Entity Typing is something else.

Basically "strong_typed_mention_match" in neleval == "RT2KB" in GERBIL and "strong_mention_match" in neleval == "Entity Recognition" in GERBIL.

The example I gave is a case where the typed-mention score (strong_typed_mention_match) is equal to the recognition score (strong_mention_match), because all 7 correctly extracted mentions (out of 11 in total) have their proper type attached. Look at the "TP", "FN" and "FP" values: they are equal:

./nel evaluate -m strong_mention_match -f tab -g gold_standard.tac system_output.tac
ptp     fp      rtp     fn      precis  recall  fscore  measure
7       5       7       4       0.583   0.636   0.609   strong_mention_match
./nel evaluate -m strong_typed_mention_match -f tab -g gold_standard.tac system_output.tac
ptp     fp      rtp     fn      precis  recall  fscore  measure
7       5       7       4       0.583   0.636   0.609   strong_typed_mention_match

MichaelRoeder (Member):

@jplu thanks for this example. Going through it manually, I have calculated the same result as the neleval script.

GS start | GS length | GS URI | GS type | Sys start | Sys length | Sys type | Erec matching | hier. matching
0 | 14 | aksw:Albert_Modiano | dul:Person | 0 | 14 | dul:Person | tp | tp
33 | 5 | dbr:Paris | dul:Place | 33 | 6 | dul:Place | fp, fn | fp, fn
74 | 3 | aksw:Albert_Modiano | dul:Person | 74 | 12 | dul:Person | fp, fn | fp, fn
92 | 2 | aksw:Albert_Modiano | dul:Person | 92 | 2 | dul:Person | tp | tp
116 | 16 | aksw:Sephardi_family | dul:Organization | 116 | 19 | dul:Organization | fp, fn | fp, fn
136 | 20 | dbr:Thessaloniki | dul:Place | 136 | 20 | dul:Place | tp | tp
158 | 3 | aksw:Albert_Modiano | dul:Person | 158 | 3 | dul:Person | tp | tp
162 | 6 | dbr:Mother | dul:Role | 162 | 6 | dul:Role | tp | tp
170 | 14 | aksw:Louisa_Colpijn | dul:Person | 170 | 14 | dul:Person | tp | tp
205 | 7 | dbr:Actor | dul:Role | 205 | 7 | dul:Role | tp | tp
227 | 14 | aksw:Louisa_Colpeyn | dul:Person | 227 | 6 | dul:Role | fp, fn | fp, fn
--- | --- | --- | --- | 234 | 7 | dul:Person | fp | fp

These numbers lead to precision=0.583, recall=0.636 and F1-score=0.609.
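
For reference, the arithmetic behind these scores is the standard micro-averaged computation over the counts in the table:

tp, fp, fn = 7, 5, 4                                # counts from the table above
precision = tp / (tp + fp)                          # 7/12 ≈ 0.583
recall = tp / (tp + fn)                             # 7/11 ≈ 0.636
f1 = 2 * precision * recall / (precision + recall)  # ≈ 0.609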

So what I have gathered so far is that GERBIL identifies the cases as described in the table above. However, the numbers calculated from these counts are not correct. We will search for the problem and update GERBIL.

jplu (Author) commented Nov 28, 2017

Thanks @MichaelRoeder! Let me know once the bug is fixed.

TortugaAttack (Contributor) commented Dec 10, 2017

Hi,

sorry it took me so long. Much to do right now.

Is there an open endpoint, or could you provide me the ADEL web service URL? (here or via DM)
It would be much easier for me to check against the actual web service.

MichaelRoeder (Member):

@TortugaAttack I have reproduced the problem using the two NIF files listed above. You can use the FileBasedNIFDataset to load the data and the InstanceListBasedAnnotator to load the result file of the annotator and simulate the behaviour of an annotator (you have to make sure that the URIs of the documents in both files are the same - I think the annotator result NIF above uses a different URI for the document, which needs to be replaced).

Based on that, you should add a JUnit test (you can copy and adapt the SingleRunTest for that).

TortugaAttack (Contributor):

Well, I found a problem in the hierarchical F-measure calculation.

If the annotator provides a wrong result, e.g.:

<http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=33,39>
        a                     nif:String , nif:RFC5147String , nif:Phrase ;
        nif:anchorOf          "Paris)" ;
        nif:beginIndex        "33"^^xsd:nonNegativeInteger ;
        nif:endIndex          "39"^^xsd:nonNegativeInteger ;
        nif:referenceContext  <http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=0,242> ;
        itsrdf:taClassRef     dul:Place .

It will be counted as tp=0, fp=1, fn=0.
By removing the part where annotations are not in the gold standard, the results match yours.

I guess it is debatable here whether ETyping should acknowledge Recognition too.
I can remove it, and everything would match the results, or leave it in there, and we should document this behavior in the wiki.
The unit test will be changed according to what it should be.

MichaelRoeder (Member):

I do not see how this solves the issue, since we have to count it as a false positive - as it is done in the table above as well. However, if removing it solved the problem for you, it might be that we are counting it twice... right?

TortugaAttack (Contributor) commented Dec 11, 2017

No, it is not done in the table above.
In the table above you have the 11 entities which are in the gold standard (and one with "---", I am not sure what you mean by that).
In GERBIL we currently have 16: the 11 gold standard entities (which are counted correctly according to the table) + 5 from the annotator which are not in the gold standard.

Again, for example:

<http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=33,39>
        a                     nif:String , nif:RFC5147String , nif:Phrase ;
        nif:anchorOf          "Paris)" ;
        nif:beginIndex        "33"^^xsd:nonNegativeInteger ;
        nif:endIndex          "39"^^xsd:nonNegativeInteger ;
        nif:referenceContext  <http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=0,242> ;
        itsrdf:taClassRef     dul:Place .

is not counted in your table.
If we ignore those entities that are not in the gold standard, we get the results you calculated.
If not, we get different results.

MichaelRoeder (Member):

The table is structured by the gold standard entities (11) and the entities of the system answers (12) mapped to them. The last system answer does not match any gold standard entity (that is the reason for the "---"). Apart from that, there are 4 entities from the system that do not exactly match the gold standard (like the "Paris)" example you described). So the table does contain 16 distinct entities 😉

TortugaAttack pushed a commit that referenced this issue Dec 11, 2017
MichaelRoeder (Member) commented Dec 11, 2017

  1. We fixed a bug in the hierarchical F1-measure counting that could lead to doubling the number of fp counts.

  2. Apart from that, there was a misunderstanding in the calculation of the hierarchical F-measure, and the table that I posted before shows exactly that misunderstanding: when evaluating the results of an annotation system, the evaluation cannot match "Paris" and "Paris)" as we did in the table above. A human would automatically put them in the same line, but for the evaluation these two entities are different and have to be handled separately. Thus, the updated table looks like the following.

GS start | GS length | GS URI | GS type | Sys start | Sys length | Sys type | Erec matching | hier. matching | hier. prec | hier. recall | hier. F1
0 | 14 | aksw:Albert_Modiano | dul:Person | 0 | 14 | dul:Person | tp | tp | 1.0 | 1.0 | 1.0
33 | 5 | dbr:Paris | dul:Place | --- | --- | --- | fn | fn | 0.0 | 0.0 | 0.0
74 | 3 | aksw:Albert_Modiano | dul:Person | 74 | 12 | dul:Person | fn | fn | 0.0 | 0.0 | 0.0
92 | 2 | aksw:Albert_Modiano | dul:Person | 92 | 2 | dul:Person | tp | tp | 1.0 | 1.0 | 1.0
116 | 16 | aksw:Sephardi_family | dul:Organization | 116 | 19 | dul:Organization | fn | fn | 0.0 | 0.0 | 0.0
136 | 20 | dbr:Thessaloniki | dul:Place | 136 | 20 | dul:Place | tp | tp | 1.0 | 1.0 | 1.0
158 | 3 | aksw:Albert_Modiano | dul:Person | 158 | 3 | dul:Person | tp | tp | 1.0 | 1.0 | 1.0
162 | 6 | dbr:Mother | dul:Role | 162 | 6 | dul:Role | tp | tp | 1.0 | 1.0 | 1.0
170 | 14 | aksw:Louisa_Colpijn | dul:Person | 170 | 14 | dul:Person | tp | tp | 1.0 | 1.0 | 1.0
205 | 7 | dbr:Actor | dul:Role | 205 | 7 | dul:Role | tp | tp | 1.0 | 1.0 | 1.0
227 | 14 | aksw:Louisa_Colpeyn | dul:Person | 227 | 6 | dul:Role | fn | fn | 0.0 | 0.0 | 0.0
--- | --- | --- | --- | 33 | 6 | dul:Place | fp | fp | 0.0 | 0.0 | 0.0
--- | --- | --- | --- | 74 | 12 | dul:Person | fp | fp | 0.0 | 0.0 | 0.0
--- | --- | --- | --- | 116 | 19 | dul:Organization | fp | fp | 0.0 | 0.0 | 0.0
--- | --- | --- | --- | 227 | 6 | dul:Role | fp | fp | 0.0 | 0.0 | 0.0
--- | --- | --- | --- | 234 | 7 | dul:Person | fp | fp | 0.0 | 0.0 | 0.0

For the recognition of entities, there is no difference, since we can simply sum up the tp, fp and fn counts. However, for the hierarchical F-measure, this is not possible. When evaluating the typing, we have to compare trees/hierarchies of types, which can lead to more than one tp, fp or fn per comparison. Since we want to treat the single entities equally, GERBIL calculates precision, recall and F1-measure for every entity (shown in the table above). The averages of these values are the precision, recall and F1-measure scores for the complete document (for the example above: precision = 7/16, recall = 7/16 and F1-score = 7/16).
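
To make the averaging concrete, the document-level Typing score can be reproduced in a few lines (a sketch over the per-entity values from the table above, not GERBIL code):

# Per-entity typing scores from the updated table: 7 entities with
# precision = recall = F1 = 1.0 and 9 entities with 0.0.
per_entity_scores = [1.0] * 7 + [0.0] * 9

macro_average = sum(per_entity_scores) / len(per_entity_scores)
print(macro_average)  # 7/16 = 0.4375

# Because every entity's precision, recall and F1 are identical here
# (either all 1.0 or all 0.0), the three averaged Typing scores coincide
# at 0.4375 - which explains the equal values observed in the experiment.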

@jplu @rtroncy I know it is not the most intuitive implementation 😃. It is arguable whether it is okay to count a "missed" entity not only as fn but with precision and recall = 0, and then to count the (nearly matching) fp entity again with precision and recall = 0. The only alternative I can think of is a complicated weighting of the hierarchical tp, fp and fn counts to ensure that entities with a complex type hierarchy don't have a larger influence on the result than entities with an "easy" set of types.

jplu (Author) commented Dec 11, 2017

Thanks @MichaelRoeder and @TortugaAttack. I perfectly understand your concerns about the scoring issue I raised, but my point is mainly to be aligned with the well-known and popular neleval scorer.

Personally I think that the annotation:

<http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=33,39>
        a                     nif:String , nif:RFC5147String , nif:Phrase ;
        nif:anchorOf          "Paris)" ;
        nif:beginIndex        "33"^^xsd:nonNegativeInteger ;
        nif:endIndex          "39"^^xsd:nonNegativeInteger ;
        nif:referenceContext  <http://www.ontologydesignpatterns.org/data/oke-challenge-2015/task-1/document-75#char=0,242> ;
        itsrdf:taClassRef     dul:Place .

must be counted as "false positive" AND "false negative" (if the system does not propose nested entities), because the offsets do not match; and then the type, even if it is the good one, should not be taken as a true positive but also as "false positive" AND "false negative", as in the recognition step. This is how neleval works, and I'm OK with that because it seems logical to me.
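
A minimal sketch of that counting scheme as described above (inferred from the behaviour discussed in this thread, not taken from neleval's source):

def strong_typed_mention_match(gold, system):
    """gold/system: sets of (doc_id, start, end, type) tuples."""
    tp = len(gold & system)
    # A mention with wrong offsets counts as fp even if its type is correct
    # (the "Paris)" case), and the unmatched gold "Paris" counts as fn.
    fp = len(system - gold)
    fn = len(gold - system)
    return tp, fp, fn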

Please, can you let me know once the fix is pushed to the public instance of GERBIL? I will then rerun my scoring script and compare GERBIL and neleval.

MichaelRoeder (Member):

Of course, we will let you know. However, I think we still have a small misunderstanding.

Let's focus on the "Paris" / "Paris)" example. I totally agree that the recognition step has to count this as fp AND fn. I think there is no discussion regarding this point 😉
I want to underline that the typing step is not able to see "Paris)" as an attempt to match "Paris". It will handle them as two separate entities and calculate precision, recall and F1-measure for each of them (for the reasons explained above). Therefore, it will count this twice with precision, recall, F1-score = 0 (not 1×fp and 1×fn), which leads to the overall evaluation scores of precision, recall, F1-score = 0.4375, which might be lower than expected.
