## Named Entity Recognition (Localization & Classification)

In this assignment, you will implement a named entity recognition (NER) system. The goal of NER is to identify and classify named entities in text. For example, in the sentence "I went to the University of Illinois at Urbana-Champaign", the named entities are "University of Illinois at Urbana-Champaign" (ORGANIZATION) and "Urbana-Champaign" (LOCATION).

^ Generated by Copilot. Wow, it used the repo name to generate the location... wild.

* RUFES: https://tac.nist.gov/2020/KBP/RUFES/index.html
* Task description paper: https://blender.cs.illinois.edu/paper/kbp2020overview.pdf

### Todo: 

#### Train
* Refer to Section 5 for some code of baseline models.
* Please use the gold mentions in the tab file for training.
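
As a starting point for the training step, the gold mentions could be loaded into (mention string, type labels) pairs. This is a minimal sketch that assumes the eight tab-separated columns seen in the example rows of the output format section (run ID, annotation ID, a text field, `doc_id:start-end`, mention string, `;`-separated types, mention type, confidence). The column interpretation, the `GoldMention` tuple, and the `load_gold_mentions` helper are hypothetical; check them against the task description before relying on them.

```python
import csv
import io
from typing import Iterable, List, NamedTuple


class GoldMention(NamedTuple):
    doc_id: str
    start: int
    end: int
    mention: str
    types: List[str]    # e.g. ["PER", "PER.Politician"]
    mention_type: str   # e.g. "NAM" or "NOM"


def load_gold_mentions(lines: Iterable[str]) -> List[GoldMention]:
    """Parse tab-separated gold rows; the column order is an assumption."""
    mentions = []
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) != 8:
            continue  # skip blank or malformed lines
        _run, _ann_id, _text, span, mention, types, mention_type, _conf = row
        doc_id, offsets = span.rsplit(":", 1)
        start, end = offsets.split("-")
        mentions.append(
            GoldMention(doc_id, int(start), int(end), mention.strip(),
                        types.split(";"), mention_type)
        )
    return mentions


# One well-formed row adapted from the examples in this README.
sample = io.StringIO(
    "NIST\tannotation-1\tMitt Romney\t"
    "20111231_WAPO_2ee2b1ca-33d9-11e1-a274-61fcdeecc5f5:153-164\t"
    "Mitt Romney\tPER;PER.Politician\tNAM\t1.0\n"
)
gold = load_gold_mentions(sample)
pairs = [(m.mention, m.types) for m in gold]
print(pairs)
```

For a real file, the same function can be given a file object opened with `open(path, encoding="utf-8", newline="")` in place of the `StringIO` sample.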

#### Evaluate
* Evaluate ONLY the **entity typing subtask**.
* You need to use the extracted mentions that we provided. In real-world applications, gold mentions are typically not given.
* Generate a prediction file and use the official scorer to compute a score.

* Evaluation: https://github.com/shahraj81/rufes

```
python score_submission.py -l ../../input/demo/log.txt -r demo ../../input/log_specifications.txt ../../input/demo/gold.tab ../../input/demo/system_output.tab ../../input/demo/scores/
```

#### Other tools
* You can use any tools you want to implement your system. For example, you can use the Stanford CoreNLP toolkit to extract mentions and use the AllenNLP toolkit to train a model.

* Fine-grained entity recognition: https://github.com/xiaoling/figer
* Ultra-fine grained entity typing: https://homes.cs.washington.edu/~eunsol/open_entity.html
* Neural Entity Typing with Knowledge Attention: https://github.com/thunlp/KNET

#### Submission
* A report describing your methods, results, and findings.
* The code, which should include a README.md with the environment and running instructions, and at least a `predictions.tab` file containing your predictions on the test set.


#### Output format

* Each entity has one or more mentions, which can be name (`NAM`), nominal (`NOM`), and/or pronominal mentions.
* Classify into 

* 3 sub-tasks: mention extraction, coreference resolution, and entity typing.


```
NIST	annotation-0	rich	20121213_WAPO_89fb0ad94e02f710be5c6a3aa2f71402:46-49	the rich	PER	NOM	1.0
NIST	annotation-1	Mitt Romney	20111231_WAPO_2ee2b1ca-33d9-11e1-a274-61fcdeecc5f5:153-164	Mitt Romney	PER;PER.Politician	NAM1	1.0
NIST	annotation-2	Obama	20111231_WAPO_2ee2b1ca-33d9-11e1-a274-61fcdeecc5f5:1098-1102	    Obama	PER;PER.Politician;PER.Politician.HeadOfGovernment	NAM	1.1
NIST	annotation-3	Mitt Romney	20111231_WAPO_2ee2b1ca-33d9-11e1-a274-61fcdeecc5f5:153-164	Mitt Romney	PER;PER.Politician1	NAM	1.0
NIST	annotation-4	Obama	20111231_WAPO_2ee2b1ca-33d9-11e1-a274-61fcdeecc5f5:1098-1102	    Obama	PER;PER.Politician;PER.Politician.HeadOfGovernment1	NAM	1.0
NIST	annotation-5	Mitt Romney	20111231_WAPO_2ee2b1ca-33d9-11e1-a274-61fcdeecc5f5:164-153	Mitt Romney	PER;PER.Politician	NAM	1.0
NIST	annotation-6	Obama	20111231_WAPO_2ee2b1ca-33d9-11e1-a274-61fcdeecc5f5:1098-110200	  Obama	PER;PER.Politician;PER.Politician.HeadOfGovernment	NAM	1.0
NIST	annotation-7	Mitt Romney	20111231_WAPO_2ee2b1ca-33d9-11e1-a274-61fcdeecc5f5a:153-164	Mitt Romney	PER;PER.Politician	NAM	1.0
NIST	annotation-8	Obama	20111231_WAPO_2ee2b1ca-33d9-11e1-a274-61fcdeecc5f5:-1098-1102	    Obama	PER;PER.Politician;PER.Politician.HeadOfGovernment	NAM	1.0
```
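
Several of the rows above appear to be malformed (for example `NAM1`, a confidence of `1.1`, or a span whose start exceeds its end). A small pre-submission check along the following lines might catch such problems before running the official scorer. The rules are assumptions inferred from those examples, not the scorer's actual checks; `check_row`, the `PRO` code for pronominal mentions, and the `DOC` placeholder document ID are hypothetical. Checks that need the source documents (unknown document IDs, offsets past the end of the text) are out of scope here.

```python
import re
from typing import List

# Assumed set of mention-type codes (name, nominal, pronominal).
MENTION_TYPES = {"NAM", "NOM", "PRO"}
SPAN_RE = re.compile(r"^(?P<doc>[^:]+):(?P<start>\d+)-(?P<end>\d+)$")
TYPE_RE = re.compile(r"^[A-Za-z]+(\.[A-Za-z]+)*$")


def check_row(line: str) -> List[str]:
    """Return the problems found in one tab-separated row (empty list if none)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 8:
        return [f"expected 8 columns, got {len(cols)}"]
    span, types, mention_type, conf = cols[3], cols[5], cols[6], cols[7]
    problems = []

    m = SPAN_RE.match(span)
    if m is None:
        problems.append(f"bad span: {span}")
    elif int(m["start"]) > int(m["end"]):
        problems.append(f"start after end: {span}")

    labels = types.split(";")
    for label in labels:
        if not TYPE_RE.match(label):
            problems.append(f"bad type label: {label}")
        # Assumed rule: a fine-grained label is listed together with its parent.
        parent = label.rpartition(".")[0]
        if parent and parent not in labels:
            problems.append(f"missing parent type for: {label}")

    if mention_type not in MENTION_TYPES:
        problems.append(f"unknown mention type: {mention_type}")

    try:
        if not 0.0 <= float(conf) <= 1.0:
            problems.append(f"confidence out of range: {conf}")
    except ValueError:
        problems.append(f"confidence is not a number: {conf}")
    return problems


good = "NIST\tannotation-2\tObama\tDOC:1098-1102\tObama\tPER;PER.Politician\tNAM\t1.0"
bad = "NIST\tannotation-5\tMitt Romney\tDOC:164-153\tMitt Romney\tPER;PER.Politician\tNAM1\t1.1"
print(check_row(good))
print(check_row(bad))
```

Returning a list of messages instead of raising on the first error makes it easier to report every problem in a prediction file in one pass.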


```python
# from spacy.pipeline.ner import DEFAULT_NER_MODEL

import spacy

nlp = spacy.load("en_core_web_trf")

# NER labels of en_core_web_trf:
# CARDINAL, DATE, EVENT, FAC, GPE, LANGUAGE, LAW, LOC,
# MONEY, NORP, ORDINAL, ORG, PERCENT, PERSON, PRODUCT, QUANTITY, TIME, WORK_OF_ART

# config = {
#    "moves": None,
#    "update_with_oracle_cut_size": 100,
#    "model": DEFAULT_NER_MODEL,
#    "incorrect_spans_key": "incorrect_spans",
# }
# nlp.add_pipe("ner", config=config)
```

```python
test = nlp("This is a sentence.")
print(type(test))
```

```
<class 'spacy.tokens.doc.Doc'>
```


```python
import spacy

nlp = spacy.load("en_core_web_trf")
doc = nlp("Apple is looking at buying U.K. startup for $1 billion")

# Print each entity with its character offsets and label.
for ent in doc.ents:
    print(ent.text, ent.start_char, ent.end_char, ent.label_)
```

```
Apple 0 5 ORG
U.K. 27 31 GPE
$1 billion 44 54 MONEY
```
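
To experiment with spaCy output in the tab format used by this task, the printed entities could be converted into rows as sketched below. For the evaluated typing subtask the mentions come from the provided file, so this is only for exploration. The label mapping `SPACY_TO_COARSE`, the helper `entities_to_rows`, the run ID `my_run`, and the document ID `demo_doc` are all hypothetical. The sketch also assumes inclusive end offsets, treats every spaCy entity as a `NAM` mention, and reuses the column layout of the example rows in the output format section.

```python
from typing import Iterable, List, Tuple

# Hypothetical coarse mapping from spaCy labels to RUFES-style top-level types.
# Illustrative only: the real ontology is defined by the task description.
SPACY_TO_COARSE = {"PERSON": "PER", "ORG": "ORG", "GPE": "GPE", "LOC": "LOC", "FAC": "FAC"}

Entity = Tuple[str, int, int, str]  # (text, start_char, end_char, label)


def entities_to_rows(doc_id: str, entities: Iterable[Entity],
                     run_id: str = "my_run") -> List[str]:
    """Format entities as tab-separated rows; unmapped labels are skipped."""
    rows = []
    for text, start, end, label in entities:
        coarse = SPACY_TO_COARSE.get(label)
        if coarse is None:
            continue  # e.g. MONEY has no counterpart in this mapping
        # spaCy's end_char is exclusive; an inclusive end offset is assumed here.
        span = f"{doc_id}:{start}-{end - 1}"
        ann_id = f"annotation-{len(rows)}"
        rows.append("\t".join([run_id, ann_id, text, span, text, coarse, "NAM", "1.0"]))
    return rows


# The three entities printed by the cell above.
entities = [("Apple", 0, 5, "ORG"), ("U.K.", 27, 31, "GPE"), ("$1 billion", 44, 54, "MONEY")]
for row in entities_to_rows("demo_doc", entities):
    print(row)
```

With a real `Doc`, the input tuples could be built as `[(e.text, e.start_char, e.end_char, e.label_) for e in doc.ents]`.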


```
nlp??
```

```
Signature:
nlp(
    text: Union[str, spacy.tokens.doc.Doc],
    *,
    disable: Iterable[str] = [],
    component_cfg: Union[Dict[str, Dict[str, Any]], NoneType] = None,
) -> spacy.tokens.doc.Doc
Type:            English
String form:     <spacy.
```