# Introduction to Named Entity Recognition
Named entity recognition (NER), also known as named entity extraction or entity identification, is the task of locating named entities in unstructured text and classifying them into predefined categories such as person names, medical codes, or quantities.

The most common tagset is that of [CoNLL-2003](https://www.clips.uantwerpen.be/conll2003/ner/), which uses the categories person (PER), organization (ORG), location (LOC), and miscellaneous (MISC); the last covers cases such as nationalities. For example:

*Hello my name is $Kenneth_{PER}$ I live in $Trøjborg_{LOC}$ and work at $AU_{ORG}$.*

This is, for example, the tagset used by DaCy, a Danish spaCy pipeline.
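
To make the inline notation above concrete, here is a minimal sketch that treats each entity as a labelled piece of the text and renders it with its tag. The annotations are written by hand for illustration and are not model output, and the helper `annotate` is a hypothetical name, not part of any library.

```python
from typing import List, Tuple


def annotate(text: str, entities: List[Tuple[str, str]]) -> str:
    """Mark each entity (in left-to-right order) as [surface]_LABEL."""
    out, pos = [], 0
    for surface, label in entities:
        # Find the next occurrence of the entity after the previous one.
        start = text.index(surface, pos)
        out.append(text[pos:start])
        out.append(f"[{surface}]_{label}")
        pos = start + len(surface)
    out.append(text[pos:])
    return "".join(out)


text = "Hello my name is Kenneth I live in Trøjborg and work at AU."
# Hand-written (surface form, label) pairs using the CoNLL-2003 categories.
entities = [("Kenneth", "PER"), ("Trøjborg", "LOC"), ("AU", "ORG")]

print(annotate(text, entities))
```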

In [13]:
# !pip install dacy

In [14]:
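import dacy
from spacy import displacy

# Load the small DaCy pipeline and visualize the entities it predicts.
# The sentence reads: "My name is Kenneth, I live in Aarhus and work at Center for Humanities Computing"
nlp_da = dacy.load("small")
doc = nlp_da("Mit navn er Kenneth, jeg bor i Aarhus og arbejder på Center for Humanities Computing")
displacy.render(doc, style="ent")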

## Other tagsets
More extensive tagsets exist, such as that of OntoNotes 5, which additionally includes, for instance, geopolitical entities (GPE), dates (DATE), and nationalities or religious or political groups (NORP). This is, for example, the tagset used by the English spaCy models:

In [26]:
import spacy
from spacy import displacy

nlp = spacy.load("en_core_web_lg")
doc = nlp("Hello my name is Kenneth I live in Denmark and work at Aarhus University, I am Danish and today is monday 25th.")

displacy.render(doc, style="ent")

## Tagging standards
Several tagging schemes exist for NER. The most widely used is the IOB format, which frames the task as token classification: each token is marked as inside (I), outside (O), or at the beginning (B) of an entity. *"O"* denotes a token that is not part of any entity. *B-\** marks the first token of an entity (e.g. *B-ORG* for *Aarhus* in *Aarhus University*), while *I-\** marks a token that continues an entity (e.g. *I-ORG* for *University*).

In [27]:
for t in doc:
    if t.ent_type:
        print(t, f"{t.ent_iob_}-{t.ent_type_}")
    else:
        print(t, t.ent_iob_)

Hello O
my O
name O
is O
Kenneth B-PERSON
I O
live O
in O
Denmark B-GPE
and O
work O
at O
Aarhus B-ORG
University I-ORG
, O
I O
am O
Danish B-NORP
and O
today B-DATE
is O
monday B-DATE
25th I-DATE
. O
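
Going the other way, from IOB tags back to entities, can be sketched as below. This is a minimal illustration in plain Python rather than spaCy's own implementation; the function name `iob_to_spans` is hypothetical, and treating a stray *I-\** tag as the start of a new entity is an assumption of this sketch.

```python
from typing import List, Tuple


def iob_to_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Group IOB tags into (start, end, label) token spans; end is exclusive."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags):
        # The open span continues only on an "I-" tag with the same label.
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.append((start, i, label))
            start, label = None, None
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            # Assumption: a stray "I-" with no open span starts a new entity.
            start, label = i, tag[2:]
    if start is not None:
        spans.append((start, len(tags), label))
    return spans


# A slice of the tagged sentence printed above.
tokens = ["work", "at", "Aarhus", "University", ",", "I", "am", "Danish"]
tags = ["O", "O", "B-ORG", "I-ORG", "O", "O", "O", "B-NORP"]

for start, end, label in iob_to_spans(tags):
    print(" ".join(tokens[start:end]), label)
```

With spaCy itself this grouping is not needed, since the merged entities are available directly as `doc.ents`.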


## Variations of NER
While NER is usually framed as above, this formulation has some limitations. For instance, the entity *Aarhus University* refers both to the location Aarhus and to the university located there. Nested NER (NNER) therefore argues that it would be more correct to tag it in a nested fashion as \[\[$Aarhus_{LOC}$\] $University$\]$_{ORG}$ (Plank, 2020). A related task is named entity linking, which links an entity mention to an entry in a knowledge base, e.g. a Wikipedia article. This requires knowing both that the mention is an entity and which entity it is (if it is indeed present in the knowledge base).
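
A flat IOB sequence assigns exactly one tag per token, so it cannot express the nested analysis of *Aarhus University*. One way to sketch a nested annotation is as a list of possibly overlapping token spans. The representation and the helper `nested_pairs` below are illustrative assumptions, not the format used by any particular nested NER dataset.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start token, end token (exclusive), label)


def nested_pairs(spans: List[Span]) -> List[Tuple[Span, Span]]:
    """Return (inner, outer) pairs where inner lies inside a strictly longer outer span."""
    pairs = []
    for inner in spans:
        for outer in spans:
            inside = outer[0] <= inner[0] and inner[1] <= outer[1]
            shorter = (inner[1] - inner[0]) < (outer[1] - outer[0])
            if inside and shorter:
                pairs.append((inner, outer))
    return pairs


tokens = ["Aarhus", "University"]
# Hand-written nested annotation: a LOC inside an ORG.
spans = [(0, 1, "LOC"), (0, 2, "ORG")]

for inner, outer in nested_pairs(spans):
    inner_text = " ".join(tokens[inner[0]:inner[1]])
    outer_text = " ".join(tokens[outer[0]:outer[1]])
    print(f"{inner_text} ({inner[2]}) is nested in {outer_text} ({outer[2]})")
```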