NER can be implemented easily using **spaCy**, an open-source NLP library. spaCy supports many NLP tasks, such as tokenization, part-of-speech tagging, and dependency parsing, and its trained pipelines include a fast, built-in entity recognizer. The pretrained English pipelines work reasonably well on general-domain text, but they can be fine-tuned on domain-specific data for particular business needs.

In [1]:
import spacy
from spacy import displacy

**spaCy pipelines for NER**

spaCy provides three CPU-optimized English pipelines that include an NER component:

a. en_core_web_sm

b. en_core_web_md

c. en_core_web_lg

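These models are listed in ascending order of size: sm, md, and lg denote small, medium, and large, respectively. The md and lg pipelines also ship with word vectors; larger pipelines are typically somewhat more accurate but take longer to load and use more memory.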
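Because each pipeline is just a package name, switching sizes only means loading a different name. The following is a minimal sketch, assuming spaCy v3 and that downloading a package at runtime is acceptable; `PIPELINES` and `load_english_pipeline` are hypothetical helpers, not part of spaCy's API.

```python
import spacy
from spacy.cli import download

# Hypothetical mapping from size suffix to the CPU-optimized English pipelines listed above.
PIPELINES = {"sm": "en_core_web_sm", "md": "en_core_web_md", "lg": "en_core_web_lg"}


def load_english_pipeline(size: str = "sm"):
    """Load an English pipeline by size, downloading it first if it is missing (hypothetical helper)."""
    if size not in PIPELINES:
        raise ValueError(f"size must be one of {sorted(PIPELINES)}, got {size!r}")
    name = PIPELINES[size]
    try:
        return spacy.load(name)
    except OSError:
        # spacy.load raises OSError when the pipeline package is not installed.
        download(name)
        return spacy.load(name)
```

For example, `load_english_pipeline("lg")` would load (and, if needed, download) en_core_web_lg; the large package is a sizable download, so this is best done once per environment.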

In [2]:
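# Download the small English pipeline and load it.
spacy.cli.download("en_core_web_sm")
NER = spacy.load("en_core_web_sm")

# Note: despite its name, this function uses the small (en_core_web_sm) pipeline loaded above.
def spacy_large_ner(document):
    """Return the set of (entity text, entity label) pairs found in `document`."""
    return {(ent.text.strip(), ent.label_) for ent in NER(document).ents}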

[38;5;2m✔ Download and installation successful[0m
You can now load the package via spacy.load('en_core_web_sm')
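`spacy_large_ner` returns a set of `(text, label)` tuples, which is unordered and can be hard to scan. A small sketch for grouping the results by label; `group_by_label` is a hypothetical helper (not part of spaCy) and the sample pairs are illustrative.

```python
from collections import defaultdict


def group_by_label(entities):
    """Group (text, label) pairs into a dict of label -> sorted list of entity texts."""
    grouped = defaultdict(list)
    for text, label in entities:
        grouped[label].append(text)
    # Sort labels and texts so the output is stable, since the input set is unordered.
    return {label: sorted(texts) for label, texts in sorted(grouped.items())}


# Illustrative input in the same shape that spacy_large_ner returns.
sample = {("Geneva", "GPE"), ("Switzerland", "GPE"), ("1951", "DATE"), ("WHO", "ORG")}
print(group_by_label(sample))
# {'DATE': ['1951'], 'GPE': ['Geneva', 'Switzerland'], 'ORG': ['WHO']}
```

Applied to real output, this would be `group_by_label(spacy_large_ner(text))` for any input string `text`.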


In [4]:
doc="The World Health Organization (WHO)[1] is a specialized agency of the United Nations responsible for international public health.[2] The WHO Constitution states its main objective as 'the attainment by all peoples of the highest possible level of health'.[3] Headquartered in Geneva, Switzerland, it has six regional offices and 150 field offices worldwide. The WHO was established on 7 April 1948.[4][5] The first meeting of the World Health Assembly (WHA), the agency's governing body, took place on 24 July of that year. The WHO incorporated the assets, personnel, and duties of the League of Nations' Health Organization and the Office International d'Hygiène Publique, including the International Classification of Diseases (ICD).[6] Its work began in earnest in 1951 after a significant infusion of financial and technical resources.[7]"

In [6]:
# Call the function on the WHO text above.
spacy_large_ner(doc)

{('150', 'CARDINAL'),
 ('1951', 'DATE'),
 ('24 July of that year', 'DATE'),
 ('7 April 1948.[4][5', 'DATE'),
 ('Geneva', 'GPE'),
 ('Switzerland', 'GPE'),
 ('The World Health Organization', 'ORG'),
 ('WHA', 'ORG'),
 ('WHO', 'ORG'),
 ('first', 'ORDINAL'),
 ("health'.[3]", 'ORG'),
 ('six', 'CARDINAL'),
 ('the International Classification of Diseases', 'ORG'),
 ("the League of Nations' Health Organization", 'ORG'),
 ("the Office International d'Hygiène Publique", 'ORG'),
 ('the United Nations', 'ORG'),
 ('the World Health Assembly', 'ORG')}

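These are the named entities extracted from the WHO text. Because the function returns a set, repeated mentions such as WHO appear only once. Most entities look correct, but a few spans are noisy because the input still contains Wikipedia citation markers: '7 April 1948.[4][5' includes part of a marker, and "health'.[3]" is wrongly tagged as ORG.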
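The WHO text was copied from Wikipedia, so bracketed citation markers such as `[1]` and `[4][5]` remain in it and can end up inside entity spans. One option is to strip them before running NER. This is a minimal sketch, assuming all markers are purely numeric; `strip_citations` is a hypothetical helper, and the entities spaCy returns for the cleaned text have not been verified here.

```python
import re

# Matches one Wikipedia-style numeric citation marker such as "[1]"; "[4][5]" yields two matches.
CITATION_MARKER = re.compile(r"\[\d+\]")


def strip_citations(text: str) -> str:
    """Remove bracketed numeric citation markers from `text`."""
    return CITATION_MARKER.sub("", text)


print(strip_citations("The WHO was established on 7 April 1948.[4][5] The first meeting"))
# The WHO was established on 7 April 1948. The first meeting
```

In the notebook, this would be used as `spacy_large_ner(strip_citations(doc))`.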

To see what an entity label means, use `spacy.explain()`:

In [7]:
spacy.explain("ORG")

'Companies, agencies, institutions, etc.'
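`spacy.explain()` reads from spaCy's built-in glossary, so it works without loading a pipeline, and it returns `None` for unknown terms. A sketch that describes every label in an NER result at once; `describe_labels` is a hypothetical helper, and the sample pairs are copied from the output above.

```python
import spacy


def describe_labels(entities):
    """Map each distinct label in a set of (text, label) pairs to spaCy's description of it."""
    labels = sorted({label for _, label in entities})
    return {label: spacy.explain(label) for label in labels}


# A few pairs copied from the NER output above.
sample = {("Geneva", "GPE"), ("1951", "DATE"), ("six", "CARDINAL"), ("WHO", "ORG")}
for label, description in describe_labels(sample).items():
    print(f"{label:<10}{description}")
```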

In [8]:
# spaCy also provides a visualizer, displaCy, that highlights named entities directly in the text.

displacy.render(NER(doc), style="ent", jupyter=True)
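Outside a notebook there is nothing for `jupyter=True` to display into; passing `jupyter=False` makes `displacy.render` return the HTML markup as a string, which can be saved to a file. The sketch below uses displaCy's `manual=True` mode with hand-written entity offsets (illustrative annotations, not model output) so it runs without a downloaded pipeline; `manual_ent` and the file name `entities.html` are hypothetical.

```python
from pathlib import Path

from spacy import displacy


def manual_ent(text, substring, label):
    """Build a displaCy manual-format entity for the first occurrence of `substring` in `text`."""
    start = text.index(substring)
    return {"start": start, "end": start + len(substring), "label": label}


text = "Headquartered in Geneva, Switzerland, it has six regional offices."
# Hand-written annotations in displaCy's manual format (character offsets into `text`).
data = {
    "text": text,
    "ents": [
        manual_ent(text, "Geneva", "GPE"),
        manual_ent(text, "Switzerland", "GPE"),
        manual_ent(text, "six", "CARDINAL"),
    ],
    "title": None,
}

# jupyter=False returns the HTML string instead of displaying it inline.
html = displacy.render(data, style="ent", manual=True, jupyter=False)
Path("entities.html").write_text(html, encoding="utf-8")
```

With a loaded pipeline, the same export works on model output: `displacy.render(NER(doc), style="ent", jupyter=False)`.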