## Named-Entity Recognition
Named-entity recognition (NER) is a subtask of information extraction in natural language processing (NLP). It locates named entities in unstructured text and classifies them into predefined categories such as person names, organizations, locations, time expressions, quantities, monetary values, and percentages. Identifying the key elements a text mentions is a first step toward understanding what the text is about.

You can use NER to learn more about what a text is about. For example, you could use it to populate tags for a set of documents to improve keyword search, or to sort customer support tickets into relevant categories.
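The tagging use case can be sketched without any NLP library once the entities are available. The snippet below is a minimal illustration, not a production design: the document ids, the `doc_entities` mapping, and the `build_tag_index` helper are all hypothetical, and the entity pairs are typed in by hand rather than produced by a model.

```python
from collections import defaultdict

# Hypothetical NER output: document id -> set of (entity text, label) pairs.
doc_entities = {
    "ticket-1": {("Apple Inc.", "ORG"), ("Cupertino", "GPE")},
    "ticket-2": {("Cupertino", "GPE"), ("September 2023", "DATE")},
}


def build_tag_index(doc_entities, keep_labels=frozenset({"ORG", "GPE"})):
    """Map each entity text to the sorted list of documents that mention it."""
    index = defaultdict(set)
    for doc_id, entities in doc_entities.items():
        for text, label in entities:
            # Only some entity types make useful search tags (an assumption here).
            if label in keep_labels:
                index[text].add(doc_id)
    return {tag: sorted(ids) for tag, ids in index.items()}


tag_index = build_tag_index(doc_entities)
print(tag_index["Cupertino"])
```

Filtering by label is a design choice: dates and amounts are rarely useful as search tags, so this sketch keeps only organizations and places.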

In spaCy, the entities recognized in a text are available through the `.ents` property of a `Doc` object.


Key aspects of Named-Entity Recognition:

- Entity Identification: NER systems are designed to recognize named entities in a text. This involves not only identifying the entity but also determining its boundaries within the text (e.g., where the entity starts and ends).
- Entity Classification: After identifying an entity, the system classifies it into a category. Common categories include PERSON (for names of people), ORG (for organizations), LOC (for locations), DATE (for date expressions), MONEY (for monetary values), and more.
- Use in Various Applications: NER is used in many NLP applications such as search engines, content recommenders, question answering systems, and chatbots. It helps these systems understand the context of the text and provide relevant responses or actions.
- Techniques: NER can be implemented using various techniques, ranging from rule-based approaches to advanced machine learning methods, including deep learning. The choice of technique often depends on the complexity of the task and the available training data.
- Challenges: NER can be challenging due to the variety and ambiguity of natural language. For example, the same name can refer to different entities (e.g., "Jordan" can be a person's name or a country), and some entities might be less commonly known or have irregular forms.
- Integration with Other NLP Tasks: NER often works in conjunction with other NLP tasks like tokenization, part-of-speech tagging, and dependency parsing to accurately identify and categorize entities.
- Customization for Specific Domains: In some cases, NER systems are customized for specific domains, like medical texts or legal documents, where specialized knowledge is necessary to correctly identify and classify entities.

NER plays a crucial role in extracting meaningful information from unstructured text, making it a key component of many NLP systems and applications.
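To make the rule-based end of the spectrum concrete, here is a minimal sketch that finds entity boundaries and labels with regular expressions alone. The two patterns and the `rule_based_ner` function are hypothetical and deliberately narrow; they stand in for the much richer rule sets or trained models a real system would use.

```python
import re

MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)

# Hypothetical hand-written patterns, one per entity category.
PATTERNS = [
    ("MONEY", re.compile(r"\$\d+(?:,\d{3})*(?:\.\d+)?")),
    ("DATE", re.compile(rf"\b(?:{MONTHS}) \d{{4}}\b")),
]


def rule_based_ner(text):
    """Return (start, end, label) spans sorted by start offset."""
    spans = []
    for label, pattern in PATTERNS:
        for match in pattern.finditer(text):
            # Identification = the span boundaries; classification = the label.
            spans.append((match.start(), match.end(), label))
    return sorted(spans)


text = "The phone ships in September 2023 for $999."
for start, end, label in rule_based_ner(text):
    print(text[start:end], label)
```

Rules like these are easy to inspect but brittle: they cannot tell whether "Jordan" is a person or a country, which is where statistical models that use the surrounding context tend to do better.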

In [7]:
import spacy
from spacy import displacy

nlp = spacy.load("en_core_web_sm")

In [8]:
# Run the pipeline on the text; the recognized entities are stored in doc.ents
doc = nlp(
    "Apple Inc. announced that their new iPhone model will be released in Cupertino in September 2023 for $999."
)

for ent in doc.ents:
    print(
        f"""
{ent.text = }
{ent.start_char = }
{ent.end_char = }
{ent.label_ = }
spacy.explain('{ent.label_}') = {spacy.explain(ent.label_)}"""
    )


ent.text = 'Apple Inc.'
ent.start_char = 0
ent.end_char = 10
ent.label_ = 'ORG'
spacy.explain('ORG') = Companies, agencies, institutions, etc.

ent.text = 'iPhone'
ent.start_char = 36
ent.end_char = 42
ent.label_ = 'ORG'
spacy.explain('ORG') = Companies, agencies, institutions, etc.

ent.text = 'Cupertino'
ent.start_char = 69
ent.end_char = 78
ent.label_ = 'GPE'
spacy.explain('GPE') = Countries, cities, states

ent.text = 'September 2023'
ent.start_char = 82
ent.end_char = 96
ent.label_ = 'DATE'
spacy.explain('DATE') = Absolute or relative dates or periods

ent.text = '999'
ent.start_char = 102
ent.end_char = 105
ent.label_ = 'MONEY'
spacy.explain('MONEY') = Monetary values, including unit
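The `start_char` and `end_char` values are character offsets into the original string, with the end offset exclusive, so slicing the text with them recovers each entity. The following check uses only plain Python; the `spans` list is copied by hand from the output above rather than computed by spaCy.

```python
text = (
    "Apple Inc. announced that their new iPhone model will be released "
    "in Cupertino in September 2023 for $999."
)

# (start_char, end_char, label) triples copied from the output above.
spans = [
    (0, 10, "ORG"),
    (36, 42, "ORG"),
    (69, 78, "GPE"),
    (82, 96, "DATE"),
    (102, 105, "MONEY"),
]

# Slicing with [start:end] reproduces ent.text for every entity.
for start, end, label in spans:
    print(repr(text[start:end]), label)

# The MONEY span starts right after the currency symbol at offset 101.
print(repr(text[101]))
```

Note that in this run the MONEY span covers only the digits, and that "iPhone" was labeled ORG, so offsets and labels are worth inspecting before relying on them downstream.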


In [10]:
# Visualize the NER
# The "distance" option only affects the dependency visualizer, so it is omitted here
displacy.render(doc, style="ent", jupyter=True)

In [16]:
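# Helper that wraps the pipeline and returns the unique (text, label) pairs it finds.
# Note: despite the function name, this loads the small English model (en_core_web_sm).
NER = spacy.load("en_core_web_sm")

def spacy_large_ner(document):
    return {(ent.text.strip(), ent.label_) for ent in NER(document).ents}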


In [26]:
doc = "The World Health Organization (WHO)[1] is a specialized agency of the United Nations responsible for international public health."
spaCy_ner = spacy_large_ner(doc)
spaCy_ner

{('The World Health Organization', 'ORG'), ('the United Nations', 'ORG')}
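Because the helper returns a set, repeated mentions collapse into one entry, and surface variants such as a leading "The" versus "the" stay distinct. If mention counts matter, one option is to normalize the text and count with a `Counter`. This is only a sketch: the `normalize` function and the third mention in the list are hypothetical additions for illustration.

```python
from collections import Counter

ARTICLES = {"the", "a", "an"}


def normalize(text: str) -> str:
    """Drop a leading article and collapse runs of whitespace."""
    words = text.split()
    if words and words[0].lower() in ARTICLES:
        words = words[1:]
    return " ".join(words)


# The first two pairs come from the output above; the third is a hypothetical
# repeat mention with different capitalization of the article.
mentions = [
    ("The World Health Organization", "ORG"),
    ("the United Nations", "ORG"),
    ("the World Health Organization", "ORG"),
]

counts = Counter((normalize(text), label) for text, label in mentions)
print(counts.most_common())
```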

In [27]:
displacy.render(NER(doc), style="ent", jupyter=True)