# Named Entity Recognition

Named Entity Recognition (NER) is a technique in natural language processing (NLP) that identifies mentions of named entities in text, such as people, organizations, locations, and dates, and classifies them into predefined categories. Its purpose is to automatically extract structured information from unstructured text, so that machines can categorize entities in a meaningful way for applications such as text summarization, question answering, and knowledge graph construction.

NER systems are used across many domains, including question answering, information retrieval, and machine translation. NER can also help improve the precision of other NLP tasks, such as part-of-speech tagging and parsing. At its core, NER is a two-step process:

1. Detecting the entities from the text
2. Classifying them into different categories


## Workflow

A typical NER system works as follows:

1. The NER system analyses the entire input text to identify and locate the named entities.
2. The system identifies sentence boundaries using cues such as terminal punctuation and capitalization: a capitalized word that follows a full stop is likely to begin a new sentence. Knowing sentence boundaries helps place entities in context, allowing the model to understand relationships and meanings.
3. An NER pipeline can also be trained to classify entire documents into types such as invoices, receipts, or passports. Document classification makes NER more versatile, because entity recognition can then adapt to the characteristics and context of each document type.
4. NER employs machine learning algorithms, including supervised learning, to analyze labeled datasets. These datasets contain examples of annotated entities, which guide the model in recognizing similar entities in new, unseen data.
5. Through multiple training iterations, the model refines its understanding of contextual features, syntactic structures, and entity patterns, improving its accuracy over time.
6. The model’s ability to adapt to new data allows it to handle variations in language, context, and entity types, making it more robust and effective.
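
The annotated examples mentioned in step 4 are often stored as one label per token. The sketch below assumes the BIO convention (`B-` begins an entity, `I-` continues it, `O` is outside any entity), which the text above does not specify; the sentence and its tags are hand-written for illustration. It shows how token labels map back to the two steps of NER: the `B-`/`I-` boundaries detect the entity, and the suffix classifies it.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collapse token-level BIO tags into (entity text, entity type) pairs."""
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity begins; close any entity that is still open.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_tokens and tag[2:] == current_type:
            # Continuation of the currently open entity.
            current_tokens.append(token)
        else:
            # "O" (or an inconsistent I- tag) ends the open entity.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None

    if current_tokens:
        spans.append((" ".join(current_tokens), current_type))
    return spans


# Hand-labelled illustrative example (not taken from a real dataset).
tokens = ["Flights", "were", "delayed", "at", "Indira", "Gandhi",
          "International", "airport", "in", "Delhi"]
tags = ["O", "O", "O", "O", "B-FAC", "I-FAC", "I-FAC", "O", "O", "B-GPE"]

print(bio_to_spans(tokens, tags))
# [('Indira Gandhi International', 'FAC'), ('Delhi', 'GPE')]
```

A supervised model is trained to predict the `tags` list from the `tokens` list; a helper like this one then turns its predictions into entity spans.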

## Named Entity Recognition (NER) Methods

1. Lexicon-Based Method

    A lexicon-based NER system uses a dictionary containing a list of known words or terms and checks whether any of them appear in a given text. This approach isn’t commonly used because the dictionary needs constant updating and careful maintenance to stay accurate and effective.

2. Rule-Based Method

    The rule-based method uses a set of predefined rules to guide the extraction of information. These rules are based on patterns and context. Pattern-based rules focus on the structure and form of words, looking at their morphological patterns. Context-based rules consider the surrounding words, that is, the context in which a word appears within the document. Combining pattern-based and context-based rules improves the precision of information extraction.

3. Machine Learning-Based Method
   1. Multi-Class Classification with Machine Learning Algorithms

        One approach is to treat NER as a multi-class classification problem and train a model with one of several machine learning algorithms. This requires a large amount of labelled data. Beyond labelling, the model also needs a good understanding of context to deal with ambiguous sentences, which makes the task challenging for a simple machine learning algorithm.

   2. Conditional Random Field (CRF)

        A conditional random field (CRF) is a probabilistic model for sequential data, such as the words in a sentence. Because it models dependencies between neighbouring labels, a CRF can take the context of the sentence into account. CRF implementations are provided by both NLP Speech Tagger and NLTK.

4. Deep Learning-Based Method

    Deep learning NER systems are generally more accurate than the previous methods because they can group related words. They rely on word embeddings, which capture semantic and syntactic relationships between words. They can also learn topic-specific as well as high-level words automatically.
    This makes deep learning NER applicable to multiple tasks. Because deep learning handles most of the repetitive work itself, researchers can use their time more efficiently.
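
The first two methods above are simple enough to sketch with the standard library. In the minimal example below, the `LEXICON` entries, the two `RULES`, and the sample sentences are all hypothetical and chosen only for illustration. The first rule is pattern-based (it looks at the shape of the token), and the second is context-based (it relies on the trigger word "airport" that follows a capitalized phrase).

```python
import re
from typing import List, Tuple

# Hypothetical dictionary: surface form -> entity type.
LEXICON = {
    "Punjab": "GPE",
    "Haryana": "GPE",
    "Himachal Pradesh": "GPE",
}

# Hypothetical rules, each a (label, compiled pattern) pair.
RULES = [
    # Pattern-based: digits followed by a percent sign.
    ("PERCENT", re.compile(r"\b\d+(?:\.\d+)?%")),
    # Context-based: capitalized words directly before the word "airport".
    ("FAC", re.compile(r"\b(?:[A-Z][a-z]+\s)*[A-Z][a-z]+(?=\s+airport\b)")),
]


def lexicon_entities(text: str) -> List[Tuple[int, str, str]]:
    """Find every dictionary term that occurs in the text."""
    found = []
    for term, label in LEXICON.items():
        for match in re.finditer(r"\b" + re.escape(term) + r"\b", text):
            found.append((match.start(), match.group(0), label))
    return found


def rule_entities(text: str) -> List[Tuple[int, str, str]]:
    """Apply each rule and collect its matches."""
    found = []
    for label, pattern in RULES:
        for match in pattern.finditer(text):
            found.append((match.start(), match.group(0), label))
    return found


def extract(text: str) -> List[Tuple[str, str]]:
    """Merge lexicon and rule hits, ordered by position in the text."""
    hits = sorted(lexicon_entities(text) + rule_entities(text))
    return [(surface, label) for _, surface, label in hits]


text = "Fog at Indira Gandhi International airport delayed flights to Punjab and Haryana."
print(extract(text))
# [('Indira Gandhi International', 'FAC'), ('Punjab', 'GPE'), ('Haryana', 'GPE')]
```

The sketch also shows the maintenance cost described above: a state missing from `LEXICON` is never found, and every new entity pattern needs its own hand-written rule.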

## Named Entity Recognition with spaCy

Entity types recognized by spaCy's pretrained English models:

|Entity Type|Description|
|-----------|-----------|
|PERSON|People, including fictional.|
|NORP|Nationalities or religious or political groups.|
|FAC|Buildings, airports, highways, bridges, etc.|
|ORG|Companies, agencies, institutions, etc.|
|GPE|Countries, cities, states.|
|LOC|Non-GPE locations, mountain ranges, bodies of water.|
|PRODUCT|Objects, vehicles, foods, etc. (Not services.)|
|EVENT|Named hurricanes, battles, wars, sports events, etc.|
|WORK_OF_ART|Titles of books, songs, etc.|
|LAW|Named documents made into laws.|
|LANGUAGE|Any named language.|
|DATE|Absolute or relative dates or periods.|
|TIME|Times smaller than a day.|
|PERCENT|Percentage, including “%”.|
|MONEY|Monetary values, including unit.|
|QUANTITY|Measurements, as of weight or distance.|
|ORDINAL|“first”, “second”, etc.|
|CARDINAL|Numerals that do not fall under another type.|

```python
# Import libraries and load spaCy's English model
import spacy
from spacy import displacy

nlp = spacy.load("en_core_web_sm")
```

```python
# Process the text and print each entity with its label
content = "The national capital recorded its worst fog yet this season with zero visibility conditions for nine hours\
    from mid-night hampering flight operations at Indira Gandhi International airport where over 100 flights were delayed, 19 diverted.\
        Intense foggy conditions also prevailed over Punjab, Haryana, parts of Himachal Pradesh, Uttarakhand, Rajasthan, Uttar Pradesh,\
            Bihar, Jharkhand, Tripura, Assam, Meghalaya and Mizoram."

doc = nlp(content)

# doc.ents holds the entity spans found by the model
for ent in doc.ents:
    print(f"{ent.text} -> {ent.label_}")
```

```text
this season -> DATE
zero -> CARDINAL
nine hours -> TIME
mid-night -> TIME
Indira Gandhi International -> FAC
over 100 -> CARDINAL
19 -> CARDINAL
Punjab -> NORP
Haryana -> PERSON
Himachal Pradesh -> ORG
Uttarakhand -> GPE
Rajasthan -> GPE
Uttar Pradesh -> ORG
Bihar -> GPE
Jharkhand -> GPE
Tripura -> GPE
Meghalaya -> ORG
Mizoram -> GPE
```


```python
# Visualize named entities (renders inline in a Jupyter notebook)
displacy.render(doc, style="ent")
```
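
Once entities are extracted, it is often useful to group or count them by label. The sketch below does this in plain Python on the `(text, label)` pairs copied from the output printed above, so it runs without spaCy; with the model loaded, the same list could be built as `[(ent.text, ent.label_) for ent in doc.ents]`. The helper name `group_by_label` is hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# (text, label) pairs copied from the printed output above.
entities: List[Tuple[str, str]] = [
    ("this season", "DATE"), ("zero", "CARDINAL"), ("nine hours", "TIME"),
    ("mid-night", "TIME"), ("Indira Gandhi International", "FAC"),
    ("over 100", "CARDINAL"), ("19", "CARDINAL"), ("Punjab", "NORP"),
    ("Haryana", "PERSON"), ("Himachal Pradesh", "ORG"), ("Uttarakhand", "GPE"),
    ("Rajasthan", "GPE"), ("Uttar Pradesh", "ORG"), ("Bihar", "GPE"),
    ("Jharkhand", "GPE"), ("Tripura", "GPE"), ("Meghalaya", "ORG"),
    ("Mizoram", "GPE"),
]


def group_by_label(pairs: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Map each entity label to the list of entity texts carrying it."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for text, label in pairs:
        grouped[label].append(text)
    return dict(grouped)


grouped = group_by_label(entities)
label_counts = Counter(label for _, label in entities)

print(label_counts["GPE"])  # 6
print(grouped["ORG"])       # ['Himachal Pradesh', 'Uttar Pradesh', 'Meghalaya']
```

Grouping makes the model's mistakes easy to spot: in this run several Indian states were labelled `ORG`, `NORP`, or `PERSON` rather than `GPE`, and Assam does not appear in the output at all.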

## Use Cases

NER can be used in a variety of applications, including:

1. News: News providers can use NER to tag articles with the entities they mention, which helps categorize content and surface important information and trends.

2. Customer support: NER can identify relevant customer complaints and queries, and direct them to the correct department. 

3. Human resources: NER can summarize applicants' resumes and extract information like qualifications, education, and references. 

4. Logistics: NER can scan bills of lading to identify entities like "Shipper", "Consignee" or "Carrier name". 

## Sources

1. GeeksForGeeks: [Named Entity Recognition](https://www.geeksforgeeks.org/named-entity-recognition/)
2. Turing: [A Comprehensive Guide to Named Entity Recognition (NER)](https://www.turing.com/kb/a-comprehensive-guide-to-named-entity-recognition)