<b><i>Named Entity Recognition (NER)</i></b> is a natural language processing (NLP) task that identifies named entities in unstructured text and assigns each one to a predefined category such as person, organization, location, or date. The goal is to pull these entities out of the text so that downstream systems can reason about who and what the content refers to.

A classical NER pipeline works in four steps:

1. Tokenization: The text is split into individual tokens (words, numbers, and punctuation marks).

2. Part-of-Speech (POS) Tagging: Each token is assigned a part-of-speech tag (e.g., noun, verb, adjective) to understand its grammatical function in the sentence.

3. Entity Detection: A chunker or sequence model scans the tagged tokens and marks the spans that form named entities. A span can cover several tokens, for example "Gustave Eiffel".

4. Entity Categorization: Each detected span is assigned one of the predefined types, such as person, organization, location, date, or monetary value. For example, "New York City" might be categorized as a location, "Google" as an organization, and "John Smith" as a person.
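To make the tokenize, detect, and categorize steps concrete, here is a minimal sketch that uses a dictionary lookup instead of a trained model. It skips POS tagging entirely, and the `GAZETTEER` table and both function names are hypothetical, introduced only for illustration. This is not how NLTK's chunker works internally.

```python
import re

# Hypothetical toy gazetteer: entity phrase (as a tuple of tokens) -> category.
GAZETTEER = {
    ("New", "York", "City"): "LOCATION",
    ("Google",): "ORGANIZATION",
    ("John", "Smith"): "PERSON",
}


def tokenize(text):
    """Step 1: split text into word tokens and single punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def find_entities(tokens):
    """Steps 3 and 4: detect known phrases and return (text, category) pairs."""
    entities = []
    max_len = max(len(phrase) for phrase in GAZETTEER)
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = tuple(tokens[i:i + n])
            if phrase in GAZETTEER:
                entities.append((" ".join(phrase), GAZETTEER[phrase]))
                i += n
                break
        else:
            # No phrase matched here, so move on to the next token.
            i += 1
    return entities


tokens = tokenize("John Smith joined Google in New York City.")
print(find_entities(tokens))
```

A lookup table like this only finds entities it already knows, which is why practical systems rely on statistical or neural models that generalize from context.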

NER is used in many NLP applications, including information extraction, document summarization, question answering, and sentiment analysis. It lets systems pull relevant names, places, and dates out of large volumes of text automatically instead of relying on manual annotation. NER is also the first stage of entity linking, where each recognized entity is matched to an entry in a knowledge base or database for further enrichment and analysis.

In [1]:
sentence="The Eiffel Tower was built from 1887 to 1889 by Gustave Eiffel, whose company specialized in building metal frameworks and structures."

In [2]:
import nltk
nltk.word_tokenize(sentence)

['The',
 'Eiffel',
 'Tower',
 'was',
 'built',
 'from',
 '1887',
 'to',
 '1889',
 'by',
 'Gustave',
 'Eiffel',
 ',',
 'whose',
 'company',
 'specialized',
 'in',
 'building',
 'metal',
 'frameworks',
 'and',
 'structures',
 '.']

In [3]:
words = nltk.word_tokenize(sentence)

In [5]:
# Assign a part-of-speech tag to each token; the result is a list of (word, tag) pairs.
tagged_ele = nltk.pos_tag(words)

In [7]:
nltk.download('maxent_ne_chunker')

[nltk_data] Downloading package maxent_ne_chunker to
[nltk_data]     C:\Users\DELL\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping chunkers\maxent_ne_chunker.zip.


True

In [9]:
nltk.download('words')

[nltk_data] Downloading package words to
[nltk_data]     C:\Users\DELL\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping corpora\words.zip.


True

In [11]:
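# Group the POS-tagged tokens into named-entity chunks and open the resulting tree in a GUI window.
nltk.ne_chunk(tagged_ele).draw()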
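The `draw()` call only displays the tree. As a rough sketch of how the recognized entities could be collected as data instead, the helper below walks the top level of the chunked tree. The name `extract_entities` is hypothetical, and the sketch assumes the default (non-binary) `ne_chunk` output, where each entity is a subtree whose children are `(word, tag)` pairs.

```python
def extract_entities(tree):
    """Collect (entity text, entity label) pairs from a chunked tree."""
    entities = []
    for node in tree:
        # Entity chunks are subtrees that expose .label();
        # ordinary tokens are plain (word, tag) tuples without it.
        if hasattr(node, "label"):
            text = " ".join(word for word, _tag in node)
            entities.append((text, node.label()))
    return entities


# Usage with the variables from the cells above
# (needs the NLTK resources downloaded earlier):
# print(extract_entities(nltk.ne_chunk(tagged_ele)))
```

Returning plain tuples keeps the result easy to filter, count, or load into a table, and it avoids the need for a display when the notebook runs on a server.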