### 1) i. Describe Named Entity Recognition.

Named Entity Recognition (NER) is a core task in natural language processing (NLP) that identifies named entities in text and classifies them into predefined categories such as persons, organizations, locations, dates, and numerical expressions. It underpins many NLP applications, including information retrieval, question answering, sentiment analysis, and machine translation.

Most modern NER systems rely on machine learning, typically supervised methods, to classify words or phrases into predefined categories (rule-based and dictionary-based approaches also exist). These models are trained on labeled datasets in which each word or phrase is tagged with its entity type. Common techniques include conditional random fields (CRFs), support vector machines (SVMs), and deep learning models such as recurrent neural networks (RNNs) and transformers.
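Labeled NER data is often stored with one tag per token using the BIO (Begin/Inside/Outside) scheme, which turns entity recognition into a per-token classification problem. The sketch below converts token-level entity spans into BIO tags; the function name `spans_to_bio`, the toy sentence, and the label names `PER`/`ORG`/`LOC` are illustrative assumptions, not part of any particular library or dataset.

```python
# Convert entity spans (token-index based) into BIO tags, the per-token
# labeling scheme commonly used to train sequence models such as CRFs.
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start_token, end_token_exclusive, label)


def spans_to_bio(tokens: List[str], spans: List[Span]) -> List[str]:
    """Return one tag per token: B-<label> starts an entity,
    I-<label> continues it, and O marks tokens outside any entity."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


# Hypothetical toy example.
tokens = ["Tim", "Cook", "works", "at", "Apple", "Inc.", "in", "Cupertino", "."]
spans = [(0, 2, "PER"), (4, 6, "ORG"), (7, 8, "LOC")]
for token, tag in zip(tokens, spans_to_bio(tokens, spans)):
    print(f"{token}\t{tag}")
```

The `B-` prefix matters when two entities of the same type are adjacent: without it, a model could not tell where one ends and the next begins.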

The process of NER involves several steps:

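1. **Preprocessing**: The input text is split into sentences and tokenized into words or subword units. Unlike many text-classification pipelines, NER usually keeps stop words and punctuation, because they can be part of an entity (e.g. "Bank of America") or mark its boundaries.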

2. **Feature Extraction**: For feature-based models such as CRFs, each token is described by features such as word embeddings, part-of-speech tags, capitalization patterns, syntactic features, and neighboring words. Deep learning models learn most of these representations directly from the data.

3. **Training**: The model is trained on a labeled dataset, where each word or phrase is annotated with its corresponding entity type. During training, the model learns to recognize patterns and associations between words and entity types.

4. **Inference**: Once trained, the model is applied to new, unseen text to identify named entities. This involves predicting the entity type for each word or phrase in the text.
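To make the feature-extraction step concrete, here is a minimal sketch of hand-crafted per-token features of the kind a CRF might use. The function names `word_shape` and `token_features` and the specific feature set are hypothetical choices for illustration; the list-of-dicts output resembles what CRF toolkits such as sklearn-crfsuite accept, but no library is used here.

```python
# Hand-crafted token features of the kind fed to feature-based NER models
# such as CRFs. Each token becomes a dict of feature name -> value.
from typing import Dict, List, Union

Feature = Union[str, bool]


def word_shape(word: str) -> str:
    """Map characters to X (upper), x (lower), d (digit); keep others as-is."""
    shape = []
    for ch in word:
        if ch.isupper():
            shape.append("X")
        elif ch.islower():
            shape.append("x")
        elif ch.isdigit():
            shape.append("d")
        else:
            shape.append(ch)
    return "".join(shape)


def token_features(tokens: List[str], i: int) -> Dict[str, Feature]:
    """Return features for tokens[i], including its immediate neighbors."""
    word = tokens[i]
    features: Dict[str, Feature] = {
        "lower": word.lower(),
        "shape": word_shape(word),
        "suffix3": word[-3:],
        "is_title": word.istitle(),
        "is_digit": word.isdigit(),
    }
    # Context features: neighboring words help disambiguate, e.g. "in Paris".
    features["prev_lower"] = tokens[i - 1].lower() if i > 0 else "<BOS>"
    features["next_lower"] = tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>"
    return features


tokens = ["Founded", "in", "1976", "in", "Cupertino"]
sentence_features = [token_features(tokens, i) for i in range(len(tokens))]
print(sentence_features[4])
```

A capitalized word preceded by "in" is a strong hint for a location, and a four-digit shape (`dddd`) is a strong hint for a date; the model learns how much weight to give each such feature during training.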

For example, consider the sentence:

"Apple Inc. was founded by Steve Jobs, Steve Wozniak, and Ronald Wayne on April 1, 1976, in Cupertino, California."

In this sentence, NER would identify the following named entities:

- "Apple Inc." as an organization.
- "Steve Jobs," "Steve Wozniak," and "Ronald Wayne" as persons.
- "April 1, 1976" as a date.
- "Cupertino, California" as a location.

This example illustrates how NER can extract meaningful information from text by identifying and categorizing named entities.
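NER output is usually reported as character spans rather than bare strings. The sketch below writes the hand-annotated entities from the example sentence as `(text, start_char, end_char, label)` tuples, the same layout the spaCy program in part ii writes to its output file. The label names are illustrative, and the offsets are computed with `str.index` rather than by a model.

```python
# Express the hand-annotated entities from the example sentence as
# (text, start_char, end_char, label) tuples. Labels are illustrative.
sentence = (
    "Apple Inc. was founded by Steve Jobs, Steve Wozniak, and Ronald Wayne "
    "on April 1, 1976, in Cupertino, California."
)

gold = [
    ("Apple Inc.", "ORG"),
    ("Steve Jobs", "PERSON"),
    ("Steve Wozniak", "PERSON"),
    ("Ronald Wayne", "PERSON"),
    ("April 1, 1976", "DATE"),
    ("Cupertino, California", "LOCATION"),
]

entities = []
search_from = 0
for text, label in gold:
    # Search after the previous entity so repeated words ("Steve") and
    # offsets stay in left-to-right order.
    start = sentence.index(text, search_from)
    end = start + len(text)
    entities.append((text, start, end, label))
    search_from = end

for ent in entities:
    print(ent)
```

Because `end` is exclusive, `sentence[start:end]` always recovers the entity text, which makes span-based output easy to check.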

NER systems can be further enhanced by incorporating domain-specific knowledge (such as gazetteers, i.e. lists of known names), leveraging wider context, and using ensemble methods to improve accuracy. However, NER also faces challenges such as ambiguous entities (e.g. "Apple" as a company versus a fruit), out-of-vocabulary words, and adaptation to new languages and domains.
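One simple way to inject domain-specific knowledge is a gazetteer lookup. The sketch below greedily finds the longest token sequence that appears in a dictionary of known names; the function `gazetteer_match` and the gazetteer entries (including the company "Acme Corp") are hypothetical. In practice such matches are usually combined with a statistical model, for example as extra features, since a dictionary alone cannot resolve ambiguous names.

```python
# Gazetteer lookup: match token sequences against a domain-specific list of
# known entity names, preferring the longest match at each position.
from typing import Dict, List, Tuple


def gazetteer_match(
    tokens: List[str], gazetteer: Dict[Tuple[str, ...], str]
) -> List[Tuple[int, int, str]]:
    """Return (start, end_exclusive, label) spans found by greedy longest match."""
    max_len = max((len(key) for key in gazetteer), default=0)
    spans = []
    i = 0
    while i < len(tokens):
        match = None
        # Try the longest candidate first so "New York City" beats "New York".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in gazetteer:
                match = (i, i + n, gazetteer[key])
                break
        if match:
            spans.append(match)
            i = match[1]  # skip past the matched entity
        else:
            i += 1
    return spans


# Hypothetical gazetteer entries.
gazetteer = {
    ("New", "York"): "LOC",
    ("New", "York", "City"): "LOC",
    ("Acme", "Corp"): "ORG",
}
tokens = ["Acme", "Corp", "opened", "an", "office", "in", "New", "York", "City"]
print(gazetteer_match(tokens, gazetteer))
```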

In conclusion, Named Entity Recognition plays a central role in extracting structured information from unstructured text, enabling various NLP applications to analyze and understand text more effectively. Using machine learning algorithms and linguistic techniques, NER systems automatically identify and classify named entities, supporting tasks such as information extraction, knowledge discovery, and semantic understanding.

### ii. Write a Python program to recognize named entities in a document. The input should be a text file containing 200 to 300 words, and the output should be written to another file.

```python
import spacy
```

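```python
# Load spaCy's small English pipeline (install it first with:
#   python -m spacy download en_core_web_sm)
nlp = spacy.load("en_core_web_sm")
```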

```python
# Paths to the input text (200-300 words) and the output file
input_file_path = "C:/Users/91709/Downloads/ADS-main/ADS Assignment/1/Input file.txt"
output_file_path = "C:/Users/91709/Downloads/ADS-main/ADS Assignment/1/Output file.txt"
```

```python
# Read the input file
with open(input_file_path, "r", encoding="utf-8") as f:
    text = f.read()
```

```python
# Process the text using spaCy
doc = nlp(text)
```

```python
# Initialize an empty list to store the named entities
entities = []
```

```python
# Iterate through the named entities in the document
for ent in doc.ents:
    # Store the entity text, its character offsets, and its label
    entities.append((ent.text, ent.start_char, ent.end_char, ent.label_))
```
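Before writing the file, it can be useful to see which entity types were found. The sketch below tallies labels with `collections.Counter`; `count_labels` is a hypothetical helper name, and the sample tuples stand in for the `entities` list built in the loop above (in the notebook you would call `count_labels(entities)` instead).

```python
# Tally how many entities of each type were found, given
# (text, start_char, end_char, label) tuples.
from collections import Counter
from typing import List, Tuple


def count_labels(entities: List[Tuple[str, int, int, str]]) -> Counter:
    """Return a Counter mapping each entity label to its frequency."""
    return Counter(label for _, _, _, label in entities)


# Hypothetical sample data in the same tuple layout as `entities`.
sample = [
    ("Apple Inc.", 0, 10, "ORG"),
    ("Steve Jobs", 26, 36, "PERSON"),
    ("Steve Wozniak", 38, 51, "PERSON"),
]
for label, count in count_labels(sample).most_common():
    print(f"{label}: {count}")
```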

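```python
# Write the named entities to the output file, one per line.
# Tabs separate the fields because entity text itself may contain spaces.
with open(output_file_path, "w", encoding="utf-8") as f:
    for ent_text, start, end, label in entities:
        f.write(f"{ent_text}\t{start}\t{end}\t{label}\n")
```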

```python
print("Named entities have been written to the output file.")
```

Output:

```
Named entities have been written to the output file.
```
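The cells above can also be collected into reusable functions that enforce the assignment's 200 to 300 word requirement. This is a sketch under a few assumptions: the helper names `check_word_count`, `write_entities`, and `run_ner` are hypothetical, words are counted by splitting on whitespace (an approximation), and the output is written as tab-separated values with a header row. Only `spacy.load`, `doc.ents`, and the `Span` attributes already used in the notebook are taken from spaCy.

```python
# Consolidated, reusable version of the notebook cells, with a check
# that the input has 200-300 words as the assignment requires.
import csv
from typing import Iterable, Tuple

Entity = Tuple[str, int, int, str]  # (text, start_char, end_char, label)


def check_word_count(text: str, low: int = 200, high: int = 300) -> int:
    """Return the whitespace-delimited word count, or raise if out of range."""
    count = len(text.split())
    if not low <= count <= high:
        raise ValueError(f"expected {low}-{high} words, found {count}")
    return count


def write_entities(entities: Iterable[Entity], path: str) -> None:
    """Write entities as tab-separated rows preceded by a header line."""
    with open(path, "w", encoding="utf-8", newline="") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(["text", "start_char", "end_char", "label"])
        writer.writerows(entities)


def run_ner(input_path: str, output_path: str, model: str = "en_core_web_sm") -> int:
    """Read input_path, run spaCy NER, write the results; return the entity count."""
    import spacy  # imported here so the helpers above work without spaCy

    with open(input_path, "r", encoding="utf-8") as f:
        text = f.read()
    check_word_count(text)
    doc = spacy.load(model)(text)
    entities = [(ent.text, ent.start_char, ent.end_char, ent.label_) for ent in doc.ents]
    write_entities(entities, output_path)
    return len(entities)
```

In the notebook, `run_ner(input_file_path, output_file_path)` would replace the individual cells. Using the `csv` module rather than manual string formatting means any tab or quote inside an entity's text is quoted correctly, so the file can be read back unambiguously.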
