
# NLP Lab 7
****
**Aim:** Identify named entities in text data using Named Entity Recognition (NER).

**Roll No.:** C070  
**Name:** Ronit Shetty  
**SAP ID:** 70322000128  
**Division:** C  
**Batch:** C1  

In [1]:
# Install spaCy library and download the small English model
!pip install -q spacy
!python -m spacy download en_core_web_sm

Collecting en-core-web-sm==3.8.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.8.0/en_core_web_sm-3.8.0-py3-none-any.whl (12.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.8/12.8 MB 107.1 MB/s eta 0:00:00
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')
⚠ Restart to reload dependencies
If you are in a Jupyter or Colab notebook, you may need to restart Python in
order to load all the package's dependencies. You can do this by selecting the
'Restart kernel' or 'Restart runtime' option.


# Implementing NER with spaCy

This task uses spaCy, an NLP library designed for production use, to identify named entities with its pre-trained English pipeline.

In [2]:
import spacy

# Load the pre-trained English language model
# 'en_core_web_sm' is a small English model trained on written web text.
nlp = spacy.load("en_core_web_sm")

# Define the text to be analyzed
text = """
Apple Inc., a technology company founded by Steve Jobs, is reportedly looking to acquire a U.K. startup for over $1 billion.
The deal is expected to be finalized by next Tuesday in London. Sundar Pichai, the CEO of Google, commented on the market trends.
"""

# Process the text with the spaCy NLP pipeline
doc = nlp(text)

# Print the identified named entities, their labels, and an explanation of the label.
print("--- Identified Named Entities ---")
for ent in doc.ents:
    print(f"Entity: {ent.text:<15} | Label: {ent.label_:<10} | Explanation: {spacy.explain(ent.label_)}")

--- Identified Named Entities ---
Entity: Apple Inc.      | Label: ORG        | Explanation: Companies, agencies, institutions, etc.
Entity: Steve Jobs      | Label: PERSON     | Explanation: People, including fictional
Entity: U.K.            | Label: GPE        | Explanation: Countries, cities, states
Entity: over $1 billion | Label: MONEY      | Explanation: Monetary values, including unit
Entity: next Tuesday    | Label: DATE       | Explanation: Absolute or relative dates or periods
Entity: London          | Label: GPE        | Explanation: Countries, cities, states
Entity: Sundar Pichai   | Label: PERSON     | Explanation: People, including fictional
Entity: Google          | Label: ORG        | Explanation: Companies, agencies, institutions, etc.
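
The flat listing above can be easier to scan once the entities are grouped by label. The following is a minimal sketch, assuming a hypothetical helper `group_entities` (not part of spaCy). It uses the (text, label) pairs copied from the output above so that it runs without loading a model; in the notebook the pairs would come from `doc.ents`.

```python
from collections import defaultdict


def group_entities(pairs):
    """Group (entity_text, label) pairs into {label: [entity_text, ...]}.

    Hypothetical helper for illustration; not part of spaCy.
    """
    grouped = defaultdict(list)
    for text, label in pairs:
        grouped[label].append(text)
    return dict(grouped)


# Pairs copied from the printed output above. With the pipeline loaded, use:
#   pairs = [(ent.text, ent.label_) for ent in doc.ents]
pairs = [
    ("Apple Inc.", "ORG"),
    ("Steve Jobs", "PERSON"),
    ("U.K.", "GPE"),
    ("over $1 billion", "MONEY"),
    ("next Tuesday", "DATE"),
    ("London", "GPE"),
    ("Sundar Pichai", "PERSON"),
    ("Google", "ORG"),
]

grouped = group_entities(pairs)
for label, texts in grouped.items():
    print(f"{label:<8} {', '.join(texts)}")
```

Keeping the helper independent of spaCy objects means the same function could be reused on entity pairs produced by other libraries.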


# Visualizing Named Entities with displaCy

spaCy ships with a built-in visualizer, displaCy. It presents NER results in a more readable form by highlighting each entity, together with its label, directly in the text.

In [3]:
# Import the displacy module from spaCy
from spacy import displacy

# Use displacy to render the NER output within a Jupyter/Colab notebook
# The 'style="ent"' option specifies that we want to visualize entities.
displacy.render(doc, style="ent", jupyter=True)
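
Outside a notebook, the same visualization can be obtained as an HTML string by passing `jupyter=False`. The sketch below is an illustration under stated assumptions: it uses displaCy's manual mode (`manual=True`) with character offsets computed by hand so that it runs without downloading a model, and the `sample` sentence and the `find_span` helper are hypothetical. With the `doc` from the cell above, `displacy.render(doc, style="ent", jupyter=False)` should return comparable markup.

```python
from spacy import displacy

# Hypothetical sample sentence, shortened from the text analyzed above.
sample = "Apple Inc. was founded by Steve Jobs."


def find_span(text, phrase, label):
    """Build a manual-mode entity dict for the first occurrence of phrase."""
    start = text.index(phrase)
    return {"start": start, "end": start + len(phrase), "label": label}


# Manual mode expects the raw text plus entity spans given as character offsets.
manual_doc = {
    "text": sample,
    "ents": [
        find_span(sample, "Apple Inc.", "ORG"),
        find_span(sample, "Steve Jobs", "PERSON"),
    ],
    "title": None,
}

# jupyter=False returns the markup instead of displaying it;
# page=True wraps it in a complete HTML page.
html = displacy.render(manual_doc, style="ent", manual=True, jupyter=False, page=True)
print(html[:200])
```

The returned string can be written to a `.html` file and opened in a browser, which is one way to share the visualization outside Colab.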

# Implementing NER with NLTK

This task demonstrates how to perform NER using another popular library, the Natural Language Toolkit (NLTK). The process in NLTK is more granular, typically requiring three steps: tokenization, Part-of-Speech (POS) tagging, and then chunking to find entities.

In [4]:
import nltk

# NLTK requires downloading specific packages
nltk.download('punkt')
nltk.download('punkt_tab')
nltk.download('maxent_ne_chunker_tab')
nltk.download('words')
nltk.download('averaged_perceptron_tagger_eng')
print("\n--- Task 3: Identifying Named Entities using NLTK ---")

# 1. Tokenize the sentence into words
tokenized_text = nltk.word_tokenize(text)

# 2. Apply Part-of-Speech (POS) tagging to the tokenized words
pos_tagged_text = nltk.pos_tag(tokenized_text)

# 3. Apply Named Entity chunking
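# With the default binary=False, each entity gets a type label (PERSON, ORGANIZATION, GPE, ...).
# Passing binary=True would instead group all named entities under a single 'NE' label.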
named_entities_tree = nltk.ne_chunk(pos_tagged_text)

print(named_entities_tree)

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!
[nltk_data] Downloading package maxent_ne_chunker_tab to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package maxent_ne_chunker_tab is already up-to-date!
[nltk_data] Downloading package words to /root/nltk_data...
[nltk_data]   Package words is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger_eng to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger_eng is already up-to-
[nltk_data]       date!



--- Task 3: Identifying Named Entities using NLTK ---
(S
  (PERSON Apple/NNP)
  (ORGANIZATION Inc./NNP)
  ,/,
  a/DT
  technology/NN
  company/NN
  founded/VBN
  by/IN
  (PERSON Steve/NNP Jobs/NNP)
  ,/,
  is/VBZ
  reportedly/RB
  looking/VBG
  to/TO
  acquire/VB
  a/DT
  U.K./NNP
  startup/NN
  for/IN
  over/IN
  $/$
  1/CD
  billion/CD
  ./.
  The/DT
  deal/NN
  is/VBZ
  expected/VBN
  to/TO
  be/VB
  finalized/VBN
  by/IN
  next/JJ
  Tuesday/NNP
  in/IN
  (GPE London/NNP)
  ./.
  (PERSON Sundar/NNP Pichai/NNP)
  ,/,
  the/DT
  (ORGANIZATION CEO/NNP)
  of/IN
  (GPE Google/NNP)
  ,/,
  commented/VBD
  on/IN
  the/DT
  market/NN
  trends/NNS
  ./.)
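
The tree printed above mixes plain `(word, tag)` leaves with labelled subtrees, so a common next step is to flatten it into a list of entities. The sketch below assumes a hypothetical helper `extract_entities` and rebuilds a small, abbreviated fragment of the tree by hand with `nltk.Tree`, so that it runs without the downloaded models; in the notebook it could be called on `named_entities_tree` directly.

```python
from nltk import Tree


def extract_entities(tree):
    """Return (entity_text, label) pairs for each labelled chunk under the root.

    Hypothetical helper for illustration; not part of NLTK.
    """
    entities = []
    for node in tree:
        # Named-entity chunks are Tree objects; ordinary tokens are (word, tag) tuples.
        if isinstance(node, Tree):
            text = " ".join(word for word, tag in node.leaves())
            entities.append((text, node.label()))
    return entities


# Hand-built, abbreviated fragment using chunks from the output above.
fragment = Tree("S", [
    Tree("PERSON", [("Apple", "NNP")]),
    Tree("ORGANIZATION", [("Inc.", "NNP")]),
    (",", ","),
    ("founded", "VBN"),
    ("by", "IN"),
    Tree("PERSON", [("Steve", "NNP"), ("Jobs", "NNP")]),
    ("in", "IN"),
    Tree("GPE", [("London", "NNP")]),
])

for text, label in extract_entities(fragment):
    print(f"{text:<12} {label}")
```

Applied to the full tree, a flat list like this makes the differences from the spaCy run easy to spot: in this output NLTK splits "Apple Inc." into a PERSON chunk and an ORGANIZATION chunk, labels "CEO" as ORGANIZATION and "Google" as GPE, and leaves "U.K.", the monetary amount, and the date unchunked.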
