# **Assignment 3:**

Apply named entity recognition (NER) to identify entities such as names, organizations, and locations in a given
text. Perform sentence segmentation on a paragraph and explain its importance in NLP tasks.

### **Importing necessary libraries**


In [19]:
import spacy
import nltk
nltk.download('punkt')  # For sentence tokenization
nltk.download('punkt_tab')
from nltk.tokenize import sent_tokenize # NLTK for sentence segmentation
from collections import Counter
from spacy import displacy  # For visualization

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!


### **Load the NLP model**

In [20]:
# Load spaCy English model
nlp = spacy.load("en_core_web_sm")

`en_core_web_sm` is spaCy's small English pipeline. It is sufficient for basic NER, although a model of this size can miss or mislabel some entities.

### **NER Text**

In [21]:
paragraph = """
On 5th November 2025, Sundar Pichai, the CEO of Google, attended the Global Tech Summit in San Francisco, California.
During the event, he discussed the future of artificial intelligence, cloud computing, and quantum computing, emphasizing the role of Google Cloud, TensorFlow, and Alphabet Inc.
Microsoft, Amazon, and IBM also participated, sharing their latest innovations in machine learning, cybersecurity, and enterprise software.
Meanwhile, in Japan, Sony announced a collaboration with Panasonic to develop advanced battery technology for electric vehicles, aiming to reduce carbon emissions.
Intel and NVIDIA presented their latest processors and GPUs designed for AI and high-performance computing.
The summit attracted participants from over 50 countries, including France, Brazil, Australia, Germany, South Korea, and Canada, highlighting the global impact of technology.
Workshops led by Elon Musk and Jeff Bezos focused on sustainable energy, space exploration, and autonomous systems.
Additionally, the summit included panels on data privacy and ethical AI, featuring experts from MIT, Stanford University, and the University of Tokyo.
"""


### **Text Processing**

In [22]:
doc = nlp(paragraph)


In [23]:
doc


On 5th November 2025, Sundar Pichai, the CEO of Google, attended the Global Tech Summit in San Francisco, California. 
During the event, he discussed the future of artificial intelligence, cloud computing, and quantum computing, emphasizing the role of Google Cloud, TensorFlow, and Alphabet Inc. 
Microsoft, Amazon, and IBM also participated, sharing their latest innovations in machine learning, cybersecurity, and enterprise software. 
Meanwhile, in Japan, Sony announced a collaboration with Panasonic to develop advanced battery technology for electric vehicles, aiming to reduce carbon emissions. 
Intel and NVIDIA presented their latest processors and GPUs designed for AI and high-performance computing. 
The summit attracted participants from over 50 countries, including France, Brazil, Australia, Germany, South Korea, and Canada, highlighting the global impact of technology. 
Workshops led by Elon Musk and Jeff Bezos focused on sustainable energy, space exploration, and autonomous sys

### **Sentence Segmentation**

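Sentence segmentation splits a text into individual sentences. Many NLP tasks, such as parsing, translation, summarization, and sentence-level NER, work on one sentence at a time, so a wrong boundary carries its error into every later step.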
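
To see why sentence boundaries are hard to get right, here is a minimal sketch that splits on sentence-final punctuation with a regular expression. It is not how `sent_tokenize` works internally; the function `naive_split` and the sample text are hypothetical, made up for illustration only.

```python
import re

def naive_split(text):
    """Split after '.', '!' or '?' followed by whitespace (deliberately simplistic)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

# Hypothetical sample: two real sentences that contain two abbreviations
sample = "Dr. Rao joined Alphabet Inc. in 2020. She now leads a research team."

for i, sentence in enumerate(naive_split(sample), start=1):
    print(f"{i}. {sentence}")
```

The sample contains two sentences, but the rule returns four pieces because it cannot tell an abbreviation period from a sentence-final one. Trained segmenters exist to handle such cases.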

In [25]:
sentences = sent_tokenize(paragraph)
print("Segmented Sentences:\n")
for i, sentence in enumerate(sentences, start=1):
    print(f"{i}. {sentence.strip()}\n")

Segmented Sentences:

1. On 5th November 2025, Sundar Pichai, the CEO of Google, attended the Global Tech Summit in San Francisco, California.

2. During the event, he discussed the future of artificial intelligence, cloud computing, and quantum computing, emphasizing the role of Google Cloud, TensorFlow, and Alphabet Inc. 
Microsoft, Amazon, and IBM also participated, sharing their latest innovations in machine learning, cybersecurity, and enterprise software.

3. Meanwhile, in Japan, Sony announced a collaboration with Panasonic to develop advanced battery technology for electric vehicles, aiming to reduce carbon emissions.

4. Intel and NVIDIA presented their latest processors and GPUs designed for AI and high-performance computing.

5. The summit attracted participants from over 50 countries, including France, Brazil, Australia, Germany, South Korea, and Canada, highlighting the global impact of technology.

6. Workshops led by Elon Musk and Jeff Bezos focused on sustainable energy
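
In the output above, segment 2 spans two lines of the input: the tokenizer did not split after "Alphabet Inc.", most likely because it treats "Inc." as an abbreviation rather than a sentence end. Since this paragraph has one sentence per line, a rough check for missed boundaries is to look for segments that still contain a line break. The sketch below assumes that one-sentence-per-line layout; `find_merged_segments` is a hypothetical helper and the two segments are shortened stand-ins for the real ones.

```python
def find_merged_segments(segments):
    """Return (index, segment) pairs whose stripped text still contains a line break."""
    return [(i, s) for i, s in enumerate(segments, start=1) if "\n" in s.strip()]

# Shortened stand-ins for segments 1 and 2 of the output above
segments = [
    "\nOn 5th November 2025, Sundar Pichai attended the Global Tech Summit.",
    "He emphasized the role of Alphabet Inc. \nMicrosoft, Amazon, and IBM also participated.",
]

for index, segment in find_merged_segments(segments):
    print(f"Segment {index} may contain more than one sentence:")
    print(segment.strip())
```

With the real data, the same call would be `find_merged_segments(sentences)`.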

In [26]:
for ent in doc.ents:
    # token span (start, end), character span (start_char, end_char), entity label
    print(ent.text, ent.start, ent.end, ent.start_char, ent.end_char, ent.label_)

5th 2 3 4 7 ORDINAL
November 2025 3 5 8 21 DATE
Sundar Pichai 6 8 23 36 PERSON
Google 12 13 49 55 ORG
the Global Tech Summit 15 19 66 88 ORG
San Francisco 20 22 92 105 GPE
California 23 24 107 117 GPE
Google Cloud, TensorFlow 49 53 254 278 ORG
Alphabet Inc. 55 57 284 297 ORG
Microsoft 58 59 299 308 ORG
Amazon 60 61 310 316 ORG
IBM 63 64 322 325 ORG
Japan 85 86 454 459 GPE
Sony 87 88 461 465 ORG
Panasonic 92 93 497 506 ORG
Intel 109 110 605 610 ORG
AI 120 121 678 680 GPE
over 50 133 135 753 760 CARDINAL
France 138 139 782 788 GPE
Brazil 140 141 790 796 GPE
Australia 142 143 798 807 GPE
Germany 144 145 809 816 GPE
South Korea 146 148 818 829 GPE
Canada 150 151 835 841 GPE
Elon Musk 163 165 907 916 PERSON
Jeff Bezos 166 168 921 931 PERSON
MIT 197 198 1103 1106 ORG
Stanford University 199 201 1108 1127 ORG
the University of Tokyo 203 207 1133 1156 ORG
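
The first pair of numbers in each row is a token span (`ent.start`, `ent.end`); the character offsets index into the original string, and spaCy also exposes an exclusive end offset as `ent.end_char`. The following sketch checks that relationship in plain Python on the first sentence, using the start offsets printed above. It assumes the end offset equals the start offset plus the length of the entity text, which holds for a contiguous span.

```python
# First sentence of the paragraph, including the leading newline of the triple-quoted string
text = (
    "\nOn 5th November 2025, Sundar Pichai, the CEO of Google, "
    "attended the Global Tech Summit in San Francisco, California."
)

# (entity text, start_char) pairs taken from the output above
entities = [
    ("5th", 4), ("November 2025", 8), ("Sundar Pichai", 23), ("Google", 49),
    ("the Global Tech Summit", 66), ("San Francisco", 92), ("California", 107),
]

for ent_text, start_char in entities:
    end_char = start_char + len(ent_text)  # exclusive end offset
    assert text[start_char:end_char] == ent_text
    print(f"{ent_text!r} == text[{start_char}:{end_char}]")
```

Character offsets like these are what is needed to highlight or replace an entity in the raw text.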


### **Named Entity Recognition (NER)**
NER locates spans of text that refer to real-world entities, such as people, organizations, and locations, and assigns each span a type label.

In [28]:
# Apply NER on full paragraph
doc = nlp(paragraph)
print("\nNamed Entities in Paragraph:\n")
for ent in doc.ents:
    print(f"{ent.text} - {ent.label_}")


Named Entities in Paragraph:

5th - ORDINAL
November 2025 - DATE
Sundar Pichai - PERSON
Google - ORG
the Global Tech Summit - ORG
San Francisco - GPE
California - GPE
Google Cloud, TensorFlow - ORG
Alphabet Inc. - ORG
Microsoft - ORG
Amazon - ORG
IBM - ORG
Japan - GPE
Sony - ORG
Panasonic - ORG
Intel - ORG
AI - GPE
over 50 - CARDINAL
France - GPE
Brazil - GPE
Australia - GPE
Germany - GPE
South Korea - GPE
Canada - GPE
Elon Musk - PERSON
Jeff Bezos - PERSON
MIT - ORG
Stanford University - ORG
the University of Tokyo - ORG
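
The assignment asks for names, organizations, and locations, so it can help to keep only the `PERSON`, `ORG`, and `GPE` labels and group the entity texts by label. The sketch below is one possible way to do that in plain Python. `group_entities` and `WANTED` are hypothetical names, and the input is a small subset of the pairs printed above.

```python
from collections import defaultdict

# (text, label) pairs copied from the output above (subset)
entities = [
    ("Sundar Pichai", "PERSON"), ("Google", "ORG"), ("San Francisco", "GPE"),
    ("5th", "ORDINAL"), ("AI", "GPE"), ("Elon Musk", "PERSON"), ("MIT", "ORG"),
]

WANTED = {"PERSON", "ORG", "GPE"}  # labels of interest for this assignment

def group_entities(pairs, wanted=WANTED):
    """Group entity texts by label, keeping only wanted labels and dropping duplicates."""
    grouped = defaultdict(list)
    for text, label in pairs:
        if label in wanted and text not in grouped[label]:
            grouped[label].append(text)
    return dict(grouped)

for label, texts in group_entities(entities).items():
    print(f"{label}: {', '.join(texts)}")
```

With the full output, the pairs could be built as `[(ent.text, ent.label_) for ent in doc.ents]`. Grouping also makes questionable labels easier to spot, such as "AI" appearing among the `GPE` entities.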


### **Counting entities by Type**

In [29]:
# Count entities by type
entity_labels = [ent.label_ for ent in doc.ents]
label_count = Counter(entity_labels)
print("\nEntity Count by Type:")
for label, count in label_count.items():
    print(f"{label}: {count}")



Entity Count by Type:
ORDINAL: 1
DATE: 1
PERSON: 3
ORG: 13
GPE: 10
CARDINAL: 1
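
The counts above appear in order of first occurrence. As a small variation, `Counter.most_common()` lists them from most to least frequent, and each count can be shown as a share of the total. The counts in this sketch are copied from the output above rather than recomputed.

```python
from collections import Counter

# Counts copied from the "Entity Count by Type" output above
label_count = Counter(
    {"ORDINAL": 1, "DATE": 1, "PERSON": 3, "ORG": 13, "GPE": 10, "CARDINAL": 1}
)

total = sum(label_count.values())
print(f"Total entities: {total}")
for label, count in label_count.most_common():
    # Left-align the label, right-align the count, show the share as a percentage
    print(f"{label:<9} {count:>2}  ({count / total:.0%})")
```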


In [30]:
# NER sentence by sentence
print("\nEntities Sentence by Sentence:\n")
for sentence in sentences:
    doc_sentence = nlp(sentence)
    print(f"Sentence: {sentence}")
    for ent in doc_sentence.ents:
        print(f"  - {ent.text} ({ent.label_})")
    print()



Entities Sentence by Sentence:

Sentence: 
On 5th November 2025, Sundar Pichai, the CEO of Google, attended the Global Tech Summit in San Francisco, California.
  - 5th (ORDINAL)
  - November 2025 (DATE)
  - Sundar Pichai (PERSON)
  - Google (ORG)
  - the Global Tech Summit (ORG)
  - San Francisco (GPE)
  - California (GPE)

Sentence: During the event, he discussed the future of artificial intelligence, cloud computing, and quantum computing, emphasizing the role of Google Cloud, TensorFlow, and Alphabet Inc. 
Microsoft, Amazon, and IBM also participated, sharing their latest innovations in machine learning, cybersecurity, and enterprise software.
  - Google Cloud, TensorFlow (ORG)
  - Alphabet Inc. (ORG)
  - Microsoft (ORG)
  - Amazon (ORG)
  - IBM (ORG)

Sentence: Meanwhile, in Japan, Sony announced a collaboration with Panasonic to develop advanced battery technology for electric vehicles, aiming to reduce carbon emissions.
  - Japan (GPE)
  - Sony (ORG)
  - Panasonic (ORG)

Senten

### **Visualization**

Highlighting the entities directly in the text makes it easy to check at a glance which spans the model labeled and where it made mistakes.

In [35]:
# Visualize named entities in the paragraph

print("\nVisualizing Named Entities")
displacy.render(doc, style="ent", jupyter=True)


Visualizing Named Entities


In [38]:
# Explanation of Labels

print("\nNER Label Explanation:")
print("PERSON -> Names of people")
print("ORG    -> Organizations, companies, institutions")
print("GPE    -> Geopolitical entities (cities, countries, states)")



NER Label Explanation:
PERSON -> Names of people
ORG    -> Organizations, companies, institutions
GPE    -> Geopolitical entities (cities, countries, states)
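
The explanation above covers three of the six labels that appear in this notebook's output. The sketch below extends it to all six with a plain dictionary lookup. The wording of the `DATE`, `CARDINAL`, and `ORDINAL` descriptions is paraphrased and should be treated as an assumption, and `describe` is a hypothetical helper. spaCy's own `spacy.explain(label)` is a built-in alternative.

```python
LABEL_DESCRIPTIONS = {
    "PERSON": "Names of people",
    "ORG": "Organizations, companies, institutions",
    "GPE": "Geopolitical entities (cities, countries, states)",
    "DATE": "Absolute or relative dates or periods",
    "CARDINAL": "Numerals that do not fall under another type",
    "ORDINAL": "Ordinal numbers such as 'first' or '5th'",
}

def describe(label):
    """Look up a short description, with a fallback for unknown labels."""
    return LABEL_DESCRIPTIONS.get(label, "No description available")

# Labels in the order they first appear in the NER output
for label in ["ORDINAL", "DATE", "PERSON", "ORG", "GPE", "CARDINAL"]:
    print(f"{label:<8} -> {describe(label)}")
```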
