**Install and Import spaCy**


In [10]:
!pip install spacy
!python -m spacy download en_core_web_sm


Collecting en-core-web-sm==3.7.1
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.7.1/en_core_web_sm-3.7.1-py3-none-any.whl (12.8 MB)
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')
⚠ Restart to reload dependencies
If you are in a Jupyter or Colab notebook, you may need to restart Python in
order to load all the package's dependencies. You can do this by selecting the
'Restart kernel' or 'Restart runtime' option.


In [11]:
import spacy
from spacy import displacy


**Load spaCy Model and Process Text**


In [12]:
# Load the small English model
nlp = spacy.load("en_core_web_sm")

# Sample text for Named Entity Recognition
text = """Elon Musk is the CEO of Tesla and SpaceX.
He was born in Pretoria, South Africa, and now lives in the United States.
In 2021, his net worth exceeded $200 billion according to Forbes."""

# Process text with spaCy NLP pipeline
doc = nlp(text)


**Extract Named Entities**

In [13]:
print("Named Entities, their labels, and explanations:")
for ent in doc.ents:
    print(f"{ent.text} ({ent.label_}) - {spacy.explain(ent.label_)}")


Named Entities, their labels, and explanations:
Elon Musk (PERSON) - People, including fictional
Tesla (ORG) - Companies, agencies, institutions, etc.
Pretoria (GPE) - Countries, cities, states
South Africa (GPE) - Countries, cities, states
the United States (GPE) - Countries, cities, states
2021 (DATE) - Absolute or relative dates or periods
$200 billion (MONEY) - Monetary values, including unit
Forbes (ORG) - Companies, agencies, institutions, etc.
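
A flat printout like the one above is often easier to work with once the entities are grouped by label. The sketch below does that in plain Python on the (text, label) pairs copied from the output; the helper name `group_by_label` is made up for this example. SpaceX appears in the sample text but not in the printed output, so it is absent here too.

```python
from collections import defaultdict

# (text, label) pairs copied from the output above. With spaCy loaded, the
# same list comes from: [(ent.text, ent.label_) for ent in doc.ents]
entities = [
    ("Elon Musk", "PERSON"),
    ("Tesla", "ORG"),
    ("Pretoria", "GPE"),
    ("South Africa", "GPE"),
    ("the United States", "GPE"),
    ("2021", "DATE"),
    ("$200 billion", "MONEY"),
    ("Forbes", "ORG"),
]


def group_by_label(pairs):
    """Map each label to the entity texts carrying it, in document order."""
    grouped = defaultdict(list)
    for text, label in pairs:
        grouped[label].append(text)
    return dict(grouped)


grouped = group_by_label(entities)
for label, texts in grouped.items():
    print(f"{label}: {', '.join(texts)}")
```
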


**Visualize Named Entities**


In [14]:
displacy.render(doc, style="ent", jupyter=True)
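
To get a feel for what an entity visualization involves, here is a rough plain-Python sketch that wraps entities in HTML `<mark>` tags. It is not displaCy's implementation or markup; the function name `highlight` and the `title` attribute are choices made for this example, and it locates each entity with `str.find` instead of the character offsets (`ent.start_char`, `ent.end_char`) that spaCy provides.

```python
import html

sentence = "Elon Musk is the CEO of Tesla and SpaceX."
# Entities from the output above that occur in this sentence.
sentence_entities = [("Elon Musk", "PERSON"), ("Tesla", "ORG")]


def highlight(text, entities):
    """Wrap the first occurrence of each entity in a <mark> tag."""
    spans = []
    for ent_text, label in entities:
        start = text.find(ent_text)
        if start != -1:
            spans.append((start, start + len(ent_text), label))
    spans.sort()

    parts, cursor = [], 0
    for start, end, label in spans:
        if start < cursor:  # skip spans that overlap an earlier one
            continue
        parts.append(html.escape(text[cursor:start]))
        parts.append(
            f'<mark title="{label}">{html.escape(text[start:end])}</mark>'
        )
        cursor = end
    parts.append(html.escape(text[cursor:]))
    return "".join(parts)


marked = highlight(sentence, sentence_entities)
print(marked)
```
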

**Bonus: Try Custom Text**

You can change the text variable to run the same pipeline on different content, such as a news article or a Wikipedia excerpt.


In [15]:
# Load the small English model again under a new name
# (reusing nlp from In [12] would work equally well)
nlp1 = spacy.load("en_core_web_sm")

# Custom text
text = """Robotics is an interdisciplinary field that integrates computer science and engineering.
Robots are widely used in industries such as automotive manufacturing and healthcare.
The first modern industrial robot, Unimate, was developed by George Devol in 1956 and later used in General Motors."""

# Process text with spaCy NLP pipeline
doc = nlp1(text)

# Extract and print named entities
print("Named Entities, their labels, and explanations:")
for ent in doc.ents:
    print(f"{ent.text} ({ent.label_}) - {spacy.explain(ent.label_)}")

Named Entities, their labels, and explanations:
Robotics (ORG) - Companies, agencies, institutions, etc.
first (ORDINAL) - "first", "second", etc.
Unimate (ORG) - Companies, agencies, institutions, etc.
George Devol (PERSON) - People, including fictional
1956 (DATE) - Absolute or relative dates or periods
General Motors (ORG) - Companies, agencies, institutions, etc.
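
In the output above, Robotics (a field) and Unimate (a robot) are both labelled ORG, which shows that the small model's predictions need checking. One simple way to patch known mistakes is to post-process the (text, label) pairs, sketched below. The `LABEL_OVERRIDES` table and the `apply_overrides` helper are hypothetical and hand-written for this text; within a spaCy pipeline, the rule-based `entity_ruler` component is the usual route for this kind of correction.

```python
# (text, label) pairs copied from the output above.
predicted = [
    ("Robotics", "ORG"),
    ("first", "ORDINAL"),
    ("Unimate", "ORG"),
    ("George Devol", "PERSON"),
    ("1956", "DATE"),
    ("General Motors", "ORG"),
]

# Hypothetical corrections chosen by hand for this text:
# None drops the entity, a string replaces its label.
LABEL_OVERRIDES = {
    "Robotics": None,      # a field of study, not an organization
    "Unimate": "PRODUCT",  # a robot, not a company
}


def apply_overrides(pairs, overrides):
    """Return the pairs with overridden labels; drop entities mapped to None."""
    corrected = []
    for text, label in pairs:
        new_label = overrides.get(text, label)
        if new_label is not None:
            corrected.append((text, new_label))
    return corrected


corrected = apply_overrides(predicted, LABEL_OVERRIDES)
for text, label in corrected:
    print(f"{text} ({label})")
```
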


In [16]:
displacy.render(doc, style="ent", jupyter=True)

**Discussion Questions**
1. What types of entities were extracted from the text?
2. How does NER help in data analysis and information retrieval?


**What types of entities were extracted from the text?**

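From the custom text (In [15]):

ORG (Organizations) → Robotics, Unimate, General Motors. Only General Motors is an actual organization; Robotics is a field and Unimate is a robot, so the small model mislabels both.

ORDINAL (Ordinal numbers) → first

PERSON (People) → George Devol

DATE (Dates or periods) → 1956

The first sample text (In [12]) additionally produced GPE (countries, cities, states) and MONEY (monetary values) entities.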


**How does NER help in data analysis and information retrieval?**

* Automatically extracts key information (e.g., names, dates, organizations) from unstructured text.
* Turns free text into structured (entity, label) records that can be categorized, filtered, and counted.
* Lets search engines match queries against recognized entities rather than raw keywords, which improves retrieval.
* Supports business intelligence by tracking which entities are mentioned across large text datasets, and how often.
* Helps chatbots and AI assistants identify the people, places, and dates in a user query so they can respond more precisely.
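
The retrieval point can be made concrete with a small inverted index that maps each recognized entity to the documents mentioning it. This is a minimal sketch in plain Python: the document ids (`"musk"`, `"robotics"`) and the helper names are made up for this example, and the entities are copied from the two outputs in this notebook as printed, including the model's mislabels.

```python
from collections import defaultdict

# Entities per document, copied from the two outputs in this notebook.
doc_entities = {
    "musk": [
        ("Elon Musk", "PERSON"), ("Tesla", "ORG"), ("Pretoria", "GPE"),
        ("South Africa", "GPE"), ("the United States", "GPE"),
        ("2021", "DATE"), ("$200 billion", "MONEY"), ("Forbes", "ORG"),
    ],
    "robotics": [
        ("Robotics", "ORG"), ("first", "ORDINAL"), ("Unimate", "ORG"),
        ("George Devol", "PERSON"), ("1956", "DATE"),
        ("General Motors", "ORG"),
    ],
}


def build_entity_index(docs):
    """Map (lowercased entity text, label) to the set of document ids."""
    index = defaultdict(set)
    for doc_id, pairs in docs.items():
        for text, label in pairs:
            index[(text.lower(), label)].add(doc_id)
    return index


def search(index, text, label):
    """Return the sorted ids of documents that mention the entity."""
    return sorted(index.get((text.lower(), label), set()))


index = build_entity_index(doc_entities)
print(search(index, "general motors", "ORG"))
print(search(index, "Tesla", "ORG"))
```

Keying the index on both text and label means a query for Tesla as an ORG will not match a document where the same string was tagged as something else.
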