This project demonstrates Named Entity Recognition (NER) with spaCy.

NER is a sub-task of information extraction that locates named entities in unstructured text and classifies them into predefined categories such as person names, organizations, locations, medical codes, time expressions, quantities, monetary values, and percentages.
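Concretely, NER output can be represented as a list of labeled character spans over the input text. The sketch below is plain Python with hand-written spans (hypothetical, not model output) to show that representation. The labels follow the spaCy scheme used later in this notebook.

```python
text = "Tim Cook announced this on Monday."

# Hand-labeled example spans: (start_char, end_char, label), end is exclusive
entities = [(0, 8, "PERSON"), (27, 33, "DATE")]


def show_entities(text, entities):
    """Return 'span -> LABEL' strings for each labeled character span."""
    return [f"{text[start:end]} -> {label}" for start, end, label in entities]


for line in show_entities(text, entities):
    print(line)
# Tim Cook -> PERSON
# Monday -> DATE
```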

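We will use spaCy, a widely used, production-oriented NLP library. Its small pretrained pipelines (such as en_core_web_sm, itself a compact neural model) run quickly on a CPU, trading some accuracy for speed compared with large transformer-based models. spaCy also ships with a built-in entity visualizer.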

Cell 1: Install & Import Libraries
We need spaCy for the NLP pipeline. We also need to download a trained English pipeline (en_core_web_sm), which packages the model weights, vocabulary, and configuration for English.

In [1]:
# 1. Install spaCy (if not already installed)
!pip install -q spacy

# 2. Download the pre-trained English model
# 'en_core_web_sm' is a small, efficient model perfect for testing.
!python -m spacy download en_core_web_sm

import spacy
from spacy import displacy

print("✅ spaCy installed and model downloaded.")

Collecting en-core-web-sm==3.8.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.8.0/en_core_web_sm-3.8.0-py3-none-any.whl (12.8 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m12.8/12.8 MB[0m [31m105.1 MB/s[0m eta [36m0:00:00[0m
[?25h[38;5;2m✔ Download and installation successful[0m
You can now load the package via spacy.load('en_core_web_sm')
[38;5;3m⚠ Restart to reload dependencies[0m
If you are in a Jupyter or Colab notebook, you may need to restart Python in
order to load all the package's dependencies. You can do this by selecting the
'Restart kernel' or 'Restart runtime' option.
✅ spaCy installed and model downloaded.


Cell 2: Load the Model
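We load the English pipeline into an nlp object, which acts as our processing pipeline. When we call it on text, it first tokenizes the text, then passes the result through each pipeline component in order: part-of-speech tagging, dependency parsing, and finally entity recognition (plus a few helper components).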

In [5]:
# Load the small English model
nlp = spacy.load("en_core_web_sm")

print("✅ Model loaded. Ready to process text.")

✅ Model loaded. Ready to process text.
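To see which components a loaded pipeline actually runs, inspect nlp.pipe_names. For en_core_web_sm 3.x this is typically ['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner']; the tokenizer runs before all of them and is not listed. The sketch below builds a tiny pipeline on spacy.blank("en") (an assumption, so it runs without downloading a model) to show the mechanics.

```python
import spacy

# Assumption: spacy.blank("en") stands in for en_core_web_sm so the sketch
# runs without a model download. A blank pipeline has only a tokenizer.
nlp = spacy.blank("en")
print(nlp.pipe_names)  # []: the tokenizer always runs first and is not listed

# Components run in the order they are added.
nlp.add_pipe("sentencizer")
nlp.add_pipe("entity_ruler")
print(nlp.pipe_names)  # ['sentencizer', 'entity_ruler']

# Calling nlp tokenizes the text, then passes the Doc through each component.
doc = nlp("Tim Cook announced this on Monday.")
print([token.text for token in doc])
# ['Tim', 'Cook', 'announced', 'this', 'on', 'Monday', '.']
```

spacy.load also accepts a disable list (for example spacy.load("en_core_web_sm", disable=["parser"])) to skip components you do not need; check that the remaining components do not depend on the ones you disable.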


Cell 3: Process Text & Extract Entities
We will feed in a sentence containing several entity types (companies, locations, dates, and people) to see whether the model catches them.

In [6]:
# A sample text with multiple named entities
text = "Apple Inc. is planning to open a new store in San Francisco by January 2025. Tim Cook announced this on Monday."

# Process the text
doc = nlp(text)

# Iterate over the detected entities
print(f"{'ENTITY':<20} | {'LABEL':<10} | {'EXPLANATION'}")
print("-" * 50)

for ent in doc.ents:
    # ent.text = The actual word(s)
    # ent.label_ = The category code (ORG, GPE, DATE, etc.)
    # spacy.explain() = Gives a human-readable description of the tag
    print(f"{ent.text:<20} | {ent.label_:<10} | {spacy.explain(ent.label_)}")

ENTITY               | LABEL      | EXPLANATION
--------------------------------------------------
Apple Inc.           | ORG        | Companies, agencies, institutions, etc.
San Francisco        | GPE        | Countries, cities, states
January 2025         | DATE       | Absolute or relative dates or periods
Tim Cook             | PERSON     | People, including fictional
Monday               | DATE       | Absolute or relative dates or periods
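Each entity in doc.ents is a Span that also carries its position: ent.start_char and ent.end_char are character offsets, and ent.start and ent.end are token offsets. Collecting these into plain records is a simple way to store results for later use. To keep the sketch runnable without the statistical model, it swaps in a rule-based entity_ruler on a blank pipeline with hand-written patterns (an assumption); with the nlp object loaded above, the same loop over doc.ents works unchanged. The rules only find what they list, so January 2025 and Monday are not tagged here.

```python
import spacy

# Assumption: rule-based stand-in for en_core_web_sm (no model download needed).
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "ORG", "pattern": "Apple Inc."},
    {"label": "GPE", "pattern": "San Francisco"},
    {"label": "PERSON", "pattern": "Tim Cook"},
])

text = "Apple Inc. is planning to open a new store in San Francisco by January 2025. Tim Cook announced this on Monday."
doc = nlp(text)

# One plain dict per entity; character offsets index back into the original text.
records = [
    {
        "text": ent.text,
        "label": ent.label_,
        "start_char": ent.start_char,
        "end_char": ent.end_char,
    }
    for ent in doc.ents
]

for record in records:
    print(record)
```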


Cell 4: Visualization (The Cool Part)
spaCy has a built-in visualizer, displaCy (the displacy module), that highlights entities directly in the text. This is useful for demos and for spotting errors while debugging.

In [7]:
# Visualize the entities
# style="ent" tells displacy to highlight Entities (as opposed to grammar dependencies)
displacy.render(doc, style="ent", jupyter=True)
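Outside a notebook, or to share a demo, displacy.render can return the markup as a string instead of displaying it: pass jupyter=False, and page=True for a complete HTML page. The options dict filters which labels are highlighted ("ents") and overrides their colors ("colors"). The sketch reuses a rule-based blank pipeline (an assumption, so no model is needed) and writes to a hypothetical entities.html.

```python
from pathlib import Path

import spacy
from spacy import displacy

# Assumption: rule-based stand-in for en_core_web_sm so the sketch is self-contained.
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "ORG", "pattern": "Apple Inc."},
    {"label": "PERSON", "pattern": "Tim Cook"},
])
doc = nlp("Apple Inc. is planning to open a new store. Tim Cook announced this on Monday.")

# Highlight only ORG entities, in a custom background color.
options = {"ents": ["ORG"], "colors": {"ORG": "#ffd966"}}

# jupyter=False returns the markup as a string; page=True wraps it in a full HTML page.
html = displacy.render(doc, style="ent", options=options, page=True, jupyter=False)

# Hypothetical output path for sharing the visualization.
out_path = Path("entities.html")
out_path.write_text(html, encoding="utf-8")
print(f"Saved {len(html)} characters to {out_path}")
```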

Cell 5: Real-World Test (News Article)
Now, let's try it on a longer, more complex paragraph to see how robust it is.

In [8]:
news_article = """
Tesla CEO Elon Musk has reached an agreement to buy Twitter for roughly $44 billion,
promising a more lenient touch to policing content on the social media platform
where he is the most influential user. The deal ends a weeks-long saga that started
when Musk disclosed a large stake in the company in April.
"""

doc_news = nlp(news_article)

# Render the visualization for the longer text
displacy.render(doc_news, style="ent", jupyter=True)
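When analyzing many articles rather than one paragraph, nlp.pipe processes an iterable of texts in batches, which is generally faster than calling nlp on each text in a loop. A Counter over ent.label_ then summarizes which entity types occur. The two short texts and the rule-based patterns (including a token pattern for amounts like $44 billion) are illustrative assumptions standing in for the statistical model.

```python
from collections import Counter

import spacy

# Assumption: rule-based stand-in for en_core_web_sm (no model download needed).
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {"label": "ORG", "pattern": "Tesla"},
    {"label": "ORG", "pattern": "Twitter"},
    {"label": "PERSON", "pattern": "Elon Musk"},
    # Token pattern: "$", then a number-like token, then "billion"
    {"label": "MONEY", "pattern": [{"TEXT": "$"}, {"LIKE_NUM": True}, {"LOWER": "billion"}]},
])

texts = [
    "Tesla CEO Elon Musk has reached an agreement to buy Twitter for roughly $44 billion.",
    "Elon Musk disclosed a large stake in Twitter in April.",
]

# nlp.pipe streams Doc objects, processing the texts in batches.
label_counts = Counter(
    ent.label_
    for doc in nlp.pipe(texts, batch_size=32)
    for ent in doc.ents
)
print(label_counts.most_common())
```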

Cell 6: Interactive Mode
Run this cell to type your own sentences and see the model's entity predictions immediately.

In [9]:
# Interactive loop: type a sentence to see its entities; type 'quit' to stop
while True:
    user_input = input("\nEnter text to analyze (or 'quit' to exit): ").strip()
    if user_input.lower() == 'quit':
        break
    if not user_input:
        continue  # ignore empty input

    doc_user = nlp(user_input)

    # Skip rendering when the model found no entities
    if not doc_user.ents:
        print("⚠️ No entities found. Try adding names, places, or dates.")
        continue

    # Render the highlighted entities inline in the notebook
    displacy.render(doc_user, style="ent", jupyter=True)


Enter text to analyze (or 'quit' to exit): ORG: Companies, agencies, institutions (e.g., Google, NASA).  GPE: Geopolitical Entities like countries, cities, states (e.g., India, California).  PERSON: People, including fictional (e.g., Elon Musk, Sherlock Holmes).  DATE: Absolute or relative dates or periods (e.g., 2024, yesterday).  MONEY: Monetary values (e.g., $100 million).  CARDINAL: Numerals that do not fall under another type (e.g., one, 50).



Enter text to analyze (or 'quit' to exit): quit
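The per-sentence logic of the loop can also be wrapped in a plain function that returns (text, label) pairs instead of rendering, which makes it reusable outside an input() loop, for example in a script or a test. The function name extract_entities is hypothetical; it accepts any spaCy Language object, and the rule-based blank pipeline below is used only so the example runs without the model.

```python
from typing import List, Tuple

import spacy
from spacy.language import Language


def extract_entities(nlp: Language, text: str) -> List[Tuple[str, str]]:
    """Return (entity text, label) pairs found in text; empty list if none."""
    text = text.strip()
    if not text:
        return []
    return [(ent.text, ent.label_) for ent in nlp(text).ents]


# Assumption: rule-based stand-in; pass the en_core_web_sm nlp object in practice.
nlp = spacy.blank("en")
nlp.add_pipe("entity_ruler").add_patterns([
    {"label": "GPE", "pattern": "California"},
    {"label": "ORG", "pattern": "NASA"},
])

for sentence in ["NASA has a center in California.", "nothing to see here", "   "]:
    found = extract_entities(nlp, sentence)
    print(found if found else "No entities found.")
```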
