# Named Entity Recognition (NER) with spaCy

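This short example shows how to extract named entities from text with spaCy's transformer-based English pipeline, `en_core_web_trf`, and how to inspect, tabulate, and visualize the results.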

In [1]:
!pip install spacy==3.7.2
!pip install spacy-transformers
!python -m spacy download en_core_web_trf

Collecting spacy==3.7.2
  Downloading spacy-3.7.2-cp312-cp312-win_amd64.whl.metadata (26 kB)
Collecting thinc<8.3.0,>=8.1.8 (from spacy==3.7.2)
  Downloading thinc-8.2.5-cp312-cp312-win_amd64.whl.metadata (15 kB)
Collecting weasel<0.4.0,>=0.1.0 (from spacy==3.7.2)
  Downloading weasel-0.3.4-py3-none-any.whl.metadata (4.7 kB)
Collecting typer<0.10.0,>=0.3.0 (from spacy==3.7.2)
  Downloading typer-0.9.4-py3-none-any.whl.metadata (14 kB)
Collecting blis<0.8.0,>=0.7.8 (from thinc<8.3.0,>=8.1.8->spacy==3.7.2)
  Downloading blis-0.7.11-cp312-cp312-win_amd64.whl.metadata (7.6 kB)
Collecting cloudpathlib<0.17.0,>=0.7.0 (from weasel<0.4.0,>=0.1.0->spacy==3.7.2)
  Downloading cloudpathlib-0.16.0-py3-none-any.whl.metadata (14 kB)
Downloading spacy-3.7.2-cp312-cp312-win_amd64.whl (11.7 MB)
Collecting spacy-transformers
  Downloading spacy_transformers-1.3.8-cp312-cp312-win_amd64.whl.metadata (7.2 kB)
Collecting torch>=1.8.0 (from spacy-transformers)
  Downloading torch-2.6.0-cp312-cp312-win_amd64.whl.metadata (28 kB)
Collecting spacy-alignments<1.0.0,>=0.7.2 (from spacy-transformers)
  Downloading spacy_alignments-0.9.1-cp312-cp312-win_amd64.whl.metadata (2.7 kB)
Collecting sympy==1.13.1 (from torch>=1.8.0->spacy-transformers)
  Downloading sympy-1.13.1-py3-none-any.whl.metadata (12 kB)
Downloading spacy_transformers-1.3.8-cp312-cp312-win_amd64.whl (343 kB)

In [1]:
import spacy
from spacy import displacy
import pandas as pd

# Load the transformer-based English pipeline.
# en_core_web_trf is built on roberta-base. It is generally more accurate than the
# CNN-based pipelines such as en_core_web_sm, but slower, especially on CPU.
print("Loading transformer model...")
nlp = spacy.load("en_core_web_trf")
print("Model loaded!")

Loading transformer model...
Model loaded!


In [3]:
# Example text with various entity types
text = """
Apple Inc. is planning to open a new office in New York City by March 2025, 
according to CEO Tim Cook. The company will invest $50 million in the project, 
which is expected to create 500 jobs. Google and Microsoft are also expanding 
their operations in the United States. The Federal Reserve's recent policy 
announcement might impact these plans.
"""

In [5]:
# Process the text with the transformer model
doc = nlp(text)

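  (PyTorch printed a warning here, truncated in this export, pointing at `with torch.cuda.amp.autocast(self._mixed_precision):`. Recent PyTorch releases deprecate `torch.cuda.amp.autocast` in favor of `torch.amp.autocast("cuda", ...)`; the warning does not affect the results.)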


In [7]:
# Display identified entities
print("\nIdentified Entities:")
print("-" * 50)
for ent in doc.ents:
    print(f"Entity: {ent.text}\nType: {ent.label_}\nDescription: {spacy.explain(ent.label_)}\n")


Identified Entities:
--------------------------------------------------
Entity: Apple Inc.
Type: ORG
Description: Companies, agencies, institutions, etc.

Entity: New York City
Type: GPE
Description: Countries, cities, states

Entity: March 2025
Type: DATE
Description: Absolute or relative dates or periods

Entity: Tim Cook
Type: PERSON
Description: People, including fictional

Entity: $50 million
Type: MONEY
Description: Monetary values, including unit

Entity: 500
Type: CARDINAL
Description: Numerals that do not fall under another type

Entity: Google
Type: ORG
Description: Companies, agencies, institutions, etc.

Entity: Microsoft
Type: ORG
Description: Companies, agencies, institutions, etc.

Entity: the United States
Type: GPE
Description: Countries, cities, states

Entity: The Federal Reserve's
Type: ORG
Description: Companies, agencies, institutions, etc.
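
The loop above prints entities one at a time. For downstream use it is often handier to group them by label. The following is a minimal standard-library sketch; it assumes the `(text, label)` pairs are copied from the output above, and `group_by_label` is a hypothetical helper, not part of spaCy. With a live `doc`, the same list would be `[(ent.text, ent.label_) for ent in doc.ents]`.

```python
from collections import defaultdict

# (text, label) pairs copied from the output above; with a live spaCy Doc,
# the same list is [(ent.text, ent.label_) for ent in doc.ents]
entities = [
    ("Apple Inc.", "ORG"),
    ("New York City", "GPE"),
    ("March 2025", "DATE"),
    ("Tim Cook", "PERSON"),
    ("$50 million", "MONEY"),
    ("500", "CARDINAL"),
    ("Google", "ORG"),
    ("Microsoft", "ORG"),
    ("the United States", "GPE"),
    ("The Federal Reserve's", "ORG"),
]


def group_by_label(pairs):
    """Map each entity label to the entity texts carrying it, in document order."""
    groups = defaultdict(list)
    for text, label in pairs:
        groups[label].append(text)
    return dict(groups)


by_label = group_by_label(entities)
for label, texts in sorted(by_label.items()):
    print(f"{label}: {', '.join(texts)}")
```

Sorting by label gives a stable listing; for this text, `ORG` collects four entities (Apple Inc., Google, Microsoft, The Federal Reserve's) and `GPE` two.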



In [9]:
# Create a DataFrame for better visualization
entities_data = []
for ent in doc.ents:
    entities_data.append({
        "Text": ent.text,
        "Start": ent.start_char,
        "End": ent.end_char,
        "Type": ent.label_,
        "Description": spacy.explain(ent.label_)
    })

df = pd.DataFrame(entities_data)
print("Entities DataFrame:")
display(df)  # built into Jupyter; in a plain Python script, use print(df) instead


Entities DataFrame:

|   | Text | Start | End | Type | Description |
|---|------|-------|-----|------|-------------|
| 0 | Apple Inc. | 1 | 11 | ORG | Companies, agencies, institutions, etc. |
| 1 | New York City | 48 | 61 | GPE | Countries, cities, states |
| 2 | March 2025 | 65 | 75 | DATE | Absolute or relative dates or periods |
| 3 | Tim Cook | 95 | 103 | PERSON | People, including fictional |
| 4 | $50 million | 129 | 140 | MONEY | Monetary values, including unit |
| 5 | 500 | 186 | 189 | CARDINAL | Numerals that do not fall under another type |
| 6 | Google | 196 | 202 | ORG | Companies, agencies, institutions, etc. |
| 7 | Microsoft | 207 | 216 | ORG | Companies, agencies, institutions, etc. |
| 8 | the United States | 257 | 274 | GPE | Countries, cities, states |
| 9 | The Federal Reserve's | 276 | 297 | ORG | Companies, agencies, institutions, etc. |
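
The `Start` and `End` columns are character offsets into the processed text (spaCy's `Span.start_char` and `Span.end_char`), so slicing the original string with them recovers each entity. `Apple Inc.` starts at 1 rather than 0 because the triple-quoted `text` begins with a newline. The sketch below checks every row of the table using only the standard library; it rebuilds `text` from explicit string pieces so that the trailing space before each newline is preserved, and `check_offsets` is a hypothetical helper.

```python
# The same text as the `text` cell, written as explicit pieces so that the
# trailing space before each newline is preserved exactly.
text = (
    "\n"
    "Apple Inc. is planning to open a new office in New York City by March 2025, \n"
    "according to CEO Tim Cook. The company will invest $50 million in the project, \n"
    "which is expected to create 500 jobs. Google and Microsoft are also expanding \n"
    "their operations in the United States. The Federal Reserve's recent policy \n"
    "announcement might impact these plans.\n"
)

# (Text, Start, End) rows copied from the DataFrame output above
rows = [
    ("Apple Inc.", 1, 11),
    ("New York City", 48, 61),
    ("March 2025", 65, 75),
    ("Tim Cook", 95, 103),
    ("$50 million", 129, 140),
    ("500", 186, 189),
    ("Google", 196, 202),
    ("Microsoft", 207, 216),
    ("the United States", 257, 274),
    ("The Federal Reserve's", 276, 297),
]


def check_offsets(source, spans):
    """Return the rows whose source[start:end] slice differs from their text."""
    return [(t, s, e) for t, s, e in spans if source[s:e] != t]


mismatches = check_offsets(text, rows)
print("all offsets match" if not mismatches else mismatches)
```

An empty mismatch list confirms that the offsets index the raw string, including the leading newline.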


In [11]:
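# Render the entities as HTML. page=True wraps the markup in a complete HTML page,
# so the saved file opens directly in a browser; jupyter=False returns the markup
# as a string instead of displaying it inline in the notebook.
html = displacy.render(doc, style="ent", page=True, jupyter=False)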

# Save the visualization to an HTML file
with open("ner_visualization.html", "w", encoding="utf-8") as f:
    f.write(html)
print("\nVisualization saved to 'ner_visualization.html'")


Visualization saved to 'ner_visualization.html'
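
displaCy's `ent` style highlights each entity by wrapping its character span in markup. The same idea can be sketched in plain Python to get a text-only rendering, for example for logs or a terminal. This is a hypothetical helper (`annotate_inline`), not displaCy's implementation. It assumes the spans are sorted by start offset and do not overlap, which holds for `doc.ents`; with a live `doc`, the spans would be `[(e.start_char, e.end_char, e.label_) for e in doc.ents]`.

```python
def annotate_inline(text, spans):
    """Wrap each (start, end, label) span of `text` as [span text](LABEL).

    Assumes `spans` are sorted by start offset and do not overlap.
    """
    pieces = []
    cursor = 0
    for start, end, label in spans:
        pieces.append(text[cursor:start])                # plain text before the entity
        pieces.append(f"[{text[start:end]}]({label})")   # the marked-up entity
        cursor = end
    pieces.append(text[cursor:])                         # remainder after the last entity
    return "".join(pieces)


# First sentence of the example text, with offsets shifted by one because
# this string has no leading newline.
sentence = "Apple Inc. is planning to open a new office in New York City by March 2025."
spans = [(0, 10, "ORG"), (47, 60, "GPE"), (64, 74, "DATE")]
print(annotate_inline(sentence, spans))
# [Apple Inc.](ORG) is planning to open a new office in [New York City](GPE) by [March 2025](DATE).
```

Building the output from slices between offsets, rather than with `str.replace`, keeps repeated strings (such as two mentions of the same company) tied to the exact span the model labeled.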
