<a href="https://colab.research.google.com/github/Mohammadhsiavash/DeepL-Training/blob/main/NLP/Named_Entity_Recognition_with_spaCy.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Extract named entities (such as PERSON, ORG, DATE, and GPE) from text using spaCy's built-in NER model.

In [1]:
!pip install spacy
!python -m spacy download en_core_web_sm

Collecting en-core-web-sm==3.8.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.8.0/en_core_web_sm-3.8.0-py3-none-any.whl (12.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12.8/12.8 MB 52.2 MB/s eta 0:00:00
✔ Download and installation successful
You can now load the package via spacy.load('en_core_web_sm')
⚠ Restart to reload dependencies
If you are in a Jupyter or Colab notebook, you may need to restart Python in
order to load all the package's dependencies. You can do this by selecting the
'Restart kernel' or 'Restart runtime' option.


In [2]:
import spacy
# Load spaCy's small English model
nlp = spacy.load("en_core_web_sm")
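Before processing any text, it can help to check which entity types the loaded model is able to predict. A minimal sketch, assuming the same `en_core_web_sm` package: the `ner` component's `labels` property lists the types, and `spacy.explain` returns the same descriptions printed later in this notebook. The fallback to `spacy.blank("en")` is an added assumption for runtimes where the model is not importable yet (the install log above warns that a restart may be needed); a blank pipeline has no `ner` component.

```python
import spacy

try:
    nlp = spacy.load("en_core_web_sm")
except OSError:
    # Assumption: if the model is not installed (e.g. before a runtime restart),
    # fall back to a blank English pipeline, which has no NER component.
    nlp = spacy.blank("en")

print("Pipeline components:", nlp.pipe_names)

# The "ner" component exposes the entity labels it was trained to predict.
if nlp.has_pipe("ner"):
    for label in nlp.get_pipe("ner").labels:
        print(f"{label:<12} {spacy.explain(label)}")
else:
    print("No 'ner' component in this pipeline")
```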

Define Input Text

In [3]:
text = '''
Elon Musk is a business magnate, investor, and engineer. He is the founder, CEO, and chief engineer of SpaceX; an early-stage investor, CEO, and product architect of Tesla, Inc.; founder of The Boring Company; co-founder of Neuralink and OpenAI; and the owner of X (formerly Twitter).


Born in Pretoria, South Africa, in 1971, Musk later moved to Canada and then to the United States. He attended the University of Pennsylvania, where he earned degrees in physics and economics.


His career began with the founding of Zip2, an online city guide software company, which was later acquired by Compaq. He then co-founded X.com, an online payments company that eventually merged with another company to form PayPal.


Musk is known for his ambitious and often controversial projects, which aim to revolutionize transportation on Earth and in space, and to advance artificial intelligence. His work with SpaceX has made significant strides in reusable rocket technology, while Tesla has played a major role in popularizing electric vehicles.
'''
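The input is a single string made of several paragraphs. When there are many texts, or one long text split into pieces, `nlp.pipe` streams them through the pipeline in batches rather than calling `nlp` once per string. A sketch, assuming paragraphs are separated by blank lines; `batch_size=32` is an arbitrary illustrative value, and the blank-pipeline fallback (which yields no entities) is only there so the snippet runs without the model.

```python
import spacy

try:
    nlp = spacy.load("en_core_web_sm")
except OSError:
    # Assumption: fall back to a blank pipeline (no entities) if the model is missing
    nlp = spacy.blank("en")

# Two paragraphs taken from the input text above, separated by blank lines
text = """
Born in Pretoria, South Africa, in 1971, Musk later moved to Canada and then to the United States.


His career began with the founding of Zip2, an online city guide software company, which was later acquired by Compaq.
"""

# Split on blank lines and drop empty pieces
paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]

# nlp.pipe processes the texts as a stream, in batches
docs = list(nlp.pipe(paragraphs, batch_size=32))

for i, doc in enumerate(docs):
    print(i, [(ent.text, ent.label_) for ent in doc.ents])
```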

Apply spaCy NER Pipeline

In [4]:
# Process text
doc = nlp(text)
# Display named entities
for ent in doc.ents:
    print(f"{ent.text:<30} → {ent.label_} ({spacy.explain(ent.label_)})")

Elon Musk                      → PERSON (People, including fictional)
SpaceX                         → PERSON (People, including fictional)
Tesla, Inc.                    → ORG (Companies, agencies, institutions, etc.)
The Boring Company             → ORG (Companies, agencies, institutions, etc.)
Neuralink                      → PERSON (People, including fictional)
OpenAI                         → GPE (Countries, cities, states)
Pretoria                       → GPE (Countries, cities, states)
South Africa                   → GPE (Countries, cities, states)
1971                           → DATE (Absolute or relative dates or periods)
Canada                         → GPE (Countries, cities, states)
the United States              → GPE (Countries, cities, states)
the University of Pennsylvania → ORG (Companies, agencies, institutions, etc.)
Compaq                         → ORG (Companies, agencies, institutions, etc.)
PayPal                         → ORG (Companies, agencies, institutions, etc.)
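The small model mislabels several spans in the output above: SpaceX and Neuralink come out as PERSON and OpenAI as GPE. One common remedy is a rule-based `entity_ruler` with phrase patterns for names you already know. The sketch below runs on a blank English pipeline so it works without a trained model; with `en_core_web_sm` you would instead add the ruler via `nlp.add_pipe("entity_ruler", before="ner")`, so that the statistical NER keeps the ruler's spans and predicts around them. The pattern list is an illustrative assumption, not part of the original notebook.

```python
import spacy

# Blank English pipeline: tokenizer only, so this runs without a trained model.
# Assumption: with en_core_web_sm, use nlp.add_pipe("entity_ruler", before="ner") instead.
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")

# Phrase patterns: exact token sequences mapped to the label we want
ruler.add_patterns([
    {"label": "ORG", "pattern": "SpaceX"},
    {"label": "ORG", "pattern": "Neuralink"},
    {"label": "ORG", "pattern": "OpenAI"},
    {"label": "ORG", "pattern": "The Boring Company"},
])

doc = nlp("He is the founder of SpaceX and The Boring Company, "
          "and a co-founder of Neuralink and OpenAI.")
for ent in doc.ents:
    print(f"{ent.text:<20} → {ent.label_}")
```

Phrase patterns match on exact token text, so "spacex" or "Space X" would not match; a token pattern with `LOWER` would be needed for case-insensitive matching.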

Visualize Named Entities

In [5]:
from spacy import displacy
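# Render highlighted entities inline in a Jupyter/Colab notebook
# (displacy.serve starts a local web server for a standalone browser view)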
displacy.render(doc, style="ent", jupyter=True)
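`displacy.render` also accepts an `options` dict: its `"ents"` key restricts highlighting to selected labels and `"colors"` overrides the colour per label. With `jupyter=False` the markup is returned as a string, which can be saved as an HTML file and opened in any browser. A minimal sketch on a small hand-labelled doc; the sentence, the hypothetical helper `span_for`, the colour, and the output filename `entities.html` are illustrative assumptions.

```python
from pathlib import Path

import spacy
from spacy import displacy

# Small doc with hand-set entities so the example runs without a trained model
nlp = spacy.blank("en")
doc = nlp("Elon Musk began his career with the founding of Zip2, "
          "which was later acquired by Compaq.")

def span_for(doc, phrase, label):
    """Hypothetical helper: build a labelled Span from the first occurrence of phrase."""
    start = doc.text.index(phrase)
    return doc.char_span(start, start + len(phrase), label=label)

doc.ents = [
    span_for(doc, "Elon Musk", "PERSON"),
    span_for(doc, "Zip2", "ORG"),
    span_for(doc, "Compaq", "ORG"),
]

# Highlight only ORG entities, with a custom colour
options = {"ents": ["ORG"], "colors": {"ORG": "#a0d8ef"}}
html = displacy.render(doc, style="ent", options=options, page=True, jupyter=False)

# Save a standalone HTML page
Path("entities.html").write_text(html, encoding="utf-8")
```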

Group Entities by Type

In [6]:
from collections import defaultdict
# Collect entity texts under their label
entities_by_type = defaultdict(list)
for ent in doc.ents:
    entities_by_type[ent.label_].append(ent.text)
# Print each label with its description and its unique entity texts
for label, ents in entities_by_type.items():
    print(f"{label} ({spacy.explain(label)}): {set(ents)}\n")

PERSON (People, including fictional): {'Neuralink', 'Elon Musk', 'SpaceX'}

ORG (Companies, agencies, institutions, etc.): {'Compaq', 'The Boring Company', 'the University of Pennsylvania', 'Tesla, Inc.', 'PayPal', 'Tesla'}

GPE (Countries, cities, states): {'Pretoria', 'Canada', 'the United States', 'South Africa', 'OpenAI'}

DATE (Absolute or relative dates or periods): {'1971'}

LOC (Non-GPE locations, mountain ranges, bodies of water): {'Earth'}
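Converting to `set` removes duplicates, but it also discards mention order and frequency, and it still keeps near-duplicates such as 'Tesla' and 'Tesla, Inc.' apart. A sketch that preserves first-seen order and counts mentions per label with `collections.Counter`. It works on plain `(text, label)` pairs, which in this notebook would come from `[(ent.text, ent.label_) for ent in doc.ents]`; the sample pairs below are hypothetical, modelled on the output above, and `group_entities` is an illustrative helper name.

```python
from collections import Counter, defaultdict

# Hypothetical sample pairs, modelled on the output above.
# In the notebook: pairs = [(ent.text, ent.label_) for ent in doc.ents]
pairs = [
    ("Elon Musk", "PERSON"),
    ("Tesla, Inc.", "ORG"),
    ("Tesla", "ORG"),
    ("Pretoria", "GPE"),
    ("South Africa", "GPE"),
    ("Tesla", "ORG"),
]

def group_entities(pairs):
    """Map each label to a Counter of entity texts, keeping first-seen order."""
    grouped = defaultdict(Counter)
    for text, label in pairs:
        grouped[label][text] += 1
    return grouped

grouped = group_entities(pairs)
for label, counts in grouped.items():
    # most_common() sorts by count; ties keep first-seen order
    print(f"{label}: {dict(counts.most_common())}")
```

Merging variants such as 'Tesla' and 'Tesla, Inc.' would need an extra normalisation step (for example, a hand-written alias map) before counting.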

