Install spaCy

In [3]:
pip install spacy

Defaulting to user installation because normal site-packages is not writeable
Note: you may need to restart the kernel to use updated packages.


Import spaCy & Load a Language Model
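The small English model, en_core_web_sm, bundles a vocabulary plus trained components for tagging, parsing, and named entity recognition.
It is installed separately from the library, for example with: python -m spacy download en_core_web_sm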

In [5]:
import spacy

# Load small English model
nlp = spacy.load("en_core_web_sm")
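
spacy.load raises an OSError when the named model package is not installed. A minimal sketch of a guarded load follows. The helper name load_english and the fallback to a blank English pipeline (tokenizer only, no tagger, parser, or NER) are assumptions made for illustration and are not part of the original notebook.

```python
import spacy


def load_english(name="en_core_web_sm"):
    """Hypothetical helper: load a trained model, or fall back to a blank pipeline."""
    try:
        return spacy.load(name)
    except OSError:
        # Model package not installed: a blank pipeline can still tokenize,
        # but it has no trained components such as a tagger or NER.
        return spacy.blank("en")


nlp = load_english()

# Lists the processing stages that run after tokenization
print(nlp.pipe_names)
```

With the trained model installed, the printed list names its components; with the fallback, the list is empty.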


Tokenization

Splits text into words, punctuation marks, and other tokens.

In [6]:
doc = nlp("SpaCy is an amazing NLP library!")

for token in doc:
    print(token.text)


SpaCy
is
an
amazing
NLP
library
!
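
Each token is an object with attributes, not just a string. The sketch below prints a few of them. It uses spacy.blank("en"), an assumption made so the example runs without a downloaded model; the tokenizer rules belong to the English language defaults, so the token boundaries should match those shown above.

```python
import spacy

# A blank English pipeline contains only the tokenizer
nlp = spacy.blank("en")
doc = nlp("SpaCy is an amazing NLP library!")

for token in doc:
    # i: position in the doc, is_alpha: letters only, is_punct: punctuation
    print(token.i, token.text, token.is_alpha, token.is_punct)

texts = [token.text for token in doc]
```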


Part-of-Speech (POS) Tagging

Identifies the word class (noun, verb, adjective, etc.) of each token.

In [8]:
for token in doc:
    print(f"{token.text} -> {token.pos_}")


SpaCy -> PROPN
is -> AUX
an -> DET
amazing -> ADJ
NLP -> PROPN
library -> NOUN
! -> PUNCT
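
The tag abbreviations can be looked up with spacy.explain, which returns a short description for a label in spaCy's glossary. A small sketch, using the tags from the output above; it needs no model.

```python
import spacy

tags = ["PROPN", "AUX", "DET", "ADJ", "NOUN", "PUNCT"]

# Map each tag to its human-readable description
descriptions = {tag: spacy.explain(tag) for tag in tags}

for tag, description in descriptions.items():
    print(f"{tag} -> {description}")
```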


Named Entity Recognition (NER)

Finds and labels named entities such as people, organizations, places, and dates.

In [16]:
doc = nlp("Jennifer Simons becomes the first female president in 2025 and she will open a new KFC restaurant in Lelydorp")

for ent in doc.ents:
    print(f"{ent.text} -> {ent.label_}")


Jennifer Simons -> PERSON
first -> ORDINAL
2025 -> DATE
KFC -> ORG
Lelydorp -> GPE
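
Each entity in doc.ents is a Span: a slice of the Doc with a label and character offsets. The sketch below builds one entity by hand on a blank pipeline to show those attributes. Setting the entity manually, and the shortened sentence, are assumptions made so the example runs without a trained model; in the notebook the model predicts the entities.

```python
import spacy
from spacy.tokens import Span

nlp = spacy.blank("en")
doc = nlp("Jennifer Simons will open a new KFC restaurant in Lelydorp")

# Manually mark tokens 0-1 as a PERSON entity (a trained model would predict this)
doc.ents = [Span(doc, 0, 2, label="PERSON")]

for ent in doc.ents:
    # start_char / end_char are character offsets into the original text
    print(ent.text, ent.label_, ent.start_char, ent.end_char)
```

The offsets make it possible to highlight or extract entities in the original string.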


Lemmatization

Reduces each word to its base (dictionary) form, the lemma.

In [19]:
doc = nlp("The cats are eating the dogs.")
for token in doc:
    print(f"{token.text} -> {token.lemma_}")


The -> the
cats -> cat
are -> be
eating -> eat
the -> the
dogs -> dog
. -> .


Custom Pipeline Component

A spaCy pipeline processes text in stages: tokenization, tagging, parsing, NER, and so on.
You can add your own stage by registering a function that receives a Doc, inspects or modifies it, and returns it.

In [20]:
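from spacy.language import Language

# Register the function under a name so the pipeline can look it up
@Language.component("custom_component")
def custom_component(doc):
    # A component receives the Doc, may inspect or modify it, and must return it
    print("Doc length:", len(doc))
    return doc

# Append the component after all existing pipeline stages
nlp.add_pipe("custom_component", last=True)

doc = nlp("This is a custom component example.")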


Doc length: 7
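
Instead of printing, a component can store its result on the Doc so later code can read it. A minimal sketch under stated assumptions: the component name token_counter and the extension attribute n_tokens are hypothetical, and a blank English pipeline is used so the example runs without a trained model.

```python
import spacy
from spacy.language import Language
from spacy.tokens import Doc

# Register a custom attribute, accessible as doc._.n_tokens
# (force=True allows re-running this cell without an error)
Doc.set_extension("n_tokens", default=None, force=True)


@Language.component("token_counter")
def token_counter(doc):
    # Store the token count on the Doc instead of printing it
    doc._.n_tokens = len(doc)
    return doc


nlp = spacy.blank("en")
nlp.add_pipe("token_counter", last=True)

doc = nlp("This is a custom component example.")
print(nlp.pipe_names)
print(doc._.n_tokens)
```

Checking nlp.pipe_names is a quick way to confirm where the component sits in the pipeline.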
