| Section | Description                                                                        |
| ------- | ---------------------------------------------------------------------------------- |
| 1–3     | Sets up the display, loads a model, and processes a realistic multi-sentence text  |
| 4       | Extracts and displays all named entities (people, orgs, dates, etc.)               |
| 5       | Prints the full grammatical dependency structure                                   |
| 6       | Demonstrates custom token attributes                                               |
| 7       | Uses `PhraseMatcher` to detect specific multi-word terms                           |
| 8       | Performs basic relationship extraction (e.g., “OpenAI → developed → GPT-5”)        |
| 9       | Extracts keyword candidates (nouns and proper nouns)                               |
| 10      | Measures semantic similarity between sentences                                     |
| 11      | Exports an interactive HTML visualization of entities                              |


In [1]:
import spacy
from spacy import displacy
from spacy.matcher import Matcher, PhraseMatcher
from spacy.tokens import Span, Token
from IPython.display import HTML
import os

In [2]:
# -------------------------------------------------
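# 1️⃣  Display workaround for Python 3.13 + VS Code Jupyter
# -------------------------------------------------
# Note: DISPLACY_FORCE_SIMPLE_RENDER is not a documented spaCy setting, so it may
# have no effect. The reliable approach is the one used in section 4: render with
# jupyter=False and display the returned HTML string via IPython's HTML().
os.environ["DISPLACY_FORCE_SIMPLE_RENDER"] = "1"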

In [4]:
# -------------------------------------------------
# 2️⃣  Load a larger English model (has word vectors)
# -------------------------------------------------
# Install it once with the kernel's own interpreter; a bare "!python" can resolve
# to a different Python than the one running this notebook.
import sys

!{sys.executable} -m spacy download en_core_web_md

nlp = spacy.load("en_core_web_md")

Defaulting to user installation because normal site-packages is not writeable
Collecting en-core-web-md==3.8.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_md-3.8.0/en_core_web_md-3.8.0-py3-none-any.whl (33.5 MB)
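
If the model has not been downloaded yet, `spacy.load` raises an `OSError`. The following is a minimal sketch, assuming you would rather get a warning and a tokenizer-only pipeline than a crash. The helper name `load_model_or_blank` is hypothetical, and a blank pipeline has no tagger, parser, NER, or word vectors, so the later sections would produce empty results with it.

```python
import warnings

import spacy
from spacy.language import Language


def load_model_or_blank(name: str, lang: str = "en") -> Language:
    """Hypothetical helper: load a trained pipeline, or fall back to a blank one."""
    try:
        return spacy.load(name)
    except OSError:
        # spacy.load raises OSError when no installed package or path matches `name`
        warnings.warn(
            f"Model {name!r} is not installed; run `python -m spacy download {name}`. "
            f"Falling back to spacy.blank({lang!r}) (tokenizer only)."
        )
        return spacy.blank(lang)


nlp = load_model_or_blank("en_core_web_md")
print(nlp.pipe_names)  # an empty list means the blank fallback was used
```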

In [22]:
# -------------------------------------------------
# 3️⃣  Example text (you can replace with any text)
# -------------------------------------------------
text = """
In January 2024, when I started studying at PTC in Paramaribo, I was also working full-time at a company called TechCore Solutions.
Every morning around 6:00 a.m., I left my house on Kwattaweg to catch the bus to the office. After work, usually around 5:30 p.m., I went straight to campus for evening classes.
It was a routine that tested both my body and my mind.

At PTC, I studied Information Technology under Dr. Pedro Juanito Alejandra Papacito Guacamole, one of the most inspiring lecturers I have ever met.
He often reminded us that education is an investment, not a cost.
I still remember his words during the 2023 mid-semester seminar: “Your effort today will define your opportunities tomorrow.”

Balancing my job at TechCore Solutions with my studies was tough.
I had deadlines at work, programming assignments at school, and sometimes even weekend projects with classmates like Sarah and Jason.
But I learned how to manage my time, use every minute wisely, and stay focused on my long-term goals.

Now, in 2025, I can see how those years have shaped me. The experience taught me responsibility, discipline, and resilience.
When I think about the future — maybe working abroad in places like Amsterdam or New York — I feel confident that my time at PTC and my job at TechCore gave me the foundation to grow.

One day, when my own children ask about those years, I’ll tell them that studying while working was one of the hardest, but also one of the most rewarding, chapters of my life.
"""


In [23]:
# Process the text with spaCy
doc = nlp(text)
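
`nlp(text)` runs every component listed in `nlp.pipe_names`, in order. The statistical NER can mislabel local names, as in the output below, where the street name Kwattaweg is tagged PERSON and the lecturer's full name is split in two. One way to correct this is spaCy's rule-based `entity_ruler`: when it is added with `before="ner"` in a trained pipeline, the NER component respects the spans the ruler has already set. The sketch below uses `spacy.blank("en")` so it runs without a model; the patterns, and the choice of `FAC` as the label for a street, are illustrative assumptions.

```python
import spacy

# Blank pipeline so the example runs without a model download.
# With en_core_web_md you would instead call:
#     ruler = nlp.add_pipe("entity_ruler", before="ner")
ruler_nlp = spacy.blank("en")
ruler = ruler_nlp.add_pipe("entity_ruler")

# Illustrative phrase patterns (assumptions, not part of the original notebook)
ruler.add_patterns([
    {"label": "FAC", "pattern": "Kwattaweg"},
    {"label": "PERSON", "pattern": "Pedro Juanito Alejandra Papacito Guacamole"},
])

doc_rules = ruler_nlp(
    "I left my house on Kwattaweg to study under "
    "Dr. Pedro Juanito Alejandra Papacito Guacamole."
)
ents = [(ent.text, ent.label_) for ent in doc_rules.ents]
print(ents)
```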

In [24]:
# -------------------------------------------------
# 4️⃣  Named Entity Recognition (NER)
# -------------------------------------------------
print("=== 🏷️ Named Entities ===")
for ent in doc.ents:
    print(f"{ent.text:<40} | {ent.label_}")

# Visualize entities: render to an HTML string, then display it with IPython
html = displacy.render(doc, style="ent", jupyter=False)
HTML(html)

=== 🏷️ Named Entities ===
January 2024                             | DATE
PTC                                      | ORG
Paramaribo                               | GPE
TechCore Solutions                       | ORG
Every morning                            | TIME
around 6:00 a.m.                         | TIME
Kwattaweg                                | PERSON
around 5:30 p.m.                         | TIME
evening                                  | TIME
PTC                                      | ORG
Information Technology                   | ORG
Pedro Juanito                            | PERSON
Alejandra Papacito Guacamole             | PERSON
one                                      | CARDINAL
2023                                     | CARDINAL
mid-semester                             | DATE
today                                    | DATE
tomorrow                                 | DATE
TechCore Solutions                       | ORG
Sarah                                    | PERSON
Jaso