#### 1. What is spaCy?

spaCy is an open-source, industrial-strength Natural Language Processing (NLP) library for Python.
It is designed for speed, scalability, and production use.

spaCy organizes its NLP tasks into a processing pipeline of components, which can include:

- Tokenization
- Part-of-Speech (POS) Tagging
- Named Entity Recognition (NER)
- Dependency Parsing

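When you load a trained pipeline with `spacy.load("en_core_web_sm")`, you get a `Language` object (conventionally named `nlp`) that holds the tokenizer and the pipeline components. The model package must be installed first, for example with `python -m spacy download en_core_web_sm`.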
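To see the pipeline idea without downloading a trained model, the minimal sketch below builds a blank English pipeline with `spacy.blank("en")`, which contains only the rule-based tokenizer, and adds components by name with `add_pipe`. It adds the built-in `sentencizer` and a hypothetical `report_length` component, defined here only for illustration. With a trained pipeline, `nlp.pipe_names` lists its trained components instead. The exact list depends on the model version.

```python
import spacy
from spacy.language import Language


# Hypothetical component, defined only to illustrate how a Doc moves through the pipeline
@Language.component("report_length")
def report_length(doc):
    # Each component receives the Doc produced so far and must return it.
    # doc.sents works here because the sentencizer runs earlier in the pipeline.
    print(f"report_length: {len(doc)} tokens, {len(list(doc.sents))} sentences")
    return doc


# Blank English pipeline: the tokenizer always runs first and is not listed in pipe_names
nlp_blank = spacy.blank("en")
print(nlp_blank.pipe_names)  # → []

# Components are added by their registered names and run in the order they were added
nlp_blank.add_pipe("sentencizer")
nlp_blank.add_pipe("report_length")
print(nlp_blank.pipe_names)  # → ['sentencizer', 'report_length']

blank_doc = nlp_blank("spaCy is fast. It is also easy to use.")
print([sent.text for sent in blank_doc.sents])  # → ['spaCy is fast.', 'It is also easy to use.']
```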

#### 2. spaCy Processing Pipeline

Here's how spaCy processes text internally:

- Text input → The raw string is passed to the `nlp` object
- Tokenizer → Splits the text into tokens (words, punctuation, etc.)
- Tagger → Assigns part-of-speech tags; coarse tags such as NOUN, VERB, ADJ are stored in `token.pos_`
- Parser → Builds dependency trees and, in the trained English pipelines, also sets sentence boundaries
- NER → Detects named entities such as people, places, and organizations
- Doc object → The final structured representation of the text

💡 The resulting object is called a `Doc`. It is a sequence of `Token` objects and also exposes sentences (`doc.sents`), named entities (`doc.ents`), and other linguistic annotations.
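A minimal sketch of the `Doc` container API, again using a blank pipeline so no model is needed: indexing a `Doc` returns a `Token`, slicing it returns a `Span` (a view into the same `Doc`), and each token keeps its trailing whitespace in `text_with_ws`, so the original string can be rebuilt exactly.

```python
import spacy

nlp_blank = spacy.blank("en")
container_doc = nlp_blank("Cora Greene visited San Francisco.")

token = container_doc[0]   # indexing a Doc gives a Token
span = container_doc[3:5]  # slicing a Doc gives a Span over tokens 3 and 4
print(type(token).__name__, token.text, token.i, token.idx)  # → Token Cora 0 0
print(type(span).__name__, span.text, span.start, span.end)  # → Span San Francisco 3 5

# Tokenization is non-destructive: joining text plus trailing whitespace
# reproduces the input string character for character
rebuilt = "".join(t.text_with_ws for t in container_doc)
print(rebuilt == container_doc.text)  # → True
```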

#### 3. Example: Tokenization and POS Tagging

```python
import spacy

# Load the English model
nlp = spacy.load("en_core_web_sm")

# Sample text
text = "This is an example sentence."

# Process the text
doc = nlp(text)

# Print tokens and their Part-of-Speech tags
print("Token  |  POS Tag")
print("-------------------")
for token in doc:
    print(f"{token.text:<10} {token.pos_}")
```


```text
Token  |  POS Tag
-------------------
This       PRON
is         AUX
an         DET
example    NOUN
sentence   NOUN
.          PUNCT
```
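Tagger output can differ between model versions, so this sketch builds a `Doc` by hand with the POS tags printed above, using the `words`, `spaces`, and `pos` arguments of the `Doc` constructor (assuming spaCy v3). It then filters and counts tokens by tag, a common downstream use of `token.pos_`.

```python
from collections import Counter

import spacy
from spacy.tokens import Doc

vocab = spacy.blank("en").vocab

# Hand-annotated Doc mirroring the tagger output shown above (no model needed)
words = ["This", "is", "an", "example", "sentence", "."]
spaces = [True, True, True, True, False, False]  # whether each token is followed by a space
pos = ["PRON", "AUX", "DET", "NOUN", "NOUN", "PUNCT"]
tagged = Doc(vocab, words=words, spaces=spaces, pos=pos)

# Keep only nouns, e.g. as candidate keywords
nouns = [t.text for t in tagged if t.pos_ == "NOUN"]
print(nouns)  # → ['example', 'sentence']

# Count how often each coarse POS tag occurs
pos_counts = Counter(t.pos_ for t in tagged)
print(pos_counts.most_common(1))  # → [('NOUN', 2)]
```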


#### 4. Dependency Parsing

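Dependency parsing analyzes the grammatical structure of a sentence: each token is attached to a head token by a labeled relation such as `nsubj` (nominal subject) or `det` (determiner).

It helps identify subjects, verbs, and objects in sentences.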

```python
# Print each token's dependency label and its syntactic head
# (reuses `doc` from the previous cell)
for token in doc:
    print(f"{token.text:<10} → {token.dep_:<10} (Head: {token.head.text})")
```


```text
This       → nsubj      (Head: is)
is         → ROOT       (Head: is)
an         → det        (Head: sentence)
example    → compound   (Head: sentence)
sentence   → attr       (Head: is)
.          → punct      (Head: is)
```
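The labels above form a tree that can be navigated in code. As a sketch assuming spaCy v3, where the `Doc` constructor accepts `heads` as absolute token indices together with `deps`, the parse shown above is rebuilt by hand and queried with `Token.children` and `Token.subtree`. The `ROOT` token ("is") is its own head.

```python
import spacy
from spacy.tokens import Doc

vocab = spacy.blank("en").vocab

# Hand-built parse mirroring the output above (heads are token indices)
parsed = Doc(
    vocab,
    words=["This", "is", "an", "example", "sentence", "."],
    spaces=[True, True, True, True, False, False],
    heads=[1, 1, 4, 4, 1, 1],
    deps=["nsubj", "ROOT", "det", "compound", "attr", "punct"],
)

root = next(t for t in parsed if t.dep_ == "ROOT")
subjects = [t.text for t in parsed if t.dep_ == "nsubj"]
print(root.text, subjects)                      # → is ['This']

# Direct dependents of the root, left to right
print([child.text for child in root.children])  # → ['This', 'sentence', '.']

# A token's subtree is the whole phrase it governs
attr = next(t for t in parsed if t.dep_ == "attr")
print(" ".join(t.text for t in attr.subtree))   # → an example sentence
```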


#### 5. Named Entity Recognition (NER)

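Named Entity Recognition locates spans of text that name real-world objects, such as people, organizations, and locations, and assigns each a label (for example `PERSON`, `ORG`, or `GPE`, a geopolitical entity such as a country or city).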

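```python
text = "Online Store Incorporated is planning to open a new store in San Francisco. Cora Greene, the CEO, will attend the launch."

doc = nlp(text)

print("Named Entities:")
print("-----------------------")
for ent in doc.ents:
    print(f"{ent.text:<25} → {ent.label_}")
```

```text
Named Entities:
-----------------------
San Francisco             → GPE
Cora Greene               → PERSON
```

In this run the small model did not label "Online Store Incorporated" as an organization (`ORG`). Statistical NER can miss entities, and results may differ with larger models such as `en_core_web_md` or `en_core_web_lg`.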
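One common way to add entities that a statistical model misses, such as "Online Store Incorporated" here, is rule-based matching with spaCy's built-in `entity_ruler` component. The sketch below uses a blank pipeline so the result is deterministic; the three patterns are illustrative assumptions written for this example, not part of any model.

```python
import spacy

# Blank pipeline + rule-based entity_ruler: no model download, same result every run
nlp_rules = spacy.blank("en")
ruler = nlp_rules.add_pipe("entity_ruler")

# Phrase patterns match the exact token text of each string
ruler.add_patterns([
    {"label": "ORG", "pattern": "Online Store Incorporated"},
    {"label": "GPE", "pattern": "San Francisco"},
    {"label": "PERSON", "pattern": "Cora Greene"},
])

ner_text = "Online Store Incorporated is planning to open a new store in San Francisco. Cora Greene, the CEO, will attend the launch."
rules_doc = nlp_rules(ner_text)

# Entities come back in document order, with character offsets into the text
for ent in rules_doc.ents:
    print(f"{ent.text:<25} → {ent.label_} (chars {ent.start_char}-{ent.end_char})")
```

With a trained pipeline, adding the ruler with `nlp.add_pipe("entity_ruler", before="ner")` lets the statistical NER component respect the rule-based spans.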


#### 6. Practical Example: Product Review Analysis


```python
text = "Your products are excellent and the delivery was super fast!"
doc = nlp(text)

print("Tokens:")
print([token.text for token in doc])

print("\nPOS Tags:")
print([(token.text, token.pos_) for token in doc])

print("\nEntities:")
print([(ent.text, ent.label_) for ent in doc.ents])
```


```text
Tokens:
['Your', 'products', 'are', 'excellent', 'and', 'the', 'delivery', 'was', 'super', 'fast', '!']

POS Tags:
[('Your', 'PRON'), ('products', 'NOUN'), ('are', 'AUX'), ('excellent', 'ADJ'), ('and', 'CCONJ'), ('the', 'DET'), ('delivery', 'NOUN'), ('was', 'AUX'), ('super', 'ADV'), ('fast', 'ADJ'), ('!', 'PUNCT')]

Entities:
[]
```
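A minimal sketch of one way to turn these tags into review features: nouns as candidate aspects (what the customer talks about) and adjectives as opinion words. The `review_terms` helper is hypothetical, and the `Doc` is built by hand with the POS tags printed above so the result does not depend on the model version.

```python
import spacy
from spacy.tokens import Doc


def review_terms(doc):
    """Hypothetical helper: nouns as candidate aspects, adjectives as opinion words."""
    aspects = [t.lower_ for t in doc if t.pos_ == "NOUN"]
    opinions = [t.lower_ for t in doc if t.pos_ == "ADJ"]
    return aspects, opinions


vocab = spacy.blank("en").vocab

# Hand-annotated Doc mirroring the POS tags printed above (no model needed)
words = ["Your", "products", "are", "excellent", "and", "the", "delivery", "was", "super", "fast", "!"]
spaces = [True] * 9 + [False, False]  # no space before "!" or after it
pos = ["PRON", "NOUN", "AUX", "ADJ", "CCONJ", "DET", "NOUN", "AUX", "ADV", "ADJ", "PUNCT"]
review_doc = Doc(vocab, words=words, spaces=spaces, pos=pos)

aspects, opinions = review_terms(review_doc)
print(aspects)   # → ['products', 'delivery']
print(opinions)  # → ['excellent', 'fast']
```

With a trained pipeline, the same helper could be applied to many reviews at once by iterating over `nlp.pipe(texts)`, which processes texts as a stream in batches.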


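#### 7. Visualizing Dependencies

spaCy's built-in visualizer, displaCy, can render the dependency tree inline in a Jupyter notebook. Here it draws the parse of the review sentence processed in the previous cell: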

```python
from spacy import displacy

displacy.render(doc, style="dep", jupyter=True)
```
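Outside a notebook, `displacy.render(..., jupyter=False)` returns the markup as a string instead of displaying it, and `style="ent"` highlights entities instead of drawing arcs. This sketch uses a hand-built parsed `Doc` (spaCy v3 `Doc` constructor with `heads` and `deps`; the parse and entity spans are written for this example) so it runs without a model.

```python
import spacy
from spacy import displacy
from spacy.tokens import Doc, Span

vocab = spacy.blank("en").vocab

# Hand-built parse: heads are token indices, "visited" is the ROOT (its own head)
demo = Doc(
    vocab,
    words=["Cora", "Greene", "visited", "San", "Francisco", "."],
    spaces=[True, True, True, True, False, False],
    heads=[1, 2, 2, 4, 2, 2],
    deps=["compound", "nsubj", "ROOT", "compound", "dobj", "punct"],
)
# Entity spans are given as token start/end indices
demo.ents = [Span(demo, 0, 2, label="PERSON"), Span(demo, 3, 5, label="GPE")]

# jupyter=False returns markup strings instead of rendering inline
svg = displacy.render(demo, style="dep", jupyter=False)   # SVG markup
html = displacy.render(demo, style="ent", jupyter=False)  # HTML markup
```

The SVG string can be written to a `.svg` file. Alternatively, `displacy.serve(doc, style="dep")` starts a local web server for viewing the visualization in a browser.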


#### 8. Key Takeaways

- spaCy provides an efficient pipeline for language processing.
- Each `Doc` object stores rich linguistic information: tokens, POS tags, entities, and dependencies.
- NER and dependency parsing help structure unstructured text into usable data.
- spaCy is designed for real-world, production-level NLP applications.