#🔹 NLTK (Natural Language Toolkit)

🔸 Overview:

A research-focused, open-source library for NLP in Python.

Contains a wide variety of tools for linguistic analysis.

Provides access to corpora, grammars, and trained models, which are fetched on demand with nltk.download().

Excellent for learning and experimentation.

🔸 Key Features:

Tokenization

Stemming and Lemmatization

POS Tagging

Named Entity Recognition (NER)

Parsing and Chunking

Large selection of datasets and text corpora
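As a quick illustration of the stemming feature listed above, here is a minimal sketch using NLTK's `PorterStemmer`, which needs no extra downloads. The word list is made up for illustration. Lemmatization (for example with `WordNetLemmatizer`) additionally requires the `wordnet` resource and is not shown here.

```python
from nltk.stem import PorterStemmer

# Rule-based stemmer; it strips suffixes and may produce non-words
stemmer = PorterStemmer()

# Illustrative word list (an assumption, not taken from a corpus)
words = ["running", "studies", "interesting", "processing"]

# Map each word to its stem
stems = {word: stemmer.stem(word) for word in words}

for word, stem in stems.items():
    print(f"{word} -> {stem}")
```

Stems such as "studi" are expected: a stemmer only truncates, whereas a lemmatizer would return a dictionary form.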

🔸 Example:

In [4]:
import nltk

# Tokenizer and tagger resources. Older NLTK releases use the first two names;
# newer releases look for the "_tab" / "_eng" variants instead.
nltk.download('punkt')
nltk.download('averaged_perceptron_tagger')
nltk.download('punkt_tab')
nltk.download('averaged_perceptron_tagger_eng')

text = "Natural Language Processing is interesting."
# Tokenization
tokens = nltk.word_tokenize(text)
print("Tokens:", tokens)

# POS Tagging
pos_tags = nltk.pos_tag(tokens)
print("POS Tags:", pos_tags)

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger_eng to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger_eng.zip.


Tokens: ['Natural', 'Language', 'Processing', 'is', 'interesting', '.']
POS Tags: [('Natural', 'JJ'), ('Language', 'NNP'), ('Processing', 'NNP'), ('is', 'VBZ'), ('interesting', 'JJ'), ('.', '.')]
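Chunking, listed under Key Features, can be sketched on top of these tags with `nltk.RegexpParser`. The tagged tokens below are copied from the output above so that no downloads are needed; the noun-phrase grammar is a hypothetical one chosen for this sentence, not a general-purpose rule.

```python
import nltk

# POS-tagged tokens copied from the example output
tagged = [("Natural", "JJ"), ("Language", "NNP"), ("Processing", "NNP"),
          ("is", "VBZ"), ("interesting", "JJ"), (".", ".")]

# Hypothetical grammar: optional adjectives followed by one or more proper nouns
grammar = "NP: {<JJ>*<NNP>+}"
chunker = nltk.RegexpParser(grammar)
tree = chunker.parse(tagged)

# Collect the text of every NP chunk found in the tree
noun_phrases = [
    " ".join(word for word, tag in subtree.leaves())
    for subtree in tree.subtrees(filter=lambda t: t.label() == "NP")
]
print("Noun phrases:", noun_phrases)
```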


#🔹 spaCy

🔸 Overview:

Industrial-strength NLP library focused on performance and efficiency.

Used in production for large-scale text processing.

Offers pre-trained pipelines for multiple languages, installed separately as packages (for example, en_core_web_sm).

🔸 Key Features:

Tokenization

POS Tagging

Named Entity Recognition (NER)

Dependency Parsing

Lemmatization

Supports custom pipelines and model training
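To make the custom-pipeline point concrete, here is a minimal sketch assuming spaCy v3. It starts from a blank English pipeline (tokenizer only, no model download) and adds one component. The component name `token_counter` and the `n_tokens` key are hypothetical names used only for illustration.

```python
import spacy
from spacy.language import Language


# Hypothetical component: stores the token count on the Doc
@Language.component("token_counter")
def token_counter(doc):
    doc.user_data["n_tokens"] = len(doc)
    return doc


# Blank pipeline: tokenizer only, no trained components
nlp = spacy.blank("en")
nlp.add_pipe("token_counter")

doc = nlp("Natural Language Processing is interesting.")
print("Pipeline:", nlp.pipe_names)
print("Token count:", doc.user_data["n_tokens"])
```

A component is just a function that takes a `Doc` and returns it, so the same pattern works for adding rules or post-processing to a trained pipeline.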

🔸 Example:

In [2]:
import spacy

# Load the small English pipeline (install first: python -m spacy download en_core_web_sm)
nlp = spacy.load("en_core_web_sm")

text = "Natural Language Processing is interesting."
doc = nlp(text)

# Tokenization
tokens = [token.text for token in doc]
print("Tokens:", tokens)

# POS Tagging
pos_tags = [(token.text, token.pos_) for token in doc]
print("POS Tags:", pos_tags)

# Named Entities
entities = [(ent.text, ent.label_) for ent in doc.ents]
print("Named Entities:", entities)


Tokens: ['Natural', 'Language', 'Processing', 'is', 'interesting', '.']
POS Tags: [('Natural', 'PROPN'), ('Language', 'PROPN'), ('Processing', 'NOUN'), ('is', 'AUX'), ('interesting', 'ADJ'), ('.', 'PUNCT')]
Named Entities: [('Natural Language Processing', 'ORG')]


#🔹 Comparison Table

| Feature            | NLTK                         | spaCy                        |
| ------------------ | ---------------------------- | ---------------------------- |
| Focus              | Research, education          | Industrial, production-ready |
| Speed              | Slower                       | Faster                       |
| Ease of Use        | Verbose                      | Compact API                  |
| Corpora Included   | Yes (WordNet, Gutenberg, etc.) | No (but can be added)      |
| Model Training     | Limited                      | Supported                    |
| Entity Recognition | Yes                          | Yes                          |


#✅ Summary:

Use NLTK if you're learning NLP or working on linguistic experiments.

Use spaCy if you're building a production-grade NLP system.