# 1. Description
* **NLTK** is a broad, general-purpose **NLP** library that provides tools and algorithms for tasks such as:
 * tokenization,
 * stemming,
 * tagging,
 * parsing, and
 * sentiment analysis.
* **spaCy**, on the other hand, is *more focused on industrial-strength NLP tasks*, such as:
 * named entity recognition,
 * dependency parsing, and
 * text classification.

  **spaCy** *is designed for high-performance, production-ready applications.*

# 2. Examples
## 1. Tokenization
**Tokenization** is the process of *splitting text into individual tokens (words, punctuation, etc.).*
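
To make the definition concrete, here is a minimal sketch that uses only Python's `re` module. It is not how spaCy or NLTK tokenize; both apply much richer rules. The `simple_tokenize` helper and its pattern are assumptions made for illustration only.

```python
import re

# Hypothetical pattern: a token is either a run of word characters
# or a single character that is neither a word character nor whitespace.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def simple_tokenize(text):
    """Split text into word and punctuation tokens (illustrative only)."""
    return TOKEN_PATTERN.findall(text)


tokens = simple_tokenize("spaCy is a fast and robust library for NLP.")
print(tokens)
# ['spaCy', 'is', 'a', 'fast', 'and', 'robust', 'library', 'for', 'NLP', '.']
```

A rule this simple breaks abbreviations apart: `simple_tokenize("U.K.")` returns `['U', '.', 'K', '.']`. Handling such cases is one reason to use a library tokenizer instead.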

### *spaCy implementation*

In [1]:
import spacy

# Load the English model
nlp = spacy.load("en_core_web_sm")

# Process the text
doc = nlp("spaCy is a fast and robust library for NLP.")

# Tokenization
tokens = [token.text for token in doc]
print(tokens)

['spaCy', 'is', 'a', 'fast', 'and', 'robust', 'library', 'for', 'NLP', '.']


### *NLTK implementation*

In [2]:
import nltk
from nltk.tokenize import word_tokenize

# Download the tokenizer data (skipped if already present).
# Note: recent NLTK releases may require 'punkt_tab' instead of 'punkt'.
nltk.download('punkt')

# Tokenization
text = "NLTK is a powerful library for NLP."
tokens = word_tokenize(text)
print(tokens)

['NLTK', 'is', 'a', 'powerful', 'library', 'for', 'NLP', '.']


[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


## 2. Part-of-Speech (POS) Tagging
**POS** tagging assigns a *part of speech* (noun, verb, adjective, etc.) to each token.

### *spaCy implementation*

In [3]:
import spacy

# Load the English model
nlp = spacy.load("en_core_web_sm")

# Process the text
doc = nlp("spaCy is a fast and robust library for NLP.")

# POS tagging
pos_tags = [(token.text, token.pos_) for token in doc]
print(pos_tags)

[('spaCy', 'INTJ'), ('is', 'AUX'), ('a', 'DET'), ('fast', 'ADJ'), ('and', 'CCONJ'), ('robust', 'ADJ'), ('library', 'NOUN'), ('for', 'ADP'), ('NLP', 'PROPN'), ('.', 'PUNCT')]


### *NLTK implementation*

In [4]:
import nltk
from nltk.tokenize import word_tokenize
from nltk import pos_tag

# Download the tagger model (skipped if already present).
# Note: recent NLTK releases may require 'averaged_perceptron_tagger_eng' instead.
nltk.download('averaged_perceptron_tagger')

# Tokenization and POS tagging
text = "NLTK is a powerful library for NLP."
tokens = word_tokenize(text)
pos_tags = pos_tag(tokens)
print(pos_tags)

[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!


[('NLTK', 'NNP'), ('is', 'VBZ'), ('a', 'DT'), ('powerful', 'JJ'), ('library', 'NN'), ('for', 'IN'), ('NLP', 'NNP'), ('.', '.')]
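
The two outputs above are not directly comparable, because they use different tagsets: spaCy's `token.pos_` returns coarse Universal POS tags (`NOUN`, `AUX`, `DET`, ...), while NLTK's `pos_tag` returns Penn Treebank tags (`NN`, `VBZ`, `DT`, ...) by default. The sketch below converts the NLTK output with a small lookup table. The `PENN_TO_UNIVERSAL` table and the `to_universal` helper are hypothetical and deliberately partial; they are not an official mapping from either library.

```python
# Partial, hand-written mapping from Penn Treebank tags to coarse
# Universal POS tags. Illustrative assumption, not an official table.
PENN_TO_UNIVERSAL = {
    "NNP": "PROPN",
    "NN": "NOUN",
    "VBZ": "VERB",
    "DT": "DET",
    "JJ": "ADJ",
    "IN": "ADP",
    ".": "PUNCT",
}


def to_universal(tagged_tokens):
    """Convert (word, penn_tag) pairs to (word, universal_tag) pairs.

    Tags missing from the table fall back to "X" (other).
    """
    return [(word, PENN_TO_UNIVERSAL.get(tag, "X")) for word, tag in tagged_tokens]


# The NLTK output shown above, copied in so the sketch runs on its own.
nltk_tags = [('NLTK', 'NNP'), ('is', 'VBZ'), ('a', 'DT'), ('powerful', 'JJ'),
             ('library', 'NN'), ('for', 'IN'), ('NLP', 'NNP'), ('.', '.')]
print(to_universal(nltk_tags))
```

A tag-by-tag lookup cannot reproduce every distinction: spaCy labels "is" as `AUX`, whereas this table maps `VBZ` to `VERB`. If fine-grained tags are needed on the spaCy side, `token.tag_` provides them.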


## 3. Named Entity Recognition (NER)
**NER** *identifies and classifies named entities in text.*

### *spaCy implementation*

In [5]:
import spacy

# Load the English model
nlp = spacy.load("en_core_web_sm")

# Process the text
doc = nlp("Apple is looking at buying U.K. startup for $1 billion.")

# Named Entity Recognition
entities = [(ent.text, ent.label_) for ent in doc.ents]
print(entities)

[('Apple', 'ORG'), ('U.K.', 'GPE'), ('$1 billion', 'MONEY')]


### *NLTK implementation*

In [6]:
import nltk
from nltk import ne_chunk
from nltk.tokenize import word_tokenize
from nltk.tag import pos_tag

# Download the NE chunker model and word list (skipped if already present).
# Note: recent NLTK releases may require 'maxent_ne_chunker_tab' instead.
nltk.download('maxent_ne_chunker')
nltk.download('words')

# Tokenization, POS tagging, and NER
text = "Apple is looking at buying U.K. startup for $1 billion."
tokens = word_tokenize(text)
pos_tags = pos_tag(tokens)
entities = ne_chunk(pos_tags)
print(entities)

(S
  (GPE Apple/NNP)
  is/VBZ
  looking/VBG
  at/IN
  buying/VBG
  U.K./NNP
  startup/NN
  for/IN
  $/$
  1/CD
  billion/CD
  ./.)


[nltk_data] Downloading package maxent_ne_chunker to
[nltk_data]     /root/nltk_data...
[nltk_data]   Package maxent_ne_chunker is already up-to-date!
[nltk_data] Downloading package words to /root/nltk_data...
[nltk_data]   Package words is already up-to-date!
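
Unlike spaCy's `doc.ents`, `ne_chunk` returns a tree in which entity chunks are labelled subtrees and all other tokens are plain `(word, tag)` pairs. Getting a list of `(text, label)` pairs comparable to the spaCy output takes one extra pass over the tree. The sketch below shows one way to do that. `extract_entities` is a hypothetical helper, and `ChunkNode` is a stand-in that mimics only the small part of `nltk.Tree` used here, so the example runs without the NLTK data files; with NLTK you would pass the `ne_chunk` result directly.

```python
def extract_entities(chunk_tree):
    """Return (entity_text, label) pairs from an ne_chunk-style tree."""
    entities = []
    for node in chunk_tree:
        # Entity chunks are subtrees exposing label();
        # ordinary tokens are plain (word, tag) tuples.
        if hasattr(node, "label"):
            text = " ".join(word for word, _tag in node)
            entities.append((text, node.label()))
    return entities


class ChunkNode(list):
    """Minimal stand-in for nltk.Tree (assumption, for illustration only)."""

    def __init__(self, label, children):
        super().__init__(children)
        self._label = label

    def label(self):
        return self._label


# Hand-built copy of part of the tree printed above.
tree = ChunkNode("S", [
    ChunkNode("GPE", [("Apple", "NNP")]),
    ("is", "VBZ"),
    ("looking", "VBG"),
    ("U.K.", "NNP"),
])
print(extract_entities(tree))
# [('Apple', 'GPE')]
```

On this sentence the flattened NLTK result contains only `('Apple', 'GPE')`, while spaCy returned three entities, which makes the difference between the two outputs easier to see side by side.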


## Summary
* **spaCy** is usually preferred for production applications because of its speed and its ready-made pipelines for tasks such as NER and dependency parsing.
* **NLTK** is well suited to teaching and research, since it provides a wide range of linguistic data and tools.
* Both libraries have their strengths; choose between them based on the specific needs of the project.