<h1 align="center"><font color="green" size="6">NLP with Python: Using NLTK and spaCy</font></h1>

<div style = "background-color: #90EE90;">.</div>

Natural Language Processing (NLP) is a field of artificial intelligence that focuses on the interaction between computers and humans through language. 

Python is a popular language for NLP due to its rich libraries, particularly NLTK and spaCy.

## 1. NLTK (Natural Language Toolkit)
NLTK is a powerful library for working with human language data. It provides easy-to-use tools for various NLP tasks, making it a great educational resource.

### Key Features:

 - **Tokenization**: Splitting text into words or sentences.
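 - **Stemming and Lemmatization**: Reducing words to a base form. Stemming strips suffixes with heuristic rules (so the result may not be a real word), while lemmatization maps each word to its dictionary form.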
 - **Part-of-Speech Tagging**: Identifying the grammatical role of each word.
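
To make the idea of tokenization concrete, here is a minimal standard-library sketch of splitting text into sentences and words. It is not how NLTK tokenizes: the helper names `split_sentences` and `split_words` and both regular expressions are assumptions made for illustration only.

```python
import re

def split_sentences(text):
    """Naively split after '.', '!' or '?' when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def split_words(sentence):
    """Return runs of word characters and single punctuation marks as tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)

sample = "Natural language processing is fascinating. It powers search and chatbots!"
sentences = split_sentences(sample)
tokens = split_words(sentences[0])

print(sentences)
print(tokens)
```

A rule this simple breaks on abbreviations such as "U.K." or "Dr.", which is one reason to prefer a library tokenizer for real text.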

**Use Case**

NLTK is well-suited for educational purposes, helping students and researchers understand the fundamentals of NLP.

Example: Basic NLTK Usage

In [None]:
import nltk
from nltk.tokenize import word_tokenize

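# Download the tokenizer data used by word_tokenize
# (NLTK 3.9+ looks for 'punkt_tab'; older releases use 'punkt')
nltk.download('punkt')
nltk.download('punkt_tab')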

# Sample text
text = "Natural language processing is fascinating."

# Tokenize the text
tokens = word_tokenize(text)
print("Tokens:", tokens)


## 2. spaCy
spaCy is another popular NLP library designed for production-level tasks, focusing on efficiency and performance. It’s optimized for industrial use and real-time NLP applications.

### Key Features:

 - **Named Entity Recognition (NER)**: Identifying and categorizing entities (like people, organizations, dates) in text, making it easier to extract meaningful information.
 - **Fast Tokenization**: Quick splitting of text into tokens.
 - **Dependency Parsing**: Understanding the grammatical structure of sentences.

### Advantages:

 - Well-organized, streamlined API with pre-built pipelines for speed and performance.
 - Implemented largely in Cython, which helps it process large datasets efficiently.
 

Comparison: spaCy vs. NLTK

| Feature/Aspect | **spaCy** | **NLTK** |
|----------------|-----------|----------|
| **Purpose** | Designed for production-level tasks, focusing on efficiency and performance. | Primarily an educational tool for teaching and research in NLP. |
| **Speed** | Optimized for fast processing; handles large datasets efficiently. | Generally slower, since pipelines are assembled from separate, flexible processing steps. |
| **Ease of Use** | Streamlined API with pre-built pipelines for quick implementation. | More setup, often requiring manual assembly of the NLP pipeline. |
| **Performance** | High performance in real-time applications; well suited to industrial use. | Good for smaller projects and educational purposes, but less performant in production. |
| **Named Entity Recognition (NER)** | Provides robust and efficient NER out of the box. | NER is available but may require more setup and customization. |
| **Tokenization** | Fast tokenization, integrated into the pipeline. | Flexible tokenization options, but can be slower. |
| **Dependency Parsing** | Includes a trained dependency parser in its standard pipelines. | Dependency parsing is available, but typically needs more setup and is less optimized for speed. |
| **Community and Support** | Growing community with good documentation, focused on production use cases. | Established community with extensive resources and tutorials, especially for beginners. |
| **Customization** | More opinionated out of the box, trading some flexibility for performance. | Highly customizable, which suits research-oriented tasks. |
| **Use Cases** | Best for real-time applications such as chatbots, processing scraped web text, and production systems. | Best for educational purposes, prototyping, and smaller NLP tasks. |

Example: Basic spaCy Usage

In [None]:
import spacy

# Load the small English pipeline
# (install it first with: python -m spacy download en_core_web_sm)
nlp = spacy.load("en_core_web_sm")

# Sample text
text = "Apple is looking at buying U.K. startup for $1 billion."

# Process the text
doc = nlp(text)

# Print named entities
for ent in doc.ents:
    print(ent.text, ent.label_)


## Understanding BFS and DFS
BFS and DFS are not NLP concepts in themselves. They are general graph-traversal algorithms that appear in many computing tasks, including some NLP applications such as traversing parse trees.

### Breadth-First Search (BFS)
BFS explores all the neighbor nodes at the present depth before moving on to nodes at the next depth level. 

Example: Imagine you’re on a homepage and want to explore all its links:

 - Start at the homepage.
 - Visit all links on that page (e.g., About, Services, Contact).
 - For each of those links, visit all the links on those pages.
 - Continue this process level by level until you've explored to a certain depth.
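
The level-by-level steps above can be sketched with a queue. This is a minimal illustration, not a crawler: the `SITE_LINKS` map, the page names beyond Home, About, Services, and Contact, and the `bfs_pages` helper are all hypothetical.

```python
from collections import deque

# Hypothetical site map: each page maps to the pages it links to.
SITE_LINKS = {
    "Home": ["About", "Services", "Contact"],
    "About": ["Team", "History"],
    "Services": ["Consulting"],
    "Contact": [],
    "Team": [],
    "History": [],
    "Consulting": ["Contact"],
}

def bfs_pages(links, start, max_depth):
    """Return pages in breadth-first order, at most max_depth clicks from start."""
    visited = {start}
    order = []
    queue = deque([(start, 0)])  # (page, depth) pairs, processed first-in first-out
    while queue:
        page, depth = queue.popleft()
        order.append(page)
        if depth == max_depth:
            continue  # do not follow links beyond the depth limit
        for neighbor in links.get(page, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, depth + 1))
    return order

bfs_order = bfs_pages(SITE_LINKS, "Home", max_depth=2)
print(bfs_order)
```

The `visited` set keeps a page that is linked from several places (here, Contact) from being explored twice, and the first-in first-out queue is what guarantees that every page at one depth is handled before any page at the next.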

### Depth-First Search (DFS)
DFS explores as far as possible along each branch before backtracking.

Example: Using the same homepage scenario, with DFS you’d do this:

 - Start at the homepage.
 - Click the first link (e.g., About).
 - From the About page, follow its first link, then the first link on that page, and so on, until you reach a page with no unexplored links.
 - Once you reach a dead end, go back to the previous page and follow its next unexplored link.
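
The follow-then-backtrack steps above can be sketched recursively. As an illustration only, the `SITE_LINKS` map, the page names beyond Home, About, Services, and Contact, and the `dfs_pages` helper are hypothetical.

```python
# Hypothetical site map: each page maps to the pages it links to.
SITE_LINKS = {
    "Home": ["About", "Services", "Contact"],
    "About": ["Team", "History"],
    "Services": ["Consulting"],
    "Contact": [],
    "Team": [],
    "History": [],
    "Consulting": ["Contact"],
}

def dfs_pages(links, start):
    """Return pages in depth-first order, following each link chain to its end."""
    visited = set()
    order = []

    def visit(page):
        visited.add(page)
        order.append(page)
        for neighbor in links.get(page, []):
            if neighbor not in visited:
                visit(neighbor)  # go deeper before trying the next link on this page
        # Returning from visit() is the backtracking step.

    visit(start)
    return order

dfs_order = dfs_pages(SITE_LINKS, "Home")
print(dfs_order)
```

Note that Contact is reached through Services and Consulting before the traversal ever returns to the homepage's own Contact link. For very deep link chains, an explicit stack in place of recursion avoids Python's recursion limit.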