# Question 1: Python Code for NLP Pipeline

### Importing Libraries

In [None]:
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer

### Download required resources

In [None]:
nltk.download('punkt_tab')
nltk.download('stopwords')
nltk.download('wordnet')
nltk.download('omw-1.4')

[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package omw-1.4 to /root/nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!


True

### Custom Text (Paragraph) for Processing

In [None]:
custom_paragraph = "Natural Language Processing is a fascinating field. It helps computers understand, interpret, and generate human language. We are now performing the entire pipeline to transform the text."
print("--- Step 0: Original Text ---")
print(custom_paragraph)

--- Step 0: Original Text ---
Natural Language Processing is a fascinating field. It helps computers understand, interpret, and generate human language. We are now performing the entire pipeline to transform the text.


### Step 1: Tokenization

In [None]:
tokens = word_tokenize(custom_paragraph)
print("--- Step 1: Tokenization ---")
print(tokens)

--- Step 1: Tokenization ---
['Natural', 'Language', 'Processing', 'is', 'a', 'fascinating', 'field', '.', 'It', 'helps', 'computers', 'understand', ',', 'interpret', ',', 'and', 'generate', 'human', 'language', '.', 'We', 'are', 'now', 'performing', 'the', 'entire', 'pipeline', 'to', 'transform', 'the', 'text', '.']


### Step 2: Stopword Removal (and Lowercasing)

In [None]:
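# Build the stopword set once so each membership check is fast.
stop_words = set(stopwords.words('english'))
# Lowercase each token, then drop stopwords and non-alphabetic tokens such as punctuation.
filtered_tokens = [word.lower() for word in tokens if word.lower() not in stop_words and word.isalpha()]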
print("--- Step 2: Stopword Removal (and Lowercasing) ---")
print(filtered_tokens)

--- Step 2: Stopword Removal (and Lowercasing) ---
['natural', 'language', 'processing', 'fascinating', 'field', 'helps', 'computers', 'understand', 'interpret', 'generate', 'human', 'language', 'performing', 'entire', 'pipeline', 'transform', 'text']


### Step 3: Stemming (Porter Stemmer)

In [None]:
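stemmer = PorterStemmer()
# Porter stemming strips suffixes by rule, so results such as 'natur' need not be dictionary words.
stemmed_words = [stemmer.stem(word) for word in filtered_tokens]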
print("--- Step 3: Stemming (Porter Stemmer) ---")
print(stemmed_words)

--- Step 3: Stemming (Porter Stemmer) ---
['natur', 'languag', 'process', 'fascin', 'field', 'help', 'comput', 'understand', 'interpret', 'gener', 'human', 'languag', 'perform', 'entir', 'pipelin', 'transform', 'text']


### Step 4: Lemmatization (WordNet Lemmatizer)

In [None]:
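lemmatizer = WordNetLemmatizer()
# lemmatize() defaults to pos='n' (noun), so forms such as 'processing' and 'performing' are left unchanged.
lemmatized_words = [lemmatizer.lemmatize(word) for word in filtered_tokens]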
print("--- Step 4: Lemmatization (WordNet Lemmatizer) ---")
print(lemmatized_words)

--- Step 4: Lemmatization (WordNet Lemmatizer) ---
['natural', 'language', 'processing', 'fascinating', 'field', 'help', 'computer', 'understand', 'interpret', 'generate', 'human', 'language', 'performing', 'entire', 'pipeline', 'transform', 'text']
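
The difference between Step 3 and Step 4 is easier to see side by side. The sketch below copies the two output lists printed above and reports the positions where stemming and lemmatization disagree. It uses only the standard library; the variable name `differences` is introduced here for illustration.

```python
# Outputs copied from Step 3 (stemming) and Step 4 (lemmatization) above.
stemmed_words = ['natur', 'languag', 'process', 'fascin', 'field', 'help',
                 'comput', 'understand', 'interpret', 'gener', 'human',
                 'languag', 'perform', 'entir', 'pipelin', 'transform', 'text']
lemmatized_words = ['natural', 'language', 'processing', 'fascinating', 'field',
                    'help', 'computer', 'understand', 'interpret', 'generate',
                    'human', 'language', 'performing', 'entire', 'pipeline',
                    'transform', 'text']

# Keep only the positions where the two methods produce different strings.
differences = [(stem, lemma)
               for stem, lemma in zip(stemmed_words, lemmatized_words)
               if stem != lemma]

for stem, lemma in differences:
    print(f"{stem:<10} vs {lemma}")

print(f"{len(differences)} of {len(stemmed_words)} tokens differ")
```

For this paragraph, 10 of the 17 tokens differ. In each of those cases the stem is a truncated form (`languag`, `pipelin`), while the lemma is a dictionary word.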


# Question 2: Define NLP and its Real-Time Application

### What is NLP?

Natural Language Processing (NLP) is a subfield of artificial intelligence (AI) that enables computers to understand, interpret, and generate human language. It combines computational linguistics (rule-based modeling of human language) with statistical, machine learning, and deep learning models. The goal is to bridge the gap between human communication and computer understanding.

### Real-Time Applications by Domain

| Domain | Application | Description |
| :--- | :--- | :--- |
| **Customer Service** | **Real-Time Chatbots** | In e-commerce or telecommunications, chatbots use NLP to **instantly** understand a customer's query (intent recognition) from their typed or spoken language and provide relevant, real-time responses or route the user to the correct human agent. |
| **Finance/Trading** | **Sentiment Analysis on News Feeds** | Algorithms analyze high-volume, real-time news articles, social media posts, and company reports to gauge the market's **sentiment** (positive, negative, or neutral) towards a stock or commodity. This helps inform automated, high-frequency trading decisions. |
| **Healthcare** | **Clinical Decision Support** | NLP processes a doctor's transcribed notes or electronic health records (EHRs) in real time, **extracting key information** such as diagnoses, medications, and allergies, and surfaces relevant alerts or suggestions to the clinician. |
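
To make the intent recognition step in the chatbot row concrete, here is a minimal sketch that scores a query against keyword sets and falls back to a human agent when nothing matches. The intent names, keywords, and the `recognize_intent` helper are all hypothetical; a production chatbot would more likely use a trained classifier than a hand-written table.

```python
import re

# Hypothetical intents and trigger keywords, chosen only for illustration.
INTENT_KEYWORDS = {
    "track_order": {"track", "order", "delivery", "shipped"},
    "refund": {"refund", "return", "money"},
    "billing": {"bill", "invoice", "charge", "payment"},
}


def recognize_intent(query: str) -> str:
    """Return the intent whose keywords overlap most with the query."""
    words = set(re.findall(r"[a-z]+", query.lower()))
    scores = {intent: len(words & keywords)
              for intent, keywords in INTENT_KEYWORDS.items()}
    best_intent = max(scores, key=scores.get)
    # No keyword matched: route the conversation to a human agent.
    return best_intent if scores[best_intent] > 0 else "human_agent"


print(recognize_intent("Where is my order? It has not shipped yet."))
print(recognize_intent("I want my money back, please refund me."))
print(recognize_intent("Can you sing a song?"))
```

The three calls print `track_order`, `refund`, and `human_agent`. The last one shows the routing fallback described in the table.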

# Question 3: What is NLU and NLG?

**NLU** and **NLG** are two major components of the overall NLP field.

| Component | Full Name | Definition | Analogy |
| :--- | :--- | :--- | :--- |
| **NLU** | **Natural Language Understanding** | The process of getting computers to **comprehend the meaning** of human language input. This involves tasks like determining the underlying *intent*, *sentiment*, and *context* of the text, not just the individual words. | A student **reading and understanding** a complex textbook chapter. |
| **NLG** | **Natural Language Generation** | The process of getting computers to **produce coherent and human-like text** or speech as output. It involves planning what to say, structuring the sentences, and selecting the appropriate words. | A student **writing a well-structured essay** based on the information they learned. |
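
The split between the two components can be sketched as a pair of functions: one maps text to structured meaning (NLU), the other maps structured meaning back to text (NLG). This is a toy illustration, assuming a single hypothetical `get_weather` intent, a regular expression in place of a trained model, and a placeholder forecast value. Real systems are far more elaborate.

```python
import re


def understand(utterance: str) -> dict:
    """Toy NLU: map raw text to a structured intent with a 'city' slot."""
    match = re.search(r"weather in ([A-Za-z ]+?)[?.!]*$",
                      utterance.strip(), re.IGNORECASE)
    if match:
        return {"intent": "get_weather", "city": match.group(1).strip().title()}
    return {"intent": "unknown"}


def generate(meaning: dict, forecast: str = "sunny") -> str:
    """Toy NLG: turn structured data into a sentence using a template.

    The 'forecast' default is a placeholder, not real weather data.
    """
    if meaning["intent"] == "get_weather":
        return f"The weather in {meaning['city']} is expected to be {forecast}."
    return "Sorry, I did not understand that."


meaning = understand("What is the weather in new delhi?")
print(meaning)            # structured output of the NLU step
print(generate(meaning))  # sentence produced by the NLG step
```

The dictionary returned by `understand` is the hand-off point: everything before it is comprehension, everything after it is production.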