# 1) Take a custom paragraph, run the entire pipeline, and print the results at each step.

# a) Tokenization → b) Stopword Removal → c) Stemming → d) Lemmatization

In [1]:
import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer

In [5]:
import nltk
# Download the tokenizer models, stopword list, and WordNet data used by the steps below.
nltk.download('punkt')
nltk.download('punkt_tab')
nltk.download('stopwords')
nltk.download('wordnet')


[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


True

In [3]:
# Custom paragraph
custom_para = """The universe is incredibly vast and mysterious; astronomers are constantly observing distant galaxies, searching for exoplanets, and developing powerful new telescopes. Analyzing all this incoming space data requires sophisticated techniques, like efficient tokenization and careful lemmatization, to truly unlock the cosmos's secrets."""

print("Original Paragraph:")
print(custom_para)
print("="*80)


Original Paragraph:
The universe is incredibly vast and mysterious; astronomers are constantly observing distant galaxies, searching for exoplanets, and developing powerful new telescopes. Analyzing all this incoming space data requires sophisticated techniques, like efficient tokenization and careful lemmatization, to truly unlock the cosmos's secrets.


# a) Tokenization

In [6]:
# a) Tokenization
tokens = word_tokenize(custom_para)
print("a) Tokenization:")
print(tokens)
print("="*80)

a) Tokenization:
['The', 'universe', 'is', 'incredibly', 'vast', 'and', 'mysterious', ';', 'astronomers', 'are', 'constantly', 'observing', 'distant', 'galaxies', ',', 'searching', 'for', 'exoplanets', ',', 'and', 'developing', 'powerful', 'new', 'telescopes', '.', 'Analyzing', 'all', 'this', 'incoming', 'space', 'data', 'requires', 'sophisticated', 'techniques', ',', 'like', 'efficient', 'tokenization', 'and', 'careful', 'lemmatization', ',', 'to', 'truly', 'unlock', 'the', 'cosmos', "'s", 'secrets', '.']


# b) Stopword Removal

In [10]:

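# b) Stopword Removal
# Drop English stopwords (case-insensitive) and non-alphabetic tokens such as punctuation and "'s".
stop_words = set(stopwords.words('english'))
filtered_tokens = [word for word in tokens if word.lower() not in stop_words and word.isalpha()]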
print("b) Stopword Removal:")
print(filtered_tokens)
print("="*80)



b) Stopword Removal:
['universe', 'incredibly', 'vast', 'mysterious', 'astronomers', 'constantly', 'observing', 'distant', 'galaxies', 'searching', 'exoplanets', 'developing', 'powerful', 'new', 'telescopes', 'Analyzing', 'incoming', 'space', 'data', 'requires', 'sophisticated', 'techniques', 'like', 'efficient', 'tokenization', 'careful', 'lemmatization', 'truly', 'unlock', 'cosmos', 'secrets']


# c) Stemming

In [11]:

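# c) Stemming
# Reduce each token to its Porter stem; stems are lowercased and need not be real words.
stemmer = PorterStemmer()
stemmed_tokens = [stemmer.stem(word) for word in filtered_tokens]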
print("c) Stemming:")
print(stemmed_tokens)
print("="*80)

c) Stemming:
['univers', 'incred', 'vast', 'mysteri', 'astronom', 'constantli', 'observ', 'distant', 'galaxi', 'search', 'exoplanet', 'develop', 'power', 'new', 'telescop', 'analyz', 'incom', 'space', 'data', 'requir', 'sophist', 'techniqu', 'like', 'effici', 'token', 'care', 'lemmat', 'truli', 'unlock', 'cosmo', 'secret']


# d) Lemmatization

In [12]:

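# d) Lemmatization
# lemmatize() assumes pos='n' (noun) by default, so verb forms such as 'observing' are returned unchanged.
lemmatizer = WordNetLemmatizer()
lemmatized_tokens = [lemmatizer.lemmatize(word) for word in filtered_tokens]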
print("d) Lemmatization:")
print(lemmatized_tokens)

d) Lemmatization:
['universe', 'incredibly', 'vast', 'mysterious', 'astronomer', 'constantly', 'observing', 'distant', 'galaxy', 'searching', 'exoplanets', 'developing', 'powerful', 'new', 'telescope', 'Analyzing', 'incoming', 'space', 'data', 'requires', 'sophisticated', 'technique', 'like', 'efficient', 'tokenization', 'careful', 'lemmatization', 'truly', 'unlock', 'cosmos', 'secret']
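
To make the contrast between steps (c) and (d) easier to see, the sketch below lines up a handful of tokens with their stems and lemmas. The values are copied from the outputs printed in this notebook; the helper name `compare_forms` is hypothetical and exists only for illustration.

```python
# (token, Porter stem, WordNet lemma) triples copied from the notebook outputs.
samples = [
    ("galaxies", "galaxi", "galaxy"),
    ("astronomers", "astronom", "astronomer"),
    ("telescopes", "telescop", "telescope"),
    ("observing", "observ", "observing"),
    ("requires", "requir", "requires"),
]


def compare_forms(rows):
    """Return a plain-text table comparing each token with its stem and lemma."""
    lines = [f"{'token':<14}{'stem':<12}{'lemma':<12}"]
    for token, stem, lemma in rows:
        lines.append(f"{token:<14}{stem:<12}{lemma:<12}")
    return "\n".join(lines)


print(compare_forms(samples))
```

Stems are truncated strings that need not be dictionary words, whereas lemmas are valid base forms. The last two rows stay unchanged under lemmatization because `lemmatize()` was called without a part-of-speech tag and therefore treated them as nouns; passing `pos='v'` is the usual way to reduce verb forms.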


# 2) Define NLP and describe a real-time application in a specific domain.

-->Natural Language Processing (NLP) is a subfield of Artificial Intelligence (AI) that gives computers the ability to read, understand, interpret, and generate human language (both spoken and written).

The core goal of NLP is to bridge the gap between human communication—which is highly unstructured, ambiguous, and contextual—and machine understanding, which requires structured, logical data. NLP combines concepts from computer science, AI, and linguistics to achieve this, allowing machines to derive meaning, determine sentiment, and even generate natural, human-like responses.

Real-Time NLP in the Financial Domain: Market Sentiment Analysis

A prominent application of real-time NLP is in the Financial Services domain, specifically for Market Sentiment Analysis.

Specific Domain: Finance

In finance, quick access to information is critical. Every day, trillions of dollars' worth of trades are influenced by news, regulatory filings, social media, and earnings call transcripts. Analyzing this vast, unstructured text data manually is impossible.

Real-Time Application: Market Sentiment Analysis

Market Sentiment Analysis uses NLP to automatically determine the tone or emotion expressed toward a specific company, stock, or market sector in real time.

1. How it Works (Real-Time):

    Data Ingestion: The NLP system continuously streams text data from multiple sources:

        Financial news wires (e.g., Bloomberg, Reuters).

        Social media (e.g., relevant financial subreddits, X/Twitter feeds).

        Regulatory filings (e.g., SEC 8-K reports for immediate company news).

        Live transcripts of CEO speeches or earnings calls.

    Preprocessing & Analysis: As the text streams in, the following NLP tasks are performed in milliseconds:

        Tokenization & Entity Recognition: Identifying key tokens and tagging financial entities (e.g., "TSLA" as a company, "$250M" as a currency amount, "acquisition" as a corporate action).

        Sentiment Classification: An NLP model (often a deep learning model such as a Transformer) classifies the text as Positive, Negative, or Neutral. The model is trained on finance-specific vocabulary (e.g., in finance, words like "volatile" or "risk" often carry negative sentiment, while "growth" and "exceeds" are positive).

        Aggregation: The individual sentiment scores are aggregated to provide a real-time sentiment score for a particular asset.
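
The classification and aggregation steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not how a production system works: it swaps the Transformer model for a tiny hand-written lexicon, and the names `LEXICON`, `score_text`, `aggregate`, the ticker `ACME`, and the sample headlines are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy lexicon; a real system would use a trained model instead.
LEXICON = {"growth": 1, "exceeds": 1, "volatile": -1, "risk": -1}


def score_text(text):
    """Return (label, raw_score) for a text using the toy lexicon."""
    words = [w.strip(".,;:!?\"'").lower() for w in text.split()]
    total = sum(LEXICON.get(w, 0) for w in words)
    if total > 0:
        return "Positive", total
    if total < 0:
        return "Negative", total
    return "Neutral", total


def aggregate(messages):
    """Average the raw sentiment scores per asset ticker."""
    scores = defaultdict(list)
    for ticker, text in messages:
        _, raw = score_text(text)
        scores[ticker].append(raw)
    return {ticker: sum(vals) / len(vals) for ticker, vals in scores.items()}


# Made-up (ticker, headline) pairs standing in for a live text stream.
stream = [
    ("TSLA", "Revenue growth exceeds expectations."),
    ("TSLA", "Analysts warn of volatile demand."),
    ("ACME", "Regulatory risk remains; outlook volatile."),
]
print(aggregate(stream))  # {'TSLA': 0.5, 'ACME': -2.0}
```

Averaging is only one possible aggregation; a real pipeline might weight sources by reliability or decay older messages over time.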

2. Real-Time Impact:

    Algorithmic Trading: High-frequency trading (HFT) firms use these real-time sentiment signals to make automated buy or sell decisions. For example, if the sentiment score for a company suddenly plunges due to a breaking news story, an algorithm can automatically sell the stock within milliseconds, before human traders can even read the headline.

    Risk Management: Compliance teams can monitor internal and external communications in real time to detect potentially risky or non-compliant language, flagging possible insider trading or unauthorized activity.

    Investment Decision Support: Portfolio managers receive real-time alerts that summarize the public mood towards their holdings, allowing them to adjust strategies immediately based on breaking information.
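
One way to picture how a sudden plunge in sentiment might trigger an automated signal or alert is a rolling-window check like the one below. It is a simplified sketch under stated assumptions: the class name `SentimentMonitor`, the window size, the threshold, and the sample scores are all hypothetical, and real trading systems use far more elaborate rules.

```python
from collections import deque


class SentimentMonitor:
    """Track recent sentiment scores for one asset and flag sudden drops."""

    def __init__(self, window=5, drop_threshold=1.0):
        # Only the most recent `window` scores are kept as the baseline.
        self.history = deque(maxlen=window)
        self.drop_threshold = drop_threshold

    def update(self, score):
        """Record a score; return True if it falls sharply below the recent average."""
        alert = False
        if self.history:
            baseline = sum(self.history) / len(self.history)
            alert = (baseline - score) >= self.drop_threshold
        self.history.append(score)
        return alert


monitor = SentimentMonitor(window=3, drop_threshold=1.0)
# Made-up score sequence: steady sentiment followed by a sharp negative reading.
signals = [monitor.update(s) for s in [0.5, 0.6, 0.4, -1.0]]
print(signals)  # [False, False, False, True]
```

The same flag could feed a sell order, a compliance review queue, or a portfolio manager's alert, depending on which of the uses above it serves.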


# 3) What are NLU and NLG?

-->**Natural Language Understanding (NLU)**: The component of Natural Language Processing (NLP) that interprets human input. Its primary goal is to extract the meaning, intent, and context from user text or speech, even when the language is complex, ambiguous, or contains errors. NLU does this through tasks such as identifying the user's ultimate goal (intent recognition) and pulling out critical information (entity recognition), effectively serving as the machine's "reading comprehension" system.

**Natural Language Generation (NLG)**: The counterpart to NLU, focused on producing human-like language output. It takes structured, machine-ready data (the answer or action determined by the system) and translates it into a coherent, grammatically correct, and contextually appropriate response that a human can easily understand. NLG handles the mechanics of articulation, including sentence structuring and word choice, and so covers the machine's outbound communication (formulating and speaking).
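
The division of labor between NLU and NLG can be illustrated with a deliberately small example: one function turns free text into a structured frame (intent plus entities), and another turns a frame back into a sentence. Everything here is a toy assumption, including the function names `understand` and `generate`, the keyword rules, the ticker pattern, and the price used in the demo; real systems rely on trained models rather than keyword matching and templates.

```python
import re


def understand(utterance):
    """Toy NLU: map free text to an intent plus extracted entities."""
    text = utterance.lower()
    if "price" in text or "quote" in text:
        intent = "get_price"
    elif "buy" in text:
        intent = "buy_stock"
    else:
        intent = "unknown"
    # Simplification: treat any 2-5 letter all-caps word as a ticker symbol.
    tickers = re.findall(r"\b[A-Z]{2,5}\b", utterance)
    return {"intent": intent, "tickers": tickers}


def generate(frame, prices):
    """Toy NLG: turn a structured frame into a sentence using templates."""
    if frame["intent"] == "get_price" and frame["tickers"]:
        ticker = frame["tickers"][0]
        if ticker in prices:
            return f"{ticker} is currently trading at ${prices[ticker]:.2f}."
        return f"I could not find a price for {ticker}."
    return "Sorry, I did not understand the request."


frame = understand("What is the price of TSLA today?")
print(frame)  # {'intent': 'get_price', 'tickers': ['TSLA']}
# The price below is an illustrative placeholder, not real market data.
print(generate(frame, {"TSLA": 250.0}))  # TSLA is currently trading at $250.00.
```

The structured frame in the middle is the hand-off point: NLU ends once the intent and entities are known, and NLG begins once the system has decided what to say.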