
1. What is NLP?

NLP (Natural Language Processing) = Computer + Human Language
It allows machines to read, understand, and generate human language.

Example:
When you ask Alexa, “What’s the weather today?”
→ It converts your voice to text (speech recognition)
→ Works out what you are asking for
→ Looks up the weather
→ Replies in natural language

Nearly every step of that process relies on NLP.
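
The steps in the Alexa example can be sketched as a tiny text-only pipeline. This is a toy illustration, not how any real assistant works: it assumes the speech has already been transcribed, and `FAKE_WEATHER`, `understand`, `respond`, and `assistant` are hypothetical names invented for this sketch.

```python
import re

# Hypothetical lookup table standing in for a real weather service
FAKE_WEATHER = {"today": "sunny", "tomorrow": "rainy"}


def understand(text):
    """Very rough 'understanding': detect a weather intent and a day."""
    # Normalise curly apostrophes, lowercase, and keep only word-like chunks
    words = re.findall(r"[a-z']+", text.lower().replace("’", "'"))
    if "weather" not in words:
        return {"intent": "unknown"}
    # Use the first day word we know about; default to "today"
    day = next((w for w in words if w in FAKE_WEATHER), "today")
    return {"intent": "get_weather", "day": day}


def respond(request):
    """Turn the structured request back into a natural-language reply."""
    if request["intent"] != "get_weather":
        return "Sorry, I did not understand that."
    forecast = FAKE_WEATHER[request["day"]]
    return f"The weather {request['day']} is {forecast}."


def assistant(transcribed_text):
    """Text in, text out: understand the request, then generate a reply."""
    return respond(understand(transcribed_text))


print(assistant("What’s the weather today?"))
```

Real systems replace the keyword check with trained models, but the shape stays the same: text goes in, a structured request comes out, and a reply is generated from it.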

2. Why NLP is Hard (Challenges)

Human language is not always straightforward.
One word or sentence can have several meanings; this is called ambiguity.

| Type                      | Meaning                      | Example                             | Problem                    | Solution (NLP technique)        |
| ------------------------- | ---------------------------- | ----------------------------------- | -------------------------- | ------------------------------- |
| **Lexical Ambiguity**     | A word has multiple meanings | “Bank” → (money bank / river bank)  | Word meaning confusion     | Word Sense Disambiguation (WSD) |
| **Syntactic Ambiguity**   | Sentence structure unclear   | “I saw the man with the telescope.” | Who had the telescope?     | Syntactic Parsing               |
| **Semantic Ambiguity**    | Sentence meaning unclear     | “The chicken is ready to eat.”      | Who eats whom?             | Semantic Role Labeling          |
| **Pragmatic Ambiguity**   | Meaning depends on context   | “Can you pass the salt?”            | Not a question — a request | Context understanding           |
| **Referential Ambiguity** | Pronouns unclear             | “John told Mike that he was late.”  | Who is “he”?               | Coreference Resolution          |
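
The syntactic ambiguity row can be made concrete by counting how many parse trees a grammar allows for “I saw the man with the telescope.” The sketch below is a minimal CKY-style parse counter in plain Python. The grammar and lexicon are hand-written toy assumptions that cover only this sentence, and `count_parses` is a hypothetical helper, not a library function.

```python
from collections import defaultdict

# Toy grammar (an assumption for illustration): parent -> left right
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # "with the telescope" attaches to the verb phrase
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),   # "with the telescope" attaches to the noun phrase
    ("PP", "P", "NP"),
]

# Toy lexicon: word -> categories it can have
LEXICON = {
    "I": ["NP"],
    "saw": ["V"],
    "the": ["Det"],
    "man": ["N"],
    "telescope": ["N"],
    "with": ["P"],
}


def count_parses(words):
    """Return how many distinct parse trees derive `words` from S."""
    n = len(words)
    # chart[(start, end, symbol)] = number of ways symbol spans words[start:end]
    chart = defaultdict(int)
    for i, word in enumerate(words):
        for symbol in LEXICON.get(word, []):
            chart[(i, i + 1, symbol)] += 1
    # Build longer spans from pairs of shorter adjacent spans
    for width in range(2, n + 1):
        for start in range(0, n - width + 1):
            end = start + width
            for mid in range(start + 1, end):
                for parent, left, right in BINARY_RULES:
                    chart[(start, end, parent)] += (
                        chart[(start, mid, left)] * chart[(mid, end, right)]
                    )
    return chart[(0, n, "S")]


print(count_parses("I saw the man with the telescope".split()))  # 2
print(count_parses("I saw the man".split()))                     # 1
```

The two parses correspond to the two readings: either the seeing was done with the telescope, or the man had it. A parser exposes the ambiguity; choosing between the readings still needs context or a statistical model.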


3. How NLP Handles Ambiguity

NLP uses a combination of:

Contextual Analysis → Understanding the full sentence meaning.

Machine Learning / Deep Learning Models → Train models to predict likely meaning.

Word Sense Disambiguation (WSD) → Identify correct meaning of words.

Dependency Parsing → Understand grammatical structure.

Coreference Resolution → Find what pronouns refer to.
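
As one concrete case from the list above, word sense disambiguation can be sketched with a simplified form of the Lesk idea: pick the sense whose dictionary gloss shares the most words with the sentence. This is a toy under stated assumptions, not NLTK's implementation. The two glosses are WordNet definitions of “bank”, while `STOPWORDS`, `SENSES`, and the function names are invented for this example.

```python
import re

# Small hand-picked stopword list (an assumption; real lists are longer)
STOPWORDS = {"a", "an", "the", "of", "to", "in", "on", "at", "and", "that", "i", "we"}

# Toy sense inventory: sense label -> gloss
SENSES = {
    "bank": {
        "financial institution": (
            "a financial institution that accepts deposits and "
            "channels the money into lending activities"
        ),
        "river bank": "sloping land (especially the slope beside a body of water)",
    }
}


def content_words(text):
    """Lowercase the text and return its set of non-stopword tokens."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def simplified_lesk(word, sentence):
    """Return the sense label whose gloss overlaps most with the sentence."""
    context = content_words(sentence) - {word}
    best_label, best_overlap = None, -1
    for label, gloss in SENSES[word].items():
        overlap = len(context & content_words(gloss))
        if overlap > best_overlap:  # ties keep the earlier sense
            best_label, best_overlap = label, overlap
    return best_label


print(simplified_lesk("bank", "I deposited money at the bank"))
print(simplified_lesk("bank", "We sat on the bank beside the water"))
```

Plain word overlap is brittle: the first sentence matches only because the gloss happens to contain “money”. NLTK provides a Lesk implementation in `nltk.wsd`, and modern systems usually rely on contextual embeddings instead.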

In [1]:
import nltk
from nltk.tokenize import sent_tokenize, word_tokenize

nltk.download('punkt')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


True

In [2]:
import nltk
nltk.download('punkt_tab')

[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.


True

In [3]:
text = "I love NLP. It helps computers understand human language!"
stext=sent_tokenize(text)

In [4]:
print(stext)
print(len(stext))

['I love NLP.', 'It helps computers understand human language!']
2


In [5]:
wtext=word_tokenize(text)
print(wtext)
print(len(wtext))

['I', 'love', 'NLP', '.', 'It', 'helps', 'computers', 'understand', 'human', 'language', '!']
11


Detect Ambiguity with Simple Rules

We’ll use WordNet to list the different senses (meanings) it records for a single word.

In [6]:
from nltk.corpus import wordnet
nltk.download('wordnet')

word = "bank"
# A synset ("synonym set") groups words that share one meaning;
# synsets() returns one synset for each sense of the word
syn = wordnet.synsets(word)

print("Possible meanings of 'bank':")
for s in syn:
    # definition() returns the gloss, i.e. the meaning of the word in that particular sense
    print("-", s.definition())


[nltk_data] Downloading package wordnet to /root/nltk_data...


Possible meanings of 'bank':
- sloping land (especially the slope beside a body of water)
- a financial institution that accepts deposits and channels the money into lending activities
- a long ridge or pile
- an arrangement of similar objects in a row or in tiers
- a supply or stock held in reserve for future use (especially in emergencies)
- the funds held by a gambling house or the dealer in some gambling games
- a slope in the turn of a road or track; the outside is higher than the inside in order to reduce the effects of centrifugal force
- a container (usually with a slot in the top) for keeping money at home
- a building in which the business of banking transacted
- a flight maneuver; aircraft tips laterally about its longitudinal axis (especially in turning)
- tip laterally
- enclose with a bank
- do business with a bank or keep an account at a bank
- act as the banker in a game or in gambling
- be in the banking business
- put into a bank account
- cover with ashes so to contr

Simple POS (Part-of-Speech) Tagging

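POS tagging labels each token with its grammatical category (noun, verb, preposition, and so on), which is a first step toward understanding sentence structure.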

In [7]:
import nltk
nltk.download('averaged_perceptron_tagger_eng')

[nltk_data] Downloading package averaged_perceptron_tagger_eng to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger_eng.zip.


True

In [8]:
from nltk import pos_tag

sentence = word_tokenize("I saw the man with the telescope")
tags = pos_tag(sentence)
print(tags)

# PRP → Personal pronoun
# VBD → Verb (past tense)
# DT → Determiner
# NN → Noun (singular)
# IN → Preposition

[('I', 'PRP'), ('saw', 'VBD'), ('the', 'DT'), ('man', 'NN'), ('with', 'IN'), ('the', 'DT'), ('telescope', 'NN')]


Context Resolution (Coreference Example)

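For understanding what “he”, “she”, etc. refer to.

We’re importing spaCy, one of the most popular NLP libraries.
It is generally faster than NLTK and ships pretrained pipelines for real-world NLP tasks like:

POS tagging

Dependency parsing

Named Entity Recognition (NER)

Note that the small English model loaded below has no coreference component. The example prints the dependency parse, which shows that “he” is the subject of “was” but not whether “he” means John or Mike.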

In [9]:
import spacy

# "en_core_web_sm": en → English, core → general-purpose,
# web → trained on web text, sm → small (lightweight) version
nlp_abc = spacy.load("en_core_web_sm")
text = "John told Mike that he was late."
# Running the pipeline breaks the sentence into tokens, tags each token with its
# grammatical role (subject, object, etc.), and creates dependency relationships between tokens
doc = nlp_abc(text)

for token in doc:
    print(token.text, token.dep_, token.head.text)


John nsubj told
told ROOT told
Mike dobj told
that mark was
he nsubj was
was ccomp told
late acomp was
. punct told


token.text → the actual word

token.dep_ → dependency label (grammatical role)

token.head.text → the head word that this token depends on
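
To see how the head links form a tree, the sketch below groups the printed triples by head word and looks up each verb's subject. It is plain Python working on data copied from the output above, so it runs without spaCy. `PARSE`, `children_by_head`, and `subject_of` are hypothetical names, and matching heads by text only works here because every word occurs once (in spaCy, `token.head` is a token object, not a string).

```python
# (token text, dependency label, head text) triples copied from the output above
PARSE = [
    ("John", "nsubj", "told"),
    ("told", "ROOT", "told"),
    ("Mike", "dobj", "told"),
    ("that", "mark", "was"),
    ("he", "nsubj", "was"),
    ("was", "ccomp", "told"),
    ("late", "acomp", "was"),
    (".", "punct", "told"),
]


def children_by_head(parse):
    """Map each head word to the (label, word) pairs that depend on it."""
    tree = {}
    for text, dep, head in parse:
        if dep == "ROOT":  # the root is its own head; skip the self-link
            continue
        tree.setdefault(head, []).append((dep, text))
    return tree


def subject_of(verb, parse):
    """Return the word attached to `verb` with the nsubj label, if any."""
    for text, dep, head in parse:
        if head == verb and dep == "nsubj":
            return text
    return None


tree = children_by_head(PARSE)
print(tree["told"])
print(tree["was"])
print(subject_of("told", PARSE))  # John
print(subject_of("was", PARSE))   # he
```

The parse says that “he” is the subject of “was”, but nothing in it links “he” to John or to Mike. Resolving that needs a separate coreference step.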


Practice: take a paragraph (e.g., from a news article) and apply these steps:

Sentence tokenization

Word tokenization

POS tagging

Count all nouns and verbs
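
The counting step can be sketched on its own, separate from tokenization and tagging. The function below is one possible shape, not the only one: it takes (token, tag) pairs in the format that `pos_tag` returns, counts nouns and verbs separately, and skips tokens with no letters, since taggers sometimes label stray punctuation such as curly quotes as nouns or verbs. `count_nouns_and_verbs` and the sample pairs are assumptions made for this sketch.

```python
from collections import Counter


def count_nouns_and_verbs(tagged_tokens):
    """Count noun and verb tags in a list of (token, tag) pairs."""
    counts = Counter()
    for token, tag in tagged_tokens:
        # Skip tokens made only of punctuation or digits
        if not any(ch.isalpha() for ch in token):
            continue
        if tag.startswith("NN"):    # NN, NNS, NNP, NNPS
            counts["noun"] += 1
        elif tag.startswith("VB"):  # VB, VBD, VBG, VBN, VBP, VBZ
            counts["verb"] += 1
    return counts


# Hand-written sample in pos_tag's output format
sample = [
    ("Maharashtra", "NNP"),
    ("’", "NNP"),        # punctuation mis-tagged as a noun; skipped
    ("is", "VBZ"),
    ("embroiled", "VBN"),
    ("in", "IN"),
    ("row", "NN"),
]

counts = count_nouns_and_verbs(sample)
print(counts["noun"], counts["verb"])  # 2 2
```

With NLTK available, the same function can be called on `pos_tag(word_tokenize(paragraph))`.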

In [10]:
news="""As Maharashtra Deputy Chief Minister Ajit Pawar’s son Parth Pawar is currently embroiled in a row involving an
alleged land scam in Pune, his father defended him Sunday and blamed the sub-registrar who registered the sale deed.
”His father has put forth his stand,” said Ajit Pawar when reporters asked him in his hometown, Baramati, as to why Parth
was not coming forward and clearing the air.Blaming the sub-registrar who registered the sale deed, he said, ”How did
the individual in the sub-registrar’s office register the deed? What prompted him to register and do the wrong job? We will
come to know this through the probe,” he said"""

In [12]:
sent = sent_tokenize(news)   # sentence tokenization
word = word_tokenize(news)   # word tokenization
part = pos_tag(word)         # POS tagging: list of (token, tag) pairs

In [13]:
count = 0
for token, tag in part:
    # POS tags for nouns start with 'NN' (NN, NNS, NNP, NNPS)
    # POS tags for verbs start with 'VB' (VB, VBD, VBG, VBN, VBP, VBZ)
    if tag.startswith(('NN', 'VB')):
        count += 1
        print((token, tag))

print(f"\nTotal count of nouns and verbs: {count}")

('Maharashtra', 'NNP')
('Deputy', 'NNP')
('Chief', 'NNP')
('Minister', 'NNP')
('Ajit', 'NNP')
('Pawar', 'NNP')
('’', 'NNP')
('s', 'VBD')
('Parth', 'NNP')
('Pawar', 'NNP')
('is', 'VBZ')
('embroiled', 'VBN')
('row', 'NN')
('involving', 'VBG')
('alleged', 'VBN')
('land', 'NN')
('scam', 'NN')
('Pune', 'NNP')
('father', 'NN')
('defended', 'VBD')
('Sunday', 'NNP')
('blamed', 'VBD')
('sub-registrar', 'NN')
('registered', 'VBD')
('sale', 'NN')
('deed', 'NN')
('”', 'VB')
('father', 'NN')
('has', 'VBZ')
('put', 'VBN')
('stand', 'NN')
('”', 'NNP')
('said', 'VBD')
('Ajit', 'NNP')
('Pawar', 'NNP')
('reporters', 'NNS')
('asked', 'VBD')
('hometown', 'NN')
('Baramati', 'NNP')
('Parth', 'NNP')
('was', 'VBD')
('coming', 'VBG')
('clearing', 'VBG')
('air.Blaming', 'VBG')
('sub-registrar', 'NN')
('registered', 'VBD')
('sale', 'NN')
('deed', 'NN')
('said', 'VBD')
('”', 'NNP')
('How', 'NNP')
('did', 'VBD')
('individual', 'NN')
('’', 'NNP')
('s', 'NN')
('office', 'NN')
('register', 'NN')
('deed', 'NN')
('prom