<a href="https://colab.research.google.com/github/HamzaBahsir/NLP/blob/main/StartingNLP.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Starting NLP**

In this session, we will cover the following topics:

1. Tokenization
2. Part-of-Speech (POS) Tagging
3. Word Sense Disambiguation
4. Dependency Parsing
5. Syntactic Parsing
6. Text Classification
7. Coreference Resolution
8. Named Entity Recognition (NER)
9. Natural Language Generation

# **Extra Resources**
[Natural Language Processing with Python](https://www.nltk.org/book/)

# **Libraries Required**
*   nltk
*   spacy
*   textblob
*   fastcoref
*   transformers




In [None]:
# Install fastcoref (not preinstalled on Colab)
!pip install fastcoref -q

# Import Libraries
import nltk
import spacy
import textblob
import transformers


#### For Tokenization
from nltk.tokenize import word_tokenize, sent_tokenize
# Download the required tokenizer models
nltk.download('punkt_tab')
nltk.download('punkt')

#### For POS tagging
from nltk import pos_tag
# Download the model for POS tagging
nltk.download('averaged_perceptron_tagger_eng')

#### For Word Sense Disambiguation (WSD)
from nltk.corpus import wordnet
from nltk.wsd import lesk
# Download WordNet data
nltk.download('wordnet')

#### For Dependency Parsing
# Load the spaCy model
nlp = spacy.load("en_core_web_sm")
# Visualization
from spacy import displacy

#### For Syntactic Parsing
from nltk import CFG
from nltk.parse import RecursiveDescentParser

#### For Text classification
from textblob import TextBlob

#### For Coreference Resolution
from fastcoref import FCoref

#### For Named Entity Recognition (NER)
from nltk import ne_chunk
nltk.download('maxent_ne_chunker_tab')
nltk.download('words')
from nltk.tree import Tree

#### For Natural Language Generation
from transformers import pipeline


# **Tokenization**

**Input Text**: NVDEE has expertise in high performace computing. It offers services in Natural Language Processing

**Sentence Tokenization**: ['NVDEE has expertise in high performace computing.', 'It offers services in Natural Language Processing']

**Word Tokenization**: ['NVDEE', 'has', 'expertise', 'in', 'high', 'performace', 'computing', '.', 'It', 'offers', 'services', 'in', 'Natural', 'Language', 'Processing']

In [None]:
# Sample Text
text = "NVDEE has expertise in high performace computing. It offers services in Natural Language Processing"

# Sentence Tokenization
sentences = sent_tokenize(text)
print("Sentences:", sentences)

# Word Tokenization: tokenize each sentence and collect all the words
words = []
for sent in sentences:
    words += word_tokenize(sent)
print("Words:", words)

Sentences: ['NVDEE has expertise in high performace computing.', 'It offers services in Natural Language Processing']
Words: ['NVDEE', 'has', 'expertise', 'in', 'high', 'performace', 'computing', '.', 'It', 'offers', 'services', 'in', 'Natural', 'Language', 'Processing']
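To see roughly what the two tokenizers do, here is a minimal regex-based sketch in plain Python. It is a simplified stand-in, not NLTK's implementation: `sent_tokenize` relies on the trained Punkt model, whereas the hypothetical `simple_sent_tokenize` below just splits after `.`, `!` or `?` followed by whitespace, and the hypothetical `simple_word_tokenize` treats each run of word characters, and each other non-space character, as a token.

```python
import re


def simple_sent_tokenize(text):
    # Naive rule: a sentence ends at ".", "!" or "?" followed by whitespace
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def simple_word_tokenize(sentence):
    # A token is either a run of word characters or a single punctuation mark
    return re.findall(r"\w+|[^\w\s]", sentence)


text = "NVDEE has expertise in high performace computing. It offers services in Natural Language Processing"
sentences = simple_sent_tokenize(text)
words = [w for s in sentences for w in simple_word_tokenize(s)]
print("Sentences:", sentences)
print("Words:", words)

# The naive sentence rule breaks on abbreviations such as "Dr."
print(simple_sent_tokenize("Dr. Zaid wrote a letter. He praised them."))
```

On the sample text this reproduces the NLTK output above, but the last line splits "Dr." off as its own sentence. Recognizing abbreviations like this is one of the things Punkt's learned parameters are meant to handle.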


# **Part-of-Speech (POS) Tagging**
**Input (Word Tokenized)**: ['NVDEE', 'has', 'expertise', 'in', 'high', 'performace', 'computing', '.', 'It', 'offers', 'services', 'in', 'Natural', 'Language', 'Processing']

**POS Tagging Output**: [('NVDEE', 'NNP'), ('has', 'VBZ'), ('expertise', 'NN'), ('in', 'IN'), ('high', 'JJ'), ('performace', 'NN'), ('computing', 'VBG'), ('.', '.'), ('It', 'PRP'), ('offers', 'VBZ'), ('services', 'NNS'), ('in', 'IN'), ('Natural', 'NNP'), ('Language', 'NNP'), ('Processing', 'NNP')]

The tags follow the Penn Treebank tag set:

* NVDEE - Proper noun, singular (NNP)
* has - Verb, 3rd person singular present (VBZ)
* expertise - Noun, singular or mass (NN)
* in - Preposition or subordinating conjunction (IN)
* high - Adjective (JJ)
* performance - Noun, singular or mass (NN)
* computing - Verb, gerund or present participle (VBG)
* . - Sentence-final punctuation (.)
* It - Personal pronoun (PRP)
* services - Noun, plural (NNS)



In [None]:
# POS Tagging
pos_tags = pos_tag(words)
print("POS Tags:", pos_tags)

POS Tags: [('NVDEE', 'NNP'), ('has', 'VBZ'), ('expertise', 'NN'), ('in', 'IN'), ('high', 'JJ'), ('performace', 'NN'), ('computing', 'VBG'), ('.', '.'), ('It', 'PRP'), ('offers', 'VBZ'), ('services', 'NNS'), ('in', 'IN'), ('Natural', 'NNP'), ('Language', 'NNP'), ('Processing', 'NNP')]
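The tag list above can also be applied mechanically. The sketch below is a small lookup table covering only the Penn Treebank tags seen in this output (not the full tag set); `PENN_TAGS` and `explain_tags` are hypothetical helper names, and the input pairs are copied from the printed `pos_tag` result.

```python
# Plain-language descriptions of the Penn Treebank tags seen in the output above
# (this table covers only those tags, not the full tag set)
PENN_TAGS = {
    "NNP": "Proper noun, singular",
    "NN": "Noun, singular or mass",
    "NNS": "Noun, plural",
    "VBZ": "Verb, 3rd person singular present",
    "VBG": "Verb, gerund or present participle",
    "IN": "Preposition or subordinating conjunction",
    "JJ": "Adjective",
    "PRP": "Personal pronoun",
    ".": "Sentence-final punctuation",
}


def explain_tags(tagged):
    """Turn (word, tag) pairs into readable lines; tags missing from the table are flagged."""
    return [f"{word} - {PENN_TAGS.get(tag, 'unknown tag')} ({tag})" for word, tag in tagged]


# The (word, tag) pairs printed by pos_tag above
tagged = [('NVDEE', 'NNP'), ('has', 'VBZ'), ('expertise', 'NN'), ('in', 'IN'), ('high', 'JJ'),
          ('performace', 'NN'), ('computing', 'VBG'), ('.', '.'), ('It', 'PRP'), ('offers', 'VBZ'),
          ('services', 'NNS'), ('in', 'IN'), ('Natural', 'NNP'), ('Language', 'NNP'), ('Processing', 'NNP')]

for line in explain_tags(tagged):
    print(line)
```

In the notebook itself, the `pos_tags` variable from the previous cell could be passed to `explain_tags` directly instead of the hard-coded list.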


# **Word Sense Disambiguation (WSD)**
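We first list every WordNet sense of the word, then apply the simplified Lesk algorithm (`nltk.wsd.lesk`), which picks the sense whose definition overlaps most with the words of the sentence. The result is not reliable here: Lesk chooses "agree freely" instead of the intended "make available or accessible, provide or furnish", because none of the candidate definitions shares a word with the sentence, so the overlap score cannot tell the senses apart.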

In [None]:
word = "offers"

# Get all possible definitions for the word
definitions = wordnet.synsets(word)

print(f"Definitions for '{word}':")
for syn in definitions:
    print(f"- {syn.definition()}")

Definitions for 'offers':
- the verbal act of offering
- something offered (as a proposal or bid)
- a usually brief attempt
- make available or accessible, provide or furnish
- present for acceptance or rejection
- agree freely
- put forward for consideration
- offer verbally
- make available for sale
- propose a payment
- produce or introduce on the stage
- present as an act of worship
- mount or put up
- make available; provide
- ask (someone) to marry you
- threaten to do something


In [None]:
# Define a sentence
sentence = "It offers services in Natural Language Processing"

# Perform WSD using Lesk algorithm
sense = lesk(nltk.word_tokenize(sentence), "offers")

# Print out the sense and its definition
print(f"Chosen sense of 'offers': {sense}")
print(f"Definition: {sense.definition()}")

Chosen sense of 'offers': Synset('volunteer.v.02')
Definition: agree freely
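The core idea of simplified Lesk, choosing the sense whose definition shares the most words with the context, fits in a few lines of plain Python. In this sketch the sense labels, the hand-picked subset of definitions (copied from the WordNet output above) and the tiny stop-word list are assumptions for illustration; it also lowercases words and drops stop words, which NLTK's `lesk` does not do by default.

```python
import re

# A few WordNet definitions of "offer", copied from the output above. The keys are
# hypothetical labels for illustration, not real WordNet synset names.
SENSES = {
    "provide": "make available or accessible, provide or furnish",
    "propose": "present for acceptance or rejection",
    "volunteer": "agree freely",
    "sell": "make available for sale",
    "pay": "propose a payment",
}

# Hypothetical stop-word list so that function words do not count as overlap
STOPWORDS = {"a", "an", "the", "or", "for", "of", "to", "in", "it", "as", "on"}


def content_words(text):
    """Lowercased alphabetic words of `text`, minus stop words."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def simplified_lesk(context, senses):
    """Return (best_sense, overlap); on a tie the first sense in dict order wins."""
    context_words = content_words(context)
    best, best_overlap = None, -1
    for name, definition in senses.items():
        overlap = len(context_words & content_words(definition))
        if overlap > best_overlap:
            best, best_overlap = name, overlap
    return best, best_overlap


print(simplified_lesk("It offers services in Natural Language Processing", SENSES))
print(simplified_lesk("The shop offers old books for sale", SENSES))
```

For the notebook sentence every sense scores 0, so the first entry wins by default; only when the context shares a word with a definition ("sale" in the second sentence) does the overlap actually decide the sense.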


# **Dependency Parsing**

In [None]:
# Process sentence
sentence = "NVDEE has expertise in high performace computing."
doc = nlp(sentence)
# Print dependency relations
print("Dependency Parsing:")
for token in doc:
    print(f"{token.text} --> {token.dep_} --> {token.head.text}")

# nsubj = nominal subject | ROOT = root of the sentence (here the main verb) | dobj = direct object |
# prep = prepositional modifier | amod = adjectival modifier | compound = compound modifier (a noun modifying a noun) |
# acomp = adjectival complement | pobj = object of a preposition | punct = punctuation

Dependency Parsing:
NVDEE --> nsubj --> has
has --> ROOT --> has
expertise --> dobj --> has
in --> prep --> has
high --> amod --> performace
performace --> compound --> computing
computing --> pobj --> in
. --> punct --> has
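A dependency parse is a tree: every token points to exactly one head, and in spaCy's output the ROOT token is its own head. The sketch below stores the printed triples as plain tuples and walks them in both directions. The helper names are hypothetical, and keying by word text only works because no word repeats in this sentence; real code should use token positions. spaCy exposes the same information directly through `token.children` and `token.ancestors`.

```python
# (token, dependency label, head) triples copied from the spaCy output above
PARSE = [
    ("NVDEE", "nsubj", "has"),
    ("has", "ROOT", "has"),
    ("expertise", "dobj", "has"),
    ("in", "prep", "has"),
    ("high", "amod", "performace"),
    ("performace", "compound", "computing"),
    ("computing", "pobj", "in"),
    (".", "punct", "has"),
]


def children(token, parse):
    """Direct dependents of `token` with their labels (the ROOT's self-link is skipped)."""
    return [(tok, dep) for tok, dep, head in parse if head == token and tok != token]


def path_to_root(token, parse):
    """Follow head links upward until reaching the ROOT, which is its own head."""
    heads = {tok: head for tok, _, head in parse}  # assumes each word occurs once
    path = [token]
    while heads[path[-1]] != path[-1]:
        path.append(heads[path[-1]])
    return path


print(children("has", PARSE))
print(path_to_root("high", PARSE))
```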


In [None]:
# Visualize the dependency parse inline in the notebook
# (displacy.serve would instead start a blocking web server on port 5000)
displacy.render(doc, style="dep", jupyter=True)


# **Syntactic Parsing**

In [None]:
grammar = CFG.fromstring("""
    S -> NP VP
    NP -> N V N
    VP -> PP Adj N N
    N -> 'NVDEE' | 'expertise' | 'performance' | 'computing'
    V -> 'has'
    PP -> 'in'
    Adj -> 'high'
""")

# Create a parser using Recursive Descent Parsing
parser = RecursiveDescentParser(grammar)

# Define the sentence to parse
sentence = "NVDEE has expertise in high performance computing"
words = word_tokenize(sentence)

# Parse the sentence and display results
print("Parse Tree(s):")
for tree in parser.parse(words):
    print(tree)
    tree.pretty_print()



Parse Tree(s):
(S
  (NP (N NVDEE) (V has) (N expertise))
  (VP (PP in) (Adj high) (N performance) (N computing)))
                     S                            
        _____________|___                          
       NP                VP                       
   ____|______        ___|___________________      
  N    V      N      PP Adj       N          N    
  |    |      |      |   |        |          |     
NVDEE has expertise  in high performance computing
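NLTK's `RecursiveDescentParser` works top-down: it expands the start symbol, tries each production in order, and backtracks when a branch fails to match the input. The plain-Python sketch below implements that strategy for a hypothetical, more conventional grammar in which the verb sits inside the VP and "in" heads a prepositional phrase. The grammar and function names are assumptions for illustration, not part of NLTK.

```python
# Hypothetical, more conventional toy grammar: the verb sits inside the VP and
# "in" heads a prepositional phrase (PP -> P NP)
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Adj", "N", "N"], ["N"]],
    "VP": [["V", "NP", "PP"]],
    "PP": [["P", "NP"]],
}
LEXICON = {
    "N": {"NVDEE", "expertise", "performance", "computing"},
    "V": {"has"},
    "P": {"in"},
    "Adj": {"high"},
}


def parse(symbol, words, i):
    """Yield (tree, next_index) for every way `symbol` can cover words starting at i."""
    if symbol in LEXICON:  # pre-terminal: must match exactly one word
        if i < len(words) and words[i] in LEXICON[symbol]:
            yield (symbol, words[i]), i + 1
        return
    for rhs in GRAMMAR[symbol]:  # try each production in order; failures backtrack
        yield from expand(rhs, words, i, [symbol])


def expand(rhs, words, i, tree):
    """Match the symbols in `rhs` left to right, collecting child subtrees in `tree`."""
    if not rhs:
        yield tuple(tree), i
        return
    for child, j in parse(rhs[0], words, i):
        yield from expand(rhs[1:], words, j, tree + [child])


def parse_sentence(sentence):
    words = sentence.split()
    # Keep only parses that consume every word
    return [tree for tree, end in parse("S", words, 0) if end == len(words)]


for tree in parse_sentence("NVDEE has expertise in high performance computing"):
    print(tree)
```

Like any naive top-down parser, this one would recurse forever on a left-recursive rule such as `NP -> NP PP`, so grammars for recursive descent are usually written to avoid them.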



# **Text Classification**

Here, text classification is illustrated with sentiment analysis using TextBlob.

In [None]:
from textblob import TextBlob

text = "NVDEE has expertise in high performance computing"

# Analyze sentiment
blob = TextBlob(text)
print("Sentiment:", blob.sentiment)

Sentiment: Sentiment(polarity=0.16, subjectivity=0.5399999999999999)


*Key points:*
* `TextBlob.sentiment`: Returns a named tuple with **polarity** (negative vs. positive) and **subjectivity** (fact vs. opinion).
* **Polarity**: A float between -1.0 (negative sentiment) and 1.0 (positive sentiment); the higher the value, the more positive the sentiment. For example:
    * Polarity = 0.75: Strong positive sentiment.
    * Polarity = -0.5: Moderate negative sentiment.
    * Polarity = 0.0: Neutral sentiment (neither positive nor negative).
* **Subjectivity**: A float between 0.0 (objective, factual) and 1.0 (subjective, opinion-based); the higher the value, the more opinion-based or personal the text. For example:
    * Subjectivity = 0.1: Very objective (factual).
    * Subjectivity = 0.9: Very subjective (opinion-based).
    * Subjectivity = 0.5: Balanced between factual and opinion-based.

For the sample sentence, polarity = 0.16 (mildly positive) and subjectivity ≈ 0.54 (roughly balanced, leaning slightly toward opinion).
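TextBlob's default sentiment analyzer is lexicon-based: individual words carry prior polarity and subjectivity scores, which are combined over the text. The sketch below illustrates that idea in a heavily simplified form. The lexicon values, the averaging and the negation rule are all made up for illustration; they are not TextBlob's actual lexicon or scoring, and `lexicon_sentiment` is a hypothetical name.

```python
import re

# Hypothetical mini-lexicon: word -> (polarity, subjectivity). The values are made up
# for illustration and are NOT TextBlob's actual scores.
LEXICON = {
    "good": (0.7, 0.6),
    "great": (0.8, 0.75),
    "bad": (-0.7, 0.65),
    "slow": (-0.3, 0.4),
}
NEGATIONS = {"not", "never", "no"}


def lexicon_sentiment(text):
    """Average the (polarity, subjectivity) of known words; (0.0, 0.0) if none are known."""
    words = re.findall(r"[a-z']+", text.lower())
    scores = []
    for i, word in enumerate(words):
        if word in LEXICON:
            polarity, subjectivity = LEXICON[word]
            if i > 0 and words[i - 1] in NEGATIONS:
                polarity *= -0.5  # made-up rule: a preceding negation flips and weakens polarity
            scores.append((polarity, subjectivity))
    if not scores:
        return 0.0, 0.0
    n = len(scores)
    polarity = sum(p for p, _ in scores) / n
    subjectivity = sum(s for _, s in scores) / n
    return round(polarity, 3), round(subjectivity, 3)


print(lexicon_sentiment("The support team is good"))
print(lexicon_sentiment("The support team is not good"))
print(lexicon_sentiment("Good results but slow delivery"))
print(lexicon_sentiment("NVDEE has expertise in high performance computing"))
```

The last call returns (0.0, 0.0) only because none of its words is in this tiny lexicon, which shows how strongly a lexicon-based score depends on the lexicon's coverage.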

# **Coreference Resolution**

In [None]:
# Initialize the coreference resolution model
model = FCoref() # use FCoref(device='cuda:0') if GPU available

# Input text
text = "NVDEE has expertise in high performace computing. It offers services in Natural Language Processing."

# Perform coreference resolution
preds = model.predict(texts=[text])


In [None]:
# Print clusters
print(preds[0].get_clusters(as_strings=True))

# Build the resolved text: replace every later mention in a cluster with its first mention
resolved_text = text
for cluster in preds[0].get_clusters():
    main_ref = cluster[0]  # Use the first element of the cluster as the reference
    for mention in cluster[1:]:
        resolved_text = resolved_text.replace(mention, main_ref)

print("\nResolved Text:", resolved_text)

[['NVDEE', 'It']]

Resolved Text: NVDEE has expertise in high performace computing. NVDEE offers services in Natural Language Processing.
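The cell above builds the resolved text with `str.replace`, which works for this example but would also rewrite every other occurrence of a mention's string, even inside unrelated words. A safer sketch replaces mentions by character offsets. Here the clusters are written by hand as lists of `(start, end)` spans with `end` exclusive; fastcoref can return spans in this form via `get_clusters(as_strings=False)`. `resolve_coreferences` is a hypothetical helper name.

```python
def resolve_coreferences(text, clusters):
    """Replace every later mention in each cluster with the cluster's first mention.

    `clusters` is a list of clusters, each a list of (start, end) character offsets
    with `end` exclusive.
    """
    edits = []
    for cluster in clusters:
        first_start, first_end = cluster[0]
        representative = text[first_start:first_end]
        for start, end in cluster[1:]:
            edits.append((start, end, representative))
    # Apply edits from right to left so that earlier offsets stay valid
    for start, end, representative in sorted(edits, reverse=True):
        text = text[:start] + representative + text[end:]
    return text


text = "NVDEE has expertise in high performace computing. It offers services in Natural Language Processing."
it_start = text.index("It")
print(resolve_coreferences(text, [[(0, 5), (it_start, it_start + 2)]]))

# Why offsets matter: str.replace also rewrites the "It" inside "Item"
demo = "NVDEE ships an Item list. It is updated daily."
it_start = demo.index("It ")
print(demo.replace("It", "NVDEE"))
print(resolve_coreferences(demo, [[(0, 5), (it_start, it_start + 2)]]))
```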


# **Named Entity Recognition (NER)**

In [None]:
# Input Text
text = "NVDEE has expertise in high performace computing. It offers services in Signal processing"

# Tokenize the Sentence
tokens = word_tokenize(text)

# Compute POS Tags
pos_tags = pos_tag(tokens)

# Perform NER
named_entities = ne_chunk(pos_tags)
print(named_entities)

(S
  (GPE NVDEE/NNP)
  has/VBZ
  expertise/NN
  in/IN
  high/JJ
  performace/NN
  computing/VBG
  ./.
  It/PRP
  offers/VBZ
  services/NNS
  in/IN
  (GPE Signal/NNP)
  processing/NN)


In [None]:
# Visualising/Printing as a Tree
named_entities.pretty_print()

                                                                S                                                                         
    ____________________________________________________________|___________________________________________________________________       
   |         |         |      |          |             |        |    |        |           |         |         |          GPE       GPE    
   |         |         |      |          |             |        |    |        |           |         |         |           |         |      
has/VBZ expertise/NN in/IN high/JJ performace/NN computing/VBG ./. It/PRP offers/VBZ services/NNS in/IN processing/NN NVDEE/NNP Signal/NNP



In [None]:
from nltk.tree import Tree

# Extract named entities from the output tree
entities = []
for subtree in named_entities:
    if isinstance(subtree, Tree):  # Check if it is a named entity
        entity = " ".join([token for token, pos in subtree.leaves()])
        entity_type = subtree.label()
        entities.append((entity, entity_type))

# Display named entities in a clear format
print("Named Entities:")
for entity, entity_type in entities:
    print(f"{entity} ({entity_type})")

Named Entities:
NVDEE (GPE)
Signal (GPE)
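Note that `ne_chunk` labels both NVDEE and Signal as GPE (geo-political entity), although NVDEE is an organization and "Signal processing" is not a named entity at all. Besides labelled subtrees, entities are often represented as token-level BIO tags: `B-` begins an entity, `I-` continues it, and `O` is outside any entity (NLTK's `tree2conlltags` converts a chunk tree into such tags). The sketch below groups BIO tags back into entity spans; the tags in the example are hand-assigned for illustration, not model output, and `bio_to_entities` is a hypothetical name.

```python
def bio_to_entities(tokens, tags):
    """Group BIO-tagged tokens into (entity_text, label) pairs."""
    entities, current, label = [], [], None
    for token, tag in zip(tokens, tags):
        # "B-" starts a new entity; an "I-" with a different label is treated as a start too
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label):
            if current:
                entities.append((" ".join(current), label))
            current, label = [token], tag[2:]
        elif tag.startswith("I-"):  # continuation of the current entity
            current.append(token)
        else:  # "O": close any open entity
            if current:
                entities.append((" ".join(current), label))
            current, label = [], None
    if current:
        entities.append((" ".join(current), label))
    return entities


# Hand-assigned tags for the opening of the summarization example text
tokens = ["Palestinian", "Ambassador", "to", "Pakistan", ",", "Dr", "Zuhair", "Zaid", "expressed", "gratitude"]
tags = ["B-NORP", "O", "O", "B-GPE", "O", "O", "B-PERSON", "I-PERSON", "O", "O"]
print(bio_to_entities(tokens, tags))
```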


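# **Natural Language Generation**

Here, natural language generation is illustrated with abstractive summarization: the pre-trained `t5-base` model generates a new, shorter text from the input rather than selecting whole sentences from it.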

In [None]:
# Input text
input_text = """
Palestinian Ambassador to Pakistan, Dr Zuhair Zaid expressed profound gratitude to the siblings for their unwavering solidarity with the Palestinian cause. In an official letter of appreciation, he praised their courageous efforts, calling them a symbol of justice and humanity.
I find myself at a loss for words, overwhelmed by the depth of your courage, the purity of your love, and the boundless compassion you have shown," Dr Zaid wrote, emphasizing that their message was a powerful reminder that "humanity is still alive in its purest form.
He lauded the impact of their sacrifice, noting that "though small in action, it is immeasurable in meaning," and described their advocacy as an "unbreakable bond that transcends distance and time" between the people of Pakistan and Palestine.
"""

# Initialize the summarization pipeline
summarizer = pipeline("summarization", model="t5-base")

# Pass the input text to the summarizer
abstractive_summary = summarizer(input_text, max_length=50, min_length=25, do_sample=False)

# Display results
print("\nAbstractive Summary:")
print(abstractive_summary[0]['summary_text'])


Abstractive Summary:
"humanity is still alive in its purest form," he writes . he lauds the impact of their sacrifice, noting "though small in action, it is immeasurable"
