<a href="https://colab.research.google.com/github/MK316/Spring2024/blob/main/Corpus/Words_in_context.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🍃 Words in context

+ Analyzing words in context is fundamental for accurately interpreting and understanding language, whether in human communication, language learning, or computational language processing.

## Key methods

+ Tokenization
+ Part-of-Speech (POS) Tagging
+ Contextual Word Meaning (Word Sense Disambiguation)
+ Concordance view
+ Collocations
+ Sentiment analysis

## {nltk} installation

In [None]:
!pip install nltk

## [1] Tokenization

+ Purpose: Breaking down text into individual words (tokens) is the first step in many NLP tasks.
+ Method: Use nltk.word_tokenize() for tokenizing sentences into words.

In [None]:
text = "The quick brown fox jumps over the lazy dog"

In [None]:
import nltk
nltk.download('punkt')
from nltk.tokenize import word_tokenize, sent_tokenize

In [None]:
tokens = word_tokenize(text)
print(tokens)
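
To get a feel for what tokenization involves, here is a rough regex-based sketch in plain Python. It is not how `word_tokenize()` works internally (NLTK's tokenizer also handles contractions, abbreviations, and more); the names `simple_tokenize`, `sample`, and `sample_tokens` and the pattern itself are assumptions made for illustration.

```python
import re

def simple_tokenize(text):
    """Hypothetical helper: split text into word tokens and punctuation tokens."""
    # \w+ matches a run of letters/digits; [^\w\s] matches a single punctuation mark
    return re.findall(r"\w+|[^\w\s]", text)

sample = "The quick brown fox jumps over the lazy dog."
sample_tokens = simple_tokenize(sample)
print(sample_tokens)

# A naive pattern like this breaks contractions into pieces
print(simple_tokenize("Don't stop!"))
```

The second call shows why a dedicated tokenizer is worth using: the simple pattern splits the contraction at the apostrophe instead of treating it as a meaningful unit.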

## [2] Part-of-Speech (POS) Tagging

+ Purpose: Assigning parts of speech to each word (like noun, verb, adjective) helps in understanding the grammatical context.
+ Method: Use nltk.pos_tag().
+ [Penn Treebank tagset](https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html)

In [None]:
nltk.download('averaged_perceptron_tagger')

In [None]:
from nltk import pos_tag
pos_tags = pos_tag(tokens)
print(pos_tags)
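
`pos_tag()` returns a list of `(word, tag)` pairs, which can be filtered with ordinary Python. The sketch below uses a hand-tagged version of the example sentence (an assumption for illustration, not actual tagger output, which may differ) to pull out nouns and adjectives by their Penn Treebank tag prefixes. The helper name `words_with_prefix` is hypothetical.

```python
# Hand-assigned Penn Treebank tags for illustration; real pos_tag() output may differ
sample_tags = [
    ("The", "DT"), ("quick", "JJ"), ("brown", "JJ"), ("fox", "NN"),
    ("jumps", "VBZ"), ("over", "IN"), ("the", "DT"), ("lazy", "JJ"), ("dog", "NN"),
]

def words_with_prefix(tagged, prefix):
    """Hypothetical helper: return the words whose tag starts with the given prefix."""
    return [word for word, tag in tagged if tag.startswith(prefix)]

nouns = words_with_prefix(sample_tags, "NN")       # covers NN, NNS, NNP, NNPS
adjectives = words_with_prefix(sample_tags, "JJ")  # covers JJ, JJR, JJS
print("Nouns:", nouns)
print("Adjectives:", adjectives)
```

Matching on the tag prefix rather than the full tag keeps singular, plural, and proper nouns in one group.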

## [3] Contextual Word Meaning (Word Sense Disambiguation)

+ Purpose: Determining the meaning of a word based on the context it appears in.
+ Method: Use a disambiguation algorithm such as the Lesk algorithm, which NLTK implements as nltk.wsd.lesk().

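Note: NLTK's Lesk implementation looks up candidate senses in [WordNet](https://wordnet.princeton.edu), so the WordNet data must be downloaded first with nltk.download('wordnet').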

Ambiguous words to try (each pair of sentences uses a different sense of the word):

+ Bank (financial institution vs. edge of a river):
  + Sentence 1: I need to visit the bank to withdraw some money.
  + Sentence 2: The bank of the river was a peaceful place to relax.
+ Bat (nocturnal flying mammal vs. sports equipment):
  + Sentence 1: I saw a bat flying in the night sky.
  + Sentence 2: She used a baseball bat to hit the ball out of the park.
+ Book (written or printed work vs. to reserve):
  + Sentence 1: I'm reading a fascinating book about space exploration.
  + Sentence 2: Please book a table for two at the restaurant for tonight.
+ Crane (bird with a long neck vs. lifting machine):
  + Sentence 1: A beautiful crane waded in the shallow water.
  + Sentence 2: They used a crane to lift the heavy machinery onto the truck.
+ Club (social organization vs. golf equipment):
  + Sentence 1: I'm a member of the local chess club.
  + Sentence 2: He used a golf club to hit the ball into the hole.
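
The idea behind Lesk is simple: choose the sense whose dictionary gloss shares the most words with the sentence around the target word. A minimal sketch, assuming a toy sense inventory with hand-written glosses (`TOY_SENSES`, `words`, and `simple_lesk` are hypothetical names; NLTK's `lesk()` draws its glosses from WordNet instead):

```python
import re

# Toy sense inventory with hand-written glosses (hypothetical, not taken from WordNet)
TOY_SENSES = {
    "bank": {
        "bank (financial)": "an institution where people deposit and withdraw money",
        "bank (river)": "sloping land beside a river or lake",
    }
}

def words(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))

def simple_lesk(sentence, target):
    """Pick the sense whose gloss shares the most words with the sentence."""
    context = words(sentence)
    senses = TOY_SENSES[target]
    return max(senses, key=lambda name: len(context & words(senses[name])))

print(simple_lesk("I need to visit the bank to withdraw some money.", "bank"))
print(simple_lesk("The bank of the river was a peaceful place to relax.", "bank"))
```

Note that function words such as "a" count toward the overlap just like content words do. This is one reason a plain overlap count can pick an unexpected sense when the sentence is short or the glosses are brief.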

In [None]:
sent = input("Paste a sentence: ")
amb = input("Type target word: ")

In [None]:
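import nltk
from nltk.wsd import lesk
from nltk.tokenize import word_tokenize

# lesk() looks up senses in WordNet, so the data must be available
nltk.download('wordnet')
nltk.download('omw-1.4')

sentence = sent
ambiguous = amb
word_sense = lesk(word_tokenize(sentence), ambiguous)

if word_sense is None:
    # lesk() returns None when WordNet has no synsets for the word
    print(f"No sense found for '{ambiguous}'.")
else:
    # Access the name of the disambiguated sense
    print("Disambiguated Sense:", word_sense.name())
    # Access the definition of the disambiguated sense
    print("Sense Definition:", word_sense.definition())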


In [None]:
from nltk.wsd import lesk
from nltk.tokenize import word_tokenize
from nltk import pos_tag

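# Define the sentence and the ambiguous word (in the form it takes in the sentence)
sentence = "He addressed the issue."
ambiguous = "addressed"

# Tokenize the sentence and perform POS tagging
tokens = word_tokenize(sentence)
pos_tags = pos_tag(tokens)

# Find the Penn Treebank tag that the tagger assigned to the ambiguous word
penn_tag = next((pos for token, pos in pos_tags if token == ambiguous), None)

# Convert the tag to a WordNet POS letter ('n', 'v', 'a', 'r'); None means no restriction
wordnet_pos = {'N': 'n', 'V': 'v', 'J': 'a', 'R': 'r'}.get(penn_tag[0]) if penn_tag else None

# Perform Word Sense Disambiguation using the Lesk algorithm, restricted to that POS
word_sense = lesk(tokens, ambiguous, pos=wordnet_pos)

if word_sense is None:
    print(f"No sense found for '{ambiguous}'.")
else:
    # Access the name of the disambiguated sense
    print("Disambiguated Sense:", word_sense.name())
    # Access the definition of the disambiguated sense
    print("Sense Definition:", word_sense.definition())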


In [None]:
from nltk.corpus import wordnet
from nltk.wsd import lesk
from nltk.tokenize import word_tokenize
from nltk import pos_tag

# Alternative example to try (swap it in for the sentence defined further down):
# sentence = "You addressed the issue clearly."
# ambiguous_word = "addressed"

# Define a function to map Penn Treebank POS tags to WordNet POS tags
def penn_to_wordnet_pos(penn_pos):
    if penn_pos.startswith('N'):
        return wordnet.NOUN
    elif penn_pos.startswith('V'):
        return wordnet.VERB
    elif penn_pos.startswith('R'):
        return wordnet.ADV
    elif penn_pos.startswith('J'):
        return wordnet.ADJ
    else:
        return None  # Return None for unknown POS tags

# Define your sentence and ambiguous word
sentence = "The invalid is in the hospital."
ambiguous_word = "invalid"

# Tokenize the sentence and perform POS tagging
tokens = word_tokenize(sentence)
pos_tags = pos_tag(tokens)

# Determine the Penn Treebank POS tag for the ambiguous word
ambiguous_word_pos_penn = None

for token, pos in pos_tags:
    if token == ambiguous_word:
        ambiguous_word_pos_penn = pos
        break

# Map the Penn Treebank POS tag to WordNet POS tag (skipped if the word was not found)
ambiguous_word_pos_wordnet = (
    penn_to_wordnet_pos(ambiguous_word_pos_penn) if ambiguous_word_pos_penn else None
)

if ambiguous_word_pos_wordnet is None:
    print(f"Cannot determine WordNet POS category for '{ambiguous_word_pos_penn}'.")
else:
    # Retrieve synsets and disambiguate sense
    synsets = wordnet.synsets(ambiguous_word, pos=ambiguous_word_pos_wordnet)

    if synsets:
        word_sense = lesk(tokens, ambiguous_word, pos=ambiguous_word_pos_wordnet)
        print("Disambiguated Sense:", word_sense.name())
        print("Sense Definition:", word_sense.definition())
    else:
        print(f"No synsets found for '{ambiguous_word}' in the '{ambiguous_word_pos_wordnet}' category.")
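
To see which Penn Treebank tags lead to the "Cannot determine WordNet POS category" message, the mapping can be tried on a handful of tags. The sketch below is a table-driven variant written for illustration (`PENN_PREFIX_TO_WORDNET` and `penn_to_wordnet_pos_sketch` are hypothetical names); it uses the single-letter values behind `wordnet.NOUN`, `wordnet.VERB`, `wordnet.ADJ`, and `wordnet.ADV` so that it runs without NLTK.

```python
# WordNet's POS constants are single letters: 'n' (noun), 'v' (verb),
# 'a' (adjective), 'r' (adverb)
PENN_PREFIX_TO_WORDNET = {"N": "n", "V": "v", "J": "a", "R": "r"}

def penn_to_wordnet_pos_sketch(penn_pos):
    """Return a WordNet POS letter for a Penn Treebank tag, or None."""
    if not penn_pos:
        return None  # the word was not found, so there is no tag to map
    return PENN_PREFIX_TO_WORDNET.get(penn_pos[0])

for tag in ["NN", "NNS", "VBD", "JJ", "RB", "DT", "IN", None]:
    print(tag, "->", penn_to_wordnet_pos_sketch(tag))
```

Determiners (`DT`) and prepositions (`IN`) map to `None` because WordNet only covers nouns, verbs, adjectives, and adverbs.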


## Gradio

In [None]:
!pip install gradio

In [None]:
#@markdown Gradio app to display the ambiguous meaning (Not so reliable)
import gradio as gr
import nltk
from nltk.corpus import wordnet
from nltk.wsd import lesk
from nltk.tokenize import word_tokenize
from nltk import pos_tag

# Download necessary NLTK data
nltk.download('averaged_perceptron_tagger')
nltk.download('punkt')
nltk.download('wordnet')
nltk.download('omw-1.4')

# Define a function to map Penn Treebank POS tags to WordNet POS tags
def penn_to_wordnet_pos(penn_pos):
    if penn_pos.startswith('N'):
        return wordnet.NOUN
    elif penn_pos.startswith('V'):
        return wordnet.VERB
    elif penn_pos.startswith('R'):
        return wordnet.ADV
    elif penn_pos.startswith('J'):
        return wordnet.ADJ
    else:
        return None  # Return None for unknown POS tags

# Define the disambiguation function that uses POS tagging
def disambiguate_word_sense(sentence, ambiguous_word):
    # Tokenize the sentence and perform POS tagging
    tokens = word_tokenize(sentence)
    pos_tags = pos_tag(tokens)

    # Find the POS tag for the ambiguous word in the tokenized sentence
    ambiguous_word_pos_penn = None
    for word, pos in pos_tags:
        if word.lower() == ambiguous_word.lower():
            ambiguous_word_pos_penn = pos
            break

    # If the POS tag is found, convert to WordNet POS tag
    if ambiguous_word_pos_penn:
        ambiguous_word_pos_wordnet = penn_to_wordnet_pos(ambiguous_word_pos_penn)
    else:
        return "The ambiguous word was not found in the sentence."

    if ambiguous_word_pos_wordnet:
        # Perform Word Sense Disambiguation using Lesk algorithm
        word_sense = lesk(tokens, ambiguous_word, pos=ambiguous_word_pos_wordnet)
        if word_sense:
            return f"Disambiguated Sense: {word_sense.name()}\nSense Definition: {word_sense.definition()}"
        else:
            return f"No disambiguated sense found for '{ambiguous_word}'."
    else:
        return f"Cannot determine WordNet POS category for '{ambiguous_word}'."

# Create the Gradio interface
iface = gr.Interface(
    fn=disambiguate_word_sense,
    inputs=[
        gr.Textbox(lines=2, placeholder="Enter a sentence containing the ambiguous word", label="Sentence"),
        gr.Textbox(placeholder="Enter the ambiguous word", label="Ambiguous Word")
    ],
    outputs=gr.Textbox(label="Result"),
    title="Word Sense Disambiguation",
    description="Enter a sentence and an ambiguous word to disambiguate its sense."
)

# Launch the Gradio interface
iface.launch()


## With POS (Just to get an idea)

In [None]:
#@markdown Gradio app with POS info to display the ambiguous meaning (Not so reliable)
import gradio as gr
import nltk
from nltk.corpus import wordnet
from nltk.wsd import lesk
from nltk.tokenize import word_tokenize

# Ensure NLTK data is available
nltk.download('punkt')
nltk.download('wordnet')
nltk.download('omw-1.4')  # Open Multilingual Wordnet

# Define the disambiguation function
def disambiguate_word_sense(sentence, ambiguous_word, pos_choice):
    # Map POS choice to WordNet POS
    pos_map = {
        "Noun": wordnet.NOUN,
        "Verb": wordnet.VERB,
        "Adjective": wordnet.ADJ,
        "Adverb": wordnet.ADV
    }

    # Determine WordNet POS based on the user's choice
    wordnet_pos = pos_map.get(pos_choice)

    if wordnet_pos is None:
        return f"Cannot determine WordNet POS category for '{pos_choice}'."

    tokens = word_tokenize(sentence)

    # Use lesk to disambiguate the sense of the word
    disambiguated_sense = lesk(tokens, ambiguous_word, pos=wordnet_pos)

    if disambiguated_sense:
        sense_name = disambiguated_sense.name()
        sense_definition = disambiguated_sense.definition()  # Get the definition of the selected sense
        return f"Disambiguated Sense: {sense_name}\nSense Definition: {sense_definition}"
    else:
        return f"No suitable sense found for '{ambiguous_word}' with POS '{pos_choice}'."

# Create a Gradio interface with a submit button
iface = gr.Interface(
    fn=disambiguate_word_sense,
    inputs=[
        gr.Textbox(label="Sentence", placeholder="Enter a sentence containing the ambiguous word"),
        gr.Textbox(label="Ambiguous Word", placeholder="Enter the ambiguous word"),
        gr.Dropdown(label="Select POS", choices=["Noun", "Verb", "Adjective", "Adverb"])
    ],
    outputs=gr.Textbox(label="Result"),
    title="Word Sense Disambiguation",
    description="Enter a sentence, an ambiguous word, and select the part of speech (POS) of the word."
)

# Launch the Gradio interface
iface.launch()
