
**1. Write a program that demonstrates how to use regular expressions in Python to match and search for patterns in text.**

In [None]:
import re

def find_emails(text):
    # Define a simple regular expression for matching email addresses
    # Note: inside a character class '|' is a literal pipe, so the TLD class is [A-Za-z]
    email_pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'

    # Use re.findall to find all matches in the text
    matches = re.findall(email_pattern, text)

    return matches

# Example text containing email addresses
sample_text = "Contact us at info@example.com or support@company.com for assistance."

# Find and print all email addresses in the text
email_addresses = find_emails(sample_text)
print("Email Addresses Found:")
print(email_addresses)

Email Addresses Found:
['info@example.com', 'support@company.com']
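
The cell above uses only `re.findall`. As a minimal sketch of the other operations named in the task, the snippet below contrasts `re.match` (anchored at the start of the string) with `re.search` (first match anywhere) and uses `re.finditer` to report match positions. The names `EMAIL` and `sample` are illustrative assumptions, not part of the original cell.

```python
import re

EMAIL = re.compile(r'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}')
sample = "Contact us at info@example.com or support@company.com."

# re.match succeeds only if the pattern matches at position 0
at_start = EMAIL.match(sample)

# re.search scans forward and returns the first match anywhere in the string
first = EMAIL.search(sample)

# re.finditer yields match objects, which also carry the match position
spans = [(m.group(), m.start()) for m in EMAIL.finditer(sample)]

print(at_start)        # None: the string does not start with an address
print(first.group())
print(spans)
```

Because the sample text starts with "Contact", `re.match` returns `None` while `re.search` still finds the first address.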


**2. Implement a basic finite state automaton that recognizes a specific language or pattern. In this example, we'll create a simple automaton in Python that matches strings ending with 'ab'.**

In [None]:
def is_match(input_string):
    # Define the finite state automaton transitions
    transitions = {
        0: {'a': 1, 'b': 0},
        1: {'a': 1, 'b': 2},
        2: {'a': 1, 'b': 0}
    }
    current_state = 0
    # Process each character in the input string
    for char in input_string:
        if char in transitions[current_state]:
            current_state = transitions[current_state][char]
        else:
            # If there is no transition for the current character, reset to the initial state
            current_state = 0
    # Accept only if the automaton ends in the accepting state (2)
    return current_state == 2
# Test the automaton with various strings
test_strings = ["ab", "aab", "aaaab", "abc", "xyzab", "abab", "ba"]
for test_string in test_strings:
    if is_match(test_string):
        print(f"'{test_string}' matches the pattern.")
    else:
        print(f"'{test_string}' does not match the pattern.")

'ab' matches the pattern.
'aab' matches the pattern.
'aaaab' matches the pattern.
'abc' does not match the pattern.
'xyzab' matches the pattern.
'abab' matches the pattern.
'ba' does not match the pattern.
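
To see why a string is accepted or rejected, it can help to record the states the automaton passes through. The sketch below reuses the same transition table; the helper name `trace_states` is an assumption introduced for illustration.

```python
def trace_states(input_string):
    """Return the list of states visited, starting from state 0."""
    transitions = {
        0: {'a': 1, 'b': 0},
        1: {'a': 1, 'b': 2},
        2: {'a': 1, 'b': 0},
    }
    state = 0
    visited = [state]
    for char in input_string:
        # Characters outside {'a', 'b'} fall back to the initial state
        state = transitions[state].get(char, 0)
        visited.append(state)
    return visited

print(trace_states("abab"))  # ends in state 2, so "abab" is accepted
print(trace_states("abc"))   # 'c' resets to state 0, so "abc" is rejected
```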


**3. Write a program that demonstrates how to perform morphological analysis using the NLTK library in Python.**

In [None]:
import nltk
from nltk.stem import PorterStemmer

nltk.download('punkt')  # Download the punkt tokenizer if not already downloaded

def perform_morphological_analysis(text):
    # Tokenize the input text into words
    words = nltk.word_tokenize(text)

    # Create a Porter stemmer object
    porter_stemmer = PorterStemmer()

    # Perform stemming on each word
    stemmed_words = [porter_stemmer.stem(word) for word in words]

    return stemmed_words

if __name__ == "__main__":
    # Example text for morphological analysis
    input_text = "The quick brown foxes are jumping over the lazy dogs"

    # Perform morphological analysis (stemming)
    result = perform_morphological_analysis(input_text)

    # Display the original and stemmed words
    print("Original words:", nltk.word_tokenize(input_text))
    print("Stemmed words:", result)


[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


Original words: ['The', 'quick', 'brown', 'foxes', 'are', 'jumping', 'over', 'the', 'lazy', 'dogs']
Stemmed words: ['the', 'quick', 'brown', 'fox', 'are', 'jump', 'over', 'the', 'lazi', 'dog']


**4. Implement a finite-state machine for morphological parsing. In this example, we'll create a simple machine in Python that generates plural forms of English nouns.**

In [None]:
import nltk
nltk.download('averaged_perceptron_tagger')
from nltk import pos_tag, word_tokenize
def identify_nouns(sentence):
    words = word_tokenize(sentence)
    tagged_words = pos_tag(words)
    print(tagged_words)

    nouns = [word for word, pos in tagged_words if pos.startswith('NN')]

    return nouns

sentence = "The quick brown fox jumps over the lazy dog."

nouns = identify_nouns(sentence)

if nouns:
    print("Nouns identified in the sentence:")
    for noun in nouns:
        if noun[-1].lower() in {'s', 'x', 'z'} or noun[-2:].lower() in {'ch', 'sh'}:
            print(noun+"es")
        else:
            print(noun+"s")
else:
    print("No nouns found in the sentence.")

[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.


[('The', 'DT'), ('quick', 'JJ'), ('brown', 'NN'), ('fox', 'NN'), ('jumps', 'VBZ'), ('over', 'IN'), ('the', 'DT'), ('lazy', 'JJ'), ('dog', 'NN'), ('.', '.')]
Nouns identified in the sentence:
browns
foxes
dogs
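
The suffix check above covers the `-s` and `-es` cases only. As a hedged sketch of how the rule set could be extended, the snippet below applies an ordered cascade of regular-expression rules and adds the consonant + `y` case. It is a rule cascade rather than a full finite-state transducer, and `PLURAL_RULES` and `pluralize` are hypothetical names; irregular nouns such as "child" are not handled.

```python
import re

# Ordered (pattern, replacement) rules; the first matching rule wins.
PLURAL_RULES = [
    (re.compile(r'([^aeiou])y$'), r'\1ies'),    # city -> cities
    (re.compile(r'(s|x|z|ch|sh)$'), r'\1es'),   # fox -> foxes, church -> churches
    (re.compile(r'$'), 's'),                     # dog -> dogs (default rule)
]

def pluralize(noun):
    for pattern, replacement in PLURAL_RULES:
        if pattern.search(noun):
            return pattern.sub(replacement, noun, count=1)
    return noun

for noun in ['fox', 'dog', 'city', 'day', 'church']:
    print(noun, '->', pluralize(noun))
```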


**5. Use the Porter Stemmer algorithm to perform word stemming on a list of words using Python libraries.**

In [None]:
from nltk.stem import PorterStemmer
def perform_stemming(words):
    # Initialize the Porter Stemmer
    porter_stemmer = PorterStemmer()
    # Perform stemming for each word
    stemmed_words = [porter_stemmer.stem(word) for word in words]
    return stemmed_words

words_to_stem = ["running", "jumps", "happily", "dogs", "cats", "better"]

stemmed_words = perform_stemming(words_to_stem)

print("Original Words:", words_to_stem)
print("Stemmed Words:", stemmed_words)

Original Words: ['running', 'jumps', 'happily', 'dogs', 'cats', 'better']
Stemmed Words: ['run', 'jump', 'happili', 'dog', 'cat', 'better']


**6. Implement a basic N-gram model for text generation. For example, generate text with a bigram model in Python.**

In [None]:
import random

def build_bigram_model(sentences):
    bigram_model = {}

    for sentence in sentences:
        tokens = sentence.split()
        for i in range(len(tokens) - 1):
            current_word = tokens[i]
            next_word = tokens[i + 1]

            if current_word in bigram_model:

                bigram_model[current_word].append(next_word)
            else:
                bigram_model[current_word] = [next_word]

    return bigram_model

def generate_text(bigram_model, start_word, length=10):
    generated_text = [start_word]

    for _ in range(length - 1):
        if start_word in bigram_model:
            next_word = random.choice(bigram_model[start_word])
            generated_text.append(next_word)
            start_word = next_word
        else:
            break

    return ' '.join(generated_text)

# Example list of sentences
sentences = [
    "I love programming in Python.",
    "Python is a versatile programming language.",
    "Text generation using bigram models is interesting.",
    "Natural Language Processing involves analyzing and generating text."
]

# Build bigram model
bigram_model = build_bigram_model(sentences)
print(bigram_model)

# Generate text using bigram model
generated_text = generate_text(bigram_model, start_word="I", length=8)

# Display the results
print("Generated Text:", generated_text)

{'I': ['love'], 'love': ['programming'], 'programming': ['in', 'language.'], 'in': ['Python.'], 'Python': ['is'], 'is': ['a', 'interesting.'], 'a': ['versatile'], 'versatile': ['programming'], 'Text': ['generation'], 'generation': ['using'], 'using': ['bigram'], 'bigram': ['models'], 'models': ['is'], 'Natural': ['Language'], 'Language': ['Processing'], 'Processing': ['involves'], 'involves': ['analyzing'], 'analyzing': ['and'], 'and': ['generating'], 'generating': ['text.']}
Generated Text: I love programming in Python.
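
The model above stores every observed follower in a list, so `random.choice` samples in proportion to frequency. A minimal sketch that makes those conditional probabilities explicit, P(next | current) = count(current, next) / count(current, anything), is shown below. The function name `bigram_probabilities` is an assumption for illustration.

```python
from collections import Counter, defaultdict

def bigram_probabilities(sentences):
    # counts[w1][w2] = number of times w2 directly follows w1
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for current_word, next_word in zip(tokens, tokens[1:]):
            counts[current_word][next_word] += 1
    # Normalize each follower count by the total for its preceding word
    return {
        word: {nxt: c / sum(followers.values()) for nxt, c in followers.items()}
        for word, followers in counts.items()
    }

probs = bigram_probabilities([
    "I love programming in Python.",
    "Python is a versatile programming language.",
])
print(probs["programming"])
```

Here "programming" is followed once by "in" and once by "language.", so each gets probability 0.5.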


**7. Write a program using the NLTK library to perform part-of-speech tagging on a text.**

In [None]:
import nltk
from nltk import pos_tag, word_tokenize
def perform_pos_tagging(text):
    # Tokenize the text into words
    words = word_tokenize(text)
    # Perform part-of-speech tagging
    tagged_words = pos_tag(words)
    return tagged_words
# Example text
text = "NLTK is a powerful library for natural language processing."
# Perform part-of-speech tagging
tagged_words = perform_pos_tagging(text)

print("Original Text:", text)
print("Part-of-Speech Tagging Result:", tagged_words)

Original Text: NLTK is a powerful library for natural language processing.
Part-of-Speech Tagging Result: [('NLTK', 'NNP'), ('is', 'VBZ'), ('a', 'DT'), ('powerful', 'JJ'), ('library', 'NN'), ('for', 'IN'), ('natural', 'JJ'), ('language', 'NN'), ('processing', 'NN'), ('.', '.')]


**8. Implement a simple stochastic part-of-speech tagging algorithm in Python, using a basic probabilistic model to assign POS tags.**

In [None]:
import random

def train_unigram_model(tagged_corpus):
    unigram_model = {}

    for sentence in tagged_corpus:
        for word, pos_tag in sentence:
            if word in unigram_model:
                unigram_model[word].append(pos_tag)
            else:
                unigram_model[word] = [pos_tag]

    return unigram_model

def stochastic_pos_tagging(sentence, unigram_model):
    tagged_sentence = []

    for word in sentence:
        if word in unigram_model:
            pos_tag = random.choice(unigram_model[word])
        else:
            # If word not in model, assign a default POS tag (e.g., 'NOUN')
            pos_tag = 'NOUN'

        tagged_sentence.append((word, pos_tag))

    return tagged_sentence

# Example tagged corpus for training
tagged_corpus = [
    [('The', 'DET'), ('quick', 'ADJ'), ('brown', 'ADJ'), ('fox', 'NOUN')],
    [('Jumped', 'VERB'), ('over', 'PREP'), ('the', 'DET'), ('lazy', 'ADJ'), ('dog', 'NOUN')]
]

# Train unigram model
unigram_model = train_unigram_model(tagged_corpus)

# Example sentence for stochastic POS tagging
sentence_to_tag = ['The', 'lazy', 'fox', 'jumped']

# Perform stochastic POS tagging
tagged_sentence = stochastic_pos_tagging(sentence_to_tag, unigram_model)

# Display the results
print("Original Sentence:", sentence_to_tag)
print("Stochastic POS Tagging Result:", tagged_sentence)

Original Sentence: ['The', 'lazy', 'fox', 'jumped']
Stochastic POS Tagging Result: [('The', 'DET'), ('lazy', 'ADJ'), ('fox', 'NOUN'), ('jumped', 'NOUN')]
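
In the output above, 'jumped' falls back to the default tag because the training corpus contains only the capitalized form 'Jumped'. The sketch below is one possible variation, not the original method: it lowercases words for lookup and picks the most frequent tag deterministically instead of sampling. The helper names `train_counts` and `most_frequent_tag` are assumptions.

```python
from collections import Counter, defaultdict

def train_counts(tagged_corpus):
    # counts[word][tag] = how often the (lowercased) word carried that tag
    counts = defaultdict(Counter)
    for sentence in tagged_corpus:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return counts

def most_frequent_tag(sentence, counts, default='NOUN'):
    tagged = []
    for word in sentence:
        seen = counts.get(word.lower())
        # Unknown words receive the default tag
        tag = seen.most_common(1)[0][0] if seen else default
        tagged.append((word, tag))
    return tagged

corpus = [
    [('The', 'DET'), ('quick', 'ADJ'), ('brown', 'ADJ'), ('fox', 'NOUN')],
    [('Jumped', 'VERB'), ('over', 'PREP'), ('the', 'DET'), ('lazy', 'ADJ'), ('dog', 'NOUN')],
]
result = most_frequent_tag(['The', 'lazy', 'fox', 'jumped'], train_counts(corpus))
print(result)
```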


**9. Implement a rule-based part-of-speech tagging system using regular expressions in Python.**

In [None]:
import re
def rule_based_pos_tagging(sentence):
    tagged_sentence = []
    for word in sentence:
        # re.fullmatch requires the whole token to match, so "is-land" is not tagged as a verb
        if re.fullmatch(r'is|am|are|was|were', word, re.IGNORECASE):
            pos_tag = 'VERB'
        elif re.fullmatch(r'the|a|an', word, re.IGNORECASE):
            pos_tag = 'DET'
        elif re.fullmatch(r'quick|brown|lazy', word, re.IGNORECASE):
            pos_tag = 'ADJ'
        else:
            pos_tag = 'NOUN'

        tagged_sentence.append((word, pos_tag))
    return tagged_sentence
# Example sentence for rule-based POS tagging
sentence_to_tag = ['The', 'quick', 'brown', 'fox', 'is', 'lazy']
# Perform rule-based POS tagging
tagged_sentence = rule_based_pos_tagging(sentence_to_tag)
# Display the results
print("Original Sentence:", sentence_to_tag)
print("Rule-based POS Tagging Result:", tagged_sentence)

Original Sentence: ['The', 'quick', 'brown', 'fox', 'is', 'lazy']
Rule-based POS Tagging Result: [('The', 'DET'), ('quick', 'ADJ'), ('brown', 'ADJ'), ('fox', 'NOUN'), ('is', 'VERB'), ('lazy', 'ADJ')]


**10. Implement transformation-based tagging: using a set of transformation rules, apply a simple rule to tag words in Python.**

In [None]:
def apply_transformation_rule(word):
    # Apply a simple transformation rule (example rule: if a word ends with 'ing', tag it as a 'VERB')
    if word.lower().endswith('ing'):
        return 'VERB'
    else:
        return 'NOUN'
def transform_based_pos_tagging(sentence):
    tagged_sentence = []

    for word in sentence:
        pos_tag = apply_transformation_rule(word)
        tagged_sentence.append((word, pos_tag))

    return tagged_sentence

# Example sentence for transformation-based POS tagging
sentence_to_tag = ['The', 'running', 'dog', 'is', 'jumping']
# Perform transformation-based POS tagging
tagged_sentence = transform_based_pos_tagging(sentence_to_tag)
# Display the results
print("Original Sentence:", sentence_to_tag)
print("Transformation-based POS Tagging Result:", tagged_sentence)

Original Sentence: ['The', 'running', 'dog', 'is', 'jumping']
Transformation-based POS Tagging Result: [('The', 'NOUN'), ('running', 'VERB'), ('dog', 'NOUN'), ('is', 'NOUN'), ('jumping', 'VERB')]
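
The cell above applies a single suffix rule, which is why 'The' and 'is' come out as NOUN. Transformation-based (Brill-style) tagging normally starts from such a baseline and then applies an ordered list of rules that rewrite tags in context. The sketch below illustrates that two-stage idea with three hand-written rules. The rules, the `AUX` tag, and the function names are assumptions for illustration; a real Brill tagger learns its rules from a tagged corpus.

```python
def initial_tags(sentence):
    # Baseline: the suffix rule from the cell above, with NOUN as the default
    return ['VERB' if w.lower().endswith('ing') else 'NOUN' for w in sentence]

# Each rule: (from_tag, to_tag, condition on (words, tags, index))
RULES = [
    # Closed-class lookups correct determiners and auxiliaries
    ('NOUN', 'DET', lambda w, t, i: w[i].lower() in {'the', 'a', 'an'}),
    ('NOUN', 'AUX', lambda w, t, i: w[i].lower() in {'is', 'are', 'was', 'were'}),
    # An -ing word directly after a determiner is treated as a modifier
    ('VERB', 'ADJ', lambda w, t, i: i > 0 and t[i - 1] == 'DET'),
]

def transform(sentence):
    tags = initial_tags(sentence)
    # Rules are applied in order, each one over the whole sentence
    for from_tag, to_tag, condition in RULES:
        for i in range(len(sentence)):
            if tags[i] == from_tag and condition(sentence, tags, i):
                tags[i] = to_tag
    return list(zip(sentence, tags))

tagged = transform(['The', 'running', 'dog', 'is', 'jumping'])
print(tagged)
```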


**11. Implement a simple top-down parser for context-free grammars in Python.**

In [None]:
class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.current_token = None
        self.index = -1
        self.advance()

    def advance(self):
        self.index += 1
        if self.index < len(self.tokens):
            self.current_token = self.tokens[self.index]
        else:
            self.current_token = None

    def parse(self):
        return self.expr()

    def expr(self):
        result = self.term()

        while self.current_token in ('+', '-'):
            operator = self.current_token
            self.advance()
            right = self.term()
            if operator == '+':
                result += right
            elif operator == '-':
                result -= right

        return result

    def term(self):
        result = self.factor()

        while self.current_token in ('*', '/'):
            operator = self.current_token
            self.advance()
            right = self.factor()
            if operator == '*':
                result *= right
            elif operator == '/':
                if right == 0:
                    raise ZeroDivisionError("Division by zero")
                result /= right

        return result

    def factor(self):
        token = self.current_token
        self.advance()

        if token == '(':
            result = self.expr()
            if self.current_token != ')':
                raise SyntaxError("Expected closing parenthesis")
            self.advance()
            return result
        elif token is not None and token.isdigit():
            return int(token)
        else:
            raise SyntaxError("Invalid syntax")


def parse_input(input_string):
    tokens = input_string.replace(' ', '').replace('\t', '').split(',')
    parser = Parser(tokens)
    return parser.parse()

input_string = '5, *, (, 3, +, 7,), +, 10'
result = parse_input(input_string)
print(f"Result: {result}")

Result: 60
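
The parser above expects a comma-separated token string. As a small sketch of an alternative front end, the function below tokenizes an ordinary arithmetic expression with a regular expression, and its result can be passed directly to `Parser(tokens)`. The function name `tokenize` is an assumption, and only non-negative integers, the four operators, and parentheses are supported.

```python
import re

def tokenize(expression):
    """Split an arithmetic expression into numbers, operators and parentheses."""
    tokens = re.findall(r'\d+|[-+*/()]', expression)
    # Reject input containing characters the pattern did not consume
    if ''.join(tokens) != re.sub(r'\s+', '', expression):
        raise SyntaxError(f"Unexpected character in {expression!r}")
    return tokens

print(tokenize("5 * (3 + 7) + 10"))
```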


**12. Implement an Earley parser for context-free grammars using a simple Python program.**

In [None]:
class EarleyItem:
    """A dotted rule 'lhs -> rhs' with the dot before rhs[dot], begun at column `start`."""
    def __init__(self, lhs, rhs, dot, start):
        self.lhs = lhs
        self.rhs = tuple(rhs)
        self.dot = dot
        self.start = start

    def next_symbol(self):
        # Symbol right after the dot, or None if the rule is complete
        return self.rhs[self.dot] if self.dot < len(self.rhs) else None

    def key(self):
        return (self.lhs, self.rhs, self.dot, self.start)

def add_item(column, item):
    # Items are compared by value so that duplicates are not added
    if all(item.key() != other.key() for other in column):
        column.append(item)

def productions(grammar, non_terminal):
    # A rule is either a list of symbols or a bare terminal string
    for rule in grammar.get(non_terminal, []):
        yield (rule,) if isinstance(rule, str) else tuple(rule)

def predict(grammar, chart, column, item):
    # Expand the non-terminal after the dot with all of its rules
    for rhs in productions(grammar, item.next_symbol()):
        add_item(chart[column], EarleyItem(item.next_symbol(), rhs, 0, column))

def scan(tokens, chart, column, item):
    # Move the dot over a terminal that matches the current input token
    if column < len(tokens) and tokens[column] == item.next_symbol():
        add_item(chart[column + 1], EarleyItem(item.lhs, item.rhs, item.dot + 1, item.start))

def complete(chart, column, item):
    # Advance every item that was waiting for the non-terminal just completed
    for entry in chart[item.start]:
        if entry.next_symbol() == item.lhs:
            add_item(chart[column], EarleyItem(entry.lhs, entry.rhs, entry.dot + 1, entry.start))

def earley_parse(tokens, grammar):
    chart = [[] for _ in range(len(tokens) + 1)]
    start_symbol = next(iter(grammar))  # the first key is taken as the start symbol
    for rhs in productions(grammar, start_symbol):
        add_item(chart[0], EarleyItem(start_symbol, rhs, 0, 0))

    for column in range(len(tokens) + 1):
        for item in chart[column]:  # the list may grow while it is being iterated
            symbol = item.next_symbol()
            if symbol is None:
                complete(chart, column, item)
            elif symbol in grammar:
                predict(grammar, chart, column, item)
            else:
                scan(tokens, chart, column, item)

    # Success: a completed start rule that spans the whole input
    return any(item.lhs == start_symbol and item.next_symbol() is None and item.start == 0
               for item in chart[len(tokens)])

# Example usage:
example_grammar = {
    'S': [['NP', 'VP']],
    'NP': [['Det', 'N']],
    'VP': [['V', 'NP']],
    'Det': ['the', 'a'],
    'N': ['cat', 'dog'],
    'V': ['chased', 'ate']
}

example_tokens = ['the', 'dog', 'chased', 'a', 'cat']

result = earley_parse(example_tokens, example_grammar)
print("Parsing successful:", result)

Parsing successful: True


**13. Generate a parse tree for a given sentence using a context-free grammar in Python.**

In [None]:
import nltk
from nltk import CFG

# Define a context-free grammar
grammar = CFG.fromstring("""
    S -> NP VP
    NP -> 'John'
    VP -> V NP
    V -> 'likes'
    NP -> 'pizza'
""")

# Create a parser based on the grammar
parser = nltk.ChartParser(grammar)

# Given sentence
sentence = "John likes pizza"

# Tokenize the sentence
tokens = sentence.split()

# Generate and print parse trees
for tree in parser.parse(tokens):
    tree.pretty_print()

       S            
  _____|____         
 |          VP      
 |      ____|____    
 NP    V         NP 
 |     |         |   
John likes     pizza



**14. Create a Python program to check for agreement in sentences based on a context-free grammar's rules.**

In [None]:

class AgreementChecker:
    def __init__(self, grammar):
        self.grammar = grammar

    def check_subject_verb_agreement(self, subject, verb):
        rules = self.grammar.get("S-V Agreement", [])
        for rule in rules:
            if rule[0] == subject and rule[1] == verb:
                return True
        return False

grammar = {
    "S-V Agreement": [
        ["dog", "barks"],
        ["dogs", "bark"],
        ["cat", "meows"],
        ["cats", "meow"]

    ]
}

agreement_checker = AgreementChecker(grammar)

subject = "dog"
verb = "barks"
is_agree = agreement_checker.check_subject_verb_agreement(subject, verb)

if is_agree:
    print(f"The subject '{subject}' and verb '{verb}' agree.")
else:
    print(f"The subject '{subject}' and verb '{verb}' do not agree.")

The subject 'dog' and verb 'barks' agree.
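
The checker above lists every valid subject-verb pair explicitly. A common alternative, sketched here under stated assumptions, stores a number feature (`'sg'` or `'pl'`) for each word and requires the features to match. The lexicons `NOUN_NUMBER` and `VERB_NUMBER` and the function `agrees` are hypothetical names. Note that this version checks number only, so it also accepts semantically odd pairs such as "dog meows".

```python
# Number feature for each known noun and verb form
NOUN_NUMBER = {'dog': 'sg', 'dogs': 'pl', 'cat': 'sg', 'cats': 'pl'}
VERB_NUMBER = {'barks': 'sg', 'bark': 'pl', 'meows': 'sg', 'meow': 'pl'}

def agrees(subject, verb):
    subject_number = NOUN_NUMBER.get(subject)
    verb_number = VERB_NUMBER.get(verb)
    # Unknown words never agree; known words must share the same number
    return subject_number is not None and subject_number == verb_number

for subject, verb in [('dog', 'barks'), ('dogs', 'barks'), ('cats', 'meow')]:
    print(subject, verb, agrees(subject, verb))
```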


**15. Implement probabilistic context-free grammar parsing for a sentence in Python.**

In [None]:
import nltk
grammar = nltk.PCFG.fromstring("""
    S -> NP VP [1.0]
    VP -> V NP [0.7] | VP PP [0.3]
    PP -> P NP [1.0]
    V -> "saw" [1.0]
    P -> "with" [1.0]
    NP -> N [0.4] | Det N [0.3] | NP PP [0.3]
    N -> "John" [0.4] | "Mary" [0.4] | "telescope" [0.2]
    Det -> "a" [1.0]
""")
parser = nltk.ViterbiParser(grammar)

sentence = "John saw Mary with a telescope".split()

trees = list(parser.parse(sentence))

trees.sort(key=lambda tree: -tree.prob())

for tree in trees:
    print(tree)

(S
  (NP (N John))
  (VP
    (VP (V saw) (NP (N Mary)))
    (PP (P with) (NP (Det a) (N telescope))))) (p=0.00032256)
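
The probability printed with the tree is the product of the probabilities of every rule used in it. The sketch below multiplies the rule probabilities from the grammar above for the tree shown, as a check on the reported value. The list `rule_probs` is written out by hand for illustration.

```python
import math

# One entry per rule application in the printed tree
rule_probs = [
    1.0,  # S -> NP VP
    0.4,  # NP -> N
    0.4,  # N -> "John"
    0.3,  # VP -> VP PP
    0.7,  # VP -> V NP
    1.0,  # V -> "saw"
    0.4,  # NP -> N
    0.4,  # N -> "Mary"
    1.0,  # PP -> P NP
    1.0,  # P -> "with"
    0.3,  # NP -> Det N
    1.0,  # Det -> "a"
    0.2,  # N -> "telescope"
]

tree_prob = math.prod(rule_probs)
print(f"{tree_prob:.8f}")
```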


**16. Implement a Python program using the spaCy library to perform Named Entity Recognition (NER) on a given text.**

In [None]:
# Requires the English model: python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

text = "Apple is looking at buying U.K. startup for $1 billion"

doc = nlp(text)

for ent in doc.ents:
    print(f"Entity: {ent.text}, Type: {ent.label_}")

Entity: Apple, Type: ORG
Entity: U.K., Type: GPE
Entity: $1 billion, Type: MONEY


**17. Write a program that demonstrates how to access WordNet, a lexical database, to retrieve synsets and explore word meanings in Python.**

In [None]:
import nltk
from nltk.corpus import wordnet

# Download WordNet data
nltk.download('wordnet')

def get_synsets(word):
    synsets = wordnet.synsets(word)
    return synsets

def print_synset_info(synset):
    print(f"Synset: {synset.name()}")
    print(f"POS (Part of Speech): {synset.pos()}")
    print(f"Definition: {synset.definition()}")
    print(f"Examples: {synset.examples()}")
    print()

def explore_word_meanings(word):
    synsets = get_synsets(word)

    if not synsets:
        print(f"No synsets found for the word '{word}'.")
        return

    print(f"Synsets for the word '{word}':")
    for synset in synsets:
        print_synset_info(synset)

    # Explore hypernyms (more abstract terms)
    hypernyms = synsets[0].hypernyms()
    if hypernyms:
        print(f"Hypernyms of '{word}':")
        for hypernym in hypernyms:
            print_synset_info(hypernym)

    # Explore hyponyms (more specific terms)
    hyponyms = synsets[0].hyponyms()
    if hyponyms:
        print(f"Hyponyms of '{word}':")
        for hyponym in hyponyms:
            print_synset_info(hyponym)

if __name__ == "__main__":
    # Replace 'example' with the word you want to explore
    word_to_explore = 'example'
    explore_word_meanings(word_to_explore)


[nltk_data] Downloading package wordnet to /root/nltk_data...


Synsets for the word 'example':
Synset: example.n.01
POS (Part of Speech): n
Definition: an item of information that is typical of a class or group
Examples: ['this patient provides a typical example of the syndrome', 'there is an example on page 10']

Synset: model.n.07
POS (Part of Speech): n
Definition: a representative form or pattern
Examples: ['I profited from his example']

Synset: exemplar.n.01
POS (Part of Speech): n
Definition: something to be imitated
Examples: ['an exemplar of success', 'a model of clarity', 'he is the very model of a modern major general']

Synset: example.n.04
POS (Part of Speech): n
Examples: ['they decided to make an example of him']

Synset: case.n.01
POS (Part of Speech): n
Definition: an occurrence of something
Examples: ['it was a case of bad judgment', 'another instance occurred yesterday', 'but there is always the famous example of the Smiths']

Synset: exercise.n.04
POS (Part of Speech): n
Definition: a task performed or problem solved in order t

**18. Implement a simple FOPC parser for basic logical expressions in Python.**

In [None]:
from pyparsing import Word, alphas, alphanums, Forward, infixNotation, opAssoc

identifier = Word(alphas, alphanums+"_")

and_op = "AND"
or_op = "OR"
implies_op = "IMPLIES"
not_op = "NOT"

# infixNotation handles parenthesized sub-expressions itself, so the operand is just an identifier
atom = identifier
term = infixNotation(atom, [
    (not_op, 1, opAssoc.RIGHT),
    (and_op, 2, opAssoc.LEFT),
    (or_op, 2, opAssoc.LEFT),
    (implies_op, 2, opAssoc.RIGHT),
])

def parse_expression(expression):
    return term.parseString(expression, parseAll=True)

logical_expression = "P AND (Q OR NOT R) IMPLIES S"
parsed_expr = parse_expression(logical_expression)
print(parsed_expr)

[[['P', 'AND', ['Q', 'OR', ['NOT', 'R']]], 'IMPLIES', 'S']]


**19. Create a program for word sense disambiguation using the Lesk algorithm in Python.**

In [1]:
from nltk.wsd import lesk
from nltk.tokenize import word_tokenize
import nltk
nltk.download('punkt')

# Download WordNet data
nltk.download('wordnet')

def perform_lesk_algorithm(sentence, target_word):
    sense = lesk(word_tokenize(sentence), target_word)

    if sense:
        print(f"Target word: {target_word}")
        print(f"Selected sense: {sense.name()}")
        print(f"Definition: {sense.definition()}")
        print(f"Examples: {sense.examples()}")
    else:
        print(f"No sense found for the word '{target_word}' in the given context.")

if __name__ == "__main__":
    input_sentence = "I went to the bank to deposit some money."
    target_word = "bank"

    perform_lesk_algorithm(input_sentence, target_word)


[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.
[nltk_data] Downloading package wordnet to /root/nltk_data...


Target word: bank
Selected sense: savings_bank.n.02
Definition: a container (usually with a slot in the top) for keeping money at home
Examples: ['the coin bank was empty']


**20. Implement a basic information retrieval system using TF-IDF (Term Frequency-Inverse Document Frequency) for document ranking in Python.**

In [2]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "This is the first document.",
    "This document is the second document.",
    "And this is the third one.",
    "Is this the first document?",
]

vectorizer = TfidfVectorizer()

tfidf_matrix = vectorizer.fit_transform(documents)

query = "this is the second document"

query_vector = vectorizer.transform([query])

cosine_similarities = cosine_similarity(query_vector, tfidf_matrix).flatten()

document_scores = [(score, doc) for score, doc in zip(cosine_similarities, documents)]
sorted_documents = sorted(document_scores, key=lambda x: x[0], reverse=True)

print("Ranked documents based on TF-IDF similarity to the query:")
for i, (score, doc) in enumerate(sorted_documents, start=1):
    print(f"Rank {i}: Similarity Score: {score:.4f}, Document: '{doc}'")

Ranked documents based on TF-IDF similarity to the query:
Rank 1: Similarity Score: 0.9505, Document: 'This document is the second document.'
Rank 2: Similarity Score: 0.6042, Document: 'This is the first document.'
Rank 3: Similarity Score: 0.6042, Document: 'Is this the first document?'
Rank 4: Similarity Score: 0.2804, Document: 'And this is the third one.'
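
The ranking above depends on how rare each query term is across the documents. As a rough sketch of the inverse document frequency part only, the snippet below computes idf(t) = ln((1 + n) / (1 + df(t))) + 1, which is the smoothed formula scikit-learn documents as its default. The tokenizer here is a simplified assumption (lowercased words of two or more characters), and the snippet does not reproduce the full TF-IDF weighting or normalization.

```python
import math
import re

documents = [
    "This is the first document.",
    "This document is the second document.",
    "And this is the third one.",
    "Is this the first document?",
]

def tokenize(text):
    # Simplified tokenizer: lowercase words with at least two characters
    return re.findall(r'\b\w\w+\b', text.lower())

# Document frequency: number of documents that contain each term
n_docs = len(documents)
doc_freq = {}
for doc in documents:
    for term in set(tokenize(doc)):
        doc_freq[term] = doc_freq.get(term, 0) + 1

# Smoothed inverse document frequency
idf = {term: math.log((1 + n_docs) / (1 + df)) + 1 for term, df in doc_freq.items()}

for term in sorted(idf, key=idf.get):
    print(f"{term}: {idf[term]:.3f}")
```

Terms that occur in every document, such as "this", receive the minimum weight of 1.0, while "second" occurs in only one document and is weighted highest, which is why the second document ranks first for the query.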


**21. Create a Python program that performs syntax-driven semantic analysis by extracting noun phrases and their meanings from a sentence.**

In [3]:
import spacy

# Load the English language model from spaCy
nlp = spacy.load("en_core_web_sm")
def syntax_semantic_analysis(sentence):
    # Process the input sentence using spaCy
    doc = nlp(sentence)
    # Extract noun phrases and their meanings
    noun_phrases_and_meanings = []
    for chunk in doc.noun_chunks:
        # Get the head (main word) of the noun phrase
        head_word = chunk.root.text

        # Extract the meaning (definition) of the head word
        head_meaning = get_word_meaning(head_word)

        # Store the noun phrase and its meaning
        noun_phrases_and_meanings.append({
            'noun_phrase': chunk.text,
            'meaning': head_meaning
        })

    return noun_phrases_and_meanings
def get_word_meaning(word):
    # Placeholder function to get the meaning of a word
    # In a real-world scenario, you would use a lexical database or an API
    # For simplicity, we'll return a dummy meaning here
    return f"Dummy meaning for {word}"

# Example usage
sentence = "The quick brown fox jumps over the lazy dog."
results = syntax_semantic_analysis(sentence)

# Display the results
for result in results:
    print(f"Noun Phrase: {result['noun_phrase']}, Meaning: {result['meaning']}")

Noun Phrase: The quick brown fox, Meaning: Dummy meaning for fox
Noun Phrase: the lazy dog, Meaning: Dummy meaning for dog


**22. Create a Python program that performs reference resolution within a text.**

In [5]:
import nltk
nltk.download('averaged_perceptron_tagger')

def resolve_references(text):
    sentences = nltk.sent_tokenize(text)
    tokenized_sentences = [nltk.word_tokenize(sentence) for sentence in sentences]
    tagged_sentences = [nltk.pos_tag(sentence) for sentence in tokenized_sentences]

    resolved_text = []
    pronouns = set(['he', 'him', 'his', 'she', 'her', 'it', 'they', 'them', 'their'])

    # Heuristic: link each pronoun to the most recent proper noun,
    # falling back to the most recent common noun
    last_noun = None
    last_proper_noun = None

    for tagged_sentence in tagged_sentences:
        resolved_sentence = []
        for word, pos in tagged_sentence:
            antecedent = last_proper_noun or last_noun
            if word.lower() in pronouns and antecedent:
                resolved_sentence.append(f'({word} -> {antecedent})')
            else:
                resolved_sentence.append(word)
                if pos.startswith('NNP'):
                    last_proper_noun = word
                elif pos.startswith('NN'):
                    last_noun = word
        resolved_text.append(' '.join(resolved_sentence))

    return ' '.join(resolved_text)

text = "John went to the market and He bought some fruits."

resolved_text = resolve_references(text)
print(resolved_text)

[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.


John went to the market and (He -> John) bought some fruits .


**23. Develop a Python program that evaluates the coherence of a given text.**

In [6]:

import nltk

def calculate_coherence(text):
    sentences = nltk.sent_tokenize(text)
    coherence_markers = ['however', 'therefore', 'consequently', 'nevertheless', 'furthermore', 'meanwhile', 'although', 'while', 'yet', 'moreover']

    total_markers = 0
    for sentence in sentences:
        tokenized_sentence = nltk.word_tokenize(sentence.lower())
        for marker in coherence_markers:
            if marker in tokenized_sentence:
                total_markers += 1

    return total_markers

text = "The weather was terrible. However, they decided to go for a picnic. Therefore, they packed their bags and left."

coherence_score = calculate_coherence(text)
print(f"Coherence Score: {coherence_score}")

Coherence Score: 2
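
Counting discourse markers is one rough signal of coherence. Another simple signal, sketched here as an assumption rather than as part of the original exercise, is lexical overlap between adjacent sentences, measured with the Jaccard index. The function name `adjacent_overlap` and the regex-based sentence splitter are illustrative choices that avoid any dependency on NLTK.

```python
import re

def adjacent_overlap(text):
    """Return the Jaccard word overlap for each pair of adjacent sentences."""
    # Split after sentence-final punctuation followed by whitespace
    sentences = [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]
    word_sets = [set(re.findall(r'[a-z]+', s.lower())) for s in sentences]
    scores = []
    for left, right in zip(word_sets, word_sets[1:]):
        union = left | right
        scores.append(len(left & right) / len(union) if union else 0.0)
    return scores

print(adjacent_overlap("The cat sat. The cat slept."))
```

In this small example the two sentences share "the" and "cat" out of four distinct words, giving an overlap of 0.5.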


**24. Create a Python program that recognizes dialog acts in a given dialog or conversation.**

In [7]:
import re

def recognize_dialog_acts(text):
    # Patterns are checked from most to least specific
    greetings = re.compile(r'(hello|hi|hey)\b', re.IGNORECASE)
    requests = re.compile(r'(please|can you|could you)\b', re.IGNORECASE)
    questions = re.compile(r'.*\?$')
    statements = re.compile(r'.*[.!]$')

    # Split into utterances while keeping the terminating punctuation,
    # which the question and statement patterns rely on
    utterances = re.findall(r'[^.?!]+[.?!]', text)

    dialog_acts = []
    for utterance in utterances:
        utterance = utterance.strip()
        if greetings.match(utterance):
            dialog_acts.append((utterance, 'Greeting'))
        elif requests.match(utterance):
            dialog_acts.append((utterance, 'Request'))
        elif questions.match(utterance):
            dialog_acts.append((utterance, 'Question'))
        elif statements.match(utterance):
            dialog_acts.append((utterance, 'Statement'))
        else:
            dialog_acts.append((utterance, 'Other'))

    return dialog_acts

conversation = "Hi! How are you? I'm fine, thank you. Can you pass the salt, please?"

recognized_acts = recognize_dialog_acts(conversation)
for utterance, act_type in recognized_acts:
    print(f"Utterance: '{utterance}', Dialog Act: {act_type}")

Utterance: 'Hi!', Dialog Act: Greeting
Utterance: 'How are you?', Dialog Act: Question
Utterance: 'I'm fine, thank you.', Dialog Act: Statement
Utterance: 'Can you pass the salt, please?', Dialog Act: Request


**25. Use the GPT-3 model to generate text based on a given prompt. Make sure to install the OpenAI Python library first.**

**26. Implement a machine translation program using the Hugging Face Transformers library to translate English text to French in Python.**

In [10]:
import torch
from transformers import MarianMTModel, MarianTokenizer

def translate_to_french(english_sentence, model, tokenizer):
    # The opus-mt-en-fr model is trained for a single language pair, so no task
    # prefix is needed; a "translate English to French:" prefix would itself be translated
    input_ids = tokenizer.encode(english_sentence, return_tensors="pt", max_length=128, truncation=True)

    # Generate translation
    with torch.no_grad():
        output_ids = model.generate(input_ids)
    french_translation = tokenizer.decode(output_ids[0], skip_special_tokens=True)

    return french_translation

# Load the model and tokenizer outside the function
model_name = "Helsinki-NLP/opus-mt-en-fr"
model = MarianMTModel.from_pretrained(model_name)
tokenizer = MarianTokenizer.from_pretrained(model_name)

# Example usage
english_sentence = "The cat is on the mat."
french_translation = translate_to_french(english_sentence, model, tokenizer)
print(f"English: {english_sentence}")
print(f"French: {french_translation}")



English: The cat is on the mat.
French: Le chat est sur le tapis.
