
**1.	Write a program that demonstrates how to use regular expressions in Python to match and search for patterns in text.**

In [1]:
import re

def find_emails(text):
    # Define a simple regular expression for matching email addresses.
    # Note: inside a character class '|' is a literal pipe, so the TLD class is [A-Za-z].
    email_pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'

    # Use re.findall to find all matches in the text
    matches = re.findall(email_pattern, text)

    return matches

# Example text containing email addresses
sample_text = "Contact us at info@example.com or support@company.com for assistance."

# Find and print all email addresses in the text
email_addresses = find_emails(sample_text)
print("Email Addresses Found:")
print(email_addresses)

Email Addresses Found:
['info@example.com', 'support@company.com']
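
The exercise asks for both matching and searching, while the cell above only uses `re.findall`. As a minimal sketch (the variable names `starts_with_email`, `first`, and `spans` are illustrative, not part of the original notebook), the other standard entry points of the `re` module can be compared on the same text:

```python
import re

text = "Contact us at info@example.com or support@company.com for assistance."
pattern = re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b')

# match() only succeeds if the pattern matches at the very start of the string.
starts_with_email = pattern.match(text) is not None

# search() scans forward and returns the first match anywhere in the string.
first = pattern.search(text)

# finditer() yields one match object per hit, including its position in the text.
spans = [(m.group(), m.start(), m.end()) for m in pattern.finditer(text)]

print("Starts with an email:", starts_with_email)
print("First email:", first.group())
for email, start, end in spans:
    print(f"{email} at [{start}:{end}]")
```

`findall` is convenient when only the matched strings are needed; `finditer` is usually the better fit when positions matter, for example to highlight or replace the matches.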


**2.	Implement a basic finite state automaton that recognizes a specific language or pattern. In this example, we'll create a simple automaton in Python that matches strings ending with 'ab'.**

In [2]:
def is_match(input_string):
    # Define the finite state automaton transitions
    transitions = {
        0: {'a': 1, 'b': 0},
        1: {'a': 1, 'b': 2},
        2: {'a': 1, 'b': 0}
    }
    current_state = 0
    # Process each character in the input string
    for char in input_string:
        if char in transitions[current_state]:
            current_state = transitions[current_state][char]
        else:
            # If there is no transition for the current character, reset to the initial state
            current_state = 0
    # Check if the final state is reached
    return current_state == 2
# Test the automaton with various strings
test_strings = ["ab", "aab", "aaaab", "abc", "xyzab", "abab", "ba"]
for test_string in test_strings:
    if is_match(test_string):
        print(f"'{test_string}' matches the pattern.")
    else:
        print(f"'{test_string}' does not match the pattern.")

'ab' matches the pattern.
'aab' matches the pattern.
'aaaab' matches the pattern.
'abc' does not match the pattern.
'xyzab' matches the pattern.
'abab' matches the pattern.
'ba' does not match the pattern.
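
The automaton above resets to state 0 on any symbol outside {a, b}, which is why 'xyzab' is accepted. A stricter reading of a DFA rejects such input outright. The sketch below assumes that stricter behaviour (the names `TRANSITIONS`, `ACCEPTING`, and `accepts` are illustrative) and checks the transition table exhaustively against `str.endswith` for short strings:

```python
from itertools import product

# Same three states as above: 0 = start, 1 = last symbol was 'a', 2 = just read "ab".
TRANSITIONS = {
    0: {'a': 1, 'b': 0},
    1: {'a': 1, 'b': 2},
    2: {'a': 1, 'b': 0},
}
ACCEPTING = {2}

def accepts(string):
    """Return True if the DFA ends in an accepting state."""
    state = 0
    for symbol in string:
        if symbol not in TRANSITIONS[state]:
            # Assumption: symbols outside the alphabet {a, b} are rejected outright.
            return False
        state = TRANSITIONS[state][symbol]
    return state in ACCEPTING

# Compare against str.endswith for every string over {a, b} up to length 6.
mismatches = [
    ''.join(chars)
    for n in range(7)
    for chars in product('ab', repeat=n)
    if accepts(''.join(chars)) != ''.join(chars).endswith('ab')
]
print("Mismatches:", mismatches)
```

An empty mismatch list is evidence that the table encodes "ends with ab" for every input of that length, which is a stronger check than a handful of hand-picked test strings.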


**3.	Write a program that demonstrates how to perform morphological analysis using the NLTK library in Python.**

In [5]:
import nltk
from nltk.stem import PorterStemmer

nltk.download('punkt')  # Download the punkt tokenizer if not already downloaded

def perform_morphological_analysis(text):
    # Tokenize the input text into words
    words = nltk.word_tokenize(text)

    # Create a Porter stemmer object
    porter_stemmer = PorterStemmer()

    # Perform stemming on each word
    stemmed_words = [porter_stemmer.stem(word) for word in words]

    return stemmed_words

if __name__ == "__main__":
    # Example text for morphological analysis
    input_text = "The quick brown foxes are jumping over the lazy dogs"

    # Perform morphological analysis (stemming)
    result = perform_morphological_analysis(input_text)

    # Display the original and stemmed words
    print("Original words:", nltk.word_tokenize(input_text))
    print("Stemmed words:", result)


[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


Original words: ['The', 'quick', 'brown', 'foxes', 'are', 'jumping', 'over', 'the', 'lazy', 'dogs']
Stemmed words: ['the', 'quick', 'brown', 'fox', 'are', 'jump', 'over', 'the', 'lazi', 'dog']


**4.	Implement a finite-state machine for morphological parsing. In this example, we'll create a simple machine in Python that generates plural forms of English nouns.**

In [6]:
import nltk
nltk.download('averaged_perceptron_tagger')
from nltk import pos_tag, word_tokenize
def identify_nouns(sentence):
    words = word_tokenize(sentence)
    tagged_words = pos_tag(words)
    print(tagged_words)

    nouns = [word for word, pos in tagged_words if pos.startswith('NN')]

    return nouns

sentence = "The quick brown fox jumps over the lazy dog."

nouns = identify_nouns(sentence)

if nouns:
    print("Nouns identified in the sentence:")
    for noun in nouns:
        if noun[-1].lower() in {'s', 'x', 'z'} or noun[-2:].lower() in {'ch', 'sh'}:
            print(noun+"es")
        else:
            print(noun+"s")
else:
    print("No nouns found in the sentence.")

[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /root/nltk_data...
[nltk_data]   Unzipping taggers/averaged_perceptron_tagger.zip.


[('The', 'DT'), ('quick', 'JJ'), ('brown', 'NN'), ('fox', 'NN'), ('jumps', 'VBZ'), ('over', 'IN'), ('the', 'DT'), ('lazy', 'JJ'), ('dog', 'NN'), ('.', '.')]
Nouns identified in the sentence:
browns
foxes
dogs
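
The cell above picks the plural suffix with a direct check on the last one or two letters. To make the finite-state idea explicit, the following sketch reads the noun one character at a time and lets the final state decide the suffix. The state names and the extra "consonant + y becomes ies" rule are assumptions added for illustration; irregular nouns and exceptions such as "stomach" are not handled.

```python
VOWELS = set('aeiou')
# States whose last symbol counts as a consonant for the "consonant + y" rule.
CONSONANT_STATES = {'S', 'XZ', 'C', 'CH_SH', 'CONS', 'CONS_Y'}

def next_state(state, ch):
    """Transition function: the new state depends only on (state, ch)."""
    if ch in VOWELS:
        return 'VOWEL'
    if ch == 's':
        return 'S'
    if ch in ('x', 'z'):
        return 'XZ'
    if ch == 'c':
        return 'C'
    if ch == 'h':
        # 'h' completes "ch" or "sh" only when the previous symbol was 'c' or 's'.
        return 'CH_SH' if state in {'C', 'S'} else 'CONS'
    if ch == 'y':
        return 'CONS_Y' if state in CONSONANT_STATES else 'CONS'
    return 'CONS'

def pluralize(noun):
    """Run the machine over the noun and emit the plural chosen by the final state."""
    state = 'START'
    for ch in noun.lower():
        state = next_state(state, ch)
    if state in {'S', 'XZ', 'CH_SH'}:
        return noun + 'es'
    if state == 'CONS_Y':
        return noun[:-1] + 'ies'
    return noun + 's'

for noun in ['fox', 'dog', 'church', 'city', 'day']:
    print(noun, '->', pluralize(noun))
```

Because each transition looks only at the current state and the next character, the same table could be drawn as a state diagram or extended with further states for other spelling rules.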


**5.	Use the Porter Stemmer algorithm to perform word stemming on a list of words using Python libraries.**

In [7]:
from nltk.stem import PorterStemmer
def perform_stemming(words):
    # Initialize the Porter Stemmer
    porter_stemmer = PorterStemmer()
    # Perform stemming for each word
    stemmed_words = [porter_stemmer.stem(word) for word in words]
    return stemmed_words

words_to_stem = ["running", "jumps", "happily", "dogs", "cats", "better"]

stemmed_words = perform_stemming(words_to_stem)

print("Original Words:", words_to_stem)
print("Stemmed Words:", stemmed_words)

Original Words: ['running', 'jumps', 'happily', 'dogs', 'cats', 'better']
Stemmed Words: ['run', 'jump', 'happili', 'dog', 'cat', 'better']
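
The stem 'happili' in the output above is not a typo: stems are not required to be dictionary words. In the original Porter algorithm, step 1c rewrites a final 'y' to 'i' when the rest of the word contains a vowel. The sketch below implements only that single step as an illustration; it is not NLTK's implementation, which includes further steps and its own extensions, and the helper names are hypothetical.

```python
def contains_vowel(stem):
    """True if the stem has a vowel; 'y' counts as a vowel when it follows a consonant."""
    for i, ch in enumerate(stem):
        if ch in 'aeiou':
            return True
        if ch == 'y' and i > 0 and stem[i - 1] not in 'aeiou':
            return True
    return False

def step_1c(word):
    """Porter step 1c: (*v*) Y -> I."""
    if word.endswith('y') and contains_vowel(word[:-1]):
        return word[:-1] + 'i'
    return word

for word in ['happily', 'lazy', 'sky']:
    print(word, '->', step_1c(word))
```

'sky' is left unchanged because the part before the 'y' contains no vowel, which is the condition that keeps very short words intact.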


**6.	Implement a basic N-gram model for text generation. For example, generate text with a bigram model in Python.**

In [8]:
import random

def build_bigram_model(sentences):
    bigram_model = {}

    for sentence in sentences:
        tokens = sentence.split()
        for i in range(len(tokens) - 1):
            current_word = tokens[i]
            next_word = tokens[i + 1]

            # Append next_word to the list of successors seen after current_word
            bigram_model.setdefault(current_word, []).append(next_word)

    return bigram_model

def generate_text(bigram_model, start_word, length=10):
    generated_text = [start_word]

    for _ in range(length - 1):
        if start_word in bigram_model:
            next_word = random.choice(bigram_model[start_word])
            generated_text.append(next_word)
            start_word = next_word
        else:
            break

    return ' '.join(generated_text)

# Example list of sentences
sentences = [
    "I love programming in Python.",
    "Python is a versatile programming language.",
    "Text generation using bigram models is interesting.",
    "Natural Language Processing involves analyzing and generating text."
]

# Build bigram model
bigram_model = build_bigram_model(sentences)
print(bigram_model)

# Generate text using bigram model
generated_text = generate_text(bigram_model, start_word="I", length=8)

# Display the results
print("Generated Text:", generated_text)

{'I': ['love'], 'love': ['programming'], 'programming': ['in', 'language.'], 'in': ['Python.'], 'Python': ['is'], 'is': ['a', 'interesting.'], 'a': ['versatile'], 'versatile': ['programming'], 'Text': ['generation'], 'generation': ['using'], 'using': ['bigram'], 'bigram': ['models'], 'models': ['is'], 'Natural': ['Language'], 'Language': ['Processing'], 'Processing': ['involves'], 'involves': ['analyzing'], 'analyzing': ['and'], 'and': ['generating'], 'generating': ['text.']}
Generated Text: I love programming in Python.
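
Two things are worth noting about the output above. First, splitting on whitespace keeps punctuation attached, so 'Python.' and 'Python' are different tokens; 'Python.' has no successors, which is why generation stops after five words even though `length=8` was requested. Second, the successor lists implicitly encode probabilities. As a minimal sketch (the function name `bigram_probabilities` and the two-sentence `corpus` are illustrative), the same counts can be turned into explicit estimates of P(next word | current word):

```python
from collections import Counter, defaultdict

def bigram_probabilities(sentences):
    """Estimate P(next_word | current_word) by relative frequency."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        # zip pairs each token with its successor.
        for current_word, next_word in zip(tokens, tokens[1:]):
            counts[current_word][next_word] += 1
    return {
        word: {nxt: c / sum(followers.values()) for nxt, c in followers.items()}
        for word, followers in counts.items()
    }

corpus = [
    "I love programming in Python.",
    "Python is a versatile programming language.",
]
probs = bigram_probabilities(corpus)
print(probs["programming"])
```

With this small corpus, 'programming' is followed once by 'in' and once by 'language.', so each successor gets probability 0.5.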


**7.	Write a program that uses the NLTK library to perform part-of-speech tagging on a text.**

In [9]:
import nltk
from nltk import pos_tag, word_tokenize
def perform_pos_tagging(text):
    # Tokenize the text into words
    words = word_tokenize(text)
    # Perform part-of-speech tagging
    tagged_words = pos_tag(words)
    return tagged_words
# Example text
text = "NLTK is a powerful library for natural language processing."
# Perform part-of-speech tagging
tagged_words = perform_pos_tagging(text)

print("Original Text:", text)
print("Part-of-Speech Tagging Result:", tagged_words)

Original Text: NLTK is a powerful library for natural language processing.
Part-of-Speech Tagging Result: [('NLTK', 'NNP'), ('is', 'VBZ'), ('a', 'DT'), ('powerful', 'JJ'), ('library', 'NN'), ('for', 'IN'), ('natural', 'JJ'), ('language', 'NN'), ('processing', 'NN'), ('.', '.')]


**8.	Implement a simple stochastic part-of-speech tagging algorithm in Python, using a basic probabilistic model to assign POS tags.**

In [10]:
import random

def train_unigram_model(tagged_corpus):
    unigram_model = {}

    for sentence in tagged_corpus:
        for word, pos_tag in sentence:
            if word in unigram_model:
                unigram_model[word].append(pos_tag)
            else:
                unigram_model[word] = [pos_tag]

    return unigram_model

def stochastic_pos_tagging(sentence, unigram_model):
    tagged_sentence = []

    for word in sentence:
        if word in unigram_model:
            pos_tag = random.choice(unigram_model[word])
        else:
            # If word not in model, assign a default POS tag (e.g., 'NOUN')
            pos_tag = 'NOUN'

        tagged_sentence.append((word, pos_tag))

    return tagged_sentence

# Example tagged corpus for training
tagged_corpus = [
    [('The', 'DET'), ('quick', 'ADJ'), ('brown', 'ADJ'), ('fox', 'NOUN')],
    [('Jumped', 'VERB'), ('over', 'PREP'), ('the', 'DET'), ('lazy', 'ADJ'), ('dog', 'NOUN')]
]

# Train unigram model
unigram_model = train_unigram_model(tagged_corpus)

# Example sentence for stochastic POS tagging
sentence_to_tag = ['The', 'lazy', 'fox', 'jumped']

# Perform stochastic POS tagging
tagged_sentence = stochastic_pos_tagging(sentence_to_tag, unigram_model)

# Display the results
print("Original Sentence:", sentence_to_tag)
print("Stochastic POS Tagging Result:", tagged_sentence)

Original Sentence: ['The', 'lazy', 'fox', 'jumped']
Stochastic POS Tagging Result: [('The', 'DET'), ('lazy', 'ADJ'), ('fox', 'NOUN'), ('jumped', 'NOUN')]
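
In the result above, 'jumped' falls back to 'NOUN' because the training corpus only contains the capitalized form 'Jumped' and the lookup is case-sensitive. The sketch below assumes case-insensitive lookup is acceptable and stores tag counts per word in a `Counter`; the function names are illustrative. Sampling with `random.choices` and count weights is equivalent to `random.choice` over a list with repeats, just stored more compactly.

```python
import random
from collections import Counter, defaultdict

def train_tag_counts(tagged_corpus):
    """Count how often each tag was seen for each (lowercased) word."""
    counts = defaultdict(Counter)
    for sentence in tagged_corpus:
        for word, tag in sentence:
            # Assumption: 'The' and 'the' should share one tag distribution.
            counts[word.lower()][tag] += 1
    return counts

def sample_tag(word, counts, default='NOUN'):
    """Draw a tag in proportion to its observed frequency for this word."""
    tag_counts = counts.get(word.lower())
    if not tag_counts:
        return default
    tags = list(tag_counts)
    weights = [tag_counts[t] for t in tags]
    return random.choices(tags, weights=weights, k=1)[0]

corpus = [
    [('The', 'DET'), ('quick', 'ADJ'), ('brown', 'ADJ'), ('fox', 'NOUN')],
    [('Jumped', 'VERB'), ('over', 'PREP'), ('the', 'DET'), ('lazy', 'ADJ'), ('dog', 'NOUN')],
]
counts = train_tag_counts(corpus)
tagged = [(w, sample_tag(w, counts)) for w in ['The', 'lazy', 'fox', 'jumped']]
print(tagged)
```

Every word in this toy corpus has a single observed tag, so the sampling is deterministic here; with a larger corpus an ambiguous word would receive different tags across runs.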


**9.	Implement a rule-based part-of-speech tagging system with regular expressions in Python.**

In [11]:
import re
def rule_based_pos_tagging(sentence):
    tagged_sentence = []
    for word in sentence:
        if re.match(r'\b(?:is|am|are|was|were)\b', word, re.IGNORECASE):
            pos_tag = 'VERB'
        elif re.match(r'\b(?:the|a|an)\b', word, re.IGNORECASE):
            pos_tag = 'DET'
        elif re.match(r'\b(?:quick|brown|lazy)\b', word, re.IGNORECASE):
            pos_tag = 'ADJ'
        else:
            pos_tag = 'NOUN'

        tagged_sentence.append((word, pos_tag))
    return tagged_sentence
# Example sentence for rule-based POS tagging
sentence_to_tag = ['The', 'quick', 'brown', 'fox', 'is', 'lazy']
# Perform rule-based POS tagging
tagged_sentence = rule_based_pos_tagging(sentence_to_tag)
# Display the results
print("Original Sentence:", sentence_to_tag)
print("Rule-based POS Tagging Result:", tagged_sentence)

Original Sentence: ['The', 'quick', 'brown', 'fox', 'is', 'lazy']
Rule-based POS Tagging Result: [('The', 'DET'), ('quick', 'ADJ'), ('brown', 'ADJ'), ('fox', 'NOUN'), ('is', 'VERB'), ('lazy', 'ADJ')]
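
The `if`/`elif` chain above grows by one branch per rule. One possible alternative, sketched here, keeps the rules in an ordered list of (pattern, tag) pairs where the first match wins. The suffix rules and the 'ADV' tag are hypothetical additions for illustration. The sketch also uses `fullmatch`, since `re.match` only anchors at the start and would tag a token such as 'a-frame' as a determiner.

```python
import re

# Ordered rules: the first pattern that matches the whole token decides the tag.
RULES = [
    (re.compile(r'(?:is|am|are|was|were)', re.IGNORECASE), 'VERB'),
    (re.compile(r'(?:the|a|an)', re.IGNORECASE), 'DET'),
    (re.compile(r'(?:quick|brown|lazy)', re.IGNORECASE), 'ADJ'),
    (re.compile(r'.+ly', re.IGNORECASE), 'ADV'),           # hypothetical suffix rule
    (re.compile(r'.+(?:ing|ed)', re.IGNORECASE), 'VERB'),  # hypothetical suffix rule
]

def tag_with_rules(words, rules=RULES, default='NOUN'):
    """Tag each word with the first rule whose pattern matches the entire token."""
    tagged = []
    for word in words:
        tag = next((t for pattern, t in rules if pattern.fullmatch(word)), default)
        tagged.append((word, tag))
    return tagged

print(tag_with_rules(['The', 'lazy', 'fox', 'was', 'jumping', 'quickly']))
```

Rule order matters: closed-class word lists come first so that a broad suffix rule cannot override them.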
