
In [1]:
import random
from collections import defaultdict

def generate_text(word_dict, start_word, output_length, chain_length=10):
    """
    Generates text recursively from a first-order Markov chain of word transitions.

    Args:
        word_dict: Maps each word to a dict of {next_word: count} transitions.
        start_word: The first word of the generated text.
        output_length: The maximum number of words to generate.
        chain_length: Caps the number of words per call (default 10); the result
            has at most min(output_length, chain_length) words. A value <= 0
            removes this cap.

    Returns:
        A string of space-separated words, or "" if output_length <= 0.
    """

    if output_length <= 0:
        return ""

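    if chain_length <= 0:
        # A non-positive chain_length disables the cap; only output_length limits the text.
        # select_next_word also copes with words that have no recorded successors.
        next_word = select_next_word(word_dict, start_word)
        return f"{start_word} {generate_text(word_dict, next_word, output_length - 1, chain_length)}".rstrip()

    if chain_length == 1:
        # Cap reached: emit the current word and stop.
        return f"{start_word}"

    next_word = select_next_word(word_dict, start_word)
    # rstrip() drops the trailing space left when the recursive call returns "".
    return f"{start_word} {generate_text(word_dict, next_word, output_length - 1, chain_length - 1)}".rstrip()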

def select_next_word(word_dict, current_word):
    """Picks the next word, weighted by transition counts.

    Falls back to a uniformly random known word when current_word has no
    recorded transitions. Raises IndexError if word_dict is empty.
    """

    if current_word not in word_dict:
        return random.choice(list(word_dict.keys()))

    transitions = word_dict[current_word]
    word_choices = list(transitions.keys())
    # random.choices accepts relative weights, so the raw counts need no normalizing
    weights = list(transitions.values())
    return random.choices(word_choices, weights=weights)[0]

# Sample text corpus (replace with your own corpus or file reading logic)
text_corpus = "This is a sample text corpus. It contains various words and phrases that can be used to generate new text. The model will learn the probabilities of word transitions and use them to create a sequence of words that resembles the original text."

# Clean the corpus text
corpus_text = text_corpus.lower().replace(",", "").replace(".", "").replace("!", "").replace("?", "")
words = corpus_text.split()

# Create a dictionary to store word transitions
word_dict = defaultdict(lambda: defaultdict(int))
for i in range(len(words) - 1):
    current_word, next_word = words[i], words[i + 1]
    word_dict[current_word][next_word] += 1

# Example usage
text = generate_text(word_dict, start_word="the", output_length=10)
print(text)


the model will learn the original text the probabilities of
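
The transition table built in the cell above maps each word to the counts of the words that follow it, and `select_next_word` samples from those counts. As a rough illustration, the sketch below builds the same kind of table for a tiny made-up corpus and converts one row into probabilities. The helper name `build_transitions` and the toy corpus are assumptions for this example and are not part of the notebook.

```python
from collections import defaultdict

def build_transitions(words):
    """Count how often each word is directly followed by each other word."""
    table = defaultdict(lambda: defaultdict(int))
    for current_word, next_word in zip(words, words[1:]):
        table[current_word][next_word] += 1
    return table

# Toy corpus, chosen only for illustration
toy_words = "the cat sat on the mat the cat ran".split()
toy_table = build_transitions(toy_words)

# Turn the raw counts for "the" into transition probabilities
counts = toy_table["the"]
total = sum(counts.values())
probs = {word: count / total for word, count in counts.items()}

print(dict(counts))  # {'cat': 2, 'mat': 1}
print(probs)
```

Here "the" is followed by "cat" twice and "mat" once, so a weighted draw should pick "cat" about two times out of three.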


In [2]:
# Test case 1: Basic Functionality
start_word = "the"
output_length = 5
chain_length = 5
text = generate_text(word_dict, start_word=start_word, output_length=output_length, chain_length=chain_length)

# Verify the output starts with "the" and is 5 words long
if text.startswith(start_word) and len(text.split()) == output_length:
    print("Test case 1 passed: Basic functionality works!")
else:
    print("Test case 1 failed: Output doesn't meet expectations.")


Test case 1 passed: Basic functionality works!


In [3]:
# Test case 2: Handling Unknown Start Word
start_word = "zebra"
output_length = 7
chain_length = 1
text = generate_text(word_dict, start_word=start_word, output_length=output_length, chain_length=chain_length)
print(text)

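# chain_length=1 caps the output at a single word, so only the start word is printed.
# Nothing is asserted here; the message below is printed unconditionally.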
print("Test case 2 passed: Code should generate text starting with 'zebra' even if it's not in the corpus.")


zebra
Test case 2 passed: Code should generate text starting with 'zebra' even if it's not in the corpus.


In [5]:
# Test case 3: Edge Case - Empty Corpus
empty_word_dict = {}  # Create an empty dictionary to simulate an empty corpus

# Handle the empty corpus case gracefully (e.g., return an error or default text)
try:
    text = generate_text(empty_word_dict, start_word="any", output_length=3)
    print(text)
except (IndexError, KeyError) as e:  # Catch potential errors related to empty dictionary access
    print(f"Test case 3 failed: Code did not handle empty corpus gracefully. Error: {e}")
else:
    if text == "":
        print("Test case 3 passed: Code handles empty corpus gracefully.")
    else:
        print("Test case 3 failed: Code did not handle empty corpus gracefully. Unexpected output.")



Test case 3 failed: Code did not handle empty corpus gracefully. Error: list index out of range
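
Test case 3 fails because `select_next_word` calls `random.choice` on an empty list when the transition table has no entries. One possible way to get the empty string the test expects is to check for an empty table up front. The sketch below does that in an iterative generator, which also avoids deep recursion for long outputs. The name `generate_text_safe` is hypothetical, it omits the `chain_length` cap, and it is not the notebook's implementation.

```python
import random

def generate_text_safe(word_dict, start_word, output_length):
    """Iterative sketch: returns "" for an empty table or a non-positive length."""
    if output_length <= 0 or not word_dict:
        return ""

    words = [start_word]
    current = start_word
    while len(words) < output_length:
        # .get() avoids creating new entries when word_dict is a defaultdict
        successors = word_dict.get(current)
        if successors:
            choices = list(successors)
            weights = [successors[word] for word in choices]
            current = random.choices(choices, weights=weights)[0]
        else:
            # Unknown or dead-end word: continue from a random known word
            current = random.choice(list(word_dict))
        words.append(current)
    return " ".join(words)

print(repr(generate_text_safe({}, "any", 3)))  # ''
```

Because the loop appends to a list instead of recursing, a request for 1000 words stays well within Python's default recursion limit.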


In [6]:
# Test case 4: Zero Output Length
try:
    text = generate_text(word_dict, start_word="is", output_length=0)
    if text == "":
        print("Test case 4 passed: Code handled zero output length gracefully. Output is empty.")
    else:
        print("Test case 4 failed: Code did not handle zero output length gracefully. Unexpected output.")
except ValueError as e:
    print(f"Test case 4 failed: Code raised an unexpected error for zero output length. Error: {e}")


Test case 4 passed: Code handled zero output length gracefully. Output is empty.


In [7]:
# Test case 5: Very Long Output Length

# Note: the default chain_length=10 caps the result at 10 words, so this call
# does not recurse 1000 levels deep; pass a larger chain_length to stress the recursion.
very_long_length = 1000

try:
    text = generate_text(word_dict, start_word="a", output_length=very_long_length)
    # Print only the first 20 characters to keep the output short
    print(f"Output (first few words): {text[:20]}...")
except (MemoryError, RecursionError) as e:
    print(f"Test case 5 passed: Code should handle very long output lengths gracefully (error: {e}).")


Output (first few words): a sequence of word t...


In [8]:
# Test case 6: Non-string starting word (expects TypeError; generate_text does not
# validate its arguments, so this test currently fails)
try:
    text = generate_text(word_dict, start_word=10, output_length=5)  # Pass an integer as starting word
    print("Test case 6 failed: Code did not raise an error for non-string starting word.")
except TypeError as e:
    print(f"Test case 6 passed: Code raised an error for non-string starting word (error: {e}).")
except Exception as e:
    print(f"Test case 6 failed: Code raised an unexpected error ({type(e).__name__}): {e}")


Test case 6 failed: Code did not raise an error for non-string starting word.
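
Test case 6 fails because `generate_text` never checks its argument types: the integer start word is simply formatted into the output string. A minimal sketch of the kind of check that could run before generation is shown below. The helper `validate_inputs` is hypothetical and does not exist in the notebook.

```python
def validate_inputs(start_word, output_length):
    """Raise TypeError for a non-string start word or a non-integer length."""
    if not isinstance(start_word, str):
        raise TypeError(f"start_word must be a str, got {type(start_word).__name__}")
    # bool is a subclass of int, so it is rejected explicitly
    if isinstance(output_length, bool) or not isinstance(output_length, int):
        raise TypeError(f"output_length must be an int, got {type(output_length).__name__}")

validate_inputs("the", 5)  # valid arguments pass silently

try:
    validate_inputs(10, 5)
except TypeError as e:
    print(f"Rejected: {e}")  # Rejected: start_word must be a str, got int
```

Calling a check like this at the top of `generate_text` would make test case 6 raise the `TypeError` it expects.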


In [9]:
# Test case 7: Non-numeric output length (should raise TypeError)
# The TypeError comes from the `output_length <= 0` comparison, not from explicit validation.
try:
    text = generate_text(word_dict, start_word="once", output_length="ten")  # Pass a string as output length
    print("Test case 7 failed: Code did not raise an error for non-numeric output length.")
except TypeError as e:
    print(f"Test case 7 passed: Code raised an error for non-numeric output length (error: {e}).")
except Exception as e:
    print(f"Test case 7 failed: Code raised an unexpected error ({type(e).__name__}): {e}")


Test case 7 passed: Code raised an error for non-numeric output length (error: '<=' not supported between instances of 'str' and 'int').
