
To implement a simple text generation algorithm using Markov chains, we'll build a statistical model that predicts the next character or word from the current sequence of characters or words in the input text. A Markov chain is a probabilistic model in which the probability of the next state depends only on the current state, not on the full history that led to it. Here's how you can proceed:

Steps to Implement Markov Chain Text Generation
Step 1: Choose a Corpus
A corpus is a large body of text from which the Markov chain will learn and generate new text. For simplicity, let's use a small example text.

plaintext

Example Text:
"Dark Background with Neon Lights. PRODIGY INFOTECH. Text Generation with Markov Chains."

Step 2: Preprocess the Text
Clean the text by removing unnecessary characters and converting it into tokens (characters or words). In this example, we simply split on whitespace to get word tokens, so punctuation stays attached to the preceding word (for example, 'Lights.').

python

# Example text
text = "Dark Background with Neon Lights. PRODIGY INFOTECH. Text Generation with Markov Chains."

# Tokenize based on words (split() with no arguments splits on any run of whitespace)
tokens = text.split()

# Check tokens
print(tokens)
# Output: ['Dark', 'Background', 'with', 'Neon', 'Lights.', 'PRODIGY', 'INFOTECH.', 'Text', 'Generation', 'with', 'Markov', 'Chains.']
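
Step 2 mentions removing unnecessary characters, while the snippet above only splits on whitespace, so tokens such as 'Lights.' keep their trailing period. A minimal sketch of a stricter tokenizer follows. The helper name tokenize_words and the decision to lowercase everything are assumptions made for illustration, not part of the original walkthrough.

```python
import re

def tokenize_words(text, lowercase=True):
    """Split text into word tokens and drop punctuation (hypothetical helper)."""
    if lowercase:
        text = text.lower()
    # \w+ matches runs of letters, digits, and underscores; the optional
    # group keeps internal apostrophes, so "don't" stays a single token.
    return re.findall(r"\w+(?:'\w+)*", text)

sample = "Dark Background with Neon Lights. PRODIGY INFOTECH."
print(tokenize_words(sample))
# ['dark', 'background', 'with', 'neon', 'lights', 'prodigy', 'infotech']
```

Dropping punctuation also drops sentence boundaries, so generated text will contain no periods. Keep the simpler split() if you want punctuation to survive into the output.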
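
Step 3: Build the Markov Chain Model
Create a function that builds a Markov chain model from the tokens. For each state (a sequence of order consecutive tokens), the function records every token that follows it. A successor that occurs more than once is stored more than once, so picking uniformly from a state's list reproduces the observed transition frequencies without storing explicit probabilities.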

python

from collections import defaultdict
import random

def build_markov_chain(tokens, order=1):
    # Initialize a dictionary to hold the Markov chain
    markov_chain = defaultdict(list)

    # Iterate through the tokens to build the chain
    for i in range(len(tokens) - order):
        current_state = tuple(tokens[i:i + order])  # Current state (tuple of tokens)
        next_state = tokens[i + order]              # Next state (single token)
        markov_chain[current_state].append(next_state)

    return markov_chain

# Build Markov chain model with order 1 (using words)
markov_model = build_markov_chain(tokens, order=1)

# Example output of the Markov model ('with' occurs twice, so it has two recorded successors)
# defaultdict(<class 'list'>, {('Dark',): ['Background'], ('Background',): ['with'], ('with',): ['Neon', 'Markov'], ...})
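
The model stores raw successor lists rather than explicit probabilities. To inspect the transition probabilities those lists imply, they can be normalized per state. The sketch below is self-contained (it rebuilds the order-1 chain inline), and the helper name transition_probabilities is an assumption introduced for illustration.

```python
from collections import Counter, defaultdict

def transition_probabilities(markov_chain):
    """Convert successor lists into {state: {next_token: probability}}."""
    probabilities = {}
    for state, successors in markov_chain.items():
        counts = Counter(successors)
        total = sum(counts.values())
        probabilities[state] = {tok: n / total for tok, n in counts.items()}
    return probabilities

# Rebuild the order-1 chain for the example text so this block runs on its own.
tokens = ("Dark Background with Neon Lights. PRODIGY INFOTECH. "
          "Text Generation with Markov Chains.").split()
chain = defaultdict(list)
for current, following in zip(tokens, tokens[1:]):
    chain[(current,)].append(following)

probs = transition_probabilities(chain)
print(probs[("with",)])   # {'Neon': 0.5, 'Markov': 0.5}
print(probs[("Dark",)])   # {'Background': 1.0}
```

Note that ('Chains.',) does not appear as a key at all: it is the last token of the corpus, so no successor was ever observed for it.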
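
Step 4: Generate Text
Create a function to generate new text from the Markov chain model. Start from a randomly chosen state, then repeatedly pick one of the tokens recorded after the current state. Because frequent successors appear more often in a state's list, a uniform pick follows the observed transition frequencies.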

python

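def generate_text(markov_chain, max_length=50):
    # Start with a random initial state (assumes the chain is not empty)
    current_state = random.choice(list(markov_chain.keys()))
    generated_text = list(current_state)
    order = len(current_state)

    # Generate until max_length tokens or a dead end is reached
    while len(generated_text) < max_length:
        # .get() avoids inserting empty entries into the defaultdict
        next_state_options = markov_chain.get(current_state)
        if not next_state_options:
            # Dead end: this state (e.g. the final word of the corpus)
            # was never followed by anything, so stop here
            break
        next_state = random.choice(next_state_options)
        generated_text.append(next_state)
        current_state = tuple(generated_text[-order:])

    return ' '.join(generated_text)

# Generate text using the Markov model
generated_text = generate_text(markov_model)
print(generated_text)

Example Output
Running the generate_text function might produce output like:

plaintext

"Neon Lights. PRODIGY INFOTECH. Text Generation with Markov Chains."

Because 'Chains.' is the last word of the corpus and nothing follows it, generation stops whenever that word is reached. Without the dead-end check, random.choice would be called on an empty list and raise an IndexError.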

Summary
This simple implementation demonstrates how to build and use a Markov chain model for text generation based on a given corpus. You can adjust the order parameter to change how many preceding tokens are used to predict the next one: a higher order gives more coherent output but, on a small corpus, mostly reproduces the original wording. For character-level text generation, tokenize with list(text) instead of text.split() and join the result with an empty string instead of a space.
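
As a sketch of the character-level variant, the same idea works when each token is a single character and the state is the last few characters. The function names build_char_chain and generate_chars, the default order of 3, and the seed parameter are illustrative assumptions. The block repeats the chain-building logic so that it can run on its own.

```python
import random
from collections import defaultdict

def build_char_chain(text, order=3):
    """Map each run of `order` characters to the characters seen after it."""
    chain = defaultdict(list)
    for i in range(len(text) - order):
        chain[text[i:i + order]].append(text[i + order])
    return dict(chain)

def generate_chars(chain, order, length=60, seed=None):
    """Generate up to `length` characters, stopping early at a dead end."""
    rng = random.Random(seed)            # local RNG; global state is untouched
    output = rng.choice(sorted(chain))   # random starting state
    while len(output) < length:
        options = chain.get(output[-order:])
        if not options:                  # no successor was ever observed
            break
        output += rng.choice(options)
    return output

text = ("Dark Background with Neon Lights. PRODIGY INFOTECH. "
        "Text Generation with Markov Chains.")
char_chain = build_char_chain(text, order=3)
print(char_chain["wit"])   # ['h', 'h']
print(generate_chars(char_chain, order=3, seed=0))
```

With such a short corpus most three-character states have a single successor, so the output largely replays the source text. A longer corpus gives the chain more branching points.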

Markov chains are a basic approach to generating text that mimics the local statistical patterns of the training data, but they only look back a fixed number of tokens. For more sophisticated text generation tasks, consider exploring deep learning models such as LSTMs or transformer-based models such as GPT-2 and GPT-3, which can capture longer-range dependencies in the text.



