# 🌟 **Internship Task 3 at Prodigy Infotec: Generative AI Intern**

## 📝 **Text Generation with Markov Chains**

### 🎯 **Task 3 Overview:**
In this project, you will implement a simple text generation algorithm using **Markov chains**. The goal is to create a statistical model that predicts the next character or word in a sequence based on the previous one(s). This technique can be used for generating text that mimics the style of a given input text.

Markov chains are a type of probabilistic model where the future state depends only on the current state, not on the sequence of events that preceded it. In text generation, this means predicting the next word or character based on the previous one.
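
Stated formally, using notation that is not in the original task text (here $X_t$ is assumed to denote the word or character at position $t$), the first-order Markov property reads:

$$
P(X_{t+1} = x \mid X_1, X_2, \dots, X_t) = P(X_{t+1} = x \mid X_t)
$$

A model that conditions on the previous $n$ items instead of one is the same idea with the "current state" taken to be the last $n$ items.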

---

### 🛠️ **How You’ll Achieve This:**

1. **Understand Markov Chains**:
   - The core idea behind Markov chains is that the probability of the next event (e.g., a character or word) depends only on the current state, not on the full history before it. This makes Markov chains a simple way to model sequential data such as text, although they cannot capture long-range context.

2. **Text Preprocessing**:
   - You'll begin by preparing your input text. This involves:
     - **Cleaning** the text (removing unwanted characters, formatting, etc.).
     - **Tokenization**: Breaking the text into words or characters, depending on whether you're working with word-based or character-based models.
     - **Creating n-grams**: For Markov chains, you will create "n-grams" (pairs, triplets, etc.) that help define the relationship between words or characters.

3. **Build the Markov Chain Model**:
   - You'll create a model where:
     - **States** are words or characters.
     - **Transitions** are the probabilities of moving from one word/character to another.
   - This can be done by iterating through the text and counting the occurrences of word/character pairs or n-grams.

4. **Train the Model**:
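   - You'll create a transition table (often called a transition matrix) that captures the probability of each word or character following another.
   - In Python this is conveniently stored as a dictionary whose keys are words/characters and whose values are dictionaries mapping each possible next word/character to its probability. The implementation later in this notebook uses an equivalent shortcut: it stores a list of observed successors for each word, so a successor that occurs more often is proportionally more likely to be drawn.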

5. **Generate Text**:
   - Once the model is trained, you can use it to generate new text. Starting from an initial word or character, the model predicts the next word/character based on the learned probabilities.
   - This process can be repeated iteratively to generate a sequence of text.

6. **Text Generation Process**:
   - You can implement different strategies for generating text, such as:
     - **Greedy approach**: Always choose the most probable next word/character.
     - **Random sampling**: Randomly sample the next word/character according to its probability distribution.

7. **Evaluate the Output**:
   - After generating the text, you will evaluate it for coherence and style, comparing it with the original input text.
   - Depending on the model's performance, you may need to adjust the size of n-grams or refine your text generation strategies.
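
The n-gram adjustment mentioned in the steps above can be sketched as follows. This is a minimal illustration, not part of the original task: the function names `build_ngram_chain` and `generate_from_ngram_chain`, the `order` parameter, and the sample sentence are all assumptions made for the example. Each state is a tuple of `order` consecutive tokens rather than a single word.

```python
import random
from collections import defaultdict


def build_ngram_chain(tokens, order=2):
    """Map each tuple of `order` consecutive tokens to the tokens observed after it."""
    chain = defaultdict(list)
    for i in range(len(tokens) - order):
        state = tuple(tokens[i:i + order])
        chain[state].append(tokens[i + order])
    return dict(chain)


def generate_from_ngram_chain(chain, start_state, max_tokens=20, rng=random):
    """Walk the chain from `start_state`; stop early at a state with no successors."""
    output = list(start_state)
    state = tuple(start_state)
    while len(output) < max_tokens and state in chain:
        next_token = rng.choice(chain[state])
        output.append(next_token)
        # Slide the window: drop the oldest token, append the new one
        state = state[1:] + (next_token,)
    return output


# Hypothetical sample text, chosen so that one state has two different successors
sample_tokens = "the cat sat on the mat and the cat ran".split()
bigram_chain = build_ngram_chain(sample_tokens, order=2)

print(bigram_chain[("the", "cat")])  # ['sat', 'ran']
print(" ".join(generate_from_ngram_chain(bigram_chain, ("the", "cat"))))
```

A larger `order` tends to produce text that stays closer to the source, at the cost of fewer distinct continuations per state, so on a small corpus the output can end up copying the input almost verbatim.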

---

### 🚀 **Outcome:**

By the end of this project, you will have:
- Implemented a **Markov chain** model for text generation.
- Gained hands-on experience in **statistical modeling** and text processing.
- Created an application capable of generating text that mimics the style of a provided input.

This task will provide you with a strong foundation in **probabilistic modeling** and the ability to work with sequential data.


## 🔄 Generating Text with Markov Chains  

### Steps:  
1. 📜 **Prepare the Corpus**  
   - Defines a block of text as the dataset for generating new text.  

2. 🔢 **Understanding Markov Chains**  
   - Uses probability-based transitions between words to generate coherent sequences.  

3. 🎲 **Randomization**  
   - The system will randomly select and transition between words based on their frequency and order in the corpus.  

4. ✅ **Ready for Text Generation**  
   - The prepared corpus will be processed to create a Markov model for generating new text.  


In [7]:
import random

# Step 1: Prepare the corpus (you can change this text)
text = """
Markov chains are mathematical systems that undergo transitions from one state to another.
They are used in various fields including physics, economics, and computer science.
Markov chains are named after Andrey Markov, a Russian mathematician who developed the theory.
They are useful for modeling random systems that follow a certain probability distribution.
"""

## ✂️ Tokenizing Text for Markov Chains  

### Steps:  
1. 📜 **Receive the Input Text**  
   - Uses the predefined corpus containing multiple sentences.  

2. 🔠 **Split into Words**  
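   - Breaks the text on whitespace with `str.split()`, so punctuation stays attached to words (for example, `another.` and `physics,` are treated as distinct tokens).  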

3. 🔗 **Prepare for Markov Model**  
   - These words will be used to build transitions between states for text generation.  

4. ✅ **Ready for Processing**  
   - The tokenized words can now be used to form a probability-based Markov chain.  


In [8]:
# Step 2: Tokenize the text (split into words)
words = text.split()
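
The cell above relies on plain whitespace splitting. As a hedged sketch of the cleaning and tokenization options described in the task overview, the helpers below show one possible word-level tokenizer and one character-level tokenizer. The names `tokenize_words` and `tokenize_chars`, the regular expression, and the sample string are assumptions for illustration, not part of the original notebook.

```python
import re


def tokenize_words(raw_text, lowercase=True):
    """Return word tokens with punctuation stripped (apostrophes inside words are kept)."""
    if lowercase:
        raw_text = raw_text.lower()
    return re.findall(r"[A-Za-z0-9']+", raw_text)


def tokenize_chars(raw_text):
    """Return a list of characters after collapsing runs of whitespace to one space."""
    return list(" ".join(raw_text.split()))


# Hypothetical sample input
sample = "Markov chains are useful. Markov chains are simple!"

print(tokenize_words(sample))
print(tokenize_chars("a  b\nc"))  # ['a', ' ', 'b', ' ', 'c']
```

Stripping punctuation merges tokens such as `another.` and `another`, which gives each word more observed successors, but it also discards sentence boundaries, so generated text will no longer contain full stops unless they are kept as separate tokens.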


## 🔗 Building a Markov Chain Model  

### Steps:  
1. 📌 **Initialize an Empty Dictionary**  
   - Creates a dictionary where each word maps to its possible next words.  

2. 🔄 **Loop Through the Tokenized Words**  
   - Iterates through the word list, pairing each word with the word that immediately follows it.  

3. 📊 **Update the Markov Chain Dictionary**  
   - If the word is new, it is added as a key with a one-element list containing its next word.  
   - If it already exists, the next word is appended to its list of observed successors (duplicates are kept).  

4. ✅ **Ready for Text Generation**  
   - Because frequent successors appear more often in each list, choosing uniformly from a list reproduces the observed transition frequencies.  


In [9]:
# Step 3: Create a Markov Chain Model
# Create a dictionary where each word is linked to the next possible word(s)
markov_chain = {}

for i in range(len(words) - 1):
    word = words[i]
    next_word = words[i + 1]

    if word not in markov_chain:
        # First occurrence of this word: start its list of successors
        markov_chain[word] = [next_word]
    else:
        # Word already seen: record another observed successor (duplicates are kept)
        markov_chain[word].append(next_word)
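
The successor lists built above encode probabilities only implicitly. A minimal sketch of the explicit form described in the task overview (a dictionary of dictionaries holding probabilities) could look like this. The function name `build_transition_probabilities` and the toy token sequence are assumptions for illustration.

```python
from collections import Counter, defaultdict


def build_transition_probabilities(tokens):
    """Return {token: {next_token: probability}} estimated from adjacent pairs."""
    counts = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        counts[current][following] += 1

    probabilities = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        # Normalize the counts so each row sums to 1
        probabilities[current] = {
            token: count / total for token, count in followers.items()
        }
    return probabilities


# Hypothetical toy sequence: "a" is followed by "b" twice and by "c" once
sample_tokens = "a b a c a b".split()
probs = build_transition_probabilities(sample_tokens)

print(probs["a"])
print(probs["b"])  # {'a': 1.0}
```

In this toy sequence, `b` follows `a` in two of three cases, so `probs["a"]["b"]` is two thirds. Sampling from these probabilities is statistically equivalent to calling `random.choice` on a successor list that keeps duplicates.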

## ✨ Generating Text with Markov Chains  

### Steps:  
1. 🚀 **Define a Function for Text Generation**  
   - Takes a starting word and a maximum length for the generated text.  

2. 🔄 **Iterate to Form a Sentence**  
   - Repeatedly selects the next word based on the Markov Chain model.  

3. 🎲 **Random Selection of Next Word**  
   - Chooses uniformly from the list of observed next words with `random.choice`; successors that appear more than once in the list are proportionally more likely.  

4. ⛔ **Handle Missing Words**  
   - Stops early if the current word has no recorded successor in the chain.  

5. 📝 **Return the Generated Text**  
   - Joins the words into a complete sentence for output.  


In [10]:
# Step 4: Function to generate text based on the Markov Chain model
def generate_text(start_word, length=50):
    """Return up to `length` words, starting at `start_word` and following `markov_chain`."""
    current_word = start_word
    generated_text = [current_word]

    for _ in range(length - 1):
        # Stop early at a dead end: a word with no recorded successor
        if current_word not in markov_chain:
            break

        # Uniform choice over the successor list; repeated successors are more likely
        next_word = random.choice(markov_chain[current_word])
        generated_text.append(next_word)
        current_word = next_word

    return ' '.join(generated_text)
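
The function above always samples at random. The two strategies named in the task overview, greedy selection and random sampling, can be contrasted with a small sketch. The helper names `pick_greedy` and `pick_sampled` and the example distribution are hypothetical; the input is assumed to be a dictionary mapping each candidate to its probability.

```python
import random


def pick_greedy(distribution):
    """Return the candidate with the highest probability."""
    return max(distribution, key=distribution.get)


def pick_sampled(distribution, rng=random):
    """Draw one candidate at random, weighted by its probability."""
    candidates = list(distribution)
    weights = [distribution[candidate] for candidate in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


# Hypothetical next-word distribution for some current word
next_word_probs = {"systems": 0.5, "chains": 0.3, "models": 0.2}

print(pick_greedy(next_word_probs))  # systems
print(pick_sampled(next_word_probs, random.Random(0)))
```

Greedy selection is deterministic, so the same start word always yields the same text and the walk can fall into a repeating cycle. Weighted sampling trades that predictability for variety.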

## 📝 Generating and Printing Markov Chain Text  

### Steps:  
1. 🎯 **Select a Random Start Word**  
   - Picks a word from the tokenized text as the starting point for generation.  

2. 🔄 **Generate Text**  
   - Calls the `generate_text` function to create a sequence of up to 100 words based on the Markov Chain model.  

3. 🖨️ **Print the Output**  
   - Displays the generated text for review and analysis.  

4. ✅ **Result**  
   - A randomly generated sequence of words that recombines phrases from the original corpus is produced; it may be shorter than the requested length.  


In [11]:
start_word = random.choice(words)  # Choose a random starting word
generated_text = generate_text(start_word, length=100)

# Print the generated text
print("Generated Text:")
print(generated_text)


Generated Text:
random systems that undergo transitions from one state to another. They are mathematical systems that follow a certain probability distribution.


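## 📝 Generating and Printing Markov Chain Text (Second Run, `length=150`)  

Raising `length` to 150 does not lengthen the output here: generation stops as soon as it reaches `distribution.`, the last word of the corpus, which has no recorded successor.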

In [12]:
start_word = random.choice(words)  # Choose a random starting word
generated_text = generate_text(start_word, length=150)

# Print the generated text
print("Generated Text:")
print(generated_text)


Generated Text:
systems that follow a Russian mathematician who developed the theory. They are named after Andrey Markov, a certain probability distribution.


## 📝 Generating and Printing Markov Chain Text (Third Run, `length=50`)  

In [13]:
start_word = random.choice(words)  # Choose a random starting word
generated_text = generate_text(start_word, length=50)

# Print the generated text
print("Generated Text:")
print(generated_text)


Generated Text:
state to another. They are mathematical systems that follow a certain probability distribution.
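
All three runs above end at the same word, well short of the requested length. A short sketch of how such dead ends can be listed is given below. The helper name `find_dead_ends` and the sample sentence are assumptions for illustration; the idea is simply that a token with no successor never appears in any position except the last.

```python
def find_dead_ends(tokens):
    """Return tokens that are never followed by another token in the sequence."""
    has_successor = set(tokens[:-1])
    return sorted(set(tokens) - has_successor)


# Hypothetical sample text
sample_tokens = "the cat sat on the mat".split()

print(find_dead_ends(sample_tokens))  # ['mat']
```

Applied to the notebook's `words` list, this should report `distribution.` as the only dead end. One possible workaround, not used in the original code, is to jump to a random word whenever a dead end is reached instead of stopping.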
