Program 4:
Use word embeddings to improve prompts for Generative AI model. Retrieve similar words using word embeddings. Use the similar words to enrich a GenAI prompt. Use the AI model to generate responses for the original and enriched prompts. Compare the outputs in terms of detail and relevance.

In [1]:
# Install required libraries
! pip install gensim transformers nltk matplotlib

# Import libraries
import gensim.downloader as api
from transformers import pipeline
import nltk
import string
from nltk.tokenize import word_tokenize

# Download required NLTK resources
nltk.download('punkt')  # Correct resource for tokenization
nltk.download('punkt_tab')

# Load pre-trained word vectors
print("Loading pre-trained word vectors...")
word_vectors = api.load("glove-wiki-gigaword-100")  # You can change to another model if desired

# Function to replace a keyword in the prompt with its most similar word
def replace_keyword_in_prompt(prompt, keyword, word_vectors, topn=1):
    words = word_tokenize(prompt)
    enriched_words = []
    for word in words:
        cleaned_word = word.lower().strip(string.punctuation)
        if cleaned_word == keyword.lower():
            try:
                similar_words = word_vectors.most_similar(cleaned_word, topn=topn)
                if similar_words:
                    replacement_word = similar_words[0][0]
                    print(f"Replacing '{word}' → '{replacement_word}'")
                    enriched_words.append(replacement_word)
                    continue  # Skip appending original
            except KeyError:
                print(f"'{keyword}' not found in the vocabulary. Using original word.")
        enriched_words.append(word)
    enriched_prompt = " ".join(enriched_words)
    print(f"\n🔹 Enriched Prompt: {enriched_prompt}")
    return enriched_prompt

# Load GPT-2 model for text generation
print("\nLoading GPT-2 model...")
generator = pipeline("text-generation", model="gpt2")

# Function to generate response using GPT-2
def generate_response(prompt, max_length=100):
    try:
        response = generator(prompt, max_length=max_length, num_return_sequences=1)
        return response[0]['generated_text']
    except Exception as e:
        print(f"Error generating response: {e}")
        return None

# Example original prompt
original_prompt = "Who is king."
print(f"\n🔹 Original Prompt: {original_prompt}")

# Define the keyword to be enriched
key_term = "king"

# Enrich the prompt
enriched_prompt = replace_keyword_in_prompt(original_prompt, key_term, word_vectors)

# Generate responses
print("\nGenerating response for the original prompt...")
original_response = generate_response(original_prompt)
print("\nOriginal Prompt Response:")
print(original_response)

print("\nGenerating response for the enriched prompt...")
enriched_response = generate_response(enriched_prompt)
print("\nEnriched Prompt Response:")
print(enriched_response)

# Comparison
print("\n📊 Comparison of Responses:")
print("Original Prompt Response Length:", len(original_response))
print("Enriched Prompt Response Length:", len(enriched_response))
print("Original Prompt Sentences:", original_response.count('.'))
print("Enriched Prompt Sentences:", enriched_response.count('.'))



  from .autonotebook import tqdm as notebook_tqdm
[nltk_data] Downloading package punkt to
[nltk_data]     C:\Users\bened\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package punkt_tab to
[nltk_data]     C:\Users\bened\AppData\Roaming\nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!


Loading pre-trained word vectors...

Loading GPT-2 model...



Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`
To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
Device set to use cpu
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Both `max_new_tokens` (=256) and `max_le


🔹 Original Prompt: Who is king.
Replacing 'king' → 'prince'

🔹 Enriched Prompt: Who is prince .

Generating response for the original prompt...


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Both `max_new_tokens` (=256) and `max_length`(=100) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)



Original Prompt Response:
Who is king.

1 The king is the king.

2 The king is the king.

3 The king is the king.

4 The king is the king.

5 The king is the king.

6 The king is the king.

7 The king is the king.

8 The king is the king.

9 The king is the king.

10 The king is the king.

11 The king is the king.

12 The king is the king.

13 The king is the king.

14 The king is the king.

15 The king is the king.

16 The king is the king.

17 The king is the king.

18 The king is the king.

19 The king is the king.

20 The king is the king.

21 The king is the king.

22 The king is the king.

23 The king is the king.

24 The king is the king.

25 The king is the king.

26 The king is the king.

27 The king is the king.

28 The king is the king.

29 The

Generating response for the enriched prompt...

Enriched Prompt Response:
Who is prince ...?" he asked. "My lord," said Lord Robert. And that was all the prince, for he had been told that all the lords had been told that Robert was 