<a href="https://colab.research.google.com/github/SnehaParamagond/ML-Activity/blob/main/genai4.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
# Install required libraries
# Install gensim for downloading pre-trained models
!pip install gensim



In [2]:
# Install Hugging Face Transformers for NLP pipelines
!pip install transformers




In [3]:
# Install NLTK for text preprocessing and tokenization
!pip install nltk



In [4]:
# Import libraries
import gensim.downloader as api
from transformers import pipeline
import nltk
import string
from nltk.tokenize import word_tokenize

In [5]:
# Download the 'punkt_tab' resource from NLTK
nltk.download('punkt_tab')

[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt_tab.zip.


True

In [6]:
# Load pre-trained word vectors
print("Loading pre-trained word vectors...")
word_vectors = api.load("glove-wiki-gigaword-100") # Load Word2Vec model

Loading pre-trained word vectors...


In [9]:
# Function to replace words in the prompt with their most similar words
def replace_keyword_in_prompt(prompt, keyword, word_vectors, topn=1):
    words = word_tokenize(prompt) # Tokenize the prompt into words
    enriched_words = []
    for word in words:
        cleaned_word = word.lower().strip(string.punctuation) # Normalize word

        if cleaned_word == keyword.lower(): # Replace only if it matches the keyword
            try:
                # Retrieve similar word
                similar_words = word_vectors.most_similar(cleaned_word, topn=topn)
                if similar_words:
                    replacement_word = similar_words[0][0] # Choose the most similar word
                    print(f"Replacing '{word}' → '{replacement_word}'")
                    enriched_words.append(replacement_word)
                    continue # Skip appending the original word
            except KeyError:
                print(f"'{keyword}' not found in the vocabulary. Using original word.")
                enriched_words.append(word) # Keep original if no replacement was made
    enriched_prompt = " ".join(enriched_words)
    print(f"\n🔹 Enriched Prompt: {enriched_prompt}")
    return enriched_prompt

In [10]:
# Load an open-source Generative AI model (GPT-2)
print("\nLoading GPT-2 model...")
generator = pipeline("text-generation", model="gpt2")


Loading GPT-2 model...


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


config.json:   0%|          | 0.00/665 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/548M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/124 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/26.0 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/1.04M [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]

Device set to use cpu


In [12]:
# Function to generate responses using the Generative AI model
def generate_response(prompt, max_length=100):
 try:
  response = generator(prompt, max_length=max_length, num_return_sequences=1)
  return response[0]['generated_text']
 except Exception as e:
  print(f"Error generating response: {e}")
  return None

In [13]:
# Example original prompt
original_prompt = "Who is king."
print(f"\n🔹 Original Prompt: {original_prompt}")


🔹 Original Prompt: Who is king.


In [15]:
# Retrieve similar words for key terms in the prompt
key_term = "king"

In [16]:
# Enrich the original prompt
enriched_prompt = replace_keyword_in_prompt(original_prompt,key_term,
word_vectors)

Replacing 'king' → 'prince'

🔹 Enriched Prompt: prince


In [18]:
# Generate responses for the original and enriched prompts
print("\nGenerating response for the original prompt...")
original_response = generate_response(original_prompt)
print("\nOriginal Prompt Response:")
print(original_response)
print("\nGenerating response for the enriched prompt...")
enriched_response = generate_response(enriched_prompt)
print("\nEnriched Prompt Response:")
print(enriched_response)

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.



Generating response for the original prompt...


Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.



Original Prompt Response:
Who is king. When we enter a palace or apartment, look closely. There are dozens waiting for us, some wearing very simple shoes. People are looking for a chance to speak to people inside. The police are trying to make sure that the only way of escaping is to run by them."

"We want to ensure that all visitors have everything they need... for the next time someone comes and shows it." "It is all that we have to do. Do not fall for it and

Generating response for the enriched prompt...

Enriched Prompt Response:
prince) to a party that disagrees with the party's policy on the question of the national anthem, he will continue to hold a closed-door briefing meeting with congressional leaders this weekend.

[How Trump and Pelosi split on anthem issue]

Sen. Mark Kirk, R-Ill., who is a prominent Trump critic, said Trump is likely to call Obama a "piece of garbage" during the meeting.

"You know, he's been saying for a long time that


In [19]:
# Compare the outputs
print("\nComparison of Responses:")
print("\nOriginal Prompt Response Length:", len(original_response))
print("Enriched Prompt Response Length:", len(enriched_response))
print("\nOriginal Prompt Response Detail:", original_response.count("."))
print("Enriched Prompt Response Detail:", enriched_response.count("."))


Comparison of Responses:

Original Prompt Response Length: 440
Enriched Prompt Response Length: 424

Original Prompt Response Detail: 10
Enriched Prompt Response Detail: 4
