**Module 10 Large Language Models**

**Exercise-1**

**Title:** Generating Text Using GPT-2 Language Model

**Problem Statement:**
The goal is to demonstrate how to use the GPT-2 language model to generate text based on an input prompt.

**Steps to Follow:**

**1.	Import Libraries:** Import necessary libraries including GPT2LMHeadModel and GPT2Tokenizer from the Transformers library.

**2.	Load Pre-trained Model and Tokenizer:** Initialize the GPT-2 tokenizer and model using the specified pre-trained model (gpt2 in this case).

**3.	Set Seed (Optional):** Set a seed so that runs involving random sampling produce the same results each time. Greedy decoding, the default generation mode, is deterministic and does not depend on the seed.

**4.	Define Input Prompt:** Specify the input text prompt that will be used to generate text.

**5.	Tokenize Input Text:** Use the tokenizer to convert the input text into token IDs (input_ids), which the model can understand.

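**6.	Generate Text:** Use the pre-trained GPT-2 model to generate text from the input_ids. The max_length parameter caps the total length in tokens (prompt plus generated tokens), num_return_sequences sets how many sequences are returned, and temperature scales the randomness of the output when sampling is enabled with do_sample=True.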

**7.	Decode Generated Output:** Decode the generated token IDs back into human-readable text, skipping any special tokens like padding or end-of-sequence markers.

**8.	Print Generated Text:** Display the generated text to the user.
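To make the role of temperature and the seed in steps 3 and 6 concrete, here is a minimal pure-Python sketch of temperature-scaled sampling. It is an illustration only, not the Transformers implementation: the names softmax_with_temperature, sample_token, and toy_logits are hypothetical, and the scores are made up rather than taken from GPT-2.

```python
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw scores to probabilities after dividing them by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature, rng):
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical scores for four candidate next tokens (not real GPT-2 logits).
toy_logits = [2.0, 1.0, 0.5, 0.1]

# A low temperature concentrates probability on the top token;
# a high temperature spreads it across the candidates.
sharp = softmax_with_temperature(toy_logits, temperature=0.5)
flat = softmax_with_temperature(toy_logits, temperature=2.0)
print("T=0.5:", [round(p, 3) for p in sharp])
print("T=2.0:", [round(p, 3) for p in flat])

# Two generators created with the same seed produce the same draws.
rng_a = random.Random(42)
rng_b = random.Random(42)
draws_a = [sample_token(toy_logits, 0.7, rng_a) for _ in range(10)]
draws_b = [sample_token(toy_logits, 0.7, rng_b) for _ in range(10)]
print("Same seed, same draws:", draws_a == draws_b)
```

Temperature only matters when a token is drawn at random. If the highest-scoring token is always chosen, the scaling leaves the ranking unchanged and the seed plays no role.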


In [None]:
# Importing necessary libraries
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load pre-trained GPT-2 model and tokenizer
model_name = "gpt2"  # Specify the pre-trained model name
tokenizer = GPT2Tokenizer.from_pretrained(model_name)  # Initialize tokenizer
model = GPT2LMHeadModel.from_pretrained(model_name)  # Initialize model

# Set seed for reproducibility (optional; only matters when sampling is enabled)
torch.manual_seed(42)

# Input prompt
input_text = "Once upon a time"

# Tokenize input text
input_ids = tokenizer.encode(input_text, return_tensors='pt')  # Convert input text to token IDs

# Generate text based on input prompt
output = model.generate(input_ids, max_length=50, num_return_sequences=1, temperature=0.7)
# Generate text using the model. Parameters:
#   - input_ids: Token IDs of the input text
#   - max_length: Maximum total length in tokens, counting the prompt as well as the generated tokens
#   - num_return_sequences: Number of sequences to return
#   - temperature: Scales the randomness of sampling; higher values give more varied output.
#     It only takes effect with do_sample=True. This call uses the default greedy decoding,
#     so temperature (and the seed above) does not change the result.

# Decode generated output
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
# Decode the generated token IDs to text, removing any special tokens like padding or EOS.

# Print generated text
print("Generated Text:")
print(generated_text)


The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Generated Text:
Once upon a time, the world was a place of great beauty and great danger. The world was a place of great danger, and the world was a place of great danger. The world was a place of great danger, and the world was a


**Explanation:**

**1.	Import Libraries:** Import necessary modules from the Transformers library to work with GPT-2.

**2.	Load Pre-trained Model and Tokenizer:** Initialize the tokenizer and model with the gpt2 pre-trained model.

**3.	Set Seed:** Set a seed using torch.manual_seed(42) so that runs which use random sampling are reproducible. With greedy decoding the output is already deterministic.

**4.	Input Prompt:** Define an initial text prompt ("Once upon a time" in this case).

**5.	Tokenization:** Use the tokenizer to convert the input text into token IDs (input_ids).

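**6.	Text Generation:** Use the GPT-2 model to generate text based on input_ids, controlling generation with parameters like max_length and temperature. Temperature affects the result only when sampling is enabled (do_sample=True); the repeated phrases in the sample output are typical of greedy decoding, which always picks the most likely next token.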

**7.	Decoding:** Convert the generated token IDs back into readable text, skipping special tokens.

**8.	Output:** Print the generated text to the console.
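The tokenize, generate, and decode steps above can be sketched end to end with a toy model. This is a simplified illustration under stated assumptions: VOCAB, NEXT_SCORES, encode, decode, and greedy_generate are hypothetical stand-ins, the scores are invented, and GPT-2's real tokenizer uses byte-pair encoding rather than whitespace splitting. The sketch also shows why always taking the most likely next token can fall into a repeating loop.

```python
# Toy vocabulary; "<eos>" plays the role of a special end-of-sequence token.
VOCAB = ["<eos>", "the", "world", "was", "a", "place", "of", "danger", "and"]
TOKEN_TO_ID = {token: index for index, token in enumerate(VOCAB)}
SPECIAL_IDS = {TOKEN_TO_ID["<eos>"]}

# Hypothetical next-token scores: current token -> {candidate: score}.
NEXT_SCORES = {
    "the": {"world": 0.9, "danger": 0.1},
    "world": {"was": 0.8, "<eos>": 0.2},
    "was": {"a": 1.0},
    "a": {"place": 0.7, "danger": 0.3},
    "place": {"of": 1.0},
    "of": {"danger": 0.6, "the": 0.4},
    "danger": {"and": 0.7, "<eos>": 0.3},
    "and": {"the": 1.0},
}


def encode(text):
    """Map whitespace-separated words to token IDs."""
    return [TOKEN_TO_ID[word] for word in text.split()]


def decode(ids, skip_special_tokens=True):
    """Map token IDs back to text, optionally dropping special tokens."""
    tokens = [
        VOCAB[i] for i in ids
        if not (skip_special_tokens and i in SPECIAL_IDS)
    ]
    return " ".join(tokens)


def greedy_generate(input_ids, max_length):
    """Append the highest-scoring next token until max_length or <eos>."""
    ids = list(input_ids)
    while len(ids) < max_length:  # max_length counts the prompt tokens too
        candidates = NEXT_SCORES.get(VOCAB[ids[-1]])
        if not candidates:
            break
        best = max(candidates, key=candidates.get)
        ids.append(TOKEN_TO_ID[best])
        if best == "<eos>":
            break
    return ids


output_ids = greedy_generate(encode("the world"), max_length=20)
print(decode(output_ids))
```

Because the same choice is made every time the model reaches the same token, the toy output cycles through one phrase until max_length is reached. Sampling with do_sample=True is one common way to break such loops.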


**Exercise-2**

**Title:** Generating Text Styles and Translating Languages Using Pre-trained Models

**Problem Statement:**
Demonstrate how to use pre-trained language models for generating text in different styles (creative output) and performing language translation without relying on external APIs.

**Steps to Follow:**

1.	Import Necessary Libraries:

    a.	Import AutoTokenizer from transformers to load the tokenizer that matches each pre-trained model.

    b.	Import AutoModelForCausalLM (for text generation) and AutoModelForSeq2SeqLM (for translation) from transformers to load the model architectures.

2.	Define Functions for Text Generation and Translation:

    a.	generate_text_styling(prompt_text, model_name, max_length=50, temperature=0.9):

        i.	Load the model and tokenizer using AutoModelForCausalLM and AutoTokenizer.
        ii.	Tokenize the input text.
        iii.	Generate text from the input prompt using the model with the specified parameters.
        iv.	Decode the generated token IDs to text.

    b.	translate_text(input_text, target_language, model_name, max_length=100):

        i.	Load the translation model and tokenizer using AutoModelForSeq2SeqLM and AutoTokenizer.
        ii.	Tokenize the input text in the source language. For T5 models, the input should begin with a task prefix such as "translate English to French: ".
        iii.	Generate the translation into the target language using the model with the specified parameters.
        iv.	Decode the generated token IDs to text.

3.	Example Usage:

    a.	generate_text_styling:

        i.	Provide a prompt text.
        ii.	Specify the model name, max_length, and temperature for generating creative output.

    b.	translate_text:

        i.	Provide input text in the source language.
        ii.	Specify the target language, model name, and max_length for translation.




In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM, AutoModelForSeq2SeqLM

def generate_text_styling(prompt_text, model_name="gpt2", max_length=50, temperature=0.9):
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    input_ids = tokenizer.encode(prompt_text, return_tensors='pt')
    # temperature only affects sampling (do_sample=True); this call uses the default greedy decoding.
    output = model.generate(input_ids, max_length=max_length, temperature=temperature, pad_token_id=tokenizer.eos_token_id)

    generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
    return generated_text

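def translate_text(input_text, target_language, model_name="t5-small", max_length=100):
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    # T5 selects its task from a text prefix, so build one from target_language.
    # The original T5 checkpoints were trained to translate English to German, French, and Romanian.
    language_names = {"de": "German", "fr": "French", "ro": "Romanian"}
    if target_language not in language_names:
        raise ValueError(f"Unsupported target language: {target_language}")
    prefix = f"translate English to {language_names[target_language]}: "

    input_ids = tokenizer(prefix + input_text, return_tensors="pt").input_ids
    translated_output = model.generate(input_ids, max_length=max_length, num_beams=4, early_stopping=True)

    translated_text = tokenizer.decode(translated_output[0], skip_special_tokens=True)
    return translated_text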

# Example of generating creative output
prompt_text = "Once upon a time"
creative_output = generate_text_styling(prompt_text)
print("Creative Output:")
print(creative_output)

# Example of translating text
input_text_english = "The cat sat on the mat."
translated_text = translate_text(input_text_english, target_language="fr")
print("\nTranslation Output (French):")
print(translated_text)


Creative Output:
Once upon a time, the world was a place of great beauty and great danger. The world was a place of great danger, and the world was a place of great danger. The world was a place of great danger, and the world was a

Translation Output (French):
Le chat sat on the mat.


**Explanation:**

1.	Importing Libraries: Import necessary libraries from transformers for loading and using pre-trained models (AutoTokenizer, AutoModelForCausalLM, AutoModelForSeq2SeqLM).

2.	Functions:

      a.	generate_text_styling: Loads a GPT-2 model specified by model_name, generates text based on prompt_text with max_length and temperature.

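      b.	translate_text: Loads a T5 model specified by model_name and translates input_text from English to target_language, generating at most max_length tokens. T5 selects its task from a text prefix on the input, such as "translate English to French: "; without that prefix the model tends to echo the source text, which is the likely cause of the partly untranslated sample output "Le chat sat on the mat.".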

3.	Example Usage: Demonstrates how to use these functions for generating creative output and translating text without relying on an external API key.
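The translate_text function passes num_beams=4 to model.generate, which selects beam search instead of greedy decoding. The following is a minimal sketch of the idea on a toy model, assuming invented probabilities; TOY_MODEL and beam_search are hypothetical names, and the sketch omits details that real implementations handle, such as end-of-sequence tokens and length penalties.

```python
import math

# Hypothetical conditional probabilities: the key is the sequence so far,
# the value maps each possible next token to its probability.
TOY_MODEL = {
    (): {"le": 0.6, "un": 0.4},
    ("le",): {"chat": 0.55, "chien": 0.45},
    ("un",): {"chat": 0.9, "chien": 0.1},
}


def beam_search(model, num_beams, steps):
    """Keep the num_beams highest-scoring partial sequences at every step."""
    beams = [((), 0.0)]  # each beam is (sequence, cumulative log-probability)
    for _ in range(steps):
        candidates = []
        for sequence, score in beams:
            for token, prob in model.get(sequence, {}).items():
                candidates.append((sequence + (token,), score + math.log(prob)))
        if not candidates:
            break
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = candidates[:num_beams]
    return beams


# With one beam the search is greedy: it commits to the best first token.
greedy_sequence, greedy_score = beam_search(TOY_MODEL, num_beams=1, steps=2)[0]
# With two beams the second-best first token stays alive and wins overall.
beam_sequence, beam_score = beam_search(TOY_MODEL, num_beams=2, steps=2)[0]

print("Greedy:", " ".join(greedy_sequence), round(math.exp(greedy_score), 2))
print("Beam:  ", " ".join(beam_sequence), round(math.exp(beam_score), 2))
```

In this toy case the greedy path commits to the more likely first word and ends with a lower overall probability than the path beam search finds. Keeping several candidates costs more computation per step, which is the trade-off behind a moderate value such as num_beams=4.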


This approach uses Hugging Face's transformers library to download pre-trained models and run them locally in your Python environment, so these NLP tasks need no API key or external service once the model files are cached.
