
In [None]:
!pip install transformers --upgrade






## **Exploring GPT-2 Architecture and Parameters**

### **Introduction**

In this exercise, we will explore GPT-2, a transformer-based language model from OpenAI known for its ability to generate coherent and creative text. We will examine the model's architecture, count its parameters, and investigate how changing the temperature parameter affects text generation.

### **Objectives**
1. Load the GPT-2 model.
2. Count and display the total number of parameters.
3. Generate text with different temperatures to observe the effect on creativity and randomness.






### **Step 1: Load the GPT-2 Model**

We will start by loading the smallest version of GPT-2. This version is efficient and easy to work with, making it ideal for exploring the model's functionality.


In [None]:


from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load the pre-trained GPT-2 model and tokenizer
model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

print("GPT-2 model loaded successfully!")
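To look at the architecture the introduction mentions, you can read the key hyperparameters from the model's configuration. The sketch below builds a default `GPT2Config`, whose default values correspond to the small `gpt2` checkpoint, so it runs without downloading any weights. With the model loaded above, `model.config` exposes the same attributes.

```python
from transformers import GPT2Config

# Default GPT2Config values match the small "gpt2" checkpoint;
# with the loaded model you could use `config = model.config` instead.
config = GPT2Config()

# Key architecture hyperparameters
print(f"Transformer blocks (n_layer):  {config.n_layer}")
print(f"Attention heads (n_head):      {config.n_head}")
print(f"Hidden size (n_embd):          {config.n_embd}")
print(f"Context length (n_positions):  {config.n_positions}")
print(f"Vocabulary size (vocab_size):  {config.vocab_size}")
# Each attention head works on an n_embd / n_head slice of the hidden state
print(f"Dimension per head:            {config.n_embd // config.n_head}")
```

Running `print(model)` on the loaded model additionally lists every submodule, including the stack of transformer blocks and the output `lm_head`.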


### **Step 2: Count the Number of Parameters**

Let’s find out how many parameters the GPT-2 model contains. This information will help us understand the model's complexity and capacity.




In [None]:

# Count the total number of parameters
total_params = sum(p.numel() for p in model.parameters())

# Print the result
print(f"Total number of parameters in GPT-2: {total_params:,}")




Total number of parameters in GPT-2: 124,439,808
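The printed total can be reproduced by hand from the architecture hyperparameters. The function below is a sketch (`gpt2_param_count` is a hypothetical helper, not part of `transformers`) that assumes the layer layout of the Hugging Face GPT-2 implementation: biased linear layers, two LayerNorms per block, a 4x MLP expansion, and an output head that shares its weight matrix with the token embedding. Because of that weight tying, `model.parameters()` counts the shared matrix only once.

```python
def gpt2_param_count(vocab_size=50257, n_positions=1024, d=768, n_layer=12):
    """Count GPT-2 parameters from its hyperparameters (assumes tied lm_head)."""
    embeddings = vocab_size * d + n_positions * d   # token (wte) + position (wpe) embeddings
    layer_norm = 2 * d                               # LayerNorm weight + bias
    attn = (d * 3 * d + 3 * d) + (d * d + d)         # fused Q/K/V projection + output projection
    mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)      # expand to 4d, then project back to d
    block = 2 * layer_norm + attn + mlp              # ln_1, attention, ln_2, MLP
    final_norm = layer_norm                          # ln_f after the last block
    # lm_head reuses the wte matrix (weight tying), so it adds no new parameters
    return embeddings + n_layer * block + final_norm

print(f"{gpt2_param_count():,}")  # 124,439,808
```

Nearly a third of the total (about 39.4M) sits in the two embedding tables; the rest is spread evenly across the 12 transformer blocks.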


### **Step 3: Test Text Generation with Different Temperatures**

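The `temperature` parameter controls the randomness of the generated text. Before each token is sampled, the model's next-token scores (logits) are divided by the temperature: values below 1 sharpen the probability distribution, making the output more deterministic and repetitive, while values above 1 flatten it and introduce more randomness. We’ll test several temperature values to see how they affect the generated text.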
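A minimal sketch of how temperature reshapes a next-token distribution, using made-up logits rather than real GPT-2 scores. `softmax_with_temperature` is a hypothetical helper that mirrors the divide-then-softmax step applied during sampling:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw logits into probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    # Subtract the max before exponentiating for numerical stability
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up logits for four candidate next tokens
logits = [2.0, 1.0, 0.5, 0.0]
for t in [0.1, 0.3, 0.6, 1.0]:
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: {[round(p, 3) for p in probs]}")
```

At low temperature almost all of the probability lands on the highest-scoring token; at 1.0 the original softmax distribution is left unchanged.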



In [None]:


import torch

# Function to generate text with temperature and attention mask
def generate_text_with_attention_mask(temperature):
    # Input text for the model to generate from
    input_text = "my hole life was "

    # Tokenize the input text
    input_ids = tokenizer.encode(input_text, return_tensors="pt")

    # Create attention mask (1s for real tokens, 0s for padding tokens)
    attention_mask = torch.ones(input_ids.shape, dtype=torch.long)

    # Generate text with the specified temperature
    output = model.generate(
        input_ids,
        max_length=50,
        temperature=temperature,
        do_sample=True,
        attention_mask=attention_mask,
        pad_token_id=tokenizer.eos_token_id
    )

    # Decode and return the generated text
    return tokenizer.decode(output[0], skip_special_tokens=True)

# Test different temperature values
temperatures = [0.1, 0.3, 0.6, 1.0]

for temp in temperatures:
    print(f"\nTemperature: {temp}")
    print(generate_text_with_attention_mask(temp))




Temperature: 0.1
my hole life was  a little bit more fun than I thought it would be. I was able to get a little more out of my life and I'm glad I did. I'm glad I did. I'm glad I did. I

Temperature: 0.3
my hole life was  in the sky, and I was a little bit scared. I was just a little bit scared.
I was a little bit scared. I was a little bit scared. I was a little bit scared.
I

Temperature: 0.6
my hole life was  very good! I am still very happy with my life as I am now in my 30's. I love eating and I want to do more. I am looking forward to the next month and I am looking forward to

Temperature: 1.0
my hole life was _______ on the moon, that it was not so bad until he left the land, and that he returned to earth a week after that on November 12, 1723, when his body was discovered, there was an accident of
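The outputs fit that picture: at 0.1 the text locks into a loop ("I'm glad I did."), while at 1.0 it drifts into an unrelated story. Repetition at low temperature follows directly from the math, because when one token holds nearly all the probability mass, sampling picks it almost every time. A small simulation with made-up logits (not real GPT-2 scores) makes this visible; `sample_token` is a hypothetical helper:

```python
import math
import random
from collections import Counter

def sample_token(logits, temperature, rng):
    """Sample one token index from softmax(logits / temperature)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    weights = [math.exp(x - m) for x in scaled]
    # random.choices normalizes the weights itself
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

rng = random.Random(0)          # fixed seed so the run is repeatable
logits = [2.0, 1.0, 0.5, 0.0]   # made-up logits; token 0 is the most likely
for t in [0.1, 1.0]:
    counts = Counter(sample_token(logits, t, rng) for _ in range(1000))
    print(f"T={t}: {dict(sorted(counts.items()))}")
```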




### Explanation:
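- **`attention_mask`**: The mask is all ones because a single, unpadded prompt contains only real tokens. GPT-2's tokenizer has no dedicated padding token, and when the pad token is the same as the EOS token, `generate()` cannot infer the mask from the input, so passing it explicitly avoids a warning and ambiguous behavior.
- **`pad_token_id`**: Because GPT-2 defines no pad token, `generate()` would otherwise warn and fall back to `eos_token_id` for open-ended generation. Setting `pad_token_id=tokenizer.eos_token_id` makes that choice explicit.
- **`do_sample=True`**: Temperature only takes effect when sampling is enabled; with greedy decoding (`do_sample=False`) the `temperature` argument is ignored.

Passing these arguments explicitly silences the warnings `generate()` would otherwise print and makes the decoding behavior explicit instead of relying on library defaults.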
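The mask only becomes interesting once prompts are batched. Since GPT-2 has no pad token, batching prompts of different lengths means reusing the EOS token (ID 50256) as padding and marking those positions with 0 in the mask. The pure-Python sketch below shows the resulting inputs; `left_pad_batch` is a hypothetical helper and the token IDs are arbitrary. With `transformers` you would instead set `tokenizer.pad_token = tokenizer.eos_token` and `tokenizer.padding_side = "left"`, then call the tokenizer with `padding=True`.

```python
def left_pad_batch(sequences, pad_id):
    """Left-pad token-ID lists to equal length and build the matching attention mask.

    Hypothetical helper for illustration; the tokenizer does this for you
    when called with padding=True.
    """
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        # Pad on the left so every prompt ends at the last position,
        # which is where a decoder-only model continues generating
        input_ids.append([pad_id] * n_pad + seq)
        attention_mask.append([0] * n_pad + [1] * len(seq))
    return input_ids, attention_mask

EOS_ID = 50256  # GPT-2's end-of-text token, reused as the pad token
ids, mask = left_pad_batch([[101, 102, 103], [201]], pad_id=EOS_ID)
print(ids)   # [[101, 102, 103], [50256, 50256, 201]]
print(mask)  # [[1, 1, 1], [0, 0, 1]]
```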





### **Exploring Hugging Face Pipelines**

In this exercise, you will work with two Hugging Face pipelines to perform different NLP tasks. You will also have the opportunity to explore additional pipelines on your own.

#### **1. Sentiment Analysis Pipeline**

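The sentiment analysis pipeline classifies the sentiment of a given piece of text. The default model (a DistilBERT checkpoint fine-tuned on SST-2) is binary: it outputs only POSITIVE or NEGATIVE, with no neutral class, so even neutral text is assigned one of those two labels.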



**Task 1:**
- Test the sentiment analysis pipeline with different texts, such as product reviews or social media comments.
- Analyze how the sentiment of various texts is classified.

In [2]:
# Install the transformers library if you haven't already
!pip install transformers

# Import necessary libraries
from transformers import pipeline

# Create the sentiment analysis pipeline
sentiment_pipeline = pipeline("sentiment-analysis")

# List of texts to analyze
texts = [
    "I absolutely love this product! It works great and exceeded my expectations.",
    "This is the worst purchase I've ever made. Totally regret it.",
    "The service was okay, nothing special, but it wasn't terrible either.",
    "I feel neutral about this item; it's just average.",
    "Fantastic experience! Would highly recommend to everyone.",
    "I hate this! It's a complete waste of money."
]

# Analyze sentiment for each text and print results
for text in texts:
    result = sentiment_pipeline(text)
    print(f"Text: {text}\nSentiment: {result[0]['label']}, Score: {result[0]['score']:.4f}\n")





No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


Text: I absolutely love this product! It works great and exceeded my expectations.
Sentiment: POSITIVE, Score: 0.9999

Text: This is the worst purchase I've ever made. Totally regret it.
Sentiment: NEGATIVE, Score: 0.9998

Text: The service was okay, nothing special, but it wasn't terrible either.
Sentiment: NEGATIVE, Score: 0.9191

Text: I feel neutral about this item; it's just average.
Sentiment: NEGATIVE, Score: 0.9900

Text: Fantastic experience! Would highly recommend to everyone.
Sentiment: POSITIVE, Score: 0.9999

Text: I hate this! It's a complete waste of money.
Sentiment: NEGATIVE, Score: 0.9998
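The high scores on the two neutral sentences come from how binary classification works: the pipeline applies softmax over the model's two output logits and reports the larger probability, so the winning label always scores at least 0.5, and often far more, even for ambivalent text. A sketch of that post-processing step with made-up logits, assuming the checkpoint's label order is NEGATIVE (index 0) then POSITIVE (index 1); `logits_to_prediction` is a hypothetical helper:

```python
import math

# Label order assumed for the default SST-2 checkpoint
ID2LABEL = {0: "NEGATIVE", 1: "POSITIVE"}

def logits_to_prediction(logits):
    """Mimic the pipeline output: softmax over the logits, then keep the top class."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": ID2LABEL[best], "score": probs[best]}

# Made-up logits for one sentence: POSITIVE with a score of about 0.982
print(logits_to_prediction([-2.0, 2.0]))
```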



**Task 2:**

#### **Image-to-Text (Captioning) Pipeline**

The image-to-text pipeline generates captions for images. For this exercise, you will need to find and use an image-to-text pipeline from Hugging Face.

**Instructions:**

1. **Search for an Image-to-Text Pipeline:**
   - Go to the Hugging Face Model Hub.
   - Search for an image-to-text model, such as `Salesforce/blip-image-captioning-base` or similar.

2. **Set Up the Pipeline:**
   - Use the model you found to create an image-to-text pipeline.

3. **Generate Captions:**
   - Test the pipeline by providing images and observing the generated captions.
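The steps above ask for a pipeline, while the cell below loads `BlipProcessor` and `BlipForConditionalGeneration` directly. For comparison, here is a sketch of the `pipeline` route with the same checkpoint. It downloads the model on first use, and `caption_with_pipeline` is a hypothetical helper name.

```python
from PIL import Image
from transformers import pipeline

# Wrap the BLIP captioning checkpoint in an image-to-text pipeline
captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")

def caption_with_pipeline(image):
    """Return a caption for a file path, URL, or PIL image."""
    # For a single image the pipeline returns a list like [{"generated_text": "..."}]
    result = captioner(image)
    return result[0]["generated_text"]
```

Usage mirrors the manual version, e.g. `caption_with_pipeline("/content/drive/MyDrive/images/image2.jpg")`. The pipeline hides preprocessing and decoding, which is convenient for quick tests; loading the processor and model yourself gives finer control over generation arguments.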

In [5]:
# Install necessary libraries
!pip install transformers
!pip install torch torchvision
!pip install Pillow  # For image processing

# Import necessary libraries
import os
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Load the processor and model from Hugging Face.
# A distinct name keeps the GPT-2 `model` from the first exercise from being overwritten.
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
caption_model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

# Function to generate a caption for an image
def generate_caption(image_path):
    # Open the image and convert it to RGB (handles PNGs with alpha and grayscale images)
    image = Image.open(image_path).convert("RGB")

    # Preprocess the image into model inputs
    inputs = processor(images=image, return_tensors="pt")

    # Generate caption token IDs and decode them to text
    output = caption_model.generate(**inputs)
    caption = processor.decode(output[0], skip_special_tokens=True)

    return caption

# Import the drive module from google.colab and mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

# Folder in Google Drive that holds your images (update 'images' to the actual folder name)
image_directory = "/content/drive/MyDrive/images"

# Example usage: replace the file names with your own images
image_paths = [
    os.path.join(image_directory, "imag1.jpg"),
    os.path.join(image_directory, "image2.jpg"),
]

# Generate captions for each image
for img_path in image_paths:
    caption = generate_caption(img_path)
    print(f"Image: {img_path}\nCaption: {caption}\n")






Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).




Image: /content/drive/MyDrive/images/imag1.jpg
Caption: a cat with a green background

Image: /content/drive/MyDrive/images/image2.jpg
Caption: a red rose with green leaves on a white background

