<a href="https://colab.research.google.com/github/anshupandey/MA_AI900/blob/main/Lab1_HuggingFace_Text_Generation_with_LLMs.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Experimenting with Open Models

## HuggingFace: Using Pre-trained LLMs

### LLM: GPT-2

Context: https://huggingface.co/openai-community/gpt2

In [None]:
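from transformers import pipeline, set_seed
set_seed(42)  # fix the seed so sampled generations are reproducible
generator = pipeline('text-generation', model='openai-community/gpt2')
# max_length counts the prompt tokens plus the generated tokens
generator("Hello, I'm a language model,", max_length=30)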
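
By default the text-generation pipeline returns the prompt followed by the continuation in a single `generated_text` string. The sketch below shows one way to keep only the newly generated part. The helper name `strip_prompt` and the sample output are hypothetical and only mimic the shape of the pipeline's return value, so no model is loaded here.

```python
def strip_prompt(prompt, outputs):
    """Return only the continuation from text-generation style outputs.

    `outputs` is assumed to be a list of dicts with a "generated_text" key,
    which is the shape the pipeline call above returns.
    """
    continuations = []
    for item in outputs:
        text = item["generated_text"]
        # Drop the echoed prompt when the text starts with it
        if text.startswith(prompt):
            text = text[len(prompt):]
        continuations.append(text.strip())
    return continuations


# Hypothetical output, shaped like a pipeline result (not real model output)
sample_outputs = [
    {"generated_text": "Hello, I'm a language model, and I like to write."}
]
print(strip_prompt("Hello, I'm a language model,", sample_outputs))
```

In a live session the same helper could be applied to the list returned by `generator(...)`.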


### Microsoft Phi2


Phi-2 is a Transformer with 2.7 billion parameters. It was trained on the same data sources as Phi-1.5, augmented with a new data source consisting of synthetic NLP texts and websites filtered for safety and educational value. On benchmarks testing common sense, language understanding, and logical reasoning, Phi-2 showed nearly state-of-the-art performance among models with fewer than 13 billion parameters.


HF: https://huggingface.co/microsoft/phi-2

**Key Features:**

*   **Parameters:** 2.7 billion
*   **Architecture:** Transformer
*   **Training Data:** Same sources as Phi-1.5, plus new NLP synthetic texts and filtered websites.
*   **Performance:** Near state-of-the-art among models under 13 billion parameters on common sense, language understanding, and logical reasoning benchmarks.
*   **Input Modality:** Text
*   **Output Modality:** Text
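*   **Input Context Window Size:** 2,048 tokens per the model card; the prompt and the generated output share this window.
*   **Output Context Window Size:** No separate limit; generated tokens count against the same 2,048-token window.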
*   **Organization:** Microsoft

**Use Cases:**

*   Text generation tasks.
*   Applications requiring strong common sense, language understanding, and logical reasoning.
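
The 2.7 billion parameter count gives a rough idea of how much memory the weights need before the model is loaded. The sketch below is a back-of-the-envelope estimate only: it counts the weights and ignores activations, the KV cache, and framework overhead. The helper name `weight_memory_gib` and the bytes-per-parameter table are illustrative assumptions, not part of any library.

```python
# Bytes used to store one parameter in common numeric formats
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2, "int8": 1}


def weight_memory_gib(num_params, dtype="float16"):
    """Approximate memory for the model weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


PHI2_PARAMS = 2.7e9  # parameter count stated above

for dtype in ("float32", "float16"):
    print(f"Phi-2 weights in {dtype}: {weight_memory_gib(PHI2_PARAMS, dtype):.1f} GiB")
```

Under these assumptions the weights take roughly 10 GiB in `float32` and roughly 5 GiB in `float16`, which is why half precision is a common choice on memory-limited GPUs.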

In [None]:
# Use a pipeline as a high-level helper
from transformers import pipeline
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = pipeline("text-generation", model="microsoft/phi-2", device=device)

In [None]:
pipe("Explain quantum computing in simple terms.", max_length=30, num_return_sequences=1)

### Qwen/Qwen3-0.6B

Qwen3-0.6B is a 0.6-billion-parameter language model, the smallest in the Qwen3 family from Alibaba Cloud's Qwen team. The Qwen family reports strong results across a range of benchmarks.

**Key Features:**

*   **Parameters:** 0.6 billion
*   **Input Modality:** Text
*   **Output Modality:** Text
*   **Input Context Window Size:** 32,768 tokens per the model card; the prompt and the generated output share this window.
*   **Output Context Window Size:** No separate limit; generated tokens count against the same window.
*   **Training Data:** Trained on a large corpus of text and code data.
*   **Organization:** Qwen (Alibaba Cloud)

**Use Cases:**

*   Text generation tasks.
*   Applications where a smaller, efficient model is suitable.

**Hugging Face:** [https://huggingface.co/Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B)

In [None]:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Qwen/Qwen3-0.6B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
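
When the pipeline is called with a list of chat messages, the returned `generated_text` is itself a list of messages with the assistant's reply appended at the end. Qwen3 models may also wrap their reasoning in `<think>...</think>` tags before the final answer. The sketch below pulls out the last assistant reply and removes such a block. The helper names and the sample output are hypothetical and only mirror the expected output shape, so treat this as an assumption to verify against a real run.

```python
import re


def last_assistant_reply(outputs):
    """Return the content of the final assistant message, or "" if none."""
    messages = outputs[0]["generated_text"]
    for message in reversed(messages):
        if message["role"] == "assistant":
            return message["content"]
    return ""


def strip_thinking(text):
    """Remove any <think>...</think> block and surrounding whitespace."""
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()


# Hypothetical output, shaped like a chat pipeline result (not real model output)
sample = [{
    "generated_text": [
        {"role": "user", "content": "Who are you?"},
        {"role": "assistant",
         "content": "<think>\nThe user asks who I am.\n</think>\n\nI am Qwen, a language model."},
    ]
}]
print(strip_thinking(last_assistant_reply(sample)))
```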

## Multimodal LLMs

### Microsoft Kosmos-2.5

Kosmos-2.5 is a multimodal literate model for machine reading of text-intensive images. Pre-trained on large-scale text-intensive images, Kosmos-2.5 excels at two distinct yet cooperative transcription tasks: (1) generating spatially aware text blocks, where each block of text is assigned its spatial coordinates within the image, and (2) producing structured text output that captures styles and structure in Markdown format.
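
To make the first task concrete, the sketch below parses spatially aware output into boxes and text. It assumes each text block is prefixed by a tag of the form `<bbox><x_..><y_..><x_..><y_..></bbox>`, which is how the base Kosmos-2.5 model is understood to emit OCR results; the sample string is made up for illustration and the helper name `parse_text_blocks` is hypothetical. The chat variant used later in this notebook answers questions in plain text, so this applies to OCR-style output only.

```python
import re

# One bounding-box tag: top-left (x, y) followed by bottom-right (x, y)
BBOX = re.compile(r"<bbox><x_(\d+)><y_(\d+)><x_(\d+)><y_(\d+)></bbox>")


def parse_text_blocks(raw):
    """Split raw OCR-style output into a list of {"box", "text"} dicts."""
    matches = list(BBOX.finditer(raw))
    blocks = []
    for i, match in enumerate(matches):
        # A block's text runs from the end of its tag to the start of the next tag
        end = matches[i + 1].start() if i + 1 < len(matches) else len(raw)
        box = tuple(int(value) for value in match.groups())
        blocks.append({"box": box, "text": raw[match.end():end].strip()})
    return blocks


# Made-up sample in the assumed format (not real model output)
sample = (
    "<bbox><x_10><y_20><x_110><y_40></bbox>SUBTOTAL\n"
    "<bbox><x_120><y_20><x_180><y_40></bbox>12.50\n"
)
for block in parse_text_blocks(sample):
    print(block["box"], block["text"])
```

The resulting coordinates could then be drawn onto the image, for example with `PIL.ImageDraw`, after scaling them to the original image size.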

In [None]:
!pip install --upgrade transformers

In [None]:
import torch
import requests
from PIL import Image

repo = "microsoft/kosmos-2.5-chat"
# Fall back to CPU when no GPU is available
device = "cuda:0" if torch.cuda.is_available() else "cpu"
dtype = torch.bfloat16

# Sample receipt image hosted alongside the Kosmos-2.5 model
url = "https://huggingface.co/microsoft/kosmos-2.5/resolve/main/receipt_00008.png"

image = Image.open(requests.get(url, stream=True).raw)

image


In [None]:
# Use a pipeline as a high-level helper
from transformers import pipeline

# Reuse the repo, device, and dtype defined in the previous cell
pipe = pipeline("image-text-to-text", model=repo, device=device, torch_dtype=dtype)

In [None]:
question = "What is the sub total of the receipt?"
template = " USER: {} ASSISTANT:"
prompt = template.format(question)


generated_text = pipe(images=image, text=prompt, max_new_tokens=1024)
generated_text[0]["generated_text"]
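
Depending on pipeline settings, the returned text may echo the prompt before the model's answer. The sketch below keeps only what follows the last `ASSISTANT:` marker from the template above and falls back to the whole string when the marker is absent. The helper name `extract_answer` and the sample strings are hypothetical, not real model output.

```python
def extract_answer(generated, marker="ASSISTANT:"):
    """Return the text after the last marker, or the whole text if it is absent."""
    # rpartition yields ("", "", generated) when the marker is not found
    return generated.rpartition(marker)[2].strip()


# Made-up strings shaped like possible pipeline output
with_prompt = " USER: What is the sub total of the receipt? ASSISTANT: 12.50"
without_prompt = "12.50"

print(extract_answer(with_prompt))
print(extract_answer(without_prompt))
```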

## Thank You