# Self-Learning Hugging Face: Using Pre-trained LLMs

### LLM: GPT-2

**Hugging Face:** [https://huggingface.co/openai-community/gpt2](https://huggingface.co/openai-community/gpt2)

In [None]:
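from transformers import pipeline, set_seed

# Text-generation pipeline backed by the 124M-parameter GPT-2 checkpoint
generator = pipeline('text-generation', model='openai-community/gpt2')
set_seed(42)  # make the sampled outputs reproducible
# max_length counts prompt + generated tokens; return two different samples
generator("Hello, I'm a language model,", max_length=30, num_return_sequences=2)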


### Microsoft Phi-2


Phi-2 is a Transformer with 2.7 billion parameters. It was trained on the same data sources as Phi-1.5, augmented with a new source consisting of various synthetic NLP texts and websites filtered for safety and educational value. On benchmarks testing common sense, language understanding, and logical reasoning, Phi-2 showed nearly state-of-the-art performance among models with fewer than 13 billion parameters.


**Hugging Face:** [https://huggingface.co/microsoft/phi-2](https://huggingface.co/microsoft/phi-2)

In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model to GPU (cuda) if available
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

# Load tokenizer and model from Hugging Face
model_id = "microsoft/phi-2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Half precision on GPU halves weight memory; use full precision on CPU,
# where float16 support is limited
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
)
model.to(device)
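The `torch_dtype` choice above matters mostly for memory. A back-of-the-envelope sketch, assuming only the weights are counted (activations, the KV cache, and framework overhead come on top) and taking Phi-2's nominal 2.7 billion parameters; the helper name is hypothetical:

```python
# Rough estimate of the memory needed just to hold model weights.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Return approximate weight memory in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


phi2_params = 2.7e9  # nominal parameter count of Phi-2
for dtype in ("float32", "float16"):
    print(f"{dtype}: ~{weight_memory_gb(phi2_params, dtype):.1f} GB")
```

This prints about 10.8 GB for float32 and 5.4 GB for float16, which is why half precision is the usual choice on a single consumer GPU.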


In [None]:
# Your prompt
prompt = "Explain quantum computing in simple terms."

# Tokenize input
inputs = tokenizer(prompt, return_tensors="pt").to(device)

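# Generate up to 100 new tokens (no gradients are needed for inference);
# Phi-2's tokenizer has no pad token, so reuse EOS to silence the warning
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=100, pad_token_id=tokenizer.eos_token_id)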

# Decode and print
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
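`model.generate` builds the output one token at a time: the model scores candidate next tokens, one is chosen and appended, and the loop repeats until an end-of-sequence token appears or `max_new_tokens` is reached. For decoder-only models the returned sequence starts with the prompt tokens, which is why the decoded text above repeats the prompt. A toy sketch of the simplest choice rule, greedy decoding; the bigram "model" is a hypothetical lookup table standing in for a real network, which would score the whole vocabulary with logits (and `generate` also supports sampling and beam search):

```python
from typing import Callable, Dict, List


def greedy_generate(
    next_token_scores: Callable[[List[str]], Dict[str, float]],
    prompt: List[str],
    max_new_tokens: int,
    eos_token: str = "<eos>",
) -> List[str]:
    """Append the highest-scoring token one step at a time (greedy decoding)."""
    tokens = list(prompt)  # output starts with the prompt, like generate()
    for _ in range(max_new_tokens):
        scores = next_token_scores(tokens)      # score each candidate next token
        best = max(scores, key=scores.get)      # greedy: pick the arg-max
        tokens.append(best)
        if best == eos_token:                   # stop early at end-of-sequence
            break
    return tokens


# Hypothetical toy "model": scores depend only on the last token (a bigram table).
BIGRAMS = {
    "quantum": {"computing": 0.9, "physics": 0.1},
    "computing": {"uses": 0.6, "<eos>": 0.4},
    "uses": {"qubits": 0.8, "bits": 0.2},
    "qubits": {"<eos>": 1.0},
}


def toy_model(tokens: List[str]) -> Dict[str, float]:
    return BIGRAMS.get(tokens[-1], {"<eos>": 1.0})


print(greedy_generate(toy_model, ["quantum"], max_new_tokens=10))
# ['quantum', 'computing', 'uses', 'qubits', '<eos>']
```

With `max_new_tokens=2` the same call stops early at `['quantum', 'computing', 'uses']`, mirroring how the real parameter caps output length.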


### DeepSeek R1-Distill-Qwen-1.5B

DeepSeek R1-Distill-Qwen-1.5B is a small reasoning model distilled from DeepSeek-R1: DeepSeek fine-tuned the Qwen2.5-Math-1.5B base model on reasoning samples generated by the much larger DeepSeek-R1. Distillation is a technique where a smaller model (the student) is trained to mimic the behavior of a larger, more capable model (the teacher). The result is often a smaller, faster model that retains much of the teacher's performance.
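To make "mimic the teacher" concrete, here is a minimal sketch of the classic soft-target form of distillation: the student is penalized by the KL divergence between the teacher's and its own temperature-softened output distributions. This is the general idea only, not DeepSeek's recipe; their distilled models were produced by fine-tuning on text generated by DeepSeek-R1. Function names and logit values are hypothetical:

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax; higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)                          # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: List[float], teacher_logits: List[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)  # teacher's soft targets
    q = softmax(student_logits, temperature)  # student's predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


teacher = [4.0, 1.0, 0.5]
print(distillation_loss([4.0, 1.0, 0.5], teacher))  # 0.0: student matches teacher
print(distillation_loss([0.5, 1.0, 4.0], teacher))  # > 0: student disagrees
```

The `T^2` factor keeps the gradient scale roughly comparable across temperatures; in real training this term is computed over the vocabulary at every token position.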

**Key Features:**

*   **Distilled Model:** Benefits from the knowledge of a larger model while being more efficient.
*   **Qwen Base:** Built on the Qwen2.5-Math-1.5B base model.
*   **Text Generation:** Capable of generating text based on provided prompts.
*   **Parameters:** 1.5 billion
*   **Input Modality:** Text
*   **Output Modality:** Text
*   **Input Context Window Size:** Not stated here; the position limit is given by `max_position_embeddings` in the model's `config.json`.
*   **Output Length:** Shares the context window with the prompt; capped per call by `max_new_tokens`.
*   **Training Data:** Fine-tuned on reasoning data generated by DeepSeek-R1 (the teacher), starting from Qwen2.5-Math-1.5B.
*   **Organization:** DeepSeek AI

**Use Cases:**

*   Text generation tasks where efficiency is important.
*   Applications on devices with limited computational resources.

**Hugging Face:** [https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B)

In [None]:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
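# R1-style models write a long chain of thought before answering, so set an explicit budget
pipe(messages, max_new_tokens=512)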
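R1-style models write their reasoning between `<think>` and `</think>` before the final answer. A small helper (hypothetical, not part of `transformers`) that separates the two, assuming that tag format; it tolerates a missing opening tag, since some chat templates already place `<think>` in the prompt:

```python
from typing import Tuple


def split_reasoning(text: str) -> Tuple[str, str]:
    """Separate the <think>...</think> reasoning from the final answer.

    Assumes the reasoning, if present, ends with a closing </think> tag.
    """
    if "</think>" not in text:
        return "", text.strip()              # no reasoning block found
    reasoning, answer = text.split("</think>", 1)
    reasoning = reasoning.replace("<think>", "", 1)
    return reasoning.strip(), answer.strip()


reply = "<think>\nThe user asks who I am.\n</think>\n\nI am an AI assistant."
print(split_reasoning(reply))
# ('The user asks who I am.', 'I am an AI assistant.')
```

With chat-style input, the pipeline's `generated_text` is typically the conversation as a list of messages, so the reply text to pass in would be `out[0]["generated_text"][-1]["content"]`.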

### Qwen/Qwen3-0.6B

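Qwen3-0.6B is the smallest model in the Qwen3 family from the Qwen team at Alibaba Cloud, with 0.6 billion parameters. Like its larger siblings, it can switch between a "thinking" mode that reasons step by step before answering and a faster non-thinking mode for general chat.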

**Key Features:**

*   **Parameters:** 0.6 billion
*   **Input Modality:** Text
*   **Output Modality:** Text
*   **Input Context Window Size:** 32,768 tokens (per the model card).
*   **Output Length:** Shares the context window with the prompt; capped per call by `max_new_tokens`.
*   **Training Data:** Trained on a large corpus of text and code data.
*   **Organization:** Qwen team (Alibaba Cloud)

**Use Cases:**

*   Text generation tasks.
*   Applications where a smaller, efficient model is suitable.

**Hugging Face:** [https://huggingface.co/Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B)

In [None]:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Qwen/Qwen3-0.6B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Qwen3 reasons step by step by default, so leave room for the thinking block
pipe(messages, max_new_tokens=512)
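When `pipe` receives a list of `{"role", "content"}` messages rather than a plain string, it first renders them into a single prompt with the model's chat template. Qwen models use ChatML-style `<|im_start|>`/`<|im_end|>` markers. The sketch below is a simplified, hypothetical re-implementation for illustration only; the real Qwen3 template also handles system prompts, tools, and thinking tags, so in practice use `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)`:

```python
from typing import Dict, List


def to_chatml(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Simplified ChatML: wrap each turn in <|im_start|>role ... <|im_end|>."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")  # cue the model to answer next
    return "".join(parts)


print(to_chatml([{"role": "user", "content": "Who are you?"}]))
```

The trailing `<|im_start|>assistant` line is what `add_generation_prompt=True` adds: it tells the model that the next turn is its own.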

### TinyLlama/TinyLlama-1.1B-Chat-v1.0

TinyLlama-1.1B-Chat-v1.0 is a compact 1.1-billion-parameter chat model built on the Llama 2 architecture and tokenizer. Its small size makes it practical to deploy in resource-constrained environments.

**Key Features:**

*   **Parameters:** 1.1 billion
*   **Input Modality:** Text
*   **Output Modality:** Text
*   **Input Context Window Size:** 2,048 tokens.
*   **Output Length:** Shares the context window with the prompt; capped per call by `max_new_tokens`.
*   **Training Data:** Pre-trained on roughly 3 trillion tokens, then fine-tuned on a variant of the UltraChat dataset and aligned with DPO on UltraFeedback.
*   **Organization:** TinyLlama

**Use Cases:**

*   Chatbots and conversational AI applications.
*   Text generation in interactive scenarios.
*   Deployment on devices with limited computational resources.

**Hugging Face:** [https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0)

In [None]:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages, max_new_tokens=256)
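TinyLlama, like the other models here, is decoder-only: the prompt and the reply share one context window (2,048 tokens for TinyLlama, per its model card), so a long prompt leaves less room for `max_new_tokens`. A small sketch of that arithmetic; the helper name and token counts are hypothetical, and a real prompt length would come from something like `len(tokenizer(prompt)["input_ids"])`:

```python
def max_new_tokens_budget(prompt_tokens: int, context_window: int, requested: int) -> int:
    """Largest number of new tokens that still fits in the context window.

    In a decoder-only model, prompt and generated tokens share one window.
    """
    room = context_window - prompt_tokens
    if room <= 0:
        raise ValueError("prompt alone fills the context window; truncate it first")
    return min(requested, room)


# Hypothetical counts: a 1,900-token prompt in a 2,048-token window
print(max_new_tokens_budget(prompt_tokens=1900, context_window=2048, requested=512))
# 148
```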

## Multimodal LLMs

### Microsoft Kosmos-2.5

Kosmos-2.5 is a multimodal literate model for machine reading of text-intensive images. Pre-trained on large-scale text-intensive images, it excels at two distinct yet cooperative transcription tasks: (1) generating spatially-aware text blocks, where each block of text is assigned its spatial coordinates within the image, and (2) producing structured text output that captures styles and structure in Markdown format. The code below uses `microsoft/kosmos-2.5-chat`, a variant for answering questions about document images.
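For the first task, the Kosmos-2.5 model card prompts the base `microsoft/kosmos-2.5` checkpoint with `<ocr>` (and with `<md>` for Markdown output). As we understand its example, the OCR output interleaves each text line with box tokens of the form `<bbox><x_10><y_20><x_30><y_40></bbox>`, in coordinates of the processor's resized image. A hedged sketch of a parser for that format, assuming exactly this tag layout; `parse_ocr_output` and the sample string are hypothetical, and `scale_w`/`scale_h` would be the ratios of the original image size to the resized one:

```python
import re
from typing import List, Tuple

# Assumed box format: <bbox><x_X0><y_Y0><x_X1><y_Y1></bbox> followed by the text line
BBOX_PATTERN = re.compile(r"<bbox><x_(\d+)><y_(\d+)><x_(\d+)><y_(\d+)></bbox>")


def parse_ocr_output(text: str, scale_w: float = 1.0,
                     scale_h: float = 1.0) -> List[Tuple[Tuple[int, int, int, int], str]]:
    """Return (box, text) pairs, with boxes rescaled to original-image pixels."""
    matches = list(BBOX_PATTERN.finditer(text))
    blocks = []
    for i, m in enumerate(matches):
        x0, y0, x1, y1 = (int(v) for v in m.groups())
        # The text for this box runs until the next box (or the end of the output)
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        line = text[m.end():end].strip()
        if x0 >= x1 or y0 >= y1:
            continue  # skip degenerate boxes
        box = (int(x0 * scale_w), int(y0 * scale_h), int(x1 * scale_w), int(y1 * scale_h))
        blocks.append((box, line))
    return blocks


sample = "<bbox><x_10><y_20><x_30><y_40></bbox>Total\n<bbox><x_0><y_0><x_5><y_5></bbox>Tax"
print(parse_ocr_output(sample, scale_w=2.0, scale_h=2.0))
# [((20, 40, 60, 80), 'Total'), ((0, 0, 10, 10), 'Tax')]
```

Text before the first box is ignored; the rescaled boxes could then be drawn onto the image with `PIL.ImageDraw`.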

In [None]:
!pip install --upgrade transformers

In [None]:
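import torch
import requests
from PIL import Image

repo = "microsoft/kosmos-2.5-chat"
# Use the GPU with bfloat16 when available; fall back to CPU with float32
device = "cuda:0" if torch.cuda.is_available() else "cpu"
dtype = torch.bfloat16 if device != "cpu" else torch.float32

# Sample receipt image from the Kosmos-2.5 model repository
url = "https://huggingface.co/microsoft/kosmos-2.5/resolve/main/receipt_00008.png"
image = Image.open(requests.get(url, stream=True).raw)

image  # display the image in the notebook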


In [None]:
# Use a pipeline as a high-level helper
from transformers import pipeline

# Reuse the repo, device, and dtype chosen in the previous cell
pipe = pipeline("image-text-to-text", model=repo, device=device, torch_dtype=dtype)

In [None]:
question = "What is the subtotal of the receipt?"
# Chat-style prompt: the question is wrapped in USER/ASSISTANT turns
template = " USER: {} ASSISTANT:"
prompt = template.format(question)

outputs = pipe(images=image, text=prompt, max_new_tokens=1024)
outputs[0]["generated_text"]

## Thank You