[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/dbamman/anlp25/blob/main/10.llms/Prompting%20Local%20LLMs.ipynb)

In this notebook, we'll explore few-shot learning with [Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B); this model can fit within the memory and processing constraints of a T4 GPU on Google Colab while also being openly available.

Then we will use quantization to fit a larger model ([Qwen3-14B](https://huggingface.co/Qwen/Qwen3-14B)) on the T4 GPU by storing the model weights in 4 bits instead of the full 16 bits.
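
To give a feel for what "4 bits per weight" means, here is a toy sketch of round-to-nearest quantization with a single shared scale. It is only an illustration: the helper names `quantize_4bit` and `dequantize_4bit` are made up for this example, and the 4-bit formats used by the quantization library later in the notebook differ from this simple scheme.

```python
def quantize_4bit(weights):
    """Map floats to signed integer codes in [-7, 7] using one shared scale.

    Toy illustration only; not the scheme used by any particular library.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [round(w / scale) for w in weights]
    return codes, scale


def dequantize_4bit(codes, scale):
    """Recover approximate float weights from the integer codes."""
    return [c * scale for c in codes]


weights = [0.7, -0.3, 0.12, 0.0]
codes, scale = quantize_4bit(weights)
restored = dequantize_4bit(codes, scale)

# Each code fits in 4 bits; the price is a small rounding error per weight.
for w, c, r in zip(weights, codes, restored):
    print(f"original={w:+.3f}  code={c:+d}  restored={r:+.3f}")
```

Each restored weight is off by at most half the scale, which is the trade-off quantization makes in exchange for the smaller memory footprint.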

Can you create a new classification task and design prompts to differentiate between the classes within it?  

In [1]:
from textwrap import dedent

In [2]:
import torch
from torch.nn import functional as F

In [3]:
from transformers import AutoModelForCausalLM, AutoTokenizer

In [4]:
# check that the GPU is available
torch.cuda.is_available()

True

## Qwen3-4B


In [5]:
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-4B", device_map="cuda", dtype="auto")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B")


In [6]:
def classify_with_prompt(labels, shots, target_x, thinking=False):
    system_prompt = dedent(f"""
        You're a helpful assistant for text classification. You'll be given an input text and need to output a single choice from the following set of categories:
        {', '.join(labels)}
        Pick one of those labels and do not generate any other text.
    """).strip()

    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": shots[0]["X"]}, {"role": "assistant", "content": shots[0]["y"]},
        {"role": "user", "content": shots[1]["X"]}, {"role": "assistant", "content": shots[1]["y"]},
        {"role": "user", "content": shots[2]["X"]}, {"role": "assistant", "content": shots[2]["y"]},
        {"role": "user", "content": target_x}
    ]
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=thinking # Switches between thinking and non-thinking modes. Default is True.
    )

    model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

    # conduct text completion
    generated = model.generate(
        **model_inputs,
        max_new_tokens=32768
    )

    # let's break this down:
    #                      | we take the element of the batch (our batch size is 1)
    #                      |  |-----------------------------| skip our original input
    output_ids = generated[0][len(model_inputs.input_ids[0]):].tolist()

    # decode the generated token IDs back into text
    print(tokenizer.decode(output_ids, skip_special_tokens=True).strip("\n"))
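
The function above indexes `shots[0]` through `shots[2]`, so it expects exactly three examples. A minimal sketch of how the message list could be built for any number of shots follows; `build_messages` is a hypothetical helper, not part of the notebook, and it only constructs the list of chat messages (no model or tokenizer is needed to run it).

```python
def build_messages(system_prompt, shots, target_x):
    """Build a chat-style message list from any number of few-shot examples.

    Each shot is a dict with keys "X" (input text) and "y" (label),
    matching the format used in this notebook.
    """
    messages = [{"role": "system", "content": system_prompt}]
    for shot in shots:
        # Each example becomes one user turn followed by the assistant's label.
        messages.append({"role": "user", "content": shot["X"]})
        messages.append({"role": "assistant", "content": shot["y"]})
    # The final user turn is the text we actually want classified.
    messages.append({"role": "user", "content": target_x})
    return messages


demo = build_messages(
    "Pick one label: positive, negative",
    [
        {"X": "I love this movie", "y": "positive"},
        {"X": "I hate this movie", "y": "negative"},
    ],
    "This is one of the best movies I've ever seen",
)
for message in demo:
    print(message["role"], "->", message["content"])
```

The resulting list could be passed to `tokenizer.apply_chat_template` in place of the hard-coded `messages` list.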

In [7]:
shots = [
    {"X":"I love this movie", "y": "positive"},
    {"X":"I hate this movie", "y": "negative"},
    {"X":"I kind of like the movie", "y": "positive"}
]

target_x = "This is one of the best movies I've ever seen"

classify_with_prompt(["positive", "negative"], shots, target_x)

positive


In [8]:
shots = [
    {"X":"Vampires take over the planet during an eclipse", "y": "Horror"},
    {"X":"A court sentences George to be Jerry's butler", "y": "Comedy"},
    {"X":"John turns into a werewolf during a full moon", "y": "Horror"}
]

target_x = "John is a werewolf who plays basketball"

classify_with_prompt(["Horror", "Comedy"], shots, target_x)

Horror


In [9]:
shots = [
    {"X":"This is a text", "y": "English"},
    {"X":"Nel mezzo del cammin' di nostra vita", "y": "Italian"},
    {"X":"Je ne sais pas", "y": "French"},
]

target_x = "Siempre imaginé que el Paraíso sería algún tipo de biblioteca"

classify_with_prompt(["English", "Italian", "French", "Spanish", "Japanese"], shots, target_x)

Spanish


Construct a new classification task; try to find one that the 4B model fails on.
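
One way to hunt for failures systematically is to run a small labeled set through a classifier and collect the mismatches. The sketch below assumes a hypothetical `predict` callable that returns a label string; `classify_with_prompt` as written prints its answer, so it would need to return the decoded text instead to be used this way. The `keyword_predict` stand-in and the four examples are invented purely so the snippet runs without a model.

```python
def find_failures(examples, predict):
    """Return (text, gold, guess) triples where the prediction is wrong.

    `predict` is any callable mapping a text to a label string.
    """
    failures = []
    for text, gold in examples:
        guess = predict(text)
        # Compare case-insensitively and ignore surrounding whitespace.
        if guess.strip().lower() != gold.strip().lower():
            failures.append((text, gold, guess))
    return failures


def keyword_predict(text):
    """Hypothetical stand-in for a model: a naive keyword rule."""
    return "negative" if "hate" in text.lower() else "positive"


examples = [
    ("I love this movie", "positive"),
    ("I hate this movie", "negative"),
    ("Not bad at all", "positive"),
    ("I don't love it", "negative"),
]

failures = find_failures(examples, keyword_predict)
for text, gold, guess in failures:
    print(f"{text!r}: expected {gold}, got {guess}")
```

Inputs that rely on negation, sarcasm, or genuinely ambiguous labels are natural places to look for mistakes.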

In [10]:
shots = [
    {"X": "Lying to protect a friend from embarrassment", "y": "Acceptable"},
    {"X": "Stealing money from a charity", "y": "Unacceptable"},
    {"X": "Telling a white lie to cheer someone up", "y": "Acceptable"}
]

target_x = "Breaking a promise to keep someone else safe"

classify_with_prompt(
    ["Acceptable", "Unacceptable"],
    shots,
    target_x
)

Acceptable


## Qwen3-14B with Quantization

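Now let's try a bigger model. A common rule of thumb is to multiply the parameter count (in billions) by 4 to estimate how many gigabytes of GPU memory you will need for inference; this corresponds to 4 bytes (32 bits) per parameter. By that estimate, a 14-billion-parameter model would require roughly 56 GB without quantization. Even at 16 bits per weight, the weights alone take about 28 GB, which is more than the T4's 16 GB; at 4 bits they take about 7 GB.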
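
The arithmetic behind this estimate can be sketched as follows. This is a back-of-the-envelope calculation that counts weights only (activations and the key-value cache add more on top), and `weight_memory_gb` is a hypothetical helper introduced for illustration. It uses the nominal 14 billion parameters quoted above.

```python
def weight_memory_gb(n_params_billion, bits_per_param):
    """Approximate memory for the weights alone, in gigabytes.

    1 billion parameters at 1 byte each is about 1 GB, so the result is
    simply (billions of parameters) * (bytes per parameter).
    """
    bytes_per_param = bits_per_param / 8
    return n_params_billion * bytes_per_param


for bits in (32, 16, 4):
    print(f"{bits}-bit weights: about {weight_memory_gb(14, bits):.0f} GB")
```

The multiply-by-4 rule matches the 32-bit row; 4-bit weights are what bring a model of this size within reach of a single T4.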

In [11]:
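# first, delete the previous model and tokenizer to free up memory
import gc

del model
del tokenizer
gc.collect()  # release lingering Python references before clearing the CUDA cache
torch.cuda.empty_cache()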

In [12]:
from transformers import BitsAndBytesConfig
quantization_config = BitsAndBytesConfig(load_in_4bit=True)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-14B",
    device_map="cuda",
    dtype="auto",
    quantization_config=quantization_config
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-14B")


Rerun the prompting tasks from above. Are any of the outputs different?

In [14]:
# Reuse your classify_with_prompt() function from earlier
# (make sure you already ran the cell that defines it)

# 1. Sentiment classification
shots = [
    {"X": "I love this movie", "y": "positive"},
    {"X": "I hate this movie", "y": "negative"},
    {"X": "I kind of like the movie", "y": "positive"}
]
target_x = "This is one of the best movies I've ever seen"
classify_with_prompt(["positive", "negative"], shots, target_x)


# 2. Genre classification
shots = [
    {"X": "Vampires take over the planet during an eclipse", "y": "Horror"},
    {"X": "A court sentences George to be Jerry's butler", "y": "Comedy"},
    {"X": "John turns into a werewolf during a full moon", "y": "Horror"}
]
target_x = "John is a werewolf who plays basketball"
classify_with_prompt(["Horror", "Comedy"], shots, target_x)


# 3. Language detection
shots = [
    {"X": "This is a text", "y": "English"},
    {"X": "Nel mezzo del cammin' di nostra vita", "y": "Italian"},
    {"X": "Je ne sais pas", "y": "French"},
]
target_x = "Siempre imaginé que el Paraíso sería algún tipo de biblioteca"
classify_with_prompt(["English", "Italian", "French", "Spanish", "Japanese"], shots, target_x)


# 4. Moral reasoning failure case
shots = [
    {"X": "Lying to protect a friend from embarrassment", "y": "Acceptable"},
    {"X": "Stealing money from a charity", "y": "Unacceptable"},
    {"X": "Telling a white lie to cheer someone up", "y": "Acceptable"}
]
target_x = "Breaking a promise to keep someone else safe"
classify_with_prompt(["Acceptable", "Unacceptable"], shots, target_x)


positive
Comedy
Spanish
Unacceptable
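
To answer the question of which outputs changed, the labels printed by the two models in this notebook can be compared side by side. The sketch below copies those recorded outputs into two dictionaries; the task names used as keys and the `changed_tasks` helper are introduced here for illustration only.

```python
# Labels printed earlier in this notebook for the same four prompts.
outputs_4b = {
    "sentiment": "positive",
    "genre": "Horror",
    "language": "Spanish",
    "moral": "Acceptable",
}
outputs_14b = {
    "sentiment": "positive",
    "genre": "Comedy",
    "language": "Spanish",
    "moral": "Unacceptable",
}


def changed_tasks(a, b):
    """Return {task: (label_a, label_b)} for tasks where the labels differ."""
    return {task: (a[task], b[task]) for task in a if a[task] != b[task]}


changed = changed_tasks(outputs_4b, outputs_14b)
for task, (small, large) in changed.items():
    print(f"{task}: 4B said {small}, 14B (4-bit) said {large}")
```

The two models agree on sentiment and language but disagree on the genre and moral-judgment prompts, both of which are arguably ambiguous given the few-shot examples.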
