In [2]:
# Helper: send chat messages to a hosted Hugging Face model (HuggingFaceH4/zephyr-7b-beta) through InferenceClient and return the reply text. temperature defaults to 0.
!pip install huggingface_hub
from huggingface_hub import InferenceClient

# choose a chat model that allows hosted inference (some require accepting a license)
_HF_MODEL = "HuggingFaceH4/zephyr-7b-beta"
_client = InferenceClient(model=_HF_MODEL, timeout=120)

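def get_response(messages, model=None, temperature=0):
    # `model` is accepted for signature compatibility only; _client's model is used.
    # Zephyr's chat markup: <|system|>, <|user|>, <|assistant|> turns, each closed by </s>.
    user_text = "\n".join(m["content"] for m in messages if m["role"] == "user")
    prompt = (
        "<|system|>\nYou are a helpful assistant.</s>\n"
        f"<|user|>\n{user_text}</s>\n"
        "<|assistant|>\n"
    )

    sample = temperature > 0
    out = _client.text_generation(
        prompt,
        max_new_tokens=256,
        do_sample=sample,                              # greedy decoding when temperature == 0
        temperature=temperature if sample else None,   # TGI-style backends reject temperature=0
        top_p=0.9 if sample else None,                 # and require 0 < top_p < 1
        return_full_text=False,
        stream=False,
    )
    return out.strip()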
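The hosted endpoint receives raw text only, so the prompt has to use Zephyr's own chat markup. Below is a minimal sketch of a renderer for a full message list. It assumes the Zephyr format shown on its model card, where each turn is `<|role|>\n{content}</s>` and the prompt ends with an open `<|assistant|>` turn. `build_zephyr_prompt` is a hypothetical helper. When the tokenizer can be loaded locally, its `apply_chat_template` is the more robust option.

```python
def build_zephyr_prompt(messages, default_system="You are a helpful assistant."):
    r"""Render [{"role": ..., "content": ...}, ...] in Zephyr-style chat markup.

    Assumed format: every turn is "<|role|>\n{content}</s>\n", and the prompt
    ends with an open "<|assistant|>\n" turn for the model to complete.
    """
    turns = list(messages)
    # Prepend a default system turn only if the caller did not supply one
    if not any(m["role"] == "system" for m in turns):
        turns.insert(0, {"role": "system", "content": default_system})

    parts = []
    for m in turns:
        if m["role"] not in ("system", "user", "assistant"):
            raise ValueError(f"unsupported role: {m['role']!r}")
        parts.append(f"<|{m['role']}|>\n{m['content']}</s>\n")
    parts.append("<|assistant|>\n")  # generation prompt
    return "".join(parts)
```

Inside `get_response`, the hand-built `prompt` could be replaced by `build_zephyr_prompt(messages)`. Unlike the current helper, this version keeps any system or earlier assistant turns instead of discarding them.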



In [3]:
!huggingface-cli login



    To log in, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Enter your token (input will not be visible): 
Add token as git credential? (Y/n) n
Token is valid (permission: fineGrained).
The token `HW3` has been saved to /root/.cache/huggingface/stored_tokens
Your token has been saved to /root/.cache/huggingface/token
Login successful.
The current active token is: `HW3`


**Zero-Shot**

In [4]:
!pip install "torch>=2.2" "transformers>=4.44" accelerate sentencepiece
# (Optional, GPU only) pip install bitsandbytes



In [5]:
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# load once, reuse across calls
_MODEL_NAME = "microsoft/Phi-3.5-mini-instruct"  # any causal LM that ships a chat template should work
_tokenizer = AutoTokenizer.from_pretrained(_MODEL_NAME)
_model = AutoModelForCausalLM.from_pretrained(
    _MODEL_NAME,
    torch_dtype=torch.bfloat16 if torch.cuda.is_available() else torch.float32,
    device_map="auto",  # uses GPU if available, else CPU
)

def get_response(messages, model=None, temperature=0):  # same signature as the hosted helper; `model` is unused
    # Use the model's chat template to build the prompt
    prompt = _tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )
    inputs = _tokenizer(prompt, return_tensors="pt").to(_model.device)

    # Pass sampling knobs only when sampling; under greedy decoding generate()
    # warns that `temperature` is not a valid flag and ignores it.
    gen_kwargs = {"max_new_tokens": 256, "do_sample": temperature > 0}
    if temperature > 0:
        gen_kwargs.update(temperature=temperature, top_p=0.9)

    with torch.no_grad():
        # Stop tokens come from the model's own generation_config
        outputs = _model.generate(**inputs, **gen_kwargs)
    # Decode only the newly generated tokens, not the echoed prompt
    text = _tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    return text.strip()



The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


`torch_dtype` is deprecated! Use `dtype` instead!


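The helper switches between greedy decoding (`temperature=0`) and sampling with `temperature` and `top_p`. To make those knobs concrete, here is a toy, pure-Python sketch of one decoding step over made-up logits. `sample_next` is hypothetical and only mirrors the idea. It is not the actual `transformers` implementation.

```python
import math
import random

def sample_next(logits, temperature=0.0, top_p=1.0, rng=random):
    """Choose the next token index from raw logits (toy model of one decoding step).

    temperature <= 0 -> greedy argmax, the analogue of do_sample=False.
    Otherwise: softmax(logits / temperature), keep the smallest set of
    most-likely tokens whose probability mass reaches top_p, sample from it.
    """
    if temperature <= 0:
        return max(range(len(logits)), key=lambda i: logits[i])

    scaled = [x / temperature for x in logits]
    peak = max(scaled)                      # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Nucleus (top-p) filtering: walk tokens from most to least likely
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # random.choices treats weights as relative, so no renormalization is needed
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]
```

Lower temperatures sharpen the distribution toward the argmax, and a smaller `top_p` drops the unlikely tail. At `temperature=0` the sampling knobs have no effect, which is why `generate()` warns when `temperature` is passed without `do_sample=True`.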

In [6]:
# Assumes you already defined: get_response(messages)

def prompts_zero_shot(task_type):
    if task_type == "math":
        return "A shop sells bagels at $1.25 each or 5 for $5.00. You need 13 bagels. What is the minimum total cost?"
    if task_type == "logic":
        return 'If all linns are drubs and no drubs are norks, can any linns be norks? Answer “Yes” or “No” with a one-line reason.'
    if task_type == "text":
        return 'Summarize in one sentence: "The board approved a phased rollout, starting with a limited beta to collect feedback before wider release next quarter."'

def prompts_few_shot(task_type):
    if task_type == "math":
        return (
            "Q: Apples cost $0.80 each or 4 for $2.60. Need 10. Cheapest cost?\n"
            "A: Two 4-packs = $5.20, plus 2 singles = $1.60 → $6.80.\n\n"
            "Q: Pencils cost $1.50 each or 3 for $3.60. Need 7. Cheapest cost?\n"
            "A: Two 3-packs = $7.20, plus 1 single = $1.50 → $8.70.\n\n"
            "Q: Bagels cost $1.25 each or 5 for $5.00. Need 13. Cheapest cost?\n"
            "A:"
        )
    if task_type == "logic":
        return (
            "Q: If some A are B and all B are C, must some A be C?\n"
            "A: Yes, because the A that are B are included in C.\n\n"
            "Q: If all J are K and some K are not L, must some J be not L?\n"
            "A: Not necessarily; the non-L K may not include J.\n\n"
            "Q: If all linns are drubs and no drubs are norks, can any linns be norks?\n"
            "A:"
        )
    if task_type == "text":
        return (
            "Rewrite to be concise and neutral.\n\n"
            "Ex1 Input: “We absolutely crushed every metric this month!!!”\n"
            "Ex1 Output: “We exceeded all key metrics this month.”\n\n"
            "Ex2 Input: “I think the schedule might be kind of risky tbh.”\n"
            "Ex2 Output: “The schedule carries risk.”\n\n"
            "Input: “The board approved a phased rollout, starting with a limited beta to collect feedback before wider release next quarter.”\n"
            "Output:"
        )

def prompts_cot(task_type):
    if task_type == "math":
        return "A shop sells bagels at $1.25 each or 5 for $5.00. You need 13 bagels. Show your reasoning step by step before giving the final cost."
    if task_type == "logic":
        return "All linns are drubs. No drubs are norks. Can any linns be norks? Think through step by step and then answer."
    if task_type == "text":
        return 'Break this sentence into claims, simplify each, then combine into one sentence: "The board approved a phased rollout, starting with a limited beta to collect feedback before wider release next quarter."'

def prompts_zero_shot_cot(task_type):
    if task_type == "math":
        return "A shop sells bagels at $1.25 each or 5 for $5.00. You need 13 bagels. Let's think step by step."
    if task_type == "logic":
        return "All linns are drubs and no drubs are norks. Let's reason step by step: can any linns be norks?"
    if task_type == "text":
        return 'Summarize this in one sentence. Let’s think step by step: "The board approved a phased rollout, starting with a limited beta to collect feedback before wider release next quarter."'

def prompts_meta(task_type):
    if task_type == "math":
        return ("You are a careful math tutor. Solve the problem, double-check calculations, "
                "and state any possible mistakes before the final answer. "
                "Problem: Bagels cost $1.25 each or 5 for $5.00. You need 13 bagels.")
    if task_type == "logic":
        return ("You are a logic TA. State the reasoning rules you’ll use, check for fallacies, then answer: "
                "All linns are drubs. No drubs are norks. Can any linns be norks?")
    if task_type == "text":
        return ('You are an editor. Produce a one-sentence summary, then self-critique for clarity, neutrality, and completeness. '
                'Input: "The board approved a phased rollout, starting with a limited beta to collect feedback before wider release next quarter."')

def prompts_tot(task_type):
    if task_type == "math":
        return ("Solve by exploring multiple solution paths. "
                "Path A: all singles. Path B: 1 bundle + singles. Path C: 2 bundles + singles. Path D: 3 bundles. "
                "Compute each cost for bagels ($1.25 each or 5 for $5.00; need 13). Then choose the best.")
    if task_type == "logic":
        return ("Explore multiple thought paths. Scenario 1: linns could be norks. Scenario 2: linns cannot be norks. "
                "Explain each, then decide which is logically correct: All linns are drubs, no drubs are norks.")
    if task_type == "text":
        return ('Generate three candidate summaries (bullet, headline, formal). '
                'Evaluate each for brevity and accuracy, then select the best. '
                'Text: "The board approved a phased rollout, starting with a limited beta to collect feedback before wider release next quarter."')

TECHNIQUES = {
    "Zero-Shot": prompts_zero_shot,
    "Few-Shot": prompts_few_shot,
    "Chain-of-Thought": prompts_cot,
    "Zero-Shot CoT": prompts_zero_shot_cot,
    "Meta-Prompting": prompts_meta,
    "Tree of Thoughts": prompts_tot,
}

TASKS = ["math", "logic", "text"]

def run_all():
    results = {}
    for tech_name, builder in TECHNIQUES.items():
        results[tech_name] = {}
        for task in TASKS:
            prompt = builder(task)
            msg = [{"role": "user", "content": prompt}]
            try:
                out = get_response(msg)
            except Exception as e:
                out = f"[ERROR] {e}"
            results[tech_name][task] = {"prompt": prompt, "output": out}
    return results

def print_results(results):
    for tech in TECHNIQUES.keys():
        print("="*90)
        print(f"{tech}".upper())
        print("="*90)
        for task in TASKS:
            item = results[tech][task]
            print(f"\n--- {task.upper()} ---")
            print("[PROMPT]")
            print(item["prompt"])
            print("\n[OUTPUT]")
            print(item["output"])
        print("\n")

# Run and display
res = run_all()
print_results(res)


The following generation flags are not valid and may be ignored: ['temperature']. Set `TRANSFORMERS_VERBOSITY=info` for more details.


ZERO-SHOT

--- MATH ---
[PROMPT]
A shop sells bagels at $1.25 each or 5 for $5.00. You need 13 bagels. What is the minimum total cost?

[OUTPUT]
To find the minimum total cost for 13 bagels, we should look at the two pricing options and determine which one gives us the lowest cost for 13 bagels.

Option 1: Buying individual bagels at $1.25 each
Cost for 13 bagels = 13 * $1.25 = $16.25

Option 2: Buying bagels in sets of 5 for $5.00
Since we need 13 bagels, we can buy two sets of 5 bagels and then 3 individual bagels.
Cost for two sets of 5 bagels = 2 * $5.00 = $10.00
Cost for 3 individual bagels = 3 * $1.25 = $3.75
Total cost for two sets and three individual bagels = $10.00 + $3.75 = $13.75

Comparing the two options, buying two sets of 5 bagels and 3 individual bagels is cheaper.

Minimum total cost = $13.75

Therefore,

--- LOGIC ---
[PROMPT]
If all linns are drubs and no drubs are norks, can any linns be norks? Answer “Yes” or “No” with a one-line reason.

[OUTPUT]
No, because if a

### Key Findings
**Zero-Shot**
- What: ask the question directly, with no examples.
- Findings: works adequately on easy tasks (bagel math, simple logic, one-sentence summary).
- Limitation: answers can be shallow. The math answer was correct but only briefly justified; the logic answer relied on the transitive property without elaborating; the summary was accurate but simple.
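The Zero-Shot math answer (`$13.75`) can be checked automatically rather than by eye. Below is a minimal sketch that assumes the `results` dict returned by `run_all()`. It relies on the heuristic that the last dollar amount in a response is its final answer, so a truncated or restated response can defeat it. `final_dollar_amount` and `grade_math` are hypothetical helpers, not part of the notebook.

```python
import re

# Matches "$13.75", "$ 5", "$1,250.50"; captures the numeric part.
_DOLLARS = re.compile(r"\$\s?(\d+(?:,\d{3})*(?:\.\d+)?)")

def final_dollar_amount(text):
    """Return the last dollar amount in a response as a float, or None."""
    amounts = _DOLLARS.findall(text)
    return float(amounts[-1].replace(",", "")) if amounts else None

def grade_math(results, expected=13.75, tol=1e-6):
    """Map each technique to True/False (answer matches) or None (no amount found)."""
    report = {}
    for tech, tasks in results.items():
        got = final_dollar_amount(tasks["math"]["output"])
        report[tech] = None if got is None else abs(got - expected) <= tol
    return report
```

Calling `grade_math(res)` after `run_all()` gives a per-technique pass/fail for the math task. The logic and text tasks still need manual reading.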

**Few-Shot**
- What: show a couple of Q/A examples before the new question.
- Findings: the model copies the style of the examples.
  - Math: followed the example structure and gave a step-by-step breakdown.
  - Logic: verbose output that repeated the same conclusion several times (the model over-mirrored the examples).
  - Text: produced a brief, impersonal rewrite in line with the examples.
- Takeaway: few-shot examples strongly shape format and tone, and can pad the response with unnecessary content.
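`prompts_few_shot` hard-codes its demonstrations. A small builder makes it easy to vary how many examples are shown, or which ones, when probing the over-mirroring seen on the logic task. `build_few_shot` is a hypothetical helper that reproduces the Q/A layout used in the math prompt.

```python
def build_few_shot(examples, query, q_prefix="Q: ", a_prefix="A: "):
    """Format (question, answer) pairs as Q/A demonstrations, then append the
    new question with an open answer slot for the model to complete."""
    blocks = [f"{q_prefix}{q}\n{a_prefix}{a}" for q, a in examples]
    blocks.append(f"{q_prefix}{query}\n{a_prefix.rstrip()}")  # ends with "A:"
    return "\n\n".join(blocks)
```

Passing the apple and pencil pairs plus the bagel question yields the same string as `prompts_few_shot("math")`. Dropping or reordering pairs then only requires changing the list.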

**Chain-of-Thought (CoT)**
- What: explicitly ask the model to reason step by step.
- Findings:
  - Math: the model spelled out each step, making its reasoning transparent.
  - Logic: stated the premises, combined them, then drew the conclusion.
  - Text: broke the sentence into separate claims, then reassembled them.
- Takeaway: CoT improves the accuracy and clarity of reasoning, particularly on multi-step math and logic.
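The premise, combination, and conclusion chain used on the logic task can also be checked mechanically by searching small worlds for a counterexample. This sketch rests on simple assumptions. Each object either belongs or does not belong to three sets indexed 0, 1, 2 (here linns, drubs, norks). `entails` and the quantifier helpers are hypothetical. The search is bounded, so it only proves entailment up to `n_objects`. A handful of objects is enough for simple syllogisms like these.

```python
from itertools import product

def entails(premises, conclusion, n_objects=3):
    """True if `conclusion` holds in every world (of n_objects objects) where all
    `premises` hold. An object is a tuple of 3 booleans: membership in sets 0, 1, 2."""
    object_types = list(product([False, True], repeat=3))
    for world in product(object_types, repeat=n_objects):
        if all(p(world) for p in premises) and not conclusion(world):
            return False  # counterexample found
    return True

# Categorical statements as predicates over a world
def all_are(x, y):       # "All X are Y"
    return lambda world: all(obj[y] for obj in world if obj[x])

def none_are(x, y):      # "No X are Y"
    return lambda world: not any(obj[x] and obj[y] for obj in world)

def some_are(x, y):      # "Some X are Y"
    return lambda world: any(obj[x] and obj[y] for obj in world)

def some_are_not(x, y):  # "Some X are not Y"
    return lambda world: any(obj[x] and not obj[y] for obj in world)

LINN, DRUB, NORK = 0, 1, 2
# All linns are drubs; no drubs are norks  =>  no linns are norks?
print(entails([all_are(LINN, DRUB), none_are(DRUB, NORK)], none_are(LINN, NORK)))  # True
```

The same check confirms the few-shot logic examples. `entails([some_are(0, 1), all_are(1, 2)], some_are(0, 2))` is `True`, while `entails([all_are(0, 1), some_are_not(1, 2)], some_are_not(0, 2))` is `False` ("not necessarily").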

**Zero-Shot CoT**
- What: append a one-line cue such as "Let's think step by step" (no examples).
- Findings: produced a reasoning chain comparable to full CoT, but truncated.
- Takeaway: a cheap, off-the-shelf way to elicit step-by-step reasoning without writing examples.
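When the trigger phrase makes the reasoning long, the final answer can be cut off, as the truncated outputs above show. One mitigation, similar in spirit to the two-stage prompting of Kojima et al. (2022), is to generate the reasoning first and then re-prompt with an answer cue. In this sketch, `generate` is any prompt-to-text callable (for instance a wrapper around `get_response`), and the cue wording is an assumption.

```python
from typing import Callable, Tuple

TRIGGER = "Let's think step by step."

def zero_shot_cot(question: str,
                  generate: Callable[[str], str],
                  answer_cue: str = "Therefore, the final answer is") -> Tuple[str, str]:
    """Two-stage zero-shot CoT.

    Stage 1: append the trigger phrase and collect free-form reasoning.
    Stage 2: feed question + reasoning back with an answer cue, so the model
    only has to produce the short final answer.
    """
    reasoning_prompt = f"{question}\n{TRIGGER}"
    reasoning = generate(reasoning_prompt).strip()
    answer_prompt = f"{reasoning_prompt}\n{reasoning}\n{answer_cue}"
    answer = generate(answer_prompt).strip()
    return reasoning, answer
```

With the notebook's helper this could be called as `zero_shot_cot(question, lambda p: get_response([{"role": "user", "content": p}]))`. With chat models the cue lands inside the user turn rather than as a raw continuation, so results may differ from completion-style models.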

**Meta-Prompting**
- What: tell the model how to behave (e.g. "You are a math tutor. Double-check calculations.").
- Findings:
  - Math: more careful; checked its work and flagged possible errors.
  - Logic: cited logical rules (identity, non-contradiction, etc.) before answering.
  - Text: produced a summary followed by a self-critique and revision.
- Takeaway: meta-prompts encourage self-monitoring and reduce errors.
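`prompts_meta` packs the persona and the self-check instructions into the user message. A variant moves them into a `system` turn so the same instructions can be reused across tasks. It is sketched here as a hypothetical `meta_messages` helper. It assumes the model's chat template accepts a system role. The Zephyr helper in cell [2] forwards only user messages, so it would drop them.

```python
def meta_messages(task, role, checks):
    """Build a chat message list: persona + self-check list in a system turn,
    the task itself in the user turn."""
    checklist = "\n".join(f"- {c}" for c in checks)
    system = f"{role}\nBefore giving your final answer:\n{checklist}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": task},
    ]
```

The result can be passed straight to `get_response(...)`, and the same `role` and `checks` can be paired with the logic or text task.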

**Tree of Thoughts (ToT)**
- What: generate several candidate solutions, evaluate them, then select one.
- Findings:
  - Math: explicitly compared the purchasing strategies, then selected the cheapest.
  - Logic: weighed Scenario 1 against Scenario 2 before choosing.
  - Text: generated several candidate summaries, assessed them, and selected the best.
- Takeaway: ToT makes the model compare alternatives, which usually yields more reliable results at the cost of longer responses.
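The ToT math prompt names four branches (Paths A to D). For a problem this small, the same explore-then-select structure can be run exactly, which gives a ground truth to compare the model's choice against. `explore_paths` is a hypothetical, deterministic enumeration, not an LLM-driven Tree of Thoughts search.

```python
def explore_paths(need=13, single=1.25, bundle_size=5, bundle_price=5.00):
    """Enumerate every plan (k bundles, then singles for the remainder),
    cost each branch, and select the cheapest."""
    max_bundles = -(-need // bundle_size)  # ceiling division: bundles alone cover `need`
    branches = []
    for bundles in range(max_bundles + 1):
        singles = max(0, need - bundles * bundle_size)
        cost = round(bundles * bundle_price + singles * single, 2)
        branches.append({"bundles": bundles, "singles": singles, "cost": cost})
    best = min(branches, key=lambda b: b["cost"])
    return branches, best

branches, best = explore_paths()
for b in branches:
    print(b)
print("best:", best)  # best: {'bundles': 2, 'singles': 3, 'cost': 13.75}
```

The same function also confirms the few-shot demonstrations. `explore_paths(10, 0.80, 4, 2.60)` and `explore_paths(7, 1.50, 3, 3.60)` both pick two bundles, at 6.80 and 8.70 respectively.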