In [1]:
import os, math, json, random, gc, pathlib
from dataclasses import dataclass
from typing import Dict, List, Tuple, Optional, Literal

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForCausalLM
import matplotlib.pyplot as plt

SEED = 123
random.seed(SEED); np.random.seed(SEED); torch.manual_seed(SEED)

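# Free memory left over from a previous run of this notebook
gc.collect()
torch.cuda.empty_cache()

# ---------------- Config ----------------
DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
DTYPE  = torch.float32   # full precision; torch.bfloat16 would roughly halve weight memory
MODEL_NAME = "google/gemma-3-4b-it"
HF_TOKEN   = os.environ.get("HF_TOKEN")   # access token for the gated Gemma checkpoint

# ---------------- Load model ----------------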
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, token=HF_TOKEN, use_fast=True)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME, token=HF_TOKEN, torch_dtype=DTYPE, low_cpu_mem_usage=True
).to(DEVICE).eval()
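
def generate_from_model(prompt: str, max_new_tokens: int = 100) -> str:
    """Sample a continuation of `prompt` and return prompt + completion as text."""
    # Encode the prompt and move the tensors to the model's device
    inputs = tokenizer(prompt, return_tensors="pt").to(DEVICE)

    # Run model.generate() for autoregressive decoding
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,        # sample instead of greedy decoding
            top_p=0.95,            # nucleus sampling: keep the smallest token set holding 95% of the mass
            temperature=0.7,       # < 1 sharpens the distribution (more deterministic)
            pad_token_id=tokenizer.eos_token_id,
        )

    # output_ids[0] holds the prompt tokens followed by the new tokens,
    # so the decoded string starts with the prompt text itself.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)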

# Optional: allow faster reduced-precision (e.g. TF32) float32 matmuls on supported GPUs
# torch.set_float32_matmul_precision('high')

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
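`generate_from_model` samples with `do_sample=True`, `temperature=0.7` and `top_p=0.95`. To make those two knobs concrete, here is a minimal pure-Python sketch of temperature scaling followed by nucleus (top-p) filtering on a single vector of logits. It illustrates the standard technique and is not the transformers implementation, which works on batched tensors and differs in edge cases (ties, a minimum number of kept tokens). The function names are hypothetical.

```python
import math
import random
from typing import Dict, List, Optional

def nucleus_distribution(logits: List[float], temperature: float = 0.7,
                         top_p: float = 0.95) -> Dict[int, float]:
    """Temperature-scaled softmax restricted to the top-p nucleus, renormalized."""
    # Temperature: dividing logits by T < 1 sharpens the distribution.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)                       # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Walk token ids from most to least likely until the kept mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept: List[int] = []
    mass = 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Renormalize so the kept probabilities sum to 1.
    return {i: probs[i] / mass for i in kept}

def sample_token(logits: List[float], temperature: float = 0.7, top_p: float = 0.95,
                 rng: Optional[random.Random] = None) -> int:
    """Draw one token id from the nucleus distribution."""
    rng = rng if rng is not None else random.Random()
    dist = nucleus_distribution(logits, temperature, top_p)
    ids = list(dist)
    return rng.choices(ids, weights=[dist[i] for i in ids], k=1)[0]

logits = [2.0, 1.0, 0.0, -1.0]  # toy logits for a 4-token vocabulary
print(nucleus_distribution(logits, temperature=1.0, top_p=0.5))        # {0: 1.0}
print(len(nucleus_distribution(logits, temperature=1.0, top_p=0.9)))   # 3
```

Because sampling is random, rerunning the generation cells will not reproduce the outputs shown here; calling `torch.manual_seed(SEED)` immediately before a `generate_from_model` call should make that call repeatable on the same hardware and library versions.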

In [2]:
print(model)

Gemma3ForConditionalGeneration(
  (model): Gemma3Model(
    (vision_tower): SiglipVisionModel(
      (vision_model): SiglipVisionTransformer(
        (embeddings): SiglipVisionEmbeddings(
          (patch_embedding): Conv2d(3, 1152, kernel_size=(14, 14), stride=(14, 14), padding=valid)
          (position_embedding): Embedding(4096, 1152)
        )
        (encoder): SiglipEncoder(
          (layers): ModuleList(
            (0-26): 27 x SiglipEncoderLayer(
              (layer_norm1): LayerNorm((1152,), eps=1e-06, elementwise_affine=True)
              (self_attn): SiglipAttention(
                (k_proj): Linear(in_features=1152, out_features=1152, bias=True)
                (v_proj): Linear(in_features=1152, out_features=1152, bias=True)
                (q_proj): Linear(in_features=1152, out_features=1152, bias=True)
                (out_proj): Linear(in_features=1152, out_features=1152, bias=True)
              )
              (layer_norm2): LayerNorm((1152,), eps=1e-06, elementwi
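The checkpoint loads as `Gemma3ForConditionalGeneration`, so the SigLIP vision tower sits in memory even though only text prompts are used. The printed tree is cut off inside the first encoder layer, so the MLP blocks and the language model are not visible. As a sanity check on the shapes that are visible, the plain-Python sketch below (no model needed) counts their parameters. The image-size line assumes one position embedding per patch on a square grid with no extra class token; under that assumption, 4096 positions and 14-pixel patches correspond to 896×896 input images.

```python
import math

# Shapes copied from the printed SigLIP vision tower (only the parts visible in the output)
HIDDEN = 1152          # embedding width
PATCH = 14             # Conv2d kernel size and stride
NUM_POSITIONS = 4096   # rows of position_embedding
NUM_LAYERS = 27        # (0-26): 27 x SiglipEncoderLayer

def linear_params(d_in: int, d_out: int, bias: bool = True) -> int:
    """Weight matrix plus optional bias vector."""
    return d_in * d_out + (d_out if bias else 0)

def conv2d_params(c_in: int, c_out: int, k: int, bias: bool = True) -> int:
    """A k x k kernel for every (input, output) channel pair, plus optional bias."""
    return c_in * c_out * k * k + (c_out if bias else 0)

def layernorm_params(d: int, affine: bool = True) -> int:
    """elementwise_affine=True adds a weight and a bias of size d."""
    return 2 * d if affine else 0

patch_embedding = conv2d_params(3, HIDDEN, PATCH)       # RGB input channels
position_embedding = NUM_POSITIONS * HIDDEN
attn_per_layer = 4 * linear_params(HIDDEN, HIDDEN)      # q, k, v, out projections
norms_per_layer = 2 * layernorm_params(HIDDEN)          # layer_norm1, layer_norm2
attn_all_layers = NUM_LAYERS * attn_per_layer

# One position per patch on a square grid -> patches per side, then pixels per side
grid_side = math.isqrt(NUM_POSITIONS)
image_side = grid_side * PATCH

print(f"patch_embedding:        {patch_embedding:,}")      # 678,528
print(f"position_embedding:     {position_embedding:,}")   # 4,718,592
print(f"attention per layer:    {attn_per_layer:,}")       # 5,313,024
print(f"attention, all layers:  {attn_all_layers:,}")      # 143,451,648
print(f"layer norms per layer:  {norms_per_layer:,}")      # 4,608
print(f"patch grid: {grid_side}x{grid_side} -> {image_side}x{image_side} px")  # 64x64 -> 896x896 px
```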

In [3]:
# prompt = "Write a Python function to compute n choose k (binomial coefficient).\nRespond with code only.\n```python\n"
# print(generate_from_model(prompt, max_new_tokens=50))


In [5]:
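# Each entry holds a C++ prompt ("cpp_top") and a Python prompt ("python_top")
with open("prompt_set/cpp_python_100.json", "r", encoding="utf-8") as f:
    code = json.load(f)

# First 10 C++ prompts, then the Python prompts of the same 10 entries
for prompt_pair in code[:10]:
    print("\n\n")
    print(generate_from_model(prompt_pair["cpp_top"], max_new_tokens=50))
for prompt_pair in code[:10]:
    print("\n\n")
    print(generate_from_model(prompt_pair["python_top"], max_new_tokens=50))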






W0924 20:13:57.744000 605468 torch/_inductor/utils.py:1436] [0/0] Not enough SMs to use max_autotune_gemm mode


Write a short function in C++ that returns the n-th Fibonacci number.
Respond with code only.
```cpp
#include <iostream>

int fibonacci(int n) {
  if (n <= 1) {
    return n;
  }
  return fibonacci(n - 1) + fibonacci(n - 



Write a C++ function that checks if a number is prime.
Respond with code only.
```cpp
#include <iostream>
#include <cmath>

bool isPrime(int n) {
  if (n <= 1) {
    return false;
  }
  for (int i = 2; i <=



Write a C++ function to compute the factorial of a number using recursion.
Respond with code only.
```cpp
#include <iostream>

long long factorial(int n) {
  if (n == 0) {
    return 1;
  } else if (n < 0) {
    return -1; // Or



Write a C++ function that reverses a string.
Respond with code only.
```cpp
#include <iostream>
#include <string>
#include <algorithm>

std::string reverseString(const std::string& str) {
  std::string reversedStr = str;
  std::reverse(reversedStr



Write a C++ function that finds the maximum element in an array.
Respond with code
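The code prompts end by opening a fenced block tagged `cpp` or `python`, and because `generate_from_model` decodes the full sequence, each printed output repeats the prompt before the code. With `max_new_tokens=50`, several completions also stop before the closing fence (the Fibonacci one ends at `fibonacci(n - `). A small helper, hypothetical and not part of the notebook, can pull out the code body and flag whether the fence was closed, so truncated samples can be filtered out or regenerated with a larger token budget.

```python
from typing import Optional, Tuple

FENCE = "`" * 3  # a Markdown code fence

def extract_code(completion: str, lang: str) -> Tuple[Optional[str], bool]:
    """Return (code, closed) for the first fenced block tagged `lang`.

    code is None when no opening fence is found; closed is False when the
    generation stopped (e.g. at max_new_tokens) before the closing fence.
    """
    start = completion.find(FENCE + lang)
    if start == -1:
        return None, False
    # The code body starts on the line after the opening fence.
    body_start = completion.find("\n", start)
    if body_start == -1:
        return "", False
    body_start += 1
    end = completion.find(FENCE, body_start)
    if end == -1:
        return completion[body_start:], False   # truncated: no closing fence
    return completion[body_start:end], True
```

Usage: `body, closed = extract_code(generate_from_model(prompt_pair["cpp_top"], max_new_tokens=50), "cpp")`, keeping only samples where `closed` is `True`.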

In [6]:
# Each entry holds a "physics" prompt and a matching "chemistry" prompt
with open("prompt_set/phy_chem_prompts.json", "r", encoding="utf-8") as f:
    code = json.load(f)

for prompt_pair in code[:10]:
    print("\n\n")
    print(generate_from_model(prompt_pair["physics"], max_new_tokens=50))
    print("\n")
    print(generate_from_model(prompt_pair["chemistry"], max_new_tokens=50))





Explain the fundamental principles of the entire universe, and describe the significance of this knowledge.

Okay, this is a *massive* undertaking, and frankly, a question that philosophers and scientists have grappled with for millennia.  There's no single, definitive answer, but I can lay


Explain the fundamental principles of the entire solution.

The solution is designed around the following fundamental principles:

1.  **Modular Design:** The system is broken down into independent, reusable modules. Each module has a specific, well-defined purpose and interacts with other modules through clearly defined interfaces



Describe the core concepts of a magnetic field.

A magnetic field is a region of space around a magnet or a moving electric charge where a magnetic force can be detected. Here's a breakdown of the core concepts:

**1. Sources of Magnetic Fields:**

* **Moving Electric


Describe the core concepts of an organic field-effect transistor (OFET).

Here's a breakdown of

In [8]:
# Each entry holds a "finance" prompt and a matching "medical" prompt
with open("prompt_set/fin_med_prompts.json", "r", encoding="utf-8") as f:
    code = json.load(f)

for prompt_pair in code[:10]:
    print("\n\n")
    print(generate_from_model(prompt_pair["finance"], max_new_tokens=100))
    print("\n")
    print(generate_from_model(prompt_pair["medical"], max_new_tokens=100))




In finance, a key indicator of an asset's health is its price-to-earnings (P/E) ratio. It’s a simple yet powerful tool that compares a company’s stock price to its earnings per share. Let's break down what it is, how it's calculated, and what it means for investors.

**What is the Price-to-Earnings (P/E) Ratio?**

The P/E ratio is a valuation metric that shows how much investors are willing to pay for each dollar of a company'


In medicine, a key indicator of a patient's health is its blood pressure. High blood pressure, or hypertension, is a serious condition that can lead to serious health problems, including heart disease, stroke, and kidney failure. However, there are ways to manage and lower blood pressure, and many people can live long, healthy lives with hypertension.

Here's a breakdown of what you need to know about blood pressure, hypertension, and how to manage it:

**1. What is Blood Pressure?**

* **Blood pressure** is the force of



In finance, the process of identif

KeyboardInterrupt: 
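When a run like cell [8] is interrupted (here with `KeyboardInterrupt`), the completions produced so far survive only as printed cell output. Cells [5], [6] and [8] also repeat the same loop with different key names. A hypothetical helper, `run_prompt_pairs`, sketches one way to fold them together: it takes the loaded JSON entries, the keys to read from each entry, and any `prompt -> text` callable, and appends one JSON line per completion, flushing after every write so an interrupt keeps the finished records. The output filename in the usage comment is an assumption.

```python
import json
from typing import Callable, Dict, Sequence

def run_prompt_pairs(pairs: Sequence[Dict[str, str]], keys: Sequence[str],
                     generate: Callable[[str], str], out_path: str,
                     limit: int = 10) -> int:
    """Generate a completion for each key of each entry, appending to a JSONL file.

    Each record is written and flushed immediately, so interrupting the loop
    keeps everything finished so far. Returns the number of records written.
    """
    written = 0
    with open(out_path, "a", encoding="utf-8") as f:
        for idx, pair in enumerate(pairs[:limit]):
            for key in keys:
                record = {"pair": idx, "key": key, "prompt": pair[key],
                          "completion": generate(pair[key])}
                f.write(json.dumps(record) + "\n")
                f.flush()  # persist this record before starting the next generation
                written += 1
    return written

# In the notebook (not executed here):
# run_prompt_pairs(code, ["finance", "medical"],
#                  lambda p: generate_from_model(p, max_new_tokens=100),
#                  "fin_med_outputs.jsonl")
```

Unlike cell [5], which prints all C++ completions before all Python ones, this helper interleaves the keys entry by entry; the `pair` field lets the two sides be regrouped afterwards.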