<a href="https://colab.research.google.com/github/HariHaran9597/Math-solver/blob/main/Try_Math_Solver.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
%%capture
# @title 1. Install Dependencies (Run this first)
# The %%capture cell magic must be the first line of the cell to take effect.
# We install Unsloth for 2x faster inference on free Colab GPUs
!pip install "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git"
!pip install --no-deps "xformers<0.0.27" "trl<0.9.0" peft accelerate bitsandbytes gradio

In [2]:
# @title 2. Launch the AI (Click Play & Wait for Link)
from unsloth import FastLanguageModel
import torch
import gradio as gr
from collections import Counter
import re

# --- CONFIGURATION ---
# We load YOUR fine-tuned model from the Hub
MODEL_ID = "justhariharan/Qwen2.5-Math-1.5B-Solver"

print(f"⏳ Downloading {MODEL_ID} to Colab GPU...")
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = MODEL_ID,
    max_seq_length = 2048,
    dtype = None,
    load_in_4bit = True, # 4-bit quantization to reduce GPU memory use
)
FastLanguageModel.for_inference(model) # Enable native 2x faster inference

# --- LOGIC ---
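def extract_answer(text):
    # Return the last number in the text (after "####" if present), or None
    if not text: return None
    if "####" in text: text = text.split("####")[-1]
    # Require at least one digit so stray "." or "," punctuation never matches
    pattern = r"-?\$?\d[\d,]*(?:\.\d+)?"
    matches = re.findall(pattern, text)
    return matches[-1].replace(",", "").replace("$", "").strip() if matches else None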

def solve_single(question, history):
    # Friendly Prompt
    prompt = f"""<|im_start|>system
You are a helpful math teacher. Solve step-by-step and explain logic simply. End with #### Number.<|im_end|>
<|im_start|>user
{question}<|im_end|>
<|im_start|>assistant
"""
    inputs = tokenizer([prompt], return_tensors="pt").to("cuda")

    # temperature only takes effect when sampling is enabled
    outputs = model.generate(**inputs, max_new_tokens=512, use_cache=True, do_sample=True, temperature=0.6)
    response = tokenizer.batch_decode(outputs)[0]
    return response.split("<|im_start|>assistant")[-1].replace("<|im_end|>", "").strip()

def solve_majority(question, history):
    # Smart Mode: sample 3 solutions and return one whose final answer wins the majority vote
    candidates = []
    raw = []
    # Tokenize once; the prompt is identical for every attempt
    inputs = tokenizer([f"<|im_start|>user\n{question}<|im_end|>\n<|im_start|>assistant\n"], return_tensors="pt").to("cuda")
    for _ in range(3):
        out = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.8)
        # Split on the full ChatML tag so the word "assistant" inside a solution does not truncate it
        resp = tokenizer.batch_decode(out)[0].split("<|im_start|>assistant")[-1].replace("<|im_end|>", "").strip()
        raw.append(resp)
        ans = extract_answer(resp)
        if ans: candidates.append(ans)

    # No attempt produced a number: fall back to the first raw response
    if not candidates: return raw[0]
    winner = Counter(candidates).most_common(1)[0][0]
    for r in raw:
        if extract_answer(r) == winner: return f"🏆 **Verified Answer:**\n{r}"
    return raw[0]

def chat(msg, hist, smart):
    return solve_majority(msg, hist) if smart else solve_single(msg, hist)

# --- UI ---
demo = gr.ChatInterface(
    fn=chat,
    additional_inputs=[gr.Checkbox(label="Smart Mode (Slower but Accurate)", value=False)],
    title="🚀 GPU Math Solver (Colab Version)",
    description="Running on a Colab GPU (Tesla T4 on the free tier). This is much faster than the CPU demo."
)
demo.launch(share=True, debug=True)

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
⏳ Downloading justhariharan/Qwen2.5-Math-1.5B-Solver to Colab GPU...
==((====))==  Unsloth 2025.11.6: Fast Qwen2 patching. Transformers: 4.57.2.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.9.0+cu126. CUDA: 7.5. CUDA Toolkit: 12.6. Triton: 3.5.0
\        /    Bfloat16 = FALSE. FA [Xformers = None. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
* Running on public URL: https://a7098689dddd46840d.gradio.live

This share link expires in 1 week. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)


Keyboard interruption in main thread... closing server.
Killing tunnel 127.0.0.1:7860 <> https://a7098689dddd46840d.gradio.live


