# HouseBrain V1: Junior Architect - Inference Notebook

This notebook runs inference with the fine-tuned HouseBrain V1 model.

**Instructions:**

1.  **Upload Adapters:** Upload the `housebrain_v1_adapters.zip` file produced by the fine-tuning notebook and unzip it. The `housebrain_v1_adapters` directory must sit at `/content/housebrain_v1_adapters`, which is the path Cell 4 checks.
2.  **Run All Cells:** Execute the cells in order to install dependencies, load the model with the adapters, and run a test inference.
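
If you prefer to unzip from Python rather than the Colab file browser, the upload step can be sketched as below. This is a minimal sketch, not part of the original notebook: `extract_adapters` is a hypothetical helper, and it assumes the archive contains a PEFT-style `adapter_config.json` somewhere inside it.

```python
import zipfile
from pathlib import Path


def extract_adapters(zip_path, dest_dir):
    """Extract the adapter archive and return the folder holding adapter_config.json."""
    zip_path, dest_dir = Path(zip_path), Path(dest_dir)
    if not zip_path.is_file():
        raise FileNotFoundError(f"Archive not found: {zip_path}")

    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)

    # PEFT saves adapter_config.json next to the adapter weights,
    # so its parent folder is the directory to load from.
    configs = sorted(dest_dir.rglob("adapter_config.json"))
    if not configs:
        raise FileNotFoundError(f"No adapter_config.json found under {dest_dir}")
    return configs[0].parent
```

In Colab this would be called as `extract_adapters("/content/housebrain_v1_adapters.zip", "/content")`, assuming the zip was uploaded to `/content`. If the returned path differs from `/content/housebrain_v1_adapters`, update `adapter_path` in Cell 4 to match.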


In [None]:
# Cell 2: Install Dependencies
# We need to install the core libraries for running the model.
# Triton is a dependency for bitsandbytes on some Colab GPUs.

print("🚀 Installing required libraries...")
!pip install -q transformers==4.43.3 bitsandbytes==0.43.1 accelerate==0.32.1 torch==2.2.1 peft==0.12.0 triton
print("✅ Installation complete.")
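
Colab ships with its own preinstalled versions of several of these libraries, so it can be worth confirming that the pins actually took effect. The following is a minimal sketch of such a check, not part of the original notebook: `PINNED` and `find_mismatches` are hypothetical names, and the version numbers are copied from the install command above.

```python
from importlib import metadata

# Pins copied from the install cell above.
PINNED = {
    "transformers": "4.43.3",
    "bitsandbytes": "0.43.1",
    "accelerate": "0.32.1",
    "torch": "2.2.1",
    "peft": "0.12.0",
}


def find_mismatches(pinned, get_version=metadata.version):
    """Return {package: (expected, installed_or_None)} for every unsatisfied pin."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # Ignore local build suffixes such as "+cu121" that some wheels carry.
        if installed is None or installed.split("+")[0] != expected:
            mismatches[name] = (expected, installed)
    return mismatches
```

Running `print(find_mismatches(PINNED) or "All pins satisfied.")` in a new cell reports any package whose installed version differs. If a library was already imported before the install ran, restarting the runtime is usually needed for the new version to load.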


In [None]:
# Cell 3: Imports and Setup

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
import os

print("✅ Libraries imported.")


In [None]:
# Cell 4: Load the Fine-Tuned Model
# Load the original base model, then apply the trained adapters on top.

# Define the base model and the adapter location
base_model_name = "Qwen/Qwen2.5-3B-Instruct"
adapter_path = "/content/housebrain_v1_adapters" # Path to your local adapters

print(f"🔍 Checking for adapter directory at: {adapter_path}")
if not os.path.isdir(adapter_path):
    raise FileNotFoundError(
        f"Adapter directory not found at '{adapter_path}'. "
        f"Please make sure you have uploaded and unzipped your adapters."
    )
print("✅ Adapter directory found.")

# Configure 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

print(f"⬇️ Loading base model: {base_model_name}")
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True
)
print("✅ Base model loaded.")

print(f"🔧 Loading adapters from: {adapter_path}")
model = PeftModel.from_pretrained(base_model, adapter_path)
print("✅ Adapters loaded.")

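print("🔄 Merging model and adapters...")
# Merge the adapter weights into the base model so inference skips the separate adapter layers.
# Note: merging into 4-bit quantized weights can introduce small rounding differences.
model = model.merge_and_unload()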
print("✅ Model merged and ready for inference!")

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_name, trust_remote_code=True)


In [None]:
# Cell 5: Run Inference
# Now we can test our specialized model with a new prompt.

# The prompt must follow the same "messages" format used during fine-tuning
prompt_text = "Design a modern, single-story 2BHK house for a 30x40 feet plot with a total area of 1200 sqft."

messages = [
    {"role": "system", "content": "You are a helpful assistant that generates house plans in JSON format."},
    {"role": "user", "content": prompt_text}
]

# Apply the chat template and tokenize
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# Move the inputs to whichever device the model was placed on
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

print("🤖 Generating house plan...")

# Generate the output
generated_ids = model.generate(
    **model_inputs,      # Pass input_ids and attention_mask together
    max_new_tokens=2048, # Generous token limit, since detailed plans can be long
    do_sample=True,      # Use sampling for more varied outputs
    top_p=0.9,           # Nucleus sampling
    temperature=0.6      # A bit of creativity, but not too much
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

print("---✨ Generated Plan ✨---")
print(response)
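
Since the system prompt asks for a plan in JSON format, a natural next step is to parse `response` instead of only printing it. The following is a minimal sketch, not part of the original notebook: `extract_json` is a hypothetical helper, and because the notebook does not show the plan's schema, it makes no assumption about which keys the plan contains. With sampling enabled, the model may also emit text around the JSON or invalid JSON, so the helper returns `None` rather than raising.

```python
import json


def extract_json(text):
    """Return the first JSON object that parses from text, or None if there is none."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at `start`
            # and ignores whatever text follows it.
            obj, _ = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON from this brace; try the next one.
            start = text.find("{", start + 1)
    return None
```

Calling `plan = extract_json(response)` after the generation cell gives a Python dict when the output parses. A `None` result is a signal to re-run generation or lower `temperature`.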
