# HouseBrain: Production Fine-Tuning on Google Colab

This notebook provides the complete, end-to-end workflow for fine-tuning a production-grade Large Language Model for architectural design. It uses our full **20-example "Gold Standard" dataset** to teach a general-purpose 7-billion-parameter chat model (`meta-llama/Llama-2-7b-chat-hf`) our target output schema and the conventions of Indian residential architecture.

**GPU Requirement:** This notebook is designed for a T4 GPU, which is available in the free tier of Google Colab. For faster results, you can use an A100 on Colab Pro+.


## Step 1: Environment Setup

This step clones the project repository from GitHub and installs all the necessary Python packages for fine-tuning, including `transformers`, `peft`, `trl`, and `bitsandbytes` for memory-efficient 4-bit training.


In [None]:
!git clone https://github.com/your-username/housebrain_v1_1.git
%cd housebrain_v1_1

!pip install -q -U transformers datasets accelerate peft trl bitsandbytes
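Optionally, confirm that the install succeeded before moving on. This sketch uses only the standard library's `importlib.metadata`, and the helper name `installed_versions` is illustrative.

```python
from importlib.metadata import PackageNotFoundError, version

# Packages installed by the pip command above.
REQUIRED = ["transformers", "datasets", "accelerate", "peft", "trl", "bitsandbytes"]


def installed_versions(packages):
    """Map each package name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(REQUIRED).items():
        print(f"{name:>14}: {ver or 'NOT INSTALLED'}")
```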


## Step 3: Prepare the Gold Standard Data

This step runs our preparation script. It processes the 20 raw Gold Standard JSON files and writes them to a new `data/training/gold_standard_finetune_ready` directory in the simple `{"prompt": "...", "output": "..."}` format that the training script requires.
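The conversion itself is straightforward. As a hypothetical sketch (the real `prepare_gold_standard_data.py` is not shown here), assume each raw file stores the design brief under an `input` key and the target design under an `output` key. The illustrative helper `convert_gold_standard` rewrites each file into the prompt/output shape, serializing structured designs to a JSON string so the model is trained to emit JSON text.

```python
import json
from pathlib import Path


def convert_gold_standard(raw_dir, out_dir, prompt_key="input", output_key="output"):
    """Convert raw example files into {"prompt": ..., "output": ...} records.

    Hypothetical sketch: prompt_key and output_key are assumed field names,
    not necessarily the ones used by scripts/prepare_gold_standard_data.py.
    Returns the number of files written.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = 0
    for src in sorted(Path(raw_dir).glob("*.json")):
        raw = json.loads(src.read_text(encoding="utf-8"))
        target = raw[output_key]
        record = {
            "prompt": str(raw[prompt_key]),
            # Structured designs become a JSON string, so the model learns to emit JSON text.
            "output": target if isinstance(target, str) else json.dumps(target, ensure_ascii=False),
        }
        # One output file per example, keeping the source file name.
        (out_path / src.name).write_text(json.dumps(record, ensure_ascii=False), encoding="utf-8")
        written += 1
    return written
```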


In [None]:
!python scripts/prepare_gold_standard_data.py
!printf "\n✅ Data preparation complete. Verifying the new directory:\n"
!ls -l data/training/gold_standard_finetune_ready
!ls data/training/gold_standard_finetune_ready | wc -l
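Counting files confirms that the script ran, but not that every record has the right shape. A stricter check could look like the sketch below. It assumes one JSON object per `.json` file, and the helper `check_finetune_ready` is hypothetical; if the script writes JSON Lines instead, the loop would need to read one record per line.

```python
import json
from pathlib import Path


def check_finetune_ready(data_dir):
    """Return (number of files checked, list of problems) for a prepared data directory."""
    files = sorted(Path(data_dir).glob("*.json"))
    problems = [] if files else [f"no .json files found in {data_dir}"]
    for path in files:
        try:
            record = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"{path.name}: invalid JSON ({exc})")
            continue
        # Every record must contain exactly the two keys the training script expects.
        if not isinstance(record, dict) or set(record) != {"prompt", "output"}:
            problems.append(f"{path.name}: expected exactly the keys 'prompt' and 'output'")
        elif not all(isinstance(record[k], str) and record[k].strip() for k in ("prompt", "output")):
            problems.append(f"{path.name}: 'prompt' and 'output' must be non-empty strings")
    return len(files), problems


if __name__ == "__main__":
    count, problems = check_finetune_ready("data/training/gold_standard_finetune_ready")
    print(f"Checked {count} files")
    print("\n".join(problems) or "All files match the prompt/output format.")
```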


## Step 4: Run the Fine-Tuning Script

This is the core of the process. We execute the `run_finetuning.py` script, which will:

1.  **Load** our 20 prepared Gold Standard examples.
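2.  **Download** the base `Llama-2-7b-chat-hf` model from Hugging Face. This is a gated model, so your Hugging Face account must have been granted access to it and the notebook must be authenticated with an access token.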
3.  **Configure** 4-bit quantization and LoRA for efficient training.
4.  **Fine-tune** the model on our data.
5.  **Save** the final, specialized `housebrain-llama2-7b-v0.1` model to the `models/` directory.
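The third item in the list is what lets a 7B model train on a T4. The arithmetic below is a rough sketch: it counts only the frozen weights (not activations, optimizer state, quantization constants, or CUDA overhead), and the LoRA settings (rank 8 on the `q_proj` and `v_proj` attention projections) are illustrative assumptions, not necessarily what `run_finetuning.py` uses. The Llama-2-7B dimensions (hidden size 4096, 32 decoder layers, about 6.74 billion parameters) come from the model's configuration.

```python
# Rough weight-memory and LoRA-size arithmetic for Llama-2-7B.
# Counts frozen weights only; activations, optimizer state and CUDA overhead are extra.

LLAMA2_7B_PARAMS = 6.74e9   # approximate total parameter count
HIDDEN_SIZE = 4096          # hidden_size in the model config
NUM_LAYERS = 32             # num_hidden_layers in the model config


def weight_memory_gib(num_params, bits_per_param):
    """Memory needed to hold the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


def lora_trainable_params(rank, d_in, d_out, num_matrices_per_layer, num_layers):
    """LoRA adds A (d_in x rank) and B (rank x d_out) for each adapted matrix."""
    per_matrix = rank * (d_in + d_out)
    return per_matrix * num_matrices_per_layer * num_layers


if __name__ == "__main__":
    fp16 = weight_memory_gib(LLAMA2_7B_PARAMS, 16)
    four_bit = weight_memory_gib(LLAMA2_7B_PARAMS, 4)
    # Assumed LoRA setup (illustrative): rank 8 on q_proj and v_proj, each 4096 x 4096.
    lora = lora_trainable_params(8, HIDDEN_SIZE, HIDDEN_SIZE, 2, NUM_LAYERS)
    print(f"fp16 weights:  {fp16:.1f} GiB")
    print(f"4-bit weights: {four_bit:.1f} GiB")
    print(f"LoRA trainable params: {lora:,} ({lora / LLAMA2_7B_PARAMS:.2%} of the base model)")
```

On a 16 GB T4, fp16 weights alone would occupy most of the card before training starts, whereas 4-bit weights leave room for activations. Only the small LoRA matrices receive gradients and optimizer state.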

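We use a large number of epochs (200) because the dataset is high-quality but very small: many passes over the same 20 examples are intended to make the model learn the schema thoroughly. The trade-off is a high risk of overfitting, since the model sees each example 200 times, so check its outputs on prompts that are not in the training set.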
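With 20 examples and a batch size of 2, the epoch count translates into a concrete number of parameter updates. This sketch assumes a single GPU and no gradient accumulation, which may differ from what `run_finetuning.py` configures; the helper name `total_optimizer_steps` is illustrative.

```python
import math


def total_optimizer_steps(num_examples, batch_size, epochs):
    """Optimizer updates for a full run on one device without gradient accumulation.

    A final partial batch in each epoch is kept, which matches the Hugging Face
    Trainer default (dataloader_drop_last=False).
    """
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * epochs


if __name__ == "__main__":
    steps = total_optimizer_steps(num_examples=20, batch_size=2, epochs=200)
    print(f"{steps} optimizer steps; each example is seen 200 times")
```

So 200 epochs means 10 steps per epoch and 2,000 steps in total, a modest run in wall-clock terms despite the large epoch count.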


In [None]:
!python scripts/run_finetuning.py \
    --dataset-path "data/training/gold_standard_finetune_ready" \
    --base-model "meta-llama/Llama-2-7b-chat-hf" \
    --output-path "models/housebrain-llama2-7b-v0.1" \
    --epochs 200 \
    --batch-size 2 \
    --learning-rate 2e-5
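The script's own argument parser is not shown in this notebook. If you are writing or adapting `run_finetuning.py`, a parser accepting the flags used above could look like the hypothetical sketch below; only the flag names come from the command, while the types, help text, and the `build_parser` name are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags passed to run_finetuning.py above.

    Flag names come from the notebook command; types and help text are assumptions.
    """
    parser = argparse.ArgumentParser(description="Fine-tune a base LLM on prompt/output pairs.")
    parser.add_argument("--dataset-path", required=True, help="directory of prepared examples")
    parser.add_argument("--base-model", required=True, help="Hugging Face model ID of the base model")
    parser.add_argument("--output-path", required=True, help="where to save the fine-tuned model")
    parser.add_argument("--epochs", type=int, required=True, help="number of passes over the data")
    parser.add_argument("--batch-size", type=int, required=True, help="examples per training batch")
    parser.add_argument("--learning-rate", type=float, required=True, help="optimizer learning rate")
    return parser
```

`argparse` turns `--batch-size` into `args.batch_size`, so the script body would read `args.epochs`, `args.batch_size`, and so on.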


## Step 5: Next Steps - Using Your Fine-Tuned Model

Once training is complete, the new model is saved in the `models/housebrain-llama2-7b-v0.1` directory.

You can now use this specialized model in your `generate_validated_silver_data.py` script by replacing the model ID it loads with this local directory path. The script can then generate a much larger dataset of validated "silver" examples (thousands rather than 20), which is the path toward a production-ready system.
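Depending on how `run_finetuning.py` saves its result, `models/housebrain-llama2-7b-v0.1` may hold a full model or only LoRA adapter weights, and the two are loaded differently. A PEFT adapter directory (marked by `adapter_config.json`) is typically loaded with `peft`, for example via `AutoPeftModelForCausalLM.from_pretrained`, while a full model (marked by `config.json`) loads directly with `AutoModelForCausalLM.from_pretrained`. The hypothetical helper `describe_saved_model` below checks which case applies using only the standard library.

```python
from pathlib import Path


def describe_saved_model(model_dir):
    """Guess how a saved checkpoint directory should be loaded.

    PEFT's save_pretrained writes adapter_config.json for LoRA adapters;
    Transformers' save_pretrained writes config.json for full models.
    """
    path = Path(model_dir)
    if (path / "adapter_config.json").is_file():
        return "lora-adapter"
    if (path / "config.json").is_file():
        return "full-model"
    return "unknown"


if __name__ == "__main__":
    print(describe_saved_model("models/housebrain-llama2-7b-v0.1"))
```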
