**Run 19 July - INFERENCE | UNIT TEST | ON FINE TUNED LLAMA 8B MODEL QUANTIZED AND WITH QLORA**

Run Unit Test on 23 passage, question pairs. These passage question pairs are loaded from a different pdf that was NOT used in fune tuning.

Most questions are high orders of difficulty.

Inference runs using loaded trained adapter parameters from Storage


***
Important notes from last run

1. Hugging Face login successful!

 - Loading model: meta-llama/Meta-Llama-3-8B-Instruct with standard Hugging Face QLoRA...

  - Loading checkpoint shards: 100%

  - Base model loaded successfully with 4-bit quantization.
  - LoRA adapters configured and applied to the model.
trainable params: 41,943,040 || all params: 8,072,204,288 || trainable%: 0.5196

  - Loading fine-tuning dataset from: /content/drive/MyDrive/fpdata/geetha_vahini/phase_4_question_passage_ans_triplet.jsonl
  - Dataset loaded with 2515 examples.


2. Fine-tuning complete!

  - Fine-tuned LoRA adapters saved to: /content/drive/MyDrive/fpdata/geetha_vahini/llama3_8b_qa_finetuned_adapters_standard_hf

  - You can load these adapters with the base model for inference later.

3. Inference

 - Hugging Face login successful!

 - Checking for existence of fine-tuned adapters: /content/drive/MyDrive/fpdata/geetha_vahini/llama3_8b_qa_finetuned_adapters_standard_hf
Fine-tuned adapters directory found. Proceeding.
 - Loading base model: meta-llama/Meta-Llama-3-8B-Instruct with 4-bit quantization...
 - Loading checkpoint shards: 100%

 - Base model and tokenizer loaded.
 - Loading LoRA adapters from: /content/drive/MyDrive/fpdata/geetha_vahini/llama3_8b_qa_finetuned_adapters_standard_hf...
 - LoRA adapters loaded.
 - Merging LoRA adapters into base model (optional, for faster inference)...
 - Adapters merged.

--- Running Predefined Inference Examples ---
- Example 1:
  - Passage: In the ancient city of Athens, democracy flourished, allowing citizens to participate directly in governance. However, only free-born men were considered citizens, excluding women, slaves, and foreign residents. The Assembly, where laws were debated and passed, met regularly on the Pnyx hill.Socrates, a prominent philosopher, was known for his method of questioning, which often challenged conventional wisdom.
  - Question: Where did the Assembly meet in ancient Athens?
  - Generated Answer: The Assembly met regularly on the Pnyx hill in ancient Athens.




# Part 1: (Aggressive) Installation.

IMPORTANT: follow these steps precisely:
1.	Run this code block.
2.	After it completes, go to Runtime -> Disconnect and delete runtime.
3.	Once the runtime restarts, RUN THIS CODE BLOCK AGAIN.
4.	After this block finishes its second run, you can safely proceed to the next sections.

In [1]:
# --- Installation Block ---
print("Starting library installations and upgrades...")

# Aggressively uninstall to ensure a clean slate, especially for torch and torchvision
!pip uninstall -y torch torchvision torchaudio transformers accelerate bitsandbytes trl peft datasets xformers

# Clear relevant caches
print("Clearing bitsandbytes cache...")
!rm -rf ~/.cache/bitsandbytes
print("Clearing Hugging Face cache...")
!rm -rf ~/.cache/huggingface/hub/*

# Install PyTorch and Torchvision specifically for CUDA 12.1 (common in Colab)
print("Installing PyTorch and Torchvision for CUDA 12.1...")
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

# Install other core Hugging Face libraries
print("Installing transformers, accelerate, bitsandbytes, trl, peft, datasets...")
# Pin trl to a known compatible version to avoid import errors
!pip install transformers accelerate bitsandbytes "trl==0.8.6" peft datasets
# xformers is optional, uncomment if you want to try it, but it's not strictly necessary
# !pip install xformers

print("\nLibrary installation complete.")
print("IMPORTANT: Please follow the instructions above about restarting the runtime.")




Starting library installations and upgrades...
Found existing installation: torch 2.5.1+cu121
Uninstalling torch-2.5.1+cu121:
  Successfully uninstalled torch-2.5.1+cu121
Found existing installation: torchvision 0.20.1+cu121
Uninstalling torchvision-0.20.1+cu121:
  Successfully uninstalled torchvision-0.20.1+cu121
Found existing installation: torchaudio 2.5.1+cu121
Uninstalling torchaudio-2.5.1+cu121:
  Successfully uninstalled torchaudio-2.5.1+cu121
Found existing installation: transformers 4.53.2
Uninstalling transformers-4.53.2:
  Successfully uninstalled transformers-4.53.2
Found existing installation: accelerate 1.9.0
Uninstalling accelerate-1.9.0:
  Successfully uninstalled accelerate-1.9.0
Found existing installation: bitsandbytes 0.46.1
Uninstalling bitsandbytes-0.46.1:
  Successfully uninstalled bitsandbytes-0.46.1
Found existing installation: trl 0.8.6
Uninstalling trl-0.8.6:
  Successfully uninstalled trl-0.8.6
Found existing installation: peft 0.16.0
Uninstalling peft-0.16.

# Part 2: Google drive mount, and unit test passage-question jsonl file upload

In [2]:
import os
from google.colab import drive, files

# --- Configuration ---
# Define the base path where your data will be stored in Google Drive
DATA_PATH = "/content/drive/MyDrive/fpdata/geetha_vahini"
UNIT_TEST_FILE_NAME = "unit_test_passage_questions_clean.jsonl"
UNIT_TEST_FILE_PATH = os.path.join(DATA_PATH, UNIT_TEST_FILE_NAME)

# --- Mount Google Drive ---
# Check if Google Drive is already mounted
if not os.path.exists('/content/drive/MyDrive'):
    print("Google Drive not detected as mounted. Attempting to mount...")
    try:
        drive.mount('/content/drive')
        print("Google Drive mounted successfully!")
    except Exception as e:
        print(f"Error mounting Google Drive: {e}")
        print("Please ensure you are running this in a Google Colab environment and authorize Drive access.")
        exit() # Exit if Drive cannot be mounted
else:
    print("Google Drive already mounted. Skipping mounting step.")

# Ensure the target directory exists in Google Drive
os.makedirs(DATA_PATH, exist_ok=True)
print(f"Ensured directory exists: {DATA_PATH}")

# --- Upload Unit Test File ---
print(f"\nNow, please upload your '{UNIT_TEST_FILE_NAME}' file.")
print(f"It will be saved to: {UNIT_TEST_FILE_PATH}")

uploaded = files.upload()

if UNIT_TEST_FILE_NAME in uploaded:
    # Save the uploaded file to the specified Google Drive path
    with open(UNIT_TEST_FILE_PATH, 'wb') as f:
        f.write(uploaded[UNIT_TEST_FILE_NAME])
    print(f"'{UNIT_TEST_FILE_NAME}' uploaded and saved to {UNIT_TEST_FILE_PATH}")

    # Verify file existence after upload
    if os.path.exists(UNIT_TEST_FILE_PATH):
        print(f"Verification: '{UNIT_TEST_FILE_NAME}' found at {UNIT_TEST_FILE_PATH}")
    else:
        print(f"Verification WARNING: '{UNIT_TEST_FILE_NAME}' NOT found at {UNIT_TEST_FILE_PATH} after upload.")
        print("There might be a sync issue. Please check your Google Drive manually.")
else:
    print(f"Error: '{UNIT_TEST_FILE_NAME}' was not found in the uploaded files.")
    print("Please ensure you selected the correct file during upload.")
    exit() # Exit if the file wasn't uploaded correctly

print("\nGoogle Drive setup and file upload complete. Proceed to the next section for inference.")


Google Drive not detected as mounted. Attempting to mount...
Mounted at /content/drive
Google Drive mounted successfully!
Ensured directory exists: /content/drive/MyDrive/fpdata/geetha_vahini

Now, please upload your 'unit_test_passage_questions_clean.jsonl' file.
It will be saved to: /content/drive/MyDrive/fpdata/geetha_vahini/unit_test_passage_questions_clean.jsonl


Saving unit_test_passage_questions_clean.jsonl to unit_test_passage_questions_clean.jsonl
'unit_test_passage_questions_clean.jsonl' uploaded and saved to /content/drive/MyDrive/fpdata/geetha_vahini/unit_test_passage_questions_clean.jsonl
Verification: 'unit_test_passage_questions_clean.jsonl' found at /content/drive/MyDrive/fpdata/geetha_vahini/unit_test_passage_questions_clean.jsonl

Google Drive setup and file upload complete. Proceed to the next section for inference.


# Part 3 - Run fine tuning again since LoRA adapters did not get saved

In [3]:
#utility functions


import os
from google.colab import drive
from huggingface_hub import login

# --- Configuration (re-defined here for clarity, ensure it matches other blocks) ---
DATA_PATH = "/content/drive/MyDrive/fpdata/geetha_vahini"
FINE_TUNING_DATASET_PATH = os.path.join(DATA_PATH, "phase_4_question_passage_ans_triplet.jsonl")

def mount_google_drive_conditionally():
    """
    Checks if Google Drive is already mounted and mounts it if not.
    Exits the program if mounting fails.
    """
    if not os.path.exists('/content/drive/MyDrive'):
        print("Google Drive not detected as mounted. Attempting to mount...")
        try:
            drive.mount('/content/drive')
            print("Google Drive mounted successfully!")
        except Exception as e:
            print(f"Error mounting Google Drive: {e}")
            print("Please ensure you are running this in a Google Colab environment and authorize Drive access.")
            exit() # Exit if Drive cannot be mounted
    else:
        print("Google Drive already mounted. Skipping mounting step.")

def check_file_exists_and_exit_if_not(file_path: str, file_description: str = "file"):
    """
    Checks for the existence of a specified file. If the file does not exist,
    it prints an error message and exits the program.

    Args:
        file_path (str): The full path to the file to check.
        file_description (str): A descriptive name for the file (e.g., "fine-tuning dataset").
    """
    print(f"\nChecking for existence of {file_description}: {file_path}")
    if not os.path.exists(file_path):
        print(f"Error: {file_description} not found at {file_path}")
        print("Please ensure previous steps were completed successfully and the file exists.")
        exit() # Exit if the file is not found
    else:
        print(f"{file_description.capitalize()} found. Proceeding.")

# Example usage (you would call these from your main fine-tuning script)
# mount_google_drive_conditionally()
# check_file_exists_and_exit_if_not(FINE_TUNING_DATASET_PATH, "fine-tuning dataset")

def login_to_huggingface_hub():
    """
    Logs into the Hugging Face Hub.
    Exits the program if login fails.
    """
    print("\nLogging into Hugging Face Hub...")
    try:
        # You will be prompted to enter your HF token in a pop-up or console
        login()
        print("Hugging Face login successful!")
    except Exception as e:
        print(f"Hugging Face login failed: {e}")
        print("Please ensure you have accepted the Llama 3 license and pasted a valid token.")
        exit() # Exit if login fails, as model loading will fail without it

def formatting_prompts_func(examples, tokenizer):
    """
    Formats the dataset examples into the Llama 3 chat template required by SFTTrainer.

    Args:
        examples (dict): A dictionary of lists, where each list corresponds to a column
                         (e.g., 'question', 'passage', 'answer') from the dataset.
        tokenizer: The Hugging Face tokenizer object for the Llama 3 model.

    Returns:
        dict: A dictionary containing a single key 'text', whose value is a list of
              formatted prompt strings suitable for SFTTrainer.
    """
    formatted_texts = []
    for i in range(len(examples["question"])):
        question = examples["question"][i]
        passage = examples["passage"][i]
        answer = examples["answer"][i]

        messages = [
            {"role": "system", "content": "You are a helpful assistant that answers questions based on the provided passage. Ensure your answer is concise and directly addresses the question using information only from the passage."},
            {"role": "user", "content": f"Passage: {passage}\nQuestion: {question}"},
            {"role": "assistant", "content": answer}
        ]
        # Apply the tokenizer's chat template to convert messages to a single string
        # add_generation_prompt=False means we don't add the final assistant turn start token,
        # as the model is learning to generate the assistant's response.
        formatted_texts.append(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=False))
    return {"text": formatted_texts}




In [4]:
# --- Imports for Fine-tuning ---
import os
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, TrainingArguments, BitsAndBytesConfig
from trl import SFTTrainer
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from datasets import load_dataset
# The huggingface_hub and google.colab.drive imports are now handled by their respective utility functions.

# --- Configuration ---
DATA_PATH = "/content/drive/MyDrive/fpdata/geetha_vahini"
FINE_TUNING_DATASET_PATH = os.path.join(DATA_PATH, "phase_4_question_passage_ans_triplet.jsonl")
MODEL_NAME = "meta-llama/Meta-Llama-3-8B-Instruct"
MAX_SEQ_LENGTH = 2048
OUTPUT_DIR = os.path.join(DATA_PATH, "llama3_8b_qa_finetuned_adapters_standard_hf")
os.makedirs(OUTPUT_DIR, exist_ok=True) # Ensure the output directory exists

# --- Call Utility Functions for Drive Access and File Checks ---
# Ensure these functions (mount_google_drive_conditionally, check_file_exists_and_exit_if_not)
# are defined in a cell executed BEFORE this one.
mount_google_drive_conditionally()
check_file_exists_and_exit_if_not(FINE_TUNING_DATASET_PATH, "fine-tuning dataset")

# --- Call Hugging Face Login Utility Function ---
# Ensure this function (login_to_huggingface_hub) is defined in a cell executed BEFORE this one.
login_to_huggingface_hub()

# --- Load Model with QLoRA (Standard Hugging Face way) ---
print(f"\nLoading model: {MODEL_NAME} with standard Hugging Face QLoRA...")

# 4-bit quantization configuration
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4", # NormalFloat 4-bit
    bnb_4bit_compute_dtype=torch.bfloat16, # Compute in bfloat16
    bnb_4bit_use_double_quant=True, # Double quantization for extra memory saving
)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    quantization_config=bnb_config,
    device_map="auto", # Automatically maps model to available GPU
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
# Llama 3 tokenizer doesn't have a default pad_token, set it to eos_token
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right" # For training, right padding is generally better

print("Base model loaded successfully with 4-bit quantization.")

# --- Prepare Model for PEFT and Configure LoRA Adapters ---
# Prepare the model for k-bit training (handles some initializations for quantized models)
model = prepare_model_for_kbit_training(model)

# LoRA Configuration
lora_config = LoraConfig(
    r=16, # Rank of the update matrices
    lora_alpha=32, # Scaling factor
    lora_dropout=0.05, # Dropout for regularization (can be 0 if desired)
    bias="none",
    task_type="CAUSAL_LM",
    # Target modules for Llama 3 (these are the linear layers in attention/FFN blocks)
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)

# Get the PEFT model
model = get_peft_model(model, lora_config)
print("LoRA adapters configured and applied to the model.")
model.print_trainable_parameters() # Shows how many parameters are actually being trained

# --- Load Prepared Dataset ---
print(f"\nLoading fine-tuning dataset from: {FINE_TUNING_DATASET_PATH}")
train_dataset = load_dataset("json", data_files=FINE_TUNING_DATASET_PATH, split="train")
print(f"Dataset loaded with {len(train_dataset)} examples.")

# --- Apply Formatting Function for SFTTrainer ---
# Ensure 'formatting_prompts_func' is defined in a cell executed BEFORE this one.
train_dataset = train_dataset.map(
    lambda examples: formatting_prompts_func(examples, tokenizer), # Pass tokenizer to the function
    batched=True,
    remove_columns=["doc_id", "question", "passage", "answer"],
)
print("Dataset formatted for SFTTrainer.")
print(train_dataset)

# --- Configure Training Arguments ---
training_args = TrainingArguments(
    output_dir="./training_logs_standard_hf", # Temporary directory for training logs
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8, # Keep aggressive accumulation for memory
    warmup_steps=10,
    num_train_epochs=3,
    learning_rate=2e-4,
    fp16=False, # Set to False if bfloat16 is true, or if GPU doesn't support fp16
    bf16=torch.cuda.is_bf16_supported(), # Use BF16 if supported
    logging_steps=10,
    optim="paged_adamw_8bit", # Use paged optimizer for memory efficiency
    weight_decay=0.01,
    lr_scheduler_type="linear",
    seed=42,
    save_total_limit=1,
    push_to_hub=False,
    report_to="none",
    remove_unused_columns=False,
    dataloader_num_workers=0,
)

# --- Initialize SFTTrainer ---
print("\nInitializing SFTTrainer...")
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    dataset_text_field="text",
    max_seq_length=MAX_SEQ_LENGTH,
    packing=True,
    args=training_args,
)
print("SFTTrainer initialized. Starting training...")

# --- Start Training ---
trainer.train()
print("\nFine-tuning complete!")

# --- Save Fine-tuned Adapters ---
model.save_pretrained(OUTPUT_DIR)
tokenizer.save_pretrained(OUTPUT_DIR)
print(f"Fine-tuned LoRA adapters saved to: {OUTPUT_DIR}")
print("You can load these adapters with the base model for inference later.")


Google Drive already mounted. Skipping mounting step.

Checking for existence of fine-tuning dataset: /content/drive/MyDrive/fpdata/geetha_vahini/phase_4_question_passage_ans_triplet.jsonl
Fine-tuning dataset found. Proceeding.

Logging into Hugging Face Hub...


VBox(children=(HTML(value='<center> <img\nsrc=https://huggingface.co/front/assets/huggingface_logo-noborder.sv…

Hugging Face login successful!

Loading model: meta-llama/Meta-Llama-3-8B-Instruct with standard Hugging Face QLoRA...


config.json:   0%|          | 0.00/654 [00:00<?, ?B/s]

model.safetensors.index.json:   0%|          | 0.00/23.9k [00:00<?, ?B/s]

Fetching 4 files:   0%|          | 0/4 [00:00<?, ?it/s]

model-00002-of-00004.safetensors:   0%|          | 0.00/5.00G [00:00<?, ?B/s]

model-00004-of-00004.safetensors:   0%|          | 0.00/1.17G [00:00<?, ?B/s]

model-00003-of-00004.safetensors:   0%|          | 0.00/4.92G [00:00<?, ?B/s]

model-00001-of-00004.safetensors:   0%|          | 0.00/4.98G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/187 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/51.0k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/9.09M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/73.0 [00:00<?, ?B/s]

Base model loaded successfully with 4-bit quantization.
LoRA adapters configured and applied to the model.
trainable params: 41,943,040 || all params: 8,072,204,288 || trainable%: 0.5196

Loading fine-tuning dataset from: /content/drive/MyDrive/fpdata/geetha_vahini/phase_4_question_passage_ans_triplet.jsonl


Generating train split: 0 examples [00:00, ? examples/s]

Dataset loaded with 2515 examples.


Map:   0%|          | 0/2515 [00:00<?, ? examples/s]

Dataset formatted for SFTTrainer.
Dataset({
    features: ['text'],
    num_rows: 2515
})

Initializing SFTTrainer...


Generating train split: 0 examples [00:00, ? examples/s]

  super().__init__(
No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.


SFTTrainer initialized. Starting training...


  return fn(*args, **kwargs)


Step,Training Loss
10,1.2458
20,0.8472
30,0.7873
40,0.725
50,0.6867
60,0.7077
70,0.6384
80,0.6125
90,0.5313
100,0.5088



Fine-tuning complete!
Fine-tuned LoRA adapters saved to: /content/drive/MyDrive/fpdata/geetha_vahini/llama3_8b_qa_finetuned_adapters_standard_hf
You can load these adapters with the base model for inference later.


In [5]:
####After fine tuning completes, verify adapters were actually written to storage

import os
from google.colab import drive

# Ensure drive is mounted if not already
if not os.path.exists('/content/drive/MyDrive'):
    drive.mount('/content/drive')
    print("Drive mounted.")
else:
    print("Drive already mounted.")

ADAPTERS_DIR = "/content/drive/MyDrive/fpdata/geetha_vahini/llama3_8b_qa_finetuned_adapters_standard_hf"

print(f"\nVerifying contents of: {ADAPTERS_DIR}")
if os.path.exists(ADAPTERS_DIR):
    contents = os.listdir(ADAPTERS_DIR)
    print(f"Contents of '{ADAPTERS_DIR}': {contents}")
    if "adapter_config.json" in contents and "adapter_model.safetensors" in contents:
        print("SUCCESS: 'adapter_config.json' and 'adapter_model.safetensors' are present!")
    else:
        print("WARNING: Essential adapter files are missing from the directory.")
else:
    print(f"ERROR: Directory '{ADAPTERS_DIR}' does not exist.")

Drive already mounted.

Verifying contents of: /content/drive/MyDrive/fpdata/geetha_vahini/llama3_8b_qa_finetuned_adapters_standard_hf
Contents of '/content/drive/MyDrive/fpdata/geetha_vahini/llama3_8b_qa_finetuned_adapters_standard_hf': ['README.md', 'adapter_model.safetensors', 'adapter_config.json', 'chat_template.jinja', 'tokenizer_config.json', 'special_tokens_map.json', 'tokenizer.json']
SUCCESS: 'adapter_config.json' and 'adapter_model.safetensors' are present!


# Part 4 - Inference on loaded passage, question pairs ("generate answer for the question given the context that is the passage"

 - Generated answer files saved to persistent storage.  unit_test_generated_answers_fine_tuned_llama8bQLora.jsonl

In [5]:
######UTILITY FUNCTONS

import os
from google.colab import drive, files
import zipfile
import tarfile

def upload_and_extract_fine_tuned_adapters_folder(data_path: str):
    """
    Guides the user to upload a zipped folder containing fine-tuned model adapters
    and extracts its contents to the specified data_path in Google Drive.

    Args:
        data_path (str): The base path in Google Drive where the adapters folder
                         (e.g., 'llama3_8b_qa_finetuned_adapters_standard_hf')
                         should reside.
    """
    # --- Configuration ---
    FINE_TUNED_ADAPTERS_FOLDER_NAME = "llama3_8b_qa_finetuned_adapters_standard_hf"
    FINE_TUNED_ADAPTERS_PATH = os.path.join(data_path, FINE_TUNED_ADAPTERS_FOLDER_NAME)
    UPLOAD_ZIP_FILE_NAME = f"{FINE_TUNED_ADAPTERS_FOLDER_NAME}.zip" # Expected zip file name

    # --- Mount Google Drive (if not already mounted) ---
    if not os.path.exists('/content/drive/MyDrive'):
        print("Google Drive not detected as mounted. Attempting to mount...")
        try:
            drive.mount('/content/drive')
            print("Google Drive mounted successfully!")
        except Exception as e:
            print(f"Error mounting Google Drive: {e}")
            print("Please ensure you are running this in a Google Colab environment and authorize Drive access.")
            return # Use return instead of exit() when inside a function to allow calling code to handle

    else:
        print("Google Drive already mounted.")

    # --- Ensure the target adapters directory exists ---
    # This directory will be where the contents of the zip file are extracted
    os.makedirs(FINE_TUNED_ADAPTERS_PATH, exist_ok=True)
    print(f"Ensured target adapters directory exists: {FINE_TUNED_ADAPTERS_PATH}")

    # --- Manual Upload of Zipped Adapters Folder ---
    print(f"\n--- UPLOAD INSTRUCTIONS ---")
    print(f"1. On your local machine, zip the '{FINE_TUNED_ADAPTERS_FOLDER_NAME}' folder.")
    print(f"   (e.g., right-click folder -> Compress or Send to -> Compressed (zipped) folder)")
    print(f"2. When prompted, select and upload the generated ZIP file (e.g., '{UPLOAD_ZIP_FILE_NAME}').")
    print(f"--------------------------")

    print(f"\nNow, please upload the zipped folder containing your fine-tuned adapters.")

    uploaded = files.upload()

    if UPLOAD_ZIP_FILE_NAME in uploaded:
        # Save the uploaded zip file to a temporary location
        temp_zip_path = os.path.join("/tmp", UPLOAD_ZIP_FILE_NAME)
        with open(temp_zip_path, 'wb') as f:
            f.write(uploaded[UPLOAD_ZIP_FILE_NAME])
        print(f"'{UPLOAD_ZIP_FILE_NAME}' uploaded to temporary location.")

        # --- Extract the contents of the zip file ---
        print(f"Extracting '{UPLOAD_ZIP_FILE_NAME}' to '{data_path}'...") # Extract to DATA_PATH to get folder inside
        try:
            # Check if it's a zip file
            if zipfile.is_zipfile(temp_zip_path):
                with zipfile.ZipFile(temp_zip_path, 'r') as zip_ref:
                    # Extract all contents. The zip file should contain the folder itself,
                    # so we extract to the parent DATA_PATH to get the folder inside.
                    zip_ref.extractall(data_path)
                print("Zip file extracted successfully.")
            # Check if it's a tar.gz file (less common for single folders, but good to support)
            elif tarfile.is_tarfile(temp_zip_path):
                with tarfile.open(temp_zip_path, 'r:gz') as tar_ref:
                    tar_ref.extractall(data_path)
                print("Tar.gz file extracted successfully.")
            else:
                print(f"Error: Uploaded file '{UPLOAD_ZIP_FILE_NAME}' is not a recognized archive format (zip or tar.gz).")
                os.remove(temp_zip_path)
                return

            # Clean up the temporary zip file
            os.remove(temp_zip_path)

            # --- Verify extracted folder and its contents ---
            print(f"\nVerifying extracted folder and its contents in: {FINE_TUNED_ADAPTERS_PATH}")
            if os.path.exists(FINE_TUNED_ADAPTERS_PATH) and os.path.isdir(FINE_TUNED_ADAPTERS_PATH):
                print(f"Folder '{FINE_TUNED_ADAPTERS_FOLDER_NAME}' found.")
                contents = os.listdir(FINE_TUNED_ADAPTERS_PATH)
                print(f"Contents of '{FINE_TUNED_ADAPTERS_FOLDER_NAME}': {contents}")
                if "adapter_config.json" in contents and "adapter_model.safetensors" in contents:
                    print("SUCCESS: 'adapter_config.json' and 'adapter_model.safetensors' found.")
                else:
                    print("WARNING: Essential adapter files (adapter_config.json or adapter_model.safetensors) not found after extraction.")
                    print("Please ensure your zipped folder contains these files directly at its root.")
            else:
                print(f"Verification WARNING: Folder '{FINE_TUNED_ADAPTERS_FOLDER_NAME}' NOT found after extraction.")
                print("Please ensure your zipped file contains the folder directly, not nested too deep.")

        except Exception as e:
            print(f"An error occurred during extraction: {e}")
            print("Please ensure the zip file is not corrupted and contains the expected folder structure.")
    else:
        print(f"Error: '{UPLOAD_ZIP_FILE_NAME}' was not found in the uploaded files.")
        print("Please ensure you selected the correct zip file during upload.")

    print("\nManual upload process complete.")

# Example of how to call this function (you would call it from your main script)
DATA_PATH_FOR_CALL = "/content/drive/MyDrive/fpdata/geetha_vahini"

upload_and_extract_fine_tuned_adapters_folder(DATA_PATH_FOR_CALL)


Google Drive not detected as mounted. Attempting to mount...
Mounted at /content/drive
Google Drive mounted successfully!
Ensured target adapters directory exists: /content/drive/MyDrive/fpdata/geetha_vahini/llama3_8b_qa_finetuned_adapters_standard_hf

--- UPLOAD INSTRUCTIONS ---
1. On your local machine, zip the 'llama3_8b_qa_finetuned_adapters_standard_hf' folder.
   (e.g., right-click folder -> Compress or Send to -> Compressed (zipped) folder)
2. When prompted, select and upload the generated ZIP file (e.g., 'llama3_8b_qa_finetuned_adapters_standard_hf.zip').
--------------------------

Now, please upload the zipped folder containing your fine-tuned adapters.


Error: 'llama3_8b_qa_finetuned_adapters_standard_hf.zip' was not found in the uploaded files.
Please ensure you selected the correct zip file during upload.

Manual upload process complete.


In [11]:
##INFERENCE RUN BATCH 1, FIRST 16 QUESTIONS. unit_test_passage_questions_clean.jsonl
# unit_test_generated_answers_fine_tuned_llama8bQLora.jsonl


import os
import json
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel # For loading LoRA adapters
from huggingface_hub import login # For Hugging Face authentication
from google.colab import drive # Keep drive import for mounting

# --- Configuration (ensure consistency with previous blocks) ---
DATA_PATH = "/content/drive/MyDrive/fpdata/geetha_vahini"
# Path to the saved fine-tuned LoRA adapters from your previous fine-tuning run
FINE_TUNED_ADAPTERS_PATH = os.path.join(DATA_PATH, "llama3_8b_qa_finetuned_adapters_standard_hf")
MODEL_NAME = "meta-llama/Meta-Llama-3-8B-Instruct"
MAX_SEQ_LENGTH = 2048 # Max sequence length used during training

# Path to the uploaded unit test file
UNIT_TEST_FILE_NAME = "unit_test_passage_questions_clean.jsonl"
UNIT_TEST_FILE_PATH = os.path.join(DATA_PATH, UNIT_TEST_FILE_NAME)

# New output file for generated answers
GENERATED_ANSWERS_FILE = os.path.join(DATA_PATH, "unit_test_generated_answers_fine_tuned_llama8bQLora.jsonl")

# --- Mount Google Drive (if not already mounted in this session) ---
print("Mounting Google Drive for inference...")
if not os.path.exists('/content/drive/MyDrive'):
    try:
        drive.mount('/content/drive')
        print("Google Drive mounted successfully!")
    except Exception as e:
        print(f"Error mounting Google Drive: {e}")
        print("Please ensure you are running this in a Google Colab environment and authorize Drive access.")
        exit()
else:
    print("Google Drive already mounted.")

# --- Hugging Face Login (REQUIRED for Llama 3) ---
print("\nLogging into Hugging Face Hub...")
try:
    login()
    print("Hugging Face login successful!")
except Exception as e:
    print(f"Hugging Face login failed: {e}")
    print("Please ensure you have accepted the Llama 3 license and pasted a valid token.")
    exit() # Exit if login fails

# --- Check for existence of fine-tuned adapters directory and unit test file ---
print(f"\nChecking for existence of fine-tuned adapters directory: {FINE_TUNED_ADAPTERS_PATH}")
if not os.path.exists(FINE_TUNED_ADAPTERS_PATH):
    print(f"Error: Fine-tuned adapters directory not found at {FINE_TUNUNED_ADAPTERS_PATH}")
    print("Please ensure Step 2 (LLM Fine-Tuning) was completed successfully and adapters were saved to this path,")
    print("or that you have manually uploaded the entire folder to your Google Drive.")
    exit()
else:
    print("Fine-tuned adapters directory found. Proceeding.")

print(f"\nChecking for existence of unit test file: {UNIT_TEST_FILE_PATH}")
if not os.path.exists(UNIT_TEST_FILE_PATH):
    print(f"Error: Unit test file not found at {UNIT_TEST_FILE_PATH}")
    print("Please ensure you ran Section 2 and successfully uploaded the file.")
    exit()
else:
    print("Unit test file found. Proceeding.")


# --- Load Base Model with 4-bit Quantization ---
print(f"\nLoading base model: {MODEL_NAME} with 4-bit quantization...")
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16, # Use float16 for compute during inference
)

base_model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
# Llama 3 tokenizer doesn't have a default pad_token, set it to eos_token
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left" # For inference, left padding is generally preferred

print("Base model and tokenizer loaded.")

# --- Load Fine-tuned LoRA Adapters ---
print(f"Loading LoRA adapters from: {FINE_TUNED_ADAPTERS_PATH}...")
# Attach the LoRA adapters to the base model
model = PeftModel.from_pretrained(base_model, FINE_TUNED_ADAPTERS_PATH)
print("LoRA adapters loaded.")

# Optional: Merge adapters into the base model for faster inference (requires more VRAM)
print("Merging LoRA adapters into base model (optional, for faster inference)...")
model = model.merge_and_unload()
print("Adapters merged.")

# Set model to evaluation mode
model.eval()

# --- Function to generate response ---
def generate_answer(passage: str, question: str, model, tokenizer, max_new_tokens=100):
    """
    Generates an answer to a question based on a provided passage using the fine-tuned model.
    """
    messages = [
        {"role": "system", "content": "You are a helpful assistant that answers questions based on the provided passage. Be concise and directly answer the question."},
        {"role": "user", "content": f"Passage: {passage}\nQuestion: {question}"},
    ]
    # Apply chat template and tokenize
    input_ids = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True, # Important: tells the model to generate the assistant's turn
        return_tensors="pt"
    ).to(model.device)

    # Generate response
    with torch.no_grad(): # No need to calculate gradients during inference
        outputs = model.generate(
            input_ids=input_ids,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            top_p=0.9,
            temperature=0.7,
            pad_token_id=tokenizer.pad_token_id,
        )

    # Decode the generated text
    decoded_output = tokenizer.decode(outputs[0], skip_special_tokens=False)

    # Find the start of the assistant's response
    assistant_start_tag = "<|start_header_id|>assistant<|end_header_id|>\n"
    start_index = decoded_output.find(assistant_start_tag)

    if start_index != -1:
        generated_answer = decoded_output[start_index + len(assistant_start_tag):].strip()
        # Remove any trailing <|eot_id|> or other special tokens
        generated_answer = generated_answer.replace("<|eot_id|>", "").strip()
    else:
        generated_answer = "Could not parse assistant's response." # Fallback

    return generated_answer

# --- Load Unit Test Data and Run Inference ---
print(f"\n--- Running Inference on '{UNIT_TEST_FILE_NAME}' and saving results ---")

unit_test_data = []
try:
    with open(UNIT_TEST_FILE_PATH, 'r', encoding='utf-8') as f:
        for line in f:
            unit_test_data.append(json.loads(line))
    print(f"Loaded {len(unit_test_data)} examples from '{UNIT_TEST_FILE_NAME}'.")
except Exception as e:
    print(f"Error loading unit test data: {e}")
    exit()

# Open the output file for writing generated answers
with open(GENERATED_ANSWERS_FILE, 'w', encoding='utf-8') as f_out:
    for i, example in enumerate(unit_test_data):
        doc_id = example.get('id', f"unknown_id_{i+1}")
        passage = example.get('passage', 'No passage provided.')
        question = example.get('question', 'No question provided.')

        print(f"\n--- Unit Test Example {i+1} (Doc ID: {doc_id}) ---")
        print(f"Passage: {passage.strip()}")
        print(f"Question: {question}")

        generated_answer = generate_answer(passage, question, model, tokenizer)
        print(f"Generated Answer: {generated_answer}")

        # Save the generated answer to the new JSONL file
        output_entry = {
            "id": doc_id,
            "question": question,
            "passage": passage,
            "gen_answer": generated_answer
        }
        json.dump(output_entry, f_out)
        f_out.write('\n') # Add newline for JSONL format

print(f"\nAll unit test inference examples completed. Generated answers saved to: {GENERATED_ANSWERS_FILE}")


Mounting Google Drive for inference...
Google Drive already mounted.

Logging into Hugging Face Hub...


VBox(children=(HTML(value='<center> <img\nsrc=https://huggingface.co/front/assets/huggingface_logo-noborder.sv…

Hugging Face login successful!

Checking for existence of fine-tuned adapters directory: /content/drive/MyDrive/fpdata/geetha_vahini/llama3_8b_qa_finetuned_adapters_standard_hf
Fine-tuned adapters directory found. Proceeding.

Checking for existence of unit test file: /content/drive/MyDrive/fpdata/geetha_vahini/unit_test_passage_questions_clean.jsonl
Unit test file found. Proceeding.

Loading base model: meta-llama/Meta-Llama-3-8B-Instruct with 4-bit quantization...


Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

Base model and tokenizer loaded.
Loading LoRA adapters from: /content/drive/MyDrive/fpdata/geetha_vahini/llama3_8b_qa_finetuned_adapters_standard_hf...
LoRA adapters loaded.
Merging LoRA adapters into base model (optional, for faster inference)...


The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


Adapters merged.

--- Running Inference on 'unit_test_passage_questions_clean.jsonl' and saving results ---
Error loading unit test data: Expecting ',' delimiter: line 1 column 128 (char 127)

--- Unit Test Example 1 (Doc ID: 1) ---
Passage: While uneducated, the uninitiated, the person who has not taught himself the first steps of Sadhana feels he is one with his physical frame. Sath Chith Ananda—the expression indicates the Eternal. Niraakaara means without Aakaara or Form. What form can we posit of the All-pervasive, the All-inclusive? "Para" or "Param" means super, beyond, above, more glorious than all. Parabrahmam indicates the One beyond and behind everything, grander than anything in the three worlds. It is non-dual, unique, the eternal and infinite. "Two" means difference, dissension, inevitable discord.
Question: What do the terms Niraakaara, Para, and Parabrahmamreveal about the nature of the Eternal, and how does this contrast with physical identification?
Generated Answer: 

Run 2 to cover remaining doc ids

In [None]:
########




In [2]:
##INFERENCE RUN BATCH 2, POST 16 QUESTIONS. unit_test_passage_questions_clean2.jsonl
# unit_test_generated_answers_fine_tuned_llama8bQLora_2.jsonl

import os
import json
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel # For loading LoRA adapters
from huggingface_hub import login # For Hugging Face authentication
from google.colab import drive, files # Keep drive and files import for mounting/upload
import zipfile # Import zipfile for extraction (still needed for adapter upload function if used)
import tarfile # Import tarfile for extraction (still needed for adapter upload function if used)

# --- Configuration (ensure consistency with previous blocks) ---
DATA_PATH = "/content/drive/MyDrive/fpdata/geetha_vahini"
# Path to the saved fine-tuned LoRA adapters from your previous fine-tuning run
FINE_TUNED_ADAPTERS_PATH = os.path.join(DATA_PATH, "llama3_8b_qa_finetuned_adapters_standard_hf")
MODEL_NAME = "meta-llama/Meta-Llama-3-8B-Instruct"
MAX_SEQ_LENGTH = 2048 # Max sequence length used during training

# Path to the uploaded unit test file
UNIT_TEST_FILE_NAME = "unit_test_passage_questions_clean2.jsonl"
UNIT_TEST_FILE_PATH = os.path.join(DATA_PATH, UNIT_TEST_FILE_NAME)

# New output file for generated answers for this batch
GENERATED_ANSWERS_FILE = os.path.join(DATA_PATH, "unit_test_generated_answers_fine_tuned_llama8bQLora_2.jsonl")

# --- Define the upload and extract function (kept for completeness if needed) ---
def upload_and_extract_fine_tuned_adapters_folder(data_path: str):
    """
    Guides the user to upload a zipped folder containing fine-tuned model adapters
    and extracts its contents to the specified data_path in Google Drive.

    Args:
        data_path (str): The base path in Google Drive where the adapters folder
                         (e.g., 'llama3_8b_qa_finetuned_adapters_standard_hf')
                         should reside.
    """
    FINE_TUNED_ADAPTERS_FOLDER_NAME = "llama3_8b_qa_finetuned_adapters_standard_hf"
    FINE_TUNED_ADAPTERS_PATH_LOCAL = os.path.join(data_path, FINE_TUNED_ADAPTERS_FOLDER_NAME)
    UPLOAD_ZIP_FILE_NAME = f"{FINE_TUNED_ADAPTERS_FOLDER_NAME}.zip" # Expected zip file name

    os.makedirs(FINE_TUNED_ADAPTERS_PATH_LOCAL, exist_ok=True)
    print(f"Ensured target adapters directory exists: {FINE_TUNED_ADAPTERS_PATH_LOCAL}")

    print(f"\n--- UPLOAD INSTRUCTIONS ---")
    print(f"1. On your local machine, zip the '{FINE_TUNED_ADAPTERS_FOLDER_NAME}' folder.")
    print(f"   (e.g., right-click folder -> Compress or Send to -> Compressed (zipped) folder)")
    print(f"2. When prompted, select and upload the generated ZIP file (e.g., '{UPLOAD_ZIP_FILE_NAME}').")
    print(f"--------------------------")

    print(f"\nNow, please upload the zipped folder containing your fine-tuned adapters.")

    uploaded = files.upload()

    if UPLOAD_ZIP_FILE_NAME in uploaded:
        temp_zip_path = os.path.join("/tmp", UPLOAD_ZIP_FILE_NAME)
        with open(temp_zip_path, 'wb') as f:
            f.write(uploaded[UPLOAD_ZIP_FILE_NAME])
        print(f"'{UPLOAD_ZIP_FILE_NAME}' uploaded to temporary location.")

        try:
            if zipfile.is_zipfile(temp_zip_path):
                with zipfile.ZipFile(temp_zip_path, 'r') as zip_ref:
                    zip_ref.extractall(data_path)
                print("Zip file extracted successfully.")
            elif tarfile.is_tarfile(temp_zip_path):
                with tarfile.open(temp_zip_path, 'r:gz') as tar_ref:
                    tar_ref.extractall(data_path)
                print("Tar.gz file extracted successfully.")
            else:
                print(f"Error: Uploaded file '{UPLOAD_ZIP_FILE_NAME}' is not a recognized archive format (zip or tar.gz).")
                os.remove(temp_zip_path)
                return False

            os.remove(temp_zip_path)

            print(f"\nVerifying extracted folder and its contents in: {FINE_TUNED_ADAPTERS_PATH_LOCAL}")
            if os.path.exists(FINE_TUNED_ADAPTERS_PATH_LOCAL) and os.path.isdir(FINE_TUNED_ADAPTERS_PATH_LOCAL):
                print(f"Folder '{FINE_TUNED_ADAPTERS_FOLDER_NAME}' found.")
                contents = os.listdir(FINE_TUNED_ADAPTERS_PATH_LOCAL)
                print(f"Contents of '{FINE_TUNED_ADAPTERS_FOLDER_NAME}': {contents}")
                if "adapter_config.json" in contents and "adapter_model.safetensors" in contents:
                    print("SUCCESS: 'adapter_config.json' and 'adapter_model.safetensors' found.")
                    return True
                else:
                    print("WARNING: Essential adapter files (adapter_config.json or adapter_model.safetensors) not found after extraction.")
                    print("Please ensure your zipped folder contains these files directly at its root.")
                    return False
            else:
                print(f"Verification WARNING: Folder '{FINE_TUNED_ADAPTERS_FOLDER_NAME}' NOT found after extraction.")
                print("Please ensure your zipped file contains the folder directly, not nested too deep.")
                return False

        except Exception as e:
            print(f"An error occurred during extraction: {e}")
            print("Please ensure the zip file is not corrupted and contains the expected folder structure.")
            return False
    else:
        print(f"Error: '{UPLOAD_ZIP_FILE_NAME}' was not found in the uploaded files.")
        print("Please ensure you selected the correct zip file during upload.")
        return False

    print("\nManual upload process complete.")
    return False

# --- Mount Google Drive (if not already mounted in this session) ---
print("Mounting Google Drive for inference...")
if not os.path.exists('/content/drive/MyDrive'):
    try:
        drive.mount('/content/drive')
        print("Google Drive mounted successfully!")
    except Exception as e:
        print(f"Error mounting Google Drive: {e}")
        print("Please ensure you are running this in a Google Colab environment and authorize Drive access.")
        exit()
else:
    print("Google Drive already mounted.")

# --- Hugging Face Login (REQUIRED for Llama 3) ---
print("\nLogging into Hugging Face Hub...")
try:
    login()
    print("Hugging Face login successful!")
except Exception as e:
    print(f"Hugging Face login failed: {e}")
    print("Please ensure you have accepted the Llama 3 license and pasted a valid token.")
    exit() # Exit if login fails

# --- Check for existence of fine-tuned adapters directory ---
print(f"\nChecking for existence of fine-tuned adapters directory: {FINE_TUNED_ADAPTERS_PATH}")
if not os.path.exists(FINE_TUNED_ADAPTERS_PATH):
    print(f"Fine-tuned adapters directory not found. Attempting manual upload and extraction of the zipped adapters folder...")
    # Call the upload function if the directory is missing
    upload_success = upload_and_extract_fine_tuned_adapters_folder(DATA_PATH)
    if not upload_success:
        print("Failed to upload and extract fine-tuned adapters. Exiting.")
        exit()
else:
    print("Fine-tuned adapters directory found. Proceeding.")

# --- Check for existence of unit test file and allow manual upload if missing ---
print(f"\nChecking for existence of unit test file: {UNIT_TEST_FILE_PATH}")
if not os.path.exists(UNIT_TEST_FILE_PATH):
    print(f"Unit test file not found at {UNIT_TEST_FILE_PATH}.")
    print("\n--- UPLOAD INSTRUCTIONS FOR UNIT TEST FILE ---")
    print(f"Please upload your '{UNIT_TEST_FILE_NAME}' file now.")
    print(f"It will be saved to: {UNIT_TEST_FILE_PATH}")
    print("--------------------------------------------")

    uploaded_unit_test = files.upload()

    if UNIT_TEST_FILE_NAME in uploaded_unit_test:
        with open(UNIT_TEST_FILE_PATH, 'wb') as f:
            f.write(uploaded_unit_test[UNIT_TEST_FILE_NAME])
        print(f"'{UNIT_TEST_FILE_NAME}' uploaded and saved to {UNIT_TEST_FILE_PATH}")
        if os.path.exists(UNIT_TEST_FILE_PATH):
            print(f"Verification: '{UNIT_TEST_FILE_NAME}' found at {UNIT_TEST_FILE_PATH}")
        else:
            print(f"Verification WARNING: '{UNIT_TEST_FILE_NAME}' NOT found at {UNIT_TEST_FILE_PATH} after upload.")
            print("There might be a sync issue. Please check your Google Drive manually.")
            exit() # Exit if still not found after upload attempt
    else:
        print(f"Error: '{UNIT_TEST_FILE_NAME}' was not found in the uploaded files.")
        print("Please ensure you selected the correct file during upload.")
        exit() # Exit if upload failed
else:
    print("Unit test file found. Proceeding.")


# --- Load Base Model with 4-bit Quantization ---
print(f"\nLoading base model: {MODEL_NAME} with 4-bit quantization...")
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16, # Use float16 for compute during inference
)

base_model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
# Llama 3 tokenizer doesn't have a default pad_token, set it to eos_token
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left" # For inference, left padding is generally preferred

print("Base model and tokenizer loaded.")

# --- Load Fine-tuned LoRA Adapters ---
print(f"Loading LoRA adapters from: {FINE_TUNED_ADAPTERS_PATH}...")
# Attach the LoRA adapters to the base model
model = PeftModel.from_pretrained(base_model, FINE_TUNED_ADAPTERS_PATH)
print("LoRA adapters loaded.")

# Optional: Merge adapters into the base model for faster inference (requires more VRAM)
print("Merging LoRA adapters into base model (optional, for faster inference)...")
model = model.merge_and_unload()
print("Adapters merged.")

# Set model to evaluation mode
model.eval()

# --- Function to generate response ---
def generate_answer(passage: str, question: str, model, tokenizer, max_new_tokens=70): # Reduced max_new_tokens
    """
    Generates an answer to a question based on a provided passage using the fine-tuned model.
    """
    messages = [
        {"role": "system", "content": "You are a helpful assistant that answers questions based on the provided passage. Be concise and directly answer the question."},
        {"role": "user", "content": f"Passage: {passage}\nQuestion: {question}"},
    ]
    # Apply chat template and tokenize
    input_ids = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True, # Important: tells the model to generate the assistant's turn
        return_tensors="pt"
    ).to(model.device)

    # Generate response
    with torch.no_grad(): # No need to calculate gradients during inference
        outputs = model.generate(
            input_ids=input_ids,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            top_p=0.9,
            temperature=0.7,
            pad_token_id=tokenizer.pad_token_id,
        )

    # Decode the generated text
    decoded_output = tokenizer.decode(outputs[0], skip_special_tokens=False)

    # Find the start of the assistant's response
    assistant_start_tag = "<|start_header_id|>assistant<|end_header_id|>\n"
    start_index = decoded_output.find(assistant_start_tag)

    if start_index != -1:
        generated_answer = decoded_output[start_index + len(assistant_start_tag):].strip()
        # Remove any trailing <|eot_id|> or other special tokens
        generated_answer = generated_answer.replace("<|eot_id|>", "").strip()
    else:
        generated_answer = "Could not parse assistant's response." # Fallback

    return generated_answer

# --- Load Unit Test Data and Run Inference (Batch 2) ---
print(f"\n--- Running Inference on '{UNIT_TEST_FILE_NAME}' (Batch 2) and saving results ---")

unit_test_data = []
try:
    with open(UNIT_TEST_FILE_PATH, 'r', encoding='utf-8') as f:
        for line in f:
            unit_test_data.append(json.loads(line))
    print(f"Loaded {len(unit_test_data)} examples from '{UNIT_TEST_FILE_NAME}'.")
except Exception as e:
    print(f"Error loading unit test data: {e}")
    # If the file load fails, we should exit here, as there's no data to process.
    exit()

# Define the starting index for this batch.
# Since the file 'unit_test_passage_questions_clean2.jsonl' itself contains the examples
# that are considered "Batch 2" (e.g., starting from ID 17), we should start processing
# from the beginning of this file (index 0).
START_INDEX_FOR_BATCH = 0 # Changed to 0 to process all examples in this specific file

if START_INDEX_FOR_BATCH >= len(unit_test_data):
    print(f"Warning: START_INDEX_FOR_BATCH ({START_INDEX_FOR_BATCH}) is out of bounds for dataset size ({len(unit_test_data)}). No examples to process in this batch.")
    # If there's no data to process in this batch, exit gracefully.
    exit()

# Slice the data to process only the desired batch
data_for_this_batch = unit_test_data[START_INDEX_FOR_BATCH:]
print(f"Processing {len(data_for_this_batch)} examples starting from index {START_INDEX_FOR_BATCH}.")

# Open the output file for writing generated answers
with open(GENERATED_ANSWERS_FILE, 'w', encoding='utf-8') as f_out:
    # Iterate through the sliced data
    for i, example in enumerate(data_for_this_batch):
        # Calculate the original example number for printing
        # This will now correctly reflect the ID from the file (e.g., 17, 18, ...)
        # assuming your file IDs are sequential from START_INDEX_FOR_BATCH.
        # If your IDs in the file are already sequential from 17, you can just use example.get('id').
        # For general case, we'll use a simple counter for printing.
        original_example_num = i + 1 # Simple counter for this batch

        doc_id = example.get('id', f"unknown_id_{original_example_num}")
        passage = example.get('passage', 'No passage provided.')
        question = example.get('question', 'No question provided.')

        print(f"\n--- Unit Test Example {original_example_num} (Doc ID: {doc_id}) ---")
        print(f"Passage: {passage.strip()}")
        print(f"Question: {question}")

        # Clear CUDA cache before each inference call
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
            # Optional: Print current memory usage for debugging
            # print(f"  CUDA Memory after empty_cache: {torch.cuda.memory_allocated(0) / (1024**3):.2f} GB allocated")

        generated_answer = generate_answer(passage, question, model, tokenizer)
        print(f"Generated Answer: {generated_answer}")

        # Save the generated answer to the new JSONL file
        output_entry = {
            "id": doc_id,
            "question": question,
            "passage": passage,
            "gen_answer": generated_answer
        }
        json.dump(output_entry, f_out)
        f_out.write('\n') # Add newline for JSONL format

print(f"\nAll unit test inference examples for Batch 2 completed. Generated answers saved to: {GENERATED_ANSWERS_FILE}")




Mounting Google Drive for inference...
Google Drive already mounted.

Logging into Hugging Face Hub...


VBox(children=(HTML(value='<center> <img\nsrc=https://huggingface.co/front/assets/huggingface_logo-noborder.sv…

Hugging Face login successful!

Checking for existence of fine-tuned adapters directory: /content/drive/MyDrive/fpdata/geetha_vahini/llama3_8b_qa_finetuned_adapters_standard_hf
Fine-tuned adapters directory found. Proceeding.

Checking for existence of unit test file: /content/drive/MyDrive/fpdata/geetha_vahini/unit_test_passage_questions_clean2.jsonl
Unit test file found. Proceeding.

Loading base model: meta-llama/Meta-Llama-3-8B-Instruct with 4-bit quantization...


Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

Base model and tokenizer loaded.
Loading LoRA adapters from: /content/drive/MyDrive/fpdata/geetha_vahini/llama3_8b_qa_finetuned_adapters_standard_hf...
LoRA adapters loaded.
Merging LoRA adapters into base model (optional, for faster inference)...


The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


Adapters merged.

--- Running Inference on 'unit_test_passage_questions_clean2.jsonl' (Batch 2) and saving results ---
Loaded 7 examples from 'unit_test_passage_questions_clean2.jsonl'.
Processing 7 examples starting from index 0.

--- Unit Test Example 1 (Doc ID: 17) ---
Passage: The Purusha is but the eternal Witness, the Ever-inactive, the Modification-less. Of what can you say, This is Truth? Only of that which persists in the past, the present, and the future, which has neither beginning nor end, which does not move or change, which has uniform form, unified experience giving property. The body, senses, mind, life-force - all these move and change, begin and end. They are inert (Jada), possess the three gunas - Thamas, Rajas, and Sathwa. They lack fundamental Reality and cause delusion. They have only relative value.
Question: What distinguishes Purusha, the eternal Witness, from the ever-changing elements like body and mind, and why are these considered only relatively real?
Gene