**Run 19 July - CONTROL GROUP INFERENCE
**
 - Control group model: MODEL_NAME = "meta-llama/Meta-Llama-3-8B-Instruct"

 - Run Inference in two modes:

 - 1. Generate answer given question, no passage
 - 2. Generate answer given question and a passage

 Questions, passages from input file: unit_test_passage_questions_clean.jsonl


# Part 1: (Aggressive) Installation.

IMPORTANT: follow these steps precisely:
1.	Run this code block.
2.	After it completes, go to Runtime -> Disconnect and delete runtime.
3.	Once the runtime restarts, RUN THIS CODE BLOCK AGAIN.
4.	After this block finishes its second run, you can safely proceed to the next sections.

In [1]:
# --- Installation Block ---
print("Starting library installations and upgrades...")

# Aggressively uninstall to ensure a clean slate, especially for torch and torchvision
!pip uninstall -y torch torchvision torchaudio transformers accelerate bitsandbytes trl peft datasets xformers

# Clear relevant caches
print("Clearing bitsandbytes cache...")
!rm -rf ~/.cache/bitsandbytes
print("Clearing Hugging Face cache...")
!rm -rf ~/.cache/huggingface/hub/*

# Install PyTorch and Torchvision specifically for CUDA 12.1 (common in Colab)
print("Installing PyTorch and Torchvision for CUDA 12.1...")
!pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

# Install other core Hugging Face libraries
print("Installing transformers, accelerate, bitsandbytes, trl, peft, datasets...")
# Pin trl to a known compatible version to avoid import errors
!pip install transformers accelerate bitsandbytes "trl==0.8.6" peft datasets
# xformers is optional, uncomment if you want to try it, but it's not strictly necessary
# !pip install xformers

print("\nLibrary installation complete.")
print("IMPORTANT: Please follow the instructions above about restarting the runtime.")


Starting library installations and upgrades...
Found existing installation: torch 2.5.1+cu121
Uninstalling torch-2.5.1+cu121:
  Successfully uninstalled torch-2.5.1+cu121
Found existing installation: torchvision 0.20.1+cu121
Uninstalling torchvision-0.20.1+cu121:
  Successfully uninstalled torchvision-0.20.1+cu121
Found existing installation: torchaudio 2.5.1+cu121
Uninstalling torchaudio-2.5.1+cu121:
  Successfully uninstalled torchaudio-2.5.1+cu121
Found existing installation: transformers 4.53.2
Uninstalling transformers-4.53.2:
  Successfully uninstalled transformers-4.53.2
Found existing installation: accelerate 1.9.0
Uninstalling accelerate-1.9.0:
  Successfully uninstalled accelerate-1.9.0
Found existing installation: bitsandbytes 0.46.1
Uninstalling bitsandbytes-0.46.1:
  Successfully uninstalled bitsandbytes-0.46.1
Found existing installation: trl 0.8.6
Uninstalling trl-0.8.6:
  Successfully uninstalled trl-0.8.6
Found existing installation: peft 0.16.0
Uninstalling peft-0.16.

# Part 2: Google drive mount, and unit test passage-question clean jsonl file upload

In [2]:
import os
from google.colab import drive, files

# --- Configuration ---
# Define the base path where your data will be stored in Google Drive
DATA_PATH = "/content/drive/MyDrive/fpdata/geetha_vahini"
UNIT_TEST_FILE_NAME = "unit_test_passage_questions_clean.jsonl"
UNIT_TEST_FILE_PATH = os.path.join(DATA_PATH, UNIT_TEST_FILE_NAME)

# --- Mount Google Drive ---
# Check if Google Drive is already mounted
if not os.path.exists('/content/drive/MyDrive'):
    print("Google Drive not detected as mounted. Attempting to mount...")
    try:
        drive.mount('/content/drive')
        print("Google Drive mounted successfully!")
    except Exception as e:
        print(f"Error mounting Google Drive: {e}")
        print("Please ensure you are running this in a Google Colab environment and authorize Drive access.")
        exit() # Exit if Drive cannot be mounted
else:
    print("Google Drive already mounted. Skipping mounting step.")

# Ensure the target directory exists in Google Drive
os.makedirs(DATA_PATH, exist_ok=True)
print(f"Ensured directory exists: {DATA_PATH}")

# --- Upload Unit Test File ---
print(f"\nNow, please upload your '{UNIT_TEST_FILE_NAME}' file.")
print(f"It will be saved to: {UNIT_TEST_FILE_PATH}")

uploaded = files.upload()

if UNIT_TEST_FILE_NAME in uploaded:
    # Save the uploaded file to the specified Google Drive path
    with open(UNIT_TEST_FILE_PATH, 'wb') as f:
        f.write(uploaded[UNIT_TEST_FILE_NAME])
    print(f"'{UNIT_TEST_FILE_NAME}' uploaded and saved to {UNIT_TEST_FILE_PATH}")

    # Verify file existence after upload
    if os.path.exists(UNIT_TEST_FILE_PATH):
        print(f"Verification: '{UNIT_TEST_FILE_NAME}' found at {UNIT_TEST_FILE_PATH}")
    else:
        print(f"Verification WARNING: '{UNIT_TEST_FILE_NAME}' NOT found at {UNIT_TEST_FILE_PATH} after upload.")
        print("There might be a sync issue. Please check your Google Drive manually.")
else:
    print(f"Error: '{UNIT_TEST_FILE_NAME}' was not found in the uploaded files.")
    print("Please ensure you selected the correct file during upload.")
    exit() # Exit if the file wasn't uploaded correctly

print("\nGoogle Drive setup and file upload complete. Proceed to the next section for inference.")


Google Drive not detected as mounted. Attempting to mount...
Mounted at /content/drive
Google Drive mounted successfully!
Ensured directory exists: /content/drive/MyDrive/fpdata/geetha_vahini

Now, please upload your 'unit_test_passage_questions_clean.jsonl' file.
It will be saved to: /content/drive/MyDrive/fpdata/geetha_vahini/unit_test_passage_questions_clean.jsonl


Saving unit_test_passage_questions_clean.jsonl to unit_test_passage_questions_clean.jsonl
'unit_test_passage_questions_clean.jsonl' uploaded and saved to /content/drive/MyDrive/fpdata/geetha_vahini/unit_test_passage_questions_clean.jsonl
Verification: 'unit_test_passage_questions_clean.jsonl' found at /content/drive/MyDrive/fpdata/geetha_vahini/unit_test_passage_questions_clean.jsonl

Google Drive setup and file upload complete. Proceed to the next section for inference.


# Part 4 - Inference on loaded passage, question pairs ("generate answer for the question given the context that is the passage"

 - Generated answer files saved to persistent storage.  unit_test_generated_answers_fine_tuned_llama8bQLora.jsonl

In [3]:
import os
import json
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from huggingface_hub import login # For Hugging Face authentication
from google.colab import drive # Keep drive import for mounting

# --- Configuration ---
DATA_PATH = "/content/drive/MyDrive/fpdata/geetha_vahini"
MODEL_NAME = "meta-llama/Meta-Llama-3-8B-Instruct"
MAX_SEQ_LENGTH = 2048 # Keep consistent with training

# Path to the unit test file (same as used for fine-tuned model inference)
UNIT_TEST_FILE_NAME = "unit_test_passage_questions_clean.jsonl" # UPDATED FILE NAME
UNIT_TEST_FILE_PATH = os.path.join(DATA_PATH, UNIT_TEST_FILE_NAME)

# New output files for generated answers from the BASE model
GENERATED_ANSWERS_BASE_MODEL_WITH_PASSAGE_FILE = os.path.join(DATA_PATH, "unit_test_generated_answers_base_llama8b_with_passage.jsonl")
GENERATED_ANSWERS_BASE_MODEL_WITHOUT_PASSAGE_FILE = os.path.join(DATA_PATH, "unit_test_generated_answers_base_llama8b_no_passage.jsonl")


# --- Mount Google Drive (if not already mounted in this session) ---
print("Mounting Google Drive for base model inference...")
if not os.path.exists('/content/drive/MyDrive'):
    try:
        drive.mount('/content/drive')
        print("Google Drive mounted successfully!")
    except Exception as e:
        print(f"Error mounting Google Drive: {e}")
        print("Please ensure you are running this in a Google Colab environment and authorize Drive access.")
        exit()
else:
    print("Google Drive already mounted.")

# --- Hugging Face Login (REQUIRED for Llama 3) ---
print("\nLogging into Hugging Face Hub...")
try:
    login()
    print("Hugging Face login successful!")
except Exception as e:
    print(f"Hugging Face login failed: {e}")
    print("Please ensure you have accepted the Llama 3 license and pasted a valid token.")
    exit() # Exit if login fails

# --- Check for existence of unit test file ---
print(f"\nChecking for existence of unit test file: {UNIT_TEST_FILE_PATH}")
if not os.path.exists(UNIT_TEST_FILE_PATH):
    print(f"Error: Unit test file not found at {UNIT_TEST_FILE_PATH}")
    print("Please ensure you ran Section 2 and successfully uploaded the file, or manually upload it.")
    exit()
else:
    print("Unit test file found. Proceeding.")


# --- Load Base Model with 4-bit Quantization (NO PEFT Adapters) ---
print(f"\nLoading base model: {MODEL_NAME} with 4-bit quantization (no adapters loaded)...")
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16, # Use float16 for compute during inference
)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
# Llama 3 tokenizer doesn't have a default pad_token, set it to eos_token
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left" # For inference, left padding is generally preferred

print("Base model and tokenizer loaded.")

# Set model to evaluation mode
model.eval()

# --- Function to generate response (unified for flexibility) ---
def generate_answer(question: str, model, tokenizer, passage: str = None, max_new_tokens=70):
    """
    Generates an answer to a question using the model.
    Can optionally include a passage in the prompt.
    """
    messages = [
        {"role": "system", "content": "You are a helpful assistant that answers questions."},
    ]
    if passage:
        messages.append({"role": "user", "content": f"Passage: {passage}\nQuestion: {question}"})
    else:
        messages.append({"role": "user", "content": f"Question: {question}"})

    # Apply chat template and tokenize
    input_ids = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True, # Important: tells the model to generate the assistant's turn
        return_tensors="pt"
    ).to(model.device)

    # Generate response
    with torch.no_grad(): # No need to calculate gradients during inference
        outputs = model.generate(
            input_ids=input_ids,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            top_p=0.9,
            temperature=0.7,
            pad_token_id=tokenizer.pad_token_id,
        )

    # Decode the generated text
    decoded_output = tokenizer.decode(outputs[0], skip_special_tokens=False)

    # Find the start of the assistant's response
    assistant_start_tag = "<|start_header_id|>assistant<|end_header_id|>\n"
    start_index = decoded_output.find(assistant_start_tag)

    if start_index != -1:
        generated_answer = decoded_output[start_index + len(assistant_start_tag):].strip()
        # Remove any trailing <|eot_id|> or other special tokens
        generated_answer = generated_answer.replace("<|eot_id|>", "").strip()
    else:
        generated_answer = "Could not parse assistant's response." # Fallback

    return generated_answer

# --- Load Unit Test Data ---
print(f"\n--- Loading Unit Test Data from '{UNIT_TEST_FILE_NAME}' ---")
unit_test_data = []
try:
    with open(UNIT_TEST_FILE_PATH, 'r', encoding='utf-8') as f:
        for line in f:
            unit_test_data.append(json.loads(line))
    print(f"Loaded {len(unit_test_data)} examples from '{UNIT_TEST_FILE_NAME}'.")
except Exception as e:
    print(f"Error loading unit test data: {e}")
    exit()


# --- Mode (a): Generate answers WITHOUT passage ---
print(f"\n--- Running Inference with BASE model WITHOUT passage and saving results to '{GENERATED_ANSWERS_BASE_MODEL_WITHOUT_PASSAGE_FILE}' ---")

with open(GENERATED_ANSWERS_BASE_MODEL_WITHOUT_PASSAGE_FILE, 'w', encoding='utf-8') as f_out:
    for i, example in enumerate(unit_test_data):
        doc_id = example.get('id', f"unknown_id_{i+1}")
        question = example.get('question', 'No question provided.')

        print(f"\n--- Base Model (No Passage) Example {i+1} (Doc ID: {doc_id}) ---")
        print(f"Question: {question}")

        if torch.cuda.is_available():
            torch.cuda.empty_cache()

        # Call generate_answer WITHOUT providing the passage
        generated_answer = generate_answer(question=question, model=model, tokenizer=tokenizer)
        print(f"Generated Answer: {generated_answer}")

        output_entry = {
            "id": doc_id,
            "question": question,
            "passage": example.get('passage', ''), # Still include original passage for context in output file
            "gen_answer": generated_answer
        }
        json.dump(output_entry, f_out)
        f_out.write('\n')

print(f"\nAll base model (no passage) inference examples completed. Generated answers saved to: {GENERATED_ANSWERS_BASE_MODEL_WITHOUT_PASSAGE_FILE}")


# --- Mode (b): Generate answers WITH passage ---
print(f"\n--- Running Inference with BASE model WITH passage and saving results to '{GENERATED_ANSWERS_BASE_MODEL_WITH_PASSAGE_FILE}' ---")

with open(GENERATED_ANSWERS_BASE_MODEL_WITH_PASSAGE_FILE, 'w', encoding='utf-8') as f_out:
    for i, example in enumerate(unit_test_data):
        doc_id = example.get('id', f"unknown_id_{i+1}")
        passage = example.get('passage', 'No passage provided.')
        question = example.get('question', 'No question provided.')

        print(f"\n--- Base Model (With Passage) Example {i+1} (Doc ID: {doc_id}) ---")
        print(f"Passage: {passage.strip()}")
        print(f"Question: {question}")

        if torch.cuda.is_available():
            torch.cuda.empty_cache()

        # Call generate_answer WITH the passage
        generated_answer = generate_answer(question=question, model=model, tokenizer=tokenizer, passage=passage)
        print(f"Generated Answer: {generated_answer}")

        output_entry = {
            "id": doc_id,
            "question": question,
            "passage": passage,
            "gen_answer": generated_answer
        }
        json.dump(output_entry, f_out)
        f_out.write('\n')

print(f"\nAll base model (with passage) inference examples completed. Generated answers saved to: {GENERATED_ANSWERS_BASE_MODEL_WITH_PASSAGE_FILE}")


Mounting Google Drive for base model inference...
Google Drive already mounted.

Logging into Hugging Face Hub...


VBox(children=(HTML(value='<center> <img\nsrc=https://huggingface.co/front/assets/huggingface_logo-noborder.sv…

Hugging Face login successful!

Checking for existence of unit test file: /content/drive/MyDrive/fpdata/geetha_vahini/unit_test_passage_questions_clean.jsonl
Unit test file found. Proceeding.

Loading base model: meta-llama/Meta-Llama-3-8B-Instruct with 4-bit quantization (no adapters loaded)...


config.json:   0%|          | 0.00/654 [00:00<?, ?B/s]

model.safetensors.index.json:   0%|          | 0.00/23.9k [00:00<?, ?B/s]

Fetching 4 files:   0%|          | 0/4 [00:00<?, ?it/s]

model-00001-of-00004.safetensors:   0%|          | 0.00/4.98G [00:00<?, ?B/s]

model-00002-of-00004.safetensors:   0%|          | 0.00/5.00G [00:00<?, ?B/s]

model-00003-of-00004.safetensors:   0%|          | 0.00/4.92G [00:00<?, ?B/s]

model-00004-of-00004.safetensors:   0%|          | 0.00/1.17G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/187 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/51.0k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/9.09M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/73.0 [00:00<?, ?B/s]

The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


Base model and tokenizer loaded.

--- Loading Unit Test Data from 'unit_test_passage_questions_clean.jsonl' ---
Error loading unit test data: Expecting value: line 2 column 1 (char 1)

--- Running Inference with BASE model WITHOUT passage and saving results to '/content/drive/MyDrive/fpdata/geetha_vahini/unit_test_generated_answers_base_llama8b_no_passage.jsonl' ---

--- Base Model (No Passage) Example 1 (Doc ID: 1) ---
Question: What do the terms Niraakaara, Para, and Parabrahmamreveal about the nature of the Eternal, and how does this contrast with physical identification?
Generated Answer: A profound question!

In Hinduism, the concepts of Niraakaara, Para, and Parabrahmam are central to understanding the nature of the Eternal (Brahman) and its relationship with the physical world. Here's a breakdown:

1. Niraakaara (Non-manifested): This term refers to the unmanifest,

--- Base Model (No Passage) Example 2 (Doc ID: 2) ---
Question: How does understanding the etymological roots of B