<a href="https://colab.research.google.com/github/AnasEhtisham/FYP/blob/main/LLM.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [9]:
# Step 1: Install Necessary Libraries (Focus on upgrading datasets first)

# Try to get the latest possible version of datasets.
# This might pull in a version with more up-to-date fsspec compatibility.
!pip install -q datasets --upgrade

# Now install the other packages. Pip will attempt to reconcile their dependencies
# with what 'datasets --upgrade' has established (including its fsspec version).
# The --upgrade flag encourages pip to fetch later versions if available and compatible.
!pip install -q transformers accelerate peft bitsandbytes huggingface_hub --upgrade

# Optional: After installation, you can list the installed versions to check them
# Remove the '#' from the line below to run it
# !pip list | grep -E "datasets|fsspec|gcsfs|transformers|huggingface-hub"

[?25l   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/491.5 kB[0m [31m?[0m eta [36m-:--:--[0m[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m491.5/491.5 kB[0m [31m35.1 MB/s[0m eta [36m0:00:00[0m
[?25h[?25l   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/193.6 kB[0m [31m?[0m eta [36m-:--:--[0m[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m193.6/193.6 kB[0m [31m19.0 MB/s[0m eta [36m0:00:00[0m
[?25h[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
torch 2.6.0+cu124 requires nvidia-cublas-cu12==12.4.5.8; platform_system == "Linux" and platform_machine == "x86_64", but you have nvidia-cublas-cu12 12.5.3.2 which is incompatible.
torch 2.6.0+cu124 requires nvidia-cuda-cupti-cu12==12.4.127; platform_system == "Linux" and platform_machine == "x86_64", but you have nvidia-cuda-cupti-cu12 

In [10]:
# Step 2: Import Libraries and Log in to Hugging Face
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
    Trainer,
    DataCollatorForLanguageModeling
)
from datasets import Dataset
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from huggingface_hub import notebook_login

# Log in to Hugging Face Hub to download Llama 2
# You'll need a Hugging Face account and an access token with read permissions.
# Make sure you have accepted the Llama 2 license on Hugging Face.
notebook_login()

VBox(children=(HTML(value='<center> <img\nsrc=https://huggingface.co/front/assets/huggingface_logo-noborder.sv…

In [11]:
# Step 3: Prepare Your Data from CSV
print("--- Step 3: Preparing Data from CSV ---")
from datasets import load_dataset, Dataset
import pandas as pd # Import pandas to inspect CSV if needed

# IMPORTANT: Make sure you have uploaded "Dataset - Sheet1.csv" to your Colab session!

csv_file_path = "Dataset.csv"

# --- ACTION REQUIRED: Specify your CSV column names here ---
# Replace these with the actual column names from your CSV file.
# For example, if your job descriptions are in a column named "Job Post"
# and proposals in a column "Proposal Text", update accordingly.
job_description_col = "job_description" # <<< CHANGE THIS if your column name is different
proposal_col = "proposal"               # <<< CHANGE THIS if your column name is different

# ... (rest of the code is the same)

# Load the dataset using pandas and convert to Hugging Face Dataset
try:
    df = pd.read_csv(csv_file_path)  # Load with pandas
    raw_dataset = Dataset.from_pandas(df)  # Convert to Hugging Face Dataset
    print(f"\nSuccessfully loaded dataset. Number of samples: {len(raw_dataset)}")
    print(f"First example: {raw_dataset[0]}")

    # ... (rest of your code remains the same)

    # Ensure the specified columns exist
    if job_description_col not in raw_dataset.column_names or proposal_col not in raw_dataset.column_names:
        print(f"\nERROR: One or both specified column names ('{job_description_col}', '{proposal_col}') not found in the dataset.")
        print(f"Available columns: {raw_dataset.column_names}")
        print("Please update 'job_description_col' and 'proposal_col' variables in this cell.")
        # Stop execution or handle error appropriately
        raise ValueError("Column names not found in CSV.")

    # Rename columns to what the rest of the script expects, if they are different
    # The rest of the script expects 'job_description' and 'proposal'
    # If your columns are already named this, these renames might not be strictly necessary
    # but it's good practice for consistency.
    if job_description_col != "job_description":
        raw_dataset = raw_dataset.rename_column(job_description_col, "job_description")
    if proposal_col != "proposal":
        raw_dataset = raw_dataset.rename_column(proposal_col, "proposal")

    # The 'dataset' variable will be used in subsequent steps
    dataset = raw_dataset
    print("Dataset processed and ready for tokenization.")
    print(f"Using columns: 'job_description' and 'proposal'")

except Exception as e:
    print(f"Error loading or processing CSV: {e}")
    print("Please check the file path, CSV format, and column names.")
    # Handle error, perhaps by stopping or using a default small dataset for testing
    # For now, we'll raise it to stop if data loading fails
    raise

print("Data preparation from CSV complete.\n")

--- Step 3: Preparing Data from CSV ---

Successfully loaded dataset. Number of samples: 84
First example: {'job_description': 'We are seeking a skilled developer to build a simple AI agent or automation tool that scans real estate listings in Orange County, NC and flags properties that may qualify for subdivision under local zoning rules. The agent should: Pull listing data from sources like Zillow, Redfin, or MLS Cross-reference each listing with Orange County GIS and zoning data Evaluate subdivision potential (e.g. lot size, frontage, zoning district) Calculate value per acre for investment analysis Deliver results via email, spreadsheet, or web dashboard — updated on a recurring schedule (e.g. daily or weekly) Ideal candidate will have experience with: AI agents or automation (e.g. Python scripts, Zapier, LangChain) Web scraping or API integration for real estate data Zoning logic, GIS parcel matching Google Sheets API, email automation, or simple dashboard tools This is an excitin

In [12]:
# Step 4: Define Model ID and Quantization Configuration

model_id = "meta-llama/Llama-2-7b-hf" # Using Llama 2 7B

# BitsAndBytesConfig for 4-bit quantization
# This significantly reduces memory usage, crucial for Colab free tier
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",          # Recommended quantization type
    bnb_4bit_compute_dtype=torch.bfloat16, # Use bfloat16 for computation
    bnb_4bit_use_double_quant=True,     # Optional, can improve quality slightly
)

In [13]:
# Step 5: Load Tokenizer and Model

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
# Llama 2 typically doesn't have a pad token, so we set it to eos_token
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right" # Ensure padding is on the right

# Load model with quantization
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=quantization_config,
    device_map="auto", # Automatically maps model layers to available devices (GPU/CPU)
    trust_remote_code=True
)

# Prepare model for k-bit training (important for LoRA + quantization)
model = prepare_model_for_kbit_training(model)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


tokenizer_config.json:   0%|          | 0.00/776 [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/500k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.84M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/414 [00:00<?, ?B/s]

config.json:   0%|          | 0.00/609 [00:00<?, ?B/s]

model.safetensors.index.json:   0%|          | 0.00/26.8k [00:00<?, ?B/s]

Fetching 2 files:   0%|          | 0/2 [00:00<?, ?it/s]

model-00001-of-00002.safetensors:   0%|          | 0.00/9.98G [00:00<?, ?B/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/3.50G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/188 [00:00<?, ?B/s]

In [14]:
# Step 6: Preprocess Data - Format and Tokenize
print("--- Step 6: Preprocessing Data ---")

# Ensure that 'dataset' (Hugging Face Dataset object from Step 3)
# and 'tokenizer' (loaded in Step 5) are defined before running this cell.

def format_and_tokenize(example):
    # Llama 2 instruction fine-tuning format
    # Adjust max_length based on your typical proposal length and Colab limits
    # Max sequence length for Llama 2 is 4096, but shorter is better for Colab free tier
    max_length = 1024 # Start with a reasonable length, e.g., 512, 1024, 2048

    prompt = f"""<s>[INST] Based on the following job description, write a compelling freelance proposal:

Job Description:
{example['job_description']} [/INST]
{example['proposal']} </s>""" # <s> and </s> are BOS/EOS tokens, [INST] marks instructions

    # Tokenize
    tokenized_inputs = tokenizer(
        prompt,
        max_length=max_length,
        padding="max_length", # Pad to max_length
        truncation=True,      # Truncate if longer than max_length
        return_tensors="pt"   # Return PyTorch tensors
    )
    # For CausalLM, labels are usually the same as input_ids. The model learns to predict the next token.
    tokenized_inputs["labels"] = tokenized_inputs["input_ids"].clone()
    return tokenized_inputs

# Apply the formatting and tokenization to the dataset
# batched=False processes one example at a time.
tokenized_dataset = dataset.map(format_and_tokenize, batched=False)
print("Dataset formatted and tokenized.")

# Display a sample of tokenized data for verification
print("\nSample of tokenized data (raw input_ids tensor from the first example):")
print(tokenized_dataset[0]['input_ids'])

print("\nDecoded sample (first item from the tokenized_dataset, special tokens skipped):")
# Correctly extract the 1D list/tensor of token IDs for decoding
input_ids_for_decode = tokenized_dataset[0]['input_ids'][0] # Get the first (and only) sequence from the batch of 1
if hasattr(input_ids_for_decode, 'tolist'): # Check if it's a tensor
    input_ids_list_for_decode = input_ids_for_decode.tolist()
else: # If not, assume it's already a list
    input_ids_list_for_decode = input_ids_for_decode
print(tokenizer.decode(input_ids_list_for_decode, skip_special_tokens=True))

# To see the version with special tokens (like <s>, </s>, [INST]):
# print("\nDecoded sample with special tokens:")
# print(tokenizer.decode(input_ids_list_for_decode, skip_special_tokens=False))

print("Data preprocessing complete.\n")


--- Step 6: Preprocessing Data ---


Map:   0%|          | 0/84 [00:00<?, ? examples/s]

Dataset formatted and tokenized.

Sample of tokenized data (raw input_ids tensor from the first example):
[[1, 1, 518, 25580, 29962, 16564, 373, 278, 1494, 4982, 6139, 29892, 2436, 263, 752, 7807, 3005, 295, 749, 24963, 29901, 13, 13, 11947, 12953, 29901, 13, 4806, 526, 25738, 263, 2071, 24455, 13897, 304, 2048, 263, 2560, 319, 29902, 10823, 470, 3345, 362, 5780, 393, 885, 550, 1855, 19989, 1051, 886, 297, 26048, 5127, 29892, 25166, 322, 13449, 4426, 393, 1122, 4021, 1598, 363, 1014, 4563, 2459, 1090, 1887, 503, 28259, 6865, 29889, 450, 10823, 881, 29901, 349, 913, 18028, 848, 515, 8974, 763, 796, 453, 340, 29892, 4367, 4951, 29892, 470, 341, 8547, 11189, 29899, 5679, 1269, 18028, 411, 26048, 5127, 402, 3235, 322, 503, 28259, 848, 382, 4387, 403, 1014, 4563, 2459, 7037, 313, 29872, 29889, 29887, 29889, 3287, 2159, 29892, 4565, 482, 29892, 503, 28259, 6474, 29897, 20535, 403, 995, 639, 263, 1037, 363, 13258, 358, 7418, 5556, 2147, 2582, 3025, 4876, 29892, 9677, 9855, 29892, 470, 1856, 1

In [15]:
# Step 7: Configure LoRA (Low-Rank Adaptation)

# LoRA significantly reduces the number of trainable parameters.
lora_config = LoraConfig(
    r=16,  # Rank of the update matrices. Common values: 8, 16, 32, 64.
    lora_alpha=32, # Alpha scaling factor (r * 2 is a common starting point).
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"], # Modules to apply LoRA to in Llama 2.
                                                            # You can find these by printing model architecture.
    lora_dropout=0.05, # Dropout probability for LoRA layers.
    bias="none", # Set bias to 'none' for LoRA.
    task_type="CAUSAL_LM" # Causal Language Modeling.
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters() # This will show how few parameters are actually being trained.

trainable params: 16,777,216 || all params: 6,755,192,832 || trainable%: 0.2484


In [16]:
# Step 8: Define Training Arguments
print("--- Step 8: Defining Training Arguments ---")
from transformers import TrainingArguments # Ensure this is imported

# Define output directory for checkpoints and final model
output_dir = "./upfreelance_llama2_7b_proposals_lora_csv" # Changed output dir slightly

# With 85 samples, we can increase epochs.
# Effective batch size = per_device_train_batch_size * gradient_accumulation_steps
# Total optimization steps = (num_samples / effective_batch_size) * num_epochs
# For 85 samples, batch_size=1, grad_accum=4 => ~21 steps per epoch.
# 5 epochs = ~105 steps. 10 epochs = ~210 steps.

training_args = TrainingArguments(
    output_dir=output_dir,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    num_train_epochs=10,             # << INCREASED from 1 to 10. Adjust as needed (e.g., 5-20).
                                    # More epochs might lead to overfitting if dataset is small/not diverse.
    logging_dir=f"{output_dir}/logs",
    logging_strategy="steps",
    logging_steps=10,               # Log training progress every 10 steps.
    save_strategy="epoch",          # Save a checkpoint at the end of each epoch.
    fp16=False,                     # Set to False when using 4-bit, BitsAndBytes handles precision.
    bf16=True,                      # Set to True if using bnb_4bit_compute_dtype=torch.bfloat16
                                    # and GPU supports it (T4 has some support).
    report_to="tensorboard",        # Optional: for tracking with TensorBoard.
    # max_steps=20,                 # << Optional: For quick testing, uncomment and set to a small number (e.g., 20-50 steps)
                                    # Remove or comment out for full training.
    remove_unused_columns=True,     # Default, good practice.
    # gradient_checkpointing=True,  # Usually enabled by prepare_model_for_kbit_training.
                                    # If you face issues like "too many values to unpack" and have enough memory,
                                    # you could try setting this to False for debugging,
                                    # but it will significantly increase memory usage.
)

# Check GPU compatibility for bf16 and adjust if necessary
if not torch.cuda.is_bf16_supported():
    print("WARNING: BF16 is not fully supported on this GPU. "
          "Consider switching bnb_4bit_compute_dtype to torch.float16 "
          "and training_args.fp16=True, training_args.bf16=False if you encounter issues.")
    # For T4, bf16 usually works for compute_dtype but might not be optimal for full bf16 training args.
    # The current setup with bnb_4bit_compute_dtype=torch.bfloat16 and training_args.bf16=True
    # is a common recommendation for 4-bit training.
    # If issues persist, uncomment the lines below to switch to fp16:
    # print("Switching training_args to use fp16 instead of bf16.")
    # training_args.fp16 = True
    # training_args.bf16 = False
    # And ensure your quantization_config uses torch.float16 for bnb_4bit_compute_dtype in Step 4.

print("Training arguments defined.\n")

--- Step 8: Defining Training Arguments ---
Training arguments defined.



In [17]:
# Step 9: Initialize Trainer and Start Training
print("--- Step 9: Initializing Trainer and Starting Training ---")
from transformers import Trainer, DataCollatorForLanguageModeling # Ensure imports

# Ensure 'model' (LoRA adapted model from Step 7),
# 'training_args' (from Step 8),
# 'tokenized_dataset' (from Step 6),
# and 'tokenizer' (from Step 5) are defined.

# Data collator for language modeling. MLM (Masked Language Modeling) is False for Causal LM.
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_dataset,
    # eval_dataset=tokenized_eval_dataset, # Optional: Create a validation set for evaluation
    data_collator=data_collator,
)

# Start training
print("Starting training (with your CSV dataset)...")
# Note on a previous error: "too many values to unpack (expected 4)"
# This error can sometimes occur due to interactions between gradient checkpointing (enabled by default
# with prepare_model_for_kbit_training), PEFT, and the Trainer.
# Things to ensure:
# 1. Your 'labels' are correctly prepared in the tokenized_dataset (Step 6).
# 2. DataCollatorForLanguageModeling is used.
# 3. If the error persists and you have ruled out data issues, one debugging step
#    (if memory allows, which is unlikely on free Colab for a 7B model without it)
#    could be to try disabling gradient checkpointing by setting
#    `gradient_checkpointing=False` in TrainingArguments (Step 8) AND
#    potentially calling `model.gradient_checkpointing_disable()` before `get_peft_model` in Step 7.
#    However, this will significantly increase memory usage.
#    Often, such errors are also version-specific between libraries.

try:
    trainer.train()
    print("Training finished.")
except Exception as e:
    print(f"An error occurred during training: {e}")
    print("This might be due to resource limitations, configuration issues, or the data itself.")
    print("Ensure your Colab instance has GPU allocated and consider reducing max_length, "
          "batch size, or number of epochs if it's an OOM error.")
    # If it was the "too many values to unpack" error, further debugging as noted above might be needed.

# Save the LoRA adapters
# It's good practice to save after training, even if there were minor issues or warnings.
# If training failed catastrophically, this might not be reached or adapters might be partial.
try:
    lora_adapter_path = f"{training_args.output_dir}/final_lora_adapters"
    model.save_pretrained(lora_adapter_path)
    tokenizer.save_pretrained(lora_adapter_path) # Save tokenizer along with adapters
    print(f"LoRA adapters saved to {lora_adapter_path}")
except Exception as e:
    print(f"Error saving LoRA adapters: {e}")

print("Training step complete.\n")

--- Step 9: Initializing Trainer and Starting Training ---


No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.


Starting training (with your CSV dataset)...


`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`.
  return fn(*args, **kwargs)


An error occurred during training: too many values to unpack (expected 4)
This might be due to resource limitations, configuration issues, or the data itself.
Ensure your Colab instance has GPU allocated and consider reducing max_length, batch size, or number of epochs if it's an OOM error.
LoRA adapters saved to ./upfreelance_llama2_7b_proposals_lora_csv/final_lora_adapters
Training step complete.



In [19]:
# Step 10: Inference with the Fine-Tuned LoRA Adapters
print("--- Step 10: Inference with Fine-Tuned Adapters ---")
from peft import PeftModel
import gc # For garbage collection

# It's good practice to clear GPU memory before loading a new model configuration for inference,
# especially if training was done in the same session.
# Delete model and trainer objects from training to free memory
if 'model' in locals():
    del model
if 'trainer' in locals():
    del trainer
if torch.cuda.is_available():
    torch.cuda.empty_cache()
gc.collect()
print("Cleaned up training objects and cleared CUDA cache.")

# --- Configuration for Inference ---
base_model_name = "meta-llama/Llama-2-7b-hf" # Should match model_id from training
# Path where your LoRA adapters were saved by the Trainer (Step 9)
# Ensure 'training_args.output_dir' matches what was used in Step 8 & 9.
# Default from Step 8 was "./upfreelance_llama2_7b_proposals_lora_csv"
adapter_path = f"./upfreelance_llama2_7b_proposals_lora_csv/final_lora_adapters" # Ensure this path is correct!

print(f"Loading base model: {base_model_name}")
# Load base model in 4-bit for inference (consistent with training)
quant_config_inf = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)
base_model_for_inference = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    quantization_config=quant_config_inf,
    device_map="auto", # Automatically map to GPU
    trust_remote_code=True
)
print("Base model loaded for inference.")

# Load the LoRA adapters onto the base model
print(f"Loading LoRA adapters from: {adapter_path}")
try:
    inference_model = PeftModel.from_pretrained(base_model_for_inference, adapter_path)
    inference_model = inference_model.merge_and_unload() # Optional: merge LoRA weights for faster inference, then model is no longer a PeftModel.
                                                        # If you skip merge_and_unload, it's still a PeftModel.
    inference_model.eval() # Set model to evaluation mode (disables dropout, etc.)
    print("LoRA adapters loaded and model set to evaluation mode.")
except Exception as e:
    print(f"ERROR loading LoRA adapters: {e}")
    print("Please ensure the adapter_path is correct and adapters were saved successfully in Step 9.")
    # Handle error, maybe stop here
    raise

# Load the tokenizer (should be the same as used in training)
# It's good practice to load it from where the adapters were saved, as it might include specific tokens.
tokenizer_inf = AutoTokenizer.from_pretrained(adapter_path, trust_remote_code=True)
if tokenizer_inf.pad_token is None: # Ensure pad token is set for generation
    tokenizer_inf.pad_token = tokenizer_inf.eos_token
    tokenizer_inf.padding_side = "right"
print("Tokenizer loaded for inference.")

# --- Interactive Proposal Generation ---
print("\n--- Interactive Proposal Generation ---")
# Get job description input from the user
new_job_post = input("Please paste the job description for which you want a proposal:\n")

if not new_job_post.strip():
    print("No job description provided. Using a default example.")
    new_job_post = """We are looking for a talented freelance graphic designer to create a new logo and brand identity package for our tech startup. Key deliverables include logo variations, color palette, typography guidelines, and social media assets. Please showcase your portfolio with modern and minimalist designs. Experience with SaaS company branding is a plus."""

# Format the input prompt for the model (MUST match the training format, excluding the answer part)
# The training format was: <s>[INST] Instruction Text {job_description} [/INST] {proposal} </s>
# For inference, we provide: <s>[INST] Instruction Text {job_description} [/INST]
prompt_template_inf = "<s>[INST] Based on the following job description, write a compelling freelance proposal:\n\nJob Description:\n{job_description} [/INST]\n"
inference_prompt = prompt_template_inf.format(job_description=new_job_post)

print(f"\nGenerating proposal for:\n{new_job_post[:300]}...") # Print first 300 chars of job post
print(f"\nUsing prompt:\n{inference_prompt}")

# Tokenize the input
inputs = tokenizer_inf(
    inference_prompt,
    return_tensors="pt",
    truncation=True,
    max_length=1024 # Ensure this is less than or equal to model's max sequence length minus generation length
).to(inference_model.device) # Move inputs to the same device as the model

# Generate text
# Adjust generation parameters as needed
# Note: The quality of the output heavily depends on the fine-tuning data and process.
try:
    with torch.no_grad(): # Ensure no gradients are calculated during inference
        outputs = inference_model.generate(
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"], # Important to pass attention_mask
            max_new_tokens=500,  # Max length of the generated proposal
            temperature=0.7,     # Controls randomness. Lower (e.g., 0.2) is more deterministic. Higher (e.g., 0.9) is more random.
            top_p=0.9,           # Nucleus sampling: considers tokens with cumulative probability >= top_p.
            top_k=50,            # Top-k sampling: considers the k most likely tokens.
            do_sample=True,      # Enable sampling. If False, uses greedy decoding (can be repetitive).
            eos_token_id=tokenizer_inf.eos_token_id,
            pad_token_id=tokenizer_inf.pad_token_id if tokenizer_inf.pad_token_id is not None else tokenizer_inf.eos_token_id
        )

    # Decode only the newly generated tokens, not the input prompt
    # The 'outputs' tensor contains the input_ids followed by the generated_ids.
    # So, we slice the output tensor to get only the generated part.
    input_length = inputs["input_ids"].shape[1]
    generated_ids = outputs[0][input_length:]
    generated_proposal = tokenizer_inf.decode(generated_ids, skip_special_tokens=True)

    print("\n--- Generated Proposal ---")
    print(generated_proposal)

except Exception as e:
    print(f"An error occurred during proposal generation: {e}")

print("\nInference step complete.\n")

--- Step 10: Inference with Fine-Tuned Adapters ---
Cleaned up training objects and cleared CUDA cache.
Loading base model: meta-llama/Llama-2-7b-hf


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

Base model loaded for inference.
Loading LoRA adapters from: ./upfreelance_llama2_7b_proposals_lora_csv/final_lora_adapters
LoRA adapters loaded and model set to evaluation mode.
Tokenizer loaded for inference.

--- Interactive Proposal Generation ---
Please paste the job description for which you want a proposal:
We are seeking a skilled professional to implement facial recognition technology for processing a large batch of photos. The ideal candidate will have a strong background in image processing and experience with facial recognition algorithms and software. Your task will include identifying and tagging faces in a collection of images efficiently and accurately. If you are passionate about leveraging technology to improve image analysis, we would love to hear from you

Generating proposal for:
We are seeking a skilled professional to implement facial recognition technology for processing a large batch of photos. The ideal candidate will have a strong background in image processi