<a href="https://colab.research.google.com/github/DylanBingham/fine_tuning_qa_ml_expert/blob/feat%2Fbase_model_evaluation/LlamaBaseModel_ForComparison.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!pip install -U bitsandbytes
!pip install trl
!pip install -U transformers
!pip install accelerate
!pip install numpy
!pip install torch



In [2]:
from huggingface_hub import notebook_login
notebook_login()


In [3]:

from transformers import (
    AutoTokenizer,
    BitsAndBytesConfig,
    AutoModelForCausalLM
)
import torch

# The 8B model is large, so load it with 4-bit NF4 quantization to reduce GPU memory use.
# Set load_in_4bit=False if no CUDA-enabled GPU is available.
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True, bnb_4bit_quant_type='nf4', bnb_4bit_compute_dtype=torch.bfloat16
)

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct", use_fast=True)

# Load the Llama 3.1 8B Instruct model for causal language modeling
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B-Instruct",
    quantization_config=quantization_config,
    device_map="auto"
)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
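
As a rough illustration of why the cell above quantizes the model, the sketch below estimates the memory taken by the weights alone at 16-bit and 4-bit precision. The parameter count is an assumption (a round 8 billion, read off the "8B" in the model name), and the estimate ignores quantization metadata, activations, and the KV cache, so real usage will be somewhat higher.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Assumption: about 8 billion parameters, taken from the "8B" in the model name.
NUM_PARAMS = 8e9

bf16_gb = weight_memory_gb(NUM_PARAMS, 16)  # unquantized bfloat16 weights
nf4_gb = weight_memory_gb(NUM_PARAMS, 4)    # 4-bit weights, ignoring quantization metadata

print(f"bf16 weights:  ~{bf16_gb:.0f} GB")
print(f"4-bit weights: ~{nf4_gb:.0f} GB")
```

Under these assumptions the 4-bit weights need about a quarter of the memory of the bfloat16 weights, which is what makes the model practical on a single Colab GPU.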



In [4]:
import pandas as pd

# Path to the Excel workbook (here, on Google Drive as seen from Colab).
# Replace with your own file path.
file_path_or_url = "/content/drive/MyDrive/Colab Notebooks/data/LLM Dataset - v2.xlsx"
# Note: this sheet name contains two spaces after the hyphen.
sheet_name = "Final Dataset -  6170 data pts"

try:
    # Read the sheet into a DataFrame
    df = pd.read_excel(file_path_or_url, sheet_name=sheet_name)
    print("Dataset loaded successfully.")
except FileNotFoundError:
    print(f"Error: File not found at {file_path_or_url}. Please check the file path.")
except ValueError:
    print(f"Error: Sheet '{sheet_name}' not found in the file. Please check the sheet name.")
except Exception as e:
    print(f"An unexpected error occurred: {e}")


# Now you can work with the DataFrame 'df'
# Example: print the first 5 rows
#print(df.head())

Dataset loaded successfully.


In [5]:
from sklearn.model_selection import train_test_split

# Assuming 'df' is your DataFrame
train_df, test_df = train_test_split(df, test_size=0.1, random_state=24)

print(f"Train set size: {len(train_df)}")
print(f"Test set size: {len(test_df)}")

Train set size: 5552
Test set size: 617
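
The sizes printed above can be reproduced by hand. The sketch below assumes that `train_test_split` rounds the test fraction up to a whole number of rows and gives the remainder to the training set, which is consistent with the 5552/617 output. The helper name `split_sizes` is hypothetical and is not part of scikit-learn.

```python
import math


def split_sizes(n_samples: int, test_size: float) -> tuple[int, int]:
    """Return (n_train, n_test) for a fractional test_size.

    Assumption: the test count is rounded up and the training set
    receives every remaining row.
    """
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test


# The DataFrame holds 5552 + 617 = 6169 rows in total.
n_train, n_test = split_sizes(5552 + 617, 0.1)
print(f"Train set size: {n_train}")
print(f"Test set size: {n_test}")
```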


## Begin Testing


In [8]:
from textwrap import dedent

from transformers import pipeline

def create_test_prompt(row):
    prompt = dedent(
        f"""
        {row['Question']}
        """
    )
    messages = [
        # The system prompt strongly shapes the model's behavior; adapt it to your task.
        {"role": "system", "content": "You're a domain expert in machine learning and artificial intelligence, answer any questions about these "
         "topics as accurately and concisely as possible. You MUST ALSO, no matter what, keep your responses to 300 tokens or less."},
        {"role": "user", "content": prompt},
    ]
    return tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

pipe = pipeline(
    task='text-generation',
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=300,
    return_full_text=False
)
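
To make the user message built by `create_test_prompt` concrete, the sketch below reproduces only its `dedent` step using the standard library. The function names `build_user_prompt` and `clean_user_prompt` are hypothetical. The example shows that the question ends up wrapped in newlines, and that a question containing an unindented second line prevents `dedent` from removing the indentation at all.

```python
from textwrap import dedent


def build_user_prompt(question: str) -> str:
    """Mirror the dedent-based prompt construction used in the notebook."""
    return dedent(
        f"""
        {question}
        """
    )


def clean_user_prompt(question: str) -> str:
    """Possible alternative: pass the question through with outer whitespace removed."""
    return question.strip()


single_line = build_user_prompt("What is overfitting?")
multi_line = build_user_prompt("a\nb")

print(repr(single_line))  # the question surrounded by one newline on each side
print(repr(multi_line))   # the 8-space indent survives because line "b" has none
print(repr(clean_user_prompt("  What is overfitting?\n")))
```

Whether the surrounding newlines affect the model's answers is not something this notebook measures; the stripped variant is only one possible way to keep the prompt text predictable.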

In [None]:
import pandas as pd
from textwrap import dedent

def generate_llama_responses(test_df, pipe, batch_size=8):
    """Generate a response for every question in test_df, in chunks of batch_size."""
    results = []
    prompts = test_df['Question'].tolist()
    for i in range(0, len(prompts), batch_size):
        batch_prompts = prompts[i:i+batch_size]
        # Wrap each question in the chat template before generation
        batch_inputs = [create_test_prompt({'Question': q}) for q in batch_prompts]
        # No batch_size is passed to the pipeline here, so it uses its own default
        batch_results = pipe(batch_inputs)
        print(f"On prompt number {i} out of {len(prompts)}")
        for question, output in zip(batch_prompts, batch_results):
            # Each output is normally a list holding one dict per generated sequence
            generated_text = output[0]['generated_text'] if isinstance(output, list) else output['generated_text']
            results.append({"question": question, "response": generated_text})
    return results


results = generate_llama_responses(test_df, pipe)
results_df = pd.DataFrame(results)
results_df.to_csv("llama_responses.csv", index=False)

Setting `pad_token_id` to `eos_token_id`:None for open-end generation.


On prompt number 0 out of 617


On prompt number 8 out of 617


On prompt number 16 out of 617
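
The progress lines above follow directly from how `generate_llama_responses` slices the question list. As a small sketch, the snippet below computes the chunk start indices for 617 prompts with a chunk size of 8. The helper `chunk_starts` is hypothetical and only mirrors the `range(0, len(prompts), batch_size)` loop.

```python
def chunk_starts(n_items: int, batch_size: int) -> list[int]:
    """Start index of each chunk when slicing n_items in steps of batch_size."""
    return list(range(0, n_items, batch_size))


starts = chunk_starts(617, 8)
last_chunk_size = 617 - starts[-1]

print(starts[:3])        # [0, 8, 16], matching the first progress lines
print(len(starts))       # 78 chunks in total
print(last_chunk_size)   # 1 prompt left over in the final chunk
```

Knowing the chunk count in advance makes it easier to judge how far along a long run is.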
