<a href="https://colab.research.google.com/github/weber50432/COMP0258-poker-LLM/blob/master/colab/model_evaluation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
import os
from google.colab import drive
drive.mount('/content/drive')
# Directories in Google Drive for saved models and evaluation outputs
models_dir = '/content/drive/MyDrive/models'
output_dir = '/content/drive/MyDrive/outputs'
# Create both directories if they do not exist yet
os.makedirs(models_dir, exist_ok=True)
os.makedirs(output_dir, exist_ok=True)

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [2]:
%%capture
# Normally `pip install unsloth` is enough.
# As of Jan 31st 2025, Colab has some issues with PyTorch, so the packages are installed individually.
# `pip install unsloth` takes about 3 minutes, whereas the commands below take under 1 minute:
!pip install --no-deps bitsandbytes accelerate xformers==0.0.29 peft trl triton
!pip install --no-deps cut_cross_entropy unsloth_zoo
!pip install sentencepiece protobuf datasets huggingface_hub hf_transfer
!pip install --no-deps unsloth

In [3]:
from unsloth import FastLanguageModel
from tqdm import tqdm
from transformers import TextStreamer
from datasets import load_dataset
import random
import pandas as pd

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!


In [4]:
pretrained_model_name = "lora_llama_3.2-3B_model-1000"

max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

# 4bit pre quantized models we support for 4x faster downloading + no OOMs.
fourbit_models = [
    "unsloth/Meta-Llama-3.1-8B-bnb-4bit",      # Llama-3.1 15 trillion tokens model 2x faster!
    "unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit",
    "unsloth/Meta-Llama-3.1-70B-bnb-4bit",
    "unsloth/Meta-Llama-3.1-405B-bnb-4bit",    # We also uploaded 4bit for 405b!
    "unsloth/Mistral-Nemo-Base-2407-bnb-4bit", # New Mistral 12b 2x faster!
    "unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit",
    "unsloth/mistral-7b-v0.3-bnb-4bit",        # Mistral v3 2x faster!
    "unsloth/mistral-7b-instruct-v0.3-bnb-4bit",
    "unsloth/Phi-3.5-mini-instruct",           # Phi-3.5 2x faster!
    "unsloth/Phi-3-medium-4k-instruct",
    "unsloth/gemma-2-9b-bnb-4bit",
    "unsloth/gemma-2-27b-bnb-4bit",            # Gemma 2x faster!
] # More models at https://huggingface.co/unsloth


In [5]:
if True:
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name = f"{models_dir}/{pretrained_model_name}", # YOUR MODEL YOU USED FOR TRAINING
        max_seq_length = max_seq_length,
        dtype = dtype,
        load_in_4bit = load_in_4bit,
    )
    FastLanguageModel.for_inference(model) # Enable native 2x faster inference

# The prompt template must be identical to the one used during training.
prompt = """
### Instruction:
{}

### Response:
{}"""


inputs = tokenizer(
[
    prompt.format(
        "You are a specialist in playing 6-handed No Limit Texas Holdem. The following will be a game scenario and you need to make the optimal decision.\n\nHere is a game summary:\n\nThe small blind is 0.5 chips and the big blind is 1 chips. Everyone started with 100 chips.\nThe player positions involved in this game are UTG, HJ, CO, BTN, SB, BB.\nIn this hand, your position is CO, and your holding is [Ace of Heart and King of Heart].\nYou currently have High Card(Ace-high).\nBefore the flop, CO raise 2.3, and BB raise 13.5. Assume that all other players that is not mentioned folded.\n\nNow it is your turn to make a move.\nTo remind you, the current pot size is 16.3 chips, and your holding is [Ace of Heart and King of Heart].\n\nDecide on an action based on the strength of your hand on this board, your position, and actions before you. Do not explain your answer.\nYour optimal action is:", # instruction
        "", # response - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")
text_streamer = TextStreamer(tokenizer)
outputs = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128)

==((====))==  Unsloth 2025.2.15: Fast Llama patching. Transformers: 4.48.3.
   \\   /|    GPU: Tesla T4. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.5.1+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.1.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29. FA2 = False]
 "-____-"     Free Apache license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


model.safetensors:   0%|          | 0.00/2.35G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/234 [00:00<?, ?B/s]

Unsloth 2025.2.15 patched 28 layers with 28 QKV layers, 28 O layers and 28 MLP layers.


<|begin_of_text|>
### Instruction:
You are a specialist in playing 6-handed No Limit Texas Holdem. The following will be a game scenario and you need to make the optimal decision.

Here is a game summary:

The small blind is 0.5 chips and the big blind is 1 chips. Everyone started with 100 chips.
The player positions involved in this game are UTG, HJ, CO, BTN, SB, BB.
In this hand, your position is CO, and your holding is [Ace of Heart and King of Heart].
You currently have High Card(Ace-high).
Before the flop, CO raise 2.3, and BB raise 13.5. Assume that all other players that is not mentioned folded.

Now it is your turn to make a move.
To remind you, the current pot size is 16.3 chips, and your holding is [Ace of Heart and King of Heart].

Decide on an action based on the strength of your hand on this board, your position, and actions before you. Do not explain your answer.
Your optimal action is:

### Response:
call<|eot_id|>


In [6]:
prompt = """
### Instruction:
{}

### Response:
{}"""

EOS_TOKEN = tokenizer.eos_token # Must add EOS_TOKEN
def formatting_prompts_func(examples):
    instructions = examples["instruction"]
    # inputs       = examples["input"]
    outputs      = examples["output"]
    texts = []
    for instruction, output in zip(instructions, outputs):
        # Must add EOS_TOKEN, otherwise your generation will go on forever!
        text = prompt.format(instruction, output) + EOS_TOKEN
        texts.append(text)
    return { "text" : texts, }

dataset = load_dataset("RZ412/PokerBench", split = "test")
dataset = dataset.map(formatting_prompts_func, batched = True,)

Map:   0%|          | 0/11000 [00:00<?, ? examples/s]
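To see what `formatting_prompts_func` produces without downloading the dataset, the same mapping can be run on a hand-written batch. This is a minimal sketch: `"<EOS>"` is a placeholder standing in for `tokenizer.eos_token`, and the two rows in `toy_batch` are made up for illustration rather than taken from PokerBench.

```python
# Placeholder for tokenizer.eos_token (an assumption for illustration only).
EOS_TOKEN = "<EOS>"

# Same two-slot template as in the cell above.
prompt = """
### Instruction:
{}

### Response:
{}"""

def formatting_prompts_func(examples):
    # Fill the template with each (instruction, output) pair and append EOS
    # so that generation learns where to stop.
    texts = [
        prompt.format(instruction, output) + EOS_TOKEN
        for instruction, output in zip(examples["instruction"], examples["output"])
    ]
    return {"text": texts}

# Hypothetical batch in the column-oriented layout used with batched=True.
toy_batch = {
    "instruction": ["Your optimal action is:", "Your optimal action is:"],
    "output": ["call", "bet 10"],
}
formatted = formatting_prompts_func(toy_batch)
print(formatted["text"][0])
```

Each formatted string ends with the ground-truth action followed by the EOS token, which is why the evaluation cells strip the EOS token before comparing.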

## Testing a random sample

In [7]:
index = random.randrange(len(dataset)) # randrange excludes len(dataset), so the index is always valid
print("Ground truth: ",dataset[index]['output'])
inputs = tokenizer([prompt.format(dataset[index]['instruction'],"")], return_tensors = "pt").to("cuda")
# text_streamer = TextStreamer(tokenizer)
# outputs = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 10)
outputs = model.generate(**inputs, max_new_tokens = 10)

generated_text = tokenizer.batch_decode(outputs)[0]
generated_text = generated_text.split("### Response:")[1].strip()
generated_text = generated_text.replace(EOS_TOKEN, "")
print("Prediction: ",generated_text) # Print the generated text

if dataset[index]['output'] == generated_text:
    print("Correct!")
else:
    print("Incorrect!")

Ground truth:  check
Prediction:  bet 52
Incorrect!


In [8]:
ground_truths = []
predictions = []

In [9]:
for index in tqdm(range(len(dataset)), desc="Processing"):
    example = dataset[index]
    ground_truths.append(example['output'])
    inputs = tokenizer([prompt.format(example['instruction'],"")], return_tensors = "pt").to("cuda")
    outputs = model.generate(**inputs, max_new_tokens = 10)
    # Keep only the text after the response marker and drop the EOS token
    response = tokenizer.batch_decode(outputs)[0]
    response = response.split("### Response:")[1].strip()
    response = response.replace(EOS_TOKEN, "")
    predictions.append(response)

Processing: 100%|██████████| 11000/11000 [48:00<00:00,  3.82it/s]
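The post-processing inside the loop (split on the response marker, strip whitespace, drop the EOS token) could be factored into a small helper. The sketch below is one possible version, not the notebook's own code: `extract_response` is a hypothetical name, and it differs slightly from the loop in that it takes the text after the last `### Response:` marker and returns the whole string if the marker is missing instead of raising an `IndexError`.

```python
def extract_response(decoded: str, eos_token: str, marker: str = "### Response:") -> str:
    """Return the generated answer from a fully decoded prompt + completion."""
    # rpartition splits on the LAST occurrence of the marker; if the marker
    # is absent, the whole string ends up in `tail`.
    _, _, tail = decoded.rpartition(marker)
    # Remove the EOS token first, then trim surrounding whitespace.
    return tail.replace(eos_token, "").strip()

# Decoded text shaped like the streamed output shown earlier in the notebook.
decoded = (
    "<|begin_of_text|>\n### Instruction:\nYour optimal action is:\n\n"
    "### Response:\ncall<|eot_id|>"
)
print(extract_response(decoded, "<|eot_id|>"))
```

If generation stops at `max_new_tokens` before emitting the EOS token, the helper simply returns the truncated text.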


In [10]:
results_df = pd.DataFrame({
    "Prediction": predictions,
    "Ground Truth": ground_truths
})
# Save the DataFrames to CSV files
results_df.to_csv(f"{output_dir}/{pretrained_model_name}_predictions.csv", index=False)
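With `predictions` and `ground_truths` collected, the two lists can be scored directly. The following is a minimal sketch under stated assumptions: `split_action` and `score_predictions` are hypothetical helpers, and the two metrics (exact string match, and match on the action word alone, ignoring the bet size) are one reasonable choice rather than the metrics defined by the benchmark. The example lists are toy values, not results from this run.

```python
def split_action(text):
    """Split a string such as 'raise 11' into ('raise', '11'); amount may be None."""
    parts = text.strip().lower().split()
    action = parts[0] if parts else ""
    amount = parts[1] if len(parts) > 1 else None
    return action, amount

def score_predictions(predictions, ground_truths):
    """Return exact-match and action-only accuracy as fractions in [0, 1]."""
    if len(predictions) != len(ground_truths):
        raise ValueError("predictions and ground_truths must have the same length")
    total = len(ground_truths)
    if total == 0:
        return {"exact_match": 0.0, "action_match": 0.0}
    pairs = list(zip(predictions, ground_truths))
    exact = sum(p.strip() == g.strip() for p, g in pairs)
    action = sum(split_action(p)[0] == split_action(g)[0] for p, g in pairs)
    return {"exact_match": exact / total, "action_match": action / total}

# Toy values for illustration only.
toy_predictions = ["bet 52", "call", "raise 11", "fold"]
toy_ground_truths = ["check", "call", "raise 13", "fold"]
scores = score_predictions(toy_predictions, toy_ground_truths)
print(scores)  # {'exact_match': 0.5, 'action_match': 0.75}
```

Reporting both numbers separates cases where the model picks the right action but the wrong size from cases where the action itself is wrong.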

In [11]:
print(ground_truths)
print(predictions)

['check', 'check', 'check', 'bet 10', 'check', 'raise 11', 'call', 'check', 'call', 'raise 92', 'call', 'check', 'check', 'check', 'check', 'fold', 'call', 'call', 'bet 12', 'raise 81', 'raise 64', 'check', 'raise 85', 'call', 'fold', 'call', 'check', 'bet 3', 'check', 'fold', 'call', 'call', 'raise 16', 'raise 73', 'raise 88', 'call', 'call', 'fold', 'call', 'check', 'call', 'fold', 'fold', 'raise 29', 'bet 7', 'check', 'bet 34', 'fold', 'raise 25', 'check', 'check', 'check', 'raise 70', 'fold', 'call', 'fold', 'bet 18', 'bet 3', 'fold', 'fold', 'call', 'call', 'raise 88', 'fold', 'fold', 'fold', 'fold', 'fold', 'raise 88', 'call', 'fold', 'call', 'raise 86', 'fold', 'call', 'call', 'fold', 'call', 'bet 31', 'fold', 'raise 16', 'call', 'call', 'bet 13', 'fold', 'fold', 'check', 'bet 10', 'fold', 'check', 'call', 'check', 'check', 'bet 10', 'call', 'raise 25', 'raise 88', 'call', 'check', 'bet 17', 'fold', 'fold', 'fold', 'raise 13', 'call', 'check', 'fold', 'fold', 'call', 'call', 'be