<a href="https://colab.research.google.com/github/weber50432/2025-ucl-term2-COMP0258-poker-LLM/blob/master/colab/model_evaluation_hf_models.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
import os
from google.colab import drive
drive.mount('/content/drive')
# Define the model and output directories in Google Drive
models_dir = '/content/drive/MyDrive/models'
output_dir = '/content/drive/MyDrive/outputs'
# Create each directory if it does not already exist
os.makedirs(models_dir, exist_ok=True)
os.makedirs(output_dir, exist_ok=True)

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [2]:
%%capture
# Normally using pip install unsloth is enough

# Temporarily as of Jan 31st 2025, Colab has some issues with Pytorch
# Using pip install unsloth will take 3 minutes, whilst the below takes <1 minute:
!pip install --no-deps bitsandbytes accelerate xformers==0.0.29 peft trl triton
!pip install --no-deps cut_cross_entropy unsloth_zoo
!pip install sentencepiece protobuf datasets huggingface_hub hf_transfer
!pip install --no-deps unsloth

In [3]:
from unsloth import FastLanguageModel
from tqdm import tqdm
from transformers import TextStreamer
from datasets import load_dataset
import random
import pandas as pd

🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.


    PyTorch 2.5.1+cu121 with CUDA 1201 (you have 2.6.0+cu124)
    Python  3.11.11 (you have 3.11.11)
  Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
  Memory-efficient attention, SwiGLU, sparse and more won't be available.
  Set XFORMERS_MORE_DETAILS=1 for more details


🦥 Unsloth Zoo will now patch everything to make training faster!


In [4]:

max_seq_length = 2048 # Choose any! We auto support RoPE Scaling internally!
dtype = None # None for auto detection. Float16 for Tesla T4, V100, Bfloat16 for Ampere+
load_in_4bit = True # Use 4bit quantization to reduce memory usage. Can be False.

# Fine-tuned models to evaluate (Hugging Face Hub repositories).
finetuned_models = [
    "weber50432/lora-Meta-Llama-3-8B-Instruct",
    "weber50432/lora-Meta-Llama-3.1-8B-Instruct",
    "weber50432/lora-Llama-3.2-3B-Instruct",
    "weber50432/lora-gemma-2-9b-it",
    "weber50432/lora-Qwen2.5-7B-Instruct-1M",
    "weber50432/lora-Qwen2.5-14B-Instruct-1M"
] # More models at https://huggingface.co/unsloth

pretrained_model_name = finetuned_models[1]

In [5]:
if True:
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name = pretrained_model_name, # YOUR MODEL YOU USED FOR TRAINING
        max_seq_length = max_seq_length,
        dtype = dtype,
        load_in_4bit = load_in_4bit,
    )
    FastLanguageModel.for_inference(model) # Enable native 2x faster inference

# The prompt template must match the one used for fine-tuning.
prompt = """
### Instruction:
{}

### Response:
{}"""


inputs = tokenizer(
[
    prompt.format(
        "You are a specialist in playing 6-handed No Limit Texas Holdem. The following will be a game scenario and you need to make the optimal decision.\n\nHere is a game summary:\n\nThe small blind is 0.5 chips and the big blind is 1 chips. Everyone started with 100 chips.\nThe player positions involved in this game are UTG, HJ, CO, BTN, SB, BB.\nIn this hand, your position is CO, and your holding is [Ace of Heart and King of Heart].\nYou currently have High Card(Ace-high).\nBefore the flop, CO raise 2.3, and BB raise 13.5. Assume that all other players that is not mentioned folded.\n\nNow it is your turn to make a move.\nTo remind you, the current pot size is 16.3 chips, and your holding is [Ace of Heart and King of Heart].\n\nDecide on an action based on the strength of your hand on this board, your position, and actions before you. Do not explain your answer.\nYour optimal action is:", # instruction
        "", # response - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")
text_streamer = TextStreamer(tokenizer)
outputs = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128)

==((====))==  Unsloth 2025.3.19: Fast Llama patching. Transformers: 4.50.2.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = FALSE. FA [Xformers = None. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


model.safetensors.index.json:   0%|          | 0.00/25.1k [00:00<?, ?B/s]

model-00001-of-00004.safetensors:   0%|          | 0.00/5.30G [00:00<?, ?B/s]

model-00002-of-00004.safetensors:   0%|          | 0.00/5.35G [00:00<?, ?B/s]

model-00003-of-00004.safetensors:   0%|          | 0.00/4.36G [00:00<?, ?B/s]

model-00004-of-00004.safetensors:   0%|          | 0.00/1.05G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/4 [00:00<?, ?it/s]

tokenizer_config.json:   0%|          | 0.00/55.4k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/17.2M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/296 [00:00<?, ?B/s]

weber50432/lora-Meta-Llama-3.1-8B-Instruct does not have a padding token! Will use pad_token = <|finetune_right_pad_id|>.
<|begin_of_text|>
### Instruction:
You are a specialist in playing 6-handed No Limit Texas Holdem. The following will be a game scenario and you need to make the optimal decision.

Here is a game summary:

The small blind is 0.5 chips and the big blind is 1 chips. Everyone started with 100 chips.
The player positions involved in this game are UTG, HJ, CO, BTN, SB, BB.
In this hand, your position is CO, and your holding is [Ace of Heart and King of Heart].
You currently have High Card(Ace-high).
Before the flop, CO raise 2.3, and BB raise 13.5. Assume that all other players that is not mentioned folded.

Now it is your turn to make a move.
To remind you, the current pot size is 16.3 chips, and your holding is [Ace of Heart and King of Heart].

Decide on an action based on the strength of your hand on this board, your position, and actions before you. Do not explain y
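The streamed output above echoes the whole prompt before the model's answer. A minimal sketch of how the answer alone could be pulled out of a decoded string is shown below. The helper name `extract_response` and the sample EOS string are assumptions made for illustration; in the notebook the EOS string would come from `tokenizer.eos_token`.

```python
RESPONSE_MARKER = "### Response:"  # same marker as in the prompt template

def extract_response(decoded_text, eos_token):
    """Return only the text generated after the last response marker."""
    # rpartition keeps everything after the final marker, so a marker that
    # happens to appear inside the instruction does not confuse the split.
    _, marker, answer = decoded_text.rpartition(RESPONSE_MARKER)
    if not marker:
        return ""  # marker missing: nothing can be treated as an answer
    return answer.replace(eos_token, "").strip()

# Hypothetical decoded output, shortened for the example.
decoded = (
    "<|begin_of_text|>\n### Instruction:\nYour optimal action is:\n\n"
    "### Response:\ncall<|eot_id|>"
)
print(extract_response(decoded, "<|eot_id|>"))
```

Returning an empty string when the marker is missing keeps an evaluation loop running instead of raising an `IndexError` on a malformed generation.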

In [32]:
# You only have four options to respond with: "fold", "check", "call" or "raise".
prompt = """
### Instruction:
{}

### Response:
{}"""

EOS_TOKEN = tokenizer.eos_token # Must add EOS_TOKEN
def formatting_prompts_func(examples):
    instructions = examples["instruction"]
    texts = []
    # Iterate over the column directly; zip() on a single list would yield 1-tuples
    for instruction in instructions:
        # Must add EOS_TOKEN, otherwise your generation will go on forever!
        text = prompt.format(instruction, "") + EOS_TOKEN
        texts.append(text)
    return { "text" : texts, }

dataset = load_dataset("RZ412/PokerBench", split = "test")
dataset = dataset.map(formatting_prompts_func, batched = True,)

Map:   0%|          | 0/11000 [00:00<?, ? examples/s]

## Testing a random sample

In [43]:
# randrange excludes len(dataset); randint(0, len(dataset)) could index out of range
index = random.randrange(len(dataset))
print("Ground truth: ",dataset[index]['output'])
inputs = tokenizer([prompt.format(dataset[index]['instruction'],"")], return_tensors = "pt").to("cuda")
# text_streamer = TextStreamer(tokenizer)
outputs = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 5)
# outputs = model.generate(**inputs, max_new_tokens = 5)

generated_text = tokenizer.batch_decode(outputs)[0]
# Keep only the text after the response marker, then drop the EOS token and whitespace
generated_text = generated_text.split("### Response:")[1]
generated_text = generated_text.replace(EOS_TOKEN, "").strip()
# print("Prediction: ",generated_text) # Print the generated text

if dataset[index]['output'] == generated_text:
    print("Correct!")
else:
    print("Incorrect!")

Ground truth:  raise 90
<|begin_of_text|>
### Instruction:


You are a specialist in playing 6-handed No Limit Texas Holdem. The following will be a game scenario and you need to make the optimal decision.

Here is a game summary:

The small blind is 0.5 chips and the big blind is 1 chips. Everyone started with 100 chips.
The player positions involved in this game are UTG, HJ, CO, BTN, SB, BB.
In this hand, your position is SB, and your holding is [King of Spade and Jack of Spade].
Before the flop, SB raise 3.0 chips, BB raise 10.0 chips, and SB call. Assume that all other players that is not mentioned folded.
The flop comes Jack Of Heart, Four Of Heart, and Five Of Spade, then SB check, and BB check.
The turn comes King Of Heart, then SB bet 22 chips, and BB raise 44 chips.


Now it is your turn to make a move.
To remind you, the current pot size is 86.0 chips, and your holding is [King of Spade and Jack of Spade].

Decide on an action based on the strength of your hand on this board, 
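The check above counts a prediction as correct only when it matches the ground truth string exactly, so "raise 45" against "raise 90" is scored the same as "fold". A small sketch of splitting an answer into its action and amount, so the two can be compared separately, might look like this. The helper name `parse_action` is hypothetical, and the sketch assumes answers have the form "action" or "action amount", as in the "raise 90" example.

```python
def parse_action(text):
    """Split an answer such as 'raise 90' into (action, amount)."""
    parts = text.strip().lower().split()
    if not parts:
        return ("", None)  # empty answer
    amount = None
    if len(parts) > 1:
        try:
            amount = float(parts[1])
        except ValueError:
            amount = None  # second word is not a number
    return (parts[0], amount)

truth = parse_action("raise 90")
guess = parse_action("Raise 45.0 ")
print(truth, guess)
print("same action:", truth[0] == guess[0])
print("same amount:", truth[1] == guess[1])
```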

In [8]:
ground_truths = []
predictions = []

In [9]:
for index in tqdm(range(len(dataset)), desc="Processing"):
    example = dataset[index]
    ground_truths.append(example['output'])
    inputs = tokenizer([prompt.format(example['instruction'],"")], return_tensors = "pt").to("cuda")
    outputs = model.generate(**inputs, max_new_tokens = 10)
    response = tokenizer.batch_decode(outputs)[0]
    # Keep only the text after the response marker, without the EOS token
    response = response.split("### Response:")[1].replace(EOS_TOKEN, "").strip()
    predictions.append(response)
    break # Debug leftover: stops after the first example; remove to evaluate the full test set

Processing:   0%|          | 0/11000 [00:00<?, ?it/s]


In [None]:
results_df = pd.DataFrame({
    "Prediction": predictions,
    "Ground Truth": ground_truths
})
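# Save the DataFrame to a CSV file. The model name contains "/", which would be
# read as a missing subdirectory, so replace it to keep a flat file name.
safe_model_name = pretrained_model_name.replace("/", "_")
results_df.to_csv(f"{output_dir}/{safe_model_name}_predictions.csv", index=False)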

In [None]:
print(ground_truths)
print(predictions)
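Printing the two lists does not yet give a score. One possible way to summarise them is sketched below: the share of exact matches, and the share of predictions whose first word (the action) matches the ground truth. The function name `score` and the sample lists are illustrative assumptions, not results from this notebook.

```python
def score(predictions, ground_truths):
    """Return exact-match and action-only accuracy as fractions in [0, 1]."""
    if len(predictions) != len(ground_truths):
        raise ValueError("predictions and ground_truths must have equal length")
    total = len(ground_truths)
    if total == 0:
        return {"exact_match": 0.0, "action_match": 0.0}
    exact = 0
    action = 0
    for pred, truth in zip(predictions, ground_truths):
        p = pred.strip().lower()
        t = truth.strip().lower()
        exact += p == t
        # Compare only the first word, e.g. "raise" in "raise 90"
        action += p.split()[:1] == t.split()[:1]
    return {"exact_match": exact / total, "action_match": action / total}

# Made-up sample data for illustration only.
sample_predictions = ["raise 90", "call", "raise 10"]
sample_ground_truths = ["raise 90", "fold", "raise 12"]
print(score(sample_predictions, sample_ground_truths))
```

In the notebook, the same function could be called as `score(predictions, ground_truths)` once the evaluation loop has filled both lists.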