# Lab 5 Part 3 - Serious Liar's Dice

In Part 2, all AI players shared the same prompt, and any comparison between models was limited to the Gemini family. Let's add more model providers to the mix.

Set up an API key from openrouter.ai. Create an account, and in Settings -> Privacy, toggle on `Enable providers that may train on inputs` under Paid Models and toggle on `Enable training and logging (chatroom and API)` under Free Models. This lets you use some models for free, at a low rate limit, in exchange for your chat data being used for model training. I believe that is acceptable for the purposes of this lab. In general, however, please be wary about passing your own data to LLM providers.

Next, paste each API key into its own file, `googleapikey.txt` and `openrouterapikey.txt`, in the same folder as this notebook. The files should look like this:

```
# googleapikey.txt
your-api-key-here

# openrouterapikey.txt
your-api-key-here
```
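One way to load these keys inside the notebook is sketched below. The helper name `read_api_key` is hypothetical (it is not part of the lab's starter code), and the sketch assumes each file holds the key on a line of its own, optionally preceded by a `#` label line like the one in the example above.

```python
from pathlib import Path


def read_api_key(path):
    """Return the first non-empty line of a key file that is not a '#' label."""
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            return line
    raise ValueError(f"No API key found in {path}")


# Usage (assumes the file sits next to the notebook):
# OPENROUTER_API_KEY = read_api_key("openrouterapikey.txt")
```

Reading the key from a file keeps it out of the notebook itself, which matters if the notebook is shared or committed.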

On top of the packages in the Telephone Game notebook, ensure `langchain-openai` is installed.

Here is an example of how you can run free models from openrouter.ai.

```python
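from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

# Assumes `config` (used below to look up the API key) is already defined,
# e.g. in the Telephone Game notebook.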

llama3_3 = ChatOpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=config("OPENROUTER_API_KEY"),
    model="meta-llama/llama-3.3-8b-instruct:free",
)

chain = (
    ChatPromptTemplate.from_template("Generate a limerick based on: {theme}") | llama3_3 | StrOutputParser()
)

chain.invoke({"theme": "bananas"})
# 'There once was a banana so bright,\nGrew in the tropics with warm delight.\nIt ripened with care,\nAnd was eaten with flair,\nAnd its taste was a pure pleasure in sight!'
```

## Making a serious Liar's Dice game

Go ahead and set up a Liar's Dice game where we pit free models against each other. Use the following:

- "meta-llama/llama-3.3-8b-instruct:free"
- "google/gemma-3-12b-it:free"
- "mistralai/mistral-small-3.1-24b-instruct:free"
- "qwen/qwen3-8b:free"

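Note that for qwen3 models, you need to put `/nothink` in the prompt to keep the model out of reasoning mode. Otherwise it will use up too much of your free limit and be very slow to return results. (Qwen's own documentation spells this soft switch `/no_think`, so try that form if `/nothink` has no effect.) Example code: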

```python
qwen3_8b = ChatOpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=config("OPENROUTER_API_KEY"),
    model="qwen/qwen3-8b:free",
)
chain = (
    ChatPromptTemplate.from_template("/nothink Generate a limerick based on: {theme}")
    | qwen3_8b | StrOutputParser()
)
```
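If you would rather not remember which players need the prefix, the check can be derived from the model id. The helper below is a hypothetical sketch (`with_nothink` is not part of the lab code) and assumes that every model id starting with `qwen/qwen3` should receive the `/nothink` prefix.

```python
def with_nothink(model_id, template):
    """Prefix the template with /nothink for Qwen3 model ids; otherwise return it unchanged."""
    is_qwen3 = model_id.lower().startswith("qwen/qwen3")
    return f"/nothink {template}" if is_qwen3 else template


print(with_nothink("qwen/qwen3-8b:free", "Generate a limerick based on: {theme}"))
print(with_nothink("google/gemma-3-12b-it:free", "Generate a limerick based on: {theme}"))
```

The returned string can be passed to `ChatPromptTemplate.from_template` in the same way as the templates above.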

In [1]:
!pip install langchain_openai


In [2]:
import json
from typing import Dict, List, Optional
from collections import defaultdict
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
import re
import random

In [3]:
# Read the key from the file created earlier rather than hard-coding it (assumes the file contains only the key)
OPENROUTER_API_KEY = open("openrouterapikey.txt").read().strip()

In [4]:
class LiarsDiceGame:
    def __init__(self):
        self.players = {}
        self.current_bid = None
        self.history = []
        self.turn_order = []
        self.current_turn = 0
        self.round_count = 0
        
    def setup_game(self, ai_players: List, dice_per_player: int = 5):
        """Set up the Liar's Dice game"""
        player_names = [i.name for i in ai_players]
        self.turn_order = player_names
        self.total_dice = len(player_names) * dice_per_player
        
        for name, ai_player in zip(player_names, ai_players):
            self.players[name] = {
                'dice': [random.randint(1, 6) for _ in range(dice_per_player)],
                'num_dice': dice_per_player,
                'ai_player': ai_player
            }
        
        print("LIAR'S DICE!")
        print(f"Players: {', '.join(player_names)}")
        print(f"Total dice: {self.total_dice}")
        
        print("\nDICE (not visible to players):")
        for name, data in self.players.items():
            print(f"{name}: {data['dice']}")

    def display_game_state(self):
        """Display current game state"""
        print(f"\n{'='*60}")
        print(f"ROUND {self.round_count + 1} - GAME STATE")
        print('='*60)
        
        print("PLAYERS:")
        for name in self.turn_order:
            data = self.players[name]
            dice_display = f"{data['dice']} ({data['num_dice']} dice)"
            current_marker = " ← CURRENT TURN" if name == self.turn_order[self.current_turn] else ""
            print(f"   {name}: {dice_display}{current_marker}")
        
        if self.current_bid:
            print(f"\nCURRENT BID: {self.current_bid['quantity']} dice show {self.current_bid['face_value']} (by {self.current_bid['player']})")
        else:
            print("\nCURRENT BID: None (round start)")
        
        print(f"\n📊 TOTAL DICE IN PLAY: {self.total_dice}")
        
        if self.history:
            print(f"\n📜 RECENT HISTORY:")
            for action in self.history[-5:]:  # Show last 5 actions
                print(f"   • {action}")
    
    def is_valid_bid(self, quantity: int, face_value: int) -> bool:
        """Check if bid is higher than current bid"""
        if not self.current_bid:
            return quantity > 0 and 1 <= face_value <= 6
        
        current_q = self.current_bid['quantity']
        current_f = self.current_bid['face_value']
        
        return quantity > current_q or (quantity == current_q and face_value > current_f)
    
    def count_dice(self, face_value: int) -> int:
        """Count dice showing face_value (1s are wild)"""
        total = 0
        for player_data in self.players.values():
            for die in player_data['dice']:
                if die == face_value or die == 1:
                    total += 1
        return total
    
    def play_turn(self) -> bool:
        """Play one turn with enhanced display - returns False if game ends"""
        self.display_game_state()
        
        current_player = self.turn_order[self.current_turn]
        player_data = self.players[current_player]
        
        print(f"\n{current_player}'s turn")
        
        # Get AI decision with full context
        decision = player_data['ai_player'].make_decision(
            player_data['dice'],
            self.total_dice,
            self.current_bid,
            self.history,
        )
        
        if decision["action"] == "bid":
            quantity = decision["quantity"]
            face_value = decision["face_value"]
            
            if self.is_valid_bid(quantity, face_value):
                self.current_bid = {
                    'player': current_player,
                    'quantity': quantity,
                    'face_value': face_value
                }
                action_text = f"{current_player} bids {quantity} dice show {face_value}"
                print(f"✅ {action_text}")
                self.history.append(action_text)
            else:
                print(f"❌ Invalid bid from {current_player}! Auto-correcting...")
                # Force a valid bid
                if self.current_bid:
                    if self.current_bid['face_value'] < 6:
                        quantity = self.current_bid['quantity']
                        face_value = self.current_bid['face_value'] + 1
                    else:
                        quantity = self.current_bid['quantity'] + 1
                        face_value = 2
                else:
                    quantity, face_value = 1, 2
                
                self.current_bid = {
                    'player': current_player,
                    'quantity': quantity,
                    'face_value': face_value
                }
                action_text = f"{current_player} bids {quantity} dice show {face_value} (auto-corrected)"
                print(f"🔧 {action_text}")
                self.history.append(action_text)
                
        elif decision["action"] == "challenge":
            if not self.current_bid:
                print("❌ Can't challenge - no bid to challenge!")
                return True  # Continue game
            
            action_text = f"{current_player} challenges {self.current_bid['player']}'s bid"
            self.history.append(action_text)
            return self.resolve_challenge(current_player)
        
        # Next player
        self.current_turn = (self.current_turn + 1) % len(self.turn_order)
        return True
    
    def resolve_challenge(self, challenger: str) -> bool:
        """Resolve challenge with detailed analysis - returns False if the game ends, True otherwise"""
        bid = self.current_bid
        actual_count = self.count_dice(bid['face_value'])
        
        print(f"\n{'🚨 CHALLENGE RESOLUTION 🚨':^60}")
        print(f"{challenger} challenges {bid['player']}'s bid:")
        print(f"BID: {bid['quantity']} dice show {bid['face_value']}")
        
        # Detailed count analysis
        print(f"\nCOUNTING DICE SHOWING {bid['face_value']}:")
        for name, data in self.players.items():
            player_count = sum(1 for die in data['dice'] if die == bid['face_value'] or die == 1)
            ones = data['dice'].count(1)
            targets = data['dice'].count(bid['face_value'])
            print(f"   {name}: {targets} natural {bid['face_value']}s + {ones} wilds = {player_count} total")
        
        print(f"\nFINAL COUNT: {actual_count} dice show {bid['face_value']}")
        print(f"BID CLAIMED: {bid['quantity']} dice show {bid['face_value']}")
        
        if actual_count >= bid['quantity']:
            print(f"✅ BID WAS TRUE! {challenger} loses a die!")
            loser = challenger
            winner = bid['player']
        else:
            print(f"❌ BID WAS FALSE! {bid['player']} loses a die!")
            loser = bid['player']
            winner = challenger
        
        # Remove die from loser
        self.players[loser]['num_dice'] -= 1
        if self.players[loser]['dice']:
            self.players[loser]['dice'].pop()
        
        print(f"{winner} wins the challenge!")
        print(f"{loser} now has {self.players[loser]['num_dice']} dice")
        
        result_text = f"Challenge: {challenger} vs {bid['player']} - {winner} wins"
        self.history.append(result_text)
        
        # Check elimination
        if self.players[loser]['num_dice'] == 0:
            print(f"{loser} is eliminated! 💀")
            self.turn_order.remove(loser)
            if len(self.turn_order) == 1:
                print(f"\n🎉 {self.turn_order[0]} WINS THE GAME! 🎉")
                return False
        
        # Re-roll ALL dice for new round and reset game state
        print(f"\nRolling new dice for next round...")
        for name, data in self.players.items():
            if data['num_dice'] > 0:  # Only re-roll for active players
                data['dice'] = [random.randint(1, 6) for _ in range(data['num_dice'])]
                print(f"   {name}: {data['dice']}")
        
        # Reset for next round
        self.current_bid = None
        self.current_turn = self.turn_order.index(winner)
        self.total_dice = sum(p['num_dice'] for p in self.players.values())
        self.round_count += 1
        
        print(f"\nNEW ROUND {self.round_count + 1}! {winner} starts with fresh dice")
        return True  # Continue game with new round
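Because `count_dice` treats 1s as wild, each die a player cannot see matches a bid on faces 2-6 with probability 2/6. The sketch below uses that to estimate how likely a bid is to survive a challenge. The function names are hypothetical and not part of the game class, and the calculation assumes fair, independent dice.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def bid_probability(my_dice, total_dice, quantity, face_value):
    """Chance that at least `quantity` dice show `face_value` (2-6), counting 1s as wild."""
    mine = sum(1 for d in my_dice if d == face_value or d == 1)
    unseen = total_dice - len(my_dice)
    # Each unseen die matches if it shows the face itself or a wild 1: 2 of 6 outcomes.
    return prob_at_least(quantity - mine, unseen, 2 / 6)


# Llama3-8B's opening hand from the sample output, evaluating a bid of "7 dice show 3".
print(round(bid_probability([3, 6, 1, 3, 3], 20, 7, 3), 3))
```

A baseline like this is useful when judging the models: a challenge against a bid with a high probability of being true is usually a mistake, whatever the outcome of that particular roll.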

## EX: Measuring LLM capability

How do the LLMs measure up against each other at playing Liar's Dice? Track the necessary stats over multiple rounds of play and base your analysis on that data.

In [5]:
class AIPlayer:
    def __init__(self, name: str, llm, is_qwen=False):
        self.name = name
        self.llm = llm
        self.is_qwen = is_qwen
        
        # Game-level stats
        self.wins = 0
        self.losses = 0
        
        # Challenge-related stats
        self.challenges_made = 0      # How many times this player challenged
        self.challenges_won = 0       # How many challenges they won

        # Bid-related stats
        self.bids_made = 0            # How many bids this player made
        self.bids_defended = 0      # How many times this player's bid was challenged and was TRUE
        self.bids_countered = 0     # How many times this player's bid was challenged and was FALSE

        # Thinking prompt
        self.thinking_prompt = ChatPromptTemplate.from_template(
            """You are {player_name} playing Liar's Dice. Analyze the game state:

GAME RULES:
- 1s are WILD and count as any face
- Bid format: <quantity> dice show <face_value>
- New bids must be higher: increase quantity OR same quantity with higher face
- Challenge if you doubt the bid

CURRENT STATE:
Your dice: {my_dice}
Total dice: {total_dice}
Current bid: {current_bid_str}
Recent history:
{history_str}

THINKING:
1. Analyze your hand (remember 1s are wild)
2. Consider bid likelihood using total dice
3. Evaluate risks of bidding vs challenging
4. Strategy considerations"""
        )

        # Decision prompt - add /nothink prefix for Qwen models
        decision_template = "/nothink " if is_qwen else ""
        decision_template += """As {player_name} in Liar's Dice, DECIDE:

CURRENT STATE:
- Your dice: {my_dice}
- Total dice: {total_dice}
- Current bid: {current_bid_str}
- Recent: {history_str}

OPTIONS:
{options}

RULES:
- Must bid if no current bid
- New bid must be higher
- Face must be 2-6 (1s are wild, not biddable)
- Output EXACTLY one of:
  • "CHALLENGE"
  • "BID <quantity> <face_value>" (e.g., "BID 3 4")

DECISION:"""
        
        self.decision_prompt = ChatPromptTemplate.from_template(decision_template)
        self.thinking_chain = self.thinking_prompt | self.llm | StrOutputParser()
        self.decision_chain = self.decision_prompt | self.llm | StrOutputParser()

    def make_decision(
        self,
        my_dice: List[int],
        total_dice: int,
        current_bid: Optional[Dict],
        game_history: List[str],
    ) -> Dict:
        current_bid_str = (
            "None (you start this round)"
            if not current_bid
            else f"{current_bid['quantity']} dice show {current_bid['face_value']} (by {current_bid['player']})"
        )
        history_str = "\n".join(f"- {h}" for h in game_history[-5:]) if game_history else "None"
        
        if current_bid:
            options = (
                f"- CHALLENGE: If you doubt there are at least {current_bid['quantity']} of face {current_bid['face_value']}\n"
                f"- BID: Must be higher than current bid ({current_bid['quantity']}, {current_bid['face_value']})"
            )
        else:
            options = "- BID: You must open the bidding (face 2-6)"

        try:
            if not self.is_qwen:
                # NOTE: `thought` is not passed to the decision prompt below, so this
                # call uses an extra request without influencing the decision.
                thought = self.thinking_chain.invoke({
                    "player_name": self.name,
                    "my_dice": my_dice,
                    "total_dice": total_dice,
                    "current_bid_str": current_bid_str,
                    "history_str": history_str
                })
                # print(f"💭 {self.name}'s thoughts:\n{thought}")
            
            decision = self.decision_chain.invoke({
                "player_name": self.name,
                "my_dice": my_dice,
                "total_dice": total_dice,
                "current_bid_str": current_bid_str,
                "history_str": history_str,
                "options": options
            })
            print(f"🤖 {self.name} decides: {decision}")
            
            return self._parse_decision(decision)

        except Exception as e:
            print(f"❌ Error for {self.name}: {e}")
            return self._fallback_decision(current_bid)

    def _parse_decision(self, response: str) -> Dict:
        clean_res = response.strip().upper()
        
        if "CHALLENGE" in clean_res:
            return {"action": "challenge"}
        
        if clean_res.startswith("BID"):
            parts = clean_res.split()
            if len(parts) < 3:
                raise ValueError("Invalid bid format - missing values")
                
            try:
                quantity = int(parts[1])
                face_value = int(parts[2])
            except ValueError:
                raise ValueError("Invalid bid numbers")
            # Checked outside the try block so this message is not masked by the handler above
            if face_value < 2 or face_value > 6:
                raise ValueError("Face value must be 2-6")
            return {"action": "bid", "quantity": quantity, "face_value": face_value}
        
        raise ValueError("Unrecognized decision format")

    def _fallback_decision(self, current_bid: Optional[Dict]) -> Dict:
        if not current_bid:
            return {
                "action": "bid",
                "quantity": random.randint(1, 2),
                "face_value": random.randint(2, 6),
            }
            
        challenge_prob = min(0.6, current_bid["quantity"] / 10.0) # Simple heuristic
        if random.random() < challenge_prob:
            return {"action": "challenge"}
        elif current_bid["face_value"] < 6:
            return {
                "action": "bid",
                "quantity": current_bid["quantity"],
                "face_value": current_bid["face_value"] + 1,
            }
        else:
            return {
                "action": "bid",
                "quantity": current_bid["quantity"] + 1,
                "face_value": 2,
            }
            
    def record_bid_made(self):
        self.bids_made += 1

    def record_challenge_event(self, i_am_challenger: bool, challenger_won_round: bool):
        """
        Records the outcome of a challenge.
        - i_am_challenger: True if this player was the one making the challenge.
        - challenger_won_round: True if the challenger won the round (i.e., the bid was false).
        """
        if i_am_challenger:
            self.challenges_made += 1
            if challenger_won_round:
                self.challenges_won += 1
        else: # This player was the one whose bid was challenged
            if challenger_won_round: # Challenger won, so my (bidder's) bid was false
                self.bids_countered += 1
            else: # Challenger lost, so my (bidder's) bid was true
                self.bids_defended += 1
            
    def record_game_result(self, won_game: bool):
        if won_game:
            self.wins += 1
        else:
            self.losses += 1
            
    def get_stats(self):
        total_games = self.wins + self.losses
        total_challenged_bids = self.bids_defended + self.bids_countered
        
        return {
            "wins": self.wins,
            "losses": self.losses,
            "win_rate": self.wins / total_games if total_games > 0 else 0,
            "bids_made": self.bids_made,
            "bids_defended": self.bids_defended, # Bid was true when challenged
            "bids_countered": self.bids_countered, # Bid was false when challenged
            "bid_defense_rate": self.bids_defended / total_challenged_bids if total_challenged_bids > 0 else 0, # % of challenged bids that were true
            "challenges_made": self.challenges_made,
            "challenges_won": self.challenges_won, # Challenger correctly identified a false bid
            "challenge_success_rate": self.challenges_won / self.challenges_made if self.challenges_made > 0 else 0,
        }

def update_stats_from_game_history(game_history: List[str], all_players: List[AIPlayer]):
    """
    Parses game history to update detailed player statistics.
    """
    
    def find_player_obj(name_str: str, players_list: List[AIPlayer]) -> Optional[AIPlayer]:
        for p_obj in players_list:
            if p_obj.name == name_str:
                return p_obj
        print(f"Warning: Player object for '{name_str}' not found in player list.")
        return None

    for entry in game_history:
        # Regex for bids: "PlayerName bids ..." (also catches auto-corrected bids)
        # Make the player name capture non-greedy and robust to suffixes like "(auto-corrected)"
        bid_match = re.match(r"(.+?) bids ", entry)
        if bid_match:
            # Extract player name, removing potential " (auto-corrected)" suffix
            bidder_name_full = bid_match.group(1).strip()
            bidder_name = bidder_name_full.replace(" (auto-corrected)", "").strip()
            
            bidder_obj = find_player_obj(bidder_name, all_players)
            if bidder_obj:
                bidder_obj.record_bid_made()
            continue # Processed this entry

        # Regex for challenge resolution: "Challenge: ChallengerName vs BidderName - WinnerName wins"
        resolution_match = re.match(r"Challenge: (.+?) vs (.+?) - (.+?) wins", entry)
        if resolution_match:
            challenger_name = resolution_match.group(1).strip()
            bidder_name = resolution_match.group(2).strip() # Player whose bid was challenged
            winner_name = resolution_match.group(3).strip()

            challenger_obj = find_player_obj(challenger_name, all_players)
            challenged_bidder_obj = find_player_obj(bidder_name, all_players)
            
            challenger_won_this_round = (winner_name == challenger_name)

            if challenger_obj:
                challenger_obj.record_challenge_event(i_am_challenger=True, 
                                                      challenger_won_round=challenger_won_this_round)
            
            if challenged_bidder_obj:
                challenged_bidder_obj.record_challenge_event(i_am_challenger=False, 
                                                             challenger_won_round=challenger_won_this_round)
            continue # Processed this entry

In [6]:
llama3_8b = ChatOpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=OPENROUTER_API_KEY,
    model="meta-llama/llama-3.3-8b-instruct:free",
)

gemma_12b = ChatOpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=OPENROUTER_API_KEY,
    model="google/gemma-3-12b-it:free",
)
mistral_24b = ChatOpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=OPENROUTER_API_KEY,
    model="mistralai/mistral-small-3.1-24b-instruct:free",
)


qwen_8b = ChatOpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=OPENROUTER_API_KEY,
    model="qwen/qwen3-8b:free",
)

# Initialize AI Players
players = [
    AIPlayer("Llama3-8B", llama3_8b),
    AIPlayer("Gemma-3-12B", gemma_12b),
    AIPlayer("Mistral-24B", mistral_24b),
    AIPlayer("Qwen-8B", qwen_8b, is_qwen=True)
]

In [7]:
num_games = 48  # Number of games in the tournament
dice_per_player = 5

# Reset player stats before a new tournament run
for p in players:
    p.wins = 0
    p.losses = 0
    p.challenges_made = 0
    p.challenges_won = 0
    p.bids_made = 0
    p.bids_defended = 0
    p.bids_countered = 0

# Run multiple games
for game_num in range(num_games):
    print(f"\n{'='*60}")
    print(f"STARTING GAME {game_num+1}/{num_games}")
    print(f"{'='*60}")
    
    # Create a fresh game instance for each game
    game = LiarsDiceGame()
    
    # Pass the shared AIPlayer objects so their stats accumulate across games
    game.setup_game(players, dice_per_player=dice_per_player)
    
    turn_count = 0
    game_ended_by_win = False
    
    # play_turn returns False once only one player remains
    while not game_ended_by_win:
        print(f"\n{'--- ROUND IN GAME ' + str(game_num+1) + ', TURN ' + str(turn_count + 1) + ' ---':^60}")
        
        if not game.play_turn():
            game_ended_by_win = True
            if len(game.turn_order) == 1:  # Ensure there is exactly one winner
                winner_name = game.turn_order[0]
                print(f"\n🎉 {winner_name} WINS GAME {game_num+1}! 🎉")
                
                # Record the game result on every AIPlayer object
                for p_obj in players:
                    p_obj.record_game_result(p_obj.name == winner_name)
            else:
                # No wins or losses are recorded in this case
                print(f"\nGame {game_num+1} ended without a clear single winner in turn_order.")

            break  # Exit the while loop for this game
            
        turn_count += 1

    print(f"\nGame {game_num+1} finished after {turn_count} effective turns (actions/challenges)!")
    
    # Update detailed stats from this game's history
    update_stats_from_game_history(game.history, players)
        
    # Print interim stats (optional)
    print("\nCURRENT GAME WINS:")
    for p_obj in players:
        print(f"{p_obj.name}: {p_obj.wins} wins")


STARTING GAME 1/48
LIAR'S DICE!
Players: Llama3-8B, Gemma-3-12B, Mistral-24B, Qwen-8B
Total dice: 20

DICE (not visible to players):
Llama3-8B: [3, 6, 1, 3, 3]
Gemma-3-12B: [2, 3, 6, 1, 4]
Mistral-24B: [6, 2, 5, 6, 2]
Qwen-8B: [1, 4, 2, 2, 1]

              --- ROUND IN GAME 1, TURN 1 ---               

ROUND 1 - GAME STATE
PLAYERS:
   Llama3-8B: [3, 6, 1, 3, 3] (5 dice) ← CURRENT TURN
   Gemma-3-12B: [2, 3, 6, 1, 4] (5 dice)
   Mistral-24B: [6, 2, 5, 6, 2] (5 dice)
   Qwen-8B: [1, 4, 2, 2, 1] (5 dice)

CURRENT BID: None (round start)

📊 TOTAL DICE IN PLAY: 20

Llama3-8B's turn
❌ Error for Llama3-8B: Error code: 429 - {'error': {'message': 'Rate limit exceeded: free-models-per-day. Add 10 credits to unlock 1000 free model requests per day', 'code': 429, 'metadata': {'headers': {'X-RateLimit-Limit': '50', 'X-RateLimit-Remaining': '0', 'X-RateLimit-Reset': '1749859200000'}, 'provider_name': None}}, 'user_id': 'user_2yJtrGDpLF6fCWMg8xMwwPlaDZ2'}
✅ Llama3-8B bids 1 dice show 3

         

In [8]:
# Final Statistics Printing
print("\n" + "="*60)
print("FINAL AGGREGATE STATISTICS (Free Models)")
print("="*60)

for player in players:
    stats = player.get_stats()
    print(f"\n📊 {player.name} Performance:")
    print(f"  Overall Win Rate: {stats['win_rate']:.1%} ({stats['wins']}/{stats['wins']+stats['losses']})")
    print(f"  Bids Made: {stats['bids_made']}")
    if (stats['bids_defended'] + stats['bids_countered']) > 0:
        print(f"  Bid Defense Rate (when challenged): {stats['bid_defense_rate']:.1%} ({stats['bids_defended']} defended / {stats['bids_defended'] + stats['bids_countered']} challenged)")
    else:
        print("  Bid Defense Rate (when challenged): N/A (no bids challenged)")
    print(f"  Challenges Made: {stats['challenges_made']}")
    if stats['challenges_made'] > 0:
        print(f"  Challenge Success Rate: {stats['challenge_success_rate']:.1%} ({stats['challenges_won']}/{stats['challenges_made']})")
    else:
        print("  Challenge Success Rate: N/A (no challenges made)")


FINAL AGGREGATE STATISTICS (Free Models)

📊 Llama3-8B Performance:
  Overall Win Rate: 31.2% (15/48)
  Bids Made: 1106
  Bid Defense Rate (when challenged): 80.8% (172 defended / 213 challenged)
  Challenges Made: 217
  Challenge Success Rate: 24.4% (53/217)

📊 Gemma-3-12B Performance:
  Overall Win Rate: 35.4% (17/48)
  Bids Made: 1094
  Bid Defense Rate (when challenged): 83.8% (186 defended / 222 challenged)
  Challenges Made: 223
  Challenge Success Rate: 23.3% (52/223)

📊 Mistral-24B Performance:
  Overall Win Rate: 14.6% (7/48)
  Bids Made: 1006
  Bid Defense Rate (when challenged): 72.8% (139 defended / 191 challenged)
  Challenges Made: 203
  Challenge Success Rate: 15.3% (31/203)

📊 Qwen-8B Performance:
  Overall Win Rate: 18.8% (9/48)
  Bids Made: 1116
  Bid Defense Rate (when challenged): 75.9% (173 defended / 228 challenged)
  Challenges Made: 211
  Challenge Success Rate: 22.7% (48/211)
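
Before reading too much into the win rates, it helps to check how far they sit from the 25% a player would get by chance in a four-player game. The sketch below computes an approximate 95% Wilson score interval for each win rate. The helper `wilson_interval` is an illustration added here, not part of the lab code, and it treats the games as independent trials, which is an assumption.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    if trials == 0:
        return (0.0, 1.0)
    p = successes / trials
    denom = 1 + z**2 / trials
    centre = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


# Win counts copied from the final statistics above (48 games, 4 players).
wins = {"Llama3-8B": 15, "Gemma-3-12B": 17, "Mistral-24B": 7, "Qwen-8B": 9}
games = 48
chance = 1 / len(wins)  # 0.25 if every player were equally likely to win

for name, w in wins.items():
    low, high = wilson_interval(w, games)
    verdict = "includes" if low <= chance <= high else "excludes"
    print(f"{name}: {w / games:.1%} win rate, 95% CI [{low:.1%}, {high:.1%}] {verdict} chance ({chance:.0%})")
```

On these counts each interval is wide enough to include 25%, which suggests that 48 games may be too few to separate the models on win rate alone.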
