<a href="https://colab.research.google.com/github/GemmaRagadini/Pokemon_AIF_24_25/blob/dev/pokemon-vgc-engine-master/example/Notebook_project_Pokemon.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Pokemon_Battle PokeBob team**

The project focuses on the track proposed in the course, specifically the section related to competitions. Our team selected the task involving the simulation of a Pokémon battle between two teams, each composed of three Pokémon. The battle consists of three matches, with the first player to knock out all three Pokémon of the opposing player declared the winner of the match. The battle is considered concluded when a player wins at least two out of three matches.

The objective of the project was to develop an AI agent capable of defeating a random player, which selects its Pokémon moves arbitrarily without any strategic logic. Both players operate under the same conditions: their teams are assigned randomly, so any advantage or disadvantage is also determined by chance.
In the following sections, we will explain our approach to solving the task and outline the methodology we adopted, along with the results we obtained.


In [None]:
# Clone the entire repo.
!git clone https://github.com/GemmaRagadini/Pokemon_AIF_24_25.git
%cd Pokemon_AIF_24_25/pokemon-vgc-engine-master
!ls

Cloning into 'Pokemon_AIF_24_25'...
remote: Enumerating objects: 1863, done.
remote: Counting objects: 100% (249/249), done.
remote: Compressing objects: 100% (172/172), done.
remote: Total 1863 (delta 195), reused 125 (delta 77), pack-reused 1614 (from 1)
Receiving objects: 100% (1863/1863), 21.61 MiB | 18.54 MiB/s, done.
Resolving deltas: 100% (417/417), done.
/content/Pokemon_AIF_24_25/pokemon-vgc-engine-master/Pokemon_AIF_24_25/pokemon-vgc-engine-master/vgc/Pokemon_AIF_24_25/pokemon-vgc-engine-master/vgc/behaviour/Pokemon_AIF_24_25/pokemon-vgc-engine-master/Pokemon_AIF_24_25/pokemon-vgc-engine-master/Pokemon_AIF_24_25/pokemon-vgc-engine-master
CHANGELOG    example	  organization	requirements.txt  test
competition  LICENSE.txt  README.md	setup.py	  vgc
ERROR: Could not find a version that satisfies the requirement requirements.txt (from versions: none)
HINT: You are attempting to install a package literally named "requirements.txt" (which cannot exist).

In [None]:

#from Example_BattleEcosystem import Tournament, main
from Example_BattleEcosystem import main

main()

ModuleNotFoundError: No module named 'vgc'

**Related Work**

For this project, we decided to explore some existing approaches related to the concepts studied during the course, as well as to develop a custom approach of our own. These approaches were evaluated through a tournament, the details of which will be provided in the corresponding section.

The related work was derived from the course slides and the accompanying textbook, Artificial Intelligence Fundamentals. Specifically, we focused on the section related to game theory (Chapter 6 of the textbook) and examined various approaches to the Minimax algorithm and its variations. This included the implementation of alpha-beta pruning and heuristics such as the killer move heuristic.

**Methodology**

To achieve our goal, we decided to implement several algorithms discussed in the related work. Initially, we focused our attention on various implementations of the Minimax algorithm. In the following sections, we will provide a detailed explanation of each algorithm we implemented.

In [None]:

import math
import hashlib
import numpy as np
import random
from typing import List
from vgc.behaviour import evalFunctions
from vgc.datatypes.Types import PkmStat, PkmType, WeatherCondition
from vgc.datatypes.Objects import Pkm, PkmTeam, GameState
from vgc.datatypes.Constants import DEFAULT_PKM_N_MOVES, DEFAULT_PARTY_SIZE, TYPE_CHART_MULTIPLIER, DEFAULT_N_ACTIONS
from vgc.behaviour import BattlePolicy
from copy import deepcopy

# imports needed to run the code that implements all the policies

ImportError: cannot import name 'evalFunctions' from 'vgc.behaviour' (/content/Pokemon_AIF_24_25/pokemon-vgc-engine-master/vgc/behaviour/__init__.py)

**Minimax**

The first implementation is a plain Minimax with a simple evaluation function called game_state_eval. Our implementation is shown below.


In [None]:


def game_state_eval(s: GameState, depth):
    mine = s.teams[0].active
    opp = s.teams[1].active
    return mine.hp / mine.max_hp - 3 * opp.hp / opp.max_hp - 0.3 * depth


def n_fainted(t: PkmTeam):
    fainted = 0
    fainted += t.active.hp == 0
    if len(t.party) > 0:
        fainted += t.party[0].hp == 0
    if len(t.party) > 1:
        fainted += t.party[1].hp == 0
    return fainted




class MyMinimax(BattlePolicy):

    def __init__(self, max_depth: int = 4):
        self.max_depth = max_depth
        self.name = "Minimax"

    def minimax(self, g, depth, is_maximizing_player):
        """
        Classic Minimax with a basic evaluation function.

        :param g: current game state.
        :param depth: remaining search depth.
        :param is_maximizing_player: True if the player to move is the maximizer, False if it is the minimizer.
        :return: (evaluation, best action)
        """
        if depth == 0:
            # Leaf: score the state with the basic evaluation function, which looks at the HP of the active Pokémon.
            return game_state_eval(g, depth), None

        if is_maximizing_player:
            max_eval = float('-inf')
            best_action = None
            for i in range(DEFAULT_N_ACTIONS):
                g_copy = deepcopy(g)
                s, _, _, _, _ = g_copy.step([i, 99])  # 99 is not a valid action, so the opponent does nothing in this step
                if n_fainted(s[0].teams[0]) > n_fainted(g.teams[0]):
                    continue  # skip successor states in which one of our Pokémon faints
                eval_score, _ = self.minimax(s[0], depth - 1, False)
                if eval_score > max_eval:
                    max_eval = eval_score
                    best_action = i
            return max_eval, best_action

        else:  # opponent's turn: it tries to minimize the score
            min_eval = float('inf')
            best_action = None
            for j in range(DEFAULT_N_ACTIONS):
                g_copy = deepcopy(g)
                s, _, _, _, _ = g_copy.step([99, j])  # our side plays an invalid action (99), so only the opponent acts
                # skip successor states in which the opponent's number of fainted Pokémon increases
                if n_fainted(s[0].teams[1]) > n_fainted(g.teams[1]):
                    continue
                eval_score, _ = self.minimax(s[0], depth - 1, True)
                if eval_score < min_eval:
                    min_eval = eval_score
                    best_action = j
            return min_eval, best_action

    def get_action(self, g) -> int:
        """
        Best action for the maximizing player.

        :param g: current game state.
        :return: index of the best action.
        """
        _, best_action = self.minimax(g, self.max_depth, True)
        return best_action if best_action is not None else 0



class Node:
    def __init__(self, state, parent, player, action=None):
        self.state = state
        self.parent = parent
        self.player = player
        self.action = action
        self.children = []
        self.visits = 0
        self.value = 0

    def is_fully_expanded(self):
        return len(self.get_untried_actions()) == 0

    def get_untried_actions(self):
        # Return the actions available from this state
        return [i for i in range(DEFAULT_N_ACTIONS)]


In the cell above, we present our first implementation of the Minimax policy, which employs a basic evaluation function called game_state_eval. This function is designed to:

- Encourage states where the player’s active Pokémon (mine) has higher HP relative to its maximum HP.
- Penalize states where the opponent’s active Pokémon (opp) has high HP.
- Add a penalty proportional to the search depth to prioritize faster victories.

Although this evaluation function is rudimentary, our results demonstrate that it provides a balanced implementation. However, it is not the most effective approach we encountered.
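
As a rough illustration of how these three terms trade off, the sketch below applies the same formula to plain numbers instead of engine objects. The function name `hp_eval` and its arguments are hypothetical stand-ins for the fields read by game_state_eval; only the weights (1, 3 and 0.3) come from the cell above.

```python
def hp_eval(my_hp: float, my_max_hp: float, opp_hp: float, opp_max_hp: float, depth: int) -> float:
    """Same formula as game_state_eval, applied to plain numbers (illustrative only)."""
    return my_hp / my_max_hp - 3 * opp_hp / opp_max_hp - 0.3 * depth


# Both active Pokémon at full HP: the opponent term dominates.
print(hp_eval(100, 100, 100, 100, 0))   # 1 - 3 = -2.0
# Opponent at half HP: the score rises by 1.5.
print(hp_eval(100, 100, 50, 100, 0))    # 1 - 1.5 = -0.5
# Opponent knocked out while we keep 40% of our HP.
print(hp_eval(40, 100, 0, 100, 0))      # 0.4
```

Because the opponent's HP fraction is weighted three times as much as our own, lowering the opponent's HP moves the score more than preserving our own HP does.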

The Minimax implementation is straightforward, comprising a section for the maximizer player and another for the minimizer. The maximizer aims to transition to states where its Pokémon are healthier than the opponent’s Pokémon, while the minimizer seeks to reduce this advantage. Each state is evaluated recursively.
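
The recursion can be seen in isolation on a toy game tree. This is a generic sketch, not the engine-based policy: the tree is a hypothetical nested list whose leaves are already-evaluated scores, and the two players simply alternate.

```python
from typing import List, Optional, Tuple, Union

Tree = Union[float, List["Tree"]]


def minimax(node: Tree, maximizing: bool) -> Tuple[float, Optional[int]]:
    """Return (value, index of the best child) for a nested-list game tree."""
    if not isinstance(node, list):      # leaf: static evaluation
        return node, None
    best_value = float("-inf") if maximizing else float("inf")
    best_index = None
    for index, child in enumerate(node):
        value, _ = minimax(child, not maximizing)
        if (maximizing and value > best_value) or (not maximizing and value < best_value):
            best_value, best_index = value, index
    return best_value, best_index


# The root is a maximizer choosing among three minimizer nodes.
toy_tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax(toy_tree, True))  # (3, 0)
```

The minimizer would answer the three root actions with 3, 2 and 2 respectively, so the maximizer picks the first action.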

To support these computations, the algorithm uses the n_fainted function, which counts the number of fainted (knocked-out) Pokémon. Additionally, the algorithm determines the next action from the maximizer player’s perspective, as implemented in the get_action method.
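
The counting logic of n_fainted can be tried without the engine. The sketch below uses hypothetical `ToyPkm` and `ToyTeam` dataclasses that mimic only the hp, active and party fields, and counts fainted Pokémon for a party of any size rather than checking the two party slots one by one.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyPkm:
    hp: float


@dataclass
class ToyTeam:
    active: ToyPkm
    party: List[ToyPkm] = field(default_factory=list)


def count_fainted(team: ToyTeam) -> int:
    """Count Pokémon with 0 HP among the active one and the whole party."""
    return sum(pkm.hp == 0 for pkm in [team.active, *team.party])


team = ToyTeam(active=ToyPkm(0.0), party=[ToyPkm(35.0), ToyPkm(0.0)])
print(count_fainted(team))  # 2
```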

Another critical function used is the step function, which simulates actions to predict how the game state evolves. This function works in conjunction with the n_fainted function to enhance decision-making.

The default values for the search depth and weights in the evaluation function were determined empirically. Various configurations were tested, and the ones used here were found to deliver the best performance according to our evaluation metrics.

The Node class, shown at the end of the cell, represents a node in the Minimax tree. It includes several fields to facilitate tree exploration, such as parent, children, value, and state.

Additionally, the class provides two key methods:

- is_fully_expanded: Checks whether the node has been fully explored.
- get_untried_actions: Retrieves the set of possible actions that can still be taken from the current node.

These fields and methods are critical for efficiently navigating and expanding the Minimax tree during the decision-making process.





**Minimax with Alpha-Beta Pruning and Killer Move Heuristic**

Our second implementation extends the basic Minimax algorithm by incorporating alpha-beta pruning and the killer move heuristic. This implementation was developed to enhance both the performance and efficiency of the Minimax algorithm described above.

The addition of alpha-beta pruning allows the algorithm to eliminate branches in the search tree that cannot influence the final decision, significantly reducing the number of nodes explored. Meanwhile, the killer move heuristic prioritizes moves that are likely to be effective, further optimizing the decision-making process by focusing on promising actions.

The combined use of these techniques aims to not only improve the accuracy of the algorithm but also speed up its execution, enabling faster and more effective decision-making.

In the cells below, we present our implementation of this enhanced Minimax algorithm along with a detailed explanation of how it operates.

In [None]:
class MyMinimaxWithAlphaBetaKiller(BattlePolicy):

    def __init__(self, max_depth: int = 5):
        self.max_depth = max_depth
        self.name = "Minimax with pruning alpha beta killer"
        self.killer_moves = {depth: [] for depth in range(max_depth + 1)}  # Stores the killer moves for each depth

    def minimax(self, g, depth, alpha, beta, is_maximizing_player):
        if depth == 0:
            return evalFunctions.game_state_eval(g, depth), None

        if is_maximizing_player:
            max_eval = float('-inf')
            best_action = None

            # Get the available actions
            moves = list(range(DEFAULT_N_ACTIONS))

            # Prioritize the killer moves
            killer_moves = self.killer_moves.get(depth, [])
            moves = sorted(moves, key=lambda move: move in killer_moves, reverse=True)

            for i in moves:
                g_copy = deepcopy(g)
                s, _, _, _, _ = g_copy.step([i, 99])
                if evalFunctions.n_fainted(s[0].teams[0]) > evalFunctions.n_fainted(g.teams[0]):
                    continue

                eval_score, _ = self.minimax(s[0], depth - 1, alpha, beta, False)
                if eval_score > max_eval:
                    max_eval = eval_score
                    best_action = i

                alpha = max(alpha, eval_score)
                if beta <= alpha:
                    # Update the killer moves
                    if i not in self.killer_moves[depth]:
                        self.killer_moves[depth].append(i)
                        if len(self.killer_moves[depth]) > 2:
                            self.killer_moves[depth].pop(0)
                    break
            return max_eval, best_action

        else:
            min_eval = float('inf')
            best_action = None

            # Get the available actions
            moves = list(range(DEFAULT_N_ACTIONS))

            # Prioritize the killer moves
            killer_moves = self.killer_moves.get(depth, [])
            moves = sorted(moves, key=lambda move: move in killer_moves, reverse=True)

            for j in moves:
                g_copy = deepcopy(g)
                s, _, _, _, _ = g_copy.step([99, j])
                if evalFunctions.n_fainted(s[0].teams[1]) > evalFunctions.n_fainted(g.teams[1]):
                    continue

                eval_score, _ = self.minimax(s[0], depth - 1, alpha, beta, True)
                if eval_score < min_eval:
                    min_eval = eval_score
                    best_action = j

                beta = min(beta, eval_score)
                if beta <= alpha:
                    # Update the killer moves
                    if j not in self.killer_moves[depth]:
                        self.killer_moves[depth].append(j)
                        if len(self.killer_moves[depth]) > 2:
                            self.killer_moves[depth].pop(0)
                    break
            return min_eval, best_action

    def get_action(self, g) -> int:
        _, best_action = self.minimax(g, self.max_depth, float('-inf'), float('inf'), True)
        return best_action if best_action is not None else 0

Here we illustrate the key components of this algorithm.

Alpha-beta pruning is a fundamental optimization technique for Minimax algorithms. It reduces the number of nodes evaluated in the game tree by eliminating branches that cannot influence the final decision. Specifically:

- Alpha represents the best score achievable by the maximizing player, ensuring the current node's value is not less than this threshold.
- Beta represents the best score achievable by the minimizing player, ensuring the current node's value does not exceed this threshold.

By comparing node evaluations against these bounds, the algorithm can skip unnecessary evaluations and focus on the most promising branches.

The killer move heuristic prioritizes actions (or moves) that previously led to a cutoff during alpha-beta pruning at the same depth. These "killer moves" are stored in a depth-specific list and revisited with high priority in subsequent iterations. The rationale is that actions causing significant pruning in similar scenarios are likely to be effective again, reducing the time spent on less impactful moves.
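
The effect of the alpha and beta bounds can be observed on a toy tree. This is a generic sketch on a hypothetical nested-list tree, not the engine-based policy; it records every leaf it evaluates so the pruned leaves become visible.

```python
from typing import List, Union

Tree = Union[float, List["Tree"]]


def alphabeta(node: Tree, alpha: float, beta: float, maximizing: bool, visited: List[float]) -> float:
    """Minimax value of a nested-list tree with alpha-beta pruning.

    Every evaluated leaf is appended to `visited` so pruning can be observed.
    """
    if not isinstance(node, list):
        visited.append(node)
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False, visited))
            alpha = max(alpha, value)
            if beta <= alpha:   # the minimizer above will never allow this branch
                break
        return value
    value = float("inf")
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True, visited))
        beta = min(beta, value)
        if beta <= alpha:       # the maximizer above already has a better option
            break
    return value


toy_tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
leaves_seen: List[float] = []
print(alphabeta(toy_tree, float("-inf"), float("inf"), True, leaves_seen))  # 3
print(leaves_seen)  # [3, 12, 8, 2, 14, 5, 2]
```

Only 7 of the 9 leaves are evaluated: once the second subtree yields a 2, it cannot beat the 3 already guaranteed by the first subtree, so its remaining leaves (4 and 6) are skipped.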

The algorithm explores the game tree up to a predefined depth, *max_depth*. At this point, it evaluates the game state using a heuristic evaluation function, currently the same one used in the basic Minimax implementation. This depth limit balances the trade-off between computational feasibility and decision quality, and it was selected empirically.
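
To give a sense of this trade-off, the sketch below counts the leaves of a full tree in which one player acts per ply, as in the implementation above. The branching factor of 6 is an assumption (four moves plus two switches); the real value is whatever DEFAULT_N_ACTIONS is in the engine, and pruning or skipped states reduce the actual count.

```python
def max_leaves(branching_factor: int, depth: int) -> int:
    """Upper bound on leaf evaluations for a full tree of the given depth."""
    return branching_factor ** depth


ASSUMED_N_ACTIONS = 6  # assumption: 4 moves + 2 switches
for depth in (3, 4, 5):
    print(depth, max_leaves(ASSUMED_N_ACTIONS, depth))
# 3 216
# 4 1296
# 5 7776
```

Under this assumption, each extra level of depth multiplies the worst-case work by six, which is why the depth limit has to be tuned against the available time per move.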

Algorithm Workflow

The algorithm begins by initializing necessary parameters, including the maximum search depth and a set of "killer moves" for each depth level. These killer moves are essentially a list of actions that have proven effective in causing cutoffs during previous searches. By storing these moves, the algorithm can prioritize their exploration in subsequent iterations, aiming to reduce unnecessary computations.

Once initialized, the algorithm proceeds to explore the game tree using the Minimax method. This exploration is depth-limited, meaning the algorithm only evaluates the game tree to a specific depth (*max_depth*) to keep the computation manageable. At the root node, the algorithm decides whether it is currently the turn of the maximizing or minimizing player and selects actions accordingly.

For the maximizing player, the goal is to find the action that yields the highest evaluation score, skipping successor states in which one of its own Pokémon faints. Conversely, for the minimizing player, the focus is on selecting the action that minimizes the evaluation score, simulating an opponent trying to counteract the maximizing player’s strategies. At each level of the game tree, Alpha and Beta track the best outcome that each player can already guarantee. These values guide the pruning process, helping the algorithm decide when to stop exploring certain branches.

The algorithm introduces a prioritization mechanism at this stage by sorting available actions based on the killer moves. If a move caused a cutoff at the same depth in a previous search, it is considered likely to do so again and is explored first. This ensures the algorithm focuses on the most promising actions early on, increasing the chances of quickly achieving cutoffs. For example, if the algorithm identifies a move that significantly improves the maximizing player’s position, it will immediately consider pruning all remaining moves that cannot surpass this outcome.
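
The ordering step can be tried on its own. The sketch below wraps the same `sorted` call used in the policy in a hypothetical helper, `order_moves`, and applies it to six action indices.

```python
from typing import List


def order_moves(moves: List[int], killer_moves: List[int]) -> List[int]:
    """Put killer moves first; sorted() is stable, so the other moves keep their order."""
    return sorted(moves, key=lambda move: move in killer_moves, reverse=True)


print(order_moves([0, 1, 2, 3, 4, 5], killer_moves=[4, 2]))  # [2, 4, 0, 1, 3, 5]
```

Note that the killer moves are tried in action-index order, not in the order in which they were recorded, because the sort key only distinguishes killer from non-killer moves.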

As the game tree is traversed, the algorithm continuously updates the Alpha and Beta values. When a node’s evaluation score falls outside the range defined by Alpha and Beta, the algorithm terminates further exploration of that branch. This process, known as pruning, saves computational resources by avoiding the evaluation of irrelevant or unpromising paths. If a cutoff occurs, the current move is added to the list of killer moves for the corresponding depth, ensuring that it will be prioritized in future iterations.
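
The bookkeeping of the killer list can be sketched separately. The helper `record_killer` below is hypothetical; it mirrors the update performed at a cutoff in the policy, keeping at most two killer moves per depth and discarding the oldest one when a third arrives.

```python
from typing import Dict, List


def record_killer(killers: Dict[int, List[int]], depth: int, move: int, max_per_depth: int = 2) -> None:
    """Remember `move` as a killer at `depth`, dropping the oldest entry when full."""
    slot = killers.setdefault(depth, [])
    if move not in slot:
        slot.append(move)
        if len(slot) > max_per_depth:
            slot.pop(0)


killers: Dict[int, List[int]] = {}
for cutoff_move in (3, 1, 3, 5):
    record_killer(killers, depth=2, move=cutoff_move)
print(killers)  # {2: [1, 5]}
```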

Once all possible actions are evaluated, the algorithm determines the best action for the maximizing player at the root node.

**Custom Evaluation Function**

We decided to implement another evaluation function that differs in approach from the one described above. This new function is more aggressive: on top of the HP-based score, it rewards the power and the effectiveness of the moves available against the opponent. The aim was to create a more aggressive agent so that battles would end more quickly. *The results obtained are illustrated and discussed in the Evaluation section*.
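
The way the terms are combined can be checked on plain numbers. In this sketch `combined_eval` is a hypothetical helper that takes the already-computed maximum damage, matchup multiplier and HP-based score as arguments; the weights (1/70, 1/2 and the +4 offset) are the ones used by my_eval_fun in the following cell.

```python
def combined_eval(max_damage: float, matchup: float, hp_score: float) -> float:
    """Combine damage, type matchup and HP score with the weights of my_eval_fun."""
    eval_hp = hp_score + 4          # shift the HP-based score to a positive range
    return max_damage / 70 + matchup / 2 + eval_hp


# Strong move and matchup multiplier of 2, both Pokémon at full HP (hp_score = -2).
print(combined_eval(max_damage=140, matchup=2.0, hp_score=-2.0))  # 2 + 1 + 2 = 5.0
# Weak move, neutral matchup, same HP situation.
print(combined_eval(max_damage=35, matchup=1.0, hp_score=-2.0))   # 0.5 + 0.5 + 2 = 3.0
```

With these weights a move at the top of the assumed damage range contributes as much as two full points of the HP-based score, which is what makes the resulting agent favor attacking.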

In [None]:
def my_eval_fun(s:GameState, depth):
    """
    Evaluation function that combines the type matchup between the active Pokémon, the HP-based
    game_state_eval score, and the maximum damage that can be inflicted.
    """
    my_active = s.teams[0].active
    opp_active = s.teams[1].active
    attack_stage = s.teams[0].stage[PkmStat.ATTACK]
    defense_stage = s.teams[1].stage[PkmStat.DEFENSE]
    matchup = evaluate_matchup(my_active.type, opp_active.type, list(map(lambda m: m.type, my_active.moves))) # in [0,2]
    eval_hp = game_state_eval(s,depth) + 4 # roughly in [0, 5]
    max_damage = maxDamage(my_active, opp_active.type, attack_stage, defense_stage, s.weather.condition) # in [0,140]
    return max_damage/70 + matchup/2 + eval_hp


def maxDamage(my_active: Pkm, opp_active_type:PkmType, attack_stage: int, defense_stage: int,weather: WeatherCondition ):
    """
    Returns the maximum damage the active Pokémon can inflict on the opponent with a single move
    """
    mvs_damage = []
    # estimate the damage of each move of my active Pokémon
    for m in my_active.moves:
        mvs_damage.append(estimate_damage(m.type,my_active.type, m.power, opp_active_type ,attack_stage, defense_stage, weather))
    return np.max(mvs_damage)


def evaluate_matchup(pkm_type: PkmType, opp_pkm_type: PkmType, moves_type: List[PkmType]) -> float:
    """
    Evaluates the matchup between the active Pokémon and the opposing Pokémon,
    considering the types of the Pokémon and of the available moves.
    """
    for mtype in moves_type: # look for a super-effective move
        if TYPE_CHART_MULTIPLIER[mtype][pkm_type] == 2.0:
            return 2.0  # return 2 if there is a super-effective move
    # otherwise consider only the evaluation based on the Pokémon types
    return TYPE_CHART_MULTIPLIER[opp_pkm_type][pkm_type]


**Custom Policy**

The third and last approach is our custom policy. This policy differs from the others that have been implemented: it does not search the game tree but selects an action with a fixed set of rules.

In [None]:
class MyPolicy(BattlePolicy):

    def __init__(self):
        self.hail_used = False
        self.sandstorm_used = False
        self.name = "My Policy"

    def estimate_damages(self, active_pkm: Pkm, opp_pkm_type: PkmType, attack_stage: int, defense_stage: int, weather: WeatherCondition) -> List[float]:
        # estimate the damage of every move of the active Pokémon
        damages: List[float] = []
        for move in active_pkm.moves:
            damages.append(evalFunctions.estimate_damage(move.type, active_pkm.type, move.power, opp_pkm_type, attack_stage,
                                          defense_stage, weather))
        return damages


    def get_action(self, g: GameState) -> int:
        # my team
        my_team = g.teams[0]
        active_pkm = my_team.active
        bench = my_team.party
        my_attack_stage = my_team.stage[PkmStat.ATTACK]

        # opposing team
        opp_team = g.teams[1]
        opp_active_pkm = opp_team.active
        opp_defense_stage = opp_team.stage[PkmStat.DEFENSE]

        # weather
        weather = g.weather.condition

        try:
            # estimate the damage of each move
            damages = self.estimate_damages(active_pkm, opp_active_pkm.type, my_attack_stage, opp_defense_stage, weather)
            move_id = int(np.argmax(damages))
        except Exception as e:
            import traceback
            traceback.print_exc()

        # if the move knocks out the opponent or is super effective, use it right away:
        if (damages[move_id] >= opp_active_pkm.hp) or (damages[move_id] > 0 and TYPE_CHART_MULTIPLIER[active_pkm.moves[move_id].type][opp_active_pkm.type] == 2.0) :
            return move_id
        try:
            defense_type_multiplier = evalFunctions.evaluate_matchup(active_pkm.type, opp_active_pkm.type,
                                                    list(map(lambda m: m.type, opp_active_pkm.moves)))
        except Exception as e:
            import traceback
            traceback.print_exc()

        if defense_type_multiplier <= 1.0:
            return move_id

        # consider switching Pokémon
        matchup: List[float] = []
        not_fainted = False

        try:
            for j in range(len(bench)):
                if bench[j].hp == 0.0:
                    matchup.append(0.0)
                else:
                    not_fainted = True
                    matchup.append(
                        evalFunctions.evaluate_matchup(bench[j].type, opp_active_pkm.type, list(map(lambda m: m.type, bench[j].moves))))

            best_switch_matchup = int(np.max(matchup))
            best_switch = np.argmax(matchup)
            current_matchup = evalFunctions.evaluate_matchup(active_pkm.type, opp_active_pkm.type,list(map(lambda m: m.type, active_pkm.moves)))
        except Exception as e:
            import traceback
            traceback.print_exc()

        if not_fainted and best_switch_matchup >= current_matchup+1:
            return best_switch + 4

        return move_id


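Description of the custom policy

The policy is rule-based. At each turn it estimates the damage of every move of the active Pokémon and selects the strongest one. That move is used immediately if it would knock out the opponent's active Pokémon or if it is super effective. Otherwise the policy evaluates the defensive matchup against the opponent's move types: if the multiplier is at most 1.0, it attacks. If not, it computes a matchup score for each non-fainted benched Pokémon and switches to the highest-scoring one, but only when that score exceeds the current matchup score by at least 1. In all other cases it attacks with the strongest move.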
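
The decision rule of get_action in the cell above can be condensed into a toy function that works on plain numbers. Everything here is a hypothetical sketch: `choose_action` and its arguments are stand-ins for values the policy reads from the engine, and, unlike the original, the best bench score is not truncated to an integer.

```python
from typing import List


def choose_action(damages: List[float], opp_hp: float, best_is_super_effective: bool,
                  defensive_multiplier: float, bench_matchups: List[float],
                  bench_alive: List[bool], current_matchup: float, n_moves: int = 4) -> int:
    """Toy version of the rule: attack when it pays off, otherwise consider a switch."""
    move_id = max(range(len(damages)), key=lambda i: damages[i])
    # 1. Attack if the best move knocks out the opponent or is super effective.
    if damages[move_id] >= opp_hp or (damages[move_id] > 0 and best_is_super_effective):
        return move_id
    # 2. Attack if the defensive multiplier is at most 1.0.
    if defensive_multiplier <= 1.0:
        return move_id
    # 3. Switch only if a non-fainted benched Pokémon scores at least 1 higher.
    scores = [m if alive else 0.0 for m, alive in zip(bench_matchups, bench_alive)]
    if any(bench_alive) and max(scores) >= current_matchup + 1:
        return n_moves + scores.index(max(scores))
    return move_id


print(choose_action([20, 55, 0, 10], opp_hp=50, best_is_super_effective=False,
                    defensive_multiplier=2.0, bench_matchups=[2.0, 0.5],
                    bench_alive=[True, True], current_matchup=1.0))  # 1 (knock-out)
print(choose_action([20, 30, 0, 10], opp_hp=90, best_is_super_effective=False,
                    defensive_multiplier=2.0, bench_matchups=[0.5, 2.0],
                    bench_alive=[True, True], current_matchup=1.0))  # 5 (switch)
```

Action indices follow the convention visible in the policy: the first four indices are moves, and a switch to bench slot k is encoded as 4 + k.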

**Evaluation of the performances**

We implemented several approaches and, in order to determine which policy performed best, we decided to have each agent battle against the Random agent. This allowed us to demonstrate that each policy outperforms the Random agent.

For the second evaluation, we organized a tournament involving all the players:

*Random Player*

*Minimax Player*

*Minimax with Alpha-Beta Pruning and Killer Move Heuristic Player*

*Custom Policy Player*

Each match consisted of 10 battles, and the winner was the player who won the most battles, thus achieving the best win rate. We ran two types of match: in one set of 10 battles the two agents always have the same team, and in the other set of 10 battles they have different teams. In all cases the teams were selected randomly.
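
The two team-assignment modes and the win-rate computation can be sketched without the engine. The roster of string names, `draw_teams` and `win_rate` are hypothetical stand-ins used only to illustrate the protocol; they are not part of the competition framework.

```python
import random
from typing import List, Tuple


def draw_teams(roster: List[str], team_size: int, same_team: bool,
               rng: random.Random) -> Tuple[List[str], List[str]]:
    """Draw the two teams for one battle from a shared roster."""
    first = rng.sample(roster, team_size)
    second = list(first) if same_team else rng.sample(roster, team_size)
    return first, second


def win_rate(wins: int, n_battles: int) -> float:
    """Win rate as a percentage."""
    return 100.0 * wins / n_battles


rng = random.Random(0)  # fixed seed so the draw is reproducible
roster = [f"pkm_{i}" for i in range(20)]
team_a, team_b = draw_teams(roster, team_size=3, same_team=True, rng=rng)
print(team_a == team_b)   # True
print(win_rate(7, 10))    # 70.0
```

In the same-team mode any difference in win rate comes from the policies and from chance during the battle, while in the different-team mode it also includes the luck of the draw.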

The results of each individual battle, as well as the overall winner of the tournament, are reported in the cells below.

In [None]:
import time as t
from example.Example_Competitor import MyCompetitor0, MyCompetitor1, MyCompetitor2, MyCompetitor3
from vgc.balance.meta import StandardMetaData
from vgc.competition.Competitor import CompetitorManager
from vgc.ecosystem.BattleEcosystem import BattleEcosystem
from vgc.util.generator.PkmRosterGenerators import RandomPkmRosterGenerator
from vgc.util.generator.PkmTeamGenerators import RandomTeamFromRoster
import sys
import random
# these are the imports needed to run the evaluation

ModuleNotFoundError: No module named 'customtkinter'

**First Battle**

Minimax vs. Random Player: the first result is for battles in which the two players always have different teams, and the second for battles in which they always have the same team.

In [None]:
N_PLAYERS = 2


def main():

    roster = RandomPkmRosterGenerator().gen_roster()
    meta_data = StandardMetaData()
    le = BattleEcosystem(meta_data, debug=True)
    n_epochs = 10

    times1, wins_0_different,rate1, policy_name1 =different_teams(n_epochs,le,roster)
    times2, wins_0_same, rate2, policy_name2 =same_team(n_epochs,le,roster)

    print(f"Different teams: {policy_name1} won {wins_0_different} of {n_epochs} battles, win rate {rate1}%, elapsed time {times1} s")
    print(f"Same team: {policy_name2} won {wins_0_same} of {n_epochs} battles, win rate {rate2}%, elapsed time {times2} s")



def different_teams(n_epochs,le:BattleEcosystem,roster):

    wins_player0 = 0
    wins_player1 = 0
    start_time = t.time()
    for i in range(n_epochs):
        cm1 = CompetitorManager(MyCompetitor2("Player 2"))
        team = RandomTeamFromRoster(roster).get_team()
        # cm1.team = RandomTeamFromRoster(roster).get_team()
        cm1.team = team
        le.register(cm1)
        cm2 = CompetitorManager(MyCompetitor1("Player 1"))
        team2 = RandomTeamFromRoster(roster).get_team()
        cm2.team = team2
        # cm2.team = RandomTeamFromRoster(roster).get_team()
        le.register(cm2)
        # Run a single epoch
        le.run(1)
        # Accumulate the wins of both players
        wins_player0 += le.win_counts[cm1]
        wins_player1 += le.win_counts[cm2]

        le.unregister(cm1)
        le.unregister(cm2)
    end_time = t.time()
    time = end_time - start_time

    return time,wins_player0,(wins_player0/n_epochs)*100,cm1.competitor.battle_policy.name


def same_team(n_epochs, le:BattleEcosystem,roster):

    wins_player0 = 0
    wins_player1 = 0
    start_time = t.time()

    from copy import deepcopy  # local import: keeps this cell independent of earlier cells

    for i in range(n_epochs):
        # Draw one random team and give each player its own copy of it
        team = RandomTeamFromRoster(roster).get_team()
        cm1 = CompetitorManager(MyCompetitor2("Player 2"))
        cm1.team = deepcopy(team)
        le.register(cm1)
        cm2 = CompetitorManager(MyCompetitor1("Player 1"))
        cm2.team = deepcopy(team)
        le.register(cm2)
        # Run a single epoch
        le.run(1)
        # Accumulate the wins of both players
        wins_player0 += le.win_counts[cm1]
        wins_player1 += le.win_counts[cm2]
        le.unregister(cm1)
        le.unregister(cm2)
    end_time = t.time()
    time = end_time-start_time

    return time,wins_player0,(wins_player0/n_epochs)*100, cm1.competitor.battle_policy.name

Brief comment on the results

**Second Battle**

Minimax with Alpha-Beta Pruning and Killer Move Heuristic Player vs. Random Player: the first result is for battles in which the two players always have different teams, and the second for battles in which they always have the same team.

In [None]:
N_PLAYERS = 2


def main():

    roster = RandomPkmRosterGenerator().gen_roster()
    meta_data = StandardMetaData()
    le = BattleEcosystem(meta_data, debug=True)
    n_epochs = 10

    times1, wins_0_different,rate1, policy_name1 =different_teams(n_epochs,le,roster)
    times2, wins_0_same, rate2, policy_name2 =same_team(n_epochs,le,roster)

    print(f"Different teams: {policy_name1} won {wins_0_different} of {n_epochs} battles, win rate {rate1}%, elapsed time {times1} s")
    print(f"Same team: {policy_name2} won {wins_0_same} of {n_epochs} battles, win rate {rate2}%, elapsed time {times2} s")



def different_teams(n_epochs,le:BattleEcosystem,roster):

    wins_player0 = 0
    wins_player1 = 0
    start_time = t.time()
    for i in range(n_epochs):
        cm1 = CompetitorManager(MyCompetitor3("Player 3"))
        team = RandomTeamFromRoster(roster).get_team()
        # cm1.team = RandomTeamFromRoster(roster).get_team()
        cm1.team = team
        le.register(cm1)
        cm2 = CompetitorManager(MyCompetitor1("Player 1"))
        team2 = RandomTeamFromRoster(roster).get_team()
        cm2.team = team2
        # cm2.team = RandomTeamFromRoster(roster).get_team()
        le.register(cm2)
        # Run a single epoch
        le.run(1)
        # Accumulate the wins of both players
        wins_player0 += le.win_counts[cm1]
        wins_player1 += le.win_counts[cm2]

        le.unregister(cm1)
        le.unregister(cm2)
    end_time = t.time()
    time = end_time - start_time

    return time,wins_player0,(wins_player0/n_epochs)*100,cm1.competitor.battle_policy.name


def same_team(n_epochs, le:BattleEcosystem,roster):

    wins_player0 = 0
    wins_player1 = 0
    start_time = t.time()

    from copy import deepcopy  # local import: keeps this cell independent of earlier cells

    for i in range(n_epochs):
        # Draw one random team and give each player its own copy of it
        team = RandomTeamFromRoster(roster).get_team()
        cm1 = CompetitorManager(MyCompetitor3("Player 3"))
        cm1.team = deepcopy(team)
        le.register(cm1)
        cm2 = CompetitorManager(MyCompetitor1("Player 1"))
        cm2.team = deepcopy(team)
        le.register(cm2)
        # Run a single epoch
        le.run(1)
        # Accumulate the wins of both players
        wins_player0 += le.win_counts[cm1]
        wins_player1 += le.win_counts[cm2]
        le.unregister(cm1)
        le.unregister(cm2)
    end_time = t.time()
    time = end_time-start_time

    return time,wins_player0,(wins_player0/n_epochs)*100, cm1.competitor.battle_policy.name

Brief comment on the results

**Third Battle**

Custom Player vs. Random Player: the first result is for battles in which the two players always have different teams, and the second for battles in which they always have the same team.

In [None]:
N_PLAYERS = 2


def main():

    roster = RandomPkmRosterGenerator().gen_roster()
    meta_data = StandardMetaData()
    le = BattleEcosystem(meta_data, debug=True)
    n_epochs = 10

    times1, wins_0_different,rate1, policy_name1 =different_teams(n_epochs,le,roster)
    times2, wins_0_same, rate2, policy_name2 =same_team(n_epochs,le,roster)

    print(f"Different teams: {policy_name1} won {wins_0_different} of {n_epochs} battles, win rate {rate1}%, elapsed time {times1} s")
    print(f"Same team: {policy_name2} won {wins_0_same} of {n_epochs} battles, win rate {rate2}%, elapsed time {times2} s")



def different_teams(n_epochs,le:BattleEcosystem,roster):

    wins_player0 = 0
    wins_player1 = 0
    start_time = t.time()
    for i in range(n_epochs):
        cm1 = CompetitorManager(MyCompetitor0("Player 0"))
        team = RandomTeamFromRoster(roster).get_team()
        # cm1.team = RandomTeamFromRoster(roster).get_team()
        cm1.team = team
        le.register(cm1)
        cm2 = CompetitorManager(MyCompetitor1("Player 1"))
        team2 = RandomTeamFromRoster(roster).get_team()
        cm2.team = team2
        # cm2.team = RandomTeamFromRoster(roster).get_team()
        le.register(cm2)
        # Run a single epoch
        le.run(1)
        # Accumulate the wins of both players
        wins_player0 += le.win_counts[cm1]
        wins_player1 += le.win_counts[cm2]

        le.unregister(cm1)
        le.unregister(cm2)
    end_time = t.time()
    time = end_time - start_time

    return time,wins_player0,(wins_player0/n_epochs)*100,cm1.competitor.battle_policy.name


def same_team(n_epochs, le:BattleEcosystem,roster):

    wins_player0 = 0
    wins_player1 = 0
    start_time = t.time()

    from copy import deepcopy  # local import: keeps this cell independent of earlier cells

    for i in range(n_epochs):
        # Draw one random team and give each player its own copy of it
        team = RandomTeamFromRoster(roster).get_team()
        cm1 = CompetitorManager(MyCompetitor0("Player 0"))
        cm1.team = deepcopy(team)
        le.register(cm1)
        cm2 = CompetitorManager(MyCompetitor1("Player 1"))
        cm2.team = deepcopy(team)
        le.register(cm2)
        # Run a single epoch
        le.run(1)
        # Accumulate the wins of both players
        wins_player0 += le.win_counts[cm1]
        wins_player1 += le.win_counts[cm2]
        le.unregister(cm1)
        le.unregister(cm2)
    end_time = t.time()
    time = end_time-start_time

    return time,wins_player0,(wins_player0/n_epochs)*100, cm1.competitor.battle_policy.name

Brief comment on the result

**Tournament**

In this tournament, every player plays against every other player. The final result is reported in a table, with the player with the most wins at the top and the player with the fewest wins at the bottom. The number beside each player's name is the number of battles won. In every game the players competed under the same conditions: the team was always the same for every player and was chosen randomly.
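
The round-robin schedule and the ranking table can be sketched as follows. The player names and the `dummy_play` function are placeholders, not real results: in the actual tournament the winner of each pairing would come from the battle engine.

```python
from itertools import combinations
from typing import Callable, Dict, List, Tuple


def round_robin(players: List[str], play: Callable[[str, str], str]) -> List[Tuple[str, int]]:
    """Play every pair once and return (player, wins) sorted by wins, best first."""
    wins: Dict[str, int] = {name: 0 for name in players}
    for first, second in combinations(players, 2):
        wins[play(first, second)] += 1
    return sorted(wins.items(), key=lambda item: item[1], reverse=True)


# Placeholder match function: NOT real results, the alphabetically first name simply wins.
def dummy_play(first: str, second: str) -> str:
    return min(first, second)


for name, n_wins in round_robin(["A", "B", "C", "D"], dummy_play):
    print(name, n_wins)
# A 3
# B 2
# C 1
# D 0
```

With four players this schedule produces six pairings; each pairing could itself be a series of battles, in which case `play` would return the series winner.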

Brief comment on the result

**Conclusion**

The results of the battle simulations were in line with our expectations. Every player with a policy other than the random one was able to defeat the random player, although not always by a wide margin. Our new evaluation function, applied to both the plain Minimax and the Minimax with alpha-beta pruning and the killer move heuristic, aimed to produce a more aggressive player, focused on the attack phase and on the power of the available moves. Initially we thought this would be the better approach and that battles would end very quickly. This was only partially true: battles were only somewhat faster, and the win rate was not as good as we expected, so a more conservative approach might work better. The best of the four players was the one with the custom policy. This is because the custom policy considers several aspects of the game: for example, it decides when to switch Pokémon and which Pokémon to switch to. The policy is therefore aggressive, because it looks for the most powerful move, but it also takes into account the matchup with the opponent's Pokémon, which gives it a better picture of the game state.

**Appendix**
(Optional: additional material may be added here later.)