<a href="https://colab.research.google.com/github/Dagobert42/WinProbabilityModel-AoE2/blob/main/WinProbabilityModel_AoE2.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# A Non-Linear Win Probability Model for Age of Empires II

Age of Empires II ["AoE2"] is a real-time strategy game set in the middle-ages. Players have to build a base and command units with the goal of defeating each other in matches of up to 8 players. When participating in AoE2’s competitive online-multiplayer they are rated with an Elo rating system not unlike Chess. This rating gives a good idea of the current form of a player and the rating difference can be converted directly into probability distributions for the outcome of a match or series between players.

Rating conversion example

The game was designed for different strategies to work well with certain civs and for civs to have bonuses on specific maps and disadvantages on others, similarly players may favour a certain map due to their experience or personal play style. The goal of this project is to give a model capable of predicting in a meaningful way the outcome of matches played on the one-versus-one random map ladder ["RM_1v1"].
The project conforms to Microsoft's [Game Content Usage Rules](https://www.xbox.com/en-us/developers/rules).

In [1]:
import numpy as np
import pandas as pd
import pickle
from tqdm import tqdm
from IPython.display import clear_output

In [2]:
matches = pd.read_csv('matches.csv')
players = pd.read_csv('match_players.csv')

# we will only consider 1v1 matches from the patch with the most examples
matches = matches.loc[matches['ladder'] == 'RM_1v1']
matches = matches.loc[matches['patch'] == 37906]

# some players have a NaN rating, so we exclude them
invalid_players = players.loc[players['rating'].isna()]
players = players.loc[players['rating'].notna()]

# matches of players with NaN ratings are treated the same way
matches = matches.loc[~matches['token'].isin(invalid_players['match'])]
players = players.loc[players['match'].isin(matches['token'])]

# TODO!!!!

# finally all unneeded information is dropped from both frames
matches = matches.drop(columns=['patch', 'ladder', 'mirror', 'average_rating', 'map_size', 'num_players', 'server', 'duration'])
players = players.drop(columns=['token', 'color', 'winner'])

print(matches.info())
print(matches.head())
print(players.info())
print(players.head())

  interactivity=interactivity, compiler=compiler, result=result)
  interactivity=interactivity, compiler=compiler, result=result)


<class 'pandas.core.frame.DataFrame'>
Int64Index: 139046 entries, 4 to 1306128
Data columns (total 3 columns):
 #   Column        Non-Null Count   Dtype 
---  ------        --------------   ----- 
 0   token         139046 non-null  object
 1   winning_team  139046 non-null  int64 
 2   map           139046 non-null  object
dtypes: int64(1), object(2)
memory usage: 4.2+ MB
None
               token  winning_team        map
4   U198Wdc3kzJPBVqh             1  acropolis
5   NM6127b06GcGmt3J             2  gold_rush
6   UCJkNXbwDIgTnUXi             2     arabia
28  BAgzGyGMFarQnabm             1    hideout
29  xEN7qulZgPDvfNvq             1      arena
<class 'pandas.core.frame.DataFrame'>
Int64Index: 43972 entries, 6 to 1652282
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   match   43972 non-null  object 
 1   rating  43972 non-null  float64
 2   civ     43972 non-null  object 
 3   team    43972 non-null  int64  
dtypes: flo

In [3]:
data = pd.DataFrame(columns=['map', 'rating_p1', 'civ_p1', 'rating_p2', 'civ_p2'])

player_dict = players.to_dict('records')

print_every = 500
count = 0
for match in tqdm(matches.to_dict('records')):
    p1 = None
    p2 = None
    for i, player in enumerate(player_dict):
        if player['match'] == match['token']:
            if player['team'] == 1:
                p1 = player
                p1_i = i
            elif player['team'] == 2:
                p2 = player
                p2_i = i
            else:
                raise Exception('something wrong with the teams in the data')
        # to increase speed in later iterations 
        # lets remove already assigned players
        if p1 and p2:
            if p1_i < p2_i:
                player_dict.pop(p2_i)
                player_dict.pop(p1_i)
            else:
                player_dict.pop(p1_i)
                player_dict.pop(p2_i)
            break

    if not p1 or not p2:
        continue
    else:
        entry = {
            'map' : match['map'],
            'rating_p1' : p1['rating'],
            'civ_p1' : p1['civ'],
            'rating_p2' : p2['rating'],
            'civ_p2' : p2['civ'],
            'winner' : match['winning_team']
        }
        data = data.append(entry, ignore_index=True)

        if count % print_every == 0:
            clear_output()
            print(data.info())
            print(data.head())
        count+=1
print(count)

 98%|█████████▊| 136494/139046 [20:21<00:04, 623.44it/s]

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21501 entries, 0 to 21500
Data columns (total 6 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   map        21501 non-null  object 
 1   rating_p1  21501 non-null  float64
 2   civ_p1     21501 non-null  object 
 3   rating_p2  21501 non-null  float64
 4   civ_p2     21501 non-null  object 
 5   winner     21501 non-null  float64
dtypes: float64(3), object(3)
memory usage: 1008.0+ KB
None
          map  rating_p1  civ_p1  rating_p2      civ_p2  winner
0  four_lakes      866.0   Khmer      866.0    Japanese     2.0
1  ghost_lake     1088.0  Mayans     1099.0        Huns     2.0
2       arena     1158.0   Celts     1165.0       Goths     2.0
3  golden_pit     1165.0  Franks     1189.0  Vietnamese     1.0
4      arabia     1058.0   Celts     1065.0     Mongols     1.0


100%|██████████| 139046/139046 [20:24<00:00, 113.51it/s]

21932





In [4]:
data.to_csv('1v1_matches.csv')

In [5]:
print(count)

21932
