In [1]:
import json
import numpy as np

In [2]:
!ls

EDA.ipynb             diplomacy-v1-27k-msgs


In [3]:
!ls ./diplomacy-v1-27k-msgs/

other_maps.jsonl                  standard_press_without_msgs.jsonl
standard_no_press.jsonl           standard_public_press.jsonl
standard_press_with_msgs.jsonl


# Loading the data

In [4]:
# creating dataset generators (lazy: nothing is read until they are iterated)
no_press_dataset = (json.loads(line) for line in open("./diplomacy-v1-27k-msgs/standard_no_press.jsonl", "r"))
press_dataset = (json.loads(line) for line in open("./diplomacy-v1-27k-msgs/standard_press_without_msgs.jsonl", "r"))

In [5]:
# reading no_press dataset
no_press = []
for game in no_press_dataset:
    no_press.append(game)
no_press = np.array(no_press)
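A possible variant of the loading step, sketched under the assumption that the whole file fits in memory. `read_jsonl` is a hypothetical helper (it is not part of the dataset's tooling); it closes the file once reading finishes and returns a plain list, so `len()` and indexing work without converting to a NumPy array.

```python
import json


def read_jsonl(path):
    """Return a list with one parsed JSON object per non-empty line of `path`.

    Hypothetical helper for illustration only.
    """
    # The `with` block guarantees the file handle is closed after reading.
    with open(path, "r") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

It could be called as `read_jsonl("./diplomacy-v1-27k-msgs/standard_no_press.jsonl")` in place of the generator plus loop above.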

# EDA

In [9]:
print(f"no_press dataset has: {len(no_press)} games")

no_press dataset has: 33279 games


An example of a single game record:

In [10]:
no_press[0]

{'id': 'uXFQ2zgI-DUrgwlS',
 'map': 'standard',
 'rules': ['NO_PRESS', 'POWER_CHOICE'],
 'phases': [{'name': 'S1901M',
   'state': {'timestamp': 1542989951416213,
    'zobrist_hash': '1919110489198082658',
    'note': '',
    'name': 'S1901M',
    'units': {'AUSTRIA': ['A BUD', 'A VIE', 'F TRI'],
     'ENGLAND': ['F EDI', 'F LON', 'A LVP'],
     'FRANCE': ['F BRE', 'A MAR', 'A PAR'],
     'GERMANY': ['F KIE', 'A BER', 'A MUN'],
     'ITALY': ['F NAP', 'A ROM', 'A VEN'],
     'RUSSIA': ['A WAR', 'A MOS', 'F SEV', 'F STP/SC'],
     'TURKEY': ['F ANK', 'A CON', 'A SMY']},
    'centers': {'AUSTRIA': ['BUD', 'TRI', 'VIE'],
     'ENGLAND': ['EDI', 'LON', 'LVP'],
     'FRANCE': ['BRE', 'MAR', 'PAR'],
     'GERMANY': ['BER', 'KIE', 'MUN'],
     'ITALY': ['NAP', 'ROM', 'VEN'],
     'RUSSIA': ['MOS', 'SEV', 'STP', 'WAR'],
     'TURKEY': ['ANK', 'CON', 'SMY']},
    'homes': {'AUSTRIA': ['BUD', 'TRI', 'VIE'],
     'ENGLAND': ['EDI', 'LON', 'LVP'],
     'FRANCE': ['BRE', 'MAR', 'PAR'],
     'GERMANY

At the top of the structure we have 4 keys:
1. the game id (not important)
2. the map of the game (maybe important since there are variations)
3. rules (as the paper suggests, there are important variations)
4. the phases of the game (our main data)

In [14]:
print(f"main keys: {no_press[0].keys()}")

main keys: dict_keys(['id', 'map', 'rules', 'phases'])


### Exploring the maps

In [49]:
maps = []
for game in no_press:
    maps.append(game["map"])
    
print(f"unique maps in no press: {set(maps)}")

unique maps in no press: {'standard', 'standard_germany_italy', 'standard_france_austria'}


There are 3 variations of maps in the no_press dataset:
* standard, where all 7 players play the game  
* standard_germany_italy & standard_france_austria, where there are only 2 players (this is reflected in the data):

In [73]:
# players of the standard map
no_press[0]["phases"][0]["state"]["units"]

{'AUSTRIA': ['A BUD', 'A VIE', 'F TRI'],
 'ENGLAND': ['F EDI', 'F LON', 'A LVP'],
 'FRANCE': ['F BRE', 'A MAR', 'A PAR'],
 'GERMANY': ['F KIE', 'A BER', 'A MUN'],
 'ITALY': ['F NAP', 'A ROM', 'A VEN'],
 'RUSSIA': ['A WAR', 'A MOS', 'F SEV', 'F STP/SC'],
 'TURKEY': ['F ANK', 'A CON', 'A SMY']}

In [74]:
# players of the standard_germany_italy
no_press[3]["phases"][0]["state"]["units"]

{'GERMANY': ['F KIE', 'A BER', 'A MUN'], 'ITALY': ['F NAP', 'A ROM', 'A VEN']}

In [152]:
# map distributions
map_arr = np.array(maps)
map_names, counts = np.unique(map_arr, return_counts = True)
dict(zip(map_names, counts / len(map_arr)))

{'standard': 0.6585233931308032,
 'standard_france_austria': 0.22311367529072387,
 'standard_germany_italy': 0.11836293157847291}

As the paper says, the standard map is the most played one (about 66% of the no_press games).

### Exploring the rules

In [118]:
rules = []
for game in no_press:
    rules.append(game["rules"])
rules = np.array(rules, dtype = list)

In [129]:
print(f"Unique sets of rules for no_press dataset are:\n {np.unique(rules)}")

Unique sets of rules for no_press dataset are:
 [list(['BUILD_ANY', 'NO_PRESS', 'POWER_CHOICE'])
 list(['NO_CHECK', 'NO_PRESS', 'POWER_CHOICE'])
 list(['NO_PRESS', 'POWER_CHOICE'])]


Just as described in the paper, there are variations in the rules. The most important one is the "no_check" rule, which, as far as I understand it, means that orders are not validated when they are submitted, so invalid orders can be sent.

In [157]:
# rule set distributions
rule_sets, counts = np.unique(rules, return_counts = True)
list(zip(rule_sets, counts / len(rules)))

[(['BUILD_ANY', 'NO_PRESS', 'POWER_CHOICE'], 3.004897983713453e-05),
 (['NO_CHECK', 'NO_PRESS', 'POWER_CHOICE'], 0.06730971483518135),
 (['NO_PRESS', 'POWER_CHOICE'], 0.9326602361849815)]

It seems that the "no_check" rule is rarely used: it appears in only ~7% of the games.

## Exploring phases and their data structures

In [191]:
no_press[1]["phases"][0]

{'name': 'S1901M',
 'state': {'timestamp': 1542989951906804,
  'zobrist_hash': '3359746340891294214',
  'note': '',
  'name': 'S1901M',
  'units': {'GERMANY': ['F KIE', 'A BER', 'A MUN'],
   'ITALY': ['F NAP', 'A ROM', 'A VEN']},
  'centers': {'GERMANY': ['BER', 'KIE', 'MUN'],
   'ITALY': ['NAP', 'ROM', 'VEN']},
  'homes': {'GERMANY': ['BER', 'KIE', 'MUN'], 'ITALY': ['NAP', 'ROM', 'VEN']},
  'influence': {'GERMANY': ['KIE', 'BER', 'MUN'],
   'ITALY': ['NAP', 'ROM', 'VEN']},
  'civil_disorder': {'GERMANY': 0, 'ITALY': 0},
  'builds': {'GERMANY': {'count': 0, 'homes': []},
   'ITALY': {'count': 0, 'homes': []}},
  'game_id': '0Fl3BreFivkwKWvd',
  'map': 'standard_germany_italy',
  'rules': ['NO_PRESS', 'POWER_CHOICE'],
  'retreats': {'GERMANY': {}, 'ITALY': {}}},
 'orders': {'GERMANY': ['F KIE - HOL', 'A MUN - TYR', 'A BER - SIL'],
  'ITALY': ['F NAP - ION', 'A ROM - APU', 'A VEN - TYR']},
 'results': {'F KIE': [],
  'A BER': [],
  'A MUN': ['bounce'],
  'F NAP': [],
  'A ROM': [],
  'A 

Let's analyse this field by field, checking whether each one is important and how to interpret it.

## Name

In [192]:
no_press[1]["phases"][0]["name"]

'S1901M'

name - the name of the phase. This name encodes 3 things:
1. The season of the year (S - spring, F - fall, W - winter)
2. The year (purely cosmetic, I believe)
3. The phase variant (M - move, R - retreat, A - adjust)

How the seasons and phase variants work is explained in the paper. The one noteworthy thing is that retreat phases (in either Spring or Fall) seem to be initiated on demand (i.e. after a successful attack), because a retreat phase does not always follow a move phase.
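The encoding described above can be sketched as a small parser. `parse_phase_name` and the lookup tables are hypothetical names introduced here for illustration, and the four-digit year is an assumption based on the names seen so far; any name that does not fit the three-part pattern is rejected rather than guessed.

```python
import re

# Hypothetical lookup tables mirroring the letter codes listed above.
SEASONS = {"S": "spring", "F": "fall", "W": "winter"}
PHASE_TYPES = {"M": "move", "R": "retreat", "A": "adjust"}

# Season letter, four-digit year (assumed), phase-type letter.
PHASE_NAME_RE = re.compile(r"^([SFW])(\d{4})([MRA])$")


def parse_phase_name(name):
    """Split a phase name such as 'S1901M' into (season, year, phase type)."""
    match = PHASE_NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected phase name: {name!r}")
    season, year, phase_type = match.groups()
    return SEASONS[season], int(year), PHASE_TYPES[phase_type]


print(parse_phase_name("S1901M"))  # ('spring', 1901, 'move')
```

Mapping this over `[phase["name"] for phase in game["phases"]]` gives a quick way to check which move phases are followed by a retreat phase.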

## State

Then we have a "state" field, which includes all of the information about the current board state (excluding the orders and their results).

In [193]:
no_press[1]["phases"][0]["state"]

{'timestamp': 1542989951906804,
 'zobrist_hash': '3359746340891294214',
 'note': '',
 'name': 'S1901M',
 'units': {'GERMANY': ['F KIE', 'A BER', 'A MUN'],
  'ITALY': ['F NAP', 'A ROM', 'A VEN']},
 'centers': {'GERMANY': ['BER', 'KIE', 'MUN'], 'ITALY': ['NAP', 'ROM', 'VEN']},
 'homes': {'GERMANY': ['BER', 'KIE', 'MUN'], 'ITALY': ['NAP', 'ROM', 'VEN']},
 'influence': {'GERMANY': ['KIE', 'BER', 'MUN'],
  'ITALY': ['NAP', 'ROM', 'VEN']},
 'civil_disorder': {'GERMANY': 0, 'ITALY': 0},
 'builds': {'GERMANY': {'count': 0, 'homes': []},
  'ITALY': {'count': 0, 'homes': []}},
 'game_id': '0Fl3BreFivkwKWvd',
 'map': 'standard_germany_italy',
 'rules': ['NO_PRESS', 'POWER_CHOICE'],
 'retreats': {'GERMANY': {}, 'ITALY': {}}}

There is some repeated or unnecessary information, such as the rules, map name, game id, timestamps, etc.

## Units

In [195]:
no_press[1]["phases"][0]["state"]["units"]

{'GERMANY': ['F KIE', 'A BER', 'A MUN'], 'ITALY': ['F NAP', 'A ROM', 'A VEN']}

This field lists every power's units together with their locations.
The structure is: (```unit_type```  ```province```)  

```unit_type``` - A for army, F for fleet  
```province``` - the province name abbreviation
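As a rough sketch, a unit string can be split into its parts like this. `parse_unit` is a hypothetical helper, and treating the `/SC` suffix seen in `'F STP/SC'` as a coast marker is an assumption on my side; only the two-part form shown in this notebook is handled.

```python
# Hypothetical mapping for the two unit-type letters described above.
UNIT_TYPES = {"A": "army", "F": "fleet"}


def parse_unit(unit):
    """Split a unit string such as 'A BUD' or 'F STP/SC' into its parts."""
    unit_type, location = unit.split(" ", 1)
    # Assumption: anything after '/' names a coast of the province.
    province, _, coast = location.partition("/")
    return {
        "type": UNIT_TYPES[unit_type],
        "province": province,
        "coast": coast or None,
    }


print(parse_unit("F STP/SC"))  # {'type': 'fleet', 'province': 'STP', 'coast': 'SC'}
```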

## Centers

In [196]:
no_press[1]["phases"][0]["state"]["centers"]

{'GERMANY': ['BER', 'KIE', 'MUN'], 'ITALY': ['NAP', 'ROM', 'VEN']}

This field lists every power's supply center locations.
The structure is: (```province1```, ```province2``` ... )  

```province``` - the abbreviated name of the province where the center stands

## Homes

In [198]:
no_press[1]["phases"][0]["state"]["homes"]

{'GERMANY': ['BER', 'KIE', 'MUN'], 'ITALY': ['NAP', 'ROM', 'VEN']}

These are each power's original supply center locations. They are the only centers where units can be built during adjustment phases. The structure is: (```province1```, ```province2``` ... )  

```province``` - the abbreviated name of the province where the home supply center stands

## Influence

In [241]:
no_press[1]["phases"][1]["state"]["influence"]

{'GERMANY': ['KIE', 'BER', 'MUN', 'HOL', 'SIL'],
 'ITALY': ['NAP', 'ROM', 'VEN', 'ION', 'APU']}

These are the provinces currently under a power's influence. Judging by the output above, the list is not limited to supply centers (SIL, ION and APU are not supply centers), and it still contains provinces that the power's units have already left (KIE, BER), so it seems to record the provinces a power was the last to occupy. NOTE - the "influence" and "centers" fields are different: "centers" shows the OWNED supply centers, and ownership does not transfer on successful occupation but rather during the winter adjustment phase, to whoever currently occupies the center. The structure is: (```province1```, ```province2``` ... )  

```province``` - the abbreviated name of a province under the power's influence
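To see how "influence" and "centers" differ, here is a small sketch that lists the provinces appearing in a power's influence but not among its owned centers. The sample state is copied from the outputs above (the "centers" values come from the first phase and are assumed to be unchanged one phase later), and `influence_beyond_centers` is a hypothetical helper.

```python
# Sample values copied from the notebook outputs above.
state = {
    "centers": {"GERMANY": ["BER", "KIE", "MUN"],
                "ITALY": ["NAP", "ROM", "VEN"]},
    "influence": {"GERMANY": ["KIE", "BER", "MUN", "HOL", "SIL"],
                  "ITALY": ["NAP", "ROM", "VEN", "ION", "APU"]},
}


def influence_beyond_centers(state):
    """Per power, provinces listed in 'influence' but not in 'centers'."""
    return {
        power: sorted(set(provinces) - set(state["centers"].get(power, [])))
        for power, provinces in state["influence"].items()
    }


print(influence_beyond_centers(state))
# {'GERMANY': ['HOL', 'SIL'], 'ITALY': ['APU', 'ION']}
```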

## civil_disorder (not confident on the information)

In [249]:
no_press[1]["phases"][1]["state"]["civil_disorder"]

{'GERMANY': 0, 'ITALY': 0}

According to the game manual:

If a player leaves the game, or fails to submit orders in a given Spring or Fall season, it is assumed that civil government in his/her country has collapsed. His/her units hold in position, but do not support each other. If they are dislodged, they are disbanded. No new Units are raised for this country. 

The structure is not clear to me :( In the output above it maps each power to an integer, which is 0 for both powers.

## Builds

In [251]:
no_press[1]["phases"][1]["state"]["builds"]

{'GERMANY': {'count': 0, 'homes': []}, 'ITALY': {'count': 0, 'homes': []}}

Information on the units currently being built (during the Winter adjustment phase). The structure is: (```count``` : x, ```homes``` : [x1, x2, x3])

```count``` - how many new units are being built. The number usually corresponds to the newly captured supply centers that just transferred ownership at the beginning of the Winter phase

```homes``` - I believe it's the same information that the "homes" field has, though it is empty in the output above

## Retreats

In [266]:
no_press[1]["phases"][12]["state"]["retreats"]

{'GERMANY': {}, 'ITALY': {'A WAR': ['LVN', 'UKR']}}

(Not 100% sure.) This field is only relevant during the retreat phase, and it displays the possible locations a unit can retreat to. The structure is: (```unit```: [```possible_location1```, ```possible_location2```])

```unit``` - the unit that has to retreat, given as its type and current province (e.g. 'A WAR')  
```possible_location1``` - an adjacent province to which the unit can retreat

Note: the player decides where to retreat, and their decision is shown in the "orders" field. More on this later.
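A minimal sketch of how this field could be flattened into one row per unit that has to retreat. The sample value is copied from the output above, and `pending_retreats` is a hypothetical helper name.

```python
# Sample value copied from the notebook output above.
retreats = {"GERMANY": {}, "ITALY": {"A WAR": ["LVN", "UKR"]}}


def pending_retreats(retreats):
    """Flatten the field into (power, unit, retreat options) tuples."""
    return [
        (power, unit, options)
        for power, units in retreats.items()
        for unit, options in units.items()
    ]


print(pending_retreats(retreats))  # [('ITALY', 'A WAR', ['LVN', 'UKR'])]
```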

## Orders

In [292]:
no_press[1]["phases"][1]["orders"]

{'GERMANY': ['F HOL - BEL', 'A MUN - TYR', 'A SIL - WAR'],
 'ITALY': ['A APU - GRE VIA', 'A VEN - TRI', 'F ION C A APU - GRE']}

The orders given to the units in this phase.

The structure differs according to the order type; however, these are the main building blocks:

**unit** is constructed as: ```unit_type``` ```unit_location``` (e.g. A MUN = Army in Munich)

The " - " sign denotes that the unit is moving to the location specified after the sign (e.g. A MUN - TYR = Army in Munich is moving to Tyrolia)

**Move**: ```unit '-' location``` ('A SIL - WAR')

**Support**: ```unit 'S' unit to support '-' location where the supported unit is going```  ('A VEN S A PIE - TYR')

**Convoy**:
1. The army which is being convoyed has the structure: ```unit '-' location 'VIA'``` (e.g.'A APU - GRE VIA')
2. The fleet which will convoy the army has the structure: ```unit 'C' unit '-' location``` (e.g. 'F ION C A APU - GRE')

**Hold**: ```unit 'H'``` (e.g. 'F POR H')

**Build**: ```new unit in the supply center location 'B' ``` (e.g. 'A MUN B' - build an army in the Munich center) 

**Retreat**: ```unit 'R' location``` (e.g. 'A VIE R BOH')

**Disband**: ```unit 'D'``` (e.g. 'A WAR D')
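The order formats listed above can be told apart by looking at the third token of the order string. The following is only a sketch under that assumption: `classify_order` and `ORDER_TYPES` are hypothetical names, and order strings that do not follow the formats shown in this notebook are labelled "unknown".

```python
# Hypothetical mapping from the action letter to the order type.
ORDER_TYPES = {
    "S": "support",
    "C": "convoy",
    "H": "hold",
    "B": "build",
    "R": "retreat",
    "D": "disband",
}


def classify_order(order):
    """Return the order type for strings shaped like the examples above."""
    tokens = order.split()
    # Expected shape: unit type, unit location, action, optional remainder.
    if len(tokens) < 3 or tokens[0] not in ("A", "F"):
        return "unknown"
    action = tokens[2]
    if action == "-":
        # A trailing 'VIA' marks an army that is moved by convoy.
        return "convoyed move" if tokens[-1] == "VIA" else "move"
    return ORDER_TYPES.get(action, "unknown")


for example in ["A SIL - WAR", "A APU - GRE VIA", "F ION C A APU - GRE",
                "A VEN S A PIE - TYR", "F POR H", "A MUN B",
                "A VIE R BOH", "A WAR D"]:
    print(f"{example!r}: {classify_order(example)}")
```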

## Results

In [325]:
no_press[1]["phases"][6]["results"]

{'A VIE': [],
 'A BUR': [],
 'F MAO': [],
 'A BUD': ['cut', 'dislodged'],
 'A BOH': [],
 'A MUN': ['bounce'],
 'F KIE': [],
 'A SER': [],
 'A PIE': [],
 'F TUN': [],
 'A TRI': [],
 'A BUL': [],
 'A VEN': ['bounce'],
 'F ROM': [],
 'F NAP': []}

This field shows what happened to the units after adjudication. Structure is: ```unit```: [```result1```, ```result2```]

Possible results:

**empty** - it means that the unit performed its order successfully

**"bounce"** - it means that two opposing units of equal strength tried to move into the same territory, so they bounced off each other because neither was able to win

**"cut"** - it means that this unit was supporting another unit but got attacked, thus the support got interrupted.

**"dislodged"** - it means that unit was defeated and it now has to either retreat or be disbanded.
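To get a feel for how often each result occurs, the tags can be tallied per phase. This is a minimal sketch: the sample is copied from the output above, `tally_results` is a hypothetical helper, and the label "ok" for an empty result list is my own naming.

```python
from collections import Counter

# Sample value copied from the notebook output above.
results = {
    "A VIE": [], "A BUR": [], "F MAO": [], "A BUD": ["cut", "dislodged"],
    "A BOH": [], "A MUN": ["bounce"], "F KIE": [], "A SER": [],
    "A PIE": [], "F TUN": [], "A TRI": [], "A BUL": [],
    "A VEN": ["bounce"], "F ROM": [], "F NAP": [],
}


def tally_results(results):
    """Count result tags; units with an empty list are counted as 'ok'."""
    counts = Counter()
    for tags in results.values():
        if tags:
            counts.update(tags)
        else:
            counts["ok"] += 1
    return counts


print(dict(tally_results(results)))
# {'ok': 12, 'cut': 1, 'dislodged': 1, 'bounce': 2}
```

Summing these counters over all phases of all games would give the overall distribution of results.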