# Betrayal Detection in Diplomacy

Diplomacy is a strategic board game with elements of social deception. A game proceeds in phases, during which players negotiate with one another and may betray each other. We will attempt to predict whether a betrayal occurs based on the messages that players send to each other.
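Framed this way, the task is binary classification: one row of message features per player pair, and a betrayal label per row. The imports below point to a decision tree evaluated with cross-validation. The sketch shows how those pieces fit together. It is a sketch only and uses a random, synthetic feature matrix (500 rows to mirror the dataset size, 18 columns as an assumed feature count), so the scores it prints carry no information about real betrayal prediction. `StandardScaler` is left out because decision-tree splits are unaffected by feature scaling.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the processed feature table (values are random).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 18))      # 500 player pairs x 18 assumed features
y = rng.integers(0, 2, size=500)    # 1 = betrayal, 0 = no betrayal

# A shallow tree; max_depth=4 is an arbitrary illustrative choice.
clf = DecisionTreeClassifier(max_depth=4, random_state=0)

# Accuracy on each of 5 stratified folds (the default for classifiers).
scores = cross_val_score(clf, X, y, cv=5)
print(scores, scores.mean())
```

On random labels, the mean accuracy should hover around chance, which is a useful baseline to compare real features against.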

In [143]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split, cross_val_score

from sklearn.tree import DecisionTreeClassifier

from sklearn.metrics import classification_report, confusion_matrix

# Basic Preprocessing

First, we preprocess the dataset into a dataframe that is easier to interpret for exploration and training.

In Diplomacy, each game is divided into multiple phases called seasons. Within each season, players can communicate with one another to coordinate attacks. Each entry in the original dataset describes the interaction between two players in a single game and contains statistics about their messages in every season of that game.
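The nesting (entry, then list of seasons, then per-role message lists, then per-message statistics) can be pictured with a toy entry. This is a sketch with made-up values. It assumes each season's `messages` dict holds one list per role, keyed `"betrayer"` and `"victim"`: the notebook reads the `"victim"` key, and `"betrayer"` is assumed by symmetry. `messages_by_role` is a hypothetical helper, not part of the notebook.

```python
# A toy entry shaped like one row of diplomacy_data.json (values are made up).
# Assumption: each season's "messages" dict has one list per role.
toy_entry = {
    "betrayal": True,
    "seasons": [
        {"messages": {
            "betrayer": [{"n_words": 12, "n_sentences": 2}],
            "victim": [{"n_words": 29, "n_sentences": 4},
                       {"n_words": 5, "n_sentences": 1}],
        }},
        {"messages": {
            "betrayer": [],
            "victim": [{"n_words": 8, "n_sentences": 1}],
        }},
    ],
}


def messages_by_role(entry, role):
    """Flatten one role's messages from every season into a single list."""
    return [msg for season in entry["seasons"] for msg in season["messages"][role]]


victim_msgs = messages_by_role(toy_entry, "victim")
print(len(victim_msgs))                        # 3
print(sum(m["n_words"] for m in victim_msgs))  # 42
```

Flattening across seasons first keeps the later feature code simple, since every per-role statistic becomes a sum or mean over one list.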

Therefore, our preprocessed dataframe aggregates each entry's message data across all seasons of the game, since we only care whether a betrayal occurred in the interaction over the game as a whole.
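One way to perform this aggregation is sketched below, following the column list planned in the preprocessing cell (columns are grouped by role rather than interleaved). The helpers are hypothetical, not the notebook's code. Assumptions: each season's `messages` dict holds `"victim"` and `"betrayer"` lists; every message has the `n_words`, `n_sentences`, `n_requests`, `politeness` and `sentiment` fields seen in the notebook's printed output; the `idx` column serves as the row ID; and sentiment proportions divide each summed count by the sum of all three counts.

```python
import pandas as pd

# Hypothetical helpers; field names follow the notebook's printed message dict.
ROLES = ("victim", "betrayer")
SENTIMENTS = ("negative", "neutral", "positive")


def role_features(msgs, role):
    """Aggregate one player's messages from every season into summary features."""
    n_msgs = len(msgs)
    n_words = sum(m["n_words"] for m in msgs)
    # sentiment counts summed over all messages, then turned into proportions
    counts = {s: sum(m["sentiment"][s] for m in msgs) for s in SENTIMENTS}
    total = sum(counts.values())
    feats = {
        f"{role}_n_msgs": n_msgs,
        f"{role}_n_sentences": sum(m["n_sentences"] for m in msgs),
        f"{role}_n_words": n_words,
        f"{role}_avg_words_per_msg": n_words / n_msgs if n_msgs else 0.0,
        f"{role}_n_requests": sum(m["n_requests"] for m in msgs),
        # mean politeness per message; 0.0 when the player sent nothing
        f"{role}_politeness": sum(m["politeness"] for m in msgs) / n_msgs if n_msgs else 0.0,
    }
    for s in SENTIMENTS:
        feats[f"{role}_{s[:3]}_sentiment_prop"] = counts[s] / total if total else 0.0
    return feats


def aggregate_entry(entry):
    """Collapse one entry (all seasons) into a single row of the processed table."""
    row = {"id": entry["idx"], "betrayal": bool(entry["betrayal"])}
    for role in ROLES:
        msgs = [m for season in entry["seasons"] for m in season["messages"][role]]
        row.update(role_features(msgs, role))
    return row


def toy_msg(words, sents, reqs, pol, pos, neu, neg):
    """Build a made-up message dict with the same fields as the real data."""
    return {"n_words": words, "n_sentences": sents, "n_requests": reqs,
            "politeness": pol,
            "sentiment": {"positive": pos, "neutral": neu, "negative": neg}}


toy_entry = {"idx": 0, "betrayal": True, "seasons": [
    {"messages": {"victim": [toy_msg(29, 4, 0, 0.25, 0, 1, 3)],
                  "betrayer": [toy_msg(10, 2, 1, 0.8, 2, 0, 0)]}},
    {"messages": {"victim": [toy_msg(11, 2, 1, 0.75, 1, 1, 0)],
                  "betrayer": []}},
]}

processed = pd.DataFrame([aggregate_entry(toy_entry)])
print(processed.T)
```

Under the same assumptions, the full table would be `pd.DataFrame([aggregate_entry(r) for r in df.to_dict("records")])`. Returning zeros for a player who sent no messages avoids division by zero, though whether zero is the right fill value is a modelling choice.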

In [144]:
df = pd.read_json('diplomacy_data.json')
# each entry (row) represents the interaction between two players in one game,
# spread across all of that game's seasons

data = []  # list of rows (one list per entry) for the processed dataframe

print(df.info())  # column names, dtypes and non-null counts

seasons = df['seasons']  # pandas Series: one list of season dicts per entry

seasons_in_second_entry = seasons[1]  # list of all seasons in the second entry
print(len(seasons_in_second_entry))

first_season_of_second_entry = seasons_in_second_entry[0]
print(type(first_season_of_second_entry))
# print(first_season_of_second_entry)  # the first season of this entry

victim_msgs_in_first_season_of_second_entry = first_season_of_second_entry['messages']['victim']
# list of every message sent by the victim in the first season of the second entry

print(victim_msgs_in_first_season_of_second_entry)

first_msg_of_victim = victim_msgs_in_first_season_of_second_entry[0]
# dict holding the statistics for a single message

for key in first_msg_of_victim.keys():
    print(key)
    print(first_msg_of_victim[key])
    print("")

# TODO: remove punctuation and numbers from the word lists


# we create a new processed pandas dataframe based on the original one

'''
Our processed data will contain the following structure:

ID | Betrayal (T/F) | victim # msgs | betrayer # msgs | victim # sentences | betrayer # sentences |
victim # words | betrayer # words | victim avg words per msg | betrayer avg words per msg |
victim # requests | betrayer # requests | victim politeness | betrayer politeness |
victim neg sentiment proportion | betrayer neg sentiment proportion |
victim neu sentiment proportion | betrayer neu sentiment proportion |
victim pos sentiment proportion | betrayer pos sentiment proportion |

'''

betrayals = df['betrayal']

# create our new processed df
for i in range(len(df)):
    # start a fresh row for every entry; reusing a single list would make
    # every appended row the same object, holding all entries' values
    entry = []
    betrayal = betrayals[i]

    print(betrayal)
    entry.append(betrayal)

    data.append(entry)

# print(data)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 5 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   seasons   500 non-null    object
 1   game      500 non-null    int64 
 2   betrayal  500 non-null    bool  
 3   idx       500 non-null    int64 
 4   people    500 non-null    object
dtypes: bool(1), int64(2), object(2)
memory usage: 16.2+ KB
None
10
<class 'dict'>
[{'sentiment': {'positive': 0, 'neutral': 1, 'negative': 3}, 'lexicon_words': {'allsubj': ['back', 'really', 'much', 'least']}, 'n_requests': 0, 'frequent_words': ['at', 'but', '.', 'much', 'is', ',', 'ber', 'swe', 'expect', 'we', '.', 'least', "can't", "i'll", ',', 'germany', 's', 'that', 'past', 'bal', 'ok', '.', 'ordered', 'back', 'bot', ',', 'get', 'really', '-'], 'n_words': 29, 'politeness': 0.228691677137034, 'n_sentences': 4}]
sentiment
{'positive': 0, 'neutral': 1, 'negative': 3}

lexicon_words
{'allsubj': ['back', 'really', 'much', 