# Network Data Generation

### Purpose
This notebook generates the edge data for networks of brawlers who are likely to win, or to lose, when they play on the same team.

### Data Used
This notebook uses game data on pairs of brawlers who won or lost together. Both team members collected this data.

### Approach
This project will feature the following networks:
- a network of brawlers who are more likely to win with each other
- a network of brawlers who are more likely to lose with each other

#### Why Not a Network with Weighted Edges?
A network with weighted edges will be extremely sparse. Connecting only brawlers who are likely to win or lose together may therefore make more sense for this project.

## Importing Libraries

In [1]:
import pandas as pd
import networkx
from scipy.stats import binomtest

## Writing Constants for this Notebook

In [2]:
BRAWLER_PAIR_VICTORIES_FILEPATH = (
    "../outputs/graph-data/brawler-pair-victories-beginner.csv"
)
BRAWLER_PAIR_DEFEATS_FILEPATH = (
    "../outputs/graph-data/brawler-pair-defeats-beginner.csv"
)

WINNING_EDGES_FILEPATH = (
    "../outputs/graph-data/edges/winning-edges-beginners.csv"
)
LOSING_EDGES_FILEPATH = (
    "../outputs/graph-data/edges/losing-edges-beginners.csv"
)

ALPHA = 0.05

## Loading Edge Data

### Getting Times Brawler Pairs Won

In [3]:
df_wins = pd.read_csv(BRAWLER_PAIR_VICTORIES_FILEPATH)

# Group by Brawler_1 and Brawler_2, summing the Weight column
df_wins = (
    df_wins.groupby(
        ['Brawler_1', 'Brawler_2'], 
        as_index=False
    )['Weight'].sum()
)

# Renaming weight column
df_wins = df_wins.rename(
    columns={'Weight': 'Wins'}
)

df_wins.head()

Unnamed: 0,Brawler_1,Brawler_2,Wins
0,8-BIT,8-BIT,32
1,8-BIT,ALLI,14
2,8-BIT,AMBER,12
3,8-BIT,ANGELO,10
4,8-BIT,ASH,3


### Getting Times Brawler Pairs Lost

In [4]:
df_losses = pd.read_csv(BRAWLER_PAIR_DEFEATS_FILEPATH)

# Group by Brawler_1 and Brawler_2, summing the Weight column
df_losses = (
    df_losses.groupby(
        ['Brawler_1', 'Brawler_2'], 
        as_index=False
    )['Weight'].sum()
)

# Renaming weight column
df_losses = df_losses.rename(
    columns={'Weight': 'Losses'}
)

df_losses.head()

Unnamed: 0,Brawler_1,Brawler_2,Losses
0,8-BIT,8-BIT,1
1,8-BIT,ALLI,11
2,8-BIT,AMBER,18
3,8-BIT,ANGELO,7
4,8-BIT,ASH,4


### Data Engineering

#### Combining Win and Loss Data

In [5]:
# Merge df_wins and df_losses on Brawler_1 and Brawler_2
df_brawler_pairs = pd.merge(
    df_wins,
    df_losses,
    on=['Brawler_1', 'Brawler_2'],
    how='outer'
).fillna(0)  # Fill NaN with 0 for pairs that only appear in one DataFrame

# Ensure the Wins and Losses columns are integers (fillna turns them into floats)
df_brawler_pairs['Wins'] = df_brawler_pairs['Wins'].astype(int)
df_brawler_pairs['Losses'] = df_brawler_pairs['Losses'].astype(int)

# Counting the total number of games observed for each brawler pair
df_brawler_pairs["Observations"] = (
    df_brawler_pairs['Wins'] + df_brawler_pairs['Losses']
)

# Display the first few rows
df_brawler_pairs.head()

Unnamed: 0,Brawler_1,Brawler_2,Wins,Losses,Observations
0,8-BIT,8-BIT,32,1,33
1,8-BIT,ALLI,14,11,25
2,8-BIT,AMBER,12,18,30
3,8-BIT,ANGELO,10,7,17
4,8-BIT,ASH,3,4,7
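
The outer merge keeps pairs that appear in only one of the two tables and fills the missing count with 0. The following is a minimal sketch of that behaviour on made-up data; the brawler names `A`, `B`, `C` and all counts are placeholders, not values from the collected data set.

```python
import pandas as pd

# Toy inputs (made-up counts): A/C never lost together, B/C never won together
wins = pd.DataFrame(
    {"Brawler_1": ["A", "A"], "Brawler_2": ["B", "C"], "Wins": [3, 1]}
)
losses = pd.DataFrame(
    {"Brawler_1": ["A", "B"], "Brawler_2": ["B", "C"], "Losses": [2, 4]}
)

# An outer merge keeps every pair found in either table
pairs = pd.merge(
    wins, losses, on=["Brawler_1", "Brawler_2"], how="outer"
).fillna(0)

# fillna leaves float columns behind, so cast the counts back to integers
pairs[["Wins", "Losses"]] = pairs[["Wins", "Losses"]].astype(int)
pairs["Observations"] = pairs["Wins"] + pairs["Losses"]

# Index by pair so that rows can be looked up regardless of row order
indexed = pairs.set_index(["Brawler_1", "Brawler_2"])
print(indexed)
```

With `how="inner"` the pairs `A`/`C` and `B`/`C` would be dropped, which would remove exactly the pairs with the most one-sided records.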


#### Looking at Total Brawler Pairs

In [6]:
df_brawler_pairs["Observations"].sum()

123842

### Setting Data for Gephi

#### Connection Type

In [7]:
df_brawler_pairs["Type"] = "Undirected"

#### Initial Weight to be Changed Later

In [8]:
df_brawler_pairs["Weight"] = 1

### Determining Connections with Statistical Tests

#### Finding Winning Brawler Pairs

In [9]:
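def run_win_hypothesis_test(x):
    # One-sided exact binomial test
    # H0: the pair wins half of its games; H1: it wins more than half
    p_val = binomtest(
        x["Wins"], x["Observations"], p=0.5, alternative="greater"
    ).pvalue
    # The pair becomes a winning edge only if H0 is rejected at level ALPHA
    return p_val < ALPHA

df_brawler_pairs["Winning_Pair"] = df_brawler_pairs.apply(
    run_win_hypothesis_test, axis="columns"
)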
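
To make the decision rule concrete, the one-sided p-value is the probability of seeing at least the observed number of wins if the pair truly won half of its games. The sketch below recomputes that tail probability with the standard library only, for two pairs that appear in the outputs of this notebook. The helper name `upper_tail_p_value` is hypothetical and is not part of SciPy; it is meant as an illustration of the quantity being thresholded, not as a replacement for `binomtest`.

```python
from math import comb

ALPHA = 0.05  # same significance level as the notebook constant


def upper_tail_p_value(successes, trials, p=0.5):
    """Return P(X >= successes) for X ~ Binomial(trials, p).

    Hypothetical helper written for illustration only.
    """
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# 8-BIT / ALLI: 14 wins in 25 games
p_8bit_alli = upper_tail_p_value(14, 25)

# BELLE / CORDELIUS: 17 wins in 19 games
p_belle_cordelius = upper_tail_p_value(17, 19)

print(f"8-BIT/ALLI: p = {p_8bit_alli:.4f}, edge = {p_8bit_alli < ALPHA}")
print(
    f"BELLE/CORDELIUS: p = {p_belle_cordelius:.6f}, "
    f"edge = {p_belle_cordelius < ALPHA}"
)
```

The first pair has a p-value of about 0.345, so it does not get a winning edge even though it won more games than it lost. The second pair has a p-value of about 0.00036 and is kept.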

#### Finding Losing Brawler Pairs

In [10]:
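def run_loss_hypothesis_test(x):
    # One-sided exact binomial test
    # H0: the pair loses half of its games; H1: it loses more than half
    p_val = binomtest(
        x["Losses"], x["Observations"], p=0.5, alternative="greater"
    ).pvalue
    # The pair becomes a losing edge only if H0 is rejected at level ALPHA
    return p_val < ALPHA

df_brawler_pairs["Losing_Pair"] = df_brawler_pairs.apply(
    run_loss_hypothesis_test, axis="columns"
)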

### Setting Weights

#### Getting Probability depending on Winning/Losing

In [11]:
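def set_probability(row):
    # Winning pairs get their win rate and losing pairs get their loss rate
    # Pairs that are neither winning nor losing get a probability of 0
    return (
        (int(row["Winning_Pair"]) * row["Wins"]) +
        (int(row["Losing_Pair"]) * row["Losses"])
    ) / row["Observations"]

df_brawler_pairs["probability"] = df_brawler_pairs.apply(
    set_probability, axis="columns"
)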

#### Setting Weights to the Inverses of Probabilities

In [12]:
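# Pairs with probability 0 (neither winning nor losing) get an infinite weight;
# they are filtered out before the edges are saved
df_brawler_pairs["Weight"] = (1 / df_brawler_pairs["probability"]) + 1e-5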
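
As a worked check of the two steps above, the sketch below recomputes the probability and weight for single rows in plain Python. The helper `edge_weight` is hypothetical and only mirrors the arithmetic of the notebook. One difference is handled explicitly: pandas returns `inf` for `1 / 0.0`, while plain Python raises `ZeroDivisionError`.

```python
import math


def edge_weight(wins, losses, winning_pair, losing_pair):
    """Return (probability, weight) for one brawler pair.

    Hypothetical helper mirroring the notebook's arithmetic for a single row.
    """
    observations = wins + losses
    probability = (
        int(winning_pair) * wins + int(losing_pair) * losses
    ) / observations
    if probability == 0:
        # Match the pandas result for 1 / 0.0 instead of raising an error
        return probability, math.inf
    return probability, 1 / probability + 1e-5


# BELLE / CORDELIUS: winning pair with 17 wins and 2 losses
win_prob, win_weight = edge_weight(17, 2, True, False)

# BARLEY / SHELLY: losing pair with 15 wins and 45 losses
loss_prob, loss_weight = edge_weight(15, 45, False, True)

# 8-BIT / ALLI: neither flag set, so the pair carries no edge
neutral_prob, neutral_weight = edge_weight(14, 11, False, False)

print(round(win_prob, 6), round(win_weight, 6))
print(round(loss_prob, 6), round(loss_weight, 6))
print(neutral_prob, neutral_weight)
```

Because the weight is the inverse of the probability, stronger pairs get weights closer to 1 and weaker (but still significant) pairs get larger weights.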

#### Evaluating Results

In [13]:
# Single quotes inside the f-strings keep this valid on Python versions before 3.12
print(f"Winning pair edges: {df_brawler_pairs['Winning_Pair'].sum()}")
print(f"Losing pair edges: {df_brawler_pairs['Losing_Pair'].sum()}")

Winning pair edges: 276
Losing pair edges: 232


In [14]:
df_brawler_pairs[df_brawler_pairs["Winning_Pair"] == True].head(40).tail(5)

Unnamed: 0,Brawler_1,Brawler_2,Wins,Losses,Observations,Type,Weight,Winning_Pair,Losing_Pair,probability
646,BELLE,CORDELIUS,17,2,19,Undirected,1.117657,True,False,0.894737
649,BELLE,DOUG,21,10,31,Undirected,1.4762,True,False,0.677419
654,BELLE,EMZ,45,29,74,Undirected,1.644454,True,False,0.608108
660,BELLE,GENE,20,8,28,Undirected,1.40001,True,False,0.714286
672,BELLE,KENJI,24,9,33,Undirected,1.37501,True,False,0.727273


In [15]:
df_brawler_pairs[df_brawler_pairs["Losing_Pair"] == True].head(40).tail(5)

Unnamed: 0,Brawler_1,Brawler_2,Wins,Losses,Observations,Type,Weight,Winning_Pair,Losing_Pair,probability
534,BARLEY,SHELLY,15,45,60,Undirected,1.333343,False,True,0.75
538,BARLEY,STU,12,31,43,Undirected,1.387107,False,True,0.72093
542,BARLEY,WILLOW,0,6,6,Undirected,1.00001,False,True,1.0
567,BEA,EDGAR,78,113,191,Undirected,1.690275,False,True,0.591623
568,BEA,EL PRIMO,11,29,40,Undirected,1.37932,False,True,0.725


### Saving Edge Data

#### Winning Edges

In [16]:
# Getting only winning edges
df_winning_edges = df_brawler_pairs[
    df_brawler_pairs["Winning_Pair"] == True
]

# Renaming columns
df_winning_edges = df_winning_edges.rename(
    columns={
        'Brawler_1': 'Source', 
        'Brawler_2': 'Target'
    }
)

# Getting relevant edge information
df_winning_edges = df_winning_edges[[
    "Source", "Target", "Type", "Weight", "probability"
]]

In [17]:
df_winning_edges.to_csv(
    WINNING_EDGES_FILEPATH, 
    index = False
)

#### Losing Edges

In [18]:
# Getting only losing edges
df_losing_edges = df_brawler_pairs[
    df_brawler_pairs["Losing_Pair"] == True
]

df_losing_edges = df_losing_edges.rename(
    columns={
        'Brawler_1': 'Source', 
        'Brawler_2': 'Target'
    }
)

# Getting relevant edge information
df_losing_edges = df_losing_edges[[
    "Source", "Target", "Type", "Weight", "probability"
]]

In [19]:
df_losing_edges.to_csv(
    LOSING_EDGES_FILEPATH, 
    index = False
)